Accessing MaizeCODE Data
Customized apps (e.g. MCrna-0.0.1) are built to perform QC and preliminary quantification on the MaizeCODE raw RNAseq and RAMPAGE data. For each MaizeCODE experiment, the analyses of all replicates are saved as a SciApps workflow (with a unique ID), which records the relationship between the raw reads and their derived results. The following sections describe the MCrna app, show how to check the QC results of any MaizeCODE experiment, and show how to use the preliminary results for downstream differential expression analysis between any two tissues.
The MCrna App
The MCrna app wraps six tools, FastQC, bbduk, MultiQC, STAR, RSEM, and StringTie, together for QC and quantification of each replicate of an RNAseq (or RAMPAGE) experiment. The order of running these tools for processing one MaizeCODE RNA-seq experiment (two replicates) is shown below.
For each replicate, raw read files are preprocessed by bbduk to remove low-quality portions of the reads and adapter contamination. FastQC is then used to check the quality of both raw and processed reads, and the FastQC results are summarized by MultiQC into an HTML report. The trimmed reads are aligned to the reference genome with STAR, and the alignment file is then used to quantify gene expression levels with RSEM and to assemble transcripts with StringTie.
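The dependencies described above can be sketched as a small graph. This is only an illustration of the order in which the tools consume each other's outputs; the step labels are made up for this example and are not MCrna parameters or part of the app itself.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each step maps to the set of steps whose output it consumes.
# Labels are illustrative only (assumption), not MCrna option names.
DEPENDS_ON = {
    "bbduk (trim)": set(),
    "FastQC (raw reads)": set(),
    "FastQC (trimmed reads)": {"bbduk (trim)"},
    "MultiQC": {"FastQC (raw reads)", "FastQC (trimmed reads)"},
    "STAR": {"bbduk (trim)"},
    "RSEM": {"STAR"},
    "StringTie": {"STAR"},
}


def run_order(depends_on):
    """Return one valid execution order for the dependency graph."""
    return list(TopologicalSorter(depends_on).static_order())


order = run_order(DEPENDS_ON)
for position, step in enumerate(order, start=1):
    print(position, step)
```

Any order printed this way runs bbduk before STAR and STAR before RSEM and StringTie, which matches the description of a single replicate.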
The results of the MCrna app include the MultiQC report, the gene quantification file, the browser track signals, the alignments, and the assembled transcripts. All of them are stored in the CyVerse cloud, so they are ready to be visualized or used in downstream analysis (see more details below).
Load a MaizeCODE RNAseq Experiment
In the above section, we described the MCrna app/module used in processing RNAseq/RAMPAGE data. Here we will show how to locate a specific experiment/workflow (e.g. ‘RNAseq for B73 root’) and load it on SciApps.org to examine outputs, parameters used, inputs, and associated metadata.
Open https://www.SciApps.org, click Data (top menu) then MaizeCODE. Alternatively, you can access MaizeCODE experiments directly at the MaizeCODE data page to browse the list of MaizeCODE experiments/workflows, as shown below:
Five operations are supported for a selected workflow (select it by checking the radio button in front of it):
- ‘Relaunch’: Display filled app forms in the main panel
- ‘Visualize’: Display workflow diagram and load job histories to the right panel
- ‘Load’: Load job histories to the right panel
- ‘Share’: Get a direct link to the workflow for sharing
- ‘Metadata’: Display the experimental metadata associated with the workflow
To locate an experiment, search with a keyword (e.g., ‘B73 root’). Experiments can also be located by searching with a workflow ID (e.g. ‘74c29d16-132b-40a8-a50b-71a324613a5a’ for the B73 root RNAseq experiment).
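If you keep a list of experiments in your own notes or scripts, it can help to tell the two kinds of search terms apart. The sketch below assumes that workflow IDs are UUID-formatted, as the example ID above is; the function names are hypothetical and are not part of SciApps.

```python
import uuid


def is_workflow_id(query: str) -> bool:
    """True if the query is a canonical, hyphenated UUID string.

    Assumption: SciApps workflow IDs look like the example in this tutorial.
    """
    candidate = query.strip()
    try:
        return str(uuid.UUID(candidate)) == candidate.lower()
    except ValueError:
        return False


def search_mode(query: str) -> str:
    """Classify a search term as a workflow ID or a free-text keyword."""
    return "workflow id" if is_workflow_id(query) else "keyword"


print(search_mode("74c29d16-132b-40a8-a50b-71a324613a5a"))
print(search_mode("B73 root"))
```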
Select (or check) the experiment (e.g., MC_B73_B73v4_root_RNAseq), then click ‘Load’ to load the analysis results into the History panel. The results of the RNAseq workflow/experiment for B73 root tissue are shown below, with the outputs of the first job/replicate expanded by clicking the job name. Results include the MultiQC report, the gene quantification file (with prefix ‘rsem’), the forward (‘sig_f’) and reverse (‘sig_r’) browser track signals, the alignment file and its index (.bam, .bam.bai), and the assembled transcripts in GTF format.
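The naming conventions listed above can be turned into a small lookup when you process a folder of outputs in your own scripts. This is a minimal sketch: the matching rules only restate the prefixes and extensions mentioned in this tutorial, and the example file names are hypothetical.

```python
def classify_output(filename: str) -> str:
    """Guess the kind of MCrna output from its name (rules are assumptions)."""
    name = filename.lower()
    if name.endswith(".bam.bai"):
        return "alignment index"
    if name.endswith(".bam"):
        return "alignment"
    if name.endswith(".gtf"):
        return "assembled transcripts"
    if name.startswith("multiqc") and name.endswith(".html"):
        return "MultiQC report"
    if name.startswith("sig_f"):
        return "forward-strand signal"
    if name.startswith("sig_r"):
        return "reverse-strand signal"
    if name.startswith("rsem"):
        return "gene quantification"
    return "other"


# Hypothetical file names, used only to exercise the rules above.
for example in ["multiqc_report.html", "rsem_rep1.genes.results",
                "sig_f_rep1.bw", "rep1.bam", "rep1.bam.bai", "str_rep1.gtf"]:
    print(example, "->", classify_output(example))
```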
From left to right, there are four icons next to each job name:
- Checkbox: If checked, the job will be added to the workflow building page (if loaded)
- Information: More about the status of the analysis job and link to the output folder
- Relaunch: Load the app form filled with inputs and parameters used before
- Visualization: Generate URLs for visualizing in a web browser (e.g., .html, .txt, .jpg) or a genome browser (e.g., .bw, .bam, .gtf)
Click the Visualization (‘eye’ shaped) icon next to the job name to bring up the visualization panel shown below. You can then select a file (by checking the radio button before it) to get URLs of output files (as shown below for the bam file) for genome browsers.
If you click Visualize (e.g., when the multiqc_report.html file is selected), the file is displayed in a new tab of your web browser. If nothing opens, check whether your browser is blocking pop-ups from SciApps and allow them if needed.
To add the URL you got from the last step to the SciApps JBrowse, click Tools (from SciApps top menu), then JBrowse to load JBrowse. As shown below, select ‘Maize B73v4’, click File/Open track file or URL, then paste the URLs under Remote URLs - one per line (not shown). For displaying alignments, you need to add URLs for both the bam and index (.bai) files.
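Because JBrowse needs both the alignment and its index, it is easy to forget the second URL. The sketch below builds the list to paste under Remote URLs, assuming that the index sits next to the BAM file and is named by appending ‘.bai’ (as the ‘.bam, .bam.bai’ pair above suggests). The example URLs are placeholders, not real SciApps links.

```python
def jbrowse_urls(file_urls):
    """Return the URLs to paste into JBrowse, one per entry.

    For every .bam URL, the matching index URL (.bam.bai) is added
    right after it unless it is already in the list.
    """
    urls = []
    for url in file_urls:
        candidates = [url, url + ".bai"] if url.endswith(".bam") else [url]
        for candidate in candidates:
            if candidate not in urls:
                urls.append(candidate)
    return urls


# Placeholder URLs (assumption); use the ones from the visualization panel.
selected = [
    "https://example.org/outputs/rep1.bam",
    "https://example.org/outputs/sig_f_rep1.bw",
]
print("\n".join(jbrowse_urls(selected)))
```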
Find differentially expressed genes
As an example, to find genes that are differentially expressed between the root and ear tissues of B73, please follow these steps.
Log into SciApps at https://www.SciApps.org/ before submitting any analysis jobs.
Make sure you have followed this instruction to enable ‘SciApps service’ from the CyVerse user portal. Otherwise, your job will fail at the archiving step.
From the MaizeCODE data page, search ‘B73 ear’ and ‘B73 root’ to find and load each experiment into the History panel, as described in the previous section.
Search ‘RSEM_de’ or directly locate the RSEM_de-1.3.0 app under the Comparison category in the left Apps panel. Click to load the app form.
As shown above, for each replicate, drag and drop the gene quantification result (filenames starting with “rsem”) into the input field, then click the “Submit job” button to run the differential expression analysis. A new job will appear in the History panel. Because the alignments and gene quantifications are already done and archived in the cloud, it takes only a few minutes to get the list of differentially expressed genes back.
Use the ‘+ Insert’ and ‘- Remove’ buttons to add or remove input fields, based on the number of replicates available.
When the job is completed (when the visualization or eye-shaped button is no longer grayed out), click the output file name (deg_GeneMat.de.txt for the RSEM_de-1.3.0 job) to preview the result, as shown below.
Each line describes a gene and contains 7 fields: the gene name, posterior probability of being equally expressed (PPEE), posterior probability of being differentially expressed (PPDE), posterior fold change of Sample 1 over Sample 2 (PostFC), real fold change of Sample 1 over Sample 2 (RealFC), mean count of Sample 1 (C1Mean) and mean count of Sample 2 (C2Mean). For fold changes, PostFC is recommended over the RealFC. For more details, please check the tutorial.
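Once downloaded, the result table can be filtered with a few lines of Python. The following is a minimal sketch, assuming the file is tab-separated with the seven fields in the order listed above and possibly a header row; the gene names and numbers in the sample are made up, and the PPDE cutoff of 0.95 is an assumption you should adjust to your own analysis.

```python
import csv
import io

FIELDS = ["gene", "PPEE", "PPDE", "PostFC", "RealFC", "C1Mean", "C2Mean"]


def read_de_genes(handle, min_ppde=0.95):
    """Return genes with PPDE >= min_ppde, sorted by decreasing PPDE."""
    genes = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != len(FIELDS):
            continue  # blank line, or a header without the gene column
        try:
            values = [float(value) for value in row[1:]]
        except ValueError:
            continue  # header row containing column names
        record = dict(zip(FIELDS[1:], values))
        record["gene"] = row[0]
        # PostFC is Sample 1 over Sample 2, so > 1 means higher in Sample 1.
        record["higher_in"] = "Sample 1" if record["PostFC"] > 1 else "Sample 2"
        if record["PPDE"] >= min_ppde:
            genes.append(record)
    return sorted(genes, key=lambda r: r["PPDE"], reverse=True)


# Made-up sample content (assumption), standing in for deg_GeneMat.de.txt.
SAMPLE = (
    '"PPEE"\t"PPDE"\t"PostFC"\t"RealFC"\t"C1Mean"\t"C2Mean"\n'
    '"gene_A"\t0.001\t0.999\t4.2\t4.3\t420.0\t98.0\n'
    '"gene_B"\t0.60\t0.40\t1.1\t1.1\t50.0\t46.0\n'
    '"gene_C"\t0.02\t0.98\t0.25\t0.24\t30.0\t125.0\n'
)

de_genes = read_de_genes(io.StringIO(SAMPLE))
for gene in de_genes:
    print(gene["gene"], gene["PPDE"], gene["PostFC"], gene["higher_in"])
```

To use a real result, replace `io.StringIO(SAMPLE)` with an opened file handle, for example `open("deg_GeneMat.de.txt", newline="")`.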
Find differentially expressed transcripts
In this section, we use transcript-level differential expression analysis to demonstrate how to leverage SciApps workflows and apps for downstream analysis of the MaizeCODE data. We start by examining a public isoform-level expression analysis workflow (step 1). We then construct a new workflow from the public one, because we do not want to repeat the alignment with STAR and the transcript assembly with StringTie, which were already completed by the MCrna app. Finally, we run the newly constructed workflow with archived MaizeCODE results.
Click Workflow/Public workflows, then select RNA-seq2 and ‘Visualize’ the workflow, which will also load the job histories into the History panel, as shown below. The workflow uses the STAR_align-2.5.3 app, which is similar to the MCrna-0.0.1 app except that it does not trim the reads or generate the QC report. The assembled transcripts are merged with the StringTie_merge-1.3.3 app, then passed along with the STAR alignment file to the StringTie-1.3.3 app for a second round of transcript assembly, before the Ballgown-2.10.0 app is called to find differentially expressed transcripts.
Each green button on the workflow diagram represents a job and is numbered to match the order of jobs in the History panel.
In this step, we will construct a new workflow by removing the STAR_align steps from the above workflow. As shown below, this is done by checking (selecting) jobs 5-10 and then clicking the ‘build a workflow’ link above the jobs. The diagram of the new workflow is shown below. Save it as a private workflow so that you can use it in step 4 below.
Follow instructions above to load both B73 root and ear RNAseq experiments into the History panel.
Go to Workflow/My workflows to load the newly saved workflow.
If the new workflow does not appear right away, go to ‘Home’ and then back to ‘My workflows’; the new workflow should be the first one in the list.
As shown below, clear the input fields for step 1 of the workflow, then drag and drop the transcript outputs (filenames starting with ‘str’) into the input fields. Also, set ‘Select the staged annotation file’ to ‘Zea mays (AGPv4)’ for steps 1-5.
Scroll down the app forms, then drag and drop the alignment files (.bam) into steps 3, 4, 2, and 5 as shown above. The order is determined by the input fields of step 6, as shown below (Sample 1 has the outputs of steps 3 and 4, and Sample 2 has the outputs of steps 2 and 5).
Make sure to clear the input field before dragging and dropping new input. Make sure you have set the ‘Select the staged annotation file’ as ‘Zea mays (AGPv4)’ for steps 1-5.
Submit the workflow. The workflow diagram is then displayed with live status, as shown below.
The color of each app button represents its status: blue (running), yellow (pending), green (completed), and red (failed). Depending on the size of the input files to be staged and the queue status of the computing cluster, it might take a while for the status to be updated. You can save the workflow and check the status later by visualizing the diagram.
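The color legend can be written down as a simple lookup, for example if you track several workflows in your own notes. This is an illustrative sketch only: SciApps does not expose these function names, and treating a single red button as a failed workflow is an assumption.

```python
STATUS_BY_COLOR = {
    "blue": "running",
    "yellow": "pending",
    "green": "completed",
    "red": "failed",
}


def workflow_state(button_colors):
    """Summarize a workflow from the colors of its app buttons."""
    statuses = [STATUS_BY_COLOR[color] for color in button_colors]
    if not statuses:
        return "empty"
    if "failed" in statuses:
        return "failed"  # assumption: one failed job fails the workflow
    if all(status == "completed" for status in statuses):
        return "completed"
    return "in progress"


print(workflow_state(["green", "green", "blue", "yellow"]))
print(workflow_state(["green"] * 6))
```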
When the workflow is completed (when all app buttons are green), click Ballgown’s output file (de_iso.tsv) to preview the result, as shown below.
Each line describes a transcript and contains 4 fields: the fold change, the p-value, the q-value, and the transcript ID. Novel transcripts (not annotated) are named as “MSTRG.*.*” and the coordinates of each transcript can be found in the t_data.ctab file for each StringTie output. For more details, please check this tutorial.
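The transcript table can be filtered in the same way as the gene-level result. The sketch below assumes a tab-separated file with the four fields in the order listed above and possibly a header row; the header names, transcript IDs, and values in the sample are made up, and the q-value cutoff of 0.05 is an assumption.

```python
import csv
import io
import re

# Novel StringTie transcripts are named like "MSTRG.<gene>.<isoform>".
NOVEL_ID = re.compile(r"^MSTRG\.\d+\.\d+$")


def significant_transcripts(handle, max_qval=0.05):
    """Return transcripts with q-value <= max_qval, flagging novel ones."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        try:
            fold_change, pval, qval = (float(value) for value in row[:3])
        except ValueError:
            continue  # skip a header row
        if qval <= max_qval:
            hits.append({
                "id": row[3],
                "fold_change": fold_change,
                "pval": pval,
                "qval": qval,
                "novel": bool(NOVEL_ID.match(row[3])),
            })
    return hits


# Made-up sample content (assumption), standing in for de_iso.tsv.
SAMPLE = (
    "fc\tpval\tqval\tid\n"
    "2.5\t0.0001\t0.004\tMSTRG.12.1\n"
    "0.9\t0.4\t0.7\tannotated_tx_1\n"
    "0.3\t0.0005\t0.01\tannotated_tx_2\n"
)

hits = significant_transcripts(io.StringIO(SAMPLE))
for hit in hits:
    print(hit["id"], hit["fold_change"], hit["qval"], "novel" if hit["novel"] else "annotated")
```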
This tutorial covers how to use SciApps to access MaizeCODE data and how to perform downstream analysis with MaizeCODE results. It describes the MCrna app, shows how to load an RNAseq experiment to access its outputs, and walks through differential expression analysis at both the gene and transcript (isoform) level. Because MaizeCODE data and analysis results are stored in the cloud, any community user can complete downstream analyses in a timely fashion.
For users who want to share the analyses of their data with the MaizeCODE project, please contact support@SciApps.org with the workflow IDs. Then your analysis will be added to the list of MaizeCODE experiments.