Filter and Trim High-throughput Sequencing Reads with Trimmomatic
A high-throughput sequencing run generates large files that may contain tens of millions of individual sequencing reads. After sequencing quality has been assessed with software such as FastQC, filtering and trimming steps can remove populations of low-quality reads, remove sequencing adaptors, and trim low-quality regions of individual reads. Trimmomatic is a popular program that performs several of these manipulations to prepare reads for downstream analysis.
Downloads, access, and services
To complete this tutorial you will need access to the following services/software:
|CyVerse account||You will need a CyVerse account to complete this exercise||Register|
We will use the following CyVerse platform(s):
|Platform||Interface||Link||Platform Documentation||Learning Center Documentation|
|Discovery Environment||Web/Point-and-click||Discovery Environment||DE Manual||Guide|
Input and example data
To complete this quickstart you will need to have the following inputs prepared:
|Input File(s)||Format||Preparation/Notes||Example Data|
|High-throughput sequencing reads||Compressed FASTQ (.fq.gz or .fastq.gz)||No pre-processing of these reads is necessary.||See Trimmomatic inputs|
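Before uploading, it can be useful to confirm that an input file really is gzip-compressed FASTQ. The following is a minimal Python sketch, not part of the Trimmomatic App: the function name `looks_like_fastq_gz` is hypothetical, and the check only inspects the file extension and the first four-line record (header starting with `@`, sequence, `+` separator, quality string of the same length as the sequence).

```python
import gzip

# Extensions named in the inputs table above.
FASTQ_EXTENSIONS = (".fq.gz", ".fastq.gz")


def looks_like_fastq_gz(path):
    """Return True if `path` has an expected extension and its first
    record follows the four-line FASTQ layout."""
    if not str(path).endswith(FASTQ_EXTENSIONS):
        return False
    try:
        with gzip.open(path, "rt") as handle:
            record = [handle.readline().rstrip("\n") for _ in range(4)]
    except (OSError, EOFError, UnicodeDecodeError):
        # Missing file, data that is not valid gzip, or non-text content.
        return False
    header, sequence, separator, qualities = record
    return (
        header.startswith("@")
        and len(sequence) > 0
        and separator.startswith("+")
        and len(sequence) == len(qualities)
    )
```

For example, `looks_like_fastq_gz("sample_R1.fastq.gz")` (a hypothetical file name) returns `False` for a plain-text file that was merely renamed with a `.gz` extension.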
Get started: Filter, Trim, and Process High-throughput Sequencing Reads with Trimmomatic
Several of the most popular options for Trimmomatic are shown here. For the complete set of options and additional details, including the ordering of cleaning/filtering steps, see the full Trimmomatic documentation.
Log in to the Discovery Environment.
Click on the ‘Data’ panel. In the desired directory, click the ‘File’ menu, select ‘Create’ and then ‘New Plain Text File’. Create a Trimmomatic Settings file by entering the desired Trimmomatic functions (one per line) to set the options used by the Trimmomatic program. Click ‘Save’ and save the file with a ‘.txt’ extension in the desired directory. See an example Trimmomatic Settings file.
Trimmomatic has several individual functions (see full Trimmomatic documentation). To specify a function and its parameters, you will usually give the function name followed by a colon-separated set of parameters. Commonly used functions include:
- “SLIDINGWINDOW:<windowSize>:<requiredQuality>”: Perform a sliding window trimming, cutting once the average quality within the window falls below a threshold
- “LEADING:<quality>”: Cut bases off the start of a read, if below a threshold quality
- “TRAILING:<quality>”: Cut bases off the end of a read, if below a threshold quality
- “MINLEN:<length>”: Drop the read if it is below a specified length
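To make the behaviour of these four functions concrete, the following Python sketch applies them to a read's list of Phred quality scores. It is a simplified illustration under stated assumptions, not Trimmomatic's own implementation: the function names are hypothetical, the sliding window simply cuts at the start of the first window whose mean quality is below the threshold, and the default thresholds are illustrative values rather than recommendations.

```python
from statistics import mean


def leading(quals, quality):
    """LEADING: index of the first base whose quality is >= `quality`."""
    start = 0
    while start < len(quals) and quals[start] < quality:
        start += 1
    return start


def trailing(quals, quality):
    """TRAILING: number of bases left after dropping low-quality 3' bases."""
    end = len(quals)
    while end > 0 and quals[end - 1] < quality:
        end -= 1
    return end


def sliding_window(quals, window_size, required_quality):
    """SLIDINGWINDOW (simplified): number of bases to keep from the start.

    Cuts at the start of the first window whose mean quality falls
    below `required_quality`; keeps the whole read if none does.
    """
    for i in range(len(quals) - window_size + 1):
        if mean(quals[i:i + window_size]) < required_quality:
            return i
    return len(quals)


def trim_read(seq, quals, lead=3, trail=3, window=(4, 15), min_len=36):
    """Apply LEADING, TRAILING, SLIDINGWINDOW, then MINLEN, in that order.

    Returns (sequence, qualities), or None if MINLEN drops the read.
    """
    start = leading(quals, lead)
    seq, quals = seq[start:], quals[start:]
    end = trailing(quals, trail)
    seq, quals = seq[:end], quals[:end]
    keep = sliding_window(quals, *window)
    seq, quals = seq[:keep], quals[:keep]
    if len(seq) < min_len:
        return None
    return seq, quals


# A toy read: two poor bases at the start, a good core, a decaying tail.
example_seq = "NNACGTACGTN"
example_quals = [2, 2, 30, 30, 30, 30, 10, 10, 10, 10, 2]
print(trim_read(example_seq, example_quals, min_len=4))
```

With a more realistic `min_len` such as 36, this toy read would be dropped entirely, which is what MINLEN is for: reads that become too short after trimming are discarded rather than passed downstream.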
Additionally, you can provide Trimmomatic with a file containing a list of adaptor sequences to be trimmed.
- “ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>”: Cut adapter and other Illumina-specific sequences from the read.
Click ‘Apps’ and open the Trimmomatic App: Trimmomatic-programmable-0.36. Name your analysis, and if desired enter comments and select or adjust the output folder.
Under settings, select ‘paired-ended’ or ‘single-ended’. Under ‘Enter a folder of sequencing files:’ select a folder containing one or more sequencing files (.fq.gz or .fastq.gz).
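The settings file is plain text with one function per line, so it can also be generated programmatically. The sketch below is one possible way to do that in Python and is not part of the Trimmomatic App: `format_step` and `write_settings` are hypothetical helper names, `adapters.fa` is a placeholder file name, and the numeric values are illustrative only. How the App resolves the adaptor file named on the ILLUMINACLIP line is not covered here, so treat that parameter as an assumption.

```python
from pathlib import Path

# Minimum number of colon-separated parameters for the functions
# described in this tutorial. Trimmomatic has other functions that
# this sketch deliberately does not cover.
MIN_PARAMS = {
    "ILLUMINACLIP": 4,
    "SLIDINGWINDOW": 2,
    "LEADING": 1,
    "TRAILING": 1,
    "MINLEN": 1,
}


def format_step(name, *params):
    """Return one settings line such as 'SLIDINGWINDOW:4:15'."""
    name = name.upper()
    if name not in MIN_PARAMS:
        raise ValueError(f"function not covered by this sketch: {name}")
    if len(params) < MIN_PARAMS[name]:
        raise ValueError(
            f"{name} needs at least {MIN_PARAMS[name]} parameter(s), "
            f"got {len(params)}"
        )
    return ":".join([name, *map(str, params)])


def write_settings(path, steps):
    """Write one function per line to `path` and return the lines."""
    lines = [format_step(name, *params) for name, *params in steps]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


# Illustrative values only; 'adapters.fa' is a placeholder name.
example_steps = [
    ("ILLUMINACLIP", "adapters.fa", 2, 30, 10),
    ("LEADING", 3),
    ("TRAILING", 3),
    ("SLIDINGWINDOW", 4, 15),
    ("MINLEN", 36),
]
```

Calling `write_settings("trimmomatic_settings.txt", example_steps)` writes five lines, the first being `ILLUMINACLIP:adapters.fa:2:30:10`. The steps are written in the order given, which matters because the full Trimmomatic documentation describes the ordering of cleaning and filtering steps.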
Under ‘Trimmer settings file in text format’ browse to the location of the Trimmomatic settings file you created in step 2.
If you are using the ‘ILLUMINACLIP’ function, browse to the location of the FASTA file containing Illumina adaptor sequences. (You may find some relevant Illumina adaptors.)
Click ‘Launch Analysis’ to launch the analysis. Click the ‘Analysis’ button to view job status and obtain results.
Once completed, the Discovery Environment Trimmomatic App will return the trimmed reads:
Paired End Outputs - 4 outputs for each pair (R1/R2) of reads:
||Every pair of sequence reads will generate a set of paired reads that have been trimmed according to the functions specified in the provided Trimmomatic settings file.||See Example outputs|
Single End Outputs - 2 outputs for each pair (R1/R2) of reads:
|trimreadname_R1.fq/.fastq||Every sequence will generate a trimmed file.||None provided.|
To confirm that Trimmomatic processing has achieved the desired results, you may wish to evaluate the reads using FastQC.
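As a quick complement to FastQC (not a replacement for it), read counts and mean read lengths can be compared before and after trimming. The following Python sketch is an illustration under assumptions: `fastq_summary` is a hypothetical helper, it expects strict four-line FASTQ records, and it reads gzip-compressed files when the name ends in `.gz` and plain text otherwise.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file as text, using gzip if the name ends in '.gz'."""
    path = str(path)
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fastq_summary(path):
    """Return (read_count, mean_read_length) for a FASTQ file.

    Assumes each record spans exactly four lines, so the sequence is
    always the second line of each group of four.
    """
    reads = 0
    total_bases = 0
    with open_fastq(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                reads += 1
                total_bases += len(line.strip())
    mean_length = total_bases / reads if reads else 0.0
    return reads, mean_length
```

Running `fastq_summary` on an input file and on the corresponding trimmed output shows how many reads were dropped and how much the mean read length changed.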