Use Biocontainers to assemble your tools
In this section, we will introduce Biocontainers as an easy way to use bioinformatics tools and their dependencies.
Discussion - Conda vs. containers
Why and when should we use Conda or containers?
We suggest completing the Creating and Running Docker Containers tutorial before this section, especially if you have not used containers before.
You will need some rationale for which tools you use and why. For our tutorial example, we will perform the following steps with the following containers:
Exploring containers with Biocontainers
Once you have identified the container of interest, you need to explore how to run the container and the tool itself. For example, let's build a two-step workflow with some sample data. Before we move ahead with our own workflow, we will check the quality of a fastq file with fastqc and then use trimmomatic to trim the file.
We can get a small fastq file from the CyVerse data store:
mkdir /scratch/test-workflow
cd /scratch/test-workflow
iget -P /iplant/home/shared/cyverse_training/datasets/PRJNA79729/fastq_files/SRR064156.fastq.gz .
We can run fastqc on this file using a container from Biocontainers:
docker run quay.io/biocontainers/fastqc:0.11.7--4 fastqc SRR064156.fastq.gz
This produces an error:
"Skipping 'SRR064156.fastq.gz' which didn't exist, or couldn't be read"
So, we have to go through the (often long) process of getting our individual tools to work before we can assemble them into a pipeline.
The problem is that the Docker container cannot "see" the input file; we need to become more familiar with Docker bind mounts (the -v option, described in the Docker documentation). We need to mount our local (Atmosphere) disk so that the file can be seen by the container:
Rerun the docker command using the -v option.
# mount the Atmosphere directory /scratch/test-workflow
# to a directory /work which will be created in the container
docker run -v /scratch/test-workflow/:/work quay.io/biocontainers/fastqc:0.11.7--4 fastqc /work/SRR064156.fastq.gz
Try running fastqc on the sample fastq file above using another version available on Biocontainers.
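A sketch of this exercise, which builds the same command with a different image tag. The tag 0.11.8--2 is an assumption for illustration; browse the quay.io/biocontainers/fastqc repository for the tags that actually exist. The command is first printed for inspection, since actually running it requires Docker and the downloaded data:

```shell
# Assumed alternative tag -- check quay.io for available fastqc tags
IMAGE=quay.io/biocontainers/fastqc:0.11.8--2

# Same structure as before: mount the data directory, run fastqc on the file
CMD="docker run -v /scratch/test-workflow/:/work $IMAGE fastqc /work/SRR064156.fastq.gz"

echo "$CMD"   # inspect the command first
# $CMD        # uncomment to actually run it (requires Docker and the data)
```

Only the image tag changes between versions; the mount and the fastqc invocation stay the same.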
Combining Biocontainers in a bash script
Ideally, we want to automate how we handle data. One way to do this is to develop a script that runs our applications in sequence. Let's add one more tool (trimmomatic) to our example workflow before doing the main workflow for the tutorial. You can use the trimmomatic manual to determine how the Docker command should work.
Using the trimmomatic container below, write a docker command that trims the single-end reads using a sliding window of 4 bases, trimming when the average quality drops below 30 (Phred score). Use 8 threads and single-end mode.

quay.io/biocontainers/trimmomatic:0.39--1
Answer

docker run -v /scratch/test-workflow/:/work quay.io/biocontainers/trimmomatic:0.39--1 trimmomatic SE -threads 8 /work/SRR064156.fastq.gz /work/SRR064156_trimmed.fastq.gz SLIDINGWINDOW:4:30
A script could look something like this:
#!/bin/bash

# Make a directory and stage our data
mkdir -p /scratch/example-script/data
DATADIRECTORY=/scratch/example-script/data

# Import data from the CyVerse data store
iget -P /iplant/home/shared/cyverse_training/datasets/PRJNA79729/fastq_files/SRR064156.fastq.gz $DATADIRECTORY

# Make a directory for our analyses
mkdir -p /scratch/example-script/analyses
ANALYSISDIR=/scratch/example-script/analyses

# Use a Docker container to run fastqc
docker run -v $DATADIRECTORY:/work quay.io/biocontainers/fastqc:0.11.7--4 fastqc /work/SRR064156.fastq.gz

# Move results to the analyses directory
mkdir -p $ANALYSISDIR/fastqc
mv $DATADIRECTORY/*fastqc* $ANALYSISDIR/fastqc

# Use a Docker container to run trimmomatic
docker run -v $DATADIRECTORY:/work quay.io/biocontainers/trimmomatic:0.39--1 trimmomatic SE -threads 8 /work/SRR064156.fastq.gz /work/SRR064156_trimmed.fastq.gz SLIDINGWINDOW:4:30

# Move results to the analyses directory
mkdir -p $ANALYSISDIR/trimmomatic
mv $DATADIRECTORY/*_trimmed.fastq.gz $ANALYSISDIR/trimmomatic
Discussion - Bash script
Is this Bash script a good solution? What problems could we run into when making our larger workflow? What could improve this script?
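One possible set of improvements is sketched below: fail fast on errors, take the sample name as a parameter instead of hard-coding it, and wrap the repetitive docker invocation in a helper function. The run_tool helper and the DRY_RUN switch are our own illustrative inventions, not part of Docker or Biocontainers; DRY_RUN defaults to printing commands so the script can be inspected without Docker installed:

```shell
#!/bin/bash
# Sketch of a more defensive version of the script.
set -euo pipefail            # stop on errors, unset variables, and pipe failures

DATADIRECTORY="${DATADIRECTORY:-/scratch/example-script/data}"
SAMPLE="${1:-SRR064156}"     # sample name as a parameter, with a default

# Illustrative helper (not a standard tool): run a command inside a
# Biocontainers image with the data directory bind-mounted. With DRY_RUN=1
# (the default here) the docker command is only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
run_tool() {
    local image="$1"; shift
    if [ "$DRY_RUN" = "1" ]; then
        echo docker run -v "$DATADIRECTORY":/work "$image" "$@"
    else
        docker run -v "$DATADIRECTORY":/work "$image" "$@"
    fi
}

run_tool quay.io/biocontainers/fastqc:0.11.7--4 \
    fastqc /work/"$SAMPLE".fastq.gz
run_tool quay.io/biocontainers/trimmomatic:0.39--1 \
    trimmomatic SE -threads 8 \
    /work/"$SAMPLE".fastq.gz /work/"$SAMPLE"_trimmed.fastq.gz \
    SLIDINGWINDOW:4:30
```

Set DRY_RUN=0 to actually execute the containers. Checking that input files exist before each step, and capturing tool exit codes, would be natural next improvements before scaling this up to the full workflow.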