Merge Paired-End Reads of Ancient DNA using BBMerge

Author : InsideDNA Time : 02 February 2017 Read time : 3 min

Ancient DNA is generally preserved as short fragments, comparable in length to the reads produced by modern sequencers. It can therefore be useful to merge the overlapping paired-end reads of a short fragment into a single sequence. A dedicated tool called BBMerge can help us with this task.
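
To make the idea of merging concrete, here is a toy sketch in shell and awk. It assumes error-free reads and an exact overlap. The function name merge_pair and the default minimum overlap of 5 bases are illustrative choices, not part of BBMerge, which also has to cope with sequencing errors and base qualities that this sketch ignores.

```shell
# Toy illustration of paired-end merging (not how BBMerge is implemented).
# $1 = forward read, $2 = reverse read, $3 = minimum overlap (assumed default: 5)
merge_pair() {
    awk -v r1="$1" -v r2="$2" -v min="${3:-5}" '
    BEGIN {
        comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"; comp["N"] = "N"
        # Reverse-complement the reverse read so both reads share one orientation.
        rc = ""
        for (i = length(r2); i >= 1; i--) rc = rc comp[substr(r2, i, 1)]
        # Try the longest possible overlap first, then shrink it.
        max = (length(r1) < length(rc)) ? length(r1) : length(rc)
        for (n = max; n >= min; n--) {
            if (substr(r1, length(r1) - n + 1) == substr(rc, 1, n)) {
                # Keep the forward read and append the non-overlapping tail.
                print r1 substr(rc, n + 1)
                exit 0
            }
        }
        exit 1   # no exact overlap of at least min bases
    }'
}
```

For example, `merge_pair ACGTTGCAAG TAGCCTTGCA` prints `ACGTTGCAAGGCTA`: the last six bases of the first read match the start of the reverse-complemented second read, so the two 10-base reads collapse into the 14-base fragment they came from.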

1. Upload source data

In this tutorial we will work with reads from a thousand-year-old tuberculosis strain, which was studied in this research. You can download the original files here, but we will work with a small subset of these reads to make processing faster. You can download the smaller files using this link.

Log into the InsideDNA application, navigate to the Files tab and create a folder called Ancient tuberculosis.

Upload the files with reads into this folder.

2. Assessing the quality of raw reads

We will use the fastqc tool to check the quality of our raw data. To run it, navigate to the Terminal tab, connect to the virtual Terminal and enter the following command:

isub -t fastqc -c 4 -r 3.6 -e "/srv/dna_tools/fastqc/fastqc
/data/userXXX/Ancient_tuberculosis/R1.fastq
/data/userXXX/Ancient_tuberculosis/R2.fastq -o /data/userXXX/Ancient_tuberculosis/"

Here and below, replace XXX with your own user ID, which you can find in the header of the Terminal tab.

Press Enter to submit your task.

This task will produce fastqc reports for both read files and save them in the Ancient tuberculosis folder. You can monitor the progress of your task in the Tasks folder.

3. Trimming reads for quality and discarding adapter sequences

When fastqc has finished, go to the Files tab, open the Ancient tuberculosis folder and download the report files, R1_fastqc.html and R2_fastqc.html. Open these files in any browser and explore their content.

The main problem with these reads is contamination with adapter sequences, which we need to remove. We will use the trimmomatic tool for this purpose. It also separates paired reads from unpaired ones, which are often present even in files obtained from paired-end sequencing. The bbmerge tool needs only reads that still have a pair.

To remove the adapters we need a file with Illumina TruSeq adapters for paired-end reads, which you can download here. Upload this file into the Ancient tuberculosis folder.

Enter the following command for the trimmomatic tool into the Terminal:

isub -t trimmomatic -c 4 -r 3.6 -e "java -jar /srv/dna_tools/trimmomatic_0.33/trimmomatic-0.33.jar PE -threads 4 -phred33
/data/userXXX/Ancient_tuberculosis/R1.fastq
/data/userXXX/Ancient_tuberculosis/R2.fastq
/data/userXXX/Ancient_tuberculosis/R1_paired.fastq
/data/userXXX/Ancient_tuberculosis/R1_unpaired.fastq
/data/userXXX/Ancient_tuberculosis/R2_paired.fastq
/data/userXXX/Ancient_tuberculosis/R2_unpaired.fastq
ILLUMINACLIP:/data/userXXX/Ancient_tuberculosis/TruSeq3-PE.fa:2:30:10"

When the task is done, you will find the files with processed reads in the Ancient tuberculosis folder. We will use the ones with “paired” in their names as input for the bbmerge tool.
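
Before merging, it may be worth checking that the two paired files still contain the same number of reads. The sketch below assumes plain four-line FASTQ records (no wrapped sequence lines). count_reads and check_pairs are hypothetical helper names, not InsideDNA or trimmomatic commands.

```shell
# Count reads in a FASTQ file, assuming exactly four lines per record.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Print the read count of both files; succeeds only if the counts are equal.
# The body runs in a subshell so n1 and n2 do not leak into the caller.
check_pairs() (
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "$1: $n1 reads"
    echo "$2: $n2 reads"
    [ "$n1" -eq "$n2" ]
)

# Example call (replace XXX with your own user ID):
# check_pairs /data/userXXX/Ancient_tuberculosis/R1_paired.fastq \
#             /data/userXXX/Ancient_tuberculosis/R2_paired.fastq
```

If the counts differ, or a file does not contain a whole number of four-line records, check_pairs returns a non-zero status, which suggests the trimming step should be inspected before running bbmerge.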

4. Running BBMerge

To run bbmerge, enter the following command into Terminal:

isub -t bbmerge -c 4 -r 3.6 -e "/srv/dna_tools/bbmap-36.02/bbmerge.sh
in1=/data/userXXX/Ancient_tuberculosis/R1_paired.fastq
in2=/data/userXXX/Ancient_tuberculosis/R2_paired.fastq qin=33
out=/data/userXXX/Ancient_tuberculosis/merged.fq
2>/data/userXXX/Ancient_tuberculosis/stats.txt"

When this task is finished, the file merged.fq in the Ancient tuberculosis folder will contain the merged reads. You can view the merging statistics by running the following command in the Terminal:

less /data/userXXX/Ancient_tuberculosis/stats.txt

Merge paired-end reads of ancient DNA using BBMerge screen 6

As you can see, in our case about 70% of the input reads were successfully merged, which is a rather good result.

Well done, you have now learned some basics of processing ancient reads!
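
The merge rate can also be estimated directly from the files, as a cross-check on stats.txt. This sketch assumes four-line FASTQ records and that every merged read comes from exactly one input pair. merge_rate is a hypothetical helper name.

```shell
# Percentage of read pairs that were merged.
# $1 = merged FASTQ, $2 = one of the paired input FASTQ files
# The body runs in a subshell so the variables do not leak into the caller.
merge_rate() (
    merged=$(awk 'END { print NR / 4 }' "$1")
    pairs=$(awk 'END { print NR / 4 }' "$2")
    awk -v m="$merged" -v p="$pairs" 'BEGIN {
        if (p <= 0) exit 1          # avoid dividing by zero on an empty input
        printf "%.1f\n", 100 * m / p
    }'
)

# Example call (replace XXX with your own user ID):
# merge_rate /data/userXXX/Ancient_tuberculosis/merged.fq \
#            /data/userXXX/Ancient_tuberculosis/R1_paired.fastq
```

A result close to the percentage reported in stats.txt suggests that the output file is complete.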
