RAD Tags Enrichment in RADSeq for Next-generation Population Genetics

Author : InsideDNA Time : 31 January 2017 Read time : 5 min

Next-generation sequencing technologies are making a substantial impact on many areas of biology, including the analysis of genetic diversity in populations. However, genome-scale population genetic studies are very expensive. Restriction-site associated DNA sequencing (RAD-Seq) technology is used to overcome this limitation by sequencing only specific regions (RAD tags) of the genome. This method provides high resolution population genomic data at reasonable costs. One may read technical details on this method here: https://academic.oup.com/bfg/article/9/5-6/416/182576/RADSeq-next-generation-population-genetics.

Often before sequencing hundreds of samples for population genetic studies, it is necessary to check an experimental setup. Usually several samples are sequenced to estimate read enrichment at RAD tags. In this tutorial we will perform computational part of this preliminary analysis. We will check whether RAD tags are enriched with reads in public RAD-Seq sample.

First of all, we need to open terminal window on the InsideDNA platform (show terminal button at the bottom right corner of InsideDNA homepage).

RAD Tags Enrichment in RADSeq for Next-generation Population Genetics screen 1

Type several commands in the command line:

mkdir radseq -- to create working directory

cd radseq -- to move into working directory.

Then we need to download RAD-Seq sample for further analysis. We will use one of S. cerevisiae datasets from this project: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=ERP014019. We will use wget command to download ERR1229303.sra (or any another sample).

wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/ERR/ERR122/ERR1229303/ERR1229303.sra

After download completion you will see ERR1229303.sra file in the working directory (use ls command). This file contains RAD-Seq reads aligned on S. Cerevisiae genome. One may realign these reads with bowtie2 (bowtie2-build, bowtie2-align), however we will skip this step and use given coordinates. Also we need to download, unpack and merge S. Cerevisiae genome fasta files (per each chromosome).

wget http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/chromFa.tar.gz
mkdir yeast
tar -xvf chromFa.tar.gz -C ./yeast/
cat ./yeast/*.fa > ./yeast/yeast.fa
ls ./yeast/

RAD Tags Enrichment in RADSeq for Next-generation Population Genetics screen 2

Now let’s convert our SRA file into SAM format and then further into BAM.

isub -c 2 -r 7.5 -t tosam -e '/srv/dna_tools/sratoolkit.2.5.7/bin/sam-dump
data/userXXX/radseq/ERR1229303.sra > /data/userXXX/radseq/ERR1229303.sam'
isub -c 2 -r 7.5 -t tobam -e '/srv/dna_tools/samtools/bin/samtools view -bS
/data/userXXX/radseq/ERR1229303.sam > /data/userXXX/radseq/ERR1229303.bam'

where userXXX - your user name.

Now let’s perform in silico restriction digestion. We will use a python script for this purpose. You may copy it from tutorials_data directory:

cp ~/tutorials_data/radseq/restr.py ./

In the nutshell, this script opens yeast.fa fasta file and looks for all MfeI and MboI restriction sites in the genome. Then it filters out all fragments with the length between 150 and 500 bp and prints their coordinates in the fragments.bed file. Most of the functions that we used are from Biopython module. Run the script with the following command:

isub -c 2 -r 7.5 -t restr -e 'python /data/user540/radseq/restr.py'

Eventually, you will see fragments.bed file in the working directory (use ls and less commands) which contains RAD tags coordinates.

RAD Tags Enrichment in RADSeq for Next-generation Population Genetics screen 3

To count the number of RAD tags per genome use wc -l [filename] command. There will be 3972 lines (fragments) in the file.

Now we will calculate a read coverage of restricted fragments. Moreover we will estimate whether the average read coverage of RAD fragments is higher than one of the rest of the genome. 

Copy the script sc1.sh from ~/tutorials_data/radseq/ directory and read it.

RAD Tags Enrichment in RADSeq for Next-generation Population Genetics screen 4

This script:

  1. sorts restricted fragments and saves them in fragments.sorted.bed file;

  2. calculates read coverage of all restricted fragments and writes it in out1 file;

  3. calculates average coverage and writes it in 1.txt file;

  4. 500 times permutes fragment coordinates (but leaves length distribution intact) and for each set of random fragments repeats steps 2 and 3 of this pipeline;

  5. removes all temporary files.

To run this script you need to paste your user name where it is necessary and create genome.size file which simply contains lengths of all chromosomes.

cd yeast
for i in `ls chr*`; do echo -ne $i'\t'; tail -n +2 $i | wc -c; done > ../genome.size
cd ..

Finally, we need to remove manually ‘.fa’ suffixes from chromosome names in the genome.size file.

RAD Tags Enrichment in RADSeq for Next-generation Population Genetics screen 5

isub -c 2 -r 7.5 -t perm -e '/data/user540/radseq/sc1.sh'

To see the results let’s open 1.txt file. Second line (after the header) is an average read coverage of RAD tags across the genome. In our case it is equal to 91.1. All other numeric lines are average coverages of permuted fragments.

RAD Tags Enrichment in RADSeq for Next-generation Population Genetics screen 6

We can see that these values are much smaller than RAD tags coverage. We can check it in the following way:

less 1.txt | sort -n

RAD Tags Enrichment in RADSeq for Next-generation Population Genetics screen 7

As we can see the biggest coverage value is observed for RAD tags (the last line). Empirical p-value < 0.002.

Finally we can estimate an enrichment level in the following way:

tail -n +3 1.txt | awk '{ total += $1 } END { print total/NR }'

An average coverage of random fragments ~ 1.72, which means that coverage of RAD tags is 91.1 / 1.72 ~ 53 times higher than coverage of the rest of the genome.

You may also interested in

Mapping sequencing reads to a reference genome with Bowtie2: a step-by-step guide

Follow us on Facebook and Twitter to be the first to read our new tutorials!

Run this tool More tutorials