In this tutorial we will run VDJtools on the InsideDNA platform. The VDJtools framework includes a set of tools for processing repertoire sequencing (Rep-Seq) data. Repertoire sequencing is a method for studying the immune repertoire of an organism, that is, the set of all immunoglobulin or T cell receptor (TCR) genes expressed in its blood lymphocytes. In a nutshell, lymphocyte mRNA (or DNA) is extracted, non-specific primers are added to synthesize cDNA from immunoglobulin/TCR mRNA, and the cDNA is then amplified and sequenced. Based on the sequence of the variable region (V region) of the immunoglobulin gene, the number of clonal lines (clonotypes) and their abundances are estimated. This method makes it possible to find overrepresented immunoglobulin variants associated with disease or clonal expansion.
We will use a dataset from the GitHub page of the VDJtools project (https://github.com/mikessh/vdjtools-examples). It contains Rep-Seq data from donors who underwent hematopoietic stem cell transplantation (HSCT), sampled at four time points: 48 months before HSCT and 4, 10 and 37 months after it. We will study how the lymphocyte immune repertoire changes over this period.
First of all, open the terminal of the InsideDNA platform (the “Show terminal” button at the bottom-right corner of the screen) and type the following commands:
mkdir vdj – to create a working directory;
cd vdj – to move into it;
Now we need to download the metadata file and all the samples:
ls – to see the contents of the working directory;
If you open the metadata.txt file (for example with the less command), you will see that it lists all samples in the analysis: the file name, the path to the file and the number of months after (or before) HSCT. Each sample file contains data on all clonotypes observed in that sample (read count, frequency, V and J segment IDs). Before moving to the next step we need to edit metadata.txt (for example with the nano text editor): change the path of each file to /data/userXXXX/vdj/[filename], where userXXXX is your username.
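If you would rather not edit the paths by hand, the change can be scripted. The following is a minimal Python sketch, not part of VDJtools or InsideDNA. It assumes that metadata.txt is tab-delimited, that its first line is a header, and that the path sits in the first column; the function name rewrite_paths, the path_col parameter and the sample text are hypothetical, so check the column position in your own copy of the file.

```python
import posixpath


def rewrite_paths(metadata_text, new_dir, path_col=0):
    """Point every sample path in the metadata at new_dir, keeping file names.

    Assumptions (not confirmed by the tutorial): tab-delimited text,
    one header line, path stored in column `path_col` (0-based).
    """
    lines = [ln for ln in metadata_text.splitlines() if ln.strip()]
    out = [lines[0]]  # the header line is copied unchanged
    for line in lines[1:]:
        fields = line.split("\t")
        filename = posixpath.basename(fields[path_col])
        fields[path_col] = posixpath.join(new_dir, filename)
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"


# Made-up two-line metadata, for illustration only.
SAMPLE_METADATA = "file_name\tsample_id\ttime\nold/dir/4months.txt.gz\t4months\t4\n"

print(rewrite_paths(SAMPLE_METADATA, "/data/userXXXX/vdj"), end="")
```

To apply it to the real file, read metadata.txt into a string, pass it through rewrite_paths with your own /data/userXXXX/vdj directory, and write the result back.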
Now we can look at some basic statistics for our samples. To do this, run the CalcBasicStats routine as follows:
isub -c 4 -r 3.6 -t calcstat -e '/srv/dna_tools/vdjtools-1.1.1/vdjtools CalcBasicStats -m /data/userXXXX/vdj/metadata.txt /data/userXXXX/vdj/1'
where the -m option specifies the input metadata file and /data/userXXXX/vdj/1 specifies the output prefix. Always use the full path to your directory (!).
As a result you will see the file 1.basicstats.txt in the working directory, with one row per sample. Its columns include the read count (3rd column), the number of clonotypes (4th), the mean clonotype frequency (5th) and so on.
There is also a way to get more detailed information on our samples. The CalcSegmentUsage routine calculates the frequencies of all V or J segments in each sample. To run it, type the following command:
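The columns described above can be pulled out with a few lines of Python. This is only a sketch: it assumes the file is tab-delimited with one header line, that the sample ID is in the first column, and that the read count, clonotype number and mean frequency are in the 3rd, 4th and 5th columns as stated above. The function name read_basic_stats and the sample rows are hypothetical.

```python
def read_basic_stats(text, count_col=2, clonotypes_col=3, mean_freq_col=4):
    """Extract per-sample read count, clonotype number and mean frequency.

    Column positions are 0-based and follow the layout described in the
    tutorial; adjust them if your file differs.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    rows = []
    for line in lines[1:]:  # skip the header line
        fields = line.split("\t")
        rows.append({
            "sample": fields[0],
            "reads": int(float(fields[count_col])),
            "clonotypes": int(float(fields[clonotypes_col])),
            "mean_frequency": float(fields[mean_freq_col]),
        })
    return rows


# Made-up table with invented numbers, for illustration only.
SAMPLE_STATS = (
    "sample_id\ttime\tcount\tdiversity\tmean_frequency\n"
    "A\t4\t1000\t50\t0.02\n"
)

for row in read_basic_stats(SAMPLE_STATS):
    print(row["sample"], row["reads"], row["clonotypes"], row["mean_frequency"])
```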
isub -c 4 -r 3.6 -t calcsegu -e '/srv/dna_tools/vdjtools-1.1.1/vdjtools CalcSegmentUsage -f "Time post HSCT, months" -m /data/userXXXX/vdj/metadata.txt -n /data/userXXXX/vdj/2'
where -m specifies the metadata file, -f names the metadata column to use as the factor (here the time point), -n tells VDJtools to treat that factor as numeric, and the final argument, /data/userXXXX/vdj/2, is the output prefix.
You will see 2 output files: 2.segments.wt.J.txt and 2.segments.wt.V.txt. Both files have the same structure: sample ID (1st column), number of months (2nd) and then the frequencies of all V or J segments (depending on the file) in each sample.
Now we can take a look at the frequencies of individual clonotypes, i.e. V + J region combinations. Run the following command to compare clonotype frequencies before (-48 months) and shortly after HSCT (4 months):
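A quick way to read these tables is to ask which segment dominates each sample. The Python sketch below does that under the layout described above (sample ID in the 1st column, months in the 2nd, one frequency column per segment after that) and assumes tab-delimited text with a header line. The function name top_segments and the segment labels in the sample are hypothetical placeholders.

```python
def top_segments(text, first_freq_col=2):
    """Return {sample_id: (segment_name, frequency)} for the most used segment.

    `first_freq_col` is the 0-based index of the first frequency column.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    segments = lines[0].split("\t")[first_freq_col:]
    result = {}
    for line in lines[1:]:
        fields = line.split("\t")
        freqs = [float(x) for x in fields[first_freq_col:]]
        # Index of the highest frequency in this row.
        best = max(range(len(freqs)), key=freqs.__getitem__)
        result[fields[0]] = (segments[best], freqs[best])
    return result


# Made-up table with placeholder segment names, for illustration only.
SAMPLE_USAGE = (
    "sample_id\ttime\tsegA\tsegB\n"
    "A\t-48\t0.25\t0.75\n"
    "B\t4\t0.6\t0.4\n"
)

print(top_segments(SAMPLE_USAGE))
```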
isub -c 4 -r 3.6 -t ovp -e '/srv/dna_tools/vdjtools-1.1.1/vdjtools OverlapPair /data/userXXXX/vdj/minus48months.txt.gz /data/userXXXX/vdj/4months.txt.gz /data/userXXXX/vdj/3'
where minus48months.txt.gz and 4months.txt.gz are the 2 input sample files and /data/userXXXX/vdj/3 specifies the output prefix.
You will see 3 output files with the prefix '3.'. 3.paired.strict.table.txt contains a table of clonotype frequencies in the 2 samples, before and after HSCT (last 2 columns), together with the significance of the difference in read counts (third-to-last column). 3.paired.strict.table.collapsed.txt contains the same table but only for the top (highest-frequency) clonotypes.
Finally, we can run the TrackClonotypes routine to summarize the changes in clonotype frequencies over the whole period:
isub -c 4 -r 3.6 -t c4 -e '/srv/dna_tools/vdjtools-1.1.1/vdjtools TrackClonotypes -m /data/userXXXX/vdj/metadata.txt -f "Time post HSCT, months" -x 0 /data/userXXXX/vdj/4'
where -m specifies the metadata file, -f names the metadata column with the time points, -x 0 selects the sample whose clonotypes are tracked by its 0-based index in the time course (0 is the first sample, taken before HSCT in our case), and /data/userXXXX/vdj/4 is the output prefix.
You will see 3 output files with much the same structure as the OverlapPair output. The difference is that 4.tracking.strict.table.txt includes columns for clonotype frequencies in all 4 samples (before HSCT and 4, 10 and 37 months after). These frequency changes can now be traced in a fairly simple table or plotted. The 4.tracking.strict.table.collapsed.txt file again includes only the top clonotypes; you can set their number with the -t option of TrackClonotypes (not to be confused with the -t option of isub, which names the task).
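To trace or plot these changes, each row of the tracking table can be turned into a small time series. The Python sketch below assumes the table is tab-delimited with a header line and that the per-sample frequency columns are the last four columns, in time order; this mirrors the OverlapPair layout described earlier but is an assumption, so check the header of your own file. The function name frequency_trajectories and the sample row are hypothetical.

```python
def frequency_trajectories(text, n_samples=4):
    """Split each clonotype row into its annotation and its frequency series.

    Returns a list of (annotation, frequencies) pairs, where `annotation`
    maps the leading column names to their values and `frequencies` maps
    each of the last `n_samples` column names to a float.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    sample_names = header[-n_samples:]
    trajectories = []
    for line in lines[1:]:
        fields = line.split("\t")
        annotation = dict(zip(header[:-n_samples], fields[:-n_samples]))
        freqs = [float(x) for x in fields[-n_samples:]]
        trajectories.append((annotation, dict(zip(sample_names, freqs))))
    return trajectories


# Made-up row with invented frequencies, for illustration only.
SAMPLE_TRACKING = (
    "clonotype\tminus48months\t4months\t10months\t37months\n"
    "c1\t0.1\t0.0\t0.05\t0.2\n"
)

for annotation, freqs in frequency_trajectories(SAMPLE_TRACKING):
    print(annotation, freqs)
```

Each frequency dictionary keeps the samples in column order, so its values can be passed directly to the plotting library of your choice.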