Running the Bismark pipeline for analysis of sodium bisulfite-treated DNA sequences

Author : InsideDNA Time : 24 November 2015 Read time : 6 min

BS-seq is an important method for analysing DNA methylation. It provides a snapshot of a cell’s epigenomic state and reveals genome-wide cytosine methylation at single-base resolution. One of the most powerful tools for BS-seq data analysis is the Bismark suite. Bismark can discriminate between cytosines in CpG, CHG and CHH contexts and helps to visualize and interpret methylation data. Its output can be loaded into a genome viewer. In this tutorial, we demonstrate basic usage of Bismark on the InsideDNA platform.

Cytosine methylation of DNA is an epigenetic mechanism that plays an important role in the control of gene expression, silencing and genomic imprinting. Next-generation sequencing offers three different methods to study DNA methylation: methylated DNA immunoprecipitation (MeDIP-Seq), methylated DNA binding domain sequencing (MBD-Seq) and direct sequencing of sodium bisulfite-treated DNA (BS-Seq). Bismark is a suite of seven tools for efficient analysis of BS-Seq sequences. The basic pipeline uses three of them: bismark_genome_preparation, which prepares the reference genome; bismark, which aligns bisulfite-treated reads to the reference genome and makes cytosine methylation calls; and bismark_methylation_extractor, which extracts the methylation call for every single C analysed.
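To make the three sequence contexts concrete, here is a minimal sketch that classifies each cytosine in a short sequence as CpG, CHG or CHH (H stands for A, C or T). It is an illustration only and not part of Bismark; the function name classify_contexts is our own, and the sketch assumes a forward-strand sequence containing only A, C, G and T.

```shell
# Print the 1-based position and context of every cytosine in a sequence.
# A cytosine followed by G is CpG; otherwise the base two positions ahead
# decides between CHG and CHH.
classify_contexts() {
    printf '%s\n' "$1" | awk '{
        seq = toupper($0)
        n = length(seq)
        for (i = 1; i < n; i++) {
            if (substr(seq, i, 1) != "C") continue
            second = substr(seq, i + 1, 1)
            third  = substr(seq, i + 2, 1)   # empty near the end of the sequence
            if (second == "G")      ctx = "CpG"
            else if (third == "")   continue # context cannot be determined
            else if (third == "G")  ctx = "CHG"
            else                    ctx = "CHH"
            print i, ctx
        }
    }'
}

# Usage:
#   classify_contexts ACGTCAGCCTA
# prints:
#   2 CpG
#   5 CHG
#   8 CHH
#   9 CHH
```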

In this tutorial, we explain how to use Bismark within InsideDNA through both the UI and the console. You can download the entire dataset for this tutorial here or by clicking the Download data button at the top of the page.

1. Upload human genome (chromosome 1) and test BS-seq data to InsideDNA

Log in to the InsideDNA application (or sign up if you have not done so yet) and read the Introduction Tutorial to get familiar with the options available on the website. Once you have learned the basics, navigate to the Files tab. Create a new folder called “bismark”.

Upload the unpacked files from our dataset into this folder (upload one file at a time).

2. Add Bismark tools to your Bismark project

Navigate to the My Tools tab. Create a new project by clicking + Add new project. Name it Bismark.

Now type “bismark” into the search field. Five tools will be returned. Click the + button on bismark_genome_preparation and choose the Bismark project in the dropdown list.

bismark_genome_preparation should appear in your Bismark project. Repeat this step for bismark and bismark_methylation_extractor.

3. Initialize a task with bismark_genome_preparation (click to run)

Click the Run tool button for bismark_genome_preparation. You need to specify a directory containing the genome you want to align your reads against (FastA files with either a .fa or .fasta extension, with single or multiple sequence entries per file). bismark_genome_preparation will create two individual folders within this directory, one for a C->T converted genome and the other for a G->A converted genome.
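To illustrate what a C->T or G->A converted genome looks like, the sketch below applies both conversions to a FASTA stream with sed. This is only a toy illustration of the idea, not how bismark_genome_preparation is implemented; the function names are our own.

```shell
# Convert every C to T in the sequence lines of a FASTA stream,
# leaving header lines (those starting with ">") untouched.
convert_c_to_t() {
    sed '/^>/!y/Cc/Tt/'
}

# Convert every G to A in the same way.
convert_g_to_a() {
    sed '/^>/!y/Gg/Aa/'
}

# Usage:
#   printf '>chr1\nACGTcg\n' | convert_c_to_t
# prints:
#   >chr1
#   ATGTtg
```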

The Tool Settings menu for bismark_genome_preparation will open. Here you need to specify the task name, tool parameters and computing settings, then preview the task and submit it. Choose a task name that is easy for you to recognize later on (for example, hg_bismark_prep). As the input directory, specify the directory with human genome chromosome 1 (build 38).

This task requires a lot of computing power, so keep the core number and RAM high (e.g. 8 cores and 52 GB of RAM).

Preview task and submit it.

4. Monitoring task progress

Monitor the progress of your task. It will be done in a couple of minutes; until then it is listed in the Running group. Once done, it will be moved to the Completed group, and we can verify that nothing went wrong by looking at the error log in the right panel.

5. Obtaining the files

Now, let’s move to the File Manager (FM). Click Files in the top menu and navigate to the root/bismark directory. Here you will find the two folders created by bismark_genome_preparation, one for the C->T converted genome and the other for the G->A converted genome.
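If you prefer to check for the converted genomes from the console, a small helper like the one below can do it. It assumes the standard Bismark layout, in which the two folders are named CT_conversion and GA_conversion and sit inside a Bisulfite_Genome folder in the genome directory; the layout on your account may differ, and the function name is our own.

```shell
# Report whether both converted-genome folders exist under a genome directory.
# Assumed layout: <genome_dir>/Bisulfite_Genome/{CT_conversion,GA_conversion}
check_converted_genome() {
    genome_dir=$1
    rc=0
    for sub in CT_conversion GA_conversion; do
        if [ -d "$genome_dir/Bisulfite_Genome/$sub" ]; then
            echo "found: $sub"
        else
            echo "missing: $sub"
            rc=1
        fi
    done
    return "$rc"
}

# Usage (path taken from the console example in this tutorial):
#   check_converted_genome /data/user35/bismark/hg38_1
```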

6. Initialize a task with bismark (click to run)

Here we need to specify the BS-seq data to align to the reference genome and to use for cytosine methylation calls. As the input directory with the reference genome, specify the directory that you prepared with the bismark_genome_preparation tool. Activate the checkbox “Input fasta/fastq files containing to be aligned” and select test_dataset.fa. If you have paired-end data, specify the two read files in the two options above instead.

Choose the output directory (create it by clicking Add new and call it “result”) and increase parallel search threads to 8.

Select 8 cores and 52 GB of RAM.

Preview task and submit it.

Monitor task progress in Tasks as you did for the bismark_genome_preparation tool.

Then verify the tool output in the /bismark/result folder.

7. Initialize a task with bismark_methylation_extractor (click to run)

Finally, we want to extract context-dependent (CpG/CHG/CHH) methylation with bismark_methylation_extractor. Click on the bismark_methylation_extractor tool and specify the following settings:

Preview task and submit it. Monitor task progress in Tasks as you did for the two previous steps.

8. Obtaining the files

Now, let’s move to the File Manager (FM). Click Files in the top menu and navigate to the root/bismark/extractor directory.
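Once the extractor files are there, you may want a quick summary of them. The sketch below counts methylated and unmethylated calls in one context file with awk. It assumes the usual extractor output format: an optional first line starting with "Bismark methylation extractor", followed by tab-separated lines whose second column is + for a methylated call and - for an unmethylated one. The function name and the file name in the usage comment are hypothetical; check the actual file names in your extractor folder.

```shell
# Summarise the methylation calls in one extractor output file.
# Column 2 is assumed to hold "+" (methylated) or "-" (unmethylated).
methylation_percent() {
    awk -F'\t' '
        NR == 1 && /^Bismark methylation extractor/ { next }  # skip version line
        $2 == "+" { meth++ }
        $2 == "-" { unmeth++ }
        END {
            total = meth + unmeth
            if (total == 0) { print "no calls"; exit }
            printf "%d methylated, %d unmethylated, %.1f%% methylated\n",
                   meth, unmeth, 100 * meth / total
        }
    ' "$1"
}

# Usage (hypothetical file name):
#   methylation_percent /data/user35/bismark/extractor/CpG_context_example.txt
```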

9. Running the entire pipeline in the console

Before following the steps below, please read the Console introductory tutorial carefully.

Go to Files and select Console in the left menu. cd to the directory where you have unpacked the genome and the BS-seq test fastq file.

Using the mkdir command, create the folders "result" and "extractor".

Use the Vi editor to create a new file and paste the content below into it.
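Both folders can be created in one call. The helper below is a small sketch; the function name is our own, and the -p flag simply makes mkdir succeed even if a folder already exists.

```shell
# Create the "result" and "extractor" folders inside a base directory.
# -p avoids an error when a folder is already present.
make_output_dirs() {
    mkdir -p "$1/result" "$1/extractor"
}

# Usage (base path taken from the commands in this tutorial):
#   make_output_dirs /data/user35/bismark
```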

idna_bismark_genome_preparation --path_to_bowtie /srv/dna_tools/bowtie2-2.2.3 --verbose /data/user35/bismark/hg38_1
idna_bismark -p 8 --samtools_path /srv/dna_tools/samtools_0.1.19 --path_to_bowtie /srv/dna_tools/bowtie2-2.2.3 --output_dir /data/user35/bismark/result /data/user35/bismark/hg38_1 /data/user35/bismark/test_data.fastq
idna_bismark_methylation_extractor -s --comprehensive --multicore 8 --output /data/user35/bismark/extractor /data/user35/bismark/result/test_data.fastq_bismark.sam.gz

Name this file bismark_pipeline.sh and execute the isub command on it.

Monitor task execution in Tasks and verify output in your output folder.
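If you plan to reuse the script with other data, it can help to factor the repeated paths into variables. The variant below is a sketch of that idea, using the same three commands and paths as bismark_pipeline.sh. The variable names, the run helper and the DRY_RUN switch are our own additions and not part of Bismark or InsideDNA. Unlike the original script, the && chain also stops the pipeline as soon as one step fails.

```shell
# Shared settings (values copied from bismark_pipeline.sh).
BASE=/data/user35/bismark
GENOME=$BASE/hg38_1
READS=$BASE/test_data.fastq
BOWTIE=/srv/dna_tools/bowtie2-2.2.3
SAMTOOLS=/srv/dna_tools/samtools_0.1.19
THREADS=8

# Run a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Genome preparation, alignment and methylation extraction, in that order.
bismark_pipeline() {
    run idna_bismark_genome_preparation --path_to_bowtie "$BOWTIE" --verbose "$GENOME" &&
    run idna_bismark -p "$THREADS" --samtools_path "$SAMTOOLS" --path_to_bowtie "$BOWTIE" \
        --output_dir "$BASE/result" "$GENOME" "$READS" &&
    run idna_bismark_methylation_extractor -s --comprehensive --multicore "$THREADS" \
        --output "$BASE/extractor" "$BASE/result/test_data.fastq_bismark.sam.gz"
}
```

Calling DRY_RUN=1 bismark_pipeline prints the three commands without running them, which is a cheap way to check the paths. To use this variant as the submitted script, add a final line that calls bismark_pipeline.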

 
