One of the typical tasks when comparing datasets between multiple genomes or transcriptomes is to build a Venn diagram of overlapping orthologs, gene clusters, or transcripts. The same task arises when comparing the overlap between gene clusters in metagenomics studies. However, the majority of tools for comparative genomic analysis do not provide a simple way to obtain a file suitable for plotting Venn diagrams. Here we present a simple pipeline of two tools built on the InsideDNA platform. This pipeline (1) transforms the OrthoMCL and MCL output csv file into a format suitable for the VennDiagram package in R and (2) plots the resulting file as a Venn diagram in pdf format.

1. Input data

The input file for the pipeline is obtained with the OrthoMCL tutorial. The file contains a list of ortholog clusters for three bacterial genomes of Ralstonia species. You can either run the tutorial yourself or directly download the resulting csv file with ortholog clusters.

2. Compile all tools into a single project

Log in to the InsideDNA application (or sign up if you have not yet done so) and read the Introduction Tutorial to get familiar with the options available on the website. Once you have learned the basics, go to Tools in the main (top) navigation menu.

Let’s create a project called Venn

and add to this project the two tools necessary for our pipeline: parseOrthoMCL2Venn and plotOrthoMCL2Venn

3. Running parseOrthoMCL2Venn (click to run)

The first step in this pipeline is formatting the csv (or tsv) file produced in the OrthoMCL tutorial into a format suitable for the VennDiagram package in R. If you are working on your own dataset, please make sure that your file is tab-delimited.

The settings for this tool are extremely simple: we only have to specify the input csv (tsv) file produced by the OrthoMCL pipeline (mclOutput.csv) and give a name for our output file:

Let’s preview and submit the task:

In our case we work with three bacterial species, so our output csv file will contain 3 columns. Each column contains the ids of all ortholog groups in which a particular species was detected. Inside, it looks as follows:
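As a rough illustration of this transformation (not the actual code of parseOrthoMCL2Venn), the sketch below turns a list of clusters into one column of group ids per species. It rests on several assumptions that are not stated in this tutorial: each input line is one cluster, members are separated by whitespace and written as `species|gene`, group ids are made up from the line number, and the output is comma-separated. The function names and the example data are hypothetical.

```python
import csv
import io
from collections import defaultdict
from itertools import zip_longest


def clusters_to_columns(lines):
    """Map each species to the list of group ids in which it occurs."""
    groups_by_species = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        members = line.split()
        if not members:
            continue  # skip blank lines
        group_id = f"group_{number}"  # hypothetical id scheme
        # Assumed member format: "<species>|<gene id>"
        species_in_group = {m.split("|", 1)[0] for m in members}
        for species in sorted(species_in_group):
            groups_by_species[species].append(group_id)
    return dict(groups_by_species)


def write_columns(columns, handle):
    """Write one column per species; shorter columns are padded with blanks."""
    names = sorted(columns)
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(names)  # first row holds the species names
    for row in zip_longest(*(columns[n] for n in names), fillvalue=""):
        writer.writerow(row)


# Made-up clusters for three species tags: reu, rso, rpi
example = [
    "reu|g1\trso|g7\trpi|g3",
    "reu|g2\treu|g5\trso|g9",
    "rpi|g4",
]
columns = clusters_to_columns(example)
buffer = io.StringIO()
write_columns(columns, buffer)
print(buffer.getvalue())
```

A species is recorded once per group even if it contributes several genes to it, which matches the description of each column as the set of groups where that species was detected.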

4. Running plotOrthoMCL2Venn (click to run)

The second and last step of this pipeline is plotting the csv file obtained with parseOrthoMCL2Venn as a pdf image. plotOrthoMCL2Venn is a tool based on the VennDiagram R package and uses its venn.diagram function. The tool is simplified compared to the original R function, but you can always download the file obtained with parseOrthoMCL2Venn and experiment with it locally in R.

In this tool you simply have to specify an input file (produced by parseOrthoMCL2Venn) and a name for the output pdf image.

By default, this tool takes the first row of the input csv file as the names of the data partitions and randomly assigns a color to each partition. You can replace the defaults with your own partition titles and colors. If you do, watch the format (A,B,C or blue,green,yellow: commas only, no spaces) and the order, which is the same as the column order in the input csv file.
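The format rules for custom names and colors can be checked before submitting a task. The sketch below is a hypothetical helper, not part of plotOrthoMCL2Venn: it splits a comma-separated value, rejects spaces and empty entries, and confirms that the number of entries matches the number of columns. The column names in `header` are invented for the example.

```python
def parse_list_option(value, expected_count):
    """Split a comma-separated option and check it against the column count."""
    if " " in value:
        raise ValueError("use commas only, no spaces: " + repr(value))
    items = value.split(",")
    if "" in items:
        raise ValueError("empty entry in " + repr(value))
    if len(items) != expected_count:
        raise ValueError(f"expected {expected_count} entries, got {len(items)}")
    return items


# Hypothetical column names from the first row of the input csv file
header = ["R_eutropha", "R_pickettii", "R_solanacearum"]
names = parse_list_option("A,B,C", len(header))
colors = parse_list_option("blue,green,yellow", len(header))

# Entries are matched to columns by position, so the order matters
for column, name, color in zip(header, names, colors):
    print(column, "->", name, color)
```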

5. Obtaining the files

Now let’s move to the File Manager (FM). Click on Files, navigate to Bacterial_orthomcl/result/Bacterial_orthomcl/result/ and download RalstoniaVenn.pdf.

The result of this tool is a pdf image with a diagram like the one below.

Here we can see that, among our three bacterial species, the largest overlap is between Ralstonia eutropha and Ralstonia solanacearum, with 74 ortholog groups. The smallest overlap is between Ralstonia solanacearum and Ralstonia pickettii, with 44 groups. All three species share 16 ortholog groups.
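If you want to check numbers like these against the parsed file, the counts in a three-set Venn diagram can be recomputed with plain set operations. The sketch below uses made-up group ids rather than the Ralstonia data, and it assumes that each region of the diagram counts only the groups exclusive to that region (for example, groups shared by A and B but absent from C). The function name is hypothetical.

```python
def venn_regions(a, b, c):
    """Return the size of each of the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "A only": len(a - b - c),
        "B only": len(b - a - c),
        "C only": len(c - a - b),
        "A and B only": len((a & b) - c),
        "A and C only": len((a & c) - b),
        "B and C only": len((b & c) - a),
        "A and B and C": len(a & b & c),
    }


# Made-up group ids standing in for the three columns of the parsed file
species_a = ["g1", "g2", "g3", "g4"]
species_b = ["g2", "g3", "g5"]
species_c = ["g3", "g4", "g6", "g7"]

regions = venn_regions(species_a, species_b, species_c)
for label, count in regions.items():
    print(label, count)

# The seven regions partition the union, so their sizes add up to its size
total = len(set(species_a) | set(species_b) | set(species_c))
print("sum of regions:", sum(regions.values()), "union size:", total)
```

Because the regions do not overlap, summing all seven counts gives the total number of distinct ortholog groups, which is a quick sanity check on any diagram.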


