Reconstructing species phylogeny from nucleotide sequences with Phyml

Author : InsideDNA Time : 17 August 2015 Read time : 5 min

When researchers need to reconstruct a relatively large phylogeny from multiple genes (e.g. sequenced de novo and obtained from the NCBI database), the last important step, after the source sequences have been obtained, aligned and combined into a single matrix, is the phylogeny reconstruction itself. Here we present a simple way to reconstruct a phylogeny from a DNA matrix based on multiple genes with the Phyml tool.

For simplicity, let’s consider the following case: you have sequenced several species from the Balanophoraceae plant family for 5.8S ribosomal RNA and 18S ribosomal RNA products and now would like to:

  • evaluate which other Balanophoraceae species have been sequenced and for which genes
  • download these sequences from GenBank in FASTA format and combine the downloaded sequences with the de novo sequences
  • align the sequences and prepare a combined gene matrix for all obtained genes
  • reconstruct a phylogeny for the entire Balanophoraceae family

To keep this tutorial short, we cover only step 4 here; steps 1, 2 and 3 are discussed in other tutorials (so subscribe to our newsletter and check the Tutorial page).

We obtained FASTA files for 18S ribosomal RNA, 28S ribosomal RNA and 5.8S ribosomal RNA with the help of the geneCoverage and geneCoverage2fasta scripts. We then aligned the individual genes with MAFFT and produced a combined gene matrix for all genes with SequenceMatrix. Now we are going to reconstruct a phylogeny with the Phyml tool.


We are now going to use the aligned sequences to reconstruct a phylogeny for our dataset. We will treat the entire dataset as a single matrix that evolves under one substitution model, because Phyml cannot yet handle a partitioned DNA matrix (as, for example, BEAST or MrBayes can).

1. Upload DNA matrix to InsideDNA

Log in to the InsideDNA application and navigate to the Files tab. Go to Balanophoraceae_project and create a new folder called phylo.

Upload the combined DNA matrix (Balanophoraceae_src.phy) created in tutorial 3 from your local machine to the InsideDNA project and place it in the root/Balanophoraceae_project/phylo/ folder. If you have not followed tutorial 3, please download the matrix here.
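Before uploading, it can be worth checking that the matrix is internally consistent, since a mismatch between the header and the rows is a common reason for a tree-building run to fail. The following is only a minimal sketch: the function name check_phylip_text is hypothetical, and it assumes the matrix is a sequential PHYLIP file (a header line with the number of taxa and the number of sites, then one taxon per line as a name, whitespace and the sequence). If your matrix is interleaved, this check does not apply as written.

```python
def check_phylip_text(text):
    """Compare the header of a sequential PHYLIP matrix with its rows.

    Returns a list of human-readable problems (empty if none were found).
    Assumption: one taxon per line, written as name, whitespace, sequence.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return ["file is empty"]

    header = lines[0].split()
    if len(header) < 2 or not (header[0].isdigit() and header[1].isdigit()):
        return ["first line is not a '<taxa> <sites>' header"]
    n_taxa, n_sites = int(header[0]), int(header[1])

    problems = []
    rows = lines[1:]
    if len(rows) != n_taxa:
        problems.append(f"header says {n_taxa} taxa, found {len(rows)} rows")
    for row in rows:
        fields = row.split()
        # Everything after the name is sequence; blocks may be space-separated.
        name, sequence = fields[0], "".join(fields[1:])
        if len(sequence) != n_sites:
            problems.append(f"{name}: {len(sequence)} sites, expected {n_sites}")
    return problems
```

To use it on the tutorial matrix, read Balanophoraceae_src.phy into a string (for example with open(...).read()) and pass it to the function; an empty list means the header and the rows agree.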

2. Add Phyml to your Balanophoraceae project

Navigate to the Balanophoraceae project in the My Tools tab. If you followed tutorial 1, tutorial 2 and tutorial 3, you already have three tools there: geneCoverage, geneCoverage2fasta and MAFFT.

If you haven’t done the previous tutorials, you will need to create a new project by clicking on + Add new project and naming it Balanophoraceae.

Now type Phyml into the search field. Click on the add button and choose the Balanophoraceae project in the dropdown list. Phyml should appear in your project.

3. Initialize a task (click to run Phyml)

Now we are going to initialize Phyml for phylogeny reconstruction. First, click on the Run tool button.

The Tool Settings menu for Phyml will open. Here you need to specify the task name, the tool parameters and the queue. Then you will need to preview the task and submit it. Choose a task name that will be easy for you to recognize later on, for instance Balan_phyml_GTR, as we are first going to reconstruct the phylogeny assuming the GTR model of nucleotide substitution.

Now we need to select the input file, which is the sequence matrix for Phyml that we obtained in tutorial 3 (you can also download the matrix here). Click on Browse, navigate to the Balanophoraceae_project/phylo folder and select the Balanophoraceae_src.phy file.

You can leave the rest of the parameters at their defaults. We don’t need to specify output files, because Phyml will create all output files in our working directory (Balanophoraceae_project/phylo) with the prefix _. We also specify the number of bootstrap replicates, which tells us how well individual nodes are supported by our data (sequences). Uncheck the box and specify 20 bootstrap replicates. For the tree topology select BEST.
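For readers who prefer the command line, the settings chosen above correspond roughly to a PhyML invocation like the one assembled in this sketch. It only builds the argument list and does not run anything. The helper name build_phyml_command and the executable name phyml are assumptions (the binary may be named differently on your system), and we do not know exactly which command InsideDNA issues behind the form; the flags themselves (-i, -d, -m, -b, -s) are standard PhyML options.

```python
def build_phyml_command(alignment, model="GTR", bootstraps=20, search="BEST"):
    """Return the argument list for a PhyML run on a nucleotide alignment."""
    return [
        "phyml",                # assumed executable name
        "-i", alignment,        # input alignment in PHYLIP format
        "-d", "nt",             # data type: nucleotides
        "-m", model,            # substitution model
        "-b", str(bootstraps),  # number of bootstrap replicates
        "-s", search,           # tree topology search: NNI, SPR or BEST
    ]
```

Joining the list with spaces gives a line you could paste into a terminal, and the same list can be handed to subprocess.run if PhyML is installed locally.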

Select the 4 cores/3.5 GB RAM queue, click on the Task preview button and then on Submit. Go to Tasks to monitor the progress.

4. Monitor task progress

Just as in the previous tutorials, monitor the progress of your task. It will be done in a couple of minutes, but right now it is in the Running group. Once done, it is moved to the Completed group, and we can verify that nothing went wrong by looking at the error log in the right panel.

5. Obtain the files and visualize phylogeny with FigTree

Now let’s move to the File Manager (FM). Click on Files and navigate to the Balanophoraceae_project/phylo directory. Here you will see all files associated with the task and the reconstructed phylogeny. You are going to have:

  • _phyml_lk.txt : site likelihood value(s)
  • _phyml_tree.txt : inferred tree(s)
  • _phyml_stat.txt : detailed execution statistics
  • _phyml_boot_trees.txt : bootstrap trees (special case)
  • _phyml_boot_stats.txt : bootstrap statistics (special case)
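Once the files are downloaded, a quick way to confirm that nothing is missing is to match the file names against the suffixes listed above. This is a small sketch with hypothetical helper names (group_phyml_outputs, missing_outputs); it matches on the suffix only, so it makes no assumption about the prefix that Phyml puts in front of each name.

```python
# Suffixes taken from the list of output files in this tutorial.
PHYML_SUFFIXES = [
    "_phyml_lk.txt",
    "_phyml_tree.txt",
    "_phyml_stat.txt",
    "_phyml_boot_trees.txt",
    "_phyml_boot_stats.txt",
]


def group_phyml_outputs(file_names):
    """Map each expected suffix to the file names that end with it."""
    found = {suffix: [] for suffix in PHYML_SUFFIXES}
    for name in file_names:
        for suffix in PHYML_SUFFIXES:
            if name.endswith(suffix):
                found[suffix].append(name)
    return found


def missing_outputs(file_names):
    """Return the suffixes for which no file was found."""
    grouped = group_phyml_outputs(file_names)
    return [suffix for suffix in PHYML_SUFFIXES if not grouped[suffix]]
```

Passing the result of os.listdir on your local phylo folder to missing_outputs returns an empty list when every expected file is present.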

We are now going to check our phylogeny. To do that, download all the files to your local machine and install the FigTree software. FigTree will easily handle the phylogeny and show the bootstrap support on the tree nodes.
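If you would also like to inspect the support values without a viewer, they can be pulled out of the tree file with a few lines of Python. The sketch below assumes the tree is a Newick string in which support is written as a numeric label directly after the closing parenthesis of each internal node, which is the usual convention; the function names are hypothetical, and how the values are scaled (for example counts out of 20 replicates versus percentages) should be checked against your own output.

```python
import re


def internal_node_labels(newick):
    """Return the numeric labels that directly follow a closing parenthesis."""
    return [float(value) for value in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]


def weakly_supported(newick, threshold):
    """Return the support values that fall below the given threshold."""
    return [value for value in internal_node_labels(newick) if value < threshold]
```

For a toy tree such as ((A:0.1,B:0.2)18:0.05,(C:0.1,D:0.3)7:0.02,E:0.4); the first function returns 18 and 7, and weakly_supported with a threshold of 10 returns only the 7.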


