One of the most powerful frameworks for inferring population genetic or genomic scenarios is Approximate Bayesian Computation (ABC). Compared with conventional likelihood-based approaches, an ABC-based strategy can effectively model complex population history scenarios and offers a flexible way to assess the fit of alternative hypotheses. One popular tool for ABC modeling is DIY-ABC. It has an easy-to-use interface but, unfortunately, requires a powerful cluster to generate the thousands of datasets needed for ABC evaluation. Here we describe a simple two-step wrapper pipeline for simulating large numbers of ABC datasets. Our wrapper will be particularly useful for those who don’t have UNIX skills or a cluster but still need a large number of ABC simulations.
In essence, ABC is based on the idea that, instead of building an explicit model with a well-defined likelihood, one can simply generate many datasets under different parameter values and then compare the simulated datasets to the real dataset. The comparison is done with so-called summary statistics: values that summarize each dataset (simulated and real). Simulated datasets whose summary statistics are close enough to those of the real data are taken to represent population genetic scenarios consistent with that data.
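The idea can be illustrated with a minimal rejection-sampling sketch. This is a toy example, not what DIY-ABC runs internally: the "scenario" is just the mean of a normal distribution, the summary statistic is the sample mean, and the function names, the flat prior range and the tolerance are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate_dataset(mu, n, rng):
    """Simulate one toy dataset: n draws from Normal(mu, 1)."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]

def summary(dataset):
    """Summary statistic (assumed here to be just the sample mean)."""
    return statistics.fmean(dataset)

def abc_rejection(observed, n_sims, tolerance, rng):
    """Keep the parameter draws whose simulated summary statistic
    lies within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_sims):
        mu = rng.uniform(-5.0, 5.0)  # draw a parameter from a flat prior
        s_sim = summary(simulate_dataset(mu, len(observed), rng))
        if abs(s_sim - s_obs) <= tolerance:
            accepted.append(mu)
    return accepted

rng = random.Random(42)
observed = simulate_dataset(2.0, 50, rng)  # stand-in for real data (true mu = 2.0)
accepted = abc_rejection(observed, n_sims=10000, tolerance=0.1, rng=rng)
posterior_mean = statistics.fmean(accepted)
print(f"accepted {len(accepted)} of 10000 simulations, "
      f"posterior mean of mu ~ {posterior_mean:.2f}")
```

Only a small fraction of the simulated datasets is accepted, which is why a realistic analysis with many parameters and several competing scenarios needs so many simulations.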
Nevertheless, there is a major limitation: to compare different scenarios with confidence, the ABC approach requires a large number of simulated datasets (thousands or even hundreds of thousands). ABC is therefore a computationally costly method that normally requires a large cluster.
DIY-ABC approaches this problem in two steps:
In InsideDNA we have added two simple tools that allow users who don’t have a cluster or good UNIX skills to quickly and easily simulate a large number of ABC datasets.
Log in to the InsideDNA application (or sign up if you have not done so yet) and read the Introduction Tutorial to get familiar with the options available on the website. Once you have learned the basics, navigate to the Files tab. Create a new folder called diy_abc.
Navigate to the My Tools tab. Create a new project by clicking + Add new project. Name it diy_abc.
Now type “abc” into the search field. Three tools will be returned. Click the + button on DIYABC_rndgen and choose the diy_abc project in the dropdown list. DIYABC_rndgen should appear in your diy_abc project.
Repeat this operation for DIYABC_sim. You should now have two tools in your diy_abc project.
Click the Run tool button for DIYABC_rndgen. This tool generates one or more RNG files needed to simulate the datasets. For now we will generate only one RNG file, which can be used for multithreading on 16 cores. In principle, if you’d like to parallelize computations not only across cores but also across nodes, you can generate more RNG files.
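To see why random-number state is prepared per core before a parallel run, consider the toy sketch below. It is an illustration only and makes no claim about the actual RNG file format or about how DIYABC_rndgen works: it simply assumes that each worker should draw from its own reproducible stream, and the names make_worker_rngs and simulate_chunk are hypothetical.

```python
import random

def make_worker_rngs(master_seed, n_workers):
    """Derive one separately seeded generator per worker from a single master seed."""
    master = random.Random(master_seed)
    return [random.Random(master.getrandbits(64)) for _ in range(n_workers)]

def simulate_chunk(rng, n_draws):
    """Stand-in for the work done by one core: a few random draws."""
    return [rng.random() for _ in range(n_draws)]

# One generator per core, 16 cores as in this tutorial.
rngs = make_worker_rngs(master_seed=2024, n_workers=16)
chunks = [simulate_chunk(rng, 3) for rng in rngs]

# Re-creating the generators from the same master seed reproduces the same draws.
rerun = [simulate_chunk(rng, 3) for rng in make_worker_rngs(2024, 16)]
print(len(chunks), chunks == rerun)
```

If all workers shared a single seed, they would produce identical "simulations"; giving each worker its own state avoids that while keeping the run reproducible.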
The Tool Settings menu for DIYABC_rndgen will open. Here you need to specify the task name, tool parameters and computing settings, then preview the task and submit it. Choose a task name that will be easy to recognize later on (for example, abc_rnd).
Specify root/diy_abc as the directory with your input data.
Specify the number of cores to use (16 in our case) and the number of computers (1 in our case). The more cores you choose, the faster the ABC datasets will be simulated. The number of computers matters only when you want to use multiple nodes; this can be tricky, so keep it equal to 1. The RNG generation task itself doesn’t require much computing power, so keep the core number and RAM allocated to it low.
Preview the task and submit it.
Monitor the progress of your task. It will finish in a couple of minutes; until then it is listed in the Running group. Once done, it will be moved to the Completed group, and you can verify that nothing went wrong by looking at the error log in the right panel.
Now let’s move to the File Manager (FM). Click Files in the top menu and navigate to the root/diy_abc directory. Here you will find a single RNG file. If you chose multiple computers (nodes), you will have more RNG files (one per computer).
Now let’s move back to Tools and launch DIYABC_sim. This is the tool where the most intensive computation happens.
First, remember that we have only one RNG file, with the suffix *_0000.bin, and that we selected 16 cores for this RNG file. So we will now specify the settings as follows:
Preview the task and submit it.
Check how things are going in the Task section. Remember that the simulations may take quite a while to finish; this is ABC, after all.
When your task is completed, click Files and navigate to the root/diy_abc directory. Here you will find all the resulting simulation files.
You should now transfer these files back to your local machine and analyze them with the DIY-ABC GUI. Summarizing the simulated datasets is a computationally simple operation that doesn’t require a powerful cluster, so you should be able to do it easily on your own machine.