58 citations
211 runs

mapDamage

By Jonsson H., Orlando L. Last update: 2017-05-13 (Unix timestamp 1494698999)

mapDamage description

MapDamage2 is a computational framework that tracks and quantifies DNA damage patterns among ancient DNA sequencing reads generated by next-generation sequencing (NGS) platforms.

Ancient DNA (aDNA) molecules in fossilized bones and teeth, coprolites, sediments, mummified specimens and museum collections are valuable sources of information for evolutionary biologists, revealing the agents of past epidemics and the dynamics of past populations. However, the analysis of aDNA generally faces two major issues. First, sequences consist of a mixture of endogenous DNA and various exogenous backgrounds, mostly microbial. Second, high nucleotide misincorporation rates can be observed as a result of severe post-mortem DNA damage. Such misincorporation patterns are instrumental in authenticating ancient sequences against modern contaminants, and mapDamage identifies them from NGS datasets.

Previously, the absence of formal statistical modeling of the DNA damage process precluded rigorous quantitative comparisons across samples. mapDamage now incorporates a statistical model of DNA damage. Assuming that damage events depend only on sequencing position and post-mortem deamination, the Bayesian statistical framework provides estimates of four key features of aDNA molecules: the average length of overhangs, the nick frequency, and the cytosine deamination rates in both double-stranded regions and overhangs. The model enables rescaling base quality scores according to their probability of being damaged. mapDamage handles NGS datasets with ease and is compatible with a wide range of DNA library protocols.

Two files are needed for mapDamage: a valid SAM or BAM file with a correct header, passed as the argument to the -i option, and a FASTA file that contains the reference sequences used for mapping reads, passed as the argument to the -r option. The references described in the SAM or BAM header and in the FASTA file must be coherent, i.e., they must have identical names and lengths.
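The coherence requirement above can be checked before launching a run. The following is a minimal Python sketch, not mapDamage's own validation code: the function names are hypothetical, and it assumes the SAM header is available as plain text (for a BAM file, the header would first have to be extracted, for example with `samtools view -H`). References named in the header must appear in the FASTA file with the same length; sequences found only in the FASTA file are reported as a warning rather than an error.

```python
import warnings


def sam_header_references(header_text):
    """Return {name: length} taken from the @SQ lines of a SAM header."""
    refs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # @SQ fields are tab-separated TAG:VALUE pairs, e.g. SN:chrM, LN:16569.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        refs[fields["SN"]] = int(fields["LN"])
    return refs


def fasta_references(fasta_text):
    """Return {name: length} for each record in FASTA-formatted text."""
    refs = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token.
            name = line[1:].split()[0]
            refs[name] = 0
        elif name is not None:
            refs[name] += len(line)
    return refs


def check_coherence(header_refs, fasta_refs):
    """Raise ValueError on missing or mismatched references; warn on extras."""
    problems = []
    for name, length in header_refs.items():
        if name not in fasta_refs:
            problems.append(f"{name}: missing from FASTA")
        elif fasta_refs[name] != length:
            problems.append(
                f"{name}: length {length} in header, {fasta_refs[name]} in FASTA"
            )
    if problems:
        raise ValueError("; ".join(problems))
    # Sequences present only in the FASTA file are tolerated.
    extras = sorted(set(fasta_refs) - set(header_refs))
    if extras:
        warnings.warn("extra FASTA sequences: " + ", ".join(extras))
    return extras
```

Calling `check_coherence(sam_header_references(header), fasta_references(fasta))` returns the list of FASTA-only sequence names when the two inputs agree, and raises `ValueError` otherwise.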
Extra sequences present in the FASTA file raise a warning, but the program will proceed since all necessary references are available.

As an alternative, one can run only the plotting, statistical estimation or rescaling step on an already processed dataset: combine the -d option, followed by a valid folder, with the --plot-only, --stats-only or --rescale-only option.

The tool assumes that read pairs face inwards when counting position-specific misincorporations and rescaling quality scores. The rescaling process additionally assumes that the pairs are non-overlapping. Make sure the pairing information in the BAM or SAM file is correct, as the tool relies on the paired-end information provided by the file. We advise against using non-overlapping paired ends from highly degraded samples (small template lengths), as they are likely contamination.
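The two invocation modes described above (a full run with -i and -r, or a single step rerun with -d) can be assembled programmatically. This is a minimal Python sketch under stated assumptions: the helper names and file names are hypothetical, the executable is assumed to be available on the PATH as `mapDamage`, and only the options named in this description are used. Some steps may need further options in practice, so the partial-run helper accepts extra arguments; check the tool's own help output before relying on it.

```python
# Step names (hypothetical) mapped to the single-step options listed above.
PARTIAL_MODES = {
    "plot": "--plot-only",
    "stats": "--stats-only",
    "rescale": "--rescale-only",
}


def full_run_command(alignment_path, reference_path):
    """Command line for a complete run on a SAM/BAM file and its FASTA reference."""
    return ["mapDamage", "-i", alignment_path, "-r", reference_path]


def partial_run_command(results_folder, mode, extra_args=()):
    """Command line that reruns one step on an already processed dataset."""
    if mode not in PARTIAL_MODES:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(PARTIAL_MODES)}"
        )
    return ["mapDamage", "-d", results_folder, PARTIAL_MODES[mode], *extra_args]
```

Each function returns an argument list that can be passed to `subprocess.run`, for example `subprocess.run(full_run_command("mymap.bam", "myreference.fasta"), check=True)`. Building a list instead of a single string avoids shell quoting problems with file names that contain spaces.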


Parent program: mapDamage
