gc_extrap uses rational function approximations to the Good-Toulmin estimator to predict genomic coverage, i.e. the number of bases covered at least once, under deeper sequencing of a single-cell or low-input sequencing experiment, based on the observed coverage counts. An option is available to predict coverage from binned coverage counts, which speeds up the estimates. gc_extrap requires input in mapped read (MR) or BED format, so the tool bam2mr is provided to convert BAM-format reads to the mapped read format. The output is a text file with four columns: the first is the total number of sequenced and mapped bases, the second is the corresponding expected number of distinct bases covered, and the third and fourth are the lower and upper limits of the confidence interval. Specifying the verbose option prints the coverage counts histogram of the input file.
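The quantity gc_extrap extrapolates can be illustrated with the raw Good-Toulmin power series applied to a coverage counts histogram, where n_j is the number of bases covered exactly j times. The following is a minimal sketch, not gc_extrap's implementation: it evaluates only the plain alternating series, which is reliable for at most doubling the sequencing effort (0 <= t <= 1). gc_extrap instead uses rational function approximations so that predictions remain stable for much larger extrapolations. The function names `coverage_histogram` and `good_toulmin_coverage` are hypothetical and chosen for illustration.

```python
from collections import Counter


def coverage_histogram(depths):
    """Return a Counter mapping j -> n_j, the number of bases covered exactly j times.

    `depths` is a hypothetical per-base depth list; uncovered bases (depth 0) are ignored.
    """
    return Counter(d for d in depths if d > 0)


def good_toulmin_coverage(hist, t):
    """Expected distinct bases covered after sequencing (1 + t) times the observed amount.

    Uses the raw Good-Toulmin series: observed + sum_j (-1)^(j+1) * t^j * n_j.
    The alternating series can diverge for t > 1, which is why this sketch
    refuses to extrapolate that far (gc_extrap uses rational approximations instead).
    """
    if not 0 <= t <= 1:
        raise ValueError("raw Good-Toulmin series is only reliable for 0 <= t <= 1")
    observed = sum(hist.values())  # bases covered at least once so far
    newly_covered = sum((-1) ** (j + 1) * t ** j * n_j for j, n_j in hist.items())
    return observed + newly_covered


if __name__ == "__main__":
    depths = [1, 1, 1, 2, 2, 3, 0, 0]  # toy per-base depths
    hist = coverage_histogram(depths)  # {1: 3, 2: 2, 3: 1}
    for t in (0.0, 0.5, 1.0):
        print(t, good_toulmin_coverage(hist, t))
```

In this toy example, six bases are covered at least once; doubling the sequencing (t = 1) is predicted to cover 6 + (3 - 2 + 1) = 8 bases. The singletons (n_1) dominate the estimate, which matches the intuition that many bases seen only once signal that much of the genome remains uncovered.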
Parent program: Preseq
The preseq package is aimed at predicting the yield of distinct reads from a genomic library, based on an initial sequencing experiment. The estimates can then be used to examine the utility of further sequencing, optimize the sequencing depth, or screen multiple libraries to avoid low-complexity samples.