c_curve samples reads without replacement from the given mapped sequenced read file or duplicate count file to estimate the yield of the experiment and of its subsampled experiments. These estimates are used to construct the complexity curve of the experiment. The output is a text file with two columns: the first gives the total number of reads and the second the corresponding number of distinct reads.
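The idea of subsampling without replacement to build a complexity curve can be illustrated with a minimal Python sketch. This is not Preseq's implementation; the function name `complexity_curve` and the parameters `duplicate_counts`, `step`, and `seed` are hypothetical and chosen only for illustration. It assumes the input is a list of duplicate counts, one entry per distinct read, giving how many times that read was observed.

```python
import random


def complexity_curve(duplicate_counts, step, seed=0):
    """Return (total_reads, distinct_reads) pairs for subsamples of growing size.

    duplicate_counts: one integer per distinct read (hypothetical input format).
    step: number of reads between successive points on the curve.
    """
    # Expand the counts into one label per sequenced read.
    reads = [label for label, count in enumerate(duplicate_counts)
             for _ in range(count)]

    # A random permutation: every prefix of it is a sample drawn
    # without replacement from the full experiment.
    rng = random.Random(seed)
    rng.shuffle(reads)

    curve = []
    seen = set()
    for n, label in enumerate(reads, start=1):
        seen.add(label)
        # Record a point at every `step` reads and at the full experiment.
        if n % step == 0 or n == len(reads):
            curve.append((n, len(seen)))
    return curve


# Print the curve as two tab-separated columns:
# total reads, then distinct reads.
for total, distinct in complexity_curve([3, 1, 2], step=2):
    print(f"{total}\t{distinct}")
```

The last row always corresponds to the full experiment, so its second column equals the number of distinct reads in the input. The intermediate rows depend on the random permutation and will vary with `seed`.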
Parent program: Preseq
The preseq package is aimed at predicting the yield of distinct reads from a genomic library based on an initial sequencing experiment. The estimates can then be used to examine the utility of further sequencing, optimize the sequencing depth, or screen multiple libraries to avoid low-complexity samples.