The Tomtom program searches one or more query motifs against one or more databases of target motifs (and their DNA reverse complements), and reports for each query a list of target motifs, ranked by p-value. The E-value and the q-value of each match are also reported. The q-value is the minimal false discovery rate at which the observed similarity would be deemed significant. The output contains results for each query, in the order that the queries appear in the input file.
For a given pair of motifs, the program considers all offsets, while requiring a minimum number of overlapping positions. For a given offset, each overlapping position is scored using one of seven column similarity functions defined below. Columns in the query motif that do not overlap the target motif are assigned a score equal to the median score of the set of random matches to that column. To compute the scores, Tomtom needs to know the frequencies of the letters of the sequence alphabet in the database being searched (the 'background' letter frequencies). By default, the background letter frequencies included in the MEME input files are used.
The scores of columns that overlap for a given offset are summed. This summed score is then converted to a p-value. The reported p-value is the minimal p-value over all possible offsets. To compensate for multiple testing, each reported p-value is converted to an E-value by multiplying it by twice the number of target motifs. As a second type of multiple-testing correction, q-values for each match are computed from the set of p-values and reported.
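The bookkeeping described above can be illustrated with a minimal Python sketch. It is not Tomtom's implementation: the function names (`candidate_offsets`, `reported_p_value`, `e_value`, `bh_q_values`) are hypothetical, the per-offset p-values are taken as given rather than derived from column scores, and the Benjamini-Hochberg procedure is used only as an assumed stand-in, since the text does not say which q-value estimator Tomtom uses.

```python
def candidate_offsets(query_width, target_width, min_overlap=1):
    """Yield (offset, overlap) for every alignment with enough overlap.

    offset is the target position aligned with query column 0, so
    negative offsets mean the query overhangs the start of the target.
    """
    # Assumption: the requirement is capped at the shorter motif's width.
    min_overlap = min(min_overlap, query_width, target_width)
    for offset in range(-(query_width - 1), target_width):
        start = max(0, offset)
        end = min(target_width, offset + query_width)
        overlap = end - start
        if overlap >= min_overlap:
            yield offset, overlap


def reported_p_value(p_by_offset):
    """The reported p-value is the minimum over all considered offsets."""
    return min(p_by_offset.values())


def e_value(p_value, n_targets):
    """E-value as described in the text: p-value times twice the number of targets."""
    return p_value * 2 * n_targets


def bh_q_values(p_values):
    """Benjamini-Hochberg q-values (assumed stand-in, not Tomtom's estimator)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone in rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    # A width-3 query against a width-5 target, requiring 2 overlapping columns.
    print(list(candidate_offsets(3, 5, min_overlap=2)))
    print(e_value(0.001, n_targets=50))
    print(bh_q_values([0.01, 0.04, 0.03]))
```

The factor of two in `e_value` follows the description above; it presumably reflects that each target is considered in both orientations, but that reading is an inference rather than something the text states.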
Parent program: meme
MEME is a tool for discovering motifs in a group of related DNA or protein sequences. MEME takes as input a group of DNA or protein sequences and outputs as many motifs as requested up to a user-specified statistical confidence threshold. MEME uses statistical modeling techniques to automatically choose the best width, number of occurrences, and description for each motif.