GLAM2 is a software package for finding motifs in sequences, typically amino-acid or nucleotide sequences. A motif is a recurring sequence pattern: typical examples are the TATA box and the CAAX prenylation motif. The main innovation of GLAM2 is that it allows insertions and deletions in motifs.
You give glam2 a set of sequences, and it finds the strongest motif shared by these sequences. More exactly, glam2 gives you an alignment of segments of the sequences. Each sequence contributes at most one segment to the alignment. glam2 assigns scores to alignments: the score favours alignment of similar residues, and disfavours insertions and deletions, but less so if they repeatedly occur at the same, presumably fragile, positions. glam2 attempts to find a maximal-scoring alignment for your sequences.
To use glam2 effectively, you need to understand roughly how it works. glam2 starts from a random alignment and makes many small, random changes to it, which are designed to find high-scoring alignments in the long run. The longer you let it run, the more likely it is to find a maximal-scoring alignment.
To check that a reproducible, high-scoring motif has been found, the whole procedure is run several (e.g. 10) times from different starting alignments. If all runs produce identical alignments, we have maximum confidence that this is the optimal motif. (To gain even more confidence, consider varying the initial motif width: see below.) If a few of the runs produce different, lower-scoring motifs, we still have high confidence. If all the runs produce completely different alignments, we have low confidence, and the run-length needs to be increased.
An alternative is to check that similar, but not necessarily identical, alignments are found repeatedly. This suggests that the optimal motif has been found, even if the exactly optimal alignment has not. With large numbers of sequences, there are so many possible alignments that it is not feasible to find the precisely optimal one, and finding similar alignments repeatedly is the best that can be hoped for. Furthermore, the precisely optimal alignment is not very meaningful: it is rather like writing a moderately accurate value to twelve decimal places.
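The replicate check described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not part of glam2: the function name replicate_confidence, the (score, alignment) representation and the majority threshold separating "high" from "low" are all assumptions made for this example, since the text only says "a few" runs may differ.

```python
def replicate_confidence(runs, majority=0.5):
    """Classify confidence in a motif from replicate runs.

    runs: list of (score, alignment) pairs, one per replicate. The alignment
    can be any value that supports ==; here it is a tuple of
    (sequence_name, start, end) segments (a hypothetical representation).
    majority: assumed cut-off for "a few runs differ"; not a glam2 setting.
    """
    if len(runs) < 2:
        raise ValueError("need at least two replicate runs to compare")
    # Take the highest-scoring alignment as the candidate optimal motif.
    best_alignment = max(runs, key=lambda run: run[0])[1]
    # Count the replicates that reproduced exactly that alignment.
    agreeing = sum(1 for _, alignment in runs if alignment == best_alignment)
    if agreeing == len(runs):
        return "maximum"  # every run found the same alignment
    if agreeing / len(runs) > majority:
        return "high"     # a few runs found different, lower-scoring motifs
    return "low"          # little agreement: increase the run length


# Example: 8 of 10 runs agree on the best alignment, 2 found a weaker one.
a = (("seq1", 3, 11), ("seq2", 7, 15))
b = (("seq1", 4, 12), ("seq2", 7, 15))
example_runs = [(41.2, a)] * 8 + [(38.9, b)] * 2
print(replicate_confidence(example_runs))  # prints "high"
```

Exact equality is the strict form of the check. The alternative mentioned above, accepting similar but not identical alignments, would replace the == comparison with a similarity measure of your choosing.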
Parent program: meme
MEME is a tool for discovering motifs in a group of related DNA or protein sequences. MEME takes as input a group of DNA or protein sequences and outputs as many motifs as requested up to a user-specified statistical confidence threshold. MEME uses statistical modeling techniques to automatically choose the best width, number of occurrences, and description for each motif.