Gapcleaner finds the most-parsimonious ordering of indels and the minimal chain decomposition of the partially ordered set (POSET) of the alignment. In FSA, alignment are defined as a set of homology statements between characters. This definition of an alignment is expressed mathematically as a partially ordered set (POSET), which is equivalent to viewing an alignment as a Directed Acyclic Graph (DAG) as implemented in the POA alignment program. Gapcleaning procedure, thus, corresponds to minimizing the number of gap-openings across the sequences and to finding a minimal-chain decomposition of the alignment POSET. Such a minimal-chain decomposition (Dilworth decomposition) of an arbitrary POSET can be found in time cubic in the number of nodes. Thanks to the special structure of sequence graphs FSA can find this minimum-indel global alignment in time linear in the number of vertices and edges in the graph with a depth-first search.
Parent program: fsa
FSA is a probabilistic multiple sequence alignment algorithm which uses a 'distance-based' approach to aligning homologous protein, RNA or DNA sequences. Much as distance-based phylogenetic reconstruction methods like Neighbor-Joining build a phylogeny using only pairwise divergence estimates, FSA builds a multiple alignment using only pairwise estimations of homology. FSA can be used for all alignment problems, including: detailed analysis of a single family of proteins or RNAs, large-scale alignment of thousands of sequences and genome alignment of megabases of orthologous sequences.