OrthomclAdjustFasta creates an OrthoMCL-compliant .fasta file by adjusting definition lines. Fields in the definition line must be separated by either a space or *|*. Any spaces immediately following the *>* are ignored. The first field is 1. For example, in the following definition line, the ID (AP_000668.1) is in field 4: >gi|89106888|ref|AP_000668.1|. If your input files do not meet these requirements, you can do some simple perl or awk processing of them to create the required input files for this program, or to create the required output files directly. This program is provided as a convenience, but OrthoMCL users are expected to have the scripting skills to provide OrthoMCL-compliant .fasta files.
OrthoMCL is a genome-scale algorithm for grouping orthologous protein sequences. It provides not only groups shared by two or more species or genomes, but also groups representing species-specific gene expansion families, so it serves as an important utility for automated eukaryotic genome annotation. OrthoMCL starts with reciprocal best hits within each genome as potential in-paralog or recent paralog pairs, and reciprocal best hits across any two genomes as potential ortholog pairs. Related proteins are interlinked in a similarity graph. MCL (the Markov Clustering algorithm; Van Dongen 2000; www.micans.org/mcl) is then invoked to split mega-clusters. This process is analogous to the manual review in COG construction. MCL clustering is based on weights between each pair of proteins, so to correct for differences in evolutionary distance the weights are normalized before running MCL.
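The definition-line rules above (fields separated by a space or *|*, spaces after *>* ignored, first field is 1) can be sketched in a few lines of Python for users who prefer to script the adjustment themselves. This is only an illustration, not the program's own code: the function names are hypothetical, the taxon code "eco" is made up, the output form >taxon|id is an assumption about what an OrthoMCL-compliant definition line looks like, and treating a run of separators as one separator is also an assumption.

```python
import re


def extract_field(defline, id_field):
    """Return field number `id_field` (1-based) of a FASTA definition line."""
    if not defline.startswith(">"):
        raise ValueError("definition lines must start with '>'")
    # Spaces immediately following '>' are ignored; the line ending is dropped too.
    body = defline[1:].strip()
    # Fields are separated by a space or '|'.
    # Assumption: a run of separators counts as a single separator.
    fields = [f for f in re.split(r"[ |]+", body) if f]
    if not 1 <= id_field <= len(fields):
        raise ValueError(
            "line has %d fields, asked for field %d" % (len(fields), id_field)
        )
    return fields[id_field - 1]


def adjust_defline(defline, taxon_code, id_field):
    """Rewrite one definition line as '>taxon|id' (assumed compliant form)."""
    return ">%s|%s" % (taxon_code, extract_field(defline, id_field))


def adjust_fasta(lines, taxon_code, id_field):
    """Yield adjusted lines; sequence lines pass through unchanged."""
    for line in lines:
        if line.startswith(">"):
            yield adjust_defline(line, taxon_code, id_field)
        else:
            yield line.rstrip("\n")


# The example from the text: the ID is in field 4.
print(adjust_defline(">gi|89106888|ref|AP_000668.1|", "eco", 4))  # >eco|AP_000668.1
```

Raising an error when the requested field does not exist is a design choice of this sketch; it keeps a mis-specified field number from silently producing empty IDs.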
Parent program: orthomcl
OrthoMCL allows automatic identification of orthologous groups. It uses reciprocal (one-to-one) NCBI BLAST and Markov clustering in conjunction with a relational database (MySQL), which stores the intermediate data. OrthoMCL produces results similar to INPARANOID when applied to two genomes. It differs from the EGO strategy and the COG algorithm commonly applied to prokaryotic genomes, and allows improved recognition of 'recent' paralogs.
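The reciprocal (one-to-one) best-hit idea mentioned above can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not OrthoMCL's implementation: the protein names and similarity scores are invented, a single higher-is-better score stands in for real BLAST output, ties are not handled, and no relational database or MCL step is involved.

```python
def best_hits(scores):
    """Map each query protein to its highest-scoring subject protein.

    `scores` maps (query, subject) pairs to a similarity score
    (assumption: higher means more similar).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores between proteins of genome A and genome B.
a_vs_b = {("a1", "b1"): 200.0, ("a1", "b2"): 50.0, ("a2", "b2"): 80.0}
b_vs_a = {("b1", "a1"): 190.0, ("b2", "a1"): 60.0, ("b2", "a2"): 55.0}

print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Here a2's best hit is b2, but b2's best hit is a1, so only the a1/b1 pair is reciprocal and would be treated as a potential ortholog pair.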