ORCAN (ORtholog sCANner) is a web-based meta-server for one-click evolutionary and functional annotation of a protein of interest by chaining together results from 14 tools.
As input, the service accepts a sequence in plain text or FASTA format. Once the sequence is submitted, the results of each completed partial task are reported to the user directly.
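The two accepted input formats can be normalized with a few lines of code. The following is a minimal sketch, not ORCAN's own parser: the function name `read_protein_sequence` and its validation rules (single record only, whitespace stripped, residues upper-cased) are assumptions made for illustration.

```python
from typing import Tuple


def read_protein_sequence(text: str) -> Tuple[str, str]:
    """Return (header, sequence) from plain-text or single-record FASTA input.

    Hypothetical helper: the header is empty for plain-text input.
    """
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input")

    header = ""
    if lines[0].startswith(">"):
        # FASTA: the first line is the description line.
        header = lines[0][1:].strip()
        lines = lines[1:]

    # A second '>' line would mean more than one record.
    if any(line.startswith(">") for line in lines):
        raise ValueError("expected a single sequence")

    # Join wrapped lines and remove any internal whitespace.
    sequence = "".join("".join(line.split()) for line in lines).upper()
    if not sequence:
        raise ValueError("no sequence residues found")
    return header, sequence
```

For example, `read_protein_sequence(">query\nMKV\nLAA")` and `read_protein_sequence("mkv laa")` both yield the sequence `MKVLAA`, the first with the header `query` and the second with an empty header.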
There are 3 main steps in the pipeline.
The user provides a single protein sequence (plain text or FASTA format) and specifies the output proteome to search for orthologs. Once the form is submitted, ORCAN runs 4 of the most popular orthology prediction tools: InParanoid 4.1, Reciprocal Best BLAST Hit (RBH), Reciprocal Smallest Distance (RSD) and OrthoMCL. These methods take distinct approaches to predicting ortholog pairs and are among the most representative graph-based methods.
Program | Reference | Short description |
---|---|---|
InParanoid | Remm et al., J Mol Biol, 2001 | InParanoid exploits a BLAST-based strategy to identify orthologs as reciprocal best hits between two species, while applying additional rules to accommodate paralogs arising from duplication after speciation (in-paralogs). |
RBH | Altschul et al., Nucleic Acids Res, 1997 | Orthologs are assumed if two genes each in a different genome find each other as the best hit in the other genome. ORCAN uses its own implementation. |
RSD | Wall et al., Bioinformatics, 2003 | RSD relies on global sequence alignment and maximum likelihood estimation of evolutionary distances to detect orthologs between two genomes |
OrthoMCL | Li et al., Genome Research, 2003 | OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. |
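The reciprocal best hit criterion described in the table can be illustrated in a few lines. This is a minimal sketch and not ORCAN's own RBH implementation: it assumes the similarity search has already been run in both directions and that each hit is available as a hypothetical `(query_id, subject_id, bit_score)` tuple.

```python
def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit.

    a_vs_b: hits from searching proteome A against proteome B.
    b_vs_a: hits from searching proteome B against proteome A.
    Each hit is a (query_id, subject_id, bit_score) tuple (assumed layout).
    """

    def best_hits(hits):
        # Keep the highest-scoring subject for every query.
        best = {}
        for query, subject, score in hits:
            if query not in best or score > best[query][1]:
                best[query] = (subject, score)
        return {query: subject for query, (subject, _) in best.items()}

    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)

    # A pair is reciprocal when b's best hit in A is a again.
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0), ("a2", "b2", 90.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 140.0), ("b2", "a2", 80.0)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Here `a2` finds `b2` as its best hit, but `b2` prefers `a1`, so only the pair `("a1", "b1")` is retained. Ties are resolved in favour of the first hit seen, which is a simplification.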
ORCAN queries 5 high-quality orthology databases: eggNOG 4.5, HOGENOM, InParanoid 8, OrthoDB 8 and OrthoMCL 5.
Database | Reference | Organisms | Short description |
---|---|---|---|
eggNOG 4.5 | Huerta-Cepas et al., Nucleic Acids Res, 2015 | 2,031 | A database of orthologous groups and functional annotation |
HOGENOM | Penel et al., BMC Bioinformatics, 2009 | 1,470 | Database of Complete Genome Homologous Genes Families |
InParanoid 8 | Sonnhammer & Östlund, Nucleic Acids Res, 2015 | 273 | Ortholog groups with inparalogs |
OrthoDB 8 | Kriventseva et al., Nucleic Acids Res, 2015 | 1,470 | The Hierarchical Catalog of Orthologs |
OrthoMCL 5 | Fischer et al., Current Protocols in Bioinformatics, 2011 | 150 | Orthologous groups of protein sequences |
ORCAN performs 5 additional comparisons between the query protein sequence and the identified orthologs: pairwise sequence alignments, annotation of protein domains, detection of functional motifs, association of Gene Ontology terms and retrieval of relevant articles.
Analysis | Tools | ORCAN Features |
---|---|---|
global and local pair-wise sequence alignment | needle and water (EMBOSS package) | |
annotation and comparison of protein domains | HMMER3 is used against Pfam 29.0 (December 2015, 16,295 entries) | |
identification of functional protein motifs | ps_scan (provided by the Swiss Institute of Bioinformatics) is used to search the PROSITE data file (all patterns and profiles) | |
retrieval and comparison of Gene Ontology terms | Own implementation: ORCAN fetches the GO terms associated with a given protein from the Gene Ontology Database | |
retrieval and comparison of articles | Own implementation: ORCAN fetches papers relevant to the query and orthologous proteins from PubMed and UniProtKB | ORCAN uses two approaches to find PubMed articles relevant to the query: |
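The global alignment step yields a percent identity between the query and each ortholog. As a minimal sketch, the value can be computed from two gapped, equal-length alignment rows as shown here. The function name is hypothetical, and the convention used (identical columns divided by alignment length, gaps included) is an assumption about how the figure is defined, not something this page states.

```python
def percent_identity(aligned_query: str, aligned_ortholog: str) -> float:
    """Percent of alignment columns holding the same residue in both rows.

    Both arguments are rows of one pairwise alignment, with '-' for gaps.
    """
    if len(aligned_query) != len(aligned_ortholog):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        return 0.0

    # Count columns where both rows carry the same residue (never a gap).
    matches = sum(
        1
        for q, o in zip(aligned_query, aligned_ortholog)
        if q == o and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


identity = percent_identity("MKV-LA", "MKVQLS")  # 4 identical columns out of 6
```

Dividing the result by 100 gives a value between 0 and 1.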
ORCAN uses a plurality-based rating system with scores ranging from 1 to 10, with 10 indicating the most evolutionarily and functionally relevant hits. Note that the rating system aims to provide an intuitive indicator of the level of similarity and does not act as a statistical predictor of functionality. Seven features are considered in the rating system; they are listed in the table below.
The rating is especially useful when orthology tools predict different and non-overlapping sets of orthologs for a given query. By default, all 14 analyses contribute equally to the rating system. However, by clicking on a given orthologous assignment, the user can assign weights (integers from 0 to 10) to different tools, thus adjusting their level of contribution.
Feature | Criterion | Position: True | Position: False | Position: N/A | Weight: Default | Weight: Custom |
---|---|---|---|---|---|---|
Orthology prediction tools | InParanoid | +1 | 0 | - | 1 | Integers from 0 to 1 |
Orthology prediction tools | OrthoMCL | +1 | 0 | - | 1 | |
Orthology prediction tools | RBH | +1 | 0 | - | 1 | |
Orthology prediction tools | RSD | +1 | 0 | - | 1 | |
Orthology databases | eggNOG | +1 | 0 | - | 1 | |
Orthology databases | HOGENOM | +1 | 0 | - | 1 | |
Orthology databases | InParanoid | +1 | 0 | - | 1 | |
Orthology databases | OrthoDB | +1 | 0 | - | 1 | |
Orthology databases | OMAbrowser | +1 | 0 | - | 1 | |
Global pairwise alignment between query and ortholog | Identity X% | +X/100 (from 0 to 1) | 0 | - | 1 | |
Global alignment of protein domains present in query and ortholog | Depending on score | +normalized score (from 0 to 1) | 0 | +0.5 | 1 | |
Fraction of motifs shared by query and ortholog | X% | +X/100 (from 0 to 1) | 0 | +0.5 | 1 | |
Fraction of GO terms shared by query and ortholog | X% | +X/100 (from 0 to 1) | 0 | +0.5 | 1 | |
Fraction of PubMed articles shared by query and ortholog | X% | +X/100 (from 0 to 1) | 0 | - | 1 | |
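The way weights change each row's contribution can be sketched as a weighted average. This is an illustration under stated assumptions, not ORCAN's actual formula: the page does not say how the per-feature points are combined into the final rating, so the linear rescaling to a 10-point scale (and the function name `weighted_rating`) are hypothetical. Unlike the 1 to 10 range quoted above, this sketch can return 0.

```python
def weighted_rating(points, weights=None, scale=10.0):
    """Combine per-feature points into one rating on a 0..scale range.

    points:  feature name -> points in [0, 1], as in the table rows.
    weights: feature name -> non-negative weight; missing names default to 1.
    The rescaling by the sum of weights is an assumption for illustration.
    """
    weights = weights or {}
    total = 0.0
    max_total = 0.0
    for name, value in points.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"points for {name!r} must lie in [0, 1]")
        weight = weights.get(name, 1)
        if weight < 0:
            raise ValueError(f"weight for {name!r} must be non-negative")
        total += weight * value
        max_total += weight  # each feature can contribute at most its weight

    if max_total == 0:
        return 0.0  # every feature was switched off
    return scale * total / max_total


points = {"InParanoid": 1.0, "RBH": 1.0, "RSD": 0.0, "OrthoMCL": 0.0}
default_rating = weighted_rating(points)
custom_rating = weighted_rating(points, weights={"RSD": 0, "OrthoMCL": 0})
```

With equal weights, two of four tools agreeing gives half of the scale. Setting the weights of the two dissenting tools to 0 removes them from both the numerator and the denominator, so the same ortholog then receives the maximum rating.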
ORCAN operates on the latest protein sequence data: as soon as UniProt Complete Proteomes releases a new set of reference proteomes, the data are integrated automatically without stopping or disrupting the web services.
Likewise, ORCAN automatically synchronizes data from the Pfam and PROSITE databases.