The server queries 5 high-quality orthology databases (eggNOG, OMA, OrthoDB, HOGENOM, InParanoid 8) and runs 4 popular orthology prediction tools (InParanoid 4.1, OrthoMCL, RBH, RSD) against the latest version of UniProt Reference Proteomes.
The server then performs 5 additional comparisons between the query and the orthologous sequences:
Finally, ORCAN assesses the orthologous assignments using a plurality-based rating system with scores ranging from 1 to 10.
The orthology prediction tools and databases integrated in ORCAN often provide different orthologous assignments for a given query sequence. Some resources may also return no prediction at all for the user's query. So how does ORCAN behave in such cases? First, ORCAN collects all unique orthologous assignments returned by the prediction tools and databases. Then, for each orthologous assignment, ORCAN counts the number of tools supporting that prediction and presents the overall rating score.
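The collection and counting steps described above can be sketched in a few lines. This is a minimal illustration, not ORCAN's code: the resource outputs and protein IDs below are hypothetical, and an empty set stands for a resource that returned no prediction for the query.

```python
from collections import defaultdict

# Hypothetical outputs: resource name -> set of predicted ortholog IDs.
# An empty set models a resource that has no prediction for the query.
predictions = {
    "eggNOG": {"P1", "P2"},
    "OMA": {"P1"},
    "OrthoDB": {"P1", "P3"},
    "RBH": {"P1"},
    "RSD": set(),
}

def count_support(predictions):
    """Collect unique assignments: {ortholog_id: sorted list of supporting resources}."""
    support = defaultdict(list)
    for resource, orthologs in predictions.items():
        for ortholog in orthologs:
            support[ortholog].append(resource)
    return {ortholog: sorted(resources) for ortholog, resources in support.items()}

def rank(support):
    """Order assignments by number of supporters (most first), then by ID."""
    return sorted(support.items(), key=lambda item: (-len(item[1]), item[0]))

for ortholog, resources in rank(count_support(predictions)):
    print(ortholog, len(resources), resources)
```

Resources without a prediction simply contribute no votes, so they lower no one's count; how ORCAN turns the counts into its 1 to 10 score is not reproduced here.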
You can see how it all works by running the 'Example 2' demonstration on the ORCAN submission page. In this example, 4 orthology prediction tools and 5 orthology databases predict 3 potential orthologs. ORCAN presents a list of all predictions along with their overall confidence level.
The rating system assigns a score (from 1 to 10) to each orthology prediction. This score reflects how consistently the given prediction is made across the various databases and tools. For each orthologous assignment, ORCAN computes the overall score from seven different features:
The rating system is useful when the tools integrated in ORCAN predict different orthologs for a given query.
By default, all 13 analyses contribute equally to the rating system. However, you can assign weights (integers from 1 to 10) to individual tools, thereby adjusting how much each contributes to the overall score for a given orthologous assignment. A weight of 10 means that a tool is 10 times more important than a tool with a weight of 1.
If you want to provide your own weights for certain tools, click on the icon on the submission or results page. Your configuration will be saved for all further ortholog searches.
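To make the effect of weights concrete, here is a minimal sketch of a weighted plurality score. The linear mapping of the weighted share of supporting analyses onto the 1 to 10 scale is an assumption chosen for illustration; it is not ORCAN's published formula, and the analysis names are only examples.

```python
# Assumption: analyses without a user-defined weight count with weight 1,
# matching the "contribute equally" default described above.
DEFAULT_WEIGHT = 1

def weighted_score(supporters, analyses, weights=None):
    """Map the weighted share of supporting analyses onto a 1-10 scale.

    supporters: names of analyses supporting one orthologous assignment.
    analyses:   names of all analyses taken into account.
    weights:    optional {name: integer 1..10}; unlisted names use DEFAULT_WEIGHT.
    The linear 1-10 mapping is a hypothetical illustration, not ORCAN's formula.
    """
    weights = weights or {}
    for name, w in weights.items():
        if not (isinstance(w, int) and 1 <= w <= 10):
            raise ValueError(f"weight for {name} must be an integer from 1 to 10")
    total = sum(weights.get(a, DEFAULT_WEIGHT) for a in analyses)
    support = sum(weights.get(a, DEFAULT_WEIGHT) for a in analyses if a in supporters)
    # 0% support -> 1, 100% support -> 10 (Python's round() rounds halves to even).
    return 1 + round(9 * support / total)

analyses = ["eggNOG", "OMA", "OrthoDB", "RBH", "RSD"]
print(weighted_score({"OMA", "RBH"}, analyses))                 # equal weights
print(weighted_score({"OMA", "RBH"}, analyses, {"OMA": 10}))    # OMA weighted 10x
```

With equal weights, support from 2 of 5 analyses gives a score of 5; giving OMA a weight of 10 raises the same assignment to 8, because OMA's vote now outweighs the four unweighted analyses combined.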
The orthology prediction tools (InParanoid, OrthoMCL and RSD) integrated in ORCAN can predict one-to-many and many-to-many orthologous relationships for a given query. Likewise, most databases used in ORCAN can also return clusters of orthologs. If this is the case, ORCAN will present a list of all potential orthologs with their corresponding confidence level.
You can see how it all works by running the 'Example 2' demonstration on the ORCAN submission page. In this example, 4 orthology prediction tools and 5 orthology databases predict 3 potential orthologs.
ORCAN uses the original implementations of OrthoMCL 2.0.9 and InParanoid 4.1 with the default parameters recommended by their developers. They run faster in ORCAN because the orthology prediction is performed on a pre-selected set of proteins that show recognizable sequence similarity to the sequence of interest, rather than on the full set of proteins, most of which are functionally and evolutionarily unrelated to the query sequence.
Technically, the search for orthologs in species B of a query protein A1 from species A works as follows. The protein sequence A1 is used as the query in two BLAST searches, one against proteome A and one against proteome B. From both BLAST results, we collect all protein sequences that might be evolutionarily relevant to the query protein: the 20 highest-scoring protein hits, plus all other protein matches (if any) with an e-value below 1e-5. This e-value cut-off is commonly used in searches for homologous proteins (e.g. it is the default in OrthoMCL, RSD and Pfam). In addition, selecting the 20 top-ranked proteins regardless of e-value ensures that the true ortholog of the query protein (if one exists) is among the selected proteins, even in searches involving phylogenetically distant organisms (e.g. human and bacteria). In the next step, the FASTA sequences of the selected proteins are retrieved from proteomes A and B and used to create the corresponding datasets A′ and B′. These datasets are then used as input to OrthoMCL 2.0.9 and InParanoid 4.1. Once the orthology prediction is finished, ORCAN parses the output files and reports the potential orthologs and co-orthologs for the user's query sequence.
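The hit-selection rule above (top 20 hits plus every hit with e-value below 1e-5) and the construction of a reduced dataset can be sketched as follows. This is an illustration under stated assumptions, not ORCAN's pipeline code: it assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns, e-value in column 11 and bit score in column 12), ranks hits by bit score, and assumes that subject IDs in the BLAST output match the first word of the FASTA headers. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

TOP_N = 20            # always keep the 20 highest-scoring hits
EVALUE_CUTOFF = 1e-5  # plus every other hit with an e-value below this

def best_hits(blast_tab_lines: Iterable[str]) -> Dict[str, Tuple[float, float]]:
    """Parse BLAST -outfmt 6 lines into {subject_id: (best_bitscore, its_evalue)}."""
    hits: Dict[str, Tuple[float, float]] = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        subject, evalue, bitscore = fields[1], float(fields[10]), float(fields[11])
        prev = hits.get(subject)
        # A subject can appear in several HSPs; keep only its best one.
        if prev is None or bitscore > prev[0]:
            hits[subject] = (bitscore, evalue)
    return hits

def select_hits(hits: Dict[str, Tuple[float, float]]) -> Set[str]:
    """Union of the TOP_N best-scoring subjects and all subjects below the cut-off."""
    ranked = sorted(hits, key=lambda s: (-hits[s][0], s))
    selected = set(ranked[:TOP_N])
    selected.update(s for s, (_, evalue) in hits.items() if evalue < EVALUE_CUTOFF)
    return selected

def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record ID (first word after '>') to its sequence."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif line and current is not None:
            records[current].append(line)
    return {rid: "".join(parts) for rid, parts in records.items()}

def subset_fasta(proteome: Dict[str, str], ids: Iterable[str]) -> str:
    """Return FASTA text for the selected IDs that exist in the proteome."""
    return "".join(f">{rid}\n{proteome[rid]}\n" for rid in sorted(ids) if rid in proteome)
```

Applied once to the search against proteome A and once to the search against proteome B, `subset_fasta` would yield the reduced datasets A′ and B′ that are then handed to OrthoMCL and InParanoid.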
Of note, the four popular orthology prediction algorithms integrated in ORCAN (i.e. InParanoid, OrthoMCL, RBH and RSD) are so-called graph-based methods: they use different variants (depending on the underlying algorithm) of reciprocal BLAST searches to find the ‘nearest neighbor’ (Kuzniar et al., Trends in Genetics, 2008). To be methodologically correct, all graph-based programs require complete proteomes for orthology inference. However, a graph-based approach does not require predicting all orthologs between two proteomes when the user is interested in only one protein. For example, in the simplest RBH case, one can BLAST a protein of interest against the subject's complete proteome, then take the best hit and BLAST it back against the query's complete proteome. Likewise, the RSD program lets users find orthologs for specific sequences in the query genome without the time-consuming step of calculating all other orthologs.
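The single-protein RBH procedure described above can be sketched as follows. To keep the example self-contained, the two BLAST searches are replaced by hypothetical precomputed score tables (`a_vs_b`, `b_vs_a`, holding bit scores); in practice each lookup would be a BLAST run of one sequence against a complete proteome.

```python
from typing import Dict, Optional

# Hypothetical stand-in for BLAST results: scores[query_id][subject_id] = bit score.
Scores = Dict[str, Dict[str, float]]

def best_hit(query: str, scores: Scores) -> Optional[str]:
    """Return the highest-scoring subject for the query, or None if it has no hits."""
    hits = scores.get(query, {})
    if not hits:
        return None
    return max(hits, key=hits.get)

def rbh_ortholog(query: str, a_vs_b: Scores, b_vs_a: Scores) -> Optional[str]:
    """Reciprocal best hit for a single query protein from proteome A.

    1. Search the query against the complete proteome B; take its best hit.
    2. Search that hit back against the complete proteome A.
    3. Report the hit as an ortholog only if the back search returns the query.
    """
    candidate = best_hit(query, a_vs_b)
    if candidate is None:
        return None
    return candidate if best_hit(candidate, b_vs_a) == query else None

a_vs_b = {"A1": {"B1": 300.0, "B2": 120.0}, "A2": {"B1": 150.0}}
b_vs_a = {"B1": {"A1": 310.0, "A2": 140.0}}
print(rbh_ortholog("A1", a_vs_b, b_vs_a))  # B1 is reciprocal
print(rbh_ortholog("A2", a_vs_b, b_vs_a))  # None: B1's best back-hit is A1
```

Only the query and its single best hit are searched, so answering the question for one protein costs two searches rather than an all-against-all comparison of both proteomes, while each search still runs against a complete proteome as the method requires.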