About ORCAN

ORCAN (ORtholog sCANner) is a web-based meta-server for one-click evolutionary and functional annotation of a protein of interest. It chains together results from 14 tools:

4 orthology predictors
5 on-line databases
5 functional annotation tools

As input, the service accepts a protein sequence in plain text or FASTA format. Once the sequence is submitted, the results of each partial task are reported to the user as soon as that task completes.
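A minimal sketch of how plain-text and FASTA input could be normalized into a single sequence string. This is not ORCAN's own code; the function name `read_protein_sequence` and its error handling are assumptions made for illustration.

```python
def read_protein_sequence(text: str) -> tuple[str, str]:
    """Return (header, sequence) from plain-text or single-record FASTA input.

    Hypothetical helper: the header is empty for plain-text input.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input")

    header = ""
    if lines[0].startswith(">"):
        # FASTA: the first line carries the description, the rest the residues.
        header = lines[0][1:].strip()
        lines = lines[1:]
    if any(line.startswith(">") for line in lines):
        raise ValueError("expected a single sequence")

    # Drop internal whitespace and normalize case.
    sequence = "".join("".join(line.split()) for line in lines).upper()
    if not sequence:
        raise ValueError("no sequence residues found")
    return header, sequence
```

Both `">my_protein\nMKT\nLLV"` and `"mkt llv"` yield the sequence `MKTLLV`; only the first carries a header.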

Resources

#UniProtKB, #Pfam, #Prosite, #PubMed, #Gene Ontology

Orthology predictors & Databases

#RBH, #RSD, #OrthoMCL, #InParanoid, #HOGENOM, #OrthoDB, #OMAbrowser, #eggNOG

MAIN STEPS IN PIPELINE

There are 3 main steps in the pipeline:

1. Provide a protein sequence

Paste your protein sequence (plain/FASTA format).

2. Orthology predictions

ORCAN queries 5 orthology databases and runs 4 orthology prediction tools.

3. Functional annotation

ORCAN gathers information about the potential function of the query and its predicted orthologs.

4 Orthology prediction tools

The user provides a single protein sequence (plain/FASTA format) and specifies the output proteome to search for orthologs. Once the form is submitted, ORCAN runs 4 of the most popular orthology prediction tools: InParanoid 4.1, Reciprocal Best BLAST Hit (RBH), Reciprocal Smallest Distance (RSD) and OrthoMCL. These methods take distinct approaches to predicting ortholog pairs and are among the most representative graph-based methods.

Program	Reference	Short description
InParanoid	Remm et al., J Mol Biol, 2001	InParanoid exploits a BLAST-based strategy to identify orthologs as reciprocal best hits between two species, while applying additional rules to accommodate paralogs arising from duplication after speciation (in-paralogs).
RBH	Altschul et al., Nucleic Acids Res, 1997	Orthologs are assumed if two genes each in a different genome find each other as the best hit in the other genome. ORCAN uses its own implementation.
RSD	Wall et al., Bioinformatics, 2003	RSD relies on global sequence alignment and maximum likelihood estimation of evolutionary distances to detect orthologs between two genomes.
OrthoMCL	Li et al., Genome Research, 2003	OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs.
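The reciprocal best hit criterion described in the table can be sketched as follows. This is an illustration only, not ORCAN's own RBH implementation: the function names and the input layout (a dictionary of pairwise similarity scores, for example BLAST bit scores, in each search direction) are assumptions.

```python
def best_hits(scores):
    """Map each query ID to its highest-scoring subject ID.

    scores: dict mapping (query_id, subject_id) -> similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)   # genome A searched against genome B
    backward = best_hits(b_vs_a)  # genome B searched against genome A
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Made-up scores: a2's best hit is b2, but b2 prefers a1, so only a1/b1 is reciprocal.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 198, ("b2", "a1"): 140, ("b2", "a2"): 80}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Ties are resolved here by keeping the first best score encountered; a production tool would need an explicit tie-breaking rule.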

5 Orthology Databases

ORCAN queries 5 high-quality orthology databases: eggNOG 4.5, HOGENOM, InParanoid 8, OrthoDB 8 and OrthoMCL 5.

Database	Reference	Organisms	Short description
eggNOG 4.5	Huerta-Cepas et al., Nucleic Acids Res, 2015	2,031	A database of orthologous groups and functional annotation
HOGENOM	Penel et al., BMC Bioinformatics, 2009	1,470	Database of Complete Genome Homologous Genes Families
InParanoid 8	Sonnhammer & Östlund, Nucleic Acids Res, 2015	273	Ortholog groups with inparalogs
OrthoDB 8	Kriventseva et al., Nucleic Acids Res, 2015	1,470	The Hierarchical Catalog of Orthologs
OrthoMCL 5	Fischer et al., Current Protocols in Bioinformatics, 2011	150	Orthologous groups of protein sequences

Functional annotation

ORCAN performs 5 additional comparisons between the query protein sequence and the identified orthologs: pairwise sequence alignments, annotation of protein domains, detection of functional motifs, association of Gene Ontology terms and retrieval of relevant articles.

Analysis	Tools	ORCAN Features
global and local pairwise sequence alignment	needle and water (EMBOSS package)	summary of both alignments; textual output of needle and water
annotation and comparison of protein domains	HMMER3 is used against Pfam 29.0 (December 2015, 16295 entries)	interactive visualization of the architecture of protein domains (content and order); comparison of the proteins' domain architectures using a dynamic programming approach (global alignment); textual output of Pfam searches
identification of functional protein motifs	ps_scan (provided by Swiss Institute of Bioinformatics) is used to search the PROSITE data file (all patterns and profiles)	graphical comparison of identified protein motifs between query protein and orthologs; textual output returned by ps_scan
retrieval and comparison of Gene Ontology terms	Own implementation: ORCAN fetches GO terms associated with a given protein from the Gene Ontology database	graphical comparison of GO terms between query and ortholog proteins
retrieval and comparison of articles	Own implementation: ORCAN fetches papers relevant to the query and orthologous proteins from PubMed and UniProtKB	ORCAN uses two approaches to find PubMed articles relevant to the query: (1) the server follows the UniProt records of the query and orthologous proteins to their linked PubMed literature (publications associated with more than 100 protein records are excluded); (2) ORCAN searches the PubMed database directly using gene names (if available) and UniProt accessions.
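The comparison of domain architectures by global alignment can be illustrated with a Needleman-Wunsch style sketch over ordered lists of domain names. The scoring values (match +1, mismatch -1, gap -1), the normalization by the longer architecture, and the function names are all assumptions; the text above does not give ORCAN's actual parameters.

```python
def align_domain_architectures(query, ortholog, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two ordered lists of domain names."""
    rows, cols = len(query) + 1, len(ortholog) + 1
    table = [[0] * cols for _ in range(rows)]
    # Leading gaps: aligning a prefix against nothing costs one gap per domain.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == ortholog[j - 1]
            diagonal = table[i - 1][j - 1] + (match if same else mismatch)
            table[i][j] = max(diagonal, table[i - 1][j] + gap, table[i][j - 1] + gap)
    return table[-1][-1]


def normalized_architecture_score(query, ortholog):
    """Scale the alignment score to [0, 1]; None if neither protein has domains."""
    longest = max(len(query), len(ortholog))
    if longest == 0:
        return None  # nothing to compare
    raw = align_domain_architectures(query, ortholog)
    return max(0, raw) / longest  # assumed normalization: clamp, then divide


# Example: the ortholog lacks the N-terminal SH3 domain of the query.
score = normalized_architecture_score(["SH3", "SH2", "Pkinase"], ["SH2", "Pkinase"])
```

Because the alignment operates on ordered lists, it is sensitive to both domain content and domain order, which matches the "content and order" comparison described in the table.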

Orthology Ranking system

ORCAN uses a plurality-based rating system with scores ranging from 1 to 10, with 10 indicating the most evolutionarily and functionally relevant hits. Note that the rating system aims to provide an intuitive indicator of the level of similarity, not to act as a statistical predictor of functionality. Seven features are considered in the rating system:

orthology predictions (InParanoid, OrthoMCL, RBH, RSD)
orthologs retrieved from the databases
pairwise alignment between the query and hit sequences
content and order of protein domains
post-translational modifications
PubMed citations linked to the protein in the current version of NCBI databases
Gene Ontology terms

This feature is especially useful when orthology tools predict different, non-overlapping sets of orthologs for a given query. By default, all 14 analyses contribute equally to the rating system. However, by clicking on a given orthologous assignment, the user can assign weights (integers from 0 to 10) to the different tools and thus adjust their level of contribution.

Feature	Criterion	Position: True	Position: False	Position: N/A	Weight: Default	Weight: Custom
Orthology prediction tools	InParanoid	+1	0	-	1	Integers from 0 to 10
	OrthoMCL	+1	0	-	1
	RBH	+1	0	-	1
	RSD	+1	0	-	1
Orthology databases	eggNOG	+1	0	-	1
	HOGENOM	+1	0	-	1
	InParanoid	+1	0	-	1
	OrthoDB	+1	0	-	1
	OMAbrowser	+1	0	-	1
Global pairwise alignment between query and ortholog	Identity X%	+X/100 (from 0 to 1)	0	-	1
Global alignment of protein domains present in query and ortholog	Depending on score	+normalized score (from 0 to 1)	0	+0.5	1
Fractions of motifs shared by query and ortholog	X%	+X/100 (from 0 to 1)	0	+0.5	1
Fractions of GO terms shared by query and ortholog	X%	+X/100 (from 0 to 1)	0	+0.5	1
Fractions of PubMed articles shared by query and ortholog	X%	+X/100 (from 0 to 1)	0	-	1
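One way to combine the per-criterion points from the table with user-assigned weights is a weighted average, sketched below. The formula, the function name `weighted_rating` and the criterion labels are assumptions; the page does not describe how ORCAN maps the combined points onto its 1 to 10 scale, so this sketch stops at a fraction between 0 and 1.

```python
def weighted_rating(points, weights=None):
    """Weighted average of per-criterion points.

    points:  dict mapping criterion name -> points in [0, 1]
             (1/0 for tool and database calls, fractions for the other rows).
    weights: dict mapping criterion name -> integer weight from 0 to 10;
             criteria not listed use the default weight of 1.
    """
    weights = weights or {}
    total = 0.0
    weight_sum = 0
    for criterion, value in points.items():
        weight = weights.get(criterion, 1)
        if not isinstance(weight, int) or not 0 <= weight <= 10:
            raise ValueError(f"weight for {criterion!r} must be an integer from 0 to 10")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"points for {criterion!r} must lie between 0 and 1")
        total += weight * value
        weight_sum += weight
    # If every weight is 0 there is nothing to average.
    return total / weight_sum if weight_sum else 0.0


# Illustrative values only: two tools agree, one disagrees, identity is 50%.
points = {"InParanoid": 1, "OrthoMCL": 0, "RBH": 1, "identity": 0.5}
default_rating = weighted_rating(points)
custom_rating = weighted_rating(points, {"OrthoMCL": 0, "identity": 3})
```

Setting a weight to 0 removes that criterion from the rating entirely, while a larger weight lets one analysis dominate the others.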

Updates

ORCAN operates on the latest protein sequence data: as soon as UniProt Complete Proteomes releases a new set of reference proteomes, the data are integrated automatically without stopping or disrupting the web services.

Likewise, ORCAN automatically synchronizes data from Pfam and Prosite databases.