Learn about ORCAN


About ORCAN

ORCAN (ORtholog sCANner) is a web-based meta-server for one-click evolutionary and functional annotation of a protein of interest. It chains together the results of 14 tools:

  • 4 orthology predictors
  • 5 on-line databases
  • 5 functional annotation tools

The service accepts a single protein sequence in plain text or FASTA format as input. Once the sequence is submitted, the result of each partial task is reported to the user as soon as that task completes.
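Accepting both formats comes down to normalizing the input into one header and one sequence. The sketch below shows one way to do this in Python. The function name `parse_query`, the default header `"query"`, the accepted alphabet, and the choices to drop digits and whitespace and to reject multi-record input are assumptions for illustration, not ORCAN's documented behavior.

```python
from __future__ import annotations

# IUPAC one-letter amino acid codes incl. ambiguity codes, plus "*" for stop (assumed accepted).
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYBZJUOX*")


def parse_query(text: str) -> tuple[str, str]:
    """Return (header, sequence) for plain-text or single-record FASTA input.

    Hypothetical helper: whitespace and digits (e.g. position numbers pasted
    along with the sequence) are dropped and letters are upper-cased.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    headers = [line for line in lines if line.startswith(">")]
    if len(headers) > 1:
        raise ValueError("only a single protein sequence is accepted")
    header = headers[0][1:].strip() if headers else ""
    body = "".join(line for line in lines if not line.startswith(">"))
    sequence = "".join(ch for ch in body if not ch.isspace() and not ch.isdigit()).upper()
    if not sequence:
        raise ValueError("empty sequence")
    invalid = set(sequence) - ALLOWED
    if invalid:
        raise ValueError(f"unexpected characters: {''.join(sorted(invalid))}")
    # Plain-text input has no header, so fall back to a generic name.
    return header or "query", sequence
```

For example, `parse_query(">sp|P69905|HBA_HUMAN\nMVLSPADKTN\nVKAAWGKVGA")` returns the header `sp|P69905|HBA_HUMAN` and the joined sequence `MVLSPADKTNVKAAWGKVGA`, while the same residues pasted as plain text get the header `query`.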


Main Steps in the Pipeline

The pipeline consists of three main steps:

1. Provide a protein sequence
Paste your protein sequence (plain/FASTA format).
2. Orthology predictions
ORCAN queries 5 orthology databases and runs 4 orthology prediction tools.
3. Functional annotation
ORCAN gathers information about the potential function of the query and its predicted orthologs.
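Because partial results are reported as soon as each task completes, steps 2 and 3 can be pictured as two stages of independent tasks: all orthology predictors run concurrently on the query, and the annotation tasks then run on the query plus the union of predicted orthologs. The sketch below uses Python's `concurrent.futures`; the names `run_stage`, `run_pipeline` and the `report` callback are hypothetical, and the tasks stand in for wrappers around the real tools and database queries, which are not shown.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable

Task = Callable[[Any], Any]
Reporter = Callable[[str, Any], None]


def run_stage(tasks: dict[str, Task], arg: Any, report: Reporter) -> dict[str, Any]:
    """Run independent tasks concurrently and report each result as soon as it is ready."""
    results: dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = {pool.submit(task, arg): name for name, task in tasks.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # a failed task must not block the others
                results[name] = exc
            report(name, results[name])
    return results


def run_pipeline(sequence: str, predictors: dict[str, Task],
                 annotators: dict[str, Task], report: Reporter):
    """Step 2 (orthology predictions), then step 3 (functional annotation)."""
    predictions = run_stage(predictors, sequence, report)
    # Union of all predicted orthologs; predictors that raised are skipped.
    orthologs = sorted(
        {acc for hits in predictions.values() if not isinstance(hits, Exception) for acc in hits}
    )
    annotations = run_stage(annotators, (sequence, orthologs), report)
    return predictions, annotations
```

Collecting failures as values rather than letting them propagate mirrors the idea that one slow or broken tool should not hold back the results the user can already see.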

ORCAN's Main Features

4 Orthology Prediction Tools

The user provides a single protein sequence (plain/FASTA format) and specifies the target proteome in which to search for orthologs. Once the form is submitted, ORCAN runs four of the most popular orthology prediction tools: InParanoid 4.1, Reciprocal Best BLAST Hit (RBH), Reciprocal Smallest Distance (RSD) and OrthoMCL. These methods take distinct approaches to predicting ortholog pairs and are among the most representative graph-based methods.
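The RBH criterion is simple enough to show directly: two genes are orthologs if each is the other's best hit. ORCAN uses its own RBH implementation, which is not reproduced here. This is a generic sketch that assumes both BLAST searches (query proteome against target proteome and back) have already been run and parsed into `{query: [(subject, bit_score), ...]}` tables; the function names are hypothetical.

```python
from __future__ import annotations


def best_hits(hits: dict[str, list[tuple[str, float]]]) -> dict[str, str]:
    """Map each query to its highest-scoring subject (the first one wins on ties)."""
    return {query: max(subjects, key=lambda hit: hit[1])[0]
            for query, subjects in hits.items() if subjects}


def reciprocal_best_hits(a_vs_b: dict[str, list[tuple[str, float]]],
                         b_vs_a: dict[str, list[tuple[str, float]]]) -> set[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}
```

With `a_vs_b = {"a1": [("b1", 250.0), ("b2", 90.0)], "a2": [("b1", 120.0)]}` and `b_vs_a = {"b1": [("a1", 248.0), ("a2", 118.0)]}`, only `("a1", "b1")` is returned: `a2` also hits `b1` best, but `b1` prefers `a1`. Real pipelines usually add an E-value cutoff and a tie rule; both are omitted here.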

| Program | Reference | Short description |
|---|---|---|
| InParanoid | Remm et al., J Mol Biol, 2001 | InParanoid uses a BLAST-based strategy to identify orthologs as reciprocal best hits between two species, applying additional rules to accommodate paralogs that arose by duplication after speciation (in-paralogs). |
| RBH | Altschul et al., Nucleic Acids Res, 1997 | Two genes in different genomes are considered orthologs if each finds the other as its best hit in the other genome. ORCAN uses its own implementation. |
| RSD | Wall et al., Bioinformatics, 2003 | RSD uses global sequence alignment and maximum likelihood estimation of evolutionary distances to detect orthologs between two genomes. |
| OrthoMCL | Li et al., Genome Research, 2003 | OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using the Markov Cluster algorithm to group (putative) orthologs and paralogs. |
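OrthoMCL builds a graph from all-against-all BLAST similarities (with its own edge-weight normalization, not shown) and splits it with the Markov Cluster (MCL) algorithm. The pure-Python sketch below implements only the MCL steps: add self-loops, make the matrix column-stochastic, then alternate expansion (matrix power) and inflation (element-wise power plus renormalization) until the matrix stops changing. The function name and parameter defaults are illustrative, and the cubic-time matrix product limits it to toy graphs.

```python
from __future__ import annotations


def _normalize_columns(m: list[list[float]]) -> None:
    """Scale every column to sum to 1 (column-stochastic matrix), in place."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total


def _matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency: list[list[float]], expansion: int = 2, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> list[list[int]]:
    """Cluster a symmetric weighted graph with the Markov Cluster (MCL) algorithm."""
    n = len(adjacency)
    if n == 0:
        return []
    # Add self-loops, then turn edge weights into transition probabilities.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    _normalize_columns(m)
    for _ in range(max_iter):
        previous = m
        # Expansion: the e-th matrix power spreads flow along random walks of length e.
        expanded = m
        for _ in range(expansion - 1):
            expanded = _matmul(expanded, m)
        # Inflation: powering entries and renormalizing strengthens strong flows.
        m = [[value ** inflation for value in row] for row in expanded]
        _normalize_columns(m)
        if max(abs(m[i][j] - previous[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    # Attractor rows keep non-zero mass; the columns each one attracts form a cluster.
    clusters: list[list[int]] = []
    for i in range(n):
        members = [j for j in range(n) if m[i][j] > 1e-6]
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

Higher `inflation` values yield smaller, tighter clusters; two disconnected triangles, for instance, come back as the clusters `[0, 1, 2]` and `[3, 4, 5]`.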

5 Orthology Databases

ORCAN queries 5 high-quality orthology databases: eggNOG 4.5, HOGENOM, InParanoid 8, OrthoDB 8 and OrthoMCL 5.

| Database | Reference | Organisms | Short description |
|---|---|---|---|
| eggNOG 4.5 | Huerta-Cepas et al., Nucleic Acids Res, 2015 | 2,031 | A database of orthologous groups and functional annotation |
| HOGENOM | Penel et al., BMC Bioinformatics, 2009 | 1,470 | Database of Complete Genome Homologous Genes Families |
| InParanoid 8 | Sonnhammer & Östlund, Nucleic Acids Res, 2015 | 273 | Ortholog groups with in-paralogs |
| OrthoDB 8 | Kriventseva et al., Nucleic Acids Res, 2015 | 1,470 | The Hierarchical Catalog of Orthologs |
| OrthoMCL 5 | Fischer et al., Current Protocols in Bioinformatics, 2011 | 150 | Orthologous groups of protein sequences |
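In the ranking system each database that lists a candidate as an ortholog of the query adds to that candidate's score, so the database answers need to be merged per protein. A minimal sketch, assuming each database query has already been reduced to a set of UniProt accessions (or `None` when the database returned nothing); the function name and accessions are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def database_support(db_results: dict[str, set[str] | None]) -> dict[str, set[str]]:
    """Map each candidate ortholog accession to the databases that list it."""
    support: dict[str, set[str]] = defaultdict(set)
    for database, accessions in db_results.items():
        for accession in accessions or ():  # None: database unavailable or no group found
            support[accession].add(database)
    return dict(support)
```

For example, `{"eggNOG": {"P1", "P2"}, "OrthoDB": {"P2"}, "HOGENOM": None}` yields `{"P1": {"eggNOG"}, "P2": {"eggNOG", "OrthoDB"}}`: `P2` is supported by two databases, `P1` by one.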

Functional annotation

ORCAN performs 5 additional comparisons between the query protein sequence and the identified orthologs: pairwise sequence alignment, annotation of protein domains, detection of functional motifs, association of Gene Ontology terms and retrieval of relevant articles.
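The first comparison, global pairwise alignment, is what EMBOSS needle computes with the Needleman-Wunsch algorithm. needle scores alignments with a substitution matrix (EBLOSUM62 by default) and affine gap penalties; the sketch below swaps in a simpler match/mismatch/linear-gap scheme so the dynamic programming stays readable, and reports identity as identical columns over alignment length. The function names and scoring values are assumptions for illustration.

```python
from __future__ import annotations


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: list[str] = []
    out_b: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical columns divided by alignment length (gaps included), in percent."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)
```

Aligning `"ACGT"` with `"AGT"` gives a 4-column alignment with one gap and 3 identical columns, i.e. 75.0% identity. The local variant (water, Smith-Waterman) differs mainly in clamping cell scores at zero and tracing back from the best cell.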

Global and local pairwise sequence alignment
Tools: needle and water (EMBOSS package)
  • summary of both alignments
  • textual output of needle and water

Annotation and comparison of protein domains
Tools: HMMER3 searches against Pfam 29.0 (December 2015, 16,295 entries)
  • interactive visualization of protein domain architecture (content and order)
  • comparison of the proteins' domain architectures using a dynamic programming approach (global alignment)
  • textual output of the Pfam searches

Identification of functional protein motifs
Tools: ps_scan (provided by the Swiss Institute of Bioinformatics) searches the PROSITE data file (all patterns and profiles)
  • graphical comparison of the motifs identified in the query protein and its orthologs
  • textual output returned by ps_scan

Retrieval and comparison of Gene Ontology terms
Tools: own implementation; ORCAN fetches the GO terms associated with a given protein from the Gene Ontology database
  • graphical comparison of GO terms between the query and ortholog proteins

Retrieval and comparison of articles
Tools: own implementation; ORCAN fetches papers relevant to the query and orthologous proteins from PubMed and UniProtKB, using two approaches:
  • The server follows the UniProt records of the query and orthologous proteins to their linked PubMed literature (publications associated with more than 100 protein records are excluded).
  • ORCAN searches the PubMed database directly using gene names (if available) and UniProt accessions.
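Domain architectures are compared by global alignment, but the exact scoring and normalization are not given here. This sketch assumes ordered lists of Pfam accessions per protein, scores +1 for the same domain and 0 for a mismatch or gap (which reduces the optimal global alignment score to the longest common subsequence), and divides by the longer architecture to land in the 0 to 1 range. The function name, scoring values and normalization are all assumptions.

```python
from __future__ import annotations


def domain_architecture_similarity(query: list[str], ortholog: list[str]) -> float | None:
    """Normalized global-alignment score of two ordered domain lists, in [0, 1].

    Returns None (N/A) when either protein has no annotated domains.
    """
    if not query or not ortholog:
        return None
    # Row-by-row dynamic programming; only the previous row is kept.
    previous = [0] * (len(ortholog) + 1)
    for q in query:
        current = [0] * (len(ortholog) + 1)
        for j, o in enumerate(ortholog, start=1):
            if q == o:
                current[j] = previous[j - 1] + 1  # same domain at aligned positions
            else:
                current[j] = max(previous[j], current[j - 1])  # mismatch or gap scores 0
        previous = current
    # Dividing by the longer architecture penalizes missing or extra domains.
    return previous[-1] / max(len(query), len(ortholog))
```

Both content and order matter: `["PF07714", "PF00069"]` against `["PF00069", "PF07714"]` shares both domains but only one can be aligned in order, giving 0.5, while identical architectures give 1.0.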

Orthology Ranking System

ORCAN uses a plurality-based rating system with scores ranging from 1 to 10, where 10 indicates the most evolutionarily and functionally relevant hits. Note that the rating system is intended as an intuitive indicator of the level of similarity, not as a statistical predictor of function. Seven features are considered:

  • orthology predictions (InParanoid, OrthoMCL, RBH, RSD)
  • orthologs retrieved from the databases
  • pairwise alignment between the query and hit sequences
  • content and order of protein domains
  • post-translational modifications
  • PubMed citations linked to the protein in the current version of NCBI databases
  • Gene Ontology terms

The rating system is especially useful when the orthology tools predict different, non-overlapping sets of orthologs for a given query. By default, all 14 analyses contribute equally to the rating. However, by clicking on a given orthologous assignment, the user can assign weights (integers from 0 to 10) to the individual analyses, adjusting their contribution.

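| Feature | Criterion | Score (True) | Score (False) | Score (N/A) | Default weight | Custom weight |
|---|---|---|---|---|---|---|
| Orthology prediction tools | InParanoid | +1 | 0 | - | 1 | integer from 0 to 10 |
| Orthology prediction tools | OrthoMCL | +1 | 0 | - | 1 | integer from 0 to 10 |
| Orthology prediction tools | RBH | +1 | 0 | - | 1 | integer from 0 to 10 |
| Orthology prediction tools | RSD | +1 | 0 | - | 1 | integer from 0 to 10 |
| Orthology databases | eggNOG | +1 | 0 | - | 1 | integer from 0 to 10 |
| Orthology databases | HOGENOM | +1 | 0 | - | 1 | integer from 0 to 10 |
| Orthology databases | InParanoid | +1 | 0 | - | 1 | integer from 0 to 10 |
| Orthology databases | OrthoDB | +1 | 0 | - | 1 | integer from 0 to 10 |
| Orthology databases | OMAbrowser | +1 | 0 | - | 1 | integer from 0 to 10 |
| Global pairwise alignment between query and ortholog | Identity X% | +X/100 (from 0 to 1) | 0 | - | 1 | integer from 0 to 10 |
| Global alignment of protein domains present in query and ortholog | Depending on score | +normalized score (from 0 to 1) | 0 | +0.5 | 1 | integer from 0 to 10 |
| Fraction of motifs shared by query and ortholog | X% | +X/100 (from 0 to 1) | 0 | +0.5 | 1 | integer from 0 to 10 |
| Fraction of GO terms shared by query and ortholog | X% | +X/100 (from 0 to 1) | 0 | +0.5 | 1 | integer from 0 to 10 |
| Fraction of PubMed articles shared by query and ortholog | X% | +X/100 (from 0 to 1) | 0 | - | 1 | integer from 0 to 10 |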
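The table translates naturally into a weighted average: each of the 14 analyses contributes its points multiplied by its weight. How ORCAN maps that total onto the 1 to 10 scale is not stated here, so the final linear mapping in this sketch, along with the function names and analysis keys, is an assumption rather than ORCAN's formula.

```python
from __future__ import annotations

NA_POINTS = 0.5  # table: an N/A domain, motif or GO comparison scores +0.5


def fraction_points(value: float | None) -> float:
    """Points for the domain, motif and GO features (0..1); None means N/A."""
    return NA_POINTS if value is None else value


def rating(points: dict[str, float], weights: dict[str, int] | None = None) -> int:
    """Weighted rating of one ortholog on a 1..10 scale.

    points  -- points earned per analysis, following the scoring table
    weights -- integer weight per analysis (0..10); missing entries count as 0,
               and None means the default weight of 1 for every analysis
    """
    if weights is None:
        weights = {name: 1 for name in points}
    total_weight = sum(weights.get(name, 0) for name in points)
    if total_weight == 0:
        return 1  # nothing contributes: lowest rating
    average = sum(weights.get(name, 0) * value for name, value in points.items()) / total_weight
    # ASSUMPTION: linear mapping of the weighted average (0..1) onto 1..10.
    return 1 + round(9 * average)
```

With the default weights, an ortholog supported by every analysis scores 10 and one supported by none scores 1. Setting a weight to 0 drops that analysis from the average, so a user who trusts only RBH could pass `{"RBH": 10}` and the rating would follow RBH alone.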

Updates

ORCAN operates on the most up-to-date protein sequence data: as soon as UniProt Complete Proteomes releases a new set of reference proteomes, the data are integrated automatically, without stopping or disrupting the web service.

Likewise, ORCAN automatically synchronizes its data with the Pfam and PROSITE databases.
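How ORCAN swaps in new releases is not described here. One common way to update data under a running service is to build each release in its own directory and then atomically repoint a symbolic link at it, so readers always see either the old or the new release, never a half-written one. A minimal POSIX-only sketch of that generic technique; the function name, directory layout and release names are hypothetical.

```python
import os
import tempfile


def publish_release(data_root: str, release: str, link_name: str = "current") -> None:
    """Atomically point data_root/link_name at data_root/release (POSIX only)."""
    if not os.path.isdir(os.path.join(data_root, release)):
        raise FileNotFoundError(release)
    link_path = os.path.join(data_root, link_name)
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)  # leftover from an interrupted update
    os.symlink(release, tmp_path)    # relative target, resolved inside data_root
    os.replace(tmp_path, link_path)  # rename(2) swaps the link in one atomic step


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name in ("release_a", "release_b"):  # hypothetical release directories
            os.makedirs(os.path.join(root, name))
        publish_release(root, "release_a")
        publish_release(root, "release_b")
        print(os.readlink(os.path.join(root, "current")))  # release_b
```

Old release directories can be deleted once no running request still holds files open from them.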