FAQ comments

Although the role of the GW binary code in AGO-binding activity is clearly evolutionarily conserved, a more precise definition of this domain is not nearly as straightforward as it might sound.

For one thing, AGO-binding sites have been found in multiple unrelated proteins, such as transcription elongation factors, the polymerase V subunit NRPE1, a putative oxidoreductase, the GW182 protein family in mammals, and the human prion protein.

Second, AGO-binding sites are remarkably plastic to amino acid substitutions, mostly in plants and some fungi, and hence very often evolve faster than other protein regions. Thus, in most cases, the sequences outside the defining GW/WG core pattern diverge to a point where positional homology cannot be precisely determined even between closely related species.

Third, the exact number of GW/WG repeats differs between highly related proteins, such as the NRPE1 orthologs in different plants, or even between paralogs, such as human GW182 family members. Although one or two of these repeats are sufficient for functional AGO interactions in fission yeast, in some proteins, such as SPT5 in plants, these unusual motifs are repeated up to 45 times.

Fourth, GW/WG repeats are often separated in the primary sequence by various distances, e.g. spanning hundreds of amino acid residues in viral proteins, and seem to function cooperatively during AGO-binding events.

Finally, the amino acid context of Trp determines the protein partner (e.g. AGO or CCR4–NOT deadenylase complexes) and modulates the strength of the interaction. Our view of GW proteins has evolved in the past five years from one of static platforms for clustering AGO proteins to a more dynamic picture in which short W-containing modules coordinate all downstream steps in gene silencing by organizing heterogeneous ensembles of proteins, the composition of which changes in different organisms and locations in the cell, both during development and in response to pathogens.

W-motifs, though critical in RNA gene regulation, have not yet been annotated in available on-line resources. Moreover, a researcher is unlikely to find a complete list of currently known GW proteins in central protein databases or specialized domain resources. Going through the literature to gain a general view of functional W-containing motifs is also impractical, especially for newcomers, as each expert research group often focuses on a single species-specific protein. For these reasons, we developed Whub, a freely available integrative web portal to facilitate efficient management and annotation of W-containing motifs.
Whub offers a variety of features, such as: (1) a catalog of known AGO-binding proteins that aims to give immediate insight into current knowledge about the respective gene, including a focus on mutagenesis information associated with RNAi phenotypes; (2) a computational framework for analyzing the impact of flanking residues on AGO binding in several thousand single W-containing motifs from plants, animals and viruses; (3) an improved sequence-scanning tool, Wsearch, that finds the highest-scoring short W-containing motifs and assesses the contribution of any of them, singly or in combination, to AGO-binding activity; (4) a machine-learning (random forest) AGO-binding protein classifier; (5) an interactive game that lets the user design synthetic GW/WG domains in silico, or modify existing ones, through a series of drag-and-drop operations; (6) a handy and unique way of browsing the bibliographic citations concerning RNAi-related W-motifs; (7) downloadable stand-alone software for large-scale, high-throughput protein sequence analysis.
Literature citations are shown to users as cards, each containing the first author, journal and year of publication. Whub allows users to filter the literature cards on the fly by categories such as taxonomy, protein family, function or publication type; for example, with a few mouse clicks or finger taps, users can narrow the corpus to research articles concerning AGO-binding proteins in plants, sorted by year of publication. When the user clicks on a card, Whub fetches the PubMed citation dynamically, when available, and displays the summary in the library. It also shows a list of the GW proteins studied in the selected article and summarizes the number of experiments performed.
If you make use of the data presented here, please cite: [..]
On Whub's home page, in the Changelog portlet, we provide a record of changes made to the project with the corresponding date and time.

All Trp-containing motifs are considered in predictions, rather than only known rigid dipeptide seeds (e.g. GW/WG, SW/WS, TW/WT). The prediction procedure runs in both directions from W, accumulating the score until it reaches its maximum, which guarantees that the best-scoring single motif is found.
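
The bidirectional extension described above can be illustrated with a minimal sketch. This is not the Wsearch implementation: the function names, the toy matrix values, the default penalty for residues missing from the matrix, and the choice to leave the central W unscored are all assumptions made for illustration. The sketch extends left and right of a Trp independently and, in each direction, keeps the prefix with the highest cumulative score.

```python
from typing import Dict, Tuple

Pwm = Dict[int, Dict[str, float]]  # offset from W -> residue -> score


def best_extension(seq: str, start: int, step: int, pwm: Pwm,
                   default: float = -1.0) -> Tuple[float, int]:
    """Walk away from W and return (best cumulative score, residues kept)."""
    best_score, best_len = 0.0, 0  # keeping no residues scores 0
    cum, pos, offset, n = 0.0, start, step, 0
    while 0 <= pos < len(seq) and offset in pwm:
        # Residues absent from the toy matrix get an assumed penalty.
        cum += pwm[offset].get(seq[pos], default)
        n += 1
        if cum > best_score:
            best_score, best_len = cum, n
        pos += step
        offset += step
    return best_score, best_len


def best_motif(seq: str, w_index: int, pwm: Pwm) -> Tuple[str, float]:
    """Return the best-scoring motif around the Trp at w_index."""
    if seq[w_index] != "W":
        raise ValueError("w_index must point at a Trp (W) residue")
    left_score, left_len = best_extension(seq, w_index - 1, -1, pwm)
    right_score, right_len = best_extension(seq, w_index + 1, +1, pwm)
    motif = seq[w_index - left_len: w_index + right_len + 1]
    return motif, left_score + right_score


# Invented toy matrix covering two positions on each side of W.
TOY_PWM: Pwm = {-2: {"S": 0.5}, -1: {"G": 2.0}, 1: {"G": 1.5}, 2: {"D": 0.75}}
print(best_motif("AASGWGDKK", 4, TOY_PWM))  # ('SGWGD', 4.75)
```

Because the total score is the sum of the left and right parts, maximizing each direction separately yields the best-scoring motif that contains the given Trp.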

The P-value is the probability of finding a W-containing motif with the same or a higher score in background proteins (UniProt database).
Wsearch features fast detection of the highest-scoring single functional W-motifs and a choice of PWM matrix. As the matrices put more weight on residues close to Trp, the chance of detecting false, artifactual long and low-complexity sequences of overall compositional compatibility (e.g. glycine-rich domains, WW domains) is minimized. The new service allows the prediction of potentially functional single motifs, the determination of their boundaries, and the statistical quantification of predicted sequences, alone or in combination. A major focus of development has been to impose a minimum of assumptions on the prediction procedure. First, all Trp-containing motifs are considered in predictions, rather than only known rigid dipeptide seeds (e.g. GW/WG, SW/WS, TW/WT). Second, the prediction procedure runs in both directions from W, accumulating the score until it reaches its maximum, which guarantees that the best-scoring motif is found. Finally, Wsearch provides a single score value, in contrast to Agos' two-parameter scoring system.
At the moment the on-line version of Wsearch can only scan one protein at a time. However, a standalone version of Wsearch performs predictions in batch mode.

Wsearch uses a Position-Specific Scoring Matrix to determine the highest-scoring Trp-containing motifs. It estimates the probability of finding such a score in background proteins (UniProt database).
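
As a rough illustration of how such a probability can be estimated, the sketch below computes an empirical P-value as the fraction of background motif scores that are equal to or higher than the observed score. The function name and the background scores are hypothetical; how Wsearch derives its background distribution from UniProt is not specified here.

```python
from bisect import bisect_left
from typing import Sequence


def empirical_p_value(score: float, sorted_background: Sequence[float]) -> float:
    """Fraction of background scores that are >= score.

    sorted_background must be sorted in ascending order.
    """
    n = len(sorted_background)
    if n == 0:
        raise ValueError("background score list is empty")
    # Index of the first background score that is >= score.
    first_ge = bisect_left(sorted_background, score)
    return (n - first_ge) / n


# Invented background scores, for illustration only.
background = sorted([0.5, 1.0, 1.0, 2.0, 3.5])
print(empirical_p_value(1.0, background))  # 0.8
```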

i-Wsearch is based on a machine-learning method. It classifies whether or not a Trp constitutes an AGO-binding site. It employs the sliding-window technique to encode the amino acid residues flanking the Trp residues of proteins. Each amino acid residue in the Trp neighbor profile is characterized by three descriptors: flexibility, hydrophilicity and volume. In addition, i-Wsearch takes into account two amino acid distances: those to the nearest Trp residues on the N- and C-terminal sides of the motif. In other words, whether a Trp residue belongs to the AGO-binding class is determined by the context of its neighboring residues and the distances to the nearest Trp residues.
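
The encoding step can be sketched as follows. This is an illustrative assumption of how such a feature vector might be built, not the i-Wsearch code: the function name, the window size, the zero padding at sequence ends, the -1 sentinel for a missing neighboring Trp, and the descriptor values in the toy table are all invented placeholders.

```python
from typing import Dict, List, Tuple

# (flexibility, hydrophilicity, volume) for one residue
Descriptor = Tuple[float, float, float]


def encode_trp_sites(seq: str, table: Dict[str, Descriptor],
                     half_window: int = 2) -> List[Tuple[int, List[float]]]:
    """Return (Trp index, feature vector) for every Trp in seq."""
    pad: Descriptor = (0.0, 0.0, 0.0)  # assumed padding outside the sequence
    w_positions = [i for i, aa in enumerate(seq) if aa == "W"]
    rows = []
    for k, i in enumerate(w_positions):
        features: List[float] = []
        # Three descriptors for each flanking residue in the window.
        for offset in range(-half_window, half_window + 1):
            if offset == 0:
                continue  # the central Trp itself is not encoded
            j = i + offset
            features.extend(table.get(seq[j], pad) if 0 <= j < len(seq) else pad)
        # Distances to the nearest Trp on each side; -1 means none (assumption).
        dist_n = i - w_positions[k - 1] if k > 0 else -1
        dist_c = w_positions[k + 1] - i if k + 1 < len(w_positions) else -1
        features.extend([float(dist_n), float(dist_c)])
        rows.append((i, features))
    return rows


# Placeholder descriptor values, not real physicochemical scales.
TOY_TABLE = {"G": (1.0, 2.0, 3.0), "A": (0.5, 0.5, 0.5), "S": (0.1, 0.2, 0.3)}
for index, vector in encode_trp_sites("AGWGAAWS", TOY_TABLE, half_window=1):
    print(index, vector)
```

Each vector has 2 × half_window × 3 descriptor values plus the two distances, and could then be passed to a classifier such as a random forest.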

Scanning tools are based purely on computational and statistical analysis. As a result, they return motifs which constitute good candidates. Further experiments will be necessary to decipher the exact role of predicted motifs in RNAi.
The visualization gives the user color-coded information about the preference of amino acids to be specifically present (red) or absent (blue) at certain motif positions. Strong positive selection for certain residues is represented by dark red squares, and weaker positive selection by lighter squares. In a similar way, strong negative selection against an amino acid at certain positions is highlighted with dark blue squares.
These values represent the log-odds score for a certain amino acid to be present at a certain position. The observed frequencies of amino acids in functional motifs (Pobs) are compared to the corresponding expected frequencies (Pexp) obtained from background subsequences in UniProt. The scores are calculated according to the following formula: Dia = 2 × log2(Pobs/Pexp). Substitution matrices (e.g. BLOSUM) are calculated in a similar way.
Whub provides information about the overall amino acid composition of W-motifs, which is presented to the user as a donut chart and a table that contains more detailed numerical data, such as log-odds scores and frequency ratios.
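
The formula can be illustrated with a short sketch that computes the scores for one motif position. The function names and the background frequencies are hypothetical, and the handling of unobserved residues (they are simply omitted, since the logarithm of zero is undefined) is an assumption; Whub may treat them differently.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def log_odds(p_obs: float, p_exp: float) -> float:
    """Score = 2 * log2(Pobs / Pexp)."""
    return 2 * math.log2(p_obs / p_exp)


def log_odds_column(residues: Iterable[str],
                    background: Dict[str, float]) -> Dict[str, float]:
    """Log-odds scores for the residues observed at one motif position."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: log_odds(n / total, background[aa]) for aa, n in counts.items()}


# Residues seen at one position across four motifs, with invented
# background frequencies.
column = ["G", "G", "G", "S"]
toy_background = {"G": 0.1875, "S": 0.5}
print(log_odds_column(column, toy_background))  # {'G': 4.0, 'S': -2.0}
```

Here G is four times more frequent than expected (score 4.0), while S is half as frequent as expected (score -2.0).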