Although the role of the GW binary code in AGO-binding activity is clearly evolutionarily conserved, a more precise definition of this domain is not nearly as straightforward as it might sound.
First, AGO-binding sites have been found in multiple unrelated proteins, such as transcription elongation factors, the polymerase V subunit NRPE1, a putative oxidoreductase, the GW182 protein family in mammals, and the human prion protein.
Second, AGO-binding sites are remarkably tolerant of amino acid substitutions, particularly in plants and some fungi, and hence very often evolve faster than other protein regions. Thus, in most cases the sequences outside the defining GW/WG core pattern diverge to the point where positional homology cannot be precisely determined even between closely related species.
Third, the exact number of GW/WG repeats differs between highly related proteins, such as the NRPE1 orthologs in different plants, and even between paralogs, such as the human GW182 family members. Although one or two of these repeats are sufficient for functional AGO interactions in fission yeasts, in some proteins, such as SPT5 in plants, these unusual motifs are repeated up to 45 times.
Fourth, GW/WG repeats are often separated in the primary sequence by varying distances, e.g. spanning hundreds of amino acid residues in viral proteins, and seem to function cooperatively during AGO-binding events.
Finally, the amino acid context of Trp determines the protein partner (i.e. AGO or CCR4–NOT deadenylase complexes) and modulates the strength of the interaction. Our view of GW proteins has evolved in the past five years from one of static platforms for clustering AGO proteins to a more dynamic picture in which short W-containing modules coordinate all downstream steps in gene silencing by organizing heterogeneous ensembles of proteins, the composition of which changes between organisms and cellular locations, both during development and in response to pathogens.
All Trp-containing motifs are considered in predictions rather than only the known rigid dipeptide seeds (e.g. GW/WG, SW/WS, TW/WT). The prediction procedure extends in both directions from W, accumulating the score until it reaches its maximum, which guarantees that the best-scoring single motif around that W is found.
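The bidirectional extension from W can be illustrated with a minimal sketch. It assumes a simple additive per-residue score. The table `TOY_SCORES`, the penalty `DEFAULT_SCORE`, and the function names are hypothetical and chosen only for illustration; they are not the scores or code used by the actual predictor. Because the score is additive, the left and right extensions can be maximized independently, and together they give the best-scoring contiguous segment containing the chosen W.

```python
from itertools import accumulate

# Hypothetical per-residue scores (illustration only, not real parameters).
TOY_SCORES = {"G": 2.0, "S": 1.0, "T": 1.0, "D": 0.5, "E": 0.5, "N": 0.5, "W": 3.0}
DEFAULT_SCORE = -1.5  # assumed penalty for any residue not listed above


def residue_score(aa):
    """Score of a single residue under the toy table."""
    return TOY_SCORES.get(aa, DEFAULT_SCORE)


def best_extension(residues):
    """Return (length, gain) of the prefix of `residues` with the highest
    cumulative score. The empty extension (0, 0.0) is kept when no prefix
    has a positive cumulative score."""
    best_len, best_gain = 0, 0.0
    running = accumulate(residue_score(aa) for aa in residues)
    for length, total in enumerate(running, start=1):
        if total > best_gain:
            best_len, best_gain = length, total
    return best_len, best_gain


def extend_from_trp(seq, w_index):
    """Extend a motif in both directions from the Trp at `w_index`.

    Returns (start, end, score), where seq[start:end] is the motif."""
    if seq[w_index] != "W":
        raise ValueError("w_index must point at a Trp (W) residue")
    # Walk leftwards from W (residues are visited nearest-first).
    left_len, left_gain = best_extension(reversed(seq[:w_index]))
    # Walk rightwards from W.
    right_len, right_gain = best_extension(seq[w_index + 1:])
    start = w_index - left_len
    end = w_index + right_len + 1
    return start, end, residue_score("W") + left_gain + right_gain


if __name__ == "__main__":
    sequence = "AAGSWGDAA"
    start, end, score = extend_from_trp(sequence, 4)
    print(sequence[start:end], score)
```

With these toy scores, the call on `"AAGSWGDAA"` selects the segment `GSWGD`: further extension on either side would only add penalized residues and lower the cumulative score.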
Wsearch uses a Position-Specific Scoring Matrix (PSSM) to determine the highest-scoring Trp-containing motifs. It then estimates the probability of finding such a score in background proteins (the UniProt database).
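As a rough illustration of this scoring step, the sketch below sums PSSM column scores over a fixed-width window and converts the result into an empirical probability against a list of background scores. The matrix `TOY_PSSM`, the background scores, and the add-one estimator are all assumptions made for the example; the text does not specify how Wsearch builds its matrix or estimates the probability.

```python
# Hypothetical 3-column PSSM centred on W (illustrative values only).
# Each column maps a residue to its score at that position.
TOY_PSSM = [
    {"G": 1.5, "S": 0.5},
    {"W": 2.0},
    {"G": 1.0, "D": 0.5},
]


def pssm_score(window, pssm, default=0.0):
    """Sum the per-position scores of `window` under `pssm`.

    Residues missing from a column receive `default` (an assumption)."""
    if len(window) != len(pssm):
        raise ValueError("window and PSSM must have the same length")
    return sum(column.get(aa, default) for aa, column in zip(window, pssm))


def empirical_p_value(score, background_scores):
    """Fraction of background scores at least as high as `score`.

    The +1 in numerator and denominator is a common add-one correction
    that avoids reporting a probability of exactly zero."""
    hits = sum(1 for s in background_scores if s >= score)
    return (hits + 1) / (len(background_scores) + 1)


if __name__ == "__main__":
    observed = pssm_score("GWG", TOY_PSSM)
    # Made-up background scores standing in for motifs from background proteins.
    background = [2.0, 2.5, 4.5, 1.0]
    print(observed, empirical_p_value(observed, background))
```

In practice the background distribution would be computed from Trp-containing windows across a large protein set, so that a small probability indicates a motif scoring unusually high relative to typical proteins.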
i-Wsearch is based on a machine learning method that classifies whether or not a given Trp constitutes an AGO-binding site. It employs a sliding-window technique to encode the amino acid residues flanking each Trp residue of a protein. Each amino acid residue in the Trp neighbor profile is characterized by three descriptors: flexibility, hydrophilicity and volume. In addition, i-Wsearch takes into account two distances, measured in amino acids, from the motif to the nearest Trp residues on its N- and C-terminal sides. In other words, whether a Trp residue belongs to the AGO-binding class is determined by the context of its neighboring residues and by the distances to the nearest Trp residues.
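The encoding described above can be sketched as follows. The descriptor values in `TOY_DESCRIPTORS` are placeholders invented for illustration and are not taken from any published scale. The window half-width `flank`, the zero padding at sequence ends, the `missing` sentinel, and the exclusion of the central Trp from the window are likewise assumptions, since the text does not specify them.

```python
# Placeholder (flexibility, hydrophilicity, volume) triples.
# These numbers are made up for illustration only.
TOY_DESCRIPTORS = {
    "G": (0.9, 0.0, 0.1),
    "S": (0.8, 0.3, 0.2),
    "W": (0.3, -1.0, 1.0),
    "D": (0.7, 1.0, 0.4),
    "A": (0.5, -0.2, 0.2),
}
PAD = (0.0, 0.0, 0.0)  # used beyond sequence ends and for unlisted residues


def nearest_trp_distances(seq, w_index, missing=-1):
    """Distances (in residues) to the nearest other Trp on the N- and
    C-terminal side of `w_index`; `missing` if there is none."""
    n_side = seq.rfind("W", 0, w_index)
    c_side = seq.find("W", w_index + 1)
    n_dist = w_index - n_side if n_side != -1 else missing
    c_dist = c_side - w_index if c_side != -1 else missing
    return n_dist, c_dist


def encode_trp(seq, w_index, flank=2):
    """Build a flat feature vector for the Trp at `w_index`:
    three descriptors per flanking residue, then the two Trp distances."""
    if seq[w_index] != "W":
        raise ValueError("w_index must point at a Trp (W) residue")
    features = []
    for pos in range(w_index - flank, w_index + flank + 1):
        if pos == w_index:
            continue  # only the flanking residues are encoded
        if 0 <= pos < len(seq):
            features.extend(TOY_DESCRIPTORS.get(seq[pos], PAD))
        else:
            features.extend(PAD)  # window runs past a sequence end
    features.extend(nearest_trp_distances(seq, w_index))
    return features


if __name__ == "__main__":
    sequence = "GSWGDAWA"
    # One feature vector per Trp in the sequence.
    for i, aa in enumerate(sequence):
        if aa == "W":
            print(i, encode_trp(sequence, i, flank=1))
```

Each vector has `2 * flank * 3 + 2` entries and could be passed to any standard binary classifier; the text does not state which learning algorithm i-Wsearch uses, so none is assumed here.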