Category | Analysis | Tool | Features | Implementation | Authors |
---|---|---|---|---|---|
Mapping | Transcript quantification | kallisto | Transcript abundance quantification from RNA-seq data (uses pseudoalignment for rapid determination of read compatibility with targets) | Software (C++) | Bray et al. (2016) |
Mapping | Transcript quantification | Sailfish | Estimation of isoform abundances from reference sequences and RNA-seq data (k-mer based) | Software (C++) | Patro et al. (2014) |
Mapping | Transcript quantification | Salmon | Quantification of the expression of transcripts using RNA-seq data (uses k-mers) | Software (C++) | Patro et al. (2017) |
Mapping | Transcript quantification | RNA-Skim | Transcript-level RNA-seq quantification (partitions the transcriptome into disjoint transcript clusters; uses sig-mers, a special type of k-mer) | Software (C++) | Zhang & Wang (2014) |
Mapping | Variant calling | ChimeRScope | Fusion transcript prediction using gene k-mer profiles of paired-end RNA-seq reads | Software (Java) | Li et al. (2017) |
Mapping | Variant calling | DiscoSnp | Reference-free detection of isolated SNPs from read datasets | Software (C / Python) | Uricaru et al. (2015) |
Mapping | Variant calling | FastGT | Genotyping known SNV/SNP variants directly from raw NGS sequence reads by counting unique k-mers | Software (C) | Pajuste et al. (2017) |
Mapping | Variant calling | Phy-Mer | Reference-independent mitochondrial haplogroup classifier from NGS data (k-mer based) | Software (Python) | Navarro-Gomez et al. (2015) |
Mapping | Variant calling | LAVA | Genotyping of known SNPs (dbSNP and Affymetrix's Genome-Wide Human SNP Array) from raw NGS reads (k-mer based) | Software (C) | Shajii et al. (2016) |
Mapping | Variant calling | MICADo | Detection of mutations in targeted 3rd-generation NGS data (can distinguish patient-specific mutations; algorithm uses k-mers and is based on colored de Bruijn graphs) | Software (Python) | Rudewicz et al. (2016) |
Mapping | General mapper | Minimap | Lightweight and fast read mapper and read overlap detector (uses the concept of 'minimizers', a special type of k-mer) | Software (C) | Li (2016) |
Assembly | De novo genome assembly | MHAP | Produces highly continuous assemblies (fully resolved chromosome arms) from long, noisy 3rd-generation reads (10 kbp) using the MinHash dimensionality-reduction technique | Software (Java) | Berlin et al. (2016) |
Assembly | De novo genome assembly | Miniasm | Assembler of long noisy reads (SMRT, ONT) using the Overlap-Layout-Consensus (OLC) approach without requiring an error-correction stage (uses minimap) | Software (C) | Li (2016) |
Assembly | De novo genome assembly | LINKS | Scaffolding of genome assemblies with error-containing long sequences (e.g., ONT or PacBio reads, draft genomes) | Software (Perl) | Warren et al. (2016) |
Assembly | Read clustering | afcluster | Clustering of reads from different genes and different species based on k-mer counts | Software (C++) | Solovyov & Lipkin (2013) |
Assembly | Read clustering | QCluster | Clustering of reads with alignment-free measures (k-mer based) and quality values | Software (C++) | Comin et al. (2015) |
Assembly | Read error correction | Lighter | Correcting sequencing errors in raw, whole genome sequencing reads (k-mer based) | Software (C++) | Song et al. (2013) |
Assembly | Read error correction | QuorUM | Error corrector for Illumina reads using k-mers | Software (C++) | Marçais et al. (2015) |
Assembly | Read error correction | Trowel | Error corrector for Illumina reads using k-mers | Software (C++) | Lim et al. (2014) |
Metagenomics | Assembly-free phylogenomics | AAF | Phylogeny reconstruction directly from unassembled raw sequence data from whole genome sequencing projects; provides bootstrap support to assess uncertainty in the tree topology (k-mer based) | Software (Python) | Fan et al. (2015) |
Metagenomics | Assembly-free phylogenomics | kSNP v3 | Reference-free SNP identification and estimation of phylogenetic trees using SNPs (based on k-mer analysis) | Software (C) | |
Metagenomics | Assembly-free phylogenomics | kWIP | Reconstruction of relatedness from unassembled raw sequence data | Software (C++) | Murray et al. (2017) |
Metagenomics | Assembly-free phylogenomics | NGS-MC | Phylogeny of species based on NGS reads using alignment-free sequence dissimilarity measures d2* and d2S under different Markov chain models (using k-words) | R package | |
Metagenomics | Species identification / Taxonomic profiling | CLARK | Taxonomic classification of metagenomic reads to known bacterial genomes using k-mer search and LCA assignment | Software (C++) | Ounit & Lonardi (2016) |
Metagenomics | Species identification / Taxonomic profiling | FOCUS | Reports organisms present in metagenomic samples and profiles their abundances (uses a composition-based approach and non-negative least squares for prediction) | Web service, Software (Python) | Silva et al. (2014) |
Metagenomics | Species identification / Taxonomic profiling | GSM | Estimation of abundances of microbial genomes in metagenomic samples (k-mer based) | Software (Go) | Pham et al. (2017) |
Metagenomics | Species identification / Taxonomic profiling | Mash | Species identification using assembled or unassembled Illumina, PacBio, and ONT data (based on the MinHash dimensionality-reduction technique) | Software (C++) | Ondov et al. (2016) |
Metagenomics | Species identification / Taxonomic profiling | Kraken | Taxonomic assignment in metagenome analysis by exact k-mer search; LCA assignment of short reads based on a comprehensive sequence database | Software (C++) | Wood & Salzberg (2014) |
Metagenomics | Species identification / Taxonomic profiling | LMAT | Assignment of taxonomic labels to reads by k-mer searches in a precomputed database | Software (C++ / Python) | Ames et al. (2013) |
Metagenomics | Species identification / Taxonomic profiling | Simka | Comparison of metagenomic datasets based on their k-mer counts; computes a collection of distances classically used in ecology to compare communities | Software (C++ / Python) | Benoit et al. (2016) |
Metagenomics | Species identification / Taxonomic profiling | stringMLST | k-mer-based tool for multilocus sequence typing (MLST) directly from genome sequencing reads | Software (Python) | Gupta et al. (2017) |
Metagenomics | Species identification / Taxonomic profiling | Taxonomer | k-mer-based ultrafast metagenomics tool for assigning taxonomy to sequencing reads from clinical and environmental samples | Web service | Flygare et al. (2016) |
Metagenomics | Other | d2-tools | Word-based (k-tuple) comparison (pairwise dissimilarity matrix using d2S measure) of metatranscriptomic samples from NGS reads | Software (Python/R) | |
Metagenomics | Other | d2SBin | Tool for improving contig binning: adjusts contigs among bins based on the output of any existing binning tool; taxonomy-free, relying only on the k-tuples of a single metagenomic sample | Software (C++) | Wang et al. (2017) |
Metagenomics | Other | VirHostMatcher | Prediction of hosts from metagenomic viral sequences based on oligonucleotide frequency (ONF) using various distance measures (e.g., d2) | Software (C++) | Ahlgren et al. (2017) |
Metagenomics | Other | MetaFast | Statistics calculation of metagenome sequences and the distances between them based on assembly using de Bruijn graphs and Bray-Curtis dissimilarity measure | Software (Java) | Ulyantsev et al. (2017) |

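
Mash and MHAP are listed above as relying on MinHash. As a rough illustration of that idea only, the sketch below estimates the Jaccard similarity of two k-mer sets from their smallest hash values. It is not the implementation of either tool: the function names are hypothetical, SHA-1 is an assumed stand-in hash, and details such as canonical (strand-independent) k-mers are omitted.

```python
import hashlib


def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Map a k-mer to a 64-bit integer (SHA-1 is only a stand-in hash)."""
    digest = hashlib.sha1(kmer.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def bottom_sketch(seq, k, size):
    """Keep the `size` smallest hash values of the sequence's k-mers."""
    return sorted({hash64(x) for x in kmer_set(seq, k)})[:size]


def jaccard_estimate(sketch_a, sketch_b, size):
    """Estimate Jaccard similarity from two bottom sketches."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    merged = sorted(set_a | set_b)[:size]  # bottom sketch of the union
    if not merged:
        return 0.0
    # Count hashes in the merged sketch that occur in both input sketches.
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)


a = bottom_sketch("ACGTACGT", k=3, size=100)
b = bottom_sketch("ACGTTTTT", k=3, size=100)
print(jaccard_estimate(a, b, size=100))
```

When the sketch size exceeds the number of distinct k-mers, as in this toy call, the estimate coincides with the exact Jaccard index (here 2 shared k-mers out of 6 distinct ones). Smaller sketches trade accuracy for speed and memory.
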
Category | Name | Features | Implementation | Authors |
---|---|---|---|---|
Pairwise & multiple sequence comparisons | ALF | Calculation of pairwise similarity scores (using N2 measure) for sequences in a FASTA file | Software (C++) | Göke et al. (2012) |
Pairwise & multiple sequence comparisons | decaf+py | 13 word-based measures; Lempel-Ziv complexity-based measure; Average Common Substring distance; W-metric | Software (Python) | Ren et al. (2013) |
Pairwise & multiple sequence comparisons | multiAlignFree | Multiple alignment-free sequence comparison using 5 word-based statistics | R package | |
Pairwise & multiple sequence comparisons | NASC | Non-Aligned Sequence Comparison: 4 word-based measures (e.g., Mahalanobis distance); 2 IT-based measures (Kolmogorov complexity) | Matlab framework | Vinga & Almeida (2003) |
Whole-genome phylogeny | ALFRED, ALFRED-G | Phylogenetic tree reconstruction based on the Average Common Substring (ACS) approach | Software (C++) | |
Whole-genome phylogeny | andi | Computation of evolutionary distances between closely related genomes by approximation of local alignments (k-mer based da measure); scalable to thousands of bacterial genomes | Software (C) | Haubold et al. (2015) |
Whole-genome phylogeny | CAFE | Alignment-free analysis platform for studying the relationships among genomes and metagenomes (offers 28 word-based dissimilarity measures) | Software (C) | Lu et al. (2017) |
Whole-genome phylogeny | CVTree3 | Phylogeny reconstruction from whole genome sequences based on word composition | Web service | |
Whole-genome phylogeny | DLTree | Automated whole genome/proteome-based phylogenetic analysis based on the alignment-free dynamical language method | Web service | Wu et al. (2017) |
Whole-genome phylogeny | FFP | Feature Frequency Profile-based measures for whole genome/proteome comparisons (from viral to mammalian scale) | Software (C/Perl) | |
Whole-genome phylogeny | jD2Stat (JIWA) | Generation of the distance matrix using D2S statistics to extract k-mers from large-scale unaligned genome sequences | Software (Java) | Chan et al. (2014) |
Whole-genome phylogeny | kr | Efficient word-based estimation of mutation distances from unaligned genomes | Software (C) | Haubold et al. (2009) |
Whole-genome phylogeny | FSWM | Fast approach to estimate phylogenetic distances between large genomic sequences based on inexact word matches | Software (C++), Web service | Leimeister et al. (2017) |
Whole-genome phylogeny | kmacs | k-mismatch average common substring approach to alignment-free sequence comparison | Software (C++), Web service | Leimeister & Morgenstern (2014) |
Whole-genome phylogeny | Spaced | Fast alignment-free sequence comparison using spaced-word frequencies (a few minutes for a pair of eukaryotic genomes of a few hundred Mb) | Software (C++), Web service | Leimeister et al. (2014) |
Whole-genome phylogeny | SlopeTree | Whole genome phylogeny that corrects for horizontal gene transfer | Software (C++) | Bromberg et al. (2016) |
Whole-genome phylogeny | Underlying Approach | Phylogeny of whole genomes using the composition of subwords | Software (Java) | Comin & Verzotto (2012) |
Sequence similarity search tool | RAFTS3 | Search for similar protein sequences in a protein database (>300 times faster than BLAST) | Matlab | Vialle et al. (2016) |
Annotation of long noncoding RNA | lncScore | Prediction of long noncoding RNA from assembled novel transcripts | Software (Python) | Zhao et al. (2016) |
Annotation of long noncoding RNA | FEELnc | Prediction of lncRNAs from RNA-seq samples based on a Random Forest model trained with multi k-mer frequencies and relaxed open reading frames | Software (Perl/R) | Wucher et al. (2017) |
Horizontal Gene Transfer (HGT) | alfy | Alignment-free local homology calculation for detecting horizontal gene transfer | Software (C) | |
Horizontal Gene Transfer (HGT) | rush | Detection of recombination between two unaligned DNA sequences | Software (C) | Haubold et al. (2013) |
Horizontal Gene Transfer (HGT) | Smash | Identification and visualization of genomic rearrangements between pairs of DNA sequences | Software (C) | Pratas et al. (2015) |
Horizontal Gene Transfer (HGT) | TF-IDF | Detection of HGT regions and the transfer direction in nucleotide/protein sequences | Software (C++) | |
Regulatory elements | D2Z | Identification of functionally related homologous regulatory elements | Software (Perl) | Kantorovitz et al. (2007) |
Regulatory elements | MatrixREDUCE | Prediction of functional regulatory targets of TFs by predicting the total affinity of each promoter and orthologous promoters | Software (Python) | Ward & Bussemaker (2008) |
Regulatory elements | RRS | Detection of functionally similar groups of enhancers and their regions | Software (Perl/C) | Koohy et al. (2010) |
Sequence clustering | d2_cluster | Clustering EST and full-length cDNA sequences | Software (C) | Burke et al. (1999) |
Sequence clustering | d2-vlmc | Word-based clustering of metatranscriptomic samples using variable length Markov chains | Software (Python) | Liao et al. (2016) |
Sequence clustering | mBKM | Clustering of DNA sequences using Shannon entropy and Euclidean distance | Software (Java) | Wei et al. (2012) |
Sequence clustering | kClust | Large-scale clustering of protein sequences (down to 20-30% sequence identity) | Software (C++) | Hauser et al. (2013) |
Other | COMET | Rapid classification of HIV-1 nucleotide sequences into subtypes based on prediction by partial matching compression | Web service | Struck et al. (2014) |
Other | PPI | Identification of protein-protein interactions by coevolution analysis using discrete Fourier transform (DFT) | Software (Python) | Yin et al. (2017) |
Other | VaxiJen | Antigen prediction based on auto cross covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties | Web service | Doytchinova & Flower (2007) |
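
Many of the tools listed above are described as word-based (k-mer or k-tuple based), and several refer to the D2 family of statistics. As a rough illustration of the simplest member of that family, the sketch below computes the plain D2 statistic, the sum over all words of the product of their counts in two sequences. The function names are hypothetical, and the cosine-style normalisation is an assumed, simple choice made here for readability. It is not the d2S or d2* measure, both of which additionally centre the counts under a background model.

```python
from collections import Counter
from math import sqrt


def word_counts(seq, k):
    """Count every overlapping word (k-mer) of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Plain D2 statistic: sum over words of count_a(w) * count_b(w)."""
    counts_a = word_counts(seq_a, k)
    counts_b = word_counts(seq_b, k)
    # Counter returns 0 for words absent from seq_b.
    return sum(n * counts_b[w] for w, n in counts_a.items())


def cosine_dissimilarity(seq_a, seq_b, k):
    """Assumed normalisation: 1 - D2 / (|a| * |b|), a value in [0, 1]."""
    norm_a = sqrt(d2(seq_a, seq_a, k))
    norm_b = sqrt(d2(seq_b, seq_b, k))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each sequence must contain at least one word")
    return 1.0 - d2(seq_a, seq_b, k) / (norm_a * norm_b)


print(d2("ACGTACGT", "ACGTTTTT", 3))
print(cosine_dissimilarity("ACGTACGT", "ACGTTTTT", 3))
```

In the toy call, only the words ACG and CGT are shared, so D2 is 2·1 + 2·1 = 4. No alignment is computed at any point, which is what lets word-based measures of this kind scale to large inputs.
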