Tools (70)

Alignment-free sequence comparison tools available for research purposes
Category Analysis Tool Features Implementation Authors
Mapping Transcript quantification kallisto Transcript abundance quantification from RNA-seq data (uses pseudoalignment for rapid determination of read compatibility with targets) Software (C++) Bray et al. (2016)
Sailfish Estimation of isoform abundances from reference sequences and RNA-seq data (k-mer based) Software (C++) Patro et al. (2014)
Salmon Quantification of the expression of transcripts using RNA-seq data (uses k-mers). Software (C++) Patro et al. (2017)
RNA-Skim RNA-seq quantification at transcript-level (partitions the transcriptome into disjoint transcript clusters; uses sig-mers - a special type of k-mers). Software (C++) Zhang & Wang (2014)
Variant calling ChimeRScope Fusion transcript prediction using gene k-mer profiles of the RNA-seq paired-end reads Software (Java) Li et al. (2017)
DiscoSnp Reference-free detection of isolated SNPs from read datasets Software (C / Python) Uricaru et al. (2015)
FastGT Genotyping known SNV/SNP variants directly from raw NGS sequence reads by counting unique k-mers Software (C) Pajuste et al. (2017)
Phy-Mer Reference-independent mitochondrial haplogroup classifier from NGS data (k-mer based) Software (Python) Navarro-Gomez et al. (2015)
LAVA Genotyping of known SNPs (dbSNP and Affymetrix's Genome-Wide Human SNP Array) from raw NGS reads (k-mer based) Software (C) Shajii et al. (2016)
MICADo Detection of mutations in targeted 3rd generation NGS data (can distinguish patient-specific mutations; algorithm uses k-mers and is based on colored de Bruijn graphs) Software (Python) Rudewicz et al. (2016)
General mapper Minimap Lightweight and fast read mapper and read overlap detector (uses the concept of 'minimizers', a special type of k-mers) Software (C) Li (2016)
Assembly De novo genome assembly MHAP Produce highly continuous assembly (fully resolved chromosome arms) from 3rd generation long and noisy reads (10 kbp) using a dimensionality reduction technique MinHash Software (Java) Berlin et al. (2016)
Miniasm Assembler of long noisy reads (SMRT, ONT) using the Overlap-Layout Consensus (OLC) approach without the necessity of error correction stage (uses minimap) Software (C) Li (2016)
LINKS Scaffolding genome assembly with error-containing long sequences (e.g., ONT or PacBio reads, draft genomes) Software (Perl) Warren et al. (2016)
Read clustering afcluster Clustering of reads from different genes and different species based on k-mer counts Software (C++) Solovyov & Lipkin (2013)
QCluster Clustering of reads with alignment-free measures (k-mer based) and quality values Software (C++) Comin et al. (2015)
Reads error correction Lighter Correcting sequencing errors in raw, whole genome sequencing reads (k-mer based) Software (C++) Song et al. (2013)
QuorUM Error corrector for Illumina reads using k-mers Software (C++) Marçais et al. (2015)
Trowel Software (C++) Lim et al. (2014)
Metagenomics Assembly-free phylogenomics AAF Phylogeny reconstruction directly from unassembled raw sequence data from whole genome sequencing projects; provides bootstrap support to assess uncertainty in the tree topology (k-mer based) Software (Python) Fan et al. (2015)
kSNP v3 Reference-free SNP identification and estimation of phylogenetic trees using SNPs (based on k-mer analysis) Software (C)
  • Gardner et al. (2013)
  • Gardner et al. (2015)
kWIP Reconstruct relatedness from unassembled raw sequence data. Software (C++) Murray et al. (2017)
NGS-MC Phylogeny of species based on NGS reads using alignment-free sequence dissimilarity measures d2* and d2S under different Markov chain models (using k-words) R package
  • Song et al. (2013)
  • Ren et al. (2016)
Species identification / Taxonomic profiling CLARK Taxonomic classification of metagenomic reads to known bacterial genomes using k-mer search and LCA assignment Software (C++) Ounit & Lonardi (2016)
FOCUS Reports organisms present in metagenomic samples and profiles their abundances (uses composition based approach and non-negative least squares for prediction) Web service / Software (Python) Silva et al. (2014)
GSM Estimation of abundances of microbial genomes in metagenomic samples (k-mer based) Software (Go) Pham et al. (2017)
Mash Species identification using assembled or unassembled Illumina, PacBio, and ONT data (based on MinHash dimensionality-reduction technique) Software (C++) Ondov et al. (2016)
Kraken Taxonomic assignment in metagenome analysis by exact k-mer search; LCA assignment of short reads based on a comprehensive sequence database Software (C++) Wood & Salzberg (2014)
LMAT Assignment of taxonomic labels to reads by k-mer searches in a precomputed database Software (C++ / Python) Ames et al. (2013)
Simka Comparison of metagenomic datasets based on their k-mer counts; computes a collection of distances classically used in ecology to compare communities Software (C++ / Python) Benoit et al. (2016)
stringMLST k-mer based tool for multi locus sequence typing (MLST) directly from the genome sequencing reads Software (Python) Gupta et al. (2017)
Taxonomer k-mer-based ultrafast metagenomics tool for assigning taxonomy to sequencing reads from clinical and environmental samples. Web service Flygare et al. (2016)
Other d2-tools Word-based (k-tuple) comparison (pairwise dissimilarity matrix using d2S measure) of metatranscriptomic samples from NGS reads Software (Python/R)
  • Jiang et al. (2012)
  • Wang et al. (2014)
d2SBin Tool for improving contig binning: adjusts contigs among bins based on the output of any existing binning tool. The tool is taxonomy-free and relies only on the k-tuples of a single metagenomic sample. Software (C++) Wang et al. (2017)
VirHostMatcher Prediction of hosts from metagenomic viral sequences based on oligonucleotide frequency (ONF) using various distance measures (e.g., d2) Software (C++) Ahlgren et al. (2017)
MetaFast Statistics calculation of metagenome sequences and the distances between them based on assembly using de Bruijn graphs and Bray-Curtis dissimilarity measure Software (Java) Ulyantsev et al. (2017)
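Several entries above (e.g., Simka, MetaFast) compare samples through k-mer counts and an ecological dissimilarity such as Bray-Curtis. The following is a minimal sketch of that general idea, not the implementation of any listed tool; the function names `kmer_counts` and `bray_curtis` are hypothetical, and real tools add canonical k-mers, abundance filtering, and disk-based counting.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two k-mer count tables.

    Returns 0.0 for identical tables and 1.0 when no k-mer is shared.
    """
    shared = sum(min(a[w], b[w]) for w in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0  # assumption: two empty samples are treated as identical
    return 1.0 - 2.0 * shared / total


if __name__ == "__main__":
    x = kmer_counts("ACGTACGTAC", 3)
    y = kmer_counts("ACGTTTGTAC", 3)
    print(bray_curtis(x, y))
```

In practice each sample would be a whole read set rather than a single string, with counts accumulated over all reads before the dissimilarity is computed.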
Category Name Features Implementation Authors
Pairwise & multiple sequence comparisons ALF Calculation of pairwise similarity scores (using N2 measure) for sequences in fasta file Software (C++) Göke et al. (2012)
decaf+py 13 word-based measures; Lempel-Ziv complexity-based measure; Average Common Substring distance; W-metric Software (Python) Ren et al. (2013)
multiAlignFree Multiple alignment-free sequence comparison using 5 word-based statistics R package
  • Höhl et al. (2006)
  • Höhl et al. (2007)
NASC Non-Aligned Sequence Comparison: 4 word-based measures (e.g. Mahalanobis distance); 2 IT-based measures (Kolmogorov complexity) Matlab framework Vinga & Almeida (2003)
Whole-genome phylogeny ALFRED / ALFRED-G Phylogenetic tree reconstruction based on the Average Common Substring (ACS) approach Software (C++)
  • Thankachan et al. (2016)
  • Thankachan et al. (2017)
andi Computation of evolutionary distances between closely related genomes by approximation of local alignments (k-mer based da measure); scalable to thousands of bacterial genomes Software (C) Haubold et al. (2015)
CAFE Alignment-free analysis platform for studying the relationships among genomes and metagenomes (offers 28 word-based dissimilarity measures) Software (C) Lu et al. (2017)
CVTree3 Phylogeny reconstruction from whole genome sequences based on word composition Web service
  • Qi et al. (2004)
  • Zuo & Hao (2015)
DLTree Automated whole genome/proteome-based phylogenetic analysis based on alignment-free dynamical language method. Web service Wu et al. (2017)
FFP Feature Frequency Profile-based measures for whole genome/proteome comparisons (from viral to mammalian scale) Software (C/Perl)
  • Sims et al. (2009)
  • Jun et al. (2010)
  • Sims et al. (2011)
jD2Stat (JIWA) Generation of the distance matrix using D2S statistics to extract k-mers from large-scale unaligned genome sequences Software (Java) Chan et al. (2014)
kr Efficient word-based estimation of mutation distances from unaligned genomes Software (C) Haubold et al. (2009)
FSWM Fast approach to estimate phylogenetic distances between large genomic sequences based on inexact word matches Software (C++) / Web service Leimeister et al. (2017)
kmacs k-mismatch average common substring approach to alignment-free sequence comparison Leimeister & Morgenstern (2014)
Spaced Fast alignment-free sequence comparison using spaced-word frequencies (a few minutes for pair of eukaryotic genomes of a few hundred Mb) Leimeister et al. (2014)
SlopeTree Whole genome phylogeny that corrects for Horizontal Gene Transfer Software (C++) Bromberg et al. (2016)
Underlying Approach Phylogeny of whole genomes using composition of subwords (Underlying Approach) Software (JAVA) Comin & Verzotto (2012)
Sequence similarity search tool RAFTS3 Searches of similar protein sequences against a protein database (>300 times faster than BLAST) Matlab Vialle et al. (2016)
Annotation of long noncoding RNA lncScore Prediction of long noncoding RNA from assembled novel transcripts Software (Python) Zhao et al. (2016)
FEELnc Prediction of lncRNAs from RNA-seq samples based on a Random Forest model trained with multi k-mer frequencies and relaxed open reading frames. Software (Perl/R) Wucher et al. (2017)
Horizontal Gene Transfer (HGT) alfy Alignment-free local homology calculation for detecting horizontal gene transfer Software (C)
  • Domazet-Lošo & Haubold (2011)
  • Domazet-Lošo & Haubold (2011)
rush Detection of recombination between two unaligned DNA sequences Software (C) Haubold et al. (2013)
Smash Identification and visualization of genomic rearrangements between pairs of DNA sequences Software (C) Pratas et al. (2015)
TF-IDF Detection of HGT regions and the transfer direction in nucleotide/protein sequences Software (C++)
  • Cong et al. (2016)
  • Cong et al. (2017)
Regulatory elements D2Z Identification of functionally related homologous regulatory elements Software (Perl) Kantorovitz et al. (2007)
MatrixREDUCE Prediction of functional regulatory targets of TFs by predicting the total affinity of each promoter and orthologous promoters Software (Python) Ward & Bussemaker (2008)
RRS Detection of functionally similar group of enhancers and their regions Software (Perl/C) Koohy et al. (2010)
Sequence clustering d2_cluster Clustering EST and full-length cDNA sequences Software (C) Burke et al. (1999)
d2-vlmc Word-based clustering of metatranscriptomic samples using variable length Markov chains Software (Python) Liao et al. (2016)
mBKM Clustering of DNA sequences using Shannon entropy and Euclidean distance Software (Java) Wei et al. (2012)
kClust Large-scale clustering of protein sequences (down to 20-30% sequence identity) Software (C++) Hauser et al. (2013)
Other COMET Rapid classification of HIV-1 nucleotide sequences into subtypes based on prediction by partial matching compression Web service Struck et al. (2014)
PPI Identification of protein-protein interaction by coevolution analysis using discrete Fourier transform (DFT). Software (Python) Yin et al. (2017)
VaxiJen Antigen prediction based on auto cross covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties. Web service Doytchinova & Flower (2007)
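Many of the word-based tools listed above build on the D2 family of statistics. As a rough illustration, the basic D2 statistic is the number of k-word matches between two sequences, which is the inner product of their k-word count vectors. The sketch below shows only this plain D2; the normalized variants named in the table (D2S, d2*) additionally centre the counts by their expected values under a background model, which is not shown. The function name `d2_statistic` is hypothetical.

```python
from collections import Counter


def word_counts(seq, k):
    """Count every overlapping k-word in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_statistic(seq_a, seq_b, k):
    """Plain D2: sum over all k-words w of count_a(w) * count_b(w)."""
    a = word_counts(seq_a, k)
    b = word_counts(seq_b, k)
    # Only words present in both sequences contribute a non-zero product.
    return sum(a[w] * b[w] for w in a.keys() & b.keys())


if __name__ == "__main__":
    print(d2_statistic("ACGTACGT", "TTACGTTT", 3))
```

A larger D2 value indicates more shared words; because it grows with sequence length, it is usually normalized before being used as a distance.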