Tools (70)

Alignment-free sequence comparison tools available for research purposes

NGS tools
General tools

Category	Analysis	Tool	Features	Implementation	Authors
Mapping	Transcript quantification	kallisto	Transcript abundance quantification from RNA-seq data (uses pseudoalignment for rapid determination of read compatibility with targets)	Software (C++)	Bray et al. (2016)
		Sailfish	Estimation of isoform abundances from reference sequences and RNA-seq data (k-mer based)	Software (C++)	Patro et al. (2014)
		Salmon	Quantification of the expression of transcripts using RNA-seq data (uses k-mers).	Software (C++)	Patro et al. (2017)
		RNA-Skim	RNA-seq quantification at transcript-level (partitions the transcriptome into disjoint transcript clusters; uses sig-mers, a special type of k-mer).	Software (C++)	Zhang & Wang (2014)
	Variant calling	ChimeRScope	Fusion transcript prediction using gene k-mer profiles of RNA-seq paired-end reads	Software (Java)	Li et al. (2017)
		DiscoSnp	Reference-free detection of isolated SNPs from read datasets	Software (C / Python)	Uricaru et al. (2015)
		FastGT	Genotyping known SNV/SNP variants directly from raw NGS sequence reads by counting unique k-mers	Software (C)	Pajuste et al. (2017)
		Phy-Mer	Reference-independent mitochondrial haplogroup classifier from NGS data (k-mer based)	Software (Python)	Navarro-Gomez et al. (2015)
		LAVA	Genotyping of known SNPs (dbSNP and Affymetrix's Genome-Wide Human SNP Array) from raw NGS reads (k-mer based)	Software (C)	Shajii et al. (2016)
		MICADo	Detection of mutations in targeted 3rd generation NGS data (can distinguish patient-specific mutations; the algorithm uses k-mers and is based on colored de Bruijn graphs)	Software (Python)	Rudewicz et al. (2016)
	General mapper	Minimap	Lightweight and fast read mapper and read overlap detector (uses the concept of 'minimizers', a special type of k-mer)	Software (C)	Li (2016)
Assembly	De novo genome assembly	MHAP	Produces highly continuous assemblies (fully resolved chromosome arms) from long, noisy 3rd generation reads (10 kbp) using the MinHash dimensionality-reduction technique	Software (Java)	Berlin et al. (2016)
		Miniasm	Assembler of long noisy reads (SMRT, ONT) using the Overlap-Layout-Consensus (OLC) approach without an error-correction stage (uses minimap)	Software (C)	Li (2016)
		LINKS	Scaffolding of genome assemblies with error-containing long sequences (e.g., ONT or PacBio reads, draft genomes)	Software (Perl)	Warren et al. (2016)
	Read clustering	afcluster	Clustering of reads from different genes and different species based on k-mer counts	Software (C++)	Solovyov & Lipkin (2013)
		QCluster	Clustering of reads with alignment-free measures (k-mer based) and quality values	Software (C++)	Comin et al. (2015)
	Reads error correction	Lighter	Correcting sequencing errors in raw, whole genome sequencing reads (k-mer based)	Software (C++)	Song et al. (2013)
		QuorUM	Error corrector for Illumina reads using k-mers	Software (C++)	Marçais et al. (2015)
		Trowel	Error corrector for Illumina reads using k-mers	Software (C++)	Lim et al. (2014)
Metagenomics	Assembly-free phylogenomics	AAF	Phylogeny reconstruction directly from unassembled raw sequence data from whole genome sequencing projects; provides bootstrap support to assess uncertainty in the tree topology (k-mer based)	Software (Python)	Fan et al. (2015)
		kSNP v3	Reference-free SNP identification and estimation of phylogenetic trees using SNPs (based on k-mer analysis)	Software (C)	Gardner et al. (2013); Gardner et al. (2015)
		kWIP	Reconstruct relatedness from unassembled raw sequence data.	Software (C++)	Murray et al. (2017)
		NGS-MC	Phylogeny of species based on NGS reads using alignment-free sequence dissimilarity measures d₂^* and d₂^S under different Markov chain models (using k-words)	R package	Song et al. (2013); Ren et al. (2016)
	Species identification / Taxonomic profiling	CLARK	Taxonomic classification of metagenomic reads to known bacterial genomes using k-mer search and LCA assignment	Software (C++)	Ounit & Lonardi (2016)
		FOCUS	Reports organisms present in metagenomic samples and profiles their abundances (uses a composition-based approach and non-negative least squares for prediction)	Web service; Software (Python)	Silva et al. (2014)
		GSM	Estimation of abundances of microbial genomes in metagenomic samples (k-mer based)	Software (Go)	Pham et al. (2017)
		Mash	Species identification using assembled or unassembled Illumina, PacBio, and ONT data (based on MinHash dimensionality-reduction technique)	Software (C++)	Ondov et al. (2016)
		Kraken	Taxonomic assignment in metagenome analysis by exact k-mer search; LCA assignment of short reads based on a comprehensive sequence database	Software (C++)	Wood & Salzberg (2014)
		LMAT	Assignment of taxonomic labels to reads by k-mer searches in a precomputed database	Software (C++ / Python)	Ames et al. (2013)
		Simka	Compares metagenomic datasets based on their k-mer counts and computes a collection of distances classically used in ecology to compare communities.	Software (C++ / Python)	Benoit et al. (2016)
		stringMLST	k-mer based tool for multilocus sequence typing (MLST) directly from genome sequencing reads	Software (Python)	Gupta et al. (2017)
		Taxonomer	k-mer-based ultrafast metagenomics tool for assigning taxonomy to sequencing reads from clinical and environmental samples.	Web service	Flygare et al. (2016)
	Other	d2-tools	Word-based (k-tuple) comparison (pairwise dissimilarity matrix using d₂^S measure) of metatranscriptomic samples from NGS reads	Software (Python/R)	Jiang et al. (2012); Wang et al. (2014)
		d2SBin	Tool for improving contig binning: adjusts contigs among bins based on the output of any existing binning tool. Taxonomy-free; relies only on the k-tuples of a single metagenomic sample.	Software (C++)	Wang et al. (2017)
		VirHostMatcher	Prediction of hosts from metagenomic viral sequences based on oligonucleotide frequency (ONF) using various distance measures (e.g., d₂)	Software (C++)	Ahlgren et al. (2017)
		MetaFast	Statistics calculation of metagenome sequences and the distances between them based on assembly using de Bruijn graphs and Bray-Curtis dissimilarity measure	Software (Java)	Ulyantsev et al. (2017)
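
Several tools in the table above (MHAP, Mash) are described as relying on MinHash to compare k-mer sets without alignment. The general idea can be illustrated with a minimal sketch, assuming a bottom-s sketch built from a generic 64-bit hash. The function names, the hash choice, and the parameters `k` and `s` are illustrative assumptions and do not reproduce the implementation of any listed tool; reverse-complement (canonical) k-mers are ignored for brevity.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a stable 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq, k, s):
    """Return the s smallest distinct k-mer hashes of seq (bottom-s sketch)."""
    return sorted({hash_kmer(km) for km in kmers(seq, k)})[:s]


def jaccard_estimate(sketch_a, sketch_b, s):
    """Estimate the Jaccard index of two k-mer sets from their sketches.

    Take the s smallest hashes of the merged sketches and report the
    fraction of them that occur in both sketches.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:s]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    a = bottom_sketch("ACGTACGTGGATTACA", k=4, s=8)
    b = bottom_sketch("ACGTACGTGGATTGCA", k=4, s=8)
    print(jaccard_estimate(a, b, s=8))
```

When `s` is at least the number of distinct k-mers in both sequences, the estimate equals the exact Jaccard index; smaller sketches trade accuracy for memory.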

Category	Name	Features	Implementation	Authors
Pairwise & multiple sequence comparisons	ALF	Calculation of pairwise similarity scores (using N2 measure) for sequences in a FASTA file	Software (C++)	Göke et al. (2012)
	decaf+py	13 word-based measures; Lempel-Ziv complexity-based measure; Average Common Substring distance; W-metric	Software (Python)	Ren et al. (2013)
	multiAlignFree	Multiple alignment-free sequence comparison using 5 word-based statistics	R package	Höhl et al. (2006); Höhl et al. (2007)
	NASC	Non-Aligned Sequence Comparison: 4 word-based measures (e.g. Mahalanobis distance); 2 IT-based measures (Kolmogorov complexity)	Matlab framework	Vinga & Almeida (2003)
Whole-genome phylogeny	ALFRED, ALFRED-G	Phylogenetic tree reconstruction based on the Average Common Substring (ACS) approach	Software (C++)	Thankachan et al. (2016); Thankachan et al. (2017)
	andi	Computation of evolutionary distances between closely related genomes by approximation of local alignments (k-mer based da measure); scalable to thousands of bacterial genomes	Software (C)	Haubold et al. (2015)
	CAFE	Alignment-free analysis platform for studying the relationships among genomes and metagenomes (offers 28 word-based dissimilarity measures)	Software (C)	Lu et al. (2017)
	CVTree3	Phylogeny reconstruction from whole genome sequences based on word composition	Web service	Qi et al. (2004); Zuo & Hao (2015)
	DLTree	Automated whole genome/proteome-based phylogenetic analysis based on alignment-free dynamical language method.	Web service	Wu et al. (2017)
	FFP	Feature Frequency Profile-based measures for whole genome/proteome comparisons (from viral to mammalian scale)	Software (C/Perl)	Sims et al. (2009); Jun et al. (2010); Sims et al. (2011)
	jD2Stat (JIWA)	Generation of the distance matrix using D₂^S statistics to extract k-mers from large-scale unaligned genome sequences	Software (Java)	Chan et al. (2014)
	kr	Efficient word-based estimation of mutation distances from unaligned genomes	Software (C)	Haubold et al. (2009)
	FSWM	Fast approach to estimate phylogenetic distances between large genomic sequences based on inexact word matches	Software (C++); Web service	Leimeister et al. (2017)
	kmacs	k-mismatch average common substring approach to alignment-free sequence comparison		Leimeister & Morgenstern (2014)
	Spaced	Fast alignment-free sequence comparison using spaced-word frequencies (a few minutes for pair of eukaryotic genomes of a few hundred Mb)		Leimeister et al. (2014)
	SlopeTree	Whole genome phylogeny that corrects for Horizontal Gene Transfer	Software (C++)	Bromberg et al. (2016)
	Underlying Approach	Phylogeny of whole genomes using composition of subwords (Underlying Approach)	Software (Java)	Comin & Verzotto (2012)
Sequence similarity search tool	RAFTS3	Searches of similar protein sequences against a protein database (>300 times faster than BLAST)	Matlab	Vialle et al. (2016)
Annotation of long noncoding RNA	lncScore	Prediction of long noncoding RNA from assembled novel transcripts	Software (Python)	Zhao et al. (2016)
	FEELnc	Prediction of lncRNAs from RNA-seq samples based on a Random Forest model trained with multi k-mer frequencies and relaxed open reading frames.	Software (Perl/R)	Wucher et al. (2017)
Horizontal Gene Transfer (HGT)	alfy	Alignment-free local homology calculation for detecting horizontal gene transfer	Software (C)	Domazet-Lošo & Haubold (2011) Domazet-Lošo & Haubold (2011)
	rush	Detection of recombination between two unaligned DNA sequences	Software (C)	Haubold et al. (2013)
	Smash	Identification and visualization of genomic rearrangements between pairs of DNA sequences	Software (C)	Pratas et al. (2015)
	TF-IDF	Detection of HGT regions and the transfer direction in nucleotide/protein sequences	Software (C++)	Cong et al. (2016); Cong et al. (2017)
Regulatory elements	D2Z	Identification of functionally related homologous regulatory elements	Software (Perl)	Kantorovitz et al. (2007)
	MatrixREDUCE	Prediction of functional regulatory targets of TFs by predicting the total affinity of each promoter and orthologous promoters	Software (Python)	Ward & Bussemaker (2008)
	RRS	Detection of functionally similar group of enhancers and their regions	Software (Perl/C)	Koohy et al. (2010)
Sequence clustering	d2_cluster	Clustering EST and full-length cDNA sequences	Software (C)	Burke et al. (1999)
	d2-vlmc	Word-based clustering of metatranscriptomic samples using variable length Markov chains	Software (Python)	Liao et al. (2016)
	mBKM	Clustering of DNA sequences using Shannon entropy and Euclidean distance	Software (Java)	Wei et al. (2012)
	kClust	Large-scale clustering of protein sequences (down to 20-30% sequence identity)	Software (C++)	Hauser et al. (2013)
Other	COMET	Rapid classification of HIV-1 nucleotide sequences into subtypes based on prediction by partial matching compression	Web service	Struck et al. (2014)
	PPI	Identification of protein-protein interaction by coevolution analysis using discrete Fourier transform (DFT).	Software (Python)	Yin et al. (2017)
	VaxiJen	Antigen prediction based on auto cross covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties.	Web service	Doytchinova & Flower (2007)