Tools Annotation

Curated list of annotation tools, found under the GWAS Tools tab.

Summary Table

NAME | Main citation | YEAR
ANNOVAR | Wang K et al., Nucleic Acids Res, 2010 | 2010
SnpEff | Cingolani P et al., Fly (Austin), 2012 | 2012
VEP | McLaren W et al., Genome Biol, 2016 | 2016
loftee | Karczewski KJ et al., Nature, 2020 | 2020

ANNOVAR

Tool
PUBMED_LINK
20601685
FULL NAME
Annotate Variation
DESCRIPTION
ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome builds hg18, hg19, and hg38, as well as mouse, worm, fly, yeast, and many others).
URL
https://annovar.openbioinformatics.org/en/latest/
TITLE
ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data.
Main citation
Wang K, Li M, Hakonarson H. (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res, 38 (16) e164. doi:10.1093/nar/gkq603. PMID 20601685
ABSTRACT
High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a 'variants reduction' protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
DOI
10.1093/nar/gkq603
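
The stepwise "variants reduction" idea described in the abstract can be sketched in Python. This is a minimal illustration, not ANNOVAR's implementation or command-line interface: the `Variant` record, the consequence labels, the order of the filters, and the two-hits-per-gene rule for a recessive disease are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical, already-annotated variant (not ANNOVAR's output format)."""
    gene: str
    consequence: str  # illustrative labels, e.g. "nonsynonymous", "intronic"
    in_1000g: bool    # reported in the 1000 Genomes Project
    in_dbsnp: bool    # reported in dbSNP


# Consequence labels treated as potentially protein-altering (an assumption).
PROTEIN_ALTERING = {"nonsynonymous", "stopgain", "stoploss", "frameshift", "splicing"}


def reduce_variants(variants, min_hits_per_gene=2):
    """Apply filters one after another; return per-step counts and candidate genes."""
    steps = [
        ("protein-altering", lambda v: v.consequence in PROTEIN_ALTERING),
        ("absent from 1000 Genomes", lambda v: not v.in_1000g),
        ("absent from dbSNP", lambda v: not v.in_dbsnp),
    ]
    remaining = list(variants)
    counts = [("input", len(remaining))]
    for label, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        counts.append((label, len(remaining)))

    # Recessive model (assumed): a gene needs several surviving variants.
    per_gene = defaultdict(list)
    for v in remaining:
        per_gene[v.gene].append(v)
    candidates = sorted(g for g, vs in per_gene.items() if len(vs) >= min_hits_per_gene)
    return counts, candidates


# Toy input with made-up gene names.
variants = [
    Variant("GENE_A", "nonsynonymous", False, False),
    Variant("GENE_A", "stopgain", False, False),
    Variant("GENE_B", "nonsynonymous", True, True),
    Variant("GENE_C", "synonymous", False, False),
    Variant("GENE_D", "frameshift", False, True),
    Variant("GENE_E", "nonsynonymous", False, False),
]
counts, candidates = reduce_variants(variants)
for label, n in counts:
    print(f"{label}: {n}")
print("candidate genes:", candidates)
```

Recording the count after every step makes it easy to see which filter removes the most variants, which is the point of a stepwise reduction.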

SnpEff

Tool
PUBMED_LINK
22728672
FULL NAME
SNP effect
DESCRIPTION
Genetic variant annotation and functional effect prediction toolbox. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
URL
http://pcingola.github.io/SnpEff/
TITLE
A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.
Main citation
Cingolani P, Platts A, Wang le L, Coon M, ... & Ruden DM. (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin), 6 (2) 80-92. doi:10.4161/fly.19695. PMID 22728672
ABSTRACT
We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w(1118); iso-2; iso-3 strain and the reference y(1); cn(1) bw(1) sp(1) strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in the 5'UTR. We found, as expected, that the SNP frequency is proportional to the recombination frequency (i.e., highest in the middle of chromosome arms). We also found that start-gain or stop-lost SNPs in Drosophila melanogaster often result in additions of N-terminal or C-terminal amino acids that are conserved in other Drosophila species. It appears that the 5' and 3' UTRs are reservoirs for genetic variations that changes the termini of proteins during evolution of the Drosophila genus. As genome sequencing is becoming inexpensive and routine, SnpEff enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory.
DOI
10.4161/fly.19695
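
The coding effects listed in the abstract (synonymous, non-synonymous, stop gained, stop lost) can be illustrated with a short Python sketch that compares a reference codon with its single-base variant under the standard genetic code. This is not SnpEff's code or its effect vocabulary: the function name and the returned labels are hypothetical, and start-codon effects, frameshifts, and strand handling are deliberately left out.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_codon, alt_codon):
    """Classify a single-base codon change (labels are illustrative only)."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if ref_codon not in CODON_TABLE or alt_codon not in CODON_TABLE:
        raise ValueError("codons must be three bases drawn from A, C, G, T")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one position")

    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "non_synonymous"


print(classify_snp("TTT", "TTC"))  # Phe -> Phe
print(classify_snp("GAA", "GTA"))  # Glu -> Val
print(classify_snp("TGG", "TGA"))  # Trp -> stop
print(classify_snp("TAA", "CAA"))  # stop -> Gln

# Ratio of non-synonymous to synonymous SNP counts quoted in the abstract.
n_over_s = 4467 / 15842
print(round(n_over_s, 2))
```

The last two lines reproduce the abstract's N/S figure of about 0.28 from the quoted counts of 4,467 non-synonymous and 15,842 synonymous SNPs.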

VEP

Tool
PUBMED_LINK
27268795
FULL NAME
Ensembl Variant Effect Predictor
DESCRIPTION
A toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It is open source and free to use, and it offers several interfaces to suit different requirements.
URL
https://asia.ensembl.org/info/docs/tools/vep/index.html
TITLE
The Ensembl Variant Effect Predictor.
Main citation
McLaren W, Gil L, Hunt SE, Riat HS, ... & Cunningham F. (2016) The Ensembl Variant Effect Predictor. Genome Biol, 17 (1) 122. doi:10.1186/s13059-016-0974-4. PMID 27268795
ABSTRACT
The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.
DOI
10.1186/s13059-016-0974-4

loftee

Tool
PUBMED_LINK
32461654
FULL NAME
Loss-Of-Function Transcript Effect Estimator
DESCRIPTION
A VEP plugin to identify LoF (loss-of-function) variation. It currently assesses stop-gained, splice-site-disrupting, and frameshift variants.
URL
https://github.com/konradjk/loftee
TITLE
The mutational constraint spectrum quantified from variation in 141,456 humans.
Main citation
Karczewski KJ, Francioli LC, Tiao G, Cummings BB, ... & MacArthur DG. (2020) The mutational constraint spectrum quantified from variation in 141,456 humans. Nature, 581 (7809) 434-443. doi:10.1038/s41586-020-2308-7. PMID 32461654
ABSTRACT
Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.
DOI
10.1038/s41586-020-2308-7
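
The variant classes named in the description (stop-gained, splice-site-disrupting, frameshift) can be picked out of a VEP-style consequence string with a few lines of Python. This is only a pre-selection sketch and does not reproduce loftee's filters, flags, or confidence calls. It assumes that consequences are given as Sequence Ontology terms joined by `&`, and that splice donor and splice acceptor variants are what "splice-site-disrupting" refers to. The function name is hypothetical.

```python
# Sequence Ontology terms assumed to correspond to the classes in the description.
CANDIDATE_LOF_TERMS = frozenset({
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
})


def candidate_lof_terms(consequence):
    """Return the candidate LoF terms found in an '&'-joined consequence string."""
    terms = {term.strip() for term in consequence.split("&") if term.strip()}
    return terms & CANDIDATE_LOF_TERMS


examples = [
    "stop_gained&splice_region_variant",
    "missense_variant",
    "frameshift_variant",
]
for consequence in examples:
    hits = candidate_lof_terms(consequence)
    status = "candidate LoF" if hits else "not a candidate"
    print(f"{consequence}: {status}")
```

A variant that passes this check would still need the plugin's own assessment before being treated as a high-confidence loss-of-function call.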