Tools Annotation

Curated annotation tools, as listed under the GWAS Tools tab.

Summary Table

NAME	Main citation	YEAR
ANNOVAR	Wang K et al., Nucleic Acids Res, 2010	2010
SnpEff	Cingolani P et al., Fly (Austin), 2012	2012
VEP	McLaren W et al., Genome Biol, 2016	2016
loftee	Karczewski KJ et al., Nature, 2020	2020

ANNOVAR

Tool

PUBMED_LINK

20601685

FULL NAME

Annotate Variation

DESCRIPTION

ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected in diverse genomes (including the human genome builds hg18, hg19, and hg38, as well as mouse, worm, fly, yeast, and many others).

URL

https://annovar.openbioinformatics.org/en/latest/

TITLE

ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data.

Main citation

Wang K, Li M, Hakonarson H. (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res, 38 (16) e164. doi:10.1093/nar/gkq603. PMID 20601685

ABSTRACT

High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a 'variants reduction' protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.

DOI

10.1093/nar/gkq603
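
The stepwise "variants reduction" idea from the ANNOVAR abstract can be illustrated with a minimal Python sketch. This is not ANNOVAR's implementation: the `Variant` record, the particular filters, their order, and the "at least two remaining variants per gene" rule for a recessive disease are all assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # Hypothetical record; field names are illustrative, not ANNOVAR output columns.
    gene: str
    region: str      # e.g. "exonic", "splicing", "intronic"
    effect: str      # e.g. "nonsynonymous", "synonymous", "none"
    in_dbsnp: bool
    in_1000g: bool


def reduce_variants(variants, min_hits_per_gene=2):
    """Apply filters one at a time and report how many variants survive each step."""
    # Assumed filter chain: each entry is (label, predicate that keeps a variant).
    steps = [
        ("exonic or splicing", lambda v: v.region in {"exonic", "splicing"}),
        ("not synonymous", lambda v: v.effect != "synonymous"),
        ("absent from 1000 Genomes", lambda v: not v.in_1000g),
        ("absent from dbSNP", lambda v: not v.in_dbsnp),
    ]
    remaining = list(variants)
    log = [("input", len(remaining))]
    for label, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        log.append((label, len(remaining)))

    # Assumed recessive model: keep genes that still carry enough variants.
    hits = Counter(v.gene for v in remaining)
    candidates = sorted(g for g, n in hits.items() if n >= min_hits_per_gene)
    return candidates, log


# Toy input with made-up gene names.
toy = [
    Variant("GENE_A", "exonic", "nonsynonymous", False, False),
    Variant("GENE_A", "exonic", "nonsynonymous", False, False),
    Variant("GENE_B", "exonic", "synonymous", False, False),
    Variant("GENE_C", "intronic", "none", False, False),
    Variant("GENE_D", "exonic", "nonsynonymous", True, True),
    Variant("GENE_E", "splicing", "none", False, False),
]
candidates, log = reduce_variants(toy)
for label, count in log:
    print(f"{label}: {count} variants remain")
print("candidate genes:", candidates)
```

Keeping a per-step count makes it easy to see which filter removes the most variants, which mirrors the stepwise exclusion the abstract describes.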

SnpEff

Tool

PUBMED_LINK

22728672

FULL NAME

SNP effect

DESCRIPTION

Genetic variant annotation and functional effect prediction toolbox. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).

URL

http://pcingola.github.io/SnpEff/

TITLE

A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.

Main citation

Cingolani P, Platts A, Wang le L, Coon M, ... & Ruden DM. (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin), 6 (2) 80-92. doi:10.4161/fly.19695. PMID 22728672

ABSTRACT

We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w(1118); iso-2; iso-3 strain and the reference y(1); cn(1) bw(1) sp(1) strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in the 5'UTR. We found, as expected, that the SNP frequency is proportional to the recombination frequency (i.e., highest in the middle of chromosome arms). We also found that start-gain or stop-lost SNPs in Drosophila melanogaster often result in additions of N-terminal or C-terminal amino acids that are conserved in other Drosophila species. It appears that the 5' and 3' UTRs are reservoirs for genetic variations that change the termini of proteins during evolution of the Drosophila genus. As genome sequencing is becoming inexpensive and routine, SnpEff enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory.

DOI

10.4161/fly.19695
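
The coding-effect categories named in the SnpEff abstract (synonymous, non-synonymous, stop gained, stop lost) can be illustrated with a small Python sketch that compares a reference codon with an altered codon using the standard genetic code. This is a simplification written for this listing, not SnpEff's own code: it assumes in-frame codons on the coding strand and ignores transcript models, start codon changes, and frame shifts. The function name `classify_codon_change` is hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Label a single-codon substitution with one of four coding-effect categories."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "non_synonymous"


print(classify_codon_change("GAA", "GAG"))  # Glu -> Glu
print(classify_codon_change("TAC", "TAA"))  # Tyr -> stop
print(classify_codon_change("TAA", "CAA"))  # stop -> Gln
print(classify_codon_change("GAA", "GTA"))  # Glu -> Val

# The N/S ratio quoted in the abstract, recomputed from its two counts.
n_s_ratio = round(4467 / 15842, 2)
print("N/S ratio:", n_s_ratio)
```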

VEP

Tool

PUBMED_LINK

27268795

FULL NAME

Ensembl Variant Effect Predictor

DESCRIPTION

A toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions.

URL

https://asia.ensembl.org/info/docs/tools/vep/index.html

TITLE

The Ensembl Variant Effect Predictor.

Main citation

McLaren W, Gil L, Hunt SE, Riat HS, ... & Cunningham F. (2016) The Ensembl Variant Effect Predictor. Genome Biol, 17 (1) 122. doi:10.1186/s13059-016-0974-4. PMID 27268795

ABSTRACT

The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.

DOI

10.1186/s13059-016-0974-4

loftee

Tool

PUBMED_LINK

32461654

FULL NAME

Loss-Of-Function Transcript Effect Estimator

DESCRIPTION

A VEP plugin that identifies loss-of-function (LoF) variation. It currently assesses stop-gained, splice-site-disrupting, and frameshift variants.

URL

https://github.com/konradjk/loftee

TITLE

The mutational constraint spectrum quantified from variation in 141,456 humans.

Main citation

Karczewski KJ, Francioli LC, Tiao G, Cummings BB, ... & MacArthur DG. (2020) The mutational constraint spectrum quantified from variation in 141,456 humans. Nature, 581 (7809) 434-443. doi:10.1038/s41586-020-2308-7. PMID 32461654

ABSTRACT

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

DOI

10.1038/s41586-020-2308-7
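
The three variant classes named in the loftee description can be illustrated with a short Python sketch that flags a VEP consequence string as a candidate LoF. This is not the plugin's logic, and any further filtering the plugin performs is not reproduced here. Two points are assumptions: that "splice site disrupting" corresponds to the consequence terms `splice_donor_variant` and `splice_acceptor_variant`, and that multiple terms for one transcript are joined with `&`. The helper name `is_candidate_lof` is hypothetical.

```python
# Assumed mapping from the three classes in the description to consequence terms.
LOF_CONSEQUENCES = frozenset({
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
})


def is_candidate_lof(consequence_field: str) -> bool:
    """Return True if any '&'-separated consequence term is in the LoF set."""
    terms = set(consequence_field.split("&"))
    return not terms.isdisjoint(LOF_CONSEQUENCES)


examples = [
    "stop_gained",
    "missense_variant",
    "splice_donor_variant&intron_variant",
    "frameshift_variant&stop_lost",
]
for field in examples:
    print(field, "->", is_candidate_lof(field))
```

A consequence-term match of this kind only nominates candidates; as the abstract above notes, predicted loss-of-function variants are enriched for annotation errors and need careful filtering.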