Annotation

Summary Table

| NAME | CITATION | YEAR |
| --- | --- | --- |
| ANNOVAR | Wang K, Li M, Hakonarson H. (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data Nucleic Acids Res., 38 (16) e164. doi:10.1093/nar/gkq603. PMID 20601685 | 2010 |
| SnpEff | Cingolani P, Platts A, Wang leL, Coon M, ...&, Ruden DM. (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3 Fly, 6 (2) 80-92. doi:10.4161/fly.19695. PMID 22728672 | 2012 |
| VEP | McLaren W, Gil L, Hunt SE, Riat HS, ...&, Cunningham F. (2016) The Ensembl Variant Effect Predictor Genome Biol., 17 (1) 122. doi:10.1186/s13059-016-0974-4. PMID 27268795 | 2016 |
| loftee | Karczewski KJ, Francioli LC, Tiao G, Cummings BB, ...&, MacArthur DG. (2020) The mutational constraint spectrum quantified from variation in 141,456 humans Nature, 581 (7809) 434-443. doi:10.1038/s41586-020-2308-7. PMID 32461654 | 2020 |

ANNOVAR

  • NAME : ANNOVAR
  • SHORT NAME : ANNOVAR
  • FULL NAME : Annotate Variation
  • DESCRIPTION : ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome builds hg18, hg19, and hg38, as well as mouse, worm, fly, yeast, and many others).
  • URL : https://annovar.openbioinformatics.org/en/latest/
  • TITLE : ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data
  • DOI : 10.1093/nar/gkq603
  • ABSTRACT : High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a 'variants reduction' protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
  • CITATION : Wang K, Li M, Hakonarson H. (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data Nucleic Acids Res., 38 (16) e164. doi:10.1093/nar/gkq603. PMID 20601685
  • JOURNAL_INFO : Nucleic acids research ; Nucleic Acids Res. ; 2010 ; 38 ; 16 ; e164
  • PUBMED_LINK : 20601685
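
The stepwise "variants reduction" idea in the abstract can be pictured as a chain of filters, each one shrinking the candidate set. The sketch below is only an illustration of that pattern, not ANNOVAR's implementation: the record fields (`region`, `in_1000g`, `in_dbsnp`), the specific filter steps, and the toy variants are all assumptions made up for this example, since the abstract does not list the actual criteria.

```python
from typing import Callable, Dict, List, Tuple

Variant = Dict[str, object]
Step = Tuple[str, Callable[[Variant], bool]]


def reduce_variants(variants: List[Variant], steps: List[Step]):
    """Apply each filter in order and record how many variants survive."""
    remaining = list(variants)
    counts = [("input", len(remaining))]
    for label, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        counts.append((label, len(remaining)))
    return remaining, counts


# Hypothetical filter steps; the real criteria are not given in the abstract.
STEPS: List[Step] = [
    ("exonic or splicing", lambda v: v["region"] in {"exonic", "splicing"}),
    ("absent from 1000 Genomes", lambda v: not v["in_1000g"]),
    ("absent from dbSNP", lambda v: not v["in_dbsnp"]),
]

# Toy records, invented for illustration only.
TOY_VARIANTS: List[Variant] = [
    {"id": "v1", "region": "intronic", "in_1000g": False, "in_dbsnp": False},
    {"id": "v2", "region": "exonic", "in_1000g": True, "in_dbsnp": True},
    {"id": "v3", "region": "exonic", "in_1000g": False, "in_dbsnp": True},
    {"id": "v4", "region": "splicing", "in_1000g": False, "in_dbsnp": False},
]

candidates, counts = reduce_variants(TOY_VARIANTS, STEPS)
for label, n in counts:
    print(f"{label}: {n}")
```

Keeping the per-step counts makes it easy to see which filter removes the most variants, which is the kind of bookkeeping a reduction protocol relies on.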

SnpEff

  • NAME : SnpEff
  • SHORT NAME : SnpEff
  • FULL NAME : SNP effect
  • DESCRIPTION : Genetic variant annotation and functional effect prediction toolbox. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
  • URL : http://pcingola.github.io/SnpEff/
  • TITLE : A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3
  • DOI : 10.4161/fly.19695
  • ABSTRACT : We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w(1118); iso-2; iso-3 strain and the reference y(1); cn(1) bw(1) sp(1) strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in the 5'UTR. We found, as expected, that the SNP frequency is proportional to the recombination frequency (i.e., highest in the middle of chromosome arms). We also found that start-gain or stop-lost SNPs in Drosophila melanogaster often result in additions of N-terminal or C-terminal amino acids that are conserved in other Drosophila species. It appears that the 5' and 3' UTRs are reservoirs for genetic variations that change the termini of proteins during evolution of the Drosophila genus. As genome sequencing is becoming inexpensive and routine, SnpEff enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory.
  • CITATION : Cingolani P, Platts A, Wang leL, Coon M, ...&, Ruden DM. (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3 Fly, 6 (2) 80-92. doi:10.4161/fly.19695. PMID 22728672
  • JOURNAL_INFO : Fly ; Fly ; 2012 ; 6 ; 2 ; 80-92
  • PUBMED_LINK : 22728672
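
The coding-effect categories named in the abstract (synonymous, non-synonymous, stop codon gains and losses) follow from comparing the reference and alternate codons under the genetic code. The sketch below shows that comparison for a single codon substitution using the standard genetic code. It is not SnpEff's code: the function name and the returned labels are hypothetical, and start-codon effects and frame shifts are left out because they need transcript context. The last lines recompute the N/S ratio from the two counts quoted in the abstract.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Label a codon substitution; the labels are illustrative, not SnpEff output."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "non_synonymous"


print(classify_codon_change("CTT", "CTC"))  # Leu -> Leu
print(classify_codon_change("TGG", "TGA"))  # Trp -> stop
print(classify_codon_change("TAA", "CAA"))  # stop -> Gln
print(classify_codon_change("GAA", "GTA"))  # Glu -> Val

# N/S ratio from the counts quoted in the abstract.
non_synonymous, synonymous = 4467, 15842
print(round(non_synonymous / synonymous, 2))
```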

VEP

  • NAME : VEP
  • SHORT NAME : VEP
  • FULL NAME : Ensembl Variant Effect Predictor
  • DESCRIPTION : A toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It is open source and free to use.
  • URL : https://asia.ensembl.org/info/docs/tools/vep/index.html
  • TITLE : The Ensembl Variant Effect Predictor
  • DOI : 10.1186/s13059-016-0974-4
  • ABSTRACT : The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.
  • CITATION : McLaren W, Gil L, Hunt SE, Riat HS, ...&, Cunningham F. (2016) The Ensembl Variant Effect Predictor Genome Biol., 17 (1) 122. doi:10.1186/s13059-016-0974-4. PMID 27268795
  • JOURNAL_INFO : Genome biology ; Genome Biol. ; 2016 ; 17 ; 1 ; 122
  • PUBMED_LINK : 27268795

loftee

  • NAME : loftee
  • SHORT NAME : LOFTEE
  • FULL NAME : Loss-Of-Function Transcript Effect Estimator
  • DESCRIPTION : A VEP plugin to identify LoF (loss-of-function) variation. It currently assesses stop-gained, splice-site-disrupting, and frameshift variants.
  • URL : https://github.com/konradjk/loftee
  • TITLE : The mutational constraint spectrum quantified from variation in 141,456 humans
  • DOI : 10.1038/s41586-020-2308-7
  • ABSTRACT : Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.
  • CITATION : Karczewski KJ, Francioli LC, Tiao G, Cummings BB, ...&, MacArthur DG. (2020) The mutational constraint spectrum quantified from variation in 141,456 humans Nature, 581 (7809) 434-443. doi:10.1038/s41586-020-2308-7. PMID 32461654
  • JOURNAL_INFO : Nature ; Nature ; 2020 ; 581 ; 7809 ; 434-443
  • PUBMED_LINK : 32461654
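
The three variant classes named in the LOFTEE description can be picked out of VEP-style consequence strings before any further filtering. The sketch below covers only that first selection step and does not reproduce LOFTEE's own quality filters. It assumes Sequence Ontology consequence terms as reported by VEP and assumes that multiple terms for one transcript are joined with `&`; the function name is hypothetical.

```python
# Consequence terms matching the classes in the description above:
# stop-gained, splice-site-disrupting (donor/acceptor), and frameshift.
LOF_CANDIDATE_TERMS = frozenset(
    {
        "stop_gained",
        "frameshift_variant",
        "splice_donor_variant",
        "splice_acceptor_variant",
    }
)


def is_lof_candidate(consequence: str) -> bool:
    """Return True if any '&'-joined term falls in a LoF candidate class."""
    terms = consequence.split("&")
    return any(term in LOF_CANDIDATE_TERMS for term in terms)


for csq in (
    "stop_gained",
    "missense_variant",
    "splice_donor_variant&intron_variant",
):
    print(csq, is_lof_candidate(csq))
```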