
WGS

Catalog entries using this tag:

Entries

Whole-genome sequencing

UK Biobank WGS
STAGE_PERIOD
phased release
DESCRIPTION
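Whole-genome sequencing of nearly all of the ~500,000 UK Biobank participants, delivered in phased releases, supports CNV analysis, refinement of short-variant calls, and single-sample graph workflows; companion GWAS resources built on the array genotypes (e.g. Pan-UKB) extend association analyses across ancestries.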
URL
https://www.ukbiobank.ac.uk/

b37

Reference 1000 Genomes WGS
FULL NAME
Broad Institute Homo_sapiens_assembly19 (b37)
DESCRIPTION
GRCh37-compatible reference FASTA used across Broad Institute and 1000 Genomes workflows: chromosomes 1-22, X, Y, MT, plus GL/NC unlocalized and unplaced contigs (as in the distributed assembly19 package). Coordinate system matches the 1KG/b37 ecosystem used by many GWAS imputation and joint-calling pipelines.
URL
https://data.broadinstitute.org/snowman/hg19/
KEYWORDS
GRCh37; 1000 Genomes; Broad; b37; reference FASTA
Main citation
Broad Institute / 1000 Genomes Project. Homo_sapiens_assembly19.fasta (b37). https://data.broadinstitute.org/snowman/hg19/
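To confirm which of these contig groups a local copy actually contains, a minimal Python sketch can read the samtools `.fai` index (tab-separated name, length, offset, line bases, line width) and bucket the names. The grouping rules (`GL` prefix for unlocalized/unplaced contigs, `NC_` for extra accessions such as EBV) are assumptions based on the naming described in this entry, and the helper names are hypothetical.

```python
from typing import Dict, List

# b37 primary chromosome names: no "chr" prefix, mitochondrion named "MT".
PRIMARY_B37 = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]


def read_fai(text: str) -> Dict[str, int]:
    """Return {contig name: length} from the text of a samtools .fai index."""
    lengths: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # .fai columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
        fields = line.split("\t")
        lengths[fields[0]] = int(fields[1])
    return lengths


def classify_b37(names: List[str]) -> Dict[str, List[str]]:
    """Group contig names into primary, GL (unlocalized/unplaced), NC_ accessions and other."""
    groups: Dict[str, List[str]] = {"primary": [], "GL": [], "NC": [], "other": []}
    for name in names:
        if name in PRIMARY_B37:
            groups["primary"].append(name)
        elif name.startswith("GL"):
            groups["GL"].append(name)
        elif name.startswith("NC_"):
            groups["NC"].append(name)
        else:
            groups["other"].append(name)
    return groups


def missing_primary(names: List[str]) -> List[str]:
    """List b37 primary chromosomes absent from `names` (e.g. when a FASTA is chr-prefixed)."""
    present = set(names)
    return [c for c in PRIMARY_B37 if c not in present]
```

A non-empty `missing_primary` result on an index that clearly has chromosomes usually means the file follows a different naming convention (such as UCSC `chr1`) rather than b37.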

b38

Reference WGS
FULL NAME
Broad Institute Homo_sapiens_assembly38 (b38)
DESCRIPTION
GRCh38-based reference FASTA distributed in the GATK resource bundle and used by Broad pipelines (Homo_sapiens_assembly38): chr-prefixed primary chromosomes, unlocalized and unplaced scaffolds, ALT contigs, the hs38d1 decoy sequences, HLA contigs, and chrEBV. Default reference for many germline short-variant and joint-genotyping workflows on cloud and HPC.
URL
https://storage.googleapis.com/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta
KEYWORDS
GRCh38; GATK; Broad; b38; reference FASTA
Main citation
Broad Institute. Homo_sapiens_assembly38.fasta (GATK GRCh38 reference bundle). https://storage.googleapis.com/genomics-public-data/references/hg38/v0/
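To see which contig classes a given GRCh38 FASTA index or BAM header carries, for instance before deciding whether ALT-aware alignment is needed, a minimal Python sketch can bucket names by suffix and prefix. The patterns (`_alt`, `_decoy`, `_random`, `chrUn_`, `HLA-`, `chrEBV`) are assumptions based on the usual Homo_sapiens_assembly38 naming, not a formal specification, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# chr1..chr22, chrX, chrY, chrM (assumed primary naming in a chr-prefixed GRCh38 FASTA).
PRIMARY = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)")


def contig_class(name: str) -> str:
    """Return a coarse class for a GRCh38 contig name."""
    if PRIMARY.fullmatch(name):
        return "primary"
    if name.endswith("_decoy"):
        return "decoy"
    if name.endswith("_alt"):
        return "alt"
    if name.startswith("HLA-"):
        return "hla"
    if name == "chrEBV":
        return "ebv"
    if name.endswith("_random") or name.startswith("chrUn_"):
        return "unlocalized_or_unplaced"
    return "other"


def summarize(names: Iterable[str]) -> Counter:
    """Count contigs per class."""
    return Counter(contig_class(n) for n in names)
```

A non-zero `alt` count suggests the downstream aligner should be configured for ALT-aware mapping; an `other` count flags names that fit none of the assumed patterns and deserve a manual look.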

CHM13

Reference Structural variants WGS
PUBMED_LINK
35357919
FULL NAME
T2T-CHM13 v1.1 complete hydatidiform mole assembly
DESCRIPTION
Telomere-to-telomere (T2T) assembly of the CHM13 hydatidiform mole cell line (46,XX), giving the first gapless sequence of all 22 autosomes and chromosome X, including centromeric satellite arrays and the short arms of the acrocentric chromosomes. CHM13 carries no Y chromosome; the later T2T-CHM13v2.0 release adds a Y assembled from HG002. Use it as a complement to GRCh38 for studying repetitive and structurally variable loci. Chromosome naming and coordinates differ from the GRC primary assemblies, so use liftover and T2T-specific tooling where appropriate.
URL
https://github.com/marbl/CHM13
KEYWORDS
T2T; telomere-to-telomere; complete genome; CHM13; GRCh38 alternative
TITLE
The complete sequence of a human genome.
Main citation
Nurk S, Koren S, Rhie A, Rautiainen M, ..., Phillippy AM. (2022) The complete sequence of a human genome. Science, 376 (6588) 44-53. doi:10.1126/science.abj6987. PMID 35357919
ABSTRACT
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
DOI
10.1126/science.abj6987

CPC

Pangenome Reference WGS
PUBMED_LINK
37316654
FULL NAME
Chinese Pangenome Consortium (phase I core)
DESCRIPTION
Phase I data from the Chinese Pangenome Consortium: 116 high-quality haplotype-phased de novo assemblies from 58 core samples representing 36 minority Chinese ethnic groups (average 30.65× high-fidelity long-read coverage). Adds substantial novel sequence and variant discovery relative to GRCh38 and supports population-specific reference panels for Asian-ancestry genomics.
URL
https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA011422
KEYWORDS
pangenome; Chinese populations; long-read; haplotype; GRCh38
TITLE
A pangenome reference of 36 Chinese populations.
Main citation
Gao Y, Yang X, Chen H, Tan X, ..., Xu S. (2023) A pangenome reference of 36 Chinese populations. Nature, 619 (7968) 112-121. doi:10.1038/s41586-023-06173-7. PMID 37316654
ABSTRACT
Human genomics is witnessing an ongoing paradigm shift from a single reference sequence to a pangenome form, but populations of Asian ancestry are underrepresented. Here we present data from the first phase of the Chinese Pangenome Consortium, including a collection of 116 high-quality and haplotype-phased de novo assemblies based on 58 core samples representing 36 minority Chinese ethnic groups. With an average 30.65× high-fidelity long-read sequence coverage, an average contiguity N50 of more than 35.63 megabases and an average total size of 3.01 gigabases, the CPC core assemblies add 189 million base pairs of euchromatic polymorphic sequences and 1,367 protein-coding gene duplications to GRCh38. We identified 15.9 million small variants and 78,072 structural variants, of which 5.9 million small variants and 34,223 structural variants were not reported in a recently released pangenome reference. The Chinese Pangenome Consortium data demonstrate a remarkable increase in the discovery of novel and missing sequences when individuals are included from underrepresented minority ethnic groups. The missing reference sequences were enriched with archaic-derived alleles and genes that confer essential functions related to keratinization, response to ultraviolet radiation, DNA repair, immunological responses and lifespan, implying great potential for shedding new light on human evolution and recovering missing heritability in complex disease mapping.
DOI
10.1038/s41586-023-06173-7
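The contiguity figure in the abstract is an N50: sort contig lengths in decreasing order and report the length at which the running total first reaches half of the assembly size. A minimal Python sketch of that definition (an illustration of the metric, not the consortium's own pipeline; the function name is hypothetical):

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L hold at least half the total."""
    ordered = sorted((x for x in lengths if x > 0), reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # First contig at which the cumulative sum reaches half of the total.
        if 2 * running >= total:
            return length
    return ordered[-1]  # unreachable: the loop always returns once running == total
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8, because 10 + 9 + 8 = 27 is the first cumulative sum reaching half of the 54 bp total. Comparing `2 * running` with `total` avoids floating-point halves for odd totals.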

GRCh37.p13

Reference WGS
FULL NAME
Genome Reference Consortium Human Build 37 patch release 13
DESCRIPTION
NCBI/GRC human assembly build 37, patch 13 (GCF_000001405.25): the authoritative GRCh37 patch-level reference used for stable accessioning and alignment. Distinct from UCSC hg19/Broad b37 contig naming; always verify chromosome naming and inclusion of ALT/patch contigs when mixing resources.
URL
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.25/
KEYWORDS
GRCh37; GRC; NCBI; reference assembly; patch 13
Main citation
Genome Reference Consortium. Human genome assembly GRCh37.p13 (GCF_000001405.25). National Center for Biotechnology Information.
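When mixing GRC/b37-named and UCSC-named GRCh37 resources, a conservative renaming helper only touches contigs whose sequences are known to be identical. A minimal Python sketch, assuming only chromosomes 1-22, X and Y are safe to rename between b37 and hg19 (MT and hg19 chrM are different mitochondrial sequences, and unplaced contigs follow different schemes such as `GL000220.1` versus `chrUn_gl000220`); the helper names are hypothetical.

```python
from typing import Optional

# Contigs assumed to have identical sequence in b37 and UCSC hg19.
SAFE_PRIMARY = [str(i) for i in range(1, 23)] + ["X", "Y"]


def b37_to_ucsc(name: str) -> Optional[str]:
    """Map "1".."22", "X", "Y" to "chr1".."chrY"; return None for anything else."""
    if name in SAFE_PRIMARY:
        return "chr" + name
    # MT (rCRS) vs hg19 chrM differ in sequence, and GL contigs are named differently,
    # so these are left for explicit, sequence-checked handling.
    return None


def ucsc_to_b37(name: str) -> Optional[str]:
    """Inverse mapping for the same safe set; None for chrM, chrUn_*, *_random, etc."""
    if name.startswith("chr") and name[3:] in SAFE_PRIMARY:
        return name[3:]
    return None
```

Returning `None` instead of guessing forces callers to decide explicitly what to do with mitochondrial and unplaced contigs.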

GRCh38.p14

Reference WGS
FULL NAME
Genome Reference Consortium Human Build 38 patch release 14
DESCRIPTION
NCBI/GRC human assembly build 38, patch 14 (GCF_000001405.40): current GRC primary human reference on the GRCh38 line, including cumulative sequence fixes and scaffold updates through p14. Use this accession when you need the exact GRC patch level that matches NCBI/RefSeq alignment products.
URL
https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.40
KEYWORDS
GRCh38; GRC; NCBI; reference assembly; patch 14
Main citation
Genome Reference Consortium. Human genome assembly GRCh38.p14 (GCF_000001405.40). National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.40

GRCh39 (indefinitely postponed) (GRCh39)

Reference WGS
FULL NAME
Genome Reference Consortium Human Build 39 (not pursued)
DESCRIPTION
The Genome Reference Consortium announced that work toward a distinct GRCh39 assembly line was indefinitely postponed; human reference updates continue on the GRCh38 series (patches) and complementary resources such as T2T-CHM13 and pangenome references. Check the GRC human page for current guidance and patch releases.
URL
https://www.ncbi.nlm.nih.gov/grc/human
KEYWORDS
GRC; GRCh39; reference assembly; postponed
Main citation
Genome Reference Consortium. Human genome reference updates (GRCh39 indefinitely postponed; continued GRCh38 patches). https://www.ncbi.nlm.nih.gov/grc/human

hg19

Reference WGS
FULL NAME
UCSC hg19 (GRCh37) reference bundle
DESCRIPTION
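UCSC Genome Browser distribution of the GRCh37-era human reference (hg19): chromosomes chr1-22, chrX, chrY, chrM, plus unlocalized and unplaced contigs, alternate loci (e.g. chr6_apd_hap1), and related patches as packaged for the browser. Note that hg19 chrM is the NC_001807 mitochondrial sequence rather than the rCRS used as MT in GRCh37 and b37, so mitochondrial coordinates do not match between them. Widely used in legacy pipelines and liftOver chains to/from hg38.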
URL
https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/
KEYWORDS
GRCh37; UCSC; reference genome; FASTA; legacy assembly
Main citation
UCSC Genome Browser. Human reference assembly hg19 (GRCh37-aligned). https://hgdownload.soe.ucsc.edu/goldenPath/hg19/
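The liftOver chains mentioned above are UCSC chain files: a header line per chain followed by `size dt dq` lines (an ungapped block, then gaps on the source and destination) and a final `size` line. As a rough illustration of how a chain maps coordinates, and not a replacement for the UCSC `liftOver` tool (which also handles intervals, scoring and minimum-match filtering), a minimal Python sketch that lifts a single 0-based position. It assumes the source strand is always `+` and takes the first matching block; the names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    t_name: str    # source ("target") chromosome, e.g. an hg19 name
    t_start: int   # 0-based block start on the source
    size: int      # ungapped block length
    q_name: str    # destination ("query") chromosome
    q_size: int    # destination chromosome length
    q_strand: str  # "+" or "-"
    q_start: int   # 0-based block start on the destination strand q_strand


def parse_chains(text: str) -> List[Block]:
    """Expand chain records into ungapped aligned blocks."""
    blocks: List[Block] = []
    header: Optional[List[str]] = None
    t_pos = q_pos = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            header = fields
            t_pos, q_pos = int(fields[5]), int(fields[10])
            continue
        if header is None:
            raise ValueError("alignment line before any chain header")
        size = int(fields[0])
        blocks.append(Block(header[2], t_pos, size, header[7], int(header[8]), header[9], q_pos))
        if len(fields) == 3:  # "size dt dq": skip the gaps before the next block
            t_pos += size + int(fields[1])
            q_pos += size + int(fields[2])
    return blocks


def lift_position(blocks: List[Block], chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map one 0-based source position; None if it falls in a gap or outside every chain."""
    for b in blocks:
        if b.t_name == chrom and b.t_start <= pos < b.t_start + b.size:
            offset = b.q_start + (pos - b.t_start)
            if b.q_strand == "+":
                return b.q_name, offset
            # Minus-strand coordinates count from the end of the destination chromosome.
            return b.q_name, b.q_size - 1 - offset
    return None
```

The lookup is a linear scan, which is fine for a toy chain; genome-scale chain files would need an interval index per chromosome.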

hg38

Reference WGS
FULL NAME
UCSC hg38 (GRCh38) reference bundle
DESCRIPTION
UCSC Genome Browser distribution of the human reference aligned to GRCh38 (primary assembly plus standard patches and decoys as packaged in the browser bigZips downloads). Chromosome names use the chr1-chrM convention; coordinates on the primary chromosomes are identical to GRCh38, because GRC patch releases add scaffolds rather than changing chromosome coordinates.
URL
https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/
KEYWORDS
GRCh38; UCSC; reference genome; FASTA; primary assembly
Main citation
UCSC Genome Browser. Human reference assembly hg38 (GRCh38-aligned). https://hgdownload.soe.ucsc.edu/goldenPath/hg38/
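Because hg38, b38 and GRCh38 share coordinates but may differ in naming, and hg19/b37 look superficially similar, a quick sanity check before mixing files is to inspect both the naming convention and a length fingerprint. A minimal Python heuristic, assuming the chr1 length alone (249,250,621 bp in GRCh37, 248,956,422 bp in GRCh38) is enough to tell the two builds apart; the function name is hypothetical.

```python
from typing import Dict

# chr1 length in bp, used as a simple fingerprint of the GRC build.
CHR1_LENGTH_TO_BUILD = {249_250_621: "GRCh37", 248_956_422: "GRCh38"}


def guess_build(lengths: Dict[str, int]) -> str:
    """Guess build and naming style from {contig: length}, e.g. parsed from a .fai or VCF ##contig lines."""
    if "chr1" in lengths:
        style, key = "chr-prefixed", "chr1"
    elif "1" in lengths:
        style, key = "unprefixed", "1"
    else:
        return "unknown"
    build = CHR1_LENGTH_TO_BUILD.get(lengths[key], "unknown build")
    return f"{build}, {style}"
```

A result such as "GRCh38, chr-prefixed" is compatible with both hg38 and b38; the heuristic says nothing about ALT, decoy or patch content.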

HPRC first draft pangenome (HPRC draft)

Pangenome Reference Structural variants WGS
PUBMED_LINK
37165242
FULL NAME
Human Pangenome Reference Consortium first-draft pangenome
DESCRIPTION
First-draft human pangenome from the HPRC: 47 phased diploid assemblies from diverse samples, aligned and summarized relative to GRCh38. Adds substantial euchromatic polymorphic sequence and duplicated gene content versus a single linear reference; intended for pangenome-aware alignment, variant calling, and downstream graph-based genomics (see HPRC data portal and companion software).
URL
https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html
KEYWORDS
HPRC; pangenome; graph genome; haplotypes; GRCh38
TITLE
A draft human pangenome reference.
Main citation
Liao WW, Asri M, Ebler J, Doerr D, ..., Paten B. (2023) A draft human pangenome reference. Nature, 617 (7960) 312-324. doi:10.1038/s41586-023-05896-x. PMID 37165242
ABSTRACT
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
DOI
10.1038/s41586-023-05896-x
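Pangenome graphs such as the HPRC releases are commonly exchanged as GFA files. Assuming GFA 1 text with segment (`S`) and link (`L`) records, and ignoring overlaps, headers and path/walk records, a minimal Python sketch of the bidirected graph structure and of spelling a haplotype sequence from oriented segments; the function names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Node = Tuple[str, str]  # (segment name, orientation "+" or "-")
FLIP = {"+": "-", "-": "+"}
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def parse_gfa(text: str) -> Tuple[Dict[str, str], DefaultDict[Node, List[Node]]]:
    """Read S and L records from GFA 1 text; all other record types are skipped."""
    segments: Dict[str, str] = {}
    edges: DefaultDict[Node, List[Node]] = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            a, a_or, b, b_or = fields[1], fields[2], fields[3], fields[4]
            edges[(a, a_or)].append((b, b_or))
            # Links are bidirected: walking one backwards flips both orientations.
            edges[(b, FLIP[b_or])].append((a, FLIP[a_or]))
    return segments, edges


def spell(segments: Dict[str, str], path: List[Node]) -> str:
    """Concatenate segment sequences along a path, reverse-complementing "-" steps."""
    parts = []
    for name, orient in path:
        seq = segments[name]
        parts.append(seq if orient == "+" else seq.translate(COMPLEMENT)[::-1])
    return "".join(parts)
```

Storing each link in both directions lets a traversal start from either strand of a segment, which is the essential difference between a pangenome graph and a plain directed graph.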

hs37d5

Reference 1000 Genomes WGS
FULL NAME
1000 Genomes GRCh37 + decoy (hs37d5)
DESCRIPTION
GRCh37 (b37-style) primary chromosomes and GL contigs plus the Epstein-Barr virus sequence (NC_007605) and the hs37d5 decoy sequence set (HuRef/BAC/Fosmid/NA12878-derived sequences), added to reduce spurious alignments in short-read mapping. Standard reference for Phase 3-era 1000 Genomes alignment and many imputation and low-pass WGS workflows that target the 1KG coordinate system.
URL
https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/
KEYWORDS
GRCh37; decoy; 1000 Genomes; alignment; hs37d5
Main citation
1000 Genomes Project / Broad Institute. hs37d5 reference (GRCh37 plus decoy sequences). https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/

humanG1Kv37

Reference 1000 Genomes WGS
FULL NAME
1000 Genomes human_g1k_v37 reference
DESCRIPTION
GRCh37-based reference FASTA distributed by the 1000 Genomes Project (human_g1k_v37): chromosomes 1-22, X, Y, MT, plus GL unlocalized/unplaced contigs, without separate haplotype scaffolds, EBV, or decoy sequences. Commonly used as the Phase 1 alignment reference and when harmonizing with public 1KG VCFs and phased reference panels.
URL
https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/
KEYWORDS
GRCh37; 1000 Genomes; reference FASTA; human_g1k_v37
Main citation
1000 Genomes Project. human_g1k_v37 reference (GRCh37). https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/
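Because human_g1k_v37, hs37d5 and UCSC hg19 share coordinates on chromosomes 1-22, X and Y but differ in contig content and naming, a quick heuristic can tell which flavour a FASTA index or VCF header came from by looking for marker contigs. A minimal Python sketch, assuming the decoy set appears as a single contig named `hs37d5` and EBV as `NC_007605`; the function name is hypothetical.

```python
from typing import Iterable


def grch37_flavour(names: Iterable[str]) -> str:
    """Heuristically name the GRCh37 FASTA flavour from its contig names."""
    present = set(names)
    if "chr1" in present:
        return "UCSC hg19-style (chr-prefixed)"
    if "1" not in present:
        return "unrecognized"
    if "hs37d5" in present:
        return "hs37d5 (b37 + decoy)"
    if "NC_007605" in present:
        return "b37 with EBV (NC_007605), no hs37d5 decoy"
    return "human_g1k_v37-style (no decoy, no EBV)"
```

Checking the decoy before EBV matters, since an hs37d5 index contains both marker contigs.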