WGS

Catalog entries using this tag (links open the entry card on its page):

Entries

High-coverage on GRCh38 (NYGC)

STAGE_PERIOD

2019–

DESCRIPTION

High-coverage (30x) whole-genome sequencing of the 2,504 Phase 3 samples plus 698 related samples (3,202 in total), aligned to GRCh38; improves rare-variant discovery, phasing, and structural-variant catalogs while staying aligned with the 1000 Genomes sample framework.

URL

https://www.internationalgenome.org/

Whole-genome sequencing

UK Biobank WGS

STAGE_PERIOD

phased release

DESCRIPTION

WGS on a large fraction of participants supports CNV analysis, short-variant refinement, and single-sample graph workflows; companion GWAS products (e.g. Pan-UKB) extend array-based analyses across ancestries.

URL

https://www.ukbiobank.ac.uk/

b37

Reference 1000 Genomes WGS

FULL NAME

Broad Institute Homo_sapiens_assembly19 (b37)

DESCRIPTION

GRCh37-compatible reference FASTA used across Broad Institute and 1000 Genomes workflows: chromosomes 1-22, X, Y, MT, plus the GL unlocalized and unplaced contigs and the NC_007605 (Epstein-Barr virus) contig, as in the distributed assembly19 package. The coordinate system matches the 1KG/b37 ecosystem used by many GWAS imputation and joint-calling pipelines.

URL

https://data.broadinstitute.org/snowman/hg19/

KEYWORDS

GRCh37; 1000 Genomes; Broad; b37; reference FASTA

Main citation

Broad Institute / 1000 Genomes Project. Homo_sapiens_assembly19.fasta (b37). https://data.broadinstitute.org/snowman/hg19/

b38

Reference WGS

FULL NAME

Broad Institute Homo_sapiens_assembly38 (b38)

DESCRIPTION

GRCh38-based reference FASTA distributed with GATK and Broad pipelines (Homo_sapiens_assembly38), including primary chromosomes, unlocalized and unplaced contigs, alternate contigs, the hs38d1 decoy sequences, and HLA contigs. Default reference for many germline short-variant and joint-genotyping workflows on cloud and HPC.

URL

https://storage.googleapis.com/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta

KEYWORDS

GRCh38; GATK; Broad; b38; reference FASTA

Main citation

Broad Institute. Homo_sapiens_assembly38.fasta (GATK GRCh38 reference bundle). https://storage.googleapis.com/genomics-public-data/references/hg38/v0/

CHM13

Reference Structural variants WGS

PUBMED_LINK

35357919

FULL NAME

T2T-CHM13 v1.1 complete hydatidiform mole assembly

DESCRIPTION

Telomere-to-telomere (T2T) assembly of the CHM13 hydatidiform mole cell line, providing the first gapless assemblies of all autosomes and chromosome X, including the centromeric satellite arrays. CHM13 has no Y chromosome, so v1.1 does not include one; a complete Y from HG002 was added in the later v2.0 release. Use as a complement to GRCh38 for studying repetitive and structurally variable loci. Chromosome naming and coordinates differ from the GRC primary assemblies, so use liftover and T2T-specific tooling where appropriate.

URL

https://github.com/marbl/CHM13

KEYWORDS

T2T; telomere-to-telomere; complete genome; CHM13; GRCh38 alternative

TITLE

The complete sequence of a human genome.

Main citation

Nurk S, Koren S, Rhie A, Rautiainen M, ...&, Phillippy AM. (2022) The complete sequence of a human genome. Science, 376 (6588) 44-53. doi:10.1126/science.abj6987. PMID 35357919

ABSTRACT

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.

DOI

10.1126/science.abj6987

CPC

Pangenome Reference WGS

PUBMED_LINK

37316654

FULL NAME

Chinese Pangenome Consortium (phase I core)

DESCRIPTION

Phase I data from the Chinese Pangenome Consortium: 116 high-quality haplotype-phased de novo assemblies from 58 core samples across 36 minority Chinese ethnic groups (high-fidelity long-read coverage). Adds substantial novel sequence and variant discovery relative to GRCh38 and supports population-specific reference panels for Asian-ancestry genomics.

URL

https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA011422

KEYWORDS

pangenome; Chinese populations; long-read; haplotype; GRCh38

TITLE

A pangenome reference of 36 Chinese populations.

Main citation

Gao Y, Yang X, Chen H, Tan X, ...&, Xu S. (2023) A pangenome reference of 36 Chinese populations. Nature, 619 (7968) 112-121. doi:10.1038/s41586-023-06173-7. PMID 37316654

ABSTRACT

Human genomics is witnessing an ongoing paradigm shift from a single reference sequence to a pangenome form, but populations of Asian ancestry are underrepresented. Here we present data from the first phase of the Chinese Pangenome Consortium, including a collection of 116 high-quality and haplotype-phased de novo assemblies based on 58 core samples representing 36 minority Chinese ethnic groups. With an average 30.65× high-fidelity long-read sequence coverage, an average contiguity N50 of more than 35.63 megabases and an average total size of 3.01 gigabases, the CPC core assemblies add 189 million base pairs of euchromatic polymorphic sequences and 1,367 protein-coding gene duplications to GRCh38. We identified 15.9 million small variants and 78,072 structural variants, of which 5.9 million small variants and 34,223 structural variants were not reported in a recently released pangenome reference. The Chinese Pangenome Consortium data demonstrate a remarkable increase in the discovery of novel and missing sequences when individuals are included from underrepresented minority ethnic groups. The missing reference sequences were enriched with archaic-derived alleles and genes that confer essential functions related to keratinization, response to ultraviolet radiation, DNA repair, immunological responses and lifespan, implying great potential for shedding new light on human evolution and recovering missing heritability in complex disease mapping.

DOI

10.1038/s41586-023-06173-7

GRCh37.p13

Reference WGS

FULL NAME

Genome Reference Consortium Human Build 37 patch release 13

DESCRIPTION

NCBI/GRC human assembly build 37, patch 13 (GCF_000001405.25): the authoritative GRCh37 patch-level reference used for stable accessioning and alignment. Distinct from UCSC hg19/Broad b37 contig naming; always verify chromosome naming and inclusion of ALT/patch contigs when mixing resources.
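
One way to run the naming check suggested above is to classify the contig names found in a FASTA index or VCF header before combining files. The sketch below is illustrative only: the function names `naming_style` and `contigs_from_fai` are hypothetical, and the patterns look only at primary chromosomes, assuming UCSC names carry a `chr` prefix, b37-style names are bare (`1`, `X`, `MT`), and NCBI RefSeq files use `NC_0000NN.version` accessions.

```python
import re

# Patterns for primary chromosomes only; other contigs (GL*, *_hap*, decoys) are ignored.
UCSC_PRIMARY = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$")
B37_PRIMARY = re.compile(r"^([1-9]|1[0-9]|2[0-2]|X|Y|MT)$")
# Human chromosome RefSeq accessions run NC_000001 .. NC_000024 (X = 23, Y = 24).
REFSEQ_PRIMARY = re.compile(r"^NC_0000(0[1-9]|1[0-9]|2[0-4])\.\d+$")


def naming_style(contigs):
    """Return 'ucsc', 'b37', 'refseq', 'mixed', or 'unknown' for a list of contig names."""
    styles = set()
    for name in contigs:
        if UCSC_PRIMARY.match(name):
            styles.add("ucsc")
        elif B37_PRIMARY.match(name):
            styles.add("b37")
        elif REFSEQ_PRIMARY.match(name):
            styles.add("refseq")
    if not styles:
        return "unknown"
    if len(styles) > 1:
        return "mixed"
    return styles.pop()


def contigs_from_fai(path):
    """Read contig names (first tab-separated column) from a samtools-style .fai index."""
    with open(path) as handle:
        return [line.split("\t", 1)[0] for line in handle if line.strip()]
```

A result of `mixed`, or two inputs that return different styles, is a signal to stop and harmonize names before merging. Matching styles do not by themselves guarantee identical sequences or the same set of ALT and patch contigs.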

URL

https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.25/

KEYWORDS

GRCh37; GRC; NCBI; reference assembly; patch 13

Main citation

Genome Reference Consortium. Human genome assembly GRCh37.p13 (GCF_000001405.25). National Center for Biotechnology Information.

GRCh38.p14

Reference WGS

FULL NAME

Genome Reference Consortium Human Build 38 patch release 14

DESCRIPTION

NCBI/GRC human assembly build 38, patch 14 (GCF_000001405.40): current GRC primary human reference on the GRCh38 line, including cumulative sequence fixes and scaffold updates through p14. Use this accession when you need the exact GRC patch level that matches NCBI/RefSeq alignment products.

URL

https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.40

KEYWORDS

GRCh38; GRC; NCBI; reference assembly; patch 14

Main citation

Genome Reference Consortium. Human genome assembly GRCh38.p14 (GCF_000001405.40). National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.40

GRCh39 (indefinitely postponed) (GRCh39)

Reference WGS

FULL NAME

Genome Reference Consortium Human Build 39 (not pursued)

DESCRIPTION

The Genome Reference Consortium announced that work toward a distinct GRCh39 assembly line was indefinitely postponed; human reference updates continue on the GRCh38 series (patches) and complementary resources such as T2T-CHM13 and pangenome references. Check the GRC human page for current guidance and patch releases.

URL

https://www.ncbi.nlm.nih.gov/grc/human

KEYWORDS

GRC; GRCh39; reference assembly; postponed

Main citation

Genome Reference Consortium. Human genome reference updates (GRCh39 indefinitely postponed; continued GRCh38 patches). https://www.ncbi.nlm.nih.gov/grc/human

hg19

Reference WGS

FULL NAME

UCSC hg19 (GRCh37) reference bundle

DESCRIPTION

UCSC Genome Browser distribution of the GRCh37-era human reference (hg19): chromosomes chr1-22, chrX, chrY, chrM, plus unlocalized and unplaced contigs, alternate loci (e.g. chr6_apd_hap1), and related patches as packaged for the browser. Widely used in legacy pipelines and liftOver chains to/from hg38.
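
When legacy hg19 files have to be compared with b37-style resources, the primary chromosome names can be translated mechanically. The following is a minimal sketch, not a complete converter: `ucsc_to_b37` is a hypothetical helper, it handles only chr1-22, chrX, chrY, and chrM, and it returns `None` for everything else so that alternate loci and unplaced contigs are handled explicitly rather than guessed.

```python
# b37-style names for the primary nuclear chromosomes.
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def ucsc_to_b37(name):
    """Map a UCSC hg19 primary chromosome name to its b37-style name.

    Returns None for names that are not primary chromosomes
    (for example chr6_apd_hap1 or chrUn_* contigs).
    """
    if not name.startswith("chr"):
        return None
    core = name[3:]
    if core == "M":
        # Caution: the original hg19 chrM (NC_001807) and the b37 MT (rCRS,
        # NC_012920) are different sequences, so renaming alone does not
        # make mitochondrial coordinates interchangeable.
        return "MT"
    if core in PRIMARY:
        return core
    return None
```

For example, `ucsc_to_b37("chr1")` gives `"1"`, while `ucsc_to_b37("chr6_apd_hap1")` gives `None`.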

URL

https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/

KEYWORDS

GRCh37; UCSC; reference genome; FASTA; legacy assembly

Main citation

UCSC Genome Browser. Human reference assembly hg19 (GRCh37-aligned). https://hgdownload.soe.ucsc.edu/goldenPath/hg19/

hg38

Reference WGS

FULL NAME

UCSC hg38 (GRCh38) reference bundle

DESCRIPTION

UCSC Genome Browser distribution of the human reference aligned to GRCh38 (primary assembly plus standard patches and decoys as packaged in the browser bigZips downloads). Chromosome names use the chr prefix (chr1-chr22, chrX, chrY, chrM); coordinates match the corresponding GRC assembly at the same patch level wherever the sequences are identical.

URL

https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/

KEYWORDS

GRCh38; UCSC; reference genome; FASTA; primary assembly

Main citation

UCSC Genome Browser. Human reference assembly hg38 (GRCh38-aligned). https://hgdownload.soe.ucsc.edu/goldenPath/hg38/

HPRC first draft pangenome (HPRC draft)

Pangenome Reference Structural variants WGS

PUBMED_LINK

37165242

FULL NAME

Human Pangenome Reference Consortium first-draft pangenome

DESCRIPTION

First-draft human pangenome from the HPRC: 47 phased diploid assemblies from diverse samples, aligned and summarized relative to GRCh38. Adds substantial euchromatic polymorphic sequence and duplicated gene content versus a single linear reference; intended for pangenome-aware alignment, variant calling, and downstream graph-based genomics (see HPRC data portal and companion software).

URL

https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html

KEYWORDS

HPRC; pangenome; graph genome; haplotypes; GRCh38

TITLE

A draft human pangenome reference.

Main citation

Liao WW, Asri M, Ebler J, Doerr D, ...&, Paten B. (2023) A draft human pangenome reference. Nature, 617 (7960) 312-324. doi:10.1038/s41586-023-05896-x. PMID 37165242

ABSTRACT

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

DOI

10.1038/s41586-023-05896-x

hs37d5

Reference 1000 Genomes WGS

FULL NAME

1000 Genomes GRCh37 + decoy (hs37d5)

DESCRIPTION

GRCh37 (b37-style) primary chromosomes and contigs plus the hs37d5 decoy sequence set (HuRef/BAC/Fosmid/NA12878-derived sequences) to reduce spurious alignments in short-read mapping. Standard reference for Phase 3-era 1000 Genomes alignment and many imputation and low-pass WGS workflows that target the 1KG coordinate system.

URL

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/

KEYWORDS

GRCh37; decoy; 1000 Genomes; alignment; hs37d5

Main citation

1000 Genomes Project / Broad Institute. hs37d5 reference (GRCh37 plus decoy sequences). https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/

humanG1Kv37

Reference 1000 Genomes WGS

FULL NAME

1000 Genomes human_g1k_v37 reference

DESCRIPTION

GRCh37-based reference FASTA distributed by the 1000 Genomes Project (human_g1k_v37): chromosomes 1-22, X, Y, MT, plus GL unlocalized/unplaced contigs, without separate haplotype scaffolds or EBV. Commonly used as the Phase 1 alignment reference and when harmonizing with public 1KG VCFs and phasing reference panels.

URL

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/

KEYWORDS

GRCh37; 1000 Genomes; reference FASTA; human_g1k_v37

Main citation

1000 Genomes Project. human_g1k_v37 reference (GRCh37). https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/