1000 Genomes
Catalog entries using this tag:
- HapMap Phase I — Projects
- HapMap Phase II — Projects
- HapMap Phase III — Projects
- High-coverage on GRCh38 (NYGC) — Projects
- Phase 1 — Projects
- Phase 3 — Projects
- Pilot — Projects
- b37 — References
- hs37d5 — References
- humanG1Kv37 — References
Entries
HapMap Phase I
STAGE_PERIOD
2003–2005
DESCRIPTION
International HapMap Project first data release: ~1 million SNPs in CEU, YRI, and JPT+CHB; produced the first genome-wide LD and recombination maps and drove early GWAS SNP selection and imputation panels.
URL
HapMap Phase II
STAGE_PERIOD
2005–2007
DESCRIPTION
Expanded SNP density (~3.1M SNPs) and haplotype structure across the same core panels; improved tagging coverage and supported finer-scale association and phasing workflows before large-scale resequencing.
URL
HapMap Phase III
STAGE_PERIOD
2007–2009
DESCRIPTION
Extended to 11 populations and ~1.6M SNPs; broader ancestry representation and LD maps that informed the design and early phases of the 1000 Genomes Project.
URL
High-coverage on GRCh38 (NYGC)
STAGE_PERIOD
2019–
DESCRIPTION
High-coverage (~30x) whole-genome sequencing by the New York Genome Center of all 2,504 Phase 3 samples plus 698 related samples (3,202 in total), aligned to GRCh38; improves rare-variant discovery, phasing, and structural-variant catalogs while staying aligned with the 1000 Genomes sample framework.
URL
Phase 1
STAGE_PERIOD
2010–2011
DESCRIPTION
Expanded low-coverage WGS (1,092 individuals from 14 populations) with exome capture and dense SNP genotyping; primary SNP and indel reference for early imputation panels.
URL
Phase 3
STAGE_PERIOD
2012–2015
DESCRIPTION
2,504 individuals across 26 populations; the GRCh37 call set and its later GRCh38 releases became the standard allele-frequency, LD, and imputation backbone for GWAS and SV pipelines.
URL
Pilot
STAGE_PERIOD
2008–2010
DESCRIPTION
Proof-of-concept low-coverage whole-genome sequencing and SNP arrays across multiple populations; established protocols and data model for the main project.
URL
b37
FULL NAME
Broad Institute Homo_sapiens_assembly19 (b37)
DESCRIPTION
GRCh37-compatible reference FASTA used across Broad Institute and 1000 Genomes workflows: chromosomes 1-22, X, Y, MT, plus the GL-prefixed unlocalized and unplaced contigs and the Epstein-Barr virus sequence NC_007605 (as in the distributed assembly19 package). Contigs are named without a "chr" prefix, and the coordinate system matches the 1KG/b37 ecosystem used by many GWAS imputation and joint-calling pipelines.
URL
KEYWORDS
GRCh37; 1000 Genomes; Broad; b37; reference FASTA
Main citation
Broad Institute / 1000 Genomes Project. Homo_sapiens_assembly19.fasta (b37). https://data.broadinstitute.org/snowman/hg19/
hs37d5
FULL NAME
1000 Genomes GRCh37 + decoy (hs37d5)
DESCRIPTION
GRCh37 (b37-style) primary chromosomes and contigs plus the hs37d5 decoy sequence set (HuRef/BAC/Fosmid/NA12878-derived sequences) to reduce spurious alignments in short-read mapping. Standard reference for Phase 3-era 1000 Genomes alignment and many imputation and low-pass WGS workflows that target the 1KG coordinate system.
URL
KEYWORDS
GRCh37; decoy; 1000 Genomes; alignment; hs37d5
Main citation
1000 Genomes Project / Broad Institute. hs37d5 reference (GRCh37 plus decoy sequences). https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/
humanG1Kv37
FULL NAME
1000 Genomes human_g1k_v37 reference
DESCRIPTION
GRCh37-based reference FASTA distributed by the 1000 Genomes Project (human_g1k_v37): chromosomes 1-22, X, Y, MT, plus GL unlocalized/unplaced contigs, without separate haplotype scaffolds or EBV. Used as the Phase 1 alignment reference and commonly chosen when harmonizing with public 1KG VCFs and phasing panels, since its coordinates match those of the later hs37d5-based Phase 3 releases.
URL
KEYWORDS
GRCh37; 1000 Genomes; reference FASTA; human_g1k_v37
Main citation
1000 Genomes Project. human_g1k_v37 reference (GRCh37). https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/
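The three reference entries above share GRCh37 coordinates and differ mainly in which extra sequences they carry, so a file's contig list is usually enough to tell them apart. The following is a minimal sketch of that check, not an official tool. The function names are hypothetical, and the contig names `NC_007605` (EBV) and `hs37d5` (decoy) are taken from the distributed FASTA files rather than from the entry cards. It only inspects names and does not verify sequence content or lengths.

```python
from typing import Iterable, List


def contigs_from_fai(fai_text: str) -> List[str]:
    """Return contig names from the text of a samtools-style .fai index.

    Each non-empty line is tab-separated and begins with the contig name.
    """
    return [
        line.split("\t", 1)[0]
        for line in fai_text.splitlines()
        if line.strip()
    ]


def guess_grch37_flavor(contigs: Iterable[str]) -> str:
    """Heuristically label a GRCh37-family reference from its contig names.

    Assumption: b37-family files name chromosomes "1".."22", "X", "Y", "MT"
    with no "chr" prefix. Anything else is reported as "unknown".
    """
    names = set(contigs)

    # UCSC-style naming ("chr1", "chrM") is outside the b37 family.
    if "1" not in names or "MT" not in names:
        return "unknown"

    # The decoy contig is only present in hs37d5.
    if "hs37d5" in names:
        return "hs37d5"

    # EBV without the decoy points to the Broad b37 / assembly19 file.
    if "NC_007605" in names:
        return "b37"

    # Neither EBV nor decoy: consistent with human_g1k_v37.
    return "human_g1k_v37"
```

In practice the names could come from a `.fai` index read with `contigs_from_fai`, or from the `@SQ` lines of a BAM header. Because the three files share primary-chromosome coordinates, a mismatch in flavor usually matters for reads or variants on the extra contigs rather than on chromosomes 1-22, X, Y, or MT.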