Genomic Language Model
Catalog entries using this tag (links open the entry card on its page):
Entries
DNA Foundation Benchmark (DNA FM Benchmark)
PUBMED_LINK
FULL NAME
Benchmarking DNA Foundation Models for Genomic and Genetic Tasks
DESCRIPTION
First comprehensive, unbiased benchmark of five DNA foundation models (DNABERT-2, Nucleotide Transformer V2, HyenaDNA, Caduceus-Ph, GROVER) across 57 datasets spanning sequence classification, gene expression prediction, variant effect quantification, and TAD recognition, all evaluated using zero-shot embeddings. Key finding: mean token embedding pooling consistently outperforms other pooling strategies for sequence classification. Model choice should align with the task: Caduceus-Ph excels at transcription factor binding site (TFBS) prediction, Nucleotide Transformer V2 (NT-v2) at pathogenic variant identification, and HyenaDNA scales to long sequences. Specialized models (Enformer, Sei) still outperform general-purpose DNA models on QTL prediction.
URL
TITLE
Benchmarking DNA foundation models for genomic and genetic tasks.
Main citation
Feng H, Wu L, Zhao B, Huff C, Zhang J, Wu J, Lin L, Wei P, Wu C. (2025) Benchmarking DNA foundation models for genomic and genetic tasks. Nature Communications, 16:10780. doi:10.1038/s41467-025-65823-8. PMID: 41315262.
ABSTRACT
The rapid evolution of DNA foundation models promises to revolutionize genomics, yet comprehensive evaluations are lacking. Here, we present a comprehensive, unbiased benchmark of five models (DNABERT-2, Nucleotide Transformer V2, HyenaDNA, Caduceus-Ph, and GROVER) across diverse genomic and genetic tasks including sequence classification, gene expression prediction, variant effect quantification, and TAD region recognition, using zero-shot embeddings. Our analysis reveals that mean token embedding consistently and significantly improves sequence classification performance. Model performance varies among tasks and datasets; while general purpose DNA foundation models showed competitive performance in pathogenic variant identification, they were less effective in predicting gene expression and identifying putative causal QTLs compared to specialized models.
DOI
10.1038/s41467-025-65823-8
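ILLUSTRATION
The key finding above concerns mean token embedding pooling, that is, averaging a model's per-token embeddings into one vector per sequence. The sketch below shows one common way to do this with NumPy. It is not taken from the paper: the function names, the array shapes, and the use of an attention mask to skip padding are assumptions made for illustration, and first-token pooling appears only as one plausible alternative strategy for comparison.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings over non-padding positions.

    token_embeddings: array of shape (batch, seq_len, hidden)
    attention_mask:   array of shape (batch, seq_len), 1 = real token, 0 = padding
    Both names and shapes are illustrative assumptions, not the paper's API.
    """
    # Broadcast the mask over the hidden dimension.
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    # Sum only the embeddings of real tokens.
    summed = (token_embeddings * mask).sum(axis=1)
    # Count real tokens per sequence; clip to avoid division by zero.
    counts = np.clip(mask.sum(axis=1), 1.0, None)
    return summed / counts


def first_token_pool(token_embeddings):
    """Alternative strategy (assumed for comparison): keep the first token only."""
    return token_embeddings[:, 0, :]


# Toy batch: 2 sequences, 3 token positions, hidden size 2.
# The last position of the first sequence is padding and must be ignored.
emb = np.array([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
    [[0.0, 2.0], [2.0, 0.0], [4.0, 4.0]],
])
mask = np.array([
    [1, 1, 0],
    [1, 1, 1],
])

pooled = mean_pool(emb, mask)
first = first_token_pool(emb)
print(pooled)
print(first)
```

In a zero-shot embedding setup like the one the entry describes, a pooled vector of this kind would typically be passed to a lightweight downstream classifier while the foundation model itself stays frozen.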