AI Regulatory model
Curated regulatory model listings under the AI tab.
Gene Regulation Prediction Models
Deep learning models predicting molecular phenotypes from DNA sequence:
- Early work (2015-2018): Chromatin effect prediction from short DNA windows (DeepSEA, Zhou & Troyanskaya PMID 26301843, Nat Methods 2015) and tissue-specific expression prediction (ExPecto, Zhou et al. PMID 30013180, Nat Genet 2018).
- Long-range integration and global regulatory maps (2021-2022): Enformer integrates up to 200 kb of context with a CNN + Transformer architecture (Avsec et al. PMID 34608324, Nat Methods 2021); Sei maps regulatory activity across 21,907 chromatin profiles (Chen et al. PMID 35817977, Nat Genet 2022).
- Cell-type resolution (2025): Tissue- and cell-type-specific RNA-seq coverage prediction (Borzoi, Linder et al. PMID 39779956, Nat Genet 2025), accelerated with rotary positional encodings and FlashAttention-2 (Flashzoi, Hingerl et al. PMID 40905959, Bioinformatics 2025).
Trend: 1 kb → 200 kb and beyond (over half a megabase for Borzoi) context, tissue-average → cell-type-specific, static → condition-aware.
Summary Table
| NAME | Main citation | YEAR |
|---|---|---|
| Borzoi | Linder J et al., Nat Genet, 2025 | 2025 |
| DeepSEA | Zhou J et al., Nat Methods, 2015 | 2015 |
| Enformer | Avsec Ž et al., Nat Methods, 2021 | 2021 |
| ExPecto | Zhou J et al., Nat Genet, 2018 | 2018 |
| Flashzoi | Hingerl JC et al., Bioinformatics, 2025 | 2025 |
| Sei | Chen KM et al., Nat Genet, 2022 | 2022 |
Borzoi
DESCRIPTION
Borzoi is a deep learning model from Calico that predicts cell-type-specific and tissue-specific RNA-seq coverage from DNA sequence. It scores variant effects across transcription, splicing, and polyadenylation, and extracts cis-regulatory motifs driving RNA expression. Published in Nature Genetics.
KEYWORDS
RNA-seq, gene regulation, splicing, polyadenylation, variant effect, CNN, deep learning, eQTL
TITLE
Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation.
Main citation
Linder J, Srivastava D, Yuan H, Agarwal V, Kelley DR. (2025) Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. Nat Genet, 57 (4) 949-961. doi:10.1038/s41588-024-02053-6. PMID 39779956
ABSTRACT
Sequence-based machine-learning models trained on genomics data improve genetic variant interpretation by providing functional predictions describing their impact on the cis-regulatory code. However, current tools do not predict RNA-seq expression profiles because of modeling challenges. Here, we introduce Borzoi, a model that learns to predict cell-type-specific and tissue-specific RNA-seq coverage from DNA sequence. Using statistics derived from Borzoi's predicted coverage, we isolate and accurately score DNA variant effects across multiple layers of regulation, including transcription, splicing and polyadenylation. Evaluated on quantitative trait loci, Borzoi is competitive with and often outperforms state-of-the-art models trained on individual regulatory functions. By applying attribution methods to the derived statistics, we extract cis-regulatory motifs driving RNA expression and post-transcriptional regulation in normal tissues. The wide availability of RNA-seq data across species, conditions and assays profiling specific aspects of regulation emphasizes the potential of this approach to decipher the mapping from DNA sequence to regulatory function.
DOI
10.1038/s41588-024-02053-6
DeepSEA
DESCRIPTION
DeepSEA is a foundational deep learning model that predicts the chromatin effects of noncoding variants directly from DNA sequence, including DNase I sensitivity, histone mark profiles, and transcription factor binding. Published in Nature Methods, it was one of the first deep learning approaches for noncoding variant interpretation.
KEYWORDS
noncoding variant, chromatin, epigenetics, deep learning, DNase I, histone marks, TF binding, regulatory effect
TITLE
Predicting effects of noncoding variants with deep learning-based sequence model.
Main citation
Zhou J, Troyanskaya OG. (2015) Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods, 12 (10) 931-934. doi:10.1038/nmeth.3547. PMID 26301843
ABSTRACT
Noncoding variants are of tremendous biological importance, and their functional interpretation is a critical challenge in genomics. Here we introduce DeepSEA, a deep learning-based sequence model that directly predicts the chromatin effects of sequence alterations at single-nucleotide sensitivity. DeepSEA captures regulatory sequence context and learns a wide range of regulatory features including DNase I sensitivity, histone mark profiles, and transcription factor binding. The model achieves state-of-the-art accuracy for predicting the functional consequences of noncoding variants and can be applied to prioritize disease-associated variants from large-scale sequencing studies.
DOI
10.1038/nmeth.3547
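The variant scoring idea in the DeepSEA entry above (predict chromatin features for the reference sequence, predict again with the alternate allele, compare) can be sketched as follows. This is a minimal illustration, not DeepSEA's code: `toy_model` is a hypothetical stand-in for a trained network, and using a plain difference as the effect score is an assumption made for simplicity.

```python
from typing import Callable, List

BASES = "ACGT"
Matrix = List[List[float]]


def one_hot(seq: str) -> Matrix:
    """Encode DNA as a len(seq) x 4 matrix; unknown bases (e.g. N) become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Substitute a single base at 0-based position pos, checking the reference allele."""
    if seq[pos].upper() != ref.upper():
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]!r} != {ref!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(
    model: Callable[[Matrix], List[float]], seq: str, pos: int, ref: str, alt: str
) -> List[float]:
    """Per-feature effect score: alternate prediction minus reference prediction."""
    ref_pred = model(one_hot(seq))
    alt_pred = model(one_hot(apply_snv(seq, pos, ref, alt)))
    return [a - r for r, a in zip(ref_pred, alt_pred)]


def toy_model(x: Matrix) -> List[float]:
    """Hypothetical stand-in for a trained network: returns [GC fraction, A fraction]."""
    n = len(x)
    gc = sum(row[1] + row[2] for row in x) / n
    a = sum(row[0] for row in x) / n
    return [gc, a]


# Score an A>G substitution at position 0 of a short toy sequence.
effects = variant_effect(toy_model, "ACGTACGT", 0, "A", "G")
print(effects)
```

With a real model, `toy_model` would be replaced by a forward pass that returns one value per chromatin feature, and the sequence would be a window centred on the variant.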
Enformer
DESCRIPTION
Enformer is a DeepMind model that integrates long-range DNA interactions (up to 200 kb) using a CNN + Transformer architecture to predict gene expression, chromatin profiles, and TF binding from DNA sequence. It achieves state-of-the-art performance on variant effect prediction and regulatory element annotation.
KEYWORDS
gene expression prediction, long-range interactions, CNN, Transformer, DeepMind, chromatin, variant effect, regulatory genomics
TITLE
Effective gene expression prediction from sequence by integrating long-range interactions.
Main citation
Avsec Ž, Agarwal V, Visentin D, ..., Kelley DR. (2021) Effective gene expression prediction from sequence by integrating long-range interactions. Nat Methods, 18 (10) 1196-1203. doi:10.1038/s41592-021-01252-x. PMID 34608324
ABSTRACT
Quantitative gene expression measurements across cell types and tissues can provide a complete picture of the dynamic functions of the genome. However, gene expression is challenging to predict from sequence alone because of the enormous distances over which regulatory elements act. Enformer combines CNN and Transformer architectures to integrate information from up to 200 kb of DNA sequence, enabling accurate prediction of gene expression, chromatin state, and transcription factor binding. Enformer achieves state-of-the-art predictions across diverse genomic assays and accurately predicts the effects of genetic variants on gene expression.
DOI
10.1038/s41592-021-01252-x
ExPecto
DESCRIPTION
ExPecto is a deep learning framework that predicts the causal effects of genetic variants on gene expression levels and disease risk directly from DNA sequence, without requiring prior knowledge of regulatory elements or annotations.
KEYWORDS
variant effect, gene expression, disease risk, ab initio, deep learning, noncoding, tissue-specific
TITLE
Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk.
Main citation
Zhou J, Theesfeld CL, Yao K, Chen KM, Wong AK, Troyanskaya OG. (2018) Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat Genet, 50 (8) 1171-1179. doi:10.1038/s41588-018-0160-6. PMID 30013180
ABSTRACT
The complexity and scale of human genetics studies present a significant challenge for interpreting the functional consequences of genetic variants. Here we introduce ExPecto, a deep learning framework that can predict the causal effects of genetic variants on gene expression levels and disease risk directly from DNA sequence. ExPecto uses a modular architecture with a core deep convolutional neural network to learn regulatory features from sequence, followed by spatial transformation and tissue-specific prediction layers. We demonstrate that ExPecto can accurately predict variant effects on expression across a wide range of tissues and can identify pathogenic variants from population-scale sequencing data. The framework enables ab initio prediction of expression and disease risk without relying on prior annotations of regulatory elements.
DOI
10.1038/s41588-018-0160-6
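The modular design described in the ExPecto entry above (sequence features, then a spatial transformation, then a tissue-specific prediction layer) can be sketched roughly as below. Everything concrete here is an assumption for illustration: the exponential decay weighting, the upstream/downstream split, the function names, and the toy numbers are not taken from the listing.

```python
import math
from typing import List, Sequence


def spatial_transform(
    bin_features: Sequence[Sequence[float]],
    bin_offsets: Sequence[float],
    decay_rates: Sequence[float],
) -> List[float]:
    """Collapse per-bin features into distance-weighted sums.

    For each decay rate, features are summed separately upstream (offset < 0)
    and downstream (offset >= 0) of the TSS, weighted by exp(-rate * |offset|).
    """
    n_features = len(bin_features[0])
    transformed: List[float] = []
    for rate in decay_rates:
        upstream = [0.0] * n_features
        downstream = [0.0] * n_features
        for features, offset in zip(bin_features, bin_offsets):
            weight = math.exp(-rate * abs(offset))
            target = upstream if offset < 0 else downstream
            for j, value in enumerate(features):
                target[j] += weight * value
        transformed.extend(upstream + downstream)
    return transformed


def predict_expression(
    transformed: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Tissue-specific linear layer: one weight vector and bias per tissue."""
    return bias + sum(w * x for w, x in zip(weights, transformed))


# Toy input: three bins around a TSS with one feature each (hypothetical values).
bins = [[1.0], [1.0], [1.0]]
offsets = [-200.0, 0.0, 200.0]  # signed distance to the TSS, arbitrary units
features = spatial_transform(bins, offsets, decay_rates=[0.0])  # rate 0.0 = no decay
prediction = predict_expression(features, weights=[0.5, 0.25], bias=0.1)
print(features, prediction)
```

In this sketch a separate `weights` vector would be fitted per tissue, while the sequence features and the spatial transform are shared across tissues.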
Flashzoi
DESCRIPTION
Flashzoi is an enhanced Borzoi model that replaces relative positional encodings with rotary positional encodings (RoPE) and uses FlashAttention-2, achieving over 3x faster training/inference and up to 2.4x reduced memory usage while maintaining or improving accuracy on RNA-seq coverage prediction, variant effects, and enhancer-promoter linking.
KEYWORDS
Borzoi, FlashAttention, RoPE, accelerated inference, regulatory genomics, variant effect, enhancer-promoter
TITLE
Flashzoi: an enhanced Borzoi for accelerated genomic analysis.
Main citation
Hingerl JC, Karollus A, Gagneur J. (2025) Flashzoi: an enhanced Borzoi for accelerated genomic analysis. Bioinformatics, 41 (9) btaf467. doi:10.1093/bioinformatics/btaf467. PMID 40905959
ABSTRACT
Accurately predicting how DNA sequence drives gene regulation and how genetic variants alter gene expression is a central challenge in genomics. Borzoi, which models over ten thousand genomic assays including RNA-seq coverage from over half a megabase of sequence context alone promises to become an important foundation model in regulatory genomics, both for massively annotating variants and for further model development. However, the currently used relative positional encodings limit Borzoi's computational efficiency. We present Flashzoi, an enhanced Borzoi model that leverages rotary positional encodings and FlashAttention-2. This achieves over 3-fold faster training and inference and up to 2.4-fold reduced memory usage, while maintaining or improving accuracy in modeling various genomic assays including RNA-seq coverage, predicting variant effects, and enhancer-promoter linking. Flashzoi's improved efficiency facilitates large-scale genomic analyses and opens avenues for exploring more complex regulatory mechanisms and modeling.
DOI
10.1093/bioinformatics/btaf467
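The rotary positional encodings mentioned in the Flashzoi entry above can be illustrated with a generic RoPE sketch. This is not Flashzoi's implementation: the interleaved pairing of dimensions and the base of 10000 are common conventions assumed here. The point it shows is that after rotation, the query-key dot product depends only on the relative offset between positions, so no separate relative-position term is needed in the attention computation.

```python
import math
from typing import List


def rope(vec: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by position-dependent angles."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out: List[float] = []
    for i in range(0, d, 2):
        # Pair i/2 rotates with frequency base^(-i/d); later pairs rotate more slowly.
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (hypothetical values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]

# Same relative offset (3) at two different absolute positions.
score_a = dot(rope(q, 10), rope(k, 7))
score_b = dot(rope(q, 103), rope(k, 100))
print(score_a, score_b)  # the two scores agree up to floating-point error
```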
Sei
DESCRIPTION
Sei is a deep learning model that generates a comprehensive map of regulatory activity from DNA sequence, predicting 21,907 chromatin profiles across cell types and contexts. It enables interpretation of noncoding variants in terms of specific regulatory functions and tissues, providing a global atlas of human cis-regulation.
KEYWORDS
regulatory activity map, chromatin, transcriptional regulation, deep learning, sequence-to-activity, human genetics, 21,907 features
TITLE
A sequence-based global map of regulatory activity for deciphering human genetics.
Main citation
Chen KM, Wong AK, Troyanskaya OG, Zhou J. (2022) A sequence-based global map of regulatory activity for deciphering human genetics. Nat Genet, 54 (7) 940-949. doi:10.1038/s41588-022-01102-2. PMID 35817977
ABSTRACT
Deciphering the impact of noncoding variants on gene regulation is a major challenge in human genetics. While deep learning models can accurately predict regulatory features from DNA sequence, interpreting these predictions to understand variant effects across diverse contexts remains difficult. Here we present Sei, a deep learning model that produces a comprehensive, sequence-based global map of regulatory activity. Sei predicts 21,907 chromatin profiles encompassing a wide range of cell types and regulatory features, and organizes them into 40 tissue-agnostic regulatory activities using a hierarchical model. We demonstrate that Sei accurately predicts regulatory effects and can identify disease-relevant variants across diverse conditions, enabling a more complete understanding of human genetic variation.
DOI
10.1038/s41588-022-01102-2