
EHR

Catalog entries using this tag:

Entries

Deep EHR

AI Clinical EHR Deep Learning Google Research Digital Medicine
PUBMED_LINK
31304366
FULL NAME
Scalable and Accurate Deep Learning with Electronic Health Records
DESCRIPTION
Pioneering deep learning framework for EHR data from Google Research that uses the Fast Healthcare Interoperability Resources (FHIR) format to represent patients' entire raw EHR records. Trained on 216,221 adult patients from two US academic medical centers, covering over 46.8 billion data points including clinical notes. Achieved AUROC 0.93-0.94 for in-hospital mortality, 0.75-0.76 for 30-day unplanned readmission, 0.85-0.86 for prolonged length of stay, and 0.90 for all final discharge diagnoses, outperforming traditional clinical predictive models. 2,800+ citations; widely considered a landmark paper for deep learning on EHR data.
TITLE
Scalable and accurate deep learning with electronic health records.
Main citation
Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, Liu PJ, Liu X, Marcus J, Sun M, Sundberg P, Yee H, Zhang K, Zhang Y, Flores G, Duggan GE, Irvine J, Le Q, Litsch K, Mossin A, Tansuwan J, Wang D, Wexler J, Wilson J, Ludwig D, Volchenboum SL, Chou K, Pearson M, Madabushi S, Shah NH, Butte AJ, Howell MD, Cui C, Corrado GS, Dean J. (2018) Scalable and accurate deep learning with electronic health records. npj Digital Medicine, 1:18. doi:10.1038/s41746-018-0029-1. PMID 31304366
ABSTRACT
Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. We propose a representation of patients' entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. Deep learning models achieved high accuracy for predicting in-hospital mortality (AUROC 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (AUROC 0.90).
DOI
10.1038/s41746-018-0029-1
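The FHIR-based representation treats each patient's record as a time-ordered sequence of coded events instead of a hand-curated feature table. The sketch below shows only that flattening step, assuming simplified FHIR R4-like dicts; the `ResourceType:code` token scheme, the helper names, and the placeholder codes are hypothetical and are not the paper's actual pipeline.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# FHIR R4 elements holding (timestamp, coded concept) for the resource types kept here.
# Restricting to these three types is a simplifying assumption of this sketch.
FIELDS = {
    "Observation": ("effectiveDateTime", "code"),
    "MedicationRequest": ("authoredOn", "medicationCodeableConcept"),
    "Condition": ("recordedDate", "code"),
}


def to_event_sequence(resources: List[dict]) -> List[Tuple[datetime, str]]:
    """Flatten FHIR-like resources into a time-ordered list of (timestamp, token)."""
    events = []
    for res in resources:
        fields = FIELDS.get(res.get("resourceType"))
        if fields is None:
            continue  # resource type not modeled in this sketch (e.g. Patient)
        time_key, code_key = fields
        if time_key not in res or code_key not in res:
            continue  # no timestamp or code: cannot place it in the sequence
        ts = datetime.fromisoformat(res[time_key])
        code = res[code_key]["coding"][0]["code"]  # first coding only
        events.append((ts, f"{res['resourceType']}:{code}"))
    events.sort(key=lambda event: event[0])
    return events


def encode(events: List[Tuple[datetime, str]], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to integer ids, adding unseen tokens to the shared vocabulary."""
    ids = []
    for _, token in events:
        if token not in vocab:
            vocab[token] = len(vocab)
        ids.append(vocab[token])
    return ids


# Hypothetical patient record with placeholder codes (not real terminology entries).
patient = [
    {"resourceType": "Patient", "id": "p1"},
    {"resourceType": "MedicationRequest", "authoredOn": "2018-01-02T09:30:00",
     "medicationCodeableConcept": {"coding": [{"code": "MED-1"}]}},
    {"resourceType": "Condition", "recordedDate": "2018-01-02T08:00:00",
     "code": {"coding": [{"code": "DX-1"}]}},
    {"resourceType": "Observation", "effectiveDateTime": "2018-01-02T08:15:00",
     "code": {"coding": [{"code": "LAB-1"}]}},
]

vocab: Dict[str, int] = {}
events = to_event_sequence(patient)
ids = encode(events, vocab)
print([token for _, token in events])
print(ids)
```

A downstream deep learning model would then embed these integer ids; because tokens come straight from the raw codes, no per-site variable curation is needed at this stage.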

MIMIC-IV

AI Datasets Clinical EHR ICU Critical Care PhysioNet MIMIC de-identified EHR
PUBMED_LINK
36596836
FULL NAME
Medical Information Mart for Intensive Care IV
DESCRIPTION
MIMIC-IV is a large, freely available, de-identified clinical database comprising over 300,000 patients admitted to the Beth Israel Deaconess Medical Center (2008-2019). It includes comprehensive ICU and Emergency Department data: demographics, vital signs, laboratory measurements, medications, procedures, diagnoses (ICD codes), imaging reports, nursing notes, and mortality outcomes. The relational database (available on BigQuery or as a local PostgreSQL build, organized into modules such as hosp and icu) links hospital admissions (admissions), ICU stays (icustays), charted observations (chartevents), lab events (labevents), microbiology data (microbiologyevents), prescriptions (prescriptions), and discharge summaries. MIMIC-IV succeeds MIMIC-III (2001-2012) with a modernized schema, a cleaner data model, and expanded coverage. It is widely used to develop and benchmark clinical AI models (mortality prediction, sepsis detection, phenotyping, NLP). Access requires PhysioNet credentialing, including completion of the CITI "Data or Specimens Only Research" course and signing a data use agreement. Companion datasets include MIMIC-CXR (chest X-ray images) and MIMIC-IV-Note (de-identified clinical notes).
URL
https://physionet.org/content/mimiciv/
KEYWORDS
EHR, ICU, clinical database, de-identified, Beth Israel, critical care, MIMIC, medical informatics
TITLE
MIMIC-IV, a freely accessible electronic health record dataset.
Main citation
Johnson AEW, Bulgarelli L, Shen L, Gayles A, Shammout A, Horng S, Pollard TJ, Hao S, Moody B, Gow B, Lehman LH, Celi LA, Mark RG. (2023) MIMIC-IV, a freely accessible electronic health record dataset. Scientific Data, 10:1. doi:10.1038/s41597-022-01899-x. PMID 36596836
ABSTRACT
MIMIC-IV is a publicly available database of de-identified electronic health records for patients admitted to the Beth Israel Deaconess Medical Center (BIDMC) in Boston, Massachusetts. The database is updated annually and is freely available to credentialed researchers. MIMIC-IV contains information on patient demographic characteristics, vital signs, laboratory measurements, medications, and diagnoses. We describe the process of creating the database, the structure of the data, and the tools available to users. MIMIC-IV is a valuable resource for researchers in critical care, clinical informatics, and machine learning.
DOI
10.1038/s41597-022-01899-x
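Because the tables share the keys `subject_id`, `hadm_id`, and `stay_id`, most cohort definitions reduce to joins. The minimal sketch below runs against an in-memory SQLite database holding a tiny synthetic subset of `admissions` and `icustays` columns (no real MIMIC data; the values are invented) and derives per-stay ICU length of stay plus the in-hospital mortality label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Synthetic tables mirroring a few MIMIC-IV columns (hosp.admissions, icu.icustays).
cur.executescript("""
CREATE TABLE admissions (
    subject_id INTEGER, hadm_id INTEGER PRIMARY KEY,
    admittime TEXT, dischtime TEXT, hospital_expire_flag INTEGER
);
CREATE TABLE icustays (
    subject_id INTEGER, hadm_id INTEGER, stay_id INTEGER PRIMARY KEY,
    intime TEXT, outtime TEXT
);
INSERT INTO admissions VALUES
    (1, 100, '2150-01-01 10:00:00', '2150-01-06 12:00:00', 0),
    (2, 200, '2160-03-10 08:00:00', '2160-03-12 20:00:00', 1);
INSERT INTO icustays VALUES
    (1, 100, 1000, '2150-01-02 00:00:00', '2150-01-04 12:00:00'),
    (2, 200, 2000, '2160-03-10 09:00:00', '2160-03-11 09:00:00');
""")

# One row per ICU stay: length of stay in days plus the admission's mortality label.
# Joining on hadm_id ties each stay to the outcome of its own admission only.
query = """
SELECT i.stay_id,
       ROUND(julianday(i.outtime) - julianday(i.intime), 2) AS icu_los_days,
       a.hospital_expire_flag AS died_in_hospital
FROM icustays AS i
JOIN admissions AS a ON a.hadm_id = i.hadm_id
ORDER BY i.stay_id
"""
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

The real `icustays` table already provides a `los` column, so the explicit date arithmetic here only illustrates the join; on BigQuery the `julianday` expression would be replaced by a function such as `DATETIME_DIFF`.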

MIRA

AI Agent Clinical AI EHR Autonomous Agent MIRA Nature
PUBMED_LINK
42310457
FULL NAME
MIRA: Medical Intelligence for Reasoning and Action — an autonomous AI agent operating in a sandboxed EHR environment
DESCRIPTION
MIRA is an autonomous AI agent powered by GPT-4o (temperature 0.01), with o1-preview used for structured reasoning, operating within a sandboxed HL7 FHIR-based EHR environment. It navigates 85,000+ clinical decision options across 8 emergency department diagnoses using 11 FHIR-compliant tools (PatientHistory, PhysicalExam, Lab/Urine/Microbiology/Radiology requests, Medication/Procedure ordering, Plan, Admission). Evaluated on 574 real MIMIC-IV patient cases, MIRA outperformed two independent physician cohorts in diagnostic accuracy and made guideline-concordant, medication-safe, and appropriate admission decisions. Token masking enforces valid tool parameters, so the agent cannot select options that do not exist in the environment.
URL
https://www.nature.com/articles/s41586-026-10675-5
TITLE
Towards autonomous medical artificial intelligence agents.
Main citation
Ferber D, Hilgers L, Höper C, Kinny-Köster B, Eckardt JN, Egger-Heidrich K, Bill M, Schneider MMK, Clusmann J, Kadric L, Oehme M, Mayrhofer-Schmid M, Oeser A, Wölflein G, Wiest IC, Middeke JM, Iafrate AJ, Truhn D, Jäger D, Kather JN. (2026) Towards autonomous medical artificial intelligence agents. Nature. doi:10.1038/s41586-026-10675-5. PMID 42310457
ABSTRACT
Large language models (LLMs) show great potential for clinical decision-making, yet most applications remain narrow, task-specific chat tools rather than systems integrated into clinical workflows. However, building physician copilots will require models that operate within the electronic health record (EHR), with governed access to patient data and the ability to initiate permitted EHR actions within defined safety constraints. Here we show that MIRA (Medical Intelligence for Reasoning and Action), an autonomous artificial intelligence agent operating in a sandboxed EHR environment, can navigate a large clinical action space to obtain patient histories; order and interpret laboratory, imaging and microbiology tests; generate differential diagnoses; and formulate treatment plans such as prescribing medications, scheduling surgical procedures and planning admissions. In simulations on real patient cases spanning multiple diagnoses, MIRA outperformed physicians in diagnostic accuracy and made guideline-concordant, medication-safe and appropriate admission decisions.
DOI
10.1038/s41586-026-10675-5
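Token masking is a form of constrained decoding. At each generation step, every token that cannot extend some valid option gets a logit of negative infinity, so greedy selection can only ever complete a valid option. The minimal sketch below assumes word-level tokens, a hypothetical `biased_scores` function standing in for the language model's logits, and an illustrative option list; none of it is MIRA's actual tool schema or implementation.

```python
import math
from typing import Callable, Dict, List, Optional, Set

END = "<end>"


def allowed_next(prefix: List[str], options: List[List[str]]) -> Set[str]:
    """Tokens that keep `prefix` on the path to at least one valid option."""
    n = len(prefix)
    allowed = set()
    for opt in options:
        if opt[:n] == prefix:
            allowed.add(opt[n] if len(opt) > n else END)
    return allowed


def decode(score_fn: Callable[[List[str]], Dict[str, float]],
           vocab: List[str],
           options: Optional[List[List[str]]] = None,
           max_len: int = 20) -> List[str]:
    """Greedy decoding; if `options` is given, mask every token outside the valid set."""
    candidates = vocab + [END]
    prefix: List[str] = []
    for _ in range(max_len):
        logits = score_fn(prefix)
        allowed = set(candidates) if options is None else allowed_next(prefix, options)
        if not allowed:
            raise ValueError("prefix matches no valid option")
        # Masking step: disallowed tokens get -inf, so argmax can never pick them.
        masked = {t: (logits.get(t, -1e9) if t in allowed else -math.inf) for t in candidates}
        best = max(masked, key=masked.get)
        if best == END:
            return prefix
        prefix.append(best)
    raise RuntimeError("no option completed within max_len")


def biased_scores(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical model that would produce the non-existent 'CT chest' if unconstrained."""
    if not prefix:
        return {"CT": 4.0, "lipase": 1.0}
    if prefix == ["CT"]:
        return {"chest": 5.0, "abdomen": 3.0, "head": 1.0}
    return {END: 2.0}


options = [["CT", "head"], ["CT", "abdomen"], ["lipase"]]  # illustrative valid orders
vocab = ["CT", "head", "abdomen", "chest", "lipase"]
print(decode(biased_scores, vocab))           # unconstrained
print(decode(biased_scores, vocab, options))  # masked to valid options
```

The guarantee covers the form of the output only: masking ensures the selected option exists in the environment, while whether it is the clinically right choice still depends on the model's scores.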

MixEHR-SAGE

AI GWAS Topic Modeling PheWAS EHR Phenotyping UK Biobank Brief Bioinform
PUBMED_LINK
41627341
FULL NAME
MixEHR-SAGE: Multi-modal Topic Modeling for PheWAS and GWAS
DESCRIPTION
MixEHR-SAGE is a PheCode-guided multi-modal topic model that integrates diagnoses, procedures, and medications from EHRs to enhance phenotyping for GWAS. By combining expert-informed priors with probabilistic inference, it identifies over 1,000 interpretable phenotype topics from UK Biobank data and improves disease incidence prediction and GWAS discovery. Published in Briefings in Bioinformatics.
TITLE
PheCode-guided multi-modal topic modeling of electronic health records improves disease incidence prediction and GWAS discovery from UK Biobank.
ABSTRACT
Phenome-wide association studies rely on disease definitions derived from diagnostic codes, often failing to leverage the full richness of electronic health records (EHR). We present MixEHR-SAGE, a PheCode-guided multi-modal topic model that integrates diagnoses, procedures, and medications to enhance phenotyping from large-scale EHRs. Applied to 350,000 individuals with high-quality genetic data, MixEHR-SAGE-derived risk scores accurately predicted disease incidence and improved GWAS discovery.
DOI
10.1093/bib/bbag030
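A guided multi-modal topic model can be sketched as LDA with two changes: each modality (diagnoses, medications, ...) has its own topic-by-code distribution while all modalities share one per-patient topic mixture, and an asymmetric Dirichlet prior adds pseudo-counts to the codes an expert maps to each topic's PheCode. The collapsed Gibbs sampler below is a simplified sketch under those assumptions; MixEHR-SAGE's actual inference procedure, priors, and hyperparameters (`alpha`, `beta`, `boost` here) are not taken from the paper, and the data and seed mappings are synthetic.

```python
import numpy as np


def guided_priors(vocab_sizes, seeds, n_topics, beta=0.01, boost=1.0):
    """Topic-by-code Dirichlet prior per modality; each topic's seed codes get `boost` extra."""
    priors = []
    for m, size in enumerate(vocab_sizes):
        prior = np.full((n_topics, size), beta)
        for k, codes in seeds.get(m, {}).items():
            prior[k, codes] += boost
        priors.append(prior)
    return priors


def fit(docs, vocab_sizes, seeds, n_topics, alpha=0.1, n_iter=200, seed=0):
    """Collapsed Gibbs sampling; returns per-patient topic proportions (theta)."""
    rng = np.random.default_rng(seed)
    priors = guided_priors(vocab_sizes, seeds, n_topics)
    prior_sums = [p.sum(axis=1) for p in priors]
    n_dk = np.zeros((len(docs), n_topics))                 # patient-topic counts
    n_kw = [np.zeros((n_topics, v)) for v in vocab_sizes]  # topic-code counts per modality
    n_k = [np.zeros(n_topics) for _ in vocab_sizes]        # topic totals per modality
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for (m, w), k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_kw[m][k, w] += 1
            n_k[m][k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, (m, w) in enumerate(doc):
                k = z[d][i]
                n_dk[d, k] -= 1
                n_kw[m][k, w] -= 1
                n_k[m][k] -= 1
                # Shared patient mixture times the modality-specific, prior-guided code term.
                p = (n_dk[d] + alpha) * (n_kw[m][:, w] + priors[m][:, w]) / (n_k[m] + prior_sums[m])
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[m][k, w] += 1
                n_k[m][k] += 1
    return (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)


# Synthetic data: modality 0 = diagnosis codes, modality 1 = medication codes (4 each).
# Hypothetical seeds: topic 0 anchored to dx {0, 1} + med {0}; topic 1 to dx {2, 3} + med {3}.
seeds = {0: {0: [0, 1], 1: [2, 3]}, 1: {0: [0], 1: [3]}}
docs = [
    [(0, 0), (0, 1), (1, 0), (0, 0), (1, 0)],
    [(0, 2), (0, 3), (1, 3), (0, 3), (1, 3)],
]
theta = fit(docs, vocab_sizes=[4, 4], seeds=seeds, n_topics=2)
print(theta.round(2))
```

The seeded priors also fix which topic corresponds to which PheCode, avoiding the label switching of unguided LDA; per-patient proportions like `theta` are the kind of quantity that could then feed a phenotype risk score.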

All of Us — EHR data model harmonization (i2b2 → OMOP)

EHR Data Model OMOP
PUBMED_LINK
30779778
STAGE_PERIOD
2019
DESCRIPTION
Transformation of i2b2-sourced electronic health record data into the OMOP Common Data Model for the All of Us Research Program, enabling standardized cross-institutional analysis of clinical data.
URL
https://allofus.nih.gov/
TITLE
Data model harmonization for the All Of Us Research Program: Transforming i2b2 data into the OMOP common data model
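At its core, an i2b2-to-OMOP transformation reshapes i2b2's single entity-attribute-value fact table (`observation_fact`, with columns such as `patient_num`, `encounter_num`, `concept_cd`, `start_date`) into OMOP's domain-specific tables, translating source codes into standard concept IDs through a vocabulary lookup. The minimal sketch below covers the condition domain only and assumes a `concept_cd` format of `ICD10CM:<code>` plus a hypothetical in-memory mapping whose concept IDs are placeholders, not real OMOP vocabulary entries; it does not reproduce the All of Us pipeline.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class ObservationFact:  # subset of i2b2 observation_fact columns
    patient_num: int
    encounter_num: int
    concept_cd: str     # assumed format "<PREFIX>:<code>"
    start_date: date


@dataclass
class ConditionOccurrence:  # subset of OMOP CDM condition_occurrence columns
    condition_occurrence_id: int
    person_id: int
    visit_occurrence_id: int
    condition_concept_id: int
    condition_start_date: date
    condition_source_value: str


def to_condition_occurrence(facts: List[ObservationFact],
                            source_to_standard: Dict[str, int],
                            prefix: str = "ICD10CM") -> List[ConditionOccurrence]:
    """Route diagnosis facts into condition_occurrence rows via a code lookup."""
    rows = []
    for fact in facts:
        vocab, _, code = fact.concept_cd.partition(":")
        if vocab != prefix:
            continue  # other domains (labs, drugs, ...) belong in other OMOP tables
        rows.append(ConditionOccurrence(
            condition_occurrence_id=len(rows) + 1,
            # Simplification: real pipelines map patient/encounter ids via lookup tables.
            person_id=fact.patient_num,
            visit_occurrence_id=fact.encounter_num,
            # OMOP convention: concept_id 0 marks a source code with no standard mapping.
            condition_concept_id=source_to_standard.get(code, 0),
            condition_start_date=fact.start_date,
            condition_source_value=code,
        ))
    return rows


# Hypothetical mapping; real pipelines derive it from the vocabulary's "Maps to" relationships.
mapping = {"E11.9": 9001, "I10": 9002}
facts = [
    ObservationFact(1, 10, "ICD10CM:E11.9", date(2019, 1, 5)),
    ObservationFact(1, 10, "LAB:001", date(2019, 1, 5)),
    ObservationFact(2, 20, "ICD10CM:Z99.9", date(2019, 2, 1)),
]
rows = to_condition_occurrence(facts, mapping)
for row in rows:
    print(row)
```

Keeping the original code in `condition_source_value` preserves provenance even when the lookup fails, which makes unmapped codes easy to audit after the load.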

MVP — Large-scale disease-specific GWAS & WGS

WGS PheWAS Multi-biobank EHR
STAGE_PERIOD
2023–2025
DESCRIPTION
Expanded disease-specific GWAS in the Million Veteran Program (MVP) across hundreds of traits, leveraging deep EHR phenotyping from the VA health care system. Whole-genome sequencing was performed on a subset of participants for comprehensive variant discovery. MVP data also contributed to multi-biobank meta-analyses with FinnGen and UK Biobank spanning thousands of phenotypes.
URL
https://www.mvp.va.gov/
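Multi-biobank GWAS meta-analyses commonly pool per-cohort effect estimates for each variant with fixed-effect inverse-variance weighting: each cohort is weighted by 1/SE², giving a pooled beta, a pooled SE of sqrt(1/Σw), and a z-test. The minimal sketch below implements that standard formula, which is not necessarily the exact method used in the MVP, FinnGen, and UK Biobank analyses; the summary statistics are made up for a single hypothetical variant.

```python
import math
from typing import List, Tuple


def ivw_meta(betas: List[float], ses: List[float]) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty beta and SE lists")
    weights = [1.0 / se**2 for se in ses]  # more precise cohorts count more
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, p


# Made-up log-odds ratios and SEs for one variant in three hypothetical cohorts.
beta, se, p = ivw_meta([0.10, 0.20, 0.15], [0.10, 0.10, 0.05])
print(f"beta={beta:.4f} se={se:.4f} p={p:.2e}")
```

Fixed-effect pooling assumes one true effect shared across cohorts; when ancestry or phenotype definitions differ between biobanks, analysts typically also check heterogeneity or use random-effects models.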