Bioinformatics

Catalog entries using this tag (links open the entry card on its page):

Entries

BioMedAgent

AI Agent Biomedical Multi-Agent Bioinformatics
PUBMED_LINK
41912700
FULL NAME
BioMedAgent: self-evolving multi-agent LLM framework for biomedical data analysis
DESCRIPTION
BioMedAgent is a self-evolving LLM multi-agent framework that learns to use diverse bioinformatics tools and chain them into executable workflows through interactive exploration and memory retrieval algorithms. It allows biomedical users to initiate tasks using natural language, without requiring computational expertise. Evaluated on the BioMed-AQA benchmark (327 biomedical data tasks), BioMedAgent achieved a 77% success rate, surpassing other LLM agents, and generalized robustly to the external BixBench dataset. Beyond benchmarks, it autonomously performs cross-omics analysis, machine-learning modelling, and pathology image segmentation.
URL
https://www.nature.com/articles/s41551-026-01634-6
TITLE
Empowering AI data scientists using a multi-agent LLM framework with self-evolving capabilities for autonomous, tool-aware biomedical data analyses.
Main citation
Bu D, Sun J, Li K, He Z, Huang W, Hu J, Zhang S, Lei S, Huo P, Wang Z, Wang S, Wang T, Gao K, Wu Y, Zhao L, Wang K, Li G, Song H, Jin Y, Zhang K, Chen R, Zhao Y. (2026) Empowering AI data scientists using a multi-agent LLM framework with self-evolving capabilities for autonomous, tool-aware biomedical data analyses. Nature Biomedical Engineering. doi:10.1038/s41551-026-01634-6. PMID 41912700
ABSTRACT
Artificial intelligence agents are emerging as powerful applications of large language models (LLMs), automating complex tasks and enabling scientific data exploration. However, their use in biomedical data analysis remains limited by the difficulty of handling specialized tools and multistep reasoning. Here we introduce BioMedAgent, a self-evolving LLM multi-agent framework, which learns to use diverse bioinformatics tools and chain them into executable workflows through interactive exploration and memory retrieval algorithms. It allows biomedical users to initiate tasks using natural language, without requiring computational expertise. Evaluated on our newly released BioMed-AQA benchmark comprising 327 biomedical data tasks, BioMedAgent achieved a 77% success rate, surpassing other LLM agents, and generalized robustly to the external BixBench dataset. Beyond benchmarks, it autonomously performs cross-omics analysis, machine-learning modelling and pathology image segmentation, highlighting its potential to advance biomedical research and extend to other scientific domains requiring complex tool integration and multistep reasoning.
DOI
10.1038/s41551-026-01634-6

BixBench

AI Benchmark Bioinformatics LLM Agent Computational Biology FutureHouse
FULL NAME
BixBench — Comprehensive Benchmark for LLM-based Agents in Computational Biology
DESCRIPTION
BixBench, by FutureHouse and ScienceMachine, is a benchmark designed to evaluate AI agents on real-world bioinformatics tasks. It features 61 real-world analytical scenarios with 205 associated questions and supports both open-answer and multiple-choice evaluation. It tests agents on data analysis, insight generation, and result interpretation in bioinformatics. Current frontier models achieve only ~21% accuracy, highlighting significant room for improvement.
URL
https://github.com/Future-House/BixBench
Main citation
Mitchener L, Laurent J, Wellawatte G, et al. (2025) BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology. arXiv:2503.00096.
ABSTRACT
Artificial intelligence (AI) is changing scientific research at a rapid pace and is beginning to enable the automation of complex analytical tasks. One of the most promising fields for AI-driven automation is bioinformatics, where data-focused research lends itself to purely computational analysis. We introduce BixBench, a benchmark designed to evaluate AI agents on real-world bioinformatics tasks. BixBench challenges AI models with open-ended analytical research scenarios, requiring them to analyze data, generate insights, and interpret results autonomously. The benchmark comprises over 50 real-world scenarios with nearly 300 associated open-answer questions.
DOI
10.48550/arXiv.2503.00096