Open sourcing Tahoe-100M

A historic day for builders in bio: we have open-sourced Tahoe-100M, the largest single-cell atlas ever, by a wide margin. This is a huge leap forward for AI models of cells & drug discovery.
We are open sourcing Tahoe-100M to start a movement and to set a new standard.
We believe that incremental, reductionist steps are not enough to push us beyond structural protein models and into the next inflection point in biology: in silico models of the human cell, capable of discovering drug molecules that can cure diseased cells.
For that, giant leaps are needed. Tahoe-100M sets a new bar: 100M single-cell data points and 60,000 drug-patient interactions, 50x larger than all public perturbative single-cell data combined, built using our Mosaic platform.
This is a huge step towards removing the key bottleneck in building virtual models of the cell: large-scale, single-cell data from diverse biological contexts, especially from diseased cells perturbed by drugs.
We invite the entire ML & bio community to build with us. Tahoe-100M is open. Let’s take ambitious leaps—openly, boldly, and with urgency—to defeat debilitating diseases.
It’s morning in biology. Let’s get to work.
How to access and download Tahoe-100M:
https://huggingface.co/datasets/tahoebio/Tahoe-100M
https://arcinstitute.org/tools/virtualcellatlas
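For the Hugging Face copy, a minimal sketch of peeking at a few records without downloading the whole atlas might look like the following. It uses the `datasets` library's streaming mode. The repository ID comes from the URL above; the split name `"train"`, the use of the default configuration, and the helper name `peek_records` are assumptions made for illustration, so check the dataset card for the actual configurations and fields.

```python
from itertools import islice

# Repository ID taken from the Hugging Face URL above.
REPO_ID = "tahoebio/Tahoe-100M"


def peek_records(n=3, split="train", config=None, loader=None):
    """Return the first n records of the dataset as a list of dicts.

    Hypothetical helper, not part of any Tahoe tooling.
    split="train" and config=None (default configuration) are assumptions;
    consult the dataset card for the real names.
    """
    if loader is None:
        # Imported lazily so this module loads even without `datasets` installed.
        from datasets import load_dataset
        loader = load_dataset
    # streaming=True iterates over records remotely instead of
    # downloading the full dataset to disk first.
    stream = loader(REPO_ID, name=config, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek_records(3)` with `datasets` installed and network access would fetch three records, and inspecting their keys is a quick way to see which fields are available. The `loader` argument exists only so the helper can be exercised offline with a stand-in function.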
Read the manuscript on bioRxiv: https://www.biorxiv.org/content/10.1101/2025.02.20.639398v1