Open sourcing Tahoe-100M

Authored by
Tahoe Team
Released on
February 25, 2025
Summary

A historic day for builders in bio: we have open-sourced Tahoe-100M, the largest single-cell atlas ever, by a wide margin. This is a huge leap forward for AI models of cells and for drug discovery.

We are open sourcing Tahoe-100M to start a movement and to set a new standard.

We believe that incremental, reductionist steps are not enough to push us beyond structural protein models and into the next inflection point in biology: in silico models of the human cell, capable of discovering drug molecules that can cure diseased cells.

For that, giant leaps are needed. Tahoe-100M sets a new bar: 100M single-cell data points and 60,000 drug-patient interactions, 50x larger than all public perturbative single-cell data combined, built using our Mosaic platform.

This is a huge step towards removing the key bottleneck in building virtual models of the cell: large-scale, single-cell data from diverse biological contexts, especially from diseased cells perturbed by drugs.

We invite the entire ML & bio community to build with us. Tahoe-100M is open. Let’s take ambitious leaps—openly, boldly, and with urgency—to defeat debilitating diseases.

It’s morning in biology. Let’s get to work.

Read the press release.

How to access and download Tahoe-100M:

https://huggingface.co/datasets/tahoebio/Tahoe-100M

https://arcinstitute.org/tools/virtualcellatlas
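For readers who want to look at a few records before committing to a full download, the Hugging Face copy listed above can be streamed. The following is a minimal sketch, not an official loader: it assumes the third-party `datasets` library is installed and that the repository exposes a split named "train" (check the dataset card for the actual splits and configurations). The helper names `repo_id_from_url` and `peek` are hypothetical and exist only for this illustration.

```python
from itertools import islice
from urllib.parse import urlparse

# URL taken from the access list above.
DATASET_URL = "https://huggingface.co/datasets/tahoebio/Tahoe-100M"


def repo_id_from_url(url: str) -> str:
    """Turn a Hugging Face dataset URL into its 'owner/name' repo ID."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:])


def peek(url: str = DATASET_URL, n: int = 3, split: str = "train"):
    """Stream the first n records instead of downloading the whole dataset.

    The split name "train" is an assumption; see the dataset card.
    """
    # Imported lazily so the URL helper works without the library installed.
    from datasets import load_dataset  # third-party: pip install datasets

    stream = load_dataset(repo_id_from_url(url), split=split, streaming=True)
    return list(islice(stream, n))


# Example (needs network access and the `datasets` library):
#     for record in peek():
#         print(sorted(record.keys()))
```

Streaming avoids pulling the full dataset to disk, which matters at this scale; once the record layout is clear, a full or filtered download can follow.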

Read the manuscript on bioRxiv: https://www.biorxiv.org/content/10.1101/2025.02.20.639398v1

Watch No Priors Podcast about Tahoe-100M, with our founders and our friends at Arc Institute.