Basecamp Research announced the Trillion Gene Atlas on March 18, a project to sequence genomes from more than 100 million species and use the data to train AI systems capable of designing new medicines. The startup has partnered with Anthropic, Ultima Genomics, PacBio, and NVIDIA on the effort, which it says would expand known genetic diversity 100-fold.
What They’re Building
The AI drug discovery industry has hit what Basecamp calls a “data wall.” Public genomic databases contain sequences from a tiny fraction of Earth’s species, limiting the diversity of molecular structures AI models can learn from. Basecamp’s solution: build a proprietary database of genes from millions of species that scientists have never sequenced before.
Their EDEN foundation models, released in January, were trained on BaseData, a genomic database the company says is more than 10 times larger than all public resources combined. According to the company, the models have already produced results in the lab, including:
- Designing antimicrobial peptides with 97% hit rates
- Achieving zero-shot activity in human T-cells without clinical training data
- Creating what the company calls “AI-Programmable Gene Insertion”
The Trillion Gene Atlas aims to expand this database by orders of magnitude, processing genomic samples from thousands of collection sites worldwide.
The Partnership Structure
Each partner contributes specific capabilities:
Anthropic is integrating Claude into Basecamp’s therapeutic design workflows. The model’s reasoning capabilities are intended to let researchers iterate on molecular designs in natural language rather than through specialized software.
Ultima Genomics provides ultra-high-throughput sequencing via its UG200 Series, enabling the scale required for 100 million species.
PacBio contributes long-read sequencing, which captures complete gene structures that short-read methods miss.
NVIDIA supplies the compute infrastructure and Parabricks tools for processing the massive resulting datasets.
What This Means
If Basecamp delivers on its claims, the implications are significant. Their EDEN models are described as “the first capable of designing diverse therapeutics directly from a disease prompt”—essentially, tell the AI what disease you want to treat and it proposes candidate molecules.
That’s a bold claim. Current AI drug discovery still requires extensive human expertise to translate model outputs into viable drug candidates. But the company says it has validated zero-shot therapeutic activity in wet-lab conditions, meaning the designs worked even though the model had not been specifically trained on similar molecules.
The timeline is aggressive: Basecamp claims it can compress over 20 years of biological data gathering and analysis into less than two years.
The Fine Print
Basecamp Research is a startup, not a pharmaceutical company with approved drugs on the market. Their EDEN models have shown promising in vitro results, but no AI-designed therapeutics from this approach have entered clinical trials yet.
The 97% hit rate for antimicrobial peptides sounds impressive, but “hit” in early drug discovery doesn’t mean “works in humans.” Most drug candidates fail in later stages regardless of how promising their initial profiles looked.
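To make the gap between an early "hit" and a working drug concrete, here is a minimal Python sketch of the arithmetic. The function names, the batch size of 100, and the later-stage pass rates are hypothetical placeholders chosen for illustration; they are not figures from Basecamp or from any published study. Only the 97% starting point mirrors the reported hit rate.

```python
from typing import Sequence


def hit_rate(hits: int, tested: int) -> float:
    """Fraction of tested candidates that showed activity in an early assay."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return hits / tested


def expected_survivors(candidates: int, stage_pass_rates: Sequence[float]) -> float:
    """Expected number of candidates left after passing every later stage."""
    remaining = float(candidates)
    for rate in stage_pass_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("each pass rate must be between 0 and 1")
        remaining *= rate
    return remaining


# Hypothetical batch: 100 designed peptides, 97 of which register as hits.
tested = 100
hits = 97

# Placeholder pass rates for three later stages (assumptions, not real data).
later_stage_pass_rates = [0.5, 0.3, 0.1]

print(f"Early hit rate: {hit_rate(hits, tested):.0%}")
print(f"Expected survivors: {expected_survivors(hits, later_stage_pass_rates):.2f}")
```

With these placeholder rates, 97 early hits shrink to between one and two expected survivors, which is why a high hit rate on its own says little about eventual clinical success.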
The partnership with Anthropic is notable—it’s one of the first major applications of frontier reasoning models in drug discovery. But integrating AI into therapeutic design workflows is easier announced than accomplished. Regulatory pathways for AI-designed drugs remain uncertain.
Still, the scale of the Trillion Gene Atlas project is unprecedented. If even a fraction of its goals materialize, it could provide training data that fundamentally changes what AI drug discovery models can achieve. The coming years will reveal whether that potential translates into medicines that actually help patients.