NVIDIA-backed AI firm drops 5M drug maps to fast-track breakthrough therapies

SandboxAQ hopes the SAIR dataset will compress drug discovery into a single, fast AI prediction step powered by synthetic molecular data.

Researcher studies molecular models on a digital screen in a lab. (representational image)

Credit – SandboxAQ official website

SandboxAQ, an AI startup spun out of Google and backed by NVIDIA, has released a massive new dataset it hopes will revolutionize early-stage drug discovery.

On Wednesday, the company unveiled the Structurally Augmented IC50 Repository (SAIR), a trove of over 5.2 million computationally generated protein-drug molecule co-structures, each tagged with real-world potency data.

The aim is to make it easier and faster for researchers to determine whether a potential drug will bind effectively to its target protein.

That’s a crucial question scientists must answer before advancing a drug candidate into further testing.
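The potency label in SAIR's name, IC50, is the concentration of a compound needed to inhibit a target's activity by half. As a minimal sketch, assuming values are stored in nanomolar (the article does not describe SAIR's units or schema), such a measurement is commonly converted to the logarithmic pIC50 scale before model training. The function name below is hypothetical.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # 1 nM = 1e-9 M; a lower IC50 (more potent) gives a higher pIC50.
    return -math.log10(ic50_nm * 1e-9)


# Illustrative values only, not taken from the dataset.
weak = ic50_nm_to_pic50(10_000.0)   # 10 micromolar
strong = ic50_nm_to_pic50(1.0)      # 1 nanomolar
```

On this scale, each unit increase corresponds to a tenfold increase in potency, which is why it is a convenient regression target.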

Targeting the bind between drugs and proteins

SandboxAQ’s dataset is designed to support models that predict whether a small molecule will stick to a specific protein. That interaction determines if a drug will inhibit or modify a biological process, such as halting the spread of disease.

Traditionally, researchers use experimental methods to study these structures. The process is costly and time-consuming.

It starts with obtaining a 3D structure of a target protein and then testing thousands of molecules for how they bind. Predicting both the pose and potency of the molecule requires repeated computation and refinement.

Synthetic molecules, real-world accuracy

“This is a long-standing problem in biology that we’ve all, as an industry, been trying to solve for,” Nadia Harhen, general manager of AI simulation at SandboxAQ, told Reuters.

“All of these computationally generated structures are tagged to a ground-truth experimental data, and so when you pick this data set and you train models, you can actually use the synthetic data in a way that’s never been done before.”

To bypass the data bottleneck, SandboxAQ used NVIDIA chips to generate synthetic structures. These are not observed in labs but calculated from real experimental data using the Boltz-1x co-folding model.

For each protein-drug pair from public datasets like ChEMBL and BindingDB, the team created five different 3D poses. They then cross-referenced these predicted poses with experimental potency values to retain only the most accurate ones. The final SAIR dataset includes those high-confidence entries.
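The generate-then-filter step described above could look roughly like the following. This is a sketch under stated assumptions, not SandboxAQ's pipeline: the article does not say how accuracy was scored, so the `confidence` field, the 0.8 cutoff, and all names here are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    pair_id: str       # hypothetical ID for one protein-drug pair
    pose_index: int    # 0..4, one of the five generated poses
    confidence: float  # hypothetical quality score in [0, 1]


def keep_confident_poses(poses, min_confidence=0.8):
    """Group poses by pair and keep those at or above the cutoff."""
    kept = defaultdict(list)
    for pose in poses:
        if pose.confidence >= min_confidence:
            kept[pose.pair_id].append(pose)
    # Pairs with no surviving pose are absent from the result.
    return dict(kept)


# Made-up example scores.
poses = [
    Pose("pair-A", 0, 0.91),
    Pose("pair-A", 1, 0.42),
    Pose("pair-B", 0, 0.55),
]
filtered = keep_confident_poses(poses)
```

Here only the first pose of `pair-A` survives, and `pair-B` drops out entirely because none of its poses clears the assumed threshold.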

Examples of 3D co-folded protein-drug complexes found in the SAIR release.
Credit – SandboxAQ

Boosting AI model training with open data

AI models like AlphaFold2 and newer systems such as AlphaFold3 and Boltz-2 have made major progress in predicting 3D structures and binding poses. But they still struggle when dealing with unfamiliar proteins or molecules outside their training data.

One way to improve that is through more training data. However, creating new structural data experimentally is expensive, which is the very problem AI hopes to fix.

And while pharma companies hold private datasets, they rarely share them publicly.

By generating synthetic structural data from widely available potency records, SAIR offers a workaround.

Researchers can now use this resource to train models that not only predict structure but also potency, without access to proprietary databases.

From data to drug candidates, virtually

SandboxAQ will make the SAIR dataset freely available to researchers. At the same time, it plans to charge for access to its proprietary AI models trained on this data.

These tools aim to rival lab-based experiments, predicting protein binding quickly, virtually, and with real-world accuracy.

ABOUT THE AUTHOR

Aamir Khollam

Aamir is a seasoned tech journalist with experience at Exhibit Magazine, Republic World, and PR Newswire. With a deep love for all things tech and science, he has spent years decoding the latest innovations and exploring how they shape industries, lifestyles, and the future of humanity.

Source: Interesting Engineering
