However, a key point in understanding how we can successfully use AI is that its utility is a direct result of findings from ‘real’ experiments. AlphaFold could be developed because in 1971, in a moment of real vision, the Brookhaven Data Bank (a protein data bank) was founded to standardise and catalogue all future protein 3D structures. At the time, it contained only seven structures; now it boasts almost a quarter of a million, representing over 750,000 distinct protein snapshots. This regulated repository positioned generations of computational scientists to systematically analyse these data for patterns, and it is exactly the kind of repository upon which AI can be trained.
What’s needed next?
AI methods cannot predict the structure of proteins like c-Myc because we lack the experimental data that hold the key information the algorithms need to learn from.
We estimate that, in total, drug discovery research has only tested compounds against one quarter of the human proteome. AI algorithms, therefore, have a limited chance of identifying the completely novel areas of chemistry required to address some of the remaining three quarters.
So, we need more experimental data. Evidently, the unparalleled foundation of deep, well-annotated data, coupled with decades of computationally led understanding of patterns in these data, has positioned our field to be a major beneficiary of the AI revolution. Moving forward, we must invest in key data generation. AI ‘creates’ by interpolating from existing data, and any extrapolation remains within confined boundaries. AI cannot leap into the complete unknown without data ‘stepping stones’. Asking it to do so is like expecting generative AI to create accurate images of the life forms roaming the exoplanet Kepler-62e based solely on photographs from Earth.
We must also define boundaries of capability and applicability for each algorithm we develop. Overhyping will erode trust. The scientific method, not dogma and theatrics, must dictate our investment decisions and our use of AI moving forward.

