Machine learning and AI in biological science, drug discovery and medicine

01 March 2023, 09:00 - 17:00, Online, Free
Digital Biology 02 by Khyati Trehan

This Royal Society conference brought together stakeholders from industry and academia to explore advances in machine learning and artificial intelligence for biological research, drug discovery and medicine. Whilst these computational technologies are already transforming biological, clinical and pharmaceutical research, significant obstacles remain, in particular around skills, financing, and accessibility. Concerted industry/academia collaboration is required to address these challenges.

Three sessions around biology, chemistry and medicine included talks on machine learning for target discovery and 'omic technologies, AI-based chemoinformatics, and computational clinical trial design, as well as the classical prediction of protein folding, followed by a keynote address from Dr Demis Hassabis CBE FREng FRS (DeepMind).

The conference concluded with a panel discussion that addressed how machine learning and AI can be used to advance medicine and healthcare.

Conference report

Download the conference report.

About the conference series

Supported by AstraZeneca, the meeting formed part of the Royal Society’s Transforming our future series in the life sciences. These meetings are unique, high-level events that address the scientific and technical challenges of the next decade. Each conference features cutting-edge science from industry and academia and brings together leading experts from the scientific community, including regulatory, charity and funding bodies. 

Watch the event recording

Click watch on YouTube to view the full video playlist.

Organisers

  • Professor Mihaela van der Schaar

    Mihaela van der Schaar is the John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge and a Fellow at The Alan Turing Institute in London. In addition to leading the van der Schaar lab, Mihaela is founder and director of the Cambridge Centre for AI in Medicine (CCAIM).

    Mihaela was elected IEEE Fellow in 2009. She has received numerous awards, including the Oon Prize on Preventative Medicine from the University of Cambridge (2018), a National Science Foundation CAREER Award (2004), 3 IBM Faculty Awards, the IBM Exploratory Stream Analytics Innovation Award, the Philips Make a Difference Award and several best paper awards, including the IEEE Darlington Award.

    Mihaela is personally credited as inventor on 35 USA patents, many of which are still frequently cited and adopted in standards. She has made over 45 contributions to international standards for which she received 3 ISO Awards. In 2019, a Nesta report determined that Mihaela was the most-cited female AI researcher in the UK.

  • Dr Harren Jhoti OBE FMedSci FRS

    Dr Harren Jhoti FMedSci FRS is a structural biologist whose main interest has been rational drug design. He is President & CEO of Astex Pharmaceuticals in Cambridge, a biotech company he co-founded in 1999. He pioneered the development of fragment-based drug discovery, an approach now widely used in pharmaceutical and academic drug discovery centres to discover new medicines.

    Astex’s first drug, called Kisqali, which originated from a Novartis collaboration, was approved in 2017 for patients with metastatic breast cancer in the US and EU. In 2013 Astex was acquired by Otsuka Pharmaceuticals for $886m, and operates as a wholly-owned subsidiary of the Japanese company. Prior to Astex, Harren was Head of Structural Biology at GlaxoWellcome (now GSK).

    In 2018 Harren received the Lifetime Achievement Award from the BIA, the UK BioIndustry Association. He is a Fellow of the Academy of Medical Sciences, the Royal Society of Chemistry and the Royal Society of Biology. He was awarded the Prous Institute-Overton and Meyer Award by the European Federation for Medicinal Chemistry in 2012 and named by the Royal Society of Chemistry as World Entrepreneur of the Year for 2007.

  • Dr Claus Bendtsen

    Claus Bendtsen is an executive director at AstraZeneca where he heads Data Sciences & Quantitative Biology as part of Discovery Sciences. Prior to joining AstraZeneca, he held positions at Novartis and Merck & Co. Earlier in his career he co-founded three start-ups and worked in academia. He holds a PhD in applied mathematics and an MBA. He has more than 75 publications and 20 years of experience in the pharmaceutical industry.

Schedule

09:00-09:05 Opening remarks
Sir Adrian Smith

President, the Royal Society

Chair

Dr Claus Bendtsen

AstraZeneca

09:10-09:30 AI-augmented target discovery

The complexity of human disease poses significant challenges in the translation of basic research into safe and effective therapies, with at least 50% of drugs failing in Phase II and Phase III trials for lack of efficacy.

To tackle this complexity and enable data-driven discoveries, BenevolentAI has built a comprehensive knowledge graph incorporating and capitalising on many orthogonal data modalities, to build a detailed mechanistic representation of the dysregulated processes that underlie human disease. Powerful AI and ML tools are used to interrogate this corpus of knowledge, to hypothesise novel biological targets of potential therapeutic value for any disease of interest. These hypotheses are experimentally validated in physiologically relevant, human patient-derived, cell-based systems before entering the BenevolentAI drug discovery portfolio.

In this talk, Dr Phelan will describe the current status of AI in drug discovery and her vision for future technological solutions that will help to overcome some of the key challenges required to effect change across the entire R&D value chain.

Dr Anne Phelan

BenevolentAI, UK

09:30-09:50 Mapping and navigating biology and chemistry with genome-scale imaging

Image-based readouts of biology are information-rich and inexpensive. Yet historically, bespoke data collection methods and the intrinsically unstructured nature of image data have made these assays difficult to work with at scale.

This presentation will discuss advances made at Recursion to industrialise the use of cellular imaging to drive drug discovery. In particular, the use of deep learning allows the transformation of unstructured images into biologically meaningful representations, and enables a 'map of biology' relating genetic and chemical perturbations to scale drug discovery. Dr Haque will further discuss how publicly-shared resources from Recursion, including the RxRx3 dataset and MolRec™ application, enable downstream research both on cellular images themselves and on deep learning-derived embeddings, making advanced image analysis more accessible to researchers worldwide.

Dr Imran Haque

Recursion Pharmaceuticals

09:50-10:10 Digital Twins for personalised oncology

Cancer incidence is steadily increasing and is a major burden for patients, their families and society. Despite new treatments such as immunotherapies, currently only around 25% of cancer patients respond to treatment with drugs or biologicals in a way that leads to a cure or a decrease in disease progression (Spear et al 2001). New approaches are needed.

In this talk, Walter will introduce Digital Twin strategies that aim to improve cancer diagnosis and treatment by constructing computer models of cancer patients. These models allow simulation of the disease in silico and optimisation and selection of the best possible therapy for each patient. Using childhood cancer as a paradigm, Walter will discuss the status of Digital Twin technology, what the current challenges are and what he hopes the field will achieve over the next five to ten years.

Professor Walter Kolch

University College Dublin

10:10-10:35 Q&A and discussion
Professor Walter Kolch

University College Dublin

Dr Imran Haque

Recursion Pharmaceuticals

Dr Anne Phelan

BenevolentAI, UK

Chair

Dr Harren Jhoti OBE FMedSci FRS

Astex Pharmaceuticals

11:10-11:30 Integrating chemical and biological data: a focus on relevance and translation will boost in vivo-relevant drug discovery

The amount of chemical and biological data has increased in both public and private domains, and algorithm and hardware design for machine learning has also progressed tremendously in the last ten years. This has enabled rapid development of machine learning for drug discovery. Several ‘AI-designed drugs’ have already entered clinical phases, and press releases now describe the design of functional proteins and antibodies from scratch.

However, the attempt to marry algorithms with drug discovery often disregards the in vivo relevance of our current capabilities for processing chemical and biological data. In this talk, Dr Bender will argue that reductionist thinking remains pervasive in the field and that, combined with a lack of relevant data, a limited ability to handle that data computationally for in vivo-relevant decisions, and the formation of many narrow specialist domains, it is undermining our ability to harness the full potential of available chemical and biological data.

This talk will discuss how changing several areas, including data usage, algorithms and human mindset, might enable society to fully benefit from available computer power when it comes to in vivo-relevant decision making in drug discovery in the future.

Professor Andreas Bender

University of Cambridge and Pangea Botanica

11:30-11:50 Machine learning to predict protein function from sequence - therapeutic applications

A central challenge in biochemistry is the prediction of the functional properties of a protein from its amino acid sequence, as it can lead to the discovery of new proteins with specific functionality and a better understanding of the functional effect of genomic mutations. Experimental and computational data enable the training and validation of powerful machine learning models that predict protein function directly from sequence. This talk will present deep learning models that accurately predict functional domains within protein sequences, and large language models that generate textual descriptions of protein sequences, collectively adding millions of annotations to public databases.

Technical breakthroughs enable data on the sequence-to-function relationship to be rapidly acquired. However, the cost and latency of wet lab experiments mean that we require new methods to find 'hits' (sequences that meet the function requirements of the campaign) in few experimental rounds, where each round contains a large batch of sequence designs.

In this talk, Dr Colwell will discuss model-based optimisation approaches that take advantage of sample inefficient methods to find diverse sequence candidates for experimental evaluation. The potential of these approaches will be illustrated through three case studies demonstrating the design and experimental validation of proteins and peptides for therapeutic applications.

11:50-12:10 Exploring the ability of machine learning-based virtual screening models to identify the functional groups responsible for binding

There is significant interest in developing machine learning models that can predict protein-ligand binding with high accuracy. Many recently proposed structure-based virtual screening models have done just this. However, there are still challenges in ensuring that these models are doing more than exploiting ligand-specific biases in the dataset, which would make them potentially excellent predictors for proteins/ligands that are contained within the dataset but unable to generalise to unseen examples.

One way to assess if a model can generalise is to test whether it understands the rules of intermolecular binding. Using synthetic data, the Deane lab has investigated whether different methods (from fingerprint-based random forests to deep learning virtual screening models) are learning more than the biases in datasets.

The Deane lab found that their deep learning based virtual screening model, PointVS, identifies important functional groups with more efficiency than other methods tested. This suggests that it may generalise more effectively to new examples.

Using attribution, the Deane lab demonstrated that PointVS can identify important interactions in real protein ligand complexes, and further, that it can be used to extract important binding pharmacophores from a given protein target when supplied with a number of bound structures. This information was then used to perform fragment elaboration, resulting in improvements in docking scores when compared to using structural information from a traditional data-based approach. 

This not only provides definitive proof that PointVS is learning to identify important binding interactions, but also constitutes the first deep learning-based method for extracting structural information from a target for molecule design. This presents an exciting opportunity for the future of lead discovery and drug development.

Professor Charlotte Deane MBE

University of Oxford

12:10-12:35 Q&A and discussion
Professor Andreas Bender

University of Cambridge and Pangea Botanica

Professor Charlotte Deane MBE

University of Oxford

Chair

Professor Mihaela van der Schaar

University of Cambridge

13:40-14:00 Transforming the practice of medicine - a human-centred perspective

‘Healthcare is perhaps AI’s most urgent application’ – Satya Nadella, CEO of Microsoft.

This urgency has been brought to life for all by the COVID-19 pandemic; now, healthcare providers are expected to perform and transform at scale, in real time. The recent disruptions in AI have the potential to revolutionise medical care, but to fully leverage its potential, a human-centred approach is essential.

This talk will share various opportunities and challenges for AI to deliver real-world impact, while prioritising the human aspect of high-quality care. Illustrative examples will highlight how a human-centred approach is crucial in transforming the practice of medicine for the future.

Aditya Nori

Microsoft Health Futures

14:00-14:20 Closing the loop with AI: integrating large scale population health databases, observational cohorts and clinical trials for drug discovery

A new era of data in biology and medicine is upon us - caused by the impact of cheap, high-quality measurement technology at scale (genome sequencing, single cell methods) and the increasing use of electronic health records. The discovery of new medicines needs to integrate data from large population level datasets with finely-detailed, smaller-scale and disease-specific observational cohorts.

The entire pharmaceutical development process can be considered a learning loop, with feedback into early discovery. An example of this is the use of new complex cellular models (often referred to as biological digital twins) that use machine learning models to bridge to the responses of individual patients.

This talk will illustrate how GSK is using machine learning models increasingly to integrate complex multimodal multi-scale data in both clinical development and early discovery.

Dr Kim Branson

GSK

14:20-14:40 Machine learning for translatable biomarkers and targets

Despite significant progress in modern medicine, the design and development of new medicines remain very challenging, with a low probability of success. Improving our understanding of the underlying biology of disease can help identify new therapies and predict which interventions will positively (or negatively) affect clinical outcomes in diverse groups of patients.

This talk will describe how insitro is using cutting-edge machine learning methods to develop a new approach to drug development that uses biological and clinical data to design novel, safe, and effective therapies that help more people, faster and at a lower cost. The process begins with the generation and aggregation of large amounts of high-content biomarker data from both human samples and human-derived cellular systems. These data are used to create a representation of biological states, enabling the construction of machine learning models that reveal novel therapeutic targets, identify coherent patient segments and predict the effect of different therapies on different patients. It will also explore insitro’s vision for the value of AI-driven biomarker identification in precision medicine.

Professor Daphne Koller

insitro, US

14:40-15:05 Q&A and discussion
Professor Daphne Koller

insitro, US

Aditya Nori

Microsoft Health Futures

Dr Kim Branson

GSK

Chair

Professor Mihaela van der Schaar

University of Cambridge

15:35-16:45 Panel discussion
Jia-Yi Har

Cathay Capital

Prof Eoin McKinney

University of Cambridge

Dr Thomas Callender

University College London

Dr Danielle Belgrave

DeepMind

16:45-17:25 Using AI to accelerate scientific discovery

The past decade has seen incredible advances in artificial intelligence. DeepMind has been in the vanguard of many of these big breakthroughs, pioneering the development of self-learning systems like AlphaGo, the first program to beat the world champion at the complex game of Go. Games have proven to be a great training ground for developing and testing AI algorithms, but the aim at DeepMind has always been to build general learning systems ultimately capable of solving important problems in the real world.

We are on the cusp of an exciting new era in science, with AI poised to be a powerful tool for accelerating scientific discovery itself. DeepMind recently demonstrated this potential with AlphaFold, a solution to the 50-year grand challenge of protein structure prediction, culminating in the release of the most accurate and complete picture of the human proteome and release of the predicted structures of over 200 million proteins - nearly all catalogued proteins known to science.

Dr Demis Hassabis, Google DeepMind

Chair

Sir Adrian Smith

President, the Royal Society