Chair of Session 3
Professor David A van Dyk, Imperial College London, UK
Distilling natural laws from experimental data: from particle physics to computational biology
Professor Hod Lipson, Cornell University, USA
Can machines discover scientific laws automatically? For centuries, scientists have attempted to identify and document analytical laws that underlie physical phenomena in nature. Despite the prevalence of computing power, the process of finding natural laws and their corresponding equations has resisted automation. This talk will outline a series of recent research projects, starting with self-reflecting robotic systems and ending with machines that can formulate hypotheses, design experiments, and interpret the results to discover new scientific laws. But while a computer can discover new laws, will we still understand them? Our ability to gain insight into science may not keep pace with the rate and complexity of automatically generated discoveries. Are we entering a post-singularity scientific age, where computers not only discover new science, but must also find ways to explain it so that humans can understand? We will see examples from art to architecture, from psychology to cosmology, from big science to small science.
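As a toy illustration of what "distilling a law from data" can mean computationally, the Python sketch below scores a handful of hand-written candidate equations against noisy synthetic measurements and reports the best fit. Everything here (the hidden law, the candidate set, the grid search over constants) is an assumption for illustration only; it is not the speaker's actual system.

```python
# Toy "law discovery": search a tiny hypothesis space of candidate
# equations for the one that best explains synthetic experimental data.
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "experimental" data generated by a hidden law: y = 3*x**2 + 1.
x = np.linspace(-2.0, 2.0, 50)
y = 3.0 * x**2 + 1.0 + rng.normal(scale=0.05, size=x.size)

# An assumed hypothesis space of candidate laws with free constants a, b.
candidates = {
    "a*x + b":      lambda a, b: a * x + b,
    "a*x**2 + b":   lambda a, b: a * x**2 + b,
    "a*sin(x) + b": lambda a, b: a * np.sin(x) + b,
    "a*exp(x) + b": lambda a, b: a * np.exp(x) + b,
}

best = None
grid = np.linspace(-5.0, 5.0, 101)  # crude grid search for the constants
for name, f in candidates.items():
    for a, b in itertools.product(grid, grid):
        err = np.mean((f(a, b) - y) ** 2)
        if best is None or err < best[0]:
            best = (err, name, a, b)

err, name, a, b = best
print(f"discovered law: {name} with a={a:.2f}, b={b:.2f} (mse={err:.4f})")
```

On this synthetic data the search recovers the quadratic form with constants close to the true values, which is the essence of the idea: the machine selects both the structure and the parameters of a law from observations.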
Model-based machine learning
Professor Christopher Bishop FREng, Microsoft Research Cambridge
Traditional machine learning is characterised by a bewildering variety of techniques, such as logistic regression, support vector machines, neural networks, Kalman filters, and many others, as well as numerous variants of these. Each has its own merits, and each has its own associated algorithms for fitting adjustable parameters to a training data set. Selecting an appropriate technique can be difficult, and adapting it to a specific application requires detailed understanding of that technique and involves corresponding modifications to the source code.
In recent years there has been growing interest in a simpler, yet much more powerful, paradigm called model-based machine learning. This allows a very broad range of machine learning models to be specified compactly within a simple development environment. Training the model becomes a task in probabilistic inference, which is decoupled from the specification of the model itself and can hence be automated. The majority of standard techniques correspond to specific choices for the model and arise naturally as special cases, while variants of these techniques to suit specific applications are easily constructed, and alternative related structures can readily be compared. Newcomers to the field of machine learning need only understand the model specification environment in order to gain access to a huge range of models. The model-based approach to machine learning is particularly powerful when enabled through a probabilistic programming language.
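The decoupling described above can be illustrated with a deliberately crude sketch: a generic "inference engine" (here, grid-based Bayesian inference over one parameter, an assumption for this example and not the syntax of any real probabilistic programming language) is written once and then reused unchanged across two different model specifications.

```python
# Model-based idea in miniature: the model is a log-joint probability
# function; the inference routine knows nothing about any specific model.
import numpy as np

def grid_posterior(log_joint, grid):
    # Generic inference engine: normalise exp(log_joint) on a uniform grid.
    logp = np.array([log_joint(t) for t in grid])
    p = np.exp(logp - logp.max())
    return p / (p.sum() * (grid[1] - grid[0]))

# Gaussian log-density helper used by the first model specification.
log_norm = lambda x, m, s: -0.5 * ((x - m) / s) ** 2 - np.log(s * np.sqrt(2 * np.pi))

# Model 1: unknown Gaussian mean mu, N(0,1) prior, known noise sd 0.5.
data = np.array([0.8, 1.2, 0.9, 1.1, 1.0])
model1 = lambda mu: log_norm(mu, 0.0, 1.0) + log_norm(data, mu, 0.5).sum()

# Model 2: unknown coin bias q, Beta(2,2) prior (log-density up to a
# constant is log q + log(1-q)), Bernoulli likelihood -- same engine.
flips = np.array([1, 1, 0, 1, 0, 1, 1])
model2 = lambda q: (np.log(q) + np.log(1 - q)
                    + (flips * np.log(q) + (1 - flips) * np.log(1 - q)).sum())

grid1 = np.linspace(-3.0, 3.0, 601)
grid2 = np.linspace(0.001, 0.999, 500)
post1, post2 = grid_posterior(model1, grid1), grid_posterior(model2, grid2)
d1, d2 = grid1[1] - grid1[0], grid2[1] - grid2[0]
print("posterior mean of mu:", (grid1 * post1).sum() * d1)
print("posterior mean of q :", (grid2 * post2).sum() * d2)
```

Only the model specification changes between the two problems; the inference code is untouched, which is the property the abstract argues makes the paradigm powerful.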
Nonparametric probabilistic modelling
Professor Zoubin Ghahramani FRS, University of Cambridge
Uncertainty, data, and inference play a fundamental role in modelling. Probabilistic approaches to modelling have transformed scientific data analysis, artificial intelligence and machine learning, and have made it possible to exploit the many opportunities arising from the recent explosion of big data in the sciences, society and commerce. Once a probabilistic model is defined, Bayesian statistics (which used to be called "inverse probability") can be used to make inferences and predictions from the model. Bayesian methods work best when they are applied to models that are flexible enough to capture the complexity of real-world data. Recent work on nonparametric Bayesian machine learning provides this flexibility. I will touch upon some of our latest work in this area, including new models for time series and for social and biological networks.
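One standard construction behind this flexibility is the Chinese restaurant process, in which the number of clusters grows with the amount of data rather than being fixed in advance. The sketch below is illustrative only (a textbook construction, not code from the talk); the concentration parameter and seeds are arbitrary assumptions.

```python
# Chinese restaurant process: item i joins existing cluster k with
# probability counts[k]/(i+alpha), or opens a new cluster with
# probability alpha/(i+alpha), so model complexity grows with the data.
import numpy as np

def crp(n, alpha, rng):
    assignments = [0]          # first item starts cluster 0
    counts = [1]               # number of items per cluster
    for i in range(1, n):
        probs = np.array(counts + [alpha]) / (i + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)   # open a brand-new cluster
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts

rng = np.random.default_rng(1)
for n in (10, 100, 1000):
    _, counts = crp(n, alpha=2.0, rng=rng)
    print(f"n={n:4d}: {len(counts)} clusters, "
          f"largest sizes {sorted(counts, reverse=True)[:5]}")
```

Running this shows the cluster count increasing (roughly logarithmically) with n, in contrast to a parametric mixture whose number of components is fixed before seeing any data.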
Statistical inference for Markov jump process models via differential geometric Monte Carlo methods and the linear noise approximation
Professor Mark Girolami, Imperial College London
Bayesian analysis for Markov jump processes is a non-trivial and challenging problem. Although exact inference is theoretically possible, it is computationally demanding, and its applicability is therefore limited to a small class of problems. In this talk we describe the application of Riemann manifold MCMC methods using an approximation to the likelihood of the Markov jump process which is valid when the system modelled is near its thermodynamic limit. The proposed approach is both statistically and computationally efficient, while the convergence rate and mixing of the chains allow for fast MCMC inference. The methodology is evaluated using numerical simulations on two problems from chemical kinetics and one from systems biology.
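For a rough sense of the sampler family involved, the sketch below implements plain MALA (Metropolis-adjusted Langevin) on a toy Gaussian target. The Riemann manifold methods of the talk additionally use a position-dependent metric tensor in place of the identity preconditioner here, and the linear noise approximation that supplies the likelihood is omitted entirely; the target, step size, and seed are assumptions for illustration.

```python
# Plain MALA: a Langevin proposal (gradient drift + Gaussian noise)
# corrected by a Metropolis accept/reject step, on a 2-D Gaussian target.
import numpy as np

rng = np.random.default_rng(2)
Sigma_inv = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))  # toy target

def log_pi(x):
    return -0.5 * x @ Sigma_inv @ x

def grad_log_pi(x):
    return -Sigma_inv @ x

def mala(x0, eps, n_steps):
    x, samples, accepted = np.array(x0, float), [], 0
    for _ in range(n_steps):
        # Proposal mean follows the gradient of the log-target.
        mean_x = x + 0.5 * eps**2 * grad_log_pi(x)
        y = mean_x + eps * rng.standard_normal(2)
        mean_y = y + 0.5 * eps**2 * grad_log_pi(y)
        # Metropolis correction for the asymmetric proposal densities:
        # accept with prob pi(y) q(x|y) / (pi(x) q(y|x)).
        log_q_forward = -np.sum((y - mean_x) ** 2) / (2 * eps**2)
        log_q_reverse = -np.sum((x - mean_y) ** 2) / (2 * eps**2)
        if np.log(rng.uniform()) < (log_pi(y) - log_pi(x)
                                    + log_q_reverse - log_q_forward):
            x, accepted = y, accepted + 1
        samples.append(x.copy())
    return np.array(samples), accepted / n_steps

samples, rate = mala([2.0, -2.0], eps=0.6, n_steps=5000)
print("acceptance rate:", rate)
print("sample covariance:\n", np.cov(samples.T))
```

The sample covariance should approach the target's covariance; the manifold variants discussed in the talk aim to achieve this with better mixing by adapting the proposal geometry to the local curvature of the posterior.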