
Signal processing and inference for the physical sciences

Event

Starts: 26 March 2012, 09:00

Ends: 27 March 2012, 17:00

Location

The Royal Society, 6-9 Carlton House Terrace, London, SW1Y 5AG

Overview

Organised by Dr Nick Jones and Dr Thomas Maccarone

We will bring together two vibrant research communities for an exchange of ideas: physical scientists working with challenging data who need tools to make the most of it, and data analysts not yet working in these rich scientific fields. Speakers covering applications across astrophysics, biological physics, geophysics and earth sciences will meet counterparts from applied mathematics, computer science, engineering and statistics. We aim to open the world of new methods for data analysis to the physical scientist, and to accelerate the integration of data analysts into physical science. For further details on speakers see the final programme.

The proceedings of this meeting are scheduled to be published in a future issue of Philosophical Transactions A.

Satellite Meeting

This meeting was followed by a related satellite meeting, Signal processing for the physical sciences, held at the Kavli Royal Society International Centre on 28–29 March 2012.


Schedule of talks

Session 1


Chair of Session 1

Professor John Sahr, University of Washington, USA


Gravitational wave astronomy: needle in a haystack

Professor Neil Cornish, Montana State University, USA

Abstract

A world-wide array of highly sensitive interferometers stands poised to usher in a new era in astronomy with the first direct detection of gravitational waves. The data from these instruments will provide a unique perspective on extreme astrophysical phenomena such as neutron stars and black holes, and will allow us to test Einstein's theory of gravity in the strong-field, dynamical regime. To fully realize these goals we need to solve some challenging problems in signal processing and inference, such as finding rare and weak signals that are buried in non-stationary and non-Gaussian instrument noise, dealing with high-dimensional model spaces, and locating what are often extremely tight concentrations of posterior mass within the prior volume. Gravitational wave detection using space-based detectors and pulsar timing arrays brings with it the additional challenge of having to isolate individual signals that overlap one another in both time and frequency. Promising solutions to these problems will be discussed, along with some of the challenges that remain.
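As a toy illustration of the weak-signal problem (not code from the talk itself), the sketch below applies a matched filter, the standard linear technique for locating a known waveform in noise. The chirp shape, offset and signal strength are invented, and the noise is simplified to stationary white Gaussian, precisely the assumptions that real gravitational-wave searches cannot make.

```python
# Minimal matched-filter sketch: recovering a weak, known-shape signal
# buried in white Gaussian noise. Real gravitational-wave searches must
# additionally handle coloured, non-stationary, non-Gaussian noise.
import numpy as np

rng = np.random.default_rng(0)
n_data, n_tmpl, snr_true = 8192, 1024, 6.0   # illustrative values

# Toy "chirp" template: sinusoid with slowly increasing frequency,
# normalised so the filter output has unit noise standard deviation.
t = np.arange(n_tmpl)
template = np.sin(2 * np.pi * (0.02 + 2e-5 * t) * t)
template /= np.linalg.norm(template)

# Bury a weak copy of the template at an unknown offset in white noise.
offset = 3000
data = rng.standard_normal(n_data)
data[offset:offset + n_tmpl] += snr_true * template

# Matched filter: correlate data with the template at every lag.
snr = np.correlate(data, template, mode="valid")
best = np.argmax(np.abs(snr))
print(f"true offset {offset}, recovered {best}, peak |SNR| {abs(snr[best]):.1f}")
```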


Rotary components and polarization ellipses: a statistical perspective

Professor Andrew Walden, Imperial College London, UK

Abstract

Rotary analysis decomposes vector motions on the plane into counter-rotating components, which have proved particularly useful in the study of geophysical flows influenced by the rotation of the Earth. For stationary random signals the motion at any frequency takes the form of a random polarization ellipse. Although there are numerous applications of rotary analysis, relatively little attention has been paid to the statistical properties of the random ellipses or to the estimated rotary coefficient, which measures the tendency to rotate counterclockwise or clockwise. The precise statistical structure of the polarization ellipses is reviewed, including the random behaviour of the ellipse orientation, aspect ratio and intensity. Special attention is then paid to spectral matrix estimation from physical data, and to hypothesis testing and confidence intervals computed using the estimated matrices.
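As a hedged illustration of the basic decomposition (not taken from the talk), the sketch below splits a synthetic bivariate record into counter-rotating components via the FFT of z = u + iv and estimates the rotary coefficient at a single frequency; all signal parameters are invented for the example.

```python
# Minimal rotary-analysis sketch: decompose a bivariate (u, v) record into
# counter-rotating components via the FFT of z = u + iv. Positive-frequency
# power corresponds to counterclockwise rotation, negative-frequency power
# to clockwise rotation; their normalised difference is the rotary coefficient.
import numpy as np

rng = np.random.default_rng(1)
n, f0 = 1000, 0.05                       # illustrative values (f0*n is an integer)
t = np.arange(n)

# Synthetic elliptical motion: strong counterclockwise + weak clockwise circle.
z = 1.0 * np.exp(2j * np.pi * f0 * t) + 0.3 * np.exp(-2j * np.pi * f0 * t)
z += 0.1 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

Z = np.fft.fft(z) / n
freqs = np.fft.fftfreq(n)
k = np.argmin(np.abs(freqs - f0))        # bin at +f0
s_ccw = np.abs(Z[k]) ** 2                # counterclockwise (positive-frequency) power
s_cw = np.abs(Z[n - k]) ** 2             # clockwise (negative-frequency) power

# Rotary coefficient in [-1, 1]: +1 purely counterclockwise, -1 purely clockwise.
r = (s_ccw - s_cw) / (s_ccw + s_cw)
print(f"rotary coefficient at f0: {r:.2f}")   # expect about (1 - 0.09)/(1 + 0.09)
```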


Time series analysis in astronomy

Dr Simon Vaughan, University of Leicester, UK

Abstract

Progress in astronomy comes from interpreting the signals encoded in the light received from distant objects – the distribution of light over the sky (images), over photon wavelength (spectrum), over polarization angle, and over time (usually called light curves by astronomers). In the time domain we see transient events such as supernovae, gamma-ray bursts, and other powerful explosions; we see periodic phenomena such as the orbits of planets around nearby stars, radio pulsars, and pulsations of stars in nearby galaxies; and we see persistent aperiodic variations (“noise”) from powerful systems like accreting black holes. In my talk I will briefly review a few of the recent and future challenges in the burgeoning area of Time Domain Astrophysics. I will discuss the recovery of reliable noise power spectra from sparsely sampled time series, higher-order properties of accreting black holes, time delays and correlations in multivariate time series, and characterisation of gamma-ray burst light curves.
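One concrete face of the sparse-sampling problem mentioned above is spectral estimation from irregularly spaced observations. The sketch below uses SciPy's Lomb-Scargle periodogram, a standard tool for this task (not necessarily the methods of the talk), with invented observation times and signal:

```python
# Minimal Lomb-Scargle sketch: a power spectrum estimate for the irregularly
# (sparsely) sampled time series common in astronomy.
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(2)
f_true = 0.7                                   # illustrative signal frequency (Hz)

# Irregular observation times, e.g. telescope visits with gaps.
t = np.sort(rng.uniform(0, 100, size=300))
y = np.sin(2 * np.pi * f_true * t) + 0.5 * rng.standard_normal(t.size)

# SciPy's lombscargle expects *angular* frequencies.
freqs = np.linspace(0.01, 2.0, 2000)
power = lombscargle(t, y - y.mean(), 2 * np.pi * freqs)

print(f"peak at {freqs[np.argmax(power)]:.2f} Hz (true {f_true} Hz)")
```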


Transdimensional inverse problems in the geosciences

Professor Malcolm Sambridge, Australian National University, Australia

Abstract

Except for a very thin layer at the surface, all of our knowledge of the physical properties of the Earth is based on indirect observations collected at the surface; inferring those properties from such data is known as an inverse problem. Inverse problems occur in many areas of the sciences where abundant observations exist that only indirectly constrain some process or physical property of interest. Over the past forty years geophysicists have built models of various physical and chemical properties of the Earth’s interior which fit observations collected at the surface. Formal inversion methods typically involve an optimization process whereby one or more classes of data are used to constrain parameters in a mathematical representation of the subsurface. A common difficulty is that surface observations do not uniquely constrain the subsurface, meaning additional information must be introduced, usually in the form of some ad hoc regularizing criterion, often chosen for mathematical convenience.

An alternative approach is to embrace the non-uniqueness directly and employ an inference process based on parameter-space sampling. Instead of seeking a best model within an optimization framework, one seeks an ensemble of solutions and derives properties of that ensemble for inspection. While this idea has been employed for more than thirty years, it is only now gaining broad acceptance. Recently these ideas have been extended with the introduction of trans-dimensional and hierarchical sampling methods. These approaches are becoming popular because they offer novel ways of dealing with problems involving joint fitting of multiple data types, uncertain data errors and/or uncertain model parameterizations. Rather than being forced to decide on the parameterization, the level of data noise and the weights between data types in advance, as is often the case in an optimization framework, these choices can be relaxed and instead constrained by the data themselves. Sampling-based approaches have their limitations: computational cost is often high for large-scale structural problems, i.e. those with many unknowns and many data. However, there is a surprising number of areas where they are now feasible. This presentation will outline trans-dimensional inverse methods and describe some recent applications to geophysical problems; they have potential for similar data-inference problems across the physical sciences.
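A full trans-dimensional sampler with birth/death moves is too long to sketch here, but the core ensemble idea, characterising the posterior by samples rather than by a single optimised model, can be shown with a minimal fixed-dimension Metropolis sampler on an invented underdetermined linear inverse problem:

```python
# Ensemble-inference sketch for a toy inverse problem: instead of one
# regularised "best" model, draw posterior samples with Metropolis MCMC and
# inspect their spread. (A trans-dimensional sampler would add birth/death
# moves that also change the number of model parameters.)
import numpy as np

rng = np.random.default_rng(3)

# Forward problem d = G m + noise, with fewer data than unknowns,
# so the surface data do not uniquely constrain the model.
n_model, n_data, sigma = 8, 5, 0.1
G = rng.standard_normal((n_data, n_model))
m_true = rng.standard_normal(n_model)
d = G @ m_true + sigma * rng.standard_normal(n_data)

def log_post(m):
    # Gaussian likelihood + unit Gaussian prior (the prior plays the role
    # of a regulariser, but its influence on the ensemble is explicit).
    misfit = d - G @ m
    return -0.5 * (misfit @ misfit) / sigma**2 - 0.5 * (m @ m)

m, lp = np.zeros(n_model), log_post(np.zeros(n_model))
samples = []
for it in range(50_000):
    prop = m + 0.05 * rng.standard_normal(n_model)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        m, lp = prop, lp_prop
    if it > 10_000 and it % 10 == 0:          # discard burn-in, thin
        samples.append(m.copy())

samples = np.array(samples)
print("posterior mean:", np.round(samples.mean(axis=0), 2))
print("posterior std :", np.round(samples.std(axis=0), 2))
```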


Session 2


Chair of Session 2

Professor Guy Nason, University of Bristol, UK


Joining forces of Bayesian and frequentist methodology: a study for inference in the presence of non-identifiability

Professor Jens Timmer, University of Freiburg, Germany

Abstract

Increasingly complex applications involve large datasets in combination with nonlinear and high-dimensional mathematical models. In this context, statistical inference is a challenging issue that calls for pragmatic approaches taking advantage of both Bayesian and frequentist methods. The elegance of Bayesian methodology lies in the propagation of the information content provided by experimental data and prior assumptions to the posterior probability distribution of model predictions. However, for complex applications the experimental data and prior assumptions may constrain the posterior probability distribution insufficiently. In these situations Bayesian Markov chain Monte Carlo sampling can be infeasible. From a frequentist point of view, insufficient experimental data and prior assumptions can be interpreted as non-identifiability. The profile likelihood approach makes it possible to detect non-identifiability and to resolve it iteratively by experimental design. It thereby allows the posterior probability distribution to be constrained progressively until Markov chain Monte Carlo sampling can be used reliably. Using an application from cell biology we compare both methods and show that applying them in succession facilitates a realistic assessment of uncertainty in model predictions.
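A minimal sketch of the profile-likelihood recipe follows, on an invented exponential-decay model rather than the cell-biology application of the talk: scan the parameter of interest on a grid and re-optimise the remaining parameters at each point; a flat profile flags non-identifiability.

```python
# Profile-likelihood sketch: scan one parameter of interest on a grid and
# re-optimise all remaining ("nuisance") parameters at each grid point.
# Flat profiles reveal non-identifiable parameters.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

# Toy model y = a * exp(-b * t) with noisy data; a and b are identifiable
# here, but the same recipe exposes flat directions in harder models.
t = np.linspace(0, 2, 40)
y = 2.0 * np.exp(-1.5 * t) + 0.05 * rng.standard_normal(t.size)

def nll(params):
    a, b = params
    r = y - a * np.exp(-b * t)
    return 0.5 * np.sum(r**2) / 0.05**2      # negative log-likelihood

def profile_nll(b_fixed):
    # Optimise the nuisance parameter a with b held fixed.
    res = minimize(lambda a: nll([a[0], b_fixed]), x0=[1.0])
    return res.fun

b_grid = np.linspace(0.5, 3.0, 26)
profile = [profile_nll(b) for b in b_grid]
print("profiled b at minimum:", b_grid[int(np.argmin(profile))])
```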


Signal processing for molecular and cellular biophysics: an emerging field

Dr Max Little, MIT, USA

Abstract

Recent advances in the ability of experimental biophysics to watch the molecular and cellular processes of life in action – such as atomic force microscopy, optical tweezers, and Förster resonance energy transfer – raise challenges for digital signal processing of the resulting experimental data. This talk explores the unique properties of such biophysical time series that set them apart from other signals, such as the prevalence of abrupt jumps and steps, multi-modal distributions, and autocorrelated noise. It exposes the problems with classical linear signal processing algorithms applied to this kind of data, and describes new nonlinear and non-Gaussian algorithms that are able to extract information that is of direct relevance to biophysical questions of interest. It is argued that these new methods applied in this context typify the nascent field of biophysical digital signal processing. Practical experimental examples will be discussed.
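As a toy illustration of the point about linear methods (not the algorithms of the talk), the sketch below contrasts a linear moving average, which smears abrupt steps, with a simple nonlinear running median, which preserves them; the step heights and noise level are invented.

```python
# Step-signal sketch: a linear moving average smears molecular-motor-like
# steps, while a nonlinear running median preserves their sharp edges.
# (Toy illustration only; dedicated step-fitting methods go much further.)
import numpy as np
from scipy.ndimage import median_filter, uniform_filter1d

rng = np.random.default_rng(5)

# Piecewise-constant "motor position" buried in noise (levels invented).
true = np.repeat([0.0, 8.0, 16.0, 8.0, 24.0], 200)
signal = true + 2.0 * rng.standard_normal(true.size)

linear = uniform_filter1d(signal, size=51)   # linear smoother: rounds the steps off
robust = median_filter(signal, size=51)      # running median: keeps edges sharp

# Crude jump detector: compare the median-filtered trace across a short lag.
lag, threshold = 25, 4.0
score = np.abs(robust[lag:] - robust[:-lag])
above = score > threshold
onsets = np.flatnonzero(above[1:] & ~above[:-1]) + lag
print("steps detected near:", onsets, "(true edges at 200, 400, 600, 800)")
```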


Similarity and denoising

Professor Paul Vitanyi, CWI, The Netherlands

Abstract

We can discover the effective similarity among pairs of finite objects, and denoise a finite object, using the Kolmogorov complexity of these objects. The drawback is that the Kolmogorov complexity is not computable. If we approximate it using a good real-world compressor, it turns out that on natural data the resulting procedures give adequate results in practice. In all cases we use the entire string. The methodology is parameter-free, alignment-free, and works on individual data. We illustrate both methods with examples.
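The compressor-based approximation leads to the normalised compression distance, which is short enough to sketch directly; the example strings are invented, and zlib stands in for "a good real-world compressor":

```python
# Normalised compression distance (NCD) sketch: approximate the
# (uncomputable) Kolmogorov complexity K(x) by the length of a real-world
# compressor's output, then compare objects without alignment or parameters.
import zlib

def clen(x: bytes) -> int:
    # Stand-in for Kolmogorov complexity: compressed length in bytes.
    return len(zlib.compress(x, 9))

def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 40
b = b"the quick brown fox leaps over the lazy cat " * 40
c = b"colourless green ideas sleep furiously tonight " * 40

print(f"ncd(a, b) = {ncd(a, b):.3f}  (similar strings: small)")
print(f"ncd(a, c) = {ncd(a, c):.3f}  (dissimilar strings: larger)")
```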


Using topology to tame the complex biochemistry of genetic networks

Dr Mukund Thattai, National Centre for Biological Sciences, India

Abstract

Living cells are controlled by networks of interacting genes, proteins and biochemicals. Cells use the emergent collective dynamics of these networks to probe their surroundings, perform computations, and generate appropriate responses. Here we consider genetic networks: interacting sets of genes that regulate one another’s expression. It is possible to infer the interaction topology of genetic networks from high-throughput experimental measurements. However, such experiments rarely provide information on the detailed nature of each interaction. We show that topological approaches provide powerful means of dealing with the missing biochemical data. We first discuss the biochemical basis of gene regulation, and describe how genes can be connected into networks. We then show that, given weak constraints on the underlying biochemistry, topology alone determines the emergent properties of certain simple networks. Finally, we apply these approaches to the realistic example of quorum-sensing networks: chemical communication systems that co-ordinate the responses of bacterial populations. We find that the versatility of a quorum-sensing network – its ability to generate diverse response types – is determined purely by its topology. The most versatile topology is the one most commonly observed among real quorum-sensing systems, suggesting that natural selection can act to optimize topology as well as biochemistry.
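As a hedged illustration of the topology-first idea (a generic textbook example, not the quorum-sensing analysis of the talk), the sketch below simulates a three-gene repressor ring specified only by its signed adjacency matrix plus generic Hill kinetics; sustained oscillation emerges from the ring topology across a broad range of the invented parameter values.

```python
# Topology-first sketch: simulate a three-gene ring in which each gene
# represses the next, specified only by a signed adjacency matrix plus
# generic Hill kinetics. The oscillation is a property of the ring topology,
# robust to the detailed biochemistry (all parameter values illustrative).
import numpy as np
from scipy.integrate import solve_ivp

# adj[i, j] = -1 means gene j represses gene i (ring: 0 -| 1 -| 2 -| 0).
adj = np.array([[ 0,  0, -1],
                [-1,  0,  0],
                [ 0, -1,  0]])

def rhs(t, x, alpha=10.0, n=4.0, gamma=1.0):
    prod = np.ones_like(x)
    for i in range(3):
        for j in range(3):
            if adj[i, j] == -1:              # generic repressive Hill term
                prod[i] *= 1.0 / (1.0 + x[j] ** n)
    return alpha * prod - gamma * x          # production minus decay

sol = solve_ivp(rhs, (0, 60), [1.0, 2.0, 3.0], dense_output=True, max_step=0.1)
x_late = sol.sol(np.linspace(40, 60, 400))
# A wide late-time range indicates sustained oscillation, as the topology predicts.
print("late-time range of gene 0:", x_late[0].min().round(2), "to", x_late[0].max().round(2))
```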


Session 3


Chair of Session 3

Professor David A van Dyk, Imperial College London, UK


Distilling natural laws from experimental data: from particle physics to computational biology

Professor Hod Lipson, Cornell University, USA

Abstract

Can machines discover scientific laws automatically? For centuries, scientists have attempted to identify and document analytical laws that underlie physical phenomena in nature. Despite the prevalence of computing power, the process of finding natural laws and their corresponding equations has resisted automation. This talk will outline a series of recent research projects, starting with self-reflecting robotic systems, and ending with machines that can formulate hypotheses, design experiments, and interpret the results, to discover new scientific laws. While the computer can discover new laws, will we still understand them? Our ability to have insight into science may not keep pace with the rate and complexity of automatically generated discoveries. Are we entering a post-singularity scientific age, where computers not only discover new science, but now also need to find ways to explain it in a way that humans can understand? We will see examples from art to architecture, from psychology to cosmology, from big science to small science.


Model-based machine learning

Professor Christopher Bishop FREng, Microsoft Research Cambridge

Abstract

Traditional machine learning is characterised by a bewildering variety of techniques, such as logistic regression, support vector machines, neural networks, Kalman filters, and many others, as well as numerous variants of these. Each has its own merits, and each has its own associated algorithms for fitting adjustable parameters to a training data set. Selecting an appropriate technique can be difficult, and adapting it to a specific application requires detailed understanding of that technique and involves corresponding modifications to the source code.

In recent years there has been growing interest in a simpler, yet much more powerful, paradigm called model-based machine learning. This allows a very broad range of machine learning models to be specified compactly within a simple development environment. Training the model becomes a task in probabilistic inference that is decoupled from the specification of the model itself, and hence can be automated. The majority of standard techniques correspond to specific choices for the model and arise naturally as special cases, while variants of these techniques to suit specific applications are easily constructed, and alternative related structures can readily be compared. Newcomers to the field of machine learning need only understand the model specification environment in order to gain access to a huge range of models. The model-based approach to machine learning is particularly powerful when enabled through a probabilistic programming language.
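A conjugate Bayesian linear model is about the smallest example of this separation of model from inference (an illustration of the idea, not of any particular framework): in the sketch below the model specification changes while the generic inference routine does not; all hyperparameter values are invented.

```python
# Model-based sketch in miniature: the model (Gaussian prior + Gaussian
# likelihood) is stated declaratively, and inference is a generic routine
# that never changes when the model's features change. Probabilistic
# programming languages automate this separation for far richer models.
import numpy as np

rng = np.random.default_rng(6)

def bayes_linear(Phi, y, alpha=1.0, beta=25.0):
    """Posterior over weights for y = Phi @ w + noise.
    alpha: prior precision of w; beta: noise precision."""
    A = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    cov = np.linalg.inv(A)
    mean = beta * cov @ Phi.T @ y
    return mean, cov

# Same inference code, two different model specifications:
x = rng.uniform(-1, 1, 30)
y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(30)

Phi_line = np.column_stack([np.ones_like(x), x])       # straight-line model
Phi_cubic = np.column_stack([x**k for k in range(4)])  # cubic model

for name, Phi in [("line", Phi_line), ("cubic", Phi_cubic)]:
    mean, cov = bayes_linear(Phi, y)
    resid = y - Phi @ mean
    print(f"{name}: weight means {np.round(mean, 2)}, rms residual {resid.std():.3f}")
```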


Nonparametric probabilistic modelling

Professor Zoubin Ghahramani FRS, University of Cambridge

Abstract

Uncertainty, data, and inference play a fundamental role in modelling. Probabilistic approaches to modelling have transformed scientific data analysis, artificial intelligence and machine learning, and have made it possible to exploit the many opportunities arising from the recent explosion of big data problems in the sciences, society and commerce. Once a probabilistic model is defined, Bayesian statistics (which used to be called "inverse probability") can be used to make inferences and predictions from the model. Bayesian methods work best when they are applied to models that are flexible enough to capture the complexity of real-world data. Recent work on non-parametric Bayesian machine learning provides this flexibility. I will touch upon some of our latest work in this area, including new models for time series and for social and biological networks.
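One of the simplest non-parametric Bayesian constructions is the Chinese restaurant process underlying Dirichlet-process mixtures. The sketch below (a generic example, not taken from the talk) draws partitions whose number of clusters grows with the data rather than being fixed in advance, which is the flexibility the abstract refers to.

```python
# Nonparametric-flexibility sketch: draw cluster assignments from a Chinese
# restaurant process, the partition structure behind Dirichlet-process
# mixtures. The number of clusters is not fixed in advance.
import numpy as np

rng = np.random.default_rng(7)

def crp(n_customers: int, concentration: float) -> list[int]:
    tables: list[int] = []                  # tables[k] = number seated at table k
    seating = []
    for _ in range(n_customers):
        # Seat at an existing table with probability proportional to its
        # occupancy, or at a new table with probability proportional to
        # the concentration parameter.
        probs = np.array(tables + [concentration], dtype=float)
        k = rng.choice(len(probs), p=probs / probs.sum())
        if k == len(tables):
            tables.append(0)
        tables[k] += 1
        seating.append(int(k))
    return seating

for alpha in (0.5, 2.0, 10.0):
    seating = crp(1000, alpha)
    print(f"concentration {alpha}: {len(set(seating))} clusters for 1000 points")
```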


Statistical inference for Markov jump process models via differential geometric Monte Carlo methods and the linear noise approximation

Professor Mark Girolami, University College London, UK

Abstract

Bayesian analysis for Markov jump processes is a non-trivial and challenging problem. Although exact inference is theoretically possible, it is computationally demanding, and thus its applicability is limited to a small class of problems. In this talk we describe the application of Riemann manifold MCMC methods using an approximation to the likelihood of the Markov jump process which is valid when the system modelled is near its thermodynamic limit. The proposed approach is both statistically and computationally efficient, while the convergence rate and mixing of the chains allow for fast MCMC inference. The methodology is evaluated using numerical simulations on two problems from chemical kinetics and one from systems biology.
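The Riemann manifold machinery is too involved for a short sketch, but its Euclidean special case, the Metropolis-adjusted Langevin algorithm, shows the structure: a gradient-informed proposal plus a Metropolis-Hastings correction. The Gaussian target below stands in, loosely, for the linear-noise approximation's Gaussian likelihood; the step size and precisions are invented.

```python
# Simplified sketch of gradient-based MCMC: the Euclidean special case
# (MALA) of the manifold methods discussed in the talk. The full approach
# replaces the identity metric below with a position-dependent Riemannian
# metric derived from the model.
import numpy as np

rng = np.random.default_rng(8)

prec = np.diag([1.0, 25.0])   # invented anisotropic Gaussian target

def log_p(x):
    return -0.5 * x @ prec @ x

def grad_log_p(x):
    return -prec @ x

def mala(n_steps, eps=0.15):
    x = np.zeros(2)
    out, accepts = [], 0
    for _ in range(n_steps):
        mu = x + 0.5 * eps**2 * grad_log_p(x)            # Langevin drift
        prop = mu + eps * rng.standard_normal(2)
        mu_rev = prop + 0.5 * eps**2 * grad_log_p(prop)
        # Metropolis-Hastings correction for the asymmetric proposal.
        log_q_fwd = -((prop - mu) @ (prop - mu)) / (2 * eps**2)
        log_q_rev = -((x - mu_rev) @ (x - mu_rev)) / (2 * eps**2)
        if np.log(rng.uniform()) < log_p(prop) - log_p(x) + log_q_rev - log_q_fwd:
            x, accepts = prop, accepts + 1
        out.append(x.copy())
    return np.array(out), accepts / n_steps

samples, rate = mala(20_000)
print(f"acceptance {rate:.2f}; sample stds {samples.std(axis=0).round(2)} (target [1.0, 0.2])")
```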


Session 4


Chair of Session 4

Professor Robert Palmer, University of Oklahoma, USA


Independent component analysis: recent advances

Professor Aapo Hyvarinen, University of Helsinki, Finland

Abstract

Independent component analysis (ICA) is a probabilistic method for learning a linear transform of a random vector. The goal is to find components which are maximally independent and non-Gaussian (non-normal). Its fundamental difference from classical multivariate statistical methods lies in the assumption of non-Gaussianity, which enables the identification of the original, underlying components, in contrast to classical methods. The basic theory of ICA was mainly developed in the 1990s and summarized, for example, in our monograph of 2001. Here we provide an overview of recent developments in the theory since the year 2000. The main topics are: testing independent components; analysing multiple datasets (three-way data); analysis of causal relations; modelling dependencies between the components; and improved methods for estimating the basic model.
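A minimal sketch of the basic model in action, using scikit-learn's FastICA as one standard estimator (rather than the specific advances surveyed in the talk): two invented non-Gaussian sources are mixed linearly and then recovered up to permutation, sign and scale.

```python
# ICA sketch: unmix two non-Gaussian sources from linear mixtures with
# FastICA. The non-Gaussianity assumption is exactly what identifies the
# original sources; Gaussian-based methods could only whiten the mixtures.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(9)
t = np.linspace(0, 8, 2000)

# Two independent non-Gaussian sources and an invented mixing matrix.
s1 = np.sign(np.sin(3 * t))                 # square wave
s2 = rng.laplace(size=t.size)               # heavy-tailed noise
S = np.column_stack([s1, s2])
A = np.array([[1.0, 0.5], [0.4, 1.0]])
X = S @ A.T                                 # observed mixtures

est = FastICA(n_components=2, random_state=0).fit_transform(X)

# Recovered components match the sources up to permutation, sign and scale.
corr = np.corrcoef(np.column_stack([S, est]).T)[:2, 2:]
print("source/estimate correlations:\n", np.round(corr, 2))
```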


Multivariate oscillations

Professor Sofia Olhede, University College London, UK

Abstract

We develop a geometric understanding of the modulated multivariate oscillation, starting from a review of univariate, bivariate and basic multivariate modulated oscillations. We show that in higher dimensions the modulated multivariate oscillation can always be described instantaneously, irrespective of the dimensionality of the observed signal, as a linearly, circularly or elliptically polarized signal, using a set of complex vectors in conjunction with a single complex-valued signal. The evolution of this representation needs careful modelling. We show how the instantaneous rates of change of the signal can conveniently be represented as an evolution of the oscillatory structure across time, coupled with alterations of the multivariate relationships (or geometry) between the multiple signals. We describe how to calculate an intrinsic representation of the oscillation that is independent of the observational axes.

We show how the global dimensionality of the signal is built up from all of its local one-dimensional contributions, and introduce the notion of a purely unidirectional signal in order to quantify how different any given signal is from the closest purely unidirectional signal. We illustrate the properties of the derived representation of the multivariate signal with synthetic and real-world data examples, and conclude with some discussion of outstanding problems in oscillatory representations.
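A small sketch of the underlying construction (generic analytic-signal analysis, not the full representation developed in the talk): the Hilbert transform turns each channel of an invented elliptically polarised pair into a complex-valued signal, from which the counter-rotating components and the ellipse axes follow.

```python
# Modulated-oscillation sketch: use the analytic signal (via the Hilbert
# transform) of each channel of a bivariate oscillation to recover the
# parameters of the polarisation ellipse.
import numpy as np
from scipy.signal import hilbert

n, f0 = 2000, 0.02                           # illustrative values (f0*n integer)
t = np.arange(n)

# Elliptically polarised pair: same frequency, quarter-cycle phase lag.
x = 1.0 * np.cos(2 * np.pi * f0 * t)
y = 0.4 * np.sin(2 * np.pi * f0 * t)

zx, zy = hilbert(x), hilbert(y)              # analytic (complex-valued) signals

# Counter-rotating decomposition of the bivariate analytic signal:
z_plus = 0.5 * (zx + 1j * zy)                # counterclockwise component
z_minus = 0.5 * (zx - 1j * zy)               # clockwise component

a, b = np.abs(z_plus).mean(), np.abs(z_minus).mean()
print(f"semi-major ~ {a + b:.2f}, semi-minor ~ {abs(a - b):.2f}")  # expect 1.0, 0.4
```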


Sequential non-parametric Bayesian inference: approaches and applications

Professor Stephen Roberts, University of Oxford, UK

Abstract

This talk will focus on Bayesian inference algorithms built around the elegant formalism of non-parametric models, in particular Gaussian processes. It will firstly introduce Gaussian processes for time-series inference problems and extend this to consider the role of domain knowledge in the models. We show how intuitive extensions allow us to tackle many of the problems faced in time-series modelling, including forecasting, observation scheduling and changepoint detection. Examples are given from a variety of practical domains, including multi-sensor weather forecasting and astrophysical time-series modelling.
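A minimal Gaussian-process regression sketch follows (the generic textbook equations, not the extensions discussed in the talk): posterior mean and variance for an invented noisy time series, with the predictive uncertainty that makes forecasting and observation scheduling possible. The kernel hyperparameters are fixed by hand rather than learned.

```python
# Gaussian-process regression sketch: a non-parametric Bayesian time-series
# model with a squared-exponential kernel. Forecasts come with calibrated
# uncertainty (all hyperparameter values illustrative).
import numpy as np

rng = np.random.default_rng(10)

def kernel(a, b, length=1.0, variance=1.0):
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)

# Noisy observations of a smooth latent function.
t_obs = np.sort(rng.uniform(0, 10, 25))
y_obs = np.sin(t_obs) + 0.1 * rng.standard_normal(t_obs.size)

t_new = np.linspace(0, 12, 121)              # includes a forecast region
noise = 0.1 ** 2

K = kernel(t_obs, t_obs) + noise * np.eye(t_obs.size)
K_star = kernel(t_new, t_obs)
L = np.linalg.cholesky(K)

# Posterior mean and variance via Cholesky solves (standard GP equations).
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
mean = K_star @ alpha
v = np.linalg.solve(L, K_star.T)
var = kernel(t_new, t_new).diagonal() - np.sum(v**2, axis=0)

print("forecast at t=12: %.2f +/- %.2f" % (mean[-1], np.sqrt(var[-1])))
```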


The study of atmospheric phenomena using seismic networks

Dr Michael Hedlin, Scripps Institution of Oceanography, University of California, San Diego, USA

Abstract

Although seismic networks have been used for decades to study earthquakes and probe the structure of the Earth’s interior, they also record atmospheric phenomena, presumably through acoustic-to-seismic coupling. We have analyzed broadband seismic data from the 400-station USArray Transportable Array to create a catalogue of infrasonic sources, or “skyquakes”, in the western United States. The network detected and located several hundred skyquakes each year, many of which were not observed by regional infrasonic arrays, likely owing to the effects of wind noise on infrasonic microphones. A large-scale study of the detection statistics of these events demonstrates the influence of seasonal reversals of zonal stratospheric winds on infrasonic propagation. We use well-constrained explosions from the catalogue to test propagation algorithms and 3D atmospheric velocity models. The seismic waveforms reveal in unprecedented detail the spread of the infrasound wavefield across the Earth’s surface within 1000 km of the source, including the penetration of sound into predicted geometric shadow zones. The seismic waveforms also consistently show long-lived packets of energy from these impulsive atmospheric sources. Infrasonic ray-trace modelling of the observed arrival times may suggest that both the sound penetration and the extended duration of the signal packets are due to interaction of the infrasound wavefield with atmospheric internal gravity waves.
