In the first of our series of blog posts about machine learning in research, we heard from Dr Neale Gibson, who told us how machine learning is being put to use in astronomy research. But what is the view from the data science side? How is machine learning becoming increasingly intertwined with a range of research fields?
To find out more, we talked to Sofia Olhede, Professor of Statistics and Honorary Professor of Computer Science at UCL, and one of the machine learning policy project’s working group members, about why she is interested in machine learning, and the opportunities for machine learning in a range of research fields.
How do you use machine learning in your day-to-day work?
I work at the interface of computer science and statistics. So, while some of my work is theoretically focused – for example, trying to determine the limits of what we can understand from a given data type – other parts of my work involve helping other scientific fields answer questions about their data.
Data is all about patterns, but not all patterns in data are obvious when you observe them. While some questions are best answered by using traditional statistical approaches, machine learning is a very convenient tool for discovering and quantifying the structure of these patterns.
In my work, this often means using unsupervised learning techniques. Unsupervised learning may cluster data into groups, which helps to uncover patterns that we couldn’t see before.
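As a rough illustration of what clustering unlabelled data can look like in practice, here is a minimal sketch of k-means, one common unsupervised learning method. It is not taken from Professor Olhede's own work: the choice of k-means, the `kmeans` function and the two made-up "blobs" of points are all assumptions made for this example.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters with plain k-means (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centre.
        labels = [
            min(
                range(k),
                key=lambda j: (x - centres[j][0]) ** 2 + (y - centres[j][1]) ** 2,
            )
            for x, y in points
        ]
        # Update step: move each centre to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster ends up empty
                centres[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centres


# Made-up data for illustration: two unlabelled groups of points.
data_rng = random.Random(42)
blob_a = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(50)]
blob_b = [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(50)]
points = blob_a + blob_b

labels, centres = kmeans(points, k=2)
```

The algorithm is never told which group each point came from; it recovers the two groups purely from how the points sit relative to one another. Real research data is rarely this cleanly separated, so in practice the choice of method and of the number of clusters needs far more care.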
What excites you about this work?
I think I was always fascinated by what we could learn or understand from observing natural phenomena. This means I am very much motivated by application problems, such as understanding how our perception of pain changes with age. If we can quantify this – making our understanding solid and well-informed – then we can ultimately aim to treat people more effectively.
Being a data scientist is great, because it means you can play in many other scientific disciplines’ backyards!
Which other fields have you worked with? How do you apply machine learning methods to these fields?
I have long-standing collaborations with different branches of science: I have collaborated with oceanographers, geophysicists, ecologists and neuroscientists.
Similar types of structures or challenges appear in a variety of different research fields. For example, one project I work on is analysing data to improve medical treatments for prematurely born infants, and I use the same methods here as I do in a project analysing oceanographic data to help understand climate change. This is exciting, because it shows the universality of these quantitative mechanisms.
What are the key things to keep in mind when working across disciplines?
Modelling your data is really important. Models allow you to create new data with a defined set of properties; if a model is carefully put together and well built, you can make new data that looks so realistic that experts in the field could believe it comes from real observations. Models can seem sort of magic!
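To make the idea of a model creating new data concrete, here is a deliberately simple sketch. It assumes the model is just a normal distribution described by a mean and a standard deviation; the "observed" values are made-up numbers, and nothing here is drawn from the projects discussed in this interview.

```python
import random
import statistics

rng = random.Random(1)

# Stand-in for real measurements (made-up numbers, for illustration only).
observed = [rng.gauss(37.0, 0.4) for _ in range(500)]

# "Fit" the model: estimate its two parameters from the observations.
mu = statistics.mean(observed)
sigma = statistics.stdev(observed)

# Use the fitted model to generate new data with the same defined properties.
synthetic = [rng.gauss(mu, sigma) for _ in range(500)]

print(f"observed:  mean={mu:.2f}, sd={sigma:.2f}")
print(
    f"synthetic: mean={statistics.mean(synthetic):.2f}, "
    f"sd={statistics.stdev(synthetic):.2f}"
)
```

The synthetic values share the mean and spread of the observations because those are the only properties this toy model encodes. A model built for real research would capture much richer structure, which is why the context of how the data was collected matters so much.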
For me, creating the model is one of the most exciting aspects of the analytical process. This takes hours of intense conversations, where you need to combine an understanding of the data at hand with your intuition, to create a model that describes what is going on in a simplified way – often simplified down into a few squiggles on pieces of paper!
Creating this common description of the phenomenon we are trying to understand is usually the biggest challenge in these projects. Understanding all the different facets of the data is important, and happens as a gradual process, often by writing down equations related to the data. The context of how a dataset is collected is key to this, because this helps you to understand how to model its structure.
Without this context-specific knowledge, it can be harder to create an accurate model. With enough data, some machine learning processes can get good results without needing this context, but ideally you need to combine expert knowledge about the subject area with expertise in analysing data. The former helps to identify the “right” questions to ask, then the latter helps find the answers.
Are we likely to see more disciplines making more use of advanced techniques like machine learning?
Many research disciplines are getting increasingly complex over time, and this increases the demand for new analytical approaches. As people with quantitative skills move into these disciplines, their quantitative frameworks become embedded in the application areas, and can create new fields in which the mathematical concepts and the application area are almost impossible to distinguish. We’ve seen this already in biostatistics and statistical genetics. As data science makes more areas more empirical, I expect we will see further developments in this vein.
Over the next few weeks, we’ll be writing more on In Verba about how machine learning is being used across a range of research areas. For more information about the machine learning policy project, check out our website.