Machine learning and big scientific data
Professor Tony Hey CBE FREng, Science and Technology Facilities Council, UKRI, UK
There is now broad recognition within the scientific community that the ongoing deluge of scientific data is fundamentally transforming academic research. Turing Award winner Jim Gray referred to this revolution as 'The Fourth Paradigm: Data Intensive Scientific Discovery'. Researchers now need new tools and technologies to manipulate, analyse, visualise, and manage the vast amounts of research data being generated at the national large-scale experimental facilities. In particular, machine learning technologies are fast becoming a pivotal and indispensable component in modern science, from powering discovery of modern materials to helping us handle large-scale imagery data from microscopes and satellites. Despite these advances, the science community lacks a methodical way of assessing and quantifying different machine learning ecosystems applied to data-intensive scientific applications. There has so far been little effort to construct a coherent, inclusive and easy-to-use benchmark suite targeted at the ‘scientific machine learning’ (SciML) community. Such a suite would enable an assessment of the performance of different machine learning models, applied to a range of scientific applications running on different hardware architectures – such as GPUs and TPUs – and using different machine learning frameworks such as PyTorch and TensorFlow. In this paper, Professor Hey will outline his approach for constructing such a 'SciML benchmark suite' that covers multiple scientific domains and different machine learning challenges. The output of the benchmarks will cover a number of metrics: not only runtime performance, but also energy usage and training and inference performance. Professor Hey will present some initial results for some of these SciML benchmarks.
Iterative linear algebra in the exascale era
Dr Erin Carson, Charles University, Czech Republic
Iterative methods for solving linear algebra problems are ubiquitous throughout scientific and data analysis applications and are often the most expensive computations in large-scale codes. Approaches to improving performance often involve algorithm modification to reduce data movement or the selective use of lower precision in computationally expensive parts. Such modifications can, however, result in drastically different numerical behaviour in terms of convergence rate and accuracy due to finite precision errors. A clear, thorough understanding of how inexact computations affect numerical behaviour is thus imperative in balancing tradeoffs in practical settings.
In this talk, Dr Carson focuses on two general classes of iterative methods: Krylov subspace methods and iterative refinement. She presents bounds on attainable accuracy and convergence rate in finite precision variants of Krylov subspace methods designed for high performance and shows how these bounds lead to adaptive approaches that are both efficient and accurate. Then, motivated by recent trends in multiprecision hardware, she presents new forward and backward error bounds for general iterative refinement using three precisions. The analysis suggests that if half precision is implemented efficiently, it is possible to solve certain linear systems up to twice as fast and to greater accuracy.
As we push toward exascale-level computing and beyond, designing efficient, accurate algorithms for emerging architectures and applications is of utmost importance. Dr Carson finishes by discussing extensions in new application areas and the broader challenge of understanding what increasingly large problem sizes will mean for finite precision computation both in theory and practice.
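The three-precision iterative refinement mentioned in this abstract can be illustrated with a small sketch. It is not Dr Carson's implementation: it assumes a dense LU factorisation emulated in IEEE binary32 (as a stand-in for half precision, which would additionally need scaling to avoid underflow), binary64 as the working precision, and exact rational arithmetic as a stand-in for a higher residual precision. The helper names `f32`, `lu_f32`, `lu_solve_f32`, `residual_exact` and `refine` are hypothetical.

```python
import struct
from fractions import Fraction


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def lu_f32(A):
    """LU with partial pivoting; every operation is rounded to binary32.

    Assumption: A is square and nonsingular (no zero-pivot handling).
    """
    n = len(A)
    LU = [[f32(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = f32(LU[i][k] / LU[k][k])  # multiplier, stored in L
            for j in range(k + 1, n):
                LU[i][j] = f32(LU[i][j] - f32(LU[i][k] * LU[k][j]))
    return LU, perm


def lu_solve_f32(LU, perm, b):
    """Solve with the low-precision factors, again rounding to binary32."""
    n = len(b)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):  # forward substitution (unit lower triangle)
        for j in range(i):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
    for i in reversed(range(n)):  # back substitution (upper triangle)
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
        y[i] = f32(y[i] / LU[i][i])
    return y


def residual_exact(A, x, b):
    """r = b - A x in exact rational arithmetic, rounded once to binary64."""
    return [
        float(Fraction(bi) - sum(Fraction(a) * Fraction(xj) for a, xj in zip(row, x)))
        for row, bi in zip(A, b)
    ]


def refine(A, b, iters=5):
    """Iterative refinement: factorise once in low precision, then correct."""
    LU, perm = lu_f32(A)
    x = lu_solve_f32(LU, perm, b)  # initial low-precision solve
    for _ in range(iters):
        r = residual_exact(A, x, b)  # residual in the highest precision
        d = lu_solve_f32(LU, perm, r)  # correction from the cheap factors
        x = [xi + di for xi, di in zip(x, d)]  # update in binary64
    return x


A = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.25], [0.5, 0.25, 2.0]]
b = [3.5, -4.25, 6.0]  # chosen so that the exact solution is [1, -2, 3]
print(refine(A, b))
```

The expensive factorisation is done only once, in the cheapest precision; each refinement step reuses it, which is where the potential speed-up comes from on hardware with fast low-precision arithmetic. The sketch is only expected to converge for reasonably well-conditioned matrices.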
Stochastic rounding and reduced-precision fixed-point arithmetic for solving neural ODEs
Professor Steve Furber CBE FREng FRS, The University of Manchester, UK
There is increasing interest from users of large-scale computation in smaller and/or simpler arithmetic types. The main reasons for this are energy efficiency and memory footprint/bandwidth. However, the outcome of a simple replacement with lower precision types is increased numerical errors in the results. The practical effects of this range from unimportant to intolerable. Professor Furber will describe some approaches for reducing the errors of lower-precision fixed-point types and arithmetic relative to IEEE double-precision floating-point. He will give detailed examples in an important domain for numerical computation in neuroscience simulations: the solution of Ordinary Differential Equations (ODEs). He will look at two common model types and demonstrate that rounding has an important role in producing improved precision of spike timing from explicit ODE solution algorithms. In particular the group finds that stochastic rounding consistently provides a smaller error magnitude - in some cases by a large margin - compared to single-precision floating-point and fixed-point with rounding-to-nearest across a range of Izhikevich neuron types and ODE solver algorithms. The group also considers simpler and computationally much cheaper alternatives to full stochastic rounding of all operations, inspired by the concept of 'dither', a widely understood mechanism for providing resolution below the least significant bit (LSB) in digital audio, image and video processing. Professor Furber will discuss where these alternatives are likely to work well and suggest a hypothesis for the cases where they will not be as effective. These results will have implications for the solution of ODEs in all subject areas, and should also be directly relevant to the huge range of practical problems that are modelled as Partial Differential Equations (PDEs).
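The difference between rounding-to-nearest and stochastic rounding in a fixed-point format can be seen in a minimal sketch. It is not the group's code: it assumes a hypothetical format with 8 fractional bits and replaces the ODE solver with a repeated small increment, which mimics an explicit update whose step is smaller than half an LSB. The names `FRAC_BITS`, `round_nearest`, `round_stochastic` and `accumulate` are illustrative only.

```python
import math
import random

FRAC_BITS = 8            # assumed format: 8 fractional bits
SCALE = 1 << FRAC_BITS   # one LSB is 1/256


def round_nearest(x):
    """Quantise x to the fixed-point grid, rounding to nearest (ties upward)."""
    return math.floor(x * SCALE + 0.5) / SCALE


def round_stochastic(x, rng):
    """Quantise x, rounding up with probability equal to the discarded fraction.

    The result is unbiased: its expected value equals x.
    """
    scaled = x * SCALE
    lower = math.floor(scaled)
    frac = scaled - lower  # in [0, 1)
    return (lower + (1 if rng.random() < frac else 0)) / SCALE


def accumulate(increment, steps, rounder):
    """Add a small increment repeatedly, quantising after every step."""
    x = 0.0
    for _ in range(steps):
        x = rounder(x + increment)
    return x


rng = random.Random(0)  # seeded only to make the demonstration repeatable
steps, inc = 10_000, 1e-4  # each increment is about 0.026 LSB
print("exact sum          :", steps * inc)
print("round-to-nearest   :", accumulate(inc, steps, round_nearest))
print("stochastic rounding:", accumulate(inc, steps, lambda v: round_stochastic(v, rng)))
```

With rounding-to-nearest every increment is discarded, so the accumulated value never leaves zero, whereas stochastic rounding tracks the exact sum on average at the cost of some random scatter. Drawing a random number for every operation is the expense that cheaper dither-style alternatives aim to reduce.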
Rethinking deep learning: architectures and algorithms
Professor George Anthony Constantinides, Imperial College London, UK
Professor Constantinides will consider the problem of efficient inference using deep neural networks. Deep neural networks are currently a key driver for innovation in both numerical algorithms and architecture, and are likely to form an important workload for computational science in the future. While algorithms and architectures for these computations are often developed independently, this talk will argue for a holistic approach. In particular, the notion of application efficiency needs careful analysis in the context of new architectures. Given the importance of specialised architectures in the future, Professor Constantinides will focus on custom neural network accelerator design. He will define a notion of computation general enough to encompass both the typical design specification of a computation performed by a deep neural network and its hardware implementation. This will allow us to explore - and make precise - some of the links between neural network design methods and hardware design methods. Bridging the gap between specification and implementation requires us to grapple with questions of approximation, which he will formalise before exploring opportunities to exploit them. This will raise questions about the appropriate network topologies, finite precision data-types, and design/compilation processes for such architectures and algorithms, and how these concepts combine to produce efficient inference engines. Some new results will be presented on this topic, providing partial answers to these questions, and he will explore fruitful avenues for future research.
Reduced numerical precision and imprecise computing for ultra-accurate next-generation weather and climate models
Professor Tim Palmer CBE FRS, University of Oxford, UK
Developing reliable weather and climate models must surely rank amongst the most important tasks to help society become resilient to the changing nature of weather and climate extremes. With such models we can determine the type of infrastructure investment needed to protect against the worst weather extremes, and can prepare for specific extremes well in advance of their occurrence.
Weather and climate models are based on numerical representations of the underlying nonlinear partial differential equations of fluid flow. However, they do not conform to the usual paradigm in numerical analysis: you give me a problem you want solved to a certain accuracy and I will give you an algorithm to achieve this. In particular, with current supercomputers, we are unable to solve the underpinning equations to the accuracy we would like. As a result, models exhibit pervasive biases against observations. These biases can be as large as the signals we are trying to simulate or predict. For climate modelling the numerical paradigm instead becomes this: for a given computational resource, what algorithms can produce the most accurate weather and climate simulations? Professor Palmer argues for an oxymoron: that to maximise accuracy we need to abandon the use of 64-bit precision where it is not warranted. Examples are given of the use of 16-bit numerics for large parts of the model code. The computational savings so made can be reinvested to increase model resolution and thereby increase model accuracy. In assessing the relative degradation of model performance at fixed resolution with reduced-precision numerics, the group utilises the notion of stochastic parametrisation as a representation of model uncertainty. Such stochasticity can itself reduce model systematic error. The benefit of stochasticity suggests a role for low-energy non-deterministic chips in future HPC.