This page is archived

Links to external sources may no longer work as intended. The content may not represent the latest thinking in this area or the Society’s current position on the topic.

How can a computer understand what is happening in a video?

22 November 2017 18:30 - 19:30


2017 Milner Award Lecture given by Professor Andrew Zisserman FRS.

How can a computer recognise people and what they are doing and saying in a video stream? The answer is by learning, and learning can take many different forms. 

One form is known as 'strong supervision': this is when a computer is shown many (thousands of) examples of a person or an action, and from these it learns a model to classify the video. Another form of learning is known as 'weak' or 'self-supervision': this is when the computer learns directly from the structure of the video stream itself.
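The difference between the two forms of supervision comes down to where the labels come from. A minimal Python sketch (the data and helper names are purely illustrative, not from the lecture) of how each kind of label might be constructed:

```python
import random

# Toy "video": a sequence of frame indices standing in for frame features.
video = list(range(10))

# Strong supervision: every example is paired with a human-provided label
# (the action names here are hypothetical).
labelled_examples = [(frame, "waving") for frame in video[:5]] + \
                    [(frame, "clapping") for frame in video[5:]]

# Self-supervision: labels are derived from the structure of the stream
# itself -- here, "are these two frames in their original temporal order?"
# No human annotation is needed.
def make_self_supervised_pairs(frames, n, seed=0):
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        i, j = sorted(rng.sample(range(len(frames)), 2))
        if rng.random() < 0.5:
            pairs.append(((frames[i], frames[j]), 1))  # in order
        else:
            pairs.append(((frames[j], frames[i]), 0))  # shuffled
    return pairs

pairs = make_self_supervised_pairs(video, 4)
```

In both cases a network is then trained to predict the label from the input; the self-supervised variant simply manufactures its labels for free from the video itself.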

This lecture explains how both forms of supervision can be used to train neural networks using deep learning. It is illustrated throughout with examples including: recognising people by their faces, recognising human actions, automated lip reading, and using sound and images in concert for training.

The Award 

The Royal Society Milner Award, kindly supported by Microsoft Research, is given annually for outstanding achievement in computer science by a European researcher. 

The award replaces the Royal Society and Académie des sciences Microsoft Award and is named in honour of Professor Robin Milner FRS (1934-2010), a pioneer in computer science.

Professor Andrew Zisserman FRS was awarded the 2017 Milner Award in recognition of his exceptional achievements in computer vision, including work on computational theory and commercial systems for geometrical images.

For all enquiries, please contact the Scientific Programmes team.