To watch the keynote sessions live, please go to the livestream. Keynotes from previous days will be available here.

Prof. Andrew Zisserman

University of Oxford

How can we learn sign language by watching sign-interpreted TV?

(Monday 22nd November: 11:05 - 12:05 GMT)

Sign languages are visual languages that have evolved in deaf communities. For many years, computer vision researchers have worked on systems to recognize individual signs, and towards translating sign languages to spoken languages. Now, with the progress in human action and pose recognition and in machine translation due to deep learning, many of the tools are in place. However, the lack of large-scale datasets for training and evaluation is holding back research in this area. This talk will describe recent work on developing automatic and scalable methods to annotate continuous signing videos and build a large-scale dataset. The key idea is to build on sign-interpreted TV broadcasts that have weakly-aligned subtitles. These enable sign spotting and high-quality annotation of the video data. Three methods will be covered: using mouthing cues from signers to spot signs; using visual sign language dictionaries to spot signs; and using the subtitle content to determine the alignment of the subtitles to the signing. Taken together, these methods are used to produce the new, large-scale BBC-Oxford British Sign Language Dataset, with over a thousand hours of sign-interpreted broadcast footage and millions of sign instance annotations. The dataset is available for download for non-commercial research.
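The sign-spotting idea above can be illustrated with a minimal, generic sketch (this is not code from the talk): a query embedding for a sign, e.g. taken from a visual dictionary, is compared by cosine similarity against sliding-window embeddings of the continuous signing video, and high-scoring windows are proposed as candidate sign instances. The function name `spot_sign` and all parameter values here are illustrative assumptions.

```python
import numpy as np

def spot_sign(video_feats, query_feat, window=16, threshold=0.8):
    """Spot candidate sign instances in continuous signing video.

    video_feats: (T, D) array of per-frame feature embeddings.
    query_feat:  (D,) embedding of the query sign (e.g. from a dictionary).
    Slides a window over the video, averages the window's frame features,
    and returns the start indices whose cosine similarity to the query
    exceeds the threshold.
    """
    hits = []
    for start in range(len(video_feats) - window + 1):
        w = video_feats[start:start + window].mean(axis=0)
        sim = np.dot(w, query_feat) / (
            np.linalg.norm(w) * np.linalg.norm(query_feat) + 1e-8
        )
        if sim >= threshold:
            hits.append(start)
    return hits
```

In practice the spotted windows would then be turned into sign annotations, with the subtitle text providing the candidate vocabulary for each video segment.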


Andrew Zisserman, FRS, is the Professor of Computer Vision Engineering and the Royal Society Research Professor in the Department of Engineering Science at the University of Oxford, and the founder of the Visual Geometry Group (VGG). Since 2014, he has also been affiliated with DeepMind. Prof. Zisserman is one of the principal architects of modern computer vision, and is known internationally for his pioneering work in multiple view geometry, visual recognition, and large-scale retrieval in images and video. His papers have won many best paper awards at international conferences (such as CVPR, ICCV and BMVC), including the IEEE Marr Prize three times. Prof. Zisserman received a Technical Emmy Award (with the company 2d3) for camera-tracking software in 2002. He has also received test-of-time awards (the Longuet-Higgins and Helmholtz prizes) on four occasions, as well as the BMVA Distinguished Fellowship (2008), the IEEE PAMI Distinguished Researcher Award (2013), and the Royal Society Milner Award (2017).

Daphne Koller

insitro

Transforming Drug Discovery using Digital Biology

(Tuesday 23rd November: 16:00 - 17:00 GMT)

Modern medicine has given us effective tools to treat some of the most significant and burdensome diseases. At the same time, it is becoming steadily more challenging and more expensive to develop new therapeutics. A key factor in this trend is that the drug development process involves multiple steps, each of which involves a complex and protracted experiment that often fails. We believe that, for many of these phases, it is possible to develop machine learning models to help predict the outcome of these experiments, and that those models, while inevitably imperfect, can outperform predictions based on traditional heuristics. To achieve this goal, we are bringing together high-quality data from human cohorts, while also developing cutting-edge methods in high-throughput biology and chemistry that can produce massive amounts of in vitro data relevant to human disease and therapeutic interventions. These data are then used to train machine learning models that make predictions about novel targets, coherent patient segments, and the clinical effect of molecules. Our ultimate goal is to develop a new approach to drug development that uses high-quality data and ML models to design novel, safe, and effective therapies that help more people, faster, and at a lower cost.


Daphne Koller is CEO and Founder of insitro, a machine-learning-enabled drug discovery company. Daphne is also a co-founder of Engageli. She was previously the Rajeev Motwani Professor of Computer Science at Stanford University, where she served on the faculty for 18 years; the co-CEO and President of Coursera; and the Chief Computing Officer of Calico, an Alphabet company in the healthcare space. She is the author of over 200 refereed publications in venues such as Science, Cell, and Nature Genetics. Daphne was recognized as one of TIME Magazine’s 100 most influential people in 2012. She received the MacArthur Foundation Fellowship in 2004 and the ACM Prize in Computing in 2008. She was inducted into the National Academy of Engineering in 2011 and elected a fellow of the American Association for Artificial Intelligence in 2004, the American Academy of Arts and Sciences in 2014, and the International Society of Computational Biology in 2017.

Prof. Katerina Fragkiadaki

Carnegie Mellon University

Modular 3D neural scene representations for visuomotor control and language grounding

(Wednesday 24th November: 14:30 - 15:30 GMT)

Current state-of-the-art perception models localize rare object categories in images, yet often miss basic facts that a two-year-old has mastered: that objects have 3D extent, that they persist over time despite changes in the camera view, that they do not intersect in 3D, and so on. We will discuss models that learn to map 2D and 2.5D images and videos into amodal, completed 3D feature maps of the scene and the objects in it by predicting views. We will show that the proposed models learn object permanence, have objects emerge in 3D without human annotations, can ground language in 3D visual simulations, and learn intuitive physics and controllers that generalize across scene arrangements and camera configurations. In this way, the proposed world-centric scene representations overcome many limitations of image-centric representations for video understanding, model learning, and language grounding.
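To make the 2.5D-to-3D lifting mentioned above concrete, here is a standard pinhole-camera unprojection (a generic sketch, not the models from the talk): a depth map is lifted into a camera-frame point map, which is the typical first step before pooling image features into a world-centric 3D feature grid. The function name and camera parameters are illustrative assumptions.

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map into camera-frame 3D points.

    Uses the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy,
    where (u, v) are pixel coordinates and (fx, fy, cx, cy) are the camera
    intrinsics. Returns an (H, W, 3) array of XYZ points.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinate grids
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)
```

Image features can then be scattered into a voxel grid at these 3D locations, giving a scene representation that is stable under camera motion.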


Katerina Fragkiadaki is an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. She received her Ph.D. from the University of Pennsylvania and was subsequently a postdoctoral fellow at UC Berkeley and Google Research. Her work focuses on learning visual representations with little supervision and on combining spatial reasoning with deep visual learning. Her group develops algorithms for mobile computer vision and for learning physics and common sense for agents that move around and interact with the world. Her work has been recognized with a best Ph.D. thesis award, an NSF CAREER award, an AFOSR YIP award, a DARPA Young Investigator award, and Google, TRI, Amazon, and Sony faculty research awards.

Prof. Davide Scaramuzza

University of Zürich

Vision-based Agile Robotics, from Frames to Events

(Thursday 25th November: 14:30 - 15:30 GMT)

Autonomous mobile robots will soon play a major role in search-and-rescue, delivery, and inspection missions, where a fast response is crucial. However, their speed and maneuverability are still far from those of birds and human pilots. Agile flight is particularly important: since drone battery life is usually limited to 20-30 minutes, drones need to fly faster to cover longer distances. However, to do so, they need faster sensors and algorithms. Human pilots take years to learn the skills needed to navigate drones. What does it take to make drones navigate as well as, or even better than, human pilots? Autonomous, agile navigation through unknown, GPS-denied environments poses several challenges for robotics research in terms of perception, planning, learning, and control. In this talk, I will show how the combination of model-based and machine learning methods, united with the power of new, low-latency sensors such as event cameras, can allow drones to achieve unprecedented speed and robustness by relying solely on onboard computing.
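To make the event-camera idea concrete, here is a minimal, generic sketch (not from the talk) of how a stream of asynchronous events, each an (x, y, polarity) tuple reporting a per-pixel brightness change, can be accumulated into a signed frame for downstream processing. The function name and event format are illustrative assumptions.

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate asynchronous events into a signed 2D frame.

    events: iterable of (x, y, polarity) tuples, where polarity > 0 means a
    brightness increase (+1 in the frame) and polarity <= 0 a decrease (-1).
    Returns an (height, width) int32 array of net event counts per pixel.
    """
    frame = np.zeros((height, width), dtype=np.int32)
    for x, y, p in events:
        frame[y, x] += 1 if p > 0 else -1
    return frame
```

Unlike a standard camera, which waits for a fixed exposure, such event accumulation can be performed over arbitrarily short time slices, which is what gives event-based pipelines their low latency.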


Davide Scaramuzza is a Professor of Robotics and Perception at the University of Zurich, where he does research at the intersection of robotics, computer vision, and machine learning, aiming to enable autonomous, agile navigation of micro drones using both standard and neuromorphic event-based cameras. He pioneered autonomous, vision-based navigation of drones, which inspired the NASA Mars helicopter, and has served as a consultant for the United Nations’ International Atomic Energy Agency’s Fukushima Action Plan on Nuclear Safety. For his research contributions, he has won prestigious awards, such as a European Research Council (ERC) Consolidator Grant, the IEEE Robotics and Automation Society Early Career Award, an SNF-ERC Starting Grant, a Google Research Award, and several paper awards. In 2015, he co-founded Zurich-Eye, today Facebook Zurich, which developed the world-leading virtual-reality headset Oculus Quest. Many aspects of his research have been prominently featured in the wider media, such as The New York Times, BBC News, Forbes, and the Discovery Channel.