Dissertation Talk with Alvin Wan: Efficient 2.5D Deep Learning for Capturing and Viewing Reality; 12:00 PM, Thursday, April 28

April 28, 2022

Title: Efficient 2.5D Deep Learning for Capturing and Viewing Reality
Speaker: Alvin Wan
Advisor: Joseph E. Gonzalez
Date: Thursday, April 28, 2022
Time: 12:00 – 1:00 PM PDT

This is a hybrid event, held both in person and online.
Location (in person): 8019 Berkeley Way West
Location (Zoom): https://berkeley.zoom.us/j/96758340287?pwd=TUpHZmNUUTNkbVcxR1JzdENHU3NEUT09

Abstract:
Applications of deep learning in computer vision are growing in large strides, evolving from fun face filters to entire products that fundamentally rely on 3D understanding, such as self-driving cars, virtual reality, and augmented reality. In these new domains, computer vision algorithms face two critical requirements: 1) efficiency: real-time, power-efficient, highly accurate models that run on-device, and 2) 3D: processing and generation of the 3D world as viewed from a camera. Together, these concerns yield an exponentially large space of possible models, a space that (a) stumps manual model design, which produces suboptimal solutions, and (b) spawns automatic design methods that consume GPU-years for a single model. Furthermore, many 3D models are realistically only 2.5D: they perform well when viewed in camera space but produce inconsistencies when viewed in 3D.

In this talk, I will discuss progress in both areas: designing efficient methods for automatically building efficient neural networks, reducing design cost by two orders of magnitude while producing state-of-the-art efficient models; and leveraging 2.5D models, for example boosting semantic segmentation to high precision (e.g., distinguishing the individual spokes of a bike wheel) by exploiting predicted depth differences. Finally, I will conclude by discussing ongoing work that is uniquely enabled by the union of these two bodies of work: reality capture from everyday consumer cameras and real-time novel view synthesis, efficient enough to run on a phone or a standalone virtual reality headset.