RISE Seminar: Strategies for integrating people and machine learning in online systems

September 21, 2017

When/Where:

Thursday, 9/21, 12:30-1:30, Wozniak Lounge, Soda Hall, UC Berkeley

Abstract:

Clara Labs is an email-based scheduling service for busy people. Simply CC Clara on an email to a person you want to meet with, and we’ll handle the back-and-forth game of email tag for you in accordance with your preferences. To build a robust and accurate system that gracefully handles nuanced requests, we’ve combined machine learning (ML) with a distributed human labor force. This system enables a single person to schedule consistently for an unbounded number of customers, regardless of worker location or lack of a priori customer context.

A partially automated system has clear benefits, such as increased accuracy and decreased cost (i.e., increased scalability). Further, human input to the system leads to new annotations for retraining algorithms. We’ve also found that there are great advantages to vertically integrating the ML annotation process directly with the product. For example, the fidelity of labelled data increases when the annotator understands what actions will be derived directly from their work.

Despite these advantages, there are several distinct challenges to building such a system: annotators are noisy and may be biased by bad ML predictions (if displayed), there tends to be an inverse relationship between speed of data entry and annotator accuracy, and the learning curve for using a unique data-entry system may be high. In fact, simply measuring accuracy in the system may be challenging depending on time and cost constraints.

In this talk, we’ll discuss incentives and algorithms for increasing both the accuracy and speed of human operators, methods for measuring their performance, strategies for dealing with task ambiguity, and tricks for building an effective ramping system to onboard workers. These topics will be covered in the context of bounded time and cost constraints. We will further discuss the “automation spectrum,” i.e., the automation subtasks that can be surfaced to people and how they can be leveraged for progressive cost and speed gains over time.

Bio:

Jason Laska leads the machine learning efforts at Clara Labs. Previously, Jason spearheaded the computer vision program at Dropcam (acquired by Google in 2014), developing large-scale online vision systems for motion and activity recognition in the product. Jason holds a PhD in electrical engineering from Rice University, where he focused on inverse problems, dimensionality reduction, and optimization (notable projects include the “single-pixel camera,” democratic projections, and binary stable embeddings). He briefly experimented with publishing as a cofounder of Rejecta Mathematica, a publication for previously rejected mathematical articles. Find further information, projects, and publications at https://www.linkedin.com/in/jasonlaska/.