Hessian-based Analysis of Large Batch Training and Robustness to Adversaries

Amir Gholaminejad Deep Learning, Optimization, Theoretical ML

Large batch size training of Neural Networks has been shown to incur accuracy loss when trained with the current methods. The exact underlying reasons for this are still not completely understood. Here, we study large batch size training through the lens of the Hessian operator and robust optimization. In particular, we perform a Hessian based study to analyze exactly how the landscape of the loss function changes when training with large batch size. We compute the true Hes- sian spectrum, without approximation, by back-propagating the second derivative. Extensive experiments on multiple networks show that saddle-points are not the cause for generalization gap of large batch size training, and the results consistently show that large batch converges to points with noticeably more…

Authors: Zhewei Yao, Amir Gholaminejad, Michael Mahoney

MLPerf: SPEC for ML

David Patterson Deep Learning, News, Open Source, Optimization, Reinforcement Learning, Systems, Uncategorized 0 Comments

The RISE Lab at UC Berkeley today joins Baidu, Google, Harvard University, and Stanford University to announce a new benchmark suite for machine learning called MLPerf at the O’Reilly AI conference in New York City (see https://mlperf.org/). The MLPerf effort aims to build a common set of benchmarks that enables the machine learning (ML) field to measure system performance eventually for both training and inference from mobile devices to cloud services. We believe that a widely accepted benchmark suite will benefit the entire community, including researchers, developers, builders of machine learning frameworks, cloud service providers, hardware manufacturers, application providers, and end users. Historical Inspiration. We are motivated in part by the System Performance Evaluation Cooperative (SPEC) benchmark for general-purpose computing that drove rapid,…

Iterative methods for solving factorized linear systems

Aaditya Ramdas Optimization

Stochastic iterative algorithms such as the Kaczmarz and Gauss-Seidel methods have gained recent attention because of their speed, simplicity, and the ability to approximately solve large-scale linear systems of equations without needing to access the entire matrix. In this work, we consider the setting where we wish to solve a linear system in a large matrix X that is stored in a factorized form, X = UV; this setting either arises naturally in many applications or may be imposed when working with large low-rank datasets for reasons of space required for storage. We propose a variant of the randomized Kaczmarz method for such systems that takes advantage of the factored form, and avoids computing X. We prove an exponential convergence more…

Authors: Aaditya Ramdas, Anna Ma, Deanna Needell