Random Projection Design for Scalable Implicit Smoothing of Randomly Observed Stochastic Processes

Aaditya Ramdas Intelligent, Statistical Methodology, Theoretical ML

Standard methods for multi-variate time series analysis are hampered by sampling at random timestamps, long range dependencies , and the scale of the data. In this paper we present a novel estimator for cross-covariance of randomly observed time series which identifies the dynamics of an unobserved stochastic process. We analyze the statistical properties of our estimator without the assumption that observation timestamps are independent from the process of interest and show that our solution does not suffer from the corresponding issues affecting standard estimators for cross-covariance. We implement and evaluate our statistically sound and scalable approach in the distributed setting using Apache Spark and demonstrate its ability to identify interactions between processes on simulations and financial data with tens of millions of samples. Pdf: Aistats_camera_ready

Authors: 66, Joseph Gonzalez, Evan Sparks, Alexandre M. Bayen

The p-filter: multi-layer FDR control for grouped hypotheses

Aaditya Ramdas Statistical Methodology

In many practical applications of multiple hypothesis testing using the False Discovery Rate (FDR), the given hypotheses can be naturally partitioned into groups, and one may not only want to control the number of false discoveries (wrongly rejected null hypotheses), but also the number of falsely discovered groups of hypotheses (we say a group is falsely discovered if at least one hypothesis within that group is rejected, when in reality the group contains only nulls). In this paper, we introduce the p-filter, a procedure which unifies and generalizes the standard FDR procedure by Benjamini and Hochberg and global null testing procedure by Simes. We first prove that our proposed method can simultaneously control the overall FDR at the finest level more…