Random Projection Design for Scalable Implicit Smoothing of Randomly Observed Stochastic Processes

Francois Belletti Intelligent, Statistical Methodology, Theoretical ML

Standard methods for multivariate time series analysis are hampered by sampling at random timestamps, long-range dependencies, and the scale of the data. In this paper we present a novel estimator for the cross-covariance of randomly observed time series that identifies the dynamics of an unobserved stochastic process. We analyze the statistical properties of our estimator without assuming that observation timestamps are independent of the process of interest, and show that our solution does not suffer from the corresponding issues affecting standard estimators for cross-covariance. We implement and evaluate our statistically sound and scalable approach in the distributed setting using Apache Spark and demonstrate its ability to identify interactions between processes on simulations and financial data with tens of millions of samples.

Authors: Francois Belletti, Joseph Gonzalez, Evan Sparks, Alexandre M. Bayen