Convolutional Neural Networks (ConvNets) enable computers to excel on vision learning tasks such as image classification, object detection. Recently, faster inference on live data is becoming more and more important. From a system perspective, it means faster inference on each single, incoming data item (e.g. 1 image). Two main-stream distributed model serving methods – data parallelism and model parallelism – are not desirable here, because we cannot further split a single input data piece via data parallelism and model parallelism introduces huge communication overhead. To achieve low-latency, live data inference, we propose sensAI, a novel and generic approach that decouples a CNN model into disconnected subnets, each is responsible for predicting certain class(es). We call this new model distribution class parallelism. Experimental results show that, sensAI achieves 2-6x faster inference on single input data item with no accuracy loss.
Published On: March 4, 2020
Presented At/In: Workshop on MLOps Systems in MLSys 2020