Precog: Efficient ML Inference for IoT


Networked applications with heterogeneous sensors on the edge, popularly called the Internet of Things (IoT), are a growing source of data. Such applications now use machine learning (ML) to make streaming predictions. The current dominant approach to deploying ML in most IoT applications is monolithic: features from all sensors are collected in a centralized cloud-based tier to assemble the full feature vector for ML inference. This approach has high communication costs, which wastes energy on the edge devices and often bottlenecks the network. In this work, we study an alternative approach that mitigates such issues by "pushing down" ML inference queries through a hierarchy of devices to the edge as much as possible. Our approach presents a new technical challenge of rewriting ML inference without significantly reducing prediction accuracy. We provide the first comprehensive characterization of several popular ML models in terms of their amenability to such push-down rewrites based on their trade-offs among communication cost, computation cost, and accuracy. We introduce novel exact rewrite algorithms for some popular models that preserve accuracy. We also create novel approximate variants of other models that still offer high accuracy. Our rewrites reduce communication cost by up to 90%, while a real prototype on a common edge device in IoT networks shows that our techniques reduce energy use and latency by up to 67%.
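To make the push-down idea concrete, here is a minimal sketch for one case where an exact rewrite is possible: a linear model. Because a linear score w·x decomposes additively across features, each edge device can compute a partial dot product over the features it owns and send a single scalar upstream, rather than shipping its raw feature slice to the cloud. The device split, sizes, and variable names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative sketch: exact push-down of a linear model's inference.
# Assumed setup: 9 features partitioned across 3 edge devices (3 each).
rng = np.random.default_rng(0)
w = rng.normal(size=9)   # global model weights (known to every device)
x = rng.normal(size=9)   # full feature vector across all sensors

# Monolithic plan: send all 9 raw features to the cloud, score there.
monolithic_score = w @ x

# Push-down plan: each device computes its local partial dot product
# and transmits just one scalar; the cloud only sums 3 numbers.
device_features = np.split(np.arange(9), 3)      # feature indices per device
partials = [w[idx] @ x[idx] for idx in device_features]
pushed_down_score = sum(partials)

# The rewrite is exact: both plans yield the same prediction.
assert np.isclose(monolithic_score, pushed_down_score)
```

Here communication drops from 9 feature values to 3 partial sums, with no change in the prediction; nonlinear models (e.g., decision trees or neural networks) do not decompose this cleanly, which is why the paper distinguishes exact rewrites from approximate variants.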

Downloads (Paper, Code, Data, etc.)

  • Pushing Down Machine Learning Inference to the Edge in Heterogeneous Internet of Things Applications
    Anthony Thomas, Yunhui Guo, Yeseong Kim, Baris Aksanli, Arun Kumar, Tajana S. Rosing
    Under Submission | TechReport

  • Source code on GitHub (Python code for data pre-cleaning and neural network implementations; data files for publicly available data)

Student Contact

Anthony H. Thomas: ahthomas [at] eng [dot] ucsd [dot] edu