ADA Lab @ UCSD
Project Triptych
Overview
Triptych is an end-to-end model selection management system (MSMS) that aims to simplify
and accelerate the process of sourcing data and features, and of selecting ML models. Our
guiding principles are to exploit the semantics of the data and of the ML task as much as
possible, both to reduce the data scientist's manual work and to reduce runtimes and costs.
We apply these principles to remove or mitigate various bottlenecks in this end-to-end
process, with the eventual goal of unifying these components into an integrated "operating
system" for ML analytics tasks. Please refer to the ACM SIGMOD Record paper listed under
Publications for more details of this vision.
Active Component Projects
Cerebro
Efficient and reproducible model selection on deep learning systems.
Morpheus
Integrating linear algebra and relational algebra to simplify feature engineering for ML.
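To make the Morpheus idea concrete: when the feature matrix T comes from a key-foreign key join of an entity table S with an attribute table R, a product such as T·w can be rewritten to run over the base tables (T·w = S·w_S + K·(R·w_R), where K maps each S row to its R row), so the join never has to be materialized. The Python sketch below assumes a single key-foreign key join, dense rows stored as plain lists, and hypothetical function names; it illustrates the factorized rewrite in general and is not Morpheus's actual API.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product over row-major lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def materialized_matvec(s: Matrix, r: Matrix, fk: List[int], w: Vector) -> Vector:
    """Join first: build T = [S | R[fk]] row by row, then compute T @ w."""
    t = [s_row + r[j] for s_row, j in zip(s, fk)]
    return matvec(t, w)


def factorized_matvec(s: Matrix, r: Matrix, fk: List[int], w: Vector) -> Vector:
    """Factorized: T @ w = S @ w_S + (R @ w_R)[fk], without building T.

    fk[i] is the index of the R row referenced by S row i (hypothetical layout).
    """
    d_s = len(s[0])
    w_s, w_r = w[:d_s], w[d_s:]
    r_part = matvec(r, w_r)  # computed once per R row, not once per S row
    s_part = matvec(s, w_s)
    return [a + r_part[j] for a, j in zip(s_part, fk)]
```

The factorized version computes R·w_R once per row of R rather than once per row of S, which is where the savings come from when many S rows reference the same R row.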
SortingHat
ML schema inference and automatic data preparation.
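A first step in ML schema inference is deciding what kind of feature each raw column represents. As a purely illustrative baseline, and not SortingHat's actual approach, the sketch below guesses a coarse feature type from a column's string values using hand-written rules; the function name, the labels ("numeric", "categorical", "text", "unknown"), and the max_categories threshold are all hypothetical choices.

```python
from typing import List


def _is_number(value: str) -> bool:
    """Return True if the string parses as a float (e.g. "3", "2.5", "1e3")."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def guess_feature_type(values: List[str], max_categories: int = 20) -> str:
    """Guess a coarse feature type for one column of raw string values.

    Hypothetical rule-based baseline; labels and thresholds are illustrative.
    """
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "unknown"
    distinct = set(non_empty)

    if all(_is_number(v) for v in non_empty):
        # Few distinct integer codes (e.g. 1/2/3) are treated as categories.
        if len(distinct) <= max_categories and all(
            float(v).is_integer() for v in distinct
        ):
            return "categorical"
        return "numeric"

    # Multi-word values on average suggest free text.
    avg_tokens = sum(len(v.split()) for v in non_empty) / len(non_empty)
    if avg_tokens >= 3:
        return "text"

    # A small vocabulary of short strings suggests categories.
    if len(distinct) <= max_categories:
        return "categorical"

    # Many distinct short strings (IDs, names, ...) are left for human review.
    return "unknown"
```

Rules like the integer-code check misfire easily (a small sample of ages such as "25", "31", "47" would be labelled categorical), which shows why getting such inference right by hand is hard.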
Publications
Towards A Polyglot Framework for Factorized ML
David Justo, Shaoqing Yi, Lukas Stadler, Nadia Polikarpova, and Arun Kumar
VLDB 2021 (Industrial Track; to appear) | Paper PDF coming soon | TechReport | Code coming soon
Application of Convolutional Neural Network Algorithms for Advancing Sedentary and Activity Bout Classification
Supun Nakandala, Marta Jankowska, Fatima Tuz-Zahra, John Bellettiere, Jordan Carlson, Andrea LaCroix, Sheri Hartman, Dori Rosenberg, Jingjing Zou, Arun Kumar, and Loki Natarajan
Journal for the Measurement of Physical Behaviour | Paper PDF | Code
Cerebro: A Layered Data Platform for Scalable Deep Learning
Arun Kumar, Supun Nakandala, Yuhao Zhang, Side Li, Advitya Gemawat, and Kabir Nagrecha
CIDR 2021 (Vision paper) | Paper PDF | Talk video
Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent
Fengan Li, Lingjiao Chen, Yijing Zeng, Arun Kumar, Jeffrey Naughton, Jignesh Patel, and Xi Wu
ACM SIGMOD 2019 | Paper PDF | TechReport | Code on GitHub
Demonstration of Nimbus: Model-based Pricing for Machine Learning in a Data Marketplace
Lingjiao Chen, Hongyi Wang, Leshang Chen, Paraschos Koutris, and Arun Kumar
ACM SIGMOD 2019 Demo | Paper PDF | Video coming soon
To Join or Not to Join? Thinking Twice about Joins before Feature Selection
Arun Kumar, Jeffrey Naughton, Jignesh M. Patel, and Xiaojin Zhu
ACM SIGMOD 2016 | Paper PDF | TechReport | Code and Data
Model Selection Management Systems: The Next Frontier of Advanced Analytics
Arun Kumar, Robert McCann, Jeffrey Naughton, and Jignesh M. Patel
ACM SIGMOD Record, Dec 2015 (Vision Track) | Paper PDF
Technical Reports
Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches
Yuhao Zhang, Arun Kumar, Frank McQuillan, Nandish Jayaram, Nikhil Kak, Ekta Khanna, Orhan Kislal, and Domino Valdano
Under submission | TechReport | Code release
Past Projects
Hamlet
Exploiting database schema information to simplify data sourcing.
Nimbus
Enabling the first ML-aware cloud-based commodity market for the new black gold: training data.
SLAB
The first comprehensive benchmark comparison of scalable linear algebra systems.