ADA Lab @ UCSD
Note: This umbrella project webpage is now deprecated.
Please see the webpages of the active projects Cerebro and SortingHat.
Project Triptych
Overview
Triptych is an end-to-end model selection management system (MSMS) that aims to simplify
and accelerate the process of sourcing data/features and selecting ML models. Our guiding
principles are to exploit the semantics of the data and the ML task to the extent possible
to reduce work for the data scientist and reduce runtimes and costs. We apply these
principles to remove or mitigate different bottlenecks in this end-to-end process,
eventually unifying these components to yield an integrated “operating system” for ML
analytics tasks. Please refer to the ACM SIGMOD Record paper below for more details of
this vision.
Active Component Projects
Cerebro
Efficient and reproducible model selection on deep learning systems.
Morpheus
Integrating linear algebra and relational algebra to simplify feature engineering for ML.
SortingHat
ML schema inference and automatic data preparation.
Publications
Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches
Yuhao Zhang, Frank McQuillan, Nandish Jayaram, Nikhil Kak, Ekta Khanna, Orhan Kislal, Domino Valdano, and Arun Kumar
VLDB 2021 | Paper PDF | TechReport | Talk video | Code release
Towards A Polyglot Framework for Factorized ML
David Justo, Shaoqing Yi, Lukas Stadler, Nadia Polikarpova, and Arun Kumar
VLDB 2021 (Industrial Track) | Paper PDF | TechReport | Talk video | Code coming soon
The CNN Hip Accelerometer Posture (CHAP) Method for Classifying Sitting Patterns from Hip Accelerometers: A Validation Study
Mikael Anne Greenwood-Hickman, Supun Nakandala, Marta M. Jankowska, Fatima Tuz-Zahra, John Bellettiere, Jordan Carlson, Paul R. Hibbing, Jingjing Zou, Andrea Z. LaCroix, Arun Kumar, and Loki Natarajan
Medicine and Science in Sports and Exercise Journal, 2021 | Paper PDF coming soon | Code
Application of Convolutional Neural Network Algorithms for Advancing Sedentary and Activity Bout Classification
Supun Nakandala, Marta Jankowska, Fatima Tuz-Zahra, John Bellettiere, Jordan Carlson, Andrea LaCroix, Sheri Hartman, Dori Rosenberg, Jingjing Zou, Arun Kumar, and Loki Natarajan
Journal for the Measurement of Physical Behaviour, 2021 | Paper PDF and BibTeX | Code
Cerebro: A Layered Data Platform for Scalable Deep Learning
Arun Kumar, Supun Nakandala, Yuhao Zhang, Side Li, Advitya Gemawat, and Kabir Nagrecha
CIDR 2021 (Vision paper) | Paper PDF and BibTeX | Talk video
Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent
Fengan Li, Lingjiao Chen, Yijing Zeng, Arun Kumar, Jeffrey Naughton, Jignesh Patel, and Xi Wu
ACM SIGMOD 2019 | Paper PDF | TechReport | Code on GitHub
Demonstration of Nimbus: Model-based Pricing for Machine Learning in a Data Marketplace
Lingjiao Chen, Hongyi Wang, Leshang Chen, Paraschos Koutris, and Arun Kumar
ACM SIGMOD 2019 Demo | Paper PDF | Video coming soon
To Join or Not to Join? Thinking Twice about Joins before Feature Selection
Arun Kumar, Jeffrey Naughton, Jignesh M. Patel, and Xiaojin Zhu
ACM SIGMOD 2016 | Paper PDF and BibTeX | TechReport | Code and Data
Model Selection Management Systems: The Next Frontier of Advanced Analytics
Arun Kumar, Robert McCann, Jeffrey Naughton, and Jignesh M. Patel
ACM SIGMOD Record Dec 2015 Vision Track | Paper PDF
Technical Reports
Past Projects
Hamlet
Exploiting database schema information to simplify data sourcing.
Nimbus
Enabling the first ML-aware cloud-based commodity market for the new black gold: training data.
SLAB
The first comprehensive benchmark comparison of scalable linear algebra systems.