ADA Lab @ UCSD

Peer-reviewed Publications

  • How do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses
    Vraj Shah, Thomas Parashos, and Arun Kumar
    VLDB 2024 | Paper PDF | TechReport | Code and Data coming soon

  • Low movement, deep-learned sitting patterns, and sedentary behavior in the International Study of Childhood Obesity, Lifestyle, and the Environment (ISCOLE)
    Paul R. Hibbing et al. (12 authors)
    International Journal of Obesity 2023 | Paper PDF

  • Database-Aware ASR Error Correction for Speech-to-SQL Parsing
    Yutong Shao, Arun Kumar, and Ndapandula Nakashole
    IEEE ICASSP 2023 | Paper PDF

  • CHAP-child: An open source method for estimating sit-to-stand transitions and sedentary bout patterns from hip accelerometers among children
    Jordan A. Carlson et al. (15 authors)
    International Journal of Behavioral Nutrition and Physical Activity 2022 | Paper PDF | Code, Models, and Documentation

  • Structured Data Representation in Natural Language Interfaces
    Yutong Shao, Arun Kumar, and Ndapandula Nakashole
    IEEE Data Engineering Bulletin 2022 (Invited) | Paper PDF

  • CHAP-Adult: A Reliable and Valid Algorithm to Classify Sitting and Measure Sitting Patterns Using Data from Hip-Worn Accelerometers in Adults Aged 35+
    John Bellettiere et al. (14 authors)
    Journal for the Measurement of Physical Behaviour 2022 | PDF | Code, Models, and Documentation

  • VLDB Scalable Data Science Category: The Inaugural Year
    Arun Kumar, Alon Halevy, and Nesime Tatbul
    ACM SIGMOD Record 2022 | Paper PDF

  • Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets
    Supun Nakandala and Arun Kumar
    SIGMOD 2022 | Paper PDF | TechReport | Code Release

  • VLDB Panel Summary: “The Future of Data(base) Education: Is the Cow Book Dead?”
    Zachary Ives, Johannes Gehrke, Jana Giceva, Arun Kumar, and Rachel Pottinger
    ACM SIGMOD Record 2021 | Paper PDF

  • Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches
    Yuhao Zhang, Frank McQuillan, Nandish Jayaram, Nikhil Kak, Ekta Khanna, Orhan Kislal, Domino Valdano, and Arun Kumar
    VLDB 2021 | Paper PDF | TechReport | Talk video | Code release

  • Intermittent Human-in-the-Loop Model Selection using Cerebro: A Demonstration
    Liangde Li, Supun Nakandala, and Arun Kumar
    VLDB 2021 Demo | Paper PDF | TechReport | Video

  • Towards A Polyglot Framework for Factorized ML
    David Justo, Shaoqing Yi, Lukas Stadler, Nadia Polikarpova, and Arun Kumar
    VLDB 2021 (Industrial Track) | Paper PDF | TechReport | Talk video | Code coming soon

  • Automation of Data Prep, ML, and Data Science: New Cure or Snake Oil?
    Arun Kumar
    ACM SIGMOD 2021 Panel | Paper PDF

  • The CNN Hip Accelerometer Posture (CHAP) Method for Classifying Sitting Patterns from Hip Accelerometers: A Validation Study
    Mikael Anne Greenwood-Hickman, Supun Nakandala, Marta M. Jankowska, Fatima Tuz-Zahra, John Bellettiere, Jordan Carlson, Paul R. Hibbing, Jingjing Zou, Andrea Z. LaCroix, Arun Kumar, and Loki Natarajan
    Medicine and Science in Sports and Exercise Journal, 2021 | Paper PDF | Code, Models, and Documentation

  • Application of Convolutional Neural Network Algorithms for Advancing Sedentary and Activity Bout Classification
    Supun Nakandala, Marta Jankowska, Fatima Tuz-Zahra, John Bellettiere, Jordan Carlson, Andrea LaCroix, Sheri Hartman, Dori Rosenberg, Jingjing Zou, Arun Kumar, and Loki Natarajan
    Journal for the Measurement of Physical Behaviour, 2021 | Paper PDF and BibTeX | Code, Models, and Documentation

  • Cerebro: A Layered Data Platform for Scalable Deep Learning
    Arun Kumar, Supun Nakandala, Yuhao Zhang, Side Li, Advitya Gemawat, and Kabir Nagrecha
    CIDR 2021 (Vision paper) | Paper PDF and BibTeX | Talk video

  • Understanding and Benchmarking the Impact of GDPR on Database Systems
    Supreeth Shastri, Vinay Banakar, Melissa Wasserman, Arun Kumar, and Vijay Chidambaram
    VLDB 2020 | Paper PDF | TechReport | Webpage | Talk videos: YouTube, Bilibili

  • Query Optimization for Faster Deep CNN Explanations
    Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou
    ACM SIGMOD Record 2020 | Paper PDF and BibTeX
    ACM SIGMOD Research Highlights Award

  • Incremental and Approximate Computations for Accelerating Deep CNN Inference
    Supun Nakandala, Kabir Nagrecha, Arun Kumar, and Yannis Papakonstantinou
    ACM TODS 2020 | Paper PDF and BibTeX
    Invited Paper

  • Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations
    Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou
    ACM SIGMOD 2019 | Paper PDF and BibTeX | TechReport | Blog post | Talk video
    Honorable Mention for Best Paper Award

  • Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra
    Side Li, Lingjiao Chen, and Arun Kumar
    ACM SIGMOD 2019 | Paper PDF and BibTeX | Code and Data on GitHub

  • Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent
    Fengan Li, Lingjiao Chen, Yijing Zeng, Arun Kumar, Jeffrey Naughton, Jignesh Patel, and Xi Wu
    ACM SIGMOD 2019 | Paper PDF | TechReport | Code on GitHub

  • Model-based Pricing for Machine Learning in a Data Marketplace
    Lingjiao Chen, Paraschos Koutris, and Arun Kumar
    ACM SIGMOD 2019 | Paper PDF | TechReport

  • Cerebro: Efficient and Reproducible Model Selection on Deep Learning Systems
    Supun Nakandala, Yuhao Zhang, and Arun Kumar
    ACM SIGMOD 2019 DEEM Workshop | Paper PDF and BibTeX | TechReport | Blog post

  • Demonstration of SpeakQL: Speech-driven Multimodal Querying of Structured Data
    Vraj Shah, Side Li, Kevin Yang, Arun Kumar, and Lawrence Saul
    ACM SIGMOD 2019 Demo | Paper PDF and BibTeX | Video

  • Demonstration of Nimbus: Model-based Pricing for Machine Learning in a Data Marketplace
    Lingjiao Chen, Hongyi Wang, Leshang Chen, Paraschos Koutris, and Arun Kumar
    ACM SIGMOD 2019 Demo | Paper PDF | Video coming soon

  • Demonstration of Krypton: Optimized CNN Inference for Occlusion-based Deep CNN Explanations
    Allen Ordookhanians, Xin Li, Supun Nakandala, and Arun Kumar
    VLDB 2019 | Paper PDF and BibTeX | Video

  • Demonstration of Krypton: Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations
    Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou
    SysML 2019 Demo | Paper PDF | Video

  • Data Management in Machine Learning Systems
    Matthias Boehm, Arun Kumar, and Jun Yang
    Synthesis Lectures on Data Management, Morgan & Claypool Publishers (Book), 2019 | PDF | Order hard copy

  • Hierarchical and Distributed Machine Learning Inference Beyond the Edge
    Anthony Thomas, Yunhui Guo, Yeseong Kim, Baris Aksanli, Arun Kumar, and Tajana Rosing
    IEEE ICNSC 2019 | Paper PDF

  • Predicting Eating Events in Free Living Individuals
    Jiayi Wang, Jiue-An Yang, Supun Nakandala, Arun Kumar, and Marta M. Jankowska
    eScience 2019 Conference (Poster)

  • A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics
    Anthony Thomas and Arun Kumar
    VLDB 2018/2019 | Paper PDF | TechReport | Code and Data

  • In-RDBMS Hardware Acceleration of Advanced Analytics
    Divya Mahajan, Joon Kyung Kim, Jacob Sacks, Adel Ardalan, Arun Kumar, and Hadi Esmaeilzadeh
    VLDB 2018 | Paper PDF | Addendum

  • Materialization Trade-offs for Feature Transfer from Deep CNNs for Multimodal Data Analytics
    Supun Nakandala and Arun Kumar
    SysML 2018 Short paper/poster | Paper PDF

  • Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics
    Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri, Somesh Jha, and Jeffrey Naughton
    ACM SIGMOD 2017 | Paper PDF | TechReport

  • SpeakQL: Towards Speech-driven Multi-modal Querying
    Dharmil Chandarana, Vraj Shah, Arun Kumar, and Lawrence Saul
    ACM SIGMOD 2017 HILDA Workshop | Paper PDF and BibTeX

  • Model-based Pricing: Do Not Pay for More than What You Learn!
    Lingjiao Chen, Paraschos Koutris, and Arun Kumar
    ACM SIGMOD 2017 DEEM Workshop | Paper PDF

  • Cerebro: A System to Manage Deep Learning for Relational Data Analytics
    Arun Kumar
    CIDR 2017 Abstract | Paper PDF

  • To Join or Not to Join? Thinking Twice about Joins before Feature Selection
    Arun Kumar, Jeffrey Naughton, Jignesh M. Patel, and Xiaojin Zhu
    ACM SIGMOD 2016 | Paper PDF and BibTeX | TechReport | Code and Data

  • Materialization Optimizations for Feature Selection Workloads
    Ce Zhang, Arun Kumar, and Christopher Ré
    ACM TODS 2016 (Invited) | Paper PDF

  • Model Selection Management Systems: The Next Frontier of Advanced Analytics
    Arun Kumar, Robert McCann, Jeffrey Naughton, and Jignesh M. Patel
    ACM SIGMOD Record Dec 2015 Vision Track | Paper PDF

  • Demonstration of Santoku: Optimizing Machine Learning over Normalized Data
    Arun Kumar, Mona Jalal, Boqun Yan, Jeffrey Naughton, and Jignesh M. Patel
    VLDB 2015 Demo | Paper PDF | Code and Data

  • Learning Generalized Linear Models Over Normalized Data
    Arun Kumar, Jeffrey Naughton, and Jignesh M. Patel
    ACM SIGMOD 2015 | Paper PDF | Code and Data

  • Materialization Optimizations for Feature Selection Workloads
    Ce Zhang, Arun Kumar, and Christopher Ré
    ACM SIGMOD 2014 | Paper PDF
    Best Paper Award; Invited to ACM TODS 2016

  • Distributed and Scalable PCA in the Cloud
    Arun Kumar, Nikos Karampatziakis, Paul Mineiro, Markus Weimer, and Vijay Narayanan
    NIPS BigLearn 2013 | Paper PDF

  • Feature Selection in Enterprise Analytics: A Demonstration using an R-based Data Analytics System
    Pradap Konda, Arun Kumar, Christopher Ré, and Vaishnavi Sashikanth
    VLDB 2013 Demo | Paper PDF

  • Hazy: Making it Easier to Build and Maintain Big-data Analytics
    Arun Kumar, Feng Niu, and Christopher Ré
    ACM Queue 2013 | Article
    Invited to the Communications of the ACM March 2013

  • Brainwash: A Data System for Feature Engineering
    Michael Anderson, Dolan Antenucci, Victor Bittorf, Matthew Burgess, Michael Cafarella, Arun Kumar, Feng Niu, Yongjoo Park, Christopher Ré, and Ce Zhang
    CIDR 2013 Vision Track | Paper PDF

  • Towards a Unified Architecture for in-RDBMS Analytics
    Xixuan Feng*, Arun Kumar*, Benjamin Recht, and Christopher Ré
    ACM SIGMOD 2012 | Paper PDF | TechReport | Code and Data

  • The MADlib Analytics Library or MAD Skills, the SQL
    Joseph M. Hellerstein, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar
    VLDB 2012 Industrial Track | Paper PDF

Manuscripts and Articles

  • Arun Kumar's contribution to “Reminiscences on Influential Papers”
    Pinar Tozun
    ACM SIGMOD Record 2023 | Article PDF

  • Design and Evaluation of an SQL-Based Dialect for Spoken Querying
    Kyle Luoma and Arun Kumar
    TechReport

  • Hydra: A Data System for Large Multi-Model Deep Learning
    Kabir Nagrecha and Arun Kumar
    TechReport | Code release

  • Letter from the Rising Star Award Winner
    Arun Kumar
    IEEE Data Engineering Bulletin, June 2021 | PDF

  • Improving Feature Type Inference Accuracy of TFDV with SortingHat
    Vraj Shah, Kevin Yang, and Arun Kumar
    TechReport

  • ML/AI Systems and Applications: Is the SIGMOD/VLDB Community Losing Relevance?
    Arun Kumar
    Blog post on the official ACM SIGMOD Blog, 2018 | Webpage

  • Advice from PhD to Early Career
    Arun Kumar
    ACM SIGMOD 2018 New Researcher Symposium Talk | Slides

  • A Survey of the Existing Landscape of ML Systems
    Arun Kumar, Robert McCann, Jeffrey Naughton, and Jignesh M. Patel
    UW-Madison Technical Report TR1827 | PDF

Theses and Dissertations

  • Simplifying Data Preparation for Machine Learning on Tabular Data
    Vraj Shah. PhD Dissertation. UC San Diego. 2022 | PDF

  • Query Optimizations for Deep Learning Systems
    Supun Nakandala. PhD Dissertation. UC San Diego. 2022 | PDF

  • Efficient Systems for Advanced Data Analytics
    Liangde Li. MS Thesis. UC San Diego. 2022 | PDF

  • Write once, rewrite everywhere: A Unified Framework for Factorized Machine Learning
    David Justo. MS Thesis. UC San Diego. 2019 | PDF

  • Learning Over Joins
    Arun Kumar. PhD Dissertation. UW-Madison. 2016 | PDF | Video of job talk at UCSD
    Wisconsin CS 2016 Graduate Student Research Award for best dissertation research