Advanced Data Analytics (ADA) Lab

Peer-reviewed Publications

  • Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers?
    Vraj Shah, Arun Kumar, and Xiaojin Zhu
    VLDB 2018 (To appear) | Paper PDF (Coming soon) | TechReport | Code and Data

  • Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics
    Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri, Somesh Jha, and Jeffrey Naughton
    ACM SIGMOD 2017 | Paper PDF | TechReport

  • SpeakQL: Towards Speech-driven Multi-modal Querying
    Dharmil Chandarana, Vraj Shah, Arun Kumar, and Lawrence Saul
    ACM SIGMOD 2017 HILDA Workshop | Paper PDF

  • Model-based Pricing: Do Not Pay for More than What You Learn!
    Lingjiao Chen, Paraschos Koutris, and Arun Kumar
    ACM SIGMOD 2017 DEEM Workshop | Paper PDF

  • Cerebro: A System to Manage Deep Learning for Relational Data Analytics
    Arun Kumar
    CIDR 2017 (Abstract) | Paper PDF

  • To Join or Not to Join? Thinking Twice about Joins before Feature Selection
    Arun Kumar, Jeffrey Naughton, Jignesh M. Patel, and Xiaojin Zhu
    ACM SIGMOD 2016 | Paper PDF | TechReport | Code and Data

  • Model Selection Management Systems: The Next Frontier of Advanced Analytics
    Arun Kumar, Robert McCann, Jeffrey Naughton, and Jignesh M. Patel
    ACM SIGMOD Record Dec 2015 (Vision Track) | Paper PDF

  • Demonstration of Santoku: Optimizing Machine Learning over Normalized Data
    Arun Kumar, Mona Jalal, Boqun Yan, Jeffrey Naughton, and Jignesh M. Patel
    VLDB 2015 (Demo) | Paper PDF | Code and Data

  • Learning Generalized Linear Models Over Normalized Data
    Arun Kumar, Jeffrey Naughton, and Jignesh M. Patel
    ACM SIGMOD 2015 | Paper PDF | Code and Data

  • Materialization Optimizations for Feature Selection Workloads
    Ce Zhang, Arun Kumar, and Christopher Ré
    ACM SIGMOD 2014 | Paper PDF
    Best Paper Award; Invited to ACM TODS 2016

  • Distributed and Scalable PCA in the Cloud
    Arun Kumar, Nikos Karampatziakis, Paul Mineiro, Markus Weimer, and Vijay Narayanan
    NIPS BigLearn 2013 | Paper PDF

  • Feature Selection in Enterprise Analytics: A Demonstration using an R-based Data Analytics System
    Pradap Konda, Arun Kumar, Christopher Ré, and Vaishnavi Sashikanth
    VLDB 2013 (Demo) | Paper PDF

  • Hazy: Making it Easier to Build and Maintain Big-data Analytics
    Arun Kumar, Feng Niu, and Christopher Ré
    ACM Queue 2013 | Article
    Invited to Communications of the ACM, March 2013

  • Brainwash: A Data System for Feature Engineering
    Michael Anderson, Dolan Antenucci, Victor Bittorf, Matthew Burgess, Michael Cafarella, Arun Kumar, Feng Niu, Yongjoo Park, Christopher Ré, and Ce Zhang
    CIDR 2013 (Vision Track) | Paper PDF

  • Towards a Unified Architecture for in-RDBMS Analytics
    Xixuan Feng*, Arun Kumar*, Benjamin Recht, and Christopher Ré
    ACM SIGMOD 2012 | Paper PDF | TechReport | Code and Data

  • The MADlib Analytics Library or MAD Skills, the SQL
    Joseph M. Hellerstein, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar
    VLDB 2012 (Industrial Track) | Paper PDF

Manuscripts and Dissertations

  • Learning Over Joins
    PhD Dissertation. UW-Madison 2016 | PDF | Video of job talk at UCSD
    Wisconsin CS 2016 Graduate Student Research Award for best dissertation research

  • A Survey of the Existing Landscape of ML Systems
    Arun Kumar, Robert McCann, Jeffrey Naughton, and Jignesh M. Patel
    UW-Madison Technical Report TR1827 | PDF