Project Nimbus


While much work has focused on improving the efficiency, scalability, and usability of machine learning (ML), little has studied the cost of data acquisition for advanced data analytics. Datasets are already bought and sold in marketplaces, typically cloud-based, for various tasks, including training ML models. But current data marketplaces force users to buy such datasets whole or as fixed subsets, without any awareness of the ML tasks they will be used for. This leads to sub-optimal choices and missed opportunities for both data sellers and buyers.

In this project, we envision and prototype the first principled and practical pricing framework that resolves these issues by devising ML-aware pricing mechanisms. We call our approach model-based pricing. Our key observation is that for most ML users, the data is only a means to an end: meeting their accuracy goals. This gives rise to novel trade-offs among price, accuracy, and runtime. We study and optimize these trade-offs by combining ideas from ML, data management, and microeconomics.

Downloads (Paper, Code, Data, etc.)

  • Model-based Pricing for Machine Learning in a Data Marketplace
    Lingjiao Chen, Paraschos Koutris, and Arun Kumar
    ACM SIGMOD 2019 (To Appear) | TechReport

  • Model-based Pricing: Do Not Pay for More than What You Learn!
    Lingjiao Chen, Paraschos Koutris, and Arun Kumar
    ACM SIGMOD 2017 DEEM Workshop | Paper PDF

Student Contact

Lingjiao Chen: lchen1 [at] wisc [dot] edu