ADA Lab @ UCSD
Overview
While a lot of work has focused on improving the efficiency, scalability, and usability of machine learning (ML), little work has studied the cost of data acquisition for advanced data analytics. Datasets are already being bought and sold in marketplaces, typically cloud-based, for various tasks, including training ML models. But current data marketplaces force users to buy such datasets in whole or as fixed subsets without any awareness of the ML tasks they are used for. This leads to sub-optimal choices and missed opportunities for both data sellers and buyers. In this project, we envision and prototype the first principled and practical pricing framework that resolves the above issues by devising ML-aware pricing mechanisms. We call our approach model-based pricing. Our key observation is that for most ML users, the data is only a means to an end to meet their accuracy goals. This gives rise to novel trade-offs between price, accuracy, and runtimes. We study and optimize these trade-offs by combining ideas from ML, data management, and micro-economics.
Downloads (Paper, Code, Data, etc.)
Student Contact
Lingjiao Chen: lchen1 [at] wisc [dot] edu