1 code implementation • 19 Apr 2024 • Sibei Chen, Yeye He, Weiwei Cui, Ju Fan, Song Ge, Haidong Zhang, Dongmei Zhang, Surajit Chaudhuri
Spreadsheets are widely recognized as the most popular end-user programming tools, blending the power of formula-based computation with an intuitive table-based interface.
no code implementations • 13 Oct 2023 • Peng Li, Yeye He, Dror Yashar, Weiwei Cui, Song Ge, Haidong Zhang, Danielle Rifinski Fainman, Dongmei Zhang, Surajit Chaudhuri
Language models, such as GPT-3.5 and ChatGPT, demonstrate remarkable abilities to follow diverse human instructions and perform a wide range of tasks.
1 code implementation • 27 Jul 2023 • Peng Li, Yeye He, Cong Yan, Yue Wang, Surajit Chaudhuri
Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, have been the standard for tables in relational databases.
no code implementations • 4 Jun 2023 • Dezhan Tu, Yeye He, Weiwei Cui, Song Ge, Haidong Zhang, Han Shi, Dongmei Zhang, Surajit Chaudhuri
Data pipelines are widely employed in modern enterprises to power a variety of Machine-Learning (ML) and Business-Intelligence (BI) applications.
no code implementations • 13 Nov 2022 • Renzhi Wu, Alexander Bendeck, Xu Chu, Yeye He
We also show that a deep learning EM end model (DeepMatcher) trained on labels generated from our weak supervision approach is comparable to an end model trained using tens of thousands of ground-truth labels, demonstrating that our approach can significantly reduce the labeling efforts required in EM.
no code implementations • 11 Dec 2021 • Yeye He, Jie Song, Yue Wang, Surajit Chaudhuri, Vishal Anil, Blake Lassiter, Yaron Goland, Gaurav Malhotra
As data lakes become increasingly popular in large enterprises today, there is a growing need to tag or classify data assets (e.g., files and databases) in data lakes with additional metadata (e.g., semantic column-types), as the inferred metadata can enable a range of downstream applications such as data governance (e.g., GDPR compliance) and dataset search.
1 code implementation • 25 Jun 2021 • Junwen Yang, Yeye He, Surajit Chaudhuri
In this work, we propose to automate multiple such steps end-to-end by synthesizing complex data pipelines with both string transformations and table-manipulation operators.
no code implementations • 21 Jun 2021 • Renzhi Wu, Prem Sakala, Peng Li, Xu Chu, Yeye He
Panda's IDE includes many novel features purpose-built for EM, such as smart data sampling, a built-in library of EM utility functions, automatically generated LFs, visual debugging of LFs, and an EM-specific labeling model.
no code implementations • 10 Apr 2021 • Jie Song, Yeye He
Complex data pipelines are increasingly common in diverse applications such as BI reporting and ML modeling.