Data Preparation

Raw data are often messy: they follow different encodings, records are not well structured, values do not adhere to patterns, and so on. Such data are generally not fit to be ingested by downstream applications, such as data analytics tools, or even by data management systems. Obtaining information from raw data therefore relies on some data preparation process. Data preparation is integral to advanced data analysis and data management, not only for data science but for any data-driven application. Existing data preparation tools are operational and useful, but there is still room for improvement and optimization. With growing data volumes and their messy nature, the demand for prepared data increases day by day.

To cater to this demand, companies and researchers are developing techniques and tools for data preparation. To better understand the available data preparation systems, we conducted a survey to investigate (1) prominent data preparation tools, (2) distinctive tool features, (3) the need for preliminary data processing even for these tools, and (4) features and abilities that are still lacking. We conclude with an argument in support of automatic and intelligent data preparation beyond traditional and simplistic techniques.
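To make the abstract's examples of messy raw data (mixed encodings, poorly structured records, values that do not follow a pattern) concrete, here is a minimal Python sketch of one simple preparation step. It is not taken from the survey or from any surveyed tool. The UTF-8 with Latin-1 fallback, the comma delimiter, the three-field record layout, and the ISO date pattern are all assumptions chosen for illustration, and `prepare` and `decode_line` are hypothetical helper names.

```python
import re

# Assumed target pattern for the third field: an ISO 8601 date (YYYY-MM-DD).
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def decode_line(raw: bytes) -> str:
    """Decode a raw line, trying UTF-8 first and falling back to Latin-1 (assumed encodings)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this fallback never raises.
        return raw.decode("latin-1")


def prepare(raw_lines, delimiter=",", expected_fields=3):
    """Split raw lines into records; separate well-formed records from rejected ones."""
    clean, rejected = [], []
    for raw in raw_lines:
        text = decode_line(raw).strip()
        if not text:
            continue  # skip empty lines
        fields = [f.strip() for f in text.split(delimiter)]
        # Reject records with the wrong number of fields or a date off the assumed pattern.
        if len(fields) != expected_fields or not DATE_PATTERN.fullmatch(fields[2]):
            rejected.append(text)
        else:
            clean.append(fields)
    return clean, rejected


if __name__ == "__main__":
    raw = [
        b"1,M\xc3\xbcller,2020-12-17\n",  # UTF-8 encoded
        b"2,M\xfcller,2020-12-17\n",      # Latin-1 encoded
        b"3,Smith,17.12.2020\n",          # date does not match the pattern
        b"4,Lee\n",                       # missing field
        b"\n",                            # empty line
    ]
    clean, rejected = prepare(raw)
    print(clean)
    print(rejected)
```

Both encodings of "Müller" end up as the same string, while the record with a non-ISO date and the record with a missing field are set aside rather than silently passed downstream. Real preparation tools go much further (encoding detection, schema inference, pattern discovery); this sketch only shows the shape of the problem.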

Metadata
Authors: Mazhar Hameed, Felix Naumann
DOI: https://doi.org/10.1145/3444831.3444835
ISSN: 0163-5808
ISSN: 1943-5835
Title of parent work (English): SIGMOD Record
Subtitle (English): a survey of commercial tools
Publisher: Association for Computing Machinery
Place of publishing: New York
Publication type: Article
Language: English
Date of first publication: 2020/12/17
Publication year: 2020
Release date: 2023/01/06
Tags: data cleaning; data quality; data wrangling
Volume: 49
Issue: 3
Number of pages: 12
First page: 18
Last page: 29
Funding institution: HPI research school on Data Science and Engineering
Organizational units: An-Institute / Hasso-Plattner-Institut für Digital Engineering gGmbH
DDC classification: 0 Computer science, information science, general works / 00 Computer science, knowledge, systems / 000 Computer science, information science, general works
Peer review: Refereed