Automatable Data Quality Dimensions for Data Exchange: Formulation and Application

Authors: Debarun Sengupta, Anjula Gurtoo, Minnu Malieckal, Jyotirmoy Dutta

Large volumes of data are generated and used in decision making to improve outcomes. However, data quality remains a concern because data come from varied sources, arrive in unspecified formats, and carry variables that differ across data types. Identifying key quality dimensions that allow standardized treatment is therefore a challenge. The present study addresses this problem by identifying and defining quantifiable and automatable quality dimensions that can be integrated into any data exchange scenario. The proposed dimensions are intrinsic to the data, namely completeness, accuracy – correctness, accuracy – precision, timeliness, and uniqueness. The methodology follows a deterministic framework for continuous quality assessment that flags laggard dimensions of incoming datasets. As a second step, the study tests the proposed dimensions empirically on multivariate datasets, measuring task-independent quality aspects of each dataset.
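To illustrate what a quantifiable, automatable dimension could look like, the following is a minimal Python sketch of two of the named dimensions (completeness and uniqueness) and of flagging laggard dimensions. The formulas, the function names, the sample rows, and the 0.8 threshold are all assumptions made for illustration; the abstract does not give the study's actual definitions.

```python
from typing import Dict, List, Optional, Sequence

# A row is a sequence of cell values; None marks a missing value (assumption).
Row = Sequence[Optional[object]]


def completeness(rows: List[Row]) -> float:
    """Share of cells that are not missing (assumed definition)."""
    total = sum(len(r) for r in rows)
    if total == 0:
        return 1.0
    filled = sum(1 for r in rows for v in r if v is not None)
    return filled / total


def uniqueness(rows: List[Row]) -> float:
    """Share of rows that are distinct from one another (assumed definition)."""
    if not rows:
        return 1.0
    return len({tuple(r) for r in rows}) / len(rows)


def flag_laggards(scores: Dict[str, float], threshold: float = 0.9) -> List[str]:
    """Names of dimensions scoring below a hypothetical threshold."""
    return sorted(name for name, score in scores.items() if score < threshold)


# Hypothetical incoming dataset: (sensor id, reading, timestamp).
rows = [
    ("s1", 21.5, "2023-01-01T00:00"),
    ("s2", None, "2023-01-01T00:00"),
    ("s1", 21.5, "2023-01-01T00:00"),  # exact duplicate of the first row
    ("s3", 19.0, None),
]

scores = {
    "completeness": completeness(rows),
    "uniqueness": uniqueness(rows),
}
laggards = flag_laggards(scores, threshold=0.8)
```

Both scores lie in [0, 1] and depend only on the dataset itself, which is what makes this kind of check task-independent and easy to run on every incoming batch. With the sample rows above and a threshold of 0.8, only uniqueness is flagged.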

Journal/Conference

Annals of Data Science