Result: Data Quality Checking for Machine Learning with MeSQuaL

Title:
Data Quality Checking for Machine Learning with MeSQuaL
Contributors:
Data Integration, Analysis, and Management as Services (DIAMS), Laboratoire d'Informatique et des Systèmes (LIS) (Marseille, Toulon), Aix Marseille Université (AMU)-Université de Toulon (UTLN)-Centre National de la Recherche Scientifique (CNRS), UMR 228 Espace-Dev, Espace pour le développement, Institut de Recherche pour le Développement (IRD)-Université de Perpignan Via Domitia (UPVD)-Avignon Université (AU)-Université de La Réunion (UR)-Université de Montpellier (UM)-Université de Guyane (UG)-Université des Antilles (UA), ANR-18-CE23-0002, QualiHealth: Amélioration de la Qualité des Données de Soins (2018)
Source:
Advances in Database Technology - EDBT 2020, 23rd International Conference on Extending Database Technology, Copenhagen, Denmark, March 30 - April 02, 2020, Proceedings ; https://hal.science/hal-02865824
Publisher Information:
CCSD
Publication Year:
2020
Document Type:
Conference object
Language:
English
Relation:
IRD: fdi:010078830
Rights:
info:eu-repo/semantics/OpenAccess
Accession Number:
edsbas.6AA4CDCE
Database:
BASE

Further Information

International audience ; This demo proposes MeSQuaL, a system for profiling and checking data quality before downstream tasks such as data analytics and machine learning. MeSQuaL extends SQL for querying relational data with constraints on data quality and facilitates the verification of statistical tests. The system includes: (1) a query interpreter for SQuaL, the SQL-extended language we propose for declaring and querying data with data quality checks and statistical tests; (2) an extensible library of user-defined functions for profiling the data and computing various data quality indicators; and (3) a user interface for declaring data quality constraints, profiling data, monitoring data quality with SQuaL queries, and visualizing the results via data quality dashboards. We showcase our system in action with various scenarios on real-world datasets and show its usability for monitoring data quality over time and checking the quality of data on demand.
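The abstract does not show SQuaL syntax, so the sketch below does not attempt to reproduce it. It only illustrates the general idea of a user-defined function that computes a data quality indicator and is then checked against a constraint from plain SQL. It uses Python's standard sqlite3 module; the `completeness` aggregate, the `check_completeness` helper, the `measurements` table, and the 0.9 threshold are all hypothetical names and values chosen for illustration, not part of MeSQuaL.

```python
import sqlite3


class Completeness:
    """Aggregate UDF (hypothetical): fraction of non-NULL values in a column."""

    def __init__(self):
        self.total = 0
        self.non_null = 0

    def step(self, value):
        # Called once per row; SQL NULL arrives as Python None.
        self.total += 1
        if value is not None:
            self.non_null += 1

    def finalize(self):
        # Return NULL for an empty input instead of dividing by zero.
        return self.non_null / self.total if self.total else None


def check_completeness(conn, table, column, threshold):
    """Return (score, passed) for a completeness constraint on one column.

    Identifiers are interpolated into the query, so pass trusted names only.
    """
    query = f'SELECT completeness("{column}") FROM "{table}"'
    score = conn.execute(query).fetchone()[0]
    return score, (score is not None and score >= threshold)


conn = sqlite3.connect(":memory:")
conn.create_aggregate("completeness", 1, Completeness)

# Illustrative data: one of four values is missing.
conn.execute("CREATE TABLE measurements (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(1, 34), (2, None), (3, 51), (4, 47)],
)

score, ok = check_completeness(conn, "measurements", "value", 0.9)
print(score, ok)  # prints: 0.75 False
```

Registering the indicator as an SQL aggregate keeps the check inside the query engine, which is one plausible way to make such a library extensible: a new indicator is added by registering another function, without changing the queries' host code.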