Data Quality meets Machine Learning and Knowledge Graphs

DQMLKG Workshop at ESWC 2024, May 26th




DQMLKG - Bridging Precision with Intelligence


Machine Learning (ML) and Knowledge Graphs (KGs) have a symbiotic relationship, each with the potential to enhance the other. ML can play a pivotal role in the construction of KGs by automating ontology design, inferring classes and relations from alternative sources, and helping data curators make informed decisions. ML algorithms, in turn, need data in a "certain form": the data must be prepared, and its quality ensured, before it is used as input. Poor-quality datasets can compromise ML systems in several ways. The four data quality categories (intrinsic, contextual, representational, and accessibility) are relevant to different stages of the ML development pipeline. For example, the intrinsic category includes accuracy, completeness, and consistency, while the contextual category includes bias, relevance, and validity. Conversely, KGs can enrich ML models through node or graph embedding techniques and link prediction, supporting explainability and improving the overall performance of data-driven models. Low-quality KGs, i.e., those not fit for use, may lead to biased or inaccurate ML models, hindering their ability to generate meaningful insights or make informed decisions. The growing range of ML data management guidelines, frameworks, and standards presents practitioners with a vast set of possible criteria to aspire to, on top of the traditional data management practices established in previous decades.

This workshop aims to explore the intricate interplay of data quality, ML, and KGs: elucidating limitations in assessment methodologies, proposing effective methods for objective quality assessment, addressing challenges for ML and AI in general, verifying whether and to what extent well-known quality metrics are compliant with ML-based quality assessment, and addressing the FAIR principles. We also welcome proposals on Explainable AI, Large Language Models, Generative AI, and any AI-driven approach that can be applied to Semantic Web technologies to support and enhance data quality assessment and improvement.



Topics of interest

New approaches for performing Data Quality assessment or improvement of Knowledge Graphs via Machine Learning:

  • Quality assessment over time
  • Scalability issues
  • Proactive approaches able to improve KG quality during the data authoring stage
  • Reactive approaches to improve KG quality before the data exploitation stage
  • Large Language Models to deal with KG quality issues
  • Generative Artificial Intelligence (AI) to cope with KG quality issues
  • AI-driven approaches to assess and improve data quality issues over KGs

Applications combining Machine Learning and Knowledge Graphs dealing with Data Quality concerns:

  • Recommender Systems leveraging (incomplete) Knowledge Graphs
  • Link Prediction and completing KGs
  • Ontology Learning and Matching coping with KG consistency and accuracy
  • Question Answering exploiting Knowledge Graphs and Machine Learning dealing with representational issues
  • Domain Specific KGs quality issues

Submission details

Submissions can fall in one of the following categories:

  • Full research papers (up to 15 pages, excluding references)
  • Short research papers (up to 8 pages, excluding references)

We welcome contributions presenting

  • success stories,
  • negative results,
  • reviews of the state of the art,
  • position papers critically discussing what is missing in this alliance, i.e., data quality, ML and KG.

Papers must comply with the CEUR-WS template and must be submitted in PDF format via the workshop’s OpenReview submission page.

Accepted papers (after blind review by at least two experts) will be published by CEUR-WS.

At least one author of each accepted paper must register for the workshop (pre-conference only option) for the paper to be included in the workshop proceedings. Information about registration can be found on the ESWC 2024 official page.


Important dates

  • Paper submission deadline: March 11, 2024 (extended from February 26, 2024)
  • Notification of Acceptance: March 28, 2024
  • Camera-ready paper due: April 18, 2024
  • ESWC 2024 Workshop days: May 26, 2024

Program details and Keynote

  • 09:00 - 09:05 AM > Welcome Session
  • 09:05 - 10:05 AM > KEYNOTE by Elena Simperl
Elena Simperl is a Professor of Computer Science at King’s College London and the Director of Research for the Open Data Institute (ODI). She is a Fellow of the British Computer Society and the Royal Society of Arts, and a Hans Fischer Senior Fellow. Elena’s work is at the intersection of AI and social computing. She features in the top 100 most influential scholars in knowledge engineering of the last decade and in the Women in AI 2000 ranking. She is the president of the Semantic Web Science Association.

Title: When stars align: studies in data quality, knowledge graphs, and machine learning

Abstract: In this talk I will present several projects that tease out the intricate relationship between these three fields of research to produce better AI datasets and, with that, better AI models and downstream applications. I will start with work in knowledge graphs, which are machine-readable structured data representations, organised for general-purpose use. Besides data integration, they are extensively used in search engines, recommender systems, virtual assistants and other AI contexts, commonly as a source of domain knowledge and explanations. I propose sociotechnical methods, drawing on machine learning and other techniques, to understand biases, improve the quality, and increase people’s trust in knowledge graphs. Then I will move to ongoing work on assuring any type of AI dataset, using semantic technologies. I will introduce the data-centric AI programme at the Open Data Institute and deep dive into Croissant, a schema.org-based vocabulary to describe AI datasets to improve their quality and reuse.

Slides: Slides are available online at the following link

  • 10:05 - 10:30 AM > Stefani Tsaneva, Stefan Vasic, Marta Sabou. “LLM-driven Ontology Evaluation: Verifying Ontology Restrictions with ChatGPT” - paper - slides
  • 10:30 - 11:00 AM > Coffee Break
  • 11:00 - 11:25 AM > Jose Emilio Labra Gayo. “Extending Shape Expressions for different types of knowledge graphs” - paper - slides
  • 11:25 - 11:40 AM > Gabriele Tuozzo. “Moving from Tabular Knowledge Graph Quality Assessment to RDF Triples Leveraging ChatGPT” - paper - slides
  • 11:40 - 11:55 AM > Pasquale Esposito. “The Linguistic Linked Open Data through the Linguists’ Lens” - paper - slides
  • 12:00 - 12:30 PM > Panel (Elena, Anastasia, Heiko, Paul)

Organizers

  • Maria Angela Pellegrino, University of Salerno, Italy
  • Anisa Rula, University of Brescia, Italy
  • Jose Emilio Labra Gayo, University of Oviedo, Spain
  • Michael Cochez, Vrije Universiteit Amsterdam, the Netherlands
  • Mehwish Alam, Institut Polytechnique de Paris, France

Program committee

  • Cinzia Cappiello, Polytechnic of Milan, Italy
  • Jeremy Debattista, Trinity College Dublin, Ireland
  • Anastasia Dimou, Katholieke Universiteit Leuven, Belgium
  • Paul Groth, University of Amsterdam, the Netherlands
  • Antonio Lieto, University of Salerno, Italy
  • Ernesto Jiménez-Ruiz, University of London, England
  • Blerina Spahiu, University of Milan, Italy