

On Measuring Faithfulness or Self-consistency of Natural Language Explanations

LLM seminar event about the paper "On Measuring Faithfulness or Self-consistency of Natural Language Explanations" by researchers at Heidelberg University.

Title: On Measuring Faithfulness or Self-consistency of Natural Language Explanations

Presenter: Linli Zhang

Abstract: Large language models (LLMs) can explain their predictions through post-hoc or Chain-of-Thought (CoT) explanations. But an LLM could make up plausible-sounding explanations that are unfaithful to its underlying reasoning. Recent work has designed tests that aim to judge the faithfulness of post-hoc or CoT explanations. In this work the authors argue that these faithfulness tests do not measure faithfulness to the models' inner workings, but rather their self-consistency at the output level. Their contributions are three-fold: i) they clarify the status of faithfulness tests in view of model explainability, characterising them as self-consistency tests instead. They underline this assessment by ii) constructing a Comparative Consistency Bank for self-consistency tests that, for the first time, compares existing tests on a common suite of 11 open LLMs and 5 tasks, including iii) their new self-consistency measure CC-SHAP. CC-SHAP is a fine-grained measure (not a test) of LLM self-consistency: it compares how a model's input contributes to the predicted answer and to generating the explanation. This fine-grained metric allows them to compare LLM behaviour when making predictions and to analyse the effect of other consistency tests at a deeper level, which takes us one step further towards measuring faithfulness by bringing us closer to the internals of the model than strictly surface output-oriented tests do.
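To give a feel for the self-consistency idea described above, here is a minimal, hedged sketch in Python. It is not the paper's implementation: CC-SHAP uses Shapley-value estimates of token contributions, whereas this sketch substitutes a simple leave-one-token-out (occlusion) approximation, and it uses cosine similarity between the two contribution vectors purely as an illustrative agreement score. The helper functions and the toy scoring functions are assumptions for illustration only.

```python
import numpy as np

def token_contributions(score_fn, tokens):
    """Approximate each token's contribution to score_fn by occlusion:
    contribution_i = score(all tokens) - score(tokens without token i).
    (Stand-in for the Shapley-value estimates used by CC-SHAP.)"""
    full = score_fn(tokens)
    contribs = []
    for i in range(len(tokens)):
        ablated = tokens[:i] + tokens[i + 1:]
        contribs.append(full - score_fn(ablated))
    return np.array(contribs)

def consistency_score(answer_score_fn, explanation_score_fn, tokens):
    """Compare input contributions toward the predicted answer with those
    toward the generated explanation; cosine similarity is an illustrative
    choice, not the paper's exact formula."""
    a = token_contributions(answer_score_fn, tokens)
    e = token_contributions(explanation_score_fn, tokens)
    a = a / (np.linalg.norm(a) + 1e-9)
    e = e / (np.linalg.norm(e) + 1e-9)
    return float(np.dot(a, e))  # near 1 => contributions agree (self-consistent)

if __name__ == "__main__":
    # Toy usage with mock scoring functions (assumptions, not real model calls).
    tokens = ["The", "capital", "of", "France", "is"]
    answer_fn = lambda toks: 1.0 if "France" in toks else 0.2        # mock P(answer | input)
    explanation_fn = lambda toks: 0.9 if "capital" in toks else 0.3  # mock P(explanation | input)
    print(round(consistency_score(answer_fn, explanation_fn, tokens), 3))
```

In the paper's setting, the two scoring functions would come from the same LLM, scoring the predicted answer and the generated explanation respectively; a low agreement score signals that the explanation is not self-consistent with the prediction.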

Paper link:

Disclaimer: The presenter is not one of the paper's authors.

LLM seminar