Introduction
The increasing use of machine learning (ML) has raised concerns about the ethical and legal implications of deploying models whose behavior is not well understood. This concern has driven the development of Explainable Artificial Intelligence (XAI), which aims to generate explanations for ML models. The paper argues that a common reasoning scheme in the XAI literature is problematic: it links the absence of a desired capability in an ML model (e.g., interpretability, explainability) to a need for XAI algorithms that supply the missing capability, which in turn is supposed to serve a goal such as increased user trust. The authors contend that this scheme is flawed because the capabilities lack precise definitions and their relationship to the stated goals is unclear. To address this issue, the paper adopts a user-centric perspective and examines the questions users might reasonably ask about ML models, using a thought experiment about a spam filter to illustrate them. The authors aim to clarify what XAI algorithms can realistically deliver, focusing on two modest but important capabilities: the interpretation of input attributes and the simplicity of the functions used to translate attributes into interpretable representations. These are more achievable goals for XAI research than the more ambitious capabilities often discussed in the literature. The paper analyzes these capabilities through an interdisciplinary lens combining computer science, philosophy, and technology assessment.
Literature Review
The paper reviews the existing literature on Explainable AI and criticizes the common reasoning scheme that links capabilities such as interpretability or explainability to goals such as trust. The authors point out the lack of consensus on how these capabilities are defined and the unclear relationship between capabilities and goals. They highlight the work of Tomsett et al. (2018) and Zednik (2021), which focuses on the diverse questions posed by different stakeholders interacting with ML applications. The paper also draws on existing critiques of the definitions of interpretability (Lipton, 2018; Krishnan, 2020; Robbins, 2019; Erasmus et al., 2021) and explainability (Arrieta et al., 2020; Fleisher, 2022), emphasizing the diverse and context-dependent meanings of these terms. The authors cite several publications that tie explainability to trust (Ribeiro et al., 2016; Mittelstadt et al., 2019; Gilpin et al., 2018; Molnar, 2020; Erasmus et al., 2021) and argue that this connection is not well defined and that trust can be achieved by other means. Liao et al. (2020) are also mentioned for their work showing that the suitability of an explanation depends on the context and the questions asked. While the paper acknowledges several philosophical accounts of interpretation and explainability (Fleisher, 2022; Erasmus et al., 2021), the authors argue for a more practical, user-centric approach rather than building on these accounts.
Methodology
The paper uses a thought experiment involving a spam filter to explore the questions a user might ask about an ML model. Seven questions are identified, each representing a different aspect of understanding the model's behavior, ranging from asking for the reason behind a specific classification (Q1) to asking about the general principles that distinguish spam from non-spam (Q3 and Q7). The authors then analyze each question to determine whether current XAI algorithms can answer it, noting that the questions carry different levels of ambiguity depending on the user's background (layperson vs. computer scientist). They unpack the components of supervised machine learning, including training data, function types, hyperparameters, and the distinction between interpreted and technical attributes, and show that different questions call for different types of explanations, from explanations of specific predictions to explanations of the model's general mechanisms. They find that current XAI algorithms primarily address one reformulated question, Q*: "How can one represent an ML model as a simple function that uses interpreted attributes?" This central question gives rise to two major challenges for XAI: the approximation challenge (finding simpler models that faithfully represent complex ones) and the translation challenge (mapping technical attributes to human-interpretable ones).
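To make the approximation challenge concrete, here is a minimal sketch of a global-surrogate approach; it is our illustration, not an example from the paper. A shallow decision tree is fitted to the predictions of a more complex classifier standing in for a spam filter, and fidelity is measured as agreement between the two on held-out data. The synthetic data, attribute names, and model choices are assumptions made purely for illustration.

```python
# Illustrative sketch of the approximation challenge (not from the paper):
# approximate a complex classifier with a shallow, simpler surrogate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for spam data: each column is a "technical attribute".
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The complex, hard-to-inspect model.
black_box = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Surrogate: a shallow tree trained to mimic the black box's *predictions*,
# trading fidelity for simplicity (the approximation challenge).
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))

# Fidelity: how often the surrogate agrees with the black box on unseen data.
fidelity = accuracy_score(black_box.predict(X_test), surrogate.predict(X_test))
print(f"surrogate fidelity on held-out data: {fidelity:.2f}")
print(export_text(surrogate, feature_names=[f"attr_{i}" for i in range(X.shape[1])]))
```

A deeper tree would track the black box more closely but would be harder to read; the depth limit is the lever that trades simplicity against fidelity.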
Key Findings
The core finding of the paper is that current XAI algorithms primarily address the question of how to represent a complex ML model as a simpler function that uses interpretable attributes (Q*). This contrasts with the ambiguous terms "interpretability" and "explainability" that dominate the literature and highlights a gap between what XAI actually delivers and the expectations the prevalent terminology creates. Question Q* gives rise to two key challenges for XAI research: approximation and translation. The approximation challenge is to balance simplicity against fidelity when building simpler surrogate models of complex ML models. The translation challenge is to convert the technical attributes an ML model operates on into interpretable attributes that are meaningful to humans; its difficulty varies strongly with the application, from cases where the mapping is straightforward to cases where it is difficult or even impossible to establish. Reviewing existing XAI algorithms, the authors show that many address the approximation challenge, fewer tackle the translation challenge effectively, and holistic methods that adequately address both challenges simultaneously are lacking. They further argue that existing XAI algorithms do not adequately answer questions about the real-world phenomena underlying the models, about how the models were created, about comparisons between different models or between a model and a user's own reasoning, or about universal rules governing those phenomena. Focusing on Q* therefore allows a clearer and more realistic assessment of XAI's capabilities, leading to a more productive research agenda.
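The translation challenge can be illustrated with a toy text classifier; this is our own sketch under assumed data and models, not an example from the paper. Here the technical attributes are TF-IDF columns, and translation amounts to looking up which word each column encodes, which is easy in this setting; for learned embeddings or pixel-level features no such lookup exists, which is why the paper treats the difficulty as application-dependent.

```python
# Illustrative sketch of the translation challenge (not from the paper):
# map technical attributes (TF-IDF column indices) back to interpretable
# attributes (the words they encode) for a simple spam model.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny toy corpus standing in for real email data.
emails = [
    "win a free prize now", "limited offer click here",
    "meeting agenda attached", "lunch tomorrow with the team",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(emails)       # technical attributes: sparse TF-IDF columns
model = LogisticRegression().fit(X, labels)

# Translation step: column index -> word, so the model's weights become readable.
words = vectorizer.get_feature_names_out()
weights = model.coef_[0]
for idx in np.argsort(weights)[::-1][:5]:
    print(f"{words[idx]:>10s}  weight={weights[idx]:+.2f}")
```

This is the easy end of the spectrum: the vectorizer records the attribute-to-word mapping explicitly, so no separate translation method is needed.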
Discussion
The paper's user-centric approach to evaluating XAI algorithms offers a more grounded and realistic perspective than the prevalent approach of linking capabilities to vaguely defined goals. By pinning down the specific question that XAI algorithms currently address, the authors give a clearer picture of their capabilities and limitations, and the approximation and translation challenges provide a concrete framework for future research. Their emphasis on holistic methods that address both challenges simultaneously is crucial. The findings suggest that current XAI methods are better suited to technical questions than to the philosophical or ethical goals, such as trust or fairness, that are often associated with XAI. This distinction matters for managing expectations around XAI and for directing research effort toward realistically achievable objectives. Addressing the approximation and translation challenges would improve the usability and acceptance of XAI methods, contributing to more responsible development and deployment of ML systems.
Conclusion
This paper reveals a disconnect between the aspirational goals and actual capabilities of current XAI algorithms. By focusing on the user-centric question of representing complex ML models as simpler functions using interpretable attributes, the authors highlight the critical challenges of approximation and translation. They advocate for a shift towards a more realistic assessment of XAI capabilities, emphasizing the need for holistic methods that address both challenges. Future research should prioritize developing methods that facilitate effective translation of technical attributes into interpretable ones and that strike an optimal balance between simplicity and fidelity in surrogate models. This will lead to more effective and trustworthy XAI systems.
Limitations
The paper relies primarily on a philosophical analysis of the XAI literature and a thought experiment rather than empirical data. While the thought experiment provides a valuable framework for exploring user questions, it may not capture the full range of real-world user interactions with ML systems. The focus on a single type of question (Q*) may overlook other important aspects of explainability, and the review of existing XAI algorithms is not exhaustive and does not delve into the technical details of individual methods. The paper's conclusions should therefore be read with these limitations in mind.