Foundational Research
Core methodological work that underpins our understanding of how and why AI systems make decisions.
Feature Attribution Methods
Understanding which input features drive model predictions is fundamental to explainability. We study methods such as SHAP and LIME, investigating their faithfulness, stability, and alignment with human intuition across model architectures and application domains. A short code sketch follows the list below.
- Local vs. global explanation trade-offs
- Attribution faithfulness and consistency metrics
- Interaction effects in high-dimensional feature spaces
- User studies on explanation comprehension
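As a concrete illustration, here is a minimal sketch of local and global attribution with the `shap` package, assuming a scikit-learn tree ensemble; the dataset and model are illustrative choices, not a prescribed setup.

```python
# Minimal sketch: local and global SHAP attributions for a tree ensemble.
# Assumes the `shap` package; dataset and model are illustrative choices.
import numpy as np
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:50])  # shape: (50, n_features)

# Local explanation: per-feature contributions to one prediction.
print(dict(zip(X.columns, shap_values[0].round(2))))

# Global summary: mean |attribution| per feature across instances.
print(dict(zip(X.columns, np.abs(shap_values).mean(axis=0).round(2))))
```

Comparing the two printouts makes the local/global trade-off concrete: a feature that dominates one prediction need not rank highly on average across the dataset.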
Counterfactual Explanations
Counterfactual explanations answer the question: “What would need to change for the outcome to be different?” We research methods for generating actionable, sparse, and plausible counterfactuals that respect causal structure and domain constraints. A hand-rolled search sketch follows the list below.
- Actionable recourse in automated decisions
- Causal counterfactual generation
- Plausibility constraints and feasibility
- Comparative evaluation of counterfactual methods
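The sketch below is a deliberately simple, hand-rolled greedy search rather than any particular published method or library API: it flips a classifier's prediction while changing as few features as possible, using distance from the original value as a crude plausibility proxy. The synthetic data and all names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

def sparse_counterfactual(model, x, target, grids, max_changes=3):
    """Greedily change one feature at a time until the prediction flips to
    `target`, scoring candidate moves by target-class probability minus a
    small distance penalty (a crude plausibility proxy)."""
    x_cf = x.copy()
    changed = set()
    for _ in range(max_changes):
        if model.predict(x_cf[None])[0] == target:
            break
        best = None
        for j, grid in enumerate(grids):
            if j in changed:
                continue
            for v in grid:
                cand = x_cf.copy()
                cand[j] = v
                score = model.predict_proba(cand[None])[0, target] - 0.05 * abs(v - x[j])
                if best is None or score > best[0]:
                    best = (score, j, v)
        _, j, v = best
        x_cf[j] = v
        changed.add(j)
    return x_cf, sorted(changed)

x = X[0]
grids = [np.linspace(X[:, j].min(), X[:, j].max(), 9) for j in range(X.shape[1])]
target = 1 - model.predict(x[None])[0]
x_cf, changed = sparse_counterfactual(model, x, target, grids)
print("prediction:", model.predict(x[None])[0], "->", model.predict(x_cf[None])[0])
print("features changed:", changed)
```

Real recourse methods replace the grid and the ad hoc cost term with causal and feasibility constraints, which is exactly where the research questions above live.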
Concept-Based Explanations
Rather than explaining predictions in terms of raw features, concept-based methods use human-understandable concepts as the unit of explanation. We study concept bottleneck models, concept activation vectors, and methods for discovering latent concepts aligned with human reasoning. A minimal TCAV sketch follows the list below.
- Concept bottleneck architectures
- Testing with Concept Activation Vectors (TCAV)
- Automated concept discovery
- Concept completeness and fidelity
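To make the TCAV idea concrete, here is a minimal sketch of its two core steps, assuming layer activations have already been extracted; the random arrays stand in for real activations and gradients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for layer activations of concept vs. random example sets.
concept_acts = rng.normal(1.0, 1.0, size=(100, 64))
random_acts = rng.normal(0.0, 1.0, size=(100, 64))

# Step 1: fit a linear probe separating the two sets; the normalized
# weight vector is the Concept Activation Vector (CAV).
probe = LogisticRegression(max_iter=1000).fit(
    np.vstack([concept_acts, random_acts]),
    np.concatenate([np.ones(100), np.zeros(100)]),
)
cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])

# Step 2: TCAV score = fraction of class examples whose target-class logit
# increases along the CAV direction (positive directional derivative).
grads = rng.normal(0.1, 1.0, size=(100, 64))  # stand-in for d(logit)/d(activation)
tcav_score = float(np.mean(grads @ cav > 0))
print(f"TCAV score: {tcav_score:.2f}")
```

In a real pipeline, the gradients come from backpropagating the class logit to the chosen layer, and scores are tested against CAVs trained on random splits to rule out chance.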
Human-in-the-Loop Evaluation
Explanations are only as good as their impact on human decision-making. We design and conduct rigorous user studies to evaluate how explanations affect comprehension, trust calibration, and task performance in real-world settings. An example analysis sketch follows the list below.
- Trust calibration through explanations
- Explanation modality and cognitive load
- Task-grounded evaluation frameworks
- Longitudinal studies of explanation utility
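As one example of what a task-grounded analysis can quantify, the sketch below computes a simple appropriate-reliance measure from trial logs: following the model when it is right and overriding it when it is wrong. The data here is simulated, and the field names and rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated trial log: was the model right, and did the participant follow it?
model_correct = rng.random(400) < 0.75
followed = rng.random(400) < np.where(model_correct, 0.80, 0.50)

# Appropriate reliance: follow correct advice, override incorrect advice.
reliance_when_right = followed[model_correct].mean()
override_when_wrong = (~followed[~model_correct]).mean()
print(f"followed when model right:  {reliance_when_right:.2f}")
print(f"overrode when model wrong: {override_when_wrong:.2f}")
```

Well-calibrated trust shows up as both rates being high; explanations that inflate the first rate while depressing the second indicate over-reliance rather than understanding.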