The Explainability Gap

Enterprises face an explainability paradox. The AI models that produce the most value tend to be the least interpretable. Deep learning models that outperform simpler alternatives on credit risk, medical imaging, and natural language tasks do so by exploiting complex, non-linear patterns that resist human-readable explanation. Yet those same high-stakes use cases are precisely where regulators, courts, and affected individuals most demand explanation.

The response most enterprises have adopted is explainability theater: applying post-hoc explanation tools to black-box models, generating feature importance scores that are technically defensible but practically useless, and calling the result an explainability program. This approach satisfies no one. Regulators who examine it closely find it inadequate. Affected individuals who receive these explanations cannot understand or contest their decisions. And the explanations often do not accurately reflect what the model is actually doing.

Building genuine explainability infrastructure requires rethinking the problem. Explainability is not a technique applied to a model after it is built. It is a design requirement that shapes model selection, architecture, and deployment from the start.

Regulatory Requirement

The EU AI Act requires providers of high-risk AI systems to ensure that outputs can be interpreted by deployers and inform human oversight. The EU General Data Protection Regulation's right to explanation requires that decisions made solely by automated means can be explained to affected individuals in meaningful terms. "Feature importance scores" have not been accepted as satisfying this requirement in enforcement actions to date.

A Tiered Explainability Architecture

The fundamental design principle for enterprise explainability is proportionality: explanation depth should match decision stakes. A uniform explanation standard applied across all AI systems over-invests in low-stakes systems and under-invests in high-stakes ones.

Tier 1: Monitoring Grade (low-stakes internal decisions)
  • Aggregate logging sufficient for batch audit review; no individual decision explanation required.
  • Aggregate feature importance and population-level performance metrics.
  • Anomaly detection to flag unusual model behavior patterns.
  • Internal dashboards showing performance trends; no real-time explanation generation.
Examples: content recommendation, internal routing, resource scheduling, maintenance prioritization.

Tier 2: Operational Grade (moderate-stakes operational decisions)
  • Individual decision logging with key factor attribution.
  • On-request explanation for human reviewers; summary explanation format available within 24 hours of request.
  • Explanation sufficient for internal appeals processes.
  • Batch adverse action analysis capability; no external disclosure requirement.
Examples: operational resource allocation, supply chain decisions, B2B pricing, internal HR analytics.

Tier 3: Regulatory Grade (consequential individual decisions)
  • Real-time explanation generation for every decision.
  • Explanation format accessible to affected individuals without technical background.
  • Audit trail preserved for regulatory review.
  • Explanation accuracy validation: explanations must accurately reflect model reasoning, not just post-hoc rationalization.
  • Human review capability triggered by explanation anomalies; external disclosure capability required.
Examples: credit underwriting, insurance underwriting, hiring and promotion, benefits eligibility.

Tier 4: Safety Grade (safety-critical decisions)
  • Full reasoning chain documentation with every decision.
  • Real-time human oversight with override capability; the explanation must be interpretable by the overseeing human in real time.
  • Adversarial robustness testing of explanation fidelity.
  • Explanation failure triggers automatic fallback to human decision.
  • Regulatory approval of explanation methodology required before deployment.
Examples: medical diagnostic support, critical infrastructure management, autonomous vehicle operation.
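
One way to make the tiering operational is to encode it as a machine-readable policy that deployment tooling can check before a system ships. The sketch below is a minimal Python illustration; the class names, fields, and tier-to-policy mapping are assumptions for this example, not a standard schema:

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    MONITORING = 1
    OPERATIONAL = 2
    REGULATORY = 3
    SAFETY = 4

@dataclass(frozen=True)
class ExplainabilityPolicy:
    per_decision_explanation: bool  # explanation produced for each decision
    real_time: bool                 # generated at inference time, not on request
    external_disclosure: bool       # deliverable to affected individuals
    human_oversight: bool           # real-time human review with override

# Policy requirements per tier, mirroring the tier descriptions above.
POLICIES = {
    Tier.MONITORING:  ExplainabilityPolicy(False, False, False, False),
    Tier.OPERATIONAL: ExplainabilityPolicy(True,  False, False, False),
    Tier.REGULATORY:  ExplainabilityPolicy(True,  True,  True,  False),
    Tier.SAFETY:      ExplainabilityPolicy(True,  True,  True,  True),
}

def requirements(tier: Tier) -> ExplainabilityPolicy:
    """Look up the explainability obligations for a system's tier."""
    return POLICIES[tier]

# Credit underwriting is a consequential individual decision: Tier 3.
assert requirements(Tier.REGULATORY).external_disclosure
```

A registry of deployed systems annotated with tiers can then be validated against this policy table automatically, which keeps the proportionality principle enforceable rather than aspirational.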

Explainability Methods: What Actually Works

The explainability toolbox has expanded significantly in recent years. Understanding what each method actually does, what it does well, and where it breaks down is essential before selecting an approach for a specific system.

SHAP Values (model-agnostic)
Shapley value attribution assigns each feature a contribution to the specific prediction. Theoretically grounded, consistent, and model-agnostic.
⚠ Computationally expensive for large feature sets. Explanations are mathematically sound but not always human-interpretable.

LIME (model-agnostic)
Local Interpretable Model-Agnostic Explanations fits a simple interpretable model to the neighborhood of a specific prediction.
⚠ Explanations are local approximations, not exact. Stability varies with sampling: the same input can produce different explanations across runs.

Attention Maps (model-specific)
For transformer models, attention weights show which input tokens the model focused on when generating an output.
⚠ Attention does not equal importance. High-attention tokens do not necessarily drive the prediction. Widely misused as explanation.

Counterfactual Explanations
Identifies the minimal change to input features that would change the model's output: "You were denied credit. If your income were 15% higher, you would have been approved."
⚠ May identify counterfactuals that are practically impossible for the individual to achieve.

Inherently Interpretable Models (intrinsic)
Linear regression, decision trees, and rule-based models are inherently interpretable; the explanation is the model itself.
⚠ Performance tradeoff with complex models. May not be viable for the highest-accuracy requirements.

Global Feature Importance
Aggregate importance of each feature across the training population. Useful for understanding overall model behavior but not individual decisions.
⚠ Global importance does not explain individual decisions. Widely misused as a substitute for individual-decision explanation.
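
To make the first method concrete, here is a self-contained sketch of exact Shapley attribution for a single prediction, using a toy linear scoring function. The model, feature names, and baseline are illustrative assumptions, not a real credit model. Exact computation enumerates every feature subset, which is exponential in the number of features; that is exactly why production SHAP implementations rely on approximations:

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attribution for a single prediction.

    predict  : function mapping a feature vector (list) to a score
    x        : the instance being explained
    baseline : reference values used when a feature is 'absent'
    """
    n = len(x)

    def v(subset):
        # Features in `subset` take their actual values; the rest take the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                # Shapley weight for a coalition of size k out of n players.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (v(set(S) | {i}) - v(set(S)))
    return phi

# Toy scoring function (illustrative only): income and tenure raise the
# score, credit utilization lowers it.
def score(z):
    income, tenure, utilization = z
    return 0.5 * income + 0.2 * tenure - 0.3 * utilization

phi = shapley_values(score, x=[80, 10, 40], baseline=[0, 0, 0])
# For a linear model, each attribution reduces to weight * (x_i - baseline_i),
# and the attributions sum to score(x) - score(baseline).
```

Even this three-feature example evaluates the model on every subset; at fifty features the subset count is astronomically large, which illustrates the computational caveat above.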
Critical Distinction

Explanation accuracy is different from explanation fidelity. An explanation can be accurate in the sense of naming features that plausibly matter, while having low fidelity: it does not reflect the computation the model actually performed to reach its output. For regulatory and legal purposes, fidelity is what matters. Post-hoc explanations applied to black-box models often have unknown fidelity, which is why regulators are increasingly skeptical of them.

Regulatory Requirements for Explainability

The regulatory landscape for AI explainability is developing rapidly and varies significantly by jurisdiction and use case. Enterprises operating across multiple jurisdictions face a patchwork that requires careful mapping.

The EU General Data Protection Regulation Article 22 gives individuals the right not to be subject to solely automated decisions that significantly affect them, and the right to obtain meaningful information about the logic involved. "Meaningful information" has been interpreted by data protection authorities to require more than feature importance scores. The explanation must enable the individual to understand the decision and contest it if they believe it is wrong.

The EU AI Act adds requirements for providers of high-risk AI systems to design systems so that their outputs can be interpreted by deployers. This is a design-time requirement, not a post-deployment patch. Systems that cannot produce interpretable outputs at design cannot be certified as compliant.

In the United States, the Equal Credit Opportunity Act requires adverse action notices that explain the specific reasons for credit denial. The Consumer Financial Protection Bureau has issued guidance indicating that AI model outputs require explanation in terms that the applicant can understand and act upon. "Black box model output" is not accepted as an explanation.

Financial services regulators under SR 11-7 require that model explanations support effective challenge by independent review functions. Models whose reasoning cannot be examined cannot satisfy this requirement.

The Generative AI Explainability Challenge

Large language models present a novel explainability challenge. They are inherently probabilistic, context-dependent, and their internal representations do not map cleanly to human-interpretable concepts. The explanation methods developed for structured prediction tasks apply poorly or not at all to generative outputs.

Enterprises deploying generative AI in consequential contexts need to think carefully about what explainability means for this model class. The practical answer is usually not technical explainability of the model's internal reasoning, but operational transparency: logging all inputs and outputs, documenting the boundaries of the system's intended use, implementing human review for consequential outputs, and maintaining the ability to audit what the system produced and why it was deployed in a given context.
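
The operational-transparency approach described above can be sketched as a thin wrapper around the generation call that records every input/output pair to an append-only log. Everything here is an illustrative assumption: the wrapper name, record fields, and the human-review rule are placeholders, not a vendor API:

```python
import hashlib
import time
import uuid

AUDIT_LOG = []  # stand-in for an append-only audit store

def audited_generate(model_fn, prompt, *, system_id, use_case):
    """Wrap a generative model call so every input/output pair is auditable."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "system_id": system_id,
        "use_case": use_case,  # documented boundary of intended use
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "prompt": prompt,
    }
    output = model_fn(prompt)
    record["output"] = output
    # Illustrative routing rule: flag consequential use cases for human review.
    record["needs_human_review"] = use_case in {"credit", "hiring"}
    AUDIT_LOG.append(record)
    return output

# Hypothetical stand-in for a real model call.
reply = audited_generate(lambda p: "DRAFT: " + p,
                         "summarize account history",
                         system_id="support-assistant",
                         use_case="support")
```

The useful property is that auditability does not depend on interpreting the model's internals: the log captures what was asked, what was produced, and under which documented use case, which is the evidence an audit actually needs.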

For generative AI in customer-facing applications, transparency to the user that they are interacting with an AI system is itself a regulatory requirement in several jurisdictions. The EU AI Act requires disclosure of AI-generated content in defined contexts. Several US states have enacted similar disclosure requirements. Compliance with these disclosure requirements is a distinct obligation from technical model explainability.


Building Explainability Infrastructure

Explainability infrastructure is not a library you install. It is a combination of model design choices, explanation generation pipelines, storage and retrieval systems, and human processes that together enable the organization to produce, deliver, and defend explanations for every consequential decision its AI systems make.

The infrastructure components that most enterprises need to build:

  • Explanation generation pipeline: For Tier 3 and Tier 4 systems, an automated pipeline that generates the required explanation at inference time, stores it alongside the decision record, and makes it retrievable for regulatory review or individual request.
  • Explanation quality monitoring: Systematic evaluation of whether explanations are accurate, stable, and consistent over time. Explanation quality degrades as model behavior changes. This must be monitored continuously.
  • Explanation delivery mechanism: The channel and format through which explanations reach affected individuals. A technically valid explanation that an affected individual cannot understand fails its purpose. Explanation format must be validated with representative members of the affected population.
  • Appeal handling process: A structured process for receiving, reviewing, and responding to contests of AI-driven decisions. The EU GDPR requires that automated decisions subject to the right of explanation also be subject to human review upon request. This human review function must have genuine authority to override the model.
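
As an illustration of the first component, a minimal explanation-generation pipeline produces the explanation at inference time, persists it alongside the decision record, and exposes retrieval for regulatory review or an individual's request. This is a sketch under simplifying assumptions: an in-memory SQLite table stands in for a production decision store, and all function names are hypothetical:

```python
import json
import sqlite3
import time
import uuid

# In-memory store standing in for the production decision-record database.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE decisions (
    id TEXT PRIMARY KEY, ts REAL, inputs TEXT, outcome TEXT, explanation TEXT)""")

def decide_and_explain(model_fn, explain_fn, inputs):
    """Generate the decision and its explanation together, then persist both."""
    outcome = model_fn(inputs)
    explanation = explain_fn(inputs, outcome)  # plain-language, per Tier 3
    decision_id = str(uuid.uuid4())
    db.execute("INSERT INTO decisions VALUES (?, ?, ?, ?, ?)",
               (decision_id, time.time(), json.dumps(inputs), outcome, explanation))
    db.commit()
    return decision_id, outcome

def retrieve_explanation(decision_id):
    """Serve a regulatory review or an individual's explanation request."""
    row = db.execute("SELECT explanation FROM decisions WHERE id = ?",
                     (decision_id,)).fetchone()
    return row[0] if row else None

# Toy decision logic and explanation template, for illustration only.
decision_id, outcome = decide_and_explain(
    lambda x: "approved" if x["income"] >= 50 else "denied",
    lambda x, o: f"Outcome: {o}. Key factor: income of {x['income']} "
                 f"against a threshold of 50.",
    {"income": 80},
)
```

The design point is that explanation generation sits on the inference path, not in a later batch job: if the explanation cannot be produced, the decision record is incomplete at the moment the decision is made, and that failure is visible immediately.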

Explainability and Model Selection

One of the most consequential explainability decisions happens before any explanation tool is selected: model architecture choice. For high-stakes decisions, the question should be whether a less complex model can achieve acceptable performance. If a decision tree or logistic regression achieves 88% of the accuracy of a gradient boosting model, the explainability advantages of the simpler model may justify the performance tradeoff.

This is not a blanket argument for simple models. In medical imaging, fraud detection, and natural language applications, complex models achieve performance levels that simple models cannot approach, and the performance difference has material consequences. But in many business applications, the performance difference between complex and simple models is smaller than assumed, and the governance, audit, and explainability costs of complex models are not accounted for in the comparison.

The model selection decision should explicitly account for: the performance requirement, the explainability requirement imposed by the decision stakes, the regulatory classification of the use case, and the total cost of compliance including explanation infrastructure, audit, and ongoing monitoring. When these factors are all included, inherently interpretable models are competitive in more use cases than pure accuracy comparisons suggest.
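
That comparison can be made explicit with a back-of-the-envelope net-value calculation. All figures below are hypothetical; the point is that compliance cost belongs in the model-selection equation, not the specific numbers:

```python
def net_value(accuracy_pct, compliance_cost, value_per_point):
    """Net annual value of a candidate model.

    accuracy_pct    : expected accuracy on the task, in percentage points
    compliance_cost : annualized cost of explanation infrastructure, audit,
                      and ongoing monitoring for this model class
    value_per_point : business value of one percentage point of accuracy
    """
    return accuracy_pct * value_per_point - compliance_cost

# Hypothetical figures for a moderate-stakes decision: the complex model is
# more accurate but carries much heavier governance and explanation costs.
complex_model = net_value(accuracy_pct=92, compliance_cost=400_000,
                          value_per_point=10_000)
simple_model = net_value(accuracy_pct=88, compliance_cost=50_000,
                         value_per_point=10_000)
```

Under these assumed figures the simpler model's lower compliance cost outweighs its four-point accuracy deficit; with a larger value per accuracy point the ranking flips, which is exactly the tradeoff the text argues should be made explicit rather than assumed.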

For the governance framework that structures explainability requirements, see our enterprise AI governance framework guide. For how explainability fits into AI audit methodology, see our AI audit guide. For the responsible AI operating model, see our responsible AI practical guide. To explore governance services, visit our AI Governance page.


Build Explainability That Withstands Regulatory Scrutiny

Our advisors design and implement explainability infrastructure that satisfies regulators, enables meaningful appeals, and holds up under examination.