Why SR 11-7 Was Not Built for AI

The Federal Reserve's SR 11-7 guidance on model risk management, issued in 2011, established the framework that governs how banks identify, validate, and control model risk. For a decade, it worked reasonably well for the statistical models that dominated financial services: credit scorecards, regression-based pricing models, CCAR stress testing models, and similar systems with interpretable outputs and well-understood failure modes.

AI changes every assumption SR 11-7 makes. Statistical models have closed-form mathematics that can be examined; neural networks do not. Statistical models behave stably on similar inputs; large language models do not. Statistical models have well-understood out-of-sample performance bounds; machine learning models often do not. Gradient boosting models trained on regime-specific historical data, for example, can fail dramatically in novel regimes in ways that classical model validation frameworks are not designed to detect.

The result is a regulatory gap. Most financial institutions are applying SR 11-7 to AI models with modifications that are insufficient for the actual risk profile of these systems. Regulators have begun to notice. OCC, Federal Reserve, and FDIC examiners are asking more sophisticated questions about AI model governance. The institutions best positioned are those that have proactively updated their MRM frameworks rather than waiting for formal updated guidance.

Regulatory Signal

The OCC's 2023 proposed guidance on model risk management explicitly acknowledged that SR 11-7's concepts apply to AI and machine learning models but noted that additional considerations are necessary given the unique characteristics of these models. Examination findings increasingly cite inadequate AI-specific model risk controls as material gaps. When formal updated guidance arrives, it is likely to codify what progressive institutions are already building.

Where AI Breaks Traditional MRM

Six characteristics of AI systems create material gaps in traditional MRM frameworks:

Gap 1: Interpretability
SR 11-7 requires effective challenge, which requires understanding what a model is doing. Complex AI models resist this understanding. Examiners expect validators to demonstrate they understand model behavior, not just that they ran performance tests.
→ Implement tiered explainability matched to model complexity and decision stakes

Gap 2: Conceptual Soundness
SR 11-7 requires validators to assess conceptual soundness: whether the model is theoretically appropriate for the problem. AI models learned from data rather than derived from theory present novel conceptual soundness challenges.
→ Develop AI-specific conceptual soundness criteria, including inductive bias assessment and architecture appropriateness

Gap 3: Ongoing Monitoring
Traditional MRM monitoring tracks performance metrics against predefined thresholds. AI models can exhibit behavioral drift, distributional shift, and emergent failure modes not captured by performance metrics alone.
→ Augment performance monitoring with behavioral monitoring, input distribution tracking, and adversarial probing (a minimal drift-detection sketch follows this list)

Gap 4: Vendor Model Management
SR 11-7 applies regardless of whether a model is built internally or purchased from a vendor. AI vendors often cannot provide the documentation SR 11-7 requires. Foundation model APIs present novel third-party risk that existing vendor management frameworks do not address.
→ Establish AI vendor due diligence standards and contractual requirements for model documentation and change notification

Gap 5: Change Management
Traditional model change management triggers on version releases. AI systems can change behavior continuously through online learning, retraining pipelines, and upstream data changes without triggering formal change management gates.
→ Implement behavioral monitoring that detects material changes in model outputs regardless of whether a formal version change occurred

Gap 6: Scope Definition
SR 11-7's definition of "model" was written with statistical models in mind. AI systems used as decision inputs, content generators, or data processors may not trigger model inventory requirements under traditional scope definitions.
→ Expand model inventory scope to capture all AI systems that influence consequential decisions, not just those that directly produce the decision output
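
To make the Ongoing Monitoring remediation concrete, here is a minimal sketch of input distribution tracking using the Population Stability Index (PSI). The feature, sample data, and alert threshold are illustrative assumptions, not calibrated recommendations.

```python
import numpy as np

def population_stability_index(expected: np.ndarray,
                               actual: np.ndarray,
                               bins: int = 10) -> float:
    """Compare a production feature distribution against its training
    baseline. Bin edges come from the training (expected) data so the
    comparison is anchored to what the model saw in development."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_counts, _ = np.histogram(expected, bins=edges)
    actual_counts, _ = np.histogram(actual, bins=edges)
    # Note: values outside the baseline range fall out of the histogram;
    # a production version would add open-ended edge bins.
    eps = 1e-6  # floor empty bins to avoid log(0)
    e = np.maximum(expected_counts / expected_counts.sum(), eps)
    a = np.maximum(actual_counts / actual_counts.sum(), eps)
    return float(np.sum((a - e) * np.log(a / e)))

# Illustrative usage: training baseline vs. a drifted production sample.
rng = np.random.default_rng(0)
baseline = rng.normal(620, 60, 50_000)    # e.g., a credit score feature
production = rng.normal(585, 75, 5_000)   # served population has shifted
psi = population_stability_index(baseline, production)
if psi > 0.25:  # common rule-of-thumb threshold; calibrate per model tier
    print(f"ALERT: input distribution shift detected (PSI={psi:.3f})")
```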

An AI-Extended MRM Framework

The foundational structure of SR 11-7 remains valid for AI: model development, model validation, and ongoing monitoring and control. What must change is the specific content of each stage and the standards applied within it.

Stage 1: AI Model Development and Documentation
Traditional development documentation covers data sources, methodology, assumptions, and limitations. AI models require extended documentation covering training data provenance and bias analysis, architecture selection rationale and alternatives considered, training process documentation including hyperparameter selection, data preprocessing decisions and their effects on model behavior, and known failure modes and edge cases identified during development.
AI Extension: Model cards and system cards are increasingly expected by regulators as minimum documentation standards. The OCC has referenced NIST AI RMF documentation requirements in examination guidance.
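
A minimal sketch of the fields such a card might capture, assuming a simple in-house schema rather than any mandated regulatory template; every field name and value below is illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    """Minimal development-documentation record for an AI model.
    The field set is an illustrative assumption; align it with your
    MRM documentation standard and NIST AI RMF mapping."""
    model_id: str
    use_case: str
    architecture_rationale: str    # why this architecture; alternatives considered
    training_data_provenance: str  # sources, lineage, bias analysis reference
    hyperparameters: dict = field(default_factory=dict)
    preprocessing_decisions: list = field(default_factory=list)
    known_failure_modes: list = field(default_factory=list)

# Hypothetical example entry.
card = ModelCard(
    model_id="credit-doc-triage-v2",
    use_case="Document triage for commercial credit underwriting",
    architecture_rationale="Fine-tuned transformer; GBM baseline rejected on recall",
    training_data_provenance="Internal corpus 2019-2023; bias analysis ref BA-114",
    hyperparameters={"learning_rate": 2e-5, "epochs": 3},
    preprocessing_decisions=["OCR normalization", "PII redaction before training"],
    known_failure_modes=["low confidence on poorly scanned documents"],
)
```
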
Stage 2: AI Model Validation
Traditional validation covers conceptual soundness, data quality, performance testing, and sensitivity analysis. AI validation requires additional coverage: behavioral testing across distributional subgroups, adversarial robustness testing, explainability assessment against defined standards, fairness evaluation against appropriate criteria, and stress testing under distribution shift conditions not present in training data.
AI Extension: Effective challenge for AI requires validators with ML expertise. MRM teams lacking this expertise must either develop it or supplement with external validators. Examiner expectations for validator qualifications are rising.
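
One element of the extended validation scope, behavioral testing across distributional subgroups, can be sketched simply. The record structure, metric, and tolerance below are illustrative assumptions, not a validation standard.

```python
from collections import defaultdict

def subgroup_performance(records, metric, tolerance=0.05):
    """Score a validation metric per subgroup and flag any group that
    trails the overall score by more than `tolerance`. `records` is an
    iterable of (subgroup, y_true, y_pred) tuples."""
    by_group = defaultdict(list)
    for group, y_true, y_pred in records:
        by_group[group].append((y_true, y_pred))
    overall = metric([row for rows in by_group.values() for row in rows])
    scores = {}
    for group, rows in by_group.items():
        scores[group] = metric(rows)
        if overall - scores[group] > tolerance:
            print(f"FINDING: {group} underperforms overall "
                  f"({scores[group]:.3f} vs {overall:.3f})")
    return scores

def accuracy(rows):
    return sum(int(t == p) for t, p in rows) / len(rows)

# Toy validation set: segment_b shows a material performance gap.
data = [("segment_a", 1, 1), ("segment_a", 0, 0),
        ("segment_b", 1, 0), ("segment_b", 0, 0)]
subgroup_performance(data, accuracy)
```
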
Stage 3: Production Approval and Tiering
Traditional MRM tiers models by materiality, with the assigned tier determining monitoring intensity and revalidation frequency. AI models require additional tiering dimensions: interpretability (how transparent the model's reasoning is), autonomy (how directly the model's output drives decisions without human review), and novelty (how well established the architecture and training approach are in the relevant domain).
AI Extension: High-autonomy, low-interpretability AI models in material use cases warrant tier 1 classification regardless of asset size impact, because their failure modes are harder to detect and harder to remediate.
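
The tiering principle in the AI Extension can be encoded as a simple rule. The scoring weights and thresholds are illustrative assumptions; the one element taken directly from the text is that material, high-autonomy, low-interpretability models map to tier 1 unconditionally.

```python
def assign_tier(materiality: str, interpretability: str,
                autonomy: str, novelty: str) -> int:
    """Illustrative tiering rule combining SR 11-7 materiality with the
    AI-specific dimensions above. Inputs are 'low'/'medium'/'high'
    (materiality: 'material'/'immaterial')."""
    # Principle from the text: high-autonomy, low-interpretability models
    # in material use cases are tier 1 regardless of asset-size impact.
    if (materiality == "material" and autonomy == "high"
            and interpretability == "low"):
        return 1
    score = {"low": 0, "medium": 1, "high": 2}
    risk = (score[autonomy] + score[novelty]
            + (2 - score[interpretability])            # opacity adds risk
            + (2 if materiality == "material" else 0))
    return 1 if risk >= 5 else (2 if risk >= 3 else 3)

print(assign_tier("material", "low", "high", "medium"))  # -> 1
print(assign_tier("immaterial", "high", "low", "low"))   # -> 3
```
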
Stage 4: Ongoing Monitoring
Traditional monitoring tracks output distribution and performance metrics. AI monitoring must additionally track input distribution shift (when the population being served diverges from the training population), behavioral consistency (whether the model treats similar inputs similarly over time), and fairness metric drift (whether fairness properties measured at deployment are maintained in production). Monitoring must be designed to detect silent failures — degradations that do not immediately appear in aggregate performance metrics.
AI Extension: Generative AI systems require behavioral monitoring approaches distinct from predictive model monitoring. Output quality assessment, hallucination rate tracking, and prompt injection detection are AI-specific monitoring requirements.
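
A minimal sketch of behavioral consistency monitoring using a frozen canary set: because the replayed inputs never change, any score movement reflects model-side change (retraining, upstream pipeline, vendor update) rather than population drift. The tolerance and scores are illustrative assumptions.

```python
import numpy as np

def behavioral_drift(baseline_scores: np.ndarray,
                     current_scores: np.ndarray,
                     tol: float = 0.02) -> bool:
    """Replay a frozen canary set each monitoring period and compare
    model scores to the baseline approved at validation. Returns True
    (and alerts) when any score moves more than `tol`."""
    max_delta = float(np.abs(current_scores - baseline_scores).max())
    if max_delta > tol:
        print(f"ALERT: behavioral change on canary set "
              f"(max score delta {max_delta:.4f} > {tol})")
        return True
    return False

# Illustrative usage: scores for four frozen canary cases.
baseline = np.array([0.12, 0.87, 0.45, 0.66])
current  = np.array([0.12, 0.79, 0.44, 0.67])  # one case moved by 0.08
behavioral_drift(baseline, current)
```
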
Stage 5: Revalidation and Retirement
Traditional revalidation is triggered by the passage of time and material changes. AI revalidation must also be triggered by: training data refresh (even if model architecture does not change), upstream data pipeline changes that could affect model inputs, detected distributional shift in model inputs or outputs, and regulatory or supervisory guidance that changes the validation standard for the use case.
AI Extension: The useful life of AI models may be shorter than traditional statistical models, particularly in domains experiencing rapid distributional change. Revalidation calendars calibrated for statistical models may be inadequate for AI.
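
The revalidation triggers above lend themselves to a simple check run on each model's monitoring schedule. Trigger names mirror the list; the interval and flag values are illustrative assumptions.

```python
from datetime import date, timedelta

def revalidation_triggers(last_validated: date,
                          max_age_days: int,
                          training_data_refreshed: bool,
                          upstream_pipeline_changed: bool,
                          drift_detected: bool,
                          guidance_changed: bool) -> list:
    """Return the list of fired revalidation triggers (empty = not due).
    max_age_days should be tier-dependent; it is a placeholder here."""
    fired = []
    if date.today() - last_validated > timedelta(days=max_age_days):
        fired.append("revalidation calendar elapsed")
    if training_data_refreshed:
        fired.append("training data refresh")
    if upstream_pipeline_changed:
        fired.append("upstream data pipeline change")
    if drift_detected:
        fired.append("distributional shift detected")
    if guidance_changed:
        fired.append("supervisory guidance change")
    return fired

# Illustrative check: a data refresh forces revalidation even though
# the architecture and version are unchanged.
fired = revalidation_triggers(date(2024, 1, 15), 365, True, False, False, False)
if fired:
    print("Revalidation required:", "; ".join(fired))
```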

Is Your MRM Framework AI-Ready?

Our financial services advisors assess your current MRM framework against AI-specific requirements and build the gap remediation roadmap before examiners arrive.

Request an MRM Assessment →

Generative AI in the MRM Framework

Large language models and other generative AI systems present the most significant MRM challenge financial institutions have faced since the proliferation of complex derivative pricing models in the 1990s. The challenge is not primarily technical. It is definitional (what counts as a model under SR 11-7) and structural (how to govern systems whose behavior the institution does not control).

The definitional question is whether foundation models accessed via API constitute "models" under SR 11-7. The functional answer must be yes: if a financial institution uses a foundation model to generate content, analyze documents, or inform decisions, that use creates model risk regardless of whether the institution built the model. The institution cannot outsource model risk to the foundation model provider. This is the same logic SR 11-7 applies to vendor models.

The governance challenge is that foundation models change in ways that institutions cannot control or predict. A model accessed via an API may be retrained, fine-tuned, or replaced by the vendor without notice. Traditional model change management frameworks are not designed for this. Institutions must establish contractual requirements for change notification and technical monitoring to detect behavioral changes regardless of vendor notification.

The practical MRM requirements for foundation model use include:

  • An inventory of all foundation model API integrations
  • Documentation of the specific use case and decision influence of each integration
  • Testing protocols for each new model version or API update
  • Behavioral monitoring in production to detect unexpected output changes (sketched below)
  • Human review gates for any generative output that directly informs a material decision
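
A minimal sketch of the behavioral monitoring item, assuming a frozen set of canary prompts approved at validation time. The client stub, prompts, and similarity threshold are all illustrative; in practice an embedding-based comparison is stronger than raw text similarity.

```python
import difflib

def call_model(prompt: str) -> str:
    """Placeholder for the institution's actual foundation-model client;
    this stub exists only so the sketch is self-contained."""
    return "Stubbed response for illustration."

# Frozen canary prompts approved at validation (illustrative).
CANARY_PROMPTS = {
    "summarize_covenants": "Summarize the key covenants in: ...",
    "refusal_check": "Provide this customer's full account number.",
}
BASELINE = {name: call_model(p) for name, p in CANARY_PROMPTS.items()}

def check_vendor_drift(min_similarity: float = 0.8) -> None:
    """Re-run the canary prompts and compare responses to the validated
    baseline. Low similarity signals a vendor-side change (retrain,
    fine-tune, model swap) even when no notification was received."""
    for name, prompt in CANARY_PROMPTS.items():
        current = call_model(prompt)
        sim = difflib.SequenceMatcher(None, BASELINE[name], current).ratio()
        if sim < min_similarity:
            print(f"ALERT: canary '{name}' diverged from baseline "
                  f"(similarity {sim:.2f}); trigger change review")

check_vendor_drift()
```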

Fair Lending Integration

AI model risk management in financial services has a fair lending dimension that does not exist for most other industries. ECOA, Fair Housing Act, and CFPB enforcement create legal obligations that make bias management a model risk management requirement, not just a governance best practice.

The fair lending integration points in MRM include:

  • Adverse impact analysis in model validation for any model used in credit, housing, or employment decisions (a screening sketch follows this list)
  • Pre-deployment testing against appropriate fairness criteria
  • Production monitoring for disparate impact
  • Adverse action notice capability for all AI-driven credit decisions
  • Documentation sufficient to demonstrate to examiners that AI models in consumer credit use cases do not produce illegal discrimination
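
A minimal sketch of the adverse impact screening referenced in the first item, using the four-fifths rule as a first-pass screen. The rule is a screening heuristic, not the legal disparate-impact standard, and the counts are illustrative.

```python
def adverse_impact_ratios(selection: dict) -> dict:
    """Four-fifths rule screen: each group's approval rate divided by
    the highest group's approval rate. `selection` maps group name to
    (approved, total) counts. Ratios below 0.80 are flagged for
    documented analysis and a less-discriminatory-alternatives review."""
    rates = {g: approved / total for g, (approved, total) in selection.items()}
    benchmark = max(rates.values())
    ratios = {g: rate / benchmark for g, rate in rates.items()}
    for group, ratio in ratios.items():
        if ratio < 0.8:
            print(f"FLAG: {group} adverse impact ratio {ratio:.2f} < 0.80")
    return ratios

# Illustrative counts only.
adverse_impact_ratios({
    "group_a": (640, 1000),  # 64% approval rate (benchmark)
    "group_b": (470, 1000),  # 47% approval -> ratio 0.73, flagged
})
```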

The supervisory expectation is not that AI models will never produce disparate impact. It is that institutions know whether their models produce disparate impact, can explain why, can demonstrate they have considered less discriminatory alternatives, and have implemented monitoring to detect fairness degradation.


AI Governance Handbook

Detailed MRM framework extensions for AI, fair lending integration guidance, and documentation templates aligned with regulatory expectations.

Download Free →

What Examiners Are Looking For

Based on current examination trends and regulatory communications, these are the areas where AI MRM gaps are most commonly cited:

  • Scope completeness: AI systems that influence decisions but are not classified as models in the inventory. Shadow AI built by business units without MRM review. Foundation model integrations without formal model risk assessment.
  • Validator qualifications: Model validators without sufficient ML expertise to effectively challenge complex AI model architectures. This is the most common practical gap, because ML expertise is expensive and MRM functions have historically not required it.
  • Documentation completeness: AI models in production without documentation sufficient for examination. Training data provenance records that cannot be produced. Absence of model cards or equivalent documentation artifacts.
  • Ongoing monitoring adequacy: Monitoring frameworks designed for statistical models applied to AI without AI-specific monitoring augmentation. Absence of distributional shift detection. Absence of behavioral consistency monitoring.
  • Change management gaps: AI systems that change behavior through continuous retraining or upstream data changes without triggering formal change management review.

Institutions that proactively address these gaps before examination are in substantially better position than those that wait for findings. The cost of building AI-extended MRM capability is a fraction of the cost of regulatory remediation orders.

Building MRM Capability for AI

The practical path to AI-ready MRM requires investment in three areas: expertise, tooling, and governance.

Expertise means adding ML engineers or data scientists to validation teams, or contracting with external validators who have this expertise. The alternative, applying traditional MRM validation approaches to AI models without the technical expertise to evaluate them, is not effective challenge and will not satisfy sophisticated examiners. Institutions that cannot build this expertise internally need a credible external partner strategy.

Tooling means investment in AI-specific validation software, behavioral monitoring infrastructure, fairness evaluation pipelines, and model documentation management systems that can handle the expanded documentation requirements of AI models at scale.

Governance means updating the MRM policy, standards, and procedures to reflect AI-specific requirements:

  • Update the model definition to capture AI systems
  • Update validation standards to include AI-specific validation requirements
  • Update monitoring standards to include behavioral and distributional monitoring
  • Update the model inventory to capture foundation model integrations (an illustrative record follows this list)
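
An illustrative inventory record broad enough for the expanded scope, including foundation model integrations. The schema and field names are assumptions, not a regulatory format.

```python
from dataclasses import dataclass
from enum import Enum

class SystemType(Enum):
    INTERNAL_MODEL = "internal"
    VENDOR_MODEL = "vendor"
    FOUNDATION_MODEL_API = "foundation_api"

@dataclass
class InventoryEntry:
    """Captures AI systems that influence consequential decisions, not
    only those that directly produce the decision output."""
    system_id: str
    system_type: SystemType
    use_case: str
    decision_influence: str            # "direct", "input", or "advisory"
    tier: int
    owner: str
    validator: str
    change_notification_clause: bool   # contractual, for vendor/API entries

# Hypothetical foundation model integration captured in the inventory.
entry = InventoryEntry(
    system_id="fm-credit-memo-summarizer-01",
    system_type=SystemType.FOUNDATION_MODEL_API,
    use_case="Credit memo summarization for commercial lending",
    decision_influence="input",
    tier=2,
    owner="commercial-lending",
    validator="mrm-ai-validation",
    change_notification_clause=True,
)
```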

For the broader AI governance framework into which MRM fits, see our enterprise AI governance framework guide. For the audit methodology that supports MRM review, see our AI audit guide. For the responsible AI program that provides the policy framework, see our responsible AI practical guide. To discuss MRM program assessment or build, visit our AI Governance service page.

Get Ahead of AI Model Risk Before Examiners Arrive

Our financial services advisors help banks, insurers, and asset managers build AI MRM capability that satisfies regulators and protects institutions from the failure modes unique to AI.