The bank's consumer and SME lending portfolio had grown from $54 billion to $84 billion over the preceding five years through a combination of organic growth and two acquisitions. The credit risk models underpinning origination decisions, early warning signals, and provisioning had not kept pace with this growth. The primary origination scorecard was a logistic regression model built in 2016, validated against pre-pandemic credit data that no longer reflected the behavioral patterns of the current customer base. Two supplementary models used for SME lending had been built by the acquired institutions and had never been re-validated under the acquiring bank's Model Risk Management framework.
The consequences were visible in the numbers. The bank's charge-off rate had risen from 1.8% to 2.7% over 18 months, while peer institutions with more current models had held steady at 1.9% to 2.1%. The Chief Risk Officer estimated that 40 basis points of the 90-basis-point deterioration was attributable to model performance degradation rather than macroeconomic factors, representing approximately $340 million in excess annual credit losses on the $84 billion portfolio. Fixing the models was a material financial priority, not a technology initiative.
The constraint was not an absence of data. The bank held rich transaction, behavioral, and relationship data across 11 million customer accounts. What was absent was an organized approach to feature engineering, model architecture selection, and governance that would produce ML models capable of passing the bank's stringent SR 11-7 Model Risk Management validation requirements within a commercially useful timeframe.
The central challenge was not algorithmic. Gradient boosted ensembles and neural networks are well-established as outperforming logistic regression on default prediction tasks given sufficient data. The challenge was building ML models that would satisfy SR 11-7 model risk governance requirements without sacrificing the performance advantage that ML offers over traditional approaches. Financial services model risk teams have legitimate concerns about black-box ML models, and those concerns must be addressed architecturally, not just documented.
Specifically, the bank's MRM team required five things that purely predictive ML models typically do not provide by default: deterministic adverse action reason codes for every origination decision; directionally sensible, monotone relationships between features and predicted risk; disparate impact testing across protected categories; ongoing stability and performance monitoring with defined escalation thresholds; and complete conceptual soundness documentation covering data sources, feature rationale, and algorithm selection.
We treated these requirements as design inputs, not compliance checkboxes. Every model was architected to satisfy all five requirements by design rather than retrofitted after algorithm selection.
We structured the program around three principles that guided every technical decision: SR 11-7 compliance by design, feature engineering depth over algorithmic complexity, and production-readiness from the first sprint.
Governance-First Architecture. Before writing a single line of model code, we produced a Model Development Plan for each of the nine target models documenting: purpose and scope, data sources and feature rationale, algorithm selection justification, performance metrics and acceptance thresholds, disparate impact testing methodology, and monitoring framework specification. The MRM team reviewed and approved each Model Development Plan before development commenced. This upfront governance investment eliminated the back-and-forth validation cycles that typically extend financial services AI programs by 6 to 18 months.
Feature Engineering as the Primary Value Driver. The most important technical work in this program was feature engineering, not algorithm selection. We built a feature library of 340 behavioral, transactional, and relationship features from the bank's existing data assets, including 90-day trailing payment behavior patterns, product utilization velocity signals, relationship depth indicators, and delinquency vintage curves that the legacy models had not incorporated. A feature selection process combining mutual information analysis, VIF screening for multicollinearity, and business logic review produced the final feature sets for each model.
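The screening pipeline described above can be sketched as follows. This is an illustrative reconstruction, not the bank's production code: the thresholds, the VIF implementation (ordinary least squares per column rather than a statistics library), and the function names are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif


def _vif(values: np.ndarray) -> list[float]:
    """Variance inflation factor per column, via OLS of each column on the rest."""
    out = []
    for i in range(values.shape[1]):
        target = values[:, i]
        others = np.delete(values, i, axis=1)
        design = np.column_stack([others, np.ones(len(target))])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - (resid @ resid) / ss_tot if ss_tot > 0 else 0.0
        out.append(1.0 / (1.0 - r2) if r2 < 1.0 else float("inf"))
    return out


def screen_features(X: pd.DataFrame, y: pd.Series,
                    mi_keep: int = 100, vif_cap: float = 5.0) -> list[str]:
    """Rank features by mutual information with the default flag, then
    iteratively drop the highest-VIF feature until multicollinearity is
    under the cap. The surviving set goes to business logic review."""
    mi = mutual_info_classif(X, y, random_state=0)
    ranked = [col for _, col in sorted(zip(mi, X.columns), reverse=True)][:mi_keep]
    kept = list(ranked)
    while len(kept) > 1:
        vifs = _vif(X[kept].to_numpy())
        worst = int(np.argmax(vifs))
        if vifs[worst] <= vif_cap:
            break
        kept.pop(worst)
    return kept
```

The ordering matters: mutual information first discards weak predictors cheaply, so the iterative VIF loop only runs over candidates worth keeping.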
XGBoost with SHAP as the Core Architecture. After evaluating five candidate architectures against the MRM requirements and performance targets, gradient-boosted decision trees with XGBoost were selected as the primary architecture for six of the nine models, with a monotone constraint framework applied to ensure directionally sensible feature relationships that MRM could validate. Individual-level SHAP values were computed in real-time for each origination decision, producing deterministic adverse action reason codes. Two models covering thin-file applicants used a neural architecture with attention mechanisms that provided interpretability through attention weight visualization. One model retained a constrained logistic regression architecture where a regulatory reporting requirement necessitated simple coefficient-based documentation.
Champion/Challenger Deployment Infrastructure. We built a model serving infrastructure supporting champion/challenger splits from day one, with 90% of traffic routed to new champion models and 10% to retained challengers (either updated legacy models or alternative ML architectures). Performance monitoring dashboards tracked PSI, Gini coefficient, KS statistic, and disparate impact indices daily for each model, with automated alerts at 80% of defined escalation thresholds.
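The PSI check at the heart of that monitoring stack can be sketched as below. The decile bucketing and the 0.25 escalation threshold are common industry conventions assumed here for illustration, not the bank's actual configuration; the 80%-of-threshold warning mirrors the alerting rule described above.

```python
import numpy as np


def psi(expected: np.ndarray, actual: np.ndarray, n_bins: int = 10) -> float:
    """Population Stability Index between the baseline (development-sample)
    score distribution and a recent production distribution."""
    # Bin edges come from the baseline distribution; outer edges opened to
    # catch out-of-range production scores.
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor to avoid log(0) on empty buckets.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))


def alert_level(value: float, threshold: float = 0.25) -> str:
    """Warn at 80% of the escalation threshold, escalate at the threshold."""
    if value >= threshold:
        return "escalate"
    if value >= 0.8 * threshold:
        return "warn"
    return "ok"
```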
Current model performance analysis across all 9 target models. Credit loss attribution analysis identifying model-driven versus macro-driven losses. Data asset inventory and quality profiling across 14 source systems. Model Development Plans drafted for all 9 models. MRM team review commenced. Output: approved Model Development Plans, feature data dictionary, governance framework.
340-feature library construction from production data. Feature selection process producing per-model final feature sets (ranging from 28 to 67 features per model). XGBoost and neural architecture training on 4-year historical dataset. Disparate impact testing across 7 protected categories for all models. SHAP adverse action reason code framework implemented and tested. Champion/challenger serving infrastructure built and load-tested. All 9 models passed MRM conceptual soundness review.
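A disparate impact check of the kind run above can be sketched with the four-fifths (80%) adverse impact ratio convention. The group labels, column names, and threshold below are illustrative assumptions, not the bank's actual testing methodology.

```python
import pandas as pd


def adverse_impact_ratio(decisions: pd.DataFrame,
                         group_col: str,
                         approved_col: str,
                         reference_group: str) -> pd.Series:
    """Approval rate of each group divided by the reference group's rate."""
    rates = decisions.groupby(group_col)[approved_col].mean()
    return rates / rates[reference_group]


def flag_disparate_impact(air: pd.Series, floor: float = 0.8) -> list[str]:
    """Groups whose adverse impact ratio falls below the four-fifths floor."""
    return sorted(air[air < floor].index.tolist())
```

In practice this runs once per protected category per model, on the same decision log the monitoring dashboards consume.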
All 9 models ran in shadow mode for 3 weeks alongside the production legacy models. New model decisions were compared to legacy decisions: 23% of decisions differed, and the new models' predictions were subsequently validated as more accurate at the 3-month outcome horizon. MRM validation documentation submitted and reviewed. Monitoring dashboard live and reviewed by MRM team. All 9 models received MRM approval for production deployment.
Mortgage origination model deployed week 9. Personal loan and auto loan models deployed week 10. SME lending models (3 models) deployed week 10. Credit limit management models (2 models) deployed week 11. Early warning model deployed week 11. All models live with 90/10 champion/challenger splits. Daily monitoring active. Initial performance at deployment: 34% higher Gini coefficient versus legacy models on a concurrent comparison cohort.
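The Gini comparison above is conventionally computed as Gini = 2 × AUC − 1 on a cohort scored by both models. A minimal sketch, assuming scikit-learn's `roc_auc_score` (the function and variable names are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def gini(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Gini coefficient of a score against realized default outcomes."""
    return 2.0 * roc_auc_score(y_true, y_score) - 1.0


def gini_uplift(y_true: np.ndarray,
                legacy_scores: np.ndarray,
                champion_scores: np.ndarray) -> float:
    """Relative Gini improvement of the champion over the legacy model,
    measured on the same concurrent cohort."""
    g_old = gini(y_true, legacy_scores)
    g_new = gini(y_true, champion_scores)
    return (g_new - g_old) / g_old
```

Scoring both models on the same cohort is what makes the uplift attributable to the model rather than to population drift.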
Credit risk model programs in financial services fail in two predictable ways: they either produce technically strong models that cannot pass MRM validation, or they produce MRM-compliant models with marginal performance improvement over what they replace. This engagement delivered neither failure mode. Nine models, one validation cycle, and a financial impact that was visible in the bank's charge-off rate within a quarter of deployment. That is not a typical outcome. The governance-first design methodology is the reason it happened.
Our senior advisors have built and validated credit risk models for retail banks, card issuers, consumer finance companies, and SME lenders in 18 countries. We can assess your current model performance gap and quantify the financial impact of a next-generation ML program for your specific portfolio.
Senior advisor response within 24 hours.