Why AI Auditing Is No Longer Optional
The EU AI Act, effective for high-risk AI systems from August 2026, mandates conformity assessments before deployment and ongoing monitoring documentation afterward. The Federal Reserve's SR 11-7 guidance on model risk management applies explicitly to AI models in financial services. EEOC guidance requires employers using AI in hiring decisions to conduct adverse impact analysis. Across sectors, the regulatory direction is the same: document what your AI systems are doing and prove they are doing it safely, fairly, and as intended.
Beyond regulation, the operational case for AI auditing is compelling. Enterprises that conduct regular AI audits detect model drift 4.7 times faster than those relying on incident reports. They identify fairness problems before they escalate into regulatory action. And they build the institutional knowledge of their AI portfolio that is required to govern it effectively.
The challenge is that most enterprises have not built audit capability. They have built model development capability and model deployment capability. The audit function — independent evaluation against defined standards — has been assumed rather than designed. This guide addresses that gap.
Under the EU AI Act, providers of high-risk AI systems must maintain technical documentation sufficient for competent authorities to assess the system's conformity with the Act. This documentation must be kept for 10 years after the system is placed on the market. Retroactive documentation is not accepted. The audit trail must exist from deployment.
Defining Audit Scope
The first decision in any AI audit program is what gets audited. Not all AI systems warrant the same audit intensity. A risk-based approach focuses audit resources where the exposure is highest.
Audit scope determination flows from your AI system inventory and risk classification. Systems classified as high-risk require comprehensive audits on a defined schedule. Medium-risk systems require periodic sampling audits. Low-risk systems require only automated monitoring with periodic review.
The factors that determine audit priority are the same factors that determine risk classification: the consequences of error, the affected population, the reversibility of decisions, the degree of human oversight in the existing process, and the maturity of the system. For most enterprises, the initial audit program should target the 20% of systems that represent 80% of the risk exposure.
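The risk-based prioritization described above can be sketched as a weighted score. The factor names, weights, and the 20% cutoff below are illustrative assumptions, not a prescribed scheme; calibrate them against your own risk taxonomy.

```python
from dataclasses import dataclass

# Hypothetical weights for the five factors named in the text --
# tune these to your own risk classification scheme.
WEIGHTS = {
    "error_consequence": 0.30,  # severity of a wrong decision
    "population_size":   0.20,  # size of the affected population
    "irreversibility":   0.20,  # how hard decisions are to undo
    "low_oversight":     0.15,  # less human review -> more risk
    "immaturity":        0.15,  # newer systems carry more unknowns
}

@dataclass
class AISystem:
    name: str
    factors: dict  # each factor scored 0.0 (low) to 1.0 (high)

def audit_priority(system: AISystem) -> float:
    """Weighted risk score in [0, 1]; higher means audit sooner."""
    return sum(WEIGHTS[f] * system.factors.get(f, 0.0) for f in WEIGHTS)

def top_risk_tier(systems: list, fraction: float = 0.2) -> list:
    """Return the top `fraction` of systems by priority -- the ~20%
    that typically carries ~80% of the exposure."""
    ranked = sorted(systems, key=audit_priority, reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))
    return ranked[:cutoff]
```

In practice the initial audit plan would be seeded from `top_risk_tier` over the full inventory, then adjusted by governance judgment for systems under active regulatory scrutiny.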
A critical prerequisite: you cannot audit what you cannot inventory. Before building an audit program, complete a comprehensive AI system inventory. Include systems built internally, systems purchased from vendors, systems built by business units without central oversight, and models embedded in commercial software you deploy. Most enterprises discover substantially more AI systems than they thought they had.
The Six Dimensions of an AI Audit
A comprehensive AI audit evaluates a system across six dimensions. Each dimension requires distinct evaluation methodology and produces distinct evidence for the audit record.
The Audit Methodology
AI audits follow a six-phase methodology. Each phase produces documented outputs that together constitute the audit record. The audit record must be complete enough that a different auditor reviewing the same documentation would reach the same conclusions. Evidence collection for each audit typically includes:
- Request production inference logs for the audit period
- Obtain training data documentation and provenance records
- Interview model owner, data scientists, and operational stakeholders
- Review all governance artifacts from the model's lifetime in production
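Evidence collection is where audits most often stall, so the requests above are worth tracking explicitly. A minimal sketch, with illustrative field names (nothing here is a prescribed schema):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class EvidenceItem:
    """One entry in the audit evidence catalog (illustrative fields)."""
    item: str                        # e.g. "production inference logs, audit period"
    source: str                      # system or stakeholder that supplies it
    requested: date
    received: Optional[date] = None  # None while the request is outstanding

    @property
    def outstanding(self) -> bool:
        return self.received is None

def outstanding_requests(catalog: list) -> list:
    """Items still owed to the audit team -- chase these before fieldwork closes."""
    return [e.item for e in catalog if e.outstanding]
```

The same catalog later feeds the evidence index required in the audit record, so capturing source and date at request time avoids reconstructing provenance afterward.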
The Finding Severity Framework
AI audit findings must be classified by severity to prioritize remediation and communicate risk to governance bodies. This four-tier framework is aligned with regulatory expectations and governance best practice.
| Severity | Definition | Remediation Timeline | Escalation |
|---|---|---|---|
| CRITICAL | Active harm occurring or imminent. Regulatory violation confirmed. Fundamental fairness failure affecting a protected class. Immediate suspension of model operation may be required. | 5 days for plan; 30 days for remediation or interim controls | Board notification required. Regulatory notification may be required. |
| HIGH | Significant risk of harm or regulatory violation. Fairness metrics materially outside defined bounds. Material performance degradation. Documentation insufficient for regulatory review. | 30-day remediation plan; 90-day completion | CAO notification. Governance committee review within 2 weeks. |
| MEDIUM | Risk present but not immediately materializing. Process gaps that could become high-severity under adverse conditions. Documentation incomplete but governance controls functioning. | 90-day remediation plan; 180-day completion | Governance committee tracking. Monthly progress reporting. |
| LOW | Best practice not followed. Minor documentation gaps. Efficiency or quality improvements available that do not constitute risk exposures. | Next release cycle | Model owner tracking. Annual audit follow-up. |
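The four-tier framework in the table translates directly into remediation deadlines. A minimal sketch (LOW findings ride the next release cycle rather than a fixed clock, so they return no dates):

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass(frozen=True)
class SeverityPolicy:
    plan_days: Optional[int]      # days to deliver a remediation plan
    complete_days: Optional[int]  # days to complete remediation or interim controls
    escalation: str

# Encodes the four-tier framework from the table above.
POLICIES = {
    "CRITICAL": SeverityPolicy(5, 30, "Board notification; regulatory notification may be required"),
    "HIGH":     SeverityPolicy(30, 90, "CAO notification; governance committee review within 2 weeks"),
    "MEDIUM":   SeverityPolicy(90, 180, "Governance committee tracking; monthly progress reporting"),
    "LOW":      SeverityPolicy(None, None, "Model owner tracking; annual audit follow-up"),
}

def deadlines(severity: str, found_on: date):
    """Return (plan_due, remediation_due) for a finding, or (None, None) for LOW."""
    p = POLICIES[severity]
    if p.plan_days is None:
        return None, None
    return (found_on + timedelta(days=p.plan_days),
            found_on + timedelta(days=p.complete_days))
```

Computing deadlines at the moment a finding is logged, rather than at reporting time, keeps the remediation clock anchored to discovery, which is the date regulators will ask about.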
Documentation Standards That Hold Up
AI audit documentation must satisfy two audiences: internal governance leadership who need to manage risk, and regulators who may review documentation during examination. The documentation standards must be calibrated to the more demanding of the two, which in regulated industries is almost always the regulator.
The audit record for each system must contain:
- System identification: System name, version, deployment environment, purpose, affected population, and decision type.
- Audit scope and limitations: What was evaluated, what evaluation methodology was used, and what limitations applied to the evaluation (data availability, sample sizes, access restrictions).
- Evidence catalog: Index of all evidence collected, with source, date, and how each piece of evidence was used in the evaluation.
- Quantitative results: All quantitative evaluation results in tabular form, with comparison to defined acceptance criteria.
- Finding register: All findings with severity classification, evidence basis, remediation recommendation, model owner response, and remediation status.
- Auditor attestation: Auditor identity (internal or external), independence declaration, and attestation that the evaluation was conducted in accordance with the defined methodology.
A critical test for documentation adequacy: could a regulator who had never seen this system reconstruct what it does, who it affects, how its risks were assessed, and what was done about identified problems? If not, the documentation is insufficient.
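The six required sections lend themselves to a mechanical completeness check before any human review. This is a sketch with assumed section keys, not a substitute for the regulator-reconstruction test, which needs human judgment:

```python
# The six required sections of the audit record, per the list above.
REQUIRED_SECTIONS = [
    "system_identification",
    "scope_and_limitations",
    "evidence_catalog",
    "quantitative_results",
    "finding_register",
    "auditor_attestation",
]

def documentation_gaps(record: dict) -> list:
    """Sections missing or empty. Any hit means the record cannot pass
    the 'could a regulator reconstruct this?' test, whatever else it contains."""
    return [s for s in REQUIRED_SECTIONS if not record.get(s)]
```

A gate like this is best run in the documentation pipeline itself, so an audit record cannot be marked final while any section is empty.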
Setting the Right Audit Cadence
Risk-based audit cadence means high-risk systems are audited more frequently than low-risk systems. The baseline cadence for most enterprise AI governance programs:
High-risk systems (EU AI Act high-risk classification, or systems where errors cause material individual harm): comprehensive audit annually, with continuous automated monitoring and quarterly metric review in between. Any significant change to model architecture, training data, or deployment context triggers an out-of-cycle audit.
Medium-risk systems: comprehensive audit every two years, with annual sampling review. Material incidents trigger out-of-cycle audit.
Low-risk systems: audit incorporated into annual governance review. Automated monitoring provides the primary assurance.
The cadence must also account for model changes. A model that appears stable on its original training distribution may be drifting silently. Any model that has not been re-evaluated in 24 months should be considered due for comprehensive review regardless of its risk classification, because the deployment environment changes even when the model does not.
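The cadence rules above, including the 24-month backstop and the out-of-cycle trigger for material changes, can be expressed as a small scheduling function. The interval values are illustrative (365-day years, with the low-risk tier folded into the annual governance review):

```python
from datetime import date, timedelta

# Baseline comprehensive-review intervals from the text.
CADENCE_DAYS = {
    "high": 365,    # annual comprehensive audit
    "medium": 730,  # every two years, with annual sampling in between
    "low": 365,     # folded into the annual governance review
}
MAX_STALENESS_DAYS = 730  # 24-month backstop regardless of risk tier

def next_audit_due(risk_tier: str, last_audit: date,
                   material_change: bool = False) -> date:
    """Next comprehensive review date for a system. A material change to
    architecture, training data, or deployment context triggers an
    immediate out-of-cycle audit."""
    if material_change:
        return date.today()
    due = last_audit + timedelta(days=CADENCE_DAYS[risk_tier])
    backstop = last_audit + timedelta(days=MAX_STALENESS_DAYS)
    return min(due, backstop)
```

The `min` against the backstop is what enforces the 24-month rule: even a medium-risk system on a two-year cycle can never drift past it, and any tier added later with a longer interval is automatically capped.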
Internal Versus External Auditors
The independence question is central to AI audit credibility. An audit conducted entirely by the team that built and operates the model does not produce independent assurance. It produces self-assessment. Self-assessment has value, but it is not an audit and should not be represented as one.
The practical resolution for most enterprises is a hybrid model. Internal audit teams, working with AI governance expertise, conduct the routine annual audit cycle. Independent external auditors are brought in for: initial program establishment, high-stakes systems where regulatory credibility is paramount, any system that is the subject of regulatory inquiry, and periodic validation that the internal audit methodology is sound.
External auditors provide independence and regulatory credibility that internal teams cannot self-certify. They also bring pattern recognition from auditing AI programs across multiple organizations that is not available to internal teams working within a single enterprise.
For the governance framework into which auditing fits, see our enterprise AI governance framework guide. For how auditing connects to bias management, see our AI bias and fairness guide. For the responsible AI operating model context, see our responsible AI practical guide.
Building the Audit Function
Standing up an enterprise AI audit capability requires three things: methodology, tooling, and people. Enterprises that approach this in the wrong sequence spend 18 months building the wrong thing.
Start with methodology. Define what audits evaluate, how they evaluate it, what evidence is required, and how findings are classified. The methodology document should be reviewed by legal and compliance before any audit work begins, because it defines the standard against which systems will be assessed and shapes the legal defensibility of the audit record.
Then address tooling. Audit tooling includes bias evaluation libraries, model interpretability tools, performance monitoring infrastructure, and documentation management systems. Select tooling that integrates with your existing MLOps environment rather than creating parallel data pipelines that are expensive to maintain.
Then hire or develop the people. An AI audit function needs data scientists who understand model evaluation methodology, governance professionals who understand regulatory requirements, and an audit lead with enough organizational authority to report findings without fear of reprisal from business units whose models receive critical findings. That last requirement is often the hardest to satisfy.
To explore how our advisors can help establish or strengthen your AI audit capability, visit our AI Governance service page or start with our free AI assessment.
Build AI Audit Capability That Satisfies Regulators
Our senior advisors help enterprises establish AI audit programs, conduct independent audits, and build the documentation infrastructure that satisfies regulatory review.