What Vendors Discuss and What They Don't

Every GenAI vendor has a risk slide. It addresses hallucination in general terms, notes that the technology is evolving, and recommends human oversight. It does not address the specific production failure modes that cost enterprises real money. It does not explain how the risk profile changes when you move from a demo environment to production at scale. It does not tell you what happens when a regulatory body examines your AI-generated customer communications.

We have been in the room when these incidents happen. A major European bank discovered that its customer-facing GenAI assistant had been providing subtly incorrect information about loan product terms to thousands of customers over four months. The error rate was small enough not to trigger automated quality monitoring but significant enough to create regulatory exposure. The vendor's response: the system was performing as designed, and human review requirements were disclosed in the contract.

This is not vendor malice. It is structural misalignment: vendors are incentivized to close deals, not to ensure production success. Independent advisors who do not earn referral fees or vendor commissions are structurally positioned to give you the assessment your vendor cannot. What follows is that assessment.

$8.4M
Average cost of a significant AI-related incident across regulated industries, including regulatory penalties, remediation costs, customer compensation, and reputational damage. Most incidents trace to governance gaps that were visible before deployment. Source: AI Advisory Practice analysis.

The Real Risk Inventory: Nine Risks That Actually Materialize

The following risks are drawn from actual enterprise production incidents, not from theoretical threat models. Each has been observed in multiple enterprise environments.

01
Severity: High
Confident Hallucination at Scale
Language models generate confident-sounding responses regardless of whether the retrieved context supports the claim. Unlike traditional software that returns an error when data is unavailable, GenAI generates a plausible answer. At low query volumes this is manageable. At enterprise scale, a 1% hallucination rate on a system processing 100,000 queries daily produces 1,000 incorrect responses every day.
A financial services firm deployed a RAG system to answer advisor questions about product features. The model hallucinated a product feature that did not exist in 3.2% of queries. At 8,000 daily queries, this produced 256 incorrect advisor answers every day until detection in week six.
Mitigation
Design for confident abstention. Systems must detect when retrieved context does not support the query and respond with "I cannot find that information" rather than generating an answer. Implement automated response auditing with semantic similarity scoring against source documents.
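The abstention pattern above can be sketched as a pre-generation gate: score the query against the retrieved context and refuse to generate when support is too weak. This is a minimal illustration, not a production implementation; the function names and the 0.25 threshold are illustrative, and the bag-of-words cosine here is a stand-in for embedding-based semantic similarity.

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a stand-in for embedding similarity."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

ABSTENTION_MESSAGE = "I cannot find that information in the available documents."

def answer_or_abstain(query: str, retrieved_docs: list[str],
                      generate, threshold: float = 0.25) -> str:
    """Generate an answer only when at least one retrieved document
    plausibly supports the query; otherwise abstain explicitly."""
    best = max((cosine_similarity(query, d) for d in retrieved_docs), default=0.0)
    if best < threshold:
        return ABSTENTION_MESSAGE
    return generate(query, retrieved_docs)
```

The same similarity score can be reused post-generation to audit responses against their source documents, as the mitigation recommends.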
02
Severity: High
Data Exfiltration Via Prompt Injection
Prompt injection attacks embed instructions in user input or retrieved documents that cause the model to behave differently than intended. In RAG systems where the model has access to multiple documents, an attacker can craft input that causes the model to exfiltrate content from documents it retrieved for the query. This is not theoretical: it has been demonstrated on production systems.
A legal research assistant processed opposing party documents submitted for review. An opposing counsel with knowledge of the system embedded instructions in a submitted document that caused the model to include internal case strategy notes in its response summary.
Mitigation
Implement prompt injection detection at the input layer. Treat all user-submitted content and external documents as potentially adversarial. Use output filtering to detect unexpected document references. Restrict tool access to the minimum required for each use case.
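As a concrete illustration of input-layer screening, the sketch below flags common injection phrasings in user-submitted or retrieved text. This is a naive pattern screen only, under the assumption that it runs as a first pass; pattern lists are trivially evaded, so a real deployment would layer it with classifier-based detection and the output filtering described above.

```python
import re

# Illustrative patterns only; attackers routinely evade static lists.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard (the )?system prompt",
    r"you are now",
    r"reveal (your|the) (system prompt|instructions)",
]

def flag_injection(text: str) -> list[str]:
    """Return the injection patterns matched in user or document text."""
    return [p for p in INJECTION_PATTERNS if re.search(p, text, re.IGNORECASE)]

def screen_document(doc: str) -> bool:
    """True if the document is safe to pass into the model context."""
    return not flag_injection(doc)
```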
03
Severity: High
Regulatory Non-Compliance in Customer Communications
Financial services, healthcare, and insurance regulators have specific requirements for customer-facing communications: disclosure requirements, fair dealing standards, and prohibition on misleading statements. GenAI systems generating customer-facing content can violate these requirements in ways that are difficult to detect through standard quality monitoring because the violations are subtle rather than obvious.
A top-15 insurer's customer service GenAI included coverage language that technically complied with its system prompt but omitted a required disclosure that human agents were trained to include. The omission was compliant in most jurisdictions but created exposure in two US states with additional disclosure requirements.
Mitigation
Mandatory legal and compliance pre-approval for any GenAI system generating customer-facing content. Implement jurisdiction-specific constraint libraries that are version-controlled and updated when regulatory requirements change. Treat GenAI-generated customer communications as a regulated activity, not a technology project.
04
Severity: High
Unauthorized Data Access Through Permission Gaps
Enterprise GenAI systems that index organizational knowledge bases frequently contain documents with different access levels. When permission enforcement is implemented at the output layer rather than at retrieval, users can access content they are not authorized to see by asking questions that cause the model to include restricted content in its response.
An internal legal Q&A system indexed the general counsel's complete file structure, including privileged memoranda. A business unit employee asked a question about a regulatory matter. The model retrieved a privileged memo and incorporated its contents into the answer. The memo was marked confidential, but the permission check was applied to the final response, not to document retrieval.
Mitigation
Enforce permissions at the retrieval layer. Documents that a user cannot access should not enter the retrieved context at all. A post-generation filter on output is not adequate governance. Audit your vector database access control architecture before deployment.
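The retrieval-layer principle can be sketched as follows: access control is applied to the candidate set before ranking, so an unauthorized document never enters the model context. This is a minimal illustration; the `Document` shape, group-based ACL model, and term-overlap relevance score are all assumptions standing in for a real vector store and embedding ranker.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    doc_id: str
    text: str
    allowed_groups: set[str] = field(default_factory=set)

def retrieve(query: str, index: list[Document],
             user_groups: set[str], top_k: int = 5) -> list[Document]:
    """Permission-aware retrieval: documents the user cannot read are
    excluded BEFORE ranking, so they never reach the model context."""
    visible = [d for d in index if d.allowed_groups & user_groups]
    # Stand-in relevance score: count of query terms shared with the document.
    q = set(query.lower().split())
    visible.sort(key=lambda d: len(q & set(d.text.lower().split())), reverse=True)
    return visible[:top_k]
```

Contrast this with output filtering: by the time a post-generation filter runs, the privileged content has already influenced the model's answer.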
05
Severity: Medium-High
Model Performance Degradation After Deployment
GenAI systems degrade in production for reasons that traditional monitoring does not catch. The underlying model may be updated by the vendor. The document base may evolve so retrieved context becomes less relevant. User behavior may drift toward query types outside the system's designed scope. Without continuous evaluation, you will not detect degradation until users complain or an incident forces investigation.
A procurement team's GenAI research assistant performed well for 90 days after deployment. A vendor updated the underlying model weights without advance notice. Response quality on the specific procurement negotiation tasks the system was designed for declined 23% by standard RAGAS metrics. The team attributed the degradation to user experience issues for six weeks before the technical cause was identified.
Mitigation
Implement continuous automated evaluation against a representative golden dataset. Establish performance thresholds that trigger human review. Require vendor contractual notification for model updates that may affect performance. Do not assume deployment-time performance persists without measurement.
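A minimal sketch of that evaluation loop, under the assumption of a golden dataset of (query, expected answer) pairs and a pluggable scoring function (in practice, a RAGAS-style metric rather than the toy scorer used in testing):

```python
def evaluate_against_golden(system, golden: list[tuple[str, str]],
                            score_fn, alert_threshold: float = 0.85) -> dict:
    """Score live system answers against a golden dataset and flag the
    deployment for human review when mean quality drops below threshold.

    system:    callable query -> answer (the deployed GenAI system)
    golden:    representative (query, expected_answer) pairs
    score_fn:  callable (answer, expected) -> float in [0, 1]
    """
    scores = [score_fn(system(q), expected) for q, expected in golden]
    mean = sum(scores) / len(scores)
    return {"mean_score": mean, "needs_review": mean < alert_threshold}
```

Run on a schedule, this would have surfaced the 23% degradation in the procurement example within days rather than six weeks.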
06
Severity: Medium-High
Intellectual Property and Training Data Exposure
When organizational content is submitted to a GenAI vendor's API without a clear data processing agreement, that content may be used for model training. This creates two distinct risks: your proprietary data may influence future model outputs for competitor users, and you may not have clear IP ownership of AI-generated content in jurisdictions where training data provenance matters.
A professional services firm's employees were using consumer-grade ChatGPT accounts to draft client deliverables. The firm's proprietary methodologies and client-specific information were being submitted to OpenAI under default consumer terms that permitted training data use. The firm discovered this during an information security audit, not through proactive risk management.
Mitigation
Never use consumer-grade GenAI accounts for enterprise content. Require enterprise agreements with explicit opt-out from training data use. Implement a shadow AI policy and monitoring program. Review data processing agreements for every GenAI tool in use across the organization.
07
Severity: Medium
EU AI Act High-Risk Classification
The EU AI Act's high-risk classification covers systems used in employment decisions, credit decisions, biometric identification, critical infrastructure, and several other categories. Organizations that classify their GenAI systems as low-risk tools without legal review of the actual use case may be deploying unregistered high-risk AI systems. The penalties are significant: up to 3% of global annual revenue for high-risk violations.
A UK-based insurer deployed a GenAI system to assist underwriters with risk assessment questions. Legal counsel was not consulted before deployment. A post-deployment review identified that the system fell within the EU AI Act's high-risk category due to its involvement in insurance underwriting decisions for EU customers. Bringing the system into compliance required significant remediation.
Mitigation
Conduct EU AI Act classification review for every GenAI deployment before go-live. Legal and compliance teams must participate in use case definition, not just security review. Build a GenAI system registry that tracks risk classifications and compliance status for all deployed systems.
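The registry recommended above can be as simple as a structured record per system with an enforced gate. The sketch below is illustrative only: the class names and the rule that high-risk systems cannot be registered without completed legal review are assumptions about how an organization might encode its own policy, not a statement of what the EU AI Act requires.

```python
from dataclasses import dataclass
from enum import Enum

class AiActRisk(Enum):
    MINIMAL = "minimal"
    LIMITED = "limited"
    HIGH = "high"
    PROHIBITED = "prohibited"

@dataclass
class RegistryEntry:
    system_name: str
    use_case: str
    risk_class: AiActRisk
    legal_review_done: bool

class GenAiRegistry:
    """Tracks every deployed GenAI system with its risk classification."""

    def __init__(self) -> None:
        self._entries: dict[str, RegistryEntry] = {}

    def register(self, entry: RegistryEntry) -> None:
        # Illustrative policy gate: no high-risk system goes live
        # without a completed legal review.
        if entry.risk_class is AiActRisk.HIGH and not entry.legal_review_done:
            raise ValueError(
                f"{entry.system_name}: high-risk system requires legal review")
        self._entries[entry.system_name] = entry

    def high_risk_systems(self) -> list[str]:
        return [n for n, e in self._entries.items()
                if e.risk_class is AiActRisk.HIGH]
```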
08
Severity: Medium
Vendor Lock-In and Switching Cost Underestimation
Enterprise GenAI systems embed vendor dependencies at multiple layers: the underlying model API, the vector database, the orchestration framework, and the fine-tuning data. When the vendor relationship changes through pricing increases, capability changes, or acquisition, the switching cost is substantially higher than the organization anticipated when it signed the initial contract.
A Fortune 500 manufacturer built its manufacturing process Q&A system on a specialized GenAI vendor's proprietary orchestration framework. When the vendor raised per-query pricing by 340% following a Series C funding round, the cost to migrate to an alternative architecture was estimated at $2.1M and 8 months, compared to the $800K original build cost.
Mitigation
Design for portability from the start. Use open standards for document indexing and retrieval where possible. Negotiate data portability and model portability clauses in vendor contracts. Avoid proprietary orchestration frameworks for core enterprise systems without exit strategy analysis.
09
Severity: Medium (but growing)
GenAI in Agentic Tasks: Cascading Error Risk
As organizations move from GenAI as a drafting tool to GenAI as an autonomous agent executing multi-step tasks, the risk profile changes fundamentally. A hallucination in a drafted document can be caught in review. A hallucination in step two of a seven-step automated workflow can propagate through subsequent steps and execute irreversible actions before detection.
A financial operations team deployed an agentic workflow to process vendor invoices: classify, extract data, match to purchase order, and approve payment below $5,000. The model misclassified a $4,800 invoice and approved payment to a fraudulent vendor that had submitted a realistic-looking invoice. The automated approval bypassed the human review step that would have caught the mismatch.
Mitigation
Every irreversible action in an agentic workflow requires a human authorization checkpoint. Start with fully reversible task automation only. Define explicit error recovery procedures for each workflow step before deployment. Never allow agentic AI to execute financial transactions without senior-level approval architecture.
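The authorization-checkpoint principle can be sketched as a workflow runner that refuses to execute any step marked irreversible without an explicit human sign-off. This is a minimal illustration under assumed names (`WorkflowStep`, `run_workflow`); in the invoice example above, `approve_payment` would carry the irreversible flag, so the fraudulent payment halts at the checkpoint instead of executing.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class WorkflowStep:
    name: str
    action: Callable[[dict], dict]   # transforms workflow state
    irreversible: bool = False       # True => requires human sign-off

def run_workflow(steps: list[WorkflowStep], state: dict,
                 authorize: Callable[[str, dict], bool]) -> dict:
    """Execute steps in order; every irreversible step must be explicitly
    authorized by a human before it runs, or the workflow halts."""
    for step in steps:
        if step.irreversible and not authorize(step.name, state):
            state["halted_at"] = step.name
            return state
        state = step.action(state)
    return state
```

The `authorize` callback is where the human enters the loop: a queue, an approval UI, or a senior-approver policy, never an automatic `True`.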

Ten Questions Your Vendor Should Answer Before You Deploy

The following questions are designed to expose gaps in vendor governance claims. A vendor that cannot answer these questions with specificity has not built an enterprise-grade system.

01
What is the measured hallucination rate for this specific use case on a representative dataset? Not benchmark performance on public datasets. Your use case, your data, your error rate.
02
How does the system behave when retrieved context does not support the query? Show us examples of abstention responses, not just correct answers.
03
Does your enterprise agreement explicitly exclude our data from training purposes? Get the specific contract clause reference, not a general assurance.
04
How are document-level permissions enforced at retrieval time rather than through output filtering? Walk us through the architecture for a user who should not have access to a specific document class.
05
What advance notice do you provide before model weight updates that may affect performance? And what remediation do you offer if performance degrades after an update?
06
What are our data portability rights if we want to migrate to a different provider? Specifically: our fine-tuned model weights, our vector index, our prompt library, and our evaluation datasets.
07
What is your EU AI Act risk classification methodology for our specific use case? Have your legal team reviewed this classification?
08
What prompt injection defenses are implemented, and what penetration testing has been conducted? Request the penetration test report summary, not the marketing description.
09
How do you detect and alert on system performance degradation in production? Show us the monitoring dashboard, not the slide describing it.
10
What is included in your incident response SLA and how quickly must you notify us of a data breach affecting our content? The contract language matters more than verbal assurances.
Research Report
Enterprise AI Governance Handbook (56 pages)
Four-tier risk classification, EU AI Act compliance roadmap, model lifecycle governance, ethics and fairness program design, and board reporting frameworks. Essential governance reading for any enterprise GenAI program.
Download Free →

Why Independent Risk Assessment Matters More Than Vendor Assurances

Vendors are not the right source for GenAI risk assessment. This is not a criticism of vendor intentions; it is an observation about structural incentives. A vendor that acknowledges production risk creates friction in the sales process. An independent advisor who surfaces the same risk creates value by helping you avoid a production incident.

Our AI Governance advisory practice includes pre-deployment risk assessments that specifically evaluate the nine risk categories above against your proposed use case, data environment, and organizational context. We do not earn referral fees or vendor commissions. Our only incentive is helping you deploy systems that work safely in production.

If you are planning a GenAI deployment in the next six months, the governance architecture decision should happen before the technology selection decision. What you put in place around the system determines whether it creates value or creates incidents. The most important thing our governance-first clients have in common is that they made risk assessment the first step, not an afterthought.

Get an independent GenAI risk assessment
Our senior advisors evaluate your proposed GenAI deployment against the nine production risk categories in this article. No vendor relationships, no referral fees. Delivered in two weeks by practitioners who have seen these incidents happen.
Request Risk Assessment →
Take the free AI readiness assessment
Evaluate your governance posture, data environment, and organizational readiness for GenAI deployment before committing to a platform or vendor.
Start Free Assessment →
The AI Advisory Insider
Weekly intelligence on enterprise AI strategy, governance, and deployment. Read by 12,000+ senior leaders.