Why Vector Databases Are Now a Production Necessity

Generative AI adoption in the enterprise has produced a new class of infrastructure requirement: the ability to search large collections of embeddings at low latency. A retrieval-augmented generation system that takes three seconds to find relevant context before generating a response is not a product. It is a demo.

Traditional relational databases handle vector similarity search poorly. PostgreSQL with pgvector works at small scale and is excellent for teams that already run Postgres and need basic semantic search. At 100 million vectors with sub-100ms p99 latency requirements, it starts to show its limits. Dedicated vector databases exist precisely to handle these workloads at production scale.

The question enterprises now face is not whether to adopt a vector database. It is which one, in what deployment model, and integrated how deeply into the existing data platform. The wrong answers to those questions create migration costs that tend to materialise eighteen months after the initial selection.

Selection Context

Vector database selection is not primarily a technical decision. It is a total cost of ownership decision with technical constraints. The cheapest option at 10 million vectors is often not the cheapest option at 500 million vectors. Factor in operational cost, not just licensing.

The Evaluation Dimensions That Actually Matter

Most vendor comparison articles for vector databases focus on benchmark query speeds. Benchmarks matter, but they are rarely the deciding factor for enterprise selection. The dimensions that actually drive enterprise decisions are more operational than algorithmic.

1. Query Latency at Production Scale
p95 and p99 latency at your anticipated vector count, not the vendor's demo dataset. Request benchmarks at 10x your current scale. Latency profiles change significantly above 50 million vectors.
Ask for: p99 latency at 100M vectors, 1K QPS

2. Hybrid Search Support
Pure vector similarity search is insufficient for most enterprise use cases. You need to filter by metadata (date, category, user ID) simultaneously with the vector search. Not all systems handle this without performance degradation.
Ask for: latency impact of adding a 3-field metadata filter

3. Data Residency and Security
For regulated industries, where vector data is stored and processed matters. Embeddings of sensitive documents can be partially reverse-engineered. Your security team needs to approve the data residency model before you commit to a vendor.
Ask for: SOC 2 report, data residency options, encryption key management

4. Total Cost of Ownership at Scale
Managed SaaS has low operational cost but high per-query cost at volume. Self-hosted has low per-query cost but requires engineering time for operations. Model the cost at 3x your current usage before committing.
Ask for: cost at 500M vectors, 10K QPS, 6 months retention

5. Operational Maturity and Support
Vector databases are a young category; many are less than three years old. Evaluate GitHub activity, documented breaking changes, enterprise SLA availability, and the vendor's financial stability. You are building a production dependency here.
Ask for: enterprise SLA terms, upgrade path history, named support engineer

6. Integration with Existing Data Stack
A vector database that requires a separate ETL pipeline from your existing data platform doubles your operational surface area. Prefer options that integrate with your existing orchestration, monitoring, and access control patterns.
Ask for: Spark connector, IAM integration, existing monitoring compatibility
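To make the hybrid search dimension concrete, here is a minimal pure-Python sketch of filtered vector search: apply the metadata filter first, then rank the survivors by cosine similarity. Production engines do this against an ANN index with pre- or post-filtering rather than a brute-force scan, and the records, field names, and vectors here are illustrative, not any vendor's API.

```python
import math

def cosine(a, b):
    # Cosine similarity between two non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def filtered_search(records, query_vec, filters, top_k=3):
    """Brute-force filtered vector search: metadata filter first,
    then rank the remaining records by similarity to the query."""
    candidates = [
        r for r in records
        if all(r["meta"].get(k) == v for k, v in filters.items())
    ]
    candidates.sort(key=lambda r: cosine(r["vec"], query_vec), reverse=True)
    return candidates[:top_k]

# Hypothetical toy records; real embeddings would have hundreds of dimensions.
docs = [
    {"id": "a", "vec": [1.0, 0.0], "meta": {"dept": "legal", "year": 2024}},
    {"id": "b", "vec": [0.9, 0.1], "meta": {"dept": "hr", "year": 2024}},
    {"id": "c", "vec": [0.0, 1.0], "meta": {"dept": "legal", "year": 2023}},
]

hits = filtered_search(docs, [1.0, 0.0], {"dept": "legal"}, top_k=2)
print([h["id"] for h in hits])  # → ['a', 'c']
```

The performance question in the dimension above is exactly the part this sketch hides: a brute-force scan degrades linearly, whereas a real engine must keep the filter from destroying its index's sub-linear behaviour.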

The Major Options: A Practical Assessment

This is not a comprehensive benchmark. It is an honest practitioner assessment of where each major option fits and where it does not. Benchmarks change with every release; architectural fit changes rarely.

Pinecone (Managed SaaS)
  • Fastest time to production of any option
  • Excellent managed scaling and uptime
  • Strong enterprise sales and support tier
  • Highest cost per query at volume
  • No self-hosted option
Best for: teams that need production quickly and can absorb managed SaaS cost

Weaviate (Open Source / Managed)
  • Strong hybrid search (BM25 + vector)
  • Rich filtering with the ACORN algorithm
  • Self-hosted and managed options
  • Active open source community
  • Higher operational overhead when self-hosted
Best for: use cases requiring strong hybrid search and flexible deployment

Qdrant (Open Source / Managed)
  • Among the highest query throughput in public benchmarks
  • Rust-based, with a low memory footprint
  • Payload filtering with minimal latency penalty
  • Good multi-vector support
  • Smaller enterprise customer base than Pinecone
Best for: high-throughput, cost-sensitive workloads where engineering can operate self-hosted

Milvus (Open Source / Managed via Zilliz)
  • Designed for billion-scale vector workloads
  • Distributed architecture for extreme scale
  • Strong Kubernetes-native deployment
  • Zilliz Cloud for the managed option
  • Complex operational requirements
Best for: very large-scale use cases (1B+ vectors) with dedicated platform engineering

pgvector (PostgreSQL Extension)
  • Zero new infrastructure if you already run Postgres
  • ACID compliance and JOIN support
  • Rapid improvement with pgvectorscale
  • Performance ceiling around 10 million vectors
  • No managed horizontal scaling
Best for: existing Postgres users with moderate vector volumes and strong SQL requirements

Cloud-Native Services (AWS, GCP, Azure)
  • Deep integration with cloud data platforms
  • Familiar IAM and billing
  • Strong if you are already all-in on one cloud
  • Performance often trails dedicated solutions
  • Highest vendor lock-in risk
Best for: organisations fully committed to a single cloud with tight integration requirements

Decision Framework: Which Option for Which Use Case

RAG for Internal Documents
Enterprise knowledge base, policy search, internal copilots.
Vector count typically under 50 million. Hybrid search (keyword + semantic) is important for precise document retrieval. Low to medium QPS. Metadata filtering by department, date, and document type is standard.
Recommendation: Weaviate (self-hosted) or pgvector for Postgres shops. Pinecone if speed to production is critical.

Customer-Facing Semantic Search
Product search, content discovery, e-commerce recommendation.
High QPS with strict latency requirements. Catalogue vectors often exceed 100 million for large retailers. Real-time index updates as inventory changes. Cost per query matters significantly at scale.
Recommendation: Qdrant self-hosted for cost efficiency, Pinecone for managed simplicity. Avoid pgvector above 20M vectors here.

Fraud and Risk Signals
Entity similarity, transaction pattern matching, anomaly detection.
Extreme latency sensitivity (sub-20ms). Data residency and security requirements are strict. Metadata filtering complexity is high. Volume can be very large for transaction-level embeddings.
Recommendation: Qdrant or Milvus self-hosted in private cloud. Cloud-native if security review approves. Avoid shared SaaS for PII-adjacent embeddings.

Multimodal Search (Image, Audio, Video)
Visual search, media asset management, content moderation.
High-dimensional embeddings (512 to 2048 dimensions). Larger storage requirements per vector. Multi-vector indexing where both image and text embeddings represent the same asset.
Recommendation: Qdrant (strong multi-vector support) or Milvus for very large collections. Test dimension reduction strategies to manage storage cost.

The Hidden Cost: Embedding Storage at Scale

Vector database costs are often underestimated because the vector count at project inception is not the vector count eighteen months later. A document management system that starts at 5 million vectors grows with every document processed. A recommendation system that embeds every product interaction can reach billions of vectors in a mature e-commerce business.

The cost of storing a float32 embedding at 1536 dimensions (the OpenAI ada-002 standard) is approximately 6KB per vector (1536 dimensions × 4 bytes). At 100 million vectors, that is roughly 600GB of raw storage before indexing overhead. Most vector databases add 1.5x to 3x storage overhead for their index structures. Plan for roughly 2TB at 100 million vectors and budget accordingly.
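The arithmetic above can be sanity-checked in a few lines. This models storage only; actual overhead varies by engine and index type, and the 1.5x-3x range is the one quoted in this section.

```python
DIM = 1536               # OpenAI ada-002 embedding width
BYTES_F32 = 4            # bytes per float32 component
VECTORS = 100_000_000

raw = DIM * BYTES_F32 * VECTORS    # raw vector payload in bytes
raw_gb = raw / 1e9

# Index structures typically multiply raw storage by ~1.5x to 3x.
low, high = raw_gb * 1.5, raw_gb * 3.0
print(f"raw: {raw_gb:.0f} GB, with index: {low:.0f}-{high:.0f} GB")
# → raw: 614 GB, with index: 922-1843 GB
```

The upper end of that range is where the "plan for roughly 2TB" guidance comes from.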

Quantisation reduces this significantly. int8 quantisation reduces storage by 4x with typically 1 to 3% recall degradation. Binary quantisation reduces by 32x with higher recall loss but can be acceptable for certain use cases. Both Qdrant and Weaviate have strong quantisation support. Factor this into your cost model when comparing managed options.
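A quick comparison of those quantisation options at the same 100-million-vector scale. Recall impact is not modelled here; the reduction factors are the nominal ones quoted above, and real savings depend on the engine's implementation.

```python
DIM, VECTORS = 1536, 100_000_000
f32_gb = DIM * 4 * VECTORS / 1e9   # float32 baseline in GB

# Nominal storage-reduction factors for each scheme.
schemes = {
    "float32": 1,    # baseline
    "int8": 4,       # ~4x smaller, typically 1-3% recall degradation
    "binary": 32,    # ~32x smaller, larger recall loss
}
for name, factor in schemes.items():
    print(f"{name:>8}: {f32_gb / factor:,.1f} GB")
```

At managed per-GB pricing, the difference between 614GB and 19GB is often the difference between a rounding error and a budget line.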

Cost Planning Note

Build a cost model at 3x, 10x, and 30x your current vector count before committing to a managed SaaS option. The per-vector pricing that looks reasonable at 5 million vectors often becomes the dominant line item in your AI infrastructure budget at 200 million. The vendor selection advisory work we do always includes a 3-year TCO model before any recommendation.
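A minimal sketch of that multiple-based cost model. It assumes purely linear per-vector pricing, which real vendor tiers rarely are, and the rate of $2 per million vectors per month is a made-up placeholder, not any vendor's rate card.

```python
def tco_at_multiples(current_vectors, price_per_million_month,
                     multiples=(3, 10, 30)):
    """Project monthly managed-SaaS cost at growth multiples of the
    current vector count, assuming linear per-vector pricing.
    Substitute the vendor's actual (usually tiered) rate card."""
    return {
        m: current_vectors * m / 1e6 * price_per_million_month
        for m in multiples
    }

# Hypothetical inputs: 5M vectors today at $2 per million vectors/month.
projection = tco_at_multiples(5_000_000, 2.0)
for mult, cost in projection.items():
    print(f"{mult:>2}x -> ${cost:,.0f}/month")
```

The point of the exercise is not the absolute numbers but seeing, before signing, which multiple turns the vector store into the dominant line item.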

Integration Architecture: Where Vector Databases Sit

A vector database is a component of a larger AI data architecture, not a standalone system. How it integrates with the existing data platform determines operational overhead more than the choice of vendor.

The two integration questions that matter most: First, where do embeddings get generated, and how do they flow into the vector store? Second, how is the vector store kept in sync when source data changes?

The embedding generation pipeline typically runs as part of the data lake processing layer: source document arrives, triggers an embedding job, vector is upserted into the store with its metadata. This requires a durable queue (Kafka, SQS) between the data platform and the vector store to handle backpressure when embedding throughput exceeds ingestion capacity.
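A toy version of that flow, with an in-process queue standing in for Kafka or SQS and stub embed/upsert functions standing in for the real model client and vector store. Everything here is illustrative; the message shape and function names are assumptions, not a real pipeline API.

```python
import queue

def embed(text):
    # Stand-in for a real embedding model call (e.g. an API client).
    return [float(len(text)), 0.0]

vector_store = {}  # stand-in for the vector database

def upsert(doc_id, vector, metadata):
    # Upsert semantics: re-processing the same document overwrites
    # its previous vector instead of creating a duplicate.
    vector_store[doc_id] = {"vec": vector, "meta": metadata}

# The durable queue (Kafka/SQS in production) absorbs backpressure when
# documents arrive faster than the embedding workers can process them.
events = queue.Queue()
events.put({"doc_id": "policy-17", "text": "Updated travel policy", "dept": "hr"})

while not events.empty():
    msg = events.get()
    vec = embed(msg["text"])
    upsert(msg["doc_id"], vec, {"dept": msg["dept"]})
    events.task_done()

print(sorted(vector_store))  # → ['policy-17']
```

The upsert keyed on a stable document ID is what makes the next problem, synchronisation, tractable at all.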

The synchronisation problem is harder. When a document is updated, its embedding changes. When a document is deleted, its vector must be removed. Systems that handle this correctly maintain an audit log of document-to-vector-ID mappings and run reconciliation jobs. Systems that handle this poorly accumulate stale vectors that degrade retrieval quality over time. This is not a vector database problem. It is a data engineering problem that manifests as a vector database problem.
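The reconciliation job described above reduces to a set comparison between the source of truth and the IDs currently indexed in the vector store. A minimal sketch, with illustrative IDs:

```python
def reconcile(source_doc_ids, vector_store_ids):
    """Compare the source of truth with the vector store. Returns the
    IDs to delete (stale vectors) and the IDs to re-embed (missing)."""
    stale = vector_store_ids - source_doc_ids
    missing = source_doc_ids - vector_store_ids
    return stale, missing

source = {"doc-1", "doc-2", "doc-4"}   # documents that currently exist
store = {"doc-1", "doc-2", "doc-3"}    # vector IDs currently indexed

stale, missing = reconcile(source, store)
print(sorted(stale))    # → ['doc-3']  delete these vectors
print(sorted(missing))  # → ['doc-4']  embed and upsert these
```

A real job also needs to catch updated documents, typically by comparing content hashes or modification timestamps stored alongside the vector metadata, but the stale/missing split is the core of it.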

Evaluating a Vendor: What to Ask Before You Sign

Beyond the technical evaluation, enterprise vendor selection for infrastructure components involves contractual and operational questions that data scientists often skip. Before committing to any vector database vendor at enterprise scale, get answers to these questions in writing:

  • Data export: What format is data exported in? How long does a full export take at your scale? Is there an API or only a web interface?
  • Service level agreement: What is the SLA for query latency, not just uptime? What is the remediation process and compensation structure for SLA breaches?
  • Version compatibility: What is the upgrade path between major versions? How many breaking changes occurred in the last three major releases?
  • Support tier: Does enterprise support include a named technical account manager? What is the escalation path for production incidents at 3am?
  • Security certifications: SOC 2 Type II, ISO 27001, and any sector-specific certifications your procurement process requires.

These questions are not bureaucratic. They are the questions you will wish you had asked when your vendor has a three-hour outage during a peak business period and the SLA document turns out to cover only uptime, not latency.

When to Consolidate Vector Search into an Existing Platform

Not every team needs a dedicated vector database. For teams in the right situation, the case for pgvector is stronger than its performance characteristics suggest. If you run Postgres, have a skilled DBA team, and your vector use case is internal with moderate scale, pgvector gives you ACID transactions, JOIN capability, a single operational stack, and familiar tooling. The operational cost savings over a dedicated vector database can justify a performance compromise that would be unacceptable for a customer-facing use case.

Similarly, if your data platform lives entirely within a single cloud provider, the native vector offerings from AWS (OpenSearch with k-NN), GCP (Vertex AI Vector Search, formerly Matching Engine), and Azure (AI Search) are worth evaluating seriously. They are not the performance leaders, but they have the lowest operational overhead for teams already deep in that cloud ecosystem. The deeper your cloud commitment, the more attractive the managed native option becomes.

The decision tree is simple: if your use case is internal, your scale is moderate, and you run Postgres, start with pgvector and evaluate dedicated options only when you hit limits. If your use case is customer-facing, high-volume, or involves sensitive data with strict residency requirements, evaluate dedicated options from the start. See our AI data strategy advisory for help structuring this decision within your existing platform architecture.
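That decision tree can be sketched as a function. The 10-million-vector threshold for "moderate scale" is an assumption borrowed from the pgvector performance ceiling discussed earlier, not a hard rule; the return values indicate where to start evaluating, not a final vendor verdict.

```python
def pick_starting_point(customer_facing, sensitive_data,
                        runs_postgres, vector_count):
    """Encode the decision tree from this section: choose where to
    START the evaluation based on use case, scale, and existing stack."""
    if customer_facing or sensitive_data:
        # High-volume or residency-sensitive: dedicated options up front.
        return "evaluate dedicated vector databases from the start"
    if runs_postgres and vector_count <= 10_000_000:
        # Internal, moderate scale, existing Postgres stack.
        return "start with pgvector; revisit when you hit its limits"
    return "evaluate dedicated or cloud-native options"

print(pick_starting_point(customer_facing=False, sensitive_data=False,
                          runs_postgres=True, vector_count=5_000_000))
# → start with pgvector; revisit when you hit its limits
```

Encoding the tree this way also makes the implicit thresholds explicit, which is exactly where teams tend to disagree once real numbers are attached.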

Summary: Making a Decision That Holds at Scale

The vendor you select in a two-week proof of concept is usually fine for the proof of concept. The problem surfaces when production scale is 10x the POC and the cost model changes, or when your security team reviews the data residency terms, or when the vendor has a major version release with breaking API changes.

Select based on your anticipated scale in two years, not your current scale. Prefer vendors with open storage formats and documented migration paths. Quantify TCO at 10x current volume before signing. And if you cannot answer the question "how do we migrate to a different vector database if this vendor is acquired next year," make sure you can before you go to production.

For teams building the infrastructure strategy around multiple AI data components, the enterprise data lake architecture guide covers how vector storage integrates with the broader data platform. The AI vendor selection advisory provides structured evaluation frameworks and reference architectures for these decisions.