How to run RAG in production

Measure retrieval quality, model your data, and plan for observability.

Why most RAG demos fail

Without evaluation and disciplined data modelling, RAG systems are fragile in production. This guide captures the must-have patterns, alongside our delivery offerings.

1. Measurable retrieval quality

Define ground-truth sets, offline metrics, and regression tests before widening scope. Without a baseline, every change becomes opinion, not evidence.
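The offline metrics above can be sketched in a few lines. This is a minimal, illustrative harness (the data, function names, and doc ids are assumptions, not a specific framework): recall@k and MRR computed over a small ground-truth set, suitable as the core of a regression test.

```python
# Minimal offline retrieval eval: recall@k and MRR over a ground-truth set.
# All names and data below are illustrative.

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant docs that appear in the top-k results."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant) if relevant else 0.0

def mrr(retrieved: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

# Tiny ground-truth set: query -> relevant doc ids.
ground_truth = {
    "refund policy": {"doc-12", "doc-31"},
    "api rate limits": {"doc-7"},
}

# Pretend retriever output, e.g. captured in a nightly regression run.
runs = {
    "refund policy": ["doc-31", "doc-4", "doc-12"],
    "api rate limits": ["doc-9", "doc-7"],
}

for query, retrieved in runs.items():
    relevant = ground_truth[query]
    print(query, recall_at_k(retrieved, relevant, k=3), mrr(retrieved, relevant))
```

Freezing such a script with a fixed ground-truth set turns "the new chunking feels better" into a number you can compare across releases.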

2. Data model and access control

Attach tenant and role metadata early. Citations and filters only work when the underlying records are consistent.
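A sketch of what "attach metadata early" means in practice, under an assumed record shape (the `Chunk` fields and corpus are hypothetical): every chunk carries its tenant, allowed roles, and source URI from day one, and access filters run before similarity ranking.

```python
# Hypothetical record shape: every chunk carries tenant and role metadata
# from ingestion onward, so filters and citations stay consistent.

from dataclasses import dataclass, field

@dataclass
class Chunk:
    chunk_id: str
    text: str
    source_uri: str          # needed later for citations
    tenant_id: str           # hard isolation boundary
    allowed_roles: set[str] = field(default_factory=set)

def visible_chunks(chunks: list[Chunk], tenant_id: str, role: str) -> list[Chunk]:
    """Apply tenant and role filters *before* similarity ranking."""
    return [
        c for c in chunks
        if c.tenant_id == tenant_id and role in c.allowed_roles
    ]

corpus = [
    Chunk("c1", "Pricing table for plan X", "kb://pricing", "acme", {"sales", "admin"}),
    Chunk("c2", "HR handbook, leave policy", "kb://hr", "acme", {"hr"}),
    Chunk("c3", "Pricing table for plan Y", "kb://pricing", "globex", {"sales"}),
]

print([c.chunk_id for c in visible_chunks(corpus, "acme", "sales")])  # ['c1']
```

Retrofitting these fields after ingestion usually means re-indexing the whole corpus, which is why they belong in the schema from the start.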

3. Vector and hybrid search

Use pgvector when Postgres is already your source of truth; use OpenSearch when hybrid search, logging, and analytics already live there. Avoid splitting your data across both systems without cause.
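As a sketch of the pgvector path, here is a helper that builds a top-k similarity query with the tenant filter pushed into SQL. The table layout is an assumption for illustration; `<=>` is pgvector's cosine-distance operator, and the `%(...)s` placeholders follow the usual Postgres driver parameter style.

```python
# Sketch of a pgvector similarity query, assuming a table like:
#   CREATE TABLE chunks (id text, tenant_id text, embedding vector(1536), body text);

def pgvector_query(k: int) -> str:
    """Build a top-k cosine-similarity query with a tenant filter in SQL."""
    return (
        "SELECT id, body, embedding <=> %(query_embedding)s AS distance "
        "FROM chunks "
        "WHERE tenant_id = %(tenant_id)s "
        "ORDER BY distance "
        f"LIMIT {k}"
    )

print(pgvector_query(5))
```

Keeping the filter inside the SQL (rather than post-filtering in application code) lets Postgres combine the tenant predicate with the vector index and keeps access control in one place.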

4. Operations and feedback loops

Instrument queries, latency, and user feedback. Route edge cases to human review; the results feed back into your datasets and eval suites.
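A minimal instrumentation sketch, with assumed field names: each query gets a request id and a structured log line with latency, and user feedback is logged against the same id so the two can be joined later into eval datasets.

```python
# Minimal structured logging for queries and feedback. Field names are
# assumptions; any log sink that accepts a string works as `logger`.

import json
import time
import uuid

def log_query(logger, query: str, retrieve_and_answer) -> str:
    """Run the pipeline, log query + latency, return the request id."""
    request_id = str(uuid.uuid4())
    start = time.perf_counter()
    answer = retrieve_and_answer(query)
    latency_ms = (time.perf_counter() - start) * 1000
    logger(json.dumps({
        "request_id": request_id,
        "query": query,
        "latency_ms": round(latency_ms, 1),
        "answer_chars": len(answer),
    }))
    return request_id

def log_feedback(logger, request_id: str, helpful: bool) -> None:
    """Joined with query logs later to build eval datasets."""
    logger(json.dumps({"request_id": request_id, "helpful": helpful}))

lines: list[str] = []
rid = log_query(lines.append, "refund policy?", lambda q: "See doc-12")
log_feedback(lines.append, rid, helpful=False)
print(lines)
```

Unhelpful-marked requests are exactly the edge cases worth sending to human review.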

Practice FAQ

  • Do we need pgvector and OpenSearch immediately?

Not necessarily; decide based on your query patterns and existing operational skills.

  • What latency is realistic?

It depends on index size, batching, and network hops; measure end-to-end latency, not model time alone.

  • When should we bring in outside help?

    When identity and access management (IAM), core system integrations, or mandatory evaluation duties are in scope.
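The latency advice above can be sketched as end-to-end timing with percentile reporting (the pipeline stub and function names are illustrative): time the full call, collect samples, and look at p50/p95 rather than an average.

```python
# End-to-end latency sketch: time the whole pipeline (retrieval + generation
# + network), then report percentiles rather than averages.

import statistics
import time

def timed(fn, *args):
    """Return (result, elapsed milliseconds) for one end-to-end call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000

def p95(samples_ms: list[float]) -> float:
    # quantiles(n=20) yields 19 cut points; index 18 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=20)[18]

# Stub pipeline standing in for retrieve -> rerank -> generate.
def pipeline(query: str) -> str:
    return f"answer for {query}"

samples = [timed(pipeline, f"q{i}")[1] for i in range(100)]
print(f"p50={statistics.median(samples):.2f}ms  p95={p95(samples):.2f}ms")
```

Tail latency (p95/p99) is what users notice in an interactive RAG UI, and it is invisible in an average.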

RAG delivery with Devolute

We guide architecture, pilot, and handover with explicit deliverables.

  • Named products and brands are used for technical orientation and remain property of their respective owners. Mention does not imply endorsement, partnership, or availability guarantees for experimental software.


Your contact person

Christian Wörle

Technical Lead

contact@devolute.org