PRODUCTION EVIDENCE

The work. Not the claim.

Every case study here documents a real engagement - the failure mode found, the approach taken, and the outcome measured. Clients are named where consent is confirmed. Where anonymisation is required, the technical detail is preserved so the work can be evaluated on its own terms.

ENGAGEMENTS
RAG Audit · Production Risk Discovery

Heritage Archive - RAG Audit and Production Risk Discovery

A RAG-based knowledge retrieval system for a cultural organisation. 30 books, two languages, and a high-stakes client handover - built with solid retrieval architecture but no production infrastructure around it.

47 pages
Audit report delivered
12
Production gaps identified
3
Critical (P0) vulnerabilities

An engineering team had built a RAG-based knowledge retrieval system for a cultural foundation - a chatbot designed to answer questions about historical philosophy across 30 books in two languages. They had delivered the initial build and were preparing for client handover.

The team came in asking for help with test queries. What the engagement actually required was an architecture audit first. The system had a solid RAG foundation - hybrid retrieval, cross-encoder reranking, LaBSE embeddings - but no production infrastructure around it. No input validation. No LLM fallback. No backup for the BM25 index stored as a single local file. Three critical failures waiting to happen, none of them visible without an audit.
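
The hybrid retrieval described above combines a sparse (BM25) ranking with a dense embedding ranking. A minimal sketch of one common fusion step, reciprocal rank fusion (RRF) - the engagement's actual fusion method is not documented, and the document IDs here are illustrative:

```python
# Hedged sketch: RRF is one standard way to merge sparse and dense rankings
# in a hybrid retriever. k=60 is the damping constant from the original
# RRF formulation; the audited system's real fusion logic is not specified.

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs (best-first) into one fused ranking."""
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits  = ["doc_a", "doc_b", "doc_c"]   # sparse (keyword) ranking
dense_hits = ["doc_b", "doc_d", "doc_a"]   # dense (embedding) ranking
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

A cross-encoder reranker would then rescore the top fused candidates against the query before generation.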

Before running a single test query, the architecture was assessed against production readiness criteria - what happens when it breaks, what the cost exposure looks like, where a determined adversary could manipulate outputs.

Findings - 3 Critical (P0) Issues
  • BM25 index stored as a single local file with no backup - one disk event away from complete retrieval failure
  • No input validation layer - adversarial prompts could manipulate retrieval and poison outputs surfaced to end users
  • No LLM fallback defined - any model API outage would take the entire system offline with no graceful degradation path
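
The missing fallback path in the third finding can be sketched as a provider chain with a deterministic degraded mode - provider names and the degraded-mode message below are illustrative, not from the audit:

```python
# Hedged sketch: one way to avoid a hard outage when the primary model API
# fails. In production each failure would be logged and alerting would fire.

def generate_with_fallback(prompt, providers):
    """Try each (name, call) provider in order; degrade gracefully if all fail."""
    for name, call in providers:
        try:
            return {"provider": name, "text": call(prompt)}
        except Exception:
            continue  # fall through to the next provider
    # Last resort: a deterministic degraded response instead of taking
    # the whole system offline.
    return {"provider": None,
            "text": "The assistant is temporarily unavailable; please retry."}

def primary(prompt):   # stands in for the main model API during an outage
    raise ConnectionError("API outage")

def backup(prompt):    # stands in for a secondary model or cached path
    return f"[backup] answer for: {prompt}"

result = generate_with_fallback("What is stoicism?",
                                [("primary", primary), ("backup", backup)])
```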

A 47-page audit report documented all 12 production gaps with severity classification, remediation recommendations, and a sequenced implementation plan. The team had a complete architectural picture before client handover - not a reactive fix list after a production failure.

Clinical RAG · Zero Hallucination Requirement

Specialist Dermatologist - Clinical Knowledge Retrieval with Zero Hallucination Tolerance

A practicing dermatologist with a 4,000–5,000 page reference textbook. Finding specific treatment protocols took 10 to 30 minutes per query in a clinical workflow where time is constrained and accuracy is non-negotiable.

<10 s
Query response time (vs 10–30 min)
0
Hallucinated outputs in production
100%
Responses cite chapter, volume, page

The client is a practicing dermatologist who works daily with a 4,000–5,000 page reference textbook. Finding specific treatment protocols, drug interaction data, or diagnostic criteria could take 10 to 30 minutes per query - unacceptable in a clinical preparation workflow where time is constrained and accuracy is non-negotiable.

The tools available - generic AI assistants, PDF search - either carried hallucination risk or were too slow to be useful. In a clinical context, a fabricated drug interaction or an unsupported dosing recommendation is not a retrieval error. It is a patient safety risk. The requirement was not a faster search. It was a retrieval system that could be trusted on every single output.

A closed-domain RAG system was built: no internet access, no generic model inference, retrieval constrained entirely to the indexed textbook - with source citations surfaced at the chapter, volume, and page level on every response.

Design Constraints Applied
  • Closed-domain retrieval only - no web access, no external model knowledge surfaced to end user
  • Every response cites the specific source: chapter, volume, and page number from the indexed textbook
  • Confidence scoring on every retrieval - low-confidence outputs flagged rather than hallucinated
  • Human-in-the-loop override: no AI output presented as definitive clinical guidance

Query time reduced from 10–30 minutes to under 10 seconds. Zero hallucinated outputs in production - every response is directly grounded in the indexed textbook with a verifiable source citation. Practitioners can cross-reference any AI output against the physical source in under 30 seconds.


METHODOLOGY NOTE

Devverse Labs documents engagements in full - the failure mode diagnosed, the architecture decisions made, and the measurable outcome at handover. Where clients have consented to named publication, the complete engagement record is available. Where confidentiality requires anonymisation, the technical specifics are preserved: the framework applied, the gap identified, the production criteria met. What is never published is an outcome metric without a documented methodology behind it. Case studies here are evidence, not marketing.

A pattern you recognise in these case studies is probably a pattern worth diagnosing.

Book the APMM Diagnostic

30 minutes · Written follow-up within 24 hours · No pitch