PRODUCTION EVIDENCE

The work. Not the claim.

Every case study here documents a real engagement - the failure mode found, the approach taken, and the outcome measured. Clients are named where consent is confirmed. Where anonymisation is required, the technical detail is preserved so the work can be evaluated on its own terms.

ENGAGEMENTS
RAG Audit · Production Risk Discovery

Heritage Archive - RAG Audit and Production Risk Discovery

A RAG-based knowledge retrieval system for a cultural organisation. 30 books, two languages, and a high-stakes client handover - built with solid retrieval architecture but no production infrastructure around it.

47 pages
Audit report delivered
12
Production gaps identified
3
Critical (P0) vulnerabilities

An engineering team had built a RAG-based knowledge retrieval system for a cultural foundation - a chatbot designed to answer questions about historical philosophy across 30 books in two languages. They had delivered the initial build and were preparing for client handover.

The team came in asking for help with test queries. What the engagement actually required was an architecture audit first. The system had a solid RAG foundation - hybrid retrieval, cross-encoder reranking, LaBSE embeddings - but no production infrastructure around it. No input validation. No LLM fallback. No backup for the BM25 index stored as a single local file. Three critical failures waiting to happen, none of them visible without an audit.
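
The hybrid retrieval described above combines a sparse (BM25) ranking with a dense embedding ranking. A minimal sketch of one common fusion step, reciprocal rank fusion (RRF) - the engagement's actual fusion method is not documented, and the document IDs here are illustrative:

```python
# Hedged sketch: RRF is one standard way to merge sparse and dense rankings
# in a hybrid retriever. k=60 is the damping constant from the original
# RRF formulation; the audited system's real fusion logic is not specified.

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs (best-first) into one fused ranking."""
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits  = ["doc_a", "doc_b", "doc_c"]   # sparse (keyword) ranking
dense_hits = ["doc_b", "doc_d", "doc_a"]   # dense (embedding) ranking
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

A cross-encoder reranker would then rescore the top fused candidates against the query before generation.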

Before running a single test query, the architecture was assessed against production readiness criteria - what happens when it breaks, what the cost exposure looks like, where a determined adversary could manipulate outputs.

Findings - 3 Critical (P0) Issues
  • BM25 index stored as a single local file with no backup - one disk event away from complete retrieval failure
  • No input validation layer - adversarial prompts could manipulate retrieval and poison outputs surfaced to end users
  • No LLM fallback defined - any model API outage would take the entire system offline with no graceful degradation path
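
The missing fallback path in the third finding can be sketched as a provider chain with a deterministic degraded mode - provider names and the degraded-mode message below are illustrative, not from the audit:

```python
# Hedged sketch: one way to avoid a hard outage when the primary model API
# fails. In production each failure would be logged and alerting would fire.

def generate_with_fallback(prompt, providers):
    """Try each (name, call) provider in order; degrade gracefully if all fail."""
    for name, call in providers:
        try:
            return {"provider": name, "text": call(prompt)}
        except Exception:
            continue  # fall through to the next provider
    # Last resort: a deterministic degraded response instead of taking
    # the whole system offline.
    return {"provider": None,
            "text": "The assistant is temporarily unavailable; please retry."}

def primary(prompt):   # stands in for the main model API during an outage
    raise ConnectionError("API outage")

def backup(prompt):    # stands in for a secondary model or cached path
    return f"[backup] answer for: {prompt}"

result = generate_with_fallback("What is stoicism?",
                                [("primary", primary), ("backup", backup)])
```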

A 47-page audit report documented all 12 production gaps with severity classification, remediation recommendations, and a sequenced implementation plan. The team had a complete architectural picture before client handover - not a reactive fix list after a production failure.

Clinical RAG · Zero Hallucination Requirement

Specialist Dermatologist - Clinical Knowledge Retrieval with Zero Hallucination Tolerance

A practicing dermatologist with a 4,000–5,000 page reference textbook. Finding specific treatment protocols took 10 to 30 minutes per query in a clinical workflow where time is constrained and accuracy is non-negotiable.

<10 s
Query response time (vs 10–30 min)
0
Hallucinated outputs in production
100%
Responses cite chapter, volume, page

The client is a practicing dermatologist who works daily with a 4,000–5,000 page reference textbook. Finding specific treatment protocols, drug interaction data, or diagnostic criteria could take 10 to 30 minutes per query - unacceptable in a clinical preparation workflow where time is constrained and accuracy is non-negotiable.

The tools available - generic AI assistants, PDF search - either carried hallucination risk or were too slow to be useful. In a clinical context, a fabricated drug interaction or an unsupported dosing recommendation is not a retrieval error. It is a patient safety risk. The requirement was not a faster search. It was a retrieval system that could be trusted on every single output.

A closed-domain RAG system was built: no internet access, no generic model inference, retrieval constrained entirely to the indexed textbook - with source citations surfaced at the chapter, volume, and page level on every response.

Design Constraints Applied
  • Closed-domain retrieval only - no web access, no external model knowledge surfaced to end user
  • Every response cites the specific source: chapter, volume, and page number from the indexed textbook
  • Confidence scoring on every retrieval - low-confidence outputs flagged rather than hallucinated
  • Human-in-the-loop override: no AI output presented as definitive clinical guidance

Query time reduced from 10–30 minutes to under 10 seconds. Zero hallucinated outputs in production - every response is directly grounded in the indexed textbook with a verifiable source citation. Practitioners can cross-reference any AI output against the physical source in under 30 seconds.


METHODOLOGY NOTE

Devverse Labs documents engagements in full - the failure mode diagnosed, the architecture decisions made, and the measurable outcome at handover. Where clients have consented to named publication, the complete engagement record is available. Where confidentiality requires anonymisation, the technical specifics are preserved: the framework applied, the gap identified, the production criteria met. What is never published is an outcome metric without a documented methodology behind it. Case studies here are evidence, not marketing.

A pattern you recognise in these case studies is probably a pattern worth diagnosing.

Book the APMM Diagnostic

30 minutes · Written follow-up within 24 hours · No pitch