Haize Labs has developed LosslessRAG, a low-latency, high-accuracy RAG method that entirely avoids a category of hallucinations. It does so by using Infinigram search to ground answers exactly in reference data.

  1. Introduction
  2. Architecture
  3. Experiments
    1. Single-File Factual Lookup
    2. Multi-File Reasoning
    3. GitHub Issue Handling
  4. LiteraryQA
    1. Pushing the Speed-Accuracy Frontier

Introduction

Embedding-based RAG is widely adopted. However, it has one major flaw: an embedding model’s semantics are fixed by a static training set. As a result, in domain-specific retrieval and Q&A (ASIC development, for example), traditional RAG systems often hallucinate.

Architecture

LosslessRAG achieves both high recall and high precision.

We let an RLM (essentially agents coordinating subagents) decide how to perform query expansion, pruning, summarization, and answer generation.
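To make the exact-match grounding behind Infinigram-style search concrete, here is a minimal sketch of a suffix-array lookup. It is an illustration under stated assumptions, not Haize Labs' implementation: the function names `build_suffix_array` and `find_occurrences` are hypothetical, and the index is built with a naive sort rather than a linear-time construction, so it is only suitable for small inputs.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start offsets of all suffixes of `text` in sorted order.

    Naive construction (sorts full suffix strings); for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], query: str) -> List[int]:
    """Return the sorted character offsets at which `query` occurs in `text`."""
    if not query:
        return []
    m = len(query)

    # Lower bound: first suffix whose m-character prefix is >= query.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > query.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= query:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with `query`.
    return sorted(sa[start:lo])
```

Because every returned offset points at a verbatim occurrence of the query in the corpus, a hit can be quoted back exactly, and an empty result is an unambiguous signal that the string does not appear in the reference data.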

Figure 1

We evaluate against two baseline architectures.

Figure 2

Figure 3

Experiments

We evaluate all three architectures on the combined FBOSS and SONiC codebases, including their associated wiki documentation. For Infinigram retrieval, we build a character-level U16 suffix array over the full corpus using the SA-IS algorithm, producing a roughly 1.1 GB index that runs entirely on CPU. For embedding retrieval, we chunk the same files and embed them with OpenAI's text-embedding-3-large into a ChromaDB vector store. We measure the following: