This article takes a look at RAG in LangChain4j.

Overview

RAG (Retrieval-Augmented Generation) retrieves relevant information and injects it into the prompt, so that the LLM can draw on the retrieved content when answering, which reduces hallucinations. Common information-retrieval methods include full-text (keyword) search, vector search (semantic search), and hybrid search. LangChain4j currently focuses on vector search (for example, building an efficient retrieval system on top of a vector database such as Qdrant); support for full-text and hybrid search is planned (Azure AI Search already supports them, see AzureAiSearchContentRetriever for details).
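The retrieve-then-inject flow can be sketched in plain Java (a toy illustration of the idea only; the template and helper names below are made up and are not the LangChain4j API):

```java
import java.util.List;

public class PromptAugmentation {
    // Build an augmented prompt by placing retrieved segments above the user question,
    // so the LLM answers from the retrieved context instead of memory alone.
    static String augment(String question, List<String> retrievedSegments) {
        StringBuilder sb = new StringBuilder();
        sb.append("Answer the question using only the information below.\n\n");
        for (String segment : retrievedSegments) {
            sb.append("- ").append(segment).append("\n");
        }
        sb.append("\nQuestion: ").append(question);
        return sb.toString();
    }

    public static void main(String[] args) {
        String prompt = augment(
                "How to do Easy RAG with LangChain4j?",
                List.of("Easy RAG hides embedding and document-parsing details."));
        System.out.println(prompt);
    }
}
```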

RAG consists of two stages: indexing and retrieval.

Indexing

In the indexing stage, documents are preprocessed so that they can be searched efficiently in the retrieval stage. The details vary by retrieval method; for vector search, this typically includes:

  • Cleaning documents: removing noisy data and normalizing the format
  • Enriching with extra data and metadata: adding document source, timestamp, author, and other auxiliary information
  • Chunking: splitting long documents into smaller semantic units to fit the embedding model's context window
  • Embedding: converting the text chunks into vectors with an embedding model
  • Storing: persisting the vectors in a vector database

    The indexing stage is usually performed offline, meaning end users do not have to wait for it to finish. For example, a cron job could re-index internal company documentation every weekend. The indexing code can also be a standalone application dedicated to indexing tasks.
    In some scenarios, however, end users may want to upload their own documents to make them accessible to the LLM. In that case, indexing should happen online as part of the main application.
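The chunking step listed above can be illustrated with a minimal fixed-size splitter in plain Java (a toy sketch; real document splitters, such as the ones LangChain4j loads via SPI, split on paragraph or sentence boundaries rather than raw character offsets):

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {
    // Split text into fixed-size chunks with an overlap, so that context
    // straddling a chunk boundary is not lost entirely.
    static List<String> split(String text, int chunkSize, int overlap) {
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // advance less than chunkSize to overlap
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // reached the end of the text
        }
        return chunks;
    }
}
```

Each chunk is then embedded and stored individually, which is why the log output later in this article reports a single document being split into 33 text segments.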

Retrieval

The retrieval stage usually happens online: a user submits a question that should be answered using the indexed documents. The process varies with the information-retrieval method used. For vector search, it typically means embedding the user's query (question) into a vector representation and running a similarity search in the embedding store. Relevant segments (pieces of the original documents) are then injected into the prompt and sent to the LLM.
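The similarity search at the heart of this stage can be sketched in a few lines of plain Java (a toy in-memory top-k search by cosine similarity; production embedding stores use approximate nearest-neighbor indexes instead of a full scan):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class VectorSearch {
    // Cosine similarity between two vectors (assumes non-zero vectors).
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Return the ids of the k stored vectors most similar to the query, best first.
    static List<String> topK(double[] query, Map<String, double[]> store, int k) {
        return store.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> -cosine(query, e.getValue())))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

The segments behind the returned ids would then be concatenated into the prompt, as described above.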

Implementation

LangChain4j provides three flavors of RAG (Retrieval-Augmented Generation):

  • Easy RAG: the simplest flavor, well suited to beginners. You just hand your documents to LangChain4j, without needing to understand embeddings, vector stores, how to pick a suitable embedding model, or how to parse and split documents. It is a great way to try out RAG quickly.
  • Naive RAG: a basic implementation (using vector search) built on a simple index-retrieve-generate pipeline. It typically involves splitting documents into segments and retrieving them via vector search. Naive RAG has known limitations, such as poor retrieval relevance and answers that may be incoherent or repetitive.
  • Advanced RAG: a modular RAG framework that allows extra steps such as query transformation, retrieval from multiple sources, and reranking. Advanced RAG improves retrieval quality and answer relevance through more sophisticated techniques such as semantic chunking, query expansion and compression, and metadata filtering.
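One of the Advanced RAG steps mentioned above, query transformation, can be illustrated with a toy sketch in plain Java (the retriever function and query variants are hypothetical; LangChain4j models this stage with its own retrieval-augmentor abstractions):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class QueryExpansion {
    // Run retrieval for the original query plus expanded variants, then merge
    // the results in order while dropping duplicates. Retrieving for several
    // phrasings of the same question improves recall over a single query.
    static List<String> retrieveExpanded(String query,
                                         Function<String, List<String>> retriever,
                                         List<String> queryVariants) {
        Set<String> merged = new LinkedHashSet<>(retriever.apply(query));
        for (String variant : queryVariants) {
            merged.addAll(retriever.apply(variant));
        }
        return List.copyOf(merged);
    }
}
```

In a full pipeline, the merged results would then go through reranking before being injected into the prompt.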

Easy RAG

pom.xml

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-easy-rag</artifactId>
    <version>1.0.0-beta2</version>
</dependency>

example

    public void testEasyRag() {
        // load all documents from the directory; Apache Tika detects and parses their types
        String dir = System.getProperty("user.home") + "/Downloads/rag";
        List<Document> documents = FileSystemDocumentLoader.loadDocuments(dir);
        log.info("finish load document");
        // split, embed, and store the documents using the defaults discovered via SPI
        InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
        EmbeddingStoreIngestor.ingest(documents, embeddingStore);
        log.info("finish inject to embedding store");
        // wire the retriever into an AI service backed by the chat model
        Assistant assistant = AiServices.builder(Assistant.class)
                .chatLanguageModel(chatModel)
                .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
                .contentRetriever(EmbeddingStoreContentRetriever.from(embeddingStore))
                .build();
        String answer = assistant.chat("How to do Easy RAG with LangChain4j?");
        log.info("answer:{}", answer);
    }

The output is as follows:

2025-03-14T09:55:02.003Z  INFO 3181 --- [           main] com.example.AppTest                      : finish load document
2025-03-14T09:55:06.116Z  INFO 3181 --- [           main] ai.djl.util.Platform                     : Found matching platform from: jar:file:/root/.m2/repository/ai/djl/huggingface/tokenizers/0.31.1/tokenizers-0.31.1.jar!/native/lib/tokenizers.properties
2025-03-14T09:55:06.118Z  WARN 3181 --- [           main] a.d.huggingface.tokenizers.jni.LibUtils  : No matching cuda flavor for linux-x86_64 found: cu117/sm_75.
2025-03-14T09:55:06.282Z DEBUG 3181 --- [           main] d.l.s.embedding.EmbeddingStoreIngestor   : Loaded the following document splitter through SPI: dev.langchain4j.data.document.splitter.DocumentByParagraphSplitter@55053f81
2025-03-14T09:55:07.076Z  WARN 3181 --- [           main] a.d.h.tokenizers.HuggingFaceTokenizer    : maxLength is not explicitly specified, use modelMaxLength: 512
2025-03-14T09:55:07.077Z DEBUG 3181 --- [           main] d.l.s.embedding.EmbeddingStoreIngestor   : Loaded the following embedding model through SPI: dev.langchain4j.model.embedding.onnx.bgesmallenv15q.BgeSmallEnV15QuantizedEmbeddingModel@631bc9f4
2025-03-14T09:55:07.077Z DEBUG 3181 --- [           main] d.l.s.embedding.EmbeddingStoreIngestor   : Starting to ingest 1 documents
2025-03-14T09:55:07.294Z DEBUG 3181 --- [           main] d.l.s.embedding.EmbeddingStoreIngestor   : Documents were split into 33 text segments
2025-03-14T09:55:07.294Z DEBUG 3181 --- [           main] d.l.s.embedding.EmbeddingStoreIngestor   : Starting to embed 33 text segments
2025-03-14T09:55:08.209Z DEBUG 3181 --- [           main] d.l.s.embedding.EmbeddingStoreIngestor   : Finished embedding 33 text segments
2025-03-14T09:55:08.209Z DEBUG 3181 --- [           main] d.l.s.embedding.EmbeddingStoreIngestor   : Starting to store 33 text segments into the embedding store
2025-03-14T09:55:08.211Z DEBUG 3181 --- [           main] d.l.s.embedding.EmbeddingStoreIngestor   : Finished storing 33 text segments into the embedding store
2025-03-14T09:55:08.212Z  INFO 3181 --- [           main] com.example.AppTest                      : finish inject to embedding store
2025-03-14T09:55:24.098Z  INFO 3181 --- [           main] com.example.AppTest                      : answer:Okay, here’s a guide on how to do Easy RAG with LangChain4j, based on the provided information:
**1. Getting Started (General Approach)**
The easiest way to get started with RAG using LangChain4j is through the "Easy RAG" feature. This eliminates the need to configure embeddings, vector stores, or document parsing yourself.
**2. Dependencies**
Import the `langchain4j-easy-rag` dependency:
<dependency>
   <groupId>dev.langchain4j</groupId>
   <artifactId>langchain4j-easy-rag</artifactId>
   <version>1.0.0-beta2</version>
</dependency>
**3. Loading Documents**
Load your documents using `FileSystemDocumentLoader`:
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/home/langchain4j/documentation");
**4. Easy RAG in Action**
*   **No Configuration Needed:**  Simply point the `FileSystemDocumentLoader` to your documents. LangChain4j handles everything under the hood.
*   **Retrieval:** LangChain4j uses Apache Tika to detect document types and parse them.
*   **Output:** `result.content()` provides the retrieved content. `result.sources()` provides a list of the sources.
**5. Quarkus Specifics**
"If you are using Quarkus, there is an even easier way to do Easy RAG. Please read Quarkus documentation." (This suggests you'll need to consult the Quarkus documentation for specific instructions, likely involving simplified configuration).
**6. Streaming and the `Assistant` Interface**
*   When streaming, you can use `onRetrieved()` to handle retrieved content.
*   The `Assistant` interface defines the `chat()` method, which can be used to interact with the RAG system.
*   Example:
Assistant assistant = ...; // Instantiate your assistant
String answer = assistant.chat("How to do Easy RAG with LangChain4j?")
    .onRetrieved((List<Content> sources) -> {
        // Process the retrieved sources here
        System.out.println("Retrieved Sources: " + sources);
    })
    .onPartialResponse(...)
    .onCompleteResponse(...)
    .onError(...)
      .start();
**7. RAG Flavors**
LangChain4j offers three RAG flavors:
*   **Easy RAG:** The simplest, quickest way to get started.
*   **Naive RAG:** A basic implementation using vector search.
*   **Advanced RAG:** A modular framework for more customization (query transformation, multiple sources, re-ranking, etc.).
**Important Note:**  Easy RAG will likely produce lower quality results than a more tailored RAG setup.  It's a good starting point for learning and prototyping.
---    
**Disclaimer:** This answer is based solely on the provided text.  It doesn't include details about specific libraries, configurations, or error handling beyond what's mentioned in the text.  You'll need to consult the LangChain4j documentation and the Quarkus documentation for a complete understanding.

Summary

RAG (Retrieval-Augmented Generation) retrieves relevant information, injects it into the prompt, and feeds the augmented prompt to the LLM, so that the model can draw on the retrieved content when answering, which reduces hallucinations. LangChain4j provides three RAG flavors: Easy RAG, Naive RAG, and Advanced RAG.
