A Look at Spring AI Alibaba's BilibiliDocumentReader

Preface

This article takes a look at BilibiliDocumentReader in Spring AI Alibaba.

BilibiliDocumentReader

community/document-readers/spring-ai-alibaba-starter-document-reader-bilibili/src/main/java/com/alibaba/cloud/ai/reader/bilibili/BilibiliDocumentReader.java

public class BilibiliDocumentReader implements DocumentReader {

    private static final Logger logger = LoggerFactory.getLogger(BilibiliDocumentReader.class);

    private static final String API_BASE_URL = "https://api.bilibili.com/x/web-interface/view?bvid=";

    private final String resourcePath;

    private final ObjectMapper objectMapper;

    private static final int MEMORY_SIZE = 5;

    private static final int BYTE_SIZE = 1024;

    private static final int MAX_MEMORY_SIZE = MEMORY_SIZE * BYTE_SIZE * BYTE_SIZE;

    private static final WebClient WEB_CLIENT = WebClient.builder()
        .defaultHeader(HttpHeaders.ACCEPT, MediaType.APPLICATION_JSON_VALUE)
        .codecs(configurer -> configurer.defaultCodecs().maxInMemorySize(MAX_MEMORY_SIZE))
        .build();

    public BilibiliDocumentReader(String resourcePath) {
        Assert.hasText(resourcePath, "Query string must not be empty");
        this.resourcePath = resourcePath;
        this.objectMapper = new ObjectMapper();
    }

    @Override
    public List<Document> get() {
        List<Document> documents = new ArrayList<>();
        try {
            String bvid = extractBvid(resourcePath);
            String videoInfoResponse = fetchVideoInfo(bvid);
            JsonNode videoData = parseJson(videoInfoResponse).path("data");
            String title = videoData.path("title").asText();
            String description = videoData.path("desc").asText();
            Document infoDoc = new Document("Video information", Map.of("title", title, "description", description));
            documents.add(infoDoc);
            String documentContent = fetchAndProcessSubtitles(videoData, title, description);
            documents.add(new Document(documentContent));
        }
        catch (IllegalArgumentException e) {
            logger.error("Invalid input: {}", e.getMessage());
            documents.add(new Document("Error: Invalid input"));
        }
        catch (IOException e) {
            logger.error("Error parsing JSON: {}", e.getMessage(), e);
            documents.add(new Document("Error parsing JSON: " + e.getMessage()));
        }
        catch (Exception e) {
            logger.error("Unexpected error: {}", e.getMessage(), e);
            documents.add(new Document("Unexpected error: " + e.getMessage()));
        }
        return documents;
    }

    private String extractBvid(String resourcePath) {
        return resourcePath.replaceAll(".*(BV\\w+).*", "$1");
    }

    private String fetchVideoInfo(String bvid) {
        return WEB_CLIENT.get().uri(API_BASE_URL + bvid).retrieve().bodyToMono(String.class).block();
    }

    private JsonNode parseJson(String jsonResponse) throws IOException {
        return objectMapper.readTree(jsonResponse);
    }

    private String fetchAndProcessSubtitles(JsonNode videoData, String title, String description) throws IOException {
        JsonNode subtitleList = videoData.path("subtitle").path("list");
        if (subtitleList.isArray() && subtitleList.size() > 0) {
            String subtitleUrl = subtitleList.get(0).path("subtitle_url").asText();
            String subtitleResponse = WEB_CLIENT.get().uri(subtitleUrl).retrieve().bodyToMono(String.class).block();

            JsonNode subtitleJson = parseJson(subtitleResponse);
            StringBuilder rawTranscript = new StringBuilder();
            subtitleJson.path("body").forEach(node -> rawTranscript.append(node.path("content").asText()).append(" "));

            return String.format("Video Title: %s, Description: %s\nTranscript: %s", title, description,
                    rawTranscript.toString().trim());
        }
        else {
            return String.format("No subtitles found for video: %s. Returning an empty transcript.", resourcePath);
        }
    }

}

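BilibiliDocumentReader uses WebClient to call the Bilibili API. It extracts the bvid from the URL, requests the video-info endpoint with that bvid, and parses the JSON response to get the title and description. fetchAndProcessSubtitles then requests the subtitle_url of the first entry in the subtitle list and uses the subtitle text as the document content. get() returns two documents: a "Video information" document carrying title and description as metadata, and a second document holding the transcript text (or a "No subtitles found" message when the subtitle list is empty). Exceptions are not propagated; they are logged and returned as documents whose text describes the error.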
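The bvid extraction step can be tried in isolation. The following is a minimal, dependency-free sketch: extractLikeReader copies the regular expression from the quoted extractBvid method, while the class name BvidExtractor, the method findBvid, and its Optional return type are hypothetical names introduced here for illustration and are not part of Spring AI Alibaba. It illustrates one behavior worth knowing: String.replaceAll returns the input unchanged when the pattern does not match, so a URL without a BV id is passed on to the API as-is.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BvidExtractor {

    // "BV" followed by one or more word characters, as in the reader's regex.
    private static final Pattern BVID = Pattern.compile("BV\\w+");

    // Copy of the expression used in the quoted extractBvid method.
    public static String extractLikeReader(String resourcePath) {
        return resourcePath.replaceAll(".*(BV\\w+).*", "$1");
    }

    // Hypothetical stricter variant: returns an empty Optional when no BV id is present.
    public static Optional<String> findBvid(String resourcePath) {
        Matcher matcher = BVID.matcher(resourcePath);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String url = "https://www.bilibili.com/video/BV1KMwgeKECx/?t=7";
        String noId = "https://example.com/video/123";
        System.out.println(extractLikeReader(url));  // BV1KMwgeKECx
        System.out.println(findBvid(url));           // Optional[BV1KMwgeKECx]
        System.out.println(extractLikeReader(noId)); // input returned unchanged
        System.out.println(findBvid(noId));          // Optional.empty
    }
}
```

A caller that wants to fail fast on URLs without a BV id could check the Optional before constructing the reader; whether that is desirable depends on how the surrounding pipeline treats error documents.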

Example

public class BilibiliDocumentReaderTest {

    private static final Logger logger = LoggerFactory.getLogger(BilibiliDocumentReader.class);

    @Test
    void bilibiliDocumentReaderTest() {
        BilibiliDocumentReader bilibiliDocumentReader = new BilibiliDocumentReader(
                "https://www.bilibili.com/video/BV1KMwgeKECx/?t=7&vd_source=3069f51b168ac07a9e3c4ba94ae26af5");
        List<Document> documents = bilibiliDocumentReader.get();
        logger.info("documents: {}", documents);
    }

}
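Because get() in the quoted source catches exceptions and returns them as documents, a caller may want to tell a real transcript from an error or "no subtitles" message. The sketch below works on plain strings so that it stays dependency-free; how the text is read from a Document (for example getText() or getContent()) depends on the Spring AI version and is left out. The class ReaderResultCheck and its methods are hypothetical helpers, and the prefixes are copied from the string literals in the quoted code.

```java
import java.util.List;

public class ReaderResultCheck {

    // Prefixes taken from the catch blocks of the quoted get() method.
    private static final List<String> ERROR_PREFIXES =
            List.of("Error: Invalid input", "Error parsing JSON: ", "Unexpected error: ");

    // True when the text looks like one of the error documents produced by get().
    public static boolean isError(String documentText) {
        return ERROR_PREFIXES.stream().anyMatch(documentText::startsWith);
    }

    // Heuristic: the transcript document follows the template
    // "Video Title: ..., Description: ...\nTranscript: ..." from fetchAndProcessSubtitles.
    public static boolean hasTranscript(String documentText) {
        return documentText.startsWith("Video Title: ") && documentText.contains("\nTranscript: ");
    }

    public static void main(String[] args) {
        System.out.println(isError("Unexpected error: 403 Forbidden"));                         // true
        System.out.println(hasTranscript("No subtitles found for video: some-url. "
                + "Returning an empty transcript."));                                           // false
        System.out.println(hasTranscript("Video Title: t, Description: d\nTranscript: hello")); // true
    }
}
```

Matching on message prefixes is brittle if the library changes its wording, so this is only a stopgap for inspecting results in tests or logs.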

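Summary

spring-ai-alibaba-starter-document-reader-bilibili provides BilibiliDocumentReader for reading a Bilibili video URL. It makes up to two API requests: one to fetch the title and description, and, if the video has a subtitle list, a second one to fetch the subtitles.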
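The way the subtitle response is folded into the final document text can be sketched without any HTTP or JSON dependencies. In the example below, the class TranscriptFormatter and its format method are hypothetical; the template string is copied from the quoted fetchAndProcessSubtitles method, and the list of strings stands in for the content fields of the subtitle JSON's body array.

```java
import java.util.List;

public class TranscriptFormatter {

    // Joins subtitle lines with single spaces and wraps them in the template
    // string used by the quoted fetchAndProcessSubtitles method.
    public static String format(String title, String description, List<String> subtitleLines) {
        String transcript = String.join(" ", subtitleLines).trim();
        return String.format("Video Title: %s, Description: %s\nTranscript: %s",
                title, description, transcript);
    }

    public static void main(String[] args) {
        // Prints two lines: the title/description line, then "Transcript: hello world".
        System.out.println(format("Demo", "A short demo", List.of("hello", "world")));
    }
}
```

The reader itself appends each line followed by a space to a StringBuilder and trims the result; String.join is used here only as a shorter way to get the same text.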

doc

java2ai


codecraft
