Alibaba Cloud Message Queue Kafka: Message Retrieval in Practice

Author: Kafka&Tablestore

This article describes how to troubleshoot common message queue pain points such as message loss and repeated consumption, walks through a practical scenario for the Message Queue Kafka "retrieval component", and explains its key technologies. The goal is to help you become familiar with the features and usage of the retrieval component so that you can troubleshoot message problems more effectively.

Scenario Pain Point Introduction

Because a message queue is a distributed system, problems such as message loss and duplicate delivery are inevitable in practice.

For example, in log aggregation scenarios, multiple heterogeneous data sources usually produce data into Kafka for consumption by downstream computing engines such as Spark. When some logs go missing, it is difficult to investigate directly from the client logs because the sending methods and data structures of the messages vary widely.
As another example, during message forwarding the consumer may consume the same data more than once. Determining whether a message was produced repeatedly requires retrieving data from the message queue by content, but a message queue can usually only be traversed by partition and consumer offset, and such scanning does not support flexible message retrieval.
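To make the limitation concrete, here is a minimal sketch of what content lookup looks like when the only access path is partition plus offset. The `Topic` type is a hypothetical in-memory stand-in for a Kafka topic, not a real client API; an actual consumer would read the messages from the brokers, but the access pattern would be the same full scan.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory stand-in for a topic: partition id -> ordered message values.
Topic = Dict[int, List[str]]


def scan_for_content(topic: Topic, needle: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (partition, offset, value) for every message whose value contains needle."""
    for partition, messages in sorted(topic.items()):
        # An offset is only a position in the partition log, so the only option
        # is to walk each partition from its first offset to its last.
        for offset, value in enumerate(messages):
            if needle in value:
                yield partition, offset, value


topic: Topic = {
    0: ['{"PID":"275"}', '{"PID":"276"}'],
    1: ['{"PID":"277"}', '{"PID":"276"}', '{"PID":"278"}'],
}

hits = list(scan_for_content(topic, '"PID":"276"'))
print(hits)
```

Two hits for the same content suggest that the message was produced more than once, but finding them required reading every message in every partition.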

Existing message queue products in the industry lack good tools and methods for retrieving message content, which greatly increases the difficulty and cost of troubleshooting.

Kafka Message Retrieval Component

Introduction to the Retrieval Component

The Message Queue Kafka "retrieval component" is a fully managed, highly elastic, interactive retrieval component. It can search the content of trillions of messages with second-level response times, and it aims to fill the gap left by message products in the industry that do not support retrieval by message content. The component transfers the message data in a topic to Tablestore through the Kafka Connector and provides message retrieval on top of Tablestore's multi-index feature. It supports combining one or more conditions such as the partition, offset, and sending time range of a message, and it also supports full-text search on the message Key and Value.
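As a rough illustration of the dump step, the sketch below turns one Kafka record into a table row. The row layout, the field names, and the topic name `process-logs` are assumptions made for this example; the article does not describe the actual schema that the Kafka Connector writes to Tablestore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KafkaRecord:
    topic: str
    partition: int
    offset: int
    timestamp_ms: int
    key: str
    value: str


def to_row(record: KafkaRecord) -> dict:
    """Split a record into a primary key and attribute columns (hypothetical layout)."""
    return {
        # Topic, partition, and offset together identify one message in a cluster.
        "primary_key": (record.topic, record.partition, record.offset),
        # Columns an index could cover: time for range conditions,
        # key and value for full-text search.
        "attributes": {
            "timestamp_ms": record.timestamp_ms,
            "key": record.key,
            "value": record.value,
        },
    }


row = to_row(KafkaRecord("process-logs", 0, 42, 1_650_000_000_000, "276", '{"PID":"276"}'))
print(row)
```

Once messages are stored as rows like this, every condition mentioned above (partition, offset, time range, Key, Value) becomes a column that an index can filter on.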

Case Practice

Case Background

Suppose an operations team needs to monitor an online cluster. It collects process-level logs, imports them into Kafka, and uses Flink downstream to consume them and calculate the resource consumption of each process in real time. When Flink shows that the log data of a certain process is missing for a certain period, the team can use the Message Queue Kafka "retrieval component" to retrieve message data by message value and time range and determine whether the logs were successfully pushed to Message Queue Kafka.

For example, the collected log data is structured as JSON, and a single log entry has the following format:

key   = 276
value = {"PID":"276","COMMAND":"Google Chrom","CPU_USE":"7.2","TIME":"00:01:44","MEM":"8836K","STATE":"sleeping","UID":"0","IP":"164.29.0.1"}
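A short sketch of how a consumer might read this sample with Python's standard `json` module. The variable names are illustrative only. Note that every field in the sample is a JSON string, so numeric fields need explicit conversion.

```python
import json

# The sample message from the case, copied verbatim.
key = "276"
value = (
    '{"PID":"276","COMMAND":"Google Chrom","CPU_USE":"7.2","TIME":"00:01:44",'
    '"MEM":"8836K","STATE":"sleeping","UID":"0","IP":"164.29.0.1"}'
)

record = json.loads(value)

# CPU_USE is stored as a string, so convert it before doing arithmetic.
cpu_use = float(record["CPU_USE"])

# In this sample the message key matches the PID field of the value.
key_matches_pid = key == record["PID"]

print(record["PID"], cpu_use, key_matches_pid)
```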

Enable Message Retrieval

First, log in to the Alibaba Cloud Message Queue Kafka console, select the corresponding topic, and activate the message retrieval service.

After the message retrieval service is activated, a Tablestore instance is created automatically, the message data is transferred to Tablestore, and an index is created to provide message retrieval capabilities. Each topic corresponds to one data table in Tablestore. You can view the details of each topic's message retrieval component in the Message Queue Kafka console.

Message Retrieval Practice

Once the message retrieval service is activated, you can combine multiple search conditions to retrieve messages and solve the case above. For example, specify a time range and retrieve the messages whose value contains PID = 276.
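The logic of that combined condition can be sketched locally as follows. This is only an in-memory illustration of "time range AND PID = 276", not the console's or the component's implementation; the message layout and the timestamps are made-up sample data.

```python
import json
from datetime import datetime, timezone
from typing import List


def search(messages: List[dict], start: datetime, end: datetime, pid: str) -> List[dict]:
    """Return messages sent in [start, end) whose JSON value has the given PID."""
    hits = []
    for message in messages:
        if not (start <= message["timestamp"] < end):
            continue  # outside the requested time window
        if json.loads(message["value"]).get("PID") == pid:
            hits.append(message)
    return hits


def at(hour: int, minute: int) -> datetime:
    # Made-up sample timestamps on an arbitrary day, in UTC.
    return datetime(2022, 1, 1, hour, minute, tzinfo=timezone.utc)


messages = [
    {"offset": 0, "timestamp": at(10, 0), "value": '{"PID":"276","CPU_USE":"7.2"}'},
    {"offset": 1, "timestamp": at(10, 5), "value": '{"PID":"277","CPU_USE":"1.0"}'},
    {"offset": 2, "timestamp": at(11, 30), "value": '{"PID":"276","CPU_USE":"6.8"}'},
]

found = search(messages, at(10, 0), at(11, 0), "276")
print([m["offset"] for m in found])
```

An empty result for the window in question would suggest that the log never reached Kafka, which points the investigation at the producer side rather than at Flink.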

[Figure: example of the returned result]

Capability Expansion

Introduction to Tablestore

Tablestore is a structured data storage service built on Alibaba Cloud's underlying Apsara (Feitian) platform. It provides storage for data at the hundred-billion scale and millisecond-level data retrieval. After Message Queue Kafka dumps messages to Tablestore, you can also retrieve them through Tablestore's native data access methods, which support more complex retrieval logic as well as SQL syntax. There are two ways to retrieve messages:

Multi-Index Search

Log in to the Tablestore console, open the Tablestore instance and data table that the Kafka message data is dumped to, and select multi-index search on the index management page.

For example, suppose you need to retrieve the messages whose value contains PID=276 or PID=277.
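The "or" in this query means a message matches if at least one of the conditions holds. The sketch below shows that composition with plain Python predicates. The helper names `pid_is` and `any_of` are hypothetical and are not part of any Tablestore SDK; the sample values are made up.

```python
import json
from typing import Callable, Iterable

Condition = Callable[[dict], bool]


def pid_is(pid: str) -> Condition:
    """Condition that holds when the parsed value has the given PID."""
    return lambda fields: fields.get("PID") == pid


def any_of(conditions: Iterable[Condition]) -> Condition:
    """Condition that holds when at least one sub-condition holds (logical OR)."""
    conditions = list(conditions)
    return lambda fields: any(condition(fields) for condition in conditions)


values = [
    '{"PID":"275","STATE":"running"}',
    '{"PID":"276","STATE":"sleeping"}',
    '{"PID":"277","STATE":"running"}',
]

query = any_of([pid_is("276"), pid_is("277")])
matched = [value for value in values if query(json.loads(value))]
print(matched)
```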

[Figure: returned result]

Retrieving Messages with SQL

Tablestore supports retrieving messages with SQL syntax. First, create an SQL mapping table on the data table that the messages are dumped to.

Then retrieve the messages with PID=276 using Tablestore SQL.
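The shape of such a query can be tried out locally with Python's built-in `sqlite3` module. SQLite is used here purely as a stand-in: the table name `kafka_messages` and its column names are assumptions for this example, and the Tablestore SQL dialect and the real mapping table's columns may differ.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL mapping table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kafka_messages ("
    "partition_id INTEGER, offset_id INTEGER, msg_key TEXT, msg_value TEXT)"
)

# Made-up sample rows.
rows = [
    (0, 10, "276", '{"PID":"276","STATE":"sleeping"}'),
    (0, 11, "277", '{"PID":"277","STATE":"running"}'),
    (1, 7, "276", '{"PID":"276","STATE":"running"}'),
]
conn.executemany("INSERT INTO kafka_messages VALUES (?, ?, ?, ?)", rows)

# Retrieve every message whose value contains PID=276.
result = conn.execute(
    "SELECT partition_id, offset_id, msg_value FROM kafka_messages "
    "WHERE msg_value LIKE ? ORDER BY partition_id, offset_id",
    ('%"PID":"276"%',),
).fetchall()

print(result)
conn.close()
```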

Summary

Alibaba Cloud's Message Queue Kafka "retrieval component" is the first component in the message queue field to support interactive retrieval of message content. It dumps message data to Tablestore and builds its retrieval service on that storage, supporting free combinations of conditions such as Key, Value, and partition as well as full-text retrieval by Key and Value. It requires no development work and no operations and maintenance, and it is highly flexible. Messages can also be retrieved directly through the Tablestore index or SQL, which greatly speeds up routine checks of whether messages exist and whether they are correct.

If you have any questions about the Tablestore multi-index and SQL queries described in this article, you are welcome to join the technical exchange group (group number 23307953), which provides free online expert support.
