The message queue Kafka "retrieval component" is launched!

Author: Kafka & Tablestore team

Foreword

Are you still troubled by the inability to efficiently troubleshoot duplicate and failed messages when using message queues?

Are you still troubled by the inability to accurately find the content of the message and locate the problem when using the message queue?

. . .

The message queue Kafka "retrieval component" is here to help.

This article introduces the "retrieval component" of the message queue Kafka in detail. It first describes the pain points in using message queues, then presents the corresponding solution and explains the key technologies behind it. The goal is to make you more familiar with the features and usage of the "retrieval component", so that you can resolve the pain points of message troubleshooting more effectively.

Introduction to Pain Points

When using a message queue, the industry generally assumes that once a message enters the queue, it is reliable and unlikely to be lost. In practice, however, various problems arise:

Pain points in application

Due to the nature of distributed systems, message failures and duplicates are inevitable. They are usually investigated by reasoning from client logs. At large scale, however, doing this manually on the client side becomes very difficult, which challenges the reliability of messages;
In addition, larger projects are generally built by multiple people or teams whose code for sending and consuming messages differs, which makes it harder to be sure that a message ultimately completes its mission;
Beyond troubleshooting after the fact, did the message meet expectations when it was produced? This is another difficulty that troubles customers. Current message queue systems still offer no way to check messages by content, which makes it hard to verify business correctness.

To put it simply, in the message field each message often represents a specific meaning and action. Once failures, losses, or errors occur, the current state of message queues in the industry makes it hard to troubleshoot the specific problem, which in turn makes it even harder to locate problems across the entire upstream and downstream link.

Pain points on the technical side

The above are the problems customers face in messaging application scenarios. These problems also lead to pain points on the technical side when troubleshooting messages:

First, R&D must invest in code, deployment, and operation and maintenance. The operation and maintenance personnel also need to be familiar with Kafka: they have to consume messages with a Kafka client and then confirm that a message exists by traversing the consumed messages;
Beyond the code, deployment, and operation and maintenance investment, other products may need to be introduced, for example connecting a stream computing service and traversing messages with it.
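The traversal approach described above can be sketched as follows. This is a minimal illustration, not a real Kafka client: the `Record` class and the `consumed` list are hypothetical stand-ins for records that a consumer would return, and the point is only that every lookup has to scan all messages.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a record returned by a Kafka consumer."""
    topic: str
    partition: int
    offset: int
    key: str
    value: str


def find_by_traversal(records: Iterable[Record], keyword: str) -> List[Record]:
    """Scan every record and keep those whose key or value contains the keyword.

    The cost grows linearly with the number of messages in the topic.
    """
    return [r for r in records if keyword in r.key or keyword in r.value]


# Sample data standing in for messages consumed from a topic (assumption).
consumed = [
    Record("orders", 0, 0, "order-1001", "status=CREATED"),
    Record("orders", 0, 1, "order-1002", "status=PAID"),
    Record("orders", 1, 0, "order-1001", "status=CREATED"),  # duplicate send
]

hits = find_by_traversal(consumed, "order-1001")
print([(r.partition, r.offset) for r in hits])
```

Two hits for the same key point to a duplicate, but finding them required reading the whole topic, which is the cost the rest of this article sets out to avoid.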

What's more troublesome is that such investigations are frequent, often needed every few weeks or even every few days, which makes the time cost of R&D, deployment, and operation and maintenance relatively high; and because the requirements of each investigation differ, the investment is large and the flexibility is low.

To sum up, troubleshooting failures and duplicates in a message queue has two problems. First, there is no good tool or method for retrieving messages by content, so investigation is difficult and lacks accuracy and ease of use. Second, it requires a large and inflexible investment of time and labor. These problems cause users a lot of trouble when troubleshooting messages.

Introduction to Kafka Retrieval Components

The pain points above show how difficult message troubleshooting is in the message field. To solve these problems, Alibaba Cloud Message Queue Kafka version has launched a message retrieval component, which is described in detail below:

Introduction to Retrieval Components

The message queue Kafka "retrieval component" is a fully managed, highly elastic, interactive retrieval component that can search the content of trillions of messages and respond within seconds.

It is aimed mainly at troubleshooting and recovery scenarios for operation and maintenance personnel, and is used for full-link message troubleshooting, including checking message sending, duplicate production, and message loss. Its main functions include retrieving messages by topic, offset range, and time range, as well as by keywords in the message Key and Value;
It mainly addresses the fact that message products in the industry do not support retrieving message content.
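To make the search dimensions listed above concrete (topic, position or offset range, time range, Key keyword, Value keyword), here is a small sketch of how such conditions could be combined. The `Message` and `Criteria` classes and their field names are assumptions made for illustration; they are not the component's actual data model, and the linear filter only shows the semantics, not how the component evaluates queries.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Message:
    """Hypothetical shape of a stored message; field names are assumptions."""
    topic: str
    partition: int
    offset: int
    timestamp_ms: int
    key: str
    value: str


@dataclass(frozen=True)
class Criteria:
    """Search conditions; every condition except the topic is optional."""
    topic: str
    offset_range: Optional[Tuple[int, int]] = None   # inclusive (start, end)
    time_range_ms: Optional[Tuple[int, int]] = None  # inclusive (start, end)
    key_keyword: Optional[str] = None
    value_keyword: Optional[str] = None


def matches(m: Message, c: Criteria) -> bool:
    """Return True only if the message satisfies every condition that is set."""
    if m.topic != c.topic:
        return False
    if c.offset_range is not None and not (c.offset_range[0] <= m.offset <= c.offset_range[1]):
        return False
    if c.time_range_ms is not None and not (c.time_range_ms[0] <= m.timestamp_ms <= c.time_range_ms[1]):
        return False
    if c.key_keyword is not None and c.key_keyword not in m.key:
        return False
    if c.value_keyword is not None and c.value_keyword not in m.value:
        return False
    return True


def search(messages: List[Message], c: Criteria) -> List[Message]:
    """Apply the criteria to every message and return the matches."""
    return [m for m in messages if matches(m, c)]


stored = [
    Message("orders", 0, 0, 1_000, "order-1001", "status=CREATED"),
    Message("orders", 0, 1, 2_000, "order-1002", "status=PAID"),
    Message("orders", 0, 2, 3_000, "order-1003", "status=PAID"),
    Message("refunds", 0, 0, 2_500, "order-1002", "status=PAID"),
]

found = search(stored, Criteria("orders", time_range_ms=(1_500, 3_500), value_keyword="PAID"))
print([m.offset for m in found])
```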

The message queue Kafka "message retrieval" is implemented with the Kafka Connect function and Table Store (Tablestore). The messages in a topic are exported by a connector and written to a data table in Tablestore, and the Tablestore index function then provides the message retrieval capability.

At its core, it provides complete message content retrieval, which makes it possible to locate problems quickly while keeping operation simple and saving manpower. After creating a message queue Kafka instance, users need only five simple steps to put the Kafka retrieval component to use.

The following is a brief introduction to the operation steps of message retrieval in the Kafka version of the message queue.

Introduction to Retrieval Component Operation

1) Enable message retrieval

First, enable the message retrieval function of a topic under an instance, so as to retrieve the messages in the topic as needed. Proceed as follows:

Log in to the message queue Kafka version console;
In the resource distribution area of the overview page, select a region;
In the left navigation bar, click Message Retrieval;
On the message retrieval page, select the instance that the topic to be searched belongs to from the instance drop-down list, and then click Enable Message Retrieval;
In the Enable Message Retrieval panel, fill in the parameters, and click OK.

2) Send a test message

After enabling message retrieval, you can send messages to the data source topic in the message queue Kafka version to trigger the task and test whether message retrieval was created successfully.

On the message retrieval page, find the target topic to be tested, and perform the operation that matches its task status;
Send a test message in the Quick Experience Messaging panel.

3) Search for messages

On the message retrieval page, find the target topic, and click Search in its operation column;
In the search panel, set the search criteria: select a search item from the search item drop-down list, click Add search item, enter the search value in the value column, and then click OK.

4) View the details of the message retrieval task

After enabling message retrieval, you can view details such as the automatically created Topic, Group, Tablestore instance name, and Tablestore data table name. You can also go directly to the Tablestore data table from the details.
On the message retrieval page, find the target topic, and click Details in its operation column;
On the task details page, you can view the message retrieval details of the target topic; you can also click Tablestore in the target service field of the basic information area to open the data table details page.

5) View consumption details

You can view the consumption progress, in each partition of the topic, of the online groups subscribed to the current topic, and so understand how messages are being consumed and how many have accumulated.

On the message retrieval page, find the target topic whose consumption progress needs to be viewed, and click the consumption progress in its operation column;
As shown in the figure below, on the consumption details page, you can view the consumption of each partition of the topic:

[Figure: consumption details for each partition of the topic]
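For readers unfamiliar with how consumption progress and accumulation relate, the usual Kafka notion is consumer lag: for each partition, the log end offset minus the group's committed offset. The sketch below computes it from two plain dictionaries. The sample offsets are made up, treating a partition without a committed offset as starting from 0 is a simplifying assumption, and the article does not describe how the console itself computes these numbers.

```python
from typing import Dict


def partition_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Return the number of accumulated (not yet consumed) messages per partition.

    lag = log end offset - committed offset. A partition with no committed
    offset is treated as starting from 0 (simplifying assumption).
    """
    return {p: max(end - committed.get(p, 0), 0) for p, end in end_offsets.items()}


# Made-up sample values: partition id -> offset.
end_offsets = {0: 120, 1: 95, 2: 40}
committed = {0: 120, 1: 80}

lag = partition_lag(end_offsets, committed)
print(lag, "total:", sum(lag.values()))
```

A partition whose lag keeps growing indicates that the group is consuming more slowly than producers are writing.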

In addition to the above functions, while message retrieval is running you can also pause, enable, and delete message retrieval tasks.

Interpretation of Kafka Retrieval Component Technology

Previously, message retrieval in the message queue Kafka version supported only two kinds of search: by consumer offset or by creation time. Kafka itself could not adequately support the need to retrieve messages by keyword.

To solve this problem, Kafka is combined with Tablestore: Kafka messages are imported into a Tablestore data table through a Connector, and the capabilities of Tablestore are used to implement keyword retrieval.
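The idea of writing messages into a table and answering keyword queries from an index can be illustrated with a toy in-memory version. The `MessageTable` class below is hypothetical and does not use the Tablestore SDK or reflect its index implementation; it only shows why an index lookup avoids scanning every message.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

RowKey = Tuple[str, int, int]  # (topic, partition, offset)


class MessageTable:
    """Toy stand-in for a data table plus a keyword index (not the Tablestore API)."""

    def __init__(self) -> None:
        self.rows: Dict[RowKey, Dict[str, str]] = {}
        self.index: Dict[str, Set[RowKey]] = defaultdict(set)

    def put(self, topic: str, partition: int, offset: int, key: str, value: str) -> None:
        """Store one message and index every lowercase word in its key and value."""
        row_key = (topic, partition, offset)
        self.rows[row_key] = {"key": key, "value": value}
        for token in re.findall(r"\w+", f"{key} {value}".lower()):
            self.index[token].add(row_key)

    def search(self, keyword: str) -> List[RowKey]:
        """Look the keyword up in the index instead of scanning all rows."""
        return sorted(self.index.get(keyword.lower(), set()))


table = MessageTable()
table.put("orders", 0, 0, "order-1001", "status=CREATED")
table.put("orders", 0, 1, "order-1002", "status=PAID")
table.put("orders", 1, 0, "order-1003", "status=PAID")

print(table.search("PAID"))
print(table.search("1001"))
```

The row key of (topic, partition, offset) is an assumption chosen because it identifies a Kafka message uniquely; the article does not state the actual table schema.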

The key technologies are explained below:

Kafka Connect

The core purpose of Kafka Connect is to solve the problem of synchronizing heterogeneous data. The solution is to add a layer of message middleware between the data sources, so that all data is stored and distributed through the message middleware.

There are two advantages to doing this:

1) Asynchronous decoupling is done through the message middleware, and every system communicates only with the message middleware;

2) The number of parsing tools to be developed drops from the order of n squared to a linear 2*n.

Kafka Connect connects the message system with data sources. According to the direction of data flow, connectors are divided into Source Connectors and Sink Connectors.
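The drop from n squared to 2*n in point 2) can be checked with a quick calculation. The counting model here is an assumption: point-to-point synchronization needs one parsing tool per ordered pair of systems, while the middleware approach needs one source-side and one sink-side tool per system.

```python
def point_to_point_tools(n: int) -> int:
    """One parser per ordered (source, target) pair: n * (n - 1), on the order of n squared."""
    return n * (n - 1)


def via_middleware_tools(n: int) -> int:
    """One source connector plus one sink connector per system: 2 * n."""
    return 2 * n


for n in (3, 5, 10):
    print(f"n={n}: point-to-point={point_to_point_tools(n)}, via middleware={via_middleware_tools(n)}")
```

With 10 systems the difference is already 90 tools versus 20, and the gap widens as more systems are added.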

The principle is also very simple. The Source Connector is responsible for parsing the source data, converting it into a standard format message, and sending it to the Kafka Broker through the Kafka Producer. In the same way, the Sink Connector consumes the corresponding topic through the Kafka Consumer, and then delivers it to the target system. In the whole process, Kafka Connect uniformly solves the problems of task scheduling, interaction with the message system, automatic scaling, fault tolerance and monitoring, which greatly reduces duplication of work.
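The division of labor between the two connector types can be sketched in a few lines. This is not the Kafka Connect API (which is a Java framework): the `deque` stands in for a topic on the broker, and the two functions are hypothetical stand-ins for a Source Connector and a Sink Connector. Scheduling, scaling, fault tolerance, and monitoring, which Kafka Connect handles, are left out.

```python
import json
from collections import deque
from typing import Deque, Dict, List

# In-memory stand-in for a Kafka topic; a real deployment would use a broker.
topic: Deque[str] = deque()


def source_connector(rows: List[Dict[str, str]]) -> None:
    """Parse source rows, convert each to a standard-format (JSON) message, and produce it."""
    for row in rows:
        topic.append(json.dumps(row, sort_keys=True))


def sink_connector(target: List[Dict[str, str]]) -> int:
    """Consume all pending messages and deliver them to the target system."""
    delivered = 0
    while topic:
        target.append(json.loads(topic.popleft()))
        delivered += 1
    return delivered


source_rows = [{"id": "1", "name": "alice"}, {"id": "2", "name": "bob"}]
target_table: List[Dict[str, str]] = []

source_connector(source_rows)
count = sink_connector(target_table)
print(count, target_table == source_rows)
```

Neither side knows about the other: the source only writes to the topic and the sink only reads from it, which is the decoupling described above.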

The message queue Kafka version provides a fully managed, O&M-free Kafka Connect for data synchronization between the message queue Kafka version and other Alibaba Cloud services. As shown in the figure below, the message queue Kafka version supports mainstream connectors such as Tablestore, MySQL Source Connector, OSS Sink Connector, MaxCompute Sink Connector, and FC Sink Connector. To use these Connectors for data synchronization, users only need to complete a few configuration items in the graphical interface of the message queue Kafka console, and can then start the Connector task with one click.

[Figure: connectors supported by the message queue Kafka version]

Table Store (Tablestore)

Tablestore is a storage service for massive structured data, built on the Alibaba Cloud Apsara distributed system. Using the Apsara Pangu distributed file system as its storage base, it is a cloud-native serverless storage product with an architecture that separates storage from compute and an elastic shared resource pool. Its built-in distributed indexing system automatically scales the computing resources needed for index building according to write traffic, and can therefore sustain extremely high write traffic. The index structure is also optimized to support faster fuzzy queries. Key capabilities such as the storage-compute separation architecture and high-throughput real-time indexing enable Tablestore to absorb and efficiently search the massive data written from Kafka, helping users retrieve the information they need quickly.

Technology leadership

Kafka+Kafka Connect+Tablestore is a cloud-native data application solution that uses Kafka Connect as the trigger for real-time processing tasks: it receives new data sent to the message queue cluster in real time and forwards it to Tablestore.

As part of the subsequent data flow, Kafka Connect not only keeps the data real-time but also handles task scheduling, interaction with the message system, automatic scaling, fault tolerance, and monitoring, which greatly reduces duplicated work. Once the data arrives in Tablestore, the distributed storage and powerful indexing engine of Tablestore support PB-level storage, tens of millions of TPS, and millisecond-level latency. Together, these give the fully managed, highly elastic, interactive retrieval component its ability to search the content of trillions of messages with a second-level response.

User-oriented value and advantages

The Kafka retrieval component function not only has strong technical advantages, but also brings more convenience to the actual work of users:

1. Low cost of investigation

A simple configuration in the console is all it takes to view all messages in the Kafka server cluster;

2. Fast investigation

No development, resource evaluation, deployment, or operation and maintenance is needed; once the search conditions are set, queries respond within seconds;

3. High accuracy of investigation

The retrieval component is jointly developed by the message commercialization team and the core R&D of the Tablestore team. Built on the native capabilities of Alibaba Cloud, it offers high retrieval accuracy, and its reliability and availability are well guaranteed.

To sum up, the Kafka retrieval component function has the following advantages in actual business:

Locate problems quickly, recover faster from faults and anomalies in the products upstream and downstream of the message, and reduce business financial losses;
Save enterprise costs by reducing the investment in operation and maintenance, R&D, and other personnel;
Reduce learning costs, and lower the level of understanding of the message product's mechanisms that users need.

Summary

Alibaba Cloud's message queue Kafka "retrieval component" is the first component in the field of message queues to support interactive message content retrieval. It requires no development and no operation and maintenance, and it is highly flexible. For moderate and heavy users in the message field, the Alibaba Cloud Message Queue Kafka "retrieval component" is a powerful tool for routinely checking the existence and correctness of messages.

Click here to go to the relevant product documentation for details!
