Author: Kafka & Tablestore team
Foreword
Are you still struggling to troubleshoot duplicate and failed messages efficiently when using a message queue?
Are you still struggling to find a message by its content and pinpoint the problem when using a message queue?
. . .
The "retrieval component" of Message Queue for Apache Kafka is here to help.
This article describes the "retrieval component" of Message Queue for Apache Kafka in detail. It first outlines the pain points of using a message queue, then presents a solution for each of them and explains the key technologies involved. The goal is to familiarize you with the component's features and usage so that you can resolve message troubleshooting problems more effectively.
Introduction to Pain Points
When using a message queue, the industry generally assumes that once a message enters the queue it is reliable and unlikely to be lost. In practice, however, a variety of problems arise:
Pain points on the application side
- Because of the nature of distributed systems, message failures and duplicates are unavoidable. They are usually diagnosed by reasoning from client logs, but at large scale this is very hard to do by hand, which calls the reliability of messaging into question;
- Larger projects are usually built by several people or teams, each implementing message sending and consumption in its own way, which makes it harder to confirm that a message ultimately fulfills its purpose;
- Beyond tracing the outcome of a problem, customers also need to know whether a message already deviated from expectations when it was produced. Current message queues offer no way to search messages by content, which makes it hard to verify business correctness.
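To make the duplicate-troubleshooting problem concrete, here is a minimal sketch that assumes the suspect messages have already been pulled into memory. `Record` and `find_duplicates` are hypothetical names and are not part of any Kafka client. The sketch groups records by key plus a hash of the value, then reports any content that appears at more than one position:

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a consumed Kafka record."""
    topic: str
    partition: int
    offset: int
    key: str
    value: str


def find_duplicates(records):
    """Group records by (key, SHA-256 of value); return only groups seen more than once."""
    groups = defaultdict(list)
    for r in records:
        digest = hashlib.sha256(r.value.encode("utf-8")).hexdigest()
        groups[(r.key, digest)].append((r.topic, r.partition, r.offset))
    # Content stored at two or more (topic, partition, offset) positions is a suspected duplicate.
    return {k: v for k, v in groups.items() if len(v) > 1}


if __name__ == "__main__":
    log = [
        Record("orders", 0, 10, "order-1", '{"amount": 5}'),
        Record("orders", 1, 42, "order-1", '{"amount": 5}'),  # e.g. a retried send
        Record("orders", 0, 11, "order-2", '{"amount": 7}'),
    ]
    for (key, _digest), positions in find_duplicates(log).items():
        print(key, positions)
```

The grouping itself is trivial. At production scale, the hard part is collecting every record in one place so it can be compared, and that is the gap described above.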
In short, in messaging each message often represents a specific meaning and action. When a message fails, is lost, or is wrong, today's message queues make the specific problem hard to troubleshoot. That in turn makes it even harder to locate problems along the entire upstream and downstream link.
Pain points on the technical side
These are the problems customers face in messaging applications. They also create many technical pain points when troubleshooting messages:
- First, R&D must write code and then deploy, operate, and maintain it. Operations staff also need to know Kafka well: they must consume the topic with a Kafka client and then confirm whether a message exists by traversing the messages;
- Beyond the development, deployment, and O&M effort, other products may need to be brought in, for example by connecting a stream computing service and traversing the messages with it.
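The traversal approach in the first point can be sketched as follows. The sketch assumes each partition has already been read into a list in offset order; a real job would seek each partition to its beginning with a Kafka consumer and poll to the end. `Record` and `traverse_for` are hypothetical names. The returned count shows that the cost grows with the total number of messages, not with the number of matches:

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a consumed Kafka record."""
    partition: int
    offset: int
    key: str
    value: str


def traverse_for(partitions, keyword):
    """Linear scan: read every record in every partition, keep those containing keyword.

    partitions maps partition id -> list of records in offset order.
    Returns (matching (partition, offset) pairs, number of records read).
    """
    matches, records_read = [], 0
    for pid in sorted(partitions):
        for record in partitions[pid]:
            records_read += 1
            if keyword in record.key or keyword in record.value:
                matches.append((record.partition, record.offset))
    return matches, records_read


if __name__ == "__main__":
    data = {
        0: [Record(0, i, f"k{i}", f"payload-{i}") for i in range(1000)],
        1: [Record(1, i, f"k{i}", "order-7781" if i == 500 else "x") for i in range(1000)],
    }
    print(traverse_for(data, "order-7781"))  # -> ([(1, 500)], 2000)
```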
Worse, such investigations are frequent, often weekly or even daily, so R&D, deployment, and O&M consume a lot of time. Because each investigation has different requirements, the investment is large and the approach is inflexible.
To sum up, troubleshooting failures, duplicates, and similar problems with a message queue has two drawbacks. First, there is no good tool or method for searching message content, so investigation is difficult and lacks accuracy and ease of use. Second, it demands a large investment of time and labor and is inflexible. Together, these problems cause users a lot of trouble when troubleshooting messages.
Introduction to the Kafka Retrieval Component
The pain points above show how difficult message troubleshooting is in the messaging field. To solve these problems, Alibaba Cloud Message Queue for Apache Kafka has launched a message retrieval component, described in detail below:
Introduction to the Retrieval Component
The "retrieval component" of Message Queue for Apache Kafka is a fully managed, highly elastic, interactive retrieval component. It can search the content of trillions of messages with second-level response times.
- It is aimed mainly at O&M staff in troubleshooting and recovery scenarios, and covers full-link message troubleshooting such as checking message sending, duplicate production, and message loss. Its main functions include retrieving messages by topic, offset range, and time range, and by keywords in the message Key and Value;
- It mainly addresses the fact that message queue products in the industry do not support searching message content.
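The search dimensions listed above can be combined into a single query. In this minimal sketch, `Record`, `Query`, and `search` are hypothetical names. It assumes that all configured criteria are combined with a logical AND and that keyword matching is a plain substring test; the article does not specify either behaviour:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Record:
    """Hypothetical message record; timestamp is milliseconds since the epoch, as in Kafka."""
    topic: str
    partition: int
    offset: int
    timestamp_ms: int
    key: str
    value: str


@dataclass
class Query:
    """Hypothetical search criteria; None means 'no restriction' for that item."""
    topic: str
    partition: Optional[int] = None
    offset_range: Optional[Tuple[int, int]] = None   # inclusive (start, end)
    time_range_ms: Optional[Tuple[int, int]] = None  # inclusive (start, end)
    key_keyword: Optional[str] = None
    value_keyword: Optional[str] = None


def matches(r: Record, q: Query) -> bool:
    """A record matches only if every configured criterion holds (logical AND)."""
    if r.topic != q.topic:
        return False
    if q.partition is not None and r.partition != q.partition:
        return False
    if q.offset_range and not (q.offset_range[0] <= r.offset <= q.offset_range[1]):
        return False
    if q.time_range_ms and not (q.time_range_ms[0] <= r.timestamp_ms <= q.time_range_ms[1]):
        return False
    if q.key_keyword and q.key_keyword not in r.key:
        return False
    if q.value_keyword and q.value_keyword not in r.value:
        return False
    return True


def search(records, q: Query):
    """Return the records that satisfy the query, in their original order."""
    return [r for r in records if matches(r, q)]
```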
Message retrieval in Message Queue for Apache Kafka is implemented with Kafka Connect and Tablestore. A connector dumps the messages in a topic into a Tablestore data table. The Tablestore index function then provides the retrieval capability.
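One way to picture the dump step is as a mapping from each Kafka record to a table row. This sketch assumes a hypothetical schema with `(topic, partition, offset)` as the primary key and the timestamp, key, and value as attribute columns. The table layout the connector actually creates may differ, and `InMemoryTable` is only a stand-in for a Tablestore data table:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KafkaRecord:
    """Hypothetical stand-in for a record read from the source topic."""
    topic: str
    partition: int
    offset: int
    timestamp_ms: int
    key: str
    value: str


class InMemoryTable:
    """Hypothetical stand-in for a Tablestore data table: rows addressed by primary key."""

    def __init__(self):
        self.rows = {}

    def put(self, primary_key, attributes):
        # Writing an existing primary key overwrites the row instead of adding a new one.
        self.rows[primary_key] = dict(attributes)


def to_row(record: KafkaRecord):
    """Hypothetical schema: (topic, partition, offset) as primary key, the rest as attributes."""
    pk = (record.topic, record.partition, record.offset)
    attrs = {"timestamp_ms": record.timestamp_ms, "key": record.key, "value": record.value}
    return pk, attrs


def sink(records, table: InMemoryTable):
    """Write each record into the table as one row."""
    for r in records:
        pk, attrs = to_row(r)
        table.put(pk, attrs)
```

Keying rows by topic, partition, and offset has a useful side effect: if the same record reaches the sink twice, the second write overwrites the first rather than adding a duplicate row.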
Its core value is complete message content retrieval, which lets you locate problems quickly, simplifies operations, and saves manpower. After creating a Message Queue for Apache Kafka instance, you can start using the retrieval component in five simple steps, briefly introduced below.
Operating the Retrieval Component
1) Enable message retrieval
First, enable message retrieval for a topic under an instance so that you can search the messages in that topic as needed. Proceed as follows:
- Log in to the Message Queue for Apache Kafka console;
- In the Resource Distribution section of the Overview page, select a region;
- In the left-side navigation pane, click Message Retrieval;
- On the Message Retrieval page, select the instance that owns the topic whose messages you want to search from the instance drop-down list, and then click Enable Message Retrieval;
- In the Enable Message Retrieval panel, fill in the parameters and click OK.
2) Send a test message
After enabling message retrieval, you can send messages to the source topic to trigger the retrieval task and check that message retrieval was created successfully.
- On the Message Retrieval page, find the topic to test and use the action available in its row for the current task status;
- In the Quick Experience Message Sending panel, send a test message.
3) Search for messages
- On the Message Retrieval page, find the target topic and click Search in its Actions column;
- In the Search panel, set the search criteria: select an item from the search item drop-down list, click Add Search Item, enter the search value in the Value column, and then click OK.
4) View the details of the message retrieval task
- After message retrieval is enabled, you can view details such as the automatically created topic and group, the Tablestore instance name, and the Tablestore data table name. You can also open the Tablestore data table directly from the details;
- On the Message Retrieval page, find the target topic and click Details in its Actions column;
- On the Task Details page, view the message retrieval information for the topic. You can also click the Tablestore link under Target Service in the Basic Information section to open the data table details page.
5) View consumption details
You can view the consumption progress of the online consumer groups subscribed to the current topic in each of its partitions. This shows how messages are being consumed and how many are accumulating.
- On the Message Retrieval page, find the topic whose consumption progress you want to view and click Consumption Progress in its Actions column;
- As shown in the figure below, the Consumption Details page shows the consumption of each partition of the topic.
While message retrieval is running, you can also suspend, enable, and delete message retrieval tasks.
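Message accumulation of the kind shown on the consumption details page is commonly measured as consumer lag. For each partition, lag is the log end offset minus the group's committed offset. The following minimal sketch uses a hypothetical helper, `partition_lag`, which takes both values as plain dictionaries instead of fetching them from a broker:

```python
from typing import Dict


def partition_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Per-partition backlog: messages written but not yet consumed by the group.

    end_offsets: partition -> log end offset (the offset the next produced message will get)
    committed:   partition -> the group's committed offset (the next offset it will read)
    A partition with no committed offset is counted from offset 0, a simplification;
    real behaviour depends on retention and the group's auto.offset.reset setting.
    """
    return {p: max(end - committed.get(p, 0), 0) for p, end in end_offsets.items()}


if __name__ == "__main__":
    lag = partition_lag({0: 1200, 1: 980}, {0: 1200, 1: 900})
    print(lag, "total backlog:", sum(lag.values()))  # -> {0: 0, 1: 80} total backlog: 80
```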
Technical Interpretation of the Kafka Retrieval Component
Previously, Message Queue for Apache Kafka supported only two ways to look up messages: by consumer offset or by creation time. Kafka itself cannot support searching messages by keyword well.
To solve this, Kafka is combined with Tablestore. A connector imports Kafka messages into a Tablestore data table, and Tablestore's capabilities are then used for keyword retrieval.
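The two kinds of lookup behave differently, and that difference explains why an external index is needed. Time-based search can exploit ordering: Kafka's Java consumer exposes it as `offsetsForTimes`, which returns the earliest offset whose timestamp is at or after a given time. Keyword search has no ordering to exploit. This sketch contrasts the two on an in-memory partition. The function names are hypothetical, and the sketch assumes timestamps never decrease within the partition, which real Kafka logs do not guarantee for producer-assigned CreateTime timestamps:

```python
import bisect
from typing import List, Optional


def offset_for_time(timestamps_ms: List[int], base_offset: int, target_ms: int) -> Optional[int]:
    """Earliest offset whose timestamp >= target_ms, or None if there is none.

    Assumes timestamps are non-decreasing, so a binary search replaces a full scan.
    """
    i = bisect.bisect_left(timestamps_ms, target_ms)
    return base_offset + i if i < len(timestamps_ms) else None


def offsets_with_keyword(values: List[str], base_offset: int, keyword: str) -> List[int]:
    """Keyword lookup without an index: every message must be read and checked."""
    return [base_offset + i for i, v in enumerate(values) if keyword in v]
```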
The key technologies are explained below:
Kafka Connect
The core purpose of Kafka Connect is to synchronize heterogeneous data. The approach is to place a message middleware layer between the data sources, so that all data is stored and distributed through the middleware.
There are two advantages to doing this:
1) The message middleware provides asynchronous decoupling: every system communicates only with the middleware;
2) The number of parsing tools to develop drops from the original n squared to a linear 2n.
Kafka Connect links the messaging system with the data sources. Depending on the direction of data flow, connectors are divided into Source Connectors and Sink Connectors.
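The reduction in point 2) can be made concrete with a short count. The count assumes one dedicated converter per ordered pair of systems in the point-to-point case, and one Source Connector plus one Sink Connector per system when Kafka sits in the middle. The article itself gives only the n-squared and 2n figures:

```latex
\begin{aligned}
\text{point-to-point:}\quad & n(n-1) \approx n^2 \\
\text{through Kafka:}\quad & \underbrace{n}_{\text{Source}} + \underbrace{n}_{\text{Sink}} = 2n \\
n = 10:\quad & 10 \times 9 = 90 \ \text{converters vs. } 2 \times 10 = 20 \ \text{connectors}
\end{aligned}
```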
The principle is simple. A Source Connector parses the source data, converts it into messages in a standard format, and sends them to the Kafka broker through a Kafka producer. Likewise, a Sink Connector consumes the corresponding topic through a Kafka consumer and delivers the data to the target system. Throughout this process, Kafka Connect handles task scheduling, interaction with the messaging system, automatic scaling, fault tolerance, and monitoring in one place, which greatly reduces duplicated work.
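The flow from source through Kafka to sink can be simulated with a few in-memory classes. This is a toy model, not the Kafka Connect API. `Topic`, `SourceConnector`, and `SinkConnector` are hypothetical names, the topic has a single partition, and JSON stands in for the "standard format". The sink tracks its own committed offset, so a second poll delivers only new messages:

```python
import json
from collections import deque


class Topic:
    """Hypothetical single-partition topic: an append-only log with offsets."""

    def __init__(self):
        self.log = []

    def produce(self, message: str) -> int:
        self.log.append(message)
        return len(self.log) - 1  # offset assigned to the message


class SourceConnector:
    """Parses rows from a source system into standard JSON messages and produces them."""

    def __init__(self, rows, topic: Topic):
        self.pending = deque(rows)
        self.topic = topic

    def poll_and_send(self):
        while self.pending:
            row = self.pending.popleft()
            self.topic.produce(json.dumps(row, sort_keys=True))


class SinkConnector:
    """Consumes the topic from its committed position and delivers records to a target."""

    def __init__(self, topic: Topic, target: list):
        self.topic = topic
        self.target = target
        self.committed = 0  # next offset to read, like a consumer group's committed offset

    def poll_and_deliver(self):
        while self.committed < len(self.topic.log):
            self.target.append(json.loads(self.topic.log[self.committed]))
            self.committed += 1
```

Everything Kafka Connect adds on top of this loop, such as task scheduling, scaling, fault tolerance, and monitoring, is deliberately left out.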
Message Queue for Apache Kafka provides a fully managed, O&M-free Kafka Connect for synchronizing data between Message Queue for Apache Kafka and other Alibaba Cloud services. As shown in the figure below, it supports mainstream connectors such as the Tablestore connector, MySQL Source Connector, OSS Sink Connector, MaxCompute Sink Connector, and FC Sink Connector. To synchronize data with these connectors, users only need to make a few settings in the graphical interface of the console, and can then start a connector task with one click.
Tablestore
Tablestore is a structured data storage service for massive data, built on Alibaba Cloud's Apsara distributed operating system. It uses the Apsara Pangu distributed file system as its storage foundation. With a storage-compute separation architecture and an elastic shared resource pool, it is a cloud-native serverless storage product. Its built-in distributed indexing system automatically scales the compute resources used for index building according to write traffic, so it can sustain extremely high write throughput. Its index structures are also optimized for faster fuzzy queries. Key capabilities such as storage-compute separation and high-throughput real-time indexing let Tablestore ingest and efficiently search the massive data in Kafka, so the required information can be retrieved quickly.
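The article does not describe Tablestore's index internals. As a generic illustration of why an index turns keyword lookup from a full scan into a few set operations, here is a minimal inverted index. `InvertedIndex` and `tokenize` are hypothetical names. The index matches whole tokens only, so it does not model the fuzzy queries mentioned above:

```python
import re
from collections import defaultdict


def tokenize(text: str):
    """Lowercase alphanumeric tokens; a deliberately simple analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each token to the set of row keys whose text contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, row_key, text: str):
        for token in tokenize(text):
            self.postings[token].add(row_key)

    def search(self, query: str):
        """Row keys containing every query token (AND semantics)."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        # Copy the first posting set so intersections never mutate the index.
        result = set(self.postings.get(tokens[0], set()))
        for t in tokens[1:]:
            result &= self.postings.get(t, set())
        return result
```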
Technology leadership
Kafka + Kafka Connect + Tablestore is a cloud-native data application solution. Kafka Connect acts as the trigger for real-time processing tasks: it receives new data sent to the message queue cluster in real time and forwards it to Tablestore.
As part of the downstream data flow, Kafka Connect keeps the data real-time. It also handles task scheduling, interaction with the messaging system, automatic scaling, fault tolerance, and monitoring, which greatly reduces duplicated work. Once the data arrives in Tablestore, its distributed storage and powerful indexing engine support PB-scale storage, tens of millions of TPS, and millisecond-level latency. The result is a fully managed, highly elastic, interactive retrieval component that can search the content of trillions of messages with second-level response times.
User-oriented value and advantages
Beyond its technical strengths, the Kafka retrieval component also makes users' day-to-day work easier:
1. Low investigation cost
A simple console configuration is enough to view all messages in the Kafka cluster;
2. Fast investigation
No development, resource evaluation, deployment, or O&M is required; once the search criteria are set, queries return within seconds;
3. High investigation accuracy
The retrieval component was jointly developed by the messaging commercialization team and the core R&D team of Tablestore. Because it is built on Alibaba Cloud's native capabilities, retrieval accuracy is high, and reliability and availability are well guaranteed.
In summary, the Kafka retrieval component offers the following advantages in real business:
- Locate problems quickly, recover fast from faults and anomalies in the products upstream and downstream of the message flow, and reduce business financial losses;
- Save enterprise costs by reducing the O&M, R&D, and other staff effort required;
- Lower the learning cost and the depth of understanding of messaging-product internals that users need.
Summary
The "retrieval component" of Alibaba Cloud Message Queue for Apache Kafka is the first component in the message queue field to support interactive message content retrieval. It requires no development or O&M and is highly flexible. For moderate and heavy users in the messaging field, it is a powerful tool for routinely checking that messages exist and are correct.
For details, see the relevant product documentation.