Author
Lv Yalin joined Job Helper in 2019 and heads the architecture R&D group. At Job Helper he has led the evolution toward a cloud-native architecture, driving the adoption of containerization, service governance, a Go microservice framework, and DevOps practices.
Mo Renpeng joined Job Helper in 2020 and is a senior architect there. He has driven the evolution of Job Helper's cloud-native architecture and is responsible for the design and implementation of its service governance system, the construction of its service-awareness system, and the development of the in-house mesh and MQ proxy.
Summary
Logs are the primary means of observing a service. We rely on logs to perceive a service's current and historical running state; when errors occur, we rely on logs to reconstruct the scene and locate the problem. Logs are therefore critical to R&D engineers. At the same time, with the popularity of microservices, service deployment has become more and more decentralized, so we need a log service to collect, transmit, and retrieve logs.
Against this background, open source log services represented by the ELK stack were born.
Requirements
In our scenario:
- Peak write pressure is high: tens of millions of log entries per second.
- Real-time requirements are strict: a log must be searchable within 1s of collection (3s during peaks).
- Cost pressure is enormous: logs must be retained and remain queryable for half a year, at roughly 100 PB scale.
Shortcomings of Elasticsearch
The core of the ELK stack is Elasticsearch, which stores and indexes logs and serves queries. Elasticsearch is a search engine: under the hood it relies on Lucene's inverted index for retrieval, and its shard design breaks through the storage-capacity and processing limits of a single machine.
Write performance
To make the latest logs searchable, Elasticsearch must update the inverted index of every indexed field as data is written. Write performance can be improved by batching commits, delaying indexing, reducing refresh frequency, and so on, but indexes still have to be built. Under huge log traffic (20 GB of data, i.e. tens of millions of log entries, per second) the bottleneck is obvious, far from our goal of near-real-time writes.
Operating costs
Elasticsearch must regularly maintain its indexes, data shards, and retrieval caches, which consumes a great deal of CPU and memory. Log data lives on machine disks, and storing a large volume of logs for a long time demands enormous disk capacity; the data expansion that comes with indexing drives the cost up further.
Poor support for unformatted logs
ELK must parse a log line before it can index its fields, so unformatted logs need extra processing logic to adapt. Many business logs are not standardized, and it is hard to force them to converge on one format.
Summary: log retrieval is a write-heavy, read-light scenario. In such a scenario, maintaining a large and complex index is, in our view, very poor value for money. Had we adopted Elasticsearch, we estimate we would need a cluster of tens of thousands of cores, with write and retrieval efficiency still not guaranteed and resources badly wasted.
Log retrieval design
Facing this situation, let us look at the log retrieval scenario from a different angle and address it with a more suitable design. The new design comes down to the following three points:
Log chunking
As before, we collect the logs, but when processing them we neither parse nor index the raw text. Instead, we chunk the logs by metadata such as log time, instance, log type, and log level. The retrieval system therefore imposes no requirements on log format, and because there is no parsing and no indexing (the expensive part), write speed can approach its limit: it depends only on disk I/O.
In simple terms, logs of the same type produced by one instance are written to a file in chronological order, and the file is split along the time dimension. The resulting log chunks are scattered across multiple machines (we generally shard chunks across storage machines by dimensions such as instance, size, and type), which lets us process chunks concurrently on many machines. The scheme scales horizontally: if one machine's processing power is not enough, simply add machines.
How do we search the data inside a log chunk? Very simply: because the raw log text is preserved, grep-style commands can be run directly against the chunk. grep is the tool developers know best, and it is flexible enough to cover their various retrieval needs. And because writes are pure appends to the chunk, there is no index to wait on: the moment a log line lands in a chunk it is searchable, which keeps retrieval real-time.
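To make this concrete, here is a minimal Go sketch of the write and retrieval path under this design; all paths, names, and the hourly split are illustrative assumptions, not the actual implementation. The chunk file is derived purely from log metadata, a write is a plain append, and retrieval is a grep-style sequential scan:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// chunkPath derives the chunk file purely from log metadata (instance, type,
// hour); the log body is never parsed or indexed.
func chunkPath(base, service, instance, logType string, t time.Time) string {
	return filepath.Join(base, service, instance, logType, t.Format("2006010215")+".log")
}

// appendLog is a plain O_APPEND write: a line is searchable as soon as it
// lands in the chunk, since there is no index to build.
func appendLog(path, line string) error {
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteString(line + "\n")
	return err
}

// grepChunk is the retrieval side: a sequential scan over the raw chunk,
// equivalent to running `grep pattern chunkfile`.
func grepChunk(path, pattern string) ([]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var hits []string
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if strings.Contains(sc.Text(), pattern) {
			hits = append(hits, sc.Text())
		}
	}
	return hits, sc.Err()
}

func main() {
	base := filepath.Join(os.TempDir(), "logchunks")
	p := chunkPath(base, "order-svc", "pod-1", "error", time.Now())
	_ = appendLog(p, "2021-01-01T00:00:00Z ERROR mysql timeout logid=abc123")
	hits, _ := grepChunk(p, "logid=abc123")
	fmt.Println(hits)
}
```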
Metadata Index
Next, let's see how to retrieve data from such a large number of log chunks.
First, when a log chunk is created, we build an index entry from its metadata, such as service name, log time, instance, and log type, with the chunk's storage location as the value. With chunk metadata indexed, when we need to search a certain type of log of a certain service within a certain time range, we can quickly find the locations of the chunks involved and process them concurrently.
The index structure can be built on demand: put whatever metadata you care about into it, so that the required chunks can be delineated quickly. Because we index only chunk metadata rather than every log line, the cost is extremely low compared with indexing all logs, and the speed of pinning down the chunks is also ideal.
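Here is a minimal sketch of such a metadata index, with an in-memory structure standing in for the real store and field names that are assumptions: the index answers "which chunks cover this service, log type, and time range" without ever touching log contents.

```go
package main

import (
	"fmt"
	"time"
)

// ChunkMeta is the only thing that gets indexed: metadata about a chunk,
// never the log contents themselves (field names are assumptions).
type ChunkMeta struct {
	Service  string
	Instance string
	LogType  string
	Start    time.Time // first log timestamp covered by the chunk
	End      time.Time // last log timestamp covered by the chunk
	Location string    // e.g. "local:host-3:/data/.../xx.log" or "s3://bucket/key"
}

// ChunkIndex is a simplified in-memory stand-in for the real metadata store;
// it finds the chunks overlapping a queried time window.
type ChunkIndex struct {
	chunks []ChunkMeta
}

func (ix *ChunkIndex) Add(m ChunkMeta) { ix.chunks = append(ix.chunks, m) }

func (ix *ChunkIndex) Lookup(service, logType string, from, to time.Time) []ChunkMeta {
	var out []ChunkMeta
	for _, m := range ix.chunks {
		if m.Service == service && m.LogType == logType &&
			m.End.After(from) && m.Start.Before(to) {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	ix := &ChunkIndex{}
	ix.Add(ChunkMeta{
		Service: "order-svc", Instance: "pod-1", LogType: "error",
		Start:    time.Date(2021, 1, 1, 10, 0, 0, 0, time.UTC),
		End:      time.Date(2021, 1, 1, 11, 0, 0, 0, time.UTC),
		Location: "local:host-3:/data/logs/order-svc/pod-1/error/2021010110.log",
	})
	from := time.Date(2021, 1, 1, 10, 30, 0, 0, time.UTC)
	fmt.Println(ix.Lookup("order-svc", "error", from, from.Add(time.Hour)))
}
```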
Log life cycle and data tiering
Log data can be understood as time-series data: the closer a log is to the present, the more valuable it is and the more likely it is to be queried, which produces a natural hot/cold split. But cold data is not worthless; developers do request log data from months ago. In other words, a log must remain queryable throughout its life cycle.
In that case, keeping all chunks of the whole life cycle on local disk would place an enormous demand on machine capacity. We meet this storage requirement with compression and tiering.
Simply put, chunk storage is divided into three tiers: local storage (disk), remote storage (object storage), and archive storage. Local storage serves real-time and short-term queries (a day or a few hours); remote storage serves queries within a moderate window (a week or a few weeks); archive storage serves queries over the rest of the log's life cycle.
Now let's look at how a chunk flows between the tiers during its life cycle. A chunk is first created on local disk and the corresponding log data written to it. Once complete, it remains on local disk for a retention period (which depends on disk pressure), after which it is compressed and uploaded to remote storage (usually the standard storage class of an object store); after a further period, it is migrated to archive storage (generally the archive storage class of an object store).
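Here is a small sketch of that tiering policy in Go; the retention thresholds are illustrative assumptions (as noted above, the local window in practice also shrinks under disk pressure):

```go
package main

import (
	"fmt"
	"time"
)

// Tier enumerates the storage levels a chunk moves through in its lifetime.
type Tier int

const (
	TierLocal   Tier = iota // NVMe/SSD disk: real-time and short-term queries
	TierRemote              // object storage, standard class: recent weeks
	TierArchive             // object storage, archive class: rest of the life cycle
)

func (t Tier) String() string { return [...]string{"local", "remote", "archive"}[t] }

// Illustrative retention thresholds; real values vary with disk pressure.
const (
	localRetention  = 24 * time.Hour
	remoteRetention = 14 * 24 * time.Hour
)

// tierFor decides where a chunk of a given age should live.
func tierFor(age time.Duration) Tier {
	switch {
	case age < localRetention:
		return TierLocal
	case age < remoteRetention:
		return TierRemote
	default:
		return TierArchive
	}
}

func main() {
	for _, age := range []time.Duration{2 * time.Hour, 72 * time.Hour, 30 * 24 * time.Hour} {
		fmt.Println(age, "->", tierFor(age))
	}
}
```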
What are the benefits of this storage design? In the multi-level storage model, the lower the tier, the larger the data volume and the cheaper the medium: each tier costs roughly 1/3 of the one above, and everything below local storage is stored compressed, where log data typically compresses at about 10:1. Multiplying this out (1/3 × 1/3 across the two tier steps, times 1/10 from compression) means a log on archive storage costs roughly 1% of what it would cost to keep on local disk; if the local storage is SSD, the gap is even larger.
Price reference:
| Storage medium | Reference link |
| --- | --- |
| Local disk | https://buy.cloud.tencent.com/price/cvm?regionId=8&zoneId=800002 |
| Object storage | https://buy.cloud.tencent.com/price/cos |
| Archive storage | https://buy.cloud.tencent.com/price/cos |
How does retrieval work across the tiers? For chunks on local storage it is simple: the search runs directly against the local disk.
If the retrieval involves chunks on remote storage, the retrieval service downloads those chunks to local storage, then decompresses and searches them locally. Thanks to the chunked design, downloads, like searches, can proceed in parallel across machines. Chunks downloaded to the local cache are kept for a validity period before deletion, so repeated searches over the same chunk within that period are served locally without pulling it again (searching the same log data several times is common in log retrieval scenarios).
For archive storage, a restore operation on the archived chunks must be initiated before the search; it generally takes a few minutes, after which the chunks have been staged back to remote storage and the data flow is the same as above. That is, to search cold data a developer applies for an archive restore in advance, and once it completes, searches run at hot-data speed.
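Sketched in Go, the per-chunk routing across the three tiers might look as follows; the ObjectStore interface and its two methods are assumptions standing in for real object-storage calls, not an actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// Location is where a chunk currently lives, as recorded in the chunk index.
type Location struct {
	Tier      string // "local", "remote", or "archive"
	LocalPath string
	ObjectKey string
}

// ObjectStore abstracts the remote/archive backend; both methods are
// assumptions standing in for real S3-style operations.
type ObjectStore interface {
	Download(key, dst string) error // remote tier -> local cache
	Restore(key string) error       // archive tier -> remote tier, takes minutes
}

var errRestoring = errors.New("chunk is being restored from archive; retry later")

// locateForQuery routes one chunk to a locally searchable path, pulling it
// down through the tiers as needed.
func locateForQuery(store ObjectStore, loc *Location, cacheDir string) (string, error) {
	switch loc.Tier {
	case "local":
		return loc.LocalPath, nil
	case "remote":
		dst := cacheDir + "/" + loc.ObjectKey
		if err := store.Download(loc.ObjectKey, dst); err != nil {
			return "", err
		}
		// Update the chunk index so repeated queries for the same chunk hit
		// the local cache while the cached copy is still valid.
		loc.Tier, loc.LocalPath = "local", dst
		return dst, nil
	case "archive":
		// Kick off the minutes-long restore; the caller retries once the
		// chunk has been staged back to remote storage.
		if err := store.Restore(loc.ObjectKey); err != nil {
			return "", err
		}
		return "", errRestoring
	default:
		return "", fmt.Errorf("unknown tier %q", loc.Tier)
	}
}

// fakeStore lets the sketch run without a real object-storage backend.
type fakeStore struct{}

func (fakeStore) Download(key, dst string) error { return nil }
func (fakeStore) Restore(key string) error       { return nil }

func main() {
	loc := &Location{Tier: "remote", ObjectKey: "order-svc/2021010110.log.zst"}
	path, err := locateForQuery(fakeStore{}, loc, "/data/cache")
	fmt.Println(path, err, loc.Tier)
}
```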
Search service architecture
After understanding the above design ideas, let's take a look at how the log retrieval service based on this design is implemented.
The log retrieval service is divided into the following modules:
- GD-Search
The query scheduler, responsible for receiving query requests, parsing and optimizing query statements, and obtaining from Chunk Index the addresses of the log chunks in the query range.
GD-Search itself is stateless; multiple instances can be deployed behind a load balancer to provide a single external access address.
- Local-Search
The local storage searcher, responsible for handling queries that GD-Search assigns against local log chunks.
- Remote-Search
The remote storage searcher, responsible for handling queries that GD-Search assigns against remote log chunks.
Remote-Search first pulls the required chunks from remote storage to local disk and decompresses them, then runs the same local search as Local-Search. It also updates the chunk's local storage address in Chunk Index so that subsequent queries for the same chunk are routed to local storage.
- Log-Manager
The local storage manager, responsible for maintaining the life cycle of log chunks on local storage.
Log-Manager periodically scans the chunks on local storage; when a chunk exceeds its retention period, or local disk usage reaches its threshold, chunks are evicted according to policy (compressed with ZSTD and uploaded to remote storage) and their storage information in Chunk Index is updated.
- Log-Ingester
The log ingestion module, responsible for subscribing to log data from Kafka, splitting it by time and metadata dimensions, and writing it into the corresponding log chunks. Whenever it creates a new chunk, Log-Ingester writes the chunk's metadata to Chunk Index, ensuring the latest chunks are searchable in real time.
- Chunk Index
The log chunk metadata store, holding each chunk's metadata and storage information. We currently use Redis as the storage medium: our metadata index is not complicated, so Redis meets our indexing needs, and its in-memory query speed satisfies the need to pin down the relevant chunks quickly. A sketch of one possible layout follows this list.
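As one possible layout (an assumption, sketched with the go-redis v9 client against a local Redis instance): a sorted set per (service, log type) scored by chunk start time makes the time-range lookup a single ZRANGEBYSCORE, and a hash per chunk holds the storage information that Log-Manager and Remote-Search update as the chunk migrates. Key names are illustrative.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// One sorted set per (service, log type): member = chunk ID,
	// score = chunk start time, so a time-range query is one ZRANGEBYSCORE.
	key := "chunks:order-svc:error"
	start := time.Date(2021, 1, 1, 10, 0, 0, 0, time.UTC)
	rdb.ZAdd(ctx, key, redis.Z{
		Score:  float64(start.Unix()),
		Member: "chunk-00042",
	})

	// A hash per chunk holds its storage information; Log-Manager and
	// Remote-Search update "tier"/"location" as the chunk migrates.
	rdb.HSet(ctx, "chunk:chunk-00042",
		"tier", "local",
		"location", "host-3:/data/logs/order-svc/error/2021010110.log",
	)

	// GD-Search side: find chunk IDs overlapping the queried time window.
	ids, err := rdb.ZRangeByScore(ctx, key, &redis.ZRangeBy{
		Min: fmt.Sprint(start.Add(-time.Hour).Unix()),
		Max: fmt.Sprint(start.Add(time.Hour).Unix()),
	}).Result()
	fmt.Println(ids, err)
}
```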
Search strategy
In designing the search strategy, we want results to come back as fast as possible while keeping oversized query requests out of the system.
We believe that log retrieval generally has the following three scenarios:
- View the latest service log.
- View the logs of a specific request, queried by logid.
- View certain types of logs, such as errors from accessing MySQL or logs of requests to downstream services.
In most scenarios the user does not need every matching log; a subset is enough to deal with the problem. So the user can set a limit on the query, and once the results reach that limit, the retrieval service terminates the current query and returns the results to the front end.
In addition, when GD-Search initiates chunk searches, it first estimates the total size of the chunks involved and rejects oversized requests beyond the limit (the user can retry with a narrower time range or a more selective query statement). A sketch of the limit-driven early termination follows.
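Here is that early termination sketched in Go (scanChunk is a placeholder for the real per-chunk scan): chunk scans run concurrently, and cancelling the shared context stops the remaining workers the moment the limit is reached.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// searchChunks fans a query out over chunks concurrently and cancels the
// whole search as soon as `limit` matching lines have been collected.
func searchChunks(ctx context.Context, chunks []string, pattern string, limit int) []string {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // reaching the limit (or returning early) stops all workers

	lines := make(chan string)
	var wg sync.WaitGroup
	for _, c := range chunks {
		wg.Add(1)
		go func(chunk string) {
			defer wg.Done()
			for _, ln := range scanChunk(chunk, pattern) { // stand-in for the grep-style scan
				select {
				case lines <- ln:
				case <-ctx.Done(): // limit reached elsewhere: stop early
					return
				}
			}
		}(c)
	}
	go func() { wg.Wait(); close(lines) }()

	var out []string
	for ln := range lines {
		out = append(out, ln)
		if len(out) >= limit {
			break
		}
	}
	return out
}

// scanChunk is a placeholder for the real per-chunk scan.
func scanChunk(chunk, pattern string) []string {
	return []string{chunk + ": ... " + pattern + " ..."}
}

func main() {
	chunks := []string{"chunk-1", "chunk-2", "chunk-3"}
	fmt.Println(searchChunks(context.Background(), chunks, "logid=abc123", 2))
}
```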
Performance at a glance
Testing uses 1 KB log entries and roughly 10,000 log chunks in total; local storage is NVMe SSD, remote storage is S3-protocol standard storage.
- Write
A single core sustains a write rate of 20,000 entries/s, and a rate of 10,000 entries/s occupies about 1-2 GB of memory. Writing scales out across machines with no upper limit.
- Query (full-text search)
A query over 1 TB of log data on local storage completes within 3s.
A query over 1 TB of log data on remote storage takes about 10s.
Cost advantage
With tens of millions of writes per second and hundreds of petabytes of storage, a dozen or so physical servers suffice to support log writing and querying: hot data sits on local NVMe disks, warm data in object storage, and the great bulk of log data in archive storage.
Compute comparison
Because no index needs to be built, about a thousand cores are enough to sustain writes. And since log retrieval is write-heavy and read-light, thousands of cores can sustain queries at the hundred-QPS level.
At this scale, Elasticsearch would need tens of thousands of cores to cope with the write load and query bottlenecks, and even then write and query efficiency could not be guaranteed.
Storage comparison
The core idea is to use a cheaper storage medium (archive storage vs. local disk) and store less data (10:1 compression vs. the data expansion of indexing) while still meeting business needs. The gap can reach two orders of magnitude.