Introduction: Alibaba Cloud Elasticsearch (ES) has released TimeStream, a time series enhancement for its full-observability engine. Built on the fully managed cloud-native ELK service, the TimeStream plug-in provides high-performance, low-cost storage, query, and analysis of time series data. This article introduces TimeStream's applicable scenarios, functional advantages, performance test results, and practical cases.
The full-observability capability of Elasticsearch
As the topology of enterprise IT systems grows more complex, system architecture has moved from monolithic to distributed to microservices, deployment has moved from physical servers to virtualization to containers, and, with infrastructure migrating to the cloud, development has moved from the traditional waterfall model to DevOps, which combines development with operations. Behind the many data sources in these complex system links lie different data types and the extremely high cost of collecting, processing, storing, and maintaining massive volumes of unstructured data in a unified way. Beyond traditional SRE operations scenarios, enterprises have also developed applications for real-time analysis, security auditing, user behavior, operational growth, and transaction records. For a given business component or system, the data generated by different solutions is hard to connect, so its full value cannot be realized.
As a result, enterprises are paying more and more attention to building system observability, and urgently need to store, monitor, retrieve, and analyze all types of data on a unified platform. The industry recognizes logs, metrics, and traces as the three pillars of full observability. In operations scenarios, a unified observability system helps operations staff understand the system's running state before an incident, locate faults quickly during an incident, and perform root cause analysis after an incident, which improves availability, reduces costs, and increases efficiency. However, as full-observability technology evolves, logs and time series data must be observed across clouds and business systems, yet each data scenario, such as logs or time series, is served by many single-purpose tools. Connecting those tools is difficult and costly, and maintaining the resulting platform is expensive.
Observability is one of Elastic's three core solutions. Building on the full-observability capability of Elasticsearch, it collects logs, metrics, uptime data, and application trace data in a unified way, stores all of this data in Elasticsearch for unified processing and analysis, and visualizes it with Kibana. This unifies the technology stack for observability scenarios, so SRE teams do not need to build an observability platform from multiple technical components.
In full-observability scenarios, Alibaba Cloud Elasticsearch continuously optimizes write performance and storage cost for massive log data, based on its cloud-native serverless log engine. Storing and processing metric time series data, however, often raises the following problems:
What is TimeStream?
TimeStream is a time series engine developed by the Alibaba Cloud Elasticsearch team that incorporates features of the Elastic community's time series products. On top of the fully managed cloud-native ELK service, the TimeStream time series enhancement plug-in delivers high-performance, low-cost storage, query, and analysis of time series data.
Advantages of Alibaba Cloud ES TimeStream
As the core technology for Alibaba Cloud ES time series scenarios, deeply integrated with the Alibaba kernel, TimeStream substantially improves the cost, performance, and ease of use of those scenarios:
- Data management efficiency improvement: TimeStream provides a time series data model with create, read, update, and delete operations and integrates Elasticsearch best-practice templates for time series scenarios, which greatly lowers the barrier to managing time series metric data in Elasticsearch.
- Improved query experience: Supports querying Elasticsearch data with PromQL, connects seamlessly to Prometheus and Grafana, and supports downsampled (DownSample) queries and DataStream time partitioning.
- Storage cost optimization: Through data compression optimization and metadata storage optimization, a TimeStream index uses more than 80% less storage than an ordinary open-source Elasticsearch index.
- Improved read and write performance: Writes to a TimeStream index are nearly 40% faster than writes to an ordinary open-source Elasticsearch index. For common query and analysis workloads on time series data, performance is 5 times that of open-source Elasticsearch.
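To make the downsampling idea concrete, here is a minimal Python sketch of what a downsampled query computes: raw points are grouped into fixed time windows and each window is reduced to a single value. This is only an illustration of the concept, not TimeStream's implementation or API; the function name `downsample`, the `(timestamp, value)` tuple layout, and the sample data are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, interval_s, agg=mean):
    """Group (timestamp_s, value) points into fixed windows and aggregate each window.

    Hypothetical helper for illustration; timestamps are integer seconds.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each point to the start of the window it falls into.
        bucket_start = ts - (ts % interval_s)
        buckets[bucket_start].append(value)
    # Return one aggregated point per window, ordered by window start.
    return [(start, agg(values)) for start, values in sorted(buckets.items())]


# Made-up sample: five raw points spread over two 60-second windows.
raw = [(0, 1.0), (15, 3.0), (30, 5.0), (60, 2.0), (75, 4.0)]
averaged = downsample(raw, 60)
peaks = downsample(raw, 60, agg=max)
print(averaged)
print(peaks)
```

Choosing the aggregation function (mean, max, and so on) per query is a design decision; a wider interval returns fewer points at the cost of detail.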
Comparison with open-source Elasticsearch
For time series scenarios, the following table compares Elasticsearch with and without the TimeStream plug-in in terms of scenario configuration, storage, and queries:
Comparison item | Using TimeStream | Not using TimeStream |
Scenario configuration | The TimeStream engine natively supports the time series data model: it automatically generates _tsid, optimizes index sorting, and so on | Users must apply many best practices for metric scenarios themselves, such as generating a timeline ID field, configuring index sorting on the timeline ID and time, and routing by the timeline ID |
Storage | The ali-codec plugin supports generating _source from doc_values; storing _id can be disabled; ali-codec compression is optimized for time series scenarios | In time series scenarios, metadata fields such as _id and _source occupy more than 70% of storage; doc values compress the double type poorly, so although time series data is highly similar, double data is barely compressed |
Query statement | Supports PromQL queries | Query DSL must be written specifically to query metric data |
Downsampling | Downsampling is supported by simply configuring the time interval | Downsampling must be done on the user side |
Time partitioning | Partitioned by the actual data time, so the data for a given time range lands in a specific index | Partitioned in write order, so the data for a given time range may be spread across many indexes |
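The manual steps listed in the "Not using TimeStream" column (a timeline ID field, routing by that ID, and sorting by ID and time) can be sketched in plain Python as follows. This is a conceptual sketch under stated assumptions, not how TimeStream or Elasticsearch computes `_tsid` internally: the helper names `make_tsid` and `shard_for`, the SHA-256 hash, the 16-character truncation, and the sample documents are all hypothetical.

```python
import hashlib


def make_tsid(dimensions):
    """Derive a stable timeline ID from a dict of dimension labels.

    Keys are sorted so the same label set always yields the same ID,
    regardless of the order in which the labels were supplied.
    """
    canonical = "|".join(f"{key}={dimensions[key]}" for key in sorted(dimensions))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def shard_for(tsid, num_shards):
    """Route by timeline ID so all points of one series land on the same shard."""
    return int(tsid, 16) % num_shards


# Made-up metric documents; the first and third belong to the same series.
docs = [
    {"labels": {"host": "a", "metric": "cpu"}, "@timestamp": 20, "value": 0.7},
    {"labels": {"host": "b", "metric": "cpu"}, "@timestamp": 10, "value": 0.2},
    {"labels": {"metric": "cpu", "host": "a"}, "@timestamp": 10, "value": 0.5},
]
for doc in docs:
    doc["tsid"] = make_tsid(doc["labels"])

# Sorting by (timeline ID, time) keeps each series contiguous and time-ordered,
# which is the layout that index sorting on those two fields aims for.
ordered = sorted(docs, key=lambda d: (d["tsid"], d["@timestamp"]))
for doc in ordered:
    print(doc["tsid"], shard_for(doc["tsid"], 4), doc["@timestamp"], doc["value"])
```

Keeping points of the same series adjacent places similar values next to each other, which is the property the storage row of the table relies on for better compression.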