CDS-SLS, a cloud-based log platform, combines components with high cohesion and low coupling. Offline users can automatically deploy all of its functions on as few as six machines, and its low-code mode effectively solves the pain points of traditional software in big data scenarios such as operation and maintenance, operations, finance, data analysis, and reporting. This article provides an overview of each function of CDS-SLS.

Preface

This article, the second in a series about the upcoming hybrid cloud product CDS-SLS (Cloud Defined Storage - Simple Log Service), gives an overview of the functions of CDS-SLS. As a digital carrier, logs record the operation of background programs, business activity, and more.

Log query and analysis has evolved from the earliest manual pssh+grep across a few machines, to per-business small-scale ELK or EFK stacks (Elasticsearch, Logstash/Filebeat/Fluentd, Kibana), to adding Kafka at larger data volumes. Collecting metrics requires adding Collectd; visualization requires adding Grafana; storage and backup introduce Ceph; and consistency management of the basic configuration introduces Salt. Operation and maintenance costs rise rapidly.

As a cloud-based log platform, CDS-SLS integrates these components with high cohesion and low coupling. Offline users can automatically deploy all of the above functions on as few as six machines, using a low-code mode in big data scenarios such as operation and maintenance, operations, financial management, data analysis, and reporting, effectively solving the pain points of traditional software.

Term & Background

CDS

CDS (Cloud Defined Storage) is an output form of Software Defined Storage (SDS). CDS shares a unified storage architecture and user experience with the public cloud, reduces the footprint of the base, provides flexible deployment scales and deployment forms, integrates multiple storage products, and provides enterprise-level storage operation, maintenance, and management.

CDS supports mixed combinations of various storage products, such as CDS-OSS + CDS-SLS and CDS-EBS + CDS-SLS. In terms of products, there will be two output forms: an agile version (SLS requires a minimum of six machines, with a more streamlined four-machine version planned) and an enterprise version (SLS from six machines up to hundreds). On the one hand, CDS improves the product competitiveness and maturity of the Proprietary Cloud Enterprise Edition and Agile Edition; on the other hand, it enables access, backup, and analysis of data from environments such as the device side, the edge, and customer data centers.

SLS

SLS (Simple Log Service) is Alibaba Cloud's log service. SLS originated from the Shennong monitoring service in Alibaba Cloud's early Feitian base and has since developed into a cloud-native, observability-oriented *Ops (DevOps, SecOps, FinOps) solution integrating collection, query, analysis, and visualization.

Overview of the main functions of SLS

Log data in SLS is append-only and write-heavy with few reads; it is time-sensitive but does not require strict order preservation, and query frequency and data popularity decay rapidly over time. The CDS-SLS version is inherited from SLS on Alibaba Cloud. SLS has supported Alibaba's Double Eleven/Double Twelve events for many years, along with major events such as Chinese New Year red envelopes and anniversary promotions, and has been fully verified in terms of stability, functionality, and performance.

This article examines the various functions of SLS from the perspective of operation and maintenance. The main link of SLS covers data collection, query and analysis, visualization, and intelligent applications. As a product that places equal emphasis on computing and storage, in order to further reduce hardware costs for offline users, some non-universal functions are trimmed, and the computing and storage resources of the hardware itself are used to the fullest.

(figure: SLS functions from the perspective of public cloud users)

The figure above shows SLS functions from the perspective of public cloud users. Offline users of CDS-SLS can see the sub-modules corresponding to the SLS service on the space-based platform, as well as the full CPU and memory usage of each process. From a service perspective, services are divided into two categories, data and scheduling: the former comprises 34 service roles and the latter 10. After this split, upgrading and expanding services becomes easier.

SLS internal service split

sls-service-master mainly handles scheduling-related services, and each of its service roles has multiple instances to ensure high availability. The main service functions are concentrated in sls-backend-server; the general hierarchical structure is as follows:

(figure: hierarchical structure of the SLS backend services)

CDS-SLS currently defaults to the Pangu 2.0 system as the underlying distributed storage. Pangu 2.0, as the storage base of Alibaba Cloud, has the characteristics of high performance and high stability. SLS's internal business modules have also been well split into microservices, and the bottom layer is self-developed using C++ to achieve extreme performance.

Each SLS module has a large number of adjustable background parameters, but for customers' convenience the default values meet most needs. Many designs follow the classic UNIX ideas of "separation of mechanism and policy" and "do one thing and do it well".

  • For flow control, which users care about, the backend provides precise control in multiple dimensions, and the default parameters cover most scenarios.
  • The data collection agent (Logtail) has been verified at the scale of millions of machines for many years, with well-proven performance and stability. Compared with open source software, it can greatly reduce machine resource usage (by up to 90%).
  • The pipeline design of "query|analysis" implements the single-responsibility principle well: query and analysis correspond to different backend services.
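As a concrete illustration of that pipeline, an SLS statement separates the query from the analysis with `|`: the filter before the pipe is served by the query service, and the SQL after it by the analysis service. (The `status` and `method` fields and their index configuration are hypothetical here; the syntax follows the public SLS query language.)

```sql
-- filter (query service) | aggregation (analysis service)
status: 500 | SELECT method, COUNT(*) AS pv GROUP BY method ORDER BY pv DESC LIMIT 10
```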

Special design for hybrid cloud scenarios

Cluster form

There are currently two SLS-related clusters on the space-based platform:

  • The sls-common cluster in the base shares Pangu and provides basic query and analysis for the self-operation and maintenance of the various services in the Alibaba Cloud base. Storage time on base resources is limited to 7 days. In hybrid cloud scenarios where the network is isolated and unreachable, this significantly improves O&M efficiency: on-site personnel can query a few keywords to help developers quickly locate problems.
  • The CDS-SLS cluster purchased separately by the user has an exclusive Pangu. Only SLS-related processes run in this cluster, which effectively alleviates the shortage of shared resources on the base; the log TTL can therefore be set to permanent storage, and a console with a better experience is available.

Most of the functions mentioned in this article are for CDS-SLS clusters purchased separately by users.

Localized Xinchuang Support

At present, CDS-SLS will support Hygon (Haiguang), Kunpeng, Phytium (Feiteng), and other CPU architectures, with the same strict acceptance tests as on Intel x86. More test support for heterogeneous CPUs and mixed scenarios in offline deployments will follow.

HTTPS access will support SM (Chinese national cryptography) TLS channel transmission, making data access more compliant for some financial or government-enterprise customers.

Comparison and migration of open source ELK solutions

ELK background

Elasticsearch is mainly implemented on top of Lucene. In 2012, Elastic packaged the Lucene library into more usable software, and in 2015 it launched the ELK Stack (Elasticsearch, Logstash, Kibana) to solve centralized log collection, storage, and query. However, Lucene was designed for information retrieval over documents, so there are certain limitations for observability data (Log/Trace/Metric), such as scale, query capabilities, and some customized functions (such as intelligent clustering, LogReduce).

| Elasticsearch | Log Service | Description |
| --- | --- | --- |
| index | logstore | Data from multiple indexes can be migrated into one logstore. |
| type | `__tag__:_type` field in LogItem | |
| document | LogItem | Elasticsearch documents and Log Service logs correspond one-to-one. |
| mapping | logstore index | The migration tool automatically creates the index by default. |
| field datatypes | LogItem data type | See the data type mapping for the specific mapping relationship. |
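A rough sketch of the concept mapping above (illustrative only; the actual migration tool handles this internally, and the field handling and function name here are hypothetical):

```python
# Sketch of the table's mapping: one Elasticsearch document becomes one
# Log Service LogItem; logstore_index_mappings decides the target logstore.
# Illustrative only -- not the real aliyunlog implementation.
def es_hit_to_logitem(hit, logstore_index_mappings):
    contents = {k: str(v) for k, v in hit["_source"].items()}
    # The ES type is preserved as a __tag__:_type field on the LogItem.
    contents["__tag__:_type"] = hit.get("_type", "_doc")
    # Route by index name: a comma-separated index list per logstore (the real
    # tool also supports wildcards like "myindex_*"); default to the index name.
    target = next(
        (ls for ls, indexes in logstore_index_mappings.items()
         if hit["_index"] in indexes.split(",")),
        hit["_index"],
    )
    return {"logstore": target, "contents": contents}

hit = {"_index": "index1", "_type": "_doc",
       "_source": {"level": "ERROR", "msg": "disk full"}}
item = es_hit_to_logitem(hit, {"logstore2": "index1,index2"})
print(item["logstore"])  # -> logstore2
```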
Log collection agent function comparison

The mainstream open-source log collection agents are currently Logstash and Fluentd. The following compares some functions of their earlier versions with Logtail:
| Function | Logstash | Fluentd | Logtail |
| --- | --- | --- | --- |
| Log reading | Polling | Polling | Event-triggered |
| File rotation | Supported | Supported | Supported |
| Failover handling (local checkpoint) | Supported | Supported | Supported |
| Common log parsing | Grok (regex-based) | Regular expressions | Regular expressions |
| Specific log types | Delimiter, key-value, JSON, and other mainstream formats | Delimiter, key-value, JSON, and other mainstream formats | Key-value format |
| Data transmission compression | Plugin support | Plugin support | LZ4 |
| Data filtering | Supported | Supported | Supported |
| Buffered sending | Plugin support | Plugin support | Supported |
| Send-failure handling | Plugin support | Plugin support | Supported |
| Runtime environment | JRuby; requires a JVM | CRuby and C; requires Ruby | C++; no special requirements |
| Threading | Multi-threaded | Multi-threading limited by the GIL | Multi-threaded |
| Hot upgrade | Not supported | Not supported | Supported |
| Centralized configuration management | Not supported | Not supported | Supported |
| Running-status self-check | Not supported | Not supported | CPU/memory threshold protection |
Users who previously collected logs with Logstash can easily migrate to SLS; see "Logstash data source access to SLS".

Logstash supports all mainstream log types, with the richest plugin ecosystem and flexible customization, but its performance is poor and the JVM easily leads to high memory usage. Performance issues in Grok, one of its most important plugins, have caused many users to switch from ELK to EFK.

Fluentd also supports all mainstream log types and many plugins. Its performance is better than Logstash but is limited by Ruby's GIL lock, and the official documentation acknowledges this defect: "Ruby has GIL (Global Interpreter Lock), which allows only one thread to execute at a time. While I/O tasks can be multiplexed, CPU-intensive tasks will block other jobs." One of the CPU-intensive tasks in Fluentd is compression.

Logtail occupies the least CPU and memory on the machine, and the end-to-end experience combined with Alibaba Cloud Log Service is good. After several years of iteration its functions are relatively complete, especially for container scenarios, where many performance and user-experience optimizations have been made; support for DaemonSet and Sidecar modes is relatively mature.
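The GIL limitation mentioned for Fluentd can be demonstrated with a small sketch. Ruby's GIL behaves much like CPython's, so Python is used here as an analogy (an assumption for illustration, not a Fluentd benchmark):

```python
# Why CPU-bound work (e.g. compression) does not scale across threads under a
# global interpreter lock: two threads take about as long as serial execution.
import threading
import time

def cpu_work(n):
    # Pure-Python CPU-bound loop; the interpreter lock is held throughout.
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 2_000_000

start = time.perf_counter()
for _ in range(2):
    cpu_work(N)
serial = time.perf_counter() - start

results = []
threads = [threading.Thread(target=lambda: results.append(cpu_work(N)))
           for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# Only one thread executes bytecode at a time, so no speedup is observed.
print(f"serial: {serial:.2f}s, two threads: {threaded:.2f}s")
```

Logtail's C++ implementation sidesteps this class of problem entirely, which is one reason for the resource-usage gap in the table above.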
Query analysis comparison

Compared with ES, the query analysis of SLS offers:

  • Low threshold: complete SQL92 standard and JDBC protocol, with Join support
  • High performance: query latency lower than ES
  • Intelligence: machine learning/AI algorithms and scenario-based aggregation functions

  • Scenario-based functions: 30+ aggregate functions for data analysis scenarios, accumulated over more than ten years of practice
  • Machine learning functions: rich machine learning functions combining big data and machine learning
  • Multi-channel data sources: leveraging the Alibaba Cloud platform to link more data sources, such as the IP geolocation database, threat intelligence data, and the white hat security asset database

One-click data migration from ELK to SLS

Many users choose SLS when multiple scattered small ES clusters and message queues run at a high water level and become difficult to operate and maintain. Migrating from ES to SLS therefore becomes critical: how can data be migrated to SLS gracefully and easily? SLS currently has a mature and easy-to-use solution; see the details of the ELK-to-SLS one-click data migration program.

Important features

  • Resumable transfer: cache_path in the call parameters specifies where the checkpoint is stored. If the migration program is interrupted, rerun it with the same cache_path to continue the migration task.
  • Fast migration: with a single SLS shard, a single ES shard, and pool_size of 1, migration speed approaches 5 MB/s. Speed can be increased by adjusting the number of SLS shards and pool_size.

Commonly used migration commands

  • Import all documents from the Elasticsearch instance at localhost:9200 into project1 of Log Service:
```shell
aliyunlog log es_migration --cache_path=/path/to/cache --hosts=localhost:9200 --project_name=project1
```

  • Write data from indexes whose names start with myindex_ to logstore1, and data from indexes index1 and index2 to logstore2:

```shell
aliyunlog log es_migration --cache_path=/path/to/cache --hosts=localhost:9200,other_host:9200 --project_name=project1 --logstore_index_mappings='{"logstore1": "myindex_*", "logstore2": "index1,index2"}'
```

Follow up

Under the premise of ensuring stability and in pursuit of ultimate performance, new versions will continue to iterate and introduce more functions, with common customer needs given higher priority. For example, the offline hardware models, which users perceive most directly, will grow from a single plan to three to meet the refined needs of more types of customers, including hardware with a lower computing threshold for routine log scenarios and high-density storage models for financial customers with special long-term storage requirements. At the software level, faster and smarter query analysis, as a core function, will give CDS-SLS customers a better experience on a one-stop log platform. SLS starts with logs but is not limited to logs; more functions will be shared in the future.

Original work: Alibaba Cloud Storage

Series article portal:

 1. Cloud-defined storage is coming: https://developer.aliyun.com/article/792044?spm=a2c6h.13148508.0.0.3eef4f0ecyZOjQ
 2. [CDS Technology Secret Series 01] CDS-OSS disaster recovery revealed: https://developer.aliyun.com/article/792000?spm=a2c6h.13148508.0.0.3eef4f0ecyZOjQ
