CDS-SLS is a cloud-based log platform that combines its components with high cohesion and low coupling. Offline users can automatically deploy all of its functions on as few as six machines, and use a low-code mode in big-data scenarios such as operation and maintenance, operations, financial management, data analysis, and reporting, effectively addressing the pain points of traditional software. This article provides an overview of each function of CDS-SLS.
Preface
This article, the second in a series about the upcoming hybrid cloud product CDS-SLS (Cloud Defined Storage - Simple Log Service), gives an overview of its functions. As a digital carrier, logs contain information about the operation of background programs, business activity, and more.
Log query and analysis has evolved from the earliest manual pssh+grep across a handful of machines, to small-scale per-business ELK or EFK stacks (Elasticsearch, Logstash/Filebeat/Fluentd, Kibana), with Kafka added at larger volumes. If metrics also need to be collected, Collectd is added; if visualization is needed, Grafana; if storage and backup are needed, Ceph is introduced; and Salt is introduced for consistent management of the base configuration. Operation and maintenance costs rise rapidly as a result.
As a cloud-based log platform, CDS-SLS brings these components together with high cohesion and low coupling. Offline users can automatically deploy all of the above functions on as few as six machines, and use a low-code mode in big-data scenarios such as operation and maintenance, operations, financial management, data analysis, and reporting, effectively solving the pain points of traditional software.
Term & Background
CDS
CDS (Cloud Defined Storage) is cloud-defined storage, an output form of Software Defined Storage (SDS). CDS shares a unified storage architecture and user experience with the public cloud, shrinks the footprint of the base, offers flexible deployment scales and forms, integrates multiple storage products, and provides enterprise-level storage operation, maintenance, and management.
CDS supports mixed combinations of storage products, such as CDS-OSS + CDS-SLS or CDS-EBS + CDS-SLS. As a product it will ship in two forms: an Agile Edition (SLS requires a minimum of six nodes, with a more streamlined four-node version planned) and an Enterprise Edition (SLS from six up to hundreds of nodes). On one hand, CDS improves the competitiveness and maturity of the Proprietary Cloud Enterprise and Agile Editions; on the other, it enables access, backup, and analysis of data from devices, the edge, customer data centers, and similar environments.
SLS
SLS (Simple Log Service) is the Alibaba Cloud log service. SLS originated from the Shennong monitoring service in Alibaba Cloud's early Feitian base and has since developed into a cloud-native, observability-oriented *Ops (DevOps, SecOps, FinOps) solution integrating collection, query, analysis, and visualization.
Overview of the main functions of SLS
Log data in SLS is append-only, written far more often than it is read, and time-sensitive without requiring strict ordering; query frequency and popularity decay rapidly over time. The CDS-SLS edition is inherited from SLS on Alibaba Cloud. SLS has supported Alibaba's Double Eleven and Double Twelve events for many years, as well as major events such as Chinese New Year red envelopes and anniversary promotions, so its stability, functionality, and performance have been fully verified.
This article examines the functions of SLS from an operation and maintenance perspective. The main SLS pipeline consists of data collection, query and analysis, visualization, and intelligent applications. As a product that places equal weight on computing and storage, SLS trims some non-universal functions and pushes the hardware's computing and storage resources to their limits in order to further reduce hardware costs for offline users.
The figure above shows SLS functions from the perspective of public cloud users. Offline CDS-SLS users can see the sub-modules corresponding to the SLS service on the space-based platform, as well as the full CPU and memory usage of each process. From a service perspective, roles fall into two categories: data services, split into 34 service roles, and scheduling services, split into 10 service roles. After this split, upgrading and scaling the service becomes easier.
SLS internal service split
sls-service-master mainly handles scheduling-related services, and each of its service roles has multiple instances to ensure high availability. The main service functions are concentrated in sls-backend-server; the general hierarchical structure is as follows:
CDS-SLS currently uses the Pangu 2.0 system as its underlying distributed storage by default. As the storage base of Alibaba Cloud, Pangu 2.0 offers high performance and high stability. SLS's internal business modules are cleanly split into microservices, with the bottom layer self-developed in C++ for maximum performance.
Each SLS module exposes a large number of tunable background parameters, but for convenience the default values satisfy most customers' needs. Many designs follow the classic UNIX ideas of "separation of mechanism and policy" and "do one thing and do it well".
- For the flow control that users care about, the background provides precise control across multiple dimensions, and the default parameters cover most scenarios.
- The data collection agent (Logtail) has been verified at scale on millions of machines for many years, with well-proven performance and stability. Compared with open source software, it can greatly reduce machine resource usage (by up to 90%).
- The pipeline design of "query|analysis" implements single responsibility well: query and analysis are handled by different backend services.
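As a hypothetical illustration of this split (stub backends over an in-memory log list, not the SLS implementation), the statement before `|` goes to a search service and the SQL after it to a separate analysis service:

```python
# Minimal sketch of the "query|analysis" split. All names here are
# illustrative stand-ins, not SLS internals.
LOGS = [
    {"level": "ERROR", "msg": "disk full"},
    {"level": "INFO",  "msg": "started"},
    {"level": "ERROR", "msg": "timeout"},
]

def search_backend(query: str):
    # Stand-in for the index/search service: exact "field: value" match.
    key, _, value = query.partition(":")
    return [log for log in LOGS if log.get(key.strip()) == value.strip()]

def analysis_backend(stmt: str, hits):
    # Stand-in for the SQL analysis service: only "SELECT count(*)" is sketched.
    if stmt.replace(" ", "").lower() == "selectcount(*)":
        return len(hits)
    raise NotImplementedError(stmt)

def run(statement: str):
    # Split once at "|": the search half and the analysis half never mix.
    query, sep, analysis = statement.partition("|")
    hits = search_backend(query.strip())
    return analysis_backend(analysis.strip(), hits) if sep else hits

print(run("level: ERROR | SELECT count(*)"))  # counts the two ERROR entries
```

A statement without `|` never touches the analysis service, which is what keeps each backend's responsibility single.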
Special design for hybrid cloud scenarios
Cluster form
Two SLS-related clusters currently exist in the space-based environment:
- The sls-common cluster in the base shares Pangu and provides basic query and analysis for the self-operation and maintenance of the services in the Alibaba Cloud base. Because base resources are limited, storage time is capped at 7 days. In hybrid cloud scenarios where the network is isolated and unreachable, this significantly improves operation and maintenance efficiency: on-site personnel can query a few keywords and help developers quickly locate problems.
- The CDS-SLS cluster purchased separately by the user has an exclusive Pangu deployment. Only SLS-related processes run in this cluster, which effectively relieves the shortage of shared resources on the base; log TTL can therefore be set to permanent storage, and a console with a better experience is provided.
Most of the functions mentioned in this article are for CDS-SLS clusters purchased separately by users.
Localized Xinchuang Support
At present, CDS-SLS will support Haiguang, Kunpeng, Feiteng, and other CPU architectures, with the same strict acceptance tests as Intel x86. More test support for heterogeneous CPUs and hybrid offline output scenarios will follow.
HTTPS access will support TLS transmission with Chinese national (SM) cryptography, making data access more compliant for some financial or government-enterprise customers.
Comparison and migration of open source ELK solutions
ELK background
Elasticsearch is mainly implemented on top of Lucene. In 2012, Elastic packaged the Lucene library into more usable software, and in 2015 launched the ELK Stack (Elasticsearch, Logstash, Kibana) to solve centralized log collection, storage, and query. However, Lucene was designed for information retrieval over document-type data, so there are limits when it is applied to observability (Log/Trace/Metric) data, such as scale, query capability, and certain customized functions (such as intelligent clustering, LogReduce).
| Elasticsearch | Log Service | Description |
| --- | --- | --- |
| index | logstore | Data from multiple indexes can be migrated into one logstore. |
| type | `__tag__:_type` field in LogItem | |
| document | LogItem | Elasticsearch documents and Log Service logs correspond one-to-one. |
| mapping | logstore index | The tool automatically creates an index by default. |
| field datatypes | LogItem data type | Refer to the specific data type mapping. |
| Function | Logstash | Fluentd | Logtail |
| --- | --- | --- | --- |
| Log reading | Polling | Polling | Event-triggered |
| File rotation | Supported | Supported | Supported |
| Failover handling (local checkpoint) | Supported | Supported | Supported |
| Common log parsing | Grok (regex-based) parsing | Regular expression parsing | Regular expression parsing |
| Specific log types | Delimiter, key-value, JSON, and other mainstream formats | Delimiter, key-value, JSON, and other mainstream formats | Key-value format |
| Data transmission compression | Plugin support | Plugin support | LZ4 |
| Data filtering | Supported | Supported | Supported |
| Buffered sending | Plugin support | Plugin support | Supported |
| Send-error handling | Plugin support | Plugin support | Supported |
| Runtime environment | JRuby implementation, requires a JVM | CRuby/C implementation, requires a Ruby environment | C++ implementation, no special requirements |
| Threading | Multi-threading supported | Multithreading limited by the GIL | Multi-threading supported |
| Hot upgrade | Not supported | Not supported | Supported |
| Centralized configuration management | Not supported | Not supported | Supported |
| Health self-check | Not supported | Not supported | Supports CPU/memory threshold protection |
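As a sketch of the regex-based log parsing that all three agents perform (the pattern and sample line are illustrative, not a Logtail configuration), named capture groups turn a raw access-log line into key-value fields:

```python
import re

# Illustrative pattern for a common access-log layout; real agents ship or
# generate patterns like this from user configuration.
ACCESS_LINE = re.compile(
    r'(?P<remote_addr>\S+) - \S+ \[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<bytes>\d+)'
)

def parse_line(line: str) -> dict:
    # A matched line becomes structured key-value fields; an unmatched line
    # is kept raw so no data is lost.
    m = ACCESS_LINE.match(line)
    return m.groupdict() if m else {"__raw__": line}

sample = '10.0.0.1 - - [07/Jan/2024:12:00:00 +0800] "GET /index.html HTTP/1.1" 200 512'
print(parse_line(sample))  # fields: remote_addr, time_local, method, path, status, bytes
```

Once a line is structured this way, the fields can be indexed and queried individually instead of as opaque text.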