About the Author
Yuan Zhen, Technical Support Manager at SUSE Rancher, leads the after-sales technical support team and provides technical support services for subscription customers. He has worked with container and Kubernetes technologies since 2016, has studied automated operations, DevOps, Kubernetes, Prometheus, and other cloud-native technologies in depth, and has extensive hands-on experience in building and designing SRE operations systems.
Background overview
Before SUSE Rancher 2.5, the log collection architecture used Fluentd to collect logs from specified directories or from the container standard output stream and, based on the configuration on the UI logging page, send them to a specified backend such as Elasticsearch, Splunk, Kafka, syslog, or Fluentd. It could automatically collect the logs of Kubernetes clusters and of the workloads running on them. This approach was undoubtedly simple, easy to use, and convenient. However, as users' understanding of cloud native deepened and the granularity of business log analysis increased, the old log collection method proved too rigid and inflexible, and could no longer satisfy users with higher requirements for log collection.
So starting with SUSE Rancher 2.5, BanzaiCloud's open source Logging Operator became the log collection component of the new generation of the SUSE Rancher container cloud platform. SUSE Rancher 2.5, as a transitional release for the new logging component, retained the old log collection method and shipped the new Logging Operator as an experimental feature; SUSE Rancher 2.6 has fully adopted the Logging Operator as its log collection tool.
This article explores the new Logging Operator feature in SUSE Rancher 2.6 and how to use it, working from the basics up to more advanced usage.
What is Logging Operator
Logging Operator is BanzaiCloud's open source log collection solution for cloud-native scenarios. With SUSE Rancher 2.6 integrating this product, the Logging Operator is deployed automatically and the Kubernetes logging pipeline is configured automatically based on the user's operations.
Logging Operator Architecture
The above is the official architecture diagram of the Logging Operator. The Logging Operator uses CRDs to define the configuration of the three stages of log collection, rule routing, and output. It deploys the Fluent Bit component on cluster nodes with a DaemonSet and deploys the Fluentd component with a StatefulSet. Logs are first collected and pre-processed by Fluent Bit, then sent to Fluentd for further parsing and processing, and finally sent by Fluentd to the different backends.
Logging Operator CRD
As mentioned above, the Logging Operator uses CRDs to define the configuration of the three stages of log collection, rule routing, and output. It mainly uses the following five CRD types:
- logging: defines the basic configuration of the log collection end (Fluent Bit) and the transmission end (Fluentd); in SUSE Rancher 2.6 it is deployed automatically by Rancher;
- flow: defines namespace-level log filtering, parsing, and routing rules;
- clusterflow: defines cluster-level log filtering, parsing, and routing rules;
- output: defines namespace-level log outputs and their parameters; it can only be referenced by flows in the same namespace;
- clusteroutput: defines cluster-level log outputs and their parameters; it can be referenced by flows in other namespaces.
The above five CRD types are very important and determine where the logs of each namespace and even each container in a Kubernetes cluster are output.
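In SUSE Rancher 2.6 the logging resource is created for you when you enable the Logging tool, so you normally never write it by hand. Purely for orientation, a minimal hand-written Logging resource looks roughly like the sketch below; the name is made up, and controlNamespace is assumed to be cattle-logging-system as used by Rancher's own deployment:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: example-logging ## illustrative name; Rancher creates its own Logging resource
spec:
  controlNamespace: cattle-logging-system ## namespace where Fluentd/Fluent Bit run (assumed here)
  fluentd: {} ## deploy Fluentd with default settings
  fluentbit: {} ## deploy Fluent Bit with default settings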
Enable SUSE Rancher 2.6 Logging
After SUSE Rancher 2.6 successfully creates a downstream Kubernetes cluster, you can find the deployment entry of the Logging tool on the cluster tools page, as shown in the following figure:
After clicking the Install button, select the project in which to install the component, usually the System project, since the components related to cluster operation run in that project; after clicking Next, the UI asks for the Docker root directory and the System Log Path.
- Note: If the underlying container runtime is Docker, the default is /var/lib/docker. If you have customized it, fill in the correct Docker Root Dir, which can be viewed with the command docker info | grep "Docker Root Dir";
- Note: The System Log Path is used to collect the host OS journal logs. If it is entered incorrectly, the host OS logs will not be collected. Confirm the path as follows:
## To confirm the Journal Log Path, run the following command on the node
cat /etc/systemd/journald.conf | grep -E ^\#?Storage | cut -d"=" -f2
1. If it returns persistent, the systemdLogPath should be /var/log/journal;
2. If it returns volatile, the systemdLogPath should be /run/log/journal;
3. If it returns auto, check whether /var/log/journal exists;
- If /var/log/journal exists, use /var/log/journal;
- If /var/log/journal does not exist, use /run/log/journal.
After entering the correct Docker root directory and systemdLogPath, click Install and SUSE Rancher 2.6 will automatically deploy the Logging Operator.
Execute the following command to check whether the deployment was successful:
kubectl get pod -n cattle-logging-system
NAME READY STATUS RESTARTS AGE
rancher-logging-96b68cc4b-vqxnd 1/1 Running 0 9m54s
rancher-logging-fluentbit-cntgb 1/1 Running 0 69s
rancher-logging-fluentbit-hwmdx 1/1 Running 0 71s
rancher-logging-fluentbit-nw7rw 1/1 Running 0 71s
rancher-logging-fluentd-0 2/2 Running 0 9m34s
rancher-logging-fluentd-configcheck-ac2d4553 0/1 Completed 0 9m48s
After the deployment is complete, an additional Logging tab appears in the SUSE Rancher 2.6 cluster UI, from which the flow, clusterflow, output, and clusteroutput resource objects mentioned above can be configured to complete the entire pipeline of log collection, filter routing, and output.
Configuring the log collection pipeline directly from the SUSE Rancher 2.6 UI is convenient, and users who are already familiar with the Logging Operator can do exactly that to complete the whole configuration of collection, filter routing, and output. For those who have just upgraded to SUSE Rancher 2.6 or are new to the Logging Operator, let's not rush to get started; keep reading to see, step by step and in plain terms, how these CRDs are configured and how they work.
Flow and ClusterFlow
As can be seen from the architecture diagram at the beginning of this article, the two core concepts of the whole Logging Operator are flow and output. A flow processes the log stream: it determines where to collect from, how to filter, and how to route and distribute; a clusterflow has the same functionality but operates globally across the cluster.
Before digging into the flow CRD definition, let's analyze a simple flow example:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: default-flow
  namespace: default ## namespace whose logs are collected
spec:
  filters: ## define filters; a flow can define one or more
    - parser:
        remove_key_name_field: true
        parse: ## parse supports apache2, apache_error, nginx, syslog, csv, tsv, ltsv, json, multiline, none, logfmt
          type: nginx ## parse the collected logs in Nginx format
    - tag_normaliser:
        format: ${namespace_name}.${pod_name}.${container_name} ## tag format used inside Fluentd
  localOutputRefs:
    - "elasticsearch-output" ## reference the Output
  match: ## use Kubernetes labels to define which logs are collected
    - select:
        labels: ## match logs to collect by label
          app: nginx
This flow means that only container logs with the label app=nginx in the default namespace are collected; after collection the logs are parsed in Nginx format, and the tag of this log stream is rewritten to the ${namespace_name}.${pod_name}.${container_name} format when it is finally aggregated by Fluentd.
match
There is a very important definition in the above example: match, which defines which logs need to be collected. According to the Logging Operator's official documentation, the fields that can currently be used are as follows (a combined sketch follows the list):
- namespaces: match by namespace;
- labels: match by label;
- hosts: match by host;
- container_names: match by container name.
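As a hedged sketch of how these fields can be combined in a single select rule, the ClusterFlow below matches by namespace, label, host, and container name at the same time (a namespaced Flow is already scoped to its own namespace, so the namespaces field is shown at the cluster level); the namespace, label, node, container, and clusteroutput names are all made up for illustration:
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: example-match-flow ## illustrative name
spec:
  match:
    - select:
        namespaces: ## only these namespaces (illustrative)
          - demo
        labels: ## only pods with this label (illustrative)
          app: nginx
        hosts: ## only these nodes (illustrative)
          - node-1
        container_names: ## only these containers (illustrative)
          - nginx
  globalOutputRefs:
    - "example-clusteroutput" ## illustrative clusteroutput name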
Imagine a scenario: in a Kubernetes cluster we want to collect the logs of all containers except a few specific ones. Expressing that with select alone would require writing countless matching rules, which is clearly unreasonable, so the Logging Operator also provides the exclude field. An example follows:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: default-flow
  namespace: default ## namespace whose logs are collected
spec:
  filters: ## define filters; a flow can define one or more
    - parser:
        remove_key_name_field: true
        parse: ## parse supports apache2, apache_error, nginx, syslog, csv, tsv, ltsv, json, multiline, none, logfmt
          type: nginx ## parse the collected logs in Nginx format
    - tag_normaliser:
        format: ${namespace_name}.${pod_name}.${container_name} ## tag format used inside Fluentd
  localOutputRefs:
    - "es-output" ## reference the Output
  match: ## use Kubernetes labels to define which logs are collected
    - exclude: ## exclude all containers with the label app:nginx
        labels:
          app: nginx
The above example excludes the collection of all container logs with the label app:nginx. exclude and select can coexist, and there can be multiple exclude and select entries, which makes it very flexible to define which container logs should be collected.
In addition to flow there is clusterflow. Imagine a scenario in which we have N namespaces but need to collect container logs from all of them except one. Setting a flow for each namespace would clearly be unreasonable; instead, a clusterflow is used to define the collection rules. An example follows:
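As a hedged sketch of combining the two, the flow below excludes containers labeled app:nginx while selecting only containers labeled tier:frontend; the labels and output name are made up for illustration. Match entries are evaluated in the order they are listed, so the exclude rule is checked before the select rule here:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: combined-match-flow ## illustrative name
  namespace: default
spec:
  match:
    - exclude: ## skip containers with this label (illustrative)
        labels:
          app: nginx
    - select: ## collect containers with this label (illustrative)
        labels:
          tier: frontend
  localOutputRefs:
    - "es-output" ## illustrative output name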
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow ## defined as a ClusterFlow
metadata:
  name: default-cluster-flow
spec:
  filters: ## define filters; a flow can define one or more
    - parser:
        remove_key_name_field: true
        parse: ## parse supports apache2, apache_error, nginx, syslog, csv, tsv, ltsv, json, multiline, none, logfmt
          type: nginx ## parse the collected logs in Nginx format
    - tag_normaliser:
        format: ${namespace_name}.${pod_name}.${container_name} ## tag format used inside Fluentd
  globalOutputRefs:
    - "es-output" ## reference the ClusterOutput
  match: ## use Kubernetes labels to define which logs are collected
    - exclude: ## exclude the namespaces you do not want to collect
        namespaces:
          - default
          - kube-system
The above example is a cluster-level flow. According to the match definition, logs from all namespaces except default and kube-system are collected. As with flow, the exclude and select defined under match in a clusterflow can coexist, and there can be multiple exclude and select entries at the same time, allowing more detailed rules.
filters
filters are the log processing plugins of the Logging Operator. The officially supported log processing plugins are currently as follows (a small filter sketch follows the list):
- Concat: a Fluentd plugin for handling multi-line logs;
- Dedot: replaces the . character in field names, usually used for field conversion before outputting to Elasticsearch;
- Exception Detector: an exception log catcher, supporting java, js, csharp, python, go, ruby, php;
- Enhance K8s Metadata: extended Kubernetes metadata developed by BanzaiCloud;
- Geo IP: Fluentd's GeoIP address library;
- Grep: Fluentd's grep filter;
- Parser: Fluentd's parser; parse supports apache2, apache_error, nginx, syslog, csv, tsv, ltsv, json, multiline, none;
- Prometheus: a Prometheus plugin that can be used to count log entries;
- Record Modifier: Fluentd's field modification plugin;
- Record Transformer: mutates/transforms incoming event streams;
- Stdout: standard output plugin;
- SumoLogic: a log processing plugin from Sumo Logic;
- Tag Normaliser: the tag rewriter in Fluentd.
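The following sections walk through the Parser and Prometheus plugins in detail. As a quick hedged sketch of one of the other filters, the flow below uses the Grep filter to keep only records whose log field matches ERROR; the key, pattern, label, and output name are all illustrative:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: grep-error-flow ## illustrative name
  namespace: default
spec:
  filters:
    - grep:
        regexp: ## keep only records matching the pattern
          - key: log ## field to test (illustrative)
            pattern: /ERROR/ ## regex to match (illustrative)
  localOutputRefs:
    - "es-output" ## illustrative output name
  match:
    - select:
        labels:
          app: nginx ## illustrative label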
Parser plugin
The Parser plugin is the most common and simplest, and supports parsing logs in apache2, apache_error, nginx, syslog, csv, tsv, ltsv, json, multiline, none formats. If you need to parse the nginx log, you can use the Parser plugin to process it directly. The example is as follows:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: default-nginx-flow
  namespace: default ## namespace whose logs are collected
spec:
  filters: ## define filters; a flow can define one or more
    - parser:
        remove_key_name_field: true
        parse: ## parse supports apache2, apache_error, nginx, syslog, csv, tsv, ltsv, json, multiline, none, logfmt
          type: nginx ## parse the collected logs in Nginx format
If you need to parse logs in json format, the example is as follows:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: default-flow
  namespace: default ## namespace whose logs are collected
spec:
  filters: ## define filters; a flow can define one or more
    - parser:
        remove_key_name_field: true
        parse:
          type: json ## parse as JSON
          time_key: time ## custom time key
          time_format: "%Y-%m-%dT%H:%M:%S"
We can specify multiple types of log parsing formats in the Parser plugin, examples are as follows:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: default-flow
  namespace: default ## namespace whose logs are collected
spec:
  filters: ## define filters; a flow can define one or more
    - parser:
        remove_key_name_field: true
        parse:
          type: multi_format
          patterns:
            - format: nginx ## parse Nginx format
            - format: json ## parse JSON format
              time_key: time
              time_format: "%Y-%m-%dT%H:%M:%S"
As can be seen from the above examples, parsing log formats is very flexible: different plugins and parsing formats can be combined to handle different types of logs.
After introducing the commonly used Parser plugin, let me introduce a scenario that you may not have encountered before, or that is relatively complicated to configure: what if you need to count the total number of business log lines printed in a certain period of time in order to analyze business operation metrics? Perhaps the first thought is that the logs have already been processed and shipped to a tool like Kibana, so the numbers can be obtained there by filtering. That approach works, but what if you want to continuously analyze business operation metrics based on the number of log entries? The next section introduces another plugin for this.
Prometheus plugin
As the monitoring tool of the cloud-native era, Prometheus is a powerful time-series database. Combined with PromQL, Grafana, and the many exporters, it can cover basically any metric we need. The Logging Operator also provides a Prometheus plugin, which can be used to count and expose metrics about the collected log entries. An example follows:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: default-flow
  namespace: default ## namespace whose logs are collected
spec:
  filters: ## define filters; a flow can define one or more
    - parser:
        remove_key_name_field: true
        parse: ## parse supports apache2, apache_error, nginx, syslog, csv, tsv, ltsv, json, multiline, none, logfmt
          type: nginx ## parse the collected logs in Nginx format
    - prometheus: ## Prometheus plugin
        metrics:
          - desc: The total number of nginx in log. ## metric description
            name: nginx_log_total_counter ## metric name
            type: counter ## Prometheus metric type
            labels: ## metric labels
              app: nginx
        labels: ## metric labels
          host: ${hostname}
          tag: ${tag}
          namespace: $.kubernetes.namespaces
    - tag_normaliser:
        format: ${namespace_name}.${pod_name}.${container_name} ## tag format used inside Fluentd
  localOutputRefs:
    - "es-output" ## reference the Output
  match: ## use Kubernetes labels to define which logs are collected
    - select:
        labels: ## match logs to collect by label
          app: nginx
The above example uses the Parser plugin to process Nginx logs and the Prometheus plugin to count the Nginx log lines that pass through. The Prometheus plugin mainly has the following fields:
- desc: description of the indicator;
- name: the name of the indicator;
- type: the Prometheus data type of the indicator (for more information, please refer to the Prometheus data type documentation);
- labels: The labels of the metrics (for more information, please refer to the Prometheus Label documentation).
Through the above flow, you can count how many lines of Nginx logs are processed in total and expose a counter-type metric through Prometheus for monitoring and analysis.
Summary
The usage examples of the above two plugins show the flexibility and power of the Logging Operator. Combined with the many other plugins, it can flexibly fulfill log collection and processing requirements. For descriptions of the other filter plugins supported by the Logging Operator, see https://banzaicloud.com/docs/one-eye/logging-operator/configuration/plugins/filters/
Output and ClusterOutput
The two CRDs Output and ClusterOutput define where the processed logs are sent. Like flow and clusterflow, output is namespace-level and can only be referenced by flows in the same namespace; clusteroutput is cluster-level and can be referenced by flows and clusterflows in different namespaces.
Output
Output defines the output method of logs. Currently, the Logging Operator supports the following output plugins:
- Alibaba Cloud
- Amazon CloudWatch
- Amazon Elasticsearch
- Amazon Kinesis
- Amazon S3
- Azure Storage
- Buffer
- Datadog
- Elasticsearch
- File
- Format
- Format rfc5424
- Forward
- GELF
- Google Cloud Storage
- Grafana Loki
- Http
- Kafka
- LogDNA
- LogZ
- NewRelic
- Splunk
- SumoLogic
- Syslog
As you can see, the output plugins supported by the Logging Operator are quite rich and cover basically all the common backend tools. Below we use two commonly used plugins, Kafka and Elasticsearch, as examples of configuring the Output CRD.
Output-Kafka
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: kafka-output
spec:
  kafka:
    brokers: kafka-headless.kafka.svc.cluster.local:29092 ## Kafka broker address
    default_topic: topic
    topic_key: kafka-output ## Kafka topic name
    sasl_over_ssl: false ## whether to use SSL
    format:
      type: json ## format type
    buffer: ## send buffer configuration
      tags: topic
      timekey: 1m
      timekey_wait: 30s
      timekey_use_utc: true
The above simple example defines an Output CRD that sends logs to Kafka. Several key configurations can be seen: the Kafka broker address, the topic name, and whether to use SSL; the buffer section at the end is the send buffer configuration, which can be adjusted according to the actual situation. With that, an Output CRD that sends to Kafka is complete.
Output-Elasticsearch
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
name: elasticsearch-output
spec:
elasticsearch:
host: elasticsearch-elasticsearch-cluster.default.svc.cluster.local
port: 9200
scheme: https
ssl_verify: false
ssl_version: TLSv1_2
buffer:
timekey: 1m
timekey_wait: 30s
timekey_use_utc: true
The above simple example defines an Output CRD that sends logs to Elasticsearch, similar to the Kafka output configuration: you define the Elasticsearch address and port and whether SSL verification is required; the buffer section defines the send buffer configuration.
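Both examples above are namespaced Outputs. A cluster-level ClusterOutput is written the same way, except that it must be created in the Logging Operator's control namespace. The hedged sketch below assumes cattle-logging-system as the control namespace used by SUSE Rancher 2.6, and the Elasticsearch host is made up:
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: cluster-es-output ## illustrative name
  namespace: cattle-logging-system ## assumed control namespace in SUSE Rancher 2.6
spec:
  elasticsearch:
    host: elasticsearch.example.local ## illustrative address
    port: 9200
    scheme: http
    buffer:
      timekey: 1m
      timekey_wait: 30s
      timekey_use_utc: true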
For more configuration items, check the documentation: https://banzaicloud.com/docs/one-eye/logging-operator/configuration/plugins/outputs/
Associating flow with output
After defining the flow and output, you need to define localOutputRefs in the flow to associate with the output to send logs. An example is as follows:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: default-flow
  namespace: default ## namespace whose logs are collected
spec:
  filters: ## define filters; a flow can define one or more
    - parser:
        remove_key_name_field: true
        parse: ## parse supports apache2, apache_error, nginx, syslog, csv, tsv, ltsv, json, multiline, none, logfmt
          type: nginx ## parse the collected logs in Nginx format
    - tag_normaliser:
        format: ${namespace_name}.${pod_name}.${container_name} ## tag format used inside Fluentd
  localOutputRefs:
    - "elasticsearch-output" ## output to elasticsearch-output
  match: ## use Kubernetes labels to define which logs are collected
    - select:
        labels: ## match logs to collect by label
          app: nginx
In the above example, localOutputRefs is configured with elasticsearch-output, associating the flow with the output named elasticsearch-output that we defined earlier, so logs are sent through that output. It is worth mentioning that the Logging Operator fully supports sending the same logs to several destinations; for example, the following flow outputs logs to both Kafka and Elasticsearch:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: default-flow
  namespace: default ## namespace whose logs are collected
spec:
  filters: ## define filters; a flow can define one or more
    - parser:
        remove_key_name_field: true
        parse: ## parse supports apache2, apache_error, nginx, syslog, csv, tsv, ltsv, json, multiline, none, logfmt
          type: nginx ## parse the collected logs in Nginx format
    - tag_normaliser:
        format: ${namespace_name}.${pod_name}.${container_name} ## tag format used inside Fluentd
  localOutputRefs:
    - "elasticsearch-output" ## output to elasticsearch-output
    - "kafka-output" ## output to kafka-output
  match: ## use Kubernetes labels to define which logs are collected
    - select:
        labels: ## match logs to collect by label
          app: nginx
Summary
The flow, clusterflow, output, and clusteroutput features described above are flexible and powerful. After SUSE Rancher 2.6 deploys Logging, you can write the CRDs as YAML and apply them directly to the cluster, or configure them directly on the SUSE Rancher UI. The next chapter describes how to configure flow and output on the SUSE Rancher UI.
Configure flow and output on SUSE Rancher UI
This chapter will describe the log collection configuration on the SUSE Rancher UI.
Create outputs/clusteroutputs
To configure on the SUSE Rancher UI, you first need to create an output or clusteroutput: go to the Logging page, select Outputs/ClusterOutputs, and choose the backend to send to. The example uses Elasticsearch: configure the index name, host address, port, and the output's name. If you use HTTPS or SSL, you need to create the secrets first.
The created CRD yaml is as follows:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: rancher26-es-output
  namespace: default
spec:
  elasticsearch:
    buffer:
      timekey: 1m
      timekey_use_utc: true
      timekey_wait: 30s
    host: 172.16.0.14
    index_name: rancher26
    port: 9200
    scheme: http
Create flows/clusterflows
After the outputs are created, start creating the flow/clusterflow rules: go to the Logging page, select Flows/ClusterFlows, click Create, and configure the flow. The example collects the logs of containers labeled app:nginx:
- Enter the flow name and the container label matching rule; if there are hosts or containers you do not want to collect, you can also exclude them here; multiple label matching rules can be added on the page;
- Configure the output rules: here select the output we just created; you can select multiple outputs and clusteroutputs, or select outputs and clusteroutputs at the same time;
- Configure the filter rules: here we collect Nginx logs, so use the parser plugin to parse them in the Nginx format automatically.
The created CRD yaml is as follows:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: nginx-flow
  namespace: default
  fields:
    - nginx-flow
spec:
  filters:
    - parser:
        parse:
          type: nginx
        remove_key_name_field: true
  localOutputRefs:
    - rancher26-es-output
  match:
    - select:
        labels:
          app: nginx
After the configuration is complete, if there are containers in the cluster that match the label, their logs are automatically collected and sent to Elasticsearch.
- You can check whether the outputs have taken effect by looking at the Fluentd configuration:
## Enter the rancher-logging-fluentd-0 Pod shell and run:
cat fluentd/app-config/fluentd.conf
<match **>
@type elasticsearch
@id flow:default:nginx-flow:output:default:rancher26-es-output
exception_backup true
fail_on_putting_template_retry_exceed true
host 172.16.0.14
index_name rancher26
port 9200
reload_connections true
scheme http
ssl_verify true
utc_index true
verify_es_version_at_startup true
<buffer tag,time>
@type file
chunk_limit_size 8MB
path /buffers/flow:default:nginx-flow:output:default:rancher26-es-output.*.buffer
retry_forever true
timekey 1m
timekey_use_utc true
timekey_wait 30s
</buffer>
</match>
- View indexes in elasticsearch
- View log details in kibana
That concludes the out-of-the-box walkthrough of the new Logging Operator in SUSE Rancher 2.6. Compared with the previous log collection feature it is indeed much more powerful: the configuration is very flexible, you can define a variety of filters, and you can even use Prometheus to expose custom metrics for business operation analysis. However, compared with the previous approach of simply entering a target address, the Logging Operator does raise the learning curve. Newcomers are advised to learn it from scratch and understand the overall architecture and logic, which helps with locating and troubleshooting problems when a failure occurs.
Some Tips
- In the overall architecture, logs are collected by the Fluent Bit component deployed as a DaemonSet. The Fluent Bit logs record which log files have been picked up, so when you find that some logs have not been collected, check the Fluent Bit logs to see whether those files were discovered;
- The Fluentd component is responsible for aggregating, processing, and sending the logs. When the destination does not receive logs, for example when Elasticsearch has not created the index, look at the Fluentd logs. Note that, as of the time this article was published, Fluentd's logs are not printed to standard output; you need to enter the Pod and run the following command to view them:
tail -f fluentd/log/out
- Whether flows/outputs are configured through the UI or as CRDs, they are eventually converted into Fluent Bit and Fluentd configuration. If you are familiar with these two components, you can enter the Pods when an exception occurs and check whether the effective configuration is correct;
- Because filters exist, incorrect filtering and matching conditions may cause logs not to be sent. In that case, check the rules configured in the flow; see the small debugging sketch below.
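As a hedged debugging sketch (not an official recipe), a temporary flow with the Stdout filter from the plugin list above can be used to echo the records that pass through a match, which helps verify that matching and filtering behave as expected. All names and labels below are illustrative, and where the echoed records end up depends on how Fluentd's own logging is wired in your deployment; check both the container output and the fluentd/log/out file mentioned above:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: debug-flow ## illustrative name
  namespace: default
spec:
  filters:
    - stdout: {} ## echo every record that reaches this flow
  localOutputRefs:
    - "es-output" ## illustrative output name
  match:
    - select:
        labels:
          app: nginx ## illustrative label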