
There are many solutions for log collection, processing, and analysis. The most common is the ELK stack: Elasticsearch + Logstash + Kibana (official website: https://www.elastic.co/products).


Later, as the architecture evolved, a lightweight component called Filebeat was introduced. Like Logstash, Filebeat is a log collection and processing tool. It is based on the original Logstash-forwarder source code, and compared with Logstash it is lighter and uses fewer resources.

Logstash also performs slightly worse than Fluentd, so in many deployments it has gradually been replaced by Fluentd, turning ELK into EFK. EFK is composed of three open source tools: Elasticsearch, Fluentd, and Kibana. Together they provide a distributed system for collecting, analyzing, and monitoring log data in real time.

Introduction to Fluentd

Fluentd is a free and completely open source log management tool, which simplifies the collection, processing, and storage of logs. You do not need to write special log processing scripts during maintenance.


Features Introduction

Structured logging with JSON

Fluentd structures data as JSON wherever possible. This lets it unify every stage of log processing (collecting, filtering, buffering, and outputting logs across multiple sources and destinations) and makes downstream data processing much easier.
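
As a rough illustration of this model, a Fluentd event can be thought of as three parts: a tag, a timestamp, and a JSON-like record. The Python sketch below only models that idea. The Event class and the to_line helper are hypothetical names used for illustration and are not part of Fluentd.

```python
import json
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical model of a Fluentd event: tag, time, and record."""
    tag: str       # used for routing, e.g. "debug.test"
    time: float    # event time as seconds since the Unix epoch
    record: dict   # the structured (JSON-like) log payload


def to_line(event: Event) -> str:
    """Render the tag and the record as compact JSON, for display only."""
    payload = json.dumps(event.record, separators=(",", ":"))
    return f"{event.tag}: {payload}"
```

Because the record is already structured, later stages can read fields such as event.record["json"] directly instead of re-parsing a text line.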

Plug-in architecture

Fluentd has a flexible plug-in system that allows the community to extend its functionality. More than 300 community-contributed plugins connect dozens of data sources and data outputs, so you can make full use of your logs. Storage plugins contributed by the open source community include MongoDB, Redis, CouchDB, Amazon S3, Amazon SQS, Scribe, 0MQ, AMQP, Delayed, Growl, and others.

Low resource requirements

Fluentd is written in a combination of C and Ruby and requires very few system resources. According to the project, a vanilla instance runs in 30-40MB of memory and can process about 13,000 events per second per core.

Reliability

Fluentd supports memory-based and file-based buffering to prevent data loss. It also has strong fault tolerance and can be configured for high availability. More than 2,000 data-driven companies rely on Fluentd to provide better products and services through their understanding and use of log data.

Installation

https://docs.fluentd.org/installation/install-by-rpm

CentOS

curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent3.sh | sh


Start the service

systemctl start td-agent


Installation methods for more platforms: https://docs.fluentd.org/installation

By default, the service runs as the td-agent user. To run it as a different user, change the User and Group settings in the systemd unit file:

[root@centos7 ~]# vim /usr/lib/systemd/system/td-agent.service

[Unit]
Description=td-agent: Fluentd based data collector for Treasure Data
Documentation=https://docs.treasuredata.com/articles/td-agent
After=network-online.target
Wants=network-online.target

[Service]
# User that runs the service
User=td-agent
# Group that runs the service
Group=td-agent
LimitNOFILE=65536
Environment=LD_PRELOAD=/opt/td-agent/embedded/lib/libjemalloc.so
Environment=GEM_HOME=/opt/td-agent/embedded/lib/ruby/gems/2.4.0/
Environment=GEM_PATH=/opt/td-agent/embedded/lib/ruby/gems/2.4.0/
Environment=FLUENT_CONF=/etc/td-agent/td-agent.conf
Environment=FLUENT_PLUGIN=/etc/td-agent/plugin
Environment=FLUENT_SOCKET=/var/run/td-agent/td-agent.sock
Environment=TD_AGENT_LOG_FILE=/var/log/td-agent/td-agent.log
Environment=TD_AGENT_OPTIONS=
EnvironmentFile=-/etc/sysconfig/td-agent
PIDFile=/var/run/td-agent/td-agent.pid
RuntimeDirectory=td-agent
Type=forking
ExecStart=/opt/td-agent/embedded/bin/fluentd --log $TD_AGENT_LOG_FILE --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS
ExecStop=/bin/kill -TERM ${MAINPID}
ExecReload=/bin/kill -HUP ${MAINPID}
Restart=always
TimeoutStopSec=120

Configuration file overview

Configuration file path: /etc/td-agent/td-agent.conf

[root@centos7 ~]# cd /etc/td-agent/
[root@centos7 td-agent]# ll
total 4
drwxr-xr-x 2 root root    6 Jun  4 05:15 plugin
-rw-r--r-- 1 root root 2381 Jun  4 05:15 td-agent.conf

You can check whether the configuration is valid with a dry run, which parses the configuration without starting the service:

[root@centos7 ~]# /opt/td-agent/embedded/bin/fluentd --dry-run -c /etc/td-agent/td-agent.conf

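The configuration file is made up of the following directives:

source   # input source: where the data comes from
match    # determines the output destination
filter   # determines the event processing pipeline
system   # sets system-wide configuration
label    # groups outputs and filters for internal routing
@include # includes other files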

Official document: https://docs.fluentd.org/configuration

Parameter values in the configuration file use the data types supported by Fluentd, which include the following:

string: a string, the most common type
integer: an integer
float: a floating-point number
size: a size in bytes; only integers are supported
  <INTEGER>k or <INTEGER>K;
  <INTEGER>m or <INTEGER>M;
  <INTEGER>g or <INTEGER>G;
  <INTEGER>t or <INTEGER>T.
time: a duration; also only integers are supported
  <INTEGER>s or <INTEGER>S;
  <INTEGER>m or <INTEGER>M;
  <INTEGER>h or <INTEGER>H;
  <INTEGER>d or <INTEGER>D.
array: parsed as a JSON array;
hash: parsed as a JSON object.
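
To make the size type concrete, here is a minimal Python sketch of how a value such as 8m could be converted to a byte count. It assumes binary multiples (1k = 1024 bytes), which is how the Fluentd documentation describes kilobytes, megabytes, and so on. The function name parse_size is hypothetical and this is not Fluentd's own implementation.

```python
import re

# Assumption: binary multiples, i.e. 1k = 1024 bytes.
_SIZE_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_size(value: str) -> int:
    """Convert a size string such as '512', '64k' or '8M' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgGtT]?)", value.strip())
    if match is None:
        raise ValueError(f"not a valid size value: {value!r}")
    number, unit = match.groups()
    # An empty unit means the number is already a byte count.
    return int(number) * _SIZE_UNITS.get(unit.lower(), 1)
```

For example, parse_size("8m") gives 8 * 1024 * 1024 bytes, and a value with a fractional part such as "1.5g" is rejected because only integers are accepted.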

The complete default configuration file, with comments and blank lines removed:

[root@centos7 ~]# egrep -v "^#|^$" /etc/td-agent/td-agent.conf
<match td.*.*>
  @type tdlog
  @id output_td
  apikey YOUR_API_KEY
  auto_create_table
  <buffer>
    @type file
    path /var/log/td-agent/buffer/td
  </buffer>
  <secondary>
    @type file
    path /var/log/td-agent/failed_records
  </secondary>
</match>
<match debug.**>
  @type stdout
  @id output_stdout
</match>
<source>
  @type forward
  @id input_forward
</source>
<source>
  @type http
  @id input_http
  port 8888
</source>
<source>
  @type debug_agent
  @id input_debug_agent
  bind 127.0.0.1
  port 24230
</source>
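
The match directives in this configuration (td.*.* and debug.**) select events by tag. In Fluentd's pattern syntax, * matches exactly one tag part and ** matches zero or more tag parts. The Python sketch below illustrates only these two wildcards. The function name tag_matches is hypothetical, and other pattern forms such as {a,b} are deliberately left out.

```python
def tag_matches(pattern: str, tag: str) -> bool:
    """Simplified Fluentd-style tag matching supporting only '*' and '**'."""
    pattern_parts = pattern.split(".")
    tag_parts = tag.split(".")

    def match_from(i: int, j: int) -> bool:
        # i indexes pattern_parts, j indexes tag_parts.
        if i == len(pattern_parts):
            return j == len(tag_parts)
        if pattern_parts[i] == "**":
            # '**' may consume zero or more of the remaining tag parts.
            return any(match_from(i + 1, k) for k in range(j, len(tag_parts) + 1))
        if j == len(tag_parts):
            return False
        if pattern_parts[i] == "*" or pattern_parts[i] == tag_parts[j]:
            return match_from(i + 1, j + 1)
        return False

    return match_from(0, 0)
```

Under these rules, the tag debug.test used in the demo below is routed to the debug.** block, while a tag such as td.db would not match td.*.* because that pattern requires exactly three parts.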

The official documentation also provides a simple demo. First, confirm that the default input ports are listening:

[root@centos7 ~]# netstat -utpln |grep ruby
tcp        0      0 0.0.0.0:8888            0.0.0.0:*               LISTEN      7008/ruby           
tcp        0      0 0.0.0.0:24224           0.0.0.0:*               LISTEN      7013/ruby           
tcp        0      0 127.0.0.1:24230         0.0.0.0:*               LISTEN      7013/ruby           
udp        0      0 0.0.0.0:24224           0.0.0.0:*                           7013/ruby 

Submit a test log through port 8888 and view it

[root@centos7 ~]# curl -X POST -d 'json={"json":"message"}' http://localhost:8888/debug.test
[root@centos7 ~]# tail -n 1 /var/log/td-agent/td-agent.log
2021-06-04 05:56:11.728512891 -0400 debug.test: {"json":"message"}
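
The same test event can be submitted from a program. The sketch below builds an equivalent request with Python's standard library, sending the record as a JSON body with a Content-Type of application/json, which the http input also accepts. The function names and the default base URL are assumptions for illustration. The URL path is the event tag.

```python
import json
import urllib.request


def build_event_request(tag, record, base_url="http://localhost:8888"):
    """Build a POST request for Fluentd's http input; the URL path is the tag."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{tag}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(tag, record, base_url="http://localhost:8888"):
    """Send the event and return the HTTP status code (needs a running td-agent)."""
    request = build_event_request(tag, record, base_url)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling send_event("debug.test", {"json": "message"}) against the default configuration should produce another debug.test line in /var/log/td-agent/td-agent.log.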

Plug-in introduction and installation

Plug-in introduction

Commonly used plugins for Fluentd are as follows:

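Input: reads input data; configured in source sections
Common types: tail, http, forward, tcp, udp, exec
https://docs.fluentd.org/input

Parser: parsing plugins, often used together with input and output plugins; usually specified after the format field
Common types: ltsv, json, custom formats, etc.
https://docs.fluentd.org/parser

Output: writes output data; configured in match sections
Common types: file, forward, copy, stdout, exec
https://docs.fluentd.org/output

Filter: filtering plugins
Common types: grep, ignore, record_transformer
https://docs.fluentd.org/filter

Buffer: buffering plugins, used to buffer data
Common types: file, memory
https://docs.fluentd.org/buffer

Formatter: message formatting plugins, used on output; they let users extend and reuse custom output formats
Common types: ltsv, json, etc.
https://docs.fluentd.org/formatter

Plug-in installation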

The official documentation covers plugin installation in detail, and the process is relatively simple, so it is not repeated here.


Document: https://docs.fluentd.org/deployment/plugin-management

Simple application example

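Parse system logs

Configure rsyslog to forward its messages to Fluentd:

[root@centos7 ~]# vim /etc/rsyslog.conf 
# Add the following line to forward all messages to 127.0.0.1:5140 over UDP
*.* @127.0.0.1:5140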

Restart rsyslog

[root@centos7 ~]# systemctl restart rsyslog
[root@centos7 ~]# ps -ef|grep rsys
root       7492      1  0 06:20 ?        00:00:00 /usr/sbin/rsyslogd -n
root       7497   6893  0 06:20 pts/0    00:00:00 grep --color=auto rsys

Configure Fluentd

[root@centos7 td-agent]# vim td-agent.conf
# Add the following configuration
<source>
  @type syslog
  port 5140
  tag system
</source>

<match system.**>
  @type stdout
</match>

Restart td-agent

[root@centos7 td-agent]# systemctl restart td-agent

View the collected logs, which the stdout output writes to /var/log/td-agent/td-agent.log

2021-06-04 06:40:02.000000000 -0400 system.daemon.info: {"host":"centos7","ident":"systemd","message":"Started Session 119 of user root."}
2021-06-04 06:40:02.000000000 -0400 system.cron.info: {"host":"centos7","ident":"CROND","pid":"7658","message":"(root) CMD (/usr/lib64/sa/sa1 1 1)"}

Collect Nginx server logs

Fluentd configuration file

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/nginx/access.log.pos

  tag nginx.access
  format /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/
  time_format %d/%b/%Y:%H:%M:%S %z
</source>
<match nginx.access>
  @type elasticsearch
  host localhost
  port 9200
#  index_name fluentd
  flush_interval 10s
  logstash_format true
#  typename fluentd
</match>
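
To see what the format expression in the source block extracts, the sketch below applies the same regular expression in Python to one access log line. The named groups are rewritten from (?<name>...) to Python's (?P<name>...) syntax. The sample line is made up for illustration, and unlike Fluentd, which uses the time field as the event timestamp, this sketch simply keeps the parsed time in the returned dictionary.

```python
import re
from datetime import datetime

# Same expression as the "format" parameter, with Python-style named groups.
ACCESS_LOG_PATTERN = re.compile(
    r'^(?P<host>[^ ]*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^ ]*) +\S*)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>[^\"]*)" "(?P<agent>[^\"]*)")?$'
)

# Same value as the "time_format" parameter.
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Hypothetical sample line in nginx's default combined log format.
SAMPLE_LINE = (
    '127.0.0.1 - - [04/Jun/2021:06:40:02 -0400] '
    '"GET /index.html HTTP/1.1" 200 612 "-" "curl/7.29.0"'
)


def parse_access_line(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = ACCESS_LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], TIME_FORMAT)
    return record
```

Running parse_access_line(SAMPLE_LINE) yields the fields host, user, time, method, path, code, size, referer, and agent. All values except time remain strings, as they would in Fluentd unless a types parameter is configured.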

Then create the .pos file in the /var/log/nginx/ directory and make it readable and writable, so that Fluentd can record its read position in it

[root@centos7 nginx]# touch access.log.pos
[root@centos7 nginx]# chmod a+rw access.log.pos

Restart the Fluentd service

[root@centos7 ~]# systemctl restart td-agent

Reload nginx

[root@centos7 ~]# nginx -s reload

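Finally, you can also display the logs in Kibana. This works the same way as with ELK: create an index pattern that matches the indices Fluentd writes. With logstash_format true and no custom prefix, as in the configuration above, the plugin writes to daily logstash-YYYY.MM.DD indices, so the pattern is logstash-*.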


That concludes today's introduction to the open source log collection system Fluentd, brought to you by 民工哥 (Migrant Worker Brother). Interested readers can consult the official documentation for more in-depth study, and you are welcome to share your own experience.


民工哥