There are many solutions for log collection, processing, and analysis. The most common is the ELK stack, namely Elasticsearch + Logstash + Kibana (official site: https://www.elastic.co/products).
Later, as architectures were optimized and evolved, another lightweight component, Filebeat, was introduced. Like Logstash, Filebeat is a log collection and shipping tool, based on the original logstash-forwarder source code; compared with Logstash, it is lighter and consumes fewer resources.
Compared with Fluentd, Logstash falls slightly behind in performance, so it is gradually being replaced by Fluentd, and ELK becomes EFK. EFK is composed of three open source tools: Elasticsearch, Fluentd, and Kibana. Together they provide a distributed, real-time collection, analysis, and monitoring system for log data.
Introduction to Fluentd
Fluentd is a free and fully open source log management tool that simplifies the collection, processing, and storage of logs, so you do not need to write special log processing scripts for day-to-day maintenance.
Features Introduction
Uses JSON to record logs
Fluentd structures data as JSON, which allows it to unify the entire log processing layer: collecting, filtering, buffering, and outputting logs across multiple sources and destinations. This makes downstream data processing much easier.
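Internally, every event Fluentd handles is the combination of a tag, a timestamp, and a JSON record, for example (the values are illustrative):
tag: myapp.access
time: 2021-06-04 05:56:11 -0400
record: {"host":"192.168.1.100","method":"GET","path":"/index.html","code":"200"}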
Plugin architecture
Fluentd has a flexible plugin system that allows the community to extend its functionality. More than 300 community-contributed plugins connect dozens of data sources and outputs, letting you make full use of your logs. The open source community has contributed storage plugins such as MongoDB, Redis, CouchDB, Amazon S3, Amazon SQS, Scribe, 0MQ, AMQP, Delayed, Growl, and more.
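As an illustration, with the fluent-plugin-mongo output plugin installed, writing events to MongoDB is just another match block (the tag, host, database, and collection values below are placeholders):
<match app.**>
@type mongo
host 127.0.0.1
port 27017
# target database and collection (placeholders)
database fluentd
collection app_logs
</match>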
Minimal resource requirements
Fluentd is written in a combination of C and Ruby and requires very few system resources: a single instance runs on 30-40 MB of memory and can process about 13,000 events per second per core.
Reliability
Fluentd supports memory- and file-based data buffering to prevent data loss. It also has strong fault tolerance and can be set up for high availability. More than 2,000 data-driven companies rely on Fluentd to build better products and services through their understanding and use of log data.
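A sketch of how this can be configured (the myapp.** tag and the forward destination are examples only): a file-backed buffer persists undelivered chunks across restarts, and a secondary output catches records that could not be flushed:
<match myapp.**>
@type forward
<server>
# example downstream aggregator
host 192.168.1.10
port 24224
</server>
<buffer>
# persist buffer chunks on disk instead of in memory
@type file
path /var/log/td-agent/buffer/myapp
flush_interval 10s
</buffer>
<secondary>
# dump records that permanently fail to be delivered
@type file
path /var/log/td-agent/failed_records
</secondary>
</match>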
Installation
https://docs.fluentd.org/installation/install-by-rpm
CentOS
curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent3.sh | sh
Start service
systemctl start td-agent
Installation methods for more platforms: https://docs.fluentd.org/installation
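You can then verify that the agent is running and check its version:
[root@centos7 ~]# systemctl status td-agent
[root@centos7 ~]# td-agent --version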
By default the service runs as the td-agent user. If you need to run it as a different user, modify the User and Group settings in the unit file:
[root@centos7 ~]# vim /usr/lib/systemd/system/td-agent.service
[Unit]
Description=td-agent: Fluentd based data collector for Treasure Data
Documentation=https://docs.treasuredata.com/articles/td-agent
After=network-online.target
Wants=network-online.target
[Service]
# run the service as this user
User=td-agent
# run the service as this group
Group=td-agent
LimitNOFILE=65536
Environment=LD_PRELOAD=/opt/td-agent/embedded/lib/libjemalloc.so
Environment=GEM_HOME=/opt/td-agent/embedded/lib/ruby/gems/2.4.0/
Environment=GEM_PATH=/opt/td-agent/embedded/lib/ruby/gems/2.4.0/
Environment=FLUENT_CONF=/etc/td-agent/td-agent.conf
Environment=FLUENT_PLUGIN=/etc/td-agent/plugin
Environment=FLUENT_SOCKET=/var/run/td-agent/td-agent.sock
Environment=TD_AGENT_LOG_FILE=/var/log/td-agent/td-agent.log
Environment=TD_AGENT_OPTIONS=
EnvironmentFile=-/etc/sysconfig/td-agent
PIDFile=/var/run/td-agent/td-agent.pid
RuntimeDirectory=td-agent
Type=forking
ExecStart=/opt/td-agent/embedded/bin/fluentd --log $TD_AGENT_LOG_FILE --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS
ExecStop=/bin/kill -TERM ${MAINPID}
ExecReload=/bin/kill -HUP ${MAINPID}
Restart=always
TimeoutStopSec=120
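After modifying the unit file, reload systemd and restart the service for the change to take effect:
[root@centos7 ~]# systemctl daemon-reload
[root@centos7 ~]# systemctl restart td-agent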
Configuration file introduction
Configuration file path: /etc/td-agent/td-agent.conf
[root@centos7 ~]# cd /etc/td-agent/
[root@centos7 td-agent]# ll
total 4
drwxr-xr-x 2 root root 6 Jun 4 05:15 plugin
-rw-r--r-- 1 root root 2381 Jun 4 05:15 td-agent.conf
You can use the following command to check whether the configuration is correct
[root@centos7 ~]# /opt/td-agent/embedded/bin/fluentd -c /etc/td-agent/td-agent.conf
The configuration file is built from the following directives (a minimal example combining them follows this list):
source: defines the input sources of the data
match: determines the output destinations
filter: defines the event processing pipelines
system: sets system-wide configuration
label: groups outputs and filters for internal routing
@include: includes other configuration files
Official document: https://docs.fluentd.org/configuration
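A minimal sketch tying these directives together (the app.** tag is only an example): an http source receives events, a record_transformer filter adds the collector hostname to each record, and a stdout match prints the result:
<source>
@type http
port 8888
</source>
<filter app.**>
@type record_transformer
<record>
# add the collector hostname to every record
hostname "#{Socket.gethostname}"
</record>
</filter>
<match app.**>
@type stdout
</match>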
Parameter values in the configuration file use the following data types (a buffer example using the size and time types follows this list):
string: a character string, the most common type
integer: an integer
float: a floating-point number
size: a size in bytes, integers only:
<INTEGER>k or <INTEGER>K;
<INTEGER>m or <INTEGER>M;
<INTEGER>g or <INTEGER>G;
<INTEGER>t or <INTEGER>T.
time: a time duration, also integers only:
<INTEGER>s or <INTEGER>S;
<INTEGER>m or <INTEGER>M;
<INTEGER>h or <INTEGER>H;
<INTEGER>d or <INTEGER>D.
array: parsed as a JSON array
hash: parsed as a JSON object
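For example, the size and time types show up frequently when tuning an output's <buffer> section; the limits below are arbitrary illustrative values:
<buffer>
@type file
path /var/log/td-agent/buffer/example
# size type: flush chunks once they reach 8 megabytes
chunk_limit_size 8m
# time type: flush buffered chunks every 5 seconds
flush_interval 5s
</buffer>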
The default configuration file with comments removed:
[root@centos7 ~]# egrep -v "^#|^$" /etc/td-agent/td-agent.conf
<match td.*.*>
@type tdlog
@id output_td
apikey YOUR_API_KEY
auto_create_table
<buffer>
@type file
path /var/log/td-agent/buffer/td
</buffer>
<secondary>
@type file
path /var/log/td-agent/failed_records
</secondary>
</match>
<match debug.**>
@type stdout
@id output_stdout
</match>
<source>
@type forward
@id input_forward
</source>
<source>
@type http
@id input_http
port 8888
</source>
<source>
@type debug_agent
@id input_debug_agent
bind 127.0.0.1
port 24230
</source>
After the service starts, the default inputs listen on the following ports:
[root@centos7 ~]# netstat -utpln |grep ruby
tcp 0 0 0.0.0.0:8888 0.0.0.0:* LISTEN 7008/ruby
tcp 0 0 0.0.0.0:24224 0.0.0.0:* LISTEN 7013/ruby
tcp 0 0 127.0.0.1:24230 0.0.0.0:* LISTEN 7013/ruby
udp 0 0 0.0.0.0:24224 0.0.0.0:* 7013/ruby
The official documentation also gives a simple demo: submit a test event through the HTTP input on port 8888 (the URL path debug.test becomes the event tag, which is matched by the debug.** stdout output) and view it in the agent log
[root@centos7 ~]# curl -X POST -d 'json={"json":"message"}' http://localhost:8888/debug.test
[root@centos7 ~]# tail -n 1 /var/log/td-agent/td-agent.log
2021-06-04 05:56:11.728512891 -0400 debug.test: {"json":"message"}
Plugin introduction and installation
Plugin introduction
The commonly used Fluentd plugin categories are listed below (a filter example follows this list):
Input: reads the input data; configured in the source section.
Common types: tail, http, forward, tcp, udp, exec
https://docs.fluentd.org/input
Parser: parsing plugins, usually used together with input and output plugins and most often seen after the format field.
Common types: ltsv, json, custom formats, etc.
https://docs.fluentd.org/parser
Output: writes the output data; configured in the match section.
Common types: file, forward, copy, stdout, exec
https://docs.fluentd.org/output
Filter: filtering plugins.
Common types: grep, ignore, record_transformer
https://docs.fluentd.org/filter
Buffer: buffering plugins, used to buffer data.
Common types: file, memory
https://docs.fluentd.org/buffer
Formatter: message formatting plugins, used on output; they let users extend and reuse custom output formats.
Common types: ltsv, json, etc.
https://docs.fluentd.org/formatter
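As a quick illustration of the filter plugins (the nginx.access tag matches the Nginx example later in this article), a grep filter can drop successful requests and keep only errors:
<filter nginx.access>
@type grep
<exclude>
# 'code' is the status-code field produced by the access-log parser
key code
# drop records whose status code is 2xx
pattern /^2\d\d$/
</exclude>
</filter>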
Installation
The official documentation covers plugin installation in detail, so it is not repeated here; the process is straightforward.
Document: https://docs.fluentd.org/deployment/plugin-management
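For example, the Elasticsearch output plugin used in the Nginx example below can be installed (if it is not already bundled) with the gem command shipped with td-agent:
[root@centos7 ~]# td-agent-gem install fluent-plugin-elasticsearch
[root@centos7 ~]# td-agent-gem list | grep fluent-plugin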
Simple application example
Parsing system logs
[root@centos7 ~]# vim /etc/rsyslog.conf
# add the following configuration line
*.* @127.0.0.1:5140
Restart service
[root@centos7 ~]# systemctl restart rsyslog
[root@centos7 ~]# ps -ef|grep rsys
root 7492 1 0 06:20 ? 00:00:00 /usr/sbin/rsyslogd -n
root 7497 6893 0 06:20 pts/0 00:00:00 grep --color=auto rsys
Configure Fluentd
[root@centos7 td-agent]# vim td-agent.conf
# add the following configuration
<source>
@type syslog
port 5140
tag system
</source>
<match system.**>
@type stdout
</match>
Restart service
[root@centos7 td-agent]# systemctl restart td-agent
View the collected logs (the stdout output writes them to the td-agent log)
2021-06-04 06:40:02.000000000 -0400 system.daemon.info: {"host":"centos7","ident":"systemd","message":"Started Session 119 of user root."}
2021-06-04 06:40:02.000000000 -0400 system.cron.info: {"host":"centos7","ident":"CROND","pid":"7658","message":"(root) CMD (/usr/lib64/sa/sa1 1 1)"}
Collecting Nginx server logs
Fluentd configuration (add the following to td-agent.conf):
<source>
@type tail
path /var/log/nginx/access.log
pos_file /var/log/nginx/access.log.pos
tag nginx.access
format /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/
time_format %d/%b/%Y:%H:%M:%S %z
</source>
<match nginx.access>
@type elasticsearch
host localhost
port 9200
# index_name fluentd
flush_interval 10s
logstash_format true
# type_name fluentd
</match>
Then create the .pos file in the /var/log/nginx/ directory and make it readable and writable (the td-agent user also needs read access to /var/log/nginx/access.log):
[root@centos7 nginx]# touch access.log.pos
[root@centos7 nginx]# chmod a+rw access.log.pos
Restart the Fluentd service
[root@centos7 ~]# systemctl restart td-agent
Reload Nginx
[root@centos7 ~]# nginx -s reload
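Assuming Elasticsearch is running locally on port 9200, you can confirm that Fluentd has started writing to it by listing the indices:
[root@centos7 ~]# curl -s http://localhost:9200/_cat/indices?v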
Finally, you can also bring the logs into the Kibana interface for display. This is no different from the earlier ELK workflow: create a fluentd-* index pattern.
The above is today's introduction to the open source log collection system Fluentd, brought to you by Migrant Workers. Interested readers can consult the official documentation for more in-depth study and exploration; exchanges of experience are welcome.