Recently, inspired by dapeng's log management system, I decided to build an EFK stack of my own. This post records the pitfalls I hit along the way. Many thanks to Ever_00, 洋洋_3720 and the other community experts for their support and help.
Technology selection
Back to the topic: we again chose fluent-bit + fluentd + kafka + elasticsearch as the logging stack. dapeng services already integrate a single-node fluent-bit that collects the log files of each docker container and forwards them to fluentd. fluentd acts as a relay that gathers all logs and sends them to kafka for peak shaving; the buffered data is then forwarded by fluentd to elasticsearch for storage. We did not use Kibana this time; elasticsearch-head serves as the log viewing UI instead.
Non-dapeng services need to modify their own Dockerfile to package the modified fluent-bit into the container and run `sh /opt/fluent-bit/fluent-bit.sh` when the service starts.
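For reference, here is a minimal sketch of what such a startup script could look like; the `entrypoint.sh` name and the java command are illustrative assumptions, not part of dapeng:

```bash
#!/bin/bash
# entrypoint.sh (illustrative): start fluent-bit in the background, then the application.
# Assumes your Dockerfile has copied the modified fluent-bit into /opt/fluent-bit.
sh /opt/fluent-bit/fluent-bit.sh &

# replace with your service's real start command
exec java -jar /app/your-service.jar
```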
fluent-bit log collection configuration
fluent-bit-dapeng.conf
[SERVICE]
Flush 5
Daemon On
Log_Level error
Log_File /fluent-bit/log/fluent-bit.log
Parsers_File parse_dapeng.conf
[INPUT]
Name tail
Path /dapeng-container/logs/*.log
Exclude_Path /dapeng-container/logs/fluent*.log,/dapeng-container/logs/console.log,/dapeng-container/logs/gc*.log
Tag dapeng
Multiline on
Buffer_Chunk_Size 2m
buffer_max_size 30m
Mem_Buf_Limit 32m
DB.Sync Normal
db_count 400
Parser_Firstline dapeng_multiline
db /fluent-bit/db/logs.db
[FILTER]
Name record_modifier
Match *
Record hostname ${soa_container_ip}
Record tag ${serviceName}
[OUTPUT]
Name Forward
Match *
Host fluentd
Port 24224
HostStandby fluentdStandby
PortStandby 24224
In a dapeng service, the `serviceName`, `soa_container_ip`, `fluentd` and `fluentdStandby` settings are required for every service. `Path` and `Exclude_Path` control which logs are collected and which are filtered out, and both can be overridden via environment variables:
fluentBitLogPath=/dapeng-container/logs/*.log
fluentBitLogPathExclude=/dapeng-container/logs/fluent*.log,/dapeng-container/logs/console.log,/dapeng-container/logs/gc*.log
The `fluent-bit-dapeng.conf` above also has to be mounted into the container as `/opt/fluent-bit/etc/fluent-bit.conf`, as in this compose fragment for the payment service:
environment:
  - serviceName=payment
  - container_ip=${host_ip}
  - soa_container_port=${payment_port}
  - soa_container_ip=${host_ip}
  - host_ip=${host_ip}
  - soa_service_timeout=60000
  - JAVA_OPTS=-Dname=payment -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 -Dio.netty.leakDetectionLevel=advanced
  - kafka_consumer_host=${kafka_host_ip}:9092
  - kafka_producer_host=${kafka_host_ip}:9092
env_file:
  - .envs/application.env
  - .envs/common.env
volumes:
  - "/data/logs/payment:/dapeng-container/logs"
  - "/data/var/fluent/order/:/fluent-bit/db/"
  - "./config/fluent-bit-dapeng.conf:/opt/fluent-bit/etc/fluent-bit.conf"
  - "/data/var/shm:/data/shm"
ports:
  - "${payment_port}:${payment_port}"
extra_hosts:
  - "fluentd:${fluentd_host}"
  - "fluentdStandby:${fluentdStandby_host}"
  - "db-master:${mysql_host_ip}"
  - "soa_zookeeper:${zookeeper_host_ip}"
  - "redis_host:${redis_host_ip}"
Inside a dapeng service container you can also see `parse_dapeng.conf`, which looks like this:
[PARSER]
Name dapeng_multiline
Format regex
Regex (?<logtime>\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2} \d{1,3}) (?<threadPool>.*) (?<level>.*) \[(?<sessionTid>.*)\] - (?<message>.*)
The `Regex` here is the regular expression that parses each log line into the fields we need, such as `logtime` and `message`. The parsing expression can also be set via an environment variable:
fluentbitParserRegex=(?<logtime>^\d{2}-\d{2} \d{2}:\d{2}:\d{2} \d{3}) (?<threadPool>[^ ]+|Check idle connection Thread) (?<level>[^ ]+) \[(?<sessionTid>\w*)\] - (?<message>.*)
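Before baking the expression into a container you can sanity-check it against a sample line with a PCRE-capable grep; the log line below is made up and only needs to follow the dapeng log layout:

```bash
# test the first-line regex against a sample dapeng log line (sample content is illustrative)
regex='(?<logtime>^\d{2}-\d{2} \d{2}:\d{2}:\d{2} \d{3}) (?<threadPool>[^ ]+|Check idle connection Thread) (?<level>[^ ]+) \[(?<sessionTid>\w*)\] - (?<message>.*)'
echo '05-10 18:00:00 123 dapeng-eventbus ERROR [ac1203000000] - some business message' \
  | grep -Pq "$regex" && echo matched || echo "no match"
```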
Note: although dapeng integrates fluent-bit, it is disabled by default; set the environment variable `fluent_bit_enable=true` to turn it on.
The fluentd image
First prepare the fluentd image. Here is the fluentd Dockerfile:
FROM fluent/fluentd:v1.2
# add the elasticsearch and kafka plugins
RUN fluent-gem install fluent-plugin-elasticsearch
RUN fluent-gem install fluent-plugin-kafka
CMD exec fluentd -c /fluentd/etc/${FLUENTD_CONF} -p /fluentd/plugins $FLUENTD_OPT
- Build the image (note: run this in the directory containing the Dockerfile; the trailing `.` is the build context): `docker build -t docker.****.com:80/basic/fluentd:v1.2 .`
- Push it to the private docker registry (a quick sanity check of the image is sketched after the compose snippet below): `docker push docker.****.com:80/basic/fluentd:v1.2`
- Configure fluentd in the dc-all.yml file (dapeng's `source-compose` is a wrapper around `docker-compose`):
fluentd:
  container_name: fluentd
  image: docker.****.com:80/basic/fluentd:v1.2
  restart: on-failure:3
  volumes:
    - /data/var/fluentd/log:/fluentd/log
    - /data/var/fluentd/etc:/fluentd/etc
  environment:
    - LANG=zh_CN.UTF-8
    - TZ=CST-8
  ports:
    - "24224:24224"
  labels:
    - project.source=
    - project.extra=public-image
    - project.depends=
    - project.owner=
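Before moving on, it may be worth a quick check that both plugins actually ended up in the image (tag as built above):

```bash
# list the gems baked into the fluentd image; both plugins should show up
docker run --rm --entrypoint fluent-gem docker.****.com:80/basic/fluentd:v1.2 list | grep fluent-plugin
```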
The fluentd configuration lives under /data/var/fluentd/etc. `fluent.conf` configures the fluentd forwarder.
In theory two fluentd instances should be run, one for each of jobs 1 and 2 below; here we combine them into a single service for now.
# 1. Collect logs and send them to kafka, topic efk
# start 8 workers; the monitor ports count up from 24225, one per worker
<system>
log_level error
flush_thread_count 8
workers 8
</system>
<source>
@type forward
port 24224
</source>
<source>
@type monitor_agent
port 24225
</source>
<match dapeng>
@type kafka_buffered
brokers kafka-server-address:9092
topic_key efk
buffer_type file
buffer_path /tmp/buffer
flush_interval 5s
default_topic efk
output_data_type json
compression_codec gzip
max_send_retries 3
required_acks -1
discard_kafka_delivery_failed true
</match>
# end of 1. collect logs and send them to kafka (topic efk)
# 2. Consume log messages from kafka (topic efk, consumer group efk-consumer) and send them to elasticsearch
#<system>
# log_level error
# flush_thread_count 2
# workers 2
#</system>
#<source>
# @type monitor_agent
# port 24225
#</source>
<source>
@type kafka_group
brokers kafka-server-address:9092
consumer_group efk-consumer
topics efk
format json
start_from_beginning false
max_wait_time 5
max_bytes 1500000
</source>
<match>
@type elasticsearch
hosts elasticsearch-server-address:9200
index_name dapeng_log_index
type_name dapeng_log
#content_type application/x-ndjson
buffer_type file
buffer_path /tmp/buffer_file
buffer_chunk_limit 10m
buffer_queue_limit 512
flush_mode interval
flush_interval 5s
request_timeout 5s
flush_thread_count 2
reload_on_failure true
resurrect_after 30s
reconnect_on_error true
with_transporter_log true
logstash_format true
logstash_prefix dapeng_log_index
template_name dapeng_log_index
template_file /fluentd/etc/template.json
num_threads 2
utc_index false
</match>
# end of 2. consume log messages from kafka and send them to elasticsearch
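With monitor_agent enabled on port 24225, you can ask fluentd itself whether the forward input and the kafka/elasticsearch outputs are running and whether their buffers are backing up (hostname is a placeholder):

```bash
# monitor_agent exposes plugin status as JSON; with 8 workers the ports count up from 24225
curl -s http://fluentd-server-address:24225/api/plugins.json
```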
`template.json` defines the elasticsearch index template used when the indices are created:
{
"template": "dapeng_log_index-*",
"mappings": {
"dapeng_log": {
"properties": {
"logtime": {
"type": "date",
"format": "MM-dd HH:mm:ss SSS"
},
"threadPool": {
"type": "keyword",
"norms": false,
"index_options": "docs"
},
"level": {
"type": "keyword",
"norms": false,
"index_options": "docs"
},
"tag": {
"type": "keyword",
"norms": false,
"index_options": "docs"
},
"message": {
"type": "keyword",
"ignore_above": 2048,
"norms": false,
"index_options": "docs"
},
"hostname": {
"type": "keyword",
"norms": false,
"index_options": "docs"
},
"sessionTid": {
"type": "keyword",
"norms": false,
"index_options": "docs"
},
"log": {
"type": "keyword",
"norms": false,
"index_options": "docs"
}
}
}
},
"settings": {
"index": {
"max_result_window": "100000000",
"number_of_shards": "3",
"number_of_replicas": "1",
"codec": "best_compression",
"translog": {
"sync_interval": "60s",
"durability": "async",
"flush_threshold_size": "1024mb"
},
"merge":{
"policy":{
"max_merged_segment": "2gb"
}
},
"refresh_interval": "10s"
}
},
"warmers": {}
}
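Once fluentd has started with `template_file` pointing at this file, it should have created the template in elasticsearch; this can be checked with (ES 6.x API, host placeholder as above):

```bash
# show the installed index template as elasticsearch sees it
curl -s "http://elasticsearch-server-address:9200/_template/dapeng_log_index?pretty"
```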
Preparing the elasticsearch image
- elasticsearch configuration in the dc-all.yml file:
elasticsearch:
  image: elasticsearch:6.7.1
  container_name: elasticsearch
  restart: on-failure:3
  environment:
    - LANG=zh_CN.UTF-8
    - TZ=CST-8
    - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
  volumes:
    - /data/var/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
  ports:
    - "9200:9200"
    - "9300:9300"
  labels:
    - project.source=
    - project.extra=public-image
    - project.depends=
    - project.owner=
`elasticsearch.yml` enables CORS so that elasticsearch-head can access elasticsearch:
cluster.name: "docker-cluster"
network.host: 0.0.0.0
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-methods: OPTIONS, HEAD, GET, POST, PUT, DELETE
http.cors.allow-headers: "X-Requested-With, Content-Type, Content-Length, X-User"
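A quick way to confirm the CORS settings took effect is to send a request with an Origin header and look for the Access-Control-Allow-Origin response header (addresses are placeholders):

```bash
# elasticsearch should return CORS headers when a request carries an Origin header
curl -s -i -H "Origin: http://elasticsearch-head-server-address:9100" \
  http://elasticsearch-server-address:9200/ | grep -i access-control
```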
If elasticsearch fails to start with:
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
it means the kernel's vm.max_map_count limit on the host is too low for elasticsearch; it must be at least 262144.
Fix: run `sudo vi /etc/sysctl.conf`, append the line `vm.max_map_count=262144`, then run `sudo sysctl -p` to reload the settings and restart elasticsearch.
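The same fix as a copy-pasteable sketch, to be run on the docker host:

```bash
sysctl vm.max_map_count                                # check the current limit (often 65530)
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p                                         # reload, then restart the elasticsearch container
```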
Preparing the elasticsearch-head image
- First clone the elasticsearch-head project into the /data/workspace directory:
git clone git://github.com/mobz/elasticsearch-head.git
- Configure elasticsearch-head in the dc-all.yml file:
elasticsearch-head:
  image: mobz/elasticsearch-head:5
  container_name: elasticsearch-head
  restart: on-failure:3
  environment:
    - LANG=zh_CN.UTF-8
    - TZ=CST-8
  volumes:
    - /data/workspace/elasticsearch-head/Gruntfile.js:/usr/src/app/Gruntfile.js
    - /data/workspace/elasticsearch-head/_site/app.js:/usr/src/app/_site/app.js
  ports:
    - "9100:9100"
  labels:
    - project.source=
    - project.extra=public-image
    - project.depends=
    - project.owner=
In `Gruntfile.js`, line 97 needs to be changed as follows:
connect: {
server: {
options: {
hostname: '0.0.0.0',
port: 9100,
base: '.',
keepalive: true
}
}
}
In `app.js`, line 4379 needs to be changed: replace localhost with the elasticsearch cluster address.
/** replace localhost with the elasticsearch cluster address; in a Docker deployment this is usually the elasticsearch host machine's address */
this.base_uri = this.config.base_uri || this.prefs.get("app-base_uri") || "http://elasticsearch-server-address:9200/";
Starting the services
Once all of the services above are up, visit http://elasticsearch-head-server-address:9100/ and you should see the elasticsearch-head interface (the cluster health is yellow because I did not configure replica copies).
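The yellow status can be confirmed from the command line as well: with `number_of_replicas` set to 1 and only one node, the replica shards stay unassigned, which is exactly what yellow means (host placeholder):

```bash
# yellow = all primary shards assigned, some replica shards unassigned
curl -s "http://elasticsearch-server-address:9200/_cluster/health?pretty"
```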
Of course, no logs are visible at first because no index has been created yet. We can add a scheduled task that creates each day's index automatically and cleans up old ones. `autoIndex4DapengLog.sh` keeps seven days of indices, leaves the most recent three days' indices open, and creates the index for the next day:
#!/bin/bash
#
# Close and delete old indices
# @date 2018-05-10 18:00:00
# @description Copyright (c) 2015, github.com/dapeng-soa All Rights Reserved.
date=`date -d "2 days ago" +%Y.%m.%d`
date1=`date -d "6 days ago" +%Y.%m.%d`
echo $date
echo $date1
# close the index from two days ago
curl -H "Content-Type: application/json" -XPOST http://elasticsearch-server-address:9200/dapeng_log_index-$date/_close
# delete the index from six days ago
curl -H "Content-Type: application/json" -XDELETE "http://elasticsearch-server-address:9200/dapeng_log_index-$date1"
# create tomorrow's index
tomorrow=`date -d tomorrow +%Y.%m.%d`
# list of elasticsearch servers on which to create the index
ipList=(elasticsearch-server-address:9200)
for i in ${ipList[@]};do
curl -H "Content-Type: application/json" -XPUT http://$i/dapeng_log_index-$tomorrow -d'
{
"mappings": {
"_default_": {
"_all": {
"enabled": "false"
}
},
"dapeng_log": {
"properties": {
"logtime": {
"type": "date",
"format": "MM-dd HH:mm:ss SSS"
},
"threadPool": {
"type": "keyword",
"norms": false,
"index_options": "docs"
},
"level": {
"type": "keyword",
"norms": false,
"index_options": "docs"
},
"tag": {
"type": "keyword",
"norms": false,
"index_options": "docs"
},
"message": {
"type": "keyword",
"ignore_above": 2048,
"norms": false,
"index_options": "docs"
},
"hostname": {
"type": "keyword",
"norms": false,
"index_options": "docs"
},
"sessionTid": {
"type": "keyword",
"norms": false,
"index_options": "docs"
},
"log": {
"type": "keyword",
"norms": false,
"index_options": "docs"
}
}
}
},
"settings": {
"index": {
"max_result_window": "100000000",
"number_of_shards": "3",
"number_of_replicas": "1",
"codec": "best_compression",
"translog": {
"sync_interval": "60s",
"durability": "async",
"flush_threshold_size": "1024mb"
},
"merge":{
"policy":{
"max_merged_segment": "2gb"
}
},
"refresh_interval": "10s"
}
},
"warmers": {}
}'
response=`curl -H "Content-Type: application/json" -s "http://$i/_cat/indices?v" |grep open | grep dapeng_log_index-$tomorrow |wc -l`
echo -e "\n"
if [ "$response" == 1 ];then
break
else
continue
fi
done;
Add the following entry with `crontab -e` so the script runs at 23:00 every day and creates the next day's index:
0 23 * * * (cd /data/workspace/elasticsearch-head/; sh autoIndex4DapengLog.sh) > /data/workspace/elasticsearch-head/autoIndex4DapengLog.log
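After the job has run you can check which daily indices exist and whether they are open or closed (host placeholder):

```bash
# list all dapeng_log_index-* indices with status, doc count and size
curl -s "http://elasticsearch-server-address:9200/_cat/indices/dapeng_log_index-*?v"
```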
Now you can view the log data in elasticsearch-head.
If you want to keep elasticsearch's built-in fields (such as _index, _id, _score) out of the table display, modify elasticsearch-head/_site/app.js again, changing line 2038 as follows:
_data_handler: function(store) {
// strip the unneeded fields from the result set
var customFields = ["logtime", "hostname", "tag", "sessionTid", "threadPool", "level", "message", "log"];
store.columns = customFields;
//store.columns = store.columns.filter(i => customFields.indexOf(i) > -1);
this.tools.text(store.summary);
this.headers.empty().append(this._header_template(store.columns));
this.body.empty().append(this._body_template(store.data, store.columns));
this._reflow();
},
Note that the fields in `customFields` must match the fields defined when the index was created, and some of them are only populated by fluent-bit's parsing.
TODO
- Some records show up with empty fields. Copying the value of their log field into https://regex101.com/ shows that those lines do not match the parsing Regex above, so the fields were never extracted and the unparsed content remains in log; such entries need to be filtered out later (a query sketch for finding them follows below).
- Build a real-time production-failure alerting system on top of the current logging system.
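As a starting point for that cleanup, the affected documents can be found by asking elasticsearch for entries where a parsed field, `logtime` here, is missing; this is only a sketch against the daily index pattern (host placeholder):

```bash
# count documents where the parser failed to extract logtime (ES 6.x query DSL)
curl -s -H "Content-Type: application/json" \
  "http://elasticsearch-server-address:9200/dapeng_log_index-*/_count" -d '
{
  "query": {
    "bool": {
      "must_not": { "exists": { "field": "logtime" } }
    }
  }
}'
```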