Using filebeat on bare metal to collect log files on the host and ship them to elasticsearch

Deploying elasticsearch and kibana

Since this is only a demo, I just run both with docker-compose

version: "3"
services:
  elk-elasticsearch:
    container_name: elk-elasticsearch
    image: elasticsearch:7.17.1
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g" # cap the es heap size, otherwise it can eat 10GB+ of RAM


  elk-kibana:
    container_name: elk-kibana
    image: kibana:7.17.1
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://192.168.31.245:9200
    volumes:
      - ./kibana.yml:/usr/share/kibana/config/kibana.yml # kibana does support Chinese: just add the line i18n.locale: "zh-CN" to /usr/share/kibana/config/kibana.yml

Contents of ./kibana.yml; the main purpose is to make the UI display in Chinese

#
# ** THIS IS AN AUTO-GENERATED FILE **
#

# Default Kibana configuration for docker target
server.host: "0.0.0.0"
server.shutdownTimeout: "5s"
elasticsearch.hosts: ["http://192.168.31.174:9200"]
monitoring.ui.container.elasticsearch.enabled: true
i18n.locale: "zh-CN"

The elasticsearch and kibana versions must match. Likewise, the filebeat installed later must be the same version too!

This time we use 7.17.1 for everything
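A version mismatch is easy to miss, so a quick check can help. The following is a minimal sketch, assuming elasticsearch answers on `/` and kibana on `/api/status` with a JSON body that contains `version.number` (this should hold for the 7.x images, but treat it as an assumption). `fetch_version` and `mismatched` are hypothetical helper names, not part of any SDK.

```python
import json
from urllib.request import urlopen


def fetch_version(url: str, timeout: float = 5.0) -> str:
    """Read version.number from a JSON endpoint (assumed response layout)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["version"]["number"]


def mismatched(versions: dict) -> dict:
    """Return the components whose version differs from the first one listed."""
    expected = next(iter(versions.values()))
    return {name: ver for name, ver in versions.items() if ver != expected}


def check_stack(es_url: str, kibana_url: str) -> dict:
    """Fetch both versions and report any component that is out of step."""
    return mismatched({
        "elasticsearch": fetch_version(es_url),
        "kibana": fetch_version(kibana_url),
    })
```

With the addresses used in this post, the call would be `check_stack("http://192.168.31.245:9200/", "http://192.168.31.245:5601/api/status")`; an empty dict means the versions agree.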

If you need to connect to elasticsearch with a tool such as dbeaver and run into current license is non-compliant for [jdbc], see: "current license is non-compliant for jdbc: solved!"

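Creating an index

At first I followed this tutorial: "Building a simple log analysis system with EFK"

I have to say that tutorial is very poorly written and leaves out a lot, yet it still ranks near the top of Google's search results

Once elasticsearch and kibana are deployed, we need to create an index in elasticsearch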

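Why do we need to create an index?
An index in elasticsearch is roughly what a table is in mysql (elasticsearch has no concept of a db). The main thing to note is that when you create an elasticsearch index, an index name is all you need; you do not have to define a schema.
We need indices as the containers that store logs, with different indices holding the logs of different projects or different lines of business. You can't just dump every log into the same index!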

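How do you create an index? There are many ways, for example:

Connect to elasticsearch with dbeaver, then create the index from dbeaver
Connect to elasticsearch with kibana, then create the index from kibana
Connect directly to elasticsearch with the elasticsearch client sdk for python, java, or another language, and create the index from code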

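Creating an elasticsearch index with Python

I chose python because it is what I know best

The version of the python elasticsearch client sdk must match as well!
pip install elasticsearch==7.17.1
For how to operate elasticsearch through the client sdk, see this article: "A basic introduction to Elasticsearch and how to use it from Python" (the elasticsearch client sdk in that article is fairly old, so some api parameter names have since changed)

Here is some reference code:

Create the index; I'll simply call it ideaboom here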

from elasticsearch import Elasticsearch


es = Elasticsearch("http://192.168.31.245:9200")
result = es.indices.create(index='ideaboom')
print(result)

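This is the bare minimum. In practice you will also want to pass some parameters, such as the index's number of replicas and number of shards. For example, if your elasticsearch cluster has 3 nodes, it is reasonable to make the number of shards a multiple of 3, so the shards can be spread evenly and the cluster's resources are put to better use!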
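As a hedged illustration of passing those parameters from python: the sketch below assumes the 7.x client, where `es.indices.create` accepts the request body through `body=` (newer releases may print a deprecation warning and prefer keyword arguments). `build_index_body` and `create_index` are hypothetical helper names, and the shard and replica numbers are only examples.

```python
def build_index_body(shards: int, replicas: int) -> dict:
    """Build the settings body for the create-index request."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


def create_index(es_url: str, name: str, shards: int, replicas: int) -> dict:
    """Create `name` with explicit shard and replica counts."""
    # Imported here so the body builder can be used without the client installed.
    from elasticsearch import Elasticsearch  # pip install elasticsearch==7.17.1

    es = Elasticsearch(es_url)
    # Assumption: 7.x client, request body passed via `body=`.
    return es.indices.create(index=name, body=build_index_body(shards, replicas))
```

For the single-node demo in this post, something like `create_index("http://192.168.31.245:9200", "ideaboom", 1, 0)` would be a sensible starting point, since replicas cannot be allocated on a one-node cluster.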

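Creating an elasticsearch index with Kibana

At the time I thought an index created through the python sdk could not specify the number of shards and replicas (it actually can, by passing a settings body to es.indices.create), so here is a Kibana version as well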


PUT crawlab
{
    "settings" : {
        "index" : {
            "number_of_shards" : 9, 
            "number_of_replicas" : 1 
        }
    }
}

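Reference: "Setting the number of shards and replicas in ElasticSearch"

Using filebeat on bare metal to collect log files on the host and ship them to elasticsearch

Install filebeat first

Reference: Filebeat quick start: installation and configuration

Installing filebeat 7.17.1 on Debian-based systems

Download: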

wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.17.1-amd64.deb

The download is very fast; no proxy needed

Install:

sudo apt install ./filebeat-7.17.1-amd64.deb

Collecting data with filebeat and sending it to elasticsearch

For filebeat's configuration syntax, see: "Sending filebeat output to multiple elasticsearch indices"

My filebeat.yml looks like this:

filebeat.inputs:
  - type: log
    # Change to true to enable this input configuration.
    enabled: true
    # Paths that should be crawled and fetched. Glob based paths.
    paths:
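      # Actual path of the log file
      - /home/bot/Desktop/coder/ideaboom/test_ELK_EFK/logs/run.log
    fields:
      # Log tag that tells different logs apart; used below when choosing the index
      type: "ideaboom"
    fields_under_root: true
    # Encoding of the monitored file; both plain and utf-8 can handle Chinese logs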
    encoding: utf-8

# setup.kibana:
#   host: "192.168.31.245:5601"
setup.ilm.enabled: false

output.elasticsearch:
  hosts: ["192.168.31.245:9200"]
  indices:
       # Index name, usually 'service name + ip + --%{+yyyy.MM.dd}'.
    - index: "ideaboom-%{+yyyy.MM.dd}"
      when.contains:
      # Tag that ties the log to the index; must match the one above
        type: "ideaboom"

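Earlier we created an index named ideaboom in elasticsearch. Note that with the config above filebeat actually writes to daily indices named ideaboom-<date>, which elasticsearch creates automatically on first write by default.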
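To make the `ideaboom-%{+yyyy.MM.dd}` pattern concrete, here is a small sketch of how the index name would expand for a given event time. It assumes the date comes from the event's timestamp in UTC, which is my understanding of filebeat's behaviour; `daily_index_name` is a hypothetical helper, not a filebeat or elasticsearch API.

```python
from datetime import datetime, timezone


def daily_index_name(prefix: str, ts: datetime) -> str:
    """Expand '<prefix>-%{+yyyy.MM.dd}' for a timezone-aware timestamp."""
    # Assumption: the date part is computed in UTC.
    return f"{prefix}-{ts.astimezone(timezone.utc):%Y.%m.%d}"


event_time = datetime(2022, 7, 8, 2, 39, 39, tzinfo=timezone.utc)
print(daily_index_name("ideaboom", event_time))  # ideaboom-2022.07.08
```

So when looking for the shipped logs in kibana or through the sdk, an index pattern such as `ideaboom-*` is the one to search.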

The file being collected is run.log, so I put some arbitrary content into it:

2022-07-08 02:39:39,746 - 13651 - logger - INFO - total [1] urls, max_id [69013414]
2022-07-08 02:39:39,746 - 13651 - logger - INFO - total [1] urls, max_id [69013415]
2022-07-08 02:39:39,746 - 13651 - logger - INFO - total [1] urls, max_id [69013414]
2022-07-08 02:39:39,746 - 13651 - logger - INFO - total [1] urls, max_id [69013415]
2022-07-08 02:39:39,746 - 13651 - logger - INFO - total [1] urls, max_id [69013414]
2022-07-08 02:39:39,746 - 13651 - logger - INFO - total [1] urls, max_id [69013415]
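Because filebeat ships lines as they are appended, it can be handy to generate fresh lines on demand while testing. The sketch below appends lines shaped like the sample above using python's logging module. The format string is my guess at what produced the sample, not something taken from the original project, and `write_sample_lines` is a hypothetical helper.

```python
import logging

# Assumed format; it reproduces lines shaped like the sample above.
LOG_FORMAT = "%(asctime)s - %(process)d - %(name)s - %(levelname)s - %(message)s"


def write_sample_lines(path: str, start_id: int, count: int) -> None:
    """Append `count` INFO lines to `path`, one per max_id value."""
    logger = logging.getLogger("logger")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path, encoding="utf-8")  # opens in append mode
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    try:
        for offset in range(count):
            logger.info("total [1] urls, max_id [%d]", start_id + offset)
    finally:
        # Detach and close so repeated calls do not duplicate lines.
        logger.removeHandler(handler)
        handler.close()
```

Calling `write_sample_lines("logs/run.log", 69013414, 6)` from the project directory would append six new lines for filebeat to pick up.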

As with nginx, we need to put our own yml at filebeat's default config file path

sudo cp /home/bot/Desktop/coder/ideaboom/test_ELK_EFK/filebeat.yml /etc/filebeat/filebeat.yml

For more on the default config file path, see: "Why does filebeat still load the filebeat.yml in etc even though I passed the -c option?"

Running filebeat

sudo filebeat

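Note that running filebeat has plenty of pitfalls. For the sake of so-called data security, filebeat is strict about who owns its config files: the simplest option is to run it as root; if you run it as another user, the owner of the config files must be that same user, and so on. Otherwise you will hit an error like Exiting: error loading config file: open /etc/filebeat/filebeat.yml: permission denied (seen on ubuntu and debian)!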
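The ownership rule can be checked before starting filebeat. This is a rough sketch of the checks as I understand them (the config must be owned by the running user or by root, and must not be writable by group or others); it is not filebeat's own code, and `config_problems` is a hypothetical helper.

```python
import os
import stat


def config_problems(path: str, run_uid: int) -> list:
    """Return human-readable problems that would likely make filebeat refuse `path`."""
    st = os.stat(path)
    problems = []
    # Assumed rule 1: owner must be the user running filebeat, or root (uid 0).
    if st.st_uid not in (run_uid, 0):
        problems.append(f"owner uid {st.st_uid} is neither {run_uid} nor root")
    # Assumed rule 2: only the owner may have write permission.
    if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
        problems.append("file is writable by group or others (try chmod go-w)")
    return problems
```

For example, `config_problems("/etc/filebeat/filebeat.yml", os.getuid())` returning an empty list suggests the file should pass for the current user; reading the file at all still requires read permission.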
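One more thing deserves special attention: run this way, filebeat prints nothing. Whether or not it has collected any logs, and whether or not it has sent them to elasticsearch, it will not tell you! (There may be a way to find out, but I don't know how.) This is a real trap; to find out whether filebeat was working properly, I even fired up wireshark to capture packets!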
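A lighter alternative to packet capture is to ask elasticsearch how many documents have arrived. The sketch below assumes a 7.x `elasticsearch.Elasticsearch` client is passed in as `es` and uses its `count` API against the `ideaboom-*` pattern; `count_shipped` and `wait_for_logs` are hypothetical helper names. Separately, starting filebeat with the `-e` flag makes it log to stderr, which also shows what it is doing.

```python
import time


def count_shipped(es, pattern: str = "ideaboom-*") -> int:
    """Return the number of documents in indices matching `pattern`."""
    return es.count(index=pattern)["count"]


def wait_for_logs(es, pattern: str = "ideaboom-*", expected: int = 1,
                  attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll until at least `expected` documents exist, or give up."""
    for _ in range(attempts):
        if count_shipped(es, pattern) >= expected:
            return True
        time.sleep(delay)
    return False
```

Usage would look like `wait_for_logs(Elasticsearch("http://192.168.31.245:9200"), expected=6)` after appending six lines to run.log; a `False` result means nothing arrived within the polling window.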
