Deploy elasticsearch and kibana

Since this is just a demo, we'll simply run everything with docker-compose:

 version: "3"
services:
  elk-elasticsearch:
    container_name: elk-elasticsearch
    image: elasticsearch:7.17.1
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g" # 限制 es 的内存大小,不然会吃掉 10GB+ 的 RAM


  elk-kibana:
    container_name: elk-kibana
    image: kibana:7.17.1
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://192.168.31.245:9200
    volumes:
      - ./kibana.yml:/usr/share/kibana/config/kibana.yml # kibana 其实是支持中文的,只要在 /usr/share/kibana/config/kibana.yml 加一行 i18n.locale: "zh-CN"
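Then bring the stack up in the background (standard docker-compose usage):

 docker-compose up -d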

The contents of ./kibana.yml, mainly to switch the UI to Chinese:

 #
# ** THIS IS AN AUTO-GENERATED FILE **
#

# Default Kibana configuration for docker target
server.host: "0.0.0.0"
server.shutdownTimeout: "5s"
elasticsearch.hosts: ["http://192.168.31.174:9200"]
monitoring.ui.container.elasticsearch.enabled: true
i18n.locale: "zh-CN"

The versions of elasticsearch and kibana must match. Likewise, the filebeat we install later must be the same version!

This time we use 7.17.1 for everything.

If you connect to elasticsearch with a tool such as dbeaver and run into current license is non-compliant for [jdbc], you can refer to: current license is non-compliant for jdbc solution!

Create an index

At the beginning, I referred to this tutorial: EFK builds a simple log analysis system

I have to say that this tutorial is very badly written, but even so, it ranks near the top of Google's search results.


After deploying elasticsearch and kibana, we need to create an index in elasticsearch.

Why do we need to create an index?

In elasticsearch, an index is the equivalent of a table in mysql (elasticsearch has no concept of a db). The main thing to note is that when creating an elasticsearch index, a name alone is enough; there is no need to define a schema.
We need indexes as storage containers for logs; different indexes store the logs of different projects or different businesses. You can't keep all logs in the same index!

How to create an index? There are many ways, such as:

  • Connect to elasticsearch through dbeaver, then create the index from dbeaver
  • Use kibana to connect to elasticsearch, then create the index from kibana
  • Use the elasticsearch client sdk provided for python, java, or another programming language to connect directly to elasticsearch and create the index

Use Python to create an elasticsearch index

I chose python, since it's the one I'm most familiar with.

The version of python's elasticsearch client sdk should also match the server version!

 pip install elasticsearch==7.17.1

For how to use the elasticsearch client sdk to operate elasticsearch, you can refer to this article: Basic introduction to Elasticsearch and its implementation with Python (the elasticsearch client sdk in that article is fairly old, so some api parameter names have changed).

Here are some reference codes:

Create the index, which I'll call ideaboom here:

 from elasticsearch import Elasticsearch


es = Elasticsearch("http://192.168.31.245:9200")
result = es.indices.create(index='ideaboom')
print(result)
This is very simple. In practice you would add some parameters, such as the index's "number of replicas" and "number of shards". For example, if your elasticsearch cluster has 3 nodes, the "number of shards" should be a multiple of 3, so that the cluster's resources are put to better use!
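
For completeness, here is a minimal sketch of passing those settings from python (an assumption based on the 7.x client, where index settings go in the body parameter; the shard and replica counts are just example values):

 from elasticsearch import Elasticsearch


es = Elasticsearch("http://192.168.31.245:9200")
# create the index with explicit shard/replica counts instead of the defaults
result = es.indices.create(
    index="ideaboom",
    body={
        "settings": {
            "index": {
                "number_of_shards": 3,
                "number_of_replicas": 1,
            }
        }
    },
)
print(result)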

Create an elasticsearch index using Kibana

(The python sdk can in fact also specify the "number of shards" and "number of replicas", as sketched above, but here is a Kibana version as well.)

In Kibana's Dev Tools console, run:

 PUT crawlab
{
    "settings" : {
        "index" : {
            "number_of_shards" : 9, 
            "number_of_replicas" : 1 
        }
    }
}

Reference: ElasticSearch setting the number of shards and replicas

Use filebeat on bare metal to collect log files and send them to elasticsearch

Install filebeat first

Reference: Filebeat quick start: installation and configuration

Install filebeat 7.17.1 on a debian system

download:

 wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.17.1-amd64.deb
Super fast, no proxy needed

Install:

 sudo apt install ./filebeat-7.17.1-amd64.deb
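You can confirm that the right version landed with filebeat's version subcommand:

 filebeat version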

Use filebeat to collect data and send it to elasticsearch

For the syntax of filebeat, you can refer to: filebeat outputs results to multiple indexes of elasticsearch

My filebeat.yml content is as follows:

 filebeat.inputs:
  - type: log
    # Change to true to enable this input configuration.
    enabled: true
    # Paths that should be crawled and fetched. Glob based paths.
    paths:
      # actual path of the log file
      - /home/bot/Desktop/coder/ideaboom/test_ELK_EFK/logs/run.log
    fields:
      # log tag used to tell different logs apart; referenced below when routing to indexes
      type: "ideaboom"
    fields_under_root: true
    # encoding of the monitored file; both plain and utf-8 can handle Chinese logs
    encoding: utf-8

# setup.kibana:
#   host: "192.168.31.245:5601"
setup.ilm.enabled: false

output.elasticsearch:
  hosts: ["192.168.31.245:9200"]
  indices:
    # index name, typically 'service name + ip + --%{+yyyy.MM.dd}'
    - index: "ideaboom-%{+yyyy.MM.dd}"
      # tag matching the fields.type set above, so these logs go to this index
      when.contains:
        type: "ideaboom"
We already created an index named ideaboom in elasticsearch earlier. Note that with the date suffix in this config, filebeat actually writes to daily indexes named like ideaboom-2022.07.08.

What is collected is the run.log file, into which I put a few lines:

 2022-07-08 02:39:39,746 - 13651 - logger - INFO - total [1] urls, max_id [69013414]
2022-07-08 02:39:39,746 - 13651 - logger - INFO - total [1] urls, max_id [69013415]
2022-07-08 02:39:39,746 - 13651 - logger - INFO - total [1] urls, max_id [69013414]
2022-07-08 02:39:39,746 - 13651 - logger - INFO - total [1] urls, max_id [69013415]
2022-07-08 02:39:39,746 - 13651 - logger - INFO - total [1] urls, max_id [69013414]
2022-07-08 02:39:39,746 - 13651 - logger - INFO - total [1] urls, max_id [69013415]
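
These lines look like output from python's logging module; purely as an illustration (an assumption, not the project's actual code), a logger configured like this produces lines of that shape:

 import logging

# format: time - pid - logger name - level - message, matching run.log above
logging.basicConfig(
    filename="run.log",
    format="%(asctime)s - %(process)d - %(name)s - %(levelname)s - %(message)s",
    level=logging.INFO,
)
logging.getLogger("logger").info("total [%s] urls, max_id [%s]", 1, 69013414)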

As with nginx, we need to copy our yml over filebeat's default configuration file path:

 sudo cp /home/bot/Desktop/coder/ideaboom/test_ELK_EFK/filebeat.yml /etc/filebeat/filebeat.yml
For the default configuration file path, you can refer to: Why is the -c parameter used, does filebeat still load filebeat.yml in etc?

run filebeat

 sudo filebeat

Note that there are many pitfalls in running filebeat. In the name of data security, filebeat requires root to run; if it is not run as root, then the owner of the relevant configuration files must match the user running filebeat, and so on. Otherwise, you will encounter errors like Exiting: error loading config file: open /etc/filebeat/filebeat.yml: permission denied on ubuntu/debian!
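
If you hit the ownership variant of this error, the usual fix (assuming the standard Beats rule that the config file must be owned by root or the running user, and not be writable by group or others) is:

 sudo chown root:root /etc/filebeat/filebeat.yml
sudo chmod go-w /etc/filebeat/filebeat.yml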

Another point to pay attention to: filebeat normally produces no output, that is, it will not tell you whether logs were collected, or whether they were sent to elasticsearch! (There may be a way to find out, but I didn't know one.) This is a real pain; in order to know whether filebeat was working at all, I even opened wireshark to capture packets!
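
One tip that helps here: filebeat does log its own activity to stderr when started in the foreground with the -e flag, so you can watch it pick up files and publish events:

 sudo filebeat -e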

View logs in elasticsearch

Suppose that, at this point, your filebeat has successfully collected the logs and sent them to elasticsearch.

We can view them in dbeaver (if you don't have dbeaver installed, skip this step).

As you can see, there are already 4 logs in the index named ideaboom
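
If you don't have dbeaver, a minimal sketch with the python client gives the same count (assuming the same host as above; the ideaboom* wildcard covers both the bare index and filebeat's dated ones):

 from elasticsearch import Elasticsearch


es = Elasticsearch("http://192.168.31.245:9200")
# count documents across every index whose name starts with ideaboom
print(es.count(index="ideaboom*")["count"])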

View logs in kibana

Next is the main event

I opened it in the browser: http://192.168.31.245:5601

First click the three bars in the upper left corner, then click Stack Management under Management.

Then select "Index Patterns", and then "Create index pattern".

Select the ideaboom created earlier

Since I had already done this once before, it shows as already created here!

Click on the three bars in the upper left corner and select "Discover"

Choose the index pattern!

Select the appropriate time frame and click "Update"

We can see the logs!

You can also search for specific keywords in the search box

Elasticsearch tokenizes the log text for us and builds an inverted index, which is what makes these keyword searches fast.
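
The same kind of keyword search works from the python client too; a minimal sketch (assuming filebeat's default behaviour of putting the raw log line in the message field):

 from elasticsearch import Elasticsearch


es = Elasticsearch("http://192.168.31.245:9200")
# full-text match on the message field; elasticsearch returns the
# log lines containing the keyword, ranked by relevance
result = es.search(
    index="ideaboom*",
    body={"query": {"match": {"message": "max_id"}}},
)
for hit in result["hits"]["hits"]:
    print(hit["_source"]["message"])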
