Video tutorial

A summary of Loki's advantages

1. Low index overhead

  • The biggest difference between Loki and Elasticsearch is that Loki only indexes labels, not the log content.
  • This greatly reduces index resource overhead (Elasticsearch has to carry its huge index at all times, whether you query or not).

2. Concurrent queries + caching

  • To compensate for the query slowdown caused by the lack of a full-text index, Loki breaks a query into smaller fragments and runs them in parallel, which can be understood as a concurrent grep.
  • At the same time it supports index, chunk and result caches to speed things up.

3. Uses the same labels as Prometheus and connects to Alertmanager (see the rule sketch after this list)

  • Label consistency between Loki and Prometheus is one of Loki's superpowers.

4. Uses Grafana as the front end, so you avoid switching back and forth between Kibana and Grafana
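
Point 3 deserves a quick illustration: Loki's ruler evaluates LogQL expressions and sends alerts to Alertmanager the same way Prometheus alerting rules do. A minimal rule sketch (the job label value and the threshold are assumptions for illustration):

groups:
  - name: loki-alerts
    rules:
      - alert: HighErrorLogRate
        # fire when the error-line rate stays above 10 lines/s for 5 minutes
        expr: sum(rate({job="varlogs"} |= "error" [5m])) > 10
        for: 5m
        labels:
          severity: warning

With the local ruler storage configured later in this article, such a file would live under /opt/app/loki/rules/fake/ ("fake" is the tenant id Loki uses when auth_enabled is false).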

Architecture description


Component description

Promtail is the collector, analogous to Filebeat.

Loki is the server side, analogous to Elasticsearch.

The loki process contains four roles:
  • querier - query
  • ingester - log storage
  • query-frontend - query frontend
  • distributor - write distribution
You can choose which role(s) to run through the -target parameter of the loki binary; a minimal example follows.
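
A minimal sketch of picking a role with -target (the binary and config paths below are the ones used later in this article):

/opt/app/loki/loki -config.file=/opt/app/loki/loki.yaml -target=querier   # run only the querier
/opt/app/loki/loki -config.file=/opt/app/loki/loki.yaml -target=all       # default: every role in one process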

Read path

  • The querier receives an HTTP/1 data request.
  • The querier passes the query to all ingesters to ask for the data held in memory.
  • The ingesters receive the read request and return the data matching the query (if any).
  • If no ingester returns data, the querier lazily loads the data from the backing store and runs the query against it.
  • The querier iterates over all received data, deduplicates it, and returns the final result set over the HTTP/1 connection.
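
For reference, a read can be exercised directly against Loki's HTTP API once it is running; a sketch using the query_range endpoint (the label value and filter string are just examples):

curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job="varlogs"} |= "error"' \
  --data-urlencode 'limit=10'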

Write path

  • The distributor receives an HTTP/1 request to store data for streams.
  • Each stream is hashed using the hash ring.
  • The distributor sends each stream to the appropriate ingesters and their replicas (based on the configured replication factor).
  • Each ingester creates a chunk for the stream's data or appends to an existing chunk. Chunks are unique per tenant and per label set.
  • The distributor responds with a success code over the HTTP/1 connection.
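
A write can likewise be exercised by hand through the push API that promtail itself uses; a minimal sketch (the job label value and the log line are made up):

curl -s -X POST "http://localhost:3100/loki/api/v1/push" \
  -H "Content-Type: application/json" \
  --data-raw "{\"streams\": [{\"stream\": {\"job\": \"manual-test\"}, \"values\": [[\"$(date +%s%N)\", \"hello loki\"]]}]}"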

Install in local (single binary) mode

Download the promtail and loki binaries

wget  https://github.com/grafana/loki/releases/download/v2.2.1/loki-linux-amd64.zip

wget https://github.com/grafana/loki/releases/download/v2.2.1/promtail-linux-amd64.zip

Find a linux machine for testing
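
The systemd units below expect the unpacked binaries at /opt/app/promtail/promtail and /opt/app/loki/loki; a minimal sketch of the unpack steps (target paths are taken from the service files in this article):

mkdir /opt/app/{promtail,loki} -pv
unzip promtail-linux-amd64.zip && mv promtail-linux-amd64 /opt/app/promtail/promtail
unzip loki-linux-amd64.zip && mv loki-linux-amd64 /opt/app/loki/loki
chmod a+x /opt/app/promtail/promtail /opt/app/loki/loki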

Install promtail


mkdir /opt/app/{promtail,loki} -pv 

# promtail configuration file
cat <<EOF> /opt/app/promtail/promtail.yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/log/positions.yaml # This location needs to be writeable by promtail.

client:
  url: http://localhost:3100/loki/api/v1/push

scrape_configs:
 - job_name: system
   pipeline_stages:
   static_configs:
   - targets:
      - localhost
     labels:
      job: varlogs  # A `job` label is fairly standard in prometheus and useful for linking metrics and logs.
      host: yourhost # A `host` label will help identify logs from this machine vs others
      __path__: /var/log/*.log  # The path matching uses a third party library: https://github.com/bmatcuk/doublestar
EOF

# systemd unit file

cat <<EOF >/etc/systemd/system/promtail.service
[Unit]
Description=promtail server
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/opt/app/promtail/promtail -config.file=/opt/app/promtail/promtail.yaml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=promtail
[Install]
WantedBy=default.target
EOF

systemctl daemon-reload
systemctl restart promtail 
systemctl status promtail 
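
A quick sanity check after starting promtail (the port comes from the config above):

curl -s localhost:9080/metrics | head   # promtail exposes Prometheus metrics on its HTTP port
# browse http://<host>:9080/targets for the targets page mentioned later in this article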

Install loki


mkdir /opt/app/{promtail,loki} -pv 

# loki configuration file
cat <<EOF> /opt/app/loki/loki.yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

ingester:
  wal:
    enabled: true
    dir: /opt/app/loki/wal
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h       # Any chunk not receiving new logs in this time will be flushed
  max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
  chunk_target_size: 1048576  # Loki will attempt to build chunks up to 1.5MB, flushing first if chunk_idle_period or max_chunk_age is reached first
  chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0     # Chunk transfers disabled

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /opt/app/loki/boltdb-shipper-active
    cache_location: /opt/app/loki/boltdb-shipper-cache
    cache_ttl: 24h         # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: filesystem
  filesystem:
    directory: /opt/app/loki/chunks

compactor:
  working_directory: /opt/app/loki/boltdb-shipper-compactor
  shared_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

ruler:
  storage:
    type: local
    local:
      directory: /opt/app/loki/rules
  rule_path: /opt/app/loki/rules-temp
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
EOF

# systemd unit file

cat <<EOF >/etc/systemd/system/loki.service
[Unit]
Description=loki server
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/opt/app/loki/loki -config.file=/opt/app/loki/loki.yaml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=loki
[Install]
WantedBy=default.target
EOF

systemctl daemon-reload
systemctl restart loki 
systemctl status loki 
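
A quick sanity check after starting loki (the port comes from the config above):

curl -s localhost:3100/ready                # returns "ready" once the ingester is up
curl -s localhost:3100/loki/api/v1/labels   # lists the label names pushed by promtail so far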

Configure the Loki data source in Grafana

View logs in Grafana Explore

View logs: {job="message"} |= "kubelet"

Calculate QPS: rate({job="message"} |= "kubelet" [1m])
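
A few more LogQL examples in the same style (the label values assume the job: varlogs label from the promtail config above):

{job="varlogs"}                                  # all lines from the varlogs job
{job="varlogs"} |= "error" != "timeout"          # chained filters: contains "error" but not "timeout"
sum by (filename) (rate({job="varlogs"}[5m]))    # per-file log rate, aggregated Prometheus-style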

Only labels are indexed

It has been mentioned several times that the biggest difference between Loki and Elasticsearch is that Loki only indexes labels and does not index the log content.
Let's look at an example.

Static label matching mode

Take a simple promtail configuration example

Configuration interpretation

scrape_configs:
 - job_name: system
   pipeline_stages:
   static_configs:
   - targets:
      - localhost
     labels:
      job: message
      __path__: /var/log/messages
  • The configuration above starts one log collection job
  • The job has one fixed label, job="message"
  • The collection path is /var/log/messages, and the file path also becomes a fixed label named filename
  • On promtail's web page you can see a targets information page similar to Prometheus's

When querying, you can use the same label matching syntax as Prometheus:

  • {job="message"}

    scrape_configs:
     - job_name: system
       pipeline_stages:
       static_configs:
       - targets:
          - localhost
         labels:
          job: syslog
          __path__: /var/log/syslog
     - job_name: apache
       pipeline_stages:
       static_configs:
       - targets:
          - localhost
         labels:
          job: apache
          __path__: /var/log/apache.log
  • If we configure two jobs like this, we can use {job=~"apache|syslog"} to match both jobs at once
  • Regex match (=~) and regex non-match (!~) are also supported, as shown below
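
A few matcher examples against the two jobs above:

    {job=~"apache|syslog"}   # regex match: lines from either job
    {job!~"apache"}          # regex non-match: everything except the apache job
    {job!="syslog"}          # plain non-equality also works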

Characteristics of the label matching model

Principle

  • As in Prometheus, one identical label set corresponds to one stream

How Prometheus handles series

  • In Prometheus, an identical label set maps to the same hash value and refId (a monotonically increasing positive integer id), i.e. the same series

    • Time series samples are continuously appended to that memSeries
    • When any label changes, a new hash value and refId are generated, corresponding to a new series

How Loki handles logs

  • Consistent with Prometheus, one set of label values in Loki produces one stream

    • Logs are appended to that stream over time and are eventually compressed into chunks
    • When any label changes, a new hash value is generated, corresponding to a new stream

Query process

  • When querying, Loki first computes the hash from the label matchers and finds the corresponding chunks through the inverted index
  • It then filters by the keywords in the query expression, which greatly speeds things up
  • Using the label hash to look up ids in an inverted index and then locate the corresponding blocks is an approach already proven in Prometheus to be

    • low overhead
    • fast

Dynamic labels and high cardinality

With the above in mind, we need to talk about the problem of dynamic labels

Two concepts

What is a dynamic label: simply put, a label whose value is not fixed

What is a high-cardinality label: simply put, a label with too many possible values, on the order of 100,000, 1,000,000 or more

Promtail supports extracting dynamic labels with regular expressions in pipeline_stages

  • For example, apache's access log

    11.11.11.11 - frank [25/Jan/2000:14:00:01 -0500] "GET /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
  • Use a regex stage in promtail to extract the two labels action and status_code

    - job_name: system
      pipeline_stages:
        - regex:
            expression: "^(?P<ip>\\S+) (?P<identd>\\S+) (?P<user>\\S+) \\[(?P<timestamp>[\\w:/]+\\s[+\\-]\\d{4})\\] \"(?P<action>\\S+)\\s?(?P<path>\\S+)?\\s?(?P<protocol>\\S+)?\" (?P<status_code>\\d{3}|-) (?P<size>\\d+|-)\\s?\"?(?P<referer>[^\"]*)\"?\\s?\"?(?P<useragent>[^\"]*)?\"?$"
        - labels:
            action:
            status_code:
      static_configs:
      - targets:
         - localhost
        labels:
         job: apache
         env: dev
         __path__: /var/log/apache.log

  • Then the combinations action=GET/POST and status_code=200/400 correspond to 4 streams, for example the following log lines

    11.11.11.11 - frank [25/Jan/2000:14:00:01 -0500] "GET /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
    11.11.11.12 - frank [25/Jan/2000:14:00:02 -0500] "POST /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
    11.11.11.13 - frank [25/Jan/2000:14:00:03 -0500] "GET /1986.js HTTP/1.1" 400 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
    11.11.11.14 - frank [25/Jan/2000:14:00:04 -0500] "POST /1986.js HTTP/1.1" 400 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
  • Those four log lines become four separate streams and start filling four separate chunks.
  • If another unique label combination appears (e.g. status_code="500"), yet another new stream is created, queryable as sketched below
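
A couple of queries against the streams above (the label names come from the regex stage in the config):

    {job="apache", action="GET", status_code="200"}   # selects exactly one of the four streams
    {job="apache", status_code=~"4..|5.."}            # regex over the extracted status_code label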

High cardinality problem

  • Just like above, now imagine also setting a label for ip: every request from a different client ip becomes its own unique stream
  • Thousands of streams can be created very quickly; this is high cardinality, and it can kill Loki
  • Therefore, to avoid high cardinality, you should avoid labels whose set of possible values is too large

If a field is not indexed as a label, will querying on it be very slow?

Loki's superpower is breaking a query into small pieces and dispatching them in parallel, so that you can query huge volumes of log data in a short time

Full-text indexing problem

  • Large indexes are complicated and expensive. Generally, the size of the full-text index of log data is equal to or greater than the size of the log data itself
  • To query log data, this index needs to be loaded, and in order to improve performance, it should probably be in memory. This is difficult to scale, and as you consume more logs, the index will quickly grow larger.
  • Loki's index is usually an order of magnitude smaller than the amount of logs ingested, and the index grows very slowly

So how do we speed up queries on fields that are not labels?

Take the ip field mentioned above as an example
  • Use a filter expression to query:

    {job="apache"} |= "11.11.11.11"

Query sharding in Loki (grep split by time range)

  • Loki breaks the query into smaller fragments, opens each chunk of the streams matching the labels, and starts looking for the ip address.
  • The shard size and degree of parallelism are configurable and depend on the resources you provide; a minimal sketch follows this list.
  • If necessary, you can set the sharding interval to 5m, deploy 20 queriers, and process gigabytes of logs in seconds.
  • Or you can go crazy, set up 200 queriers, and process terabytes of logs!
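
A sketch of the knobs mentioned above for a Loki 2.x config (the values are illustrative, not recommendations):

    query_range:
      split_queries_by_interval: 5m   # break long range queries into 5m sub-queries
    querier:
      max_concurrent: 20              # sub-queries a single querier works on in parallel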

Comparison of the two index models

  • Elasticsearch's large index must exist at all times, whether you query or not, for example occupying large amounts of memory over long periods
  • Loki's approach is to launch many small, sharded queries in parallel only when a query actually arrives

Use fewer labels when log volume is small

  • Every additional chunk that has to be loaded adds overhead
  • For example, suppose the query is {app="loki", level!="debug"}
  • Without a level label, only the chunks for app="loki" need to be loaded
  • With a level label, the chunks for every non-debug level (info, warn, error, critical, ...) all have to be loaded before the query can run

Add labels only when you need them

  • chunk_target_size: 1MB means a chunk is cut once its compressed size reaches about 1MB
  • That corresponds to roughly 5MB-10MB of raw logs; only if a stream can produce that much (say 10MB) within max_chunk_age is it worth considering splitting it with additional labels

Logs should increase in time order

  • This is the same problem as handling old (out-of-order) data in a TSDB
  • At present, for performance reasons, Loki simply rejects old data (see reject_old_samples in the limits_config above)

ning1875