
Summary of Loki's advantages

1. Low index overhead

  • The biggest difference between Loki and Elasticsearch (ES) is that Loki indexes only labels, not log content.
  • This greatly reduces index resource overhead. With ES, the cost of the large index must be paid at all times, whether or not you run queries.

2. Concurrent queries plus caching

  • To compensate for the slower queries caused by the lack of a full-text index, Loki splits each query into smaller pieces and runs them in parallel, which can be thought of as a concurrent grep.
  • Index, chunk, and result caches are all supported to speed up queries.

3. Uses the same labels as Prometheus and integrates with Alertmanager

  • Label consistency between Loki and Prometheus is one of Loki's superpowers.

4. Uses Grafana as the front end, which avoids switching back and forth between Kibana and Grafana

Architecture description


Component description

Promtail is the collector, analogous to Filebeat.

Loki is the server side, analogous to ES.

The loki process contains four roles:
  • querier: executes queries
  • ingester: stores logs
  • query-frontend: sits in front of the queriers and handles incoming queries
  • distributor: distributes writes
You can select which role to run with the -target parameter of the loki binary.

Read path

  • The querier receives an HTTP/1.1 request for data.
  • The querier passes the query to all ingesters to request their in-memory data.
  • The ingesters receive the read request and return any data that matches the query.
  • If no ingester returns data, the querier lazily loads data from the backing store and runs the query against it.
  • The querier iterates over all received data, deduplicates it, and returns the final data set over the HTTP/1.1 connection.
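
The last step of the read path can be illustrated with a small sketch. This is not Loki's code: the Entry type and the merge_and_dedupe function are hypothetical names, and it assumes that two entries are duplicates when their timestamp, label set, and line are all identical, which is what you would expect when the same entry is returned by more than one ingester.

```python
from typing import Iterable, NamedTuple


class Entry(NamedTuple):
    timestamp_ns: int   # log timestamp in nanoseconds
    labels: str         # canonical label set, e.g. '{job="apache"}'
    line: str           # raw log line


def merge_and_dedupe(*sources: Iterable[Entry]) -> list[Entry]:
    """Merge entries from several sources, dropping exact duplicates."""
    seen: set[Entry] = set()
    merged: list[Entry] = []
    for source in sources:
        for entry in source:
            if entry not in seen:
                seen.add(entry)
                merged.append(entry)
    # Return the surviving entries in timestamp order.
    return sorted(merged, key=lambda e: e.timestamp_ns)


# Two ingesters hold overlapping copies of the same stream.
ingester_a = [Entry(1, '{job="apache"}', "GET /a"), Entry(2, '{job="apache"}', "GET /b")]
ingester_b = [Entry(2, '{job="apache"}', "GET /b"), Entry(3, '{job="apache"}', "GET /c")]
result = merge_and_dedupe(ingester_a, ingester_b)
print([e.line for e in result])  # ['GET /a', 'GET /b', 'GET /c']
```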

Write path

  • The distributor receives an HTTP/1.1 request to store stream data.
  • Each stream is hashed using a hash ring.
  • The distributor sends each stream to the appropriate ingester and its replicas (based on the configured replication factor).
  • Each ingester creates a chunk for the stream's data or appends to an existing chunk. A chunk is unique per tenant and per label set.
  • The distributor responds with a success code over the HTTP/1.1 connection.
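
To make the hashing step more concrete, here is a minimal consistent-hash ring sketch. It is a simplified illustration and not Loki's actual ring: the Ring class, the token function, the use of SHA-256, and the number of tokens per ingester are all assumptions made for this example. It only shows the idea that a stream (tenant plus label set) maps to a position on the ring, and that the next distinct ingesters clockwise from that position receive the data, up to the replication factor.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a 32-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:4], "big")


class Ring:
    def __init__(self, ingesters: list[str], tokens_per_ingester: int = 64):
        # Each ingester owns several tokens so that load spreads more evenly.
        self._ring = sorted(
            (token(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._tokens = [t for t, _ in self._ring]
        self._size = len(ingesters)

    def owners(self, tenant: str, labels: str, replication_factor: int) -> list[str]:
        """Walk clockwise from the stream's token, collecting distinct ingesters."""
        start = bisect.bisect_left(self._tokens, token(tenant + labels))
        wanted = min(replication_factor, self._size)
        found: list[str] = []
        i = start
        while len(found) < wanted:
            name = self._ring[i % len(self._ring)][1]
            if name not in found:
                found.append(name)
            i += 1
        return found


ring = Ring(["ingester-1", "ingester-2", "ingester-3"])
stream = '{job="apache", env="dev"}'
print(ring.owners("tenant-a", stream, replication_factor=2))
```

Because the position depends only on the tenant and the label set, the same stream always lands on the same ingesters as long as the ring membership does not change.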

Install in local mode

Download the promtail and loki binaries

Find a Linux machine for testing

Install promtail

mkdir -pv /opt/app/{promtail,loki}

# promtail configuration file
# (the heredoc delimiter is quoted so the backticks in the comments are not executed)
cat <<'EOF' > /opt/app/promtail/promtail.yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/log/positions.yaml # This location needs to be writeable by promtail.

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
 - job_name: system
   static_configs:
   - targets:
      - localhost
     labels:
      job: varlogs  # A `job` label is fairly standard in prometheus and useful for linking metrics and logs.
      host: yourhost # A `host` label will help identify logs from this machine vs others
      __path__: /var/log/*.log  # Path matching uses a third-party glob library
EOF

# systemd service file

cat <<EOF > /etc/systemd/system/promtail.service
[Unit]
Description=promtail server

[Service]
ExecStart=/opt/app/promtail/promtail -config.file=/opt/app/promtail/promtail.yaml

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl restart promtail
systemctl status promtail

Install loki

mkdir -pv /opt/app/{promtail,loki}

# loki configuration file
cat <<'EOF' > /opt/app/loki/loki.yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

ingester:
  wal:
    enabled: true
    dir: /opt/app/loki/wal
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h       # Any chunk not receiving new logs in this time will be flushed
  max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
  chunk_target_size: 1048576  # Loki will attempt to build chunks up to 1MB (1048576 bytes), flushing first if chunk_idle_period or max_chunk_age is reached first
  chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0     # Chunk transfers disabled

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /opt/app/loki/boltdb-shipper-active
    cache_location: /opt/app/loki/boltdb-shipper-cache
    cache_ttl: 24h         # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: filesystem
  filesystem:
    directory: /opt/app/loki/chunks

compactor:
  working_directory: /opt/app/loki/boltdb-shipper-compactor
  shared_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

ruler:
  storage:
    type: local
    local:
      directory: /opt/app/loki/rules
  rule_path: /opt/app/loki/rules-temp
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
EOF

# systemd service file

cat <<EOF > /etc/systemd/system/loki.service
[Unit]
Description=loki server

[Service]
ExecStart=/opt/app/loki/loki -config.file=/opt/app/loki/loki.yaml

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl restart loki
systemctl status loki

Configure the Loki data source in Grafana

View logs in Grafana Explore

View logs: {job="message"} |= "kubelet"

Calculate QPS: rate({job="message"} |= "kubelet" [1m])

Index labels only

As mentioned several times above, the biggest difference between Loki and ES is that Loki indexes only labels and does not index log content.
Let's look at an example.

Static label matching mode

Take a simple promtail configuration as an example

Configuration interpretation

scrape_configs:
 - job_name: system
   static_configs:
   - targets:
      - localhost
     labels:
      job: message
      __path__: /var/log/messages

  • The configuration above starts one log collection task.
  • The task has one fixed label, job="message".
  • The collected log path is /var/log/messages, and it is attached as a fixed label named filename.
  • Promtail's web page shows a targets page similar to the one in Prometheus.

When querying, you can use the same label matching syntax as Prometheus

  • {job="message"}

scrape_configs:
 - job_name: system
   static_configs:
   - targets:
      - localhost
     labels:
      job: syslog
      __path__: /var/log/syslog
 - job_name: apache
   static_configs:
   - targets:
      - localhost
     labels:
      job: apache
      __path__: /var/log/apache.log

  • If we configure two jobs, we can use {job=~"apache|syslog"} to match both of them.
  • Regex matching (=~) and negative regex matching (!~) are both supported.
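
The four matcher operators can be sketched as follows. This is only an illustration of the semantics, assuming that regex matchers are anchored at both ends as they are in Prometheus. The matches function is a hypothetical helper, and Python's re module stands in for the regex engine Loki uses, so exotic patterns may behave differently.

```python
import re


def matches(labels: dict[str, str], name: str, op: str, value: str) -> bool:
    """Evaluate a single label matcher against a stream's label set."""
    actual = labels.get(name, "")  # a missing label is treated as an empty value
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None  # anchored at both ends
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown operator: {op}")


streams = [{"job": "syslog"}, {"job": "apache"}, {"job": "nginx"}]
selected = [s["job"] for s in streams if matches(s, "job", "=~", "apache|syslog")]
print(selected)  # ['syslog', 'apache']
```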

Features of label matching mode


  • As in Prometheus, one set of labels corresponds to one stream.

    How Prometheus handles series
  • In Prometheus, a given set of labels maps to one hash value and one refid (a monotonically increasing positive integer ID), that is, one series.

    • Time series data is continuously appended to this memseries.
    • When any label changes, a new hash value and refid are generated, corresponding to a new series.
How Loki handles logs
  • As in Prometheus, each set of label values in Loki produces one stream.

    • Logs are appended to this stream over time and are eventually compressed into chunks.
    • When any label changes, a new hash value is generated, corresponding to a new stream.
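
The relationship between label sets and streams can be sketched in a few lines. This is an illustration only: the stream_id and append names are hypothetical, and SHA-256 over a sorted label string is an assumption chosen for the example, not the fingerprint function Loki or Prometheus uses.

```python
import hashlib


def stream_id(labels: dict[str, str]) -> str:
    """Derive a stable identifier from a label set (sorted so key order does not matter)."""
    canonical = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


# stream id -> log lines appended to that stream
streams: dict[str, list[str]] = {}


def append(labels: dict[str, str], line: str) -> None:
    streams.setdefault(stream_id(labels), []).append(line)


append({"job": "apache", "env": "dev"}, "line 1")
append({"env": "dev", "job": "apache"}, "line 2")   # same labels, same stream
append({"job": "apache", "env": "prod"}, "line 3")  # one value changed, new stream
print(len(streams))  # 2
```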

Query process

  • Loki first computes the hash value from the labels and looks up the corresponding chunks in the inverted index.
  • It then filters those chunks by the keywords in the query statement, which greatly speeds up the query.
  • This approach of hashing the labels, finding the ID in the inverted index, and then locating the corresponding chunks has already been proven in Prometheus:

    • Low overhead
    • High speed

Dynamic labels and high cardinality

With the background above, we can now discuss the problem of dynamic labels.

Two concepts

What is a dynamic label: put simply, a label whose value is not fixed.

What is a high cardinality label: put simply, a label with a very large number of possible values, reaching 100,000, 1 million, or more.

Promtail supports extracting dynamic labels by regex matching in pipeline_stages

  • For example, take this Apache access log line:

    - frank [25/Jan/2000:14:00:01 -0500] "GET /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv: Gecko/20091221 Firefox/3.5.7 GTB6"

  • Use a regex stage in promtail to extract the two labels action and status_code:

scrape_configs:
 - job_name: system
   pipeline_stages:
    - regex:
        expression: "^(?P<ip>\\S+) (?P<identd>\\S+) (?P<user>\\S+) \\[(?P<timestamp>[\\w:/]+\\s[+\\-]\\d{4})\\] \"(?P<action>\\S+)\\s?(?P<path>\\S+)?\\s?(?P<protocol>\\S+)?\" (?P<status_code>\\d{3}|-) (?P<size>\\d+|-)\\s?\"?(?P<referer>[^\"]*)\"?\\s?\"?(?P<useragent>[^\"]*)?\"?$"
    - labels:
        action:
        status_code:
   static_configs:
   - targets:
      - localhost
     labels:
      job: apache
      env: dev
      __path__: /var/log/apache.log

  • The combinations of action=GET/POST and status_code=200/400 then correspond to four streams:

    - frank [25/Jan/2000:14:00:01 -0500] "GET /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv: Gecko/20091221 Firefox/3.5.7 GTB6"
    - frank [25/Jan/2000:14:00:02 -0500] "POST /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv: Gecko/20091221 Firefox/3.5.7 GTB6"
    - frank [25/Jan/2000:14:00:03 -0500] "GET /1986.js HTTP/1.1" 400 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv: Gecko/20091221 Firefox/3.5.7 GTB6"
    - frank [25/Jan/2000:14:00:04 -0500] "POST /1986.js HTTP/1.1" 400 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv: Gecko/20091221 Firefox/3.5.7 GTB6"

  • Those four log lines become four separate streams and start filling four separate chunks.
  • If another unique label combination appears (e.g. status_code="500"), another new stream is created.

High cardinality problem

  • Now imagine that, as above, you also set a label for ip. Every distinct client IP then becomes its own unique stream.
  • Thousands of streams can be generated very quickly. This is high cardinality, and it can kill Loki.
  • Therefore, to avoid high cardinality, avoid labels whose set of possible values is very large.
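
A quick count shows how fast the number of streams grows once an unbounded label is added. The numbers below are made up for illustration: two methods, two status codes, and 10,000 hypothetical client addresses.

```python
from itertools import product

methods = ["GET", "POST"]
statuses = ["200", "400"]
# Hypothetical client addresses, one per simulated user.
client_ips = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]

# Each distinct label combination is one stream.
without_ip = {(m, s) for m, s in product(methods, statuses)}
with_ip = {(m, s, ip) for m, s, ip in product(methods, statuses, client_ips)}

print(len(without_ip), len(with_ip))  # 4 40000
```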

If a field is not indexed as a label, will queries on it be very slow?

Loki's superpower is breaking a query into small pieces and dispatching them in parallel, so that a large amount of log data can be queried in a short time.

Full-text indexing problem

  • Large indexes are complicated and expensive. The full-text index of log data is typically as large as, or larger than, the log data itself.
  • To query log data, this index has to be loaded, and for performance it should probably be held in memory. This is difficult to scale, and the index grows quickly as you ingest more logs.
  • Loki's index is usually an order of magnitude smaller than the volume of logs ingested, and it grows very slowly.

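So how do you speed up queries on fields that are not labels?

Take the ip field mentioned above as an example
  • Query with a filter expression, replacing the <client-ip> placeholder with the address you are looking for:

    {job="apache"} |= "<client-ip>"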

Sharding during Loki queries (grep by time range)

  • Loki breaks the query into smaller shards, opens every chunk of the streams that match the labels, and starts looking for the IP address.
  • The size of these shards and the degree of parallelism are configurable and depend on the resources you provide.
  • For example, you can set the sharding interval to 5m, deploy 20 queriers, and process gigabytes of logs in a few seconds.
  • Or you can go crazy, deploy 200 queriers, and process terabytes of logs!
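
The idea of splitting a query by time range and grepping the pieces in parallel can be sketched as below. This is a toy model, not how Loki schedules work internally: split_range, grep_range, and parallel_grep are hypothetical names, the logs live in an in-memory list, and a thread pool stands in for a fleet of queriers.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta


def split_range(start, end, interval):
    """Split [start, end) into consecutive sub-ranges of at most `interval`."""
    ranges = []
    cursor = start
    while cursor < end:
        upper = min(cursor + interval, end)
        ranges.append((cursor, upper))
        cursor = upper
    return ranges


def grep_range(logs, lo, hi, needle):
    """Scan one sub-range for lines containing `needle`."""
    return [line for ts, line in logs if lo <= ts < hi and needle in line]


def parallel_grep(logs, start, end, needle, interval=timedelta(minutes=5), workers=4):
    """Run one grep per time shard concurrently and concatenate results in shard order."""
    shards = split_range(start, end, interval)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: grep_range(logs, r[0], r[1], needle), shards)
        return [line for part in parts for line in part]


t0 = datetime(2000, 1, 25, 14, 0)
logs = [(t0 + timedelta(minutes=m), f"request from client-{m % 3}") for m in range(20)]
hits = parallel_grep(logs, t0, t0 + timedelta(minutes=20), "client-1")
print(len(hits))  # 7
```

With a 5m interval, the 20-minute range above becomes four shards. Adding workers shortens the wall-clock time without requiring any index on the searched text.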

Comparison of the two index modes

  • The large ES index must exist at all times whether or not you query it, for example occupying a large amount of memory over long periods.
  • Loki's approach is to start many sharded, parallel queries only when a query is actually run.

Use fewer labels when the log volume is small

  • Every additional chunk that has to be loaded adds overhead.
  • For example, suppose the query is {app="loki",level!="debug"}.
  • Without a level label, only one chunk needs to be loaded: the one for app="loki".
  • With a level label, you need to load the four chunks for level=info, warn, error, and critical before querying.

Add labels when you need them

  • chunk_target_size=1MB means that chunks are cut at a compressed size of about 1MB.
  • This corresponds to roughly 5MB-10MB of raw logs. If a stream can reach 10MB of logs within max_chunk_age, consider adding labels to split it.
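
The rule of thumb above is simple arithmetic and can be checked with a small helper. The fills_chunk function and the default compression ratio of 10 are assumptions for illustration, taken from the upper end of the 5MB-10MB range mentioned above; real compression ratios depend on the log content.

```python
def fills_chunk(raw_bytes_per_second: float,
                max_chunk_age_seconds: float,
                chunk_target_size: int = 1_048_576,
                compression_ratio: float = 10.0) -> bool:
    """Return True if a stream produces enough raw data to fill a chunk before it ages out."""
    raw_needed = chunk_target_size * compression_ratio
    return raw_bytes_per_second * max_chunk_age_seconds >= raw_needed


# A stream writing 5 KiB/s for one hour produces about 17.6 MiB of raw logs.
print(fills_chunk(5 * 1024, 3600))  # True: worth considering another label
# A stream writing 100 B/s for one hour produces about 352 KiB of raw logs.
print(fills_chunk(100, 3600))       # False: splitting it further only creates tiny chunks
```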

Logs should arrive in increasing time order

  • This is the same problem as handling old data in a TSDB.
  • At present, Loki simply rejects old data for performance reasons.
