A summary of Loki's advantages
1. Low index overhead
- The biggest difference between Loki and ES is that Loki indexes only labels, not the log content.
- This greatly reduces index resource overhead (with ES, the huge index cost must be paid at all times, whether you query or not).
2. Concurrent queries + caching
- To make up for the query slowdown caused by the lack of a full-text index, Loki breaks each query into smaller fragments and runs them in parallel, which can be understood as a concurrent grep.
- Index, chunk, and result caches are also supported to speed queries up further.
3. Uses the same labels as Prometheus and integrates with Alertmanager
- Label consistency between Loki and Prometheus is one of Loki's superpowers.
4. Uses Grafana as the front end, avoiding switching back and forth between Kibana and Grafana
Architecture description
Component description
Promtail is the collector, analogous to Filebeat
Loki is the server side, analogous to ES
The Loki process contains four roles
- querier
- ingester (log storage)
- query-frontend
- distributor (write distribution)
You can specify which role(s) a process runs via the -target parameter of the loki binary
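For example, a single process can run everything (the default) or just one role. A minimal sketch, assuming the install paths used later in this article:
/opt/app/loki/loki -config.file=/opt/app/loki/loki.yaml -target=all
/opt/app/loki/loki -config.file=/opt/app/loki/loki.yaml -target=querier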
read path
- The querier receives an HTTP/1 query request.
- The querier passes the query to all ingesters to ask for data still held in memory.
- The ingesters receive the read request and return the data matching the query (if any).
- If no ingester returns data, the querier lazily loads the data from the backing store and runs the query against it.
- The querier iterates over all received data, deduplicates it, and returns the final result set over the HTTP/1 connection.
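The read path can be exercised directly against the querier's HTTP API with curl; a sketch, assuming the job="varlogs" label from the promtail config below (start/end default to the last hour):
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job="varlogs"}' \
  --data-urlencode 'limit=10'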
write path
- The distributor receives an HTTP/1 request to store stream data.
- Each stream is hashed using the hash ring.
- The distributor sends each stream to the appropriate ingester and its replicas (based on the configured replication factor).
- Each ingester creates a chunk for the stream's data or appends it to an existing chunk. Chunks are unique per tenant and per label set.
- The distributor responds with a success code over the HTTP/1 connection.
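The write path can likewise be exercised by hand against the push endpoint; a sketch, where the job="manual" label is an arbitrary example and the timestamp must be in nanoseconds:
curl -s -X POST "http://localhost:3100/loki/api/v1/push" \
  -H "Content-Type: application/json" \
  --data-raw "{\"streams\": [{\"stream\": {\"job\": \"manual\"}, \"values\": [[\"$(date +%s%N)\", \"hello loki\"]]}]}"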
Install in local (single-binary) mode
Download the promtail and loki binaries
wget https://github.com/grafana/loki/releases/download/v2.2.1/loki-linux-amd64.zip
wget https://github.com/grafana/loki/releases/download/v2.2.1/promtail-linux-amd64.zip
Find a linux machine for testing
Install promtail
mkdir /opt/app/{promtail,loki} -pv
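The service file below expects the binary at /opt/app/promtail/promtail; a minimal sketch of unzipping and placing it, assuming the zip downloaded earlier is in the current directory:
unzip promtail-linux-amd64.zip
mv promtail-linux-amd64 /opt/app/promtail/promtail
chmod a+x /opt/app/promtail/promtail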
# promtail configuration file
cat <<'EOF' > /opt/app/promtail/promtail.yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /var/log/positions.yaml # This location needs to be writeable by promtail.
clients:
  - url: http://localhost:3100/loki/api/v1/push
scrape_configs:
- job_name: system
  pipeline_stages:
  static_configs:
  - targets:
      - localhost
    labels:
      job: varlogs # A `job` label is fairly standard in prometheus and useful for linking metrics and logs.
      host: yourhost # A `host` label will help identify logs from this machine vs others
      __path__: /var/log/*.log # The path matching uses a third party library: https://github.com/bmatcuk/doublestar
EOF
# systemd service file
cat <<EOF >/etc/systemd/system/promtail.service
[Unit]
Description=promtail server
Wants=network-online.target
After=network-online.target
[Service]
ExecStart=/opt/app/promtail/promtail -config.file=/opt/app/promtail/promtail.yaml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=promtail
[Install]
WantedBy=default.target
EOF
systemctl daemon-reload
systemctl restart promtail
systemctl status promtail
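Once promtail is running, you can sanity-check it over the HTTP port configured above (9080); the target information page mentioned later in this article and the Prometheus-style metrics endpoint are both served there:
curl -s http://localhost:9080/targets
curl -s http://localhost:9080/metrics | head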
Install loki
mkdir /opt/app/{promtail,loki} -pv
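As with promtail, the service file below expects the binary at /opt/app/loki/loki; a minimal placement sketch, assuming the zip downloaded earlier is in the current directory:
unzip loki-linux-amd64.zip
mv loki-linux-amd64 /opt/app/loki/loki
chmod a+x /opt/app/loki/loki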
# loki configuration file
cat <<EOF> /opt/app/loki/loki.yaml
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096

ingester:
  wal:
    enabled: true
    dir: /opt/app/loki/wal
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h       # Any chunk not receiving new logs in this time will be flushed
  max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
  chunk_target_size: 1048576  # Loki will attempt to build chunks up to this size (1MB here), flushing first if chunk_idle_period or max_chunk_age is reached first
  chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0     # Chunk transfers disabled

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /opt/app/loki/boltdb-shipper-active
    cache_location: /opt/app/loki/boltdb-shipper-cache
    cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: filesystem
  filesystem:
    directory: /opt/app/loki/chunks

compactor:
  working_directory: /opt/app/loki/boltdb-shipper-compactor
  shared_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

ruler:
  storage:
    type: local
    local:
      directory: /opt/app/loki/rules
  rule_path: /opt/app/loki/rules-temp
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
EOF
# systemd service file
cat <<EOF >/etc/systemd/system/loki.service
[Unit]
Description=loki server
Wants=network-online.target
After=network-online.target
[Service]
ExecStart=/opt/app/loki/loki -config.file=/opt/app/loki/loki.yaml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=loki
[Install]
WantedBy=default.target
EOF
systemctl daemon-reload
systemctl restart loki
systemctl status loki
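Once loki is running, a quick check that it is ready and that promtail's labels have arrived (both are standard Loki HTTP endpoints):
curl -s http://localhost:3100/ready
curl -s http://localhost:3100/loki/api/v1/labels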
Configure the Loki data source in Grafana
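You can add the data source in the Grafana UI (Configuration -> Data Sources -> Add data source -> Loki, URL http://localhost:3100), or provision it from a file; a minimal provisioning sketch, assuming Grafana runs on the same host and uses the default provisioning path:
# /etc/grafana/provisioning/datasources/loki.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://localhost:3100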
Querying logs in Grafana Explore
View logs
{job="message"} |= "kubelet"
Calculate QPS
rate({job="message"} |= "kubelet" [1m])
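As in Prometheus, the per-stream rate can be aggregated, for example by summing across all matching streams:
sum(rate({job="message"} |= "kubelet" [1m]))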
Only labels are indexed
As mentioned several times above, the biggest difference between Loki and ES is that Loki indexes only labels, not the log content.
Let's take an example
Static label matching
Here is a simple promtail configuration example
Configuration explanation
scrape_configs:
- job_name: system
  pipeline_stages:
  static_configs:
  - targets:
      - localhost
    labels:
      job: message
      __path__: /var/log/messages
- The above configuration starts one log collection job
- The job has one fixed label: job="message"
- The collection path is /var/log/messages, and the file path also becomes a fixed label named filename
- On promtail's web page you can see a target information page similar to the one in Prometheus
When querying, you can use the same label matching statements as in Prometheus
{job="message"}
scrape_configs:
- job_name: system
  pipeline_stages:
  static_configs:
  - targets:
      - localhost
    labels:
      job: syslog
      __path__: /var/log/syslog
- job_name: apache
  pipeline_stages:
  static_configs:
  - targets:
      - localhost
    labels:
      job: apache
      __path__: /var/log/apache.log
- If we configure two jobs like this, we can use {job=~"apache|syslog"} to match logs from both jobs
- Regex matching (=~) and negative matching (!~, !=) are also supported
Characteristics of the label matching model
Principle
Consistent with Prometheus: one set of labels corresponds to one stream
How Prometheus handles series
In Prometheus, the same label set maps to the same hash value and refid (a monotonically increasing positive integer ID), i.e. the same series
- Time series data is continuously appended to this memSeries
- When any label changes, a new hash value and refid are generated, corresponding to a new series
How Loki handles log streams
Consistent with Prometheus, a set of label values in Loki corresponds to one stream
- Logs are appended to this stream over time and finally compressed into chunks
- When any label changes, a new hash value is generated, corresponding to a new stream
Query process
- So Loki first computes a hash from the query's label matchers and finds the corresponding chunks through the inverted index
- Then it filters those chunks by the keywords in the query statement, which greatly speeds up the query
Looking up IDs in an inverted index by label hash is an approach already proven in Prometheus to be
- low overhead
- fast
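For example, in a query like the one below (using the apache labels from the dynamic-label example later on), the label matchers are resolved through the index first, and only then is the line filter run over the selected chunks:
{job="apache", env="dev"} |= "11.11.11.11"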
Dynamic labels and high cardinality
With the above knowledge in hand, we need to talk about the problem of dynamic labels
Two concepts
What is a dynamic label: simply put, a label whose value is not fixed
What is a high-cardinality label: simply put, a label whose value has too many possibilities, reaching 100,000, 1,000,000 or more
Promtail supports extracting dynamic labels with regular expressions in pipeline_stages
For example, take an Apache access log
11.11.11.11 - frank [25/Jan/2000:14:00:01 -0500] "GET /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
Use a regex stage in promtail to extract the two labels action and status_code
- job_name: system
  pipeline_stages:
    - regex:
        expression: "^(?P<ip>\\S+) (?P<identd>\\S+) (?P<user>\\S+) \\[(?P<timestamp>[\\w:/]+\\s[+\\-]\\d{4})\\] \"(?P<action>\\S+)\\s?(?P<path>\\S+)?\\s?(?P<protocol>\\S+)?\" (?P<status_code>\\d{3}|-) (?P<size>\\d+|-)\\s?\"?(?P<referer>[^\"]*)\"?\\s?\"?(?P<useragent>[^\"]*)?\"?$"
    - labels:
        action:
        status_code:
  static_configs:
  - targets:
      - localhost
    labels:
      job: apache
      env: dev
      __path__: /var/log/apache.log
The combinations of action=GET/POST and status_code=200/400 then correspond to 4 streams:
11.11.11.11 - frank [25/Jan/2000:14:00:01 -0500] "GET /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
11.11.11.12 - frank [25/Jan/2000:14:00:02 -0500] "POST /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
11.11.11.13 - frank [25/Jan/2000:14:00:03 -0500] "GET /1986.js HTTP/1.1" 400 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
11.11.11.14 - frank [25/Jan/2000:14:00:04 -0500] "POST /1986.js HTTP/1.1" 400 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
- Those four log lines become four separate streams and start filling four separate chunks
- If another unique label combination comes in (e.g. status_code="500"), another new stream is created
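Concretely, together with the static labels from the config above, those four streams correspond to the following label sets:
{job="apache", env="dev", action="GET", status_code="200"}
{job="apache", env="dev", action="POST", status_code="200"}
{job="apache", env="dev", action="GET", status_code="400"}
{job="apache", env="dev", action="POST", status_code="400"}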
High cardinality problem
- Just like above, now imagine setting a label for ip: every request from a different user IP becomes a unique stream
- Thousands of streams can be generated very quickly; this is high cardinality, and it can kill Loki
- Therefore, to avoid high cardinality, avoid labels whose values have too many possibilities
If a field is not indexed as a label, won't queries be very slow?
Loki's superpower is breaking a query into small pieces and dispatching them in parallel, so that huge amounts of log data can be queried in a short time
Full-text indexing problem
- Large indexes are complex and expensive. Typically, a full-text index of log data ends up the same size as or larger than the log data itself
- To query the logs, that index must be loaded, and for performance it should probably live in memory. This is hard to scale: as you ingest more logs, the index quickly grows larger
- Loki's index is typically an order of magnitude smaller than the ingested log volume, and it grows very slowly
So how are queries on unlabeled fields kept fast?
Take the ip field mentioned above as an example
Use a filter expression in the query
{job="apache"} |= "11.11.11.11"
Query sharding in Loki (grep by time range)
- Loki breaks the query into smaller fragments, opens each chunk of every stream that matches the labels, and starts looking for the IP address
- The size of these shards and the degree of parallelism are configurable and depend on the resources you provide
- If needed, you can configure the shard interval to 5m, deploy 20 queriers, and process gigabytes of logs in seconds
- Or you can go crazy, set up 200 queriers, and process terabytes of logs!
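The shard interval and parallelism mentioned above map to configuration options; a hedged sketch of the relevant settings (option names as of Loki 2.x, normally applied when running a query-frontend, so verify them against your version's documentation):
query_range:
  split_queries_by_interval: 5m
limits_config:
  max_query_parallelism: 20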
Comparison of the two indexing models
- ES's big index must exist at all times, whether you query or not, which means, for example, occupying a large amount of memory for long periods
- Loki's approach is to launch many small, sharded parallel queries only at query time
Use fewer labels when log volume is small
- Every additional chunk that has to be loaded adds overhead
- For example, if the query is {app="loki", level!="debug"}
- Without a level label, only the chunks of the single app="loki" stream need to be loaded
- If the level label is added, the chunks for each of level=info, warn, error and critical have to be loaded before the query can run
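In other words, when a label adds little value, prefer a line filter over an extra label; an illustrative query that keeps a single stream per app and filters out debug lines by content (assuming the level text appears in the log line):
{app="loki"} != "debug"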
Add labels when you need them
- chunk_target_size=1MB (1048576) means a chunk is cut when its compressed size reaches about 1MB
- That corresponds to roughly 5MB-10MB of raw logs. If a single stream can produce that much within max_chunk_age, consider splitting it with additional labels
Logs should arrive in increasing time order
- This is the same problem as handling old data in a TSDB
- For performance reasons, Loki currently simply rejects old (out-of-order) data; see reject_old_samples and reject_old_samples_max_age in limits_config above