Why tune
If introducing a new technology takes interest and enthusiasm, then taking it to production takes persistence and perseverance. That is true of Cloud Native, and it is true of Istio.
In the performance tests before going live, Istio brought observability and operational convenience, but it also brought pain: increased service response latency. Minimizing that pain became the urgent task at hand.
The numbers: SERVICE-A, which used to respond in 9 ms, now takes 14 ms. SERVICE-A depends on SERVICE-B.
Road to analysis
There were two paths in front of me:
- Directly tweak the configurations that look suspicious and disable some features, then load test and see what happens.
- CPU-profile the sidecar to locate the hot spots, then tune with some evidence in hand.
I chose the second.
Sidecar CPU Profile (taking an X-ray)
As a relatively mature open-source product, Istio has an official benchmark project:
https://github.com/istio/tools/tree/release-1.8/perf/benchmark
I followed: https://github.com/istio/tools/tree/release-1.8/perf/benchmark/flame#setup-perf-tool-envoy .
Install perf
Running the Linux perf tool inside the container to profile the sidecar involves a few hurdles. For example, the istio-proxy container's file system is read-only by default, so I modified it to be writable, and you need to enter the container as root. If that feels troublesome, you can also build a custom image based on the original one. The details are not the focus of this article, so I will skip them. After that, perf can be installed with a package manager (such as apt).
Here is an example of the istio-proxy container configuration:
spec:
  containers:
  - name: istio-proxy
    image: xyz
    securityContext:
      allowPrivilegeEscalation: true
      capabilities:
        add:
        - ALL
      privileged: true
      readOnlyRootFilesystem: false
      runAsGroup: 1337
      runAsNonRoot: false
      runAsUser: 1337
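With the file system writable and root access in hand, installing perf might look roughly like this (a sketch only; the exact package name depends on the base image of your istio-proxy container):

# Inside the istio-proxy container, as root; package name is an assumption and varies by distro
apt-get update
apt-get install -y linux-perf || apt-get install -y "linux-tools-$(uname -r)" linux-tools-generic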
Run the profile and generate a Flame Graph
Enter the istio-proxy container as root (yes, root saves some hassle):
perf record -g -F 19 -p `pgrep envoy` -o perf.data -- sleep 120
perf script --header -i perf.data > perf.stacks
Copy perf.stacks to the development machine and generate the Flame Graph there. Yes, a Perl script is needed: https://github.com/brendangregg/FlameGraph (proudly produced by my idol Brendan Gregg):
export FlameGraph=/xyz/FlameGraph
$FlameGraph/stackcollapse-perf.pl < perf.stacks | $FlameGraph/flamegraph.pl --hash > perf.svg
Finally, perf.svg is generated:
The figure above covers only one Envoy worker thread; there is another thread that looks much the same. So proxy_wasm::ContextBase::onLog accounts for roughly 14% of the CPU of the whole process. As the figure suggests, this is probably an Envoy extension filter. The questions are: which filter is it, and why can't some of the stack frames be resolved (the perf-18.map entries in the figure)?
Envoy Filter: the utopia of Wasm
What I knew: Wasm is a VM engine (analogous to the JVM). Envoy supports extensions implemented natively as well as extensions implemented in Wasm, and of course a VM engine carries some performance loss compared with native code.
Fortunately, some searching led me to this document:
https://istio.io/v1.8/docs/ops/deployment/performance-and-scalability/
One of its figures, together with a short passage, gave me a hint:
- baseline: Client pod directly calls the server pod, no sidecars are present.
- none_both: Istio proxy with no Istio specific filters configured.
- v2-stats-wasm_both: Client and server sidecars are present with telemetry v2 v8 configured.
- v2-stats-nullvm_both: Client and server sidecars are present with telemetry v2 nullvm configured by default.
- v2-sd-full-nullvm_both: Export Stackdriver metrics, access logs and edges with telemetry v2 nullvm configured.
- v2-sd-nologging-nullvm_both: Same as above, but does not export access logs.
Hmm (to borrow the Cantonese that is popular these days), why does a single performance test need so many lines? Translated into plain terms:
- baseline: no sidecars at all
- none_both: sidecars, but without Istio's filters
- v2-stats-wasm_both: the filter implemented in Wasm
- v2-stats-nullvm_both: the filter implemented natively
What are these lines trying to say? Foreigners can be rather reserved. Put bluntly: we want to promote Wasm, so it is the default; if you mind the extra 1 ms of latency and a bit more CPU, please switch back to the native implementation. Well, I admit it: I mind.
Note: I later discovered that the official Istio 1.8 release uses the native filter by default. Our environment is an internal customized build that defaults to the Wasm filter (or a utopia that puts security, isolation, and portability above performance). So for you, the native filter may well already be the default configuration.
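To see which runtime your sidecar's telemetry filters actually use, one quick, rough check is to grep Envoy's config dump (a sketch; it assumes the Envoy admin API is on its default port 15000 and that curl is available in the istio-proxy image; the pod name is hypothetical):

# Count the wasm runtimes referenced in the sidecar's configuration
kubectl exec my-service-pod -c istio-proxy -- curl -s localhost:15000/config_dump \
  | grep -o 'envoy\.wasm\.runtime\.[a-z0-9]*' | sort | uniq -c
# envoy.wasm.runtime.null -> filters compiled natively into Envoy (NullVm)
# envoy.wasm.runtime.v8   -> filters running as Wasm inside the v8 VM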
Exhausted worker threads and idling cores
Below is thread-level top output for the envoy process. Yes, as pthreads remind us, thread naming is not a patent of the Java world; the COMMAND column shows the thread name.
top -p `pgrep envoy` -H -b
top - 01:13:52 up 42 days, 14:01, 0 users, load average: 17.79, 14.09, 10.73
Threads: 28 total, 2 running, 26 sleeping, 0 stopped, 0 zombie
%Cpu(s): 42.0 us, 7.3 sy, 0.0 ni, 46.9 id, 0.0 wa, 0.0 hi, 3.7 si, 0.1 st
MiB Mem : 94629.32+total, 67159.44+free, 13834.21+used, 13635.66+buff/cache
MiB Swap: 0.000 total, 0.000 free, 0.000 used. 80094.03+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
42 istio-p+ 20 0 0.274t 221108 43012 R 60.47 0.228 174:48.28 wrk:worker_1
41 istio-p+ 20 0 0.274t 221108 43012 R 55.81 0.228 149:33.37 wrk:worker_0
18 istio-p+ 20 0 0.274t 221108 43012 S 0.332 0.228 2:22.48 envoy
At the same time, increasing the client's concurrency did not push the CPU usage of these two worker threads up to 100%. The folk wisdom that a hyper-threaded core cannot deliver core * 2 performance shows up right here. What to do? Try adding workers.
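Before adding workers, it is worth confirming how many the sidecar currently runs with. pilot-agent starts Envoy with a --concurrency flag, so the current value can be read off the process command line (a sketch; the pod name is hypothetical):

# -a prints the full command line; look for "--concurrency 2"
kubectl exec my-service-pod -c istio-proxy -- pgrep -a envoy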
One word: tune
Istio provides the EnvoyFilter resource, so I played it like this:
kubectl apply -f - <<"EOF"
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  ...
  name: stats-filter-1.8
spec:
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        filterChain:
          filter:
            name: envoy.http_connection_manager
            subFilter:
              name: envoy.router
      proxy:
        proxyVersion: ^1\.8.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.stats
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {
                  }
              root_id: stats_outbound
              vm_config:
                allow_precompiled: true
                code:
                  local:
                    inline_string: envoy.wasm.stats
                runtime: envoy.wasm.runtime.null
                vm_id: stats_outbound
  ...
EOF
kubectl apply -f - <<"EOF"
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: metadata-exchange-1.8
spec:
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.http_connection_manager
      proxy:
        proxyVersion: ^1\.8.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.metadata_exchange
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {}
              vm_config:
                allow_precompiled: true
                code:
                  local:
                    inline_string: envoy.wasm.metadata_exchange
                runtime: envoy.wasm.runtime.null
  ...
EOF
Note: I later discovered that the official Istio 1.8 release already uses the native filter, i.e. envoy.wasm.runtime.null. Our environment is an internal customized build that defaults to the Wasm filter (or a utopia that puts security, isolation, and portability above performance). So the optimization above may already be your default configuration, in which case you can simply ignore it...
The following changes the number of Envoy worker threads:
kubectl edit deployments.apps my-service-deployment

spec:
  template:
    metadata:
      annotations:
        proxy.istio.io/config: 'concurrency: 4'
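After the deployment rolls out and the sidecars restart, the effect can be confirmed with the same thread-level top used earlier (run inside the istio-proxy container):

top -p `pgrep envoy` -H -b -n 1
# Expect four worker threads now: wrk:worker_0 .. wrk:worker_3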
Sidecar CPU Profile (taking another X-ray)
Because the native Envoy filter is now used instead of the Wasm filter, the missing stack frames seen earlier are gone from the figure above. Measured CPU usage dropped by about 8%, and latency came down by 1 ms.
Summary
Rather than condemning the pitfall-laden customized build for its default Wasm Envoy filter and thread configuration, it is better to ask why it cost you several days to locate the problem. When we excitedly board the ship of a new technology, besides remembering to bring a life ring, we must not forget: you are the captain. Beyond knowing how to steer, you should understand how the ship works and how to maintain it, so that you can handle emergencies, rather than placing blind trust in it.
Original: https://blog.mygraphql.com/zh/posts/cloud/istio/istio-tunning/istio-filter-tunning-thread/