Database query performance optimization guide

Database query performance optimization has always been a topic that programmers can't avoid. When we encounter slow business report refresh or too long query results, we can use the questioning method to think about how to optimize.

1. What kind of environment

Hardware environment

The speed of query execution is closely related to our hardware. What kind of CPU is currently used, how many cores and threads, and how large the memory is directly affects the computing speed. Whether the disk is SSD or HDD, and what rate of network card directly affects our data reading Time delay

Software Environment

Although the software environment is not like hardware, various parameters are tangible, but it still affects our query performance. Each system actually has its own advantages in specific scenarios. What kind of structure our query system is, what kind of query it is suitable for, online or offline, more calculations or more data reads, these should all be clear when we optimize.

Let's take a look at how to optimize performance based on this idea

2. What kind of query

First of all, when we optimize the query, we need to see what type of query it is. Write or query (in view of the length here, only query), CPU-intensive or IO-intensive. If our system is suitable for OLTP low-latency inspection scenarios, it is obviously not suitable to do OLAP large-scale analysis on this system. OLTP generally focuses on inspections with higher data consistency, while OLAP is due to The amount of data is huge, and vector concurrent queries are generally required. OLAP does not focus on millisecond-level low latency, while OLTP does not focus on hundreds of millions of data statistics.

3. How to find performance bottlenecks

3.1 vmstat view system conditions

When the overall system does not know where the current bottleneck is, we can first use the vmstat tool to briefly look at the general situation of the system. As shown in the figure below, 2 means that the server status is collected every two seconds.

procs: view process status

r: Run queue, that is, the number of processes currently running (running or waiting to run). At present, the CPU is relatively idle, and this number is very small. When this value exceeds the number of CPUs, a CPU bottleneck will appear.

b: blocked processes, that is, the number of processes in an uninterruptible sleep state.

memory: View memory status

swpd: The used virtual memory size. If it is greater than 0, it means that the machine has started to use virtual memory, and the virtual memory will run very slowly. The value 0 here means that we have turned off the virtual memory function.

free: The size of free physical memory.

buff: The memory is the size of the system buffers.

cache: The memory is used as the size of the system cache.

swap: The state of data exchange between disk and memory

nesi: The size of the virtual memory read from the disk per second. If this value is greater than 0, it means that the physical memory is not enough.

so: The size of the virtual memory written to the disk per second.

io: Disk io information

bi: The number of blocks received from the block device per second.

bo: The number of blocks sent to the block device per second.

If these two values are larger, it means that IO is more frequent, and IO optimization can be considered.

system: system status information

in: CPU interrupts per second (including clock interrupts)

cs: The number of context switches per second. When we call system functions and thread switching, we need context switching. If this value is too large, we can consider reducing system context switching, such as coroutines instead of multithreading.

CPU: CPU information

us: Including user time and nice time, time to run non-kernel code (or user code).

sy: The time occupied by the system, the time occupied by running kernel code (such as system calls).

id: CPU time spent on idle.

wa: wait for IO CPU time. If this value is too large, it means that the IO system bottleneck is IO.

If the CPU occupy high representation system in the CPU on, if the system swap more frequently, the system is likely to be a memory leak or memory is not enough, you need expanded memory, if it is IO wait more then the system bottleneck On IO , if context switching or system calls account for too much, we need to think about the design of our program to reduce system calls or context switching.

3.2 CPU usage is too high

We can use some tools such as uptime, top, mpstat or sar to check the current high CPU usage.

We can look at the overall situation of the current system through uptime, the current system time and running time, the number of users logged in, and the average load of the system in the last 5, 10, and 15 minutes.

top can display more detailed information. The head part has detailed information about the CPU usage, and the following list also records the CPU usage of each process.

If it is multi-threaded, we can also use top -H -p pid to view the CPU usage of each thread of the process

After we find which thread occupies the most proportion, we can check what processing the thread is used for according to the thread name of this thread. Understand roughly what kind of processing makes the CPU higher.

mpstat can view the operating status of each core of the system.

The function of sar is relatively complete, so science is no longer done here.

The CPU user mode occupies relatively high. Generally, the efficiency of our program writing is too low. If it is low, we can use the perf tool or Intel's vtunes to check the performance bottleneck. The execution result of perf top is shown in the figure below. After we get the corresponding stack information, we can eliminate the CPU bottleneck in a targeted manner. (The usage of vtune can be Googled by yourself).

In view of the situation checked by the above tools, if the CPU is indeed high, then the CPU is basically a performance bottleneck. If it is not high, you need to proceed to the next step to determine the performance bottleneck.

3.3 IO occupancy is too high

There are many tools for IO positioning. Generally, we can use iostat, pidstat and iotop tools to check IO problems. Of course, we can also use other tools. You can search for related tools by yourself. Here we mainly introduce several commonly used tools.

pidstat

Pidstat is a command of the sysstat tool, which is used to monitor the occupancy of system resources such as cpu, memory, thread, device IO, etc. of all or specified processes. The user can obtain the required statistical information by specifying the number and time of statistics.

We can know which process occupies more IO through this command. Then we can view more detailed information by specifying the process number.

In this way, we can know which thread in which process occupies more IO resources, and then we can find the corresponding execution code for analysis through the corresponding TID.

iostat

Iostat is the abbreviation of I/O statistics (input/output statistics), which can monitor the disk operation activities of the system and report disk activity statistics. But iostat only performs statistics on the overall situation of the system, and cannot perform in-depth analysis of a certain process. For individual process analysis, we can use the iotop tool, which is similar to top.

1 means to print the statistics of the current disk once every second. We need to pay attention to the following indicators.

avgrq-sz	Average amount of data per IO operation (number of sectors as unit)
avgqu-sz	Average queue length of IO requests waiting to be processed
await	Average waiting time for each IO request (including waiting time and processing time, in milliseconds)
svctm	Average processing time of each IO request (in milliseconds)
%util	The ratio of time used for IO operations in the adoption cycle, that is, the ratio of time when the IO queue is not empty

avgrq-sz directly reflects the current type of io, such as reading large blocks of data or reading small amounts of data.

avgqu-sz reflects the current busy situation of IO. If the queue length is too long, it means that IO is very busy and cannot handle many tasks. In other words, I and IO have become the bottleneck.

The same is true for await. If the wait is relatively high, it means that IO has become a burden.

Svctm, like avgrq-sz, reflects the processing scale of IO operations. If it is a large block of data read and write, this time will be lengthened.

iotop

iotop can be used to view which processes are executing the occupied I/O, and the usage is similar to top, so I won’t describe it too much here.

3.4 Other situations

If the TOP occupancy is not high and the IO is not the bottleneck, it may be in the program architecture, for example, the concurrency control is not good enough, and more threads are in sleep state. In this case, you can use pstack to see the stacks of all current threads.

4. Optimize performance bottlenecks

CPU bottleneck type

Faced with this type, generally we need to perf with the corresponding code. The core idea is to reduce the amount of calculation. The specific method is for reference only:

SIMD is often used to replace old-fashioned calculation instructions or C++ operators. The MKL library similar to Intel can be introduced to assist calculations.
Reduce unnecessary repeated calculations and reduce the number of for loops. For example, some data structures of the std library have find functions with start coordinates. Make good use of the start coordinates to avoid repeated queries from 0 coordinates.
If there are too many system calls, such as allocating memory, you can consider the way of pre-allocating memory, or directly use tcmalloc and other similar memory management libraries for the bottom line. If possible, you can develop your own memory management based on this type of library. system

IO bottleneck type

IO bottlenecks are generally related to disks. On the network, due to the upgrade of the network card, the speed is faster. In contrast, the restricted io is basically the io on the disk. The following will only talk about the IO optimization method of the disk.

If a read request causes an IO bottleneck, you can consider opening more caches at the upper level. For example, the global query cache, session-level session cache, block cache of block devices, etc., reduce disk io requests from the upper level.
If the IO bottleneck is caused by small data and large concurrent writes, we can consider doing a cache in memory, and process these multiple writes in memory first, and then use time or size thresholds and other policy controls to flush to disk. on.
If it is the writing of big data, we can consider doing smooth writing, limiting the number of writes each time.
If it is because of traffic that a peak occurs at a certain point and then falls back, you can consider writing through a third party. For example, the message queue is first written to the message queue i for peak clipping, and then smoothly written into the system.
In addition, we can also change to better hardware, such as disk arrays.

Memory bottleneck type

Memory bottlenecks are generally difficult to appear. After all, memory is relatively cheap and basically meets the demand for memory. If it is really because of the problem of virtual memory that causes the program to run inefficiently, we should consider increasing the memory and turning off the virtual memory to solve the problem. At the same time, we should also think about our own program model, such as reducing the existence of intermediate data and using copy-on-write Technology, use the no copy interface of the system to replace the old interface and so on.

5. Follow-up

If there is really no way to optimize, we really need to see if the current query is really suitable for our system. Again, every system has its own business. Generally, there will be multiple database engines in the system of a company. Find the right one for our query.