About Author: Alibaba Cloud Database OLAP Product Department-Zihua
a background
AnalyticDB PostgreSQL version (referred to as ADB PG) is a cloud-native data warehouse product built by the Alibaba Cloud database team based on the PostgreSQL kernel (referred to as PG). In business scenarios such as real-time interactive data analysis, HTAP, ETL, and BI report generation, ADB PG has unique technical advantages. It has a wide range of applications in industries such as finance, logistics, and Internet. Go to T and replace the benchmark cloud data warehouse product built by Greenplum.
Data warehouse products are one of the important components of data analysis systems, and various online businesses have high requirements for the stability and availability of data warehouse products. The lack of an effective resource management mechanism will lead to a decrease in the stability of database products, such as the number of connections hitting the OS limit, insufficient memory, process jams, etc., which will affect the usability of the product.
Resource Queue (resource queue) is a resource management method of ADB PG, which can limit the CPU, memory and other resources of the database, and has a certain effect on multi-tenant resource restriction and ensuring the stable operation of the database. As the name implies, Resource Queue manages the SQL running on the database cluster in the form of a queue. For each user, all his connections can only belong to one queue. And for each queue can manage multiple user connections. Users who do not display the specified resource queue will belong to the default resource queue management. By limiting the total amount of resources in each queue, we can achieve the goal of limiting the total amount of resources used by a certain type of business or a certain user.
Let's take customer A of an online trading platform of ADB PG as an example to introduce the use of Resource Queue. Customer A builds a data warehouse based on ADB PG, and runs three types of business daily: real-time business represented by transaction data storage; report business for supporting decision analysis; and visualization business for real-time large-screen display. According to the different characteristics of the three types of services, we configure resource queues according to the following strategies.
A typical representative of customer A's real-time business is that transaction data is written into ADB PG in real time via the Kafka->Flink->ADB PG link. The typical characteristics of this type of business are that the peak concurrency is relatively large and the consumption of a single SQL resource is small. Before the resource queue restriction, the sudden increase in concurrent queries often filled up database connections during peak business hours, causing the execution of high-availability probing queries to fail and causing instances to become unavailable. For this type of business, we associate it with a queue with high CPU weight, large concurrency limit (within the safety threshold), and a low share of single SQL memory. It not only guarantees the fast storage of data, but also prevents the system from being unstable due to flow peaks.
Another typical business of customer A is report and ETL business, which will be scheduled during the low peak period of real-time business and generate reports to provide decision support. This type of business involves a large amount of data, consumes a large amount of memory and generates a large amount of temporary files. For this type of business, we associate it with low CPU weight, low concurrency limit, but high memory share queues, while meeting business needs, control the upper limit of memory usage;
In addition, the customer also supports real-time visual display of data based on the ADB PG data warehouse. This type of visualization scheme often has very stable concurrency, but has certain requirements for query delay. For this type of business, we set its resource queue to high CPU weight, low concurrency limit, and a broad optimizer query plan consumption share, and generate a good query plan to the greatest extent to ensure business stability.
Next, this article will specifically introduce the use of Resource Queue, status monitoring, and its implementation mechanism.
Two Introduction to Resource Queue
Resource Queue supports SQL configuration and supports four types of resource restrictions: concurrency limit, CPU limit, memory limit and query plan limit. Users can define multiple resource queues in the database through SQL, and set the resource limit of each resource queue. In a database, each resource queue can be associated with one or more database users, and each database user can only belong to a single resource queue.
In addition, not all SQL submitted to the resource queue will be restricted by the queue limit. The database will only limit the resource utilization of SELECT, SELECT INTO, CREATE TABLE AS SELECT, DECLARE CURSOR, INSERT, UPDATE, and DELETE. In addition, the SQL run during the execution of the EXPLAIN ANALYZE command will also be excluded from the resource queue.
The resource limit configuration supported by the resource queue is as follows:
Configuration name | Configuration description |
16013 | |
ACTIVE_STATEMENTS | The number of queries allowed to run simultaneously in the queue. Queries that exceed this set value need to be queued for execution. |
PRIORITY | The CPU usage priority of the queue can be set to the following levels: LOW, MEDIUM, HIGH and MAX. The default value is MEDIUM, the higher the priority queue will be allocated the higher the CPU share. |
MAX_COST | query plan consumption limit. |
`
CREATE RESOURCE QUEUE etl WITH (ACTIVE_STATEMENTS=3,
MEMORY_LIMIT='1GB', PRIORITY=LOW, MAX_COST=-1.0);
`
ACTIVE\_STATEMENTS:
The number of queries allowed to run at the same time in the queue, that is, the maximum concurrent value allowed for concurrent queries in the queue. The database allows links exceeding the number of ACTIVE\_STATEMENTS but less than the maximum number of database connections MAX\_CONNECTIONS to connect to the database, but this part of the SQL connection will not start running immediately, but will wait in line.
MAX\_COST:
Cost limit for query plan. The database optimizer will calculate the Cost for each query. If the total Cost exceeds the MAX\_COST value set by the resource queue, the query will be rejected. The default configuration of ADB PG is 0, which means it is not restricted.
Memory Limit:
ADB PG can determine the upper limit of memory used by each SQL on each segment by setting statement\_mem. The Memory Limit has neither a default value, nor can it be specified. When the MEMORY\_LIMIT parameter is not configured, the allowed memory size of a piece of SQL in a resource queue is determined by the statement\_mem parameter:
* If MEMORY\_LIMIT is not set for a resource queue, the memory size allocated for each resource is the server configuration parameter of statement\_mem, and the available memory size of a resource queue is based on the calculation results of statement\_mem and ACTIVE\_STATEMENTS.
* When the resource queue has MEMORY\_LIMIT set, the amount of memory used by a single SQL will be determined by the average allocation value in the queue (MEMORY\_LIMIT/ACTIVE\_STATEMENTS) and the maximum value in statement\_mem, the specific calculation method can refer to Implement the chapter later.
The number of queries that can be executed in parallel in a resource queue is limited by the available memory of the queue. For example: for the queue etl, set STATEMENTS=3 and MEMORY\_LIMIT=2.1G; then if statement\_mem is not set, each query uses 700MB of memory by default.
SQL1 enters the queue and uses 700MB of memory. At this time, the remaining memory of the queue is 1.4G;
SQL2 enters the queue and sets statement\_mem to 1.0GB, at this time the remaining memory of the queue is 0.4GB;
At this time, the remaining memory in the queue cannot meet the memory usage requirements of SQL3 (default 700GB), so although the number of parallel queries in the queue does not reach the queue limit, SQL3 still cannot be executed and needs to be queued.
PRIORITY:
The SQL running in the database will share the available CPU resources according to the priority setting of the resource queue where it is located. When a statement from the high-priority queue enters the active running statement group, it can get a higher share of the available CPU, and it will also reduce the share of the statements that are already running in the queue with a lower priority setting.
The relative size or complexity of the query does not affect the CPU allocation. If a simple low-cost query is run at the same time as a large complex query, and their priority settings are the same, they will be allocated the same share of available CPU resources. When a new query becomes active, the CPU share will be recalculated, but queries with equal priority will still get the same amount of CPU.
For example, the administrator creates three resource queues: streaming, etl, and prod, and configures them accordingly with the following priorities:
* streaming — low priority
* etl—High priority
* prod — maximum priority
When there are queries 1 and 2 running at the same time in the etl queue in the database, they have an equal share of CPU because their priority settings are equal:
`
CREATE RESOURCE QUEUE etl WITH (ACTIVE_STATEMENTS=3);
`
This means that for all roles assigned to the etl resource queue, only three active queries can be run on this system at any given time. If this queue already has three queries running and a role submits the fourth query in the queue, the fourth query can only be run after a slot is released.
###
### Create a queue with memory limit
The resource queue with MEMORY\_LIMIT setting controls the total memory of all queries submitted through the queue. When used in conjunction with ACTIVE\_STATEMENTS, the default amount of memory allocated for each query is: MEMORY\_LIMIT /ACTIVE\_STATEMENTS.
For example, to create a resource queue with an active query limit of 10 and a total memory limit of 2000MB (each query will be allocated 200MB of segment host memory when it is executed):
`
CREATE RESOURCE QUEUE etl WITH (ACTIVE_STATEMENTS=10,
MEMORY_LIMIT='2000MB');
`
In addition, the gp\_vmem\_protect\_limit parameter will limit the total size of memory allocated on a segment. This parameter has a higher priority. If this parameter exceeds the limit, the query may be cancelled.
###
### Set priority
In order to control the consumption of available CPU resources by a resource queue, the user can assign an appropriate priority.
`
ALTER RESOURCE QUEUE etl WITH (PRIORITY=LOW);
ALTER RESOURCE QUEUE etl WITH (PRIORITY=MAX);
`
## 3.2 Assign roles (users) to resource queues
Once a resource queue is created, users must assign roles (users) to their appropriate resource queue. If the roles are not explicitly assigned to the resource queue, they will enter the default resource queue pg\_default.
Use the ALTER ROLE or CREATE ROLE command to assign roles to the resource queue. E.g:
`
ALTER ROLE name RESOURCE QUEUE queue_name;
CREATE ROLE name WITH LOGIN RESOURCE QUEUE queue_name;
`
### from resource queue
All users must be assigned to the resource queue. If it is not explicitly assigned to a specific queue, the user will enter the default resource queue pg\_default. If a user wants to remove a role from a resource queue and put them in the default queue, the queue assignment of the role can be changed to none. E.g:
`
ALTER ROLE role_name RESOURCE QUEUE none;
`
## 3.3 Modify resource queue
After the resource queue is created, the user can use the ALTER RESOURCE QUEUE command to change the queue limit, or use the DROP RESOURCE QUEUE command to remove a resource queue.
### Modify resource queue configuration
The ALTER RESOURCE QUEUE command changes the limit of the resource queue. To change the limit of a resource queue, you can specify the desired new value for the queue. E.g:
`
ALTER RESOURCE QUEUE etl WITH (ACTIVE_STATEMENTS=5);
ALTER RESOURCE QUEUE etl WITH (PRIORITY=MAX);
`
### Delete resource queue
The DROP RESOURCE QUEUE command can delete the resource queue. To delete a resource queue, the queue cannot have a role assigned to it, nor can there be any statements waiting in it.
`
DROP RESOURCE QUEUE etl;
`
# Four Resource Queue status monitoring
##
## 4.1 View the statements in the queue and the status of the resource queue
The gp\_toolkit.gp\_resqueue\_status view allows users to view the status and activity of a load management resource queue. For a specific resource queue, it shows how many queries are waiting to run and how many queries are currently active in the system. To view the resource queues created in the system, their restriction attributes and current status:
`
SELECT * FROM gp_toolkit.gp_resqueue_status;
`
`
SELECT * FROM pg_stat_resqueues;
`
## 4.3 View the roles assigned to the resource queue
To view the roles assigned to the resource queue, execute the following queries on the pg\_roles and gp\_toolkit.gp\_resqueue\_status system catalog tables:
`
SELECT rolname, rsqname FROM pg_roles, gp_toolkit.gp_resqueue_status
WHERE pg_roles.rolresqueue=gp_toolkit.gp_resqueue_status.queueid;
`
## 4.4 View the waiting query of the resource queue
Users can see all currently active and waiting queries of all resource queues:
`
SELECT * FROM gp_toolkit.gp_locks_on_resqueue WHERE
lorwaiting='true';
`
If this query does not return results, it means that there are currently no statements waiting in the resource queue.
## 4.5 View the priority of active sentences
View the statement currently being executed and provide priority, session ID and other information:
`
SELECT * FROM gp_toolkit.gp_resq_priority_statement;
`
## 4.6 Reset the priority of active sentences
The user can use the function gp\_adjust\_priority(session\_id, statement\_count, priority) to adjust the priority of the statement currently being executed. Using this function, users can increase or decrease the priority of any query. E.g:
`
SELECT gp_adjust_priority(12, 10000, 'LOW');
`
In the parameters of this function, session\_id represents the session id, statement\_count represents the sequence number of the SQL to be adjusted in the session, and priority is the priority to be adjusted. The above information of the existing statement can be queried through the gp\_resq\_priority\_statement view.
`
select * from gp_toolkit.gp_resq_priority_statement;
`
# Five Resource Queue implementation
The ADB PG database is an MPP architecture, which is divided into one or more Masters and multiple segments. Data can be distributed randomly, hashed, and replicated among multiple segments. In ADB PG, the resource limit level of Resource Queue is the statement level, that is, at any moment of the entire SQL execution, regardless of whether it is in a transaction, it will be restricted by the resource queue.
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。