Introduction
This article describes how to run a performance test on Hologres based on the TPC-H data set, and provides reference test results to help you choose product specifications.
Background Information
TPC-H is a decision-support benchmark developed by the Transaction Processing Performance Council (TPC) to simulate decision-support applications. It is widely used in academia and industry to evaluate the performance of decision-support systems. TPC-H is modeled on a real production environment and simulates the data warehouse of a sales system. It contains 8 tables, and the data volume can be scaled from 1 GB to 3 TB. The benchmark contains 22 queries, and the main metric is the response time of each query, that is, the time from submitting the query to receiving the result. The test results comprehensively reflect the system's query-processing capability. For details, see the TPC-H documentation.
Data set introduction
The data set contains the following 8 tables: LINEITEM, ORDERS, PARTSUPP, PART, CUSTOMER, SUPPLIER, NATION, and REGION. The relationships between them are shown in the figure below.
Test details
Test data volume description
The amount of test data directly affects the test results. The TPC-H data generator uses a scale factor (SF) to control the size of the generated data; 1 SF corresponds to 1 GB.
Note: The data volumes above refer to raw data only and exclude the space occupied by indexes, so reserve extra space when preparing the environment.
Test environment
This test uses an exclusive instance (pay-as-you-go). Since the instance is used only for testing, an 8-core, 32 GB configuration is selected for compute resources.
Test scenarios
This test consists of three parts:
- Basic environment preparation: prepares the ECS and Hologres instances and the data required by the OLAP query and Key/Value point query scenarios.
- OLAP query scenario test: mainly uses column-oriented tables and directly runs the 22 TPC-H queries.
- Key/Value point query scenario test: mainly uses a row-oriented table; after building a row-oriented version of the orders table, point lookups filtered on the primary key are executed.
Basic environment preparation
1. Create an ECS instance
Log in to Alibaba Cloud and create an ECS instance that will be used to generate data, import it into Hologres, and run the client tests. Suggested specifications:
- ecs.g6.4xlarge
- CentOS 7.9
- ESSD data disk, with capacity sized to the amount of data to be tested
- Place the ECS and Hologres instances in the same region and VPC
2. Create Hologres instance
- Log in to Alibaba Cloud, open the Hologres console, and click to add an engine instance
- Select a configuration and enter an instance name. For details, see the official documentation.
3. Create a test database
- After creating the instance, log in to it and create a database. In this test the database is named tpch_1sf. For detailed steps, see the official documentation.
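- If you prefer the command line, the database can also be created with psql; a minimal sketch, assuming the instance exposes the default postgres database and that the placeholders are replaced with your instance's endpoint and credentials:
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d postgres -c "CREATE DATABASE tpch_1sf;"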
Generate TPC-H data
1. Prepare data generation tools
- Connect to the ECS instance remotely
- Update all packages
yum update
- Install git
yum install git
- Install gcc
yum install gcc
- Download TPC-H data generation code
git clone https://github.com/gregrahn/tpch-kit.git
- Enter the data generation tool code directory
cd tpch-kit/dbgen
- Compile the data generation tool code
make
2. Generate data
- After compilation succeeds, you can view the generator's options with the following command.
./dbgen --help
- This test generates only 1 GB of data, so run the following command to generate the data.
./dbgen -vf -s 1
If you need a larger data set, increase the SF parameter. For example, the following command generates 1 TB of data:
./dbgen -vf -s 1000
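For very large scale factors, it can help to generate the data in chunks, optionally in parallel. A hedged sketch, assuming your tpch-kit build exposes the -C (number of chunks) and -S (which chunk to build) options listed by ./dbgen --help:
# generate chunk 1 of 10 of an SF=1000 data set; repeat with -S 2 .. 10, serially or in parallel
./dbgen -vf -s 1000 -C 10 -S 1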
- Under normal circumstances, a 32 CU instance can run TPC-H SF10 and a 256 CU instance can run TPC-H SF50
- After the data is generated, you can list the generated files with the following command. The generator produces 8 data files, one for each table in the data set.
ls | grep '.*.tbl'
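If generation succeeded, the listing should look similar to the following, one .tbl file per table:
customer.tbl
lineitem.tbl
nation.tbl
orders.tbl
part.tbl
partsupp.tbl
region.tbl
supplier.tbl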
OLAP query scenario test
Prepare the data
1. Create tables
- Since this article mainly uses psql to import data, run the following command on the ECS instance to install psql
yum install postgresql-server
- After installing psql, use the following command to connect to the Hologres instance
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database>
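For illustration, a filled-in invocation might look like the following (the AccessKey pair, port, and endpoint here are hypothetical; use your own instance's values):
PGUSER=LTAI****** PGPASSWORD=****** psql -p 80 -h hgpostcn-cn-xxxxxx-cn-hangzhou.hologres.aliyuncs.com -d tpch_1sf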
- After connecting to Hologres with psql, use the following DDL statements to create the tables
DROP TABLE IF EXISTS LINEITEM;
BEGIN;
CREATE TABLE LINEITEM
(
L_ORDERKEY INT NOT NULL,
L_PARTKEY INT NOT NULL,
L_SUPPKEY INT NOT NULL,
L_LINENUMBER INT NOT NULL,
L_QUANTITY DECIMAL(15,2) NOT NULL,
L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
L_DISCOUNT DECIMAL(15,2) NOT NULL,
L_TAX DECIMAL(15,2) NOT NULL,
L_RETURNFLAG TEXT NOT NULL,
L_LINESTATUS TEXT NOT NULL,
L_SHIPDATE TIMESTAMPTZ NOT NULL,
L_COMMITDATE TIMESTAMPTZ NOT NULL,
L_RECEIPTDATE TIMESTAMPTZ NOT NULL,
L_SHIPINSTRUCT TEXT NOT NULL,
L_SHIPMODE TEXT NOT NULL,
L_COMMENT TEXT NOT NULL,
PRIMARY KEY (L_ORDERKEY,L_LINENUMBER)
);
CALL set_table_property('LINEITEM', 'clustering_key', 'L_SHIPDATE,L_ORDERKEY');
CALL set_table_property('LINEITEM', 'segment_key', 'L_SHIPDATE');
CALL set_table_property('LINEITEM', 'distribution_key', 'L_ORDERKEY');
CALL set_table_property('LINEITEM', 'bitmap_columns', 'L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
CALL set_table_property('LINEITEM', 'dictionary_encoding_columns', 'L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
CALL set_table_property('LINEITEM', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS ORDERS;
BEGIN;
CREATE TABLE ORDERS
(
O_ORDERKEY INT NOT NULL PRIMARY KEY,
O_CUSTKEY INT NOT NULL,
O_ORDERSTATUS TEXT NOT NULL,
O_TOTALPRICE DECIMAL(15,2) NOT NULL,
O_ORDERDATE timestamptz NOT NULL,
O_ORDERPRIORITY TEXT NOT NULL,
O_CLERK TEXT NOT NULL,
O_SHIPPRIORITY INT NOT NULL,
O_COMMENT TEXT NOT NULL
);
CALL set_table_property('ORDERS', 'segment_key', 'O_ORDERDATE');
CALL set_table_property('ORDERS', 'distribution_key', 'O_ORDERKEY');
CALL set_table_property('ORDERS', 'colocate_with', 'LINEITEM');
CALL set_table_property('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');
CALL set_table_property('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');
CALL set_table_property('ORDERS', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS PARTSUPP;
BEGIN;
CREATE TABLE PARTSUPP
(
PS_PARTKEY INT NOT NULL,
PS_SUPPKEY INT NOT NULL,
PS_AVAILQTY INT NOT NULL,
PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
PS_COMMENT TEXT NOT NULL,
PRIMARY KEY(PS_PARTKEY,PS_SUPPKEY)
);
CALL set_table_property('PARTSUPP', 'distribution_key', 'PS_PARTKEY');
CALL set_table_property('PARTSUPP', 'colocate_with', 'LINEITEM');
CALL set_table_property('PARTSUPP', 'bitmap_columns', 'PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY,PS_COMMENT');
CALL set_table_property('PARTSUPP', 'dictionary_encoding_columns', 'PS_COMMENT');
CALL set_table_property('PARTSUPP', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS PART;
BEGIN;
CREATE TABLE PART
(
P_PARTKEY INT NOT NULL PRIMARY KEY,
P_NAME TEXT NOT NULL,
P_MFGR TEXT NOT NULL,
P_BRAND TEXT NOT NULL,
P_TYPE TEXT NOT NULL,
P_SIZE INT NOT NULL,
P_CONTAINER TEXT NOT NULL,
P_RETAILPRICE DECIMAL(15,2) NOT NULL,
P_COMMENT TEXT NOT NULL
);
CALL set_table_property('PART', 'distribution_key', 'P_PARTKEY');
CALL set_table_property('PART', 'colocate_with', 'LINEITEM');
CALL set_table_property('PART', 'bitmap_columns', 'P_PARTKEY,P_SIZE,P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');
CALL set_table_property('PART', 'dictionary_encoding_columns', 'P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');
CALL set_table_property('PART', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS CUSTOMER;
BEGIN;
CREATE TABLE CUSTOMER
(
C_CUSTKEY INT NOT NULL PRIMARY KEY,
C_NAME TEXT NOT NULL,
C_ADDRESS TEXT NOT NULL,
C_NATIONKEY INT NOT NULL,
C_PHONE TEXT NOT NULL,
C_ACCTBAL DECIMAL(15,2) NOT NULL,
C_MKTSEGMENT TEXT NOT NULL,
C_COMMENT TEXT NOT NULL
);
CALL set_table_property('CUSTOMER', 'distribution_key', 'C_CUSTKEY');
CALL set_table_property('CUSTOMER', 'colocate_with', 'LINEITEM');
CALL set_table_property('CUSTOMER', 'bitmap_columns', 'C_CUSTKEY,C_NATIONKEY,C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');
CALL set_table_property('CUSTOMER', 'dictionary_encoding_columns', 'C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');
CALL set_table_property('CUSTOMER', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS SUPPLIER;
BEGIN;
CREATE TABLE SUPPLIER
(
S_SUPPKEY INT NOT NULL PRIMARY KEY,
S_NAME TEXT NOT NULL,
S_ADDRESS TEXT NOT NULL,
S_NATIONKEY INT NOT NULL,
S_PHONE TEXT NOT NULL,
S_ACCTBAL DECIMAL(15,2) NOT NULL,
S_COMMENT TEXT NOT NULL
);
CALL set_table_property('SUPPLIER', 'distribution_key', 'S_SUPPKEY');
CALL set_table_property('SUPPLIER', 'colocate_with', 'LINEITEM');
CALL set_table_property('SUPPLIER', 'bitmap_columns', 'S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_COMMENT');
CALL set_table_property('SUPPLIER', 'dictionary_encoding_columns', 'S_NAME,S_ADDRESS,S_PHONE,S_COMMENT');
CALL set_table_property('SUPPLIER', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS NATION;
BEGIN;
CREATE TABLE NATION(
N_NATIONKEY INT NOT NULL PRIMARY KEY,
N_NAME text NOT NULL,
N_REGIONKEY INT NOT NULL,
N_COMMENT text NOT NULL
);
CALL set_table_property('NATION', 'distribution_key', 'N_NATIONKEY');
CALL set_table_property('NATION', 'colocate_with', 'LINEITEM');
CALL set_table_property('NATION', 'bitmap_columns', 'N_NATIONKEY,N_NAME,N_REGIONKEY,N_COMMENT');
CALL set_table_property('NATION', 'dictionary_encoding_columns', 'N_NAME,N_COMMENT');
CALL set_table_property('NATION', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS REGION;
BEGIN;
CREATE TABLE REGION
(
R_REGIONKEY INT NOT NULL PRIMARY KEY,
R_NAME TEXT NOT NULL,
R_COMMENT TEXT
);
CALL set_table_property('REGION', 'distribution_key', 'R_REGIONKEY');
CALL set_table_property('REGION', 'colocate_with', 'LINEITEM');
CALL set_table_property('REGION', 'bitmap_columns', 'R_REGIONKEY,R_NAME,R_COMMENT');
CALL set_table_property('REGION', 'dictionary_encoding_columns', 'R_NAME,R_COMMENT');
CALL set_table_property('REGION', 'time_to_live_in_seconds', '31536000');
COMMIT;
- Once the tables are created, you can run the following command in psql to check that they were created successfully
tpch_1sf=# \dt
- If successful, the output looks like the following
tpch_1sf=# \dt
List of relations
Schema | Name | Type | Owner
--------+----------+-------+--------------------
public | customer | table | tpch_1sf_developer
public | lineitem | table | tpch_1sf_developer
public | nation | table | tpch_1sf_developer
public | orders | table | tpch_1sf_developer
public | part | table | tpch_1sf_developer
public | partsupp | table | tpch_1sf_developer
public | region | table | tpch_1sf_developer
public | supplier | table | tpch_1sf_developer
(8 rows)
2. Import data
- This test mainly uses COPY FROM STDIN to import data; for details, see the official documentation. The .tbl data files generated earlier are imported into the tables created in Hologres. Run the following shell script in the data generation tool's directory to import the data:
for i in `ls *.tbl`; do
echo $i;
name=`echo $i| cut -d'.' -f1`;
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY $name from stdin with delimiter '|' csv;" < $i;
done
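Note: depending on the dbgen build, each row in the .tbl files may end with a trailing '|'. If COPY rejects rows with an extra-column error, you can strip the trailing delimiter first (this rewrites the files in place):
sed -i 's/|$//' *.tbl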
- At this point the data import is complete
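To sanity-check the import, you can compare each table's row count with the line count of its .tbl file; a minimal sketch using the same connection placeholders as above:
for t in region nation supplier customer part partsupp orders lineitem; do
  # each count should match: wc -l $t.tbl
  PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "SELECT '$t' AS table_name, count(*) FROM $t;"
done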
3. Collect statistics
- To help queries execute better, run the following statements in psql so that Hologres collects statistics on each table.
vacuum region;
vacuum nation;
vacuum supplier;
vacuum customer;
vacuum part;
vacuum partsupp;
vacuum orders;
vacuum lineitem;
analyze nation;
analyze region;
analyze lineitem;
analyze orders;
analyze customer;
analyze part;
analyze partsupp;
analyze supplier;
Execute query
- To make it easy to collect query statistics, this test uses the pgbench tool, which you can install with the following command. (If pgbench is already present on the test machine, make sure its version is later than 9.6, preferably major version 13 or later; otherwise the tests below may run into various compatibility problems.)
yum install postgresql-contrib
- For convenience, you can download the 22 required SQL files directly from the following link
- Then upload the archive to the ECS instance
- Log in to the ECS instance, change to the upload directory, and decompress the file with the following shell command
unzip tpch_data_tpch_query
- At this point the preparations are complete and you can test with pgbench. Use the following command to execute a single query
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f xxx.sql
- Parameter description

Parameter | Meaning | Description / Example |
---|---|---|
-h | Endpoint of the Hologres instance | Available in the Hologres console |
-p | Port of the Hologres instance | Available in the Hologres console |
-d | Name of the target database in the Hologres instance | |
-c | Number of clients (concurrency) | Example: 1. This test measures query performance only, not concurrency, so concurrency is set to 1 |
-t | Number of stress-test queries each client executes | Example: 50 |
-f | SQL file to stress test | Example: 6.sql |
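For example, a single-client run of query 6, executed 50 times (endpoint and credentials are placeholders as above, and the database name assumes the tpch_1sf database from this walkthrough):
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d tpch_1sf -c 1 -t 50 -n -f ./tpch_data_tpch_query/6.sql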
- Alternatively, you can run the following shell script to execute the 22 queries in batch and write the results to the file hologres_tpch_test.out
rm -f hologres_tpch_test.out
echo `date +"%Y-%m-%d %H:%M:%S"` begin >> ./hologres_tpch_test.out
for i in {1..22}
do
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f ./tpch_data_tpch_query/${i}.sql >> ./hologres_tpch_test.out
done
Check hologres_tpch_test.out for the query results; a sample is shown below
- transaction type: the SQL file that was executed
- latency average: the average latency of the 3 runs of that SQL file
2021-03-23 03:50:54 begin
pghost: hgpostcn-cn-oew21c935002-cn-hangzhou.hologres.aliyuncs.com pgport: 80 nclients: 1 nxacts: 3 dbName: tpch_100
transaction type: ./tpch_data_tpch_query/1.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 3
number of transactions actually processed: 3/3
latency average = 76.936 ms
tps = 12.997850 (including connections establishing)
tps = 15.972757 (excluding connections establishing)
...
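To pull just the executed file names and their average latencies out of the report, a simple grep over the output file is usually enough:
grep -E 'transaction type|latency average' hologres_tpch_test.out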
The 22 TPC-H query statements
Q1
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '90' day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
Q2
select
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
from
part,
supplier,
partsupp,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = 15
and p_type like '%BRASS'
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'EUROPE'
and ps_supplycost = (
select
min(ps_supplycost)
from
partsupp,
supplier,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'EUROPE'
)
order by
s_acctbal desc,
n_name,
s_name,
p_partkey
limit 100;
Q3
select
l_orderkey,
sum(l_extendedprice * (1 - l_discount)) as revenue,
o_orderdate,
o_shippriority
from
customer,
orders,
lineitem
where
c_mktsegment = 'BUILDING'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < date '1995-03-15'
and l_shipdate > date '1995-03-15'
group by
l_orderkey,
o_orderdate,
o_shippriority
order by
revenue desc,
o_orderdate
limit 10;
Q4
select
o_orderpriority,
count(*) as order_count
from
orders
where
o_orderdate >= date '1993-07-01'
and o_orderdate < date '1993-07-01' + interval '3' month
and exists (
select
*
from
lineitem
where
l_orderkey = o_orderkey
and l_commitdate < l_receiptdate
)
group by
o_orderpriority
order by
o_orderpriority;
Q5
select
n_name,
sum(l_extendedprice * (1 - l_discount)) as revenue
from
customer,
orders,
lineitem,
supplier,
nation,
region
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and l_suppkey = s_suppkey
and c_nationkey = s_nationkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'ASIA'
and o_orderdate >= date '1994-01-01'
and o_orderdate < date '1994-01-01' + interval '1' year
group by
n_name
order by
revenue desc;
Q6
select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1994-01-01'
and l_shipdate < date '1994-01-01' + interval '1' year
and l_discount between 6 - 1 and 6 + 1
and l_quantity < 2400;
Q7
set hg_experimental_enable_double_equivalent=on;
select
supp_nation,
cust_nation,
l_year,
sum(volume) as revenue
from
(
select
n1.n_name as supp_nation,
n2.n_name as cust_nation,
extract(year from l_shipdate) as l_year,
l_extendedprice * (1 - l_discount) as volume
from
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2
where
s_suppkey = l_suppkey
and o_orderkey = l_orderkey
and c_custkey = o_custkey
and s_nationkey = n1.n_nationkey
and c_nationkey = n2.n_nationkey
and (
(n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
)
and l_shipdate between date '1995-01-01' and date '1996-12-31'
) as shipping
group by
supp_nation,
cust_nation,
l_year
order by
supp_nation,
cust_nation,
l_year;
Q8
set hg_experimental_enable_double_equivalent=on;
select
o_year,
sum(case
when nation = 'BRAZIL' then volume
else 0
end) / sum(volume) as mkt_share
from
(
select
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) as volume,
n2.n_name as nation
from
part,
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2,
region
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'AMERICA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01' and date '1996-12-31'
and p_type = 'STANDARD POLISHED TIN'
) as all_nations
group by
o_year
order by
o_year;
Q9
set hg_experimental_enable_double_equivalent=on;
select
nation,
o_year,
sum(amount) as sum_profit
from
(
select
n_name as nation,
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
part,
supplier,
lineitem,
partsupp,
orders,
nation
where
s_suppkey = l_suppkey
and ps_suppkey = l_suppkey
and ps_partkey = l_partkey
and p_partkey = l_partkey
and o_orderkey = l_orderkey
and s_nationkey = n_nationkey
and p_name like '%green%'
) as profit
group by
nation,
o_year
order by
nation,
o_year desc;
Q10
select
c_custkey,
c_name,
sum(l_extendedprice * (1 - l_discount)) as revenue,
c_acctbal,
n_name,
c_address,
c_phone,
c_comment
from
customer,
orders,
lineitem,
nation
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate >= date '1993-10-01'
and o_orderdate < date '1993-10-01' + interval '3' month
and l_returnflag = 'R'
and c_nationkey = n_nationkey
group by
c_custkey,
c_name,
c_acctbal,
c_phone,
n_name,
c_address,
c_comment
order by
revenue desc
limit 20;
Q11
select
ps_partkey,
sum(ps_supplycost * ps_availqty) as value
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'GERMANY'
group by
ps_partkey having
sum(ps_supplycost * ps_availqty) > (
select
sum(ps_supplycost * ps_availqty) * 0.0000010000
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'GERMANY'
)
order by
value desc
limit 100;
Q12
select
l_shipmode,
sum(case
when o_orderpriority = '1-URGENT'
or o_orderpriority = '2-HIGH'
then 1
else 0
end) as high_line_count,
sum(case
when o_orderpriority <> '1-URGENT'
and o_orderpriority <> '2-HIGH'
then 1
else 0
end) as low_line_count
from
orders,
lineitem
where
o_orderkey = l_orderkey
and l_shipmode in ('MAIL', 'SHIP')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= date '1994-01-01'
and l_receiptdate < date '1994-01-01' + interval '1' year
group by
l_shipmode
order by
l_shipmode;
Q13
select
c_count,
count(*) as custdist
from
(
select
c_custkey,
count(o_orderkey)
from
customer left outer join orders on
c_custkey = o_custkey
and o_comment not like '%special%requests%'
group by
c_custkey
) as c_orders (c_custkey, c_count)
group by
c_count
order by
custdist desc,
c_count desc;
Q14
select
100.00 * sum(case
when p_type like 'PROMO%'
then l_extendedprice * (1 - l_discount)
else 0
end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
lineitem,
part
where
l_partkey = p_partkey
and l_shipdate >= date '1995-09-01'
and l_shipdate < date '1995-09-01' + interval '1' month;
Q15
with revenue0(SUPPLIER_NO, TOTAL_REVENUE) as
(
select
l_suppkey,
sum(l_extendedprice * (1 - l_discount))
from
lineitem
where
l_shipdate >= date '1995-12-01'
and l_shipdate < date '1995-12-01' + interval '3' month
group by
l_suppkey
)
select
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
from
supplier,
revenue0
where
s_suppkey = supplier_no
and total_revenue = (
select
max(total_revenue)
from
revenue0
)
order by
s_suppkey;
Q16
select
p_brand,
p_type,
p_size,
count(distinct ps_suppkey) as supplier_cnt
from
partsupp,
part
where
p_partkey = ps_partkey
and p_brand <> 'Brand#45'
and p_type not like 'MEDIUM POLISHED%'
and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
and ps_suppkey not in (
select
s_suppkey
from
supplier
where
s_comment like '%Customer%Complaints%'
)
group by
p_brand,
p_type,
p_size
order by
supplier_cnt desc,
p_brand,
p_type,
p_size;
Q17
select
sum(l_extendedprice) / 7.0 as avg_yearly
from
lineitem,
part
where
p_partkey = l_partkey
and p_brand = 'Brand#23'
and p_container = 'MED BOX'
and l_quantity < (
select
0.2 * avg(l_quantity)
from
lineitem
where
l_partkey = p_partkey
and l_partkey in(
select p_partkey
from part
where p_brand = 'Brand#23' and p_container = 'MED BOX')
);
Q18
select
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice,
sum(l_quantity)
from
customer,
orders,
lineitem
where
o_orderkey in (
select
l_orderkey
from
lineitem
group by
l_orderkey having
sum(l_quantity) > 300
)
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice
order by
o_totalprice desc,
o_orderdate
limit 100;
Q19
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#12'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 1 and l_quantity <= 1 + 10
and p_size between 1 and 5
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#23'
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 10 and l_quantity <= 10 + 10
and p_size between 1 and 10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#34'
and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
and l_quantity >= 20 and l_quantity <= 20 + 10
and p_size between 1 and 15
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
);
Q20
select
s_name,
s_address
from
supplier,
nation
where
s_suppkey in (
select
ps_suppkey
from
partsupp
where
ps_partkey in (
select
p_partkey
from
part
where
p_name like 'forest%'
)
and ps_availqty > (
select
0.5 * sum(l_quantity)
from
lineitem
where
l_partkey = ps_partkey
and l_suppkey = ps_suppkey
and l_shipdate >= date '1994-01-01'
and l_shipdate < date '1994-01-01' + interval '1' year
)
)
and s_nationkey = n_nationkey
and n_name = 'CANADA'
order by
s_name;
Q21
select
s_name,
count(*) as numwait
from
supplier,
lineitem l1,
orders,
nation
where
s_suppkey = l1.l_suppkey
and o_orderkey = l1.l_orderkey
and o_orderstatus = 'F'
and l1.l_receiptdate > l1.l_commitdate
and exists (
select
*
from
lineitem l2
where
l2.l_orderkey = l1.l_orderkey
and l2.l_suppkey <> l1.l_suppkey
)
and not exists (
select
*
from
lineitem l3
where
l3.l_orderkey = l1.l_orderkey
and l3.l_suppkey <> l1.l_suppkey
and l3.l_receiptdate > l3.l_commitdate
)
and s_nationkey = n_nationkey
and n_name = 'SAUDI ARABIA'
group by
s_name
order by
numwait desc,
s_name
limit 100;
Q22
select
cntrycode,
count(*) as numcust,
sum(c_acctbal) as totacctbal
from
(
select
substring(c_phone from 1 for 2) as cntrycode,
c_acctbal
from
customer
where
substring(c_phone from 1 for 2) in
('13', '31', '23', '29', '30', '18', '17')
and c_acctbal > (
select
avg(c_acctbal)
from
customer
where
c_acctbal > 0.00
and substring(c_phone from 1 for 2) in
('13', '31', '23', '29', '30', '18', '17')
)
and not exists (
select
*
from
orders
where
o_custkey = c_custkey
)
) as custsale
group by
cntrycode
order by
cntrycode;
Key/Value point query scenario test
Prepare the data
1. Create a table
- Continue to use the database created for the OLAP query scenario. The orders table from the TPC-H data set is used for testing. After connecting to Hologres with psql, use the following DDL to create the table.
Note: the point query scenario requires a row-oriented table, so you need to create a new table; you cannot reuse the column-oriented tables from the OLAP query scenario
DROP TABLE IF EXISTS orders_row;
BEGIN;
CREATE TABLE public.orders_row (
"o_orderkey" int8 NOT NULL,
"o_custkey" int8,
"o_orderstatus" bpchar(1),
"o_totalprice" numeric(15,2),
"o_orderdate" date,
"o_orderpriority" bpchar(15),
"o_clerk" bpchar(15),
"o_shippriority" int8,
"o_comment" varchar(79),
PRIMARY KEY (o_orderkey)
);
CALL SET_TABLE_PROPERTY('public.orders_row', 'orientation', 'row');
CALL SET_TABLE_PROPERTY('public.orders_row', 'clustering_key', 'o_orderkey');
CALL SET_TABLE_PROPERTY('public.orders_row', 'time_to_live_in_seconds', '3153600000');
CALL SET_TABLE_PROPERTY('public.orders_row', 'distribution_key', 'o_orderkey');
COMMIT;
2. Import data with COPY
- As in the OLAP scenario, COPY FROM STDIN is used to import data; for details, see the official documentation. Run the following command in the data generation tool's directory to import the previously generated orders.tbl file into the table created in Hologres
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY public.orders_row from stdin with delimiter '|' csv;" < orders.tbl
3. Import data with INSERT INTO
- Since the orders data was already imported in the OLAP scenario, you can instead run the following SQL statement to copy it into the row table
INSERT INTO public.orders_row
SELECT *
FROM public.orders;
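Whichever import path you choose, you can verify that the row table is fully populated by comparing its row count with the column table from the OLAP scenario; a minimal sketch:
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "SELECT (SELECT count(*) FROM public.orders) AS orders_cnt, (SELECT count(*) FROM public.orders_row) AS orders_row_cnt;"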
Query
1. Generate the query statements
- Point queries in the Key/Value scenario have the following general forms
SELECT column_a
,column_b
,...
,column_x
FROM table_x
WHERE pk = value_x
;
or
SELECT column_a
,column_b
,...
,column_x
FROM table_x
WHERE pk IN ( value_a, value_b,..., value_x )
;
You can use the following script to generate the required SQL; it produces 2 files:
- kv_query_single.sql: SQL that filters on a single value
- kv_query_in.sql: SQL that filters on multiple values; the script randomly generates a statement that filters on 10 values
rm -rf kv_query
mkdir kv_query
cd kv_query
echo '\set column_values random(1,99999999)
select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey =:column_values;' >> kv_query_single.sql
echo '\set column_values1 random(1,99999999)
\set column_values2 random(1,99999999)
\set column_values3 random(1,99999999)
\set column_values4 random(1,99999999)
\set column_values5 random(1,99999999)
\set column_values6 random(1,99999999)
\set column_values7 random(1,99999999)
\set column_values8 random(1,99999999)
\set column_values9 random(1,99999999)
\set column_values10 random(1,99999999)
select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey in(:column_values1,:column_values2,:column_values3,:column_values4,:column_values5,:column_values6,:column_values7,:column_values8,:column_values9,:column_values10);' >> kv_query_in.sql
2. Run the queries
- The queries are executed with pgbench; if it is not already installed, install it with the following command
yum install postgresql-contrib
- You can then use pgbench to stress test the single-value filtering scenario
Note: run the following command in the directory where the SQL files were generated
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f kv_query_single.sql
- For the multi-value filtering scenario
Note: run the following command in the directory where the SQL files were generated
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f kv_query_in.sql
- Parameter description

Parameter | Meaning | Description / Example |
---|---|---|
-h | Endpoint of the Hologres instance | Available in the Hologres console |
-p | Port of the Hologres instance | Available in the Hologres console |
-d | Name of the target database in the Hologres instance | |
-c | Number of clients (concurrency) | Example: 8 |
-t | Number of stress-test queries each client executes | Example: 50 |
-f | SQL file to stress test | Example: kv_query_single.sql |
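For example, a single-value stress test with 8 concurrent clients, each executing 50 queries (run in the directory containing the generated SQL; placeholders as above):
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c 8 -t 50 -n -f kv_query_single.sql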
Test result reference
Test data volume
- This test is based on the TPC-H 100 GB data set; the specific data volumes are shown in the table below
Table Name | Rows |
---|---|
LINEITEM | 600,037,902 |
ORDERS | 150,000,000 |
PARTSUPP | 80,000,000 |
PART | 20,000,000 |
CUSTOMER | 15,000,000 |
SUPPLIER | 1,000,000 |
NATION | 25 |
REGION | 5 |
Cluster specifications
Computing resources | Storage capacity | Software version | Remarks |
---|---|---|---|
64 CU (CPU: 64 cores, memory: 256 GB) | 100 GB | r0.10.20 | Cluster default configuration; number of shards: 40 |
128 CU (CPU: 128 cores, memory: 512 GB) | 100 GB | r0.10.20 | Cluster default configuration; number of shards: 80 |
Test time: June 2021
Test Results
Data import time
- Data import execution times are in seconds (s).
- Import time refers to the time to import data into a Hologres internal table.
- When importing with COPY, each table corresponds to one data file; concurrent import is not used.
- The specific values are shown in the table below
All import times below were measured on the Hologres 64 CU instance.

Table Name | Rows | Data size | COPY import (public network) | COPY import (VPC network) | MaxCompute foreign table import |
---|---|---|---|---|---|
LINEITEM | 600,037,902 | 73.6 GB | 3,070.453 | 694.364 | 148.165 |
ORDERS | 150,000,000 | 16.4 GB | 691.060 | 172.529 | 37.741 |
PARTSUPP | 80,000,000 | 2.3 GB | 468.560 | 107.092 | 18.488 |
PART | 20,000,000 | 11.3 GB | 96.342 | 24.020 | 8.083 |
CUSTOMER | 15,000,000 | 2.3 GB | 95.190 | 22.937 | 10.363 |
SUPPLIER | 1,000,000 | 132 MB | 5.057 | 1.803 | 1.503 |
NATION | 25 | 2 KB | 0.580 | 0.584 | 0.747 |
REGION | 5 | 0.375 KB | 0.168 | 0.153 | 0.430 |
Total | | 106 GB | 4,427.410 | 1,023.482 | 225.52 |
- In the figure below, blue is the COPY import time over the public network, green is the COPY import time over the VPC network, and gray is the import time through MaxCompute foreign tables.
- The lower the value, the faster the import.
- Horizontal axis: table name. Vertical axis: data import time (s).
- Because of network bandwidth limits, importing local files with COPY over the VPC network is significantly faster than over the public network, and importing through MaxCompute foreign tables is significantly faster than importing local files with COPY.
Query time
- Query execution times are in seconds (s).
- The queries run against Hologres internal tables.
- The specific values are shown in the table below
TPCH Query number | Hologres 64CU | Hologres 128CU |
---|---|---|
1 | 3.120 | 2.150 |
2 | 0.581 | 0.467 |
3 | 1.735 | 1.005 |
4 | 1.558 | 0.836 |
5 | 2.921 | 1.917 |
6 | 0.297 | 0.096 |
7 | 2.006 | 1.029 |
8 | 2.674 | 1.679 |
9 | 5.298 | 2.796 |
10 | 1.944 | 0.924 |
11 | 0.397 | 0.297 |
12 | 1.531 | 0.852 |
13 | 1.741 | 0.971 |
14 | 0.286 | 0.160 |
15 | 0.293 | 0.177 |
16 | 1.223 | 1.020 |
17 | 1.405 | 0.607 |
18 | 3.817 | 2.169 |
19 | 1.400 | 0.622 |
20 | 1.358 | 0.868 |
21 | 4.164 | 2.047 |
22 | 1.121 | 0.654 |
Total | 40.870 | 23.343 |
- In the figure below, blue is the query time on the 64 CU instance and green is the query time on the 128 CU instance.
- The lower the value, the better the TPC-H performance.
- As the instance size (and cost) increases, query times fall roughly linearly.
- Horizontal axis: query number (as in the TPC-H documentation). Vertical axis: query execution time (s).