Abstract: For tables with frequent update and delete operations, there will be a lot of garbage data, resulting in a waste of disk space and additional IO overhead during query scans. Periodic cleaning operations (vacuum) are required to control row storage tables and indexes on the table. The expansion. This article will briefly introduce the principle of vacuum and the factors that affect the effect of vacuum.
This article is shared from the HUAWEI cloud community " Why does the table continue to expand after vacuum? ", the original author: Dawei Tianlong:-.
Introduction to vacuum:
For the row storage table in GaussDB, after the tuple is updated or the tuple is deleted, the old version of the data still exists, and only the deleted or updated transaction number (xmax) is marked in the tuple header information. For tables with frequent update and delete operations, there will be a lot of garbage data, resulting in waste of disk space and additional IO overhead during query scans. Periodic vacuum operations are required to control the expansion of row storage tables and indexes on the tables.
The internal principle of vacuum operation:
The main steps of Vacuum:
1. Remove dead tuples and perform frozen operation on old tuples that meet the conditions.
2. Remove the index tuple pointing to the dead tuple, and update the fsm and vm files of the corresponding table
- FSM: free space map Free space map file, when inserting data, the appropriate page will be selected according to the file.
- VM: visibility map visibility mapping file. During subsequent vacuuming, it will choose whether to scan a page according to the file to improve the efficiency of the vacuum; at the same time, the file will also be used during index-only-scan to improve the efficiency of visibility judgment) .
3. Update statistical data pg_stat_all_tables. The Linepointer will not be removed and will be reused later.
The schematic diagram is as follows:
vacuum before
after vacuum
Why does the table continue to expand after vacuuming
Factors affecting the vacuum effect
1. The promotion of Oldestxmin
Vacuum can only clean up the garbage data generated by the transaction before the oldest transaction (OldestXmin) that currently survives globally, so if there are still old transactions (such as the existence of long transactions or long SQL), the garbage data generated by the new transaction will not Will not be vacuumed immediately. E.g:
session 1: create a new table row_tbl and insert a piece of data, do not submit a transaction first
gaussdb=# create table row_tbl(a int, b int);
CREATE TABLE
gaussdb=# insert into row_tbl values(1,1);
INSERT 0 1
gaussdb=# begin;
BEGIN
gaussdb=# SELECT txid_current_snapshot();
txid_current_snapshot
-----------------------
210115:210115:
(1 row)
gaussdb=# SELECT txid_current();
txid_current
--------------
210115
(1 row)
Session 2: After vacuum cleaning operation, inserts the data again, and the data does not reuse the previous space; the query view shows that the junk data has not been cleaned up.
gaussdb=# select ctid,* from row_tbl;
ctid | a | b
-------+---+---
(0,1) | 1 | 1
(1 row)
gaussdb=# delete row_tbl;
DELETE 1
gaussdb=# SELECT txid_current_snapshot();
txid_current_snapshot
-----------------------
210115:210122:
(1 row)
gaussdb=# vacuum row_tbl;
VACUUM
gaussdb=# insert into row_tbl values(2,2);
INSERT 0 1
gaussdb=# select ctid,* from row_tbl;
ctid | a | b
-------+---+---
(0,2) | 2 | 2
(1 row)
gaussdb=# select n_dead_tup, last_vacuum from pg_stat_all_tables where relname='row_tbl';
n_dead_tup | last_vacuum
------------+-------------------------------
1 | 2021-06-10 20:04:58.987631+08
(1 row)
Session 1: ends the transaction of Session 1 and then executes the vacuum. The query view can find that there are no dead tuples. Inserting data can find that the old space is reused (that is, the ctid is (0,1) space).
gaussdb=# SELECT txid_current_snapshot();
txid_current_snapshot
-----------------------
210136:210136:
(1 row)
gaussdb=# vacuum row_tbl;
VACUUM
gaussdb=# select n_dead_tup, last_vacuum from pg_stat_all_tables where relname='row_tbl';
n_dead_tup | last_vacuum
------------+-------------------------------
0 | 2021-06-10 20:09:10.516564+08
(1 row)
gaussdb=# insert into row_tbl values(3,3);
INSERT 0 1
gaussdb=# select ctid,* from row_tbl;
ctid | a | b
-------+---+---
(0,1) | 3 | 3
(0,2) | 2 | 2
(2 rows)
2. The LinePointer state has not been unused
After the tuple is deleted, only when the vacuum sets the LinePointer (or item pointer, pointing to a specific tuple) of the tuple to the LP_UNUSED state, the LinePointer may be reused when new data is inserted.
3. Fsm has not been generated yet
When inserting data, rely on the fsm file to select the available page. If the fsm is not generated, it will cause the new page to be used instead of reusing the old one.
4. Batch import
In the old version of Gaussdb, when inserting data into tables in batches, a new page is directly applied to insert data. Therefore, in some scenarios, although dirty data is cleaned up after vacuuming, because the business scenario is dominated by batch insertion, the vacuum's control effect on expansion is not ideal. At present, it supports the reuse of space when inserting data in batches.
Some suggestions and conclusions
- Try to avoid long transactions. You can use the view pg_running_xacts to check whether there are old transactions that have not ended or two-phase transactions remain
- Do vacuum regularly to reclaim garbage space in time
- The size of an index that has been inflated can be reduced by reindex.
- Vacuum can clean up garbage data, but cannot return these spaces to the operating system. For tables that have been inflated, they can only be reduced by vacuum full.
For more information about GuassDB (DWS), welcome to search "GaussDB DWS" on WeChat and follow the WeChat official account to share with you the latest and most complete PB-level data warehouse black technology~
Click to follow, and get to know the fresh technology of Huawei Cloud for the first time~
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。