本文为墨天轮数据库管理服务团队第96期技术分享,内容原创,作者为技术顾问陈昱,如需转载请联系小墨(VX:modb666)并注明来源。如需查看更多文章可关注【墨天轮】公众号。
适用范围
DB: ALL VERSION
OS: ALL PLATFORM
问题概述
数据库在7号9:00 左右开始卡顿,采集了对应时间段的AWR报告如下:
–node1
–node2
–node1
–node2
有3个EVENT需要注意,分别是
1)其中enq: TX - index contention主要由 apd5vv681f54w 引起。
SQL语句:INSERT INTO A_LM_*****_HOUR ("DATA_DATE", "METER_ID", "ASSET_NO", "MGT_ORG_CODE", "MGT_ORG_NAME", "CUST_ID", "CUST_NO", "CUST_NAME", "CONS_TYPE", "IS_SPECIAL", "UI_ABNOR_TYPE", "ABNOR_DESC_RATE", "ABNOR_DESC_DATA", "ABNOR_DATE", "CREATE_TIME", "IMPO_CONS_TYPE", "IMPO_CONS_SORT", "TMNL_ONLINE_FLAG", "THRES_HOLD") VALUES (:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 , :17 , :18 , :19 )查询索引分裂对应的索引
select instance_number,current_obj#,count(*) cnt
from ashtmp
where to_char(sample_time,'yyyy-mm-dd hh24') between '2025-06-04 09' and '2025-06-04 11'
and sql_id = 'apd5vv681f54w'
group by instance_number,current_obj#
order by instance_number,cnt;
11-12236
2104979
31313244769254
413132448144496可以看见 3132447 和 3132448 最严重,下面查询2个索引对象
select owner,object_name,object_type from dba_objects where object_id in (3132447,3132448);建议:修改_assm_high_gsp_threshold 和 _assm_low_gsp_threshold 参数,功能:
- 控制在索引分裂过程中,搜索可重用的空闲块(free block) 时的最大拒绝次数阈值。
- 当空间管理模块(ASSM)在寻找空闲块过程中累计的块拒绝次数达到该阈值时,会跳过继续搜索,直接扩展新的数据块,从而减少因反复扫描高使用率块带来的开销。
alter system set event='43822 trace name context forever,level 1','450502 trace name context forever, level 1',''450503 trace name context forever, level 1' scope=spfile sid='*';
ALTER SYSTEM SET "_assm_low_gsp_threshold"=512 SCOPE=BOTH;
ALTER SYSTEM SET "_assm_high_gsp_threshold"=512 SCOPE=BOTH;
2)其中enq: HW - contention主要由 8dyk8nmwd39aw引起。
SQL语句为:INSERT INTO A_COLL_*****_DAY (MGT_ORG_CODE, TMNL_ASSET_NO, METER_POINT_CNT, POINT_CNT, METER_U_POINT_CNT, U_POINT_CNT, METER_I_POINT_CNT, I_POINT_CNT, METER_P_POINT_CNT, P_POINT_CNT, METER_PF_POINT_CNT, PF_POINT_CNT, METER_PAR_POINT_CNT, PAR_POINT_CNT, METER_I0_POINT_CNT, I0_POINT_CNT, METER_CNT, METER_SUCC_CNT, TMNL_INTE_RATE, U_INST_RATE, I_INST_RATE, P_INST_RATE, PF_INST_RATE, PAR_INST_RATE, I0_INST_RATE, PO_POINT_CNT, IOT_POINT_NO, IOT_POINT_NAME, PRAP_IR_DAY, STAT_DATE) VALUES(:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 , :17 , :18 , :19 , :20 , :21 , :22 , :23 , :24 , :25 , :26 , :27 , :28 , :29 , :30 )建议:
1) 将A_COLL_TMNL_ALL_INTE_DAY表的next extent 增大,默认是1M,可以修改为10M,减小大量insert带来的高水位争用。
2) 手工对改表进行扩容
3)enq: TX – contention等待链条如下。
with ash as
(select /*+ materialize*/ *
from ashtmp t
where to_char(sample_time,'yyyy-mm-dd hh24:mi') = '2025-06-04 08:35'
),
chains as
(select instance_number inst_id,
blocking_session blocking_sid,
blocking_session_serial# blocking_serial,
session_id,
session_serial# session_serial,
level lvl,
sys_connect_by_path(instance_number || ' ' || session_id || ',' ||
session_serial# || ' ' || sql_id || ' ' ||
event,
' <-by ') path,
connect_by_isleaf isleaf
from ashtmp
start with event in ('enq: TX - contention')
connect by nocycle(prior blocking_session = session_id
and prior blocking_session_serial# = session_serial#
and prior sample_id = sample_id))
select inst_id,
blocking_sid,
blocking_serial,
lpad(round(ratio_to_report(count(*)) over() * 100) || '%', 5, ' ') "%This",
count(*) ash_time,
path
from chains
where isleaf = 1
group by inst_id, blocking_sid, blocking_serial, path
order by inst_id, ash_time desc;可以发现enq: TX – contention 基本都被row cache lock 阻塞,对 row cache lock 进行分析发现
select inst_id,to_char(min(sample_time),'yyyy-mm-dd hh24:mi:ss') mintim
,to_char(max(sample_time),'yyyy-mm-dd hh24:mi:ss') maxtim
,event,p1,count(*) cnt
from ashtmp
where to_char(sample_time,'yyyy-mm-dd hh24:mi') between '2025-06-04 09' and '2025-06-04 11'
and event in ('row cache lock')
group by inst_id,event,p1
order by inst_id,mintim,cnt;DC_SEGMENTS:该缓冲队列锁会在段分配的时候发生,观察持有这个队列锁的会话在做什么。This is likely to be down to segment allocation. Identify what the session holding the enqueue is doing and use errorstacks to diagnose.
DC_TABLESPACES:该缓冲队列锁会在申请新的extent时产生,当next extent设置的较小,同时申请的较为频繁时产生。
select instance_number,user_id,count(*) cnt
from ashtmp
where to_char(sample_time,'yyyy-mm-dd hh24:mi') between '2025-06-04 09' and '2025-06-04 11'
and event in ('row cache lock')
group by instance_number,user_id
order by instance_number,cnt;
1113
09
11717733
10319
1171238
117 就是 SG_MIS用户对应的 quota tablespace 并没有做限制。
with ash as
(select /*+ materialize*/ *
from ashtmp t
where to_char(sample_time,'yyyy-mm-dd hh24:mi') = '2025-06-04 08:35'
),
chains as
(select instance_number instance_number,
blocking_session blocking_sid,
blocking_session_serial# blocking_serial,
session_id,
session_serial# session_serial,
level lvl,
sys_connect_by_path(instance_number || ' ' || session_id || ',' ||
session_serial# || ' ' || sql_id || ' ' ||
event,
' <-by ') path,
connect_by_isleaf isleaf
from ashtmp
start with event in ('row cache lock')
connect by nocycle(prior blocking_session = session_id
and prior blocking_session_serial# = session_serial#
and prior sample_id = sample_id))
select instance_number,
blocking_sid,
blocking_serial,
lpad(round(ratio_to_report(count(*)) over() * 100) || '%', 5, ' ') "%This",
count(*) ash_time,
path
from chains
where isleaf = 1
group by instance_number, blocking_sid, blocking_serial, path
order by instance_number, ash_time desc;以 1 4509,50180 为例进行分析
select to_char(sample_time,'yyyy-mm-dd hh24:mi:ss') tim,session_id,session_serial#,sql_id,event,p1,session_state,blocking_session,
to_char(sql_exec_start,'yyyy-mm-dd hh24:mi:ss') sql_exec_start,sql_exec_id
from ashtmp t
where to_char(sample_time,'yyyy-mm-dd hh24:mi') between '2025-06-04 08:30' and '2025-06-04 08:45'
and instance_number = 1
and session_id = 4509
and session_serial#=50180
order by tim;可以发现P1也是5,且SQL一直没有执行完成。
所以,本质上 enq: TX – contention 被 row cache lock 阻塞,raw cache lock 又因为 enq: HW – contention 引起,查询HW等待最严重的SQL
select sql_id,count(*) cnt
from ashtmp t
where to_char(sample_time,'yyyy-mm-dd hh24:mi') between '2025-06-04 08' and '2025-06-04 11'
and event = 'enq: HW - contention'
group by sql_id
order by cnt;
159jx8w4gh1d1q 4
33v43zsn3h19n2 9
4 apd5vv681f54w 19
5 cz87u7n30qvh8 38
68tv7bhm1dvzvv 46
7 chua70mgbmx5j 57
864vb4f762gnaw 131
90w5addar5sg1j 206
10796yx0b9w53cm 520
11 dctqsa2bc5zru 863
12 c5hayahuzby96 1469
135a7451c2ku1j4 2570
143fa019sbh4653 3455
15079s8t9k1jzmu 4015
16 cufvk33jrzrpk 5310
178dyk8nmwd39aw 129508最严重的SQL是 8dyk8nmwd39aw,建议参考2), 同时对于1000以上的也可以适当增加next extent.
问题原因
- 大量插入造成的索引分裂
2)大量插入造成的高水位争用 - 由于enq: HW contention 导致raw cache lock 从而导致 enq: TX contetion,还是由于ENQ: HW引起
解决方案
1) 索引分裂
alter system set event=‘43822 trace name context forever,level 1’,‘450502 trace name context forever, level 1’,‘‘450503 trace name context forever, level 1’ scope=spfile sid=’*';
ALTER SYSTEM SET “\_assm\_low\_gsp\_threshold”=512 SCOPE=BOTH;
ALTER SYSTEM SET "assm\_high\_gsp\_threshold"=512 SCOPE=BOTH;
2) enq: HW
将A\_COLL\_TMNL*****\_DAY表的next extent 增大,默认是1M,可以修改为10M,减小大量insert带来的高水位争用。手工对改表进行扩容。
墨天轮从乐知乐享的数据库技术社区蓄势出发,全面升级,提供多类型数据库管理服务。墨天轮数据库管理服务旨在为用户构建信赖可托付的数据库环境,并为数据库厂商提供中立的生态支持。
墨天轮数据库服务官网:https://www.modb.pro/service
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用。你还可以使用@来通知其他用户。