Problem: Yesterday an error in production stopped a feature from running. The logs showed that the cause was a MySQL deadlock.

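Analysis: The hardest part of fixing a problem is working out where it actually occurs. That step takes the most time and thought; writing the code for the fix is quick by comparison.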

(1) The error log:

[2021-10-09 18:59:04] local.INFO: RobotAuction-SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (SQL: select id, cash, freeze, area_id from users where users.id in (2634, 2662, 2672, 2673, 2675, 2685, 2808, 2811, 2818, 2834, 2869, 2886, 2926, 2961, 2962, 2981, 3066, 3080, 3124, 3131, 3135) and users.deleted_at is null for update)-669

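Starting from the error log, look up the matching entry in MySQL's deadlock log, then analyze it together with the operations the business logic could have been performing at that moment.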
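The deadlock log is the LATEST DETECTED DEADLOCK section of the output of `SHOW ENGINE INNODB STATUS`. As a minimal sketch, assuming the status text has already been fetched with whatever client the application uses, the section can be cut out like this. The helper name `latest_deadlock_section` is hypothetical, and the sample text is an abbreviated copy of the log in this post.

```python
import re


def latest_deadlock_section(status_text):
    """Return the body of the LATEST DETECTED DEADLOCK section, or None."""
    lines = [line.rstrip() for line in status_text.splitlines()]
    try:
        start = lines.index("LATEST DETECTED DEADLOCK")
    except ValueError:
        return None  # the status output has no deadlock section
    body = []
    # Skip the dashed rule under the title, then read until the next dashed
    # rule, which opens the following section (for example TRANSACTIONS).
    for line in lines[start + 2:]:
        if re.fullmatch(r"-{3,}", line):
            break
        body.append(line)
    return "\n".join(body).strip()


# Abbreviated sample assembled from the log in this post.
sample = """\
------------------------
LATEST DETECTED DEADLOCK
------------------------
2021-10-09 18:59:04 0x150d81d79700
*** (1) TRANSACTION:
TRANSACTION 385220, ACTIVE 0 sec starting index read
*** WE ROLL BACK TRANSACTION (1)
------------
TRANSACTIONS
------------
"""

section = latest_deadlock_section(sample)
print(section)
```

The timestamp on the first line of the section is what ties it back to the application error: here both read 2021-10-09 18:59:04.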

(2) The deadlock log:

2021-10-09 18:59:04 0x150d81d79700
* (1) TRANSACTION:
TRANSACTION 385220, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 11 lock struct(s), heap size 1136, 19 row lock(s)
MySQL thread id 1376113, OS thread handle 23147762767616, query id 23440050 localhost 127.0.0.1 lpt Sending data

select `id`, `cash`, `freeze`, `area_id` from `users` where `users`.`id` in (2634, 2662, 2672, 2673, 2675, 2685, 2808, 2811, 2818, 2834, 2869, 2886, 2926, 2961, 2962, 2981, 3066, 3080, 3124, 3131, 3135) and `users`.`deleted_at` is null
for update

(1) WAITING FOR THIS LOCK TO BE GRANTED: 【1】Transaction (1) holds the lock on id = 2675 and is waiting for the lock on id = 3124
RECORD LOCKS space id 191 page no 14 n bits 120 index PRIMARY of table lpt.users trx id 385220 lock_mode X locks rec but not gap waiting
Record lock, heap no 49 PHYSICAL RECORD: n_fields 52; compact format; info bits 0
** (portion omitted)

* (2) TRANSACTION:
TRANSACTION 385219, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
7 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 6
MySQL thread id 1376112, OS thread handle 23147757147904, query id 23440106 localhost 127.0.0.1 lpt statistics

select `id`, `cash` from `users` where `users`.`id` = 2675 and `users`.`deleted_at` is null limit 1 for update

(2) HOLDS THE LOCK(S): 【2】Transaction (2) holds the lock on id = 3124
RECORD LOCKS space id 191 page no 14 n bits 120 index PRIMARY of table lpt.users trx id 385219 lock_mode X locks rec but not gap
Record lock, heap no 49 PHYSICAL RECORD: n_fields 52; compact format; info bits 0
** (portion omitted)

(2) WAITING FOR THIS LOCK TO BE GRANTED: 【3】Transaction (2) is waiting for the lock on id = 2675 to be released
RECORD LOCKS space id 191 page no 6 n bits 120 index PRIMARY of table lpt.users trx id 385219 lock_mode X locks rec but not gap waiting
Record lock, heap no 32 PHYSICAL RECORD: n_fields 52; compact format; info bits 0
** (portion omitted)

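* WE ROLL BACK TRANSACTION (1) 【4】InnoDB rolls back the cheaper transaction. It chose transaction (1), which has no undo log entries while transaction (2) has 6. This matches the error log above: the task never ran because its transaction was rolled back.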

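Root cause
1. All the locks involved are exclusive (X) locks.
2. This is a classic deadlock caused by two transactions acquiring locks in different orders.
3. The analysis follows annotations 【1】【2】【3】【4】 in the deadlock log: transaction (1) locked id = 2675 and then asked for id = 3124, while transaction (2) had already locked id = 3124 and then asked for id = 2675, so each was waiting on the other.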
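The cycle described in annotations 【1】 to 【3】 can be written down as a small wait-for graph. This is only an illustrative sketch: the row ids and transaction ids come from the deadlock log above, while the function name and the dictionary layout are assumptions made for the example, not how InnoDB stores its lock data.

```python
# Wait-for data taken from annotations 【1】-【3】 of the deadlock log.
holds = {2675: 385220, 3124: 385219}      # row id -> transaction holding its X lock
waits_for = {385220: 3124, 385219: 2675}  # transaction -> row id it is blocked on


def find_deadlock_cycle(holds, waits_for, start):
    """Follow 'blocked on row -> held by transaction' edges from `start`.

    Returns the transactions that form a cycle, or None if the chain ends
    at a transaction that is not waiting for anything.
    """
    path = []
    trx = start
    while trx is not None and trx not in path:
        path.append(trx)
        row = waits_for.get(trx)  # row this transaction is waiting on
        trx = holds.get(row)      # transaction that currently holds that row
    if trx is None:
        return None
    return path[path.index(trx):]


cycle = find_deadlock_cycle(holds, waits_for, 385220)
print(cycle)  # [385220, 385219]
```

Each transaction in the returned list is blocked by the next one, and the last is blocked by the first, which is exactly the situation InnoDB resolved by rolling back transaction (1).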

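Solutions
1. Analyze the business logic and adjust the order of its operations so that every transaction locks the rows in the same order. ✔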
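2. Use a table lock instead. This would prevent the deadlock, but it sacrifices performance because every transaction touching the table is serialized, and it gives up the index-based row locking. ×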
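Why option 1 works can be shown with a small simulation. This is a sketch under stated assumptions, not the application's code: `threading.Lock` objects stand in for the InnoDB row locks, and `with_rows_locked` is a hypothetical helper. In the real application the equivalent change would be to make each transaction issue its `for update` statements in one agreed order, for example ascending id.

```python
import threading

# Stand-ins for the row locks on `users`; one lock per row id.
row_locks = {2675: threading.Lock(), 3124: threading.Lock()}
finished = []


def with_rows_locked(name, ids):
    """Lock every requested row in ascending id order, then do the work."""
    ordered = sorted(ids)  # the fix: one global lock order for all callers
    for row_id in ordered:
        row_locks[row_id].acquire()
    try:
        finished.append(name)  # placeholder for the real balance update
    finally:
        for row_id in reversed(ordered):
            row_locks[row_id].release()


# The two callers ask for the rows in opposite orders, as in the incident.
workers = [
    threading.Thread(target=with_rows_locked, args=("batch", [2675, 3124])),
    threading.Thread(target=with_rows_locked, args=("single", [3124, 2675])),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join(timeout=5)

print(sorted(finished))  # ['batch', 'single']
```

Because both workers take the lock on 2675 before the lock on 3124, one of them simply waits for the other to finish and neither can end up holding a lock the other needs. Without the `sorted` call, the two workers could each grab one lock and block forever, which is the same pattern as the deadlock above.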


sengerlion