因为 int32 溢出导致的 peewee get_or_create时出现 peewee.IntegrityError错误事故记录

详细看这里:https://github.com/coleifer/peewee/issues/2948

关于使用 peewee 的 get_or_create 时候,出现 peewee.IntegrityError 报错

下面的是完整的代码日志

error handling worker <WorkerContext [taisan_task_ocr_service.start_ocr] at 0x7efd9bd31410>: (1062, "Duplicate entry '2147483647' for key 'found_video_cover_image_ocr_count.foundedvideocoverimageocrcounttable_video_id'")
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 7285, in get
    return clone.execute(database)[0]
           ~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 4581, in __getitem__
    return self.row_cache[item]
           ~~~~~~~~~~~~~~^^^^^^
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 6875, in get_or_create
    return query.get(), False
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 7288, in get
    raise self.model.DoesNotExist('%s instance matching query does '
core.mysql.models.FoundedVideoCoverImageOcrCountTableDoesNotExist: <Model: FoundedVideoCoverImageOcrCountTable> instance matching query does not exist:
SQL: SELECT `t1`.`id`, `t1`.`video_id`, `t1`.`search_count`, `t1`.`created_at`, `t1`.`updated_at` FROM `found_video_cover_image_ocr_count` AS `t1` WHERE (`t1`.`video_id` = %s) LIMIT %s OFFSET %s
Params: [2397088498, 1, 0]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/nameko/containers.py", line 392, in _run_worker
    result = method(*worker_ctx.args, **worker_ctx.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/services/taisan_task_ocr_service.py", line 25, in start_ocr
    fvcisc, fvcisc_created = FoundedVideoCoverImageOcrCountTable.get_or_create(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 6877, in get_or_create
    raise exc
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 6872, in get_or_create
    return cls.create(**kwargs), True
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 6741, in create
    inst.save(force_insert=True)
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 6951, in save
    pk = self.insert(**field_dict).execute()
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 2036, in inner
    return method(self, database, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 2107, in execute
    return self._execute(database)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 2912, in _execute
    return super(Insert, self)._execute(database)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 2625, in _execute
    cursor = database.execute(self)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 3330, in execute
    return self.execute_sql(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/playhouse/shortcuts.py", line 256, in execute_sql
    return self._reconnect(super(ReconnectMixin, self).execute_sql, sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/playhouse/shortcuts.py", line 268, in _reconnect
    raise exc
  File "/usr/local/lib/python3.11/site-packages/playhouse/shortcuts.py", line 263, in _reconnect
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 3320, in execute_sql
    with __exception_wrapper__:
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 3088, in __exit__
    reraise(new_type, new_type(exc_value, *exc_args), traceback)
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 196, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.11/site-packages/peewee.py", line 3322, in execute_sql
    cursor.execute(sql, params or ())
  File "/usr/local/lib/python3.11/site-packages/pymysql/cursors.py", line 153, in execute
    result = self._query(query)
             ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymysql/cursors.py", line 322, in _query
    conn.query(q)
  File "/usr/local/lib/python3.11/site-packages/pymysql/connections.py", line 563, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymysql/connections.py", line 825, in _read_query_result
    result.read()
  File "/usr/local/lib/python3.11/site-packages/pymysql/connections.py", line 1199, in read
    first_packet = self.connection._read_packet()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymysql/connections.py", line 775, in _read_packet
    packet.raise_for_error()
  File "/usr/local/lib/python3.11/site-packages/pymysql/protocol.py", line 219, in raise_for_error
    err.raise_mysql_exception(self._data)
  File "/usr/local/lib/python3.11/site-packages/pymysql/err.py", line 150, in raise_mysql_exception
    raise errorclass(errno, errval)
peewee.IntegrityError: (1062, "Duplicate entry '2147483647' for key 'found_video_cover_image_ocr_count.foundedvideocoverimageocrcounttable_video_id'")

下面是 peewee 的模型

class FoundedVideoCoverImageOcrCountTable(Model):
    id = BigAutoField(primary_key=True)
    video_id = IntegerField(unique=True)
    search_count = IntegerField(default=0)

    created_at = DateTimeField(
        null=False,
        constraints=[SQL('DEFAULT CURRENT_TIMESTAMP')],
        help_text='使用数据库时间'
    )
    updated_at = DateTimeField(
        null=False,
        constraints=[
            SQL('DEFAULT CURRENT_TIMESTAMP'),
            SQL('ON UPDATE CURRENT_TIMESTAMP'),
        ]
    )

    class Meta:
        database = super_taisan_db
        table_name = 'found_video_cover_image_ocr_count'

数据库里面的 ddl

-- super_taisan.found_video_cover_image_ocr_count definition

CREATE TABLE `found_video_cover_image_ocr_count` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `video_id` int NOT NULL,
  `search_count` int NOT NULL,
  `created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `foundedvideocoverimageocrcounttable_video_id` (`video_id`)
) ENGINE=InnoDB AUTO_INCREMENT=76871889 DEFAULT CHARSET=utf8mb3;

执行的部分代码

class TaiSanTaskSubmitService:
    name = 'taisan_task_ocr_service'

    def start_ocr(self, parse_result_json: str):
        # logger.debug(parse_result_json)

        parse_result = ParseResult.model_validate_json(parse_result_json)
        video_id: int = parse_result.extra['video_id']
        
        fvcisc, fvcisc_created = FoundedVideoCoverImageOcrCountTable.get_or_create(video_id=video_id)

        fvcisc: FoundedVideoCoverImageOcrCountTable
        if fvcisc.search_count >= 1 and timedelta_seconds(get_utc_now_timestamp(), fvcisc.updated_at) < 4*HOUR:
            logger.warning(f'当前图片重试次数过多,抛弃 video_id: {video_id}')

            fvcisc.search_count += 1
            fvcisc.save()

            return

很显然问题出现在 FoundedVideoCoverImageOcrCountTable.get_or_create(video_id=video_id) 阶段

让我感到困惑的是二次读取的时候,为什么会出现 WHERE (t1.video_id = %s) LIMIT %s OFFSET %s Params: [2397088498, 1, 0] ?这和 peewee.IntegrityError: (1062, "Duplicate entry '2147483647' for key 'found_video_cover_image_ocr_count.foundedvideocoverimageocrcounttable_video_id'") 的 2147483647 对应不起来。我不知道为什么会出现 2397088498 这个值?


我一开始觉得是 peewee 出问题了,然后提了 issues 给 peewee 的作者

在作者回复之前,我自己也开始了排查,最后发现的原因是

video_id 的长度超过 int32 ,所以出现了溢出循环导致的

至于为什么会超出 int32 ,是因为 video 表的 id 用的是 int64(即 mysql 的 bigint)

但是 found_video_cover_image_ocr_count 表的 video_id 不小心设置为 int 了


universe_king
3.4k 声望677 粉丝