Recently, I encountered a problem that has been bothering me for several days. I have been investigating this problem for a whole day without success. Thinking of this, I feel really sorry for the boss.

background

The thing is like this, a newly developed function in the company has not been launched yet. It is currently deployed in the test environment. The Node service will start a scheduled task, and every 5 minutes, a part of the data will be processed and written to the mysql database.

Before that, everything was running normally. On the first day of work after the Mid-Autumn Festival, I opened the background system and found that there was no data to display. Then I queried the database and found that the historically stored data was gone, gone. Now only the data after the latest scheduled task is executed will be stored. But no code has been modified in between, which is amazing.

headache time

Check that the written data always only stores the latest data, then check whether there is no logic to trigger the update, and all hit the new logic.

 const isExist = await this.Model.findOne({
  where: {
    projectId
  }
});
if (isExist) {
  await isExist.destroy()
  updateList.push(item)
} else {
  createList.push(item)
}

The current logic is to separate the addition and the update. After inspection, it is found that all isExist are null, resulting in all hitting the newly added logic. But there is data in the database, why can't it be queried? It is suspected that there is dirty data in third-party data, so I emptied all the existing data in the database and re-written to check the effect. As a result, the first write is normal, and subsequent updates will not be triggered. After querying, it is found that the data is emptied every time it is written to the database for about ten seconds.

However, the process of deleting data is not performed in the code logic after writing, and it is stably reproduced every time. After writing, it is deleted. The query is fruitless and I have no choice but to find db to help find the reason. The conclusion given by the db query log is that there is a logic to execute deletions regularly.

Seeing the log can only continue to find the reason in the code. Since the bulkCreate of sequelize is used at this time to create data in batches, I began to wonder if there was a problem in the process of batch processing. At first, it was because the amount of data executed each time was too much, so I did not choose a single execution. At this time, in order to troubleshoot the problem , so I changed to a single data create method to create data.

 this.Model.bulkCreate(list)

change into

 for (const item of list) {
  this.Model.create(item)     
}

The result is not unexpected or deleted regularly, 😭
Then I began to suspect that the transaction was not committed. Although this logic does not require transaction operations at all, I still try it with a skeptical attitude.

 let transaction;
try {
  // 建立事务对象
  transaction = await this.ctx.model.transaction();
  for (const item of list) {
    // 事务增操作
    await this.Model.create(item, {
        transaction,
    });
    // 提交事务
    await transaction.commit();
  }
} catch (err) {
    // 事务回滚
    await transaction.rollback();
}

The result is not unexpected or deleted regularly. 😭😭😭 There is no room for changes at this time. It is already dark at this time, but the problem has not been solved, so we can only continue to program for Baidu. At this time, the search results The same person also encountered such a problem. His solution was to modify the table name. At this time, he could only use a dead horse as a living horse doctor.

The result unexpectedly resumed normal writing and updating.

Why is it normal after changing the table name, I can't think of why after thinking about it. As a result, when I redeployed the service today, I glanced at the historical deployment records and found clues. There was a deployment record on the last night of the vacation, and I looked back at the same time period as the data anomaly in the first place. Basically, it can be concluded that it has a great relationship with this deployment. Since the company's internal deployment scheme includes docker and virtual machine, two scheduled tasks will be executed at the same time in each time period. When querying third-party data, there will be a certain delay in the writing time of the last two sides, which will cause the written data to be deleted by the other side. Since the server has not updated the modified code, it has been executed at the beginning. The logic of deleting copies first and then updating them. As for why the deletion was performed but not updated, the guess is that the logic of the update after deletion is wrong. This is why it is normal after modifying the table name, because the code on that server is still old, and the previous table cannot be read by adding or deleting. The problem is finally over.

ending

At this point, do you feel like watching a big oolong event? The final reason has nothing to do with the code, but the repeated changes are fruitless. There are still a lot of things that have not been written during the investigation process, such as adding a unique index due to the suspicion of duplicate data, and suspecting how much the self-incrementing ID is reset to zero, but I also learned a lot of new knowledge in the process of this change, how to use transactions, new Add unique index, modify table name, reset auto-increment ID and many other server-related knowledge. The final conclusion is that you should not question the code before encountering problems. You must also think about the system level, running version, environment variables, operation and maintenance, etc. 🤔.

Focus on front-end development, share dry goods related to front-end technology, public account: Nancheng Front-end (ID: nanchengfe)


南城FE
2.2k 声望574 粉丝