Working overtime in the company until eight o'clock, this is the background.
The reason for working overtime is to go online to solve the problem of duplicate data in the online database. A bug in the program is found and it is easy to solve. The problem is to correct the duplicate data on the line.
There are 6 tables in the online library with duplicate data, 2 of which are relatively large, one is 960,000+, and the other is 300,000+. Because the same problem has been dealt with before, I directly used the last Python deduplication script and script. It's very simple. It connects to the database, finds out the duplicate data, and deletes it cyclically.
emmmm, but this efficiency is too low, one per second, repeated data is about 20,000+, and the estimated time is about 8 hours. . .
It is problematic to blindly rely on the things of the predecessors instead of thinking for yourself! Always think about how it was possible before and how it is not now, this is also a problem! I found that my state was really not quite right recently, and I lost my desire to explore and seek knowledge. Today is a vigilance, and I feel quite ashamed.
Closer to home, the steps to remove duplication are described in detail below.
CREATE TABLE `animal` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(20) DEFAULT NULL,
`age` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('1', 'cat', '12');
INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('2', 'dog', '13');
INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('3', 'camel', '25');
INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('4', 'cat', '32');
INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('5', 'dog', '42');
Goal: We want to remove the data with the same name.
Let’s take a look at which data is duplicated
SELECT name,count( 1 )
FROM
student
GROUP BY
NAME
HAVING
count( 1 ) > 1;
Output:
name count(1) cat 2 dog 2
The data whose name is cat and dog are duplicated, and there are two duplicate data for each;
Select * From 表 Where 重复字段 In (Select 重复字段 From 表 Group By 重复字段 Having Count(1)>1)
Delete all duplicate data, leave no one
Delete directly will report an error
DELETE
FROM
student
WHERE
NAME IN (
SELECT NAME
FROM
student
GROUP BY
NAME
HAVING
count( 1 ) > 1)
Error:
1093 - You can't specify target table 'student' for update in FROM clause, Time: 0.016000s
reason for is 16111e4b324c17: The table is queried while updating this table, and the table is updated while querying this table, which can be understood as a deadlock. mysql does not support this update query operation of the same table
solution : query the columns of data to be updated as a third-party table, and then filter and update.
DELETE
FROM
student
WHERE
NAME IN (
SELECT
t.NAME
FROM
( SELECT NAME FROM student GROUP BY NAME HAVING count( 1 ) > 1 ) t)
Delete duplicate data in the delete table, only keep one
Before deleting, we can check first, what kind of duplicate data we want to delete
SELECT
*
FROM
student
WHERE
id NOT IN (
SELECT
t.id
FROM
( SELECT MIN( id ) AS id FROM student GROUP BY `name` ) t
)
What do you mean, it is to first group by name to find out the data with the smallest id. These data are the kind of fire we want to leave, and then query to find out that the id is not in it, which is the duplicate data we want to delete.
Start to delete duplicate data, leaving only one
Very simple, just replace the select with delete.
DELETE
FROM
student
WHERE
id NOT IN (
SELECT
t.id
FROM
( SELECT MIN( id ) AS id FROM student GROUP BY `name` ) t
)
900,000+ watches execute super fast.
All done 👏👏👏👏~
(Thanks for reading, I hope for all your help)
Source: telami.cn/2019/mysql-removes-duplicate
-data-and-keeping-only-one/
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。