Recently, a new project has used insertBatch in a lot of places. People always say that batch insertion is much more efficient than inserting in a loop, so this article runs an experiment to see whether that is really true.

Test environment:
- Spring Boot 2.5
- MySQL 8
- JDK 8
- Docker
First of all, there are several options for inserting multiple rows:

- `foreach` loop insert, one statement per row
- Spliced SQL, executed as a single statement
- Insert using the batch function

Build the test environment. The SQL file:
```sql
drop database IF EXISTS test;
CREATE DATABASE test;
use test;
DROP TABLE IF EXISTS `user`;
CREATE TABLE `user` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT '',
  `age` int(11) DEFAULT 0,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```
Application configuration file:
```yaml
server:
  port: 8081
spring:
  # database connection configuration
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf-8&useSSL=false&allowPublicKeyRetrieval=true&&serverTimezone=UTC&setUnicode=true&characterEncoding=utf8&&nullCatalogMeansCurrent=true&&autoReconnect=true&&allowMultiQueries=true
    username: root
    password: 123456
# MyBatis configuration
mybatis:
  # mapper configuration files
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: com.aphysia.spingbootdemo.model
  # enable camel-case mapping
  configuration:
    map-underscore-to-camel-case: true
logging:
  level:
    root: error
```
The startup class configures the package path to scan for Mappers:
```java
import org.mybatis.spring.annotation.MapperScan;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
@MapperScan("com.aphysia.springdemo.mapper")
public class SpringdemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringdemoApplication.class, args);
    }
}
```
The Mapper interface defines a few methods: insert a single object, delete all objects, and insert multiple objects with a spliced statement:
```java
import com.aphysia.springdemo.model.User;
import org.apache.ibatis.annotations.Param;

import java.util.List;

public interface UserMapper {

    int insertUser(User user);

    int deleteAllUsers();

    int insertBatch(@Param("users") List<User> users);
}
```
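The `User` entity itself is not shown in the post; a minimal sketch matching the three columns of the table (assuming plain getters and setters rather than Lombok) might look like this:

```java
package com.aphysia.springdemo.model;

// Hypothetical entity sketch matching the user table: id, name, age
public class User {
    private int id;
    private String name;
    private int age;

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```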
The Mapper.xml file is as follows:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd" >
<mapper namespace="com.aphysia.springdemo.mapper.UserMapper">

    <insert id="insertUser" parameterType="com.aphysia.springdemo.model.User">
        insert into user(id, age) values (#{id}, #{age})
    </insert>

    <delete id="deleteAllUsers">
        delete from user where id > 0;
    </delete>

    <insert id="insertBatch" parameterType="java.util.List">
        insert into user(id, age) VALUES
        <foreach collection="users" item="model" index="index" separator=",">
            (#{model.id}, #{model.age})
        </foreach>
    </insert>
</mapper>
```
Before every test run we delete all existing data, so the results are objective and not affected by rows left over from previous runs.
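The cleanup code is not shown in the post; a minimal sketch, assuming a JUnit 5 `@BeforeEach` hook (from `org.junit.jupiter.api.BeforeEach`) that calls the `deleteAllUsers` mapper method, could look like this:

```java
// Hypothetical cleanup hook (not shown in the original post): wipe the table before each test
@BeforeEach
void cleanTable() {
    userMapper.deleteAllUsers();
}
```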
Different tests
1. foreach insert
Get the list first, then perform one database operation per record to insert it:
```java
import com.aphysia.springdemo.mapper.UserMapper;
import com.aphysia.springdemo.model.User;
import org.apache.ibatis.session.SqlSessionFactory;
import org.junit.jupiter.api.Test;
import org.mybatis.spring.annotation.MapperScan;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import javax.annotation.Resource;
import java.util.List;

@SpringBootTest
@MapperScan("com.aphysia.springdemo.mapper")
class SpringdemoApplicationTests {

    @Autowired
    SqlSessionFactory sqlSessionFactory;

    @Resource
    UserMapper userMapper;

    static int num = 100000;
    static int id = 1;

    @Test
    void insertForEachTest() {
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        for (int i = 0; i < users.size(); i++) {
            userMapper.insertUser(users.get(i));
        }
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }
}
```
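All of the tests call `getRandomUsers()`, which the original post does not show. A minimal sketch, assuming it simply builds `num` users with incrementing ids and random ages (and that `java.util.ArrayList` and `java.util.Random` are imported), might be:

```java
// Hypothetical helper, not shown in the original post:
// build `num` users with incrementing ids and random ages.
private List<User> getRandomUsers() {
    List<User> users = new ArrayList<>(num);
    Random random = new Random();
    for (int i = 0; i < num; i++) {
        User user = new User();
        user.setId(id++);
        user.setAge(random.nextInt(100));
        users.add(user);
    }
    return users;
}
```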
2. Splicing SQL Insert
This approach actually inserts the data with a single statement of the following form:
```sql
INSERT INTO `user` (`id`, `age`)
VALUES (1, 11),
       (2, 12),
       (3, 13),
       (4, 14),
       (5, 15);
```
```java
@Test
void insertSplicingTest() {
    List<User> users = getRandomUsers();
    long start = System.currentTimeMillis();
    userMapper.insertBatch(users);
    long end = System.currentTimeMillis();
    System.out.println("time:" + (end - start));
}
```
3. Use Batch to insert in batches
Set the MyBatis session's executor type to `Batch`: open a session from `sqlSessionFactory` with the execution mode set to batch and auto-commit set to `false`, then commit everything in one go:
```java
@Test
public void insertBatch() {
    SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
    UserMapper mapper = sqlSession.getMapper(UserMapper.class);
    List<User> users = getRandomUsers();
    long start = System.currentTimeMillis();
    for (int i = 0; i < users.size(); i++) {
        mapper.insertUser(users.get(i));
    }
    sqlSession.commit();
    sqlSession.close();
    long end = System.currentTimeMillis();
    System.out.println("time:" + (end - start));
}
```
4. Batch processing + batch submission
Same as batch processing, except that a commit is issued every 1000 rows, i.e. the data is committed in batches.
```java
@Test
public void insertBatchForEachTest() {
    SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
    UserMapper mapper = sqlSession.getMapper(UserMapper.class);
    List<User> users = getRandomUsers();
    long start = System.currentTimeMillis();
    for (int i = 0; i < users.size(); i++) {
        mapper.insertUser(users.get(i));
        if (i % 1000 == 0 || i == num - 1) {
            sqlSession.commit();
            sqlSession.clearCache();
        }
    }
    sqlSession.close();
    long end = System.currentTimeMillis();
    System.out.println("time:" + (end - start));
}
```
The first results are obviously wrong?

Running the code above gives the results below. The for-loop insert is indeed very slow, and the spliced SQL is relatively efficient. I had seen claims that spliced SQL might be limited by MySQL, but when I ran it up to 1000w rows all I saw was a heap memory overflow.

The following results are incorrect!!!

(Times are in milliseconds; the column headers are row counts, where 1w = 10,000.)
| Insertion method | 10 | 100 | 1000 | 1w | 10w | 100w | 1000w |
| --- | --- | --- | --- | --- | --- | --- | --- |
| for loop insert | 387 | 1150 | 7907 | 70026 | 635984 | too long... | too long... |
| Spliced SQL insert | 308 | 320 | 392 | 838 | 3156 | 24948 | OutOfMemoryError: heap memory overflow |
| Batch processing | 392 | 917 | 5442 | 51647 | 470666 | too long... | too long... |
| Batch processing + commit in batches | 359 | 893 | 5275 | 50270 | 472462 | too long... | too long... |
Why doesn't the spliced SQL exceed the limit?

Let's take a look at MySQL's limit:
```
mysql> show VARIABLES like '%max_allowed_packet%';
+---------------------------+------------+
| Variable_name             | Value      |
+---------------------------+------------+
| max_allowed_packet        | 67108864   |
| mysqlx_max_allowed_packet | 67108864   |
| slave_max_allowed_packet  | 1073741824 |
+---------------------------+------------+
3 rows in set (0.12 sec)
```
This 67108864 is 64 MB, which is so large that we never hit it, so no error is reported. Let's lower it and retest:

- First, enter the MySQL container (MySQL is running in Docker here); you can also open the CLI directly from Docker Desktop:
```shell
docker exec -it mysql bash
```
- Enter the `/etc/mysql` directory to modify the `my.cnf` file:

```shell
cd /etc/mysql
```
- Install `vim` first, otherwise you cannot edit the file:

```shell
apt-get update
apt-get install vim
```
- Modify `my.cnf`:

```shell
vim my.cnf
```
- Add `max_allowed_packet=2M` on the last line (press `i` to edit, press `esc` when finished, type `:wq` to save and exit):

```ini
[mysqld]
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
datadir         = /var/lib/mysql
secure-file-priv= NULL
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

# Custom config should go here
!includedir /etc/mysql/conf.d/
max_allowed_packet=2M
```
- Exit the container:

```shell
exit
```
- Find the MySQL container id:

```shell
docker ps -a
```
- Restart MySQL:

```shell
docker restart c178e8998e68
```
After restarting successfully, check max_allowed_packet again and confirm it has been changed:
```
mysql> show VARIABLES like '%max_allowed_packet%';
+---------------------------+------------+
| Variable_name             | Value      |
+---------------------------+------------+
| max_allowed_packet        | 2097152    |
| mysqlx_max_allowed_packet | 67108864   |
| slave_max_allowed_packet  | 1073741824 |
+---------------------------+------------+
```
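As a side note, if you only need the change temporarily, `max_allowed_packet` can also be set at runtime without editing `my.cnf` or restarting (this is an alternative, not what the steps above do); the new value applies to new connections and is lost after the server restarts:

```sql
-- Requires the SUPER (or SYSTEM_VARIABLES_ADMIN) privilege; reconnect to pick up the new value
SET GLOBAL max_allowed_packet = 2 * 1024 * 1024;
```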
We ran the spliced SQL insert again. At 100w rows the packet reached 36,788,583 bytes (about 35 MB), far beyond the 2 MB we just set, and this time the error was thrown as expected:
```
org.springframework.dao.TransientDataAccessResourceException:
### Cause: com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.
; Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.; nested exception is com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.
```
Why is batch processing so slow?
However, a closer look at the results above shows that batch processing gives no advantage at all; it is no better than the for loop. Is that right?

It definitely is not. The official documentation says the batch executor performs updates in batches and reuses prepared statements instead of creating a new one every time, so in theory it should be faster.
Then I found my most important problem: the database connection URL was missing `rewriteBatchedStatements=true`.

Without this parameter, the MySQL JDBC driver does not really batch the statements passed to `executeBatch()`. We expect a group of SQL statements to be sent in one batch, but they are actually sent to the MySQL server one by one, i.e. as individual inserts, which directly causes the poor performance. No wonder it behaved almost the same as inserting in a loop.

Only when `rewriteBatchedStatements` is set to `true` will the driver execute the SQL in batches for us.
The corrected database connection URL:

```
jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf-8&useSSL=false&allowPublicKeyRetrieval=true&&serverTimezone=UTC&setUnicode=true&characterEncoding=utf8&&nullCatalogMeansCurrent=true&&autoReconnect=true&&allowMultiQueries=true&&&rewriteBatchedStatements=true
```
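Alternatively (this is not what the post does), the same flag can be passed to the driver through the connection pool instead of the URL; with Spring Boot's default HikariCP pool that would look roughly like this:

```yaml
spring:
  datasource:
    hikari:
      # properties passed through to the MySQL JDBC driver
      data-source-properties:
        rewriteBatchedStatements: true
```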
After finding the problem, we re-ran the batch tests; the final results are as follows (times in ms, as before):
| Insertion method | 10 | 100 | 1000 | 1w | 10w | 100w | 1000w |
| --- | --- | --- | --- | --- | --- | --- | --- |
| for loop insert | 387 | 1150 | 7907 | 70026 | 635984 | too long... | too long... |
| Spliced SQL insert | 308 | 320 | 392 | 838 | 3156 | 24948 (likely to exceed the SQL length limit) | OutOfMemoryError: heap memory overflow |
| Batch processing (the key one) | 333 | 323 | 362 | 636 | 1638 | 8978 | OutOfMemoryError: heap memory overflow |
| Batch processing + commit in batches | 359 | 313 | 394 | 630 | 2907 | 18631 | OutOfMemoryError: heap memory overflow |
From the results above, batch processing is indeed much faster, although once the volume gets very large it also runs out of memory. Batch processing with commits in batches did not get faster; it is similar to plain batch processing, or even slower, because it commits too many times. The spliced SQL approach is actually not much different from batch processing when the volume is relatively small. The worst option is the for loop, which is really time-consuming: it already needs about 1s for just 100 rows, so it is not a viable choice.

When I first saw batch processing being so slow, I really doubted myself. Then I found it came down to a single parameter, and suddenly the clouds lifted. The more you know, the more you realize you don't know.
[Profile of the author]:

Qin Huai, [Qinhuai Grocery Store]. The road of technology cannot be covered in a single stride; the mountains are high and the rivers are long, and even if I move slowly, I never stop.