
Recently, a new project has been using insertBatch quite a lot. People keep saying that batch insertion is far more efficient than inserting in a loop, so in this article let's run an experiment: is that really true?

Test environment:

  • SpringBoot 2.5
  • MySQL 8
  • JDK 8
  • Docker

First of all, for inserting multiple rows, the options are as follows:

  • foreach loop insert (one statement per row)
  • Splicing the sql and executing it once
  • Inserting with the batch executor

Build the test environment

SQL file:

drop database IF EXISTS test;
CREATE DATABASE test;
use test;
DROP TABLE IF EXISTS `user`;
CREATE TABLE `user` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT "",
  `age` int(11) DEFAULT 0,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Application configuration file:

server:
  port: 8081
spring:
  # database connection configuration
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf-8&useSSL=false&allowPublicKeyRetrieval=true&&serverTimezone=UTC&setUnicode=true&characterEncoding=utf8&&nullCatalogMeansCurrent=true&&autoReconnect=true&&allowMultiQueries=true
    username: root
    password: 123456
# MyBatis configuration
mybatis:
  # mapper XML locations
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: com.aphysia.springdemo.model
  # enable camel-case mapping
  configuration:
    map-underscore-to-camel-case: true
logging:
  level:
    root: error

The startup class is configured with the package to scan for mappers:

import org.mybatis.spring.annotation.MapperScan;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
@MapperScan("com.aphysia.springdemo.mapper")
public class SpringdemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringdemoApplication.class, args);
    }

}

The Mapper interface defines a few methods: inserting a single user, deleting all users, and inserting multiple users with one spliced statement:

import com.aphysia.springdemo.model.User;
import org.apache.ibatis.annotations.Param;

import java.util.List;

public interface UserMapper {

    int insertUser(User user);

    int deleteAllUsers();


    int insertBatch(@Param("users") List<User> users);
}
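
The User model class is not shown in the original; a minimal sketch that matches the table and the mapper above (the field names are my assumption) would look like this:

package com.aphysia.springdemo.model;

public class User {

    private int id;
    private String name;
    private int age;

    // getters and setters used by MyBatis when binding #{id}, #{name} and #{age}
    public int getId() { return id; }

    public void setId(int id) { this.id = id; }

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }

    public void setAge(int age) { this.age = age; }
}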

Mapper.xml file is as follows:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd" >
<mapper namespace="com.aphysia.springdemo.mapper.UserMapper">
    <insert id="insertUser" parameterType="com.aphysia.springdemo.model.User">
        insert  into user(id,age) values(#{id},#{age})
    </insert>

    <delete id="deleteAllUsers">
        delete from user where id>0;
    </delete>

    <insert id="insertBatch" parameterType="java.util.List">
        insert into user(id,age) VALUES
        <foreach collection="users" item="model" index="index" separator=",">
            (#{model.id}, #{model.age})
        </foreach>
    </insert>
</mapper>

Before every test we delete all existing data, so that each run starts from an empty table and is not affected by previous runs.
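
The original does not show how this cleanup is wired; one simple way, assuming JUnit 5 and the deleteAllUsers method from the mapper above, is to add a hook like this to the test class shown below:

    @BeforeEach
    void clearTable() {
        // empty the table before every test so earlier runs cannot affect the timings
        userMapper.deleteAllUsers();
    }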

Different tests

1. foreach insert

First build the list of users, then perform one database insert per element:

@SpringBootTest
@MapperScan("com.aphysia.springdemo.mapper")
class SpringdemoApplicationTests {

    @Autowired
    SqlSessionFactory sqlSessionFactory;

    @Resource
    UserMapper userMapper;

    static int num = 100000;

    static int id = 1;

    @Test
    void insertForEachTest() {
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        for (int i = 0; i < users.size(); i++) {
            userMapper.insertUser(users.get(i));
        }
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }
}
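
The getRandomUsers() helper is never shown in the original; a plausible sketch (the random age range is my assumption) that builds num users with unique ids inside the same test class would be the following (java.util.ArrayList, java.util.List and java.util.Random need to be imported):

    private List<User> getRandomUsers() {
        List<User> users = new ArrayList<>(num);
        Random random = new Random();
        for (int i = 0; i < num; i++) {
            User user = new User();
            user.setId(id++);                 // the static id counter keeps ids unique across inserts
            user.setAge(random.nextInt(100)); // assumed age range for test data
            users.add(user);
        }
        return users;
    }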

2. Splicing SQL Insert

In this approach the data is inserted with a single statement of the following form:

INSERT INTO `user` (`id`, `age`) 
VALUES (1, 11),
(2, 12),
(3, 13),
(4, 14),
(5, 15);

The corresponding test simply calls insertBatch once with the whole list:

    @Test
    void insertSplicingTest() {
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        userMapper.insertBatch(users);
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }

3. Use Batch to insert in batches

Set the MyBatis executor type to Batch: open the session from sqlSessionFactory with ExecutorType.BATCH and auto-commit set to false, then commit everything once at the end:

    @Test
    public void insertBatch(){
        SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
        UserMapper mapper = sqlSession.getMapper(UserMapper.class);
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        for(int i=0;i<users.size();i++){
            mapper.insertUser(users.get(i));
        }
        sqlSession.commit();
        sqlSession.close();
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }

4. Batch processing + batch submission

On top of batch mode, commit every 1000 rows instead of only once at the end, i.e. commit in batches:

    @Test
    public void insertBatchForEachTest(){
        SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
        UserMapper mapper = sqlSession.getMapper(UserMapper.class);
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        for(int i=0;i<users.size();i++){
            mapper.insertUser(users.get(i));
            if (i % 1000 == 0 || i == num - 1) {
                // commit every 1000 rows, plus once more for the last row
                sqlSession.commit();
                sqlSession.clearCache();
            }
        }
        sqlSession.close();
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }

The first results are obviously wrong?

Running the code above gives the results below. Inserting in a for loop is indeed very slow, and spliced sql is relatively efficient. I had seen claims that spliced sql might exceed a MySQL limit, but when I ran it up to 1000w rows, all I got was a Java heap memory overflow.

The following results are incorrect!!!

| Insertion method | 10 | 100 | 1000 | 1w | 10w | 100w | 1000w |
| --- | --- | --- | --- | --- | --- | --- | --- |
| for loop insert | 387 | 1150 | 7907 | 70026 | 635984 | too long... | too long... |
| Spliced sql insert | 308 | 320 | 392 | 838 | 3156 | 24948 | OutOfMemoryError: heap memory overflow |
| Batch processing | 392 | 917 | 5442 | 51647 | 470666 | too long... | too long... |
| Batch processing + batch commits | 359 | 893 | 5275 | 50270 | 472462 | too long... | too long... |

(All times in milliseconds; 1w = 10,000 rows, 10w = 100,000, 100w = 1,000,000, 1000w = 10,000,000.)

Why doesn't the spliced SQL hit MySQL's limit?

Let's take a look at MySQL's limit:

mysql> show VARIABLES like '%max_allowed_packet%';
+---------------------------+------------+
| Variable_name             | Value      |
+---------------------------+------------+
| max_allowed_packet        | 67108864   |
| mysqlx_max_allowed_packet | 67108864   |
| slave_max_allowed_packet  | 1073741824 |
+---------------------------+------------+
3 rows in set (0.12 sec)

This 67108864 is in bytes, about 64 MB, which is far too large for our spliced SQL to trigger an error. So let's lower it and retest after the change:

  1. First, enter the running MySQL container (with the Docker desktop version you can also just click Cli on the container):
docker exec -it mysql bash
  2. Go to the /etc/mysql directory, where the my.cnf file lives:
cd /etc/mysql
  3. Install vim first, otherwise you cannot edit the file:
apt-get update
apt-get install vim
  4. Edit my.cnf:
vim my.cnf
  5. Add max_allowed_packet=2M on the last line (press i to edit, Esc when finished, then :wq to save and quit):
 
[mysqld]
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
datadir         = /var/lib/mysql
secure-file-priv= NULL
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
 
# Custom config should go here
!includedir /etc/mysql/conf.d/
max_allowed_packet=2M
  6. Exit the container:
# exit
  7. Find the MySQL container id:
docker ps -a


  8. Restart MySQL:
docker restart c178e8998e68

After the restart, check max_allowed_packet again and confirm it has been changed successfully:

mysql> show VARIABLES like '%max_allowed_packet%';
+---------------------------+------------+
| Variable_name             | Value      |
+---------------------------+------------+
| max_allowed_packet        | 2097152    |
| mysqlx_max_allowed_packet | 67108864   |
| slave_max_allowed_packet  | 1073741824 |
+---------------------------+------------+

We ran the spliced-sql test again. At 100w rows the packet far exceeded the 2M we just set (about 36 MB, as the error shows), and the expected error was thrown:

org.springframework.dao.TransientDataAccessResourceException: 
### Cause: com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.
; Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.; nested exception is com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.

Why is batch processing so slow?

However, looking more closely, the batch approach above shows no advantage at all; it is barely different from the for loop. Can that be right?

This is definitely not right. The official documentation says that in batch mode updates are sent in batches and the prepared statement is reused instead of being created for every insert, so in theory it should be faster.


Then I found the key problem: the database connection URL is missing rewriteBatchedStatements=true.

If we don't add it, the MySQL JDBC driver ignores the executeBatch() call by default: the group of SQL statements we expect to be executed as a batch is instead sent to MySQL one by one, so each insert is effectively a single statement, which directly kills the performance. No wonder it was about as slow as inserting in a loop.

Only when rewriteBatchedStatements is set to true will the driver actually execute the SQL in batches for us.

Correct database connection:

jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf-8&useSSL=false&allowPublicKeyRetrieval=true&&serverTimezone=UTC&setUnicode=true&characterEncoding=utf8&&nullCatalogMeansCurrent=true&&autoReconnect=true&&allowMultiQueries=true&&&rewriteBatchedStatements=true
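
To see what this flag changes outside of MyBatis, here is a minimal plain-JDBC sketch (the table, credentials and row count follow the setup above; everything else is illustrative, not the article's code): with rewriteBatchedStatements=true the driver rewrites the queued inserts into one multi-value INSERT before sending them; without it, executeBatch() still sends them to MySQL one by one.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class JdbcBatchDemo {

    public static void main(String[] args) throws Exception {
        // toggle rewriteBatchedStatements here to compare the two behaviours
        String url = "jdbc:mysql://127.0.0.1:3306/test?useSSL=false"
                + "&serverTimezone=UTC&rewriteBatchedStatements=true";
        try (Connection conn = DriverManager.getConnection(url, "root", "123456");
             PreparedStatement ps = conn.prepareStatement("insert into user(id, age) values(?, ?)")) {
            conn.setAutoCommit(false);
            long start = System.currentTimeMillis();
            for (int i = 1; i <= 10000; i++) {
                ps.setInt(1, i);
                ps.setInt(2, i % 100);
                ps.addBatch();     // queue the statement instead of executing it right away
            }
            ps.executeBatch();     // with the flag on, this is sent as one big multi-value insert
            conn.commit();
            System.out.println("time:" + (System.currentTimeMillis() - start));
        }
    }
}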

After fixing this, we re-ran the batch tests; the final results are as follows:

| Insertion method | 10 | 100 | 1000 | 1w | 10w | 100w | 1000w |
| --- | --- | --- | --- | --- | --- | --- | --- |
| for loop insert | 387 | 1150 | 7907 | 70026 | 635984 | too long... | too long... |
| Spliced sql insert | 308 | 320 | 392 | 838 | 3156 | 24948 (likely to exceed the sql length limit) | OutOfMemoryError: heap memory overflow |
| Batch processing (emphasis) | 333 | 323 | 362 | 636 | 1638 | 8978 | OutOfMemoryError: heap memory overflow |
| Batch processing + batch commits | 359 | 313 | 394 | 630 | 2907 | 18631 | OutOfMemoryError: heap memory overflow |

From these results, batch processing really is much faster, although at a large enough scale it still ends in a memory overflow. Batch processing with intermediate commits is not faster than plain batch processing; it is actually a little slower, presumably because it commits too many times. Spliced sql is not very different from batch processing when the row count is relatively small. The worst option is the for loop: it already needs about 1s for just 100 rows, so it is simply not a viable choice.

At first, when I found that batch processing was slow, I really doubted myself. Later I discovered the missing parameter, and it felt like the clouds had parted. The more I know, the more I realize I don't know.

[Profile of the author]:
Qin Huai, author of the public account [Qinhuai Grocery Store]. The road of technology cannot be covered in a single stride; the mountains are high and the rivers long, and even if I move slowly, I will never stop.

"剑指 Offer" full solutions PDF

What did I write in 2020?

Open source programming notes

