Recently, a new project has used insertBatch in a lot of places. People always say that batch insertion is much more efficient than inserting in a loop, so this article runs an experiment to see whether that is really true.

Test environment:
- Spring Boot 2.5
- MySQL 8
- JDK 8
- Docker
First of all, there are several options for inserting multiple rows:

- `foreach` loop insert, one statement per row
- Spliced SQL, executed as a single statement
- Insert using the batch function

Build the test environment. The SQL file:
```sql
drop database IF EXISTS test;
CREATE DATABASE test;
use test;
DROP TABLE IF EXISTS `user`;
CREATE TABLE `user` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT '',
  `age` int(11) DEFAULT 0,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```
Application configuration file:
```yaml
server:
  port: 8081
spring:
  # database connection configuration
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf-8&useSSL=false&allowPublicKeyRetrieval=true&&serverTimezone=UTC&setUnicode=true&characterEncoding=utf8&&nullCatalogMeansCurrent=true&&autoReconnect=true&&allowMultiQueries=true
    username: root
    password: 123456
# MyBatis configuration
mybatis:
  # mapper configuration files
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: com.aphysia.spingbootdemo.model
  # enable camel-case mapping
  configuration:
    map-underscore-to-camel-case: true
logging:
  level:
    root: error
```
The startup class configures the package path to scan for Mappers:
```java
import org.mybatis.spring.annotation.MapperScan;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
@MapperScan("com.aphysia.springdemo.mapper")
public class SpringdemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringdemoApplication.class, args);
    }
}
```
The Mapper interface defines a few methods: insert a single object, delete all objects, and insert multiple objects with a spliced statement:
```java
import com.aphysia.springdemo.model.User;
import org.apache.ibatis.annotations.Param;

import java.util.List;

public interface UserMapper {

    int insertUser(User user);

    int deleteAllUsers();

    int insertBatch(@Param("users") List<User> users);
}
```
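The `User` entity itself is not shown in the post; a minimal sketch matching the three columns of the table (assuming plain getters and setters rather than Lombok) might look like this:

```java
package com.aphysia.springdemo.model;

// Hypothetical entity sketch matching the user table: id, name, age
public class User {
    private int id;
    private String name;
    private int age;

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```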
The Mapper.xml file is as follows:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd" >
<mapper namespace="com.aphysia.springdemo.mapper.UserMapper">

    <insert id="insertUser" parameterType="com.aphysia.springdemo.model.User">
        insert into user(id, age) values (#{id}, #{age})
    </insert>

    <delete id="deleteAllUsers">
        delete from user where id > 0;
    </delete>

    <insert id="insertBatch" parameterType="java.util.List">
        insert into user(id, age) VALUES
        <foreach collection="users" item="model" index="index" separator=",">
            (#{model.id}, #{model.age})
        </foreach>
    </insert>
</mapper>
```
Before every test run we delete all existing data, so the results are objective and not affected by rows left over from previous runs.
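The cleanup code is not shown in the post; a minimal sketch, assuming a JUnit 5 `@BeforeEach` hook (from `org.junit.jupiter.api.BeforeEach`) that calls the `deleteAllUsers` mapper method, could look like this:

```java
// Hypothetical cleanup hook (not shown in the original post): wipe the table before each test
@BeforeEach
void cleanTable() {
    userMapper.deleteAllUsers();
}
```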
Different tests
1. foreach insert
Get the list first, then perform one database operation per record to insert it:
```java
import com.aphysia.springdemo.mapper.UserMapper;
import com.aphysia.springdemo.model.User;
import org.apache.ibatis.session.SqlSessionFactory;
import org.junit.jupiter.api.Test;
import org.mybatis.spring.annotation.MapperScan;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import javax.annotation.Resource;
import java.util.List;

@SpringBootTest
@MapperScan("com.aphysia.springdemo.mapper")
class SpringdemoApplicationTests {

    @Autowired
    SqlSessionFactory sqlSessionFactory;

    @Resource
    UserMapper userMapper;

    static int num = 100000;
    static int id = 1;

    @Test
    void insertForEachTest() {
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        for (int i = 0; i < users.size(); i++) {
            userMapper.insertUser(users.get(i));
        }
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }
}
```
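All of the tests call `getRandomUsers()`, which the original post does not show. A minimal sketch, assuming it simply builds `num` users with incrementing ids and random ages (and that `java.util.ArrayList` and `java.util.Random` are imported), might be:

```java
// Hypothetical helper, not shown in the original post:
// build `num` users with incrementing ids and random ages.
private List<User> getRandomUsers() {
    List<User> users = new ArrayList<>(num);
    Random random = new Random();
    for (int i = 0; i < num; i++) {
        User user = new User();
        user.setId(id++);
        user.setAge(random.nextInt(100));
        users.add(user);
    }
    return users;
}
```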
2. Splicing SQL Insert
This approach actually inserts the data with a single statement of the following form:
```sql
INSERT INTO `user` (`id`, `age`)
VALUES (1, 11),
       (2, 12),
       (3, 13),
       (4, 14),
       (5, 15);
```
```java
@Test
void insertSplicingTest() {
    List<User> users = getRandomUsers();
    long start = System.currentTimeMillis();
    userMapper.insertBatch(users);
    long end = System.currentTimeMillis();
    System.out.println("time:" + (end - start));
}
```
3. Use Batch to insert in batches
Set the MyBatis session's executor type to `Batch`: open a session from `sqlSessionFactory` with the execution mode set to batch and auto-commit set to `false`, then commit everything in one go:
```java
@Test
public void insertBatch() {
    SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
    UserMapper mapper = sqlSession.getMapper(UserMapper.class);
    List<User> users = getRandomUsers();
    long start = System.currentTimeMillis();
    for (int i = 0; i < users.size(); i++) {
        mapper.insertUser(users.get(i));
    }
    sqlSession.commit();
    sqlSession.close();
    long end = System.currentTimeMillis();
    System.out.println("time:" + (end - start));
}
```
4. Batch processing + batch submission
Same as batch processing, except that a commit is issued every 1000 rows, i.e. the data is committed in batches.
```java
@Test
public void insertBatchForEachTest() {
    SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
    UserMapper mapper = sqlSession.getMapper(UserMapper.class);
    List<User> users = getRandomUsers();
    long start = System.currentTimeMillis();
    for (int i = 0; i < users.size(); i++) {
        mapper.insertUser(users.get(i));
        if (i % 1000 == 0 || i == num - 1) {
            sqlSession.commit();
            sqlSession.clearCache();
        }
    }
    sqlSession.close();
    long end = System.currentTimeMillis();
    System.out.println("time:" + (end - start));
}
```
The first results are obviously wrong?

Running the code above gives the results below. The for-loop insert is indeed very slow, and the spliced SQL is relatively efficient. I had seen claims that spliced SQL might be limited by MySQL, but when I ran it up to 1000w rows all I saw was a heap memory overflow.

The following results are incorrect!!!

(Times are in milliseconds; the column headers are row counts, where 1w = 10,000.)
| Insertion method | 10 | 100 | 1000 | 1w | 10w | 100w | 1000w |
| --- | --- | --- | --- | --- | --- | --- | --- |
| for loop insert | 387 | 1150 | 7907 | 70026 | 635984 | too long... | too long... |
| Spliced SQL insert | 308 | 320 | 392 | 838 | 3156 | 24948 | OutOfMemoryError: heap memory overflow |
| Batch processing | 392 | 917 | 5442 | 51647 | 470666 | too long... | too long... |
| Batch processing + commit in batches | 359 | 893 | 5275 | 50270 | 472462 | too long... | too long... |
Why doesn't the spliced SQL exceed the limit?

Let's take a look at MySQL's limit:
```
mysql> show VARIABLES like '%max_allowed_packet%';
+---------------------------+------------+
| Variable_name             | Value      |
+---------------------------+------------+
| max_allowed_packet        | 67108864   |
| mysqlx_max_allowed_packet | 67108864   |
| slave_max_allowed_packet  | 1073741824 |
+---------------------------+------------+
3 rows in set (0.12 sec)
```
This 67108864 is 64 MB, which is so large that we never hit it, so no error is reported. Let's lower it and retest:

- First, enter the MySQL container (MySQL is running in Docker here); you can also open the CLI directly from Docker Desktop:
```shell
docker exec -it mysql bash
```
- Enter the `/etc/mysql` directory to modify the `my.cnf` file:

```shell
cd /etc/mysql
```
- Install `vim` first, otherwise you cannot edit the file:

```shell
apt-get update
apt-get install vim
```
- Modify `my.cnf`:

```shell
vim my.cnf
```
- Add `max_allowed_packet=2M` on the last line (press `i` to edit, press `esc` when finished, type `:wq` to save and exit):

```ini
[mysqld]
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
datadir         = /var/lib/mysql
secure-file-priv= NULL
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

# Custom config should go here
!includedir /etc/mysql/conf.d/
max_allowed_packet=2M
```
- Exit the container:

```shell
exit
```
- Find the MySQL container id:

```shell
docker ps -a
```
- Restart MySQL:

```shell
docker restart c178e8998e68
```
After restarting successfully, check max_allowed_packet again and confirm it has been changed:
```
mysql> show VARIABLES like '%max_allowed_packet%';
+---------------------------+------------+
| Variable_name             | Value      |
+---------------------------+------------+
| max_allowed_packet        | 2097152    |
| mysqlx_max_allowed_packet | 67108864   |
| slave_max_allowed_packet  | 1073741824 |
+---------------------------+------------+
```
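As a side note, if you only need the change temporarily, `max_allowed_packet` can also be set at runtime without editing `my.cnf` or restarting (this is an alternative, not what the steps above do); the new value applies to new connections and is lost after the server restarts:

```sql
-- Requires the SUPER (or SYSTEM_VARIABLES_ADMIN) privilege; reconnect to pick up the new value
SET GLOBAL max_allowed_packet = 2 * 1024 * 1024;
```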
We ran the spliced SQL insert again. At 100w rows the packet reached 36,788,583 bytes (about 35 MB), far beyond the 2 MB we just set, and this time the error was thrown as expected:
```
org.springframework.dao.TransientDataAccessResourceException:
### Cause: com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.
; Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.; nested exception is com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.
```
Why is batch processing so slow?
However, a closer look at the results above shows that batch processing gives no advantage at all; it is no better than the for loop. Is that right?

It definitely is not. The official documentation says the batch executor performs updates in batches and reuses prepared statements instead of creating a new one every time, so in theory it should be faster.
Then I found my most important problem: the database connection URL was missing `rewriteBatchedStatements=true`.

Without this parameter, the MySQL JDBC driver does not really batch the statements passed to `executeBatch()`. We expect a group of SQL statements to be sent in one batch, but they are actually sent to the MySQL server one by one, i.e. as individual inserts, which directly causes the poor performance. No wonder it behaved almost the same as inserting in a loop.

Only when `rewriteBatchedStatements` is set to `true` will the driver execute the SQL in batches for us.
The corrected database connection URL:

```
jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf-8&useSSL=false&allowPublicKeyRetrieval=true&&serverTimezone=UTC&setUnicode=true&characterEncoding=utf8&&nullCatalogMeansCurrent=true&&autoReconnect=true&&allowMultiQueries=true&&&rewriteBatchedStatements=true
```
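Alternatively (this is not what the post does), the same flag can be passed to the driver through the connection pool instead of the URL; with Spring Boot's default HikariCP pool that would look roughly like this:

```yaml
spring:
  datasource:
    hikari:
      # properties passed through to the MySQL JDBC driver
      data-source-properties:
        rewriteBatchedStatements: true
```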
After finding the problem, we re-ran the batch tests; the final results are as follows (times in ms, as before):
| Insertion method | 10 | 100 | 1000 | 1w | 10w | 100w | 1000w |
| --- | --- | --- | --- | --- | --- | --- | --- |
| for loop insert | 387 | 1150 | 7907 | 70026 | 635984 | too long... | too long... |
| Spliced SQL insert | 308 | 320 | 392 | 838 | 3156 | 24948 (likely to exceed the SQL length limit) | OutOfMemoryError: heap memory overflow |
| Batch processing (the key one) | 333 | 323 | 362 | 636 | 1638 | 8978 | OutOfMemoryError: heap memory overflow |
| Batch processing + commit in batches | 359 | 313 | 394 | 630 | 2907 | 18631 | OutOfMemoryError: heap memory overflow |
From the results above, batch processing is indeed much faster, although once the volume gets very large it also runs out of memory. Batch processing with commits in batches did not get faster; it is similar to plain batch processing, or even slower, because it commits too many times. The spliced SQL approach is actually not much different from batch processing when the volume is relatively small. The worst option is the for loop, which is really time-consuming: it already needs about 1s for just 100 rows, so it is not a viable choice.

When I first saw batch processing being so slow, I really doubted myself. Then I found it came down to a single parameter, and suddenly the clouds lifted. The more you know, the more you realize you don't know.
[Profile of the author]:

Qin Huai, [Qinhuai Grocery Store]. The road of technology cannot be covered in a single stride; the mountains are high and the rivers are long, and even if I move slowly, I never stop.