TiDB application adaptation in practice: troubleshooting and optimizing MyBatis 3.5.X performance on JDK 8

Recently, a financial customer adapted a batch processing workload to TiDB, with data volumes in the hundreds of millions of rows. Processing the same amount of data took 35 minutes on TiDB and 15 minutes on Oracle, a difference of 20 minutes. In our previous experience TiDB outperforms Oracle in batch processing scenarios, so this result puzzled us. After some investigation, the problem was traced to the batch application itself. After the adjustment, the application server became the performance bottleneck while the database load was still low, and even without any database parameter tuning the TiDB processing time dropped to 16 minutes, almost the same as Oracle.

Remote investigation

Grafana showed that the resource usage of the database cluster was very low while the batch job was running, so we judged that the application was putting little pressure on the database. Increasing the application's concurrency from 40 to 100 left resource usage and QPS almost unchanged. Connection count monitoring showed that the number of connections rose as the concurrency was increased, confirming that the change had taken effect. However, show processlist showed that most of the connections were idle. A brief walk through the application code showed a Spring Batch + MyBatis structure. Because configuring concurrency in Spring Batch is very simple, the thread count adjustment should have been effective and working normally.
Although the cause of the low resource utilization was not yet clear, there was another finding: ping showed a network latency of 2~3 ms between the application and the TiDB cluster. To rule out interference from high network latency, the application was deployed onto the TiDB cluster's servers. The batch processing time dropped from 35 minutes to 27 minutes, still far behind Oracle. Because the database itself was under no pressure, there was no point in adjusting database parameters.

Because raising the application's concurrency did not have the expected effect, we suspected that its threads were blocking, but we had no evidence. So we devised the following scenario to check simply whether the problem was in the application or in the database:
Create two identical databases, d1 and d2, in the TiDB cluster, and use two identical batch applications to process the data in d1 and d2 respectively, which is equivalent to writing to the TiDB cluster under double the load. The expected result is that double the amount of data can also be processed in 27 minutes, and that database resource usage is higher than with one application.
The test results matched expectations, proving that raising the thread count did not really increase the application's concurrency.

Possible reasons?

Application concurrency is too high, and a busy CPU causes an application performance bottleneck.
The CPU usage of the application server is only 6%, so there should be no CPU bottleneck.
Spring Batch has some metadata tables, and concurrent updates to the same row in a metadata table cause blocking.
In that case the blocking would occur in the database, as lock waits or lock timeouts, and not on the application side.

How to solve it?

Deploy multiple application instances and run them concurrently; performance increases linearly with the number of instances.
This does not solve the performance bottleneck of a single application instance, and it is also very inconvenient for scaling out during business peaks.
Use asynchronous processing to improve application throughput.
There are some technologies for accessing databases asynchronously, but they are immature and we strongly recommend against using them.

On-site investigation

To find the root cause of the problem, I went on site.

On site, a demo written with JDBC was used to stress test the problem cluster. Database resource usage rose as the demo's concurrency increased, which proved that higher concurrency can put more pressure on the database. At this point, the possibility of a database problem was completely ruled out.
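
The original demo is not shown, so the following is only a minimal sketch of what such a JDBC load generator might look like. The class name JdbcLoadDemo, the connection URL, the credentials, and the SELECT 1 statement are all placeholders chosen for illustration, and a MySQL-compatible JDBC driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical load generator; not the demo that was used on site. */
final class JdbcLoadDemo {

  /** Work done by one worker thread. */
  interface Worker {
    void run(int workerId) throws Exception;
  }

  /** Runs one worker per thread and returns how many finished without an exception. */
  static int runWorkers(int threads, Worker worker) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    AtomicInteger completed = new AtomicInteger();
    for (int t = 0; t < threads; t++) {
      final int id = t;
      pool.execute(() -> {
        try {
          worker.run(id);
          completed.incrementAndGet();
        } catch (Exception e) {
          System.err.println("worker " + id + " failed: " + e);
        }
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(1, TimeUnit.HOURS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return completed.get();
  }

  /** Each worker opens its own connection and issues a fixed number of statements. */
  static Worker statementLoop(String url, String user, String password, int statements) {
    return id -> {
      try (Connection conn = DriverManager.getConnection(url, user, password);
           Statement stmt = conn.createStatement()) {
        for (int i = 0; i < statements; i++) {
          stmt.execute("SELECT 1"); // placeholder; a real test would run the batch SQL
        }
      }
    };
  }

  public static void main(String[] args) {
    // Placeholder settings: point these at the cluster under test.
    String url = args.length > 0 ? args[0] : "jdbc:mysql://127.0.0.1:4000/test";
    int threads = args.length > 1 ? Integer.parseInt(args[1]) : 40;
    long start = System.nanoTime();
    int ok = runWorkers(threads, statementLoop(url, "root", "", 10_000));
    long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    System.out.println(ok + "/" + threads + " workers finished in " + ms + " ms");
  }
}
```

Running it with increasing thread counts while watching the database monitoring gives the same kind of comparison described above: if database load rises with the thread count, the database is not the limiting factor.
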
VisualVM showed that a large number of the application's threads were in the blocked state. In this case, starting more threads is useless; the real performance bottleneck is in the application.
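
The same information that VisualVM displays can also be collected from inside the JVM. The sketch below is an assumption about how one might do that with the standard java.lang.management API; the class name ThreadStateHistogram is made up for illustration, and the code only sees threads of the JVM it runs in, so it would have to be called from within the batch application.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Illustrative helper: summarizes thread states of the current JVM. */
final class ThreadStateHistogram {

  /** Takes a snapshot of all live threads, without lock details. */
  static ThreadInfo[] dump() {
    return ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
  }

  /** Counts live threads per state (RUNNABLE, BLOCKED, WAITING, ...). */
  static Map<Thread.State, Integer> countByState() {
    Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
    for (ThreadInfo info : dump()) {
      if (info != null) {
        counts.merge(info.getThreadState(), 1, Integer::sum);
      }
    }
    return counts;
  }

  /** Lists the top stack frame of every BLOCKED thread, to see where they pile up. */
  static List<String> blockedTopFrames() {
    List<String> frames = new ArrayList<>();
    for (ThreadInfo info : dump()) {
      if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
        StackTraceElement[] stack = info.getStackTrace();
        frames.add(info.getThreadName() + " at " + (stack.length > 0 ? stack[0] : "<no frames>"));
      }
    }
    return frames;
  }

  public static void main(String[] args) {
    System.out.println(countByState());
    blockedTopFrames().forEach(System.out::println);
  }
}
```

A large BLOCKED count combined with a low CPU usage is the pattern described above: adding threads only adds more waiters.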

I walked through the application code and found that, although it does use synchronization locks and similar logic, these should not cause serious thread blocking.
A thread dump showed that the threads were all blocked in the MyBatis stack, at this position in the source code:

  @Override
  public Reflector findForClass(Class<?> type) {
    if (classCacheEnabled) {
      // synchronized (type) removed see issue #461
      return MapUtil.computeIfAbsent(reflectorMap, type, Reflector::new);
    } else {
      return new Reflector(type);
    }
  }

Roughly, what happens here is as follows. When MyBatis performs parameter processing, result mapping, and similar operations, it relies on a lot of reflection. Reflection in Java is powerful, but the code is complicated and error-prone to write. To simplify this code, MyBatis provides a dedicated reflection module that further encapsulates common reflection operations behind a more concise and convenient API. The findForClass() method of DefaultReflectorFactory creates a Reflector object for the specified Class and caches it in reflectorMap. It is the operation on reflectorMap that causes the thread blocking.

Because MyBatis supports a custom ReflectorFactory, the idea at the time was to bypass the cache step, that is, to set classCacheEnabled to false so that the return new Reflector(type) branch is taken. But the threads were still blocked, in ConcurrentHashMap.computeIfAbsent.

This seemed to be a general problem, so I turned my attention to the computeIfAbsent method of ConcurrentHashMap. computeIfAbsent is a new method added to Map in JDK 8:

public V computeIfAbsent(K key, Function<? super K,? extends V> mappingFunction)

It first checks whether a value for the specified key exists in the map. If not, it calls mappingFunction(key) to compute the value and then puts the key and value into the map. ConcurrentHashMap overrides computeIfAbsent to make the call to mappingFunction thread-safe.
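
The behavior described above can be seen in a small, self-contained sketch. The class ComputeIfAbsentDemo and its string-length cache are made up for illustration and are unrelated to the MyBatis code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative cache: maps a string to its length, computing each value once. */
final class ComputeIfAbsentDemo {
  static final AtomicInteger computations = new AtomicInteger();
  static final ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();

  static int cachedLength(String key) {
    // The mapping function runs only when the key has no value yet.
    return cache.computeIfAbsent(key, k -> {
      computations.incrementAndGet();
      return k.length();
    });
  }

  public static void main(String[] args) {
    System.out.println(cachedLength("Reflector")); // computed: 9
    System.out.println(cachedLength("Reflector")); // served from the map: 9
    System.out.println(computations.get());        // mapping function ran once: 1
  }
}
```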

A paragraph from the official documentation:

The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.

It can be seen that, to guarantee atomicity, threads operating on the same key may block one another. Under high concurrency this obviously causes serious performance problems. The same problem has been reported by users in the official Java bug tracker.

[[JDK-8161372] ConcurrentHashMap.computeIfAbsent(k,f) locks bin when k present](https://bugs.openjdk.java.net/browse/JDK-8161372?focusedCommentId=14260334&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14260334)

Many developers assume that computeIfAbsent does not block threads when the key is already present, but in JDK 8 the opposite is true. At first the JDK maintainers considered this design acceptable, but they later agreed that such behavior was not appropriate for ConcurrentHashMap. The problem was finally fixed in JDK 9.

Verification

We upgraded the on-site JDK to version 9, raised the application's concurrency to 500, and eliminated the network latency interference. The batch processing then took 16 minutes. Application server CPU utilization reached about 85%, and the application server became the performance bottleneck. In theory, a better application server configuration and optimized database parameters could improve performance further.

The conclusion at the time

The computeIfAbsent method that MyBatis 3.5.X uses to cache reflection objects does not perform well on JDK 8. Upgrading to JDK 9 or later solves this problem. MyBatis 3.5.X itself has no special handling for the computeIfAbsent performance problem.

If the upgrade path is not feasible, you can also try downgrading to MyBatis 3.4.X. That version had not yet introduced computeIfAbsent, so in theory it does not have this problem:

  @Override
  public Reflector findForClass(Class<?> type) {
    if (classCacheEnabled) {
      // synchronized (type) removed see issue #461
      Reflector cached = reflectorMap.get(type);
      if (cached == null) {
        cached = new Reflector(type);
        reflectorMap.put(type, cached);
      }
      return cached;
    } else {
      return new Reflector(type);
    }
  }

Current conclusion

After receiving our feedback, the MyBatis maintainers fixed this problem very efficiently. Kudos to them.

As the code below shows, MyBatis now wraps computeIfAbsent: if the value already exists, it is returned directly, which bypasses the thread blocking on operations on the same key. MyBatis will include this PR in version 3.5.7.

public class MapUtil {
  /**
   * A temporary workaround for Java 8 specific performance issue JDK-8161372 .<br>
   * This class should be removed once we drop Java 8 support.
   *
   * @see <a href="https://bugs.openjdk.java.net/browse/JDK-8161372">https://bugs.openjdk.java.net/browse/JDK-8161372</a>
   */
  public static <K, V> V computeIfAbsent(Map<K, V> map, K key, Function<K, V> mappingFunction) {
    V value = map.get(key);
    if (value != null) {
      return value;
    }
    return map.computeIfAbsent(key, mappingFunction::apply);
  }

  private MapUtil() {
    super();
  }
}
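
To get a feel for the difference, the following sketch times many threads hitting one key that is already present, once through computeIfAbsent directly and once through a get-first wrapper of the same shape as the one above. The class HotKeyBenchmark, its method names, and the thread and call counts are assumptions made for illustration. It is a rough timing loop and not a rigorous benchmark, and the numbers depend on the JDK version, the hardware, and JIT warm-up, so no expected output is given.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.function.Function;

/** Rough timing sketch for repeated lookups of one already-cached key. */
final class HotKeyBenchmark {

  /** Same idea as the wrapper above: plain get first, computeIfAbsent only on a miss. */
  static <K, V> V getThenCompute(Map<K, V> map, K key, Function<K, V> fn) {
    V value = map.get(key);
    return value != null ? value : map.computeIfAbsent(key, fn);
  }

  /** Returns the elapsed milliseconds for threads * callsPerThread lookups. */
  static long timeMillis(int threads, int callsPerThread, boolean useGetFirst) {
    Map<Class<?>, String> cache = new ConcurrentHashMap<>();
    cache.put(String.class, String.class.getName()); // key is present before the test
    CountDownLatch start = new CountDownLatch(1);
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> {
        try {
          start.await(); // release all workers at the same moment
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        for (int i = 0; i < callsPerThread; i++) {
          if (useGetFirst) {
            getThenCompute(cache, String.class, Class::getName);
          } else {
            cache.computeIfAbsent(String.class, Class::getName);
          }
        }
      });
      workers[t].start();
    }
    long begin = System.nanoTime();
    start.countDown();
    for (Thread w : workers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    return (System.nanoTime() - begin) / 1_000_000;
  }

  public static void main(String[] args) {
    System.out.println("direct computeIfAbsent: " + timeMillis(100, 1_000_000, false) + " ms");
    System.out.println("get first:              " + timeMillis(100, 1_000_000, true) + " ms");
  }
}
```

Based on the bug report, one would expect the direct variant to be noticeably slower on JDK 8 and the gap to shrink on JDK 9 and later, but that should be measured on the target environment and not taken from this sketch.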

Conclusion

Through this investigation we found a bug in the JDK source code and prompted the MyBatis framework, which is affected by this bug, to work around it. After the adjustment the application's processing speed improved greatly, more than doubling in our scenario. We believe this will be of great help to enterprises that use the MyBatis framework.

PingCAP