Author: Fuyi
Problem background
While stress testing with JMeter, we found that for the same backend service, at 500 concurrent threads on a single machine, the response time (RT) over HTTPS was far higher than over HTTP. Meanwhile, every monitored metric on the backend service stayed at a low level, so we suspected that the performance bottleneck was the JMeter load generator itself.
Problem analysis
Entry point: garbage collection
First, we observed that both CPU usage and memory usage were very high on the load generator. Let's look at the CPU and memory usage of each thread in detail:
top -Hp {pid}
The CPU usage of the process is nearly saturated, and the GC threads account for a large share of it.
Looking at GC frequency and duration, there is a Young GC every second and the cumulative GC time is relatively long, so we start from the frequent GCs to locate the problem.
java/bin/jstat -gcutil {pid} 1000
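The jstat command above attaches to an external process. As a rough complement, the sketch below reads cumulative GC counts and times through the standard GarbageCollectorMXBean API. It only sees the JVM it runs in, so it would have to run inside the process being observed. The class name GcSampler and its methods are hypothetical and used only for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper: samples cumulative GC counts and times of the current JVM. */
class GcSampler {

    /** Returns {total collection count, total collection time in ms} summed over all collectors. */
    static long[] snapshot() {
        long count = 0;
        long timeMs = 0;
        List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean bean : beans) {
            // Both getters return -1 when the value is undefined for a collector
            count += Math.max(0, bean.getCollectionCount());
            timeMs += Math.max(0, bean.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    /** Difference between two snapshots: number of GCs and GC milliseconds in the interval. */
    static long[] delta(long[] before, long[] after) {
        return new long[] {after[0] - before[0], after[1] - before[1]};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] previous = snapshot();
        for (int i = 0; i < 5; i++) {
            Thread.sleep(1000); // same 1000 ms interval as the jstat command
            long[] current = snapshot();
            long[] d = delta(previous, current);
            System.out.printf("GCs in last second: %d, GC time: %d ms%n", d[0], d[1]);
            previous = current;
        }
    }
}
```

A steady value of one or more GCs per interval would match the pattern described here: many short collections rather than a few long ones.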
During the stress test, we took a heap dump of the running JMeter process and analyzed the heap:
The cacheMap object occupies 93.3% of the memory, and it is referenced by the SSLSessionContextImpl class. The source code shows that every SSLSessionContextImpl constructor initializes two soft-reference caches, sessionHostPortCache and sessionCache. Because they hold soft references, the JVM reclaims their entries only when memory runs low.
// default cache size
private final static int DEFAULT_MAX_CACHE_SIZE = 20480;

// package private
SSLSessionContextImpl() {
    cacheLimit = getDefaultCacheLimit(); // default cache size; 20480 by default here
    timeout = 86400; // default, 24 hours
    // use soft reference
    // two caches with the default size of 20480 are initialized here; this is the cause of the frequent GC
    sessionCache = Cache.newSoftMemoryCache(cacheLimit, timeout);
    sessionHostPortCache = Cache.newSoftMemoryCache(cacheLimit, timeout);
}

// get the default cache size
private static int getDefaultCacheLimit() {
    try {
        int defaultCacheLimit = GetIntegerAction.privilegedGetProperty(
                "javax.net.ssl.sessionCacheSize", DEFAULT_MAX_CACHE_SIZE);
        if (defaultCacheLimit >= 0) {
            return defaultCacheLimit;
        } else if (SSLLogger.isOn && SSLLogger.isOn("ssl")) {
            SSLLogger.warning(
                "invalid System Property javax.net.ssl.sessionCacheSize, " +
                "use the default session cache size (" +
                DEFAULT_MAX_CACHE_SIZE + ") instead");
        }
    } catch (Exception e) {
        // unlikely, log it for safe
        if (SSLLogger.isOn && SSLLogger.isOn("ssl")) {
            SSLLogger.warning(
                "the System Property javax.net.ssl.sessionCacheSize is " +
                "not available, use the default value (" +
                DEFAULT_MAX_CACHE_SIZE + ") instead");
        }
    }
    return DEFAULT_MAX_CACHE_SIZE;
}
The code above shows that sessionCache and sessionHostPortCache both default to DEFAULT_MAX_CACHE_SIZE, which is 20480. In our stress test scenario, if a new connection is established for every request, this cache is not needed at all. The same code also shows that the cache size can be set through the javax.net.ssl.sessionCacheSize system property. We started JMeter with the JVM parameter -Djavax.net.ssl.sessionCacheSize=1 to set the cache size to 1, reran the stress test, and observed the GC behavior.
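To check which limit is actually in effect, the session cache size can also be read, or capped, through the public javax.net.ssl API. This is a minimal sketch, not what JMeter does internally. The class SessionCacheSizeDemo and its method are hypothetical, and the printed default should reflect -Djavax.net.ssl.sessionCacheSize only if that property was set when the JVM started.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

/** Hypothetical demo class: inspects and caps the TLS client session cache size. */
class SessionCacheSizeDemo {

    /** Builds a fresh TLS context, caps its client session cache, and returns the resulting limit. */
    static int capClientSessionCache(int size) {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, null, null); // default key managers, trust managers and randomness
            SSLSessionContext sessions = context.getClientSessionContext();
            sessions.setSessionCacheSize(size);
            return sessions.getSessionCacheSize();
        } catch (Exception e) {
            // NoSuchAlgorithmException or KeyManagementException; not expected for "TLS"
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Limit of the JVM-wide default context
        SSLSessionContext defaults = SSLContext.getDefault().getClientSessionContext();
        System.out.println("default client session cache size: " + defaults.getSessionCacheSize());
        System.out.println("capped context size: " + capClientSessionCache(1));
    }
}
```

The system property changes the default for every context in the JVM, while setSessionCacheSize affects only the context it is called on. For a tool such as JMeter, where the contexts are created by the tool itself, the JVM parameter is the less intrusive option.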
Young GC dropped significantly, from once per second to once every 5 to 6 seconds. But the RT of the stress test was still 1800 ms: a service that normally responds in 100 ms was being measured at 1800 ms, so the SSLSession cache does not appear to be the problem. Looking more closely at the GC timings, there was only one Full GC, and its pause time was short. Young GC was frequent, but its pauses were very short, not nearly enough to take away all the CPU time slices needed for SSL encryption and decryption. The load appears to come simply from the large number of SSL handshakes, which causes the performance bottleneck.
Change of approach: why are SSL handshakes so frequent?
Back to the background of the problem: we are running a stress test in which a single machine simulates a large number of concurrent users. For performance, an SSL connection could be reused after a single handshake, with no further handshakes. Why does JMeter handshake so frequently?
With this question in mind, I read the official JMeter documentation and found a pleasant surprise.
It turns out that JMeter has two switches that control whether the SSL context is reset. The first, https.sessioncontext.shared, controls whether a single SSLContext is shared globally. If it is set to true, all threads share the same SSL context, which puts the least performance pressure on the load generator but cannot simulate real multi-user SSL handshakes.
The second switch, httpclient.reset_state_on_thread_group_iteration, controls whether the SSL context is reset on each iteration of the thread group loop. Since 5.0 it defaults to true, which means the SSL context is reset on every iteration. This appears to be the reason for the frequent SSL handshakes.
Problem verification
Regression testing
In jmeter.properties, we configured JMeter not to reset the SSL context on each thread loop iteration, then started the stress test again from the PTS console. The RT dropped by a factor of 10.
httpclient.reset_state_on_thread_group_iteration=false
[Figure: before the change]
[Figure: after the change]
Source code verification
Let's look at the source code to see how JMeter implements the per-iteration reset of the SSL context:
/**
 * Whether SSL State/Context should be reset
 * Shared state for any HC based implementation, because SSL contexts are the same
 */
protected static final ThreadLocal<Boolean> resetStateOnThreadGroupIteration =
        ThreadLocal.withInitial(() -> Boolean.FALSE);

/**
 * Reset SSL State. <br/>
 * In order to do that we need to:
 * <ul>
 * <li>Call resetContext() on SSLManager</li>
 * <li>Close current Idle or Expired connections that hold SSL State</li>
 * <li>Remove HttpClientContext.USER_TOKEN from {@link HttpClientContext}</li>
 * </ul>
 * @param jMeterVariables {@link JMeterVariables}
 * @param clientContext {@link HttpClientContext}
 * @param mapHttpClientPerHttpClientKey Map of {@link Pair} holding {@link CloseableHttpClient} and {@link PoolingHttpClientConnectionManager}
 */
private void resetStateIfNeeded(JMeterVariables jMeterVariables,
        HttpClientContext clientContext,
        Map<HttpClientKey, Pair<CloseableHttpClient, PoolingHttpClientConnectionManager>> mapHttpClientPerHttpClientKey) {
    if (resetStateOnThreadGroupIteration.get()) {
        // close expired and idle connections in the current thread's connection pool and reset the pool state
        closeCurrentConnections(mapHttpClientPerHttpClientKey);
        // remove the user token
        clientContext.removeAttribute(HttpClientContext.USER_TOKEN);
        // reset the SSL context
        ((JsseSSLManager) SSLManager.getInstance()).resetContext();
        // set the flag to false so that only the first sampler in each loop iteration enters this branch
        resetStateOnThreadGroupIteration.set(false);
    }
}

@Override
protected void notifyFirstSampleAfterLoopRestart() {
    log.debug("notifyFirstSampleAfterLoopRestart called "
            + "with config(httpclient.reset_state_on_thread_group_iteration={})",
            RESET_STATE_ON_THREAD_GROUP_ITERATION);
    resetStateOnThreadGroupIteration.set(RESET_STATE_ON_THREAD_GROUP_ITERATION);
}
Every time an HTTP sampler based on Apache HttpClient 4 runs, it calls resetStateIfNeeded, which reads resetStateOnThreadGroupIteration, the per-thread flag populated from the httpclient.reset_state_on_thread_group_iteration setting. If the flag is true, the method resets the state of the current thread's connection pool, resets the SSL context, and then sets resetStateOnThreadGroupIteration to false.
Because JMeter implements concurrency with threads, the resetStateOnThreadGroupIteration flag is stored in a ThreadLocal. At the start of each loop iteration, notifyFirstSampleAfterLoopRestart is called to set the flag from the configuration. Once the reset has run, the flag is forced back to false. This ensures that only the first sampler in each iteration enters this logic, so the reset runs only once per iteration.
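The once-per-iteration pattern described above can be reproduced in isolation. The following is a simplified sketch, not JMeter code: the class OncePerIterationReset, its methods, and the counter that stands in for the expensive reset are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified stand-in for the sampler logic; all names here are hypothetical. */
class OncePerIterationReset {
    // Configured value; plays the role of httpclient.reset_state_on_thread_group_iteration
    private final boolean resetEachIteration;
    // Per-thread flag: true only between the start of an iteration and its first sample
    private final ThreadLocal<Boolean> resetPending = ThreadLocal.withInitial(() -> Boolean.FALSE);
    // Counts how often the expensive reset ran (stands in for closing connections and resetting SSL)
    final AtomicInteger resets = new AtomicInteger();

    OncePerIterationReset(boolean resetEachIteration) {
        this.resetEachIteration = resetEachIteration;
    }

    /** Called at the start of each loop iteration; arms the flag from the configuration. */
    void onIterationStart() {
        resetPending.set(resetEachIteration);
    }

    /** Called for every sampler; only the first call after onIterationStart performs the reset. */
    void sample() {
        if (resetPending.get()) {
            resets.incrementAndGet();
            resetPending.set(false);
        }
        // the actual request would be sent here
    }

    /** Runs the given number of threads, iterations and samplers, and returns the total reset count. */
    static int run(boolean resetEachIteration, int threads, int iterations, int samplersPerIteration) {
        OncePerIterationReset sampler = new OncePerIterationReset(resetEachIteration);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    sampler.onIterationStart();
                    for (int s = 0; s < samplersPerIteration; s++) {
                        sampler.sample();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sampler.resets.get();
    }

    public static void main(String[] args) {
        System.out.println(run(true, 4, 10, 3));  // prints 40: one reset per thread per iteration
        System.out.println(run(false, 4, 10, 3)); // prints 0: the flag is never armed
    }
}
```

With 4 threads, 10 iterations and 3 samplers per iteration, the reset runs 40 times rather than 120, because the second and third samplers of each iteration see the flag already cleared. Each of those 40 resets corresponds to a fresh SSL context in the real sampler, which is where the handshake cost comes from.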
Summary
This investigation resolved the HTTPS performance problem in stress tests run with JMeter 5.0 and later. The lessons are summarized as follows:
- To get the most performance out of the load generator, set https.sessioncontext.shared to true. All threads then share one SSL context and do not handshake frequently, but this cannot simulate real multi-user scenarios.
- To simulate multiple users that each repeat an action in a loop, that is, each thread in the thread group simulates the same user on every iteration, set httpclient.reset_state_on_thread_group_iteration to false. This also greatly improves single-machine HTTPS stress test performance.
- To have each thread simulate a different user on every iteration, set httpclient.reset_state_on_thread_group_iteration=true. The stress test then simulates frequent SSL handshakes from many users, and load generator performance is at its lowest. In our experience, the upper limit for a single machine is about 50 concurrent users. This is also the default setting since JMeter 5.0.
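The three cases above can be written down as a small decision over the two properties. The sketch below is a hypothetical helper, not part of JMeter. It assumes that https.sessioncontext.shared defaults to false, that the reset property defaults to true (as stated for 5.0 and later), and that sharing takes precedence over resetting.

```java
import java.util.Properties;

/** Hypothetical helper, not part of JMeter: classifies the handshake behavior implied by the two properties. */
class HandshakeMode {

    static String describe(Properties props) {
        // Assumed default: the SSL context is not shared across threads
        boolean shared = Boolean.parseBoolean(
                props.getProperty("https.sessioncontext.shared", "false"));
        // Default taken from the text: reset on every iteration since JMeter 5.0
        boolean reset = Boolean.parseBoolean(
                props.getProperty("httpclient.reset_state_on_thread_group_iteration", "true"));

        if (shared) {
            return "SHARED";        // one SSL context for all threads, fewest handshakes
        }
        if (!reset) {
            return "PER_THREAD";    // each thread keeps its context, acting as the same user
        }
        return "PER_ITERATION";     // context reset every iteration, most handshakes
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(describe(props)); // prints PER_ITERATION
        props.setProperty("httpclient.reset_state_on_thread_group_iteration", "false");
        System.out.println(describe(props)); // prints PER_THREAD
    }
}
```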
Alibaba Cloud JMeter Stress Test
Alibaba Cloud's PTS stress testing tool [1] supports native JMeter scripts and sets httpclient.reset_state_on_thread_group_iteration to false by default for HTTPS stress tests, which greatly improves load generator performance and reduces the cost of testing HTTPS. If you need to simulate the most realistic user access pattern, you can set httpclient.reset_state_on_thread_group_iteration to true through the custom properties configuration [2] in the JMeter environment.
In addition, Alibaba Cloud JMeter stress testing has the following advantages:
- Zero operations and maintenance cost, with distributed stress testing available on demand
- Second-level monitoring during the test, to observe system performance levels in real time
- RPS mode, to measure system throughput directly
- Millions of concurrent requests from regions worldwide, to simulate real user distribution
- Alibaba Cloud VPC stress testing, with one-click access to the cloud intranet environment
- A JMeter client plug-in to start cloud stress tests quickly from a local machine
For further discussion, join the PTS user DingTalk group: 11774967
PTS also has a new pricing model: the price of the basic version drops by 50%, and one million concurrent transactions costs only 6200. There is also a 0.99 trial version for new users and a dedicated VPC stress test version.
Reference link:
[1] Alibaba Cloud PTS stress measurement tool
https://pts.console.aliyun.com/#/jmeter/create
[2] Custom properties configuration
https://common-buy.aliyun.com/?commodityCode=pts#/open