Author: Mikhail Vorontsov

Why reduce memory usage

This article will give you general advice on optimizing Java memory consumption.

Memory usage optimization is important in Java. System performance is limited mostly by memory access speed rather than by CPU frequency; otherwise, why would CPU manufacturers implement all those L1, L2, and L3 caches? This means that by reducing your application's memory footprint, you will most likely increase its data processing speed, because the CPU has to wait for a smaller amount of data. In short: saving memory improves performance!

Java memory layout

Let's start by reviewing the basics of Java object memory layout: any Java Object occupies at least 16 bytes, of which 12 bytes are taken by the object header. In addition, all Java objects are aligned on 8-byte boundaries. This means that an object with 2 fields, an int and a byte, will take 24 bytes (17 rounded up to the next multiple of 8), not 17 bytes (12 + 4 + 1).
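The alignment rule can be sketched as a small calculation. This is only an estimate under the figures quoted above (12-byte header, 8-byte alignment); the class and method names are hypothetical, and actual sizes depend on the JVM and its settings.

```java
final class ObjectSizeEstimate {
    static final int HEADER_BYTES = 12; // object header figure quoted in the article
    static final int ALIGNMENT = 8;     // objects are aligned on 8-byte boundaries

    // Rounds a size up to the next multiple of 8.
    static long alignTo8(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Estimated size of an object whose fields occupy fieldBytes in total.
    static long objectSize(long fieldBytes) {
        return alignTo8(HEADER_BYTES + fieldBytes);
    }

    public static void main(String[] args) {
        System.out.println(objectSize(0));     // object with no fields: 16
        System.out.println(objectSize(4 + 1)); // int + byte fields: 24
    }
}
```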

If the Java heap is under 32G and the -XX:+UseCompressedOops option is turned on (it is on by default since JDK 6u23), each Object reference occupies 4 bytes. Otherwise, an Object reference occupies 8 bytes.

All primitive data types occupy their exact size in bytes:

byte, boolean 1 byte
short, char 2 bytes
int, float 4 bytes
long, double 8 bytes

Essentially, this information is sufficient for Java memory optimization. But it is more convenient if you also know the memory consumption of arrays, String, and the numeric wrappers.

Memory consumption of the most common Java types

Arrays consume 12 bytes plus their length multiplied by their element size (rounded up, of course, to 8-byte alignment).

As of Java 7 build 06, String contains 3 fields: a char[] with the string data plus 2 int fields with hash codes computed by different algorithms. This means that a String itself takes 12 (header) + 4 (char[] reference) + 4 * 2 (int) = 24 bytes (as you can see, it fits exactly into 8-byte alignment). Besides that, the char[] with the String data takes 12 + length * 2 bytes (plus alignment). This means that a String takes 36 + length * 2 bytes, aligned to 8 bytes (by the way, this is 8 bytes less than a String consumed before Java 7 build 06).
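The array and String formulas above can be written down as a rough estimator. This is a sketch that simply encodes the article's figures (12-byte header, 4-byte references, the Java 7 build 06 String layout); the class and method names are hypothetical, and other JVM versions lay out String differently.

```java
final class ArrayAndStringSize {
    static final int HEADER_BYTES = 12; // header figure used in the article
    static final int REFERENCE_BYTES = 4; // compressed reference

    static long alignTo8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Array: header + length * element size, rounded up to 8 bytes.
    static long arraySize(long length, int elementBytes) {
        return alignTo8(HEADER_BYTES + length * elementBytes);
    }

    // String: header + char[] reference + two int fields,
    // plus the separately allocated char[] (2 bytes per char).
    static long stringSize(long length) {
        long stringObject = alignTo8(HEADER_BYTES + REFERENCE_BYTES + 2 * 4);
        return stringObject + arraySize(length, 2);
    }

    public static void main(String[] args) {
        System.out.println(arraySize(20, 1)); // byte[20]: 32
        System.out.println(stringSize(20));   // 20-character String: 80
    }
}
```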

Numeric wrappers take 12 bytes plus the size of the underlying type, rounded up to 8-byte alignment. Byte/Short/Character/Integer/Long values are cached by the JDK, so for values in the range -128 to 127 the actual memory consumption may be smaller. Regardless, these types can be a source of severe memory overhead in collection-based applications:

Byte, Boolean 16 bytes
Short, Character 16 bytes
Integer, Float 16 bytes
Long, Double 24 bytes
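The wrapper cache mentioned above can be observed directly. The sketch below compares references returned by Integer.valueOf, which is the method autoboxing uses; the class and method names are hypothetical. Only the -128 to 127 range is guaranteed to be cached, so the result for larger values depends on JVM settings and is not asserted here.

```java
final class WrapperCacheDemo {
    // True if boxing the same value twice yields the same Integer instance.
    static boolean sameInstance(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(127));  // inside the cached range: true
        System.out.println(sameInstance(1000)); // outside it: usually false, JVM-dependent
    }
}
```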

General Java Memory Optimization Tips

Armed with all this knowledge, it's not hard to give general Java memory optimization tips:

  • Prefer primitive types over their Object wrappers. The main reason to use wrapper types is JDK collections, so consider using one of the primitive collection frameworks such as Trove.
  • Minimize the number of Objects you have. For example, prefer array-based structures such as ArrayList/ArrayDeque over pointer-based structures such as LinkedList.

Java memory optimization example

Here is an example. Suppose you have to create a map from int keys to Strings that are 20 characters long. The map contains one million entries, and all mappings are static and predefined (e.g. saved in some dictionary).

The first approach is to use a Map<Integer, String> from the standard JDK. Let's roughly estimate the memory consumption of this structure. Each Integer occupies 16 bytes, plus 4 bytes for the reference to it from the map. Each 20-character String occupies 36 + 20 * 2 = 76 bytes (see the String description above), which aligns to 80 bytes, plus 4 bytes for the reference. The total memory consumption is approximately (16 + 4 + 80 + 4) * 1M = 104M.

A better approach is to replace each String with a byte[] holding its UTF-8 encoding (see the article "String packing part 1: converting characters to bytes"). Our map becomes a Map<Integer, byte[]>. Assume that all string characters belong to the ASCII set (0-127), which is the case in most English-speaking countries. A byte[20] occupies 12 (header) + 20 * 1 = 32 bytes, which conveniently fits into 8-byte alignment. The entire map will now occupy (16 + 4 + 32 + 4) * 1M = 56M, almost half of what the previous example required.
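A minimal sketch of this conversion, using only the standard String.getBytes and String constructor with StandardCharsets.UTF_8. The class name, the helper names, and the sample key and value are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class ByteArrayValues {
    // Encodes a String as UTF-8; ASCII text takes 1 byte per character.
    static byte[] pack(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes the bytes back into a String when the value is actually needed.
    static String unpack(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<Integer, byte[]> map = new HashMap<>();
        map.put(42, pack("twenty characters ok")); // hypothetical 20-character value
        System.out.println(map.get(42).length);    // 20
        System.out.println(unpack(map.get(42)));   // twenty characters ok
    }
}
```

The trade-off is that every read pays for decoding, so this fits data that is looked up far less often than it is stored.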

Now let's use Trove's TIntObjectMap<byte[]>. It stores its keys in a plain int[], in contrast to the wrapper types used by JDK collections. Each key will now occupy exactly 4 bytes. The total memory consumption will drop to (4 + 32 + 4) * 1M = 40M.
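The three estimates so far are plain per-entry arithmetic, which can be reproduced as follows. Like the figures in the text, this sketch counts only keys, values, and references, and leaves out whatever the map implementation itself allocates; the class, constant, and method names are hypothetical.

```java
final class MapEstimates {
    static final long ENTRIES = 1_000_000L;
    static final long REFERENCE = 4;  // compressed reference
    static final long INTEGER = 16;   // Integer wrapper
    static final long INT = 4;        // primitive int key
    static final long STRING_20 = 80; // 20-character String, aligned
    static final long BYTES_20 = 32;  // byte[20]

    // Map<Integer, String>
    static long jdkStringMap() {
        return (INTEGER + REFERENCE + STRING_20 + REFERENCE) * ENTRIES;
    }

    // Map<Integer, byte[]>
    static long jdkByteArrayMap() {
        return (INTEGER + REFERENCE + BYTES_20 + REFERENCE) * ENTRIES;
    }

    // Primitive int keys with byte[] values
    static long primitiveKeyMap() {
        return (INT + BYTES_20 + REFERENCE) * ENTRIES;
    }

    public static void main(String[] args) {
        System.out.println(jdkStringMap());    // 104000000
        System.out.println(jdkByteArrayMap()); // 56000000
        System.out.println(primitiveKeyMap()); // 40000000
    }
}
```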

The next structure is more complex. All String values are stored in a single byte[], one after the other (we are still assuming ASCII text), with a 0 byte as a separator between them. The overall byte[] will occupy (20 + 1) * 1M = 21M. Our map will store the offset of each string in that byte[] instead of the string itself. We will use Trove's TIntIntMap for this purpose. It will consume (4 + 4) * 1M = 8M. The total memory consumption in this example is 8M + 21M = 29M. By the way, this is the first example that relies on the immutability of the dataset.
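A rough sketch of this layout follows. To stay self-contained it uses a JDK HashMap<Integer, Integer> as a stand-in for Trove's TIntIntMap, so it shows the data layout only and not the memory savings of primitive keys; the class name and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class PackedValues {
    private final byte[] data; // all values back to back, each followed by a 0 byte
    // Stand-in for Trove's TIntIntMap: key -> offset of the value in data.
    private final Map<Integer, Integer> offsets = new HashMap<>();

    PackedValues(Map<Integer, String> source) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<Integer, String> e : source.entrySet()) {
            offsets.put(e.getKey(), out.size()); // value starts at the current end
            byte[] bytes = e.getValue().getBytes(StandardCharsets.US_ASCII);
            out.write(bytes, 0, bytes.length);
            out.write(0); // separator
        }
        data = out.toByteArray();
    }

    // Returns the value for key, or null if the key is absent.
    String get(int key) {
        Integer start = offsets.get(key);
        if (start == null) {
            return null;
        }
        int end = start;
        while (data[end] != 0) { // scan to the separator
            end++;
        }
        return new String(data, start, end - start, StandardCharsets.US_ASCII);
    }
}
```

Because the values are appended once and never moved, the structure only works for a dataset that does not change after construction.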

Can we achieve better results? Yes, we can, but at the cost of CPU consumption. The obvious "optimization" is to sort the whole dataset by key before storing the values into one large byte[]. Now we can store the keys in an int[] and use binary search to find a key. If the key is found, its index multiplied by 21 (remember, all strings have the same length) gives us the offset of the value in the byte[]. This structure takes "only" 21M + 4M (for the int[]) = 25M, at the cost of lookup complexity going from O(1) in the hash map case to O(log N).

Is this the best we can do? No! We forgot that all values are 20 characters long, so we don't actually need the separators in the byte[]. This means that if we agree to do lookups in O(log N), we can store our "map" in 24M of memory. There is no overhead at all compared to the theoretical data size, and it is more than 4 times less than what the original solution (Map<Integer, String>) required! Who told you that Java programs are memory-hungry?
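The final structure might look like the sketch below: a sorted int[] of keys, a byte[] of fixed-width values without separators, and Arrays.binarySearch for lookups. The class name and constructor signature are hypothetical, and the caller is assumed to pass keys already sorted in ascending order with values in the same order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class FixedWidthLookup {
    private final int[] keys;    // sorted ascending
    private final byte[] values; // keys.length * width bytes, no separators
    private final int width;     // every value has exactly this many ASCII characters

    FixedWidthLookup(int[] sortedKeys, String[] valuesInKeyOrder, int width) {
        if (sortedKeys.length != valuesInKeyOrder.length) {
            throw new IllegalArgumentException("keys and values differ in length");
        }
        this.keys = sortedKeys.clone();
        this.width = width;
        this.values = new byte[sortedKeys.length * width];
        for (int i = 0; i < valuesInKeyOrder.length; i++) {
            byte[] bytes = valuesInKeyOrder[i].getBytes(StandardCharsets.US_ASCII);
            if (bytes.length != width) {
                throw new IllegalArgumentException("value " + i + " is not " + width + " bytes");
            }
            System.arraycopy(bytes, 0, values, i * width, width);
        }
    }

    // O(log N) lookup: the key's index times the width is the value's offset.
    String get(int key) {
        int index = Arrays.binarySearch(keys, key);
        if (index < 0) {
            return null; // key not present
        }
        return new String(values, index * width, width, StandardCharsets.US_ASCII);
    }
}
```

For one million 20-character values this holds 4M of keys and 20M of values, matching the 24M estimate.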

Summary

Prefer primitive types over their Object wrappers. The main reason to use wrapper types is JDK collections, so consider using one of the primitive collection frameworks such as Trove.

Minimize the number of Objects you have. For example, favor array-based structures such as ArrayList/ArrayDeque over pointer-based structures such as LinkedList.

Recommended reading

If you want to learn more about clever data compression algorithms, it's worth reading "Programming Pearls" (Second Edition) by Jon Bentley. It is a wonderful collection of very unexpected algorithms. For example, in Section 13.8 the author describes how Doug McIlroy managed to fit a 75,000-word spell checker into 64 KB of RAM. That spell checker kept all the information it needed in that small amount of memory and did not use the disk! It may also be worth noting that Programming Pearls is one of the recommended preparation books for Google SRE interviews.

