This article collects notes I took while learning G1. Most of the content is taken from the reference articles listed at the end.

Background

GC evolution

Collectors have evolved as heap sizes have grown:

  • A few MB to tens of MB: Serial, single-threaded stop-the-world (STW) collection.
  • Hundreds of MB to 1 GB: Parallel, multi-threaded parallel collection.
  • Several GB: CMS, concurrent collection.
  • Tens of GB: G1.
  • Hundreds of GB to TB: ZGC.


problem

With the collectors that came before G1, STW pauses grow longer as the heap grows, and CMS can also cause long pauses when fragmentation forces it to compact the heap. What is needed is a collector with high throughput and short pauses, regardless of how large the heap is.

Introduction

The full name of G1 is Garbage First. It first shipped as an experimental feature in JDK 6u14 and became officially supported in JDK 7u4. It was designed to replace the CMS collector, and it has been the default garbage collector since JDK 9.
G1 is a response-time-oriented collector. Its defining feature is a configurable pause time: the user sets the expected pause time of a GC with -XX:MaxGCPauseMillis. The default is 200 ms, but it is a soft goal rather than a hard guarantee. How does G1 try to meet it? Through a pause prediction model: using statistics from previous collections, G1 predicts how many regions it can collect in this pause while staying as close as possible to the user's target.
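To make the idea concrete, here is a toy pause prediction model in Java. It is a sketch under simplifying assumptions, not HotSpot's implementation: the names PausePredictor, recordPause and regionsForTarget are hypothetical, and the model assumes a pause is a fixed cost plus a per-region cost, each tracked as an exponentially decaying average of past measurements.

```java
// Toy pause prediction model (hypothetical names, not HotSpot's code).
// Assumption: pause = fixed cost + (regions collected * cost per region).
final class PausePredictor {
    private final double alpha;        // weight of the newest sample in the decaying average
    private double avgMsPerRegion;     // predicted time to evacuate one region
    private double avgFixedMs;         // predicted fixed cost of a pause (roots, cleanup, ...)

    PausePredictor(double alpha, double initialMsPerRegion, double initialFixedMs) {
        this.alpha = alpha;
        this.avgMsPerRegion = initialMsPerRegion;
        this.avgFixedMs = initialFixedMs;
    }

    /** Feeds the measurements of a finished pause back into the model. */
    void recordPause(int regionsCollected, double fixedMs, double evacuationMs) {
        avgFixedMs = alpha * fixedMs + (1 - alpha) * avgFixedMs;
        if (regionsCollected > 0) {
            double perRegion = evacuationMs / regionsCollected;
            avgMsPerRegion = alpha * perRegion + (1 - alpha) * avgMsPerRegion;
        }
    }

    /** How many regions probably fit into the pause target (0 if even the fixed cost does not). */
    int regionsForTarget(double pauseTargetMs) {
        double budget = pauseTargetMs - avgFixedMs;
        if (budget <= 0) return 0;
        return (int) Math.floor(budget / avgMsPerRegion);
    }
}
```

For example, with a 10 ms fixed cost and 2 ms per region, a 200 ms target leaves room for 95 regions. If later pauses turn out slower, the averages rise and fewer regions are chosen next time. HotSpot's real model is more elaborate and tracks many separate cost components.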

Internal structure

The heap is divided into N equally sized regions (by default G1 aims for about 2048). Each region occupies a contiguous address range, and garbage is collected in units of regions. The region size is configurable; if it is not set, G1 chooses it automatically based on the heap size. During allocation, when the current region is full, G1 moves on to the next free region.

The region size can be set with -XX:G1HeapRegionSize. The value ranges from 1 MB to 32 MB and must be a power of 2. If it is not set, G1 derives it from the heap size (heap size / 2048).
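A minimal sketch of the default sizing rule, assuming (as HotSpot does) that heap size / 2048 is rounded down to a power of two and then clamped to the 1 MB to 32 MB range described above. The class name RegionSizing is hypothetical, and some newer JDKs allow a larger maximum region size.

```java
// Sketch of the default region sizing rule (hypothetical class name).
final class RegionSizing {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;
    static final long MAX_REGION = 32 * MB;
    static final int TARGET_REGIONS = 2048;

    static long regionSize(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        long pow2 = Long.highestOneBit(raw);                      // round down to a power of two
        return Math.min(Math.max(pow2, MIN_REGION), MAX_REGION); // clamp to [1 MB, 32 MB]
    }
}
```

So a 4 GB heap gets 2 MB regions, a 6 GB heap also gets 2 MB regions (3 MB rounds down), and a 128 GB heap is capped at 32 MB.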
Regions in G1 fall into two main types:

  • Young generation regions: the young generation size does not need to be set; G1 sizes it dynamically, by default between 5% and 60% of the heap.

    • Eden regions: newly allocated objects.
    • Survivor regions: objects that survived a GC but have not been promoted yet.
  • Old generation regions:

    • Objects promoted from the young generation.
    • Humongous objects, allocated directly in the old generation and possibly spanning several regions.

Humongous Region:
G1 has a dedicated region type for very large objects instead of placing them in ordinary old regions. An object whose size is at least half of the (configurable) region size is called a humongous object. Such an object is not allocated in a TLAB: moving it is very expensive, and it may not even fit in a single region. Instead it is allocated directly in the old generation, and the contiguous space it occupies is called a humongous region. As an internal optimization, once G1 finds that nothing references a humongous object, it can reclaim it directly during a young collection.
A humongous object occupies one or more contiguous regions exclusively. The first region is marked StartsHumongous and the following ones ContinuesHumongous. Because humongous allocation cannot use the TLAB fast path, and G1 may have to scan the heap to find a long enough run of contiguous free regions, allocating a humongous object is expensive.
Humongous objects are never moved: they are either reclaimed or stay where they are. Applications should avoid creating them whenever possible.
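The humongous test, and the number of regions a humongous object occupies, can be sketched as follows. This follows the article's definition (size at least half a region); the class and method names are hypothetical. The unused tail of the last region shows one more reason to avoid humongous objects.

```java
// Sketch of humongous-object arithmetic (hypothetical names).
final class Humongous {
    /** Per the definition above: an object of at least half a region is humongous. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    /** Contiguous regions (StartsHumongous + ContinuesHumongous) the object occupies. */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;   // ceiling division
    }

    /** Bytes left unused at the end of the last region in this model. */
    static long wastedBytes(long objectBytes, long regionBytes) {
        return regionsNeeded(objectBytes, regionBytes) * regionBytes - objectBytes;
    }
}
```

With 2 MB regions, a 5 MB object needs 3 contiguous regions and leaves 1 MB of the last one unused.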

GC basics

Reachability analysis

How does the JVM decide whether an object is garbage?
The JVM uses reachability analysis: starting from the GC Roots, it follows references outward, and any object that cannot be reached is garbage.
In the following figure, objects a and b are unreachable and will be marked as garbage.
[Figure: objects a and b are not reachable from any GC Root]

GC Roots include:

  • Objects referenced from the virtual machine stack (the local variable table of each stack frame), such as parameters, local variables, and temporaries of the methods each thread is currently executing.
  • Objects referenced by JNI (that is, native methods) from the native method stack.
  • Objects referenced by static fields of classes in the method area, such as reference-type static variables of a Java class.
  • Objects referenced by constants in the method area, such as references in the string constant pool.
  • References held inside the JVM, such as the Class objects for primitive types and the system class loader.
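Reachability analysis is essentially a graph traversal starting from the roots. In this minimal Java sketch, the heap is modeled as a hypothetical map from an object name to the names of the objects it references; everything not returned is garbage. A cycle such as a referencing b and b referencing a is still garbage when no root reaches it.

```java
import java.util.*;

// Reachability analysis over a toy object graph (hypothetical heap model).
final class Reachability {
    /** Returns every object reachable from the roots; all other objects are garbage. */
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> visited = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (!visited.add(obj)) continue;               // already visited
            for (String child : refs.getOrDefault(obj, List.of())) {
                if (!visited.contains(child)) work.push(child);
            }
        }
        return visited;
    }
}
```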

Three-color marking algorithm

During traversal, each object is marked with one of three colors according to whether it has been visited:

  • White: not visited yet.
  • Black: the object has been visited, and so have all the objects it references.
  • Gray: the object has been visited, but at least one of the objects it references has not been visited yet.
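A minimal sketch of tri-color marking in Java, using a stack of gray objects as the work list. The object graph model (a map from object name to referenced names) is an assumption for illustration, and the class name is hypothetical. Objects that are still white at the end are garbage.

```java
import java.util.*;

// Tri-color marking over a toy object graph (hypothetical model).
final class TriColorMarking {
    enum Color { WHITE, GRAY, BLACK }

    static Map<String, Color> mark(Map<String, List<String>> refs, List<String> roots) {
        Map<String, Color> color = new HashMap<>();
        for (String obj : refs.keySet()) color.put(obj, Color.WHITE);   // everything starts white
        Deque<String> grayStack = new ArrayDeque<>();
        for (String root : roots) shade(root, color, grayStack);
        while (!grayStack.isEmpty()) {
            String obj = grayStack.pop();
            for (String child : refs.getOrDefault(obj, List.of())) shade(child, color, grayStack);
            color.put(obj, Color.BLACK);   // obj and all of its direct references are now visited
        }
        return color;                      // objects still WHITE are garbage
    }

    /** White -> gray: the object is found but its fields are not scanned yet. */
    private static void shade(String obj, Map<String, Color> color, Deque<String> grayStack) {
        if (color.getOrDefault(obj, Color.WHITE) == Color.WHITE) {
            color.put(obj, Color.GRAY);
            grayStack.push(obj);
        }
    }
}
```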

Floating garbage
Suppose E has already been reached (it is gray) when the application executes objD.fieldE = null, breaking the reference D -> E.
From this moment on, objects E, F, and G should be reclaimable. However, because E is already gray, it is still treated as live and traversal continues from it. As a result, these objects are still marked live, and this GC cycle does not reclaim their memory.
Memory that should have been reclaimed but is not is called "floating garbage". It is only cleaned up by the next GC cycle.

Missed marking
Suppose the GC thread has reached E (E is gray) when the application thread executes:

var G = objE.fieldG;
objE.fieldG = null; // gray E drops its reference to white G
objD.fieldG = G;    // black D now references white G

Because D is already black, the marker will not scan it again, and because E no longer references G, the marker never reaches G through E either. G therefore stays white forever and is eventually freed as garbage even though D still references it. This breaks the application's correctness and is unacceptable.

The solution for G1 is write barrier + SATB.

Write barrier

Write barrier: extra processing inserted before and after a store to an object's reference field (similar in spirit to AOP).

void oop_field_store(oop* field, oop new_value) {
    pre_write_barrier(field);             // write barrier: runs before the store
    *field = new_value;
    post_write_barrier(field, new_value); // write barrier: runs after the store
}

SATB

SATB (Snapshot At The Beginning) keeps a logical snapshot of the reference relationships between objects as they were at the start of concurrent marking. Put simply, marking treats the object graph at that moment as the baseline: no matter how the mutator changes references while marking runs (hence the name "snapshot"), every object that was alive in the snapshot is treated as alive in this cycle. To maintain the snapshot, the pre-write barrier records the old value of each overwritten reference field into SATB buffers, and these buffers are scanned during GC marking.
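To see why SATB prevents missed marking, here is a small single-threaded simulation in Java. It is a hypothetical model, not G1's code: SatbDemo, store, scanOne, finish and gSurvives are made-up names, and marking and mutation are interleaved by hand to reproduce the D/E/G scenario described above. The pre-write barrier in store logs the overwritten value, and draining that log (as the remark phase does) rescues G. The same mechanism is why SATB produces floating garbage: anything logged stays alive for this cycle even if it has actually died.

```java
import java.util.*;

// Single-threaded simulation of SATB marking (hypothetical model, not G1's code).
final class SatbDemo {
    final Map<String, Map<String, String>> heap = new HashMap<>(); // object -> (field -> target)
    final Set<String> marked = new HashSet<>();                     // gray or black objects
    final Deque<String> grayStack = new ArrayDeque<>();
    final Deque<String> satbQueue = new ArrayDeque<>();             // values logged by the pre-barrier
    final boolean satbEnabled;

    SatbDemo(boolean satbEnabled) { this.satbEnabled = satbEnabled; }

    void newObject(String name) { heap.put(name, new HashMap<>()); }

    String load(String obj, String field) { return heap.get(obj).get(field); }

    /** Field store guarded by an SATB pre-write barrier. */
    void store(String obj, String field, String newValue) {
        String old = heap.get(obj).get(field);
        if (satbEnabled && old != null) satbQueue.push(old);      // pre-write barrier: log old value
        if (newValue == null) heap.get(obj).remove(field); else heap.get(obj).put(field, newValue);
    }

    void shade(String obj) { if (marked.add(obj)) grayStack.push(obj); }

    /** Scans one gray object; afterwards it counts as black. */
    void scanOne() {
        String obj = grayStack.pop();
        for (String child : heap.get(obj).values()) shade(child);
    }

    /** Drains gray objects and the SATB log (the remark phase does the latter). */
    void finish() {
        while (!grayStack.isEmpty() || !satbQueue.isEmpty()) {
            if (!grayStack.isEmpty()) scanOne(); else shade(satbQueue.pop());
        }
    }

    /** Replays the missed-marking scenario; returns whether G ends up marked. */
    static boolean gSurvives(boolean satbEnabled) {
        SatbDemo h = new SatbDemo(satbEnabled);
        for (String n : List.of("D", "E", "G")) h.newObject(n);
        h.store("E", "fieldG", "G");       // before marking: E -> G (overwrites nothing)
        h.shade("E");                      // D and E are treated as roots
        h.shade("D");
        h.scanOne();                       // scans D: D is black, E is still gray
        String g = h.load("E", "fieldG");  // mutator: var G = objE.fieldG;
        h.store("E", "fieldG", null);      //          objE.fieldG = null;
        h.store("D", "fieldG", g);         //          objD.fieldG = G;
        h.finish();
        return h.marked.contains("G");
    }
}
```

Without the barrier, gSurvives(false) leaves G unmarked even though D references it; with the barrier, gSurvives(true) marks G.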

RSet

[Figure: young Region A and old Regions B and C]
In the figure above, Regions B and C are in the old generation and Region A is in the young generation. Region A is not directly reachable from the GC Roots. This raises two questions:

  • During a young GC, does G1 have to scan all old objects?
  • During a mixed GC (which collects only some old regions), does it have to scan the whole old generation?

The Remembered Set (RS or RSet for short) solves this problem. An RSet records cross-region references: old-to-young and old-to-old references are recorded, and other references are not.
Each region has its own RSet, implemented as a hash table. The key is the start address of another region that references this region (only old-to-young and old-to-old references are recorded; young-to-young and young-to-old references are not). The value is a set of card table indices, identifying the cards in that other region that contain the referencing fields. The RSet is reset during mixed GC.
During a young GC, G1 only needs to use the RSets of the young regions as part of the root set (that is, RSet entries are traversed as roots during marking). These RSets record the old-to-young cross-generation references, so the entire old generation does not have to be scanned. Mixed GC works the same way. The RSet therefore greatly reduces the work of a GC.
Here is RednaxelaFX's (R大) explanation: G1 GC adds a layer on top of the points-out card table to form a points-into RSet. Each region records which other regions have pointers into it, and in which cards those pointers live. The RSet is a hash table whose key is the start address of another region and whose value is a set of card table indices. For example, if the RSet of region A has an entry whose key is region B and whose value contains card index 1234, then card 1234 in region B contains a reference into region A. So for region A, the RSet records the points-into relationship, while the card table still records the points-out relationship.
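A simplified sketch of how a post-write barrier could maintain RSets, applying the filtering rules above (intra-region references and references from young regions are not recorded). Assumptions: addresses are plain long values, regions are identified by index rather than start address, cards are 512 bytes (HotSpot's default card size), and the RSet is updated directly in the barrier. In real G1 the barrier only dirties and enqueues cards, and concurrent refinement threads update the RSets later. All names are hypothetical.

```java
import java.util.*;

// Toy RSet maintenance via a post-write barrier (hypothetical names and simplifications).
final class RememberedSets {
    static final int CARD_SHIFT = 9;                 // 512-byte cards
    final long regionBytes;
    final Set<Integer> youngRegions;
    // target region -> (source region -> card indices that hold pointers into the target)
    final Map<Integer, Map<Integer, Set<Long>>> rsets = new HashMap<>();

    RememberedSets(long regionBytes, Set<Integer> youngRegions) {
        this.regionBytes = regionBytes;
        this.youngRegions = youngRegions;
    }

    int regionOf(long addr) { return (int) (addr / regionBytes); }

    /** Called after "*fieldAddr = newValueAddr". */
    void postWriteBarrier(long fieldAddr, long newValueAddr) {
        int from = regionOf(fieldAddr);
        int to = regionOf(newValueAddr);
        if (from == to) return;                      // intra-region pointers are not recorded
        if (youngRegions.contains(from)) return;     // young sources are not recorded
        rsets.computeIfAbsent(to, k -> new HashMap<>())
             .computeIfAbsent(from, k -> new HashSet<>())
             .add(fieldAddr >>> CARD_SHIFT);         // remember the card holding the field
    }

    /** Cards to scan as extra roots when collecting the target region. */
    Set<Long> cardsToScan(int targetRegion) {
        Set<Long> cards = new TreeSet<>();
        for (Set<Long> s : rsets.getOrDefault(targetRegion, Map.of()).values()) cards.addAll(s);
        return cards;
    }
}
```

When young region 0 is collected, only the recorded cards are scanned, instead of every old region.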

TLAB

TLAB (Thread Local Allocation Buffer): a thread-local allocation buffer.
Because the heap is shared by all application threads, allocating memory would normally require locking to synchronize them. To avoid locking, G1 enables TLABs by default. Each application thread gets its own TLAB, used exclusively by that thread. When an object is not humongous and fits in the TLAB, it is allocated in the TLAB of the thread that creates it. This allocation is very fast: since the TLAB belongs to a single thread, no lock is needed.
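Allocation inside a TLAB is just pointer bumping. A minimal sketch with hypothetical names; refilling a full TLAB (the slow path, which does need synchronization on the shared heap) is left out.

```java
// Bump-pointer allocation inside one thread's TLAB (hypothetical names).
final class Tlab {
    private long top;        // next free address in this thread's buffer
    private final long end;  // first address past the buffer

    Tlab(long start, long sizeBytes) {
        this.top = start;
        this.end = start + sizeBytes;
    }

    /** Returns the new object's address, or -1 if the TLAB is full and must be refilled. */
    long allocate(long bytes) {
        if (bytes > end - top) return -1;  // slow path: retire this TLAB and get a new one
        long obj = top;
        top += bytes;                      // no lock: only the owning thread touches top
        return obj;
    }
}
```

A 64-byte TLAB starting at address 1000 serves a 16-byte object at 1000 and a 32-byte object at 1016, then rejects another 32-byte request because only 16 bytes remain.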

PLAB

PLAB (Promotion Thread Local Allocation Buffer): "Promotion" thread local allocation buffer
The idea is the same as for TLAB. G1's collection work is done by multiple GC threads, and if they all copied objects into the same memory area, the copying would have to be locked. To avoid locking, each G1 GC thread is associated with its own PLAB.

GC

G1 has three kinds of GC:

  • Young GC, using a mark-copy algorithm.
  • Mixed GC, using a mark-copy algorithm.
  • Full GC, using a mark-compact algorithm.

young gc


  • When the JVM cannot allocate a new object in Eden (the young generation has reached its size limit), a young GC is performed.
  • A young GC puts only young regions (Eden/Survivor) into the collection set (CSet) to be collected.
  • To meet the user's pause time setting, G1 adjusts the size of the young generation after each GC, based on the user's pause time limit.
  • Young GC is STW.

gc steps

  1. Choose the CSet: based on the user's pause time limit, G1 determines how many young regions go into the collection set.
  2. Root scanning: traverse the GC Roots to find objects in the collection set referenced from the roots, copy them to a Survivor region, and push the objects they reference onto the mark stack.
  3. RSet scanning (Scan RS): traverse the RSets as roots to find objects in the collection set referenced from other regions, copy them to a Survivor region, and push the objects they reference onto the mark stack.
  4. Evacuation (Object Copy): process the mark stack and copy every object it reaches into Survivor regions.
  5. The rest is housekeeping: Redirty (related to the concurrent marking described below), Clear CT (clean the card table), Free CSet (release the collection set), returning the evacuated regions to the free list, and so on. These steps usually take little time.
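The root scanning, RSet scanning and evacuation steps above can be sketched as a single-threaded copying loop with forwarding pointers. This is a simplified model, not G1's implementation: Obj, evacuateOne and collect are hypothetical names, the roots list stands in for root slots, and rsetHolders stands in for the old objects found through RSet cards. Real G1 runs this with multiple GC threads, each copying into its own PLAB.

```java
import java.util.*;

// Toy young-GC evacuation with forwarding pointers (hypothetical model).
final class YoungEvacuation {
    static final class Obj {
        final String name;
        final int region;
        final List<Obj> fields = new ArrayList<>();
        Obj forwardee;                                   // set once the object has been copied
        Obj(String name, int region) { this.name = name; this.region = region; }
    }

    /** Copies o to the survivor region if it is in the CSet; returns where references should point. */
    private static Obj evacuateOne(Obj o, Set<Integer> cset, int survivor,
                                   Deque<Obj> markStack, List<Obj> copies) {
        if (o == null || !cset.contains(o.region)) return o; // outside the CSet: stays in place
        if (o.forwardee != null) return o.forwardee;          // already copied: follow forwarding
        Obj copy = new Obj(o.name, survivor);
        copy.fields.addAll(o.fields);                         // still point at old locations
        o.forwardee = copy;
        copies.add(copy);
        markStack.push(copy);                                 // fields are fixed up later
        return copy;
    }

    private static void scanFields(Obj holder, Set<Integer> cset, int survivor,
                                   Deque<Obj> markStack, List<Obj> copies) {
        for (int i = 0; i < holder.fields.size(); i++)
            holder.fields.set(i, evacuateOne(holder.fields.get(i), cset, survivor, markStack, copies));
    }

    static List<Obj> collect(List<Obj> roots, List<Obj> rsetHolders, Set<Integer> cset, int survivor) {
        Deque<Obj> markStack = new ArrayDeque<>();
        List<Obj> copies = new ArrayList<>();
        for (int i = 0; i < roots.size(); i++)                // step 2: root scanning
            roots.set(i, evacuateOne(roots.get(i), cset, survivor, markStack, copies));
        for (Obj old : rsetHolders)                           // step 3: RSet scanning
            scanFields(old, cset, survivor, markStack, copies);
        while (!markStack.isEmpty())                          // step 4: evacuate the rest
            scanFields(markStack.pop(), cset, survivor, markStack, copies);
        return copies;                                        // anything not copied is garbage
    }
}
```

An object in the CSet that nothing reaches is never copied; its region is simply freed afterwards.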

mixed gc

  • Mixed collection: young + old regions.
  • All young regions (Eden/Survivor) plus some old regions are placed in the collection set.
  • When the memory used by the old generation plus the memory requested by the current allocation exceeds the IHOP threshold (InitiatingHeapOccupancyPercent, default 45% of the heap), G1 starts a concurrent marking cycle; the mixed GCs run after that cycle completes.
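The trigger condition amounts to a simple threshold check. This sketch uses the static percentage described above, and the class name is hypothetical. Since JDK 9, G1 by default adapts the threshold at run time (-XX:+G1UseAdaptiveIHOP) and treats InitiatingHeapOccupancyPercent only as the starting value, so this is a simplification.

```java
// Static IHOP check (hypothetical class name; ignores adaptive IHOP).
final class Ihop {
    static boolean shouldStartConcurrentCycle(long oldUsedBytes, long allocBytes,
                                              long heapCapacityBytes, int ihopPercent) {
        long threshold = heapCapacityBytes * ihopPercent / 100;
        return oldUsedBytes + allocBytes > threshold;   // must exceed, not just reach, the threshold
    }
}
```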


The cycle begins with a young collection, which is STW.
initial mark
Initial Mark: mark all objects directly reachable from the GC Roots. The objects that survive the young GC are also treated as GC Roots. This phase is STW and reuses the pause of the young GC (it runs together with it).
Initial marking marks all directly reachable root objects (native stack objects, global objects, JNI objects). The roots are the starting points of the object graph, so initial marking must suspend the mutator threads (Java application threads), which requires an STW pause. In fact, when the IHOP threshold is reached, G1 does not start a concurrent marking cycle immediately. It waits for the next young collection and uses that collection's STW pause to complete the initial mark; this is called piggybacking. During the initial mark pause, each region's NTAMS (next top at mark start) is set to the region's current top, and initial marking runs in parallel until all regions have been processed.
Root Region Scanning
After the initial mark pause, the young collection has copied live objects into Survivor regions, and the application threads resume. To keep the marking algorithm correct, all objects newly copied into Survivor regions must now be scanned and marked as roots. This process is called root region scanning, and the scanned Survivor regions are called root regions. Root region scanning must finish before the next young GC starts (concurrent marking may be interrupted by several young GCs), because each GC produces a new set of live objects.
Concurrent mark
Concurrent Marking: a concurrent phase. Starting from the objects found in the previous phases, the marker traverses the object graph, marks every object it finds as live, and processes the SATB buffers.
It runs concurrently with the application threads. The number of concurrent marking threads is controlled by -XX:ConcGCThreads (by default about 1/4 of the parallel GC threads, i.e. -XX:ParallelGCThreads/4). Each thread scans one region at a time, building up the graph of live objects. In this phase G1 works with the Previous/Next mark bitmaps and scans the reference fields of marked objects. The concurrent marking threads also periodically process the global list of SATB buffers and update the object reference information. The -XX:+ClassUnloadingWithConcurrentMark option enables an optimization: if a class is unreachable (the class itself, not just its objects), it is unloaded during the remark phase. Marking must finish before the heap fills up. If concurrent marking takes a long time, several young GCs may happen while it runs. If marking has not finished when the heap is full, G1 falls back to a Full GC, which is a long pause (single-threaded up to JDK 9).
Live Data Accounting
Live data accounting is a by-product of marking: whenever an object is marked, its size in bytes is added to its region's live byte count. Only objects below NTAMS are marked and counted. At the end of the marking cycle, the Next bitmap is cleared in preparation for the next cycle.
Remark (final marking)
Remark: an STW phase. Although SATB buffers were processed during concurrent marking, that phase ran concurrently with the application, so once it finishes, all application threads must be suspended again to process the remaining SATB entries.
Remark is the final marking phase. G1 pauses to process the remaining SATB log buffers and all pending updates, find any live objects that have not been visited yet, and safely finish live data accounting. This phase also runs in parallel; the number of GC threads used during pauses is set with -XX:ParallelGCThreads. Reference processing is also part of remark, so applications that make heavy use of reference objects (weak, soft, phantom, and final references) pay extra cost here.
Cleanup
Cleanup: identify the old regions that are worth collecting, and clean up and reset marking state. STW.
The cleanup phase follows remark and is also STW. The Previous/Next mark bitmaps and PTAMS/NTAMS swap roles during cleanup. Cleanup mainly does the following:

  • RSet scrubbing: a heuristic classifies regions by liveness and RSet size, and scrubbing also helps find stale references. -XX:+PrintAdaptiveSizePolicy prints the details of the heuristic's decisions.
  • Sort the heap regions and identify the set of old regions with the highest reclaim value (based on free space and the pause time target) for the mixed collections that follow.
  • Identify all empty regions, that is, regions with no live objects. They can be reclaimed immediately during cleanup, without waiting for the next collection cycle.
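The cleanup-phase selection can be sketched from the per-region live byte counts produced by live data accounting: regions with no live data are freed right away, and the rest are ranked so that regions with the least live data (the most garbage, the cheapest to copy) come first. The names are hypothetical and the liveness cutoff is an assumed parameter. HotSpot has a similar knob, -XX:G1MixedGCLiveThresholdPercent, but its real heuristic also weighs the pause time target.

```java
import java.util.*;

// Toy cleanup-phase region selection (hypothetical names and cutoff).
final class CleanupSketch {
    static final class Region {
        final int index;
        final long liveBytes;   // result of live data accounting
        Region(int index, long liveBytes) { this.index = index; this.liveBytes = liveBytes; }
    }

    /** Old regions without live data: freed immediately during cleanup. */
    static List<Integer> freeImmediately(List<Region> oldRegions) {
        List<Integer> free = new ArrayList<>();
        for (Region r : oldRegions) if (r.liveBytes == 0) free.add(r.index);
        return free;
    }

    /** Mixed-GC candidates: most garbage first, skipping regions that are mostly live. */
    static List<Integer> mixedCandidates(List<Region> oldRegions, long regionBytes, int liveThresholdPercent) {
        List<Region> candidates = new ArrayList<>();
        for (Region r : oldRegions) {
            if (r.liveBytes == 0) continue;                                       // already freed
            if (r.liveBytes * 100 > regionBytes * liveThresholdPercent) continue; // too costly to copy
            candidates.add(r);
        }
        candidates.sort(Comparator.comparingLong((Region r) -> r.liveBytes));
        List<Integer> order = new ArrayList<>();
        for (Region r : candidates) order.add(r.index);
        return order;
    }
}
```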

full gc

When mixed GCs cannot keep up with the allocation rate and the old generation fills up, G1 performs a Full GC that collects the entire heap. A Full GC stops all application threads and is very expensive. Up to JDK 9, G1's Full GC was single-threaded and serial; JDK 10 (JEP 307) made it parallel.
A Full GC is triggered in the following situations:

  • No free region is available when copying surviving objects out of young regions.
  • No free region is available when evacuating surviving objects out of old regions.
  • Not enough contiguous free regions can be found in the old generation to allocate a humongous object.

Summary

G1 is not the most efficient collector in raw terms. It uses a copying algorithm for the old generation, which avoids fragmentation but is relatively inefficient: most objects in the old generation are alive, so many objects have to be moved in every collection. A mark-sweep algorithm only needs to deal with dead objects, so for the old generation it is more efficient in that respect.
However, because G1 collects incrementally and can control its pauses, it can keep each pause within the allowed range. For most applications, pause time matters more than throughput, and with its many detailed optimizations G1 is already very efficient.

References:

  • This is probably the clearest and most understandable G1 GC information
  • JVM series sixteen (three-color notation and read-write barrier)

noname