
Preface

Hi everyone, this is Lin Sanxin. Two days ago I came across a video on Bilibili about V8's garbage collection mechanism. I was curious and watched it, but found it a bit hard to follow. That made me wonder whether many of you are, like me, rather hazy about V8's garbage collection, or have read about it without really understanding it. So I spent three days thinking about how to explain the hardest points in the plainest possible words.


Common understanding

Most of you have probably been asked about V8's garbage collection mechanism in an interview.

Most people answer: "There are two ways to collect garbage: reference counting and mark-and-sweep."

Reference counting

Reference counting tracks how many references point to each object. When the count drops to 0, the object is reclaimed; while it is greater than 0, the object is kept. Look at the following code:

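let obj1 = { name: '林三心', age: 22 } // the object's reference count is 1
let obj2 = obj1 // count: 2
let obj3 = obj1 // count: 3

obj1 = null // count: 2
obj2 = null // count: 1
obj3 = null // count: 0, so the object can be reclaimed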
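The counting above can be simulated by hand. JavaScript does not expose reference counts, so RefCounter below is a hypothetical helper that keeps them in a Map; it is a sketch of the idea, not how any engine is implemented.

```javascript
// Hypothetical helper: JS engines do not expose reference counts,
// so this class keeps them by hand to illustrate the idea.
class RefCounter {
  constructor() {
    this.counts = new Map(); // object -> number of references to it
    this.freed = [];         // objects "reclaimed" when their count hit 0
  }
  addRef(obj) {
    this.counts.set(obj, (this.counts.get(obj) || 0) + 1);
  }
  release(obj) {
    const n = this.counts.get(obj) - 1;
    this.counts.set(obj, n);
    if (n === 0) {
      this.counts.delete(obj);
      this.freed.push(obj); // count reached 0: reclaim immediately
    }
  }
}

// Mirrors the example above: three variables point at one object.
const rc = new RefCounter();
const target = { name: '林三心', age: 22 };
rc.addRef(target);  // obj1 = {...}   -> 1
rc.addRef(target);  // obj2 = obj1    -> 2
rc.addRef(target);  // obj3 = obj1    -> 3
rc.release(target); // obj1 = null    -> 2
rc.release(target); // obj2 = null    -> 1
rc.release(target); // obj3 = null    -> 0, reclaimed
console.log(rc.freed.length); // 1
```

The appeal of reference counting is visible here: the object is freed the moment its last reference goes away, with no separate collection pass.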


Reference counting has a drawback. After the following code runs, obj1 and obj2 should in principle be reclaimed, but because they reference each other, each still has a reference count of 1, so neither is reclaimed. The result is a memory leak:

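function fn () {
  const obj1 = {}
  const obj2 = {}
  obj1.a = obj2 // obj2 is referenced by a local variable and by obj1.a: count 2
  obj2.a = obj1 // obj1: count 2
}
fn()
// After fn returns, the local variables are gone, but each object is still
// referenced by the other, so both counts stay at 1 and neither is freed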
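Here is the same cycle simulated with hand-maintained counts. The refCount field and the helpers makeObj and setRef are hypothetical, because a real engine would update counts on every assignment behind the scenes.

```javascript
// Hypothetical simulation: each object records how many references point
// at it; a reference-counting engine would update this on every assignment.
function makeObj(name) {
  return { name, refCount: 0, a: null };
}
function setRef(from, to) {
  from.a = to;
  to.refCount += 1;
}

// Inside fn(): each local variable holds one object (count 1 each)...
const obj1 = makeObj('obj1');
const obj2 = makeObj('obj2');
obj1.refCount += 1; // local variable obj1
obj2.refCount += 1; // local variable obj2
// ...and the objects point at each other (count 2 each).
setRef(obj1, obj2);
setRef(obj2, obj1);

// fn() returns: the two local variables disappear.
obj1.refCount -= 1;
obj2.refCount -= 1;

// Each object is still referenced by the other, so neither count reaches 0
// and a pure reference counter never frees them.
console.log(obj1.refCount, obj2.refCount); // 1 1
```

Nothing outside the pair can reach either object any more, yet the counts never hit 0. This is exactly the case reachability-based marking handles.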


Mark-and-sweep

Mark-and-sweep marks every reachable object; whatever is left unmarked is treated as garbage and collected.

So the question is: how do we decide whether an object is reachable? (And no, "reachable" here has nothing to do with Psyduck, whose Chinese name literally means "reachable duck".)


Back to the point: deciding whether something is reachable relies on reachability analysis. Starting from the root object (window in the browser, global in Node.js), the collector searches down through child nodes. Every object it finds this way is reachable and gets marked, and the search continues recursively until every path has been followed to the end. Any object that was never visited stays unmarked, which means nothing reachable refers to it, so it can safely be freed by the garbage collector.

// Reachable: in a browser, top-level var declarations become properties of window
var name = '林三心'
var obj = {
  arr: [1, 2, 3]
}
console.log(window.name) // 林三心
console.log(window.obj) // { arr: [1, 2, 3] }
console.log(window.obj.arr) // [1, 2, 3]
console.log(window.obj.arr[1]) // 2

function fn () {
  var age = 22 // local to fn, not a property of window
}
// Unreachable from window
console.log(window.age) // undefined
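The marking walk described above can be sketched over ordinary JavaScript objects. This is an illustration under simplifying assumptions: markReachable is a hypothetical helper, the root object stands in for window/global, and only object properties and array elements are treated as references.

```javascript
// Sketch of the marking phase: start at a root object, follow every
// reference (object properties and array elements), and record each object
// reached. Primitives are plain values and are not tracked.
function markReachable(root) {
  const marked = new Set();
  const stack = [root]; // explicit stack instead of recursion
  while (stack.length > 0) {
    const current = stack.pop();
    if (current === null || typeof current !== 'object' || marked.has(current)) {
      continue; // skip primitives and objects that are already marked
    }
    marked.add(current);
    for (const key of Object.keys(current)) {
      stack.push(current[key]);
    }
  }
  return marked;
}

// A stand-in for the global object, holding a small object graph.
const arr = [1, 2, 3];
const root = { name: '林三心', obj: { arr } };
const orphan = { age: 22 }; // nothing reachable from root points here

const marked = markReachable(root);
console.log(marked.has(arr));    // true: reachable, kept
console.log(marked.has(orphan)); // false: unreachable, collectable
```

Because the walk skips objects already in the marked set, a cycle such as obj1.a = obj2 and obj2.a = obj1 cannot make it loop forever, and if no path from the root leads to the pair, neither object is marked. That is why marking handles the case reference counting leaks.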


This common understanding is not really enough, because the garbage collector (GC) uses more than these two algorithms. If you want to understand V8's garbage collection in more depth, read on!

JavaScript memory management

JavaScript's memory lifecycle is actually simple and consists of three steps:

  • 1. Allocate the memory the user needs
  • 2. The user receives the memory and uses it
  • 3. When the user no longer needs the memory, release it and return it to the system

So who are the "users"? For example:

var num = ''
var str = '林三心'

var obj = { name: '林三心' }
obj = { name: '林胖子' }

The num, str, and obj above are all "users". As we all know, JavaScript data types fall into primitive types and reference types:

  • Primitive types: fixed size; the value is stored in stack memory and accessed directly by value
  • Reference types: size is not fixed (properties can be added); the stack holds a pointer into heap memory, and the value is accessed by reference
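The difference between the two categories is easiest to see in how assignment copies them. This snippet only shows the observable by-value versus by-reference semantics; it says nothing about where V8 physically stores each value.

```javascript
// Primitives are copied by value: b gets its own copy of the number.
let a = 1;
let b = a;
b = 2;
console.log(a, b); // 1 2

// Objects are accessed by reference: both variables hold a pointer to the
// same heap object, so a change made through one is visible through the other.
const p = { name: '林三心' };
const q = p;
q.name = '林胖子';
console.log(p.name);  // 林胖子
console.log(p === q); // true
```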

image.png

  • Because primitive values on the stack have a fixed size, stack memory is allocated and released automatically by the system
  • Because heap objects vary in size, the system cannot release them automatically, so the JS engine has to reclaim this memory itself

Why garbage collection

In Chrome, V8 limits how much memory it may use (classically about 1.4 GB / 1464 MB on 64-bit systems and about 0.7 GB / 732 MB on 32-bit systems; the exact default depends on the V8 version and platform). Why impose a limit?

  • Surface reason: V8 was originally designed for browsers, where scenarios that need a lot of memory were unlikely
  • Deeper reason: the limits of V8's garbage collection mechanism. Cleaning up a large amount of garbage is very time-consuming and pauses the JavaScript thread for that whole time, so performance and application responsiveness drop sharply

To recap: stack memory is allocated and released automatically by the operating system, while heap memory is released by the JS engine (such as Chrome's V8). If our code is written carelessly, the engine's garbage collector cannot free memory correctly (a memory leak). The browser's memory usage then keeps growing, which in turn degrades the performance of JavaScript, the application, and the operating system.

V8's garbage collection algorithm

1. Generational collection

In JavaScript, object lifetimes fall into two cases:

  • Very short lifetime: released and reclaimed after a single garbage collection
  • Very long lifetime: still alive after many garbage collections, stubbornly refusing to leave

Here is the problem: short-lived objects can simply be reclaimed, but long-lived ones survive collection after collection. If we already know they will not be reclaimed, isn't repeatedly trying to collect them wasted effort and a big performance cost?

To address this, V8 collects garbage by generations. In plain terms, V8 divides the heap into two spaces: the new generation (also called the young generation), which stores short-lived objects, and the old generation, which stores long-lived objects.

image.png

The new generation usually has a capacity of only 1 to 8 MB, while the old generation is much larger. For these two areas, V8 uses different garbage collectors and different algorithms so that collection is more efficient:

  • Secondary (minor) garbage collector + Scavenge algorithm: mainly responsible for garbage collection in the new generation
  • Main (major) garbage collector + Mark-Sweep and Mark-Compact algorithms: mainly responsible for garbage collection in the old generation

1.1 The new generation

In JavaScript, newly created objects are normally allocated in the new generation first. Because most objects live only briefly, this space needs a very efficient algorithm. The new generation mainly uses the Scavenge algorithm, a typical copying algorithm that trades space for time, which is ideal when the memory footprint is small.

Scavenge splits the new-generation heap into two halves, called from-space and to-space. The idea is simple: copy the live (active) objects from from-space into to-space, laying them out contiguously; then release the memory taken by the inactive objects in from-space; finally, swap the roles of from-space and to-space, so the two halves of the new generation can be reused alternately.

image.png

Specifically, it works in four steps:

  • 1. Mark active and inactive objects
  • 2. Copy the active objects in from-space to to-space and lay them out contiguously
  • 3. Clear the inactive objects in from-space
  • 4. Swap the roles of from-space and to-space
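The four steps above can be sketched as a toy copying collector in the style of Cheney's algorithm, the textbook form of this technique; V8's actual Scavenger differs in many details. The layout is a hypothetical simplification: a space is an array, and objects refer to each other by index. Note that in a copying collector, steps 1 and 2 happen in the same traversal: an object counts as live exactly when it gets copied.

```javascript
// Toy model (hypothetical, not V8's memory layout): a space is an array of
// objects, and each object refers to others by their index in that space.
function scavenge(fromSpace, rootIndices) {
  const toSpace = [];
  const forwarding = new Map(); // old index -> new index (forwarding address)

  // Copy one object into to-space, unless it was copied already.
  function evacuate(oldIndex) {
    if (forwarding.has(oldIndex)) return forwarding.get(oldIndex);
    const obj = fromSpace[oldIndex];
    const newIndex = toSpace.length;
    toSpace.push({ name: obj.name, refs: obj.refs.slice() });
    forwarding.set(oldIndex, newIndex);
    return newIndex;
  }

  // Steps 1 and 2 together: copy the roots, then scan to-space in order,
  // copying whatever each copied object points to and rewriting its refs.
  const roots = rootIndices.map(evacuate);
  for (let scan = 0; scan < toSpace.length; scan++) {
    toSpace[scan].refs = toSpace[scan].refs.map(evacuate);
  }

  // Step 3: whatever was not copied is garbage, so from-space is emptied.
  fromSpace.length = 0;
  // Step 4: the filled to-space becomes the new from-space; the emptied
  // array would serve as the next to-space.
  return { space: toSpace, roots };
}

const semispace = [
  { name: 'A', refs: [2] }, // 0: root, points to C
  { name: 'B', refs: [] },  // 1: unreachable
  { name: 'C', refs: [0] }, // 2: reachable through A (and points back to A)
  { name: 'D', refs: [1] }, // 3: unreachable, even though it points to B
];
const result = scavenge(semispace, [0]);
console.log(result.space.map((o) => o.name)); // [ 'A', 'C' ]
```

Only A and C survive, packed side by side, and C's reference back to A is rewritten to A's new index. The forwarding map plays the role of the forwarding address that stops an object from being copied twice.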

So, how does the garbage collector know which objects are active and which are inactive?

This brings us back to reachability. Starting from the root object (window or global), the collector searches down through child nodes, marks every object it reaches as active, and continues recursively until every path has been followed to the end. Any object that is never reached stays unmarked; nothing reachable refers to it, so it is inactive and can be reclaimed by the garbage collector.

When does an object in the new generation become an object in the old generation?

The new generation is further subdivided into a nursery and an intermediate area. When an object is first allocated, it goes into the nursery. If it is still alive in the new generation after the next garbage collection, it is moved into the intermediate area. If it survives yet another garbage collection, the secondary garbage collector moves it into the old generation. This move is called promotion.
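The aging rule above can be modeled in a few lines. This is a hypothetical toy: the area field, the NEXT_AREA table, and the isAlive callback are inventions for illustration, not V8 data structures.

```javascript
// Hypothetical toy model of aging: each object lives in one area, and
// surviving a minor GC moves it one step closer to the old generation.
const NEXT_AREA = { nursery: 'intermediate', intermediate: 'old' };

function minorGC(objects, isAlive) {
  const survivors = [];
  for (const obj of objects) {
    if (!isAlive(obj)) continue; // dead objects are simply dropped
    if (obj.area in NEXT_AREA) {
      obj.area = NEXT_AREA[obj.area]; // nursery -> intermediate -> old (promotion)
    }
    survivors.push(obj);
  }
  return survivors;
}

const longLived = { name: 'config', area: 'nursery' };
const shortLived = { name: 'temp', area: 'nursery' };
let heap = [longLived, shortLived];

heap = minorGC(heap, (o) => o !== shortLived); // temp dies, config survives once
console.log(longLived.area); // intermediate
heap = minorGC(heap, () => true);              // config survives a second time
console.log(longLived.area); // old
```

Two survivals are enough to reach the old generation here, matching the nursery, then intermediate, then promotion sequence described above.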

1.2 Old generation

Objects in the new generation that survive many rounds of collection are promoted to the old-generation space. Having survived multiple collections, they are tenacious objects with a high survival rate, so the old generation should not use the Scavenge algorithm, for the following reasons:

  • Scavenge is a copying algorithm. Copying objects with a high survival rate over and over is pointless and very inefficient.
  • Scavenge trades space for time. The old generation is large, so using Scavenge there would waste a great deal of space; the cost outweighs the benefit.

The old generation uses the Mark-Sweep (mark and sweep) and Mark-Compact (mark and compact) algorithms.

Mark-Sweep (mark and sweep)

Mark-Sweep has two phases: marking and sweeping. Scavenge also marks and cleans up, but the difference is that Scavenge has to copy live objects before cleaning up, whereas Mark-Sweep copies nothing: once it has marked active and inactive objects, it sweeps directly.

  • Marking phase: a first pass over the old-generation objects, marking the active ones
  • Sweeping phase: a second pass over the old-generation objects, removing the unmarked (inactive) ones
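The two passes can be sketched on a toy heap. The layout is a hypothetical simplification: the heap is an array of slots, each holding an object or null for free space, and objects refer to other slots by index.

```javascript
// Toy heap (hypothetical layout): an array of slots, each holding an object
// or null for free space. Objects refer to other slots by index.
function markSweep(heap, roots) {
  // Marking phase: first pass, following references from the roots.
  const marked = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const i = stack.pop();
    if (marked.has(i)) continue;
    marked.add(i);
    stack.push(...heap[i].refs);
  }
  // Sweeping phase: second pass over the whole heap, freeing unmarked slots.
  for (let i = 0; i < heap.length; i++) {
    if (heap[i] !== null && !marked.has(i)) {
      heap[i] = null; // the slot becomes a free gap, left where it is
    }
  }
  return heap;
}

const heap = [
  { name: 'A', refs: [2] },
  { name: 'B', refs: [] },
  { name: 'C', refs: [4] },
  { name: 'D', refs: [] },
  { name: 'E', refs: [] },
];
markSweep(heap, [0]);
console.log(heap.map((o) => (o ? o.name : '_')).join(' ')); // A _ C _ E
```

Nothing moves: the live objects A, C, and E stay where they were, and the freed slots remain as gaps between them.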

image.png

From the figure above, you may have spotted a problem: after the inactive objects are cleared, lots of scattered gaps are left behind.

Mark-Compact (mark and compact)

After Mark-Sweep runs, it leaves many scattered gaps. What's the downside? Suppose a large object arrives and needs a large block of memory. The allocator first searches the scattered gaps for a place to put it, goes all the way around, finds no gap big enough, and has to place the object at the end. That search costs performance, and it is Mark-Sweep's weakness.
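This search can be sketched with a first-fit allocator over a hypothetical free list. First-fit is a textbook strategy chosen for illustration; real allocators use more elaborate free lists, but the fragmentation problem is the same.

```javascript
// Hypothetical free list left behind by Mark-Sweep: scattered gaps, each
// with a start address and a size in arbitrary units.
const freeList = [
  { start: 10, size: 2 },
  { start: 20, size: 3 },
  { start: 40, size: 2 },
];
let heapEnd = 50; // first address past the last used one
const totalFree = freeList.reduce((sum, gap) => sum + gap.size, 0); // 7

// First-fit: walk the gaps in order and take the first one big enough.
// If none fits, the object has to go at the end of the heap.
function allocate(size) {
  for (const gap of freeList) {
    if (gap.size >= size) {
      const address = gap.start;
      gap.start += size;
      gap.size -= size;
      return address;
    }
  }
  const address = heapEnd; // searched every gap in vain
  heapEnd += size;
  return address;
}

console.log(allocate(5)); // 50: 7 units are free, but no single gap holds 5
console.log(allocate(2)); // 10: a small object still fits in the first gap
```

The first call walks the entire list before giving up, even though the total free space is larger than the request. That wasted walk is the cost described above.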

This is where Mark-Compact comes in. It is an enhanced version of Mark-Sweep that adds a compaction phase: after marking, it moves the live objects toward one end of memory, and once they are packed together, it reclaims all the memory beyond the boundary of the last live object in one go.
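Compaction can be sketched on the same kind of toy heap, starting from the gap-filled state a sweep leaves behind. This is a hypothetical simplification: it builds a new array for clarity, whereas a real compactor moves objects in place, and it assumes every reference points to a live object (which marking guarantees).

```javascript
// Toy compaction (hypothetical layout): slots hold objects or null. Live
// objects slide toward the start, references are rewritten, and everything
// past the last live object becomes one contiguous free region.
function compact(heap) {
  const newIndex = new Map(); // old slot -> new slot (forwarding table)
  let next = 0;
  for (let i = 0; i < heap.length; i++) {
    if (heap[i] !== null) newIndex.set(i, next++);
  }
  const compacted = new Array(heap.length).fill(null);
  for (const [oldI, newI] of newIndex) {
    const obj = heap[oldI];
    compacted[newI] = { name: obj.name, refs: obj.refs.map((r) => newIndex.get(r)) };
  }
  return { heap: compacted, freeFrom: next }; // slots >= freeFrom are free
}

// The state a sweep leaves behind: live A, C, E with gaps between them.
const swept = [
  { name: 'A', refs: [2] }, null,
  { name: 'C', refs: [4] }, null,
  { name: 'E', refs: [] },
];
const { heap, freeFrom } = compact(swept);
console.log(heap.map((o) => (o ? o.name : '_')).join(' ')); // A C E _ _
console.log(freeFrom); // 3
```

After compaction, a large object can go straight to slot freeFrom without searching through gaps. The price is moving objects and rewriting every reference to them.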

image.png

2. Stop-The-World

Having covered V8's generational collection, let's look at another problem. Running JS code needs the JS engine, and so does garbage collection. What happens if both run at the same time and conflict? The answer is that garbage collection takes priority: JS execution is paused and resumes only after garbage collection finishes. This is called stop-the-world (a full pause).

Because the new generation is small and has few surviving objects, and Scavenge is fast, its pauses are short. The old generation is different: when there are many live objects, the pause can last long enough to make the page stutter.

3. Orinoco optimization

Orinoco is the code name of the V8 garbage collector project. To improve the user experience and address stop-the-world pauses, it introduced incremental marking, lazy sweeping, concurrent collection, and parallel collection.

3.1 Incremental marking

As emphasized repeatedly, garbage is marked first and cleared afterwards. Incremental marking optimizes the marking phase. Here is an analogy: there is a lot of rubbish on the road, passers-by cannot get through, and they have to wait for the cleaners to clear it. At first there was little rubbish, so passers-by waited until the cleaners had finished. But over the following days there was more and more rubbish, the cleaners took too long, and the passers-by lost patience and told the cleaners: "You clean a section, I'll walk a section. That works better."

In this analogy, cleaning up rubbish corresponds to marking, and the passers-by correspond to JS code. When there is little garbage, incremental marking is not used; once the amount of garbage reaches a certain level, incremental marking kicks in: mark a little, let the JS code run for a while, and repeat, so that no single pause gets long.
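Marking in slices can be sketched with the tri-color abstraction commonly used to describe incremental marking: white objects have not been seen, grey ones have been seen but not yet scanned, and black ones are finished. IncrementalMarker and its budget parameter are hypothetical names for this illustration, not V8's API.

```javascript
// Hypothetical sketch of incremental marking with tri-color marking:
// white = not yet seen, grey = seen but children not scanned, black = done.
class IncrementalMarker {
  constructor(roots) {
    this.color = new Map(); // object -> 'grey' | 'black' (absent = white)
    this.worklist = [];     // grey objects waiting to be scanned
    for (const r of roots) this.shade(r);
  }
  shade(obj) {
    if (!this.color.has(obj)) {
      this.color.set(obj, 'grey');
      this.worklist.push(obj);
    }
  }
  // Scan at most `budget` objects, then return so JS can run again.
  step(budget) {
    while (budget > 0 && this.worklist.length > 0) {
      const obj = this.worklist.pop();
      for (const child of obj.children) this.shade(child);
      this.color.set(obj, 'black');
      budget--;
    }
    return this.worklist.length === 0; // true when marking is finished
  }
}

// A chain of 5 objects: n0 -> n1 -> n2 -> n3 -> n4.
const nodes = Array.from({ length: 5 }, (_, i) => ({ id: i, children: [] }));
for (let i = 0; i < 4; i++) nodes[i].children.push(nodes[i + 1]);

const marker = new IncrementalMarker([nodes[0]]);
let slices = 0;
let done = false;
while (!done) {
  done = marker.step(2); // one marking slice of at most 2 objects
  slices++;              // ...then the JavaScript program would run here
}
console.log(slices); // 3
```

Each slice does a bounded amount of work, so the pause per slice stays short even though the total marking work is unchanged.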

image.png

3.2 Lazy sweeping

Incremental marking only covers the marking phase; lazy sweeping covers the sweeping phase. After incremental marking, when it is time to clean up inactive objects, the garbage collector may find that even without cleaning up, there is enough free space for the JS code to keep running. It then postpones the cleanup and lets the JS code run first, or cleans up only part of the garbage rather than all of it. This optimization is called lazy sweeping.
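Lazy sweeping can be sketched as sweeping on demand. Splitting the heap into "pages" and the LazySweeper class are assumptions made for this illustration; the point is only that sweeping happens when the allocator actually needs space, and no more than necessary.

```javascript
// Hypothetical sketch of lazy sweeping: after marking, pages are queued for
// sweeping but are only swept when the allocator needs more free space.
class LazySweeper {
  constructor(pages) {
    this.unswept = pages.slice(); // pages that may still hold dead objects
    this.freeSlots = 0;           // free space recovered so far
  }
  sweepOnePage() {
    const page = this.unswept.shift();
    // Count this page's unmarked (dead) objects as free space.
    this.freeSlots += page.filter((obj) => !obj.marked).length;
  }
  allocate(size) {
    // Sweep just enough pages to satisfy this request, not the whole heap.
    while (this.freeSlots < size && this.unswept.length > 0) {
      this.sweepOnePage();
    }
    if (this.freeSlots < size) return false; // a real GC would collect again
    this.freeSlots -= size;
    return true;
  }
}

// Builds a page with the given numbers of live (marked) and dead objects.
const page = (live, dead) => [
  ...Array.from({ length: live }, () => ({ marked: true })),
  ...Array.from({ length: dead }, () => ({ marked: false })),
];
const sweeper = new LazySweeper([page(2, 3), page(1, 4), page(4, 1)]);
sweeper.allocate(2);
console.log(sweeper.unswept.length); // 2: only the first page was swept
```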

Incremental marking and lazy sweeping greatly reduce stop-the-world pauses. But they raise a new problem: because incremental marking marks a little and then lets JS run for a while, the JS code may, right after an object has been marked active, drop its last reference so the object becomes inactive; or, conversely, an object that has not been marked may be given a new reference by the JS code and become active. The first case only delays reclaiming that object until the next cycle, but the second is dangerous, because a live object could be freed. In short, interleaving marking with code execution can change object references and produce incorrect marks, so a write barrier is used to record these reference changes.
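The dangerous case can be reproduced with a small tri-color marker plus a write barrier. The barrier shown is the classic Dijkstra-style insertion barrier: when JS stores a pointer into an already-black object, the target is shaded grey. This is an illustrative assumption; V8's real barrier code differs, and createMarker and writeField are hypothetical names. Each object has a single child field to keep the example short.

```javascript
// Hypothetical tri-color marker (absent = white, 'grey', 'black') with a
// Dijkstra-style insertion write barrier. Not V8's actual barrier code.
function createMarker(roots) {
  const color = new Map();
  const worklist = [];
  const shade = (obj) => {
    if (obj && !color.has(obj)) {
      color.set(obj, 'grey');
      worklist.push(obj);
    }
  };
  const step = () => { // scan one grey object
    const obj = worklist.pop();
    if (obj === undefined) return false; // nothing left to scan
    shade(obj.child);
    color.set(obj, 'black');
    return true;
  };
  roots.forEach(shade);
  return {
    color,
    step,
    finish: () => { while (step()) { /* keep scanning */ } },
    // Every pointer store made by JS goes through here.
    writeField: (holder, value, useBarrier) => {
      holder.child = value;
      if (useBarrier && color.get(holder) === 'black') shade(value);
    },
  };
}

function scenario(useBarrier) {
  const c = { name: 'C', child: null };
  const b = { name: 'B', child: c };    // root; C is reachable only through B
  const a = { name: 'A', child: null }; // root
  const marker = createMarker([b, a]);
  marker.step(); // marks A black (the worklist is a stack, so A comes first)
  // JavaScript runs between marking slices and moves the only pointer to C:
  marker.writeField(a, c, useBarrier);    // black A now points at white C
  marker.writeField(b, null, useBarrier); // B no longer points at C
  marker.finish();
  return marker.color.get(c) === 'black'; // is C considered live?
}

console.log(scenario(false)); // false: C would be freed while A still uses it
console.log(scenario(true));  // true: the barrier shaded C grey in time
```

Without the barrier, A has already been scanned and B no longer leads to C, so C stays white and would be swept while still in use.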

3.3 Concurrent

Concurrent GC runs garbage collection on helper threads while the main thread keeps executing JavaScript without being suspended, so the two proceed at the same time. Only occasionally does the main thread need to pause briefly so that the garbage collector can perform some special operations. But this approach faces the same problem as incremental marking: while GC is in progress, JavaScript keeps running, so references between heap objects can change at any moment, and write barriers are therefore required.

image.png

3.4 Parallel

Parallel GC has the main thread and helper threads perform the same GC work at the same time, so the helpers share the main thread's GC load. The time spent on garbage collection then becomes roughly the total work divided by the number of participating threads, plus some synchronization overhead.
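Written as a formula, the estimate in the paragraph above is roughly the following, where n is the number of participating threads and T_sync is the synchronization overhead. The 40 ms figure in the worked line is purely illustrative, not a measurement, and the estimate assumes the work splits evenly between threads.

$$
T_{\text{pause}} \approx \frac{T_{\text{GC}}}{n} + T_{\text{sync}}, \qquad \text{e.g.}\ \frac{40\ \text{ms}}{4} + T_{\text{sync}} = 10\ \text{ms} + T_{\text{sync}}
$$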

image.png

V8's current garbage collection mechanism

V8 adopted incremental marking in 2011. In 2018, with Chrome 64 and Node.js v10, it enabled concurrent marking and, on top of concurrency, added parallel techniques, which greatly reduced garbage collection time.

Secondary garbage collector

For new-generation garbage collection, V8 uses parallelism. During the copying phase, when live objects are copied from from-space to to-space, several helper threads do the work in parallel. Because multiple threads compete for the same new-generation heap, one live object could be copied by more than one thread. To prevent this, the thread that copies a live object first records a forwarding address (a pointer to the new copy) once the copy is complete, so that any other helper thread that later finds the same object can tell it has already been copied.

image.png

Main garbage collector

In old-generation garbage collection, once heap usage exceeds a certain threshold, V8 starts concurrent marking tasks. Each helper thread follows the pointers of the objects it marks, tracing what they reference. While JavaScript runs on the main thread, concurrent marking proceeds on helper threads in the background. When JavaScript code modifies a pointer to an object in the heap, the write barrier records the change so that the helper threads' concurrent marking stays correct.

When concurrent marking finishes, or when dynamically allocated memory reaches its limit, the main thread performs a final, quick marking step. During this step the main thread pauses and rescans the root set to make sure every live object is marked. Because the helper threads have already marked the live objects, this rescan is essentially a check. Once it is done, some helper threads compact memory while others sweep it; since this work runs concurrently, it does not block JavaScript execution on the main thread.

image.png

Concluding remarks

After reading this article, the next time an interviewer asks, you won't have to give the bare answer "reference counting and mark-and-sweep". You can win them over with a more complete and detailed answer.

A follow-up article about memory leaks in real projects is on the way, so stay tuned!

I'm Lin Sanxin, an enthusiastic front-end rookie programmer. If you're motivated, like front-end, and want to learn it, let's be friends and slack off together, haha. To join my slacking-off group, add me and include the note [think].



Sunshine_Lin