Preface
Hi everyone, this is Lin Sanxin. Two days ago I stumbled on a video about the V8 garbage collection mechanism on Bilibili. I was curious, so I watched it, and found it a bit hard to follow. That made me wonder whether you are like me: either you know very little about V8 garbage collection, or you have read about it but never really understood it. So I spent three days thinking about how to explain this difficult topic in the plainest words I can.
Common understanding
I believe most of you have been asked this in an interview: "Tell me about the V8 garbage collection mechanism."
At that point, most people answer: "There are two ways to collect garbage: reference counting and mark-and-sweep."
Reference counting
This approach tracks the number of references to an object. If the count is 0, the object is reclaimed; if it is greater than 0, it is kept. In the following code the object has three references; once all three variables are set to null, the count drops to 0 and the object can be reclaimed:
let obj1 = { name: '林三心', age: 22 }
let obj2 = obj1
let obj3 = obj1
obj1 = null
obj2 = null
obj3 = null
Reference counting has a drawback. After the following code runs, obj1 and obj2 ought to be reclaimed, but because they reference each other, each still has a reference count of 1. They are never reclaimed, which results in a memory leak:
function fn () {
const obj1 = {}
const obj2 = {}
obj1.a = obj2
obj2.a = obj1
}
fn()
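To see why the cycle in fn() defeats reference counting, here is a toy model. It is only a sketch for illustration, not how any real engine is implemented; `RefCountedHeap`, `alloc`, `addRef`, `link`, and `release` are hypothetical names.

```javascript
// Toy reference-counting heap (illustrative only; not V8's implementation).
class RefCountedHeap {
  constructor() {
    this.counts = new Map(); // object id -> number of references to it
    this.fields = new Map(); // object id -> ids this object points to
  }
  // Create an object; it starts with no references.
  alloc(id) {
    this.counts.set(id, 0);
    this.fields.set(id, []);
  }
  // Record one new reference to `id` (from a variable or another object).
  addRef(id) {
    this.counts.set(id, this.counts.get(id) + 1);
  }
  // Record that object `from` now points to object `to`.
  link(from, to) {
    this.fields.get(from).push(to);
    this.addRef(to);
  }
  // Drop one reference; free the object when its count reaches 0.
  release(id) {
    if (!this.counts.has(id)) return; // already freed
    const count = this.counts.get(id) - 1;
    this.counts.set(id, count);
    if (count === 0) {
      const children = this.fields.get(id);
      this.counts.delete(id);
      this.fields.delete(id);
      children.forEach((child) => this.release(child));
    }
  }
  isAlive(id) {
    return this.counts.has(id);
  }
}

const heap = new RefCountedHeap();

// Case 1: an object referenced by a single variable.
heap.alloc('solo');
heap.addRef('solo');  // let solo = {}
heap.release('solo'); // solo = null
console.log(heap.isAlive('solo')); // false

// Case 2: the cycle from fn().
heap.alloc('obj1');
heap.alloc('obj2');
heap.addRef('obj1');       // const obj1 = {}
heap.addRef('obj2');       // const obj2 = {}
heap.link('obj1', 'obj2'); // obj1.a = obj2
heap.link('obj2', 'obj1'); // obj2.a = obj1
heap.release('obj1');      // fn() returns: the local obj1 goes away
heap.release('obj2');      // fn() returns: the local obj2 goes away
console.log(heap.isAlive('obj1'), heap.isAlive('obj2')); // true true
```

Each object in the cycle keeps the other's count at 1, so neither count ever reaches 0, even though nothing outside the cycle can reach them.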
Mark-and-sweep
Mark-and-sweep marks the reachable objects and treats everything left unmarked as garbage to be collected.
So the question is: what does reachable mean, and how is it decided? (The "reachable" here is not the reachable duck, which is Psyduck's name in Chinese.)
Back to the point. To judge whether an object is reachable, we need to talk about reachability. What is reachability? Starting from the root object (window or global), the collector searches down through the child nodes. When a child node is found, the object it references is reachable and gets marked; the search then continues recursively until all child nodes have been traversed. Any node that was never visited stays unmarked, which means it is not referenced from anywhere. Such an object needs to be freed and can be reclaimed by the garbage collector.
// Reachable
var name = '林三心'
var obj = {
arr: [1, 2, 3]
}
console.log(window.name) // 林三心
console.log(window.obj) // { arr: [1, 2, 3] }
console.log(window.obj.arr) // [1, 2, 3]
console.log(window.obj.arr[1]) // 2
function fn () {
var age = 22
}
// Unreachable
console.log(window.age) // undefined
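The search from the root can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: `fakeGlobal` stands in for window/global, `markReachable` is a hypothetical helper, and an explicit stack is used in place of recursion.

```javascript
// Minimal reachability sketch: the "heap" is a plain object graph and
// `roots` stands in for window/global. All names are hypothetical.
function markReachable(roots) {
  const marked = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const value = stack.pop();
    // Only objects (including arrays) can hold references to other objects.
    if (value === null || typeof value !== 'object' || marked.has(value)) continue;
    marked.add(value);
    for (const child of Object.values(value)) stack.push(child);
  }
  return marked;
}

const arr = [1, 2, 3];
const obj = { arr };
const orphan = { note: 'nothing points here from the root' };
const fakeGlobal = { name: '林三心', obj };

const marked = markReachable([fakeGlobal]);
console.log(marked.has(obj));    // true
console.log(marked.has(arr));    // true
console.log(marked.has(orphan)); // false
```

Anything missing from `marked` after the walk is what a collector would treat as garbage.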
This common understanding is actually not enough, because the garbage collection mechanism (GC) involves more than these two algorithms. If you want to know more about the V8 garbage collection mechanism, keep reading!
JavaScript memory management
Memory management in JavaScript is actually very simple and has 3 steps:
- 1. Allocate the memory the user needs
- 2. The user gets the memory and uses it
- 3. The user no longer needs the memory, so it is released and returned to the system
So who are the users? For example:
var num = ''
var str = '林三心'
var obj = { name: '林三心' }
obj = { name: '林胖子' }
The num, str, and obj above are the users. As we all know, JavaScript data types are divided into basic data types and reference data types:
- Basic data types: have a fixed size; the value is stored in stack memory and is accessed directly by value
- Reference data types: the size is not fixed (properties can be added); stack memory holds a pointer to heap memory, and the value is accessed by reference
- Since the basic data types stored in stack memory have a fixed size, stack memory is allocated and released automatically by the operating system
- Since the size of heap memory is not fixed, the system cannot release and reclaim it automatically, so the JS engine has to release this memory itself
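The difference between access by value and access by reference is easy to observe from code, whatever the engine does internally. A small sketch (the variable names are made up for this example):

```javascript
// Basic type: the value itself is copied.
let a = 1;
let b = a;
b = 2;
console.log(a); // 1

// Reference type: only the reference is copied, so both names point to one object.
const person = { name: '林三心' };
const alias = person;
alias.name = '林胖子';
console.log(person.name);      // 林胖子
console.log(person === alias); // true
```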
Why garbage collection
In Chrome, V8 limits how much memory can be used (about 1.4 GB/1464 MB on 64-bit systems, about 0.7 GB/732 MB on 32-bit systems). Why limit it?
- Surface reason: V8 was originally designed for browsers, where scenarios that use a lot of memory are unlikely
- Deeper reason: V8's garbage collection mechanism imposes the limit. Cleaning up a large amount of garbage is very time-consuming and pauses the JavaScript thread, so the performance and responsiveness of the application would plummet
As mentioned, memory on the stack is allocated and released automatically by the operating system, while memory on the heap is released by the JS engine (such as Chrome's V8). When our code is not written properly, the JS engine's garbage collector cannot release that memory (a memory leak). The memory occupied by the browser keeps growing, which in turn degrades the performance of JavaScript, the application, and the operating system.
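One common way code ends up leaking is a long-lived container that is never pruned. The sketch below is a made-up example (`cache` and `handleRequest` are hypothetical names), showing that objects stay reachable, and therefore uncollectable, for as long as the container holds them.

```javascript
// Hypothetical example: a module-level cache that is never pruned.
const cache = new Map();

function handleRequest(id) {
  const payload = { id, data: new Array(1000).fill(id) };
  cache.set(id, payload); // stays reachable from `cache` until it is removed
  return payload.data.length;
}

for (let i = 0; i < 100; i++) handleRequest(i);
console.log(cache.size); // 100: every payload is still reachable

// Dropping the references makes the payloads unreachable, so the
// collector is free to reclaim them in a later cycle.
cache.clear();
console.log(cache.size); // 0
```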
V8's garbage collection algorithm
1. Generational collection
In JavaScript, object lifetimes fall into two cases:
- Very short lifetime: the object is released after a single garbage collection
- Very long lifetime: the object is still there after many garbage collections
So here comes the problem. Short-lived objects are easy: just reclaim them. But long-lived objects survive collection after collection. If we already know they will not be reclaimed, isn't it a waste of performance to keep checking them in vain?
To solve this, V8 uses a generational optimization. In plain terms, V8 divides the heap into two spaces, the new generation and the old generation. The new generation holds short-lived objects, and the old generation holds long-lived objects.
The new generation usually has a capacity of only 1-8 MB, while the old generation is much larger. V8 uses a different garbage collector and a different collection algorithm for each of the two areas so that garbage collection is more efficient:
- Secondary garbage collector + Scavenge algorithm: mainly responsible for garbage collection in the new generation
- Main garbage collector + Mark-Sweep and Mark-Compact algorithms: mainly responsible for garbage collection in the old generation
1.1 The new generation
In JavaScript, memory for any newly declared object is first allocated in the new generation. Because most objects live only briefly, a very efficient algorithm is needed. The new generation mainly uses the Scavenge algorithm for garbage collection. Scavenge is a typical copying algorithm that trades space for time, which makes it a good fit for a space with a small memory footprint.
The Scavenge algorithm divides the new-generation heap into two halves, called from-space and to-space. The way it works is simple: copy the live objects in from-space to to-space, laying them out in order in memory, then release the memory of the dead objects left in from-space. When that is done, from-space and to-space swap roles, so the two regions of the new generation can be reused over and over.
Specifically, there are 4 steps:
- 1. Mark active and inactive objects
- 2. Copy the active objects in from-space to to-space and put them in order
- 3. Clear the inactive objects in from-space
- 4. Swap the roles of from-space and to-space
Scavenge algorithm garbage collection
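The four steps can be sketched with two arrays standing in for the two spaces. This is a toy model under loud assumptions: objects are `{ id, refs }` records, "copying" is modelled by moving entries between arrays rather than by changing addresses, and `scavenge` is a hypothetical function, not V8's code.

```javascript
// Toy semi-space collector (illustrative; names are hypothetical).
// Each object is { id, refs: [...] }; `roots` are the directly reachable objects.
function scavenge(fromSpace, roots) {
  const toSpace = [];
  const copied = new Set();
  const queue = [...roots];
  // Steps 1 and 2: find live objects and place them, packed in order, in to-space.
  while (queue.length > 0) {
    const obj = queue.shift();
    if (copied.has(obj)) continue;
    copied.add(obj);
    toSpace.push(obj);
    queue.push(...obj.refs);
  }
  // Step 3: everything left behind in from-space is garbage; clear it.
  fromSpace.length = 0;
  // Step 4: swap roles. The caller uses the returned pair as the new spaces.
  return { fromSpace: toSpace, toSpace: fromSpace };
}

const c = { id: 'c', refs: [] };
const a = { id: 'a', refs: [c] };
const b = { id: 'b', refs: [] }; // not reachable from any root
const spaces = scavenge([a, b, c], [a]);
console.log(spaces.fromSpace.map((o) => o.id)); // [ 'a', 'c' ]
console.log(spaces.toSpace.length);             // 0
```

The cost is proportional to the number of live objects, which is why this style suits a space where most objects die young.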
So how does the garbage collector know which objects are active and which are inactive?
This brings us back to reachability. Starting from the root object (window or global), the collector searches down through the child nodes and marks every object it can reach, recursively, until all child nodes have been traversed. Objects that are never visited stay unmarked, are treated as not referenced anywhere, and can be reclaimed by the garbage collector.
When do objects in the young generation become objects in the old generation?
The new generation is itself further subdivided into a nursery and an intermediate sub-generation. When an object is first allocated, it goes into the nursery. If it is still alive after the next garbage collection, it is moved to the intermediate sub-generation. If it is still alive after the collection after that, the secondary garbage collector moves it to the old generation. This move is called promotion.
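The promotion rule can be modelled with a survival counter per object. This is a hypothetical sketch: `minorGC`, the `survived` field, and the `isLive` callback are invented for illustration and say nothing about V8's internal data structures.

```javascript
// Sketch of promotion by survival count (hypothetical model, not V8's code).
// An object that is still alive at its second minor collection moves to the old generation.
function minorGC(young, old, isLive) {
  const survivors = [];
  for (const obj of young) {
    if (!isLive(obj)) continue; // dead: dropped
    if (obj.survived >= 1) {
      old.push(obj);            // was in the intermediate sub-generation: promote
    } else {
      obj.survived += 1;        // nursery -> intermediate
      survivors.push(obj);
    }
  }
  return survivors;
}

const old = [];
let young = [
  { id: 'temp', survived: 0 },
  { id: 'config', survived: 0 },
];
const live = new Set(['config']);
const isLive = (obj) => live.has(obj.id);

young = minorGC(young, old, isLive); // first collection
console.log(young.map((o) => o.id), old.map((o) => o.id)); // [ 'config' ] []
young = minorGC(young, old, isLive); // second collection
console.log(young.map((o) => o.id), old.map((o) => o.id)); // [] [ 'config' ]
```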
1.2 Old generation
Objects in the new generation that survive battle after battle are promoted to the old generation. Because they have been through several collections without being reclaimed, they are a tenacious bunch with a high survival rate. That is why the old generation should not use the Scavenge algorithm, for the following reasons:
- Scavenge is a copying algorithm. Repeatedly copying objects with a high survival rate is pointless and extremely inefficient.
- Scavenge trades space for time. The old generation is a large memory space, so using Scavenge there would waste a lot of space, and the losses would outweigh the gains.
So the old generation uses the Mark-Sweep algorithm (mark and sweep) and the Mark-Compact algorithm (mark and compact).
Mark-Sweep (mark and sweep)
Mark-Sweep has two phases, marking and sweeping. The Scavenge algorithm described earlier also marks and clears, but the difference is that Scavenge has to copy objects before clearing, whereas Mark-Sweep does not: once active and inactive objects have been marked, it sweeps directly.
- Marking phase: scan the old-generation objects a first time and mark the active ones
- Sweeping phase: scan the old-generation objects a second time and remove the unmarked, that is, inactive, objects
You have probably spotted a problem here: after the inactive objects are cleared, a lot of scattered gaps are left behind.
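The two phases, and the gaps they leave, can be seen in a toy heap made of array slots. The model is an assumption made for illustration: each slot holds either an `{ id, refs }` record or `null` for free space, and `markSweep` is a hypothetical function.

```javascript
// Toy Mark-Sweep over a fixed array of slots (hypothetical model).
function markSweep(heap, roots) {
  // Marking phase: walk the object graph from the roots.
  const marked = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const obj = stack.pop();
    if (marked.has(obj)) continue;
    marked.add(obj);
    stack.push(...obj.refs);
  }
  // Sweeping phase: scan every slot and free the unmarked ones in place.
  for (let i = 0; i < heap.length; i++) {
    if (heap[i] !== null && !marked.has(heap[i])) heap[i] = null;
  }
  return heap;
}

const d = { id: 'd', refs: [] };
const b = { id: 'b', refs: [d] };
const a = { id: 'a', refs: [] };
const c = { id: 'c', refs: [] };
const heap = [a, b, c, d, null, null];
markSweep(heap, [b]);
const layout = heap.map((slot) => (slot ? slot.id : '_')).join(' ');
console.log(layout); // _ b _ d _ _
```

The survivors stay where they were, so the free slots end up scattered between them.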
Mark-Compact (mark and compact)
When the Mark-Sweep algorithm finishes a collection, it leaves many scattered gaps. Why is that a problem? Suppose a large object comes in and needs a large block of memory. The allocator first looks through the scattered gaps, goes all the way around, finds none big enough, and ends up placing the object at the very end. That search costs performance, and this is the drawback of the Mark-Sweep algorithm.
This is where the Mark-Compact algorithm comes in. It is an enhanced version of Mark-Sweep that adds a compaction phase: after the inactive objects have been cleared, the remaining active objects are moved to one end of memory, and once that is done, the memory beyond the boundary is reclaimed in one go.
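The compaction phase on its own can be sketched as sliding the survivors to the front of an array. This assumes marking and clearing have already happened and that free slots are `null`; `compact` is a hypothetical helper, and a real collector would also have to update every pointer to the moved objects, which is left out here.

```javascript
// Compaction step only: assumes dead objects were already cleared,
// so free slots are represented by null. (Hypothetical model.)
function compact(heap) {
  let next = 0; // next free position at the front of the heap
  for (let i = 0; i < heap.length; i++) {
    if (heap[i] === null) continue;
    if (i !== next) {
      heap[next] = heap[i];
      heap[i] = null;
    }
    next++;
  }
  return next; // everything from `next` onwards is one contiguous free block
}

const heap = [null, 'b', null, 'd', null, null];
const boundary = compact(heap);
console.log(heap);     // [ 'b', 'd', null, null, null, null ]
console.log(boundary); // 2
```

After compaction a large allocation no longer has to search through gaps: all free memory starts at `boundary`.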
2. Stop-The-World
Having covered V8's generational collection, let's look at another problem. Running JS code needs the JS engine, and garbage collection needs the JS engine too. What if both want to run at the same time? The answer is that garbage collection takes priority: code execution is stopped first, and the JS code resumes only after garbage collection has finished. This is called a full pause (stop-the-world).
Because the new generation is small and has few surviving objects, and the Scavenge algorithm is fast, its pauses are short. The old generation is different: when there are many active objects, the pause can be long enough to make the page stutter.
3. Orinoco optimization
Orinoco is the code name of the V8 garbage collector project. To improve the user experience and address the full-pause problem, it introduced incremental marking, lazy sweeping, concurrency, and parallelism as optimizations.
3.1 Incremental marking
We have stressed repeatedly that collection marks first and clears afterwards. Incremental marking is an optimization of the marking phase.
Here is a vivid example: there is a lot of rubbish on the road, so passers-by cannot get through until the cleaners have cleared it. For the first few days there was little rubbish, so passers-by simply waited until the cleaners were finished. Over the next few days the rubbish piled up, the cleaners took too long, and the passers-by could not wait any more. They told the cleaners: "You clear one section, then I walk one section. That is more efficient."
In this example, clearing the rubbish corresponds to the marking process and the passers-by correspond to the JS code. When there is little garbage, incremental marking is not used; once the amount of garbage reaches a certain level, incremental marking is turned on: mark a little, let the JS code run for a while, and repeat, which improves efficiency.
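The "mark a little, run a little" rhythm maps naturally onto a generator. The following is a sketch under assumptions: `incrementalMark` and its `budget` parameter are hypothetical, and the pauses are simulated by a loop instead of real JS code running in between.

```javascript
// Incremental marking modelled with a generator (hypothetical sketch).
// Each `next()` call marks up to `budget` objects, then yields to "JS code".
function* incrementalMark(roots, budget) {
  const marked = new Set();
  const stack = [...roots];
  let work = 0;
  while (stack.length > 0) {
    const obj = stack.pop();
    if (marked.has(obj)) continue;
    marked.add(obj);
    stack.push(...obj.refs);
    work++;
    if (work === budget) {
      work = 0;
      yield marked.size; // pause marking; the main thread can run JS here
    }
  }
  return marked;
}

// Build a chain of 10 objects: n0 -> n1 -> ... -> n9.
let head = null;
for (let i = 9; i >= 0; i--) head = { id: 'n' + i, refs: head ? [head] : [] };

const marker = incrementalMark([head], 3);
let pauses = 0;
let step = marker.next();
while (!step.done) {
  pauses++; // JS code would run during each pause
  step = marker.next();
}
console.log(pauses);          // 3
console.log(step.value.size); // 10
```

The total marking work is unchanged; it is simply cut into slices so that no single pause is long.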
3.2 Lazy sweeping
As mentioned above, incremental marking only covers the marking phase; lazy sweeping covers the sweeping phase. After incremental marking, when it is time to clear inactive objects, the garbage collector may find that the remaining space is enough for the JS code to run even without cleaning up. So it delays the cleanup and lets the JS code run first, or it clears only part of the garbage rather than all of it. This optimization is called lazy sweeping.
Incremental marking and lazy sweeping have greatly reduced full pauses. But they bring a problem of their own. Incremental marking means marking a little, then letting JS run for a while. What if the collector has just marked an object as active and the JS code then makes it inactive? Or the other way round: the collector has not marked an object as active, and the JS code then makes it active? In short, interleaving marking with code execution means object references can change and the marking can become wrong. This is why write barriers are needed: they record these changes to references.
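The second case, where a live object is missed, is the dangerous one, and a toy model shows how a write barrier prevents it. This sketch is an assumption-heavy illustration of one common barrier style: `Marker`, `step`, and `writeField` are hypothetical names, and every reference write is routed through `writeField` by hand.

```javascript
// Write-barrier sketch (hypothetical model of one common barrier style).
class Marker {
  constructor(roots) {
    this.marked = new Set();
    this.worklist = [...roots];
    this.marking = true;
  }
  // Mark a single object, then return so that JS code can run.
  step() {
    while (this.worklist.length > 0) {
      const obj = this.worklist.pop();
      if (this.marked.has(obj)) continue;
      this.marked.add(obj);
      const children = Object.values(obj.fields).filter((v) => v !== null);
      this.worklist.push(...children);
      return true;
    }
    this.marking = false;
    return false;
  }
  // Every reference write goes through here instead of a plain assignment.
  writeField(host, key, value, useBarrier) {
    host.fields[key] = value;
    if (useBarrier && this.marking && value !== null) {
      this.worklist.push(value); // make sure the new target gets visited
    }
  }
}

function run(useBarrier) {
  const root = { id: 'root', fields: {} };
  const late = { id: 'late', fields: {} };
  const marker = new Marker([root]);
  marker.step();                                  // root is now marked and scanned
  marker.writeField(root, 'x', late, useBarrier); // JS code runs between steps
  while (marker.step()) {}                        // finish marking
  return marker.marked.has(late);
}

console.log(run(false)); // false: `late` is live but would be swept
console.log(run(true));  // true: the barrier recorded the new reference
```

Without the barrier the collector never revisits `root`, so the reference stored after it was scanned goes unnoticed.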
3.3 Concurrent
Concurrent GC lets garbage collection run without pausing the main thread, so the two proceed at the same time. Only occasionally does JavaScript execution have to pause briefly so that the garbage collector can do some special operations. This approach faces the same problem as incremental marking: because JavaScript code is running during garbage collection, the reference relationships between objects in the heap can change at any time, so write barriers are needed here too.
3.4 Parallel
Parallel GC lets the main thread and helper threads do the same GC work at the same time. The helper threads share the main thread's GC work, so the time spent in garbage collection is roughly the total time divided by the number of participating threads (plus some synchronization overhead).
V8's current garbage collection mechanism
V8 introduced incremental marking in 2011. In 2018, Chrome 64 and Node.js v10 enabled concurrent marking, and parallel techniques were added on top of concurrency, which greatly reduced garbage collection time.
Secondary garbage collector
For new-generation garbage collection, V8 uses the parallel mechanism. In the copying phase, that is, when objects are copied from from-space to to-space, several helper threads are started to do the work in parallel. Because multiple threads compete for the memory of the new-generation heap, the same active object might be copied by more than one thread. To solve this, once the first thread has finished copying an active object, V8 records a forwarding address that points to the copy. When another helper thread later finds that object, it can tell that it has already been copied.
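The forwarding-address idea can be illustrated without real threads. In this sketch two "workers" simply take turns calling a hypothetical `evacuate` function; a real parallel collector has to install the forwarding pointer atomically, which is left out here.

```javascript
// Forwarding-address sketch (single-threaded; hypothetical names).
// Several helper threads would race to copy; here two "workers" take turns.
function evacuate(obj, toSpace) {
  // If some worker already copied this object, reuse its copy.
  if (obj.forwardingAddress !== undefined) return obj.forwardingAddress;
  const copy = { id: obj.id };
  toSpace.push(copy);
  obj.forwardingAddress = copy; // left behind in from-space for other workers
  return copy;
}

const toSpace = [];
const shared = { id: 'shared' };
const copyFromWorkerA = evacuate(shared, toSpace);
const copyFromWorkerB = evacuate(shared, toSpace);
console.log(copyFromWorkerA === copyFromWorkerB); // true
console.log(toSpace.length);                      // 1
```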
Main garbage collector
For old-generation garbage collection, V8 starts concurrent marking tasks when the memory used in the heap exceeds a certain threshold. Each helper thread follows the pointers of every marked object and the objects it references. While JavaScript code is running, concurrent marking proceeds on helper threads in the background. When JavaScript code modifies an object pointer in the heap, write barriers keep track of the change while the helper threads are marking.
When concurrent marking is finished, or the dynamically allocated memory reaches its limit, the main thread performs a quick final marking step. During this step the main thread is paused: it scans the root set once more to make sure all objects have been marked. Because the helper threads have already marked the active objects, this scan is only a check. After that, some helper threads sweep memory while others compact it. Since these run concurrently, they do not affect JavaScript execution on the main thread.
Concluding remarks
After reading this article, the next time an interviewer asks, you won't have to give the naive answer "reference counting and mark-and-sweep". You can win the interviewer over with a more complete and detailed answer.
A follow-up article about memory leaks in projects is on the way, so stay tuned!
I am Lin Sanxin, an enthusiastic front-end rookie programmer. If you are motivated, like front-end development, and want to learn it, let's be friends and slack off together, haha. To join the slacking-off group, add me and include the note [think].