
Each JS execution engine has its own implementation; this time we focus on how the V8 engine implements arrays.

This week's intensive reading article is How JavaScript Array Works Internally?, which briefly introduces how the V8 engine implements arrays. I will also draw on some other articles and the source code to explain.

Overview

Internally, a JS array can have one of several types, such as:

  • PACKED_SMI_ELEMENTS
  • PACKED_DOUBLE_ELEMENTS
  • PACKED_ELEMENTS
  • HOLEY_SMI_ELEMENTS
  • HOLEY_DOUBLE_ELEMENTS
  • HOLEY_ELEMENTS

PACKED means the array is a contiguous run of values; HOLEY means the array has holes, that is, missing items in the middle. The two terms are mutually exclusive.

SMI means the elements are small integers (32-bit integers), DOUBLE means floating-point numbers, and when nothing is written the array may also mix in strings, functions, and so on. The descriptions in this position are mutually exclusive as well.

So you can read the internal type of an array like this: [PACKED, HOLEY]_[SMI, DOUBLE, '']_ELEMENTS.
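The naming pattern can be spelled out with a few lines of JavaScript. This is only an illustration of how the six names listed above combine from two independent axes; the variable names are made up for the example.

```javascript
// Build every type name from the two independent axes.
const layouts = ['PACKED', 'HOLEY'];
const types = ['SMI', 'DOUBLE', '']; // '' = nothing written (mixed elements)

const kinds = layouts.flatMap((layout) =>
  // Drop the empty type tag so we get PACKED_ELEMENTS, not PACKED__ELEMENTS.
  types.map((type) => [layout, type, 'ELEMENTS'].filter(Boolean).join('_'))
);

console.log(kinds.join('\n'));
```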

The most efficient type PACKED_SMI_ELEMENTS

The simplest case, an empty array, defaults to PACKED_SMI_ELEMENTS:

 const arr = [] // PACKED_SMI_ELEMENTS

PACKED_SMI_ELEMENTS is the best-performing type; it stores contiguous integers. When we insert an integer, V8 automatically grows the array, and the type is still PACKED_SMI_ELEMENTS:

 const arr = [] // PACKED_SMI_ELEMENTS
arr.push(1) // PACKED_SMI_ELEMENTS

Or directly create an array with content, which is also of this type:

 const arr = [1, 2, 3] // PACKED_SMI_ELEMENTS

Automatic downgrade

V8 silently downgrades the type when we do unusual things to an array. For example, suddenly assigning to index 100:

 const arr = [1, 2, 3] // PACKED_SMI_ELEMENTS
arr[100] = 4 // HOLEY_SMI_ELEMENTS

If you suddenly insert a floating-point number, the array is downgraded to DOUBLE:

 const arr = [1, 2, 3] // PACKED_SMI_ELEMENTS
arr.push(4.1) // PACKED_DOUBLE_ELEMENTS

Of course, if you combine the two tricks, you get HOLEY_DOUBLE_ELEMENTS:

 const arr = [1, 2, 3] // PACKED_SMI_ELEMENTS
arr[100] = 4.1 // HOLEY_DOUBLE_ELEMENTS

Go one step further and insert a string or a function, and you reach the lowest type, HOLEY_ELEMENTS:

 const arr = [1, 2, 3] // PACKED_SMI_ELEMENTS
arr[100] = '4' // HOLEY_ELEMENTS

In terms of holes, PACKED performs better than HOLEY; in a benchmark, PACKED comes out about 23% faster.

In terms of type, SMI > DOUBLE > empty type. The reason is that the type determines the size of each item in the array. With DOUBLE, each item may be an SMI or a DOUBLE, while with the empty type the type of each item is completely unknown, so extra overhead is spent working out the size of each item.

Therefore, HOLEY_ELEMENTS is the worst-performing, catch-all type.
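To make the two axes concrete, here is a minimal sketch of a hypothetical helper, guessElementsKind, that reports the most specific type an array's current contents would fit. It is not a V8 API: the function names and the 31-bit small-integer range are assumptions made for illustration. It also only looks at the contents at one moment, so the type V8 actually assigns may be more general than what it reports.

```javascript
// Assumption: treat integers in a 31-bit signed range (excluding -0) as SMI.
function isSmallInteger(v) {
  return Number.isInteger(v) && !Object.is(v, -0) &&
         v >= -(2 ** 30) && v <= 2 ** 30 - 1;
}

// Hypothetical helper: the most specific type the current contents would fit.
function guessElementsKind(arr) {
  let holey = false;
  let type = 'SMI';
  for (let i = 0; i < arr.length; i++) {
    if (!(i in arr)) {        // index was never assigned: a hole
      holey = true;
      continue;
    }
    const v = arr[i];
    if (typeof v !== 'number') {
      type = '';              // strings, functions, etc.: the empty type
    } else if (type === 'SMI' && !isSmallInteger(v)) {
      type = 'DOUBLE';
    }
  }
  return [holey ? 'HOLEY' : 'PACKED', type, 'ELEMENTS'].filter(Boolean).join('_');
}

const withHole = [1, 2, 3];
withHole[100] = '4';

console.log(guessElementsKind([1, 2, 3]));   // PACKED_SMI_ELEMENTS
console.log(guessElementsKind([1, 2, 4.1])); // PACKED_DOUBLE_ELEMENTS
console.log(guessElementsKind(withHole));    // HOLEY_ELEMENTS
```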

Irreversibility of downgrades

The article makes a key point: downgrades are irreversible, as shown in the following figure:

<img width=500 src="https://s1.ax1x.com/2022/05/08/O3nzsf.png">

The rule is simple: PACKED can only turn into the worse HOLEY, and SMI can only turn into the worse DOUBLE or empty type, and neither change can be undone.
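One way to picture this rule is as a pair of ranks that can only increase. The sketch below is a toy model, not V8 code: the downgrade and kindName helpers and the rank tables are invented for illustration.

```javascript
// Toy model (not V8 code): each axis has a rank, and ranks never decrease.
const TYPE_RANK = { SMI: 0, DOUBLE: 1, '': 2 };
const TYPE_NAME = ['SMI', 'DOUBLE', ''];

// Combine the current type with what the latest operation requires.
function downgrade(kind, needed) {
  return {
    holey: kind.holey || needed.holey,
    type: TYPE_NAME[Math.max(TYPE_RANK[kind.type], TYPE_RANK[needed.type])],
  };
}

function kindName({ holey, type }) {
  return [holey ? 'HOLEY' : 'PACKED', type, 'ELEMENTS'].filter(Boolean).join('_');
}

let kind = { holey: false, type: 'SMI' };                  // [1, 2, 3]
kind = downgrade(kind, { holey: false, type: 'DOUBLE' });  // push(4.1)
const afterPush = kindName(kind);
kind = downgrade(kind, { holey: false, type: 'SMI' });     // pop(): only integers remain
const afterPop = kindName(kind);

console.log(afterPush); // PACKED_DOUBLE_ELEMENTS
console.log(afterPop);  // PACKED_DOUBLE_ELEMENTS
```

Because each step takes the maximum rank, removing the value that caused a downgrade never moves the type back.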

Intensive reading

To verify the article's claims, I debugged them with v8-debug.

Debug with v8-debug

First, a word about v8-debug: it is a debugging tool for the V8 engine. To get it, first install jsvu from the command line:

 npm i -g jsvu

Then run jsvu. Follow the prompts to choose your operating system, and in the second step, where you choose the JS engines to install, select v8 and v8-debug:

 jsvu
# choose macos
# choose v8, v8-debug

Then create a JS file, such as test.js, and run it with ~/.jsvu/v8-debug ./test.js to debug it. By default, no debugging information is output. We add flags to output the information we need, such as:

 ~/.jsvu/v8-debug ./test.js --print-ast

This will print the syntax tree of the test.js file.

Use v8-debug to debug the internal implementation of arrays

console.log(arr) is clearly not enough to observe the internal implementation of an array; we need %DebugPrint(arr) to print the array in debug mode. %DebugPrint is a native API provided by V8 and is not recognized in ordinary JS scripts, so we need to add the --allow-natives-syntax flag when running:

 ~/.jsvu/v8-debug ./test.js --allow-natives-syntax

Then, in test.js, use %DebugPrint to print the array we want to debug, for example:

 const arr = []
%DebugPrint(arr)

The output is:

 DebugPrint: 0x120d000ca0b9: [JSArray]
 - map: 0x120d00283a71 <Map(PACKED_SMI_ELEMENTS)> [FastProperties]

That is, arr = [] creates an array with an internal type of PACKED_SMI_ELEMENTS, as expected.

Verify irreversible transformation

Without reading the source code, let's take the article's claim that type transitions are irreversible on trust for now and test it:

 const arr = [1, 2, 3]
arr.push(4.1)

console.log(arr);
%DebugPrint(arr)

arr.pop()

console.log(arr);
%DebugPrint(arr)

The key part of the output is:

 1,2,3,4.1
DebugPrint: 0xf91000ca195: [JSArray]
 - map: 0x0f9100283b11 <Map(PACKED_DOUBLE_ELEMENTS)> [FastProperties]

1,2,3
DebugPrint: 0xf91000ca195: [JSArray]
 - map: 0x0f9100283b11 <Map(PACKED_DOUBLE_ELEMENTS)> [FastProperties]

As you can see, even after pop returns the array to all integers, DOUBLE is not optimized back to SMI.

Next, a test that changes the length:

 const arr = [1, 2, 3]
arr[4] = 4

console.log(arr);
%DebugPrint(arr)

arr.pop()
arr.pop()

console.log(arr);
%DebugPrint(arr)

The key part of the output is:

 1,2,3,,4
DebugPrint: 0x338b000ca175: [JSArray]
 - map: 0x338b00283ae9 <Map(HOLEY_SMI_ELEMENTS)> [FastProperties]

1,2,3
DebugPrint: 0x338b000ca175: [JSArray]
 - map: 0x338b00283ae9 <Map(HOLEY_SMI_ELEMENTS)> [FastProperties]

This confirms that the change from PACKED to HOLEY is irreversible as well.

Dictionary mode

Another internal implementation of arrays is Dictionary Elements, which uses a HashTable as the underlying structure to simulate array operations.

This mode is used when the array is very long. Instead of allocating one contiguous block of memory, it stores data in scattered memory addressed through a HashTable. This saves storage space when the amount of data is large, but adds lookup overhead.

When an assignment targets an index far beyond the current array size, V8 considers converting the array to Dictionary Elements to save storage space.
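The space trade-off can be sketched with an ordinary Map. This is a toy model of the idea only, not how V8's dictionary is implemented; the SparseElements class and its members are invented for the example.

```javascript
// Toy model of dictionary-style storage: only assigned indices take up an entry.
class SparseElements {
  constructor() {
    this.table = new Map(); // index -> value, found by hashing instead of by offset
    this.length = 0;
  }
  set(index, value) {
    this.table.set(index, value);
    this.length = Math.max(this.length, index + 1);
  }
  get(index) {
    return this.table.get(index); // undefined for a hole
  }
  get storedEntries() {
    return this.table.size;
  }
}

const sparse = new SparseElements();
[1, 2, 3].forEach((value, index) => sparse.set(index, value));
sparse.set(3000, 4);

console.log(sparse.length);        // 3001
console.log(sparse.storedEntries); // 4
```

A contiguous backing store would need room for all 3001 positions, while the table holds four entries, at the price of a hash lookup on every access.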

Do a test:

 const arr = [1, 2, 3];
%DebugPrint(arr);

arr[3000] = 4;
%DebugPrint(arr);

The main output is:

 DebugPrint: 0x209d000ca115: [JSArray]
 - map: 0x209d00283a71 <Map(PACKED_SMI_ELEMENTS)> [FastProperties]

DebugPrint: 0x209d000ca115: [JSArray]
 - map: 0x209d00287d29 <Map(DICTIONARY_ELEMENTS)> [FastProperties]

As you can see, taking up too much space causes the internal implementation of the array to switch to DICTIONARY_ELEMENTS mode.

In fact, the two modes convert into each other according to fixed rules, which can be found in the V8 source code:

Dictionary mode is called SlowElements in the V8 code, and everything else is called FastElements, so to see the conversion rules we mainly look at two functions: ShouldConvertToSlowElements and ShouldConvertToFastElements.

The following is the ShouldConvertToSlowElements code, that is, when to convert to dictionary mode:

 static inline bool ShouldConvertToSlowElements(
  uint32_t used_elements,
  uint32_t new_capacity
) {
  uint32_t size_threshold = NumberDictionary::kPreferFastElementsSizeFactor *
                            NumberDictionary::ComputeCapacity(used_elements) *
                            NumberDictionary::kEntrySize;
  return size_threshold <= new_capacity;
}

static inline bool ShouldConvertToSlowElements(
  JSObject object,
  uint32_t capacity,
  uint32_t index,
  uint32_t* new_capacity
) {
  STATIC_ASSERT(JSObject::kMaxUncheckedOldFastElementsLength <=
                JSObject::kMaxUncheckedFastElementsLength);
  if (index < capacity) {
    *new_capacity = capacity;
    return false;
  }
  if (index - capacity >= JSObject::kMaxGap) return true;
  *new_capacity = JSObject::NewElementsCapacity(index + 1);
  DCHECK_LT(index, *new_capacity);
  if (*new_capacity <= JSObject::kMaxUncheckedOldFastElementsLength ||
      (*new_capacity <= JSObject::kMaxUncheckedFastElementsLength &&
       ObjectInYoungGeneration(object))) {
    return false;
  }
  return ShouldConvertToSlowElements(object.GetFastElementsUsage(),
                                     *new_capacity);
}

ShouldConvertToSlowElements has two overloads, so there are two decision rules. In the first, the array switches to dictionary mode when new_capacity >= size_threshold, where new_capacity is the new size and size_threshold is kPreferFastElementsSizeFactor * ComputeCapacity(used_elements) * kEntrySize, roughly 3 * existing size * 2.

In the second, the array switches to dictionary mode when index - capacity >= JSObject::kMaxGap, where kMaxGap is the constant 1024. That is, if the assignment would add 1024 or more new holes, the array is converted to dictionary mode.
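The gap rule is easy to try out in isolation. The sketch below mirrors only the first two checks of the second overload; the function name is hypothetical, and the capacity of 3 used for [1, 2, 3] is an assumption about the backing store. A false result only means the gap check did not trigger, since the remaining checks in the C++ code can still choose dictionary mode.

```javascript
const K_MAX_GAP = 1024; // JSObject::kMaxGap, as quoted in the text

// Hypothetical helper: does the gap check alone force dictionary mode?
function gapForcesDictionary(index, capacity) {
  if (index < capacity) return false;    // the write fits in the existing store
  return index - capacity >= K_MAX_GAP;  // too many new holes at once
}

// Assumption: the backing store of [1, 2, 3] has capacity 3.
console.log(gapForcesDictionary(3000, 3)); // true
console.log(gapForcesDictionary(100, 3));  // false
```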

The function that decides when to convert from dictionary mode back to normal mode is ShouldConvertToFastElements:

 static bool ShouldConvertToFastElements(
  JSObject object,
  NumberDictionary dictionary,
  uint32_t index,
  uint32_t* new_capacity
) {
  // If properties with non-standard attributes or accessors were added, we
  // cannot go back to fast elements.
  if (dictionary.requires_slow_elements()) return false;

  // Adding a property with this index will require slow elements.
  if (index >= static_cast<uint32_t>(Smi::kMaxValue)) return false;

  if (object.IsJSArray()) {
    Object length = JSArray::cast(object).length();
    if (!length.IsSmi()) return false;
    *new_capacity = static_cast<uint32_t>(Smi::ToInt(length));
  } else if (object.IsJSArgumentsObject()) {
    return false;
  } else {
    *new_capacity = dictionary.max_number_key() + 1;
  }
  *new_capacity = std::max(index + 1, *new_capacity);

  uint32_t dictionary_size = static_cast<uint32_t>(dictionary.Capacity()) *
                             NumberDictionary::kEntrySize;

  // Turn fast if the dictionary only saves 50% space.
  return 2 * dictionary_size >= *new_capacity;
}

The key is the last line, return 2 * dictionary_size >= *new_capacity: when dictionary mode saves only 50% of the space, it is better to switch back to normal mode (fast mode).
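The final comparison can be written out as a small function. This is a sketch of that one line only, with a hypothetical function name; the entry size is passed in as a parameter because its value is not shown in the excerpt, and the numbers in the usage lines are made-up inputs rather than measurements.

```javascript
// Hypothetical helper mirroring: return 2 * dictionary_size >= *new_capacity;
function dictionarySavesTooLittle(dictionaryCapacity, entrySize, newCapacity) {
  const dictionarySize = dictionaryCapacity * entrySize;
  return 2 * dictionarySize >= newCapacity; // true: go back to fast mode
}

// Made-up inputs: a 16-slot dictionary with 3 fields per entry (an assumption).
console.log(dictionarySavesTooLittle(16, 3, 3001)); // false
console.log(dictionarySavesTooLittle(16, 3, 50));   // true
```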

I won't test this here. Interested readers can test it with v8-debug using the method described above.

Summary

JS arrays are very flexible to use, but because V8 is implemented in C++, they must be mapped onto lower-level types. For the sake of performance, V8 therefore has a fast mode and a slow mode, and the fast mode is further divided along two axes, SMI or DOUBLE and PACKED or HOLEY, each handled separately to be as fast as possible.

That is to say, when we create an array freely, V8 analyzes the array's element composition and length changes, and automatically dispatches it to the appropriate sub-mode to maximize performance.

This design gives JS developers a better development experience, and in practice the execution performance is almost the same as that of natively optimized C++. From this point of view, JS is a language with a higher level of encapsulation, which greatly lowers the learning threshold for developers.

Of course, JS also provides relatively low-level facilities such as ArrayBuffer, or WASM, that let developers work with lower-level features directly. These allow more precise control over performance but bring higher learning and maintenance costs, so developers need to weigh the trade-off according to the actual situation.

The discussion address is: Intensive Reading "The Internal Implementation of JS Arrays" Issue #414 dt-fe/weekly

If you'd like to join the discussion, click here. There are new topics every week, released on the weekend or on Monday. Front-end Intensive Reading: helping you filter reliable content.

Follow Front-end Intensive Reading WeChat Official Account

<img width=200 src="https://img.alicdn.com/tfs/TB165W0MCzqK1RjSZFLXXcn2XXa-258-258.jpg">

Copyright notice: Free to reprint - non-commercial - non-derivative - keep attribution ( Creative Commons 3.0 license )

黄子毅