The g flag in JavaScript regular expressions

Origin

One day I saw a question in the SegmentFault (思否) community, which went roughly as follows:

const list = ['a', 'b', '-', 'c', 'd'];
const reg = /[a-z]/g;
const letters = list.filter(i => reg.test(i));

// letters === ['a', 'c'];
// Without the `g` flag, the regex returns all the letters.
// Why does it stop working once `g` is added?

For this problem, the g flag is not needed at all.

But as far as my (admittedly superficial) understanding of regular expressions went, g only means a global search that keeps going until nothing more matches, so whether it is present should make no difference here. That aroused my curiosity.

The problem above can be simplified to the following:

const reg = /[a-z]/g;
reg.test('a'); // => true
reg.test('a'); // => false
reg.test('a'); // => true
reg.test('a'); // => false
reg.test('a'); // => true

Solving the puzzle

First of all, what can be determined is that this behavior must be caused by g.

Searching

I opened MDN and carefully checked what the g flag does; the description matched my understanding.

My guess was that g turns on some kind of cache, and since reg is declared outside the filter callback and is therefore shared across all iterations, I changed the code to:

const list = ['a', 'b', '-', 'c', 'd'];
const letters = list.filter(i => /[a-z]/g.test(i));

// letters === ['a', 'b', 'c', 'd'];

Creating the regular expression anew on each iteration gives the correct result, which supports my conjecture. It also tells me that the cache lives somewhere on the regular expression object.
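The difference between sharing one regex object and creating one per call can be sketched as follows. The helper name makeRegex is hypothetical and only there for illustration; the sketch relies on the standard behavior that every evaluation of a regex literal produces a new RegExp object.

```javascript
// Hypothetical helper: each call evaluates the literal again
// and returns a new RegExp object with its own state.
function makeRegex() {
  return /[a-z]/g;
}

// One shared object, tested twice against the same input.
const shared = makeRegex();
const sharedResults = [shared.test('a'), shared.test('a')];

// A new object for every test.
const freshResults = [makeRegex().test('a'), makeRegex().test('a')];

console.log(sharedResults); // [ true, false ]
console.log(freshResults);  // [ true, true ]
console.log(makeRegex() === makeRegex()); // false
```

Whatever state g introduces, it clearly travels with the individual regex object rather than with the pattern text.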

Next, I looked at the corresponding source code to find the cause of the problem.

Source code level

Since I have been looking at Rust recently, I read the source of Boa, a JavaScript engine written in Rust: https://github.com/boa-dev/boa

After opening the project, press . to open GitHub's web-based VS Code editor, then press Command+P and search for the keyword regexp.

Open the test.rs file and press Command+F to search for /g; you will find a last_index() test on line 90:

#[test]
fn last_index() {
    let mut context = Context::default();
    let init = r#"
        var regex = /[0-9]+(\.[0-9]+)?/g;
        "#;
    // forward: runs the given source against the context (which may change it) and returns the result as a string.
    eprintln!("{}", forward(&mut context, init));
    assert_eq!(forward(&mut context, "regex.lastIndex"), "0");
    assert_eq!(forward(&mut context, "regex.test('1.0foo')"), "true");
    assert_eq!(forward(&mut context, "regex.lastIndex"), "3");
    assert_eq!(forward(&mut context, "regex.test('1.0foo')"), "false");
    assert_eq!(forward(&mut context, "regex.lastIndex"), "0");
}

Seeing the lastIndex property, I could roughly guess the cause of the problem: with the g flag, the regex keeps the index where the last match ended, and that is what causes the behavior.
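The same sequence as the Rust test can be reproduced directly in JavaScript. This is a minimal sketch that reuses the pattern and input from the test; the observed array is just a hypothetical way of collecting the values.

```javascript
const regex = /[0-9]+(\.[0-9]+)?/g;
const observed = [];

observed.push(regex.lastIndex);      // before any match
observed.push(regex.test('1.0foo')); // matches "1.0"
observed.push(regex.lastIndex);      // end of the match
observed.push(regex.test('1.0foo')); // search resumes at index 3 and fails
observed.push(regex.lastIndex);      // reset after the failure

console.log(observed); // [ 0, true, 3, false, 0 ]
```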

Next, open the mod.rs file and search for test.

The fn test() method is on line 631:

pub(crate) fn test(
    this: &JsValue,
    args: &[JsValue],
    context: &mut Context,
) -> JsResult<JsValue> {
    // 1. Let R be the this value.
    // 2. If Type(R) is not Object, throw a TypeError exception.
    let this = this.as_object().ok_or_else(|| {
        context
            .construct_type_error("RegExp.prototype.test method called on incompatible value")
    })?;

    // 3. Let string be ? ToString(S).
    let arg_str = args
        .get(0)
        .cloned()
        .unwrap_or_default()
        .to_string(context)?;

    // 4. Let match be ? RegExpExec(R, string).
    let m = Self::abstract_exec(this, arg_str, context)?;

    // 5. If match is not null, return true; else return false.
    if m.is_some() {
        Ok(JsValue::new(true))
    } else {
        Ok(JsValue::new(false))
    }
}

The test() method calls Self::abstract_exec():

pub(crate) fn abstract_exec(
    this: &JsObject,
    input: JsString,
    context: &mut Context,
) -> JsResult<Option<JsObject>> {
    // 1. Assert: Type(R) is Object.
    // 2. Assert: Type(S) is String.

    // 3. Let exec be ? Get(R, "exec").
    let exec = this.get("exec", context)?;

    // 4. If IsCallable(exec) is true, then
    if let Some(exec) = exec.as_callable() {
        // a. Let result be ? Call(exec, R, « S »).
        let result = exec.call(&this.clone().into(), &[input.into()], context)?;

        // b. If Type(result) is neither Object nor Null, throw a TypeError exception.
        if !result.is_object() && !result.is_null() {
            return context.throw_type_error("regexp exec returned neither object nor null");
        }

        // c. Return result.
        return Ok(result.as_object().cloned());
    }

    // 5. Perform ? RequireInternalSlot(R, [[RegExpMatcher]]).
    if !this.is_regexp() {
        return context.throw_type_error("RegExpExec called with invalid value");
    }

    // 6. Return ? RegExpBuiltinExec(R, S).
    Self::abstract_builtin_exec(this, &input, context)
}
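Steps 3 and 4 above mean that test() goes through whatever exec function the regex object currently exposes. A small sketch of that delegation follows; the stub functions assigned to exec are made up for illustration.

```javascript
const re = /[a-z]/;

// Replace exec on this one instance with a stub that never matches.
re.exec = () => null;
const delegated = re.test('a'); // test() calls the stub, so no match is reported

// Step 4.b: a result that is neither an object nor null is rejected.
re.exec = () => 42;
let errorName = null;
try {
  re.test('a');
} catch (err) {
  errorName = err.name;
}

console.log(delegated, errorName); // false TypeError
```

Only when exec is not callable does the engine fall through to the built-in matcher in step 6.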

Self::abstract_exec() in turn calls Self::abstract_builtin_exec():

pub(crate) fn abstract_builtin_exec(
    this: &JsObject,
    input: &JsString,
    context: &mut Context,
) -> JsResult<Option<JsObject>> {
    // 1. Assert: R is an initialized RegExp instance.
    let rx = {
        let obj = this.borrow();
        if let Some(rx) = obj.as_regexp() {
            rx.clone()
        } else {
            return context.throw_type_error("RegExpBuiltinExec called with invalid value");
        }
    };

    // 2. Assert: Type(S) is String.

    // 3. Let length be the number of code units in S.
    let length = input.encode_utf16().count();

    // 4. Let lastIndex be ℝ(? ToLength(? Get(R, "lastIndex"))).
    let mut last_index = this.get("lastIndex", context)?.to_length(context)?;

    // 5. Let flags be R.[[OriginalFlags]].
    let flags = &rx.original_flags;

    // 6. If flags contains "g", let global be true; else let global be false.
    let global = flags.contains('g');

    // 7. If flags contains "y", let sticky be true; else let sticky be false.
    let sticky = flags.contains('y');

    // 8. If global is false and sticky is false, set lastIndex to 0.
    if !global && !sticky {
        last_index = 0;
    }

    // 9. Let matcher be R.[[RegExpMatcher]].
    let matcher = &rx.matcher;

    // 10. If flags contains "u", let fullUnicode be true; else let fullUnicode be false.
    let unicode = flags.contains('u');

    // 11. Let matchSucceeded be false.
    // 12. Repeat, while matchSucceeded is false,
    let match_value = loop {
        // a. If lastIndex > length, then
        if last_index > length {
            // i. If global is true or sticky is true, then
            if global || sticky {
                // 1. Perform ? Set(R, "lastIndex", +0𝔽, true).
                this.set("lastIndex", 0, true, context)?;
            }

            // ii. Return null.
            return Ok(None);
        }

        // b. Let r be matcher(S, lastIndex).
        // Check if last_index is a valid utf8 index into input.
        let last_byte_index = match String::from_utf16(
            &input.encode_utf16().take(last_index).collect::<Vec<u16>>(),
        ) {
            Ok(s) => s.len(),
            Err(_) => {
                return context
                    .throw_type_error("Failed to get byte index from utf16 encoded string")
            }
        };
        let r = matcher.find_from(input, last_byte_index).next();

        match r {
            // c. If r is failure, then
            None => {
                // i. If sticky is true, then
                if sticky {
                    // 1. Perform ? Set(R, "lastIndex", +0𝔽, true).
                    this.set("lastIndex", 0, true, context)?;

                    // 2. Return null.
                    return Ok(None);
                }

                // ii. Set lastIndex to AdvanceStringIndex(S, lastIndex, fullUnicode).
                last_index = advance_string_index(input, last_index, unicode);
            }

            Some(m) => {
                // c. If r is failure, then
                #[allow(clippy::if_not_else)]
                if m.start() != last_index {
                    // i. If sticky is true, then
                    if sticky {
                        // 1. Perform ? Set(R, "lastIndex", +0𝔽, true).
                        this.set("lastIndex", 0, true, context)?;

                        // 2. Return null.
                        return Ok(None);
                    }

                    // ii. Set lastIndex to AdvanceStringIndex(S, lastIndex, fullUnicode).
                    last_index = advance_string_index(input, last_index, unicode);
                // d. Else,
                } else {
                    //i. Assert: r is a State.
                    //ii. Set matchSucceeded to true.
                    break m;
                }
            }
        }
    };

    // 13. Let e be r's endIndex value.
    let mut e = match_value.end();

    // 14. If fullUnicode is true, then
    if unicode {
        // e is an index into the Input character list, derived from S, matched by matcher.
        // Let eUTF be the smallest index into S that corresponds to the character at element e of Input.
        // If e is greater than or equal to the number of elements in Input, then eUTF is the number of code units in S.
        // b. Set e to eUTF.
        e = input.split_at(e).0.encode_utf16().count();
    }

    // 15. If global is true or sticky is true, then
    if global || sticky {
        // a. Perform ? Set(R, "lastIndex", 𝔽(e), true).
        this.set("lastIndex", e, true, context)?;
    }

    // 16. Let n be the number of elements in r's captures List. (This is the same value as 22.2.2.1's NcapturingParens.)
    let n = match_value.captures.len();
    // 17. Assert: n < 23^2 - 1.
    debug_assert!(n < 23usize.pow(2) - 1);

    // 18. Let A be ! ArrayCreate(n + 1).
    // 19. Assert: The mathematical value of A's "length" property is n + 1.
    let a = Array::array_create(n + 1, None, context)?;

    // 20. Perform ! CreateDataPropertyOrThrow(A, "index", 𝔽(lastIndex)).
    a.create_data_property_or_throw("index", match_value.start(), context)
        .expect("this CreateDataPropertyOrThrow call must not fail");

    // 21. Perform ! CreateDataPropertyOrThrow(A, "input", S).
    a.create_data_property_or_throw("input", input.clone(), context)
        .expect("this CreateDataPropertyOrThrow call must not fail");

    // 22. Let matchedSubstr be the substring of S from lastIndex to e.
    let matched_substr = if let Some(s) = input.get(match_value.range()) {
        s
    } else {
        ""
    };

    // 23. Perform ! CreateDataPropertyOrThrow(A, "0", matchedSubstr).
    a.create_data_property_or_throw(0, matched_substr, context)
        .expect("this CreateDataPropertyOrThrow call must not fail");

    // 24. If R contains any GroupName, then
    // 25. Else,
    let named_groups = match_value.named_groups();
    let groups = if named_groups.clone().count() > 0 {
        // a. Let groups be ! OrdinaryObjectCreate(null).
        let groups = JsValue::from(JsObject::empty());

        // Perform 27.f here
        // f. If the ith capture of R was defined with a GroupName, then
        // i. Let s be the CapturingGroupName of the corresponding RegExpIdentifierName.
        // ii. Perform ! CreateDataPropertyOrThrow(groups, s, capturedValue).
        for (name, range) in named_groups {
            if let Some(range) = range {
                let value = if let Some(s) = input.get(range.clone()) {
                    s
                } else {
                    ""
                };

                groups
                    .to_object(context)?
                    .create_data_property_or_throw(name, value, context)
                    .expect("this CreateDataPropertyOrThrow call must not fail");
            }
        }
        groups
    } else {
        // a. Let groups be undefined.
        JsValue::undefined()
    };

    // 26. Perform ! CreateDataPropertyOrThrow(A, "groups", groups).
    a.create_data_property_or_throw("groups", groups, context)
        .expect("this CreateDataPropertyOrThrow call must not fail");

    // 27. For each integer i such that i ≥ 1 and i ≤ n, in ascending order, do
    for i in 1..=n {
        // a. Let captureI be ith element of r's captures List.
        let capture = match_value.group(i);

        let captured_value = match capture {
            // b. If captureI is undefined, let capturedValue be undefined.
            None => JsValue::undefined(),
            // c. Else if fullUnicode is true, then
            // d. Else,
            Some(range) => {
                if let Some(s) = input.get(range) {
                    s.into()
                } else {
                    "".into()
                }
            }
        };

        // e. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(i)), capturedValue).
        a.create_data_property_or_throw(i, captured_value, context)
            .expect("this CreateDataPropertyOrThrow call must not fail");
    }

    // 28. Return A.
    Ok(Some(a))
}

Both global and last_index appear in Self::abstract_builtin_exec(), so this looks like the place where the work is finally done. Let's read the code in this method carefully (it is written in great detail, with a comment for each step).

In step 12:

  1. If lastIndex exceeds the length of the text, lastIndex is reset to 0 (when global or sticky is set) and null is returned
  2. Otherwise, try to get the matched value ( match_value )

    1. If there is no match at the current position, lastIndex is set to the return value of advance_string_index() and the loop repeats
    2. The details of advance_string_index() are not considered for the current question; see https://tc39.es/ecma262/#sec-advancestringindex

Step 13: get the endIndex of the matched value

Step 15: if global or sticky is set, set lastIndex to endIndex
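The lastIndex bookkeeping in steps 12 to 15 can be modelled in a few lines of JavaScript. This is a simplified sketch, not Boa's implementation: modelGlobalTest, lowercaseAt and the state object are hypothetical names, the regex is assumed to be global, and the sticky and unicode branches are left out.

```javascript
// Hypothetical model of a global regex: only the lastIndex bookkeeping is kept.
// matchEndAt(input, index) returns the end index of a match that starts
// exactly at `index`, or -1 when there is no match there.
function modelGlobalTest(state, matchEndAt, input) {
  let lastIndex = state.lastIndex;
  while (true) {
    // Step 12.a: past the end of the input, reset lastIndex and fail.
    if (lastIndex > input.length) {
      state.lastIndex = 0;
      return false;
    }
    const end = matchEndAt(input, lastIndex);
    if (end === -1) {
      // Step 12.c.ii, simplified: advance by one code unit (no "u" flag).
      lastIndex += 1;
      continue;
    }
    // Steps 13 and 15: remember where the match ended.
    state.lastIndex = end;
    return true;
  }
}

// Stand-in for /[a-z]/: one lowercase ASCII letter at `index`.
const lowercaseAt = (input, index) =>
  (index < input.length && input[index] >= 'a' && input[index] <= 'z')
    ? index + 1
    : -1;

const state = { lastIndex: 0 };
const list = ['a', 'b', '-', 'c', 'd'];
const modelled = list.filter(item => modelGlobalTest(state, lowercaseAt, item));

console.log(modelled); // [ 'a', 'c' ]
```

The model reproduces the result from the original question because the state object, like the shared regex, outlives each individual call.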

At this point the effect of the g flag is clear. Every regular expression object has a lastIndex property. With g, lastIndex is not reset to 0 after a successful match; it is set to the end of that match, and the next call starts searching from that position, even when it is given a different string.

Conclusion

Applying this to the code from the question:

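const reg = /[a-z]/g; // after declaration, lastIndex is 0
reg.test('a'); // => true; after the first match, lastIndex is 1
reg.test('a'); // => false; the second call starts at lastIndex 1, but the string has only one character, so it returns false and lastIndex is reset to 0
reg.test('a'); // => true; the following calls repeat the logic of the first two
reg.test('a'); // => false;
reg.test('a'); // => true;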
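Given this behavior, the original filter can be fixed in at least two ways. The sketch below uses the list from the question; the variable names withoutGlobal and withReset are hypothetical.

```javascript
const list = ['a', 'b', '-', 'c', 'd'];

// Option 1: drop the g flag, so lastIndex is never read or updated by test().
const withoutGlobal = list.filter(i => /[a-z]/.test(i));

// Option 2: keep the shared g regex but reset lastIndex before every call.
const reg = /[a-z]/g;
const withReset = list.filter(i => {
  reg.lastIndex = 0;
  return reg.test(i);
});

console.log(withoutGlobal); // [ 'a', 'b', 'c', 'd' ]
console.log(withReset);     // [ 'a', 'b', 'c', 'd' ]
```

Option 1 is usually the simpler choice when only a yes/no answer per string is needed.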

伍陆柒