The g flag in JavaScript regular expressions
Origin
One day I saw a question in the SegmentFault (思否) community, which went roughly as follows:
const list = ['a', 'b', '-', 'c', 'd'];
const reg = /[a-z]/g;
const letters = list.filter(i => reg.test(i));
// letters === ['a', 'c'];
// Without the `g` flag, the regex picks out all the letters.
// Why does it stop working once `g` is added?
For this problem, the g flag is not needed at all.
But as far as my (admittedly superficial) understanding of regular expressions goes, the presence of g (which only makes the search global instead of stopping at the first match) should make no difference, and that aroused my curiosity.
The problem above can be reduced to the following:
const reg = /[a-z]/g;
reg.test('a'); // => true
reg.test('a'); // => false
reg.test('a'); // => true
reg.test('a'); // => false
reg.test('a'); // => true
Solving the puzzle
First of all, we can be certain that this behavior is caused by the g flag.
Search engine
I opened MDN and carefully read up on what the g flag does; the description matched my understanding.
My guess was that g enables some kind of cache, and since reg is declared outside the filter callback (it is effectively global to it), I changed the code to:
const list = ['a', 'b', '-', 'c', 'd'];
const letters = list.filter(i => /[a-z]/g.test(i));
// letters === ['a', 'b', 'c', 'd'];
Creating the regular expression anew on every iteration gives the correct result, which supports my conjecture. It also tells me that the cache lives somewhere on the regex object itself.
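A small sketch of why moving the literal into the callback changes the outcome: every evaluation of a regex literal creates a new RegExp object, so whatever state the object carries starts from scratch each time. The helper name makeReg is hypothetical and only serves the illustration.

```javascript
// Hypothetical helper: each call evaluates the literal again.
const makeReg = () => /[a-z]/g;

const first = makeReg();
const second = makeReg();
console.log(first === second); // false: two distinct RegExp objects

const list = ['a', 'b', '-', 'c', 'd'];

// One shared object, reused by every callback invocation.
const shared = /[a-z]/g;
const withShared = list.filter(i => shared.test(i));

// A fresh object for every callback invocation.
const withFresh = list.filter(i => makeReg().test(i));

console.log(withShared); // [ 'a', 'c' ]
console.log(withFresh);  // [ 'a', 'b', 'c', 'd' ]
```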
Next, I look at the corresponding source code to find the cause of the problem.
Source code
Since I have been learning Rust recently, I read the source of boa, a JavaScript engine written in Rust: https://github/boa-dev/boa
After opening the project page, press . to enter the web-based VS Code editor, then press command+p and search for the keyword regexp.
Open the test.rs file and press command+f to search for /g; you will find a last_index() test on line 90:
#[test]
fn last_index() {
    let mut context = Context::default();
    let init = r#"
        var regex = /[0-9]+(\.[0-9]+)?/g;
        "#;
    // forward(): mutates context and returns the result as a string.
    eprintln!("{}", forward(&mut context, init));
    assert_eq!(forward(&mut context, "regex.lastIndex"), "0");
    assert_eq!(forward(&mut context, "regex.test('1.0foo')"), "true");
    assert_eq!(forward(&mut context, "regex.lastIndex"), "3");
    assert_eq!(forward(&mut context, "regex.test('1.0foo')"), "false");
    assert_eq!(forward(&mut context, "regex.lastIndex"), "0");
}
Seeing the lastIndex property, I could roughly guess the cause of the problem: with the g flag, the regex remembers the index at which the previous match ended, and that is what trips up the next call.
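The same sequence as the boa test can be replayed directly in JavaScript. This is only a sketch for checking the behavior in a console; the trace array is a hypothetical helper for collecting the observed values.

```javascript
const regex = /[0-9]+(\.[0-9]+)?/g;
const trace = [];

trace.push(regex.lastIndex);      // 0: nothing has been matched yet
trace.push(regex.test('1.0foo')); // true: matches "1.0" at indices 0..3
trace.push(regex.lastIndex);      // 3: the index right after the match
trace.push(regex.test('1.0foo')); // false: the search starts at index 3 ("foo")
trace.push(regex.lastIndex);      // 0: reset after the failed search

console.log(trace); // [ 0, true, 3, false, 0 ]
```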
Next, open the mod.rs file and search for test.
The fn test() method is on line 631:
pub(crate) fn test(
    this: &JsValue,
    args: &[JsValue],
    context: &mut Context,
) -> JsResult<JsValue> {
    // 1. Let R be the this value.
    // 2. If Type(R) is not Object, throw a TypeError exception.
    let this = this.as_object().ok_or_else(|| {
        context
            .construct_type_error("RegExp.prototype.test method called on incompatible value")
    })?;
    // 3. Let string be ? ToString(S).
    let arg_str = args
        .get(0)
        .cloned()
        .unwrap_or_default()
        .to_string(context)?;
    // 4. Let match be ? RegExpExec(R, string).
    let m = Self::abstract_exec(this, arg_str, context)?;
    // 5. If match is not null, return true; else return false.
    if m.is_some() {
        Ok(JsValue::new(true))
    } else {
        Ok(JsValue::new(false))
    }
}
Inside test(), we find a call to the Self::abstract_exec() method:
pub(crate) fn abstract_exec(
    this: &JsObject,
    input: JsString,
    context: &mut Context,
) -> JsResult<Option<JsObject>> {
    // 1. Assert: Type(R) is Object.
    // 2. Assert: Type(S) is String.
    // 3. Let exec be ? Get(R, "exec").
    let exec = this.get("exec", context)?;
    // 4. If IsCallable(exec) is true, then
    if let Some(exec) = exec.as_callable() {
        // a. Let result be ? Call(exec, R, « S »).
        let result = exec.call(&this.clone().into(), &[input.into()], context)?;
        // b. If Type(result) is neither Object nor Null, throw a TypeError exception.
        if !result.is_object() && !result.is_null() {
            return context.throw_type_error("regexp exec returned neither object nor null");
        }
        // c. Return result.
        return Ok(result.as_object().cloned());
    }
    // 5. Perform ? RequireInternalSlot(R, [[RegExpMatcher]]).
    if !this.is_regexp() {
        return context.throw_type_error("RegExpExec called with invalid value");
    }
    // 6. Return ? RegExpBuiltinExec(R, S).
    Self::abstract_builtin_exec(this, &input, context)
}
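Steps 3 and 4 mean that test() looks up an exec property on the regex and calls it, rather than going straight to the built-in matcher. A minimal sketch of how that can be observed from JavaScript, assuming an engine that follows the specification here; the calls counter is a hypothetical helper.

```javascript
const re = /[a-z]/;
let calls = 0;

// Shadow the inherited exec with an own property on this one instance.
re.exec = function (input) {
  calls += 1;
  return null; // step 4.b: the result has to be an object or null
};

const overridden = re.test('a');
console.log(overridden, calls); // false 1

// Remove the own property so the inherited exec is used again.
delete re.exec;
const restored = re.test('a');
console.log(restored); // true
```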
Inside Self::abstract_exec(), we in turn find a call to the Self::abstract_builtin_exec() method:
pub(crate) fn abstract_builtin_exec(
    this: &JsObject,
    input: &JsString,
    context: &mut Context,
) -> JsResult<Option<JsObject>> {
    // 1. Assert: R is an initialized RegExp instance.
    let rx = {
        let obj = this.borrow();
        if let Some(rx) = obj.as_regexp() {
            rx.clone()
        } else {
            return context.throw_type_error("RegExpBuiltinExec called with invalid value");
        }
    };
    // 2. Assert: Type(S) is String.
    // 3. Let length be the number of code units in S.
    let length = input.encode_utf16().count();
    // 4. Let lastIndex be ℝ(? ToLength(? Get(R, "lastIndex"))).
    let mut last_index = this.get("lastIndex", context)?.to_length(context)?;
    // 5. Let flags be R.[[OriginalFlags]].
    let flags = &rx.original_flags;
    // 6. If flags contains "g", let global be true; else let global be false.
    let global = flags.contains('g');
    // 7. If flags contains "y", let sticky be true; else let sticky be false.
    let sticky = flags.contains('y');
    // 8. If global is false and sticky is false, set lastIndex to 0.
    if !global && !sticky {
        last_index = 0;
    }
    // 9. Let matcher be R.[[RegExpMatcher]].
    let matcher = &rx.matcher;
    // 10. If flags contains "u", let fullUnicode be true; else let fullUnicode be false.
    let unicode = flags.contains('u');
    // 11. Let matchSucceeded be false.
    // 12. Repeat, while matchSucceeded is false,
    let match_value = loop {
        // a. If lastIndex > length, then
        if last_index > length {
            // i. If global is true or sticky is true, then
            if global || sticky {
                // 1. Perform ? Set(R, "lastIndex", +0𝔽, true).
                this.set("lastIndex", 0, true, context)?;
            }
            // ii. Return null.
            return Ok(None);
        }
        // b. Let r be matcher(S, lastIndex).
        // Check if last_index is a valid utf8 index into input.
        let last_byte_index = match String::from_utf16(
            &input.encode_utf16().take(last_index).collect::<Vec<u16>>(),
        ) {
            Ok(s) => s.len(),
            Err(_) => {
                return context
                    .throw_type_error("Failed to get byte index from utf16 encoded string")
            }
        };
        let r = matcher.find_from(input, last_byte_index).next();
        match r {
            // c. If r is failure, then
            None => {
                // i. If sticky is true, then
                if sticky {
                    // 1. Perform ? Set(R, "lastIndex", +0𝔽, true).
                    this.set("lastIndex", 0, true, context)?;
                    // 2. Return null.
                    return Ok(None);
                }
                // ii. Set lastIndex to AdvanceStringIndex(S, lastIndex, fullUnicode).
                last_index = advance_string_index(input, last_index, unicode);
            }
            Some(m) => {
                // c. If r is failure, then
                #[allow(clippy::if_not_else)]
                if m.start() != last_index {
                    // i. If sticky is true, then
                    if sticky {
                        // 1. Perform ? Set(R, "lastIndex", +0𝔽, true).
                        this.set("lastIndex", 0, true, context)?;
                        // 2. Return null.
                        return Ok(None);
                    }
                    // ii. Set lastIndex to AdvanceStringIndex(S, lastIndex, fullUnicode).
                    last_index = advance_string_index(input, last_index, unicode);
                // d. Else,
                } else {
                    //i. Assert: r is a State.
                    //ii. Set matchSucceeded to true.
                    break m;
                }
            }
        }
    };
    // 13. Let e be r's endIndex value.
    let mut e = match_value.end();
    // 14. If fullUnicode is true, then
    if unicode {
        // e is an index into the Input character list, derived from S, matched by matcher.
        // Let eUTF be the smallest index into S that corresponds to the character at element e of Input.
        // If e is greater than or equal to the number of elements in Input, then eUTF is the number of code units in S.
        // b. Set e to eUTF.
        e = input.split_at(e).0.encode_utf16().count();
    }
    // 15. If global is true or sticky is true, then
    if global || sticky {
        // a. Perform ? Set(R, "lastIndex", 𝔽(e), true).
        this.set("lastIndex", e, true, context)?;
    }
    // 16. Let n be the number of elements in r's captures List. (This is the same value as 22.2.2.1's NcapturingParens.)
    let n = match_value.captures.len();
    // 17. Assert: n < 23^2 - 1.
    debug_assert!(n < 23usize.pow(2) - 1);
    // 18. Let A be ! ArrayCreate(n + 1).
    // 19. Assert: The mathematical value of A's "length" property is n + 1.
    let a = Array::array_create(n + 1, None, context)?;
    // 20. Perform ! CreateDataPropertyOrThrow(A, "index", 𝔽(lastIndex)).
    a.create_data_property_or_throw("index", match_value.start(), context)
        .expect("this CreateDataPropertyOrThrow call must not fail");
    // 21. Perform ! CreateDataPropertyOrThrow(A, "input", S).
    a.create_data_property_or_throw("input", input.clone(), context)
        .expect("this CreateDataPropertyOrThrow call must not fail");
    // 22. Let matchedSubstr be the substring of S from lastIndex to e.
    let matched_substr = if let Some(s) = input.get(match_value.range()) {
        s
    } else {
        ""
    };
    // 23. Perform ! CreateDataPropertyOrThrow(A, "0", matchedSubstr).
    a.create_data_property_or_throw(0, matched_substr, context)
        .expect("this CreateDataPropertyOrThrow call must not fail");
    // 24. If R contains any GroupName, then
    // 25. Else,
    let named_groups = match_value.named_groups();
    let groups = if named_groups.clone().count() > 0 {
        // a. Let groups be ! OrdinaryObjectCreate(null).
        let groups = JsValue::from(JsObject::empty());
        // Perform 27.f here
        // f. If the ith capture of R was defined with a GroupName, then
        // i. Let s be the CapturingGroupName of the corresponding RegExpIdentifierName.
        // ii. Perform ! CreateDataPropertyOrThrow(groups, s, capturedValue).
        for (name, range) in named_groups {
            if let Some(range) = range {
                let value = if let Some(s) = input.get(range.clone()) {
                    s
                } else {
                    ""
                };
                groups
                    .to_object(context)?
                    .create_data_property_or_throw(name, value, context)
                    .expect("this CreateDataPropertyOrThrow call must not fail");
            }
        }
        groups
    } else {
        // a. Let groups be undefined.
        JsValue::undefined()
    };
    // 26. Perform ! CreateDataPropertyOrThrow(A, "groups", groups).
    a.create_data_property_or_throw("groups", groups, context)
        .expect("this CreateDataPropertyOrThrow call must not fail");
    // 27. For each integer i such that i ≥ 1 and i ≤ n, in ascending order, do
    for i in 1..=n {
        // a. Let captureI be ith element of r's captures List.
        let capture = match_value.group(i);
        let captured_value = match capture {
            // b. If captureI is undefined, let capturedValue be undefined.
            None => JsValue::undefined(),
            // c. Else if fullUnicode is true, then
            // d. Else,
            Some(range) => {
                if let Some(s) = input.get(range) {
                    s.into()
                } else {
                    "".into()
                }
            }
        };
        // e. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(i)), capturedValue).
        a.create_data_property_or_throw(i, captured_value, context)
            .expect("this CreateDataPropertyOrThrow call must not fail");
    }
    // 28. Return A.
    Ok(Some(a))
}
Self::abstract_builtin_exec() contains both global and last_index, so this appears to be where the match is finally executed. Read the code in this method carefully (it is written in great detail, with a comment for every step).
In step 12:
- If lastIndex exceeds the length of the text, the method returns null, and lastIndex is set to 0 when global is set.
- Get the matched value (match_value).
  - If there is no match at the current position, the local last_index is set to the return value of advance_string_index() and the loop tries again.
  - advance_string_index() is not relevant to the current question; see https://tc39.es/ecma262/#sec-advancestringindex
Step 13: get the endIndex of the matched value.
Step 15: set lastIndex to endIndex.
At this point, the meaning of the g flag is fully understood. Every RegExp instance has its own lastIndex property (it lives on the instance, not on the prototype chain). With the g flag, a successful match does not reset lastIndex to 0, so the next search on the same object starts from the position where the previous match ended.
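The lastIndex bookkeeping described above can be condensed into a small JavaScript model. This is a simplified sketch, not boa's implementation: sketchExec, letterMatcher, and the shape of the rx object are hypothetical names, and unicode handling and the result array are left out.

```javascript
// Hypothetical matcher standing in for /[a-z]/: returns the end index of a
// match that starts exactly at `index`, or null if nothing matches there.
const letterMatcher = (input, index) =>
  index < input.length && /[a-z]/.test(input[index]) ? index + 1 : null;

// Simplified model of steps 4 to 15 of the method above.
function sketchExec(rx, input) {
  // Steps 4 and 8: only global or sticky regexes honor the stored lastIndex.
  let lastIndex = rx.global || rx.sticky ? rx.lastIndex : 0;
  for (;;) {
    // Step 12.a: the search ran past the end of the input.
    if (lastIndex > input.length) {
      if (rx.global || rx.sticky) rx.lastIndex = 0;
      return null;
    }
    const end = rx.matcher(input, lastIndex);
    if (end === null) {
      // Step 12.c: sticky fails immediately, otherwise try the next index.
      if (rx.sticky) {
        rx.lastIndex = 0;
        return null;
      }
      lastIndex += 1;
    } else {
      // Steps 13 and 15: remember where this match ended.
      if (rx.global || rx.sticky) rx.lastIndex = end;
      return { index: lastIndex, end };
    }
  }
}

const rx = { matcher: letterMatcher, global: true, sticky: false, lastIndex: 0 };
const results = ['a', 'a', 'a'].map(s => sketchExec(rx, s) !== null);
console.log(results); // [ true, false, true ]
```

Switching global to false in the rx object makes every call start from index 0, which is why the version without g never alternates.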
Conclusion
Applying this to the code from the problem:
const reg = /[a-z]/g; // after the declaration, lastIndex is 0
reg.test('a'); // => true; after the first match, lastIndex is 1
reg.test('a'); // => false; lastIndex is 1 and the string has only one character, so the match fails and lastIndex is reset to 0
reg.test('a'); // => true; from here on, the two steps above simply repeat
reg.test('a'); // => false;
reg.test('a'); // => true;
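Given this behavior, the original filter can be repaired in at least two ways. The sketch below shows both; the variable names noFlag, lettersNoFlag, and lettersReset are hypothetical.

```javascript
const list = ['a', 'b', '-', 'c', 'd'];

// Option 1: drop the g flag, so test() always starts from index 0.
const noFlag = /[a-z]/;
const lettersNoFlag = list.filter(i => noFlag.test(i));

// Option 2: keep the g flag but reset lastIndex before every call.
const reg = /[a-z]/g;
const lettersReset = list.filter(i => {
  reg.lastIndex = 0;
  return reg.test(i);
});

console.log(lettersNoFlag); // [ 'a', 'b', 'c', 'd' ]
console.log(lettersReset);  // [ 'a', 'b', 'c', 'd' ]
```

Dropping the flag is the simpler choice when the regex is only used with test(); resetting lastIndex is mainly useful when the same global regex has to be shared with other code.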