ES6 new API: String (2)

This article introduces several new Unicode-related string functions.

String.prototype.codePointAt
Function type:
```
(index?: number)=> number|undefined
```

codePointAt is a prototype function. It returns the code point value of the character at the given index in the string. Unlike charCodeAt, which only recognizes 2-byte characters in the Basic Multilingual Plane (BMP), codePointAt can also identify 4-byte code points in UTF-16 (surrogate pairs), so it covers a wider range of characters. In addition, when index is out of bounds, codePointAt returns undefined, while charCodeAt returns NaN.

Except for these two points, codePointAt and charCodeAt are basically the same:

The index parameter defaults to 0 in both.

When the character is in the Basic Multilingual Plane, both return the same result.

const str = 'abc'; // the character 'a' is in the Basic Multilingual Plane
console.log(str.codePointAt(0));//97
// index defaults to 0
console.log(str.codePointAt());//97
// when index is out of bounds, undefined is returned
console.log(str.codePointAt(5));//undefined
console.log(str.charCodeAt(0));//97
// index defaults to 0
console.log(str.charCodeAt());//97
// when index is out of bounds, NaN is returned
console.log(str.charCodeAt(5));//NaN

When the character is in a supplementary plane, codePointAt recognizes it correctly and returns the code point of the whole character. charCodeAt cannot; it only returns the 2-byte code unit at the given position.
For example, the treble clef 𝄞 is a supplementary-plane character, represented in UTF-16 by two 2-byte code units, 0xd834 and 0xdd1e.
When we use charCodeAt on 𝄞, we only get the code unit at each position.
```
const str = '\ud834\udd1e'; // supplementary-plane character: the treble clef 𝄞
console.log(str.charCodeAt(0).toString(16)); //d834 
console.log(str.charCodeAt(1).toString(16)); //dd1e
```

When we use `codePointAt`, we get the code point `0x1d11e` of **𝄞**.

 ```js
 console.log(str.codePointAt(0).toString(16)); //1d11e
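 // At index 1, '\udd1e' is not the start of a surrogate pair (no code unit follows it), so it is treated as a single 2-byte code unit and only 0xdd1e is returned, not the code point of '\ud834\udd1e'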
 console.log(str.codePointAt(1).toString(16)); //dd1e
 ```
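
Because a supplementary-plane character occupies two index positions, walking a string one index at a time visits the second half of each pair separately. The following is a minimal sketch of stepping through a string by code point instead; `listCodePoints` is a hypothetical helper name, not part of any API.

```javascript
// Hypothetical helper: list the code points of a string as hex strings.
function listCodePoints(text) {
  const result = [];
  let i = 0;
  while (i < text.length) {
    const cp = text.codePointAt(i);
    result.push(cp.toString(16));
    // A code point above 0xffff occupies two UTF-16 code units, so skip both.
    i += cp > 0xffff ? 2 : 1;
  }
  return result;
}

console.log(listCodePoints('a\ud834\udd1eb')); // [ '61', '1d11e', '62' ]
```

Advancing by 2 after a code point above 0xffff is what keeps the loop from reporting the trailing '\udd1e' as a separate entry.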

String.fromCodePoint
Function type:
```
(...codePoints: number[])=> string
```

The static function fromCodePoint returns the string corresponding to the Unicode code points passed in. Compared with fromCharCode, it supports passing supplementary-plane code point values directly. Taking the treble clef 𝄞 as the example again, fromCodePoint accepts the code point value 0x1d11e directly, while fromCharCode needs the two values 0xd834 and 0xdd1e.

console.log(String.fromCodePoint(0x1d11e)); //𝄞
console.log(String.fromCodePoint(0xd834, 0xdd1e)); //𝄞
console.log(String.fromCharCode(0x1d11e)); //턞 not recognized correctly: the value is truncated to 16 bits (0xd11e), so the output is garbled
console.log(String.fromCharCode(0xd834, 0xdd1e)); //𝄞

For BMP characters, fromCodePoint and fromCharCode behave the same.

console.log(String.fromCodePoint(97)); //'a'
console.log(String.fromCodePoint(97, 98)); //'ab'
console.log(String.fromCodePoint()); //''
console.log(String.fromCharCode(97)); //'a'
console.log(String.fromCharCode(97, 98)); //'ab'
console.log(String.fromCharCode()); //''
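
To show where the two values 0xd834 and 0xdd1e come from, here is a small sketch of the standard UTF-16 surrogate-pair arithmetic. `toSurrogatePair` is a hypothetical helper written for illustration, and it assumes its input is a supplementary-plane code point (0x10000 to 0x10ffff).

```javascript
// Hypothetical helper: split a supplementary-plane code point into its
// UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;       // 20-bit offset into the supplementary planes
  const high = 0xd800 + (offset >> 10);     // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);    // bottom 10 bits
  return [high, low];
}

const [highUnit, lowUnit] = toSurrogatePair(0x1d11e);
console.log(highUnit.toString(16), lowUnit.toString(16)); // d834 dd1e
console.log(String.fromCharCode(highUnit, lowUnit) === String.fromCodePoint(0x1d11e)); // true
```

In other words, fromCodePoint does this split internally, which is why it can take 0x1d11e as a single argument.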

String.prototype.normalize

Function type:

(form?: 'NFC'|'NFD'|'NFKC'|'NFKD')=> string

The prototype function normalize accepts a parameter form that specifies the normalization form (NFC, NFD, NFKC, or NFKD). The default value of form is 'NFC' (Normalization Form Canonical Composition: decompose by canonical equivalence, then recompose by canonical equivalence). It returns the string normalized to the specified form.

Unicode provides two representations for composite characters (letters with diacritics, tone marks, and so on). One uses a single code point for the precomposed character; the other combines the base letter with a combining mark, using two code points. For example, ń is a composite character. We can represent it either with the single code point 0x0144 or with the two code points 0x006e and 0x0301.

const str1 = '\u0144'; //ń
const str2 = '\u006e\u0301'; //ń
console.log({
    str1,
    str2,
});//{ str1: 'ń', str2: 'ń' }

These two representations are visually and semantically identical; they are canonically equivalent. At the code level, however, they differ: str1 is one code point and str2 is two, which can easily cause problems.

console.log(str1.length, str2.length);//1 2
console.log(str1 === str2);//false

The normalize function solves this problem: once both strings have been normalized with normalize, the mismatch no longer occurs.

let str1 = '\u0144'; //ń
let str2 = '\u006e\u0301'; //ń
// normalize both strings
str1 = str1.normalize();
str2 = str2.normalize();
console.log({
    str1,
    str2,
}); //{ str1: 'ń', str2: 'ń' }

console.log(str1.length, str2.length); //1 1
console.log(str1 === str2); //true
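
The same idea can be wrapped in a comparison helper so that callers do not have to remember to normalize both sides. This is only a sketch; `equalsNormalized` is a hypothetical name, and NFC is assumed as the default form to match normalize itself.

```javascript
// Hypothetical helper: compare two strings after normalizing both to the same form.
function equalsNormalized(a, b, form = 'NFC') {
  return a.normalize(form) === b.normalize(form);
}

const composed = '\u0144';         // ń as one code point
const decomposed = '\u006e\u0301'; // n followed by a combining acute accent

console.log(equalsNormalized(composed, decomposed)); // true
console.log(composed.normalize('NFD').length);       // 2 (decomposed form)
console.log(decomposed.normalize('NFC').length);     // 1 (composed form)
```

Which form is used matters less than using the same form on both sides; NFD gives the two-code-point spelling and NFC the one-code-point spelling.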

New Unicode representation
As mentioned before, a Unicode character can be written as \u followed by a four-digit hexadecimal code point (\uXXXX). ES6 adds a new form, \u{code point}.
The difference between the two is straightforward: \u{code point} can express 4-byte code points in the supplementary planes, while \uXXXX only supports 2-byte code points in the BMP.
```
// for 2-byte code points in the BMP, the two forms are equivalent
const str1 = '\u{0144}';
const str2 = '\u0144';
console.log(str1 === str2); //true
// treble clef
const str3 = '\u{1d11e}';
// wrong form: this is parsed as the two characters \u1d11 and e
const str4 = '\u1d11e';
console.log(str4,str3===str4); //ᴑe false
```
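
As a quick check that the braces form is only a different spelling of the same string, the sketch below compares it with the surrogate-pair spelling used earlier. The variable names are illustrative only.

```javascript
const withBraces = '\u{1d11e}';    // ES6 code point escape
const withPair = '\ud834\udd1e';   // the same character written as a surrogate pair

console.log(withBraces === withPair);    // true
console.log(withBraces.length);          // 2 (length counts UTF-16 code units)
console.log(withBraces.codePointAt(0).toString(16)); // 1d11e
```

The escape form affects only how the literal is written in source code; the resulting string still holds two code units.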

Unicode really is a headache. If you are not very familiar with Unicode, leave a message in the comments, and I will post another article that covers Unicode and JS in detail.

forceddd
