You have almost certainly come across zh_CN, whether in PHP or in web pages. It identifies the country or region, and the language, that content should be displayed in. PHP has quite a few interesting features built around this kind of locale identifier. The Locale class we look at today is the one that works with them. It is not meant to be instantiated: all of its methods are static.

Getting and setting the current locale

First of all, we can read and change the current locale at runtime.

// # echo $LANG;
// en_US.UTF-8

// php.ini
// intl.default_locale => no value => no value

echo Locale::getDefault(), PHP_EOL; // en_US_POSIX
ini_set('intl.default_locale', 'zh_CN');
echo Locale::getDefault(), PHP_EOL; // zh_CN
Locale::setDefault('fr');
echo Locale::getDefault(), PHP_EOL; // fr

By default, getDefault() returns the value of the intl.default_locale setting in php.ini. If that setting is empty, the value is taken from the operating system's locale environment (such as $LANG) instead, which is where the en_US_POSIX in the example above comes from. The POSIX suffix indicates that the value comes from the operating system rather than from php.ini.

To change the current locale at runtime, either modify the ini setting directly with ini_set() or call the setDefault() method.
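
Because setDefault() changes the default for the rest of the script, it can be handy to switch the locale only for one piece of work and then put the old value back. The following is a minimal sketch of that idea, assuming the intl extension is loaded; withDefaultLocale() is a hypothetical helper name, not something intl provides.

```php
<?php
// Hypothetical helper (not part of intl): run a callback under a temporary
// default locale, then restore whatever the default was before.
function withDefaultLocale(string $locale, callable $fn)
{
    $previous = Locale::getDefault();
    Locale::setDefault($locale);
    try {
        return $fn();
    } finally {
        // Restore the old default even if the callback throws.
        Locale::setDefault($previous);
    }
}

$before = Locale::getDefault();
$inside = withDefaultLocale('fr', function () {
    return Locale::getDefault();
});

echo $inside, PHP_EOL; // fr
echo Locale::getDefault() === $before ? 'restored' : 'changed', PHP_EOL; // restored
```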

Rules for language tags

Before going any further, let's look at how language tags are specified. Most people have only seen short tags such as en_US and zh_CN, but the complete definition is much longer. When we use the short form, the missing parts are simply filled in with defaults. The complete rule is:

language-extlang-script-region-variant-extension-privateuse
(language, extended language, script, country or region, variant, extension, private use)

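In other words, zh_CN can be written out in full like this:

zh-cmn-Hans-CN-Latn-pinyin

Here zh is the language, cmn is the extended language (Mandarin), Hans is the script (Simplified Chinese), CN is the country or region, and Latn and pinyin are variants (Latin alphabet and Pinyin). Strictly speaking, Latn is registered as a script code, but in this position it is parsed as a variant, as the parseLocale() output later in this article shows.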

Does something so simple suddenly look a lot more sophisticated? One more point: the zh- prefix is no longer recommended in this long form, because zh is now classified as a macrolanguage rather than a single language. The individual languages cmn (Mandarin), yue (Cantonese), wuu (Wu Chinese), and hsn (Xiang Chinese, the Hunan dialect) can be used directly as the language subtag. The tag above can therefore also be written like this:

cmn-Hans-CN-Latn-pinyin

For more detail on the language tag rules, see the reference links at the end of the article.

In the previous article on NumberFormatter, we saw that numbers can be spelled out directly in Chinese. What if we want the result in Traditional Chinese? It's very simple: add the Hant script subtag and the output is written in Traditional Chinese.

$fmt = new NumberFormatter('zh-Hant', NumberFormatter::SPELLOUT);
echo $fmt->format(1234567.891234567890000), PHP_EOL; 
// 一百二十三萬四千五百六十七點八九一二三四五六七九
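
To see what the script subtag changes, the same number can be spelled out under both scripts side by side. This is only a small sketch, assuming the intl extension is loaded; the $spelled array is an illustrative name, and the exact wording of the output depends on the ICU data installed, so no output is shown here.

```php
<?php
// Spell out the same value with the Simplified and Traditional script subtags.
$value = 1234567.89;
$spelled = [];

foreach (['zh-Hans', 'zh-Hant'] as $tag) {
    $fmt = new NumberFormatter($tag, NumberFormatter::SPELLOUT);
    $spelled[$tag] = $fmt->format($value);
    echo $tag, ' : ', $spelled[$tag], PHP_EOL;
}
```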

Extracting information from a language tag

What can we do once we know the rules for language tags? The main job of the Locale class is to parse a tag and return its individual attributes.

Getting each attribute separately

echo Locale::getDisplayLanguage('cmn-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // cmn
echo Locale::getDisplayLanguage('zh-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // 中文

echo Locale::getDisplayName('cmn-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // cmn(简体,中国,LATN_PINYIN)
echo Locale::getDisplayName('zh-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // 中文(简体,中国,LATN_PINYIN)

echo Locale::getDisplayRegion('cmn-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // 中国
echo Locale::getDisplayRegion('zh-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // 中国

echo Locale::getDisplayScript('cmn-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // 简体中文
echo Locale::getDisplayScript('zh-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // 简体中文

echo Locale::getDisplayVariant('cmn-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // LATN_PINYIN
echo Locale::getDisplayVariant('zh-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // LATN_PINYIN

Each call is tested with both forms of the tag so that the results can be compared.

  • getDisplayLanguage() returns the display name of the language, that is, the language part of the rule.
  • getDisplayName() returns the full display name of the tag, which contains more detail.
  • getDisplayRegion() returns the country or region.
  • getDisplayScript() returns the script (writing system).
  • getDisplayVariant() returns the variant information.
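
The second argument to these methods is the locale in which the name should be displayed, which is why the results above come back in Chinese. As a rough sketch (assuming the intl extension is loaded; the list of display locales is just an illustration), the same tag can be described in several languages:

```php
<?php
// Describe one tag in several display languages.
// The exact strings depend on the ICU data installed on the system.
$tag = 'zh-Hans-CN';

foreach (['zh_CN', 'en', 'fr', 'ja'] as $displayLocale) {
    echo $displayLocale, ' : ',
        Locale::getDisplayLanguage($tag, $displayLocale), ' | ',
        Locale::getDisplayScript($tag, $displayLocale), ' | ',
        Locale::getDisplayRegion($tag, $displayLocale), PHP_EOL;
}
```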

Getting all attributes at once

Of course, we can also retrieve all of this information in a single call.

$arr = Locale::parseLocale('zh-Hans-CN-Latn-pinyin');
if ($arr) {
    foreach ($arr as $key => $value) {
        echo "$key : $value ", PHP_EOL;
    }
}
// language : zh
// script : Hans
// region : CN
// variant0 : LATN
// variant1 : PINYIN

The parseLocale() method splits a language tag into its parts and returns them as an array, where each key is the name of a part in the rule and each value is the corresponding content. Compare the output with the rule introduced above to check that they agree.
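
Since the array only contains keys for the parts that are actually present in the tag, code that reads it usually needs a fallback for missing parts. Here is a small sketch of one way to do that, assuming the intl extension is loaded; coreSubtags() is a hypothetical helper name used only for this illustration.

```php
<?php
// Hypothetical helper: pick the core subtags out of parseLocale() and use
// null for anything the tag does not contain.
function coreSubtags(string $tag): array
{
    // parseLocale() does not return an array for tags it cannot parse.
    $parts = Locale::parseLocale($tag) ?: [];

    return [
        'language' => $parts['language'] ?? null,
        'script'   => $parts['script'] ?? null,
        'region'   => $parts['region'] ?? null,
    ];
}

var_export(coreSubtags('zh-Hans-CN-Latn-pinyin'));
echo PHP_EOL;
var_export(coreSubtags('zh-CN')); // no script subtag, so 'script' is NULL
echo PHP_EOL;
```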

Getting all variants

As the output above shows, this tag contains two variants. The getAllVariants() method returns them directly as an array of all the variants in the language tag.

$arr = Locale::getAllVariants('zh-Hans-CN-Latn-pinyin');
var_export($arr);
echo PHP_EOL;
//  array (
//     0 => 'LATN',
//     1 => 'PINYIN',
//   )

Canonicalizing a tag and reading its keywords

echo Locale::canonicalize('zh-Hans-CN-Latn-pinyin'), PHP_EOL; // zh_Hans_CN_LATN_PINYIN

$keywords_arr = Locale::getKeywords('zh-cn@currency=CNY;collation=UTF-8');
if ($keywords_arr) {
    foreach ($keywords_arr as $key => $value) {
        echo "$key = $value", PHP_EOL;
    }
}
// collation = UTF-8
// currency = CNY

The canonicalize() method returns the language tag in its standardized form. You can see that it turns the hyphens into underscores and converts the trailing variant subtags to uppercase. This is the canonical way of writing a tag. In our applications and web pages, both hyphens and underscores are accepted, in upper or lower case, but it is best to stick to the canonical form.

getKeywords() reads the keyword attributes that follow the @ symbol. Here we start from zh-cn and attach a currency of CNY and a collation of UTF-8, and getKeywords() returns both as an array of key/value pairs. Note that collation normally names a sort order rather than a character set, so UTF-8 is only a placeholder value here; the method simply returns the strings it finds.
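
When only one keyword is of interest, it helps to wrap the call so that a tag without keywords does not need special handling. The following is a minimal sketch, assuming the intl extension is loaded; localeKeyword() is a hypothetical helper name and USD is just an example fallback.

```php
<?php
// Hypothetical helper: read a single keyword from a locale string and fall
// back to a default when the keyword (or the whole @ section) is missing.
function localeKeyword(string $locale, string $name, ?string $default = null): ?string
{
    // getKeywords() does not return an array when there are no keywords.
    $keywords = Locale::getKeywords($locale) ?: [];

    return $keywords[$name] ?? $default;
}

echo localeKeyword('zh_CN@currency=CNY', 'currency', 'USD'), PHP_EOL; // CNY
echo localeKeyword('zh_CN', 'currency', 'USD'), PHP_EOL;              // USD
```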

Matching language tags

We can also check whether a given language tag matches a given locale, for example:

echo (Locale::filterMatches('cmn-CN', 'zh-CN', false)) ? "Matches" : "Does not match", PHP_EOL; // Does not match
echo (Locale::filterMatches('zh-CN-Latn', 'zh-CN', false)) ? "Matches" : "Does not match", PHP_EOL; // Matches
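
In the two calls above, the second argument behaves like a prefix: a tag matches when it is equal to that value or extends it with further subtags. As a rough sketch of how this could be used to filter a list (assuming the intl extension is loaded; the tag list and variable names are made up for the illustration):

```php
<?php
// Keep only the tags that fall under the 'zh-CN' range.
$range = 'zh-CN';
$tags = ['zh-CN', 'zh-CN-Latn', 'zh-Hans-CN', 'cmn-CN', 'zh'];

$matching = array_values(array_filter($tags, function (string $tag) use ($range) {
    // Third argument false: compare the tags as given, without canonicalizing.
    return (bool) Locale::filterMatches($tag, $range, false);
}));

var_export($matching);
echo PHP_EOL;
```

Note that zh-Hans-CN is not kept: without canonicalization the comparison is a plain prefix check, and the script subtag sits between zh and CN.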

There is also a lookup() method, which finds the tag in a given list that is the closest match for a specified locale.

$arr = [
    'zh-hans',
    'zh-hant',
    'zh',
    'zh-cn',
];
echo Locale::lookup($arr, 'zh-Hans-CN-Latn-pinyin', true, 'en_US'), PHP_EOL; // zh_hans
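
The fourth argument is the value returned when nothing in the list matches, which makes lookup() a natural fit for choosing among the languages an application supports. A small sketch under that assumption follows (intl extension loaded; the $supported list and the requested tags are invented for the example):

```php
<?php
// Languages this hypothetical application can serve.
$supported = ['en', 'fr', 'zh-hans', 'zh-hant'];

foreach (['zh-Hans-CN', 'fr-CA', 'de-DE'] as $requested) {
    // Fall back to 'en' when no supported tag is a match.
    $best = Locale::lookup($supported, $requested, true, 'en');
    echo $requested, ' => ', $best, PHP_EOL;
}
```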

Generating a standard language tag

Now that we can read the individual attributes of a language tag, can we also go the other way and build a standard tag from them?

$arr = [
    'language' => 'en',
    'script' => 'Hans',
    'region' => 'CN',
    'variant2' => 'rozaj',
    'variant1' => 'nedis',
    'private1' => 'prv1',
    'private2' => 'prv2',
];
echo Locale::composeLocale($arr), PHP_EOL; // en_Hans_CN_nedis_rozaj_x_prv1_prv2

Yes: the composeLocale() method builds a complete, standard language tag from an array of parts. The test data here is deliberately arbitrary. It amounts to an en_CN tag, which is not something you would normally write.
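
Because composeLocale() accepts the same keys that parseLocale() produces, the two can be combined to change one part of an existing tag. Here is a minimal sketch, assuming the intl extension is loaded; swapping the region to SG is just an example.

```php
<?php
// Parse a tag, change its region, and build the tag again.
$parts = Locale::parseLocale('zh-Hans-CN');
$parts['region'] = 'SG';

$composed = Locale::composeLocale($parts);
echo $composed, PHP_EOL; // zh_Hans_SG
```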

acceptFromHttp reads language information from the request header

The Locale class also provides a method for working out the client browser's language from the Accept-Language request header.

// Locale::acceptFromHttp($_SERVER['HTTP_ACCEPT_LANGUAGE']);

echo Locale::acceptFromHttp('en_US'), PHP_EOL; // en_US
echo Locale::acceptFromHttp('en_AU'), PHP_EOL; // en_AU

echo Locale::acceptFromHttp('zh_CN'), PHP_EOL; // zh
echo Locale::acceptFromHttp('zh_TW'), PHP_EOL; // zh

As the tests show, the method only needs a string argument, so it can be tried out on the command line as well. Note that for Chinese, in these tests it returns only the language, without the region.
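
In a real request the header may be missing, and the detected locale may not be one the application supports, so it is reasonable to combine acceptFromHttp() with lookup(). The following is only a sketch under those assumptions (intl extension loaded); negotiateLocale(), the $supported list, and the sample header are all hypothetical, and the locale detected for a given header depends on the ICU data installed.

```php
<?php
// Hypothetical helper: choose a supported locale for an Accept-Language header.
function negotiateLocale(?string $header, array $supported, string $default): string
{
    if ($header === null || $header === '') {
        return $default; // no header sent
    }

    $preferred = Locale::acceptFromHttp($header);
    if ($preferred === false || $preferred === '') {
        return $default; // header could not be resolved to a locale
    }

    // Map the detected locale onto the list of supported tags.
    return Locale::lookup($supported, $preferred, true, $default) ?: $default;
}

$supported = ['en', 'fr', 'zh-hans'];
echo negotiateLocale('fr-CH, fr;q=0.9, en;q=0.8', $supported, 'en'), PHP_EOL;
echo negotiateLocale(null, $supported, 'en'), PHP_EOL; // en
```

In a web context the first argument would typically be $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? null.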

Summary

To be honest, I have had little exposure to the Locale class in my own day-to-day development, but I expect that readers working on cross-border projects will already know it to some degree. Since my own work does not touch this area, I can only cover it briefly for now. When a related requirement does come up in the future, remember that these tools exist!

Test code:

https://github.com/zhangyue0503/dev-blog/blob/master/php/202011/source/5. Operation of regional language tag information in

Reference documents:

https://www.php.net/manual/zh/class.locale.php

https://www.zhihu.com/question/20797118/answer/63480740

===========

Search for [Hardcore Project Manager] on the major media platforms.

