
Hello everyone, I'm Fried Fish.

While reading the Go 1.18 Release Notes recently, I noticed that the Title function in the strings and bytes standard libraries has been deprecated. Why?

In this article, let's take a look at the reasons together.

Introduction

Taking the strings standard library as an example, strings.Title returns a copy of a string with every Unicode letter that begins a word mapped to its Unicode title case.

For example:

package main

import (
    "fmt"
    "strings"
)

func main() {
    fmt.Println(strings.Title("her royal highness"))
    fmt.Println(strings.Title("eddy cjy"))
    fmt.Println(strings.Title("хлеб"))
}

Output:

Her Royal Highness
Eddy Cjy
Хлеб

The first letter of each word has been converted to its title case, which for these letters is the same as upper case.

The problem

It all seems to work, but in practice the function has two obvious flaws:

  • Unicode punctuation is not handled correctly.
  • The capitalization rules of a particular human language are not taken into account.

Let's look at each of them in detail.

Unicode punctuation

Here is an example of the first problem:

package main

import (
    "fmt"
    "strings"
)

func main() {
    a := strings.Title("go.go\u2024go")
    b := "Go.Go\u2024Go"
    if a != b {
        fmt.Printf("%s != %s\n", a, b)
    }
}

Output:

Go.Go․go != Go.Go․Go

The string in variable a is converted to "Go.Go․go", but the expected result is "Go.Go․Go": U+2024 (ONE DOT LEADER) is Unicode punctuation and should separate words just as the ASCII period does.

Language-specific rules

For the second problem, consider this code:

// Same package main and imports (fmt, strings) as the earlier examples.
func main() {
    fmt.Println(strings.Title("ijsland"))
}

Output:

Ijsland

In Dutch, the letter pair "ij" is capitalized as a unit, so "ijsland" should become "IJsland"; strings.Title produces "Ijsland" instead.

The solution

The problem was reported back in 2013 in the issue "strings: Title function incorrectly handles word breaks", which Rob Pike, one of Go's creators, marked as not planned to be fixed.


Because of the Go 1 compatibility promise, the bug is "unfixable": any fix would change the function's output, which would be a breaking change.

There is another way out, though: the deprecation this article is about. The doc comment now reads:

// Title returns a copy of the string s with all Unicode letters that begin words
// mapped to their Unicode title case.
//
// BUG(rsc): The rule Title uses for word boundaries does not handle Unicode punctuation properly.
//
// Deprecated: Use golang.org/x/text/cases instead.
func Title(s string) string {

With the "Deprecated:" marker in place, the function's documentation on https://pkg.go.dev is folded by default and clearly labeled as deprecated, and readers are pointed to the golang.org/x/text/cases package as the replacement.
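Any Go package can use the same convention: a paragraph in the doc comment that begins with "Deprecated: ". The sketch below uses two hypothetical functions, Greeting and GreetingFor, purely for illustration. Note that the deprecated function keeps working, which is exactly the constraint the Go 1 compatibility promise places on strings.Title.

```go
package main

import "fmt"

// Greeting returns an English greeting for name.
//
// Deprecated: Use GreetingFor instead, which lets the caller choose the
// greeting word. Greeting is kept so that existing callers still compile.
func Greeting(name string) string {
	return GreetingFor("Hello", name)
}

// GreetingFor returns word and name joined by a comma, e.g. "Hello, Gopher".
func GreetingFor(word, name string) string {
	return word + ", " + name
}

func main() {
	fmt.Println(Greeting("Gopher"))             // Hello, Gopher
	fmt.Println(GreetingFor("Hallo", "Gopher")) // Hallo, Gopher
}
```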

Here is an example using x/text/cases:

package main

import (
    "fmt"

    "golang.org/x/text/cases"
    "golang.org/x/text/language"
)

func main() {
    src := []string{
        "hello world!",
        "i with dot",
        "'n ijsberg",
        "here comes O'Brian",
    }
    for _, c := range []cases.Caser{
        cases.Lower(language.Und),
        cases.Upper(language.Turkish),
        cases.Title(language.Dutch),
        cases.Title(language.Und, cases.NoLower),
    } {
        fmt.Println()
        for _, s := range src {
            fmt.Println(c.String(s))
        }
    }
}

Output:

hello world!
i with dot
'n ijsberg
here comes o'brian

HELLO WORLD!
İ WİTH DOT
'N İJSBERG
HERE COMES O'BRİAN

Hello World!
I With Dot
'n IJsberg
Here Comes O'brian

Hello World!
I With Dot
'N Ijsberg
Here Comes O'Brian

The example prints the output of several casers for different languages. For replacing strings.Title and bytes.Title, the relevant ones are the cases.Title casers, which are called as:

  • cases.Title(<language>).Bytes(<bytes>)
  • cases.Title(<language>).String(<string>)

By specifying the language explicitly, the code can follow each language's own rules for punctuation and capitalization instead of applying a one-size-fits-all rule.

Summary

Although it is only a small function, it raises quite a few problems. At root, they stem from a limited understanding at design time of Unicode punctuation and language-specific casing rules.

In addition, strings.Title and bytes.Title are often misused in practice as a way to capitalize the first letter of a string, which is not what they were designed to do.

In that misused role they usually produce acceptable results despite their defects, but they still break down badly in some special cases and for some languages.
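If all you actually need is to capitalize the first letter of a string, that is a much narrower job than strings.Title. Here is a minimal stdlib-only sketch using a hypothetical helper named upperFirst. Like strings.Title, it applies no language-specific rules, so for Dutch or Turkish text the golang.org/x/text/cases package is still the better choice.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// upperFirst (hypothetical helper, not a standard library function) returns s
// with only its first rune mapped to title case; the rest is left unchanged.
func upperFirst(s string) string {
	r, size := utf8.DecodeRuneInString(s)
	if r == utf8.RuneError && size <= 1 {
		// Empty string or invalid UTF-8 at the start: leave s unchanged.
		return s
	}
	return string(unicode.ToTitle(r)) + s[size:]
}

func main() {
	fmt.Println(upperFirst("hello world")) // Hello world
	fmt.Println(upperFirst("хлеб"))        // Хлеб
}
```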

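As the Chinese proverb goes, when the old man at the frontier lost his horse, who could tell it was not a blessing? This deprecation may well turn out to be a blessing in disguise.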

If you have any questions, feel free to leave feedback and discuss them in the comments. The best relationship is .

This article is continuously updated. You can read it on WeChat by searching for [Brain fried fish]. It is also included in the GitHub repository github.com/eddycjy/blog, where Go learners can find a Go learning map and roadmap.

煎鱼