Hello everyone, I am Fried Fish.
While reading the Go 1.18 Release Notes recently, I noticed that the Title function in the strings and bytes standard library packages has been marked as deprecated. Why is that?
In this article, let's take a look together.
Introduction
Taking the strings package as an example, strings.Title maps the Unicode letters that begin each word to their Unicode title case.
An example:
package main

import (
	"fmt"
	"strings"
)

func main() {
	// The first letter of every word is mapped to its title case.
	fmt.Println(strings.Title("her royal highness"))
	fmt.Println(strings.Title("eddy cjy"))
	fmt.Println(strings.Title("хлеб"))
}
Output result:
Her Royal Highness
Eddy Cjy
Хлеб
The first letter of each word is converted to its title case; the remaining letters are left unchanged.
Problems
Everything seems to work well, but in fact the function currently has two obvious flaws:
- Unicode punctuation is not handled correctly.
- The capitalization rules of specific human languages are not taken into account.
Let's look at each in detail.
Unicode Punctuation
For the first problem, consider this example:
package main

import (
	"fmt"
	"strings"
)

func main() {
	// \u2024 is ONE DOT LEADER, a Unicode punctuation character.
	a := strings.Title("go.go\u2024go")
	b := "Go.Go\u2024Go"
	if a != b {
		fmt.Printf("%s != %s\n", a, b)
	}
}
Output result:
Go.Go․go != Go.Go․Go
The conversion of variable a yields "Go.Go․go", but the expected result is "Go.Go․Go": the word after the Unicode punctuation character is not recognized as a new word.
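One way to see why the last "go" is missed is to look at how a word boundary might be detected. The sketch below approximates the kind of separator rule strings.Title relies on; isWordSeparator is a name used here for illustration, and the code is a simplified model rather than the library source.

```go
package main

import (
	"fmt"
	"unicode"
)

// isWordSeparator approximates the rule strings.Title uses to find word
// boundaries. It is an illustrative sketch, not the library source.
func isWordSeparator(r rune) bool {
	if r <= 0x7F {
		// ASCII: everything except letters, digits, and underscore separates words.
		isAlnum := ('0' <= r && r <= '9') || ('a' <= r && r <= 'z') || ('A' <= r && r <= 'Z')
		return !isAlnum && r != '_'
	}
	if unicode.IsLetter(r) || unicode.IsDigit(r) {
		return false
	}
	// Outside ASCII, only white space counts as a separator.
	return unicode.IsSpace(r)
}

func main() {
	for _, r := range []rune{'.', '\u2024'} {
		fmt.Printf("%U punct=%t separator=%t\n", r, unicode.IsPunct(r), isWordSeparator(r))
	}
	// U+002E punct=true separator=true
	// U+2024 punct=true separator=false
}
```

Both characters are punctuation, but under this rule only the ASCII full stop starts a new word.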
Language-specific rules
For the second problem, the code is as follows:
package main

import (
	"fmt"
	"strings"
)

func main() {
	fmt.Println(strings.Title("ijsland"))
}
Output result:
Ijsland
In Dutch, "ijsland" should be capitalized as "IJsland", because the digraph "ij" is capitalized as a unit, but the function returns "Ijsland".
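To make the gap concrete, here is a rough sketch of what a language-aware rule has to add. Both helpers are hypothetical names introduced for illustration, and dutchTitleWord covers only the leading "ij" case mentioned above; it is not a complete treatment of Dutch capitalization.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// titleFirstRune title-cases only the first rune of w, which is all a
// language-agnostic, rune-by-rune approach can do.
func titleFirstRune(w string) string {
	rs := []rune(w)
	if len(rs) == 0 {
		return w
	}
	rs[0] = unicode.ToTitle(rs[0])
	return string(rs)
}

// dutchTitleWord is a hypothetical helper that adds one Dutch rule:
// a leading "ij" digraph is capitalized as a unit.
func dutchTitleWord(w string) string {
	if strings.HasPrefix(w, "ij") {
		return "IJ" + w[len("ij"):]
	}
	return titleFirstRune(w)
}

func main() {
	fmt.Println(titleFirstRune("ijsland")) // Ijsland
	fmt.Println(dutchTitleWord("ijsland")) // IJsland
}
```

Hard-coding rules like this for every language does not scale, which is part of why the task is better left to a dedicated library.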
Solution
This problem was discovered back in 2013 in the issue "strings: Title function incorrectly handles word breaks", which Rob Pike, one of the creators of the Go language, marked as Unplanned.
Because of the Go 1 compatibility promise, the behavior is "unfixable": any fix would change the function's output, which would be a breaking change.
But there is another way out, which is the deprecation discussed in this article. The function is marked as follows:
// Title returns a copy of the string s with all Unicode letters that begin words
// mapped to their Unicode title case.
//
// BUG(rsc): The rule Title uses for word boundaries does not handle Unicode punctuation properly.
//
// Deprecated: Use golang.org/x/text/cases instead.
func Title(s string) string {
With "Deprecated" marked on the function, the corresponding Go documentation folds the entry and clearly shows that it is deprecated. The recommendation is to use the golang.org/x/text/cases package directly instead.
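The same convention can be applied to your own code: a paragraph in the doc comment that begins with "Deprecated:" marks the identifier as deprecated while leaving its behavior untouched. A minimal sketch, where Shout and Emphasize are hypothetical functions invented for this illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// Shout returns s in upper case.
//
// Deprecated: Use Emphasize instead.
func Shout(s string) string {
	// The old behavior is kept as-is so existing callers are not broken.
	return strings.ToUpper(s)
}

// Emphasize returns s in upper case with surrounding white space removed.
func Emphasize(s string) string {
	return strings.ToUpper(strings.TrimSpace(s))
}

func main() {
	fmt.Printf("%q\n", Shout(" hi "))     // " HI "
	fmt.Printf("%q\n", Emphasize(" hi ")) // "HI"
}
```

The old function keeps compiling and keeps its output, while documentation and tooling can steer callers toward the replacement.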
An example using the x/text/cases package:
package main

import (
	"fmt"

	"golang.org/x/text/cases"
	"golang.org/x/text/language"
)

func main() {
	src := []string{
		"hello world!",
		"i with dot",
		"'n ijsberg",
		"here comes O'Brian",
	}
	// Each Caser applies one kind of case mapping for one language.
	for _, c := range []cases.Caser{
		cases.Lower(language.Und),
		cases.Upper(language.Turkish),
		cases.Title(language.Dutch),
		cases.Title(language.Und, cases.NoLower),
	} {
		fmt.Println()
		for _, s := range src {
			fmt.Println(c.String(s))
		}
	}
}
Output result:
hello world!
i with dot
'n ijsberg
here comes o'brian

HELLO WORLD!
İ WİTH DOT
'N İJSBERG
HERE COMES O'BRİAN

Hello World!
I With Dot
'n IJsberg
Here Comes O'brian

Hello World!
I With Dot
'N Ijsberg
Here Comes O'Brian
The output shows conversions for several languages. The part to focus on is the code that constructs each Caser, such as cases.Lower(language.Und). For title casing, the library is called like this:
cases.Title(<language>).Bytes(<bytes>)
cases.Title(<language>).String(<string>)
Specifying the language explicitly in the program lets it handle the punctuation and capitalization rules of different human languages, instead of applying one rule to everything.
Summary
Although this is only a small function, it leads to a surprising number of problems. At root, the original design had blind spots.
In addition, the strings.Title and bytes.Title functions are often misunderstood in practice as a way to capitalize only the first letter of a string, which is not what they were designed for.
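For readers who really do want only the first letter of a string capitalized, a small helper is enough. The sketch below uses only the standard library; capitalizeFirst is a hypothetical name, and the helper applies no language-specific rules.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// capitalizeFirst is a hypothetical helper: it title-cases only the first
// rune of s and leaves the rest of the string unchanged.
func capitalizeFirst(s string) string {
	r, size := utf8.DecodeRuneInString(s)
	if r == utf8.RuneError {
		return s // empty or invalid input: leave unchanged
	}
	return string(unicode.ToTitle(r)) + s[size:]
}

func main() {
	fmt.Println(capitalizeFirst("her royal highness")) // Her royal highness
	fmt.Println(capitalizeFirst("хлеб"))               // Хлеб
}
```

Compare this with strings.Title, which returns "Her Royal Highness" for the first input because it title-cases every word.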
Although this kind of use tends to produce acceptable results and rarely runs into the defects above, there are still serious problems in some special scenarios and in language support.
The deprecation may well turn out to be a blessing in disguise.
If you have any questions, feel free to leave feedback and discuss in the comment area. The best relationship is .
This article is continuously updated. You can find it by searching WeChat for [Brain fried fish], and it is also included in the GitHub repository github.com/eddycjy/blog. If you are learning Go, see the Go learning map and route.