
In Java 18, UTF-8 becomes the default charset of the standard Java APIs. With this change, APIs that depend on the default charset behave consistently across all implementations, operating systems, locales, and configurations.

The main goals of this change are to:

  • Make Java programs more predictable and portable when their code relies on the default charset.
  • Clarify where the standard Java APIs use the default charset.
  • Standardize on UTF-8 throughout the standard Java APIs, with the exception of console I/O.

It is important to note that the goal of this change is not to define new standard Java APIs or supported JDK APIs, although the work may identify opportunities where new convenience methods would make existing APIs easier to use. Nor is it intended to deprecate or remove the standard Java APIs that rely on the default charset.

The standard Java APIs for reading and writing files and for processing text allow a charset to be passed as an argument. The charset governs the conversion between raw bytes and the 16-bit char values of the Java programming language. Supported charsets include, for example, US-ASCII, UTF-8, and ISO-8859-1.
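As a small illustration of that byte-to-character conversion, the sketch below encodes the same one-character string with the three charsets named above. The class name `CharsetBytesDemo` and its helper `encodedLength` are made up for this example; only `String.getBytes(Charset)` and `StandardCharsets` come from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetBytesDemo {

    // Hypothetical helper: number of bytes needed to encode text in the given charset.
    public static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // 'é', written as an escape so the source file's own encoding does not matter.
        String text = "\u00e9";

        System.out.println("UTF-8:      " + encodedLength(text, StandardCharsets.UTF_8) + " byte(s)");
        System.out.println("ISO-8859-1: " + encodedLength(text, StandardCharsets.ISO_8859_1) + " byte(s)");

        // US-ASCII cannot represent 'é'; String.getBytes substitutes the charset's replacement byte.
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        System.out.println("US-ASCII:   " + new String(ascii, StandardCharsets.US_ASCII));
    }
}
```

The same character takes two bytes in UTF-8 and one byte in ISO-8859-1, and US-ASCII cannot represent it at all, so the charset in effect decides which bytes end up in a file.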

If no charset parameter is passed, the standard Java API usually uses the default charset. The JDK chooses a default character set at startup based on the runtime environment: the operating system, the user's locale, and other factors.

Because the default charset is not the same everywhere, APIs that use it pose a number of hazards that are not obvious, even to experienced developers.

Consider an application that creates a java.io.FileWriter without passing a charset, and then uses it to write some text to a file. The resulting file will contain a sequence of bytes encoded using the default charset of the JDK running the application. A second application, run on a different machine or by a different user on the same machine, creates a java.io.FileReader without passing a charset and uses it to read the bytes in that file. The resulting text contains a sequence of characters decoded using the default charset of the JDK running the second application. If the default charset differs between the JDK of the first application and the JDK of the second, the resulting text may be silently corrupted or incomplete, because the FileReader cannot tell that it decoded the text with a charset different from the one the FileWriter used.

For example, here is a typical case in which a Japanese text file encoded in UTF-8 on macOS is corrupted when read on Windows in a US-English or Japanese locale:

 java.io.FileReader("hello.txt") -> "こんにちは" (macOS)
 java.io.FileReader("hello.txt") -> "ã?“ã‚“ã?«ã?¡ã? " (Windows (en-US))
 java.io.FileReader("hello.txt") -> "縺ォ縺。縺ッ" (Windows (ja-JP))
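The corruption can be reproduced on a single machine without any files. The sketch below encodes the greeting as UTF-8 and then decodes the same bytes with a different charset. `MojibakeDemo` and `decodeWith` are hypothetical names, and the example assumes the JDK in use provides the `windows-1252` charset, which typical JDK builds do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Hypothetical helper: encode text as UTF-8, then decode those bytes with another charset.
    public static String decodeWith(String text, Charset decodingCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodingCharset);
    }

    public static void main(String[] args) {
        String original = "\u3053\u3093\u306b\u3061\u306f"; // こんにちは

        String sameCharset = decodeWith(original, StandardCharsets.UTF_8);
        // Assumption: windows-1252 is available in this JDK.
        String windows1252 = decodeWith(original, Charset.forName("windows-1252"));

        System.out.println("UTF-8 decode intact:        " + sameCharset.equals(original));
        System.out.println("windows-1252 decode intact: " + windows1252.equals(original));
    }
}
```

Decoding with the charset that was used for encoding returns the original string; decoding with windows-1252 does not, and no exception is thrown to signal the mismatch.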

In JDK 17 and earlier, the default charset is determined when the Java runtime starts. On macOS, it is UTF-8 except in the POSIX C locale. On other operating systems, it depends on the user's locale; on Windows, for example, it is a codepage-based charset such as windows-1252 or windows-31j. If you do not know the default encoding of the environment your Java application runs in, you can use this command to view the default charset of the current JDK:

 java -XshowSettings:properties -version 2>&1 | grep file.encoding
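The same information can also be read from inside a running program. This is a minimal sketch; the class name `ShowDefaultCharset` and its helper `defaultCharsetName` are made up for illustration, while `Charset.defaultCharset()` and the `file.encoding` system property are standard.

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {

    // Hypothetical helper: name of the charset used when an API is called without a charset argument.
    public static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```

Running this on different machines, or under different JDK versions, is a quick way to see whether two environments would disagree about how to decode the same file.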

Programmer DD's tip: In earlier versions, when a file was read or written without specifying a charset, the charset chosen depended on the operating system, the user's locale, and other factors. Because different operating systems have different default encodings, reads and writes could easily end up using inconsistent encodings, producing garbled text when the program ran on a different system. This change therefore makes applications developed in Java more portable. It is also a reminder that, for better portability, you should always pass an explicit encoding argument whenever you read or write files. Doing so improves portability even on versions before Java 18, and provides a better compatibility foundation for future upgrades to Java 21.
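One way to follow this advice is sketched below, using `Files.newBufferedWriter` and `Files.newBufferedReader` with an explicit `StandardCharsets.UTF_8` argument. The class `ExplicitCharsetDemo` and its two helper methods are hypothetical names chosen for this example, and the temporary file stands in for whatever file your application actually uses.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetDemo {

    // Hypothetical helper: write text to a file as UTF-8, whatever the default charset is.
    public static void writeUtf8(Path file, String text) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    // Hypothetical helper: read the first line of a file, decoding it as UTF-8.
    public static String readFirstLineUtf8(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "\u3053\u3093\u306b\u3061\u306f"; // こんにちは
        Path file = Files.createTempFile("hello", ".txt");
        try {
            writeUtf8(file, text);
            System.out.println("Round trip intact: " + text.equals(readFirstLineUtf8(file)));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Because both sides name the charset, the result does not depend on the JDK version or on the platform's default encoding.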

The supporting video for this article: https://www.bilibili.com/video/BV1YY4y1a7vG

