5 Years Later: The First Victory

  • Paper and Standard: N3366 - Restartable Functions for Efficient Character Conversions has been accepted into the C2Y Standard. The author's long struggle with text conversions in C and C++ has come to an end.
  • Journey Beginnings: Over 6 years ago in the C++ Unicode Study Group (SG16), the author wrote a text renderer in C# and then C++. Making the renderer cross-platform proved difficult because of the awful APIs for text conversions in C and C++. For example, getting Windows command-line arguments into UTF-8 was difficult, and C standard functions on a default Ubuntu LTS install stripped accent marks from text.
  • Move to C: The author initially turned to C++, but realized that it wasn't as powerful or separate from C as claimed, especially regarding the C standard library. Questions about wchar_t, the execution encoding, and the wide execution encoding were often punted to the C standard library rather than being changed or mandated in C++. The author proposed interfaces similar to those in <wchar.h> and <uchar.h>, but found the existing design to be abysmal.
  • Frustration and Dilemma: The author expressed frustration with the fundamentally broken functions in C for text conversions. Henri Sivonen pointed out that standardizing something known to be busted was a bad idea. The author faced a dilemma: should they continue with the C approach and deal with its deficiencies, or look for a completely new approach?
  • New Approach: The author took a different approach from existing transcoding APIs like iconv and WideCharToMultiByte. They needed to turn things upside down and inside out and come up with something entirely new to handle all aspects of text conversions. This involved uniting the repulsive forces of old C APIs with the attractive forces of existing transcoding APIs.
  • Implementation and Future: Implementing this new approach will take time for glibc and musl-libc. The author hopes Microsoft doesn't make mistakes in the new functions and provides proper conversions to UTF-8, UTF-16, and UTF-32. The next target is to update [P1629] and start attending SG16 and C++ again. The author is grateful to many, including TomTom, Peter Bindels, the Netherlands National Body, NEN, sponsors, patrons, and h-vetinari for their support.
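
For readers unfamiliar with the existing <uchar.h> interfaces mentioned above, here is a minimal sketch of how the C11 function mbrtoc32 is typically driven. It is not the new N3366 interface, and count_code_points is a hypothetical helper written only for this illustration. Note that mbrtoc32 decodes one character per call and interprets its input in the current locale's multibyte encoding, so the result depends on what setlocale has selected.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper (not from the paper): counts the characters in a
 * multibyte string by decoding them one at a time with C11's mbrtoc32.
 * Returns (size_t)-1 if the input is invalid or ends mid-character. */
size_t count_code_points(const char *s, size_t len) {
    mbstate_t state;
    memset(&state, 0, sizeof state); /* start in the initial conversion state */
    size_t count = 0;

    while (len > 0) {
        char32_t c;
        size_t r = mbrtoc32(&c, s, len, &state);

        if (r == (size_t)-1 || r == (size_t)-2) {
            /* -1: encoding error, -2: incomplete character at end of input */
            return (size_t)-1;
        }
        if (r == (size_t)-3) {
            /* A character was produced from saved state; no input consumed. */
            count++;
            continue;
        }
        if (r == 0) {
            /* A null character was decoded; assume it occupied one byte. */
            r = 1;
        }
        s += r;
        len -= r;
        count++;
    }
    return count;
}
```

Calling count_code_points("abc", 3) in the default "C" locale returns 3. For non-ASCII input, the caller would first need a locale whose multibyte encoding matches the data, which is the kind of locale dependence the summary above complains about.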