One of the four major features of C++20: Modules in detail

What is the biggest feature of C++20?

Its biggest feature is that, so far, no compiler fully implements all of it.

Some consider C++20 the biggest change since C++11, perhaps even bigger than C++11 itself. Of the four major features of C++20, this article covers only Modules, in three parts:

The origin, pros, and cons of the C++ compilation and linking model
How to use the C++20 Module mechanism
The mechanism behind Modules, their pros and cons, and their support in the major compilers

C++ is compatible with C: not only with C's syntax, but also with C's compilation and linking model. By early 1973 the C language was essentially finalized: it had a preprocessor and supported structures. Its compilation model had also settled into four steps, preprocessing, compilation, assembly, and linking, which are still in use today. In 1973, Ken Thompson and Dennis Ritchie rewrote the Unix kernel in C.

Why is there preprocessing? Why are there header files? When C was born, the PDP-11 used to run the C compiler had 64 KiB of memory and a 512 KiB hard disk. The compiler could not fit large source files into such a small memory, so the C compiler was designed to support modular compilation: the source code is split into multiple source files, each is compiled separately into an object file, and the object files are finally linked into one executable.

The C compiler compiles each source file separately in a single pass: it scans the source from beginning to end, emits object code as it goes, and forgets what it has already passed. Within a source file, code that comes later cannot affect decisions the compiler has already made. This leads to the following characteristics of C:

Structures must be defined before use; otherwise the compiler cannot know the types and offsets of their members and cannot generate object code.

Local variables must be defined before use; otherwise the compiler cannot know their types or their positions on the stack. To make stack management easier for the compiler, C at the time required local variables to be defined at the beginning of a block.

An external variable only needs a known type and name (together, its declaration) to be used in generated code; its actual address is filled in by the linker.

An external function only needs its name, return type, and parameter type list (its declaration) for the compiler to generate the code that calls it; its actual address is filled in by the linker.

Header files and preprocessing meet exactly these requirements. A header file needs only a small amount of code to declare function prototypes, structures, and similar information. At compile time the header is expanded into the implementation file, and the compiler can then carry out its one-pass compilation without problems.
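As a minimal single-file sketch of these constraints, the declarations at the top below stand in for what an expanded header would contain. `Point`, `add`, and `sum_of_point` are hypothetical names chosen for illustration. The point is only that a front-to-back compiler can emit the call to `add` because its prototype appears before the call, while the definition comes later (or, in a real program, from another object file via the linker).

```cpp
// --- What a header traditionally supplies: declarations only ---
struct Point {              // complete type: the compiler now knows the member types and offsets
    int x;
    int y;
};
int add(int a, int b);      // prototype: enough to emit a call; the address is resolved later

// --- What the implementation file does with them, in one front-to-back pass ---
int sum_of_point() {
    Point p{1, 2};          // OK: Point is already complete at this point
    return add(p.x, p.y);   // OK: add is declared, although it is defined further down
}

// The definition may appear later in this file, or come from another object file.
int add(int a, int b) { return a + b; }
```

Moving the `struct Point` definition below `sum_of_point` would make the program ill-formed, which is exactly the "define before use" rule listed above.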

So far, what we have seen are the necessity and benefits of header files. Of course, header files also have many negative effects:

Inefficiency: the job of a header file is to provide forward declarations, and it does so by copying text. The copying involves no grammatical analysis, so every declaration, needed or not, is copied into the source file.

Transitivity: declarations can be "transmitted" to a top-level file through intermediate header files. This implicit pass-through causes a lot of trouble.

Slower compilation: if a.h is included by three modules, it is expanded three times and compiled three times.

Order dependence: the behavior of a program is affected by the order in which header files are included, and also by whether a given header is included at all, especially in C++ (because of overloading).

Uncertainty: the same header file may behave differently in different source files. The causes may lie in the source file (for example, other headers it includes or macros it defines) or in the compile options.
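Two of these problems can be reproduced in one file. In the sketch below, the hypothetical `SQUARE` macro plays the role of a macro leaking in from some included header and is pasted in as raw text, and the two `describe` overloads stand in for declarations coming from two headers included in different orders. It is an illustration under those assumptions, not code from the original article.

```cpp
#include <string>

// Pretend this line lives in some deeply nested third-party header.
// The preprocessor pastes it in verbatim, with no grammatical analysis.
#define SQUARE(x) x * x

// SQUARE(1 + 2) expands textually to 1 + 2 * 1 + 2, which is 5, not 9.
int square_of_three() { return SQUARE(1 + 2); }

// Order dependence: overload resolution only sees declarations that appear
// before the call. Imagine describe(double) comes from a header included early...
std::string describe(double) { return "double"; }

// ...so this call can only pick describe(double); 1 is converted to 1.0.
std::string call_before_int_overload() { return describe(1); }

// ...and describe(int) comes from a header included later.
std::string describe(int) { return "int"; }

// The identical call now resolves to the exact match describe(int).
std::string call_after_int_overload() { return describe(1); }
```

In a multi-file project the same effects arise when the `#define` or the `describe(int)` declaration sits in a header whose inclusion order differs between source files.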

C++20 adds Modules. Let's first look at how Modules are used, and then summarize their advantages over header files.

Modules avoid many shortcomings of the traditional header file mechanism. A Module is made up of one or more translation units: one or more module interface files and zero or more module implementation files. The import keyword imports a module, after which the entities it exports can be used.

Implementing the simplest Module

module_hello.cppm defines a complete hello module and exports a say_hello_to function for external use. Compilers do not currently mandate a suffix for module interface files; this article uses ".cppm" throughout. A ".cppm" file is called a "module interface file". Note that it can not only declare entities but also define them.

The hello module can be used directly in the main function:

To build the program, module_hello.cppm must be compiled first to produce a pcm file (a module cache file) containing the symbols exported by the hello module; main is compiled afterwards.

Note the following details in the code above:

export module hello; declares a module. The leading export means that the current file is a module interface file, and only a module interface file can export entities (variables, functions, classes, namespaces, etc.). Every module has at least one module interface file. A module interface file may contain only entity declarations, or entity definitions as well.

import hello; needs no angle brackets, and unlike #include, import is followed not by a file name but by a module name (the file here is module_hello.cppm). The compiler does not require the module name to match the file name.

To export a function, put the export keyword before its definition or declaration.

Imports are not transitive. The hello module imports <string_view>, but main.cpp still needs its own import <string_view>; before using the hello module.

Import declarations in a module must come after the module declaration and before any other declarations in the module. That is, import <iostream>; must be placed after export module hello; and before void internal_helper().

Modules must be compiled bottom-up: the most basic modules first, then the modules and files that depend on them. In buildfile.sh, module_hello is compiled first to generate the pcm, and main is compiled afterwards.
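The original listings for this example did not survive, so here is a hedged reconstruction consistent with the description above. Each comment-delimited section is a separate file. It assumes a compiler with C++20 module support that can also import standard headers as header units (`import <iostream>;`); build commands are omitted because they differ between compilers.

```cpp
// ---- module_hello.cppm: module interface file ----
export module hello;        // "export module" marks a module interface file
import <iostream>;          // imports go after the module declaration...
import <string_view>;       // ...and before any other declaration in the module

// Not exported: module linkage, visible only inside the hello module.
void internal_helper() {}

// Exported: callable from any translation unit that does `import hello;`.
export void say_hello_to(std::string_view const &name) {
    internal_helper();
    std::cout << "Hello " << name << "!\n";
}

// ---- main.cpp ----
import <string_view>;       // hello's own import of <string_view> does not reach main
import hello;               // a module name, not a file name; no angle brackets

int main() {
    std::string_view name = "Netease";
    say_hello_to(name);     // prints "Hello Netease!"
    return 0;
}
```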

Interface and implementation separation

In the previous example, the interface's declaration and implementation live in the same file (strictly speaking, the .cppm file contains only the function definitions; the compiler generates the declarations and stores them in the pcm cache file). As a module grows and its interface expands, keeping every entity definition in the module interface file makes the code hard to maintain. C++20 modules therefore also support separating interface from implementation. Below, the declarations go into a .cppm file and the implementations into a .cpp file.

module_hello.cppm: assume that say_hello_to, func_a, func_b, and the other interfaces are complex. The .cppm file contains only their declarations. The square function is an exception: it is a function template, so it must be defined in the .cppm file and cannot be compiled separately.

module_hello.cpp: Gives the corresponding implementation of each interface declaration of the hello module.

There are several details of the code that need attention:

The hello module is split into module_hello.cppm and module_hello.cpp. The former is the module interface file (its module declaration starts with export); the latter is a module implementation file. The major compilers do not require the .cppm suffix for module interface files.

No entity can be exported in the module implementation file.

Function templates, such as the square function in the code, must be defined in the module interface file. For functions that use auto return values, the definition must also be placed in the module interface file.
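A sketch of the interface/implementation split, under the same assumptions as before (C++20 module and header-unit support; each comment-delimited section is a separate file). The bodies of `func_a` and `func_b` are placeholders, since the article treats them as complicated functions.

```cpp
// ---- module_hello.cppm: interface file, declarations only ----
export module hello;
import <string_view>;

export void say_hello_to(std::string_view const &name);
export int func_a();
export int func_b();

// Templates (and functions with a deduced `auto` return type) are defined here,
// because importers need the definition to instantiate or call them.
export template <typename T>
T square(T value) {
    return value * value;
}

// ---- module_hello.cpp: implementation file ----
module hello;               // no `export`: a module implementation file,
                            // which implicitly imports hello's interface
import <iostream>;
import <string_view>;

void say_hello_to(std::string_view const &name) {   // `export` is not allowed here
    std::cout << "Hello " << name << "!\n";
}
int func_a() { return 1; }  // placeholder body
int func_b() { return 2; }  // placeholder body

// ---- main.cpp ----
import <string_view>;       // main imports what it needs itself, as in the first example
import hello;

int main() {
    say_hello_to("Netease");
    int nine = square(func_a() + func_b());   // instantiated from the interface definition
    return nine == 9 ? 0 : 1;
}
```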

Visibility control

In the first example, we noted that module imports are not transitive: when main uses the hello module, it must import <string_view> itself. To expose the string_view imported by hello to hello's users, declare it explicitly with export import:

Once the hello module explicitly exports string_view, main no longer needs to import <string_view> itself.
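A minimal sketch of `export import`, assuming header-unit support for `<string_view>` and `<iostream>`. It contrasts a re-exported import with a plain one that stays private to the module.

```cpp
// ---- module_hello.cppm ----
export module hello;
export import <string_view>;   // re-exported: every importer of hello also gets <string_view>
import <iostream>;             // plain import: stays private to the hello module

export void say_hello_to(std::string_view const &name) {
    std::cout << "Hello " << name << "!\n";
}

// ---- main.cpp ----
import hello;                  // no separate `import <string_view>;` needed any more

int main() {
    std::string_view name = "Netease";   // visible through hello's export import
    say_hello_to(name);
    // std::cout would NOT be visible here: hello imports <iostream> without export.
    return 0;
}
```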

As a module grows further, splitting it into one interface file and one implementation file is no longer enough: the implementation file becomes very large and hard to maintain. C++20 modules support submodules for this case.

This time module_hello.cppm neither defines nor declares any functions. It only explicitly exports the two submodules hello.sub_a and hello.sub_b. The functions that users need are defined in those two submodules, and module_hello.cppm acts as an aggregator.

The submodule hello.sub_a separates interface and implementation: the declarations are given in a ".cppm" file and the implementations in a ".cpp" file.

module hello.sub_b is the same as above, so I won’t repeat it.

In this way, the interface and implementation files of the hello module are split into two sub-modules, and each sub-module has its own interface and implementation files.

Note that C++20 submodules are a simulated mechanism: hello.sub_b is a complete, independent module, and the dot in its name does not express any syntactic parent/child relationship. This differs from the naming rules for identifiers such as function and variable names: module names may contain dots, but the dots only help programmers understand the logical relationship between modules.
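A sketch of the submodule layout described above. The function bodies are placeholder values, and each comment-delimited section is a separate file; it assumes a compiler with C++20 module support.

```cpp
// ---- module_hello_sub_a.cppm: interface of the "submodule" ----
export module hello.sub_a;   // an ordinary, complete module whose name contains a dot
export int func_a();

// ---- module_hello_sub_a.cpp ----
module hello.sub_a;
int func_a() { return 1; }   // placeholder body

// ---- module_hello_sub_b.cppm ----
export module hello.sub_b;
export int func_b();

// ---- module_hello_sub_b.cpp ----
module hello.sub_b;
int func_b() { return 2; }   // placeholder body

// ---- module_hello.cppm: the aggregator ----
export module hello;
export import hello.sub_a;
export import hello.sub_b;

// ---- main.cpp ----
import hello;                // brings in both func_a and func_b
// import hello.sub_a;       // alternatively, import a single "submodule" directly

int main() {
    return func_a() + func_b() == 3 ? 0 : 1;
}
```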

Module Partition

Besides submodules, Module Partition is another mechanism for handling complex modules. The author never found a fitting Chinese translation for the term (it could be rendered literally as "module partition"), so the English name Module Partition is used below. There are two kinds of Module Partition:

module implementation partition
module interface partition

A module implementation partition can be roughly understood as splitting a module's implementation file into several files. module_hello.cppm gives the module declaration and the declarations of the exported functions.

Part of the implementation code of the module is split into the module_hello_partition_internal.cpp file, which implements an internal method internal_helper.

Another part of the module is split into the module_hello.cpp file, which implements func_a, func_b, and references the internal method internal_helper (func_a, func_b can of course be split into two cpp files).

Note that when importing a module partition from inside the module, you cannot write import hello:internal;. Instead, write import :internal; directly.
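A sketch of a module implementation partition matching the three files described above. The function bodies are placeholders, each comment-delimited section is a separate file, and a compiler with C++20 module support is assumed.

```cpp
// ---- module_hello.cppm: primary interface, declarations only ----
export module hello;
export int func_a();
export int func_b();

// ---- module_hello_partition_internal.cpp: implementation partition ----
module hello:internal;                 // "module M:P;" without export
int internal_helper() { return 40; }  // module linkage: never visible to importers

// ---- module_hello.cpp: implementation file ----
module hello;
import :internal;                      // NOT `import hello:internal;`
int func_a() { return internal_helper() + 1; }
int func_b() { return internal_helper() + 2; }
```

Because `internal_helper` lives in an implementation partition, code that does `import hello;` can call `func_a` and `func_b` but cannot name `internal_helper`.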

A module interface partition can be understood as splitting a module's declarations into multiple files. In the implementation-partition example, all function declarations are concentrated in a single file; module interface partitions let you spread these declarations across multiple interface files.

First define an internal helper: internal_helper:

Part a of the hello module declares and defines its function together, in module_hello_partition_a.cppm:

Part b of the hello module separates declaration from definition; module_hello_partition_b.cppm contains only declarations:

module_hello_partition_b.cpp gives the corresponding implementation of part b of the hello module:

module_hello.cppm again acts as the aggregator, exporting parts a and b of the module for external use:

Module implementation partitions are fairly intuitive: they correspond to the everyday situation of "one header file declares, several .cpp files implement". Module interface partitions resemble submodules somewhat, but differ from them in several syntactic respects:

The first line of module_hello_partition_b.cpp cannot be import hello:partition_b;. Although this would seem more intuitive, it is not allowed.

Every module interface partition must ultimately be exported by the primary module interface file; none can be omitted.

The primary module interface file can export only module interface partitions, not module implementation partitions, so export import :internal; in module_hello.cppm is wrong.
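A sketch of the interface-partition layout described above, with placeholder function bodies. Each comment-delimited section is a separate file; it shows where `import :internal;` is used and which partitions the primary interface re-exports.

```cpp
// ---- module_hello_partition_internal.cpp: implementation partition ----
module hello:internal;
int internal_helper() { return 40; }

// ---- module_hello_partition_a.cppm: declaration + definition together ----
export module hello:partition_a;
import :internal;
export int func_a() { return internal_helper() + 1; }

// ---- module_hello_partition_b.cppm: declaration only ----
export module hello:partition_b;
export int func_b();

// ---- module_hello_partition_b.cpp: implementation of part b ----
module hello;                     // an ordinary implementation file of hello
import :internal;
int func_b() { return internal_helper() + 2; }

// ---- module_hello.cppm: primary module interface ----
export module hello;
export import :partition_a;       // every interface partition must be exported here
export import :partition_b;
// export import :internal;       // ill-formed: implementation partitions cannot be exported
```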

Although both are mechanisms for handling large modules, the essential difference between Module Partition and submodules is that submodules can be imported independently by external users, whereas a Module Partition is visible only inside its module.

Global module fragment

Before C++20, large amounts of code and many header files did not support modules. Such code implicitly belongs to the global module. Module code interacts with it through the global module fragment: a section at the top of a module file, introduced by module;, in which ordinary #include directives can appear.

In fact, since most standard library headers have not yet been modularized (Visual Studio has modularized some of them), the module code in the earlier examples cannot be compiled directly with current compilers (Clang 12): modules such as <iostream> cannot yet be imported directly. The global module fragment offers a convenient transition, because #include <iostream> can be written directly inside it. Another transition path is the Module Map mechanism introduced in the next section, which lets us compile the old iostream header into a Module.
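A sketch of the global module fragment. Because it uses plain `#include` instead of header units, it only assumes basic C++20 module support; each comment-delimited section is a separate file.

```cpp
// ---- module_hello.cppm ----
module;                      // begins the global module fragment
#include <iostream>          // only preprocessor directives may appear here
#include <string_view>

export module hello;         // the module purview starts here

export void say_hello_to(std::string_view name) {
    std::cout << "Hello " << name << "!\n";
}

// ---- main.cpp ----
#include <string_view>       // main includes what it names itself
import hello;

int main() {
    std::string_view name = "Netease";
    say_hello_to(name);
    return 0;
}
```

The declarations pulled in by the `#include` lines belong to the global module, so they are usable inside hello but are not exported to importers of hello.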

Module Map

The Module Map mechanism maps ordinary header files to Modules, so that legacy code can also benefit from modules. Take the Module Map mechanism in Clang 13 as an example:

Suppose there is a header file a.h with a long history that does not support modules:

By writing a module.modulemap file for the Clang compiler, the header file can be mapped to a module inside that file:

The compilation script compiles the three modules A, ctype, and iostream in turn, and then compiles the main file:

First, the -fmodule-map-file parameter specifies a module map file, and -fmodule specifies which module defined in the map file is compiled from its header into a pcm. When the main file uses modules such as A and iostream, it likewise needs the -fmodule-map-file parameter to specify the module map file, and -fmodule to specify the names of the modules it depends on.

Note: information about the Module Map mechanism is scarce, and the author has not been able to pin down some details, such as:

After a header file is modularized by a Module Map, how are the macros exposed by that header handled?

If the entities declared in the header file are implemented across multiple .cpp files, how should their compilation be organized?

Module and Namespace

Modules and namespaces are orthogonal concepts, and namespaces can also be declared inside a Module:
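A sketch of namespaces inside a module; the namespace names `hello`, `detail`, and `util` are illustrative assumptions. The module decides what is exported, while the namespace only decides how names are qualified. Each comment-delimited section is a separate file.

```cpp
// ---- module_hello.cppm ----
export module hello;

// A namespace may share the module's name, but the two are unrelated.
export namespace hello {
    int func_a() { return 1; }          // everything inside an exported namespace is exported
}

namespace detail {
    int helper() { return 2; }          // not exported: stays inside the module
}

namespace util {
    export int func_b() { return detail::helper(); }   // export only selected members
}

// ---- main.cpp ----
import hello;

int main() {
    // Names are qualified by namespace, never by module name.
    return hello::func_a() + util::func_b() == 3 ? 0 : 1;
}
```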

Summary

Finally, compared with the shortcomings of the header files mentioned at the beginning, the module mechanism has the following advantages:

No recompilation: a module is a translation unit that is compiled once into a pcm. Whenever the compiler later meets code that imports the module, it looks up function declarations and other information in the pcm. This can greatly speed up the compilation of C++ code.

Better isolation: what a module imports does not leak out of the module unless it is explicitly re-exported with an export import statement.

Order independence: when importing multiple modules, there is no need to care about the order in which they are imported.

Less redundancy and inconsistency: an entity can be exported and defined directly in a single .cppm file, although large modules will still split declarations and implementations into different files.

Submodules, Module Partition, and related mechanisms make large and very large modules more flexible to organize.

The global module fragment and the Module Map mechanism allow Modules to interoperate with legacy header files.

There are also disadvantages:

Unstable compiler support: no compiler yet fully supports all Module features, and the Module Map feature supported by Clang 13 may not make it into a mainline release.

More complex compilation: you need to analyze the dependencies and compile the most basic modules first.

Existing C++ projects need to reorganize their build pipelines. No automated build system supports this yet, so build scripts have to be written by hand according to the dependency graph, which is very difficult in practice.

What can't Modules do?

Modules do not enable binary distribution of code; modules still have to be distributed as source code.

pcm files are not portable: pcm files produced by different compilers cannot be shared, and neither can pcm files produced by the same compiler with different parameters.

Modules cannot be built automatically; at this stage, build scripts have to be organized by hand.

How does the compiler hide a Module's internal symbols from the outside world?

Before modules, a symbol had either external linkage (it can be shared between files) or internal linkage (it can only be used inside one file), controlled with keywords such as extern and static.

The Module mechanism adds module linkage: such symbols can be shared across the whole module (which may consist of multiple partition files).

Symbols exported by a module are name-mangled according to the existing rules (external linkage).

Symbols internal to a module get an extra "_Zw" name decoration in front of the symbol name, so the linker will not link other code against these internal symbols.

Support for the module mechanism in the three major compilers, as of July 2020:

That is all for this article. We have introduced one of the four major features of C++20; follow-up articles will cover the other three (concepts, ranges, and coroutines), so stay tuned. The article inevitably has omissions and shortcomings; please leave a comment to discuss them with us.


网易数智 (NetEase Shuzhi)
