What Exactly Are C++ Modules

What exactly are C++ modules?

Motivation

The simplistic answer is that a C++ module is like a header that is also a translation unit. It is like a header in that you can use it (with import, which is a new contextual keyword) to gain access to declarations from a library. Because it is a translation unit (or several for a complicated module), it is compiled separately and only once. (Recall that #include literally copies the contents of a file into the translation unit that contains the directive.) This combination yields a number of advantages:

  1. Isolation: because a module unit is a separate translation unit, it has its own set of macros and using declarations/directives that neither affect nor are affected by those in the importing translation unit or any other module. This prevents collisions between an identifier #defined in one header and used in another. While use of using still should be judicious, it is not intrinsically harmful to write even using namespace at namespace scope in a module interface.
  2. Interface control: because a module unit can declare entities with internal linkage (with static or namespace {}), with export (the keyword reserved for purposes like these since C++98), or with neither, it can restrict how much of its contents are available to clients. This replaces the namespace detail idiom which can conflict between headers (that use it in the same containing namespace).
  3. Deduplication: because in many cases it is no longer necessary to provide a declaration in a header file and a definition in a separate source file, redundancy and the associated opportunity for divergence are reduced.
  4. One Definition Rule violation avoidance: the ODR exists solely because of the need to define certain entities (types, inline functions/variables, and templates) in every translation unit that uses them. A module can define an entity just once and nonetheless provide that definition to clients. Also, existing headers that already violate the ODR via internal-linkage declarations stop being ill-formed, no diagnostic required, when they are converted into modules.
  5. Non-local variable initialization order: because import establishes a dependency order among translation units that contain (unique) variable definitions, there is an obvious order in which to initialize non-local variables with static storage duration. C++17 supplied inline variables with a controllable initialization order; modules extend that to normal variables (and do not need inline variables at all).
  6. Module-private declarations: entities declared in a module that neither are exported nor have internal linkage are usable (by name) by any translation unit in the module, providing a useful middle ground between the preexisting choices of static or not. While it remains to be seen what exactly implementations will do with these, they correspond closely to the notion of “hidden” (or “not exported”) symbols in a dynamic object, providing a potential language recognition of this practical dynamic linking optimization.
  7. ABI stability: the rules for inline (whose ODR-compatibility purpose is not relevant in a module) have been adjusted to support (but not require!) an implementation strategy where non-inline functions can serve as an ABI boundary for shared library upgrades.
  8. Compilation speed: because the contents of a module do not need to be reparsed as part of every translation unit that uses them, in many cases compilation proceeds much faster. It's worth noting that the critical path of compilation (which governs the latency of infinitely parallel builds) can actually be longer, because modules must be processed separately in dependency order, but the total CPU time is significantly reduced, and rebuilds of only some modules/clients are much faster.
  9. Tooling: the “structural declarations” involving import and module have restrictions on their use to make them readily and efficiently detectable by tools that need to understand the dependency graph of a project. The restrictions also allow most if not all existing uses of those common words as identifiers.
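The isolation point (item 1) is easiest to see by simulating what textual inclusion does. The following is a minimal single-file sketch, not module code: the header names legacy.hpp and other.hpp, the macro BUFFER_SIZE, and the function other_buffer_size() are all hypothetical and stand in for two unrelated headers pasted into one translation unit.

```cpp
// Single-file simulation of two headers pasted into one translation unit.

// ---- what "legacy.hpp" contributes (hypothetical header) ----
#define BUFFER_SIZE 16              // leaks into everything included later

// ---- what "other.hpp" contributes (hypothetical header) ----
#ifndef BUFFER_SIZE                 // intended as a private default...
#define BUFFER_SIZE 64
#endif
constexpr int other_buffer_size() { return BUFFER_SIZE; }

// other.hpp meant 64, but include order silently decided otherwise.
static_assert(other_buffer_size() == 16, "the earlier macro won");
```

If both pieces were named modules instead, neither macro would be visible to the other or to the importer, so the result would not depend on the order of the imports.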

Approach

Because a name declared in a module must be found in a client, a significant new kind of name lookup is required that works across translation units; getting correct rules for argument-dependent lookup and template instantiation was a significant part of what made this proposal take over a decade to standardize. The simple rule is that (aside from being incompatible with internal linkage for obvious reasons) export affects only name lookup; any entity available via (e.g.) decltype or a template parameter has exactly the same behavior regardless of whether it is exported.
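The claim that export affects only name lookup has a rough analogy in ordinary C++, sketched below under stated assumptions. A local class cannot be named by clients, yet it behaves like any other type once it is reached through decltype or a template parameter. The names make_counter, Counter, CounterType, and advance_twice are hypothetical, and the module form in the trailing comment is an uncompiled sketch.

```cpp
// Analogy in ordinary C++ (no modules needed): a type whose name
// clients cannot look up, yet which behaves normally once obtained.
inline auto make_counter() {
    struct Counter {                 // name is local to make_counter()
        int value = 0;
        int next() { return ++value; }
    };
    return Counter{};
}

// Clients cannot write "Counter", but decltype reaches the same entity.
using CounterType = decltype(make_counter());

// Templates accept it like any other type.
template <class T>
int advance_twice(T& c) { c.next(); return c.next(); }

// In a module, the corresponding shape would be (sketch, not compiled here):
//   export module counters;          // hypothetical module name
//   struct Counter { /* as above */ };   // not exported: name invisible to importers
//   export Counter make_counter();   // exported: importers use auto/decltype
```

A client would write `auto c = make_counter();` and call `c.next()` without ever spelling the type's name, which is how a non-exported class returned from an exported function is used.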

Because a module must be able to provide types, inline functions, and templates to its clients in a way that allows their contents to be used, typically a compiler generates an artifact when processing a module (sometimes called a Compiled Module Interface) that contains the detailed information needed by the clients. The CMI is similar to a pre-compiled header, but does not have the restrictions that the same headers must be included, in the same order, in every relevant translation unit. It is also similar to the behavior of Fortran modules, although there is no analog to their feature of importing only particular names from a module.

Because the compiler must be able to find the CMI based on import foo; (and find source files based on import :partition;), it must know some mapping from “foo” to the (CMI) file name. Clang has established the term “module map” for this concept; in general, it remains to be seen just how to handle situations like implicit directory structures or module (or partition) names that don’t match source file names.

Non-features

Like other “binary header” technologies, modules should not be taken to be a distribution mechanism (as much as those of a secretive bent might want to avoid providing headers and all the definitions of any contained templates). Nor are they “header-only” in the traditional sense, although a compiler could regenerate the CMI for each project using a module.

While in many other languages (e.g., Python), modules are units not only of compilation but also of naming, C++ modules are not namespaces. C++ already has namespaces, and modules change nothing about their usage and behavior (partly for backward compatibility). It is to be expected, however, that module names will often align with namespace names, especially for libraries with well-known namespace names that would be confusing as the name of any other module. (A nested::name may be rendered as a module name nested.name, since . and not :: is allowed there; a . has no significance in C++20 except as a convention.)

Modules also do not obsolete the pImpl idiom or prevent the fragile base class problem. If a class is complete for a client, then changing that class still requires recompiling the client in general.
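For reference, a minimal sketch of the pImpl idiom mentioned above, written as a single file for brevity. The class Widget, its member count, and the value 42 are hypothetical. The point is that clients see only an incomplete Impl, so changing Impl does not change the layout of Widget, and this holds whether Widget is delivered by a header or by a module interface.

```cpp
#include <memory>

// ---- what the client sees (header or exported interface) ----
class Widget {
public:
    Widget();
    ~Widget();                        // defined where Impl is complete
    int count() const;
private:
    struct Impl;                      // incomplete for clients
    std::unique_ptr<Impl> impl_;
};

// ---- what stays in the implementation file ----
struct Widget::Impl {
    int count = 42;                   // adding members here does not change sizeof(Widget)
};

Widget::Widget() : impl_(std::make_unique<Impl>()) {}
Widget::~Widget() = default;          // Impl is complete here, so unique_ptr can delete it
int Widget::count() const { return impl_->count; }
```

The destructor is declared in the class and defined after Impl because std::unique_ptr needs a complete type at the point where it destroys the object.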

Finally, modules do not provide a mechanism to provide the macros that are an important part of the interface of some libraries; it is possible to provide a wrapper header that looks like

// wants_macros.hpp
import wants.macros;
#define INTERFACE_MACRO(x) (wants::f(x),wants::g(x))

(You don't even need #include guards unless there might be other definitions of the same macro.)

Multi-file modules

A module has a single primary interface unit that contains export module A;: this is the translation unit processed by the compiler to produce the data needed by clients. It may recruit additional interface partitions that contain export module A:sub1;; these are separate translation units but are included in the one CMI for the module. It is also possible to have implementation partitions (module A:impl1;) that can be imported by the interface without providing their contents to clients of the overall module. (Some implementations may leak those contents to clients anyway for technical reasons, but this never affects name lookup.)

Finally, (non-partition) module implementation units (with simply module A;) provide nothing at all to clients, but can define entities declared in the module interface (which they implicitly import). All translation units of a module can use anything declared in another part of the same module that they import so long as it does not have internal linkage (in other words, they ignore export).
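The kinds of unit described above can be laid out as in the following sketch. The file names (the .cppm extension is a Clang convention) and the functions one(), helper(), and twice() are assumptions for illustration. The listing shows four separate files, so it is not compilable as a single file, and how each file is passed to the compiler varies by implementation.

```cpp
// ---- A.cppm: primary interface unit ----
export module A;
export import :sub1;      // re-export the interface partition to clients
import :impl1;            // implementation partition: usable here, not re-exported
export int twice(int x);  // declared here, defined in A.cpp

// ---- A-sub1.cppm: interface partition ----
export module A:sub1;
export int one() { return 1; }

// ---- A-impl1.cppm: implementation partition ----
module A:impl1;
int helper(int x) { return x + x; }   // not exported: nameable only inside module A

// ---- A.cpp: module implementation unit ----
module A;                 // implicitly imports the interface of A
import :impl1;            // uses helper() from the implementation partition
int twice(int x) { return helper(x); }
```

A client that writes `import A;` can call one() and twice(), but name lookup will not find helper().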

As a special case, a single-file module can contain a module :private; declaration that effectively packages an implementation unit with the interface; this is called a private module fragment. In particular, it can be used to define a class while leaving it incomplete in a client (which provides binary compatibility but will not prevent recompilation with typical build tools).

Upgrading

Converting a header-based library to a module is neither a trivial nor a monumental task. The required boilerplate is very minor (two lines in many cases), and it is possible to put export {} around relatively large sections of a file (although there are unfortunate limitations: no static_assert declarations or deduction guides may be enclosed). Generally, a namespace detail {} can either be converted to namespace {} or simply left unexported; in the latter case, its contents may often be moved to the containing namespace. Class members need to be explicitly marked inline if it is desired that even ABI-conservative implementations inline calls to them from other translation units.

Of course, not all libraries can be upgraded instantaneously; backward compatibility has always been one of C++’s emphases, and there are two separate mechanisms to allow module-based libraries to depend on header-based libraries (based on those supplied by initial experimental implementations). (In the other direction, a header can simply use import like anything else even if it is used by a module in either fashion.)

As in the Modules Technical Specification, a global module fragment may appear at the beginning of a module unit (introduced by a bare module;) that contains only preprocessor directives: in particular, #includes for the headers on which a module depends. It is possible in most cases to instantiate a template defined in a module that uses declarations from a header it includes because those declarations are incorporated into the CMI.

There is also the option to import a “modular” (or importable) header (import "foo.hpp";): what is imported is a synthesized header unit that acts like a module except that it exports everything it declares—even things with internal linkage (which may (still!) produce ODR violations if used outside the header) and macros. (It is an error to use a macro given different values by different imported header units; command-line macros (-D) aren't considered for that.) Informally, a header is modular if including it once, with no special macros defined, is sufficient to use it (rather than it being, say, a C implementation of templates with token pasting). If the implementation knows that a header is importable, it can replace an #include of it with an import automatically.

In C++20, the standard library is still presented as headers; all the C++ headers (but not the C headers or <cmeow> wrappers) are specified to be importable. C++23 will presumably additionally provide named modules (though perhaps not one per header).

Example

A very simple module might be

export module simple;
import <string_view>;
import <memory>;
using std::unique_ptr; // not exported
int *parse(std::string_view s) {/*…*/} // cannot collide with other modules
export namespace simple {
auto get_ints(const char *text)
{return unique_ptr<int[]>(parse(text));}
}

which could be used as

import simple;
int main() {
return simple::get_ints("1 1 2 3 5 8")[0]-1;
}

Conclusion

Modules are expected to improve C++ programming in a number of ways, but the improvements are incremental and (in practice) gradual. The committee has strongly rejected the idea of making modules a “new language” (e.g., that changes the rules for comparisons between signed and unsigned integers) because it would make it more difficult to convert existing code and would make it hazardous to move code between modular and non-modular files.

MSVC has had an implementation of modules (closely following the TS) for some time. Clang has had an implementation of importable headers for several years as well. GCC has a functional but incomplete implementation of the standardized version.

What is a module in C?

In this context, a module is a Linux kernel module. Section 1.1 mentions:

So, you want to write a kernel module. [..] now you want to get to
where the real action is, What exactly is a kernel module? Modules
are pieces of code that can be loaded and unloaded into the kernel
upon demand. They extend the functionality of the kernel [..]. For
example, one type of module is the device driver, which allows the
kernel to access hardware connected to the system.

Then, in section 2.3:

The module_init() and module_exit() macros register your initialization and cleanup functions.

Example:

module_init(hello_2_init);
module_exit(hello_2_exit);

where both of these dummy functions call printk() to say hello/goodbye world.

In section 3.1.1:

A module always begins with either the init_module() or the function you specify with module_init() call. This is the entry function for modules; it tells the kernel what functionality the module provides and sets up the kernel to run the module's functions when they're needed.

All modules end by calling either cleanup_module() or the function you specify with the module_exit() call. This is the exit function for modules; it undoes whatever entry function did. It unregisters the functionality that the entry function registered.

C++ Modules - why were they removed from C++0x? Will they be back later on?

From the State of C++ Evolution (Post San Francisco 2008), the Modules proposal was categorized as "Heading for a separate TR:"

These topics are deemed too important to wait for another standard after C++0x before being published, but too experimental to be finalised in time for the next Standard. Therefore, these features will be delivered by a technical report at the earliest opportunity.

The modules proposal just wasn't ready, and waiting for it would have delayed finishing the C++0x standard. It wasn't really removed; it was just never incorporated into the working paper.

C++ Modules and the C++ ABI

Pre-compiled headers (PCH) are special files that certain compilers can generate for a .cpp file. What they are is exactly that: pre-compiled source code. They are source code that has been fed through the compiler and built into a compiler-dependent format.

PCHs are commonly used to speed up compilation. You put commonly used headers in the PCH, then just include the PCH. When you do a #include on the PCH, your compiler does not actually do the usual #include work. It instead loads these pre-compiled symbols directly into the compiler. No running a C++ preprocessor. No running a C++ compiler. No #including a million different files. One file is loaded and symbols appear fully formed directly in your compiler's workspace.

I mention all that because modules are PCHs in their perfect form. PCHs are basically a giant hack built on top of a system that doesn't allow for actual modules. The purpose of modules is ultimately to be able to take a file, generate a compiler-specific module file that contains symbols, and then some other file loads that module as needed. The symbols are pre-compiled, so again, there is no need to #include a bunch of stuff, run a compiler, etc. Your code says, import thing.foo, and it appears.

Look at any of the STL-derived standard library headers. Take <map> for example. Odds are good that this file is either gigantic or has a lot of #inclusions of other files that make the resulting file gigantic. That's a lot of C++ parsing that has to happen. It must happen for every .cpp file that has #include <map> in it. Every time you compile a source file, the compiler has to recompile the same thing. Over. And over. And over again.

Does <map> change between compilations? Nope, but your compiler can't know that. So it has to keep recompiling it. Every time you touch a .cpp file, it must compile every header that this .cpp file includes. Even though you didn't touch those headers or source files that affect those headers.

PCH files were a way to get around this problem. But they are limited, because they're just a hack. You can only include one per .cpp file, because it must be the first thing included by .cpp files. And since there is only one PCH, if you do something that changes the PCH (like add a new header to it), you have to recompile everything in that PCH.

Modules have essentially nothing to do with cross-compiler ABI (though having one of those would be nice, and modules would make it a bit easier to define one). Their fundamental purpose is to speed up compile times.

What is the difference between a clang (objective-C) module and a Swift module?

They are different. At the end of the build process, though, both need to be linked with your application's or library's other .o and .dylib files for it to run.

Swift modules

  • From Swift Serialization.md docs:

    The fundamental unit of distribution for Swift code is a module. A module contains declarations as an interface for clients to write code against.

  • Swift access control docs:

    A module is a single unit of code distribution: a framework or application that’s built and shipped as a single unit and that can be imported by another module with Swift’s import keyword.

  • Configured by .target()'s in Package.swift

  • Cannot have submodules, so users cannot import Module.Submodule in Swift. Users can still import specific entities, import struct PackageModel.Manifest, but this is a lot more verbose than importing submodules.

  • Its interface exists as a .swiftmodule. What is a .swiftmodule? The documentation says:

    Conceptually, the file containing the interface for a module serves much the same purpose as the collection of C header files for a particular library.


  • The compiler produces this .swiftmodule file a lot like a generated Objective-C header, but instead of text, it's a binary representation. It includes the bodies of inlinable functions, much like static inline functions in Objective-C or header implementations in C++. However, Swift modules do include the names and types of private declarations. This allows you to refer to them in the debugger, but it does mean you shouldn't name a private variable after your deepest darkest secret. (From WWDC 2018: Behind the Scenes of the Xcode Build Process.)

    • So private declarations are exposed in your .swiftmodule (Swift module interface).
  • When importing pure Objective-C frameworks into Swift, the Swift compiler uses its built-in clang compiler to import an Objective-C header.

The importer finds declarations in the headers exposed in Clang's .modulemap for that framework. (Again, from WWDC 2018.)

  • When importing Objective-C + Swift frameworks into Swift, the Swift compiler uses the Umbrella header.

Clang modules

  • Configured by a YourModuleName.modulemap file (previously module.map, but this is deprecated).
  • Can have submodules, e.g. std module has std.io and std.complex.
  • A clang module exposes header files specified in the module map. Private details (in .m) are not exposed at all.
  • Is an improvement of the original #include or #import style imports to improve the build process (This is a big topic, read the Clang module docs).

