Benefits of Header-Only Libraries

Benefits of header-only libraries

There are situations when a header-only library is the only option, for example when dealing with templates.

Having a header-only library also means you don't have to worry about different platforms where the library might be used. When you separate the implementation, you usually do so to hide implementation details, and distribute the library as a combination of headers and libraries (lib, dll's or .so files). These of course have to be compiled for all different operating systems/versions you offer support.

You could also distribute the implementation files, but that would mean an extra step for the user - compiling your library before using it.

Of course, this applies on a case-by-case basis. For example, header-only libraries sometimes increase ~~code size &~~ compilation times.

Does the usage of header only libraries with different versions result in UB

It hardly makes any difference that you are using a static library.

The C++ standard states that if in a program there is multiple definitions of an inline function (or class template, or variable, etc.) and all the definitions are not the same, then you have UB.

Practically, it means that unless the changes between the 2 versions of the header library are very limited you will have UB.
For instance, if the only changes are whitespace changes, comments, or adding new symbols, then you will not have undefined behavior. However, if a single line of code in an existing function was changed, then it is UB.

From the C++17 final working draft (n4659.pdf):

6.2 One-definition rule

[...]

There can be more than one definition of a class type (Clause 12),
enumeration type (10.2), inline function with external linkage
(10.1.6), inline variable with external linkage (10.1.6), class
template (Clause 17), non-static function template (17.5.6), static
data member of a class template (17.5.1.3), member function of a class
template (17.5.1.1), or template specialization for which some
template parameters are not specified in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the
following requirements.

Given such an entity named D defined in more than one translation
unit, then

each definition of D shall consist of the same
sequence of tokens; and

in each definition of D, corresponding
names, looked up according to 6.4, shall refer to an entity defined
within the definition of D, or shall refer to the same entity, after
overload resolution (16.3) and after matching of partial template
specialization (17.8.3), except that a name can refer to (6.2.1)

a non-volatile const object with internal or no linkage if the object

has the same literal type in all definitions of D,
(6.2.1.2)

is initialized with a constant expression (8.20),

is not odr-used in any definition of D, and

has the same value in all definitions of D,

or

a reference with internal or no linkage initialized with a constant expression
such that the reference refers to the same entity in all definitions
of D; and (6.3)

in each definition of D, corresponding entities
shall have the same language linkage; and

in each definition
of D, the overloaded operators referred to, the implicit calls to
conversion functions, constructors, operator new functions and
operator delete functions, shall refer to the same function, or to a
function defined within the definition of D; and

in each definition of
D, a default argument used by an (implicit or explicit) function call
is treated as if its token sequence were present in the definition of
D; that is, the default argument is subject to the requirements
described in this paragraph (and, if the default argument has
subexpressions with default arguments, this requirement applies
recursively).28

if D is a class with an implicitly-declared
constructor (15.1), it is as if the constructor was implicitly defined
in every translation unit where it is odr-used, and the implicit
definition in every translation unit shall call the same constructor
for a subobject of D.

If D is a template and is defined in more than one translation unit,
then the preceding requirements shall apply both to names from the
template’s enclosing scope used in the template definition (17.6.3),
and also to dependent names at the point of instantiation (17.6.2). If
the definitions of D satisfy all these requirements, then the behavior
is as if there were a single definition of D. If the definitions of D
do not satisfy these requirements, then the behavior is undefined.

When are header-only libraries acceptable?

I work for a company that has a "Middleware" department of its own to maintain a few hundreds of libraries that are commonly used by a great many teams.

Despite being in the same company, we shy from header only approach and prefer to favor binary compability over performance because of the ease of maintenance.

The general consensus is that the performance gain (if any) would not be worth the trouble.

Furthermore, the so called "code-bloat" may have a negative impact on performance as more code to be loaded in the cache implies more cache miss, and those are the performance killers.

In an ideal world I suppose that the compiler and the linker could be intelligent enough NOT to generate those "multiple definitions" rules, but as long as it is not the case, I will (personally) favor:

binary compatibility
non-inlining (for methods that are more than a couple of lines)

Why don't you test ? Prepare the two libraries (one header only and the other without inlining methods over a couple of lines) and check their respective performance in YOUR case.

EDIT:

It's been pointed out by 'jalf' (thanks) that I should precise what I meant exactly by binary compatibility.

2 versions of a given library are said binary compatible if you can (usually) link against one or the other without any change of your own library.

Because you can only link with one version of a given library Target, all the libraries loaded that use Target will effectively use the same version... and here is the cause of the transitivity of this property.

MyLib --> Lib1 (v1), Lib2 (v1)
Lib1 (v1) --> Target (v1)
Lib2 (v1) --> Target (v1)

Now, say that we need a fix in Target for a feature only used by Lib2, we deliver a new version (v2). If (v2) is binary compatible with (v1), then we can do:

Lib1 (v1) --> Target (v2)
Lib2 (v1) --> Target (v2)

However if it's not the case, then we will have:

Lib1 (v2) --> Target (v2)
Lib2 (v2) --> Target (v2)

Yep, you read it right, even though Lib1 did not required the fix, you head to rebuild it against a new version of Target because this version is mandatory for the updated Lib2 and Executable can only link against one version of Target.

With a header-only library, since you don't have a library, you are effectively not binary compatible. Therefore each time you make some fix (security, critical bug, etc...) you need to deliver a new version, and all the libraries that depend on you (even indirectly) will have to be rebuilt against this new version!

Should C libraries be header only?

Personally, I'm a little confused by that fact since I'm comparing it
to the fact that the standard library is all header only.

No conforming implementation of the C standard library is header-only, at least not in the sense you mean. We know this, among other reasons, because the language specification says that

Provided that a library function can be declared without reference to
any type defined in a header, it is also permissible to declare the
function and use it without including its associated header.

(C2017, paragraph 7.1.4/2)

That would not be possible if the standard library were header-only. The natural and usual way to implement it is as a bona fide normal library, with normal(-ish) headers containing declarations of the provided functions and data types. Do not be confused by the fact that most C implementations link that library to your programs automatically, without you having to request it.

So what's the right way to build a library?

There is no one right way to build a library, but header-only libraries are very rare in the C world. They are more common in C++, but not the norm there, either. (In particular, the overall C++ standard library also is not header-only.) Without making it a matter of right vs wrong, I would recommend that you not devote any extra effort to producing header-only libraries until and unless you can articulate for yourself a specific purpose that would be served by doing so, and can justify to yourself that the benefits will outweigh the costs. I wouldn't be surprised if you never reach such a point.

How do I create a header-only library?

You can use the inline keyword:

// header.hpp (included into multiple translation units)

void foo_bad() {} // multiple definitions, one in every translation unit :(

inline void foo_good() {} // ok :)

inline allows the linker to simply pick one definition and discard the rest.

(As such, if those definitions don't actually match, you get a good dose of undefined behavior...!)

As an aside, member functions defined within a class-type, are implicitly marked inline:

struct myclass
{
    void i_am_inline_implicitly()
    {
        // because my definition is here
    }

    void but_i_am_not();
    void neither_am_i();
};

inline void myclass::but_i_am_not()
{
    // but that doesn't mean my definition cannot be explicitly inline
}

void myclass::neither_am_i()
{
    // but in this case, no inline for me :(
}

Quantifiable metrics (benchmarks) on the usage of header-only c++ libraries

Summary (notable points):

Two packages benchmarked (one with 78 compilation units, one with 301 compilation units)
Traditional Compiling (Multi Unit Compilation) resulted in a 7% faster application (in the 78 unit package); no change in application runtime in the 301 unit package.
Both Traditional Compiling and Header-only benchmarks used the same amount of memory when running (in both packages).
Header-only Compiling (Single Unit Compilation) resulted in an executable size that was 10% smaller in the 301 unit package (only 1% smaller in the 78 unit package).
Traditional Compiling used about a third of the memory to build over both packages.
Traditional Compiling took three times as long to compile (on the first compilation) and took only 4% of the time on recompile (as header-only has to recompile the all sources).
Traditional Compiling took longer to link on both the first compilation and subsequent compilations.

Box2D benchmark, data:

box2d_data_gcc.csv

Botan benchmark, data:

botan_data_gcc.csv

Box2D SUMMARY (78 Units)

Sample Image

Botan SUMMARY (301 Units)

Sample Image

NICE CHARTS:

Box2D executable size:

Box2D executable size

Box2D compile/link/build/run time:

Box2D compile/link/build/run time

Box2D compile/link/build/run max memory usage:

Box2D compile/link/build/run max memory usage

Botan executable size:

Botan executable size

Botan compile/link/build/run time:

Botan compile/link/build/run time

Botan compile/link/build/run max memory usage:

Botan compile/link/build/run max memory usage

Benchmark Details

TL;DR

The projects tested, Box2D and Botan were chosen because they are potentially computationally expensive, contain a good number of units, and actually had few or no errors compiling as a single unit. Many other projects were attempted but were consuming too much time to "fix" into compiling as one unit. The memory footprint is measured by polling the memory footprint at regular intervals and using the maximum, and thus might not be fully accurate.

Also, this benchmark does not do automatic header dependency generation (to detect header changes). In a project using a different build system, this may add time to all benchmarks.

There are 3 compilers in the benchmark, each with 5 configurations.

Compilers:

gcc
icc
clang

Compiler configurations:

Default - default compiler options
Optimized native - -O3 -march=native
Size optimized - -Os
LTO/IPO native - -O3 -flto -march=native with clang and gcc, -O3 -ipo -march=native with icpc/icc
Zero optimization - -Os

I think these each can have different bearings on the comparisons between single-unit and multi-unit builds. I included LTO/IPO so we might see how the "proper" way to achieve single-unit-effectiveness compares.

Explanation of csv fields:

Test Name - name of the benchmark. Examples: Botan, Box2D.
Test Configuration - name a particular configuration of this test (special cxx flags etc.). Usually the same as Test Name.
Compiler - name of the compiler used. Examples: gcc,icc,clang.
Compiler Configuration - name of a configuration of compiler options used. Example: gcc opt native
Compiler Version String - first line of output of compiler version from the compiler itself. Example: g++ --version produces g++ (GCC) 4.6.1 on my system.
Header only - a value of True if this test case was built as a single unit, False if it was built as a multi-unit project.
Units - number of units in the test case, even if it is built as a single unit.
Compile Time,Link Time,Build Time,Run Time - as it sounds.
Re-compile Time AVG,Re-compile Time MAX,Re-link Time AVG,Re-link Time MAX,Re-build Time AVG,Re-build Time MAX - the times across rebuilding the project after touching a single file. Each unit is touched, and for each, the project is rebuilt. The maximum times, and average times are recorded in these fields.
Compile Memory,Link Memory,Build Memory,Run Memory,Executable Size - as they sound.

To reproduce the benchmarks:

The bullwork is run.py.
Requires psutil (for memory footprint measurements).
Requires GNUMake.
As it is, requires gcc, clang, icc/icpc in the path. Can be modified to remove any of these of course.
Each benchmark should have a data-file that lists the units of that benchmarks. run.py will then create two test cases, one with each unit compiled separately, and one with each unit compiled together. Example: box2d.data. The file format is defined as a json string, containing a dictionary with the following keys
- "units" - a list of c/cpp/cc files that make up the units of this project
- "executable" - A name of the executable to be compiled.
- "link_libs" - A space separated list of installed libraries to link to.
- "include_directores" - A list of directories to include in the project.
- "command" - optional. special command to execute to run the benchmark. For example, "command": "botan_test --benchmark"
Not all C++ projects can this be easily done with; there must be no conflicts/ambiguities in the single unit.
To add a project to the test cases, modify the list test_base_cases in run.py with the information for the project, including the data file name.
If everything runs well, the output file data.csv should contain the benchmark results.

To produce the bar charts:

You should start with a data.csv file produced by the benchmark.
Get chart.py. Requires matplotlib.
Adjust the fields list to decide which graphs to produce.
Run python chart.py data.csv.
A file, test.png should now contain the result.

Box2D

Box2D was used from svn as is, revision 251.
The benchmark was taken from here, modified here and might not be representative of a good Box2D benchmark, and it might not use enough of Box2D to do this compiler benchmark justice.
The box2d.data file was manually written, by finding all the .cpp units.

Botan

Using Botan-1.10.3.
Data file: botan_bench.data.
First ran ./configure.py --disable-asm --with-openssl --enable-modules=asn1,benchmark,block,cms,engine,entropy,filters,hash,kdf,mac,bigint,ec_gfp,mp_generic,numbertheory,mutex,rng,ssl,stream,cvc, this generates the header files and Makefile.
I disabled assembly, because assembly might intefere with optimizations that can occure when the function boundaries do not block optimization. However, this is conjecture and might be totally wrong.
Then ran commands like grep -o "\./src.*cpp" Makefile and grep -o "\./checks.*" Makefile to obtain the .cpp units and put them into botan_bench.data file.
Modified /checks/checks.cpp to not call the x509 unit tests, and removed x509 check, because of conflict between Botan typedef and openssl.
The benchmark included in the Botan source was used.

System specs:

OpenSuse 11.4, 32-bit
4GB RAM
Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz

How come the fmt library is not header-only?

The main reason is build speed as others already correctly pointed out. For example, compiling with a static library (the default) is ~2.75x faster than with a header-only one:

#include <fmt/core.h>

int main() {
  fmt::print("The answer is {}.", 42);
}

% time c++ -c test.cc -I include -std=c++11
c++ -c test.cc -I include -std=c++11  0.27s user 0.05s system 97% cpu 0.324 total

% time c++ -c test.cc -I include -std=c++11 -DFMT_HEADER_ONLY
c++ -c test.cc -I include -std=c++11 -DFMT_HEADER_ONLY  0.81s user 0.07s system 98% cpu 0.891 total

In header-only libraries implementation details and dependencies leak into every translation unit that uses them.

Inclusion of a header-only library to multiple files producing a linker error

Is there something special about these functions?

Yes, they were not marked as inline like all the other free functions in the algorithm namespace.

When you define a function in a header file you need to mark it as inline so that it can be defined in multiple translation units (included in multiple source files) without error.

Benefits of Header-Only Libraries