Standard Library ABI Compatibility

ABIs in practice are not tied to the language standard. For example, consider the following code compiled with gcc 4.9.4 and gcc 5.1 using the same flags:

-std=c++11 -O2

#include <string>

int main() {
    // The exit status carries the size of std::string for this toolchain.
    return sizeof(std::string);
}

On x86-64, gcc 4.9.4 returns 8 from main and gcc 5.1 returns 32.

As for guarantees, it is complicated:

Nothing is guaranteed by the standard.

In practice, MSVC used to break ABI compatibility between releases but has stopped doing so (v140, v141, and v142 use the same ABI), and clang and gcc have had a stable ABI for a long time.

For those interested in learning more:
For a broad discussion of ABIs and the C++ standard that is not directly related to this question, you can look at this blog post.

Please explain the C++ ABI

Although the C++ Standard doesn't prescribe any ABI, some actual implementations try hard to preserve ABI compatibility between versions of the toolchain. E.g. with GCC 4.x, it was possible to use a library linked against an older version of libstdc++, from a program that's compiled by a newer toolchain with a newer libstdc++. The older versions of the symbols expected by the library are provided by the newer libstdc++.so, and layouts of the classes defined in the C++ Standard Library are the same.

But when C++11 introduced new requirements on std::string and std::list, these couldn't be implemented in libstdc++ without changing the layout of these classes. This means that, unless you use the _GLIBCXX_USE_CXX11_ABI=0 kludge with GCC 5 and higher, you can't pass e.g. std::string objects between a GCC4-compiled library and a GCC5-compiled program. So the ABI was broken.
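A rough way to see which libstdc++ string ABI a translation unit is being built against is to inspect that macro. This is only a minimal sketch: the helper name describe_string_abi is made up for illustration, and on toolchains that do not use libstdc++ the macro is simply not defined.

```cpp
#include <cassert>
#include <string>   // including any standard header exposes libstdc++'s config macros

// Hypothetical helper: reports which std::string ABI this translation unit sees.
std::string describe_string_abi() {
#if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
    return "libstdc++, new (C++11-conforming) string ABI";
#elif defined(_GLIBCXX_USE_CXX11_ABI)
    return "libstdc++, old string ABI (_GLIBCXX_USE_CXX11_ABI=0)";
#elif defined(__GLIBCXX__)
    return "libstdc++ predating the dual ABI";
#else
    return "not libstdc++; the macro does not apply";
#endif
}
```

Compiling the same file once with -D_GLIBCXX_USE_CXX11_ABI=0 and once without it should yield different answers on GCC 5 or later, which is exactly the situation where the two builds cannot exchange std::string objects.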

Some C++ implementations don't try that hard to keep a compatible ABI: e.g. MSVC++ doesn't provide such compatibility between major compiler releases (see this question), so one has to provide different versions of a library for use with different versions of MSVC++.

So, in general, you can't mix and match libraries and executables compiled with different versions even of the same toolchain.

Why is library API + compiler ABI enough to ensure compatibility between objects with different versions of gcc?

It seems like it is possible that gcc can change the header implementing std::string

It can't make arbitrary changes. That would (as you surmise) break things. But only some changes to std::string will affect the memory layout of the class, and those are the ones that matter.

For an example of an optimisation that wouldn't affect the memory layout: they could change the code inside

size_t string::find (const string& str, size_t pos = 0) const;

to use a more efficient algorithm. That wouldn't change the memory layout of the string.

In fact, if you temporarily ignore the fact that everything is templated and so has to be in header files, you can imagine string as being defined in a .h file and implemented in a .cpp file. The memory layout is determined only from the contents of the header file. Anything in the .cpp file could be safely changed.

An example of something they couldn't do is to add a new data member to string. That would definitely break things.
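The difference between the two kinds of change can be sketched with a toy class. None of these types come from libstdc++; StringOld, StringSameLayout and StringNewMember are hypothetical stand-ins used only to show what does and does not alter the object layout.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical "old" class: two data members and a naive find().
struct StringOld {
    char*       data;
    std::size_t size;
    // Returns the index of c, or size when c is absent.
    std::size_t find(char c) const {
        for (std::size_t i = 0; i < size; ++i)
            if (data[i] == c) return i;
        return size;
    }
};

// Same data members, different code inside find(): the layout is unchanged.
struct StringSameLayout {
    char*       data;
    std::size_t size;
    std::size_t find(char c) const {
        const char* p = data;
        const char* end = data + size;
        while (p != end && *p != c) ++p;
        return static_cast<std::size_t>(p - data);
    }
};

// One extra data member: every object grows, so old binaries disagree on the size.
struct StringNewMember {
    char*       data;
    std::size_t size;
    std::size_t capacity;
};

static_assert(sizeof(StringSameLayout) == sizeof(StringOld),
              "changing a function body does not change the layout");
static_assert(sizeof(StringNewMember) > sizeof(StringOld),
              "adding a data member changes the layout");
```

Code compiled against StringOld allocates and copies sizeof(StringOld) bytes, so handing it a StringNewMember object would silently drop or overrun the extra member.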

You mentioned the dual ABI case. What happened there is that they needed to make a breaking change, and so they had to introduce a new string class. One of the classes is std::string and the other is std::__cxx11::string. (Messy things happen under the hood, so most users don't realise they are using std::__cxx11::string on newer versions of the compiler/standard library.)
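One ingredient of that machinery is an inline namespace, which lets user code keep writing the short name while the type's real name (and therefore every mangled symbol that mentions it) is different. The following is a simplified sketch, not libstdc++'s actual code: the namespace demo, the names legacy_string and cxx11_like, and the member layout are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace demo {

// Old layout, kept so that already-compiled code still finds what it expects.
struct legacy_string {
    char* rep;
};

// New layout lives in an inline namespace, so its fully qualified name differs.
inline namespace cxx11_like {
struct string {
    char*       data;
    std::size_t size;
    char        local_buffer[16];
};
}  // inline namespace cxx11_like

}  // namespace demo

// User code spells the type demo::string, yet it names demo::cxx11_like::string.
static_assert(std::is_same<demo::string, demo::cxx11_like::string>::value,
              "the inline namespace is transparent at the source level");
```

Because the two types have different names at the binary level, a function taking the new type cannot be accidentally linked against a definition that expects the old one; the mismatch shows up as an unresolved symbol at link time instead of as memory corruption at runtime.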

Do static libraries behave like dynamic libraries in terms of ABI compatibility?

There are several kinds of mismatches which could occur, including:

  1. Name mangling. This is a major reason why different compilers may be incompatible. However, many compilers are cross-compatible.
  2. Calling conventions. How to pass function arguments, return values, etc. Tends to be tied to the CPU architecture and operating system.
  3. Language types. For example int can have different widths on different operating systems.
  4. Library types. Commonly std::string, std::map and other classes can be implemented differently by each library vendor, and since many people use the standard library provided by their compiler vendor, this issue may arise.

Some of these, like name mangling, are likely to result in build failures. Others may seem to build OK but behave unexpectedly at runtime. Whether you're using static or dynamic linking, the overall situation is the same.
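For the "language types" item, one common mitigation is to use fixed-width types at a library boundary and let the compiler verify the assumptions. A minimal sketch follows; RecordHeader and its fields are hypothetical and only illustrate the idea.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical struct that crosses a library boundary.
// Fixed-width types avoid depending on the platform's width of int or long.
struct RecordHeader {
    std::uint32_t length;
    std::uint16_t version;
    std::uint16_t flags;
};

// Turn silent layout assumptions into build failures.
static_assert(CHAR_BIT == 8, "this interface assumes 8-bit bytes");
static_assert(sizeof(RecordHeader) == 8, "unexpected padding in RecordHeader");
static_assert(offsetof(RecordHeader, version) == 4, "unexpected field offset");
```

This turns one class of runtime surprise into a compile-time error, but it does nothing for library types such as std::string, whose layout is owned by the standard library vendor.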

Is C++ std::string binary compatible between different compilers and standard libraries?

C++ does not standardize ABI.

In practice, different std::string implementations may employ a differently tuned small string optimization (with a different buffer size), or no small string optimization at all. They may also order their fields differently. Additionally, std::allocator, which by default allocates the storage for std::string, may use different arenas in different implementations.

std::string_view is more likely to match, though there could be variations (two pointers or pointer and size). Debug versions may have some additional information.

GCC ABI compatibility

The official ABI page points to a tool called ABIcheck. This tool may do what you want.

Easy way to guarantee binary compatibility for C++ library, C linkage?

For binary compatibility, what also matters is the application binary interface (ABI).

BTW, C++ functions can not only return some results, but also throw some exceptions. You need to document that (even if you decide that no exceptions are thrown, which is actually difficult to achieve, e.g. because new can throw std::bad_alloc and this can happen inside some internal layers of your software).
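One way to keep exceptions from crossing a C-linkage boundary is to catch everything inside the exported function and report failures through a status code. This is a minimal sketch under stated assumptions: the function mylib_parse_int and the mylib_status codes are hypothetical names, not part of any real library.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical status codes returned across the C boundary.
enum mylib_status {
    MYLIB_OK = 0,
    MYLIB_INVALID_ARGUMENT = 1,
    MYLIB_OUT_OF_RANGE = 2,
    MYLIB_INTERNAL_ERROR = 3
};

// Hypothetical exported function: C linkage, plain C types, never throws.
extern "C" int mylib_parse_int(const char* text, int* out) noexcept {
    if (text == nullptr || out == nullptr) return MYLIB_INVALID_ARGUMENT;
    try {
        // std::stoi throws std::invalid_argument or std::out_of_range;
        // building the temporary std::string may throw std::bad_alloc.
        *out = std::stoi(text);
        return MYLIB_OK;
    } catch (const std::invalid_argument&) {
        return MYLIB_INVALID_ARGUMENT;
    } catch (const std::out_of_range&) {
        return MYLIB_OUT_OF_RANGE;
    } catch (...) {
        // Anything else, including std::bad_alloc from internal layers.
        return MYLIB_INTERNAL_ERROR;
    }
}
```

The catch-all branch is the part that is easy to forget: it covers exceptions from internal layers that the author of the wrapper never thought about.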

Notice that (notably on Linux) the ABI of the C++ standard library did change across versions of the compiler. For example, the implementations of std::string and of the standard C++ containers did vary (in subtly incompatible ways at the binary level). You could statically link the C++ standard library (but this might not always be enough, or might be brittle, e.g. if the user program also needs it because its code is in C++).

Since most C++ compilers and standard libraries on Linux are free software, you could dive into their source code to understand all the details. This could take years of effort, and you need to budget for that.

So it is harder than what you believe. At the very least, document the version of the C++ compiler used to build your thing, and the version of the C++ standard library involved.
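Part of that documentation can be generated by the build itself from predefined macros. The sketch below is one possible approach; the helper name build_info is hypothetical, and the set of macros checked is deliberately small.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: describes the compiler and standard library
// that this translation unit was built with.
std::string build_info() {
    std::ostringstream out;
#if defined(__clang__)            // check clang first: it also defines __GNUC__
    out << "clang " << __clang_major__ << '.' << __clang_minor__;
#elif defined(__GNUC__)
    out << "gcc " << __GNUC__ << '.' << __GNUC_MINOR__;
#elif defined(_MSC_VER)
    out << "msvc " << _MSC_VER;
#else
    out << "unknown compiler";
#endif
    out << "; __cplusplus=" << __cplusplus;
#if defined(__GLIBCXX__)          // libstdc++ release date, e.g. YYYYMMDD
    out << "; libstdc++ " << __GLIBCXX__;
#elif defined(_LIBCPP_VERSION)
    out << "; libc++ " << _LIBCPP_VERSION;
#endif
    return out.str();
}
```

Embedding the returned string in the library (for example behind a version query function) lets users compare it with their own toolchain before linking.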

Another approach (which I recommend) could be to make your thing free software or open source and publish the source code on github or elsewhere, and let your user compile your source code (with his C++ compiler and C++ standard library).

Binary compatibility with various C++ compilers and standard C++ libraries is actually difficult to achieve, because the devil is in the details (if you release only some binary thing). You might publish several binaries compiled with various compiler versions (e.g. with g++-5, g++-6, clang++-4.0, etc.).

I don't know exact way to guarantee binary compatibility for C++ library

Such a general goal is unrealistic and over-ambitious. At most you might publish several binaries and document with what exact C++ compiler (and version, and compiler options) and C++ standard library each have been compiled.

The compatibility I'd like to achieve is compiler forward compatibility and standard library forward compatibility.

This is impossible in general. Various C++ compilers did break ABI compatibility in the past (and this has been documented). The devil is in the details (so even if it seems to work most of the time, it could still be buggy).

Am I right?

No, you are wrong and over-ambitious. At most you could release a binary (probably you should release several ones) and tell exactly how it was built (what C++ compiler and version, what compilation flags, what C++ standard library and version and even what C standard library and version; if you use some external C++ or C library - like Qt, Boost, Sqlite, etc... - you also need to document their version). Binary compatibility for C++ is a fiction.

You could and probably should use (on Linux particularly) package management systems, e.g. publish some .deb package for some particular Linux distributions (e.g. a given version of Debian or Ubuntu). You'll list exact dependencies in your binary package.

Be aware that maintaining several binary versions (and binary packages) is a lot of boring work that you should budget. You might ask permission from your manager or client to open the source of your library (and quite often this takes less work). For instance, your library might be published under GPLv3 license: open source programs could freely use it, but proprietary applications would have to buy some other license from your company.

Does using -std=c++11 break binary compatibility?

An authoritative reference can be found in gcc's C++11 ABI Compatibility page.

The short summary is: there are no language reasons for the ABI to break, but there are a number of mandated changes which cause the standard C++ library shipping with gcc to change.

What is an application binary interface (ABI)?

One easy way to understand "ABI" is to compare it to "API".

You are already familiar with the concept of an API. If you want to use the features of, say, some library or your OS, you will program against an API. The API consists of data types/structures, constants, functions, etc that you can use in your code to access the functionality of that external component.

An ABI is very similar. Think of it as the compiled version of an API (or as an API on the machine-language level). When you write source code, you access the library through an API. Once the code is compiled, your application accesses the binary data in the library through the ABI. The ABI defines the structures and methods that your compiled application will use to access the external library (just like the API did), only on a lower level. Your API defines the order in which you pass arguments to a function. Your ABI defines the mechanics of how these arguments are passed (registers, stack, etc.). Your API defines which functions are part of your library. Your ABI defines how your code is stored inside the library file, so that any program using your library can locate the desired function and execute it.

ABIs are important when it comes to applications that use external libraries. Libraries are full of code and other resources, but your program has to know how to locate what it needs inside the library file. Your ABI defines how the contents of a library are stored inside the file, and your program uses the ABI to search through the file and find what it needs. If everything in your system conforms to the same ABI, then any program is able to work with any library file, no matter who created them. Linux and Windows use different ABIs, so a Windows program won't know how to access a library compiled for Linux.

Sometimes, ABI changes are unavoidable. When this happens, any programs that use that library will not work unless they are re-compiled to use the new version of the library. If the ABI changes but the API does not, then the old and new library versions are sometimes called "source compatible". This implies that while a program compiled for one library version will not work with the other, source code written for one will work for the other if re-compiled.

For this reason, developers tend to try to keep their ABI stable (to minimize disruption). Keeping an ABI stable means not changing function interfaces (return type and number, types, and order of arguments), definitions of data types or data structures, defined constants, etc. New functions and data types can be added, but existing ones must stay the same. If, for instance, your library uses 32-bit integers to indicate the offset of a function and you switch to 64-bit integers, then already-compiled code that uses that library will not be accessing that field (or any following it) correctly. Accessing data structure members gets converted into memory addresses and offsets during compilation and if the data structure changes, then these offsets will not point to what the code is expecting them to point to and the results are unpredictable at best.
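The 32-bit to 64-bit case can be sketched directly. EntryV1, EntryV2 and read_flags_like_old_caller are hypothetical names; the helper imitates what a caller compiled against the old struct effectively does, which is to read four bytes at the old offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical library struct, version 1: 32-bit offset.
struct EntryV1 {
    std::uint32_t offset;
    std::uint32_t flags;
};

// Version 2 widens the first field, which moves the second one.
struct EntryV2 {
    std::uint64_t offset;
    std::uint32_t flags;
};

static_assert(offsetof(EntryV1, flags) == 4, "flags follows a 4-byte field");
static_assert(offsetof(EntryV2, flags) == 8, "flags follows an 8-byte field");

// What already-compiled code does: the V1 offset of flags is baked in.
inline std::uint32_t read_flags_like_old_caller(const void* entry) {
    std::uint32_t flags;
    std::memcpy(&flags,
                static_cast<const unsigned char*>(entry) + offsetof(EntryV1, flags),
                sizeof flags);
    return flags;
}
```

Given an EntryV2 with offset 1 and flags 7, the old caller reads part of the widened offset field instead of flags, so it does not see 7. Nothing crashes, which is what makes this kind of breakage hard to notice.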

An ABI isn't necessarily something you will explicitly provide unless you are doing very low-level systems design work. It isn't language-specific either, since (for example) a C application and a Pascal application can use the same ABI after they are compiled.

Edit: Regarding your question about the chapters regarding the ELF file format in the SysV ABI docs: The reason this information is included is because the ELF format defines the interface between operating system and application. When you tell the OS to run a program, it expects the program to be formatted in a certain way and (for example) expects the first section of the binary to be an ELF header containing certain information at specific memory offsets. This is how the application communicates important information about itself to the operating system. If you build a program in a non-ELF binary format (such as a.out or PE), then an OS that expects ELF-formatted applications will not be able to interpret the binary file or run the application. This is one big reason why Windows apps cannot be run directly on a Linux machine (or vice versa) without being either re-compiled or run inside some type of emulation layer that can translate from one binary format to another.

IIRC, Windows currently uses the Portable Executable (or, PE) format. There are links in the "external links" section of that Wikipedia page with more information about the PE format.

Also, regarding your note about C++ name mangling: When locating a function in a library file, the function is typically looked up by name. C++ allows you to overload function names, so the name alone is not sufficient to identify a function. C++ compilers have their own ways of dealing with this internally, called name mangling. An ABI can define a standard way of encoding the name of a function so that programs built with a different language or compiler can locate what they need. When you use extern "C" in a C++ program, you're instructing the compiler to use a standardized way of recording names that's understandable by other software.
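A small sketch of the difference follows. The function names are hypothetical; the mangled names in the comments are what the Itanium C++ ABI (used by GCC and Clang on Linux) produces, and other ABIs such as MSVC's encode them differently.

```cpp
#include <cassert>

// C++ linkage: overloading is allowed because each overload gets its own
// mangled symbol (under the Itanium C++ ABI: _Z5scaleii and _Z5scaledd).
int scale(int value, int factor) { return value * factor; }
double scale(double value, double factor) { return value * factor; }

// C linkage: the symbol is recorded under the plain function name, so it can
// be found from C or other languages, but overloading is not possible.
extern "C" int demo_scale_int(int value, int factor) {
    return scale(value, factor);
}
extern "C" double demo_scale_double(double value, double factor) {
    return scale(value, factor);
}
```

Running a symbol lister such as nm on the resulting object file shows the two mangled C++ names next to the unmangled demo_scale_int and demo_scale_double.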

Library ABI compatibility between versions of Visual Studio

The issue may be not only in ABI differences (calling conventions, etc.) between these VS versions, but also in removed/changed symbols in system DLL libraries. See this table for the detailed comparison of system DLL libraries between VS8 (2005, Windows SDK 5.0) and VS9 (2008, Windows SDK 6.0).

See also compatibility matrix for Windows SDKs.
