Why Glibc and Pthread Library Both Defined Same APIs

Why do glibc and the pthread library both define the same APIs?

libpthread.so is part of glibc too, and both libraries contain definitions of some symbols.

If you look for pthread_create instead, you'll see that it's only present in libpthread.so. This means programs must link to libpthread.so to actually create threads, but can use mutexes and condition variables in single-threaded programs that only link to libc.so. That's useful for interprocess mutexes and interprocess condition variables that live in shared memory and are used to synchronise with separate processes. (Corrections thanks to Zan Lynx's comment.)
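As a rough illustration of that last point, here is a minimal sketch of a mutex living in shared memory and used by two single-threaded processes. The names shared_counter, shared_counter_create and shared_counter_demo are made up for this example, and the anonymous mapping is only visible to forked children; unrelated processes would need a named mapping instead. Building with -pthread is the safe choice on older glibc versions.

```c
#define _GNU_SOURCE            /* for MAP_ANONYMOUS under strict -std modes */
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout for this sketch: a mutex plus the data it protects. */
struct shared_counter {
    pthread_mutex_t lock;
    long value;
};

/* Map an anonymous shared region and initialise a process-shared mutex in it.
 * Returns NULL on failure. */
struct shared_counter *shared_counter_create(void)
{
    struct shared_counter *sc = mmap(NULL, sizeof *sc, PROT_READ | PROT_WRITE,
                                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sc == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc == 0) {
        /* Without this attribute the mutex is only valid within one process. */
        rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        if (rc == 0)
            rc = pthread_mutex_init(&sc->lock, &attr);
        pthread_mutexattr_destroy(&attr);
    }
    if (rc != 0) {
        munmap(sc, sizeof *sc);
        return NULL;
    }
    sc->value = 0;
    return sc;
}

/* Increment the counter under the lock and return the new value. */
long shared_counter_increment(struct shared_counter *sc)
{
    pthread_mutex_lock(&sc->lock);
    long v = ++sc->value;
    pthread_mutex_unlock(&sc->lock);
    return v;
}

/* Increment once in a child process and once in the parent.
 * Returns the final value, or -1 on error. */
long shared_counter_demo(void)
{
    struct shared_counter *sc = shared_counter_create();
    if (sc == NULL)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {                 /* child: same mapping, same mutex */
        shared_counter_increment(sc);
        _exit(0);
    }
    if (pid > 0)
        waitpid(pid, NULL, 0);      /* wait so the child's increment is done */

    long result = (pid > 0) ? shared_counter_increment(sc) : -1;
    pthread_mutex_destroy(&sc->lock);
    munmap(sc, sizeof *sc);
    return result;
}
```

Neither process ever calls pthread_create here; the mutex is used purely for synchronisation between processes.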

It's not a problem to link to both libpthread.so and libc.so even though they both define the symbol. ELF linkers allow several shared libraries to contain definitions of the same symbol; the linker chooses the first one it sees and uses it for all references to that symbol. This is called symbol interposition. Another feature that allows multiple definitions is weak symbols: if one library contains weak symbols, they will be overridden by non-weak symbols with the same name. In this case the definitions in libpthread.so override those in libc.so. If you use LD_DEBUG you should be able to see which library the symbol actually gets found in.

As well as the two libraries defining the same symbol, each library has two definitions of the symbol, with different symbol versions, GLIBC_2.0 and GLIBC_2.3.2. This symbol versioning allows multiple definitions to co-exist in the same library so that new, improved versions of the function can be added to the library without breaking code that is linked against the old implementation. This allows the same shared library to work for applications using LinuxThreads and applications using NPTL. The default symbol that a reference will be bound to when linking to the library is pthread_cond_signal@GLIBC_2.3.2 which corresponds to the NPTL implementation of that function (NPTL was first included in glibc 2.3.2). The older symbol, pthread_cond_signal@GLIBC_2.0, is the older LinuxThreads implementation that was the default before NPTL was provided. Applications linked against older (pre-2.3.2) versions of glibc will be bound to pthread_cond_signal@GLIBC_2.0 and will use that symbol.

Is pthread in glibc.so implemented by weak symbols to provide pthread stub functions?

Yes, glibc uses a stub implementation of various pthread functions, so that single-threaded programs do not have to waste cycles doing things like locking and unlocking mutexes, and yet do not have to link to a different C library (like what is done in the Microsoft world, for instance).

For instance, according to POSIX, every time you call fputc(ch, stream), there is a mutex lock and unlock. If you don't want that, you call fputc_unlocked. But when you do that, you're using a POSIX extension related to threading; it's not an appropriate workaround for programs that don't use POSIX or don't use the threading API.
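To make the locking cost concrete, here is a small sketch that takes the stream lock once for a whole string instead of once per character. It uses the POSIX functions flockfile, putc_unlocked and funlockfile; the helper name put_string_locked_once is invented for this example.

```c
#define _POSIX_C_SOURCE 200809L   /* expose flockfile and friends under strict -std modes */
#include <stdio.h>

/* Write a NUL-terminated string while holding the stream lock only once.
 * Returns the number of characters written, or -1 on a write error. */
long put_string_locked_once(const char *s, FILE *stream)
{
    long written = 0;

    flockfile(stream);                       /* acquire the stream's lock once */
    for (; *s != '\0'; ++s) {
        if (putc_unlocked(*s, stream) == EOF) {
            written = -1;
            break;
        }
        ++written;
    }
    funlockfile(stream);                     /* release it once */

    return written;
}
```

A plain loop over fputc would be equally correct; it would just lock and unlock the stream for every character.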

The overriding of the stub pthread functions with the real ones (in the dynamic glibc) is not based on weak symbols. The shared library mechanism makes it possible to override non-weak definitions.

Weak symbols are a mechanism which allows for symbol overriding under static linking.

If you want a source for the above statement, here it is:

"Note that a definition in a DSO being weak has no effects. Weak definitions only play a role in static linking." [Ulrich Drepper, "How To Write Shared Libraries"].

If you run nm on the static glibc on your system (if you have one), libc.a, you will note that functions like pthread_mutex_lock are marked weak. In the dynamic version, libc.so.<whatever>, the functions are not marked weak.

Note: you should use nm -D or nm --dynamic to look at the symbols in a shared library. nm will not produce anything on a shared library that is stripped. If it does, you're looking at the debug symbols, not the dynamic symbols.

In-depth explanation for why we need '-pthread' in Linker option for gcc?

You've got the broad strokes correct. I'll try to break down each part of your question and point you in the right direction.

To use pthread_create and other POSIX thread library functions, we need this flag. Why do we need this?

pthreads are not implemented as gcc builtins or as part of "standard" libc, which means an external library must implement them. Since an external library is required, the linker needs to be informed about that external library with the -pthread flag.

Why isn't there a code in /usr that implements these functions like other functions or implementation of other system calls?

There absolutely is. The pthread API is implemented in libpthread, which you will almost certainly find in its shared (libpthread.so), static (libpthread.a), and version-specific (libpthread-X.YY.so) forms in your /usr/lib folder.

Also, without this flag, following output is there: ERROR

This is the linker telling you that you haven't specified an implementation of pthread for it to use. Just because glibc provides an implementation does not mean that is the implementation you intend to use. The linker is not a mind reader; it needs to be informed at compile time what specific libraries you want to link to.

My question is, why are we getting these errors (undefined reference to...) at compile time? gcc should make the executable and during run-time, it should try to find these symbols right? (Dynamic linking)

Once again, the linker needs to know at compile time specifically what library it will look for at run time. The ABIs of two libraries that define the same symbols may not be the same (and likely won't be, unless specifically designed for). That means even though your dynamically linked code won't carry a statically linked copy of the library, the binary structure of the code still depends on the library dependency. This information therefore must be known at compile time.

NOTE: These symbols (pthread_create, pthread_join) are extern (checked in preproc using -E gcc flag). Is that different from symbols loaded at run-time?

No, extern just informs the compiler that a symbol won't be defined in a given compilation unit. If that doesn't make sense, it basically means, "that symbol is defined in another file, I'm just using it here".

Difference between -pthread and -lpthread while compiling

-pthread tells the compiler to link in the pthread library as well as configure the compilation for threads.

For example, the following shows the macros that get defined when the -pthread option gets used on the GCC package installed on my Ubuntu machine:

$ gcc -pthread -E -dM test.c > dm.pthread.txt
$ gcc          -E -dM test.c > dm.nopthread.txt
$ diff dm.pthread.txt dm.nopthread.txt 
152d151
< #define _REENTRANT 1
208d206
< #define __USE_REENTRANT 1

Using the -lpthread option only causes the pthread library to be linked - the pre-defined macros don't get defined.

Bottom line: you should use the -pthread option.
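If you want to see the difference from inside a program rather than from the preprocessor dump, a tiny sketch like the following works. The function names are made up for this example; only the _REENTRANT macro comes from the diff above.

```c
#include <stdio.h>

/* Returns 1 if this translation unit was compiled with _REENTRANT defined
 * (which the diff above shows for gcc -pthread), 0 otherwise. */
int built_with_reentrant(void)
{
#ifdef _REENTRANT
    return 1;
#else
    return 0;
#endif
}

/* Print which way this file was compiled. */
void report_threading_macros(void)
{
    printf("_REENTRANT is %s\n",
           built_with_reentrant() ? "defined" : "not defined");
}
```

Compiling the same file once with -pthread and once with only -lpthread should flip the result, assuming your GCC behaves like the one shown above.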

Note: the -pthread option is documented as a platform-specific option in the GCC docs, so it might not always be available. However, it is available on platforms that the GCC docs don't explicitly list it for (such as i386 and x86-64) - you should use it when available.

Also note that other similar options have been used by GCC, such as -pthreads (listed as a synonym for -pthread on Solaris 2) and -mthread (for MinGW-specific thread support on i386 and x86-64 Windows). My understanding is that GCC is trying to move to using -pthread uniformly going forward.

Why is the same glibc memcpy implementation faster on Linux and slower on Windows?

memcpy is part of the standard C library, and as such, is provided by the operating system on which you run your code (or an alternative provider if you use a different libc). For small copies of known sizes, GCC will often inline these operations because it can often avoid the overhead of a function call, but for large or unknown sizes, it will often use the system function.

In this case, you're seeing that glibc and Windows have different implementations, and glibc provides a better option. glibc does provide several different variants on different platforms based on what works best for a given CPU, but Windows may not do so, or may have a less optimized implementation.

In the past, glibc has even taken advantage of the fact that memcpy cannot have overlapping arguments and copied backwards on some CPUs, but that unfortunately broke some programs which did not comply with the standard, notably Adobe Flash Player. However, such an implementation was permissible and was indeed faster.
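For code that really does need to copy between overlapping regions, the standard tool is memmove rather than memcpy. A minimal sketch, with shift_right being a hypothetical helper name:

```c
#include <stddef.h>
#include <string.h>

/* Shift the first n bytes of buf to the right by `shift` positions.
 * The caller must ensure buf has room for at least n + shift bytes.
 * Source and destination overlap whenever shift < n, so memmove is
 * required; calling memcpy here would be undefined behaviour. */
void shift_right(char *buf, size_t n, size_t shift)
{
    memmove(buf + shift, buf, n);
}
```

With memmove the result is the same regardless of the direction in which the library happens to copy, which is exactly the guarantee memcpy does not give.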

Instead of memcpy being slower, you could be finding that Windows has a different memory handling strategy. For example, it is common not to fault in all of the memory when it is first allocated. You may be finding that Linux, which in some cases will prefault subsequent pages, may be performing better here because of that optimization or a different one. If Windows has chosen not to do that, it could be because it complicates the code, or because it doesn't perform well on real-world use cases that are commonly run on Windows. What performs well in a synthetic benchmark may or may not match what performs well in the real world.

Ultimately, this is a quality of implementation issue. The standard requires that the functions it specifies behave in a specified way, and doesn't specify performance characteristics. Some projects choose to include optimized memcpy implementations if performance of that function is very important to them. Others choose not to and prefer to advise users to choose a platform that best meets their needs, taking into account that some platforms may perform better than others.

Why does the program which is compiled against the installed glibc not run normally?

First, you should stop using ldd -- in the presence of multiple GLIBCs on a host, ldd is more likely to mislead than to illuminate.

If you want to see which libraries are really loaded, do this instead:

LD_TRACE_LOADED_OBJECTS=1 ./exec/1-1.out
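A program can also ask the dynamic loader the same question from the inside. The sketch below uses glibc's dl_iterate_phdr to print every object mapped into the current process; the names print_object and list_loaded_objects are invented for this example.

```c
#define _GNU_SOURCE            /* dl_iterate_phdr is a GNU extension */
#include <link.h>
#include <stdio.h>

/* Called once per loaded object: print its path and count it. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    int *count = data;

    /* The main executable is reported with an empty name. */
    const char *name = (info->dlpi_name != NULL && info->dlpi_name[0] != '\0')
                           ? info->dlpi_name
                           : "(main executable)";
    printf("%s\n", name);
    ++*count;
    return 0;                  /* a non-zero return would stop the iteration */
}

/* Print every object currently loaded into this process and
 * return how many there are. */
int list_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```

The paths printed are the ones the loader actually picked, so a libpthread.so.0 coming from the wrong glibc shows up immediately.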

Second, you should almost never use $* in shell scripts. Use "$@" instead (note: quotes are important), so that arguments containing whitespace are passed through unchanged.

Third, the behavior you are observing is easily explained. To understand it, you need to know the difference between DT_RPATH and DT_RUNPATH.

You can verify that your binaries are currently using RUNPATH, like so:

readelf -d 1-1.out | grep 'R.*PATH'

And you can verify that everything starts working as you expect by adding -Wl,--disable-new-dtags to the link command (which would cause the binary to use RPATH instead).

To summarize:

RUNPATH is used only when searching for the direct dependencies of the binary that carries it, not for the dependencies of those libraries.
RPATH is used when searching for the binary's direct dependencies and for every library they in turn depend on.
With RUNPATH, the expected libpthread.so.0 is found only when the binary depends on it directly, but not when the dependency on libpthread is indirect (via librt).
With RPATH, the expected libpthread.so.0 is found regardless of whether the dependency is direct or indirect.

Update:

If I want to use DT_RUNPATH, how to set the library runpath for librt?

You would need to link librt.so with -rpath=${SYSROOT}/lib64.

You could edit the rt/Makefile, or build with:

make LDFLAGS-rt.so='-Wl,--enable-new-dtags,-z,nodelete,-rpath=${SYSROOT}/lib64'

You would need to do the same for any other library that may bring in a transitive dependency on other parts of GLIBC. I don't know of a general way to do this, but setting LDFLAGS-lib.so='-Wl,-rpath=${SYSROOT}/lib64' and rebuilding everything might do the trick.

undefined behavior in shared lib using libpthread, but not having it in ELF as dependency

There's a lot happening here: differences between gcc and clang, differences between GNU ld and gold, the --as-needed linker flag, two different failure modes, and maybe even some timing issues.

Let's start with how to link a program using POSIX threads.

The compiler's -pthread flag is all you should need. It's a compiler flag, so you should use it both when compiling code that uses threads and when linking the final executable. When you use -pthread on the link step, the compiler will provide the -lpthread flag automatically, and in the right place in the link line.

Typically, you would only use it when linking the final executable, and not when linking a shared library. If you simply want to make your library thread safe, but don't want to force every program that uses your library to link with pthreads, you'd want to use a runtime check to see if the pthreads library is loaded, and call the pthread APIs only if it is. On Linux, this is typically done by checking a "canary" -- for example, make a weak reference to an arbitrary symbol like __pthread_key_create, which will only be defined if the library is loaded, and will have the value 0 if the program was linked without it.
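The canary check can be sketched in a few lines of C. The prototype given for __pthread_key_create here is an assumption modelled on pthread_key_create, and threads_available is a made-up name; treat this as an illustration of the technique rather than the exact code any particular library uses.

```c
#include <pthread.h>
#include <stddef.h>

/* Weak reference to the canary symbol. If nothing in the process defines
 * it, the link still succeeds and the symbol's address is NULL.
 * The prototype is assumed to match pthread_key_create. */
extern int __pthread_key_create(pthread_key_t *, void (*)(void *))
    __attribute__((weak));

/* Returns 1 if the pthread implementation appears to be loaded, else 0. */
int threads_available(void)
{
    return __pthread_key_create != NULL;
}
```

A library would then guard its locking with this check and skip the pthread calls when it returns 0. Note that on glibc 2.34 and later, where libpthread has been merged into libc, the check should simply always succeed.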

In your case, however, your library libodr.so pretty much depends on threads, so it's reasonable to link it with the -pthread flag.

That brings us to the first failure mode: if you use g++ and gold for both link steps, the program throws std::system_error and says you need to enable multithreading. This is due to the --as-needed flag. GCC passes --as-needed to the linker by default, while clang (apparently) does not. With --as-needed, the linker will only record library dependencies that resolve a strong reference. Since all the references to pthread APIs are weak, none of them are sufficient to tell the linker that libpthread.so should be added to the dependency list (via a DT_NEEDED entry in the dynamic table). Changing to clang or adding a -Wl,--no-as-needed flag solves this problem, and the program will load the pthread library.

But, wait, why don't you need to do this when using the GNU linker? It uses the same rule: only a strong reference causes the library to be recorded as a dependency. The difference is that GNU ld also considers references from other shared libraries, while gold only considers references from regular object files. It turns out that the pthread library provides overriding definitions of several libc symbols, and there are strong references from libstdc++.so to some of those symbols (e.g., write). Those strong references are enough to get GNU ld to record libpthread.so as a dependency. This is more of an accident than design; I don't think changing gold to consider references from other shared libraries would actually be a robust fix. I think the proper solution is for GCC to put --no-as-needed in front of the -lpthread flag when you use -pthread.

This raises the question of why this issue doesn't come up all the time when using POSIX threads and the gold linker. But this is a small test program; a larger program is almost certain to contain strong references to some of those libc symbols that libpthread.so overrides.

Now let's look at the second failure mode, where both Notify() and Get() block indefinitely if you link libodr.so with g++, gold and -lpthread.

In Notify(), you're holding the lock through the end of the function, while you call cv.notify_one(). You really only need to hold the lock to set the ready flag; if we change it so that we release the lock before that, then the thread calling Get() will time out after 300 ms, and does not block. So it's really the call to notify_one() that's blocking, and the program is deadlocking because Get() is waiting on that same lock.
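The "release the lock before notifying" pattern looks roughly like this when written against the pthread API directly. This is a C analogue, not the original C++ code: notify_ready and get_ready are hypothetical stand-ins for Notify() and Get(), and the globals are invented for the sketch.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime under strict -std modes */
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready_cv   = PTHREAD_COND_INITIALIZER;
static bool            ready_flag = false;

/* Producer: hold the mutex only while changing the flag, then signal. */
void notify_ready(void)
{
    pthread_mutex_lock(&ready_lock);
    ready_flag = true;
    pthread_mutex_unlock(&ready_lock);
    pthread_cond_signal(&ready_cv);   /* signalling does not need the mutex */
}

/* Consumer: wait up to timeout_ms for the flag. Returns true if it was set. */
bool get_ready(long timeout_ms)
{
    /* pthread_cond_timedwait takes an absolute deadline, measured against
     * CLOCK_REALTIME for a default-initialised condition variable. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&ready_lock);
    int rc = 0;
    while (!ready_flag && rc == 0)    /* the loop guards against spurious wakeups */
        rc = pthread_cond_timedwait(&ready_cv, &ready_lock, &deadline);
    bool result = ready_flag;
    pthread_mutex_unlock(&ready_lock);
    return result;
}
```

Calling get_ready(300) before notify_ready() has run simply times out and returns false instead of blocking forever.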

So why does it block only when __pthread_key_create is FUNC instead of NOTYPE? I think the type of the symbol is a red herring, and that the real problem is caused by the fact that gold doesn't record the symbol versions for references resolved by a library that isn't added as a needed library. The implementation of wait_for calls pthread_cond_timedwait, which has two versions in both libpthread and libc. It's possible that the loader is binding the reference to the wrong version, causing a deadlock by failing to unlock the mutex. I made a temporary patch to gold to record those versions, and that made the program work. Unfortunately, that's not a solution, as that patch can cause ld.so to crash under other circumstances.

I tried changing cv.wait_for(...) to cv.wait(lock, []{ return ready; }), and the program runs perfectly in all scenarios, which further suggests that the problem is with pthread_cond_timedwait.

The bottom line is that adding the --no-as-needed flag will fix the problem for this very small test case. Anything larger is likely to work without the extra flag, as you'll be increasing the odds of making a strong reference to a symbol in libpthread. (For example, adding a call to std::this_thread::sleep_for anywhere in odr.cpp adds a strong reference to nanosleep, which puts libpthread in the needed list.)

Update: I've verified that the failing program is linking to the wrong version of pthread_cond_timedwait. For glibc 2.3.2, the pthread_cond_t type was changed, and the old versions of the APIs that use the type were changed to dynamically allocate a new (bigger) structure and store a pointer to it in the original type. So now, if the consuming thread reaches cv.wait_for before the producing thread reaches cv.notify_one, the implementation of cv.wait_for calls the old version of pthread_cond_timedwait, which initializes what it thinks is an old pthread_cond_t in cv with a pointer to a new pthread_cond_t. After that, when the other thread reaches cv.notify_one, its implementation assumes that cv contains a new-style pthread_cond_t rather than a pointer to one, so it calls pthread_mutex_lock with the pointer to the new pthread_cond_t instead of the pointer to the mutex. It locks that would-be mutex, but it never gets unlocked because the other thread unlocks the real mutex.

Why `pthread_rwlock_t`'s ABI differs a lot among versions?

Why does pthread_rwlock_t change this much among versions? Is it because it wants to support more features, or to enhance performance?

Because the glibc maintainers decided that there was an advantage to be gained in changing it.

As with most structures defined by the standard headers, the layout of struct pthread_rwlock_t is not standardized. As with some such structures, not even any of the member names are standardized. I presume that the reason the structure is not totally opaque is so that instances can be declared directly instead of requiring them to be produced by some kind of constructor function.

You are going way out on a limb if you construct programs that depend on a specific layout of that structure, except whatever the version of pthreads you build against provides. That definition is provided by the version of pthread.h used at compile time, and it must be matched with the corresponding library.

Is there any way I can make GLIBC 2.30 and GLIBC 2.17 operate on the same rwlock?

I think you already know that the answer is "no". Each library implementation has direct dependencies on a specific layout of that structure, and the layouts do not coincide at all. If you want to share a pthreads rwlock among processes then, in addition to configuring its pshared attribute on, you should build the cooperating programs against the same version of libpthread and its headers (or whatever library provides your pthreads implementation). Some version skew may be acceptable in practice, but if you want to risk that then it's on you to test and validate specific combinations.
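For reference, turning the pshared attribute on looks roughly like this. The helper names are made up, and the anonymous mapping used here is only shared with forked children; separately started programs would map a named shared-memory object instead, and would still have to agree on the pthread_rwlock_t layout as described above.

```c
#define _GNU_SOURCE            /* for MAP_ANONYMOUS under strict -std modes */
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Create a read-write lock in shared memory with the process-shared
 * attribute set. Returns NULL on failure. */
pthread_rwlock_t *create_pshared_rwlock(void)
{
    pthread_rwlock_t *rw = mmap(NULL, sizeof *rw, PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (rw == MAP_FAILED)
        return NULL;

    pthread_rwlockattr_t attr;
    int rc = pthread_rwlockattr_init(&attr);
    if (rc == 0) {
        rc = pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        if (rc == 0)
            rc = pthread_rwlock_init(rw, &attr);
        pthread_rwlockattr_destroy(&attr);
    }
    if (rc != 0) {
        munmap(rw, sizeof *rw);
        return NULL;
    }
    return rw;
}

/* Destroy the lock and unmap its memory. */
void destroy_pshared_rwlock(pthread_rwlock_t *rw)
{
    pthread_rwlock_destroy(rw);
    munmap(rw, sizeof *rw);
}
```

Note that sizeof *rw is whatever the headers at compile time say it is, which is precisely why two programs built against different glibc versions cannot safely share the object.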

Are different versions of GLIBC not compatible with each other?

Clearly the two versions involved in your particular problem are not compatible in the way you hoped. Glibc is pretty good about link compatibility:

a program dynamically linked against one version will very likely interoperate correctly with the shared libraries of later versions; and
a program that builds and statically links correctly against one version will almost certainly build and statically link correctly against later versions, even to the point of use of functions removed from the C language standard (I'm looking at you, gets()).

But there is no requirement for, nor reasonable expectation of, consistent internal representations of most structure types provided by the library. Note well that even just adding members, so that a type's size changes, produces an incompatible representation.

So I can't use shm between different versions of GLIBC?

Sharing memory across programs built against different versions of Glibc can probably be done successfully with data types that are fully defined by you based on the C language's built-in data types (int, double, etc.) and those with standardized representations (int32_t, etc.). In principle, however, even the representations of built-in data types might change between versions.

I suppose there might be C libraries designed for this express purpose (though I don't know any), but in general, C implementations make very few guarantees about compatibility of in-memory data representation. You cannot, in general, rely on different versions of the library to provide interoperable in-memory representations of most other data types.
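As a sketch of what "fully defined by you" could look like, the record below uses only fixed-width integer types and checks its own layout at compile time. The struct shm_record and its fields are hypothetical; the point is only that nothing in it depends on a libc-internal type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record placed in shared memory. Every member has a
 * fixed width and there is no implicit padding between members. */
struct shm_record {
    uint32_t version;      /* format version chosen by the application */
    uint32_t flags;
    int64_t  counter;
    uint8_t  payload[48];
};

/* Fail the build if a compiler lays the struct out differently. */
_Static_assert(sizeof(struct shm_record) == 64, "unexpected shm_record size");
_Static_assert(offsetof(struct shm_record, flags) == 4, "unexpected flags offset");
_Static_assert(offsetof(struct shm_record, counter) == 8, "unexpected counter offset");
_Static_assert(offsetof(struct shm_record, payload) == 16, "unexpected payload offset");

/* Initialise a record in place, for example inside a mapped region. */
void shm_record_init(struct shm_record *r, uint32_t version)
{
    r->version = version;
    r->flags = 0;
    r->counter = 0;
    for (size_t i = 0; i < sizeof r->payload; ++i)
        r->payload[i] = 0;
}
```

Synchronisation for such a record would then have to come from something whose representation both sides agree on, rather than from a pthread object embedded in the shared region.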
