How to Undo Strip - I.E. Add Symbols Back to Stripped Binary

Valgrind supports separate debug files, so you should use the answer here, and valgrind should work properly with the externalized debug file.

How to reverse the objcopy's strip with only-keep-debug?

For ELF, the elfutils package contains a tool called eu-unstrip that does the job. In the context of your example:

eu-unstrip binary binary.dbg

binary.dbg now contains both the binary and the debug symbols (when no output file is given, eu-unstrip writes the combined result into the debug file). I'd include a reference to documentation if I could find any...
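If you would rather keep binary.dbg as it is, eu-unstrip also accepts -o to name a separate output file. A minimal sketch, assuming elfutils is installed; the function name recombine and the output name binary.full are made up for illustration:

```shell
#!/bin/sh
# recombine STRIPPED DEBUG OUTPUT: hypothetical wrapper around eu-unstrip.
# It merges a stripped ELF file with its separate debug file and writes the
# result to OUTPUT, leaving both input files untouched.
recombine() {
    # $1: stripped binary, $2: debug file, $3: output file
    eu-unstrip -o "$3" "$1" "$2"
}

# Usage, following the example above:
#   recombine binary binary.dbg binary.full
```

Whether the merge succeeds depends on the stripped file and the debug file coming from the same build.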

Separating out symbols and stripping unneeded symbols at the same time

I've just stumbled upon the very same question.
Previously I just split my built ELF binaries using objcopy's --only-keep-debug, --strip-debug and --add-gnu-debuglink flags.
Now I need to save even more space, so I'm considering using --strip-unneeded instead of --strip-debug.
But like you, I'm afraid that this may affect my debugging experience.

So I ran several tests and came to the following conclusions:

  1. --strip-unneeded strips everything that --strip-debug strips, and more. I.e. 'debug' info is treated as part of the 'unneeded' info.
  2. Debug binaries created with the --only-keep-debug flag store not only the info stripped by --strip-debug, but also the info stripped by --strip-unneeded.
  3. I haven't noticed any difference in debugging with --strip-debug vs --strip-unneeded, provided the debug binaries were created with --only-keep-debug.

Details below:

I created a very simple C++ project containing an executable and a shared library.
The library contains a globally exported function, which is called by the application.
The library also contains several local functions (i.e. static or in an anonymous namespace), which are called by the globally exported function. The code in the local functions simply causes a crash by throwing an unhandled exception.

First, I compiled both binaries with the -g -O0 flags.
Second, I extracted the debug information from them into separate binaries and linked these debug files to the original binaries. I.e., for both files:

objcopy --only-keep-debug $FILE $FILE.debug
objcopy --add-gnu-debuglink=$FILE.debug $FILE

At this point I had unstripped binaries, each with a corresponding linked debug binary.
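To confirm that the link was actually recorded, you can dump the .gnu_debuglink section that --add-gnu-debuglink creates. A small sketch; the function name show_debuglink is made up for illustration:

```shell
#!/bin/sh
# show_debuglink FILE: hypothetical helper that prints the contents of the
# .gnu_debuglink section, i.e. the file name of the separate debug file
# that debuggers will look for.
show_debuglink() {
    readelf --string-dump=.gnu_debuglink "$1"
}

# Usage:
#   show_debuglink "$FILE"
```

Only the base name of the debug file is stored in that section, so the debug file has to live somewhere the debugger searches, for example next to the binary.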

Then I copied these files into two additional directories. In the first I ran --strip-debug against the original binaries, and in the other I ran --strip-unneeded.

As for file sizes, the original files were obviously the biggest, the files in the strip-unneeded dir were the smallest, and the files in the strip-debug dir were in the middle.
Also, running --strip-debug against the files in the strip-unneeded dir did not change the file sizes, meaning that --strip-debug strips just a subset of what --strip-unneeded strips.

I then compared the section listings of all three variants by running readelf -S against each of them.
They show that --strip-debug removes the following sections: .debug_aranges, .debug_info, .debug_abbrev, .debug_line and .debug_str, and also somewhat reduces the .symtab and .strtab sections.
--strip-unneeded additionally removes the .symtab and .strtab sections completely.

I then ran readelf -S against the debug binaries that I had produced with the --only-keep-debug flag. They contained all the sections removed by --strip-unneeded: not only .debug_aranges, .debug_info, .debug_abbrev, .debug_line and .debug_str, but also .symtab and .strtab. The sizes of these sections were almost identical to their original sizes.
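The comparison can be reproduced roughly as follows. This is only a sketch of the experiment, not the exact commands I ran; the function name compare_strip and the scratch file names are made up for illustration:

```shell
#!/bin/sh
# compare_strip FILE: hypothetical helper that rebuilds the variants
# described above in a scratch directory and prints what differs.
compare_strip() {
    file=$1
    work=$(mktemp -d) || return 1

    # Separate debug file, as produced with --only-keep-debug.
    objcopy --only-keep-debug "$file" "$work/only-keep-debug" || return 1
    # Two stripped copies; the input file itself is left untouched.
    objcopy --strip-debug "$file" "$work/strip-debug" || return 1
    objcopy --strip-unneeded "$file" "$work/strip-unneeded" || return 1

    # Sizes in bytes (wc also prints a total line).
    wc -c "$file" "$work/strip-debug" "$work/strip-unneeded"

    # Debug and symbol-table sections present in each file.
    for f in "$file" "$work/strip-debug" "$work/strip-unneeded" "$work/only-keep-debug"; do
        echo "== $f"
        readelf -S -W "$f" | grep -E '\.(debug_|symtab|strtab)' || echo "(none)"
    done
    echo "variants kept in $work"
}

# Usage:
#   compare_strip ./my_app
```

Here ./my_app stands for any ELF binary built with -g. The exact set of .debug_* sections you see depends on the compiler version.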

I then stepped through all three variants in the debugger and noticed no difference between them. I also produced crashes and core dumps with all of them and debugged against the core dumps: again, no difference.

Versions used:
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
GNU objcopy (GNU Binutils for Ubuntu) 2.26.1
GNU strip (GNU Binutils for Ubuntu) 2.26.1

How to give symbols from unstripped binary to gdb?

There are two main ways to do this.

One way is to start gdb on the unstripped executable, and then attach:

$ gdb unstripped
(gdb) attach 12345

This way is easy! However it has a hidden danger, which is that you might accidentally mismatch the stripped and unstripped programs, leading to a very confusing debugging session.
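One way to guard against such a mismatch is to compare build IDs before attaching. A minimal sketch, assuming both files were linked with a build ID (many distro toolchains do this by default; otherwise pass -Wl,--build-id); the function name same_build is made up for illustration:

```shell
#!/bin/sh
# same_build FILE1 FILE2: hypothetical helper that succeeds only if both
# ELF files carry the same, non-empty GNU build ID.
same_build() {
    a=$(readelf -n "$1" | sed -n 's/^ *Build ID: *//p' | head -n 1)
    b=$(readelf -n "$2" | sed -n 's/^ *Build ID: *//p' | head -n 1)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage on Linux, where /proc/PID/exe points at the running executable:
#   same_build unstripped /proc/12345/exe && echo "safe to attach"
```

This works because stripping leaves the build ID note in place, so the stripped and unstripped files from one build report the same ID.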

Another way is to take the time to properly split the debug information into a separate file when stripping. There are some instructions in the gdb manual.

With this approach, be sure to use the build-id feature. If you do this properly, then you can simply point gdb at your archive of separate debug info, and gdb will pick up the proper information automatically.

The main advantage of this approach is that it avoids the possibility of debuginfo mismatch. FWIW this is what the distros use to build their debuginfo archives.
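As a rough sketch of the build-id layout: gdb looks under each debug-file directory for .build-id/xx/yyyy.debug, where xx is the first two hex digits of the build ID and yyyy is the rest. The function name install_debuginfo and the archive path are made up for illustration, and the binary is assumed to have been linked with a build ID:

```shell
#!/bin/sh
# install_debuginfo FILE DEBUGROOT: hypothetical helper that moves the debug
# info of FILE into DEBUGROOT/.build-id/xx/yyyy.debug and then strips FILE.
install_debuginfo() {
    file=$1
    root=$2
    # Extract the hex build ID from the ELF notes.
    id=$(readelf -n "$file" | sed -n 's/^ *Build ID: *//p' | head -n 1)
    [ -n "$id" ] || { echo "no build ID in $file" >&2; return 1; }

    # First two hex digits name the directory, the rest names the file.
    dir=$root/.build-id/$(printf '%s' "$id" | cut -c1-2)
    rest=$(printf '%s' "$id" | cut -c3-)
    mkdir -p "$dir" || return 1

    objcopy --only-keep-debug "$file" "$dir/$rest.debug" || return 1
    # Strip the debug info from the binary in place.
    objcopy --strip-debug "$file"
}

# Usage:
#   install_debuginfo ./my_app /srv/debuginfo
# and then, inside gdb:
#   (gdb) set debug-file-directory /srv/debuginfo
```

Because the lookup is keyed on the build ID rather than on a file name, gdb cannot silently pick up debug info from a different build.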

How can dlsym successfully import function from stripped binary library?

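Try readelf -s a.so. The dynamic symbols are still there after that strip: strip removes the static symbol table (.symtab) but keeps the dynamic one (.dynsym), which is what the dynamic linker and dlsym use.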

(Or just switch to nm -D a.so.)
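A quick way to check whether a particular function survived in the dynamic symbol table. A minimal sketch, assuming GNU nm; the function name has_dynamic_symbol and the symbol my_function are made up for illustration:

```shell
#!/bin/sh
# has_dynamic_symbol LIB NAME: hypothetical helper that succeeds if NAME is
# defined in the dynamic symbol table of LIB, which is where dlsym looks.
has_dynamic_symbol() {
    nm -D --defined-only "$1" |
        awk -v name="$2" '$NF == name { found = 1 } END { exit !found }'
}

# Usage:
#   has_dynamic_symbol a.so my_function && echo "dlsym can resolve it"
```

Versioned symbols are printed by nm with a version suffix, so this exact-match check is only meant for plain, unversioned names.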

Strip/Remove debug symbols and archive names from a static library

This script implements Sigismondo's suggestion (unpacks the archive, strips each object file individually, renames them 1000.o, 1001.o, etc., and repacks). The parameters for ar crus may vary depending on your version of ar.

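#!/bin/bash
# usage: repack.sh file.a
# Unpacks the archive, strips debug symbols from each object file,
# renames the objects to 1000.o, 1001.o, ... and writes the repacked
# archive to the current directory.

if [ -z "$1" ]; then
    echo "usage: repack file.a" >&2
    exit 1
fi

# Start from a clean scratch directory.
/bin/rm -rf tmprepack
mkdir tmprepack || exit 1
cp "$1" tmprepack || exit 1
pushd tmprepack || exit 1

# Archive name without any leading directory components.
archive=${1##*/}

ar xv "$archive"
/bin/rm -f "$archive"

# Strip each object and give it an anonymous numeric name.
i=1000
for p in *.o; do
    strip -d "$p"
    mv "$p" "${i}.o"
    ((i++))
done

ar crus "$archive" *.o
mv "$archive" ..

popd
/bin/rm -rf tmprepack
exit 0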

Why OSX's strip can not remove weak symbols?

My first instinct was to blame radar bug 5614542 for that weird symbol, but it's not related.

I'll make some assumptions here. Since it seems you're using nlist relocations rather than the newer bytecode-based relocations (you can check by looking for the dyld info load command), my guess is that this was either built with an ancient toolchain or is an MH_OBJECT file for a main executable that has not gone through the final linking step. I'm not 100% sure that is the case here, but either way, the rest of this answer applies.

Sorry for my assumption above, but the original answer still applies, unless you really want to opt out of symbol coalescing, in which case you can build your application with private linkage. But again, this template instantiation forces the symbol to be weak for a very good reason: it has a static constructor and an implicitly instantiated template, so the toolchain prefers safety and keeps the symbol. You can choose not to export it outside of the executable at all. While you have a small case here, C++ programs tend to use things like Boost, or C++ libs that depend on other C++ libs; that creates chains, and eventually you end up with multiple definitions within the shared namespace just because of C++ semantics. In your small test case you can get away with it; in a larger application, unless you really know what you're doing and are examining things like dependency trees for dylibs, just let dyld do its job. I think my original answer still largely applies, as it explains why your symbol is marked as weak (ODR is a C++-specific concept, but it's dealt with differently by different static linkers):

For a longer explanation: it has to do with C++ semantics, namely the one definition rule (ODR), which is close to, but not the same as, not being able to have duplicate strong symbols in the same namespace (I mean a link namespace, not a C++ namespace; this gets confusing very quickly).

If you want to know why it's marked as weak: it's so that dyld can coalesce it during dynamic linking, since reusing that template would instantiate it again (causing an ODR violation and, depending on the context, a link-time error). It's an implicit instantiation, which may or may not require coalescing; that is not known until static or even dynamic link time, unless of course you define it as hidden, in which case you have to be extremely careful, since semantics will vary a lot depending on factors like whether it's a modular build or not (I mean LLVM "modules", not the Modules TS for C++).

Without it being weak, you'd be causing an ODR violation per C++ rules by defining it as hidden across more than one translation unit (if you reused that template, say in a header within the module, you would get duplicate symbol errors). You could get away with violating ODR since it's not actually enforced, but be prepared for some nasty surprises (i.e. by using non-modular builds, aka "every translation unit is a module").

Because it is defined as weak, dyld is able to select the correct definitions per final linked object, be that a shared library or an executable (and don't forget about the shared cache), at runtime and bind/relocate them appropriately within the otherwise flat namespace.

That is a lot for a compiler to deduce without any form of hint. Hidden linkage is a really bad idea unless you understand the implications; you want internal visibility if you really want to re-instantiate and copy the template every time. OSX has a fairly complicated linking model in general, with a lot of potential landmines to step on.

And if I'm right about the object file thing, you shouldn't really run strip on object files before they are fed into the static linker.


