How to Remove Strings from a Compiled Binary (.so)


These strings are the names of entries in the dynamic symbol table (.dynsym), which is used when the library is loaded at runtime. The names themselves live in the .dynstr section, so readelf -p .dynstr mylib.so will show them.

strip -g will remove debugging symbols, but it can't remove entries from the dynamic symbol table, as these may be needed at runtime. Your problem is that the dynamic symbol table contains entries for functions that are never going to be called from outside your library. Unless you tell it, the compiler/linker has no way of knowing which functions form part of the external API (and therefore need entries in the dynamic symbol table) and which are private to your library (and so don't need them), so it creates dynamic symbol table entries for all non-static functions.

There are two main ways you can inform the compiler which functions are private.

  1. Mark the private functions static. This only works for functions that are needed within a single compilation unit, though for some libraries this technique might be sufficient.

  2. Use the gcc "visibility" attribute to mark the functions as visible or hidden. You have two options: either mark all the private functions as hidden, or change the default visibility to hidden using the -fvisibility=hidden compiler option and mark all the public functions as visible. The latter is probably the best option for you, as it means that you don't have to worry about accidentally adding a function and forgetting to mark it as hidden.

If you have a function:

int foo(int a, int b);

then the syntax for marking it hidden is:

int foo(int a, int b) __attribute__((visibility("hidden")));

and the syntax for marking it visible is:

int foo(int a, int b) __attribute__((visibility("default")));
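As a minimal sketch (not taken from the document referenced below), a library built with -fvisibility=hidden might combine both techniques like this. Only foo comes from the text above; the macro MYLIB_API and the helpers add_private and scale_internal are hypothetical names. With -fvisibility=hidden the explicit hidden attribute on scale_internal is redundant, but it keeps the function hidden even if the flag is forgotten.

```c
/* mylib.c: hypothetical example library.
 * Build sketch: gcc -shared -fPIC -fvisibility=hidden mylib.c -o libmylib.so
 */

/* Hypothetical export macro: gives the public API default visibility. */
#define MYLIB_API __attribute__((visibility("default")))

/* Technique 1: static, so the symbol never leaves this compilation unit. */
static int add_private(int a, int b)
{
    return a + b;
}

/* Technique 2: callable from other .c files of the same library, but marked
 * hidden, so it gets no entry in the .so's dynamic symbol table. */
__attribute__((visibility("hidden"))) int scale_internal(int x)
{
    return 2 * x;
}

/* Public API: the only function here that should be exported. */
MYLIB_API int foo(int a, int b)
{
    return scale_internal(add_private(a, b));
}
```

After building, readelf --dyn-syms libmylib.so (or nm -D libmylib.so) should list foo but neither add_private nor scale_internal.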

For further details, see this document, which is an excellent source of information on this subject.

show strings in compiled binary

From man 1 strings:

For each file given, GNU strings prints the printable character
sequences that are at least 4 characters long (or the number given
with the options below) and are followed by an unprintable character.
By default, it only prints the strings from the initialized and
loaded sections of object files; for other types of files, it prints
the strings from the whole file.
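To make the rule quoted above concrete, here is a rough C sketch of the core scan strings performs over a byte buffer. It is not GNU strings' actual code: it ignores section selection and the encoding options, treats tab as printable (which matches GNU strings' default), and also reports a run that reaches the end of the buffer. The name scan_strings is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Print every run of at least min_len printable bytes (tab counts as
 * printable) with its decimal offset, roughly like `strings -td`.
 * min_len should be >= 1. Returns the number of runs printed. */
static size_t scan_strings(const unsigned char *buf, size_t len, size_t min_len)
{
    size_t count = 0, start = 0;
    for (size_t i = 0; i <= len; ++i) {
        int printable = i < len && (isprint(buf[i]) || buf[i] == '\t');
        if (printable)
            continue;
        /* The current run is buf[start .. i). */
        if (i - start >= min_len) {
            printf("%7zu %.*s\n", start, (int)(i - start), (const char *)buf + start);
            ++count;
        }
        start = i + 1;
    }
    return count;
}
```

For the bytes "ab\0hello\0xyz1234\x01q" and the default minimum of 4, this reports "hello" and "xyz1234" but not "ab" or "q".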

The C language does not define strings as first-class citizens. They are expressed either as character arrays or as string literals. For instance, in this basic program:

#include <stdio.h>

int main(void)
{
    char s[] = "my string";

    printf("%s\n", s);

    return 0;
}

we can reasonably say that the array s holds a string. Notice that this one is allocated on the stack: it has automatic storage duration, as opposed to the example in your question, where s is defined outside of main (or any other function).

Now, back to your question: the underlying objects in your two programs share the same characteristics:

  • they are of type char[6] and have the same content (C11 §6.2.5/p20),
  • they have static storage duration, meaning that they must be (conceptually) initialized before program startup (C11 §5.1.2/p1).

The only difference is that modifying a string literal invokes undefined behavior, so the compiler may choose to place string literals in a separate (e.g. read-only) memory location.
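A small sketch of the two kinds of objects described above, assuming the string in question is "hello" (as in the gdb session further down). Both are char[6] objects with static storage duration; the comments mark which one may legally be modified. All identifiers are hypothetical.

```c
#include <string.h>

/* A named array with static storage duration: type char[6], modifiable. */
static char greeting_array[] = "hello";

/* A pointer to a string literal. The literal is an unnamed char[6] object
 * with static storage duration; writing through this pointer is undefined
 * behaviour, which is why compilers may place it in read-only memory. */
static const char *greeting_ptr = "hello";

/* Modifying the array is fine: it is an ordinary object. */
static void capitalize_greeting(void)
{
    greeting_array[0] = 'H';
}
```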

C11 §6.2.5/p20 Types:

An array type describes a contiguously allocated nonempty set of
objects with a particular member object type, called the element type.

C11 §5.1.2/p1 Execution environments:

All objects with static storage duration shall be initialized (set to
their initial values) before program startup.

From a more practical viewpoint, besides the strings command you might also analyze your programs with the gdb debugger, specifically its x/s command. Here is a basic illustration:

$ gcc -g hello.c -o hello
$ gdb -q hello
Reading symbols from /home/grzegorz/hello...done.
(gdb) disas /m main
Dump of assembler code for function main:
6 {
0x00000000004004c4 <+0>: push %rbp
0x00000000004004c5 <+1>: mov %rsp,%rbp

7 printf("%s\n", s);
0x00000000004004c8 <+4>: mov $0x60086c,%edi
0x00000000004004cd <+9>: callq 0x4003b8 <puts@plt>

8 }
0x00000000004004d2 <+14>: leaveq
0x00000000004004d3 <+15>: retq

End of assembler dump.
(gdb) x/s 0x60086c
0x60086c <s>: "hello"

You might want to compare the output of the disas command for your two programs and see whether they differ.

How to hide a string in binary code?

I'm sorry for the long answer.

Your answers are absolutely correct, but the question was how to hide a string and do it nicely.

I did it this way:

#include <iostream>
#include "HideString.h"

DEFINE_HIDDEN_STRING(EncryptionKey, 0x7f, ('M')('y')(' ')('s')('t')('r')('o')('n')('g')(' ')('e')('n')('c')('r')('y')('p')('t')('i')('o')('n')(' ')('k')('e')('y'))
DEFINE_HIDDEN_STRING(EncryptionKey2, 0x27, ('T')('e')('s')('t'))

int main()
{
    std::cout << GetEncryptionKey() << std::endl;
    std::cout << GetEncryptionKey2() << std::endl;

    return 0;
}

HideString.h:

#include <boost/preprocessor/cat.hpp>
#include <boost/preprocessor/seq/for_each_i.hpp>
#include <boost/preprocessor/seq/enum.hpp>

#define CRYPT_MACRO(r, d, i, elem) ( elem ^ ( d - i ) )

#define DEFINE_HIDDEN_STRING(NAME, SEED, SEQ)\
static const char* BOOST_PP_CAT(Get, NAME)()\
{\
    static char data[] = {\
        BOOST_PP_SEQ_ENUM(BOOST_PP_SEQ_FOR_EACH_I(CRYPT_MACRO, SEED, SEQ)),\
        '\0'\
    };\
\
    static bool isEncrypted = true;\
    if ( isEncrypted )\
    {\
        for (unsigned i = 0; i < ( sizeof(data) / sizeof(data[0]) ) - 1; ++i)\
        {\
            data[i] = CRYPT_MACRO(_, SEED, i, data[i]);\
        }\
\
        isEncrypted = false;\
    }\
\
    return data;\
}

The trickiest line in HideString.h is:

BOOST_PP_SEQ_ENUM(BOOST_PP_SEQ_FOR_EACH_I(CRYPT_MACRO, SEED, SEQ))

Let me explain this line. For the code:

DEFINE_HIDDEN_STRING(EncryptionKey2, 0x27, ('T')('e')('s')('t'))

BOOST_PP_SEQ_FOR_EACH_I(CRYPT_MACRO, SEED, SEQ)
generates the sequence:

( 'T'  ^ ( 0x27 - 0 ) ) ( 'e'  ^ ( 0x27 - 1 ) ) ( 's'  ^ ( 0x27 - 2 ) ) ( 't'  ^ ( 0x27 - 3 ) )

BOOST_PP_SEQ_ENUM(BOOST_PP_SEQ_FOR_EACH_I(CRYPT_MACRO, SEED, SEQ))
generates:

'T' ^ ( 0x27 - 0 ), 'e' ^ ( 0x27 - 1 ), 's' ^ ( 0x27 - 2 ), 't' ^ ( 0x27 - 3 )

and finally,

DEFINE_HIDDEN_STRING(EncryptionKey2, 0x27, ('T')('e')('s')('t'))
generates:

static const char* GetEncryptionKey2()
{
    static char data[] = {
        'T' ^ ( 0x27 - 0 ), 'e' ^ ( 0x27 - 1 ), 's' ^ ( 0x27 - 2 ), 't' ^ ( 0x27 - 3 ),
        '\0'
    };
    static bool isEncrypted = true;
    if ( isEncrypted )
    {
        for (unsigned i = 0; i < ( sizeof(data) / sizeof(data[0]) ) - 1; ++i)
        {
            data[i] = ( data[i] ^ ( 0x27 - i ) );
        }
        isEncrypted = false;
    }
    return data;
}

The encrypted data for "My strong encryption key" looks like this in memory:

0x00B0200C  32 07 5d 0f 0f 08 16 16 10 56 10 1a 10 00 08  2.]......V.....
0x00B0201B 00 1b 07 02 02 4b 01 0c 11 00 00 00 00 00 00 .....K.........
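For readers not using Boost, here is a minimal plain-C sketch of the same arithmetic CRYPT_MACRO applies (byte i XORed with SEED - i). Because XOR is its own inverse, one function both obfuscates and restores. The names xor_crypt and decodes_to are hypothetical, and unlike the macro version this sketch does not produce the obfuscated bytes at compile time; it only demonstrates the transform, for example by decoding the 24 bytes of the dump above with seed 0x7f.

```c
#include <stddef.h>
#include <string.h>

/* XOR byte i of buf with (seed - i), truncated to 8 bits: the same
 * transform as CRYPT_MACRO(r, SEED, i, elem) in HideString.h.
 * Applying it twice with the same seed restores the original bytes. */
static void xor_crypt(unsigned char *buf, size_t len, unsigned char seed)
{
    for (size_t i = 0; i < len; ++i)
        buf[i] ^= (unsigned char)(seed - i);
}

/* Returns 1 if decoding `encoded` with `seed` yields exactly `expected`. */
static int decodes_to(const unsigned char *encoded, size_t len,
                      unsigned char seed, const char *expected)
{
    unsigned char tmp[64];
    if (len > sizeof tmp || strlen(expected) != len)
        return 0;
    memcpy(tmp, encoded, len);
    xor_crypt(tmp, len, seed);
    return memcmp(tmp, expected, len) == 0;
}
```

Encoding "Test" with seed 0x27 gives 0x73 0x43 0x56 0x50, matching the expansion 'T' ^ ( 0x27 - 0 ), 'e' ^ ( 0x27 - 1 ), and so on.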

Thank you very much for your answers!

How to hide strings in an exe or a dll?

Welcome to the wider world of defensive programming.

There are a few options, but I believe all of them rely on some form of obfuscation, which, although not perfect, is at least something.

  1. Instead of a straight string value you can store the text in some other binary form (hex?).

  2. You can encrypt the strings that are stored in your app, then decrypt them at run time.

  3. You can split them across various points in your code, and reconstitute later.

Or some combination thereof.
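Option 3 could look like the following minimal sketch. The secret "cheat-code" and all names are hypothetical. Each fragment is at most three characters long and NUL-terminated, so none of them reaches the default four-character minimum of strings. This only defeats a naive strings scan; anyone stepping through the code in a debugger will still see the reassembled value.

```c
#include <stddef.h>
#include <string.h>

/* Fragments of a hypothetical secret, each at most three characters
 * (plus its NUL terminator), stored out of order. */
static const char *const fragments[] = { "ode", "che", "-c", "at" };

/* Indices into fragments[], in the order the pieces must be joined. */
static const size_t join_order[] = { 1, 3, 2, 0 };

/* Rebuild the secret into out. Returns 0 on success, -1 if out_size is too small. */
static int assemble_secret(char *out, size_t out_size)
{
    size_t used = 0;
    for (size_t k = 0; k < sizeof join_order / sizeof join_order[0]; ++k) {
        const char *piece = fragments[join_order[k]];
        size_t n = strlen(piece);
        if (used + n + 1 > out_size)   /* +1 leaves room for the terminator */
            return -1;
        memcpy(out + used, piece, n);
        used += n;
    }
    out[used] = '\0';
    return 0;
}
```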

Bear in mind that some attacks go further than looking at the binary itself: an attacker may inspect the program's memory while it is running. Microsoft introduced SecureString in .NET 2.0, whose purpose is to keep strings encrypted in memory while the app is running.

A fourth idea is to not store the string in the app itself, but rather rely on a validation code to be submitted to a server you control. On the server you can verify if it's a legit "cheat code" or not.

Add/edit string in compiled C program?

Many POSIX platforms come with the strings program, which reads through a binary file searching for strings. It has an option to print the offset of each string (-td prints it in decimal). For example:

strings -td myexec

From there you can use a hex editor to change the bytes at that offset, but the main problem is that you cannot make a string longer than it already is without overwriting whatever follows it in the file.
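The hex-editor step can also be scripted. Below is a minimal C sketch, assuming you already know the decimal offset (from strings -td) and the length of the original string; patch_string is a hypothetical helper. It refuses a longer replacement for exactly the reason given above, and pads a shorter one with NUL bytes so the C string still ends where the program expects.

```c
#include <stdio.h>
#include <string.h>

/* Overwrite the old_len-byte string at `offset` in f with `replacement`,
 * padding the remainder with NUL bytes. Refuses to grow the string, since
 * that would clobber whatever follows it. Returns 0 on success, -1 on error. */
static int patch_string(FILE *f, long offset, size_t old_len, const char *replacement)
{
    size_t new_len = strlen(replacement);
    if (new_len > old_len)
        return -1;                          /* cannot make the string bigger */
    if (fseek(f, offset, SEEK_SET) != 0)
        return -1;
    if (fwrite(replacement, 1, new_len, f) != new_len)
        return -1;
    for (size_t i = new_len; i < old_len; ++i)
        if (fputc('\0', f) == EOF)
            return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```

On a real executable you would open it with fopen(path, "r+b") and work on a copy, since a wrong offset corrupts the binary.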

How do I compile C without anything but my code in the binary?


In short:

  1. strip -s does not remove the sections but only overwrites them with 0 (and thus the file size remains the same).
  2. There are a lot of program headers that we do not need in this case (for exception handling, etc.).
  3. There is a default page alignment in the binary, which places the code at file offset 0x1000 (4096) or later, and we do not need it.

In detail:

First, we can improve it slightly if we compile the binary statically:

$ gcc -nostdlib -static nolib.c -o static_output
$ strip -s static_output # strip -s in order to strip all (not helping here)
$ ls -lh static_output
-rwxrwxrwx 1 graul graul 8.7K Jan 17 22:59 static_output

Let's look at our ELF now:
$ readelf -h static_output
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x401000
Start of program headers: 64 (bytes into file)
Start of section headers: 8368 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 7
Size of section headers: 64 (bytes)
Number of section headers: 7
Section header string table index: 6

It looks like there are more than 8 KB before the start of the section headers!
Let's look at what this is made of:

$ readelf -e static_output
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x401000
Start of program headers: 64 (bytes into file)
Start of section headers: 8368 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 7
Size of section headers: 64 (bytes)
Number of section headers: 7
Section header string table index: 6

Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .note.gnu.propert NOTE 00000000004001c8 000001c8
0000000000000020 0000000000000000 A 0 0 8
[ 2] .note.gnu.build-i NOTE 00000000004001e8 000001e8
0000000000000024 0000000000000000 A 0 0 4
[ 3] .text PROGBITS 0000000000401000 00001000
000000000000001b 0000000000000000 AX 0 0 1
[ 4] .eh_frame PROGBITS 0000000000402000 00002000
0000000000000038 0000000000000000 A 0 0 8
[ 5] .comment PROGBITS 0000000000000000 00002038
000000000000002a 0000000000000001 MS 0 0 1
[ 6] .shstrtab STRTAB 0000000000000000 00002062
000000000000004a 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)

Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x000000000000020c 0x000000000000020c R 0x1000
LOAD 0x0000000000001000 0x0000000000401000 0x0000000000401000
0x000000000000001b 0x000000000000001b R E 0x1000
LOAD 0x0000000000002000 0x0000000000402000 0x0000000000402000
0x0000000000000038 0x0000000000000038 R 0x1000
NOTE 0x00000000000001c8 0x00000000004001c8 0x00000000004001c8
0x0000000000000020 0x0000000000000020 R 0x8
NOTE 0x00000000000001e8 0x00000000004001e8 0x00000000004001e8
0x0000000000000024 0x0000000000000024 R 0x4
GNU_PROPERTY 0x00000000000001c8 0x00000000004001c8 0x00000000004001c8
0x0000000000000020 0x0000000000000020 R 0x8
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10

Section to Segment mapping:
Segment Sections...
00 .note.gnu.property .note.gnu.build-id
01 .text
02 .eh_frame
03 .note.gnu.property
04 .note.gnu.build-id
05 .note.gnu.property
06

This is surprising, as we ran strip, which we expected to remove these sections from our ELF. Looking at the answers to https://unix.stackexchange.com/questions/267070/why-doesnt-strip-remove-section-headers-from-elf-executables we can see that, although it is not stated explicitly, strip does not remove these parts from our binary but only removes their content (which does not help in our case).

We can use strip -R to remove these sections completely. The biggest one here is the .eh_frame section, which is not needed in our case (see "Why GCC compiled C program needs .eh_frame section?" for details).

$ strip -R .eh_frame static_output
$ ls -lh static_output
-rwxrwxrwx 1 graul graul 4.6K Jan 17 23:22 static_output*

Just to be clear, there is no reason to not strip the rest of the unwanted sections as well:

$ strip -R .eh_frame -R .note.gnu.property -R .note.gnu.build-id static_output
$ ls -lh static_output
-rwxrwxrwx 1 graul graul 4.4K Jan 17 23:31 static_output

Half the size! But still not good enough. It looks like there is a big program header we need to remove.

It looks like gcc inserts these sections without our asking:

$ gcc -c -nostdlib -static nolib.c -o nolib.o
$ ls -l nolib.o
-rwxrwxrwx 1 graul graul 1376 Jan 17 23:40 nolib.o
$ strip -R .data -R .bss -R .comment -R .note.GNU-stack -R .note.GNU-stack -R .note.gnu.propery -R .eh_frame -R .real.eh_frame -R .symtab -R.strtab -R.shstrtab nolib.o
$ ls -l nolib.o
-rwxrwxrwx 1 graul graul 424 Jan 17 23:41 nolib.o

But this is only an object file, not an executable. If we now link it with ld:

$ ld nolib.o -o ld_output
$ ls -l ld_output
-rwxrwxrwx 1 graul graul 4760 Jan 17 23:55 ld_output

ld has a flag, -n (--nmagic), that turns off page alignment of sections; that padding accounts for almost all of our size.

$ ld -n -static nolib.o -o ld_output
$ ls -l ld_output
-rwxrwxrwx 1 graul graul 928 Jan 17 23:57 ld_output
$ strip -R .note.gnu.property ld_output
$ ls -l ld_output
-rwxrwxrwx 1 graul graul 472 Jan 17 23:58 ld_output

That is a drastic improvement (though of course much more work could be done).

Is it possible to extract constants and other predefined values from binary executables?

Yes, you could easily use a decompiler to extract those kinds of constants, especially strings (since they occupy a larger chunk of memory). This works even for machine-code binaries and is easier still for VM languages like Java and C#.

If you need to keep something secret in there, you will need to go to great lengths. Simply encrypting the string, for example, would add a layer of security, but for someone who knows what she is doing this won't be a big barrier. For example, scanning the file for regions with unusual entropy is likely to reveal the key that was used for encryption. There are even systems that encode secrets by altering the low-level instructions used in the binary: those tools replace certain combinations of instructions with other, equivalent ones. But even those systems are not too hard to circumvent, as the unusual combinations of instructions reveal the use of such tools.

And even if you manage to protect the string with some kind of encryption in your binary, at some point your program will need the decrypted value at run time. A memory dump taken while the string is in use will therefore also contain a copy of the secret value. This is especially problematic in Java, as you cannot explicitly deallocate a chunk of memory and strings are immutable (meaning that a "change" to a string creates a new chunk of memory).
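In C, one partial mitigation is to keep the decrypted copy alive as briefly as possible and wipe it afterwards. A plain memset right before a buffer goes out of scope may be removed by the optimizer as a dead store, so the sketch below writes through a volatile pointer, which compilers treat as observable in practice; explicit_bzero (glibc 2.25+, BSDs) or the optional C11 memset_s serve the same purpose where available. The name secure_zero is hypothetical, and wiping only narrows the window in which a memory dump can catch the secret; it does not close it.

```c
#include <stddef.h>

/* Zero n bytes at p through a volatile pointer so the stores are not
 * optimized away as dead, e.g. just before a stack buffer goes out of scope. */
static void secure_zero(void *p, size_t n)
{
    volatile unsigned char *vp = p;
    while (n--)
        *vp++ = 0;
}
```

Typical use: decrypt into a local buffer, use it, then call secure_zero(buf, sizeof buf) before returning.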

As you can see, the problem is far from trivial. And of course there is no way to achieve 100% security (think of all the cracked games and so on).

Something that can be implemented in a secure way is public-key cryptography. In that case you will need to keep the private key hidden. That might be possible if, for example, you can send data to your server to be encrypted there, or if you have hardware that provides a Trusted Platform Module. But those things might not be feasible in your case.



Related Topics



Leave a reply



Submit