Techniques for Obscuring Sensitive Strings in C++

Basically, anyone with access to your program and a debugger can and will find the key in the application if they want to.

But, if you just want to make sure the key doesn't show up when running strings on your binary, you could for instance make sure that the key is not within the printable range.

Obscuring key with XOR

For instance, you could use XOR to split the key into two byte arrays:

key = key1 XOR key2

If you create key1 with the same byte-length as key you can use (completely) random byte values and then compute key2:

key1[n] = crypto_grade_random_number(0..255)
key2[n] = key[n] XOR key1[n]

You can do this in your build environment, and then only store key1 and key2 in your application.
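The split and the recombination can be sketched in C++ roughly as follows. This is only an illustration: the names `split_key` and `combine_key` are made up for this example, and `std::random_device` is used as a stand-in for a cryptographic-grade random source, which the standard does not guarantee it to be.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Build-time step: split `key` into two shares so that key == key1 XOR key2.
// NOTE: std::random_device is an assumption made for brevity; replace it with
// a cryptographic random source in a real build script.
std::pair<Bytes, Bytes> split_key(const Bytes& key) {
    std::random_device rd;
    std::uniform_int_distribution<int> random_byte(0, 255);
    Bytes key1(key.size());
    Bytes key2(key.size());
    for (std::size_t n = 0; n < key.size(); ++n) {
        key1[n] = static_cast<std::uint8_t>(random_byte(rd));
        key2[n] = static_cast<std::uint8_t>(key[n] ^ key1[n]);
    }
    return {key1, key2};
}

// Run-time step: rebuild the key from the two stored shares.
Bytes combine_key(const Bytes& key1, const Bytes& key2) {
    assert(key1.size() == key2.size());
    Bytes key(key1.size());
    for (std::size_t n = 0; n < key1.size(); ++n) {
        key[n] = static_cast<std::uint8_t>(key1[n] ^ key2[n]);
    }
    return key;
}
```

Only the output of `split_key` would be embedded in the shipped program; `combine_key` runs right before the key is needed.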

Protecting your binary

Another approach is to use a tool to protect your binary. For instance, there are several security tools that obfuscate your binary and run it inside a virtual machine of their own. This makes it hard(er) to debug, and is also the conventional way many commercial-grade secure applications (and, alas, malware) are protected.

One of the premier tools is Themida, which does an awesome job of protecting your binaries. It is often used by well-known programs, such as Spotify, to protect against reverse engineering. It has features to prevent debugging in programs such as OllyDbg and IDA Pro.

There is also a larger list, maybe somewhat outdated, of tools to protect your binary.

Some of them are free.

Password matching

Someone here discussed hashing password+salt.

If you need to store the key to match it against some kind of user-submitted password, you should use a one-way hashing function, preferably by combining username, password and a salt. The problem with this, though, is that your application has to know the salt to be able to compute the one-way hash and compare the resulting hashes. So you still need to store the salt somewhere in your application. But, as @Edward points out in the comments below, this will effectively protect against a dictionary attack using, e.g., rainbow tables.

Finally, you can use a combination of all the techniques above.

How to hide a string in binary code?

I'm sorry for the long answer.

Your answers are absolutely correct, but the question was how to hide a string and do it nicely.

I did it this way:

#include <iostream>
#include "HideString.h"

DEFINE_HIDDEN_STRING(EncryptionKey, 0x7f, ('M')('y')(' ')('s')('t')('r')('o')('n')('g')(' ')('e')('n')('c')('r')('y')('p')('t')('i')('o')('n')(' ')('k')('e')('y'))
DEFINE_HIDDEN_STRING(EncryptionKey2, 0x27, ('T')('e')('s')('t'))

int main()
{
std::cout << GetEncryptionKey() << std::endl;
std::cout << GetEncryptionKey2() << std::endl;

return 0;
}

HideString.h:

#include <boost/preprocessor/cat.hpp>
#include <boost/preprocessor/seq/for_each_i.hpp>
#include <boost/preprocessor/seq/enum.hpp>

#define CRYPT_MACRO(r, d, i, elem) ( elem ^ ( d - i ) )

#define DEFINE_HIDDEN_STRING(NAME, SEED, SEQ)\
static const char* BOOST_PP_CAT(Get, NAME)()\
{\
static char data[] = {\
BOOST_PP_SEQ_ENUM(BOOST_PP_SEQ_FOR_EACH_I(CRYPT_MACRO, SEED, SEQ)),\
'\0'\
};\
\
static bool isEncrypted = true;\
if ( isEncrypted )\
{\
for (unsigned i = 0; i < ( sizeof(data) / sizeof(data[0]) ) - 1; ++i)\
{\
data[i] = CRYPT_MACRO(_, SEED, i, data[i]);\
}\
\
isEncrypted = false;\
}\
\
return data;\
}

The trickiest line in HideString.h is:

BOOST_PP_SEQ_ENUM(BOOST_PP_SEQ_FOR_EACH_I(CRYPT_MACRO, SEED, SEQ))

Let me explain the line. For the code:

DEFINE_HIDDEN_STRING(EncryptionKey2, 0x27, ('T')('e')('s')('t'))

BOOST_PP_SEQ_FOR_EACH_I(CRYPT_MACRO, SEED, SEQ)
generates the sequence:

( 'T'  ^ ( 0x27 - 0 ) ) ( 'e'  ^ ( 0x27 - 1 ) ) ( 's'  ^ ( 0x27 - 2 ) ) ( 't'  ^ ( 0x27 - 3 ) )

BOOST_PP_SEQ_ENUM(BOOST_PP_SEQ_FOR_EACH_I(CRYPT_MACRO, SEED, SEQ))
generates:

'T' ^ ( 0x27 - 0 ), 'e' ^ ( 0x27 - 1 ), 's' ^ ( 0x27 - 2 ), 't' ^ ( 0x27 - 3 )

and finally,

DEFINE_HIDDEN_STRING(EncryptionKey2, 0x27, ('T')('e')('s')('t'))
generates:

static const char* GetEncryptionKey2()
{
static char data[] = {
'T' ^ ( 0x27 - 0 ), 'e' ^ ( 0x27 - 1 ), 's' ^ ( 0x27 - 2 ), 't' ^ ( 0x27 - 3 ),
'\0'
};
static bool isEncrypted = true;
if ( isEncrypted )
{
for (unsigned i = 0; i < ( sizeof(data) / sizeof(data[0]) ) - 1; ++i)
{
data[i] = ( data[i] ^ ( 0x27 - i ) );
}
isEncrypted = false;
}
return data;
}

The data for "My strong encryption key" looks like this in memory:

0x00B0200C  32 07 5d 0f 0f 08 16 16 10 56 10 1a 10 00 08  2.]......V.....
0x00B0201B 00 1b 07 02 02 4b 01 0c 11 00 00 00 00 00 00 .....K.........
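The bytes in the dump can be reproduced without Boost by applying the same `c ^ (seed - i)` transformation at run time. The function below is a hypothetical helper written for this illustration (it is not part of HideString.h); applying it twice with the same seed returns the original text, which is why the macro can reuse CRYPT_MACRO for decoding.

```cpp
#include <cstddef>
#include <string>

// Applies c ^ (seed - i) to every character, keeping only the low byte of
// (seed - i). The operation is its own inverse.
std::string xor_with_seed(std::string text, unsigned seed) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        const unsigned char mask = static_cast<unsigned char>(seed - i);
        text[i] = static_cast<char>(static_cast<unsigned char>(text[i]) ^ mask);
    }
    return text;
}
```

For the seed 0x7f the first three characters 'M', 'y' and ' ' become 0x32, 0x07 and 0x5d, matching the start of the dump. Note that an encoded byte can be zero (the fourteenth byte in the dump is), so the length has to come from sizeof or a stored size rather than from strlen.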

Thank you very much for your answers!

Encrypting / obfuscating a string literal at compile-time

I think this question deserves an updated answer.

When I asked this question several years ago, I didn't consider the difference between obfuscation and encryption. Had I known this difference then, I'd have included the term Obfuscation in the title before.

C++11 and C++14 have features that make it possible to implement compile-time string obfuscation (and possibly encryption, although I haven't tried that yet) in an effective and reasonably simple way, and it's already been done.

ADVobfuscator is an obfuscation library created by Sebastien Andrivet that uses C++11/14 to generate compile-time obfuscated code without using any external tool, just C++ code. There's no need to create extra build steps, just include it and use it. I don't know a better compile-time string encryption/obfuscation implementation that doesn't use external tools or build steps. If you do, please share.

It not only obfuscates strings, but it also has other useful things like a compile-time FSM (Finite State Machine) that can randomly obfuscate function calls, and a compile-time pseudo-random number generator, but these are out of the scope of this answer.

Here's a simple string obfuscation example using ADVobfuscator:

#include "MetaString.h"

using namespace std;
using namespace andrivet::ADVobfuscator;

void Example()
{
/* Example 1 */

// here, the string is compiled in an obfuscated form, and
// it's only deobfuscated at runtime, at the very moment of its use
cout << OBFUSCATED("Now you see me") << endl;

/* Example 2 */

// here, we store the obfuscated string into an object to
// deobfuscate whenever we need to
auto narrator = DEF_OBFUSCATED("Tyler Durden");

// note: although the function is named `decrypt()`, it's still deobfuscation
cout << narrator.decrypt() << endl;
}

You can replace the macros DEF_OBFUSCATED and OBFUSCATED with your own macros. E.g.:

#define _OBF(s) OBFUSCATED(s)

...

cout << _OBF("klapaucius");

How does it work?

If you take a look at the definition of these two macros in MetaString.h, you will see:

#define DEF_OBFUSCATED(str) MetaString<andrivet::ADVobfuscator::MetaRandom<__COUNTER__, 3>::value, andrivet::ADVobfuscator::MetaRandomChar<__COUNTER__>::value, Make_Indexes<sizeof(str) - 1>::type>(str)

#define OBFUSCATED(str) (DEF_OBFUSCATED(str).decrypt())

Basically, there are three different variants of the MetaString class (the core of the string obfuscation). Each has its own obfuscation algorithm. One of these three variants is chosen randomly at compile-time, using the library's pseudo-random number generator (MetaRandom), along with a random char that is used by the chosen algorithm to xor the string characters.

"Hey, but if we do the math, 3 algorithms * 255 possible char keys (0 is not used) = 765 variants of the obfuscated string"

You're right. The same string can only be obfuscated in 765 different ways. If you have a reason to need something safer (you're paranoid / your application demands increased security) you can extend the library and implement your own algorithms, using stronger obfuscation or even encryption (White-Box cryptography is in the lib's roadmap).


Where / how does it store the obfuscated strings?

One thing I find interesting about this implementation is that it doesn't store the obfuscated string in the data section of the executable.
Instead, it is statically stored into the MetaString object itself (on the stack) and the algorithm decodes it in place at runtime. This approach makes it much harder to find the obfuscated strings, statically or at runtime.
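To make the idea of keeping the encoded characters inside an object more concrete, here is a minimal C++14 sketch. It is not ADVobfuscator's implementation: the names `XorString` and `make_xor_string` are invented for this example, the key is fixed by the caller instead of being chosen randomly, and whether the plain literal really disappears from the binary depends on the compiler and optimization level.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical illustration only; not the ADVobfuscator MetaString class.
template <char Key, std::size_t N>
class XorString {
public:
    // Encodes every character (including the terminator) with Key.
    template <std::size_t... I>
    constexpr XorString(const char (&s)[N], std::index_sequence<I...>)
        : data_{static_cast<char>(s[I] ^ Key)...} {}

    // Decodes at run time; the terminator is not copied.
    std::string decrypt() const {
        std::string out(N - 1, '\0');
        for (std::size_t i = 0; i + 1 < N; ++i) {
            out[i] = static_cast<char>(data_[i] ^ Key);
        }
        return out;
    }

    // Access to the encoded bytes, for inspection.
    const char* encoded() const { return data_; }

private:
    char data_[N];
};

template <char Key, std::size_t N>
constexpr XorString<Key, N> make_xor_string(const char (&s)[N]) {
    return XorString<Key, N>(s, std::make_index_sequence<N>{});
}
```

A call such as `constexpr auto s = make_xor_string<0x55>("secret");` forces the encoding to happen at compile time, and `s.decrypt()` yields the original text when it is needed.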

You can dive deeper into the implementation by yourself. That's a very good basic obfuscation solution and can be a starting point to a more complex one.

How to hide strings in an exe or a dll?

Welcome to the wider world of defensive programming.

There are a couple of options, but I believe all of them depend on some form of obfuscation; which, although not perfect, is at least something.

  1. Instead of a straight string value you can store the text in some other binary form (hex?).

  2. You can encrypt the strings that are stored in your app, then decrypt them at run time.

  3. You can split them across various points in your code, and reconstitute later.

Or some combination thereof.
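As a small illustration of the first option, the text can be kept as hex digits and decoded only when it is needed. The helper names below are made up for this sketch. The hex digits themselves are still visible to a tool like strings, they are just no longer readable as the original text.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Converts one hex digit to its numeric value.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Decodes a string of hex digit pairs back into the original bytes.
std::string from_hex(const std::string& hex) {
    if (hex.size() % 2 != 0) {
        throw std::invalid_argument("hex string must have even length");
    }
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        out.push_back(static_cast<char>(hex_value(hex[i]) * 16 + hex_value(hex[i + 1])));
    }
    return out;
}
```

For example, `from_hex("54657374")` returns "Test". The third option could be combined with this by storing "5465" and "7374" in different places and concatenating them before decoding.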

Bear in mind that some attacks go further than looking at the actual binary. Sometimes they will investigate the memory address space of the program while it's running. MS came up with something called SecureString in .NET 2.0, the purpose being to keep the strings encrypted while the app is running.

A fourth idea is to not store the string in the app itself, but rather rely on a validation code to be submitted to a server you control. On the server you can verify if it's a legit "cheat code" or not.

What is the best way to protect sensitive data in the code?

First advice is to never store anything sensitive in your code directly. You can always reverse engineer that, no matter how cleverly you try to obfuscate it.

I've read about things like breaking a password into several pieces, placing them at different places in the code and running them through a series of functions before finally using them... although this makes things harder, you can still always monitor the application using a debugger and ultimately you will be able to retrieve the secret information.

If I interpret your scenario correctly, what you have is code that is to be deployed at some client's premises, and your code is connected to a database (which I suppose is also under the client's supervision); connecting to it requires a password. This password is known to that client, so trying to hide it from the client is rather useless. What you do want is to restrict access to that password from anybody who is not supposed to know it.

You typically achieve this by putting the sensitive information in a separate file in a folder that has very restrictive permissions: only the application and a handful of selected people should have access. The application would then access the information when needed at runtime.
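A rough C++17 sketch of the reading side is shown below. The function name `read_secret` and the policy (refuse the file if group or others have any permission bit set) are assumptions made for this example. The check relies on POSIX-style permission bits, so it is of limited use on Windows, where access control lists would have to be inspected instead.

```cpp
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Returns the first line of `file`, or std::nullopt if the file is missing,
// is not a regular file, or is accessible by group or others.
std::optional<std::string> read_secret(const fs::path& file) {
    std::error_code ec;
    const fs::file_status st = fs::status(file, ec);
    if (ec || !fs::is_regular_file(st)) {
        return std::nullopt;
    }
    const fs::perms too_open = fs::perms::group_all | fs::perms::others_all;
    if ((st.permissions() & too_open) != fs::perms::none) {
        return std::nullopt;  // refuse to use a secret that others could read
    }
    std::ifstream in(file);
    std::string secret;
    if (!std::getline(in, secret)) {
        return std::nullopt;
    }
    return secret;
}
```

Failing closed when the permissions are too loose turns a misconfigured deployment into an immediate, visible error instead of a silent exposure.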

Additionally encrypting the separate file turns out to be a problem - if you do so then there is a key involved that again would have to be secured somehow - infinite recursion is on its way :) Securing access to the file is often sufficient, but if you really need to be as secure as possible, then a solution is to use password-based encryption for the file. The idea here is not to store the password in yet another location on the system, but rather to keep it as out-of-band information (e.g. in a physical vault) and enter it when starting the application. This, too, has its problems: physical presence of a person is required for (re-)starting the application, and you could still retrieve the password from the RAM of the machine on which the application is running. But it is probably the best you can do without specialized hardware.

Another good alternative to password-based encryption would be to rely on OS-specific "password vaults" such as Windows' Isolated Storage; it's sort of a trade-off between not encrypting at all and keeping the password out-of-band.

Compile time string encryption using constexpr

Here's how I would do it:

1.) Use the str_const template for constexpr string manipulation described here: Conveniently Declaring Compile-Time Strings in C++

Code:

#include <cstddef>
#include <stdexcept>

class str_const {
// constexpr string
private:
const char * const p_;
const std::size_t sz_;

public:
template <std::size_t N>
constexpr str_const(const char(&a)[N])
: p_(a)
, sz_(N - 1)
{}

constexpr char operator[](std::size_t n) const { return n < sz_ ? p_[n] : throw std::out_of_range(""); }
constexpr std::size_t size() const { return sz_; }

constexpr const char * get() const { return p_; }
};

This lets you do things like str_const message = "Invalid license" and manipulate message in constexpr functions.
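As a small usage sketch, the snippet below repeats a condensed copy of str_const so that it compiles on its own, and adds a hypothetical helper `count_char` (not part of the linked article) written recursively so that it stays valid C++11 constexpr.

```cpp
#include <cstddef>
#include <stdexcept>

// Condensed copy of the str_const class above, repeated so this compiles alone.
class str_const {
    const char* const p_;
    const std::size_t sz_;
public:
    template <std::size_t N>
    constexpr str_const(const char (&a)[N]) : p_(a), sz_(N - 1) {}
    constexpr char operator[](std::size_t n) const {
        return n < sz_ ? p_[n] : throw std::out_of_range("");
    }
    constexpr std::size_t size() const { return sz_; }
};

// Hypothetical helper: counts how often c occurs in s, at compile time.
constexpr std::size_t count_char(str_const s, char c, std::size_t i = 0) {
    return i == s.size() ? 0 : (s[i] == c ? 1 : 0) + count_char(s, c, i + 1);
}

constexpr str_const message = "Invalid license";
static_assert(message.size() == 15, "length excludes the terminator");
static_assert(message[0] == 'I', "indexing works at compile time");
static_assert(count_char(message, 'e') == 2, "evaluated by the compiler");
```

Because every check is a static_assert, a wrong result would stop the build rather than show up at run time.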

2.) Make a simple compile-time pseudorandom generator, using the macros __TIME__ and __LINE__ to generate the seed. This is described in detail here: Generate random numbers in C++ at compile time
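A minimal constexpr sketch of such a generator is given below. It is not the template-based code from the linked answer: the names `seconds_of_day`, `lehmer_next` and `COMPILE_TIME_SEED` are made up for this illustration, and the generator is a plain Lehmer step with multiplier 16807 and modulus 2^31 - 1. It is pseudorandom, not cryptographically secure.

```cpp
#include <cstdint>

// Converts an "hh:mm:ss" string such as __TIME__ into seconds since midnight.
constexpr std::uint32_t seconds_of_day(const char (&t)[9]) {
    return static_cast<std::uint32_t>(
        (t[0] - '0') * 36000 + (t[1] - '0') * 3600 +
        (t[3] - '0') * 600 + (t[4] - '0') * 60 +
        (t[6] - '0') * 10 + (t[7] - '0'));
}

// One step of a Lehmer generator: state * 16807 mod (2^31 - 1).
constexpr std::uint32_t lehmer_next(std::uint32_t state) {
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(state) * 16807u) % 2147483647u);
}

// The seed changes with the build time and with the source line it is used on.
// __LINE__ is at least 1, so the sum is not zero when the time is 00:00:00.
#define COMPILE_TIME_SEED \
    (seconds_of_day(__TIME__) + static_cast<std::uint32_t>(__LINE__) * 100000u)

// Example: a value fixed at compile time that differs between builds.
constexpr std::uint32_t build_random = lehmer_next(COMPILE_TIME_SEED);
```

Since the seed depends on __TIME__, two builds of the same source generally produce different values, which also means the builds are not bit-for-bit reproducible.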

They give some template-based code.

3.) Make a struct with a constexpr ctor which either takes a const char [] and templates itself against the size, similarly to the str_const example, or just takes a str_const, and which generates two str_const objects that are its member variables.

  • A str_const of length n containing pseudorandom unsigned chars, generated using the pseudorandom generator, where n is the length of the input. (the "noise string")
  • A str_const of length n containing the entry-wise sum (as unsigned chars) of the input characters with the noise characters. (the "cipher text")

Then it has a member function decrypt which need not be constexpr, and can return a std::string, which simply subtracts each character of the noise string from the corresponding character of the cipher text and returns the resulting string.

If your compiler is still storing the original string literal in the binary, it means that either it's storing the input string literal (the constructor argument), which I don't think it should be doing since it's a temporary, or it's basically inlining the decrypt function, and you should be able to prevent that by obfuscating it with function pointers, or marking it volatile or similar.

Edit: I'm not sure if the standard requires that temporary constexpr objects should not appear in the binary. Actually I'm curious about that now. My expectation is that at least in a release build, a good compiler should remove them when they are no longer needed.

Edit: So, you already accepted my answer. But anyways for completeness, here's some source code that implements the above ideas, using only C++11 standard. It works on gcc-4.9 and clang-3.6, even when optimizations are disabled, as nearly as I can tell.

#include <array>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>

typedef uint32_t u32;
typedef uint64_t u64;
typedef unsigned char uchar;

template<u32 S, u32 A = 16807UL, u32 C = 0UL, u32 M = (1UL<<31)-1>
struct LinearGenerator {
static const u32 state = ((u64)S * A + C) % M;
static const u32 value = state;
typedef LinearGenerator<state> next;
struct Split { // Leapfrog
typedef LinearGenerator< state, A*A, 0, M> Gen1;
typedef LinearGenerator<next::state, A*A, 0, M> Gen2;
};
};

// Metafunction to get a particular index from generator
template<u32 S, std::size_t index>
struct Generate {
static const uchar value = Generate<LinearGenerator<S>::state, index - 1>::value;
};

template<u32 S>
struct Generate<S, 0> {
static const uchar value = static_cast<uchar> (LinearGenerator<S>::value);
};

// List of indices
template<std::size_t...>
struct StList {};

// Concatenate
template<typename TL, typename TR>
struct Concat;

template<std::size_t... SL, std::size_t... SR>
struct Concat<StList<SL...>, StList<SR...>> {
typedef StList<SL..., SR...> type;
};

template<typename TL, typename TR>
using Concat_t = typename Concat<TL, TR>::type;

// Count from zero to n-1
template<size_t s>
struct Count {
typedef Concat_t<typename Count<s-1>::type, StList<s-1>> type;
};

template<>
struct Count<0> {
typedef StList<> type;
};

template<size_t s>
using Count_t = typename Count<s>::type;

// Get a scrambled character of a string
template<u32 seed, std::size_t index, std::size_t N>
constexpr uchar get_scrambled_char(const char(&a)[N]) {
return static_cast<uchar>(a[index]) + Generate<seed, index>::value;
}

// Get a ciphertext from a plaintext string
template<u32 seed, typename T>
struct cipher_helper;

template<u32 seed, std::size_t... SL>
struct cipher_helper<seed, StList<SL...>> {
static constexpr std::array<uchar, sizeof...(SL)> get_array(const char (&a)[sizeof...(SL)]) {
return {{ get_scrambled_char<seed, SL>(a)... }};
}
};

template<u32 seed, std::size_t N>
constexpr std::array<uchar, N> get_cipher_text (const char (&a)[N]) {
return cipher_helper<seed, Count_t<N>>::get_array(a);
}

// Get a noise sequence from a seed and string length
template<u32 seed, typename T>
struct noise_helper;

template<u32 seed, std::size_t... SL>
struct noise_helper<seed, StList<SL...>> {
static constexpr std::array<uchar, sizeof...(SL)> get_array() {
return {{ Generate<seed, SL>::value ... }};
}
};

template<u32 seed, std::size_t N>
constexpr std::array<uchar, N> get_key() {
return noise_helper<seed, Count_t<N>>::get_array();
}

/*
// Get an unscrambled character of a string
template<u32 seed, std::size_t index, std::size_t N>
char get_unscrambled_char(const std::array<uchar, N> & a) {
return static_cast<char> (a[index] - Generate<seed, index>::value);
}
*/

// Metafunction to get the size of an array
template<typename T>
struct array_info;

template <typename T, size_t N>
struct array_info<T[N]>
{
typedef T type;
enum { size = N };
};

template <typename T, size_t N>
struct array_info<const T(&)[N]> : array_info<T[N]> {};

// Scramble a string
template<u32 seed, std::size_t N>
class obfuscated_string {
private:
std::array<uchar, N> cipher_text_;
std::array<uchar, N> key_;
public:
explicit constexpr obfuscated_string(const char(&a)[N])
: cipher_text_(get_cipher_text<seed, N>(a))
, key_(get_key<seed,N>())
{}

operator std::string() const {
char plain_text[N];
for (volatile std::size_t i = 0; i < N; ++i) {
volatile char temp = static_cast<char>( cipher_text_[i] - key_[i] );
plain_text[i] = temp;
}
return std::string{plain_text, plain_text + (N - 1)};///We do not copy the termination character
}
};

template<u32 seed, std::size_t N>
std::ostream & operator<< (std::ostream & s, const obfuscated_string<seed, N> & str) {
s << static_cast<std::string>(str);
return s;
}

#define RNG_SEED ((__TIME__[7] - '0') * 1 + (__TIME__[6] - '0') * 10 + \
(__TIME__[4] - '0') * 60 + (__TIME__[3] - '0') * 600 + \
(__TIME__[1] - '0') * 3600 + (__TIME__[0] - '0') * 36000) + \
(__LINE__ * 100000)

#define LIT(STR) \
obfuscated_string<RNG_SEED, array_info<decltype(STR)>::size>{STR}

auto S2 = LIT(("Hewwo, I'm hunting wabbits"));

int main() {
constexpr auto S1 = LIT(("What's up doc"));
std::cout << S1 << std::endl;
std::cout << S2 << std::endl;
}

