Vector <Unsigned Char> VS String for Binary Data

vector <unsigned char> vs string for binary data

You should prefer std::vector<unsigned char> over std::string. In common cases the two can be almost equivalent, but std::string is designed specifically for strings and string manipulation, and that is not your intended use.

Can I safely use std::string for binary data in C++11?

The conversion static_cast<char>(uc), where uc is of type unsigned char, is always valid: according to 3.9.1 [basic.fundamental], char, signed char, and unsigned char have the same object representation, and char takes the same values as one of the other two types:

Objects declared as characters (char) shall be large enough to store any member of the implementation’s basic character set. If a character from this set is stored in a character object, the integral value of that character object is equal to the value of the single character literal form of that character. It is implementation-defined whether a char object can hold negative values. Characters can be explicitly declared unsigned or signed. Plain char, signed char, and unsigned char are three distinct types, collectively called narrow character types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (3.11); that is, they have the same object representation. For narrow character types, all bits of the object representation participate in the value representation. For unsigned narrow character types, all possible bit patterns of the value representation represent numbers. These requirements do not hold for other types. In any particular implementation, a plain char object can take on either the same values as a signed char or an unsigned char; which one is implementation-defined.

Converting values outside the range of unsigned char to char will, of course, be problematic, since the result is not portable (before C++20, an out-of-range conversion to a signed type yields an implementation-defined value). That is, as long as you don't try to store funny values into the std::string, you'd be OK. With respect to bit patterns, you can rely on the nth bit being translated into 2ⁿ. There shouldn't be a problem storing binary data in a std::string when it is processed carefully.
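
As a rough sketch of what careful processing can look like, the two helpers below copy bytes into a std::string and back with explicit casts. The names to_string_bytes and to_byte_vector are hypothetical and not part of the answer above. On two's-complement implementations, and by guarantee since C++20, every value from 0 to 255 should survive the round trip, including zero bytes:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copy raw bytes into a std::string, one char per byte.
std::string to_string_bytes(const std::vector<unsigned char>& bytes) {
    std::string out;
    out.reserve(bytes.size());
    for (unsigned char uc : bytes) {
        out.push_back(static_cast<char>(uc));
    }
    return out;
}

// Hypothetical helper: recover the unsigned byte values from the string.
std::vector<unsigned char> to_byte_vector(const std::string& s) {
    std::vector<unsigned char> out;
    out.reserve(s.size());
    for (char c : s) {
        out.push_back(static_cast<unsigned char>(c));
    }
    return out;
}
```

Note that s.size() is the byte count. Anything that treats the data as a C string, such as passing s.c_str() to strlen, stops at the first zero byte.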

That said, I don't buy into your premise: processing binary data mostly requires dealing with bytes, which are best manipulated using unsigned values. The few cases where you'd need to convert between char* and unsigned char* produce helpful compile-time errors when the conversion is not made explicit, while accidentally misusing char will be silent! That is, dealing with unsigned char will prevent errors. I also don't buy into the premise that you get all those nice string functions: for one, you are generally better off using the algorithms anyway, and besides, binary data is not string data. In summary: the recommendation for std::vector<unsigned char> isn't just coming out of thin air! It is deliberate, to avoid building hard-to-find traps into the design!
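
To make the point about the algorithms concrete, here is a hedged sketch of a find-style search over raw bytes using std::search instead of std::string::find. The helper name find_bytes is hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: offset of the first occurrence of `pattern` in `data`,
// or data.size() if the pattern does not occur.
std::size_t find_bytes(const std::vector<unsigned char>& data,
                       const std::vector<unsigned char>& pattern) {
    auto it = std::search(data.begin(), data.end(),
                          pattern.begin(), pattern.end());
    return static_cast<std::size_t>(it - data.begin());
}
```

Zero bytes need no special treatment here, because the search works on iterator ranges rather than on terminated strings.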

The only mildly reasonable argument in favor of using char could be the one about string literals, but even that doesn't hold water given the user-defined string literals introduced in C++11:

#include <cstddef>

// User-defined literal: view a string literal's characters as unsigned bytes.
unsigned char const* operator""_u (char const* s, std::size_t)
{
    return reinterpret_cast<unsigned char const*>(s);
}

unsigned char const* hello = "hello"_u;
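
One limitation of the literal above is that it discards the length, so the caller has to rely on the terminating zero. A hedged alternative keeps the length by returning a std::vector<unsigned char>. The suffix _bytes is a made-up name for illustration only:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical literal suffix: copies the literal's n characters (without the
// terminating '\0') into a vector of bytes.
std::vector<unsigned char> operator""_bytes(char const* s, std::size_t n)
{
    return std::vector<unsigned char>(s, s + n);
}
```

With this, "a\0b"_bytes yields a vector of three bytes, including the embedded zero.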

C++ strings vs vector<char>

I'd use vector<char> only if I explicitly intend to store an array of char values that is not a string. E.g. if for some reason I'd collect all the characters used somewhere in a specific text, the result might be a vector<char>.

To be clear: it is all about expressing the intent.

Vector<unsigned char> Transformation to String and Back

The right way to do this is to use base64 encoding and decoding.

It's commonly used and will work without blob_binding.

The drawback is a performance penalty: encoding and decoding cost time, and the encoded text is about one third larger than the raw data.
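
For illustration, here is a small self-contained sketch of the base64 round trip, assuming the standard alphabet with '=' padding. The names base64_encode and base64_decode are hypothetical; in real code you would more likely use an existing, tested library.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Standard base64 alphabet; the array index is the 6-bit value.
static const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encode raw bytes as padded base64 text.
std::string base64_encode(const std::vector<unsigned char>& in) {
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);
    for (std::size_t i = 0; i < in.size(); i += 3) {
        const std::size_t left = in.size() - i;  // bytes left, at least 1
        // Pack up to three bytes into a 24-bit group, zero-filling the rest.
        unsigned long n = static_cast<unsigned long>(in[i]) << 16;
        if (left > 1) n |= static_cast<unsigned long>(in[i + 1]) << 8;
        if (left > 2) n |= static_cast<unsigned long>(in[i + 2]);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += left > 1 ? kAlphabet[(n >> 6) & 63] : '=';
        out += left > 2 ? kAlphabet[n & 63] : '=';
    }
    return out;
}

// Hypothetical helper: decode padded base64 text; throws on malformed input.
std::vector<unsigned char> base64_decode(const std::string& in) {
    if (in.size() % 4 != 0) {
        throw std::invalid_argument("base64 length must be a multiple of 4");
    }
    std::vector<unsigned char> out;
    out.reserve(in.size() / 4 * 3);
    for (std::size_t i = 0; i < in.size(); i += 4) {
        unsigned long n = 0;
        int pad = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            const char c = in[i + j];
            n <<= 6;
            if (c == '=') {
                // Padding may only end the final group.
                if (i + 4 != in.size() || j < 2) {
                    throw std::invalid_argument("misplaced base64 padding");
                }
                ++pad;
            } else {
                const char* p = std::find(kAlphabet, kAlphabet + 64, c);
                if (p == kAlphabet + 64 || pad > 0) {
                    throw std::invalid_argument("invalid base64 input");
                }
                n |= static_cast<unsigned long>(p - kAlphabet);
            }
        }
        // Each padding character means one byte fewer in this group.
        out.push_back(static_cast<unsigned char>((n >> 16) & 0xFF));
        if (pad < 2) out.push_back(static_cast<unsigned char>((n >> 8) & 0xFF));
        if (pad < 1) out.push_back(static_cast<unsigned char>(n & 0xFF));
    }
    return out;
}
```

Every three input bytes become four output characters, which is where the size overhead comes from.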

Simplest way to read binary data from a std::vector<unsigned char>?

You have access to the data in a vector through its operator[]. A vector's data is guaranteed to be stored in a single contiguous array, and [] returns a reference to a member of that array. You may use that reference directly, or through a memcpy.

std::vector<unsigned char> v;
...
byteField = v[12];
memcpy(&intField, &v[13], sizeof intField);
memcpy(charArray, &v[20], lengthOfCharArray);

EDIT 1:
If you want something "more convenient" than that, you could try:

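#include <cstring>
#include <vector>

// Copy sizeof(T) bytes starting at v[offset] into t. No bounds checking.
template <class T>
void ReadFromVector(T& t, std::size_t offset,
  const std::vector<unsigned char>& v) {
  std::memcpy(&t, &v[offset], sizeof(T));
}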

Usage would be:

std::vector<unsigned char> v;
...
char c;
int i;
uint64_t ull;
ReadFromVector(c, 17, v);
ReadFromVector(i, 99, v);
ReadFromVector(ull, 43, v);
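
The helper above trusts the caller's offset, so a bad offset reads past the end of the vector. A hedged variant with bounds checking might look like the following sketch. The name ReadFromVectorChecked and the choice to throw std::out_of_range are assumptions for illustration, not part of the original answer:

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical bounds-checked variant of ReadFromVector.
template <class T>
void ReadFromVectorChecked(T& t, std::size_t offset,
                           const std::vector<unsigned char>& v) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    // Phrased this way so that offset + sizeof(T) cannot overflow.
    if (offset > v.size() || sizeof(T) > v.size() - offset) {
        throw std::out_of_range("read past end of buffer");
    }
    std::memcpy(&t, v.data() + offset, sizeof(T));
}
```

Like the unchecked version, this copies the bytes in the machine's native byte order, so data written on a machine with a different endianness still needs byte swapping.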

EDIT 2:

struct Reader {
  const std::vector<unsigned char>& v;
  std::size_t offset;
  Reader(const std::vector<unsigned char>& v) : v(v), offset() {}
  // Copy the next sizeof(T) bytes into t and advance past them.
  template <class T>
  Reader& operator>>(T& t) {
    memcpy(&t, &v[offset], sizeof t);
    offset += sizeof t;
    return *this;
  }
  // Skip i bytes.
  void operator+=(int i) { offset += i; }
  // The vector is const, so the bytes are exposed as a pointer to const char.
  const char *getStringPointer() const {
    return reinterpret_cast<const char*>(&v[offset]);
  }
};

Usage:

std::vector<unsigned char> v;
Reader r(v);
int i; uint64_t ull;
r >> i >> ull;
const char *companyName = r.getStringPointer();
r += strlen(companyName);

Write vector<unsigned char> in binary mode

e, P, R and f are bytes. The file is 4 characters (bytes) long and it contains what you put there.

The only difference between a "binary file" and a "text file" is how line breaks are read and written on Windows (and maybe some other special characters on very old OSes). The only difference between the character e and the number 0x65 (101 in decimal) is how the program that you're using to read the file chooses to display it. A text editor will display e and a hex editor will display 65.
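
A minimal sketch of writing a std::vector<unsigned char> in binary mode and reading it back, assuming the hypothetical helper names write_bytes and read_bytes and a file path chosen by the caller. std::ios::binary only turns off the line-break translation mentioned above; the bytes themselves are written unchanged either way.

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: write the bytes as-is; returns false on failure.
bool write_bytes(const std::string& path,
                 const std::vector<unsigned char>& bytes) {
    std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
    if (!out) return false;
    if (!bytes.empty()) {
        out.write(reinterpret_cast<const char*>(&bytes[0]),
                  static_cast<std::streamsize>(bytes.size()));
    }
    out.flush();
    return out.good();
}

// Hypothetical helper: read the whole file; empty if it cannot be opened.
std::vector<unsigned char> read_bytes(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

Writing the four bytes e, P, R and f this way produces a 4-byte file, and reading it back returns the same four values.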
