vector <unsigned char> vs string for binary data
You should prefer std::vector<unsigned char> over std::string. In common cases the two solutions are almost equivalent, but std::string is designed specifically for strings and string manipulation, and that is not your intended use.
Can I safely use std::string for binary data in C++11?
The conversion static_cast<char>(uc), where uc is of type unsigned char, is always valid: according to 3.9.1 [basic.fundamental], char, signed char, and unsigned char have the same object representation, and char takes the same values as one of the other two types:
Objects declared as characters (char) shall be large enough to store any member of the implementation’s basic character set. If a character from this set is stored in a character object, the integral value of that character object is equal to the value of the single character literal form of that character. It is implementation-defined whether a char object can hold negative values. Characters can be explicitly declared unsigned or signed. Plain char, signed char, and unsigned char are three distinct types, collectively called narrow character types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (3.11); that is, they have the same object representation. For narrow character types, all bits of the object representation participate in the value representation. For unsigned narrow character types, all possible bit patterns of the value representation represent numbers. These requirements do not hold for other types. In any particular implementation, a plain char object can take on either the same
values as a signed char or an unsigned char; which one is implementation-defined.
Converting values outside the range of unsigned char to char will, of course, be problematic: where char is signed, the result of an out-of-range conversion is implementation-defined before C++20. That is, as long as you don't try to store funny values into the std::string, you'd be OK. With respect to bit patterns, you can rely on the nth bit translating into 2^n. There shouldn't be a problem storing binary data in a std::string when it is processed carefully.
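As a rough illustration of what "processed carefully" can mean, the sketch below stores bytes in a std::string using the pointer-plus-length constructor, so that embedded zero bytes survive. The helper names make_blob and to_bytes are hypothetical and not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copy n raw bytes into a std::string.
// The (pointer, length) constructor does not stop at a zero byte.
inline std::string make_blob(const unsigned char* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    return std::string(reinterpret_cast<const char*>(data), n);
}

// Hypothetical helper: get the bytes back as unsigned values (0..255).
inline std::vector<unsigned char> to_bytes(const std::string& s) {
    return std::vector<unsigned char>(s.begin(), s.end());
}
```

Constructing the string from the pointer alone would treat the data as a C string and stop at the first zero byte, which is one of the traps that careful processing has to avoid.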
That said, I don't buy into your premise: processing binary data mostly requires dealing with bytes, which are best manipulated using unsigned values. The few cases where you'd need to convert between char* and unsigned char* produce convenient compiler errors unless the conversion is made explicit, while accidentally messing up the use of char will be silent! That is, dealing with unsigned char will prevent errors. I also don't buy into the premise that you get all those nice string functions: for one, you are generally better off using the algorithms anyway, but also binary data is not string data. In summary: the recommendation for std::vector<unsigned char> isn't just coming out of thin air! It is deliberate, to avoid building hard-to-find traps into the design!
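A small sketch of the kind of silent error meant here, assuming a platform where plain char is signed (this is implementation-defined). The function names are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Widening a plain char: where char is signed, a byte >= 0x80 turns negative.
inline int first_as_int(const std::string& s) {
    assert(!s.empty());
    return s[0];
}

// Widening an unsigned char: the result is always in 0..255.
inline int first_as_int(const std::vector<unsigned char>& v) {
    assert(!v.empty());
    return v[0];
}

// The explicit cast a std::string user has to remember at every such spot.
inline int first_as_byte(const std::string& s) {
    assert(!s.empty());
    return static_cast<unsigned char>(s[0]);
}
```

With the byte 0xFF, the vector overload yields 255 everywhere, while the std::string overload yields -1 where char is signed. Nothing warns about the difference, which then shows up as a wrong table index or a failed comparison.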
The only mildly reasonable argument in favor of using char could be the one about string literals, but even that doesn't hold water with the user-defined string literals introduced in C++11:
#include <cstddef>

// User-defined literal: view a string literal as unsigned bytes.
unsigned char const* operator""_u (char const* s, std::size_t)
{
    return reinterpret_cast<unsigned char const*>(s);
}

unsigned char const* hello = "hello"_u;
C++ strings vs vector<char>
I'd use vector<char> only if I explicitly intend to store an array of char values that is not a string. E.g. if for some reason I collected all the characters used somewhere in a specific text, the result might be a vector<char>.
To be clear: it is all about expressing the intent.
Vector<unsigned char> Transformation to String and Back
The right way to do this is to use base64 encoding and decoding.
It's commonly used and will work without blob_binding.
The drawback is a performance penalty.
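For illustration, here is a minimal base64 sketch using the standard alphabet with '=' padding. The function names base64_encode and base64_decode are hypothetical, and a maintained library would usually be a better choice in production code. Every 3 input bytes become 4 output characters, so the encoded form is about a third larger, which is part of the penalty mentioned above.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

static const std::string kAlphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode bytes as base64 text.
std::string base64_encode(const std::vector<unsigned char>& in) {
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);
    std::size_t i = 0;
    // Full 3-byte groups become four 6-bit symbols.
    for (; i + 2 < in.size(); i += 3) {
        std::uint32_t n = (std::uint32_t(in[i]) << 16) |
                          (std::uint32_t(in[i + 1]) << 8) | in[i + 2];
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }
    // A trailing group of 1 or 2 bytes is padded with '='.
    if (i < in.size()) {
        const bool two = (i + 1 < in.size());
        std::uint32_t n = std::uint32_t(in[i]) << 16;
        if (two) n |= std::uint32_t(in[i + 1]) << 8;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += two ? kAlphabet[(n >> 6) & 63] : '=';
        out += '=';
    }
    return out;
}

// Decode base64 text; throws on characters outside the alphabet.
std::vector<unsigned char> base64_decode(const std::string& in) {
    std::vector<unsigned char> out;
    std::uint32_t buffer = 0;
    int bits = 0;
    for (char c : in) {
        if (c == '=') break;  // padding marks the end of the data
        const std::string::size_type pos = kAlphabet.find(c);
        if (pos == std::string::npos)
            throw std::invalid_argument("base64_decode: invalid character");
        buffer = (buffer << 6) | static_cast<std::uint32_t>(pos);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<unsigned char>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}
```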
Simplest way to read binary data from a std::vector<unsigned char>?
You have access to the data in a vector through its operator[]. A vector's data is guaranteed to be stored in a single contiguous array, and [] returns a reference to a member of that array. You may use that reference directly, or through a memcpy.
std::vector<unsigned char> v;
...
byteField = v[12];
memcpy(&intField, &v[13], sizeof intField);
memcpy(charArray, &v[20], lengthOfCharArray);
EDIT 1:
If you want something "more convenient" than that, you could try:
template <class T>
void ReadFromVector(T& t, std::size_t offset,
                    const std::vector<unsigned char>& v) {
    memcpy(&t, &v[offset], sizeof(T));
}
Usage would be:
std::vector<unsigned char> v;
...
char c;
int i;
uint64_t ull;
ReadFromVector(c, 17, v);
ReadFromVector(i, 99, v);
ReadFromVector(ull, 43, v);
EDIT 2:
#include <cstddef>
#include <cstring>
#include <vector>

struct Reader {
    const std::vector<unsigned char>& v;
    std::size_t offset;
    Reader(const std::vector<unsigned char>& v) : v(v), offset() {}
    // Copy sizeof(T) bytes at the current offset into t, then advance.
    template <class T>
    Reader& operator>>(T& t) {
        memcpy(&t, &v[offset], sizeof t);
        offset += sizeof t;
        return *this;
    }
    void operator+=(int i) { offset += i; }
    // The vector is const, so hand out a pointer to const char.
    const char* getStringPointer() const {
        return reinterpret_cast<const char*>(&v[offset]);
    }
};
Usage:
std::vector<unsigned char> v;
Reader r(v);
int i; uint64_t ull;
r >> i >> ull;
const char* companyName = r.getStringPointer();
r += strlen(companyName);
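None of the memcpy-based readers above check that the requested bytes actually exist in the vector. A bounds-checked variant might look like the sketch below. ReadChecked is a hypothetical name, and the choice to throw std::out_of_range is an assumption rather than part of the original answer. Like the code above, it copies in host byte order, so multi-byte values depend on the machine's endianness.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Read a T from v at the given byte offset, or throw if it does not fit.
template <class T>
T ReadChecked(const std::vector<unsigned char>& v, std::size_t offset) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only safe for trivially copyable types");
    // Written this way so the size arithmetic cannot overflow.
    if (offset > v.size() || sizeof(T) > v.size() - offset)
        throw std::out_of_range("ReadChecked: not enough bytes");
    T t;
    std::memcpy(&t, v.data() + offset, sizeof(T));
    return t;
}
```

Usage would be, for example, `int i = ReadChecked<int>(v, 99);`.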
Write vector<unsigned char> in binary mode
e, P, R and f are bytes. The file is 4 characters (bytes) long and it contains what you put there.
The only difference between a "binary file" and a "text file" is how line breaks are read and written on Windows (and maybe some other special characters on very old OSes). The only difference between the character e and the number 0x65 (101 in decimal) is how the program that you're using to read the file chooses to display it. A text editor will display e and a hex editor will display 65.
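As a sketch of writing and reading such a file, the helpers below open the streams with std::ios::binary so that no line-break translation happens. The names write_bytes and read_bytes are hypothetical.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Write the bytes unchanged; returns false if the file could not be written.
bool write_bytes(const std::string& path,
                 const std::vector<unsigned char>& data) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size()));
    return static_cast<bool>(out);
}

// Read the whole file back; returns an empty vector if it cannot be opened.
std::vector<unsigned char> read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

Writing the four bytes e, P, R, f this way produces a 4-byte file. A text editor shows it as "ePRf", while a hex editor shows 65 50 52 66.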