reinterpret_cast between char* and std::uint8_t* - safe?
Ok, let's get truly pedantic. After reading this, this and this, I'm pretty confident that I understand the intention behind both Standards.
So, doing reinterpret_cast from std::uint8_t* to char* and then dereferencing the resulting pointer is safe and portable, and is explicitly permitted by [basic.lval].
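As a minimal sketch of the permitted direction, the hypothetical helper below (bytes_to_string is not from the original answer) views a std::uint8_t buffer through a const char* and copies it into a std::string:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: copies a buffer of std::uint8_t into a std::string.
// The bytes are read through a char glvalue, which is the direction that
// [basic.lval] allows.
std::string bytes_to_string(const std::uint8_t* data, std::size_t size) {
    const char* chars = reinterpret_cast<const char*>(data);
    return std::string(chars, size);
}
```

Calling bytes_to_string(raw, 3) on a three-element std::uint8_t array yields a string of length 3 holding the same byte values.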
However, doing reinterpret_cast from char* to std::uint8_t* and then dereferencing the resulting pointer is a violation of the strict aliasing rule and is undefined behavior if std::uint8_t is implemented as an extended unsigned integer type.
There are two possible workarounds. First:
    static_assert(std::is_same_v<std::uint8_t, char> ||
                  std::is_same_v<std::uint8_t, unsigned char>,
                  "This library requires std::uint8_t to be implemented as char or unsigned char.");
With this assert in place, your code will not compile on platforms on which it would result in undefined behavior otherwise.
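A short sketch of how the assert might guard an actual cast. The helper name first_byte is hypothetical, and the check is spelled with std::is_same<...>::value so that it also compiles before C++17:

```cpp
#include <cstdint>
#include <type_traits>

// Refuse to compile on platforms where the cast below would alias an
// extended integer type.
static_assert(std::is_same<std::uint8_t, char>::value ||
              std::is_same<std::uint8_t, unsigned char>::value,
              "std::uint8_t must be char or unsigned char");

// Hypothetical helper: reads the first byte of a char buffer as std::uint8_t.
// Only well-defined because of the static_assert above.
std::uint8_t first_byte(const char* buffer) {
    return *reinterpret_cast<const std::uint8_t*>(buffer);
}
```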
Second:
std::memcpy(uint8buffer, charbuffer, size);
Cppreference says that std::memcpy accesses objects as arrays of unsigned char, so it is safe and portable.
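A minimal sketch of the memcpy route, assuming the destination is a std::vector<std::uint8_t>. The helper name to_uint8 is hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `size` bytes from a char buffer into a
// freshly allocated vector of std::uint8_t, with no pointer cast involved.
std::vector<std::uint8_t> to_uint8(const char* src, std::size_t size) {
    std::vector<std::uint8_t> dst(size);
    if (size != 0) {
        // memcpy copies the underlying bytes, so no uint8_t lvalue ever
        // refers to a char object.
        std::memcpy(dst.data(), src, size);
    }
    return dst;
}
```

This costs a copy, but it sidesteps the aliasing question entirely.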
To reiterate, in order to be able to reinterpret_cast between char* and std::uint8_t* and work with the resulting pointers portably and safely in a 100% standard-conforming way, the following conditions must be true:
- CHAR_BIT == 8.
- std::uint8_t is defined.
- std::uint8_t is implemented as char or unsigned char.
On a practical note, the above conditions are true on 99% of platforms and there is likely no platform on which the first 2 conditions are true while the 3rd one is false.
How to work with uint8_t instead of char?
The general, portable, roundtrip-correct way would be to:
- demand in your API that all byte values can be expressed with at most 8 bits,
- use the layout-compatibility of char, signed char and unsigned char for I/O, and
- convert unsigned char to uint8_t as needed.
For example:
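    #include <cstdint>
    #include <istream>
    #include <ostream>

    bool read_one_byte(std::istream & is, std::uint8_t * out)
    {
        unsigned char x;    // a "byte" on your system
        // istream::get(char&) takes a reference, so cast the lvalue itself
        if (is.get(reinterpret_cast<char &>(x)))
        {
            *out = x;
            return true;
        }
        return false;
    }

    bool write_one_byte(std::ostream & os, std::uint8_t val)
    {
        unsigned char x = val;
        // the stream's operator bool is explicit since C++11
        return static_cast<bool>(os.write(reinterpret_cast<char const *>(&x), 1));
    }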
Some explanation: Rule 1 guarantees that values can be round-trip converted between uint8_t and unsigned char without losing information. Rule 2 means that we can use the iostream I/O operations on unsigned char variables, even though they're expressed in terms of chars.
We could also have used is.read(reinterpret_cast<char *>(&x), 1) instead of is.get() for symmetry. (Using read in general, for stream counts larger than 1, also requires the use of gcount() on error, but that doesn't apply here.)
As always, you must never ignore the return value of I/O operations. Doing so is always a bug in your program.
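A possible round-trip check for the two helpers, using an in-memory std::stringstream. To keep the sketch self-contained the helpers are repeated here, spelled with get(char&) and an explicit bool conversion. The function round_trips is a hypothetical addition:

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

bool read_one_byte(std::istream& is, std::uint8_t* out) {
    unsigned char x;  // a "byte" on your system
    if (is.get(reinterpret_cast<char&>(x))) {
        *out = x;
        return true;
    }
    return false;
}

bool write_one_byte(std::ostream& os, std::uint8_t val) {
    unsigned char x = val;
    return static_cast<bool>(os.write(reinterpret_cast<const char*>(&x), 1));
}

// Hypothetical check: writes `value` to an in-memory stream, reads it back,
// and reports whether both operations succeeded and the value survived.
bool round_trips(std::uint8_t value) {
    std::stringstream stream;
    std::uint8_t result = 0;
    return write_one_byte(stream, value)
        && read_one_byte(stream, &result)
        && result == value;
}
```

Note that every return value is checked: reading from an empty stream makes read_one_byte return false and leaves the output untouched.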
Is it safe to implicitly convert a `uint8_t` (read from a socket) to a `char`?
From implicit conversions - Numeric Conversion/Integral conversions:
To unsigned
If the destination type is unsigned, the resulting value is the
smallest unsigned value equal to the source value modulo 2^n where n
is the number of bits used to represent the destination type. That is,
depending on whether the destination type is wider or narrower, signed
integers are sign-extended[footnote 1] or truncated and unsigned
integers are zero-extended or truncated respectively.
To signed
If the destination type is signed, the value does not change if the
source integer can be represented in the destination type. Otherwise
the result is implementation-defined (until C++20) / the unique value
of the destination type equal to the source value modulo 2^n where n
is the number of bits used to represent the destination type (since
C++20). (Note that this is different from signed integer arithmetic
overflow, which is undefined).
So for values in range, the value does not change. For values out of range, I read it as follows: on a two's complement machine, converting to an unsigned type of the same width leaves the bits unchanged, and from C++20 the same holds for converting to signed. Before C++20 the out-of-range conversion to signed is implementation-defined. (I assume most compilers leave the bits unchanged in that case too, even though they are allowed to do otherwise.)
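A small sketch of that reading, assuming a two's complement platform with CHAR_BIT == 8. The helper through_char is hypothetical:

```cpp
#include <cstdint>

// Hypothetical helper: converts a byte to char and back.
std::uint8_t through_char(std::uint8_t value) {
    // In range: value unchanged. Out of range with a signed char:
    // implementation-defined until C++20, modulo 2^n since C++20.
    char c = static_cast<char>(value);
    // Conversion to an unsigned type is always modulo 2^n.
    return static_cast<std::uint8_t>(c);
}
```

On mainstream compilers the round trip returns the original value for every input from 0 to 255. Before C++20 that is an implementation property rather than a guarantee of the standard.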
Regarding C-style cast vs static_cast: a C-style cast performs the following (link):
When the C-style cast expression is encountered, the compiler attempts to interpret it as the following cast expressions, in this order:
a) const_cast<new_type>(expression);
b) static_cast<new_type>(expression), with extensions: pointer or reference to a derived class is additionally allowed to be cast to pointer or reference to unambiguous base class (and vice versa) even if the base class is inaccessible (that is, this cast ignores the private inheritance specifier). Same applies to casting pointer to member to pointer to member of unambiguous non-virtual base;
c) static_cast (with extensions) followed by const_cast;
d) reinterpret_cast<new_type>(expression);
e) reinterpret_cast followed by const_cast.
The first choice that satisfies the requirements of the respective cast operator is selected, even if it cannot be compiled.
So for signed<->unsigned conversions, a C-style cast should be the same as static_cast.
For implicit conversion (implicit conversions - Order of the conversions)
Implicit conversion sequence consists of the following, in this order:
- zero or one standard conversion sequence;
- zero or one user-defined conversion;
- zero or one standard conversion sequence.
where
A standard conversion sequence consists of the following, in this order:
- zero or one conversion from the following set: lvalue-to-rvalue conversion, array-to-pointer conversion, and function-to-pointer conversion;
- zero or one numeric promotion or numeric conversion;
- zero or one function pointer conversion; (since C++17)
- zero or one qualification adjustment.
and numeric conversion is yet again the conversion quoted at the top.
static_cast itself converts between types using a combination of implicit and user-defined conversions (link). So there should not be any difference between the implicit and the explicit conversion.
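To illustrate, here is a sketch that performs the same uint8_t to char conversion in all three spellings and compares the results. The helper all_spellings_agree is hypothetical:

```cpp
#include <cstdint>

// Hypothetical helper: returns true when the implicit conversion, the
// C-style cast and static_cast all produce the same char value.
bool all_spellings_agree(std::uint8_t value) {
    char implicit_result = value;                   // implicit integral conversion
    char c_style_result = (char)value;              // resolved as a static_cast
    char static_result = static_cast<char>(value);  // explicit static_cast
    return implicit_result == c_style_result
        && c_style_result == static_result;
}
```

All three go through the same integral conversion, so the function is expected to return true for every input. The explicit forms mainly document intent and silence narrowing warnings.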
Is reinterpret_cast<char*>(myTypePtr) assumed to point to an array?
auto ptr = reinterpret_cast<char*>(myTypePtr);
The standard permits this conversion, due to:
An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)). [ Note: Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. - end note ]
So, assuming myTypePtr has no cv-qualifier, the conversion is equivalent to:
auto ptr = static_cast<char*>(static_cast<void*>(myTypePtr));
And you are permitted to dereference ptr to access the value within the object (the one the pointer points to), due to:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
- a char, unsigned char, or std::byte type.
If myTypePtr does not point to an element of an array of char, then applying addition to ptr results in undefined behavior, due to:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i - j] if 0 ≤ i - j ≤ n; otherwise, the behavior is undefined.
For addition or subtraction, if the expressions P or Q have type “pointer to cv T”, where T and the array element type are not similar, the behavior is undefined.
The element type of the array that myTypePtr points into is not char, so applying addition to ptr results in undefined behavior.
Or maybe std::memcpy must be used for that purpose?
Yes, if the object to which myTypePtr points is subject to either of the following rules:
For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes ([intro.memory]) making up the object can be copied into an array of char, unsigned char, or std::byte ([cstddef.syn]). If the content of that array is copied back into the object, the object shall subsequently hold its original value.
OR
For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the underlying bytes ([intro.memory]) making up obj1 are copied into obj2, obj2 shall subsequently hold the same value as obj1.
Unfortunately, however, we cannot implement such a memcpy ourselves under the current standard.
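A minimal sketch of the first quoted rule, using std::memcpy from the standard library rather than a hand-written copy loop. The struct Sample and the helper round_trip are hypothetical stand-ins for the type behind myTypePtr:

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical trivially copyable type.
struct Sample {
    int id;
    double weight;
};

static_assert(std::is_trivially_copyable<Sample>::value,
              "the byte-copy guarantee only covers trivially copyable types");

// Hypothetical helper: copies the object's bytes into an unsigned char
// array and back into a second object, which then holds the same value.
Sample round_trip(const Sample& in) {
    unsigned char bytes[sizeof(Sample)];
    std::memcpy(bytes, &in, sizeof in);

    Sample out{};
    std::memcpy(&out, bytes, sizeof out);
    return out;
}
```

The static_assert documents the precondition: for a type that is not trivially copyable, the quoted guarantee does not apply.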