reinterpret_cast between char* and std::uint8_t* - safe?

Ok, let's get truly pedantic. After reading this, this and this, I'm pretty confident that I understand the intention behind both Standards.

So, doing reinterpret_cast from std::uint8_t* to char* and then dereferencing the resulting pointer is safe and portable and is explicitly permitted by [basic.lval].

However, doing reinterpret_cast from char* to std::uint8_t* and then dereferencing the resulting pointer is a violation of the strict aliasing rule and is undefined behavior if std::uint8_t is implemented as an extended unsigned integer type.

However, there are two possible workarounds, first:

static_assert(std::is_same_v<std::uint8_t, char> ||
              std::is_same_v<std::uint8_t, unsigned char>,
              "This library requires std::uint8_t to be implemented as char or unsigned char.");

With this assert in place, your code will not compile on platforms on which it would result in undefined behavior otherwise.
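As a minimal sketch of how the assert might guard a direct cast: the helper name `as_uint8` below is hypothetical and not part of the original answer, and the block assumes the assert is placed in the same translation unit as the cast.

```cpp
#include <cstdint>
#include <type_traits>

// Refuse to compile on platforms where std::uint8_t is not a character type.
// (std::is_same<...>::value is the pre-C++17 spelling of std::is_same_v.)
static_assert(std::is_same<std::uint8_t, char>::value ||
              std::is_same<std::uint8_t, unsigned char>::value,
              "This library requires std::uint8_t to be implemented as char or unsigned char.");

// Hypothetical helper: view a char buffer as std::uint8_t.
// The static_assert above ensures the target type is char or unsigned char,
// both of which are allowed to alias other objects.
inline const std::uint8_t* as_uint8(const char* p)
{
    return reinterpret_cast<const std::uint8_t*>(p);
}
```

A caller would then read bytes through the returned pointer, for example `as_uint8(buffer)[0]`.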

Second:

std::memcpy(uint8buffer, charbuffer, size);

Cppreference says that std::memcpy accesses objects as arrays of unsigned char so it is safe and portable.
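A short sketch of the second workaround follows. The helper name `copy_to_uint8` and the use of std::vector for the destination are assumptions made for illustration; the point is only that no pointer is cast and no char object is read through a std::uint8_t lvalue.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy a char buffer into a freshly allocated
// std::uint8_t buffer without casting any pointer.
inline std::vector<std::uint8_t> copy_to_uint8(const char* charbuffer, std::size_t size)
{
    std::vector<std::uint8_t> uint8buffer(size);
    if (size != 0) {
        // memcpy copies the underlying bytes of the source into the destination.
        std::memcpy(uint8buffer.data(), charbuffer, size);
    }
    return uint8buffer;
}
```

The cost is one extra copy of the data, which is usually acceptable for small buffers.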

To reiterate, in order to be able to reinterpret_cast between char* and std::uint8_t* and work with resulting pointers portably and safely in a 100% standard-conforming way, the following conditions must be true:

  • CHAR_BIT == 8.
  • std::uint8_t is defined.
  • std::uint8_t is implemented as char or unsigned char.

On a practical note, the above conditions are true on 99% of platforms and there is likely no platform on which the first 2 conditions are true while the 3rd one is false.

How to work with uint8_t instead of char?

The general, portable, roundtrip-correct way would be to:

  1. demand in your API that all byte values can be expressed with at most 8 bits,
  2. use the layout-compatibility of char, signed char and unsigned char for I/O, and
  3. convert unsigned char to uint8_t as needed.

For example:

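#include <cstdint>
#include <istream>
#include <ostream>

bool read_one_byte(std::istream & is, std::uint8_t * out)
{
    unsigned char x; // a "byte" on your system
    // get(char &) takes a reference, so view x as a char lvalue
    if (is.get(reinterpret_cast<char &>(x)))
    {
        *out = x;
        return true;
    }
    return false;
}

bool write_one_byte(std::ostream & os, std::uint8_t val)
{
    unsigned char x = val;
    // the stream's conversion to bool is explicit, so cast it
    return static_cast<bool>(os.write(reinterpret_cast<char const *>(&x), 1));
}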

Some explanation: Rule 1 guarantees that values can be round-trip converted between uint8_t and unsigned char without losing information. Rule 2 means that we can use the iostream I/O operations on unsigned char variables, even though they're expressed in terms of chars.

We could also have used is.read(reinterpret_cast<char *>(&x), 1) instead of is.get() for symmetry. (Using read in general, for stream counts larger than 1, also requires the use of gcount() on error, but that doesn't apply here.)

As always, you must never ignore the return value of I/O operations. Doing so is always a bug in your program.

Is it safe to implicitly convert a `uint8_t` (read from a socket) to a `char`?

From implicit conversions - Numeric Conversion/Integral conversions:

To unsigned

If the destination type is unsigned, the resulting value is the
smallest unsigned value equal to the source value modulo 2^n where n
is the number of bits used to represent the destination type. That is,
depending on whether the destination type is wider or narrower, signed
integers are sign-extended or truncated and unsigned
integers are zero-extended or truncated respectively.

To signed

If the destination type is signed, the value does not change if the
source integer can be represented in the destination type. Otherwise
the result is implementation-defined (until C++20); the unique value of
the destination type equal to the source value modulo 2^n where n is
the number of bits used to represent the destination type (since
C++20). (Note that this is different from signed integer arithmetic
overflow, which is undefined).

So for values in range, the value does not change. Otherwise, as I interpret it: if your machine represents values in two's complement, the bits do not change on conversion to unsigned (and, from C++20, on conversion to signed as well); before C++20 the out-of-range conversion to signed is implementation-defined. (I am not sure why, but I assume most compilers do not change the bits, even though they are allowed to.)
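A small sketch of the quoted rules in code. The helper names `to_signed` and `to_unsigned` are hypothetical, and signed char is used instead of char because the signedness of plain char varies between platforms.

```cpp
#include <cstdint>

// Hypothetical helpers illustrating the quoted integral-conversion rules.
inline signed char to_signed(std::uint8_t v)
{
    // In range (0..127): the value is unchanged.
    // Out of range: implementation-defined before C++20,
    // the value modulo 2^8 since C++20 (so 200 becomes -56).
    return static_cast<signed char>(v);
}

inline std::uint8_t to_unsigned(signed char c)
{
    // Conversion to unsigned is always modulo 2^8 (so -56 becomes 200).
    return static_cast<std::uint8_t>(c);
}
```

On a two's complement implementation, converting a byte to signed char and back yields the original value, which is the round trip a socket reader typically relies on.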


Regarding C-style cast vs. static_cast: a C-style cast performs the following (link):

When the C-style cast expression is encountered, the compiler
attempts to interpret it as the following cast expressions, in this
order:

a) const_cast<new_type>(expression);

b) static_cast<new_type>(expression), with extensions: pointer or
reference to a derived class is additionally allowed to be cast to
pointer or reference to unambiguous base class (and vice versa) even
if the base class is inaccessible (that is, this cast ignores the
private inheritance specifier). Same applies to casting pointer to
member to pointer to member of unambiguous non-virtual base;

c) static_cast (with extensions) followed by const_cast;

d) reinterpret_cast<new_type>(expression);

e) reinterpret_cast followed by const_cast.

The first choice that satisfies the requirements of the respective cast operator is selected, even if it cannot be compiled.

So for signed<->unsigned conversions, a C-style cast should be the same as static_cast.


For implicit conversion (implicit conversions - Order of the conversions)

Implicit conversion sequence consists of the following, in this order:

  1. zero or one standard conversion sequence;
  2. zero or one user-defined conversion;
  3. zero or one standard conversion sequence.

where

A standard conversion sequence consists of the following, in this
order:

  1. zero or one conversion from the following set: lvalue-to-rvalue
    conversion, array-to-pointer conversion, and function-to-pointer
    conversion;
  2. zero or one numeric promotion or numeric conversion;
  3. zero or one function pointer conversion; (since C++17)
  4. zero or one qualification adjustment.

and the numeric conversion is, yet again, the conversion quoted at the top.

static_cast itself converts between types using a combination of implicit and user-defined conversions (link). So there should not be any difference between an implicit and an explicit conversion here.
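To illustrate the conclusion, the sketch below converts the same value in all three ways and compares the results. The function name `conversions_agree` is hypothetical; the implicit narrowing may trigger a compiler warning, but it is well-formed.

```cpp
#include <cstdint>

// Hypothetical helper: convert the same value three ways and report
// whether the results agree.
inline bool conversions_agree(std::uint8_t v)
{
    signed char implicit = v;                              // implicit integral conversion
    signed char via_static = static_cast<signed char>(v);  // explicit static_cast
    signed char via_c_style = (signed char)v;              // C-style cast, resolved as static_cast
    return implicit == via_static && via_static == via_c_style;
}
```

All three forms apply the same integral conversion, so the function is expected to return true for every input.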

Is reinterpret_cast<char*>(myTypePtr) assumed to point to an array?

auto ptr = reinterpret_cast<char*>(myTypePtr);

The standard permits this conversion, due to:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)). [ Note: Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. — end note ]

So, assuming myTypePtr has no cv-qualifiers, the conversion is equivalent to:

auto ptr = static_cast<char*>(static_cast<void*>(myTypePtr));

And you are permitted to dereference ptr to access the stored value of the object it points to, due to:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • a char, unsigned char, or std::byte type.

If myTypePtr does not point to an element of an array of char, then applying addition to ptr results in undefined behavior, due to:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i - j] if 0 ≤ i - j ≤ n; otherwise, the behavior is undefined.

For addition or subtraction, if the expressions P or Q have type “pointer to cv T”, where T and the array element type are not similar, the behavior is undefined.

Because the object myTypePtr points to is not of type char, applying addition to ptr results in undefined behavior.



Or maybe std::memcpy must be used for that purpose?

Yes, if the object to which myTypePtr points is subject to one of the following rules:

For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes ([intro.memory]) making up the object can be copied into an array of char, unsigned char, or std::byte ([cstddef.syn]). If the content of that array is copied back into the object, the object shall subsequently hold its original value.

OR

For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the underlying bytes ([intro.memory]) making up obj1 are copied into obj2, obj2 shall subsequently hold the same value as obj1.

Unfortunately, however, we cannot implement such a memcpy ourselves under the current standard.
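The library's own std::memcpy can still be used for this purpose. The sketch below exercises the first quoted rule: it copies an object's bytes into an unsigned char array and back. The type `Sample` and the function `round_trip_through_bytes` are hypothetical names chosen for illustration.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical example type; any trivially copyable type would do.
struct Sample
{
    int id;
    double value;
};

static_assert(std::is_trivially_copyable<Sample>::value,
              "Sample must be trivially copyable");

// Copy the object's bytes into an unsigned char array and back again.
inline Sample round_trip_through_bytes(const Sample& in)
{
    unsigned char bytes[sizeof(Sample)];
    std::memcpy(bytes, &in, sizeof(Sample));   // object -> byte array
    Sample out;
    std::memcpy(&out, bytes, sizeof(Sample));  // byte array -> object
    return out;
}
```

The byte array in the middle can be inspected or transmitted freely, since it is a genuine array of unsigned char.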


