Set All Bytes of Int to (Unsigned Char)0, Guaranteed to Represent Zero

C++ 11

I think the pertinent parts are

3.9.1/1 In C++11

For character types, all bits of the object representation participate
in the value representation. For unsigned character types, all possible bit patterns of the value representation represent numbers. These requirements do not hold for other types.

Along with 3.9.1/7

The representations of integral types
shall define values by use of a pure binary numeration system.

C11

6.2.6.2 is very explicit

For unsigned integer types other than unsigned char, the bits of the object
representation shall be divided into two groups: value bits and padding bits (there need
not be any of the latter). If there are N value bits, each bit shall represent a different
power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of
representing values from 0 to 2^N − 1 using a pure binary representation; this shall be
known as the value representation. The values of any padding bits are unspecified.

For signed integer types, the bits of the object representation shall be divided into three
groups: value bits, padding bits, and the sign bit. There need not be any padding bits;
signed char shall not have any padding bits. There shall be exactly one sign bit.
Each bit that is a value bit shall have the same value as the same bit in the object
representation of the corresponding unsigned type (if there are M value bits in the signed
type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the
following ways:

— the corresponding value with sign bit 0 is negated (sign and magnitude);

— the sign bit has the value −(2^M) (two’s complement);

— the sign bit has the value −(2^M − 1) (ones’ complement).

Which of these applies is implementation-defined, as is whether the value with sign bit 1
and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones’ complement), is a trap representation or a normal value. In the case of sign and
magnitude and ones’ complement, if this representation is a normal value it is called a
negative zero.

Summary

I think the intent is the same for both standards.

  • char, signed char and unsigned char have all bits participating in the value

  • other integer types may have padding bits which don't participate in the value. A wrong bit pattern in them may yield an invalid value.

  • the interpretation is a pure binary representation, something whose definition is expanded in the C11 citation above.

Two things that may not be clear:

  • can -0 (for sign and magnitude and ones' complement) be a trap value in C++?

  • can one of the padding bits be a parity bit (i.e. could modifying the value bits while leaving the padding bits untouched produce an invalid representation)?

I'd be conservative and assume yes for both.
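
To make the padding-bit concern concrete, here is a minimal sketch that checks at compile time whether a type has padding bits on the current implementation, and that zeroes an int byte by byte. The helper names has_no_padding_bits and zero_bytes are mine, not anything from the standard. It relies on std::numeric_limits<T>::digits reporting the number of non-sign value bits for integer types.

```cpp
#include <cassert>
#include <climits>
#include <cstring>
#include <limits>

// True when every bit of T's object representation is a value bit or the
// sign bit, i.e. T has no padding bits on this implementation.
template <typename T>
constexpr bool has_no_padding_bits()
{
    return std::numeric_limits<T>::digits
         + (std::numeric_limits<T>::is_signed ? 1 : 0)
        == static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Sets every byte of an int to (unsigned char)0 and returns the result.
inline int zero_bytes()
{
    int value = 12345;
    std::memset(&value, 0, sizeof value);
    return value;
}
```

If has_no_padding_bits<int>() is true, the pure binary rule quoted above leaves no room for all-zero bytes to mean anything other than 0. A static_assert on it is a cheap way to document that assumption in code that depends on it.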

Zeroing out anonymous union

In your simple case it's the same, but only because (most likely) int (and unsigned is short for unsigned int) is 32 bits (i.e. four bytes). If the array is larger, or int is only 16 bits it will not be the same.

Test zero for 4 bytes in an int

There are several ways in the famous bithacks page

bool hasZeroByte(unsigned int v)
{
    // The expression is nonzero exactly when some byte of the 32-bit value v is zero
    return ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F) != 0;
}

or

bool hasZeroByte = ((v + 0x7efefeff) ^ ~v) & 0x81010100;
if (hasZeroByte) // or may just have 0x80 in the high byte
{
    hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);
}

And this is likely the most compact way once compiled to assembly:

#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)

Since these are tricks, they're hard to understand, so if you want clarity, mask out each byte and check it, as in dasblinkenlight's answer.
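
For comparison, the straightforward per-byte version might look like the sketch below. This is my own illustration of the "mask out each byte" idea, not the code from the answer mentioned above, and the name has_zero_byte_simple is made up. It assumes a 32-bit value, hence std::uint32_t.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if any of the four bytes of v is zero.
// Each byte is shifted down to the low position and masked with 0xFF.
inline bool has_zero_byte_simple(std::uint32_t v)
{
    for (int shift = 0; shift < 32; shift += 8) {
        if (((v >> shift) & 0xFFu) == 0) {
            return true;
        }
    }
    return false;
}
```

A decent optimizer will usually unroll the loop, so the readability costs less than it might seem.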

Example assembly output on Compiler Explorer

Is it guaranteed that memset will zero out the padding bits in a structure?

Perhaps worth noting that memset doesn't know anything about your struct (or array, or primitive, or any chunk of memory whatsoever that you happen to unleash it on), so even if it wanted to leave the padding bits intact, it wouldn't even know where they are.
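
A small sketch of that point: memset just writes sizeof(s) bytes, so the padding gets zeroed along with everything else. The struct Sample and the function name are hypothetical, and I'm assuming a typical ABI that inserts padding after the char member.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical struct; most ABIs put padding bytes between 'tag' and 'value'.
struct Sample {
    char tag;
    int  value;
};

// Zeroes a Sample with memset, then inspects every byte of it, padding
// included, through an unsigned char pointer.
inline bool memset_zeroes_every_byte()
{
    Sample s;
    s.tag = 'x';
    s.value = 42;
    std::memset(&s, 0, sizeof s);

    const unsigned char* raw = reinterpret_cast<const unsigned char*>(&s);
    for (std::size_t i = 0; i < sizeof s; ++i) {
        if (raw[i] != 0) {
            return false;
        }
    }
    return true;
}
```

Note that the padding is only known to be zero right after the call. C11 6.2.6.1 says padding bytes take unspecified values when a value is later stored in the struct or in one of its members.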

Is it more portable to use ~0 or -1 to represent a type with all bits flipped to 1?

In general you should be casting before applying the operator, because casting to a wider unsigned type may or may not cause sign extension depending on whether the source type is signed.

If you want a value of primitive type T with all bits set, the most portable approach is ~T(0). It should work on any number-like classes as well.

As Mr. Bingley said, the exact-width types from stdint.h (int32_t and friends) are guaranteed to be two's-complement, so that -T(1) will also give a value with all bits set.

The source you reference has the right thought but misses some of the details. For example, neither (T)~0u nor (T)-1u will be the same as ~T(0u) and -T(1u). (To be fair, litb wasn't talking about widening in that answer you linked.)

Note that if there are no variables, just an unsuffixed literal 0 or -1, then the source type is guaranteed to be signed and none of the above concerns apply. But why write different code when dealing with literals, when the universally correct code is no more complex?
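
Here is a minimal sketch of the cast-first approach as a helper template (the name all_ones is mine). The extra static_cast is there because for types narrower than int, ~T(0) is computed as an int after integer promotion and has to be converted back to T.

```cpp
#include <cassert>
#include <cstdint>

// Returns a value of type T with every value bit set.
// The conversion to T happens before the complement, so no narrower
// all-ones pattern gets zero-extended into a wider T.
template <typename T>
constexpr T all_ones()
{
    return static_cast<T>(~T(0));
}
```

Compare with static_cast<std::uint64_t>(~0u): when unsigned int is 32 bits, the complement is taken at 32 bits and then zero-extended, which gives 0x00000000FFFFFFFF rather than all 64 bits set.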

How are the values assigned in the following union?

I would say that depends on the size of int and char. A union contains the memory of the largest variable. If int is 4 bytes and char[2] represents 2 bytes, the int consumes more memory than the char-array, so you are not initialising the full int-memory to 0 by setting all char-variables. It depends on your memory initialization mechanisms but basically the value of the int will appear to be random as the extra 2 bytes are filled with unspecified values.

Besides, filling one variable of a union and reading another is exactly what makes unions unsafe, in my opinion.

If you are sure that int is the largest datatype, you can initialize the whole union by writing

#include <iostream>

union a
{
    int i;
    char ch[2];
};

void foo()
{
    a u = { 0 }; // Initializes the first member of the union (i)
    std::cout << u.i;
}

Therefore it may be a good idea to place the largest type at the beginning of the union. Although that doesn't guarantee that all datatypes can be considered zero or empty when all bits are set to 0.
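
If you rely on the first member covering the whole union, you can let the compiler check that assumption for you. This is only a sketch: the static_assert and the function zeroed_union_value are my additions.

```cpp
#include <cassert>

union a
{
    int  i;
    char ch[2];
};

// Fails to compile if the first member does not span the whole union,
// e.g. on a platform where int is smaller than the char array.
static_assert(sizeof(int) == sizeof(a),
              "first member must cover the whole union");

inline int zeroed_union_value()
{
    a u = { 0 };  // initializes u.i, which covers every byte of the union
    return u.i;
}
```

If the assertion ever fires, that is the signal to reorder the members or to zero the union some other way.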

Looking for a better way to represent unsigned char arrays

How you represent the commands is going to depend very much on what command sequences your program is going to send.

If your program is totally general-purpose and needs to be able to send literally any possible sequence of bytes, then a const unsigned char array (or const uint8_t if you want to be a little bit more explicit) is probably the way to go.

On the other hand, if there are some "rules" to your protocol that you know won't ever change or need to have any exceptions, then you can write your code to include/enforce those rules rather than just blindly sending raw programmer-provided sequences (and hoping the programmer typed them all in correctly).

For example, if you know for a fact that your serial device always requires that every command starts with the prefix 0x7E, 0x01, 0x00, 0x20, then you can cut down on duplication (and therefore on the chances of making a typo) by removing that prefix from your sequences and having your send-function automatically prepend it, instead, e.g.:

const unsigned char configurePresetDelivery[] = { 0x38, 0x0B, 0x04, 0x03, 0xF2, 0x40, 0x59, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xE3 };
const unsigned char beginPresetDelivery[] = { 0x3C, 0x01, 0x04, 0x2B };
const unsigned char configureDirectDelivery[] = { 0x37, 0x02, 0x03, 0xF2, 0xD5 };

const unsigned char prefix[] = {0x7e, 0x01, 0x00, 0x20};

void send_prefix_and_command(const unsigned char * cmdWithoutPrefix, int numBytes)
{
    send(prefix, sizeof(prefix));
    send(cmdWithoutPrefix, numBytes);
}

[...]

send_prefix_and_command(configurePresetDelivery, sizeof(configurePresetDelivery));

... and (taking it a bit further) if you know that some of your command-sequences are going to vary based on run-time parameters, then rather than hand-coding each variation, you can create a command-generator function to do it for you (and thus encapsulate the potentially-error-prone generation step into a single code-location, so there's only one routine to maintain/debug instead of many). E.g.

// This is easier to do using std::vector, so I will use it
std::vector<unsigned char> generatePresetDataCommand(unsigned char presetID, unsigned short presetValue)
{
    // I'm totally making this up just to show an example
    std::vector<unsigned char> ret;
    ret.push_back(0x66);
    ret.push_back(0x67);
    ret.push_back(presetID);
    ret.push_back((presetValue>>8)&0xFF); // store high-bits of 16-bit value into a byte
    ret.push_back((presetValue>>0)&0xFF); // store low-bits of 16-bit value into a byte
    return ret;
}

// Convenience wrapper-function so later code can send a vector with less typing
void send_prefix_and_command(const std::vector<unsigned char> & vec)
{
    // data() is valid even for an empty vector, unlike &vec[0]
    send_prefix_and_command(vec.data(), static_cast<int>(vec.size()));
}

[...]

// The payoff -- easy one-liner sending of a command with little chance of getting it wrong
send_prefix_and_command(generatePresetDataCommand(42, 32599));
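
If you want to check the framing without the hardware attached, one option is to swap the real send for a stand-in that records the bytes. The sketch below does that. fake_send and g_sent are hypothetical names of mine, and the opcode bytes are the same made-up ones as above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real serial write: records bytes instead of sending them.
static std::vector<unsigned char> g_sent;

void fake_send(const unsigned char* bytes, std::size_t numBytes)
{
    g_sent.insert(g_sent.end(), bytes, bytes + numBytes);
}

const unsigned char prefix[] = {0x7E, 0x01, 0x00, 0x20};

// Sends the fixed prefix followed by the command body.
void send_prefix_and_command(const std::vector<unsigned char>& cmd)
{
    fake_send(prefix, sizeof(prefix));
    fake_send(cmd.data(), cmd.size());
}

// Builds a (made-up) preset command: two opcode bytes, the ID, then the
// 16-bit value split into high byte and low byte.
std::vector<unsigned char> generatePresetDataCommand(unsigned char presetID, unsigned short presetValue)
{
    std::vector<unsigned char> ret;
    ret.push_back(0x66);
    ret.push_back(0x67);
    ret.push_back(presetID);
    ret.push_back(static_cast<unsigned char>((presetValue >> 8) & 0xFF));
    ret.push_back(static_cast<unsigned char>(presetValue & 0xFF));
    return ret;
}
```

After send_prefix_and_command(generatePresetDataCommand(42, 32599)), g_sent holds the four prefix bytes followed by 0x66, 0x67, 42, 0x7F, 0x57, since 32599 is 0x7F57.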

Does Standard define null pointer constant to have all bits set to zero?

No, NULL doesn't have to be all bits zero.

N1570 6.3.2.3 Pointers paragraph 3:

An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant. 66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.

Note the phrase "converted to a pointer type": the integer 0 is converted if necessary, so the resulting null pointer doesn't have to have the same bit representation as the integer 0.

Footnote 66 at the bottom of the page says:

66) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.19.

Which leads us to a paragraph of that chapter:

The macros are

NULL

which expands to an implementation-defined null pointer constant

What is more, Annex J.3.12 (Portability issues, Implementation-defined behaviour, Library functions) lists:

— The null pointer constant to which the macro NULL expands (7.19).
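
In practice this means you should assign a null pointer constant rather than zero the bytes yourself. A minimal C++11 sketch of the difference follows. The struct Node and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstring>

struct Node {
    int   value;
    Node* next;
};

// Portable: the compiler converts the null pointer constant to whatever
// bit pattern this implementation uses for a null pointer.
inline Node make_empty_node()
{
    Node n;
    n.value = 0;
    n.next = nullptr;
    return n;
}

// Reports whether a null pointer happens to be all-bits-zero here.
// Typically true on mainstream platforms, but not something the standard promises.
inline bool null_is_all_bits_zero()
{
    Node* p = nullptr;
    unsigned char raw[sizeof p];
    std::memcpy(raw, &p, sizeof p);
    for (unsigned char byte : raw) {
        if (byte != 0) {
            return false;
        }
    }
    return true;
}
```

Code that does memset(&n, 0, sizeof n) and then tests n.next against a null pointer is quietly assuming null_is_all_bits_zero() returns true.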


