Is It Safe to Memset Bool to 0

Is it safe to memset bool to 0?

Update

P1236R1: Alternative Wording for P0907R4 Signed Integers are Two's Complement says the following:

As per EWG decision in San Diego, deviating from P0907R3, bool is specified to have some integral type as its underlying type, but the presence of padding bits for "bool" will remain unspecified, as will the mapping of true and false to values of the underlying type.

Original Answer

I believe this is unspecified, although it seems likely that the underlying representation of false would be all zeros. Boost.Container relies on this as well (emphasis mine):

Boost.Container uses std::memset with a zero value to initialize some
types as in most platforms this initialization yields to the desired
value initialization with improved performance.

Following the C11 standard, Boost.Container assumes that for any
integer type, the object representation where all the bits are zero
shall be a representation of the value zero in that type. Since
_Bool/wchar_t/char16_t/char32_t are also integer types in C, it considers all C++ integral types as initializable via std::memset.

The C11 quote they point to as a rationale actually comes from a C99 defect report, defect 263: all-zero bits representations, which added the following:

For any integer type, the object representation where all the bits are
zero shall be a representation of the value zero in that type.

So the question is whether that assumption is correct: are the underlying object representations of integer types compatible between C and C++?
The proposal Resolving the difference between C and C++ with regards to object representation of integers sought to answer this to some extent, but as far as I can tell it was never resolved. I cannot find conclusive evidence of this in the draft standard. We have a couple of cases where it links to the C standard explicitly with respect to types. Section 3.9.1 [basic.fundamental] says:

[...] The signed and unsigned integer types shall satisfy the
constraints given in the C standard, section 5.2.4.2.1.

and 3.9 [basic.types] which says:

The object representation of an object of type T is the sequence of N
unsigned char objects taken up by the object of type T, where N equals
sizeof(T). The value representation of an object is the set of bits
that hold the value of type T. For trivially copyable types, the value
representation is a set of bits in the object representation that
determines a value, which is one discrete element of an
implementation-defined set of values. [44]

where footnote 44 (which is not normative) says:

The intent is that the memory model of C++ is compatible with that of
ISO/IEC 9899 Programming Language C.

The farthest the draft standard gets to specifying the underlying representation of bool is in section 3.9.1:

Types bool, char, char16_t, char32_t, wchar_t, and the signed and
unsigned integer types are collectively called integral types. [50] A
synonym for integral type is integer type. The representations of
integral types shall define values by use of a pure binary numeration
system. [51] [ Example: this International Standard permits 2’s
complement, 1’s complement and signed magnitude representations for
integral types. —end example ]

the section also says:

Values of type bool are either true or false.

but all we know of true and false is:

The Boolean literals are the keywords false and true. Such literals
are prvalues and have type bool.

and we know they are convertible to 0 and 1:

A prvalue of type bool can be converted to a prvalue of type int, with
false becoming zero and true becoming one.

but this gets us no closer to the underlying representation.

As far as I can tell, the only place where the standard referenced the actual underlying bit value, besides padding bits, was removed via defect report 1796: Is all-bits-zero for null characters a meaningful requirement?:

It is not clear that a portable program can examine the bits of the representation; instead, it would appear to be limited to examining the bits of the numbers corresponding to the value representation (3.9.1 [basic.fundamental] paragraph 1). It might be more appropriate to require that the null character value compare equal to 0 or '\0' rather than specifying the bit pattern of the representation.

There are more defect reports that deal with the gaps in the standard with respect to what a bit is and the difference between the value representation and the object representation.

Practically, I would expect this to work, but I would not consider it safe, since we cannot nail this down in the standard. Whether you need to change it is not clear; there is a non-trivial trade-off involved. Assuming it works now, the question is whether it is likely to break with future versions of various compilers, and that is unknown.
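
Since the standard leaves this open, one option is to check the assumption on the implementation you actually target instead of relying on it silently. The following is only a minimal sketch: the helper names (`false_is_all_zero_bytes`, `require_zero_false_representation`, `all_false_after_value_init`) are hypothetical and not taken from Boost.Container or the standard. It compares the object representation of a `false` bool against zero bytes, which is a well-defined way to inspect the bytes, and shows value-initialization as the alternative that needs no such assumption.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: reports whether the object representation of `false`
// is all zero bytes on this particular implementation.
bool false_is_all_zero_bytes() {
    const bool value = false;
    const unsigned char zeros[sizeof(bool)] = {};  // every byte is 0
    // Comparing object representations byte by byte is well-defined.
    return std::memcmp(&value, zeros, sizeof(bool)) == 0;
}

// Hypothetical startup guard for code that zero-fills bool storage with
// memset. It documents the assumption; it does not make the code portable.
void require_zero_false_representation() {
    assert(false_is_all_zero_bytes() && "all-zero bytes are not 'false' here");
}

// Value-initialization sets every element to false without assuming
// anything about the underlying bit pattern.
bool all_false_after_value_init() {
    bool flags[16] = {};
    for (bool flag : flags) {
        if (flag) {
            return false;
        }
    }
    return true;
}
```

A passing check only says something about the compiler and platform it ran on; it does not settle the question in the standard.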

Is it well-defined to use memset on a dynamic bool array?

  1. memset does not change the effective type. C11 (C17) 6.5p6:


    1. The effective type of an object for an access to its stored value is
      the declared type of the object, if any. [ This clearly is not the case. An allocated object has no declared type. ]

      If a value is stored into
      an object having no declared type through an lvalue having a type that
      is not a character type, then the type of the lvalue becomes the
      effective type of the object for that access and for subsequent
      accesses that do not modify the stored value. [ this is not the case as an lvalue of character type is used by memset! ]

      If a value is copied
      into an object having no declared type using memcpy or memmove, or is
      copied as an array of character type, then the effective type of the
      modified object for that access and for subsequent accesses that do
      not modify the value is the effective type of the object from which
      the value is copied, if it has one.
      [ this too is not the case here - it is not copied with memcpy, memmove or an array of characters ]

      For all other accesses to an
      object having no declared type, the effective type of the object is
      simply the type of the lvalue used for the access.
      [ therefore, this has to apply in our case. Notice that this applies to accessing it as characters inside memset as well as dereferencing array. ]

    Since the values are stored with an lvalue that has character type inside memset, and the bytes are not copied from another object with lvalues of character type (that clause exists to equate memcpy and memmove with doing the same in an explicit for loop!), the object does not get an effective type from memset, and the effective type of the elements is _Bool for those accessed through array.

    There might be parts in the C17 standard that are underspecified, but this certainly is not one of those cases.

  2. array[0] would not violate the effective type rule.

    That does not make using the value of array[0] any more legal. It can be (and most probably will be) a trap representation!

    I tried the following functions:

    #include <stdio.h>
    #include <stdbool.h>

    void f1(bool x, bool y) {
        if (!x && !y) {
            puts("both false");
        }
    }

    void f2(bool x, bool y) {
        if (x && y) {
            puts("both true");
        }
    }

    void f3(bool x) {
        if (x) {
            puts("true");
        }
    }

    void f4(bool x) {
        if (!x) {
            puts("false");
        }
    }

    with array[0] as any of the arguments - for the sake of avoiding compile-time optimizations this was compiled separately. When compiled with -O3 the following messages were printed:

    both true
    true

    And without any optimization:

    both false
    both true
    true
    false

Should a compiler interpret an arbitrary non-zero value in bool as true correctly?

No, reading from that bool after the memset is (at least, see below) unspecified behaviour so there is no guarantee as to what value will be returned.

It might turn out that on a particular architecture the value representation of a bool consists only of the high-order bit, in which case the value produced by broadcasting 123 over the byte(s) of the bool would turn out to be a representation of false.

The C++ standard does not specify what the actual bit patterns representing the values true and false are. An implementation may use any or all of the bits in the object representation of a bool -- which must be at least one byte, but might be longer -- and it may map more than one bit pattern to the same value:

§3.9.1 [basic.fundamental]/1:

…For narrow character types, all bits of the object representation participate in the value representation. For unsigned narrow character types, each possible bit pattern of the value representation represents a distinct number. These requirements do not hold for other types.

Paragraph 6 of the same section requires values of type bool to be either true or false, but a footnote points out that in the face of undefined behaviour a bool "might behave as if it is neither true nor false." (That's obviously within the bounds of undefined behaviour; if a program exhibits UB, there are no requirements whatsoever on its execution, even before the UB is evidenced.)

Nothing in the standard permits using low-level memory copying operations on objects other than arrays of narrow chars, except for the case in which the object is trivially copyable and the object representation is saved by copying it to a buffer and later restored by copying it back. Any other use of C library functions which overwrite arbitrary bytes in an object representation should be undefined by the general definition of undefined behaviour ("[the standard] omits any explicit definition of behavior"). But I'm forced to agree that there is no explicit statement that memset is UB, and so I'll settle on unspecified behaviour, which seems quite clear since the representation of bool is certainly unspecified.
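
If the bytes may hold something other than whatever the implementation uses for true and false, one way to sidestep the problem is to never read them through a bool lvalue at all. This is a minimal sketch under that assumption; `byte_to_bool` and `flags_from_bytes` are hypothetical names used only for illustration. The bytes are read as unsigned char, for which every bit pattern is a valid value, and each one is turned into a proper bool with a comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: treats any non-zero byte as true. The comparison
// produces a real bool, so the result is always exactly true or false.
inline bool byte_to_bool(unsigned char byte) {
    return byte != 0;
}

// Converts a buffer of raw bytes into well-formed bools without ever
// reading the bytes as bool objects.
std::vector<bool> flags_from_bytes(const unsigned char* bytes, std::size_t count) {
    assert(bytes != nullptr || count == 0);
    std::vector<bool> flags;
    flags.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        flags.push_back(byte_to_bool(bytes[i]));
    }
    return flags;
}
```

With this approach a byte holding 123 is simply a non-zero unsigned char and maps to true, regardless of which bits the implementation uses for bool.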

Is it safe to reinterpret_cast<bool*> zeroed out memory?

The core issue - even if the memory's all zeros, is it valid to read from it as if from a properly initialised bool - is the same as for this question.

Long story short: it's undefined behaviour that works on common systems but isn't guaranteed portable. Specific implementations are allowed to document behaviour for cases the Standard leaves undefined, so it's worth doing some research for the specific platforms/compilers you care about.
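
Where the goal is just an array of bools that all start out false, a portable route is to let the language create and initialize the objects rather than casting zeroed memory. A minimal sketch, assuming C++14 or later for `std::make_unique`; the helper names `make_cleared_flags` and `any_set` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: allocates `count` bools that are all false.
// std::make_unique<bool[]>(count) value-initializes the array, so every
// element is a real bool object holding false; no cast or memset is needed.
std::unique_ptr<bool[]> make_cleared_flags(std::size_t count) {
    return std::make_unique<bool[]>(count);
}

// Returns true if at least one of the first `count` flags is set.
bool any_set(const bool* flags, std::size_t count) {
    assert(flags != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        if (flags[i]) {
            return true;
        }
    }
    return false;
}
```

Before C++14, `new bool[count]()` with the trailing parentheses value-initializes the elements in the same way.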

What is the fastest way to fill the entire boolean array with a single value (true)?

Except for the for-loop, they all result in the exact same assembler code if you're using a modern C++ compiler with optimizations enabled; take a look at this link: https://godbolt.org/z/1a4a4bKfW
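
For reference, a portable way to write the fill is sketched here. The helper name `set_all_true` is hypothetical, and the claim about identical generated code is the answer's own and is not re-verified by this snippet. `std::fill_n` assigns through bool lvalues, so it does not depend on the bit pattern the implementation uses for true.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: sets the first `count` elements of a bool array to
// true by assignment, leaving the choice of instructions to the optimizer.
void set_all_true(bool* flags, std::size_t count) {
    assert(flags != nullptr || count == 0);
    std::fill_n(flags, count, true);
}
```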


