Trap Representation

  1. A trap representation is a catch-all term used by C99 (IIRC not by C89) to describe bit patterns that fit into the space occupied by a type, but trigger undefined behavior if used as a value of that type. The definition is in section 6.2.6.1p5 (with tentacles into all of 6.2.6) and I'm not going to quote it here because it's long and confusing. A type for which such bit patterns exist is said to "have" trap representations. No type is required to have any trap representations, but the only type that the standard guarantees will not have trap representations is unsigned char (6.2.6.1p5, 6.2.6.2p1).

    The standard gives two hypothetical examples of trap representations, neither of which corresponds to anything that any real CPU has done for many years, so I'm not going to confuse you with them. A good example of a trap representation (also the only thing that qualifies as a hardware-level trap representation on any CPU you are likely to encounter) is a signaling NaN in a floating-point type. C99 Annex F (section F.2.1) explicitly leaves the behavior of signaling NaNs undefined, even though IEC 60559 specifies their behavior in detail.

    It's worth mentioning that, while pointer types are allowed to have trap representations, null pointers are not trap representations. Null pointers only cause undefined behavior if they are dereferenced or offset; other operations on them (most importantly, comparisons and copies) are well-defined. Trap representations cause undefined behavior if you merely read them using the type that has the trap representation. (Whether invalid but non-null pointers are, or should be, considered trap representations is a subject of debate. The CPU doesn't treat them that way, but the compiler might.)

  2. The code you show has undefined behavior, but this is because of the pointer-aliasing rules, not because of trap representations. This is how to convert a float into the int with the same representation (assuming, as you say, sizeof(float) == sizeof(int)):

    int extract_int(float f)
    {
        union { int i; float f; } u;
        u.f = f;
        return u.i;
    }

    This code has unspecified (not undefined) behavior in C99, which basically means the standard doesn't define what integer value is produced, but you do get some valid integer value, it's not a trap representation, and the compiler is not allowed to optimize on the assumption that you have not done this. (Section 6.2.6.1, para 7. My copy of C99 might include technical corrigenda; my recollection is that this was undefined in the original publication but was changed to unspecified in a TC.)
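For comparison, here is a minimal sketch of the other common idiom, assuming float is a 32-bit IEEE 754 binary32 type and a C11 compiler (for _Static_assert). Copying the bytes with memcpy avoids the aliasing problem. Working on the copied bits also lets you test for a signaling NaN without ever loading the pattern as a float; on some hardware (x87 is the usual example) merely loading a signaling NaN can quiet it. The helper names float_bits and is_signaling_nan_bits are hypothetical. The quiet-bit convention tested here is the IEEE 754-2008 one used by x86 and ARM; legacy MIPS used the opposite convention.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32 and exactly 32 bits wide. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Copy a float's object representation into a uint32_t. memcpy copies
   bytes, so there is no aliasing violation. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Classify a binary32 bit pattern without ever loading it as a float.
   Signaling NaN: exponent all ones, fraction nonzero, quiet bit (bit 22)
   clear. Assumes the IEEE 754-2008 quiet-bit convention. */
static bool is_signaling_nan_bits(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x7FFFFFu;
    bool quiet = (bits & 0x400000u) != 0;
    return exponent == 0xFFu && fraction != 0 && !quiet;
}
```

On such a platform float_bits(1.0f) is 0x3F800000, the pattern 0x7FA00000 classifies as a signaling NaN, and 0x7FC00000 is a quiet NaN.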

Trap representation for structures

The member has no trap representations, because uint64_t has no trap representations.

7.20.1.1 Exact-width integer types

2 The typedef name uintN_t designates an unsigned integer type with width N and no padding bits. Thus, uint24_t denotes such an unsigned integer type with a width of exactly 24 bits.

No padding bits. And from the following section we learn:

6.2.6.2 Integer types

1 For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified. 53)
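The quoted paragraph implies a simple check: the maximum value of an unsigned type is 2^N − 1, i.e. N one-bits, so counting the bits of the maximum gives N, and whatever storage is left over is padding. A minimal sketch; value_bits and padding_bits_unsigned_int are hypothetical helper names:

```c
#include <limits.h>
#include <stdint.h>

/* 6.2.6.2p1: an unsigned type with N value bits has maximum 2^N - 1,
   i.e. N one-bits, so shifting the maximum down to zero counts N. */
static int value_bits(uintmax_t max)
{
    int n = 0;
    while (max != 0) {
        max >>= 1;
        n++;
    }
    return n;
}

/* Padding bits = bits of storage minus value bits. */
static int padding_bits_unsigned_int(void)
{
    return (int)(sizeof(unsigned int) * CHAR_BIT) - value_bits(UINT_MAX);
}
```

On the platforms most people use, this reports 0 padding bits for every standard unsigned type.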

Footnote 53, although non-normative, tells us that those padding bits (if they exist) can be used to trap:

53) Some combinations of padding bits might generate trap representations, for example, if one padding bit is a parity bit. Regardless, no arithmetic operation on valid values can generate a trap representation other than as part of an exceptional condition such as an overflow, and this cannot occur with unsigned types. All other combinations of padding bits are alternative object representations of the value specified by the value bits.

Value bits, on the other hand, can never hold an illegal pattern.

So you cannot produce a trap representation of uint64_t in a well-formed program. Mind you, your program does have UB due to the out-of-bounds access, but that is not caused by the possibility of trap representations. It's undefined all on its own.
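A minimal sketch of the argument above, assuming a C11 compiler for _Static_assert: since uint64_t has exactly 64 value bits and no padding, any sizeof(uint64_t) bytes copied into one form a valid value. u64_from_bytes is a hypothetical helper; which value it returns for a given byte sequence depends on the platform's byte order, but the result is never a trap representation.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* uint64_t has exactly 64 value bits and no padding bits (7.20.1.1),
   so its storage is exactly 64 bits. */
_Static_assert(sizeof(uint64_t) * CHAR_BIT == 64, "uint64_t has no padding");

/* Reinterpret raw bytes as a uint64_t. Every one of the 2^64 bit
   patterns is a value, so this can never produce a trap representation. */
static uint64_t u64_from_bytes(const unsigned char bytes[sizeof(uint64_t)])
{
    uint64_t v;
    memcpy(&v, bytes, sizeof v);
    return v;
}
```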

How to check whether an int variable contains a legal (not trap representation) value?

In C++'s current working draft (for C++20), an integer cannot have a trap representation. Signed integers are mandated to be two's complement ([basic.fundamental]/3):

An unsigned integer type has the same object representation, value representation, and alignment requirements ([basic.align]) as the corresponding signed integer type.
For each value x of a signed integer type, the value of the corresponding unsigned integer type congruent to x modulo 2^N has the same value of corresponding bits in its value representation. (41)
[ Example: The value −1 of a signed integer type has the same representation as the largest value of the corresponding unsigned type. — end example ]

Note 41 says:

This is also known as two's complement representation.

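This requirement was introduced by P0907, "Signed Integers are Two's Complement"; earlier C++ standards, like C, also allowed ones' complement and sign and magnitude.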

Additionally, padding bits in integers cannot cause traps ([basic.fundamental]/4):

Each set of values for any padding bits ([basic.types]) in the object representation are alternative representations of the value specified by the value representation.
[ Note: Padding bits have unspecified value, but do not cause traps. See also ISO C 6.2.6.2. — end note ]
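The C++ wording is about representation, not conversion. In C, (unsigned)-1 == UINT_MAX holds on every implementation because conversion is defined on values (6.3.1.3), so checking the representation itself means comparing bytes. A minimal C sketch of two such checks, using hypothetical helper names; both report two's complement on essentially every current platform, and C23 adopted the same two's complement requirement for C.

```c
#include <limits.h>
#include <string.h>

/* Compare representations, not values: does int -1 have the same bytes
   as UINT_MAX? This is the property C++20 [basic.fundamental]/3 requires. */
static int minus_one_matches_uint_max(void)
{
    int minus_one = -1;
    unsigned int max = UINT_MAX;
    return memcmp(&minus_one, &max, sizeof max) == 0;
}

/* Well-known idiom: the low two bits of -1 differ between the three
   representations C99 allowed (...11, ...10, ...01). */
static const char *signed_representation(void)
{
    switch (-1 & 3) {
    case 3:  return "two's complement";
    case 2:  return "ones' complement";
    case 1:  return "sign and magnitude";
    default: return "unknown";
    }
}
```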

Can all bits 0 be a trap representation for integers?

From the C Standard, 6.2.6.2 Integer types:

For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.
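In practice this is what makes zero-filling integer storage (with memset or calloc) a portable way to get zeros. A minimal sketch with hypothetical helper names; note that the guarantee covers integer types only, so it says nothing about null pointers or floating-point zero.

```c
#include <stdlib.h>
#include <string.h>

/* All-bits-zero must represent 0 for every integer type (6.2.6.2),
   so a memset to zero is guaranteed to give x == 0. */
static int zeroed_long_is_zero(void)
{
    long x;
    memset(&x, 0, sizeof x);
    return x == 0;
}

/* calloc returns all-bits-zero storage, so every int in it reads as 0.
   Returns 1 on success, 0 if a nonzero int was found, -1 if allocation failed. */
static int calloc_ints_are_zero(size_t n)
{
    int *a = calloc(n, sizeof *a);
    if (a == NULL)
        return -1;
    int ok = 1;
    for (size_t i = 0; i < n; i++)
        ok = ok && a[i] == 0;
    free(a);
    return ok;
}
```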

What is the meaning of the type char in C?

Q1. Does 6.2.6.1 p5 mean object representation is defined only when an object is divided by char?

No. It simply means that the object representation is a sequence of values of type unsigned char.

(why should it be divided by char, rather than remaining itself?)

The size of char (and of unsigned char) is the unit in which object sizes are measured. unsigned char has no padding bits, so all its bits are significant to its value. Thus values of type unsigned char are the basic units of storage in C's model of memory.

Note also that there are ways to access the individual bytes of an object representation that do not require copying the representation to an array, but the language specification chooses this definition to avoid forming a set of circular definitions.
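A minimal sketch of both routes to the same bytes, assuming a C11 compiler for _Static_assert. One route is the memcpy copy that 6.2.6.1 uses in its definition (paragraph 4 in C99 and C11). The other is a direct in-place view through an unsigned char pointer. copy_matches_in_place is a hypothetical name.

```c
#include <string.h>

/* sizeof measures object sizes in units of char. */
_Static_assert(sizeof(char) == 1, "char is the unit of sizeof");

/* Two views of the same object representation:
   - a copy into unsigned char[sizeof d], as 6.2.6.1 defines it, and
   - the bytes read in place through an unsigned char pointer.
   Returns 1 if they agree. */
static int copy_matches_in_place(double d)
{
    unsigned char copy[sizeof d];
    memcpy(copy, &d, sizeof d);
    const unsigned char *in_place = (const unsigned char *)&d;
    return memcmp(copy, in_place, sizeof d) == 0;
}
```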

Q2. Why is the character type the only one that doesn't cause undefined behavior when a trap representation is stored? (Why is reading a trap representation undefined behavior unless the object has character type?)

Trap representations are a characteristic of interpreting a byte sequence as a value of a particular type. I suspect what you're missing is that C doesn't offer many choices for how you read a given object's representation: roughly speaking, you can do it as an object of a compatible type or as a sequence of [signed|unsigned] char.* Only the latter alternative does not attempt to interpret the representation in question as an object of a type for which it is a trap representation.

Q3. I guess generally we don't have to read trap representations. But if one tries to read one, does 6.2.6.1 p6 mean any type wider than char should be copied into a char array as in the Q1 code example?

No. You can examine the bytes of an object's representation in place by taking the object's address, casting that to type unsigned char *, and reading the bytes through the pointer:

my_type o;
/* ... assign a value to o ... */
unsigned char *p = (unsigned char *)&o;   /* the cast is required in C */

for (size_t i = 0; i < sizeof o; i++) {
    printf("%02hhx", *p++);               /* two hex digits per byte */
}

Do also note, however, that the "assign a value to o" part is tricky if the objective is to produce a trap representation in o.
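A self-contained version of the loop above, written as a hypothetical helper hex_bytes that formats into a caller-supplied buffer of at least 2 * size + 1 chars. It assumes CHAR_BIT == 8, so each byte fits in two hex digits.

```c
#include <stddef.h>
#include <stdio.h>

/* Format the bytes of any object as hex, two digits per byte, in memory
   order. out must hold at least 2 * size + 1 chars. Assumes CHAR_BIT == 8.
   Reading through unsigned char never interprets the bytes as a value of
   the object's own type, so it cannot hit a trap representation. */
static void hex_bytes(const void *obj, size_t size, char *out)
{
    const unsigned char *p = obj;   /* void * converts implicitly in C */
    for (size_t i = 0; i < size; i++) {
        sprintf(out + 2 * i, "%02x", (unsigned)p[i]);
    }
    out[2 * size] = '\0';
}
```

With the fragment above, something like char buf[2 * sizeof o + 1]; hex_bytes(&o, sizeof o, buf); puts(buf); prints o's bytes in memory order. Any padding bytes in a struct have unspecified values, but reading them this way is still fine.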


*You can also read an object's representation as part of the representation of an aggregate or union that contains the object as a member, but that is not what the specifications are talking about here.

What is the meaning of producing negative zeroes in a system that doesn't support it?

Your interpretation is correct.

Going up to paragraph 2 of 6.2.6.2:

For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:

  • the corresponding value with sign bit 0 is negated (sign and magnitude);
  • the sign bit has the value −(2^M) (two’s complement);
  • the sign bit has the value −(2^M − 1) (ones’ complement).

Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones’ complement), is a trap representation or a normal value. In the case of sign and magnitude and ones’ complement, if this representation is a normal value it is called a negative zero.

This means an implementation using either one's complement or sign and magnitude has, for a given size integer type, a specific representation which must be either negative zero or a trap representation. It's then up to the implementation to choose which one of those applies.

As an example, suppose a system has sign and magnitude representation and a 32-bit int with no padding. Then the representation that would be negative zero, if it is supported, is 0x80000000.

Now suppose the following operations are performed:

int x = 0x7fffffff;
x = ~x;

If the implementation supports negative zero, the ~ operator will generate -0 as the result and store it in x. If it does not, it creates a trap representation and invokes undefined behavior as per paragraph 4.
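Sign and magnitude and ones' complement hardware is hard to come by, so here is a minimal sketch that simulates the three rules of 6.2.6.2p2 on a 32-bit pattern (one sign bit, M = 31 value bits, no padding) using only unsigned arithmetic, so it behaves the same on any host. decode32 and enum scheme are hypothetical names. Decoding ~0x7fffffff, i.e. 0x80000000, under sign and magnitude flags exactly the negative-zero pattern from the example above; under two's complement the same bits are simply −2^31 (INT_MIN), a normal value.

```c
#include <stdbool.h>
#include <stdint.h>

enum scheme { SIGN_MAGNITUDE, ONES_COMPLEMENT, TWOS_COMPLEMENT };

/* Decode a 32-bit pattern (1 sign bit, M = 31 value bits, no padding) under
   one of the three rules of 6.2.6.2p2. *neg_zero is set for the pattern the
   implementation must treat as either negative zero or a trap representation
   (never set for two's complement). */
static int64_t decode32(uint32_t bits, enum scheme s, bool *neg_zero)
{
    const uint32_t sign_bit = UINT32_C(1) << 31;
    const int64_t two_to_m = INT64_C(1) << 31;          /* 2^M */
    int64_t magnitude = (int64_t)(bits & ~sign_bit);    /* value bits only */

    *neg_zero = false;
    if ((bits & sign_bit) == 0)
        return magnitude;                 /* sign bit 0: no effect */

    switch (s) {
    case SIGN_MAGNITUDE:                  /* value with sign bit 0 is negated */
        *neg_zero = (magnitude == 0);
        return -magnitude;
    case ONES_COMPLEMENT:                 /* sign bit is worth -(2^M - 1) */
        *neg_zero = (magnitude == two_to_m - 1);
        return magnitude - (two_to_m - 1);
    case TWOS_COMPLEMENT:                 /* sign bit is worth -(2^M) */
    default:
        return magnitude - two_to_m;
    }
}
```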
