What Is the Purpose and Return Type of the __builtin_offsetof Operator

What is the purpose and return type of the __builtin_offsetof operator?

It's a builtin provided by the GCC compiler to implement the offsetof macro that is specified by the C and C++ Standard:

GCC - offsetof

It evaluates to the offset, in bytes, of a member within a POD struct/union. The result has type size_t.

Sample:

struct abc1 { int a, b, c; };
union abc2 { int a, b, c; };
struct abc3 { abc3() { } int a, b, c; }; // non-POD
union abc4 { abc4() { } int a, b, c; }; // non-POD

assert(offsetof(abc1, a) == 0); // always, because there's no padding before a.
assert(offsetof(abc1, b) == 4); // here, on my system
assert(offsetof(abc2, a) == offsetof(abc2, b)); // (members overlap)
assert(offsetof(abc3, c) == 8); // undefined behavior. GCC outputs warnings
assert(offsetof(abc4, a) == 0); // undefined behavior. GCC outputs warnings

@Jonathan provides a nice example of where you can use it. I remember having seen it used to implement intrusive lists (lists whose data items themselves include the next and prev pointers), but sadly I can't remember exactly where it was helpful in implementing them.
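For intrusive lists, the usual role of offsetof is to get from a pointer to the embedded link back to the object that contains it. The following is only a minimal sketch of that idea, not taken from any particular library: the names list_node, task, task_from_link, and id_via_link are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical intrusive list node: it holds only the links, no payload.
struct list_node {
    list_node *prev;
    list_node *next;
};

// Hypothetical payload type that embeds a node as an ordinary member.
struct task {
    int id;
    list_node link;
};

// Recover the enclosing task from a pointer to its embedded node by
// stepping back offsetof(task, link) bytes from the node's address.
inline task *task_from_link(list_node *node)
{
    return reinterpret_cast<task *>(
        reinterpret_cast<char *>(node) - offsetof(task, link));
}

// Usage: walk from the embedded node back to the owning object.
inline int id_via_link(task &t)
{
    task *owner = task_from_link(&t.link);
    assert(owner == &t);
    return owner->id;
}
```

The list code itself only ever handles list_node pointers, so one list implementation can serve any type that embeds a node.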

Portability of using stddef.h's offsetof rather than rolling your own

To answer #2: yes, gcc-4* (I'm currently looking at v4.3.4, released 4 Aug 2009, but it should hold true for all gcc-4 releases to date). The following definition is used in their stddef.h:

#define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER)

where __builtin_offsetof is a compiler builtin like sizeof (that is, it's not implemented as a macro or run-time function). Compiling the code:

#include <stddef.h>

struct testcase {
    char array[256];
};

int main (void) {
    char buffer[offsetof(struct testcase, array[0])];
    return 0;
}

would result in an error using the expansion of the macro that you provided ("size of array ‘buffer’ is not an integral constant-expression") but would work when using the macro provided in stddef.h. Builds using gcc-3 used a macro similar to yours. I suppose that the gcc developers had many of the same concerns about undefined behavior and so on that have been expressed here, and created the compiler builtin as a safer alternative to attempting to generate the equivalent operation in C code.

Additional information:

  • A mailing list thread from the Linux kernel developer's list
  • GCC's documentation on offsetof
  • A sort-of-related question on this site

Regarding your other questions: I think R's answer and his subsequent comments do a good job of outlining the relevant sections of the standard as far as question #1 is concerned. As for your third question, I have not heard of a modern C compiler that does not have stddef.h. I certainly wouldn't consider any compiler lacking such a basic standard header as "production". Likewise, if their offsetof implementation didn't work, then the compiler still has work to do before it could be considered "production", just like if other things in stddef.h (like NULL) didn't work. A C compiler released prior to C's standardization might not have these things, but the ANSI C standard is over 20 years old so it's extremely unlikely that you'll encounter one of these.

The whole premise of this problem raises a question: if these people are convinced that they can't trust the version of offsetof that the compiler provides, then what can they trust? Do they trust that NULL is defined correctly? Do they trust that long int is no smaller than a regular int? Do they trust that memcpy works like it's supposed to? Do they roll their own versions of the rest of the C standard library functionality? One of the big reasons for having language standards is so that you can trust the compiler to do these things correctly. It seems silly to trust the compiler for everything else except offsetof.

Update: (in response to your comments)

I think my co-workers behave like yours do :-) Some of our older code still has custom macros defining NULL, VOID, and other things like that since "different compilers may implement them differently" (sigh). Some of this code was written back before C was standardized, and many older developers are still in that mindset even though the C standard clearly says otherwise.

Here's one thing you can do to both prove them wrong and make everyone happy at the same time:

#include <stddef.h>

#ifndef offsetof
#define offsetof(tp, member) (((char*) &((tp*)0)->member) - (char*)0)
#endif

In reality, they'll be using the version provided in stddef.h. The custom version will always be there, however, in case you run into a hypothetical compiler that doesn't define it.

Based on similar conversations that I've had over the years, I think the belief that offsetof isn't part of standard C comes from two places. First, it's a rarely used feature. Developers don't see it very often, so they forget that it even exists. Second, offsetof is not mentioned at all in Kernighan and Ritchie's seminal book "The C Programming Language" (even the most recent edition). The first edition of the book was the unofficial standard before C was standardized, and I often hear people mistakenly referring to that book as THE standard for the language. It's much easier to read than the official standard, so I don't know if I blame them for making it their first point of reference. Regardless of what they believe, however, the standard is clear that offsetof is part of ANSI C (see R's answer for a link).


Here's another way of looking at question #1. The ANSI C standard gives the following definition in section 4.1.5:

     offsetof(type, member-designator)

which expands to an integral constant expression that has type size_t,
the value of which is the offset in bytes, to the structure member
(designated by member-designator), from the beginning of its
structure (designated by type).

Using the offsetof macro does not invoke undefined behavior. In fact, the behavior is all that the standard actually defines. It's up to the compiler writer to define the offsetof macro such that its behavior follows the standard. Whether it's implemented using a macro, a compiler builtin, or something else, ensuring that it behaves as expected requires the implementor to deeply understand the inner workings of the compiler and how it will interpret the code. The compiler may implement it using a macro like the idiomatic version you provided, but only because they know how the compiler will handle the non-standard code.

On the other hand, the macro expansion you provided indeed invokes undefined behavior. Since you don't know enough about the compiler to predict how it will process the code, you can't guarantee that particular implementation of offsetof will always work. Many people define their own version like that and don't run into problems, but that doesn't mean that the code is correct. Even if that's the way that a particular compiler happens to define offsetof, writing that code yourself invokes UB while using the provided offsetof macro does not.

Rolling your own macro for offsetof can't be done without invoking undefined behavior (ANSI C section A.6.2 "Undefined behavior", 27th bullet point). Using stddef.h's version of offsetof will always produce the behavior defined in the standard (assuming a standards-compliant compiler). I would advise against defining a custom version since it can cause portability problems, but if others can't be persuaded then the #ifndef offsetof snippet provided above may be an acceptable compromise.
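The two properties the standard promises (the result has type size_t, and it is an integral constant expression) can be checked directly. Here is a small C++ sketch of that, where packet, header_bytes, and header_size are hypothetical names used only for illustration:

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical struct used only for illustration.
struct packet {
    char tag;
    int length;
    char payload[16];
};

// The result of offsetof has type size_t.
static_assert(std::is_same<decltype(offsetof(packet, length)),
                           std::size_t>::value,
              "offsetof yields a size_t");

// There is never padding before the first member.
static_assert(offsetof(packet, tag) == 0, "first member is at offset 0");

// It is a constant expression, so it can be used as an array bound.
using header_bytes = char[offsetof(packet, payload)];

// Size of everything that precedes the payload, known at compile time.
constexpr std::size_t header_size()
{
    return sizeof(header_bytes);
}
```

A hand-rolled macro built on a null-pointer dereference is not guaranteed to be accepted in either of those constant-expression contexts.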

Can we implement ANSI C's `offsetof` in Delphi?

Without a pre-processor or a built-in function, there's no way to do it quite as cleanly as the offsetof macro. The way that offsetof is able to do it so cleanly is that the pre-processor does the work. In fact some compilers implement it as a built-in, but that's beside the point. Delphi has no pre-processor, and no built-in offsetof.

The cleanest solution I know is like this:

NativeUInt(@TMyRecord(nil^).MyField)

But that is nothing like as clean as

offsetof(struct MyStruct, MyField)

In the comma operator, is the left operand guaranteed not to be actually executed if it hasn't side effects?

The comma operator makes no such guarantee (the C documentation says something very similar):

In a comma expression E1, E2, the expression E1 is evaluated, its result is discarded ..., and its side effects are completed before evaluation of the expression E2 begins

irrelevant information omitted

To put it simply, E1 will be evaluated, although the compiler might optimize it away by the as-if rule if it is able to determine that there are no side-effects.
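A small sketch makes the ordering observable. The counter calls and the functions bump and comma_demo are hypothetical names introduced only for this illustration:

```cpp
#include <cassert>

// Hypothetical counter, used only to make the evaluation of E1 observable.
int calls = 0;

inline int bump()
{
    return ++calls;
}

inline int comma_demo()
{
    // E1 is bump(): it is evaluated, its value is discarded, and its side
    // effect (incrementing calls) is complete before E2 starts.
    // E2 is calls * 10: it therefore sees the updated counter.
    return (bump(), calls * 10);
}
```

If bump() had no side effects, the compiler would be free to drop the call under the as-if rule, because no conforming program could tell the difference.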

C++ Compile-Time offsetof inside a template

The following should work (credits go to the answer to this question for the idea):

#include <cstddef>

template <typename T, typename M> M get_member_type(M T::*);
template <typename T, typename M> T get_class_type(M T::*);

template <typename T,
          typename R,
          R T::*M
         >
constexpr std::size_t offset_of()
{
    return reinterpret_cast<std::size_t>(&(((T*)0)->*M));
}

#define OFFSET_OF(m) offset_of<decltype(get_class_type(m)), \
decltype(get_member_type(m)), m>()

struct S
{
int x;
int y;
};

static_assert(OFFSET_OF(&S::x) == 0, "");

Note that in gcc, the offsetof macro expands to a builtin extension which can be used at compile time (see below). Also, your code invokes UB because it dereferences a null pointer, so even if it might work in practice, there are no guarantees.

#define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER)

As pointed out by Luc Danton, constant expressions cannot involve a reinterpret_cast according to the C++11 standard, although currently gcc accepts the code (see the bug report here). Also, I found defect report 1384, which talks about making the rules less strict, so this might change in the future.
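If the member name can be spelled out rather than passed as a pointer-to-member, one way to stay inside constant-expression rules is to use the standard offsetof macro directly with a dependent class type. This is only a sketch under that assumption, and it does not solve the pointer-to-member case from the question. The helper name offset_of_y is hypothetical.

```cpp
#include <cstddef>

struct S
{
    int x;
    int y;
};

// Hypothetical helper: the member name y is fixed in the source, and only
// the class type is a template parameter. No reinterpret_cast is involved.
template <typename T>
constexpr std::size_t offset_of_y()
{
    return offsetof(T, y);
}

// y comes after x, so its offset is at least sizeof(int).
static_assert(offset_of_y<S>() >= sizeof(int), "y follows x");
static_assert(offset_of_y<S>() == offsetof(S, y), "same value as the macro");
```

The trade-off is that the member has to be named where the macro is written, which is exactly the limitation the pointer-to-member version was trying to avoid.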

C++ class member variable knowing its own offset

Asking a question is the best way to realize the answer, so this is where I've got:

The offset can't be a template argument, because the type has to be known before the offset can be calculated. So it has to be returned by a function of the argument. Let's add a tag type (a dummy struct) and either put an overloaded function into Owner or put the function directly into the tag. That way we can define everything we need in one place (using a macro). The following code compiles fine with gcc 4.4.5 and prints the correct pointer for all members:

#include <cstddef>
#include <iostream>

using namespace std;

(just preamble to make it really compile)

template <typename Owner, typename Tag>
struct offset_aware
{
    Owner *owner()
    {
        return reinterpret_cast<Owner *>(
            reinterpret_cast<char *>(this) - Tag::offset());
    }
};

This is what's needed to make the object aware of its own offset. A property, a functor, or some other code can be added freely to make it useful. Now we need to declare some extra stuff along with the member itself, so let's define this macro:

#define OFFSET_AWARE(Owner, name) \
    struct name ## _tag { \
        static ptrdiff_t offset() { \
            return offsetof(Owner, name); \
        } \
    }; \
    offset_aware<Owner, name ## _tag> name

This defines a structure to serve as the tag and puts in it a function returning the required offset. Then it defines the data member itself.

Note that the member needs to be public as defined here, but we could easily add a 'friend' declaration for the tag to support protected and private properties. Now let's use it.

struct foo
{
    int x;
    OFFSET_AWARE(foo, a);
    OFFSET_AWARE(foo, b);
    OFFSET_AWARE(foo, c);
    int y;
};

Simple, isn't it?

int main()
{
    foo f;

    cout << "foo f = " << &f << endl
         << "f.a: owner = " << f.a.owner() << endl
         << "f.b: owner = " << f.b.owner() << endl
         << "f.c: owner = " << f.c.owner() << endl;
    return 0;
}

This prints the same pointer value on all lines. The C++ standard does not allow members to have zero size, but each offset_aware member occupies only the size of its actual content, or 1 byte if it is otherwise empty, compared with the 4 or 8 bytes (depending on platform) that a stored owner pointer would take.

Concat (##) the result of offsetof macro to an identifier?

You cannot do what you want. offsetof is (like sizeof) computed after preprocessing, during the compilation proper.

Look at the preprocessed form of your source code. With GCC you could get it using gcc -C -E CMain.c > CMain.i then use an editor (or a pager) to look inside CMain.i e.g. less CMain.i
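The same point can be seen without leaving the program by stringifying what the preprocessor produces. The sketch below uses sizeof as a stand-in, because the expansion of offsetof is implementation-specific. The macros STR, XSTR, CAT, and XCAT and the functions spelled and pasted are hypothetical helpers.

```cpp
#include <cassert>
#include <cstring>

// Stringify and paste helpers, each with an extra level so that the
// argument is macro-expanded before # or ## is applied.
#define STR(x) #x
#define XSTR(x) STR(x)
#define CAT(a, b) a##b
#define XCAT(a, b) CAT(a, b)

// The preprocessor does not evaluate sizeof: it only sees the tokens.
inline const char *spelled()
{
    return XSTR(sizeof(int));
}

// Pasting works when the right-hand side is already a literal token.
inline const char *pasted()
{
    return XSTR(XCAT(field_, 4));
}

// Usage: compare the strings the preprocessor actually produced.
inline bool preprocessor_sees_tokens_only()
{
    return std::strcmp(spelled(), "sizeof(int)") == 0
        && std::strcmp(pasted(), "field_4") == 0;
}
```

Trying to paste field_ onto sizeof(int) or offsetof(...) would be working with those unevaluated tokens, never with a number.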


