Is It Legal/Well-Defined C++ to Call a Non-Static Method That Doesn't Access Members Through a Null Pointer

Is it legal/well-defined C++ to call a non-static method that doesn't access members through a null pointer?

This will probably work on most systems, but it is Undefined Behaviour. Quoth the Standard:

5.2.5.3

If E1 has the type “pointer to class X,” then the expression E1->E2 is converted to the equivalent form (*(E1)).E2 [...]

And:

5.2.5.1

A postfix expression followed by a dot . or an arrow ->, optionally followed by the keyword template (14.8.1), and then followed by an id-expression, is a postfix expression. The postfix expression before the dot or arrow is evaluated;58) [...]

58) This evaluation happens even if the result is unnecessary to determine the value of the entire postfix expression, for example if the id-expression denotes a static member.

Evaluation of *x where x is a null pointer results in Undefined Behaviour, so yours is clearly a case of UB, before the function is even entered.

Does calling a method on a NULL pointer which doesn't access any data ever fail?

Can you think of any plausible reason why any future compiler wouldn't behave as expected?

A helpful compiler might add code to access the real object under the hood in debug builds in the hope of helping you catch this issue in your code early in the development cycle.

What if the function does modify members, but the NULL ptr is guarded against. For instance,

void foo::blah()
{
foo* pThis = this ? this : new foo();
pThis->i++;
}

Since it is undefined behavior to call that function with a null pointer, the compiler can assume that the test will always pass and optimize that function to:

void foo::blah()
{
this->i++;
}

Note that this is correct, since if this is not null, it behaves as-if the original code was executed, and if this was null, it would be undefined behavior and the compiler does not need to provide any particular behavior at all.

Why would code explicitly call a static method via a null pointer?

Static member functions were added into C++ in 1989, in Release 2.0 of the AT&T C++ Language System (pre-standardisation). Prior to that, the static keyword could not be used to declare static member functions, so code authors used workarounds, principally the one you have observed of indirecting a null pointer.

In the Selected Readings accompanying version 2.0 of the AT&T C++ Language System, in section 1-22, Stroustrup writes:

It was also observed that nonportable code, such as:

((X*)0)->f();

was used to simulate static member functions. This trick is a time bomb because sooner or later someone will make an f() that is used this way virtual and the call will fail horribly because there is no X object at address zero. Even where f() is not virtual such calls will fail under some implementations of dynamic linking.

Your code was written to compile under Cfront 1.0 or by someone who was not aware at the time of the addition of static member functions to the language.

The annotation of the member function with static is indeed a puzzle, as Cheers and hth. - Alf has observed; Cfront 1.0 would have rejected that code with:

error:  member Method() cannot be static

so it cannot have been there initially. I think Potatoswatter is most likely correct; static was added at a later date to document and enforce the static method attribute of Method, once a C++ 2.0 compiler could be guaranteed to be available, but without the calling code being updated. To confirm this you'd need to interview the original programmer(s) or at least examine source control history (if any exists).

May I have a real life example where a non-static member function not accessing the object being called via a null pointer causes observable problems?

Simple:

struct Object { void foo(); std::string s; };

void print(Object* o) {
o->foo();
if (o) { std::cout << o->x << "\n"; }
}

Let us say that foo does not access any non-static attribute of Object (ie, x).

The problem is that because formally o->foo() is undefined behavior if o is null, then it is obvious that o is not null. Therefore the check is redundant.

The function is thus optimized:

void print(Object* o) {
o->foo();
std::cout << o->x << "\n";
}

Reversing the order does not change anything:

void print(Object* o) {
if (o) { std::cout << o->x << "\n"; }
o->foo();
}

is still optimized:

void print(Object* o) {
std::cout << o->x << "\n";
o->foo();
}

Sometimes referred to as the Time Travel Clause of Undefined Behavior by some SO members.

For more information, check out Chris Lattner's serie on Undefined Behavior:

  • What Every C Programmar Should Know About Undefined Behavior 1/3
  • What Every C Programmar Should Know About Undefined Behavior 2/3
  • What Every C Programmar Should Know About Undefined Behavior 3/3

Your specific concern is addressed in 2/3.

Whether this actually fails depend on the compiler you use, the optimization passes you specify and the order in which they run.

Do you really want to depend on all that :x ?

Of course, one would argue that's it is pointless to have a function member that does not access any state of the object... so the question itself is of little value in practice (but interesting for its theoretical aspects).

c++ access static members using null pointer

TL;DR: Your example is well-defined. Merely dereferencing a null pointer is not invoking UB.

There is a lot of debate over this topic, which basically boils down to whether indirection through a null pointer is itself UB.

The only questionable thing that happens in your example is the evaluation of the object expression. In particular, d->a is equivalent to (*d).a according to [expr.ref]/2:

The expression E1->E2 is converted to the equivalent form
(*(E1)).E2; the remainder of 5.2.5 will address only the first
option (dot).

*d is just evaluated:

The postfix expression before the dot or arrow is evaluated;65 the
result of that evaluation, together with the id-expression, determines
the result of the entire postfix expression.

65) If the class member access expression is evaluated, the subexpression evaluation happens even if the result is unnecessary
to determine the value of the entire postfix expression, for example if the id-expression denotes a static member.

Let's extract the critical part of the code. Consider the expression statement

*d;

In this statement, *d is a discarded value expression according to [stmt.expr]. So *d is solely evaluated1, just as in d->a.

Hence if *d; is valid, or in other words the evaluation of the expression *d, so is your example.

Does indirection through null pointers inherently result in undefined behavior?

There is the open CWG issue #232, created over fifteen years ago, which concerns this exact question. A very important argument is raised. The report starts with

At least a couple of places in the IS state that indirection through a
null pointer produces undefined behavior: 1.9 [intro.execution]
paragraph 4 gives "dereferencing the null pointer" as an example of
undefined behavior, and 8.3.2 [dcl.ref] paragraph 4 (in a note) uses
this supposedly undefined behavior as justification for the
nonexistence of "null references."

Note that the example mentioned was changed to cover modifications of const objects instead, and the note in [dcl.ref] - while still existing - is not normative. The normative passage was removed to avoid commitment.

However, 5.3.1 [expr.unary.op] paragraph 1, which describes the unary
"*" operator, does not say that the behavior is undefined if the
operand is a null pointer, as one might expect. Furthermore, at least
one passage gives dereferencing a null pointer well-defined behavior:
5.2.8 [expr.typeid] paragraph 2 says

If the lvalue expression is obtained by applying the unary * operator
to a pointer and the pointer is a null pointer value (4.10
[conv.ptr]), the typeid expression throws the bad_typeid exception
(18.7.3 [bad.typeid]).


This is inconsistent and should be cleaned up.

The last point is especially important. The quote in [expr.typeid] still exists and appertains to glvalues of polymorphic class type, which is the case in the following example:

int main() try {

// Polymorphic type
class A
{
virtual ~A(){}
};

typeid( *((A*)0) );

}
catch (std::bad_typeid)
{
std::cerr << "bad_exception\n";
}

The behavior of this program is well-defined (an exception will be thrown and catched), and the expression *((A*)0) is evaluated as it isn't part of an unevaluated operand. Now if indirection through null pointers induced UB, then the expression written as

*((A*)0);

would be doing just that, inducing UB, which seems nonsensical when compared to the typeid scenario. If the above expression is merely evaluated as every discarded-value expression is1, where is the crucial difference that makes the evaluation in the second snippet UB? There is no existing implementation that analyzes the typeid-operand, finds the innermost, corresponding dereference and surrounds its operand with a check - there would be a performance loss, too.

A note in that issue then ends the short discussion with:

We agreed that the approach in the standard seems okay: p = 0; *p;
is not inherently an error.
An lvalue-to-rvalue conversion would give
it undefined behavior.

I.e. the committee agreed upon this. Although the proposed resolution of this report, which introduced so-called "empty lvalues", was never adopted…

However, “not modifiable” is a compile-time concept, while in fact
this deals with runtime values and thus should produce undefined
behavior instead. Also, there are other contexts in which lvalues can
occur, such as the left operand of . or .*, which should also be
restricted. Additional drafting is required.

that does not affect the rationale. Then again, it should be noted that this issue even precedes C++03, which makes it less convincing while we approach C++17.


CWG-issue #315 seems to cover your case as well:

Another instance to consider is that of invoking a member function
from a null pointer:

  struct A { void f () { } };
int main ()
{
A* ap = 0;
ap->f ();
}

[…]

Rationale (October 2003):

We agreed the example should be allowed. p->f() is rewritten as
(*p).f() according to 5.2.5 [expr.ref]. *p is not an error when
p is null unless the lvalue is converted to an rvalue (4.1
[conv.lval]), which it isn't here.

According to this rationale, indirection through a null pointer per se does not invoke UB without further lvalue-to-rvalue conversions (=accesses to stored value), reference bindings, value computations or the like. (Nota bene: Calling a non-static member function with a null pointer should invoke UB, albeit merely hazily disallowed by [class.mfct.non-static]/2. The rationale is outdated in this respect.)

I.e. a mere evaluation of *d does not suffice to invoke UB. The identity of the object is not required, and neither is its previously stored value. On the other hand, e.g.

*p = 123;

is undefined since there is a value computation of the left operand, [expr.ass]/1:

In all cases, the assignment is sequenced after the value computation
of the right and left operands

Because the left operand is expected to be a glvalue, the identity of the object referred to by that glvalue must be determined as mentioned by the definition of evaluation of an expression in [intro.execution]/12, which is impossible (and thus leads to UB).


1 [expr]/11:

In some contexts, an expression only appears for its side effects.
Such an expression is called a discarded-value expression. The
expression is evaluated and its value is discarded.
[…]. The lvalue-to-rvalue conversion (4.1) is
applied if and only if the expression is a glvalue of
volatile-qualified type and […]

Why is dereferencing of nullptr while using a static method not undefined behaviour in C++?

Standard citations in this answer are from the C++17 spec (N4713).

One of the sections cited in your question answers the question for non-static member functions. [class.mfct.non-static]/2:

If a non-static member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined.

This applies to, for example, accessing an object through a different pointer type:

std::string foo;

A *ptr = reinterpret_cast<A *>(&foo); // not UB by itself
ptr->non_static_mem_fn(); // UB by [class.mfct.non-static]/2

A null pointer doesn't point at any valid object, so it certainly doesn't point to an object of type A either. Using your own example:

p->non_static_mem_fn(); // UB by [class.mfct.non-static]/2

With that out of the way, why does this work in the static case? Let's pull together two parts of the standard:

[expr.ref]/2:

... The expression E1->E2 is converted to the equivalent form (*(E1)).E2 ...

[class.static]/1 (emphasis mine):

... A static member may be referred to using the class member access syntax, in which case the object expression is evaluated.

The second block, in particular, says that the object expression is evaluated even for static member access. This is important if, for example, it is a function call with side effects.

Put together, this implies that these two blocks are equivalent:

// 1
p->static_mem_fn();

// 2
*p;
A::static_mem_fn();

So the final question to answer is whether *p alone is undefined behavior when p is a null pointer value.

Conventional wisdom would say "yes" but this is not actually true. There is nothing in the standard that states dereferencing a null pointer alone is UB and there are several discussions that directly support this:

  • Issue 315, as you have mentioned in your question, explicitly states that *p is not UB when the result is unused.
  • DR 1102 removes "dereferencing the null pointer" as an example of UB. The given rationale is:

    There are core issues surrounding the undefined behavior of dereferencing a null pointer. It appears the intent is that dereferencing is well defined, but using the result of the dereference will yield undefined behavior. This topic is too confused to be the reference example of undefined behavior, or should be stated more precisely if it is to be retained.

  • This DR links to issue 232 where it is discussed to add wording that explicitly indicates *p as defined behavior when p is a null pointer, as long as the result is not used.

In conclusion:

p->non_static_mem_fn(); // UB by [class.mfct.non-static]/2
p->static_mem_fn(); // Defined behavior per issue 232 and 315.

Is calling a nonvirtual member function on a non-constructed object well-defined?

That code is undefined behavior.

Note that inside the constructor you can also call virtual member function.

The somewhat tricky part is calling virtual member function during member initialization before the constructor code begins. Still valid but it's not obvious what happens (the point is that until the constructor code begins the object isn't considered yet an instance of its class and virtual member functions get dispatched to the base class).

Some compilers emit a warning if you use this in the member initialization list exactly because the this pointer will behave strangely at that point (it will start behave normally only after the start of the constructor).

The code apparently works because most compilers use the VMT approach for method dispatching but the VMT is not needed to call a non-virtual method and thus if the method code doesn't dereference in any way this then things seems to "work". However the fact that the code seems to work in an implementation (or even in every implementation for that matter) still doesn't make it legal C++ code.



Related Topics



Leave a reply



Submit