C++ Static Const Access Through a Null Pointer

You can use a pointer (or other expression) to access a static member; however, doing so through a NULL pointer unfortunately is officially undefined behavior. From 9.4/2 "Static members":

A static member s of class X may be referred to using the qualified-id expression X::s; it is not necessary to use the class member access syntax (5.2.5) to refer to a static member. A static member may be referred to using the class member access syntax, in which case the object-expression is evaluated.

The example that follows it in the standard:

class process {
public:
    static void reschedule();
};

process& g();

void f()
{
    process::reschedule(); // OK: no object necessary
    g().reschedule();      // g() is called
}

The intent is to guarantee that the object expression is still evaluated, so that functions such as g() here are called even though reschedule() needs no object.
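A minimal sketch of that rule. The body of reschedule() and the call counter g_calls are hypothetical additions (the standard's example only declares the functions); g() returns a real object, so no null pointer is involved. The point is only that g() runs even though its result is not needed.

```cpp
#include <cassert>

// Hypothetical counter, used only to observe whether g() runs.
int g_calls = 0;

class process {
public:
    static void reschedule() {}  // stand-in body; the standard only declares it
};

process& g() {
    static process instance;  // a real object, so no null pointer is involved
    ++g_calls;
    return instance;
}

void f() {
    process::reschedule();  // qualified-id: no object expression, g() not called
    g().reschedule();       // object expression g() is evaluated, so g() runs
}

// Self-check: one call to f() calls g() exactly once.
void check_f() {
    g_calls = 0;
    f();
    assert(g_calls == 1);
}
```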

C++ access static members using null pointer

TL;DR: Your example is well-defined. Merely dereferencing a null pointer does not invoke UB.

There is a lot of debate over this topic, which basically boils down to whether indirection through a null pointer is itself UB.

The only questionable thing that happens in your example is the evaluation of the object expression. In particular, d->a is equivalent to (*d).a according to [expr.ref]/2:

The expression E1->E2 is converted to the equivalent form (*(E1)).E2; the remainder of 5.2.5 will address only the first option (dot).

*d is just evaluated:

The postfix expression before the dot or arrow is evaluated;[65] the result of that evaluation, together with the id-expression, determines the result of the entire postfix expression.

[65] If the class member access expression is evaluated, the subexpression evaluation happens even if the result is unnecessary to determine the value of the entire postfix expression, for example if the id-expression denotes a static member.

Let's extract the critical part of the code. Consider the expression statement

*d;

In this statement, *d is a discarded-value expression according to [stmt.expr]. So *d is solely evaluated[1], just as in d->a.

Hence, if *d; is valid (in other words, if evaluating the expression *d is valid), so is your example.
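If the goal is simply to reach a static member without entering this debate at all, one option (offered as a sketch, not something the quoted passages prescribe) is to name the member through the pointer's type rather than through the pointer. The operand of decltype is unevaluated, so *d is never evaluated. The struct D and its member a are hypothetical stand-ins for the question's types, and C++17 is assumed (constexpr static data members are implicitly inline there).

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the question's class.
struct D {
    static constexpr int a = 7;
};

// d->a would evaluate *d. Naming a through the pointer's *type* avoids
// that: decltype's operand is unevaluated, so d is never dereferenced.
int read_static(D *d) {
    using T = std::remove_pointer_t<decltype(d)>;
    return T::a;
}

void check_read_static() {
    D obj;
    assert(read_static(&obj) == 7);
    assert(read_static(nullptr) == 7);  // no indirection through d occurs
}
```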

Does indirection through null pointers inherently result in undefined behavior?

There is the open CWG issue #232, created over fifteen years ago, which concerns this exact question. A very important argument is raised. The report starts with

At least a couple of places in the IS state that indirection through a null pointer produces undefined behavior: 1.9 [intro.execution] paragraph 4 gives "dereferencing the null pointer" as an example of undefined behavior, and 8.3.2 [dcl.ref] paragraph 4 (in a note) uses this supposedly undefined behavior as justification for the nonexistence of "null references."

Note that the example mentioned was changed to cover modifications of const objects instead, and the note in [dcl.ref], while still present, is not normative. The normative passage was removed to avoid commitment.

However, 5.3.1 [expr.unary.op] paragraph 1, which describes the unary "*" operator, does not say that the behavior is undefined if the operand is a null pointer, as one might expect. Furthermore, at least one passage gives dereferencing a null pointer well-defined behavior: 5.2.8 [expr.typeid] paragraph 2 says

If the lvalue expression is obtained by applying the unary * operator to a pointer and the pointer is a null pointer value (4.10 [conv.ptr]), the typeid expression throws the bad_typeid exception (18.7.3 [bad.typeid]).

This is inconsistent and should be cleaned up.

The last point is especially important. The quote in [expr.typeid] still exists and appertains to glvalues of polymorphic class type, which is the case in the following example:

#include <iostream>
#include <typeinfo>

int main() try {

    // Polymorphic type
    class A
    {
        virtual ~A(){}
    };

    typeid( *((A*)0) );

}
catch (const std::bad_typeid&)
{
    std::cerr << "bad_typeid\n";
}

The behavior of this program is well-defined (an exception will be thrown and caught), and the expression *((A*)0) is evaluated as it isn't part of an unevaluated operand. Now if indirection through null pointers induced UB, then the expression written as

*((A*)0);

would be doing just that, inducing UB, which seems nonsensical when compared to the typeid scenario. If the above expression is merely evaluated, as every discarded-value expression is[1], where is the crucial difference that makes the evaluation in the second snippet UB? There is no existing implementation that analyzes the typeid operand, finds the innermost corresponding dereference and surrounds its operand with a check; there would be a performance loss, too.

A note in that issue then ends the short discussion with:

We agreed that the approach in the standard seems okay: p = 0; *p; is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior.

I.e., the committee agreed on this. Although the proposed resolution of this report, which introduced so-called "empty lvalues", was never adopted…

However, “not modifiable” is a compile-time concept, while in fact this deals with runtime values and thus should produce undefined behavior instead. Also, there are other contexts in which lvalues can occur, such as the left operand of . or .*, which should also be restricted. Additional drafting is required.

that does not affect the rationale. Then again, it should be noted that this issue even precedes C++03, which makes it less convincing as we approach C++17.


CWG-issue #315 seems to cover your case as well:

Another instance to consider is that of invoking a member function from a null pointer:

struct A { void f () { } };
int main ()
{
    A* ap = 0;
    ap->f ();
}

[…]

Rationale (October 2003):

We agreed the example should be allowed. p->f() is rewritten as (*p).f() according to 5.2.5 [expr.ref]. *p is not an error when p is null unless the lvalue is converted to an rvalue (4.1 [conv.lval]), which it isn't here.

According to this rationale, indirection through a null pointer per se does not invoke UB without further lvalue-to-rvalue conversions (i.e., accesses to the stored value), reference bindings, value computations or the like. (Nota bene: Calling a non-static member function with a null pointer should invoke UB, albeit only vaguely disallowed by [class.mfct.non-static]/2. The rationale is outdated in this respect.)

I.e. a mere evaluation of *d does not suffice to invoke UB. The identity of the object is not required, and neither is its previously stored value. On the other hand, e.g.

*p = 123;

is undefined since there is a value computation of the left operand, [expr.ass]/1:

In all cases, the assignment is sequenced after the value computation of the right and left operands

Because the left operand is expected to be a glvalue, the identity of the object referred to by that glvalue must be determined as mentioned by the definition of evaluation of an expression in [intro.execution]/12, which is impossible (and thus leads to UB).


[1] [expr]/11:

In some contexts, an expression only appears for its side effects. Such an expression is called a discarded-value expression. The expression is evaluated and its value is discarded. […]. The lvalue-to-rvalue conversion (4.1) is applied if and only if the expression is a glvalue of volatile-qualified type and […]
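To make the footnote concrete, a small sketch with hypothetical names (next(), g_storage, g_next_calls). The statement *next(); is a discarded-value expression: next() is called and *next() is evaluated, but because the glvalue is not volatile-qualified, no lvalue-to-rvalue conversion is applied. The pointer is valid on purpose, so this says nothing either way about the null case.

```cpp
#include <cassert>

// Hypothetical names, used only to observe what a discarded-value
// expression does.
int g_storage = 5;
int g_next_calls = 0;

int *next() {
    ++g_next_calls;
    return &g_storage;  // always a valid pointer
}

void discard() {
    // Discarded-value expression statement: next() is called and *next()
    // is evaluated, but since the glvalue is not volatile-qualified, no
    // lvalue-to-rvalue conversion is applied to it.
    *next();
}

void check_discard() {
    g_next_calls = 0;
    discard();
    assert(g_next_calls == 1);
    assert(g_storage == 5);  // nothing was written
}
```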

Why would code explicitly call a static method via a null pointer?

Static member functions were added to C++ in 1989, in Release 2.0 of the AT&T C++ Language System (pre-standardisation). Prior to that, the static keyword could not be used to declare static member functions, so code authors used workarounds, principally the one you have observed of indirecting a null pointer.

In the Selected Readings accompanying version 2.0 of the AT&T C++ Language System, in section 1-22, Stroustrup writes:

It was also observed that nonportable code, such as:

((X*)0)->f();

was used to simulate static member functions. This trick is a time bomb because sooner or later someone will make an f() that is used this way virtual and the call will fail horribly because there is no X object at address zero. Even where f() is not virtual such calls will fail under some implementations of dynamic linking.

Your code was written to compile under Cfront 1.0 or by someone who was not aware at the time of the addition of static member functions to the language.

The annotation of the member function with static is indeed a puzzle, as Cheers and hth. - Alf has observed; Cfront 1.0 would have rejected that code with:

error:  member Method() cannot be static

so it cannot have been there initially. I think Potatoswatter is most likely correct; static was added at a later date to document and enforce the static method attribute of Method, once a C++ 2.0 compiler could be guaranteed to be available, but without the calling code being updated. To confirm this you'd need to interview the original programmer(s) or at least examine source control history (if any exists).
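For comparison, a sketch of what Stroustrup's example looks like once f() is a genuine static member function. The class X and the return value 42 are hypothetical. X::f() needs no object, so there is no pointer to go wrong, and if someone later tries to make f() virtual, the compiler rejects it instead of the call failing at run time.

```cpp
#include <cassert>

// Hypothetical class standing in for the X in Stroustrup's example.
class X {
public:
    // A real static member function: no object and no `this` pointer.
    // Declaring it `static virtual` would be rejected by the compiler.
    static int f() { return 42; }
};

// The portable replacement for ((X*)0)->f(): no pointer involved at all.
int call_via_class_name() { return X::f(); }

// Calling through an object is also fine, as long as the object is real;
// the object expression x is evaluated but not otherwise used.
int call_via_object() {
    X x;
    return x.f();
}

void check_calls() {
    assert(call_via_class_name() == 42);
    assert(call_via_object() == 42);
}
```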

Why can't I initialize a static const pointer with another static const pointer?

Here is the text covering this:

C11 6.6/7:

More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following:

  • an arithmetic constant expression,
  • a null pointer constant,
  • an address constant, or
  • an address constant for a complete object type plus or minus an integer constant
    expression.

C11 6.6/9:

An address constant is a null pointer, a pointer to an lvalue designating an object of static storage duration, or a pointer to a function designator; it shall be created explicitly using the unary & operator or an integer constant cast to pointer type, or implicitly by the use of an expression of array or function type. The array-subscript [] and member-access . and -> operators, the address & and indirection * unary operators, and pointer casts may be used in the creation of an address constant, but the value of an object shall not be accessed by use of these operators.

C11 6.6/10:

An implementation may accept other forms of constant expressions.

So, your temp does not qualify to be an address constant because it doesn't satisfy any of the options in the definition of address constant.

Rationale: IDK, it seems like an oversight to me. Perhaps it is to avoid placing undue burden on the compiler; e.g. if (void *)1000 does not actually point to an object (e.g. it points outside the process's address space, or is a trap representation), then evaluating temp causes undefined behaviour (no diagnostic required).

Workaround: Using unsigned long long is not as good as using uintptr_t. However,

static const long *const temp1 = (void *)1000; 

seems like a better option than the integer options. You could use a #define macro to avoid repeating the actual address.

Accessing static member through invalid pointer: guaranteed to work?

it's not clear whether "evaluation" here involves an actual dereference.

I read "evaluation" here as "the subexpression is evaluated." That would mean that the unary * is evaluated and you perform indirection via a null pointer, yielding undefined behavior.

This issue (accessing a static member via a null pointer) is discussed in another question, When does invoking a member function on a null instance result in undefined behavior? While it discusses member functions specifically, I don't see any reason that data members are any different in this respect. There is some good discussion of the issue there.

There was a defect reported against the C++ Standard that asks "Is call of static member function through null pointer undefined?" (see CWG Defect 315) This defect is closed and its resolution states that it is valid to call a static member function via a null pointer:

p->f() is rewritten as (*p).f() according to 5.2.5 [expr.ref]. *p is not an error when p is null unless the lvalue is converted to an rvalue

However, this resolution is in fact wrong.

It presupposes the concept of an "empty lvalue," which is part of the proposed resolution for another defect, CWG defect 232, which asks the more general question, "Is indirection through a null pointer undefined behavior?"

The resolution to that defect would make certain forms of indirection through a null pointer (like calling a static member function) valid. However, that defect is still open and its resolution has not been adopted into the C++ Standard. Until that defect is closed and its resolution is incorporated into the C++ Standard, indirection via a null pointer (or dereferencing a null pointer, if one prefers that term) always yields undefined behavior.

Is accessing static class member via uninitialized pointer UB?

The semantics of a->n are that *a is evaluated, but not accessed, since the data member is static. See C++17 [expr.ref]:

... The postfix expression before the dot or arrow is evaluated ... The expression E1->E2 is converted to the equivalent form (*(E1)).E2 ...

There is also a footnote that says:

If the class member access expression is evaluated, the subexpression evaluation happens even if the result is unnecessary to determine the value of the entire postfix expression, for example if the id-expression denotes a static member.

In this case, the expression *a is evaluated. Since a is an automatic variable that has not been initialized, [dcl.init]/12 applies:

If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases: [ ... ]

Evaluating *a clearly requires accessing the value of the pointer variable a, which is an indeterminate value, therefore it is UB.

static Pointer to Custom Type stays nullptr after initialization with static not-null pointer of same Type

In "Graphic.h", you have

static Window* window;

This statement is included in every translation unit (.cpp) that #includes Graphic.h, and because a static variable at namespace scope has internal linkage, each unit gets its own variable window. What happens then is that Graphic.cpp assigns its own window, but main.cpp finds its own variable window unchanged.

What you should do is the following:

In Graphic.h, inside namespace Graphic, declare window but don't define it:

extern Window* window;

And define it only once, in Graphic.cpp:

Graphic::Window* Graphic::window = nullptr;

This way all the translation units will refer to the same global variable window.

You should do the same for the variable Graphic::Window* mainWindow defined in App.h.

extern Graphic::Window* mainWindow; // <-- in App.h

And

Graphic::Window* App::mainWindow = nullptr; // <-- in App.cpp
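A single-file sketch of the fix, assuming (as the Graphic::Window and Graphic::window names suggest) that Graphic is a namespace. The Window struct, its id field, and the init() and main_sees_window() helpers are hypothetical. The three "files" are concatenated here, so this shows the declaration/definition split rather than actually linking separate translation units; with static in place of extern, each real .cpp file would get its own private window.

```cpp
#include <cassert>

// ---- Graphic.h (hypothetical contents) ----
namespace Graphic {
struct Window { int id; };
extern Window* window;  // declaration only: no storage, external linkage
void init(Window* w);
}

// ---- Graphic.cpp ----
Graphic::Window* Graphic::window = nullptr;  // the single definition
void Graphic::init(Window* w) { window = w; }

// ---- main.cpp ----
// Every translation unit that includes Graphic.h names the same object.
bool main_sees_window() { return Graphic::window != nullptr; }

void check_window() {
    static Graphic::Window w{1};
    Graphic::init(&w);
    assert(main_sees_window());
    assert(Graphic::window->id == 1);
}
```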

C: Assigning static const char * const to static const char *

6.7.8/4 [C99]:

All the expressions in an initializer for an object that has static storage duration shall be constant expressions or string literals.

STRING_A is neither, hence the error.

One way to work around this would be along the following lines:

void input_function()
{
    static const char *current = NULL;
    if (current == NULL) {
        current = STRING_A;
    }

    ...
}

Static allocation and placement new result in null pointer dereference

This answer describes the problem as it happens at run time, using GCC as an example. Other compilers will generate different code with a similar problem, because your code has an inherent issue: it uses objects before they are initialized.

Stripped of the avoidance of dynamic memory allocation (for efficiency), of the generic approach and of the templates, and with every step decomposed, your code really boils down to:

class InterfaceA {};

class InterfaceB {};

class ObjectA : public virtual InterfaceA {
public:
    ObjectA(InterfaceB *intrB) : m_intrB(intrB) {}

private:
    InterfaceB *m_intrB;
};

class ObjectB : public virtual InterfaceB {
public:
    ObjectB(InterfaceA *intrA) : m_intrA(intrA) {}

private:
    InterfaceA *m_intrA;
};

#include <new>

void simple_init() {
    void *ObjectA_mem = operator new(sizeof(ObjectA));
    void *ObjectB_mem = operator new(sizeof(ObjectB));
    ObjectA *premature_ObjectA = static_cast<ObjectA *>(ObjectA_mem); // still not constructed
    ObjectB *premature_ObjectB = static_cast<ObjectB *>(ObjectB_mem);
    InterfaceA *ia = premature_ObjectA; // derived-to-base conversion
    InterfaceB *ib = premature_ObjectB;
    new (ObjectA_mem) ObjectA(ib);
    new (ObjectB_mem) ObjectB(ia);
}

For maximum compiled code readability, let's write that with global variables instead:

void *ObjectA_mem;
void *ObjectB_mem;
ObjectA *premature_ObjectA;
ObjectB *premature_ObjectB;
InterfaceA *ia;
InterfaceB *ib;

void simple_init() {
    ObjectA_mem = operator new(sizeof(ObjectA));
    ObjectB_mem = operator new(sizeof(ObjectB));
    premature_ObjectA = static_cast<ObjectA *>(ObjectA_mem); // still not constructed
    premature_ObjectB = static_cast<ObjectB *>(ObjectB_mem);
    ia = premature_ObjectA; // derived-to-base conversion
    ib = premature_ObjectB;
    new (ObjectA_mem) ObjectA(ib);
    new (ObjectB_mem) ObjectB(ia);
}

That gives us very readable assembly code. We can see that the statement:

  ia = premature_ObjectA;  // derived-to-base conversion

compiles to:

        movq    premature_ObjectA(%rip), %rax
        testq   %rax, %rax
        je      .L6
        movq    premature_ObjectA(%rip), %rdx
        movq    premature_ObjectA(%rip), %rax
        movq    (%rax), %rax
        subq    $24, %rax
        movq    (%rax), %rax
        addq    %rdx, %rax
        jmp     .L7
.L6:
        movl    $0, %eax
.L7:
        movq    %rax, ia(%rip)

First we see that the (un-optimized) code tests for a null pointer, the equivalent of

if (premature_ObjectA == 0)
    ia = 0;
else
    // real stuff

The real stuff being:

        movq    premature_ObjectA(%rip), %rdx
        movq    premature_ObjectA(%rip), %rax
        movq    (%rax), %rax
        subq    $24, %rax
        movq    (%rax), %rax
        addq    %rdx, %rax
        movq    %rax, ia(%rip)

So the first word of the object that premature_ObjectA points to (its vptr) is loaded and interpreted as a pointer, 24 is subtracted from it, the resulting address is used to read a value, and that value is added to the original pointer premature_ObjectA. Since the object at premature_ObjectA has not been constructed yet, that first word is uninitialized, so this obviously cannot work.

What's happening is that the compiler fetches the vptr (vtable pointer) and reads the entry three quadwords (3*8 = 24 bytes) below "floor 0" of the vtable (like a building, a vtable can have negative floors; it just means that floor 0 isn't the lowest floor):

vtable for ObjectA:
        .quad   0
        .quad   0
        .quad   typeinfo for ObjectA
vtable for ObjectB:
        .quad   0
        .quad   0
        .quad   typeinfo for ObjectB

The vptr of each of these objects points at the end of its vtable, just after "typeinfo for ObjectA", as we can see inside the compiled code for ObjectA::ObjectA(InterfaceB*):

        movl    $vtable for ObjectA+24, %edx
        ...
        movq    %rdx, (%rax)

So during construction, the vptr is set to "floor 0" of the vtable, which lies just before the first virtual function pointer, or at the very end if there are no virtual functions.

At floor -3 there is the beginning of the vtable:

vtable for ObjectA:
        .quad   0

The value 0 is for "InterfaceA is at offset 0 inside a complete ObjectA object".

The fine details of the vtable layout are compiler dependent, but the principles remain the same:

  • initialization of the vptr hidden data member (and possibly multiple other hidden members) in the constructor
  • use of these hidden members during the conversion to the InterfaceA base class
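Both principles can be seen from the well-defined side in a small sketch that reuses the reduced classes. The helper names convert_null(), convert_constructed() and check_conversions() are hypothetical. Converting a null ObjectA* takes the path that yields null without reading memory (the je .L6 branch in the unoptimized GCC output above), while converting a pointer to a fully constructed ObjectA reads a vptr the constructor has already set. It deliberately does not reproduce the bug, since applying the conversion to unconstructed storage is undefined behavior.

```cpp
#include <cassert>

class InterfaceA {};
class InterfaceB {};

// Same shape as the reduced example: InterfaceA is a *virtual* base,
// so ObjectA gets a vptr and the base offset is looked up at run time.
class ObjectA : public virtual InterfaceA {
public:
    explicit ObjectA(InterfaceB *intrB) : m_intrB(intrB) {}

private:
    InterfaceB *m_intrB;
};

// Null path: a derived-to-base conversion of a null pointer yields null
// without consulting any vtable.
InterfaceA *convert_null() {
    ObjectA *p = nullptr;
    return p;
}

// Constructed path: the constructor has already set the vptr, so reading
// the virtual-base offset from the vtable is valid.
bool convert_constructed() {
    ObjectA a(nullptr);
    InterfaceA *ia = &a;
    return ia != nullptr;
}

void check_conversions() {
    assert(convert_null() == nullptr);
    assert(convert_constructed());
}
```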

My explanation does not provide a fix: we don't even know what kind of high-level problem you have, or why you use these constructor arguments and mutually dependent classes.

Knowing what these classes represent, we might be able to help more.


