Why Uninitialized Global Variable Is Weak Symbol

Why uninitialized global variable is weak symbol?

gcc, in C mode:

Uninitialised globals which are not declared extern are treated as "common" symbols, not weak symbols.

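Common symbols are merged at link time so that they all refer to the same storage; if more than one object attempts to initialise such a symbol, you will get a link-time error. (If they aren't explicitly initialised anywhere, they will be placed in the BSS, i.e. initialised to 0.) Note that GCC 10 and later default to -fno-common, which turns such globals into ordinary definitions; pass -fcommon to get the behaviour described here.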

gcc, in C++ mode:

Not the same - it doesn't do the common symbols thing. "Uninitialised" globals which are not declared extern are implicitly initialised to a default value (0 for simple types, or default constructor).


In either case, a weak symbol allows an initialised symbol to be overridden by a non-weak initialised symbol of the same name at link time.


To illustrate (concentrating on the C case here), I'll use 4 variants of a main program, which are all the same except for the way that global is declared:

  1. main_init.c:

    #include <stdio.h>

    int global = 999;

    int main(void) { printf("%d\n", global); return 0; }

  2. main_uninit.c, which omits the initialisation:

    #include <stdio.h>

    int global;

    int main(void) { printf("%d\n", global); return 0; }

  3. main_uninit_extern.c, which adds the extern keyword:

    #include <stdio.h>

    extern int global;

    int main(void) { printf("%d\n", global); return 0; }

  4. main_init_weak.c, which initialises global and declares it to be a weak symbol:

    #include <stdio.h>

    int global __attribute__((weak)) = 999;

    int main(void) { printf("%d\n", global); return 0; }

and another_def.c which initialises the same global:

int global = 1234;

Using main_uninit.c on its own gives 0:

$ gcc -o test main_uninit.c && ./test
0

but when another_def.c is included as well, global is explicitly initialised and we get the expected result:

$ gcc -o test main_uninit.c another_def.c && ./test
1234

(Note that this case fails instead if you're using C++.)

If we try with both main_init.c and another_def.c instead, we have 2 initialisations of global, which won't work:

$ gcc -o test main_init.c another_def.c && ./test
/tmp/cc5DQeaz.o:(.data+0x0): multiple definition of `global'
/tmp/ccgyz6rL.o:(.data+0x0): first defined here
collect2: ld returned 1 exit status

main_uninit_extern.c on its own won't work at all - the extern keyword causes the symbol to be an ordinary external reference rather than a common symbol, so the linker complains:

$ gcc -o test main_uninit_extern.c && ./test
/tmp/ccqdYUIr.o: In function `main':
main_uninit_extern.c:(.text+0x12): undefined reference to `global'
collect2: ld returned 1 exit status

It works fine once the initialisation from another_def.c is included:

$ gcc -o test main_uninit_extern.c another_def.c && ./test
1234

Using main_init_weak.c on its own gives the value we initialised the weak symbol to (999), as there is nothing to override it:

$ gcc -o test main_init_weak.c && ./test
999

But pulling in the other definition from another_def.c does work in this case, because the strong definition there overrides the weak definition in main_init_weak.c:

$ gcc -o test main_init_weak.c another_def.c && ./test
1234

__attribute__ ((weak)) not work for global variable

At bottom what you have observed here is just the fact that the linker will not
resolve a symbol dynamically if it can resolve it statically. See:

main.c

extern void foo(void);
extern void need_dynamic_foo(void);
extern void need_static_foo(void);

int main(void)
{
    foo();
    need_dynamic_foo();
    need_static_foo();
    return 0;
}

dynamic_foo.c

#include <stdio.h>

void foo(void)
{
    puts("foo (dynamic)");
}

void need_dynamic_foo(void)
{
    puts(__func__);
}

static_foo.c

#include <stdio.h>

void foo(void)
{
    puts("foo (static)");
}

void need_static_foo(void)
{
    puts(__func__);
}

Compile the sources so:

$ gcc -Wall -c main.c static_foo.c
$ gcc -Wall -fPIC -c dynamic_foo.c

Make a shared library:

$ gcc -shared -o libfoo.so dynamic_foo.o

And link a program:

$ gcc -o prog main.o static_foo.o libfoo.so -Wl,-rpath=$PWD

It runs like:

$ ./prog
foo (static)
need_dynamic_foo
need_static_foo

So foo and need_static_foo were statically resolved to the definitions from static_foo.o and
the definition of foo from libfoo.so was ignored, despite the fact that libfoo.so
was needed and provided the definition of need_dynamic_foo. It makes no difference
if we change the linkage order to:

$ gcc -o prog main.o libfoo.so static_foo.o -Wl,-rpath=$PWD
$ ./prog
foo (static)
need_dynamic_foo
need_static_foo

It also makes no difference if we replace static_foo.c with:

static_weak_foo.c

#include <stdio.h>

void __attribute__((weak)) foo(void)
{
    puts("foo (static weak)");
}

void need_static_foo(void)
{
    puts(__func__);
}

Compile that and relink:

$ gcc -Wall -c static_weak_foo.c
$ gcc -o prog main.o libfoo.so static_weak_foo.o -Wl,-rpath=$PWD
$ ./prog
foo (static weak)
need_dynamic_foo
need_static_foo

Although the definition of foo in static_weak_foo.c is now declared weak,
the fact that foo can be statically resolved to this definition
still preempts any need to resolve it dynamically.

Now if we write another source file containing another strong definition of
foo:

static_strong_foo.c

#include <stdio.h>

void foo(void)
{
    puts("foo (static strong)");
}

and compile it and link as follows:

$ gcc -Wall -c static_strong_foo.c
$ gcc -o prog main.o static_weak_foo.o libfoo.so static_strong_foo.o -Wl,-rpath=$PWD

we see:

$ ./prog
foo (static strong)
need_dynamic_foo
need_static_foo

Now, libfoo.so still provides the definition of need_dynamic_foo, because there
is no other; static_weak_foo.o still provides the only definition of need_static_foo,
and the definition of foo in libfoo.so is still ignored because the symbol
can be statically resolved.

But in this case there are two definitions of foo in different files that are
available to resolve it statically: the weak definition in static_weak_foo.o and
the strong definition in static_strong_foo.o. By the linkage rules that you are
familiar with, the strong definition wins.

If both of these statically linked definitions of foo were strong, there would of course be a
multiple definition error, just like:

$ gcc -o prog main.o static_foo.o libfoo.so static_strong_foo.o -Wl,-rpath=$PWD
static_strong_foo.o: In function `foo':
static_strong_foo.c:(.text+0x0): multiple definition of `foo'
static_foo.o:static_foo.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

in which the dynamic definition in libfoo.so plays no part. So you can
be guided by this practical principle: The rules you are familiar with for arbitrating
between weak and strong definitions of the same symbol in a linkage only apply
to rival definitions which would provoke a multiple definition error in the absence
of the weak attribute.

multiple definition of value when compiling C program with uninitialized global in g++ but not gcc

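C and C++ are different languages. Case in point: the above program is accepted by gcc as a C program but is an ill-formed C++ program. You have violated C++'s one definition rule. gcc's C mode has traditionally not enforced a corresponding rule for uninitialised globals, treating them as common symbols instead.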

When compiling with gcc, you are compiling the above text as a C program. When compiling with g++, you are compiling the above text as a C++ program.

global variable not found in llvm JIT symbol table

The problem seems to be that uninitialized global variables are somehow optimized out and not added to the symbol table.

A quick workaround to ensure that the variable gets added to the symbol table is to initialize it with an "undefined value".

The following code performs such an initialization with the C++ API:

// code defining the struct type
std::vector<llvm::Type *> Members(2, llvm::Type::getDoubleTy(TheContext));
llvm::StructType *TypeG = llvm::StructType::create(TheContext,Members,"g",false);

// defining the global variable
TheModule->getOrInsertGlobal("r",TypeG);
llvm::GlobalVariable *gVar = TheModule->getNamedGlobal("r");

// initialize the variable with an undef value to ensure it is added to the symbol table
gVar->setInitializer(llvm::UndefValue::get(TypeG));

This solves the problem.

Why has the .bss segment not increased when variables are added?

This is because of the way global variables work.

The problem being solved is that it is possible to declare a global variable, without initializing it, in several .c files without getting a duplicate symbol error. That is, every uninitialized global declaration works like a weak declaration that can be considered external if no other declaration contains an initialization.

How is this implemented by the compiler? Easy:

  • when compiling, instead of adding that variable to the bss segment, it will be added to the COMMON segment.
  • when linking, however, it will merge all the COMMON variables with the same name and discard any that are already defined in another section. The remaining ones will be moved to the bss of the executable.

And that is why you don't see your variables in the bss of the object file, but you do in the executable file.

You can check the contents of the object sections using a more modern alternative to size, such as objdump -x. And note how the variables are placed in *COM*.

It is worth noting that if you declare your global variable as static you are saying that the variable belongs to that compilation unit, so the COMMON is not used and you get the behavior you expect:

int a;
int b;
static int c;

$ size test.o
   text    data     bss     dec     hex filename
     91       0       4      95      5f test.o

Initializing to 0 will get a similar result.

int a;
int b;
int c = 0;

$ size test.o
   text    data     bss     dec     hex filename
     91       0       4      95      5f test.o

However initializing to anything other than 0 will move that variable to data:

int a;
int b = 1;
int c = 0;

$ size test.o
   text    data     bss     dec     hex filename
     91       4       4      99      63 test.o

