Is main() Really the Start of a C++ Program?

Is main() really the start of a C++ program?

No, C++ does a lot of things to "set the environment" prior to the call of main; however, main is the official start of the "user specified" part of the C++ program.

Some of the environment setup is not controllable (like the initial code to set up std::cout); however, some of it is controllable, such as the initializers of static and global variables. Note that since you don't have full control prior to main, you don't have full control over the order in which those static objects get initialized; in particular, the relative order across different translation units is unspecified.

Once main has been entered, your code is conceptually "fully in control" of the program, in the sense that you specify both the instructions to be performed and the order in which to perform them. Multi-threading can rearrange the order of execution, but you're still in control with C++, because you are the one who specified that sections of code may execute (possibly) out of order.

Why does the main function run first in C/C++?

Because that's what the Standard defines the language to use (C++ quoted here):

[basic.start.main]

A program shall contain a global function called main. Executing a program starts a main thread of execution (...) in which the main function is invoked (...)

So the compiler has to produce the binary in a way that calls main when the program is started by the operating system, or, in case of freestanding environments, when it's loaded.

Technically speaking, it doesn't have to be the first call in the resulting assembly. The compiler can insert some additional startup code (such as variable initialization), which can itself be grouped into functions. This is usually not a concern for a C++ program developer, but it becomes quite important on embedded systems, where you need or want to be aware of almost every instruction executed.

Is a main() required for a C program?

No, the ISO C standard states that a main function is only required for a hosted environment (such as one with an underlying OS).

For a freestanding environment like an embedded system (or an operating system itself), it's implementation defined. From C99 5.1.2:

Two execution environments are defined: freestanding and hosted. In both cases, program startup occurs when a designated C function is called by the execution environment.

In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined.

As to how Linux itself starts, the architecture-independent start point for the Linux kernel is start_kernel, though a complete picture of the boot process begins earlier, in architecture-specific code.

Does the program execution always start from main in C?

The '#pragma' command is specified in the ANSI standard to have an arbitrary implementation-defined effect. In the GNU C preprocessor, '#pragma' first attempts to run the game 'rogue'; if that fails, it tries to run the game 'hack'; if that fails, it tries to run GNU Emacs displaying the Tower of Hanoi; if that fails, it reports a fatal error. In any case, preprocessing does not continue.

-- Richard M. Stallman, The GNU C Preprocessor, version 1.34

Program execution starts at the startup code, or "runtime". This is usually some assembler routine called _start or something similar, located (on Unix machines) in a file crt0.o that comes with the compiler package. It does the setup required to run a C executable, such as setting up stdin, stdout and stderr and the vectors used by atexit(). For C++ it also includes initializing global objects, i.e. running their constructors. Only then does control jump to main().

As the quote at the beginning of my answer expresses so eloquently, what #pragma does is completely up to your compiler. Check its documentation. (I'd guess your pragma startup - which should be prepended with a # by the way - tells the runtime to call fun() first...)

Why is main necessary in a program?

Why? Because the standard says so (mostly).

The main function is required for hosted C environments (freestanding environments are allowed to start up any way they like).

If you're developing a library, you don't need a main for the library itself but you won't be able to turn it into an executable without one (other than by using non-portable trickery). And, at a bare minimum, you should have one for the test suite anyway.

In other words, your library should have a large test suite which is controlled from a main function (most likely in a separate source file or files) so that you can test any new work and regression-test to ensure it hasn't stuffed up the old work.

Which mechanism knows the entry point of a program is main()?

The application does not know that main() is the entry point. Firstly, we assume C not C++ here despite your picture.

For C, the "C" entry point is main(). But you can't just start execution there, because C makes assumptions (more than that, rules): for example, .data needs to be initialized and .bss zeroed.

unsigned x = 1;
unsigned int y;

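We expect that x == 1 by the time main() is reached. Most folks also assume that y == 0 at that point, and the C standard does specify it: objects with static storage duration that have no initializer are zero-initialized. On a hosted system you can rely on that; with a hand-written bootstrap it only holds if the bootstrap actually zeroes .bss.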

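We also need a stack pointer and need to deal with argc/argv. For C++, other work has to be done as well (constructors of global objects, for instance), and even for C there can be more, depending on the platform and library.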

The APPLICATION does not generally know any of this. You are likely working with a C library, and that library is (or should be) responsible for the bootstrap code that precedes main() as well as for a linker script, since the bootstrap and the linker script are intimately related. One could argue, based on some implementations, that the C library is separable from the toolchain: with GNU you can choose from different C libraries, and those have different bootstraps and linker scripts. But I am sure there are many toolchains where the two are tightly coupled. There is also a relationship between the library and the operating system, as so many C library calls end up in one or more system calls.

Suppose you design an operating system. If it supports runtime-loadable applications, part of the design is the file format that the operating system's loader supports, the features the loader wants to offer, and how those overlap with the file format. It is not uncommon for the OS to define its own file format, but with ELF and others (no doubt not created accidentally or independently) a new OS has the opportunity to use an existing container. The OS design and its loader determine a lot of things, and the C library that mates up with all of that has to follow all of those rules. If the library is integrated into the compiler, then the compiler has to play along as well.

It is not the application that knows; this is part of the system design, and the application is simply subject to all of that. When you compile on that platform for that platform, all of these rules and relationships are in play. You are putting in a very small piece of the puzzle; the rest is already in place: which file formats are supported, what information each format requires, and what rules the compiler/library solution must follow. The system design dictates whether .data is initialized and .bss is zeroed by the loader or by the application, and by application I mean the bootstrap, not the user's portion of the program. You can't bootstrap C in C, because that C would need a bootstrap, and if that bootstrap were in C it would need a bootstrap too, and so on.

int main ( void )
{
    return 0;
}

There are a lot of things going on in the background when you compile that program, not just the few instructions that might be needed to implement that code.

Compile that program on Windows, Linux and macOS, on different versions of each, with different compilers or C libraries for each, and with different versions of those. Even for the same target ISA, even on the same computer, you should expect that only some percentage of the combinations MIGHT choose the same few instructions for the function itself. What is wrapped around it can be expected to be similar, perhaps, but not the same. There would be no reason to be surprised if some of the implementations are very different from each other.

And this is all for full-blown operating systems that load programs into RAM and run them; for embedded targets, don't be surprised if the differences are even bigger. Within a full-blown OS you would expect to see an MMU, and the application gets a perhaps zero-based address space for .text, .data and .bss at a minimum. All the solutions might therefore have a favorite place, or a favorite number of sections in the same order in the binary, but the size of each may be specific to the implementation. The order and size might vary by C library version, compiler version, and so on.

The magic is in the system design, and that is not magic, that is design. main() cannot be entered directly while still having the various parts of the language work, such as .data and .bss initialization. The stack pointer can be set up before entry, but how and where .data and .bss are placed is application specific, so it can't be handled by a simple branch to main from the OS.

The linker for your toolchain can be told in various ways where the entry point is: it could be assumed or dictated for that tool and target, given as a command-line option or a linker-script option, marked by some special symbol you put on a label, or whatever else the designers choose. main is assumed to be the C entry point, although that doesn't mean it actually is the first C code to run; there might be some C code that precedes it. In general there is some amount of asm (you can't bootstrap C with C) and then one or more steps to main().

What happens before main in C++?

A lot depends on the execution environment. A great deal of work may be done by the operating system loader before the C run-time start up that is specifically part of your executable runs. This operating system dependent part of setting up the execution environment is common to all native (machine language) executables, regardless of source implementation language.

What part is played by the OS and what is performed by code that is part of your executable differs depending on the execution environment. The OS loader (in a non-standalone system) is responsible for loading the code into memory, which may involve loading and linking dynamically linked libraries (DLLs or shared libraries, depending on the OS nomenclature). Regardless of whether it is an OS or a C run-time responsibility, the following normally occur:

  • Establishment of a stack
  • Zero initialisation of static data that has no explicit initialiser
  • Initialisation of explicitly initialised static data
  • C library initialisation (typically stdio and heap-management require some initialisation)
  • For C++, calls to static constructors
  • Creation of the stack frame for main() (argv, argc parameters)

In GCC and some other compilers, for example, the part that is performed by your program rather than by the OS before your own code starts is performed by a separately linked module called crt0.o. This is normally written in assembler and is automatically linked by default.

For further examples and discussion see:

  • Linux x86 Program Start Up
  • Typical stand-alone embedded system start-up

How does the main() method work in C?

Some of the features of the C language started out as hacks which just happened to work.

Multiple signatures for main, as well as variable-length argument lists, are among those features.

Programmers noticed that they can pass extra arguments to a function, and nothing bad happens with their given compiler.

This is the case if the calling conventions are such that:

  1. The calling function cleans up the arguments.
  2. The leftmost arguments are closer to the top of the stack, or to the base of the stack frame, so that spurious arguments do not invalidate the addressing.

One set of calling conventions which obeys these rules is stack-based parameter passing whereby the caller pops the arguments, and they are pushed right to left:

;; pseudo-assembly-language
;; main(argc, argv, envp); call

push envp ;; rightmost argument
push argv ;;
push argc ;; leftmost argument ends up on top of stack

call main

pop ;; caller cleans up
pop
pop

In compilers where this type of calling convention is the case, nothing special needs to be done to support the two kinds of main, or even additional kinds. main can be a function of no arguments, in which case it is oblivious to the items that were pushed onto the stack. If it's a function of two arguments, then it finds argc and argv as the two topmost stack items. If it's a platform-specific three-argument variant with an environment pointer (a common extension), that will work too: it will find that third argument as the third element from the top of the stack.

And so a fixed call works for all cases, allowing a single, fixed start-up module to be linked to the program. That module could be written in C, as a function resembling this:

#include <stdlib.h> /* for exit() */

/* I'm adding envp to show that even a popular platform-specific variant
   can be handled. */
extern int main(int argc, char **argv, char **envp);

void __start(void)
{
    /* This is the real startup function for the executable.
       It performs a bunch of library initialization. */

    /* ... */

    /* And then: */
    exit(main(argc_from_somewhere, argv_from_somewhere, envp_from_somewhere));
}

In other words, this start module just calls a three-argument main, always. If main takes only int, char **, or takes no arguments at all, it still happens to work fine, due to the calling conventions.

If you were to do this kind of thing in your program, it would be nonportable and considered undefined behavior by ISO C: declaring and calling a function in one manner, and defining it in another. But a compiler's startup trick does not have to be portable; it is not guided by the rules for portable programs.

But suppose that the calling conventions are such that it cannot work this way. In that case, the compiler has to treat main specially. When it notices that it's compiling the main function, it can generate code which is compatible with, say, a three argument call.

That is to say, you write this:

int main(void)
{
/* ... */
}

But when the compiler sees it, it essentially performs a code transformation so that the function which it compiles looks more like this:

int main(int __argc_ignore, char **__argv_ignore, char **__envp_ignore)
{
/* ... */
}

except that names like __argc_ignore don't literally exist. No such names are introduced into your scope, and there won't be any warning about unused arguments.
The code transformation causes the compiler to emit code with the correct linkage, which knows that it has to clean up three arguments.

Another implementation strategy is for the compiler or perhaps linker to custom-generate the __start function (or whatever it is called), or at least select one from several pre-compiled alternatives. Information could be stored in the object file about which of the supported forms of main is being used. The linker can look at this info, and select the correct version of the start-up module which contains a call to main which is compatible with the program's definition. C implementations usually have only a small number of supported forms of main so this approach is feasible.

Compilers for the C99 language always have to treat main specially, to some extent, to support the hack that if the function terminates without a return statement, the behavior is as if return 0 were executed. This, again, can be handled by a code transformation. The compiler notices that a function called main is being compiled. Then it checks whether the end of the body is potentially reachable. If so, it inserts a return 0; statement there.


