Why Does Int Main() {} Compile

Why does int main() {} compile?

3.6.1 Main function
....
2 An implementation shall not predefine the main function. This function shall not be overloaded. It shall have a return type of type int, but otherwise its type is implementation-defined. All implementations shall allow both of the following definitions of main:
int main() { /* ... */ }
and
int main(int argc, char* argv[]) {
/* ... */
}
.... and it continues to add ...
5 A return statement in main has the effect of leaving the main function (destroying any objects with automatic storage duration) and calling exit with the return value as the argument. If control reaches the end of main without encountering a return statement, the effect is that of executing return 0;

While attempting to find an online copy of the C++ standard so I could quote this passage, I found a blog post that quotes all the right bits better than I could.

Does int main() need a declaration on C++?

A definition of a function is also a declaration of a function.

The purpose of declaring a function is to make it known to the compiler. Declaring a function without defining it allows the function to be used in places where it is inconvenient to define it. For example:

If a function is used in a source file (A) other than the one it is defined in (B), we need to declare it in A (usually via a header that A includes, such as B.h).
If two or more functions may call each other, then we cannot define all those functions before the others—one of them has to be first. So declarations can be provided first, with definitions coming afterward.
Many people prefer to put “higher level” routines earlier in a source file and subroutines later. Since those “higher level” routines call various subroutines, the subroutines must be declared earlier.
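
As a minimal sketch of the mutual-recursion case above (the names is_even, is_odd, and check_parity are made up for illustration), two functions that call each other need one declaration up front so that the first definition can call the second:

```cpp
#include <cassert>

// Declaration only: makes is_odd known to the compiler before it is defined.
bool is_odd(unsigned n);

// This definition is also a declaration of is_even. It may call is_odd
// because the declaration above is already visible.
bool is_even(unsigned n) {
    return n == 0 ? true : is_odd(n - 1);
}

// Definition of is_odd, coming after its first use.
bool is_odd(unsigned n) {
    return n == 0 ? false : is_even(n - 1);
}

// Small self-check that any caller can run.
void check_parity() {
    assert(is_even(10));
    assert(is_odd(7));
}
```

Removing the first declaration of is_odd makes the definition of is_even fail to compile, which is exactly the situation a declaration exists to solve.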

In C++, a user program never calls main, so it never needs a declaration before the definition. (Note that you could provide one if you wished. There is nothing special about a declaration of main in this regard.) In C, a program can call main. In that case, it does require that a declaration be visible before the call.

Note that main does need to be known to the code that calls it. This is special code in what is typically called the C++ runtime startup code. The linker includes that code for you automatically when you are linking a C++ program with the appropriate linker options. Whatever language that code is written in, it has whatever declaration of main it needs in order to call it properly.

Why does this program, which defines main as a function pointer, fail?

In the code that runs before main, there is something like:

extern "C" int main(int argc, char **argv);

The problem with your code is that a function pointer called main is not the same as a function (as opposed to Haskell, where a function and a function pointer are pretty much interchangeable, at least with my 0.1% knowledge of Haskell).

The compiler will happily accept:

int (*func)()  = ...;

int x = func();

as a valid call through the function pointer func. However, when the compiler generates code to call func, it does so differently from a direct call [the standard doesn't say how this should be done, and it varies between processor architectures, but in practice it loads the value stored in the pointer variable and then calls that address].

When you have:

int func() { ... }

int x = func();

the call to func just refers to the address of func itself, and calls that.
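
To make the contrast concrete, here is a small C++ sketch (stuff, func, and the helper names are illustrative, not taken from the question's code). Both calls return the same value, but func is an object that stores an address, whereas stuff is the code itself:

```cpp
#include <cassert>

// Hypothetical function standing in for "real code".
int stuff() { return 42; }

// A variable (an object living in data memory) that holds the address of stuff.
int (*func)() = stuff;

// Direct call: the call targets the address of stuff itself.
int call_direct() { return stuff(); }

// Indirect call: the value stored in func is read first, then that address is called.
int call_indirect() { return func(); }

// Small self-check that any caller can run.
void check_calls() {
    assert(call_direct() == 42);
    assert(call_indirect() == 42);
    assert(func == &stuff);  // the variable stores the address of stuff
}
```

Code that treats the name func as if it were a function would jump to the storage of the variable instead of to the address stored in it, which is the mistake described for main.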

So, assuming your code actually does compile, the startup code before main will call the address of your variable main rather than reading the value stored in main and then calling that. On modern systems, this will cause a segfault because main lives in the data segment, which is not executable. On older OSes it would most likely crash because main does not contain real code (but it may execute a few instructions before it falls over in this case - in the dim and distant past, I've accidentally run all sorts of "rubbish" with rather difficult to discover causes...)

But since main is a "special" function, it's also possible that the compiler says "No, you can't do this".

It used to work, many years ago to do this:

char main[] = { 0xXX, 0xYY, 0xZZ ... };

but again, this doesn't work in a modern OS, because main ends up in the data section, and it's not executable in that section.

Edit: After actually testing the posted code, at least on my 64-bit Linux, the code actually compiles, but crashes, unsurprisingly, when it tries to execute main.

Running in GDB gives this:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000600950 in main ()
(gdb) bt
#0  0x0000000000600950 in main ()
(gdb) disass
Dump of assembler code for function main:
=> 0x0000000000600950 <+0>: and    %al,0x40(%rip)        # 0x600996
   0x0000000000600956 <+6>: add    %al,(%rax)
End of assembler dump.
(gdb) disass stuff
Dump of assembler code for function stuff():
   0x0000000000400520 <+0>: push   %rbp
   0x0000000000400521 <+1>: mov    %rsp,%rbp
   0x0000000000400524 <+4>: sub    $0x10,%rsp
   0x0000000000400528 <+8>: lea    0x400648,%rdi
   0x0000000000400530 <+16>:    callq  0x400410 <puts@plt>
   0x0000000000400535 <+21>:    mov    $0x0,%ecx
   0x000000000040053a <+26>:    mov    %eax,-0x4(%rbp)
   0x000000000040053d <+29>:    mov    %ecx,%eax
   0x000000000040053f <+31>:    add    $0x10,%rsp
   0x0000000000400543 <+35>:    pop    %rbp
   0x0000000000400544 <+36>:    retq   
End of assembler dump.
(gdb) x main
0x400520 <stuff()>: 0xe5894855
(gdb) p main
$1 = (int (*)(void)) 0x400520 <stuff()>
(gdb)

So, we can see that main is not really a function, it's a variable which contains a pointer to stuff. The startup code calls main as if it was a function, but it fails to execute the instructions there (because it's data, and data has the "no execute" bit set - not that you can see that here, but I know it works that way).

Edit2:

Inspecting dmesg shows:

a.out[7035]: segfault at 600950 ip 0000000000600950 sp 00007fff4e7cb928 error 15 in a.out[600000+1000]

In other words, the segmentation fault happens immediately with the execution of main - because it's not executable.

Edit3:

Ok, so it's slightly more convoluted than that (at least in my C runtime library), as the code that calls main is a function that takes the pointer to main as an argument, and calls it through a pointer. This however doesn't change the fact that when the compiler builds the code, it produces a level of indirection less than it needs, and tries to execute the variable called main rather than the function that the variable is pointing at.

Listing __libc_start_main in GDB:

87  STATIC int
88  LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
89           int argc, char *__unbounded *__unbounded ubp_av,
90  #ifdef LIBC_START_MAIN_AUXVEC_ARG
91           ElfW(auxv_t) *__unbounded auxvec,
92  #endif

At this point, printing main gives us a function pointer that points at 0x600950, which is the variable called main (the same address I disassembled above):

(gdb) p main
$1 = (int (*)(int, char **, char **)) 0x600950 <main>

Note that this is a different variable main than the one called main in the source posted in the question.
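
The shape of that call can be modelled in portable C++. This is only a simplified sketch: model_start_main and sample_entry are hypothetical names, and the real __libc_start_main does far more and has a different signature. It shows that the start-up routine simply calls whatever address it is handed, so if that address belongs to a variable rather than to code, nothing in the callee can notice:

```cpp
#include <cassert>

// Type of the entry point that the model start-up routine expects.
using EntryPoint = int (*)(int, char**);

// Hypothetical stand-in for the runtime's start-up routine.
// Real start-up code would also set up the environment and run
// initializers before the call, and call exit with the status afterwards.
int model_start_main(EntryPoint entry, int argc, char** argv) {
    int status = entry(argc, argv);  // jump to the address that was passed in
    return status;
}

// Hypothetical entry function playing the role of main.
int sample_entry(int argc, char** /*argv*/) { return argc; }

// Small self-check that any caller can run.
void check_model() {
    char arg0[] = "prog";
    char* argv[] = {arg0, nullptr};
    assert(model_start_main(sample_entry, 1, argv) == 1);
}
```

In the crashing program, the address handed to the start-up routine is that of the symbol main, which there names a pointer variable, so the equivalent of entry(argc, argv) lands in data.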

What's wrong with int main()?

Because the definition

int main() { /* ... */ }

does not include a prototype; it doesn't specify the number or type(s) of the parameters.

This:

int main(void) { /* ... */ }

does include a prototype.

With the empty parentheses, you're saying that main takes a fixed but unspecified number and type(s) of arguments. With (void), you're explicitly saying that it takes no arguments.

With the former, a call like:

main(42);

will not necessarily be diagnosed.

This goes back to the pre-ANSI days before prototypes were introduced to the language, and most functions were defined with empty parentheses. Back then, it was perfectly legal to write:

int foo();

int foo(n)
int n;
{
    /* ... */
}

...

foo(42);

When prototypes were added to the language (borrowed from C++), it was necessary to keep the old meaning of empty parentheses; the "new" (this was 1989) syntax (void) was added so you could explicitly say that a function takes no arguments.

(C++ has different rules; it doesn't allow old-style non-prototyped functions, and empty parentheses mean that a function takes no arguments. C++ permits the (void) syntax for compatibility with C, but it's not generally recommended.)
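
The C++ rule can be checked at compile time. In this sketch (the function and variable names are illustrative), the two spellings produce the identical function type:

```cpp
#include <cassert>
#include <type_traits>

// In C++, empty parentheses mean "takes no arguments".
int takes_nothing() { return 0; }

// The C-compatible spelling declares exactly the same kind of function.
int takes_nothing_void(void) { return 0; }

// Both function types are one and the same type in C++.
constexpr bool empty_parens_means_void = std::is_same<int(), int(void)>::value;
static_assert(empty_parens_means_void, "int() and int(void) are the same type in C++");

// A call such as takes_nothing(42) is rejected by a C++ compiler,
// unlike a call to a non-prototyped function in C.

// Small self-check that any caller can run.
void check_empty_parens() {
    assert(takes_nothing() == takes_nothing_void());
}
```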

Best practice is to use (void), because it's more explicit. It's not entirely clear that the int main() form is even valid, but I've never seen a compiler that doesn't accept it.

Why is main necessary in a program

Why? Because the standard says so (mostly).

The main function is required for hosted C environments (freestanding environments are allowed to start up any way they like).

If you're developing a library, you don't need a main for the library itself but you won't be able to turn it into an executable without one (other than by using non-portable trickery). And, at a bare minimum, you should have one for the test suite anyway.

In other words, your library should have a large test suite which is controlled from a main function (most likely in a separate source file or files) so that you can test any new work and regression-test to ensure it hasn't stuffed up the old work.

Why does declaring main as an array compile?

It's because C allows for a "non-hosted" or freestanding environment, which doesn't require the main function. This means that the name main is freed for other uses, which is why the language as such allows such declarations. Most compilers are designed to support both environments (the difference is mostly in how linking is done), and therefore they don't disallow constructs that would be illegal in a hosted environment.

The section of the standard you refer to applies to the hosted environment; the corresponding text for the freestanding environment is:

in a freestanding environment (in which C program execution may take place without any
benefit of an operating system), the name and type of the function called at program
startup are implementation-defined. Any library facilities available to a freestanding
program, other than the minimal set required by clause 4, are implementation-defined.

If you then link it as usual, it will go wrong, since the linker normally has little knowledge about the nature of symbols (what type a symbol has, or even whether it's a function or a variable). In this case the linker will happily resolve calls to main to the variable named main. If the symbol is not found, the result is a link error.

If you link it as usual, you are effectively using the compiler in hosted mode, and then not defining main as you're supposed to means undefined behavior, as per Annex J.2:

the behavior is undefined in the following circumstances:

...

A program in a hosted environment does not define a function named main using one of the specified forms (5.1.2.2.1).

The purpose of the freestanding possibility is to be able to use C in environments where (for example) standard libraries or CRT initialization are not available. This means that the code that runs before main is called (that's the CRT initialization, which sets up the C runtime) might not be provided, and you would be expected to provide it yourself (and you may decide to have a main or may decide not to).

What should main() return in C and C++?

The return value for main indicates how the program exited. Normal exit is represented by a 0 return value from main. Abnormal exit is signaled by a non-zero return, but there is no standard for how non-zero codes are interpreted. As noted by others, void main() is prohibited by the C++ standard and should not be used. The valid C++ main signatures are:

int main()

and

int main(int argc, char* argv[])

which is equivalent to

int main(int argc, char** argv)

It is also worth noting that in C++, int main() can be left without a return-statement, at which point it defaults to returning 0. This is also true with a C99 program. Whether return 0; should be omitted or not is open to debate. The range of valid C program main signatures is much greater.

Efficiency is not an issue with the main function. It can only be entered and left once (marking the program's start and termination) according to the C++ standard. For C, re-entering main() is allowed, but should be avoided.
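
The equivalence of the two argv spellings can be checked by the compiler, because an array parameter is adjusted to a pointer parameter. A small sketch, with arg_count as a hypothetical stand-in for main (a C++ program may not call main or take its address):

```cpp
#include <cassert>
#include <type_traits>

// The two parameter spellings from the valid signatures of main.
using ArrayForm = int(int argc, char* argv[]);
using PointerForm = int(int argc, char** argv);

// char* argv[] in a parameter list is adjusted to char** argv.
constexpr bool same_main_type = std::is_same<ArrayForm, PointerForm>::value;
static_assert(same_main_type, "both spellings name the same function type");

// Hypothetical stand-in for main, written with the array spelling.
int arg_count(int argc, char* argv[]) {
    int n = 0;
    while (n < argc && argv[n] != nullptr) ++n;
    return n;
}

// A pointer declared with the pointer spelling accepts it without a cast.
int (*as_pointer_form)(int, char**) = arg_count;

// Small self-check that any caller can run.
void check_forms() {
    char prog[] = "prog";
    char* argv[] = {prog, nullptr};
    assert(as_pointer_form(1, argv) == 1);
}
```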

Why can C main function be coded with or without parameters?

Making it work has to do with the binary format of the executable and the OS's loader. The linker doesn't care (well it cares a little: it needs to mark the entry point) and the only caller routine is the loader.

The loader for any system must know how to bring supported binary format into memory and branch into the entry point. This varies slightly by system and binary format.

If you have a question about a particular OS/binary format, you may want to clarify.

use of std::less in std::map does not compile

In the first program, you have a vexing parse. If the compiler can parse a declaration as either a variable or a function, it will choose to parse it as a function.

myMap can be parsed as a function declaration.

It returns a std::map<int, int, std::less<int>>.

It takes an argument of type std::less<int>(), which is itself a function type that returns a std::less<int> and takes no arguments. Note that you can't actually have a function type as an argument; the type is actually a pointer to a function that takes no arguments and returns a std::less<int>.

In the second program, replacing () with {} resolves the ambiguity. Now myMap can no longer be a function declaration, and so it instead declares a variable of type std::map<int, int, std::less<int>>.
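
A sketch of both declarations side by side, assuming declarations of the form the answer describes (the original programs are not shown here, so the names vexed_map and fixed_map are illustrative):

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <type_traits>

using Map = std::map<int, int, std::less<int>>;

// Vexing parse: this declares a FUNCTION named vexed_map that returns Map and
// takes one unnamed parameter of type "pointer to function returning std::less<int>".
Map vexed_map(std::less<int>());

// Braces cannot form a parameter list, so this declares a VARIABLE.
Map fixed_map{std::less<int>()};

static_assert(std::is_function<decltype(vexed_map)>::value,
              "parentheses: parsed as a function declaration");
static_assert(std::is_same<decltype(vexed_map), Map(std::less<int> (*)())>::value,
              "the parameter is adjusted to a pointer to function");
static_assert(!std::is_function<decltype(fixed_map)>::value,
              "braces: an actual map object");

// Only the brace-initialized name can be used as a map.
void check_fixed_map() {
    fixed_map[1] = 10;
    assert(fixed_map.size() == 1);
}
```

Some compilers warn about the first declaration, since it is rarely what the author intended.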
