Loading Executable or Executing a Library

How am I able to run a shared library as an executable from the terminal?

Okay I think I understand.

So basically a shared library is in fact an executable. And because musl is a libc implementation, it defines the _start() function, which is the real entry point of the program. The _start() function then calls main().

The developers of musl made it so that if you invoke their libc.so as ld or ldd they would detect that and act accordingly.

They can detect that because the startup code has access to argc and argv (which it later passes on to main()), so it can check whether argv[0] is "ld" or "ldd".
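To make the argv[0] trick concrete, here is a minimal sketch of that kind of dispatch. This is not musl's actual code; the enum and the function name mode_from_argv0 are made up for illustration, and it only distinguishes "ldd" from everything else.

```c
#include <string.h>

/* Hypothetical modes a multi-purpose binary might run in. */
enum mode { MODE_LOADER, MODE_LDD };

/* Decide the mode from the name the program was invoked under.
 * Only the last path component of argv[0] is compared. */
enum mode mode_from_argv0(const char *argv0)
{
    const char *slash = strrchr(argv0, '/');
    const char *base = slash ? slash + 1 : argv0;

    return strcmp(base, "ldd") == 0 ? MODE_LDD : MODE_LOADER;
}
```

A main() (or the code behind _start()) would call mode_from_argv0(argv[0]) once at startup and branch on the result. That is why a symlink named ldd pointing at the same file can behave like a different tool.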

And thanks to @that other guy and @David C. Rankin for linking this. The answer there says that you can even have a shared library that defines main().

So I tried that myself.

Here's _start.c

void
_start(void)
{
    /* exit(0) via a raw syscall: rax = 60 (SYS_exit on x86_64), rdi = exit status */
    asm("mov $60,%rax; mov $0,%rdi; syscall");
}

I compiled that on an x86_64 ubuntu linux machine with gcc 7.4.0 like so:

$ gcc -shared -nostdlib _start.c -o libwow.so

And then I called it:

$ ./libwow.so
$

It didn't do anything of course, but it did run.

It's a crazy world we live in :D

EDIT:

On a crazier note, one can load executables as dynamic libraries using dlopen(3). Look at this answer to learn more.

Conclusion:

Shared libraries and executables are pretty much the same thing (ELF binaries).

Except that shared libs have no fixed entry-point address while executables do.

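Also, shared libs are position-independent (PIC), while executables are not necessarily; many distributions now build position-independent executables (PIE) by default, though.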

And I guess there are a few other minor differences :p

There are executables (traitors :p) living among us, like gawk and ntfsck, that were really shared libs all along without us knowing.
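A quick way to check a given binary is to read the e_type field of its ELF header: ET_DYN covers both shared libraries and PIE executables, while ET_EXEC marks a fixed-address executable. Here is a minimal sketch, assuming a 64-bit Linux system with <elf.h>; the helper names elf_type_name and read_elf_header are made up for illustration.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Map an ELF e_type value to a short description. */
const char *elf_type_name(unsigned e_type)
{
    switch (e_type) {
    case ET_REL:  return "ET_REL (relocatable object)";
    case ET_EXEC: return "ET_EXEC (fixed-address executable)";
    case ET_DYN:  return "ET_DYN (shared object or PIE)";
    case ET_CORE: return "ET_CORE (core dump)";
    default:      return "unknown";
    }
}

/* Read the header of a 64-bit ELF file at `path`.
 * Returns 0 on success, -1 if the file cannot be read or is not ELF64. */
int read_elf_header(const char *path, Elf64_Ehdr *out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    size_t n = fread(out, 1, sizeof *out, f);
    fclose(f);

    if (n != sizeof *out)
        return -1;
    if (memcmp(out->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (out->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    return 0;
}
```

After a successful read_elf_header(path, &hdr), printing elf_type_name(hdr.e_type) and hdr.e_entry shows the file type and entry-point address. The readelf -h command reports the same fields.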

Look at this Question/Answers for more information.

Loading an executable into current process's memory, then executing it

Your code is a good start, but you are missing a few things.

The first is, as you mentioned, resolving imports. What you say looks right, but I've never done this manually like you, so I don't know the details. A program can work without its imports being resolved, but only if it never uses an imported function. Here your code fails because it tries to access an import that hasn't been resolved; the function pointer contains 0x4242 instead of the resolved address.

The second thing is relocation. Put simply, PE executables can be loaded at any base address even though their code is not position-independent. To make this work, the file contains a relocation table that is used to adjust every value that depends on the image location. This step is optional if you can load at the preferred address (pINH->OptionalHeader.ImageBase), but if you apply the relocation table you can load your image anywhere, so you can pass NULL as the first parameter of VirtualAlloc (and remove the related checks).
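As a rough illustration of what applying the relocation table involves, here is a sketch in portable C. It assumes a little-endian host, that the image has already been copied section by section into image, and that reloc points at the raw base relocation directory. The function name and parameters are hypothetical, and only the two common relocation types are handled. The numeric type values follow the PE format (they correspond to IMAGE_REL_BASED_ABSOLUTE, IMAGE_REL_BASED_HIGHLOW and IMAGE_REL_BASED_DIR64 in winnt.h).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Relocation types from the PE format. */
enum { REL_ABSOLUTE = 0, REL_HIGHLOW = 3, REL_DIR64 = 10 };

/* Apply base relocations to an image loaded at a non-preferred address.
 * delta = actual base address - OptionalHeader.ImageBase.
 * Returns 0 on success, -1 on malformed or unsupported input. */
int apply_base_relocs(uint8_t *image, size_t image_size,
                      const uint8_t *reloc, size_t reloc_size,
                      int64_t delta)
{
    size_t pos = 0;

    /* The directory is a sequence of blocks: an 8-byte header
     * (page RVA, block size) followed by 16-bit entries. */
    while (pos + 8 <= reloc_size) {
        uint32_t page_rva, block_size;
        memcpy(&page_rva, reloc + pos, 4);
        memcpy(&block_size, reloc + pos + 4, 4);
        if (block_size < 8 || block_size > reloc_size - pos)
            return -1;

        size_t count = (block_size - 8) / 2;
        for (size_t i = 0; i < count; i++) {
            uint16_t entry;
            memcpy(&entry, reloc + pos + 8 + 2 * i, 2);

            unsigned type = entry >> 12;            /* high 4 bits */
            size_t off = (size_t)page_rva + (entry & 0x0FFF);

            if (type == REL_ABSOLUTE) {
                continue;                           /* padding entry */
            } else if (type == REL_DIR64) {
                uint64_t v;
                if (off + 8 > image_size)
                    return -1;
                memcpy(&v, image + off, 8);
                v += (uint64_t)delta;
                memcpy(image + off, &v, 8);
            } else if (type == REL_HIGHLOW) {
                uint32_t v;
                if (off + 4 > image_size)
                    return -1;
                memcpy(&v, image + off, 4);
                v += (uint32_t)delta;
                memcpy(image + off, &v, 4);
            } else {
                return -1;                          /* not handled here */
            }
        }
        pos += block_size;
    }
    return 0;
}
```

In a real loader, reloc and reloc_size would come from the base relocation entry of the optional header's data directory, and delta would be computed from the address VirtualAlloc actually returned.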

You can find more info on import resolving and relocation in this article, if you haven't found it already. There are plenty of other resources out there as well.

Also, as mentioned in marom's answer, your program is basically what LoadLibrary does, so in a more practical context you would use that function instead.

Using dlopen() on an executable

You can't open executables as libraries. The entry point of an executable will attempt to re-initialize the C library, and take over the brk pointer. This will corrupt your malloc heap. Additionally, the executable is likely to be mapped at a fixed address with no relocations, and if this address overlaps with anything already loaded, it's not possible to map it for that reason as well.

You need to refactor the other program into a library, or add an RPC interface to the other program.

Note that this does not necessarily apply to PIE executables. However, unless the executable is specifically designed to be dlopen()ed, this is unsafe, as main() will not be run, and any initialization done in main() therefore will not occur.
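For experimenting with this, a small wrapper around dlopen() and dlsym() is enough. The sketch below assumes a POSIX system with <dlfcn.h>; the function name load_symbol is made up for illustration. Note that newer glibc releases refuse to dlopen() PIE executables at all, in which case the dlopen() call simply fails with an error message.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Try to map the ELF file at `path` into this process and look up `symbol`.
 * Returns the symbol's address, or NULL after printing dlerror() on failure. */
void *load_symbol(const char *path, const char *symbol)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }

    dlerror();                          /* clear any stale error state */
    void *addr = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err) {
        fprintf(stderr, "dlsym: %s\n", err);
        dlclose(handle);
        return NULL;
    }

    /* The handle is intentionally kept open so that `addr` stays valid. */
    return addr;
}
```

On older glibc versions this needs to be linked with -ldl. Even when the call succeeds on an executable, the caveats above still hold: nothing has run main(), so any state it would have set up is missing.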

Loading time for shared libraries vs static libraries

Linking (resolving references) is not free. With static linking, the resolution is done once and for all when the binary is generated. With dynamic linking, it has to be done every time the binary is loaded. Not to mention that code compiled to run in a shared library can be less efficient than code compiled to be linked statically. The exact cost depends on the architecture and on the system's implementation of dynamic linking.

The cost of making a library dynamic can be relatively high for the 32-bit x86 instruction set: in the ELF binary format, one of the already scarce registers has to be sacrificed to make dynamically linked code relocatable. The older a.out format placed each shared library at a fixed place, but that didn't scale. I believe that Mac OS X had an intermediate system in which dynamic libraries were placed at pre-determined locations in the address space, with conflicts resolved at the scale of the individual computer (the lengthy "Optimizing system performance" phase after installing new software). In a way, this system (called prebinding) allows you to have your cake and eat it too. I do not know if prebinding is still necessary now that Apple has pretty much switched to the amd64 architecture.

Also, on a modern OS both statically and dynamically linked code is only loaded (paged in) from disk if it is used, but this is quite orthogonal to your question.