Create Position-Independent Object File from LLVM Bitcode


You need to set up the relocation model, something like llc -relocation-model=pic. Do not use PIE, because that is for executables, not for libraries. Also, -cppgen does not make sense here; it is for the C++ backend only.
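
For illustration, a minimal invocation might look like this (input.bc is a placeholder for your bitcode file; exact flag spellings can vary slightly between LLVM versions):

    llc -relocation-model=pic -filetype=obj input.bc -o input.o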

LLVM: Implement linking of the object code

As for the "no clang necessary" part: LLVM has a linker called LLD that is part of the LLVM project. Depending on how you installed LLVM, it should already be part of your distribution.

Check the documentation for your installed version of LLD for usage details; from there you can define your make or CMake recipes.

As for your core question, here is the general make flow I go through with my own language (a sketch of the corresponding commands follows the list):

  1. Compile source -> output.ll (LLVM assembly)
  2. Optimize the LLVM assembly -> output.oll (using opt)
  3. Generate target assembly -> output.s (using llc)
  4. Assemble to an object file -> output.o (using as)
  5. Link (I use clang, but this could be swapped for lld)
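
A rough sketch of those steps as shell commands (mylangc stands in for whatever your front end is called; file names and exact flags are illustrative):

    mylangc main.src -o output.ll                        # 1. front end emits LLVM assembly
    opt -O2 -S output.ll -o output.oll                   # 2. optimize the IR
    llc -relocation-model=pic output.oll -o output.s     # 3. generate target assembly
    as output.s -o output.o                              # 4. assemble
    clang output.o -o myprog                             # 5. link (or drive ld.lld directly)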

Get entry point of llvm::module

There is no "entry point" in an LLVM Module. The entry point is a property of an application and can differ depending on, e.g., the source language. Also, since an LLVM Module roughly corresponds to a translation unit, there might be no "main" function in it at all.

If you happen to know the name of the function you're looking for, then you can certainly use the Module::getFunction call to perform a name-based lookup.
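
A minimal sketch, assuming you already have an llvm::Module and your own convention is that the entry point is named "main" (findEntryPoint is just an illustrative helper name):

    #include "llvm/IR/Function.h"
    #include "llvm/IR/Module.h"

    llvm::Function *findEntryPoint(llvm::Module &M) {
      // getFunction returns nullptr if no function with that name exists.
      if (llvm::Function *F = M.getFunction("main"))
        if (!F->isDeclaration())   // require that it has a body in this module
          return F;
      return nullptr;
    }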

What is the -fPIE option for position-independent executables in gcc and ld?

PIE is to support address space layout randomization (ASLR) in executable files.

Before PIE existed, a program's executable could not be placed at a random address in memory; only position-independent code (PIC) in dynamic libraries could be relocated to a random offset. PIE works very much like what PIC does for dynamic libraries; the difference is that a Procedure Linkage Table (PLT) is not created, and PC-relative relocation is used instead.

After enabling PIE support in gcc and the linkers, the body of the program is compiled and linked as position-independent code. The dynamic linker does full relocation processing on the program module, just as it does for dynamic libraries. Any use of global data is converted to an access via the Global Offset Table (GOT), and GOT relocations are added.
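
For example (file names are placeholders, and the exact wording printed by readelf varies between binutils versions):

    gcc -fPIC -shared libfoo.c -o libfoo.so   # position-independent shared library
    gcc -fPIE -pie main.c -o main             # position-independent executable
    readelf -h libfoo.so | grep Type          # Type: DYN
    readelf -h main | grep Type               # also DYN: a PIE is an ET_DYN object too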

PIE is well described in this OpenBSD PIE presentation.

Changes to functions are shown in this slide (PIE vs PIC):

[slide: x86 PIC vs PIE]

  • Local global variables and functions are optimized in PIE
  • External global variables and functions are the same as in PIC

and in this slide (PIE vs old-style fixed-position linking):

[slide: x86 PIE vs no-flags (fixed)]

  • Local global variables and functions are similar to the fixed-position case
  • External global variables and functions are the same as in PIC

Note that PIE may be incompatible with -static.

llc: unsupported relocation on symbol

First of all, use neither llc nor opt. These are developer-side tools that should never be used in a production environment. Instead, implement your own optimization and code generation pipeline via the LLVM libraries.
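
As a rough illustration of what such a pipeline looks like, here is a sketch of in-process code generation with the LLVM C++ API (emitObject is a made-up name; header paths and some enum spellings differ between LLVM versions, so adapt it to the version you build against):

    // Minimal in-process code generation sketch using the LLVM C++ API.
    // Note: TargetRegistry.h moved from Support/ to MC/ in newer versions,
    // and CGFT_ObjectFile became CodeGenFileType::ObjectFile.
    #include "llvm/IR/LegacyPassManager.h"
    #include "llvm/IR/Module.h"
    #include "llvm/MC/TargetRegistry.h"
    #include "llvm/Support/FileSystem.h"
    #include "llvm/Support/Host.h"
    #include "llvm/Support/TargetSelect.h"
    #include "llvm/Support/raw_ostream.h"
    #include "llvm/Target/TargetMachine.h"
    #include "llvm/Target/TargetOptions.h"

    // Emit module M as a position-independent object file at Path.
    bool emitObject(llvm::Module &M, const std::string &Path) {
      llvm::InitializeAllTargetInfos();
      llvm::InitializeAllTargets();
      llvm::InitializeAllTargetMCs();
      llvm::InitializeAllAsmPrinters();

      std::string TripleStr = llvm::sys::getDefaultTargetTriple();
      std::string Err;
      const llvm::Target *T = llvm::TargetRegistry::lookupTarget(TripleStr, Err);
      if (!T)
        return false;

      llvm::TargetOptions Opts;
      llvm::TargetMachine *TM = T->createTargetMachine(
          TripleStr, "generic", "", Opts, llvm::Reloc::PIC_);  // PIC, as for shared libraries
      M.setDataLayout(TM->createDataLayout());

      std::error_code EC;
      llvm::raw_fd_ostream Out(Path, EC, llvm::sys::fs::OF_None);
      if (EC)
        return false;

      llvm::legacy::PassManager PM;
      if (TM->addPassesToEmitFile(PM, Out, nullptr, llvm::CGFT_ObjectFile))
        return false;  // the target does not support object file emission
      PM.run(M);
      Out.flush();
      return true;
    }

Error handling is reduced to bool returns here for brevity; real code should report why a step failed.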

As for this particular bug: the Thumb code generator might contain some bugs. Please reduce the problem and report it. Or don't use Thumb mode at all :)

Using LLVM bytecode for libraries (instead of native object files)

I have done something similar to this in the past. One thing that you should realize is that LLVM bitcode is not "portable" in that it is not completely machine independent. Bitcode files have knowledge of things like the size of pointers, etc. that are specific to the processor being targeted.

Having said that, in the past I have compiled programs and their support libraries to bitcode and linked the bitcode files together before generating an assembly file for the whole program. You're right that calling conventions aren't important for internal calls, but calls made to the outside (or from the outside) still require that the ABI is followed.
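
With the stock LLVM tools, that flow looks roughly like this (file names are placeholders; on newer LLVM versions opt prefers -passes='default<O2>' over -O2):

    llvm-link main.bc support.bc -o whole.bc    # merge the bitcode modules
    opt -O2 whole.bc -o whole.opt.bc            # whole-program optimization
    llc -filetype=obj whole.opt.bc -o whole.o   # generate native code
    clang whole.o -o whole                      # final link against the platform runtime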

You may be able to design your toy language in such a way that you can avoid processor-dependent bitcode, but you'll have to be very careful.

I noticed that linking the bitcode files together took quite a while, especially at high optimization levels. That may have sped up by now; I did this with an LLVM release from 2 or 3 years ago.

One final point: depending on the target processor, you'll probably need the equivalent of libgcc.a or compiler-rt to handle operations the processor has no instructions for, such as floating point or 64-bit integer arithmetic.

How to generate llvm bitcode for large programs with many source code files and a huge Makefile (e.g. memcached)?

Depending on what your pass is doing, you can (example invocations are sketched after the list):

  • Build with LTO: adding -flto to the CFLAGS and building your application with your own built linker plugin is quite seamless from a build-system point of view. However, it requires some understanding of how to set up LTO.
  • Build with your own clang: statically add your pass to the LLVM pipeline and use your own built clang. Depending on the build system, exporting the CC/CXX environment variables to point to your installed clang should be enough.
  • Build by loading your pass dynamically into clang; for example, this is what Polly (optionally) does.
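
Illustrative invocations for the three routes (paths and plugin names are placeholders):

    # 1) LTO: the linker sees LLVM IR for the whole program
    make CC=clang CFLAGS="-flto" LDFLAGS="-flto -fuse-ld=lld"
    # 2) Your own clang with the pass linked in statically
    make CC=/path/to/your/clang CXX=/path/to/your/clang++
    # 3) Load the pass dynamically (new pass manager plugin interface)
    make CC=clang CFLAGS="-fpass-plugin=/path/to/YourPass.so"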

Why does GCC create a shared object instead of an executable binary according to file?

What am I doing wrong?

Nothing.

It sounds like your GCC is configured to build -pie binaries by default. These binaries really are shared libraries (of type ET_DYN), except they run just like a normal executable would.

So you should just run your binary and (if it works) not worry about it.

Or you could link your binary with gcc -no-pie ... and that should produce a non-PIE executable of type ET_EXEC, for which file will say ELF 64-bit LSB executable.
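
For example (hello.c is a placeholder; the exact strings printed by file depend on its version):

    gcc hello.c -o hello            # PIE by default on many distributions
    file hello                      # "... LSB shared object" or "... LSB pie executable"
    gcc -no-pie hello.c -o hello2
    file hello2                     # "ELF 64-bit LSB executable"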


