LLVM Bitcode Cross-Platform

LLVM bitcode cross-platform

LLVM IR can be cross-platform, with the obvious exceptions others have listed. However, that does not mean Clang generates cross-platform code. As you note, the preprocessor is almost universally used to pass only parts of the code to the C/C++ compiler, depending on the platform. Even when this is not done in user code, many system headers include a bit or two that's platform-specific, such as typedefs. For example, if you compile C code using size_t to LLVM IR on a platform where size_t is 32-bit, the LLVM IR now uses i32 for that, and there's no way in hell you can reverse engineer that afterwards to fix it.
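
For instance, here is a minimal sketch (my own illustration, not from the original answer) of a function whose emitted IR is already target-specific:

#include <stddef.h>
#include <string.h>

/* size_t is fixed by the target ABI before LLVM ever sees the code:
   on a 32-bit target Clang lowers it here as i32, while on a 64-bit
   target the very same source line becomes i64 in the emitted IR. */
size_t middle(const char *s) {
    size_t len = strlen(s);
    return len / 2;
}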

Google's Portable Native Client project (thanks @willglynn for the link), if I understand it correctly, achieves portability by fixing the ABI for all target platforms. So in that sense, it doesn't solve the aforementioned issues: the LLVM IR is not portable to a platform with a different ABI. The only reason this is more portable is that the clients provide a layer which matches the PNaCl ABI to the actual ABI. In other words, PNaCl code isn't portable to many platforms; the "PNaCl VM" is.

So, bottom line: If you're very careful, you can use LLVM IR across multiple platforms, but not without doing significant additional work (which Clang doesn't do) to abstract over the ABI differences.

Can LLVM IR (Intermediate Representation) be used to create cross-platform (iPhone and Android) ARM executables?

The compiler is not the problem. To develop for both, you need to create an abstraction layer that allows you to write a single application on top of that layer. Then have two implementations of the abstraction layer, one that makes Android API calls and one that makes iPhone API calls. There is nothing the compiler can do to help you.
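
As a rough sketch of what such a layer looks like (all names here are hypothetical, purely to illustrate the shape of the approach):

/* platform.h - the single interface the application is written against.
   Each platform ships its own implementation of these functions. */
#ifndef PLATFORM_H
#define PLATFORM_H

void platform_show_message(const char *text);  /* e.g. a toast or alert */
void platform_vibrate(int milliseconds);

#endif

/* platform_android.c would implement these with Android (JNI/NDK) calls,
   platform_ios.c with UIKit calls. The compiler cannot write either
   implementation for you; only the shared application code is portable. */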

Where LLVM IR might be interesting in its portability is for programs like:


int a,b;

a=7;
b=a-4;

Compile that to IR, then take the same IR, generate assembler for all the different processor types, and examine the differences.
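
Pulling the fragment above into a compilable file, the workflow might look like this (file names, triples and commands are illustrative only, not from the original answer):

/* demo.c
   Emit IR once:   clang -S -emit-llvm demo.c -o demo.ll
   Lower the same IR for different CPUs and compare the assembly:
     llc -mtriple=armv7-linux-gnueabihf demo.ll -o demo-arm.s
     llc -mtriple=x86_64-linux-gnu demo.ll -o demo-x86_64.s */
int demo(void) {
    int a, b;
    a = 7;
    b = a - 4;
    return b;
}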

In the case of real applications that, for example, need to write a pixel on a display, the registers, the size of the display, and a whole host of other differences come into play. Those differences are not exposed between the IR and the assembler backend; they are exposed in the main C program and in the API calls defined by the platform's libraries. So you have to solve the problem in C, not in IR or assembler.

Is it possible to target the iPhone with compatible LLVM IR/bitcode from a non-Apple operating system?

Looks like Java-to-iOS is being taken care of by the RoboVM project:

Java to Native

The RoboVM compiler translates Java bytecode into native ARM or x86
code. Apps run directly on the CPU. No interpreter or virtual machine
involved.

It makes use of LLVM, as my question suggested.

Also of note is the Avian JVM project. It too can be used to compile to native and iOS binaries (by bundling the JVM); however, I'm uncertain about the status or completeness of its user interface (UI) layer(s).

Both projects appear to be under active, ongoing development.

LLVM: what is it and how can I use it for cross-platform compilations?

The key concept of LLVM is a low-level "intermediate" representation (IR) of your program.
This IR is at about the level of assembler code, but it contains more information to facilitate optimization.

The power of LLVM comes from its ability to defer compilation of this intermediate representation to a specific target machine until just before the code needs to run. A just-in-time (JIT) compilation approach can be used for an application to produce the code it needs just before it needs it.

In many cases, you have more information at the time the program is running than you do back at head office, so the code can be optimized much more effectively.

To get started, you could compile a C++ program to a single intermediate representation, then compile it to multiple platforms from that IR.

You can also try the Kaleidoscope demo, which walks you through creating a new language without having to actually write a compiler, just write the IR.

In performance-critical applications, the application can essentially write its own code that it needs to run, just before it needs to run it.
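
As a concrete illustration of that idea, here is a small sketch in C against the LLVM C API (my own example; it assumes an LLVM installation whose older MCJIT execution-engine C API is still available, and the build command is only indicative). It constructs a trivial sum function as IR at run time and JIT-compiles it for the machine it is currently running on.

/* jit_sum.c - build and run a function at run time via LLVM's JIT.
   Build with something like (llvm-config flags vary by version):
     cc jit_sum.c $(llvm-config --cflags --ldflags \
        --libs core executionengine mcjit native) -o jit_sum */
#include <stdio.h>
#include <stdlib.h>
#include <llvm-c/Core.h>
#include <llvm-c/ExecutionEngine.h>
#include <llvm-c/Target.h>

int main(void) {
    /* Build a module containing i32 sum(i32, i32) directly as IR. */
    LLVMModuleRef mod = LLVMModuleCreateWithName("jit_demo");
    LLVMTypeRef params[] = { LLVMInt32Type(), LLVMInt32Type() };
    LLVMTypeRef fn_ty = LLVMFunctionType(LLVMInt32Type(), params, 2, 0);
    LLVMValueRef sum = LLVMAddFunction(mod, "sum", fn_ty);

    LLVMBuilderRef builder = LLVMCreateBuilder();
    LLVMPositionBuilderAtEnd(builder, LLVMAppendBasicBlock(sum, "entry"));
    LLVMValueRef result = LLVMBuildAdd(builder, LLVMGetParam(sum, 0),
                                       LLVMGetParam(sum, 1), "result");
    LLVMBuildRet(builder, result);

    /* JIT-compile for whatever machine we happen to be running on. */
    LLVMLinkInMCJIT();
    LLVMInitializeNativeTarget();
    LLVMInitializeNativeAsmPrinter();

    char *error = NULL;
    LLVMExecutionEngineRef engine;
    if (LLVMCreateExecutionEngineForModule(&engine, mod, &error) != 0) {
        fprintf(stderr, "failed to create execution engine: %s\n", error);
        LLVMDisposeMessage(error);
        return EXIT_FAILURE;
    }

    int (*sum_fn)(int, int) =
        (int (*)(int, int))LLVMGetFunctionAddress(engine, "sum");
    printf("sum(7, -4) = %d\n", sum_fn(7, -4));

    LLVMDisposeBuilder(builder);
    LLVMDisposeExecutionEngine(engine); /* also frees the module */
    return EXIT_SUCCESS;
}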

What platform can I compile binaries for, using LLVM (Low Level Virtual Machine)?

I'm only answering the edit's question here (it would probably be more appropriate to make a new question).

This is a good architectural overview of LLVM. This page also contains a ton of documentation on all aspects of LLVM.

The short version is that LLVM is the optimizer and backend of a traditional compiler. It operates on a bitcode which is essentially its intermediate representation of the code and is used to optimize and generate the final binary. The LLVM frontends are independent and use their own internal ASTs to eventually generate bitcode.

LLVM is actually pretty flexible when it comes to when you want to generate the final binary. You can either do it right away or delay it until the program is being installed. I believe you can even use its JIT to generate the final binary during execution (not 100% sure of this). The main advantage of delaying like this is that it can apply optimizations that are specific to the environment it is executing on.

When writing code compiled by LLVM backend, does architecture matter?

TL;DR

From my understanding, you can compile to any target LLVM supports (there may still be a few caveats here with frontends using inline assembler or module-level inline assembly); however, you are not guaranteed it will actually execute correctly. The frontend is responsible for doing the work needed to be portable across the platforms its author supports.

Note also that as a frontend developer you are responsible for providing the data layout and target triple.
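
For instance, a frontend built on the LLVM C API might do something along these lines to pick a triple and derive the matching data layout (a sketch under the assumption that the ARM backend is compiled into your LLVM installation; the triple and build command are only examples):

/* set_target.c - what a frontend does: choose a triple and data layout.
   Build with something like:
     cc set_target.c $(llvm-config --cflags --ldflags --libs all) -o set_target */
#include <stdio.h>
#include <llvm-c/Core.h>
#include <llvm-c/Target.h>
#include <llvm-c/TargetMachine.h>

int main(void) {
    LLVMInitializeAllTargetInfos();
    LLVMInitializeAllTargets();
    LLVMInitializeAllTargetMCs();
    LLVMInitializeAllAsmPrinters();

    LLVMModuleRef mod = LLVMModuleCreateWithName("frontend_demo");

    /* The frontend picks the triple; this ARM/Linux triple is just an example. */
    const char *triple = "armv7-unknown-linux-gnueabihf";
    LLVMTargetRef target;
    char *error = NULL;
    if (LLVMGetTargetFromTriple(triple, &target, &error) != 0) {
        fprintf(stderr, "no such target: %s\n", error);
        LLVMDisposeMessage(error);
        return 1;
    }

    LLVMTargetMachineRef tm = LLVMCreateTargetMachine(
        target, triple, "generic", "",
        LLVMCodeGenLevelDefault, LLVMRelocPIC, LLVMCodeModelDefault);

    /* Record the triple and data layout in the module -- this is exactly
       the information that makes the emitted IR target-specific. */
    LLVMSetTarget(mod, triple);
    LLVMTargetDataRef dl = LLVMCreateTargetDataLayout(tm);
    LLVMSetModuleDataLayout(mod, dl);

    char *layout = LLVMCopyStringRepOfTargetData(dl);
    printf("data layout for %s:\n%s\n", triple, layout);
    LLVMDisposeMessage(layout);

    LLVMDisposeTargetData(dl);
    LLVMDisposeTargetMachine(tm);
    LLVMDisposeModule(mod);
    return 0;
}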

see also:

  • llvm-bitcode-cross-platform
  • LLVM FAQ
  • Implementing Portable sizeof
  • Cross Compile with Clang

Your Questions:

Let's say I'm writing Rust (which uses LLVM as a backend). Am I
automatically capable of compiling my Rust code to every architecture
that LLVM can target (assuming there's an OS on that machine that can
run it)?

This is dependent on the authors of the Rust frontend.

Or could it be that the Rust standard library hasn't been made "ARM
compatible" yet, so I couldn't compile to ARM even if the LLVM targets
it?

I'm pretty sure LLVM would be able to emit the instructions, but it may not be correct in terms of addressing.

I have not used the inline assembler facilities mentioned above myself, but I assume that if they allow platform-specific assembly, this would break platform-agnostic compilation as well.

What if I don't use any of the standard library, and my entire program is
just a program that returns right away? Could it be the case that even
without any libraries, Rust (or what have you) can't compile to ARM
(or what have you) even if the LLVM targets it?

This again depends on what the Rust frontend emits. There may be some boilerplate setup logic it emits even before it emits instructions for your logic.

I'm writing my own language in LLVM that does this in the case of a special function called "main". I am targeting the C ABI, so it will wrap this main with a proper C-style main and invoke it with a stricter set of parameters.
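
That wrapper amounts to roughly the following C equivalent (the names are hypothetical; this is just what the generated code boils down to):

#include <stdint.h>

/* What the hypothetical frontend emits for the language-level "main":
   note the stricter signature (no argc/argv). */
static int32_t lang_main(void) {
    return 0;  /* the user's compiled logic would live here */
}

/* The generated C-ABI wrapper: the platform's startup code (crt) calls
   this like any ordinary C main, and it forwards to the language entry. */
int main(int argc, char **argv) {
    (void)argc;
    (void)argv;
    return (int)lang_main();
}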

If all the above examples compile just fine, what do I have to do to
get my code to break on one architecture, or fail to compile for a
certain architecture?

Consider C/C++ with Clang, as mentioned in the LLVM FAQ. Clang is a frontend for LLVM, probably the most popular one, and users writing C/C++ are responsible for #include-ing the appropriate platform-specific functionality.

Some languages may be designed more platform independent and the frontend could then handle the work for you.

Let's say the standard library makes use of OS system calls (which it
surely does). Do you have to care about architecture when making
system calls? Or does the OS (Linux, for example) abstract away
architecture as well?

I'm assuming you are talking about the case where the frontend targets the C standard library in which case LLVM has standard C library intrinsics which could be used by the frontend. This is not the only way, however, as you can use the call instruction to invoke C functions directly if targeting the C ABI as in the Kaleidoscope example.
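
To illustrate the second option, here is a sketch against the LLVM C API (my own example; the build command is only indicative) that declares an external C function in a module, emits a direct call to it, and prints the resulting IR:

/* call_c.c - declare an external C function in a module and call it.
   Build with something like:
     cc call_c.c $(llvm-config --cflags --ldflags --libs core) -o call_c */
#include <stdio.h>
#include <llvm-c/Core.h>

int main(void) {
    LLVMModuleRef mod = LLVMModuleCreateWithName("call_demo");
    LLVMBuilderRef b = LLVMCreateBuilder();

    /* declare i32 @putchar(i32) -- an external C standard library function. */
    LLVMTypeRef putchar_params[] = { LLVMInt32Type() };
    LLVMTypeRef putchar_ty =
        LLVMFunctionType(LLVMInt32Type(), putchar_params, 1, 0);
    LLVMValueRef putchar_fn = LLVMAddFunction(mod, "putchar", putchar_ty);

    /* define i32 @emit_bang() that calls @putchar(33). */
    LLVMTypeRef fn_ty = LLVMFunctionType(LLVMInt32Type(), NULL, 0, 0);
    LLVMValueRef fn = LLVMAddFunction(mod, "emit_bang", fn_ty);
    LLVMPositionBuilderAtEnd(b, LLVMAppendBasicBlock(fn, "entry"));

    LLVMValueRef args[] = { LLVMConstInt(LLVMInt32Type(), '!', 0) };
    LLVMValueRef ret = LLVMBuildCall2(b, putchar_ty, putchar_fn, args, 1, "r");
    LLVMBuildRet(b, ret);

    /* Print the IR: the call instruction targets the C ABI directly. */
    char *ir = LLVMPrintModuleToString(mod);
    printf("%s", ir);
    LLVMDisposeMessage(ir);

    LLVMDisposeBuilder(b);
    LLVMDisposeModule(mod);
    return 0;
}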

In the end the standard library can be a portability issue and must be addressed by the frontend developers.

Locating crti on a cross-platform basis

It's not the job of llc to provide these details for you (after all, llc is just a developer-side tool; it's not intended to be used in the final product).

Neither GCC nor clang "emit object files with crt embedded inside them". Instead, they contain platform-specific driver logic to figure out library search paths, the linker command line, etc. Obviously, no "cross-platform CRT" is possible, since the necessary platform details need to be handled somehow.

The easiest way in your case is to use clang to perform the linking step, since all the necessary platform details are already handled there. Oh, and you will not need llc at all, since clang will happily produce object code from your LLVM IR (for example, something along the lines of clang module.ll -c -o module.o should work, assuming an IR file named module.ll).


