LLVM IR Back to Human-Readable Source Language

LLVM IR back to human-readable source language?

There is a catch: it might not be possible to turn the IR back into readable source code easily.

You'll probably be able to get some representation, but it is likely to be less readable than the original.

The problem is that the IR does not carry high-level semantics (classes, templates, structured loops and so on), and without them the reconstructed code is much harder to follow.
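To make that loss of structure concrete, here is a small hand-written C++ sketch (not produced by any tool). It puts a class method next to a free function that roughly corresponds to what survives in the IR: the class is reduced to a memory layout, and the method becomes an ordinary function that receives the object through an explicit pointer. Under the Itanium C++ ABI the real function would be named `_ZN7Counter4nextEv`; the name `Counter_next` is a hypothetical, readable stand-in.

```cpp
// Source-level view: a class with a member function.
struct Counter {
    int n = 0;
    int next() { return ++n; }
};

// Roughly what the IR keeps: the struct survives only as a memory layout
// (a single i32 field), and the method becomes an ordinary function that
// receives the object as an explicit pointer argument. In real IR the
// function would carry the mangled name _ZN7Counter4nextEv (Itanium ABI);
// Counter_next is a hypothetical, readable stand-in for it.
int Counter_next(Counter *self) {
    int tmp = self->n;   // load the field
    tmp = tmp + 1;       // add
    self->n = tmp;       // store it back
    return tmp;          // return the new value
}
```

The mangled name can still be demangled to `Counter::next()`, but access control and the class as a grouping of methods are not represented in the IR at all.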

I'd rather advise you to learn to read the IR. I can read a bit of it without much effort, and I am far from being an LLVM expert.

Otherwise, you can generate C code from the IR. It won't look much more like your C++ code, but you may find it easier to read without SSA form and phi nodes.
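As an illustration of what "without phi nodes" means, here is a hand-written C++ sketch (no tool produced it) of how a C-emitting backend could lower a small function. The IR in the comments is a simplified example chosen for this sketch, and `max_lowered` is a hypothetical name. The phi in the merge block becomes an ordinary variable that each predecessor assigns before jumping.

```cpp
// Simplified IR being lowered (shown for reference):
//   define i32 @max(i32 %a, i32 %b) {
//   entry:
//     %cmp = icmp sgt i32 %a, %b
//     br i1 %cmp, label %then, label %else
//   then:
//     br label %end
//   else:
//     br label %end
//   end:
//     %r = phi i32 [ %a, %then ], [ %b, %else ]
//     ret i32 %r
//   }
int max_lowered(int a, int b) {
    int r;              // ordinary variable replacing the phi node %r
    bool cmp = a > b;   // %cmp = icmp sgt i32 %a, %b
    if (cmp)
        goto then_bb;   // br i1 %cmp, label %then, label %else
    goto else_bb;
then_bb:
    r = a;              // phi incoming value when coming from %then
    goto end_bb;
else_bb:
    r = b;              // phi incoming value when coming from %else
    goto end_bb;
end_bb:
    return r;           // ret i32 %r
}
```

A real backend also has to handle phis whose incoming values depend on each other (the so-called swap problem), which this toy example does not show.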

Converting LLVM-IR into a C like language

The C backend was dropped in release 3.1 because it was not maintained and started developing code rot, becoming a burden. Since no maintainer stepped up, it was removed from the tree. From the release notes of 3.1:

The C backend has been removed. It had numerous problems, to the point of not being able to compile any nontrivial program.

In August 2012 a thread on llvmdev discussed reviving the C backend, but I don't think it ended up anywhere useful.

You can still download LLVM 3.0 from the releases page, build it, see the C backend in action, study its code, and so on. For your specific purpose (looking at the code and figuring out how it works), the 3.0 C backend should be good enough.

Is it possible to recompile LLVM IR into another triplet and data layout?

No, the LLVM toolchain has no tool to transform between incompatible triples.

The best third-party option I am aware of is to lift the IR back to source code and recompile that for the new target.

Compiler output language - LLVM IR vs C

I've used LLVM IR for a few compiler back ends and have worked with compilers that use C as a back end. One advantage I found for LLVM IR is that it is typed: it is hard to produce completely ill-formed output without the LLVM libraries reporting errors.

It is also easier to keep a close correlation between the source code and the IR for debugging, in my opinion.

Plus, you get all the cool LLVM command line tools to analyse and process the IR your front end emits.

Parsing and Modifying LLVM IR code

First, to fix an obvious misunderstanding: LLVM is a framework for manipulating code in IR format. There are no ASTs in sight (*): you read IR, transform, manipulate or analyze it, and write IR back.

Reading IR is really simple:

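#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/raw_ostream.h"
#include <memory>

using namespace llvm;

int main(int argc, char **argv)
{
    if (argc < 2) {
        errs() << "Expected an argument - IR file name\n";
        return 1;
    }

    // Older releases used getGlobalContext() and ParseIRFile(), which
    // returned a raw Module*; current releases use a caller-owned
    // LLVMContext and parseIRFile(), which returns a unique_ptr.
    LLVMContext Context;
    SMDiagnostic Err;
    std::unique_ptr<Module> Mod = parseIRFile(argv[1], Err, Context);

    if (!Mod) {
        // Report the parse error, prefixed with the program name.
        Err.print(argv[0], errs());
        return 1;
    }

    // [...]
}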

This code accepts a file name, which should name an LLVM IR file, either textual (.ll) or bitcode (.bc); the IR reader accepts both. It parses the file into a Module, which represents a module of IR in LLVM's internal in-memory format. The module can then be manipulated with LLVM's existing passes or with passes you write yourself. Take a look at some examples in the LLVM code base (such as lib/Transforms/Hello/Hello.cpp) and read http://llvm.org/docs/WritingAnLLVMPass.html.

Writing IR back out is even easier: a Module prints itself as textual IR to an LLVM output stream (raw_ostream):

 some_stream << *Mod;

That's it.

Now, if you have any specific questions about specific modifications you want to do to IR code, you should really ask something more focused. I hope this answer shows you how to parse IR and write it back.


(*) IR doesn't have an AST representation inside LLVM, because it's a simple assembly-like language. If you go one step up, to C or C++, you can use Clang to parse that into ASTs, and then do manipulations at the AST level. Clang then knows how to produce LLVM IR from its AST. However, you do have to start with C/C++ here, and not LLVM IR. If LLVM IR is all you care about, forget about ASTs.

How to convert LLVM IR br back to a while loop

During translation from C to LLVM IR, selected instructions can be decorated with metadata. That metadata can later guide the conversion of LLVM IR to JavaScript, e.g. by indicating whether a cycle of branches between basic blocks came from a while loop (information that exists in the C source but not in the bare IR). See Intrinsics & Metadata Attributes.

For more information regarding LLVM Metadata see LLVM-Metadata.
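The metadata route covers the "was it a while loop?" question. As a complementary, purely structural sketch (not part of the metadata approach described here), the circular branching itself can be located by a depth-first search over the basic-block graph: an edge back to a block that is still on the DFS stack closes a cycle, and its target is a loop-header candidate. The CFG below is a hypothetical hand-built map whose block names resemble a typical `while` lowering (condition, body, exit); `findBackEdges` is an illustrative name, not an LLVM API.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A toy CFG: block name -> successor block names (branch targets).
using CFG = std::map<std::string, std::vector<std::string>>;
using Edge = std::pair<std::string, std::string>;

// Depth-first search that records edges pointing back to a block that is
// still on the DFS stack. Such back edges close a cycle; the target is a
// candidate loop header.
static void dfs(const CFG &cfg, const std::string &block,
                std::set<std::string> &visited,
                std::set<std::string> &onStack,
                std::vector<Edge> &backEdges) {
    visited.insert(block);
    onStack.insert(block);
    auto it = cfg.find(block);
    if (it != cfg.end()) {
        for (const std::string &succ : it->second) {
            if (onStack.count(succ))
                backEdges.push_back({block, succ});  // cycle found
            else if (!visited.count(succ))
                dfs(cfg, succ, visited, onStack, backEdges);
        }
    }
    onStack.erase(block);
}

// Returns every back edge reachable from the entry block.
std::vector<Edge> findBackEdges(const CFG &cfg, const std::string &entry) {
    std::set<std::string> visited, onStack;
    std::vector<Edge> backEdges;
    dfs(cfg, entry, visited, onStack, backEdges);
    return backEdges;
}
```

For a graph entry → while.cond → {while.body, while.end} with while.body → while.cond, this reports the single edge while.body → while.cond. Knowing that the cycle was written as a `while` rather than, say, a `for` loop is exactly the part that the extra metadata supplies.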


