Get Human-Readable AST from C++ Code

Get human-readable AST from C++ code

Clang still has that functionality.

The relevant options are -ast-dump and -ast-dump-xml.

Note: -ast-dump-xml only works if you build Clang in debug mode.

http://clang.llvm.org/docs/IntroductionToTheClangAST.html

For example:

```
$ cat test.cpp
int main()
{
return 0;
}

$ clang++ -cc1 -ast-dump-xml test.cpp
<TranslationUnit ptr="0x4e42660">
<Typedef ptr="0x4e42bd0" name="__builtin_va_list" typeptr="0x0">
<PointerType ptr="0x4e42b90" canonical="0x4e42b90">
<BuiltinType ptr="0x4e426f0" canonical="0x4e426f0"/>
</PointerType>
</Typedef>
<Function ptr="0x4e42c70" name="main" returnzero="true" prototype="true">
<FunctionProtoType ptr="0x4e42c20" canonical="0x4e42c20">
<BuiltinType ptr="0x4e42750" canonical="0x4e42750"/>
<parameters/>
</FunctionProtoType>
<Stmt>
CompoundStmt 0x4e42d78 <test.cpp:2:1, line:4:1>
`-ReturnStmt 0x4e42d58 <line:3:1, col:8>
  `-IntegerLiteral 0x4e42d38 <col:8> 'int' 0

</Stmt>
</Function>
</TranslationUnit>
```
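
To make the shape of the text dump concrete, here is a minimal sketch of an AST that mirrors the CompoundStmt / ReturnStmt / IntegerLiteral tree in the <Stmt> element, plus a printer that reproduces clang's "|-" and "`-" connectors. The Node struct and the add and dump functions are hypothetical illustrations, not Clang classes: Clang's real AST is a large class hierarchy (Decl, Stmt, Type), and its dumper also prints addresses and source ranges, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, deliberately tiny AST node. Clang's real AST uses separate
// class hierarchies (Decl, Stmt, Type); this sketch folds them into one struct.
struct Node {
    std::string kind;    // node class name, e.g. "ReturnStmt"
    std::string detail;  // trailing text, e.g. "'int' 0" for an IntegerLiteral
    std::vector<std::unique_ptr<Node>> children;

    explicit Node(std::string k, std::string d = "")
        : kind(std::move(k)), detail(std::move(d)) {}

    // Take ownership of a child and return a raw pointer for further building.
    Node* add(std::unique_ptr<Node> child) {
        assert(child != nullptr);
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Append a clang-style rendering of `n` to `out`: "|-" before a child that
// has later siblings, "`-" before the last child, and "| " or "  " to
// continue the vertical lines of the enclosing levels.
void dump(const Node& n, std::string& out, const std::string& prefix = "") {
    out += n.kind;
    if (!n.detail.empty()) out += " " + n.detail;
    out += '\n';
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        const bool last = (i + 1 == n.children.size());
        out += prefix + (last ? "`-" : "|-");
        dump(*n.children[i], out, prefix + (last ? "  " : "| "));
    }
}
```

Building CompoundStmt, then a ReturnStmt child, then an IntegerLiteral 'int' 0 grandchild, and calling dump gives the same three-line shape as the Clang output above, without the addresses and source locations.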

Build AST from C code

First, it is a difficult task, because the abstract syntax tree of C is much more complex than you might believe. Read the C11 standard (draft n1570) for details, and see this website. Also look into TinyCC or nwcc (at least for inspiration).

Then, if you are using a recent GCC (e.g. 4.7 or 4.8), I strongly suggest customizing GCC itself, e.g. with a MELT extension (or your own GCC plugin).

I don't claim it is a simple task: you very probably need to understand the details of GCC's internal representations (at least GIMPLE).

BTW, MELT is (was) a domain-specific language for extending GCC, designed for exactly the kind of task you have in mind. With MELT you could transform GCC's internal representations (GIMPLE and trees). As of 2020, MELT is no longer being worked on because of a lack of funding.

The advantage of working inside GCC (or inside another compiler such as Clang/LLVM) is that you don't have to emit C code again (which is much more difficult than you might think). You just transform the compiler's internal representation and, perhaps most importantly, you get "for free" the many things a compiler always does: all kinds of optimizations such as constant folding, inlining, common-subexpression elimination, and so on.
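
To show on a toy scale what "transforming the internal representation" means, here is a sketch of constant folding as a post-order rewrite of an expression tree. Expr, num, var, bin and fold are hypothetical names, and this tree is far simpler than GCC's GENERIC/GIMPLE or Clang's Expr classes; inside a real compiler you would get this optimization (and many more) from the compiler instead of writing it yourself.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy expression tree; not GCC's GENERIC/GIMPLE or Clang's Expr.
struct Expr {
    enum class Kind { Const, Var, Add, Mul };
    Kind kind = Kind::Const;
    long value = 0;                  // used when kind == Const
    std::string name;                // used when kind == Var
    std::unique_ptr<Expr> lhs, rhs;  // used when kind == Add or Mul
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr num(long v) {
    auto e = std::make_unique<Expr>();
    e->value = v;  // kind defaults to Const
    return e;
}

ExprPtr var(std::string n) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Var;
    e->name = std::move(n);
    return e;
}

ExprPtr bin(Expr::Kind k, ExprPtr l, ExprPtr r) {
    assert(k == Expr::Kind::Add || k == Expr::Kind::Mul);
    auto e = std::make_unique<Expr>();
    e->kind = k;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Constant folding as a post-order rewrite: fold both operands first, then
// replace an Add/Mul whose operands are both constants by a single constant.
// (Overflow is ignored here; a real compiler must respect the language rules.)
ExprPtr fold(ExprPtr e) {
    assert(e != nullptr);
    if (e->kind == Expr::Kind::Add || e->kind == Expr::Kind::Mul) {
        e->lhs = fold(std::move(e->lhs));
        e->rhs = fold(std::move(e->rhs));
        if (e->lhs->kind == Expr::Kind::Const && e->rhs->kind == Expr::Kind::Const) {
            const long a = e->lhs->value, b = e->rhs->value;
            return num(e->kind == Expr::Kind::Add ? a + b : a * b);
        }
    }
    return e;
}
```

For example, folding (2 + 3) * x yields 5 * x, while 4 * (1 + 2) collapses to the constant 12.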

In 2020, you could also consider using the libgccjit framework in a recent GCC 10, and read this draft report (related to Bismon; see also RefPerSys, which shares some ideas but no code with Bismon). You could also try the Clang static analyzer and/or Frama-C.

Getting an AST for C++?

You can use Clang, and especially libclang, to parse C++ code. Clang is a very high-quality, hand-written library for lexing, parsing and compiling C++ code, and it can also give you the AST.

Clang also supports C, Objective-C and Objective-C++. Clang itself is written in C++.

Can I get an XML AST of C/C++/Java code without compiling it?

For Java, see "What would an AST (abstract syntax tree) for an object-oriented programming language look like?"

For C, see "Get human-readable AST from C++ code".

Both of these are produced by one engine: our DMS Software Reengineering Toolkit. DMS also has a full C++11 parser that can produce similar XML. (EDIT Jan 2016: now full C++14 for GCC and Visual C++.)

I don't think XML is really a good idea: it is enormous and clunky, and what analysis tools can you bring to bear on it? XSLT is not very useful for analyzing programs. Read the XML into a DOM and climb over that? You'll find that you are missing a lot of useful support (symbol tables, etc.); ASTs alone are just not enough. See my essay "Life After Parsing" (check my bio or search for it).

You are better off using an integrated set of machinery that provides consistent support of all kinds for analyzing (multiple) programming languages, using the ASTs as a foundation. This is what DMS is designed to do.
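
To illustrate why a bare AST is not enough: even answering "which declaration does this x refer to?" needs scope information that an XML tree does not hand you directly. The sketch below is a hypothetical scoped symbol table (not DMS's design or any compiler's actual one). It keeps a stack of scopes and resolves names from the innermost scope outward, so an inner declaration shadows an outer one.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical scoped symbol table: a stack of scopes, each mapping a name
// to its declared type (kept as a plain string for simplicity).
class SymbolTable {
public:
    void enterScope() { scopes_.emplace_back(); }

    void exitScope() {
        assert(!scopes_.empty());
        scopes_.pop_back();
    }

    // Declare a name in the innermost scope; returns false on redeclaration
    // within the same scope.
    bool declare(const std::string& name, const std::string& type) {
        assert(!scopes_.empty());
        return scopes_.back().emplace(name, type).second;
    }

    // Resolve a name by searching from the innermost scope outward.
    std::optional<std::string> lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return std::nullopt;
    }

private:
    std::vector<std::unordered_map<std::string, std::string>> scopes_;
};
```

A tool walking the AST would call enterScope and exitScope at each block and declare at each declaration, which is exactly the bookkeeping a raw XML dump leaves to you.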

How can I parse C++ to create an AST?

Clang can do this:

```
clang -Xclang -ast-dump -fsyntax-only test.cc
```

Also see the docs.
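
If you want to consume that text dump from another program, the tree-drawing prefix on each line is enough to recover the nesting. The sketch below uses hypothetical names (DumpLine, parseDumpLine, parseDump). It assumes the dump contains no ANSI color codes and that every nesting level is drawn with exactly two characters ("|-", "`-", "| " or two spaces). That matches the samples on this page, but the text format is not a documented interface and may change between Clang releases, so treat such a parser as fragile.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One node line from clang's plain-text AST dump.
struct DumpLine {
    std::size_t depth;  // 0 for the root, 1 for its children, and so on
    std::string kind;   // node class name, e.g. "ReturnStmt"
};

// Parse one line of an uncolored dump. Assumes every nesting level is
// drawn with exactly two characters: "|-", "`-", "| " or "  ".
DumpLine parseDumpLine(const std::string& line) {
    std::size_t i = 0;
    while (i < line.size() &&
           (line[i] == '|' || line[i] == '`' || line[i] == '-' || line[i] == ' '))
        ++i;
    assert(i % 2 == 0 && "expected two prefix characters per nesting level");
    const std::size_t end = line.find(' ', i);  // the kind ends at the first space
    return {i / 2, line.substr(i, end == std::string::npos ? std::string::npos : end - i)};
}

// Split a whole dump into lines and parse each non-empty one.
std::vector<DumpLine> parseDump(const std::string& text) {
    std::vector<DumpLine> result;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) result.push_back(parseDumpLine(line));
    }
    return result;
}
```

Given the three Stmt lines from the first example, parseDump returns CompoundStmt at depth 0, ReturnStmt at depth 1 and IntegerLiteral at depth 2.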

Can I get an XML AST dump of C/C++ code with clang without using the compiler?

For your information, the XML printer was removed in version 2.9 by Douglas Gregor (who is responsible for the Clang front end).

The issue was that the XML printer was incomplete: a number of AST nodes had never been implemented in the printer, nor had a number of properties of other nodes, which led to an inaccurate representation of the source code.

Another point Douglas raised was that the output should be suitable not for debugging Clang itself (which is what -emit-ast is about) but for consumption by external tools. That requires the output to be stable from one version to the next. Notably, it should not be a one-to-one mapping of Clang's internals, but should rather translate the source code into a standardized language.

Unless someone does significant work on the printer (which requires volunteers), it will not be integrated back.
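
The requirement for stable, source-level output (rather than a dump of Clang's internals) can be sketched as follows. SrcNode, xmlEscape and toXml are hypothetical names; this is not the removed Clang printer or any Clang API. The tree carries only attributes derived from the source (no ptr="0x..." addresses, which differ from run to run), and the writer's output depends only on the tree, so the same program always yields the same bytes.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical source-level node: only information that comes from the
// program text, no internal pointers such as ptr="0x4e42660".
struct SrcNode {
    std::string kind;
    std::vector<std::pair<std::string, std::string>> attrs;  // kept in insertion order
    std::vector<SrcNode> children;
};

// Escape the characters that are special inside XML attribute values.
std::string xmlEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default: out += c;
        }
    }
    return out;
}

// Deterministic writer: the output depends only on the tree's contents.
void toXml(const SrcNode& n, std::string& out, int indent = 0) {
    assert(!n.kind.empty() && "XML element names cannot be empty");
    const std::string pad(static_cast<std::string::size_type>(indent) * 2, ' ');
    out += pad + "<" + n.kind;
    for (const auto& [key, value] : n.attrs) out += " " + key + "=\"" + xmlEscape(value) + "\"";
    if (n.children.empty()) {
        out += "/>\n";
        return;
    }
    out += ">\n";
    for (const auto& child : n.children) toXml(child, out, indent + 1);
    out += pad + "</" + n.kind + ">\n";
}
```

Because nothing version- or run-specific goes into the tree, an external tool can diff or cache this output, which is the property Douglas asked for.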