Visualizing an Ast Created with Antlr (In a .Net Environment)

Visualizing an AST created with ANTLR (in a .Net environment)

Correct, the interpreter only shows what rules are used in the parsing process, and ignores any AST rewrite rules.

What you can do is use StringTemplate to create a Graphviz DOT-file. After creating such a DOT-file, you use some 3rd party viewer to display this tree (graph).

Here's a quick demo in Java (I know little C#, sorry).

Take the following (overly simplistic) expression grammar that produces an AST:

grammar ASTDemo;

options {
output=AST;
}

tokens {
ROOT;
EXPRESSION;
}

parse
: (expression ';')+ -> ^(ROOT expression+) // omit the semi-colon
;

expression
: addExp -> ^(EXPRESSION addExp)
;

addExp
: multExp
( '+'^ multExp
| '-'^ multExp
)*
;

multExp
: powerExp
( '*'^ powerExp
| '/'^ powerExp
)*
;

powerExp
: atom ('^'^ atom)*
;

atom
: Number
| '(' expression ')' -> expression // omit the parenthesis
;

Number
: Digit+ ('.' Digit+)?
;

fragment
Digit
: '0'..'9'
;

Space
: (' ' | '\t' | '\r' | '\n') {skip();}
;

First let ANTLR generate lexer and parser files from it:

java -cp antlr-3.2.jar org.antlr.Tool ASTDemo.g 

then create a little test harness that parses the expressions "12 * (5 - 6); 2^3^(4 + 1);" and will output a DOT-file:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class MainASTDemo {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("12 * (5 - 6); 2^3^(4 + 1);");
ASTDemoLexer lexer = new ASTDemoLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
ASTDemoParser parser = new ASTDemoParser(tokens);
ASTDemoParser.parse_return returnValue = parser.parse();
CommonTree tree = (CommonTree)returnValue.getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}

Compile all .java files:

// *nix & MacOS
javac -cp .:antlr-3.2.jar *.java

// Windows
javac -cp .;antlr-3.2.jar *.java

and then run the main class and pipe its output to a file named ast-tree.dot:

// *nix & MacOS
java -cp .:antlr-3.2.jar MainASTDemo > ast-tree.dot

// Windows
java -cp .;antlr-3.2.jar MainASTDemo > ast-tree.dot

The file ast-tree.dot now contains:

digraph {

ordering=out;
ranksep=.4;
bgcolor="lightgrey"; node [shape=box, fixedsize=false, fontsize=12, fontname="Helvetica-bold", fontcolor="blue"
width=.25, height=.25, color="black", fillcolor="white", style="filled, solid, bold"];
edge [arrowsize=.5, color="black", style="bold"]

n0 [label="ROOT"];
n1 [label="EXPRESSION"];
n1 [label="EXPRESSION"];
n2 [label="*"];
n2 [label="*"];
n3 [label="12"];
n4 [label="EXPRESSION"];
n4 [label="EXPRESSION"];
n5 [label="-"];
n5 [label="-"];
n6 [label="5"];
n7 [label="6"];
n8 [label="EXPRESSION"];
n8 [label="EXPRESSION"];
n9 [label="^"];
n9 [label="^"];
n10 [label="^"];
n10 [label="^"];
n11 [label="2"];
n12 [label="3"];
n13 [label="EXPRESSION"];
n13 [label="EXPRESSION"];
n14 [label="+"];
n14 [label="+"];
n15 [label="4"];
n16 [label="1"];

n0 -> n1 // "ROOT" -> "EXPRESSION"
n1 -> n2 // "EXPRESSION" -> "*"
n2 -> n3 // "*" -> "12"
n2 -> n4 // "*" -> "EXPRESSION"
n4 -> n5 // "EXPRESSION" -> "-"
n5 -> n6 // "-" -> "5"
n5 -> n7 // "-" -> "6"
n0 -> n8 // "ROOT" -> "EXPRESSION"
n8 -> n9 // "EXPRESSION" -> "^"
n9 -> n10 // "^" -> "^"
n10 -> n11 // "^" -> "2"
n10 -> n12 // "^" -> "3"
n9 -> n13 // "^" -> "EXPRESSION"
n13 -> n14 // "EXPRESSION" -> "+"
n14 -> n15 // "+" -> "4"
n14 -> n16 // "+" -> "1"

}

which can be viewed with one of the viewers here. There are even online viewers. Take this one for example: https://dreampuf.github.io/GraphvizOnline/

When feeding it the contents of ast-tree.dot, the following image is produced:

ast tree

Java Tree parser output for ANTLR

I've found a sample template in the ANTLR website, its the Javatreeparser.g, which the site says could produce the AST that I need,

No, the combined grammar Java.g from the ANTLR wiki produces a lexer and parser for Java source files. The parser then constructs an AST of this source and this AST can then be used by JavaTreeParser.g to traverse it. The tree grammar JavaTreeParser.g is not used to create an AST. This is done by the parser created from Java.g.

What I've done so far is placing the grammar file together with my existing java grammar.

That is incorrect. The tree grammar JavaTreeParser.g expects an AST as input that the parser generated from Java.g produced. You can't just plug in another parser (or other tree grammar, for that matter).

But I have no idea on how to use and output the AST that I need from the file. How do I do it?

See this previous Q&A: Visualizing an AST created with ANTLR (in a .Net environment)

EDIT

I didn't want to post this immediately, because I wanted you to give it a try yourself first (yes, I'm mean!) ;)

Here's a quick demo:

  1. copy the Java.g in a directory and remove the @header{...} and @lexer:::header{...} declarations from it;
  2. copy antlr-3.3.jar into the same directory;
  3. create the files Main.java and Test.java in this directory (see below).

Test.java

public class Test {

int i = 1 + 2;
String s;

Test(String s) {
this.s = s;
}
}

Main.java

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
public static void main(String[] args) throws Exception {
JavaLexer lexer = new JavaLexer(new ANTLRFileStream("Test.java"));
JavaParser parser = new JavaParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.javaSource().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}

Now generate a lexer and parser:

java -cp antlr-3.3.jar org.antlr.Tool Java.g 

Then compile all .java source files:

javac -cp antlr-3.3.jar *.java 

And finally run the Main class and pipe the output to a file called ast.dot.

java -cp .:antlr-3.3.jar Main > ast.dot

(on Windows, do: java -cp .;antlr-3.3.jar Main > ast.dot)

If you now open the file ast.dot, you see a DOT representation of the AST produced by the parser. You can visualize this AST by copy-pasting the DOT-source in here: http://graphviz-dev.appspot.com resulting in the following image:

enter image description here

Is there any tool to visualize parsed tree for jtb & javacc like antlrworks

Walk the tree. Spit out nodes and arcs in graphviz ("dot") format. Invoke graphviz on the result.

This won't be very useful for more than a hundred nodes, because it isn't very dense.

Remarkably, a more scalable way to is print a nested S-expression, in the following format:

<depth_from_root*spaces> ( <nodetype>  <newline>
<childnode1_as_S-expression>
<childnode2_as_S-expression>
...
<childnodeN_as_S-expression>
)<nodetype>

This in effect prints the tree sideways, e.g., with the root
at the left and children to the right. You can print out very
big trees this way, and still be able to read them (if you
can scroll up and down through the text).

As an example: for a*(b+c)-d:

   (-
(*
(+
(variable b)
(variable c)
)+
)*
(constant 1)
)-

This is also easily done with a tree walk. You can easily make the printed
version more dense, or add more information.

See examples of both of these, here.

Skip part of a tree when parsing an ANTLR AST

You may have figured this out already, but I've used . or .* in my tree grammars to skip either a given node or any number of nodes.

For example, I have a DSL that allows function declarations, and one of my tree grammars just cares about names and arguments, but not the contents (which could be arbitrarily long). I skip the processing of the code block using .* as a placeholder:

^(Function type_specifier? variable_name formal_parameters implemented_by? .*)

I don't know about the runtime performance hit, if any, but I'm not using this construct in any areas where performance is an issue for my application.

Whats the correct way to add new tokens (rewrite) to create AST nodes that are not on the input steam

First of all, you did a great job given the fact that you've never used ANTLR before.

You can omit the language=Java and ASTLabelType=CommonTree, which are the default values. So you can just do:

options {
output=AST;
}

Also, you don't have to specify the root node for each operator separately. So you don't have to do:

(ADD^ | SUB^)

but the following:

(ADD | SUB)^

will suffice. With only two operators, there's not much difference, but when implementing relational operators (>=, <=, > and <), the latter is a bit easier.

Now, for you AST: you'll probably want to create a binary tree: that way, all internal nodes are operators, and the leafs will be operands which makes the actual evaluating of your expressions much easier. To get a binary tree, you'll have to change your atom rule slightly:

atom
: INTEGER
| (
OPARN expression CPARN -> expression
)
(
OPARN e=expression CPARN -> ^(MUL $atom $e)
)*
;

which produces the following AST given the input "(2-3)(4+5)(6*7)":

enter image description here

(image produced by: graphviz-dev.appspot.com)

The DOT file was generated with the following test-class:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
public static void main(String[] args) throws Exception {
GLexer lexer = new GLexer(new ANTLRStringStream("(2-3)(4+5)(6*7)"));
GParser parser = new GParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.start().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}

ANTLR ignoring AST operators

ANTLRWorks' interpreter shows the parse tree, even though you've put output=AST; in the grammar's options-block, as you already found yourself.

However, ANTLRWorks' debugger does show the AST. To activate the debugger, press CTL+D or select Run->Debug from the menu bar.

The debugger is able to visualize the parse tree and AST. Note that it will not handle embedded code other than Java.

enter image description here

Antlr AST generating (possible) madness

I don't think there's an easy way to do that. You could make your rule like this:

function
: ID PERIOD function
-> ^(function ID)
| ID
;

but that only makes the last node the root and all other nodes its children. For example, the following source:

a.b.c.d.e

will result in the following tree:

    e
/ / \ \
d c b a

I can't see an easy fix since when you first parse a.b.c.d.e, a will be the ID and b.c.d.e the recursive call to function:

a.b.c.d.e
| +-----+
| |
| `-----> function
|
`----------> ID

resulting in the fact that b.c.d.e will have a as its child. When then b becomes the ID, it too is added as a child next to a. In your case, a should be removed as a child and then added to the list of b's children. But AFAIK, that is not possible in ANLTR (at least, not in a clean way inside the grammar).


EDIT

Okay, as a work-around I had something elegant in mind, but that didn't work as I had hoped. So, as a less elegant solution, you could match the last node as the root in your rewrite rule:

function
: (id '.')* last=id -> ^($last)
;

and then collect all possible preceding nodes (children) in a List using the += operator:

function
: (children+=id '.')* last=id -> ^($last)
;

and use a custom member-method in the parser to "inject" these children into the root of your tree (going from right to left in your List!):

function
: (children+=id '.')* last=id {reverse($children, (CommonTree)$last.tree);} -> ^($last)
;

A little demo:

grammar ReverseTree;

options {
output=AST;
}

tokens {
ROOT;
}

@members {
private void reverse(List nodes, CommonTree root) {
if(nodes == null) return;
for(int i = nodes.size()-1; i >= 0; i--) {
CommonTree temp = (CommonTree)nodes.get(i);
root.addChild(temp);
root = temp;
}
}
}

parse
: function+ EOF -> ^(ROOT function+)
;

function
: (children+=id '.')* last=id {reverse($children, (CommonTree)$last.tree);} -> ^($last)
;

id
: ID
;

ID
: ('a'..'z' | 'A'..'Z')+
;

Space
: ' ' {skip();}
;

And a little test class:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("a.b.c.d.e Stack.Overflow.Horse singleNode");
ReverseTreeLexer lexer = new ReverseTreeLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
ReverseTreeParser parser = new ReverseTreeParser(tokens);
ReverseTreeParser.parse_return returnValue = parser.parse();
CommonTree tree = (CommonTree)returnValue.getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}

which will produce an AST that looks like:

alt text

  • (image generated using http://graph.gafol.net)

For the input string:

"a.b.c.d.e    Stack.Overflow.Horse    singleNode"

Generate AST in the form of a dot file

The answer i was looking for is answered here by Bart Kiers but for people who want to generate DOT files without modifying the grammar can print a intended syntax tree by taking help from this repository.
Since i did not find much documentation on dot generation in ANTLR4 and no other option other than to modify ANTLR3 grammar file, i used Federico Tomassetti example and modified it a little to generate our own DOT file.

You can print a Dot file output by:

import java.io.File;
import java.io.IOException;

import java.util.ArrayList;

import java.nio.charset.Charset;
import java.nio.file.Files;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.RuleContext;
import org.antlr.v4.runtime.tree.ParseTree;

public class ASTGenerator {

static ArrayList<String> LineNum = new ArrayList<String>();
static ArrayList<String> Type = new ArrayList<String>();
static ArrayList<String> Content = new ArrayList<String>();

private static String readFile() throws IOException {
File file = new File("resource/java/Blabla.java");
byte[] encoded = Files.readAllBytes(file.toPath());
return new String(encoded, Charset.forName("UTF-8"));
}

public static void main(String args[]) throws IOException{
String inputString = readFile();
ANTLRInputStream input = new ANTLRInputStream(inputString);
Java8Lexer lexer = new Java8Lexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
Java8Parser parser = new Java8Parser(tokens);
ParserRuleContext ctx = parser.compilationUnit();

generateAST(ctx, false, 0);

System.out.println("digraph G {");
printDOT();
System.out.println("}");

}

private static void generateAST(RuleContext ctx, boolean verbose, int indentation) {
boolean toBeIgnored = !verbose && ctx.getChildCount() == 1 && ctx.getChild(0) instanceof ParserRuleContext;

if (!toBeIgnored) {
String ruleName = Java8Parser.ruleNames[ctx.getRuleIndex()];
LineNum.add(Integer.toString(indentation));
Type.add(ruleName);
Content.add(ctx.getText());
}
for (int i = 0; i < ctx.getChildCount(); i++) {
ParseTree element = ctx.getChild(i);
if (element instanceof RuleContext) {
generateAST((RuleContext) element, verbose, indentation + (toBeIgnored ? 0 : 1));
}
}
}

private static void printDOT(){
printLabel();
int pos = 0;
for(int i=1; i<LineNum.size();i++){
pos=getPos(Integer.parseInt(LineNum.get(i))-1, i);
System.out.println((Integer.parseInt(LineNum.get(i))-1)+Integer.toString(pos)+"->"+LineNum.get(i)+i);
}
}

private static void printLabel(){
for(int i =0; i<LineNum.size(); i++){
System.out.println(LineNum.get(i)+i+"[label=\""+Type.get(i)+"\\n "+Content.get(i)+" \"]");
}
}

private static int getPos(int n, int limit){
int pos = 0;
for(int i=0; i<limit;i++){
if(Integer.parseInt(LineNum.get(i))==n){
pos = i;
}
}
return pos;
}
}

For a source code like this:

class example{
public static void main(){
int a;
a = 5;
}
}

Output will be:

digraph G {
00[label="compilationUnit\n classexample{publicstaticvoidmain(){inta;a=5;}}<EOF> "]
11[label="normalClassDeclaration\n classexample{publicstaticvoidmain(){inta;a=5;}} "]
22[label="classBody\n {publicstaticvoidmain(){inta;a=5;}} "]
33[label="methodDeclaration\n publicstaticvoidmain(){inta;a=5;} "]
44[label="methodModifier\n public "]
45[label="methodModifier\n static "]
46[label="methodHeader\n voidmain() "]
57[label="result\n void "]
58[label="methodDeclarator\n main() "]
49[label="block\n {inta;a=5;} "]
510[label="blockStatements\n inta;a=5; "]
611[label="localVariableDeclarationStatement\n inta; "]
712[label="localVariableDeclaration\n inta "]
813[label="integralType\n int "]
814[label="variableDeclaratorId\n a "]
615[label="expressionStatement\n a=5; "]
716[label="assignment\n a=5 "]
817[label="expressionName\n a "]
818[label="assignmentOperator\n = "]
819[label="literal\n 5 "]
00->11
11->22
22->33
33->44
33->45
33->46
46->57
46->58
33->49
49->510
510->611
611->712
712->813
712->814
510->615
615->716
716->817
716->818
716->819
}

Insert this piece of output in http://viz-js.com/
You will get this as ouput:

You can also pipe the output to ast.dot file by:

java -jar path-to-jar-file.jar > ast.dot

Now this is not the perfect method but enough for me. :)

Hope this helps.



Related Topics



Leave a reply



Submit