How to Create Ast with Antlr4

How to create AST with ANTLR4?

Ok, let's build a simple math example. Building an AST is totally overkill for such a task but it's a nice way to show the principle.

I'll do it in C# but the Java version would be very similar.

The grammar

First, let's write a very basic math grammar to work with:

grammar Math;

compileUnit
    :   expr EOF
    ;

expr
    :   '(' expr ')'                         # parensExpr
    |   op=('+'|'-') expr                    # unaryExpr
    |   left=expr op=('*'|'/') right=expr    # infixExpr
    |   left=expr op=('+'|'-') right=expr    # infixExpr
    |   func=ID '(' expr ')'                 # funcExpr
    |   value=NUM                            # numberExpr
    ;

OP_ADD: '+';
OP_SUB: '-';
OP_MUL: '*';
OP_DIV: '/';

NUM :   [0-9]+ ('.' [0-9]+)? ([eE] [+-]? [0-9]+)?;
ID  :   [a-zA-Z]+;
WS  :   [ \t\r\n] -> channel(HIDDEN);

Pretty basic stuff, we have a single expr rule that handles everything (precedence rules etc).

The AST nodes

Then, let's define some AST nodes we'll use. These are totally custom and you can define them in the way you want to.

Here are the nodes we'll be using for this example:

internal abstract class ExpressionNode
{
}

internal abstract class InfixExpressionNode : ExpressionNode
{
    public ExpressionNode Left { get; set; }
    public ExpressionNode Right { get; set; }
}

internal class AdditionNode : InfixExpressionNode
{
}

internal class SubtractionNode : InfixExpressionNode
{
}

internal class MultiplicationNode : InfixExpressionNode
{
}

internal class DivisionNode : InfixExpressionNode
{
}

internal class NegateNode : ExpressionNode
{
    public ExpressionNode InnerNode { get; set; }
}

internal class FunctionNode : ExpressionNode
{
    public Func<double, double> Function { get; set; }
    public ExpressionNode Argument { get; set; }
}

internal class NumberNode : ExpressionNode
{
    public double Value { get; set; }
}

Converting a CST to an AST

ANTLR generated the CST nodes for us (the MathParser.*Context classes). We now have to convert these to AST nodes.

This is easily done with a visitor, and ANTLR provides us with a MathBaseVisitor<T> class, so let's work with that.

internal class BuildAstVisitor : MathBaseVisitor<ExpressionNode>
{
    public override ExpressionNode VisitCompileUnit(MathParser.CompileUnitContext context)
    {
        return Visit(context.expr());
    }

    public override ExpressionNode VisitNumberExpr(MathParser.NumberExprContext context)
    {
        return new NumberNode
        {
            Value = double.Parse(context.value.Text, NumberStyles.AllowDecimalPoint | NumberStyles.AllowExponent)
        };
    }

    public override ExpressionNode VisitParensExpr(MathParser.ParensExprContext context)
    {
        return Visit(context.expr());
    }

    public override ExpressionNode VisitInfixExpr(MathParser.InfixExprContext context)
    {
        InfixExpressionNode node;

        switch (context.op.Type)
        {
            case MathLexer.OP_ADD:
                node = new AdditionNode();
                break;

            case MathLexer.OP_SUB:
                node = new SubtractionNode();
                break;

            case MathLexer.OP_MUL:
                node = new MultiplicationNode();
                break;

            case MathLexer.OP_DIV:
                node = new DivisionNode();
                break;

            default:
                throw new NotSupportedException();
        }

        node.Left = Visit(context.left);
        node.Right = Visit(context.right);

        return node;
    }

    public override ExpressionNode VisitUnaryExpr(MathParser.UnaryExprContext context)
    {
        switch (context.op.Type)
        {
            case MathLexer.OP_ADD:
                return Visit(context.expr());

            case MathLexer.OP_SUB:
                return new NegateNode
                {
                    InnerNode = Visit(context.expr())
                };

            default:
                throw new NotSupportedException();
        }
    }

    public override ExpressionNode VisitFuncExpr(MathParser.FuncExprContext context)
    {
        var functionName = context.func.Text;

        var func = typeof(Math)
            .GetMethods(BindingFlags.Public | BindingFlags.Static)
            .Where(m => m.ReturnType == typeof(double))
            .Where(m => m.GetParameters().Select(p => p.ParameterType).SequenceEqual(new[] { typeof(double) }))
            .FirstOrDefault(m => m.Name.Equals(functionName, StringComparison.OrdinalIgnoreCase));

        if (func == null)
            throw new NotSupportedException(string.Format("Function {0} is not supported", functionName));

        return new FunctionNode
        {
            Function = (Func<double, double>)func.CreateDelegate(typeof(Func<double, double>)),
            Argument = Visit(context.expr())
        };
    }
}

As you can see, it's just a matter of creating an AST node out of a CST node by using a visitor. The code should be pretty self-explanatory (well, maybe except for the VisitFuncExpr stuff, but it's just a quick way to wire up a delegate to a suitable method of the System.Math class).

And here you have the AST building stuff. That's all that's needed. Just extract the relevant information from the CST and keep it in the AST.

The AST visitor

Now, let's play a bit with the AST. We'll have to build an AST visitor base class to traverse it. Let's just do something similar to the AbstractParseTreeVisitor<T> provided by ANTLR.

internal abstract class AstVisitor<T>
{
    public abstract T Visit(AdditionNode node);
    public abstract T Visit(SubtractionNode node);
    public abstract T Visit(MultiplicationNode node);
    public abstract T Visit(DivisionNode node);
    public abstract T Visit(NegateNode node);
    public abstract T Visit(FunctionNode node);
    public abstract T Visit(NumberNode node);

    public T Visit(ExpressionNode node)
    {
        return Visit((dynamic)node);
    }
}

Here, I took advantage of C#'s dynamic keyword to perform a double-dispatch in one line of code. In Java, you'll have to do the wiring yourself with a sequence of if statements like these:

if (node is AdditionNode) {
    return Visit((AdditionNode)node);
} else if (node is SubtractionNode) {
    return Visit((SubtractionNode)node);
} else if ...

But I just went for the shortcut for this example.

Work with the AST

So, what can we do with a math expression tree? Evaluate it, of course! Let's implement an expression evaluator:

internal class EvaluateExpressionVisitor : AstVisitor<double>
{
    public override double Visit(AdditionNode node)
    {
        return Visit(node.Left) + Visit(node.Right);
    }

    public override double Visit(SubtractionNode node)
    {
        return Visit(node.Left) - Visit(node.Right);
    }

    public override double Visit(MultiplicationNode node)
    {
        return Visit(node.Left) * Visit(node.Right);
    }

    public override double Visit(DivisionNode node)
    {
        return Visit(node.Left) / Visit(node.Right);
    }

    public override double Visit(NegateNode node)
    {
        return -Visit(node.InnerNode);
    }

    public override double Visit(FunctionNode node)
    {
        return node.Function(Visit(node.Argument));
    }

    public override double Visit(NumberNode node)
    {
        return node.Value;
    }
}

Pretty simple once we have an AST, isn't it?

Putting it all together

Last but not least, we have to actually write the main program:

internal class Program
{
    private static void Main()
    {
        while (true)
        {
            Console.Write("> ");
            var exprText = Console.ReadLine();

            if (string.IsNullOrWhiteSpace(exprText))
                break;

            var inputStream = new AntlrInputStream(new StringReader(exprText));
            var lexer = new MathLexer(inputStream);
            var tokenStream = new CommonTokenStream(lexer);
            var parser = new MathParser(tokenStream);

            try
            {
                var cst = parser.compileUnit();
                var ast = new BuildAstVisitor().VisitCompileUnit(cst);
                var value = new EvaluateExpressionVisitor().Visit(ast);

                Console.WriteLine("= {0}", value);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }

            Console.WriteLine();
        }
    }
}

And now we can finally play with it:

Sample Image

ANTLR4 AST Creation - How to create an AstVistor

If you want to get the individual subrules then implement the visitXXX functions for them (visitConstDecl(), visitVarDef() etc.) instead of the visitMini() function. They will only be called if there's really a match for them in the input. Hence you don't need to do any checks for occurences.

build AST in antlr4

There are two key items I'd like to point out first:

Efficiency covers more than simple speed of execution. It can also apply to memory overhead, maintainability, and the effectiveness of the long-term development strategy used by the team creating and/or using a piece of software.
You are hand-waving in your argument that parse trees are slower than ASTs. I've been testing performance of ANTLR 3 and ANTLR 4 for years, and there is no question at this point that ANTLR 4's parse trees obliterate the performance of ASTs in ANTLR 3.

That said, the decision to use parse trees exclusively in ANTLR 4 was based on observations that the actual maintainability of separated parsers and tree parsers with ASTs in ANTLR 3 did not meet the expected maintainability, and therefore decreased overall efficiency to the point that they were no longer the recommended practice.

In addition, automated parse tree construction in ANTLR 4 allowed me to implement declarative rule versioning, which I then applied to code throughout ANTLRWorks 2. This allowed me to rapidly correct every portion of a large codebase relying on the shape of the parse tree, without introducing regression bugs in the final product.

Create Java 8 AST with ANTLR4 programmatically

You are missing one step. You got the java8 grammar but you haven't yet created a a parser from it. This usually involves running the antlr4 jar on the grammar file (FAQs and more), which is very simple (example taken from the Getting Started page):

$ antlr4 Hello.g4
$ javac Hello*.java

How to Create Ast with Antlr4