Using Antlr 3.3

Using ANTLR 3.3?

Let's say you want to parse simple expressions consisting of the following tokens:

- subtraction (also unary);
+ addition;
* multiplication;
/ division;
(...) grouping (sub) expressions;
integer and decimal numbers.

An ANTLR grammar could look like this:

grammar Expression;

options {
  language=CSharp2;
}

parse
  :  exp EOF 
  ;

exp
  :  addExp
  ;

addExp
  :  mulExp (('+' | '-') mulExp)*
  ;

mulExp
  :  unaryExp (('*' | '/') unaryExp)*
  ;

unaryExp
  :  '-' atom 
  |  atom
  ;

atom
  :  Number
  |  '(' exp ')' 
  ;

Number
  :  ('0'..'9')+ ('.' ('0'..'9')+)?
  ;

Now to create a proper AST, you add output=AST; in your options { ... } section, and you mix some "tree operators" in your grammar defining which tokens should be the root of a tree. There are two ways to do this:

add ^ and ! after your tokens. The ^ causes the token to become a root and the ! excludes the token from the ast;
by using "rewrite rules": ... -> ^(Root Child Child ...).

Take the rule foo for example:

foo
  :  TokenA TokenB TokenC TokenD
  ;

and let's say you want TokenB to become the root and TokenA and TokenC to become its children, and you want to exclude TokenD from the tree. Here's how to do that using option 1:

foo
  :  TokenA TokenB^ TokenC TokenD!
  ;

and here's how to do that using option 2:

foo
  :  TokenA TokenB TokenC TokenD -> ^(TokenB TokenA TokenC)
  ;

So, here's the grammar with the tree operators in it:

grammar Expression;

options {
  language=CSharp2;
  output=AST;
}

tokens {
  ROOT;
  UNARY_MIN;
}

@parser::namespace { Demo.Antlr }
@lexer::namespace { Demo.Antlr }

parse
  :  exp EOF -> ^(ROOT exp)
  ;

exp
  :  addExp
  ;

addExp
  :  mulExp (('+' | '-')^ mulExp)*
  ;

mulExp
  :  unaryExp (('*' | '/')^ unaryExp)*
  ;

unaryExp
  :  '-' atom -> ^(UNARY_MIN atom)
  |  atom
  ;

atom
  :  Number
  |  '(' exp ')' -> exp
  ;

Number
  :  ('0'..'9')+ ('.' ('0'..'9')+)?
  ;

Space 
  :  (' ' | '\t' | '\r' | '\n'){Skip();}
  ;

I also added a Space rule to ignore any white spaces in the source file and added some extra tokens and namespaces for the lexer and parser. Note that the order is important (options { ... } first, then tokens { ... } and finally the @... {}-namespace declarations).

That's it.

Now generate a lexer and parser from your grammar file:


java -cp antlr-3.2.jar org.antlr.Tool Expression.g

and put the .cs files in your project together with the C# runtime DLL's.

You can test it using the following class:

using System;
using Antlr.Runtime;
using Antlr.Runtime.Tree;
using Antlr.StringTemplate;

namespace Demo.Antlr
{
  class MainClass
  {
    public static void Preorder(ITree Tree, int Depth) 
    {
      if(Tree == null)
      {
        return;
      }

      for (int i = 0; i < Depth; i++)
      {
        Console.Write("  ");
      }

      Console.WriteLine(Tree);

      Preorder(Tree.GetChild(0), Depth + 1);
      Preorder(Tree.GetChild(1), Depth + 1);
    }

    public static void Main (string[] args)
    {
      ANTLRStringStream Input = new ANTLRStringStream("(12.5 + 56 / -7) * 0.5"); 
      ExpressionLexer Lexer = new ExpressionLexer(Input);
      CommonTokenStream Tokens = new CommonTokenStream(Lexer);
      ExpressionParser Parser = new ExpressionParser(Tokens);
      ExpressionParser.parse_return ParseReturn = Parser.parse();
      CommonTree Tree = (CommonTree)ParseReturn.Tree;
      Preorder(Tree, 0);
    }
  }
}

which produces the following output:


ROOT
  *
    +
      12.5
      /
        56
        UNARY_MIN
          7
    0.5

which corresponds to the following AST:

alt text

(diagram created using graph.gafol.net)

Note that ANTLR 3.3 has just been released and the CSharp target is "in beta". That's why I used ANTLR 3.2 in my example.

In case of rather simple languages (like my example above), you could also evaluate the result on the fly without creating an AST. You can do that by embedding plain C# code inside your grammar file, and letting your parser rules return a specific value.

Here's an example:

grammar Expression;

options {
  language=CSharp2;
}

@parser::namespace { Demo.Antlr }
@lexer::namespace { Demo.Antlr }

parse returns [double value]
  :  exp EOF {$value = $exp.value;}
  ;

exp returns [double value]
  :  addExp {$value = $addExp.value;}
  ;

addExp returns [double value]
  :  a=mulExp       {$value = $a.value;}
     ( '+' b=mulExp {$value += $b.value;}
     | '-' b=mulExp {$value -= $b.value;}
     )*
  ;

mulExp returns [double value]
  :  a=unaryExp       {$value = $a.value;}
     ( '*' b=unaryExp {$value *= $b.value;}
     | '/' b=unaryExp {$value /= $b.value;}
     )*
  ;

unaryExp returns [double value]
  :  '-' atom {$value = -1.0 * $atom.value;}
  |  atom     {$value = $atom.value;}
  ;

atom returns [double value]
  :  Number      {$value = Double.Parse($Number.Text, CultureInfo.InvariantCulture);}
  |  '(' exp ')' {$value = $exp.value;}
  ;

Number
  :  ('0'..'9')+ ('.' ('0'..'9')+)?
  ;

Space 
  :  (' ' | '\t' | '\r' | '\n'){Skip();}
  ;

which can be tested with the class:

using System;
using Antlr.Runtime;
using Antlr.Runtime.Tree;
using Antlr.StringTemplate;

namespace Demo.Antlr
{
  class MainClass
  {
    public static void Main (string[] args)
    {
      string expression = "(12.5 + 56 / -7) * 0.5";
      ANTLRStringStream Input = new ANTLRStringStream(expression);  
      ExpressionLexer Lexer = new ExpressionLexer(Input);
      CommonTokenStream Tokens = new CommonTokenStream(Lexer);
      ExpressionParser Parser = new ExpressionParser(Tokens);
      Console.WriteLine(expression + " = " + Parser.parse());
    }
  }
}

and produces the following output:


(12.5 + 56 / -7) * 0.5 = 2.25

EDIT

In the comments, Ralph wrote:

Tip for those using Visual Studio: you can put something like java -cp "$(ProjectDir)antlr-3.2.jar" org.antlr.Tool "$(ProjectDir)Expression.g" in the pre-build events, then you can just modify your grammar and run the project without having to worry about rebuilding the lexer/parser.

Antlr 3.3 return values in java

First, in your grammar, speech only gets assigned the return value of parser rule wordExp. If you want to manipulate the return value of rule name as well, you can do this with an additional variable like the example below.

text returns [String value]
 : a=wordExp space b=name {$value = $a.text+" "+$b.text;}
 ;

Second, invoking parser.text() parses the entire input. A second invocation (in your case parser.wordExp()) thus finds EOF. If you remove the second call the no viable alternative at input 'EOF' goes away.

There may be a better way to do this, but in the meantime this may help you out.

Problems building ANTLR v3.3 from source : antlr3-maven-archetype missing

Tried this myself. Got the same error.

Commented the offending module in the parent pom.xml

 <!--module>antlr3-maven-archetype</module-->

Built successfully.

Not sure about your requirement, but hopefully you may still achieve it with this workaround.

Edit 1: You can safely ignore all the warnings (related to versions), which are due to running a maven2 pom with maven3.

However you should not be getting this error:

error: file "/Users/sp2/Desktop/antlrtst/antlr-3
2.3/tool/src/main/antlr2/org/antlr/grammar/v2/codegen.g"
not found

This file exists in the source distribution. Interestingly your folder shows "antlr-3.2.3", while other messages are related to 3.3. Could it be that you have incorrect/missing source?

Why are antlr3 c# parser methods private?

Make at least one parser rule "public" like this:

grammar T;

options {
  language=CSharp2;
}

public parse
  :  privateRule+ EOF
  ;

privateRule
  :  Token+
  ;

// ...

You can then call parse() on the generated parser.

protected and private (the default if nothing is specified) are also supported.

An ANTLR grammar that behaves strangely

It recognizes both both "a a a" and "b b b", as it should. To check for yourself, add a little debug-print statement to your statement rule:

grammar Test2;

options {
  language = Java;
}

statement: ( a|b )* {System.out.println("parsed: " + $text);};

a: 'a';

b: 'b';

WS: ('\n'|' '|'\t'|'r'|'\f')+ {$channel=HIDDEN;};

and then test the parser with the following class:

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    Test2Lexer lexer = new Test2Lexer(new ANTLRStringStream(args[0]));
    Test2Parser parser = new Test2Parser(new CommonTokenStream(lexer));
    parser.statement();
  }
}

After generating a lexer & parser and compiling all .java source files:

java -cp antlr-3.3.jar org.antlr.Tool Test2.g
javac -cp antlr-3.3.jar *.java

you can test the parser with "a a a" as input:

java -cp .:antlr-3.3.jar Main "a a a"
parsed: a a a

and with "b b b" as input:

java -cp .:antlr-3.3.jar Main "b b b"
parsed: b b b

As you can see, in both cases no error is reported, and the input is printed back to the console.

My guess is that you're using ANTLRWorks' interpreter. Don't. It is notoriously buggy. Either use ANTLRWorks' debugger (it's great!), or use a small test class you write yourself (similar to my Main class).

EDIT

Note that the ANTLR IDE Eclipse plugin uses the interpreter from ANTLRWorks, AFAIK.

In antlr, is there a way to get parsed text of a CommonTree in AST mode?

Is there a better way to do this?

Better, I don't know. There is another way, of course. You decide what's better.

Another option would be to create a custom AST node class (and corresponding node-adapter) and add the matched text to this AST node during parsing. The trick here is to not use skip(), which discards the token from the lexer, but to put it on the HIDDEN channel. This is effectively the same, however, the text these (hidden) tokens match are still available in the parser.

A quick demo: put all these 3 file in a directory named demo:

demo/T.g

grammar T;

options {
  output=AST;
  ASTLabelType=XTree;
}

@parser::header {
  package demo;
  import demo.*;
}

@lexer::header {
  package demo;
  import demo.*;
}

parse
 : expr EOF -> expr
 ;

expr
@after{$expr.tree.matched = $expr.text;}
 : Int '+' Int ';' -> ^('+' Int Int)
 ;

Int
 : '0'..'9'+
 ;

Space
 : ' ' {$channel=HIDDEN;}
 ;

demo/XTree.java

package demo;

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class XTree extends CommonTree {

  protected String matched;

  public XTree(Token t) {
    super(t);
    matched = null;
  }
}

demo/Main.java

package demo;

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {

  public static void main(String[] args) throws Exception {
    String source = "12    +  42 ;";
    TLexer lexer = new TLexer(new ANTLRStringStream(source));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    parser.setTreeAdaptor(new CommonTreeAdaptor(){
      @Override
      public Object create(Token t) {
        return new XTree(t);
      }
    }); 
    XTree root = (XTree)parser.parse().getTree();
    System.out.println("tree    : " + root.toStringTree());
    System.out.println("matched : " + root.matched);    
  }
}

You can run this demo by opening a shell and cd-ing to the directory that holds the demo directory and execute the following:

java -cp demo/antlr-3.3.jar org.antlr.Tool demo/T.g
javac -cp demo/antlr-3.3.jar demo/*.java
java -cp .:demo/antlr-3.3.jar demo.Main

which will produce the following output:

tree    : (+ 12 42)
matched : 12    +  42 ;

Why does Antlr think there is a missing bracket?

As already mentioned by others, your expression has to end with a EOF, but a nested expression cannot end with an EOF, of course.

Remove the EOF from expression, and create a proper "entry point" for your parser that ends with the EOF.

file: T.g

grammar T;

options {
  output=AST;
}

parse
  :  expression EOF!
  ;

expression
  :  '('! ('&' | '||' | '!')^ (atom | expression)* ')'!
  ;

atom
  :  '('! ITEM '='^ ITEM ')'!
  ;

ITEM        
  :  ALPHANUMERIC+
  ;

fragment ALPHANUMERIC
  :  ('a'..'z' | 'A'..'Z' | '0'..'9')
  ;

WHITESPACE 
  :  (' ' | '\t' | '\r' | '\n') { skip(); }
  ;

file: Main.java

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String source = "(||(attr=hello2)(!(attr2=12)))";
    TLexer lexer = new TLexer(new ANTLRStringStream(source));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    CommonTree tree = (CommonTree)parser.parse().getTree();
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(tree);
    System.out.println(st);
  }
}

To run the demo, do:

*nix/MacOS:

java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main

Windows:

java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar Main

which produces the DOT code representing the following AST:

Sample Image

image created using graphviz-dev.appspot.com

Hints regarding the use of ANTLR v3 generated files in Flash Builder 4.5.1

Terence Parr, the project lead of ANTLR just confirmed, that ANTLRworks needs a new compile. Thanks for great support!

ANTLR grammar tutorials

I think this one is the best book about ANTLR parser generator. I read this book, when i was writing my own MSIL compiler.
You can also look at my compiler (it may be not working) for the .g grammar as an example.

p.s.

Official ANTLR project web-site share good samples about grammar rules and parser generating.

Using ANTLR to generate lexer/parser as streams

I am not 100% sure if I understand your question correctly, but you might want to have a look at https://stackoverflow.com/a/38052798/5068458. This is an in-memory compiler for antlr grammars that generates lexer and parser for a given grammar in-memory. You do not have to do it manually. Code examples are provided there.

Best,
Julian

Using Antlr 3.3

Using ANTLR 3.3?

EDIT

Antlr 3.3 return values in java

Problems building ANTLR v3.3 from source : antlr3-maven-archetype missing

Why are antlr3 c# parser methods private?

An ANTLR grammar that behaves strangely

EDIT

In antlr, is there a way to get parsed text of a CommonTree in AST mode?

demo/T.g

demo/XTree.java

demo/Main.java

Why does Antlr think there is a missing bracket?

file: T.g

file: Main.java

*nix/MacOS:

Windows:

Hints regarding the use of ANTLR v3 generated files in Flash Builder 4.5.1

ANTLR grammar tutorials

Using ANTLR to generate lexer/parser as streams

Related Topics

Leave a reply