Using ANTLR 3.3?
Let's say you want to parse simple expressions consisting of the following tokens:
-
subtraction (also unary);+
addition;*
multiplication;/
division;(...)
grouping (sub) expressions;- integer and decimal numbers.
An ANTLR grammar could look like this:
grammar Expression;
options {
language=CSharp2;
}
parse
: exp EOF
;
exp
: addExp
;
addExp
: mulExp (('+' | '-') mulExp)*
;
mulExp
: unaryExp (('*' | '/') unaryExp)*
;
unaryExp
: '-' atom
| atom
;
atom
: Number
| '(' exp ')'
;
Number
: ('0'..'9')+ ('.' ('0'..'9')+)?
;
Now to create a proper AST, you add output=AST;
in your options { ... }
section, and you mix some "tree operators" in your grammar defining which tokens should be the root of a tree. There are two ways to do this:
- add
^
and!
after your tokens. The^
causes the token to become a root and the!
excludes the token from the ast; - by using "rewrite rules":
... -> ^(Root Child Child ...)
.
Take the rule foo
for example:
foo
: TokenA TokenB TokenC TokenD
;
and let's say you want TokenB
to become the root and TokenA
and TokenC
to become its children, and you want to exclude TokenD
from the tree. Here's how to do that using option 1:
foo
: TokenA TokenB^ TokenC TokenD!
;
and here's how to do that using option 2:
foo
: TokenA TokenB TokenC TokenD -> ^(TokenB TokenA TokenC)
;
So, here's the grammar with the tree operators in it:
grammar Expression;
options {
language=CSharp2;
output=AST;
}
tokens {
ROOT;
UNARY_MIN;
}
@parser::namespace { Demo.Antlr }
@lexer::namespace { Demo.Antlr }
parse
: exp EOF -> ^(ROOT exp)
;
exp
: addExp
;
addExp
: mulExp (('+' | '-')^ mulExp)*
;
mulExp
: unaryExp (('*' | '/')^ unaryExp)*
;
unaryExp
: '-' atom -> ^(UNARY_MIN atom)
| atom
;
atom
: Number
| '(' exp ')' -> exp
;
Number
: ('0'..'9')+ ('.' ('0'..'9')+)?
;
Space
: (' ' | '\t' | '\r' | '\n'){Skip();}
;
I also added a Space
rule to ignore any white spaces in the source file and added some extra tokens and namespaces for the lexer and parser. Note that the order is important (options { ... }
first, then tokens { ... }
and finally the @... {}
-namespace declarations).
That's it.
Now generate a lexer and parser from your grammar file:
java -cp antlr-3.2.jar org.antlr.Tool Expression.g
and put the .cs
files in your project together with the C# runtime DLL's.
You can test it using the following class:
using System;
using Antlr.Runtime;
using Antlr.Runtime.Tree;
using Antlr.StringTemplate;
namespace Demo.Antlr
{
class MainClass
{
public static void Preorder(ITree Tree, int Depth)
{
if(Tree == null)
{
return;
}
for (int i = 0; i < Depth; i++)
{
Console.Write(" ");
}
Console.WriteLine(Tree);
Preorder(Tree.GetChild(0), Depth + 1);
Preorder(Tree.GetChild(1), Depth + 1);
}
public static void Main (string[] args)
{
ANTLRStringStream Input = new ANTLRStringStream("(12.5 + 56 / -7) * 0.5");
ExpressionLexer Lexer = new ExpressionLexer(Input);
CommonTokenStream Tokens = new CommonTokenStream(Lexer);
ExpressionParser Parser = new ExpressionParser(Tokens);
ExpressionParser.parse_return ParseReturn = Parser.parse();
CommonTree Tree = (CommonTree)ParseReturn.Tree;
Preorder(Tree, 0);
}
}
}
which produces the following output:
ROOT
*
+
12.5
/
56
UNARY_MIN
7
0.5
which corresponds to the following AST:
(diagram created using graph.gafol.net)
Note that ANTLR 3.3 has just been released and the CSharp target is "in beta". That's why I used ANTLR 3.2 in my example.
In case of rather simple languages (like my example above), you could also evaluate the result on the fly without creating an AST. You can do that by embedding plain C# code inside your grammar file, and letting your parser rules return a specific value.
Here's an example:
grammar Expression;
options {
language=CSharp2;
}
@parser::namespace { Demo.Antlr }
@lexer::namespace { Demo.Antlr }
parse returns [double value]
: exp EOF {$value = $exp.value;}
;
exp returns [double value]
: addExp {$value = $addExp.value;}
;
addExp returns [double value]
: a=mulExp {$value = $a.value;}
( '+' b=mulExp {$value += $b.value;}
| '-' b=mulExp {$value -= $b.value;}
)*
;
mulExp returns [double value]
: a=unaryExp {$value = $a.value;}
( '*' b=unaryExp {$value *= $b.value;}
| '/' b=unaryExp {$value /= $b.value;}
)*
;
unaryExp returns [double value]
: '-' atom {$value = -1.0 * $atom.value;}
| atom {$value = $atom.value;}
;
atom returns [double value]
: Number {$value = Double.Parse($Number.Text, CultureInfo.InvariantCulture);}
| '(' exp ')' {$value = $exp.value;}
;
Number
: ('0'..'9')+ ('.' ('0'..'9')+)?
;
Space
: (' ' | '\t' | '\r' | '\n'){Skip();}
;
which can be tested with the class:
using System;
using Antlr.Runtime;
using Antlr.Runtime.Tree;
using Antlr.StringTemplate;
namespace Demo.Antlr
{
class MainClass
{
public static void Main (string[] args)
{
string expression = "(12.5 + 56 / -7) * 0.5";
ANTLRStringStream Input = new ANTLRStringStream(expression);
ExpressionLexer Lexer = new ExpressionLexer(Input);
CommonTokenStream Tokens = new CommonTokenStream(Lexer);
ExpressionParser Parser = new ExpressionParser(Tokens);
Console.WriteLine(expression + " = " + Parser.parse());
}
}
}
and produces the following output:
(12.5 + 56 / -7) * 0.5 = 2.25
EDIT
In the comments, Ralph wrote:
Tip for those using Visual Studio: you can put something like
java -cp "$(ProjectDir)antlr-3.2.jar" org.antlr.Tool "$(ProjectDir)Expression.g"
in the pre-build events, then you can just modify your grammar and run the project without having to worry about rebuilding the lexer/parser.
Antlr 3.3 return values in java
First, in your grammar, speech
only gets assigned the return value of parser rule wordExp
. If you want to manipulate the return value of rule name
as well, you can do this with an additional variable like the example below.
text returns [String value]
: a=wordExp space b=name {$value = $a.text+" "+$b.text;}
;
Second, invoking parser.text()
parses the entire input. A second invocation (in your case parser.wordExp()
) thus finds EOF. If you remove the second call the no viable alternative at input 'EOF'
goes away.
There may be a better way to do this, but in the meantime this may help you out.
Problems building ANTLR v3.3 from source : antlr3-maven-archetype missing
Tried this myself. Got the same error.
Commented the offending module in the parent pom.xml
<!--module>antlr3-maven-archetype</module-->
Built successfully.
Not sure about your requirement, but hopefully you may still achieve it with this workaround.
Edit 1: You can safely ignore all the warnings (related to versions), which are due to running a maven2 pom with maven3.
However you should not be getting this error:
error: file "/Users/sp2/Desktop/antlrtst/antlr-3
2.3/tool/src/main/antlr2/org/antlr/grammar/v2/codegen.g"
not found
This file exists in the source distribution. Interestingly your folder shows "antlr-3.2.3", while other messages are related to 3.3. Could it be that you have incorrect/missing source?
Why are antlr3 c# parser methods private?
Make at least one parser rule "public" like this:
grammar T;
options {
language=CSharp2;
}
public parse
: privateRule+ EOF
;
privateRule
: Token+
;
// ...
You can then call parse()
on the generated parser.
protected
and private
(the default if nothing is specified) are also supported.
An ANTLR grammar that behaves strangely
It recognizes both both "a a a"
and "b b b"
, as it should. To check for yourself, add a little debug-print statement to your statement
rule:
grammar Test2;
options {
language = Java;
}
statement: ( a|b )* {System.out.println("parsed: " + $text);};
a: 'a';
b: 'b';
WS: ('\n'|' '|'\t'|'r'|'\f')+ {$channel=HIDDEN;};
and then test the parser with the following class:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
Test2Lexer lexer = new Test2Lexer(new ANTLRStringStream(args[0]));
Test2Parser parser = new Test2Parser(new CommonTokenStream(lexer));
parser.statement();
}
}
After generating a lexer & parser and compiling all .java
source files:
java -cp antlr-3.3.jar org.antlr.Tool Test2.g
javac -cp antlr-3.3.jar *.java
you can test the parser with "a a a"
as input:
java -cp .:antlr-3.3.jar Main "a a a"
parsed: a a a
and with "b b b"
as input:
java -cp .:antlr-3.3.jar Main "b b b"
parsed: b b b
As you can see, in both cases no error is reported, and the input is printed back to the console.
My guess is that you're using ANTLRWorks' interpreter. Don't. It is notoriously buggy. Either use ANTLRWorks' debugger (it's great!), or use a small test class you write yourself (similar to my Main
class).
EDIT
Note that the ANTLR IDE Eclipse plugin uses the interpreter from ANTLRWorks, AFAIK.
In antlr, is there a way to get parsed text of a CommonTree in AST mode?
Is there a better way to do this?
Better, I don't know. There is another way, of course. You decide what's better.
Another option would be to create a custom AST node class (and corresponding node-adapter) and add the matched text to this AST node during parsing. The trick here is to not use skip()
, which discards the token from the lexer, but to put it on the HIDDEN
channel. This is effectively the same, however, the text these (hidden) tokens match are still available in the parser.
A quick demo: put all these 3 file in a directory named demo
:
demo/T.g
grammar T;
options {
output=AST;
ASTLabelType=XTree;
}
@parser::header {
package demo;
import demo.*;
}
@lexer::header {
package demo;
import demo.*;
}
parse
: expr EOF -> expr
;
expr
@after{$expr.tree.matched = $expr.text;}
: Int '+' Int ';' -> ^('+' Int Int)
;
Int
: '0'..'9'+
;
Space
: ' ' {$channel=HIDDEN;}
;
demo/XTree.java
package demo;
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
public class XTree extends CommonTree {
protected String matched;
public XTree(Token t) {
super(t);
matched = null;
}
}
demo/Main.java
package demo;
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
public class Main {
public static void main(String[] args) throws Exception {
String source = "12 + 42 ;";
TLexer lexer = new TLexer(new ANTLRStringStream(source));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.setTreeAdaptor(new CommonTreeAdaptor(){
@Override
public Object create(Token t) {
return new XTree(t);
}
});
XTree root = (XTree)parser.parse().getTree();
System.out.println("tree : " + root.toStringTree());
System.out.println("matched : " + root.matched);
}
}
You can run this demo by opening a shell and cd-ing to the directory that holds the demo
directory and execute the following:
java -cp demo/antlr-3.3.jar org.antlr.Tool demo/T.g
javac -cp demo/antlr-3.3.jar demo/*.java
java -cp .:demo/antlr-3.3.jar demo.Main
which will produce the following output:
tree : (+ 12 42)
matched : 12 + 42 ;
Why does Antlr think there is a missing bracket?
As already mentioned by others, your expression has to end with a EOF
, but a nested expression cannot end with an EOF
, of course.
Remove the EOF
from expression
, and create a proper "entry point" for your parser that ends with the EOF
.
file: T.g
grammar T;
options {
output=AST;
}
parse
: expression EOF!
;
expression
: '('! ('&' | '||' | '!')^ (atom | expression)* ')'!
;
atom
: '('! ITEM '='^ ITEM ')'!
;
ITEM
: ALPHANUMERIC+
;
fragment ALPHANUMERIC
: ('a'..'z' | 'A'..'Z' | '0'..'9')
;
WHITESPACE
: (' ' | '\t' | '\r' | '\n') { skip(); }
;
file: Main.java
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
String source = "(||(attr=hello2)(!(attr2=12)))";
TLexer lexer = new TLexer(new ANTLRStringStream(source));
TParser parser = new TParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.parse().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
To run the demo, do:
*nix/MacOS:
java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
Windows:
java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar Main
which produces the DOT code representing the following AST:
image created using graphviz-dev.appspot.com
Hints regarding the use of ANTLR v3 generated files in Flash Builder 4.5.1
Terence Parr, the project lead of ANTLR just confirmed, that ANTLRworks needs a new compile. Thanks for great support!
ANTLR grammar tutorials
I think this one is the best book about ANTLR parser generator. I read this book, when i was writing my own MSIL compiler.
You can also look at my compiler (it may be not working) for the .g grammar as an example.
p.s.
Official ANTLR project web-site share good samples about grammar rules and parser generating.
Using ANTLR to generate lexer/parser as streams
I am not 100% sure if I understand your question correctly, but you might want to have a look at https://stackoverflow.com/a/38052798/5068458. This is an in-memory compiler for antlr grammars that generates lexer and parser for a given grammar in-memory. You do not have to do it manually. Code examples are provided there.
Best,
Julian
Related Topics
How to Parse Very Huge Xml Files in C#
Sqldatasourceenumerator.Instance.Getdatasources() Does Not Locate Local SQL Server 2008 Instance
C# Split String and Remove Empty String
Reading a Key from the Web.Config Using Configurationmanager
Creating PDF Files at Runtime in C#
Determine Assembly Version During a Post-Build Event
How to "Await Yield Return Dosomethingasync()"
Multi-Tenant with Code First Ef6
Get All Registered Routes in ASP.NET Core
Could We Save Delegates in a File (C#)
Measure a String Without Using a Graphics Object
Sqlparameter Does Not Allows Table Name - Other Options Without SQL Injection Attack
Equivalence of "With...End With" in C#
How to Store/Retrieve Rsa Public/Private Key
Units of Measure in C# - Almost
Microsoft Office Excel Cannot Access the File 'C:\Inetpub\Wwwroot\Timesheet\App_Data\Template.Xlsx'
Mock Httpcontext for Unit Testing a .Net Core MVC Controller
Should I Abstract the Validation Framework from Domain Layer