Boolean Expression Parser in Java

boolean expression parser in java

I've coded this using Javaluator.

It's not exactly the output you are looking for, but I think it could be a start point.

package test;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import net.astesana.javaluator.*;

public class TreeBooleanEvaluator extends AbstractEvaluator<String> {
/** The logical AND operator.*/
final static Operator AND = new Operator("&&", 2, Operator.Associativity.LEFT, 2);
/** The logical OR operator.*/
final static Operator OR = new Operator("||", 2, Operator.Associativity.LEFT, 1);

private static final Parameters PARAMETERS;

static {
// Create the evaluator's parameters
PARAMETERS = new Parameters();
// Add the supported operators
PARAMETERS.add(AND);
PARAMETERS.add(OR);
// Add the parentheses
PARAMETERS.addExpressionBracket(BracketPair.PARENTHESES);
}

public TreeBooleanEvaluator() {
super(PARAMETERS);
}

@Override
protected String toValue(String literal, Object evaluationContext) {
return literal;
}

private boolean getValue(String literal) {
if ("T".equals(literal) || literal.endsWith("=true")) return true;
else if ("F".equals(literal) || literal.endsWith("=false")) return false;
throw new IllegalArgumentException("Unknown literal : "+literal);
}

@Override
protected String evaluate(Operator operator, Iterator<String> operands,
Object evaluationContext) {
List<String> tree = (List<String>) evaluationContext;
String o1 = operands.next();
String o2 = operands.next();
Boolean result;
if (operator == OR) {
result = getValue(o1) || getValue(o2);
} else if (operator == AND) {
result = getValue(o1) && getValue(o2);
} else {
throw new IllegalArgumentException();
}
String eval = "("+o1+" "+operator.getSymbol()+" "+o2+")="+result;
tree.add(eval);
return eval;
}

public static void main(String[] args) {
TreeBooleanEvaluator evaluator = new TreeBooleanEvaluator();
doIt(evaluator, "T && ( F || ( F && T ) )");
doIt(evaluator, "(T && T) || ( F && T )");
}

private static void doIt(TreeBooleanEvaluator evaluator, String expression) {
List<String> sequence = new ArrayList<String>();
evaluator.evaluate(expression, sequence);
System.out.println ("Evaluation sequence for :"+expression);
for (String string : sequence) {
System.out.println (string);
}
System.out.println ();
}
}

Here is the ouput:

Evaluation sequence for :T && ( F || ( F && T ) )

(F && T)=false

(F || (F && T)=false)=false

(T && (F || (F && T)=false)=false)=false

Evaluation sequence for :(T && T) || ( F && T )

(T && T)=true

(F && T)=false

((T && T)=true || (F && T)=false)=true

Parse Boolean Expression in Java

The first step is to tokenize the input. Define

enum Token {VAR, LP, RP, NOT, AND, OR, END}

LP and RP are parentheses. Now define a tokenizer class that looks something like this:

class Tokenizer {
Tokenizer(String input) {...}
void reset() {...}
Token getNext() {...}
String getVarName() {...}
}

Calling getNext() on your example in a loop should return

LP LP NOT VAR AND LP LP NOT VAR OR VAR RP OR LP VAR OR VAR RP RP RP OR LP VAR AND NOT VAR RP RP AND VAR END

Calling getVarName() immediately after a VAR has been returned by getNext() gives you the name of the variable (e.g. "t42").

There are many ways to implement little scanners like this. You should do this first and make sure it's bulletproof by testing. Trying to build a parser on top of a flaky scanner is torture.

As I said in comments, I'd consider recursive descent parsing. If you have a suitable grammar, writing an RD parser is a very short step as the Dragon Book (also mentioned above) shows.

A reasonable grammar (using tokens as above) is

Expr -> Term AND Term
| Term OR Term
| Term END

Term -> NOT Term
| Opnd

Opnd -> VAR
| LP Expr RP

For example, here is how you'd get started. It shows the first rule converted to a function:

class Evaluator {
final Tokenizer tokenizer = ...; // Contains the expression text.
final Map<String, Boolean> env = ... // Environment: variables to values.

Token lookAhead; // Holds the token we're parsing right now.

Evaluator(Tokenizer tokenizer, Map<String, Boolean> env) { ... }

void advance() {
lookAhead = tokenizer.getNext();
}

boolean expr() {
boolean leftHandSide = term(); // Parse the left hand side recursively.
Token op = lookAhead; // Remember the operator.
if (op == Token.END) return leftHandSide; // Oops. That's all.
advance(); // Skip past the operator.
boolean rightHandSide = term(); // Parse the right hand side recursively.
if (op == Token.AND) return leftHandSide && rightHandSide; // Evaluate!
if (op == Token.OR) return leftHandSide || rightHandSide;
dieWithSyntaxError("Expected op, found " + op);
}

boolean term() {...}

boolean opnd() {...}

}

The environment is used when a VAR is parsed. Its boolean value is env.get(tokenizer.getVarName()).

So to process the file, you'll

For each line
For each variable tX in the expression
See if the line contains the string tX is bound to in its text field.
If so, put the mapping tX -> true in the environment
else put tX -> false
Reset the tokenizer
Call Evaluator.evaluate(tokenizer, environment)
If it returns true, print the line, else skip it.

This is the simplest approach I can think of. About 150 lines. Many optimizations are possible.

Added

Well since I can no longer take away the thrill of discovery, here is my version:

import static java.lang.Character.isDigit;
import static java.lang.Character.isWhitespace;
import java.util.HashMap;
import java.util.Map;
import static java.util.stream.Collectors.toMap;

public class TextExpressionSearch {
enum Token { VAR, LP, RP, NOT, AND, OR, END }

static class Tokenizer {
final String input;
int pos = 0;
String var;

Tokenizer(String input) {
this.input = input;
}

void reset() {
pos = 0;
var = null;
}

String getRead() {
return input.substring(0, pos);
}

Token getNext() {
var = null;
while (pos < input.length() && isWhitespace(input.charAt(pos))) {
++pos;
}
if (pos >= input.length()) {
return Token.END;
}
int start = pos++;
switch (input.charAt(start)) {
case 't':
while (pos < input.length() && isDigit(input.charAt(pos))) {
++pos;
}
var = input.substring(start, pos);
return Token.VAR;
case '(':
return Token.LP;
case ')':
return Token.RP;
case 'n':
if (input.startsWith("ot", pos)) {
pos += 2;
return Token.NOT;
}
break;
case 'a':
if (input.startsWith("nd", pos)) {
pos += 2;
return Token.AND;
}
break;
case 'o':
if (input.startsWith("r", pos)) {
pos += 1;
return Token.OR;
}
break;
}
throw new AssertionError("Can't tokenize: " + input.substring(start));
}
}

static class Evaluator {
final Tokenizer tokenizer;
final Map<String, Boolean> env;
Token lookAhead;

Evaluator(Tokenizer tokenizer, Map<String, Boolean> env) {
this.tokenizer = tokenizer;
this.env = env;
advance();
}

boolean die(String msg) {
throw new AssertionError(msg + "\nRead: " + tokenizer.getRead());
}

void advance() {
lookAhead = tokenizer.getNext();
}

void match(Token token) {
if (lookAhead != token) {
die("Expected " + token + ", found " + lookAhead);
}
advance();
}

boolean evaluate() {
boolean exprVal = expr();
match(Token.END);
return exprVal;
}

boolean expr() {
boolean lhs = negated();
switch (lookAhead) {
case AND:
advance();
return negated() && lhs;
case OR:
advance();
return negated() || lhs;
case END:
return lhs;
}
return die("Expected expr, found " + lookAhead);
}

boolean negated() {
switch (lookAhead) {
case NOT:
advance();
return !negated();
default:
return operand();
}
}

boolean operand() {
switch (lookAhead) {
case VAR:
if (!env.containsKey(tokenizer.var)) {
die("Undefined variable: " + tokenizer.var);
}
boolean varVal = env.get(tokenizer.var);
advance();
return varVal;
case LP:
advance();
boolean exprVal = expr();
match(Token.RP);
return exprVal;
}
return die("Expected operand, found " + lookAhead);
}
}

public static void main(String [] args) {
String expr = "((not t1 and ((not t2 or t4) or (t3 or t4))) or (t5 and not t6)) and t7";
Map<String, String> bindings = new HashMap<>();
bindings.put("t1", "str1");
bindings.put("t2", "str2");
bindings.put("t3", "str3");
bindings.put("t4", "str4");
bindings.put("t5", "str5");
bindings.put("t6", "str6");
bindings.put("t7", "str7");
String [] lines = {"str5 str7", "str4 str2"};
Tokenizer tokenizer = new Tokenizer(expr);
for (String line : lines) {
Map<String, Boolean> env =
bindings.entrySet().stream()
.collect(toMap(e -> e.getKey(), e -> line.contains(e.getValue())));
tokenizer.reset();
if (new Evaluator(tokenizer, env).evaluate()) {
System.out.println(line);
}
}
}
}

Parse parenthesized Boolean String Expressions with Java

I found a solution using JavaScript:

String boolexpr1 = "(!(true||false)&&!((true)^(true)))"
boolean result = (Boolean) new ScriptEngineManager().getEngineByName("javascript").eval(boolexpr1);

Didn't expect it could be that simple.

Boolean Expression Evaluation in Java

You could use the scripting engine in Java6 and the choose any of the popular scripting languages like Scala, Ruby, Python, Groovy, and Javascript. Than all you have to do is make sure the expression you want to evaluate is in the right language. Groovy is probably the easiest and will integrate best.

I have used this method successfully for a feature offering capabilities much like a formula / calculated column in a popular spreadsheet application.

Nested Boolean Expression Parser using ANTLR

I'd just wrap all the expressions into a single expression rule. Be sure to define the comparator expressions alternative before your binary expression alternative to make sure comparator operators bind more tightly than OR and AND:

grammar SimpleBoolean;

parse
: expression EOF
;

expression
: LPAREN expression RPAREN #parenExpression
| NOT expression #notExpression
| left=expression op=comparator right=expression #comparatorExpression
| left=expression op=binary right=expression #binaryExpression
| bool #boolExpression
| IDENTIFIER #identifierExpression
| DECIMAL #decimalExpression
;

comparator
: GT | GE | LT | LE | EQ
;

binary
: AND | OR
;

bool
: TRUE | FALSE
;

AND : 'AND' ;
OR : 'OR' ;
NOT : 'NOT';
TRUE : 'TRUE' ;
FALSE : 'FALSE' ;
GT : '>' ;
GE : '>=' ;
LT : '<' ;
LE : '<=' ;
EQ : '=' ;
LPAREN : '(' ;
RPAREN : ')' ;
DECIMAL : '-'? [0-9]+ ( '.' [0-9]+ )? ;
IDENTIFIER : [a-zA-Z_] [a-zA-Z_0-9]* ;
WS : [ \r\t\u000C\n]+ -> skip;

To test the grammar above, use the following quick-and-dirty test classes:

public class Main {

public static void main(String[] args) throws Exception {

Map<String, Object> variables = new HashMap<String, Object>() {{
put("A", true);
put("a", true);
put("B", false);
put("b", false);
put("C", 42.0);
put("c", 42.0);
put("D", -999.0);
put("d", -1999.0);
put("E", 42.001);
put("e", 142.001);
put("F", 42.001);
put("f", 42.001);
put("G", -1.0);
put("g", -1.0);
}};

String[] expressions = {
"1 > 2",
"1 >= 1.0",
"TRUE = FALSE",
"FALSE = FALSE",
"A OR B",
"B",
"A = B",
"c = C",
"E > D",
"B OR (c = B OR (A = A AND c = C AND E > D))",
"(A = a OR B = b OR C = c AND ((D = d AND E = e) OR (F = f AND G = g)))"
};

for (String expression : expressions) {
SimpleBooleanLexer lexer = new SimpleBooleanLexer(new ANTLRInputStream(expression));
SimpleBooleanParser parser = new SimpleBooleanParser(new CommonTokenStream(lexer));
Object result = new EvalVisitor(variables).visit(parser.parse());
System.out.printf("%-70s -> %s\n", expression, result);
}
}
}

class EvalVisitor extends SimpleBooleanBaseVisitor<Object> {

private final Map<String, Object> variables;

public EvalVisitor(Map<String, Object> variables) {
this.variables = variables;
}

@Override
public Object visitParse(SimpleBooleanParser.ParseContext ctx) {
return super.visit(ctx.expression());
}

@Override
public Object visitDecimalExpression(SimpleBooleanParser.DecimalExpressionContext ctx) {
return Double.valueOf(ctx.DECIMAL().getText());
}

@Override
public Object visitIdentifierExpression(SimpleBooleanParser.IdentifierExpressionContext ctx) {
return variables.get(ctx.IDENTIFIER().getText());
}

@Override
public Object visitNotExpression(SimpleBooleanParser.NotExpressionContext ctx) {
return !((Boolean)this.visit(ctx.expression()));
}

@Override
public Object visitParenExpression(SimpleBooleanParser.ParenExpressionContext ctx) {
return super.visit(ctx.expression());
}

@Override
public Object visitComparatorExpression(SimpleBooleanParser.ComparatorExpressionContext ctx) {
if (ctx.op.EQ() != null) {
return this.visit(ctx.left).equals(this.visit(ctx.right));
}
else if (ctx.op.LE() != null) {
return asDouble(ctx.left) <= asDouble(ctx.right);
}
else if (ctx.op.GE() != null) {
return asDouble(ctx.left) >= asDouble(ctx.right);
}
else if (ctx.op.LT() != null) {
return asDouble(ctx.left) < asDouble(ctx.right);
}
else if (ctx.op.GT() != null) {
return asDouble(ctx.left) > asDouble(ctx.right);
}
throw new RuntimeException("not implemented: comparator operator " + ctx.op.getText());
}

@Override
public Object visitBinaryExpression(SimpleBooleanParser.BinaryExpressionContext ctx) {
if (ctx.op.AND() != null) {
return asBoolean(ctx.left) && asBoolean(ctx.right);
}
else if (ctx.op.OR() != null) {
return asBoolean(ctx.left) || asBoolean(ctx.right);
}
throw new RuntimeException("not implemented: binary operator " + ctx.op.getText());
}

@Override
public Object visitBoolExpression(SimpleBooleanParser.BoolExpressionContext ctx) {
return Boolean.valueOf(ctx.getText());
}

private boolean asBoolean(SimpleBooleanParser.ExpressionContext ctx) {
return (boolean)visit(ctx);
}

private double asDouble(SimpleBooleanParser.ExpressionContext ctx) {
return (double)visit(ctx);
}
}

Running the Main class will result in the following output:

1 > 2                                                                  -> false
1 >= 1.0 -> true
TRUE = FALSE -> false
FALSE = FALSE -> true
A OR B -> true
B -> false
A = B -> false
c = C -> true
E > D -> true
B OR (c = B OR (A = A AND c = C AND E > D)) -> true
(A = a OR B = b OR C = c AND ((D = d AND E = e) OR (F = f AND G = g))) -> true

evaluate boolean expression in java generate at runtime

Use http://docs.codehaus.org/display/JANINO/Home for minimum work. I can do much more than simple expressions.

Antlr4 parser for boolean logic

Both are equivalent in their ability to parse your sample input. If you simplify your input by removing the unnecessary parentheses, the output of this grammar looks pretty good too:

grammar filter;
filter: expression EOF;
expression
: LPAREN expression RPAREN
| expression (AND expression)+
| expression (OR expression)+
| atom;
atom : INT;
INT: DIGITS;
DIGITS : [0-9]+;
AND : 'AND';
OR : 'OR';
LPAREN : '(';
RPAREN : ')';
WS: [ \t\r\n]+ -> skip;

Which is what I suspect your first grammar looks like in its entirety.

Your second one requires too many parentheses for my liking (mainly in term), and the breaking up of AND and OR into separate rules instead of alternatives doesn't seem as clean to me.

You can simplify even more though:

grammar filter;
filter: expression EOF;
expression
: LPAREN expression RPAREN # ParenExp
| expression AND expression # AndBlock
| expression OR expression # OrBlock
| atom # AtomExp
;
atom : INT;
INT: DIGITS;
DIGITS : [0-9]+;
AND : 'AND';
OR : 'OR';
LPAREN : '(';
RPAREN : ')';
WS: [ \t\r\n]+ -> skip;

This gives a tree with a different shape but still is equivalent. And note the use of the # AndBlock and # OrBlock labels... these "alternative labels" will cause your generated listener or visitor to have separate methods for each, allowing you to completely separate these two in your code semantically as well as syntactically. Perhaps that's what you're looking for?

I like this one the best because it's the simplest and clearer recursion, and offers specific code alternatives for AND and OR.



Related Topics



Leave a reply



Submit