SQL Parser Library for Java

SQL parser library for Java - Retrieve the list of table names present in a SQL statement

I doubt you'll find anything prewritten that you can just use. The problem is that ISO/ANSI SQL is a very complicated grammar — something like more than 600 production rules IIRC.

Terence Parr's ANTLR parser generator (Java, but can generate parsers in any one of a number of target languages) has several SQL grammars available, including a couple for PL/SQL, one for a SQL Server SELECT statement, one for mySQL, and one for ISO SQL.

No idea how complete/correct/up-to-date they are.

http://www.antlr.org/grammar/list

Parsing SQL query in Java

  1. One approach would consist in defining a subset of the SQL grammar that would be sufficient to parse your queries, then write a parser for that grammar,
  2. compare the queries and find the parts that are identical, and those that differ,
  3. locate the literal values like 4, 6, 'test' in you queries, build (flat) syntactic tree, and compare the trees to one another to identify those literal values that may differ from one query to another.

UPDATE

To parse the SQL, you could user a parser generator like ANTLR or JavaCC. ANTLR and JavaCC grammars exist for SQL, and you could start with one of them.

That said, I think this approach would be overkill in this instance; I would rather use the third.

UPDATE 2: (third method)

To locate literal strings and numbers, you can use a regexp:

private static final Pattern CONST_PATTERN
= Pattern.compile("([^0-9a-zA-Z])((?:[0-9]+(?:\\.[0-9]*)?|[0-9]*\\.[0-9]+)"
+ "(?:[Ee][+-][0-9]+])?"
+ "|(?:\\'[^']*\\')+)", Pattern.CASE_INSENSITIVE);

You can parse a query while generating the following structure:

private static class ParameterizedQuery {
final String sql;
final Parameter[] params;

ParameterizedQuery(String sql, Parameter[] params) {
this.sql = sql;
this.params = params.clone();
}
}

private static class Parameter {
final int position;
final String value;

Parameter(int position, String value) {
this.position = position;
this.value = value;
}
}

The resulting sql query is the input query with all the literals replaced with question marks. The parsing is done as follows:

private static ParameterizedQuery parse(String query) {
List<Parameter> parms = new ArrayList<>();
Matcher matcher = CONST_PATTERN.matcher(query);
int start = 0;
StringBuilder buf = new StringBuilder();
while (matcher.find()) {
int pos = matcher.start();
buf.append(query, start, pos)
.append(matcher.group(1))
.append("?");
parms.add(new Parameter(buf.length()-1,matcher.group(2)));
start = matcher.end();
}
buf.append(query, start, query.length());
return new ParameterizedQuery(
buf.toString(), parms.toArray(new Parameter[parms.size()]));
}

Now, if you have a list of queries, and you want to keep as parameter only those that are not equal in all the input queries, you parse all your queries, producing an array of ParameterizedQuery, and the simplify that array:

private static ParameterizedQuery[] simplify(ParameterizedQuery[] queries) {
if (queries.length == 0) {
return queries;
}
ParameterizedQuery prev = null;
boolean[] diff = null;
for (ParameterizedQuery cur: queries) {
if (prev == null) {
diff = new boolean[cur.params.length];
} else {
if (!cur.sql.equals(prev.sql)) {
throw new RuntimeException(
"Queries are too different: [" + prev.sql
+ "] and [" + cur.sql + "]");
} else if (cur.params.length != prev.params.length) {
throw new RuntimeException(
"Different number of parameters: ["
+ prev.params.length
+ "] and [" + cur.params.length + "]");
}
for (int i = 0; i < diff.length; ++i) {
if (!cur.params[i].value.equals(prev.params[i].value)) {
diff[i] = true;
}
}
}
prev = cur;
}
if (and(diff)) {
return queries;
}
ParameterizedQuery[] result = new ParameterizedQuery[queries.length];
result[0] = expandQuery(queries[0].sql, queries[0].params, diff);
for (int i = 1; i < queries.length; ++i) {
result[i] = new ParameterizedQuery(result[0].sql,
keep(queries[i].params, result[0].params, diff));
}
return result;
}

private static boolean and(boolean[] arr) {
for (boolean b: arr) {
if (!b) {
return false;
}
}
return true;
}

private static ParameterizedQuery expandQuery(String query,
Parameter[] params, boolean[] diff) {
int count = 0;
for (boolean b: diff) {
if (b) {
++count;
}
}
Parameter[] result = new Parameter[count];
int r = 0;
int start = 0;
StringBuilder buf = new StringBuilder();
for (int i = 0; i < diff.length; ++i) {
Parameter parm = params[i];
if (!diff[i]) {
// expand param
buf.append(query, start, parm.position);
buf.append(parm.value);
start = parm.position+1;
} else {
buf.append(query, start, parm.position);
result[r++] = new Parameter(buf.length(), parm.value);
start = parm.position;
}
}
buf.append(query, start, query.length());
return new ParameterizedQuery(buf.toString(), result);
}

private static Parameter[] keep(Parameter[] params, Parameter[] ref,
boolean[] diff) {
Parameter[] result = new Parameter[ref.length];
int j = 0;
for (int i = 0; i < params.length; ++i) {
if (diff[i]) {
result[j] = new Parameter(ref[j].position, params[i].value);
++j;
}
}
return result;
}

Here's the program that resolves your example:

public class Main {
private static final String[] QUERIES = {
"select * from tableName as t1 where t1.tableColumnId=4 and t1.tableColumnName='test' inner join tableName2 as t2 on t1.tableColumnId=t2.tableColumnId",
"select * from tableName as t1 where t1.tableColumnId=6 and t1.tableColumnName='test' inner join tableName2 as t2 on t1.tableColumnId=t2.tableColumnId",
};
private static final Pattern CONST_PATTERN
= Pattern.compile("([^0-9a-zA-Z])((?:[0-9]+(?:\\.[0-9]*)?|[0-9]*\\.[0-9]+)"
+ "(?:[Ee][+-][0-9]+])?"
+ "|(?:\\'[^']*\\')+)", Pattern.CASE_INSENSITIVE);

private static class ParameterizedQuery {
final String sql;
final Parameter[] params;

ParameterizedQuery(String sql, Parameter[] params) {
this.sql = sql;
this.params = params.clone();
}
}

private static class Parameter {
final int position;
final String value;

Parameter(int position, String value) {
this.position = position;
this.value = value;
}
}

public static void main(String[] args) {
ParameterizedQuery[] queries = new ParameterizedQuery[QUERIES.length];
for (int i = 0; i < QUERIES.length; ++i) {
queries[i] = parse(QUERIES[i]);
}
for (ParameterizedQuery cur: queries) {
System.out.println(cur.sql);
int i = 0;
for (Parameter parm: cur.params) {
System.out.println(" " + (++i) + ": " + parm.value);
}
}
queries = simplify(queries);
for (ParameterizedQuery cur: queries) {
System.out.println(cur.sql);
int i = 0;
for (Parameter parm: cur.params) {
System.out.println(" " + (++i) + ": " + parm.value);
}
}
}

private static ParameterizedQuery parse(String query) {
List<Parameter> parms = new ArrayList<>();
Matcher matcher = CONST_PATTERN.matcher(query);
int start = 0;
StringBuilder buf = new StringBuilder();
while (matcher.find()) {
int pos = matcher.start();
buf.append(query, start, pos)
.append(matcher.group(1))
.append("?");
parms.add(new Parameter(buf.length()-1,matcher.group(2)));
start = matcher.end();
}
buf.append(query, start, query.length());
return new ParameterizedQuery(
buf.toString(), parms.toArray(new Parameter[parms.size()]));
}

private static ParameterizedQuery[] simplify(ParameterizedQuery[] queries) {
if (queries.length == 0) {
return queries;
}
ParameterizedQuery prev = null;
boolean[] diff = null;
for (ParameterizedQuery cur: queries) {
if (prev == null) {
diff = new boolean[cur.params.length];
} else {
if (!cur.sql.equals(prev.sql)) {
throw new RuntimeException(
"Queries are too different: [" + prev.sql
+ "] and [" + cur.sql + "]");
} else if (cur.params.length != prev.params.length) {
throw new RuntimeException(
"Different number of parameters: ["
+ prev.params.length
+ "] and [" + cur.params.length + "]");
}
for (int i = 0; i < diff.length; ++i) {
if (!cur.params[i].value.equals(prev.params[i].value)) {
diff[i] = true;
}
}
}
prev = cur;
}
if (and(diff)) {
return queries;
}
ParameterizedQuery[] result = new ParameterizedQuery[queries.length];
result[0] = expandQuery(queries[0].sql, queries[0].params, diff);
for (int i = 1; i < queries.length; ++i) {
result[i] = new ParameterizedQuery(result[0].sql,
keep(queries[i].params, result[0].params, diff));
}
return result;
}

private static boolean and(boolean[] arr) {
for (boolean b: arr) {
if (!b) {
return false;
}
}
return true;
}

private static ParameterizedQuery expandQuery(String query,
Parameter[] params, boolean[] diff) {
int count = 0;
for (boolean b: diff) {
if (b) {
++count;
}
}
Parameter[] result = new Parameter[count];
int r = 0;
int start = 0;
StringBuilder buf = new StringBuilder();
for (int i = 0; i < diff.length; ++i) {
Parameter parm = params[i];
if (!diff[i]) {
// expand param
buf.append(query, start, parm.position);
buf.append(parm.value);
start = parm.position+1;
} else {
buf.append(query, start, parm.position);
result[r++] = new Parameter(buf.length(), parm.value);
start = parm.position;
}
}
buf.append(query, start, query.length());
return new ParameterizedQuery(buf.toString(), result);
}

private static Parameter[] keep(Parameter[] params, Parameter[] ref,
boolean[] diff) {
Parameter[] result = new Parameter[ref.length];
int j = 0;
for (int i = 0; i < params.length; ++i) {
if (diff[i]) {
result[j] = new Parameter(ref[j].position, params[i].value);
++j;
}
}
return result;
}
}

The output is:

select * from tableName as t1 where t1.tableColumnId=? and t1.tableColumnName=? inner join tableName2 as t2 on t1.tableColumnId=t2.tableColumnId
1: 4
2: 'test'
select * from tableName as t1 where t1.tableColumnId=? and t1.tableColumnName=? inner join tableName2 as t2 on t1.tableColumnId=t2.tableColumnId
1: 6
2: 'test'
select * from tableName as t1 where t1.tableColumnId=? and t1.tableColumnName='test' inner join tableName2 as t2 on t1.tableColumnId=t2.tableColumnId
1: 4
select * from tableName as t1 where t1.tableColumnId=? and t1.tableColumnName='test' inner join tableName2 as t2 on t1.tableColumnId=t2.tableColumnId
1: 6

SQL Parser for java

I had to do some SQL parsing a short while ago. I used GSP: http://www.sqlparser.com/

It's closed source and not free so GSP might not be an option for you. I wasn't able to find a free SQL Parser for Teradata so GSP was my best option. If you decide to go with GSP, the Java docs aren't that great. Your best bet is to find example code from the examples section on their website and work from there.

Apache POI is a good API for excel: http://poi.apache.org/

Need java API to parse SQL statements

Ended up using ZQL Parsing libraries from http://zql.sourceforge.net/.

If you are having simple queries , that should do the job easily



Related Topics



Leave a reply



Submit