How to Parse a SQL Query to Pull Out the Column Names and Table Names

Is there a way to parse a SQL query to pull out the column names and table names?

I actually ended up using a tool called SQL Pretty Printer. You can purchase a desktop version, but I just used the free online application. Copy the query into the text box, set the Output to "List DB Object", and click the Format SQL button.

It worked great on around 150 different (and complex) SQL queries.

How to extract table names and column names from a SQL query?

Really, this is no easy task. You could use a lexer (ply in this example) and define rules that pull the relevant tokens out of the string. The following code defines rules for the different parts of your SQL string and then puts the pieces back together, since the input string may use aliases. As a result, you get a dictionary (result) with the table names as keys and each table's columns as values.

import re

import ply.lex as lex

tokens = (
    "TABLE",
    "JOIN",
    "COLUMN",
    "TRASH",
)

# Filled in while lexing: real table names, and alias -> table name.
tables = {"tables": {}, "alias": {}}
columns = []

# Everything we are not interested in (keywords, operators, whitespace).
t_TRASH = r"Select|on|=|;|\s+|,|\t|\r"

def t_TABLE(t):
    r"from\s(\w+)\sas\s(\w+)"

    # ply uses the docstring as the token regex; reuse it to read the groups.
    regex = re.compile(t_TABLE.__doc__)
    m = regex.search(t.value)
    if m is not None:
        tbl = m.group(1)
        alias = m.group(2)
        tables["tables"][tbl] = ""
        tables["alias"][alias] = tbl

    return t

def t_JOIN(t):
    r"inner\s+join\s+(\w+)\s+as\s+(\w+)"

    regex = re.compile(t_JOIN.__doc__)
    m = regex.search(t.value)
    if m is not None:
        tbl = m.group(1)
        alias = m.group(2)
        tables["tables"][tbl] = ""
        tables["alias"][alias] = tbl
    return t

def t_COLUMN(t):
    r"(\w+\.\w+)"

    regex = re.compile(t_COLUMN.__doc__)
    m = regex.search(t.value)
    if m is not None:
        t.value = m.group(1)
        columns.append(t.value)
    return t

def t_error(t):
    # Fail loudly on anything the rules above do not cover.
    raise TypeError("Unknown text '%s'" % (t.value,))

# here is where the magic starts
def mylex(inp):
    lexer = lex.lex()
    lexer.input(inp)

    # Consume every token; the rule functions collect tables and columns.
    for token in lexer:
        pass

    result = {}
    for col in columns:
        tbl, c = col.split('.')
        # Map an alias back to its table; otherwise the prefix is the table.
        key = tables["alias"].get(tbl, tbl)
        result.setdefault(key, []).append(c)

    print(result)
    # {'tb1': ['col1', 'col7'], 'tb2': ['col2', 'col8']}

string = "Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;"
mylex(string)
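
For comparison, the same alias-resolution idea can be sketched with nothing but the standard library's re module. This is only an illustration under the same assumptions as the lexer above: every table is introduced as "from/join <table> as <alias>" and every column is written as "prefix.column". The names ALIAS_RE, COLUMN_RE and columns_by_table are made up for this sketch and are not part of ply.

```python
import re
from collections import defaultdict

# Hypothetical helpers for illustration only.
# Matches "from tb1 as a" and "... join tb2 as b" (the alias must use AS).
ALIAS_RE = re.compile(r"\b(?:from|join)\s+(\w+)\s+as\s+(\w+)", re.IGNORECASE)
# Matches qualified references such as "a.col1" (it would also match "1.5").
COLUMN_RE = re.compile(r"\b(\w+)\.(\w+)\b")

def columns_by_table(sql):
    """Group qualified column names by table, resolving aliases."""
    aliases = {alias: table for table, alias in ALIAS_RE.findall(sql)}
    result = defaultdict(list)
    for prefix, column in COLUMN_RE.findall(sql):
        # If the prefix is an alias, use the real table name instead.
        result[aliases.get(prefix, prefix)].append(column)
    return dict(result)

sql = "Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;"
print(columns_by_table(sql))
```

Like the lexer, this breaks down on subqueries, quoted identifiers, string literals and unqualified columns, which is why a real SQL parser is the safer choice for anything non-trivial.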

Parsing a SQL query to pull out column names and table names

I think your best option is the Irony parser, a .NET language implementation kit:
http://irony.codeplex.com/

Hanselman has a great link to how to use it to parse SQL:
http://www.hanselman.com/blog/TheWeeklySourceCode59AnOpenSourceTreasureIronyNETLanguageImplementationKit.aspx

I hope this helps, and best of luck!

Parsing table and column names from SQL/HQL Java

There are multiple ways to achieve this using JSqlParser (https://github.com/JSQLParser/JSqlParser):

  1. You could extend TablesNamesFinder to traverse all columns as well. As the result list shows, TablesNamesFinder does not visit every occurrence of a Column, because it does not need to in order to find the tables. So one has to complete the traversal implementation here as well, which I did not do.

  2. You could use JSqlParser's AST node feature to get all columns. For specific productions, JSqlParser produces nodes for a parse tree, and Column is one of them.

To complete the implementation, one has to collect all columns and make this list distinct (case, table, etc.).

// Imports the snippet below relies on (package names as in the JSqlParser
// releases that provide getTableList and CCJSqlParserDefaultVisitor):
import net.sf.jsqlparser.parser.CCJSqlParserDefaultVisitor;
import net.sf.jsqlparser.parser.CCJSqlParserTreeConstants;
import net.sf.jsqlparser.parser.CCJSqlParserUtil;
import net.sf.jsqlparser.parser.SimpleNode;
import net.sf.jsqlparser.schema.Column;
import net.sf.jsqlparser.statement.Statement;
import net.sf.jsqlparser.statement.select.Select;
import net.sf.jsqlparser.util.TablesNamesFinder;

String sql = "SELECT * FROM  ( ( SELECT TBL.ID AS rRowId, TBL.NAME AS name, TBL.DESCRIPTION as description, TBL.TYPE AS type, TBL1.SHORT_NAME AS shortName  FROM ROLE_TBL TBL WHERE ( TBL.TYPE = 'CORE' OR  TBL1.SHORT_NAME = 'TNG' AND  TBL.IS_DELETED <> 1  ) ) MINUS ( SELECT TBL.ID AS rRowId, TBL.NAME AS name, TBL.DESCRIPTION as description, TBL.TYPE AS type, TBL3.SHORT_NAME AS shortName,TBL3.NAME AS tenantName FROM ROLE_TBL TBL INNER JOIN TYPE_ROLE_TBL TBL1 ON TBL.ID=TBL1.ROLE_FK LEFT OUTER JOIN TNT_TBL TBL3 ON TBL3.ID = TBL.TENANT_FK LEFT OUTER JOIN USER_TBL TBL4 ON TBL4.ID = TBL1.USER_FK WHERE ( TBL4.ID =771100 AND  TBL.IS_DELETED <> 1  ) ) ) ORDER BY name ASC";

// These statements belong inside a method that declares "throws JSQLParserException".
System.out.println("using TablesNamesFinder to get column names");
Statement statement = CCJSqlParserUtil.parse(sql);
Select selectStatement = (Select) statement;
TablesNamesFinder tablesNamesFinder = new TablesNamesFinder() {
    @Override
    public void visit(Column tableColumn) {
        // Only columns that TablesNamesFinder happens to traverse end up here.
        System.out.println(tableColumn);
    }
};
tablesNamesFinder.getTableList(selectStatement);

System.out.println("-------------------------------------------");
System.out.println("using ast nodes to get column names");
SimpleNode node = (SimpleNode) CCJSqlParserUtil.parseAST(sql);

node.jjtAccept(new CCJSqlParserDefaultVisitor() {
    @Override
    public Object visit(SimpleNode node, Object data) {
        // Print Column nodes, and keep descending into children in every case.
        if (node.getId() == CCJSqlParserTreeConstants.JJTCOLUMN) {
            System.out.println(node.jjtGetValue());
        }
        return super.visit(node, data);
    }
}, null);
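
The remaining step, turning the printed column references into a distinct list, could look roughly like the sketch below. It is written in Python for brevity rather than Java, the function name distinct_columns is hypothetical, and it assumes identifiers are unquoted and can be compared case-insensitively, which is not true for every database.

```python
def distinct_columns(names):
    """Return unique (table, column) pairs, compared case-insensitively.

    `names` holds column references as strings, e.g. "TBL.ID" or "name".
    An unqualified column gets an empty string as its table part.
    """
    seen = set()
    unique = []
    for name in names:
        # Split on the last dot so "schema.table.column" keeps its full prefix.
        table, _, column = name.rpartition(".")
        key = (table.upper(), column.upper())
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

print(distinct_columns(["TBL.ID", "tbl.id", "TBL.NAME", "name"]))
```

Note that the prefix here is still whatever the query used, so an alias such as TBL has to be mapped back to its table (ROLE_TBL in the query above) in a separate step.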

Keep in mind that JSqlParser is only a parser. It therefore cannot tell you which table a column belongs to unless the column is written in qualified form (table.column). To get this right, the database schema must be available. This becomes clear if you look at:

select a from table1, table2

which is valid SQL, yet nothing in the query itself says whether a belongs to table1 or table2.
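
To make the point concrete, here is a small Python sketch of how a schema lookup could resolve such a column. The SCHEMA dictionary and the owning_tables function are invented for illustration; in practice the column lists would come from the database's metadata.

```python
# Hypothetical schema: table name -> set of column names.
SCHEMA = {
    "table1": {"a", "b"},
    "table2": {"c"},
}

def owning_tables(column, from_tables, schema):
    """Return the tables from the FROM clause that define `column`.

    One result means the column is resolved, several mean the reference
    is ambiguous, and none means the column is unknown.
    """
    return [t for t in from_tables if column in schema.get(t, set())]

# For "select a from table1, table2" only the schema tells us the owner.
print(owning_tables("a", ["table1", "table2"], SCHEMA))
```

With this made-up schema the lookup points to table1. If both tables defined a column a, the function would return both names, which mirrors the ambiguity error a database would raise for the same query.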


