Is there a way to parse a SQL query to pull out the column names and table names?
I actually ended up using a tool called SQL Pretty Printer. You can purchase a desktop version, but I just used the free online application. Copy the query into the text box, set the Output to "List DB Object" and click the Format SQL button.
It worked well on around 150 different (and complex) SQL queries.
How to extract table names and column names from sql query?
This is not an easy task. You could use a lexer (ply in this example) and define rules that pull the relevant tokens out of the string. The following code defines rules for the different parts of your SQL string and then maps the columns back to their tables, since the input string may contain aliases. As a result, you get a dictionary (result) with the table names as keys. Note that the rules only cover the "from ... as ..." and "inner join ... as ..." forms used in the example query.
import re

import ply.lex as lex

tokens = (
    "TABLE",
    "JOIN",
    "COLUMN",
    "TRASH"
)

# tables["tables"]: table names seen; tables["alias"]: alias -> table name
tables = {"tables": {}, "alias": {}}
columns = []

t_TRASH = r"Select|on|=|;|\s+|,|\t|\r"

def t_TABLE(t):
    r"from\s(\w+)\sas\s(\w+)"
    regex = re.compile(t_TABLE.__doc__)
    m = regex.search(t.value)
    if m is not None:
        tbl = m.group(1)
        alias = m.group(2)
        tables["tables"][tbl] = ""
        tables["alias"][alias] = tbl
    return t

def t_JOIN(t):
    r"inner\s+join\s+(\w+)\s+as\s+(\w+)"
    regex = re.compile(t_JOIN.__doc__)
    m = regex.search(t.value)
    if m is not None:
        tbl = m.group(1)
        alias = m.group(2)
        tables["tables"][tbl] = ""
        tables["alias"][alias] = tbl
    return t

def t_COLUMN(t):
    r"(\w+\.\w+)"
    regex = re.compile(t_COLUMN.__doc__)
    m = regex.search(t.value)
    if m is not None:
        t.value = m.group(1)
        columns.append(t.value)
    return t

def t_error(t):
    # Fail loudly on any text that no rule matches.
    raise TypeError("Unknown text '%s'" % (t.value,))

# here is where the magic starts
def mylex(inp):
    lexer = lex.lex()
    lexer.input(inp)

    # Consume all tokens; the rule functions fill `tables` and `columns`.
    for token in lexer:
        pass

    result = {}
    for col in columns:
        tbl, c = col.split('.')
        # Replace an alias with the real table name if one was recorded.
        if tbl in tables["alias"]:
            key = tables["alias"][tbl]
        else:
            key = tbl

        if key in result:
            result[key].append(c)
        else:
            result[key] = [c]

    print(result)
    # {'tb1': ['col1', 'col7'], 'tb2': ['col2', 'col8']}

string = "Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;"
mylex(string)
Parsing SQL Query and pull out column name and Table name
I think the best answer is going to be to use the Irony parser:
http://irony.codeplex.com/
Hanselman has a great link to how to use it to parse SQL:
http://www.hanselman.com/blog/TheWeeklySourceCode59AnOpenSourceTreasureIronyNETLanguageImplementationKit.aspx
I hope this helps, and best of luck!
Parsing table and column names from SQL/HQL Java
There are multiple ways to achieve this using JSqlParser (https://github.com/JSQLParser/JSqlParser):
You could extend TablesNamesFinder to traverse all columns as well. As you can see in the output, TablesNamesFinder does not visit every occurrence of a column, because it does not need to for its own purpose. So one has to complete the traversal implementation here as well, which I did not do.
You could use the JSqlParser AST node feature to get all columns. For specific productions JSqlParser produces nodes for a parse tree. Column is one of them.
To complete the implementation, one has to collect all columns and make this list distinct (case, table, etc.).
String sql = "SELECT * FROM ( ( SELECT TBL.ID AS rRowId, TBL.NAME AS name, TBL.DESCRIPTION as description, TBL.TYPE AS type, TBL1.SHORT_NAME AS shortName FROM ROLE_TBL TBL WHERE ( TBL.TYPE = 'CORE' OR TBL1.SHORT_NAME = 'TNG' AND TBL.IS_DELETED <> 1 ) ) MINUS ( SELECT TBL.ID AS rRowId, TBL.NAME AS name, TBL.DESCRIPTION as description, TBL.TYPE AS type, TBL3.SHORT_NAME AS shortName,TBL3.NAME AS tenantName FROM ROLE_TBL TBL INNER JOIN TYPE_ROLE_TBL TBL1 ON TBL.ID=TBL1.ROLE_FK LEFT OUTER JOIN TNT_TBL TBL3 ON TBL3.ID = TBL.TENANT_FK LEFT OUTER JOIN USER_TBL TBL4 ON TBL4.ID = TBL1.USER_FK WHERE ( TBL4.ID =771100 AND TBL.IS_DELETED <> 1 ) ) ) ORDER BY name ASC";

System.out.println("using TableNamesFinder to get column names");
Statement statement = CCJSqlParserUtil.parse(sql);
Select selectStatement = (Select) statement;
TablesNamesFinder tablesNamesFinder = new TablesNamesFinder() {
    @Override
    public void visit(Column tableColumn) {
        System.out.println(tableColumn);
    }
};
tablesNamesFinder.getTableList(selectStatement);

System.out.println("-------------------------------------------");

System.out.println("using ast nodes to get column names");
SimpleNode node = (SimpleNode) CCJSqlParserUtil.parseAST(sql);
node.jjtAccept(new CCJSqlParserDefaultVisitor() {
    @Override
    public Object visit(SimpleNode node, Object data) {
        if (node.getId() == CCJSqlParserTreeConstants.JJTCOLUMN) {
            System.out.println(node.jjtGetValue());
            return super.visit(node, data);
        } else {
            return super.visit(node, data);
        }
    }
}, null);
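The "make this list distinct" step that the snippet above leaves out could look roughly like the following. This is only a sketch in Python, not part of JSqlParser: the function name distinct_columns and the sample input are made up for illustration, and it assumes the columns arrive as plain strings like the ones the visitors print. Note that the prefix before the dot may be an alias (TBL) rather than a real table name.

```python
def distinct_columns(raw_columns):
    """Return unique (qualifier, column) pairs, ignoring case, in first-seen order."""
    seen = set()
    result = []
    for raw in raw_columns:
        # Split "TBL.ID" into qualifier and column; unqualified names get None.
        qualifier, _, column = raw.strip().rpartition(".")
        key = (qualifier.upper() or None, column.upper())
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# Hypothetical input, shaped like the strings printed by the visitors.
printed = ["TBL.ID", "TBL.NAME", "tbl.id", "TBL1.SHORT_NAME", "name", "TBL.ID"]
print(distinct_columns(printed))
```

Upper-casing is a simplification; whether identifiers are case-sensitive depends on the database and on quoting.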
Keep in mind that JSqlParser is only a parser. It therefore cannot tell which table a column belongs to unless the column is qualified (table.column). To get this right, the database schema must be available. This becomes clear if you look at:
select a from table1, table2
which is valid SQL.
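To illustrate why the schema is needed, here is a minimal sketch in Python of resolving an unqualified column once the schema is known. The function resolve_column and the schema contents are hypothetical and not something a parser provides; in practice the schema would come from the database catalog.

```python
def resolve_column(column, from_tables, schema):
    """Return the table in `from_tables` that owns `column`.

    `schema` maps table name -> set of column names (hypothetical data).
    """
    owners = [t for t in from_tables if column in schema.get(t, set())]
    if len(owners) == 1:
        return owners[0]
    if not owners:
        raise LookupError("column %r not found in %s" % (column, from_tables))
    raise LookupError("column %r is ambiguous: %s" % (column, owners))


# Hypothetical schema for: select a from table1, table2
schema = {"table1": {"a", "b"}, "table2": {"c"}}
print(resolve_column("a", ["table1", "table2"], schema))  # table1
```

If both tables had a column named a, the lookup would be ambiguous, which is the case where a database would reject the query as well.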