How to extract table names and column names from sql query?
Really, this is no easy task. You could use a lexer (ply in this example) and define several rules to get several tokens out of a string. The following code defines these rules for the different parts of your SQL string and puts them back together as there could be aliases in the input string. As a result, you get a dictionary (result
) with the different tablenames as key.
import ply.lex as lex, re
tokens = (
"TABLE",
"JOIN",
"COLUMN",
"TRASH"
)
tables = {"tables": {}, "alias": {}}
columns = []
t_TRASH = r"Select|on|=|;|\s+|,|\t|\r"
def t_TABLE(t):
r"from\s(\w+)\sas\s(\w+)"
regex = re.compile(t_TABLE.__doc__)
m = regex.search(t.value)
if m is not None:
tbl = m.group(1)
alias = m.group(2)
tables["tables"][tbl] = ""
tables["alias"][alias] = tbl
return t
def t_JOIN(t):
r"inner\s+join\s+(\w+)\s+as\s+(\w+)"
regex = re.compile(t_JOIN.__doc__)
m = regex.search(t.value)
if m is not None:
tbl = m.group(1)
alias = m.group(2)
tables["tables"][tbl] = ""
tables["alias"][alias] = tbl
return t
def t_COLUMN(t):
r"(\w+\.\w+)"
regex = re.compile(t_COLUMN.__doc__)
m = regex.search(t.value)
if m is not None:
t.value = m.group(1)
columns.append(t.value)
return t
def t_error(t):
raise TypeError("Unknown text '%s'" % (t.value,))
t.lexer.skip(len(t.value))
# here is where the magic starts
def mylex(inp):
lexer = lex.lex()
lexer.input(inp)
for token in lexer:
pass
result = {}
for col in columns:
tbl, c = col.split('.')
if tbl in tables["alias"].keys():
key = tables["alias"][tbl]
else:
key = tbl
if key in result:
result[key].append(c)
else:
result[key] = list()
result[key].append(c)
print result
# {'tb1': ['col1', 'col7'], 'tb2': ['col2', 'col8']}
string = "Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;"
mylex(string)
How to extract tables names in a SQL script?
If you would like to use core python:
txt = """
select *
from (
select col1 from test.test_a join test.test_a1 on a.col1 = a1.col1) a
left join test.test_b b
on a.col1 = b.col2
left join
test.test_c c
on b.col2 = c.col3
left jon
(select
col4
from
test.test_d) d
on c.col3 = d.col4"""
replace_list = ['\n', '(', ')', '*', '=']
for i in replace_list:
txt = txt.replace(i, ' ')
txt = txt.split()
res = []
for i in range(1, len(txt)):
if txt[i-1] in ['from', 'join'] and txt[i] != 'select':
res.append(txt[i])
print(res)
Extract table name using SQL from a column that stores queries
Seems like this might be a good use case for REGEXP_SUBSTR
:
SELECT REGEXP_SUBSTR(querytext, '(<?FROM database\.)(.+?\b)', 1, 1, 'i')
The pattern looks for the string literal FROM database.
that immediately precedes a tablename matching the pattern (.+?\b)
.
Further - the pattern (.+?\b)
matches any characters (.
) occurring at least one time (+
) non-greedily (?
token) up to a word boundary character (\b
), which judging by your sample data, should match on the whitespace immediately proceeding the table name.
You can see how this pattern is interpreted in even more detail using a utility like Regex101.
Further (albeit slightly tangential) reading regarding the PCRE-style RegExp support in Teradata: Regex syntax in teradata
Extract table name and columns from SQL schema
- Extract a table name:
CREATE\s+TABLE\s+([\w_]+)
- Get column names:
\s+([\w_]+)[\s\w]+,
- Get a primary key field:
\s*PRIMARY\s+KEY\s+\(([\w_]+)\)
- Get foreign keys data:
\s*FOREIGN\s+KEY\s+\(([\w_]+)\)\s+REFERENCES\s+([\w_]+)\s+\(([\w_]+)\)
You can test it here (respectively):
- https://regexr.com/59251
- https://regexr.com/59254
- https://regexr.com/5925a
- https://regexr.com/594eb
Related Topics
How to Easily Print Ascii-Art Text
Fbprophet Installation Error - Failed Building Wheel for Fbprophet
Typeerror: Unsupported Format String Passed to List._Format_
Sort List Based on Another List
Sqlalchemy - Select for Update Example
Run a Python Script from Another Python Script, Passing in Arguments
How Can Draw a Line Using the X and Y Coordinates of Two Points
I Am Trying to Split a Full Name to First Middle and Last Name in Pandas But I Am Stuck At Replace
Find First Non-Zero Value in Each Column of Pandas Dataframe
How to Make My Discord.Py Bot Play Mp3 in Voice Channel
How to Set the Precision on Str(Numpy.Float64)
Jupyter Notebook, Python3 Print Function: No Output, No Error
Print the Lines of a Log File Which Starts With Date Format "Yyyy-Mm-Dd" in Python
Python: Requests.Exceptions.Connectionerror. Max Retries Exceeded With Url
Split a Large Json File into Multiple Smaller Files
Python: String Iteration Replace a Space With a Hyphen (Or Other Character)