Regular Expression to Remove Comments from SQL Statement

Regex to remove single-line SQL comments (--)

I will disappoint all of you: this can't be done with regular expressions. Finding comments that are not inside a string is easy (even the OP could do that); the real problem is comment markers that appear inside a string. Lookarounds offer a little hope, but they are still not enough. Knowing that a quote precedes the marker on the same line guarantees nothing. The only thing that does guarantee something is the parity of the quotes before the marker (an odd count means you are inside a string), and that is something you can't find with a regular expression. So simply go with a non-regular-expression approach.

EDIT:
Here's the C# code:

using System;

string sql = "--this is a test\r\nselect stuff where substaff like '--this comment should stay' --this should be removed\r\n";
char[] quotes = { '\'', '"' };
int newCommentLiteral, lastCommentLiteral = 0;
while ((newCommentLiteral = sql.IndexOf("--", lastCommentLiteral)) != -1)
{
    // count the quotes between the last position known to be outside a string and this "--"
    int countQuotes = sql.Substring(lastCommentLiteral, newCommentLiteral - lastCommentLiteral).Split(quotes).Length - 1;
    if (countQuotes % 2 == 0) // this is a comment, since there's an even number of quotes preceding
    {
        // search for the line break starting at the comment, not at the start of the string
        int eol = sql.IndexOf("\r\n", newCommentLiteral);
        if (eol == -1)
            eol = sql.Length; // no more newline, meaning end of the string
        // remove the comment only; the line break stays so adjacent lines are not merged
        sql = sql.Remove(newCommentLiteral, eol - newCommentLiteral);
        lastCommentLiteral = newCommentLiteral;
    }
    else // this is within a string, find the string ending and move past it
    {
        int singleQuote = sql.IndexOf("'", newCommentLiteral);
        if (singleQuote == -1)
            singleQuote = sql.Length;
        int doubleQuote = sql.IndexOf('"', newCommentLiteral);
        if (doubleQuote == -1)
            doubleQuote = sql.Length;

        // clamp to sql.Length so an unterminated string does not push the start index out of range
        lastCommentLiteral = Math.Min(Math.Min(singleQuote, doubleQuote) + 1, sql.Length);

        // instead of finding the end of the string you could simply do += 2, but the program will become slightly slower
    }
}

Console.WriteLine(sql);

What this does: it finds every comment literal (--). For each one, it checks whether it lies within a string by counting the quotes between the current match and the previous position. If that number is even, it's a comment, so it gets removed (find the next end of line and remove what's in between). If it's odd, the literal is within a string, so find the end of the string and move past it. This snippet is based on a weird SQL trick: 'this" is a valid string, even though the two quotes differ. If that's not true for your SQL dialect, you should try a completely different approach. I'll write a program for that too if that's the case, but this one is faster and more straightforward.
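For readers who don't use C#, here is a rough Python sketch of the same quote-parity idea. The function name strip_line_comments is made up for this illustration, and it assumes (like the snippet above) that ' and " both count as string delimiters and that only -- comments need handling.

```python
def strip_line_comments(sql: str) -> str:
    """Remove `--` comments that are preceded by an even number of quotes."""
    last = 0  # a position known to be outside any quoted text
    while True:
        pos = sql.find("--", last)
        if pos == -1:
            return sql
        between = sql[last:pos]
        quote_count = between.count("'") + between.count('"')
        if quote_count % 2 == 0:
            # Even parity: the marker is outside a string, so it starts a comment.
            # The comment ends at the first line-break character (or end of text).
            breaks = [i for i in (sql.find("\r", pos), sql.find("\n", pos)) if i != -1]
            eol = min(breaks) if breaks else len(sql)
            sql = sql[:pos] + sql[eol:]  # the line break itself is kept
            last = pos
        else:
            # Odd parity: the marker is inside a string; jump past the closing quote.
            closers = [i for i in (sql.find("'", pos), sql.find('"', pos)) if i != -1]
            last = min(closers) + 1 if closers else len(sql)


sample = ("--this is a test\r\nselect stuff where substaff like "
          "'--this comment should stay' --this should be removed\r\n")
print(repr(strip_line_comments(sample)))
```

The sketch keeps the line break after each removed comment, so a trailing comment never glues two lines of SQL together.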

Regular Expression to Match All Comments in a T-SQL Script

This should work:

(--.*)|(((/\*)+?[\w\W]+?(\*/)+))
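To see what this pattern actually picks up, here is a small Python check. The sample script is invented for the illustration; the pattern is copied unchanged from above.

```python
import re

# Pattern copied as-is from the answer above.
COMMENT_RE = re.compile(r"(--.*)|(((/\*)+?[\w\W]+?(\*/)+))")

# Hypothetical sample script, used only for this demonstration.
script = (
    "SELECT 1 -- trailing note\n"
    "/* block\n"
    "   comment */\n"
    "SELECT '--not a comment' AS s\n"
)

matches = [m.group(0) for m in COMMENT_RE.finditer(script)]
for text in matches:
    print(repr(text))
```

Both real comments are found, but the -- inside the string literal is reported as well, so treat the matches as candidates rather than as confirmed comments.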

Java regex to remove SQL comments from a string

Try

mySb = mySb.replaceAll("/\\*.*?\\*/", "");

(notice the ? which stands for "lazy").

EDIT: To cover multiline comments, use this approach:

Pattern commentPattern = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);
mySb = commentPattern.matcher(mySb).replaceAll("");

Hope this works for you.
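The same two points (lazy matching, and letting . cross line breaks) can be tried out quickly in Python, whose re.DOTALL flag plays the role of Pattern.DOTALL. This is only a sketch with made-up sample strings, not a translation of the original poster's code.

```python
import re

sql = "SELECT a /* first */, b /* second */ FROM t"

# Greedy: one match runs from the first "/*" to the last "*/".
greedy = re.sub(r"/\*.*\*/", "", sql)
# Lazy: each comment is matched on its own.
lazy = re.sub(r"/\*.*?\*/", "", sql)

multiline = "SELECT 1 /* spans\ntwo lines */ + 2"

# Without DOTALL, "." stops at the line break, so nothing is removed.
single_line = re.sub(r"/\*.*?\*/", "", multiline)
# With DOTALL, "." also matches "\n" and the comment is removed.
dotall = re.sub(r"/\*.*?\*/", "", multiline, flags=re.DOTALL)

print(repr(greedy))
print(repr(lazy))
print(repr(single_line))
print(repr(dotall))
```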

How to strip comments starting at ** line from text in sql

You should call regexp_replace twice if you want to remove the comments and replace newlines with spaces inside the text:

regexp_replace(regexp_replace($1, '((\n*)(\*\*.*)?)$', ''),'\n',' ')
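As a rough analogue of this two-step replace, here is a Python sketch. It rests on two assumptions that are not stated in the answer: that . is allowed to match line breaks in the SQL engine (mirrored here with re.DOTALL), and that the intent is to turn every line break into a space. The function name strip_star_comment is hypothetical.

```python
import re


def strip_star_comment(text: str) -> str:
    # Inner step: drop an optional "**..." tail, plus the line breaks right
    # before it, up to the end of the text. count=1 replaces the first match only.
    without_tail = re.sub(r"\n*(\*\*.*)?\Z", "", text, count=1, flags=re.DOTALL)
    # Outer step: fold the remaining line breaks into spaces.
    return without_tail.replace("\n", " ")


print(repr(strip_star_comment("keep this\nand this\n** drop this\nand this too")))
```

If your engine replaces only the first match by default, the outer call needs its global flag to behave like the second step here.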

VBA Regex to remove comments from Teradata SQL text files

The algorithm is based on recursive parser calls. There are several modes: three comment-parsing modes (one per comment subtype), a quoted-text mode, and a normal mode. Normal mode can give way to any other mode, and each of those in turn gives way only to normal mode. Thus, for example, quote chars within comments and any comment chars within quoted text are ignored. The chars to be searched for depend on the current mode. The source is parsed chunk by chunk: once the target chars are found, the mode is switched accordingly, the current chunk is finished, and the next one begins with the next recursive call. The call stack stores the intermediate results. After the source ends, the calls unwind, and each parser call concatenates and returns its chunk, so that finally the complete code is retrieved.

Here is the code:

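Option Explicit

Sub RemoveComments()

    Dim strOriginal As String
    Dim strProcessed As String

    strOriginal = ReadTextFile("C:\Users\DELL\Desktop\tmp\source.sql", 0) ' -2 - System default, -1 - Unicode, 0 - ASCII
    Parse strOriginal, strProcessed, 0
    WriteTextFile strProcessed, "C:\Users\DELL\Desktop\tmp\result.sql", 0

End Sub

' NOTE: the two file helpers below are not shown in the answer; they are a minimal
' sketch that assumes Scripting.FileSystemObject is available on the machine.
Function ReadTextFile(strPath As String, lngFormat As Long) As String

    With CreateObject("Scripting.FileSystemObject").OpenTextFile(strPath, 1, False, lngFormat) ' 1 - ForReading
        ReadTextFile = ""
        If Not .AtEndOfStream Then ReadTextFile = .ReadAll
        .Close
    End With

End Function

Sub WriteTextFile(strContent As String, strPath As String, lngFormat As Long)

    With CreateObject("Scripting.FileSystemObject").OpenTextFile(strPath, 2, True, lngFormat) ' 2 - ForWriting, create if missing
        .Write strContent
        .Close
    End With

End Sub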

Sub Parse(strSrc As String, strRes As String, lngMode As Long)

    Static objRegExp As Object
    Dim strBeg As String
    Dim objMatches As Object
    Dim lngPos As Long
    Dim lngEscPos As Long
    Dim strRet As String

    If objRegExp Is Nothing Then ' initialize regexp once
        Set objRegExp = CreateObject("VBScript.RegExp")
        With objRegExp
            .Global = False
            .MultiLine = True
            .IgnoreCase = True
        End With
    End If
    strRes = ""
    If strSrc = "" Then Exit Sub ' source completed
    strBeg = "" ' preceding chunk is empty by default
    Select Case lngMode
        Case 0 ' processing normal
            With objRegExp
                .Pattern = "(\/\*)|(^[ \t]*--)|(--)|(\')"
                Set objMatches = .Execute(strSrc)
                If objMatches.Count = 0 Then
                    strRes = strSrc
                    Exit Sub ' source completed
                End If
                lngPos = objMatches(0).FirstIndex
                With objMatches(0)
                    Select Case True
                        Case .SubMatches(0) <> ""
                            lngMode = 1 ' start multiline comment
                        Case .SubMatches(1) <> ""
                            lngMode = 2 ' start whole line comment
                        Case .SubMatches(2) <> ""
                            lngMode = 3 ' start singleline comment
                        Case .SubMatches(3) <> ""
                            lngMode = 4 ' start text in quotes
                            lngPos = lngPos + 1 ' skip found quote char
                    End Select
                End With
            End With
            strBeg = Left(strSrc, lngPos)
            lngPos = lngPos + 1
        Case 1 ' processing multiline comment
            lngMode = 0 ' start normal
            lngPos = InStr(strSrc, "*/")
            If lngPos = 0 Then Exit Sub ' source completed, comment unclosed
            lngPos = lngPos + 2 ' skip comment closing char
        Case 2 ' processing whole line comment
            lngMode = 0 ' start normal
            lngPos = InStr(strSrc, vbCrLf)
            If lngPos = 0 Then Exit Sub ' source completed
            lngPos = lngPos + 2 ' skip new line char
        Case 3 ' processing singleline comment
            lngMode = 0 ' start normal
            lngPos = InStr(strSrc, vbCrLf)
            If lngPos = 0 Then Exit Sub ' source completed
        Case 4 ' processing text within quotes
            lngPos = InStr(strSrc, "'")
            If lngPos = 0 Then Exit Sub ' source completed
            If Mid(strSrc, lngPos, 2) = "''" Then ' escaped quote char ''
                strBeg = Left(strSrc, lngPos + 1) ' store preceding chunk with escaped quote char
                lngPos = lngPos + 2 ' shift next from escaped quote char
            Else
                lngMode = 0 ' start normal
                strBeg = Left(strSrc, lngPos) ' store preceding chunk with quote char
                lngPos = lngPos + 1 ' shift next from quote char
            End If
    End Select
    Parse Mid(strSrc, lngPos), strRet, lngMode ' recursive parser call
    strRes = strBeg & strRet ' concatenate preceding chunk with processed and return result

End Sub
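The mode-switching idea can also be written as a plain loop instead of recursive calls. The Python sketch below is a simplified variant, not a port of the VBA: it folds the whole-line comment case into the ordinary -- case, and the function and mode names are invented for the illustration.

```python
def strip_sql_comments(src: str) -> str:
    """Remove /* */ and -- comments, leaving single-quoted text untouched."""
    out = []
    mode = "normal"  # one of: normal, block, line, quote
    i = 0
    while i < len(src):
        ch = src[i]
        two = src[i:i + 2]
        if mode == "normal":
            if two == "/*":
                mode, i = "block", i + 2
            elif two == "--":
                mode, i = "line", i + 2
            else:
                if ch == "'":
                    mode = "quote"
                out.append(ch)
                i += 1
        elif mode == "block":
            if two == "*/":
                mode, i = "normal", i + 2
            else:
                i += 1  # drop comment text
        elif mode == "line":
            if ch in "\r\n":
                mode = "normal"  # the line break is handled (and kept) in normal mode
            else:
                i += 1  # drop comment text
        else:  # quote mode
            if two == "''":
                out.append(two)  # escaped quote, still inside the string
                i += 2
            else:
                if ch == "'":
                    mode = "normal"
                out.append(ch)
                i += 1
    return "".join(out)


print(strip_sql_comments("SELECT 'it''s -- kept', /* gone */ 1 -- gone\nFROM t"))
```

A loop avoids deep recursion on large scripts, at the cost of looking at the source one character at a time.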

Regex to find sql comments

With the global modifier, if your regex engine accepts it:

/\/\*.*?\*\/|--.*?\n/gs

The s modifier is needed to match multi-line comments.
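In Python the same pattern can be tried as follows; re.DOTALL stands in for the s modifier and re.findall for the global modifier. The sample text is made up for the illustration.

```python
import re

# Same pattern as above, without the /.../ delimiters and their escaped slashes.
COMMENT_RE = re.compile(r"/\*.*?\*/|--.*?\n", flags=re.DOTALL)

# Hypothetical sample: the last comment has no line break after it.
script = "a /* x\ny */ b -- c\nd -- e"

found = COMMENT_RE.findall(script)
print(found)
```

Note that the -- branch needs a line break to end the match, so a comment on the very last line without a trailing newline is not reported, and the line break is part of each -- match.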
