Regex to strip comments and multi-line comments and empty lines
$text = preg_replace('!/\*.*?\*/!s', '', $text);
$text = preg_replace('/\n\s*\n/', "\n", $text);
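For comparison, here is a minimal Python sketch of the same two-step idea. The function name and the sample input are made up for illustration, and like the PHP above it assumes no comment-like text sits inside string literals.

```python
import re

def strip_block_comments_and_blank_lines(text):
    # Step 1: remove /* ... */ comments. DOTALL lets '.' span line breaks,
    # and the lazy '.*?' stops at the first closing '*/'.
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    # Step 2: collapse a line break followed by whitespace-only lines
    # into a single line break.
    return re.sub(r"\n\s*\n", "\n", text)

# Hypothetical sample input.
sample = "a = 1;\n/* one\n   two */\n\n\nb = 2;\n"
print(strip_block_comments_and_blank_lines(sample))
```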
Remove all comment (single-/multi-line) & blank lines from source file
To remove the comments, see this answer.
After that, removing empty lines is trivial.
Replace multi-line comment with empty lines
Assuming you do not have comment-like strings in string literals and that comments can't be nested (since the string comes from T-SQL), you can try
var rx = new Regex(@"/\*(?s:.*?)\*/");
var txt = @"/*
Some
multiline
comment
*/";
var replaced = rx.Replace(txt, m => String.Concat(Enumerable.Repeat("#\r\n", m.Value.Split(new string[] {"\r\n"}, StringSplitOptions.None).Count())).Trim());
Result:
The /\*(?s:.*?)\*/ regex matches any text between /* and */. The logic is that we get the whole match, split it on line breaks, and then build a replacement string based on the number of lines.
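The same count-the-lines replacement can be sketched in Python roughly as follows. This is only an illustration: the function name, the "#" placeholder default and the sample text are assumptions, and the replacement is joined with "\n", so CRLF endings inside a comment are not preserved.

```python
import re

BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

def blank_out_comments(text, placeholder="#"):
    def repl(match):
        # Count the lines the comment spans and emit one placeholder per line,
        # so everything after the comment keeps its original line number.
        line_count = len(match.group(0).splitlines())
        return "\n".join([placeholder] * line_count)
    return BLOCK_COMMENT.sub(repl, text)

# Hypothetical sample input.
sample = "SELECT 1\n/*\nSome\nmultiline\ncomment\n*/\nSELECT 2\n"
print(blank_out_comments(sample))
```

Passing placeholder="" should give truly empty lines instead of "#" markers.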
If you want to match just the lines that are all-comments, you can use the following regex (see demo):
(?m)^\s*/\*(?s:.*?)\*/\s*$
How to strip comments starting at ** line from text in sql
You should use the replace twice if you want to remove comments and replace newlines with spaces inside text:
regexp_replace(regexp_replace($1, '((\n*)(\*\*.*)?)$', ''),'\n',' ')
Visual Studio regex to remove all comments and blank lines in VB.NET code using a macro
To get rid of a line that contains whitespace or nothing, you can use this regex:
(?m)^[ \t]*[\r\n]+
Your regex, ^[\s|\t]*$\n would work if you specified Multiline mode ((?m)), but it's still incorrect. For one thing, the | matches a literal |; there's no need to specify "or" in a character class. For another, \s matches any whitespace character, including TAB (\t), carriage-return (\r), and linefeed (\n), making it needlessly redundant and inefficient. For example, at the first blank line (after the end of the first Sub), the ^[\s|\t]* will initially try to match everything before the word Public, then it will back off to the end of the previous line, where the $\n can match.
But a blank line, in addition to being empty or containing only horizontal whitespace (spaces or TABs), may also contain a comment. I choose to treat these "comment-only" lines as blank lines because it's relatively easy to do, and it simplifies the task of matching comments in non-blank lines, which is much harder. Here's my regex:
^[ \t]*(?:(?:REM|')[^\r\n]*)?[\r\n]+
After consuming any leading horizontal whitespace, if I see a REM or ' signifying a comment, I consume that and everything after it until the next line separator. Notice that the only thing that's required to be present is the line separator itself. Also notice the absence of the end anchor, $. It's never necessary to use that when you're explicitly matching the line separators, and in this case it would break the regex. In Multiline mode, $ matches only before a linefeed (\n), not before a carriage-return (\r). (This behavior of the .NET flavor is incorrect and rather surprising, given Microsoft's longstanding preference for \r\n as a line separator.)
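To see the blank-line regex in action outside .NET, here is a small Python sketch. The function name and the VB snippet are invented for the example. As in the original pattern, REM is matched without a word boundary and case-sensitively, so a line beginning with an identifier such as REMOTE would also be treated as a comment.

```python
import re

# Blank lines and comment-only lines, each with its trailing line separator(s).
# Multiline mode is passed as a flag instead of the inline (?m).
BLANK_OR_COMMENT_LINE = re.compile(
    r"^[ \t]*(?:(?:REM|')[^\r\n]*)?[\r\n]+", re.MULTILINE
)

def drop_blank_and_comment_lines(source):
    return BLANK_OR_COMMENT_LINE.sub("", source)

# Hypothetical VB snippet with CRLF line endings.
vb = (
    "Sub A()\r\n"
    "  ' comment only\r\n"
    "\r\n"
    "  x = 1 ' trailing\r\n"
    "REM whole line\r\n"
    "End Sub\r\n"
)
print(drop_blank_and_comment_lines(vb))
```

The trailing comment on the x = 1 line is left alone here; only blank and comment-only lines are removed.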
Matching the remaining comments is a fundamentally different task. As you've discovered, simply searching for REM or ' is no good because you might find it in a string literal, where it does not signify the start of a comment. What you have to do is start from the beginning of the line, consuming and capturing anything that's not the beginning of a comment or a string literal. If you find a double-quote, go ahead and consume the string literal. If you find a REM or ', stop capturing and go ahead and consume the rest of the line. Then you replace the whole line with just the captured portion--i.e., everything before the comment. Here's the regex:
(?mn)^(?<line>[^\r\n"R']*(("[^"]*"|(?!REM)R)[^\r\n"R']*)*)(REM|')[^\r\n]*
Or, more readably:
(?mn) # Multiline and ExplicitCapture modes
^ # beginning of line
(?<line> # capture in group "line"
[^\r\n"R']* # any number of "safe" characters
(
(
"[^"]*" # a string literal
|
(?!REM)R # 'R' if it's not the beginning of 'REM'
)
[^\r\n"R']* # more "safe" characters
)*
) # stop capturing
(?:REM|') # a comment sigil
[^\r\n]* # consume the rest of the line
The replacement string would be "${line}". Some other notes:
- Notice that this regex does not end with [\r\n]+ to consume the line separator, like the "blank lines" regex does.
- It doesn't end with $ either, for the same reason as before. The [^\r\n]* will greedily consume everything before the line separator, so the anchor isn't needed.
- The only thing that's required to be present is the REM or '; we don't bother matching any line that doesn't contain a comment.
- ExplicitCapture mode means I can use (...) instead of (?:...) for all the groups I don't want to capture, but the named group, (?<line>...), still works.
- Gnarly as it is, this regex would be a lot worse if VB supported multiline comments, or if its string literals supported backslash escapes.
I don't do VB, but here's a demo in C#.
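If it helps, the same regex can be tried in Python with a few adjustments. Python has no ExplicitCapture mode, so the uncaptured groups are written as (?:...), the named group becomes (?P<line>...), and the replacement is \g<line>. The function name and the sample lines are made up for the sketch.

```python
import re

STRIP_TRAILING_COMMENT = re.compile(
    r"""
    ^                           # beginning of line
    (?P<line>                   # capture in group "line"
        [^\r\n"R']*             # any number of "safe" characters
        (?:
            (?: "[^"]*"         # a string literal
              | (?!REM)R        # 'R' if it's not the beginning of 'REM'
            )
            [^\r\n"R']*         # more "safe" characters
        )*
    )                           # stop capturing
    (?:REM|')                   # a comment sigil
    [^\r\n]*                    # consume the rest of the line
    """,
    re.MULTILINE | re.VERBOSE,
)

def strip_trailing_comments(source):
    # Replace each matched line with just the part captured before the comment.
    return STRIP_TRAILING_COMMENT.sub(r"\g<line>", source)

# Hypothetical sample lines.
vb = (
    'x = "it\'s" \' real comment\n'
    "Return y REM done\n"
    's = "no \'comment\' here"\n'
)
print(strip_trailing_comments(vb))
```

Whitespace before the comment is part of the captured text, so it stays behind as trailing whitespace; the third line has no comment outside its string literal and is not matched at all.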
PHP regex to remove single line comments
Regex isn't complex enough to (elegantly) do this in all cases, but you can make some assumptions. For instance, since // can only be a) a comment or b) part of a string, you should be able to do the following:
\/\/[^;)]*$
This means that there may not be any ; or ) after the comment. However, this only works when you don't use those characters in your comment. You can of course use any other character, such as ' and/or ", to better fit your needs.
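Here is a quick Python sketch of that heuristic, including its blind spot. One deliberate change from the pattern above: \r and \n are added to the excluded characters so a match cannot run past the end of its line. The function name and sample are hypothetical, and LF line endings are assumed.

```python
import re

# '//' followed by anything except ';', ')' or a line break, up to end of line.
LINE_COMMENT = re.compile(r"//[^;)\r\n]*$", re.MULTILINE)

def strip_line_comments(code):
    return LINE_COMMENT.sub("", code)

# Hypothetical sample input.
php = (
    '$url = "http://example.com"; // fetch\n'
    "// whole line\n"
    "foo(); // call bar()\n"
)
print(strip_line_comments(php))
```

The // inside the URL survives because a ; follows it on the same line. The last comment survives too, because it contains a ), which is exactly the limitation described above.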
Delete multi-line comments in C# (// or /*...*/)
Alright, this regex, (^\/\/.*?$)|(\/\*.*?\*\/) (Rubular proof), will match (and potentially remove, if you use Visual Studio and replace it with nothing) the following lines out of your example text:
//#define 0x00180000
// #define 0x20000000
// abcd
/*#define 0x00080000
#define 0x40000000*/
/* defg */
and almost gets you what you want. Now, I'm suspicious of this line, /\*#define 0x00000000*/, but if you wanted to capture it as well you could modify the regex to be (^\/\/.*?$)|(\/.*?\*.*?\*\/) (Rubular proof).
Remove multi-line C style /* comments */ using Perl regex
I would do it like this:
perl -0777pe 's/\/\*(?:(?!\*\/).)*\*\/\n?//sg' file
Example:
$ cat fi
/* comments
comments
comments
comments */
bar
$ perl -0777pe 's/\/\*(?:(?!\*\/).)*\*\/\n?//sg' fi
bar
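The same pattern should carry over to Python more or less unchanged. In this sketch DOTALL plays the role of Perl's /s, re.sub replaces every match like /g, and the whole string is processed at once, much as -0777 slurps the file. The function name is made up.

```python
import re

# '/*', then any characters that do not start '*/', then '*/' and an
# optional trailing newline.
C_BLOCK_COMMENT = re.compile(r"/\*(?:(?!\*/).)*\*/\n?", re.DOTALL)

def strip_block_comments(text):
    return C_BLOCK_COMMENT.sub("", text)

# Same content as the example file above.
fi = "/* comments\ncomments\ncomments\ncomments */\nbar\n"
print(strip_block_comments(fi), end="")
```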