Improving/Fixing a Regex for C style block comments

Some problems I see with your regex:

There's no need for the |[\r\n] sequences in your regex; a negated character class like [^*] matches everything except *, including line separators. It's only the . (dot) metacharacter that doesn't match those.

Once you're inside the comment, the only character you have to look for is an asterisk; as long as you don't see one of those, you can gobble up as many characters as you want. That means it makes no sense to use [^*] when you can use [^*]+ instead. In fact, you might as well put that in an atomic group -- (?>[^*]+) -- because you'll never have any reason to give up any of those not-asterisks once you've matched them.

Filtering out extraneous junk, the final alternative inside your outermost parens is \*+[^*/], which means "one or more asterisks, followed by a character that isn't an asterisk or a slash". That will always match the asterisk at the end of the comment, and it will always have to give it up again because the next character is a slash. In fact, if there are twenty asterisks leading up to the final slash, that part of your regex will match them all, then it will give them all up, one by one. Then the final part -- \*+/ -- will match them for keeps.
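To illustrate the first point, here is a small Python sketch (Python and the sample string are my choice for illustration, not part of the original question). It shows that a negated class such as [^*] matches line breaks, while an unmodified dot does not:

```python
import re

text = "/* first line\nsecond line */"

# The dot stops at the newline, so this finds nothing without DOTALL.
dot_match = re.search(r"/\*.*\*/", text)

# [^*] matches any character except '*', newlines included.
class_match = re.search(r"/\*[^*]*\*/", text)

print(dot_match)               # None
print(class_match.group(0))    # the whole two-line comment
```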

For maximum performance, I would use this regex:

/\*(?>(?:(?>[^*]+)|\*(?!/))*)\*/

This will match a well-formed comment very quickly, but more importantly, if it starts to match something that isn't a valid comment, it will fail as quickly as possible.
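As a rough check of that behavior, the sketch below runs the pattern in Python. This assumes Python 3.11 or later, where the re module accepts atomic groups; on older versions the sketch falls back to an unrolled pattern without atomic groups, which is my substitution and not the regex above. The sample string is made up for illustration.

```python
import re
import sys

# Atomic groups (?>...) were added to Python's re module in version 3.11.
# On older versions, fall back to an unrolled pattern that needs none.
if sys.version_info >= (3, 11):
    COMMENT = re.compile(r"/\*(?>(?:(?>[^*]+)|\*(?!/))*)\*/")
else:
    COMMENT = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")

source = "int a; /* one ** two */ int b; /***/ int c; /* never closed"

# Well-formed comments are found; the unterminated one is not matched.
print(COMMENT.findall(source))   # ['/* one ** two */', '/***/']
print(COMMENT.sub("", source))
```

The unterminated comment at the end is left alone: once the atomic group has consumed everything up to the end of the input, there is nothing to backtrack into, so the attempt fails immediately.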


Courtesy of David, here's a version that matches nested comments with any level of nesting:

(?s)/\*(?>/\*(?<LEVEL>)|\*/(?<-LEVEL>)|(?!/\*|\*/).)+(?(LEVEL)(?!))\*/

It uses .NET's Balancing Groups, so it won't work in any other flavor. For the sake of completeness, here's another version (from RegexBuddy's Library) that uses the Recursive Groups syntax supported by Perl, PCRE and Oniguruma/Onigmo:

/\*(?>[^*/]+|\*[^/]|/[^*])*(?>(?R)(?>[^*/]+|\*[^/]|/[^*])*)*\*/
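If your regex flavor supports neither feature (Python's built-in re module, for example, has no balancing groups and no (?R)), a plain scanner with a depth counter is a workable alternative. This is a minimal sketch of that different technique, not a translation of either regex; the function name strip_nested_comments is hypothetical.

```python
def strip_nested_comments(text: str) -> str:
    """Remove /* ... */ comments, allowing comments nested inside comments."""
    out = []
    depth = 0          # how many comments we are currently inside
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(text[i])   # keep only text outside comments
            i += 1
    return "".join(out)


print(strip_nested_comments("a /* x /* y */ z */ b"))   # a  b
```

One difference from the regexes: an unterminated comment swallows the rest of the input here, whereas the regexes would simply not match it.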

How to match c-style block comments in Notepad++ with a regex?

Notepad++ uses Scintilla's regular expression engine (according to its online help).

This page says that "in Scintilla, regular expression searches are made line per line," so unfortunately I think it's hopeless.

-- EDIT --

A little further digging turned up this Notepad++ forum post, which offers some hope after all. Specifically, it says that Notepad++'s PythonScript plugin supports multiline regular expressions.

Regex to match a C-style multiline comment

Try using this regex (Single line comments only):

String src ="How are things today /* this is comment */ and is your code /* this is another comment */ working?";
String result=src.replaceAll("/\\*.*?\\*/","");//single line comments
System.out.println(result);

REGEX explained:

Match the character "/" literally

Match the character "*" literally

"." Match any single character

"*?" Between zero and unlimited times, as few times as possible, expanding as needed (lazy)

Match the character "*" literally

Match the character "/" literally

Alternatively here is regex for single and multi-line comments by adding (?s):

// note the added \n, which the previous regex won't match across
String src ="How are things today /* this\n is comment */ and is your code /* this is another comment */ working?";
String result=src.replaceAll("(?s)/\\*.*?\\*/","");
System.out.println(result);

Reference:

  • https://www.regular-expressions.info/examplesprogrammer.html

How to filter out c-type comments with regex?

We can try doing a regex replacement on the following pattern:

/\*.*?\*/

This matches any C-style block comment that starts and ends on the same line; the dot does not match line breaks by default, so pass RegexOptions.Singleline if comments can span lines. The lazy dot .*? stops at the first */, so each match covers only a single comment. Replacing every match with an empty string removes the comments from the input.

Code:

Dim input As String = "/* 1111 */ one /*2222*/two /*3333 */ three/* 4444*/ four /*/**/ five /**/"
Dim output As String = Regex.Replace(input, "/\*.*?\*/", "")
Console.WriteLine(input)
Console.WriteLine(output)

After echoing the input, this prints the stripped output (shown here without its leading and trailing space):

one two  three four  five

Strip out C Style Multi-line Comments

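You can pass a RegexOptions.Multiline option parameter, although that option only changes where ^ and $ match; the pattern in the full example already spans lines because \s matches line breaks.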

string output = Regex.Replace(input, pattern, string.Empty, RegexOptions.Multiline);

Full example

string input = @"this is some stuff right here
/* blah blah blah
blah blah blah
blah blah blah */ and this is more stuff
right here.";

string pattern = @"/[*][\w\d\s]+[*]/";

string output = Regex.Replace(input, pattern, string.Empty, RegexOptions.Multiline);
Console.WriteLine(output);
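As a quick check of what does the work in this example, the following Python sketch runs the same pattern with and without the multiline flag. Python's re.MULTILINE, like .NET's RegexOptions.Multiline, only changes where ^ and $ match. The sample strings are made up for illustration.

```python
import re

text = "keep /* blah blah\nblah blah */ this"
pattern = r"/[*][\w\d\s]+[*]/"

plain = re.sub(pattern, "", text)
with_flag = re.sub(pattern, "", text, flags=re.MULTILINE)

# The flag makes no difference: \s already matches the line break.
print(plain == with_flag)   # True
print(plain)                # keep  this

# Punctuation inside the comment defeats the [\w\d\s]+ class,
# so this comment is left in place.
punctuated = re.sub(pattern, "", "keep /* blah, blah */ this")
print(punctuated)           # keep /* blah, blah */ this
```

The last call shows the pattern's limit: it only removes comments made up entirely of word characters and whitespace.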

Regular expression to extract string in C Code (not inside comment)

The final solution for EditPad 6/7 is:

(?<!^[ \t]*/?[*#][^"\n]*")(?<=^[^"\n]*")[^"]+

Link:
Regular expression for a string that does not start with a /*

Parse C-Style Comments with Regex, avoid Backtracking

You have heavy backtracking because of the alternation. Instead of the (?:.|[\r\n]), you may consider using a character class [\s\S] that boosts performance to a noticeable extent:

\/\*[\s\S]*?\*\/|\/\/.*

See demo

In Python, you can use the re.S/re.DOTALL modifier to make . match line breaks, too (note that the single line comment pattern should be matched with \/\/[^\r\n]* then):

/\*.*?\*/|//[^\r\n]*

See another demo

However, since the lazy quantifier *? also causes overhead similar to that of greedy quantifiers, you should consider a more efficient pattern for C-style multiline comments, /\*[^*]*\*+(?:[^/*][^*]*\*+)*/, so that the whole regex looks like this:

/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//.*

See yet another demo

Details:

  • /\* - a /*
  • [^*]* - zero or more chars other than *
  • \*+ - one or more asterisks
  • (?:[^/*][^*]*\*+)* - zero or more sequences of:

    • [^/*] - a symbol other than / and *
    • [^*]* - zero or more symbols other than *
    • \*+ - 1+ asterisks
  • / - a / symbol
  • | - or
  • //.* - // and any 0+ chars other than line break chars.

Just wanted to note that in Python, you do not need to escape / (in JS, you do not need to escape / when declaring a regex using the RegExp constructor).

NOTE: The last pattern does not make it simple to capture what is inside /* and */. Since it is more stable than the others, I'd still advise using it when you need the contents: capture them together with the trailing * using /\*([^*]*\*+(?:[^/*][^*]*\*+)*)/|//(.*) and then remove the last char from .group(1).
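A minimal Python sketch of that capturing approach follows. The sample input and the names COMMENT and bodies are made up for illustration.

```python
import re

# Group 1: block comment body plus its trailing '*'.
# Group 2: line comment body.
COMMENT = re.compile(r"/\*([^*]*\*+(?:[^/*][^*]*\*+)*)/|//(.*)")

code = "x = 1; /* first\n second */ y = 2; // tail"

bodies = []
for m in COMMENT.finditer(code):
    if m.group(1) is not None:
        bodies.append(m.group(1)[:-1])   # drop the final '*'
    else:
        bodies.append(m.group(2))

print(bodies)   # [' first\n second ', ' tail']
```

Only the last asterisk is removed, so a comment that ends in several asterisks (such as /** doc **/) keeps the extra ones in its captured body.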


