Explain Regex That Finds CSS Comments

The reason yours finds only single-line comments is that, in most regex flavors, . matches any character except a newline by default, whereas the other pattern uses a negated character class, which matches anything but the listed characters and so can match newlines.

However, even if you fixed that (most engines have a "single-line" or "dot-all" option that lets . match newlines; note that "multiline" mode changes ^ and $, not .), you would find that a greedy .* matches from the /* of the first comment to the */ of the last comment; you would have to use a non-greedy quantifier, .*?, to match no more than one comment.
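A minimal JavaScript sketch of both points, using a made-up CSS string: without the s (dot-all) flag the two-line comment is missed, with s but a greedy .* everything from the first /* to the last */ is taken, and the lazy .*? returns one match per comment. ([\s\S] is a flag-free alternative to . with s.)

```javascript
// Illustrative CSS with two comments, the first spanning two lines.
const css = "a { color: red; } /* first\ncomment */ b { } /* second */";

// No s flag: "." stops at "\n", so only the single-line comment is found.
const noDotAll = css.match(/\/\*.*\*\//g);

// s flag: "." matches "\n", but the greedy ".*" runs on to the LAST "*/",
// swallowing the CSS between the two comments.
const greedy = css.match(/\/\*.*\*\//gs);

// Lazy ".*?": stops at the first "*/", giving one match per comment.
const lazy = css.match(/\/\*.*?\*\//gs);

console.log(JSON.stringify(noDotAll)); // ["/* second */"]
console.log(greedy.length);            // 1
console.log(JSON.stringify(lazy));     // ["/* first\ncomment */","/* second */"]
```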

The more complex regular expression you quote does even more than that. Based on nikc.org's answer, I believe it is there to enforce the restriction that “comments may not be nested”; that is, they must not contain /* within them. In languages that permit comments /* like /* this */ (where an internal /* is neither prohibited nor the start of a nested comment), the pattern \/\*.*?\*\/ (with the dot-all option enabled) would be appropriate to match them.

Regular expression to find and remove comments in CSS

If you're running the match in C#, have you tried RegexOptions?

Match m = Regex.Match(word, pattern, RegexOptions.Multiline);

"Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string."
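JavaScript has close equivalents: the m flag behaves like RegexOptions.Multiline and the s flag like RegexOptions.Singleline. This small sketch (sample text invented) shows that only the latter lets . cross a line break, while the former only changes the anchors:

```javascript
const text = "/* one\ntwo */";

// m flag: "^" and "$" match at every line start/end, but "." still
// refuses to match "\n", so the two-line comment is not found.
const withM = /\/\*.*?\*\//m.test(text);

// s flag: "." matches "\n" too, so the whole comment is found.
const withS = /\/\*.*?\*\//s.test(text);

// What the m flag does change: "^two" matches only with it.
const anchorPlain = /^two/.test(text);
const anchorM = /^two/m.test(text);

console.log(withM, withS, anchorPlain, anchorM); // false true false true
```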

Also see Strip out C Style Multi-line Comments

EDIT:

OK, it looks like the issue is with the regex itself. Here is a working example using the regex pattern from http://ostermiller.org/findcomment.html. The author does a good job of deriving the regex and demonstrating the pitfalls and deficiencies of various approaches. Note: RegexOptions.Multiline and RegexOptions.Singleline do not appear to affect the result, which is expected, since the pattern contains no ., ^ or $ for those options to change.

string input = @"this is some stuff right here
/* blah blah blah
blah blah blah
blah blah blah */ and this is more stuff /* blah */
right here.";

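// Requires: using System.Text.RegularExpressions;
// "/*", then any mix of non-'*' characters and runs of '*' not followed by '/',
// then one or more '*' and the closing '/'.
string pattern = @"(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)";
string output = Regex.Replace(input, pattern, string.Empty, RegexOptions.Singleline);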
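The same pattern works unchanged in JavaScript, which makes it easy to confirm that no flag is needed: [^*] already matches line breaks. A sketch reusing the input from the C# example:

```javascript
const input = `this is some stuff right here
/* blah blah blah
blah blah blah
blah blah blah */ and this is more stuff /* blah */
right here.`;

// Same pattern as the C# version; no "s" or "m" flag is required because
// the pattern uses no ".", "^" or "$". The g flag replaces every comment.
const pattern = /(\/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+\/)/g;

const output = input.replace(pattern, "");
console.log(output);
```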

error when removing CSS comments via REGEX

Do not use zero-width assertions inside character classes.

  • ^, $, \A, \b, \B, \Z, \z and \G are anchors or (non-)word boundaries; they do not make sense inside character classes because they do not match any character. Inside a character class, ^ and \b mean something different: ^ negates the class when it comes right after the opening [ and otherwise denotes a literal ^, while \b means a backspace character.

  • You can't use \R (any line break) there, either.
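JavaScript has no \A or \R, but its \b and ^ behave inside character classes the same way as described above for PCRE, which makes for a quick check (sample strings are illustrative):

```javascript
// Outside a class, \b is a zero-width word boundary.
const boundary = /\bcat\b/.test("a cat here");
// Inside a class, \b is the backspace character U+0008, not a boundary.
const backspaceClass = /[\b]/.test("\u0008");
const notABoundary = /[\b]/.test("cat"); // no backspace in "cat"
// "^" negates only right after "["; elsewhere in a class it is a literal "^".
const negated = /[^a]/.test("a");
const literalCaret = /[a^]/.test("^");

console.log(boundary, backspaceClass, notABoundary, negated, literalCaret);
// true true false false true
```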

The two patterns with \A inside a character class must be rewritten as a grouping construct, (...), with an alternation operator |:

"`(\A|[\n;]+)/\*.+?\*/`s"=>"$1", 
"`(\A|[;\s]+)//.+\R`"=>"$1\n",

I removed the redundant modifiers and the capturing groups you are not using, and replaced [\r\n] with \R. The "`(\A|[\n;]+)/\*.+?\*/`s"=>"$1" entry can also be rewritten more efficiently, with an unrolled pattern that does not need the s modifier:

"`(\A|[\n;]+)/\*[^*]*\*+(?:[^/*][^*]*\*+)*/`"=>"$1"
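A rough JavaScript port of the unrolled pattern, for experimenting outside PHP. Assumptions: JavaScript has no \A, so ^ without the m flag stands in for it, and stripBlockComments is a hypothetical helper name. As in the PHP version, a comment is removed only when it starts the string or follows a newline or semicolon:

```javascript
// Hypothetical helper: JS has no \A, but without the m flag "^" matches
// only at the very start of the string, which is equivalent here.
const stripBlockComments = (src) =>
  src.replace(/(^|[\n;]+)\/\*[^*]*\*+(?:[^/*][^*]*\*+)*\//g, "$1");

// Invented sample: one comment at the start, one after a newline.
const sample = "/* header */a{color:red;}\n/* note\n   ** spans lines */b{}";
console.log(stripBlockComments(sample));

// A comment preceded by a space is not removed by this pattern.
console.log(stripBlockComments("a{} /* kept */"));
```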

Note that, according to the Upgrade history of the bundled PCRE library table, PHP 7.3 bundles PCRE2 10.32. See PCRE to PCRE2 migration:

Until PHP 7.2, PHP used the 8.x versions of the legacy PCRE library, and from PHP 7.3, PHP will use PCRE2. Note that PCRE2 is considered to be a new library although it's based on and largely compatible with PCRE (8.x).

According to this resource, the updated library is stricter about regex patterns and now treats formerly tolerated user errors as real errors:

  • Modifier S is now on by default. PCRE does some extra optimization.
  • Option X is disabled by default. It makes PCRE do more syntax validation than before.
  • Unicode 10 is used, while it was Unicode 7. This means more emojis, more characters, and more sets. Unicode regex may be impacted.
  • Some invalid patterns may be impacted.

In short, PCRE2 validates patterns more strictly, so after the upgrade some of your existing patterns may no longer compile.

Unnecessary asterisk in regex that finds CSS comment

This has already been corrected in the CSS3 Syntax module:

\/\*[^*]*\*+([^/][^*]*\*+)*\/   /* ignore comments */

Notice that the extraneous asterisk is gone, making this expression identical to what you have.

So it seems it was simply a mistake on their part while writing the grammar for CSS2. I'm digging through the mailing list archives to see if there's any relevant discussion there.
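A quick JavaScript check of the CSS3 pattern, with a g flag added so match returns every comment (the sample sheet is invented). It handles a line break, a run of asterisks, a / inside the body, and the empty comment /**/ without any dot-all flag:

```javascript
// CSS3 Syntax comment pattern as a JS regex literal; g finds all comments.
const cssComment = /\/\*[^*]*\*+([^/][^*]*\*+)*\//g;

const sheet = "a{} /* one\n ** two / three */ b{} /**/ c{} /***/";
console.log(JSON.stringify(sheet.match(cssComment)));
```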

Javascript regex to match beginning and end of CSS comment

Try this; it removes every /* and */ delimiter but leaves the text between them in place:

.replace(/(\/\*|\*\/)/g,'') 
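A quick illustration on a made-up rule; only the delimiters are removed, and the comment text stays:

```javascript
const src = "a { color: red; /* keep this text */ }";
// Remove each "/*" and each "*/" wherever they occur.
const result = src.replace(/(\/\*|\*\/)/g, "");
console.log(result); // a { color: red;  keep this text  }
```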

Find CSS class names that are not inside comments using Regular Expressions

Both of these approaches work in different cases, such as classes mixed with IDs or multiple classes in one selector:

.class-a, #id-a, .class-b:hover {}


First approach

In this first approach, you match both comment blocks and CSS classes, then replace class names as they are matched while leaving comments intact:

\/\*[\s\S]*?\*\/|(\.[a-z_-][\w-]*)(?=[^{}]*{)
^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Match comments   Match and capture classes

Breakdown:

\/\*[\s\S]*?\*\/    # Match a comment block
|                   # Or
(                   # Start of capturing group #1
  \.[a-z_-][\w-]*   # Match a CSS class name
)                   # End of CG #1
(?=                 # Start of a positive lookahead
  [^{}]*{           # Class should be followed by a `{` (may not be immediately)
)                   # End of lookahead

JS code:

var str = `.text-center { text-align: center; }
table.simple{background:#fff;}.bg-white{background:#fff;}
/*# sourceMappingURL=style.css.map */
/*
Example comment file.css
*/`

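// A comment is consumed whole by the first alternative, so class-like text
// inside it (e.g. "style.css") is skipped; real class names land in group 1.
// Here the lookahead also requires a complete {...} rule body after the class.
console.log(str.replace(/\/\*[\s\S]*?\*\/|(\.[a-z_-][\w-]*)(?=[^{}]*{[^{}]*})/g,
  function($0, $1) {
    return $1 ? '.a' : $0; // Class captured: rename it to .a; otherwise return the comment unchanged
  }
));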

Using Regex to remove css comments

This would normally be enough (assuming cssLines is a string containing all lines of your CSS file):

 string result = Regex.Replace(cssLines, @"/\*.*?\*/", string.Empty, RegexOptions.Singleline);

Note that the Singleline option lets . match newlines, which is what allows the pattern to match multi-line comments.
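For comparison, a JavaScript sketch in which the s flag plays the role of RegexOptions.Singleline. It also shows why a lazy .*? is preferable to .+? here: .+? requires at least one character between the delimiters, so an empty comment /**/ is left behind. The sample string is invented.

```javascript
const cssLines = "a{}/* one\ntwo */b{}/**/c{}";

// s flag (like RegexOptions.Singleline): "." also matches "\n".
const lazyStar = cssLines.replace(/\/\*.*?\*\//gs, "");

// ".+?" needs at least one character between "/*" and "*/", so "/**/" survives.
const lazyPlus = cssLines.replace(/\/\*.+?\*\//gs, "");

console.log(lazyStar); // a{}b{}c{}
console.log(lazyPlus); // a{}b{}/**/c{}
```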

what is the regex expression to identify comments (i.e. between /* and */ across multiple lines)

(?:/\*(?:(?:[^*]|\*(?!/))*)\*/)

This was originally part of a MySQL parser, designed to strip comments without touching comment-like text inside strings:

("(?:(?:(?:\\.)|[^"\\\r\n])*)"|'(?:(?:(?:\\.)|[^'\\\r\n])*)'|`(?:(?:(?:\\.)|[^`\\\r\n])*)`)|((?:-- .*)|(?:#.*)|(?:/\*(?:(?:[^*]|\*(?!/))*)\*/))

Each match is replaced with capture group 1, which puts the strings back; comments do not set group 1, so they are removed.
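A JavaScript port for experimentation, with the redundant non-capturing groups removed; stripSqlComments is a hypothetical helper and the query is made up. In JavaScript, $1 for a group that did not participate is replaced with the empty string, so comments disappear while strings survive:

```javascript
// Group 1: a "...", '...' or `...` string (backslash escapes allowed).
// Group 2: a "-- " line comment, a "#" line comment, or a /* ... */ block.
const sqlComments =
  /("(?:\\.|[^"\\\r\n])*"|'(?:\\.|[^'\\\r\n])*'|`(?:\\.|[^`\\\r\n])*`)|(-- .*|#.*|\/\*(?:[^*]|\*(?!\/))*\*\/)/g;

// Hypothetical helper: keep strings (group 1), drop comments (group 1 unset).
const stripSqlComments = (sql) => sql.replace(sqlComments, "$1");

const query = "SELECT '/* not a comment */' AS a /* real */ FROM t -- trailing\n" +
  'WHERE b = "#x" # note';
console.log(stripSqlComments(query));
```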
