Parsing CSS by regex
That seems too convoluted for a single regular expression. I'm sure that with the right extensions an advanced user could write the right regex, but then you'd need an even more advanced user to debug it.
Instead, I'd suggest using a regex to pull out the pieces, and then tokenising each piece separately. e.g.,
/([^{]+)\s*\{\s*([^}]*?)\s*}/
Then you end up with the selector and the attributes in separate fields, and can split those up in turn. (Even the selector will be fun to parse.) Note that even this will hurt if a } can appear inside quotes or something similar. You could, again, convolute the heck out of the regex to avoid that, but it's probably better to avoid regexes altogether here and parse one field at a time, perhaps with a recursive-descent parser, yacc/bison, or whatever.
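The two-stage idea above can be sketched in Python as follows. This is only an illustration under simplifying assumptions: no nested blocks, no comments, and no braces, colons or semicolons inside quoted strings. The names RULE_RE and parse_rules are made up for this example.

```python
import re

# Stage 1: pull out "selector { body }" pieces. Assumes flat rules only.
RULE_RE = re.compile(r"([^{}]+?)\s*\{\s*([^}]*?)\s*\}")


def parse_rules(css):
    """Return a list of (selector, {property: value}) tuples."""
    rules = []
    for selector, body in RULE_RE.findall(css):
        declarations = {}
        # Stage 2: tokenise the body separately, one declaration at a time.
        for piece in body.split(";"):
            name, sep, value = piece.partition(":")
            if sep:  # skip empty pieces such as the one after a trailing ';'
                declarations[name.strip()] = value.strip()
        rules.append((selector.strip(), declarations))
    return rules


if __name__ == "__main__":
    print(parse_rules("h1, h2 { color: red; margin: 0 }\n.note{font-size:12px;}"))
```

Anything the assumptions rule out (quoted braces, @media blocks) is exactly where a real parser becomes the better choice.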
Javascript Regular Expression to Parse CSS
Don't try to match it with one complex regex. Instead, split using a simpler one.
For splitting, we can use String.split and pass the regex /[{}]/ into it. Then we use Array.map to trim each string so that only the content remains, and Array.filter to remove the empty strings that are left over.
var arr = str.split(/[{}]/).map(function (s) {
    return s.trim();
}).filter(String);
It also has the advantage of working on all CSS rules, not just classes, provided the CSS is valid.
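For comparison, here is a rough Python equivalent of the split-and-trim approach, with one extra step that pairs each selector with its body. It assumes flat, valid CSS with no empty rule bodies (an empty body would shift the pairing); split_rules is a hypothetical helper name.

```python
import re


def split_rules(css):
    """Split on braces, trim, drop empties, then pair selector with body."""
    parts = [piece.strip() for piece in re.split(r"[{}]", css)]
    parts = [piece for piece in parts if piece]
    # Even positions are selectors, odd positions are declaration blocks.
    return dict(zip(parts[0::2], parts[1::2]))


if __name__ == "__main__":
    print(split_rules(".a { color: red; }\n#b { margin: 0; }\n"))
```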
Parse CSS expression with regex
\s means whitespace, and \s* means zero or more occurrences of whitespace.
This is what you are looking for:
((:?\.*|#*)+[a-zA-Z0-9_-]+\s*{[^}]*})
Demo: https://regex101.com/r/lR4aW1/3
Regular expression to parse css
You can use pyparsing to parse such nested braces.
import pyparsing as pp
string = '@keyframes mymove {from {top: 0px;} to {top: 200px;}}'
pattern = pp.Regex(r'^.*?(?= \{)') + pp.original_text_for(pp.nested_expr('{', '}'))
selector, rules = pattern.parse_string(string)
# Tests
assert selector == '@keyframes mymove'
assert rules == '{from {top: 0px;} to {top: 200px;}}'
* pyparsing can be installed with pip install pyparsing
See also this post: Python: How to match nested parentheses with regex?
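If adding a dependency is not an option, the same split can be approximated with a plain depth counter from the standard library. This is a minimal sketch that assumes braces never appear inside strings or comments; split_top_level_block is a name invented for this example.

```python
def split_top_level_block(css):
    """Return (prelude, block) for the first balanced {...} block in css."""
    start = css.index("{")  # raises ValueError if there is no block at all
    depth = 0
    for i in range(start, len(css)):
        ch = css[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Found the brace that closes the outermost block.
                return css[:start].strip(), css[start:i + 1]
    raise ValueError("unbalanced braces")


if __name__ == "__main__":
    text = "@keyframes mymove {from {top: 0px;} to {top: 200px;}}"
    print(split_top_level_block(text))
```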
Regex to parse CSS selector
Thanks all very much for your suggestions and help. I tied it all together into the following two Regex Patterns:
This one parses the CSS selector string (e.g. div#myid.myclass[attr=1,fred=3]) http://www.rubular.com/r/2L0N5iWPEJ
cssSelector = re.compile(r'^(?P<type>[\*|\w|\-]+)?(?P<id>#[\w|\-]+)?(?P<classes>\.[\w|\-|\.]+)*(?P<data>\[.+\])*$')
>>> cssSelector.match("table#john.test.test2[hello]").groups()
('table', '#john', '.test.test2', '[hello]')
>>> cssSelector.match("table").groups()
('table', None, None, None)
>>> cssSelector.match("table#john").groups()
('table', '#john', None, None)
>>> cssSelector.match("table.test.test2[hello]").groups()
('table', None, '.test.test2', '[hello]')
>>> cssSelector.match("table#john.test.test2").groups()
('table', '#john', '.test.test2', None)
>>> cssSelector.match("*#john.test.test2[hello]").groups()
('*', '#john', '.test.test2', '[hello]')
>>> cssSelector.match("*").groups()
('*', None, None, None)
And this one does the attributes (e.g. [link,key~=value]) http://www.rubular.com/r/2L0N5iWPEJ:
attribSelector = re.compile(r'(?P<word>\w+)\s*(?P<operator>[^\w\,]{0,2})\s*(?P<value>\w+)?\s*[\,|\]]')
>>> a = attribSelector.findall("[link, ds9 != test, bsdfsdf]")
>>> for x in a: print(x)
('link', '', '')
('ds9', '!=', 'test')
('bsdfsdf', '', '')
A couple of things to note:
1) This parses attributes using comma delimitation (since I am not using strict CSS).
2) This requires that patterns take the form: tag, id, classes, attributes, in that order.
The first regex works on single tokens, that is, the parts of a selector string separated by whitespace or '>'. This is because I wanted to use it to check against my own object graph :)
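As an illustration of how the named groups might be consumed, the sketch below wraps the selector pattern in a small function that returns a dictionary and splits the class list. The pattern is copied from this answer; the function name parse_simple_selector and the shape of the returned dictionary are assumptions made for the example.

```python
import re

# Selector pattern copied from the answer; it handles one token at a time.
CSS_SELECTOR = re.compile(
    r'^(?P<type>[\*|\w|\-]+)?(?P<id>#[\w|\-]+)?'
    r'(?P<classes>\.[\w|\-|\.]+)*(?P<data>\[.+\])*$'
)


def parse_simple_selector(token):
    """Break a single selector token into type, id, classes and raw attributes."""
    match = CSS_SELECTOR.match(token)
    if match is None:
        raise ValueError("not a single selector token: %r" % token)
    parts = match.groupdict()
    return {
        "type": parts["type"],
        # Drop the leading '#' from the id, if there is one.
        "id": parts["id"][1:] if parts["id"] else None,
        # '.test.test2' becomes ['test', 'test2'].
        "classes": parts["classes"].strip(".").split(".") if parts["classes"] else [],
        "data": parts["data"],
    }


if __name__ == "__main__":
    print(parse_simple_selector("table#john.test.test2[hello]"))
```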
Thanks again!
Parsing css background url and selector using regex
update
After a closer look, I offer 2 solutions that mitigate the backtracking issues to a relative degree.
Before looking at them, I want to point out that there are only a very few delimiters associated with CSS syntax.
Moreover, CSS syntax is defined more by the order and content of the allowed characters.
The cure for backtracking is to restrict the regex engine to fewer allowable
characters to match, and to do so at strategic positions.
If you look at the CSS specification here -> https://www.w3.org/TR/CSS21/syndata.html
you'll notice that its tokens are defined by regular expressions.
That suggests CSS parsers are largely built from small pieces of regex.
However, while it would be an interesting exercise to put it all into one
all-encompassing regex, I will decline that challenge, because there is
nothing in it for me.
Instead, I offer these 2 regexes tailored to your request.
First one:
- Matches only the first url() block within the <style> element
<style[^>]*?>(?:[^{}:]*{[^{}]*?:[^{}()]*?})*?(?:([^{}:]*){[^{}]*?:\s*url\s*\(\s*([^{}()]*?)\s*\)\s*})
see -> https://regex101.com/r/2SNIks/1
Second one:
- Matches all the url() blocks within the <style> element
(?:<style[^>]*?>|(?!^)\G)(?:(?:(?!</style)[^{}:])*{[^{}]*?:[^{}()]*?})*?(?:([^{}:]*){[^{}]*?:\s*url\s*\(\s*([^{}()]*?)\s*\)\s*})
see -> https://regex101.com/r/d8q6LH/1
For both regexes:
- The selector is in group 1
- The url is in group 2
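Python's built-in re module has no \G anchor, so the second regex cannot be used there as written. A rough equivalent is to do the work in passes, keeping each pattern restricted to a small set of allowed characters as described above. The sketch below assumes flat rules (no @media nesting) and no parentheses inside the URL; all names in it are invented for this example.

```python
import re

# Pass 1: the content of each <style> element.
STYLE_RE = re.compile(r"<style[^>]*>(.*?)</style\s*>", re.S | re.I)
# Pass 2: flat "selector { body }" rules; negated classes keep matching linear.
RULE_RE = re.compile(r"([^{}]*)\{([^{}]*)\}")
# Pass 3: url(...) inside a rule body.
URL_RE = re.compile(r"url\s*\(\s*([^()]*?)\s*\)")


def find_background_urls(html):
    """Return (selector, url) pairs for every url() found in <style> blocks."""
    results = []
    for style in STYLE_RE.finditer(html):
        for rule in RULE_RE.finditer(style.group(1)):
            selector = rule.group(1).strip()
            for url in URL_RE.finditer(rule.group(2)):
                # Strip optional quotes around the URL.
                results.append((selector, url.group(1).strip("'\"")))
    return results


if __name__ == "__main__":
    page = "<style>.a { color: red; } .b { background: url( 'img/x.png' ) no-repeat; }</style>"
    print(find_background_urls(page))
```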
regex to parse CSS from HTML string fails when child combinator is used
The safer regex is this
/(?:<(style)(?:\s+(?=((?:"[\S\s]*?"|'[\S\s]*?'|(?:(?!\/>)[^>])?)+))\2)?\s*>)([\S\s]*?)<\/\1\s*>/
https://regex101.com/r/sx2YPf/1
I recommend using this one. The content is in group 3.
If you want to match all invisible content, replace style in the regex with script|style|object|embed|applet|noframes|noscript|noembed
Expanded for readability:
(?:
     <
     ( style )                  # (1), Invisible content; end tag req'd
     (?:
          \s+
          (?=
               (                          # (2 start)
                    (?:
                         " [\S\s]*? "
                      |  ' [\S\s]*? '
                      |  (?:
                              (?! /> )
                              [^>]
                         )?
                    )+
               )                          # (2 end)
          )
          \2
     )?
     \s* >
)
( [\S\s]*? )               # (3)
</ \1 \s* >
If anybody is curious, the lookahead assertion that matches the rest of the
style tag's attributes and values not only does that validation,
but also ensures the style tag is not self-closing (even if only by a typo).
The contents of the assertion are immune to backtracking: they are
captured inside the assertion and consumed just past it by the backreference \2,
which sits in a backtracking environment but by then is just a literal.
In a non-JS environment like PHP, the same thing is accomplished by substituting
an atomic group (?>..)
for the assertion.
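The lookahead-plus-backreference trick is easier to see on a toy pattern. The sketch below is not the style-tag regex itself; it uses digits to show that once the lookahead has captured its text, the backreference cannot give any of it back.

```python
import re

# Plain greedy quantifier: \d+ can give back one digit so the final \d matches.
backtracking = re.compile(r"\d+\d")

# Emulated atomic group: the lookahead captures the longest digit run,
# then \1 consumes exactly that text as a literal, so nothing is given back.
atomic_like = re.compile(r"(?=(\d+))\1\d")

# Same trick followed by something that really does come after the digits.
atomic_then_x = re.compile(r"(?=(\d+))\1x")

if __name__ == "__main__":
    print(backtracking.match("123") is not None)     # True
    print(atomic_like.match("123") is not None)      # False
    print(atomic_then_x.match("123x") is not None)   # True
```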
Parse inline CSS values with Regex?
Another way, using a regex:
$css = "color:#777;font-size:16px;font-weight:bold;left:214px;position:relative;top: 70px";
$results = array();
preg_match_all("/([\w-]+)\s*:\s*([^;]+)\s*;?/", $css, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    $results[$match[1]] = $match[2];
}
print_r($results);
Outputs:
Array
(
    [color] => #777
    [font-size] => 16px
    [font-weight] => bold
    [left] => 214px
    [position] => relative
    [top] => 70px
)
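The same idea carries over to Python almost unchanged. In this sketch, re.findall returns (property, value) pairs that go straight into a dictionary. It assumes values never contain a semicolon (so no data: URIs), and parse_inline_style is a name chosen for this example.

```python
import re

# Property name, a colon, then everything up to the next semicolon.
DECLARATION_RE = re.compile(r"([\w-]+)\s*:\s*([^;]+)")


def parse_inline_style(style):
    """Turn 'a:b;c:d' into {'a': 'b', 'c': 'd'}."""
    return {name: value.strip() for name, value in DECLARATION_RE.findall(style)}


if __name__ == "__main__":
    css = "color:#777;font-size:16px;font-weight:bold;left:214px;position:relative;top: 70px"
    print(parse_inline_style(css))
```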