What Are the Rules Around Whitespace in Attribute Selectors

The rules on whitespace in attribute selectors are stated in the grammar. Here's the Selectors 3 production for attribute selectors (some tokens substituted with their string equivalents for illustration; S* represents 0 or more whitespace characters):

attrib
: '[' S* [ namespace_prefix ]? IDENT S*
[ [ '^=' |
'$=' |
'*=' |
'=' |
'~=' |
'|=' ] S* [ IDENT | STRING ] S*
]? ']'
;

Of course, the grammar isn't terribly useful to someone looking to understand how to write attribute selectors, as it's intended for someone who's implementing a selector engine.

Here's a plain-English explanation:

Whitespace before the attribute selector

This isn't covered in the above production, but the first obvious rule is that if you're attaching an attribute selector to another simple selector or a pseudo-element, don't use a space:

a[href]::after

If you do, the space is treated as a descendant combinator instead, with the universal selector implied on the attribute selector and anything that may follow it. In other words, these selectors are equivalent to each other, but different from the above:

a [href] ::after
a *[href] *::after

Whitespace inside the attribute selector

Whether you have any whitespace within the brackets and around the comparison operator doesn't matter; I find that browsers seem to treat them as if they weren't there (but I haven't tested extensively). These are all valid according to the grammar and, as far as I've seen, work in all modern browsers:

a[href]
a[ href ]
a[ href="http://stackoverflow.com" ]
a[href ^= "http://"]
a[ href ^= "http://" ]

Whitespace is not allowed between the ^ (or other symbol) and = as these are treated as a single token, and tokens cannot be broken apart.

If IE7 and IE8 implement the grammar correctly, they should be able to handle them all as well.

If a namespace prefix is used, whitespace is not allowed between the prefix and the attribute name.

These are incorrect:

unit[sh| quantity]
unit[ sh| quantity="200" ]
unit[sh| quantity = "200"]

These are correct:

unit[sh|quantity]
unit[ sh|quantity="200" ]
unit[sh|quantity = "200"]
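
The rules above can be checked mechanically. Here's a minimal Python sketch that transcribes the Selectors 3 attrib production into a regular expression. It assumes simplified IDENT and STRING tokens (ASCII only, no escapes), and the name is_valid_attrib is made up for illustration, so treat it as an approximation rather than a conforming parser:

```python
import re

# Simplified tokens (assumption: ASCII only, no CSS escapes).
S = r"[ \t\r\n\f]*"                       # S*: zero or more whitespace characters
IDENT = r"-?[A-Za-z_][A-Za-z0-9_-]*"      # unquoted identifier
STRING = r'"[^"\n]*"|' + r"'[^'\n]*'"     # double- or single-quoted string
OPERATOR = r"\^=|\$=|\*=|=|~=|\|="        # each operator is a single token

# attrib: '[' S* [ namespace_prefix ]? IDENT S*
#         [ operator S* [ IDENT | STRING ] S* ]? ']'
ATTRIB = re.compile(
    rf"\[{S}(?:(?:{IDENT}|\*)?\|)?{IDENT}{S}"
    rf"(?:(?:{OPERATOR}){S}(?:{IDENT}|{STRING}){S})?\]"
)


def is_valid_attrib(attrib: str) -> bool:
    """Return True if the bracketed text matches the simplified production."""
    return ATTRIB.fullmatch(attrib) is not None


for candidate in ['[ href ^= "http://" ]', '[href ^ = "http://"]',
                  '[sh|quantity = "200"]', '[sh| quantity]']:
    print(candidate, is_valid_attrib(candidate))
```

There is no S* between the namespace prefix and the attribute name, and none inside an operator, which is why whitespace is rejected in exactly those two places.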

Whitespace within the attribute value

But notice the quotes around the attribute values above: if you leave them out and try to select an element whose attribute value contains spaces, you have a syntax error.

This is incorrect:

div[class=one two]

This is correct:

div[class="one two"]

This is because an unquoted attribute value is treated as an identifier, which doesn't include whitespace (for obvious reasons), whereas a quoted value is treated as a string. See this spec for more details.

To prevent such errors, I strongly recommend always quoting attribute values, whether in HTML, XHTML (required), XML (required), CSS or jQuery (once required).
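
If you generate selectors from code, the identifier-versus-string distinction is easy to test for. This is a rough Python sketch, assuming a simplified ASCII-only definition of a CSS identifier (no escapes); needs_quotes and attribute_selector are hypothetical helper names, not part of any library:

```python
import re

# Simplified CSS identifier (assumption: ASCII only, no escapes).
IDENT = re.compile(r"-?[A-Za-z_][A-Za-z0-9_-]*")


def needs_quotes(value: str) -> bool:
    """Return True if the value cannot be written as an unquoted identifier."""
    return IDENT.fullmatch(value) is None


def attribute_selector(name: str, value: str) -> str:
    """Build [name="value"], always quoting the value.

    Only backslashes and double quotes are escaped here; values that
    contain newlines are not handled by this sketch.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'[{name}="{escaped}"]'


print(needs_quotes("one two"))                   # True: contains a space
print(needs_quotes("200"))                       # True: starts with a digit
print(attribute_selector("class", "one two"))    # [class="one two"]
```

Always quoting means the second helper never has to call the first, which is the practical upside of the recommendation.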

Whitespace after the attribute value

As of Selectors 4 (following the original publication of this answer), attribute selectors can accept flags in the form of an identifier appearing after the attribute value. Two flags have been defined pertaining to character case, one for case-insensitive matching:

div[data-foo="bar" i]

And one for case-sensitive matching (whose addition I had a part in, albeit by proxy of the WHATWG):

ol[type="A" s]
ol[type="a" s]

The grammar has been updated thus:

attrib
: '[' S* attrib_name ']'
| '[' S* attrib_name attrib_match [ IDENT | STRING ] S* attrib_flags? ']'
;

attrib_name
: wqname_prefix? IDENT S*

attrib_match
: [ '=' |
PREFIX-MATCH |
SUFFIX-MATCH |
SUBSTRING-MATCH |
INCLUDE-MATCH |
DASH-MATCH
] S*

attrib_flags
: IDENT S*

In plain English: if the attribute value is not quoted (i.e. it is an identifier), whitespace between it and attrib_flags is required; otherwise, if the attribute value is quoted then whitespace is optional, but strongly recommended for the sake of readability. Whitespace between attrib_flags and the closing bracket is optional as always.
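
To see why the whitespace is required after an unquoted value, here's a small Python sketch that splits an attribute selector into its value and flag. It assumes the same simplified tokens as before (ASCII identifiers, strings without escapes) and ignores namespace prefixes; value_and_flag is a name I made up for illustration:

```python
import re

# Simplified tokens (assumption: ASCII only, no CSS escapes, no namespaces).
S = r"[ \t\r\n\f]*"
IDENT = r"-?[A-Za-z_][A-Za-z0-9_-]*"
STRING = r'"[^"\n]*"|' + r"'[^'\n]*'"
OPERATOR = r"\^=|\$=|\*=|=|~=|\|="

FLAGGED_ATTRIB = re.compile(
    rf"\[{S}{IDENT}{S}(?:{OPERATOR}){S}"
    rf"(?P<value>{IDENT}|{STRING}){S}(?:(?P<flag>{IDENT}){S})?\]"
)


def value_and_flag(attrib: str):
    """Return (value, flag) for a valued attribute selector, or None."""
    match = FLAGGED_ATTRIB.fullmatch(attrib)
    if match is None:
        return None
    return match.group("value"), match.group("flag")


print(value_and_flag('[data-foo="bar" i]'))   # ('"bar"', 'i')
print(value_and_flag('[data-foo="bar"i]'))    # ('"bar"', 'i')
print(value_and_flag('[data-foo=bar i]'))     # ('bar', 'i')
print(value_and_flag('[data-foo=bari]'))      # ('bari', None)
```

Without the space, the flag is simply absorbed into the identifier, so the selector stays valid but matches a different value.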

Whitespace in CSS selectors

All of your conclusions are correct. There are nuances with regard to whitespace in attribute selectors, covered in my answer to this question.

All the exact rules of where whitespace may or may not appear are covered in the grammar. For the purposes of the grammar, the "contextual characters (such as + and >)" that you refer to are officially known as combinators. (The term "contextual selector" was first used in CSS1 but hasn't appeared since.)

Remember in addition that any number of contiguous whitespace characters that separate two simple selectors is considered a descendant combinator, which is in fact one reason why whitespace isn't "allowed" around the delimiters for pseudo-elements, pseudo-classes, attribute selectors, class selectors and ID selectors — because it has significance and therefore its presence alters the meaning of the selector.

White space and selectors

For these cases I prefer using CSS selectors because of their minimalistic syntax:

response.css("p.text-nowrap.hidden-xs::text")

Also, Google Chrome's developer tools display CSS selectors when you inspect HTML, which makes scraper development much easier.

What's the difference between the CSS child selector without spaces (a>b) and with spaces (a > b)?

CSS is very forgiving. The CSS selectors specification mentions that whitespace around combinators (like your > here) is optional:

The following selector represents a p element that is child of body:

body > p

The following example combines descendant combinators and child combinators.

div ol>li p

It represents a p element that is a descendant of an li element; the li element must be the child of an ol element; the ol element must be a descendant of a div. Notice that the optional white space around the ">" combinator has been left out.

— Section 8.2 of the CSS Selectors Level 3 recommendation

To further back this up, the specification's Grammar section makes this really apparent with an implementation approach:

combinator
/* combinators can be surrounded by whitespace */
: S+ | S* [ '>' | '+' | '~' | COLUMN | '/' IDENT '/' ] S*
;

— Section 10 of the CSS Selectors Level 3 recommendation

For this reason, the following are all valid as CSS parsers should simply strip the spaces out:

a>b {}
a > b {}
a> b {}
a >b {}
a     >     b {}

So to answer your question: no, there is no difference.

As for which one you should use, however: that's purely a question of personal preference. For me, I'd opt for a > b, simply because I feel it makes it easier to read, but if you want to type a>b or even a     >     b it's entirely up to you - although anyone who has to read your code will probably not be your number 1 fan with the latter approach!
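
If you want to convince yourself that the variants above are equivalent, here's a naive Python sketch that rewrites the whitespace around >, + and ~ into one canonical form. It assumes plain selectors only: it would mangle attribute selectors such as [rel~="x"] and functional pseudo-classes such as :nth-child(2n+1), and normalize_combinators is a hypothetical helper, not how a real CSS parser works:

```python
import re


def normalize_combinators(selector: str) -> str:
    """Put exactly one space on each side of >, + and ~ (naive sketch)."""
    spaced = re.sub(r"\s*([>+~])\s*", r" \1 ", selector.strip())
    # Collapse the remaining whitespace runs (descendant combinators).
    return re.sub(r"\s+", " ", spaced)


variants = ["a>b", "a > b", "a> b", "a >b", "a     >     b"]
print({normalize_combinators(v) for v in variants})   # {'a > b'}
print(normalize_combinators("div ol>li p"))           # div ol > li p
```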

How to select classes with spaces

As Zepplock says, that's actually two classes in a single attribute: boolean and optional. The space is not part of a class name; it acts as the separator.

These three selectors will all match it:

.boolean
.optional
.boolean.optional

The last selector only picks up this element as it has both classes.

You never include a space when chaining class selectors, not even like this:

.boolean .optional

As that selects .optional elements that are contained within separate .boolean elements.
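
To make the "two classes, one attribute" point concrete, here's a minimal Python sketch of how a chained class selector is matched against a class attribute. The helpers class_tokens and matches_classes are made-up names, and the split on ASCII whitespace is my reading of how HTML defines space-separated tokens:

```python
import re


def class_tokens(class_attr: str) -> list:
    """Split a class attribute value into its space-separated tokens."""
    return re.findall(r"[^ \t\n\f\r]+", class_attr)


def matches_classes(class_attr: str, *class_names: str) -> bool:
    """Emulate .a.b: every listed class must be among the tokens."""
    return set(class_names) <= set(class_tokens(class_attr))


attr = "boolean optional"
print(class_tokens(attr))                             # ['boolean', 'optional']
print(matches_classes(attr, "boolean", "optional"))   # True, like .boolean.optional
print(matches_classes(attr, "boolean optional"))      # False: no such single class
```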

CSS selector for empty or whitespace

Lots of people are missing the point of this question, which I've addressed in the following exposition, but for those just looking for the answer, I'm mirroring the last paragraph here:

Selectors 4 now redefines :empty to include elements that contain only whitespace. This was originally proposed as a separate pseudo-class :blank but was recently retconned into :empty after it was determined that it was safe to do so without too many sites depending on the original behavior. Browsers will need to update their implementations of :empty in order to conform to Selectors 4. If you need to support older browsers, you will have to go through the hassle of marking elements containing only whitespace or pruning the whitespace before or after the fact.
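
The difference between the two definitions can be sketched in a few lines of Python. This uses the standard library's XML parser as a stand-in for a DOM, so the markup needs explicit end tags and comments are dropped by the parser. The function names, and the exact set of characters treated as whitespace, are my assumptions rather than anything taken from a browser implementation:

```python
import xml.etree.ElementTree as ET

CSS_WHITESPACE = " \t\n\r\f"  # assumption: characters treated as whitespace


def is_empty_level3(element) -> bool:
    """Selectors 3 reading: no child elements and no text at all."""
    return len(element) == 0 and not element.text


def is_empty_level4(element) -> bool:
    """Selectors 4 reading: no child elements, text may be whitespace only."""
    return len(element) == 0 and not (element.text or "").strip(CSS_WHITESPACE)


# Two list items: one holding a line break plus indentation, one truly empty.
whitespace_only, truly_empty = ET.fromstring("<ul><li>\n  </li><li></li></ul>")

print(is_empty_level3(whitespace_only), is_empty_level4(whitespace_only))  # False True
print(is_empty_level3(truly_empty), is_empty_level4(truly_empty))          # True True
```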


While the question depicts a <p> element containing a handful of regular space characters, which seems like an oversight, it is far more common to see markup where elements contain only whitespace in the form of indentation and blank lines, such as:

<ul class="items">
  <li class="item">
    <div>
      <!-- Some complex structure of elements -->
    </div>
  </li>
  <li class="item">
  </li> <!-- Empty, except for a single line break and
             indentation preceding the end tag -->
</ul>

Some elements, like <li> in the above example as well as <p>, have optional end tags, which can also cause unintended side effects in DOM processing in the presence of inter-element whitespace. For example, the following two <ul> elements don't produce equivalent node trees; in particular, the first one does not result in a li:empty in Selectors level 3:

li:empty::before { content: '(empty)'; font-style: italic; color: #999; }

<ul>
  <li>
</ul>
<ul>
  <li></li>
</ul>

Is it possible to use the space character in CSS class names?

https://html.spec.whatwg.org/multipage/dom.html#classes

When specified on HTML elements, the class attribute must have a value that is a set of space-separated tokens representing the various classes that the element belongs to.

Of course it’s possible to escape a space character, e.g. with a character reference such as &#x20;, since HTML attribute values can contain character references per https://html.spec.whatwg.org/multipage/syntax.html#syntax-attribute-value:

Attribute values are a mixture of text and character references, except with the additional restriction that the text cannot contain an ambiguous ampersand.

However, it doesn’t matter whether the space is HTML-escaped or not for the above statement to apply. An HTML-encoded space character still decodes to a space character, and so e.g. class="a&#x20;b" still results in “a set of space-separated tokens”. See e.g. https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(double-quoted)-state for how double-quoted attribute values are parsed.
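
This is easy to verify outside a browser. Here's a small Python sketch using the standard library's html.unescape to decode character references before splitting the value into class tokens; class_tokens_from_source is a hypothetical helper, and decoding the value in one go is a simplification of what an HTML parser does while tokenizing:

```python
import re
from html import unescape


def class_tokens_from_source(raw_value: str) -> list:
    """Decode character references, then split on ASCII whitespace."""
    decoded = unescape(raw_value)
    return re.findall(r"[^ \t\n\f\r]+", decoded)


print(class_tokens_from_source("a b"))        # ['a', 'b']
print(class_tokens_from_source("a&#x20;b"))   # ['a', 'b']
print(class_tokens_from_source("a&#32;b"))    # ['a', 'b']
```

Escaped or not, the element ends up with the two classes a and b, never a single class containing a space.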

What is the difference between '>' and a space in CSS selectors?

A > B will only select B elements that are direct children of A (that is, there are no other elements in between).

A B will select any B that are inside A, even if there are other elements between them.
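
The same child-versus-descendant distinction can be tried out in Python with the standard library's ElementTree. Note that this uses its XPath-like paths as an analogy, not CSS: "B" plays the role of A > B and ".//B" the role of A B. The element names and id values are made up for the example:

```python
import xml.etree.ElementTree as ET

# <A> has one direct <B> child and one <B> nested inside <C>.
root = ET.fromstring('<A><B id="direct"/><C><B id="nested"/></C></A>')

# Analogous to "A > B": only direct children of A.
children = [b.get("id") for b in root.findall("B")]

# Analogous to "A B": any B inside A, at any depth.
descendants = [b.get("id") for b in root.findall(".//B")]

print(children)      # ['direct']
print(descendants)   # ['direct', 'nested']
```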


