What characters are allowed in DOM IDs?

Actually there is a difference between HTML and XHTML.
As XHTML is XML, the rules for XML IDs apply:

Values of type ID MUST match the Name production.

NameStartChar ::=   ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
[#xD8-#xF6] | [#xF8-#x2FF] |
[#x370-#x37D] | [#x37F-#x1FFF] |
[#x200C-#x200D] | [#x2070-#x218F] |
[#x2C00-#x2FEF] | [#x3001-#xD7FF] |
[#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
[#x10000-#xEFFFF]

NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 |
[#x0300-#x036F] | [#x203F-#x2040]

Source: Extensible Markup Language (XML) 1.0 (Fifth Edition) 2.3
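As a rough illustration, the two productions above can be transcribed into a JavaScript regular expression. The function name isXmlName is hypothetical, and the sketch assumes the supplementary range [#x10000-#xEFFFF] that closes the NameStartChar production in the specification. It checks only the Name syntax, not uniqueness of the ID.

```javascript
// NameStartChar ranges from XML 1.0 (Fifth Edition), section 2.3.
const nameStartChar =
  ":A-Z_a-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02FF" +
  "\\u0370-\\u037D\\u037F-\\u1FFF\\u200C-\\u200D\\u2070-\\u218F" +
  "\\u2C00-\\u2FEF\\u3001-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFFD" +
  "\\u{10000}-\\u{EFFFF}";

// NameChar adds "-", ".", digits, U+00B7 and two further ranges.
const nameChar =
  nameStartChar + "\\-.0-9\\u00B7\\u0300-\\u036F\\u203F-\\u2040";

// The "u" flag is required for \u{...} escapes and code points above U+FFFF.
const xmlName = new RegExp(`^[${nameStartChar}][${nameChar}]*$`, "u");

// Hypothetical helper: true if value matches the XML Name production.
function isXmlName(value) {
  return xmlName.test(value);
}
```

Under these assumptions, isXmlName("first.name") and isXmlName("_x") return true, while isXmlName("1abc") and isXmlName("a b") return false.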

For HTML the following applies:

id = name [CS]

This attribute assigns a name to an element. This name must be unique in a document.

ID and NAME tokens must begin with a
letter ([A-Za-z]) and may be followed
by any number of letters, digits
([0-9]), hyphens ("-"), underscores
("_"), colons (":"), and periods
(".").

Source: HTML 4 Specification, Chapter 6, ID Token

What are legal characters for an HTML element id?

In HTML5, the only restrictions are that the ID must be unique within the document, contain at least one character and contain no spaces. See http://www.w3.org/TR/2014/REC-html5-20141028/dom.html#the-id-attribute
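The two syntactic HTML5 restrictions can be sketched as a small check. The function name isValidHtml5Id is hypothetical, and the sketch assumes "spaces" means the HTML space characters (space, tab, line feed, form feed, carriage return). Uniqueness within the document is not checked here.

```javascript
// HTML space characters: space, tab, LF, FF, CR.
const HTML_SPACE = /[ \t\n\f\r]/;

// Hypothetical helper: at least one character and no space characters.
function isValidHtml5Id(value) {
  return value.length > 0 && !HTML_SPACE.test(value);
}
```

With this check, values such as "99%" or "first.name" pass, while "" and "first name" do not.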

As other answers have pointed out, HTML 4 is more restrictive and specifies that

ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

What are valid values for the id attribute in HTML?

For HTML 4, the answer is technically:

ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

HTML 5 is even more permissive, saying only that an id must contain at least one character and may not contain any space characters.

The id attribute is case sensitive in XHTML.

As a purely practical matter, you may want to avoid certain characters. Periods, colons and '#' have special meaning in CSS selectors, so you will have to escape those characters using a backslash in CSS or a double backslash in a selector string passed to jQuery. Think about how often you will have to escape a character in your stylesheets or code before you go crazy with periods and colons in ids.

For example, the HTML declaration <div id="first.name"></div> is valid. You can select that element in CSS as #first\.name and in jQuery like so: $('#first\\.name'). But if you forget the backslash, $('#first.name'), you will have a perfectly valid selector looking for an element with id first and also having class name. This is a bug that is easy to overlook. You might be happier in the long run choosing the id first-name (a hyphen rather than a period), instead.

You can simplify your development tasks by sticking strictly to a naming convention. For example, if you limit yourself entirely to lower-case characters and always separate words with either hyphens or underscores (but not both; pick one and never use the other), then you have an easy-to-remember pattern. You will never wonder "was it firstName or FirstName?" because you will always know that you should type first_name. Prefer camel case? Then limit yourself to that, with no hyphens or underscores, and consistently use either upper-case or lower-case for the first character; don't mix them.
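A convention like this is easy to enforce mechanically. The sketch below assumes the hyphen variant (lower-case words separated by single hyphens); the name followsConvention is hypothetical, and swapping the hyphen for an underscore in the pattern gives the other variant.

```javascript
// Lower-case letters and digits, starting with a letter,
// with words separated by single hyphens.
const KEBAB_ID = /^[a-z][a-z0-9]*(?:-[a-z0-9]+)*$/;

// Hypothetical helper: true if id follows the assumed convention.
function followsConvention(id) {
  return KEBAB_ID.test(id);
}
```

Running such a check in a linter or test suite would flag ids like firstName or first.name before they reach a stylesheet.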

A now very obscure problem was that at least one browser, Netscape 6, incorrectly treated id attribute values as case-sensitive. That meant that if you had typed id="firstName" in your HTML (lower-case 'f') and #FirstName { color: red } in your CSS (upper-case 'F'), that buggy browser would have failed to set the element's color to red. At the time of this edit, April 2015, I hope you aren't being asked to support Netscape 6. Consider this a historical footnote.

Javascript regex to remove illegal characters from DOM ID

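var str = "99% of People are not the 1%";
// Strip leading non-letters, then every character outside [A-Za-z0-9_:.-].
str = str.replace(/^[^a-z]+|[^\w:.-]+/gi, "");
// str is now "ofPeoplearenotthe1"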

Can a DOM element have an ID that contains a space?

According to the HTML 4.0 specification for basic types:

ID and NAME tokens must begin with a
letter ([A-Za-z]) and may be followed
by any number of letters, digits
([0-9]), hyphens ("-"), underscores
("_"), colons (":"), and periods
(".").

And even if spaces were valid, an id attribute with spaces would be interpreted by jQuery as an ancestor descendant selector with the current selector syntax.

Allowed HTML 4.01 id values regex

You can use this regex:

^[a-zA-Z][\w:.-]*$

^ depicts the start of string

[a-zA-Z] matches an uppercase or lowercase letter

* matches the preceding element zero or more times

\w is similar to [a-zA-Z\d_]

$ is the end of string
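Putting those pieces together, the HTML 4 rule (a letter followed by any number of letters, digits, hyphens, underscores, colons and periods) can be sketched as follows. The exact pattern is an assumption assembled from the parts described above, and the name isValidHtml4Id is hypothetical.

```javascript
// A letter first, then any number of word characters, ":", "." or "-".
const HTML4_ID = /^[a-zA-Z][\w:.-]*$/;

// Hypothetical helper: true if value is a syntactically valid HTML 4 ID token.
function isValidHtml4Id(value) {
  return HTML4_ID.test(value);
}
```

Because * allows zero repetitions, a single letter such as "a" is accepted, while "1abc" and "first name" are rejected.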

Allowed characters for CSS identifiers

The charset doesn't matter; the allowed characters matter more. Check the CSS specification. Here is the relevant passage:

In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A0 and higher, plus the hyphen (-) and the underscore (_); they cannot start with a digit, two hyphens, or a hyphen followed by a digit. Identifiers can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier "B&W?" may be written as "B\&W\?" or "B\26 W\3F".

Update: As to the regex question, you can find the grammar here:

ident      -?{nmstart}{nmchar}*

which consists of these parts:

nmstart    [_a-z]|{nonascii}|{escape}
nmchar [_a-z0-9-]|{nonascii}|{escape}
nonascii [\240-\377]
escape {unicode}|\\[^\r\n\f0-9a-f]
unicode \\{h}{1,6}(\r\n|[ \t\r\n\f])?
h [0-9a-f]

This can be translated to a Java regex as follows (I added parentheses to the parts containing the OR, escaped the backslashes, and wrote the octal escapes in the \0nnn form that Java requires):

// Build the "ident" pattern bottom-up from the grammar parts above.
String h = "[0-9a-f]";
String unicode = "\\\\{h}{1,6}(\\r\\n|[ \\t\\r\\n\\f])?".replace("{h}", h);
String escape = "({unicode}|\\\\[^\\r\\n\\f0-9a-f])".replace("{unicode}", unicode);
// Java octal escapes need a leading zero: \0240-\0377 is U+00A0-U+00FF.
String nonascii = "[\\0240-\\0377]";
String nmchar = "([_a-z0-9-]|{nonascii}|{escape})".replace("{nonascii}", nonascii).replace("{escape}", escape);
String nmstart = "([_a-z]|{nonascii}|{escape})".replace("{nonascii}", nonascii).replace("{escape}", escape);
String ident = "-?{nmstart}{nmchar}*".replace("{nmstart}", nmstart).replace("{nmchar}", nmchar);

// The CSS grammar is case-insensitive, so compile with Pattern.CASE_INSENSITIVE.
System.out.println(ident); // The full regex.

Update 2: oh, you're more of a PHP'er. Well, I think you can figure out how and where to use str_replace.
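For readers working in JavaScript, the same grammar can be assembled in the same bottom-up way. This is only a sketch: the variable names are hypothetical, nonascii is written as \u00a0-\u00ff to mirror the [\240-\377] range of the grammar, and the pattern is anchored so that it tests a whole string.

```javascript
// Grammar parts, written as regex source strings.
const h = "[0-9a-f]";
const unicode = `\\\\${h}{1,6}(\\r\\n|[ \\t\\r\\n\\f])?`;
const escapeSeq = `(${unicode}|\\\\[^\\r\\n\\f0-9a-f])`;
const nonascii = "[\\u00a0-\\u00ff]";
const nmchar = `([_a-z0-9-]|${nonascii}|${escapeSeq})`;
const nmstart = `([_a-z]|${nonascii}|${escapeSeq})`;

// ident: optional hyphen, one nmstart, any number of nmchar.
// The "i" flag reflects the case-insensitive CSS grammar.
const ident = new RegExp(`^-?${nmstart}${nmchar}*$`, "i");

// Hypothetical helper: true if text is a CSS identifier under this grammar.
function isCssIdent(text) {
  return ident.test(text);
}
```

With this sketch, isCssIdent("first-name") is true, isCssIdent("first.name") is false because the period is not escaped, and the escaped form from the quoted passage, B\&W\?, is accepted.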

Selecting elements with special characters in the ID

Try escaping it:


See the first paragraph of http://api.jquery.com/category/selectors/
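As an illustration of what escaping means here, the sketch below prefixes each meta-character listed on that jQuery page with a backslash. The function name escapeForSelector and the id first.name are only examples, and the sketch does not handle other cases such as an id that starts with a digit.

```javascript
// Meta-characters listed in the jQuery selectors documentation.
const META = /[!"#$%&'()*+,.\/:;<=>?@[\\\]^`{|}~]/g;

// Hypothetical helper: put a single backslash before each meta-character.
function escapeForSelector(id) {
  return id.replace(META, "\\$&");
}

// The resulting string is #first\.name, which can be passed to $(selector).
const selector = "#" + escapeForSelector("first.name");
```

When the selector is typed directly as a string literal, the backslash itself has to be doubled, which is why it appears as $('#first\\.name') in source code. Where available, jQuery 3's $.escapeSelector or the browser's CSS.escape do this job more thoroughly.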

What is a practical maximum length for HTML id?

Just tested: a 1M-character ID works in every modern browser: Chrome 1, Firefox 3, IE 7, Konqueror 3, Opera 9, Safari 3.

I suspect even longer IDs could become hard to remember.

