Where Is the HTML5 Document Type Definition

Where is the HTML5 Document Type Definition?

There is no HTML5 DTD. The HTML5 CR (Candidate Recommendation) explicitly says this when discussing XHTML serialization, and this clearly applies to HTML serialization as well.

DTDs have been regarded by the designers of HTML5 as too limited in expressive power, and HTML5 validators (basically the HTML5 mode of http://validator.nu and its copy at http://validator.w3.org/nu/) use schemas and ad hoc checks, not DTD-based validation.

Moreover, HTML5 has been designed so that writing a DTD for it is impossible. For example, there is no SGML way to capture the HTML5 rule that any attribute name that starts with “data-” and complies with certain general rules is valid. In SGML, attributes need to be listed individually, so a DTD would need to be infinite.
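
To make the point concrete, here is a minimal Python sketch of a check for data-* attribute names. The function name and the simplified rule set (the prefix "data-", at least one further character, no ASCII uppercase letters, no whitespace or markup delimiters) are my assumptions for illustration, not the full rule from the specification. A pattern can accept an unbounded set of names, which is exactly what an attribute list in a DTD cannot do.

```python
import re

# Assumed, simplified approximation of the rule for custom data attributes:
# "data-" followed by one or more characters, none of which is an ASCII
# uppercase letter, whitespace, a quote, "/", ">" or "=".
_DATA_ATTR = re.compile(r"data-[^\sA-Z\"'/>=]+")


def is_data_attribute_name(name: str) -> bool:
    """Return True if name looks like a valid data-* attribute name."""
    return _DATA_ATTR.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("data-user-id", "data-", "data-UserId", "title"):
        print(candidate, is_data_attribute_name(candidate))
```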

It is possible to design DTDs that correspond to HTML5 with some omissions and perhaps with some extra rules imposed, but they won’t really be HTML5 DTDs. My experiment with the idea is not very encouraging: too many limitations, too tricky, and the DTD would need to be so permissive that many syntax errors would go uncaught.

HTML5 is not based on SGML, and therefore does not require a reference to a DTD.

The terminology is confusing, but a DTD (document type definition) is only one part of a document type declaration (usually shortened to "doctype"). You should always include a doctype declaration (<!DOCTYPE html> if you use HTML5), but a document type definition identifier is no longer necessary.

To provide a concrete example, this is what an HTML 4.01 document type declaration ("doctype") might have looked like:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

The document type definition ("DTD") identifier in the above declaration is this part:

"-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"

That's the part you can leave off for HTML5. "PUBLIC" specifies the DTD's availability, so that should also not be included if there is no DTD.
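
The split between the declaration and the identifiers inside it can be sketched in Python. The function name split_doctype and the regular expression are assumptions made for illustration. The pattern is deliberately simplified: it handles double-quoted identifiers only and ignores the SYSTEM keyword.

```python
import re

# Assumed, simplified shape:
#   <!DOCTYPE name [PUBLIC "public identifier"] ["system identifier"]>
_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+(?P<name>[^\s>]+)'
    r'(?:\s+PUBLIC\s+"(?P<public>[^"]*)")?'
    r'(?:\s+"(?P<system>[^"]*)")?'
    r'\s*>',
    re.IGNORECASE,
)


def split_doctype(declaration: str) -> dict:
    """Split a doctype declaration into its name and DTD identifiers."""
    match = _DOCTYPE.fullmatch(declaration.strip())
    if match is None:
        raise ValueError("not a doctype declaration this sketch understands")
    # Parts that are absent from the declaration come back as None.
    return match.groupdict()


if __name__ == "__main__":
    print(split_doctype(
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd">'
    ))
    print(split_doctype("<!DOCTYPE html>"))
```

For the HTML5 doctype, both identifier fields come back empty, which is the "no DTD identifier" case described above.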

HTML5 Doctype for DOMParser

If you want to continue using XML but don't want to use the XHTML doctype, you have to declare the XHTML character entities yourself, via ENTITY declarations in your document (in the internal subset or an external declaration set). Only HTML has nbsp and many others as predefined entities; XML has only quot, amp, apos, lt, and gt. You can use the HTML5 entity set from https://www.w3.org/2003/entities/2007/htmlmathml-f.ent (which includes the large set of MathML entities), or the much smaller set of classic HTML4 entities.

But I would first check whether DOMParser actually processes markup declarations and/or external declaration sets with markup declarations. Try to parse the following:

<?xml version="1.0"?>
<!DOCTYPE test [
<!ENTITY nbsp "&#160;">
]>
<test>
 &nbsp;
</test>

and check the console for error messages.
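
DOMParser is a browser API, but the same experiment can be run against any XML parser. As a rough sketch, here it is with Python's xml.etree.ElementTree, which I picked only for illustration and is not the parser from the question. The constant and function names are hypothetical. The sketch checks that an entity declared in the internal subset is resolved and that the same reference without a declaration is rejected.

```python
import xml.etree.ElementTree as ET

# Document that declares nbsp in its internal subset.
WITH_DECLARATION = """<?xml version="1.0"?>
<!DOCTYPE test [
<!ENTITY nbsp "&#160;">
]>
<test>&nbsp;</test>"""

# Same reference, but nothing declares the entity.
WITHOUT_DECLARATION = """<?xml version="1.0"?>
<test>&nbsp;</test>"""


def parse_text(document: str) -> str:
    """Parse the document and return the text content of its root element."""
    return ET.fromstring(document).text


def is_parsable(document: str) -> bool:
    """Return True if the parser accepts the document without an error."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(repr(parse_text(WITH_DECLARATION)))
    print(is_parsable(WITHOUT_DECLARATION))
```

If a parser behaves differently here, for example by ignoring the internal subset, declaring the entities in the document will not help with that parser.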

There is no "official" DTD for HTML (in fact, no formal grammar at all), but there's my SGML DTD for W3C HTML 5.1, with much more information about parsing HTML5 than you are probably interested in, including info about HTML5's predefined entities.

In HTML5, can my !DOCTYPE html declaration contain anything else?

8.1.1 The DOCTYPE


A DOCTYPE is a required preamble.

Note: DOCTYPEs are required for legacy reasons. When omitted, browsers tend to use a different rendering mode that is incompatible
with some specifications. Including the DOCTYPE in a document ensures
that the browser makes a best-effort attempt at following the relevant
specifications.


A DOCTYPE must consist of the following components, in this order:

  1. A string that is an ASCII case-insensitive match for the string "<!DOCTYPE".
  2. One or more space characters.
  3. A string that is an ASCII case-insensitive match for the string "html".
  4. Optionally, a DOCTYPE legacy string or an obsolete permitted DOCTYPE string (defined below).
  5. Zero or more space characters.
  6. A ">" (U+003E) character.


Note: In other words, <!DOCTYPE html>, case-insensitively.


For the purposes of HTML generators that cannot output HTML markup with the short DOCTYPE "<!DOCTYPE html>", a DOCTYPE legacy string may be inserted into the DOCTYPE (in the position defined above). This string must consist of:

  1. One or more space characters.
  2. A string that is an ASCII case-insensitive match for the string "SYSTEM".
  3. One or more space characters.
  4. A U+0022 QUOTATION MARK or U+0027 APOSTROPHE character (the quote mark).
  5. The literal string "about:legacy-compat".
  6. A matching U+0022 QUOTATION MARK or U+0027 APOSTROPHE character (i.e. the same character as in the earlier step labeled quote mark).


Note: In other words, <!DOCTYPE html SYSTEM "about:legacy-compat"> or <!DOCTYPE html SYSTEM 'about:legacy-compat'>,
case-insensitively except for the part in single or double quotes.


The DOCTYPE legacy string should not be used unless the document
is generated from a system that cannot output the shorter string.
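
The rules quoted so far (the short DOCTYPE plus the optional DOCTYPE legacy string) can be written as one regular expression. This is a sketch only: the function name is hypothetical, and the set of "space characters" (space, tab, line feed, form feed, carriage return) is my reading of the specification's term, which the excerpt above does not spell out.

```python
import re

# Assumed definition of "space characters": space, tab, LF, FF, CR.
_SP = r"[ \t\n\f\r]"

_DOCTYPE = re.compile(
    # "<!DOCTYPE", spaces, "html": all ASCII case-insensitive.
    rf"(?i:<!DOCTYPE){_SP}+(?i:html)"
    # Optional legacy string: SYSTEM, then the quoted literal with matching
    # quote marks. The literal itself is matched case-sensitively.
    rf"(?:{_SP}+(?i:SYSTEM){_SP}+(?P<quote>[\"'])about:legacy-compat(?P=quote))?"
    # Zero or more spaces, then ">".
    rf"{_SP}*>"
)


def is_short_or_legacy_doctype(doctype: str) -> bool:
    """Check a DOCTYPE against the short form and the legacy-string form."""
    return _DOCTYPE.fullmatch(doctype) is not None


if __name__ == "__main__":
    for sample in (
        "<!DOCTYPE html>",
        "<!doctype HTML SYSTEM 'about:legacy-compat'>",
        "<!DOCTYPE html SYSTEM \"about:legacy-compat'>",
    ):
        print(sample, is_short_or_legacy_doctype(sample))
```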

To help authors transition from HTML4 and XHTML1, an obsolete permitted DOCTYPE string can be inserted into the DOCTYPE (in the position defined above). This string must consist of:

  1. One or more space characters.
  2. A string that is an ASCII case-insensitive match for the string "PUBLIC".
  3. One or more space characters.
  4. A U+0022 QUOTATION MARK or U+0027 APOSTROPHE character (the first quote mark).
  5. The string from one of the cells in the first column of the table below. The row to which this cell belongs is the selected row.
  6. A matching U+0022 QUOTATION MARK or U+0027 APOSTROPHE character (i.e. the same character as in the earlier step labeled first quote mark).
  7. If a system identifier is used,

    1. One or more space characters.
    2. A U+0022 QUOTATION MARK or U+0027 APOSTROPHE character (the third quote mark).
    3. The string from the cell in the second column of the selected row.
    4. A matching U+0022 QUOTATION MARK or U+0027 APOSTROPHE character (i.e. the same character as in the earlier step labeled third quote mark).


Allowed values for public and system identifiers in an obsolete
permitted DOCTYPE string.

┌────────────────────────────────┬─────────────────────────────────────────────────┬───────────────────────────┐
│Public identifier               │System identifier                                │System identifier optional?│
├────────────────────────────────┼─────────────────────────────────────────────────┼───────────────────────────┤
│-//W3C//DTD HTML 4.0//EN        │http://www.w3.org/TR/REC-html40/strict.dtd       │Yes                        │
│-//W3C//DTD HTML 4.01//EN       │http://www.w3.org/TR/html4/strict.dtd            │Yes                        │
│-//W3C//DTD XHTML 1.0 Strict//EN│http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd│No                         │
│-//W3C//DTD XHTML 1.1//EN       │http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd     │No                         │
└────────────────────────────────┴─────────────────────────────────────────────────┴───────────────────────────┘

A DOCTYPE containing an obsolete permitted DOCTYPE string is
an obsolete permitted DOCTYPE. Authors should not use obsolete
permitted DOCTYPEs, as they are unnecessarily long.
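
The table lookup for obsolete permitted DOCTYPEs can be sketched in Python as follows. The function and variable names are hypothetical, and the set of "space characters" (space, tab, line feed, form feed, carriage return) is an assumption about the specification's term. The identifiers themselves are taken from the table above and compared literally.

```python
import re

# Public identifier -> (system identifier, system identifier optional?)
OBSOLETE_PERMITTED = {
    "-//W3C//DTD HTML 4.0//EN":
        ("http://www.w3.org/TR/REC-html40/strict.dtd", True),
    "-//W3C//DTD HTML 4.01//EN":
        ("http://www.w3.org/TR/html4/strict.dtd", True),
    "-//W3C//DTD XHTML 1.0 Strict//EN":
        ("http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd", False),
    "-//W3C//DTD XHTML 1.1//EN":
        ("http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd", False),
}

_SP = r"[ \t\n\f\r]"  # assumed definition of "space characters"

_SHAPE = re.compile(
    rf"(?i:<!DOCTYPE){_SP}+(?i:html){_SP}+(?i:PUBLIC){_SP}+"
    # Public identifier between matching quote marks.
    rf"(?P<q1>[\"'])(?P<public>[^\"']*)(?P=q1)"
    # Optional system identifier between matching quote marks.
    rf"(?:{_SP}+(?P<q3>[\"'])(?P<system>[^\"']*)(?P=q3))?"
    rf"{_SP}*>"
)


def is_obsolete_permitted_doctype(doctype: str) -> bool:
    """Check a DOCTYPE against the obsolete permitted DOCTYPE rules."""
    match = _SHAPE.fullmatch(doctype)
    if match is None:
        return False
    row = OBSOLETE_PERMITTED.get(match.group("public"))
    if row is None:
        return False  # public identifier is not in the table
    system_id, system_optional = row
    given = match.group("system")
    if given is None:
        return system_optional  # omitting it is fine only where allowed
    return given == system_id  # must be the one from the selected row
```

A call such as is_obsolete_permitted_doctype('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">') is accepted under these rules, while a Transitional public identifier is not, because it does not appear in the table.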

HTML5 doctype attributes

Have a look at the global attributes.

You should use the xml: attributes, such as xml:lang, only if you have an XML document (http://www.w3.org/TR/html5/global-attributes.html#the-lang-and-xml:lang-attributes).


