Valid Content-Type for XML, HTML and XHTML Documents

Valid content-type for XML, HTML and XHTML documents

HTML: text/html, full-stop.

XHTML: application/xhtml+xml or, only if following the HTML compatibility guidelines, text/html. See the W3C XHTML Media Types Note.

XML: text/xml, application/xml (RFC 2376, since obsoleted by RFC 3023 and later by RFC 7303).

There are also many other media types based around XML, for example application/rss+xml or image/svg+xml. It's a safe bet that any unrecognised but registered type ending in +xml is XML-based. See the IANA list for registered media types ending in +xml.

(For unregistered x- types, all bets are off, but you'd hope +xml would be respected.)
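The +xml convention above can be checked mechanically. The following is a minimal sketch, not a complete media type parser; the function name is_xml_media_type is made up for illustration, and it treats any +xml suffix as XML-based, including unregistered x- types, which is the hopeful assumption mentioned above.

```python
def is_xml_media_type(content_type: str) -> bool:
    """Return True for text/xml, application/xml, or any type ending in +xml."""
    # Drop parameters such as "; charset=utf-8" and normalise case,
    # since media type names are case-insensitive.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("text/xml", "application/xml"):
        return True
    # Assumption: the +xml suffix is honoured even for unregistered types.
    return media_type.endswith("+xml")


print(is_xml_media_type("application/rss+xml"))
print(is_xml_media_type("image/svg+xml; charset=utf-8"))
print(is_xml_media_type("text/html"))
```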

Content types besides text/html

Common ones include:

  • text/xml
  • application/json
  • image/jpeg

A comprehensive list is here:
http://www.freeformatter.com/mime-types-list.html#mime-types-list

What Content-Type value should I send for my XML sitemap?

The difference between text/xml and application/xml is the default character encoding if the charset parameter is omitted:

Text/xml and application/xml behave differently when the charset
parameter is not explicitly specified. If the default charset (i.e.,
US-ASCII) for text/xml is inconvenient for some reason (e.g., bad web
servers), application/xml provides an alternative (see "Optional
parameters" of application/xml registration in Section 3.2).

For text/xml:

Conformant with [RFC2046], if a text/xml entity is received with
the charset parameter omitted, MIME processors and XML processors
MUST use the default charset value of "us-ascii"[ASCII]. In cases
where the XML MIME entity is transmitted via HTTP, the default
charset value is still "us-ascii".

For application/xml:

If an application/xml entity is received where the charset
parameter is omitted, no information is being provided about the
charset by the MIME Content-Type header. Conforming XML
processors MUST follow the requirements in section 4.3.3 of [XML]
that directly address this contingency. However, MIME processors
that are not XML processors SHOULD NOT assume a default charset if
the charset parameter is omitted from an application/xml entity.

So if the charset parameter is omitted, the character encoding of text/xml is US-ASCII while with application/xml the character encoding can be specified in the document itself.
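As a rough illustration of the rules quoted above, here is a sketch that derives the charset from the Content-Type header alone. The function name charset_from_header is hypothetical, it only handles plain (not RFC 2231-encoded) parameters, and a return value of None is this sketch's way of saying "look inside the document instead".

```python
from email.message import Message


def charset_from_header(content_type: str):
    """Return the charset implied by the header, or None if the header says nothing."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if isinstance(charset, str):
        # An explicit charset parameter always wins.
        return charset.lower()
    if msg.get_content_type() == "text/xml":
        # Quoted rule: text/xml without a charset defaults to us-ascii.
        return "us-ascii"
    # Quoted rule: application/xml without a charset provides no information,
    # so an XML processor falls back to the document's own declaration.
    return None


print(charset_from_header("text/xml"))
print(charset_from_header("application/xml"))
print(charset_from_header("application/xml; charset=UTF-8"))
```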

Now, a rule of thumb on the internet is: “Be strict with the output but be tolerant with the input.” That means meeting the standards as closely as possible when delivering data over the internet, while building in some mechanisms to overlook faults, or to guess, when receiving and interpreting data.

So in your case, just pick one of the two types (I recommend application/xml) and make sure to specify the character encoding properly (to play it safe, I recommend using the respective default character encoding, so for application/xml use UTF-8 or UTF-16).
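One way to follow that advice is to state the encoding both in the HTTP header and in the XML declaration, and to encode the body to match. The sketch below only builds the headers and body; build_sitemap_response is an illustrative name, the example URL is a placeholder, and how the pair is handed to a web server or framework is left open.

```python
from xml.sax.saxutils import escape

# Namespace used by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_response(urls):
    """Return (headers, body) for a sitemap served as UTF-8 application/xml."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<urlset xmlns="{SITEMAP_NS}">',
    ]
    for url in urls:
        # escape() turns &, < and > into entities so the XML stays well-formed.
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    body = "\n".join(lines).encode("utf-8")
    headers = {
        # The charset parameter matches the encoding actually used for the body.
        "Content-Type": "application/xml; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_sitemap_response(["https://example.com/?a=1&b=2"])
print(headers["Content-Type"])
print(body.decode("utf-8"))
```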

What's the difference between text/xml and application/xml for a web service response?

This is an old question, but one that is frequently visited, and clear recommendations are now available from RFC 7303, which obsoletes RFC 3023. In a nutshell (section 9.2):

The registration information for text/xml is in all respects the same
as that given for application/xml above (Section 9.1), except that
the "Type name" is "text".

IO Error: Non-XML Content-Type: text/html how can we fix this error?

Either:

  1. Validate against a profile that doesn't require XML (such as XHTML 1.0 Transitional, which your document claims to be written in)
  2. Configure your web server to serve the document with an XML media type (e.g. application/xml) in the HTTP Content-Type header
  3. Click the "Be lax about content-type" option

I'd recommend option 1 if you are writing a web page intended for the general public.

Why put an XHTML doctype declaration on HTML files? What does that do?

All that does is tell markup validators that they're about to validate an XHTML document, as opposed to a regular, SGML-rooted, HTML document. It describes the content, or more specifically the markup that follows, but nothing else.

Why are people doing this? What do they hope to achieve? Why not reserve the XHTML doctype declaration for actual XHTML files?

Or am I missing something?

Kind of. What actually happened was that people weren't aware that just putting an XHTML doctype declaration on top of an HTML document didn't automatically transform it into an XHTML document, although admittedly that was what everybody was hoping for.

You see, most web applications out there aren't configured to serialize XHTML documents as application/xhtml+xml properly, instead opting to serve pages as just text/html. (It's typically because of the .html file extension more than anything else, really; generally speaking, servers do correctly apply application/xhtml+xml to documents with .xhtml or .xht as the extension, but only static sites that actually make use of the file format will benefit from this.) That leads browsers to decide that they received a regular HTML document, and so that tag soup parsing nonsense we've all come to know and love inevitably ensues.

Note that it doesn't matter even if you have a meta tag like this on your XHTML document:

<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />

Browsers will ignore that, and only look at the actual HTTP Content-Type header that was sent along with the XHTML document.

To make matters worse, Internet Explorer, the most-used browser during XHTML's heyday, never properly supported the application/xhtml+xml MIME type until version 9 was finally released: instead of parsing the markup, constructing the DOM and rendering the page, all it would do was ask for a file download. That doesn't make a very usable XHTML page!

So, guess what we all had to live with until HTML5 became cool?

This, along with things like IE6 going quirky on pages with the XML declaration before the doctype declaration, is also one of the biggest factors leading to XHTML's downfall (along with XHTML 1.1 never gaining widespread usage, and XHTML 2.0 being canceled in favor of HTML5).

Is XPath primarily used for HTML, XML or XHTML?

You have a right to be confused.

XPath operates against a data model that generally assumes that markup is well-formed. By definition, XML and XHTML are necessarily well-formed; HTML, not necessarily. However, HTML parsers can often successfully parse non-well-formed markup anyway, in the spirit of being liberal in what one accepts as input, into a data model suitable for XPath.

Therefore, you can usually also use XPath with HTML. Using XPath in this manner, in fact, is a common web page scraping technique.
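To make the well-formedness point concrete, here is a small sketch using only Python's standard library, whose xml.etree.ElementTree module supports a limited subset of XPath. The sample markup is made up for illustration. The strict XML parser handles the XHTML snippet but rejects the tag soup; for real-world HTML you would typically reach for a lenient third-party parser (for example lxml.html) before running XPath.

```python
import xml.etree.ElementTree as ET

# Well-formed XHTML: every element is closed and the namespace is declared.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p class="intro">Hello</p>
    <p>World</p>
  </body>
</html>"""

# Prefix-to-namespace mapping used in the XPath expressions below.
NS = {"x": "http://www.w3.org/1999/xhtml"}

root = ET.fromstring(XHTML)
paragraphs = [p.text for p in root.findall(".//x:p", NS)]
intro = root.find(".//x:p[@class='intro']", NS)
print(paragraphs)
print(intro.text)

# Typical HTML tag soup: unclosed <br> and <p> elements.
TAG_SOUP = "<html><body><p>Hello<br><p>World</body></html>"
try:
    ET.fromstring(TAG_SOUP)
    soup_is_well_formed = True
except ET.ParseError:
    # A strict XML parser refuses markup that is not well-formed.
    soup_is_well_formed = False
print(soup_is_well_formed)
```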


