When Is a CDATA Section Necessary Within a Script Tag

When is a CDATA section necessary within a script tag?

A CDATA section is required if you need your document to parse as XML (e.g. when an XHTML page is interpreted as XML) and you want to be able to write literal i<10 and a && b instead of i&lt;10 and a &amp;&amp; b. By default, XHTML parses JavaScript code as parsed character data (PCDATA) rather than as character data (CDATA). This is not an issue for scripts stored in external source files, but for any inline JavaScript in XHTML you will probably want to use a CDATA section.

Note that many XHTML pages were never intended to be parsed as XML, in which case this will not be an issue.
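
The escaping that a CDATA section saves you from can be sketched in a few lines of JavaScript. The helper name escapeForXml and the sample source string are hypothetical, chosen only for illustration; this is a minimal sketch, not a complete XML serializer.

```javascript
// Hypothetical helper: escape script text so it can sit directly inside
// an XML element without a CDATA section.
function escapeForXml(code) {
  return code
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const source = "if (i<10 && a) { run(); }";

// What you would have to write inline without CDATA.
console.log(escapeForXml(source));

// The same source inside a CDATA section needs no changes at all.
console.log("<![CDATA[" + source + "]]>");
```

The order of the replacements matters: escaping & last would turn the & of an already produced &lt; into &amp;lt;.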

For a good writeup on the subject, see https://web.archive.org/web/20140304083226/http://javascript.about.com/library/blxhtml.htm

What is CDATA in HTML?

All text in an XML document will be parsed by the parser.

But text inside a CDATA section is not parsed as markup; the parser passes it through as literal character data.

CDATA - (Unparsed) Character Data

The term CDATA is used about text data that should not be parsed by the XML parser.

Characters like "<" and "&" are illegal as literal text in XML elements.

"<" will generate an error because the parser interprets it as the start of a new element.

"&" will generate an error because the parser interprets it as the start of an entity reference.

Some text, like JavaScript code, contains a lot of "<" or "&" characters. To avoid errors, script code can be defined as CDATA.

Everything inside a CDATA section is treated by the parser as literal text, not as markup.

A CDATA section starts with "<![CDATA[" and ends with "]]>"
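
As a rough illustration of those delimiters, the sketch below pulls the literal contents out of every CDATA section in a string. It is a simplified model built on a regular expression, not a real XML parser, and the function name cdataContents and the sample document are hypothetical.

```javascript
// Simplified model of an XML parser's view of CDATA: everything between
// "<![CDATA[" and the first following "]]>" is literal character data.
function cdataContents(xml) {
  const pattern = /<!\[CDATA\[([\s\S]*?)\]\]>/g;
  return Array.from(xml.matchAll(pattern), (match) => match[1]);
}

const sampleDoc = "<script><![CDATA[if (a < b && c) { go(); }]]></script>";

// The "<" and "&&" come back untouched, with no entity escaping involved.
console.log(cdataContents(sampleDoc));
```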

Use of CDATA in program output

CDATA sections in XHTML documents are liable to be parsed differently by web browsers if they render the document as HTML, since HTML parsers do not recognise the CDATA start and end markers, nor do they recognise HTML entity references such as &lt; within <script> tags. This can cause rendering problems in web browsers and can lead to cross-site scripting vulnerabilities if used to display data from untrusted sources, since the two kinds of parsers will disagree on where the CDATA section ends.
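
The disagreement can be sketched with two deliberately simplified models. Neither function below is a real parser: the HTML model only assumes that script text ends at the first "</script" sequence, and the XML model only assumes that a CDATA section ends at the first "]]>". All names and the untrusted sample value are hypothetical.

```javascript
// HTML model: script text runs up to the first "</script" sequence.
function scriptTextSeenByHtml(scriptInner) {
  const end = scriptInner.search(/<\/script/i);
  return end === -1 ? scriptInner : scriptInner.slice(0, end);
}

// XML model: a CDATA section runs up to the first "]]>" sequence.
function cdataSeenByXml(scriptInner) {
  const open = "<![CDATA[";
  const start = scriptInner.indexOf(open);
  if (start === -1) return null;
  const end = scriptInner.indexOf("]]>", start);
  if (end === -1) return null;
  return scriptInner.slice(start + open.length, end);
}

// Hypothetical untrusted value written into a CDATA-wrapped inline script.
const untrusted = "</script><img src=x onerror=alert(1)>";
const scriptInner = '<![CDATA[ var s = "' + untrusted + '"; ]]>';

// The HTML model stops inside the string literal, so the rest of the
// untrusted value would be handled as page markup.
console.log(scriptTextSeenByHtml(scriptInner));

// The XML model keeps the whole assignment as script text.
console.log(cdataSeenByXml(scriptInner));
```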

A brief SGML tutorial.

Also, see the Wikipedia entry on CDATA.

Why use //<![CDATA[ and //]]> in a jQuery script?

CDATA is used to allow the document to be loaded as straight XML. It lets you embed JS in XML documents without replacing special XML characters such as <, > and & with the entities &lt;, &gt; and &amp;, which would otherwise be needed to keep the XML well-formed.

The double slash // is a JavaScript line comment, not an XML one. An XML parser treats // as ordinary text and still recognizes the CDATA markers that follow it, while a JavaScript engine that receives the markers as script text skips them as comments.

Wikipedia says:

In an XML document or external parsed entity, a CDATA section is a
section of element content that is marked for the parser to interpret
as only character data, not markup. A CDATA section is merely an
alternative syntax for expressing character data; there is no semantic
difference between character data that manifests as a CDATA section
and character data that manifests as in the usual syntax in which <
and & would be represented by &lt; and &amp;, respectively.

Why is CDATA commented out inside script tags?

XHTML is supposed to be served as XML by using media type application/xhtml+xml. In HTML5, the markup is only XHTML if it is served with an XML media type. When served like this, the contents of script elements are not CDATA.

So to get the XML parser to treat the script contents as CDATA, they can be wrapped in <![CDATA[ ]]>.

While few people have historically served markup as application/xhtml+xml, many have validated their pages as if it was XHTML. The XHTML validator equally expects that the script contents are not ordinarily CDATA, and so will typically reject tags and other scraps of markup embedded in the JavaScript, unless they are escaped with <![CDATA[ ]]>

Having validated their pages as XHTML, they'd then serve their pages with a text/html media type to browsers, which meant that the browser treated the markup as HTML, not XHTML. In this case, the HTML parser is used, which does treat the script contents as CDATA automatically, so the <![CDATA[ and ]]> become part of the script to be run by the JavaScript engine. Therefore, to hide those strings from the JavaScript engine, they are preceded with // on the same line, which means that the JavaScript engine thinks the lines are comments.

Finally, some people serve the same markup as both application/xhtml+xml and text/html, switching based on the information found in the HTTP request message. For the same reasons as above, to get the script contents to be processed correctly in both modes, the //<![CDATA[ and //]]> pattern is a very effective technique.
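
Switching on the request could look roughly like the sketch below, which assumes the decision is made from the Accept header. The function name chooseMediaType is hypothetical, and q-values are ignored for brevity, so a production server would want a fuller negotiation.

```javascript
// Hypothetical helper: pick a media type from the request's Accept header.
// Simplification: q-values are ignored; only the listed types are checked.
function chooseMediaType(acceptHeader) {
  const accepted = (acceptHeader || "")
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase());
  return accepted.includes("application/xhtml+xml")
    ? "application/xhtml+xml"
    : "text/html";
}

console.log(chooseMediaType("text/html,application/xhtml+xml;q=0.9"));
console.log(chooseMediaType("text/html, */*"));
```

Falling back to text/html when the header is missing or does not list the XML type keeps older clients on the HTML parser.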

Should I use <![CDATA[...]]> in HTML5?

The CDATA structure isn't really for HTML at all, it's for XML.

People sometimes use them in XHTML inside script tags because it removes the need for them to escape <, > and & characters. It's unnecessary in HTML though, since script tags in HTML are already parsed like CDATA sections.

Edit: This is where we open that really mouldy old can of worms from 2002 over whether you're sending XHTML as text/html or as application/xhtml+xml like you’re “supposed” to :-)

Should I use ]]> or //]]> for closing a CDATA section in XHTML?

According to www.w3.org/TR/xhtml1/#h-4.8 the CDATA section can be defined as: [no //]

Yeah. In XHTML, they can. Proper XHTML, as read by an XML parser like when you serve application/xhtml+xml to a web browser that isn't IE.

But probably you're actually serving as text/html, which means your browser isn't an ‘XML processor’ as referenced in that section. It's a legacy-HTML4 parser, so you have to abide by the appendix C guidelines and avoid any XML features that don't work in HTML4.

In particular, the strings <![CDATA[ and ]]> in a <script> or <style> block are not special to an HTML4 parser, because in HTML4 those two elements are ‘CDATA elements’ where markup doesn't apply (except for the </ ETAGO sequence to end the element itself). So an HTML4 parser will send those strings straight to the CSS or JavaScript engine.

Because <![CDATA[ is not valid JS, you'll get a JavaScript syntax error. (The other answers are wrong here: it's not just very old browsers, but all HTML4 browsers, that will give errors for an uncommented CDATA section in script.)
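
One way to check that claim outside a browser is to hand the raw script text to the JavaScript engine's own parser. The sketch below uses new Function purely as a syntax check (the body is compiled but never called); the helper name parsesAsScript is hypothetical.

```javascript
// Returns true when the text parses as a JavaScript function body.
// new Function only compiles the text here; the body is never called.
function parsesAsScript(text) {
  try {
    new Function(text);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// What the JS engine receives from an HTML parser, without and with
// comment markers in front of the CDATA delimiters.
const bareCdata = "<![CDATA[\nalert('a&b');\n]]>";
const commentedCdata = "//<![CDATA[\nalert('a&b');\n//]]>";

console.log(parsesAsScript(bareCdata));
console.log(parsesAsScript(commentedCdata));
```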

You use the // or /* comment markup to hide the content from the JavaScript or CSS engine. So:

<script type="text/javascript">//<![CDATA[
alert('a&b');
//]]></script>

(Note the leading //; this was omitted in the W3Schools example code, and makes that example code not work at all. Fail. Don't trust W3Schools: they are nothing to do with W3C and their material is often rubbish.)

This is read by an HTML parser as:

  • Open-tag script establishing CDATA content until the next ETAGO
  • Text //<![CDATA[\nalert('a&b');\n//]]>
  • ETAGO and close-tag script
  • -> resultant content sent to JavaScript engine: //<![CDATA[\nalert('a&b');\n//]]>

But by an XML parser as:

  • Open-tag script (no special parsing implications)
  • Text content //
  • Open CDATA section establishing CDATA content until the next ]]> sequence
  • Text \nalert('a&b');\n//
  • Close CDATA section
  • Close-tag script
  • -> resultant content sent to JavaScript engine: //\nalert('a&b');\n//

Whilst the parsing process is quite different, the JS engine ends up with the same effective code in each case: thanks to the // markers, the only difference is in the comments.
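
The two walkthroughs can be mimicked in a few lines. This is a simplified model rather than a real parser: the HTML view passes the script text through untouched, and the XML view just drops the two CDATA delimiters. The variable and function names are hypothetical.

```javascript
// Text between <script> and </script>, as in the example above.
const scriptBody = "//<![CDATA[\nalert('a&b');\n//]]>";

// HTML model: script content is passed through untouched.
const seenViaHtml = scriptBody;

// XML model: the CDATA delimiters are markup and are dropped,
// leaving only the character data around and inside them.
const seenViaXml = scriptBody.replace("<![CDATA[", "").replace("]]>", "");

// Strip // comments line by line to compare the effective code.
// This is safe here only because the sample has no "//" inside a string.
function withoutLineComments(code) {
  return code
    .split("\n")
    .map((line) => line.replace(/\/\/.*$/, "").trim())
    .filter((line) => line !== "")
    .join("\n");
}

console.log(JSON.stringify(seenViaXml));
console.log(withoutLineComments(seenViaHtml) === withoutLineComments(seenViaXml));
```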

Note this is a very different case to the old-school:

<script type="text/javascript"><!--
alert('a&b');
//--></script>

which was to hide script/style content so that it didn't get written onto the page in browsers that didn't understand <script> and <style> tags. This will not generate a JavaScript/CSS error, because a hack was put in at a different level: it is a syntactical feature of the CSS and JavaScript languages themselves that <!-- is defined to do nothing, allowing this hack to work.

Those browsers are ancient history; you absolutely should not use this technique today. Especially in XHTML, as an XML parser would take you at your word, turning the whole script block into an XML comment instead of executable code.

I want to inline Scripts or CSSs into xHTML without escaping special characters.

Avoid doing this and you will be much happier.

Do you really need the < and & characters in a <style>? No, almost never. Do you really need them in <script>? Well... sometimes, yeah, and in that case the commented-CDATA-section is acceptable.

But to be honest, XHTML compatibility guideline C.4 is as applicable to HTML4 as it is to XHTML1: anything non-trivial should be in an external script, and then you don't have to worry about any of this.

What does <![CDATA[]]> in XML mean?

CDATA stands for Character Data and it means that the data in between these strings includes data that could be interpreted as XML markup, but should not be.

The key differences between CDATA and comments are:

  • As Richard points out, CDATA is still part of the document, while a comment is not.
  • In CDATA you cannot include the string ]]> (CDEnd), while in a comment -- is invalid.
  • Parameter entity references are not recognized inside comments. They are not recognized inside CDATA sections either, because parameter entity references are only recognized within the DTD.

This means given these four snippets of XML from one well-formed document:

<!ENTITY % MyParamEntity "Has been expanded">


<!--
Within this comment I can use ]]>
and other reserved characters like <
&, ', and ", but %MyParamEntity; will not be expanded
(if I retrieve the text of this node it will contain
%MyParamEntity; and not "Has been expanded")
and I can't place two dashes next to each other.
-->


<![CDATA[
Within this Character Data block I can
use double dashes as much as I want (along with <, &, ', and ").
%MyParamEntity; will not be expanded here either: nothing except
the CDEnd sequence is recognized as markup in a CDATA section, so I can't use
the CDEnd sequence. If I need to use CDEnd I must split it across
concatenated CDATA sections so that no single section contains it whole.
]]>


<description>An example of escaped CDEnds</description>
<!-- This text contains a CDEnd ]]> -->
<!-- In this first case we put the ]] at the end of the first CDATA block
and the > in the second CDATA block -->
<data><![CDATA[This text contains a CDEnd ]]]]><![CDATA[>]]></data>
<!-- In this second case we put a ] at the end of the first CDATA block
and the ]> in the second CDATA block -->
<alternative><![CDATA[This text contains a CDEnd ]]]><![CDATA[]>]]></alternative>
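
The first splitting strategy shown above (ending one section after ]] and starting the next with >) is easy to automate. The helper below is a minimal sketch with a hypothetical name, toCdata; it is not part of any XML library.

```javascript
// Hypothetical helper: wrap arbitrary text in CDATA, splitting every
// "]]>" so that "]]" ends one section and ">" begins the next.
function toCdata(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

console.log(toCdata("a ]]> b"));
console.log(toCdata("no terminator here"));
```

Concatenated CDATA sections are read back as one continuous run of character data, so the original text is recovered unchanged.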

In this case, do I need to use a CDATA section?

The intention is that you do not need to escape < within your JavaScript in an XHTML document (which is a form of XML). When enclosed in a CDATA section, < is treated as a literal <.

HTML itself does not recognise CDATA sections, which is why the CDATA start and end tokens are in JS comments.

For more detail, a similar question was answered at: What is CDATA in HTML?

As it happens, the example you posted would work fine with or without the two lines that start and end the CDATA section.


