What Exactly Is Cdata and What Does It Do

What exactly is CDATA and what does it do?

It tells the parser not to interpret the enclosed data as markup. For example, if you want an XML file to contain text with < or & in it, XML parsers will report the file as not well-formed, because they will try to read those characters as the start of a tag or an entity reference. You simply have to surround the text with the CDATA markers.

What does <![CDATA[<![CDATA[some text]]]]><![CDATA[>]]> in XML mean?

Unless there are special characters such as "&" and "<" in the content, the string

<![CDATA[xxxxx]]>

means exactly the same as

xxxxx

The difference is that in the second form, "&" and "<" have a special meaning, while in the CDATA form, the only thing with a special meaning is the string "]]>" which acts as a terminator.
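As a quick check of that equivalence, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The element name data and the sample text are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# The same character data written two ways: as a CDATA section
# and as ordinary text with "<" and "&" escaped.
cdata_form = ET.fromstring("<data><![CDATA[a < b & c]]></data>")
escaped_form = ET.fromstring("<data>a &lt; b &amp; c</data>")

print(cdata_form.text)                       # a < b & c
print(escaped_form.text)                     # a < b & c
print(cdata_form.text == escaped_form.text)  # True
```

The application sees the same string in both cases; only the spelling in the file differs.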

Your more complex example:

<![CDATA[<![CDATA[TAX INVOICE]]]]><![CDATA[>]]>

is a bit of a nightmare, and results from a sloppy programming habit of wrapping text in CDATA sections out of laziness. CDATA sections cannot be nested, so the first ]]> terminates the first <![CDATA[, which means that the string is equivalent to

&lt;![CDATA[TAX INVOICE]]&gt;

You might think that this in turn is equivalent to

TAX INVOICE

but that is not the case, because an XML parser will only interpret the outer CDATA delimiters, and the content it will pass to the application is therefore

<![CDATA[TAX INVOICE]]>
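To see this in practice, the following sketch feeds the nested-looking string to Python's xml.etree.ElementTree. The wrapping title element is an assumption added only so that the snippet is a complete document.

```python
import xml.etree.ElementTree as ET

# Two adjacent CDATA sections: the first ends at the first "]]>",
# the second contains only ">".
doc = "<title><![CDATA[<![CDATA[TAX INVOICE]]]]><![CDATA[>]]></title>"

text = ET.fromstring(doc).text
print(text)  # <![CDATA[TAX INVOICE]]>
```

The inner delimiters survive as plain characters, which is why the application does not receive just TAX INVOICE.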

What is CDATA in HTML?

All text in an XML document will be parsed by the parser.

But text inside a CDATA section is not parsed as markup; the parser passes it through as literal characters.

CDATA - (Unparsed) Character Data

The term CDATA is used for text data that should not be parsed as markup by the XML parser.

Characters like "<" and "&" are illegal as literal characters in XML element content.

"<" will generate an error because the parser interprets it as the start of a new element.

"&" will generate an error because the parser interprets it as the start of an entity reference.

Some text, like JavaScript code, contains a lot of "<" or "&" characters. To avoid errors, script code can be defined as CDATA.

Everything inside a CDATA section is treated as plain text by the parser.

A CDATA section starts with "<![CDATA[" and ends with "]]>"
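The rules above can be sketched with Python's xml.etree.ElementTree. The script text and the helper name parses are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline script containing both "<" and "&".
script = "if (i < 10 && ready) { run(); }"

def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

raw = "<script>" + script + "</script>"
wrapped = "<script><![CDATA[" + script + "]]></script>"

print(parses(raw))                            # False
print(parses(wrapped))                        # True
print(ET.fromstring(wrapped).text == script)  # True
```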

Use of CDATA in program output

CDATA sections in XHTML documents are liable to be parsed differently by web browsers if they render the document as HTML, since HTML parsers do not recognise the CDATA start and end markers, nor do they recognise HTML entity references such as &lt; within <script> tags. This can cause rendering problems in web browsers and can lead to cross-site scripting vulnerabilities if used to display data from untrusted sources, since the two kinds of parsers will disagree on where the CDATA section ends.

A brief SGML tutorial.

Also, see the Wikipedia entry on CDATA.

what actually is PCDATA and CDATA?

From WIKI:

PCDATA

Simply speaking, PCDATA stands for Parsed Character Data. That means the characters are to be parsed by the XML, XHTML, or HTML parser. (&lt; will be changed to <, <p> will be taken to mean a paragraph tag, etc). Compare that with CDATA, where the characters are not to be parsed by the XML, XHTML, or HTML parser.

CDATA

The term CDATA, meaning character data, is used for distinct, but related purposes in the markup languages SGML and XML. The term indicates that a certain portion of the document is general character data, rather than non-character data or character data with a more specific, limited structure.

What is the reason that CDATA even exists?

CDATA sections are just for the convenience of human authors, not for programs. Their only use is to give humans the ability to easily include e.g. SVG example code in an XHTML page without needing to carefully replace every < with &lt; and so on.

That, for me, is the intended use. It is not there to make the resulting document a few bytes smaller because you can use < instead of &lt;.

Taking the sample from above again (SVG code in XHTML), it also makes it easy for me to check the source code of the XHTML file and just copy-paste the SVG code out without needing to back-replace &lt; with <.
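The point that programs do not need CDATA can be illustrated with a small sketch: a typical XML library escapes special characters on output by itself. This example uses Python's xml.etree.ElementTree, and the element name and text are assumptions chosen for the demonstration.

```python
import xml.etree.ElementTree as ET

# Build an element whose text contains XML special characters.
element = ET.Element("data")
element.text = "a < b & c"

# The serializer escapes "<" and "&" automatically; no CDATA is involved.
serialized = ET.tostring(element, encoding="unicode")
print(serialized)  # <data>a &lt; b &amp; c</data>

# Parsing the output gives back the original text unchanged.
print(ET.fromstring(serialized).text)  # a < b & c
```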

What is the Point of using XML CDATA?

You can use it to avoid XML escaping special characters.

Imagine you have an element like

<data>...</data>

And want to place the following text in the data element :

 a < b

Like so:

<data>a < b</data> 

That doesn't work, since XML recognizes the < as a potential start of a new tag.

You can escape the < character:

<data>a &lt; b</data>

Or you can tell the XML parser to not parse your data by placing it in a CDATA section:

<data><![CDATA[a < b]]></data>

(Then again, with CDATA, your text cannot contain ]]>)
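A common workaround for that restriction is to split any ]]> in the text across two adjacent CDATA sections, so that the terminator never appears inside a single section. Below is a minimal sketch in Python; the helper name to_cdata and the sample payload are hypothetical, not part of any library.

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting each ']]>' across two sections."""
    # "]]>" becomes "]]" + end of section + new section + ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Hypothetical payload that happens to contain the terminator "]]>".
payload = "if (a[b[0]]> 1) ok"
xml_text = "<data>" + to_cdata(payload) + "</data>"

print(xml_text)
print(ET.fromstring(xml_text).text == payload)  # True
```

The parser concatenates the adjacent sections, so the application receives the original text with ]]> intact.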

See also this question

Why use //<![CDATA[ and //]]> in a jQuery script?

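CDATA is used to allow the document to be loaded as straight XML. You can embed JS in XML documents without replacing special XML characters like <, >, & etc. with the XML entities &lt;, &gt;, &amp; etc., so that the XML syntax does not get corrupted.

The double slash // in front of each marker is a JavaScript line comment. When the page is parsed as XML, the // is just text and the parser still recognises <![CDATA[ and ]]> as CDATA delimiters. When the page is parsed as HTML, the markers end up inside the script, where the // makes the JavaScript engine ignore them.

Wikipedia says: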

In an XML document or external parsed entity, a CDATA section is a
section of element content that is marked for the parser to interpret
as only character data, not markup. A CDATA section is merely an
alternative syntax for expressing character data; there is no semantic
difference between character data that manifests as a CDATA section
and character data that manifests as in the usual syntax in which <
and & would be represented by &lt; and &amp;, respectively.

When is a CDATA section necessary within a script tag?

A CDATA section is required if you need your document to parse as XML (e.g. when an XHTML page is interpreted as XML) and you want to be able to write literal i<10 and a && b instead of i&lt;10 and a &amp;&amp; b, as XHTML will parse the JavaScript code as parsed character data as opposed to character data by default. This is not an issue with scripts that are stored in external source files, but for any inline JavaScript in XHTML you will probably want to use a CDATA section.

Note that many XHTML pages were never intended to be parsed as XML in which case this will not be an issue.

For a good writeup on the subject, see https://web.archive.org/web/20140304083226/http://javascript.about.com/library/blxhtml.htm

I don't understand the idea of CDATA

  1. <![CDATA[ tells an XML processor to treat everything up to the next occurrence of ]]> as literal text. This can be useful in many situations, e.g. as in your example: to transport user-generated data within an XML envelope.

    The data inside the CDATA section does not have to follow XML escaping rules and can therefore be transported as is. But there are other use cases as well:

    • When is a CDATA section necessary within a script tag?
    • What does <![CDATA[]]> in XML mean?
    • Are CDATA sections really unnecessary?
    • Useful article at www.ibm.com
  2. In your example the transported data is: Dear Customer. The <![CDATA[]]> is just for the XML processor.

