How to Have HTML Text or CDATA Inside an XML Attribute

Is it possible to have HTML text or CDATA inside an XML attribute?

If an attribute is not declared as a tokenized or enumerated type, it is processed as CDATA, which here simply means a plain string. The details of how the attribute is processed can be found in Extensible Markup Language (XML) 1.0 (Fifth Edition), quoted below.

3.3.1 Attribute Types

XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types. The string type may take any literal string as a value; the tokenized types are more constrained. The validity constraints noted in the grammar are applied after the attribute value has been normalized as described in 3.3.3 Attribute-Value Normalization.

[54]  AttType        ::=  StringType | TokenizedType | EnumeratedType
[55]  StringType     ::=  'CDATA'
[56]  TokenizedType  ::=  'ID'          [VC: ID]
                                        [VC: One ID per Element Type]
                                        [VC: ID Attribute Default]
                          | 'IDREF'     [VC: IDREF]
                          | 'IDREFS'    [VC: IDREF]
                          | 'ENTITY'    [VC: Entity Name]
                          | 'ENTITIES'  [VC: Entity Name]
                          | 'NMTOKEN'   [VC: Name Token]
                          | 'NMTOKENS'  [VC: Name Token]

...

3.3.3 Attribute-Value Normalization

Before the value of an attribute is passed to the application or checked for validity, the XML processor MUST normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.

  1. All line breaks MUST have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.
  2. Begin with a normalized value consisting of the empty string.
  3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:

    • For a character reference, append the referenced character to the normalized value.
    • For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.
    • For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
    • For another character, append the character to the normalized value.

If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.

Note that if the unnormalized attribute value contains a character reference to a white space character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a white space character (not a reference), which is replaced with a space character (#x20) in the normalized value and also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a white space character; being recursively processed, the white space character is replaced with a space character (#x20) in the normalized value.
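
To see the normalization rules quoted above in practice, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree (which uses the expat parser) and an attribute with no DTD declaration, so it is treated as CDATA. The element name item and attribute name note are made up for illustration.

```python
import xml.etree.ElementTree as ET

# One attribute value containing, in order:
#   a literal newline, a character reference to a newline (&#10;),
#   and a literal tab.
doc = '<item note="line1\nline2&#10;line3\tend"/>'

value = ET.fromstring(doc).get("note")

# Literal white space characters are replaced by a space, while the
# character reference keeps the referenced newline itself.
print(repr(value))
```

The literal newline and tab should come back as spaces, and only the newline written as &#10; should survive as a real line break.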

All attributes for which no declaration has been read SHOULD be treated by a non-validating processor as if declared CDATA.

It is an error if an attribute value contains a reference to an entity for which no declaration has been read.

Using CDATA in an XML attribute?

You can't.

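If you want to use characters with special meaning in an XML attribute value, you must escape them: write < as &lt; and & as &amp;, and write the quote character that delimits the value as &quot; or &apos;. Numeric character references such as &#60; work as well.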
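
As an illustration of putting HTML into an attribute by escaping it, here is a small Python sketch. It uses quoteattr from the standard library's xml.sax.saxutils to do the escaping and xml.etree.ElementTree to read the value back. The element name item and attribute name content are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

html = '<p class="intro">Tom & Jerry</p>'

# quoteattr escapes &, < and >, chooses a quote style that suits the
# data, and returns the value already wrapped in those quotes.
xml_doc = "<item content=" + quoteattr(html) + "/>"
print(xml_doc)

# A parser hands the application the original, unescaped HTML string.
round_tripped = ET.fromstring(xml_doc).get("content")
print(round_tripped)
```

The escaping only exists in the serialized document; code that reads the attribute through a parser sees the original HTML text.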

What does <![CDATA[ ]]> in XML mean?

CDATA stands for Character Data and it means that the data in between these strings includes data that could be interpreted as XML markup, but should not be.
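
A quick way to check this behavior is to parse a CDATA section and look at the resulting text. The sketch below assumes Python's standard xml.etree.ElementTree; the element name data is just an example.

```python
import xml.etree.ElementTree as ET

# Inside the CDATA section, < and & are plain characters, not markup.
doc = "<data><![CDATA[<b>bold</b> & 1 < 2]]></data>"

root = ET.fromstring(doc)
text = root.text

print(text)        # the character data, markers removed
print(len(root))   # number of child elements under <data>
```

The <b> inside the section is not parsed as an element, so data ends up with text content only and no children.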

The key differences between CDATA and comments are:

  • CDATA is still part of the document's content, while a comment is not.
  • In CDATA you cannot include the string ]]> (CDEnd, written CEND in the snippets below), while in a comment the string -- is invalid.
  • Parameter Entity references are not recognized inside of comments.

This means given these four snippets of XML from one well-formed document:

<!ENTITY % MyParamEntity "Has been expanded">


<!--
Within this comment I can use ]]>
and other reserved characters like <
&, ', and ", but %MyParamEntity; will not be expanded
(if I retrieve the text of this node it will contain
%MyParamEntity; and not "Has been expanded")
and I can't place two dashes next to each other.
-->


<![CDATA[
Within this Character Data block I can
use double dashes as much as I want (along with <, &, ', and ").
%MyParamEntity; will not be expanded here either, because parameter
entity references are only recognized inside the DTD. However, I can't
use the CEND sequence. If I need to use CEND I must split it across
concatenated CDATA sections so that it never appears whole.
]]>


<description>An example of escaped CENDs</description>
<!-- This text contains a CEND ]]> -->
<!-- In this first case we put the ]] at the end of the first CDATA block
and the > in the second CDATA block -->
<data><![CDATA[This text contains a CEND ]]]]><![CDATA[>]]></data>
<!-- In this second case we put a ] at the end of the first CDATA block
and the ]> in the second CDATA block -->
<alternative><![CDATA[This text contains a CEND ]]]><![CDATA[]>]]></alternative>
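
The first splitting strategy shown above can be automated. The following Python sketch is one possible way to do it; to_cdata is a hypothetical helper name, not part of any library, and the round trip is checked with the standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every ]]> across two sections."""
    # End the current section after "]]" and start a new one before ">".
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"


payload = "This text contains a CEND ]]>"
doc = "<data>" + to_cdata(payload) + "</data>"
print(doc)

# Adjacent CDATA sections are read back as one continuous text value.
recovered = ET.fromstring(doc).text
print(recovered)
```

Because adjacent CDATA sections are simply concatenated when read, the consumer never notices the split.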

Reading and Manipulating HTML inside XML CDATA

When you use XmlNode.InnerXml on an XML element that contains CDATA, it comments out the first element.

Use XmlNode.InnerText instead and the CDATA will work.

If you must use InnerXml, you can omit the CDATA tags and it will also work.

If you need both InnerXml and CDATA, I don't have an answer, or a scenario where the two would be used together.
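
The difference between reading the text and reading the markup is not specific to .NET. As a rough analogue only (this is Python's xml.dom.minidom, not the XmlNode API discussed above), the sketch below reads the same CDATA node both ways. The element name body is made up.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    "<body><![CDATA[<p>Hello <b>world</b></p>]]></body>"
)
cdata = doc.documentElement.firstChild

# minidom keeps the CDATA section as its own node type.
is_cdata = cdata.nodeType == Node.CDATA_SECTION_NODE

# Text view: the HTML string itself (comparable in spirit to InnerText).
inner_text = cdata.data

# Markup view: the serialized node, CDATA wrapper included
# (comparable in spirit to InnerXml).
inner_xml = cdata.toxml()

print(inner_text)
print(inner_xml)
```

If the goal is to manipulate the embedded HTML, the text view is the one to hand to an HTML parser.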

How to read the attribute value which is within CDATA from XML in Talend

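In tMap or tJavaFlex, set the output value to:

row1.value.replaceAll("<?!\\[CDATA\\[", "").replaceAll("\\]\\]>?", "")

Replace row1.value with your actual row and column name. The optional < and > in the patterns also remove the angle brackets when the full <![CDATA[ ... ]]> wrapper is present.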

CDATA for a value in XML

You cannot put a CDATA section inside an attribute value, nor would it be meaningful: attribute values can only contain text data anyway, so there is nothing to gain from wrapping them in a CDATA section.


