How to Write CDATA Using SimpleXMLElement

How to write CDATA using SimpleXMLElement?

Got it! I adapted the code from this great solution (archived version):

<?php

// http://coffeerings.posterous.com/php-simplexml-and-cdata
class SimpleXMLExtended extends SimpleXMLElement {

    public function addCData( $cdata_text ) {
        // Switch to the DOM view of this element to reach createCDATASection()
        $node = dom_import_simplexml( $this );
        $doc  = $node->ownerDocument;

        $node->appendChild( $doc->createCDATASection( $cdata_text ) );
    }

}

$xmlFile = 'config.xml';

// instead of $xml = new SimpleXMLElement( '<site/>' );
$xml = new SimpleXMLExtended( '<site/>' );

$xml->title = NULL; // VERY IMPORTANT! We need a node where to append

$xml->title->addCData( 'Site Title' );
$xml->title->addAttribute( 'lang', 'en' );

$xml->saveXML( $xmlFile );

?>

XML file generated:

<?xml version="1.0"?>
<site>
  <title lang="en"><![CDATA[Site Title]]></title>
</site>

Thank you Petah
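
If subclassing is not an option, the same technique can be wrapped in a plain function. This is only a minimal sketch: the helper name addChildWithCData and the sample text are made up for illustration and are not part of the original answer. It relies on the same two calls used above, dom_import_simplexml() and createCDATASection().

```php
<?php
// Hypothetical helper (not from the original answer): appends a child
// element whose content is a single CDATA section.
function addChildWithCData(SimpleXMLElement $parent, string $name, string $text): SimpleXMLElement
{
    $child = $parent->addChild($name);       // creates an empty child element
    $node  = dom_import_simplexml($child);   // same node, viewed through DOM
    $node->appendChild($node->ownerDocument->createCDATASection($text));

    return $child;
}

$site  = new SimpleXMLElement('<site/>');
$title = addChildWithCData($site, 'title', 'Fish & Chips <b>daily</b>');
$title->addAttribute('lang', 'en');

// The special characters are written unescaped inside the CDATA section.
echo $site->asXML();
```

Because the helper returns the new child, attributes can be added to it afterwards just as in the class-based version.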

PHP: How to handle <![CDATA[ with SimpleXMLElement?

You're probably not accessing it correctly. You can output it directly or cast it to a string. (In this example, the cast is superfluous, as echo does it automatically anyway.)

$content = simplexml_load_string(
    '<content><![CDATA[Hello, world!]]></content>'
);
echo (string) $content;

// or with parent element:

$foo = simplexml_load_string(
    '<foo><content><![CDATA[Hello, world!]]></content></foo>'
);
echo (string) $foo->content;

You might have better luck with LIBXML_NOCDATA:

$content = simplexml_load_string(
    '<content><![CDATA[Hello, world!]]></content>',
    null,
    LIBXML_NOCDATA
);

Edit XML with CDATA using SimpleXMLElement

The XML written out contains no CDATA nodes, because you told SimpleXML to get rid of them when you passed LIBXML_NOCDATA. To keep them, simply don't pass that option!

Here's the fixed version:

<?php
$str_xml = "<myxml><mytag><![CDATA[In this content, 8 > 2 & 1 < 9, and 10 is a 10% of 100. ]]></mytag></myxml>";

try {
    echo "XML with CDATA: \n\n";
    $xml = simplexml_load_string($str_xml);
    echo $xml->asXML() ."\n\n";
} catch(Exception $e){
    echo "XML with CDATA: \n\n";
    echo $e->getMessage() ."\n\n";
}

All the answers and comments you have read telling you you need to pass that option in order to use CDATA nodes with SimpleXML are quite simply wrong. The only problem is that the output of print_r, var_dump, etc, doesn't give a full representation of the data accessible by SimpleXML; that doesn't mean it's not there.

To get at the text in your example XML, you just need to cast the element to string (some contexts, such as an echo statement, do this automatically). As in this example:

$xml = simplexml_load_string($str_xml);
$tag_content = (string)$xml->mytag;
echo "Here is the content of the <mytag> node: $tag_content";

How to get CDATA using SimpleXMLElement

You need to explicitly cast the DATA element to string:

$xml = <<<EOF
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE WEBCNPDATA [<!ELEMENT WEBCNPDATA (SUCCESS,COUNT,SEARCHID,REPORTID,DATA)><!ELEMENT SUCCESS (#PCDATA)><!ELEMENT COUNT (#PCDATA)><!ELEMENT SEARCHID (#PCDATA)><!ELEMENT REPORTID (#PCDATA)><!ELEMENT DATA (#PCDATA)><!ATTLIST DATA ENCODING CDATA #FIXED "base64"><!ATTLIST DATA COMPRESSION CDATA #FIXED "gzip">]><WEBCNPDATA><SUCCESS>true</SUCCESS><COUNT>9</COUNT><SEARCHID>11</SEARCHID><REPORTID>1</REPORTID><DATA ENCODING="base64" COMPRESSION="gzip"><![CDATA[H4sIAMsHBlQAA71YTW8cNwy951cYPmsGIiXq47gF0l5q1+gE2aPh2ttgiwYObDct9tf3kZpdb9KL
5EMNQyONl29F6vGR8sPdy93z7mX/4J73Lzt9/PXbH7t7fbFc3W5sOOi42GDT68enz8vu5TR9eQKI
+3j1Xg3a49Cey/pYl0fD10Uz3fzwIyxtPNhjaWNbHK1O82b08+YaRjYe7LG0sS2ORqd5M7r++CuM
bDzYY2ljWxyNTvNmtFW/tubVVn3amkfbV3+233rz4YN6o+PBHksb2+LkzXHejJYWvWWN3tKit6zR
]]></DATA></WEBCNPDATA>
EOF;

$document = simplexml_load_string($xml);
// Cast to string
$data = (string) $document->DATA;
var_dump($data);

Output:

string(308) "H4sIAMsHBlQAA71YTW8cNwy951cYPmsGIiXq47gF0l5q1+gE2aPh2ttgiwYObDct9tf3kZpdb9KL
5EMNQyONl29F6vGR8sPdy93z7mX/4J73Lzt9/PXbH7t7fbFc3W5sOOi42GDT68enz8vu5TR9eQKI
+3j1Xg3a49Cey/pYl0fD10Uz3fzwIyxtPNhjaWNbHK1O82b08+YaRjYe7LG0sS2ORqd5M7r++CuM
bDzYY2ljWxyNTvNmtFW/tubVVn3amkfbV3+233rz4YN6o+PBHksb2+LkzXHejJYWvWWN3tKit6zR
"

Reading text in `<![CDATA[...]]>` with SimpleXMLElement

SimpleXML reads CDATA nodes absolutely fine. The only problem you're having is that print_r, var_dump, and similar functions don't give an accurate representation of SimpleXML objects, because they are not implemented fully in PHP.

If you run echo $myNode->description you will see the content of the CDATA section just fine. The reason is that when you ask for a SimpleXMLElement to be converted to a string, it automatically combines all the text and CDATA content for you - but until you do, it remembers the distinction.

As a general case, to extract the string content of any element or attribute in SimpleXML, cast to string with (string)$myNode. This also prevents other issues, such as functions complaining about getting an object when they were expecting a string, or failure to serialize when saving to a session.
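
A small sketch of that cast, using a made-up document (the element names item and description and the sample text are assumptions for illustration only). It mixes ordinary text with a CDATA section to show that the cast combines both into one string:

```php
<?php
$item = simplexml_load_string(
    '<item><description>Intro: <![CDATA[<b>bold</b> & more]]></description></item>'
);

// Without the cast you are still holding a SimpleXMLElement object.
var_dump($item->description instanceof SimpleXMLElement);   // bool(true)

// With the cast, the text and CDATA content are combined into a plain string.
$description = (string) $item->description;
var_dump(is_string($description));                          // bool(true)
var_dump($description === 'Intro: <b>bold</b> & more');     // bool(true)
```

Doing the cast once, at the point where the value leaves SimpleXML, means the rest of the code only ever deals with ordinary strings.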

See also my previous answer at https://stackoverflow.com/a/13830559/157957

simplexml editing CDATA node

SimpleXML does not expose CDATA sections as separate nodes. You can either leave them as CDATA when parsing (the default) or have the parser convert them to text nodes with LIBXML_NOCDATA (see: read cdata from a rss feed). Either way, when you read an element's content they behave like standard text values and are merged with any adjacent text nodes.

More control is offered by the Document Object Model (DOM), which provides DOMCdataSection, a class that extends DOMText, the standard text node model.

Even though this is a different PHP library (DOM vs. SimpleXML), the two are compatible with each other. For example, a SimpleXMLElement can be converted into a DOMElement with the dom_import_simplexml function.
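
As a rough sketch of that approach, the following imports a SimpleXML element into DOM, looks for DOMCdataSection children, and replaces their text. The sample document and the element names post and body are invented for illustration; a real document may need a more selective check than "every CDATA child".

```php
<?php
$xml = simplexml_load_string(
    '<post><body><![CDATA[old <b>text</b>]]></body></post>'
);

// View the <body> element through DOM to see its individual child nodes.
$bodyNode = dom_import_simplexml($xml->body);

foreach ($bodyNode->childNodes as $child) {
    // Only touch CDATA sections; ordinary text nodes are left alone.
    if ($child instanceof DOMCdataSection) {
        $child->data = 'new <i>text</i>';
    }
}

// SimpleXML and DOM share the same underlying document, so the change
// is visible from the SimpleXML side as well.
echo $xml->asXML();
```

Since only the data of the existing node is changed, the content stays wrapped in a CDATA section when the document is written out again.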

If you post the code you've written so far, it should be easy to figure out how to access the CDATA sections you want to modify. Please also provide some sample XML data so the example is more meaningful.

PHP, SimpleXML, decoding entities in CDATA

The purpose of CDATA sections in XML is to encapsulate a block of text "as is" which would otherwise require special characters (in particular, >, < and &) to be escaped. A CDATA section containing the character & is the same as a normal text node containing &amp;.

If a parser were to offer to ignore this, and pretend all CDATA nodes were really just text nodes, it would instantly break as soon as someone mentioned "P&O Cruises": that & simply can't be there on its own (rather than as &amp;, or &somethingElse;).

The LIBXML_NOCDATA option is actually pretty useless with SimpleXML, because (string)$foo neatly combines any sequence of text and CDATA nodes into an ordinary PHP string. (Something which people frequently fail to notice, because print_r doesn't.) This isn't necessarily true of more systematic access methods, such as DOM, where you can manipulate text nodes and CDATA nodes as objects in their own right.

What it effectively does is go through the document, and wherever it encounters a CDATA section, it takes the content, escapes it, and puts it back as an ordinary text node, or "merges" it with any text nodes to either side. The text represented is identical, just stored in the document in a different way; you can see the difference if you export back to XML, as in this example:

$xml_string = "<person><name>Welcome aboard this <![CDATA[P&O Cruises]]> voyage!</name></person>";

$person = new SimpleXMLElement($xml_string);
echo 'CDATA retained: ', $person->asXML();
// CDATA retained: <?xml version="1.0"?>
// <person><name>Welcome aboard this <![CDATA[P&O Cruises]]> voyage!</name></person>

$person = new SimpleXMLElement($xml_string, LIBXML_NOCDATA);
echo 'CDATA merged: ', $person->asXML();
// CDATA merged: <?xml version="1.0"?>
// <person><name>Welcome aboard this P&amp;O Cruises voyage!</name></person>

If the XML document you're parsing contains a CDATA section which actually contains entities, you need to take that string and unescape it completely independent of the XML. One common reason to do this (other than laziness with poorly understood libraries) is to treat something marked up in HTML as just any old string inside an XML document, like this:

<Comment>
<SubmittedBy>IMSoP</SubmittedBy>
<Text><![CDATA[I'm <em>really</em> bad at keeping my answers brief <tt>;)</tt>]]></Text>
</Comment>
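
For the case where the CDATA content really does contain entities, a minimal sketch of the separate unescaping step might look like this. The sample text is invented, and html_entity_decode() with these flags is just one reasonable choice that assumes the embedded entities are HTML entities in UTF-8 text:

```php
<?php
$xml = <<<'XML'
<Comment>
  <SubmittedBy>IMSoP</SubmittedBy>
  <Text><![CDATA[Fish &amp; chips &lt;em&gt;daily&lt;/em&gt;]]></Text>
</Comment>
XML;

$comment = simplexml_load_string($xml);

// Step 1: XML level. Inside CDATA nothing is an entity, so the string
// still contains the literal characters "&amp;", "&lt;" and "&gt;".
$raw = (string) $comment->Text;

// Step 2: independent of XML. Decode the HTML entities in that string.
$decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $raw, "\n";      // Fish &amp; chips &lt;em&gt;daily&lt;/em&gt;
echo $decoded, "\n";  // Fish & chips <em>daily</em>
```

The two steps are deliberately kept apart: the XML parser has already done its job once the string is extracted, and the second decode is a property of the payload, not of the XML.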

