How to Get Values Inside <![CDATA[values]]> Using PHP DOM

How to get values inside <![CDATA[values]]> using PHP DOM?

Working with PHP DOM is fairly straightforward, and is very similar to JavaScript's DOM.

Here are the important classes:

  • DOMNode: The base class for anything that can be traversed inside an XML/HTML document, including text nodes, comment nodes, and CDATA nodes.
  • DOMElement: The class for elements (tags); it extends DOMNode.
  • DOMDocument: The class for documents; it also extends DOMNode. Contains the methods to load/save XML, as well as the normal DOM document methods (see below).

There are a few staple methods and properties:

  • DOMDocument->load(): After creating a new DOMDocument, call this method on that object to load XML from a file (DOMDocument->loadXML() does the same from a string).
  • DOMDocument->getElementsByTagName(): Returns a DOMNodeList of all elements in the document with the given tag name. You can then iterate over this list with foreach.
  • DOMNode->childNodes: A node list of all children of a node. (Remember, a CDATA section is a node!)
  • DOMNode->nodeType: The type of a node. CDATA nodes have type XML_CDATA_SECTION_NODE, a constant with the value 4.
  • DOMNode->textContent: The text content of any node.

Note: Your CDATA sections are malformed. I don't know why there is an extra ]] in the first one, or an unclosed CDATA section at the end of the line, but I think it should simply be:

<![CDATA[Aghia Paraskevi, Skiatos, Greece]]>

Putting this all together we:

  1. Create a new document object and load the XML
  2. Get all Destination elements by tag name and iterate over the list
  3. Iterate over all child nodes of each Destination element
  4. Check if the node type is XML_CDATA_SECTION_NODE
  5. If it is, echo the textContent of that node.

Code:

$doc = new DOMDocument();
$doc->load('test.xml');
$destinations = $doc->getElementsByTagName("Destination");
foreach ($destinations as $destination) {
    // Look at every child node, not just child elements
    foreach ($destination->childNodes as $child) {
        // CDATA sections have nodeType XML_CDATA_SECTION_NODE (4)
        if ($child->nodeType === XML_CDATA_SECTION_NODE) {
            echo $child->textContent . "<br/>";
        }
    }
}

Result:

Aghia Paraskevi, Skiatos, Greece

Amettla, Spain

Amoliani, Greece

Boblingen, Germany
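Since test.xml itself is not shown, here is a self-contained sketch of the same loop that uses loadXML() on an inline string instead of load(). The <Destinations> wrapper and the exact layout are assumptions; only the Destination tag name and the values come from the answer above. It collects the CDATA text into an array instead of echoing HTML.

```php
<?php
// Hypothetical input: the real test.xml is not shown, so this wrapper
// structure is an assumption. Only <Destination> and the values are from the answer.
$xml = <<<XML
<?xml version="1.0"?>
<Destinations>
  <Destination><![CDATA[Aghia Paraskevi, Skiatos, Greece]]></Destination>
  <Destination><![CDATA[Amettla, Spain]]></Destination>
  <Destination><![CDATA[Amoliani, Greece]]></Destination>
  <Destination><![CDATA[Boblingen, Germany]]></Destination>
</Destinations>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml); // like load(), but reads from a string

$values = [];
foreach ($doc->getElementsByTagName('Destination') as $destination) {
    foreach ($destination->childNodes as $child) {
        // Keep only CDATA section nodes (nodeType 4)
        if ($child->nodeType === XML_CDATA_SECTION_NODE) {
            $values[] = $child->textContent;
        }
    }
}

echo implode("\n", $values), "\n";
```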

SimpleXML: handle CDATA tag presence in node value

As far as a parser like SimpleXML is concerned, the <![CDATA[ is not part of the text content of the XML element, it's just part of the serialization of that content. A similar confusion is discussed here: PHP, SimpleXML, decoding entities in CDATA

What you need to look at is the "inner XML" of that element, which is tricky in SimpleXML (->asXML() will give you the "outer XML", e.g. <Dest><![CDATA[some text...]]></Dest>).
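One way to get that inner XML while starting from SimpleXML is to switch to the DOM view of the same element with dom_import_simplexml() and serialize each child node. This is a sketch, assuming a hypothetical document built around the <Dest> example above.

```php
<?php
// Hypothetical input modelled on the <Dest> example above.
$sx = simplexml_load_string('<root><Dest><![CDATA[some text...]]></Dest></root>');

// Casting to string gives the text content: the CDATA markers are not part of it.
$text = (string) $sx->Dest;

// For the "inner XML", take the DOM view of the same element and
// serialize each child node separately.
$dest = dom_import_simplexml($sx->Dest);
$inner = '';
foreach ($dest->childNodes as $child) {
    $inner .= $dest->ownerDocument->saveXML($child);
}

echo $text, "\n";  // some text...
echo $inner, "\n"; // <![CDATA[some text...]]>
```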

Your best bet here is to use DOM, which gives you access to the detailed structure of the document rather than just its content, and therefore distinguishes "text nodes" from "CDATA nodes". However, it's worth double-checking that you actually need this: for 99.9% of use cases, you shouldn't care whether somebody sent you <foo>bar &amp; baz</foo> or <foo><![CDATA[bar & baz]]></foo>, since by definition they represent the same string.
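A short sketch of both points: DOM reports a different node class for the two serializations, but the string value is the same. Note that DOMCdataSection extends DOMText, so checking instanceof DOMText alone will not tell them apart; use nodeType or instanceof DOMCdataSection.

```php
<?php
// Two serializations of the same string "bar & baz".
$escaped = new DOMDocument();
$escaped->loadXML('<foo>bar &amp; baz</foo>');

$cdata = new DOMDocument();
$cdata->loadXML('<foo><![CDATA[bar & baz]]></foo>');

// DOM exposes the node type: a text node versus a CDATA section node.
$escapedChild = $escaped->documentElement->firstChild;
$cdataChild = $cdata->documentElement->firstChild;
echo get_class($escapedChild), ' / ', get_class($cdataChild), "\n"; // DOMText / DOMCdataSection

// ...but the string value is identical.
var_dump($escaped->documentElement->textContent === $cdata->documentElement->textContent); // bool(true)
```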

Saving CDATA with saveXML in PHP

Let's say your test.xml file is

<?xml version="1.0"?>
<Root>
    <FirstNode>
        <SomeNode>a</SomeNode>
    </FirstNode>
</Root>

You have two possibilities:

Either you want to add a CDATA section to SomeNode, giving:

<SomeNode>
a
<![CDATA[whatever]]>
</SomeNode>

Then you can do it like this:

$cdata = $doc->createCDATASection( 'whatever' );
$doc->getElementsByTagName("SomeNode")
->item(0)
->appendChild($cdata);

Or you want to replace the content of SomeNode with a CDATA section, giving:

<SomeNode>
<![CDATA[whatever]]>
</SomeNode>

Then you can achieve it like this:

$cdata = $doc->createCDATASection('whatever');
$someNode = $doc->getElementsByTagName("SomeNode")
    ->item(0);
// Replace the text child "a", not the SomeNode element itself
$someNode->replaceChild($cdata, $someNode->firstChild);
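Here is a self-contained sketch of both variants that inlines test.xml and prints the result with saveXML(). For the replace variant it swaps out SomeNode's text child rather than the SomeNode element, so the CDATA section ends up inside SomeNode as in the target output; this assumes SomeNode holds exactly one text node, as it does in test.xml.

```php
<?php
// Same content as test.xml above, inlined so the sketch runs on its own.
$xml = '<?xml version="1.0"?><Root><FirstNode><SomeNode>a</SomeNode></FirstNode></Root>';

// Variant 1: append a CDATA section after the existing text "a".
$doc = new DOMDocument();
$doc->loadXML($xml);
$someNode = $doc->getElementsByTagName('SomeNode')->item(0);
$someNode->appendChild($doc->createCDATASection('whatever'));
$appended = $doc->saveXML($doc->documentElement);

// Variant 2: replace the text "a" with a CDATA section.
// Assumes SomeNode has a single text child, as in test.xml.
$doc = new DOMDocument();
$doc->loadXML($xml);
$someNode = $doc->getElementsByTagName('SomeNode')->item(0);
$someNode->replaceChild($doc->createCDATASection('whatever'), $someNode->firstChild);
$replaced = $doc->saveXML($doc->documentElement);

echo $appended, "\n"; // <Root><FirstNode><SomeNode>a<![CDATA[whatever]]></SomeNode></FirstNode></Root>
echo $replaced, "\n"; // <Root><FirstNode><SomeNode><![CDATA[whatever]]></SomeNode></FirstNode></Root>
```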

Parsing CDATA in PHP with DOMDocument

createCDATASection() creates a CDATA section node; createElement() and createTextNode() create element and text nodes respectively.

You need to append it to your description element node:

$description = $newItem->appendChild($xml->createElement('description'));
$description->appendChild($xml->createCDATASection($_POST['description']));
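A self-contained sketch of the same two lines, with hypothetical stand-ins for $xml, $newItem and $_POST['description'] (the channel/item structure and the input string are assumptions). It shows that special characters inside the CDATA section are written out unescaped.

```php
<?php
// Hypothetical stand-ins for $xml, $newItem and $_POST['description'] above.
$xml = new DOMDocument('1.0', 'UTF-8');
$channel = $xml->appendChild($xml->createElement('channel'));
$newItem = $channel->appendChild($xml->createElement('item'));
$input = 'Fish & chips <b>today</b>';

// createElement() makes the element node; createCDATASection() makes the CDATA node inside it.
$description = $newItem->appendChild($xml->createElement('description'));
$description->appendChild($xml->createCDATASection($input));

echo $xml->saveXML($channel), "\n";
// <channel><item><description><![CDATA[Fish & chips <b>today</b>]]></description></item></channel>
```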

Modify <![CDATA[]]> in PHP? (XML)

That is true for SimpleXML. CDATA sections are a special kind of text node; they exist to make embedded parts more readable for humans. SimpleXML does not give you access to individual node types, so you have to let it convert CDATA sections into standard text nodes.

If you have a JS or HTML fragment in XML, it is easier to read if special characters like < are not escaped. That is what CDATA sections are for (along with some backwards compatibility for browsers).
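To see the SimpleXML side of this, here is a sketch that reads and then rewrites the link from the example below using SimpleXML (the <root> wrapper is an assumption). Reading works fine, but after assigning a new value the element holds a plain text node and the CDATA section is gone.

```php
<?php
$sx = simplexml_load_string('<root><link><![CDATA[https://google.de]]></link></root>');

// Reading works: the cast returns the text inside the CDATA section.
$url = (string) $sx->link;

// Writing through SimpleXML replaces the element's content with a plain text node,
// so the CDATA section does not survive this assignment.
$sx->link = $url . '?abc';

echo $sx->link->asXML(), "\n";
```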

So to modify a CDATA section and keep it, you will have to use DOM. DOM actually knows about the different node types. Here is a small example:

$xml = '<link><![CDATA[https://google.de]]></link>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// text() matches CDATA sections as well as ordinary text nodes
foreach ($xpath->evaluate('//link/text()') as $linkValue) {
    $linkValue->data .= '?abc';
}
echo $document->saveXML();

Output:

<?xml version="1.0"?>
<link><![CDATA[https://google.de?abc]]></link>

How to change value of cdata inside a .xml file and then again save it using php

You change the text inside a CDATA section by setting the nodeValue of that CDATA node (DOMCdataSection in PHP):

$child->nodeValue = $change;

Output (excerpt & simplified):

    ...
<flip isText="true" xPos="600" yPos="470" openDelay="8" openDuration="2" tweenMethod="easeOut" tweenType="Elastic" action="link" url="http://activeden.net/">
<text id="name" ... color="0x802020"><![CDATA[changed ABCD]]></text>
</flip>

<flip isText="true" xPos="300" yPos="30" openDelay="2" openDuration="2" tweenMethod="easeOut" tweenType="Elastic">
<text font="Sansation_Regular" ... ><![CDATA[changed HAPPY]]></text>
</flip>

...
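A self-contained sketch of the same idea. The input is a hypothetical, trimmed-down version of the file (the real file and most attributes are elided in the excerpt), the root element name flips is made up, and prefixing "changed " to the old text is only an assumption about what $change held.

```php
<?php
// Hypothetical, trimmed-down input: the original file and most attributes are elided above.
$xml = <<<XML
<flips>
  <flip isText="true"><text id="name"><![CDATA[ABCD]]></text></flip>
  <flip isText="true"><text><![CDATA[HAPPY]]></text></flip>
</flips>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

foreach ($doc->getElementsByTagName('text') as $text) {
    foreach ($text->childNodes as $child) {
        if ($child instanceof DOMCdataSection) {
            // Writing nodeValue changes the CDATA text; the node stays a CDATA section.
            $child->nodeValue = 'changed ' . $child->nodeValue;
        }
    }
}

echo $doc->saveXML($doc->documentElement), "\n";
```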

As for your second question, how to save the document: use DOMDocument::save():

$filename = '/path/to/file.xml';
$doc->save($filename);
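A small end-to-end check, assuming a temporary file in place of the /path/to/file.xml placeholder. save() returns the number of bytes written, or false on failure, so it is worth checking; the file is read back to confirm the CDATA section was written unchanged.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<text><![CDATA[changed ABCD]]></text>');

// Hypothetical target path: a temporary file instead of /path/to/file.xml.
$filename = tempnam(sys_get_temp_dir(), 'cdata');

// save() returns the number of bytes written, or false on failure.
$bytes = $doc->save($filename);
if ($bytes === false) {
    throw new RuntimeException("Could not write $filename");
}

// Read the file back and clean up.
$saved = file_get_contents($filename);
unlink($filename);

echo $saved;
```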

simpleXML get value from CDATA

In your simplexml_load_file() call, you need to pass the LIBXML_NOCDATA flag as the third parameter:

$url = "http://www.ss.lv/lv/real-estate/flats/riga/hand_over/rss/";
$result = simplexml_load_file($url, 'SimpleXMLElement', LIBXML_NOCDATA);
// ^^ here
foreach ($result->channel->item as $item) {
    $title = (string) $item->title;
    $desc = (string) $item->description;
    // The description holds HTML, so parse it with loadHTML()
    $dom = new DOMDocument();
    $dom->loadHTML($desc);
    $bold_tags = $dom->getElementsByTagName('b');
    foreach ($bold_tags as $b) {
        echo $b->nodeValue . '<br/>';
    }
}
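Because the feed URL above may not be reachable, here is an offline sketch using simplexml_load_string() with the same flag. The one-item feed and its description are hypothetical; the bold texts are collected into an array instead of being echoed.

```php
<?php
// Hypothetical stand-in for the RSS feed: one item whose description holds HTML in CDATA.
$rss = <<<XML
<rss version="2.0"><channel><item>
<description><![CDATA[<b>3 rooms</b>, <b>75 m2</b>]]></description>
</item></channel></rss>
XML;

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes while parsing.
$result = simplexml_load_string($rss, 'SimpleXMLElement', LIBXML_NOCDATA);

$bold = [];
foreach ($result->channel->item as $item) {
    $desc = (string) $item->description;
    $dom = new DOMDocument();
    $dom->loadHTML($desc);
    foreach ($dom->getElementsByTagName('b') as $b) {
        $bold[] = $b->nodeValue;
    }
}

echo implode("\n", $bold), "\n";
```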


Related Topics



Leave a reply



Submit