How to get values inside <![CDATA[values]]> using PHP DOM?
Working with PHP DOM is fairly straightforward, and is very similar to JavaScript's DOM.
Here are the important classes:
- DOMNode — The base class for anything that can be traversed inside an XML/HTML document, including text nodes, comment nodes, and CDATA nodes
- DOMElement — The base class for tags.
- DOMDocument — The base class for documents. Contains the methods to load/save XML, as well as normal DOM document methods (see below).
There are a few staple methods and properties:
- DOMDocument->load(): After creating a new DOMDocument, use this method on that object to load from a file.
- DOMDocument->getElementsByTagName(): This method returns a node list of all elements in the document with the given tag name. You can then iterate (foreach) over this list.
- DOMNode->childNodes: A node list of all children of a node. (Remember, a CDATA section is a node!)
- DOMNode->nodeType: Get the type of a node. CDATA nodes have type XML_CDATA_SECTION_NODE, which is a constant with the value 4.
- DOMNode->textContent: Get the text content of any node.
Note: Your CDATA sections are malformed. I don't know why there is an extra ]] in the first one, or an unclosed CDATA section at the end of the line, but I think it should simply be:
<![CDATA[Aghia Paraskevi, Skiatos, Greece]]>
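If you are unsure whether a CDATA section is well-formed, one option is to let libxml collect its parse errors and inspect them. The following is only a sketch: the helper name checkXml and the sample strings are made up for illustration, and the exact error messages depend on the libxml version.

```php
<?php
// Hypothetical helper: tries to load an XML string and returns whether it
// parsed, together with any libxml error messages.
function checkXml(string $xml): array
{
    // Collect errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return ['ok' => $ok, 'errors' => $messages];
}

// Well-formed CDATA section.
$good = checkXml('<Destination><![CDATA[Aghia Paraskevi, Skiatos, Greece]]></Destination>');
// Unclosed CDATA section (the closing ]]> is missing).
$bad = checkXml('<Destination><![CDATA[Aghia Paraskevi, Skiatos, Greece</Destination>');

var_dump($good['ok']);
var_dump($bad['ok']);
print_r($bad['errors']);
```

The unclosed section should fail to load, and the collected messages point at what the parser objected to.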
Putting this all together, we:
- Create a new document object and load the XML
- Get all Destination elements by tag name and iterate over the list
- Iterate over all child nodes of each Destination element
- Check if the node type is XML_CDATA_SECTION_NODE
- If it is, echo the textContent of that node.
Code:
$doc = new DOMDocument();
$doc->load('test.xml');

// All <Destination> elements in the document
$destinations = $doc->getElementsByTagName("Destination");
foreach ($destinations as $destination) {
    foreach ($destination->childNodes as $child) {
        // Only print the content of CDATA sections
        if ($child->nodeType == XML_CDATA_SECTION_NODE) {
            echo $child->textContent . "<br/>";
        }
    }
}
Result:
Aghia Paraskevi, Skiatos, Greece
Amettla, Spain
Amoliani, Greece
Boblingen, Germany
SimpleXML: handle CDATA tag presence in node value
As far as a parser like SimpleXML is concerned, the <![CDATA[ is not part of the text content of the XML element; it's just part of the serialization of that content. A similar confusion is discussed here: "PHP, SimpleXML, decoding entities in CDATA".
What you need to look at is the "inner XML" of that element, which is tricky in SimpleXML (->asXML() will give you the "outer XML", e.g. <Dest><![CDATA[some text...]]></Dest>).
Your best bet here is to use the DOM, which gives you more access to the detailed structure of the document rather than just trying to give you the content, so it distinguishes "text nodes" from "CDATA nodes". However, it's worth double-checking that you actually need this: for 99.9% of use cases, you shouldn't care whether somebody sent you <foo>bar &amp; baz</foo> or <foo><![CDATA[bar & baz]]></foo>, since by definition they represent the same string.
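To illustrate the point, here is a small sketch comparing the two serializations. The sample strings and the helper name firstChildType are assumptions made for this example, not part of the original question.

```php
<?php
$escapedXml = '<foo>bar &amp; baz</foo>';
$cdataXml   = '<foo><![CDATA[bar & baz]]></foo>';

// SimpleXML: casting to string yields the same text for both forms.
$escapedText = (string) simplexml_load_string($escapedXml);
$cdataText   = (string) simplexml_load_string($cdataXml);
var_dump($escapedText === $cdataText);

// Hypothetical helper: node type of the first child of the root element.
function firstChildType(string $xml): int
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    return $doc->documentElement->firstChild->nodeType;
}

// DOM: the two forms are told apart by node type.
$escapedType = firstChildType($escapedXml); // plain text node
$cdataType   = firstChildType($cdataXml);   // CDATA section node
var_dump($escapedType === XML_TEXT_NODE);
var_dump($cdataType === XML_CDATA_SECTION_NODE);
```

So if you only need the string, SimpleXML is enough; reach for DOM only when the distinction itself matters.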
Saving CDATA with saveXML in PHP
Let's say your test.xml file is:
<?xml version="1.0"?>
<Root>
    <FirstNode>
        <SomeNode>a</SomeNode>
    </FirstNode>
</Root>
You have two possibilities.
Either you want to add a CDATA section to SomeNode:
<SomeNode>
a
<![CDATA[whatever]]>
</SomeNode>
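Then you can do it like this:
$doc = new DOMDocument();
$doc->load('test.xml');
// Create the CDATA section and append it after the existing content
$cdata = $doc->createCDATASection('whatever');
$doc->getElementsByTagName("SomeNode")
    ->item(0)
    ->appendChild($cdata);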
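Or you want to replace the content of SomeNode:
<SomeNode>
<![CDATA[whatever]]>
</SomeNode>
Then you can achieve it like this:
$doc = new DOMDocument();
$doc->load('test.xml');
$cdata = $doc->createCDATASection('whatever');
$someNode = $doc->getElementsByTagName("SomeNode")
    ->item(0);
// Replace the existing text child ("a") with the CDATA section.
// Calling replaceChild() on the parent of SomeNode would remove the
// SomeNode element itself instead of its content.
$someNode->replaceChild($cdata, $someNode->firstChild);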
Parsing CDATA in PHP with DOMDocument
createCDATASection() creates an XML CDATA node; createElement() and createTextNode() create other node types.
You need to append it to your description element node:
$description = $newItem->appendChild($xml->createElement('description'));
$description->appendChild($xml->createCDATASection($_POST['description']));
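A self-contained version of the same idea might look like the sketch below. The item element and the literal description text are stand-ins for this example (the original snippet takes the text from $_POST['description']).

```php
<?php
$xml = new DOMDocument('1.0', 'UTF-8');

// Assumed wrapper element, standing in for $newItem in the snippet above.
$newItem = $xml->appendChild($xml->createElement('item'));

// Create <description> and put the raw text inside a CDATA section,
// so markup characters do not need to be escaped.
$description = $newItem->appendChild($xml->createElement('description'));
$description->appendChild($xml->createCDATASection('<b>Hello</b> & welcome'));

// Serialize only the <item> element (no XML declaration).
$output = $xml->saveXML($newItem);
echo $output;
```

Passing a node to saveXML() serializes just that node, which is handy for checking that the CDATA section ended up where you expected.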
Modify <![CDATA[]]> in PHP? (XML)
That is true for SimpleXML. CDATA sections are a special kind of text node. They are there to make embedded parts more readable for humans. SimpleXML does not really handle XML nodes, so you will have to let it convert them to standard text nodes.
If you have a JS or HTML fragment in XML, it is easier to read if special characters like < are not escaped. This is what CDATA sections are for (along with some backwards compatibility for browsers).
So to modify a CDATA section and keep it, you will have to use DOM. DOM actually knows about the different node types. Here is a small example:
$xml = '<link><![CDATA[https://google.de]]></link>';
$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
// text() matches CDATA sections as well as plain text nodes
foreach ($xpath->evaluate('//link/text()') as $linkValue) {
    $linkValue->data .= '?abc';
}
echo $document->saveXML();
Output:
<?xml version="1.0"?>
<link><![CDATA[https://google.de?abc]]></link>
How to change value of cdata inside a .xml file and then again save it using php
You change the text inside a CDATA section by setting the nodeValue of that CDATA node (DOMCdataSection in PHP):
$child->nodeValue = $change;
Output (excerpt & simplified):
...
<flip isText="true" xPos="600" yPos="470" openDelay="8" openDuration="2" tweenMethod="easeOut" tweenType="Elastic" action="link" url="http://activeden.net/">
<text id="name" ... color="0x802020"><![CDATA[changed ABCD]]></text>
</flip>
<flip isText="true" xPos="300" yPos="30" openDelay="2" openDuration="2" tweenMethod="easeOut" tweenType="Elastic">
<text font="Sansation_Regular" ... ><![CDATA[changed HAPPY]]></text>
</flip>
...
As for your second question, how to save the document: the method to save the XML document is DOMDocument::save:
$filename = '/path/to/file.xml';
$doc->save($filename);
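Both steps together could be sketched as follows. The XML string is a cut-down stand-in for the original file, and a temporary file is used instead of a real path; both are assumptions for this example.

```php
<?php
// Simplified stand-in for the original XML file.
$xml = '<flips><flip><text id="name"><![CDATA[ABCD]]></text></flip></flips>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// Find the CDATA children of every <text> element and change their value.
foreach ($doc->getElementsByTagName('text') as $text) {
    foreach ($text->childNodes as $child) {
        if ($child->nodeType === XML_CDATA_SECTION_NODE) {
            $child->nodeValue = 'changed ' . $child->nodeValue;
        }
    }
}

// Save to a temporary file, then read it back to inspect the result.
$filename = tempnam(sys_get_temp_dir(), 'cdata');
$doc->save($filename);
$saved = file_get_contents($filename);
unlink($filename);

echo $saved;
```

Because the node stays a DOMCdataSection, the saved file should still contain a CDATA section rather than escaped text.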
simpleXML get value from CDATA
In your simplexml_load_file() call, you need to pass the LIBXML_NOCDATA flag as the third parameter:
$url = "http://www.ss.lv/lv/real-estate/flats/riga/hand_over/rss/";
// The third argument (LIBXML_NOCDATA) merges CDATA sections into text nodes
$result = simplexml_load_file($url, 'SimpleXMLElement', LIBXML_NOCDATA);
foreach ($result->channel->item as $item) {
    $title = (string) $item->title;
    $desc = (string) $item->description;
    // The description holds an HTML fragment; parse it with DOM.
    // (The DOMDocument constructor takes a version and encoding, not content.)
    $dom = new DOMDocument();
    $dom->loadHTML($desc);
    $bold_tags = $dom->getElementsByTagName('b');
    foreach ($bold_tags as $b) {
        echo $b->nodeValue . '<br/>';
    }
}
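To see what the flag changes without fetching a remote feed, here is a small sketch using an inline string. The feed content is invented for illustration.

```php
<?php
// Made-up miniature feed with a CDATA title.
$rss = '<rss><channel><item><title><![CDATA[Tom & Jerry]]></title></item></channel></rss>';

$plain  = simplexml_load_string($rss);
$merged = simplexml_load_string($rss, 'SimpleXMLElement', LIBXML_NOCDATA);

// Casting to string gives the text in both cases.
$titlePlain  = (string) $plain->channel->item->title;
$titleMerged = (string) $merged->channel->item->title;
var_dump($titlePlain === $titleMerged);

// The serialized form differs: with LIBXML_NOCDATA the CDATA section
// has been converted into an ordinary (escaped) text node.
$xmlPlain  = $plain->channel->item->title->asXML();
$xmlMerged = $merged->channel->item->title->asXML();
echo $xmlPlain, "\n", $xmlMerged, "\n";
```

In other words, the flag mostly matters when you dump or convert the whole object (for example with print_r or json_encode); an explicit string cast works either way.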