DOMDocument in PHP

If you want to work with the DOM, you have to understand the concept first. Everything in a DOM document, including the DOMDocument itself, is a node.

The DOMDocument is a hierarchical tree structure of nodes. It starts with a root node. That root node can have child nodes, and all these child nodes can have child nodes of their own. Basically everything in a DOMDocument is a node type of some sort, be it elements, attributes or text content.

          HTML                               Legend:
         /    \                        UPPERCASE = DOMElement
       HEAD  BODY                      lowercase = DOMAttr
      /          \                     "Quoted"  = DOMText
    TITLE        DIV - class - "header"
     |             \
"The Title"        H1
                    |
           "Welcome to Nodeville"

The diagram above shows a DOMDocument with some nodes. There is a root element (HTML) with two children (HEAD and BODY). The connecting lines are called axes. If you follow down the axis to the TITLE element, you will see that it has one DOMText leaf. This is important because it illustrates an often overlooked thing:

<title>The Title</title>

is not one node but two: a DOMElement with a DOMText child. Likewise, this

<div class="header">

is really three nodes: the DOMElement with a DOMAttr holding a DOMText. Because all these inherit their properties and methods from DOMNode, it is essential to familiarize yourself with the DOMNode class.
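To make the three-node claim concrete, here is a minimal sketch that loads a made-up fragment modelled on the DIV from the diagram and prints the class of each node. The markup and variable names are assumptions for illustration only.

```php
<?php
// Hypothetical fragment matching the DIV from the diagram above.
$dom = new DOMDocument();
$dom->loadXML('<div class="header"><h1>Welcome to Nodeville</h1></div>');

$div  = $dom->documentElement;            // the element node
$attr = $div->getAttributeNode('class');  // the attribute node
$text = $attr->firstChild;                // the text node held by the attribute

foreach ([$dom, $div, $attr, $text] as $node) {
    // Print the concrete class and whether it is a DOMNode.
    printf("%-12s is a DOMNode: %s\n", get_class($node), $node instanceof DOMNode ? 'yes' : 'no');
}

echo $text->nodeValue, "\n";
```

Because every one of these objects is a DOMNode, properties such as nodeName, nodeValue, parentNode and childNodes are available on all of them.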

In practice, this means the DIV you fetched is linked to all the other nodes in the document. You could go all the way to the root element or down to the leaves at any time. It's all there. You just have to query or traverse the document for the wanted information.

Whether you do that by iterating the childNodes of the DIV, by using getElementsByTagName(), or with XPath is up to you. You just have to understand that you are not working with raw HTML, but with nodes representing that entire HTML document.
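The three options can be sketched side by side. The HTML string below is a made-up document modelled on the diagram; nothing in it comes from a real page.

```php
<?php
// Hypothetical document modelled on the diagram above.
$html = '<html><head><title>The Title</title></head>'
      . '<body><div class="header"><h1>Welcome to Nodeville</h1></div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$div = $dom->getElementsByTagName('div')->item(0);

// Up: follow parentNode until we leave the element tree.
$path = [];
for ($node = $div; $node instanceof DOMElement; $node = $node->parentNode) {
    $path[] = $node->nodeName;
}
echo implode(' < ', $path), "\n";   // div < body < html

// Down: iterate the direct children of the DIV.
foreach ($div->childNodes as $child) {
    echo $child->nodeName, "\n";    // h1
}

// Query: an XPath expression evaluated relative to the DIV.
$xpath = new DOMXPath($dom);
$heading = $xpath->evaluate('string(h1)', $div);
echo $heading, "\n";                // Welcome to Nodeville
```

Which style reads best depends on the task: childNodes for local structure, getElementsByTagName() for simple lookups, XPath for anything conditional.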

If you need help with extracting specific information from the document, you need to clarify what information you want to fetch from it. For instance, you could ask how to fetch all the links from the table and then we could answer something like:

$div = $dom->getElementById('showContent');
foreach ($div->getElementsByTagName('a') as $link) {
    echo $dom->saveXML($link);
}

But unless you are more specific, we can only guess which nodes might be relevant.

If you need more examples and code snippets on how to work with DOM, browse through my previous answers to related questions:

  • https://stackoverflow.com/search?q=user%3A208809+DOM

By now, there should be a snippet for every basic to medium use case you might have with DOM.

PHP DOMDocument - how to add Namespace declaration?

Use DOMDocument::createElementNS() like this:

$root = $dom->createElementNS("http://www.sitemaps.org/schemas/sitemap/0.9", "urlset");
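For context, a slightly fuller sketch follows. The child elements and the example.com URL are assumptions added for illustration; only the urlset line comes from the answer above.

```php
<?php
$ns  = 'http://www.sitemaps.org/schemas/sitemap/0.9';
$dom = new DOMDocument('1.0', 'UTF-8');

// The namespace declaration is emitted on the first element that needs it.
$root = $dom->appendChild($dom->createElementNS($ns, 'urlset'));

// Children created in the same namespace reuse the declaration on the root.
$url = $root->appendChild($dom->createElementNS($ns, 'url'));
$loc = $url->appendChild($dom->createElementNS($ns, 'loc'));
$loc->appendChild($dom->createTextNode('https://example.com/'));  // placeholder URL

$xml = $dom->saveXML();
echo $xml;
```

Creating the children with plain createElement() instead would put them in no namespace, which is usually not what a sitemap consumer expects.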

PHP DOMDocument: what is the nicest way to safely add text to an element

You will have to create the text node and append it. I described the problem in this answer: https://stackoverflow.com/a/22957785/2265374
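Without any subclassing, the create-and-append approach looks roughly like this. The element names are placeholders.

```php
<?php
$dom  = new DOMDocument();
$root = $dom->appendChild($dom->createElement('foo'));

// Create the element empty, then append the content as a text node.
// The text node is escaped when the document is serialized.
$bar = $root->appendChild($dom->createElement('bar'));
$bar->appendChild($dom->createTextNode('Company & Son'));

$out = $dom->saveXML($root);
echo $out, "\n";   // <foo><bar>Company &amp; Son</bar></foo>
```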

However, you can extend DOMDocument and override the createElement*() methods.

class MyDOMDocument extends DOMDocument {

    public function createElement($name, $content = '') {
        $node = parent::createElement($name);
        if ((string)$content !== '') {
            $node->appendChild($this->createTextNode($content));
        }
        return $node;
    }

    public function createElementNS($namespace, $name, $content = '') {
        $node = parent::createElementNS($namespace, $name);
        if ((string)$content !== '') {
            $node->appendChild($this->createTextNode($content));
        }
        return $node;
    }
}

$dom = new MyDOMDocument();
$root = $dom->appendChild($dom->createElement('foo'));
$root->appendChild($dom->createElement('bar', 'Company & Son'));
$root->appendChild($dom->createElementNS('urn:bar', 'bar', 'Company & Son'));

$dom->formatOutput = true;
echo $dom->saveXML();

Output:

<?xml version="1.0"?>
<foo>
  <bar>Company &amp; Son</bar>
  <bar xmlns="urn:bar">Company &amp; Son</bar>
</foo>

Read XML File with DOMDocument in php

The XML uses a namespace, so you should use the namespace-aware methods. Their names end with the suffix NS.

$tns = 'http://www.testgroup.com/TestPDM';
$document = new DOMDocument();
$document->loadXML($xml);
foreach ($document->getElementsByTagNameNS($tns, "pdmNumber") as $node) {
    var_dump($node->textContent);
}

Output:

string(6) "654321"

A better option is to use XPath expressions. They allow more comfortable access to DOM nodes. In this case you have to register a prefix for the namespace so that you can use it in the XPath expression:

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
$xpath->registerNamespace('t', 'http://www.testgroup.com/TestPDM');

var_dump(
    $xpath->evaluate('string(/t:getPDMNumber/t:getPDMNumberResponse/t:pdmNumber)')
);
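Since $xml is not shown here, the following self-contained sketch uses a made-up payload. Only the element names and the namespace URI are taken from the snippets above; the exact document layout is an assumption.

```php
<?php
// Hypothetical payload; the real document may look different.
$xml = <<<'XML'
<getPDMNumber xmlns="http://www.testgroup.com/TestPDM">
  <getPDMNumberResponse>
    <pdmNumber>654321</pdmNumber>
  </getPDMNumberResponse>
</getPDMNumber>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

$xpath = new DOMXPath($document);
$xpath->registerNamespace('t', 'http://www.testgroup.com/TestPDM');

// With the registered prefix the node is found.
$withPrefix = $xpath->evaluate('string(//t:pdmNumber)');
// Without a prefix, XPath 1.0 only matches elements in no namespace.
$withoutPrefix = $xpath->evaluate('string(//pdmNumber)');

var_dump($withPrefix);     // string(6) "654321"
var_dump($withoutPrefix);  // string(0) ""
```

The prefix registered on the DOMXPath object is independent of whatever prefix (or default namespace) the document itself uses; only the namespace URI has to match.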

DomDocument and special characters

Solution:

$oDom = new DOMDocument();
$oDom->encoding = 'utf-8';
$oDom->loadHTML( utf8_decode( $sString ) ); // important!

$sHtml = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">';
$sHtml .= $oDom->saveHTML( $oDom->documentElement ); // important!

The saveHTML() method behaves differently when you pass it a node. You can pass the root element ($oDom->documentElement) and prepend the desired !DOCTYPE manually. The other important part is utf8_decode(), which converts the UTF-8 input to ISO-8859-1, the encoding loadHTML() falls back to when the markup declares no charset. In my case, none of the other properties and methods of the DOMDocument class produced the desired result.
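Note that utf8_decode() is deprecated as of PHP 8.2. A rough equivalent of the loading step, assuming the mbstring extension is available and the source file is saved as UTF-8, could look like this. The sample string is made up.

```php
<?php
// Made-up UTF-8 input without any charset declaration.
$sString = '<p>Füße, Café, Æther</p>';

$oDom = new DOMDocument();
// Same conversion utf8_decode() performs: UTF-8 to ISO-8859-1.
// Characters outside ISO-8859-1 would be lost by this step.
$oDom->loadHTML(mb_convert_encoding($sString, 'ISO-8859-1', 'UTF-8'));

// Strings read back from the DOM are UTF-8.
$text = $oDom->getElementsByTagName('p')->item(0)->textContent;
var_dump($text === 'Füße, Café, Æther');
```

If the content contains characters beyond ISO-8859-1, this conversion is not sufficient and a different strategy (for example, declaring the charset in the markup itself) would be needed.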

Using DOMDocument, is it possible to get all elements that exists within a certain DOM?

You can pass an asterisk (*) to getElementsByTagName(), which then returns all elements:

foreach ($dom->getElementsByTagName('*') as $element) {
    // work with each $element here
}

From the Manual:

name

The local name (without namespace) of the tag to match on. The special value * matches all tags.
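A small runnable sketch of the wildcard, using a made-up four-element document:

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<root><a><b/></a><c/></root>');  // made-up sample

$names = [];
foreach ($dom->getElementsByTagName('*') as $element) {
    $names[] = $element->nodeName;
}

// Elements come back in document order, root element included.
echo implode(', ', $names), "\n";   // root, a, b, c
```

Calling getElementsByTagName('*') on a DOMElement instead of the document restricts the result to that element's descendants.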

How to add text node in new created element with PHP DOMDocument

You can set the content using the textContent property, which DOMElement inherits from DOMNode.

$newElement = $finalDom->createElement("script");
$newElement->setAttribute("src", "https://stackoverflow.com/");

$newElement->textContent = 'alert("ok");';
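As a self-contained sketch (the body wrapper and the script text are assumptions), assigning to textContent replaces any existing children with a single text node, and special characters are escaped on XML serialization:

```php
<?php
$finalDom = new DOMDocument();
$body = $finalDom->appendChild($finalDom->createElement('body'));

$newElement = $finalDom->createElement('script');
// No manual escaping needed: the value is stored as a text node.
$newElement->textContent = 'if (a < b && c) alert("ok");';
$body->appendChild($newElement);

$out = $finalDom->saveXML($body);
echo $out, "\n";
// <body><script>if (a &lt; b &amp;&amp; c) alert("ok");</script></body>
```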

Get all the content from an element using PHP DOMDocument

You need the text content of the node:

$snippet .= $firstp->textContent;

PHP Docs: php.net DOMNode->textContent
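To illustrate what textContent returns for nested markup, here is a minimal sketch with a made-up paragraph. The saveXML() call is shown only for contrast, for the case where the markup itself is wanted.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<p>Hello <b>brave <i>new</i></b> world</p>');  // made-up sample
$firstp = $dom->getElementsByTagName('p')->item(0);

// textContent concatenates the text of all descendants, dropping the tags.
$snippet = $firstp->textContent;
echo $snippet, "\n";                 // Hello brave new world

// Serializing the node keeps the markup.
echo $dom->saveXML($firstp), "\n";   // <p>Hello <b>brave <i>new</i></b> world</p>
```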

Add element from one DOMDocument to another one PHP

You cannot import a DOMDocument directly; go one level deeper in the hierarchy and import its documentElement property instead (which gives you a DOMElement):

$import = $legacyDomDocument->importNode($legacyDomDocument2->documentElement, true);

BTW, looks like it's appropriate to use documentElement when appending too:

$legacyDomDocument->documentElement->appendChild($import);

Note the difference when you drop the intermediary documentElement: the second document's contents are basically inserted adjacent to the root node, and that's hardly the desired outcome.
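A minimal sketch of the import-then-append sequence, with two made-up documents standing in for the originals:

```php
<?php
// Both documents are hypothetical stand-ins.
$legacyDomDocument = new DOMDocument();
$legacyDomDocument->loadXML('<catalog><item id="1"/></catalog>');

$legacyDomDocument2 = new DOMDocument();
$legacyDomDocument2->loadXML('<item id="2"><name>Imported</name></item>');

// Deep-copy the second document's root element into the first document...
$import = $legacyDomDocument->importNode($legacyDomDocument2->documentElement, true);
// ...and attach the copy below the first document's root element.
$legacyDomDocument->documentElement->appendChild($import);

$out = $legacyDomDocument->saveXML($legacyDomDocument->documentElement);
echo $out, "\n";
// <catalog><item id="1"/><item id="2"><name>Imported</name></item></catalog>
```

importNode() returns a copy owned by the target document; the second document is left untouched.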
