DOMDocument in PHP
If you want to work with DOM, you first have to understand the concept: everything in a DOM document, including the DOMDocument itself, is a node.
The DOMDocument is a hierarchical tree structure of nodes. It starts with a root node. That root node can have child nodes, and all these child nodes can have child nodes of their own. Basically everything in a DOMDocument is a node of some type, be it an element, an attribute or text content.
            HTML                        Legend:
           /    \                       UPPERCASE = DOMElement
        HEAD    BODY                    lowercase = DOMAttr
         /         \                    "Quoted"  = DOMText
     TITLE          DIV - class - "header"
       |              \
  "The Title"          H1
                        |
              "Welcome to Nodeville"
The diagram above shows a DOMDocument with some nodes. There is a root element (HTML) with two children (HEAD and BODY). The connecting lines are called axes. If you follow down the axis to the TITLE element, you will see that it has one DOMText leaf. This is important because it illustrates an often overlooked thing:
<title>The Title</title>
is not one, but two nodes. A DOMElement with a DOMText child. Likewise, this
<div class="header">
is really three nodes: the DOMElement with a DOMAttr holding a DOMText. Because all these inherit their properties and methods from DOMNode, it is essential to familiarize yourself with the DOMNode class.
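The node counts above can be checked with a few lines of PHP. This is only a sketch: the markup is a hypothetical minimal document modelled on the diagram, and it is loaded as XML so that the parser adds nothing of its own.

```php
<?php
// Hypothetical document mirroring the diagram above.
$dom = new DOMDocument();
$dom->loadXML(
    '<html><head><title>The Title</title></head>'
    . '<body><div class="header"><h1>Welcome to Nodeville</h1></div></body></html>'
);

// <title>The Title</title>: an element node with one text node child.
$title = $dom->getElementsByTagName('title')->item(0);
$text  = $title->firstChild;
var_dump($title instanceof DOMElement);
var_dump($text instanceof DOMText);
var_dump($text->nodeValue);

// <div class="header">: an element node carrying an attribute node.
$div  = $dom->getElementsByTagName('div')->item(0);
$attr = $div->getAttributeNode('class');
var_dump($attr instanceof DOMAttr);
var_dump($attr->value);

// All of them share the DOMNode API.
var_dump($title instanceof DOMNode, $text instanceof DOMNode, $attr instanceof DOMNode);
```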
In practice, this means the DIV you fetched is linked to all the other nodes in the document. You could go all the way to the root element or down to the leaves at any time. It's all there. You just have to query or traverse the document for the wanted information.
Whether you do that by iterating the childNodes of the DIV, by using getElementsByTagName() or by using XPath is up to you. You just have to understand that you are not working with raw HTML, but with nodes representing that entire HTML document.
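As a rough illustration, here are the three approaches side by side, each reaching the H1 below the DIV. The markup is a made-up sample based on the diagram, not the document from the original question.

```php
<?php
// Hypothetical sample document.
$dom = new DOMDocument();
$dom->loadHTML(
    '<html><head><title>The Title</title></head>'
    . '<body><div class="header"><h1>Welcome to Nodeville</h1></div></body></html>'
);

$div = $dom->getElementsByTagName('div')->item(0);

// 1. Iterate the child nodes of the DIV and pick the element nodes.
$fromChildren = null;
foreach ($div->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $fromChildren = $child->textContent;
    }
}

// 2. Search the DIV's descendants by tag name.
$fromTagName = $div->getElementsByTagName('h1')->item(0)->textContent;

// 3. Evaluate an XPath expression with the DIV as the context node.
$xpath     = new DOMXPath($dom);
$fromXPath = $xpath->evaluate('string(h1)', $div);

// The DIV stays linked to the rest of the tree: walk up to the root element.
$rootName = $div->parentNode->parentNode->nodeName;

echo $fromChildren, "\n", $fromTagName, "\n", $fromXPath, "\n", $rootName, "\n";
```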
If you need help with extracting specific information from the document, you need to clarify what information you want to fetch from it. For instance, you could ask how to fetch all the links from the table and then we could answer something like:
$div = $dom->getElementById('showContent');
foreach ($div->getElementsByTagName('a') as $link) {
    echo $dom->saveXML($link);
}
But unless you are more specific, we can only guess which nodes might be relevant.
If you need more examples and code snippets on how to work with DOM, browse through my previous answers to related questions:
- https://stackoverflow.com/search?q=user%3A208809+DOM
By now, there should be a snippet for every basic to medium use case you might have with DOM.
PHP DOMDocument - how to add Namespace declaration?
Use DOMDocument::createElementNS() like this:
$root = $dom->createElementNS("http://www.sitemaps.org/schemas/sitemap/0.9", "urlset");
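A slightly fuller sketch of the same call in context. The child elements and the example.com URL are illustrative assumptions; the point is that the namespace declaration is written once, on the root element, as long as the children are created in the same namespace.

```php
<?php
$ns  = 'http://www.sitemaps.org/schemas/sitemap/0.9';
$dom = new DOMDocument('1.0', 'UTF-8');

// The root element carries the xmlns declaration.
$root = $dom->appendChild($dom->createElementNS($ns, 'urlset'));

// Children in the same namespace do not repeat the declaration.
$url = $root->appendChild($dom->createElementNS($ns, 'url'));
$loc = $url->appendChild($dom->createElementNS($ns, 'loc'));
$loc->appendChild($dom->createTextNode('https://example.com/'));

$dom->formatOutput = true;
$output = $dom->saveXML();
echo $output;
```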
PHP DOMDocument: what is the nicest way to safely add text to an element
You will have to create the text node and append it. I described the problem in this answer: https://stackoverflow.com/a/22957785/2265374
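A minimal sketch of that manual approach, with made-up element names: create the element without content, then append a text node, which is escaped when the document is serialized.

```php
<?php
$dom  = new DOMDocument();
$root = $dom->appendChild($dom->createElement('foo'));

// Create the element empty, then add the text as a separate node.
$bar = $root->appendChild($dom->createElement('bar'));
$bar->appendChild($dom->createTextNode('Company & Son'));

$output = $dom->saveXML($root);
echo $output, "\n"; // <foo><bar>Company &amp; Son</bar></foo>
```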
However, you can extend DOMDocument and override the createElement*() methods.
class MyDOMDocument extends DOMDocument {

    public function createElement($name, $content = '') {
        $node = parent::createElement($name);
        if ((string)$content !== '') {
            // Append the content as a text node so it is escaped properly.
            $node->appendChild($this->createTextNode($content));
        }
        return $node;
    }

    public function createElementNS($namespace, $name, $content = '') {
        $node = parent::createElementNS($namespace, $name);
        if ((string)$content !== '') {
            $node->appendChild($this->createTextNode($content));
        }
        return $node;
    }
}

$dom = new MyDOMDocument();
$root = $dom->appendChild($dom->createElement('foo'));
$root->appendChild($dom->createElement('bar', 'Company & Son'));
$root->appendChild($dom->createElementNS('urn:bar', 'bar', 'Company & Son'));
$dom->formatOutput = TRUE;
echo $dom->saveXml();
Output:
<?xml version="1.0"?>
<foo>
  <bar>Company &amp; Son</bar>
  <bar xmlns="urn:bar">Company &amp; Son</bar>
</foo>
Read XML File with DOMDocument in php
The XML uses a namespace, so you should use the namespace-aware methods. They have the suffix NS.
$tns = 'http://www.testgroup.com/TestPDM';
$document = new DOMDocument();
$document->loadXml($xml);
foreach ($document->getElementsByTagNameNS($tns, "pdmNumber") as $node) {
    var_dump($node->textContent);
}
Output:
string(6) "654321"
A better option is to use XPath expressions. They allow more comfortable access to DOM nodes. In this case you have to register a prefix for the namespace that you can then use in the XPath expression:
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('t', 'http://www.testgroup.com/TestPDM');
var_dump(
    $xpath->evaluate('string(/t:getPDMNumber/t:getPDMNumberResponse/t:pdmNumber)')
);
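Both snippets rely on an $xml variable that is not shown here. The following self-contained sketch runs the two approaches against a guessed sample: the XML string is an assumption reconstructed from the XPath expression and the output above, not the original file.

```php
<?php
// Assumed sample input; the real file from the question is not shown.
$xml = <<<'XML'
<getPDMNumber xmlns="http://www.testgroup.com/TestPDM">
  <getPDMNumberResponse>
    <pdmNumber>654321</pdmNumber>
  </getPDMNumberResponse>
</getPDMNumber>
XML;

$tns = 'http://www.testgroup.com/TestPDM';

$document = new DOMDocument();
$document->loadXML($xml);

// Namespace-aware lookup by namespace URI and local name.
$viaTagName = $document->getElementsByTagNameNS($tns, 'pdmNumber')->item(0)->textContent;

// XPath lookup: the prefix "t" is chosen here and only has to match the registration.
$xpath = new DOMXPath($document);
$xpath->registerNamespace('t', $tns);
$viaXPath = $xpath->evaluate('string(/t:getPDMNumber/t:getPDMNumberResponse/t:pdmNumber)');

// Without the prefix, the expression matches nothing and yields an empty string.
$withoutPrefix = $xpath->evaluate('string(/getPDMNumber/getPDMNumberResponse/pdmNumber)');

var_dump($viaTagName, $viaXPath, $withoutPrefix);
```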
DomDocument and special characters
Solution:
$oDom = new DOMDocument();
$oDom->encoding = 'utf-8';
$oDom->loadHTML( utf8_decode( $sString ) ); // important!
$sHtml = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">';
$sHtml .= $oDom->saveHTML( $oDom->documentElement ); // important!
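The saveHTML() method behaves differently when a node is passed to it.
You can pass the root node ($oDom->documentElement) and add the desired !DOCTYPE manually.
The other important part is utf8_decode(). Note that utf8_decode() is deprecated as of PHP 8.2; the PHP manual suggests mb_convert_encoding($sString, 'ISO-8859-1', 'UTF-8') as a replacement.
In my case, none of the other properties and methods of the DOMDocument class produced the desired result.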
Using DOMDocument, is it possible to get all elements that exist within a certain DOM?
You can pass an asterisk (*) to getElementsByTagName(), which then returns all elements:
foreach ($dom->getElementsByTagName('*') as $element) {
    // $element is a DOMElement; process it here
}
From the Manual:
name
The local name (without namespace) of the tag to match on. The special value * matches all tags.
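For illustration, here is a small sketch that collects the tag names of all elements in a made-up sample document. The elements come back in document order.

```php
<?php
// Hypothetical sample document.
$dom = new DOMDocument();
$dom->loadXML(
    '<html><head><title>The Title</title></head>'
    . '<body><div class="header"><h1>Welcome to Nodeville</h1></div></body></html>'
);

// '*' matches every element; text and attribute nodes are not included.
$names = [];
foreach ($dom->getElementsByTagName('*') as $element) {
    $names[] = $element->nodeName;
}

echo implode(', ', $names), "\n"; // html, head, title, body, div, h1
```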
How to add text node in new created element with PHP DOMDocument
You can set the content using the textContent property, which DOMElement inherits from DOMNode.
$newElement = $finalDom->createElement("script");
$newElement->setAttribute("src", "https://stackoverflow.com/");
$newElement->textContent = 'alert("ok");';
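A self-contained variant of the same idea, as a sketch with made-up element names: the assigned string is stored as text, so special characters are escaped on output rather than interpreted as markup.

```php
<?php
$dom  = new DOMDocument();
$item = $dom->appendChild($dom->createElement('item'));

// Assigning textContent replaces any existing children with a single text node.
$item->textContent = 'Company & Son';

$output = $dom->saveXML($item);
echo $output, "\n"; // <item>Company &amp; Son</item>
```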
Get all the content from an element using PHP DOMDocument
You need the text content of the node:
$snippet .= $firstp->textContent;
PHP Docs: php.net DOMNode->textContent
Add element from one DOMDocument to another one PHP
You cannot import a DOMDocument directly. You have to go one level deeper in the hierarchy and use its documentElement property instead (that gives you a DOMElement):
$import = $legacyDomDocument->importNode($legacyDomDocument2->documentElement, true);
By the way, it looks appropriate to use documentElement when appending, too:
$legacyDomDocument->documentElement->appendChild($import);
Note the difference when you drop the intermediary documentElement: the second document's contents are then inserted adjacent to the root node, and that's hardly the desired outcome.
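The two steps can be tried out with a pair of tiny documents. The variable names and markup below are placeholders for illustration, not the documents from the question.

```php
<?php
// Two hypothetical documents.
$target = new DOMDocument();
$target->loadXML('<root><a/></root>');

$source = new DOMDocument();
$source->loadXML('<extra><b>text</b></extra>');

// Import a deep copy of the source's root element into the target document...
$import = $target->importNode($source->documentElement, true);

// ...and append it below the target's root element.
$target->documentElement->appendChild($import);

$output = $target->saveXML($target->documentElement);
echo $output, "\n"; // <root><a/><extra><b>text</b></extra></root>
```

The source document is left unchanged, because importNode() works on a copy.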