Why doesn't var_dump work with DOMDocument objects, while print($dom->saveHTML()) does?
Update: As of PHP 5.4.1 you can finally var_dump DOM objects. See https://gist.github.com/2499678
It's a bug:
- https://bugs.php.net/bug.php?id=48527
Can't print_r domDocument
When you create a DOMDocument instance, you have a PHP object. The DOM classes do not implement a helpful __toString method.
To get HTML from a DOMDocument instance, you'll need to use saveHTML:
print_r($dom->saveHTML());
NB: your question suggests you are actually looking at a collection of elements (a DOMNodeList) rather than an actual DOMDocument instance. Depending on your code, you'll need to serialize each element individually:
foreach ($elements as $el) {
    print_r($dom->saveHTML($el)); // use saveXML($el) if you are on a PHP version before 5.3.6
}
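As a minimal, self-contained sketch of that loop: the markup and the variable name $fragments below are made up for illustration and are not from the question.

```php
<?php
// Hypothetical markup, used only for illustration.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>one</p><p>two</p></body></html>');

// getElementsByTagName() returns a DOMNodeList, not a string.
$elements = $dom->getElementsByTagName('p');

$fragments = [];
foreach ($elements as $el) {
    // Passing a node to saveHTML() serializes only that node (PHP 5.3.6+).
    $fragments[] = $dom->saveHTML($el);
}

print_r($fragments);
```

Collecting the serialized strings in an array first makes it easy to inspect them with print_r or var_dump, which is what the question was trying to do with the objects themselves.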
Why does PHP treat this DOM object like an array?
childNodes is a property of type DOMNodeList. var_dump shows nothing about it simply because var_dump shows only those class properties that the extension developers declared by calling C functions such as
ZEND_API int zend_declare_property(...)
ZEND_API int zend_declare_property_null(...)
ZEND_API int zend_declare_property_bool(...)
ZEND_API int zend_declare_property_long(...)
ZEND_API int zend_declare_property_double(...)
ZEND_API int zend_declare_property_string(...)
ZEND_API int zend_declare_property_stringl(...)
Source: answer by akond: Why doesn't var_dump work with DomDocument objects, while print($dom->saveHTML()) does?
That is, the developers of the DOM extension chose not to expose the structure of the DOMNodeList class.
You can still iterate through a DOMNodeList because it implements the Traversable interface, which signals that the class can be iterated with foreach.
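A short sketch of that point, using made-up markup: the instanceof check confirms the list is Traversable, and foreach walks it even though it is not an array.

```php
<?php
// Hypothetical markup, used only for illustration.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><ul><li>a</li><li>b</li></ul></body></html>');

$items = $dom->getElementsByTagName('li');

// DOMNodeList is an object, not an array, but it is Traversable.
$isArray       = is_array($items);
$isTraversable = $items instanceof Traversable;

$values = [];
foreach ($items as $li) {
    $values[] = $li->nodeValue; // text content of each <li>
}

var_dump($isArray, $isTraversable, $values);
```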
Problem with iterating through DOM using DOMDocument
I think the main problem is that getElementsByTagName() returns a list of nodes (actually a DOMNodeList). So when you want to access (for example) the first item for that tag, you need to pick the first item out of that list, much as you would with an array.
If you extended your initial code to get the nested tag elements, you could end up with the following code, which always uses [0] on the result of getElementsByTagName() to pick out the first item (->item(0) is the equivalent method call).
$title = $doc->getElementById('info')->childNodes->item(1)->nodeValue;
$volume_list = $doc->getElementById('list')->getElementsByTagName('dl');
$a = $volume_list[0]->getElementsByTagName('dd')[0]->getElementsByTagName('a');
echo $a[0]->getAttribute('href');
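A more defensive variant of the same lookup might look like the sketch below. It uses item(0), which returns null when nothing matched, so each step can be checked before the next. The markup and the /vol/1 link are invented for illustration.

```php
<?php
// Hypothetical markup, used only for illustration.
$doc = new DOMDocument();
$doc->loadHTML(
    '<html><body><div><dl><dd><a href="/vol/1">Volume 1</a></dd></dl></div></body></html>'
);

$href = null;

// item(0) returns the first node of the list, or null if the list is empty.
$dl = $doc->getElementsByTagName('dl')->item(0);
$dd = $dl ? $dl->getElementsByTagName('dd')->item(0) : null;
$a  = $dd ? $dd->getElementsByTagName('a')->item(0) : null;

if ($a !== null) {
    $href = $a->getAttribute('href');
}

echo $href, PHP_EOL;
```

The null checks avoid calling a method on null when one of the nested tags is missing from the page.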
DOMDocument->saveHTML isn't working
My friend, this is not how it works.
Your edited HTML is in the return value of saveHTML(), so:
$editedHtml = $dom->saveHTML();
var_dump($editedHtml);
Now you should see your changed HTML.
The explanation is that $page is a completely different object that has nothing to do with the $dom object.
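A small sketch of the difference, assuming for illustration that $page holds the original markup as a string (the question's actual $page may be something else): changes made through $dom show up only in what saveHTML() returns.

```php
<?php
// Assumption: $page is the original markup; this sample is made up.
$page = '<html><body><p id="msg">old</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($page);

// Edit the parsed document, not the original variable.
$p = $dom->getElementsByTagName('p')->item(0);
$p->nodeValue = 'new';

$editedHtml = $dom->saveHTML();

$pageChanged   = strpos($page, 'new') !== false;       // the source is untouched
$editedChanged = strpos($editedHtml, 'new') !== false; // the serialized DOM has the edit

var_dump($pageChanged, $editedChanged);
```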
Cheers!
Why am I not getting back any images here?
It appears $html's contents stop at the <body> tag for this page. Any idea why?
Yes, you must provide this page with a valid user agent.
$url = 'http://www.w3schools.com/js/js_loop_for.asp';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
curl_exec($ch);
This outputs everything up to the closing </html>, including your requested <img border="0" width="336" height="69" src="/images/w3schoolslogo.gif" alt="W3Schools.com" style="margin-top:5px;" />, whereas a simple wget or curl without the user agent returns only up to the <body> tag.
$url = 'http://www.w3schools.com/js/js_loop_for.asp';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);
$images = $xml->xpath('//img');
var_dump($images);
die();
EDIT: My first post stated that there was still an issue with xpath... I was just not doing my due diligence, and the updated code above works great. I forgot to force curl to return the output as a string rather than print it to the screen (as it does by default).
HTML DOMNodelist?
Some DOMDocument debugging hints.
If applicable, upgrade to the latest PHP 5.4, because it will give you more information from var_dump for DOMDocument and friends.
I'll take your small example and add some hints on how to debug the code:
$dom = new DOMDocument;
$html = 'http://localhost/foo/index.php';
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $node) {
echo $dom->saveHtml($node), PHP_EOL;
}
Did the loading work? That is this line:
$dom->loadHTML($html);
You can take a look inside the document by outputting its content. If you display that in the browser, you need to look at the page source, or you can escape the output with htmlspecialchars:
var_dump(htmlspecialchars($dom->saveHTML()));
This will show you the document as loaded, serialized as HTML, verbatim inside your browser.
The next part you might want to debug is the result of getElementsByTagName:
foreach ($dom->getElementsByTagName('a') as $node) {
First assign it to a variable, and then check the length if it's not NULL or FALSE:
$aTags = $dom->getElementsByTagName('a');
var_dump($aTags, $aTags->length);
The length will tell you how many elements were matched.
Example/Demo:
<?php
$dom = new DOMDocument;
$html = 'http://localhost/foo/index.php';
$dom->loadHTML($html);
echo 'Document HTML loaded: ', var_dump($dom->saveHTML()), "\n";
$aTags = $dom->getElementsByTagName('a');
echo 'A Elements found: ', var_dump($aTags->length), "\n";
foreach ($aTags as $node) {
echo $dom->saveHtml($node), "\n";
}
Output:
Document HTML loaded: string(171) "<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>http://localhost/foo/index.php</p></body></html>
"
A Elements found: int(0)
Hope this is helpful.
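The output above shows that the URL text itself was parsed as markup, because loadHTML() expects a string of HTML rather than a location. The sketch below contrasts it with loadHTMLFile(), which reads from a path. A temporary file stands in for the question's URL here, and the file contents are made up; whether a remote URL can be read the same way depends on the PHP configuration.

```php
<?php
// Assumption: a temp file stands in for the page the question wanted to load.
$path = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents($path, '<html><body><a href="/x">x</a></body></html>');

// loadHTML() treats its argument as markup, so the path text becomes the document.
$fromString = new DOMDocument();
$fromString->loadHTML($path);

// loadHTMLFile() opens the location and parses what it finds there.
$fromFile = new DOMDocument();
$fromFile->loadHTMLFile($path);

$countString = $fromString->getElementsByTagName('a')->length;
$countFile   = $fromFile->getElementsByTagName('a')->length;

unlink($path);

var_dump($countString, $countFile);
```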
PHP Dom XPath - Why isn't it working?
The reason you are not getting any results is that there are no <a> elements that satisfy both conditions.
These are the links containing "3499047" in @href:
<a href="showthread.php?s=9bc55ab5990282a5353fb20d505d577e&t=3499047" id="thread_title_3499047">Tesco misprices and discussion (Thread 12)</a>
<a href="showthread.php?s=9bc55ab5990282a5353fb20d505d577e&t=3499047">1</a>
<a href="showthread.php?s=9bc55ab5990282a5353fb20d505d577e&t=3499047&page=2">2</a>
<a href="showthread.php?s=9bc55ab5990282a5353fb20d505d577e&t=3499047&page=3">3</a>
<a href="showthread.php?s=9bc55ab5990282a5353fb20d505d577e&t=3499047&page=110">Last Page</a>
<a href="member.php?s=9bc55ab5990282a5353fb20d505d577e&find=lastposter&t=3499047" rel="nofollow">ExiledCockney</a>
<a href="misc.php?do=whoposted&t=3499047" onclick="who(3499047); return false;">2,184</a>
<a rel="shadowbox;width=732;height=527;player=iframe;" href="wow.php?t=3499047" target="_blank" style="display: block; width: 100%; height: 100%; cursor: pointer;">
<div style="width: 100%; height: 100%; background-image: url('http://images2.moneysavingexpert.com/images/forum_style_2/misc//wow_big_faint_grey.gif');">
<div style="padding: 12px 0px 0px 0px;">
<strong>3</strong>
</div>
</div>
</a>
As you can see, none of them contains "font-weight:bold" in a style attribute.
In case the markup on the page has elements with your desired combination when you view it in a browser, they might have been added via JavaScript. DOM will not run any JavaScript, so you have to check the markup as fetched with DOM.
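To illustrate how both conditions combine, here is a sketch with invented markup in which one link does carry the style. The XPath expression is an assumption about what the question's query looked like, not a copy of it.

```php
<?php
// Hypothetical markup: only the first link satisfies both conditions.
$html = '<html><body>'
      . '<a href="showthread.php?t=3499047" style="font-weight:bold">Bold link</a>'
      . '<a href="showthread.php?t=3499047">Plain link</a>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// Both predicates must hold for the same <a> element.
$query   = '//a[contains(@href, "3499047") and contains(@style, "font-weight:bold")]';
$matches = $xpath->query($query);

foreach ($matches as $a) {
    echo $a->nodeValue, PHP_EOL;
}
```

If the same query returns nothing against the fetched page, dumping the fetched markup and searching it for the style value is a quick way to confirm whether the attribute is there at all.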
PHP using DOM to get anchors and modify them
I have done a similar thing not long ago. You can iterate over a DOMNodeList and then get the href attribute of the anchor.
$dom = new DOMDocument;
$dom->loadHTML($content);
foreach ($dom->getElementsByTagName('a') as $node) {
    $original_url = $node->getAttribute('href');
    // Do something here, e.g. compute the replacement URL into $var
    $node->setAttribute('href', $var);
}
$html = $dom->saveHtml();
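A runnable version of the same idea, with sample content and a made-up rewrite rule (prefixing each href with https://example.com) standing in for the "do something" step:

```php
<?php
// Hypothetical content and rewrite rule, used only for illustration.
$content = '<html><body><a href="/a">A</a><a href="/b">B</a></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($content);

foreach ($dom->getElementsByTagName('a') as $node) {
    $original_url = $node->getAttribute('href');
    // Replace the attribute in place; the node stays where it is in the tree.
    $node->setAttribute('href', 'https://example.com' . $original_url);
}

// The modified markup comes from saveHTML(), not from $content.
$html = $dom->saveHTML();
echo $html;
```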