How to use PHP DOMDocument saveHTML($node) without added whitespace?
If you know your document is also going to be valid XML, you can use saveXML() instead:
$html = '<html><body><div>123</div><div>456</div></body></html>';
$dom = new DOMDocument;
$dom->preserveWhiteSpace = true;
$dom->formatOutput = false;
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD);
$body = $dom->getElementsByTagName('body')->item(0);
echo $dom->saveXML($body);
which gives...
<body><div>123</div><div>456</div></body>
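If you want only the contents of <body> without the <body> tags themselves, one option is to call saveXML() on each child and concatenate the results. Below is a minimal sketch that assumes the same input as above. innerXML() is a hypothetical helper name and is not part of the DOM extension.

```php
<?php
// Hypothetical helper: serialise each child of $node with saveXML()
// and join them, so the node's own start/end tags are left out.
function innerXML(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        // saveXML($child) serialises just this child's subtree
        $out .= $node->ownerDocument->saveXML($child);
    }
    return $out;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><div>123</div><div>456</div></body></html>', LIBXML_HTML_NODEFDTD);
$body = $dom->getElementsByTagName('body')->item(0);
echo innerXML($body), "\n"; // <div>123</div><div>456</div>
```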
PHP DOMDocument without the DTD, head, and body tags?
Since PHP 5.3.6, you can pass a node to saveHTML():
echo $DOMDocument->saveHTML($the_node_you_want_to_show);
Before that, I abused ->saveXML() with minor fixes. However, you must have one surrounding node (e.g. the output is <div>...some content and nodes...</div>), or loop through the node's children if you don't want one surrounding tag:
$html = '';
foreach($rootnode->childNodes as $node){
$html .= $rootnode->ownerDocument->saveHTML($node);
}
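The loop above can be wrapped in a small reusable function. This is a sketch that requires PHP 5.3.6 or later for saveHTML($node). childrenHTML() is a hypothetical name, and the sample markup is made up for illustration.

```php
<?php
// Hypothetical helper: concatenate saveHTML() of every child of $rootnode,
// so the wrapping element itself is not included in the result.
function childrenHTML(DOMNode $rootnode): string
{
    $html = '';
    foreach ($rootnode->childNodes as $node) {
        $html .= $rootnode->ownerDocument->saveHTML($node);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<div><p>first</p><p>second</p></div>');
$wrapper = $dom->getElementsByTagName('div')->item(0);
$out = childrenHTML($wrapper);
echo $out;
```

Depending on the PHP/libxml version, saveHTML($node) may add newlines between elements, so compare the output loosely rather than byte-for-byte.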
How do I save as an HTML fragment, not as a full DOM model?
You need to initialize the DOM structure like this:
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$html=$dom->saveHTML();
See the PHP documentation:
LIBXML_HTML_NOIMPLIED (integer)
Sets the HTML_PARSE_NOIMPLIED flag, which turns off the automatic adding of implied html/body... elements.
LIBXML_HTML_NODEFDTD (integer)
Sets the HTML_PARSE_NODEFDTD flag, which prevents a default doctype being added when one is not found.
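Here is a minimal, self-contained sketch of the two flags together (PHP 5.4+). The fragment is assumed to have a single top-level element, because multiple top-level elements can end up nested inside the first one when NOIMPLIED is used.

```php
<?php
// Parse a fragment without the implied <html>/<body> wrapper and
// without a default doctype, then serialise the whole document.
$fragment = '<div><p>Hello</p></div>'; // one top-level element
$dom = new DOMDocument();
$dom->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$html = $dom->saveHTML();
echo $html;
```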
php DOMDocument: element ending up within another
A DOMDocument has to have a single root element, so it will move all following siblings inside the first top-level element.
You could most easily address this by bookending your content with a container tag, e.g.:
$content = '<div><figure class="image image-style-align-left">
<img src="https://placekitten.com/g/200/300"></figure>
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p></div>';
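Once the content is wrapped, you usually want the wrapper gone again in the output. This sketch assumes the NOIMPLIED/NODEFDTD flags are used, and it serialises only the container's children. It suppresses libxml warnings because libxml's HTML parser may not know the <figure> tag. The markup is a shortened version of the example above.

```php
<?php
// libxml's HTML parser may warn about HTML5 tags like <figure>; collect
// those warnings internally instead of printing them.
libxml_use_internal_errors(true);

$content = '<figure class="image"><img src="https://placekitten.com/g/200/300"></figure>'
    . '<p>Lorem ipsum dolor sit amet.</p>';

$dom = new DOMDocument();
// Bookend with a <div> so there is exactly one top-level element.
$dom->loadHTML('<div>' . $content . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$container = $dom->documentElement; // the wrapping <div>

// Serialise only the children, dropping the wrapper again.
$out = '';
foreach ($container->childNodes as $child) {
    $out .= $dom->saveHTML($child);
}
libxml_clear_errors();
echo $out;
```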
How to avoid DOM parsing adding html doctype, head and body tags?
I'm actually looking for the same solution. I've been using the following method to do this; however, the <p> around the text node will still be added when you do loadHTML(). I don't think there's a way to get around that without using another parser, unless there's some hidden flag that tells it not to do that.
This code:
<?php
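function innerHTML($node){
    // Copy every child of $node into a fresh document and serialise that,
    // so the wrapping element itself is not part of the output.
    $doc = new DOMDocument();
    foreach ($node->childNodes as $child) {
        $doc->appendChild($doc->importNode($child, true));
    }
    return $doc->saveHTML();
}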
$string = '
Some photos<br>
<span class="naslov_slike">photo_by_ile_IMG_1676-01</span><br />
<span class="naslov_slike">photo_by_ile_IMG_1699-01</span><br />
<span class="naslov_slike">photo_by_ile_IMG_1697-01</span><br />
<span class="naslov_slike">photo_by_ile_IMG_1695-01</span><br />
';
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($string);
$elements = $dom->getElementsByTagName('span');
$spans = array();
foreach($elements as $span) {
$spans[] = $span;
}
foreach($spans as $span) {
$span->parentNode->removeChild($span);
}
echo innerHTML( $dom->documentElement->firstChild );
Will output:
<p>Some photos<br><br><br><br><br></p>
Of course, this solution does not keep the markup 100% intact, but it's close.
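The code above copies the spans into an array before removing them. A likely reason is that getElementsByTagName() returns a live DOMNodeList, so removing nodes while iterating it directly can skip elements. Below is a minimal sketch of the same two-step pattern. It uses iterator_to_array() to take the snapshot, which is a swap-in for the answer's manual loop, and the input is made up.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<p>Some photos<br><span>a</span><br><span>b</span><br><span>c</span></p>');

// Take a static snapshot of the live list first...
$spans = iterator_to_array($dom->getElementsByTagName('span'));

// ...then remove each node; the snapshot is unaffected by the removals.
foreach ($spans as $span) {
    $span->parentNode->removeChild($span);
}
echo $dom->getElementsByTagName('span')->length; // 0
```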
Replace and return partial HTML with DOMDocument without adding body, doctype, etc
You can use the extra options available in loadHTML() to achieve what you want. Check the options parameter (details about the libxml constants are in the PHP manual), and note that it's available since PHP 5.4. For example:
...
$doc->loadHTML('<p>hello</p><img src="REPLACE" /><p></p>',
LIBXML_HTML_NOIMPLIED |
LIBXML_HTML_NODEFDTD);
...
$doc->saveHTML();
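Below is a self-contained sketch of the "replace and return" part. It finds the placeholder attribute, swaps in a value, and saves the fragment. The fragment is wrapped in a <div> here, which is an assumption not in the answer, so that NOIMPLIED has a single top-level element to work with. The URL photo.png is a hypothetical value.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<div><p>hello</p><img src="REPLACE"><p></p></div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Replace the placeholder src on every matching <img>.
foreach ($doc->getElementsByTagName('img') as $img) {
    if ($img->getAttribute('src') === 'REPLACE') {
        $img->setAttribute('src', 'photo.png'); // hypothetical replacement
    }
}
$html = $doc->saveHTML();
echo $html;
```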
Update
If you see UTF-8 characters being changed to some odd characters, then using mb_convert_encoding can fix this, like:
$doc->loadHTML(
mb_convert_encoding('<p>hello</p><img src="REPLACE" /><p></p>', 'HTML-ENTITIES', 'UTF-8'),
LIBXML_HTML_NOIMPLIED |
LIBXML_HTML_NODEFDTD
);
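As of PHP 8.2, using 'HTML-ENTITIES' as an mbstring encoding is deprecated. Below is a minimal sketch of the same idea with mb_encode_numericentity(), which is a swapped-in function rather than the one in the answer. It turns every character from U+0080 upwards into a numeric entity (e.g. é becomes &#233;), and libxml decodes that entity back to UTF-8 on load. The input string is made up.

```php
<?php
$input = '<div><p>héllo wörld</p></div>';
// Map: code points 0x80..0x10FFFF, offset 0, mask 0x1FFFFF -> &#NNN;
$encoded = mb_encode_numericentity($input, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

$doc = new DOMDocument();
$doc->loadHTML($encoded, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
echo $doc->getElementsByTagName('p')->item(0)->textContent; // héllo wörld
```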
DOMDocument-saveHTML isn't working
My friend, this is not how it works.
Your edited HTML is in the return value of saveHTML(), so:
$editedHtml = $dom->saveHTML();
var_dump($editedHtml);
Now you should see your changed HTML.
The explanation is that $page is a completely different object that has nothing to do with the $dom object.
Cheers!
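The point can be checked with a short sketch: edits made through the DOM never flow back into the source string. The edited markup exists only in what saveHTML() returns. The variable $page here is a stand-in for whatever string the question loaded, and the markup is made up.

```php
<?php
$page = '<p>Hello</p>'; // stand-in for the original input string
$dom = new DOMDocument();
$dom->loadHTML($page, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Edit the DOM tree, not the string.
$dom->getElementsByTagName('p')->item(0)->nodeValue = 'Goodbye';

$editedHtml = $dom->saveHTML();
var_dump($page);       // still the original markup
var_dump($editedHtml); // contains the edit
```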
PHP DOMDocument saveHTML not encoding cyrillic correctly
The problem is with $dom->saveHTML(); you need to pass the root node as a parameter, like this:
return $dom->saveHTML((new \DOMXPath($dom))->query('/')->item(0));
Then it suddenly renders the page differently, with substitution. If it does not, double-check the values of $dom->encoding and $dom->substituteEntities; they should read UTF-8 and TRUE.
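Below is a sketch of the mechanics described above. It sets the two properties, fetches the document node via DOMXPath, and passes it to saveHTML(). Whether the Cyrillic text comes out as raw UTF-8 or as entities may depend on your PHP and libxml versions, so this sketch only checks the mechanics, not the exact output. The markup and text are made up.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<p></p>');
// Add the text through the DOM API, which works in UTF-8 internally.
$p = $dom->getElementsByTagName('p')->item(0);
$p->appendChild($dom->createTextNode('Привет, мир'));

// The two properties the answer says to check.
$dom->encoding = 'UTF-8';
$dom->substituteEntities = true;

// XPath '/' selects the document node itself.
$root = (new DOMXPath($dom))->query('/')->item(0);
$html = $dom->saveHTML($root);
echo $html;
```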