How to saveHTML of DOMDocument Without the HTML Wrapper

How to use PHP DOMDocument saveHTML($node) without added whitespace?

If you know your document is going to be valid XML as well, you can use saveXML() instead...

$html = '<html><body><div>123</div><div>456</div></body></html>';
$dom = new DOMDocument;
$dom->preserveWhiteSpace = true;
$dom->formatOutput = false;
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD);
$body = $dom->getElementsByTagName('body')->item(0);
echo $dom->saveXML($body);

which gives...

<body><div>123</div><div>456</div></body>
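One caveat worth checking before relying on saveXML(): it serializes with XML rules, so void and empty elements come out self-closed. The sketch below is only a rough comparison; the input (with its extra <br> and empty <div>) is an assumption added for illustration, and the exact whitespace produced by saveHTML($node) can vary with the libxml version.

```php
<?php
// Illustrative input: the <br> and the empty <div> are assumptions added
// here to show where the two serializers differ.
$html = '<html><body><div>123</div><br><div></div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD);
$body = $dom->getElementsByTagName('body')->item(0);

$asXml  = $dom->saveXML($body);  // XML rules: void/empty elements are self-closed
$asHtml = $dom->saveHTML($body); // HTML rules: <br> stays a void tag

echo $asXml, "\n", $asHtml, "\n";
```

If the self-closed tags matter for your output, saveHTML($node) is probably the safer choice.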

PHP DOMDocument without the DTD, head, and body tags?

Since PHP 5.3.6, you can pass a node to saveHTML(), as in echo $DOMDocument->saveHTML($the_node_you_want_to_show). Before that, I abused ->saveXML() with minor fixes. You must, however, have one surrounding node included in the output (e.g. the output is <div>...some content and nodes...</div>), or loop through the node's children if you don't want a surrounding tag:

$html = '';
foreach ($rootnode->childNodes as $node) {
    // The property is ownerDocument; PHP property names are case-sensitive.
    $html .= $rootnode->ownerDocument->saveHTML($node);
}
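The loop above can be wrapped in a small helper. This is a minimal sketch; the function name childrenToHtml and the sample markup are hypothetical, and it assumes $node is not the DOMDocument itself (whose ownerDocument is null).

```php
<?php
// Hypothetical helper: concatenates the serialized children of $node,
// so the surrounding tag itself is left out of the result.
function childrenToHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Sample markup made up for this example.
$dom = new DOMDocument();
$dom->loadHTML('<div id="wrap"><p>one</p><p>two</p></div>');
$wrap = $dom->getElementsByTagName('div')->item(0);

$inner = childrenToHtml($wrap);
echo $inner, "\n";
```

Depending on the libxml version, the serializer may insert newlines between block elements, so compare results loosely rather than byte for byte.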

How do I save as a HTML fragment, not as full DOM model

You need to initialize the DOM structure like this:

$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$html=$dom->saveHTML();

See PHP documentation:

LIBXML_HTML_NOIMPLIED (integer)

Sets HTML_PARSE_NOIMPLIED flag, which turns off the automatic adding of implied html/body... elements.

LIBXML_HTML_NODEFDTD (integer)

Sets HTML_PARSE_NODEFDTD flag, which prevents a default doctype being added when one is not found.
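To see what the two flags change, here is a small side-by-side sketch. The input string and variable names are made up for illustration, and it uses a single root element on purpose, since fragments with several top-level elements behave differently under LIBXML_HTML_NOIMPLIED.

```php
<?php
// Made-up input with a single root element.
$html = '<div><p>hello</p></div>';

// Default behaviour: the parser adds a doctype and html/body elements.
$plain = new DOMDocument();
$plain->loadHTML($html);
$withWrapper = $plain->saveHTML();

// With both flags: no implied elements, no default doctype.
$bare = new DOMDocument();
$bare->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$withoutWrapper = $bare->saveHTML();

echo $withWrapper, "\n", $withoutWrapper, "\n";
```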

php DOMDocument: element ending up within another

A DOMDocument has to have a single root element, so the parser moves all following siblings inside the first top-level element.

You could most easily address this by bookending your content with a container tag e.g.

$content = '<div><figure class="image image-style-align-left">
<img src="https://placekitten.com/g/200/300"></figure>
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p></div>';
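If the container should not appear in the final output, one option is to add it before parsing and serialize only its children afterwards. The following is a sketch under assumptions: the function names loadFragment and saveFragment are hypothetical, and libxml_use_internal_errors() is used only to silence parser warnings that some libxml versions raise for HTML5 tags such as <figure>.

```php
<?php
// Hypothetical helper: wraps the fragment in a <div> so the document
// has exactly one root element, then parses it without html/body/doctype.
function loadFragment(string $fragment): DOMDocument
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML(
        '<div>' . $fragment . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $dom;
}

// Hypothetical helper: serializes the children of the wrapper only.
function saveFragment(DOMDocument $dom): string
{
    $html = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

$content = '<figure class="image image-style-align-left">
<img src="https://placekitten.com/g/200/300"></figure>
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>';

$dom = loadFragment($content);
$out = saveFragment($dom);
echo $out, "\n";
```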

How to avoid DOM parsing adding html doctype, head and body tags?

I'm actually looking for the same solution. I've been using the following method, but the <p> around the text node will still be added when you call loadHTML(). I don't think there's a way around that without using another parser, unless there's some hidden flag that tells it not to do that.

This code:

<?php

function innerHTML($node){
    $doc = new DOMDocument();
    foreach ($node->childNodes as $child) {
        $doc->appendChild($doc->importNode($child, true));
    }

    return $doc->saveHTML();
}

$string = '
Some photos<br>
<span class="naslov_slike">photo_by_ile_IMG_1676-01</span><br />
<span class="naslov_slike">photo_by_ile_IMG_1699-01</span><br />
<span class="naslov_slike">photo_by_ile_IMG_1697-01</span><br />
<span class="naslov_slike">photo_by_ile_IMG_1695-01</span><br />
';

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($string);
$elements = $dom->getElementsByTagName('span');
$spans = array();
// Copy the live node list first, because removing nodes while iterating it skips elements.
foreach ($elements as $span) {
    $spans[] = $span;
}
foreach ($spans as $span) {
    $span->parentNode->removeChild($span);
}

echo innerHTML( $dom->documentElement->firstChild );

Will output:

<p>Some photos<br><br><br><br><br></p>

Of course, this solution does not keep the markup 100% intact, but it's close.

Replace and return partial HTML with DOMDocument without adding body, doctype, etc

You can use the extra options available in loadHTML() to achieve what you want; see its options parameter and the list of libxml constants in the PHP documentation. Note that the options parameter is available since PHP 5.4. For example:

...
$doc->loadHTML('<p>hello</p><img src="REPLACE" /><p></p>',
LIBXML_HTML_NOIMPLIED |
LIBXML_HTML_NODEFDTD);
...
$doc->saveHTML();

Update

If you see UTF-8 characters being changed to some odd characters, then using mb_convert_encoding can fix this, like:

$doc->loadHTML(
mb_convert_encoding('<p>hello</p><img src="REPLACE" /><p></p>', 'HTML-ENTITIES', 'UTF-8'),
LIBXML_HTML_NOIMPLIED |
LIBXML_HTML_NODEFDTD
);
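Note that the 'HTML-ENTITIES' target of mb_convert_encoding() is deprecated as of PHP 8.2. One possible replacement is mb_encode_numericentity(), sketched below. The sample string and variable names are made up, the mbstring extension is assumed to be loaded, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Made-up sample containing a non-ASCII character.
$html = '<p>héllo</p>';

// Turn every code point from U+0080 upwards into a numeric entity,
// so the parser only ever sees plain ASCII.
$ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

$doc = new DOMDocument();
$doc->loadHTML($ascii, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// The DOM stores text as UTF-8 internally, so the character is intact here.
$text = $doc->documentElement->textContent;
echo $text, "\n";
```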

DOMDocument-saveHTML isn't working

My friend, this is not how it works.

You should have your edited HTML in the result of saveHTML() so:

$editedHtml = $dom->saveHTML();
var_dump($editedHtml);

Now you should see your changed HTML.

The explanation is that $page is a completely different object that has nothing to do with the $dom object.
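A minimal sketch of the point, assuming $page holds the original HTML string (the markup and the edit are made up for illustration): changes made through $dom only show up in what saveHTML() returns.

```php
<?php
// Assumed stand-in for the question's $page: the original HTML string.
$page = '<div><p>old</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($page, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Edit the parsed tree; this does not touch $page.
$dom->getElementsByTagName('p')->item(0)->nodeValue = 'new';

// The edited markup only exists in the return value of saveHTML().
$editedHtml = $dom->saveHTML();

var_dump($page);
var_dump($editedHtml);
```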

Cheers!

PHP DOMDocument saveHTML not encoding cyrillic correctly

The problem is with $dom->saveHTML(); you need to pass the root node as a parameter, like this:

return $dom->saveHTML((new \DOMXPath($dom))->query('/')->item(0));

Then it suddenly renders the page differently, with substitution. If it does not, double-check the values of $dom->encoding and $dom->substituteEntities; they should read UTF-8 and TRUE.


