PHP "Pretty Print" HTML (Not Tidy)


You're right, formatOutput does not seem to indent HTML output (others are confused by this too). It does work with saveXML(), even when the document was loaded with loadHTML().

<?php
function tidyHTML($buffer) {
    // load our document into a DOM object
    $dom = new DOMDocument();
    // we want nice output
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($buffer);
    $dom->formatOutput = true;
    return $dom->saveHTML();
}

// start output buffering, using our nice
// callback function to format the output.
ob_start("tidyHTML");

?>
<html>
<head>
<title>foo bar</title><meta name="bar" value="foo"><body><h1>bar foo</h1><p>It's like comparing apples to oranges.</p></body></html>
<?php
// this will be called implicitly, but we'll
// call it manually to illustrate the point.
ob_end_flush();
?>

Result with saveHTML() (line breaks, but no indentation):

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<title>foo bar</title>
<meta name="bar" value="foo">
</head>
<body>
<h1>bar foo</h1>
<p>It's like comparing apples to oranges.</p>
</body>
</html>

The same input with saveXML() is indented:

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
  <head>
    <title>foo bar</title>
    <meta name="bar" value="foo"/>
  </head>
  <body>
    <h1>bar foo</h1>
    <p>It's like comparing apples to oranges.</p>
  </body>
</html>

You probably forgot to set preserveWhiteSpace = false before calling loadHTML()?
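
To illustrate why the order matters, here is a minimal sketch. It uses loadXML() as a stand-in for loadHTML(), because the effect is easy to reproduce there; the helper name formatXml and the sample string are made up for this illustration. preserveWhiteSpace is read when the document is parsed, so setting it after loading leaves the whitespace-only text nodes in the tree, and formatOutput then does not re-indent around them.

```php
<?php
// Hypothetical helper, for illustration only.
function formatXml(string $xml, bool $setBeforeLoad): string
{
    $dom = new DOMDocument();
    if ($setBeforeLoad) {
        // Read at parse time: whitespace-only text nodes are dropped.
        $dom->preserveWhiteSpace = false;
    }
    $dom->loadXML($xml);
    // Setting it here is too late to affect the already parsed tree.
    $dom->preserveWhiteSpace = false;
    $dom->formatOutput = true;
    return $dom->saveXML();
}

$xml = "<root>\n<a><b>x</b></a>\n</root>";
$before = formatXml($xml, true);   // nested elements come out indented
$after  = formatXml($xml, false);  // the original layout is kept as it was
```

Comparing $before and $after side by side should show indentation only in the first one.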

Disclaimer: I took most of the demo code from Tyson Clugg's comment in the PHP manual. Lazy me.


UPDATE: I now remember that some years ago I tried the same thing and ran into the same problem. I fixed it with a dirty workaround (it wasn't performance critical): I converted back and forth between SimpleXML and DOM until the problem vanished. I suppose the conversion got rid of those (presumably whitespace) nodes. It may have been: load with DOM, import with simplexml_import_dom(), output the string, parse that with DOM again, and then pretty-print it. As far as I remember this worked, but it was really slow.

Beautify HTML stored in a string on PHP

Using DOMDocument, we load the HTML with the LIBXML_HTML_NOIMPLIED flag, which prevents loadHTML() from adding the implied html wrapper elements.

We save as XML to get the nice indentation, passing $dom->documentElement as the parameter so that the XML declaration is not emitted.

$html = '<body><div><p>hello</p><div></body>';

$dom = new DOMDocument();

$dom->preserveWhiteSpace = false;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED);
$dom->formatOutput = true;

print $dom->saveXML($dom->documentElement);

This will output:

<body>
  <div>
    <p>hello</p>
    <div/>
  </div>
</body>

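Notice that the parser repaired the HTML for you: the second <div> in the input was presumably meant to be a closing </div>, so it was kept as an empty element and the outer div was closed automatically.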

If we pass proper HTML as the input string, the output will be as you require:

$html = '<body><div><p>hello</p></div></body>';

<body>
  <div>
    <p>hello</p>
  </div>
</body>
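
The steps above could be wrapped in a small helper, sketched here under a few assumptions: the function name prettyPrintFragment is made up, the input is a fragment with a single root element, and parser warnings (for example about tags libxml does not know) should be silenced rather than reported.

```php
<?php
// Hypothetical helper name; the DOM and libxml calls are the standard ones.
function prettyPrintFragment(string $html): string
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED);
    $dom->formatOutput = true;

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Serialise only the root element, so no XML declaration is written.
    return $dom->saveXML($dom->documentElement);
}

echo prettyPrintFragment('<body><div><p>hello</p></div></body>'), "\n";
```

Because the result is serialised as XML, empty elements are written in self-closing form (as with the <div/> seen earlier), which may not be what you want for HTML that is sent to a browser.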

Is there a pretty print for PHP?

Both print_r() and var_dump() output a human-readable representation of a PHP value, including arrays and objects.

$arr = array('one' => 1);
print_r($arr);
var_dump($arr);
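
When the output goes to a browser, the line breaks produced by print_r() collapse unless they are wrapped in a pre element. A minimal sketch, assuming a made-up helper name prettyDump; it relies on print_r() returning its output as a string when the second argument is true:

```php
<?php
// Hypothetical helper name, for illustration only.
function prettyDump($value): string
{
    // With true as the second argument, print_r() returns the text
    // instead of printing it.
    $text = print_r($value, true);

    // Escape the text so that "<" and ">" inside values do not break the page.
    return '<pre>' . htmlspecialchars($text) . '</pre>';
}

$arr = array('one' => 1);
echo prettyDump($arr);
```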

How to keep PHP 'View Source' html output clean

That's something that bugs me, too. The best you can do is use Tidy to post-process the output. Add this line to the start of your page (and be prepared for output buffering havoc when you encounter your first PHP error with output buffering on):

ob_start('ob_tidyhandler');

Tidying PHP and HTML Code?

You could use HTML Tidy from within PHP to clean up your output. Use ob_start() and friends to capture the whole HTML output as a string, then send it through Tidy. You might want to use some sort of caching if you do this, though.

<?php

function callback($buffer)
{
    // Clean up
    $config = array(
        'indent' => true,
        'output-xhtml' => true,
        'wrap' => 200);

    return tidy_repair_string($buffer, $config, 'utf8');
}

// Do some output.

ob_start("callback");
?>
<html>
<body>
<p>Outputting stuff here</p>
<p>
Testing a broken tag:
<span> This span should be closed by Tidy.
</p>
</body>
</html>
<?php
ob_end_flush();

?>
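
One possible shape for the caching mentioned above is sketched here. Everything in it is an assumption rather than part of the answer: the helper name formatWithCache, the choice of a file per page keyed by a hash of the unformatted HTML, and the cache directory. It only pays off when the same output is produced repeatedly.

```php
<?php
// Hypothetical helper: caches the result of an expensive formatter on disk,
// keyed by a hash of the unformatted HTML.
function formatWithCache(string $html, callable $formatter, string $cacheDir): string
{
    $file = $cacheDir . '/' . sha1($html) . '.html';

    if (is_file($file)) {
        $cached = file_get_contents($file);
        if ($cached !== false) {
            return $cached; // cache hit: the formatter is not called
        }
    }

    $formatted = $formatter($html);

    // Best effort: if the write fails we only lose the cache entry.
    @file_put_contents($file, $formatted, LOCK_EX);

    return $formatted;
}

// Possible wiring as an output buffer callback, assuming the tidy extension
// is loaded and callback() is the function defined in the example above:
// ob_start(function ($buffer) {
//     return formatWithCache($buffer, 'callback', sys_get_temp_dir());
// });
```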

PHP Tidy alternative to only tab-indent output

Two years later, there is still no library that indents HTML output without relying on a DOM API (i.e. Tidy and the like).

I've developed a library that tokenises the HTML input using regular expressions. None of the HTML is changed beyond adding the spacing required for indentation.

https://github.com/gajus/dindent
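
To give a feel for the tokenising idea, here is a deliberately naive sketch. It is not dindent's implementation or API: the function name naiveIndentHtml is made up, every tag and text run is put on its own line (so whitespace around inline text does change), and comments or attribute values containing ">" would be split incorrectly.

```php
<?php
// Hypothetical helper, not part of dindent: a naive regex-based indenter.
function naiveIndentHtml(string $html, string $unit = '    '): string
{
    // Elements that never have a closing tag.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
             'input', 'link', 'meta', 'source', 'track', 'wbr'];

    // Split into tags and text runs, keeping the tags as tokens.
    $tokens = preg_split('/(<[^>]+>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $depth = 0;
    $lines = [];
    foreach ($tokens as $token) {
        $token = trim($token);
        if ($token === '') {
            continue; // whitespace between tags
        }
        if (preg_match('/^<\/\w/', $token)) {
            // Closing tag: step back one level before printing it.
            $depth = max(0, $depth - 1);
            $lines[] = str_repeat($unit, $depth) . $token;
        } elseif (preg_match('/^<(\w+)/', $token, $m)) {
            // Opening tag: print it, then go one level deeper unless it is
            // self-closing or a void element.
            $lines[] = str_repeat($unit, $depth) . $token;
            $selfClosing = substr($token, -2) === '/>'
                || in_array(strtolower($m[1]), $void, true);
            if (!$selfClosing) {
                $depth++;
            }
        } else {
            // Text, doctype or comment: print at the current level.
            $lines[] = str_repeat($unit, $depth) . $token;
        }
    }

    return implode("\n", $lines);
}

echo naiveIndentHtml('<div><p>hello</p><br></div>', '  '), "\n";
```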


