PHP pretty print HTML (not Tidy)
You're right, there seems to be no indentation for HTML output (others are confused by this too). XML output works, even when the document was loaded with loadHTML().
<?php
function tidyHTML($buffer) {
    // load our document into a DOM object
    $dom = new DOMDocument();
    // we want nice output
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($buffer);
    $dom->formatOutput = true;
    return($dom->saveHTML());
}
// start output buffering, using our nice
// callback function to format the output.
ob_start("tidyHTML");
?>
<html>
<head>
<title>foo bar</title><meta name="bar" value="foo"><body><h1>bar foo</h1><p>It's like comparing apples to oranges.</p></body></html>
<?php
// this will be called implicitly, but we'll
// call it manually to illustrate the point.
ob_end_flush();
?>
result:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<title>foo bar</title>
<meta name="bar" value="foo">
</head>
<body>
<h1>bar foo</h1>
<p>It's like comparing apples to oranges.</p>
</body>
</html>
the same with saveXML() ...
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
  <head>
    <title>foo bar</title>
    <meta name="bar" value="foo"/>
  </head>
  <body>
    <h1>bar foo</h1>
    <p>It's like comparing apples to oranges.</p>
  </body>
</html>
You probably forgot to set preserveWhiteSpace = false before calling loadHTML()?
Disclaimer: I stole most of the demo code from Tyson Clugg's comments in the PHP manual. Lazy me.
UPDATE: I now remember that some years ago I tried the same thing and ran into the same problem. I fixed it with a dirty workaround (it wasn't performance critical): I converted back and forth between SimpleXML and DOM until the problem vanished. I suppose the conversion got rid of those nodes. Maybe: load with DOM, import with simplexml_import_dom, output the string, parse that with DOM again, and then print it pretty. As far as I remember this worked (but it was really slow).
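A rough sketch of that round trip, under assumptions: the function name prettyPrintViaSimpleXml is made up for illustration, the libxml_use_internal_errors() call is an addition to silence parser warnings, and the exact sequence is a guess at what the answer recalls rather than the original code.

```php
<?php
// Hypothetical helper; sketches the DOM -> SimpleXML -> DOM round trip.
function prettyPrintViaSimpleXml(string $html): string
{
    // Keep warnings about sloppy markup out of the output (an addition).
    libxml_use_internal_errors(true);

    // Step 1: parse the HTML with DOM.
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Step 2: import into SimpleXML and serialize to an XML string.
    $xml = simplexml_import_dom($dom)->asXML();

    // Step 3: parse that string again, this time dropping whitespace-only
    // text nodes so that formatOutput can indent the tree.
    $pretty = new DOMDocument();
    $pretty->preserveWhiteSpace = false;
    $pretty->loadXML($xml);
    $pretty->formatOutput = true;

    // Passing the root element skips the XML declaration.
    return $pretty->saveXML($pretty->documentElement);
}

echo prettyPrintViaSimpleXml(
    '<html><head><title>t</title></head><body><p>hi</p></body></html>'
);
```

The double parse is what makes this slow, which matches the caveat above.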
Beautify HTML stored in a string on PHP
Using DOMDocument, we load the HTML with the LIBXML_HTML_NOIMPLIED flag, which prevents the loadHTML method from adding the extra html wrapper.
We save as XML to get the nice indentation, passing $dom->documentElement as the parameter to suppress the XML header.
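$html = '<body><div><p>hello</p><div></body>';
$dom = new DOMDocument();
// drop whitespace-only text nodes so the formatter can indent
$dom->preserveWhiteSpace = false;
// LIBXML_HTML_NOIMPLIED: do not add the implied html wrapper
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED);
$dom->formatOutput = true;
// serializing only the root element avoids the XML header
print $dom->saveXML($dom->documentElement);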
This will output
<body>
  <div>
    <p>hello</p>
    <div/>
  </div>
</body>
Notice that the HTML was fixed for you, as the second div should have been a closing tag, I assume.
If we pass proper HTML as the input string, the output will be as you require:
$html = '<body><div><p>hello</p></div></body>';
<body>
  <div>
    <p>hello</p>
  </div>
</body>
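The same steps can be wrapped in a small helper. This is only a sketch: the name beautifyFragment is hypothetical, the libxml_use_internal_errors() calls are an addition to keep parser warnings about sloppy markup out of the output, and it assumes the input has a single root element.

```php
<?php
// beautifyFragment is a hypothetical helper name, not part of any library.
function beautifyFragment(string $html): string
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    // Do not add the implied html wrapper around the fragment.
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED);
    $dom->formatOutput = true;

    // Restore the previous error handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Serialize only the root element to avoid the XML header.
    return $dom->saveXML($dom->documentElement);
}

echo beautifyFragment('<div><p>hello</p></div>');
```

Because the result is saved as XML, empty elements come out self-closed (as with the <div/> above), so this suits inspection more than serving the markup as HTML.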
Is there a pretty print for PHP?
Both print_r() and var_dump() will output visual representations of objects within PHP.
$arr = array('one' => 1);
print_r($arr);
var_dump($arr);
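When the output is viewed in a browser, the line breaks in these dumps collapse. A minimal sketch of one way around that, assuming the dump is meant for an HTML page: capture the text with the second argument of print_r() and wrap it in a pre element.

```php
<?php
$arr = array('one' => 1);

// With the second argument set to true, print_r() returns the text
// instead of printing it.
$dump = print_r($arr, true);

// Wrapping the escaped dump in <pre> keeps its line breaks and
// indentation visible in a browser.
echo '<pre>' . htmlspecialchars($dump) . '</pre>';
```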
How to keep PHP 'View Source' html output clean
That's something that's bugging me, too. The best you can do is use tidy to postprocess the text. Add this line to the start of your page (and be prepared for output buffering havoc when you hit your first PHP error with output buffering on):
ob_start('ob_tidyhandler');
Tidying PHP and HTML Code?
You could use HTML Tidy from within PHP to clean up your output. Use ob_start() and friends to get the whole HTML output as a string, then send it through Tidy. You might want to use some sort of caching if you do this, though.
<?php
function callback($buffer)
{
    // Clean up
    $config = array(
        'indent' => true,
        'output-xhtml' => true,
        'wrap' => 200);
    return tidy_repair_string($buffer, $config, 'utf8');
}
// Do some output.
ob_start("callback");
?>
<html>
<body>
<p>Outputting stuff here</p>
<p>
Testing a broken tag:
<span> This span should be closed by Tidy.
</p>
</body>
</html>
<?php
ob_end_flush();
?>
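If the tidy extension might not be installed, the callback can fall back to passing the buffer through unchanged. A hedged sketch, not part of the original answer: the name tidyOrPassThrough is hypothetical, and the configuration simply mirrors the one above.

```php
<?php
// tidyOrPassThrough is a hypothetical name for illustration.
function tidyOrPassThrough(string $buffer): string
{
    // Without the tidy extension, leave the output untouched.
    if (!function_exists('tidy_repair_string')) {
        return $buffer;
    }

    $config = array(
        'indent' => true,
        'output-xhtml' => true,
        'wrap' => 200);

    $clean = tidy_repair_string($buffer, $config, 'utf8');

    // tidy_repair_string() returns false on failure; keep the original then.
    return $clean === false ? $buffer : $clean;
}

// Used the same way as the callback above:
// ob_start('tidyOrPassThrough');
```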
PHP Tidy alternative to only tab-indent output
Two years later, there is still no library that indents HTML output without relying on a DOM API (i.e. Tidy and the like).
I've developed a library that tokenises HTML input using regular expressions. None of the HTML is changed beyond adding the spacing required for indentation.
https://github.com/gajus/dindent