PHP Domdocument Error Handling

PHP DOMDocument error handling

From what I can gather from the documentation, handling warnings issued by DOMDocument::load() is tricky because they are not generated by the libxml extension and thus cannot be handled by libxml_get_last_error(). You could either use the error suppression operator and check the return value for false...

if (@$xdoc->load($url) === false) {
// ...handle it
}

...or register an error handler which throws an exception on error:

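function exception_error_handler($errno, $errstr, $errfile, $errline) {
throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
}
set_error_handler("exception_error_handler");

...and then wrap the call to load() in a try/catch block that catches ErrorException.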
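Putting the pieces together, here is a minimal sketch (assuming PHP 7.1+; load_xml_or_exception() and missing.xml are hypothetical names) that converts warnings into ErrorException only for the duration of the load() call and restores the previous handler afterwards, so the rest of the application keeps its normal error handling:

```php
<?php
// Turn PHP warnings into ErrorException only while DOMDocument::load() runs.
// load_xml_or_exception() is a hypothetical helper, not part of PHP.
function load_xml_or_exception(string $path): DOMDocument
{
    set_error_handler(function (int $errno, string $errstr, string $errfile, int $errline): bool {
        throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
    });

    try {
        $doc = new DOMDocument();
        // If a warning was raised, the ErrorException propagates from this call,
        // so the explicit check only covers a failure without any warning.
        if ($doc->load($path) === false) {
            throw new RuntimeException("Could not load $path");
        }
        return $doc;
    } finally {
        // Reinstate whatever handler was active before, even after an exception.
        restore_error_handler();
    }
}

try {
    $doc = load_xml_or_exception('missing.xml'); // hypothetical file name
} catch (ErrorException | RuntimeException $e) {
    echo 'Could not load document: ', $e->getMessage(), PHP_EOL;
}
```

Scoping the handler this way avoids turning every notice in the whole application into an exception.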

PHP DOMDocument error handling

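Get rid of the $html variable and just load the file into $dom with @$dom->loadHTMLFile("http://example.com/");, then add an if statement below that to check whether loading succeeded. Test the return value of loadHTMLFile() (it returns false on failure) rather than $dom itself, which is always a DOMDocument object and therefore never empty.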
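A minimal sketch of that suggestion, assuming PHP 7.1+ (for the nullable return type). The helper name load_html_file_or_null() and the file name page.html are illustrative; an http:// URL should behave the same way, provided allow_url_fopen is enabled.

```php
<?php
// Suppress the warning with @ and rely on loadHTMLFile()'s return value.
// load_html_file_or_null() is a hypothetical helper name.
function load_html_file_or_null(string $source): ?DOMDocument
{
    $dom = new DOMDocument();
    if (@$dom->loadHTMLFile($source) === false) {
        return null; // missing file, unreachable URL, ...
    }
    return $dom;
}

$dom = load_html_file_or_null('page.html'); // hypothetical local file
if ($dom === null) {
    echo 'Could not load the page', PHP_EOL;
} else {
    echo $dom->getElementsByTagName('title')->length, ' <title> element(s) found', PHP_EOL;
}
```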

Why does this error handling function cause DOMDocument to hang?

It's not the custom error handler that is causing the error.

I ran the following code without a custom error handler:

$output = file_get_contents("http://www.ssense.com/women/designers/all/all/page_1");
$dom = new DOMDocument();
$dom->loadHTML($output);
$xpath = new DOMXPath($dom);

When I ran it, I got a ton of warning messages similar to the ones in your error handler.

I think the problem you're seeing is just that your error handler is reporting errors that PHP isn't reporting by default.

By default, the level of error reporting is determined by your php.ini settings, but can be overridden by using the error_reporting() function. When you set your own error handler, you have to determine for yourself what level of reporting you want to deal with. Your error handler will be called on every error and notice, and so you will output error messages for everything unless you explicitly check the error being generated against the current error_reporting() level.

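Remember that the @ error suppression operator only changes the error_reporting() level temporarily, for that single expression. Before PHP 8.0 it set the level to 0; as of PHP 8.0 it lowers it to fatal errors only (E_ERROR | E_CORE_ERROR | E_COMPILE_ERROR | E_USER_ERROR | E_RECOVERABLE_ERROR | E_PARSE). For example, this line:

@$dom->loadHTML($output);

is, on PHP 7, roughly shorthand for the following: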

$errorLevel = error_reporting(0);
$dom->loadHTML($output);
error_reporting($errorLevel);
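To see what @ actually does to error_reporting() while a custom handler runs, here is a small sketch (PHP 7+; the closure and $levelsSeen are illustrative only). It records the level the handler sees for an unsuppressed and a suppressed warning:

```php
<?php
// Record what error_reporting() returns inside a custom handler,
// with and without the @ operator on the triggering statement.
$levelsSeen = [];

set_error_handler(function (int $errno, string $errstr) use (&$levelsSeen): bool {
    $levelsSeen[] = error_reporting();
    return true; // handled; skip PHP's built-in handler
});

error_reporting(E_ALL);
trigger_error('not suppressed', E_USER_WARNING);
@trigger_error('suppressed', E_USER_WARNING);

restore_error_handler();

// PHP 7: the second value is 0. PHP 8+: it is a mask of fatal error types
// only, so in both cases it no longer includes E_USER_WARNING.
var_dump($levelsSeen);
```

Note that the handler is called in both cases; @ only changes what error_reporting() reports.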

Because a custom handler bypasses PHP's normal error reporting, the @ operator on its own does not silence anything: your handler is still called, whatever the current error_reporting() level is. You have to check that level inside your error handler and act accordingly, for example:

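function my_error_handler($errno, $errstr, $errfile, $errline) {
// Skip errors excluded by the current level, including those silenced with @.
// (Checking error_reporting() == 0 only works before PHP 8.0.)
if (!(error_reporting() & $errno)) {
return false; // let PHP's standard handler deal with it (it stays silent)
}

// normal error handling here
return true;
}
set_error_handler('my_error_handler');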
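A runnable sketch of a handler that respects error_reporting(), using a bitmask check that also works on PHP 8 where @ no longer sets the level to 0. Malformed XML passed to loadXML() is used here simply as a reliable way to produce DOM warnings; $collected is an illustrative sink for the messages.

```php
<?php
// A handler that honours error_reporting(), so @ still silences single calls.
// $collected is an illustrative sink for the messages.
$collected = [];

set_error_handler(function (int $errno, string $errstr) use (&$collected): bool {
    if (!(error_reporting() & $errno)) {
        // Excluded by the current level (e.g. via @): hand the error back to
        // PHP's standard handler, which respects that level and stays silent.
        return false;
    }
    $collected[] = $errstr; // "normal error handling" goes here
    return true;            // handled; skip PHP's standard handler
});

error_reporting(E_ALL);

$dom = new DOMDocument();
$dom->loadXML('<root><a></root>');   // malformed and not suppressed: collected
$countAfterLoud = count($collected);
@$dom->loadXML('<root><b></root>');  // malformed but suppressed: ignored

restore_error_handler();
print_r($collected);
```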

My assumption is that, when not using a custom error handler, PHP is simply using a default error_reporting() level that excludes the warnings being produced.

If you add error_reporting(E_ALL); to the top of your code (E_ALL already includes E_STRICT as of PHP 5.4), you will see those same errors even when you don't have your custom error handler enabled.

Handling 500 Internal Server Error from DomDocument in Laravel 5

You could handle this in Laravel's app/Exceptions/Handler.php.

NB: I looked into using the DOMException class that PHP provides; however, the error you are getting is not really an exception but an I/O warning.

This is what PHP's native DOMException looks like:

/**
* DOM operations raise exceptions under particular circumstances, i.e.,
* when an operation is impossible to perform for logical reasons.
* @link http://php.net/manual/en/class.domexception.php
*/
class DOMException extends Exception {

/**
* @var
* (PHP 5)<br/>
* An integer indicating the type of error generated
* @link http://php.net/manual/en/class.domexception.php#domexception.props.code
*/
public $code;
}

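Since the error is a PHP warning rather than an exception, it cannot be caught as a DOMException. Laravel does, however, convert PHP warnings into ErrorException instances, so the warning still reaches the exception handler and you can check its message there. Add this to your app/Exceptions/Handler.php: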

<?php

namespace App\Exceptions;

use Exception;
use Illuminate\Foundation\Exceptions\Handler as ExceptionHandler;

class Handler extends ExceptionHandler
{
/**
* A list of the exception types that should not be reported.
*
* @var array
*/
protected $dontReport = [
\Symfony\Component\HttpKernel\Exception\HttpException::class,
];

/**
* Report or log an exception.
*
* This is a great spot to send exceptions to Sentry, Bugsnag, etc.
*
* @param \Exception $e
* @return void
*/
public function report(Exception $e)
{
return parent::report($e);
}

/**
* Render an exception into an HTTP response.
*
* @param \Illuminate\Http\Request $request
* @param \Exception $e
* @return \Illuminate\Http\Response
*/
public function render($request, Exception $e)
{

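$message = $e->getMessage();

if (str_contains($message, 'DOMDocument::loadHTMLFile(): I/O warning')) {
// Redirect back: redirecting to $request->fullUrl() would hit the same failing page again.
return redirect()->back()->with('error', "This team does not exist");
}

// We could also handle a DOMException like so, but not the DOM warning
if ($e instanceof \DOMException) {
return redirect()->back()->with('error', "Your friendly message here");
}

// Everything else: keep Laravel's default rendering (without this, other errors render a blank page).
return parent::render($request, $e);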
}
}

NB: Be careful when modifying Handler.php: if you don't know what you are doing, you might start getting blank pages instead of Laravel's Whoops page or error messages. Make a backup somewhere first if you are unsure.
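The decision logic in render() can be tried outside Laravel with a framework-free sketch. friendly_dom_error_message() is a hypothetical helper and the messages are the placeholders from the handler above. Keep in mind that the exact wording of libxml warnings can vary between PHP and libxml versions, so matching on a message substring is somewhat fragile.

```php
<?php
// Framework-free sketch of the checks made in render().
// friendly_dom_error_message() is a hypothetical helper name.
function friendly_dom_error_message(Throwable $e): ?string
{
    // A PHP warning converted into an exception keeps the original warning
    // text, so it can only be recognised by its message.
    if (strpos($e->getMessage(), 'DOMDocument::loadHTMLFile(): I/O warning') !== false) {
        return 'This team does not exist';
    }
    // A genuine DOMException can be matched by type.
    if ($e instanceof DOMException) {
        return 'Your friendly message here';
    }
    return null; // not ours: fall back to the default rendering
}

$warning = new ErrorException('DOMDocument::loadHTMLFile(): I/O warning : failed to load external entity "x"', 0, E_WARNING);
var_dump(friendly_dom_error_message($warning));
```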

How to validate DOMDocument object

Use libxml_use_internal_errors() to handle the errors yourself. Example from http://php.net/manual/de/function.libxml-use-internal-errors.php:

// enable user error handling
libxml_use_internal_errors(true);

// load the document
$doc = new DOMDocument;

if (!$doc->load('file.xml')) {
foreach (libxml_get_errors() as $error) {
// handle errors here
}

libxml_clear_errors();
}
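A slightly fuller sketch of the same idea, assuming PHP 7+. It parses an XML string instead of file.xml so it is self-contained, turns each LibXMLError into a readable line, and restores the caller's previous libxml_use_internal_errors() setting afterwards. parse_xml_collecting_errors() is a hypothetical helper name.

```php
<?php
// Parse XML with libxml's internal error buffer enabled and return the
// collected errors as readable strings (an empty array means no errors).
// parse_xml_collecting_errors() is a hypothetical helper name.
function parse_xml_collecting_errors(string $xml): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors(); // start from an empty buffer
    try {
        $doc = new DOMDocument();
        $doc->loadXML($xml);
        $messages = [];
        foreach (libxml_get_errors() as $error) {
            // LibXMLError exposes level, code, column, message, file and line.
            $messages[] = sprintf('line %d, column %d: %s', $error->line, $error->column, trim($error->message));
        }
        return $messages;
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous); // restore the caller's setting
    }
}

print_r(parse_xml_collecting_errors('<root><a></root>'));
```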

DOMDocument::loadHTML error

Header, Nav and Section are HTML5 elements. Because the HTML5 authors felt that Public and System Identifiers were too difficult to remember, the DOCTYPE declaration is just:

<!DOCTYPE html>

In other words, there is no DTD to check against, so DOM falls back to the HTML 4 Transitional DTD, which doesn't contain those elements; hence the warnings.

To suppress the warnings, put

libxml_use_internal_errors(true);

before the call to loadHTML and

libxml_use_internal_errors(false);

after it.
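A minimal sketch of that approach, assuming PHP 7+. Instead of unconditionally switching back to false, it restores whatever setting was active before and discards the buffered warnings; load_html5_quietly() is a hypothetical helper name.

```php
<?php
// Load HTML5 markup without emitting libxml's "invalid tag" warnings,
// then restore the caller's libxml_use_internal_errors() setting.
// load_html5_quietly() is a hypothetical helper name.
function load_html5_quietly(string $html): DOMDocument
{
    $previous = libxml_use_internal_errors(true);
    try {
        $dom = new DOMDocument();
        $dom->loadHTML($html);
        return $dom;
    } finally {
        libxml_clear_errors(); // discard the buffered warnings
        libxml_use_internal_errors($previous);
    }
}

$dom = load_html5_quietly('<!DOCTYPE html><html><body><nav>Menu</nav><section>Text</section></body></html>');
echo $dom->getElementsByTagName('nav')->length, PHP_EOL;
```

The unknown elements are still kept in the tree; only the warnings about them are silenced.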

An alternative would be to use https://github.com/html5lib/html5lib-php.

DomDocument Validate Error formatting

See schaffhirt's user-contributed note on the PHP manual page for DOMDocument::validate. It contains a class that does exactly that:

<?php
class MyDOMDocument {
private $_delegate;
private $_validationErrors;

public function __construct (DOMDocument $pDocument) {
$this->_delegate = $pDocument;
$this->_validationErrors = array();
}

public function __call ($pMethodName, $pArgs) {
if ($pMethodName == "validate") {
set_error_handler(array($this, "onValidateError"));
try {
$rv = $this->_delegate->validate();
} finally {
restore_error_handler(); // always reinstate whatever handler was active before
}
return $rv;
}
else {
return call_user_func_array(array($this->_delegate, $pMethodName), $pArgs);
}
}
public function __get ($pMemberName) {
if ($pMemberName == "errors") {
return $this->_validationErrors;
}
else {
return $this->_delegate->$pMemberName;
}
}
public function __set ($pMemberName, $pValue) {
$this->_delegate->$pMemberName = $pValue;
}
public function onValidateError ($pNo, $pString, $pFile = null, $pLine = null, $pContext = null) {
$this->_validationErrors[] = preg_replace("/^.+: */", "", $pString);
}
}
?>

<?php
// $doc is a DOMDocument object
$myDoc = new MyDOMDocument($doc); // wraps $doc (delegation, not a copy)

// do anything with $myDoc that you would with $doc

$isValid = $myDoc->validate(); // won't create warnings
if (!$isValid) {
print_r($myDoc->errors); // the array all warnings are collected in
}
?>
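An alternative to wrapping validate() in a temporary error handler is to let libxml buffer the validation errors. A minimal sketch, assuming PHP 7.4+ (arrow function) and a document with an internal DTD; validate_collecting_errors() is a hypothetical helper name:

```php
<?php
// Validate against the document's DTD and collect the messages via
// libxml's internal error buffer instead of a PHP error handler.
// validate_collecting_errors() is a hypothetical helper name.
function validate_collecting_errors(DOMDocument $doc): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();
    try {
        $valid = $doc->validate();
        $messages = array_map(
            fn (LibXMLError $e): string => trim($e->message),
            libxml_get_errors()
        );
        return [$valid, $messages];
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}

$dtd = '<!DOCTYPE root [<!ELEMENT root (a)><!ELEMENT a (#PCDATA)>]>';
$doc = new DOMDocument();
$doc->loadXML($dtd . '<root><b/></root>'); // <b> is not allowed by the DTD
[$isValid, $errors] = validate_collecting_errors($doc);
if (!$isValid) {
    print_r($errors);
}
```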

How to detect when saveHTML fails?

I would recommend reading the PHP documentation page on DOMDocument::saveHTML.

When DOMDocument::saveHTML fails, it will return false. If you would like to disable libxml errors and fetch them on your own, consider using libxml_use_internal_errors.

<?php

libxml_use_internal_errors(true);

$doc = new DOMDocument('1.0');

$root = $doc->createElement('html');
$root = $doc->appendChild($root);

$head = $doc->createElement('head');
$head = $root->appendChild($head);

$title = $doc->createElement('title');
$title = $head->appendChild($title);

$text = $doc->createTextNode('This is the title');
$text = $title->appendChild($text);

$saved = $doc->saveHTML();

if ($saved === false) {
echo 'Unable to save DOM document';
} else {
echo $saved;
}

To fetch any errors that occurred while libxml error reporting was disabled (that is, after calling libxml_use_internal_errors(true)), use libxml_get_errors:

if ($errors = libxml_get_errors()) {
foreach ($errors as $error) {
echo $error->message . PHP_EOL;
}
}

If you only need to know whether an error occurred (or only need the most recent one), you could use libxml_get_last_error instead, which returns a LibXMLError object if an error occurred and false otherwise.
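A short sketch of the libxml_get_last_error() approach, assuming PHP 7+. It parses a deliberately malformed string (illustrative input) with internal errors enabled and inspects only the last error:

```php
<?php
// Only the most recent libxml error is inspected here.
$previous = libxml_use_internal_errors(true);
libxml_clear_errors();

$doc = new DOMDocument();
$doc->loadXML('<root><unclosed></root>'); // illustrative malformed input

$last = libxml_get_last_error(); // LibXMLError object, or false if no error
if ($last !== false) {
    printf("libxml error %d on line %d: %s\n", $last->code, $last->line, trim($last->message));
}

// Clean up and restore the caller's setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```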


