Disable warnings when loading non-well-formed HTML by DomDocument (PHP)
You can install a temporary error handler with set_error_handler:
class ErrorTrap {
    protected $callback;
    protected $errors = array();

    function __construct($callback) {
        $this->callback = $callback;
    }

    // Invoke the callback with the given arguments while trapping PHP errors.
    function call() {
        set_error_handler(array($this, 'onError'));
        try {
            return call_user_func_array($this->callback, func_get_args());
        } finally {
            // Always restore the previous handler, even if the callback throws.
            restore_error_handler();
        }
    }

    // Record the error instead of letting PHP print it.
    function onError($errno, $errstr, $errfile, $errline) {
        $this->errors[] = array($errno, $errstr, $errfile, $errline);
        return true;
    }

    function ok() {
        return count($this->errors) === 0;
    }

    function errors() {
        return $this->errors;
    }
}
Usage:
// create a DOM document and load the HTML data
$xmlDoc = new DomDocument();
$caller = new ErrorTrap(array($xmlDoc, 'loadHTML'));
// this doesn't dump out any warnings ($fetchResult is the HTML string to parse)
$caller->call($fetchResult);
if (!$caller->ok()) {
    var_dump($caller->errors());
}
PHP DOMDocument errors/warnings on html5-tags
No, there is no way of specifying a particular doctype to use, or to modify the requirements of the existing one.
Your best workable solution is to suppress the warnings with libxml_use_internal_errors, which makes libxml collect parse errors internally instead of emitting them:
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML('...');
libxml_clear_errors();
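If you still want to know what the parser complained about, the buffered errors can be read before they are cleared. This is a minimal sketch; the sample markup is an assumption made up for illustration, and whether a given tag produces an error at all depends on the libxml version installed.

```php
<?php
// Remember the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
// Sample markup (an assumption for this sketch); <section> is an HTML5 tag.
$dom->loadHTML('<html><body><section><p>Hello</p></section></body></html>');

// Copy the buffered errors, then empty the buffer and restore the setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    // Each entry is a LibXMLError object with level, code, line and message.
    printf("line %d: %s\n", $error->line, trim($error->message));
}
```

Restoring the previous setting keeps the change local, so other code that relies on the default libxml behaviour is not affected.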
Basic xPath getting lots of warnings
Disable the warnings with libxml_use_internal_errors(true):
http://www.php.net/manual/en/function.libxml-use-internal-errors.php
The HTML is malformed, and there is nothing you can really do about that if you do not control it.
DOMDocument::loadHTML(): Empty string supplied as input
Alright @Bruce, I understand the issue now. What you want to do is test the return value of file_get_contents() before passing it on:
<?php
error_reporting(-1);
ini_set("display_errors", 1);

$article_url = 'http://google.com';
if (isset($article_url)) {
    $title = 'contact us';
    $min_words = 500; // minimum stated in the message below
    $str = @file_get_contents($article_url);

    // Stop here if the fetch failed or returned nothing,
    // so loadHTML() never receives an empty string
    if ($str === false || $str === '') {
        echo 'problem getting url';
        return false;
    }

    // Continue
    $test1 = str_word_count(strip_tags(strtolower($str)));
    if ($test1 >= $min_words) {
        echo '<div><i class="fa fa-check-square-o" style="color:green"></i> This article has ' . $test1 . ' words.</div>';
    } else {
        echo '<div><i class="fa fa-times-circle-o" style="color:red"></i> This article has ' . $test1 . ' words. You are required to have a minimum of ' . $min_words . ' words.</div>';
    }

    $document = new DOMDocument();
    // Buffer libxml warnings while parsing, then restore the previous setting
    $libxml_previous_state = libxml_use_internal_errors(true);
    $document->loadHTML($str);
    libxml_use_internal_errors($libxml_previous_state);

    $tags = array('h1', 'h2');
    $texts = array();
    foreach ($tags as $tag) {
        $elementList = $document->getElementsByTagName($tag);
        foreach ($elementList as $element) {
            // Append, so several headings with the same tag are all kept
            $texts[] = strtolower(trim($element->textContent));
        }
    }

    if (in_array(strtolower($title), $texts)) {
        echo '<div><i class="fa fa-check-square-o" style="color:green"></i> This article used the correct title tag.</div>';
    } else {
        echo "no";
    }
}
?>
In short: if $str is false (or an empty string), report an error and don't let the script continue. You can return false like I am doing, or wrap the rest of the script in an if/else.
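The same guard can be wrapped in a small function so that loadHTML() is never reached with unusable input. This is only a sketch: loadHtmlOrNull is a hypothetical helper name, not part of PHP or of the code above.

```php
<?php
// Hypothetical helper: returns null instead of calling loadHTML() on
// input that is not a string or contains only whitespace.
function loadHtmlOrNull($html)
{
    if (!is_string($html) || trim($html) === '') {
        return null;
    }

    // Buffer parser warnings, then restore the previous libxml setting.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $loaded = $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $loaded ? $document : null;
}

// Usage: a failed file_get_contents() returns false, which is rejected here.
$document = loadHtmlOrNull(false);
if ($document === null) {
    echo "problem getting url\n";
}
```

The caller then has a single null check to make, whether the fetch failed or simply returned an empty body.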
Having difficulties parsing dirty html code with PHP DOMDocument
There is no clean way to parse HTML with namespaces using DOMDocument without losing the namespaces, but there are some workarounds:
- Use another parser that accepts namespaces in HTML code. Look here for a nice and detailed list of HTML parsers. This is probably the most efficient way to do it.
If you want to stick with DOMDocument you basically have to pre- and postprocess the code.
Before you send the code to DOMDocument->loadHTML, use regex, loops or whatever you want to find all namespaced tags and add a custom attribute to the opening tags containing the namespace.
<fb:like send="true" width="450" show_faces="true"></fb:like>
would then result in
<fb:like xmlNamespace="fb" send="true" width="450" show_faces="true"></fb:like>
Now give the edited code to DOMDocument->loadHTML. It will strip out the namespaces but it will keep the attributes resulting in
<like xmlNamespace="fb" send="true" width="450" show_faces="true"></like>
Now (again using regex, loops or whatever you want) find all tags with the attribute xmlNamespace and replace the attribute with the actual namespace. Don't forget to also add the namespace to the closing tags!
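The two string-processing steps could be sketched roughly as follows. The function names markNamespaces and restoreNamespaces are hypothetical, the regular expressions only handle simple tags like the fb:like example, and the sketch assumes the parser strips the prefix as described above. The attribute is matched case-insensitively in case the parser changes the case of attribute names.

```php
<?php
// Step 1 (before loadHTML): <fb:like ...> becomes <fb:like xmlNamespace="fb" ...>
function markNamespaces($html)
{
    return preg_replace(
        '/<([a-z][a-z0-9]*):([a-z][a-z0-9]*)\b/i',
        '<$1:$2 xmlNamespace="$1"',
        $html
    );
}

// Step 2 (after saving the HTML): <like xmlNamespace="fb" ...> becomes <fb:like ...>
function restoreNamespaces($html)
{
    $prefixes = array();

    // Rewrite the opening tags and remember which tag had which prefix.
    $html = preg_replace_callback(
        '/<([a-z][a-z0-9]*)\s+xmlNamespace="([^"]+)"/i',
        function (array $m) use (&$prefixes) {
            $prefixes[strtolower($m[1])] = $m[2];
            return '<' . $m[2] . ':' . $m[1];
        },
        $html
    );

    // Add the prefix to the matching closing tags as well.
    foreach ($prefixes as $tag => $prefix) {
        $html = preg_replace(
            '/<\/' . preg_quote($tag, '/') . '>/i',
            '</' . $prefix . ':' . $tag . '>',
            $html
        );
    }

    return $html;
}
```

Note that the closing-tag step renames every closing tag with that name, so a document that mixes a namespaced tag and a plain tag of the same name would need a smarter approach.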
PHP domDocument parsing with HTML Table ( PHP Fatal error: Call to a member function getElementsByTagName() on a non-object)
The DOMNodeList returned by getElementsByTagName is zero-indexed, meaning that in this case $tables[1] does not exist (you only have one table in the HTML, and that table is the first item, index 0), so you need to change the definition of $rows to this:
$rows = $tables->item(0)->getElementsByTagName('tr');
You also have an error in the loop: you can't refer to a DOMNodeList with an index the way you are. You'd need to change the assignment of $betreffzeile to this:
$betreffzeile .= $cols->item(2)->nodeValue;
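Put together, the corrected access pattern might look like the sketch below. The sample table is an assumption made up for illustration; only $tables, $rows, $cols and $betreffzeile come from the question.

```php
<?php
// Sample markup (an assumption): one table, two rows, three columns each.
$html = '<table>'
      . '<tr><td>a</td><td>b</td><td>c</td></tr>'
      . '<tr><td>d</td><td>e</td><td>f</td></tr>'
      . '</table>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$tables = $dom->getElementsByTagName('table');
$betreffzeile = '';

// item() returns null for an index that does not exist, so check first.
if ($tables->length > 0) {
    $rows = $tables->item(0)->getElementsByTagName('tr');
    foreach ($rows as $row) {
        $cols = $row->getElementsByTagName('td');
        $cell = $cols->item(2); // third column, zero-indexed
        if ($cell !== null) {
            $betreffzeile .= $cell->nodeValue;
        }
    }
}

echo $betreffzeile, "\n";
```

Checking length and the null return of item() is what prevents the "Call to a member function ... on a non-object" fatal error when the expected table or cell is missing.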
Hope this helps.
domDocument is not returning node information
You are looking for $dom->documentElement, which returns a DOMNode object.
Also: get rid of the htmlentities call, because it will mess up the HTML code you fetch. For example, < becomes &lt;, which loadHTML won't interpret as a <. Take a look at: Disable warnings when loading non-well-formed HTML by DomDocument (PHP)
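To see the effect of htmlentities on parsing, compare how many elements the parser finds in the raw and in the escaped string. This is a small sketch; countH1 is a hypothetical helper and the sample markup is made up for illustration.

```php
<?php
$raw = '<h1>Title</h1>';
$escaped = htmlentities($raw); // the angle brackets are now &lt; and &gt;

// Hypothetical helper: parse a string and count its <h1> elements.
function countH1($html)
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom->getElementsByTagName('h1')->length;
}

$rawCount = countH1($raw);         // markup is parsed as an element
$escapedCount = countH1($escaped); // markup is parsed as plain text

echo "raw: $rawCount, escaped: $escapedCount\n";
```

The escaped string ends up as one text node, so there are no child elements left for the dump function to walk.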
Dummy-Dump:
function dump(DOMNode $node)
{
    echo $node->nodeName;
    if ($node->hasChildNodes())
    {
        echo '<div style="margin-left:20px; border-left:1px solid black; padding-left: 5px;">';
        foreach ($node->childNodes as $childNode)
        {
            dump($childNode);
        }
        echo '</div>';
    }
}
dump($dom->documentElement);
Parsing html code with html error problem
The page is written in very old HTML (you can tell by the FONT tags, the capitalization, etc.), so <br> tags, and probably paragraphs and other elements as well, are not closed. I recommend using regular expressions to find them in this case.