How to Use Dom PHP Parser

First, note that you can't use the same id on two different divs; that is what classes are for. Every element should have a unique id.

Code to get the contents of the div with id="interestingbox":

$html = '
<html>
<head></head>
<body>
<div id="interestingbox">
<div id="interestingdetails" class="txtnormal">
<div>Content1</div>
<div>Content2</div>
</div>
</div>

<div id="interestingbox2"><a href="#">a link</a></div>
</body>
</html>';

$dom_document = new DOMDocument();

$dom_document->loadHTML($html);

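// use DOMXPath to navigate the HTML through the DOM
$dom_xpath = new DOMXPath($dom_document);

// get the div with id="interestingbox"
// (without a context node the query is relative to the root <html> element,
// so */div matches divs that are direct children of <head> or <body>)
$elements = $dom_xpath->query("*/div[@id='interestingbox']");

// query() returns false when the expression is invalid
if ($elements !== false) {
    foreach ($elements as $element) {
        echo "\n[" . $element->nodeName . "]";

        // print the text content of each child node
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            echo $node->nodeValue . "\n";
        }
    }
}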

//OUTPUT (blank lines omitted)
[div]
Content1
Content2

Example with classes:

$html = '
<html>
<head></head>
<body>
<div class="interestingbox">
<div id="interestingdetails" class="txtnormal">
<div>Content1</div>
<div>Content2</div>
</div>
</div>

<div class="interestingbox"><a href="#">a link</a></div>
</body>
</html>';

// the same as before; only the XPath expression changes

[...]

$elements = $dom_xpath->query("*/div[@class='interestingbox']");

[...]

//OUTPUT (blank lines omitted)
[div]
Content1
Content2

[div]
a link

Refer to the DOMXPath page in the PHP manual for more details.

Use PHP Simple HTML DOM Parser to find table cell and get contents of next sibling

It can be done using the DOMXPath class. You won't need an external library for this.

Here is an example:

<?php

$html = <<<EOF
<tr>
<td>fluff</td>
<td>irrelevant</td>
<td>etc</td>
<td><a href="one">Hello world</a></td>
<td>123.456</td>
<td>fluff</td>
<td>irrelevant</td>
<td>etc</td>
</tr>
EOF;

// create empty document
$document = new DOMDocument();

// load html
$document->loadHTML($html);

// create xpath selector
$selector = new DOMXPath($document);

// select the parent <td> of each <a> node
// whose text content is 'Hello world'
$results = $selector->query('//td/a[text()="Hello world"]/..');

// output the results
foreach ($results as $node) {
    echo $node->nodeValue . PHP_EOL;
}
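
The query above returns the cell that contains the link itself. If the goal is the contents of the next cell in the row, one possible sketch is to step along the following-sibling axis. The markup below is a shortened, hypothetical version of the row above, wrapped in a table element so it parses predictably; the variable names are illustrative only.

```php
<?php
// Hypothetical sample markup, shortened from the example above.
$html = <<<EOF
<table>
<tr>
<td>fluff</td>
<td><a href="one">Hello world</a></td>
<td>123.456</td>
<td>etc</td>
</tr>
</table>
EOF;

$document = new DOMDocument();
$document->loadHTML($html);
$selector = new DOMXPath($document);

// Pick the <td> that contains the matching <a>, then step to
// the first <td> that follows it in the same row.
$query = '//td[a[text()="Hello world"]]/following-sibling::td[1]';

$values = [];
foreach ($selector->query($query) as $cell) {
    $values[] = trim($cell->nodeValue);
}

echo implode(PHP_EOL, $values) . PHP_EOL; // prints 123.456
```

Using following-sibling::td[1] rather than nextSibling skips the whitespace text nodes that sit between the cells.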

PHP Simple HTML DOM Parser - loop

You probably want to

  1. Find all ul.dane elements
  2. Inside each ul, search for li elements
  3. Inside each li, search for div.name and div.value elements

In that case the problem with your code is that you forgot to find each li element inside each ul, which would be step 2. Try this:

$articles = [];
foreach ($html->find('ul.dane') as $ul) {
    foreach ($ul->find('li') as $article) {
        // build a fresh item for every <li>
        $item = [];
        $item['name'] = $article->find('div.name', 0)->plaintext;
        $item['value'] = $article->find('div.value', 0)->plaintext;
        $articles[] = $item;
    }
}
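
For comparison, the same three steps can be sketched with the built-in DOMDocument and DOMXPath classes instead of Simple HTML DOM. This is an alternative technique, not the library used in the question, and the markup is an assumption because the question's real HTML is not shown here.

```php
<?php
// Hypothetical markup: a ul.dane list with name/value pairs.
$html = <<<EOF
<ul class="dane">
<li><div class="name">Color</div><div class="value">Red</div></li>
<li><div class="name">Size</div><div class="value">Large</div></li>
</ul>
EOF;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$articles = [];
// Step 1: every ul.dane; step 2: every li inside it.
foreach ($xpath->query('//ul[@class="dane"]') as $ul) {
    foreach ($xpath->query('.//li', $ul) as $li) {
        // Step 3: the name and value divs, queried relative to this li.
        $name  = $xpath->query('./div[@class="name"]', $li)->item(0);
        $value = $xpath->query('./div[@class="value"]', $li)->item(0);
        if ($name === null || $value === null) {
            continue; // skip items that are missing either div
        }
        $articles[] = [
            'name'  => trim($name->nodeValue),
            'value' => trim($value->nodeValue),
        ];
    }
}

print_r($articles);
```

Passing the current node as the second argument to query() keeps each inner search scoped to that node, which mirrors calling find() on $ul and $article.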

Parse HTML with PHP's HTML DOMDocument

If you want to get:

  • The text
  • that's inside a <div> tag with class="text"
  • that's, itself, inside a <div> with class="main"

I would say the easiest way is not to use DOMDocument::getElementsByTagName, which returns all tags that have a specific name (while you only want some of them).

Instead, I would use an XPath query on your document, using the DOMXPath class.

For example, something like this should do to load the HTML string into a DOM object and instantiate the DOMXPath class:

$html = <<<HTML
<div class="main">
<div class="text">
Capture this text 1
</div>
</div>

<div class="main">
<div class="text">
Capture this text 2
</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);


Then you can run XPath queries with the DOMXPath::query method, which returns the list of elements you were searching for:

$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}

Executing this gives me the following output:

string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
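
Note that @class="main" compares the whole attribute value, so a div with class="main featured" would not match. A common XPath 1.0 idiom for matching a single class token is sketched below; the extra class name featured and the simplified markup are assumptions made for illustration.

```php
<?php
// Hypothetical markup: the first outer div carries a second class.
$html = <<<HTML
<div class="main featured">
<div class="text">Capture this text 1</div>
</div>
<div class="main">
<div class="text">Capture this text 2</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact comparison: only class="main" with nothing else matches.
$exact = $xpath->query('//div[@class="main"]/div[@class="text"]');

// Token comparison: pad the normalized class list with spaces,
// then look for " main " inside it.
$token = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " main ")]/div[@class="text"]'
);

echo $exact->length . PHP_EOL; // prints 1
echo $token->length . PHP_EOL; // prints 2
```

The padding with spaces prevents a partial hit on a longer class name such as mainframe, which a plain contains(@class, "main") would accept.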

The use of the PHP Simple HTML DOM Parser when parsing large html files results in an error

MAX_FILE_SIZE is defined in simple_html_dom.php as 600000 bytes (about 600 KB), so larger documents fail to load.

You can raise the limit by editing this line in the simple_html_dom.php file: define('MAX_FILE_SIZE', 600000);

This worked for me.

How do I use the PHP Simple HTML DOM Parser to parse this?

Use XPath to find and extract elements in an HTML/XML document - specifically the SimpleXMLElement::xpath method.

The following example will find the telephone number for a location:

$doc = new DOMDocument();
$doc->loadHTML('your html snippet goes here - or use loadHTMLFile()');
$xml = simplexml_import_dom($doc);
$elements = $xml->xpath('//*[contains(@class, "dump-location")]/div[@class="SingleLinkNoTx"]/strong[@class="telephone"]');
print_r($elements);

The most complex part is the XPath expression. A quick breakdown:

  1. //
    • Searches all elements in the document, at any depth, for matches to the next rule.
  2. *[contains(@class, "dump-location")]
    • Matches any element whose class attribute contains the string dump-location.
  3. /
    • Restricts the next rule to direct children of the element matched so far.
  4. div[@class="SingleLinkNoTx"]
    • Matches any DIV element whose class attribute is exactly SingleLinkNoTx (and no other class name).
  5. /strong[@class="telephone"]
    • Matches STRONG elements that are direct children of that DIV and whose class attribute is exactly telephone.

Using this XPath expression on the HTML snippet provided in the question produces output like the following, which is fairly easy to iterate over and extract information from:

Array
(
    [0] => SimpleXMLElement Object
        (
            [@attributes] => Array
                (
                    [class] => telephone
                )

            [0] => (212) 555-1234
        )

)

If you know the document structure, it's possible to construct an XPath expression for each piece of information you want to extract. Alternatively, it might be simpler to use a more general XPath expression (say, one that retrieves all dump-location elements) and manually iterate through the elements.
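
The second approach might look roughly like the sketch below: fetch every dump-location element with one general expression, then run a short relative query inside each. The markup and the second phone number are hypothetical, modeled only on the class names used in the expression above.

```php
<?php
// Hypothetical markup based on the class names in the XPath above.
$html = <<<HTML
<div class="dump-location first">
<div class="SingleLinkNoTx"><strong class="telephone">(212) 555-1234</strong></div>
</div>
<div class="dump-location">
<div class="SingleLinkNoTx"><strong class="telephone">(212) 555-0100</strong></div>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);

$phones = [];
// General expression: grab every dump-location element first.
foreach ($xml->xpath('//*[contains(@class, "dump-location")]') as $location) {
    // Then search inside each one, relative to that element.
    $matches = $location->xpath('.//strong[@class="telephone"]');
    if (!empty($matches)) {
        $phones[] = trim((string) $matches[0]);
    }
}

print_r($phones);
```

Casting the SimpleXMLElement to a string yields its text content, so $phones ends up as a plain array of strings.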

Fetch content of all div with same class using PHP Simple HTML DOM Parser

In your example code, you have

echo $x = $html->find('h2[class="section-heading"]',1)->outertext; 

Because you are calling find() with a second parameter of 1, it returns only a single element: the one at index 1, which is the second match because the index is zero-based. If you instead find all of them, you can do whatever you need with each one:

$list = $html->find('h2[class="section-heading"]');
foreach ($list as $item) {
    echo $item->outertext . PHP_EOL;
}

The full code I've just tested is...

include(__DIR__ . "/simple_html_dom.php");
$html = file_get_html('http://campaignstudio.in/');

$list = $html->find('h2[class="section-heading"]');
foreach ($list as $item) {
    echo $item->outertext . PHP_EOL;
}

which gives the output...

<h2 class="section-heading text-white">We've got what you need!</h2>
<h2 class="section-heading">At Your Service</h2>
<h2 class="section-heading">Let's Get In Touch!</h2>

