How to use the DOM parser in PHP
First, note that you can't use the same id on two different divs; that is what classes are for. Every element should have a unique id.
Code to get the contents of the div with id="interestingbox":
$html = '
<html>
<head></head>
<body>
<div id="interestingbox">
<div id="interestingdetails" class="txtnormal">
<div>Content1</div>
<div>Content2</div>
</div>
</div>
<div id="interestingbox2"><a href="#">a link</a></div>
</body>
</html>';
$dom_document = new DOMDocument();
$dom_document->loadHTML($html);
// use DOMXPath to navigate the HTML through the DOM
$dom_xpath = new DOMXPath($dom_document);
// if you want to get the div with id=interestingbox
$elements = $dom_xpath->query("*/div[@id='interestingbox']");
// query() returns false (never null) when the expression is invalid
if ($elements !== false) {
    foreach ($elements as $element) {
        echo "\n[" . $element->nodeName . "]";
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            echo $node->nodeValue . "\n";
        }
    }
}
// OUTPUT (approximate: blank lines omitted, braces added for readability)
[div] {
Content1
Content2
}
Example with classes:
$html = '
<html>
<head></head>
<body>
<div class="interestingbox">
<div id="interestingdetails" class="txtnormal">
<div>Content1</div>
<div>Content2</div>
</div>
</div>
<div class="interestingbox"><a href="#">a link</a></div>
</body>
</html>';
// the same as before; just change the XPath expression
[...]
$elements = $dom_xpath->query("*/div[@class='interestingbox']");
[...]
// OUTPUT (approximate: blank lines omitted, braces added for readability)
[div] {
Content1
Content2
}
[div] {
a link
}
Refer to the DOMXPath page for more details.
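One caveat with the class example above: @class='interestingbox' compares the whole attribute value, so a div carrying a second class would not match. A minimal sketch of a token-based match follows; the markup (including the highlighted and interestingbox-wide class names) is made up for illustration.

```php
<?php
// Hypothetical markup: the first div carries two classes,
// the third only has a look-alike class name.
$html = '
<html><body>
<div class="interestingbox highlighted">first</div>
<div class="interestingbox">second</div>
<div class="interestingbox-wide">third</div>
</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact comparison: matches only when the whole attribute equals "interestingbox".
$exact = $xpath->query("//div[@class='interestingbox']");

// Token comparison: pad the normalized class list with spaces,
// then look for " interestingbox " as a whole word.
$token = $xpath->query(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' interestingbox ')]"
);

$exactText = [];
foreach ($exact as $node) {
    $exactText[] = $node->nodeValue;
}

$tokenText = [];
foreach ($token as $node) {
    $tokenText[] = $node->nodeValue;
}

echo implode(', ', $exactText), PHP_EOL; // second
echo implode(', ', $tokenText), PHP_EOL; // first, second
```

The padded contains() test is a common XPath 1.0 idiom; a plain contains(@class, 'interestingbox') would also match interestingbox-wide.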
Use PHP Simple HTML DOM Parser to find table cell and get contents of next sibling
It can be done using the DOMXPath class. You won't need an external library for this.
Here is an example:
<?php
$html = <<<EOF
<tr>
<td>fluff</td>
<td>irrelevant</td>
<td>etc</td>
<td><a href="one">Hello world</a></td>
<td>123.456</td>
<td>fluff</td>
<td>irrelevant</td>
<td>etc</td>
</tr>
EOF;
// create empty document
$document = new DOMDocument();
// load html
$document->loadHTML($html);
// create xpath selector
$selector = new DOMXPath($document);
// selects the parent node of each <a> node
// whose content is 'Hello world'
$results = $selector->query('//td/a[text()="Hello world"]/..');
// output the results
foreach ($results as $node) {
    echo $node->nodeValue . PHP_EOL;
}
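The query above returns the cell that contains the link. If the goal is the contents of the next cell, as the title suggests, the following-sibling axis can take that extra step. This is a sketch on a shortened copy of the same row, wrapped in a <table> so the markup is valid; it is one possible approach, not the only one.

```php
<?php
// Shortened copy of the row from the example, wrapped in a <table>.
$html = <<<EOF
<table>
<tr>
<td>fluff</td>
<td><a href="one">Hello world</a></td>
<td>123.456</td>
<td>etc</td>
</tr>
</table>
EOF;

$document = new DOMDocument();
$document->loadHTML($html);
$selector = new DOMXPath($document);

// Pick the <td> that contains the matching link,
// then step to the first <td> that follows it in the same row.
$results = $selector->query(
    '//td[a[text()="Hello world"]]/following-sibling::td[1]'
);

$values = [];
foreach ($results as $node) {
    $values[] = trim($node->nodeValue);
}

echo implode(PHP_EOL, $values), PHP_EOL; // 123.456
```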
PHP Simple HTML DOM Parser - loop
You probably want to:
- Find all ul.dane elements
- Inside each ul, search for li elements
- Inside each li, search for div.name and div.value elements
In that case the problem with your code is that you forgot to find each li element inside each ul, which would be step 2. Try this:
$articles = [];
foreach ($html->find('ul.dane') as $ul) {
    foreach ($ul->find('li') as $article) {
        $item = [];
        $item['name'] = $article->find('div.name', 0)->plaintext;
        $item['value'] = $article->find('div.value', 0)->plaintext;
        $articles[] = $item;
    }
}
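For comparison, the same three steps can be sketched with PHP's built-in DOMXPath instead of Simple HTML DOM. The markup below is hypothetical (only the ul.dane, div.name and div.value selectors come from the question), and the queries assume each element carries exactly one class.

```php
<?php
// Hypothetical markup that matches the selectors used above.
$html = '
<html><body>
<ul class="dane">
<li><div class="name">Color</div><div class="value">Red</div></li>
<li><div class="name">Size</div><div class="value">Large</div></li>
</ul>
</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$articles = [];
// Step 1: find all ul.dane elements.
foreach ($xpath->query('//ul[@class="dane"]') as $ul) {
    // Step 2: the second argument makes the query relative to the current <ul>.
    foreach ($xpath->query('./li', $ul) as $li) {
        // Step 3: read div.name and div.value inside the current <li>.
        $articles[] = [
            'name'  => trim($xpath->evaluate('string(./div[@class="name"])', $li)),
            'value' => trim($xpath->evaluate('string(./div[@class="value"])', $li)),
        ];
    }
}

print_r($articles);
```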
Parse HTML with PHP's HTML DOMDocument
If you want to get:
- The text
- that's inside a <div> tag with class="text"
- that's, itself, inside a <div> with class="main"
I would say the easiest way is not to use DOMDocument::getElementsByTagName, which will return all tags that have a specific name (while you only want some of them).
Instead, I would use an XPath query on your document, using the DOMXPath class.
For example, something like this should do, to load the HTML string into a DOM object and instantiate the DOMXPath class:
$html = <<<HTML
<div class="main">
<div class="text">
Capture this text 1
</div>
</div>
<div class="main">
<div class="text">
Capture this text 2
</div>
</div>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
And then you can use XPath queries with the DOMXPath::query method, which returns the list of elements you were searching for:
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}
And executing this gives me the following output :
string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
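To make the difference between the two approaches concrete, this small sketch counts what each one returns on a compacted copy of the same markup.

```php
<?php
$html = <<<HTML
<div class="main">
<div class="text">Capture this text 1</div>
</div>
<div class="main">
<div class="text">Capture this text 2</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

// Every <div> in the document, regardless of class or nesting.
$allDivs = $dom->getElementsByTagName('div');

// Only the div.text elements that are direct children of a div.main.
$xpath = new DOMXPath($dom);
$textDivs = $xpath->query('//div[@class="main"]/div[@class="text"]');

echo $allDivs->length, PHP_EOL;  // 4
echo $textDivs->length, PHP_EOL; // 2
```

With getElementsByTagName you would still have to filter the four divs by class and parent yourself; the XPath expression does that filtering in the query.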
The use of the PHP Simple HTML DOM Parser when parsing large html files results in an error
MAX_FILE_SIZE is defined in simple_html_dom to be 600KB.
You can raise the limit by editing this line in the simple_html_dom.php file:
define('MAX_FILE_SIZE', 600000);
This worked for me.
How do I use the PHP Simple HTML DOM Parser to parse this?
Use XPath to find and extract elements in an HTML/XML document - specifically the SimpleXMLElement::xpath method.
The following example will find the telephone number for a location:
$doc = new DOMDocument();
$doc->loadHTML('your html snippet goes here - or use loadHTMLFile()');
$xml = simplexml_import_dom($doc);
$elements = $xml->xpath('//*[contains(@class, "dump-location")]/div[@class="SingleLinkNoTx"]/strong[@class="telephone"]');
print_r($elements);
The most complex part is the XPath expression. A quick breakdown:
//
- Tells the parser to apply the rules that follow recursively to all elements in the document.
*[contains(@class, "dump-location")]
- Matches any element whose class attribute contains dump-location.
/
- Tells the parser to apply the next rule only to elements that have a dump-location parent.
div[@class="SingleLinkNoTx"]
- Matches any DIV element that has a SingleLinkNoTx class (and no other class name).
strong[@class="telephone"]
- Matches all the STRONG tags with a telephone class.
Using this XPath expression on the HTML snippet provided in the question will result in output like the following, which is fairly easy to iterate over and extract information from:
Array
(
[0] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => telephone
)
[0] => (212) 555-1234
)
)
If you know the document structure, it's possible to construct an XPath expression for each piece of information you want to extract. Or it might be simpler to use a more general XPath expression (say, an expression that retrieves all dump-location elements) and manually iterate through the elements.
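A rough sketch of that second approach: fetch every dump-location element with one general expression, then run a relative query on each. The markup is hypothetical; the class names come from the expression above, and the second phone number is made up.

```php
<?php
// Hypothetical markup built from the class names in the XPath expression.
$html = '
<html><body>
<div class="dump-location first">
<div class="SingleLinkNoTx"><strong class="telephone">(212) 555-1234</strong></div>
</div>
<div class="dump-location">
<div class="SingleLinkNoTx"><strong class="telephone">(212) 555-9876</strong></div>
</div>
</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);

$phones = [];
// General expression: every element whose class attribute contains "dump-location".
foreach ($xml->xpath('//*[contains(@class, "dump-location")]') as $location) {
    // Relative expression, evaluated against the current location element.
    $matches = $location->xpath('div[@class="SingleLinkNoTx"]/strong[@class="telephone"]');
    foreach ($matches as $strong) {
        // Casting a SimpleXMLElement to string yields its text content.
        $phones[] = (string) $strong;
    }
}

print_r($phones);
```

Inside the outer loop you could run further relative queries on $location to pull out other fields of the same record.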
Fetch content of all div with same class using PHP Simple HTML DOM Parser
In your example code, you have
echo $x = $html->find('h2[class="section-heading"]',1)->outertext;
Because you are calling find() with a second parameter of 1, it returns only a single element (the match at index 1, counting from zero). If you instead find all of them, you can do whatever you need with them:
$list = $html->find('h2[class="section-heading"]');
foreach ($list as $item) {
    echo $item->outertext . PHP_EOL;
}
The full code I've just tested is...
include(__DIR__."/simple_html_dom.php");
$html = file_get_html('http://campaignstudio.in/');
$list = $html->find('h2[class="section-heading"]');
foreach ($list as $item) {
    echo $item->outertext . PHP_EOL;
}
which gives the output...
<h2 class="section-heading text-white">We've got what you need!</h2>
<h2 class="section-heading">At Your Service</h2>
<h2 class="section-heading">Let's Get In Touch!</h2>