Parse HTML Table Using File_Get_Contents to PHP Array

Don't cripple yourself parsing HTML with regexps! Instead, let an HTML parser library worry about the structure of the markup for you.

I suggest you check out Simple HTML DOM (http://simplehtmldom.sourceforge.net/). It is a library written specifically to help with this kind of web scraping problem in PHP. With such a library, you can write your scraper in far fewer lines of code without having to craft working regexps.

In principle, with Simple HTML DOM you just write something like:

$html = file_get_html('http://flow935.com/playlist/flowhis.HTM');
foreach($html->find('tr') as $row) {
// Parse table row here
}

This can then be extended to capture your data in some format, for instance to create an array of artists and their corresponding titles:

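<?php
require('simple_html_dom.php');

$table = array();

$html = file_get_html('http://flow935.com/playlist/flowhis.HTM');
foreach($html->find('tr') as $row) {
    $cells = $row->find('td');
    // Skip header rows and any row without the three expected cells
    if (count($cells) < 3) {
        continue;
    }
    $time = $cells[0]->plaintext; // available if you also want the play time
    $artist = $cells[1]->plaintext;
    $title = $cells[2]->plaintext;

    // Using titles as keys collapses repeated plays of the same song
    $table[$artist][$title] = true;
}

echo '<pre>';
print_r($table);
echo '</pre>';

?>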

We can see that this code can be (trivially) changed to reformat the data in any other way as well.

HTML table to php array

I've updated your edit to fix it.

function tdrows($elements)
{
$str = "";
foreach ($elements as $element) {
$str .= $element->nodeValue . ", ";
}

return $str;
}

function getdata()
{
$contents = "<table><tr><td>Row 1 Column 1</td><td>Row 1 Column 2</td></tr><tr><td>Row 2 Column 1</td><td>Row 2 Column 2</td></tr></table>";
$DOM = new DOMDocument;
$DOM->loadHTML($contents);

$items = $DOM->getElementsByTagName('tr');

foreach ($items as $node) {
echo tdrows($node->childNodes) . "<br />";
}
}

getdata();
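
The snippet above echoes each row as a string. If you want an actual PHP array instead, a minimal sketch along the same lines could look like this. The function name table_to_array() is hypothetical (it is not from the original answer), and the type hints assume PHP 7 or later:

```php
<?php
// Hypothetical helper, not part of the original answer: returns the table
// as an array of rows, each row being an array of trimmed cell texts.
function table_to_array(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = array();
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $cells = array();
        foreach ($tr->childNodes as $child) {
            // Keep only <td>/<th> elements; skip whitespace text nodes
            if ($child instanceof DOMElement
                && in_array($child->nodeName, array('td', 'th'), true)) {
                $cells[] = trim($child->textContent);
            }
        }
        $rows[] = $cells;
    }

    return $rows;
}

$contents = "<table><tr><td>Row 1 Column 1</td><td>Row 1 Column 2</td></tr>"
          . "<tr><td>Row 2 Column 1</td><td>Row 2 Column 2</td></tr></table>";
print_r(table_to_array($contents));
```

Filtering on the element name avoids the stray separators you get when childNodes also contains whitespace text nodes.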

How to parse HTML table using PHP?

For tidy HTML, one parsing approach is the DOM. The DOM turns your HTML into a tree of objects and lets you look up the object you want and read its value, tag name, and so on.

The official documentation of PHP HTML DOM parsing is available at http://php.net/manual/en/book.dom.php

To read the values of one column of the given table, a DOM implementation like the following can be used:

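<?php
$data = file_get_contents('http://mytemporalbucket.s3.amazonaws.com/code.txt');

$dom = new DOMDocument;

// Set parser options before loading; @ hides warnings about imperfect markup
$dom->preserveWhiteSpace = false;
@$dom->loadHTML($data);
$tables = $dom->getElementsByTagName('table');

// item() is zero-based, so item(1) is the second table in the document
$rows = $tables->item(1)->getElementsByTagName('tr');

foreach ($rows as $row) {
    $cols = $row->getElementsByTagName('td');
    // Index 2 is the third <td>; use item(1) if the column you want is the second <td>
    $cell = $cols->item(2);
    if ($cell !== null) { // header rows may have no such <td>
        echo $cell->nodeValue . "\n";
    }
}

?>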

Reference: Customized the code provided at How to parse this table and extract data from it? to match this question's demand.
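
As an alternative to walking the node lists by hand, the same kind of column lookup can be sketched with DOMXPath. This is only an illustration under stated assumptions: the function name column_values() and the sample markup are made up for this example, both numbers are 1-based (as XPath positions are), and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of the original answer.
// $tableNumber and $columnNumber are 1-based, following XPath conventions.
function column_values(string $html, int $tableNumber, int $columnNumber): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of using the @ operator
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // The n-th table in document order, then the m-th <td> of each of its rows
    $query = sprintf('(//table)[%d]//tr/td[%d]', $tableNumber, $columnNumber);
    $cells = $xpath->query($query);

    $values = array();
    if ($cells !== false) {
        foreach ($cells as $cell) {
            $values[] = trim($cell->textContent);
        }
    }

    return $values;
}

// Made-up sample markup with two tables
$html = '<table><tr><td>ignored</td></tr></table>'
      . '<table><tr><th>Time</th><th>Artist</th><th>Title</th></tr>'
      . '<tr><td>10:00</td><td>Artist A</td><td>Song A</td></tr>'
      . '<tr><td>10:04</td><td>Artist B</td><td>Song B</td></tr></table>';
print_r(column_values($html, 2, 2));
```

Rows that have no such cell (for example a header row made of th elements) simply contribute nothing, so no null check is needed inside the loop.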

PHP find and get value based on another one from HTML table parsed file

You would need to give the table cells IDs so that JavaScript can read their contents and copy them into hidden inputs (which have names and IDs), so that PHP receives them via POST.

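<script>
function transfer_data(){
    // Copy each cell's contents into the matching hidden input
    document.getElementById('ex1_hidden').value = document.getElementById('ex1').innerHTML;
    document.getElementById('ex2_hidden').value = document.getElementById('ex2').innerHTML;
    // Submit the form that contains the hidden inputs
    document.getElementById('ex1_hidden').form.submit();
}
</script>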

<table class="example">
<tbody>
<tr>
<td id="hdg1">
Heading #1
<p>Description of heading #1 here ...</p>
</td>
<td id="ex1">Example of data #1</td>
</tr>
<tr>
<td>
Heading #2
<p>Description of heading #2 here ...</p>
</td>
<td id="ex2">Example of data #2</td>
</tr>
</tbody>
</table>

In your form which submits to wherever you want it to go using method="post" you would need:

<input type="hidden" name="ex1_hidden" id="ex1_hidden" />
<input type="hidden" name="ex2_hidden" id="ex2_hidden" />

<input type="button" value="Submit" onClick="transfer_data()" />

In PHP you would pick them up with $_POST['ex1_hidden'] and $_POST['ex2_hidden'] (remember to sanitize the submitted data).
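
What the clean-up looks like depends on what you do with the values afterwards. As a rough sketch only: the helper name clean_cell_value() is hypothetical, the steps are one reasonable choice rather than a complete security measure, and the ?? operator assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: normalises one value that was copied from a
// table cell's innerHTML into a hidden input.
function clean_cell_value($raw): string
{
    // A crafted request can send an array instead of a string
    if (!is_string($raw)) {
        return '';
    }
    $text = strip_tags($raw);                        // drop any markup
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = preg_replace('/\s+/', ' ', $text);       // collapse whitespace
    return trim($text);
}

$ex1 = clean_cell_value($_POST['ex1_hidden'] ?? '');
$ex2 = clean_cell_value($_POST['ex2_hidden'] ?? '');

// Escape again at the point of output, because decoding entities can
// turn "&lt;" back into "<"
echo htmlspecialchars($ex1, ENT_QUOTES, 'UTF-8');
```

If the values go into a database, use prepared statements in addition to this; stripping tags does nothing against SQL injection.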

This method is not suitable for secure data.

You could add an ID to the heading and make it conditional in your script:

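// The heading cell also contains the <p> description, so test for the
// heading text rather than comparing the whole innerHTML
if(document.getElementById('hdg1').innerHTML.indexOf("Heading #1") !== -1){
    document.getElementById('ex1_hidden').value = document.getElementById('ex1').innerHTML;
}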

You might need to trim the whitespace off the heading, perhaps by using something like:

var str = document.getElementById('hdg1').innerHTML.replace(/^\s+|\s+$/g,'');

Credit: @Paul, on "How do I strip white space when grabbing text with jQuery?"

There are lots of useful ideas on other ways here: "How to get a table cell value using jQuery?"

If this is scraped data from another website which you don't have control over at all, but which you already have in a PHP variable, you could explode() it by <td> and work out which array positions contain the data you want. Ref: http://php.net/manual/en/function.explode.php

This is what I think you are really looking for - might be a nice idea to ask the owner of the site if it is OK first but that is up to you. You were on the right track with strpos(); and arrays (tested using your table):

// only works if allow_url_fopen is enabled in php.ini, and in PHP5+
$handle = fopen("http://websiteyouwanttoscrape.com/file.html", "r");

$contents = stream_get_contents($handle);
$contents_array = array();
$bit_i_want = array();

// give yourself a chance
$contents = htmlspecialchars($contents);

// swap these if you don't use htmlspecialchars();
$contents_array = explode('&lt;td&gt;',$contents);
//$contents_array = explode('<td>',$contents);

$counter = 0;
while($counter < count($contents_array)){
    // strpos() returns 0 for a match at the very start, so compare against false;
    // isset() guards against the heading being in the last chunk
    if(strpos($contents_array[$counter], 'Heading #1') !== false && isset($contents_array[$counter+1])){
        // swap these if you don't use htmlspecialchars();
        $bit_i_want = explode('&lt;/td&gt;',$contents_array[$counter+1]);
        //$bit_i_want = explode('</td>',$contents_array[$counter+1]);
        echo $bit_i_want[0] . '<br />';
        // uncomment break; to stop the loop if you don't
        // want to look for any more instances of "Heading #1" if there were any
        //break;
    }
    $counter++;
}
fclose($handle); //close the file
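
Splitting on a literal <td> breaks as soon as a cell carries an attribute such as an id. If that becomes a problem, a DOMDocument-based lookup is one alternative. Note that this swaps in a different technique from the explode() approach above; the function name value_next_to_heading() is hypothetical, and the type hints assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper, not part of the original answer: finds the first
// <td> whose text contains $heading and returns the text of the next
// <td> in the same row, or null if there is no match.
function value_next_to_heading(string $html, string $heading): ?string
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('td') as $cell) {
        if (strpos($cell->textContent, $heading) === false) {
            continue;
        }
        // Walk forward over siblings, skipping whitespace text nodes
        for ($next = $cell->nextSibling; $next !== null; $next = $next->nextSibling) {
            if ($next instanceof DOMElement && $next->nodeName === 'td') {
                return trim($next->textContent);
            }
        }
    }

    return null;
}

// Same shape as the example table used earlier in this answer
$html = '<table class="example"><tbody>'
      . '<tr><td id="hdg1">Heading #1<p>Description of heading #1 here ...</p></td>'
      . '<td id="ex1">Example of data #1</td></tr>'
      . '<tr><td>Heading #2<p>Description of heading #2 here ...</p></td>'
      . '<td id="ex2">Example of data #2</td></tr>'
      . '</tbody></table>';
var_dump(value_next_to_heading($html, 'Heading #1'));
```

Because the match is done on the cell's text rather than on the raw markup, attributes on the td elements no longer matter.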

