How to convert HTML to JSON using PHP?
If you are able to obtain a DOMDocument object representing your HTML, then you just need to traverse it recursively and construct the data structure that you want.
Converting your HTML document into a DOMDocument should be as simple as this:
function html_to_obj($html) {
$dom = new DOMDocument();
$dom->loadHTML($html);
return element_to_obj($dom->documentElement);
}
Then, a simple traversal of $dom->documentElement which gives the kind of structure you described could look like this:
function element_to_obj($element) {
$obj = array( "tag" => $element->tagName );
foreach ($element->attributes as $attribute) {
$obj[$attribute->name] = $attribute->value;
}
foreach ($element->childNodes as $subElement) {
if ($subElement->nodeType == XML_TEXT_NODE) {
$obj["html"] = $subElement->wholeText;
}
else {
$obj["children"][] = element_to_obj($subElement);
}
}
return $obj;
}
Test case
$html = <<<EOF
<!DOCTYPE html>
<html lang="en">
<head>
<title> This is a test </title>
</head>
<body>
<h1> Is this working? </h1>
<ul>
<li> Yes </li>
<li> No </li>
</ul>
</body>
</html>
EOF;
header("Content-Type: text/plain");
echo json_encode(html_to_obj($html), JSON_PRETTY_PRINT);
Output
{
"tag": "html",
"lang": "en",
"children": [
{
"tag": "head",
"children": [
{
"tag": "title",
"html": " This is a test "
}
]
},
{
"tag": "body",
"html": " \n ",
"children": [
{
"tag": "h1",
"html": " Is this working? "
},
{
"tag": "ul",
"children": [
{
"tag": "li",
"html": " Yes "
},
{
"tag": "li",
"html": " No "
}
],
"html": "\n "
}
]
}
]
}
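Note that in the output above, "html" on <body> and <ul> only holds the last whitespace-only text node, because each text node overwrites the previous one. If that is not what you want, a minimal sketch of a variant could concatenate the text nodes and drop the result when it is empty. The function name element_to_obj_text and the trimming behaviour are my own assumptions, not part of the original function:

```php
<?php
// Hypothetical variant of element_to_obj(): concatenates text instead of overwriting it.
function element_to_obj_text(DOMElement $element): array
{
    $obj = ["tag" => $element->tagName];
    foreach ($element->attributes as $attribute) {
        $obj[$attribute->name] = $attribute->value;
    }
    $text = "";
    foreach ($element->childNodes as $child) {
        if ($child->nodeType == XML_TEXT_NODE || $child->nodeType == XML_CDATA_SECTION_NODE) {
            $text .= $child->nodeValue;            // collect every piece of text
        } elseif ($child->nodeType == XML_ELEMENT_NODE) {
            $obj["children"][] = element_to_obj_text($child);
        }
        // Other node types (comments, etc.) are ignored in this sketch.
    }
    $text = trim($text);                           // assumption: surrounding whitespace is not wanted
    if ($text !== "") {
        $obj["html"] = $text;
    }
    return $obj;
}

$dom = new DOMDocument();
$dom->loadHTML('<ul class="x"><li> Yes </li>' . "\n" . '<li> No </li></ul>');
$result = element_to_obj_text($dom->getElementsByTagName('ul')->item(0));
echo json_encode($result, JSON_PRETTY_PRINT);
```

With this variant the <ul> has no "html" key at all, and each <li> carries its trimmed text.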
Answer to updated question
The solution proposed above does not work with the <script> element, because its content is parsed not as a plain text node (XML_TEXT_NODE), but as a CDATA section node (XML_CDATA_SECTION_NODE). This is because the DOM extension in PHP is based on libxml2, which parses your HTML as HTML 4.0, and in HTML 4.0 the content of <script> is of type CDATA and not #PCDATA.
You have two solutions for this problem.
The simple but not very robust solution would be to add the LIBXML_NOCDATA flag to DOMDocument::loadHTML. (I am not actually 100% sure whether this works for the HTML parser.)
The more difficult but, in my opinion, better solution is to add an additional test when you are testing $subElement->nodeType before the recursion. The recursive function would become:
function element_to_obj($element) {
$obj = array( "tag" => $element->tagName );
foreach ($element->attributes as $attribute) {
$obj[$attribute->name] = $attribute->value;
}
foreach ($element->childNodes as $subElement) {
if ($subElement->nodeType == XML_TEXT_NODE) {
$obj["html"] = $subElement->wholeText;
}
elseif ($subElement->nodeType == XML_CDATA_SECTION_NODE) {
$obj["html"] = $subElement->data;
}
else {
$obj["children"][] = element_to_obj($subElement);
}
}
return $obj;
}
If you hit another bug of this type, the first thing you should do is check what type of node $subElement is, because there are many other possibilities that my short example function does not deal with.
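As a rough sketch of that kind of check, you could dump the class and nodeType of every child before deciding how to handle it. The helper name describe_children is hypothetical and only meant for debugging:

```php
<?php
// Hypothetical debugging helper: lists the class and node type of each child node.
function describe_children(DOMNode $node): array
{
    $out = [];
    foreach ($node->childNodes as $child) {
        $out[] = get_class($child) . " (nodeType " . $child->nodeType . ")";
    }
    return $out;
}

$dom = new DOMDocument();
$dom->loadHTML('<p>text<!-- note --><b>bold</b></p>');
$types = describe_children($dom->getElementsByTagName('p')->item(0));
print_r($types);
// The three children are a DOMText (3), a DOMComment (8) and a DOMElement (1).
```

Comparing the reported numbers with the XML_*_NODE constants tells you which branch is missing from your traversal.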
Additionally, you will notice that libxml2 has to fix mistakes in your HTML in order to be able to build a DOM for it. This is why <html> and <head> elements will appear even if you don't specify them. You can avoid this by using the LIBXML_HTML_NOIMPLIED flag.
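For illustration, here is a small sketch of passing that flag to loadHTML(). The sample markup is made up, and I have also added LIBXML_HTML_NODEFDTD, which is not mentioned above, to keep a default doctype from being added:

```php
<?php
$dom = new DOMDocument();
// NOIMPLIED: do not add implied <html>/<body> elements.
// NODEFDTD: do not add a default doctype (extra flag, my own addition).
$dom->loadHTML('<div id="box"><p>Hello</p></div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$root = $dom->documentElement;   // the <div> itself, not an implied <html>
echo $root->tagName, "\n";
echo $dom->saveHTML();
```

This sketch uses a fragment with a single top-level element; a fragment with several top-level elements may not come out the way you expect, so test it with your own input.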
Test case with script
$html = <<<EOF
<script type="text/javascript">
alert('hi');
</script>
EOF;
header("Content-Type: text/plain");
echo json_encode(html_to_obj($html), JSON_PRETTY_PRINT);
Output
{
"tag": "html",
"children": [
{
"tag": "head",
"children": [
{
"tag": "script",
"type": "text\/javascript",
"html": "\n alert('hi');\n "
}
]
}
]
}
How to Convert HTML Table to JSON in PHP
I prefer to use XPath with DOMDocument because of the utility and ease of its syntax. By targeting only the <tr> elements inside the <tbody> tag, you can access all required data.
With the exception of the href value, the final "all-letters" substring in each <td> class value represents your desired key for the associated value. For this I am using preg_match() to extract the final "word" in the class attribute.
When the $key is name, the href attribute value must be stored with the hardcoded key: user_link.
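Your sample date values require some preparation to yield the desired format. As your input data varies, you may need to modify the regular expression to allow strtotime() to properly handle the date expression. Note that the second sample date (Mar. 14th '18) is not parsed successfully by the expression used here, which is why it comes out as Jan. 01st '70 in the output.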
Code:
$html = <<<HTML
<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>
<table class="table-list table table-responsive table-striped" border="1">
<thead>
<tr>
<th class="coll-1 name">name</th>
<th class="coll-2">height</th>
<th class="coll-3">weight</th>
<th class="coll-date">date</th>
<th class="coll-4"><span class="info">info</span></th>
<th class="coll-5">country</th>
</tr>
</thead>
<tbody>
<tr>
<td class="coll-1 name">
<a href="/username/Jhon Doe/" class="icon"><i class="flaticon-user"></i></a>
<a href="/username/Jhon Doe/">Jhon Doe</a>
</td>
<td class="coll-2 height">45</td>
<td class="coll-3 weight">50</td>
<td class="coll-date">9am May. 16th</td>
<td class="coll-4 size mob-info">abcd</td>
<td class="coll-5 country"><a href="/country/CA/">CA</a></td>
</tr>
<tr>
<td class="coll-1 name">
<a href="/username/Kasim Shk/" class="icon"><i class="flaticon-user"></i></a>
<a href="/username/Kasim Shk/">Kasim Shk</a>
</td>
<td class="coll-2 height">33</td>
<td class="coll-3 weight">54</td>
<td class="coll-date">Mar. 14th '18</td>
<td class="coll-4 size mob-info">ijkl</td>
<td class="coll-5 country"><a href="/country/UAE/">UAE</a></td>
</tr>
</tbody>
</table>
</body>
</html>
HTML;
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//tbody/tr') as $tr) {
$tmp = []; // reset the temporary array so previous entries are removed
foreach ($xpath->query("td[@class]", $tr) as $td) {
$key = preg_match('~[a-z]+$~', $td->getAttribute('class'), $out) ? $out[0] : 'no_class';
if ($key === "name") {
$tmp['user_link'] = $xpath->query("a[@class = 'icon']", $td)[0]->getAttribute('href');
}
$tmp[$key] = trim($td->textContent);
}
$tmp['date'] = date("M. dS 'y", strtotime(preg_replace('~\.|\d+[ap]m *~', '', $tmp['date'])));
$result[] = $tmp;
}
var_export($result);
echo "\n----\n";
echo json_encode($result);
Output: (as multidim array, then json encoded string)
array (
0 =>
array (
'user_link' => '/username/Jhon Doe/',
'name' => 'Jhon Doe',
'height' => '45',
'weight' => '50',
'date' => 'May. 16th \'18',
'info' => 'abcd',
'country' => 'CA',
),
1 =>
array (
'user_link' => '/username/Kasim Shk/',
'name' => 'Kasim Shk',
'height' => '33',
'weight' => '54',
'date' => 'Jan. 01st \'70',
'info' => 'ijkl',
'country' => 'UAE',
),
)
----
[{"user_link":"\/username\/Jhon Doe\/","name":"Jhon Doe","height":"45","weight":"50","date":"May. 16th '18","info":"abcd","country":"CA"},{"user_link":"\/username\/Kasim Shk\/","name":"Kasim Shk","height":"33","weight":"54","date":"Jan. 01st '70","info":"ijkl","country":"UAE"}]
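The Jan. 01st '70 value in the second row is what date() produces when strtotime() fails to parse its input. One possible way around that, sketched here under my own assumptions, is to clean the string more aggressively and parse it with DateTime::createFromFormat(). The helper name normalize_date and the $defaultYear parameter (used when the input carries no year) are hypothetical:

```php
<?php
// Hypothetical helper: normalizes strings like "9am May. 16th" or "Mar. 14th '18".
function normalize_date(string $raw, int $defaultYear): ?string
{
    // Drop a leading time such as "9am" and the dot after the month abbreviation.
    $clean = preg_replace('~\d+[ap]m\s*|\.~', '', $raw);

    // Pull out a trailing two-digit year written as '18, if present.
    $year = $defaultYear;
    if (preg_match('~\'(\d{2})\s*$~', $clean, $m)) {
        $year = 2000 + (int) $m[1];            // assumption: all years are 20xx
        $clean = substr($clean, 0, -strlen($m[0]));
    }

    // Remove the ordinal suffix: "14th" becomes "14".
    $clean = trim(preg_replace('~(?<=\d)(st|nd|rd|th)\b~', '', $clean));

    $date = DateTime::createFromFormat('!M j Y', "$clean $year");
    return $date ? $date->format("M. dS 'y") : null;   // null when parsing fails
}

echo normalize_date("9am May. 16th", 2018), "\n";   // May. 16th '18
echo normalize_date("Mar. 14th '18", 2018), "\n";   // Mar. 14th '18
```

Returning null on failure keeps a bad date from silently turning into the 1970 epoch.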
convert html to json in PHP
I completely agree with Magnus in the comments that you should contact the API providers and ask them for a JSON endpoint.
But if that is not possible, you could do something like this:
<?php
$theFile = file_get_contents('https://www.israelpost.co.il/zip_data.nsf/SearchZip?OpenAgent&Location=%25u05EA%25u05DC%20%25u05D0%25u05D1%25u05D9%25u05D1%20-%20%25u05D9%25u05E4%25u05D5&POB=&Street=%25u05D3%25u05D9%25u05D6%25u05E0%25u05D2%25u05D5%25u05E3&House=99&Entrance=');
libxml_use_internal_errors(true); //Prevents Warnings, remove if desired
$dom = new DOMDocument();
$dom->loadHTML($theFile);
$body = "";
foreach($dom->getElementsByTagName("body")->item(0)->childNodes as $child) {
$body .= $dom->saveHTML($child);
}
echo $body;
This will get the content of the body tag for you.
This example will output RES86439611, whatever that means to you.
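If you then want that value as JSON, a minimal sketch could take the text content of the body and pass it to json_encode(). The hard-coded markup below is only a stand-in for the remote response, and the "result" key is a name I made up:

```php
<?php
// Stand-in for the remote response; the real page markup is an assumption here.
$theFile = '<html><body> RES86439611 </body></html>';

libxml_use_internal_errors(true); // Prevents warnings on imperfect markup
$dom = new DOMDocument();
$dom->loadHTML($theFile);

// textContent returns the text of the body without any tags.
$body = $dom->getElementsByTagName('body')->item(0);
$json = json_encode(['result' => trim($body->textContent)]);
echo $json; // {"result":"RES86439611"}
```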
How to convert HTML data to json, php, mysql?
You also have an error in the loop in your code.
Try this:
if (count($query) > 0) {
foreach ($query as $queryElement) {
$el = $queryElement;
$el['description'] = trim(preg_replace('/\s+/', ' ', strip_tags($el['description'])));
$arr[] = $el;
}
}
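To see what that loop does, here is a small self-contained run with made-up rows standing in for $query (the real rows come from your database):

```php
<?php
// Hypothetical rows standing in for the database result.
$query = [
    ['id' => 1, 'description' => "<p>Hello\n   <b>world</b></p>"],
];

$arr = [];
foreach ($query as $el) {
    // Strip the tags, then collapse runs of whitespace into single spaces.
    $el['description'] = trim(preg_replace('/\s+/', ' ', strip_tags($el['description'])));
    $arr[] = $el;
}

echo json_encode($arr); // [{"id":1,"description":"Hello world"}]
```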
Convert HTML entities in Json back to characters
Here is the solution. I needed to:
- convert &amp; to & to standardize encoding systems;
- convert all HTML entities back to their corresponding characters.
Here is the final code. Many thanks to all for all your comments and suggestions.
Full code and online test here: https://www.tehplayground.com/zythX4MUdF3ric4l
array_walk_recursive($data, function(&$item, $key) {
if(is_string($item)) {
$item = str_replace("&amp;", "&", $item); // 1. Replace &amp; by &
$item = html_entity_decode($item); // 2. Convert HTML entities to their corresponding characters
}
});
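As a quick illustration with made-up data (the real $data is whatever you decoded from your JSON), the two steps turn a double-encoded entity back into the plain character:

```php
<?php
// Hypothetical sample data with double-encoded entities.
$data = [
    'title' => 'caf&amp;eacute;',
    'tags'  => ['R&amp;D', 'plain'],
    'count' => 3,                      // non-strings are left untouched
];

array_walk_recursive($data, function (&$item) {
    if (is_string($item)) {
        $item = str_replace('&amp;', '&', $item); // "caf&amp;eacute;" becomes "caf&eacute;"
        $item = html_entity_decode($item);        // "caf&eacute;" becomes "café"
    }
});

echo json_encode($data, JSON_UNESCAPED_UNICODE);
// {"title":"café","tags":["R&D","plain"],"count":3}
```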
Convert html to json in php laravel
There's a 1 at the end of the output; possibly you're echoing something extra that you shouldn't.
I suspect you expect curl to return the actual result but you are not using the appropriate flag. The reason I suspect that is because you are assigning the return result to $json, but without the flag CURLOPT_RETURNTRANSFER, curl_exec() will return true and not any JSON value.
Here's what you can try:
$url ='https://graph.facebook.com/' . $connection->provider_id . '?fields=link&access_token=' . $connection->token;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
$json = curl_exec($ch);
$jsonArray = json_decode($json, true);
$link = $jsonArray["link"];
More information on the curl flags in the manual
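If you want to guard against a failed request or a non-JSON response, a sketch along these lines might help. The function names fetch_json and decode_json are hypothetical, and the sample string at the bottom replaces a real request so the example runs without network access:

```php
<?php
// Hypothetical helper: decodes a JSON string or throws if it is not an object/array.
function decode_json(string $raw): array
{
    $data = json_decode($raw, true);
    if (!is_array($data)) {
        throw new RuntimeException('Response is not a JSON object or array: ' . json_last_error_msg());
    }
    return $data;
}

// Hypothetical helper: fetches a URL with cURL and decodes the body.
function fetch_json(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // get the body as a string
    $raw = curl_exec($ch);
    if ($raw === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return decode_json($raw);
}

// Sample payload standing in for the API response.
$jsonArray = decode_json('{"link":"https://example.com/profile"}');
echo $jsonArray['link'];
```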