Remove empty tags from an XML document with PHP
You can use XPath with the predicate not(node())
to select all elements that do not have child nodes.
<?php
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadXML('<parentnode>
<tag1>2</tag1>
<tag2>4</tag2>
<tag3></tag3>
<tag2>4</tag2>
<tag3></tag3>
<tag2>4</tag2>
<tag3></tag3>
</parentnode>');
$xpath = new DOMXPath($doc);
foreach( $xpath->query('//*[not(node())]') as $node ) {
$node->parentNode->removeChild($node);
}
$doc->formatOutput = true;
echo $doc->saveXML();
prints
<?xml version="1.0"?>
<parentnode>
<tag1>2</tag1>
<tag2>4</tag2>
<tag2>4</tag2>
<tag2>4</tag2>
</parentnode>
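One caveat: not(node()) only matches elements that are already childless, so a parent that becomes empty only after its children are removed survives a single pass. A minimal sketch of one way to handle that, assuming a made-up input document and assuming the root element should always be kept, is to repeat the query until nothing matches:

```php
<?php
// Hypothetical input: <outer> only becomes empty after <inner/> is removed.
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadXML('<root><keep>1</keep><outer><inner/></outer></root>');
$xpath = new DOMXPath($doc);

// Repeat the query until no childless element is left.
// '/*//*' restricts the search to descendants of the document element,
// so the root itself is never removed.
do {
    $empty = $xpath->query('/*//*[not(node())]');
    $found = $empty->length;
    foreach ($empty as $node) {
        $node->parentNode->removeChild($node);
    }
} while ($found > 0);

echo $doc->saveXML($doc->documentElement), "\n";
// prints <root><keep>1</keep></root>
```

With preserveWhiteSpace left at its default, whitespace-only text nodes between tags would count as child nodes, so this sketch relies on the same setting as the answer above.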
Remove empty elements from XML in PHP
The XPath in the other answer only matches elements that are empty in the sense of having no child nodes of any kind (no element node, no text node, nothing). To match all empty elements according to your definition, that is, elements without non-empty text content, use the following XPath instead:
//*[not(normalize-space())]
eval.in demo
output :
<?xml version="1.0"?>
<data>
<!-- keep oneDay -->
<oneDay>
<startDate>1450288800000</startDate>
<endDate>1449086400000</endDate>
</oneDay>
<!-- remove range entirely -->
<!-- remove deadline entirely -->
</data>
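The input used for that demo is not reproduced here, so the following is only a sketch with a made-up document of similar shape (the contents of range and deadline are assumptions). It shows how the normalize-space() predicate can be plugged into the same DOMXPath loop as in the first answer:

```php
<?php
// Hypothetical input; the original question's XML is not shown in this text.
$xml = '<data>
  <oneDay>
    <startDate>1450288800000</startDate>
    <endDate>1449086400000</endDate>
  </oneDay>
  <range>
    <startDate></startDate>
    <endDate> </endDate>
  </range>
  <deadline/>
</data>';

$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
// normalize-space() without an argument works on the string value of the
// context node, so this matches every element whose combined text content
// is empty or whitespace only, including parents of such elements.
foreach ($xpath->query('//*[not(normalize-space())]') as $node) {
    $node->parentNode->removeChild($node);
}

$doc->formatOutput = true;
echo $doc->saveXML();
```

Because range has no non-whitespace text anywhere beneath it, it is matched and removed together with its children in a single pass.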
PHP xml to array - how to get rid of empty tags?
Found it out myself. It took a while, but it works perfectly.
/**
* @param array|\SimpleXMLElement[]|\SimpleXMLElement $data .
*
* @return array
*/
public static function emptyNodesToNull($data)
{
if ($data instanceof \SimpleXMLElement && $data->count() === 0) {
// An element without child elements, e.g.
// SimpleXMLElement::__set_state(array())
// which was a <foo/> tag,
// or
// SimpleXMLElement::__set_state(array(0 => ' ',))
// which was a <foo> </foo> (whitespace only).
return null;
}
$data = (array)$data;
foreach ($data as &$value) {
if (is_array($value) || $value instanceof \SimpleXMLElement) {
// Static, so that the $class::emptyNodesToNull() call in the test works.
$value = static::emptyNodesToNull($value);
} else {
// $value is the actual value of a node.
// Could do further checks here.
}
}
return $data;
}
My tests did exactly what I expected, and the method returns, in my opinion, exactly what you would expect from an xmlToArray method.
It does not handle attributes, but that was not a requirement.
Test:
$xml
= '<?xml version="1.0"?>
<Envelope>
<a/><!-- expecting null -->
<foo>
<b/><!-- expecting null -->
<bar>
<baz>Hello</baz>
<!-- expecting here an array of 2 x null -->
<c/>
<c/>
</bar>
</foo>
<foo>
<bar>
<baz>Hello Again</baz>
<d> </d><!-- expecting null -->
<item>
<firstname>Foo</firstname>
<email></email><!-- expecting null -->
<telephone/><!-- expecting null -->
<lastname>Bar</lastname>
</item>
<item>
<firstname>Bar</firstname>
<email>0</email><!-- expecting value 0 (zero) -->
<telephone/><!-- expecting null -->
<lastname>Baz</lastname>
</item>
<!-- expecting array of values 1, 2 null, 4 -->
<number>1</number>
<number>2</number>
<number></number>
<number>4</number>
</bar>
</foo>
</Envelope>';
$xml = new \SimpleXMLElement($xml);
$array = $class::emptyNodesToNull($xml);
Returns:
[
'Envelope' => [
'a' => null,
'foo' => [
0 => [
'b' => null,
'bar' => [
'baz' => 'Hello',
'c' => [
0 => null,
1 => null,
],
],
],
1 => [
'bar' => [
'baz' => 'Hello Again',
'd' => null,
'item' => [
0 => [
'firstname' => 'Foo',
'email' => null,
'telephone' => null,
'lastname' => 'Bar',
],
1 => [
'firstname' => 'Bar',
'email' => '0',
'telephone' => null,
'lastname' => 'Baz',
],
],
'number' => [
0 => '1',
1 => '2',
2 => null,
3 => '4',
],
],
],
],
],
];
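To make the attribute limitation mentioned above concrete, here is a small standalone sketch. It is not the author's class method: it is a plain function with the same logic, and the input document is made up for illustration.

```php
<?php
// Standalone variant of the method above (an assumption made for this
// illustration; the original is a class method).
function emptyNodesToNull($data)
{
    // An element without child elements (e.g. <a/>) becomes null.
    if ($data instanceof SimpleXMLElement && $data->count() === 0) {
        return null;
    }
    $data = (array) $data;
    foreach ($data as $key => $value) {
        if (is_array($value) || $value instanceof SimpleXMLElement) {
            $data[$key] = emptyNodesToNull($value);
        }
    }
    return $data;
}

// Hypothetical input: <a> is empty but carries an attribute.
$xml = new SimpleXMLElement('<Envelope><a id="1"/><b>x</b><c/><c>y</c></Envelope>');
$result = emptyNodesToNull($xml);
var_export($result);
// $result is ['a' => null, 'b' => 'x', 'c' => [null, 'y']]
```

The id attribute on a is lost because the element has no child elements and is therefore replaced by null. If attributes matter, the element would need to be inspected with attributes() before that early return.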
How to delete empty tags using PHP's DOMDocument?
<?php
error_reporting(0);
$html = "<blockquote>
<p>Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,</p>
</blockquote>
<p>8</p>
<hr>
<p></p>
<strong></strong>
<a href=\"\" title=\"Link Name\" target=\"_blank\"></a>
<img src=\"tex.png\" />
<span></span>
<ul><li></li></ul>
<ol><li></li></ol>
<em></em>
<u></u>
<s></s>
<blockquote></blockquote>
<p> </p>";
$dom = new DomDocument();
$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach($xpath->query('//p|//br|//a|//strong|//img|//ul|//ol|//li|//em|//u|//s|//hr|//blockquote') as $tag) {
// Only the listed tags are processed here.
if( in_array($tag->tagName, ['p','br','strong','ul','ol','li','em','u','s','hr','blockquote']) ){
// Continue only if the tag has attributes.
if( $tag->hasAttributes() ){
// Loop over a copy of the tag's attributes.
foreach (iterator_to_array($tag->attributes) as $all_attribute_detail) {
// Remove every attribute from the tag.
$tag->removeAttribute($all_attribute_detail->name);
}
}
}
// Only the "img" tag is processed here.
if( $tag->tagName == 'img'){
// Continue only if the tag has attributes.
if( $tag->hasAttributes() ){
// Remove the tag if its src starts with blob:, data: or //:0, or is empty.
if( preg_match('/(^blob:|^data:|^\/\/:0|^$)/', trim($tag->getAttribute('src'))) ){
$tag->parentNode->removeChild($tag);
// Nothing else to do for a removed tag.
continue;
}
// Loop over a copy of the tag's attributes.
foreach (iterator_to_array($tag->attributes) as $img_attribute_detail) {
// Remove every attribute except the allowed ones.
if( !in_array($img_attribute_detail->name, ['src','alt']) ){
$tag->removeAttribute($img_attribute_detail->name);
}
}
}
}
// Only the "a" tag is processed here.
if( $tag->tagName == 'a'){
// Continue only if the tag has attributes.
if( $tag->hasAttributes() ){
// Remove the tag if its href attribute is empty.
if( empty(trim($tag->getAttribute('href'))) ){
$tag->parentNode->removeChild($tag);
// Nothing else to do for a removed tag.
continue;
}
// Loop over a copy of the tag's attributes.
foreach (iterator_to_array($tag->attributes) as $a_attribute_detail) {
// Remove every attribute except the allowed ones.
if( !in_array($a_attribute_detail->name, ['href','target','title']) ){
$tag->removeAttribute($a_attribute_detail->name);
}
}
// Set rel once, after the attributes have been cleaned up.
$tag->setAttribute('rel', 'nofollow noopener');
}
}
}
foreach($xpath->query('//*[not(*) and not(@*) and not(text()[normalize-space()])]') as $tag) {
if( !in_array($tag->tagName, ['hr','br']) ){
$tag->parentNode->removeChild($tag);
}
}
$cleanHtml = $dom->saveHTML();
$cleanHtml = preg_replace('~<(\w+)[^>]*>(?>[\p{Z}\p{C}]|<br\b[^>]*>|&(?:(?:nb|thin|zwnb|e[nm])sp|zwnj|#xfeff|#xa0|#160|#65279);|(?R))*</\1>~iu',"",$cleanHtml);
$cleanHtml = strip_tags($cleanHtml,'<p><br><a><strong><img><ul><ol><li><em><u><s><hr><blockquote>');
echo $cleanHtml;
?>
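The second XPath pass in the script above carries most of the "empty tag" logic: it matches elements that have no child elements, no attributes, and no text other than whitespace. A reduced sketch of just that step, using a made-up HTML fragment, might look like this:

```php
<?php
// Hypothetical fragment for illustration only.
$html = '<div><p>8</p><p> </p><span></span><a href="x"></a><hr></div>';

$dom = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$query = '//*[not(*) and not(@*) and not(text()[normalize-space()])]';
foreach ($xpath->query($query) as $tag) {
    // hr and br are void elements and are empty by design, so keep them.
    if (!in_array($tag->tagName, ['hr', 'br'])) {
        $tag->parentNode->removeChild($tag);
    }
}

echo $dom->saveHTML();
```

Here the whitespace-only p and the span are removed, while the a element survives because it still has an href attribute. That is why the full script strips or validates attributes first.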
Remove empty XML tags with PHP but ignore tags with attributes
Try with this XPath
'//*[not(node()) and not(@*)]'
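As a quick illustration of that expression, here is a minimal sketch with a made-up document. The not(@*) part keeps empty elements that carry at least one attribute:

```php
<?php
// Hypothetical input for illustration.
$doc = new DOMDocument;
$doc->loadXML('<root><a/><b id="1"/><c>text</c><d></d></root>');

$xpath = new DOMXPath($doc);
// Matches elements with no child nodes and no attributes.
foreach ($xpath->query('//*[not(node()) and not(@*)]') as $node) {
    $node->parentNode->removeChild($node);
}

echo $doc->saveXML($doc->documentElement), "\n";
// prints <root><b id="1"/><c>text</c></root>
```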