Remove Empty Tags from XML with PHP

Remove empty tags from an XML document with PHP

You can use XPath with the predicate not(node()) to select all elements that do not have child nodes.

<?php
$doc = new DOMDocument();
// Discard whitespace-only text nodes between elements so formatOutput can re-indent the result.
$doc->preserveWhiteSpace = false;
$doc->loadXML('<parentnode>
<tag1>2</tag1>
<tag2>4</tag2>
<tag3></tag3>
<tag2>4</tag2>
<tag3></tag3>
<tag2>4</tag2>
<tag3></tag3>
</parentnode>');

$xpath = new DOMXPath($doc);

// Select every element that has no child nodes at all and detach it from its parent.
foreach ($xpath->query('//*[not(node())]') as $node) {
    $node->parentNode->removeChild($node);
}

$doc->formatOutput = true;
echo $doc->saveXML();

This prints:

<?xml version="1.0"?>
<parentnode>
  <tag1>2</tag1>
  <tag2>4</tag2>
  <tag2>4</tag2>
  <tag2>4</tag2>
</parentnode>
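
One caveat with a single pass: a parent that only contained empty elements still has child nodes when the query runs, so it survives. The following is a minimal sketch of one way to handle that, assuming you also want such parents removed. The helper name removeEmptyElements() and the sample document are hypothetical and not part of the original answer.

```php
<?php
/**
 * Hypothetical helper: repeats the not(node()) query until no empty
 * element is left, so parents emptied by an earlier pass are removed too.
 * The document element itself is always kept.
 */
function removeEmptyElements(DOMDocument $doc): int
{
    $xpath = new DOMXPath($doc);
    $removed = 0;
    do {
        // "/*//*" selects every element except the document element.
        $empty = $xpath->query('/*//*[not(node())]');
        foreach ($empty as $node) {
            $node->parentNode->removeChild($node);
            $removed++;
        }
    } while ($empty->length > 0);

    return $removed;
}

// Assumed sample input: <outer> only becomes empty after <inner/> is removed.
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadXML('<root><keep>1</keep><outer><inner/></outer></root>');

$count = removeEmptyElements($doc);
echo $count, "\n";                                  // 2
echo $doc->saveXML($doc->documentElement), "\n";    // <root><keep>1</keep></root>
```

The loop re-runs the query because the node list returned by DOMXPath::query() is a snapshot: elements that become empty during a pass only show up in the next one.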

Remove empty elements from XML in PHP

The XPath in the other answer only matches elements that have no child nodes of any kind (no element node, no text node, nothing). To match every element that is empty by your definition, that is, an element with no non-whitespace text content anywhere inside it, use the following XPath instead:

//*[not(normalize-space())]

Output of the demo:

<?xml version="1.0"?>
<data>
  <!-- keep oneDay -->
  <oneDay>
    <startDate>1450288800000</startDate>
    <endDate>1449086400000</endDate>
  </oneDay>
  <!-- remove range entirely -->
  <!-- remove deadline entirely -->
</data>
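
The demo's input is not reproduced here, so the following is only a sketch of how the expression could be applied. The input document is an assumption: its element names mirror the output above, but the contents of range and deadline are invented for illustration.

```php
<?php
// Hypothetical input: the structure of <range> and <deadline> is assumed,
// since the original demo document is not available.
$xml = <<<XML
<data>
  <oneDay>
    <startDate>1450288800000</startDate>
    <endDate>1449086400000</endDate>
  </oneDay>
  <range>
    <startDate/>
    <endDate> </endDate>
  </range>
  <deadline></deadline>
</data>
XML;

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);

// normalize-space() without an argument works on the string value of the
// element, i.e. the concatenated text of all its descendants. An element
// matches when that text is empty or whitespace only.
foreach ($xpath->query('//*[not(normalize-space())]') as $node) {
    $node->parentNode->removeChild($node);
}

$doc->formatOutput = true;
echo $doc->saveXML();
```

Because the test looks at descendant text, range is matched as a whole even though it has child elements, which is what not(node()) would miss.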

PHP xml to array - how to get rid of empty tags?

I figured it out myself. It took a while, but it works perfectly.

/**
 * Recursively converts SimpleXML data to an array, mapping empty nodes to null.
 *
 * @param array|\SimpleXMLElement[]|\SimpleXMLElement $data
 *
 * @return array|null
 */
protected function emptyNodesToNull($data)
{
    if ($data instanceof \SimpleXMLElement && $data->count() === 0) {
        // An empty object such as
        //   SimpleXMLElement::__set_state(array())
        // which was, for example, a <foo/> tag, or
        //   SimpleXMLElement::__set_state(array(0 => ' ',))
        // which was, for example, <foo> </foo> (whitespace only).
        return null;
    }
    $data = (array)$data;
    foreach ($data as &$value) {
        if (is_array($value) || $value instanceof \SimpleXMLElement) {
            $value = $this->emptyNodesToNull($value);
        } else {
            // $value is the actual value of a node.
            // Could do further checks here.
        }
    }
    unset($value); // break the reference left over from the foreach
    return $data;
}

My tests did exactly what I expected, and in my opinion the method returns exactly what you would expect from an xmlToArray method.

It does not handle attributes, but that was not a requirement.

Test:

$xml = '<?xml version="1.0"?>
<Envelope>
<a/><!-- expecting null -->
<foo>
<b/><!-- expecting null -->
<bar>
<baz>Hello</baz>

<!-- expecting here an array of 2 x null -->
<c/>
<c/>

</bar>
</foo>
<foo>
<bar>
<baz>Hello Again</baz>
<d> </d><!-- expecting null -->
<item>
<firstname>Foo</firstname>
<email></email><!-- expecting null -->
<telephone/><!-- expecting null -->
<lastname>Bar</lastname>
</item>
<item>
<firstname>Bar</firstname>
<email>0</email><!-- expecting value 0 (zero) -->
<telephone/><!-- expecting null -->
<lastname>Baz</lastname>
</item>

<!-- expecting array of values 1, 2, null, 4 -->
<number>1</number>
<number>2</number>
<number></number>
<number>4</number>
</bar>
</foo>
</Envelope>';

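$xml = new \SimpleXMLElement($xml);
// emptyNodesToNull() is a protected instance method, so this call has to run inside the class (or a subclass).
$array = $this->emptyNodesToNull($xml);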

Returns:

[
    'Envelope' => [
        'a' => null,
        'foo' => [
            0 => [
                'b' => null,
                'bar' => [
                    'baz' => 'Hello',
                    'c' => [
                        0 => null,
                        1 => null,
                    ],
                ],
            ],
            1 => [
                'bar' => [
                    'baz' => 'Hello Again',
                    'd' => null,
                    'item' => [
                        0 => [
                            'firstname' => 'Foo',
                            'email' => null,
                            'telephone' => null,
                            'lastname' => 'Bar',
                        ],
                        1 => [
                            'firstname' => 'Bar',
                            'email' => '0',
                            'telephone' => null,
                            'lastname' => 'Baz',
                        ],
                    ],
                    'number' => [
                        0 => '1',
                        1 => '2',
                        2 => null,
                        3 => '4',
                    ],
                ],
            ],
        ],
    ],
];

How to delete empty tags using PHP's DOMDocument?

<?php
error_reporting(0);

$html = "<blockquote>
<p>Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,</p>
</blockquote>
<p>8</p>
<hr>
<p></p>
<strong></strong>
<a href=\"\" title=\"Link Name\" target=\"_blank\"></a>
<img src=\"tex.png\" />
<span></span>
<ul><li></li></ul>
<ol><li></li></ol>
<em></em>
<u></u>
<s></s>
<blockquote></blockquote>
<p> </p>";

$dom = new DOMDocument();
$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//p|//br|//a|//strong|//img|//ul|//ol|//li|//em|//u|//s|//hr|//blockquote') as $tag) {
    // For these tags, strip every attribute.
    if (in_array($tag->tagName, ['p','br','strong','ul','ol','li','em','u','s','hr','blockquote'])) {
        if ($tag->hasAttributes()) {
            // Iterate over a copy, because the attribute map is live and changes as attributes are removed.
            foreach (iterator_to_array($tag->attributes) as $attribute) {
                $tag->removeAttribute($attribute->name);
            }
        }
    }

    // Handle <img> tags.
    if ($tag->tagName == 'img') {
        if ($tag->hasAttributes()) {
            // Remove the image if its src is empty or starts with blob:, data: or //:0.
            if (preg_match('/(^blob:|^data:|^\/\/:0|^$)/', trim($tag->getAttribute('src')))) {
                $tag->parentNode->removeChild($tag);
                continue;
            }

            // Remove every attribute except the allowed ones.
            foreach (iterator_to_array($tag->attributes) as $imgAttribute) {
                if (!in_array($imgAttribute->name, ['src','alt'])) {
                    $tag->removeAttribute($imgAttribute->name);
                }
            }
        }
    }

    // Handle <a> tags.
    if ($tag->tagName == 'a') {
        if ($tag->hasAttributes()) {
            // Remove the link if its href is empty.
            if (empty(trim($tag->getAttribute('href')))) {
                $tag->parentNode->removeChild($tag);
                continue;
            }

            // Remove every attribute except the allowed ones.
            foreach (iterator_to_array($tag->attributes) as $aAttribute) {
                if (!in_array($aAttribute->name, ['href','target','title'])) {
                    $tag->removeAttribute($aAttribute->name);
                }
            }
            $tag->setAttribute('rel', 'nofollow noopener');
        }
    }
}

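// Remove elements with no child elements, no attributes and no non-whitespace text (hr and br are kept).
foreach ($xpath->query('//*[not(*) and not(@*) and not(text()[normalize-space()])]') as $tag) {
    if (!in_array($tag->tagName, ['hr','br'])) {
        $tag->parentNode->removeChild($tag);
    }
}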

$cleanHtml = $dom->saveHTML();
$cleanHtml = preg_replace('~<(\w+)[^>]*>(?>[\p{Z}\p{C}]|<br\b[^>]*>|&(?:(?:nb|thin|zwnb|e[nm])sp|zwnj|#xfeff|#xa0|#160|#65279);|(?R))*</\1>~iu',"",$cleanHtml);

$cleanHtml = strip_tags($cleanHtml,'<p><br><a><strong><img><ul><ol><li><em><u><s><hr><blockquote>');

echo $cleanHtml;

?>

Remove empty XML tags with PHP but ignore tags with attributes

Try this XPath, which matches elements that have neither child nodes nor attributes:

'//*[not(node()) and not(@*)]'
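
As a rough illustration of how that expression behaves, here is a minimal sketch with DOMDocument and DOMXPath. The sample document is made up for this example: a and d are empty without attributes, b is empty but carries an attribute.

```php
<?php
// Hypothetical sample: <b/> has an attribute, <a/> and <d></d> do not.
$doc = new DOMDocument();
$doc->loadXML('<root><a/><b id="1"/><c>text</c><d></d></root>');

$xpath = new DOMXPath($doc);

// not(node()): no child nodes of any kind; not(@*): no attributes.
foreach ($xpath->query('//*[not(node()) and not(@*)]') as $node) {
    $node->parentNode->removeChild($node);
}

echo $doc->saveXML($doc->documentElement), "\n";
// <root><b id="1"/><c>text</c></root>
```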

