How to Tell Apart SimpleXML Objects Representing Element and Attribute

How to tell apart SimpleXML objects representing element and attribute?

There are no built-in properties in SimpleXMLElement which would allow you to tell these apart.

As others have suggested, dom_import_simplexml() can be appropriate. However, that function sometimes changes nodes on the fly: if you pass in a list of child nodes or named child nodes, it reduces the list to its first element.

If the list is empty, for example when attributes() returns no attributes or a named child node does not exist, it raises a warning that an invalid node type was given:

Warning: dom_import_simplexml(): Invalid Nodetype to import

So if you need a precise answer as a plain boolean true/false, here is how to get it with SimpleXML:

$isElement   = $element->xpath('.') == array($element);

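// && is used instead of "and": "and" binds more loosely than "=",
// so the second test would not become part of the assigned value.
$isAttribute = $element[0] == $element
    && $element->xpath('.') != array($element);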

It works similarly for attribute lists and element lists. I blogged about this just this morning; you need some specific knowledge about what to evaluate for which case, so I created a cheatsheet for it:

+------------------+---------------------------------------------+
| TYPE             | TEST                                        |
+------------------+---------------------------------------------+
| Element          | $element->xpath('.') == array($element)     |
+------------------+---------------------------------------------+
| Attribute        | $element[0] == $element                     |
|                  | and $element->xpath('.') != array($element) |
+------------------+---------------------------------------------+
| Attributes       | $element->attributes() === NULL             |
+------------------+---------------------------------------------+
| Elements         | $element[0] != $element                     |
|                  | and $element->attributes() !== NULL         |
+------------------+---------------------------------------------+
| Single           | $element[0] == $element                     |
+------------------+---------------------------------------------+
| Empty List       | $element[0] == NULL                         |
+------------------+---------------------------------------------+
| Document Element | $element->xpath('/*') == array($element)    |
+------------------+---------------------------------------------+
  • SimpleXML Type Cheatsheet (12 Feb 2013; by hakre)
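
As a rough sketch of how the first two rows of the cheatsheet could be used, the following wraps them in a helper. The function name sxml_kind() and the sample document are made up for illustration and are not part of SimpleXML; the tests themselves are the ones from the table.

```php
<?php
// Hypothetical helper (not part of SimpleXML): classifies a single node
// using the "Element" and "Attribute" tests from the cheatsheet.
function sxml_kind(SimpleXMLElement $node): string
{
    // Element: the node is its own XPath context node.
    if ($node->xpath('.') == array($node)) {
        return 'element';
    }
    // Attribute: a single node that is not an element.
    if ($node[0] == $node) {
        return 'attribute';
    }
    return 'other';
}

// Sample document, assumed for this sketch.
$xml = new SimpleXMLElement('<root><item id="42">text</item></root>');

$elementKind   = sxml_kind($xml->item[0]);
$attributeKind = sxml_kind($xml->item[0]['id']);

echo $elementKind, "\n", $attributeKind, "\n";
```

Lists of elements or attributes fall into the 'other' branch here; the remaining rows of the table would be needed to tell those apart.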

Getting the first XML element with SimpleXML

Answer form per request. ^^

If that SimpleXMLElement is the only one contained within $resource['linkedin'], you can change it with:

$resource['linkedin']->{'first-name'} = $name;

That allows you direct access to the element without needing to do an xpath on it. ^^
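
A minimal, self-contained sketch of that assignment. The XML structure and the contents of $resource are assumptions, since the question's data is not shown here; the curly-brace syntax is needed because first-name contains a hyphen.

```php
<?php
// Hypothetical stand-in for $resource['linkedin'] from the question.
$resource = array(
    'linkedin' => new SimpleXMLElement(
        '<person><first-name>Old</first-name><last-name>Doe</last-name></person>'
    ),
);
$name = 'Jane';

// Curly braces allow property names that are not valid PHP identifiers.
$resource['linkedin']->{'first-name'} = $name;

echo $resource['linkedin']->asXML();
```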

Remove a child with a specific attribute, in SimpleXML for PHP

While SimpleXML provides a way to remove XML nodes, its modification capabilities are somewhat limited. One other solution is to resort to using the DOM extension. dom_import_simplexml() will help you with converting your SimpleXMLElement into a DOMElement.

Just some example code (tested with PHP 5.2.5):

$data='<data>
<seg id="A1"/>
<seg id="A5"/>
<seg id="A12"/>
<seg id="A29"/>
<seg id="A30"/>
</data>';
$doc=new SimpleXMLElement($data);
foreach($doc->seg as $seg)
{
    if($seg['id'] == 'A12') {
        $dom=dom_import_simplexml($seg);
        $dom->parentNode->removeChild($dom);
    }
}
echo $doc->asXml();

outputs

<?xml version="1.0"?>
<data><seg id="A1"/><seg id="A5"/><seg id="A29"/><seg id="A30"/></data>

By the way: selecting specific nodes is much simpler when you use XPath (SimpleXMLElement->xpath):

$segs=$doc->xpath('//seg[@id="A12"]'); // element name is "seg", as in $data
if (count($segs)>=1) {
    $seg=$segs[0];
}
// same deletion procedure as above
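
Put together, the XPath selection and the DOM-based removal might look like the sketch below. It uses a shortened version of the sample data and loops over all matches instead of taking only the first one.

```php
<?php
// Shortened sample data, modelled on the example above.
$doc = new SimpleXMLElement(
    '<data><seg id="A1"/><seg id="A12"/><seg id="A29"/></data>'
);

// Select the matching nodes with XPath, then remove each one through DOM.
foreach ($doc->xpath('//seg[@id="A12"]') as $seg) {
    $dom = dom_import_simplexml($seg);
    $dom->parentNode->removeChild($dom);
}

echo $doc->asXML();
```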

How can I set text value of SimpleXmlElement without using its parent?

You can do this with a SimpleXMLElement self-reference:

$firstC->{0} = "Victory!!"; // hackity, hack, hack!
// -or-
$firstC[0] = "Victory!!";

found after looking at

var_dump((array) reset($xml->xpath("(//c)[3]")))

This also works with unset operations as outlined in an answer to:

  • Remove a child with a specific attribute, in SimpleXML for PHP
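
A small sketch of both uses of the self-reference, with the bracket form only. The document is made up for illustration; the variable names $third and $first are assumptions.

```php
<?php
// Sample document, assumed for this sketch.
$xml = new SimpleXMLElement('<a><b><c>one</c><c>two</c></b><c>three</c></a>');

// Set the text of a node obtained from xpath(), without touching its parent.
$matches = $xml->xpath('(//c)[3]');
$third   = $matches[0];
$third[0] = 'Victory!!';

// Remove a node the same way: unset its self-reference.
$matches = $xml->xpath('(//c)[1]');
$first   = $matches[0];
unset($first[0]);

echo $xml->asXML();
```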

What's the difference between PHP's DOM and SimpleXML extensions?

In a nutshell:

SimpleXml

  • is for simple XML and/or simple use cases
  • limited API to work with nodes (e.g. cannot program to an interface that much)
  • all nodes are of the same kind (element node is the same as attribute node)
  • nodes are magically accessible, e.g. $root->foo->bar['attribute']

DOM

  • is for any XML use case you might have
  • is an implementation of the W3C DOM API (found implemented in many languages)
  • differentiates between various Node Types (more control)
  • much more verbose due to explicit API (can code to an interface)
  • can parse broken HTML
  • allows you to use PHP functions in XPath queries

Both of these are based on libxml and can be influenced to some extent by the libxml functions.


Personally, I don't like SimpleXml too much. That's because I don't like the implicit access to the nodes, e.g. $foo->bar[1]->baz['attribute']. It ties the actual XML structure to the programming interface. The one-node-type-for-everything is also somewhat unintuitive because the behavior of the SimpleXmlElement magically changes depending on its contents.

For instance, when you have <foo bar="1"/> the object dump of /foo/@bar will be identical to that of /foo but doing an echo of them will print different results. Moreover, because both of them are SimpleXml elements, you can call the same methods on them, but they will only get applied when the SimpleXmlElement supports it, e.g. trying to do $el->addAttribute('foo', 'bar') on the first SimpleXmlElement will do nothing. Now of course it is correct that you cannot add an attribute to an Attribute Node, but the point is, an attribute node would not expose that method in the first place.
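
To make the contrast concrete, here is a small sketch that loads the same one-line document with both extensions and compares what comes back for the element and for its attribute. It only illustrates the class and string-cast differences; the variable names are assumptions.

```php
<?php
$source = '<foo bar="1"/>';

// SimpleXML: element and attribute are objects of the same class.
$foo  = new SimpleXMLElement($source);
$attr = $foo->xpath('/foo/@bar')[0];

$sameClass  = get_class($foo) === get_class($attr);
$fooString  = (string) $foo;   // text content of the element
$attrString = (string) $attr;  // value of the attribute

// DOM: element and attribute are separate node classes.
$dom = new DOMDocument();
$dom->loadXML($source);
$domFoo  = $dom->documentElement;
$domAttr = $domFoo->getAttributeNode('bar');

// The attribute class does not offer setAttribute() at all.
$attrHasSetAttribute = method_exists($domAttr, 'setAttribute');

var_dump($sameClass, $fooString, $attrString);
var_dump(get_class($domFoo), get_class($domAttr), $attrHasSetAttribute);
```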

But that's just my 2c. Make up your own mind :)


On a side note, there are not just two parsers in PHP but a few more. SimpleXml and DOM are just the two that parse a document into a tree structure. The others are either pull or event based parsers/readers/writers.

Also see my answer to

  • Best XML Parser for PHP

Are SimpleXMLElement and SimpleXMLElement::children() the same thing?

Despite its name, instances of the SimpleXMLElement class can represent a few different things:

  • A single XML element; in your example, $foo represents the <foo> element at the root of your XML document
  • A collection of XML elements with the same name; for instance, $foo->bar gives you an object with all child elements named bar
  • A collection of XML elements with different names, but the same parent; this is what ->children() gives you
  • A single XML attribute; $foo->bar[0]->baz[0]['id'] will give you an object for the id attribute of the first baz in your example
  • A collection of XML attributes on the same element; this is what ->attributes() gives you

A lot of the time, you don't actually notice which of these you have, because various short-hands let you treat them interchangeably; for instance:

  • Property, method, and attribute calls on collections of elements and attributes generally refer to the first element in the collection. So $foo->bar->baz['id'] is the same as $foo->bar[0]->baz[0]['id']
  • Conversely, using a foreach loop with an object representing a single node automatically loops over the children of that element, as though you'd called ->children()

There are however times when you need to tell the difference. For instance, foreach ( $foo->bar as $item ) will loop over all elements named bar; but foreach ( $foo->bar->children() as $item ) will access the first element named bar, and loop over its children.
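
The difference can be seen in a short sketch. The document below is invented to match the names used in this answer ($foo, bar, baz, id); it is not the XML from the question.

```php
<?php
// Assumed sample document with two <bar> elements.
$foo = new SimpleXMLElement(
    '<foo>'
    . '<bar><baz id="1"/><baz id="2"/></bar>'
    . '<bar><baz id="3"/></bar>'
    . '</foo>'
);

// Loops over every child element named "bar".
$byName = array();
foreach ($foo->bar as $item) {
    $byName[] = $item->getName();
}

// Accesses the first "bar" only and loops over its children.
$firstBarChildren = array();
foreach ($foo->bar->children() as $item) {
    $firstBarChildren[] = $item->getName() . '#' . $item['id'];
}

// The short-hand and the explicit form address the same attribute.
$shortHand = (string) $foo->bar->baz['id'];
$explicit  = (string) $foo->bar[0]->baz[0]['id'];

print_r($byName);
print_r($firstBarChildren);
```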

The children() method is also used to switch between namespaces; see Reference - How do I handle Namespaces (Tags and Attributes with a Colon in their Name) in SimpleXML?

How to update SimpleXMLElement using array

Considering $xml is the document element as a SimpleXMLElement and $data is your array (as in the question), if you are concerned about numbering children, e.g. for metadata, I have it numbered in the array this way:

'metadata' => array(
    'entry' => array(
        5 => array(
            'value' => 'Sunny Days',
        ),
    ),
),

The following shows how to solve the problem in a non-recursive manner with the help of a stack:

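// Seed the stack with the whole array and the document element.
$stack = array(array($data, $xml));

while (list($set, $node) = array_pop($stack)) {
    if (!is_array($set)) {
        // Leaf value: write it into the element via its self-reference.
        $node[0] = $set;
        continue;
    }

    foreach ($set as $element => $value) {
        $parent = $node->$element;
        if ($parent[0] == NULL) {
            throw new Exception(sprintf("Child-Element '%s' not found.", $element));
        }
        $stack[] = array($value, $parent);
    }
}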

Some notes:

  • I changed the concrete exception type to remove the dependency.
  • $parent[0] == NULL tests whether $parent is an empty list, i.e. the child element does not exist (compare/see SimpleXML Type Cheatsheet).
  • Because the element node is put onto the stack and retrieved later, $node[0] has to be used to set its value once it is popped: $node already refers to the numbered element (the first one by default), so changing it afterwards requires the offset 0.

And the Online Demo.
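
For a quick local check, the loop can be run against a small document. This is only a sketch under assumptions: the XML and the $data array are made up, only existing elements with string names are updated, and numbered children are left out to keep it short.

```php
<?php
// Hypothetical document and data for this sketch.
$xml = new SimpleXMLElement(
    '<root><title>Old</title>'
    . '<metadata><entry><value>Rain</value></entry></metadata></root>'
);
$data = array(
    'title'    => 'New',
    'metadata' => array('entry' => array('value' => 'Sunny Days')),
);

// Each stack item pairs a piece of the array with the node it applies to.
$stack = array(array($data, $xml));
while ($item = array_pop($stack)) {
    list($set, $node) = $item;
    if (!is_array($set)) {
        $node[0] = $set; // leaf: set the element's text
        continue;
    }
    foreach ($set as $element => $value) {
        $parent = $node->$element;
        if ($parent[0] == NULL) {
            throw new Exception(sprintf("Child-Element '%s' not found.", $element));
        }
        $stack[] = array($value, $parent);
    }
}

echo $xml->asXML();
```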


The example so far cannot create new elements that do not exist yet. To support adding new elements, the exception thrown for non-existing children:

        if ($parent[0] == NULL) {
            throw new Exception(sprintf("Child-Element '%s' not found.", $element));
        }

needs to be replaced with some code that is adding new children including the first one:

        if ($parent[0] == NULL) {
            if (is_int($element)) {
                if ($element != $node->count()) {
                    throw new Exception(sprintf("Element Number out of order: %d unfitting for %d elements so far.", $element, $node->count()));
                }
                $node[] = '';
                $parent = $node->$element;
            } else {
                $parent[0] = $node->addChild($element);
            }
        }

The remaining exception covers the case where a new element is added but its number is larger than the number of existing elements. For example, if you have 4 elements (numbered 0 to 3) and then "add" the element with the number 6, this won't work. The numbering is zero-based, so normally this should not be a problem.

Demo


