Reference - How to Handle Namespaces (Tags and Attributes With a Colon in Their Name) in SimpleXML

Reference - How do I handle Namespaces (Tags and Attributes with a Colon in their Name) in SimpleXML?

What are XML namespaces?

A colon (:) in a tag or attribute name means that the element or attribute is in an XML namespace. Namespaces are a way of combining different XML formats / standards in one document, and keeping track of which names come from which format. The colon and the part before it aren't really part of the tag / attribute name; they just indicate which namespace it's in.

An XML namespace is identified by a URI (a URL or URN), known as its namespace identifier. The URI doesn't have to point at anything; it's just a way for someone to "own" the namespace. For instance, the SOAP standard uses the namespace http://www.w3.org/2003/05/soap-envelope and an OpenDocument file uses (among others) urn:oasis:names:tc:opendocument:xmlns:meta:1.0. The example in the question uses the namespaces http://example.com and https://namespaces.example.org/two.

Within a document, or a section of a document, a namespace is given a local prefix, which is the part you see before the colon. For instance, in different documents, the SOAP namespace might be given the local prefix soap:, SOAP:, SOAP-ENV:, env:, or just ns1:. These names are linked back to the identifier of the namespace using a special xmlns attribute, e.g. xmlns:soap="http://www.w3.org/2003/05/soap-envelope". The choice of prefix in a particular document is completely arbitrary, and could change each time it was generated without changing the meaning.

Finally, there is a default namespace in each document, or section of a document, which is the namespace used for elements with no prefix. It is defined by an xmlns attribute with no :, e.g. xmlns="http://www.w3.org/2003/05/soap-envelope". In the example above, <list> is in the default namespace, which is defined as http://example.com.
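To make the point about arbitrary prefixes concrete, here is a minimal sketch. The three tiny documents are made up for illustration (they are not real SOAP messages); they differ only in which prefix, if any, is bound to the SOAP namespace URI, and the same URI-based lookup reads all of them.

```php
<?php
// Namespace URI taken from the text above; the documents are illustrative only.
const XMLNS_SOAP = 'http://www.w3.org/2003/05/soap-envelope';

$docs = [
    // Prefix "soap"
    '<soap:Envelope xmlns:soap="' . XMLNS_SOAP . '"><soap:Body>hello</soap:Body></soap:Envelope>',
    // Prefix "env"
    '<env:Envelope xmlns:env="' . XMLNS_SOAP . '"><env:Body>hello</env:Body></env:Envelope>',
    // No prefix: the same URI declared as the default namespace
    '<Envelope xmlns="' . XMLNS_SOAP . '"><Body>hello</Body></Envelope>',
];

$bodies = [];
foreach ($docs as $doc) {
    $sx = simplexml_load_string($doc);
    // Select children by namespace URI, never by prefix
    $bodies[] = (string) $sx->children(XMLNS_SOAP)->Body;
}

print_r($bodies);
```

Because the lookup is keyed on the URI, the code does not care which prefix the generator happened to choose.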

Somewhat peculiarly, un-prefixed attributes are never in the default namespace, but in a kind of "void namespace", which the standard doesn't clearly define. See: XML Namespaces and Unprefixed Attributes

SimpleXML gives me an empty object; what's wrong?

If you use print_r, var_dump, or similar "dump structure" functions on a SimpleXML object that contains namespaced elements or attributes, some of the contents will not be displayed. They are still there, and can be accessed as described below.
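A small sketch of that behaviour, using a made-up one-element document: the dump does not show the prefixed child, but the child can still be read once its namespace is selected.

```php
<?php
// Illustrative document: the root has no namespace, its only child is prefixed.
$sx = simplexml_load_string(
    '<root xmlns:ns2="https://namespaces.example.org/two"><ns2:item>A thing</ns2:item></root>'
);

// The dump omits the namespaced child...
$dump = print_r($sx, true);
echo $dump;

// ...but the data is still reachable through children() with the namespace URI.
$value = (string) $sx->children('https://namespaces.example.org/two')->item;
echo $value, "\n";
```

For inspecting a whole document while debugging, `$sx->asXML()` gives the full markup regardless of namespaces.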

How do you access namespaces in SimpleXML?

SimpleXML provides two main methods for using namespaces:

  • The ->children() method allows you to access child elements in a particular namespace. It effectively switches your object to look at that namespace, until you call it again to switch back, or to another namespace.
  • The ->attributes() method works in a similar way, but allows you to access attributes in a particular namespace.

For instance, the example above might become:

define('XMLNS_EG1', 'http://example.com');
define('XMLNS_EG2', 'https://namespaces.example.org/two');
define('XMLNS_SEQ', 'urn:example:sequences');

foreach ( $sx->children(XMLNS_EG1)->list->children(XMLNS_EG2)->item as $item ) {
    echo 'Position: ' . $item->attributes(XMLNS_SEQ)->position . "\n";
    echo 'Item: ' . (string)$item . "\n";
}

You can also select the initial namespace when you first parse the XML, using the $namespace_or_prefix parameter, which is the fourth parameter to simplexml_load_string, simplexml_load_file, or new SimpleXMLElement.

For instance, if we created the object this way, we wouldn't need the ->children(XMLNS_EG1) call to access the list element:

$sx = simplexml_load_string($xml, null, 0, XMLNS_EG1);

(Note that if the root element uses a default namespace rather than a prefix, SimpleXML will select it automatically; but since you can't predict which namespace will be the default in future, it's best to always include the $namespace_or_prefix parameter or initial ->children() call.)
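Putting the pieces together, here is a self-contained sketch. The XML is an assumption: a small document reconstructed to match the namespaces and element names used in the code above (the root element name `document` and the `type` attribute are my own choices), not necessarily the one from the question.

```php
<?php
const XMLNS_EG1 = 'http://example.com';
const XMLNS_EG2 = 'https://namespaces.example.org/two';
const XMLNS_SEQ = 'urn:example:sequences';

// Assumed sample document: default namespace plus two prefixed namespaces.
$xml = <<<XML
<document xmlns="http://example.com"
          xmlns:ns2="https://namespaces.example.org/two"
          xmlns:seq="urn:example:sequences">
  <list type="short">
    <ns2:item seq:position="1">A thing</ns2:item>
    <ns2:item seq:position="2">Another thing</ns2:item>
  </list>
</document>
XML;

// Select the initial namespace at parse time (fourth parameter).
$sx = simplexml_load_string($xml, null, 0, XMLNS_EG1);

$lines = [];
foreach ($sx->list->children(XMLNS_EG2)->item as $item) {
    $position = (string) $item->attributes(XMLNS_SEQ)->position;
    $lines[] = 'Position: ' . $position . ', Item: ' . (string) $item;
}

// Un-prefixed attributes are in no namespace, so read them via attributes()
// with no argument rather than through the currently selected namespace.
$type = (string) $sx->list->attributes()->type;

echo implode("\n", $lines), "\n";
echo 'Type: ', $type, "\n";
```

Keeping the URIs in constants means a change of prefix in the source document needs no code change at all.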

Short-hand (not recommended)

As a short-hand, you can also pass the methods the local prefix of the namespace instead of its URI, by giving the second parameter as true. Remember that this prefix could change at any time; for instance, a generator might assign prefixes ns1, ns2, etc., and assign them in a different order if the code changes slightly. Relying on the full namespace URIs is always the best approach.

Using this short-hand, the code would become:

foreach ( $sx->list->children('ns2', true)->item as $item ) {
    echo 'Position: ' . $item->attributes('seq', true)->position . "\n";
    echo 'Item: ' . (string)$item . "\n";
}

(This short-hand was added in PHP 5.2, and you may see really old examples using a more long-winded version using $sx->getNamespaces to get a list of prefix-identifier pairs. This is the worst of both worlds, as you're still hard-coding the prefix rather than the identifier.)

PHP XML not parsing with SimpleXml

Fixed by registering the namespace with registerXPathNamespace and querying with xpath:

$xml = simplexml_load_string($string);
$xml->registerXPathNamespace('default', 'http://tempuri.org/');
$auto = $xml->xpath("//default:BrandLst");
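A runnable sketch of why the registration is needed. The response document and the Brand element are hypothetical; only the http://tempuri.org/ namespace and the BrandLst name come from the snippet above. In XPath 1.0 an un-prefixed name only matches elements in no namespace, so elements in a default namespace need a registered prefix.

```php
<?php
// Hypothetical response: everything sits in a default namespace.
$string = '<Response xmlns="http://tempuri.org/">'
        . '<BrandLst><Brand>Acme</Brand></BrandLst>'
        . '</Response>';

$xml = simplexml_load_string($string);

// Un-prefixed XPath names match only no-namespace elements, so this finds nothing.
$withoutPrefix = $xml->xpath('//BrandLst');

// Bind any prefix of your choosing ("default" here) to the namespace URI.
$xml->registerXPathNamespace('default', 'http://tempuri.org/');
$auto = $xml->xpath('//default:BrandLst');

// Read the child by namespace URI, as recommended earlier.
$brand = (string) $auto[0]->children('http://tempuri.org/')->Brand;

echo count($withoutPrefix), ' match(es) without prefix, ', count($auto), " with prefix\n";
echo $brand, "\n";
```

The prefix passed to registerXPathNamespace is local to your XPath expressions; it does not have to match anything in the document.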

Are SimpleXMLElement and SimpleXMLElement::children() the same thing?

Despite its name, instances of the SimpleXMLElement class can represent a few different things:

  • A single XML element; in your example, $foo represents the <foo> element at the root of your XML document
  • A collection of XML elements with the same name; for instance, $foo->bar gives you an object with all child elements named bar
  • A collection of XML elements with different names, but the same parent; this is what ->children() gives you
  • A single XML attribute; $foo->bar[0]->baz[0]['id'] will give you an object for the id attribute of the first baz in your example
  • A collection of XML attributes on the same element; this is what ->attributes() gives you

A lot of the time, you don't actually notice which of these you have, because various short-hands let you treat them interchangeably; for instance:

  • Property, method, and attribute calls on collections of elements and attributes generally refer to the first element in the collection. So $foo->bar->baz['id'] is the same as $foo->bar[0]->baz[0]['id']
  • Conversely, using a foreach loop with an object representing a single node automatically loops over the children of that element, as though you'd called ->children()

There are however times when you need to tell the difference. For instance, foreach ( $foo->bar as $item ) will loop over all elements named bar; but foreach ( $foo->bar->children() as $item ) will access the first element named bar, and loop over its children.
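The difference can be seen with a small sketch. The document below is made up to fit the foo / bar / baz names used in this answer.

```php
<?php
// Illustrative document: two <bar> elements, the first with two <baz> children.
$foo = simplexml_load_string(
    '<foo><bar><baz id="1"/><baz id="2"/></bar><bar><baz id="3"/></bar></foo>'
);

// Loops over every element named bar.
$barCount = 0;
foreach ($foo->bar as $item) {
    $barCount++;
}

// Takes the first bar, then loops over its children.
$idsViaChildren = [];
foreach ($foo->bar->children() as $item) {
    $idsViaChildren[] = (string) $item['id'];
}

// A single element in foreach behaves as though children() had been called.
$idsViaSingleNode = [];
foreach ($foo->bar[0] as $item) {
    $idsViaSingleNode[] = (string) $item['id'];
}

// The short-hand and the explicit form reach the same attribute.
$short = (string) $foo->bar->baz['id'];
$explicit = (string) $foo->bar[0]->baz[0]['id'];

echo $barCount, "\n";
echo implode(',', $idsViaChildren), "\n";
echo implode(',', $idsViaSingleNode), "\n";
echo $short, ' ', $explicit, "\n";
```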

The children() method is also used to switch between namespaces; see Reference - How do I handle Namespaces (Tags and Attributes with a Colon in their Name) in SimpleXML?

XML - getting value from namespace using SimpleXML

The children() method doesn't return some kind of token for the namespace, it returns a list of elements - the children which are in the given namespace.

The $xml variable represents the top-level message element, which doesn't have any children in the http://www.blendlabs.com namespace, so $xml->children('http://www.blendlabs.com') will just return an empty list. You need to first navigate to the other element, and then get its children in the http://www.blendlabs.com namespace, which will include the loan element.

Since the top level element is in the http://www.mismo.org/residential/2009/schemas namespace, you might need an extra children() call to make sure you select that first.

You didn't provide a complete XML document, so I can't test the code (I don't fancy manually writing all those close tags), but it will look something like this:

$marketing_value = (string)
    $xml
        ->children('http://www.mismo.org/residential/2009/schemas')
        ->deal_sets->deal_set->deals->deal->loans->loan->extension->other
        ->children('http://www.blendlabs.com')
        ->loan->marketing_items->marketing_item->marketingvalue;
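Since the real document isn't available, here is the same pattern on a heavily shortened, hypothetical structure. The element nesting and the blend prefix are assumptions; only the two namespace URIs come from the question. It shows both the empty result at the top level and the working two-step navigation.

```php
<?php
const XMLNS_MISMO = 'http://www.mismo.org/residential/2009/schemas';
const XMLNS_BLEND = 'http://www.blendlabs.com';

// Hypothetical, much-reduced stand-in for the real message.
$source = <<<XML
<message xmlns="http://www.mismo.org/residential/2009/schemas"
         xmlns:blend="http://www.blendlabs.com">
  <loan>
    <extension>
      <other>
        <blend:loan>
          <blend:marketingvalue>example</blend:marketingvalue>
        </blend:loan>
      </other>
    </extension>
  </loan>
</message>
XML;

$xml = simplexml_load_string($source);

// The root has no direct children in the second namespace: an empty list.
$topLevelCount = count($xml->children(XMLNS_BLEND));

// Navigate in the first namespace, then switch at the element whose
// children are actually in the second one.
$marketing_value = (string)
    $xml
        ->children(XMLNS_MISMO)
        ->loan->extension->other
        ->children(XMLNS_BLEND)
        ->loan->marketingvalue;

echo $topLevelCount, "\n";
echo $marketing_value, "\n";
```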

