How to Use XMLReader in PHP

How to use XMLReader in PHP?

It all depends on how big the unit of work is, but I guess you're trying to process each <product/> node in succession.

For that, the simplest way would be to use XMLReader to get to each node, then use SimpleXML to access them. This way, you keep the memory usage low because you're treating one node at a time and you still leverage SimpleXML's ease of use. For instance:

$z = new XMLReader;
$z->open('data.xml');

$doc = new DOMDocument;

// move to the first <product /> node
while ($z->read() && $z->name !== 'product');

// now that we're at the right depth, hop to the next <product/> until the end of the tree
while ($z->name === 'product')
{
// either one should work
//$node = new SimpleXMLElement($z->readOuterXML());
$node = simplexml_import_dom($doc->importNode($z->expand(), true));

// now you can use $node without going insane about parsing
var_dump($node->element_1);

// go to next <product />
$z->next('product');
}

Quick overview of pros and cons of different approaches:

XMLReader only

  • Pros: fast, uses little memory

  • Cons: excessively hard to write and debug, requires lots of userland code to do anything useful. Userland code is slow and prone to error. Plus, it leaves you with more lines of code to maintain

XMLReader + SimpleXML

  • Pros: doesn't use much memory (only the memory needed to process one node) and SimpleXML is, as the name implies, really easy to use.

  • Cons: creating a SimpleXMLElement object for each node is not very fast. You really have to benchmark it to understand whether it's a problem for you. Even a modest machine would be able to process a thousand nodes per second, though.

XMLReader + DOM

  • Pros: uses about as much memory as SimpleXML, and XMLReader::expand() is faster than creating a new SimpleXMLElement. I wish it was possible to use simplexml_import_dom() but it doesn't seem to work in that case

  • Cons: DOM is annoying to work with. It's halfway between XMLReader and SimpleXML. Not as complicated and awkward as XMLReader, but light years away from working with SimpleXML.

My advice: write a prototype with SimpleXML, see if it works for you. If performance is paramount, try DOM. Stay as far away from XMLReader as possible. Remember that the more code you write, the higher the possibility of you introducing bugs or introducing performance regressions.
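The XMLReader + SimpleXML combination can be wrapped in a generator so that calling code only sees a plain foreach. The following is a minimal sketch, not a definitive implementation: the function name iterateElements() and the inline sample document are assumptions made for illustration, and it assumes the matching elements are siblings, as in the example above.

```php
<?php
// Hypothetical helper (not part of XMLReader): yields every <$tag> sibling
// element as a SimpleXMLElement, one at a time.
function iterateElements(XMLReader $reader, string $tag): Generator
{
    $doc = new DOMDocument();

    // Advance to the first matching element.
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $tag) {
            break;
        }
    }

    // Hop from sibling to sibling until no matching element is left.
    while ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $tag) {
        $node = $reader->expand();
        if ($node === false) {
            break; // the subtree could not be expanded (malformed input)
        }
        yield simplexml_import_dom($doc->importNode($node, true));
        $reader->next($tag);
    }
}

// Usage with a small inline document (sample data, assumed for illustration).
$reader = new XMLReader();
$reader->XML(
    '<products>'
    . '<product><element_1>a</element_1></product>'
    . '<product><element_1>b</element_1></product>'
    . '</products>'
);

$values = [];
foreach (iterateElements($reader, 'product') as $product) {
    $values[] = (string) $product->element_1;
}
$reader->close();

print_r($values);
```

Only one <product> subtree is held in memory at any time, which is the point of combining the two APIs.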

Get specific tag with XMLReader in PHP

Ok, some notes on this...

The file is too big for SimpleXML, so I'm using XMLReader.

That would mean that loading the XML file with SimpleXML reaches PHP's memory_limit, right?
Alternatives would be to stream or chunk read the XML file and process the parts.

$xml_chunk = (.... read file chunked ...)
$xml = simplexml_load_string($xml_chunk);
$json = json_encode($xml);
$array = json_decode($json,TRUE);

But working with XMLReader is fine!

Maybe there is a way where I can specify that XMLReader only reads the
<downloadcounter> tag after <version>?

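Yes, there is. As "i alarmed alien" pointed out: if you work with DOMDocument, you can use an XPath query to reach exactly the element you want. Keep in mind that DOMDocument::load() reads the whole file into memory, so this only works if the document fits within memory_limit.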

$dom = new DomDocument();
$dom->load("tooBig.xml");
$xp = new DomXPath($dom);

$result = $xp->query("/extensions/extension/version/downloadcounter");

print $result->item(0)->nodeValue ."\n";

For more examples see the PHP manual: http://php.net/manual/de/domxpath.query.php


If you want to stick to XMLReader:

The XMLReader extension is an XML pull parser. The reader moves forward through the document stream, stopping at each node on the way. This explains why you get the first <downloadcounter> beneath the <extension> tag, but not the one beneath <version>.
It also makes iteration hard, because lookahead is not really possible without re-reading.

DEMO http://ideone.com/Oykfyh

<?php

$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<extensions>
<extension extensionkey="fp_product_features">
<downloadcounter>355</downloadcounter>
<version version="0.1.0">
<title>Product features</title>
<description/>
<downloadcounter>24</downloadcounter>
<state>beta</state>
<reviewstate>0</reviewstate>
<category>plugin</category>
<lastuploaddate>1142878270</lastuploaddate>
<uploadcomment> added related features</uploadcomment>
</version>
</extension>
</extensions>
XML;

$reader = new XMLReader();
// parse the string directly instead of going through a data: URI
$reader->XML($xml);

$result = [];
$element = null;

while ($reader->read()) {

if($reader->nodeType === XMLReader::ELEMENT)
{
$element = $reader->name;

if($element === 'extensions') {
$result['extensions'] = array();
}

if($element === 'extension') {
$result['extensions']['extension'] = array();
}

if($element === 'downloadcounter') {
// isset() avoids an "undefined array key" warning while no <version> has been seen yet
if(!isset($result['extensions']['extension']['version'])) {
$result['extensions']['extension']['downloadcounter'] = '';
} /*else {
$result['extensions']['extension']['version']['downloadcounter'] = '';
}*/
}

if($element === 'version') {
$result['extensions']['extension']['version'] = array();
while ($reader->read()) {
if($reader->nodeType === XMLReader::ELEMENT)
{
$element = $reader->name;
$result['extensions']['extension']['version'][$element] = '';
}
if($reader->nodeType === XMLReader::TEXT)
{
$value = $reader->value;
$result['extensions']['extension']['version'][$element] = $value;
}
}
}
}

if($reader->nodeType === XMLReader::TEXT)
{
$value = $reader->value;

if($element === 'downloadcounter') {
// isset() avoids an "undefined array key" warning while no <version> has been seen yet
if(!isset($result['extensions']['extension']['version'])) {
$result['extensions']['extension']['downloadcounter'] = $value;
} else {
$result['extensions']['extension']['version']['downloadcounter'] = $value;
}
}
}
}
$reader->close();

echo var_export($result, true);

Result:

array (
'extensions' =>
array (
'extension' =>
array (
'downloadcounter' => '355',
'version' =>
array (
'title' => 'Product features',
'description' => '',
'downloadcounter' => '24',
'state' => 'beta',
'reviewstate' => '0',
'category' => 'plugin',
'lastuploaddate' => '1142878270',
'uploadcomment' => ' added related features',
),
),
),
)

This transforms your XML into an array (with nested arrays).
It's not really perfect, because of unnecessary iterations.
Feel free to hack away...
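If only the <downloadcounter> beneath <version> is needed, the reader's depth property offers a shorter route than rebuilding the whole array. This is a minimal sketch under the assumption that the document has the same layout as the sample above (root element at depth 0, so the children of <version> sit at depth 3); the variable name $versionCounters is made up for this example.

```php
<?php
// Shortened copy of the sample document from the answer above.
$xml = <<<'XML'
<extensions>
<extension extensionkey="fp_product_features">
<downloadcounter>355</downloadcounter>
<version version="0.1.0">
<title>Product features</title>
<downloadcounter>24</downloadcounter>
</version>
</extension>
</extensions>
XML;

$reader = new XMLReader();
$reader->XML($xml);

$versionCounters = [];

while ($reader->read()) {
    // extensions = depth 0, extension = 1, version = 2, children of version = 3
    if ($reader->nodeType === XMLReader::ELEMENT
        && $reader->name === 'downloadcounter'
        && $reader->depth === 3
    ) {
        // readString() returns the text content without moving the cursor
        $versionCounters[] = $reader->readString();
    }
}
$reader->close();

print_r($versionCounters);
```

The counter directly beneath <extension> sits at depth 2 and is therefore skipped. A depth check is brittle if the nesting of the real file varies, so treat it as a starting point rather than a general solution.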

Additionally:
- Parsing Huge XML Files in PHP
- https://github.com/prewk/XmlStreamer

Reading child nodes with XMLReader in PHP

You are missing the code that moves on to the next item in the read loop:

$xml->next($_GET['name']);

So...

while($xml->name === $_GET['name']) {
$item = array();
$node = new SimpleXMLElement($xml->readOuterXML());
if($node->from == $_GET['name']) {
echo $i.": ".$node->from." | ".$node->to." | ".$node->distance." | ".$node->fromX." | ".$node->fromY." | ".$node->toX." | ".$node->toY."<br>";
$i++;
}

// Next item...
$xml->next($_GET['name']);
}
$xml->close();

Read XML using XMLReader in PHP without knowing the nodes

Many of those nodes you'd think would be XMLReader::TEXT nodes are actually XMLReader::SIGNIFICANT_WHITESPACE.

Fortunately you can drop that $xml->nodeType == XMLReader::TEXT check altogether and build your result as you encounter elements.

Example:

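$a = array();      // stack of the currently open element names
$result = array(); // one comma-separated path per element
while ($xml->read()) {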
if ($xml->nodeType == XMLReader::ELEMENT) {
array_push($a, $xml->name);
$result[] = implode(",", $a);
}

if ($xml->nodeType == XMLReader::END_ELEMENT) {
array_pop($a);
}
}

This'll give you:

Array
(
[0] => Invoices
[1] => Invoices,Company
[2] => Invoices,Company,Name
[3] => Invoices,Documents
[4] => Invoices,Documents,Document
[5] => Invoices,Documents,Document,CustomerCode
[6] => Invoices,Documents,Document,CustomerWebLogin
[7] => Invoices,Documents,Document,CustomerName
)
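One caveat with the loop above: a self-closing element such as <Vat/> produces an ELEMENT node but no END_ELEMENT node, so its name would stay on the stack. The sketch below pops such elements right away using isEmptyElement. The function name listElementPaths() and the sample document (including the <Vat/> element) are assumptions for illustration only.

```php
<?php
// Hypothetical helper: returns the comma-separated path of every element.
function listElementPaths(string $xmlString): array
{
    $reader = new XMLReader();
    $reader->XML($xmlString);

    $stack = []; // names of the currently open elements
    $paths = [];

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $stack[] = $reader->name;
            $paths[] = implode(',', $stack);

            // <Foo/> never triggers END_ELEMENT, so pop it immediately
            if ($reader->isEmptyElement) {
                array_pop($stack);
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
            array_pop($stack);
        }
    }
    $reader->close();

    return $paths;
}

$paths = listElementPaths(
    '<Invoices><Company><Name>ACME</Name><Vat/></Company><Documents/></Invoices>'
);
print_r($paths);
```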

Use XMLReader to find node and retrieve XML from current node and following children

If all the item elements are siblings you can use XMLReader::read() to find the first element and XMLReader::next() to iterate them.

Then use XMLReader::expand() to load the item and its descendants into DOM, use Xpath to read data from it.

$searchForID = '123';

$reader = new XMLReader();
$reader->open('data:text/xml;base64,'.base64_encode(getXMLString()));

$document = new DOMDocument();
$xpath = new DOMXpath($document);

// look for the first "item" element node
while (
$reader->read() && $reader->localName !== 'item'
) {
continue;
}

// iterate "item" sibling elements
while ($reader->localName === 'item') {
// expand into DOM
$item = $reader->expand($document);
// if the node has a child "id" with the searched contents
if ($xpath->evaluate("count(self::*[id = '$searchForID']) > 0", $item)) {
var_dump(
[
// fetch node text content as string
'name' => $xpath->evaluate('string(name)', $item),
// fetch list of "call" elements and map them
'calls' => array_map(
function(DOMElement $call) use ($xpath) {
return [
'name' => $xpath->evaluate('string(name)', $call),
'text' => $xpath->evaluate('string(text)', $call)
];
},
iterator_to_array(
$xpath->evaluate('calls/call', $item)
)
)
]
);
}
$reader->next('item');
}
$reader->close();

XML with namespaces

If the XML uses a namespace (like the one you linked in the comments) you will have to take it into consideration.

For the XMLReader that means validating not just localName (the node name without any namespace prefix/alias) but the namespaceURI as well.

For DOM methods that would mean using the namespace aware methods (with the suffix NS) and registering your own alias/prefix for the Xpath expressions.

$searchForID = '2755';

$reader = new XMLReader();
$reader->open('data:text/xml;base64,'.base64_encode(getXMLString()));

// the namespace uri
$xmlns_siri = 'http://www.siri.org.uk/siri';

$document = new DOMDocument();
$xpath = new DOMXpath($document);
// register an alias for the siri namespace
$xpath->registerNamespace('siri', $xmlns_siri);

// look for the first "EstimatedVehicleJourney" element node
while (
$reader->read() &&
(
$reader->localName !== 'EstimatedVehicleJourney' ||
$reader->namespaceURI !== $xmlns_siri
)
) {
continue;
}

// iterate "EstimatedVehicleJourney" sibling elements
while ($reader->localName === 'EstimatedVehicleJourney') {
// validate the namespace of the node
if ($reader->namespaceURI === $xmlns_siri) {
// expand into DOM
$item = $reader->expand($document);
// if the node has a child "VehicleRef" with the searched contents
// note the use of the registered namespace alias
if ($xpath->evaluate("count(self::*[siri:VehicleRef = '$searchForID']) > 0", $item)) {
var_dump(
[
// fetch node text content as string
'name' => $xpath->evaluate('string(siri:OriginName)', $item),
// fetch list of "call" elements and map them
'calls' => array_map(
function(DOMElement $call) use ($xpath) {
return [
'name' => $xpath->evaluate('string(siri:StopPointName)', $call),
'reference' => $xpath->evaluate('string(siri:StopPointRef)', $call)
];
},
iterator_to_array(
$xpath->evaluate('siri:RecordedCalls/siri:RecordedCall', $item)
)
)
]
);
}
}
$reader->next('EstimatedVehicleJourney');
}
$reader->close();

How to read only part of an XML file with PHP XMLReader

Use array_slice() to extract the portion of the array that belongs to the current page:

require ('xmlreader-iterators.php');

$xmlFile = 'http://www.example.com/rss.xml';
$reader = new XMLReader();
$reader->open($xmlFile);

$itemIterator = new XMLElementIterator($reader, 'item');
$items = array();

// default to page 1 when the parameter is missing or not a positive integer
$curr_page = max(1, (int) ($_GET['page'] ?? 1));

$pages = 0;

$max = 10;

foreach ($itemIterator as $item) {
$xml = $item->asSimpleXML();
$items[] = array(
'title' => (string) $xml->title,
'link' => (string) $xml->link
);
}

// Take the length of the array
$len = count($items);

// Get the number of pages
$pages = ceil($len / $max);

// Calculate the starting point
$start = ((int) $curr_page - 1) * $max;

// return the portion of results
$arrayItem = array_slice($items, $start, $max);

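// loop over what array_slice() returned; the last page may hold fewer than $max items
foreach ($arrayItem as $entry) {
echo '<a href="' . htmlspecialchars($entry['link']) . '">' . htmlspecialchars($entry['title']) . '</a><br>';
}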

// pagination links
$str = array();

for ($i = 1; $i <= $pages; $i ++) {

if ($i === (int) $curr_page) {
// current page

$str[] = sprintf('<span style="color:red">%d</span>', $i);
} else {

$str[] = sprintf('<a href="?page=%d" style="color:green">%d</a>', $i, $i);
}
}
echo implode('', $str);
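The code above converts every <item> before slicing. A variation is to count all items but only build a SimpleXMLElement for those on the requested page. The following is a rough sketch using plain XMLReader instead of xmlreader-iterators.php; the function name readPage() and the inline feed are assumptions for illustration, and it assumes all <item> elements are siblings.

```php
<?php
// Hypothetical helper: returns the items of one page plus the total item count.
function readPage(XMLReader $reader, int $page, int $perPage): array
{
    $start = ($page - 1) * $perPage; // index of the first item on the page
    $index = 0;
    $items = [];

    // Advance to the first <item> element.
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
            break;
        }
    }

    while ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        // Only parse the items that belong to the requested page.
        if ($index >= $start && count($items) < $perPage) {
            $xml = new SimpleXMLElement($reader->readOuterXml());
            $items[] = [
                'title' => (string) $xml->title,
                'link'  => (string) $xml->link,
            ];
        }
        $index++;
        $reader->next('item');
    }

    return ['items' => $items, 'total' => $index];
}

// Usage with a small inline feed (sample data, assumed for illustration).
$rss = '<rss><channel><title>Feed</title>'
    . '<item><title>A</title><link>http://www.example.com/a</link></item>'
    . '<item><title>B</title><link>http://www.example.com/b</link></item>'
    . '<item><title>C</title><link>http://www.example.com/c</link></item>'
    . '</channel></rss>';

$reader = new XMLReader();
$reader->XML($rss);
$result = readPage($reader, 2, 2);
$reader->close();

print_r($result);
```

The whole file is still read once, because the total is needed for the page links, but memory use no longer grows with the number of items.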

How to open an xml file, in php using XMLReader, and using an https address?

If you want to use an HTTPS URL, you have to make sure that the openssl extension is enabled in PHP.

php.ini (Windows):

extension=php_openssl.dll

or, on Unix-like systems, where shared modules carry no php_ prefix:

extension=openssl.so
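Whether the setting took effect can be checked from PHP itself before opening the URL. This is a small sketch; the helper name canOpenHttps() is made up for this example.

```php
<?php
// Hypothetical helper: reports whether this PHP build can open https:// URLs.
function canOpenHttps(): bool
{
    return extension_loaded('openssl')
        && in_array('https', stream_get_wrappers(), true);
}

if (!canOpenHttps()) {
    echo "HTTPS streams are not available; enable the openssl extension.\n";
}
```

If the check fails, XMLReader::open() with an https:// address will fail as well, since it goes through the same stream wrappers.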

Reading Child Nodes with XMLReader

Never mind, figured it out. For anyone else who gets stuck on this:

$xml = new XMLReader();
if(!$xml->open('Items.xml')){
die('Failed to open file!');
} else {
echo 'File opened';
}

$items = array();

while ($xml->read() && $xml->name !== "Item");
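while ($xml->name === "Item") {
$item = array();
$node = new SimpleXMLElement($xml->readOuterXML());
// cast to string so the array holds plain values, not SimpleXMLElement objects
$item['itemkey'] = (string) $node->ItemKey;
$item['englishName'] = (string) $node->Name->English;
$item['englishDesc'] = (string) $node->Description->English;
$items[] = $item;

// move on to the next <Item>; without this the loop never ends
$xml->next("Item");
}
$xml->close();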

Using XMLreader to read and parse large XML files. Empty values problem

Here's some code that will do what you want. It saves the value for each element when it encounters a TEXT or CDATA node, then stores it when it reaches END_ELEMENT. At that point the saved value is reset to '', so that an element with no value gets an empty string (this could be changed to null if you prefer). It also deals with self-closing tags, for example <brandName />, with an isEmptyElement check when an ELEMENT node is found. It takes advantage of PHP's variable variables to avoid the long sequence of if ($nodename == ...) that you have in your code, but it also uses an array to store the values for each product, which I think is the better long-term solution for your problem.

$reader = new XMLReader();
$reader->xml($xml);
$count = 0;
$this_value = '';
$products = array();
while($reader->read()) {
switch ($reader->nodeType) {
case XMLReader::ELEMENT:
// deal with self-closing tags e.g. <productEan />
if ($reader->isEmptyElement) {
${$reader->name} = '';
$products[$count][$reader->name] = '';
}
break;
case XMLReader::TEXT:
case XMLReader::CDATA:
// save the value for storage when we get to the end of the element
$this_value = $reader->value;
break;
case XMLReader::END_ELEMENT:
if ($reader->name == 'product') {
$count++;
print_r(array($categoryName, $brandName, $productCode, $productId, $productFullName, $productEan, $productEuroPriceNetto, $productFrontendPriceNetto, $productFastestSupplierQuantity, $deliveryEstimatedDays));
}
elseif ($reader->name != 'products') {
${$reader->name} = $this_value;
$products[$count][$reader->name] = $this_value;
// set this_value to a blank string to allow for empty tags
$this_value = '';
}
break;
case XMLReader::WHITESPACE:
case XMLReader::SIGNIFICANT_WHITESPACE:
default:
// nothing to do
break;
}
}
$reader->close();
print_r($products);

I've omitted the output as it's quite long but you can see the code in operation in this demo on 3v4l.org.
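For comparison, here is a sketch of an array-only variant that drops the variable variables and fills in fields a product does not contain. The function name readProducts(), the $fields parameter and the sample document are assumptions for illustration; the element names are taken from the code above.

```php
<?php
// Hypothetical helper: returns one array per <product>, with '' for missing fields.
function readProducts(string $xml, array $fields): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $products = [];
    $current = []; // values of the product being read
    $value = '';   // text collected for the current element

    while ($reader->read()) {
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                $value = '';
                // self-closing tags never reach END_ELEMENT
                if ($reader->isEmptyElement && $reader->name !== 'product') {
                    $current[$reader->name] = '';
                }
                break;
            case XMLReader::TEXT:
            case XMLReader::CDATA:
                $value .= $reader->value;
                break;
            case XMLReader::END_ELEMENT:
                if ($reader->name === 'product') {
                    // start from empty defaults so every product has every field
                    $products[] = array_merge(array_fill_keys($fields, ''), $current);
                    $current = [];
                } elseif ($reader->name !== 'products') {
                    $current[$reader->name] = $value;
                    $value = '';
                }
                break;
        }
    }
    $reader->close();

    return $products;
}

// Usage with a small inline document (sample data, assumed for illustration).
$products = readProducts(
    '<products>'
    . '<product><productId>1</productId><brandName/></product>'
    . '<product><productId>2</productId><productEan><![CDATA[123]]></productEan></product>'
    . '</products>',
    ['productId', 'brandName', 'productEan']
);
print_r($products);
```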


