How to use XMLReader in PHP?
It all depends on how big the unit of work is, but I guess you're trying to process each <product/> node in succession.
For that, the simplest approach is to use XMLReader to get to each node, then use SimpleXML to access its contents. This keeps memory usage low because you're handling one node at a time, and you still get SimpleXML's ease of use. For instance:
$z = new XMLReader;
$z->open('data.xml');
$doc = new DOMDocument;
// move to the first <product /> node
while ($z->read() && $z->name !== 'product');
// now that we're at the right depth, hop to the next <product/> until the end of the tree
while ($z->name === 'product')
{
    // either one should work
    //$node = new SimpleXMLElement($z->readOuterXML());
    $node = simplexml_import_dom($doc->importNode($z->expand(), true));
    // now you can use $node without going insane about parsing
    var_dump($node->element_1);
    // go to next <product />
    $z->next('product');
}
// release the file handle once all products have been processed
$z->close();
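To try the loop without a data.xml file, here is a self-contained sketch that reads from a string with XMLReader::XML() instead. The collectProductValues() helper and the sample XML are hypothetical; the <product>/<element_1> names simply mirror the example above.

```php
<?php
// Hypothetical helper: walk the <product> elements with XMLReader and read
// one child of each through SimpleXML, keeping only one node in memory at a time.
function collectProductValues(string $xml, string $child): array
{
    $reader = new XMLReader();
    $reader->XML($xml);          // read from a string instead of data.xml
    $doc = new DOMDocument();
    $values = [];

    // move to the first <product> element
    while ($reader->read() && $reader->name !== 'product');

    // visit each <product> sibling in turn
    while ($reader->name === 'product') {
        $node = simplexml_import_dom($doc->importNode($reader->expand(), true));
        $values[] = (string) $node->{$child};
        // skip the current subtree and jump to the next <product> sibling
        $reader->next('product');
    }

    $reader->close();
    return $values;
}

$sample = '<products>'
    . '<product><element_1>a</element_1></product>'
    . '<product><element_1>b</element_1></product>'
    . '</products>';

print_r(collectProductValues($sample, 'element_1'));
```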
Quick overview of pros and cons of different approaches:
XMLReader only
Pros: fast, uses little memory
Cons: excessively hard to write and debug, and it requires lots of userland code to do anything useful. Userland code is slow and error-prone. Plus, it leaves you with more lines of code to maintain.
XMLReader + SimpleXML
Pros: doesn't use much memory (only the memory needed to process one node) and SimpleXML is, as the name implies, really easy to use.
Cons: creating a SimpleXMLElement object for each node is not very fast. You really have to benchmark it to understand whether it's a problem for you. Even a modest machine would be able to process a thousand nodes per second, though.
XMLReader + DOM
Pros: uses about as much memory as SimpleXML, and XMLReader::expand() is faster than creating a new SimpleXMLElement. I wish it was possible to use simplexml_import_dom() directly on the node returned by expand(), but it doesn't seem to work in that case; importing the node into a DOMDocument first, as the example above does, gets around that.
Cons: DOM is annoying to work with. It's halfway between XMLReader and SimpleXML: not as complicated and awkward as XMLReader, but light years away from working with SimpleXML.
My advice: write a prototype with SimpleXML and see if it works for you. If performance is paramount, try DOM. Stay as far away from XMLReader-only parsing as possible. Remember that the more code you write, the more likely you are to introduce bugs or performance regressions.
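For comparison, the XMLReader + DOM combination can be sketched like this: the expanded node is used through plain DOM methods, with no SimpleXML involved. The collectWithDom() helper and the sample data are hypothetical.

```php
<?php
// Hypothetical helper: XMLReader + DOM. Each <product> is expanded into a
// DOMDocument and read with DOM methods only.
function collectWithDom(string $xml, string $child): array
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $doc = new DOMDocument();
    $values = [];

    // move to the first <product> element
    while ($reader->read() && $reader->name !== 'product');

    while ($reader->name === 'product') {
        // expand() copies the current element and its subtree into $doc
        $product = $reader->expand($doc);
        // text of the first matching descendant element, or null if none
        $list = $product->getElementsByTagName($child);
        $values[] = $list->length > 0 ? $list->item(0)->textContent : null;
        $reader->next('product');
    }

    $reader->close();
    return $values;
}

$sample = '<products><product><element_1>x</element_1></product><product/></products>';
var_dump(collectWithDom($sample, 'element_1'));
```

The DOM version needs no conversion step, at the cost of the more verbose DOM API.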
Get specific tag with XMLReader in PHP
Ok, some notes on this...
The file is too big for SimpleXML, so I'm using XMLReader.
That would mean that loading the XML file with SimpleXML reaches PHP's memory_limit, right?
An alternative would be to stream the XML file, or read it in chunks, and process the parts separately.
$xml_chunk = (.... read file chunked ...)
$xml = simplexml_load_string($xml_chunk);
$json = json_encode($xml);
$array = json_decode($json,TRUE);
But working with XMLReader is fine!
Maybe there is a way where I can specify that XMLReader only reads the <downloadcounter> tag after <version>?
Yes, there is. As "i alarmed alien" pointed out, if you work with DOMDocument, you can use an XPath query to reach the exact node you want.
$dom = new DomDocument();
$dom->load("tooBig.xml");
$xp = new DomXPath($dom);
$result = $xp->query("/extensions/extension/version/downloadcounter");
print $result->item(0)->nodeValue ."\n";
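Keep in mind that DOMDocument loads the whole file into memory, so for a file that is too big for SimpleXML this runs into the same limit. Here is a self-contained sketch of the same query using loadXML() on a trimmed copy of the question's XML (the trimming is mine), showing that the absolute path picks only the counter inside <version>:

```php
<?php
// Trimmed copy of the question's XML; a real file would use $dom->load().
$xml = <<<'XML'
<extensions>
  <extension extensionkey="fp_product_features">
    <downloadcounter>355</downloadcounter>
    <version version="0.1.0">
      <downloadcounter>24</downloadcounter>
    </version>
  </extension>
</extensions>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xp = new DOMXPath($dom);

// the absolute path only matches the <downloadcounter> inside <version>
$counter = $xp->evaluate('string(/extensions/extension/version/downloadcounter)');
echo $counter, "\n"; // 24

// the one directly under <extension> needs a different path
$outer = $xp->evaluate('string(/extensions/extension/downloadcounter)');
echo $outer, "\n"; // 355
```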
For more examples see the PHP manual: http://php.net/manual/de/domxpath.query.php
If you want to stick to XMLReader:
The XMLReader extension is an XML pull parser. The reader moves forward through the document stream, stopping on each node along the way. This explains why you get the first <downloadcounter> from beneath the <extension> tag, but not the one beneath <version>.
This makes iteration hard, because lookahead is not really possible without re-reading the document.
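You can't look ahead, but you can keep track of where you are. A minimal sketch, assuming you only need the text of one element identified by its full path: keep a stack of the open element names and compare it with the wanted path. The readByPath() name is made up for illustration.

```php
<?php
// Hypothetical helper: return the text of the first element whose full path
// (e.g. 'extensions/extension/version/downloadcounter') matches $path.
function readByPath(string $xml, string $path): ?string
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $stack = [];

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $stack[] = $reader->name;
            if (implode('/', $stack) === $path) {
                // readString() returns the text content of the current element
                $value = $reader->readString();
                $reader->close();
                return $value;
            }
            // empty elements such as <description/> produce no END_ELEMENT node
            if ($reader->isEmptyElement) {
                array_pop($stack);
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
            array_pop($stack);
        }
    }

    $reader->close();
    return null;
}

$xml = '<extensions><extension>'
    . '<downloadcounter>355</downloadcounter>'
    . '<version><description/><downloadcounter>24</downloadcounter></version>'
    . '</extension></extensions>';

echo readByPath($xml, 'extensions/extension/version/downloadcounter'), "\n"; // 24
echo readByPath($xml, 'extensions/extension/downloadcounter'), "\n";         // 355
```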
DEMO http://ideone.com/Oykfyh
<?php
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<extensions>
<extension extensionkey="fp_product_features">
<downloadcounter>355</downloadcounter>
<version version="0.1.0">
<title>Product features</title>
<description/>
<downloadcounter>24</downloadcounter>
<state>beta</state>
<reviewstate>0</reviewstate>
<category>plugin</category>
<lastuploaddate>1142878270</lastuploaddate>
<uploadcomment> added related features</uploadcomment>
</version>
</extension>
</extensions>
XML;
$reader = new XMLReader();
// read the XML straight from the string above
$reader->XML($xml);
$result = [];
$element = null;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT)
    {
        $element = $reader->name;
        if ($element === 'extensions') {
            $result['extensions'] = array();
        }
        if ($element === 'extension') {
            $result['extensions']['extension'] = array();
        }
        if ($element === 'downloadcounter') {
            // isset() avoids an "undefined array key" warning before <version> is seen
            if (!isset($result['extensions']['extension']['version'])) {
                $result['extensions']['extension']['downloadcounter'] = '';
            }
        }
        if ($element === 'version') {
            $result['extensions']['extension']['version'] = array();
            while ($reader->read()) {
                // stop at </version> instead of reading on to the end of the document
                if ($reader->nodeType === XMLReader::END_ELEMENT && $reader->name === 'version') {
                    break;
                }
                if ($reader->nodeType === XMLReader::ELEMENT)
                {
                    $element = $reader->name;
                    $result['extensions']['extension']['version'][$element] = '';
                }
                if ($reader->nodeType === XMLReader::TEXT)
                {
                    $value = $reader->value;
                    $result['extensions']['extension']['version'][$element] = $value;
                }
            }
        }
    }
    if ($reader->nodeType === XMLReader::TEXT)
    {
        $value = $reader->value;
        if ($element === 'downloadcounter') {
            if (!isset($result['extensions']['extension']['version'])) {
                $result['extensions']['extension']['downloadcounter'] = $value;
            } else {
                $result['extensions']['extension']['version']['downloadcounter'] = $value;
            }
        }
    }
}
$reader->close();
echo var_export($result, true);
Result:
array (
'extensions' =>
array (
'extension' =>
array (
'downloadcounter' => '355',
'version' =>
array (
'title' => 'Product features',
'description' => '',
'downloadcounter' => '24',
'state' => 'beta',
'reviewstate' => '0',
'category' => 'plugin',
'lastuploaddate' => '1142878270',
'uploadcomment' => ' added related features',
),
),
),
)
This transforms your XML into an array (with nested arrays).
It's not perfect, because it does some unnecessary iterations.
Feel free to hack away...
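One way to hack away: a generic recursive converter that doesn't hard-code the element names. This is a different technique from the hand-written version (recursion instead of a flat state machine) and only a sketch: attributes are ignored, repeated sibling names overwrite each other, and text mixed with child elements is dropped. xmlToArray() and readElement() are hypothetical names.

```php
<?php
// Hypothetical recursive converter: must be called while the reader sits on
// an ELEMENT node; returns a string for leaf elements, an array otherwise.
function readElement(XMLReader $reader)
{
    if ($reader->isEmptyElement) {
        return '';            // <description/> has no children and no END_ELEMENT
    }
    $children = [];
    $text = '';
    while ($reader->read()) {
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                $name = $reader->name;   // capture before the reader moves on
                $children[$name] = readElement($reader);
                break;
            case XMLReader::TEXT:
            case XMLReader::CDATA:
                $text .= $reader->value;
                break;
            case XMLReader::END_ELEMENT:
                return $children ?: $text;
        }
    }
    return $children ?: $text;
}

function xmlToArray(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $result = [];
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $name = $reader->name;
            $result[$name] = readElement($reader);
            break;            // a document has a single root element
        }
    }
    $reader->close();
    return $result;
}

$r = xmlToArray('<extensions><extension><downloadcounter>355</downloadcounter>'
    . '<version><description/><downloadcounter>24</downloadcounter></version>'
    . '</extension></extensions>');
var_export($r);
```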
Additionally:
- Parsing Huge XML Files in PHP
- https://github.com/prewk/XmlStreamer
Reading child nodes with XMLReader in PHP
You are missing the code that moves on to the next item in the read loop:
$xml->next($_GET['name']);
So...
while ($xml->name === $_GET['name']) {
    $item = array();
    $node = new SimpleXMLElement($xml->readOuterXML());
    if ($node->from == $_GET['name']) {
        echo $i.": ".$node->from." | ".$node->to." | ".$node->distance." | ".$node->fromX." | ".$node->fromY." | ".$node->toX." | ".$node->toY."<br>";
        $i++;
    }
    // Next item...
    $xml->next($_GET['name']);
}
$xml->close();
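The same next()-driven loop can be wrapped in a generator, which keeps the filtering separate from the output code. A sketch only: elementsWhere() and the <route>/<from>/<to> sample are hypothetical, loosely modelled on the fields above, and $_GET is left out so it can run on its own.

```php
<?php
// Hypothetical generator: yields each <$tag> element (as SimpleXMLElement)
// whose child <$field> equals $value, moving with next() between siblings.
function elementsWhere(XMLReader $reader, string $tag, string $field, string $value): Generator
{
    // move to the first <$tag> element
    while ($reader->read() && $reader->name !== $tag);

    while ($reader->name === $tag) {
        $node = new SimpleXMLElement($reader->readOuterXml());
        if ((string) $node->{$field} === $value) {
            yield $node;
        }
        // without this call the loop would process the same element forever
        $reader->next($tag);
    }
}

$reader = new XMLReader();
$reader->XML('<routes>'
    . '<route><from>A</from><to>B</to></route>'
    . '<route><from>C</from><to>D</to></route>'
    . '<route><from>A</from><to>E</to></route>'
    . '</routes>');

$destinations = [];
foreach (elementsWhere($reader, 'route', 'from', 'A') as $route) {
    $destinations[] = (string) $route->to;
}
$reader->close();

echo implode(', ', $destinations), "\n"; // B, E
```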
Read XML using XMLReader in PHP without know nodes
Many of those nodes you'd think would be XMLReader::TEXT nodes are actually XMLReader::SIGNIFICANT_WHITESPACE.
Fortunately you can drop that $xml->nodeType == XMLReader::TEXT check altogether and build your result as you encounter elements.
Example:
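$a = [];      // stack of currently open element names
$result = [];
while ($xml->read()) {
    if ($xml->nodeType == XMLReader::ELEMENT) {
        array_push($a, $xml->name);
        $result[] = implode(",", $a);
        // self-closing elements such as <Name/> produce no END_ELEMENT node,
        // so pop them right away
        if ($xml->isEmptyElement) {
            array_pop($a);
        }
    }
    if ($xml->nodeType == XMLReader::END_ELEMENT) {
        array_pop($a);
    }
}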
This'll give you:
Array
(
[0] => Invoices
[1] => Invoices,Company
[2] => Invoices,Company,Name
[3] => Invoices,Documents
[4] => Invoices,Documents,Document
[5] => Invoices,Documents,Document,CustomerCode
[6] => Invoices,Documents,Document,CustomerWebLogin
[7] => Invoices,Documents,Document,CustomerName
)
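If the document repeats elements (several <Document> entries, say), you may only want each path once. A sketch of that variant, with a made-up elementPaths() name and sample XML; it keeps the isEmptyElement check, without which a self-closing tag such as <Name/> would stay on the stack and corrupt every later path.

```php
<?php
// Hypothetical helper: list the distinct element paths of a document
// without knowing its element names in advance.
function elementPaths(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $stack = [];
    $paths = [];

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $stack[] = $reader->name;
            $paths[implode(',', $stack)] = true;  // array keys de-duplicate paths
            if ($reader->isEmptyElement) {
                array_pop($stack);                // <Tag/> has no END_ELEMENT
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
            array_pop($stack);
        }
    }

    $reader->close();
    return array_keys($paths);
}

print_r(elementPaths(
    '<Invoices><Company><Name/></Company>'
    . '<Documents><Document><CustomerCode>1</CustomerCode></Document>'
    . '<Document><CustomerCode>2</CustomerCode></Document></Documents></Invoices>'
));
```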
Use XMLReader to find node and retrieve XML from current node and following children
If all the item elements are siblings, you can use XMLReader::read() to find the first element and XMLReader::next() to iterate over them.
Then use XMLReader::expand() to load the item and its descendants into DOM, and use XPath to read data from it.
$searchForID = '123';
$reader = new XMLReader();
$reader->open('data:text/xml;base64,'.base64_encode(getXMLString()));
$document = new DOMDocument();
$xpath = new DOMXpath($document);
// look for the first "item" element node
while (
$reader->read() && $reader->localName !== 'item'
) {
continue;
}
// iterate "item" sibling elements
while ($reader->localName === 'item') {
// expand into DOM
$item = $reader->expand($document);
// if the node has a child "id" with the searched contents
if ($xpath->evaluate("count(self::*[id = '$searchForID']) > 0", $item)) {
var_dump(
[
// fetch node text content as string
'name' => $xpath->evaluate('string(name)', $item),
// fetch list of "call" elements and map them
'calls' => array_map(
function(DOMElement $call) use ($xpath) {
return [
'name' => $xpath->evaluate('string(name)', $call),
'text' => $xpath->evaluate('string(text)', $call)
];
},
iterator_to_array(
$xpath->evaluate('calls/call', $item)
)
)
]
);
}
$reader->next('item');
}
$reader->close();
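The getXMLString() helper isn't shown, so here is a self-contained sketch of the same idea with hypothetical sample data and a made-up findItem() wrapper. One deliberate difference: it compares string(id) in PHP instead of interpolating the searched ID into the XPath expression, which avoids breaking the expression when the ID contains a quote. The arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical sample data standing in for getXMLString().
$sample = '<items>'
    . '<item><id>122</id><name>First</name>'
    . '<calls><call><name>a</name><text>x</text></call></calls></item>'
    . '<item><id>123</id><name>Second</name>'
    . '<calls><call><name>b</name><text>y</text></call>'
    . '<call><name>c</name><text>z</text></call></calls></item>'
    . '</items>';

// Hypothetical helper: data of the first <item> whose <id> equals $searchForID.
function findItem(string $xml, string $searchForID): ?array
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $document = new DOMDocument();
    $xpath = new DOMXPath($document);
    $found = null;

    // look for the first "item" element node
    while ($reader->read() && $reader->localName !== 'item');

    // iterate "item" sibling elements until a match is found
    while ($found === null && $reader->localName === 'item') {
        $item = $reader->expand($document);
        if ($xpath->evaluate('string(id)', $item) === $searchForID) {
            $found = [
                'name' => $xpath->evaluate('string(name)', $item),
                'calls' => array_map(
                    fn (DOMElement $call) => [
                        'name' => $xpath->evaluate('string(name)', $call),
                        'text' => $xpath->evaluate('string(text)', $call),
                    ],
                    iterator_to_array($xpath->evaluate('calls/call', $item))
                ),
            ];
        }
        $reader->next('item');
    }

    $reader->close();
    return $found;
}

var_dump(findItem($sample, '123'));
```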
XML with namespaces
If the XML uses a namespace (like the one you linked in the comments), you will have to take it into account.
For XMLReader, that means checking not just localName (the node name without any namespace prefix/alias) but namespaceURI as well.
For DOM, it means using the namespace-aware methods (those with the NS suffix) and registering your own alias/prefix for the XPath expressions.
$searchForID = '2755';
$reader = new XMLReader();
$reader->open('data:text/xml;base64,'.base64_encode(getXMLString()));
// the namespace uri
$xmlns_siri = 'http://www.siri.org.uk/siri';
$document = new DOMDocument();
$xpath = new DOMXpath($document);
// register an alias for the siri namespace
$xpath->registerNamespace('siri', $xmlns_siri);
// look for the first siri:EstimatedVehicleJourney element node
while (
$reader->read() &&
(
$reader->localName !== 'EstimatedVehicleJourney' ||
$reader->namespaceURI !== $xmlns_siri
)
) {
continue;
}
// iterate EstimatedVehicleJourney sibling elements
while ($reader->localName === 'EstimatedVehicleJourney') {
// validate the namespace of the node
if ($reader->namespaceURI === $xmlns_siri) {
// expand into DOM
$item = $reader->expand($document);
// if the node has a child "VehicleRef" with the searched contents
// note the use of the registered namespace alias
if ($xpath->evaluate("count(self::*[siri:VehicleRef = '$searchForID']) > 0", $item)) {
var_dump(
[
// fetch node text content as string
'name' => $xpath->evaluate('string(siri:OriginName)', $item),
// fetch list of "call" elements and map them
'calls' => array_map(
function(DOMElement $call) use ($xpath) {
return [
'name' => $xpath->evaluate('string(siri:StopPointName)', $call),
'reference' => $xpath->evaluate('string(siri:StopPointRef)', $call)
];
},
iterator_to_array(
$xpath->evaluate('siri:RecordedCalls/siri:RecordedCall', $item)
)
)
]
);
}
}
$reader->next('EstimatedVehicleJourney');
}
$reader->close();
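To see why the namespaceURI check matters, here is a small sketch where two elements share the local name EstimatedVehicleJourney but live in different namespaces. The countInNamespace() helper and the urn:example:other namespace are invented for the demonstration.

```php
<?php
// Hypothetical helper: count elements with a given local name that belong to
// a given namespace, ignoring same-named elements from other namespaces.
function countInNamespace(string $xml, string $localName, string $namespaceUri): int
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $count = 0;
    while ($reader->read()) {
        if (
            $reader->nodeType === XMLReader::ELEMENT &&
            $reader->localName === $localName &&
            $reader->namespaceURI === $namespaceUri
        ) {
            $count++;
        }
    }
    $reader->close();
    return $count;
}

// unprefixed elements use the default (siri) namespace, x: uses another one
$xml = '<root xmlns="http://www.siri.org.uk/siri" xmlns:x="urn:example:other">'
    . '<EstimatedVehicleJourney/>'
    . '<x:EstimatedVehicleJourney/>'
    . '<EstimatedVehicleJourney/>'
    . '</root>';

echo countInNamespace($xml, 'EstimatedVehicleJourney', 'http://www.siri.org.uk/siri'), "\n"; // 2
```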
how to read only part of an xml file with php xmlreader
Use array_slice() to extract the portion of the array you need:
require('xmlreader-iterators.php');
$xmlFile = 'http://www.example.com/rss.xml';
$reader = new XMLReader();
$reader->open($xmlFile);
$itemIterator = new XMLElementIterator($reader, 'item');
$items = array();
// current page, defaulting to 1 when the parameter is missing or invalid
$curr_page = max(1, (int) ($_GET['page'] ?? 1));
$pages = 0;
$max = 10; // items per page
foreach ($itemIterator as $item) {
    $xml = $item->asSimpleXML();
    $items[] = array(
        'title' => (string) $xml->title,
        'link' => (string) $xml->link
    );
}
$reader->close();
// Take the length of the array
$len = count($items);
// Get the number of pages
$pages = (int) ceil($len / $max);
// Calculate the starting point
$start = ($curr_page - 1) * $max;
// return the portion of results
$arrayItem = array_slice($items, $start, $max);
// the last page may hold fewer than $max items, so loop over what is there
foreach ($arrayItem as $entry) {
    echo '<a href="' . htmlspecialchars($entry['link']) . '">' . htmlspecialchars($entry['title']) . '</a><br>';
}
// paging links
$str = array();
for ($i = 1; $i <= $pages; $i++) {
    if ($i === $curr_page) {
        // current page
        $str[] = sprintf('<span style="color:red">%d</span>', $i);
    } else {
        $str[] = sprintf('<a href="?page=%d" style="color:green">%d</a>', $i, $i);
    }
}
echo implode('', $str);
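The code above still reads every item into memory before slicing. If you only need one page, a possible alternative is to skip items before the page and stop reading as soon as the page is full. This sketch uses plain XMLReader + SimpleXML rather than the XMLElementIterator library; pageOfItems() and the generated feed are hypothetical, and it assumes the <item> elements are siblings, as they are inside an RSS <channel>. The trade-off is that you no longer know the total number of pages without reading to the end.

```php
<?php
// Hypothetical helper: return only the items of page $page (1-based).
function pageOfItems(XMLReader $reader, int $page, int $perPage): array
{
    $start = ($page - 1) * $perPage;   // index of the first item to keep
    $index = 0;
    $items = [];

    // move to the first <item>
    while ($reader->read() && $reader->name !== 'item');

    while ($reader->name === 'item' && count($items) < $perPage) {
        if ($index >= $start) {
            $xml = new SimpleXMLElement($reader->readOuterXml());
            $items[] = [
                'title' => (string) $xml->title,
                'link'  => (string) $xml->link,
            ];
        }
        $index++;
        $reader->next('item');   // skipped items are never turned into objects
    }

    return $items;
}

// build a small feed with 25 items for the demonstration
$feed = '<rss><channel><title>Feed</title>';
for ($i = 1; $i <= 25; $i++) {
    $feed .= "<item><title>Item $i</title><link>https://example.com/$i</link></item>";
}
$feed .= '</channel></rss>';

$reader = new XMLReader();
$reader->XML($feed);
$third = pageOfItems($reader, 3, 10);
$reader->close();

echo count($third), ' items, first: ', $third[0]['title'], "\n"; // 5 items, first: Item 21
```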
How to open an xml file, in php using XMLReader, and using an https address?
If you want to use an HTTPS URL, you have to make sure that the openssl extension is enabled in PHP.
php.ini on Windows:
extension=php_openssl.dll
or, on Unix:
extension=openssl.so
Since PHP 7.2, extension=openssl works on either platform.
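To check from PHP itself whether HTTPS URLs can be opened, here is a small sketch. It assumes, as I understand PHP's libxml integration, that XMLReader::open() fetches remote URLs through PHP's stream wrappers, so the https wrapper must be registered; httpsAvailable() is a made-up name.

```php
<?php
// Hypothetical check: the openssl extension must be loaded and the
// "https" stream wrapper registered for https:// URLs to open.
function httpsAvailable(): bool
{
    return extension_loaded('openssl')
        && in_array('https', stream_get_wrappers(), true);
}

echo httpsAvailable()
    ? "HTTPS streams available\n"
    : "Enable the openssl extension in php.ini\n";
```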
Reading Child Nodes with XMLReader
Never mind, I figured it out. For anyone else who gets stuck on this:
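$xml = new XMLReader();
if (!$xml->open('Items.xml')) {
    die('Failed to open file!');
} else {
    echo 'File opened';
}
$items = array();
// move to the first <Item> element
while ($xml->read() && $xml->name !== "Item");
while ($xml->name === "Item") {
    $item = array();
    $node = new SimpleXMLElement($xml->readOuterXML());
    // cast to string so the array holds plain values, not SimpleXMLElement objects
    $item['itemkey'] = (string) $node->ItemKey;
    $item['englishName'] = (string) $node->Name->English;
    $item['englishDesc'] = (string) $node->Description->English;
    $items[] = $item;
    // move on to the next <Item> sibling; without this the loop never ends
    $xml->next("Item");
}
$xml->close();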
Using XMLreader to read and parse large XML files. Empty values problem
Here's some code that will do what you want. It saves the value for each element when it encounters a TEXT or CDATA node, then stores it when it gets to END_ELEMENT. At that point the saved value is reset to '', so that if no value is found for an element, it gets an empty string (this could be changed to null if you prefer). It also deals with self-closing tags, for example <brandName />, with an isEmptyElement check when an ELEMENT node is found. It takes advantage of PHP's variable variables to avoid the long sequence of if ($nodename == ...) that you have in your code, but it also uses an array to store the values for each product, which I think is a better long-term solution for your problem.
$reader = new XMLReader();
$reader->xml($xml);
$count = 0;
$this_value = '';
$products = array();
while($reader->read()) {
switch ($reader->nodeType) {
case XMLReader::ELEMENT:
// deal with self-closing tags e.g. <productEan />
if ($reader->isEmptyElement) {
${$reader->name} = '';
$products[$count][$reader->name] = '';
}
break;
case XMLReader::TEXT:
case XMLReader::CDATA:
// save the value for storage when we get to the end of the element
$this_value = $reader->value;
break;
case XMLReader::END_ELEMENT:
if ($reader->name == 'product') {
$count++;
print_r(array($categoryName, $brandName, $productCode, $productId, $productFullName, $productEan, $productEuroPriceNetto, $productFrontendPriceNetto, $productFastestSupplierQuantity, $deliveryEstimatedDays));
}
elseif ($reader->name != 'products') {
${$reader->name} = $this_value;
$products[$count][$reader->name] = $this_value;
// set this_value to a blank string to allow for empty tags
$this_value = '';
}
break;
case XMLReader::WHITESPACE:
case XMLReader::SIGNIFICANT_WHITESPACE:
default:
// nothing to do
break;
}
}
$reader->close();
print_r($products);
I've omitted the output as it's quite long but you can see the code in operation in this demo on 3v4l.org.
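If you'd rather drop the variable variables entirely and keep only the array, here is a hypothetical variant of the same TEXT/CDATA/END_ELEMENT approach. parseProducts() and the sample XML are made up; unlike the code above, it appends text pieces instead of overwriting them and ignores elements outside <product>.

```php
<?php
// Hypothetical variant: each <product> becomes an associative array;
// empty or self-closing child elements get ''.
function parseProducts(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $products = [];
    $current = null;   // fields of the <product> being read
    $value = '';       // text collected for the current child element

    while ($reader->read()) {
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                if ($reader->name === 'product') {
                    $current = [];
                } elseif ($current !== null) {
                    $value = '';
                    if ($reader->isEmptyElement) {
                        // <productEan /> produces no END_ELEMENT, so store it now
                        $current[$reader->name] = '';
                    }
                }
                break;
            case XMLReader::TEXT:
            case XMLReader::CDATA:
                $value .= $reader->value;
                break;
            case XMLReader::END_ELEMENT:
                if ($reader->name === 'product') {
                    $products[] = $current;
                    $current = null;
                } elseif ($current !== null) {
                    $current[$reader->name] = $value;
                    $value = '';
                }
                break;
        }
    }

    $reader->close();
    return $products;
}

$xml = '<products>'
    . '<product><brandName>Acme</brandName><productEan/><productCode><![CDATA[A&1]]></productCode></product>'
    . '<product><brandName></brandName><productEan>123</productEan><productCode>B2</productCode></product>'
    . '</products>';

print_r(parseProducts($xml));
```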