Using SimpleXML to Read an RSS Feed


SimpleXML's namespace handling is awkward. You have two choices. The simplest hack is to read the contents of the feed into a string and strip the namespace prefix from the tags:

$feed = file_get_contents('http://feeds.bbci.co.uk/news/england/rss.xml');
// Strip the "media:" prefix from both opening and closing tags
$feed = str_replace(['<media:', '</media:'], ['<', '</'], $feed);

$rss = simplexml_load_string($feed);
...

Now you can access the thumbnail element directly, as if it had no namespace (e.g. $item->thumbnail['url'] inside a loop over $rss->channel->item).

The more elegant (not really) method is to find out which URI the namespace uses. If you look at the source of http://feeds.bbci.co.uk/news/england/rss.xml, you'll see that the media prefix is bound to http://search.yahoo.com/mrss/.

You can then pass this URI to the children() method of a SimpleXMLElement to get the contents of the media:thumbnail element:

$rss = simplexml_load_file('http://feeds.bbci.co.uk/news/england/rss.xml');

foreach ($rss->channel->item as $item) {
    $media = $item->children('http://search.yahoo.com/mrss/');
    ...
}
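To compare the two approaches without hitting the network, here is a minimal, self-contained sketch. It assumes a tiny inline feed (the example.com URLs are made up) shaped like the BBC feed's <media:thumbnail> elements; both loops should collect the same thumbnail URLs.

```php
<?php
// Hypothetical stand-in for the BBC feed, so the sketch runs offline.
$feed = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>First story</title>
      <media:thumbnail width="240" height="135" url="https://example.com/a.jpg"/>
    </item>
    <item>
      <title>Second story</title>
      <media:thumbnail width="240" height="135" url="https://example.com/b.jpg"/>
    </item>
  </channel>
</rss>
XML;

// Approach 1: strip the "media:" prefix from opening and closing tags, then parse.
$stripped = simplexml_load_string(str_replace(['<media:', '</media:'], ['<', '</'], $feed));
$viaHack = [];
foreach ($stripped->channel->item as $item) {
    // The thumbnail is now an ordinary, un-namespaced child element.
    $viaHack[] = (string) $item->thumbnail['url'];
}

// Approach 2: keep the document intact and ask for children in the media namespace URI.
$rss = simplexml_load_string($feed);
$viaNamespace = [];
foreach ($rss->channel->item as $item) {
    $media = $item->children('http://search.yahoo.com/mrss/');
    // attributes() with no argument reads the un-namespaced url attribute.
    $viaNamespace[] = (string) $media->thumbnail->attributes()->url;
}

print_r($viaHack);
print_r($viaNamespace);
```

The namespace version leaves the document untouched, which is the safer choice if the feed might contain other media: elements that the string replacement could mangle.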

Accessing items in an RSS feed using SimpleXMLElement

If you want to output the <rss:item> data, the easiest way is to get the children() of the root element in the namespace with the prefix rss (the call $xml->children("rss", true)). This then lets you access all of the data using object notation, as is normal for SimpleXML.

$url = "https://onlinelibrary.wiley.com/action/showFeed?jc=18630669&type=etoc&feed=rss";
$xml = file_get_contents($url);
$xml = new SimpleXMLElement($xml);

foreach ($xml->children("rss", true)->item as $item) {
    echo (string)$item->title . PHP_EOL;
}

which outputs the title of each item (abbreviated):

The Limiting Factor to the Outbreak of Lake Black Bloom: Roles of Ferrous Iron and Sulfide Ions
A Pilot‐Scale Diatomite Membrane Bioreactor for Slightly Polluted Surface Water Treatment
Agronomic Valorization of Olive Mill Wastewaters: Effects on Medicago sativa Growth and Soil Characteristics

One thing to note: you say you only get object(SimpleXMLElement)[598] back, but a SimpleXMLElement may contain a list of elements; it's more a case of how you use that content. Also, print_r() and most of the other usual ways of inspecting a variable don't show the full data of a SimpleXMLElement. Use echo $xml->asXML(); to see what it contains.
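As a self-contained illustration, the sketch below assumes an RSS 1.0 (RDF) feed in which every element carries the rss: prefix, which is what the children("rss", true) call above implies about the Wiley feed; the sample data is made up. It shows that plain ->item finds nothing on such a document, while ->children("rss", true)->item does, and uses asXML() to dump one item for inspection.

```php
<?php
// Made-up RSS 1.0 document: root is rdf:RDF, items live in the rss: namespace.
$feed = <<<XML
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rss="http://purl.org/rss/1.0/">
  <rss:channel rdf:about="https://example.com/feed">
    <rss:title>Example journal</rss:title>
  </rss:channel>
  <rss:item rdf:about="https://example.com/1">
    <rss:title>First article</rss:title>
  </rss:item>
  <rss:item rdf:about="https://example.com/2">
    <rss:title>Second article</rss:title>
  </rss:item>
</rdf:RDF>
XML;

$xml = new SimpleXMLElement($feed);

// Without a namespace, property access only matches un-prefixed elements, so nothing is found.
$plainCount = count($xml->item);

// With the rss prefix, the items are visible, and child access stays in that namespace.
$titles = [];
foreach ($xml->children('rss', true)->item as $item) {
    $titles[] = (string) $item->title;
}

// asXML() shows the real markup, unlike print_r()/var_dump().
echo $xml->children('rss', true)->item[0]->asXML(), PHP_EOL;
```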

Display RSS feed with Laravel PHP using SimpleXML

Try this:

@foreach ($flux[0]->item as $item)
    <article class="entry-item">
        <img src="{{ utf8_decode((string)$item->enclosure['url']) }}" alt="Sample Image">
        <div class="entry-content">
            <a href="{{ $item->link }}">{{ $item->title }}</a>
            {{ $item->description }}
        </div>
    </article>
@endforeach

This works because you have multiple items: looping over $flux[0]->item visits each one, whereas $flux[0]->item->link only reaches the link of the first item.
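To see why the loop target matters, here is a plain-PHP sketch of the same iteration outside Blade. It assumes $flux[0] is the <channel> element (the $channel variable below stands in for it) and uses a made-up two-item feed.

```php
<?php
// Made-up two-item feed; $channel plays the role of $flux[0].
$feed = <<<XML
<rss version="2.0">
  <channel>
    <item>
      <title>One</title>
      <link>https://example.com/1</link>
      <enclosure url="https://example.com/1.jpg" type="image/jpeg" length="0"/>
    </item>
    <item>
      <title>Two</title>
      <link>https://example.com/2</link>
      <enclosure url="https://example.com/2.jpg" type="image/jpeg" length="0"/>
    </item>
  </channel>
</rss>
XML;
$channel = simplexml_load_string($feed)->channel;

// ->item->link resolves ->item to the FIRST item, so this only walks its <link> children.
$linksOnly = [];
foreach ($channel->item->link as $link) {
    $linksOnly[] = (string) $link;
}

// ->item walks every <item>, so title, link and enclosure are all reachable.
$entries = [];
foreach ($channel->item as $item) {
    $entries[] = [
        'title' => (string) $item->title,
        'link'  => (string) $item->link,
        'image' => (string) $item->enclosure['url'],
    ];
}
```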

Simplexml & php rss feed (view image from rss feed)

The code echo $ns_dc->thumbnail; prints the text content of the <media:thumbnail> element, which is empty. To get the value of its url attribute, use the following:

echo $ns_dc->thumbnail->attributes()->url;
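A small sketch of the difference, assuming $ns_dc holds an item's children in the media namespace; here it is built from a made-up one-item feed.

```php
<?php
// Made-up feed with a self-closing <media:thumbnail>.
$feed = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <media:thumbnail url="https://example.com/thumb.jpg" width="66" height="49"/>
    </item>
  </channel>
</rss>
XML;
$rss = simplexml_load_string($feed);
$ns_dc = $rss->channel->item->children('http://search.yahoo.com/mrss/');

// Casting the element gives its text content: empty for a self-closing tag.
$text = (string) $ns_dc->thumbnail;

// The URL lives in the url attribute.
$url = (string) $ns_dc->thumbnail->attributes()->url;

echo $ns_dc->thumbnail->attributes()->url, PHP_EOL; // prints https://example.com/thumb.jpg
```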

How to parse media:content tag in RSS with simplexml

The tags are actually empty (note the self-closing />):

<media:content ... />

Information is contained in attributes, which can be fetched with SimpleXMLElement::attributes(), e.g.:

$rss = simplexml_load_file($url, null, LIBXML_NOCDATA);
$namespaces = $rss->getNamespaces(true);
$media_content = $rss->channel->item[0]->children($namespaces['media']);
foreach ($media_content->group->content as $i) {
    var_dump((string)$i->attributes()->url);
}
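Here is a self-contained version of the loop above, run against a made-up stand-in for the CNN feed in which a <media:group> holds two <media:content> tags. It should collect both url attributes.

```php
<?php
// Hypothetical stand-in for the CNN feed.
$xml = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Example</title>
      <media:group>
        <media:content medium="image" url="https://example.com/small.jpg" width="300" height="169"/>
        <media:content medium="image" url="https://example.com/large.jpg" width="1100" height="619"/>
      </media:group>
    </item>
  </channel>
</rss>
XML;

$rss = simplexml_load_string($xml, null, LIBXML_NOCDATA);
// getNamespaces(true) maps every prefix used anywhere in the document to its URI.
$namespaces = $rss->getNamespaces(true);
$media_content = $rss->channel->item[0]->children($namespaces['media']);

$urls = [];
foreach ($media_content->group->content as $content) {
    // The tags are empty, so read the url attribute rather than the element text.
    $urls[] = (string) $content->attributes()->url;
}
print_r($urls);
```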

I suspect the problem comes from the JSON trick. SimpleXML generates all its classes and properties dynamically (they aren't regular PHP classes), which means you can't fully rely on standard PHP features like print_r() or json_encode(). This is illustrated if you insert the following into the loop above:

var_dump($i, json_encode($i), (string)$i->attributes()->url);

which outputs:

object(SimpleXMLElement)#2 (0) {
}
string(2) "{}"
string(91) "http://i2.cdn.turner.com/cnnnext/dam/assets/161115120658-trump-putin-t1-tease-super-169.jpg"
...

How to Parse XML's Media:Content with PHP?

To get the 'url' attribute, use the ->attributes() syntax:

$ns_media = $news->children('http://search.yahoo.com/mrss/');

/* Echoes 'url' attribute: */
echo $ns_media->content->attributes()['url'];
// in PHP < 5.4: $attr = $ns_media->content->attributes(); echo $attr['url'];

/* Stores 'url' attribute as a string: */
$url = $ns_media->content->attributes()['url']->__toString();
// in PHP < 5.4: $attr = $ns_media->content->attributes(); $url = $attr['url']->__toString();

Namespaces explanation:

The argument to ->children() is not the URL of your XML; it is a namespace URI.

XML namespaces are used for providing uniquely named elements and attributes in an XML document:

<xxx>      Standard XML tag
<yyy:zzz>  Namespaced tag
 └┬┘ └┬┘
  │   └──── Element Name
  └──────── Element Prefix (Namespace Identifier)

So, in your case, <media:content> is the “content” element of the namespace “media”. Namespaced elements must have an associated namespace URI, declared as an attribute of the element itself or of an ancestor, most commonly the root element. This attribute has the form xmlns:yyy="NamespaceURI" (in your case xmlns:media="http://search.yahoo.com/mrss/" as an attribute of the root node <rss>).

Ultimately, the above $news->children( 'http://search.yahoo.com/mrss/' ) means “retrieve all child elements whose namespace URI is http://search.yahoo.com/mrss/”; an alternative, more readable syntax is $news->children( 'media', True ) (True means the first argument is treated as a prefix).

Returning to the code in your example, the generic syntax to retrieve all children of the first item with the prefix media is:

$xml = simplexml_load_file( 'http://rssfeeds.webmd.com/rss/rss.aspx?RSSSource=RSS_PUBLIC' );
$xml->channel->item[0]->children( 'http://search.yahoo.com/mrss/' );

or (identical result):

$xml = simplexml_load_file( 'http://rssfeeds.webmd.com/rss/rss.aspx?RSSSource=RSS_PUBLIC' );
$xml->channel->item[0]->children( 'media', True );
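A quick way to convince yourself that the two forms are equivalent is to list the element names each one returns. This sketch uses a made-up single-item feed instead of the WebMD URL:

```php
<?php
// Made-up item with one plain child and two media: children.
$doc = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Example</title>
      <media:content url="https://example.com/a.jpg" medium="image"/>
      <media:title>Caption</media:title>
    </item>
  </channel>
</rss>
XML;
$xml = simplexml_load_string($doc);
$item = $xml->channel->item[0];

// Select the media children by namespace URI...
$byUri = [];
foreach ($item->children('http://search.yahoo.com/mrss/') as $name => $child) {
    $byUri[] = $name;
}

// ...or by the prefix this particular document binds to that URI.
$byPrefix = [];
foreach ($item->children('media', true) as $name => $child) {
    $byPrefix[] = $name;
}
```

The URI form is the more robust of the two: the prefix is whatever the feed author chose, while the URI identifies the namespace itself, so children('media', True) only works as long as the feed keeps using that prefix.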

Your new code:

If you want to show the <media:content url> thumbnail for each item on your page, modify the original code like this:

(...)
$pubDate = $xml->channel->item[$i]->pubDate;
$image = $xml->channel->item[$i]->children( 'media', True )->content->attributes()['url'];
// in PHP < 5.4:
// $attr = $xml->channel->item[$i]->children( 'media', True )->content->attributes();
// $image = $attr['url'];

$html .= "<a href='$link'><h3>$title</h3></a>";
$html .= "<img src='$image' alt='$title'>";
(...)
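If you want the same output for every item rather than for a single index, a loop like the following sketch could work. Two things here are additions rather than part of the answer: the isset() guard, which skips items without <media:content>, and htmlspecialchars(), which escapes values before they go into HTML. The feed is made up.

```php
<?php
// Made-up feed: the second item has no <media:content>.
$doc = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Sleep &amp; Health</title>
      <link>https://example.com/1</link>
      <media:content url="https://example.com/1.jpg"/>
    </item>
    <item>
      <title>No image here</title>
      <link>https://example.com/2</link>
    </item>
  </channel>
</rss>
XML;
$xml = simplexml_load_string($doc);

$html = '';
foreach ($xml->channel->item as $item) {
    $title = htmlspecialchars((string) $item->title, ENT_QUOTES);
    $link  = htmlspecialchars((string) $item->link, ENT_QUOTES);
    $html .= "<a href='$link'><h3>$title</h3></a>";

    $media = $item->children('media', true);
    // Only emit an <img> when the item actually has a <media:content> element.
    if (isset($media->content)) {
        $image = htmlspecialchars((string) $media->content->attributes()['url'], ENT_QUOTES);
        $html .= "<img src='$image' alt='$title'>";
    }
}
echo $html;
```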

PHP - RSS Parser XML

I came up with this:

<?php
$url = "http://rss.nytimes.com/services/xml/rss/nyt/Sports.xml"; // NYT Sports RSS feed
$feeds = file_get_contents($url);
$rss = simplexml_load_string($feeds);

$items = [];

foreach ($rss->channel->item as $entry) {
    // Defaults used when the item has no matching media: element
    $image = 'N/A';
    $description = 'N/A';

    // Walk every child element in the media: namespace
    foreach ($entry->children('media', true) as $k => $v) {
        $attributes = $v->attributes();

        if ($k == 'content' && isset($attributes['url'])) {
            $image = $attributes->url;
        }
        if ($k == 'description') {
            $description = $v;
        }
    }

    $items[] = [
        'link' => $entry->link,
        'title' => $entry->title,
        'image' => $image,
        'description' => $description,
    ];
}

print_r($items);
?>

Giving:

Array
(
[0] => Array
(
[link] => SimpleXMLElement Object
(
[0] => https://www.nytimes.com/2017/04/17/sports/basketball/a-court-used-for-playing-hoops-since-1893-where-paris.html?partner=rss&emc=rss
)

[title] => SimpleXMLElement Object
(
[0] => A Court Used for Playing Hoops Since 1893. Where? Paris.
)

[image] => SimpleXMLElement Object
(
[0] => https://static01.nyt.com/images/2017/04/05/sports/basketball/05oldcourt10/05oldcourt10-moth-v13.jpg
)

[description] => SimpleXMLElement Object
(
[0] => The Y.M.C.A. in Paris says its basketball court, with its herringbone pattern and loose slats, is the oldest one in the world. It has been continuously functional since the building opened in 1893.
)

)
.....

And you can then iterate over the items:

foreach ($items as $item) {
    printf('<img src="%s">', $item['image']);
    printf('<a href="%s">%s</a>', $item['link'], $item['title']);
}
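If you'd rather have plain strings in $items than SimpleXMLElement objects (so print_r() and json_encode() behave predictably), a variation might cast each value with (string). This is a sketch against a made-up inline feed, not the NYT data; it keeps the 'N/A' defaults from the code above.

```php
<?php
// Made-up stand-in for the NYT feed; the second item has no media: children.
$feed = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Story with image</title>
      <link>https://example.com/1</link>
      <media:content url="https://example.com/1.jpg" medium="image"/>
      <media:description>A caption</media:description>
    </item>
    <item>
      <title>Story without image</title>
      <link>https://example.com/2</link>
    </item>
  </channel>
</rss>
XML;
$rss = simplexml_load_string($feed);

$items = [];
foreach ($rss->channel->item as $entry) {
    $media = $entry->children('media', true);
    $items[] = [
        'link'        => (string) $entry->link,
        'title'       => (string) $entry->title,
        // Fall back to 'N/A' when the media element is missing.
        'image'       => isset($media->content) ? (string) $media->content->attributes()->url : 'N/A',
        'description' => isset($media->description) ? (string) $media->description : 'N/A',
    ];
}

foreach ($items as $item) {
    printf('<a href="%s">%s</a>' . PHP_EOL, htmlspecialchars($item['link']), htmlspecialchars($item['title']));
}
```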

Hope this helps.


