How To Discover RSS Feeds for a given URL
Found something that I wanted:
Google's AJAX Feed API has a load feed and lookup feed function (Docs here).
a) Load feed provides the feed (and feed status) in JSON
b) Lookup feed provides the RSS feed for a given URL
There's also a find feed function that searches for RSS feeds based on a keyword.
Planning to use this with jQuery's $.getJSON.
How to find RSS feed of a particular website?
You might be able to find it by looking at the source of the home page (or blog). Look for a line that looks like this:
<link rel="alternate" type="application/rss+xml" title="RSS Feed" href="http://example.org/rss" />
The href value will be where the RSS is located.
How to get the feed URL(s) from a website?
It is not common practice for websites to send back their RSS feed in response to an HTTP request to the home page that asks for the application/rss+xml MIME type in the Accept header. The Mozilla documentation you've linked describes a suggestion I've never seen before in many years of involvement in RSS as a developer.
A more established and widely adopted method for a site to identify its RSS feed is a technique called RSS Autodiscovery. Open the site's home page and look for this tag in the HEAD section:
<link rel="alternate" type="application/rss+xml" title="RSS"
href="http://feeds.example.com/rss-feed">
The type attribute can be any of the MIME types for RSS, Atom or JSON Feed feeds.
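As a rough illustration of autodiscovery, the sketch below scans a page's HTML for such link tags using only the Python standard library. The names FEED_TYPES, FeedLinkParser and discover_feeds are made up for this example, the set of MIME types is an assumption rather than a complete list, and fetching the page is left to the caller.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed, non-exhaustive set of feed MIME types (RSS, Atom, JSON Feed).
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/feed+json",
}


class FeedLinkParser(HTMLParser):
    """Collects the href of every <link rel="alternate"> with a feed MIME type."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").strip().lower()
        if "alternate" in rels and mime in FEED_TYPES and attr.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html, base_url):
    """Return the feed URLs advertised by the given HTML document."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

Calling discover_feeds(page_html, page_url) returns the advertised feed URLs in document order, or an empty list when the page has no autodiscovery tags.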
To find whether the given URL is an RSS Feed URL or not
There are a few things you can try, off the top of my head:
- See what Content-Type the server returns for the given URL. However, this may not be definitive, since a server may not necessarily return the correct header.
- Try to parse the content of the URL as RSS and see if it is successful. This is likely the only definitive proof that a given URL is an RSS feed.
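Both checks can be sketched with the Python standard library. The function names below are hypothetical, the response header and body are assumed to have been fetched already, and only RSS 2.0 style (rss root) and Atom (feed root) documents are recognized.

```python
import xml.etree.ElementTree as ET

# Media types that suggest a feed (an assumed, non-exhaustive set).
FEED_CONTENT_TYPES = {"application/rss+xml", "application/atom+xml"}


def content_type_suggests_feed(header_value):
    """Hint only: compare the media type, ignoring parameters such as charset."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in FEED_CONTENT_TYPES


def parses_as_feed(body):
    """Stronger check: body is well-formed XML whose root is <rss> or <feed>."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    # ElementTree writes namespaced tags as "{namespace}name"; keep the name.
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in ("rss", "feed")
```

A server that labels its feed as text/xml or text/html fails the first check but still passes the second, which is why the parse attempt is the more reliable of the two. For untrusted input, a hardened XML parser would be a sensible substitute for ElementTree.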
Extract RSS Feed url from
In general, a website that offers RSS feed(s) indicates so in the header of at least the home page, and some do so on every single page.
Here is an example of an RSS feed link:
<link href="http://snapwebsites.org/rss.xml"
title="Snap! A C++ Open Source CMS RSS"
type="application/rss+xml"
rel="alternate">
Note that the type will vary slightly between websites. For example, some websites may use text instead of application (which is wrong, but XML is text...). There is also application/atom+xml. You may also have both formats.
If that's not available, then you'd have to check the home page or other pages for anchor links to an RSS feed, which means:
- Parse the HTML
- Look for anchors
- Read the href attribute
- Check the destination to see whether it returns an XML file
- If you get an XML file (starts with <?xml ...) then check the root tag:
  - 'rss' -- RSS format (version is an attribute)
  - 'feed' -- Atom format
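The steps above can be sketched roughly as follows, using only the Python standard library. AnchorCollector, anchor_targets and feed_format are hypothetical names, and downloading each candidate URL is deliberately left out, so feed_format works on a document that has already been retrieved.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class AnchorCollector(HTMLParser):
    """Collects the absolute target of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(urljoin(self.base_url, href))


def anchor_targets(html, base_url):
    """Parse the HTML and return the anchor destinations to be checked."""
    collector = AnchorCollector(base_url)
    collector.feed(html)
    return collector.hrefs


def feed_format(document):
    """Return 'rss', 'atom', or None depending on the XML root tag."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        return None  # not well-formed XML, so not a feed
    # Strip the "{namespace}" prefix ElementTree puts on namespaced tags.
    name = root.tag.rsplit("}", 1)[-1]
    return {"rss": "rss", "feed": "atom"}.get(name)
```

A crawler would call anchor_targets on the page, fetch each returned URL, and keep those for which feed_format returns something other than None. Checking every anchor can mean a lot of requests, so filtering candidates first is probably worthwhile.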
I have an example on the following page that includes the <link ...> tag in the header:
http://snapwebsites.org/implementation/feature-requirements/feed-feature-core-atom-rss-20-etc
I have to say, without that link, it will be quite a bit harder to find the RSS feeds. That being said, on many websites the feed files use an extension (.rss, .atom, .xml), and that could be used to simplify the search. Yet, more and more, feeds look like directory names (from the URL alone, .../blah or .../foo cannot be identified as a standard HTML page or a feed), so the only way is to read the file at the destination and check the file format. The Content-Type of the HTTP reply should be application/rss+xml or application/atom+xml too, like the type=... attribute of the header link.
As a side note, although very unlikely (I've not really seen it on a live website), it is possible to use the Link: ... HTTP header to indicate links, just the same as the <link ...> tag found in the HTML header. If you have access to the HTTP headers (here is how to do it in PHP), then it's worth looking at those headers to see whether one of them points to an RSS feed.
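For illustration, here is a simplified sketch of picking feed links out of a Link header value in Python. The function name and regular expressions are made up for this example, the header value is assumed to have been read from the response already, and the parsing does not cover every edge case of the header syntax (for instance, commas inside quoted parameter values).

```python
import re

# Assumed, non-exhaustive set of feed MIME types.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

# One link-value: <target> followed by zero or more ";name=value" parameters.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# One parameter, with either a quoted or a bare value.
PARAM = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def feeds_from_link_header(header_value):
    """Return targets whose rel includes 'alternate' and whose type is a feed."""
    feeds = []
    for target, raw_params in LINK_VALUE.findall(header_value):
        params = {}
        for name, quoted, bare in PARAM.findall(raw_params):
            params[name.lower()] = quoted or bare
        rels = params.get("rel", "").lower().split()
        if "alternate" in rels and params.get("type", "").lower() in FEED_TYPES:
            feeds.append(target)
    return feeds
```

Targets are returned exactly as written in the header, so a relative target would still need to be resolved against the URL that was requested.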
Find feed rss for a given URL: Feedbag error?
So far I haven't found out why this is occurring. Since I'm not going to update this question / answer in the future, you can check the current status of my issue on GitHub by clicking here.
I hope the developer has seen my issue, but so far I haven't received any tips.