Feed Fetcher and API


This API provides a few functions for working with RSS feeds. It exists because we were having uptime issues with the Google Feeds API. It fetches feeds, caches them for a while to reduce load on the source servers, and provides a simple JavaScript function to fetch, format and display them.

Documentation and Examples


First, include the script:

<script src='//assets.euractiv.com/feeds/feeds-min.js'></script>

Uncompressed:

<script src='//assets.euractiv.com/feeds/feeds.js'></script>

Note: the script relies on jQuery being loaded first.

This creates a function window.VSAC.feeds. It accepts the following parameters:

  • feed_url: the URL of the feed. Required. Its domain must be listed in http_allowed_domains, or the URL must match an entry in http_allowed_urls.
  • callback: the function to call with the fetched feed data. It receives the response object, an array with one entry per feed item, as its first parameter (see Endpoint, below).
  • count: the number of entries to fetch, 1 to 100. Will not return more entries than are currently in the feed no matter how high the number is. Optional, default 3.
  • fields: the fields in each item to fetch, as an array. Optional, default ['link','title'].
  • strip_tags: strip HTML tags from the result on the server side. Optional, default true; leaving it enabled is highly recommended.
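To make the documented defaults and limits concrete, here is a minimal sketch of how a caller might normalize the optional arguments before calling window.VSAC.feeds. The helper normalizeFeedOptions is hypothetical and is not part of feeds.js. Rejecting an out-of-range count is also an assumption, since the documentation does not say how the server treats such values.

```javascript
// Hypothetical helper (not part of feeds.js): applies the documented
// defaults for the optional parameters of window.VSAC.feeds.
function normalizeFeedOptions(count, fields, stripTags) {
  // Documented default: 3 entries. Documented range: 1 to 100.
  const n = count === undefined ? 3 : count;
  // Assumption: reject out-of-range values on the client instead of
  // relying on undocumented server-side behaviour.
  if (!Number.isInteger(n) || n < 1 || n > 100) {
    throw new RangeError('count must be an integer from 1 to 100');
  }
  // Documented default: ['link', 'title'].
  const f = fields === undefined ? ['link', 'title'] : fields;
  if (!Array.isArray(f) || f.length === 0) {
    throw new TypeError('fields must be a non-empty array of field names');
  }
  return {
    count: n,
    fields: f.slice(), // copy so later changes by the caller have no effect
    strip_tags: stripTags === undefined ? true : Boolean(stripTags), // default true
  };
}
```

For instance, normalizeFeedOptions(5, ['link', 'title', 'description']) yields a count of 5 with three fields and strip_tags left at true, which matches the arguments used in the example that follows.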

The following example lists the latest five links from feedforall.com's sample feed, each followed by the item's description:

<!-- feeds-min.js relies on jQuery, which must be loaded before this script. -->
<script src="https://assets.euractiv.com/feeds/feeds-min.js"></script>
<div id='example-container'>
  <p>Below you should see a list of the latest five links
      from feedforall.com's sample feed:</p>
</div>
<script>(function () {
  var feed_url = 'http://www.feedforall.com/sample.xml',
      // Receives the response array: one object per feed item,
      // holding only the requested fields.
      callback = function (links) {
          var ul = $('<ul />'), li, a, i;
          for (i = 0; i < links.length; i += 1) {
              li = $('<li />');
              a = $('<a />');
              a.attr('href', links[i].link);
              a.text(links[i].title);
              // .text() inserts the description as plain text;
              // the link is then prepended in front of it.
              li.text(': ' + links[i].description);
              li.prepend(a);
              ul.append(li);
          }
          $('#example-container').append(ul);
      },
      fields = ['link', 'title', 'description'],
      count = 5;
  window.VSAC.feeds(feed_url, callback, count, fields);
}());</script>

Full Content Feed Endpoint

The Full Content Feed Endpoint is a small API that replaces the description of each item in an RSS feed with the full content of the page the item links to. It works by:

  1. Opening the feed
  2. Extracting the link from each feed item
  3. Loading the linked resource and scraping its content via XPath selectors
  4. Replacing the description in the feed with the scraped content
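Steps 2 to 4 can be sketched in JavaScript as follows. This is only an illustration under stated assumptions: the real endpoint runs server-side, step 1 (opening and parsing the feed) is assumed to have already produced an array of item objects, and fetchPage and scrape are hypothetical hooks standing in for an HTTP client and an XPath-capable HTML parser.

```javascript
// Sketch of steps 2-4, assuming step 1 already parsed the feed into
// objects with link and description properties.
// fetchPage(url) -> Promise<string> and scrape(html, xpath) -> string are
// hypothetical hooks, not functions provided by this API.
async function toFullContent(items, xpath, fetchPage, scrape) {
  return Promise.all(items.map(async (item) => {
    const html = await fetchPage(item.link);  // steps 2-3: follow the item's link
    const content = scrape(html, xpath);      // step 3: scrape with the XPath selector
    return { ...item, description: content }; // step 4: replace the description
  }));
}
```

Each call returns new item objects and leaves the parsed feed untouched. Promise.all fetches all linked pages concurrently; a production version would probably cap concurrency to avoid hammering the source site.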

This endpoint exists to facilitate the republishing of wire service articles. It is available at:

//assets.euractiv.com/feeds/full-content.php

It has the following query parameters:

  • feed: the configured handle for the feed. Available handles: guardian-environment, guardian-global-development, reuters.
  • api_key: to prevent abuse and copyright issues, the full content feed is hidden behind an API key, which must be specified here. Check the config file for the key.
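A server-side caller might assemble the request as sketched below. fullContentRequest is a hypothetical helper. Sending the key in the X-Vsac-Api-Key header assumes that this endpoint accepts the key in the same ways the api_key configuration setting describes for protected API calls. Keeping the key out of the URL also keeps it out of access logs; in any case the key belongs in server-to-server code only.

```javascript
// Hypothetical helper: describes a request to the full-content endpoint.
// The feed handle goes in the query string; the key goes in a header,
// which (by assumption) the endpoint accepts like the api_key parameter.
function fullContentRequest(handle, apiKey) {
  const url = new URL('https://assets.euractiv.com/feeds/full-content.php');
  url.searchParams.set('feed', handle);
  return {
    url: url.toString(),
    headers: { 'X-Vsac-Api-Key': apiKey },
  };
}

// Usage in Node 18+ (global fetch), server side only; FEEDS_API_KEY is a
// hypothetical environment variable name:
// const req = fullContentRequest('reuters', process.env.FEEDS_API_KEY);
// const rss = await (await fetch(req.url, { headers: req.headers })).text();
```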


Endpoint

Internal use only! This section documents the API's internal workings. Do not call this endpoint directly; it is described here for reference only.

The API endpoint is located at:

//assets.euractiv.com/feeds/feed.php

It accepts the following query parameters:

  • feed: the URL of the feed. Required. Its domain must be listed in http_allowed_domains, or the URL must match an entry in http_allowed_urls.
  • count: the number of entries to fetch, 1 to 100. Will not return more entries than are currently in the feed no matter how high the number is. Optional, default 3.
  • fields: the fields in each item to fetch, as a comma separated list. Optional, default 'link,title'.
  • strip_tags: strip HTML tags from the returned results. Optional, default true.
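
For example, to fetch the link, title and comments fields of the five latest items in feedforall.com's sample feed (the URL is quoted so the shell does not interpret the & characters):

$ curl 'https://assets.euractiv.com/feeds/feed.php?feed=http%3A%2F%2Fwww.feedforall.com%2Fsample.xml&count=5&fields=link%2Ctitle%2Ccomments'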
[
    {
        "link": "http:\/\/www.feedforall.com\/restaurant.htm",
        "title": "RSS Solutions for Restaurants",
        "comments": "http:\/\/www.feedforall.com\/forum"
    },
    {
        "link": "http:\/\/www.feedforall.com\/schools.htm",
        "title": "RSS Solutions for Schools and Colleges",
        "comments": "http:\/\/www.feedforall.com\/forum"
    },
    {
        "link": "http:\/\/www.feedforall.com\/computer-service.htm",
        "title": "RSS Solutions for Computer Service Companies",
        "comments": "http:\/\/www.feedforall.com\/forum"
    },
    {
        "link": "http:\/\/www.feedforall.com\/government.htm",
        "title": "RSS Solutions for Governments",
        "comments": "http:\/\/www.feedforall.com\/forum"
    },
    {
        "link": "http:\/\/www.feedforall.com\/politics.htm",
        "title": "RSS Solutions for Politicians",
        "comments": "http:\/\/www.feedforall.com\/forum"
    }
]
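The query string in the request above can also be built programmatically. The sketch below uses a hypothetical helper, feedEndpointUrl, which is not part of the API. It omits strip_tags because the accepted format for a false value is not documented. Given the sample feed URL, a count of 5 and the fields link, title and comments, it produces the same URL as the curl command.

```javascript
// Hypothetical helper: builds a feed.php URL from the documented query
// parameters. encodeURIComponent escapes ':', '/' and ',', so the feed URL
// and the comma-separated field list each stay a single query value.
function feedEndpointUrl(feedUrl, count, fields) {
  const params = ['feed=' + encodeURIComponent(feedUrl)];
  if (count !== undefined) {
    params.push('count=' + encodeURIComponent(String(count)));
  }
  if (fields !== undefined) {
    params.push('fields=' + encodeURIComponent(fields.join(',')));
  }
  return 'https://assets.euractiv.com/feeds/feed.php?' + params.join('&');
}
```

Leaving count or fields undefined drops that parameter, so the server-side defaults (3 entries; link and title) apply.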

System Status


Cache

The database that powers this API is currently using 13.2 MB.

Expired items were last cleaned from the cache 18 hours ago. The database was last vacuumed 18 hours ago.

To ensure that the database is cleaned regularly, make sure the following line is in your crontab (adjust the frequency to your needs):

*/15 * * * * wget -q -O - https://assets.euractiv.com/feeds/clean-cache.php
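The expiry rule behind cache cleaning can be illustrated with a small in-memory model. This is a sketch, not the sqlite or fs driver the API actually uses. It assumes that an entry stays fresh for cal_ttl seconds (currently 3600) after it is stored, and that a cleaning run deletes every entry older than that. TtlCache is a hypothetical class, and timestamps are passed in explicitly so the behaviour is easy to test.

```javascript
// In-memory model of a TTL cache with a cleaning pass (hypothetical;
// the real cache uses the driver selected by cal_driver).
class TtlCache {
  constructor(ttlSeconds) {
    this.ttl = ttlSeconds;
    this.entries = new Map(); // key -> { value, storedAt } (seconds)
  }

  set(key, value, now) {
    this.entries.set(key, { value, storedAt: now });
  }

  // Returns the cached value, or undefined if it is missing or expired,
  // in which case the caller should refetch the feed.
  get(key, now) {
    const e = this.entries.get(key);
    if (!e || now - e.storedAt >= this.ttl) return undefined;
    return e.value;
  }

  // Cleaning pass: delete expired entries and report how many were removed.
  // Deleting from a Map while iterating over it is safe in JavaScript.
  clean(now) {
    let removed = 0;
    for (const [key, e] of this.entries) {
      if (now - e.storedAt >= this.ttl) {
        this.entries.delete(key);
        removed += 1;
      }
    }
    return removed;
  }
}
```

Expired entries are never served even without cleaning; the periodic pass only reclaims the space they occupy, which is why the job's frequency can be tuned freely.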

Driver: sqlite


Configuration

Configuration is located in the file /config/feeds.php. If you cannot modify this file, contact your administrator to have options changed. The configuration file should define an array named $config with the following offsets:

  • full_content_feeds (array): The feeds to convert into full content. Format is an array of arrays, where each entry contains the offsets:
      • handle (string): an internal identifier for the feed
      • url (string): the URL of the feed
      • xpath (string): the XPath selector to use when scraping content from the linked items
      • feed_curl_options (array, optional): additional cURL options to use when fetching the feed, such as Digest authentication headers
      • feed_curl_option (array, optional): additional cURL options to use when fetching items linked in the feed
  • callmap_driver (string): The callmap driver to use: "sqlite", "fsstore" or "noop". Current setting: noop.
  • callmap_labels (array): Labels for the callmap, where each key is the regular expression to match and the value is the label to use.
  • callmap_visualize_default (array): Nodes to visualize by default.
  • callmap_probability (integer): Probability that a given hit is logged; important for high-traffic sites. Set it higher for a lower probability. Current setting: 1000.
  • cal_ttl (integer): The time, in seconds, to cache items before revalidating them. Current setting: 3600.
  • cal_driver (string): The cache driver to use: "fs", "sqlite" or "noop". Current setting: sqlite.
  • http_allowed_domains (array): Domains allowed for fetching assets or generating URLs.
  • http_allowed_urls (array): Specific URLs, or regular expressions matching URLs, allowed for fetching assets or generating URLs.
  • http_connect_timeout (integer): The time, in seconds, that cURL should wait before returning an error status. 15 is a good value for most cases. Current setting: 15.
  • api_key (string): The key to access protected API calls. Use it only in server-to-server communication so that it is not exposed to the broader internet. In requests, the key can be sent in the query string as "api_key" (e.g. http://example.com?api_key=keyboard_cat), in a POST request body as "api_key", or in a header named "X-Vsac-Api-Key".
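The whitelist check applied to feed URLs can be sketched as follows. isAllowedUrl is hypothetical and written in JavaScript purely for illustration; the real check lives in the server-side code. It assumes a domain entry must equal the URL's hostname exactly (so subdomains are not covered automatically) and that each http_allowed_urls entry is either an exact URL string or a regular expression tested against the whole URL.

```javascript
// Hypothetical whitelist check modelled on http_allowed_domains and
// http_allowed_urls. Avoid the g flag on the RegExp entries: test()
// with g keeps lastIndex state between calls.
function isAllowedUrl(url, allowedDomains, allowedUrls) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch (err) {
    return false; // not a valid absolute URL
  }
  // Assumption: exact hostname match against the domain list.
  if (allowedDomains.includes(parsed.hostname)) {
    return true;
  }
  // Assumption: URL entries are exact strings or RegExp objects.
  return allowedUrls.some((entry) =>
    entry instanceof RegExp ? entry.test(url) : entry === url);
}
```

Matching on the parsed hostname, rather than searching the raw string, keeps a look-alike such as www.feedforall.com.example.net from passing a www.feedforall.com entry.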