How to create a Twitter API proxy using nginx in Cloud Foundry

In this post, I’ll describe how to run nginx in Cloud Foundry to provide a Twitter API proxy that includes authentication and caching.

First, I want to talk a bit about why I wanted this, but if you don’t care about any of that, you can just skip to the code at the end of the post. 🙂

For a while, I’ve wanted to enable projects in Machine Learning for Kids that use tweets. Using live tweets is a great way to make text analytics real for students, and a good example of how natural language processing is used in the real world.

The question was how to enable this from Scratch in a way that would be easy for schools to use.

The title of this post gives away the answer I ended up with, but I’ll describe why.

The Twitter API doesn’t support cross-origin (CORS) requests, and Scratch is a web application, so I wouldn’t be able to make requests directly to Twitter from the Scratch extension.

This introduced requirement 1: I needed a proxy that could receive API requests from the Scratch extension and forward them to the Twitter API.

The Twitter API requires authentication.

I could ask students to provide their own login, but it’d be difficult to do that in a simple way. (And I’d feel uncomfortable asking under-16s to create an account with Twitter just to complete a coding lesson, even if Twitter allows accounts for young people over 13.)

I tried building a mechanism based on asking teachers to provide a login for their class to use. It’s technically possible. However, it’s already been very challenging getting teachers and code club leaders to create the Watson API keys their classes need, so I have a good idea of how difficult it would be to get them to create developer accounts on Twitter. As such, I’d rather avoid this approach for now. (But if my current approach doesn’t work out, this will be my last-resort fallback.)

This introduced requirement 2: I needed the proxy to use my Twitter credentials for all API requests without exposing them to clients.
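
To illustrate what requirement 2 means for the request headers, here’s a minimal Python sketch. The function name and its dictionary input are illustrative assumptions, not part of the nginx setup; it simply models what the `proxy_set_header` lines in the nginx.conf in Step 4 achieve. Whatever `Authorization` or `Host` header the browser sends is discarded and replaced, so the real token only ever travels from the proxy to Twitter.

```python
from typing import Dict, Mapping

# Headers the proxy always overrides (compared case-insensitively)
OVERRIDDEN = {"host", "authorization"}


def upstream_headers(client_headers: Mapping[str, str], bearer_token: str) -> Dict[str, str]:
    """Build the headers forwarded to Twitter from the headers a client sent.

    Hypothetical helper: a simplified model of nginx's proxy_set_header behaviour.
    """
    # Keep the client's other headers, but drop any Host/Authorization it supplied
    headers = {name: value for name, value in client_headers.items()
               if name.lower() not in OVERRIDDEN}
    # Point the request at the Twitter API and attach the proxy's own credentials
    headers["Host"] = "api.twitter.com"
    headers["Authorization"] = f"Bearer {bearer_token}"
    return headers


if __name__ == "__main__":
    sent_by_browser = {"Accept": "application/json", "authorization": "Bearer something-else"}
    print(upstream_headers(sent_by_browser, "server-side-token"))
```

The token lives only on the server side, so nothing a student’s browser sends or receives ever contains it.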

The free Twitter API is aggressively rate-limited: I only get 450 API requests every 15 minutes. I can’t afford to pay for premium access to the Twitter API. And there are currently tens of thousands of students using Machine Learning for Kids, so I could very quickly burn through that limit.
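
To put that limit in perspective, here’s the arithmetic. (The 1,000 students in the second expression is purely an illustrative figure, not a usage statistic.)

```latex
\frac{450\ \text{requests}}{15 \times 60\ \text{s}} = \frac{450\ \text{requests}}{900\ \text{s}} = 0.5\ \text{requests per second}
\qquad
\frac{1000\ \text{requests}}{450\ \text{requests}} \approx 2.2
```

In other words, one request every two seconds, shared between every user of the site. If even 1,000 students each triggered a single uncached request in the same 15-minute window, that would be more than twice the limit.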

This introduced requirement 3: I needed the proxy to cache API responses from Twitter to reduce the number of API calls that it makes.
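
Here’s a minimal Python sketch of the idea behind requirement 3, assuming responses are cached by request URI for a fixed 15-minute window. The `TTLCache` class and the fake `fetch` function are hypothetical, for illustration only; this is a simplified model of the effect of nginx’s `proxy_cache_valid`, not how nginx implements it. Identical requests within the window cost a single upstream call, however many students send them.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache responses by request URI for a fixed window (hypothetical, illustration only)."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # function that actually calls the upstream API
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.upstream_calls = 0      # how many requests reached the upstream API

    def get(self, uri: str) -> str:
        now = self._clock()
        entry = self._entries.get(uri)
        # Serve from the cache while the stored response is younger than the TTL
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Otherwise fetch a fresh response and remember when we got it
        self.upstream_calls += 1
        body = self._fetch(uri)
        self._entries[uri] = (now, body)
        return body


if __name__ == "__main__":
    fake_now = [0.0]
    cache = TTLCache(fetch=lambda uri: f"tweets for {uri}", clock=lambda: fake_now[0])
    for _ in range(1000):                          # a thousand identical searches...
        cache.get("/1.1/search/tweets.json?q=cats")
    cache.get("/1.1/search/tweets.json?q=dogs")    # ...plus one different search
    print(cache.upstream_calls)                    # 2
    fake_now[0] = 15 * 60                          # jump to the end of the window
    cache.get("/1.1/search/tweets.json?q=cats")
    print(cache.upstream_calls)                    # 3
```

Each distinct query still costs one call per window, so caching helps most when many students ask for the same thing, such as a whole class searching for the same hashtag.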

I’ve written before about where Machine Learning for Kids is running. But to recap, it runs as a set of Cloud Foundry applications in IBM Cloud.

Combine this with the fact that I’m inherently lazy and prefer to avoid reinventing wheels where possible, and this all introduced sort-of-requirement 4: I needed a proxy that would be easy to run in Cloud Foundry with minimal code.

Before I share how I’ve done it, it’s worth pointing out that I’m obviously not the first person to do something like this.

There is a Twitter extension for ScratchX that uses a proxy running in Heroku. However, that proxy also blocks cross-origin requests, so I couldn’t use it from machinelearningforkids.co.uk, and it only lets you fetch a single tweet, which wouldn’t work for the sort of projects I want to enable. I don’t know how it’s implemented, but I’d guess that it isn’t too far from what I need. I’m not sure if it’s still being maintained: I did try emailing the developer but didn’t get a reply.

This blog post by Dave Hall describes how to use nginx as a Twitter API proxy, and got me 90% of the way to what I needed. What I’ve ended up with is his approach, tweaked to run in Cloud Foundry and with the caching turned up to 11. So a huge thanks to him for sharing it.

Step 1 – Create a developer account with Twitter

Go to https://developer.twitter.com/ and follow the instructions.

Create an application and copy the consumer key and consumer secret.

Step 2 – Create an access token

This script in Dave’s blog post makes that easy:

$ export CONSUMER_KEY=XXXXXXXXXXXXXXXXXXXXX
$ export CONSUMER_SECRET=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
$ curl -H "Authorization: Basic $(printf '%s' "$CONSUMER_KEY:$CONSUMER_SECRET" | base64 | tr -d '\n')" -d "grant_type=client_credentials" https://api.twitter.com/oauth2/token
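
If you’d rather not depend on how your shell’s `echo` and `base64` behave, here’s a rough Python equivalent that only builds the request and doesn’t send it. It follows the steps Twitter’s application-only auth documentation describes: URL-encode the key and secret, join them with a colon, then Base64-encode the result (for the usual alphanumeric keys, the URL-encoding changes nothing). `build_token_request` is a hypothetical helper name.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.twitter.com/oauth2/token"


def build_token_request(consumer_key: str, consumer_secret: str) -> urllib.request.Request:
    """Build (but don't send) the application-only token request from Step 2."""
    # URL-encode each part, join with a colon, then Base64-encode (no line wrapping)
    credentials = f"{urllib.parse.quote(consumer_key, safe='')}:{urllib.parse.quote(consumer_secret, safe='')}"
    encoded = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    # Same form body as curl's -d "grant_type=client_credentials"
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {encoded}",
            "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen()` would actually send it and return the JSON response.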

Grab the access_token value from the JSON response.
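
As a sketch, pulling the token out in Python might look like this. The sample body is made up and `extract_bearer_token` is a hypothetical name. Checking `token_type` is a cheap way to notice that you got an error response instead of a bearer token.

```python
import json


def extract_bearer_token(response_body: str) -> str:
    """Return the access_token from a Twitter /oauth2/token JSON response."""
    payload = json.loads(response_body)
    # A successful response has token_type "bearer"; anything else (e.g. an errors object) is rejected
    if payload.get("token_type") != "bearer":
        raise ValueError(f"unexpected response: {response_body}")
    return payload["access_token"]


if __name__ == "__main__":
    # Made-up example body, shaped like a successful response
    sample = '{"token_type":"bearer","access_token":"AAAA-example-token"}'
    print(extract_bearer_token(sample))
```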

Step 3 – Create a manifest.yml file

You’ll need to replace your-proxy-name, your-host.com and the-access-token-from-step-2.

manifest.yml

applications:
- name: your-proxy-name
  instances: 1
  memory: 256M
  disk_quota: 256M
  path: .
  routes:
    - route: https://your-host.com/proxies/twitter
  buildpack: https://github.com/cloudfoundry/nginx-buildpack.git
  env:
    TWITTER_BEARER_TOKEN: the-access-token-from-step-2
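
The `env` block is how the token reaches nginx.conf: the nginx buildpack fills in placeholders such as `{{port}}` and `{{env "TWITTER_BEARER_TOKEN"}}` when it prepares the config. As a rough illustration only (a regex-based imitation, not the buildpack’s actual implementation), the substitution amounts to something like this. `render_nginx_template` is a hypothetical name, and filling `{{port}}` from `PORT` reflects the environment variable Cloud Foundry sets for each app instance.

```python
import re
from typing import Mapping

# Matches {{port}} and {{env "NAME"}}, the two placeholder forms used in this post's nginx.conf
PLACEHOLDER = re.compile(r'\{\{\s*(?:port|env\s+"([^"]+)")\s*\}\}')


def render_nginx_template(template: str, env: Mapping[str, str]) -> str:
    """Replace buildpack-style placeholders with values from env (simplified imitation)."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        # group(1) is None for {{port}}; Cloud Foundry supplies the port in $PORT
        return env["PORT"] if name is None else env[name]

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    line = 'proxy_set_header Authorization "Bearer {{env "TWITTER_BEARER_TOKEN"}}";'
    print(render_nginx_template(line, {"PORT": "8080", "TWITTER_BEARER_TOKEN": "abc123"}))
```

This is also why the token never needs to appear in nginx.conf itself: only manifest.yml (or `cf set-env`) holds the value.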

Step 4 – Create an nginx.conf file

nginx.conf

# write errors to stderr where Cloud Foundry can grab them
error_log stderr;

# leave as default for now
events { worker_connections 1024; }

http {
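  # Cache API responses on local disk under $HOME, capped at 200 megabytes
  #  (keys_zone is the shared memory for cache keys: 1m holds about 8,000 keys)
  proxy_cache_path  {{env "HOME"}}/api_cache_space levels=1:2 keys_zone=twitter_api_proxy:10m max_size=200m;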

  server {
    # get the port number from Cloud Foundry
    listen {{port}};

    # defines the Twitter proxy
    location /proxies/twitter/ {

      # Cloud Foundry's access log is good enough, so save a little
      # disk space by not asking nginx to create another
      access_log off;

      # Use the cache zone defined above
      proxy_cache twitter_api_proxy;

      # Cache successful (200), redirect (302) and not found (404) responses
      #  for 15 minutes, as the aim is to avoid sending the same request to
      #  Twitter more than once within a rate-limiting request window
      proxy_cache_valid 200 302 404 15m;

      # Keep serving an expired cached response if connecting to Twitter fails
      #  or times out, or while another request is refreshing that entry
      proxy_cache_use_stale error updating timeout;

      # Ignore and strip the caching (and cookie) headers set by the Twitter API
      proxy_ignore_headers X-Accel-Expires Expires Cache-Control Set-Cookie;
      proxy_hide_header X-Accel-Expires;
      proxy_hide_header Expires;
      proxy_hide_header Cache-Control;
      proxy_hide_header Pragma;
      proxy_hide_header Set-Cookie;

      # Tells the client to cache this for 15 minutes
      expires 15m;

      # Set the correct host name to connect to the Twitter API.
      proxy_set_header Host api.twitter.com;

      # Get the auth header from manifest.yml
      proxy_set_header Authorization "Bearer {{env "TWITTER_BEARER_TOKEN"}}";

      # Location of the Twitter API
      #  (The trailing slash is important for the URL rewriting - don't remove it)
      proxy_pass https://api.twitter.com/;

      # Add a header to the response that tells us if it came from the cache or not
      add_header X-Cache-Status $upstream_cache_status;
    }

  }
}
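
The comment about the trailing slash on `proxy_pass` is worth unpacking. When `proxy_pass` includes a URI (here just `/`), nginx replaces the part of the request path that matched the `location` with that URI; without one, the original request path is forwarded unchanged. Here’s a simplified Python model of that rule. `upstream_url` is a hypothetical helper, and it ignores details such as nginx’s URI normalization.

```python
from urllib.parse import urlsplit

LOCATION = "/proxies/twitter/"


def upstream_url(request_uri: str, proxy_pass: str, location: str = LOCATION) -> str:
    """Simplified model of how nginx maps a request URI onto a proxy_pass target."""
    target = urlsplit(proxy_pass)
    if target.path:
        # proxy_pass has a URI part (e.g. "/"): it replaces the matched location prefix
        if not request_uri.startswith(location):
            raise ValueError(f"{request_uri!r} does not match location {location!r}")
        path = target.path + request_uri[len(location):]
    else:
        # No URI part: the original request URI is forwarded unchanged
        path = request_uri
    return f"{target.scheme}://{target.netloc}{path}"


if __name__ == "__main__":
    uri = "/proxies/twitter/1.1/search/tweets.json?q=scratch"
    print(upstream_url(uri, "https://api.twitter.com/"))  # with the trailing slash
    print(upstream_url(uri, "https://api.twitter.com"))   # without it
```

With the slash, the request goes to `https://api.twitter.com/1.1/search/tweets.json?q=scratch`; without it, Twitter would receive `/proxies/twitter/1.1/...`, a path it doesn’t know about.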

Step 5 – Deploy

That’s it.

All that’s left is to deploy it.

$ cf push
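
Once it’s deployed, the `X-Cache-Status` header added in nginx.conf shows whether a response came from the cache. Here’s a minimal Python sketch for checking it; `probe` and `served_from_cache` are hypothetical helpers. Note that nginx’s `add_header` (without `always`) only attaches the header to successful and redirect responses, so a 404 won’t carry it.

```python
import urllib.request

# $upstream_cache_status values whose response body came from nginx's cache
FROM_CACHE = {"HIT", "STALE", "UPDATING", "REVALIDATED"}


def served_from_cache(status: str) -> bool:
    """True if an X-Cache-Status value means nginx answered from its cache."""
    return status.strip().upper() in FROM_CACHE


def probe(url: str) -> str:
    """Fetch url once and return its X-Cache-Status header (or "<missing>")."""
    with urllib.request.urlopen(url) as response:
        response.read()
        return response.headers.get("X-Cache-Status", "<missing>")
```

For example, calling `probe()` twice in a row on `https://your-host.com/proxies/twitter/1.1/search/tweets.json?q=scratch` (using the placeholder host from manifest.yml) would typically return `MISS` and then `HIT` for a query that isn’t cached yet.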
