Thoughts on API Best Practices API Management and Infrastructure Blog

Turn up the volume: API Scalability with Caching

(Part 7 in our blog series: "Is your API Naked?: 10 API Roadmap considerations".)

So far our discussion of APIs has focused on aspects like security, visibility, and data protection. But how do you make your API scale?

"Scale" means different things to different people, so let's narrow it down to one question: what do you do as your traffic increases? Do you have a plan to handle 10, 100, or 10,000 times more traffic than your API is receiving today?

The truth is that solving this problem at the high end can require fundamental changes to your architecture and code. The kind of engineering required to run an API that accepts a few requests per second is very different from what's required to scale to the size of Facebook or Twitter.  

Writing about all the dimensions of scalability and how to achieve them is a subject for a pretty long book. But here are a few specific things to think about:

  • Caching
  • Rate limiting and threat protection
  • Offloading expensive processing

In this entry, we'll focus on caching.

Caching

Caching is a huge part of any scaling strategy. Whole products and web sites are built around caching as a fundamental architectural concept. Caching works because even in an API, most requests usually read data rather than change it, so the same data is returned again and again. A good caching strategy can cut latency dramatically and improve throughput by taking load off expensive back-end servers and databases.

Typically, caching today is done in a few different places. Caching between the application server and database using a product like Coherence or memcached helps by reducing the number of database queries needed to serve an API request. Additional caching inside the application server code can further increase throughput by making it possible to assemble an API response from pre-cached parts.

For instance, the response to a typical Twitter API call consists of an arrangement of many individual snippets of XML, each of which may have been cached to reduce the overhead needed to fetch data from the database and convert it into XML.
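To make the two application-side caches concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `TTLCache` is a tiny in-process stand-in for a cache server such as memcached, `STATUSES` plays the role of the database, and the function names are hypothetical. It is not how Twitter or any particular product implements caching.

```python
import time
from xml.sax.saxutils import escape


class TTLCache:
    """In-process stand-in for a cache server such as memcached."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._items = {}  # key -> (expiry time, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None or time.monotonic() >= entry[0]:
            self._items.pop(key, None)  # missing or expired: treat as a miss
            return None
        return entry[1]

    def set(self, key, value):
        self._items[key] = (time.monotonic() + self.ttl_seconds, value)

    def clear(self):
        self._items.clear()


# Hypothetical "database", plus counters so the effect is visible.
STATUSES = {1: "Hello, world", 2: "Caching <rocks> & scales", 3: "Third post"}
db_queries = 0
renders = 0

row_cache = TTLCache(ttl_seconds=60)      # between app server and database
snippet_cache = TTLCache(ttl_seconds=60)  # rendered XML fragments


def load_status(status_id):
    global db_queries
    text = row_cache.get(status_id)
    if text is None:
        db_queries += 1  # cache miss: run the "query"
        text = STATUSES[status_id]
        row_cache.set(status_id, text)
    return text


def render_status(status_id):
    global renders
    xml = snippet_cache.get(status_id)
    if xml is None:
        renders += 1  # cache miss: build the snippet
        text = escape(load_status(status_id))
        xml = f'<status id="{status_id}"><text>{text}</text></status>'
        snippet_cache.set(status_id, xml)
    return xml


def render_timeline(status_ids):
    """Assemble a response from individually cached snippets."""
    body = "".join(render_status(i) for i in status_ids)
    return f"<statuses>{body}</statuses>"


render_timeline([1, 2])
render_timeline([2, 3])      # status 2 is reused from the snippet cache
print(db_queries, renders)   # 3 3

snippet_cache.clear()        # e.g. the XML template changed
render_timeline([1, 2, 3])
print(db_queries, renders)   # 3 6
```

Four snippet uses cost only three queries and three renders, and when the snippets are thrown away the row cache still shields the database. A real deployment would also need an invalidation policy for data that changes, which this sketch reduces to a fixed expiry time.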

HTTP and CDN caching

Caching at the HTTP level is also very common. The HTTP protocol supports several headers that enable caching by proxy servers and of course by the web browser itself. This type of caching works fine for API calls. However, since it's based only on the HTTP request, it doesn't work for everything. Imagine a SOAP API, for instance, which includes an HTTP POST body describing the request - HTTP-only proxies cannot cache the response because they can't look inside the request and see what needs to be cached.

Similarly, CDNs like Akamai work by caching pieces of content all over the Internet so that users will receive it from a server near them. Caching strategies like this are great for big files that don't change often, but by design they are a poor fit for data that changes more often than once a day or so.
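The limitation of HTTP-level caching can be sketched in a few lines of Python. This is a deliberately simplified model of a caching proxy, not a description of any real product: the function names and URLs are hypothetical, and real caches also honor directives such as `no-store`, `private`, and the `Vary` header.

```python
def http_cache_key(method, url):
    """Key that an HTTP-level proxy can derive without reading the body.

    Returns None when this simplified proxy would not cache the response.
    """
    if method.upper() != "GET":
        return None  # e.g. a SOAP POST: the real request is in the opaque body
    return url


def max_age_seconds(response_headers):
    """Extract max-age from a Cache-Control header, or None if absent."""
    cache_control = response_headers.get("Cache-Control", "")
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


print(http_cache_key("GET", "https://api.example.com/accounts/42"))
print(http_cache_key("POST", "https://api.example.com/soap"))        # None
print(max_age_seconds({"Cache-Control": "public, max-age=300"}))     # 300
```

Two different SOAP calls posted to the same URL look identical to this kind of cache, which is why it has to give up on them.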

Caching API responses

With APIs, it's also possible to add caching in between, by caching entire API responses. For instance, imagine a "getAccount" method on an API. API management solutions like Sonoa ServiceNet can look at the parameters to the request and use them as a "key" to the cache. If a response already exists in the cache, ServiceNet simply returns it from its in-memory cache, exactly as the back-end server originally produced it.

This cuts out all the latency needed to make an HTTP request to the application server, process the request, query the database, assemble the response, and so on. Unlike a cache that works only at the HTTP level, it can be configured to understand the semantics of the API so that it caches effectively and correctly. There are also ways to purge cache items programmatically, so that the "updateAccount" API method ensures that the next call to "getAccount" won't return stale data.

In other words, adding an intelligent caching layer between the client of an API and the back-end application server adds more caching options. Because this cache is closer to the API client, it can increase scalability and decrease latency even further than some alternatives. Plus, due to the magic of HTTP, XML and/or JSON, it's possible for this caching to be performed without any changes to the API client or server.
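As a rough illustration of the idea, here is a small Python sketch of a response cache that sits in front of a back end, keys "getAccount" responses on the request parameters, and purges the matching entry when "updateAccount" is called. The `ResponseCache` class, the `backend` function, and the parameter names are all hypothetical; this is not ServiceNet's API or configuration, only a model of the behavior described above.

```python
import json


class ResponseCache:
    """Caches complete responses in front of a back-end callable."""

    def __init__(self, backend):
        self._backend = backend
        self._responses = {}  # cache key -> full response body

    @staticmethod
    def _key(method, params):
        # The request parameters, in a stable order, form the cache key.
        return (method, tuple(sorted(params.items())))

    def call(self, method, params):
        if method == "getAccount":
            key = self._key(method, params)
            if key not in self._responses:
                self._responses[key] = self._backend(method, params)
            return self._responses[key]  # served without touching the back end
        response = self._backend(method, params)
        if method == "updateAccount":
            # Purge so the next getAccount does not return stale data.
            stale = self._key("getAccount", {"id": params["id"]})
            self._responses.pop(stale, None)
        return response


# Hypothetical back end with a call log, standing in for app server + database.
accounts = {"42": {"id": "42", "email": "old@example.com"}}
backend_calls = []


def backend(method, params):
    backend_calls.append(method)
    if method == "updateAccount":
        accounts[params["id"]]["email"] = params["email"]
        return '{"ok": true}'
    return json.dumps(accounts[params["id"]], sort_keys=True)


api = ResponseCache(backend)
api.call("getAccount", {"id": "42"})
api.call("getAccount", {"id": "42"})  # cache hit, no back-end call
api.call("updateAccount", {"id": "42", "email": "new@example.com"})
fresh = api.call("getAccount", {"id": "42"})
print(backend_calls)  # ['getAccount', 'updateAccount', 'getAccount']
print(fresh)          # {"email": "new@example.com", "id": "42"}
```

Neither the client code nor the `backend` function had to change to benefit from the cache, which is the point of placing it between the two.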

Summary: Up to 4 tiers of caching

This means that a highly scalable web site could use caching in different tiers, each with its own contribution to overall API performance:

  • A cache like Coherence or memcached may be used between the application server and database to reduce database load.
  • A similar cache may also be used from within the application server to reduce the overhead of assembling API responses.
  • An API proxy like Sonoa ServiceNet can cache complete API responses, reducing load on the application server tier.
  • A CDN like Akamai can provide an extra caching layer for large files that do not change often.

Up next: API Scalability through Traffic Management and Server offloading
