API Best Practices Blog

HUGE: Running an API at Scale »

Thanks to all who participated in last week's strategy webinar - HUGE: Running an API at Scale.

And thanks to our speakers @sramji, @brianpagano, and @edanuff.

Here are the slides and video. We'd love more of your thoughts, insights, or questions on the api-craft forum.  

API Scalability: Cache large ‘chunks’ in the API response »

Highscalability.com has an interesting piece on caching as "secret sauce" for good web performance.

Facebook and Twitter are cache assembly lines -- every web page and API request is served up by many calls to various caches at different levels - assembling the final result from many different chunks. At this scale there is almost no other way to deliver reasonable performance.
 
For APIs - what's the largest chunk of all? The entire API response.

APIs lend themselves nicely to caching responses because it is often easy to identify the cache key. If you follow the REST pattern each URL maps uniquely to a resource that has a well-defined lifecycle.
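As a rough illustration of how a URL can serve as a cache key, here is a minimal Python sketch. The function name "cache_key" and the normalization rules (upper-case method, lower-case host, sorted query parameters) are our own assumptions for the example, not something a particular product prescribes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(method: str, url: str) -> str:
    """Build a canonical cache key from an HTTP method and a URL.

    Assumption for this sketch: two requests are "the same" when they use
    the same method, host, and path, and carry the same query parameters
    in any order.
    """
    parts = urlsplit(url)
    # Sort query parameters so that equivalent URLs map to the same key.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    host = parts.netloc.lower()
    path = parts.path or "/"
    key = f"{method.upper()} {host}{path}"
    return f"{key}?{query}" if query else key


if __name__ == "__main__":
    print(cache_key("get", "http://API.example.com/users/42?b=2&a=1"))
```

Whether parameter order, trailing slashes, or request headers should count as part of the key depends on the API, so treat these rules as a starting point.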

If you're working on the next Twitter, you will need many layers of caching to deliver great performance, including the filesystem and database caches, and then you will add additional caching layers using products like memcached or Coherence.  But a final tier of caching for the API responses themselves can only help performance, so consider:

  • Can API analytics data show the most common or slowest API calls?
  • Are there slow API calls that return the same data over and over?
  • Can you design using a REST pattern to make it easy to identify individual cache items?
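
To make the first two questions concrete, here is a small Python sketch that ranks endpoints by the total time spent serving them. The input shape (pairs of endpoint and latency in milliseconds) and the function names are hypothetical; a real analytics feed will look different.

```python
from collections import defaultdict


def summarize(calls):
    """Aggregate (endpoint, latency_ms) pairs into {endpoint: (count, avg_ms)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for endpoint, latency_ms in calls:
        totals[endpoint][0] += 1
        totals[endpoint][1] += latency_ms
    return {ep: (n, total / n) for ep, (n, total) in totals.items()}


def caching_candidates(calls, top=3):
    """Rank endpoints by total time spent (count x average latency)."""
    stats = summarize(calls)
    ranked = sorted(stats, key=lambda ep: stats[ep][0] * stats[ep][1], reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    sample = [("/a", 10), ("/a", 10), ("/b", 100), ("/c", 1)]
    print(caching_candidates(sample, top=2))
```

Ranking by total time is only one heuristic; an endpoint is a good candidate only if its responses also repeat often enough to be reused.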

There are a number of ways to speed up calls that are caching candidates. One is to put a cache such as memcached, or something similar, in front of the slow data source.

Another way is to use an API proxy. A proxy that is sitting in front of your API servers is an efficient and easy way to add an additional caching layer without making any changes to the server tier. We have helped a number of customers drastically improve the performance of their APIs by inserting such a caching tier without touching the clients or servers. 
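
The idea of a caching tier in front of the servers can be sketched in a few lines of Python. This is a simplified, in-process stand-in for a proxy, not how any particular product works: the class name "ResponseCache", the fixed time-to-live, and the "backend" callable are all assumptions made for the example.

```python
import time


class ResponseCache:
    """Serve repeated requests from memory until an entry's time-to-live expires."""

    def __init__(self, backend, ttl_seconds=60.0, clock=time.monotonic):
        self._backend = backend      # callable: key -> response (the real API)
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the expiry logic can be tested
        self._entries = {}           # key -> (expires_at, response)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit: the backend is not called
        response = self._backend(key)
        self._entries[key] = (now + self._ttl, response)
        return response
```

Neither the client nor the backend needs to change: the client still asks for the same key, and the backend still answers the requests that reach it.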

And caching isn't just helpful if you are building an API. Applications that consume APIs, whether they run on another web server or on a smartphone, can set up their own API proxy so that responses are cached in a central place for many devices, even when the APIs the application depends on can't themselves be relied on to perform well.

(thanks to jules:stone for the photo!)

In the cloud, scale means concurrency »

In enterprise computing, scale has traditionally meant “lots of transactions per second.”  On Wall Street for many years, “20,000 TPS” was the magic number as it was the rate of a typical market data feed.  Infrastructure like TIBCO’s UDP-based information bus and then IBM’s MQSeries became the base platforms for much of this scale of computing, and are still heavily used alongside modern JMS and MSMQ implementations.
 
Relatively little attention was paid to concurrent connections.  Enterprise environments tend to be well-regulated, and most applications will have under 1000 simultaneous users (whether human or machine driven).  As a result, application servers and related technologies evolved to support high transaction throughput at limited concurrency.
 
The web, on the other hand, brought in much higher concurrency requirements, and platforms like WebLogic became default components of web computing environments for sites serving thousands of people at the same time.  This was a breakthrough and led to significant market success in a short time period.
 
With the rise of cloud computing, two things change.  First, mobile applications and the API economy are driving an order of magnitude increase in the number of simultaneous users.  Second, these users are often machines rather than people, and therefore aren’t limited to the demand patterns of human users clicking links or refreshing their pages.
 
This produces a new set of demand patterns which increase both total throughput and peak concurrency.  As an example, travel sites like Kayak.com and Bing.com/travel issue hundreds of API requests to airline reservation system backends as a result of a single human-driven query.  Furthermore, these requests are being made not just by desktop or web applications but by mobile applications – especially iPhone applications.  As most people are aware, the next 10 billion devices that come online will be mobile devices (phones, MIDs, GPS, game units, media players).  Each of these is prized for its native application experiences.  Each of these devices will be making user-driven and automated calls to cloud services in order to deliver those experiences.
 
Where backend systems are not protected from this demand, they are being penalized in performance and load management.  This causes either outright outages, “web brownouts” where the core website that uses the same backend slows down, or erratic performance across both the web and cloud properties.  Again, mobile access exacerbates the issue due to the intermittent nature of mobile internet connectivity, which multiplies the number of connections that need to be set up and torn down as the device comes on and off the network.
 
So the explosion of concurrent usage is already beginning, as the traffic and backend impact is expanding.  To manage this and maintain stability of existing infrastructure, a new layer of infrastructure is emerging, much as HTTP load balancers have evolved to serve the needs of web computing.  What we’re seeing is the rise of cloud service controllers, a category of infrastructure that works well with existing systems and builds on top of the strengths of application servers, enterprise messaging systems, and application delivery controllers.

Cloud Security tech talk series: Security and Scalability »

Next in our series of tech talks on cloud security issues, Greg and Ryan Bagnulo, Security Architect for ASPECT-i, discuss how scalability can change security requirements and how cloud computing offers new opportunities to fend off attacks on services, including:

  • security at high scale - how to preserve the resiliency of the business
  • cloud powered security - using elastic cloud resources at the edge to protect core services
  • protecting against bot attacks and spikes through security policy enforcement and caching

Check out this talk below, last week's video on PII and Audit compliance, and the full series here.

API Scalability, part 2 - caching, rate limits, and offloading »

(Following from Tuesday's blog entry on API Scalability and Caching.)

Last time we wrote about 3 things to think about when planning how to scale your API.

  • Caching
  • Rate limiting and threat protection
  • Offloading expensive processing

and then talked about caching at length, so let's finish up with:

Rate Limiting and Threat Protection

Another aspect of scaling is just keeping unnecessary traffic away from your application servers and databases. Some of the techniques that we've discussed previously, such as rate limits and threat protection, apply here as well.

For instance, an API's performance can drop precipitously if a client, on purpose or by accident, sends too much traffic. A rate limit helps a lot here!
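
One common way to implement such a limit is a token bucket. The Python sketch below is an illustration under our own assumptions (one bucket per client, a steady refill rate, a small burst allowance); the class and parameter names are hypothetical.

```python
import time


class TokenBucket:
    """Allow up to `burst` requests at once, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)   # start with a full bucket
        self.clock = clock           # injectable so the logic can be tested
        self.last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In practice you would keep one bucket per API key or client address and answer rejected requests with an error such as HTTP 429 instead of passing them to the application servers.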

Bad requests can kill API performance too. XML threats, which we discussed in the last episode, are one example of a way that a bad request from a client can cause performance problems or even a crash on the server side. It's a lot easier to maintain scalability if you can stop these kinds of problems before they can hurt your servers.
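
As a sketch of what stopping a bad request early might look like, the Python function below rejects XML payloads that are too large, too deeply nested, or not well-formed. The limits and the function name "check_xml" are assumptions for illustration, and these checks cover only a few kinds of XML threat; this is not a complete defense.

```python
import io
import xml.etree.ElementTree as ET


def check_xml(payload: bytes, max_bytes=64_000, max_depth=20) -> bool:
    """Return True if the payload passes basic size, depth, and syntax checks."""
    if len(payload) > max_bytes:
        return False                 # oversized document
    depth = 0
    try:
        # Parse incrementally so we can bail out as soon as a limit is hit.
        for event, _elem in ET.iterparse(io.BytesIO(payload), events=("start", "end")):
            if event == "start":
                depth += 1
                if depth > max_depth:
                    return False     # excessive nesting
            else:
                depth -= 1
    except ET.ParseError:
        return False                 # not well-formed
    return True
```

Running a check like this in a tier in front of the application servers means a malformed or abusive request is dropped before it consumes application resources.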

Server Processing Offloading

Finally, consider the things that you can offload from your application server tier. The more you can offload to more efficient platforms, the less load your application servers have to handle. Plus, the more things you can offload, the simpler those application servers and their applications become, which means they're easier to manage and easier to scale.

For example:

SSL. Load balancers and ADCs like F5 and NetScaler products, not to mention web service proxies like Sonoa ServiceNet, can process SSL more efficiently than most application servers.

HTTP Connections. Those same products are highly optimized to handle tens of thousands of simultaneous connections from HTTP clients, and operate a smaller pool of connections to the back-end application servers. Offloading HTTP connection handling to another tier can free up a lot of server resources.

Authentication. If you perform authentication, a proxy like Sonoa ServiceNet can handle all your authentication for you, freeing your application servers to worry only about properly-authenticated requests. And if you're using SOAP, a product like ServiceNet can process many of the SOAP headers, such as WS-Security headers for authentication, then remove them so that the application server doesn't even need to see them.

Validation. If your API depends on XML input, it may run more efficiently if it only accepts valid XML requests. Turning on XML schema validation can hurt performance of most application servers - products like ServiceNet can do it more efficiently.

To finish up, key questions to ask for your API scalability roadmap might include:

  • What kind of volume are you expecting?
  • Are you prepared if you get 10, 100, or 10,000 times that amount of volume with little warning?
  • Do you have a way to shut a user off if they consume too much volume?
  • Do you have a way to control API traffic in case you are unable to handle the volume? (See Traffic Management.)
  • Are your back end servers capable of handling tens of thousands of concurrent connections?
  • Are your back end services cacheable? Do you have a cache that you can use to reduce response times?
  • Are you monitoring response times and tracking them to gauge customer satisfaction?

(next time: API user management and onboarding)

Turn up the volume: API Scalability with Caching »

(Part 7 in our blog series: "Is your API Naked?: 10 API Roadmap considerations".)

So far our discussion of APIs has focused on aspects like security, visibility, and data protection. But how do you make your API scale?

"Scale" means different things to different people, so let's narrow it down to the question of what to do as your traffic increases. Do you have a plan to handle 10, 100, or 10,000 times more traffic than your API is receiving today?

The truth is that solving this problem at the high end can require fundamental changes to your architecture and code. The kind of engineering required to run an API that accepts a few requests per second is very different from what's required to scale to the size of Facebook or Twitter.  

Writing about all the dimensions of scalability and how to achieve them is a subject for a pretty long book. But here are a few specific things to think about:

  • Caching
  • Rate limiting and threat protection
  • Offloading expensive processing

In this entry, we'll focus on caching.

Caching

Caching is a huge part of any scaling strategy. Whole products and web sites are built around caching as a fundamental architectural concept. Caching works because, even in an API, much of the data that's returned is read far more often than it changes. A good caching strategy can decrease latency by huge percentages and improve throughput by taking load off expensive back-end servers and databases.

Typically, caching today is done in a few different places. Caching between the application server and database using a product like Coherence or memcached helps by reducing the number of database queries needed to serve an API request. Additional caching inside the application server code can further decrease latency by making it possible to reassemble parts of an API response from pre-cached parts.

For instance, the response to a typical Twitter API call consists of an arrangement of many individual snippets of XML, each of which may have been cached to reduce the overhead needed to fetch data from the database and convert it into XML.
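
Here is a minimal Python sketch of that kind of fragment caching. It is not Twitter's implementation: the element names, the "fetch_status" callable standing in for a database query, and the plain dictionary standing in for a cache such as memcached are all assumptions for the example.

```python
from xml.sax.saxutils import escape


def render_status(status_id, fetch_status, fragment_cache):
    """Return the XML snippet for one status, building it only on a cache miss."""
    snippet = fragment_cache.get(status_id)
    if snippet is None:
        text = fetch_status(status_id)   # the expensive part: a database query
        snippet = f'<status id="{status_id}">{escape(text)}</status>'
        fragment_cache[status_id] = snippet
    return snippet


def render_timeline(status_ids, fetch_status, fragment_cache):
    """Assemble a full response from individually cached snippets."""
    body = "".join(render_status(i, fetch_status, fragment_cache) for i in status_ids)
    return f"<statuses>{body}</statuses>"
```

Two timelines that share a status reuse the same cached snippet, so the database query and the XML conversion for that status happen only once.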

HTTP and CDN caching

Caching at the HTTP level is also very common. The HTTP protocol supports several headers that enable caching by proxy servers and of course by the web browser itself. This type of caching works fine for API calls. However, since it's based only on the HTTP request, it doesn't work for everything. Imagine a SOAP API, for instance, which includes an HTTP POST body describing the request - HTTP-only proxies cannot cache the response because they can't look inside the request and see what needs to be cached.
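
To show what an HTTP-level cache has to work with, here is a simplified Python sketch that decides how long a shared cache may reuse a response based on the request method and the Cache-Control response header. Real caches follow many more rules than this (see the HTTP caching specification); the function names and the conservative handling of "no-cache" and "private" are assumptions for the example.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control header value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_ttl(method: str, response_headers: dict) -> int:
    """Seconds a shared cache may reuse the response; 0 means do not reuse."""
    if method.upper() != "GET":
        # e.g. SOAP over POST: the request body is not part of the cache key.
        return 0
    headers = {k.lower(): v for k, v in response_headers.items()}
    cc = parse_cache_control(headers.get("cache-control", ""))
    # Treated conservatively in this sketch: never reuse without revalidation.
    if "no-store" in cc or "no-cache" in cc or "private" in cc:
        return 0
    for name in ("s-maxage", "max-age"):   # s-maxage takes priority for shared caches
        arg = cc.get(name)
        if arg and arg.isdigit():
            return int(arg)
    return 0
```

Note that the decision never looks at the request body, which is exactly why a POST-based SOAP call falls through this kind of cache.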

Similarly, CDNs like Akamai work by caching pieces of content all over the Internet so that users will receive it from a server near them. Caching strategies like this are great for big files that don't change often, but by design they are poorly suited to data that changes more often than once a day or so.

Caching API responses

With APIs, it's also possible to add caching in between, by caching entire API responses. For instance, imagine a "getAccount" method on an API.  API management solutions like Apigee Enterprise can look at the parameters to the request and use them as a "key" to the cache. If a response already exists in the cache, Apigee Enterprise simply returns it from its in-memory cache, exactly as the back-end server originally produced it.

This cuts out all the latency to make an HTTP request to the application server, process the request, query the database, assemble the response, and so on, and unlike a cache that works only at the HTTP level, it can be configured to understand the semantics of the API so that it caches effectively and correctly. There are also ways to purge cache items programmatically, so that the "updateAccount" API method ensures that the next call to "getAccount" won't return stale data.
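
The "getAccount" and "updateAccount" interplay can be sketched in Python as follows. This is a toy illustration of the idea, not Apigee Enterprise's API or configuration: the class names are hypothetical, and "InMemoryBackend" is a made-up stand-in for the real application server.

```python
class InMemoryBackend:
    """Hypothetical stand-in for the real application server and database."""

    def __init__(self):
        self.accounts = {}
        self.reads = 0               # counts how often the backend is hit

    def get_account(self, account_id):
        self.reads += 1
        return dict(self.accounts[account_id])

    def update_account(self, account_id, fields):
        self.accounts.setdefault(account_id, {}).update(fields)


class CachedAccountApi:
    """Cache getAccount responses and purge them when updateAccount is called."""

    def __init__(self, backend):
        self._backend = backend
        self._cache = {}             # (method name, account id) -> response

    def get_account(self, account_id):
        key = ("getAccount", account_id)   # request parameters form the cache key
        if key not in self._cache:
            self._cache[key] = self._backend.get_account(account_id)
        return self._cache[key]

    def update_account(self, account_id, fields):
        self._backend.update_account(account_id, fields)
        # Purge so the next getAccount call cannot return stale data.
        self._cache.pop(("getAccount", account_id), None)
```

Because the cache knows that "updateAccount" affects "getAccount", it can purge precisely the entry that changed rather than waiting for a timeout.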

In other words, adding an intelligent caching layer between the client of an API and the back end application server adds more caching options that can increase scaling and decrease latency even further than some alternatives because this cache is closer to the API client. Plus, due to the magic of HTTP, XML and/or JSON, it's possible for this caching to be performed without any changes to the API client or server.

Summary: Up to 4 tiers of caching

This means that a highly scalable web site could use caching in different tiers, each with its own contribution to overall API performance:

  • A cache like Coherence or memcached may be used between the application server and database to reduce database load.
  • A similar cache may also be used from within the application server to reduce the overhead of assembling API responses.
  • An API proxy like Apigee Enterprise can cache complete API responses, reducing load on the application server tier.
  • A CDN like Akamai can provide an extra caching layer for large files that do not change often.

Up next: API Scalability through Traffic Management and Server offloading