API Best Practices Blog
Screencast: Creating your first API proxy »
The first in a series on how to get started using Apigee to proxy API traffic (looks best in HD).
Testing API latency and response time with Apigee »
There were some good comments last week on TechCrunch on the pros and cons of using a proxy for analytics and protection (or any operational or business policy) on your API.
Biggest concerns discussed were: latency, single-point-of-failure, and loss-of-control. All great points.
We wanted to talk about latency first (and address the other two in a later post).
A proxy definitely adds latency, both from the additional server hop and from the processing time of the proxy software. So any proxy needs to minimize latency and add enough value (capability, time-to-market, etc.) to justify this extra hop.
Our conservative estimate for Apigee is to expect 200-400 ms of latency. This is mostly due to the extra hop and includes the 20-40 ms of latency due to Apigee's proxy 'think time.' (More detail in our latency FAQ.)
Your mileage might vary based on message size, the policies you are enforcing, and where you are hosted. For example, our estimates are based on a 5K message size. If you proxy Twitter with its small messages, your latency will likely be lower, and if you are processing big messages (such as inserting ads into email), it will likely be higher.
Test it yourself
Soon we'll introduce a tool to test your Apigee proxy's latency during the proxy setup process. In the meantime, you can test this yourself with ApacheBench (the ab command) or cURL, as follows:
1. Set up your Apigee proxy (or feel free to use my Yahoo Local API proxy in the steps below)
2. Open up a terminal.
3. Run a *before* test to get the latency *without* Apigee. Run this ApacheBench command (for 10 test requests).
For this example, I'm using the Yahoo Local API's example API methods.
ab -n 10 "http://local.yahooapis.com/LocalSearchService/V3/localSearch?appid=YahooDemo&query=pizza&zip=94708&results=2"
(This is an ApacheBench command where -n 10 specifies 10 requests. The URL is quoted so that the shell does not treat each & as a "run in background" operator and cut the query string short.)
You should get a results set in this format (where the "10" was for running the test 10 times).

As you can see, just hitting Yahoo Local without a proxy, I get a latency of 250 ms for all 10 requests.
4. Next, I get the latency *with* Apigee using my Apigee proxy URL. (Feel free to use this URL yourself. Don't worry, I rate limited it in Apigee.)
ab -n 10 "http://yahoo-local-1.apigee.com/LocalSearchService/V3/localSearch?appid=YahooDemo&query=pizza&zip=23662&results=2"
In this case my results are:

Here, my longest response with Apigee is 357 ms.
5. Subtract (3) from (4) and there is your approximate latency for the proxy. Here the latency was roughly 357 ms - 250 ms = 107 ms for my 10 requests, on my Verizon card outside Berkeley's Cafe Roma. (Thanks to Yahoo Local's great API for the recommendation.)
Run this a couple of times to make sure your responses are consistent, and mix up your API query parameters so you don't accidentally compare a cached response time with a non-cached one. For example, I changed zip codes in my Yahoo Local requests.
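If you would rather script the comparison than read two ab reports side by side, here is a minimal Python sketch of the same before/after subtraction. The function names are made up for illustration, and it compares medians rather than the single figures used above, which is just one reasonable choice when you repeat the test. It uses only the standard library and makes no requests until you call time_requests yourself.

```python
import statistics
import time
import urllib.request


def time_requests(url, n=10):
    """Return wall-clock response times in milliseconds for n sequential GETs."""
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as response:
            response.read()  # include the time taken to download the body
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def proxy_overhead_ms(direct_ms, proxied_ms):
    """Approximate proxy latency as median(proxied) minus median(direct)."""
    return statistics.median(proxied_ms) - statistics.median(direct_ms)


# With the numbers from the run above (no network needed):
# proxy_overhead_ms([250.0], [357.0]) gives 107.0
```

To use it against live endpoints, pass the direct API URL and the proxy URL to time_requests and feed both result lists to proxy_overhead_ms.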
Turn up the volume: API Scalability with Caching »
(Part 7 in our blog series: 'Is your API Naked?: 10 API Roadmap considerations'.)
So far our discussion of APIs has focused on aspects like security, visibility, and data protection. But how do you make your API scale?
"Scale" means different things to different people, so let's narrow it down to one question: what do you do as your traffic increases? Do you have a plan to handle 10, 100, or 10,000 times more traffic than your API is receiving today?
The truth is that solving this problem at the high end can require fundamental changes to your architecture and code. The kind of engineering required to run an API that accepts a few requests per second is very different from what's required to scale to the size of Facebook or Twitter.
Writing about all the dimensions of scalability and how to achieve them is a subject for a pretty long book. But here are a few specific things to think about:
- Caching
- Rate limiting and threat protection
- Offloading expensive processing
In this entry, we'll focus on caching.
Caching
Caching is a huge part of any scaling strategy. Whole products and web sites are built around caching as a fundamental architectural concept. Caching works because, even in an API, much of the data that's returned is read far more often than it changes. A good caching strategy can decrease latency by huge percentages, and improve throughput by taking load off expensive back-end servers and databases.
Typically, caching today is done in a few different places. Caching between the application server and database using a product like Coherence or memcached helps by reducing the number of database queries needed to serve an API request. Additional caching inside the application server code can further improve throughput by making it possible to assemble an API response from pre-cached parts.
For instance, the response to a typical Twitter API call consists of an arrangement of many individual snippets of XML, each of which may have been cached to reduce the overhead needed to fetch data from the database and convert it into XML.
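To make the first of these tiers concrete, here is a minimal Python sketch of the "check the cache, then fall back to the database" pattern. TTLCache is a tiny in-process stand-in written for this example, not the memcached or Coherence client API, and fetch_user and query_database are hypothetical names.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a cache such as memcached (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)


def fetch_user(user_id, cache, query_database):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    row = query_database(user_id)  # the expensive call we want to avoid
    cache.set(key, row)
    return row
```

The time-to-live bounds how stale a cached row can get. Choosing it is a trade-off between database load and freshness.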
HTTP and CDN caching
Caching at the HTTP level is also very common. HTTP supports several headers that enable caching by proxy servers and of course by the web browser itself. This type of caching works fine for API calls. However, since it's based only on the HTTP request line and headers, it doesn't work for everything. Imagine a SOAP API, for instance, which describes each request in an HTTP POST body. HTTP-only proxies cannot cache the response because they can't look inside the request and see what needs to be cached.
Similarly, CDNs like Akamai work by caching pieces of content all over the Internet so that users will receive it from a server near them. Caching strategies like this are great for big files that don't change often, but by design they are poorly suited to data that changes more often than once a day or so.
Caching API responses
With APIs, it's also possible to add a cache between the client and the application server that stores entire API responses. For instance, imagine a "getAccount" method on an API. API management solutions like Apigee Enterprise can look at the parameters to the request and use them as a "key" to the cache. If a response already exists in the cache, Apigee Enterprise simply returns it from its in-memory cache, exactly as the back-end server originally sent it.
This cuts out all the latency of making an HTTP request to the application server, processing the request, querying the database, assembling the response, and so on. Unlike a cache that works only at the HTTP level, it can be configured to understand the semantics of the API so that it caches effectively and correctly. There are also ways to purge cache items programmatically, so that the "updateAccount" API method ensures that the next call to "getAccount" won't return stale data.
In other words, adding an intelligent caching layer between the client of an API and the back-end application server adds more caching options. These can increase scalability and decrease latency even further than some alternatives, because this cache is closer to the API client. Plus, due to the magic of HTTP, XML and/or JSON, this caching can be performed without any changes to the API client or server.
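The idea of keying a cache on request parameters and purging on update can be sketched in a few lines of Python. This is an illustration of the concept only, not how Apigee Enterprise is implemented. ResponseCache, handle, and the "id" parameter are assumptions made for the example, while "getAccount" and "updateAccount" are the method names used above.

```python
class ResponseCache:
    """Caches whole API responses, keyed by method name and request parameters."""

    def __init__(self, backend):
        self._backend = backend  # callable(method, params) -> response
        self._responses = {}

    @staticmethod
    def _key(method, params):
        # Sort parameters so their order in the request does not change the key.
        return (method, tuple(sorted(params.items())))

    def call(self, method, params):
        key = self._key(method, params)
        if key not in self._responses:
            self._responses[key] = self._backend(method, params)
        return self._responses[key]

    def purge(self, method, params):
        self._responses.pop(self._key(method, params), None)


def handle(cache, backend, method, params):
    """Route reads through the cache; let updates purge the matching read."""
    if method == "updateAccount":
        result = backend(method, params)
        # Hypothetical rule: an update invalidates getAccount for the same id.
        cache.purge("getAccount", {"id": params["id"]})
        return result
    return cache.call(method, params)
```

Because the key is built from the API method and its parameters rather than from the URL alone, the same approach would work for requests whose details live in a POST body, provided the cache can parse that body.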
Summary: Up to 4 tiers of caching
This means that a highly scalable web site could use caching in different tiers, each with its own contribution to overall API performance:
- A cache like Coherence or memcached may be used between the application server and database to reduce database load.
- A similar cache may also be used from within the application server to reduce the overhead of assembling API responses.
- An API proxy like Apigee Enterprise can cache complete API responses, reducing load on the application server tier.
- A CDN like Akamai can provide an extra caching layer for large files that do not change often.
Up next: API Scalability through Traffic Management and Server offloading



