API Best Practices Blog
HUGE: Running an API at Scale »
Thanks to all who participated in last week's strategy webinar - HUGE: Running an API at Scale.
And thanks to our speakers @sramji, @brianpagano, and @edanuff.
Here are the slides and video. We'd love more of your thoughts, insights, or questions on the api-craft forum.
API Scalability: Cache large ‘chunks’ in the API response »
Highscalability.com has an interesting piece on caching as "secret sauce" for good web performance.
Facebook and Twitter are cache assembly lines -- every web page and API request is served up by many calls to various caches at different levels - assembling the final result from many different chunks. At this scale there is almost no other way to deliver reasonable performance.
For APIs - what's the largest chunk of all? The entire API response.
APIs lend themselves nicely to caching responses because it is often easy to identify the cache key. If you follow the REST pattern, each URL maps uniquely to a resource that has a well-defined lifecycle.
If you're working on the next Twitter, you will need many layers of caching to deliver great performance, including the filesystem and database caches, and then you will add additional caching layers using products like memcached or Coherence. But a final tier of caching for the API responses themselves can only help performance, so consider:
- Can API analytics data show the most common or slowest API calls?
- Are there slow API calls that return the same data over and over?
- Can you design using a REST pattern to make it easy to identify individual cache items?
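To make the last question concrete, here is a minimal sketch of deriving a cache key from a REST-style request. The function name `cache_key` and the normalization rules (lower-casing the host, dropping a trailing slash, sorting query parameters) are our own assumptions for illustration; your API may need different rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(method: str, url: str) -> str:
    """Build a cache key from the HTTP method and a normalized URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Sort query parameters so ?a=1&b=2 and ?b=2&a=1 share one cache entry.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Treat /users/42 and /users/42/ as the same resource.
    path = parts.path.rstrip("/") or "/"
    key = f"{method.upper()} {parts.netloc.lower()}{path}"
    return f"{key}?{query}" if query else key
```

Two requests for the same resource then land on the same cache entry even if the client orders its parameters differently.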
There are a number of ways to speed up calls that are caching candidates, such as caching their results in memcached or something similar.
Another way is to use an API proxy. A proxy that is sitting in front of your API servers is an efficient and easy way to add an additional caching layer without making any changes to the server tier. We have helped a number of customers drastically improve the performance of their APIs by inserting such a caching tier without touching the clients or servers.
And caching isn't just helpful if you are building an API. Applications that consume APIs, whether they are running on another web server or on a smartphone, can set up their own API proxy so that responses for many devices are cached in a central place, even if the APIs that the application depends on can't be relied on to deliver good performance themselves.
(thanks to jules:stone for the photo!)
Speeds and Feeds: RSS feed management, validation, and performance »
We use this blog to talk about issues we see around securing, managing and scaling APIs and web services. We also see many of these same issues and requirements with feeds. Arguably, feeds - specifically RSS and Atom feeds - might just be the most common type of XML API.
Feeds are growing beyond just being a great way to keep people abreast of changes to news or a blog and becoming a great way to aggregate or syndicate content to partners, customers, or applications in general.
Our media customers, like MTV Networks, use RSS feed management as a way to distribute updates about their content to customers and partners. Essentially, the RSS feed becomes a “catalog” of what they’re pushing out to their web site partners right now.
The needs of feeds: customization, validation, and caching
In general, everything we said about API analytics, API traffic management and API scalability can apply to feeds that you offer or consume as well. Do you know who is looking at your feeds, how the traffic changes over time, and what kind of response time your feeds are delivering? Do you have any way to limit access to your feed if it becomes tremendously popular? And while most feeds associated with blogs, for instance, have no access control so that they are open to the whole Internet, there are also feeds that need authentication and authorization just like any web service.
But aside from these other “web services” types of requirements, feeds have some special requirements that come up all the time – customization, validation, and caching.
We see feed validation issues frequently. Feeds are prone to broken links, especially if you are aggregating frequently updated content from around the web. Feeds often don’t match the proper XML schema or aren’t valid XML, since they are often generated or manually assembled. Sometimes the feed provider needs to address these issues, and other times the feed consumer has to find a way to deal with the bad input.
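As a rough illustration of what a feed consumer can check before trusting its input, the sketch below uses Python's standard `xml.etree.ElementTree`. The function name `check_feed` is hypothetical, it only looks at plain RSS 2.0 (not Atom or namespaced extensions), and it is a well-formedness and sanity check rather than full schema validation.

```python
import xml.etree.ElementTree as ET


def check_feed(xml_text):
    """Return a list of problems found in an RSS 2.0 document (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The document is not even well-formed XML, so stop here.
        return [f"not well-formed XML: {err}"]
    problems = []
    if root.tag != "rss":
        problems.append(f"root element is <{root.tag}>, expected <rss>")
    for number, item in enumerate(root.iter("item"), start=1):
        # RSS 2.0 requires each item to carry at least a title or a description.
        if item.find("title") is None and item.find("description") is None:
            problems.append(f"item {number}: needs a <title> or a <description>")
        link = (item.findtext("link") or "").strip()
        if link and not link.startswith(("http://", "https://")):
            problems.append(f"item {number}: link is not an absolute http(s) URL: {link!r}")
    return problems
```

A consumer could reject the feed, or skip only the offending items, depending on what comes back.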
Customization is another issue. The basic feed on your blog is pretty much the same for everyone – but for others that is not the case. Lots of companies use feeds to push content to their partners, and each partner may demand a different format. Sometimes a partner needs only certain items, or a certain number of items, or they need the feed transformed into different formats like RSS, MRSS, or Atom. It’s not unusual for a feed provider to have to provide 10 or 20 (or more) custom variations of the same feed if content syndication is important to the business. You may want to drive these with existing profiles or business rules in SQL or LDAP.
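Here is a small sketch of the "only certain items, or a certain number of items" case for a plain RSS 2.0 feed. The function `customize_feed` and its parameters `max_items` and `category` are assumptions made up for this example; a real deployment would more likely drive them from partner profiles or business rules, and format conversion (say, RSS to Atom) is not shown.

```python
import xml.etree.ElementTree as ET


def customize_feed(xml_text, max_items=None, category=None):
    """Return a copy of an RSS 2.0 feed trimmed for one partner (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 feed: no <channel> element")
    kept = 0
    # Iterate over a snapshot of the items, because we remove while looping.
    for item in list(channel.findall("item")):
        categories = [c.text for c in item.findall("category")]
        wanted = category is None or category in categories
        if wanted and (max_items is None or kept < max_items):
            kept += 1
        else:
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")
```

Calling `customize_feed(feed, max_items=10, category="music")` would give one partner only the ten most recent music items, assuming the feed lists newest items first.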
Caching is the last, but certainly not the least. Sometimes feeds are just static HTML files served by a web server, but sometimes they’re not. And when they’re not, feeds can be slow. We’ve seen some infrastructure that takes over two seconds to produce a feed – that’s a long time. When that happens, a little caching can go a long way.
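To show how little caching it takes, here is a sketch of a time-based cache that rebuilds a feed at most once per interval. The class name `TTLCache`, the `get_or_build` method, and the injectable `clock` argument are all our own inventions for illustration; this is an in-process toy, not a substitute for a shared cache.

```python
import time


class TTLCache:
    """Keep each value for ttl_seconds, then rebuild it on the next request."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._entries = {}  # key -> (expires_at, value)

    def get_or_build(self, key, build):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the slow back end
        value = build()  # slow path, e.g. the two-second feed render
        self._entries[key] = (now + self.ttl, value)
        return value
```

With a 60-second TTL, a feed that takes two seconds to build is rendered once a minute per key, no matter how many subscribers poll it.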
So think about feed management when considering API management or content syndication. If you want to experiment with API management on a feed, Apigee is a good way to get a feel for what stats on feed usage, latency, data volumes, and error rates might offer.
API Scalability, part 2 - caching, rate limits, and offloading »
(Following from Tuesday's blog entry on API Scalability and Caching.)
Last time we wrote about 3 things to think about when planning how to scale your API.
- Caching
- Rate limiting and threat protection
- Offloading expensive processing
and then talked about caching at length, so let's finish up with:
Rate Limiting and Threat Protection
Another aspect of scaling is just keeping unnecessary traffic away from your application servers and databases. Some of the techniques that we've discussed previously, such as rate limits and threat protection, apply here as well.
For instance, an API's performance can drop precipitously if a client, on purpose or by accident, sends too much traffic. A rate limit helps a lot here!
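One common way to implement a rate limit is a token bucket, sketched below. This is a generic technique, not a description of any particular product's implementation, and the class name `TokenBucket` and its parameters are assumptions for this example. A real deployment would keep one bucket per client or API key.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the refill logic can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill for the time elapsed since the last call, up to capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject, e.g. with HTTP 429
```

A client that stays under its rate never notices the limit, while a runaway loop is cut off after its burst allowance is spent.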
Bad requests can kill API performance too. XML threats, which we discussed in the last episode, are one example of a way that a bad request from a client can cause performance problems or even a crash on the server side. It's a lot easier to maintain scalability if you can stop these kinds of problems before they can hurt your servers.
Server Processing Offloading
Finally, consider the things that you can offload from your application server tier. The more you can offload to more efficient platforms, the less load your application servers have to handle. Plus, the more things you can offload, the simpler those application servers and their applications become, which means they're easier to manage and easier to scale.
For example:
SSL. Load balancers and ADCs like F5 and NetScaler products, not to mention web service proxies like Sonoa ServiceNet, can process SSL more efficiently than most application servers.
HTTP Connections. Those same products are highly optimized to handle tens of thousands of simultaneous connections from HTTP clients, and operate a smaller pool of connections to the back-end application servers. Offloading HTTP connection handling to another tier can free up a lot of server resources.
Authentication. If you perform authentication, a proxy like Sonoa ServiceNet can handle all your authentication for you, freeing your application servers to worry only about properly-authenticated requests. And if you're using SOAP, a product like ServiceNet can process many of the SOAP headers, such as WS-Security headers for authentication, then remove them so that the application server doesn't even need to see them.
Validation. If your API depends on XML input, it may run more efficiently if it only accepts valid XML requests. Turning on XML schema validation can hurt performance of most application servers - products like ServiceNet can do it more efficiently.
So, to finish up, key questions to ask for your API scalability roadmap might include:
- What kind of volume are you expecting?
- Are you prepared if you get 10, 100, or 10,000 times that amount of volume with little warning?
- Do you have a way to shut a user off if they consume too much volume?
- Do you have a way to control API traffic in case you are unable to handle the volume? (see Traffic Management)
- Are your back end servers capable of handling tens of thousands of concurrent connections?
- Are your back end services cacheable? Do you have a cache that you can use to reduce response times?
- Are you monitoring response times and tracking them to gauge customer satisfaction?
(next time: API user management and onboarding)
Turn up the volume: API Scalability with Caching »
(Part 7 in our blog series: "Is your API Naked?: 10 API Roadmap considerations".)
So far our discussion of APIs has focused on aspects like security, visibility, and data protection. But how do you make your API scale?
"Scale" means different things to different people, so let's narrow it down to the question of what to do as your traffic increases. Do you have a plan to handle 10, 100, or 10,000 times more traffic than your API is receiving today?
The truth is that solving this problem at the high end can require fundamental changes to your architecture and code. The kind of engineering required to run an API that accepts a few requests per second is very different from what's required to scale to the size of Facebook or Twitter.
Writing about all the dimensions of scalability and how to achieve them is a subject for a pretty long book. But here are a few specific things to think about:
- Caching
- Rate limiting and threat protection
- Offloading expensive processing
In this entry, we'll focus on caching.
Caching
Caching is a huge part of any scaling strategy. Whole products and web sites are built around caching as a fundamental architectural concept. Caching works because even in an API, much of the data that's returned is read far more often than it is changed. A good caching strategy can decrease latency by huge percentages, and improves throughput by taking load off expensive back-end servers and databases.
Typically, caching today is done in a few different places. Caching between the application server and database using a product like Coherence or memcached helps by reducing the number of database queries needed to serve an API request. Additional caching inside the application server code can further improve throughput by making it possible to reassemble parts of an API response from pre-cached parts.
For instance, the response to a typical Twitter API call consists of an arrangement of many individual snippets of XML, each of which may have been cached to reduce the overhead needed to fetch data from the database and convert it into XML.
HTTP and CDN caching
Caching at the HTTP level is also very common. The HTTP protocol supports several headers that enable caching by proxy servers and of course by the web browser itself. This type of caching works fine for API calls. However, since it's based only on the HTTP request, it doesn't work for everything. Imagine a SOAP API, for instance, which includes an HTTP POST body describing the request - HTTP-only proxies cannot cache the response because they can't look inside the request and see what needs to be cached.
Similarly, CDNs like Akamai work by caching pieces of content all over the Internet so that users will receive it from a server near them. Caching strategies like this are great for big files that don't change often, but by design they are poor at handling data that changes more often than every day or so.
Caching API responses
With APIs, it's also possible to add caching in between the client and the application server, by caching entire API responses. For instance, imagine a "getAccount" method on an API. API management solutions like Apigee Enterprise can look at the parameters to the request and use them as a "key" to the cache. If a response already exists in the cache, Apigee Enterprise simply returns the exact response that the back-end server produced, straight from its in-memory cache.
This cuts out all the latency to make an HTTP request to the application server, process the request, query the database, assemble the response, and so on, and unlike a cache that works only at the HTTP level, it can be configured to understand the semantics of the API so that it caches effectively and correctly. There are also ways to purge cache items programmatically, so that the "updateAccount" API method ensures that the next call to "getAccount" won't return stale data.
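The getAccount and updateAccount interplay can be sketched in a few lines. To be clear, this is not how Apigee Enterprise is implemented or configured; it is a generic illustration, and the class `AccountApiCache` and the `backend` object with `get_account` and `update_account` methods are hypothetical names we made up.

```python
class AccountApiCache:
    """Cache whole getAccount responses and purge them when the account changes."""

    def __init__(self, backend):
        self.backend = backend  # hypothetical object that talks to the real API
        self._responses = {}    # (method name, account id) -> cached response

    def get_account(self, account_id):
        # The request parameters form the cache key.
        key = ("getAccount", account_id)
        if key not in self._responses:
            self._responses[key] = self.backend.get_account(account_id)
        return self._responses[key]

    def update_account(self, account_id, fields):
        result = self.backend.update_account(account_id, fields)
        # Purge so the next getAccount call cannot return stale data.
        self._responses.pop(("getAccount", account_id), None)
        return result
```

Repeated reads of the same account hit the back end once, and an update costs one extra back-end read the next time that account is requested.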
In other words, adding an intelligent caching layer between the client of an API and the back end application server adds more caching options that can increase scaling and decrease latency even further than some alternatives because this cache is closer to the API client. Plus, due to the magic of HTTP, XML and/or JSON, it's possible for this caching to be performed without any changes to the API client or server.
Summary: Up to 4 tiers of caching
This means that a highly scalable web site could use caching in different tiers, each with its own contribution to overall API performance:
- A cache like Coherence or memcached may be used between the application server and database to reduce database load.
- A similar cache may also be used from within the application server to reduce the overhead of assembling API responses.
- An API proxy like Apigee Enterprise can cache complete API responses, reducing load on the application server tier.
- A CDN like Akamai can provide an extra caching layer for large files that do not change often.
Up next: API Scalability through Traffic Management and Server offloading
How is cloud computing related to SOA? Case study on API Policy and Governance Patterns »
Last week at the Burton Group Catalyst conference, Scott Metzger of TrueCredit.com gave a great case study presentation on how they opened their internal SOA as APIs for partners, and specifically on the different policy and governance patterns involved.
Scott talks about the factors driving them to identify and implement a separate application-agnostic layer for 5 major policy patterns: service access, routing, caching, transformations, and operations. (And more details of their implementation in this video)
Scott Metzger of TrueCredit Catalyst Presentation
TrueCredit.com API case study »
Scott Metzger, CTO of TrueCredit.com was kind enough to take some time to talk about their Consumer Connect API program and some of the technical challenges that they have addressed using Apigee's API Gateway.
Scott wanted to make life easier on his development team as they ramped up their number of APIs, partners and traffic volumes. Here, he describes how he uses the technology as a 'policy layer' to provide API analytics, fine-grained data protection, and caching in an API Gateway. In this case, Apigee Enterprise is deployed on-premise as virtualized software.
We're very excited to be working with Scott and TrueCredit. Check out the full TrueCredit case study.



