Thoughts on API Best Practices API Management and Infrastructure Blog

Mobile patterns are different from Web patterns

Mobile application patterns are different from web application patterns.  There is a consistent, discrete set of differences in how they access cloud services.  There are consistent reasons why they’re favored over websites as well, primarily based on implicit intent and purposive computing experience, but that’s a subject for a future blog entry.

For now, let’s assume that like web applications, mobile applications use HTTP to access their services, but unlike old-school web applications, they use REST and SOAP as the basis of their service protocols.

Difference 1: Bandwidth is expensive

Bandwidth always costs two things in mobile applications – time and battery.  Jeffrey Sharkey has a great talk about battery usage and good citizenship.  In some areas and for some users, bandwidth also counts against their data plan, which makes bandwidth cost real money as well.

Difference 2: Bandwidth is inconsistent

Disconnections are part of everyday life when using the mobile internet to access websites or cloud services. When your local cell area becomes overloaded with requests or your service loses track of where you are between towers, when you are in even a momentary cell shadow, your connection is gone.  If this is in the middle of a data connection, that connection is reset and has to start over.

Difference 3: Local processing matters

First, non-trivial requests for data from cloud services often result in large datasets being returned to the device.  These chunks can not only be hard to process, but may be more information than the user will bother to access.  A request that returns hundreds of row-equivalents worth of responses may be mostly wasted processing if the user is only going to glance at the first few displayed screens’ worth.
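One way to avoid shipping rows the user never looks at is to hand the device one page at a time.  The sketch below is only an illustration of that idea: the `paginate` function, the `cursor` parameter, and the 500-row result set are hypothetical names and numbers, not part of any product described here.

```python
from typing import Any, Dict, List, Optional


def paginate(rows: List[Any], page_size: int, cursor: int = 0) -> Dict[str, Any]:
    """Return one page of rows plus the cursor for the next page (None when done)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    page = rows[cursor:cursor + page_size]
    next_cursor: Optional[int] = cursor + page_size
    if next_cursor >= len(rows):
        next_cursor = None  # nothing left to fetch
    return {"rows": page, "next_cursor": next_cursor}


# Hypothetical result set of 500 row-equivalents.
results = [{"id": i} for i in range(500)]

# The device asks for the first screen's worth only.
first = paginate(results, page_size=20)

# It asks for more only if the user actually scrolls.
second = paginate(results, page_size=20, cursor=first["next_cursor"])
```

The device pays for parsing 20 rows up front rather than 500, and the remaining pages cost nothing unless they are requested.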

Second, local applications have differentiated access to devices – awareness of onboard camera, location, or other services.  They also have differentiated preferences about data.  For example, the iPhone operating system is fluent in XML processing, and many iPhone applications transmit XML dialects to their cloud services.  However, XML is more expensive to the iPhone than PLISTs (a JSON-like simple data format) – roughly 4-5 times more expensive in compute cycles.  Other mobile devices have their own variations based on operating system version and device services.

Difference 4: Welcome to the hit-driven app economy

Media has been a hit-driven economy for decades, with winners and losers being made and broken overnight based on the wisdom or madness of crowds.  With the fantastic potential of mobile applications and the inconsistent actual experiences, collaborative filtering and editorial selection are producing a hit-driven “Top 25” economy for mobile applications – the equivalent of being “above the fold” in a website.  It’s a steady climb to get in this elite area, but once your application gets to this point you might as well have been written up by Walt Mossberg or Slashdot – traffic, downloads, and usage of the cloud services backing your app will all surge dramatically.

Difference 5: Concurrent usage by millions of nomadic users

Just combining Difference 2 (inconsistent bandwidth) with the fact that most mobile connections use HTTP 1.0 means that many more connections are made and dropped, and each one suffers an expensive reset not just of application state but of the HTTP connection itself.  Adding Difference 4 (the hit-driven app economy) to this means that concurrency – including “shadow concurrency”, the load of the dropped and restarted connections – has an even bigger role in mobile applications than in traditional web computing.

Solving for mobile application patterns in cloud computing

There are probably ways to solve each of these problems individually.  What we’ve seen with some of our key customers is that all of these can be solved by applying a cloud service controller to manage the connections between mobile applications and their cloud services.

With a cloud service controller in the middle of the application and cloud service interactions, they’ve done the following things:

  • Compressed service request and response data by 6-10x
  • Accelerated service response time 5-7x through intelligent caching
  • Carved large service responses into chunks likely to be used by the mobile user
  • Translated service responses into formats easily processed by the mobile device
  • Reduced total network airtime usage by 15-20x
  • Reduced battery drain on the mobile device
  • Reduced dropped connection experiences for the mobile application
  • Scaled caching and response capacity dynamically to match growth and spikes in usage
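To get a feel for why compression is the first item on that list, here is a minimal sketch using Python's standard `gzip` module on a made-up, repetitive XML payload.  The payload and the `gzip_ratio` helper are assumptions for illustration; the ratio you see depends entirely on the data, and this is not a reproduction of the customer figures above.

```python
import gzip


def gzip_ratio(payload: bytes) -> float:
    """Compress a payload with gzip and return original size / compressed size."""
    compressed = gzip.compress(payload)
    # Confirm the compression is lossless before trusting the ratio.
    if gzip.decompress(compressed) != payload:
        raise ValueError("gzip round trip failed")
    return len(payload) / len(compressed)


# Hypothetical service response: 1,000 similar rows of XML.
rows = "".join(
    f"<row><id>{i}</id><status>active</status></row>" for i in range(1000)
)
payload = f"<rows>{rows}</rows>".encode("utf-8")

ratio = gzip_ratio(payload)
print(f"{len(payload)} bytes, compressed {ratio:.1f}x")
```

Verbose, repetitive formats like XML tend to compress well, which is part of why this step pays off for service responses.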


In one example, a customer took the network request/response time from 17 seconds to 1 second, and took local processing on the mobile device from 17 seconds to one second.  This reduced total application response time from 34 seconds to 2 seconds – an acceptable, even exciting level of responsiveness for that application’s users.  This was all achieved in a few weeks without rewriting either the mobile application or the cloud service upon which the application depended.

They did this by taking our core product (ServiceNet) and writing policies that let it route, cache, accelerate, paginate, and format their cloud services.  Since ServiceNet is available as an AMI (Amazon EC2’s virtual machine image format), we've deployed it as a cloud service that expands and contracts its use of computing resources to match the load.  This way they weren’t caught unprepared when their app made the top 25 list on the iPhone App Store, and their legions of new users had the same responsive application experience as the users who popularized it in the first place.  Finally, future devices and application platforms will be easier to support from a single cloud service through construction of new formatting or pagination policies that match the needs of those device platforms.

Mobile acceleration may seem like a standalone thing for Sonoa, but really it’s an example of using policy to solve an application pattern challenge.  More on that – and policy-oriented programming – in a future blog entry.  There are many more domains of use for this approach in cloud computing.

Here's a video we posted today describing our Mobile App Acceleration service.

In the cloud, scale means concurrency

In enterprise computing, scale has traditionally meant “lots of transactions per second.”  On Wall Street for many years, “20,000 TPS” was the magic number as it was the rate of a typical market data feed.  Infrastructure like TIBCO’s UDP-based information bus and then IBM’s MQSeries became the base platforms for much of this scale of computing, and are still heavily used alongside modern JMS and MSMQ implementations.
 
Relatively little attention was paid to concurrent connections.  Enterprise environments tend to be well-regulated, and most applications will have under 1000 simultaneous users (whether human or machine driven).  As a result, application servers and related technologies evolved to support high transaction throughput at limited concurrency.
 
The web, on the other hand, brought in much higher concurrency requirements, and platforms like WebLogic became default components of web computing environments for sites serving thousands of people at the same time.  This was a breakthrough and led to significant market success in a short time period.
 
With the rise of cloud computing, two things change.  First, mobile applications and the API economy are driving an order of magnitude increase in the number of simultaneous users.  Second, these users are often machines rather than people, and therefore aren’t limited to the demand patterns of human users clicking links or refreshing their pages.
 
This produces a new set of demand patterns which increase both total throughput and peak concurrency.  As an example, travel sites like Kayak.com and Bing.com/travel issue hundreds of API requests to airline reservation system backends as a result of a single human-driven query.  Furthermore, these requests are being made not just by desktop or web applications but by mobile applications – especially iPhone applications.  As most people are aware, the next 10 billion devices that come online will be mobile devices (phones, MIDs, GPS, game units, media players).  Each of these is prized for its native application experiences.  Each of these devices will be making user-driven and automated calls to cloud services in order to deliver those experiences.
 
Where backend systems are not protected from this demand, they are being penalized in performance and load management.  This causes either outright outages, “web brownouts” where the core website that uses the same backend slows down, or erratic performance across both the web and cloud properties.  Again, mobile access exacerbates the issue due to the intermittent nature of mobile internet connectivity, which multiplies the number of connections that need to be set up and torn down as the device comes on and off the network.
 
So the explosion of concurrent usage is already beginning, as the traffic and backend impact is expanding.  To manage this and maintain stability of existing infrastructure, a new layer of infrastructure is emerging, much as HTTP load balancers have evolved to serve the needs of web computing.  What we’re seeing is the rise of cloud service controllers, a category of infrastructure that works well with existing systems and builds on top of the strengths of application servers, enterprise messaging systems, and application delivery controllers.

API Scalability, part 2 - caching, rate limits, and offloading

(Following from Tuesday's blog entry on API Scalability and Caching.)

Last time we wrote about 3 things to think about when planning how to scale your API.

  • Caching
  • Rate limiting and threat protection
  • Offloading expensive processing

and then talked about caching at length, so let's finish up with:

Rate Limiting and Threat Protection

Another aspect of scaling is just keeping unnecessary traffic away from your application servers and databases. Some of the techniques that we've discussed previously, such as rate limits and threat protection, apply here as well.

For instance, an API's performance can drop precipitously if a client, on purpose or by accident, sends too much traffic. A rate limit helps a lot here!
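As a rough illustration of what a rate limit does, here is a minimal token-bucket sketch.  Token bucket is one common technique, swapped in here for illustration; it is not a description of how ServiceNet implements rate limits, and the `TokenBucket` class and its parameters are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per client key: 5 requests/second with bursts of up to 10.
buckets = {}

def allow_request(client_key: str) -> bool:
    bucket = buckets.setdefault(client_key, TokenBucket(rate=5, capacity=10))
    return bucket.allow()
```

A client that stays under its rate never notices the limiter; a runaway client is rejected at this tier, before its traffic reaches the application servers.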

Bad requests can kill API performance too. XML threats, which we discussed in the last episode, are one example of a way that a bad request from a client can cause performance problems or even a crash on the server side. It's a lot easier to maintain scalability if you can stop these kinds of problems before they can hurt your servers.
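Here is a small sketch of the kind of check that can run in front of the servers: reject XML bodies that are too large, too deeply nested, or not well-formed.  The `check_xml` function and its default limits are assumptions for illustration, and this covers only a few kinds of bad request, not every XML threat.

```python
import io
import xml.etree.ElementTree as ET


def check_xml(body: bytes, max_bytes: int = 64 * 1024, max_depth: int = 20) -> bool:
    """Return True only if the body is small, shallow, well-formed XML."""
    if len(body) > max_bytes:
        return False
    depth = 0
    try:
        # iterparse streams the document, so nesting depth is checked as we go.
        for event, _elem in ET.iterparse(io.BytesIO(body), events=("start", "end")):
            if event == "start":
                depth += 1
                if depth > max_depth:
                    return False
            else:
                depth -= 1
    except ET.ParseError:
        return False  # malformed or empty document
    return True


print(check_xml(b"<order><item>1</item></order>"))
print(check_xml(b"<order><item>1</order>"))
```

The first call accepts a small, well-formed request; the second rejects a request with a mismatched tag, so the application server never has to parse it.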

Server Processing Offloading

Finally, consider the things that you can offload from your application server tier. The more you can offload to more efficient platforms, the less load your application servers have to handle. Plus, the more things you can offload, the simpler those application servers and their applications become, which means they're easier to manage and easier to scale.

For example:

SSL. Load balancers and ADCs like F5 and NetScaler products, not to mention web service proxies like Sonoa ServiceNet, can process SSL more efficiently than most application servers.

HTTP Connections. Those same products are highly optimized to handle tens of thousands of simultaneous connections from HTTP clients, and operate a smaller pool of connections to the back-end application servers. Offloading HTTP connection handling to another tier can free up a lot of server resources.

Authentication. If you perform authentication, a proxy like Sonoa ServiceNet can handle all your authentication for you, freeing your application servers to worry only about properly-authenticated requests. And if you're using SOAP, a product like ServiceNet can process many of the SOAP headers, such as WS-Security headers for authentication, then remove them so that the application server doesn't even need to see them.

Validation. If your API depends on XML input, it may run more efficiently if it only accepts valid XML requests. Turning on XML schema validation can hurt performance of most application servers - products like ServiceNet can do it more efficiently.

So to finish up, key questions to ask for your API scalability roadmap might include:

  • What kind of volume are you expecting?
  • Are you prepared if you get 10, 100, or 10,000 times that amount of volume with little warning?
  • Do you have a way to shut a user off if they consume too much volume?
  • Do you have a way to control API traffic in case you are unable to handle the volume? (See Traffic Management.)
  • Are your back end servers capable of handling tens of thousands of concurrent connections?
  • Are your back end services cacheable? Do you have a cache that you can use to reduce response times?
  • Are you monitoring response times and tracking them to gauge customer satisfaction?
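On the caching question in particular, here is a minimal sketch of a time-to-live cache placed in front of a back-end call.  The `TTLCache` class, the 30-second lifetime, and the `/flights` key are hypothetical; a real deployment would also need eviction and invalidation rules that suit the service.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache responses for a fixed time-to-live so repeat requests skip the back end."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling fetch() only on a miss or expiry."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh
        value = fetch()
        self._entries[key] = (now + self.ttl, value)
        return value


# Hypothetical usage: cache a slow back-end lookup for 30 seconds.
cache = TTLCache(ttl_seconds=30)
body = cache.get_or_fetch("/flights?from=SFO", lambda: "response from back end")
```

Every request served from the cache is one fewer connection and one fewer transaction that the back-end servers have to handle.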

(next time:  API user management and onboarding)