Thoughts on API Best Practices API Management and Infrastructure Blog

Darwin’s Finches, 20th Century Business, and APIs: Evolve Your Business Model

What do APIs have in common with Darwin's observations on evolution, the 20th century garment district, and the Kobayashi Maru?

Sam Ramji makes the case for APIs in his much-written-about Web 2.0 talk. Watch and listen to the full talk or just flip through the slides; both are below.

 

Don’t try to market to developers.  Instead, solve their problems.

Not long ago you could count the number of 'developer marketing' programs on one hand.  Now there are hundreds of programs as Web companies and enterprises open APIs.   These companies know that developer adoption will make their API strategy succeed or fail.

But Developer Marketing is an oxymoron.  Developers hate marketing.   

You cannot drive adoption by 'marketing to developers.'  Sure, you can send offers to your developers but your mileage may vary.

A better formula - understand what's important to developers and give them what they need to reach these goals. Developers want to:

  •  build new skills that lead to the best projects and jobs.  This is why new or proprietary tools and programming models are tough to get off the ground - it's a small market of new projects for the developer.
  •  increase their productivity, with good tools and by being connected to decent resources and to each other for help.  This is why sites like StackOverflow take off.
  •  be recognized for good work and see their products used.  Focus on showcasing their work, not your product.  It's not about you.
  •  get paid.  Think App Store model, or affiliate marketing networks.

Talk to the folks that made the big developer networks successful and you'll hear these points over and over. Some others:

  • Developers are not buyers, but are very strong influencers.  There are superstars in the developer world - make them fans and that is the best marketing you'll ever get.
  • You can't 'own' or 'use' developers because they have an account on your service.  Developers have lots of options, and the cost of switching away from your API might be low.
  • Act on their feedback.  Developers are smart, and listening to and acting on their complaints and ideas is critical to your credibility.
  • Developer communities are fragmented.  For example, there is no such thing as an 'API developer'; instead there are Twitter or Facebook or Salesforce developers.

Once you have attracted a developer to use your service - they are like gold.   So treat them with respect - don't try to 'use' developers or you might lose them!

Building Apigee for Multiple Clouds:  10 Cloud Portability Lessons Learned

This is a repost of a piece I prepared for Shlomo Swidler's panel "Writing Code for Many Clouds" at CloudConnect 2010 and also posted on my own blog earlier this year.   It's a long post, so later I created this short screenr of a cliffs notes version below.

At Sonoa, we have an enterprise product which we turned into a service called Apigee.  From the first, we needed to move beyond just being packaged as a VM and “deployable anywhere” to really living in the cloud.

This is what we’ve learned so far – some of which we anticipated and some of which we reacted to.  Build, deploy, and manage – of the three basic parts of running a service only deploy and manage really change.  The big difference is in operationalization of the system.

Most recently we realized that we needed to be HA across providers and get total control of our latency, so we are building a new datacenter on Rackspace as well.  This is a work in progress so I’ll be reporting from the front lines.

Finally, we’ve helped implement a multi-cloud architecture for ING which has taught us something about where multi-cloud services may be headed.

The first cloud: EC2

1.    Build:

The first and biggest step for any system will be building it as a VM if you haven’t done so before.  Once you have done this, you can practically drop it onto any box.  You’ve become independent of the hardware and other aspects of the operating environment. 

Beyond build, you have to focus on: setting up the network topology, configuring the virtual boxes once they’re up, and managing the result.

2.    Deploy:

The next phase is figuring out how you bring up instances in your cloud platform.  EC2 has its own interfaces for this, and Rackspace has different ones.  RightScale normalizes these interfaces and provides a UI.  There’s an open source package with no UI, libcloud, that we evaluated but aren’t using.

Now that you’re hardware independent, you can run as many instances of your service’s components as you can afford, and you need a way to configure them once they are up.  The main solutions here are Chef and Puppet, both open source.  We use Capistrano for scripting automation.

Then you need to configure the topology of the different subsystems you’ve built.  Here things get interesting.  EC2 does not support multicasting across your default virtual network; this was tough for us and would be for anyone relying on clustering.  VPN-Cubed from CohesiveFT let us build a private network within our EC2 environment and let us do the multicasting we needed.

Once your network is up and you can push software, it’s just the same as having your own private datacenter.  You can connect from anywhere, manage instances, and get alerts and reports.

3.    Manage:

That brings us to management.  We use Nagios for monitoring our virtual boxes.  We learned that we needed to have a separate machine outside of EC2 as a “monitor monitor” – a Nagios instance that monitored the health and responsiveness of the Nagios box in the cloud environment.  We use RightScale for managing all of our accounts and instance creation.  With this setup we’ve had zero downtime since our launch in late August of last year.
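As a rough illustration of the “monitor monitor” idea, here is a minimal Python sketch of a staleness check that an outside machine could run against heartbeats reported by each monitoring box. This is not Nagios configuration or a Nagios API; the `Heartbeat` type, the `stale_monitors` function, the monitor names, and the 120-second threshold are all hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class Heartbeat:
    """Last time a monitoring box reported in (seconds since the epoch)."""
    name: str
    last_seen: float


def stale_monitors(heartbeats, now, max_age_seconds=120.0):
    """Return the names of monitors whose last heartbeat is older than the threshold."""
    return [hb.name for hb in heartbeats if now - hb.last_seen > max_age_seconds]


if __name__ == "__main__":
    now = time.time()
    # Hypothetical monitors: one reported 30 seconds ago, one 10 minutes ago.
    beats = [
        Heartbeat("nagios-in-cloud", now - 30),
        Heartbeat("nagios-second-cloud", now - 600),
    ]
    print(stale_monitors(beats, now))
```

The point of running this outside the cloud is that a silent monitor looks exactly like a healthy system unless something independent notices the silence.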

We realized at the outset that we wanted to build a service that would be portable, so we chose not to use the least portable features of AWS, such as S3.  While it would have made our life simpler for some of the assets we were managing, there was no counterpart in other clouds (and Walrus, the Eucalyptus storage subsystem that mimics S3, does not count as one, even though it really works).  We did use EBS (Elastic Block Store), which is so close to a SAN that we felt it was reasonably standard; and forcing our hand was the fact that we needed to solve for persistence and performance.

But the evils of cloud computing were present as well as the good.  EC2 does not guarantee the availability of an instance, but the availability of a zone.  As a result we found that the latency of our service had a high degree of jitter (between 5 and 15ms), which was acceptable but not ideal.  The lack of control in this environment means that we’ve been buying instances ahead of our need in order to guarantee not just availability but performance.  This is one of the headaches that cloud computing is supposed to transcend.

In a nutshell – “it’s elastic but you have to manage it.”

So in order to manage the network performance issues (achieve constant performance AND availability) we realized that we needed to go multi-cloud.  We also realized that our core service principle – we’re a cloud service gateway and active proxy for people's API traffic – meant that we had to have a “strongest link” architecture so that no set of failures at a single cloud provider could take down our service.

We’re now building on what we’d anticipated and developing a new instance of our service at Rackspace.  The big changes here are the level of control we have out of the gate for network topology, process isolation, CPU performance… and price, which is higher. 

The second cloud: Rackspace

1.    Build

Architecturally the big differences are database replication and cross-provider load balancing.  This places really specific requirements on your networking design and technology as well as your database design. 

One of the things our service does is store all of our customers’ cloud API traffic for their later use in analytics.  Thinking about data modularly helps with replication.  In a replicated world we need to break out types of datasets – such as customer information and service configuration – into smaller chunks that can meet higher-speed replication requirements cost-effectively, and break them away from all the historical traffic data.  Even the traffic data needs to be handled differently in this world. 

We are now sharding the database into circular tables, where the incoming data is always written to a write-only area, and revolves to the next area every five minutes.  In our user base a 5-minute delay on analytics is more than acceptable (compare this with the SLA for Google Analytics), and the working set of data used for traffic management is handled separately in realtime.  All of this means that we can have either a hot standby or live-live dual-cloud configuration without breaking our customer promise that they can tweak their service at any time at all, and that their analytics are consistently available.  This will also let us evolve both sides of the service as it grows.
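The rotation scheme can be sketched in a few lines of Python. This is only an illustration of the idea, not our actual schema: the number of areas (`NUM_AREAS`) and the function names are assumptions, and a real system would map each index to a table or partition.

```python
ROTATION_SECONDS = 5 * 60  # the five-minute rotation described above
NUM_AREAS = 12             # assumption: twelve areas, so an area is reused after one hour


def write_area(timestamp, num_areas=NUM_AREAS, rotation=ROTATION_SECONDS):
    """Index of the area that receives incoming writes at `timestamp` (epoch seconds)."""
    return (int(timestamp) // rotation) % num_areas


def readable_areas(timestamp, num_areas=NUM_AREAS, rotation=ROTATION_SECONDS):
    """Areas that are not being written to right now, and so are safe for analytics or replication."""
    current = write_area(timestamp, num_areas, rotation)
    return [i for i in range(num_areas) if i != current]
```

Because the writer only ever touches one area at a time, the other areas can be replicated or queried without contending with incoming traffic, at the cost of analytics lagging by up to one rotation.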

2.    Deploy

Deployment tooling stays the same – our old friends RightScale and Capistrano are used to spin and configure instances.

On the networking side, obviously you need to connect your clouds securely in order to replicate between them as well as to exchange performance data which can be used for load balancing.  We found again that VPN-Cubed helps us establish a trusted connection between our heterogeneous cloud environments. 

3.    Manage

Since we are using standard monitoring and management tools – Nagios, RightScale, and Capistrano – these all work in both environments, and our approach of using a “monitor monitor” doesn’t change, although now we need to monitor monitors in each cloud.

Is there an easier way?

For an infrastructure play like Apigee, we don’t think so.  Given our customer promise of near-zero and predictable latency we need as much control as possible.  For an application-level service play though, we think some parts can be easier.  We’re built on Sonoa technology that manages all of our cloud API traffic processing, as is ING, a financial service company that’s moving to cloud. Their challenge is elasticity in financial modeling – specifically the Monte Carlo simulation workload which is compute-intensive and highly intermittent in use of resource.  When you’re running the simulation, you need all the compute resource you can get.  When you’re not running a simulation, you need almost none.

Cloud infrastructure like EC2 and Rackspace take care of the racking & stacking problem associated with scaling up for Monte Carlo.  You still need to manage that with a  tool like RightScale or libcloud plus your configuration and deployment tool of choice.  But at the higher level where you’re load-balancing between clouds you don’t necessarily need a VPN, as there’s no data replication requirement.  At this layer they’ve implemented a secure API which is called by internal clients, and then this API request is load-balanced by Sonoa’s API gateway.  The gateway then calls the right cloud based on policies set by the monitoring and scheduling software.  So in this situation you are monitoring your cloud instances and letting the API gateway handle the dirty work of dispatching and securing the calls.
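To make the dispatch step concrete, here is a small Python sketch of a policy that picks a cloud for each incoming API request based on what monitoring reports. It is not Sonoa's gateway logic; the `CloudStatus` fields, the `choose_cloud` function, and the 0.9 load ceiling are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CloudStatus:
    """Snapshot of one cloud as reported by the monitoring software."""
    name: str
    healthy: bool
    load: float  # 0.0 (idle) to 1.0 (saturated)


def choose_cloud(statuses, max_load=0.9):
    """Pick the least-loaded healthy cloud, or None if no cloud qualifies."""
    candidates = [s for s in statuses if s.healthy and s.load < max_load]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.load).name
```

Since the policy only needs health and load figures, the clouds never have to talk to each other, which is why no VPN is required at this layer.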

10 Lessons Learned From Building to Multiple Clouds:

1.    Get everyone comfortable with virtualization fundamentals, from developers to admins.

2.    Limit your dependency on provider-specific APIs by using 3rd party tools that manage this for you.

3.    There may be SLAs on your cloud instances but there are no SLAs on the APIs your cloud providers give you.

4.    Refuse to use services that have no counterpart in other clouds.  It will cost you more in rearchitecture than you gain by using them.

5.    Understand the cost trade-offs for your business of the different clouds’ strengths – especially in the dimension of availability, price, and performance.

6.    Anticipate your needs for data replication and design your databases accordingly.

7.    Pay attention to your networking requirements and network topology.

8.    Consider the granularity of the requests that you need to load balance – is it at the service or API layer or is it finer-grained than that?

9.    You’ll still buy more than you need but the waste ratio is much less in the cloud.

10.    Monitor the monitors!

Right product at the right time:  API Product Management

Recently we were asked by a SaaS company exec "can't we just hire someone to come in here and build our API for us?"

Danger, Will Robinson.  Just like any other product in your stable, your API needs to go through your product management practice.  Successful APIs usually have a dedicated API product manager who creates the 'right product at the right time' by continually helping the team stay on target, driving:

1. What is the vision for the API?      How do you go from an idea to a great product?  Start by asking: what is your vision?  If you were sitting around with your top 5 execs, would they agree?   One good PM framework we've seen really focus an effort is explicitly defining the "VMSO" or "Vision, Mission, Strategy, Objectives" before every major release.  For example:

  • Vision - what is "the dream" (example: be the most widely used widget catalog on the planet)
  • Mission - what do we do every day to achieve the dream? (example: have the easiest catalog API to learn and use)
  • Strategy - what is our unique approach for achieving our mission? (example: have the smoothest sign up, clearest REST API and best community support)
  • Objectives - what are our 1-3 key API metrics to determine if the strategy is working? (example: developer apps, API transactions, API revenue)

2. What is the target customer segment for the API?  Mobile developers?  Your top 10 partners? Affiliate marketers? Each segment may need different features, policies, or marketing approaches. Do a customer or developer segmentation analysis and force rank priority segments.

And if you ask your API team who their target segment is and the answer is 'everybody' - get worried.

3. Develop use cases.  Ask how little, not how much, you can launch with your API.  Taking back functionality is difficult once it's out there.  Identify and prioritize the minimum set of use cases (or user scenarios - such as 'browse catalog information')  and consider throwing out anything outside what's needed for each use case.

4.  Iterate quickly.  It's rare to find a successful API program where the PM doesn't say something like "and after we launched our customers took us in a completely different direction." Consider agile development techniques to help your team iterate quickly. 

5.  Differentiate your API. How is your API or content different than competing APIs in your vertical?  Why should I drop what I'm doing now and use your API?  Using a well-worn PM 'positioning framework' can help the team agree on this beforehand. For example:

For the (target customer)
(ex: mobile developers)

Who needs (primary pain point or need)
(ex: a free and complete widget catalog for commerce apps)

Our solution (our API is...)
(ex: the most comprehensive, open widget catalog that is incredibly easy to use)

That (key benefit)
(ex: provides comprehensive and accurate widget product data for 3rd party apps)

Unlike (the competition)
(ex: 'for pay' catalog APIs or catalogs with low rate limits; inaccurate, incomplete catalogs; or APIs that are hard to use)

Solution is (greatest differentiation)
(ex: free and easy to get started with, with amazing community support)

What other PM processes do you recommend?

(and thanks to respres for the great photo.)

Moving the needle: Example API metrics

It's an old cliche, but it's been said that you can't move the needle if you can't *see* the needle.  So frequently we're asked "what are good metrics to measure an API program?"

While individual metrics are important, it might be as much about the 'process around metrics', or how metrics are evangelized and used to drive specific parts of the API product development pipeline.  Specifically:

1. Get early buy-in on the 'top 3'  -  strong API product managers often focus in on 1-3 top level 'strategic' metrics and get early, wide agreement from all parts of the extended team - the sponsoring exec, PM, engineering, BD, and operations. If different stakeholders are measuring success with different metrics (say number of developer sign-ups vs. API traffic vs. revenue) this can pull resources in different directions.  

2. Track against realistic projections.   Set expectations early by modeling anticipated results and then track actuals against this estimate. For example, pick a 'comparable' or competitor's API to guess developer portal traffic, then model the expected developer sign-ups and conversions  (for example, 10% of visitors might ask for a key, 20% of them might build an app, 10% of those apps might drive ongoing traffic, each of those apps might drive a certain volume of traffic, and so on...)

3. Publish a weekly dashboard, religiously.  Proactively call out how product updates and community activities do or don't move the needle so you can quickly adjust tactics and think of new ideas that might move the needle.

4. Create a metrics 'pipeline' -  How do different metrics diagnose how each stage of the customer conversion process is working?  For example, developer portal traffic might be a good metric to measure the marketing guys (that is, they might be responsible for getting developers *to* the portal).  But whether or not a developer converts to ask for a key and then converts into an active API user might be a measure of how effectively the PM process is working to create a product that developers want to use.  User-experienced bugs can measure development and product QA effectiveness, and so on.
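The projection in item 2 is simple enough to model in a few lines. Here is a Python sketch using the example conversion rates above (10%, 20%, 10%); the 10,000 monthly portal visitors and the stage names are made-up inputs, and `project_funnel` and `variance_from_plan` are hypothetical helper names.

```python
def project_funnel(visitors, stages):
    """Apply each (name, rate) stage in order and return the expected count at every stage."""
    results = []
    count = float(visitors)
    for name, rate in stages:
        count *= rate
        results.append((name, count))
    return results


def variance_from_plan(projected, actual):
    """Relative difference between actual and projected; positive means ahead of plan."""
    return (actual - projected) / projected


if __name__ == "__main__":
    stages = [
        ("keys requested", 0.10),  # 10% of visitors ask for a key
        ("apps built", 0.20),      # 20% of those build an app
        ("active apps", 0.10),     # 10% of those apps drive ongoing traffic
    ]
    for name, expected in project_funnel(10_000, stages):
        print(f"{name}: {expected:.0f}")
```

With these inputs the model expects about 1,000 keys, 200 apps, and 20 active apps. Tracking `variance_from_plan` for each stage in the weekly dashboard shows which part of the pipeline is off plan, not just that the total is.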

Here is an example of a metrics pipeline that we recently discussed with a customer.

 

Category: Example Metrics

Awareness (measure of marketing effectiveness)
  - Developer portal traffic: unique users, page views, and engagement (PVs/UU)
  - Top traffic sources (search, direct, referrals)

Signups (measure of portal messaging effectiveness)
  - Registrations (developer keys issued)

Adoption (measure of product fit)
  - Active developers, partners
  - Applications (number, by app type, geo, partner 'tier')
  - App end users (such as mobile app users)
  - Traffic: volume and % API vs. non-API
  - Developer retention (active developers lost)

Quality (measure of dev process)
  - User experienced problems (errors returned)
  - Bugs reported
  - Critical situations (P1 bugs or blocking bugs)

Community (measure of customer sat)
  - Community members
  - Community forum activity and engagement
  - Number of very active members
  - Net promoter score

Financial (measure of business model fit)
  - Revenue
  - Cost of data served (if licensed)
  - Profit and margin
  - Market share

(Thanks to seenoevil for the photo)

Cloud Security series - issues around PII, privacy, and audit compliance

Greg recently sat down with Ryan Bagnulo, Security Architect for ASPECT-i, to discuss a number of cloud security concerns and issues.   

We captured these discussions in six short videos, each focusing on a single topic.  Here are the first two, on PII, data filtering, and audit and regulatory concerns (see the full series here).

In this first video, Greg and Ryan set things up with discussions on:

  • Challenges in deploying cloud, starting with: should you trust your cloud administrator?
  • Good data for early cloud adoption (such as public data like news, stocks)

This 2nd short focuses on:

  • issues around PII (personally identifiable information)
  • counter-measures, such as de-identifying data with filtering, screening or access control
  • privacy and regulatory risks around data stored in the cloud
  • best practices for protecting data
  • implications of security breaches and of violating privacy regulations


We'd love your thoughts and comments.

OAuth — Take care with those keys!

Greg Brail's photo

A lot has been happening with OAuth recently. Earlier this year a security hole was discovered in the protocol which exposed it to potential “social engineering” attacks.  However, the OAuth community is working on a revision to the spec that will eliminate this particular hole.

Last week we wrote a bit on OAuth as an option for API security.  But today I wanted to bring up a related OAuth issue - how do you securely manage all those keys?

With traditional username / password authentication, good security practices require you don't just have a big database on the back end with a list of unencrypted passwords. Instead, a hash of the password is stored, preferably using a salt. So someone who can read the password file can verify they have the right password, but cannot see the actual password.

It is still critical to protect access to these hashed passwords. Otherwise, an attacker can mount a dictionary attack to try and crack them. However, even if someone gains access to your entire database of hashed passwords, they can still only easily gain access to lousy passwords. At least users who choose secure passwords are relatively safe. (It is also critical to protect access to the cleartext password, but at least this mechanism doesn’t require that it be stored in a database for all to see.)
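For readers who want to see the salted-hash approach spelled out, here is a minimal Python sketch using only the standard library (`hashlib.pbkdf2_hmac`, `os.urandom`, `hmac.compare_digest`). The iteration count, the salt length, and the function names are assumptions chosen for illustration, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumption: tune to your own hardware and policy


def hash_password(password, salt=None):
    """Return (salt, digest); only these two values are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest from the supplied password and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

Note that verification only needs the salt and the digest. Nothing stored on the server can be turned back into the password without guessing it.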

As networking and middleware people, we spend a lot of time thinking about the security of our network protocols, and especially ensuring that someone eavesdropping on a network cannot grab our passwords and other sensitive data as they fly by. But how many times have we heard of a security breach caused by a stolen laptop? I would argue that protecting so-called “data at rest” is just as important as, or maybe even more important than, protecting the data flying around your network.

Now, back to OAuth. Each “user” in OAuth holds something called an “access token,” which is like a username, and a “token secret,” which is like a password. When a request is sent over the network containing an OAuth authentication token, a bunch of data in the request is signed using the token secret, but the secret itself is never sent over the network. That way, regardless of whether SSL is in use, there is no way to gain access to the token secret by sniffing the network.

However, on the server side, in order to validate the OAuth token, the server must make the same calculation that the client made when it signed the data. That means that both the client side and the server side in OAuth must be able to read the unencrypted token secret from some sort of database. Without it, OAuth doesn’t work. There’s no set of standard ways for storing those keys like there are for passwords, so presumably different implementations are storing them in different ways.
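A simplified Python sketch shows why the server cannot get away with storing only a hash of the secret. It uses an HMAC keyed with the token secret, which is the general shape of the calculation; it deliberately skips the real OAuth signature base string, nonce, and timestamp handling, and the `sign` and `server_verify` names and the in-memory `secret_store` dictionary are hypothetical.

```python
import hashlib
import hmac


def sign(message, token_secret):
    """Client side: compute a keyed hash of the request data using the token secret."""
    return hmac.new(token_secret.encode("utf-8"), message.encode("utf-8"), hashlib.sha1).hexdigest()


def server_verify(message, signature, access_token, secret_store):
    """Server side: look up the cleartext secret and repeat the same calculation."""
    secret = secret_store.get(access_token)
    if secret is None:
        return False
    expected = sign(message, secret)
    return hmac.compare_digest(expected, signature)
```

The lookup in `secret_store` is the crux: unlike password verification, the server needs the secret itself as the key, so whoever can read that store can sign requests for every account in it.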

As a result, any client and any server that uses OAuth has to take extra-special care with all those token secrets. Otherwise, anyone who gets access to the database of tokens and secrets used by the back end servers immediately has access to all the OAuth-enabled accounts.

I am not suggesting a change to the OAuth protocol here — it solves an important problem. However, I am suggesting that anyone who implements either the “service provider” or “consumer” side of OAuth take very special care of those tokens!

For instance:

  •     If they’re on a regular disk file, protect them using filesystem permissions, make sure that they’re encrypted, and hide the password well.
  •     If they’re in a database, encrypt the fields, store the key well, and protect access to the database itself carefully.
  •     If they’re in LDAP, do the same.

Come to think of it, perhaps the world needs a standard LDAP schema for storing OAuth secrets in a secure way. Anyone care to make a proposal?

Tech Talk: API Visibility and Metrics

Earlier this week, Greg speculated that Twitter might have benefited from digging deeper into API metrics and usage patterns, so we thought it would be a good time to put him on the spot with a tech talk he recorded on API visibility a couple weeks ago. 

For more, here are some sample API metrics considerations and a demo of our own API Analytics solution.

So you want to open an API?

Greg Brail, our CTO, took advantage of a break from O'Reilly Velocity and a new Flip HD ultra to record a series of four short whiteboard talks on issues many face when opening an API - visibility, security, traffic management, and more.  Here is the first clip, and you can preview all of these and more API case studies on our Sonoa YouTube channel.

We love feedback and these are quick to do, so if there are any more topics you'd like to see, please let us know.

Challenges when building APIs

Greg Brail's photo

If you’re planning to build an API and expose it to the Internet then you’re going to have to face some challenges that you won’t necessarily find when building an internal web service. For instance:

Design. The best APIs are the simplest, but designing a simple API isn’t easy. Plus, what’s simple to one user population isn’t simple to another. A “REST-style” API like Twitter’s is great for AIR programmers or Perl hackers but someone accessing it from inside a big web app server stack might actually find it easier to use a SOAP web service with a WSDL. On the other hand, a SOAP-only API would have been death for Twitter because it would have meant that those tens of thousands of Perl hackers would have had a heck of a time using it in the first place.

Compatibility. Let’s say you don’t get the design right the first time (and how often does that happen?). How many “old” versions of your API can you afford to keep running to keep clients functioning? Are you willing to tell your users, “sorry, we changed the API and now you have to re-write your apps”?

Authentication and Authorization. What does your API do? If it just lets you look up public information, maybe you don’t need authentication. But are you planning on using it with more sensitive data? Will people be using your API to spend money? They’re going to expect that they have to authenticate using a username and password at the very least. There are quite a few ways to do that — which one(s) will you choose? How will you manage all those accounts?

Threat protection. Is there a possibility that a malformed API request can cause your servers to go off in la-la-land, trying to execute an impossible query? Did you code everything right to prevent a SQL injection attack? What if a client sends your servers some bizarre XML: will they run out of memory or crash?
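On the SQL injection question, the usual defense is to pass request values as query parameters rather than pasting them into the SQL text. Here is a small Python sketch using the standard `sqlite3` module; the `widgets` table and the function names are made up for the example.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with a hypothetical widgets table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE widgets (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO widgets (name) VALUES (?)", [("sprocket",), ("gear",)])
    return conn


def find_widgets(conn, name):
    """Look up widgets by name; the ? placeholder keeps `name` as data, never as SQL."""
    cur = conn.execute("SELECT id, name FROM widgets WHERE name = ?", (name,))
    return cur.fetchall()
```

With the placeholder, a hostile value such as `gear' OR '1'='1` is simply compared against the name column and matches nothing, instead of rewriting the query.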

Latency. Since the goal of your API is to provide a service over the Internet, then you will have to live with anywhere up to several hundred milliseconds of latency just to get to and from your API. If each API request takes hundreds more milliseconds, or even several seconds, to run, then how will that affect the perception of your service?

Visibility. Who is using your API? How often? How do the patterns change over time? What kind of latency are they seeing? How many errors do they get? Do different users see a higher error rate? Is the user you signed up last week actually using the API? These are all questions you will want to answer in order to serve your customers better.
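Several of these questions can be answered from a plain request log. As a sketch, assuming each log record is a `(user, latency_ms, status_code)` tuple (a made-up format) and treating any status of 400 or above as an error, per-user call counts, error rates, and average latency fall out of one pass in Python:

```python
from collections import defaultdict


def summarize(requests):
    """Aggregate (user, latency_ms, status_code) records into per-user statistics."""
    totals = defaultdict(lambda: {"calls": 0, "errors": 0, "latency_ms": 0.0})
    for user, latency_ms, status in requests:
        t = totals[user]
        t["calls"] += 1
        t["latency_ms"] += latency_ms
        if status >= 400:  # assumption: 4xx and 5xx both count as errors
            t["errors"] += 1
    return {
        user: {
            "calls": t["calls"],
            "error_rate": t["errors"] / t["calls"],
            "avg_latency_ms": t["latency_ms"] / t["calls"],
        }
        for user, t in totals.items()
    }
```

A user who signed up last week but never appears in the output is itself an answer to one of the questions above.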

Rate limiting. How do you plan to limit user access to your API? Sometimes the right answer is to do nothing — and this is often the right answer for an internal system, where saying “no” is not an option. But for a public, Internet-based API, you owe it to yourself to at least protect your API against disaster — a user who decides today’s a great day to see if they can call your API 100 or 1000 times per second, or one that makes a programming mistake and codes up an infinite loop, or worse. And if you’re planning on a larger user population, then a formal set of quotas makes a lot of sense, which is why Twitter, Yahoo, Google, Amazon, and others all put limits on how much you’re allowed to use their APIs before you give them a call and let them know what you’re up to.
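One common way to implement this kind of protection is a token bucket per user. The Python sketch below is a generic illustration, not how any of the providers named above enforce their limits; the class name and parameters are assumptions, and the caller passes in the current time so the logic is easy to test.

```python
class TokenBucket:
    """Allow `rate_per_second` calls on average, with bursts of up to `burst` calls."""

    def __init__(self, rate_per_second, burst, now=0.0):
        self.rate = float(rate_per_second)
        self.capacity = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.updated = now

    def allow(self, now):
        """Return True and spend a token if the call is within limits, else False."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A client stuck in an infinite loop drains its bucket almost immediately and is then held to the refill rate, while well-behaved clients never notice the limit.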

Next, I hope to dive into what we're seeing for each of these and more in detail -Greg