API Best Practices Blog

Tradeoffs in XML data transformations »

Daniel Jacobson of NPR posted a fascinating piece about how NPR tackles a common problem – what’s the best way to render content on a variety of devices, from modern web browsers with top-notch CSS implementations that look almost like typesetting (like Safari) to mobile phones using WAP to low-end devices like HD Radio receivers that don’t understand anything but plain ASCII text.

NPR’s clever solution is to strip markup out of the text and store it in a database table, indexed by position in the text document. To re-generate the content for a particular device, their software queries the database and re-applies the markup tags to the content according to what device it is rendering to.
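
The idea can be sketched roughly as follows. This is a minimal illustration, not NPR's actual schema or code: the names `TEXT`, `SPANS`, `DEVICE_TAGS`, and `render` are hypothetical, an in-memory list stands in for the database table, and the spans are assumed not to overlap.

```python
# Plain text with all markup stripped out.
TEXT = "This is a headline. Something in italics."

# Hypothetical stand-in for the markup table: (start, end, kind),
# where start and end are character offsets into TEXT.
SPANS = [
    (0, 18, "headline"),
    (33, 40, "emphasis"),
]

# Hypothetical per-device tag sets. A device with no entry for a kind
# of markup gets no tags at all, i.e. plain text.
DEVICE_TAGS = {
    "web": {"headline": ("<h1>", "</h1>"), "emphasis": ("<i>", "</i>")},
    "plain": {},
}


def render(text, spans, device):
    """Re-apply the stored markup to the text for one device."""
    tags = DEVICE_TAGS[device]
    inserts = []
    for start, end, kind in spans:
        open_tag, close_tag = tags.get(kind, ("", ""))
        # At the same position, closing tags (0) sort before opening tags (1).
        inserts.append((start, 1, open_tag))
        inserts.append((end, 0, close_tag))

    pieces, last = [], 0
    for position, _, tag in sorted(inserts):
        pieces.append(text[last:position])
        pieces.append(tag)
        last = position
    pieces.append(text[last:])
    return "".join(pieces)
```

Calling `render(TEXT, SPANS, "web")` wraps the headline and the italic word in HTML tags, while `render(TEXT, SPANS, "plain")` returns the text untouched, which is what an ASCII-only device would need.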

This takes me back to the original reason SGML was invented and made an ISO standard in 1986. The idea was to describe the semantic meaning of text, and then to let a computer program figure out how to render it for human consumption.

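SGML was a little over-engineered for that purpose, however, so a bunch of smart people got together in 1996 and started designing XML, a simpler subset of SGML that became a W3C Recommendation in 1998. XML in turn gave us technologies like XHTML and XSLT, and it sits comfortably alongside HTML (itself originally an SGML application) and CSS.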

So today, instead of writing something like:

<h1 class="headline">This is a headline</h1><p class="byline"><b>By I.M.A. Reporter</b></p><p class="paragraph">And here is my first paragraph with something in <i>italics</i>.</p>

XML lets us write:

<main_headline>This is a headline</main_headline><byline>By I.M.A. Reporter</byline><p>And here is my first paragraph with something in <i>italics</i>.</p>

The difference is that my second example isn’t HTML – it’s part of a document that uses an XML schema that’s up to me, and when writing it I don’t care if I’m coding for an HTML browser or for a car radio – I just have to identify when I’m writing a headline, or a byline, or a caption, and so on. I can now use XSLT or another transformation technology to transform this XML into very simple HTML for a simple browser, or into very complex HTML with links to a CSS stylesheet for a more sophisticated browser, or just into plain text. And if I decide that part of my XML schema should look just like HTML (like I did above with the “p” and “i” tags) then that’s fine too.
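
Here is a rough sketch of that kind of transformation. Python's standard library has no XSLT processor, so this uses `xml.etree.ElementTree` as a stand-in for XSLT. The `<article>` wrapper element, the `HTML_MAP` table, and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The semantic XML from above, wrapped in a hypothetical <article> root
# so that it parses as a single document.
SOURCE = (
    "<article>"
    "<main_headline>This is a headline</main_headline>"
    "<byline>By I.M.A. Reporter</byline>"
    "<p>And here is my first paragraph with something in <i>italics</i>.</p>"
    "</article>"
)

# Hypothetical mapping: semantic element -> (HTML tag, CSS class).
HTML_MAP = {
    "main_headline": ("h1", "headline"),
    "byline": ("p", "byline"),
    "p": ("p", "paragraph"),
}


def to_html(xml_text):
    """Render each top-level semantic element as styled HTML."""
    root = ET.fromstring(xml_text)
    parts = []
    for child in root:
        tag, css_class = HTML_MAP[child.tag]
        child.tag = tag
        child.set("class", css_class)
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


def to_plain_text(xml_text):
    """Render the same document as plain text, one element per line."""
    root = ET.fromstring(xml_text)
    return "\n".join("".join(child.itertext()) for child in root)
```

The same `SOURCE` feeds both functions: `to_html` produces markup for a browser, and `to_plain_text` drops every tag for a text-only device. A real XSLT stylesheet would express the same mapping declaratively.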

Other approaches and tradeoffs

NPR’s approach has a lot of benefits. Depending on your business and situation, though, it might mean a lot of database processing, which could be expensive to scale in either licenses or capacity. Caching helps a lot in this case, since once content has been rendered for a device there’s no need to do it again.
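
A minimal sketch of the caching point, using Python's `functools.lru_cache`. The `expensive_render` function and the `RENDER_CALLS` list are hypothetical stand-ins for the database-backed rendering step, and the cache is keyed on an assumed (content ID, device) pair.

```python
from functools import lru_cache

# Records every real render so the effect of the cache is visible.
RENDER_CALLS = []


def expensive_render(content_id, device):
    """Hypothetical stand-in for querying the database and re-applying markup."""
    RENDER_CALLS.append((content_id, device))
    return f"[{device}] content {content_id}"


@lru_cache(maxsize=1024)
def cached_render(content_id, device):
    """Render once per (content_id, device) pair, then serve from memory."""
    return expensive_render(content_id, device)
```

Repeated requests for the same story on the same device hit the cache, and a new device triggers one more real render. When a story is edited, its cached output has to be discarded; the bluntest way here is `cached_render.cache_clear()`.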

You could also solve this problem by writing the original content in very simple HTML or XML (in whatever schema you like) and then using something like XSLT to transform the content for each target device. This solution might be CPU-intensive, but it might compare favorably with database operations depending on what you are doing. Plus, XSLT processing can be easily scaled across thousands of parallel nodes if necessary without buying any more database licenses.

If development resources and cycles are the constraint, a dedicated policy layer can help. With our Sonoa ServiceNet technology, for example, you could configure transformation policies that leverage XPath or XSLT from within our proxy. This might also make it easier to add and validate 3rd party APIs or feeds from outside your own database. You can also handle other types of mediation, such as versioning or protocol transformations, if that is part of your use case, as some of our Sonoa media and consumer web services customers do.

How is cloud computing related to SOA?  Case study on API Policy and Governance Patterns »

Last week at the Burton Group Catalyst conference, Scott Metzger of TrueCredit.com gave a great case study presentation on how they opened their internal SOA as APIs for partners, and specifically on the different policy and governance patterns involved.

Scott talks about the factors driving them to identify and implement a separate, application-agnostic layer for five major policy patterns: service access, routing, caching, transformations, and operations. (There are more details of their implementation in this video.)

Scott Metzger of TrueCredit Catalyst Presentation

One size doesn’t fit all: API Versioning and Mediation »

(continuing our series on API roadmap considerations)

TrueCredit.com tells a story of calculating that they would need thousands of IP addresses to cover all the different versions and flavors of their open API that partners needed.

Even if you have a ‘one size fits all’ API, you might need to be able to transform data, mediate terms, or customize SLAs without coding each change or creating a new version of the API. Reasons could include:

•    Protocol needs – A SaaS customer with a REST API had an important deal on the table with a bank, but the bank insisted on a SOAP API with WS-Security. Some SOAP shops want to offer a RESTful API because it’s easier for developers to work with. And you might need to transform between different syntaxes of SOAP, REST, REST/JSON, and so on.
•    Monetization – You might want to sell a premium version of your ‘one size fits all’ free API. For example, a search API provider wanted to do a BD deal around its free API and needed to insert extra data (‘enrich the API’, as they called it) that the partner wanted to pay for.
•    Standardization – A customer of ours grew from offering 1 API to 40 and needed to add some standard fields to each API, enforcing some consistency without needing to coordinate a bunch of teams to write code.
•    Versioning – Ever used an API where you get an email every month asking you to upgrade to a new version? TrueCredit wanted to provide API upgrades to the customers that needed them while holding the API fixed for everybody else for longer, to reduce versioning headaches.

So you may need to figure out how to provide and manage different flavors or versions of the same API – or ‘mediate’ (or transform) API content and syntax.

Alternatives might be to support multiple APIs (painful), hold off as long as possible and push back on customers to snap to a ‘one size fits all’ model (more painful), or create a ‘mediation’ capability or layer that can transform between different ‘shapes’ of the API – protocol, data, version, credentials, etc.
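
To make the third option concrete, here is a minimal sketch of a mediation step that reshapes one backend response by version and by wire format. It is an illustration under assumptions, not any product's API: the version labels, the field names in `V2_TO_V1_FIELDS`, and the flat `<response>` XML layout are all hypothetical, and real SOAP mediation would involve envelopes and WS-Security that are not shown.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical renames from the backend's current field names ("v2")
# back to the names an older client ("v1") still expects.
V2_TO_V1_FIELDS = {"customer_name": "name", "created_at": "created"}


def mediate_version(payload, client_version):
    """Return the payload in the shape the client's pinned version expects."""
    if client_version == "v2":
        return dict(payload)
    if client_version == "v1":
        return {V2_TO_V1_FIELDS.get(key, key): value for key, value in payload.items()}
    raise ValueError(f"unsupported version: {client_version}")


def mediate_format(payload, fmt):
    """Serialize the payload as JSON or as a flat XML document."""
    if fmt == "json":
        return json.dumps(payload, sort_keys=True)
    if fmt == "xml":
        root = ET.Element("response")
        for key, value in payload.items():
            ET.SubElement(root, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")


def mediate(payload, client_version, fmt):
    """Apply version mediation first, then format mediation."""
    return mediate_format(mediate_version(payload, client_version), fmt)
```

The backend keeps producing a single shape, and each partner's version and format preferences live in the mediation layer's configuration instead of in a separate copy of the API.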

(And going back to TrueCredit’s story at the top, this is what led them to think about an API gateway for mediation, caching, load balancing, and more.)

So ask if and when any of these issues might apply to your roadmap:

Will you need to mediate protocols?

  • Will you need to offer more than one protocol or a different protocol?  (SOAP for enterprise customers? REST or JSON for developer adoption?)
  • Would you ever need to map across different security or credential schemes?  (e.g., from simple HTTP auth to WS-Security)
  • Will you need to handle syntax issues across a particular protocol (SOAP 1.1 vs. 1.2, etc.)?
  • How important will it be to minimize API versions?

How important is version management?

  • How often will you need to release upgrades to the API?  What is your process for asking customers to upgrade and how long will it take to sunset a version?
  • If you offer more than one API, is there any need to standardize elements of the API (header or payload)?  Do different teams need to do this, or does it make sense to put this capability at one point?

Will there be a need for payload transformations?

  • Will you ever need to enrich an API for a particular customer or class of service?  (such as a big customer that licenses more data)
  • Will you need to remove or clip certain fields for certain customers or classes of service?
  • How fast will you need to turn around these requests for the business vs. your dev or product cycle?
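
The enrich and clip questions above can be sketched as a small policy table applied to each response. Everything here is hypothetical: the tier names, the field names, and the `extras` dictionary that stands in for wherever the licensed extra data comes from.

```python
# Hypothetical per-tier policies: which fields to clip from the base
# payload and which extra fields to add for that class of service.
TIER_POLICIES = {
    "free": {"clip": {"relevance_score"}, "enrich": []},
    "premium": {"clip": set(), "enrich": ["related_terms"]},
}


def apply_tier_policy(payload, tier, extras):
    """Return a copy of the payload shaped for the given class of service."""
    policy = TIER_POLICIES[tier]
    # Clip: drop the fields this tier is not entitled to see.
    shaped = {key: value for key, value in payload.items() if key not in policy["clip"]}
    # Enrich: add licensed fields, if the extra data is available.
    for field in policy["enrich"]:
        if field in extras:
            shaped[field] = extras[field]
    return shaped
```

Because the rules live in a table and not in the API code itself, a new customer request becomes a configuration change, which is the turnaround advantage the last question is getting at.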

A mediation layer (see more flavors here) can be important to handle complexity so you can focus development on business specific API capabilities.

(and thanks to collinanderson for the photo)