
API Best Practices Blog

Why XML won’t die: XML vs. JSON for your API »

Last week I wrote that if your API doesn't support JSON and JSONP, you're doing it wrong. I don't think that's terribly controversial.

But is JSON (and JSONP) perfect for everything you need to support with your API?  Is XML dead?

JSON is especially good at representing programming-language objects. If you have a JavaScript or Java object, or even a C struct, the structure of the object and all its fields can be easily and quickly converted to JSON, sent over a network, and reconstructed on the other end without much difficulty, and it (usually) comes out the same on both ends.
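To make that round trip concrete, here is a minimal sketch in Python. The Traveler class and its fields are made up for illustration; any simple object whose fields are strings, numbers, and lists behaves the same way.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Traveler:
    # Hypothetical object, used only for illustration.
    name: str
    age: int
    frequent_flyer_ids: list


original = Traveler(name="Ada", age=36, frequent_flyer_ids=["AA123", "UA456"])

# Serialize: object -> dict -> JSON text that can be sent over the network.
wire_text = json.dumps(asdict(original))

# Deserialize on the "other end": JSON text -> dict -> object.
received = Traveler(**json.loads(wire_text))

print(received == original)
```

The "usually" matters: types with no JSON equivalent (dates, for example) need a convention that both ends agree on.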

But not everything in the world is a programming-language object. Sometimes to describe a complex real-world object we have to combine different descriptions and languages from different places, mash them up, and use them to describe even more complex things. The descriptions of these complex things need to be validated, they need to be commented on, they need to be shared and sometimes annotated with additional data that doesn't affect the original structure.

When the world gets complicated and open-ended like that, what's needed is not a programming-language-format object, but an open-ended, extensible -- umm -- markup language. That's what we have today with XML.

For instance, the travel industry (through the Open Axis Group), the insurance industry (through ACORD) and the financial services industry (through FpML) have all spent many person-years developing standards that describe what they do in XML format. Each standard comes complete with a schema, which means that any client or server can validate a document to ensure it is correct enough before starting to parse it, and which makes it easier to edit the document using one of the many mature tools that are available.

Sure, parsing and understanding these documents is not simple, but they do not represent simple things. The ability to represent a complex travel itinerary, a life insurance policy, or an interest-rate swap in a standards-based format is a big deal and a triumph of XML technology.

Similarly, look at HTML. (Most HTML is not XML but both come from SGML and are very similar.) HTML works because it can combine structured and unstructured content in various ways and can mash up different standards in one document.

In my opinion, XML will only be dead when the web has replaced HTML with JSON.

So for our APIs, let's embrace JSON -- it's small, simple, and easy to use. But when we have to collaborate on complex documents, pull information from different places, and define complex schemas to represent complex real-world concepts, let's also not forget about good old XML.

Not serving JSON AND JSONP? Then you’re doing it wrong! »

If you've used an API recently, you've probably seen that the popular APIs out there support JSON. JavaScript Object Notation is a standard defined a while back by Douglas Crockford from Yahoo. It uses a subset of the JavaScript syntax to simply and effectively describe an object.

In the last few years, JSON has taken its place alongside XML as the de facto way to describe API data. Today's leading APIs support JSON in addition to XML, and an increasing number support only JSON.

JSON is popular because it's simple. Programming-language objects map to and from JSON in a straightforward way that everyone can understand. You need a "JSON parser" to convert JSON into an object (unless you're working in JavaScript) but you don't need to know much about it other than how to make it go.

If you are thinking of building an API, JSON support is critical. Here's why:

JavaScript. JSON is JavaScript. That is, a "JSON object" is literally a small fragment of JavaScript that represents an object and its sub-objects. That means that creating a real JavaScript object based on some JSON text is simple and fast. Web programmers love JSON.

JSONP. This lets JavaScript running inside the browser invoke APIs that reside on a different host on the Internet. This doesn't sound like a big deal but it is actually huge because all browsers implement a "same-origin policy" that otherwise makes this impossible. JSONP is hard to implement, but libraries like jQuery make it easy for the client if the server already supports it.  If you're not serving JSON AND JSONP you're doing it wrong!

Smaller. Smaller is better, especially in the mobile environment, and since JSON doesn't "say" every field name twice like XML does, JSON output is a lot smaller than XML.

Less complicated. JSON is free of namespaces, attributes, multiple "text" nodes, and other complexities of XML. The result is that JSON parsers exist for every language, they're small, and they're fast. Furthermore, if you need to write your own, it's not complicated. The same goes for security -- a strict JSON parser accepts only data and rejects everything else, so there's no need to evaluate untrusted text as code, and such parsers are easily available in nearly every programming environment.

Tools. An increasing number of tools support JSON. JSON support is not ubiquitous yet, but at the rate JSON is gaining it will be soon.
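For the JSONP point above, here is a rough sketch of what "supporting JSONP" means on the server side: when the client passes a callback name, the server wraps the same JSON in a function call and serves it as JavaScript. The render_response function and its callback parameter are names invented for this example, not part of any particular framework.

```python
import json
import re

# Accept only plain (optionally dotted) identifiers as callback names, so a
# caller cannot smuggle arbitrary script into the response.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*", re.ASCII)


def render_response(payload, callback=None):
    """Return (content_type, body) for a JSON or JSONP response."""
    body = json.dumps(payload)
    if callback is None:
        # Plain JSON request.
        return "application/json", body
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid JSONP callback name")
    # JSONP: the same JSON, "padded" with a call to the client's function.
    return "application/javascript", f"{callback}({body});"


print(render_response({"ok": True}, callback="handleData")[1])
```

The browser loads that response through a script tag, which is not subject to the same-origin policy, and the named function receives the data. Validating the callback name is the part that is easy to forget.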

Many APIs now support XML and JSON - like the Twitter API, where JSON is the default. Some APIs support only JSON (like Foursquare's V2 API).
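As a quick check on the "smaller" claim, this sketch serializes one made-up record both ways and compares lengths. The record and the element names are assumptions for illustration; the exact ratio depends on your data, and compression on the wire narrows the gap.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, used only for illustration.
record = {"id": "42", "name": "Ada", "city": "Oakland"}

# Compact JSON: each field name appears once.
json_text = json.dumps(record, separators=(",", ":"))

# Element-per-field XML: each field name appears in an open and a close tag.
root = ET.Element("user")
for field, value in record.items():
    ET.SubElement(root, field).text = value
xml_text = ET.tostring(root, encoding="unicode")

print(len(json_text), len(xml_text))
```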

But JSON isn't for everything. Next up: Why XML isn't dead yet!

Tradeoffs in XML data transformations »

Daniel Jacobson of NPR posted a fascinating piece about how NPR tackles a common problem – what’s the best way to render content on a variety of devices, from modern web browsers with top-notch CSS implementations that look almost like typesetting (like Safari) to mobile phones using WAP to low-end devices like HD Radio receivers that don’t understand anything but plain ASCII text.

NPR’s clever solution is to strip markup out of the text and store it in a database table, indexed by position in the text document. To re-generate the content for a particular device, their software queries the database and re-applies the markup tags to the content according to what device it is rendering to.
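Here is a minimal sketch of that idea, not NPR's actual schema or code. It assumes the markup is stored as non-overlapping (start, end, kind) rows alongside the plain text, and that each device has a table saying which tags, if any, to re-apply. All of the names below are made up for illustration.

```python
import html

TEXT = "And here is my first paragraph with something in italics."

# Rows as they might come back from the markup table: (start, end, kind).
_at = TEXT.index("italics")
MARKUP = [(_at, _at + len("italics"), "emphasis")]

# Per-device rendering rules; a device with no rule for a kind gets bare text.
DEVICES = {
    "html": {"tags": {"emphasis": ("<i>", "</i>")}, "escape": html.escape},
    "plain": {"tags": {}, "escape": str},
}


def render(text, markup, device):
    """Re-apply stored markup to plain text for one device.

    Assumes the spans do not overlap or nest.
    """
    profile = DEVICES[device]
    escape = profile["escape"]
    out, pos = [], 0
    for start, end, kind in sorted(markup):
        open_tag, close_tag = profile["tags"].get(kind, ("", ""))
        out.append(escape(text[pos:start]))
        out.append(open_tag + escape(text[start:end]) + close_tag)
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)


print(render(TEXT, MARKUP, "html"))
print(render(TEXT, MARKUP, "plain"))
```

Adding a new device means adding one more entry to the rules table; the stored text and markup rows stay untouched.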

This takes me back to the original reason SGML was invented and made an ISO standard in 1986. The idea was to describe the semantic meaning of text, and then to let a computer program figure out how to render it for human consumption.

SGML was a little over-engineered for that purpose, however, so a bunch of smart people got together in 1996 and started work on XML, a simpler subset of SGML. (HTML descends from SGML too.) XML then begat technologies like XSLT and XHTML.

So today, instead of writing something like:

<h1 class="headline">This is a headline</h1><p class="byline"><b>By I.M.A. Reporter</b></p><p class="paragraph">And here is my first paragraph with something in <i>italics</i>.</p>

 

XML lets us write:

<main_headline>This is a headline</main_headline><byline>By I.M.A. Reporter</byline><p>And here is my first paragraph with something in <i>italics</i>.</p>

 

The difference is that my second example isn’t HTML – it’s part of a document that uses an XML schema that’s up to me, and when writing it I don’t care if I’m coding for an HTML browser or for a car radio – I just have to identify when I’m writing a headline, or a byline, or a caption, and so on. I can now use XSLT or another transformation technology to transform this XML into very simple HTML for a simple browser, or into very complex HTML with links to a CSS stylesheet for a more sophisticated browser, or just into plain text. And if I decide that part of my XML schema should look just like HTML (like I did above with the “p” and “i” tags) then that’s fine too.
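To show the shape of such a transformation, here is a small sketch that turns the semantic example above into simple HTML or into plain text. It is hand-written Python standing in for an XSLT stylesheet (Python's standard library has no XSLT processor), and the story wrapper element, the tag mapping, and the function names are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The semantic fragment from above, wrapped in a hypothetical <story> root.
STORY = (
    "<story>"
    "<main_headline>This is a headline</main_headline>"
    "<byline>By I.M.A. Reporter</byline>"
    "<p>And here is my first paragraph with something in <i>italics</i>.</p>"
    "</story>"
)

# Semantic element -> (HTML tag, CSS class). Unlisted elements pass through.
HTML_MAP = {
    "main_headline": ("h1", "headline"),
    "byline": ("p", "byline"),
}


def to_html(xml_text):
    """Rename semantic elements to HTML tags with classes for a browser."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if el.tag in HTML_MAP:
            el.tag, css_class = HTML_MAP[el.tag]
            el.set("class", css_class)
    root.tag = "div"
    return ET.tostring(root, encoding="unicode", method="html")


def to_plain_text(xml_text):
    """Drop all markup, one line per top-level element, for a text-only device."""
    root = ET.fromstring(xml_text)
    return "\n".join("".join(child.itertext()) for child in root)


print(to_html(STORY))
print(to_plain_text(STORY))
```

The source document never changes; only the mapping does, which is the same job an XSLT stylesheet per device would do.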

Other approaches and tradeoffs

NPR’s approach has a lot of benefits. Depending on your business and situation, though, it might mean a lot of database processing, which could be expensive to scale in either licenses or capacity. Caching helps a lot in this case, since once content has been rendered for a device there’s no need to do it again.

You could also solve this problem by writing the original content in very simple HTML or XML (in whatever schema one desires) and then by using something like XSLT to transform the content for each target device. This solution might be CPU-intensive but might compare favorably vs. database operations depending on what you are doing. Plus, XSLT processing can be easily scaled across thousands of parallel nodes if necessary without buying any more database licenses.

If development resources and cycles are the constraint, a dedicated policy layer can help. With our Sonoa ServiceNet technology, you could configure transformation policies that leverage XPath or XSLT from within our proxy. This might also make it easier to add and validate 3rd party APIs or feeds from outside your own database. You can also handle other types of mediation, such as versioning or protocol transformations, if that is in your use case, as some of our Sonoa media and consumer web services customers do.

 

 

API threat protection pack: 10 XML attack types to guard against »

The cost of IT security breaches has almost doubled since 2008, according to this piece via ComputerWorld Canada.

While we'd love to tell you this is just a problem for our Canadian friends - unfortunately we all need to understand API attack types.

(Remember in our Cloud security tech talks last week we saw that for breaches over a certain size you may even need to issue a press release!)

Here are 10 threats that we cover in our API threat protection policy pack:

1. Malicious Code Injection: exploits backend services that build SQL, LDAP, XPath, or XQuery statements from user-supplied input. ServiceNet's Malicious Code Injection Detection policy can filter SQL, LDAP, XPath, and XQuery injection, or use custom regular expressions, XPath, and XSD to filter the request further. It can also integrate with anti-virus products to scan API requests for viruses, especially in attachments or MIME content.
 
2. DoS Attacks: A Denial of Service (DoS) attack intends to prevent an API or service from serving normal user activity. These attacks include mega-message and entity attacks, recursive element attacks, request flooding, large volumes of invalid requests, etc. The ServiceNet Message Payload protection policy detects various kinds of DoS attacks and protects the backend from the attacker.
 
3. Service Information Leakage: APIs can unintentionally leak information about their configuration or internal workings, or violate privacy, through a variety of service errors. For example, verbose and informative error messages may result in data leakage, and the information revealed could be used to formulate the next level of attack. ServiceNet's response message control policies can customize the fault/response message that reaches the client, which can weed out this attack.

4. Broken Authentication, Session IDs and Keys: Proper authentication, API key, and session management are critical to service security. Flaws in this area most frequently involve the failure to authenticate (weak or multiple ad hoc authentication schemes) or weak session/key tokens that help an attacker replay or fake the keys or tokens. ServiceNet's authentication and API key management policies provide single-point strong authentication and key generation techniques that free the API developer from these attack risks.

5. Failure to protect API and corresponding data access: Frequently, authorization is based only on the base URI or operation of the API. An attacker can try passing various parameters to this API operation and get access to data that he or she is not authorized to access. ServiceNet's flexible authorization policies support authorization based on various request parameters/data, not just the URI or operation name.

6. API Data snooping: Failure to encrypt sensitive API communications means that an attacker who can sniff traffic from the network will be able to access the conversation, including any credentials or sensitive information transmitted. ServiceNet's SSL or XML encryption policies can be used to protect the API data from being snooped in the communication path.

7. API Request and Response tampering: The API data tampering attack is based on the manipulation of API request and response parameters exchanged between client and services in order to modify application data, such as user credentials and permissions, price and quantity of products, etc. Usually, this information is part of the HTTP URI, header, or body (XML or non-XML). ServiceNet's SSL or XML signature policies can be used to protect the API request and response messages from being tampered with in the communication path.

8. Request Burst: Spikes in API requests might bring down the backend server. Spike arresting and caching help the backend services perform better under various load conditions.

9. Auditing: If your API is going to be handling money, you may be required by law to adhere to certain security practices and regulations. One important requirement is auditing every request or response (in full or in part) from authorized and unauthorized users. ServiceNet's auditing policy supports a very flexible way to log API audit data in various formats to different destinations such as local disk, NFS, Syslog, JMS, or Web Services.

10. Threat Detection and Analysis: Analyzing threat data is important for finding failures in the API infrastructure and fixing them. ServiceNet's analytics policy provides the capability to visualize and analyze various API errors or failures. It can also surface patterns or rates of these failures that help an architect or developer fix the problem in his or her API.
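To give a feel for the kind of check behind item 2, here is a rough sketch of payload limits for incoming XML: a cap on message size, nesting depth, and element count. This is not how ServiceNet implements its policy; the function name, exception, and default limits are assumptions made for illustration. It also does not address entity-expansion attacks, which call for a hardened parser.

```python
import io
import xml.etree.ElementTree as ET


class PayloadRejected(Exception):
    """Raised when a request body exceeds one of the configured limits."""


def check_xml_payload(data, max_bytes=1_000_000, max_depth=20, max_elements=10_000):
    """Reject oversized, deeply nested, or malformed XML; return element count."""
    # Mega-message guard: refuse before parsing anything.
    if len(data) > max_bytes:
        raise PayloadRejected("message too large")
    depth = 0
    count = 0
    try:
        # Parse incrementally so a limit can trip partway through the document.
        for event, _elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
            if event == "start":
                depth += 1
                count += 1
                if depth > max_depth:
                    raise PayloadRejected("nesting too deep")
                if count > max_elements:
                    raise PayloadRejected("too many elements")
            else:
                depth -= 1
    except ET.ParseError as exc:
        raise PayloadRejected(f"malformed XML: {exc}") from exc
    return count


print(check_xml_payload(b"<order><item/><item/></order>"))
```

Running a check like this at the edge means the backend parser never sees the oversized or recursive message in the first place.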

For more on API security and threat protection, check out our compilation of API roadmap issues - Is your API Naked?  And let us know if you'd like to see a demo of this policy pack in action.

(Senthil Doraiswamy is a product manager at Sonoa Systems.)

(And thanks to misocrazy for the photo.)