Thursday, 17 July 2014

Is your API broken?

"Welcome to the Example Rutabaga Company. We’ve got a simple REST API for all your rutabaga needs!"

Indeed, it is simple…

   POST https://rutabaga.example.com/Order/ HTTP/1.1
   Content-Type: application/json

   {"Quantity": 5800,
    "Quality": "Tasty!",
    "DeliverTo": "123 Fake Street, New Orleans"}

Send this and you'll either get an error or an "OK" response with a tracking ID inside. Later, you’ll get several thousand tasty rutabagas in the post. What could go wrong?

Everything.

Schrödinger's Response


From the client’s point of view, there’s a clear action to take depending on the response code.
  • 200, log the tracking ID.
  • 5xx, try again later.
But what if there’s no response? Perhaps your friendly HTTP client library code has thrown an exception because the connection has broken down. These errors are unavoidable, especially when the client is on a mobile device. What should we do in this situation?

You could try again later. But hang on, that ignores the thing that makes POST different from GET and PUT. (GET and PUT are defined to be idempotent, so repeating one has the same effect as sending it once, but a POST request is an express call to take action.)

You might reason that the first POST request failed, so you're not actually repeating anything, but aren’t you? There are two possibilities when you get an error from any sort of network request.
  1. The request was lost on the way and the remote server did not handle the request.
  2. The request arrived and was handled, but the response to the client was lost.
In case 1, we’re fine to repeat the POST. No problem.
In case 2, the remote server is already in the process of shipping a truckload of rutabagas to you and has no idea the response got lost. Repeat that request and you’ll end up with two truckloads of rutabagas.

But this is the point: the client has no way of knowing whether it's case 1 or case 2. The only entity that knows is the server, and we can't talk to it.
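To make the two cases concrete, here is a small simulation. Everything in it is made up for illustration: `RutabagaServer` is an in-memory stand-in for the real service, and `ConnectionBroken` stands in for whatever exception your HTTP library raises. The thing to notice is that the client sees exactly the same exception in both cases.

```python
class ConnectionBroken(Exception):
    """Stand-in for the exception an HTTP library raises on a broken link."""


class RutabagaServer:
    """Hypothetical in-memory stand-in for the ordering service."""

    def __init__(self, drop_request=False, drop_response=False):
        self.orders = []
        self.drop_request = drop_request
        self.drop_response = drop_response

    def post_order(self, order):
        if self.drop_request:
            self.drop_request = False   # fail once, then recover
            raise ConnectionBroken()    # case 1: the server never saw the order
        self.orders.append(order)
        if self.drop_response:
            self.drop_response = False  # fail once, then recover
            raise ConnectionBroken()    # case 2: order handled, reply lost
        return {"TrackingId": len(self.orders)}


def naive_client(server, order):
    """Retry once on any network failure, as a naive client might."""
    try:
        return server.post_order(order)
    except ConnectionBroken:
        return server.post_order(order)


order = {"Quantity": 5800, "Quality": "Tasty!"}

lost_on_the_way = RutabagaServer(drop_request=True)
naive_client(lost_on_the_way, order)

reply_lost = RutabagaServer(drop_response=True)
naive_client(reply_lost, order)

print(len(lost_on_the_way.orders))  # 1: the retry was harmless
print(len(reply_lost.orders))       # 2: two truckloads of rutabagas
```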

For a surprising number of APIs I've written client code for, that's the end of the story. The API simply has no reliable way for the client to find out what happened.

How does your API handle this situation? Is your API broken?

Opening the box


One way an API designer could resolve this issue is to provide a way to look up the order history.

This is probably what you’d do if (say) you were shopping online and your internet connection died just as you hit the Complete Purchase button. Once you got back online, you'd check to see if the order was in the system before repeating the order.

Sounds simple? It would work, but be careful: this approach has lots of caveats. Fortunately, none of them are insurmountable.

Beware of false duplicates

Say you're in this worst-case scenario and your link to the server has just been restored. Your code dutifully downloads the list of outstanding orders and finds one for 5800 rutabagas. Job done?

Wait! Was that your order? Maybe the account holder deliberately made another identical order from a different machine. We don’t know, and we can’t know.

This can be resolved by letting the client supply its own way to identify the initial request, perhaps a client-supplied ID, and allowing a lookup by that ID later on.
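As a rough sketch of that idea, assuming an in-memory store rather than a real service: `OrderBook`, `place_once` and the ID strings below are names I've invented for illustration. The client looks up its own ID before repeating, so an identical order from another machine is never mistaken for its own.

```python
class OrderBook:
    """Hypothetical server-side store keyed by a client-supplied ID."""

    def __init__(self):
        self._orders = {}          # client-supplied ID -> stored order
        self._next_tracking = 1

    def place(self, client_id, order):
        record = dict(order, TrackingId=self._next_tracking)
        self._next_tracking += 1
        self._orders[client_id] = record
        return record

    def lookup(self, client_id):
        # None means "no request with this ID ever arrived".
        return self._orders.get(client_id)


def place_once(book, client_id, order):
    """Client-side: check whether our own request arrived before repeating it."""
    existing = book.lookup(client_id)
    if existing is not None:
        return existing
    return book.place(client_id, order)


book = OrderBook()
order = {"Quantity": 5800, "Quality": "Tasty!"}

first = place_once(book, "laptop-0001", order)
retry = place_once(book, "laptop-0001", order)   # same ID: no second order
other = place_once(book, "office-0001", order)   # identical order, different ID

print(first["TrackingId"], retry["TrackingId"], other["TrackingId"])  # 1 1 2
```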

How long should we keep that ID around?

Expire ID records too quickly and a client that’s been offline for a prolonged amount of time will not be able to resynchronize. Store the IDs forever and that would be a waste of space.

You may have a figure in mind that’s reasonable. If not, add an occasional reconciliation of expired IDs to your API.
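Here is one way the expiry could look, as a sketch only. `ExpiringIds` is a hypothetical helper, and the 100-second lifetime is picked purely so the example is easy to follow. The clock is passed in so the behaviour can be shown without waiting, and `purge` returns the IDs it dropped so they could be handed to a reconciliation step.

```python
import time


class ExpiringIds:
    """Hypothetical record of client IDs that forgets them after a time limit."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen = {}   # client ID -> time it was first recorded

    def record(self, client_id):
        self._seen.setdefault(client_id, self._clock())

    def purge(self):
        """Drop IDs older than the time limit and return the ones dropped."""
        cutoff = self._clock() - self._ttl
        expired = [cid for cid, seen_at in self._seen.items() if seen_at < cutoff]
        for cid in expired:
            del self._seen[cid]
        return expired

    def is_known(self, client_id):
        self.purge()
        return client_id in self._seen


now = [0.0]                              # fake clock for the demonstration
ids = ExpiringIds(ttl_seconds=100, clock=lambda: now[0])
ids.record("laptop-0001")

now[0] = 50.0
print(ids.is_known("laptop-0001"))       # True: still within the time limit

now[0] = 101.0
print(ids.purge())                       # ['laptop-0001']
print(ids.is_known("laptop-0001"))       # False: a late client can't resynchronize
```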

Who chooses the ID?

The client should be able to freely choose an ID. You may be looking at your database and thinking there’s a field supplied by the client that’s already got a no-duplicates constraint. But if those values came from a source external to the client, the client can’t control their uniqueness. That external entity might very well be feeding identical records into the system through different channels, and the client won’t know if the duplicate it found was its own or someone else's.

Whose ID is it anyway?

Make sure the client has a clear space from which to select IDs. We can’t have multiple users all counting from 1 because you’ll get collisions very quickly. GUIDs would work, as long as they are generated correctly. If the API requires the client to log in first, the server could track IDs on a per-user basis, but not all APIs require a login or pre-registration.
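Both options are easy to sketch. Python's standard `uuid.uuid4()` produces random GUIDs. `PerUserIds` is a hypothetical illustration of per-user tracking, which assumes the server already knows who is logged in.

```python
import uuid


def new_request_id():
    """A random (version 4) GUID as a string."""
    return str(uuid.uuid4())


class PerUserIds:
    """Hypothetical tracker that keeps each user's IDs in a separate space."""

    def __init__(self):
        self._used = set()   # (user, request ID) pairs

    def claim(self, user, request_id):
        """Return True if this user has not used this ID before."""
        key = (user, request_id)
        if key in self._used:
            return False
        self._used.add(key)
        return True


ids = PerUserIds()
print(ids.claim("alice", "1"))   # True
print(ids.claim("bob", "1"))     # True: bob's "1" is separate from alice's
print(ids.claim("alice", "1"))   # False: alice has already used "1"

print(new_request_id())          # a fresh GUID on every call
```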

Avoid colliding with prior attempts still being processed

Consider this: a client attempts to send a request to a server, but the connection fails with a time-out error. Thirty seconds later, the client asks the server if that prior request made it, and the server answers "No". Time to repeat that first attempt?

But wait! That first attempt timed out because the server was unexpectedly busy, and it has only just started dealing with your first request.

You can mitigate this (probably rare) scenario by making sure the server returns an error to the second POST request. Almost all databases allow a uniqueness constraint on any field or combination of fields, so if this scenario does play out, the second insert simply fails with an error.
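For illustration, here is that constraint at work using Python's built-in `sqlite3` module. The table and column names are my own invention, and how the failure is reported back to the client (a 409 Conflict response would be one option) is left to your API.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders ("
    " client_id TEXT NOT NULL UNIQUE,"   # the client-supplied ID
    " quantity INTEGER NOT NULL)"
)


def insert_order(client_id, quantity):
    """Insert the order, or report a duplicate if the ID is already present."""
    try:
        with db:   # commits on success, rolls back on error
            db.execute(
                "INSERT INTO orders (client_id, quantity) VALUES (?, ?)",
                (client_id, quantity),
            )
        return "created"
    except sqlite3.IntegrityError:
        return "duplicate"


print(insert_order("laptop-0001", 5800))   # created
print(insert_order("laptop-0001", 5800))   # duplicate: the late first attempt won
print(db.execute("SELECT COUNT(*) FROM orders").fetchone()[0])   # 1
```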

Do you have a ticket?


There’s another protocol that works in a similar way but puts the server in control of the IDs, at the cost of requiring two separate phases. (The actual request could be carried along with either the first or the second phase.)

The first phase has the client asking the server for an ID while the second phase has the client committing to complete the transaction with that ID.

This protocol does require that once the client begins phase two, it has committed to not returning to phase one for this transaction. The client must also store that ID and be ready to use it once the connection has been restored. Similarly, the server needs to agree that it only starts processing a transaction once the second phase request has arrived.

This two-phase approach covers for failures at any step along the conversation, so long as the client and server stick to the agreement.
  • If the first request is lost, there’s no problem in repeating the first phase.

  • If the first response is lost, the server will have allocated an ID that will never be committed, but will be left indefinitely in an uncommitted state. (A later occasional reconciliation of orphaned IDs would be useful here.)

  • If the second request is lost, the client can later repeat the commitment of the transaction after checking its state using the ID it received in the first phase.

  • If the second response is lost, the client can later check the state of the transaction using the ID and see that it is already committed.

This protocol shares a caveat with the earlier plan: how long should the server keep track of used IDs? The server will be left with IDs that will never be committed, as well as committed IDs that the client might still need to check on later. Again, you may wish to settle on reasonable time limits or allow for a reconciliation of IDs later on.
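The two-phase exchange described above could be sketched roughly like this. `TicketServer` and its method names are hypothetical, and it keeps everything in memory. Making a repeated commit harmless is my own extra safety net, not something the protocol strictly requires, since a well-behaved client checks the state first.

```python
import itertools


class TicketServer:
    """Hypothetical in-memory sketch of the server side of the two phases."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._state = {}     # ID -> "issued" or "committed"
        self.shipped = []    # orders the server has actually started on

    def issue(self):
        """Phase one: hand out an ID. Nothing is processed yet."""
        ticket = next(self._ids)
        self._state[ticket] = "issued"
        return ticket

    def status(self, ticket):
        return self._state.get(ticket, "unknown")

    def commit(self, ticket, order):
        """Phase two: start processing, at most once per ID."""
        if ticket not in self._state:
            raise KeyError("unknown ID")
        if self._state[ticket] == "committed":
            return "already committed"
        self._state[ticket] = "committed"
        self.shipped.append((ticket, order))
        return "committed"


server = TicketServer()
order = {"Quantity": 5800, "Quality": "Tasty!"}

orphan = server.issue()          # first response lost: this ID is never used
ticket = server.issue()          # the client simply repeats phase one

server.commit(ticket, order)     # suppose this response is lost
if server.status(ticket) != "committed":
    server.commit(ticket, order) # only repeated if the check says it's needed

print(server.status(orphan))     # issued (left for a later reconciliation)
print(server.status(ticket))     # committed
print(len(server.shipped))       # 1
```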

While this protocol might be considered more complicated because of the two phases of conversation, there are fewer caveats to this plan and fewer opportunities for things to go wrong. This is my personal favorite.

Do I really need to do this?


As I write this, I'm also working on a small web service that uses a REST API with POST requests but takes none of the advice I offer on this page. Why not? Simply that the cost of the resources being allocated by this API-to-be is so close to zero that making the effort to implement the API robustly is just not worth it in this particular case.

But consider, even if you're not transmitting invoices worth thousands of dollars, do you really want duplicates turning up?

Picture Credits
"Rutabagas" by Dale Calder
"Barney the cat" by Bill P. Godfrey (me).
"Rutabaga 2" by Dolan Halbrook
"Commit no nuisance" by Pat Joyce

