Design: REST and Event Notifications - when might they work?
Andrew Matthews highlighted this interesting article by Marc Jacobs tonight in an email to Readify’s internal tech mailing list. It describes a part of the Microsoft Robotics Studio runtime, which provides a whole set of tooling around concurrency, coordination and hosting of distributed services for use in robotics.
This article on MSDN provides a basic explanation of the application model:
http://msdn.microsoft.com/robotics/getstarted/prgmmodel/default.aspx
While there looks to be a whole load of functionality provided by the Robotics Studio framework, which, as the first article explains, could probably find its way into use in many of the service oriented systems we work with, one of the items I took an interest in was that they’ve taken a REST-based architecture and added support for event notification.
While there seems to be a lot more to the framework than just this feature, it seems to be one of the foundations - the part that allows one service to notify another of state changes - and so it’s pretty important. Let me first try to explain how I understand it, and then voice some of my thoughts on it.
Disclaimer: I’ve based this on all kinds of assumptions, no experience with big web architecture, a single scenario that I picked out, and no hard data or evidence. It’s simply the conclusion I’ve come to in my head from reasoning, and I wrote this merely to address the point that was made about how the APIs will change web service architecture. Hopefully if you’re actually planning to architect a bunch of web services you’ll do more research than my blog.
The Scenario
Imagine that you are a massive online book store conglomerate. You allow multiple websites to act as a front-end for your store, while you perform the job of taking and processing orders. You might have an architecture that looks something like this:
- Web Site 1, 2 and 3 - these are the website front-ends for your store. They list books, and allow people to add them to their cart. When someone has finished shopping and they complete their order, that order is sent to the Order Taking Service for processing.
- Order Taking Service - this is a REST-based web service which keeps a great big list of all the customer orders that have been taken. The web site front-ends call this service when someone hits “Check Out!” and confirms their payment details.
- Credit Card Processing Service - as its name suggests, when an Order has been taken, this web service contacts a bunch of banks and checks credit card details, and performs the transactions.
- Order Dispatch Service - after credit card details have been processed and the transaction is finalised, the Order Dispatch Service has the job of doing all the warehouse gunk that goes with shipping books - whatever that entails. Every now and then it needs to go back to the Order Taking Service to tell it the status of a given Order (“Shipped on the 12th of November, 2007”).
Note: Obviously there’d be a gazillion more services than this, and they probably wouldn’t work anything like this, I’m just using this to illustrate a point.
So, Chief Clancy Wiggum bumbles onto Web Site 1 to order a book - “101 World Famous Donuts” - and proceeds to the checkout using the website’s “1 click order” scheme, which always seems to take him 5 clicks minimum. When his order is finalized, the web site submits his Order to the Order Taking Service using an HTTP PUT request. He’s sent an email to confirm his order, and he closes his browser.
Behind the scenes, the Order Taking Service needs to send the Order to the Credit Card Processing service, to subtract the payment from his credit card. There are two ways to do this.
Approach 1 - Polling and Pulling (the REST way)
The first model uses a “pull” approach, and relies on the Credit Card Processing Service and the Order Dispatch Service to poll the Order Taking Service, to get a list of orders to process:
- Every 5 minutes, if the Credit Card Processing service isn’t busy, it issues an HTTP GET request to the Order Taking Service, asking for the next order to process.
- It then contacts the bank or whoever it needs to in order to execute the purchase and transfer the money. In the mean time, a virus on the server sends your credit card information to the Russian mafia.
- When the purchase is approved, the Credit Card Processing service then executes an HTTP POST request back to the Order Taking Service, telling it that the purchase was approved.
Whilst all this is taking place, the Order Dispatch Service is doing something very similar.
- Like the Credit Card Processing service, the Order Dispatch Service also continuously polls the Order Taking Service with an HTTP GET request to ask for the next Order to ship. The Order Taking Service is smart enough to only return Orders that have already been purchased.
- The Order Dispatch Service then goes off to the multitudes of warehouse systems and does everything to ship the book.
- As the books are gathered, labels are printed, and parcels are tied to the feet of carrier pigeons for delivery, the Order Dispatch Service makes HTTP POST requests to the Order Taking Service to inform it of its progress.
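The two polling loops above can be sketched roughly as follows. This is only an in-process illustration, not real web service code: the class `PollingOrderService`, its methods and the status strings are names I've made up, and the method calls merely stand in for the HTTP PUT, GET and POST requests described.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    status: str = "placed"      # hypothetical lifecycle: placed -> purchased -> shipped
    in_progress: bool = False   # True while some worker is handling the order


class PollingOrderService:
    """Stands in for the Order Taking Service: the one place that holds orders."""

    def __init__(self):
        self.orders = {}

    def put_order(self, order_id):
        # Stands in for the web site's HTTP PUT.
        self.orders[order_id] = Order(order_id)

    def get_next(self, wanted_status):
        # Stands in for a worker's HTTP GET: returns one order, or None.
        for order in self.orders.values():
            if order.status == wanted_status and not order.in_progress:
                order.in_progress = True
                return order
        return None

    def post_status(self, order_id, new_status):
        # Stands in for a worker's HTTP POST reporting progress.
        order = self.orders[order_id]
        order.status = new_status
        order.in_progress = False


def poll_once(service, wanted_status, new_status):
    """One polling cycle for a worker; returns the order id handled, or None."""
    order = service.get_next(wanted_status)
    if order is None:
        return None  # a "null" response: nothing to do this time
    # ... the real work (bank call, warehouse run) would happen here ...
    service.post_status(order.order_id, new_status)
    return order.order_id
```

In this sketch the Credit Card Processing service would call `poll_once(service, "placed", "purchased")` on its timer, and the Order Dispatch Service would call `poll_once(service, "purchased", "shipped")`. Dispatch gets `None` back until a purchase has actually been approved.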
This model is the way the HTTP protocol and the web were designed to work. When you opened your browser and came to my blog, you were “polling” my site for new information, and “pulling” the blog entry down to you. Even if you used an RSS aggregator, my website didn’t go out to tell your aggregator that I wrote this entry - your aggregator polled my site to ask for any new entries. RSS might actually be a cool way to implement this order management system.
Approach 2 - Event Notifications
In this model, the Order Taking Service publishes two events - “OrderPlaced” and “PurchaseFinalized” - which the Credit Card Processing service and Order Dispatch Service can subscribe to. Every time something happens to an Order, the Order Taking Service goes through the list of subscribers and raises the appropriate event. This is quite similar to how you’d design the solution in C# (using events) or perhaps even Java (using Observer, plus a gazillion other facades, dependency injection implementations and other patterns to make it feel like Java):
- When an Order is placed, the Order Taking Service records it like before, but then it takes an active role - it cycles through a list of subscribers to the event, and makes a call to them to notify them that “an order was placed”. They can then do what they like with that information.
- In our scenario, the only subscriber to the event is the Credit Card Processing service. “An order was placed, you say? I’ll charge their credit card and get back to you when it’s done!”
- When the bank has been contacted and the Russian mafia notified of your details, the Credit Card Processing service then executes an HTTP POST request to tell the Order Taking Service that the purchase succeeded.
- This update then triggers the Order Taking Service to raise the “PurchaseFinalized” event. Again, there is only one subscriber to this event - the Order Dispatch Service.
- The Order Dispatch Service again goes to the warehouse like before, and progress updates are fed back to the Order Taking Service.
To support this model, the Robotics team had to extend some of the core REST concepts to allow event notifications, since they aren’t really a standard part of the HTTP protocol.
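Here is a minimal sketch of the event flow in the steps above, written as a plain in-process Observer. It does not use the Robotics Studio API. Only the event names “OrderPlaced” and “PurchaseFinalized” come from the scenario. The class `EventOrderService`, the helper `wire_up` and the status strings are hypothetical, and ordinary function calls stand in for the notifications between services.

```python
from collections import defaultdict


class EventOrderService:
    """Stands in for the Order Taking Service in the event model."""

    def __init__(self):
        self.orders = {}                       # order_id -> status
        self.subscribers = defaultdict(list)   # event name -> list of callbacks

    def subscribe(self, event, callback):
        self.subscribers[event].append(callback)

    def _raise(self, event, order_id):
        # Cycle through the subscribers and call each one in turn.
        for callback in self.subscribers[event]:
            callback(order_id)

    def put_order(self, order_id):
        # Record the order like before, then take the active role.
        self.orders[order_id] = "placed"
        self._raise("OrderPlaced", order_id)

    def post_status(self, order_id, status):
        # A subscriber reporting back; an approved purchase triggers the next event.
        self.orders[order_id] = status
        if status == "purchased":
            self._raise("PurchaseFinalized", order_id)


def wire_up(service, log):
    """Subscribe hypothetical card-processing and dispatch handlers."""

    def on_order_placed(order_id):
        log.append(("charge", order_id))            # contact the bank here
        service.post_status(order_id, "purchased")

    def on_purchase_finalized(order_id):
        log.append(("ship", order_id))              # warehouse work here
        service.post_status(order_id, "shipped")

    service.subscribe("OrderPlaced", on_order_placed)
    service.subscribe("PurchaseFinalized", on_purchase_finalized)
```

Note that in this sketch `put_order` does not return until every subscriber has finished, because the callbacks run synchronously. A real implementation would have to decide whether the publisher waits on its subscribers or hands the notification off.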
Why the Event Notifications approach is nice
When I first looked at the Event Notifications approach, what I liked most about it was the programming model. If you wanted to add another service to the system, you could do so simply by subscribing to the correct event. It felt very much like the languages I’m used to (C#), and coming from a data binding perspective where we raise events every time state changes, it felt very natural. However, the Polling system seems just as extensible - the new service would just have to make the right HTTP GET request. So really, both models seem the same in terms of extensibility.
The one real advantage of Event Notifications is that they happen immediately. In the example on the MSDN website, if the “Bumper” sensor determines that the robot has bumped into a wall, the “Driver” service that is in control of the robot’s wheels needs to be told straight away - the “Explorer” service can’t just poll the Bumper every couple of seconds saying “have you hit something yet?”. It’s just not the right model for that situation.
These requirements probably aren’t limited to robotics - they might also apply to enterprise systems such as a currency exchange, where you want to be notified immediately when something happens.
Why the Polling/Pulling approach is nice
Whilst there are occasions where updates/notifications should be immediate, I think there are just as many (if not a lot more) cases where they don’t need to be. In our example, credit cards don’t need to be charged immediately - we can send the customer an email later if the order wasn’t successful. Likewise, it’s probably going to take days until the warehouse is ready to begin processing the order - why does that notification need to be immediate?
I think the polling approach also leads to easier code. In the event notifications approach, if the warehouse wasn’t ready to process the order, the Order Dispatch System would have to keep a queue of orders that had been purchased but not yet begun shipping. Likewise, if the Credit Card Processing system was busy (or, it was night time and the banks were closed), the Credit Card Processing service would need to keep a queue of orders to process.
So, while each service would be notified immediately, they’d all have to keep their own queues of data if any of their services couldn’t complete straight away. Of course, since they won’t be notified again, they’d also have to persist that queue somewhere - otherwise you run the risk of dropping orders. In the polling approach, there would only need to be one service to maintain the state of orders, and the rest would simply be asking for “units of work” to process when they were ready.
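To make the “every subscriber needs its own persisted queue” point concrete, here is a rough sketch of what a busy subscriber might have to carry around in the event model. Everything in it is an assumption for illustration: the class name `BusySubscriber`, the JSON file used as storage, and the method names. A real system would more likely use a database or a message queue product.

```python
import json
import os
import tempfile
from collections import deque


class BusySubscriber:
    """An event subscriber that cannot always act on a notification at once."""

    def __init__(self, path):
        self.path = path                    # where the pending queue is persisted
        self.pending = deque(self._load())

    def _load(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return json.load(f)

    def _save(self):
        with open(self.path, "w") as f:
            json.dump(list(self.pending), f)

    def on_order_placed(self, order_id):
        # We will not be told again, so remember the order and persist it.
        self.pending.append(order_id)
        self._save()

    def process_next(self):
        # Called whenever the bank or warehouse is ready for more work.
        if not self.pending:
            return None
        order_id = self.pending.popleft()
        self._save()
        return order_id


# Usage: a restart reloads whatever was still pending.
path = os.path.join(tempfile.mkdtemp(), "pending.json")
subscriber = BusySubscriber(path)
subscriber.on_order_placed(1)
subscriber.on_order_placed(2)
restarted = BusySubscriber(path)   # simulates the service coming back up
```

In the polling approach none of this lives in the subscriber. It would just ask the Order Taking Service for the next unit of work when it was ready.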
When I think about the two approaches, the polling model feels like it would fare better in the categories of performance, scalability and safety as well. Let me explain why.
Scalability
To talk about scalability, I’m going to use a scenario again. Imagine it is Christmas time, and all three of the websites above decided to sell books at 95% off. First, as the demand increased, how much extra processing would the servers have to put up with, and secondly, where could we add new servers to cope with the demand?
I think both approaches would scale equally well as new servers were added, assuming you made no other changes. The same number of HTTP requests would go back and forth between the servers in both approaches, but the extra queues needed to store updates until they can be processed might add a little extra overhead to the Event Notifications approach.
I think the biggest difference would come in the flexibility that the polling model provides. Using polling, the “subscriber” decides when they are ready to do something, whilst using event notifications, it’s the publisher (“got any orders?” versus “process this order!”).
This allows us to fool around with the system a little - for example, we could turn off all of the Order Dispatch servers during the day when the site is busy. This would mean we wouldn’t be making any “got any orders for me to ship?” requests during the day to the Order Taking service, which would take away a third of the internal network traffic in this system (probably even more, because we wouldn’t be sending any progress updates back either). If we had especially smart infrastructure guys, we could potentially even turn these dead servers into Order Taking or Credit Card Processing servers during the day, and switch them at night (that’s if we didn’t decide to turn off the Credit Card processors too).
Basically, we can easily shut down any non-essential parts of the system, and nothing will be lost as we’ll simply have a very big queue to get through when we turn them back on. In the event notifications model, we need to ensure all servers are running at all times.
Safety
I think this one is pretty obvious. As above, when we only use an event notification model (of course, our system could support both models - but then you may as well stick to polling), we need to ensure the servers are always up; otherwise we might end up with lost notifications!
Imagine if all of our Credit Card Processing servers went down; it could be a disaster! All of the event notification calls would lead to timeouts, which would block our Order Taking servers from processing orders, which could cascade to more timeouts.
When the servers came back online, there would be no way to “rescue” those notifications, unless we went and asked the Order Taking service for a list of unprocessed orders - which is essentially the code we’d need for our polling approach anyway. We could perhaps queue the notifications, and send them again when the server is online, but this would probably lead to a message queuing architecture - which essentially relies on polling in the first place.
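A small sketch of that failure mode, with made-up names (`Subscriber`, `Publisher`, `unprocessed`) and a `ConnectionError` standing in for a timed-out notification call. It assumes the publisher is fire-and-forget, which is my assumption and not something the Robotics Studio documentation states.

```python
class Subscriber:
    """Stands in for the Credit Card Processing servers."""

    def __init__(self):
        self.online = True
        self.received = []

    def notify(self, order_id):
        if not self.online:
            raise ConnectionError("subscriber is down")
        self.received.append(order_id)


class Publisher:
    """Stands in for the Order Taking Service raising OrderPlaced."""

    def __init__(self, subscriber):
        self.subscriber = subscriber
        self.orders = {}  # order_id -> "placed" or "processing"

    def put_order(self, order_id):
        self.orders[order_id] = "placed"
        try:
            self.subscriber.notify(order_id)
            self.orders[order_id] = "processing"
        except ConnectionError:
            pass  # fire-and-forget: this notification is simply lost

    def unprocessed(self):
        # The "rescue" query: effectively the request a polling worker makes.
        return [oid for oid, status in self.orders.items() if status == "placed"]


subscriber = Subscriber()
publisher = Publisher(subscriber)
publisher.put_order(1)
subscriber.online = False      # the card processing servers go down
publisher.put_order(2)
publisher.put_order(3)
subscriber.online = True       # and come back
publisher.put_order(4)
```

After this runs, the subscriber has only ever heard about orders 1 and 4. The only way it learns about 2 and 3 is by calling `publisher.unprocessed()`, which is a poll in all but name.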
Performance
There’s a gazillion different ways to measure “performance”, and I’m not going to examine all of them because, really, I’m sleepy. I think the event notifications approach may actually win out on most of these, but that doesn’t matter too much if it doesn’t scale and it’s not safe (although it might in your scenarios).
Orders processed per second
I’d expect this to be pretty much the same for both approaches, although the event notifications might take the lead as the notifications would be instant. During a peak period, however, this probably wouldn’t matter as the queue of work items is likely to provide a nice buffer, and processing servers could poll as soon as they are free. The extra need for queuing might impact the processing speed, and you would probably need one bunch of threads to handle the requests (“add this order to the queue of purchases to charge”) and another to process them (“charge this purchase”); in the polling approach these would all be reserved for processing (the same thread would poll as well as process).
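The threading difference can be sketched like this. It is a toy with one thread of each kind and hypothetical function names (`event_style`, `polling_style`), and it says nothing about real throughput. It only shows the shape of the two designs.

```python
import queue
import threading


def event_style(order_ids):
    """Notifications arrive on one thread and are charged on another."""
    inbox = queue.Queue()
    charged = []

    def receive():
        # Stands in for the request handler: "add this order to the queue".
        for order_id in order_ids:
            inbox.put(order_id)
        inbox.put(None)  # sentinel so the processor knows when to stop

    def process():
        # Stands in for the worker: "charge this purchase".
        while (order_id := inbox.get()) is not None:
            charged.append(order_id)

    threads = [threading.Thread(target=receive), threading.Thread(target=process)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return charged


def polling_style(pending):
    """A single thread both asks for the next order and charges it."""
    pending = list(pending)   # stands in for the Order Taking Service's list
    charged = []
    while pending:
        charged.append(pending.pop(0))  # poll, then process, in the same loop
    return charged
```

Both functions end up charging the same orders in the same order. The event version just needs the extra queue and the extra thread to get there.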
Bandwidth
If you just take into account bandwidth between the web services (for polling or notifications), it’s probably pretty obvious that polling is likely to take a little extra bandwidth too, as there may be a number of “null” responses. However, in this example at least, this will usually be during slow periods rather than high traffic periods, so it wouldn’t cause much of an impact. If orders came through very consistently, polling would be a good option; if they were very sporadic and the servers were being used for other things, event notifications may be a good way to avoid tying up the network.
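As a back-of-the-envelope illustration, here is a tiny model of the request counts. It rests on assumptions of mine: a single worker that issues one GET per poll interval and takes at most one order each time, and one notification call per order in the event model. The status updates POSTed back are ignored since they are the same in both approaches. The function names and sample data are made up.

```python
def polling_traffic(arrivals):
    """arrivals[i] is the number of orders placed during poll interval i.

    Returns (requests, null_responses) for a worker that polls once per
    interval and takes at most one order per poll.
    """
    backlog = 0
    nulls = 0
    for placed in arrivals:
        backlog += placed
        if backlog:
            backlog -= 1   # the poll came back with an order
        else:
            nulls += 1     # the poll came back empty
    return len(arrivals), nulls


def notification_traffic(arrivals):
    """One notification call per order placed."""
    return sum(arrivals)


steady = [1] * 12            # an order in every interval
sporadic = [0] * 11 + [1]    # a single order in an otherwise quiet hour
```

With the steady stream, polling makes 12 requests with no empty responses, and the event model makes 12 notifications, so there is nothing between them. With the sporadic stream, polling still makes 12 requests, 11 of them empty, against a single notification.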
Latency
Event notifications win this one again (of course), but as mentioned, this is only going to matter if you really need it.
Conclusion
Obviously this is going to be different in every situation, and I don’t have any hard numbers (nor experience with it) to back me up, but I don’t think that Event Notifications have much to offer to critical enterprise web services that are designed to scale or perform well. Marc Jacobs did a good job of making the Robotics API sound like it would lead to huge changes to web service architecture, and perhaps some of the things in the API will - but I don’t think Event Notifications will be among them.
However, it’s worth noting that the two approaches aren’t mutually exclusive, and a system of the future might well comprise both.
How does this relate to data binding?
Yeah, this is a pretty off-topic entry for my blog, but it does have a lot to do with some thoughts I’ve been having around the idea of “poll-based data binding”. This examination of web services actually highlights that the same things can be achieved through both polling and events, and whilst data binding uses events (because often we want immediate updates), polling does have a lot of advantages. This leads very much into some thoughts I’ve had around asynchronous bindings, which is a post for another evening.

In either “model” (Event vs. Poll) you have to make sure that the Orders live through a hard/soft failure. In uni we solved this by using MSMQ. We basically built a simple Windows Service that checked an MSMQ queue for order objects. The queue was set up so it would serialize any message to disk, in case of power failure or whatnot.
That queue could easily have implemented an event system, so that our service didn’t have to ask for objects, but rather would be told when there was an order there.
- Jarle -