One of the items that we discussed was our "batch broker" - a process responsible for handing out unique batch ids. These ids uniquely identify processes; they end up in logs, in audit tables, and sometimes get tagged onto rows in the database.
We laughed about how embarrassingly simple this process was: just a few dozen lines of Python code (sketched after this list) that
- open up and lock a file
- increment the number within
- close & unlock the file
- log the requester & new batch_id
- return the batch_id
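
In spirit, the broker amounts to the minimal sketch below (assuming a POSIX host with fcntl; the counter-file path and logger name are made up for illustration):

    import fcntl
    import logging

    BATCH_FILE = "/var/lib/batch_broker/counter"  # hypothetical path; file must be seeded with a number
    log = logging.getLogger("batch_broker")       # hypothetical logger name

    def next_batch_id(requester: str) -> int:
        """Hand out the next batch id, serialized by an exclusive file lock."""
        with open(BATCH_FILE, "r+") as f:
            fcntl.flock(f, fcntl.LOCK_EX)   # block until we hold the lock
            batch_id = int(f.read().strip()) + 1
            f.seek(0)
            f.truncate()
            f.write(str(batch_id))
            f.flush()
            # the lock is released when the file is closed
        log.info("assigned batch_id %d to %s", batch_id, requester)
        return batch_id

The single exclusive lock serializes every requester, which fits the throughput ceiling mentioned below.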
What makes this an embarrassing solution is that it
- isn't distributed (say, using ZooKeeper)
- is subject to downtime when its server gets upgraded or crashes
- won't scale to hundreds or thousands of requests a second (it supports about 2)
- doesn't log enough information to make misconfigured requesters easy to diagnose
- isn't resilient enough to be recovered from a toasted server without some amount of work
And yet it deserves praise, because it has exceeded our requirements at a ridiculously low cost: it was originally written in just a couple of hours over ten years ago, and it has offered 99.999% uptime while assigning 26 million batch ids without a hitch. It's a small enough process that any programmer can learn it in about 5 minutes, and it shares hosting with other processes. ZooKeeper, in comparison, would have involved weeks of research, training, and configuration time, as well as multiple servers for deployment.
The next upgrade for this program is to move the id generation into a relational database and store client arguments (process name, organization, etc.). That's pretty trivial and should improve diagnostics, recovery, and speed. And if our requirements change such that we can't afford any downtime, then we'll look at something sexy. In the meanwhile we'll have to be satisfied with cheap & simple. And I'm OK with that.
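
For a sense of scale, that whole upgrade could be as small as the sketch below (SQLite and the column set are assumptions for illustration, not the actual target schema):

    import sqlite3

    def next_batch_id(db_path: str, process_name: str, organization: str) -> int:
        """One audit row per request; the autoincrement key doubles as the batch id."""
        # the connection context manager commits on success; a long-lived
        # broker would also close the connection when done
        with sqlite3.connect(db_path) as conn:
            conn.execute(
                """CREATE TABLE IF NOT EXISTS batch_ids (
                       batch_id     INTEGER PRIMARY KEY AUTOINCREMENT,
                       process_name TEXT NOT NULL,
                       organization TEXT NOT NULL,
                       requested_at TEXT DEFAULT CURRENT_TIMESTAMP)"""
            )
            cur = conn.execute(
                "INSERT INTO batch_ids (process_name, organization) VALUES (?, ?)",
                (process_name, organization),
            )
            return cur.lastrowid

    # e.g. next_batch_id("batches.db", "nightly_load", "finance")

The database then keeps the requester's arguments alongside every id it hands out, which is where the diagnostic and recovery improvements come from.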