Tuesday, November 27, 2012

NoSQL vs. SomeSQL

Linux Journal had a fantastic article (SQL vs. NoSQL) some time back. While I know this is a bit of hopping on the bandwagon, I like the point this video is trying to make: http://www.xtranormal.com/watch/6995033/mongo-db-is-web-scale. Caution: The language used in this "video" may not be appropriate for some viewers.

There are lots of folks out there that like to tout numbers on performance and how sometimes performance is really fast under "ideal" conditions, but as both point out, the trick is to know how to balance performance/scalability, reliability, and availability. /dev/null is extremely scalable and available but it's completely unreliable. In MySQL - the Blackhole storage engine has a lot of the same performance metrics but used properly, can be a great way to "pass through" data in a replication ring.

Sunday, February 12, 2012

A basic shared-nothing data sharding system

There's a lot of buzz about sharding data. Today, I'll provide a very brief overview of how sharding helps systems I manage run more efficiently and how we're addressing keeping individual shards balanced.

The goal of sharding data in our environment involves: 1) make the structure of the data consistent across all the shards, 2) dividing data up so it can be found easily, 3) automatically and continuously re-balance the shards, and 4) allow for changes in scale (like adding a new shard or different shard sizes).

Item 1 is a snap - all we do there is to deploy the same data structures in each of the shards with all the supporting data required to answer questions related to a user. Some of this data is user-specific, some is globally replicated. In any case, this goal makes it easy to use one set of code to access data in any of the shards without having to cross to another shard or database to get the answer for a question. This reduces workload in the application and on other database servers.

Item 2 is done by hashing our key data. Let's say that we have a set of widgets that users are concerned with. Some users have a few widgets, some have a lot, but each user is very different from another. Widgets are pretty common and well defined. Each user has a user ID and any question we ask the system always involves a specific user ID. So - our key data we hash against in this case would be the user ID. Data about the widgets is replicated to all the shards, but data about each user is only kept on the shard where that user's data lives.

Item 3 is handled by a separate process that utilizes the same API the application uses. Balancing the data between shards is simple - the balancer asks the API if there are any users that need to move. If yes, the balancer lets the API know to lock that user temporarily, moves the data, then unlocks the users for use on the new shard. What this means for applications is each time a location is returned for a specific user, that location is only guaranteed for a given window of time (30 seconds for example). So - when the balancer tells the API it's moving a user's records, any requests for that user's records are held up until the user's data is moved. The API is smart enough to only let the balancer move data that has not been accessed recently. This doesn't prevent all lock collisions, but it handles most of them.

Item 4 is handled through the configuration of the API. Because we use an API to tell the application where the data is for a given user, we've abstracted away where data actually lives. This makes it easy to add and remove servers from the sharding pool. We've extended this to include allowing a shard to be marked as in a draining state. When a shard is draining, the API will ask the balancer to move rows from the draining shard and redistribute that information onto other members of the sharding pool. This makes it possible to take a shard out of rotation for routine maintenance without the loss of data.

Notice that I didn't mention any specific software here. I didn't tell you what language the application is written in, what language the API is written in, or what the actual data store was. The technique of sharding data is pretty simple and can be done with nearly any persistence layer using any programming language.

The beauty of this system is that once the API is written, the balancer can be a complete "black box" to the application. This type of system could be implemented with a data store when just starting out and be expanded to multiple stores as the need expands. Also - sharding key needs to change, again, the application doesn't need to change - just the API and the balancer.

One other big benefit to sharding data like this - it's often a lot cheaper to buy several smaller systems than to buy and maintain one very large system. If one of the systems in the sharding pool goes off-line, the worst possible exposure in a shared-nothing sharding system is the data stored on the member that went down. In a monolithic system, you stand to lose a lot more.

While I wouldn't suggest trying to do this type of work on top of every data set out there, I do see that there is a lot of benefit when the types of questions being asked of a data set can be divided up easily while still making it relatively easy to answer the "question at hand" from a single source. The secret in the sauce is making sure that any common data is shared among all the systems in the pool.

Sunday, January 15, 2012

Managing incoming emails

Reading emails all day long tends to be very counter-productive for me. I usually end up responding faster than anyone else which generally gets me a lot more work than I need. At the same time, I have a responsibility during my times as primary and secondary on-call to respond within our service level agreement. So - how do I find balance? My team and I use mailing lists to help us manage those truly urgent issues versus those issues that can be handled as time allows. We have three lists:


We've published these three lists to our operations center. Everyone else just gets the admin list. We don't tell others about the primary and secondary lists because anything we'd get on primary or secondary would need to come via the operations center anyway. We also don't want our over 600 co-workers (not on our team and not in the NOC) to email us willy-nilly using our on-call emails.

Next, on each of our team's smart phones, we've set them up to recognize emails going specifically to the primary and secondary emails so our phones will either go off like a pager or (in my case) read the sender and destination email (think "Inbound Primary email from the NOC"). That prevents me from having to look at my phone every time a new message comes in but lets me know when there's something that requires my attention.

The other thing we do is to make it easy to change the destination for the primary address easily so that only primary gets notified. Secondary is notified in the same way but on my two-man team, there are only two of us so secondary always goes to the whole team (for now).

Finally, to help us have reasonable sanity, I do what I can to only check the "other" emails twice a day.

The  net result of this process is I am able to focus on getting project work done between routine email readings and it lets others figure things out for themselves or wait a bit for an answer. If it was truly urgent, the sender could simply ask the NOC to reach out to the on-call person to get a faster response.

How do you deal with your on-call processes and email?