In the ongoing war of words between Amazon and Larry Ellison, Amazon published best practices for migrating an Oracle database to Amazon RDS, and it’s chock full of saucy details. For instance, I wasn’t aware of the “Amazon Database Freedom Program”, which just gives free AWS credits to anyone ‘migrating from commercial engines’. “Are you operating with old world databases”, a slide asks, with hilarious icons that make it very clear who they’re talking about. The number and assortment of tools that Amazon has built to “free” people from their Oracle installations is quite incredible. Take a look.
In a more eyebrow-raising move, Amazon Distinguished Engineer James Hamilton threw some more direct shade at Oracle, calling out Ellison for his constant lying about Amazon’s use of Oracle technologies. And there’s Andy Jassy’s tweet, which I’m just going to quote here in full:
In latest episode of "uh huh, keep talkin' Larry," Amazon’s Consumer business turned off its Oracle data warehouse Nov 1 and moved to Redshift. By end of 2018, they'll have 88% of their Oracle DBs (and 97% of critical system DBs) moved to Aurora and DynamoDB. #DBFreedom
You’ll note that with this much shade and anger, that number is still 88%. I expect that Larry will be auditing Amazon this December.
If you’re building a new database, what are the criteria you’ll use to pick your storage layer? The textbook answer is that there’s a handful of questions you’d ask yourself:
The careful CTO will thoughtfully examine each of these trade-offs, discuss with stakeholders, run meticulous performance benchmarks, … and then choose RocksDB.
Look, I’m not even aware of any remotely mature competitors to RocksDB, and certainly none can match the trustworthiness of being in production at Facebook, and more generally, being in widespread use by other databases. It’s got a ridiculously long list of features that it supports. Every now and then somebody comes to my insect-themed database and asks if we can swap out RocksDB for another storage engine, and we start to enumerate the list of reasons why not… and quickly give up because even 10% of the list is enough to make the point. Basically if you’re designing a new database, there are so many other factors to keep in mind, that RocksDB sort of takes the role of IBM in that nobody’s database ever failed because of their choice of RocksDB.
We haven’t even discussed more sophisticated data structures like Bw-Trees. But fundamentally, if you want to build a B-Tree-based storage engine, all that engineering is on you. As a database vendor, you’ve got 4 main worries: A, C, I, and D. Pick RocksDB and you’re basically getting the A and D for free, and your worries are down to C and I. It’s not a difficult choice.
There’s a general principle at play here, which is that as a software component gets more complex, the reasons for using it shift from technical to social.
FoundationDB’s new release includes features for multi-region deployments:
FoundationDB 6.0 introduces native multi-region support to dramatically increase your database's global availability. Seamless failover between regions is now possible, allowing your cluster to survive the near-simultaneous loss of an entire region with no service interruption. These features can be deployed so clients experience low-latency, single-region writes.
The gist of what has changed is that they baked in “regions” as a concept that allows users to express the globe-level topology of their cluster. As “planet-scale” becomes more of a buzzword, expect to see more of this, sooner in the lifecycles of products.
We’re at the point on the capability vs. suitability curve where we have real problems we need to solve (how do we wrangle the complexity of a database whose components’ communication is limited significantly by the speed of light?), but I’m not sure we yet have enough experience as an industry to say what the right abstractions are. Christopher Meiklejohn has been beating this drum for a while:
We posit that striving for distributed systems that provide “single system image” semantics is fundamentally flawed and at odds with how systems operate in the physical world.
I will note that this is a direct violation of Rule 11 of Codd’s 12 Rules of Databases, which only goes to show that every day we stray further from Codd’s light.