The "Wow" effect in CockroachDB

Note: the original version of this article is available here.

A few months ago, I was invited to present CockroachDB to a tech consulting office in Amsterdam. The audience was welcoming and receptive. They understood, appreciated, and lauded the "flagship" features of CockroachDB: distribution, scalability, high availability, operational simplicity.

Yet a question came up which I had not heard before: all these are features that solve known problems; now, what are the goodies?

The goodies, the asker clarified, are those features which:

  1. the user did not expect,
  2. are not present in other products, and
  3. are small-ish in nature so that a casual user can easily show them off to a peer.

Goodies enable users to brag about their product choice after the choice is made, without too much attention to the rational trade-offs that motivated the choice.

I paused, and recollected. What are CockroachDB's goodies?

Obviously, the main CockroachDB documentation is unlikely to highlight features in this way: it aims to treat all features as novel and useful, making no assumptions about which features a particular reader may prefer. Arguably, the doc site is also a marketing tool aimed at convincing people who do not use CockroachDB yet, so it is bound to focus primarily on CockroachDB's core features.

Finding "goodies" requires looking at the product as if all its core features were already familiar and uninteresting, and contemplating what pleasantly sticks out beyond that.

Searching for a fancy feature likely to elicit a "wow" reaction at a demonstration booth, I quickly thought of the Node map: a graphical visualisation of the geographical distribution of CockroachDB nodes around the world.

Arguably, this feature is very enterprise-y (and incidentally limited to deployments with an "Enterprise license"), and perhaps of limited use when the database operates properly.

We can instead look at the layer underneath, another goodie of a more technical nature: replication zones, which let a user configure which parts of which SQL tables are replicated on which (sub-)sets of cluster nodes.

The zone config language is a DSL (domain-specific language) which supports a constraint algebra against arbitrary attributes of the underlying data stores. It supports both positive (mandate) and negative (avoid) conjunctions (mandate/avoid compatibility with all properties) and disjunctions (mandate/avoid compatibility with either/or properties). Its constraint solver triggers automatic migrations which move the data to wherever the constraints require it to be. It also interacts peacefully and constructively with the automatic load balancing that happens independently to increase performance: data is migrated within its constrained zone to bring it closer to where it is needed.
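To make the positive/negative distinction concrete, here is a minimal Python sketch of how a conjunction of constraints could select eligible stores. It is a toy model and not CockroachDB's actual solver: the Store class, the store names, the attribute strings, and the "+attr" / "-attr" notation are all illustrative assumptions, and disjunctions are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    """A hypothetical data store described by a set of attribute strings."""
    name: str
    attrs: frozenset


def parse_constraints(spec):
    """Split items such as '+ssd' or '-region=west' into two sets:
    attributes that are required and attributes that are prohibited."""
    required, prohibited = set(), set()
    for item in spec:
        if item.startswith("+"):
            required.add(item[1:])
        elif item.startswith("-"):
            prohibited.add(item[1:])
        else:
            raise ValueError(f"constraint must start with + or -: {item!r}")
    return required, prohibited


def eligible_stores(stores, spec):
    """Return the names of stores that have every required attribute
    and none of the prohibited ones (a conjunction of constraints)."""
    required, prohibited = parse_constraints(spec)
    return [
        s.name
        for s in stores
        if required <= s.attrs and not (prohibited & s.attrs)
    ]


# Three made-up stores, for illustration only.
STORES = [
    Store("s1", frozenset({"ssd", "region=east"})),
    Store("s2", frozenset({"ssd", "region=west"})),
    Store("s3", frozenset({"hdd", "region=east"})),
]

# Require SSD storage and avoid the west region.
print(eligible_stores(STORES, ["+ssd", "-region=west"]))  # ['s1']
```

In this model, migration would amount to moving replicas off any store that drops out of the eligible list when the constraints change; the real system additionally balances load among the stores that remain eligible.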

I described this as a solid and serious feature that is both practically essential and appealing to an audience of erudite hackers. My audience agreed.

Having contributed to some parts of the code base, I am aware of several more goodies to which I contributed, directly or indirectly.

For example, CockroachDB integrates a fancy tracing infrastructure which can extract detailed debugging information. The collection of traces can be enabled using a variety of mechanisms depending on the troubleshooting scenario. For example, with SHOW TRACE one can request a detailed trace of all the processing done by CockroachDB on behalf of a single query, across all the abstraction layers inside CockroachDB and across all the nodes in the cluster that participated in the query's execution. Many other tracing endpoints beyond SHOW TRACE are also available, including via the web browser. It's also possible to trace all executions through particular files or functions in CockroachDB's source code.

Given the commonly known arduousness of debugging large distributed systems, developers will likely find some appeal in this powerful tool. It has certainly improved the life of CockroachDB's contributors already.

Speaking of which, a fancy advantage of exposing tracing data within SQL is that one can then further use SQL queries to filter, transform and reduce particular details of traces. In fact, CockroachDB generalizes this principle: any internal data produced by CockroachDB that can be structured as a table should be available for further processing by SQL queries.

Here, I am not considering that CockroachDB, like other SQL databases, exposes the SQL logical schema via SQL tables (e.g. information_schema) which can be queried for introspection.

Instead, beyond that, any configuration or administration SQL statements can also be used as a "virtual table" to query data. For example, there exists a SHOW JOBS statement that lists the current background tasks in the cluster (e.g. asynchronous online schema changes, such as adding an index on a very large table); given that this produces tabular data, one can refine the output with e.g. SELECT finished - created FROM [SHOW JOBS] to determine the execution time of completed jobs. This enables users to design their own views on the current status of their cluster, without the need to request an extension in CockroachDB's SQL syntax.
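As a sketch of how a client program might build such a view, the Python helper below sends the query from the text through a DB-API cursor. The helper name and the WHERE clause that filters out unfinished jobs are additions made for illustration; only the finished and created columns come from the example above, and the cursor is assumed to be provided by a PostgreSQL-compatible driver connected to a CockroachDB cluster.

```python
# Query from the text, restricted to jobs that have a finish time.
# The WHERE clause is an assumption added for this sketch.
JOB_DURATIONS_SQL = (
    "SELECT finished - created AS duration "
    "FROM [SHOW JOBS] "
    "WHERE finished IS NOT NULL"
)


def completed_job_durations(cursor):
    """Return one duration per completed job.

    `cursor` is any DB-API 2.0 cursor; no driver is imported here,
    so the caller supplies the connection.
    """
    cursor.execute(JOB_DURATIONS_SQL)
    return [row[0] for row in cursor.fetchall()]
```

With a driver such as psycopg2, the cursor would typically come from conn.cursor() on a connection to the cluster. The point of the design is that the statement output is consumed like any other table, so further filtering or aggregation can be pushed into the SQL text rather than done in client code.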

There is also a command-line SQL shell (invoked via cockroach sql), analogous to the psql shell. In fact, it's so compatible with it that psql can connect to a CockroachDB cluster, and cockroach sql can connect to a PostgreSQL database.

Despite its smaller set of features compared to psql, cockroach sql contains its own goodies. For example, both psql and cockroach sql can present guidance about the syntax and usage of a SQL statement using \h, but cockroach sql can also present this help when the user types ?? and then presses the tab key in the middle of entering a query. This provides contextual help without erasing the current entry, which is particularly convenient while experimenting. To ease experimentation further, cockroach sql also supports \hf (not known to psql), which pulls up the documentation of individual SQL built-in functions.

On a related note, the cockroach executable program contains many other functions besides the main server function (start, quit, etc.) and the SQL shell (sql). Some of them are gems in their own right.

cockroach demo is a fantastic entry point for beginners, and for teachers constructing a SQL tutorial: in one fell swoop, it starts a RAM-only CockroachDB server and an interactive SQL shell, with no additional configuration needed. Type this command in, then you can start typing SQL immediately and work with CockroachDB. Lovers of sqlite tend to like this a lot. (I do too. It's gorgeously helpful to try out new code during development.)

cockroach gen man generates CockroachDB's Unix manual pages automatically, ready to read or install. Cockroach Labs distributes a single cockroach binary to simplify the download process, but you can still install its documentation the Right Way, as for all your other beloved Unix programs.

cockroach gen autocomplete generates auto-completion data for either Bash or Zsh. Avid users of the CockroachDB command line will surely appreciate this convenience, which is designed to accelerate operations and maintenance.

There is even an Easter egg hidden in cockroach gen somewhere, but I am not telling. Will you be able to spot the CockroachDB logo?

There is much I could write about CockroachDB's unique technical features. Yet, at this point, I would like to shift this exposé and underline that CockroachDB's own documentation site is itself quite a unique achievement. To a casual observer, "it's just a documentation site for a technical project". But for connoisseurs of documentation, there is much to love.

For example, the documentation presents content both from the angle of usage scenarios (e.g. "how to do this or that") and as a reference manual (i.e. "what is everything I need to know about an aspect of the product"). There is content both for absolute beginners (e.g. "Getting started" guides) and technical audiences (e.g. an in-depth presentation of CockroachDB's architecture). Cross-references are exhaustive and relevant, so that it is particularly easy to idly surf from one area to another, much like one can educate themselves by casually browsing Wikipedia. Each documentation page has a link in the top right where the reader can become an editor and propose improvements (even propose direct changes to the text). For a project as young as CockroachDB, the maturity of its documentation is remarkable. (Disclaimer: I have personally contributed to parts of it. I am very proud.)

To conclude, I would say it is rather easy to find ways to like (or even love) CockroachDB beyond the moment you decide that it is suitable for your purpose. Plenty of goodies indeed.


Copyright © 2018 Raphael ‘kena’ Poss. Permission is granted to distribute, reuse and modify this document according to the terms of the Creative Commons Attribution-ShareAlike 4.0 International License.