RC: W3 D3 — LSM storage engines

February 28, 2024

Now that I have covered everything I wanted in Bitcask, I decided I would start another storage engine: one based on LSM-trees, and thus called an LSM storage engine.

LSM stands for “log-structured merged” and, as the name implies, LSM engines are log-structured, which means that we only append to the file. Therefore, as for Bitcask, writes are fast but reads are slow (we would need to read the whole file to find the most recent record). In Bitcask, reads are made fast thanks to an in-memory hash index that points to the position of the record in the file. The drawback of this approach is that it requires keeping all keys in memory (thus, it works well if we have a limited number of keys).

In contrast, LSM engines work for any number of records. The approach is to have records sorted by key in each file. This has two benefits:

  1. We can keep only a sparse index (i.e. one key every n keys): because they are sorted, we know that key “ABC” will be somewhere between keys “AAA” and “ACA”. So we can keep only keys “AAA” and “ACA” in the sparse index (thus saving space in memory) and read all records between “AAA” and “ACA” to find record “ABC” since they are sorted on the file;
  2. The merge process is faster: since datafiles are sorted by key, we can perform a mergesort algorithm.

Datafiles that are sorted by keys are called “SSTables” (= Sorted String Tables).

The remaining question is: how can we write files that are immutable but nevertheless sorted, knowing that writes could occur in any order? Answer: LSM-trees store records in memory and sort them on the fly (usually using a red-black tree or an AVL tree) and only once they get bigger than a certain size, they are flushed to disk. With this approach, records can be written on disk in an append-only manner but nevertheless be sorted.