RC: W4 D1 — Designing the in-memory component of my LSM storage engine

March 4, 2024

Today, I started designing at a high-level what the in-memory component (“memtable”) of my LSM storage engine would look like.

LSM-based storage engines usually contain a cascade of memtables: only one is active (the one in which key-value pairs are written) and the others are immutable and will be flushed to disk at some point. This design can be summed up with the drawing hereunder:

                                                       Key-value pair
                                                       
                                                             |
                                                             | Write
                                                             |
                                                             V
     +------------+         +------------+    Freeze   +------------+
     | Memtable n |   ...   | Memtable 1 |  <--------  | Memtable 0 |
     +------------+         +------------+             +------------+
           |
           | Flush to disk
           |
           V
      +---------+   +---------+       +---------+
      | SSTable |   | SSTable |  ...  | SSTable |
      +---------+   +---------+       +---------+

So far, I have implemented a red-black tree that works for insertions and lookups. This will be the data structure for my memtables, but other data structures could have worked as well, such as AVL trees or skip lists. (After having implemented my red-black tree, I discovered that real systems in production actually use skip lists… 😅. I will keep mine as a red-black tree, and might change this to a skip list later if I have time!) As far as I know, it doesn’t seem to matter much which data structure we choose, as long as it allows to keep the data sorted and allows for inserts and lookups in logarithmic time.

Here are the other objects that I will be using:

Today, I dealt with record encoding and decoding in the Record class. Tomorrow, I will start digging into the “freezing” of memtables!