RC: W3 D1 — Tombstones

February 26, 2024

Today, I decided I would add one missing feature to my storage engine: the ability to delete records.

But wait: this is a log-structured storage engine. Which means that we only ever append to a file. So how can we delete a record? The answer is: we add a special kind of record that is called a tombstone. So basically, whenever we add a tombstone, the storage engine has to understand that the key no longer has any value associated to it.

In the case of Bitcask, I figured this would have to be handled at two places:

  1. When we insert a tombstone, we want to remove the corresponding key from the keydir right away, so that we don’t even bother reading the datafile (an alternative would have been to point to the tombstone record, seek it and return ‘None’ when decoding it — but this seemed like unnecessary overhead)
  2. When we perform the merge process, every time we come across a tombstone, we need to:
    • Update the keydir so that we stop looking for this key when it is a tombstone
    • Nonetheless record the tombstone in the merged file (this is in case the key is still present in older data files)

Now, how can we encode a tombstone? I guess we have several options (this list is likely not exhaustive):

I decided to go for the third options because it seemed to be the most efficient in terms of space. Of course, now that I have finished it and that am writing it down, I am wondering if this was actually a good idea. Answer: probably not, because I can’t differentiate between a value that is an empty string and a tombstone then… Well, I’ll leave it as is for now and figure it out later!