RC: W1 D3 — A quick overview of Bitcask

February 14, 2024

The first project I want to work on during my batch at RC is re-implementing Bitcask from the ground up.

What is Bitcask? It is a log-structured key-value store. The team that designed and implemented it published an article explaining the basic principles it relies on.

Because it is log-structured, data is always appended sequentially to a log file. Therefore, when the value of a key changes, the existing entry is not modified in place: instead, a new entry for the key is appended at the end of the log file.

For example, a log file containing 3 key-value pairs could look like the following:

key1;value1
key2;value2
key3;value3

If we want to update the value for key2 to another_value2, we would append it to the end of the file, resulting in:

key1;value1
key2;value2
key3;value3
key2;another_value2

Thus, inserting or updating any key-value pair is always done in constant time (O(1)). However, reading a value requires scanning the whole file to find the last entry recorded for the searched key (since the most up-to-date value is the last one written). The time complexity of a lookup is thus linear (O(n)), which is not great.
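
To make this concrete, here is a minimal sketch of such a naive store in Python (the key;value line format comes from the example above; the class and method names are just mine, for illustration). Appending is a single write at the end of the file, while a lookup has to scan every line:

class NaiveLogStore:
    def __init__(self, path):
        self.path = path

    def put(self, key, value):
        # O(1): always append a new entry at the end of the log file
        with open(self.path, "a") as f:
            f.write(f"{key};{value}\n")

    def get(self, key):
        # O(n): scan the whole file, keeping the last value seen for the key
        result = None
        with open(self.path) as f:
            for line in f:
                k, v = line.rstrip("\n").split(";", 1)
                if k == key:
                    result = v
        return result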

To overcome this limitation, the Bitcask team came up with the idea of using an in-memory hash table that they call a “keydir”: for each key, the keydir records the position in the file at which the most up-to-date value sits (as well as the size of the value, so we know where to stop reading when doing a lookup). A lookup therefore first searches the keydir to find the file and position where the value is written, then performs a single disk seek and read. This makes the complexity of a lookup drop from linear to constant time.
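
Here is the previous sketch extended with a keydir (again, a rough illustration under my own naming, not Bitcask's actual code): the keydir is a plain dict mapping each key to the offset and size of its latest value, so a read is one hash lookup plus one seek.

import os

class KeydirStore:
    def __init__(self, path):
        self.keydir = {}  # key -> (offset of the value in the file, size of the value)
        self.f = open(path, "a+b")

    def put(self, key, value):
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(f"{key};{value}\n".encode())
        self.f.flush()
        # the value starts right after "key;" and stops before the newline
        self.keydir[key] = (offset + len(key.encode()) + 1, len(value.encode()))

    def get(self, key):
        if key not in self.keydir:
            return None
        offset, size = self.keydir[key]
        # a single seek + read, no scanning
        self.f.seek(offset)
        return self.f.read(size).decode()

Since the keydir only lives in memory, the real Bitcask rebuilds it at startup by scanning the existing data files once; I am leaving that part out of this sketch.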

Because Bitcask needs to keep all keys in memory, it cannot function properly if the number of distinct keys is huge. However, it is a great storage engine for cases where a limited number of keys are updated often.

Another problem with this system is that, because it keeps appending forever, it will use a lot of disk space. The typical way to handle this is to perform a compaction process that merges the log files, keeping only the most recent value for each key. I’ll talk about this in a future post!