RC: W11 D3 — Storing and retrieving historical versions in SSTables

April 24, 2024

The memtables have been updated to store multiple versions for a given key. Since these records are flushed to disk, all the versions need to be written in the SSTables.

To write all versions, I simply used the FlushMemtableIterator that I implemented yesterday: it is an iterator that yields all versions for each key, from most recent to oldest. And with this, all versions get written in this order in the SSTable. Having more recent versions written before older ones matters: it allows to make lookups more efficient. Indeed, when applying a get or a scan on the SSTable, the most recent version should be returned. Because it is written first, once the first value is found, we can return it without needing to read all other existing versions. So I updated my SSTableIterator for it to return only the first version found, and that was it!

There is one more difficulty though: SSTables are broken down into multiple data blocks (which correspond to the amount that can hold onto one SSD page). Thus, the SSTableIterator uses the DataBlockIterator to iterate over one data block at a time. Until now, the process to find a record was:

However, because we are now writing multiple versions, these may be scattered over two data blocks. Therefore, I needed to update that strategy to keep on searching on the next data block if the record with the correct version is not in the first data block searched. Indeed, this matters to retrieve a historical version: to find such a version, I can pass the max last sequence number allowed, and return the first record which has a sequence number equal or less than that number. And with all these changes, I can now access a historical version of a record inside a given SSTable. That’s great! Tomorrow, I will try to see how I can leverage this to be able to retrieve a historical version on the full engine.