RC: W11 D3 — Storing and retrieving historical versions in SSTables
April 24, 2024The memtables have been updated to store multiple versions for a given key. Since these records are flushed to disk, all the versions need to be written in the SSTables.
To write all versions, I simply used the FlushMemtableIterator
that I implemented yesterday:
it is an iterator that yields all versions for each key, from most recent to oldest.
And with this, all versions get written in this order in the SSTable.
Having more recent versions written before older ones matters: it allows to make lookups more efficient.
Indeed, when applying a get
or a scan
on the SSTable, the most recent version should be returned.
Because it is written first, once the first value is found, we can return it without needing to read all other existing
versions.
So I updated my SSTableIterator
for it to return only the first version found, and that was it!
There is one more difficulty though: SSTables are broken down into multiple data blocks (which correspond to the amount
that can hold onto one SSD page).
Thus, the SSTableIterator
uses the DataBlockIterator
to iterate over one data block at a time.
Until now, the process to find a record was:
- Read the meta blocks (that contain the first and last key of each data block) to identify which data block may contain the record
- Use the
DataBlockIterator
to iterate on that candidate data block - If the key is found inside, return the record; otherwise, state that the record was not in the SSTable.
However, because we are now writing multiple versions, these may be scattered over two data blocks. Therefore, I needed to update that strategy to keep on searching on the next data block if the record with the correct version is not in the first data block searched. Indeed, this matters to retrieve a historical version: to find such a version, I can pass the max last sequence number allowed, and return the first record which has a sequence number equal or less than that number. And with all these changes, I can now access a historical version of a record inside a given SSTable. That’s great! Tomorrow, I will try to see how I can leverage this to be able to retrieve a historical version on the full engine.