RC: W7 D1 — Reading a key from SSTables
March 25, 2024Now that memtables are flushed to disk, we need to look up records on SSTables as well.
Updating the get
method of the storage engine is straightforward: we still need to find the most recently written
record that has the searched key. Thus, we look:
- First in the current memtable
- Second in all immutables memtables (still from most recently to least recently written)
- Last in SSTables.
For now, I will consider that I have a cascade of SSTables and will search them from most recent to oldest (though this is likely to change soon).
To find a record within one SSTable, I don’t want to read the whole SSTable (because it could be up to 256MB). Instead, here is how we can proceed:
- Find the data block that is likely to contain the searched key
- Read the corresponding block
- Find the record in the block (or figure that there is no such record there)
For the first step, I needed to refactor some of my code. Indeed, I don’t want to have to read the content of all data blocks: I should be able to do it by reading only the metadata. But so far, in the metadata, I have kept only the offsets at which each data block starts. So, in each meta block (that contains the relevant info for the associated data block), I added the content boundaries (i.e. the first and last key of each data block).
My meta blocks are now of this form:
+---------+--------------+---------+--------------+---------+
| FK_size | FK | LK_size | LK | offset |
+---------+--------------+---------+--------------+---------+
| 16 bits | FK_size bits | 16 bits | LK_size bits | 32 bits |
+---------+--------------+---------+--------------+---------+
(FK = First Key, LK = Last Key, offset = start position of the data block)
Now that I have this format, to find the data block that is likely to contain the searched key, I can iterate over all meta blocks until I find one whose first and last keys are respectively sorted before and after the searched key.
Next, to read the data block in memory, I can just read the chunk that is between the offset of the selected block and
that of the next and decode it in a DataBlock
object.
Last, to find the record in the block, I need to iterate over all records in the data block until I find it (or get to a record which is supposed to be after the current key – in that case, the record is not in the SSTable searched).
On a side note, I have been wondering whether it was possible to do something more optimized: here, I iterate over all
blocks to find the correct one and then over all records, which means that we are looking in o(n)
time within each
SSTable.
I thought about implementing a binary search to make it o(log(n))
instead. However, meta blocks are of variable size (
because of the first and last keys) and we don’t know the size before reading the metablock.
So, a priori, there is no way to position the cursor to a meta block in the middle (unless by keeping offsets of all
meta blocks of all SSTables in memory, but that would not make much sense if we need to store everything written to disk
in memory).
As for records within data blocks, I guess I could potentially perform a binary search by positioning two cursors at the
first and last offsets, then move the cursor to these two offsets to read both records, and after figuring out which of
the two should move in the middle, move that cursor to the midpoint offset, and so on and so forth, alternating between
reading offsets and reading records.
I haven’t implemented it but might do it at some point.