RC: W8 D1 — Two ways to create iterators
April 1, 2024Many components in my LSM engine (memtable, SSTable, data block) need to be both serialized and iterated over.
So far, both responsibilities were held in only one class for each of those components and things were starting to get
a bit messy there.
So I decided to decouple those two behaviors in distinct classes: I will keep all things related to serialization in
the MemTable
, SSTable
and DataBlock
components.
As for seeking into each component, I will create a dedicated [Component]Iterator
class for each component whose
responsibility will be to iterate over all records in that component in a sorted manner.
On a side note, RocksDB have several kinds of dedicated iterators for this exact purpose, as mentioned
on their wiki.
When implementing an iterator, we need to conform to the iterator protocol. This means we need to implement two methods:
- the
__iter__
method that returns the iterator itself (i.e. it will iterate over all items) - the
__next__
method that either returns the next item or raises aStopIteration
exception when there are no more items. This is the method called at each loop increment.
So I started with a base class that contains those two methods:
class BaseIterator:
def __init__(self):
pass
def __iter__(self):
raise NotImplementedError()
def __next__(self):
raise NotImplementedError()
Raising a NotImplementedError
in this base class is intentional: this is to indicate that all classes inheriting from
this one will need to implement these two methods.
I have two kinds of components: those in-memory (the memtable) and those on-disk (the data block and SSTable).
For those on-disk, it is easy to access each item based on its index.
Indeed, since records are written to disk in sorted order, we only need to find the start and end offset for the i
-th
item and read between those two.
Once it has been read, we can increment the index and move on to the next item.
Here is how I implemented it for the data block:
class DataBlockIterator(BaseIterator):
def __init__(self, block: DataBlock):
super().__init__()
self._index = 0
self.block = block
def __iter__(self) -> "DataBlockIterator":
return self
def __next__(self) -> Record:
# Stop iterating when there are no more items
if self._index >= len(self.block.offsets):
raise StopIteration()
# Retrieve the offset
offsets = self.block.offsets + [len(self.block.data)]
start = offsets[self._index]
end = offsets[self._index + 1]
# Increment index for next iteration
self._index += 1
# Retrieve and decode the record
encoded_record = self.block.data[start:end]
return Record.from_bytes(data=encoded_record)
In other words, the iterator records the index of the item that is to be popped as well as the component it is iterating
over, and that’s it.
Implementing the SSTableIterator
was fairly similar to the DataBlockIterator
, except that we iterate over blocks.
However, when it comes to the memtable, we cannot access one item based on its index because the data is kept in a red-black tree, in which there is no explicit mapping of each item to its index. However, since I already have an iterator on the red-black trees that performs an in-order traversal, it is easy to use this to retrieve the next item. Here is how I implemented it:
class MemTableIterator(BaseIterator):
def __init__(self, memtable: MemTable):
super().__init__()
self.generator = iter(memtable.red_black_tree)
self.current = None
def __iter__(self) -> "MemTableIterator":
return self
def __next__(self) -> Record:
# Stop iterating when there are no more items
self.current = next(self.generator, None)
if self.current is None:
raise StopIteration
# Decode the record
encoded_record = self.current
return Record.from_bytes(data=encoded_record)
Here, the MemTableIterator
keeps track of the generator and current item (instead of the object and current index).
Overall, the rule of thumb that I used to know which kind of iterator I need is:
- if each item can be accessed based on its index, then the iterator should record the current index and increment it every time the next item is accessed;
- otherwise, the iterator should record the current item as well as an internal generator to find the next one.
With this refactoring done, implementing scans to perform range queries should be easier. I will tackle this tomorrow!