RC: W8 D1 — Two ways to create iterators

April 1, 2024

Many components in my LSM engine (memtable, SSTable, data block) need to be both serialized and iterated over. So far, both responsibilities were held in only one class for each of those components and things were starting to get a bit messy there. So I decided to decouple those two behaviors in distinct classes: I will keep all things related to serialization in the MemTable, SSTable and DataBlock components. As for seeking into each component, I will create a dedicated [Component]Iterator class for each component whose responsibility will be to iterate over all records in that component in a sorted manner. On a side note, RocksDB have several kinds of dedicated iterators for this exact purpose, as mentioned on their wiki.

When implementing an iterator, we need to conform to the iterator protocol. This means we need to implement two methods:

the __iter__ method that returns the iterator itself (i.e. it will iterate over all items)
the __next__ method that either returns the next item or raises a StopIteration exception when there are no more items. This is the method called at each loop increment.

So I started with a base class that contains those two methods:

class BaseIterator:
    def __init__(self):
        pass

    def __iter__(self):
        raise NotImplementedError()

    def __next__(self):
        raise NotImplementedError()

Raising a NotImplementedError in this base class is intentional: this is to indicate that all classes inheriting from this one will need to implement these two methods.

I have two kinds of components: those in-memory (the memtable) and those on-disk (the data block and SSTable). For those on-disk, it is easy to access each item based on its index. Indeed, since records are written to disk in sorted order, we only need to find the start and end offset for the i-th item and read between those two. Once it has been read, we can increment the index and move on to the next item. Here is how I implemented it for the data block:

class DataBlockIterator(BaseIterator):
    def __init__(self, block: DataBlock):
        super().__init__()
        self._index = 0
        self.block = block

    def __iter__(self) -> "DataBlockIterator":
        return self

    def __next__(self) -> Record:
        # Stop iterating when there are no more items
        if self._index >= len(self.block.offsets):
            raise StopIteration()

        # Retrieve the offset
        offsets = self.block.offsets + [len(self.block.data)]
        start = offsets[self._index]
        end = offsets[self._index + 1]

        # Increment index for next iteration
        self._index += 1

        # Retrieve and decode the record
        encoded_record = self.block.data[start:end]
        return Record.from_bytes(data=encoded_record)

In other words, the iterator records the index of the item that is to be popped as well as the component it is iterating over, and that’s it. Implementing the SSTableIterator was fairly similar to the DataBlockIterator, except that we iterate over blocks.

However, when it comes to the memtable, we cannot access one item based on its index because the data is kept in a red-black tree, in which there is no explicit mapping of each item to its index. However, since I already have an iterator on the red-black trees that performs an in-order traversal, it is easy to use this to retrieve the next item. Here is how I implemented it:

class MemTableIterator(BaseIterator):
    def __init__(self, memtable: MemTable):
        super().__init__()
        self.generator = iter(memtable.red_black_tree)
        self.current = None

    def __iter__(self) -> "MemTableIterator":
        return self

    def __next__(self) -> Record:
        # Stop iterating when there are no more items
        self.current = next(self.generator, None)
        if self.current is None:
            raise StopIteration

        # Decode the record
        encoded_record = self.current
        return Record.from_bytes(data=encoded_record)

Here, the MemTableIterator keeps track of the generator and current item (instead of the object and current index).

Overall, the rule of thumb that I used to know which kind of iterator I need is:

if each item can be accessed based on its index, then the iterator should record the current index and increment it every time the next item is accessed;
otherwise, the iterator should record the current item as well as an internal generator to find the next one.

With this refactoring done, implementing scans to perform range queries should be easier. I will tackle this tomorrow!