RC: W4 D2 — Introducing locks to control concurrency

March 5, 2024

To be performant, storage engines are usually designed to serve concurrent requests. However, it is also essential that no data gets corrupted or partially written, which may happen, for example, if two conflicting write requests are handled concurrently.

To control concurrency, I will need locks. As far as my research and design go, it seems that I will need two distinct kinds of locking mechanisms:

  1. Mutexes
  2. Read-write locks

A mutex is a mutual exclusion lock: only one thread at a time can access the protected resource, to the exclusion of all others.
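As a minimal sketch (the counter and the increment function are just illustrative, not part of the storage engine), here is a mutex protecting a shared counter from concurrent increments:

import threading

counter = 0
counter_mutex = threading.Lock()  # the mutex protecting `counter`

def increment(n):
    global counter
    for _ in range(n):
        # Only one thread at a time may enter this block.
        with counter_mutex:
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # always 40000; without the mutex, updates could be lost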

A read-write lock allows for some level of concurrency: multiple threads may read the protected resource at the same time, but a thread that needs to write to it must be granted exclusive access. In other words: reads can be concurrent, writes are exclusive.

In Python, we can implement these two locking mechanisms with the threading module. threading.Lock already behaves as a mutex: only one thread may hold the lock at a given time. Thus, there is virtually nothing to do: I will just define Mutex as an alias for it (for readability purposes). As for the read-write lock, we can build it on top of threading.Lock and threading.Condition as follows:

import threading
from contextlib import contextmanager


# threading.Lock is a factory function, not a class, so it cannot be subclassed:
# an alias is enough to expose it under the Mutex name.
Mutex = threading.Lock


class ReadWriteLock:
    def __init__(self):
        self.lock = threading.Lock()
        self.readers = 0          # number of threads currently reading
        self.writers = 0          # number of threads currently writing (0 or 1)
        self.write_requests = 0   # number of threads waiting to write
        self.condition = threading.Condition(self.lock)

    @contextmanager
    def read(self):
        with self.condition:
            # Wait until there is no active writer and no pending write request.
            while self.writers > 0 or self.write_requests > 0:
                self.condition.wait()
            self.readers += 1
        try:
            yield
        finally:
            with self.condition:
                self.readers -= 1
                if self.readers == 0:
                    self.condition.notify_all()

    @contextmanager
    def write(self):
        with self.condition:
            self.write_requests += 1
            # Wait until there is no active reader and no active writer.
            while self.readers > 0 or self.writers > 0:
                self.condition.wait()
            self.write_requests -= 1
            self.writers += 1
        try:
            yield
        finally:
            with self.condition:
                self.writers -= 1
                self.condition.notify_all()

With this implementation, a thread that wants to write (self.write()) must wait until there is no active reader and no active writer (while self.readers > 0 or self.writers > 0: self.condition.wait()). Conversely, a thread that wants to read (self.read()) must wait until there is no active writer and no pending write request (while self.writers > 0 or self.write_requests > 0: self.condition.wait()). Giving pending writers priority over new readers prevents a steady stream of readers from starving the writers.

Once a read or write operation is finished and the relevant condition is met (no remaining readers, in the case of a read operation), all waiting threads are notified (self.condition.notify_all()). Once awakened, they re-check their wait condition and proceed if it now holds.
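For example, here is a small, hypothetical usage sketch (the shared dictionary and the reader/writer functions are purely illustrative) that relies on the ReadWriteLock class above: several readers may hold the lock at the same time, while the writer gets exclusive access.

import threading
import time

rw_lock = ReadWriteLock()
store = {}  # some shared state, e.g. an in-memory key-value map

def reader(name):
    with rw_lock.read():      # several readers may be in here at once
        value = store.get("key")
        time.sleep(0.1)       # simulate a slow read
        print(f"{name} read {value!r}")

def writer():
    with rw_lock.write():     # exclusive access: no readers, no other writer
        store["key"] = "value"
        print("writer wrote 'value'")

threads = [threading.Thread(target=reader, args=(f"reader-{i}",)) for i in range(3)]
threads.append(threading.Thread(target=writer))
for t in threads:
    t.start()
for t in threads:
    t.join()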

One last thing: the use of @contextmanager from contextlib allows the .read() and .write() methods to be called in a with statement. It turns each method into a generator: the code before the yield statement runs when entering the with block, and the code after it runs when leaving the block. Wrapping the yield in try/finally guarantees that the cleanup code runs even if the block is exited with an exception.
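To make that mechanism concrete, here is a tiny, self-contained sketch (unrelated to the lock itself) showing which part of a @contextmanager generator runs when:

from contextlib import contextmanager

@contextmanager
def demo():
    print("entering")     # runs when the with block is entered
    try:
        yield "resource"  # the value bound by `as` in the with statement
    finally:
        print("exiting")  # runs when the with block exits, even on an exception

with demo() as value:
    print("inside with", value)

# Output:
# entering
# inside with resource
# exiting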

That’s it for today. Tomorrow, I will use those to handle the “freeze” operation on my memtables.