RC: W4 D2 — Introducing locks to control concurrency
March 5, 2024In order to make storage engines performant, they are usually designed to be able to serve concurrent requests. However, it is also essential that no data is corrupted or partially written, which may happen for example if two write requests conflict and are handled concurrently.
To control concurrency, I will need to use locks. As far as my research and design goes, it seems that I will need two distinct kinds of locking mechanisms:
- Mutexes
- Read-write locks
A mutex is a mutual exclusive lock. It means that only one thread can have access to the resource being protected at a time, to the exclusion of all others.
A read-write lock allows for some level of concurrency: multiple threads may read the protected resource at the same time, but if one needs to write to it, it will need to be granted access to the exclusion of all others. In other words:
- There can be at most one thread writing to the protected resource — if that is the case, then no thread can read. This
means that, to acquire the
write_lock
, we must wait for all readers to be finished, and to acquire theread_lock
, we must wait for the writer (if any) to be finished. - There can be multiple readers (with no writer)
In Python, we can implement these two locking mechanisms with the threading
package.
The threading.Lock
object is already a mutex: only one thread may hold this lock at a given time.
Thus, there is virtually nothing to do: I will just define a Mutex
class that is a wrapper around this object (for
readability purposes).
As for the read-write lock, we can implement it by building on top of the threading.Lock
object as follows:
import threading
from contextlib import contextmanager
class Mutex(threading.Lock):
pass
class ReadWriteLock:
def __init__(self):
self.lock = threading.Lock()
self.readers = 0
self.writers = 0
self.write_requests = 0
self.condition = threading.Condition(self.lock)
@contextmanager
def read(self):
with self.condition:
while self.writers > 0 or self.write_requests > 0:
self.condition.wait()
self.readers += 1
yield
with self.condition:
self.readers -= 1
if self.readers == 0:
self.condition.notify_all()
@contextmanager
def write(self):
with self.condition:
self.write_requests += 1
while self.readers > 0 or self.writers > 0:
self.condition.wait()
self.write_requests -= 1
self.writers += 1
yield
with self.condition:
self.writers -= 1
self.condition.notify_all()
With this implementation, when a thread wants to write (self.write()
), it must wait until there is no reader or writer
(while self.readers > 0 or self.writers > 0: self.condition.wait()
).
Conversely, when a thread wants to read (self.read()
), it must wait until there is no current writer or writer waiting
(while self.writers > 0 or self.write_requests > 0: self.condition.wait()
).
Once the read or write operation is finished and conditions are met (no remaining readers in the case of a read
operation), all waiting threads are notified (self.condition.notify_all()
). Thus awakened, they can try to acquire the
lock again and proceed.
One last thing: the use of the @contextmanager
from contextlib
allows to call the .read()
and .write()
methods in a with
statement. Indeed, it creates a generator, and the code that is before the yield
statement is
executed when entering the context block, while the code that is after the yield
statement is executing upon exiting
the context block (whether it was exited normally or with an exception).
That’s it for today. Tomorrow, I will use those to handle the “freeze” operation on my memtables.