RC: W10 D3 — Using property-based testing on my red-black tree implementation

April 17, 2024

While I was pairing with Ishan the other day, he mentioned that property-based testing was a powerful technique to validate the implementation of data structures. This technique consists in testing that some specific properties that are expected for a given algorithm actually hold true on one’s implementation. In practice, we select one particular property that is supposed to always hold true, we generate random inputs multiple times and assert that the property is not violated in any case.

For example, a red-black tree has multiple properties, including (but not limited to) the following:

Red nodes cannot have red children;
The in-order traversal of the tree should always result in a sorted list of elements;
The height of the tree should always be at most \(2 \times log_2(n+1)\) (the proof for this can be found in this lecture from Duke university).

To use property-based testing, values to insert in the red-black tree can be generated randomly. For example, here is a simple function that generates 20 sets of 5 to 200 keys whose value range between 1 and 1 million:

import random

def generate_random_keys_lists(nb_tests=20):
    nb_nodes = random.randint(5, 200)

    return [random.sample(range(1, 10 ** 6), nb_nodes) for _ in range(nb_tests)]

Each of those 20 sets can then be used once in a single test that checks one of the aforementioned properties. For example, to test that the in-order traversal results in a sorted list of elements, here is how to proceed:

@pytest.mark.parametrize("keys", generate_random_keys_lists())
def test_all_nodes_are_in_order(keys):
    # GIVEN
    tree = RedBlackTree()

    # WHEN
    for key in keys:
        tree.insert(key=key, data=str(key).encode(encoding="utf-8"))

    # THEN
    assert tree.in_order_traversal() == sorted(keys)

The @pytest.mark.parametrize("keys", generate_random_keys_lists()) statement allows to run the test using each set of values generated by generate_random_keys_lists() as the keys variable in the test.

Overall, property-based testing is extremely easy to implement and is very powerful: since input values are generated randomly multiple times, it becomes very easy to spot any incoherent result and thus find bugs in the implementation.