RC: W5 D4 — Diving into Python's source code for comparisons

March 14, 2024

After understanding that the heapq package relied on the way comparisons are implemented, I started wondering about those. So today, I experimented a little bit with comparison dunder methods (__lt__, __gt__, __le__, __eq__, …).

To experiment, I created a simple TestCompare class that implements an __eq__ and an __lt__ dunder methods:

class TestCompare:
    def __init__(self, key, id):
        self.key = key
        self.id = id

    def __eq__(self, other) -> bool:
        print(f"__eq__: self ({self.id}) VS other ({other.id})")
        return self.key == other.key

    def __lt__(self, other):
        print(f"__lt__: self ({self.id}) VS other ({other.id})")
        return self.key < other.key

If I test for equality between two instances of this class, we see that the __eq__ method of the element on the left is called:

>>> TestCompare(key="a", id=1) == TestCompare(key="a", id=2)
__eq__: self (1) VS other (2)
True
>>> TestCompare(key="a", id=2) == TestCompare(key="a", id=1)
__eq__: self (2) VS other (1)
True

Similarly, if I test for inferiority, we see that the __lt__ method of the element on the left is called:

>>> TestCompare(key="a", id=1) < TestCompare(key="a", id=2)
__lt__: self (1) VS other (2)
False

So far so good: everything behaves as expected.

If I test for superiority though, I get an interesting result:

>>> TestCompare(key="a", id=1) > TestCompare(key="a", id=2)
__lt__: self (2) VS other (1)
False

The __gt__ method (“greater than”) is not implemented. Thus, it seems that Python uses the __lt__ method of the element on the right when the __gt__ method of the element on the left is not implemented. Interesting!

With this in mind, I thought that calling <= would probably check both inferiority (< with __lt__) and equality (= with __eq__) when the “lower or equal” dunder method (__le__) was not implemented. Let’s check this:

>>> TestCompare(key="a", id=1) <= TestCompare(key="a", id=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<=' not supported between instances of 'TestCompare' and 'TestCompare'

Wrong! It throws an error saying that <= is not implemented.

All of this got me wondering as to why Python behaves this way, and I started reading the source code.

The code chunk that implements comparisons is in the object.c file within the do_richcompare function. The part that answers my question is the following:

do_richcompare(PyThreadState *tstate, PyObject *v, PyObject *w, int op)
{
    richcmpfunc f;
    PyObject *res;
    int checked_reverse_op = 0;

    if (!Py_IS_TYPE(v, Py_TYPE(w)) &&
        PyType_IsSubtype(Py_TYPE(w), Py_TYPE(v)) &&
        (f = Py_TYPE(w)->tp_richcompare) != NULL) {
        checked_reverse_op = 1;
        res = (*f)(w, v, _Py_SwappedOp[op]);
        if (res != Py_NotImplemented)
            return res;
        Py_DECREF(res);
    }
    if ((f = Py_TYPE(v)->tp_richcompare) != NULL) {
        res = (*f)(v, w, op);
        if (res != Py_NotImplemented)
            return res;
        Py_DECREF(res);
    }
    if (!checked_reverse_op && (f = Py_TYPE(w)->tp_richcompare) != NULL) {
        res = (*f)(w, v, _Py_SwappedOp[op]);
        if (res != Py_NotImplemented)
            return res;
        Py_DECREF(res);
    }

    /* ... */
}

We have three if statements in there. The last two seem to correspond to my observations:

It tries the direct comparison operation from the first element v (if ((f = Py_TYPE(v)->tp_richcompare) != NULL) { res = (*f)(v, w, op); /* ...*/ })
Otherwise, it tries the reverse operation from the second element w (if ((f = Py_TYPE(w)->tp_richcompare) != NULL) { res = (*f)(w, v, _Py_SwappedOp[op]); /* ...*/ }).

Interestingly, there is another if statement that is tested before those two. This checks if the second element w is a subtype of the first one v (if (!Py_IS_TYPE(v, Py_TYPE(w)) && PyType_IsSubtype(Py_TYPE(w), Py_TYPE(v)) && (f = Py_TYPE(w)->tp_richcompare) != NULL)) and performs the reverse operation on the second element if that is the case (res = (*f)(w, v, _Py_SwappedOp[op]);).

This implementation detail allows to handle polymorphism correctly. Indeed, if a subclass has reimplemented a comparison method that was already implemented in the parent class, we want to use the most specific method (that of the subclass). So this extra check allows to handle subtype polymorphism as we expect it to behave!