RC: W5 D4 — Diving into Python's source code for comparisons
March 14, 2024After understanding that the heapq
package relied on the way comparisons are implemented, I started
wondering about those. So today, I experimented a little bit with comparison dunder methods (__lt__
, __gt__
,
__le__
, __eq__
, …).
To experiment, I created a simple TestCompare
class that implements an __eq__
and an __lt__
dunder methods:
class TestCompare:
def __init__(self, key, id):
self.key = key
self.id = id
def __eq__(self, other) -> bool:
print(f"__eq__: self ({self.id}) VS other ({other.id})")
return self.key == other.key
def __lt__(self, other):
print(f"__lt__: self ({self.id}) VS other ({other.id})")
return self.key < other.key
If I test for equality between two instances of this class, we see that the __eq__
method of the element on the left
is called:
>>> TestCompare(key="a", id=1) == TestCompare(key="a", id=2)
__eq__: self (1) VS other (2)
True
>>> TestCompare(key="a", id=2) == TestCompare(key="a", id=1)
__eq__: self (2) VS other (1)
True
Similarly, if I test for inferiority, we see that the __lt__
method of the element on the left is called:
>>> TestCompare(key="a", id=1) < TestCompare(key="a", id=2)
__lt__: self (1) VS other (2)
False
So far so good: everything behaves as expected.
If I test for superiority though, I get an interesting result:
>>> TestCompare(key="a", id=1) > TestCompare(key="a", id=2)
__lt__: self (2) VS other (1)
False
The __gt__
method (“greater than”) is not implemented.
Thus, it seems that Python uses the __lt__
method of the element on the right when the __gt__
method of the element
on the left is not implemented. Interesting!
With this in mind, I thought that calling <=
would probably check both inferiority (<
with __lt__
) and
equality (=
with __eq__
) when the “lower or equal” dunder method (__le__
) was not implemented.
Let’s check this:
>>> TestCompare(key="a", id=1) <= TestCompare(key="a", id=2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<=' not supported between instances of 'TestCompare' and 'TestCompare'
Wrong! It throws an error saying that <=
is not implemented.
All of this got me wondering as to why Python behaves this way, and I started reading the source code.
The code chunk that implements comparisons is in
the object.c
file within the do_richcompare
function. The part that answers my question is the following:
do_richcompare(PyThreadState *tstate, PyObject *v, PyObject *w, int op)
{
richcmpfunc f;
PyObject *res;
int checked_reverse_op = 0;
if (!Py_IS_TYPE(v, Py_TYPE(w)) &&
PyType_IsSubtype(Py_TYPE(w), Py_TYPE(v)) &&
(f = Py_TYPE(w)->tp_richcompare) != NULL) {
checked_reverse_op = 1;
res = (*f)(w, v, _Py_SwappedOp[op]);
if (res != Py_NotImplemented)
return res;
Py_DECREF(res);
}
if ((f = Py_TYPE(v)->tp_richcompare) != NULL) {
res = (*f)(v, w, op);
if (res != Py_NotImplemented)
return res;
Py_DECREF(res);
}
if (!checked_reverse_op && (f = Py_TYPE(w)->tp_richcompare) != NULL) {
res = (*f)(w, v, _Py_SwappedOp[op]);
if (res != Py_NotImplemented)
return res;
Py_DECREF(res);
}
/* ... */
}
We have three if
statements in there.
The last two seem to correspond to my observations:
- It tries the direct comparison operation from the first
element
v
(if ((f = Py_TYPE(v)->tp_richcompare) != NULL) { res = (*f)(v, w, op); /* ...*/ }
) - Otherwise, it tries the reverse operation from the second
element
w
(if ((f = Py_TYPE(w)->tp_richcompare) != NULL) { res = (*f)(w, v, _Py_SwappedOp[op]); /* ...*/ }
).
Interestingly, there is another if
statement that is tested before those two.
This checks if the second element w
is a subtype of the first one v
(if (!Py_IS_TYPE(v, Py_TYPE(w)) && PyType_IsSubtype(Py_TYPE(w), Py_TYPE(v)) && (f = Py_TYPE(w)->tp_richcompare) != NULL)
)
and performs the reverse operation on the second element if that is the case (res = (*f)(w, v, _Py_SwappedOp[op]);
).
This implementation detail allows to handle polymorphism correctly. Indeed, if a subclass has reimplemented a comparison method that was already implemented in the parent class, we want to use the most specific method (that of the subclass). So this extra check allows to handle subtype polymorphism as we expect it to behave!