How do Python objects do stuff?


In October 2020 I took Dave Beazley’s Advanced Programming with Python course. Since I wrote a series about his SICP course and the Raft course, I figured I’d write about this one too :).

This is not an introductory tutorial for writing a Python class (w3schools has one of those). This is an assessment of how Python treats a request for an object to do something. We look at the inner workings of Python class and instance methods through the lens of the special methods that Python implements on its own built-in types.

Objects do two things: organize functions and manage state.

The function expresses, at the granular level, the stuff that we use programs to do.

Some functions return outputs that depend only on the inputs. These are called pure functions or stateless functions. You can use objects as namespaces for them, but you can also do that with modules (that is, files). Pure functions don’t need the power of objects.

Some functions, by contrast, live on a class or its instances. These are called class methods and instance methods. They draw context from the class or instance they belong to. This is where objects shine.
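To make the distinction concrete, here is a minimal sketch. The names area and Tally are made up for illustration and do not come from the course.

```python
def area(width, height):
    # Pure: the result depends only on the arguments.
    return width * height


class Tally:
    """Hypothetical example class; its method reads and changes stored state."""

    def __init__(self):
        self.count = 0

    def increment(self):
        # Instance method: the result depends on what this instance has seen so far.
        self.count += 1
        return self.count


print(area(2, 3), area(2, 3))  # same inputs, same output, every time
tally = Tally()
print(tally.increment(), tally.increment())  # same call, different results
```

The function needs nothing but its arguments, while increment only makes sense with an instance to remember the count.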

In Python, everything is an object. That includes built-in types like integers, lists, and dictionaries. They each implement a collection of special methods, also called magic methods or dunder methods (named for the double underscores in their names). These methods usually draw context from the instance or class itself, and we can implement them on our own objects. Take __repr__, for example. This method lets you print a legible representation of an object, like so:

class Fraction:
    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

    def __repr__(self):
        return f"Fraction(numerator={self.numerator}, denominator={self.denominator})"

$ python
>>> from fraction import Fraction
>>> repr(Fraction(2, 3))
'Fraction(numerator=2, denominator=3)'

You might be familiar with a special method similar to __repr__, called __str__. Implementing __str__ affects the output when an object is explicitly stringified, for example by str() or print(). We might do this for our fraction object:

def __str__(self):
    return f"({self.numerator}, {self.denominator})"

$ python
>>> from fraction import Fraction
>>> frac = Fraction(1, 3)
>>> print(frac)
(1, 3)

By contrast, __repr__ is supposed to be the object’s “formal” representation. By convention, a __repr__ string should, where possible, recreate the object it represents if passed to Python’s built-in eval function, which evaluates a string as a Python expression. (Please be extremely certain you know what strings you’re passing in, and what they do, if you use eval.)
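A quick way to check that convention is to round-trip an object through eval. This is a minimal sketch using a standalone copy of the Fraction class; the __eq__ method is an addition made here so the two objects can be compared, not part of the class as written above.

```python
class Fraction:
    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

    def __repr__(self):
        return f"Fraction(numerator={self.numerator}, denominator={self.denominator})"

    def __eq__(self, other):
        # Added for this sketch only, so the round trip can be compared.
        if not isinstance(other, Fraction):
            return NotImplemented
        return (self.numerator, self.denominator) == (other.numerator, other.denominator)


original = Fraction(2, 3)
# eval runs the repr string as a Python expression; only do this with strings you trust.
rebuilt = eval(repr(original))
print(rebuilt == original)
```

If the repr string were not valid Python (say, a missing closing parenthesis), eval would raise a SyntaxError, which makes this a handy sanity check in a test suite.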

Here’s another common special method: __init__, which you write like def __init__(self) but invoke by calling the class, like Object(). More pieces of Python syntax you’ve seen and their corresponding special methods:

x[0]               calls x.__getitem__(0)
x[0] = 1           calls x.__setitem__(0, 1)
x < y              calls x.__lt__(y)   # lt stands for "less than"
x > y              calls x.__gt__(y)   # gt stands for "greater than"
x == y             calls x.__eq__(y)
with obj as x:     calls obj.__enter__() at the beginning (x is bound to its return value) and obj.__exit__(...) at the end
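To watch that mapping happen, here is a small sketch of a class that records which special methods Python reaches for. Scoreboard and its calls list are hypothetical names invented for this example.

```python
class Scoreboard:
    """Hypothetical container that logs the special methods Python invokes on it."""

    def __init__(self):
        self._scores = {}
        self.calls = []

    def __getitem__(self, key):
        self.calls.append("__getitem__")
        return self._scores[key]

    def __setitem__(self, key, value):
        self.calls.append("__setitem__")
        self._scores[key] = value

    def __enter__(self):
        self.calls.append("__enter__")
        return self  # this return value is what `as` binds

    def __exit__(self, exc_type, exc_value, traceback):
        self.calls.append("__exit__")
        return False  # do not suppress exceptions


board = Scoreboard()
with board as b:
    b["ada"] = 3        # square-bracket assignment
    score = b["ada"]    # square-bracket lookup

print(board.calls)
```

None of the special methods are called by name in the last few lines; the syntax alone triggers them.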

When we define a class method in Python, it has a first argument of cls. When we define an instance method, it has a first argument of self. We don’t usually include this parameter when we call the method on an object, though.

Instead, Python does it under the hood. When Python sees x.__gt__(y), it treats the object on which the method was called as the first argument. That is, it looks up __gt__ on x’s class, and then it invokes that function with the calling object prepended to the argument list: type(x).__gt__(x, y). This is the case for all instance methods, not just special methods. Class methods work the same way, except that the class itself is what gets prepended.
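We can verify that equivalence by calling a method both ways. In this sketch, Greeter is a made-up class; the int.__gt__ call at the end uses the real built-in.

```python
class Greeter:
    """Hypothetical class for comparing bound calls with explicit ones."""

    def __init__(self, name):
        self.name = name

    def greet(self, punctuation):
        return f"Hello from {self.name}{punctuation}"

    @classmethod
    def describe(cls):
        return f"I am the {cls.__name__} class"


g = Greeter("Ada")
bound = g.greet("!")               # Python fills in self for us
explicit = Greeter.greet(g, "!")   # the same call with self passed by hand
print(bound == explicit)

# Class methods receive the class, whether called on the class or on an instance.
print(g.describe() == Greeter.describe())

# Built-in types follow the same rule: 3 > 2 ends up at int.__gt__(3, 2).
print(int.__gt__(3, 2))
```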

Let’s write an infix operator for ourselves. Suppose we wanted to add our fractions:

def __add__(self, other):
    return Fraction(
        numerator=(self.numerator * other.denominator) + 
                  (other.numerator * self.denominator),
        denominator=(self.denominator * other.denominator))

$ python
>>> from fraction import Fraction
>>> one_third = Fraction(numerator=1, denominator=3)
>>> four_thirds = Fraction(numerator=4, denominator=3)
>>> one_third + four_thirds
Fraction(numerator=15, denominator=9)

(It's worth noting that we could also add a method to reduce fractions to lowest terms, so that 15/9 becomes 5/3.)
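One way to do that reduction is to divide both parts by their greatest common divisor. This is a sketch, not part of the class above: reduce_fraction is a hypothetical helper, and it assumes a nonzero denominator.

```python
from math import gcd


def reduce_fraction(numerator, denominator):
    """Return (numerator, denominator) in lowest terms. Assumes denominator != 0."""
    divisor = gcd(numerator, denominator)
    return numerator // divisor, denominator // divisor


print(reduce_fraction(15, 9))
```

Calling a helper like this from __init__ would keep every Fraction reduced from the moment it is created.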

The __add__ method allows us to use the infix operator, ‘+’, on our two fractions.

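Here’s a surprisingly hard problem. What if we want to add mixed numbers, like Fraction(4, 3) + 1? Our __add__ assumes the other operand has numerator and denominator attributes. Python’s int happens to expose both, so this particular case can slip through by duck typing, but that is an accident rather than a design. If __add__ only accepts Fractions and returns NotImplemented for everything else, we get an exception:

unsupported operand type(s) for +: 'Fraction' and 'int'

A Fraction-only method doesn’t support adding with integers! To get this to work on purpose, Fractions have to know about integers: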

    def to_frac(self, other):
        if isinstance(other, int):
            return Fraction(numerator=other, denominator=1)
        else:
            return other

    def __add__(self, other):
        other = self.to_frac(other)

        return Fraction(
            numerator=(self.numerator * other.denominator) +
                      (other.numerator * self.denominator),
            denominator=self.denominator * other.denominator
        )

Now it works. But you know what doesn’t work? Going the other direction: 1 + Fraction(4, 3).

unsupported operand type(s) for +: 'int' and 'Fraction'

Fractions know about integers, but integers don’t know about fractions! And that expression doesn’t start with our fraction addition method: it calls __add__ from the int type first!

Well... actually... there is another method Python tries.

When the Python interpreter evaluates x + y, it first tries x’s __add__ method. If that method is missing or returns NotImplemented, Python tries the operation in the other direction by calling __radd__ (for reverse add) on y. Since int’s __add__ returns NotImplemented when handed a Fraction, Python will look for __radd__ on our Fraction. And because addition is commutative, underneath our definition of __add__, we can add the line:

__radd__ = __add__

And now it works in both directions!
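Putting the pieces together, here is a minimal, self-contained sketch of the two-way addition. It differs slightly from the version above: the int conversion is inlined instead of using to_frac, and the NotImplemented branch for unsupported operands is an addition in this sketch.

```python
class Fraction:
    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

    def __repr__(self):
        return f"Fraction(numerator={self.numerator}, denominator={self.denominator})"

    def __add__(self, other):
        if isinstance(other, int):
            other = Fraction(other, 1)
        if not isinstance(other, Fraction):
            # Tell Python "I can't handle this" so it can try the other operand.
            return NotImplemented
        return Fraction(
            self.numerator * other.denominator + other.numerator * self.denominator,
            self.denominator * other.denominator,
        )

    # Addition is commutative, so the reflected version can reuse __add__.
    __radd__ = __add__


left = Fraction(4, 3) + 1    # goes straight to Fraction.__add__
right = 1 + Fraction(4, 3)   # int.__add__ gives up, then Fraction.__radd__ runs
print(left, right)

try:
    Fraction(4, 3) + "one"
except TypeError as error:
    print(error)
```

Returning NotImplemented, rather than raising, is what lets Python fall back to the other operand and, if nobody can help, produce the familiar "unsupported operand type(s)" message.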

Should we implement special methods on our objects?

If we’re writing objects meant to represent data, it might make sense to take advantage of the special methods. The pandas library for managing dataframes does this a lot.

It’s worth noting: we should genuinely implement the methods ourselves, rather than subclass a built-in Python type like list or dict, unless we want our clients to be able to use all of the list or dict methods. We might want a data representation that allows clients to get items, but not modify them. If we subclassed list, we’d have to override __setitem__ (and every other mutating method, like append) to not work. Overriding methods with disabled versions is a pretty strong sign that subclassing was not the right choice for an object. It also puts you in danger of accidentally changing your own object’s API in the future just by upgrading Python.
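For the read-only case, a sketch of the "implement it ourselves" route might look like this. ReadOnlyRecords is a hypothetical name; the class wraps a list instead of inheriting from one, so it only exposes what it defines.

```python
class ReadOnlyRecords:
    """Hypothetical read-only container: items can be fetched but not replaced."""

    def __init__(self, items):
        self._items = list(items)  # private copy; callers can't reach it via the API

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


records = ReadOnlyRecords(["a", "b", "c"])
print(records[0], len(records))

try:
    records[0] = "z"  # no __setitem__ defined, so Python refuses
except TypeError as error:
    print(error)
```

Because nothing was inherited, there is nothing to disable: append, sort, and the rest simply don't exist on this object.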

But now, we’ve traversed the territory of how objects do stuff alone and crossed into questions about how objects work together. Let’s talk about that in the next post of this series.

If you liked this piece, you might also like:

How to Jump-Start a New Programming Language, or maybe, even, gain a more concrete mental model of the one you already use!

Lessons from Space: Edge-Free Programming, which also explores an Android app’s design and the engineering choices behind it (plus, cool pictures of rockets!)

How does git detect renames?—This piece is about how git detects renames, but it’s also about how to approach questions and novel problems in programming in general
