Python 029 - Efficiency

I’ll point you to a fairly famous anti-clean-code video screed, comment on that, and explore the efficiency, or lack of it, in our little billiards program. I notice something surprising. And something important.

The video is from “Molly Rocket”, probably Casey Muratori. It seems to want to start in the middle, so if it does, drag it back to the beginning. I found it mostly interesting.

I think it’s fair to say that the point of the video is that “clean code” is very inefficient. Molly/Casey makes a good case, hammering nail after nail into the coffin of clean code. And much of what is referred to as clean code is the kind of code that I like and write about in these articles.

I want to begin by pointing out that I do not usually use the term “clean code”, even in lower case, because the upper-case version has become associated with an authoritarian style of thinking that I do not support. My close colleagues and I program as we do because it is the fastest way we know to get to the results we want, not because someone told us that we are bad people unless we program this way. For myself, I was born to resist authority and I try not to apply authority to anyone else. My writings are mostly “I did this: that happened. Think about it and choose what you want to do.”

Then I want to state clearly that Molly/Casey is absolutely right. Now, some of the examples picked are really cherry-picking, but the fact is that it takes longer to call a method and do a thing than it takes to do the thing, so that:

    def bounce(self, ball):
        if self.colliding(ball):
            self.reflect_off_rail(ball)

    def colliding(self, ball):
        return self.close_enough(ball) and self.ball_is_approaching(ball)

The code above is less efficient than this code:

    def bounce(self, ball):
        velocity = ball.velocity.rotate(self.angle)
        if self.distance_to_rail(ball) <= ball.radius and velocity.y > 0:
            velocity.y = -velocity.y
            ball.velocity = velocity.rotate(-self.angle)

In the second version, we have saved all those method calls, and they are not free. Depending on the language we use and the quality of one’s compiler, those method calls might involve a dictionary-style access to a method dictionary and then a dispatch to the function found there. There could be quite a few instructions in each of those calls.

Now there’s another interesting fact to consider in the code above: almost all the time is probably spent in rotate, because as we’ve seen, it computes two sine functions, two cosine functions and a bit of arithmetic. So saving those calls, in the case above, was probably not the most important thing we could do. We’ll come back to that, time and flow permitting.

Some of the Molly Rocket examples are picked so that the function called is nearly trivial, and that’s certainly where the inefficiency is most important when it comes down to saving CPU time.

But there’s another key point about method dispatch: it actually slows down the processor. Today’s processors are much faster than the machine’s main memory, so the processors jump through some serious hoops to fill a cache with nearby cells, on the assumption that the processing will continue more or less linearly. If the program jumps outside that cache, mass quantities of recovery takes place and another slow memory fetch is required to get to the new place. And then that function returns and the poor CPU has to flush it all again and re-cache from where it was before it made the call.

That really happens and from the articles I’ve read, the effect is profound. Things can be an order of magnitude slower, or worse. So Molly/Casey is absolutely right: the kind of code we write here is inefficient from the viewpoint of the computer.

Yeah, well, intercourse the computer.

I’m running an M1 MacBook Air here chez Ron, and it can generate my entire 5000 page web site in under a minute, and I’ve never seen all its cores running at the same time. I can just see that little chip glaring at me and making that “come get it” gesture that they always make before the big fight scene. Efficient code isn’t important at my house. Efficient coding is.

But if I were a game programmer like Molly/Casey is, with a need to squeeze every frame out through a few hundred bucks worth of hardware, things would be different.

Horses for courses and all that. We program to meet the needs of the situation. In game programming, things are very different from the general run of corporate IT or software product development. Game programming is an outlier on almost any diagram you’d draw of what we need in software development.

So. Molly Rocket is right, at least in the space where they live. I’m good with that, and I wish I knew the things that they know.

And for me, and I think for most programmers, the most important thing is probably delivering what our organization needs, as quickly and reliably as we can. We might hit an efficiency issue once in a while, but by and large, getting done with working maintainable code is a higher priority.

As Hill likes to put it: “I’m in the business of changing code”.

And when you’re in that business you want code that’s easy to understand and easy to change without breaking things. And for that, we’ve found that tests and small objects and small methods are more changeable than super-tight “efficient” code.

So. What should you do? You can’t trick me into answering that. You should do what you see as best. Use judgment in the light of experience, Nero used to tell Archie.

Back to the Billiards Code

You may recall that I started this little exercise twice, because the first time I got tangled up in my own code and decided that starting over was better than trying to debug it into working. The second time we started with the simple concepts “too_far” and “approaching”, coded those, then assembled them into the code you saw above. In so doing, we discovered that “too_far” was backward from what we needed, so a day or so ago we refactored to “close_enough”.

But suppose we cared about efficiency. After all there are six rails and as many as 16 balls on a pool table, that’s a lot of collision checking going on. Let’s pretend that a performance measurement has shown us that those rotate calls are killing us. What might we do?

As I write this sentence, I do not know what we might do. Let me see what might work:

Write a faster rotate, taking advantage of our angles being just 0, 90, -90,and 180.
Avoid the rotate with some way of quickly fetching x, or y, or -x, or -y, as needed by the particular rail.
Subclass Rail and provide four ways of doing it.
Write some kind of BallViewer that fetches x or y or -x or -y, and wrap the ball with it.
… I’m out of ideas. Maybe as I go I’ll get more.

With these ideas, and knowing that we had a performance problem, it would probably make sense to try some different approaches and measure them. If the code were hard to change, we’d do more analysis and pick a way and do the work. If the code were easy to change, we could actually try a few different ways.

And, curiously, I think that maybe, just maybe, we’d benefit from putting in more abstraction before optimizing, even though presumably it is abstraction that’s hurting us.

Here’s what I’m thinking. If we think about what all the rotating boils down to, when we do our check for distance, we fetch, from the ball position, its x or y or -x or -y, depending on our rotation. It goes through the rotate, but we know the math will always come out even. When we check for velocity, we fetch the corresponding item from velocity, vx or vy or -vx or -vy. And when we reflect the ball, we negate the corresponding velocity item.

So … what if if we were to find a different way to say what we’re asking for here:

    def distance_to_rail(self, ball):
        pos = ball.position.rotate(self.angle)
        y_distance = self.position.y - pos.y
        return y_distance

    def ball_is_approaching(self, ball):
        velocity = ball.velocity.rotate(self.angle)
        return velocity.y > 0

    def reflect_off_rail(self, ball):
        vel = ball.velocity.rotate(self.angle)
        vel.y = -vel.y
        ball.velocity = vel.rotate(-self.angle)

In all those cases, we’re really just trying to access the “right” value. The “virtual y”, given the rotation.

And while we are thinking efficiency, we really ought to avoid the first rotate in reflect_off_rail, because we’ve already computed that once. We should probably cache it in the object, although if we were to do that, we’ll not be able to call reflect without calling distance. That might be bad. This is particularly interesting because in the example where I inlined everything, you’ll note that I removed the extra call to rotate. I did that manually after the automated inlines, because I saw the duplication.

So here we have an example of a whole ‘nother call to rotate due to our particular choice of clear code. Darn, that’s interesting. I’m glad we had this little chat.

To better understand exactly what we need, I’d like to write some tests of rotate:

def test_rotate_equivalents():
    x = 10
    y = 20
    v = Vector2(x,y)
    v00y = v.y
    v90y = v.rotate(90).y
    v180y = v.rotate(180).y
    v270y = v.rotate(-90).y
    assert v00y == y
    assert v90y == x
    assert v180y == -y
    assert v270y == -x

We really only care about the logical y coordinate, by virtue of our rotation logic. So we could imagine some efficient way of just returning x or y or -x or -y.

We may need a getter and a setter or some equivalent. Let’s assume we’ll find something good and plug in our new idea. I’ll start with this one:

    def distance_to_rail(self, ball):
        pos = ball.position.rotate(self.angle)
        y_distance = self.position.y - pos.y
        return y_distance

Let’s fetch the “y” in one go. First extract the value:

    def distance_to_rail(self, ball):
        pos = ball.position.rotate(self.angle)
        ball_y = pos.y
        y_distance = self.position.y - ball_y
        return y_distance

Then I think we can inline that:

    def distance_to_rail(self, ball):
        ball_y = ball.position.rotate(self.angle).y
        y_distance = self.position.y - ball_y
        return y_distance

For some reason, PyCharm wouldn’t do that for me. So I did it. Now let’s give that thing a method name. I’m not sure we’ll love this but I’m going to follow the thread a ways.

    def distance_to_rail(self, ball):
        ball_y = self.ball_position_y(ball)
        y_distance = self.position.y - ball_y
        return y_distance

    def ball_position_y(self, ball):
        return ball.position.rotate(self.angle).y

OK, now can we perhaps optimize this? Let’s do it in the most straightforward way we can imagine.

    def ball_position_y(self, ball):
        p = ball.position
        match self.angle:
            case 0:
                return p.y
            case 90:
                return p.x
            case 180:
                return -p.y
            case -90:
                return -p.x
            case _:
                return p.rotate(self.angle).y

That’s passing all my tests. Let’s extract a method that takes the vector as an input.

    def ball_position_y(self, ball):
        return self.vector_logical_y(ball.position)

    def vector_logical_y(self, p):
        match self.angle:
            case 0:
                return p.y
            case 90:
                return p.x
            case 180:
                return -p.y
            case -90:
                return -p.x
            case _:
                return p.rotate(self.angle).y

Now I can use that in a few other places. Here’s an interesting opportunity:

    def distance_to_rail(self, ball):
        ball_y = self.ball_position_y(ball)
        y_distance = self.position.y - ball_y
        return y_distance

Why don’t we just inline our new function here? I think this is the only place we refer to the position.

    def distance_to_rail(self, ball):
        ball_y = self.vector_logical_y(ball.position)
        y_distance = self.position.y - ball_y
        return y_distance

I kind of liked it better the first way. We’re saving four trig functions, that should allow me to have the explaining function name. Put that back.

Now I had hoped that the vector_logical_y thing would save us more rotations, but it really can’t quite help us with the velocity. We’ll do that differently. Let’s test this one.

def test_negate_logical_y():
    vx = 10
    vy = 20
    v = Vector2(vx, vy)
    r0 = Rail.head(600)
    neg0 = r0.negate_logical_y(v)
    assert neg0.x == v.x
    assert neg0.y == -v.y

I made a mistake and built the whole function:

    def negate_logical_y(self, v):
        match self.angle:
            case 0:
                return Vector2(v.x, -v.y)
            case 90:
                return Vector2(-v.x, v.y)
            case 180:
                return Vector2(v.x, -v.y)
            case -90:
                return Vector2(-v.x, v.y)
            case _:
                new_v = v.rotate(self.angle)
                new_v.y = - new_v.y
                return new_v.rotate(-self.angle)

You’ll note that with both of these, I built in a default that does the rotate. I think that’s a good principle although in this program we know we’ll never get a different angle.

Anyway more tests and I’ll try not to look at the answer sheet. No, wait. I’ll plug this function in and try it.

    def reflect_off_rail(self, ball):
        ball.velocity = self.negate_logical_y(ball.velocity)

Works fine. But the tests:

def test_negate_logical_y_head():
    vx = 10
    vy = 20
    v = Vector2(vx, vy)
    r0 = Rail.head(600)
    neg0 = r0.negate_logical_y(v)
    assert neg0.x == v.x
    assert neg0.y == -v.y


def test_negate_logical_y_foot():
    vx = 10
    vy = 20
    v = Vector2(vx, vy)
    r0 = Rail.foot(600)
    neg0 = r0.negate_logical_y(v)
    assert neg0.x == v.x
    assert neg0.y == -v.y


def test_negate_logical_y_left():
    vx = 10
    vy = 20
    v = Vector2(vx, vy)
    r0 = Rail.left(600)
    neg0 = r0.negate_logical_y(v)
    assert neg0.x == -v.x
    assert neg0.y == v.y


def test_negate_logical_y_right():
    vx = 10
    vy = 20
    v = Vector2(vx, vy)
    r0 = Rail.right(600)
    neg0 = r0.negate_logical_y(v)
    assert neg0.x == -v.x
    assert neg0.y == v.y

I renamed the first and made four tests, because I wanted to watch the test count go down as I got the right values in.

I notice that in the case of the reflection we really only ever negate x or negate y, never anything else. It’s y on 0 and 180, x on 90 and -90. I don’t see a truly clever way to consolidate those, being new today to the match case Python thingie. We could use if and elif throughout of course, but presumably the compiler knows how to do the match case thing nicely.

I’ve used my new functions everywhere, so the only remaining calls to rotate are the ones that can never occur. We could replace them with exceptions if we were that kind of person and the program would still run just fine.

We’re green. Commit: remove rotate calls using logical_y methods.

Let’s review the code. I suspect we’ve lost some clarity.

    def bounce(self, ball):
        if self.colliding(ball):
            self.reflect_off_rail(ball)

    def colliding(self, ball):
        return self.close_enough(ball) and self.ball_is_approaching(ball)

    def close_enough(self, ball):
        return self.distance_to_rail(ball) <= ball.radius

    def distance_to_rail(self, ball):
        ball_y = self.ball_position_y(ball)
        y_distance = self.position.y - ball_y
        return y_distance

    def ball_position_y(self, ball):
        return self.vector_logical_y(ball.position)

    def ball_is_approaching(self, ball):
        vel_y = self.vector_logical_y(ball.velocity)
        return vel_y > 0

    def reflect_off_rail(self, ball):
        ball.velocity = self.negate_logical_y(ball.velocity)

    def vector_logical_y(self, p):
        match self.angle:
            case 0:
                return p.y
            case 90:
                return p.x
            case 180:
                return -p.y
            case -90:
                return -p.x
            case _:
                return p.rotate(self.angle).y

    def negate_logical_y(self, v):
        match self.angle:
            case 0:
                return Vector2(v.x, -v.y)
            case 90:
                return Vector2(-v.x, v.y)
            case 180:
                return Vector2(v.x, -v.y)
            case -90:
                return Vector2(-v.x, v.y)
            case _:
                new_v = v.rotate(self.angle)
                new_v.y = - new_v.y
                return new_v.rotate(-self.angle)

It’s not as bad as I feared. I think we might want to rename the logical names to rotated. And possibly we should pass the angle in to those functions rather than fetch it inside. These methods are really wanting to be methods on Vector2, but unless we wanted to monkey-patch it, we can’t extend Vector2.

I’ll do the renames.

    def ball_position_y(self, ball):
        return self.rotated_y(ball.position)

    def ball_is_approaching(self, ball):
        vel_y = self.rotated_y(ball.velocity)
        return vel_y > 0

    def reflect_off_rail(self, ball):
        ball.velocity = self.negate_rotated_y(ball.velocity)

    def rotated_y(self, vector):
        match self.angle:
            case 0:
                return vector.y
            case 90:
                return vector.x
            case 180:
                return -vector.y
            case -90:
                return -vector.x
            case _:
                return vector.rotate(self.angle).y

    def negate_rotated_y(self, vector):
        match self.angle:
            case 0:
                return Vector2(vector.x, -vector.y)
            case 90:
                return Vector2(-vector.x, vector.y)
            case 180:
                return Vector2(vector.x, -vector.y)
            case -90:
                return Vector2(-vector.x, vector.y)
            case _:
                new_v = vector.rotate(self.angle)
                new_v.y = - new_v.y
                return new_v.rotate(-self.angle)

I’m tempted to mark a bunch of these methods as “private” but they’re good enough for now, I think.

Commit: renaming from logical to rotated.

Let’s sum up.

Summary

I think the main summary statement is this:

We retained our code clarity at a high level and removed a round dozen sine and cosine calls. We saved a bunch more time than a few method calls. Take that, Molly Rocket!

Now, of course, if we were to inline all that stuff, we’d save even more time, albeit just a fraction of what we saved by removing all the trig.

And my point is, and I do have one, this change was made much easier by the fact that the code was nicely broken out into separate testable understandable small methods.

But I think it’s still all horses for courses. This code might or might not be fast enough for the latest XBox game. It’s fast enough for us here, it’s fast enough for most any application you might make of it, and it’s able to be changed.

If fast progress and easy change is important, maybe there are ideas here for you. And if you need, or even want, to go another way, go for it and good cess to you. It’s all about judgment in the light of experience.

Late Edit

I see that I named the variables in the test all neg0, because I didn’t edit the name when I changed the test. I’ll fix that up and spare you the text.

Later Edit

I chose a solution and went all the way with it. It may not be the simplest thing I could have done. Possibly just writing a faster rotate would have been easier, especially since I already had this test of the idea:

def test_fast_rotate():
    v = Vector2(1,2)
    v90 = v.rotate(90)
    v270 = v.rotate(-90)
    v180 = v.rotate(180)
    assert fast_rotate(v, 90) == v90
    assert fast_rotate(v, -90) == v270
    assert fast_rotate(v, 180) == v180

def fast_rotate(v, angle):
    if angle == 90:
        return Vector2(-v.y, v.x)
    elif angle == -90 or angle == 270:
        return Vector2(v.y, -v.x)
    elif angle == 180:
        return Vector2(-v.x, -v.y)
    else:
        return v.rotate(angle)

Plugging that in would have eliminated all the rotate calls and involved less code. I think it wouldn’t be quite as efficient as the new scheme, but it would be close. And I could be wrong, since I’ve not tried it: it could be better.

It’s tempting to do this little improvement again and again. Finger exercises. Etudes. Katas. And all that jazz. Maybe I will, maybe I won’t. We’ll see.

But the strongest argument in favor of this code style is that it’s easy to try different ways of solving this efficiency issue, exactly because the code is simple, clear, and highly modular. For many of us on most days, that’s the clear win.

See you next time!