Thoughts Heap

A Blog by Roman Gonzalez.

On passing parameters by value in Golang

TL;DR: Choosing pointer arguments for functions for the sake of better performance is premature optimization, and you should weigh the drawbacks of pointers in terms of accidental complexity.

Introduction

When working in Golang, how often have you considered the choice of receiving a pointer instead of a value as the argument of a function? We usually don’t spend much time thinking about this, given that it is known that pointer arguments perform faster than values.

However, in other languages (cough, Haskell, cough), doing this is one of the most pervasive anti-patterns. Why? The repercussions of mutating a variable are many, especially if you are using an awesome language with green threads like Golang. In this blog post, we’ll explore some of the drawbacks of using pointers.

The scope of change

Let’s start by explicitly defining what the pointer contract means for your business logic: whenever we pass a pointer as an argument to a function, we are giving that function a rubber stamp to change the behavior of its caller.

main()
|
|- fnA(*foo)
|   |- fnB(*foo)
|   `- fnC(*foo)
|     |- fnD(*foo)
|     |- fnE(*foo)
|       `- fnF(*foo)
|
`- fnG(*foo)


# Suddenly the fnG function stopped working as expected because foo is in a state that is not valid/expected, 
# where should we look for the bug?
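A minimal sketch of the problem (the Account type and applyDiscount helper are hypothetical names for illustration): a callee that receives a pointer can silently invalidate state its caller depends on.

```go
package main

import "fmt"

type Account struct {
	Balance int
}

// applyDiscount stands in for a function deep in the call tree.
// Because it receives a pointer, it can mutate state that every
// other caller of the same Account relies on.
func applyDiscount(a *Account) {
	a.Balance -= 50 // the caller never sees this happen
}

func main() {
	acc := Account{Balance: 100}
	applyDiscount(&acc)
	fmt.Println("balance:", acc.Balance) // prints "balance: 50", not the 100 we started with
}
```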

This approach, by extension, adds mental overhead to the algorithms you write. As soon as you pass a pointer argument to a function, you cannot guarantee that this function is not going to modify the behavior of your algorithm in unexpected ways (unless the documentation is explicit about what the function does, and is also accurate about it).

Granted, this situation is unlikely to happen when you use a general-purpose library function, but it is not impossible – and, I argue, it happens more often than not – to get unexpected behavior after calling a function from an API that changes for the sake of solving business needs. In such code, the priority is not the UX of the code or the repercussions for its many callers, but the business value that a particular change offers for a specific set of inputs.

Now, you may argue, we have unit tests to catch these kinds of issues, right? Well, yes, up to a point: the combination of values and intermediate states makes it hard to test every single combination that could surface a bug caused by mutation.

I prefer not to give a function the power to change state like that, and instead make this class of issue impossible in the first place by passing parameters by value.
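A sketch of the by-value alternative (again with a hypothetical Account type): the function returns an updated copy, and the caller decides whether to keep it.

```go
package main

import "fmt"

type Account struct {
	Balance int
}

// withDiscount receives a copy and returns the updated copy;
// the caller's value is untouched unless it reassigns the result.
func withDiscount(a Account) Account {
	a.Balance -= 50
	return a
}

func main() {
	acc := Account{Balance: 100}
	discounted := withDiscount(acc)
	fmt.Println(acc.Balance, discounted.Balance) // prints "100 50"
}
```

The mutation is now visible at the call site: nothing changes unless the caller writes `acc = withDiscount(acc)` explicitly.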

Pointers and concurrency

With the advent of multi-core CPUs, concurrency is becoming more prevalent as a way to squeeze as much performance as we can from our computers.

Golang is one of the few languages that make concurrency easy, via the usage of goroutines; however, it also makes it very easy to shoot yourself in the foot. Values are mutable by default, and if you share the same reference across multiple goroutines, unexpected things tend to happen.

If you add to that the fact that you are passing values as pointers, things become accidentally complex very quickly.

How expensive is it to receive parameters by value?

The points above may sound reasonable, but you may still think to yourself: won’t somebody, please, think of the performance? That’s fair, so let’s figure out how expensive it is to pass parameters by value to a function.

Let us build a benchmark that allows us to assess the performance implications of passing small and large values as parameters:

package vals_test

import (
    "testing"
)

type F2 struct {
    f1 int64
    f2 int64
}

type F4 struct {
    f1 int64
    f2 int64
    f3 int64
    f4 int64
}

type F8 struct {
    f1 int64
    f2 int64
    f3 int64
    f4 int64
    f5 int64
    f6 int64
    f7 int64
    f8 int64
}

type F16 struct {
    f1 int64
    f2 int64
    f3 int64
    f4 int64
    f5 int64
    f6 int64
    f7 int64
    f8 int64
    f9 int64
    f10 int64
    f11 int64
    f12 int64
    f13 int64
    f14 int64
    f15 int64
    f16 int64
}

type D2 struct {
    d1 F2
    d2 F2
    d3 F2
    d4 F2
}

type D4 struct {
    d1 F4
    d2 F4
    d3 F4
    d4 F4
}

type D8 struct {
    d1 F8
    d2 F8
    d3 F8
    d4 F8
}

type D16 struct {
    d1 F16
    d2 F16
    d3 F16
    d4 F16
}


func updateValD2(val D2) D2 {
    val.d1.f2 += int64(10)
    return val
}

func updateRefD2(val *D2) {
    val.d1.f2 += int64(10)
}


func updateValD4(val D4) D4 {
    val.d1.f3 += int64(10)
    return val
}

func updateRefD4(val *D4) {
    val.d1.f3 += int64(10)
}

func updateValD8(val D8) D8 {
    val.d1.f3 += int64(10)
    return val
}

func updateRefD8(val *D8) {
    val.d1.f3 += int64(10)
}


func updateValD16(val D16) D16 {
    val.d1.f3 += int64(10)
    return val
}

func updateRefD16(val *D16) {
    val.d1.f3 += int64(10)
}


func BenchmarkPassRefD2(b *testing.B) {
    var d D2
    for i := 0; i < b.N; i++ {
        updateRefD2(&d)
    }
}

func BenchmarkPassValD2(b *testing.B) {
    var d D2
    for i := 0; i < b.N; i++ {
        d = updateValD2(d)
    }
}

func BenchmarkPassRefD4(b *testing.B) {
    var d D4
    for i := 0; i < b.N; i++ {
        updateRefD4(&d)
    }
}

func BenchmarkPassValD4(b *testing.B) {
    var d D4
    for i := 0; i < b.N; i++ {
        d = updateValD4(d)
    }
}

func BenchmarkPassRefD8(b *testing.B) {
    var d D8
    for i := 0; i < b.N; i++ {
        updateRefD8(&d)
    }
}

func BenchmarkPassValD8(b *testing.B) {
    var d D8
    for i := 0; i < b.N; i++ {
        d = updateValD8(d)
    }
}

func BenchmarkPassRefD16(b *testing.B) {
    var d D16
    for i := 0; i < b.N; i++ {
        updateRefD16(&d)
    }
}

func BenchmarkPassValD16(b *testing.B) {
    var d D16
    for i := 0; i < b.N; i++ {
        d = updateValD16(d)
    }
}

We have two versions of each update function: one receiving a reference and another receiving a value. We benchmark different sizes to assess how much slower pass-by-value gets as values grow. Executing the above code gives us the following result:

$ go test -bench=.
goos: linux
goarch: amd64
BenchmarkPassRefD2-8            2000000000               1.37 ns/op
BenchmarkPassValD2-8            200000000                6.71 ns/op
BenchmarkPassRefD4-8            2000000000               1.37 ns/op
BenchmarkPassValD4-8            200000000                9.77 ns/op
BenchmarkPassRefD8-8            2000000000               1.35 ns/op
BenchmarkPassValD8-8            100000000               14.7 ns/op
BenchmarkPassRefD16-8           2000000000               1.37 ns/op
BenchmarkPassValD16-8           50000000                24.9 ns/op
PASS
ok      _/home/roman/tmp/playground/vals        19.197s

What do you know? Passing parameters is slow!

The first thing we can notice is that even with small values, passing parameters by value is roughly 5x slower than passing them by reference (6.71 ns vs 1.37 ns). The bigger the value, the slower it becomes, reaching roughly 18x slower for the largest struct, 64 int64 fields or 512 bytes (not your average use case). Now, is this slowness relevant in the bigger scheme of things? I argue that it may not be.

There is this table of latency numbers every programmer should know that may give us some clarity: every main-memory access costs around 100 ns. This means that if we have any IO-related code around our functions, the latency of those calls is going to eclipse any performance degradation we may get from receiving values as parameters.
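As a rough back-of-the-envelope check, using the worst-case benchmark number above and the classic ~100 ns main-memory figure (both approximate):

```go
package main

import "fmt"

func main() {
	copyCost := 24.9   // ns/op, worst case above (D16 passed by value)
	ramAccess := 100.0 // ns, the classic figure for one main-memory reference

	// Even the most expensive copy we measured costs about a quarter
	// of a single RAM access, and far less than any IO round trip.
	fmt.Printf("worst-case copy = %.0f%% of one RAM access\n", copyCost/ramAccess*100)
}
```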

Conclusion

Does this mean you should not worry about parameters by value ever? Well, no. If you are executing a pure function that receives big values as parameters in a tight loop, it may make sense to pass parameters by reference to speed up the performance of your algorithm.

Our key takeaway: before passing pointers as arguments for the sake of performance, make sure that this piece of code is a real bottleneck of your program; this way, you reduce accidental complexity in your application.