Go routines and thread safe data structures

A simple definition of Go routines is found the “Tour of Go” section about concurrency:

a Go routine is a lightweight thread managed by the Go runtime

Now if you have not come across threads before, threads is a way for a program to run parts of itself in some sort of concurrent way. In this post we are going to examine the differences between running a program sequentially, using Go routines and Wait Groups and we are going to talk about thread safe access to a data structure. So… Let’s get started 💪!

We are going to use a Supermarket as an example, that can have 1 or more counters and a queue of people that are waiting to pay for the items that they have in their baskets. The code is available on Github under the go_routines package.

Boilerplate code

Out application is using three domain objects (structs) to describe the entities that we are working with.

Person:
A Person has 2 fields, an integer Id and an integer Items that represents the number of items in their basket.

Counter:
A counter has an Id and 2 internal fields. The registry represents the sum of all the money taken by this counter and a list of people which is all the people that were processed by this counter. The Process function calculates the price of a Person’s basket and also adds the Person in the internal list.

Queue:
Finally a Queue is just a list of People that have shopped at our supermarket and now wish to pay for the items that they have bought. The queue supports simple operations like getting the total number of people still in the queue, adding a person at the end of the queue and removing a person from the beginning of the queue.

Sequential run

Now, using these data structures we can create a Queue and a Counter and we can then simulate our Supermarket by looping through all the people in the queue and processing them one by one by our counter. Visually this looks like this:

In the simplest form our program can look like the following function, which takes a Queue and a Counter and, for as long as we have people in the queue, we extract the next person, process it and return.

Go routines

In Go we can have our program running some parts concurrently and the way we do that is by using Go routines, and doing the following:

  • define which part of the code to run concurrently. This is very easy as we only need to define a function that contains the code we wish to run
  • tell Go to run this function in a Go routine by simply adding the keyword go before the invocation of the function

So, how did our code change from the sequential run? We took the logic of our processing and placed it in a new function which we called processQueue. Then we call processQueue using the go keyword on line 22 and that’s it 🎉 . We just created our first Go routine.

There is just one small problem … Go has delegated some responsibility for executing a piece of code to another Go routine which means that the main Go routine (where our application is running from) has no way of knowing what’s happening in that other routine. Has the processing started ? Has it finished ? Are we done yet ? We simply don’t know 🤷‍♀️ and if we run the code now the application is going to return immediately without printing any output at all …

Wait Groups

The way to fix this is by using Go’s Wait Groups which allow Go routines to signal to the main Go routine that they have indeed finished with their processing thus allowing our program to successfully finish.

We have now added a few more lines of code in our example:

  • line 22: Create a new WaitGroup (provided by the go sync package)
  • line 24: Before we create our Go routine we inform the WaitGroup that there is one Go routine being started
  • line 25: We tell our main program that we need to wait until the WaitGroup has informed us that all Go routines it knows about have finished
  • line 13: our processing function is now responsible of notifying the WaitGroup that it has finished with its processing which we do by calling the wg.Done() function as the last thing our function is going to do

There 🎉🎉🎉! Now if we execute our code we will see the output being generated and the code was executed in a separate Go routine !

Multiple Go routines

So far so good as they say, but although we have managed to execute some code concurrently we have not managed to serve our customers any faster as we are still processing one person at a time. What we would like to ideally have is a number of counters all pulling people from the queue and processing them like shown below.

What we could do is actually create more than 1 counters and also more than 1 Go routines. Each Go routine could have a counter reading people from the queue and processing them concurrently with all the other counters.

So looking at the changes we introduced now instead of passing a single *domain.Counter object in our RunSupermarketNGoRoutines function, we pass a list of counters. We then loop through them at line 29 and we spawn a new Go routine for each counter, whilst also notifying our WaitGroup that we will need to wait for 1 more Go routine to finish processing before we can terminate the main application.

This means that we can have any number of Counters running concurrently and pulling people from the queue, processing and moving on. This is actually quite amazing and very simple to do! Looking at the output of our program now (this will be in the command line if you were to execute the code on Github 😉) it will look like this:

For each counter we print its ID and the list of IDs of the people that were served at this counter. For this sample run I have used 10 counters and 50 people. Some counters process people faster (this is due to the varying processing time I have introduced based on the number of items each person has).

But …. if we run this a couple or maybe even 100 times … we might start getting some unexpected results.

And we then might see:

This basically tells us that we have one person with ID=29 that has actually been processed by more than one counters (Counter 1 and Counter 4) which is definitely not something that we would like to do in our Supermarket application as this would mean that we would be double counting ⛔!!!

Using locks

The problem described above happens because we have multiple Go routines executing concurrently all reading from the same data structure which is our Queue. This manifests if:

2 or more Go routines have read the queue and extracted the same Person before they had the time to update the queue with the remaining slice

This is not a bug that we can easily detect, it’s not always going to happen and it does not manifest in every run (I had to run the code 100 times so that we can see the race condition actually occurring). Also it may happen for only person or multiple people and the frequency can vary and depend as well on what else if currently being executed on your computer or server. That’s why it’s important to know about it in advance.

The good news is that Go provides us with a way to actually protect our code, using locks to provide exclusive access to a data structure for a period of time (you may have already noticed the Lock *sync.Mutex that we have added in our Queue structure).

Locks allow the code in a particular thread to have exclusive access for the period that it holds the lock. We can see the lock in action in lines 18-22 where on line 18 we lock the Queue (perform any operations we see fit) and then on line 22 we release the lock thus allowing other routines / code to access it.

This way we can be sure that every person will only ever be accessed and processed once no matter how many Go routines we start.

One thought on “Go routines and thread safe data structures

Leave a Reply