Database connect loop in Go

May 13, 2019

Today I want to talk about a useful pattern I have started using in my Go programs. Suppose you have a service that needs to connect to a database. The code probably looks like this:

	db, err := sqlx.Connect("postgres", DSN)
	if err != nil {
		return nil, errors.Wrap(err, "failed to connect to db")
	}

Nice and familiar, but why fail immediately? We can certainly do better!

Instead, we can wait for the database in a loop, because the database may come up later than our service. Connections are usually established during initialization, before the service handles any requests, so waiting for them is almost always acceptable.

Here is how I do it:

package db

import (
	"fmt"
	"log"
	"time"

	"github.com/jmoiron/sqlx"
	"github.com/pkg/errors"
)

// ConnectLoop tries to connect to the DB under the given DSN using the given
// driver, in a loop, until the connection succeeds. timeout specifies how long
// to keep trying before giving up.
func ConnectLoop(driver, DSN string, timeout time.Duration) (*sqlx.DB, error) {
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()

	timeoutExceeded := time.After(timeout)
	for {
		select {
		case <-timeoutExceeded:
			return nil, fmt.Errorf("db connection failed after %s timeout", timeout)

		case <-ticker.C:
			db, err := sqlx.Connect(driver, DSN)
			if err == nil {
				return db, nil
			}
			log.Println(errors.Wrapf(err, "failed to connect to db %s", DSN))
		}
	}
}

Our previous code is now wrapped in a ticker loop. A time.Ticker delivers a value on its channel C at a fixed interval. I prefer it to a for loop with time.Sleep because a channel can be combined with other channels, such as a timeout, in a single select.
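The ticker-plus-timeout shape is not specific to databases. Below is a minimal sketch of a generic version, assuming you want to retry any operation that returns an error; the name `retryLoop` and its parameters are hypothetical, not part of any library. Unlike ConnectLoop, it makes the first attempt immediately, because a ticker only delivers its first tick after one full interval.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryLoop (hypothetical helper) calls attempt immediately and then once per
// interval until attempt returns nil or timeout elapses.
func retryLoop(interval, timeout time.Duration, attempt func() error) error {
	// Create the deadline channel once, outside the loop, so it is not
	// reset on every iteration.
	deadline := time.After(timeout)

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	// First attempt right away instead of waiting for the first tick.
	err := attempt()
	for err != nil {
		select {
		case <-deadline:
			return fmt.Errorf("gave up after %s: %w", timeout, err)
		case <-ticker.C:
			err = attempt()
		}
	}
	return nil
}

var errNotReady = errors.New("not ready")

func main() {
	calls := 0
	// Simulated dependency that becomes available on the third call.
	err := retryLoop(10*time.Millisecond, time.Second, func() error {
		calls++
		if calls < 3 {
			return errNotReady
		}
		return nil
	})
	fmt.Println(err, calls)
}
```

With a `for` loop and `time.Sleep`, the sleep would block the goroutine and there would be no natural place to also watch the deadline; `select` lets one loop wait on both.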

On each tick, we try to connect to the database. Note that I'm using sqlx here because it provides a convenient Connect method that opens a connection and pings the database.
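If you don't use sqlx, the standard library can do the same job. This is a minimal sketch using only `database/sql`: `sql.Open` may just validate its arguments without creating a connection, so a ping is needed to learn whether the database is actually reachable. The function name `openAndPing` and its `pingTimeout` parameter are hypothetical; a real program would also import a driver package (for example `github.com/lib/pq`) for its side effect of registering `"postgres"`.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"time"
)

// openAndPing is a hypothetical stand-in for sqlx.Connect built on
// database/sql: open a handle, then verify it with a bounded ping.
func openAndPing(driver, dsn string, pingTimeout time.Duration) (*sql.DB, error) {
	// sql.Open only validates its arguments; it may not dial the server.
	db, err := sql.Open(driver, dsn)
	if err != nil {
		return nil, err
	}

	// Bound the ping so a hung server cannot block this attempt forever.
	ctx, cancel := context.WithTimeout(context.Background(), pingTimeout)
	defer cancel()
	if err := db.PingContext(ctx); err != nil {
		db.Close() // release the pool before reporting failure
		return nil, err
	}
	return db, nil
}

func main() {
	// No driver is imported in this sketch, so sql.Open itself fails
	// with an "unknown driver" error.
	_, err := openAndPing("postgres", "postgres://localhost/app", 2*time.Second)
	fmt.Println(err)
}
```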

There is a timeout to avoid an infinite connect loop. The timeout is delivered via a channel too, which is why there is a select statement here: it waits on both channels at once and runs whichever case becomes ready first.
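A different technique for the same "retry until a deadline" behaviour is `context.WithTimeout`, which also lets a caller cancel the wait early (for example, on shutdown). This is a swapped-in alternative, not what ConnectLoop does. The sketch assumes the connect step is passed in as a plain `func() error`; `retryUntilDone` and `retryWithTimeout` are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryUntilDone (hypothetical) calls attempt immediately and then once per
// interval until attempt succeeds or ctx is done.
func retryUntilDone(ctx context.Context, interval time.Duration, attempt func() error) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		err := attempt()
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			// ctx.Err() is context.DeadlineExceeded or context.Canceled;
			// %w keeps it inspectable with errors.Is.
			return fmt.Errorf("giving up (%w), last error: %v", ctx.Err(), err)
		case <-ticker.C:
			// Wait for the next tick, then try again.
		}
	}
}

// retryWithTimeout (hypothetical) keeps a ConnectLoop-like signature by
// deriving the context from a plain timeout.
func retryWithTimeout(timeout, interval time.Duration, attempt func() error) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // release the context's timer once we return
	return retryUntilDone(ctx, interval, attempt)
}

var errRefused = errors.New("connection refused")

func main() {
	// Simulated database that never comes up.
	err := retryWithTimeout(50*time.Millisecond, 10*time.Millisecond, func() error {
		return errRefused
	})
	fmt.Println(err)
	fmt.Println(errors.Is(err, context.DeadlineExceeded))
}
```

Because the deadline lives in the context, it is fixed when the context is created, so it cannot be reset by the loop.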

A quick gotcha: initially I wrote the first case like this, mimicking the example in the time.After docs:

	// XXX: THIS DOESN'T WORK
	for {
		select {
		case <-time.After(timeout):
			return nil, fmt.Errorf("db connection failed after %s timeout", timeout)

		case <-ticker.C:
			...
		}
	}

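but my timeout never fired. Because time.After is called inside the loop, every pass through select creates a brand-new timer channel. As long as the timeout is longer than the tick interval, the ticker always wins and the fresh timer is abandoned, so the timeout was effectively reset on every iteration. The fix is to create the timeout channel once, before the loop, as ConnectLoop does.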
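The reset effect is easy to reproduce with short durations. This is a small self-contained sketch, assuming a 5ms tick, a 50ms timeout, and a safety bound of 40 ticks (all arbitrary values chosen for the demo); the function names are hypothetical. The loop with `time.After` inside `select` should never report a timeout, while the loop that creates the channel up front should.

```go
package main

import (
	"fmt"
	"time"
)

const (
	tick     = 5 * time.Millisecond
	timeout  = 50 * time.Millisecond
	maxTicks = 40 // safety bound: about 200ms of ticks, well past the timeout
)

// buggyLoopTimesOut calls time.After inside the loop, so every pass through
// select starts a new timer. It reports whether the timeout case ever fired.
func buggyLoopTimesOut() bool {
	ticker := time.NewTicker(tick)
	defer ticker.Stop()
	for i := 0; i < maxTicks; i++ {
		select {
		case <-time.After(timeout): // a fresh 50ms timer on every iteration
			return true
		case <-ticker.C: // arrives first, so the fresh timer is abandoned
		}
	}
	return false
}

// fixedLoopTimesOut creates the timeout channel once, before the loop, so the
// deadline is measured from the start.
func fixedLoopTimesOut() bool {
	ticker := time.NewTicker(tick)
	defer ticker.Stop()
	timeoutExceeded := time.After(timeout)
	for i := 0; i < maxTicks; i++ {
		select {
		case <-timeoutExceeded:
			return true
		case <-ticker.C:
		}
	}
	return false
}

func main() {
	fmt.Println("time.After inside the loop fired:", buggyLoopTimesOut())
	fmt.Println("time.After before the loop fired:", fixedLoopTimesOut())
}
```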

This simple trick makes your code more robust without sacrificing readability. Here is what the diff for my service constructor looks like:

 // NewService creates a new Article service backed by Postgres
 func NewService(DSN string) (*Service, error) {
-     db, err := sqlx.Connect("postgres", DSN)
+     db, err := db.ConnectLoop("postgres", DSN, 5*time.Minute)
      if err != nil {
              return nil, errors.Wrap(err, "failed to connect to articles db")
      }

There is no magic here, just simple code. I hope you find it useful. Till next time!