Testing Data-Intensive Code With Go, Part 3

Overview

This is part three of five in a tutorial series on testing data-intensive code with Go. In part two, I covered testing against a real in-memory data layer based on the popular SQLite. In this tutorial, I'll go over testing against a complex local data layer that includes a relational DB and a Redis cache.

Testing Against a Local Data Layer

Testing against an in-memory data layer is awesome. The tests are lightning fast, and you have full control. But sometimes you need to be closer to the actual configuration of your production data layer. Here are some possible reasons:

You use specific details of your relational DB that you want to test.
Your data layer consists of several interacting data stores.
The code under test consists of several processes accessing the same data layer.
You want to prepare or observe your test data using standard tools.
You don't want to implement a dedicated in-memory data layer if your data layer is in flux.
You just want to know that you're testing against your actual data layer.
You need to test with a lot of data that doesn't fit in memory.

I'm sure there are other reasons, but you can see why just using an in-memory data layer for testing may not be enough in many cases.

OK. So we want to test an actual data layer. But we still want to be as lightweight and agile as possible. That means a local data layer. Here are the benefits:

No need to provision and configure anything in the data center or the cloud.
No need to worry about our tests corrupting the production data by accident.
No need to coordinate with fellow developers in a shared test environment.
No slowness over the network calls.
Full control over the content of the data layer, with the ability to start from scratch any time.

In this tutorial we'll up the ante. We'll implement (very partially) a hybrid data layer that consists of a MariaDB relational DB and a Redis server. Then we will use Docker to stand up a local data layer we can use in our tests.

Using Docker to Avoid Installation Headaches

First, you need Docker, of course. Check out the documentation if you're not familiar with Docker. The next step is to get images for our data stores: MariaDB and Redis. Without getting into too much detail, MariaDB is a great relational DB compatible with MySQL, and Redis is a great in-memory key-value store (and much more).

> docker pull mariadb
...

> docker pull redis
...

> docker images
REPOSITORY      TAG      IMAGE ID      CREATED      SIZE
mariadb         latest   51d6a5e69fa7  2 weeks ago  402MB
redis           latest   b6dddb991dfa  2 weeks ago  107MB

Now that we have Docker installed and we have the images for MariaDB and Redis, we can write a docker-compose.yml file, which we'll use to launch our data stores. Let's call our DB "songify".

mariadb-songify:
  image: mariadb:latest
  command: >
      --general-log 
      --general-log-file=/var/log/mysql/query.log
  expose:
    - "3306"
  ports:
    - "3306:3306"
  environment:
    MYSQL_DATABASE: "songify"
    MYSQL_ALLOW_EMPTY_PASSWORD: "true"
  volumes_from:
    - mariadb-data
mariadb-data:
  image: mariadb:latest
  volumes:
    - /var/lib/mysql
  entrypoint: /bin/bash

redis:
  image: redis
  expose:
    - "6379"
  ports:
    - "6379:6379"

You can launch your data stores with the docker-compose up command (similar to vagrant up). The output should look like this:

> docker-compose up
Starting hybridtest_redis_1 ...
Starting hybridtest_mariadb-data_1 ...
Starting hybridtest_redis_1
Starting hybridtest_mariadb-data_1 ... done
Starting hybridtest_mariadb-songify_1 ...
Starting hybridtest_mariadb-songify_1 ... done
Attaching to hybridtest_mariadb-data_1, 
             hybridtest_redis_1, 
             hybridtest_mariadb-songify_1
.
.
.
redis_1  | * DB loaded from disk: 0.002 seconds
redis_1  | * Ready to accept connections
.
.
.
mariadb-songify_1  | [Note] mysqld: ready for connections.
.
.
.

At this point, you have a full-fledged MariaDB server listening on port 3306 and a Redis server listening on port 6379 (both are the standard ports).

The Hybrid Data Layer

Let's take advantage of these powerful data stores and upgrade our data layer to a hybrid data layer that caches songs per user in Redis. When GetSongsByUser() is called, the data layer will first check if Redis already stores the songs for the user. If it does then just return the songs from Redis, but if it doesn't (cache miss) then it will fetch the songs from MariaDB and populate the Redis cache, so it's ready for the next time.
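To make the flow concrete before looking at the real implementation, here is a toy sketch of the same read path where plain maps stand in for Redis and MariaDB. The cacheAside type, the songsFor method and the one-field Song struct are hypothetical stand-ins, not the tutorial's actual types.

```go
package main

// Song is a simplified stand-in for the tutorial's Song type.
type Song struct {
	Title string
}

// cacheAside is a toy model of the hybrid data layer: cache plays the role
// of Redis and db plays the role of MariaDB.
type cacheAside struct {
	cache  map[string][]Song
	db     map[string][]Song
	dbHits int // how many times the "DB" was consulted
}

// songsFor returns the songs for email. It checks the cache first and, on a
// miss, reads from the DB and populates the cache for the next call.
func (c *cacheAside) songsFor(email string) []Song {
	if songs, ok := c.cache[email]; ok {
		return songs // cache hit
	}
	c.dbHits++ // cache miss
	songs := c.db[email]
	c.cache[email] = songs
	return songs
}
```

Calling songsFor() twice with the same email touches the "DB" only on the first call; the second call is served from the cache.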

Here are the struct and constructor definitions. The struct keeps a DB handle like before and also a Redis client. The constructor connects to the relational DB as well as to Redis. It creates the schema and flushes Redis only if the corresponding parameters are true, which is needed only for testing. In production, you create the schema once (ignoring schema migrations).

type HybridDataLayer struct {
	db    *sql.DB
	redis *redis.Client
}

func NewHybridDataLayer(dbHost string, 
                        dbPort int, 
                        redisHost string, 
                        createSchema bool, 
                        clearRedis bool) (*HybridDataLayer, 
                                          error) {
	dsn := fmt.Sprintf("root@tcp(%s:%d)/", dbHost, dbPort)
	if createSchema {
		err := createMariaDBSchema(dsn)
		if err != nil {
			return nil, err
		}
	}

	db, err := sql.Open("mysql", dsn+"songify?parseTime=true")
	if err != nil {
		return nil, err
	}

	redisClient := redis.NewClient(&redis.Options{
		Addr:     redisHost + ":6379",
		Password: "",
		DB:       0,
	})

	_, err = redisClient.Ping().Result()
	if err != nil {
		return nil, err
	}

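	if clearRedis {
		// Don't silently continue with a dirty cache if the flush fails
		if err := redisClient.FlushDB().Err(); err != nil {
			return nil, err
		}
	}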

	return &HybridDataLayer{db, redisClient}, nil
}
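The constructor uses two forms of the same DSN: one without a database name (needed to drop and create the database) and one that selects the songify database. As a small illustration, the sketch below builds both. The serverDSN and databaseDSN helper names are hypothetical; the format strings are taken from the constructor above.

```go
package main

import "fmt"

// serverDSN returns a DSN that connects to the server as root with an empty
// password and no database selected, matching the docker-compose.yml setup.
func serverDSN(host string, port int) string {
	return fmt.Sprintf("root@tcp(%s:%d)/", host, port)
}

// databaseDSN returns a DSN that selects dbName. The parseTime=true option
// tells the MySQL driver to scan DATETIME and TIMESTAMP columns into
// time.Time values instead of raw bytes.
func databaseDSN(host string, port int, dbName string) string {
	return serverDSN(host, port) + dbName + "?parseTime=true"
}
```

For the local Docker setup, databaseDSN("localhost", 3306, "songify") yields root@tcp(localhost:3306)/songify?parseTime=true.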

Using MariaDB

MariaDB and SQLite are a little different as far as DDL goes. The differences are small, but important. Go doesn't have a mature cross-DB toolkit like Python's fantastic SQLAlchemy, so you have to manage it yourself (no, Gorm doesn't count). The main differences are:

The SQL driver is "github.com/go-sql-driver/mysql".
The database doesn't live in memory, so it is recreated every time (drop and create).
The schema must be a slice of independent DDL statements instead of one string of all statements.
The auto incrementing primary keys are marked by AUTO_INCREMENT.
VARCHAR instead of TEXT.

Here is the code:

func createMariaDBSchema(dsn string) error {
	db, err := sql.Open("mysql", dsn)
	if err != nil {
		return err
	}

	// Recreate DB (IF EXISTS keeps the first run from failing
	// when the database hasn't been created yet)
	commands := []string{
		"DROP DATABASE IF EXISTS songify;",
		"CREATE DATABASE songify;",
	}
	for _, s := range commands {
		_, err = db.Exec(s)
		if err != nil {
			return err
		}
	}

	// Create schema
	db, err = sql.Open("mysql", dsn+"songify?parseTime=true")
	if err != nil {
		return err
	}

	schema := []string{
		`CREATE TABLE IF NOT EXISTS song (
		  id          INTEGER PRIMARY KEY AUTO_INCREMENT,
		  url         VARCHAR(2088) UNIQUE,
		  title       VARCHAR(100),
		  description VARCHAR(500)
		);`,
		`CREATE TABLE IF NOT EXISTS user (
		  id            INTEGER PRIMARY KEY AUTO_INCREMENT,
		  name          VARCHAR(100),
		  email         VARCHAR(100) UNIQUE,
		  registered_at TIMESTAMP,
		  last_login    TIMESTAMP
		);`,
		"CREATE INDEX user_email_idx ON user (email);",
		`CREATE TABLE IF NOT EXISTS label (
		  id   INTEGER PRIMARY KEY AUTO_INCREMENT,
		  name VARCHAR(100) UNIQUE
		);`,
		"CREATE INDEX label_name_idx ON label (name);",
		`CREATE TABLE IF NOT EXISTS label_song (
		  label_id  INTEGER NOT NULL REFERENCES label (id),
		  song_id INTEGER NOT NULL REFERENCES song (id),
		  PRIMARY KEY (label_id, song_id)
		);`,
		`CREATE TABLE IF NOT EXISTS user_song (
		  user_id INTEGER NOT NULL REFERENCES user (id),
		  song_id INTEGER NOT NULL REFERENCES song (id),
		  PRIMARY KEY (user_id, song_id)
		);`,
	}

	for _, s := range schema {
		_, err = db.Exec(s)
		if err != nil {
			return err
		}
	}
	return nil
}
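Both loops in createMariaDBSchema() do the same thing: run a slice of statements one by one and stop at the first error. One possible way to factor that out, and make it testable without a running MariaDB, is sketched below. The execer interface, the execAll helper and the recordingExecer fake are all hypothetical names I introduce for illustration.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
)

// execer is the subset of *sql.DB the helper needs. Both *sql.DB and
// *sql.Tx satisfy it.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// execAll runs each statement in order and stops at the first failure,
// reporting the index of the statement that failed.
func execAll(db execer, statements []string) error {
	for i, s := range statements {
		if _, err := db.Exec(s); err != nil {
			return fmt.Errorf("statement %d failed: %w", i, err)
		}
	}
	return nil
}

// recordingExecer is a fake for unit tests. It records statements instead
// of running them; if failOn is non-empty, that exact statement fails.
type recordingExecer struct {
	executed []string
	failOn   string
}

func (r *recordingExecer) Exec(query string, args ...any) (sql.Result, error) {
	if r.failOn != "" && query == r.failOn {
		return nil, errors.New("forced failure")
	}
	r.executed = append(r.executed, query)
	return nil, nil
}
```

With such a helper, the schema function could call execAll(db, commands) and execAll(db, schema) instead of repeating the loop.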

Using Redis

Redis is very easy to use from Go. The "github.com/go-redis/redis" client library is very intuitive and faithfully follows the Redis commands. For example, to test if a key exists, you just use the Exists() method of the Redis client, which accepts one or more keys and returns how many of them exist.

In this case, I check for one key only:

	count, err := m.redis.Exists(email).Result()
	if err != nil {
		return err
	}

Testing Access to Multiple Data Stores

The tests are actually identical. The interface didn't change, and the behavior didn't change. The only change is that the implementation now keeps a cache in Redis. The GetSongsByUser() method now just calls refreshUser_Redis().

func (m *HybridDataLayer) GetSongsByUser(u User) (songs []Song,
                                                  err error) {
	err = m.refreshUser_Redis(u.Email, &songs)
	return
}

The refreshUser_Redis() method returns the user songs from Redis if they exist and otherwise fetches them from MariaDB.

type Songs *[]Song

func (m *HybridDataLayer) refreshUser_Redis(email string,
                                            out Songs) error {
	count, err := m.redis.Exists(email).Result()
	if err != nil {
		return err
	}

	// Cache miss: fetch from MariaDB and populate the Redis set
	if count == 0 {
		err = m.getSongsByUser_DB(email, out)
		if err != nil {
			return err
		}

		for _, song := range *out {
			s, err := serializeSong(song)
			if err != nil {
				return err
			}

			_, err = m.redis.SAdd(email, s).Result()
			if err != nil {
				return err
			}
		}
		return nil
	}

	// Cache hit: rebuild the songs from the members of the Redis set
	members, err := m.redis.SMembers(email).Result()
	if err != nil {
		return err
	}
	for _, member := range members {
		song, err := deserializeSong([]byte(member))
		if err != nil {
			return err
		}
		*out = append(*out, song)
	}

	return nil
}
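The serializeSong() and deserializeSong() helpers aren't shown here. A minimal sketch, assuming JSON encoding and a Song struct whose fields mirror the song table columns, might look like this. Both the field names and the choice of JSON are my assumptions; any stable encoding would work.

```go
package main

import "encoding/json"

// Song mirrors the columns of the song table (field names are assumed).
type Song struct {
	URL         string `json:"url"`
	Title       string `json:"title"`
	Description string `json:"description"`
}

// serializeSong encodes a song as JSON so it can be stored as a member of
// a Redis set.
func serializeSong(s Song) ([]byte, error) {
	return json.Marshal(s)
}

// deserializeSong decodes a song previously produced by serializeSong.
func deserializeSong(data []byte) (Song, error) {
	var s Song
	err := json.Unmarshal(data, &s)
	return s, err
}
```

Because Redis sets identify members by their exact bytes, the encoding should be deterministic for a given song. json.Marshal emits struct fields in declaration order, so it satisfies that here.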

There is a slight problem here from a testing methodology point of view. When we test through the abstract data layer interface, we have no visibility into the data layer implementation.

For example, it's possible that there is a big flaw where the data layer completely skips the cache and always fetches the data from the DB. The tests will pass, but we don't get to benefit from the cache. I'll talk in part five about testing your cache, which is very important.

Conclusion

In this tutorial, we covered testing against a local complex data layer that consists of multiple data stores (a relational DB and a Redis cache). We also utilized Docker to easily deploy multiple data stores for testing.

In part four, we will focus on testing against remote data stores, using snapshots of production data for our tests, and also generating our own test data. Stay tuned!