Terraform Remote State with Consul Backend

Written by: Amit Saha

Terraform is a command-line tool for creating and managing your cloud infrastructure. Infrastructure is expressed in HashiCorp Configuration Language (HCL), a JSON-like configuration language, and Terraform supports multiple cloud infrastructure providers.

One of the key aspects of how Terraform works is the state of your infrastructure as Terraform sees it. This state is stored in a backend, and multiple backends are supported. The default backend is local: the state lives in a file on disk, usually terraform.tfstate.

Selecting a backend is a key decision that has to be made right at the start of adopting Terraform to manage your infrastructure.

Why Non-local Terraform Backends?

Although using the local backend is simple, especially when getting started, at least two problems will show up sooner rather than later:

Storage of secrets

The local state file is usually managed in the same repository as the Terraform code. This can mean that unencrypted sensitive data ends up stored in your repository, and on the disk of everyone who clones it.

No locking

Since the state file lives in the repository, nothing prevents multiple instances of Terraform, invoked either manually or in a continuous integration pipeline, from attempting to modify your infrastructure at the same time. This can be mitigated to some extent by enforcing rules such as allowing only one instance of your pipeline to run at any given point in time.

To solve both problems, Terraform supports non-local backends. When using a remote backend, Terraform doesn't store any of its state on the local disk, which addresses the first problem. However, if you want the state to be encrypted at rest and in transit, you will have to implement a backend-specific solution; for example, the docs describe a possible approach when using the AWS S3 backend.

To solve the second problem, locking, two possible options are AWS S3 and HashiCorp's consul. Documentation on using S3 (which achieves its locking via DynamoDB) is part of the official Terraform documentation.

Using a Consul Remote Backend

Using consul to solve the above problems is the focus of this article. For most of it, we will be looking at a getting-started setup where we run the consul server and apply infrastructure changes from our local system. Towards the end, I will briefly touch on how we may adapt this approach to a more realistic scenario and how we might go about encrypting Terraform state when using consul.

The accompanying git repository contains the configuration and code used in this article, and it may be a good idea to clone it as you work along. The accompanying scripts have been tested only on Linux and OS X. Besides downloading consul and terraform (as described next), we will use pipenv to run some Python scripts.

Set up consul

To use consul as a remote backend for Terraform with locking, we will make use of consul's key value store, ACL system, and sessions.

consul is distributed as a platform-specific binary zip file. The latest version at the time of this writing is 1.1.0, which can be downloaded from here. Download the zip file for your platform and unzip it to extract the consul binary.

It may be a good idea to add the unzip location to the system PATH variable or equivalent.

Next, we will start a consul development server:

$ cd <repository root>
$ consul agent -dev -config-file=./consul/server-config.json
...

The server-config.json has the following configuration:

{
  "acl_datacenter": "dc1",
  "acl_master_token": "Arandom$tring",
  "acl_default_policy": "deny",
  "acl_down_policy": "extend-cache"
}

The above configuration starts our consul dev server with ACLs enabled, a management token (the acl_master_token above), and a default policy of deny. The consul ACL guide explains these settings in detail.
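
Before moving on, we can sanity-check that the ACL system is in effect. Here is a quick check using Python and the requests library; the address below is the default for a consul dev agent, and the token is the acl_master_token from our server configuration:

# check_acl.py: verify that the default "deny" ACL policy is active.
import requests

CONSUL_ADDR = "http://127.0.0.1:8500"  # default address of a consul dev agent
MASTER_TOKEN = "Arandom$tring"         # acl_master_token from server-config.json

# An anonymous read is rejected under the default "deny" policy.
anon = requests.get(f"{CONSUL_ADDR}/v1/kv/terraform/state")
print(anon.status_code)  # 403: permission denied

# The same read with the master token is authorized; a 404 simply means
# the key does not exist yet, which is expected before our first apply.
auth = requests.get(f"{CONSUL_ADDR}/v1/kv/terraform/state",
                    params={"token": MASTER_TOKEN})
print(auth.status_code)  # 404 until Terraform writes the state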

We will leave the consul server running in a dedicated terminal session.

Set up Terraform with consul backend

Similar to consul, terraform is distributed as a platform-specific binary zip file. At the time of this writing, the latest version is 0.11.7. Download the relevant zip file from here and unzip it somewhere on your filesystem.

I assume that the path where it is unzipped is added to the system PATH variable or equivalent so that it can be invoked from anywhere on the command line without specifying the absolute path.

Initializing the backend

Now that we have terraform on our system, let's initialize the backend. The terraform configuration is as follows:

# backend.tf
terraform {
  backend "consul" {
    path = "terraform/state"
    lock = true
  }
}

The path above specifies the consul key under which the state will be stored, and lock = true says that we want locking. Let's now run Terraform and initialize the backend. Since we will first need to obtain an ACL token from consul, we will use a bash script to tie the two steps together:

$ cd <repository root>
$ cd terraform/configuration/
$ ./init.bash
~/work/github.com/amitsaha/terraform-consul-lock-demo/terraform/bootstrap ~/work/github.com/amitsaha/terraform-consul-lock-demo/terraform/configuration
~/work/github.com/amitsaha/terraform-consul-lock-demo/terraform/configuration
Initializing the backend...
Backend configuration changed!
...

The above output tells us that we have successfully initialized the consul backend with Terraform. The contents of init.bash are as follows:

#!/bin/bash
set -e
pushd ../bootstrap-utils
terraform_token=$(pipenv run python get_session_token.py)
popd
terraform init --backend-config="access_token=$terraform_token"

First, we run a Python script to get an ACL token from consul which has permission to:

  • Read and write to the consul KV store at the path terraform/state.

  • Create sessions on all nodes.

We then run terraform init, providing the token via a partial configuration so that we don't have to hardcode the access token in backend.tf.
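
The get_session_token.py script itself lives in the accompanying repository. Conceptually it does something like the following sketch, which uses the legacy ACL API available in consul 1.1.0; the address and master token are assumptions matching our dev setup:

# A sketch of what get_session_token.py does: create an ACL token whose
# policy allows managing the terraform/state key and creating sessions.
import json

import requests

CONSUL_ADDR = "http://127.0.0.1:8500"
MASTER_TOKEN = "Arandom$tring"  # acl_master_token from server-config.json

# ACL rules, expressed in HCL: write access to keys under terraform/state
# (legacy key rules are prefix-based) and to sessions on all nodes, since
# locking requires creating a session.
RULES = '''
key "terraform/state" {
  policy = "write"
}
session "" {
  policy = "write"
}
'''

response = requests.put(
    f"{CONSUL_ADDR}/v1/acl/create",
    params={"token": MASTER_TOKEN},
    data=json.dumps({"Name": "terraform", "Type": "client", "Rules": RULES}),
)
response.raise_for_status()
# Print the new token's ID so the calling shell script can capture it.
print(response.json()["ID"])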

When you run the above script, you will see logs like the following in the console where the consul server is running:

2018/06/05 22:35:59 [DEBUG] http: Request PUT /v1/acl/create?token=Arandom%24tring (838.515µs) from=127.0.0.1:50425
2018/06/05 22:35:59 [DEBUG] http: Request GET /v1/kv/terraform/state (639.584µs) from=127.0.0.1:50428

We can see that an API call was made to create a token (by our Python script) and then a GET query was made (by Terraform) to the consul KV store to query the current state.

The init command initializes the backend for us and creates a .terraform sub-directory. Inside it, there is a terraform.tfstate file which looks as follows:

$ cat .terraform/terraform.tfstate
{
    "version": 3,
    "serial": 1,
    "lineage": "7d772b24-b269-0638-3832-339be8926025",
    "backend": {
        "type": "consul",
        "config": {
            "access_token": "4925cfa5-1195-4802-74f4-64561e6fa788",
            "lock": true,
            "path": "terraform/state"
        },
        "hash": 14982975079171644367
    },
    "modules": [
        {
            "path": [
                "root"
            ],
            "outputs": {},
            "resources": {},
            "depends_on": []
        }
    ]
}

When we perform any subsequent Terraform operation, Terraform will consult the configuration above to interact with the backend. This file does not need to be committed to version control, and it is safe to run the init operation more than once.

Managing a consul key value resource

At this stage, Terraform is initialized and we can now start managing our infrastructure. To keep things simple, we will use Terraform's consul provider to create a consul key on the local consul server we have running.

The configuration looks as follows:

# infrastructure.tf
variable "app1_version_token" {}

resource "consul_keys" "app1_version" {
  datacenter = "dc1"
  token      = "${var.app1_version_token}"

  key {
    path  = "app1/version"
    value = "0.1"
  }
}

We will supply the token when running terraform via the script apply_consul_key.bash:

#!/bin/bash
set -e
pushd ../configuration-utils
pipenv install
terraform_token=$(pipenv run python get_kv_token.py)
popd
terraform apply -target=consul_keys.app1_version -var "app1_version_token=$terraform_token"
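
get_kv_token.py follows the same idea as get_session_token.py, with a narrower policy. A sketch, again assuming our dev server's master token:

# A sketch of what get_kv_token.py does: create an ACL token that the
# consul provider can use to write keys under the app1/ prefix.
import json

import requests

CONSUL_ADDR = "http://127.0.0.1:8500"
MASTER_TOKEN = "Arandom$tring"  # acl_master_token from server-config.json

# Legacy ACL key rules are prefix-based, so this rule covers app1/version.
RULES = '''
key "app1/" {
  policy = "write"
}
'''

response = requests.put(
    f"{CONSUL_ADDR}/v1/acl/create",
    params={"token": MASTER_TOKEN},
    data=json.dumps({"Name": "app1", "Type": "client", "Rules": RULES}),
)
response.raise_for_status()
print(response.json()["ID"])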

Next, we will run the script as:

$ cd <repository root>
$ cd terraform/configuration
$ ./apply_consul_key.bash
...

Once the above script runs, it creates the key with the specified value. On the consul server console, you will see logs showing API calls for acquiring a lock, reading the current state, creating a session, creating the key, and writing the final state.
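
We can also verify the key independently of Terraform by reading it back from the consul KV API. Note that consul returns values base64-encoded. A small sketch; any token with read access to app1/ would do, and for brevity we reuse the master token here:

# read_key.py: read back app1/version directly from the consul KV API.
import base64

import requests

CONSUL_ADDR = "http://127.0.0.1:8500"
MASTER_TOKEN = "Arandom$tring"  # any token that can read app1/ works

response = requests.get(f"{CONSUL_ADDR}/v1/kv/app1/version",
                        params={"token": MASTER_TOKEN})
response.raise_for_status()
# The KV API returns a list of entries; the Value field is base64-encoded.
entry = response.json()[0]
print(base64.b64decode(entry["Value"]).decode())  # prints: 0.1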

Demonstration of Locking

Let's modify our configuration, infrastructure.tf, to update the value of the key to something else:

$ cd <repository root>
$ cd terraform/configuration
$ git diff infrastructure.tf
diff --git a/terraform/configuration/infrastructure.tf b/terraform/configuration/infrastructure.tf
index 67bf4ea..ea9490a 100644
--- a/terraform/configuration/infrastructure.tf
+++ b/terraform/configuration/infrastructure.tf
@@ -5,6 +5,6 @@ resource "consul_keys" "app1_version" {
   token = "${var.app1_version_token}"
    key {
     path  = "app1/version"
-    value = "0.1"
+    value = "0.2"
   }
 }

Run the apply_consul_key.bash script in a terminal window. While it waits for us to confirm the plan, run the script again in another terminal. The second invocation will exit with an error like the following:

Acquiring state lock. This may take a few moments...
Error: Error locking state: Error acquiring the state lock: Lock Info:
  ID:        1494f8fc-fe71-7dbe-7ea8-b21b0d402d2e
  Path:      terraform/state
  Operation: OperationTypeApply
  Who:       vagrant@default-centos-7-latest
  Version:   0.11.7
  Created:   2018-06-08 01:46:24.40623813 +0000 UTC
  Info:      consul session: 19e976ad-0c43-b511-3cd1-bc554f7416e1

We can use the -lock-timeout argument to specify how long Terraform should keep retrying to acquire the lock; for example, terraform apply -lock-timeout=60s retries for up to 60 seconds before giving up.
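
If you are curious about what is actually stored, you can list the keys under the terraform/ prefix while an apply is holding the lock. The exact names of the lock-related entries are an implementation detail of the consul backend, so treat this as exploratory:

# list_state_keys.py: list all keys stored under the terraform/ prefix.
import requests

CONSUL_ADDR = "http://127.0.0.1:8500"
MASTER_TOKEN = "Arandom$tring"  # acl_master_token from server-config.json

response = requests.get(f"{CONSUL_ADDR}/v1/kv/terraform/",
                        params={"token": MASTER_TOKEN, "keys": "true"})
response.raise_for_status()
# While an apply holds the lock, expect lock-related entries here in
# addition to terraform/state itself.
print(response.json())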

Beyond Local Setup

It perhaps makes the most sense to use the consul backend for Terraform if you are already using consul in your organization. In such a scenario, your consul server runs at a central location, while terraform runs as part of a continuous integration and deployment pipeline. Compared to our demo setup, we would do the following things differently:

Managing consul tokens for Terraform

We saw that Terraform needs a consul token with permission to create sessions and to read and write the key at which the state is stored. In our demo, we used the management token to create this token. Ideally, we would have a dedicated token whose only capability is creating the Terraform token with the desired policy, rather than using the management token directly.

Encrypt state at rest

We would also want the state to be stored encrypted. Even if we use consul ACLs to make sure that no undesired entity can access the state, the state is still unencrypted at rest.

For both of the above scenarios, Vault is worth looking into.

Summary

In this article, we looked at setting up Terraform with a consul backend. If you are already using consul in your infrastructure, it is definitely worth looking into.

Although you could keep using the local backend and rely on a CI solution to enforce that only a single instance of Terraform runs at any point in time, using a remote backend with locking is easy enough that there is little reason not to.

The repository used for this article is available here.

