Writing a Custom Resource for Concourse: Detecting Pull Request Close/Merge Events
Recently I’ve been playing around with the excellent Concourse CI project. It’s a really cool project that’s been gaining a lot of traction as an open source CI/CD solution. Take a look at this page for why Concourse is so great and how it compares to other CI/CD solutions.
Inside of Concourse, all interactions are done through resources and jobs. Its model is functional in the sense that pipelines are composed of stateless jobs with well-defined inputs and outputs, which are modeled by resources. Everything runs inside of its own container, which prevents any pollution of the automation environment between tasks. In order for any steps within a job to share anything, the inputs and outputs must be explicitly defined. This can be a little tedious at times, but it keeps everything happening inside of the pipelines very explicit. All of these characteristics make Concourse ideal as a CI/CD solution because it prevents a lot of the more mysterious and frustrating issues that cause builds to break.
Main Idea and Motivation
Implementing CI/CD using any kind of automation system requires the system to interact with the events happening inside of source control solutions like Github. In particular, one of the more common events that trigger a CI/CD system is a pull request. The community as well as the maintainers of Concourse at Pivotal have been really good about providing resources for most cases. There is already an excellent github-pullrequest-resource available for dealing with Github pull requests, but one case it doesn’t really handle is detecting a pull request that has been merged or closed.
This can be useful when modeling CI/CD in such a way that an open PR creates an environment for the application source code to deploy onto, while a closed or merged PR initiates a cleanup of that environment to reclaim resources. This type of flow is especially useful on Cloud platforms where cost is billed on a per-use basis, so that the platform only uses resources while there are open pull requests. This was a use case I really needed in my current projects using Concourse, so I decided to take this opportunity to dive a little deeper, write my own custom resource, and document the process so that it can be useful to others. For the tl;dr, please check out the project repo.
Understanding Concourse Resources
Before we start, we should first try to understand how to implement a custom resource for Concourse. I don’t want to go too deep into the spec here, but basically a Concourse resource is just a container that implements three scripts:

- /opt/resource/check : checks for new versions of the resource
- /opt/resource/in : pulls a version of the resource down
- /opt/resource/out : idempotently pushes a version up
All resources should implement these 3 scripts, but they don’t all have to do something. For operations that don’t fit the semantics of the resource, the script can be a noop. In the case of our resource, we really only need to implement the check and in scripts, because we aren’t updating anything from closed or merged pull requests, merely fetching information about them and triggering downstream jobs.
Understanding the check script
Now that we understand which scripts we need to implement for this resource, let’s dive a little deeper into the spec to understand what check should be doing for this resource. Breaking down the spec:
- A resource type’s check script is invoked to detect new versions of the resource.
- It is given the source configuration and current version on stdin.
- source is an arbitrary JSON object which specifies the location of the resource, including any credentials. This is passed verbatim from the pipeline configuration.
- version is a JSON object with string fields, used to uniquely identify an instance of the resource. This will be omitted from the first request, in which case the resource should return the current version (not every version since the resource’s inception).
- It must print the array of new versions in chronological order to stdout, including the requested version if it’s still valid.
The above spec is basically saying the Concourse runtime will be running the script using something like the following command:
echo {...source config json...} | /opt/resource/check
For the first check invocation, the input to the script will only include the source config, but in subsequent requests, the check script will also be passed the current version, which tells the resource to return the next valid version objects.
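Concretely, the JSON piped to check on a subsequent invocation has this envelope (the keys inside source and version are resource-specific; the values below are placeholders, not the shapes we design later):

```json
{
  "source": {
    "uri": "https://example.com/some-resource",
    "some_credential": "abc123"
  },
  "version": {
    "id": "42"
  }
}
```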
Given this spec, we can start to design what our source configuration and version objects should look like.
Defining the version and source config
Since we want to be returning information about closed and merged pull requests from Github, let’s try to understand what kind of information is available. Because Github provides an excellent GraphQL API we can do some exploration using the explorer to see what kind of data we can fetch about pull requests. After some experimentation, I came up with the following GraphQL query:
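Reconstructed here from the source config and version fields shown below (the exact query in the repo may differ slightly), the query is along these lines, using arguments from GitHub’s public GraphQL schema:

```graphql
query {
  repository(owner: "shinmyung0", name: "fixture-repo") {
    pullRequests(
      baseRefName: "master"
      states: [CLOSED, MERGED]
      first: 10
      orderBy: { field: UPDATED_AT, direction: ASC }
    ) {
      edges {
        cursor
        node {
          id
          number
          url
          baseRefName
          headRefName
          state
          mergedAt
        }
      }
    }
  }
}
```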
Since our query will be what is required to run the check script, we can pretty much use the input parameters (with some additional fields for credentials and API endpoint) as the fields for our source config:
source:
  graphql_api: https://api.github.com/graphql
  access_token: ((github-access-token))
  base_branch: master
  owner: ((github-owner))
  repo: ((repo-name))
  first: ((num-to-fetch))
  states:
    - closed
    - merged
This query will return a payload that looks something like this:
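A rough reconstruction of that payload (built from the version fields below; the overall shape follows GitHub’s GraphQL response format):

```json
{
  "data": {
    "repository": {
      "pullRequests": {
        "edges": [
          {
            "cursor": "Y3Vyc29yOnYyOpK5MjAxOC0wMi0yNVQxMjozNDo0NC0wODowMM4KNQ/Z",
            "node": {
              "id": "MDExOlB1bGxSZXF1ZXN0MTcxMjQ5NjI1",
              "number": 1,
              "url": "https://github.com/shinmyung0/fixture-repo/pull/1",
              "baseRefName": "master",
              "headRefName": "test-merged-branch",
              "state": "MERGED",
              "mergedAt": "2018-02-25T20:34:44Z"
            }
          }
        ]
      }
    }
  }
}
```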
Based on this information, we can try to return an array of version objects that look something like this:
{
"id": "MDExOlB1bGxSZXF1ZXN0MTcxMjQ5NjI1",
"cursor": "Y3Vyc29yOnYyOpK5MjAxOC0wMi0yNVQxMjozNDo0NC0wODowMM4KNQ/Z",
"number": "1",
"url": "https://github.com/shinmyung0/fixture-repo/pull/1",
"baseBranch": "master",
"headBranch": "test-merged-branch",
"state": "MERGED",
"timestamp": "2018-02-25T20:34:44Z"
}
Some important points to highlight are fields like the cursor, which can be passed in to our GraphQL query’s after field to only return pull requests after a particular cursor. Since the “current” version object is passed into the check script to fetch “new” version objects, this is something that would be useful to have included.
Implementing the check script
Now that we have a clear idea of what our check script should do, we can start to actually implement it. Since we know that Concourse resources can be written in any language as long as they satisfy the spec, we can pick the best language for what we are doing. Because we are using GraphQL, I decided to implement this using JS. It’s easy to deal with asynchronous network calls, easy to deal with JSON, and there are plenty of client libraries out there for GraphQL. A library I really enjoy using for GraphQL in JS is Apollo. Given all these choices, the pseudocode for the check script would look something like this:
#!/usr/bin/env node
// the shebang allows this file to be directly executable

async function check() {
  // read stdin
  // parse and validate input json
  // use configuration to run GraphQL query to fetch PRs
  // convert response payload to version objects
  // output to stdout
}

check()
For the actual implementation check the source code here.
Understanding the in script
Let’s take a deep dive into the spec for the in script.
- The in script is passed a destination directory as $1. The script must fetch the resource and place it in the given directory.
- The script is given on stdin the configured source and a precise version of the resource to fetch.
- The script must emit the fetched version, and may emit metadata as a list of key-value pairs.
Based on this spec, the Concourse runtime will basically be invoking the in script in something like the following manner:
echo {... some json ...} | /opt/resource/in outputdir
Because our pull request resource is simply fetching info about a pull request, we don’t need to fetch anything beyond outputting the version object as a file. So the above execution of the in script will result in an outputdir/pull_request file that contains the version object that was passed in on stdin. It will also emit the current version to stdout.
Implementing the in script
Based on our understanding of what the in script should do, the pseudocode for the in script would look something like the following:
#!/usr/bin/env node

async function doIn() {
  // read stdin, parse, and validate
  // extract given .version key
  // output version object to $1/pull_request file
  // emit version to stdout
}

doIn()
Downstream jobs can read the $1/pull_request file and extract information about the recently closed or merged pull request.
Writing Tests
Since we are using Node JS, we can use Jest to easily write some unit and integration tests. Unit tests are pretty straightforward to write, but in order to be able to run integration tests, we need to set up a fixture repo with some closed or merged pull requests to test actual API calls against.
Because making calls against this repo requires a Github Access token, we can pass this in as an environment variable to the integration test suite. A good thing to do would be to validate that an access token has been set within the test. Check out the test code to see in detail how this is set up.
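That validation could be sketched as follows (the environment variable name GITHUB_ACCESS_TOKEN is a hypothetical choice, not necessarily what the repo uses):

```javascript
// Guard integration tests on a required access token.
// GITHUB_ACCESS_TOKEN is a hypothetical variable name.
function requireAccessToken(env = process.env) {
  const token = env.GITHUB_ACCESS_TOKEN;
  if (!token) {
    throw new Error('GITHUB_ACCESS_TOKEN must be set to run the integration tests');
  }
  return token;
}
```

Failing fast like this gives a clear error instead of a confusing 401 from the GraphQL API halfway through the suite.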
CI/CD and Publishing to Docker Hub
We can use Travis CI to easily set up some CI/CD for this project. Basically, all the CI/CD job needs to do is run all the tests and then, if successful and the commit is tagged as a release, build a Docker image and publish it to the public Docker Hub. Pretty straightforward, so I won’t go into too much detail here. But if you’re curious, please check out the Travis CI config as well as the build script.
Conclusion
This has been a fun project to publish for this month. It gave me a really good opportunity to do a deep dive into Concourse which is something that I’ve been using for work quite a bit lately. I’m feeling more confident in my ability to open source things on a regular basis which is also something I’m committed to doing this year. Overall Concourse is a really awesome project that I highly recommend for any teams looking for a really nice CI/CD solution.