This post shows a simple way to improve the Docker image build process. The technique reduces both required storage and build time.

Problem

Docker images are built from layers: each command in a Dockerfile creates a layer. Read more about images, layers and storage drivers in the Docker docs.

Most Rails Dockerfiles share three steps, each of which produces a layer:

  • installing gems (RUN bundle install)
  • copying all files
  • precompiling assets

Example Dockerfile (GitHub):

FROM ruby:2.5.1-alpine

WORKDIR /app
COPY Gemfile Gemfile.lock ./
RUN bundle --deployment --without development test
COPY . .
ENV RAILS_ENV production
RUN bundle exec rake assets:precompile

In this setup, any change to the code invalidates COPY . . and every layer above it, including rake assets:precompile, so they all have to be rebuilt from scratch. It is even worse when Gemfile or Gemfile.lock changes, because bundle install then starts from scratch with no cache.

It is unacceptable that adding a blank line to the code produces a big artifact and restarts time-consuming tasks from scratch.

What we want is a way to add incremental changes as small diffs: patches, not replacements.

Master image

The simplest solution is to create a master (bootstrap) image that incremental images are built on.

First, create and tag the master build. In my repository I use the Dockerfile named Dockerfile.production.simplified and tag the image as rails-example-app-master:

$ docker build -f Dockerfile.production.simplified . -t rails-example-app-master

Then create a copy of your Dockerfile and change the FROM line at the top to:

FROM rails-example-app-master

I saved it as Dockerfile.production.simplified.cached:

FROM rails-example-app-master

WORKDIR /app
COPY Gemfile Gemfile.lock ./
RUN bundle --deployment --without development test
COPY . .
ENV RAILS_ENV production
RUN bundle exec rake assets:precompile
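
The copy can also be generated instead of edited by hand, which keeps the two Dockerfiles from drifting apart. The following is a minimal sketch, assuming the original Dockerfile has a single build stage; the helper name make_cached_dockerfile is hypothetical and not part of Docker:

```shell
# Write a copy of a Dockerfile whose first FROM line points at the master image.
# make_cached_dockerfile is a hypothetical helper name used for illustration.
# Usage: make_cached_dockerfile SOURCE DESTINATION MASTER_IMAGE
make_cached_dockerfile() {
  awk -v image="$3" '
    # Replace only the first line that starts with FROM.
    !replaced && /^FROM / { print "FROM " image; replaced = 1; next }
    # Copy every other line unchanged.
    { print }
  ' "$1" > "$2"
}
```

For the files used in this post, the call would be make_cached_dockerfile Dockerfile.production.simplified Dockerfile.production.simplified.cached rails-example-app-master.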

Let’s try it by building a new image with a newly added asset. I use a 1.4MB picture from NASA:

$ wget https://apod.nasa.gov/apod/image/1808/heic1404b1920.jpg -P app/assets/images

In my case I build the image using:

$ docker build -f Dockerfile.production.simplified.cached . -t rails-example-app-cached-image

What happens?

All commands from the Dockerfile are executed again. However, this time a lot of data is already present in the image. Let’s look at some of the commands in detail:

  • RUN bundle --deployment --without development test - all gems are already installed in the parent image. Because there’s nothing to do, bundle exits quickly with no changes.
  • COPY . . - internally Docker is able to store only the difference when calling COPY. All files, except for the new asset, are already in the image. Only the difference (the new image file) is stored in this layer.
  • RUN bundle exec rake assets:precompile - old assets are already precompiled, so there’s no need to do that again. Only the newly added picture is precompiled, and the result is stored.

We can inspect layers using docker history command. In my case it looks like this:

$ docker history rails-example-app-cached-image
IMAGE      CREATED             CREATED BY                                 SIZE
6450cf8b0  18 seconds ago      /bin/sh -c #(nop)  ENV RAILS_LOG_TO_STDO   0B
d0f791c49  20 seconds ago      /bin/sh -c #(nop)  ENV RAILS_SERVE_STATI   0B
a4a7779e4  22 seconds ago      /bin/sh -c bundle exec rake assets:preco   1.74MB
2b77d3b1a  28 seconds ago      /bin/sh -c #(nop)  ENV RAILS_ENV=product   0B
6d9f380d3  30 seconds ago      /bin/sh -c #(nop) COPY dir:370f291eb553b   1.51MB
6892c696a  32 seconds ago      /bin/sh -c bundle --deployment --without   90B
24d2f3c52  35 seconds ago      /bin/sh -c #(nop) COPY multi:5f12d01bf90   0B
e5a2749c0  37 seconds ago      /bin/sh -c #(nop) WORKDIR /app             0B
3bf1c6640  39 seconds ago      /bin/sh -c apk add --no-cache --update b   853kB
869f40cb4  4 minutes ago       /bin/sh -c #(nop)  ENV RAILS_LOG_TO_STDO   0B
947876927  4 minutes ago       /bin/sh -c #(nop)  ENV RAILS_SERVE_STATI   0B
3c9a21fc8  4 minutes ago       /bin/sh -c bundle exec rake assets:preco   26.3MB
e4248596b  5 minutes ago       /bin/sh -c #(nop)  ENV RAILS_ENV=product   0B
32639ec16  5 minutes ago       /bin/sh -c #(nop) COPY dir:bcef82e4178ba   52.8MB
f4a48e1f7  About an hour ago   /bin/sh -c bundle --deployment --without   121MB
993849839  About an hour ago   /bin/sh -c #(nop) COPY multi:5f12d01bf90   6.93kB
1b0cd9df5  About an hour ago   /bin/sh -c #(nop) WORKDIR /app             0B
7e368ce8c  About an hour ago   /bin/sh -c apk add --no-cache --update b   202MB
d82225343  3 days ago          /bin/sh -c #(nop)  CMD ["irb"]             0B
<missing>  3 days ago          /bin/sh -c mkdir -p "$GEM_HOME" && chmod   0B
<missing>  3 days ago          /bin/sh -c #(nop)  ENV PATH=/usr/local/b   0B
<missing>  3 days ago          /bin/sh -c #(nop)  ENV BUNDLE_PATH=/usr/   0B
<missing>  3 days ago          /bin/sh -c #(nop)  ENV GEM_HOME=/usr/loc   0B
<missing>  3 days ago          /bin/sh -c set -ex   && apk add --no-cac   57.8MB
<missing>  3 days ago          /bin/sh -c #(nop)  ENV BUNDLER_VERSION=1   0B
<missing>  7 weeks ago         /bin/sh -c #(nop)  ENV RUBYGEMS_VERSION=   0B
<missing>  7 weeks ago         /bin/sh -c #(nop)  ENV RUBY_DOWNLOAD_SHA   0B
<missing>  7 weeks ago         /bin/sh -c #(nop)  ENV RUBY_VERSION=2.5.   0B
<missing>  7 weeks ago         /bin/sh -c #(nop)  ENV RUBY_MAJOR=2.5      0B
<missing>  7 weeks ago         /bin/sh -c mkdir -p /usr/local/etc  && {   45B
<missing>  7 weeks ago         /bin/sh -c #(nop)  CMD ["/bin/sh"]         0B
<missing>  7 weeks ago         /bin/sh -c #(nop) ADD file:6ee19b92d5cb1   4.2MB

Layer 869f40cb4 and all the lower layers (including <missing> layers from ruby:2.5.1-alpine image) come from the master build.

Layers 3bf1c664 and higher were created during the child image build. To verify that only the necessary diff was stored in these layers, look at the last column of the listing, which shows the layer size:

6450cf8b 18 seconds ago /bin/sh -c #(nop)  ENV RAILS_LOG_TO_STDOUT=t…   0B
d0f791c4 20 seconds ago /bin/sh -c #(nop)  ENV RAILS_SERVE_STATIC_FI…   0B
a4a7779e 22 seconds ago /bin/sh -c bundle exec rake assets:precompile   1.74MB
2b77d3b1 28 seconds ago /bin/sh -c #(nop)  ENV RAILS_ENV=production     0B
6d9f380d 30 seconds ago /bin/sh -c #(nop) COPY dir:370f291eb553b0623…   1.51MB
6892c696 32 seconds ago /bin/sh -c bundle --deployment --without dev…   90B
24d2f3c5 35 seconds ago /bin/sh -c #(nop) COPY multi:5f12d01bf9056d0…   0B
e5a2749c 37 seconds ago /bin/sh -c #(nop) WORKDIR /app                  0B
3bf1c664 39 seconds ago /bin/sh -c apk add --no-cache --update build…   853kB

Some operations that effectively do nothing, such as bundle or apk add, are not 0B and leave a small trace, but it is negligible.

The most important part is that the COPY . . (6d9f380d, 1.51MB) and rake assets:precompile (a4a7779e, 1.74MB) layers are small, reflecting the size of the newly added files.
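
To print only the layers added by the child build, docker history accepts a Go template through its --format option. The following is a minimal sketch; the helper name child_layers is hypothetical, and the layer count has to be supplied by hand (nine in the listing above):

```shell
# Print the newest COUNT layers of IMAGE as "ID  SIZE  CREATED BY".
# docker history lists layers newest first, so the child layers come first.
# child_layers is a hypothetical helper name used for illustration.
# Usage: child_layers IMAGE COUNT
child_layers() {
  docker history --format '{{.ID}}  {{.Size}}  {{.CreatedBy}}' "$1" | head -n "$2"
}
```

For the image built in this post, the call would be child_layers rails-example-app-cached-image 9.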

Issue with incremental COPY

At the time of writing there is an open bug, #21950, in Docker that describes a problem with incremental COPY on some storage drivers.

It affects newer storage drivers such as aufs and overlay2. The older overlay driver works fine.

I suggest using the overlay storage driver on the build server until this gets fixed.
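
Before relying on small COPY layers, it helps to check which storage driver the build server actually uses. The following is a minimal sketch built on docker info; the helper name check_storage_driver is hypothetical, and the list of affected drivers is taken from this post rather than verified against the bug report:

```shell
# Report the active storage driver and flag the ones this post lists as affected.
# check_storage_driver is a hypothetical helper name used for illustration.
check_storage_driver() {
  driver="$(docker info --format '{{.Driver}}')"
  case "$driver" in
    aufs|overlay2)
      echo "storage driver '$driver' may store full copies on incremental COPY" ;;
    *)
      echo "storage driver '$driver' is not one of the drivers listed as affected" ;;
  esac
}
```

The driver itself is selected in the Docker daemon configuration, so changing it is a server-level decision rather than something a build script should do.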

Summary

Using a master image improves both build speed and storage usage.

The only challenge left is keeping the master build up to date so that the cached images built on top of it stay small. It’s a good idea to automate this process.
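
One possible way to automate this is to rebuild the master whenever the cached image has grown too far beyond it. The following is a minimal sketch, not a tested pipeline: the helper names and the size limit are assumptions, and it compares the total image sizes reported by docker image inspect.

```shell
# Size of an image in bytes, as reported by docker image inspect.
image_size_bytes() {
  docker image inspect --format '{{.Size}}' "$1"
}

# Rebuild MASTER when CACHED adds more than LIMIT bytes on top of it.
# rebuild_master_if_stale is a hypothetical helper name; the limit is arbitrary.
# Usage: rebuild_master_if_stale MASTER CACHED LIMIT
rebuild_master_if_stale() {
  master="$1"
  cached="$2"
  limit="$3"
  added=$(( $(image_size_bytes "$cached") - $(image_size_bytes "$master") ))
  if [ "$added" -gt "$limit" ]; then
    echo "cached image adds $added bytes; rebuilding $master"
    docker build -f Dockerfile.production.simplified . -t "$master"
  else
    echo "cached image adds $added bytes; keeping $master"
  fi
}
```

A CI job could call rebuild_master_if_stale rails-example-app-master rails-example-app-cached-image 50000000 after each build; a scheduled rebuild of the master would be a simpler alternative.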

EDIT 1/11/2018

In the next blog post I wrote about an improved solution using the new, experimental RUN --mount feature.