by Dan Manges
Layer caching is one of the best features of building Docker images. When a command runs during an image build, Docker saves the resulting filesystem changes as a new layer. If the same command has already been run on top of the same preceding layers, Docker skips executing it and applies the layer from the previous execution instead. For more on how this works, see the explanation in the Docker docs on optimizing builds with cache management.
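To make the lookup concrete, here is a minimal sketch in Python. It is not Docker's actual implementation: it assumes each layer is identified by a hash of its parent's key plus the command text, and it ignores things Docker also considers, such as the contents of copied files. The names `layer_key` and `plan_build` are hypothetical.

```python
import hashlib


def layer_key(parent_key: str, command: str) -> str:
    """Derive a layer's cache key from its parent's key and the command text."""
    return hashlib.sha256(f"{parent_key}\n{command}".encode()).hexdigest()


def plan_build(base: str, commands: list[str], cache: set[str]) -> list[tuple[str, bool]]:
    """Return (command, cache_hit) pairs and record new layer keys in `cache`."""
    plan = []
    key = hashlib.sha256(base.encode()).hexdigest()
    for command in commands:
        key = layer_key(key, command)
        hit = key in cache
        cache.add(key)
        plan.append((command, hit))
    return plan


cache: set[str] = set()
first = plan_build("ubuntu:latest", ["apt-get update", "make"], cache)
second = plan_build("ubuntu:latest", ["apt-get update", "make test"], cache)
print(first)   # every step misses on a cold cache
print(second)  # the unchanged first step hits; the changed step misses
```

Because each key includes its parent's key, changing one command invalidates every layer after it, which is why command order matters so much for cache hit rates.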
Can the same technique be used when executing commands in VMs for CI builds?
CI build pipelines typically use ephemeral VMs. Using a fresh machine helps ensure that the execution is pristine, and nothing left over from persistent infrastructure is affecting the build in an unexpected way.
However, provisioning a fresh VM for every build means re-executing the commands that set up the system, such as installing packages.
Because repeatedly running commands like this would be inefficient, most CI providers offer a cache store. However, the cache store is usually nothing more than a generic key/value store for tar files. Each CI step has to be implemented with logic to determine the cache key, package up files, and restore them.
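The sketch below shows roughly what that per-step logic involves, assuming the cache store is just a local directory of tar files. The function names and the choice of keying on a lockfile's contents are illustrative assumptions, not any particular CI provider's API.

```python
import hashlib
import tarfile
from pathlib import Path


def cache_key(prefix: str, lockfile: Path) -> str:
    """Key the cache on the contents of a file that determines the output."""
    digest = hashlib.sha256(lockfile.read_bytes()).hexdigest()
    return f"{prefix}-{digest[:16]}"


def save(store: Path, key: str, directory: Path) -> None:
    """Package `directory` into a tar file stored under `key`."""
    with tarfile.open(store / f"{key}.tar.gz", "w:gz") as archive:
        archive.add(directory, arcname=".")


def restore(store: Path, key: str, directory: Path) -> bool:
    """Unpack the archive for `key` into `directory`; return False on a miss."""
    archive_path = store / f"{key}.tar.gz"
    if not archive_path.exists():
        return False
    directory.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(directory)
    return True
```

Every step that wants caching has to get all three pieces right: a key that changes exactly when the output would change, the correct set of paths to package, and a restore that runs before the command it is meant to skip.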
This approach to caching is tedious and error-prone compared to Docker’s layer caching. In practice though, the process is very similar. A Dockerfile may look like this:
FROM ubuntu:latest
RUN apt-get update && apt-get install -y build-essential
And a GitHub Actions build script may look like this:
steps:
  - run: apt-get update && apt-get install -y build-essential
In this example, GitHub Actions wouldn’t cache the install at all. To get caching, you’d need to use an action that runs apt-get and saves the results to the manual cache store.
Wouldn’t it be great if a CI build pipeline could cache the same way that Docker does?
OverlayFS, through the overlay2 driver, is the recommended storage driver for Docker. It’s a union filesystem that stacks layers and exposes them to a process as a single filesystem.
As part of a build or CI process, OverlayFS can be used in a VM the same way it’s used with containers.
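As a rough illustration, the Python sketch below assembles the `mount` command for stacking cached layers under a fresh writable layer. The directory paths and the `overlay_mount_command` helper are hypothetical. The `lowerdir`, `upperdir`, and `workdir` options are the standard OverlayFS mount options, and running the resulting command requires root on a Linux host.

```python
import shlex


def overlay_mount_command(layers: list[str], upper: str, work: str, merged: str) -> list[str]:
    """Build a `mount` command that stacks read-only layers under a writable one.

    `layers` is ordered bottom to top. OverlayFS expects `lowerdir` entries
    listed top to bottom, so the list is reversed.
    """
    lowerdir = ":".join(reversed(layers))
    options = f"lowerdir={lowerdir},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, merged]


command = overlay_mount_command(
    layers=["/layers/base", "/layers/apt-install"],  # hypothetical paths
    upper="/layers/current/upper",
    work="/layers/current/work",
    merged="/mnt/build",
)
print(shlex.join(command))
```

Anything a command writes under the merged directory lands in the upper directory, so after the command finishes, that directory holds exactly the changes that would be saved as the next layer. The upper and work directories need to live on the same filesystem.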
There’s some additional complexity to consider with caching, such as environment variables and side effects outside of the file system. However, the technique that Docker uses can be applied to VMs.
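One way to account for environment variables is to fold them into the cache key, so a command only reuses a layer when it would run with the same environment. The sketch below is an assumption about how that could look, not a description of how Docker or Mint does it, and `step_key` is a hypothetical name. Side effects outside the filesystem are not addressed here.

```python
import hashlib
import json


def step_key(parent_key: str, command: str, env: dict[str, str]) -> str:
    """Hash the parent layer, the command, and the environment it runs with."""
    payload = json.dumps(
        {"parent": parent_key, "command": command, "env": env},
        sort_keys=True,  # make the key independent of dict ordering
    )
    return hashlib.sha256(payload.encode()).hexdigest()


a = step_key("base", "make", {"CC": "gcc", "CFLAGS": "-O2"})
b = step_key("base", "make", {"CFLAGS": "-O2", "CC": "gcc"})
c = step_key("base", "make", {"CC": "clang", "CFLAGS": "-O2"})
print(a == b)  # same environment in a different order
print(a == c)  # different compiler
```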
If a build can be executed in a container, then Docker caching will work out of the box.
Executing in a VM is usually more familiar and less restricted, though, and we’ve found that many engineers prefer the simplicity of using a VM.
We’re implementing Docker-like layer caching when running VMs for Mint, the new build/CI tool that we’re developing.
We’ll be sharing the first public preview of Mint at the end of May.