Docker-like Layer Caching on VMs

Published on 

by Dan Manges

Layer caching is one of the best features of building Docker images. When a command runs during an image build, Docker saves the resulting filesystem changes as a new layer. If the same command has already been run on top of the same preceding layers, Docker skips executing it and reuses the layer from the previous execution. For more on how this works, see the explanation in the Docker docs on optimizing builds with cache management.

Can the same technique be used when executing commands in VMs for CI builds?

How Caching Works in Most CI Providers

CI build pipelines typically use ephemeral VMs. Using a fresh machine helps ensure that the execution is pristine, and nothing left over from persistent infrastructure is affecting the build in an unexpected way.

However, provisioning a fresh VM for every build means the commands that set up the system have to be re-executed each time.

Because repeatedly running commands like this would be inefficient, most CI providers offer a cache store. However, the cache store is usually nothing more than a generic key/value store for tar files. Each CI step has to be implemented with logic to determine the cache key, package up files, and restore them.
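As a rough illustration of the bookkeeping this pushes onto each step, here is a minimal Python sketch of a manual cache flow. It is not any provider's actual API: the in-memory `cache_store` dict stands in for the provider's key/value store, and the function names and the idea of hashing a lockfile are assumptions made for the example.

```python
import hashlib
import io
import tarfile
from pathlib import Path

# Hypothetical in-memory stand-in for a CI provider's key/value cache store.
cache_store: dict[str, bytes] = {}


def cache_key(prefix: str, lockfile: Path) -> str:
    """Derive the key from the file assumed to determine the cached content."""
    digest = hashlib.sha256(lockfile.read_bytes()).hexdigest()
    return f"{prefix}-{digest[:16]}"


def save(key: str, directory: Path) -> None:
    """Package a directory as a gzipped tar archive and store it under the key."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        archive.add(directory, arcname=".")
    cache_store[key] = buffer.getvalue()


def restore(key: str, directory: Path) -> bool:
    """Unpack the archive stored under the key; return False on a cache miss."""
    data = cache_store.get(key)
    if data is None:
        return False
    directory.mkdir(parents=True, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        archive.extractall(directory)
    return True
```

Every step that wants caching has to repeat this pattern: choose the right inputs for the key, decide which directories to archive, and handle the miss case. A mistake in any of those choices produces a stale or useless cache.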

This approach to caching is tedious and error-prone compared to Docker’s layer caching, even though the commands being run are nearly identical in both cases. A Dockerfile may look like this:

FROM ubuntu:latest
RUN apt-get update && apt-get install -y build-essential

And a GitHub Actions build script may look like this:

steps:
- run: sudo apt-get update && sudo apt-get install -y build-essential

In this example, GitHub Actions wouldn’t cache the install at all. To get caching, you’d need to use an action that runs apt-get and manually saves to and restores from the cache store.

Wouldn’t it be great if a CI build pipeline could cache the same way that Docker does?

Using OverlayFS for Layer Caching on VMs

OverlayFS is the filesystem behind overlay2, the storage driver Docker recommends. It’s a union filesystem that stacks layers and exposes them to a process as a single filesystem.

As part of a build or CI process, OverlayFS can be used in a VM the same way it’s used with containers.

  • Before running a command, create a new empty layer on top of the existing ones
  • When the command finishes, archive the changes captured in that layer
  • If the same command has already been run on top of the same layers, restore the previously cached layer instead of executing the command again
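The steps above can be sketched in Python as a small model. This is not how Mint or Docker is implemented, and it does not mount a real OverlayFS: each layer is modeled as a dict of path to content, `ChainMap` plays the role of the merged view (reads fall through to lower layers, writes land only in the top one), and `layer_cache`, `layer_key`, and `run_pipeline` are hypothetical names. File deletions, which OverlayFS records as whiteouts, are left out.

```python
import hashlib
from collections import ChainMap
from typing import Callable

# A layer is modeled as a dict of path -> content (a stand-in for an upper directory).
Layer = dict[str, str]
# A command is modeled as a function that reads and writes the merged view.
Command = Callable[[ChainMap], None]

layer_cache: dict[str, Layer] = {}  # hypothetical store of archived layers
executed: list[str] = []            # records which commands actually ran


def layer_key(parent_key: str, command_text: str) -> str:
    """Key a layer by the chain of layers beneath it plus the command text."""
    return hashlib.sha256(f"{parent_key}\n{command_text}".encode()).hexdigest()


def run_pipeline(steps: list[tuple[str, Command]]) -> ChainMap:
    layers: list[Layer] = []  # ordered bottom to top
    key = "base"
    for command_text, command in steps:
        key = layer_key(key, command_text)
        if key in layer_cache:
            # Cache hit: restore the archived layer instead of running the command.
            upper = dict(layer_cache[key])
        else:
            # Cache miss: run the command against a fresh top layer.
            upper = {}
            merged = ChainMap(upper, *reversed(layers))
            command(merged)
            executed.append(command_text)
            layer_cache[key] = dict(upper)  # archive the changes
        layers.append(upper)
    # Final merged view, top layer first.
    return ChainMap(*reversed(layers))
```

Because each key includes its parent's key, changing one command invalidates that layer and every layer after it, while the earlier layers are still restored from cache. On a real Linux VM the merged view would come from an overlay mount, for example `mount -t overlay overlay -o lowerdir=...,upperdir=...,workdir=... /merged`, with the upper directory archived after each command.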

Containers versus VMs for Builds

There’s some additional complexity to consider with caching, such as environment variables and side effects outside of the file system. However, the technique that Docker uses can be applied to VMs.
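One way to account for environment variables, sketched here as an assumption rather than a description of what Docker or Mint does, is to fold the variables a step depends on into its cache key. The function name `step_cache_key` and the `tracked` parameter are hypothetical.

```python
import hashlib
import json


def step_cache_key(
    parent_key: str, command: str, env: dict[str, str], tracked: list[str]
) -> str:
    """Hash the parent layer key, the command, and only the tracked variables."""
    relevant = {name: env.get(name) for name in sorted(tracked)}
    payload = json.dumps(
        {"parent": parent_key, "command": command, "env": relevant},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```

Limiting the key to declared variables keeps unrelated values, such as a per-build ID, from invalidating the cache on every run. Side effects outside the filesystem are not captured by a key like this and would need separate handling.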

If a build can be executed in a container, then Docker caching will work out of the box.

Executing in a VM is usually more familiar and less restrictive, though, and we’ve found that many engineers prefer the simplicity of using a VM.

Mint

We’re implementing Docker-like layer caching when running VMs for Mint, the new build/CI tool that we’re developing.

Follow along by subscribing to our newsletter:

👉 https://www.rwx.com/newsletter
