Docker has been a tremendously impactful technology, changing how software is packaged and run.
It was released in 2013, and BuildKit, the current build system, was released in 2018.
While BuildKit made several meaningful improvements over the classic Docker builder, it still leaves a lot to be desired.
This is our proposal for a new approach that provides substantially faster builds with simplified configuration.
Build Context
Currently, most docker builds upload the entire git repository for the project into the build context. Some engineering teams have git repositories that contain hundreds of megabytes of files.
This is a very slow way to begin the build process. It also detracts from the benefits provided by remote image builders, such as Depot. You're typically cloning an entire repository onto a build machine, only to then upload all of it into the remote build context.
Proposal: we should stop uploading entire git repositories into the build context.
Cloning the repository contents into the image inside of the builder is a better approach. While it is possible to do that within BuildKit today, it is not the norm, and BuildKit does not guide users towards that solution. We should eliminate COPY . . from Dockerfiles.
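A minimal sketch of cloning inside the builder with a multi-stage Dockerfile, assuming a placeholder repository URL and a Node.js base image; a private repository would additionally need an SSH or secret mount:

```dockerfile
# syntax=docker/dockerfile:1

FROM alpine/git AS source
# Shallow-clone inside the builder instead of uploading a build context.
# The URL is a placeholder; a private repo would use e.g. RUN --mount=type=ssh.
RUN git clone --depth=1 https://github.com/example/app.git /src

FROM node:20
WORKDIR /app
COPY --from=source /src .
```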
COPY Order Tedium
Currently, most docker builds require very careful ordering of copying files and running commands to produce cache hits. Once there is a cache miss, everything downstream is a cache miss.
Therefore, individual files have to be plucked out of the build context and copied into the image as commands are run. This approach is verbose and tedious to configure.
Finally, at the end of the Dockerfile, a COPY . . statement pushes everything else in.
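For example, a typical Node.js Dockerfile ends up looking roughly like this (a simplified sketch; file names are illustrative):

```dockerfile
FROM node:20
WORKDIR /app

# Copy only the dependency manifests first so npm ci stays cached
# until the manifests change.
COPY package.json package-lock.json ./
RUN npm ci

# Copy only the files the build step needs.
COPY tsconfig.json ./
COPY src/ src/
RUN npm run build

# Finally, push everything else in.
COPY . .
```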
Proposal: images should start with the entire repository contents available. When running commands, you should be able to specify which files are relevant for the execution. Only the contents of those files should affect the cache key. And the execution should be sandboxed to ensure that other files are not present on disk.
Instead of:
- copy a few files
- run a command
- copy a few more files
- run another command
- copy all of the files
We are proposing:
- put all of the files into the image
- run a command, using a subset of the files
- run another command, using a different subset of the files
This approach also provides for miss-then-hit caching behavior, whereas today, once a layer is a cache miss, everything downstream is also a cache miss.
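BuildKit's existing RUN --mount=type=bind gets partway toward this model: the mounted files are available to the command without being copied into a layer. The sketch below is only an approximation; it is not the syntax we are proposing, and it does not provide the sandboxing described above:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20
WORKDIR /app

# Only the mounted manifests feed this step; they are not copied into a layer.
RUN --mount=type=bind,source=package.json,target=/app/package.json \
    --mount=type=bind,source=package-lock.json,target=/app/package-lock.json \
    npm ci

# A different subset of files for a different command.
RUN --mount=type=bind,source=package.json,target=/app/package.json \
    --mount=type=bind,source=tsconfig.json,target=/app/tsconfig.json \
    --mount=type=bind,source=src,target=/app/src \
    npm run build
```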
Multi-stage Builds
The careful consideration of the order of COPY and RUN in a Dockerfile is also problematic for RUN commands that should be independent, such as installing dependencies from two different package managers. You can solve that with a multi-stage build, but that approach presents its own challenges.
Using a multi-stage build requires manually copying files from stages back into the main image, which can be tedious, especially when you don't know which files were generated or are relevant. In practice, this is enough to make multi-stage builds far less utilized than they should be.
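A sketch of the pattern with two independent dependency stages; the images and paths are illustrative:

```dockerfile
FROM node:20 AS js-deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

FROM python:3.12 AS py-deps
WORKDIR /app
COPY requirements.txt ./
RUN pip install --prefix=/install -r requirements.txt

FROM python:3.12
WORKDIR /app
# You have to know exactly which paths each stage produced.
COPY --from=js-deps /app/node_modules ./node_modules
COPY --from=py-deps /install /usr/local
COPY . .
```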
The two use cases for multi-stage builds are fairly distinct:
- parallelization and independent cache hits
- installing and using build-time dependencies but only putting generated artifacts in the final image
In the case of only putting generated artifacts in the final image, copying between stages is inherently required. It's the use case of parallelization and increased cache hits where the multi-stage build process adds too much complexity.
Proposal: allow multi-stage builds to be combined. In practice, it's exceedingly rare for independent stages to write the same files. Although conflicts between stages are rare, there are good options for handling them, such as producing an error or defaulting to the last layer winning.
Cache From the Full Repository
To get as many cache hits as possible when building containers, there are two primary approaches:
- Use a persistent build machine, likely a remote one such as Depot
- Use --cache-from
Using --cache-from helps a little bit, but it's still quite limited.
You have to specify the images that you want to use for the cache.
Typically, people will specify the latest image.
This approach results in frequent cache misses which could be cache hits instead.
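A typical CI invocation looks something like this (image names are placeholders); only the explicitly listed image can ever produce cache hits:

```sh
docker buildx build \
  --cache-from type=registry,ref=registry.example.com/app:latest \
  --cache-to type=inline \
  --tag registry.example.com/app:latest \
  --push \
  .
```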
Proposal: use the entire registry as the cache. Anything that has previously been executed should be available to produce a cache hit when building images on different machines. The entire repository should be a source, not only a specific image.
Compression
Layers are compressed by default, and gzip is slow. Networks are faster than compression algorithms. Even gzip -1, configured for the fastest and least amount of compression, is slower than the network.
Proposal: stop compressing layers by default.
There is some merit in compressing for transfers where the compression algorithm is faster than the network, such as from a backend registry to a consumer connection. But entirely within cloud infrastructure, it's not worth slowing down the transfers to save a little bit of money on storage costs.
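BuildKit's image exporter already accepts a compression option, so a build can opt out of gzip today; a sketch, with a placeholder image name:

```sh
docker buildx build \
  --output type=image,name=registry.example.com/app:latest,push=true,compression=uncompressed \
  .
```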
Syntax
This doesn't affect performance, but to throw in one more nit –
Dockerfiles contain single-line RUN statements, necessitating chaining commands together with && and placing a \ before each newline. It'd be much nicer to be able to write multiline scripts, configuring the shell wrapper if necessary.
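For illustration, the kind of chaining this forces (a contrived example):

```dockerfile
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl ca-certificates && \
    rm -rf /var/lib/apt/lists/*
```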
Prototype
We know these proposals result in substantially faster builds with a much more elegant developer experience, because we built a prototype on the RWX runtime.
If you're interested in early access, reach out to Dan Manges via email, DM on 𝕏 or Bluesky.