The Paradigm Shift of CI/CD as a DAG of Tasks
by Dan Manges

Most CI/CD platforms run jobs as scripts on ephemeral VMs. This is a fairly simple model that can support any use case: “Here’s a VM, run whatever you want on it.” It’s also a primitive model that limits advanced capabilities. When the lowest-level execution primitive in a CI workflow is a VM, there is little room to optimize performance, reliability, or the developer experience around retries and task definitions.

Our engineering team at RWX spent the past year designing and building Mint, a CI/CD platform. We’ve described it as paradigm-shifting because Mint’s execution primitive is fundamentally different from that of other CI/CD platforms. Mint defines workflows as a directed acyclic graph (DAG) of tasks, rather than as scripts on VMs. This difference is the key to Mint’s unmatched performance and developer experience.

Traditional CI with scripts on VMs

Consider a simple CI setup for a project that uses both Ruby and JavaScript and has five parallel CI jobs. In traditional CI systems, the definition and execution might look like this.

Each parallel job must redefine and re-execute the same setup steps as the other jobs.

CI with a DAG of Tasks

With a DAG, the definition and execution looks like this.

Each task is defined and executed only once, and its result can be easily composed into subsequent tasks.

Performance

In the traditional CI approach, the setup steps are duplicated for each job. Sometimes the setup steps are fast enough that it doesn’t matter. That’s often the case until workflows evolve, setup gets more complicated, and performance gets worse. Overall, it’s inefficient to repeatedly execute the same setup steps for each parallel job. The diagram above illustrates only a handful of jobs, but CI workflows for bigger projects can easily have dozens of jobs executing the exact same setup.

With a DAG, each setup step is only executed once, and the result of that execution step can be reused and shared among all downstream tasks. You only need to execute the installation or configuration of a tool once. In combination with using semi-persistent infrastructure, eliminating the duplicated setup makes Mint much faster than traditional CI/CD.
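To make the duplication concrete, here is a minimal sketch that counts setup executions under each model. The job and step names are hypothetical, loosely modeled on the Ruby and JavaScript example above, and the counting is purely illustrative rather than a description of how Mint schedules work.

```python
# Hypothetical jobs and setup steps, loosely modeled on the article's example.
SETUP_FOR_JOB = {
    "ruby-tests": ["clone code", "install ruby", "bundle install"],
    "ruby-linter": ["clone code", "install ruby", "bundle install"],
    "js-tests": ["clone code", "install node", "npm install"],
    "js-linter": ["clone code", "install node", "npm install"],
    "js-build": ["clone code", "install node", "npm install"],
}


def setup_executions_scripts_on_vms(jobs):
    """Every job runs its own copy of each setup step."""
    return sum(len(steps) for steps in jobs.values())


def setup_executions_dag(jobs):
    """Each distinct setup step runs once and its result is shared."""
    return len({step for steps in jobs.values() for step in steps})


vm_count = setup_executions_scripts_on_vms(SETUP_FOR_JOB)
dag_count = setup_executions_dag(SETUP_FOR_JOB)
print(f"scripts on VMs: {vm_count} setup executions, DAG: {dag_count}")
```

With these five hypothetical jobs, the scripts-on-VMs model runs 15 setup steps while the DAG runs 5, and the gap widens with every job that shares the same setup.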

Resource Provisioning

The lowest level of definition also affects resource provisioning. When a series of steps runs on a VM, the same machine must execute all of them. This means it’s not possible to give a single step more CPU, for example to parallelize compilation, without provisioning the same amount of CPU for every other step.

With a DAG, you can increase the CPU for an individual step without having to over-provision the resources used for other tasks in the DAG.
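A rough way to see the cost of over-provisioning is to compare CPU-minutes under each model. The step names, durations, and CPU counts below are made-up illustrative numbers, not measurements, and the sketch assumes durations stay the same in both models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    minutes: int  # hypothetical wall-clock duration
    cpus: int     # CPUs the step can actually make use of


# Illustrative numbers only; not measurements of any real workflow.
STEPS = [
    Step("install dependencies", minutes=2, cpus=2),
    Step("compile", minutes=4, cpus=16),
    Step("run tests", minutes=6, cpus=2),
]


def cpu_minutes_single_vm(steps):
    """One VM runs every step, so it must be sized for the hungriest step."""
    vm_cpus = max(step.cpus for step in steps)
    return vm_cpus * sum(step.minutes for step in steps)


def cpu_minutes_per_task(steps):
    """Each task is provisioned only with the CPUs it needs."""
    return sum(step.cpus * step.minutes for step in steps)


single_vm = cpu_minutes_single_vm(STEPS)
per_task = cpu_minutes_per_task(STEPS)
print(f"single VM: {single_vm} CPU-minutes, per task: {per_task} CPU-minutes")
```

Under these assumptions, sizing one VM for the compile step costs 192 CPU-minutes, while sizing each task individually costs 80.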

Parallelization

With scripts on a VM, steps generally execute serially. Sometimes there are ways to run them in parallel, such as running a background process in bash. Unfortunately, this usually affects log output and can make debugging more difficult.

With a DAG, tasks can run with optimal parallelization. Multiple setup steps can run in parallel with their outputs getting merged together for subsequent steps.
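As a sketch of how a scheduler can derive parallelism from the graph alone, the example below groups tasks into "waves" using Python's standard-library `graphlib`. The task names and dependencies are hypothetical, and grouping into waves is a simplification: a real scheduler would likely start each task as soon as its own dependencies finish. This is not Mint's implementation.

```python
from graphlib import TopologicalSorter

# Maps each task to the tasks it depends on. All names are hypothetical.
DAG = {
    "code": set(),
    "system-packages": set(),
    "ruby": {"system-packages"},
    "node": {"system-packages"},
    "gems": {"code", "ruby"},
    "npm-packages": {"code", "node"},
    "ruby-tests": {"gems"},
    "js-tests": {"npm-packages"},
}


def parallel_waves(dag):
    """Group tasks into waves; every task in a wave can run concurrently."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorter.get_ready()  # tasks whose dependencies are all done
        waves.append(sorted(ready))
        sorter.done(*ready)
    return waves


WAVES = parallel_waves(DAG)
for number, wave in enumerate(WAVES, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Eight tasks collapse into four waves here, and nobody had to hand-write background processes or untangle interleaved logs to get there.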

Definition

In addition to the performance benefits, defining tasks as a DAG results in more elegant composition and definition.

In traditional CI systems, the example above looks like this in pseudo-code:

ruby-tests:
  - clone code
  - install ruby
  - bundle install
  - run tests

ruby-linter:
  - clone code
  - install ruby
  - bundle install
  - run linter

Sometimes those setup steps can be extracted, but there’s often tension between the CI definition interface and mechanisms for reuse unless you drop down to using code generation for building CI workflows.

With a DAG, by contrast, reuse of setup across parallel jobs is simple and straightforward:

tasks:
  - key: ruby-tests
    use: [system-packages, ruby, gems]
    run: test command here

  - key: ruby-linter
    use: [system-packages, ruby, gems]
    run: linter command here
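Because the definition is just data with `key` and `use` references, it can be checked mechanically. The sketch below mirrors the structure above as Python dictionaries and validates it. The upstream task definitions (`system-packages`, `ruby`, `gems`) and their `use` lists are assumptions added for illustration, and the `validate` helper is hypothetical, not part of Mint.

```python
from graphlib import CycleError, TopologicalSorter

# Mirrors the task definition above. The upstream tasks and their `use`
# lists are assumptions added for illustration.
TASKS = [
    {"key": "system-packages", "use": []},
    {"key": "ruby", "use": ["system-packages"]},
    {"key": "gems", "use": ["system-packages", "ruby"]},
    {"key": "ruby-tests", "use": ["system-packages", "ruby", "gems"]},
    {"key": "ruby-linter", "use": ["system-packages", "ruby", "gems"]},
]


def validate(tasks):
    """Return a list of problems; an empty list means the DAG is valid."""
    keys = {task["key"] for task in tasks}
    problems = [
        f"{task['key']} uses unknown task {dep}"
        for task in tasks
        for dep in task["use"]
        if dep not in keys
    ]
    try:
        # prepare() raises CycleError if the dependencies are not acyclic.
        TopologicalSorter({t["key"]: set(t["use"]) for t in tasks}).prepare()
    except CycleError:
        problems.append("task definitions contain a cycle")
    return problems


print(validate(TASKS))
```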

Retries

The retry experience is also much better with a DAG than with VMs.

When the lowest primitive of execution is the entirety of the setup and scripts that run on a VM, any failure requires retrying the entire execution.

With finer granularity in task definition, retries are also as granular as possible. Mint supports retrying individual tasks, and it will also correctly and automatically retry any downstream tasks in the DAG. Mint can even do this while a run is still in progress with other tasks in the DAG still executing.
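One way to picture granular retries is to compute which tasks have to run again when a single task is retried: the task itself plus everything downstream of it. The sketch below does that with a breadth-first walk over a hypothetical graph. It illustrates the idea only and makes no claim about how Mint implements retries.

```python
from collections import deque

# Maps each task to the tasks it depends on. All names are hypothetical.
DAG = {
    "code": set(),
    "ruby": set(),
    "node": set(),
    "gems": {"code", "ruby"},
    "npm-packages": {"code", "node"},
    "ruby-tests": {"gems"},
    "ruby-linter": {"gems"},
    "js-tests": {"npm-packages"},
}


def tasks_to_rerun(dag, retried):
    """Return the retried task plus every task that transitively depends on it."""
    # Invert the graph: for each task, which tasks consume its output?
    dependents = {task: set() for task in dag}
    for task, deps in dag.items():
        for dep in deps:
            dependents[dep].add(task)

    to_rerun = {retried}
    queue = deque([retried])
    while queue:
        current = queue.popleft()
        for child in dependents[current]:
            if child not in to_rerun:
                to_rerun.add(child)
                queue.append(child)
    return to_rerun


print(sorted(tasks_to_rerun(DAG, "gems")))
```

Retrying `gems` in this graph touches only `gems`, `ruby-linter`, and `ruby-tests`. The JavaScript side of the workflow is left alone.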

Caching

As with retries, the granularity of task definition sets the granularity of other capabilities, such as caching.

Content-based caching (demo video) is the most impactful feature of Mint. It makes the power of Bazel available with the simplicity of GitHub Actions.

Although most CI systems have some sort of caching primitive, utilizing it is a manual effort, requiring caution around cache key definition.

With Mint, content-based caching happens automatically. Running the same commands against the same source files as a previous execution results in a cache hit.

Content-based caching is only viable with more granularity around task definition and atomic task execution. In traditional CI, where the lowest level primitive is an entire set of scripts running on a VM, it’s not possible to implement features like this.
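The general idea behind content-based caching can be sketched in a few lines: derive the cache key from the command and the contents of its input files, so nobody has to define a key by hand. Everything below (`cache_key`, `TaskCache`, the file contents) is hypothetical and deliberately simplified. A real system would also need to account for things like the environment and the outputs of upstream tasks, and this is not Mint's implementation.

```python
import hashlib


def cache_key(command, files):
    """Hash the command together with each input file's path and contents."""
    digest = hashlib.sha256()
    digest.update(command.encode())
    for path in sorted(files):  # sort so the key does not depend on dict order
        digest.update(b"\0" + path.encode() + b"\0")
        digest.update(hashlib.sha256(files[path]).digest())
    return digest.hexdigest()


class TaskCache:
    """In-memory stand-in for a cache of task results (illustration only)."""

    def __init__(self):
        self._results = {}
        self.executions = 0

    def run(self, command, files, execute):
        key = cache_key(command, files)
        if key not in self._results:  # cache miss: actually execute the task
            self.executions += 1
            self._results[key] = execute()
        return self._results[key]


cache = TaskCache()
lockfile_v1 = {"Gemfile.lock": b"rails (7.1.0)\n"}
lockfile_v2 = {"Gemfile.lock": b"rails (7.1.1)\n"}

cache.run("bundle install", lockfile_v1, lambda: "gems installed")  # miss
cache.run("bundle install", lockfile_v1, lambda: "gems installed")  # hit
cache.run("bundle install", lockfile_v2, lambda: "gems installed")  # miss
print(cache.executions)
```

The second call is a cache hit because neither the command nor the file contents changed. Editing the lockfile changes the key, so the task runs again. This only works because the task is small and atomic enough for its inputs to be enumerated, which is the point of the paragraph above.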

Mint

Mint powers the fastest builds and has the best developer experience in CI/CD. Learn more about what makes Mint different, check out a 3 minute demo video on Mint's DAG, or book a demo if you want to see it firsthand and chat about giving your engineering team the best experience in CI/CD.
