Optimize Docker image builds in GitHub Actions
Building Docker images on every deployment can be time-consuming, but there are several ways to speed up the process.
The Setup
At my current job we have a CI/CD pipeline that builds a Docker image that is later deployed to replace the current version in production. It is a multi-step process (workflows, in GH parlance) that used to consist of:
- Run CI workflow (~50 minutes)
  - Make sure the application is installable (using a cache to optimize dependency downloads)
  - Unit tests
  - Integration tests (Selenium)
- Run CD workflow (~30 minutes)
  - Build image: some dependencies need to be compiled (~25 minutes)
  - Deploy to production
The second workflow is run automatically by GH Actions when the first one completes (i.e., `on: workflow_run` in the workflow spec), so we would have to wait about an hour and a half before actually seeing our changes in production.
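For reference, a `workflow_run` trigger looks roughly like this; the workflow names here are illustrative, not our actual ones:

```yaml
# cd.yml -- run only after the CI workflow has finished
name: CD
on:
  workflow_run:
    workflows: ["CI"]     # name of the workflow that has to complete first
    types: [completed]
```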
The Refactor
Since we use Docker for the build process, there’s already a tried-and-tested caching layer that I knew we were not taking advantage of. After ducking around a bit, I arrived at Docker’s GitHub Actions cache documentation, which outlines the (currently experimental) integration with Actions’ cache. To optimize image caching I decided to split the image into two:
- base: installs system dependencies, i.e., runtimes and dependencies that need to be compiled and don’t change as often (e.g., the HTTP server).
- application: installs dependencies that can simply be downloaded or that change more often.
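As a rough sketch of how the split plays together with the Actions cache backend (the image names, registry, action versions, and cache scopes below are my own assumptions, not our exact configuration):

```yaml
jobs:
  build-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write                       # needed to push to ghcr.io with GITHUB_TOKEN
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build base image
        uses: docker/build-push-action@v5
        with:
          context: .
          file: Dockerfile.base
          push: true                        # published so the app image can use it as its FROM
          tags: ghcr.io/example/myapp-base:latest
          cache-from: type=gha,scope=base   # separate scopes keep the two images' caches apart
          cache-to: type=gha,scope=base,mode=max
      - name: Build application image
        uses: docker/build-push-action@v5
        with:
          context: .
          file: Dockerfile.app
          push: true
          tags: ghcr.io/example/myapp:latest
          cache-from: type=gha,scope=app
          cache-to: type=gha,scope=app,mode=max
```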
After testing locally, this reduced the build process from 20 to ~5 minutes, but when merged to the main branch and run on the GH Actions servers, the cache didn’t work as expected.
This is because the GH Actions cache was being created on the feature branch but, after merging, it would not be picked up by the main branch’s build process. There were multiple ways to solve this, but I decided to restructure the pipeline to run some steps in parallel:
- Run CI workflow (~50 minutes): all steps in parallel
  - Build image (only for the main branch)
  - Unit tests
  - Integration tests
- Run CD workflow (~5 minutes)
  - Deploy to production
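A minimal sketch of that layout; the job names, test commands, and Ruby setup are illustrative assumptions:

```yaml
# ci.yml -- jobs with no `needs:` dependency between them run in parallel
name: CI
on: [push]
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true               # install and cache gems
      - run: bundle exec rake test          # assumed unit test task
  integration-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true
      - run: bundle exec rake test:system   # assumed Selenium-backed system tests
  build-image:
    if: github.ref == 'refs/heads/main'     # build (and cache) the image only on main
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      # ...base and application image builds as sketched earlier
```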
With the new layout the CI process still takes a long time (due to the Selenium tests), but the whole pipeline runtime was reduced by about 30 minutes.
The layout above doesn’t show that the “Build image” step actually builds both the base image and the application image. This can be improved further by NOT building the base image (i.e., going straight to the cache) unless specific files changed; in our case, we only want to build the base image if the base Dockerfile changes. To achieve this, I used the dorny/paths-filter action to detect changes to specific files, further reducing the work needed to build the application image.
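A rough sketch of that conditional using dorny/paths-filter; the filter name and the surrounding build steps are illustrative:

```yaml
steps:
  - uses: actions/checkout@v4
  - uses: dorny/paths-filter@v3
    id: changes
    with:
      filters: |
        base:
          - 'Dockerfile.base'
  - name: Build base image
    if: steps.changes.outputs.base == 'true'   # skip when Dockerfile.base is untouched
    uses: docker/build-push-action@v5
    with:
      context: .
      file: Dockerfile.base
      cache-from: type=gha,scope=base
      cache-to: type=gha,scope=base,mode=max
  # ...application image build follows unconditionally
```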
The Rails
This setup was conceived specifically for a Ruby on Rails application with several gems that require compiling native extensions, a redundant step for dependencies like puma or sassc that rarely change.
So at the end of `Dockerfile.base`, after all “OS” dependencies have been installed, we install these gems individually so that they become CACHED layers for the image build process, e.g.:
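A sketch of what those individually installed gems can look like at the tail of `Dockerfile.base` (the version pins are placeholders; in practice they’d match the Gemfile.lock):

```dockerfile
# Each RUN becomes its own layer, so rebuilds reuse these cached steps
# as long as the pinned gem versions don't change.
RUN gem install puma -v 6.4.2
RUN gem install sassc -v 2.4.0
```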
As a rule of thumb (which I break when my setup grows), I DO NOT include ADD or COPY statements in my `Dockerfile.base` file to avoid unintentional layer cache expiration; I add those to the `Dockerfile.app` image, which usually copies the application code into the image.
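For illustration, a minimal `Dockerfile.app` along those lines; the base image reference and commands are assumptions:

```dockerfile
# Dockerfile.app -- builds on top of the prebuilt base image (placeholder name)
FROM ghcr.io/example/myapp-base:latest

WORKDIR /app

# Gemfiles first: this layer is reused while only application code changes
COPY Gemfile Gemfile.lock ./
RUN bundle install

# Application code last, so cache expiration stays confined to these layers
COPY . .

CMD ["bundle", "exec", "puma"]
```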
The CI Workflow
After all was said and done, the CI workflow looked something like this: