The magical world of containers — Dockerfile best practices

Gabriele de Capoa
Aug 18, 2022


Photo by Guillaume Bolduc on Unsplash

As anyone in my LinkedIn network knows, last year I became a Certified Kubernetes Application Developer. After about four years of working actively with Kubernetes day by day, this certification was a great recognition of my studies.
This is why I decided to start a new series of posts describing what I've learned about containers and their use cases. The topics will be:

* Basic concepts
* Use cases
* Unix background
  * Namespaces
  * Control Groups
  * chroot
* Implementations
  * Docker
  * containerd
* Docker commands
* Orchestration and Docker Swarm
* Kubernetes
  * Architecture
  * Objects
  * CLI
* Cloud Foundry
  * Architecture
  * Application development
  * CLI
* OpenShift
  * Architecture
  * Objects
  * CLI

Before starting, I'd like to add a caveat: this is what I have understood from studying books and working on the job, but it could include misunderstandings. Please use these posts just as a starting point to deepen your knowledge and build your own learning roadmap (and feel free to point out any misunderstandings to me)!

In the last story we saw how the Docker build process works and described the Dockerfile instructions. In this story we will describe some best practices for writing a production-ready Dockerfile.

Order matters for caching

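A Dockerfile describes, as an ordered list of instructions, the actions the Docker engine executes to build a new image. Each instruction produces an intermediate layer, and the build cache reuses a layer only if that instruction and every instruction before it are unchanged. So instruction order matters: once a layer is invalidated, all the following layers are rebuilt too, which is why instructions that change rarely should come before those that change often.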
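A minimal sketch of this ordering, assuming a hypothetical Python service whose dependencies are listed in requirements.txt and whose code lives in src/ (both names are assumptions, not from the original). The rarely changing dependency installation comes first, and the frequently changing source code is copied last.

```dockerfile
# Hypothetical Python service: requirements.txt and src/main.py are assumptions.
FROM python:3.11-slim
WORKDIR /app

# Rarely changing steps first: copy only the dependency manifest and install it,
# so this layer is reused from the cache when only the source code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Frequently changing step last: editing the code invalidates only this layer
# and the ones after it.
COPY src/ ./src/
CMD ["python", "src/main.py"]
```

If the two COPY instructions were merged into a single COPY of the whole project placed before the pip install, every code change would also re-run the dependency installation.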

More specific COPY to limit cache busts

Cache busts are something a developer should always try to avoid. The cache identifies each COPY layer by a checksum of the copied content. If a COPY instruction copies the whole project directory instead of only the directory containing the source code, a change to any file in the project (documentation, tests, and so on) busts the cache for that layer and for every layer after it. In practice, almost every build then restarts from that point.
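The fragment below contrasts a broad COPY with a more specific one. It is a sketch that assumes, hypothetically, that the application code lives in a src/ directory.

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# Too broad: copies the whole build context (docs, tests, .git, ...), so a change
# to any of those files busts the cache for this layer and every later one.
# COPY . .

# More specific: only changes under src/ (a hypothetical source directory)
# invalidate this layer.
COPY src/ ./src/
```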

Identify cacheable units such as apt-get update and apt-get install

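Sometimes you need to update the packages of your base image or install additional software your application needs. These operations should be executed right after the FROM instruction, before copying the source code, so that their layer is cached and not rebuilt every time the code changes. Moreover, apt-get update and apt-get install should run together in a single RUN instruction because they form one cacheable unit. If apt-get update sat in its own layer, Docker could reuse that stale cached layer while installing a modified package list.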
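A minimal sketch of such a cacheable unit on a Debian-based image. The curl package and the src/ path are only illustrative assumptions.

```dockerfile
FROM debian:bookworm-slim

# One cacheable unit, placed right after FROM: refresh the package index and
# install in the same RUN, so changing the package list also re-runs the update
# instead of reusing a stale, cached index. curl is just an example package.
RUN apt-get update && apt-get install -y curl

# Frequently changing application files come afterwards (hypothetical path).
COPY src/ /app/src/
```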

Remove unnecessary dependencies

To reduce complexity (and image size), avoid installing, or remove, any dependency that your production application does not use.
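One common way to apply this is sketched below. It assumes, hypothetically, that a compiler is needed only to build native Python extensions listed in requirements.txt. The build tools are installed and purged in the same RUN, so they never end up in any layer of the final image, and --no-install-recommends skips optional packages.

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .

# gcc and libc6-dev are assumed to be needed only at build time, to compile
# native extensions while installing requirements. Installing and purging them
# in the same RUN keeps them out of every layer of the image.
RUN apt-get update \
    && apt-get install -y --no-install-recommends gcc libc6-dev \
    && pip install --no-cache-dir -r requirements.txt \
    && apt-get purge -y --auto-remove gcc libc6-dev \
    && rm -rf /var/lib/apt/lists/*
```

If the purge were done in a later RUN instruction instead, the compiler would still be stored in the earlier layer and the image would not get smaller.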

Remove package manager cache

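As said before, container images should be as small as possible. Removing the OS package manager cache (for example, the package lists that apt-get update downloads into /var/lib/apt/lists) helps reduce their size. The cleanup must happen in the same RUN instruction that created those files, because deleting them in a later instruction does not shrink the layer that already contains them.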
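A minimal sketch for a Debian-based image, with curl as an example package:

```dockerfile
FROM debian:bookworm-slim

# Delete apt's downloaded package lists in the same RUN that created them;
# a separate "RUN rm -rf /var/lib/apt/lists/*" would not shrink the earlier layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
```

Other package managers have equivalent options, such as apk add --no-cache on Alpine or pip install --no-cache-dir for Python packages.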

Use official images when possible

Since anyone can build an image and share it with the community, there is some chance of choosing a malicious base image. For enterprise production applications it's preferable to use an official base image, such as the images released by operating system vendors.
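On Docker Hub, Docker Official Images are referenced without a user namespace (for example ubuntu rather than someuser/ubuntu). The fragment below contrasts the two cases. The unofficial image name is purely hypothetical.

```dockerfile
# Avoid: an image from an unknown publisher (hypothetical name).
# FROM randomuser/ubuntu-with-tools:latest

# Prefer: an official image, here Ubuntu from Docker Official Images.
FROM ubuntu:22.04
```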

Use more specific tags

As we know, container images can be versioned using tags. Using a specific tag for each version helps you identify the right one. For example, you could use the semantic versioning pattern for your tags and apply the same tag in your Git repository.
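The same idea applies to the base images referenced in FROM. The fragment below is a sketch. A bare image name resolves to the latest tag, which points to different images over time, whereas a more specific tag pins the version, the variant and the underlying OS release.

```dockerfile
# Vague: no tag means "latest", which points to a different image over time.
# FROM python

# Specific: Python 3.11, "slim" variant, Debian bookworm release.
FROM python:3.11-slim-bookworm
```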

Look for minimal flavors

Less is more, in the software world too. Using a base image with a minimal operating system will not only reduce the size of your image but also reduce its attack surface.
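As a sketch, switching from a full distribution image to a minimal one can be a one-line change. The tags below are only examples.

```dockerfile
# Full distribution image: ships tools you probably don't need at runtime.
# FROM ubuntu:22.04

# Minimal flavor: Alpine Linux, whose base image is only a few MB.
FROM alpine:3.19
```

Keep in mind that Alpine uses musl libc instead of glibc, so some prebuilt binaries may not work on it. In those cases a "slim" variant of an official image can be a good compromise.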

Speaking of best practices, we should also take a look at security.

In general, the image build process can be vulnerable to supply-chain attacks. In such attacks, a malicious user injects code or binaries into some dependency from a trusted source, which then gets built into your application. Because of the risk of such attacks, it is critical to base your images only on well-known and trusted image providers. Alternatively, you can build all your images from scratch, that is, starting from the empty scratch base image. Building from scratch is easy for languages that can produce static binaries (e.g. Go), but it is more complicated for interpreted languages like Python, JavaScript or Ruby.
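For Go, a common approach is a multi-stage build, sketched below. It assumes a hypothetical Go module with its go.mod and .go files at the repository root and no external dependencies. The first stage compiles a static binary, and the final image, based on the empty scratch image, contains only that binary.

```dockerfile
# Build stage: compile the (hypothetical) Go program into a static binary.
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod ./
COPY *.go ./
# CGO_ENABLED=0 disables cgo, so the binary does not link against libc.
RUN CGO_ENABLED=0 go build -o app .

# Final stage: "scratch" is the empty image; only the binary is copied in.
FROM scratch
COPY --from=build /src/app /app
ENTRYPOINT ["/app"]
```

Since scratch contains nothing else, files the program may need at runtime, such as CA certificates for TLS connections, must be copied in explicitly.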

The other best practice for images regards naming.
Though the version (tag) of a container image in an image registry is theoretically mutable, you should treat it as immutable.
In particular, combining the semantic version with the SHA hash of the commit the image was built from is a good practice for naming images (e.g. v1.0.1-bfeda01f).
If you don't specify an image version, latest is used by default.
While this can be convenient in development, it is a bad idea for production usage, since latest is mutated every time a new image is built without an explicit tag.
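The tag itself is assigned at build time. Optionally, the same version and commit can also be recorded inside the image. The sketch below assumes two hypothetical build arguments, VERSION and GIT_COMMIT, and stores them using the standard OCI label keys. The image name myapp is also hypothetical.

```dockerfile
FROM alpine:3.19

# Hypothetical build arguments carrying the semantic version and the Git commit,
# passed at build time, for example:
#   docker build --build-arg VERSION=1.0.1 --build-arg GIT_COMMIT=bfeda01f -t myapp:v1.0.1-bfeda01f .
ARG VERSION
ARG GIT_COMMIT

# Record the same values inside the image using the OCI annotation keys.
LABEL org.opencontainers.image.version="${VERSION}" \
      org.opencontainers.image.revision="${GIT_COMMIT}"
```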



Gabriele de Capoa

Cloud software engineer, wanna-be data scientist, former Scrum Master. Agile, DevOps, Kubernetes and SQL are my top topics.