Preface
I’ve been using Docker Containers for many years, but I’ve always treated them as magical black boxes. I know that Docker uses container runtimes like runc (by default) to create isolated environments to run code. However, I don’t know what “isolated” really means. To unveil the black box, I decided to implement a toy container runtime from scratch in Rust.
Luckily, there are a ton of tutorials and resources online that I can learn from. My implementation is largely based on these two blogs in particular: Linux Containers in 500 Lines of Code & Writing a Container in Rust. As someone who knew very little about Linux, the experience of building a container is extremely eye-opening and rewarding.
Here is a summary of what we will build:
- root filesystem isolation with mount namespace
- resource restriction with cgroups
- limit syscalls with seccomp
- isolate user IDs and group IDs with user namespace and uid mapping
- privilege control with capabilities
In this blog series, I will cover the theory behind and the implementation of a container from the perspective of someone new to Linux. I will also provide as many demos as possible to demonstrate how the Linux primitives that make up a container work.
The full source code is available here.