Running containers with systemd-nspawn
I recently discovered that apart from running services, scheduling timers, configuring your network interfaces, resolving names and a lot more, you can also run containers with systemd using systemd-nspawn. This was completely new to me, so I decided to take a deeper look into the necessary steps to get this up and running. First I looked into how to build images suitable for systemd-nspawn and then at the different ways to run and manage containers with the help of builtin tools. After reading this post you will hopefully also have a rough understanding of how this works and be able to run simple workloads in containers yourself, using only systemd.
Prerequisites
To use systemd-nspawn, you can install it on Debian-based distributions via:
apt install systemd-container
On Arch-based distributions it already comes pre-packaged with systemd. You also need to enable and start systemd-networkd.service and systemd-resolved.service so networking and name resolution work inside the containers.
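On the host, this can be done with the following commands (the --now flag also starts the units immediately):
systemctl enable --now systemd-networkd.service
systemctl enable --now systemd-resolved.service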
Building an image
There are several ways in which you can build an image for use with systemd-nspawn, depending on the distribution you want to use. In this post I am going to use Debian, but you can of course use any distribution you like. You just have to make sure it contains a valid /etc/os-release file. It is also helpful, but not necessary, if the distribution uses systemd as its init system as well. systemd-nspawn will also run any other init system it finds inside the filesystem tree.
The default way to build images for Debian is to use debootstrap. Creating a minimal image based on the latest stable version, Buster, can be done by executing:
debootstrap --include=systemd-container stable /var/lib/machines/Buster
This creates a filesystem tree inside the /var/lib/machines/Buster directory which you can use with systemd-nspawn. To make everything work completely, you have to perform some post-installation steps.
# systemd-nspawn -D /var/lib/machines/Buster
Spawning container Buster on /var/lib/machines/Buster.
Press ^] three times within 1s to kill container.
root@Buster:~# systemctl enable systemd-networkd
root@Buster:~# systemctl enable systemd-resolved
root@Buster:~# ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
root@Buster:~# mkdir /etc/systemd/resolved.conf.d
root@Buster:~# echo "[Resolve]" > /etc/systemd/resolved.conf.d/dns.conf
root@Buster:~# echo "DNS=1.1.1.1 8.8.8.8" >> /etc/systemd/resolved.conf.d/dns.conf
root@Buster:~# echo "pts/0" >> /etc/securetty
root@Buster:~# exit
logout
Container Buster exited successfully.
Let’s go through this step by step. I first spawned a shell inside the newly created image. Then I needed to enable systemd-networkd and systemd-resolved in order to get networking and name resolution inside the container working properly. For this I also linked the stub-resolv.conf generated by systemd-resolved to /etc/resolv.conf and configured the DNS servers which systemd-resolved will use; otherwise the running container cannot resolve anything. The DNS servers are configured inside /etc/systemd/resolved.conf.d/dns.conf. As a last step, I added pts/0 to /etc/securetty to enable root logins.
To make this process more automated, you can also use mkosi. It is a wrapper around debootstrap, pacstrap and zypper to create minimal, legacy-free OS images. To install mkosi, clone the git repo somewhere, open a shell in it and simply run the install script.
git clone https://github.com/systemd/mkosi.git
cd mkosi
sudo python3 setup.py install
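You can verify the installation afterwards (assuming the install script put mkosi on your PATH):
mkosi --version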
For detailed information about how to use it, see the man page. Now create an empty directory somewhere and add the following files to it:
# tree
.
├── mkosi.default
└── mkosi.postinst
# cat mkosi.default
[Distribution]
Distribution=debian
Release=buster
[Output]
Format=directory
Bootable=no
Hostname=buster
Output=/var/lib/machines/buster
[Validation]
Password=root
[Packages]
Packages=
iputils-ping
systemd-container
iproute2
# cat mkosi.postinst
#!/bin/sh
# make sure systemd-networkd and systemd-resolved are running
systemctl enable systemd-networkd
systemctl enable systemd-resolved
# make sure we symlink /run/systemd/resolve/stub-resolv.conf to /etc/resolv.conf
# otherwise curl will fail
ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
# Configure global DNS servers
mkdir /etc/systemd/resolved.conf.d
echo "[Resolve]" > /etc/systemd/resolved.conf.d/dns.conf
echo "DNS=1.1.1.1 8.8.8.8" >> /etc/systemd/resolved.conf.d/dns.conf
# set pts/0 in /etc/securetty to enable root login
echo "pts/0" >> /etc/securetty
mkosi will read the mkosi.default file for the settings of the image. According to the file, it will create a directory at /var/lib/machines/buster containing a Debian/Buster filesystem tree, make it not bootable inside a virtual machine and set the host name and root password. It will also install some additional packages. There are actually a lot more options you could use, but I will keep it rather simple for this post. After the image has been created, mkosi will run the mkosi.postinst script inside the image, which performs all of the steps just done by hand when using debootstrap. Make sure to set the executable flag on the file after creating it.
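Putting it together, the build could look like this (a sketch; mkosi reads mkosi.default from the current directory and needs root):
chmod +x mkosi.postinst
sudo mkosi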
The nice thing about mkosi is that you can easily create OS images for a number of different distributions in an automated fashion. Now that we have created an image, it is time to run it.
Running the image
systemd-nspawn can either be invoked via the command line or run as a system service in the background. In the service mode, each container runs as its own service instance using the provided systemd-nspawn@ unit template. I will first look at how to invoke it via the command line to get a better understanding of how it works, and then I will use the provided unit template for a more automated approach.
There are actually three different ways you can run an image with systemd-nspawn, which all work slightly differently. The default way is to boot the image using its init system, just like you would boot a VM. It is important to note here that systemd-nspawn does not boot a kernel and doesn't start a VM. Using the boot mode will provide you with an OS container that is running multiple processes as well as its own init system. You can compare this mode of operation to LXC containers or BSD jails. To use it, the --boot or -b flag needs to be passed when invoking it. This is the default mode of operation when using the systemd-nspawn@ unit template.
systemd-nspawn --boot -D /var/lib/machines/buster
The command above will boot the image and present you with a login shell. If you followed the steps above to build the image, you can now log in with the root user and password root to look around a bit. You will notice that the container shows the same interface names and IP addresses as your host because network separation was not enabled. Any network service started in this container, or port that it exposes, will be directly available on the IPs of the host.
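You can verify this from inside the container, for example with iproute2 (included in the mkosi.default above); it will list the same interfaces and addresses as on the host:
ip addr show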
Instead of full-fledged OS containers, you can also start something more similar to an application container, which you might know from Docker or rkt. You can either start an application directly as PID 1 by passing no extra flag at all, or run a stub init process which will then start the application, by passing --as-pid2. Note that not all applications are suited to run as PID 1, since they have to meet a few special requirements that the PID 1 process has. For example, they need to reap all processes they spawn and also implement sysvinit-compatible signal handling. Shells are generally able to satisfy these requirements, but for all other applications it is recommended to use the --as-pid2 switch.
To start a shell inside the created image running as PID 2, run the following command:
systemd-nspawn -a -D /var/lib/machines/buster /bin/bash
A big caveat in this mode of operation is that name resolution does not seem to work properly (at least I could not get it working). If this is an issue for the application you want to run, I would recommend using the boot mode. The man page has a nice comparison of the three modes:
Switch | Explanation
---|---
Neither --as-pid2 nor --boot specified | The passed parameters are interpreted as the command line, which is executed as PID 1 in the container.
--as-pid2 specified | The passed parameters are interpreted as the command line, which is executed as PID 2 in the container. A stub init process is run as PID 1.
--boot specified | An init program is automatically searched for and run as PID 1 in the container. The passed parameters are used as invocation parameters for this process.
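To illustrate the three modes with the image built above:
# no flag: /bin/bash runs as PID 1
systemd-nspawn -D /var/lib/machines/buster /bin/bash
# --as-pid2: /bin/bash runs as PID 2 behind a stub init
systemd-nspawn --as-pid2 -D /var/lib/machines/buster /bin/bash
# --boot: the image's own init system is started as PID 1
systemd-nspawn --boot -D /var/lib/machines/buster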
UPDATE:
The issues with name resolution in some containers can be explained by the way systemd-nspawn handles the /etc/resolv.conf file. It is configured by the --resolv-conf command line flag:
If set to “auto” the file is left as it is if private networking is turned on (see –private-network). Otherwise, if systemd-resolved.service is connectible its static resolv.conf file is used, and if not the host’s /etc/resolv.conf file is used. In the latter cases the file is copied if the image is writable, and bind mounted otherwise. […] Defaults to “auto”.
To use the same DNS servers in the container as on the host, set it to either copy-host or bind-host.
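For example, to start a shell as PID 2 while copying the host's DNS configuration into the container:
systemd-nspawn --resolv-conf=copy-host -aD /var/lib/machines/buster /bin/bash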
Networking
In general it can be a good idea to confine the container to a private network, so you don't have to worry about which ports it exposes unless you explicitly forward them. To do this, systemd-nspawn offers a variety of options which differ in complexity. To simply put a container inside its own private /28 subnet, you have to pass the --network-veth or -n option. This will create a virtual ethernet link between the container and the host. Inside the container it will be available as host0, and on the host side it will be named after the container, prefixed with ve-. systemd-networkd comes with a default configuration to set up the virtual interface on the host, and inside the container as well if it is enabled and running on both. It also takes care of setting up DHCP on the link as well as the necessary routing options. A container with private networking can be started like this:
systemd-nspawn -bD /var/lib/machines/Buster -n
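Once the container is running, you can see the host side of the virtual ethernet link with iproute2; in this example it will be named ve-Buster:
ip link show type veth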
Note: If you are also using docker on your system, you have to do some tweaking of iptables rules so the container can communicate with the outside world. docker changes the default behavior of iptables, so you have to allow in- and outgoing traffic on the created virtual interface. Example iptables rules can be found in the paragraph below.
Managing containers
If you want to run containers via systemd-nspawn in a more automated and management-friendly fashion, similar to how you would run docker containers, you can make use of machinectl, which also ships with systemd. It uses the systemd-nspawn@ unit template mentioned above to start containers with sensible default settings. Those are:
ExecStart=/usr/bin/systemd-nspawn --quiet --keep-unit --boot \
--link-journal=try-guest --network-veth -U \
--settings=override --machine=%i
I did not cover all of them so make sure to look them up in the man page :-).
To start a container, you can first have a look at all images that are available. machinectl searches for images stored in /var/lib/machines/, /usr/local/lib/machines/, /usr/lib/machines/ and /var/lib/container/.
# machinectl list-images
NAME TYPE RO USAGE CREATED MODIFIED
buster directory no n/a n/a n/a
1 images listed.
Then you can simply run machinectl start buster and it will invoke the unit template with the given image.
# machinectl
MACHINE CLASS SERVICE OS VERSION ADDRESSES
buster container systemd-nspawn debian 10 192.168.68.28…
1 machines listed.
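If you want the container to be started automatically at boot, machinectl can also enable it, which enables the matching instance of the systemd-nspawn@ unit template:
machinectl enable buster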
You can then use machinectl login or machinectl shell to log in to the running container and do things, or machinectl status to check the processes running inside your container. What is also pretty neat is that you can use journalctl -u systemd-nspawn@buster on your host to see all log output of the container.
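For example, a typical management session on the host could look like this (all commands shipped with systemd):
machinectl shell buster
machinectl status buster
journalctl -u systemd-nspawn@buster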
As mentioned above, if you are also running docker on your system, you have to create a few iptables rules so your container can talk to the outside world when you run it with private networking enabled. The easiest way to do this is to create an override file for the systemd unit template via systemctl edit systemd-nspawn@ and add the following content:
[Service]
ExecStartPre=-/usr/bin/iptables -A FORWARD -o ve-%i -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT ; \
-/usr/bin/iptables -A FORWARD -i ve-%i ! -o ve-%i -j ACCEPT ; \
-/usr/bin/iptables -A FORWARD -i ve-%i -o ve-%i -j ACCEPT
ExecStopPost=-/usr/bin/iptables -D FORWARD -o ve-%i -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT ; \
-/usr/bin/iptables -D FORWARD -i ve-%i ! -o ve-%i -j ACCEPT ; \
-/usr/bin/iptables -D FORWARD -i ve-%i -o ve-%i -j ACCEPT
It will invoke iptables before starting and after stopping the container to add and delete the necessary rules for the container.
Configuration per container
If you want to customize the options a container is started with using machinectl, you can create a .nspawn file next to your image with the same name. On startup it will be parsed by systemd-nspawn and will possibly override the default settings of the unit template. Have a look at the systemd.nspawn man page for the options. To forward port 80 of the buster container to port 8080 on the host, you could create the following buster.nspawn file in /etc/systemd/nspawn. It cannot be put next to the image in this case, since some options are privileged and therefore need to be set inside /etc/systemd/nspawn to be applied. Information about which options are privileged can also be found inside the man page.
[Network]
Port=8080:80
VirtualEthernet=yes
After creating the config file and starting the container again, port 80 of the container will be forwarded to port 8080 on your host. It is important to note that systemd-nspawn will not forward the port to your loopback interface, so it won't be available via 127.0.0.1:8080 or localhost:8080. This caused quite some confusion for me :-)
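To test the forwarding, query one of the host's external addresses instead of localhost (<host-ip> is a placeholder for your host's LAN IP):
curl http://<host-ip>:8080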
That’s it for now. I hope I could give you a small and understandable introduction on how to run containers with the help of systemd-nspawn. I am currently trying to figure out how you could use systemd-nspawn with existing workload orchestrators like HashiCorp Nomad, so stay tuned :-)
Jan