Platforms like Kubernetes, Nomad, or any cloud-hosted platform-as-a-service (PaaS) offer a variety of powerful capabilities. From scaling workloads to secrets management to deployment strategies, these workload orchestrators are optimized to help scale infrastructure in different ways.
But do operators always need to pay the cost of maximal scalability? Sometimes, the costs of complexity and abstraction outweigh their benefits. Many builders instead come to rely on radically simple deployment architectures for ease of management. Two virtual private servers behind a load balancer are a drastically simpler stack to manage than a sprawling cluster of microservices spread across a fleet of container hosts. That simplicity pays dividends: there are fewer moving parts to debug when problems arise and fewer to upgrade when the time comes to maintain them.
The foundation of many modern Linux distributions is systemd, and it ships with a strong set of features that are often comparable to those of container orchestrators or PaaS systems. In this article, we’ll explore how you can leverage the latest systemd features to gain many of the capabilities of those larger systems, without the management headache, and turn an ordinary Linux server into a very capable application platform.
On a single host, writing a systemd .service file is an ideal way to run a managed process. Most of the time, you don’t even need to change the application at all: systemd supports a variety of different kinds of services and can adapt accordingly.
For example, consider this simple .service file that defines how to run a basic web service:
[Unit]
Description=a simple web service
[Service]
ExecStart=/usr/bin/python3 -m http.server 8080
Remember the defaults for systemd services: ExecStart= must be an absolute path, processes should not fork into the background, and you may need to set requisite environment variables with the Environment= option.
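As a sketch of how Environment= fits into a unit like the one above, you could set a variable and reference it with systemd’s variable expansion; the PORT variable here is purely illustrative and not something the http.server module requires:

[Service]
Environment=PORT=8080
ExecStart=/usr/bin/python3 -m http.server ${PORT}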
Once the unit is placed into a file like /etc/systemd/system/webapp.service, you can control the resulting service with systemctl:
systemctl start webapp will start the process.
systemctl status webapp will display whether the service is running, its uptime, and output from stderr and stdout, as well as the process’s ID and other information.
systemctl stop webapp will end the service.
In addition, all output printed to stderr and stdout will be aggregated by journald and accessible via the system journal (with journalctl) or targeted specifically using the --unit flag:
journalctl --unit webapp
Because journald rotates and manages its storage by default, collecting logs via the journal is a good strategy for managing log storage.
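As a quick sketch of slicing those logs in practice, journalctl’s standard filtering and follow flags work on a unit’s output as well (the time window here is just an example):

journalctl --unit webapp --since "1 hour ago"
journalctl --unit webapp -f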
The rest of this article will explore options to enhance a service like this one.
Container orchestrators like Kubernetes support the ability to securely inject Secrets: values drawn from secure datastores and exposed to running workloads. Sensitive data like API keys or passwords require different treatment than environment variables or configuration files to avoid unintentional exposure.
The LoadCredential= systemd option supports loading sensitive values from files on disk and exposing them to running services in a secure way. Like hosted platforms that manage secrets remotely, systemd treats credentials differently from values like environment variables to ensure that they’re kept safe.
To inject a secret into a systemd service, begin by placing a file containing the secret value into a path on the filesystem. For example, to expose an API key to a .service unit, create a file at /etc/credstore/api-key to persist the file across reboots or at /run/credstore/api-key to avoid persisting the file permanently (the path can be arbitrary, but systemd will treat these credstore paths as defaults). In either case, the file should have restricted permissions, applied with a command like chmod 400 /etc/credstore/api-key.
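As a sketch of those steps, assuming the secret is available in a shell variable named API_KEY (a stand-in for wherever the value actually comes from):

# create the default credstore directory and write the secret with tight permissions
mkdir -p /etc/credstore
printf '%s' "$API_KEY" > /etc/credstore/api-key
chmod 400 /etc/credstore/api-key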
Under the [Service] section of the .service file, define the LoadCredential= option and pass it two values separated by a colon (:): the name of the credential and its path. For example, to call our /etc/credstore/api-key file “token,” define the following systemd service option:
LoadCredential=token:/etc/credstore/api-key
When systemd starts your service, the secret is exposed to the running service under a path of the form ${CREDENTIALS_DIRECTORY}/token, where ${CREDENTIALS_DIRECTORY} is an environment variable populated by systemd. Your application code should read in each secret defined this way for use in libraries or code that require secure values like API tokens or passwords. For example, in Python, you can read this secret with code like the following:
from os import environ
from pathlib import Path

# systemd populates CREDENTIALS_DIRECTORY with the per-service credentials path
credentials_dir = Path(environ["CREDENTIALS_DIRECTORY"])

# read the credential named "token" defined with LoadCredential=
with (credentials_dir / "token").open() as f:
    secret = f.read().strip()
You can then use the secret variable, which now holds the contents of your secret, with any libraries that may require an API token or password.
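For instance, here is a hedged sketch of passing the credential as a bearer token using only the standard library; the endpoint URL and header scheme are hypothetical and depend entirely on the API you’re calling:

from urllib.request import Request, urlopen

# hypothetical endpoint; substitute the API your application actually talks to
request = Request(
    "https://api.example.com/v1/status",
    headers={"Authorization": f"Bearer {secret}"},
)
with urlopen(request) as response:
    body = response.read()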
Another capability of orchestrators like Nomad is automatically restarting a workload that has crashed. Whether the failure comes from an unhandled application error or some other cause, restarting failed applications is often the first line of defense when designing an application to be resilient.
The Restart= systemd option controls whether systemd will automatically restart a process that exits. There are several potential values for this option, but for basic services, the on-failure setting satisfies most use cases.
Another setting to consider when configuring auto-restart is the RestartSec= option, which dictates how long systemd will wait before starting the service again. Typically, this value should be customized to avoid restarting failed services in a tight loop and consuming too much CPU time on the restarts themselves. A short delay like 5s is usually sufficient.
Options like RestartSec= that accept durations or time-based values can parse a variety of formats like 5min 10s or 1hour, depending on your needs. Reference the systemd.time manual page for additional information.
Finally, two other options dictate how aggressively systemd will attempt to restart failed units before eventually giving up. StartLimitIntervalSec= and StartLimitBurst= control how often a unit is permitted to start within a given period of time. For example, the following settings:
StartLimitBurst=5
StartLimitIntervalSec=10
will only permit the unit to attempt to start a maximum of five times over a period of 10 seconds. If the configured service attempts to start a sixth time within that 10-second window, systemd will stop attempting to restart the unit and mark it as failed instead.
Combining all of these settings, you might include the following options in your .service unit to configure automatic restarts:
[Unit]
StartLimitBurst=5
StartLimitIntervalSec=10
[Service]
Restart=on-failure
RestartSec=1
This configuration will restart a service after waiting one second if it fails (that is, exits unexpectedly, such as with a nonzero exit code), and it will stop attempting restarts if the service tries to start more than five times over the course of 10 seconds.
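If a unit does exhaust its start limit, systemd marks it as failed and leaves it stopped. Once you’ve addressed the underlying problem, you can clear that state and start it again; a quick sketch using our example unit name:

systemctl reset-failed webapp
systemctl start webapp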
One of the chief benefits of running within a container is security sandboxing. Because the application process is segmented from the underlying operating system, any vulnerabilities that may be present in the service are much more difficult to escalate into a full-blown compromise. Runtimes like Docker achieve this through a combination of cgroups and other security primitives.
You may enable several systemd options to enforce similar restrictions that can help protect an underlying host against unpredictable workload behavior:
ProtectSystem= can restrict write access to sensitive system paths like /boot and /usr. The documentation for this option enumerates all the available values, but generally speaking, setting it to full is a reasonable default to protect these filesystem paths.

ProtectHome= can set the /home, /root, and /run/user directories to read-only with the read-only setting or, when set to true, mount them into the service’s filesystem as empty directories. Unless your application has a specific need to access these directories, setting this to true can safely harden the system against illegitimate access to them.

PrivateTmp= maintains a separate /tmp and /var/tmp for the configured service so that temporary files for this service and other processes remain private. Unless there is a compelling reason for processes to share information via temporary files, this is a useful option to enable.

NoNewPrivileges= is another safe and straightforward way to harden a service by ensuring that the executed process cannot elevate its privileges. If you’re unsure about the ability to use other hardening options, this is generally one of the least problematic to enable.
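Taken together, a hardened [Service] section might add the following lines; treat this as a starting sketch and relax any option your application genuinely cannot run under:

[Service]
ProtectSystem=full
ProtectHome=true
PrivateTmp=true
NoNewPrivileges=true

On reasonably recent systemd versions, running systemd-analyze security webapp will also print an overview of how exposed the unit still is, which can help you decide whether further options from systemd.exec are worth enabling.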
The manual page for systemd.exec is a helpful resource for exploring the different options that apply to executable workloads like services.
The manual pages for the systemd project are extensive and useful for learning about all the options available for running your own applications. Whether you’re running a persistent service like a webserver or a periodic .timer unit to replace a cron job, the systemd documentation can offer helpful guidance.
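As a parting sketch, a hypothetical cleanup.timer that triggers a matching cleanup.service once a day might look like this (the unit names and schedule are illustrative):

# /etc/systemd/system/cleanup.timer (hypothetical)
[Unit]
Description=Run the cleanup service daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target

Enable it with systemctl enable --now cleanup.timer; systemctl list-timers will then show when it last fired and when it is next scheduled.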