systemd tips and tricks

Oct 27, 2020 linux systemd

Systemd is the neat little init system that most popular Linux distributions rely on at this point. I know it’s controversial, but I don’t really want to get into systemd vs sysvinit, or who got banned from the Linux kernel for what – systemd is here to stay, so we should learn how to use it.

systemctl

First things first, let’s start with the basics:

systemctl start dhcpcd.service

This will start the dhcpcd service. If you’re in a hurry you can leave off .service but in general, it’s best to use it.

Most commonly I run this when I’ve just installed a package – if it’s behaving how I like and I want to avoid having to type this every time I boot, I will enable it. You can combine these (i.e. enable the service and start it immediately) with the --now flag.

systemctl start dhcpcd.service
systemctl enable dhcpcd.service

# or combine them
systemctl enable --now dhcpcd.service

If you’re ever curious which services are provided by a package you’ve just installed, run the following:

$ pacman -Qql wpa_supplicant | grep -Fe .service -e .socket
/usr/lib/systemd/system/wpa_supplicant-nl80211@.service
/usr/lib/systemd/system/wpa_supplicant-wired@.service
/usr/lib/systemd/system/wpa_supplicant.service
/usr/lib/systemd/system/wpa_supplicant@.service
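
One small wrinkle: grep -F matches .service anywhere in the path, so a drop-in directory like foo.service.d would slip through. Here is a minimal sketch of a tighter filter, assuming you only care about service, socket, and timer units. The unit_files name and the example.service.d path are made up for illustration.

```shell
# Hypothetical helper: read file paths on stdin, keep only systemd unit files.
# The pattern is anchored to the end of the line, so ".service" in the
# middle of a path does not count.
unit_files() {
    grep -E '\.(service|socket|timer)$'
}

# Sample input standing in for `pacman -Qql wpa_supplicant` output.
# The example.service.d path is invented to show the anchoring.
printf '%s\n' \
    /usr/bin/wpa_supplicant \
    /usr/lib/systemd/system/wpa_supplicant.service \
    /usr/lib/systemd/system/wpa_supplicant@.service \
    /etc/systemd/system/example.service.d/override.conf \
    | unit_files
```

On a real Arch box you would pipe pacman into it instead: pacman -Qql wpa_supplicant | unit_files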

You might correctly infer that using disable will not stop the service. You can pass the --now flag, just like with enable, to stop and disable the service in one go. The same logic applies to mask, which prevents a unit from being started at all by symlinking its unit file to /dev/null.

systemctl mask httpd.service

is equivalent to

ln -s /dev/null /etc/systemd/system/httpd.service

Note that disable does more than disable the specified service: it also disables any units listed in the Also= setting in the [Install] section of the unit file.
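
Since a mask is nothing more than a symlink to /dev/null, you can spot one by hand. The sketch below plays this out in a scratch directory so nothing under /etc gets touched; is_masked_path is a hypothetical helper name, not something systemd ships.

```shell
# Hypothetical helper: succeeds if the path is a symlink pointing at /dev/null,
# which is what a masked unit file looks like on disk.
is_masked_path() {
    [ -L "$1" ] && [ "$(readlink "$1")" = "/dev/null" ]
}

# Reproduce the symlink from above in a throwaway directory.
demo_dir=$(mktemp -d)
ln -s /dev/null "$demo_dir/httpd.service"

if is_masked_path "$demo_dir/httpd.service"; then
    echo "masked"
else
    echo "not masked"
fi

rm -rf "$demo_dir"
```

On a live system, systemctl is-enabled httpd.service reports the same thing for a masked unit without you poking at the filesystem.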

This isn’t a tutorial, so I’m going to avoid going over every single command you can run with systemctl, but here’s the cool part: every single one of these can be done on a remote machine with the -H flag! So if you have a recurring pattern of sshing into machines to restart a unit or get its status, for instance, you can let systemctl use the ssh protocol for you.

This is an example bash command with output:

$ systemctl -H webuser@git.crmullins.com status gitea.service
● gitea.service - Gitea (Git with a cup of tea)
     Loaded: loaded (/usr/lib/systemd/system/gitea.service; enabled; vendor preset: disabled)
     Active: active (running) since Wed 2020-12-16 01:52:15 UTC; 1 weeks 6 days ago
   Main PID: 59040
      Tasks: 8 (limit: 1156)
     Memory: 178.1M
     CGroup: /system.slice/gitea.service
             └─59040 /usr/bin/gitea web -c /etc/gitea/app.ini
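
If you look after more than one box, the -H flag drops neatly into a loop. This is only a sketch: the function name and the example.com hosts are placeholders, and it prints the commands instead of running them so you can eyeball them first.

```shell
# Print (rather than run) one `systemctl -H` invocation per host.
# Usage: remote_is_active_cmds UNIT HOST...
remote_is_active_cmds() {
    unit=$1
    shift
    for host in "$@"; do
        printf 'systemctl -H %s is-active %s\n' "$host" "$unit"
    done
}

# Hypothetical hosts; swap in your own.
remote_is_active_cmds gitea.service webuser@host1.example.com webuser@host2.example.com
```

Once the output looks right, pipe it into sh to actually run it. I went with is-active here because it prints a single word per host, which is easier to scan than a full status dump.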

A couple other helpful systemctl commands:

$ systemctl list-unit-files
UNIT FILE                                                                 STATE           VENDOR PRESET
proc-sys-fs-binfmt_misc.automount                                         static          -            
-.mount                                                                   generated       -            
boot.mount                                                                generated       -            
dev-hugepages.mount                                                       static          -            
dev-mqueue.mount                                                          static          -            
proc-sys-fs-binfmt_misc.mount                                             disabled        disabled     
sys-fs-fuse-connections.mount                                             static          -            
sys-kernel-config.mount                                                   static          -            
sys-kernel-debug.mount                                                    static          -            
sys-kernel-tracing.mount                                                  static          -            
tmp.mount                                                                 static          -            
var-lib-machines.mount                                                    static          -            
var-lib-snapd-snap-core20-875.mount                                       enabled         disabled     
var-lib-snapd-snap-core20-904.mount                                       enabled         disabled     
var-lib-snapd-snap-snapd-10492.mount                                      enabled         disabled     
var-lib-snapd-snap-ubports\x2dinstaller-330.mount

$ systemctl show sshd.service
Type=simple
Restart=always
NotifyAccess=none
RestartUSec=100ms
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
TimeoutAbortUSec=1min 30s
TimeoutStartFailureMode=terminate
TimeoutStopFailureMode=terminate
RuntimeMaxUSec=infinity
WatchdogUSec=infinity
WatchdogTimestampMonotonic=0
RootDirectoryStartOnly=no
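
That KEY=value format is easy to script against. systemctl can already do the lookup for you (systemctl show -p Restart --value sshd.service), but if you have the output saved somewhere, a sketch like this pulls out one property. get_prop is a made-up name, and the sample lines are copied from the output above.

```shell
# Hypothetical helper: read `systemctl show` style KEY=value lines on stdin
# and print the value of the requested key.
# Usage: get_prop KEY < show-output
get_prop() {
    # Split on "=" to compare the key, then strip "KEY=" from the whole line
    # so values that contain "=" themselves survive intact.
    awk -F= -v key="$1" '$1 == key { sub(/^[^=]*=/, ""); print; exit }'
}

# Sample lines standing in for real `systemctl show sshd.service` output.
printf '%s\n' 'Type=simple' 'Restart=always' 'TimeoutStartUSec=1min 30s' \
    | get_prop TimeoutStartUSec
```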

journalctl

The journalctl utility is the second most useful tool provided by systemd. Maybe the most useful depending on what you do. The first thing I wanna be able to do is “follow” logs, so if you’re used to syslog you can replace tail -f /var/log/messages with

journalctl -f

I won’t go into technical details of how it organizes facility codes and priority levels (because I’d just have to look ’em up anyway) but one super handy trick is to check out the logs from your current boot.

journalctl -b -0

The -0 is not necessary but I like to add it because it reminds me of the real utility here. I don’t usually care about logging from the current boot, but if an unrecoverable error happened last time,

journalctl -b -1

will dump the logs from the previous boot into a pager.

If I wanted, say, the last ten lines from the previous boot with the last one on top in json format, I could do:

journalctl -b -1 -r -o json -n 10

This is nice if you’re going to pipe that output into a filtering tool like jq, but if you’re going to be looking at it with your eyeballs I’d suggest replacing json with json-pretty.

The semantics around error levels and time filtering are nice to be familiar with too. A single level passed to -p means “this level and anything more severe”, so if you wanted to see all the error, critical, alert, and emergency messages from the last 20 minutes, all you’d have to do is:

journalctl -p err -r --since "20 minutes ago"
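
To make the “this level and anything more severe” part concrete, here is a small sketch that lists which levels a given -p value pulls in. The level names are the standard syslog ones that journalctl accepts; levels_included is just a name I made up.

```shell
# syslog priority names, most severe first (numeric values 0 through 7).
levels="emerg alert crit err warning notice info debug"

# Print every level that `journalctl -p LEVEL` would include.
levels_included() {
    # Reject names that are not in the list.
    case " $levels " in
        *" $1 "*) ;;
        *) echo "unknown level: $1" >&2; return 1 ;;
    esac
    # Walk from most severe down to the requested level, inclusive.
    for name in $levels; do
        printf '%s\n' "$name"
        if [ "$name" = "$1" ]; then
            return 0
        fi
    done
}

levels_included err
```

journalctl also takes a range such as -p warning..err if you want to cut off the most severe end.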

There are all sorts of ways these logs are indexed by default, so rather than enumerate them here (I’d just be copying them from the man pages) I’ll show a quick way to explore them:

$ journalctl -o verbose
Thu 2020-12-31 00:19:49.736425 UTC [s=c64cdfd97a5d48a6b71268e13a31065b;i=bca1;b=f8c3659c079b49d5b47775db59372940;m=7d14615fb1;t=5b7b794f80fe9;x=4c6eb3b4c7e00534]
    _TRANSPORT=stdout
    PRIORITY=6
    SYSLOG_FACILITY=3
    _BOOT_ID=f8c3659c079b49d5b47775db59372940
    _MACHINE_ID=2e48cee174d246d5bb35eb2ed2f277dd
    _HOSTNAME=lapbox
    SYSLOG_IDENTIFIER=openvpn
    _UID=973
    _GID=90
    _COMM=openvpn
    _EXE=/usr/bin/openvpn
    _CMDLINE=/usr/bin/openvpn --suppress-timestamps --nobind --config US_Seattle.conf
    _CAP_EFFECTIVE=470c2
    _SYSTEMD_CGROUP=/system.slice/system-openvpn\x2dclient.slice/openvpn-client@US_Seattle.service
    _SYSTEMD_UNIT=openvpn-client@US_Seattle.service
    _SYSTEMD_SLICE=system-openvpn\x2dclient.slice
    MESSAGE=Initialization Sequence Completed
    _STREAM_ID=ef7fc04fb17748888920af652759532c
    _PID=106228
    _SYSTEMD_INVOCATION_ID=1f244289ecd04d63afe25f050c93d6c6

You might look at this and decide you’d like to narrow the search to only logs from processes under user ID 973 since yesterday so you could be like:

journalctl --since yesterday _UID=973

Wondering what the possible values are for a given field? Couldn’t be easier, just use the -F flag with the field:

journalctl -F _SYSTEMD_UNIT

By the way, there are bash hooks for tab-completion on all of these! So typing journalctl _SY<tab> will show you all the completions, and once you choose one, will show you all the available values.

$ journalctl _SYSTEMD_UNIT=<tab><tab>
dbus.service                                           session-3.scope                                      
dhcpcd@wlan0.service                                   session-4.scope                                      
getty@tty1.service                                     snapd.service                                        
...

As you might intuit, using two matches on the same field will result in a logical OR operation on the logs:

journalctl _UID=70 _UID=71 # show the logs with either _UID value

In other words, repeating the same field is an implicit OR. But different fields will be combined with an implicit AND:

journalctl _SYSTEMD_UNIT=dhcpcd@wlan0.service PRIORITY=6

so this will only show logs with both the matching fields.
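
If the rules feel slippery, here is a toy model of them in plain shell. To be clear, this is not how journalctl is implemented, only a sketch of the same-field-OR, different-field-AND behaviour described above. The entry_matches name and the sample record are made up.

```shell
# Toy model of journalctl's match rules for a single log entry.
# Usage: entry_matches RECORD_FILE FIELD=value...
# RECORD_FILE holds one FIELD=value per line, like `-o verbose` output
# without the indentation.
entry_matches() {
    record=$1
    shift
    for term in "$@"; do
        field=${term%%=*}
        found=0
        # Any match naming the same field may satisfy it (implicit OR).
        for other in "$@"; do
            if [ "${other%%=*}" = "$field" ] && grep -qxF -- "$other" "$record"; then
                found=1
                break
            fi
        done
        # Every distinct field has to be satisfied (implicit AND).
        if [ "$found" -eq 0 ]; then
            return 1
        fi
    done
    return 0
}

# Hypothetical log entry.
sample=$(mktemp)
printf '%s\n' '_UID=70' 'PRIORITY=6' '_SYSTEMD_UNIT=dhcpcd@wlan0.service' > "$sample"

if entry_matches "$sample" _UID=70 _UID=71; then
    echo "same field twice: entry is shown"
fi
if ! entry_matches "$sample" _UID=70 PRIORITY=3; then
    echo "different fields, one misses: entry is hidden"
fi
rm -f "$sample"
```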

As the unix gurus would have it, we can specify an explicit OR with the +:

journalctl _SYSTEMD_UNIT=dhcpcd@wlan0.service PRIORITY=6 + \
    _SYSTEMD_UNIT=openvpn-client@US_Seattle.service PRIORITY=3

Other tools

There are a bunch of other ones. One I find useful is systemd-cgtop, which shows you which cgroups are consuming your resources, updated and displayed in the familiar top fashion.

Once you’ve found which cgroup is causing trouble, you might be interested in which processes belong to it: you can use systemd-cgls:

$ systemd-cgls /user.slice
Control group /user.slice:
└─user-1000.slice 
  ├─user@1000.service 
  │ ├─app.slice 
  │ │ ├─at-spi-dbus-bus.service 
  │ │ │ ├─941 /usr/lib/at-spi-bus-launcher