systemd monitoring
We want to run an action (for example, send an alert) when a systemd unit fails.
To do so, we follow the journalctl logs, keeping only messages with the syslog identifier systemd:
journalctl -t systemd
Oct 31 00:00:00 hostname systemd[1]: logrotate.service: Deactivated successfully.
Oct 31 00:00:00 hostname systemd[1]: Finished Logrotate Service.
Oct 31 00:00:00 hostname systemd[1]: systemd-tmpfiles-create.service: Deactivated successfully.
Oct 31 00:00:00 hostname systemd[1]: systemd-tmpfiles-create.service: Consumed 70ms CPU time, 2.9M memory peak, 1.5M read from disk.
Oct 31 00:00:01 hostname systemd[1]: backup.service: Failed with result 'exit-code'.
We're interested in lines like the last one, which indicate that a unit failed.
We don't need the timestamp, hostname and process metadata, so we add the -o cat option, which prints only the message itself:
journalctl -t systemd -o cat
logrotate.service: Deactivated successfully.
Finished Logrotate Service.
systemd-tmpfiles-create.service: Deactivated successfully.
systemd-tmpfiles-create.service: Consumed 70ms CPU time, 2.9M memory peak, 1.5M read from disk.
backup.service: Failed with result 'exit-code'.
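Before writing any reaction configuration, it can help to check by hand which units have already failed. The following is a minimal sketch, not part of reaction: the function name failed_units is hypothetical, and it assumes unit names contain no spaces or colons. It reads the -o cat output on standard input and prints the name of each unit that logged a "Failed with result" message.

```shell
# Hypothetical helper: read `journalctl -t systemd -o cat` output on stdin
# and print the name of every unit that logged a failure.
failed_units() {
  # Keep only "<name>: Failed with result ..." lines and print <name>.
  sed -n 's/^\([^ :]*\): Failed with result .*/\1/p'
}
```

It can be used as: journalctl -t systemd -o cat | failed_units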
We need a Pattern that matches a systemd unit name:
{
  patterns: {
    unit: {
      regex: @'[a-zA-Z0-9\-_@]+\.(?:automount|mount|scope|service|slice|socket|path|target|timer)\b',
      // Optionally ignore units that you don't want to monitor:
      // ignore: ["buggy-job.service"],
    }
  }
}
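To get a feel for what this pattern accepts, the same expression can be tried against candidate names with grep. This is only a sketch under a few assumptions: the function name is_unit_name is hypothetical, the expression is anchored with ^ and $ to test a whole string (instead of the trailing \b), the hyphen is moved to the end of the character class, and the alternation is a plain group, because POSIX extended regular expressions differ slightly from the syntax reaction uses.

```shell
# Hypothetical helper: succeed if $1 looks like a unit name the pattern accepts.
is_unit_name() {
  printf '%s\n' "$1" |
    grep -Eq '^[a-zA-Z0-9_@-]+\.(automount|mount|scope|service|slice|socket|path|target|timer)$'
}
```

For instance, is_unit_name backup.service succeeds, while is_unit_name backup.conf fails because .conf is not one of the listed unit types.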
We can now create the corresponding Stream and Filter:
{
  streams: {
    systemd: {
      // -f follows the journal, -n0 skips entries logged before startup
      cmd: ['journalctl', '-fn0', '-o', 'cat', '-t', 'systemd'],
      filters: {
        failedunit: {
          regex: [@'^<unit>: Failed with result'],
          actions: {
            // Add an action
          }
        }
      }
    }
  }
}
See Example Actions for inspiration on how to send an alert.
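As one possible starting point, an action's cmd could call a small script and pass <unit> as an argument, since reaction substitutes the matched value into the command. The sketch below only formats the alert text. The function name format_alert and the message layout are hypothetical, and how the message is delivered (mail, chat webhook, and so on) is left to whatever tool you already use.

```shell
# Hypothetical helper: build a one-line alert for a failed unit.
# $1: unit name (the value reaction substitutes for <unit>)
# $2: optional host name, defaults to the output of `hostname`
format_alert() {
  unit="$1"
  host="${2:-$(hostname)}"
  printf '[%s] systemd unit failed: %s\n' "$host" "$unit"
}
```

Calling format_alert backup.service myhost prints "[myhost] systemd unit failed: backup.service", which can then be piped to the delivery command of your choice.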