Oh hey, I’m back. Been a while. Today, I want to share with you how I’m using systemd to start my Clojure applications on matthiasnehlsen.com, and keep them alive, in case anything should go wrong. These are the applications managed this way:
- BirdWatch, an application for tweet stream analysis, available on GitHub
- the redux-counter example, a sample application for my Clojure book
- the trailing mouse pointer example, another sample application for the book
- inspect, a demo for my inspect library. This will soon be replaced by a new version that makes sense of messages passed around in systems-toolbox applications.
Also, I’m using systemd to start up sse-chat, a Scala demo application which you can also find on GitHub. This application, however, is only started by systemd; it is not restarted when anything goes wrong.
The background for this post is that I recently ordered a new Skylake Intel® Xeon® E3-1275 v5 based server at Hetzner, and I felt it was finally time to retire the manual process startup approach I had used before. Servers should be updated as often as possible, but who does that often enough when it takes 10-15 minutes to wait for a reboot and then manually restart the processes? Certainly not me. So instead, all process startup should be automatic. Initially, I considered using Docker, but when it comes to monitoring that an application is alive and restarting it when it is not, systemd has the better story to offer. Also, I wasted way too much time on a Docker environment in my last client project, so I’m a little cured of the snake oil.¹
So what I wanted was to restart the machine and have all services come up automatically. I also wanted to use the watchdog functionality, which expects each monitored application to send systemd a heartbeat message and restarts the application if no heartbeat has arrived within a configurable interval, say 20 seconds. You can read all about this mechanism in this blog post by one of the original authors of systemd.
My applications had been running rock solid for months in a row until I finally got around to updating and restarting the server, but it is certainly appealing from an operations perspective to have a mechanism in place that listens for a heartbeat and restarts a process when the heartbeat does not arrive as expected. So I thought this might be a good opportunity to write a small library that takes care of emitting said heartbeat when an application is monitored by systemd. You can find this library on GitHub here.
This is the entire library:
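For illustration, a watchdog namespace along these lines might look roughly like the following sketch. Only the `(SDNotify/sendWatchdog)` call is confirmed by the description below; the systems-toolbox function names and message shapes here are written from memory and are assumptions, so consult the library itself for the real thing:

```clojure
(ns matthiasn.systemd-watchdog.core
  "Sketch only; systems-toolbox calls and message formats are assumptions."
  (:require [matthiasn.systems-toolbox.switchboard :as sb]
            [matthiasn.systems-toolbox.scheduler :as sched])
  (:import [info.faljse.SDNotify SDNotify]))

(defn send-watchdog
  "Handler for :wd/send messages; tells systemd we're alive."
  [_msg]
  (SDNotify/sendWatchdog)
  {})

(def notify-cmp-map
  {:cmp-id      :wd/notify-cmp
   :handler-map {:wd/send send-watchdog}})

(defn start-watchdog!
  "Wires a scheduler to the notify component so a heartbeat
   is sent to systemd every timeout milliseconds."
  [timeout]
  (let [switchboard (sb/component :wd/switchboard)]
    (sb/send-mult-cmd
      switchboard
      [[:cmd/init-comp (sched/cmp-map :wd/scheduler-cmp)]
       [:cmd/init-comp notify-cmp-map]
       [:cmd/route {:from :wd/scheduler-cmp :to :wd/notify-cmp}]
       [:cmd/send {:to  :wd/scheduler-cmp
                   :msg [:cmd/schedule-new {:timeout timeout
                                            :message [:wd/send]
                                            :repeat  true}]}]])))
```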
It fires up a switchboard, which manages and wires components: the `:wd/notify-cmp` component, which calls `(SDNotify/sendWatchdog)` from the SDNotify library, and a scheduler component, which emits `:wd/send` messages every `timeout` milliseconds. You can build much more complex applications with the systems-toolbox, e.g. BirdWatch. The 14 lines above (plus comments and imports), however, are about the minimum case when some scheduling is desired.
You can have a look at the mentioned examples if you’re interested in building systems with the systems-toolbox. In subsequent articles, I will introduce them in detail. For now, you can just use the library in your projects if you want to have your application monitored by systemd. It’s just a one-liner, as you can see for example in the trailing mouse pointer example:
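With the library required under an alias like `wd`, the call would be along these lines; note that the function name here is my assumption, not necessarily the library's actual API:

```clojure
;; hypothetical call: heartbeat every 5000 ms when running under systemd
(wd/start-watchdog! 5000)
```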
This simple command calls systemd every 5 seconds, but only if the `NOTIFY_SOCKET` environment variable is set, which would only be the case if systemd had started the application.
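The same guard is easy to see from a shell, using systemd's own `systemd-notify` tool; this is a sketch of the pattern, not the library's code:

```shell
#!/bin/sh
# Send a watchdog heartbeat only when running under systemd, i.e. when
# systemd has exported NOTIFY_SOCKET for this service (Type=notify).
send_heartbeat() {
  if [ -n "${NOTIFY_SOCKET:-}" ]; then
    systemd-notify WATCHDOG=1
    echo "heartbeat sent"
  else
    echo "NOTIFY_SOCKET not set; skipping heartbeat"
  fi
}

send_heartbeat
```

Run outside of systemd, this prints the skip message; under a `Type=notify` service, it would send `WATCHDOG=1` over the notification socket.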
Here’s the service configuration:
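A representative sketch of such a unit file, with placeholder names, paths, and user, would look something like this:

```ini
# birdwatch.service -- illustrative; names, paths, and user are placeholders
[Unit]
Description=BirdWatch tweet stream analysis
After=network.target

[Service]
# Type=notify makes systemd export NOTIFY_SOCKET to the process
Type=notify
User=deploy
WorkingDirectory=/opt/birdwatch
ExecStart=/usr/bin/java -jar birdwatch.jar
# expect a WATCHDOG=1 heartbeat at least every 20 seconds,
# otherwise consider the service failed
WatchdogSec=20
# restart on failure, including watchdog timeouts
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After a `systemctl daemon-reload`, `systemctl enable --now birdwatch` starts the service and enables it at boot.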
You can find all the service configurations for my server in my conf project, together with some install scripts that allow me to set up a new server with little effort. I hope this helps you in your deployments. It certainly helps me with mine.
Would you like to know when there’s a new article? Subscribe to the newsletter and I’ll let you know.
¹ There, the problem was that silly Docker service that frequently hung, which, for whatever reason, required a REBOOT of the whole machine. As you can imagine, this was very annoying, as that, of course, meant ALL services would become unavailable until the machine was back up.