Systemd: Enable indefinite service restarts

126 points by secure 1 year ago | 78 comments
  • deathanatos 1 year ago
    > Why does systemd give up by default?

    > I’m not sure. If I had to speculate, I would guess the developers wanted to prevent laptops running out of battery too quickly because one CPU core is permanently busy just restarting some service that’s crashing in a tight loop.

    sigh … bounded randomized exponential backoff retry.

    (Exponential: double the maximum time you might wait each iteration. Randomized: the time you wait is a random amount in [0, current maximum] (yes, possibly zero). Bounded: you stop doubling at a certain point, like 5 minutes, so that you'll never wait longer than 5 minutes; otherwise, at some point you're waiting for ∞s, which I guess is like giving up.)

    (The concern about logs filling up is a more serious one. Backoff won't directly solve it, but a high enough maximum wait usually slows the rate of log generation enough that it stops mattering. Also, do your log rotation on size.)
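
    A minimal bash sketch of that policy (start_service is a stand-in for whatever command you're retrying):

        max=1      # current maximum wait, in seconds
        cap=300    # bounded: never let the maximum grow past 5 minutes
        until start_service; do
            sleep $(( RANDOM % (max + 1) ))   # randomized: wait anywhere in [0, max], possibly 0
            max=$(( max * 2 ))                # exponential: double the maximum each iteration
            (( max > cap )) && max=$cap
        done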

    • kaba0 1 year ago
      Arguably, this logic should live in another place that monitors the service.

      Especially since service startup failure is usually not something that gets fixed on its own, unlike a network connection (where exponential backoff is (in)famous). A bad config file or a failed disk won't recover in 10 minutes on its own, so systemd's default makes sense here, I believe.

      • otterley 1 year ago
        systemd is a service monitor. It wouldn't be nearly as useful if it wasn't!
        • gizmo686 1 year ago
          From the server's perspective, external problems typically do get fixed on their own. It is nice when resolving the primary issue is sufficient to fix the entire system, instead of needing to resolve the primary issue and then fix all the secondary and tertiary issues.

          At my work, we have a simple philosophy for this. The tester is allowed to (on the test system) toggle servers' power, move network cables around, input bad configuration, etc., in any permutation he wants. So long as everything is set up correctly at the end of the exercise, the system should function nominally (potentially after a reasonable delay).

          There should, of course, be a system-level dashboard that notifies someone there is a problem; but that is unrelated to the server's internal retry logic.

        • jxf 1 year ago
          Q: Why is the optimal lower bound zero and not "at least as long as you waited last time"?
          • onetoo 1 year ago
            edit: I did some more investigation, and I missed something crucial: The distribution of requests over time becomes very wonky if the lower bound isn't 0. It stabilizes given enough time, but that time seems very long. Whereas lower-bound 0 quickly becomes uniform.

            See the following figure, where I plot the # of requests over time: https://i.imgur.com/PNUFhjc.png

            And that is why you should use 0.

            ---

            I am not an expert, but I got nerd-sniped.

            I ran some simulations[1] and did some napkin math[2], and I would summarize as follows:

            I don't think there is any globally optimal choice; it depends on your exact circumstances, and on which of {excess waiting time, excess load} you want to optimize. Defaulting to 0 is a lot easier to implement, and is at most a factor of 2 worse than the optimal lower bound w.r.t. waiting time vs. load. You may also consider [0.5 * maximum, maximum], which trades some excess waiting time for less load. Your suggestion is a similar heuristic one might use depending on their exact circumstances.

            [1] https://gist.github.com/Dryvnt/1984d9389ae7386127f5e8998bf52...

            [2] Consider a family of random bounded exponential back-off strategies with ultimate upper bound U as follows:

              Strategy X(z): Random of [z * U, U], where z is constant, 0 <= z <= 1
            
            There are other families of back-off algorithms with other characteristics, and I am not considering or comparing those, just this family. Note that the strategy OP suggests is X(0). Consider that X(1) is non-random, which is undesirable.

            Simple probability tells us

              avg(X(z)) = (1 + z) * U / 2
            
            Server load ~= frequency, frequency is inverse duration, so

              load(X(z)) ~= 1 / avg(X(z)) = 2 / ((1 + z) * U)
            
            The difference in load between any two X strategies is

              load(X(z1)) / load(X(z2)) = (1 + z1) / (1 + z2)
            
            Since z1 and z2 are constant, this relative load is constant. Due to the bounds on z, the largest possible relative load is

              load(X(1)) / load(X(0)) = 2 / 1 = 2
            
            Now consider your suggested strategy

              Strategy Y: Random of [L, U], where L is the last choice
            
            Note that

              Y = X(y), where y = L / U.
            
            In fact, Y approaches X(1) exponentially fast, since the difference between L and U is halved each step, on average. So your suggestion still falls within this at-most-factor-2 difference. Exactly where just depends on the outage length.
            • jxf 1 year ago
              Yeah, all fair. What piqued my curiosity is that you could wait _less_ time than you did before (potentially not at all!) which feels like the opposite of what you want to do in such situations.
          • 1 year ago
            • isatty 1 year ago
              Regardless, all these opinionated settings should be left to OS maintainers or similar. I don't see why a low-level init system tries to make decisions for others. Yes, it may be with good intentions, but don't.
              • izacus 1 year ago
                Seems like OS maintainers can set those settings, what exactly is the problem?
                • gizmo686 1 year ago
                  Systemd has to pick something as a default. Distributions are more than capable of changing the default on their builds of Systemd if they want to. Feel free to file a bug with Redhat, or Debian, or whoever maintains your distribution and see if they want to change the default on their system.
                  • bogota 1 year ago
                    The number of times I've had to fight and debug systemd, compared to any other init system, is at least 10x.

                    Yes, it does a lot of stuff for you that, with other init systems, I had to write custom scripts for, but those scripts were much more understandable and maintainable long term. Sadly systemd won, and now I build my own OS without it.

                    • __float 1 year ago
                      Even for basic service running, complex dependencies are so much more manageable in systemd.

                      I'm glad systemd "won"; it's much more maintainable IMO than shell scripts written once and forgotten about (until they break).

                      • kaba0 1 year ago
                        With all due respect, I just don't believe that. Perhaps it's just rose-tinted glasses, or the modern complexity of services and their dependencies.
                  • ElectricSpoon 1 year ago
                    > I would guess the developers wanted to prevent laptops running out of battery too quickly

                    And I would guess sysadmins also don't like their logging facilities filling the disks just because a service is stuck in a start loop. There are many reasons to think a service that has failed to start multiple times in a row won't start the next time either. Misconfiguration is probably the most frequent reason.

                    • twic 1 year ago
                      Exactly. If a service crashes within a second ten times in a row, it's not going to come up cleanly an eleventh time. The right thing to do is stay down, and let monitoring get the attention of a human operator who can figure out what the problem is. Continually rebooting is just going to fill up logs, spam other services, and generally make trouble.

                      I'm sure there are exceptions to this. For those, set Restart=always. But it's an absolutely terrible default.

                      • BenjiWiebe 1 year ago
                        It might actually, if a network connection is temporarily down.
                        • rendaw 1 year ago
                          Or a disk not attached yet. Or another service it depends on being slow to finish starting up.
                        • growse 1 year ago
                          Interestingly, the kubernetes approach is the opposite one. Dependencies between pods / software components are encouraged to be a little softer, so that the scheduler is simpler.

                          Starting up, noticing that the environment doesn't have what you need yet and dying quickly appears to be The Kubernetes Way. A scheduler will eventually restart you and you'll have another go. Repeat until everything is up.

                          The kubelet operates the same way afair. On a node that hasn't joined a cluster yet, it sits in a fail/restart loop until it's provisioned.

                        • deathanatos 1 year ago
                          Heh. We used syslog at one place, with it configured to push logs into ELK. The ingestion into ELK broke … which caused syslog to start logging that it couldn't forward logs. Now that might seem like screaming into a void, but that log went to local disk, and syslog retried as fast as the disk would allow, so instantly every machine in the fleet started filling up its disks with logs.

                          (You can guess how we noticed the problem…)

                          Also logrotate. (And bounded on size.)

                          • freedomben 1 year ago
                            It's wild how easy it is to misconfigure (or not configure) logrotate properly and have a log file fill up the disk. Out of memory and/or out of disk are the two error cases that have led to the most pain in my career. I think most people who started with Docker in the early days (long before there was a docker system prune) had this happen, where old containers/images filled up the disk and wreaked havoc at an unsuspecting moment.
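
                            A size-based logrotate rule is cheap insurance against that; a rough sketch (the path is just a placeholder):

                                /var/log/myapp/*.log {
                                    size 100M     # rotate as soon as the file passes 100 MB, regardless of schedule
                                    rotate 5      # keep at most 5 rotated files
                                    compress
                                    missingok
                                    notifempty
                                }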
                            • doubled112 1 year ago
                              I used to joke that if VMware engineers couldn't figure out the logrotate configuration for their own product for a few releases, what chance do I have?
                          • melolife 1 year ago
                            I've seen bad service designs with, e.g.,

                            Before=systemd-user-sessions.service

                            This means that as long as systemd is trying to (re)start the service, nobody can log in. Which is a problem with infinite restarts.

                            It's still pretty easy to accidentally set up an infinite restart loop with the default settings if your service takes more than 2s to crash.

                          • tadfisher 1 year ago
                            This must be a different philosophy. When I see something like this happening, I investigate to find out why the service is failing to start, which usually uncovers some dependency that can be encoded in the service unit, or some bug in the service.
                            • zhengyi13 1 year ago
                              I think the author's specified use case is to address transient conditions that drive failures.

                              When the given (transient) condition goes away (either passively, or because somebody fixed something), then the service comes back without anyone needing to remember to restart the (now dead) service.

                              By way of example, I've run apps that would refuse to come up fully if they couldn't hit the DB at startup. Alternatively, they might also die if their DB connection went away. App lives on one server; DB lives on another.

                              It'd be awfully nice in that case to be able to fix the DB, and have the app service come back automatically.

                              • ot 1 year ago
                                Imagine you use systemd to manage daemons in a large distributed system. Crashes could be caused by a failure in a dependency. Once you fix the dependency, you want all your systems to recover as quickly as possible, you don't want to go through each one of them to manually restart things.

                                This doesn't mean that you don't investigate, it just means that you have an additional guarantee that the system can automatically eventually recover.

                                 If you set a limit on the number or timing of restarts, what's a reasonable limit? That will be context dependent, and as soon as it's more than a few minutes, it may as well be infinite.

                                • chpatrick 1 year ago
                                  If your server has a bug that makes it crash every two hours you still want it up the rest of the time until you fix it.
                                  • mise_en_place 1 year ago
                                     That's exactly why systemd should blindly attempt to restart the service infinitely. Separation of concerns. An init system should simply start and monitor services. That is what an init system is meant to do. The fact that systemd is overengineered and tries to do multiple things causes headaches for a lot of us. Busybox-init is one of the best alternatives; I would use it everywhere if I could.
                                    • vidarh 1 year ago
                                      It's trivial to make systemd do that if that is what you want, but there are also plenty of cases when that is not what you want and you then end up trying to write crash-proof startup scripts to provide backoff instead of just changing a flag in a unit file.

                                      (And if you want a dumb unit system, there are plenty of options which will run just fine under systemd as a single unit so you never have to actually use systemd for your own services even if you're forced to use systemd for the overall system for whatever reason)

                                    • tekla 1 year ago
                                      Of course you understand you can do both, like I do.
                                      • BarbaryCoast 1 year ago
                                         ...and now you know why I don't run systemd. I believe their thought process is: what would Windows do? This is one example. In Windows, the desktop shell still crashes often. In the old days, this would lock up the keyboard and mouse, and you'd have to power cycle. But MS "fixed" it by simply adding infinite restarts to the system. Now we have systemd: when something crashes, there's no need to fix the bug, just restart it.

                                        My favorite new misfeature is PulseAudio. These geniuses actually built code for a multi-user, multi-tasking OS...which will only run for ONE user, and then only if that user is logged in. So forget running cron jobs, and sounding an alert if something needs attention.

                                        This is all code produced by FreeDesktop[.]org. Thanks to them, your industrial strength, mission-critical server OS is now only suitable for single-user desktop systems.

                                      • PhilipRoman 1 year ago
                                         I can understand avoiding infinite restarts when there is something clearly wrong with the configuration, but I can't figure out why they made the "systemctl restart" command subject to the same limit. For services which don't support dynamic reloading, restarting them is a substitute for that. This makes "systemctl restart" extremely brittle when used from scripts.

                                         Nobody accidentally runs "systemctl restart" too fast; when such a command is issued it is clearly intentional and should always be respected by systemd.
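
                                         If you're stuck with that behaviour, one workaround from scripts (a sketch; myapp.service is a placeholder) is to clear the counters before restarting:

                                             systemctl reset-failed myapp.service   # clears the failed state and the start-rate counters
                                             systemctl restart myapp.service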

                                        • cozzyd 1 year ago
                                          systemctl just uses dbus, as far as I understand, and someone can easily send dbus commands too fast
                                          • twinpeak 1 year ago
                                            I recently discovered, while writing a monitoring script, that systemd exposes a few properties that can be used to alert on a service that is continuously failing to start when it's set to restart indefinitely.

                                                # Get the number of restarts for a service to see if it exceeds an arbitrary threshold.
                                                systemctl show -p NRestarts "${SYSTEMD_UNIT}" | cut -d= -f2
                                            
                                                # Get when the service started, to work out how long it's been running, as the restart counter isn't reset once the service does start successfully.
                                                systemctl show -p ActiveEnterTimestamp "${SYSTEMD_UNIT}" | cut -d= -f2
                                            
                                                # Clear the restart counter if the service has been running for long enough based on the timestamp above
                                                systemctl reset-failed "${SYSTEMD_UNIT}"
                                            • o11c 1 year ago
                                              It would be nice if `RestartSec` weren't constant.

                                              Then you could have the default be 100ms for one-time blips, but (after a burst of failures) fall back gradually to 10s to avoid spinning during longer outages.

                                              That said, beware of failure chains causing the interval to add up. AFAIK there's no way to have the kernel notify you of when a different process starts listening on a port.

                                              • dijit 1 year ago
                                                > AFAIK there's no way to have the kernel notify you of when a different process starts listening on a port.

                                                You can use mandatory access control for this.

                                                AppArmor or SELinux are examples.

                                                Unfortunately they are hard and not sexy, and sysadmins (the people who tend to do the hard, not-sexy things) are a dead/dying breed.

                                                • saint_yossarian 1 year ago
                                                  There's `RestartSteps` and `RestartMaxDelaySec` for that, see the manpage `systemd.service`.
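
                                                    A sketch of what that can look like in a unit; the values are just examples picked to match the 100ms-to-10s idea above (these options need systemd 254+):

                                                        [Service]
                                                        Restart=on-failure
                                                        RestartSec=100ms        # delay before the first retry
                                                        RestartSteps=10         # number of attempts over which the delay grows...
                                                        RestartMaxDelaySec=10   # ...up to this ceiling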
                                                  • o11c 1 year ago
                                                    Ah, not in the man page on my system.

                                                     Available since systemd 254, released July 2023 (only one release since then). Huh, has the release rate severely slowed down?

                                                  • BobbyTables2 1 year ago
                                                    > AFAIK there's no way to have the kernel notify you of when a different process starts listening on a port.

                                                    For startup, I’d argue the proper way is for the process to bind the socket before forking as a daemon.

                                                    With such a design, one can launch a list of dependent processes without worrying how long they each take to start up. No polling loops needed!

                                                    Of course, that requires some careful design — “fork and forget” is too appealing. The process would be responsible for creating its PID file after forking but the socket before…

                                                    Alternatively, an IPC notification could be used but would require some sort of standardization to be generally useful.

                                                    • nomel 1 year ago
                                                      > AFAIK there's no way to have the kernel notify you of when a different process starts listening on a port.

                                                      Would the ExecCondition be appropriate here, minimally, with a script that runs `lsof -nP -iTCP:${yourport} -sTCP:LISTEN`?

                                                      • o11c 1 year ago
                                                        I'm talking: once your process has started, how do you wait for a process you depend on?

                                                        Obviously if systemd opens the port for you it's easy enough (in this case, even across machines), but otherwise you have to do a sleep loop. And I'm not sure how dependency restarts work in this case.

                                                        ExecCondition just moves the spin to systemd, and has more overhead than doing it in your own process. There's no point in gratuitously restarting, after all.
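
                                                        That in-process sleep loop might look something like this (a sketch; the address, the port, and a netcat that supports -z are assumptions):

                                                            # Poll once a second until something is listening on the dependency's port.
                                                            until nc -z 127.0.0.1 5432; do
                                                                sleep 1
                                                            done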

                                                    • akira2501 1 year ago
                                                      I've always preferred daemontools and runit's ideology here. If a service dies, wait one second, then try starting it. Do this forever.

                                                      The last thing I need is emergent behavior out of my service manager.

                                                      • freedomben 1 year ago
                                                          Systemd can do exactly that; it just doesn't by default. But if that's what you want, it's trivial.
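
                                                          E.g. a sketch of the daemontools-style behaviour in a unit file:

                                                              [Unit]
                                                              StartLimitIntervalSec=0   # disable the start-rate limit, so systemd never gives up

                                                              [Service]
                                                              Restart=always            # restart regardless of how the service exited
                                                              RestartSec=1              # wait one second between attempts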
                                                        • rconti 1 year ago
                                                          ... and how many of us knew this before the article?
                                                          • otterley 1 year ago
                                                            Anyone who read its documentation, which is comprehensive and clear.
                                                          • akira2501 1 year ago
                                                            Is it possible to do this system wide? Or do I have to do it for each individual service? It may be a trivial amount of work but if the configuration is fragile, I've gained nothing.
                                                            • izacus 1 year ago
                                                              It's literally described in the article.
                                                        • franknord23 1 year ago
                                                           I believe this allows you to have cascading restart strategies, similar to what can be done in Erlang/OTP: only after the start limit has been reached does systemd consider the service failed. Then services that have Requires= set on the failed service will be restarted/marked failed as well.

                                                          I think you can even have systemd reboot or move the system into a recovery mode (target) if an essential unit does not come up. That way, you can get pretty robust systems that are highly tolerant to failures.

                                                           (Now, after reading `man systemd.unit`, I am not fully sure exactly how restarts are cascaded to requiring units.)

                                                          • vidarh 1 year ago
                                                            You can trigger units explicitly on failure with OnFailure=someservice as well (and since you can parameterize service names, you can have e.g. a single failure@.service that'll do whatever you prefer once a service fails).

                                                            OnFailure makes it easy to implement more complex restart or notification logic.
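
                                                            A sketch of that pattern (failure@.service and the notify script are hypothetical, something you'd write yourself):

                                                                # myapp.service
                                                                [Unit]
                                                                OnFailure=failure@%n.service   # %n expands to the full name of the failing unit

                                                                # failure@.service (template)
                                                                [Service]
                                                                Type=oneshot
                                                                ExecStart=/usr/local/bin/notify-failure %i   # %i is the instance name, i.e. the failed unit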

                                                          • mise_en_place 1 year ago
                                                             I’ve been bitten by the restart limit many times. Our application server (backend) was crash looping; the newest build fixed the crash, but systemd refused to restart the service due to the limit. A subtle but very annoying default behavior.
                                                            • dijit 1 year ago
                                                              are you saying systemd was refusing to restart after manual intervention?
                                                              • mise_en_place 1 year ago
                                                                Correct, because the startup limit had been reached: `service start request repeated too quickly, refusing to start`.
                                                                • dijit 1 year ago
                                                                   That's terrifying; systemd shouldn't pretend to be smarter than manual intervention.

                                                                   That violates everything I ever enjoyed Linux for. I left Windows because it thought it knew better than me.

                                                              • freedomben 1 year ago
                                                                Did your deployment process/script not include restarting the service?
                                                                • mise_en_place 1 year ago
                                                                  It does, but systemd refused to start the service because of the startup limit.
                                                              • halyconWays 1 year ago
                                                                Seems reasonable if the service is failing due to a transient network issue, which takes many minutes to resolve.
                                                                • bravetraveler 1 year ago
                                                                  > And then you need to remember to restart the dependent services later, which is easy to forget.

                                                                  You missed the other direction of the relationship.

                                                                   I posted elsewhere in the thread on this: don't rely on entropy. Define your dependencies (well).

                                                                  After=/Requires= are obvious. People forget PartOf=.
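
                                                                   A sketch of those directives in a dependent unit (myapp.service/mydb.service are placeholders):

                                                                       # myapp.service
                                                                       [Unit]
                                                                       Requires=mydb.service   # starting myapp pulls in mydb; stopping mydb stops myapp
                                                                       After=mydb.service      # only start myapp once mydb has finished starting
                                                                       PartOf=mydb.service     # restarting/stopping mydb restarts/stops myapp too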
