Question about openvpn systemd unit file #485
Hi,
On Wed, Jan 17, 2024 at 03:16:28PM -0800, krm3 wrote:
> We use the openvpn packages for Debian bookworm from
> https://build.openvpn.net/debian/openvpn/. As we ran into #449 with
> 2.6.7 (thanks @patcable for reporting) we noticed that in the systemd
> unit file for openvpn KillMode is set to 'process' and not
> 'control-group'. Therefore after every segfault there were zombies
> left and when TasksMax was reached (which is set to 10) the openvpn
> service could not start again. That's why the segfault behaviour
> led to a complete openvpn service outage for us.
This does not sound like what should happen. If OpenVPN crashes and has
current child processes (like for an auth plugin, or anything else), these
should be re-parented to systemd, and no zombies should ever happen.
Zombie processes happen if the parent process *is still there* and is
not properly calling wait() on its child processes - but if the parent
process dies (SIGSEGV), this scenario cannot happen.
> My question is: what is the reason that KillMode is set to 'process' here? The systemd manual page says: "Note that it is not recommended to set KillMode= to process or even none, as this allows processes to escape the service manager's lifecycle and resource management, and to remain running even while their service is considered stopped and is assumed to not consume any resources."
No process OpenVPN starts is expected to live for a long time or even
beyond OpenVPN ending, so it's somewhat moot whether the primary process
or everything is signalled.
gert
--
"If was one thing all people took for granted, was conviction that if you
feed honest figures into a computer, honest figures come out. Never doubted
it myself till I met a computer with a sense of humor."
Robert A. Heinlein, The Moon is a Harsh Mistress
Gert Doering - Munich, Germany ***@***.***
|
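To make the zombie point above concrete: a zombie only exists while its parent is still alive and has not yet called wait() on it. The following is a minimal Python sketch of that behaviour, assuming Linux with /proc mounted. The helper names `child_state` and `demo_zombie` are made up for illustration and are not taken from OpenVPN or systemd.

```python
import os
import time


def child_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field follows the command name, which is wrapped in parentheses.
    return data.rsplit(")", 1)[1].split()[0]


def demo_zombie():
    """Fork a child that exits at once; return (state before wait, reaped flag)."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately

    # The parent is alive but has not called wait() yet, so the exited
    # child stays in the process table as a zombie (state "Z").
    deadline = time.monotonic() + 5.0
    state = child_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = child_state(pid)

    os.waitpid(pid, 0)  # reaping the child removes the zombie entry
    reaped = not os.path.exists(f"/proc/{pid}")
    return state, reaped


if __name__ == "__main__":
    print(demo_zombie())
```

If the parent had died instead of lingering, the child would have been re-parented and reaped by its new parent, which is why a crashed OpenVPN process by itself should not leave zombies behind.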
I think "zombie" was the wrong term. I think the processes were still parented to systemd. I will try to reproduce this with 2.6.7 and investigate further and then come back. Klara |
We have four openvpn services on one node (udp/ipv6, udp/ipv4 and the same with pushing no default route but split routes). On 2024-01-26 01:31 I installed 2.6.7 again and started the services. Soon after, segfaults must have happened (but I see nothing in the logs). When I looked at the services about 7 hours later, the timestamps showed that the services had restarted and one service already had 7 tasks (the normal state is 2). I deactivated the node in the load balancer so no new sessions could be established. Recent output excerpt from systemctl status:
Whole output for openvpn@tun4u.service:
I think this is not what should happen. When the limit of Tasks is reached the service cannot start again. This is what happened to us after we upgraded to 2.6.7. |
Are you using
The fact that you have "Tasks: 2" in steady state is unusual, but is normal when using
So I guess there is a plugin bug involved, not noticing if OpenVPN dies - and thus not exiting. So, not a zombie in the Unix sense ("a process that has already exited, and no parent calling
So we should see if this plugin bug can be fixed (and of course see that OpenVPN won't SIGSEGV again...) - but this said, it does make sense for systemd to kill all child processes as well, in this case.
Depending on the source of the Debian unit file, it won't be on us (upstream) to fix it... I'll ping the Debian maintainer for his opinion. |
Yes, we are using |
Debian maintainer here. You are using openvpn@.service, which is a unit shipped only by Debian, but the upstream-provided openvpn-server@.service has the same issue. I agree that we should probably just change the KillMode. However, I'm not sure why the processes are stuck here at all. I have only seen that with DCO when the kernel module hung, and in that case changing the KillMode will probably not help you (the processes are unkillable). Can you kill the processes manually by PID? Does it help to locally override |
I'm fairly sure that this is a bug / misfeature in
I do wonder if there is a possible drawback to changing the |
Yes, it works:
Just implemented this on another node. We will see if the number of tasks remains 2. |
Seems to help. The number of tasks is still 2 for all services although the services have obviously been restarted i.e. segfaults have occurred. |
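The thread does not show the exact override that was applied. Assuming it set KillMode to 'control-group' as raised in the original question, the drop-in content would be a short [Service] section. The Python sketch below only renders that text. The function name `render_killmode_dropin` is hypothetical, and the drop-in path mentioned in the comment follows the usual systemd convention and is an assumption about the local setup.

```python
# KillMode= values documented in systemd.kill(5).
ALLOWED_KILL_MODES = {"control-group", "mixed", "process", "none"}


def render_killmode_dropin(kill_mode="control-group"):
    """Return the text of a drop-in that overrides KillMode= for a service."""
    if kill_mode not in ALLOWED_KILL_MODES:
        raise ValueError(f"unsupported KillMode: {kill_mode!r}")
    return f"[Service]\nKillMode={kill_mode}\n"


if __name__ == "__main__":
    # Assumed location for the Debian template unit discussed in this thread:
    #   /etc/systemd/system/openvpn@.service.d/override.conf
    # After writing the file, `systemctl daemon-reload` makes systemd pick it up.
    print(render_killmode_dropin(), end="")
```

With 'control-group', systemd signals every remaining process in the unit's control group when the service stops, so leftover helper processes no longer count against the task limit on the next start.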
We use the openvpn packages for Debian bookworm from https://build.openvpn.net/debian/openvpn/. As we ran into #449 with 2.6.7 (thanks @patcable for reporting) we noticed that in the systemd unit file for openvpn KillMode is set to 'process' and not 'control-group'. Therefore after every segfault there were zombies left and when TasksMax was reached (which is set to 10) the openvpn service could not start again. That's why the segfault behaviour led to a complete openvpn service outage for us.
My question is: what is the reason that KillMode is set to 'process' here? The systemd manual page says: "Note that it is not recommended to set KillMode= to process or even none, as this allows processes to escape the service manager's lifecycle and resource management, and to remain running even while their service is considered stopped and is assumed to not consume any resources."
Thanks in advance for your explanation.