How does KillSignal interact with TimeoutStopSec in systemd?

Can someone let me know the following about the systemd service shutdown sequence?
If I have specified KillSignal=SIGTERM, how does this interact
with TimeoutStopSec? Does this mean that during shutdown of the
service, SIGTERM will be sent first, and if the service is still
running after TimeoutStopSec, SIGKILL will be sent (if SendSIGKILL is
set to yes)? I am asking about the case where nothing is specified in
ExecStop.
Does TimeoutStopSec take into account ExecStop and all ExecStopPost commands?

This has been answered on the systemd mailing list. Posting the answer below.
Can someone let me know the following about the systemd service shutdown
sequence?
1.
If I have specified KillSignal=SIGTERM, how does this interact with
TimeoutStopSec? Does this mean that during shutdown of the service,
SIGTERM will be sent first, and if the service is still running after
TimeoutStopSec, SIGKILL will be sent (if SendSIGKILL is set to yes)? I am
asking about the case where nothing is specified in ExecStop.
Yes, that's correct.
2.
Does TimeoutStopSec take into account ExecStop and all ExecStopPost commands?
TimeoutStopSec applies to every command. If an ExecStopPost command fails (or
times out), subsequent commands are not executed, but if each command
takes almost TimeoutStopSec, the total execution time can be
close to the number of ExecStopPost commands multiplied by TimeoutStopSec.
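As a concrete illustration, here is a minimal unit-file sketch of the behavior described above; the binary path is hypothetical, and the KillSignal and SendSIGKILL values shown are also the systemd defaults:

[Service]
ExecStart=/usr/local/bin/mydaemon
# No ExecStop= configured, so "systemctl stop" sends KillSignal immediately.
KillSignal=SIGTERM
TimeoutStopSec=30
# If the service is still running 30 seconds after SIGTERM, send SIGKILL.
SendSIGKILL=yes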

From the systemd.service(5) man page:
This option serves two purposes. First, it configures the time to wait
for each ExecStop= command. If any of them times out, subsequent
ExecStop= commands are skipped and the service will be terminated by
SIGTERM. If no ExecStop= commands are specified, the service gets the
SIGTERM immediately. This default behavior can be changed by the
TimeoutStopFailureMode= option. Second, it configures the time to wait
for the service itself to stop. If it doesn't terminate in the
specified time, it will be forcibly terminated by SIGKILL (see
KillMode= in systemd.kill(5)). Takes a unit-less value in seconds, or
a time span value such as "5min 20s". Pass "infinity" to disable the
timeout logic. Defaults to DefaultTimeoutStopSec= from the manager
configuration file (see systemd-system.conf(5)).
If a service of Type=notify sends "EXTEND_TIMEOUT_USEC=…", this may
cause the stop time to be extended beyond TimeoutStopSec=. The first
receipt of this message must occur before TimeoutStopSec= is exceeded,
and once the stop time has extended beyond TimeoutStopSec=, the
service manager will allow the service to continue to stop, provided
the service repeats "EXTEND_TIMEOUT_USEC=…" within the interval
specified, or terminates itself (see sd_notify(3)).
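For example, a Type=notify service that needs more time to wind down can extend the stop timeout from its shutdown path; a sketch using the systemd-notify helper (assuming NotifyAccess= permits the sending process):

# Ask the manager for 60 more seconds; this must be sent before the
# current stop timeout expires, and repeated within each extended window.
systemd-notify EXTEND_TIMEOUT_USEC=60000000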

Related

What is the signal sent to the process running in the container when k8s liveness probe fails? KILL or TERM

I have a use case to gracefully terminate the container, where I have a script that kills the process gracefully from within the container using the command "kill PID" (which sends the TERM signal).
But I have a liveness probe configured as well.
Currently the liveness probe is configured to probe at a 60-second interval. So if the liveness probe runs shortly after the graceful termination signal is sent, the overall health of the container might become CRITICAL while the termination is still in progress.
In this case the liveness probe will fail and the container will be terminated immediately.
So I wanted to know whether kubelet kills the container with TERM or KILL.
Appreciate your support.
Thanks in advance.
In Kubernetes, the liveness probe checks the health state of a container.
To answer your question on whether it uses SIGKILL or SIGTERM: both are used, in order. Here is what happens under the hood.
1. Liveness probe check fails.
2. Kubernetes stops routing traffic to the container.
3. Kubernetes restarts the container.
4. Kubernetes starts routing traffic to the container again.
For the container restart, SIGTERM is sent first; Kubernetes waits for a configurable grace period, and then sends SIGKILL.
A hack around your issue is to use the attribute:
timeoutSeconds
This specifies how long a request can take to respond before it’s considered a failure. You can add and adjust this parameter if the time taken for your application to come online is predictable.
Also, you can use a readinessProbe before the livenessProbe, with an adequate delay for the container to come into service after restarting the process. Check https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/ for more details on which parameters to use.
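A minimal probe sketch tying these suggestions together; the endpoint path, port, and timing values are illustrative assumptions, not values from the question:

livenessProbe:
  httpGet:
    path: /healthz      # hypothetical health endpoint
    port: 8080
  periodSeconds: 60     # probe interval from the question
  timeoutSeconds: 5     # fail the probe if no response within 5s
  failureThreshold: 3   # tolerate transient failures during shutdown
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30   # give the restarted process time to come up

Raising failureThreshold (or the pod's terminationGracePeriodSeconds) gives an in-progress graceful shutdown time to finish before the probe declares the container dead.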

Systemd http health check

I have a service on Red Hat 7.1 which I control with systemctl start, stop, restart, and status. One time systemctl status returned active, but the application "behind" the service responded with an HTTP code other than 200.
I know that I can use Monit or Nagios to check this and do the systemctl restart - but I would like to know if something exists by default in systemd, so that I do not need to have other tools installed.
My preferred solution would be to have my service restarted if the HTTP return code is different from 200, fully automatically, without any tools other than systemd itself (and maybe with a possibility to notify a HipChat room or send an email...).
I've tried googling the topic - without luck. Please help :-)
The Short Answer
systemd has a native (socket-based) healthcheck method, but it's not HTTP-based. You can write a shim that polls status over HTTP and forwards it to the native mechanism, however.
The Long Answer
The Right Thing in the systemd world is to use the sd_notify socket mechanism to inform the init system when your application is fully available. Use Type=notify for your service to enable this functionality.
You can write to this socket directly using the sd_notify() call, or you can inspect the NOTIFY_SOCKET environment variable to get the name and have your own code write READY=1 to that socket when the application is returning 200s.
If you want to hand this off to a separate process that polls your process over HTTP and then writes to the socket, you can do that -- just ensure that NotifyAccess is set appropriately (by default, only the main process of the service is allowed to write to the socket).
Inasmuch as you're interested in detecting cases where the application fails after it was fully initialized, and triggering a restart, the sd_notify socket is appropriate in this scenario as well:
Send WATCHDOG_USEC=... to set the amount of time which is permissible between successful tests, then WATCHDOG=1 whenever you have a successful self-test; whenever no successful test is seen for the configured period, your service will be restarted.
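A minimal sketch of such a shim, assuming it is launched as part of a service configured with Type=notify, NotifyAccess=all, WatchdogSec=30, and Restart=on-watchdog, and that the application exposes a health URL (the URL below is hypothetical):

#!/bin/sh
# Poll the application over HTTP and pet the systemd watchdog on success.
systemd-notify --ready
while sleep 10; do
    # curl -f exits non-zero on HTTP codes >= 400
    if curl -fsS -o /dev/null http://127.0.0.1:8080/health; then
        systemd-notify WATCHDOG=1
    fi
done

If the application stops returning 200s, WATCHDOG=1 stops being sent, the WatchdogSec interval elapses, and systemd kills and restarts the service.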

systemd start service after another one stopped issue

I have 2 services that I need to start.
First service has download jobs required for second service.
First service
[Unit]
Description=First
After=network.target
Second service
[Unit]
Description=Second
After=First
The problem is they both start at the same time; I need the second service to wait until the first one is dead.
I don't want to use sleep, because download jobs can be big.
Thank you.
In your first service, add
ExecStopPost=/bin/systemctl start Second
What this does: when the service terminates, the option above is activated and thus the second service is started.
This particular option (ExecStopPost=) allows you to specify commands that are executed after the service is stopped. This includes cases where the commands configured in ExecStop= were used, where the service does not have any ExecStop= defined, or where the service exited unexpectedly.
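Put together, a sketch of the first unit (the ExecStart path is hypothetical; the second unit then needs no After= ordering at all):

[Unit]
Description=First
After=network.target

[Service]
ExecStart=/usr/local/bin/download-jobs
# When this service stops (for any reason), kick off the second one.
ExecStopPost=/bin/systemctl start Second.service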

How Does Azure Interrupt a Worker Role For Deployment?

I'm moving some background processing from an Azure web role to a worker role. My worker needs to do a task every minute or so, possibly spawning off tasks:
while (true)
{
    // start some tasks
    Thread.Sleep(60000);
}
Once I deploy, it will start running forever. So later, when I redeploy, how does Azure stop my process for redeployment?
Does it just kill it instantly? Is there a way to get a warning that it's shutting down? Do I just have to make sure everything is transactional?
When a role (either worker or web) is asked to gracefully shut down (because it is being scaled down or because you've asked for a redeployment) the OnStop method of the RoleEntryPoint class is called. This is the same class which has the Run method which likely either contains your loop or calls the code that contains that loop.
A couple of things to note here: The OnStop has 5 minutes to actually stop, after that the process is simply killed. If you have to call something else to shut down asynchronously, you'll need the thread in OnStop to be kept busy waiting until that other process is shut down. Once execution has left OnStop the platform assumes the machine can be shut down.
If you need to gracefully stop processing but do not require a shutdown of the machine, you can put a setting in the service config file that you can update to indicate whether work should be done or not - for example, a bool named "ProcessQueues". Then in your OnStart in RoleEntryPoint you hook the RoleEnvironmentChanging event. Your event handler then looks for a RoleEnvironmentConfigurationSettingChange to occur and checks the ProcessQueues bool. If it is true, it either starts up or continues processing; if it is false, it stops the processing gracefully. You can then do a config change to control when things are running or not. This is one option for handling this, and there are many more depending on how quickly you need to stop processing, etc.
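A minimal sketch of the graceful-shutdown half of this, assuming the worker loop honors a cancellation token (the class body is illustrative, not from the question):

using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WorkerRole : RoleEntryPoint
{
    private readonly CancellationTokenSource cts = new CancellationTokenSource();
    private readonly ManualResetEvent runComplete = new ManualResetEvent(false);

    public override void Run()
    {
        try
        {
            while (!cts.Token.IsCancellationRequested)
            {
                // start some tasks
                // interruptible replacement for Thread.Sleep(60000)
                cts.Token.WaitHandle.WaitOne(60000);
            }
        }
        finally
        {
            runComplete.Set();
        }
    }

    public override void OnStop()
    {
        cts.Cancel();          // ask Run() to wind down
        runComplete.WaitOne(); // keep OnStop busy until Run() has exited
        base.OnStop();         // must return within the 5-minute budget
    }
}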

Gearman-manager: Speed decreases when putty closed

SOLUTION:
The solution that I found: using the low-level nohup program, which ignores the hangup signal sent by putty when closing the connection.
So, instead of ./gearman-manager start I did nohup ./gearman-manager start
NOTE: Still, I would like to know why it was slowing down when closing putty, OR why it continues running in the first place if it has received the hangup signal???
I have a problem with execution of a gearman worker after I close a putty session.
This is what I have:
gearman client that is started with a cron job checking something in DB (infinite loop).
gearman manager started with gearman-manager start command receiving client's tasks and managing the calls to a worker
gearman worker reading/writing from DB and echoing the status of a current job
When I start gearman-manager I can see the echoes from my worker when it receives tasks and when it executes them. Tasks (updates in the DB) are executed at roughly 1/second...
A) When I close the putty session, the speed of changes in the DB decreases enormously (roughly 1/10sec)?! Could you tell me why this is?
B) When I log back in with putty I don't get the outputs of gearman-manager back on the screen? I expected I'd log back in and see that it continues to echo the status like it did before closing putty? Maybe this is because gearman-manager is started as root while the echoes are coming from .php run as user gearman? Or maybe when I log back in the process is in the background?!
You don't see the output when you create a new tty because the process was bound to the previous tty. Unless you use something like screen to keep the tty alive, you aren't going to see that output with a new terminal.
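A minimal sketch of both workarounds; the log path is a hypothetical choice:

# Option 1: detach from the tty and capture the worker's output in a file
nohup ./gearman-manager start > /var/log/gearman-manager.log 2>&1 &
tail -f /var/log/gearman-manager.log

# Option 2: keep a persistent tty with screen
screen -S gearman
./gearman-manager start
# detach with Ctrl-A d; after logging back in, reattach with:
screen -r gearman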
