systemd start service after another one stopped issue - linux

I have 2 services that I need to start.
The first service runs download jobs that are required by the second service.
First service
[Unit]
Description=First
After=network.target
Second service
[Unit]
Description=Second
After=First
The problem is they both start at the same time; I need the second service to wait until the first one is dead.
I don't want to use sleep because the download jobs can be big.
Thank you.

In your first service add
ExecStopPost=/bin/systemctl start Second
What this does: when the first service terminates, the option above is activated, and thus the second service is started.
This particular option (ExecStopPost=) allows you to execute commands that run after the service is stopped. This includes cases where the commands configured in ExecStop= were used, where the service does not have any ExecStop= defined, or where the service exited unexpectedly.
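A complete sketch of the two units might look like this (paths, names, and Type= are assumptions; the point is that ExecStopPost= fires once the first service's job finishes):

```ini
# /etc/systemd/system/first.service (hypothetical paths)
[Unit]
Description=First
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/download-jobs.sh
# Runs after this service stops, for any reason
ExecStopPost=/bin/systemctl start second.service

# /etc/systemd/system/second.service
[Unit]
Description=Second

[Service]
ExecStart=/usr/local/bin/use-downloads.sh
```

With Type=oneshot, the first unit stops as soon as the download script exits, which is exactly when ExecStopPost= kicks off the second unit.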

Related

How does Application Gateway prevent requests being sent to recently terminated pods?

I'm currently researching and experimenting with Kubernetes in Azure. I'm playing with AKS and the Application Gateway ingress. As I understand it, when a pod is added to a service, the endpoints are updated and the ingress controller continuously polls this information. As new endpoints are added AG is updated. As they're removed AG is also updated.
As pods are added there will be a small delay whilst that pod is added to the AG before it receives requests. However, when pods are removed, does that delay in update result in requests being forwarded to a pod that no longer exists?
If not, how does AG/K8S guarantee this? What behaviour could the end client potentially experience in this scenario?
Azure Application Gateway ingress is an ingress controller for your Kubernetes deployment which allows you to use a native Azure Application Gateway to expose your application to the internet. Its purpose is to route traffic directly to pods. Everything else about pod availability, scheduling, and management in general is handled by Kubernetes itself.
When a pod receives a command to terminate, it doesn't happen instantly. Right after that, kube-proxies update iptables to stop directing traffic to the pod. There may also be ingress controllers or load balancers forwarding connections directly to the pod (which is the case with an Application Gateway). It's impossible to solve this issue completely, but adding a 5-10 second delay can significantly improve the user experience.
If you need to terminate or scale down your application, you should consider the following steps:
Wait for a few seconds and then stop accepting connections
Close all keep-alive connections not in the middle of request
Wait for all active requests to finish
Shut down the application completely
Here are the exact Kubernetes mechanics that will help you resolve your questions:
preStop hook - this hook is called immediately before a container is terminated. It is very helpful for graceful shutdowns of an application. For example, a simple sh command with "sleep 5" in a preStop hook can prevent users from seeing "Connection refused" errors. After the pod receives an API request to be terminated, it takes some time to update iptables and let the Application Gateway know that this pod is out of service. Since the preStop hook is executed prior to the SIGTERM signal, it helps to resolve this issue.
(example can be found in attach lifecycle event)
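A minimal sketch of such a hook (the container name, image, and sleep duration are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: myapp:latest   # hypothetical image
    lifecycle:
      preStop:
        exec:
          # Keep serving for a few seconds while iptables and the
          # Application Gateway catch up with the pod's removal
          command: ["sh", "-c", "sleep 5"]
```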
readiness probe - this type of probe always runs on the container and defines whether the pod is ready to accept and serve requests. When a container's readiness probe returns success, it means the container can handle requests, and it is added to the endpoints. If a readiness probe fails, the pod is not capable of handling requests and is removed from the endpoints object. This works very well for newly created pods when an application takes some time to load, as well as for already-running pods when an application takes some time for processing.
Before the pod is removed from the endpoints, the readiness probe has to fail several times. It's possible to lower this number to a single failure using the failureThreshold field, but it still needs to detect at least one failed check.
(additional information on how to set it up can be found in configure liveness readiness startup probes)
startup probe - for some applications that require additional time for their first initialisation, it can be tricky to set readiness probe parameters correctly without compromising fast responses from the application.
Using the failureThreshold * periodSeconds fields provides this flexibility.
terminationGracePeriodSeconds - may also be worth considering if an application requires more than the default 30 seconds to shut down gracefully (e.g. this is important for stateful applications)
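The probes and grace period above might be sketched like this (endpoint paths, ports, and timings are assumptions, not recommendations):

```yaml
spec:
  terminationGracePeriodSeconds: 60   # more than the 30s default
  containers:
  - name: app
    image: myapp:latest   # hypothetical image
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30   # allows up to 30 * 10s = 300s for first start
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 1    # drop from endpoints after a single failed check
```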

How does KillSignal interact with TimeoutStopSec in systemd?

Can someone let me know the following about systemd service shutdown sequence
If I have specified KillSignal=SIGTERM then how does this interact
with TimeoutStopSec? Does this mean that during shutdown of
service, first SIGTERM will be sent and if the service is still
running after TimeoutStopSec SIGKILL will be sent (if SendSIGKILL is
set to yes)? I am asking about the case where nothing is specified in
ExecStop.
Does TimeoutStopSec take into account ExecStop and all ExecPostStop?
This has been answered in systemd email thread. Posting the answer below
Can someone let me know the following about systemd service shutdown
sequence
1.
If I have specified KillSignal=SIGTERM then how does this interact with
TimeoutStopSec? Does this mean that during shutdown of service, first
SIGTERM will be sent and if the service is still running after
TimeoutStopSec SIGKILL will be sent (if SendSIGKILL is set to yes)? I am
asking about the case where nothing is specified in ExecStop.
Yes, that's correct
2.
Does TimeoutStopSec take into account ExecStop and all ExecPostStop?
TimeoutStopSec applies to every command. If an ExecStopPost command fails (or
times out), subsequent commands are not executed, but if each command
takes almost TimeoutStopSec, the total execution time can be close to
the number of ExecStopPost commands multiplied by TimeoutStopSec.
From the systemd man page:
This option serves two purposes. First, it configures the time to wait
for each ExecStop= command. If any of them times out, subsequent
ExecStop= commands are skipped and the service will be terminated by
SIGTERM. If no ExecStop= commands are specified, the service gets the
SIGTERM immediately. This default behavior can be changed by the
TimeoutStopFailureMode= option. Second, it configures the time to wait
for the service itself to stop. If it doesn't terminate in the
specified time, it will be forcibly terminated by SIGKILL (see
KillMode= in systemd.kill(5)). Takes a unit-less value in seconds, or
a time span value such as "5min 20s". Pass "infinity" to disable the
timeout logic. Defaults to DefaultTimeoutStopSec= from the manager
configuration file (see systemd-system.conf(5)).
If a service of Type=notify sends "EXTEND_TIMEOUT_USEC=…", this may
cause the stop time to be extended beyond TimeoutStopSec=. The first
receipt of this message must occur before TimeoutStopSec= is exceeded,
and once the stop time has extended beyond TimeoutStopSec=, the
service manager will allow the service to continue to stop, provided
the service repeats "EXTEND_TIMEOUT_USEC=…" within the interval
specified, or terminates itself (see sd_notify(3)).
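Put together, the interaction described above corresponds to a unit fragment like this (the service path is hypothetical; SendSIGKILL= defaults to yes and is shown only for clarity):

```ini
[Service]
ExecStart=/usr/local/bin/mydaemon
# On stop, SIGTERM is sent first...
KillSignal=SIGTERM
# ...and if the service is still running 30 seconds later,
# it is forcibly terminated with SIGKILL
TimeoutStopSec=30
SendSIGKILL=yes
```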

Docker Container in Azure Logic App does not exit properly

I have a curious issue getting a docker container set up to run and exit properly in an Azure logic app.
I have a python script that prints hello world, then sleeps for 30 minutes. The reason for sleeping is to make the script run longer, so that I can test whether the container in the logic app exits at the right moment (when the script is done running), and not when the loop times out.
First, I confirm that the container is running properly and exiting properly in powershell:
PS C:\Users\cgiltner> docker run helloworld
Running 'Hello World' at 2019-11-26 17:53:48
Hello, World!
Sleeping for 30 minutes...
Awake after 30 minutes
PS C:\Users\cgiltner>
I have the container set up in a logic app as follows, there is an “Until” loop that is configured to run until “State” = “succeeded”
But when I run it, the “until” loop continues for 1 hour, which is the default timeout period for an until loop (PT1H)
Looking at the properties of the container, I can see that the state of the container never changed from “Running”
Just to clarify, the container IS running and executing the script/docker container successfully. The problem is that it is not exiting when the script is actually done, rather it is waiting until the timeout period is done. There is not an error message or any failure indicating that it times out, it just simply moves to the next step. This has big implications in a complex logic app where multiple steps need to happen after containers run, it causes things to take hours in the app.
For your issue, what you need to know first is that the first action of your Logic App creates the Azure Container Instance, but when that Logic App action finishes, the creation of the Azure Container Instance is still not complete. It only returns a pending state and will not be updated. In your second action, you expect the succeeded state in the Until action, so the result is that the action waits until the timeout.
The solution is to add a pure delay action after the creation of the Azure Container Instance, and then add an action that gets the properties and logs of the containers in the container group.

Systemd http health check

I have a service on Red Hat 7.1 which I control with systemctl start, stop, restart, and status. One time systemctl status returned active, but the application "behind" the service responded with an HTTP code different from 200.
I know that I can use Monit or Nagios to check this and do the systemctl restart - but I would like to know if there exists something by default in systemd, so that I do not need to have other tools installed.
My preferred solution would be to have my service restarted totally automatically if the HTTP return code is different from 200, without tools other than systemd itself (and maybe with a possibility to notify a HipChat room or send an email...)
I've tried googling the topic - without luck. Please help :-)
The Short Answer
systemd has a native (socket-based) healthcheck method, but it's not HTTP-based. You can write a shim that polls status over HTTP and forwards it to the native mechanism, however.
The Long Answer
The Right Thing in the systemd world is to use the sd_notify socket mechanism to inform the init system when your application is fully available. Use Type=notify for your service to enable this functionality.
You can write to this socket directly using the sd_notify() call, or you can inspect the NOTIFY_SOCKET environment variable to get the name and have your own code write READY=1 to that socket when the application is returning 200s.
If you want to put this off to a separate process that polls your process over HTTP and then writes to the socket, you can do that -- ensure that NotifyAccess is set appropriately (by default, only the main process of the service is allowed to write to the socket).
Inasmuch as you're interested in detecting cases where the application fails after it was fully initialized, and triggering a restart, the sd_notify socket is appropriate in this scenario as well:
Send WATCHDOG_USEC=... to set the amount of time which is permissible between successful tests, then WATCHDOG=1 whenever you have a successful self-test; whenever no successful test is seen for the configured period, your service will be restarted.
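A minimal sketch of such a shim in Python, under stated assumptions: the health URL, poll interval, and function names are made up for illustration, and the unit is assumed to have Type=notify, NotifyAccess=all, and WatchdogSec= set. It reads NOTIFY_SOCKET, polls the application over HTTP, and pets the watchdog on each successful check:

```python
import os
import socket
import time
import urllib.request


def sd_notify(message: str) -> None:
    """Write a message to systemd's notification socket, if one is set."""
    target = os.environ.get("NOTIFY_SOCKET")
    if not target:
        return  # not running under systemd; nothing to do
    if target.startswith("@"):
        # Leading '@' means an abstract-namespace socket
        target = "\0" + target[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), target)


def healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the application answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError/HTTPError/timeouts are all OSError subclasses
        return False


def watchdog_loop(url: str, interval: float = 10.0) -> None:
    """Poll the app forever; intended to run as the notifying process."""
    sd_notify("READY=1")  # tell systemd the service is up (Type=notify)
    while True:
        if healthy(url):
            sd_notify("WATCHDOG=1")  # pet the watchdog
        time.sleep(interval)  # poll at, say, half of WatchdogSec
```

A call like watchdog_loop("http://127.0.0.1:8080/health") would then let systemd restart the service (given Restart=on-watchdog) whenever the endpoint stops returning 200 for longer than WatchdogSec.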

To stop the EC2 instance after the execution of a script

I configured an Ubuntu server (AWS EC2 instance) as a cron server; 9 cron jobs run between 4:15-7:15 and 21:00-23:00. I wrote a cron job on another system (EC2 instance) to stop this cron server after 7:15 and start it again at 21:00. I want the cron server to stop by itself after the execution of the last script. Is it possible to write such a script?
When you start the temporary instance, specify
--instance-initiated-shutdown-behavior terminate
Then, when the instance has completed all its tasks, simply run the equivalent of
sudo halt
or
sudo shutdown -h now
With the above flag, this will tell the instance that shutting down from inside the instance should terminate the instance (instead of just stopping it).
Yes, you can add an EC2 stop command to the end of the last script.
You'll need to:
install the EC2 API tools
put your AWS credentials on the instance, or create IAM credentials that have authority to stop instances
get the instance id, perhaps from the instance metadata
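With the modern AWS CLI (rather than the legacy EC2 API tools), the end of the final script might be sketched like this; it assumes the CLI is installed and the instance has IAM authority for ec2:StopInstances:

```shell
# Look up this instance's id from the instance metadata service,
# then ask EC2 to stop it
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
```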
Another option is to run the cron jobs as commands from the controlling instance. The main cron job might look like this:
run processing instance
wait for sshd to accept connections
ssh to processing instance, running each processing script
stop processing instance
This approach gets all the processing jobs done back to back, leaving your instance up for the least amount of time, and you don't have to put the credentials on the instance.
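The steps above might be sketched as follows with the AWS CLI (the AMI id, instance type, user, and script path are all hypothetical):

```shell
# Hypothetical entry point for the controlling instance's cron job
IID=$(aws ec2 run-instances --image-id ami-12345678 --instance-type t3.micro \
      --query 'Instances[0].InstanceId' --output text)
aws ec2 wait instance-running --instance-ids "$IID"
HOST=$(aws ec2 describe-instances --instance-ids "$IID" \
      --query 'Reservations[0].Instances[0].PublicIpAddress' --output text)
# Wait for sshd to accept connections
until ssh -o ConnectTimeout=5 ubuntu@"$HOST" true; do
  sleep 5
done
# Run each processing script, then stop the processing instance
ssh ubuntu@"$HOST" '/opt/cron/run-all-jobs.sh'
aws ec2 stop-instances --instance-ids "$IID"
```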
If your use case allows for the instance to be terminated instead of stopped, then you might be able to replace the start/stop cron jobs with EC2 autoscaling. It now sports schedules for running instances.
http://docs.amazonwebservices.com/AutoScaling/latest/DeveloperGuide/index.html?scaling_plan.html
