A common Linux/UNIX idiom when it comes to running daemons is to spawn the daemon and create a PID file which just contains the process ID of the daemon. This way, to stop or restart the daemon you can simply have scripts that run kill $(cat mydaemon.pid).
Now, there's a lot of opportunity here for inconsistent state. Suppose the machine running the daemon is forcefully shut off, then restarted. Now you have a PID file which refers to a non-existent process.
Okay, so no problem... the kill will just be aimed at a non-existent process, fail because there is no such process, and everything continues as usual.
But... what if it is a real process, just not your daemon? What if it's someone else's process, or some other important process? You have no way of knowing, so killing it is potentially dangerous. One possibility would be to check the name of the process. Of course, this isn't foolproof either, because there's no reason another process can't have the same name. This is especially true if your daemon runs under an interpreter like Python: the process name will never be anything unique, it will simply be "python", and you might inadvertently kill someone else's process.
So how can we handle a situation like this where we need to restart a daemon? How can we know that the PID in the PID file actually belongs to our daemon?
You just keep adding on layers of paranoia:
pid file
process name matching
some communication channel/canary
The most important thing you can do to ensure that the PID isn't stale following a reboot is to store it in /var/run, a location that is cleared on every reboot.
For process name matching, you can redefine the name of the process at the fork/exec point, which allows you to use a unique name.
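As an illustration only, here is a rough sketch of what that could look like in Python using the third-party setproctitle package; the name "my-unique-daemon" is a made-up example:

# Sketch: give the daemon a unique, matchable name and check it later.
# Requires the third-party "setproctitle" package.
import setproctitle

DAEMON_NAME = "my-unique-daemon"      # made-up example name

def rename_process():
    # After this call, ps/pgrep see DAEMON_NAME instead of the bare "python".
    setproctitle.setproctitle(DAEMON_NAME)

def pid_has_expected_name(pid):
    # Compare the stored PID's command line against the name we set above.
    try:
        with open("/proc/%d/cmdline" % pid, "rb") as f:
            cmdline = f.read().split(b"\0")[0].decode(errors="replace")
    except OSError:
        return False                  # no such process, or no permission to read it
    return DAEMON_NAME in cmdline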
The communication channel/canary is a little more complex and is prone to some gotchas. If the daemon creates a named socket, then the presence of the socket plus the ability to connect and communicate with the daemon can be taken as evidence that the process really is your daemon.
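Putting the layers together, a minimal sketch of what a paranoid stop script might check before sending any signal; the paths /var/run/mydaemon.pid and /var/run/mydaemon.sock are hypothetical:

# Sketch of a "paranoid" stop script: only kill the PID if every layer agrees.
import os, signal, socket

PIDFILE = "/var/run/mydaemon.pid"    # hypothetical path
SOCKET = "/var/run/mydaemon.sock"    # hypothetical canary socket

def daemon_pid():
    with open(PIDFILE) as f:
        pid = int(f.read().strip())
    # Layer 1: is there any process with this PID at all?
    try:
        os.kill(pid, 0)              # signal 0 = existence check only, kills nothing
    except ProcessLookupError:
        return None
    # Layer 2: can we actually talk to the daemon over its socket?
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(SOCKET)            # connecting is our evidence this really is the daemon
    except OSError:
        return None
    finally:
        s.close()
    return pid

pid = daemon_pid()
if pid is not None:
    os.kill(pid, signal.SIGTERM)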
If you really want to provide a script for your users, you could let the daemon process manage its pidfile on its own and add an atexit handler and a SIGABRT handler to unlink the pidfile even on unclean shutdown.
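A minimal sketch of that daemon-side bookkeeping (the pidfile path is again hypothetical); keep in mind that SIGKILL and a power loss can never be caught, which is why the volatile /var/run location still matters:

# Sketch: the daemon manages its own pidfile and removes it on (most) shutdowns.
import atexit, os, signal, sys

PIDFILE = "/var/run/mydaemon.pid"    # hypothetical path; cleared on reboot

def remove_pidfile():
    try:
        os.unlink(PIDFILE)
    except OSError:
        pass                          # already gone, nothing to do

def on_signal(signum, frame):
    # atexit handlers do not run when a signal kills the process outright,
    # so translate the signal into a normal exit.
    sys.exit(1)

def write_pidfile():
    with open(PIDFILE, "w") as f:
        f.write(str(os.getpid()))
    atexit.register(remove_pidfile)
    for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGABRT):
        signal.signal(sig, on_signal)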
Another layer is to also store the process start time in the pidfile. Together with volatile storage (e.g. /var/run) this is a pretty reliable way to identify a process, although it makes the kill command a bit more complicated.
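A sketch of that start-time check, assuming the same hypothetical pidfile path; on Linux the 22nd field of /proc/<pid>/stat is the process start time in clock ticks since boot, and it stays constant for the lifetime of the process:

# Sketch: record "pid starttime" in the pidfile and refuse to treat a PID
# as ours if its current start time no longer matches (i.e. the PID was recycled).
import os

PIDFILE = "/var/run/mydaemon.pid"     # hypothetical path

def proc_starttime(pid):
    with open("/proc/%d/stat" % pid) as f:
        stat = f.read()
    # The comm field sits in parentheses and may contain spaces, so split
    # after the last ')' before indexing the numeric fields.
    fields = stat.rsplit(")", 1)[1].split()
    return int(fields[19])            # fields[0] is field 3, so index 19 = field 22

def write_pidfile():
    pid = os.getpid()
    with open(PIDFILE, "w") as f:
        f.write("%d %d" % (pid, proc_starttime(pid)))

def read_pidfile():
    with open(PIDFILE) as f:
        pid, starttime = (int(x) for x in f.read().split())
    try:
        if proc_starttime(pid) != starttime:
            return None               # the PID has been recycled by another process
    except OSError:
        return None                   # no such process any more
    return pid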
However, I personally think that a daemon developer should not care (too much) about this and should let it be handled by the target platform's way of managing daemons (systemd, upstart, good ol' SysV init scripts). These usually have more knowledge: systemd, for example, will happily accept a daemon which does not fork at all, allowing it to monitor the daemon's status directly and without the need for a PID file. You could then provide configuration files for the most common solutions (currently probably systemd, given that Debian is migrating to it too and it will thus also hit Ubuntu soon), which are usually easier to write than full-fledged daemon process management.
What happens in Linux when the user types "halt"? Is there any script that forces processes to end as soon as possible? If yes, is there any way to schedule the order in which processes are shut down?
The purpose of this scheduling would be to force the GPIO management process to be the last one to exit.
Thanks in advance
Depending on the init system, it can force-kill services. The general way to schedule the stop order is the sorting of the "K*" links in /etc/rc.d/.
To understand this more deeply, first look at these lines in /etc/inittab:
::shutdown:/etc/rc.d/rcS K shutdown
::shutdown:/bin/umount -a -r
In this example, shutdown is scheduled by the /etc/rc.d/rcS script.
After the user types halt, init executes the inittab rules, then kills itself along with its still-living children. Then the kernel stops the CPU.
As I understand the question, one possible solution is to:
remove the stop link for the gpio-management service
add a script to inittab that ensures the gpio-sensitive services are stopped/killed
add a script to stop the gpio-management service at the end
In the Supervisord conf files you can specify to autorestart a certain program with:
autorestart=true
But is there an equivalent for [Supervisord] itself?
What is the recommended method of making sure Supervisord continues running unconditionally, especially if the Supervisord process gets killed?
Thanks!
Actually, your question is a particular application of the famous "Quis custodiet ipsos custodes?", that is, "Who will guard the guards?".
In a modern Linux system the central guarding point is the init process (process number 1). If init dies, the Linux kernel immediately panics, and then you have to go to your data center (I mean go on foot) and press the reset button. There are a lot of alternative init implementations; here is one of those "comparison tables" :)
The precise answer for how to configure a particular init implementation depends on which init version you use on that system. For example, systemd has its own machinery for configuring service restarts upon their death (the directives Restart=, RestartSec=, WatchdogSec=, etc. in the corresponding unit file). Other init implementations like Ubuntu Upstart have their analogues (the respawn directive in a service configuration file). Even good old SysV init has a respawn option for a service line in /etc/inittab, though usually user-level services aren't started directly from inittab, only virtual console managers (getty, mgetty, etc.).
I want to keep a process alive across a reboot. I am thinking that I can back up all process-related information and store it in a file. After the reboot I will read that data back and use it to create the process again. Is my thinking correct? Please clarify.
Your question is too vague. Can you provide more information? I can give a basic outline of what is required to run a process on system startup.
Daemonize
The process must be a daemon (it shouldn't run in the context of a terminal). Read up on daemon processes for more information.
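For reference, a bare-bones sketch of the classic double-fork daemonization in Python (error handling and pidfile writing omitted):

# Sketch: classic UNIX double-fork so the process detaches from its terminal.
import os, sys

def daemonize():
    if os.fork() > 0:          # first fork: the parent returns to the shell
        sys.exit(0)
    os.setsid()                # become session leader, drop the controlling terminal
    if os.fork() > 0:          # second fork: ensure we can never re-acquire a terminal
        sys.exit(0)
    os.chdir("/")              # do not keep any directory busy
    os.umask(0)
    # Redirect the standard streams to /dev/null.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)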
chkconfig
Use this to add your daemon to your system's startup runlevels: sudo chkconfig daemon_name on
There is a Linux-oriented Python program that launches a Puppet subprocess. Puppet is configuration management software, and while executing it launches many subprocesses of its own (yum, curl, custom scripts, etc.). The Python code has a watchdog that kills the Puppet subprocess if it runs for too long. The current version uses os.kill to do that.
The problem is that when the Puppet process is killed on timeout, its orphaned children are reparented to init and keep running. Typically these children are the initial cause of the timeout.
The first attempt was killing the entire process group (os.killpg). But the kill call fails with OSError(3, 'No such process'). After studying the process management docs I understood that it does not work because Puppet itself launches its Ruby process in a separate process group. Also, the process group is not inherited by children, so os.killpg would not help anyway. Yes, POSIX allows setting the PGID of children, with some limitations, but it requires iterative monitoring of new subprocesses and looks like a hack in my case.
The next attempt was running Puppet in a separate shell ("sh -c") / "su" environment / setsid (in various combinations). The desired result is to start this process (and its children) in a new session. I expect that this will let me emulate something like a remote SSH disconnect: sending SIGHUP to the session leader, e.g. the puppet process, will send SIGHUP to the entire tree of children, so I will be able to kill the whole tree. Experiments with running Puppet through a remote SSH connection show that this approach seems to work: all processes die after the terminal disconnects. I have not achieved this behaviour from Python yet. Is this approach correct, am I missing something, or is it an ugly hack?
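For what it's worth, here is roughly how that new-session idea could be expressed with the Python 3 subprocess module (the command line and timeout values are examples only); note that any descendant which itself calls setsid/setpgid will still escape the group kill:

# Sketch: start the child in its own session/process group, then kill the
# whole group on timeout instead of just the immediate child.
import os, signal, subprocess

proc = subprocess.Popen(
    ["puppet", "apply", "manifest.pp"],   # example command line
    start_new_session=True,               # child becomes leader of a new session and group
)
try:
    proc.wait(timeout=600)                # hypothetical watchdog timeout in seconds
except subprocess.TimeoutExpired:
    # killpg targets the whole process group that the child leads.
    os.killpg(proc.pid, signal.SIGTERM)
    try:
        proc.wait(timeout=10)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)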
One more approach I see is to send SIGSTOP to every process in the tree (iterating until there is no longer any running process in the tree, to avoid race conditions), and then kill every process individually. This approach would work, but it doesn't look very elegant.
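A simplified sketch of that stop-then-kill walk using the third-party psutil package; a fully race-free version would re-scan the tree until no new children appear:

# Sketch: freeze the whole tree first, then kill every member individually.
import psutil                      # third-party: pip install psutil

def kill_tree(pid):
    try:
        root = psutil.Process(pid)
    except psutil.NoSuchProcess:
        return
    procs = [root] + root.children(recursive=True)
    for p in procs:                # freeze first so nothing new is spawned
        try:
            p.suspend()            # sends SIGSTOP on POSIX
        except psutil.NoSuchProcess:
            pass
    for p in procs:
        try:
            p.kill()               # sends SIGKILL
        except psutil.NoSuchProcess:
            pass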
The problem is not specific to the Python code; it also reproduces when running "puppet apply" from a console and sending signals with the kill command.
Yes, I know that Puppet has a "timeout" keyword for the described purpose, but I am looking for a more general solution, applicable not only to Puppet but to any subprocess that spawns children of its own.
My application is built from three distinct servers: each of them serves a different purpose and they must stay separate (at least in order to use more than one core). As an example (this is not the real thing) you could think of this setup as one server managing user authentication, another serving as the game engine, and another as a pub/sub server. Logically the "application" is only one, and clients connect to one server or another depending on their specific need.
Now I'm trying to figure out the best way to run a setup like this in a production environment.
The simplest way could be to have a bash script that runs each server in the background, one after the other. One problem with this approach is that if I need to restart the "application", I would have to have saved each server's PID and kill each one.
Another way would be to use a node process that runs each server as its own child (using child_process.spawn). Node spawning nodes. Is that stupid for some reason? This way I'd have a single process to kill when I need to stop or restart the whole application.
What do you think?
If you're on Linux or another *nix OS, you might try writing an init script that starts/stops/restarts your application. Here's an example.
Use specific tools for process monitoring. Monit, for example, can monitor your processes by their PID and restart them whenever they die, and you can manually restart each process with the monit command or with its web GUI.
So in your example you would create 3 independent processes and tell monit to monitor each of them.
I ended up creating a wrapper/supervisor script in Node that uses child_process.spawn to execute all three processes.
it pipes each process's stdout/stderr to its own stdout/stderr
it intercepts errors of each process, logs them, then exits (as if it were its own fault)
it then forks and daemonizes itself
I can stop the whole thing using the start/stop paradigm.
Now that I have a robust daemon, I can create a unix script to start/stop it on boot/shutdown as usual (as #Levi says)
See also my other (related) Q: NodeJS: will this code run multi-core or not?