I am developing a Linux daemon that works with some complex hardware, and I need to know the ways an application may exit (normally or abnormally) so that I can write proper cleanup functions. As I read in the docs, an application may die via:
1. Receiving a signal (handled with sigwait, sigaction, etc.)
2. exit
3. kill
4. tkill
Are there any other ways an application may exit or die?
In your comments you wrote that you're concerned about "abnormal ways" the application may die.
There's only one solution for that -- code outside the application. In particular, all handles held by the application at termination (normal or abnormal) are cleanly closed by the kernel.
If you have a driver for your special hardware, do the cleanup when the driver receives notification that the device fd has been closed. If you don't already have a custom driver, you can use a second user-mode process as a watchdog. Just connect the watchdog to the main process via a pipe: the watchdog's read on the pipe reports end-of-file as soon as the main application exits, however it dies.
In addition to things the programmer has some degree of control over, such as wild pointer bugs causing segmentation fault, there's always the oom-killer, which can take out even a bug-free process. For this reason the application should also detect unexpected loss of its watchdog and spawn a new one.
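A minimal sketch of the pipe idea, in Python for brevity. The function names are made up for illustration, and for the demo the watchdog is simply the parent of the "main application"; in a real setup the two could be unrelated processes sharing a pipe or FIFO. The watchdog only learns that the other side is gone, so the actual hardware cleanup still has to be written by you.

```python
import os
import signal


def wait_for_eof(read_fd):
    """Block until every process holding the write end has closed it or died."""
    while os.read(read_fd, 4096):
        pass  # ignore any data; only end-of-file matters here


def demo():
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child plays the main application: it keeps the write end open
        # and then dies abnormally without running any cleanup code.
        os.close(read_fd)
        os.kill(os.getpid(), signal.SIGKILL)
        os._exit(1)  # not reached
    # Parent plays the watchdog. It must close its own copy of the write
    # end, otherwise read() would never report end-of-file.
    os.close(write_fd)
    wait_for_eof(read_fd)
    os.close(read_fd)
    # Hardware cleanup would go here.
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
```

Calling demo() returns the number of the signal that killed the child. The important detail is that the kernel closes the write end for the dead process, so no cooperation from the dying application is needed.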
Your app should finish by itself when the system or the user doesn't need it.
Using an external command like kill -9 PROCESS can cause bugs in your application, because you don't know what the application is doing at that moment.
Try to implement a subsystem on top of your app to control its status, like a real daemon, to allow something like this:
service yourapp status or /etc/init.d/yourapp status
service yourapp start or /etc/init.d/yourapp start
service yourapp stop or /etc/init.d/yourapp stop
That way your app can finish normally every time, and users can control it easily.
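For a stop command to let the app "finish normally", the app has to react to the stop request itself. Here is a rough Python sketch of that, assuming the stop action sends SIGTERM (that is common for init scripts, but check yours). The function names and the marker file are made up for the demo; the marker just stands in for real cleanup.

```python
import os
import signal
import tempfile
import time


def run_daemon(ready_fd, marker_path):
    stop_requested = False

    def on_sigterm(signum, frame):
        nonlocal stop_requested
        stop_requested = True  # only set a flag; do the real work outside

    signal.signal(signal.SIGTERM, on_sigterm)
    os.write(ready_fd, b"1")  # tell the parent the handler is installed

    while not stop_requested:
        time.sleep(0.01)  # stand-in for the daemon's real work

    # Stand-in for releasing the hardware.
    with open(marker_path, "w") as marker:
        marker.write("cleaned up\n")


def demo():
    fd, marker_path = tempfile.mkstemp()
    os.close(fd)
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        exit_code = 1
        try:
            os.close(ready_r)
            run_daemon(ready_w, marker_path)
            exit_code = 0
        finally:
            os._exit(exit_code)
    os.close(ready_w)
    os.read(ready_r, 1)           # wait until the child is ready
    os.close(ready_r)
    os.kill(pid, signal.SIGTERM)  # what a stop command is assumed to send
    _, status = os.waitpid(pid, 0)
    with open(marker_path) as marker:
        contents = marker.read()
    os.unlink(marker_path)
    return os.WEXITSTATUS(status), contents
```

Note that this only covers polite stop requests. SIGKILL (kill -9) cannot be caught, so cleanup that must always happen still needs something outside the process.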
Regards
I have a routine that crashes Linux and forces a reboot using a system function.
Now I have the problem that I need to crash Linux when a certain process dies. Using a script that starts the process and reboots the server when the script ends is not appropriate, since it takes some ms.
Another idea is to spawn the process that triggers the reboot alongside it and have it poll a counter; if the counter is not incremented, it reboots the server.
This would result in an almost instant reaction.
Now the question is what would be a good timeframe. I have no idea how the scheduler of linux would guarantee a certain update of any such counter and what a good timeout would be.
Also, I would like to hear some alternatives to spawning this second process. Is there a way to advise Linux to run a certain routine when the given process crashes, or a listener mechanism for the event of problems with a given process?
The timeout idea is already implemented in the kernel. You can register any application as a software watchdog, but you'll have to lower the default timeout. Have a look at http://linux.die.net/man/8/watchdog for some ideas. That application can also handle user-defined tests. Realistically, unless you're running a real-time kernel like linux-rt, timeouts lower than 100 ms can be dangerous on a system under heavy load, especially if the check needs to poll another application.
In cases of application crashes, you can handle them if your init supports notifications. For example both upstart and systemd can do that by monitoring files (make sure coredumps are created in the right place).
But in any case, I'd suggest rethinking the idea of millisecond-resolution restarts. Do you really need to kill the system in that time, or do you just need to isolate it? Just syncing the disks will take a few extra milliseconds and you probably don't want to miss that step. Instead of just killing the host, you could make sure the affected app isn't working (SIGABRT?) and kill all networking (flush iptables, change default to DROP).
I am a newbie trying to figure out how process monitoring works with JXcore. I have read the documentation, but I need a few steps to get my server application started multithreaded and monitored properly.
Thanks in advance!
I'll try to explain it to you. There is no shame in being a newbie! :)
JXcore offers you two types of application monitoring.
1) One of them is the Process Monitor, which runs as a separate process. Your applications may subscribe to it to be monitored. The monitor checks them at regular intervals, and if it finds that your application is gone, it tries to relaunch it. For example, if your application serves HTTP and should be online all the time, Process Monitor will ensure that it is really running.
The fastest way to start to monitor your application is to:
launch the monitor: > jx monitor start
launch your application with automatic subscription to the monitor: > jx monitor run app.js
After that, when your application crashes, Process Monitor will restart it. You can test it by just killing your application's process.
Process monitor also gives you information about currently monitored processes. You can browse to see the list of them:
http://127.0.0.1:17777/json
2) Second type of a monitoring feature is process and thread recovery. With Process Recovery you can achieve the same as with the Process Monitoring, so there is no reason to use them both at the same time.
Another scenario could be:
Let's say you have a multithreaded application and recovering only its threads is enough.
Your application is launched with a command:
jx mt-keep:3 app.js
which means, that you run it with 3 threads.
To enable Thread Recovery, it is enough to subscribe to the process.on('restart') event like this:
process.on('restart', function (cb) {
    process.release();
    cb();
});
Remember to call the cb() callback. As you probably saw in the docs, the thread will not restart until you invoke this callback. Before the restart, you may back up things etc.
Basically that's it. Feel free to play with it!
My system includes a task which opens a network socket, receives pushed data from the network, processes it, and writes it out to disk or pings other machines depending on the messages. This task is intended to run forever, and the service is designed to have this task always running. But sometimes it crashes.
What's the best practice for keeping a task like this alive? Assume it's okay for the task to be dead for up to 30 seconds before we restart it.
Some obvious ideas include having a watchdog process that checks to make sure the process is still running. The watchdog could be triggered by cron. But how does it know if the process is alive or not? Write a pidfile? Touch a heartbeat file? An ideal solution wouldn't continuously spin up more processes if the machine gets bogged down to the point where the watchdog is running faster than the heartbeat.
Are there standard linux tools for this? I can imagine a solution that uses a message queue, but I'm not sure if that's a good idea or not.
Depending on the nature of the task that you wish to monitor, one method is to write a simple wrapper to start up your task in a fork().
The wrapper task can then do a waitpid() on the child and restart it if it is terminated.
This does depend on modifying the source for the task that you wish to run.
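A rough Python sketch of such a wrapper, using fork(), exec and waitpid(). The names run_once and supervise, and the max_restarts cap, are my own additions for illustration; a real wrapper would probably also log each restart and back off between attempts.

```python
import os
import sys
import time


def run_once(argv):
    """Fork, exec the task in the child, and wait for it in the parent."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)  # only returns (by raising) on failure
        finally:
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    return status


def supervise(argv, max_restarts=3, delay_s=0.0):
    """Restart the task until it exits cleanly or the restart cap is hit."""
    starts = 0
    while True:
        starts += 1
        status = run_once(argv)
        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            return starts, "clean exit"
        if starts > max_restarts:
            return starts, "gave up"
        time.sleep(delay_s)  # avoid a tight restart loop
```

For example, supervise([sys.executable, "-c", "raise SystemExit(3)"], max_restarts=2) starts the failing task three times and then gives up. The cap is there so a task that crashes immediately on startup does not get restarted forever.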
sysvinit will restart processes that die, if added to inittab.
If you're worried about the process freezing without crashing and ending the process, you can use a heartbeat and hard kill the active instance, letting init restart it.
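A sketch of the heartbeat-plus-hard-kill part in Python. It assumes the task touches a heartbeat file while it is healthy; the file, the 30 second limit and the function names are placeholders. Restarting is left to init, as described above.

```python
import os
import subprocess
import sys
import tempfile
import time


def heartbeat_is_stale(path, max_age_s):
    """True if the heartbeat file is missing or was not touched recently."""
    try:
        return time.time() - os.stat(path).st_mtime > max_age_s
    except FileNotFoundError:
        return True  # never wrote a heartbeat: treat as stale


def kill_if_frozen(proc, heartbeat_path, max_age_s):
    """Hard kill the process if its heartbeat has gone stale."""
    if heartbeat_is_stale(heartbeat_path, max_age_s):
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        proc.wait()
        return True
    return False


def demo():
    fd, heartbeat_path = tempfile.mkstemp()
    os.close(fd)
    # Stand-in for the frozen task: it never touches the heartbeat file.
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    killed_while_fresh = kill_if_frozen(proc, heartbeat_path, max_age_s=30)
    old = time.time() - 120
    os.utime(heartbeat_path, (old, old))  # pretend the last beat was 2 minutes ago
    killed_when_stale = kill_if_frozen(proc, heartbeat_path, max_age_s=30)
    os.unlink(heartbeat_path)
    return killed_while_fresh, killed_when_stale, proc.returncode
```

On POSIX the return code of a process killed by a signal is the negative signal number, so the demo ends with -9 for SIGKILL.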
You could use monit along with daemonize. There are lots of tools for this in the *nix world.
Supervisor was designed precisely for this task. From the project website:
Supervisor is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems.
It runs as a daemon (supervisord) controlled by a command line tool, supervisorctl. The configuration file contains a list of programs it is supposed to monitor, among other settings.
The number of options is quite extensive; have a look at the docs for a complete list. In your case, the relevant configuration section might be something like this:
[program:my-network-task]
command=/bin/my-network-task ; where your binary lives
autostart=true ; start when supervisor starts?
autorestart=true ; restart automatically when stopped?
startsecs=10 ; consider start successful after how many secs?
startretries=3 ; try starting how many times?
I have used Supervisor myself and it worked really well once everything was set up. It requires Python, which should not be a big deal in most environments but might be.
I have a daemon process which does the configuration management. All the other processes have to interact with this daemon to function. But when I execute a large action, after a few hours the daemon process becomes unresponsive for 2 to 3 hours. After those 2 to 3 hours it works normally again.
Debugging utilities for Linux process hang issues?
How to get at what point the linux process hangs?
strace can show the last system calls and their result
lsof can show open files
the system log can be very effective when log messages are written to track progress. It lets you box the problem into smaller areas. Also correlate log messages with messages from other systems; this often turns up interesting results
wireshark if the apps use sockets to make the wire chatter visible.
ps ax + top can show if your app is in a busy loop, i.e. running all the time, sleeping or blocked in IO, consuming CPU, using memory.
Each of these may give a little bit of information which together build up a picture of the issue.
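If you want to collect the process state from a script instead of reading ps or top by hand, the same information is available under /proc on Linux. A small Python sketch (the function name is mine): the State line holds one letter such as R (running), S (sleeping), D (uninterruptible sleep, usually I/O), T (stopped) or Z (zombie).

```python
import os


def process_state(pid):
    """Return the one-letter state of `pid` from /proc, or None if it is gone."""
    try:
        with open(f"/proc/{pid}/status") as status_file:
            for line in status_file:
                if line.startswith("State:"):
                    return line.split()[1]
    except FileNotFoundError:
        return None
    return None
```

Sampling process_state(pid) of the daemon every few seconds during a hang can already tell you a lot: mostly R suggests a busy loop, while D or S suggests waiting on I/O or on a lock.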
When using gdb, it might be useful to trigger a core dump when the app is blocked. Then you have a static snapshot which you can analyze with post-mortem debugging at your leisure. You can have these triggered by a script. Then you quickly build up a set of snapshots which can be used to test your theories.
One option is to use gdb and use the attach command in order to attach to a running process. You will need to load a file containing the symbols of the executable in question (using the file command)
There are a number of different ways to do this:
Listening on a UNIX domain socket, to handle status requests. An external application can then inquire as to whether the application is still ok. If it gets no response within some timeout period, then it can be assumed that the application being queried has deadlocked or is dead.
Periodically touching a file with a preselected path. An external application can look at the timestamp of the file, and if it is stale, it can assume that the application is dead or deadlocked.
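As an illustration of the first option (the UNIX domain socket), here is a Python sketch. The "status"/"ok" exchange, the function names and the one second timeout are invented for the example; any small request/response protocol would do. The server side is run in a thread here only to keep the demo in one file.

```python
import os
import socket
import tempfile
import threading


def serve_status(path, ready, requests=1):
    """Inside the application: answer `requests` status queries, then stop."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(path)
        srv.listen(1)
        ready.set()
        for _ in range(requests):
            conn, _addr = srv.accept()
            with conn:
                if conn.recv(64) == b"status\n":
                    conn.sendall(b"ok\n")


def is_alive(path, timeout_s=1.0):
    """External checker: True only if 'ok' comes back within timeout_s."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
            cli.settimeout(timeout_s)
            cli.connect(path)
            cli.sendall(b"status\n")
            return cli.recv(64) == b"ok\n"
    except OSError:  # refused, missing socket file, or timed out
        return False


def demo():
    path = os.path.join(tempfile.mkdtemp(), "status.sock")
    ready = threading.Event()
    server = threading.Thread(target=serve_status, args=(path, ready), daemon=True)
    server.start()
    ready.wait(5)
    while_running = is_alive(path)
    server.join(5)  # the "application" has now stopped listening
    after_stop = is_alive(path, timeout_s=0.2)
    return while_running, after_stop
```

The timeout is what covers the deadlock case: a hung application may still accept the connection but never answer, and the checker then gives up instead of blocking forever.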
You can use the alarm syscall repeatedly, having the signal terminate the process (use sigaction accordingly). As long as you keep calling alarm (i.e. as long as your program is running) it will keep running. Once you don't, the signal will fire.
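A Python sketch of the alarm approach. The helper names and the simulated hang are made up for the demo, and the default SIGALRM action (terminate the process) is used rather than a custom handler. Each call to alarm replaces the pending one, so a healthy loop keeps pushing the deadline forward.

```python
import os
import signal
import time


def pet_watchdog(timeout_s):
    """(Re)arm the alarm; this replaces any alarm that is still pending."""
    signal.alarm(timeout_s)


def run_child(work_steps, hang_at_step, timeout_s=1):
    """Run a worker in a child process and return its wait status."""
    pid = os.fork()
    if pid == 0:
        # Make sure SIGALRM has its default action: terminate the process.
        signal.signal(signal.SIGALRM, signal.SIG_DFL)
        for step in range(work_steps):
            pet_watchdog(timeout_s)
            if step == hang_at_step:
                time.sleep(timeout_s + 5)  # simulated hang: alarm fires first
            time.sleep(0.01)  # stand-in for one unit of real work
        signal.alarm(0)  # cancel the pending alarm before a normal exit
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    return status
```

With hang_at_step=1 the child is killed by SIGALRM about one second into the hang; with hang_at_step=-1 it never hangs and exits normally. Combined with a restarting parent, this turns a silent freeze into an ordinary crash that gets restarted.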
You can seamlessly restart your process as it dies with fork and waitpid as described in this answer. It does not cost any significant resources, since the OS will share the memory pages.
Suppose there are two executables. One is mine and the other is some other application. Now if the other app is running, I want my app to run until the other one exits or is stopped.
Writing a separate service seems quite an overkill.
You can first obtain a Process object - say by Process.GetProcessesByName, or better - use the ProcessID of the process you wish to monitor, if you have it. You can then try obtaining a WaitHandle from it, as discussed e.g. here, then call WaitOne on it (or WaitAll, if you're monitoring several instances).
Write a windows service that will continuously monitor the other application executable. If the service finds it running it will start your executable if not running and make sure it keeps running throughout the life cycle of the other application. As soon as the other app terminates, your windows service will also terminate your exe.