I am very new to Linux and to this forum. I am working on an issue for a customer who has 10+ Red Hat Linux 5.5 64-bit servers. They want to stop the Tomcat process using the stop script (the script uses 'kill -15').
On some servers, the script works fine and stops the tomcat process within seconds.
On other servers, it sometimes stops quickly and sometimes keeps running for minutes, until the customer finally has to use 'kill -9' to stop Tomcat. The logs do not indicate anything.
Do you have any idea why this script behaves intermittently? How can we catch the cause in the logs?
Processes may ignore the SIGTERM(15) signal.
From Wikipedia: http://en.wikipedia.org/wiki/Unix_signal
The SIGTERM signal is sent to a process to request its termination. Unlike the SIGKILL signal, it can be caught and interpreted or ignored by the process. This allows the process to perform nice termination releasing resources and saving state if appropriate.
If it's ignored, the only option left is to kill the process with SIGKILL(9), which can't be caught or ignored.
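To illustrate how a process can catch SIGTERM, here is a minimal Python sketch. It is not Tomcat's actual shutdown code, and the names `handle_sigterm`, `install_handler` and `should_stop` are made up for this example. The handler only records the request, so the process keeps running until its main loop checks the flag. If that loop is busy with a long operation, 'kill -15' can appear to do nothing for a while.

```python
import signal

# Flag set by the signal handler and polled by the main loop.
shutdown_requested = False


def handle_sigterm(signum, frame):
    """Record the termination request instead of exiting immediately."""
    global shutdown_requested
    shutdown_requested = True


def install_handler():
    """Replace the default SIGTERM action (terminate) with our handler."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def should_stop():
    """The main loop calls this between units of work."""
    return shutdown_requested
```

A main loop would look like `while not should_stop(): do_one_unit_of_work()`, followed by cleanup. How quickly the process exits then depends entirely on how long one unit of work takes.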
I'm trying to debug an issue in a Qt5 application where, for some reason, even after the application reaches its exit point (log message placed right before return 0 of int main shows), the process persists and when running "ps -e" and grepping for the process, it will show a process in the background.
Is there a way I can diagnose where this thread is in the background? All my log messages indicate that all Qt windows have been closed, and the "setQuitOnLastWindowClosed" flag is set to true. So the only thing I can think of is that a thread spawned by the application is still running in the background.
I should note that this does not ALWAYS happen. When the user exits the application normally, this does not happen. But when the machine detects a power cycle, it forces a close, but it seems something is missing in the code it runs in this case, so figuring out what's still running will help me find that.
The application was built in Qt5 and it's running on Scientific Linux 6.4 if that matters.
Recently I was working on an update and had to kill a few Java processes first.
killall -9 java
So I used the above command, which killed all the Java processes. But now I'm stuck, not knowing how to restart those Java services.
Is there a command to start all java services killed using killall?
using kill
First of all: kill -9 should be the last method to use to stop a process.
A process stopped with SIGKILL has no chance to shut down properly. Some services or daemons have complex and important shutdown procedures, such as databases, which take care to close open database files in a consistent state and write cached data to them.
Before stopping processes with kill or something like that, you should try the stop procedure which comes from the init system of your unix/linux operating system.
When you have to use kill, try to send a TERM signal to a process first (just use kill without -9) and wait a moment to see if the process shuts down. Use -9 if there is no other option!
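The "TERM first, wait, then -9" advice can be sketched in Python like this. It assumes you started the process yourself and hold a `subprocess.Popen` object for it; the function name `stop_gracefully` and the default timeout are made up for this example.

```python
import subprocess


def stop_gracefully(proc, timeout=10.0):
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL.

    Returns the name of the last signal that had to be sent.
    """
    proc.terminate()  # sends SIGTERM, the same as plain `kill <pid>`
    try:
        proc.wait(timeout=timeout)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL, the last resort
        proc.wait()  # reap the process so it does not stay a zombie
        return "SIGKILL"
```

If the function returns "SIGKILL" for a service, that is worth logging, because it means the service did not finish its own shutdown procedure in time.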
Starting and stopping services
Starting and stopping services should be handled by the init service which comes with your unix/linux operating system.
SysV init and systemd are common. Check the manual of your operating system to see which one is used. If it is set up properly, you can check which services are missing (stopped, but should be running) and start them again.
Here are some manual examples:
FreeBSD:
https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/configtuning-rcd.html
Debian:
https://www.debian.org/doc/manuals/debian-handbook/unix-services.de.html#sect.system-boot
Fedora: https://docs.fedoraproject.org/f28/system-administrators-guide/infrastructure-services/Services_and_Daemons.html
As far as I know, no. There is no record (by default) of what you have killed, as you can see in strace killall java.
More about process management, including why SIGKILL is a bad idea almost all of the time.
I've got an icinga2 process running on Linux which, once or twice a day (at unpredictable times) simply disappears and is no longer running. I'm running the process with its -x debug option, but this does not provide any useful information. The log file shows normal operation, and then suddenly there are no more entries in the log and the process is no longer running.
How can I gather more information about why this process is disappearing/crashing?
What are the advantages of "daemonizing" a server application over running the program in console mode?
Having it run as a daemon means that
you can log out without losing the service (which saves some resources)
you do not risk losing the service from an accidental ctrl-c
you avoid the minor security risk of someone accessing the terminal, hitting ctrl-c and taking over your session
Essentially all 'real' services that are running 'in production' (as opposed to debug mode) run that way.
I think it prevents you from accidentally closing the app, and you have one more terminal free.
But personally I don't see a big difference between the "screen" program and "daemonizing".
The main point would be to detach the process from the terminal so that the process does not terminate when the user logs out from the terminal. If you run a program in console mode, it will terminate when you log out, because this is the default behavior for a process when it receives a SIGHUP signal.
Note that there is more to writing a daemon than just calling daemon(3). See How to write a unix daemon for more information.
I have a daemon process which does the configuration management. All the other processes have to interact with this daemon in order to function. But when I execute a large action, after a few hours the daemon process becomes unresponsive for 2 to 3 hours. After those 2 to 3 hours it works normally again.
Debugging utilities for Linux process hang issues?
How to get at what point the linux process hangs?
strace can show the last system calls and their result
lsof can show open files
the system log can be very effective when log messages are written to track progress. This lets you box the problem into smaller areas. Also correlate log messages with messages from other systems; this often turns up interesting results
wireshark if the apps use sockets to make the wire chatter visible.
ps ax + top can show whether your app is in a busy loop (running all the time), sleeping, or blocked on I/O, and how much CPU and memory it is using.
Each of these may give a little bit of information which together build up a picture of the issue.
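If you want to record the process state from a script rather than reading it off ps or top, a minimal Linux-only sketch is to read the `State:` line from `/proc/<pid>/status`. The function name `process_state` is made up for this example.

```python
def process_state(pid):
    """Return the 'State:' field from /proc/<pid>/status, or None if absent.

    Typical values are 'R (running)', 'S (sleeping)' and 'D (disk sleep)'.
    """
    with open(f"/proc/{pid}/status") as status_file:
        for line in status_file:
            if line.startswith("State:"):
                return line.split(":", 1)[1].strip()
    return None
```

Calling this every few seconds for the suspect PID and logging the result with a timestamp gives you a timeline to line up against the application log. A process that stays in state D for a long time is usually blocked in the kernel on I/O.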
When using gdb, it might be useful to trigger a core dump when the app is blocked. Then you have a static snapshot which you can analyze using post mortem debugging at your leisure. You can have these triggered by a script. Then you quickly build up a set of snapshots which can be used to test your theories.
One option is to use gdb and use the attach command in order to attach to a running process. You will need to load a file containing the symbols of the executable in question (using the file command)
There are a number of different ways to do this:
Listening on a UNIX domain socket, to handle status requests. An external application can then inquire as to whether the application is still ok. If it gets no response within some timeout period, then it can be assumed that the application being queried has deadlocked or is dead.
Periodically touching a file with a preselected path. An external application can look at the timestamp of the file, and if it is stale, it can assume that the application is dead or deadlocked.
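The file-touching variant could look roughly like this in Python. The function names and the choice to treat a missing file as stale are assumptions made for this sketch.

```python
import os
import time


def touch_heartbeat(path):
    """Called periodically by the monitored application."""
    with open(path, "a"):
        pass  # create the file if it does not exist yet
    os.utime(path, None)  # set access and modification time to now


def is_stale(path, max_age_seconds):
    """Called by the external monitor; a missing file counts as stale."""
    try:
        age = time.time() - os.stat(path).st_mtime
    except FileNotFoundError:
        return True
    return age > max_age_seconds
```

The application should call `touch_heartbeat` from the same loop that does the real work, not from a separate timer thread. Otherwise the file keeps getting touched even though the worker is deadlocked.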
You can use the alarm syscall repeatedly, having the signal terminate the process (use sigaction accordingly). As long as you keep calling alarm (i.e. as long as your program is running) it will keep running. Once you don't, the signal will fire.
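A minimal Python sketch of this alarm-based watchdog follows. It uses `signal.alarm`, which wraps the alarm syscall, and resets SIGALRM to its default action, which terminates the process. The function names are made up for this example.

```python
import signal


def arm_watchdog(seconds):
    """Arm or re-arm the watchdog.

    Must be called again before `seconds` elapse, otherwise SIGALRM is
    delivered and its default action terminates the process.
    """
    signal.signal(signal.SIGALRM, signal.SIG_DFL)
    signal.alarm(seconds)  # replaces any previously scheduled alarm


def disarm_watchdog():
    """Cancel the pending alarm, e.g. before a clean shutdown."""
    signal.alarm(0)
```

Call `arm_watchdog(n)` at the top of each main loop iteration, with `n` comfortably larger than the longest legitimate iteration.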
You can seamlessly restart your process as it dies with fork and waitpid as described in this answer. It does not cost any significant resources, since the OS will share the memory pages.
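As a rough sketch of the fork and waitpid approach (not the code from the linked answer), a small supervisor in Python could look like this. The name `supervise`, the `worker` callable and the restart limit are assumptions made for this example.

```python
import os


def supervise(worker, max_restarts):
    """Run worker() in a child process and restart it when it dies abnormally.

    Returns the number of times the worker was started.
    """
    starts = 0
    while starts <= max_restarts:
        pid = os.fork()
        if pid == 0:
            # Child: run the worker, then exit without returning to the caller.
            exit_code = 1
            try:
                worker()
                exit_code = 0
            finally:
                os._exit(exit_code)
        # Parent: count the start and block until the child terminates.
        starts += 1
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            break  # clean exit, no restart needed
    return starts
```

The restart limit is there so that a worker which crashes immediately on every start does not turn the supervisor into a busy loop. A real supervisor would probably also wait a little between restarts.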