Jobs being killed for an unidentifiable reason [closed] - linux

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
On Linux, I try to run a Fortran executable (or even recompile and then run it) and the job is killed immediately; the shell just prints "Killed". Now, if I copy the whole directory, the program runs just fine in the "new" directory -- but never in the original. This happens repeatedly, though not universally, and seems random to me. Even though I have a workaround, I am still wondering why this happens at all.

Run your program under strace to find out what it is doing before it gets killed. Just speculating: could it be allocating a huge amount of memory? If system memory is exhausted, the out-of-memory (OOM) killer usually kills the process that is using memory most aggressively. Check /var/log/syslog to see whether the OOM killer kicked in.
Also see What killed my process and why? and Will Linux start killing my processes without asking me if memory gets short?.
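For illustration only (this sketch is mine, not from the thread): a minimal C++ program that allocates and touches memory until the kernel steps in. On a memory-starved machine it dies with the same bare "Killed" message described above, and the OOM killer's log entry then appears in syslog/dmesg.

    #include <cstdio>
    #include <cstring>
    #include <new>
    #include <vector>

    // Allocate 100 MiB chunks and touch every byte so the kernel must
    // actually back them with RAM; once memory (plus swap) is exhausted,
    // the OOM killer sends SIGKILL and the shell prints only "Killed".
    int main() {
        const std::size_t chunk = 100 * 1024 * 1024;
        std::vector<char*> chunks;
        for (;;) {
            char* p = new (std::nothrow) char[chunk];
            if (!p) break;              // allocation itself may fail first
            std::memset(p, 1, chunk);   // touching forces real commitment
            chunks.push_back(p);
            std::printf("allocated %zu MiB\n", chunks.size() * 100);
        }
        return 0;
    }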

Related

Memory leaks in code when using Embarcadero 10.3.1 [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 3 years ago.
My C++ code is written in Embarcadero 10.3.1. I am facing a lot of memory leaks and resource leaks, and I am unable to identify them.
When I use CodeGuard, the application freezes, so I'm unable to draw any conclusions.
My application is a background job which continuously processes files and generates labels. It works fine for a couple of hours and generates around 3000 labels, and then goes into a hung/non-responsive state.
Can anyone suggest any solution?
Memory leaks can be difficult to track down. In your case I suspect you are using a label printer with its own library or driver, and the leaks could be anywhere.
First, try to understand what memory-management models exist in the application. With C++Builder code you will generally be responsible for allocating and freeing memory, so every object you create with new should have a corresponding delete -- make sure you understand which part of the code is responsible for freeing each object. (C++Builder 10.3.1 does support C++ auto_ptr, but you may not be using it, and you can't guarantee that any library code you have linked in will honour its semantics.)
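As an illustration only (not specific to any Embarcadero API, and assuming a C++14-capable compiler), the core of the advice above is to make ownership explicit; a smart pointer removes the need to pair new and delete by hand:

    #include <memory>

    struct Label { /* ... label data ... */ };

    // Leaky pattern: any early return or exception between the new
    // and the delete loses the object for good.
    void printLabelLeaky() {
        Label* l = new Label();
        // ... if this code throws or returns early, l is never freed ...
        delete l;
    }

    // RAII pattern: the unique_ptr frees the Label on every exit path.
    void printLabelSafe() {
        auto l = std::make_unique<Label>();
        // ... pass l.get() where a raw pointer is required by a C API ...
    }   // Label destroyed automatically here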
If you are passing information into code that uses another memory-management model (a COM object is a good example), make sure you understand the implications for memory management. If you pass it a pointer, is it expecting to free it, or is it expecting you to free it -- and if it's you, how do you know when it has finished with it?
Try a smaller run and see whether you can then use CodeGuard and pick up anything it suggests.
If your system is in production you will want to keep it running. One option would be to run it as a Windows scheduled task: it processes a set number of files and exits, and the OS frees the resources it had in use (though not any that are being leaked at the system level, perhaps by a buggy driver). That may allow you to keep it running all day while you continue to hunt for the leaks.
Good Luck!

Recovering from a fork bomb via a kernel patch that allows running only a recovery process [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
Reading another question on SO that was migrated to SU (https://superuser.com/questions/435690/does-linux-have-any-measures-to-protect-against-fork-bombs), I was thinking about a solution at the kernel level.
I read one proposal at LWN (http://lwn.net/Articles/435917/), but that proposal focuses on detecting a fork bomb in order to prevent it.
I would focus on recovery instead, since detection basically means the system is already unusable -- which every user of the system will soon detect anyway.
I would also broaden the context beyond fork bombs: what if your system is unresponsive and you can't get a decent console on it, but you still don't want to reboot it, even cleanly?
So THE QUESTION:
Is it possible to tell the kernel, via some SysRq command, to enter a recovery shell that runs only one process (and refuses to fork it), with the intent of killing the faulty processes? Has this feature ever been implemented? If not, why?
Note that I am not speaking of SysRq+i, which sends SIGKILL to all processes, but of something that behaves like a SIGSTOP for all processes; it could even be another kernel, kexec'd alongside the first, allowing you to inspect the system and resume it.
You can always limit, for a non-root user, the maximum number of processes with the setrlimit(2) syscall and RLIMIT_NPROC.
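For instance, a minimal sketch (the limit value is arbitrary) in which a process caps its own user's process count; any fork(2) beyond the limit then fails with EAGAIN instead of taking the machine down:

    #include <sys/resource.h>
    #include <cerrno>
    #include <cstdio>
    #include <cstring>

    int main() {
        // Cap this user's processes/threads at 200 (an arbitrary value).
        struct rlimit rl;
        rl.rlim_cur = 200;   // soft limit, enforced at fork() time
        rl.rlim_max = 200;   // hard limit; only root can raise it again
        if (setrlimit(RLIMIT_NPROC, &rl) != 0) {
            std::fprintf(stderr, "setrlimit: %s\n", std::strerror(errno));
            return 1;
        }
        // From here on, fork() fails with EAGAIN once the limit is hit,
        // so a runaway fork loop cannot exhaust the process table.
        return 0;
    }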
You could also use the bash ulimit builtin (or limit if your shell is zsh), which calls the same syscall. You can additionally use /etc/security/limits.conf and/or /etc/pam.d/ to set the limit system-wide (while still tuning it user by user if so wanted); PAM is very powerful for that.
I don't think you need some risky kernel patch. You just want to administer your machine with care.
And you don't care about root fork bombs: if a malicious (or stupid) user gets root access, your Linux system is doomed anyway (even without fork bombs). Nobody cares about them because, by definition, root is trusted and is expected to behave carefully and cleverly. (Likewise, root can run /bin/rm -rf /, but that is usually stupid -- as stupid as a root fork bomb -- hence no protection exists against either mistake.)
And a kernel patch would be difficult: you want root to be able to run several processes (at least the recovery shell and a child command, possibly piped), not only one. Kernel patches can also be brittle and crash the entire system.
Of course you are free to patch your kernel, since it is free software. However, making an interesting patch and getting the kernel community interested in it is also a social issue (and a much harder thing to achieve). Good luck. LKML is a better place to discuss that.
PS. Sending SIGSTOP to every non-init process won't help much w.r.t. a root fork bomb: you won't be able to type any shell command, because your shell would also, and always, be stopped!
PPS. The LWN article quoted in the question has comments mentioning cgroups, which could be relevant.

My linux server "Number of processes created" and "Context switches" are growing incredibly fast [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
EDIT: More detailed answers here: https://serverfault.com/questions/454192/my-linux-server-number-of-processes-created-and-context-switches-are-growing
I'm seeing strange behaviour on my server (a VPS) :-/. When I do cat /proc/stat, I can see that about 50-100 processes are created each second, and about 800k-1200k context switches happen! All of that with the server completely idle: no traffic and no programs running.
Top shows 0 load average and 100% idle CPU.
I've closed all non-essential services (httpd, mysqld, sendmail, nagios, named...) and the problem persists. I also run ps -ALf every second and see no changes: only a new ps process appears each time, with a PID exactly one higher than the previous, so new processes are not actually being created. I therefore thought the growing process count in cat /proc/stat must be threads (yes, it seems the processes counter in /proc/stat counts thread creation too, as this states: http://webcache.googleusercontent.com/search?q=cache:8NLgzKEzHQQJ:www.linuxhowtos.org/System/procstat.htm&hl=es&tbo=d&gl=es&strip=1).
I've changed to the /proc directory and done cat [PID]/status for every PID listed by ls (including kernel ones), and in no process are voluntary_ctxt_switches or nonvoluntary_ctxt_switches growing at anywhere near the speed cat /proc/stat reports (just a few tens per second).
I've done strace -p PID on every process too, to see whether any process is creating threads or anything similar, but the only process with any movement is ssh, and that movement is just the read/write operations for the data being sent to my terminal.
After that, I ran vmstat -s and saw that forks grows at the same speed as processes in /proc/stat. As http://linux.die.net/man/2/fork says, each fork() creates a new PID, but the PIDs on my server are not growing!
The last thing I can think of is that the process data shown by /proc/stat and vmstat -s is shared with all the other VPSes hosted on the same machine, but I don't know whether that is correct... If someone can throw some light on this I would be really grateful.
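(No direct answer was posted here, but as a sketch of the measurement the question describes -- and assuming a standard Linux /proc layout -- sampling the processes line of /proc/stat gives the fork/clone rate directly:)

    #include <chrono>
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <thread>

    // Read the cumulative fork/clone count since boot from the
    // "processes" line of /proc/stat (the counter the question watches).
    long readForkCount() {
        std::ifstream stat("/proc/stat");
        std::string line;
        while (std::getline(stat, line)) {
            std::istringstream iss(line);
            std::string key;
            long value;
            if (iss >> key >> value && key == "processes")
                return value;
        }
        return -1;
    }

    int main() {
        long before = readForkCount();
        std::this_thread::sleep_for(std::chrono::seconds(1));
        long after = readForkCount();
        // On a truly idle box this should be near zero; the question
        // reports 50-100 per second.
        std::cout << "forks+clones in the last second: "
                  << (after - before) << "\n";
    }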

Why would gdb hang?

I have an application that I am debugging, and I'm trying to understand how gdb works and why I am sometimes unable to step through the application. The problem is that gdb hangs and the process it is attached to enters a defunct state while I am stepping through the program. Once gdb hangs, I have to kill it to free the terminal (Ctrl-C does not work; I have to do it from a different terminal window by finding the process ID of that gdb session and using kill -9).
I'm guessing that gdb hangs because it's waiting for the application to stop at the next instruction, and somehow the application finished execution without gdb noticing. But that's just speculation on my part, based on the behavior I've observed so far. So my question is whether anyone has seen this type of behavior before and/or could suggest what the cause might be. I think that might help me improve my debugging strategy.
In case it matters I'm using g++ 4.4.3, gdb 7.1, running on Ubuntu 10.04 x86_64.
I had a similar problem and solved it by sending a CONT signal to the process being debugged.
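(For completeness, that is what kill -CONT <pid> does from a shell; a tiny hypothetical helper doing the same from C++ might look like this:)

    #include <signal.h>
    #include <sys/types.h>
    #include <cstdio>
    #include <cstdlib>

    // Usage: ./cont <pid> -- sends SIGCONT to the given process,
    // resuming it if a debugger left it stopped (state 'T' in ps).
    int main(int argc, char** argv) {
        if (argc != 2) {
            std::fprintf(stderr, "usage: %s <pid>\n", argv[0]);
            return 1;
        }
        pid_t pid = static_cast<pid_t>(std::atol(argv[1]));
        if (kill(pid, SIGCONT) != 0) {
            std::perror("kill");
            return 1;
        }
        return 0;
    }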
I'd say the debugged process wouldn't sit idle if it were the cause of the hang. Every time gdb completes a step, it has to update any expressions you asked it to print; that may involve following pointers and so on, and in some cases it can fail there (although I don't recall a real "hang" from that). It also typically tries to update your stack trace; if the stack has been corrupted and is no longer coherent, gdb could be trapped in an endless loop. Running gdb under strace to see what kind of activity is going on during the hang could be a good way to get one step further in figuring out the problem.
(e.g. accessing sources through a no-longer-working NFS/SSHFS mount is one of the most frequent reasons for gdb to hang, here :P)

How to check if a process is in hang state (Linux) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 6 years ago.
Is there any command in Linux through which I can know whether a process is in a hung state?
There is no command, but once I had to do a very dumb hack to accomplish something similar. I wrote a Perl script which, every 30 seconds in my case:
ran ps to find the list of PIDs of the watched processes (along with exec time, etc.);
looped over the PIDs;
started gdb, attached to each process by its PID, dumped a stack trace from it using thread apply all where, and detached from the process.
A process was declared hung if:
its stack trace didn't change and its exec time didn't change over 3 checks, or
its stack trace didn't change and its exec time indicated 100% CPU load over 3 checks.
A hung process was killed to give a monitoring application the chance to restart the hung instance.
But that was a very, very crude hack, done to meet an about-to-be-missed deadline, and it was removed a few days later, once a fix for the buggy application was finally installed.
Otherwise, as all the other responders quite correctly commented, there is no general way to find out whether a process has hung: a hang can occur for far too many reasons, often bound to the application logic.
The only way is for the application itself to be capable of indicating whether it is alive or not. The simplest approach might be, for example, a periodic log message: "I'm alive".
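As a sketch only (the answer doesn't prescribe an implementation): a background heartbeat thread that rewrites a timestamp file, which an external watchdog can check for staleness. The file path and interval here are hypothetical.

    #include <chrono>
    #include <fstream>
    #include <thread>

    // Rewrite a timestamp file every 5 seconds; a watchdog that sees
    // the file go stale can assume this process has hung.
    void heartbeat() {
        for (;;) {
            {
                std::ofstream f("/tmp/myapp.heartbeat", std::ios::trunc);
                f << std::chrono::duration_cast<std::chrono::seconds>(
                         std::chrono::system_clock::now().time_since_epoch())
                         .count()
                  << " I'm alive\n";
            }
            std::this_thread::sleep_for(std::chrono::seconds(5));
        }
    }

    int main() {
        std::thread hb(heartbeat);
        hb.detach();
        // ... real application work here. Note: if only the main logic
        // deadlocks, a detached heartbeat keeps ticking, so emit the
        // heartbeat from the loop whose liveness you actually care about.
        std::this_thread::sleep_for(std::chrono::minutes(1));
    }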
You could check the files /proc/[pid]/task/[thread ids]/status.
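(For illustration, assuming C++17's <filesystem>: a sketch that prints each thread's State: line for a given PID, showing whether its threads are running (R), sleeping (S), in uninterruptible sleep (D), or stopped (T):)

    #include <filesystem>
    #include <fstream>
    #include <iostream>
    #include <string>

    namespace fs = std::filesystem;

    // Print the State: line of every thread of the given PID, read
    // from /proc/<pid>/task/<tid>/status.
    int main(int argc, char** argv) {
        if (argc != 2) {
            std::cerr << "usage: " << argv[0] << " <pid>\n";
            return 1;
        }
        fs::path taskDir = fs::path("/proc") / argv[1] / "task";
        for (const auto& entry : fs::directory_iterator(taskDir)) {
            std::ifstream status(entry.path() / "status");
            std::string line;
            while (std::getline(status, line)) {
                if (line.rfind("State:", 0) == 0) {  // e.g. "State: S (sleeping)"
                    std::cout << entry.path().filename().string()
                              << ": " << line << "\n";
                    break;
                }
            }
        }
    }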
What do you mean by ‘hang state’? Typically, a process that is unresponsive and using 100% of a CPU is stuck in an endless loop. But there's no way to determine whether that has happened or whether the process might not eventually reach a loop exit state and carry on.
Desktop hang detectors just work by sending a message to the application's event loop and seeing if there's any response. If there's no response for a certain amount of time, they decide the app has ‘hung’... but it's entirely possible it was just doing something complicated and will come back to life in a moment once it's done. Anyhow, that's not something you can use for an arbitrary process.
Unfortunately there is no "hung" state for a process. A hang can be a deadlock: the threads in the process are blocked, and the process sits in a blocked state. It can also be a livelock, where the process is running but doing the same thing again and again; such a process is in the running state. So, as you can see, there is no definite hung state.
As suggested, you can use the top command to see whether the process is using 100% CPU or a lot of memory.
