Linux, timing out on subprocess

OK, I need to write code that calls a script and, if the operation in the script hangs, terminates the process.
The preferred language is Python, but I'm also looking through C and bash script documentation.
Seems like an easy problem, but I can't decide on the best solution.
From research so far:
Python: Has some weird threading model where the virtual machine uses one thread at a time; won't work?
C: The preferred solution so far seems to be SIGALRM + fork + execl. But SIGALRM is not heap safe, so it can trash everything?
Bash: timeout program? Not standard on all distros?
Since I'm a newbie to Linux, I'm probably unaware of 500 different gotchas with those functions, so can anyone tell me what's the safest and cleanest way?

Avoid SIGALRM, because there is not much you can safely do inside a signal handler.
As for the system calls to use: in C, after the fork-exec that starts the subprocess, you can periodically call waitpid(2) with the WNOHANG option to check whether the subprocess is still running. If waitpid returns 0 (the process is still running) and the desired timeout has passed, you can kill(2) the subprocess.
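Since Python's os module exposes thin wrappers over fork, execvp, waitpid and kill, the C approach above translates almost line for line. A minimal sketch, assuming Linux; the function name run_with_timeout and the 50 ms poll interval are illustrative choices, not part of the answer.

```python
import os
import signal
import time

def run_with_timeout(argv, timeout, poll_interval=0.05):
    """Fork/exec argv, poll it with waitpid(WNOHANG), and SIGKILL it if it
    outlives `timeout` seconds. Returns the raw wait status, or None on timeout."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)   # replaces the child; returns only on failure
        except OSError:
            pass
        os._exit(127)                  # exec failed: leave the child immediately
    deadline = time.monotonic() + timeout
    while True:
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid != 0:              # 0 means "still running"; otherwise it exited
            return status
        if time.monotonic() >= deadline:
            os.kill(pid, signal.SIGKILL)
            os.waitpid(pid, 0)         # reap it so it does not linger as a zombie
            return None
        time.sleep(poll_interval)
```

The returned status can be decoded with os.WIFEXITED and os.WEXITSTATUS, just as in C. No threads or signal handlers are involved, so neither the SIGALRM concern nor the threading concern from the question applies to this sketch.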

In bash you can do something similar to this:
start the script/program in the background with &
get the process ID of the background process
sleep for some time
and then kill the process (if it has already finished, kill will simply fail), or check whether the process is still alive first and only then kill it.
Example:
sh long_time_script.sh &
pid=$!
sleep 30s
kill $pid
You can even try to use trap 'script_stopped $pid' SIGCHLD (script_stopped being a function you define yourself); see the bash man page for more info.
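The same background/sleep/kill pattern in Python, as a minimal sketch: subprocess.Popen plays the role of `&` and `$!`, and Popen.wait(timeout=...) replaces the fixed sleep, returning early if the program finishes. The names run_then_kill and grace are hypothetical.

```python
import subprocess

def run_then_kill(argv, seconds, grace=5):
    """Start argv in the background, wait up to `seconds` for it, and kill it
    if it is still running. Returns its exit code, or None if it was killed."""
    proc = subprocess.Popen(argv)           # like `cmd &`; proc.pid plays the role of $!
    try:
        return proc.wait(timeout=seconds)   # like `sleep 30s`, but returns as soon as it exits
    except subprocess.TimeoutExpired:
        proc.terminate()                    # like `kill $pid`: sends SIGTERM
        try:
            proc.wait(timeout=grace)        # give it a chance to exit cleanly
        except subprocess.TimeoutExpired:
            proc.kill()                     # still running: SIGKILL cannot be caught
            proc.wait()                     # reap it
        return None
```

For the common case, subprocess.run(argv, timeout=...) does this in one call: if the timeout expires, the child is killed and waited for, and subprocess.TimeoutExpired is raised.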
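UPDATE: I found another command, timeout (part of GNU coreutils). It does exactly what you need: it runs a command with a time limit. Example:
timeout 10s sleep 15s
will terminate the sleep after 10 seconds (by default timeout sends SIGTERM; use -s KILL to send SIGKILL instead).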
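If you want to drive the timeout command from Python, one sketch is to wrap the call and check its exit status: GNU timeout exits with status 124 when the time limit is hit (unless --preserve-status is given). run_under_timeout is a hypothetical name, and a command that itself exits with 124 would be indistinguishable from a timeout.

```python
import subprocess

TIMED_OUT = 124   # GNU timeout's exit status when the time limit is reached

def run_under_timeout(seconds, argv):
    """Run argv under `timeout`; return True if it was stopped for running too long."""
    result = subprocess.run(["timeout", f"{seconds}s", *argv])
    return result.returncode == TIMED_OUT
```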

There is a collection of Python code with features to do exactly this without too much difficulty, if you know the APIs.
The Pycopia collection has the scheduler module for timing out functions, and the proctools module for spawning subprocesses and sending signals to them. The kill method can be used in this case.

Related

Linux job scheduler launching a script 2 hours after it terminates

I have a script that runs for an unknown period of time that depends on its input. It can run for one hour when little data is available, or for 8 hours if a lot of data has to be processed.
I need to run it periodically, specifically 2 hours after the previous run completed.
Is there a utility to do that?
Use 'at' instead of 'cron', and at the end of your script add (at reads the command to run from standard input):
echo "$0 $*" | at now + 2 hours
This means that each run is chained: if one terminates abnormally, the next instance won't be scheduled. But I don't think there's a more robust solution without adding a lot of code/complexity.
I don't like the proposed at solution, so here is another one:
1. Use cron to launch your application every two hours.
2. Upon startup, your application(*) checks whether there's a pidfile.
2.1 If it is present, there may be another instance running: read the contents of the file (the pid) and check whether that pid belongs to an existing process, a zombie process or something else. If it is the pid of a running process, exit. If it is the pid of a zombie process, the previous job ended unexpectedly, so delete the pidfile and go to step 3. Otherwise.
3. After deleting the pidfile, create a new one and put your pid into it. Then proceed to do your job.
*: In order not to add complexity, the application I mentioned could also be a simple wrapper that spawns your code using exec.
This solution can also be scripted quite easily.
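For instance, the pidfile check could be sketched in Python as follows, assuming Linux /proc: the state letter in /proc/<pid>/stat tells a zombie ('Z') apart from a live process. Treating "no such process" like a zombie (stale file, delete it) is an assumption, since the answer leaves the "Otherwise" case open; both function names are hypothetical. The check-then-write is not atomic, so two instances starting at the same instant could both pass it.

```python
import os

def process_state(pid):
    """Return the state letter from /proc/<pid>/stat ('R', 'S', 'Z', ...),
    or None if there is no such process."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Field 2 is the command name in parentheses and may itself contain
    # spaces, so take the text after the last ')' and read the next field.
    return data.rpartition(")")[2].split()[0]

def acquire_pidfile(path):
    """Return False if a live instance owns `path`; otherwise (re)create
    the pidfile with our own PID and return True."""
    try:
        with open(path) as f:
            old_pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        old_pid = None                      # no pidfile, or unreadable contents
    if old_pid is not None:
        state = process_state(old_pid)
        if state is not None and state != "Z":
            return False                    # running instance: caller should exit
        os.remove(path)                     # zombie or gone (assumption): stale file
    with open(path, "w") as f:
        f.write(f"{os.getpid()}\n")
    return True
```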
Hope it helps,
SnoopyBBT
If it looks complicated, here is another, dirtier solution:
while true ; do
./your_application
sleep 7200
done
Hope this helps,
SnoopyBBT

Monitor multiple instances of same process

I'm trying to monitor multiple instances of the same process. I can't for the life of me do this without running into a problem.
All the examples I have seen so far on the internet involve me writing out the PID or monitoring the process itself. The issue is that if one instance fails, it doesn't mean all the rest have failed as well.
To write out the PID for each process, I'd probably have to start each process with a short delay so that I record the correct PID, since the way I need to record the PID is by probing the process name.
If I'm wrong on this, please correct me. But so far I haven't found a way to monitor each individual process, which all have the same name.
To add to the above, the processes are run in a batch script and each one is run in its own screen (ffmpeg would otherwise not be able to run in the background).
If anyone can point me vaguely in the right direction on how to do this in Linux I would really appreciate it. I read somewhere that it would be possible to set up symlinks which would then give me fake process names and that way I can monitor the 'fake' process name.
See man wait. For example, in a shell script:
wget "$url1" &
pid1=$!
wget "$url2" &
pid2=$!
wait $pid1 $pid2
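launches both wget processes and waits until both have finished (or failed). Note that with several PIDs, wait returns the exit status of the last one, so to check each instance individually, call wait $pid1 and wait $pid2 separately.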
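The same idea in Python, as a minimal sketch: subprocess.Popen hands you each background process's PID directly (the equivalent of $!), so instances that share a name never have to be told apart by probing process names, and each one's exit code is collected separately. run_all is a hypothetical name.

```python
import subprocess

def run_all(commands):
    """Start every command in the background, then wait for each one
    individually and return {pid: exit_code}."""
    procs = [subprocess.Popen(cmd) for cmd in commands]   # like `cmd &` for each
    # p.pid is this instance's own PID; p.wait() returns its own exit code.
    return {p.pid: p.wait() for p in procs}
```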

How to set process ID in Linux for a specific program

I was wondering whether there is some way to force Linux to use a specific process ID for an application before running it. I need to know the process ID in advance.
Actually, there is a way to do this. Since kernel 3.3, with CONFIG_CHECKPOINT_RESTORE set (which it is in most distros), there is /proc/sys/kernel/ns_last_pid, which contains the last PID allocated by the kernel. So, if you want to set the PID for a forked program, you need to perform these actions:
Open /proc/sys/kernel/ns_last_pid and get fd
flock it with LOCK_EX
write PID-1
fork
Voilà! The child will have the PID that you wanted.
Also, don't forget to unlock (flock with LOCK_UN) and close ns_last_pid.
You can check out the C code on my blog here.
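The four steps above can be sketched in Python with os.open, fcntl.flock and os.fork standing in for the C calls. Assumptions: Linux 3.3+ with CONFIG_CHECKPOINT_RESTORE, enough privilege to write ns_last_pid (root), and the wanted PID being free (otherwise the kernel hands out the next free one). fork_with_pid is a hypothetical name; in the child (return value 0) you would exec your program right away.

```python
import fcntl
import os

NS_LAST_PID = "/proc/sys/kernel/ns_last_pid"

def fork_with_pid(wanted_pid):
    """Fork so the child gets wanted_pid (if it is free). Like os.fork(),
    returns 0 in the child and the child's PID in the parent."""
    fd = os.open(NS_LAST_PID, os.O_RDWR)          # open ns_last_pid and get fd
    fcntl.flock(fd, fcntl.LOCK_EX)                # flock it with LOCK_EX
    try:
        os.write(fd, str(wanted_pid - 1).encode())  # write PID-1
        pid = os.fork()                           # fork
    except BaseException:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
        raise
    if pid == 0:
        # The child shares the locked open file; it only drops its descriptor
        # (closing it does not release the lock while the parent holds a copy).
        os.close(fd)
        return 0
    fcntl.flock(fd, fcntl.LOCK_UN)                # unlock and close ns_last_pid
    os.close(fd)
    return pid
```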
As many have already suggested, you cannot set a PID directly, but shells usually provide a way to find out the PID of the last forked process.
For example, in bash you can launch an executable in the background (appending &) and find its PID in the variable $!.
Example:
$ lsof >/dev/null &
[1] 15458
$ echo $!
15458
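On CentOS 7.2 you can simply do the following:
Let's say you want to execute the sleep command with a PID of 1894. From a root shell (prefixing echo with sudo is not enough, because the > redirection is performed by your own unprivileged shell, not by sudo):
echo 1893 > /proc/sys/kernel/ns_last_pid; sleep 1000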
(However, keep in mind that if by chance another process is created in the extremely brief amount of time between the echo and sleep commands, you could end up with a PID of 1895+. I've tested it hundreds of times and it has never happened to me. If you want to guarantee the PID, you will need to lock the file, write to it, execute sleep, and then unlock the file, as suggested in Ruslan's answer above.)
There's no way to force a process to use a specific PID. As Wikipedia says:
Process IDs are usually allocated on a sequential basis, beginning at 0 and rising to a maximum value which varies from system to system. Once this limit is reached, allocation restarts at 300 and again increases. In Mac OS X and HP-UX, allocation restarts at 100. However, for this and subsequent passes any PIDs still assigned to processes are skipped.
You could just repeatedly call fork() to create new child processes until you get a child with the desired PID. Remember to call wait() often, or you will hit the per-user process limit quickly.
This method assumes that the OS assigns new PIDs sequentially, which appears to be the case, e.g., on Linux 3.3.
The advantage over the ns_last_pid method is that it doesn't require root permissions.
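The fork-until-it-matches idea can be sketched in Python like this, assuming sequential PID allocation as the answer says. Children with the wrong PID exit immediately and are reaped right away, which keeps you under the per-user process limit. fork_with_pid_by_cycling and max_forks are hypothetical names; a full cycle can take up to pid_max forks (see /proc/sys/kernel/pid_max), and the child that returns 0 should exec your program at once.

```python
import os

def fork_with_pid_by_cycling(wanted_pid, max_forks=100000):
    """Fork repeatedly until a child gets wanted_pid. Returns 0 in that child,
    its PID in the parent, or None in the parent if max_forks is exhausted."""
    for _ in range(max_forks):
        pid = os.fork()
        if pid == 0:
            if os.getpid() == wanted_pid:
                return 0            # this child got the PID we wanted
            os._exit(0)             # wrong PID: exit immediately
        if pid == wanted_pid:
            return pid
        os.waitpid(pid, 0)          # reap at once so exited children don't pile up
    return None
```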
Every process on a Linux system is created by fork(), so there should be no way to force a specific PID.
From Linux 5.5 you can pass an array of PIDs to the clone3 system call to be assigned to the new process, up to one for each nested PID namespace, from the inside out. This requires CAP_SYS_ADMIN or (since Linux 5.9) CAP_CHECKPOINT_RESTORE over the PID namespace.
If you are not concerned with PID namespaces, use an array of size one.
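Python has no clone3 wrapper, so the following is only a sketch that calls the raw system call through ctypes. Assumptions, all to be checked on your system: syscall number 435 (its value on x86_64 and arm64), the struct clone_args layout from <linux/sched.h> (eleven 64-bit fields), and a set_tid array of size one. spawn_with_pid is a hypothetical name. ctypes.PyDLL is used so the interpreter lock stays held across the call, and the child execs immediately because none of Python's usual after-fork bookkeeping runs.

```python
import ctypes
import os
import signal

SYS_clone3 = 435  # assumption: clone3's syscall number on x86_64 and arm64

class CloneArgs(ctypes.Structure):
    # struct clone_args; every field is a 64-bit unsigned integer
    _fields_ = [(name, ctypes.c_uint64) for name in (
        "flags", "pidfd", "child_tid", "parent_tid", "exit_signal",
        "stack", "stack_size", "tls", "set_tid", "set_tid_size", "cgroup")]

def spawn_with_pid(wanted_pid, argv):
    """Start argv as a new process whose PID (in our own PID namespace) is
    wanted_pid, via clone3's set_tid. Returns the child's PID in the parent."""
    tids = (ctypes.c_int32 * 1)(wanted_pid)          # array of size one
    args = CloneArgs(exit_signal=int(signal.SIGCHLD),  # parent can waitpid() normally
                     set_tid=ctypes.addressof(tids), set_tid_size=1)
    libc = ctypes.PyDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    pid = libc.syscall(ctypes.c_long(SYS_clone3), ctypes.byref(args),
                       ctypes.c_size_t(ctypes.sizeof(args)))
    if pid == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))     # e.g. EPERM without the capability
    if pid == 0:                                 # child: exec right away
        try:
            os.execvp(argv[0], argv)
        except OSError:
            pass
        os._exit(127)
    return pid
```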

Shell script process is getting killed automatically

I am facing a problem with a shell script. I have a script that runs in an infinite loop; say it has PID X. The process runs for 4-5 hours, but then it gets killed automatically. This happens only on some systems that have been running for a long time, and sometimes I observe it getting killed after just 2 hours.
I am not able to find the reason why it is going down or why it is getting killed. No one is using the system other than me, and I am running the process as the root user.
Can anyone explain, or suggest who might be killing the process?
Below is the sample script:
#!/bin/bash
until ./test.tcl; do
echo "Server 'test.tcl' crashed with exit code $?. Respawning.." >&2
done
The test.tcl script runs in an infinite loop and traps signals to do some special operation. But we find that test.tcl is also going down.
So is there any way to capture who killed it and how?
Enable core dumps on your system; that is the most commonly used method for analyzing application crashes. I know it is a bit painful to debug a core file with gdb, but you can usually find something out from it.
Here is a reference link for you: http://www.cyberciti.biz/tips/linux-core-dumps.html
Another way is to trace your script with "strace -p PID-X". Note that it will slow down your system, especially over several hours as in your case, but it can be the last resort. When a signal arrives, strace prints its siginfo, which for a signal sent with kill includes si_pid, the PID of the sender.
Hope the above helps.
It is better to check all the signals generated and caught by the OS for that specific script at that time; one of those signals might be what is killing your process.
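The respawn loop's $? already says how test.tcl died: in bash, a child killed by signal N yields an exit status of 128+N. A minimal Python sketch of the same check, where subprocess reports death by signal N as a negative returncode -N; run_and_report is a hypothetical name. This tells you which signal ended the process, not who sent it.

```python
import signal
import subprocess

def run_and_report(argv):
    """Run argv once and describe how it ended."""
    rc = subprocess.run(argv).returncode
    if rc < 0:                                   # killed by signal -rc
        return f"killed by {signal.Signals(-rc).name}"
    return f"exited with code {rc}"
```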

How to find out when process exits in Linux?

I can't find a good way to find out when a process exits in Linux. Does anyone have a solution for that?
One option I can think of is checking the process list periodically, but that is not instant and is pretty expensive (I have to loop over all processes each time).
Is there an interface for doing that on Linux? Something like waitpid, except something that can be used from unrelated processes?
Thanks,
Boda Cydo
You cannot wait for an unrelated process, just children.
As a simpler polling method than checking the process list: if you have permission, you can use the kill(2) system call and "send" signal 0. (An EPERM error also means the process exists; only ESRCH means it does not.)
From the kill(2) man page:
If sig is 0, then no signal is sent, but error checking is still performed; this can be used to check for the existence of a process ID or process group ID
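A minimal polling sketch built on signal 0, using Python's os.kill. The names is_alive and wait_for_pid_exit and the 0.1 s interval are hypothetical. Two caveats: a zombie (exited but not yet reaped by its parent) still counts as existing, and once a process is reaped its PID can be reused by a new one.

```python
import os
import time

def is_alive(pid):
    """kill(pid, 0): no signal is sent, only the existence/permission checks run."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:     # ESRCH: no such process
        return False
    except PermissionError:        # EPERM: it exists, we just may not signal it
        return True
    return True

def wait_for_pid_exit(pid, timeout, interval=0.1):
    """Poll is_alive() until pid is gone; False if it is still there at timeout."""
    deadline = time.monotonic() + timeout
    while is_alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```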
Perhaps you can start the program together with another program, the second one doing whatever it is you want to do when the first program stops, like sending a notification etc.
Consider this very simple example:
sleep 10; echo "finished"
sleep 10 is the first process, echo "finished" the second one (though echo is usually a shell builtin, I hope you get the point).
Another option is to have the process open an IPC object such as a Unix domain socket; your watchdog process can detect when the process quits, because the socket is closed as soon as the process exits.
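A minimal sketch of the socket idea, assuming the watchdog starts the monitored program itself and hands it one end of an AF_UNIX socket pair (socket.socketpair, passed through Popen's pass_fds). The watchdog's recv() returns b"" as soon as the last copy of the other end is closed, which happens when the process exits; an unrelated process connected to a listening Unix socket behaves the same way. If the program forks children that inherit the descriptor, EOF arrives only when they exit too. wait_for_exit_via_socket is a hypothetical name.

```python
import socket
import subprocess

def wait_for_exit_via_socket(argv):
    """Start argv holding one end of a Unix socket pair, then block until
    that end is closed, i.e. until the process exits. Returns what recv saw."""
    watchdog_end, child_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    proc = subprocess.Popen(argv, pass_fds=(child_end.fileno(),))
    child_end.close()               # only the child should hold this end now
    data = watchdog_end.recv(1)     # b"" (EOF) once every holder has closed it
    watchdog_end.close()
    proc.wait()                     # reap the child
    return data
```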
If you know the PID of the process in question, you can check if /proc/$PID exists. That's a relatively cheap stat() call.

Resources