Bash: Process monitoring and manipulation - Linux

I have a C program that processes an input file. I'm using a Bash script to feed the input files one by one to this program, along with some other parameters. Each input file is processed by the program 4 times, each time with different parameters. You can think of it as an experiment to test the C program with different parameters.
This C program can consume memory very quickly (it can take up more than 95% of the OS memory, slowing down the whole system). So, in my script, I monitor 2 things for every test run of the program: the total running time, and the memory percentage consumed (obtained from the top command). When either of them first crosses a threshold, I kill the C program using killall -q 0 processname, and begin the next test run.
This is how my script is structured:
# run in background
./program file_input1 file_input2 param1 param2 &
# now monitor the process
while true; do
    # monitor time
    sleep 1
    ((seconds++))
    if [ "$seconds" -ge "$timeout" ]; then
        timedout=1
        break
    fi
    # monitor memory percentage used
    memused=$(top -bn1 | grep "$(pidof genpbddh2)" | awk '{print $10}' | cut -d'.' -f1)
    if [ "$memused" -ge "$memorylimit" ]; then
        overmemory=1
        break
    fi
done
This entire thing runs inside a loop which keeps generating new values for the parameters to the C program.
When the script breaks out of the monitoring loop due to a timeout or exceeding the memory limit, this command is executed:
killall -q 0 program
The problem:
My intention was: once the program is started in the background (1st line above), I can monitor it, then go on to the next run of the program. A sequential execution of test cases.
But it seems all the future runs of the program were scheduled by the OS (Linux) for some reason. That is, while Test Run 1 is running, Test Runs 2, 3, 4, and so on are also somehow scheduled (without Run 1 having finished). At least, it seems that way from the following observation:
When I pressed Ctrl-C to end the script, it exited cleanly, but new instances of the program kept being created continuously. The script had ended, yet instances of the program were still being started. I checked and made sure that the script had ended. So I wrote another script to check in an infinite loop for instances of this program and kill them. Eventually, all the pre-scheduled instances of the program were killed and no new ones were created. But it was all a lot of pain.
Is this the correct way to externally monitor a program?
Any clues on why this problem is occurring, and how to fix it?

I would say that a more correct way to monitor a program like this would be:
ulimit -v $memorylimit
With such a limit set, any process that tries to use more virtual memory than allowed will have its allocations fail, and will typically die as a result. It is also possible to set other limits, like the maximum CPU time used or the maximum number of open files.
To see your current limits you can use
ulimit -a
ulimit is for bash users; if you use tcsh, the command to use is limit instead.
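For the original use case above, both thresholds can be enforced without a hand-rolled monitoring loop. A minimal sketch, assuming $memorylimit is in kilobytes (the unit ulimit -v expects) and that the coreutils timeout command is available:
(
    # subshell, so the limit does not leak into the rest of the script
    ulimit -v "$memorylimit"
    timeout "$timeout" ./program file_input1 file_input2 param1 param2
)
timeout kills the run after $timeout seconds, and allocations beyond the virtual memory limit simply fail, so no killall is needed afterwards.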

Related

bash programming, background processes, PIDs and waiting for jobs to exit

I have written a bash script to continually run jobs to generate large quantities of simulation data.
Essentially, once the script is run, it should continually launch background processes to generate data, subject to the constraint that no more than 32 simultaneous background jobs can run. This is required to prevent the processes from gobbling up all available RAM and stalling the server.
My idea was to launch bash functions in the background and store the PID of those jobs. Then after 32 jobs have been launched, use the wait command to wait until all PIDs of jobs have finished executing.
I think wait is the correct tool to use here: as long as the PID of a process still exists when the wait command is run (which it will, because the simulations take 6 hours to run), wait will detect the process exiting.
This seems like a better option than just polling processes and checking for the existence of a particular PID, as PIDs are recycled, and another process could have started after ours finished, with the same PID. (Just by random chance, if we are unlucky.)
However, the wait method has a drawback. If processes do not exit in the order they were launched, then wait will be called for a PID which no longer exists, unless a new process has re-used the same PID as the one we recorded earlier. In addition, if one job takes significantly longer than the others (again by chance), we will be waiting for that one job to end while there is room on the system for another 31 jobs, which cannot be run because we are waiting for that final PID to exit...
This is probably becoming a bit hard to visualize so let me add some code...
I am using a while loop as the foundation of this "algorithm"
c=0 # count total number of jobs launched (will not really use this here)
PIDS=() # keep an array of PIDs
# maximum number of simultaneous jobs and counter
BATCH_SIZE=32
BATCH_COUNT=0
# just start looping
while true
# edit: forgot to add this initially
# just check to see if a job has been run, using file existence
if [ ! -e "$FILE_NAME_1" ]
then
    # obvious
    if [ "$BATCH_COUNT" -lt "$BATCH_SIZE" ]
    then
        (( BATCH_COUNT += 1 ))
        # this is used elsewhere to keep track of whether a job has been executed (the file existence is a flag)
        touch "$FILE_NAME_1"
        # call background job, parallel_job_run is a bash function
        # (note the &, without which $! below would not be set by this call)
        parallel_job_run $has_some_arguments_but_not_relevent &
        # get PID
        PID=$!
        echo "[ JOB ] : Launched job as PID=$PID"
        PIDS+=($PID)
        # count total number of jobs
        ((c=c+1))
    fi
else
    # increment file name to use as that file already exists
    # the "files" are for input/output
    # the details are not particularly important
    : # no-op so the else branch is not empty
fi
true # prevent exit
# the following is a problem
do
    if (( BATCH_COUNT < BATCH_SIZE ))
    then
        continue
    else
        # collect launched jobs
        # this does not collect jobs in the order that they finish
        # it will first wait for the first PID in the array to exit
        # however this job may be the last to finish, in which case
        # wait will be called with other array values with PIDs which
        # have already exited, and hence it is undefined behaviour
        # as to whether we wait for a PID which doesn't exist (no problem)
        # or a new process may have started which re-uses our PID
        # and therefore we are waiting for someone else's process
        # to finish which is nothing to do with our own jobs!
        # we could be waiting for the PID of someone else's tty login
        # for example!
        for pid in "${PIDS[@]}"
        do
            wait $pid || echo "failed job PID=$pid"
            (( BATCH_COUNT -= 1 ))
        done
    fi
done
Hopefully the combination of the code above and the comments in it makes it clear what I am attempting to do.
My other idea was to replace the for loop at the end with another loop which continually checks whether each of the PIDs still exists. (Polling.) This could be combined with sleep 1 to prevent CPU hogging. However, the problem is as before: our process may exit, releasing its PID, and another process may happen to start and acquire that PID. The advantage of this method is that we never wait more than about 1 second before a new process is launched after a previous one exits.
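For illustration, a minimal sketch of that polling idea, using kill -0; note that it still checks PIDs in array order and inherits the PID-reuse caveat described above:
for pid in "${PIDS[@]}"; do
    # kill -0 delivers no signal; it only tests whether the PID still exists
    while kill -0 "$pid" 2>/dev/null; do
        sleep 1
    done
done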
Can anyone give me any advice on how to proceed with the problems I am having here?
I will continually update this question today - for example by adding new information if I find any and by formatting it / rewording sections to make it clearer.
If you use the -n option with wait (available in bash 4.3 and later), it will wait for the next background job to finish, regardless of its PID. So that could be one solution.
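A minimal sketch of a throttled launcher built on wait -n; parallel_job_run is the function from the question, while ALL_JOB_ARGS and the loop shape are assumptions for illustration:
BATCH_SIZE=32
running=0
for args in "${ALL_JOB_ARGS[@]}"; do   # hypothetical list of per-job arguments
    if (( running >= BATCH_SIZE )); then
        wait -n                        # block until any one background job exits
        (( running -= 1 ))
    fi
    parallel_job_run $args &
    (( running += 1 ))
done
wait                                   # collect the stragglers
This avoids tracking individual PIDs entirely, which sidesteps both the ordering problem and the PID-reuse worry.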
Also, Linux does not recycle PIDs immediately, as you seem to imply. It assigns the next available PID to each new process in order, and wraps around to the beginning only after it has exhausted the maximum available PID.

how to know if system is completely idle

I am trying to figure out how we can know if the system is idle. I want to suspend the system if it is idle for some x minutes. I searched around and tried the below script:
#!/bin/bash
idletime=$((1000*60)) # 1 minute in milliseconds
while true; do
    idle=$(xprintidle)
    echo "$idle"
    if (( idle > idletime )); then
        echo -n "mem" > /sys/power/state # suspend to RAM (needs root)
    fi
    sleep 1
done
But xprintidle only monitors mouse and keyboard activity to increment its counter.
So even if I leave a program running in an infinite loop, the script will still suspend the system.
The other option was extracting the idle time from /proc/stat over an interval of time, but on different systems I see different ranges of values for CPU idle when I leave the system without any activity.
Can someone help me with how to implement suspending the system?
Stuff can, and will, happen at any time. Something gets kicked off by cron. Someone's sleep() call finishes, and the process wakes up for a few milliseconds.
I'd say, come up with some meaningful heuristic. For example, periodically sample /proc/loadavg, and if the load average stays below some threshold, for a given period of time, assume that the system is now idle.
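A minimal sketch of that heuristic; the 0.10 threshold, the 60-second sample interval, and the 10-sample window are arbitrary assumptions to illustrate the idea:
#!/bin/bash
THRESHOLD="0.10" # assumed "quiet" load level
quiet=0
while true; do
    load1=$(cut -d' ' -f1 /proc/loadavg)   # 1-minute load average
    if awk -v l="$load1" -v t="$THRESHOLD" 'BEGIN { exit !(l < t) }'; then
        (( quiet += 1 ))
    else
        quiet=0
    fi
    if (( quiet >= 10 )); then             # below threshold for ~10 minutes
        echo -n "mem" > /sys/power/state   # suspend to RAM (needs root)
        quiet=0
    fi
    sleep 60
done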

Stopping time and starting several programs with bash

I'm currently trying to measure the time a program needs to finish when I start it 8 times at the same time.
Now I would really like to write a bash script or something that starts the program several times with different parameters and measures the time until all of them are finished.
I think I could manage to start my program 8 times by simply putting & at the end, but then I don't know how to tell when they stop.
You can use wait to wait for background jobs to finish.
#!/bin/sh
program &
program &
wait
will wait until both instances of program exit.
Use jobs to see what's still running. If you want no more than 8, you can do something like
if [ "$(jobs -r | wc -l)" -lt 8 ]; then command & fi
Rough code, but you get the idea.
You can use the time command to measure the time consumption of a program, so perhaps something like
#!/bin/bash
time yourprogram withoneargument
time yourprogram with three arguments
...etc
Cheers,
Mr. Bystrup supplies the time command, which will time the execution of your programs. Mr. Politowski and user2814958 supply the & operator, which will run programs in the background, allowing you to start them at the same time. If you combine these, you're most of the way there, except the output from time for the different commands will be jumbled, and it will be hard to tell which output pertains to which command.
One way to overcome this issue is to shunt the output into different files:
/usr/bin/time program1 2>/tmp/program1.time &
/usr/bin/time program2 2>/tmp/program2.time &
Note that I'm redirecting the standard error (file descriptor 2) into the files; time writes its output on the standard error instead of the standard output. Also note my use of the full path to time. Some shells have a built-in time command that behaves differently, such as writing output directly to the terminal instead of standard error.
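Putting the pieces together for the original question, a minimal bash sketch: launch all 8 instances in the background inside a subshell, then time the whole batch (program name and arguments are placeholders):
time (
    for i in 1 2 3 4 5 6 7 8; do
        ./yourprogram "arg$i" &   # placeholder command and per-instance argument
    done
    wait    # returns once every background job has exited
)
time measures the subshell, and the subshell exits only after wait has collected all 8 jobs, so the reported time is the time until the last instance finished.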

How to join multiple processes in shell?

So I've made a small C++ binary that connects to a server and runs a command, to stress test the server. I started working on the following shell script:
#!/bin/bash
for (( i = 0 ; i <= 15; i++ ))
do
./mycppbinary test 1 &
done
Now, I also happen to want to time how long all the processes take to execute. I suppose I'll have to do a time command on each of these processes?
Is it possible to join those processes, as if they're a thread?
You don't join them, you wait on them. At least in bash, and probably in other shells with job control.
You can use the bash fg command to bring the last background process back into the foreground. Do it in a loop to catch them all, though some may complete before you get to them, causing an error about no such job. You're not joining processes; they aren't threads. They each have their own PID and a unique memory space.
1st, make the script last as long as all its children
The script you propose will exit before the processes finish, because you are launching them in the background. If you don't want this to happen, you can do as many waits as needed (as Keith suggested).
2nd, time the script
Then, you can time your script and that will give you the total execution time, as you requested.
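Combining both points, a minimal sketch using bash's built-in SECONDS counter (coarse, whole-second resolution):
#!/bin/bash
start=$SECONDS
for (( i = 0; i <= 15; i++ )); do
    ./mycppbinary test 1 &
done
wait    # returns only after all 16 background jobs have exited
echo "All processes finished in $(( SECONDS - start )) seconds."
For finer resolution, you could instead run the whole script under time, e.g. time ./stress.sh (script name assumed).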

Scheduling in Linux: run a task when computer is idle (= no user input)

I'd like to run the Folding@home client on my Ubuntu 8.10 box only when it's idle, because of the program's heavy RAM consumption.
By "idle" I mean the state when there's no user activity (keyboard, mouse, etc). It's OK for other (probably heavy) processes to run at that time since F#H has the lowest CPU priority. The point is just to improve user experience and to do heavy work when nobody is using the machine.
How to accomplish this?
When the machine in question is a desktop, you could hook a start/stop script into the screensaver, so that the process is started when the screensaver activates and stopped when it deactivates.
It's fiddly to arrange for the process to only be present when the system is otherwise idle.
Actually starting the program in those conditions isn't the hard bit. You have to arrange for the program to be cleanly shut down, and figure out how and when to do that.
You have to be able to distinguish between that process's own CPU usage, and that of the other programs that might be running, so that you can tell whether the system is properly "idle".
It's a lot easier for the process to only be scheduled when the system is otherwise idle. Just use the nice command to launch the Folding@home client.
However, that won't solve the problem of insufficient RAM. If you've got swap space enabled, the system should be able to swap out any low-priority processes so that they're not consuming any real resources, but beware of a big hit on disk I/O each time your Folding@home client swaps in and out of RAM.
p.s. RAM is very cheap at the moment...
p.p.s. see this article
Maybe you need to give the idle task the lowest priority via nice.
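A one-line sketch of that approach; fah-client is a placeholder for whatever the Folding@home binary is called on your system:
nice -n 19 ionice -c3 fah-client
nice -n 19 gives the process the lowest CPU priority, and ionice -c3 (the idle I/O class) means it only gets the disk when nothing else wants it.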
You're going to want to look at a few things to determine 'idle', and also explore the sysinfo() call (the link points out the difference in the structure that it populates between various kernel versions).
Linux does not manage memory in a typical way. Don't just look at loads; look at memory. In particular, /proc/meminfo has a wonderful line starting with Committed_AS, which shows how much memory the kernel has actually promised to other processes. Compare that with what you learned from sysinfo and you might realize that a one-minute load average of 0.00 doesn't mean it's time to run some program that wants to allocate 256MB of memory, since the kernel may be overcommitting heavily. Note that all values filled in by sysinfo() are available via /proc; sysinfo() is just an easier way to get them.
You will also want to look at how much time each core has spent in iowait since boot, which is an even stronger indicator of whether you should run an I/O resource hog. Grab that info from /proc/stat; the first line contains the aggregate count for all CPUs, and iowait is the 6th field (in units of USER_HZ, typically 100ths of a second). Of course, if you intend to set affinity to a single CPU, only that CPU's line is of interest (iowait is still the sixth field). Average that against btime, also found in /proc/stat.
In short, don't just look at load averages.
EDIT
You should not assume that a lack of user input means the system is idle. Cron jobs tend to run, public services get taxed from time to time, etc. Idle remains your best guess based on reading the values I listed above (or perhaps more).
EDIT 2
Looking at the knob values in /proc/sys/vm also gives you a good indication of what the user thinks is idle, in particular swappiness. I realize you're doing this only on your own box, but this is an authoritative wiki and the question title is generic :)
The file /proc/loadavg has the system's current load. You can write a bash script to check it, and if it's low, run the command. Then you can add it to /etc/cron.d to run periodically.
This file contains information about the system load. The first three numbers represent the number of active tasks on the system - processes that are actually running - averaged over the last 1, 5, and 15 minutes. The next entry shows the instantaneous current number of runnable tasks - processes that are currently scheduled to run rather than being blocked in a system call - and the total number of processes on the system. The final entry is the process ID of the process that most recently ran.
Example output:
0.55 0.47 0.43 1/210 12437
If you're using GNOME then take look at this:
https://wiki.gnome.org/Attic/GnomeScreensaver/FrequentlyAskedQuestions
See this thread for a perl script that checks when the system is idle (through the gnome screensaver). You can run commands when idling starts and stops. I'm using this with some scripts to change BOINC preferences when idle (to give BOINC more memory and CPU usage).
perl script on ubuntu forums
You can use the xprintidle command to find out if the user is idle. The command prints the number of milliseconds since the last interaction with the X server.
Here is a sample script which starts a task when the user goes away and stops it when the user comes back:
#!/bin/bash
# Maximum time (ms) to wait for the user to become idle;
# if the user never goes idle within this window, exit with an error
WAIT_FOR_USER_IDLE=60000
# Minimum idle time in milliseconds after which the user is considered "idle"
USER_MIN_IDLE_TIME=3000
END=$(($(date +%s%N)/1000000+WAIT_FOR_USER_IDLE))
while [ $(($(date +%s%N)/1000000)) -lt $END ]
do
if [ $(xprintidle) -gt $USER_MIN_IDLE_TIME ]; then
eval "$* &"
PID=$!
#echo "background process started with pid = $PID"
while kill -0 $PID >/dev/null 2>&1
do
if [ $(xprintidle) -lt $USER_MIN_IDLE_TIME ]; then
kill $PID
echo "interrupt"
exit 1
fi
sleep 1
done
echo "success"
exit 0
fi
sleep 1
done
It takes all its arguments and executes them as a command once the user has been idle long enough. If the user interacts with the X server while the task is running, the task is killed.
One restriction: the task you run should not interact with the X server itself, otherwise it will reset the idle counter and be killed immediately after starting.
I wanted something like xprintidle, but it didn't work in my case (Ubuntu 21.10, Wayland).
I used the following solution to get the current idle value (time with no mouse/keyboard input):
dbus-send --print-reply --dest=org.gnome.Mutter.IdleMonitor /org/gnome/Mutter/IdleMonitor/Core org.gnome.Mutter.IdleMonitor.GetIdletime
It should return uint64 time in milliseconds. Example:
$ sleep 3; dbus-send --print-reply --dest=org.gnome.Mutter.IdleMonitor /org/gnome/Mutter/IdleMonitor/Core org.gnome.Mutter.IdleMonitor.GetIdletime
method return time=1644776247.028363 sender=:1.34 -> destination=:1.890 serial=9792 reply_serial=2
uint64 2942 # i.e. 2.942 seconds without input
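To use that value from a script, the reply has to be parsed. A small sketch that extracts the uint64 field from the output shown above (the helper name is mine):
# hypothetical helper: current idle time in milliseconds via Mutter's idle monitor
get_idle_ms() {
    dbus-send --print-reply \
        --dest=org.gnome.Mutter.IdleMonitor \
        /org/gnome/Mutter/IdleMonitor/Core \
        org.gnome.Mutter.IdleMonitor.GetIdletime \
    | awk '/uint64/ {print $2}'
}
With that in place, get_idle_ms can stand in for xprintidle in scripts like the one above.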
