How to dispatch tasks in Linux when the system is not busy

I'm using a 12-core, 24-thread Linux machine to perform the following task. Each task is independent.
while read parameter
do
./program_a $parameter > $parameter.log 2>&1 &
done < parameter_file
However, this code dispatches all the tasks at once, which could lead to excessive context switching, a lack of idle CPU capacity, and/or a lack of memory.
I want to use system-information tools such as free, top, and ps to decide whether a task should be dispatched.
Using free.
while read parameter
do
#for instance using free
free_mem=`free -g | grep Mem | awk '{print $4}'`
if [ "$free_mem" -gt 10 ]; then
./program_a $parameter > $parameter.log 2>&1 &
fi
done < parameter_file
But this won't work, because it doesn't wait until the condition is met; it simply skips the task when memory is low. How should I do this?
Besides, how should I use top and ps to determine whether the system is busy? I don't want to dispatch a new task when the system is too busy.
Maybe I can use
ps aux | grep "program_a " | grep -v "grep" | wc -l
to limit the number of dispatched tasks. But that is only an implicit way to determine whether the system is busy. Any other thoughts?

while read parameter
do
#for instance using free
while true; do
free_mem=`free -g | awk '/Mem/{print $4}'`
if (( $free_mem > 10 )); then
break
fi
sleep 1 # wait so that some tasks might finish
done
./program_a $parameter > $parameter.log 2>&1 &
done < parameter_file
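Bash's own job control offers another way to "wait until some task finishes" without polling free at all. The following is a minimal sketch, assuming bash >= 4.3 (for wait -n); the function name dispatch_throttled and its arguments are illustrative, not from the question:

```shell
# Sketch: cap the number of concurrent background tasks with job control
# instead of polling free(1). Requires bash >= 4.3 for `wait -n`.
dispatch_throttled() {
  # usage: dispatch_throttled MAX_JOBS PARAM_FILE COMMAND
  local max_jobs=$1 param_file=$2 cmd=$3 param
  while read -r param; do
    while (( $(jobs -rp | wc -l) >= max_jobs )); do
      wait -n    # block until any background task exits
    done
    "$cmd" "$param" > "$param.log" 2>&1 &
  done < "$param_file"
  wait           # drain the remaining tasks
}
```

With the question's loop this would be called as dispatch_throttled 12 parameter_file ./program_a, keeping the per-task log redirection.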

Related

Shutdown computer when all instances of a given program have finished

I use the following script to check whether wget has finished downloading. It looks for wget's PID, and when the PID is not found, the computer shuts down. This works fine for a single instance of wget; however, I'd like the script to watch all currently running wget processes.
#!/bin/bash
while kill -0 $(pidof wget) 2> /dev/null; do
for i in '-' '/' '|' '\'
do
echo -ne "\b$i"
sleep 0.1
done
done
poweroff
EDIT: It would be great if the script checked that at least one instance of wget is running, and only then waited for wget to finish and shut the computer down.
In addition to the other answers, you can satisfy your check for at least one wget pid by initially reading the result of pidof wget into an array, for example:
pids=($(pidof wget))
if ((${#pids[@]} > 0)); then
# do your loop
fi
This also brings up a way to routinely monitor the remaining pids as each wget operation completes, for example:
npids=${#pids[@]} ## save original number of pids
while (( ${#pids[@]} > 0 )); do ## while pids remain
for ((i = 0; i < npids; i++)); do ## loop, checking remaining pids
[[ ${pids[i]} ]] && { kill -0 "${pids[i]}" 2>/dev/null || unset 'pids[i]'; } ## unset pids that have exited
done
## do your sleep and spin
done
poweroff
There are probably many more ways to do it. This is just one that came to mind.
I don't think kill is the right idea; maybe something along these lines:
while true
do
live_wgets=$(ps -ef | grep '[w]get' | wc -l) # the [w] keeps grep from counting itself
if test $live_wgets -eq 0; then # shutdown
sudo poweroff; # or whatever suits
fi
sleep 5; # wait for some time
done
You can adapt your script in the following way:
#!/bin/bash
spin[0]="-"
spin[1]="\\"
spin[2]="|"
spin[3]="/"
DOWNLOAD=`ps -ef | grep wget | grep -v grep`
while [ -n "$DOWNLOAD" ]; do
for i in "${spin[@]}"
do
DOWNLOAD=`ps -ef | grep wget | grep -v grep`
echo -ne "\b$i"
sleep 0.1
done
done
sudo poweroff
However, I would recommend using cron instead of an active-waiting approach, or even using wait
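For reference, here is a polling variant without kill. This is my own sketch, not from the answers: it uses pgrep from procps (swap in pidof if you prefer), returns non-zero immediately if no instance exists (covering the EDIT in the question), and leaves the poweroff decision to the caller. The interval argument is an assumed knob.

```shell
# Sketch: only start waiting if at least one instance exists; then poll
# until all instances are gone (pgrep is from procps).
wait_for_exit() {
  local name=$1 interval=${2:-5}
  pgrep -x "$name" > /dev/null || return 1   # nothing running: don't shut down
  while pgrep -x "$name" > /dev/null; do
    sleep "$interval"
  done
}
# usage: wait_for_exit wget && sudo poweroff
```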
How to wait in bash for several subprocesses to finish and return exit code !=0 when any subprocess ends with code !=0?

bash script for memory and cpu usage [duplicate]

This question already has answers here:
How do I get the total CPU usage of an application from /proc/pid/stat?
(6 answers)
Closed 7 years ago.
I need to write script which returns file with information about cpu and memory usage of specified process through given time period.
When I use ps -p pid I only get the usage of one CPU core, and when I use top I get unreadable (terminal-formatted) output. I tried this:
while :; do
top -n 1 -p pid | awk '{ print $9" "$10 }'
sleep 10;
done
The information the kernel offers about your process is in the /proc filesystem. Primarily, you would need to parse these two files to get the pertinent data for your script:
/proc/(pid)/status
/proc/(pid)/stat
This thread describes getting the CPU data in detail, so I won't repeat it here.
The problem, I think you'll find, is that CPU usage for a process is not broken down by core; rather, it is summarized into a number that approaches 100% × (number of cores). The closest thing is top's "last used processor" column (options f, J), though this hardly addresses your problem. A profiling tool like the one in this thread is likely the real answer.
I don't know your environment or requirements; however, a solution could be to run only that process, isolated on a machine, then collect each core's CPU usage at the system level, loosely representing the process's demand.
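To make the /proc route concrete, here is a small sketch of mine (not from the answer): it samples utime+stime from /proc/(pid)/stat over a one-second window and reads VmRSS from /proc/(pid)/status. It assumes Linux and a process name without spaces (a name with spaces would shift the stat fields the awk script counts on).

```shell
proc_cpu_percent() {
  # %CPU of one pid over a 1 s window; can exceed 100 for multi-threaded
  # processes, matching the 100% * (number of cores) convention above
  local pid=$1 hz t1 t2
  hz=$(getconf CLK_TCK)                            # ticks per second, typically 100
  t1=$(awk '{print $14 + $15}' "/proc/$pid/stat")  # utime + stime
  sleep 1
  t2=$(awk '{print $14 + $15}' "/proc/$pid/stat")
  echo $(( 100 * (t2 - t1) / hz ))
}
proc_mem_percent() {
  # resident set size against total memory, both reported in kB
  local pid=$1 rss total
  rss=$(awk '/^VmRSS/{print $2}' "/proc/$pid/status")
  total=$(awk '/^MemTotal/{print $2}' /proc/meminfo)
  echo $(( 100 * rss / total ))
}
```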
Try this:
PID=24748; while true; do CPU=$(ps H -q $PID -eo "pid %cpu %mem" | grep $PID | cut -d " " -f 3 | sed -e 's/$/\+/' | tr -d "\n" | sed -e 's/+$/\n/' | bc) && MEM=$(ps H -q $PID -eo "pid %cpu %mem" | grep $PID | cut -d " " -f 4 | head -n 1) && echo $PID $CPU $MEM && sleep 3; done
It basically just adds up the CPU % usage of each thread with bc, takes the memory usage (which is already per-process), and prints both for the specified PID.

Bash script optimization for waiting for a particular string in log files

I am using a bash script that calls multiple processes which have to start up in a particular order, and certain actions have to be completed (they then print out certain messages to the logs) before the next one can be started. The bash script has the following code which works really well for most cases:
tail -Fn +1 "$log_file" | while read line; do
if echo "$line" | grep -qEi "$search_text"; then
echo "[INFO] $process_name process started up successfully"
pkill -9 -P $$ tail
return 0
elif echo "$line" | grep -qEi '^error\b'; then
echo "[INFO] ERROR or Exception is thrown listed below. $process_name process startup aborted"
echo " ($line) "
echo "[INFO] Please check $process_name process log file=$log_file for problems"
pkill -9 -P $$ tail
return 1
fi
done
However, when we set the processes to print logging in DEBUG mode, they produce so much output that this script cannot keep up, and it takes about 15 minutes after the process completes for the bash script to catch up. Is there a way to optimize this, like changing 'while read line' to something like 'while read 100 lines'?
How about not forking up to two grep processes per log line?
tail -Fn +1 "$log_file" | grep -Ei "$search_text|^error\b" | while read line; do
So one long-running grep process does the preprocessing, if you will.
EDIT: As noted in the comments, it is safer to add --line-buffered to the grep invocation.
Some tips relevant for this script:
Checking that the service is doing its job is a much better check for daemon startup than looking at the log output
You can use grep ... <<<"$line" to avoid spawning extra echo | grep processes.
You can use tail -f | grep -q ... to avoid the while loop by stopping as soon as there's a matching line.
If you can avoid -i on grep it might be significantly faster to process the input.
Thou shalt not kill -9.
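Putting the single-grep tip together with the "stop at the first match" tip gives a loop-free version. This is a sketch of mine, not the poster's script; it assumes GNU tail and grep (for -F, -m1, --line-buffered, \b) and mirrors the original's pkill cleanup of the leftover tail:

```shell
watch_log() {
  # usage: watch_log LOGFILE SUCCESS_REGEX
  # returns 0 when a success line appears, 1 on a line starting with "error"
  local log=$1 ok=$2 line
  # -m1 makes grep exit at the first matching line; no per-line shell loop
  line=$(grep -Eim1 --line-buffered "$ok|^error\b" <(tail -Fn +1 "$log"))
  pkill -P $$ tail 2>/dev/null || true   # reap the leftover tail, as before
  [[ ! $line =~ ^[Ee][Rr][Rr][Oo][Rr] ]]
}
```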

How to (trivially) parallelize with the Linux shell by starting one task per Linux core?

Today's CPUs typically comprise several physical cores. These may even be multi-threaded, so that the Linux kernel sees quite a large number of logical cores and runs its scheduler on each of them. When running multiple tasks on a Linux system, the scheduler normally achieves a good distribution of the total workload across all cores (some of which might share the same physical core).
Now, say, I have a large number of files to process with the same executable. I usually do this with the "find" command:
find <path> <option> <exec>
However, this starts just one task at a time and waits for its completion before starting the next. Thus only one core is in use at any moment, leaving the majority of the cores idle (if this find command is the only thing running on the system). It would be much better to launch N tasks at the same time, where N is the number of cores seen by the Linux kernel.
Is there a command that would do that ?
Use find with the -print0 option. Pipe it to xargs with the -0 option. xargs also accepts the -P option to specify a number of processes. -P should be used in combination with -n or -L.
Read man xargs for more information.
An example command:
find . -print0 | xargs -0 -P4 -n4 grep searchstring
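To tie -P to the actual core count, nproc (GNU coreutils) can supply N. The following sketch runs one md5sum per file; the scratch directory and file names are illustrative, chosen only to keep the example self-contained:

```shell
# Sketch: one process per logical core, with the count taken from nproc
demo=$(mktemp -d)
touch "$demo"/one "$demo"/two "$demo"/three
find "$demo" -type f -print0 | xargs -0 -n1 -P"$(nproc)" md5sum
```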
If you have GNU Parallel http://www.gnu.org/software/parallel/ installed you can do this:
find | parallel do stuff {} --option_a\; do more stuff {}
You can install GNU Parallel simply by:
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
Watch the intro videos for GNU Parallel to learn more:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Gnu parallel or xargs -P is probably a better way to handle this, but you can also write a sort-of multi-tasking framework in bash. It's a little messy and unreliable, however, due to the lack of certain facilities.
#!/bin/bash
MAXJOBS=3
CJ=0
SJ=""
gj() {
echo ${1//[][-]/}
}
endj() {
trap "" sigchld
ej=$(gj $(jobs | grep Done))
jobs %$ej
wait %$ej
CJ=$(( $CJ - 1 ))
if [ -n "$SJ" ]; then
kill $SJ
SJ=""
fi
}
startj() {
j=$*
while [ $CJ -ge $MAXJOBS ]; do
sleep 1000 &
SJ=$!
echo too many jobs running: $CJ
echo waiting for sleeper job [$SJ]
trap endj sigchld
wait $SJ 2>/dev/null
done
CJ=$(( $CJ + 1 ))
echo $CJ jobs running. starting: $j
eval "$j &"
}
set -m
# test
startj sleep 2
startj sleep 10
startj sleep 1
startj sleep 1
startj sleep 1
startj sleep 1
startj sleep 1
startj sleep 1
startj sleep 2
startj sleep 10
wait

setting a variable on completion of an bg task in ubuntu linux 11.10 64bit shell

so I have this code:
uthreads=4
x=1
cd ./wxWidgets/wxWidgets/build-win
for upxs in $(ls ./lib/*.dll)
do
while [ true ] ; do
if [ $x -lt $uthreads ] ; then
x=$((x+1))
(upx -qqq --best --color $upxs; x=$((x-1))) &
break
fi
sleep 1
done
done
x=$((x=1))
The problem lies in the variable being modified inside the parentheses. Naturally this does not work as intended, since the variable never gets back to the parent shell. So my question is: how do you do this? The variable must be incremented after the upx command finishes, regardless of its exit status, so a simple && or || won't work here. And you can't use a single & after upx, because then the subshell runs in the background and changes the variable immediately...
So I'm effectively stuck and could use some help...
First of all, while [ true ] is wrong: true is a command, and so is [; you don't need both here. Just do while true; do (and use [[ in bash; [ is only for POSIX sh).
This, for upxs in $(ls ./lib/*.dll), is also wrong, and it's more serious; I'll give you some links with information about all that, but for now just do for upxs in ./lib/*.dll.
Another important mistake is that you're not quoting your variables. EVERY SINGLE VARIABLE MUST BE DOUBLE QUOTED; if there's anything you learn from this post, please let it be that.
Now, regarding your particular problem: the whole approach is wrong. You can't do what you're trying to do, because you don't know which process will end first, and you can only wait for a particular process or for all of them.
Also notice you're running four processes with the same file; I'm sure your intention was to process 4 different files in parallel. And you don't wait for them to finish either; chances are that after your script ends, your processes get killed.
the suboptimal but easy solution would be to have 4 processes running in parallel and wait for all of them to finish before starting 4 more, that is accomplished by this (which is similar to your current approach):
#!/bin/bash
uthreads=4
cd ./wxWidgets/wxWidgets/build-win
upxs=(lib/*.dll)
i=0
while ((i<${#upxs[@]})); do
for ((x=0; x<uthreads; x++)); do
[[ ${upxs[i+x]} ]] || break
upx -qqq --best --color "${upxs[i+x]}" &
done
i=$((i+x))
wait
done
now, this might not be enough to satisfy you :) so the way to implement 4 running processes at all times would be this (using PIDs):
#!/bin/bash
uthreads=4
cd ./wxWidgets/wxWidgets/build-win
upxs=(lib/*.dll)
i=0 pids=()
while [[ ${upxs[i]} ]]; do
while ((${#pids[@]}<uthreads)); do
upx -qqq --best --color "${upxs[i]}" & pids+=($!)
[[ ${upxs[++i]} ]] || break
done
sleep .1
pids=($(for pid in "${pids[@]}"; do kill -0 $pid 2>/dev/null && echo $pid; done))
done
wait
now here's the links I promised:
http://mywiki.wooledge.org/BashPitfalls#if_.5Bgrep_foo_myfile.5D
http://mywiki.wooledge.org/BashFAQ/001
http://mywiki.wooledge.org/Quotes
http://mywiki.wooledge.org/ProcessManagement
Assuming that you're trying to 'gate' the activity of a program by controlling how many copies of it are running in the background, check out the xargs command. My system doesn't have upx installed, so the best I can recommend as a code sample is
uthreads=4
cd ./wxWidgets/wxWidgets/build-win
ls ./lib/*.dll | xargs -I{} -P $uthreads upx -qqq --best --color {}
where:
ls ./lib/*.dll      -> your list
-I{}                -> xargs option indicating which string to replace
-P $uthreads        -> xargs option for the number of processes
upx                 -> your command
-qqq --best --color -> your command's options
{}                  -> item from your list
Use xargs --help to see all the options, man xargs (if you're lucky) for in-depth usage details, and search the web and this site for more examples. The Wikipedia article is a good start.
I hope this helps