Limit number of parallel jobs in bash [duplicate]

This question already has answers here:
Bash: limit the number of concurrent jobs? [duplicate]
(14 answers)
Closed 1 year ago.
I want to read links from a file, which is passed as an argument, and download the content from each.
How can I do it in parallel with 20 processes?
I understand how to do it with an unlimited number of processes:
#!/bin/bash
filename="$1"
mkdir -p saved
while read -r line; do
    url="$line"
    name_download_file_sha="$(echo "$url" | sha256sum | awk '{print $1}').jpeg"
    curl -L "$url" > "saved/$name_download_file_sha" &
done < "$filename"
wait

You can add this test:
until [ "$(jobs -lr 2>&1 | wc -l)" -lt 20 ]; do
    sleep 1
done
This waits until 19 or fewer jobs are running before starting another one, so it maintains a maximum of 20 instances of curl in parallel.
If you are using GNU sleep, you can use sleep 0.5 to shorten the wait time.
So your code would be:
#!/bin/bash
filename="$1"
mkdir -p saved
while read -r line; do
    until [ "$(jobs -lr 2>&1 | wc -l)" -lt 20 ]; do
        sleep 1
    done
    url="$line"
    name_download_file_sha="$(echo "$url" | sha256sum | awk '{print $1}').jpeg"
    curl -L "$url" > "saved/$name_download_file_sha" &
done < "$filename"
wait
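
If your bash is 4.3 or newer, a polling-free variant is wait -n, which blocks until any one background job exits. A minimal sketch of the same loop using it:
#!/bin/bash
filename="$1"
mkdir -p saved
while read -r url; do
    # Block until a slot frees up instead of sleeping and re-checking
    while (( $(jobs -r | wc -l) >= 20 )); do
        wait -n
    done
    name_download_file_sha="$(echo "$url" | sha256sum | awk '{print $1}').jpeg"
    curl -L "$url" > "saved/$name_download_file_sha" &
done < "$filename"
wait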

xargs -P is the simple solution. It gets somewhat more complicated when you want to save to separate files, but you can use sh -c to add this bit.
: "${processes:=20}"
< "$filename" xargs -P "$processes" -I% sh -c '
    url_file="$1"
    name_download_file_sha="$(echo "$url_file" | sha256sum | awk "{print \$1}").jpeg"
    curl -L "$url_file" > "saved/$name_download_file_sha"
' -- %
Based on triplee's suggestions, I've lower-cased the environment variable and changed its name to 'processes' to be more correct.
I've also made the suggested corrections to the awk script to avoid quoting issues.
You may still find it easier to replace the awk script with cut -f1, but you'll need to specify the cut delimiter if it's spaces (not tabs).
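For example, since sha256sum prints the digest first, followed by a space and the file name, splitting on the first space works (a sketch):
name_download_file_sha="$(echo "$url_file" | sha256sum | cut -d' ' -f1).jpeg"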

Related

2 Linux scripts nearly identical. Variables getting confused between two different scripts

I have two scripts. The only difference between them is the log file name and the device IP address each fetches its data from. The problem is that the continuously concatenated log file mixes up and starts writing the contents of one device onto the log of the other. So, one particular log file randomly switches from showing the data from one device to the other device.
Here is a sample of what it gets from the curl call.
{"method":"uploadsn","mac":"04786364933C","version":"1.35","server":"HT","SN":"267074DE","Data":[7.2]}
I'm 99% sure the issue is with the log variable. One script runs every 30 minutes and one script runs every 15 minutes, so I can tell by the date stamps that the issue is not from fetching from the wrong device, but from the concatenating of the files. It appears to concat the wrong file to the new file.
Here is the code of both.
#!/bin/bash
log="/scripts/cellar.log"
if [ ! -f "$log" ]
then
    touch "$log"
fi
now=`date +%a,%m/%d/%Y#%I:%M%p`
json=$(curl -m 3 --user *****:***** "http://192.168.1.146/monitorjson" --silent --stderr -)
celsius=$(echo $json | cut -d "[" -f2 | cut -d "]" -f1)
temp=$(echo "scale=4; $celsius*1.8 + 32" | bc)
line=$(echo $now : $temp)
echo $line
echo $line | cat - $log > temp && mv temp $log | sed -n '1,192p' $log
and here is the second
#!/bin/bash
log="/scripts/gh.log"
if [ ! -f "$log" ]
then
    touch "$log"
fi
now=`date +%a,%m/%d/%Y#%I:%M%p`
json=$(curl -m 3 --user *****:***** "http://192.168.1.145/monitorjson" --silent --stderr -)
celsius=$(echo $json | cut -d "[" -f2 | cut -d "]" -f1)
temp=$(echo "scale=4; $celsius*1.8 + 32" | bc)
line=$(echo $now : $temp)
#echo $line
echo $line | cat - $log > temp && mv temp $log | sed -n '1,192p' $log
Example of bad log file (shows contents of both devices when it should only contain one):
Mon,11/28/2022#03:30AM : 44.96
Mon,11/28/2022#03:00AM : 44.96
Mon,11/28/2022#02:30AM : 44.96
Tue,11/29/2022#02:15AM : 60.62
Tue,11/29/2022#02:00AM : 60.98
Tue,11/29/2022#01:45AM : 60.98
The problem is that you use "temp" as the filename for a temporary file in both scripts. Since both run from the same working directory, each one's mv can clobber the other's log when they run at the same time.
I'm not good at understanding sed, but as I read it, your command prints only the first 192 lines of the logfile. You don't need a temporary file for that.
First: logfiles are usually written from oldest to newest entry (top to bottom), so you probably want to view the 192 newest lines? Then you can make use of the >> output redirection to append your output to the file, and use tail to get only the bottom of the file. If necessary, you could reverse that final output.
That last line of your script would then be replaced by:
sed -i '1i '"$line"'
192,$d' $log
Further possible improvements:
Use a single script that gets the URL and log filename as parameters (see the sketch below)
Use the usual log file order (newest entries appended at the end)
Don't truncate log files inside the script, but use logrotate to keep them from exceeding a certain filesize
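
A minimal sketch of such a parameterized script (the name templog.sh is hypothetical, and the credentials stay redacted as in the question):
#!/bin/bash
# Usage: templog.sh <device-ip> <logfile>
ip="$1"
log="$2"
now=$(date "+%a,%m/%d/%Y#%I:%M%p")
json=$(curl -m 3 --user "*****:*****" "http://$ip/monitorjson" --silent --stderr -)
celsius=$(echo "$json" | cut -d "[" -f2 | cut -d "]" -f1)
temp=$(echo "scale=4; $celsius*1.8 + 32" | bc)
echo "$now : $temp" >> "$log"    # append: newest entry at the end
Each caller (e.g. each cron entry) then passes its own address and log, say templog.sh 192.168.1.146 /scripts/cellar.log, so the two devices can never share a temporary file.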

Using ssh inside a script to run another script that itself calls ssh

I'm trying to write a script that builds a list of nodes, then sshes into the first node of that list
and runs a checknodes.sh script, which itself is just a for i loop that calls checknode.sh.
The first 2 lines seem to work OK and the list builds successfully, but then I either get just the echo line of checknodes.sh printed, or an error saying cat: gpcnodes.txt: No such file or directory
MYSCRIPT.sh:
#gets the master node for the job
MASTERNODE=`qstat -t -u \* | grep $1 | awk '{print$8}' | cut -d'#' -f 2 | cut -d'.' -f 1 | sed -e 's/$/.com/' | head -n 1`
#builds list of nodes in job
ssh -qt $MASTERNODE "qstat -t -u \* | grep $1 | awk '{print$8}' | cut -d'#' -f 2 | cut -d'.' -f 1 | sed -e 's/$/.com/' > /users/issues/slow_job_starts/gpcnodes.txt"
ssh -qt $MASTERNODE cd /users/issues/slow_job_starts/
ssh -qt $MASTERNODE /users/issues/slow_job_starts/checknodes.sh
checknodes.sh
for i in `cat gpcnodes.txt`
do
    echo "### $i ###"
    ssh -qt $i /users/issues/slow_job_starts/checknode.sh
done
checknode.sh
str=`hostname`
cd /tmp
time perf record qhost >/dev/null 2>&1 | sed -e 's/^/${str}/'
perf report --pretty=raw | grep % | head -20 | grep -c kernel.kallsyms | sed -e "s/^/`hostname`:/"
When ssh -qt $MASTERNODE cd /users/issues/slow_job_starts/ is finished, the changed directory is lost.
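One way around that is to run the cd and the script in a single remote command, so both happen in the same remote shell, e.g. (a sketch):
ssh -qt "$MASTERNODE" 'cd /users/issues/slow_job_starts && ./checknodes.sh'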
With the backquotes replaced by $(..) (not an error here, but get used to it), the script would be something like
for i in $(cat /users/issues/slow_job_starts/gpcnodes.txt)
do
    echo "### $i ###"
    ssh -nqt $i /users/issues/slow_job_starts/checknode.sh
done
or better
while read -r i; do
    echo "### $i ###"
    ssh -nqt $i /users/issues/slow_job_starts/checknode.sh
done < /users/issues/slow_job_starts/gpcnodes.txt
Perhaps you would also like to change your last script (start it with cd /users/issues/slow_job_starts).
You will find more problems, like sed -e 's/^/${str}/' (the ${str} inside single quotes won't be replaced by the hostname), but this should get you started.
EDIT:
I added the option -n to the ssh call. It redirects stdin from /dev/null (actually, it prevents ssh from reading stdin).
Without this option, ssh consumes the rest of the input that the while loop is reading, so only one node is checked.

BASH script : Integrated document creation hangs

I've found that a piece of my bash script causes the hang-up. I've extracted it here:
#!/bin/bash
cat << EndOfFspreadFile >> ./myscript.sh
echo Enter Source Path :
read SRCPATH
FILECNT=`find $SRCPATH/* 2>/dev/null | wc -l`
FILECNTERR=`find $SRCPATH/* 2>&1 | grep "find:" | wc -l`
echo count : $FILECNT
echo problems : $FILECNTERR
EndOfFspreadFile
echo done
This script is expected to just append the heredoc block to the myscript.sh file. But it just HANGS!
Thanks!
- Mohamed -
Your $ variables and backquotes get expanded while the heredoc is being written. You need to escape them in the script.
Right now you end up searching the entire filesystem: since SRCPATH is empty at that point,
find $SRCPATH/* 2>/dev/null | wc -l gets executed as find /* 2>/dev/null | wc -l
Here is how you can rewrite it (just a one-line example):
FILECNT=\$(find \$SRCPATH/* 2>/dev/null | wc -l)
By the way, it's easy to see this for yourself if you run bash -x <your script>.
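Alternatively, if you don't want anything expanded while the heredoc is written out, you can quote the delimiter; the block is then copied verbatim and no escaping is needed. A sketch of the same script with a quoted delimiter:
#!/bin/bash
# Quoting the delimiter disables expansion inside the heredoc,
# so the block below is appended to myscript.sh exactly as written.
cat << 'EndOfFspreadFile' >> ./myscript.sh
echo Enter Source Path :
read SRCPATH
FILECNT=`find $SRCPATH/* 2>/dev/null | wc -l`
FILECNTERR=`find $SRCPATH/* 2>&1 | grep "find:" | wc -l`
echo count : $FILECNT
echo problems : $FILECNTERR
EndOfFspreadFile
echo done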

bash - errors trying to pipe commands to run to separate function

I'm trying to get this function for parallelizing my bash scripts to work. The idea is simple: instead of running each command sequentially, I pipe the commands I want to run into this function, and it does the while read line, runs the jobs in the background for me, and takes care of the logistics... it doesn't work, though. I added set -x around where things get executed, and it looks like I'm getting weird quotes around the stuff I want executed. What should I do?
runParallel () {
    while read line
    do
        while [ "`jobs | wc -l`" -eq 8 ]
        do
            sleep 2
        done
        {
            set -x
            ${line}
            set +x
        } &
    done
    while [ "`jobs | wc -l`" -gt 0 ]
    do
        sleep 1
        jobs >/dev/null 2>/dev/null
        echo sleeping
    done
}
for H in `ypcat hosts | grep fmez | grep -v mgmt | cut -d\ -f2 | sort -u`
do
    echo 'ping -q -c3 $H 2>/dev/null 1>/dev/null && echo $H - UP || echo $H - DOWN'
done | runParallel
When I run it, I get output like the following:
> ./myscript.sh
+ ping -q -c3 '$H' '2>/dev/null' '1>/dev/null' '&&' echo '$H' - UP '||' echo '$H' - DOWN
Usage: ping [-LRUbdfnqrvVaA] [-c count] [-i interval] [-w deadline]
[-p pattern] [-s packetsize] [-t ttl] [-I interface or address]
[-M mtu discovery hint] [-S sndbuf]
[ -T timestamp option ] [ -Q tos ] [hop1 ...] destination
+ set +x
sleeping
>
The quotes in the set -x output are not the problem; at most they are another symptom of it. The main problem is that ${line} is not the same as eval ${line}.
When a variable is expanded, the resulting words are not treated as shell reserved constructs. And this is expected, it means that eg.
A="some text containing > ; && and other weird stuff"
echo $A
does not shout about invalid syntax but prints the variable value.
But in your function it means that all the words in ${line}, including 2>/dev/null and the like, are passed as arguments to ping, which the set -x output nicely shows, and so ping complains.
If you want to execute from variables complicated commandlines with redirections and conditionals, you will have to use eval.
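In this function that would mean something like (a sketch):
{
    set -x
    eval "${line}"    # eval re-parses the line, so 2>/dev/null, && and || take effect
    set +x
} &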
If I'm understanding this correctly, you probably don't want single quotes in your echo command. Single quotes produce literal strings and don't interpolate your bash variable $H.
Like many users of GNU Parallel you seem to have written your own parallelizer.
If you have GNU Parallel http://www.gnu.org/software/parallel/ installed you can do this:
cat hosts | parallel -j8 'ping -q -c3 {} 2>/dev/null 1>/dev/null && echo {} - UP || echo {} - DOWN'
You can install GNU Parallel simply by:
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
Watch the intro videos for GNU Parallel to learn more:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Put your command in an array.
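For example (a sketch; note that redirections and && / || have to stay outside the array):
cmd=(ping -q -c3 "$H")    # each word is one array element
if "${cmd[@]}" >/dev/null 2>&1; then
    echo "$H - UP"
else
    echo "$H - DOWN"
fi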

newbie in bash scripting assistance please

I run bash scripts from time to time on my servers. I am trying to write a script that monitors log folders and compresses log files if the folder exceeds a defined capacity. I know there are better ways of doing what I am currently trying to do; your suggestions are more than welcome. The script below is throwing the error "unexpected end of file". Below is my script.
dir_base=$1
size_ok=5000000
cd $dir_base
curr_size=`du -s -D | awk '{print $1}' | sed 's/%//g'`
zipname=archive`date +%Y%m%d`
if (( $curr_size > $size_ok ))
then
    echo "Compressing and archiving files, Logs folder has grown above 5G"
    echo "oldest to newest selected."
    targfiles=( `ls -1rt` )
    echo "Process files."
    for tfile in ${targfiles[@]}
    do
        let `du -s -D | awk '{print $1}' | sed 's/%//g' | tail -1`
        if [ $curr_size -lt $size_ok ];
        then
            echo "$size_ok has been reached. Stopping processes"
            break
        else if [ $curr_size -gt $size_ok ];
        then
            zip -r $zipname $tfile
            rm -f $tfile
            echo "Added '$tfile' to archive`date +%Y%m%d`.zip and removed"
        else [ $curr_size -le $size_ok ];
            echo "files in $dir_base are less than 5G, not archiving"
        fi
Look into logrotate. Here is an example of putting it to use.
With what you give us, you lack a "done" to end the for loop and a "fi" to end the main if. Please reformat your code and you will get more precise answers...
EDIT:
Looking at your reformatted script, it is as said: the "unexpected end of file" comes from the fact that you have not closed your "for" loop nor your "if".
As it seems that you're mimicking the logrotate behaviour, check it out as suggested by @Hank...
my2c
My du -s -D does not show a % sign, so you can just do:
curr_size=$(du -s -D)
set -- $curr_size
curr_size=$1
saves you a few overheads instead of du -s -D | awk '{print $1}' | sed 's/%//g'.
If it does show a % sign, you can get rid of it like this:
du -s -D | awk '{print $1+0}'
No need to use sed.
Use $() syntax instead of backticks whenever possible
For targfiles=( `ls -1rt` ), you can omit the -1. So it can be
targfiles=( $(ls -rt) )
Use quotes around your variables whenever possible, e.g. "$zipname", "$tfile".
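
Pulling those suggestions together, a rough sketch of the whole script (same logic as the question, untested):
#!/bin/bash
dir_base="$1"
size_ok=5000000
cd "$dir_base" || exit 1
zipname="archive$(date +%Y%m%d)"
curr_size=$(du -s -D | awk '{print $1+0}')
if (( curr_size > size_ok )); then
    echo "Compressing and archiving files, Logs folder has grown above 5G"
    # oldest to newest; note that word-splitting ls output breaks on filenames with spaces
    for tfile in $(ls -rt); do
        curr_size=$(du -s -D | awk '{print $1+0}')
        if (( curr_size < size_ok )); then
            echo "$size_ok has been reached. Stopping"
            break
        fi
        zip -r "$zipname" "$tfile" && rm -f "$tfile"
        echo "Added '$tfile' to $zipname.zip and removed"
    done
else
    echo "files in $dir_base are less than 5G, not archiving"
fi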
