How to invoke while statement multiple times in parallel using xargs max-procs utility - linux

I have a file called test.txt in Linux. The contents are below.
test.txt
['table1']
['table2']
['table3']
['table4']
['table5']
and so on
Now I have a while statement that loops over the test.txt file does some jobs.
While statement is below:
while read -r line; do
table=${line:2:-2}
validateTable=$(hive --database history -e "SHOW TABLES LIKE '$table'")
if [[ -z $validateTable ]]; then
/home/"$USER"/import.py "${table}"
else
/home/"$USER"/append.py "${table}"
fi
done < < test.txt
In this while statement validateTable is to check whether table is present in hive or not.
If not present then it will invoke import.py script
if present it will invoke append.py script.
Now the while statement works fine. I am getting the expected result.
Requirement:
What I require is this I want to invoke the while statement in parallel. I mean I want the while to run 10 times at the same time.
What is the best possible solution.
I found that we can do it with xargs --max-procs option but cannot figure out how to use it.

xargs requires a single command to run, which means you would need to put the body of the loop into a shell script. Something like (call it myscript)
#!/bin/bash
table=${1:2:-2}
validateTable=$(hive --database history -e "SHOW TABLES LIKE '$table'")
if [[ -z $validateTable ]]; then
/home/"$USER"/import.py "${table}"
else
/home/"$USER"/append.py "${table}"
fi
Then run something like
xargs --max-procs 10 myscript < test.txt

Related

Automatically print command output from a separate file

Is it possible to have the output of commands separate_file.sh be visible to the caller of caller.sh automatically with the following setup?
I'm aware of adding >&2 to each of the commands in separate_file.sh, but I'm looking for a more automatic solution.
I'm also aware that the output is visible if I call separate_file.sh like so: separate_file.sh instead of $(separate_file.sh), but I'd like to preserve the -e option in caller.sh (it's used for other calls).
caller.sh:
#!/bin/bash -e
if [[ ! $(separate_file.sh) ]]; then
echo 'separate_file.sh failed'
fi
separate_file.sh:
#!/bin/bash
echo '1'
ls -la

Shell - iterate over content of file but do something only the first x lines

So guys,
I need your help trying to identify the fastest and the most "fault" tolerant solution to my problem.
I have a shell script which executes some functions, based on a txt file, in which I have a list of files.
The list can contain from 1 file to X files.
What I would like to do is iterate over the content of the file and execute my scripts for only 4 items out of the file.
Once the functions have been executed for these 4 files, go over to the next 4 .... and keep on doing so until all the files from the list have been "processed".
My code so far is as follows.
#!/bin/bash
number_of_files_in_folder=$(cat list.txt | wc -l)
max_number_of_files_to_process=4
Translated_files=/home/german_translated_files/
while IFS= read -r files
do
while [[ $number_of_files_in_folder -gt 0 ]]; do
i=1
while [[ $i -le $max_number_of_files_to_process ]]; do
my_first_function "$files" & # I execute my translation function for each file, as it can only perform 1 file per execution
find /home/german_translator/ -name '*.logs' -exec mv {} $Translated_files \; # As there will be several files generated, I have them copied to another folder
sed -i "/$files/d" list.txt # We remove the processed file from within our list.txt file.
my_second_function # Without parameters as it will process all the files copied at step 2.
done
# here, I want to have all the files processed and don't stop after the first iteration
done
done < list.txt
Unfortunately, as I am not quite good at shell scripting, I do not know how to structure it so that it won't waste any resources and mostly, to make sure that it "processes" everything from that file.
Do you have any advice on how to achieve what I am trying to achieve?
only 4 items out of the file. Once the functions have been executed for these 4 files, go over to the next 4
Seems to be quite easy with xargs.
your_function() {
echo "Do something with $1 $2 $3 $4"
}
export -f your_function
xargs -d '\n' -n 4 bash -c 'your_function "$#"' _ < list.txt
xargs -d '\n' for each line
-n 4 take for arguments
bash .... - run this command with 4 arguments
_ - the syntax is bash -c <script> $0 $1 $2 etc..., see man bash.
"$#" - forward arguments
export -f your_function - export your function to environment so child bash can pick it up.
I execute my translation function for each file
So you execute your translation function for each file, not for each 4 files. If the "translation function" is really for each file with no inter-file state, consider rather executing 4 processes in parallel with same code and just xargs -P 4.
If you have GNU Parallel it looks something like this:
doit() {
my_first_function "$1"
my_first_function "$2"
my_first_function "$3"
my_first_function "$4"
my_second_function "$1" "$2" "$3" "$4"
}
export -f doit
cat list.txt | parallel -n4 doit

Why is a part of the code inside a (False) if statement executed?

I wrote a small script which:
prints the content of a file (generated by another application) on paper with a matrix printer
prints the same line into a backup file
removes the original file.
The script runs every minute by a cronjob and works fine as long as there are files to print. If there are no files to print, it prints an empty line on the matrix printer and in the backup file. I don't understand why this happens as i implemented an if statement which checks if there is a file to print before the print command is executed. This behaviour only happens if the script is executed by the cron and not if i execute it manually with ./script.sh. What's the reason of this? and how can i solve it?
Something i noticed on the side is that if I place an echo "hi" command in the script, its printed to the matrix printer and the backup file. I expected that its printed to the console console when it has no >> something behind. How does this work?
The script:
#!/bin/bash
# Make sure the backup directory exists
if [ ! -d /home/user/backup_logprint ]
then
mkdir /home/user/backup_logprint
fi
# Print the records if there are any
date=`date +%Y-%m-%d`
filename='_logprint_backup'
printer_path="/dev/usb/lp0"
if [ `ls /tmp/ | grep logprint | wc -l` -gt 0 ]
then
for f in `ls /tmp | grep logprint`
do
echo `cat /tmp/$f` >> "/home/user/backup_logprint/$date$filename"
echo `cat /tmp/$f` >> $printer_path
rm "/tmp/$f"
done
fi
There's no need for ls or an if statement. Just use a proper glob in the for loop, and if no file match, the loop won't be entered.
#!/bin/bash
# Don't check first; just let mkdir decide if
# anything actually needs to be created.
d=/home/user/backup_logprint
mkdir -p "$d"
filename=$(date +"$d/%Y-%m-%d_logprint_backup")
printer_path="/dev/usb/lp0"
# Cause non-matching globs to expand to an empty
# sequence instead of being treated literally.
shopt -s nullglob
for f in /tmp/*logprint*; do
cat "$f" > "$printer_path" && mv "$f" "$d"
done

bash script loop breaks [duplicate]

I have the following shell script. The purpose is to loop thru each line of the target file (whose path is the input parameter to the script) and do work against each line. Now, it seems only work with the very first line in the target file and stops after that line got processed. Is there anything wrong with my script?
#!/bin/bash
# SCRIPT: do.sh
# PURPOSE: loop thru the targets
FILENAME=$1
count=0
echo "proceed with $FILENAME"
while read LINE; do
let count++
echo "$count $LINE"
sh ./do_work.sh $LINE
done < $FILENAME
echo "\ntotal $count targets"
In do_work.sh, I run a couple of ssh commands.
The problem is that do_work.sh runs ssh commands and by default ssh reads from stdin which is your input file. As a result, you only see the first line processed, because the command consumes the rest of the file and your while loop terminates.
This happens not just for ssh, but for any command that reads stdin, including mplayer, ffmpeg, HandBrakeCLI, httpie, brew install, and more.
To prevent this, pass the -n option to your ssh command to make it read from /dev/null instead of stdin. Other commands have similar flags, or you can universally use < /dev/null.
A very simple and robust workaround is to change the file descriptor from which the read command receives input.
This is accomplished by two modifications: the -u argument to read, and the redirection operator for < $FILENAME.
In BASH, the default file descriptor values (i.e. values for -u in read) are:
0 = stdin
1 = stdout
2 = stderr
So just choose some other unused file descriptor, like 9 just for fun.
Thus, the following would be the workaround:
while read -u 9 LINE; do
let count++
echo "$count $LINE"
sh ./do_work.sh $LINE
done 9< $FILENAME
Notice the two modifications:
read becomes read -u 9
< $FILENAME becomes 9< $FILENAME
As a best practice, I do this for all while loops I write in BASH.
If you have nested loops using read, use a different file descriptor for each one (9,8,7,...).
More generally, a workaround which isn't specific to ssh is to redirect standard input for any command which might otherwise consume the while loop's input.
while read -r line; do
((count++))
echo "$count $line"
sh ./do_work.sh "$line" </dev/null
done < "$filename"
The addition of </dev/null is the crucial point here, though the corrected quoting is also somewhat important for robustness; see also When to wrap quotes around a shell variable?. You will want to use read -r unless you specifically require the slightly odd legacy behavior you get for backslashes in the input without -r. Finally, avoid upper case for your private variables.
Another workaround of sorts which is somewhat specific to ssh is to make sure any ssh command has its standard input tied up, e.g. by changing
ssh otherhost some commands here
to instead read the commands from a here document, which conveniently (for this particular scenario) ties up the standard input of ssh for the commands:
ssh otherhost <<'____HERE'
some commands here
____HERE
ssh -n option prevents checking the exit status of ssh when using HEREdoc while piping output to another program.
So use of /dev/null as stdin is preferred.
#!/bin/bash
while read ONELINE ; do
ssh ubuntu#host_xyz </dev/null <<EOF 2>&1 | filter_pgm
echo "Hi, $ONELINE. You come here often?"
process_response_pgm
EOF
if [ ${PIPESTATUS[0]} -ne 0 ] ; then
echo "aborting loop"
exit ${PIPESTATUS[0]}
fi
done << input_list.txt
This was happening to me because I had set -e and a grep in a loop was returning with no output (which gives a non-zero error code).

List files greater than 100K in bash

I want to list the files recursively in the HOME directory. I'm trying to write my own script , so I should not use the command find or ls. My script is:
#!/bin/bash
minSize=102400;
printFiles() {
for x in "$1/"*; do
if [ -d "$x" ]; then
printFiles "$x";
else
size=$(wc -c "$x");
if [[ "$size" -gt "$minSize" ]]; then
echo "$size";
fi
fi
done
}
printFiles "/~";
So, the problem here is that when I run this script, the terminal throws Line 11: division by 0 and /home/gandalf/Videos/*: No such file or directory. I have not divided by any number, why I'm getting this error?. And the second one?
Alternatively, I can't use find or ls because I have to display the files one by one asking to the user if he want to see the next file or not. This is possible using the command find or ls or only can be done writing my own function?
Thanks.
size=$(wc -c "$x");
That's the line that is failing. When you run that wc command manually you should be able to see why:
$ wc -c /tmp/out
5 /tmp/out
The output contains not only the file size but also the file name. So you can't use $size with the -gt comparator on the next line. One way to fix that is to change the wc line to use cut (or awk, or sed, etc) to keep just the file size.
size=$(wc -c "$x" | cut -f1 -d " ")
A simpler alternative suggested by #mklement0:
size=$(wc -c < "$x")

Resources