Wait for all files with a certain extension to stop existing - linux

I have a shell script that unzips a bunch of files, then processes them, and then zips them back up again. I want to hold off on the processing until all the files are done unzipping.
I know how to do it for one file:
while [ -s /homes/ndeklein/mzml/JG-C2-1.mzML.gz ]
do
echo "test"
sleep 10
done
However, when I do
while [ -s /homes/ndeklein/mzml/*.gz ]
I get the following error:
./test.sh: line 2: [: too many arguments
I assume this is because there is more than one result. So how can I do this for multiple files?

You can execute a subcommand in the shell and check that there is output:
while [ -n "$(ls /homes/ndeklein/mzml/*.gz 2> /dev/null)" ]; do
# your code goes here
sleep 1; # sleeping avoids busy-waiting in a polling loop like this
done
If the directory could potentially have thousands of files, you may want to consider using find instead of ls with the wildcard, e.g. find /homes/ndeklein/mzml -maxdepth 1 -name "*.gz".
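For example, a find-based version of the wait loop above might look like this (same directory as in the question; -quit is a GNU find extension that stops after the first match, so large directories are not scanned in full):
while [ -n "$(find /homes/ndeklein/mzml -maxdepth 1 -name '*.gz' -print -quit 2>/dev/null)" ]; do
    sleep 10
done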

xargs is your friend if you are not required to use a while loop.
ls /homes/ndeklein/mzml/*.gz | xargs -I {} gunzip {}

Related

Shell - iterate over content of file but do something only the first x lines

So guys,
I need your help trying to identify the fastest and the most "fault" tolerant solution to my problem.
I have a shell script which executes some functions, based on a txt file, in which I have a list of files.
The list can contain from 1 file to X files.
What I would like to do is iterate over the content of the file and execute my scripts for only 4 items out of the file.
Once the functions have been executed for these 4 files, go over to the next 4 .... and keep on doing so until all the files from the list have been "processed".
My code so far is as follows.
#!/bin/bash
number_of_files_in_folder=$(cat list.txt | wc -l)
max_number_of_files_to_process=4
Translated_files=/home/german_translated_files/
while IFS= read -r files
do
while [[ $number_of_files_in_folder -gt 0 ]]; do
i=1
while [[ $i -le $max_number_of_files_to_process ]]; do
my_first_function "$files" & # I execute my translation function for each file, as it can only perform 1 file per execution
find /home/german_translator/ -name '*.logs' -exec mv {} $Translated_files \; # As there will be several files generated, I have them copied to another folder
sed -i "/$files/d" list.txt # We remove the processed file from within our list.txt file.
my_second_function # Without parameters as it will process all the files copied at step 2.
done
# here, I want to have all the files processed and don't stop after the first iteration
done
done < list.txt
Unfortunately, as I am not quite good at shell scripting, I do not know how to structure it so that it won't waste any resources and mostly, to make sure that it "processes" everything from that file.
Do you have any advice on how to achieve what I am trying to achieve?
only 4 items out of the file. Once the functions have been executed for these 4 files, go over to the next 4
Seems to be quite easy with xargs.
your_function() {
echo "Do something with $1 $2 $3 $4"
}
export -f your_function
xargs -d '\n' -n 4 bash -c 'your_function "$@"' _ < list.txt
-d '\n' - treat each input line as a single argument
-n 4 - take four arguments per command invocation
bash ... - run this command with those 4 arguments
_ - a placeholder for $0; the syntax is bash -c <script> $0 $1 $2 etc., see man bash.
"$@" - forward the arguments to the function
export -f your_function - export your function to the environment so the child bash can pick it up.
I execute my translation function for each file
So you execute your translation function for each file, not for each batch of 4 files. If the translation function really operates per file with no inter-file state, consider instead running 4 processes in parallel with the same code, using just xargs -P 4.
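For instance, a sketch of that xargs -P 4 idea, assuming my_first_function can be exported and list.txt has one filename per line (note that this runs my_second_function only once at the very end, not after every batch of 4, which may or may not match your intent):
export -f my_first_function
xargs -d '\n' -n 1 -P 4 bash -c 'my_first_function "$1"' _ < list.txt
my_second_function   # xargs waits for all parallel jobs, so this runs after every file is translated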
If you have GNU Parallel it looks something like this:
doit() {
my_first_function "$1"
my_first_function "$2"
my_first_function "$3"
my_first_function "$4"
my_second_function "$1" "$2" "$3" "$4"
}
export -f doit
cat list.txt | parallel -n4 doit

Silent while loop in bash

I am looking to create a bash script that keeps checking for a file in a directory and performs certain operations on it. I am using a while loop; if the file does not exist, I want the while loop to stay quiet and keep checking the condition. Here is what I created, but it keeps throwing a "file not found" error if the file is not there.
while [ ! -f /home/master/applications/tmp/mydata.txt ]
do
cat mydata.txt;
rm mydata.txt;
sleep 1; done
There are two issues in your implementation:
You should use the same (absolute or relative) path in your while loop test statement [ ! -f $file ] and in your cat and rm commands.
The cat command looks for the file in the current working directory (pwd) while your while statement might be looking somewhere else; hence your implementation is buggy and won't work as expected if your pwd isn't /home/master/applications/tmp.
You need to move your cat and rm commands after the while block. It doesn't make sense to cat a file that doesn't exist. I think you misplaced those commands.
Try this:
file="/home/master/applications/tmp/mydata.txt"
while [ ! -f "$file" ]
do
sleep 1
done
cat "$file"
rm "$file"
EDIT
As per the suggestion from @Ivan, you could use until instead of while, as it suits your requirements better.
file="/home/master/applications/tmp/mydata.txt"
until [ -f "$file" ]; do sleep 1; done
cat "$file"
rm "$file"
Making a different assumption than abhiarora, I'll guess maybe you meant for the file to reappear, and you want it shown each time.
file=/home/master/applications/tmp/mydata.txt
while :
do if [[ -f "$file" ]]
then echo "$(<"$file")"
rm "$file"
fi
sleep 1
done
This creates an infinite loop. If that's NOT what you wanted, use abhiarora's solution.

List files greater than 100K in bash

I want to list the files recursively in the HOME directory. I'm trying to write my own script, so I should not use the commands find or ls. My script is:
#!/bin/bash
minSize=102400;
printFiles() {
for x in "$1/"*; do
if [ -d "$x" ]; then
printFiles "$x";
else
size=$(wc -c "$x");
if [[ "$size" -gt "$minSize" ]]; then
echo "$size";
fi
fi
done
}
printFiles "/~";
So, the problem here is that when I run this script, the terminal throws Line 11: division by 0 and /home/gandalf/Videos/*: No such file or directory. I have not divided by any number, so why am I getting this error? And what about the second one?
Also, I can't use find or ls because I have to display the files one by one, asking the user whether he wants to see the next file or not. Is this possible using find or ls, or can it only be done by writing my own function?
Thanks.
size=$(wc -c "$x");
That's the line that is failing. When you run that wc command manually you should be able to see why:
$ wc -c /tmp/out
5 /tmp/out
The output contains not only the file size but also the file name. So you can't use $size with the -gt comparator on the next line. One way to fix that is to change the wc line to use cut (or awk, or sed, etc) to keep just the file size.
size=$(wc -c "$x" | cut -f1 -d " ")
A simpler alternative suggested by #mklement0:
size=$(wc -c < "$x")
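With either fix, the relevant part of the printFiles function would look something like this (printing the name together with the size is an assumption about the desired output, since the original echoes only $size):
size=$(wc -c < "$x");
if [[ "$size" -gt "$minSize" ]]; then
    echo "$x: $size bytes";
fi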

How to pipe files one by one from list into script?

I have a list of files that I need to pipe into a shell script. I can list the files within a directory by using the following:
ls ~/data/2121/*SOMEFILE*
resulting in:
2121.SOMEFILEaa
2121.SOMEFILEab
2121.SOMEFILEac
and so on...
I have another script that performs some processing on a single file (2121.SOMEFILEaa) which I run by using the following command:
bash runscript ../data/2121/2121.SOMEFILEaa
However, I need to make this more efficient by piping individual files from the list of files generated via ls into the script. How can I pipe the results of the ls ~/data/2121/*SOMEFILE* command, file by file, into the runscript script?
Another option
ls ~/data/2121/*SOMEFILE* | xargs -L1 bash runscript
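If the filenames could ever contain spaces, a more robust sketch avoids parsing ls output and uses find with null delimiters instead (directory and pattern taken from the question):
find ~/data/2121 -maxdepth 1 -name '*SOMEFILE*' -print0 | xargs -0 -n 1 bash runscript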
I think you are looking for this:
for file in ~/data/2121/*SOMEFILE*; do
bash runscript "$file"
done
In this way, you're calling bash runscript for each file.
$ cat pipe.sh
#!/bin/bash
## Store data from the pipe in the variable $PIPE
_read_pipe() {
    while read -t 10 pipe; do
        if [ -n "$pipe" ]; then
            PIPE="$PIPE $pipe"
        fi
    done
}
## your code
_read_pipe
for kung_foo in $PIPE; do
    echo "$kung_foo"
done
$ ls 2121.SOMEFILE* | ./pipe.sh
2121.SOMEFILEaa
2121.SOMEFILEab
2121.SOMEFILEac
and so on...
read -t 10 gives read a 10-second timeout, so the loop also ends if the writing side stalls.
I hope this helps,
cheers Karim

Getting an empty file for grep output

I am running this command in a script
while [ 1 ]
do
if [ -e $LOG ]
then
grep -A 5 -B 5 -f $PATTERNS $LOG >> $FOREMAIL
break
fi
done
The $LOG file is scp'ed from another machine. So as soon as it appears in the current directory, the while loop detects it and does the grep. The problem is that the $FOREMAIL file turns out to be empty. But if I run this grep outside of the script as a standalone command with the same files and parameters, I can see that it generates output.
I am baffled as to why this command generates no output in the script.
The -e is triggering as soon as scp creates the file, while it still has no data in it, and grep is operating on an empty file. You need to wait until the file has finished transferring.
You could accomplish this by transferring to a temporary filename, then running mv over ssh from the machine which is pushing the file up.
Edit: the code for the machine copying the log file up:
scp $log 192.168.0.1:/logfiles/${log}.tmp
ssh 192.168.0.1 mv /logfiles/${log}.tmp /logfiles/${log}
Before you can grep, you need to wait for two things: 1) the download has started (the file comes into existence) and 2) the download has finished (nobody has the file open anymore). I have a script called waitfor.sh, which does this:
#!/bin/bash
# waitfor.sh - wait for a file fully downloaded (via Firefox, scp, ...)
# Syntax:
# waitfor.sh filename
FILENAME=$1 # Name of file to wait for
INTERVAL=10 # Wait interval of N seconds
# Wait for download started
while [ ! -f "$FILENAME" ]
do
sleep $INTERVAL
done
# Wait for download finished
while lsof "$FILENAME"
do
sleep $INTERVAL
done
To use it:
waitfor.sh $LOG
grep ...
Could it be that the while [ 1 ] loop is very fast, so when the file starts copying, it shows up as an empty file before copying is complete? Depending on the size of the file, try a sleep delay inside the then block. Figuring out when a file finishes copying when it is done by an external process is probably a separate question; e.g. googling for something like "how to tell when scp has finished copying a file" turns up a bunch of links like: https://superuser.com/questions/45224/is-there-a-way-to-tell-if-a-file-is-done-copying
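One way to sketch that idea without changing the sending side is to poll until the file size stops growing before running grep; the 5-second interval and the GNU stat -c %s call are assumptions (use stat -f %z on BSD/macOS), and a transfer that stalls for longer than the interval would fool it:
while [ ! -e "$LOG" ]; do sleep 1; done   # wait for the file to appear
prev=-1
size=$(stat -c %s "$LOG")
while [ "$size" -ne "$prev" ]; do         # loop until the size stops changing
    prev=$size
    sleep 5
    size=$(stat -c %s "$LOG")
done
grep -A 5 -B 5 -f "$PATTERNS" "$LOG" >> "$FOREMAIL"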
Better to use:
if [ -f $LOG ]
instead of:
if [ -e $LOG ]
-f checks for a regular file
-e checks for a file of any type (regular file, directory, socket, etc.)
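For example, a directory passes the -e test but not the -f test:
$ [ -e /tmp ] && echo "exists"
exists
$ [ -f /tmp ] && echo "regular file"    # prints nothing: /tmp is a directory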
Here's what I ended up doing:
scp $LOGFILE
then
scp $SCPDONE # empty file
And modified the if clause like this:
while [ 1 ]
do
if [ -e $SCPDONE ]
then
grep -A 5 -B 5 -f $PATTERNS $LOG >> $FOREMAIL
break
fi
done
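The sending side would then look something like this (the host and destination path are placeholders, not taken from the question):
scp "$LOG" user@remotehost:/logdir/          # transfer the (possibly large) log first
touch "$SCPDONE"
scp "$SCPDONE" user@remotehost:/logdir/      # the empty marker arrives only after the log is complete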
