Iterate through filelist from two directions - linux

I have the following bash script
#!/bin/bash
for i in `ls /file-directory/ | grep -v static-backup | grep -v fileGroup1 | grep -v fileGroup2`
do
echo $i
rsync --delete -avz --size-only --exclude "$i/stuff1" --exclude "$i/stuff2" --exclude "$i/stuff3" --exclude "$i/stuff4" --exclude "$i/stuff5" --exclude "$i/stuff6" /file-directory/$i otherServer:/file-directory/ && echo " exit code: " $?" $i" || echo " exit code: " $?" $i"
done
The script iterates through a file directory and rsyncs its subdirectories, excluding certain fileGroups and portions of those filegroup's directories. I would like this script to spawn two rsync jobs, one that starts at the top of the directory and another that starts at the bottom. They would iterate in opposite directions and meet in the middle.
This is relatively simple to do with normal counting for loops, and wouldn't be too hard in something like Python (you could just save the number of directories as a variable, then iterate using that var). How can I do something similar in bash?

You can use & in bash to run a command, or even a whole loop, in the background.
I am sure there is more than one way to achieve this, but here is one.
Pseudocode:
Get the listing of files into two separate arrays so that each contains half of the listing (for an even number of files).
For an odd number of files, the second array will contain one more file.
Loop through the first array and run that loop in the background using &.
Loop through the second array and run that loop in the background using &.
Code
#!/bin/bash
dir_len=$(ls /file-directory/ |wc -l|awk '{print $NF}')
midpoint=$(($dir_len / 2))
array1=()
array2=()
count=0
#Loop through files and divide contents into two arrays
for i in $(ls /file-directory/)
do
count=$(($count + 1))
if [ $count -le $midpoint ] ; then
array1+=($i)
elif [[ $count -gt $midpoint && $count -le $dir_len ]] ; then
array2+=($i)
fi
done
#Loop through first array and fork
for i in "${array1[@]}"
do
echo $i
#your rsync command here
done&
#Loop through second array and fork
for j in "${array2[@]}"
do
echo $j
#your rsync command here
done &
Note:
& is optional on the second for loop; since there are only two jobs, you can run the second one in the foreground.
Also, you may want to harden the script against undesired files or an empty folder.
Edit:
Depending on the kind of file names you need to rsync, refer to this link for inspiration instead of parsing the output of ls /folder_name/.
https://unix.stackexchange.com/questions/9496/looping-through-files-with-spaces-in-the-names
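As a rough sketch of the "meet in the middle" idea from the question: keep the list in one array, let one background job walk forward from the first element and let a second job walk backward from the last. The names process_item and run_from_both_ends are made up for illustration, and the echo only stands in for the real rsync command.

```shell
#!/bin/bash
# Hypothetical stand-in for the real per-directory rsync command.
process_item() {
    echo "processing $1"
}

# Walk the arguments from both ends at once: one background job takes the
# first half front to back, the other takes the second half back to front.
run_from_both_ends() {
    local items=("$@")
    local total=${#items[@]}
    local mid=$(( total / 2 ))
    local i j

    for (( i = 0; i < mid; i++ )); do
        process_item "${items[i]}"
    done &

    for (( j = total - 1; j >= mid; j-- )); do
        process_item "${items[j]}"
    done &

    wait   # block until both background jobs have finished
}
```

Calling it as run_from_both_ends /file-directory/* would pass every entry of the directory; filtering out the unwanted groups is left out of this sketch. The wait at the end keeps the script from exiting while a job is still running.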

Related

Shell - iterate over content of file but do something only the first x lines

So guys,
I need your help trying to identify the fastest and the most "fault" tolerant solution to my problem.
I have a shell script which executes some functions, based on a txt file, in which I have a list of files.
The list can contain from 1 file to X files.
What I would like to do is iterate over the content of the file and execute my scripts for only 4 items out of the file.
Once the functions have been executed for these 4 files, go over to the next 4 .... and keep on doing so until all the files from the list have been "processed".
My code so far is as follows.
#!/bin/bash
number_of_files_in_folder=$(cat list.txt | wc -l)
max_number_of_files_to_process=4
Translated_files=/home/german_translated_files/
while IFS= read -r files
do
while [[ $number_of_files_in_folder -gt 0 ]]; do
i=1
while [[ $i -le $max_number_of_files_to_process ]]; do
my_first_function "$files" & # I execute my translation function for each file, as it can only perform 1 file per execution
find /home/german_translator/ -name '*.logs' -exec mv {} $Translated_files \; # As there will be several files generated, I have them copied to another folder
sed -i "/$files/d" list.txt # We remove the processed file from within our list.txt file.
my_second_function # Without parameters as it will process all the files copied at step 2.
done
# here, I want to have all the files processed and don't stop after the first iteration
done
done < list.txt
Unfortunately, as I am not quite good at shell scripting, I do not know how to structure it so that it won't waste any resources and mostly, to make sure that it "processes" everything from that file.
Do you have any advice on how to achieve what I am trying to achieve?
only 4 items out of the file. Once the functions have been executed for these 4 files, go over to the next 4
Seems to be quite easy with xargs.
your_function() {
echo "Do something with $1 $2 $3 $4"
}
export -f your_function
xargs -d '\n' -n 4 bash -c 'your_function "$@"' _ < list.txt
xargs -d '\n' - treat each line as one argument (-d is a GNU xargs option)
-n 4 - take four arguments at a time
bash .... - run this command with those 4 arguments
_ - the syntax is bash -c <script> $0 $1 $2 etc..., see man bash.
"$@" - forward arguments
export -f your_function - export your function to environment so child bash can pick it up.
I execute my translation function for each file
So you execute your translation function for each file, not for each group of 4 files. If the translation function really works on one file at a time with no state shared between files, consider instead running 4 processes in parallel with the same code, using just xargs -P 4.
If you have GNU Parallel it looks something like this:
doit() {
my_first_function "$1"
my_first_function "$2"
my_first_function "$3"
my_first_function "$4"
my_second_function "$1" "$2" "$3" "$4"
}
export -f doit
cat list.txt | parallel -n4 doit
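To see the batching behaviour of the xargs approach in isolation, here is a small sketch. It assumes GNU xargs (for -d); process_batch and run_batches are hypothetical names, and the echo is a placeholder for the real work on each group of up to four files.

```shell
#!/bin/bash
# Hypothetical stand-in for the real per-batch work.
process_batch() {
    echo "batch of $#: $*"
}
export -f process_batch   # make the function visible to the child bash

# Feed the lines of the given list file to process_batch, four at a time.
run_batches() {
    xargs -d '\n' -n 4 bash -c 'process_batch "$@"' _ < "$1"
}
```

With a list file of six lines (f1 to f6), run_batches list.txt prints "batch of 4: f1 f2 f3 f4" followed by "batch of 2: f5 f6", so the last, shorter group is still processed.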

copy and append files in ubuntu linux

I have two folders, each containing 351 text files, and I want to append the text of each file in one folder to the corresponding file in the other folder.
When I use the cat command I get an empty file as a result. What could be the problem?
My code is:
#!/bin/bash
DIR1=$(ls 2/)
DIR2=$(ls 3/)
for each $i in $DIR1; do
for each $j in $DIR2; do
if [[ $i == $j ]];then
sudo cat $i $j >> $j
fi
done
done
2/ and 3/ are the folders containing the data...
DIR1 and DIR2 contain the file names in directories 2 and 3 respectively.
Apart from possible problems with spaces or special characters in file names, you would have to use 2/$i and 3/$j. $i and $j alone would reference files with the same names in the current directory (parent of 2 and 3).
It's better not to parse the output of ls.
You don't need two nested loops.
#!/bin/bash
DIR1=2
DIR2=3
for source in "$DIR1"/*
do
dest="$DIR2/$(basename "$source")"
if [ -f "$dest" ]
then
sudo cat "$source" >> "$dest"
fi
done
See also https://mywiki.wooledge.org/BashPitfalls#for_f_in_.24.28ls_.2A.mp3.29
Depending on your needs, it may be better to run the whole script with sudo instead of running sudo for every file. The version above executes only cat "$source" as root; the >> "$dest" redirection is performed by the calling shell with your own permissions. When the whole script runs as root, the redirection runs as root as well.
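For trying the single-loop approach safely, the same logic can be wrapped in a function and pointed at scratch directories first. This is only a sketch: append_matching is a made-up name, and it runs without sudo; the comment shows one common way to append with root permissions if the destination requires it.

```shell
#!/bin/bash
# Append each file in src_dir to the same-named file in dest_dir,
# skipping names that have no counterpart in dest_dir.
append_matching() {
    local src_dir=$1 dest_dir=$2
    local source dest
    for source in "$src_dir"/*; do
        [ -f "$source" ] || continue   # also skips an unmatched glob
        dest="$dest_dir/$(basename "$source")"
        if [ -f "$dest" ]; then
            cat "$source" >> "$dest"
            # If the destination needs root, one option is:
            #   sudo tee -a "$dest" < "$source" > /dev/null
        fi
    done
}
```

For the folders in the question this would be called as append_matching 2 3.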

Bash scripting wanting to find a size of a directory and if size is greater than x then do a task

I have put the following together from a couple of other articles, but it does not seem to be working. What I am eventually trying to do is have it check the directory size and, if the directory has new content above a certain total size, let me know.
#!/bin/bash
file=private/videos/tv
minimumsize=2
actualsize=$(du -m "$file" | cut -f 1)
if [ $actualsize -ge $minimumsize ]; then
echo "nothing here to see"
else
echo "time to sync"
fi
this is the output:
./sync.sh: line 5: [: too many arguments
time to sync
I am new to bash scripting so thank you in advance.
The error:
[: too many arguments
seems to indicate that either $actualsize or $minimumsize is expanding to more than one argument.
Change your script as follows:
#!/bin/bash
set -x # Add this line.
file=private/videos/tv
minimumsize=2
actualsize=$(du -m "$file" | cut -f 1)
echo "[$actualsize] [$minimumsize]" # Add this line.
if [ $actualsize -ge $minimumsize ]; then
echo "nothing here to see"
else
echo "time to sync"
fi
The set -x will echo commands before attempting to execute them, something which assists greatly with debugging.
The echo "[$actualsize] [$minimumsize]" will assist in trying to establish whether these variables are badly formatted or not, before the attempted comparison.
If you do that, you'll no doubt find that some arguments will result in a lot of output from the du -m command since it descends into subdirectories and gives you multiple lines of output.
If you want a single line of output for all the subdirectories aggregated, you have to use the -s flag as well:
actualsize=$(du -ms "$file" | cut -f 1)
If instead you don't want any of the subdirectories taken into account, you can take a slightly different approach, limiting the depth to one and tallying up the sizes of the regular files (the size is the fifth column of ls -l output):
actualsize=$(find "$file" -maxdepth 1 -type f -print0 | xargs -0 ls -l | awk '{s += $5} END {print int(s/1024/1024)}')
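Putting the -s fix together with quoted variables, a minimal sketch of the size check might look like the following. It assumes a du that supports -m (as GNU du does); dir_size_mb and dir_at_least_mb are hypothetical helper names, not part of the original script.

```shell
#!/bin/bash
# Print the aggregated size of a directory in whole megabytes.
# Assumes a du that supports -m (as GNU du does).
dir_size_mb() {
    du -sm "$1" | cut -f 1
}

# Hypothetical helper: succeed when the directory is at least min_mb megabytes.
dir_at_least_mb() {
    local dir=$1 min_mb=$2
    local size
    size=$(dir_size_mb "$dir")
    # Refuse to compare anything that is not a plain number.
    [[ $size =~ ^[0-9]+$ ]] || return 2
    [ "$size" -ge "$min_mb" ]
}
```

Because du -s prints exactly one line, the value handed to [ is a single word, which avoids the "too many arguments" error.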

What is the error in this shell script

I have never used shell scripting, but now I have to. Here is what I'm trying to do:
#!/bin/bash
echo running the program
./first
var = ($(ls FODLDER |wc -l)) #check how many files the folder contains
echo $var
if( ["$var" -gt "2"] #check if there are more the 2file
then ./second
fi
The script crashes at the if statement. How can I solve this?
Many:
var = ($(ls FODLDER |wc -l))
This is wrong, you cannot have those spaces around =.
if( ["$var" -gt "2"]
Your ( is not doing anything there, so it has to be deleted. Also, you need spaces around [ and ].
All together, this would make more sense:
#!/bin/bash
echo "running the program"
./first
var=$(find FOLDER -maxdepth 1 -type f|wc -l) # better find than ls
echo "$var"
if [ "$var" -gt "2" ]; then
./second
fi
Note:
quote whenever you echo, especially when handling variables.
see another way to look for files in a given path. Parsing ls is kind of evil.
indent your code for better readability.
Edit your script.bash file as follows:
#!/usr/bin/env bash
dir="$1"
echo "running the program"
./first
dir_list=( "$dir"/* ) # list files in directory
echo "${#dir_list[@]}" # count files in array
if (( ${#dir_list[@]} > 2 )); then # test how many files
./second
fi
Usage
script.bash /tmp/
Explanation
You need to learn bash to avoid dangerous actions!
pass the directory to work with as first argument in the command line (/tmp/ → $1)
use glob to create an array (dir_list) containing all files in the given directory
count items in array (${#dir_list[@]})
test the number of items using arithmetic context.
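One caveat with the glob-into-array approach: when the directory is empty, the unmatched pattern stays in the array as a literal string, so the count comes out as 1 instead of 0. A small sketch that avoids this with nullglob follows; count_entries is a made-up name for illustration.

```shell
#!/bin/bash
# Count the entries matched by dir/* (hidden files are not matched).
# The function body is a subshell, so nullglob is only enabled inside it.
count_entries() (
    shopt -s nullglob
    entries=("$1"/*)
    echo "${#entries[@]}"
)
```

The test from the answer then becomes if (( $(count_entries "$dir") > 2 )); then ./second; fi.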

Bash Variable Maths Not Working

I have a simple bash script, which forms part of an in-house web app that I've developed.
Its purpose is to automate deletion of thumbnails of images when the original image has been deleted by the user.
The script logs some basic status info to a file /var/log/images.log
#!/bin/bash
cd $thumbpath
filecount=0
# Purge extraneous thumbs
find . -type f | while read file
do
if [ ! -f "$imagepath/$file" ]
then
filecount=$[$filecount+1]
rm -f "$file"
fi
done
echo `date`: $filecount extraneous thumbs removed>>/var/log/images.log
Whilst the script correctly deletes thumbs, it doesn't correctly output the number of thumbs that are being purged, it always shows 0.
For example, having just manually created some orphaned thumbnails, and then running my script, the manually generated orphaned thumbs are deleted, but the log shows:
Thu Jun 9 23:30:12 BST 2011: 0 extraneous thumbs removed
What am I doing wrong that stops $filecount from showing a number other than zero when files are being deleted?
I've created the following bash script to test this, and this works perfectly, outputting 0 then 1:
#!/bin/bash
count=0
echo $count
count=$[$count+1]
echo $count
Edit:
Thanks for the answers, but why does the following work
$ x=3
$ x=$[$x+1]
$ echo $x
4
...and also the second example works, yet it doesn't work in the first script?
Second Edit:
This works
count=0
echo Initial Value $count
for i in `seq 1 5`
do
count=$[$count+1]
echo $count
done
echo Final Value $count
Initial Value 0
1
2
3
4
5
Final Value 5
as does replacing count=$[$count+1] with count=$((count+1)), but not in my initial script.
You're using a deprecated operator: $[ ... ] is the old form of arithmetic expansion. Try using $(( ... )) instead, e.g.:
$ x=4
$ y=$((x + 1))
$ echo $y
5
$
EDIT
The other problem you're bumping into is down to the pipe. I've bumped into this one before (with ksh, but it wouldn't surprise me to find that other shells have the same problem). The pipe forks another bash process, so when you do the increment, filecount is incremented in the subshell that was forked after the pipe. This value isn't passed back to the calling shell, as the subshell has its own independent environment (environment variables are inherited by called processes, but a called process cannot modify the environment of the calling process).
As an example, this demonstrates that filecount gets incremented okay:
#!/bin/bash
filecount=0
ls /bin | while read x
do
filecount=$((filecount + 1))
echo $filecount
done
echo $filecount
...so you should see filecount increase in the loop, but the final filecount will be zero because the last echo belongs to the main shell, not the forked subshell (which consists purely of the while loop).
One way you can get the value back is like this...
#!/bin/bash
filecount=0
filecount=`ls /bin | while read x
do
filecount=$((filecount + 1))
echo $filecount
done | tail -1`
echo $filecount
This will only work if you don't care about any other stdout output in the loop as this throws it all away apart from the last line we output (the final value of filecount). This works because we're using stdout and stdin to feed the data back to the parent shell.
Depending on your viewpoint this is either a nasty hack or a nifty bit of shell jiggery-pokery. I'll leave you to decide what you think it is :-)
If you remove the pipeline into the while construct, you remove bash's need to create a subshell.
Change this:
filecount=0
find . -type f | while read file; do
if [ ! -f "$imagepath/$file" ]; then
filecount=$[$filecount+1]
rm -f "$file"
fi
done
echo $filecount
to this:
filecount=0
while read file; do
if [ ! -f "$imagepath/$file" ]; then
rm -f "$file" && (( filecount++ ))
fi
done < <(find . -type f)
echo $filecount
That is harder to read because the find command is hidden at the end. Another possibility is:
files=$( find . -type f )
while ...; do
:
done <<< "$files"
Chris J is quite right that you are using the wrong operator and POSIX subshell variable scoping means you can't get a final count that way.
As a side note, when doing math operations you could also consider using the let shell builtin like this:
$ filecount=4
$ let filecount=$filecount+1
$ echo $filecount
5
Also, if you want scoping to just work as you expected in spite of that pipeline, you could use zsh instead of bash. In this case it should be a drop-in replacement and work as expected.
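To make the subshell effect easy to reproduce, here is a small side-by-side sketch with the default bash settings (lastpipe off). The function names are made up for the demonstration, and the printf stands in for the find command.

```shell
#!/bin/bash
# The loop runs on the right-hand side of a pipe, i.e. in a subshell,
# so the increments never reach the n of the calling function.
count_with_pipe() {
    local n=0 line
    printf '%s\n' a b c | while IFS= read -r line; do
        n=$((n + 1))
    done
    echo "$n"
}

# The loop runs in the current shell and reads from a process
# substitution, so the increments are kept.
count_with_procsub() {
    local n=0 line
    while IFS= read -r line; do
        n=$((n + 1))
    done < <(printf '%s\n' a b c)
    echo "$n"
}

count_with_pipe      # prints 0
count_with_procsub   # prints 3
```

The only difference between the two functions is where the loop gets its input from, which is what decides whether it runs in a subshell.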
