BASH : merge two directories and delete duplicated data - linux

I want to compare the contents of two folders and delete duplicated data. I wrote a Bash script, but I don't think it is the right way to do it: I use loops to iterate over the directory contents and run a lot of diff commands, which makes it too time-consuming.
I'll explain the context. I have two directories:
1-
dir1/
    Student1/
        homework1
        homework2
    Student2/
        homework1
        homework2
2-
dir2/
    Student1/
        homework1
        homework2
    Student3/
        homework1
        homework2
Suppose that the Student1/homework1 folder contains the same data in dir1 and dir2, whereas homework2 contains different data.
The output directory should contain:
Student1
    homework1        // same name, same content ==> keep one copy
    homework2
    homework2_dir2   // same name, different content ==> add the _dir2 suffix
Student2
    homework1
    homework2
Student3
    homework1
    homework2
What do you think is the best way, in terms of time and reliability (filename problems, etc.), to do this kind of operation?
Thank you ;)
PS: dir*, Student* and homework* are directories.
PS2: PLEASE, I am not looking for this kind of answer:
loop over students
    loop over student homeworks
        test on homework existence
        diff on homework content
        if diff, copy
    end
end
If I have a lot of students and a lot of homeworks with only one difference (only one homework that differs), the script takes a lot of time with the above solution.

Assuming that dir1 and dir2 are relative paths with no directories (i.e. no slashes in dir1 or dir2):
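dir1=dir1
dir2=dir2
CMPDIR=$(cd "$dir2" && pwd)    # absolute path of the tree to compare against
cd "$dir1" || exit 1
BASEDIR=$(pwd)
for studentdir in *
do
    cd "$BASEDIR/$studentdir" || continue
    for homeworkdir in *
    do
        cd "$BASEDIR/$studentdir/$homeworkdir" || continue
        for workfile in *
        do
            other=$CMPDIR/$studentdir/$homeworkdir/$workfile
            # cmp -s is silent and returns non-zero when the files differ
            if [ -f "$other" ] && ! cmp -s "$workfile" "$other"
            then
                # alternate homework directory, created next to the original one
                altdir=../${homeworkdir}_${dir2}
                mkdir -p "$altdir"
                ln "$other" "$altdir"
            fi
        done
    done
done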
I haven't tried this - there may be some typos.
In dir1, recurse into each student folder, and in each student folder into each homework directory.
In each homework directory, use cmp on each file to check whether it is byte identical with the matching file in the dir2 subtree.
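If they differ, create an alternate homework directory in the student directory, and hard-link (ln) the differing file from dir2 into the alternate directory.
cmp is faster than diff; ln is faster than cp because no data is copied, but hard links only work when both trees are on the same filesystem.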
That's all, folks.
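As a small illustration of the cmp test described above, the sketch below wraps it in a helper. The name files_differ is made up for this example and is not part of the answer. cmp -s prints nothing and only sets its exit status: 0 when the files are byte-identical, 1 when they differ.

```shell
#!/bin/bash
# files_differ is a hypothetical helper name, not part of the original answer.
# It succeeds (exit status 0) only when both files exist and their bytes differ.
files_differ() {
    [ -f "$1" ] && [ -f "$2" ] && ! cmp -s "$1" "$2"
}

# Possible use inside the innermost loop:
#   if files_differ "$workfile" "$other"; then ln "$other" "$altdir"; fi
```

Checking that both files exist first keeps a missing file in dir2 from being reported as a difference.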

I'm not sure it's faster than your solution, as you didn't post it.
#!/bin/bash
mkdir output
cp -r dir1/* output
cd dir2 || exit 1
for student in Student* ; do
    (
        cd "$student" || exit
        out_path=../../output/$student
        [[ -d $out_path ]] || mkdir "$out_path"
        for file in * ; do
            # homeworks are directories, so test with -e and compare recursively
            if [[ -e $out_path/$file ]] ; then
                diff -rq "$file" "$out_path/$file" > /dev/null \
                    || cp -r "$file" "$out_path/${file}_dir2"
            else
                cp -r "$file" "$out_path/"
            fi
        done
    )
done

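As far as I understand, you need to merge all files from two different directories into a new directory, and you don't want duplicate files or folders.
Let's say you want to merge them into a 'merged' directory.
You can do this:
rsync -hrv /dir1/ /merged/
rsync -hrv /dir2/ /merged/
The contents of /dir1 are copied into /merged, then the same is done for /dir2. The trailing slash on the source makes rsync copy the directory's contents rather than the directory itself. Note that a file that exists in both trees with different content is normally overwritten by the /dir2 version rather than kept as a separate _dir2 copy.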

Related

for each pair of files with the same prefix, execute code

I have a large list of directories, each of which contains a varied number of "paired" files. By paired, I mean the prefix is the same for two files, and the pairs are denoted as "a" and "b". The prefix does not follow a defined pattern either. My broader intentions are to write a bash script that will list all subdirectories in a given directory, cd into each directory, find the pairs of files, and execute a function on the pairs. Here is an example directory:
Dir1
123_a.txt
234_a.txt
123_b.txt
234_b.txt
Dir2
345_a.txt
345_b.txt
Dir3
456_a.txt
567_a.txt
678_a.txt
456_b.txt
567_b.txt
678_b.txt
I can use this code to loop through each directory:
for d in ./*/ ; do (cd "$d" && script.sh); done
In script.sh, I have been working on writing a script that will find all pairs of files (which is the problem I am struggling to figure out), and then call the function I want to apply to those files. This is the gist of what I have been trying:
for file in ./*_a.txt; do (find the paired file with *_b.txt && run_function.sh); done
I've broken the problem down into getting the value of "*" for the _a.txt files, searching the directory for the file with that value and the matching _b.txt suffix, and making a subdirectory to put both files into so that I can then apply run_function.sh. So Dir1 would contain subdirectories 123 and 234.
Let me know if this doesn't make sense. The part of the problem I'm struggling with is matching files without a defined prefix.
Thanks for your help.
Use parameter expansion:
#!/bin/bash
file=123_a.txt
prefix=${file%_a.txt} # remove _a.txt from the right
second=${prefix}_b.txt
if [[ -f $second ]] ; then
run_function "$file" "$second"
fi
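The same expansion can be applied to every *_a.txt file in a directory. The sketch below is one possible way to do it, not the answerer's code: process_pairs is a made-up name, and run_function is a placeholder standing in for the asker's run_function.sh, assumed to take the two paired files as arguments. Each pair is moved into a subdirectory named after its prefix, as the question describes.

```shell
#!/bin/bash
# Placeholder for the asker's run_function.sh; assumed to take the two paired files.
run_function() {
    echo "pair: $1 $2"
}

# process_pairs is a hypothetical name used only for this sketch.
process_pairs() {
    local dir=$1 a b prefix
    for a in "$dir"/*_a.txt; do
        [ -e "$a" ] || continue      # the glob matched nothing
        prefix=${a%_a.txt}           # Dir1/123_a.txt -> Dir1/123
        b=${prefix}_b.txt
        if [ -f "$b" ]; then
            mkdir -p "$prefix"       # subdirectory named after the prefix
            mv "$a" "$b" "$prefix"/
            run_function "$prefix/${a##*/}" "$prefix/${b##*/}"
        fi
    done
}
```

It could be called once per directory, for example with for d in ./*/; do process_pairs "${d%/}"; done. Files without a matching _b.txt partner are left where they are.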

bash script to create folders and move files

I have many files created from a simulation.
Like this: res_00001.root through res_09999.root.
I would like to create a series of folders and move the files into them in batches of 1000, in sequence, with each folder chosen from the name of the file being moved, e.g. folder1 would contain res_00001.root through res_00999.root, folder2 res_01000.root through res_01999.root, ...
I attempted to create a script but it's not working:
#!/bin/bash
N_files=$1
for (( file=0; file<$N_files; ++file )) do
    # state what file I am looking at
    s=file%1000
    INPUT=`printf data/output_%04lu.root $file`
    OUTPUT=`printf data/folder%02lu/res_%04lu.root $s`
    # move the files
    mv INPUT OUTPUT
done
I've been banging my head against this for some time; I appreciate any help you can provide.
Updated Answer
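You can run this little script if you can't find the rename program. Make a backup first! As written it only prints the mv commands; remove the echo once the output looks right.
#!/bin/bash
shopt -s nullglob nocaseglob
for f in *.root; do
    n=$(tr -dc '0-9' <<< "$f")      # keep only the digits of the file name
    ((d=(10#$n/1000)+1))            # 10# forces base 10 despite the leading zeros
    [ ! -d "folder$d" ] && mkdir "folder$d"
    echo mv "$f" "folder$d/$f"
done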
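The folder arithmetic used in the loop can be checked in isolation. This is only a sketch: folder_for is a made-up helper name, and it assumes every file name contains at least one digit.

```shell
#!/bin/bash
# folder_for is a hypothetical helper: it prints the target folder for one file name.
folder_for() {
    local n=${1//[^0-9]/}    # keep only the digits: res_01000.root -> 01000
    # 10# forces base 10; without it bash reads digits with a leading zero as octal
    # and rejects a value such as 00008 as an invalid number
    echo "folder$(( 10#$n / 1000 + 1 ))"
}

folder_for res_00999.root   # prints folder1
folder_for res_01000.root   # prints folder2
```

Integer division by 1000 gives the batch index starting at 0, and adding 1 makes the folder numbering start at folder1, which matches the layout asked for in the question.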
Original Answer
Make a backup and see if this helps you on a copy of a small subset of your files:
rename --dry-run 's/[^0-9]//g; my $d=int($_/1000)+1; $_="folder$d/res_$_.root"' *root
Sample Output
'res_00001.root' would be renamed to 'folder1/res_00001.root'
'res_00002.root' would be renamed to 'folder1/res_00002.root'
'res_00003.root' would be renamed to 'folder1/res_00003.root'
'res_00004.root' would be renamed to 'folder1/res_00004.root'
'res_00005.root' would be renamed to 'folder1/res_00005.root'
...
...
'res_00997.root' would be renamed to 'folder1/res_00997.root'
'res_00998.root' would be renamed to 'folder1/res_00998.root'
'res_00999.root' would be renamed to 'folder1/res_00999.root'
'res_01000.root' would be renamed to 'folder2/res_01000.root'
'res_01001.root' would be renamed to 'folder2/res_01001.root'
'res_01002.root' would be renamed to 'folder2/res_01002.root'
'res_01003.root' would be renamed to 'folder2/res_01003.root'
...
...
If it looks good, remove the --dry-run so it actually does stuff rather than just saying what stuff it would do!
s/[^0-9]//g gets rid of anything non-numeric in the filename
my $d=int($_/1000)+1 calculates the directory name
$_="folder$d/res_$_.root" builds the output filename

Copy numbered files to corresponding numbered directory using Linux bash commands or script

This should be a relatively straightforward problem, but I haven't found any answers on Stack Overflow. In a given directory, I have ~1000 files that are numbered (e.g. chem-0320.inp). I would like to cp each numbered file to a correspondingly numbered directory; all copied files will be renamed to the same name. I would like to do this for a specified range of files (numbers 300-500, for example).
For example, I would like to copy chem-0320.inp to a directory named 320 and rename it mech.dat.
Another example: copy chem-0430.inp to a directory named 430 and rename it mech.dat.
Thanks in advance for your help!
The following script would do the work for you
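for file in *.inp
do
    # strip the non-numeric prefix and the leading zero: chem-0320.inp -> 320
    dir=$(echo "$file" | sed -r 's/[^0-9]+0([0-9]+).*/\1/')
    mkdir -p "$dir"
    cp "$file" "$dir/mech.dat"
done
First cd into the directory that holds the .inp files; the subdirectories will be created there.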
#!/bin/bash
lo_limit=300
hi_limit=500
for file in ./*.inp
do
    dir="${file//[^0-9]/}"
    dir_cut="${dir:1:3}" # leading zero cut off
    if [ "$dir_cut" -ge "$lo_limit" ] && [ "$dir_cut" -le "$hi_limit" ]; then
        echo "$file $dir_cut"
        mkdir -p "$dir_cut"
        cp "$file" "$dir_cut"/mech.dat
    fi
done

Bash Script to Copy Folders by first character

I'm a newbie to ubuntu/Linux, and just got my first bash script to execute.
I'm trying to copy and organize my music collection from driveA to driveB.
driveA has all my artist folders (e.g. Adele, Brian, Bob Marley, Cassie); the path to them is /media/myMusic.
On driveB I have created folders A, B, C, and the path to those is /media/orderedMusic.
All artist folders on driveA whose first character is A, B or C should be copied to the respective folders on driveB, i.e. Adele would be copied to /media/orderedMusic/A,
Brian and Bob Marley would be copied to /media/orderedMusic/B, and so on.
Here is what I have so far; help would be highly appreciated. Thanks
#!/bin/bash
folder1=/media/myMusic
folder2=/media/orderedMusic
for dir in $folder1
do
if []
then
cp
fi
done
This should do the trick:
#!/usr/bin/env bash
folder1=/media/myMusic
folder2=/media/orderedMusic
cd "$folder1" && {
    for artist in *; do
        dest=$folder2/${artist:0:1}
        mkdir -p "$dest"
        cp -rp "$artist" "$dest"
    done
}
Note that if you are on a case-sensitive filesystem and have artist names that aren't capitalized, you will get separate folders in the destination tree for the two cases, e.g. an "A" folder and an "a" folder.
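The case issue noted above could be avoided by upper-casing the initial before building the destination path. A minimal sketch, assuming bash 4 or later (needed for the ${var^^} expansion); bucket_for is a made-up helper name:

```shell
#!/bin/bash
# bucket_for is a hypothetical helper: it prints the destination letter for a name.
# ${var^^} (upper-casing) requires bash 4 or later.
bucket_for() {
    local initial=${1:0:1}    # first character of the artist name
    echo "${initial^^}"
}

bucket_for "adele"        # prints A
bucket_for "Bob Marley"   # prints B
```

In the loop, the destination would then be built as dest=$folder2/$(bucket_for "$artist"), so "adele" and "Adele" both end up under A.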
You could use substring extraction: ${string:start_index:length}:
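#!/bin/bash
folder1=/media/myMusic
folder2=/media/orderedMusic
# keep the glob outside the quotes so that it expands; */ matches directories only
for path in "$folder1"/*/
do
    src=${path%/}            # drop the trailing slash
    dir=${src##*/}           # artist folder name, e.g. "Bob Marley"
    initial=${dir:0:1}       # first character of the name
    dest="$folder2/$initial"
    # create the destination directory if it does not exist
    if [ ! -d "$dest" ]
    then
        mkdir "$dest"
    fi
    cp -r "$src" "$dest"
done
Since you only need the first character of the name, extracting the substring at index 0 with a length of 1 is enough.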
For more details, see http://tldp.org/LDP/abs/html/string-manipulation.html

rsync selected sub folders

I want to transfer selective sub folders from a range of parent folders:
/home/user/sample_rsync/
FolderA/sub1
FolderA/sub2
FolderA/sub3
FolderB/sub1
FolderB/sub2
FolderB/sub3
FolderC/sub1
FolderC/sub2
FolderC/sub3
Say from the above example I want to copy just sub1 from each directory. i.e. in my destination I want the following folders to be created (along with the files they contain)
/destination/
sample_rsync/FolderA/sub1
sample_rsync/FolderB/sub1
sample_rsync/FolderC/sub1
How do I go about doing this?
I tried out
rsync -avh -f"- *" -f"+ *sub1/*" /home/user/sample_rsync /destination/
in an attempt to exclude everything and then include just the sub1 folders, but it didn't work.
Any way I can get this working?
Assuming your source folders are listed in a file called "sources", as typed in your first code segment (without trailing / characters):
for s in $(cat sources)
do
    rsync -av ${s} /destination/sample_rsync/$(echo ${s}| awk -F "/" '{print $1}')
done
Of course, this is only valid if the directories in your sources file all have the same depth. If the depth of the directories to be copied changes, this script will need to be heavily modified. But at least it is a starting point, I hope.
Following your question below, you might want to use something like this instead (ignore the code segment above; I just left it there for history):
cd /home/user/sample_rsync
# note: looping over $(find ...) breaks on names that contain whitespace
for dir in $(find ./ -type d -name sub1)
do
    dest=$(echo "${dir}" | sed -e "1,1s+/sub1++")    # strip the trailing /sub1
    mkdir -p "/destination/sample_rsync/${dest}"
    rsync -av "${dir}" "/destination/sample_rsync/${dest}"
done
Please do not take this as gospel: I have not tested the code at all, so it might yield unexpected results. Please test it on a system where problems would not matter if it goes haywire.

Resources