Compare directory structures

Is it possible to compare the directory structures of two different servers? I need to compare the directory structure of a test server with that of a production server and list the directories that exist on prod but not on test (the test server has a lot less data).
I am using the following rsync command:
rsync -rvnc --delete userid@servername:/directory /directory
Besides rsync, I have also tried running find on both servers and comparing the two outputs with sdiff:
find directory1 -type d -printf "%P\n" | sort > file1
find directory2 -type d -printf "%P\n" | sort > file2
sdiff file1 file2 > file3
Which approach would be better?

You can use rsync -ai dir1/ dir2/ --dry-run to create a machine-readable list of changes between dir1 and dir2.
source: https://stackoverflow.com/a/42160545/2536029
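For the stated goal (directories that exist on prod but not on test), the two sorted find listings from the question can also be fed to comm -23 instead of sdiff. The following is only a minimal sketch: it assumes GNU find, assumes both trees are reachable as local paths (a remote listing could be produced over ssh and saved to a file first), and the function name dirs_only_in_first is made up for illustration.

```shell
# Hypothetical helper: print directories (as relative paths) that exist
# under $1 but not under $2.
# Assumes GNU find (-printf) and no newlines in directory names.
dirs_only_in_first() (
    list1=$(mktemp) && list2=$(mktemp) || exit 1
    trap 'rm -f "$list1" "$list2"' EXIT
    # %P prints each path relative to the starting directory.
    # LC_ALL=C makes sort and comm agree on the ordering.
    find "$1" -mindepth 1 -type d -printf '%P\n' | LC_ALL=C sort > "$list1"
    find "$2" -mindepth 1 -type d -printf '%P\n' | LC_ALL=C sort > "$list2"
    # -2 and -3 suppress the "only in second" and "in both" columns.
    LC_ALL=C comm -23 "$list1" "$list2"
)
```

Calling dirs_only_in_first /directory_prod /directory_test (both paths are placeholders) would print one relative directory path per line.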

Related

How to find which files / folders are on both computers?

I have a folder called documentaries on my Linux computer.
I have SSH access to a seedbox (also Linux).
How do I find out which documentaries I have on both computers?
On the seedbox it's a flat file structure. Some documentaries are files, some are folders which contain many files, but all are in the same folder.
For example:
data/lions_botswana.mp4
data/lions serengeti/S01E01.mkv
data/lions serengeti/S01E02.mkv
data/strosek_on_capitalism.mp4
data/something_random.mp4
Locally the structure is more organized:
documentaries/animals/lions_botswana.mp4
documentaries/animals/lions serengeti/S01E01.mkv
documentaries/animals/lions serengeti/S01E02.mkv
documentaries/economy/strosek_on_capitalism.mp4
documentaries/something_random.mp4
I am not looking for a command like diff; I am looking for a command like same (the opposite of diff), if such a command exists.

Based on the answer from Zumo de Vidrio, and my comment:
On one computer:
(cd directory1/ && find . | sort) > filelist1
On the other:
(cd directory2/ && find . | sort) > filelist2
Copy both lists to one place and run:
comm -12 filelist1 filelist2
Or as a one-liner:
ssh user@host 'cd remotedir/ && find . | sort' | comm -12 - <(cd localdir/ && find . | sort)
Edit: With multiple folders this would look as follows.
On one computer:
(cd remotedir/ && find . | sort) > remotelist
On the other:
(cd localdir/subdir1/ && find .) > locallist1
(cd localdir/subdir2/ && find .) > locallist2
#... and so on
sort locallist1 locallist2 > locallistall
Copy both lists to one place and run:
comm -12 remotelist locallistall
Or as a (now very long) one-liner:
ssh user@host 'cd remotedir/ && find . | sort' | comm -12 - <({ (cd localdir/subdir1/ && find .); (cd localdir/subdir2/ && find .); (cd localdir/subdir3/ && find .); } | sort)

Export the list of remote files to a local file:
ssh user@seedbox 'find /path/to/data -type f -execdir echo {} ";"' > remote.txt
Note: On Linux you have to use an absolute path (for example "$PWD"/data) to avoid a leading ./ in the output.
Then grep the result of a local find command:
find documentaries/ -type f | grep -wFf remote.txt
This will display only those local files which also exist on the remote.
If you would like to generate a similar list locally and compare the two files, try:
find "$PWD"/documentaries/ -type f -execdir echo {} ';' > local.txt
grep -wFf remote.txt local.txt
However, the above methods aren't reliable, since two files with the same name could still differ in size. If both sides had the same directory structure, you could use rsync to keep your files up to date.
For a more reliable solution, you can use fdupes, which can find all files which exist in both directories by comparing file sizes and MD5 signatures.
Sample syntax:
fdupes -r documentaries/ data/
However, both directories need to be accessible locally, so you can use the sshfs tool to mount the remote directory locally and then run fdupes to find all duplicate files. It also has an option (-d) to remove the other duplicates.
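The name-based comparison above can also be done with exact line matching (grep -x), so that one name being a substring of another cannot produce a false hit. This is a rough sketch rather than the answer's own method: it assumes GNU find, assumes both trees are available locally (for example via sshfs), and common_basenames is a hypothetical name.

```shell
# Hypothetical helper: print base names of regular files that occur under
# both $1 and $2, wherever they sit in each tree.
# Assumes GNU find (-printf) and no newlines in file names.
common_basenames() (
    names=$(mktemp) || exit 1
    trap 'rm -f "$names"' EXIT
    # %f is the file name with leading directories removed.
    find "$1" -type f -printf '%f\n' | LC_ALL=C sort -u > "$names"
    # -F: fixed strings, -x: the whole line must match, -f: patterns from file.
    find "$2" -type f -printf '%f\n' | LC_ALL=C sort -u | grep -Fxf "$names"
)
```

Like the grep approach it compares names only, so it says nothing about whether the contents match.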

Save the ls output from each computer to a file, then run diff on the two files.
On your computer:
ls -R documentaries/ > documentaries_computer.txt
On the seedbox:
ls -R documentaries/ > documentaries_seedbox.txt
Copy both files to the same location and execute:
diff documentaries_computer.txt documentaries_seedbox.txt

You can mount the remote folder using sshfs, then use diff -r to find the differences between them.
E.g.
sshfs user@seedbox-host:/path/to/documentaries documentaries/
diff -rs /local/path/documentaries/animals documentaries/ | grep identical
diff -rs /local/path/documentaries/economy documentaries/ | grep identical
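The diff -rs | grep step can be wrapped in a small function. This is just a sketch assuming GNU diff, whose English report line ends in "are identical"; the function name identical_files is invented here.

```shell
# Hypothetical helper: list files that are byte-identical at the same
# relative path under $1 and $2.
identical_files() {
    # -r recurses, -s also reports identical files. LC_ALL=C pins the
    # message text so the grep pattern matches in any user locale.
    LC_ALL=C diff -rs "$1" "$2" | grep ' are identical$'
}
```

Because diff -r pairs files by relative path, this only finds matches when both trees use the same layout below the two starting directories.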

How to recursively remove different files in two directories

I have two directory trees: one contains 200 .txt files and the other contains 210 .txt files. I need a script that finds the file names that differ between them and removes those files.

There are probably better ways, but I would try:
find directory1 directory2 -name \*.txt -printf '%f\n' |
sort | uniq -u |
xargs -I{} find directory1 directory2 -name {} -delete
find directory1 directory2 -name \*.txt -printf '%f\n':
print basename of each file matching the glob *.txt
sort | uniq -u:
only print unique lines (if you wanted to delete the duplicates instead, it would have been uniq -d)
xargs -I{} find directory1 directory2 -name {} -delete:
remove them (re-specify the path to narrow the search and avoid deleting files outside the initial search path)
Notes
Thanks to @KlausPrinoth for all the suggestions.
Obviously I'm assuming a GNU userland; I suppose people running tools that provide only bare-minimum POSIX compatibility will be able to adapt it.
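Since the pipeline deletes files, it may be worth previewing what it would remove first. The sketch below prints the candidate paths instead of deleting them; it assumes GNU find, and list_unpaired_txt is a hypothetical name.

```shell
# Hypothetical helper: print the paths of *.txt files whose base name
# occurs exactly once across both trees. Nothing is deleted.
# Assumes GNU find and file names without newlines or glob characters
# (each name is passed to -name, which treats it as a pattern).
list_unpaired_txt() {
    find "$1" "$2" -name '*.txt' -printf '%f\n' |
        LC_ALL=C sort | uniq -u |
        while IFS= read -r name; do
            find "$1" "$2" -name "$name" -print
        done
}
```

Once the printed list looks right, the -print action could be swapped for -delete.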

Yet another way is to use diff, which is more than capable of finding differences between the files in two directories. For instance, if d1 and d2 contain your 200 and 210 files respectively (with the first 200 files being the same), you could use diff and process substitution to feed the names to remove to a while loop:
( while IFS= read -r line; do printf "rm %s\n" "${line##*: }"; done < <(diff -q d1 d2) )
Output (of d1 with 10 files, d2 with 12 files)
rm file11.txt
rm file12.txt
diff will not fit all circumstances, but it does a great job of finding directory differences and is quite flexible.
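The loop above prints bare file names. If full paths are wanted, the "Only in DIR: NAME" lines that GNU diff -q emits can be rewritten with sed. This is a sketch under that assumption about the message format; only_in_one is an invented name.

```shell
# Hypothetical helper: print the full path of every entry that exists in
# only one of the two directories (top level only, like diff -q).
# Assumes neither the directory nor the file names contain ": ".
only_in_one() {
    # diff -q reports such entries as "Only in DIR: NAME";
    # sed rewrites that to DIR/NAME and drops all other lines.
    LC_ALL=C diff -q "$1" "$2" |
        sed -n 's|^Only in \(.*\): \(.*\)$|\1/\2|p'
}
```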

Create a bash script to delete folders which do not contain a certain filetype

I have recently run into a problem.
I used a utility to move all my music files into directories based on tags. This left a LOT of almost empty folders. The folders, in general, contain a thumbs.db file or some sort of image for album art. The mp3s have the correct album art in their new directories, so the old ones are okay to delete.
Basically, I need to find any directories within D:/Music/ that:
- Do not have any subdirectories
- Do not contain any mp3 files
And then delete them.
I figured this would be easier to do in a shell or bash script, or whatever else the Linux/Unix world offers, than in Windows 8.1 (HAHA).
Any suggestions? I'm not very experienced writing scripts like this.

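This should get you started:
find /music -mindepth 1 -type d |
while IFS= read -r dt
do
    # skip directories that contain a subdirectory
    find "$dt" -mindepth 1 -type d | read && continue
    # skip directories that contain an mp3 file
    find "$dt" -iname '*.mp3' -type f | read && continue
    echo DELETE "$dt"
done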
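The same two checks can be folded into a single find per directory that stops at the first hit. This is a sketch assuming GNU find (for -quit); list_deletable_dirs is a made-up name, and it only prints candidates.

```shell
# Hypothetical helper: print directories under $1 that contain neither a
# subdirectory nor an .mp3 file (case-insensitive). Nothing is deleted.
# Assumes GNU find (-quit, -iname) and no newlines in directory names.
list_deletable_dirs() {
    find "$1" -mindepth 1 -type d | while IFS= read -r dir; do
        # -quit stops at the first match, so each check stays cheap.
        hit=$(find "$dir" -mindepth 1 -maxdepth 1 \
                  \( -type d -o -type f -iname '*.mp3' \) -print -quit)
        [ -z "$hit" ] && printf '%s\n' "$dir"
    done
    return 0
}
```

Reviewing the printed list before removing anything is the safer order of operations.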

Here's the short story...
find . \( -name '*.mp3' -o -type d \) -printf '%h\n' | sort -u > non-empty-dirs.tmp
find . -type d -print | sort -u > all-dirs.tmp
comm -23 all-dirs.tmp non-empty-dirs.tmp > dirs-to-be-deleted.tmp
less dirs-to-be-deleted.tmp
xargs -d '\n' rm -rf < dirs-to-be-deleted.tmp
Note that you might have to run all the commands a few times (depending on the depth of your directory tree) before every recursively empty directory is gone...
And the long story goes...
You can approach this problem from two basic perspectives. Either you find all directories, then iterate over each of them and check whether it contains any mp3 file or any subdirectory; if not, you mark that directory for deletion. It will work, but on very large repositories you might expect a significant run time.
Another approach, which is to my mind much more interesting, is to build a list of directories NOT to be deleted, and subtract that list from the list of all directories. Let's work through the second strategy, one step at a time...
First of all, to find the paths of all directories that contain mp3 files, you can simply do:
find . -name '*.mp3' -printf '%h\n' | sort | uniq
This means "find any file ending with .mp3, then print the path to its parent directory".
Now, I could certainly name at least ten different approaches to find directories that contain at least one subdirectory, but keeping the same strategy as above, we can easily get...
find . -type d -printf '%h\n' | sort | uniq
What this means is: "Find any directory, then print the path to its parent."
Both of these queries can be combined in a single invocation, producing a single list containing the paths of all directories NOT to be deleted. The parentheses group the two tests so that -printf applies to both of them; without them, -printf would bind only to -type d and the mp3 directories would be missing from the list. Let's redirect that list to a temporary file.
find . \( -name '*.mp3' -o -type d \) -printf '%h\n' | sort | uniq > non-empty-dirs.tmp
Let's similarly produce a file containing the paths of all directories, no matter if they are empty or not.
find . -type d -print | sort | uniq > all-dirs.tmp
So there, we have, on one side, the complete list of all directories, and on the other, the list of directories not to be deleted. What now? There are tons of strategies, but here's a very simple one:
comm -23 all-dirs.tmp non-empty-dirs.tmp > dirs-to-be-deleted.tmp
Once you have that, well, review it, and if you are satisfied, pass it to rm through xargs to actually delete the directories. Splitting on newlines only (-d '\n', a GNU xargs option) keeps directory names with spaces intact.
xargs -d '\n' rm -rf < dirs-to-be-deleted.tmp

How to compare contents of two directories in bash?

Let's say there are two dirs
/path1 and /path2
for example
/path1/bin
/path1/lib
/path1/...
/path2/bin
/path2/lib
/path2/...
One needs to know whether they are identical in content (file names and file contents) and, if not, to have the differences listed.
How to do this in Linux?
Is there some Bash/Zsh command for it?

The diff command can show all the differences between two directories:
diff -qr /path1 /path2

Someone suggested this already but deleted their answer, not sure why. Try using rsync:
rsync -avni /path1/ /path2
This program will normally sync two folders, but with -n it will do a dry-run instead.

I'm using this script for such a task:
diff <(cd "$dir1"; find . -type f -printf "%p %s\n" | sort) \
<(cd "$dir2"; find . -type f -printf "%p %s\n" | sort)
Feel free to adjust the script in the <(...) part to your specific needs. This version uses find to print the directory contents by printing the paths and the sizes of the files it found therein. Other things are possible of course.
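One such adjustment is to compare checksums instead of sizes, so that files of equal size but different content are also caught. This is a sketch assuming GNU coreutils (sha256sum); dir_manifest is a hypothetical name.

```shell
# Hypothetical helper: print "checksum  ./relative/path" for every regular
# file under $1, sorted by path. Assumes GNU coreutils (sha256sum).
dir_manifest() (
    cd "$1" || exit 1
    # "{} +" batches many files into each sha256sum invocation.
    find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2
)
```

In bash the two manifests could then be compared with diff <(dir_manifest "$dir1") <(dir_manifest "$dir2"). Hashing reads every file in full, so it is slower than comparing sizes.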

How to copy a file to multiple directories using the GNU cp command

Is it possible to copy a single file to multiple directories using the cp command?
I tried the following, which did not work:
cp file1 /foo/ /bar/
cp file1 {/foo/,/bar}
I know it's possible using a for loop or find. But is it possible using the GNU cp command?

You can't do this with cp alone but you can combine cp with xargs:
echo dir1 dir2 dir3 | xargs -n 1 cp file1
This will copy file1 to dir1, dir2, and dir3. xargs calls cp 3 times to do this; see the man page for xargs for details.

No, cp can copy multiple sources but will only copy to a single destination. You need to arrange to invoke cp multiple times - once per destination - for what you want to do; using, as you say, a loop or some other tool.
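Invoking cp once per destination can be wrapped in a small function. This is a minimal sketch; the name copy_to_each and its argument order are assumptions made for illustration.

```shell
# Hypothetical helper: copy one file into each listed directory.
# Usage: copy_to_each FILE DIR...
copy_to_each() {
    src=$1
    shift
    status=0
    for dest in "$@"; do
        # -- ends option parsing in case a name starts with a dash.
        # The trailing slash makes cp fail if DEST is not a directory,
        # rather than silently creating a file with that name.
        cp -- "$src" "$dest"/ || status=1
    done
    return "$status"
}
```

For example, copy_to_each file1 /foo /bar runs cp twice and returns non-zero if either copy failed.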

Wildcards also work with Robert's code:
echo ./fs*/* | xargs -n 1 cp test

I would use cat and tee based on the answers I saw at https://superuser.com/questions/32630/parallel-file-copy-from-single-source-to-multiple-targets instead of cp.
For example:
cat inputfile | tee outfile1 outfile2 > /dev/null

As far as I can see, you can use the following:
ls | xargs -n 1 cp -i file.dat
The -i option of cp means that you will be asked whether to overwrite a file in the current directory with file.dat. Although it is not a completely automatic solution, it worked for me.

These answers all seem more complicated than the obvious:
for i in /foo /bar; do cp file1 "$i"; done

ls -db di*/subdir | xargs -n 1 cp File
The -b option escapes spaces in directory names; otherwise xargs would split such a name into separate items. I had this problem with the echo version.

Not using cp per se, but...
This came up for me in the context of copying lots of GoPro footage off of a (slow) SD card to three (slow) USB drives. I wanted to read the data only once, because it took forever. And I wanted it recursive.
$ tar cf - src | tee >( cd dest1 ; tar xf - ) >( cd dest2 ; tar xf - ) | ( cd dest3 ; tar xf - )
(And you can add more of those >() sections if you want more outputs.)
I haven't benchmarked that, but it's definitely a lot faster than cp-in-a-loop (or a bunch of parallel cp invocations).

If you want to do it without a forked command:
tee <inputfile file2 file3 file4 ... >/dev/null
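The tee form can be wrapped so that the source and the targets are ordinary arguments. This is a sketch only, and tee_copy is an invented name.

```shell
# Hypothetical helper: write the contents of $1 to every remaining
# argument in a single pass over the source.
# Usage: tee_copy SOURCE TARGET_FILE...
tee_copy() {
    src=$1
    shift
    # tee reads the source once from stdin and writes each target;
    # its copy to stdout is discarded.
    tee -- "$@" < "$src" > /dev/null
}
```

The targets are file paths, not directories, and unlike cp -p this does not carry over permissions or timestamps.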

To copy with xargs to directories matched by wildcards on Mac OS, the only solution that worked for me with spaces in the directory names is:
find ./fs*/* -type d -print0 | xargs -0 -n 1 cp test
where test is the file to copy and ./fs*/* matches the directories to copy to.
The problem is that xargs treats spaces as argument separators; changing the delimiter character with -d or -E unfortunately does not work properly on Mac OS.
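The NUL-delimited pattern can be written as a reusable function. This is a sketch restricted to immediate subdirectories; copy_into_subdirs is a hypothetical name.

```shell
# Hypothetical helper: copy $1 into every immediate subdirectory of $2,
# passing names NUL-delimited so that spaces survive.
copy_into_subdirs() {
    # -print0 and -0 delimit names with NUL instead of whitespace;
    # -n 1 runs one cp per directory, with the directory as last argument.
    find "$2" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -n 1 cp -- "$1"
}
```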

Essentially equivalent to the xargs answer, but in case you want parallel execution:
parallel -q cp file1 ::: /foo/ /bar/
So, for example, to copy file1 into all subdirectories of current folder (including recursion):
parallel -q cp file1 ::: `find -mindepth 1 -type d`
N.B.: This probably only conveys any noticeable speed gains for very specific use cases, e.g. if each target directory is a distinct disk.
It is also functionally similar to the '-P' argument for xargs.

No - you cannot.
I've found on multiple occasions that I could use this functionality so I've made my own tool to do this for me.
http://github.com/ddavison/branch
pretty simple -
branch myfile dir1 dir2 dir3

ls -d */ | xargs -iA cp file.txt A

Suppose you want to copy fileName.txt to all subdirectories of the present working directory.
Get the names of all subdirectories through ls and save them to a temporary file, say allFolders.txt:
ls -d */ > allFolders.txt
Print the list and pass it to xargs:
cat allFolders.txt | xargs -n 1 cp fileName.txt

Another way is to use cat and tee as follows:
cat <source file> | tee <destination file 1> | tee <destination file 2> [...] > <last destination file>
I think this would be pretty inefficient though, since the job would be split among several processes (one per destination) and the hard drive would be writing several files at once over different parts of the platter. However if you wanted to write a file out to several different drives, this method would probably be pretty efficient (as all copies could happen concurrently).

Using a bash script:
DESTINATIONPATH[0]="xxx/yyy"
DESTINATIONPATH[1]="aaa/bbb"
# ..
DESTINATIONPATH[5]="MainLine/USER"
NumberOfDestinations=6
for (( i=0; i<NumberOfDestinations; i++ ))
do
    cp SourcePath/fileName.ext "${DESTINATIONPATH[$i]}"
done
exit

If you want to copy multiple folders to multiple folders, you can do something like this:
echo dir1 dir2 dir3 | xargs -n 1 cp -r /path/toyourdir/{subdir1,subdir2,subdir3}

If all your target directories match a path expression — like they're all subdirectories of path/to — then just use find in combination with cp like this:
find ./path/to/* -type d -exec cp [file name] {} \;
That's it.

If you need to be specific about which folders to copy the file into, you can combine find with one or more greps. For example, to replace any occurrences of favicon.ico in any subfolder you can use:
find . | grep 'favicon\.ico' | xargs -n 1 cp -f /root/favicon.ico

This will copy to the immediate subdirectories; if you want to go deeper, adjust the -maxdepth parameter.
find . -mindepth 1 -maxdepth 1 -type d | xargs -n 1 cp -i index.html
If you don't want to copy to all directories, hopefully you can filter out the directories you are not interested in. For example, copying to all folders starting with a:
find . -mindepth 1 -maxdepth 1 -type d | grep '/a' | xargs -n 1 cp -i index.html
If copying to an arbitrary/disjoint set of directories, you'll need Robert Gamble's suggestion.

I like to copy a file into multiple directories as such:
cp file1 /foo/; cp file1 /bar/; cp file1 /foo2/; cp file1 /bar2/
And copying a directory into other directories:
cp -r dir1/ /foo/; cp -r dir1/ /bar/; cp -r dir1/ /foo2/; cp -r dir1/ /bar2/
I know it's like issuing several commands, but it works well for me when I want to type 1 line and walk away for a while.

For example, if you are in the parent directory of your destination folders you can do:
for i in */; do cp sourcefile "$i"; done
