How to find which files / folders are on both computers? - linux

I have a folder called documentaries on my Linux computer.
I have SSH access to seedbox (also Linux).
How do I find out which documentaries I have on both computers?
On the seedbox it's a flat file structure. Some documentaries are files and some are folders containing many files, but they are all in the same folder.
For example:
data/lions_botswana.mp4
data/lions serengeti/S01E01.mkv
data/lions serengeti/S01E02.mkv
data/strosek_on_capitalism.mp4
data/something_random.mp4
Locally, the structure is more organized:
documentaries/animals/lions_botswana.mp4
documentaries/animals/lions serengeti/S01E01.mkv
documentaries/animals/lions serengeti/S01E02.mkv
documentaries/economy/strosek_on_capitalism.mp4
documentaries/something_random.mp4
I am not looking for a command like diff; I am looking for a command like same (the opposite of diff), if such a command exists.

Based on the answer from Zumo de Vidrio, and my comment:
on one computer
cd directory1/; find | sort > filelist1
on the other
cd directory2/; find | sort > filelist2
copy them to one place and run:
comm -12 filelist1 filelist2
or as a one liner:
ssh user@host 'cd remotedir/; find|sort' | comm -12 - <(cd localdir/; find|sort)
Edit: With multiple folders this would look as follows
on one computer
cd remotedir/; find | sort > remotelist
on the other
cd localdir/subdir1/; find > locallist1
cd -;
cd localdir/subdir2/; find > locallist2
cd -;
#... and so on
sort locallist1 locallist2 > locallistall
copy them to one place and run:
comm -12 remotelist locallistall
or as a (now very long) one liner:
ssh user@host 'cd remotedir/; find|sort' | comm -12 - <( { (cd localdir/subdir1/; find); (cd localdir/subdir2/; find); (cd localdir/subdir3/; find); } | sort )
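To try the comm approach on two local directories first, it can be wrapped in a small function. This is only a sketch: the name common_entries and the temporary files are my own choices, not part of the answer above. It only reports entries whose relative paths are identical on both sides, so for a nested local tree you would still list each subfolder separately as described above, and the remote list would still come from ssh.

```shell
# Hypothetical helper: print the relative paths present in both trees.
common_entries() {
    list_a=$(mktemp) || return 1
    list_b=$(mktemp) || { rm -f "$list_a"; return 1; }
    # "! -path ." drops the starting directory itself from the listing.
    # LC_ALL=C keeps sort and comm in agreement about ordering.
    (cd "$1" && find . ! -path . | LC_ALL=C sort) > "$list_a"
    (cd "$2" && find . ! -path . | LC_ALL=C sort) > "$list_b"
    # -12 suppresses lines unique to either list, leaving the common ones.
    LC_ALL=C comm -12 "$list_a" "$list_b"
    rm -f "$list_a" "$list_b"
}
```

Usage would be something like: common_entries /mnt/seedbox/data documentaries/animals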

Export list of remote files to local file by:
ssh user@seedbox 'find /path/to/data -type f -execdir echo {} ";"' > remote.txt
Note: On Linux you have to use an absolute path to avoid a leading ./ in the output; for a relative directory, use "$PWD"/data.
Then grep the result of find command:
find documentaries/ -type f | grep -wFf remote.txt
This displays only those local files that also exist on the remote.
If you would like to generate a similar list locally and compare the two files, try:
find "$PWD"/documentaries/ -type f -execdir echo {} ';' > local.txt
grep -wFf remote.txt local.txt
However, the above methods aren't reliable, since a file with the same name could have a different size. If both sides had the same directory structure, you could use rsync to keep your files up to date.
For a more reliable solution, you can use fdupes, which finds all files that exist in both directories by comparing file sizes and MD5 signatures.
Sample syntax:
fdupes -r documentaries/ data/
However, both directories need to be accessible locally, so you can use the sshfs tool to mount the remote directory locally and then run fdupes to find all duplicate files. It also has an option to delete the extra duplicates (-d).
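If fdupes is not installed, the content-based idea can be roughly approximated with md5sum and awk. This is a sketch under assumptions: the function name common_by_checksum is made up, it assumes GNU md5sum output ("hash, two separator characters, path"), and both trees must be readable locally (for example the remote one through an sshfs mount). It reads every file in full, so expect it to be slow on large videos.

```shell
# Hypothetical helper: list files under "$2" whose content also occurs under "$1".
common_by_checksum() {
    sums_a=$(mktemp) || return 1
    sums_b=$(mktemp) || { rm -f "$sums_a"; return 1; }
    # md5sum prints "<32 hex digits>  <path>" for each file.
    (cd "$1" && find . -type f -exec md5sum {} +) > "$sums_a"
    (cd "$2" && find . -type f -exec md5sum {} +) > "$sums_b"
    # First file: remember every hash. Second file: print the path
    # (column 35 onwards) of each line whose hash was already seen.
    awk 'NR == FNR { seen[$1] = 1; next } $1 in seen { print substr($0, 35) }' \
        "$sums_a" "$sums_b"
    rm -f "$sums_a" "$sums_b"
}
```

Because it matches on content rather than names, it does not care that one side is flat and the other is organized into subfolders.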

Copy the ls output of each computer to the same folder and then run diff on them:
On your computer:
ls -R documentaries/ > documentaries_computer.txt
On the seedbox:
ls -R documentaries/ > documentaries_seedbox.txt
Copy both files to the same location and execute:
diff documentaries_computer.txt documentaries_seedbox.txt

You can mount the remote folder using sshfs, then use diff -r to compare the two trees (with -s, identical files are reported too).
E.g.
sshfs user@seedbox-host:/path/to/documentaries documentaries/
diff -rs /local/path/documentaries/animals documentaries/ | grep identical
diff -rs /local/path/documentaries/economy documentaries/ | grep identical

Related

Is it possible in this command to cd into the directory that's printed in the output

When I do ls | grep -e *-folder1 it prints my-folder1 that's the name of the folder matched in the command in current directory.
Is there a way I can add something like cd into this directory. This is more of an attempt to learn Bash or commands on Linux, rather than about doing what I am trying to accomplish.
You could do
ls | grep -- -folder1 | while read -r dir
do
cd "$dir"
# do things in $dir
done
# do things in the original directory
but parsing the output of ls is not recommended. You could instead use globbing:
for dir in *-folder1*
do
cd "$dir"
# do things in $dir
cd .. # need to back out again
done
# do things in the original directory
If the purpose isn't to grep on all folders matching a certain pattern and to cd down into each one of them, but to simply cd into a directory ending with -folder1, then:
cd *-folder1
If you get zero or multiple hits, cd will show an error.
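If you would rather check for the zero-or-multiple case yourself than rely on cd's error, something like the following could work. It is a sketch only; cd_single_match is a made-up name, and it takes the name suffix (such as -folder1) as its argument.

```shell
# Hypothetical helper: cd into the one directory whose name ends in "$1".
cd_single_match() {
    # Replace the function's positional parameters with the glob matches.
    # The trailing slash restricts the matches to directories.
    set -- *"$1"/
    # With no match the pattern stays literal, so also test that $1 is a directory.
    if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
        echo "expected exactly one matching directory" >&2
        return 1
    fi
    cd -- "$1"
}
```

Called as cd_single_match -folder1, it changes directory only when the match is unambiguous and otherwise leaves you where you were.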

how to delete first 50 directories within a directory linux bash

I am looking to run a script which moves 50 directories to a new directory. Once it has carried out that action, it should delete those 50 from the original directory.
I have the below so far in my bash script:
cd /folder1/subfolder1/directories
mv `ls | head -50` ../subfolder2/
cd /folder1/subfolder1/directories
dirs=( ./*/ ) # an array of directories
mv -t ../subfolder2/ "${dirs[@]:0:50}" # first 50 array elements
For GNU coreutils mv, -t is the "target" (aka, destination) directory. This can be very handy if there are hundreds/thousands of files to move (more than can fit in one command):
some-process-that-produces-filenames-on-stdout | xargs mv -t dest_dir/
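The array-slice approach above could be turned into a reusable function along these lines. This is a sketch assuming bash and GNU mv; the name move_first_n and its argument order are my own. Since mv moves rather than copies, no separate delete step is needed afterwards.

```shell
# Hypothetical helper (bash, GNU mv): move the first N subdirectories of SRC into DEST.
move_first_n() {
    local src=$1 dest=$2 n=$3
    local dirs=( "$src"/*/ )            # glob results come back sorted by name
    [ -d "${dirs[0]}" ] || return 0     # no match: the pattern is left as-is
    dirs=( "${dirs[@]%/}" )             # drop the trailing slash from each entry
    mkdir -p -- "$dest" || return 1
    mv -t "$dest" -- "${dirs[@]:0:n}"   # slice: n elements starting at index 0
}
```

For the question as asked, that would be move_first_n /folder1/subfolder1/directories /folder1/subfolder1/subfolder2 50, with the paths adjusted to your layout.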

Compare directory structures

Is it possible to compare the directory structures of two different servers? I need to compare the directory structure of a test server with that of a production server and list the directories that exist on prod but not on test (the test server has a lot less data).
I am using following rsync command
rsync -rvnc --delete userid@servername:/directory /directory
Besides the above rsync, I have also tried running find on both servers and comparing the two outputs with sdiff:
find directory1 -type d -printf "%P\n" | sort > file1
find directory2 -type d -printf "%P\n" | sort > file2
sdiff file1 file2 > file3
Please help which approach would be better.
You can use rsync -ai dir1/ dir2/ --dry-run to create a machine-readable list of changes between dir1 and dir2.
source: https://stackoverflow.com/a/42160545/2536029
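If the goal is only "directories on prod that are missing on test", the find approach from the question can be finished with comm instead of sdiff. A sketch, assuming GNU find (for -printf) and both trees reachable from one machine; the function name is hypothetical. For two servers you would run each find on its own host, as in the question, and bring the two sorted lists together before the comm step.

```shell
# Hypothetical helper: list directories that exist under "$1" but not under "$2".
dirs_only_in_first() {
    list_a=$(mktemp) || return 1
    list_b=$(mktemp) || { rm -f "$list_a"; return 1; }
    # %P prints each path relative to the starting directory (GNU find).
    find "$1" -mindepth 1 -type d -printf '%P\n' | LC_ALL=C sort > "$list_a"
    find "$2" -mindepth 1 -type d -printf '%P\n' | LC_ALL=C sort > "$list_b"
    # -23 hides lines unique to the second list and lines common to both.
    LC_ALL=C comm -23 "$list_a" "$list_b"
    rm -f "$list_a" "$list_b"
}
```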

How can I generate a folder with the last X files added?

So I have a huge folder full of subfolders with tons of files, and I add files to it all the time.
I need a subfolder in the root of that folder with a symlink of the last 10-20 files added so that I can quickly find the things I recently added. This is located on a NAS, but I have a linux box running Arch connected through NFS, so I assume the best way is to run a bash script with a find command followed by a loop of ln -sf, but I can't do it safely without help.
Something like this is required:
mkdir -p subfolder
find /dir/ -type f -printf '%T@ %p\n' | sort -n | tail -n 10 | cut -d' ' -f2- | while IFS= read -r file ; do ln -s "$file" subfolder ; done
Which will create symlinks in subfolder pointing to the 10 most recently modified files in the directory tree rooted at /dir/
You could just create a shell function like:
recent() { ls -lt ${1+"$@"} | head -n 20; }
which will give you a listing of the 20 most recent items in the specified directories, or the current directory if no arguments are given.
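The find pipeline above could be parameterized roughly as follows. This is a sketch assuming GNU find (for -printf); link_recent and its arguments are made-up names. It resolves the source directory to an absolute path first, because a symlink holding a relative target would be resolved relative to the link's own folder. Like the original pipeline, it breaks on file names that contain newlines.

```shell
# Hypothetical helper: symlink the COUNT most recently modified files under SRC into DEST.
link_recent() {
    local src dest=$2 count=${3:-10}
    src=$(cd -- "$1" && pwd) || return 1   # absolute path, so the links resolve
    mkdir -p -- "$dest" || return 1
    # %T@ is the modification time in seconds since the epoch; sort by it,
    # keep the newest COUNT lines, then strip the timestamp column.
    find "$src" -type f -printf '%T@ %p\n' |
        sort -n | tail -n "$count" | cut -d' ' -f2- |
        while IFS= read -r file; do
            ln -sf -- "$file" "$dest"/     # -f replaces a link left from a previous run
        done
}
```

Links from earlier runs that no longer belong to the newest set are not removed, so you may want to empty the destination folder before each run.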

How to copy a file to multiple directories using the gnu cp command

Is it possible to copy a single file to multiple directories using the cp command?
I tried the following, which did not work:
cp file1 /foo/ /bar/
cp file1 {/foo/,/bar}
I know it's possible using a for loop, or find. But is it possible using the gnu cp command?
You can't do this with cp alone but you can combine cp with xargs:
echo dir1 dir2 dir3 | xargs -n 1 cp file1
This will copy file1 to dir1, dir2, and dir3. xargs will call cp 3 times to do this; see the man page for xargs for details.
No, cp can copy multiple sources but will only copy to a single destination. You need to arrange to invoke cp multiple times - once per destination - for what you want to do; using, as you say, a loop or some other tool.
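Since cp has to be invoked once per destination anyway, the loop can be hidden in a small function. A minimal sketch; copy_to_each is a made-up name, not an existing tool.

```shell
# Hypothetical helper: copy one file to each of the given destinations.
copy_to_each() {
    local src=$1 dest status=0
    shift
    for dest in "$@"; do
        # One cp per destination; remember a failure but keep going.
        cp -- "$src" "$dest" || status=1
    done
    return "$status"
}
```

Usage: copy_to_each file1 /foo/ /bar/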
Wildcards also work with Robert's code:
echo ./fs*/* | xargs -n 1 cp test
I would use cat and tee based on the answers I saw at https://superuser.com/questions/32630/parallel-file-copy-from-single-source-to-multiple-targets instead of cp.
For example:
cat inputfile | tee outfile1 outfile2 > /dev/null
As far as I can see, you can use the following:
ls | xargs -n 1 cp -i file.dat
The -i option of cp command means that you will be asked whether to overwrite a file in the current directory with the file.dat. Though it is not a completely automatic solution it worked out for me.
These answers all seem more complicated than the obvious:
for i in /foo /bar; do cp "$file1" "$i"; done
ls -db di*/subdir | xargs -n 1 cp File
The -b option escapes a space in a directory name; otherwise xargs splits the name into separate items. I had this problem with the echo version.
Not using cp per se, but...
This came up for me in the context of copying lots of Gopro footage off of a (slow) SD card to three (slow) USB drives. I wanted to read the data only once, because it took forever. And I wanted it recursive.
$ tar cf - src | tee >( cd dest1 ; tar xf - ) >( cd dest2 ; tar xf - ) | ( cd dest3 ; tar xf - )
(And you can add more of those >() sections if you want more outputs.)
I haven't benchmarked that, but it's definitely a lot faster than cp-in-a-loop (or a bunch of parallel cp invocations).
If you want to do it without a forked command:
tee <inputfile file2 file3 file4 ... >/dev/null
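The same tee redirection can be wrapped so the source comes first, as it does with cp. A sketch with a made-up function name. Note that the destinations must be file names rather than directories, and that only the contents are copied, not permissions or timestamps.

```shell
# Hypothetical helper: write one input file to several output files in a single pass.
tee_copy() {
    local src=$1
    shift
    # tee reads the input once and writes every byte to each named file;
    # its own standard output is discarded.
    tee -- "$@" < "$src" > /dev/null
}
```

Usage: tee_copy inputfile file2 file3 file4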
To use copying with xargs to directories using wildcards on Mac OS, the only solution that worked for me with spaces in the directory name is:
find ./fs*/* -type d -print0 | xargs -0 -n 1 cp test
Where test is the file to copy
And ./fs*/* the directories to copy to
The problem is that xargs treats spaces as argument separators; the options for changing the delimiter, -d or -E, unfortunately do not work properly on Mac OS.
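The NUL-delimited form generalizes to something like the function below. It is only a sketch; copy_into_subdirs is a hypothetical name, and it copies into every directory below the given parent, at any depth.

```shell
# Hypothetical helper: copy FILE into every directory below PARENT.
copy_into_subdirs() {
    local file=$1 parent=$2
    # -print0 and -0 separate names with NUL bytes, so spaces in directory
    # names survive; -n 1 makes xargs run one cp per destination.
    find "$parent" -mindepth 1 -type d -print0 | xargs -0 -n 1 cp -- "$file"
}
```

With GNU xargs, an empty directory list still runs cp once (and cp then fails for lack of a destination); adding its -r flag avoids that.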
Essentially equivalent to the xargs answer, but in case you want parallel execution:
parallel -q cp file1 ::: /foo/ /bar/
So, for example, to copy file1 into all subdirectories of current folder (including recursion):
parallel -q cp file1 ::: `find -mindepth 1 -type d`
N.B.: This probably only conveys any noticeable speed gains for very specific use cases, e.g. if each target directory is a distinct disk.
It is also functionally similar to the '-P' argument for xargs.
No - you cannot.
I've found on multiple occasions that I could use this functionality so I've made my own tool to do this for me.
http://github.com/ddavison/branch
pretty simple -
branch myfile dir1 dir2 dir3
ls -d */ | xargs -iA cp file.txt A
Suppose you want to copy fileName.txt to all sub-directories within present working directory.
Get all sub-directories names through ls and save them to some temporary file say, allFolders.txt
ls > allFolders.txt
Print the list and pass it to command xargs.
cat allFolders.txt | xargs -n 1 cp fileName.txt
Another way is to use cat and tee as follows:
cat <source file> | tee <destination file 1> | tee <destination file 2> [...] > <last destination file>
I think this would be pretty inefficient though, since the job would be split among several processes (one per destination) and the hard drive would be writing several files at once over different parts of the platter. However if you wanted to write a file out to several different drives, this method would probably be pretty efficient (as all copies could happen concurrently).
Using a bash script
DESTINATIONPATH[0]="xxx/yyy"
DESTINATIONPATH[1]="aaa/bbb"
# ...
DESTINATIONPATH[5]="MainLine/USER"
NumberOfDestinations=6
for (( i=0; i<NumberOfDestinations; i++))
do
cp SourcePath/fileName.ext "${DESTINATIONPATH[$i]}"
done
exit
If you want to copy multiple folders to multiple folders, you can do something like this:
echo dir1 dir2 dir3 | xargs -n 1 cp -r /path/toyourdir/{subdir1,subdir2,subdir3}
If all your target directories match a path expression — like they're all subdirectories of path/to — then just use find in combination with cp like this:
find ./path/to/* -type d -exec cp [file name] {} \;
That's it.
If you need to be specific about which folders to copy the file into, you can combine find with one or more greps. For example, to replace any occurrences of favicon.ico in any subfolder you can use:
find . | grep 'favicon\.ico' | xargs -n 1 cp -f /root/favicon.ico
This will copy to the immediate sub-directories, if you want to go deeper, adjust the -maxdepth parameter.
find . -mindepth 1 -maxdepth 1 -type d| xargs -n 1 cp -i index.html
If you don't want to copy to all directories, hopefully you can filter out the directories you are not interested in. For example, copying to all folders starting with a:
find . -mindepth 1 -maxdepth 1 -type d| grep \/a |xargs -n 1 cp -i index.html
If copying to an arbitrary/disjoint set of directories you'll need Robert Gamble's suggestion.
I like to copy a file into multiple directories as such:
cp file1 /foo/; cp file1 /bar/; cp file1 /foo2/; cp file1 /bar2/
And copying a directory into other directories:
cp -r dir1/ /foo/; cp -r dir1/ /bar/; cp -r dir1/ /foo2/; cp -r dir1/ /bar2/
I know it's like issuing several commands, but it works well for me when I want to type 1 line and walk away for a while.
For example, if you are in the parent directory of your destination folders you can do:
for i in */; do cp sourcefile "$i"; done
