How do I check that two folders are the same in Linux?

I have moved a web site from one server to another and I copied the files using scp.
I now wish to check that all the files have been copied OK.
How do I compare the sites?
Count the files in each folder?
Get the total file size for each folder tree?
Or is there a better way to compare the sites?
Paul

Use diff with the recursive -r and quick -q options. It is the best and by far the fastest way to do this.
diff -r -q /path/to/dir1 /path/to/dir2
It won't tell you what the differences are (remove the -q option to see that), but it will very quickly tell you if all the files are the same.
If it shows no output, all the files are the same, otherwise it will list the files that are different.

If you were using scp, you could probably have used rsync.
rsync won't transfer files that are already up to date, so you can use it to verify a copy is current by simply running rsync again.
If you were doing something like this on the old host:
scp -r from/my/dir newhost:/to/new/dir
Then you could do something like
rsync -a --progress from/my/dir newhost:/to/new/dir
The '-a' is short for 'archive' which does a recursive copy and preserves permissions, ownerships etc. Check the man page for more info, as it can do a lot of clever things.

cd website
find . -type f -print | sort | xargs sha1sum
will produce a list of checksums for the files. You can then diff those to see if there are any missing/added/different files.
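For example, saving the two lists and diffing them might look like this (a sketch; the list file names are arbitrary):
# on the old server, from inside the site root:
find . -type f -print | sort | xargs sha1sum > /tmp/old-site.sha1
# on the new server, likewise:
find . -type f -print | sort | xargs sha1sum > /tmp/new-site.sha1
# copy one list across, then:
diff /tmp/old-site.sha1 /tmp/new-site.sha1
No output from the final diff means the checksums, and therefore the file contents, match.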

maybe you can use something similar to this:
find <original root dir> | xargs md5sum > original
find <new root dir> | xargs md5sum > new
diff original new

To add to the reply from Sidney:
It is not really necessary to filter with -type f and produce a hash for every file.
In reply to zidarsk8, you don't need to sort, since find, like ls, lists the filenames alphabetically by default. It works for empty directories as well.
To summarize, the top 3 answers would be:
(P.S. Nice to do a dry run with rsync)
diff -r -q /path/to/dir1 /path/to/dir2
diff <(cd dir1 && find) <(cd dir2 && find)
rsync --dry-run -avh from/my/dir newhost:/to/new/dir

Make checksums for all files, for example using md5sum. If they're all the same for all the files and no file is missing, everything's OK.
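A minimal sketch of that, assuming GNU md5sum on both hosts (the list file name is a placeholder):
# On the source host, from the site root:
find . -type f -print0 | xargs -0 md5sum > /tmp/site.md5
# Copy /tmp/site.md5 to the destination host, then run this from the site root there:
md5sum --check --quiet /tmp/site.md5
--check re-reads every listed file and reports any that are missing or whose contents differ; --quiet suppresses the OK lines.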

If you used scp, you probably can also use rsync over ssh.
rsync -avH --delete-after 1.example.com:/path/to/your/dir 2.example.com:/path/to/your/
rsync does the checksums for you.
Be sure to use the -n option to perform a dry-run. Check the manual page.
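For a verification-only pass that compares actual file contents instead of just size and modification time, something like this should work (a sketch, run from the old host; -c forces checksums and -n keeps it read-only):
rsync -avHnc /path/to/your/dir 2.example.com:/path/to/your/
Any file names printed are files that would be re-copied, i.e. missing or different on the destination.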
I prefer rsync over scp or even local cp, every time I can use it.
If rsync is not an option, md5sum can generate md5 digests and md5sum --check will check them.

Try diffing your directory recursively. You'll get a nice summary if something is different in one of the directories.

I have moved a web site from one server to another and I copied the files using SCP
You could do this with rsync, it is great if you just want to mirror something.
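A minimal sketch of that, with placeholder paths (--delete makes the destination an exact mirror, so preview with -n first):
rsync -avn --delete /local/site/ newhost:/var/www/site/   # dry run, shows what would change
rsync -av --delete /local/site/ newhost:/var/www/site/    # actual mirror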
/Johan
Update: Seems like #rjack beat me to the rsync answer by 6 seconds :-)

I would add this as a comment to Douglas Leeder's or Eineki's answer, but sadly I don't have enough reputation to comment. Anyway, their answers are both great, except that they don't work for file names with spaces. To make that work, do
find [dir1] -type f -print0 | xargs -0 [preferred hash function] > [file1]
find [dir2] -type f -print0 | xargs -0 [preferred hash function] > [file2]
diff -y [file1] [file2]
Just from experimenting, I also like to use the -W ### argument on diff and output it to a file, which is easier to parse and understand in the terminal.
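For instance, something along these lines (the width is arbitrary):
diff -y -W 160 [file1] [file2] > hash_diff.txt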

...when comparing two folders across a network drive or on separate computers
If comparing two folders on the same computer, diff is fine, as explained by the main answer.
However, if trying to compare two folders on different computers, or across a network, don't do that! If across a network, it will take forever since it has to actually transmit every byte of every file in the folder across the network. So, if you are comparing a 3 GB dir, all 3 GB have to be transferred across the network just to see if the remote dir and local dir are the same.
Instead, use a SHA256 hash: hash the dir locally on each computer. Here is how:
(From my answer here: How to hash all files in an entire directory, including the filenames as well as their contents):
# 1. First, cd to the dir in which the dir of interest is found. This is
# important! If you don't do this, then the paths output by find will differ
# between the two computers since the absolute paths to `mydir` differ. We are
# going to hash the paths too, not just the file contents, so this matters.
cd /home/gabriel # example on computer 1
cd /home/gabriel/dev/repos # example on computer 2
# 2. hash all files inside `mydir`, then hash the list of all hashes and their
# respective file paths. This obtains one single final hash. Sorting is
# necessary by piping to `sort` to ensure we get a consistent file order in
# order to ensure a consistent final hash result. Piping to awk extracts
# just the hash.
find mydir -type f -exec sha256sum {} + | sort | sha256sum | awk '{print $1}'
Example run and output:
$ find eclipse-workspace -type f -exec sha256sum {} + | sort | sha256sum | awk '{print $1}'
8f493478e7bb77f1d025cba31068c1f1c8e1eab436f8a3cf79d6e60abe2cd2e4
Do this on each computer, then ensure the hashes are the same to know if the directories are the same.
Note that the above commands ignore empty directories, file permissions, timestamps of when files were last edited, etc. For most cases though that's ok.
You can also use rsync to basically do this same thing for you, even when copying or comparing across a network.
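A sketch of that, with a placeholder host name: -c makes rsync compare checksums of the file contents on each side (only the checksums cross the network), and -n turns it into a report-only dry run.
rsync -rvnc --delete mydir/ othercomputer:/home/gabriel/dev/repos/mydir/
An empty file list means the two copies of mydir match; any names printed (or marked "deleting") differ between the two sides.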

Related

linux diff on folder and file structure [duplicate]

I have two directories with the same list of files. I need to compare all the files present in both the directories using the diff command. Is there a simple command line option to do it, or do I have to write a shell script to get the file listing and then iterate through them?
You can use the diff command for that:
diff -bur folder1/ folder2/
This will output a recursive diff that ignores whitespace, with unified context:
b flag means ignoring whitespace
u flag means a unified context (3 lines before and after)
r flag means recursive
If you are only interested to see the files that differ, you may use:
diff -qr dir_one dir_two | sort
Option "q" will only show the files that differ but not the content that differ, and "sort" will arrange the output alphabetically.
Diff has an option -r which is meant to do just that.
diff -r dir1 dir2
diff can not only compare two files, it can, by using the -r option, walk entire directory trees, recursively checking differences between subdirectories and files that occur at comparable points in each tree.
$ man diff
...
-r --recursive
Recursively compare any subdirectories found.
...
Another nice option is the über-diff-tool diffoscope:
$ diffoscope a b
It can also emit diffs as JSON, html, markdown, ...
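For example, depending on your diffoscope version, the output flags look roughly like this (a sketch; the report file names are arbitrary):
$ diffoscope --html report.html a b
$ diffoscope --json report.json a b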
If you specifically don't want to compare the contents of files and only check which ones are not present in both directories, you can compare lists of files generated by another command.
diff <(find DIR1 -printf '%P\n' | sort) <(find DIR2 -printf '%P\n' | sort) | grep '^[<>]'
-printf '%P\n' tells find to not prefix output paths with the root directory.
I've also added sort to make sure the order of files will be the same in both calls of find.
The grep at the end removes information about identical input lines.
If it's GNU diff then you should just be able to point it at the two directories and use the -r option.
Otherwise, try using
for i in $(\ls -d ./dir1/*); do diff ${i} dir2; done
N.B. As pointed out by Dennis in the comments section, you don't actually need to do the command substitution on the ls. I've been doing this for so long that I'm pretty much doing this on autopilot and substituting the command I need to get my list of files for comparison.
Also, I forgot to add that I use '\ls' to temporarily bypass my alias for ls, so that I don't get the colour formatting codes in the listing returned by GNU ls.
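In other words, a plain glob is enough; a sketch (quoting "$i" protects names containing spaces):
for i in ./dir1/*; do diff "$i" dir2; done
When one of diff's arguments is a directory, diff compares the named file against the file of the same name inside that directory.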
When working with git/svn or multiple git/svn instances on disk this has been one of the most useful things for me over the past 5-10 years, that somebody might find useful:
diff -burN /path/to/directory1 /path/to/directory2 | grep +++
or:
git diff /path/to/directory1 | grep +++
It gives you a snapshot of the different files that were touched without having to "less" or "more" the output. Then you just diff on the individual files.
In practice the question often arises together with some constraints. In that case the following solution template may come in handy.
cd dir1
find . \( -name '*.txt' -o -iname '*.md' \) | xargs -i diff -u '{}' 'dir2/{}'
Here is a script to show differences between files in two folders. It works recursively. Change dir1 and dir2.
(search() { for i in $1/*; do [ -f "$i" ] && (diff "$1/${i##*/}" "$2/${i##*/}" || echo "files: $1/${i##*/} $2/${i##*/}"); [ -d "$i" ] && search "$1/${i##*/}" "$2/${i##*/}"; done }; search "dir1" "dir2" )
Try this:
diff -rq /path/to/folder1 /path/to/folder2

How do you move files from one folder to the other?

I am trying to move specific files from one folder to another. Would the below work?
mkdir test
touch test1.sh
touch test2.sh
touch test3.sh
mkdir test2
find test/ | xargs -I% mv % test2
I think this can work:
find ./ -name "test*.sh" | xargs -I% mv % test2
There is something odd in your example:
If test1 does not contain any subdirectories, or if you want to move the subdirectories as they are, you could simply do a
mv test1/* test2
(Note that this would, by default, not move entries whose names start with a period. If this is a problem, you should either consider using bash or zsh instead of a plain POSIX shell, or indeed use find, to be on the safe side with the -prune option.)
The problem starts with subdirectories. The output of find contains the directories as well as the files, and each directory appears before the files inside it. The mv inside the xargs would then move, say, a directory test1/foo, and when it later wants to process a file test1/foo/bar/baz.txt, the file is not there anymore. The overall effect would be that you would have moved all the subdirectories (as in my first solution, which does not need find), but would get plenty of error messages in addition.
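If you do want to use find, limiting it to the top level sidesteps that ordering problem and also picks up entries starting with a period (a sketch; -mindepth and -maxdepth are GNU find options):
find test1 -mindepth 1 -maxdepth 1 -exec mv {} test2/ \;
Because nothing below the first level is listed, no directory is moved out from under a path that find still has to process.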

Zipping and deleting files over a certain age

I'm trying to put together a command that will find files that haven't been modified in over 6 months and zip them in one go. Afterwards I want to delete all those files that I just archived.
My current command to find the directories with the files is
find /var/www -type d -mtime -400 ! -mtime -180 | xargs ls -l > testd.txt
This gave me all the directories, including the files, that are older than 6 months.
Now I was wondering if there is a way of zipping all the results and deleting them afterwards. Something along the lines of:
find /var/www -type f -mtime -400 ! -mtime -180 | gzip -c archive.gz
If anyone knows the proper syntax to achieve this I'd love to know. Thanks!
Edit: after a few tests, this command results in a corrupted file:
find /var/www -mtime -900 ! -mtime -180 | xargs tar -cf test4.tar
Any ideas?
Break this into several distinct steps that you can implement and thoroughly test separately:
Build a list of files to be archived and then deleted, saved to a temp file
Use the list from step 1 to add the files to .tar.gz archives. Give the archive file a name following a specific pattern that won't appear in the files to be archived, and put it in a directory outside the hierarchy of files being archived.
Read back the files from the .tar.gz and compare them (or their hashes) to the original files to ENSURE that you got them all without corruption
Use the list from step 1 to delete the files. Do not use a wildcard for deletion. Put in some guard code to prevent deletion of any file matching the name pattern of the archive .tar.gz file(s) created in step 2.
When testing a script that can do irreversible damage, always code the dangerous command with a leading echo and leave it that way until you are sure everything works. Only then remove the echo.
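A rough sketch of those steps, with placeholder archive and list names, and the deletion still guarded by a leading echo:
# 1. Build the list of files untouched for ~6 months (null-delimited to survive odd names)
find /var/www -type f -mtime +180 -print0 > /tmp/archive-list.0
# 2. Archive from that list; write the archive outside /var/www
tar --null --files-from=/tmp/archive-list.0 -czf /root/old-www-files.tar.gz
# 3. Compare the archive against the files on disk (tar strips the leading "/",
#    so compare from the root directory); exit status 0 means no differences
cd / && tar -dzf /root/old-www-files.tar.gz && echo "archive matches the files on disk"
# 4. Delete, still guarded by echo until everything above checks out
xargs -0 -a /tmp/archive-list.0 echo rm --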
Consider zip, it should meet your requirements.
find ... | zip -m -@ archive.zip
-m (move) deletes the input directories/files after making the specified zip archive.
-@ takes the list of input files from standard input.
You may find more options which are useful to you in the zip manual, e.g.
-r (recurse) travels the directory structure recursively.
-sf (show-files) shows the files that would be operated on, then exits.
-t or --from-date operates on files not modified prior to the specified date.
-tt or --before-date operates on files not modified after or at the specified date.
This could possibly make find expendable.
zip -mr --from-date 2012-09-05 --before-date 2013-04-13 archive /var/www

Script to zip complete file structure depending on file age

Alright, so I have a web server running CentOS at work that is hosting a few websites internally only. It's our development server and thus has lots [read: tons] of old junk websites and whatnot.
I was trying to put together a command that would find files that haven't been modified for over 6 months, group them all in a tarball and then delete them. Thus far I have tried many different kinds of find commands with arguments and whatnot. Our structure looks like this:
/var/www/joomla/username/fileshere/temp
/var/www/username/fileshere
So I tried something along the lines of:
find /var/www -mtime -900 ! -mtime -180 | xargs tar -cf test4.tar
Only to end up with a 10 MB tar, when the expected result would be over 50 GB.
I tried using gzip instead, but I ended up zipping MY WHOLE SERVER, thus making it unusable; I had to transfer the whole filesystem and reinstall a completely new server, with lots of trouble and... you get the idea. So I want to find a command that won't blow up our server but will find all FILES and DIRECTORIES that haven't been modified for over 6 months.
Be careful with ctime.
ctime is related to changes made to the inode (changing permissions, owner, etc.)
atime is when a file was last accessed (check whether your file system uses the noatime or relatime options; in that case atime may not work the way you expect)
mtime is when the data in a file was last modified.
Depending on what you are trying to do, mtime could be your best option.
Besides, you should check the print0 option. From man find:
-print0
True; print the full file name on the standard output, followed by a null character (instead of the newline character that -print uses). This allows file names that contain newlines or other types of white space to be correctly interpreted by programs that process the find output. This option corresponds to the -0 option of xargs.
I do not know what you are trying to do, but this command could be useful for you:
find /var/www -mtime +180 -print0 | xargs -0 tar -czf example.tar.gz
Try this:
find /var/www -ctime +180 | xargs tar cf test.tar
The -ctime parameter checks the difference between the current time and each file's status change time, and if you use + instead of - it will give you the files changed more than x days ago.
Then just pass it to tar with xargs and you should be set.

How to build one file that contains other files, selected by mask?

I need to put the contents of all *.as files in some specified folder into one big file.
How can I do it in Linux shell?
You mean cat *.as > onebigfile?
If you need all files in all subdirectories, the most robust way to do this is:
rm onebigfile
find -name '*.as' -print0 | xargs -0 cat >> onebigfile
This:
deletes onebigfile
for each file found, appends it onto onebigfile (this is why we delete it in the previous step -- otherwise you could end up tacking onto some existing file.)
A less robust but simpler solution:
cat `find -name '*.as'` > onebigfile
(The latter version doesn't handle very large numbers of files or files with weird filenames so well.)
Not sure what you mean by compile but are you looking for tar?
