linux diff on folder and file structure [duplicate] - linux

I have two directories with the same list of files. I need to compare all the files present in both the directories using the diff command. Is there a simple command line option to do it, or do I have to write a shell script to get the file listing and then iterate through them?

You can use the diff command for that:
diff -bur folder1/ folder2/
This will output a recursive diff that ignores whitespace changes, with unified context:
b flag means ignoring changes in the amount of whitespace
u flag means a unified context (3 lines before and after)
r flag means recursive

If you are only interested in seeing which files differ, you may use:
diff -qr dir_one dir_two | sort
Option "q" will only show which files differ, not the content that differs, and "sort" will arrange the output alphabetically.
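Because diff exits with status 0 when it finds no differences and 1 when it finds some, the -q form is also convenient inside scripts. A minimal sketch, assuming a diff that supports -q and -r (GNU and BSD diff do); the helper name same_tree and the scratch directories are made up for the illustration:

```shell
# same_tree DIR1 DIR2: succeed only if both trees have identical names and contents.
# (same_tree is a hypothetical helper name, not part of diff.)
same_tree() {
    diff -qr "$1" "$2" >/dev/null 2>&1
}

# Demo on two throwaway directories.
a=$(mktemp -d) && b=$(mktemp -d)
echo one > "$a/file.txt"
echo one > "$b/file.txt"
if same_tree "$a" "$b"; then echo "identical"; else echo "different"; fi

echo two > "$b/file.txt"            # change one side
diff -qr "$a" "$b" | sort || true   # lists the differing file
rm -rf "$a" "$b"
```

The function treats any non-zero status (including diff's status 2 for errors such as a missing directory) as "not the same".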

Diff has an option -r which is meant to do just that.
diff -r dir1 dir2

diff can not only compare two files, it can, by using the -r option, walk entire directory trees, recursively checking differences between subdirectories and files that occur at comparable points in each tree.
$ man diff
...
-r --recursive
Recursively compare any subdirectories found.
...
Another nice option is the über-diff-tool diffoscope:
$ diffoscope a b
It can also emit diffs as JSON, html, markdown, ...

If you specifically don't want to compare the contents of files and only want to check which ones are not present in both directories, you can compare lists of files generated by another command.
diff <(find DIR1 -printf '%P\n' | sort) <(find DIR2 -printf '%P\n' | sort) | grep '^[<>]'
-printf '%P\n' tells find to not prefix output paths with the root directory.
I've also added sort to make sure the order of files will be the same in both calls of find.
The grep at the end removes information about identical input lines.
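The one-liner relies on bash process substitution and on GNU find's -printf. Where either is unavailable, the same idea can be sketched with comm and two temporary lists. This is only a sketch: the names only_in_first, list1 and list2 are made up, and it assumes no file name contains a newline.

```shell
# only_in_first DIR1 DIR2: print paths (relative to the roots) that exist
# under DIR1 but not under DIR2. Hypothetical helper name.
only_in_first() {
    list1=$(mktemp) && list2=$(mktemp) || return 2
    # cd into each root so find prints paths relative to it (like %P does).
    (cd "$1" && find . ! -name . | sort) > "$list1"
    (cd "$2" && find . ! -name . | sort) > "$list2"
    comm -23 "$list1" "$list2"    # -23: keep only lines unique to the first list
    rm -f "$list1" "$list2"
}
```

Calling only_in_first DIR1 DIR2 corresponds to the lines marked < in the diff output; swapping the arguments gives the lines marked >.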

If it's GNU diff then you should just be able to point it at the two directories and use the -r option.
Otherwise, try using
for i in $(\ls -d ./dir1/*); do diff ${i} dir2; done
N.B. As pointed out by Dennis in the comments section, you don't actually need to do the command substitution on the ls. I've been doing this for so long that I'm pretty much doing this on autopilot and substituting the command I need to get my list of files for comparison.
Also I forgot to add that I do '\ls' to temporarily disable my alias of ls to GNU ls so that I lose the colour formatting info from the listing returned by GNU ls.

When working with git/svn, or with multiple git/svn instances on disk, this has been one of the most useful things for me over the past 5-10 years; somebody else might find it useful too:
diff -burN /path/to/directory1 /path/to/directory2 | grep +++
or:
git diff /path/to/directory1 | grep +++
It gives you a snapshot of the different files that were touched without having to "less" or "more" the output. Then you just diff on the individual files.

In practice the question often comes with additional constraints, such as comparing only certain file types. In that case the following solution template may come in handy.
cd dir1
find . \( -name '*.txt' -o -iname '*.md' \) | xargs -I '{}' diff -u '{}' 'dir2/{}'
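One caveat with piping find into xargs is that xargs interprets quotes and backslashes in its input, so unusual file names can break the pipeline. A variant of the template that passes names through -exec instead might look like this sketch. The helper name diff_matching is made up, and the second directory is resolved to an absolute path so that it stays valid after the cd:

```shell
# diff_matching DIR1 DIR2: diff every *.txt / *.md file under DIR1 against
# the file at the same relative path under DIR2. Hypothetical helper name.
diff_matching() {
    other=$(cd "$2" && pwd) || return 2     # absolute path, survives the cd below
    (
        cd "$1" || exit 2
        # -exec ... {} + hands file names to sh as arguments, so spaces
        # and quotes in names are passed through untouched.
        find . -type f \( -name '*.txt' -o -iname '*.md' \) \
            -exec sh -c 'other=$1; shift
                for f do diff -u "$f" "$other/$f"; done' sh "$other" {} +
    )
}
```

The name filter is the part to adapt; any other find test (size, modification time, owner) can be put in its place.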

Here is a script to show differences between files in two folders. It works recursively. Change dir1 and dir2.
(search() { for i in $1/*; do [ -f "$i" ] && (diff "$1/${i##*/}" "$2/${i##*/}" || echo "files: $1/${i##*/} $2/${i##*/}"); [ -d "$i" ] && search "$1/${i##*/}" "$2/${i##*/}"; done }; search "dir1" "dir2" )

Try this:
diff -rq /path/to/folder1 /path/to/folder2

Related

Unix - Only list directories which contain a subdirectory

How can I print in the Unix shell the number of directories in a tree which contain other directories?
I haven't found a solution yet with commands like find or ls.
You can use the find command: find . -type d -not -empty
That will print every subdirectory that is not empty. You can control how deep you want the search with -maxdepth.
To print the number, you can use wc -l.
find . -type d -not -empty | wc -l
If you generate a list of all the directories under a particular directory, and then remove the last component from the name, you have a list of the directories containing subdirectories, but there are likely to be repeats in that list. So, you need to post-process the list, yielding (as a first approximation):
find ${base:-.} -type d |
sed 's%/[^/]*$%%' |
sort -u
Find all the directories under the directory or directories listed in variable $base, defaulting to the current directory, and print their names. The code assumes you don't have directories with a newline in the name. If you do, there are fixes, but the best fix is to rename the directory. The sed command removes the last slash and everything after it. The sort eliminates duplicate entries. What's left is the list of directories containing subdirectories.
Well, more or less. There's the degenerate case to consider: the top-level directories in the list will be listed regardless of whether they have sub-directories or not. Fixing that is a bit harder. You need to eliminate any lines of output that exactly match the directories specified to find before removing trailing material. So, you need something like:
{
printf '\\#^%s$#d\n' ${base:-.}
echo 's%/[^/]*$%%'
} > sed.script
find ${base:-.} -type d |
sed -f sed.script |
sort -u
rm -f sed.script
The \\#^%s$#d assumes you don't use # in directory names. If you do use it, then you need to find a character you don't use in names (maybe Control-A) and use that in place of the #. If you could face absolutely any character, then you'll need to do more work escaping some obscure character, such as Control-A, when it appears in a directory name.
There's a problem still: using a fixed name like sed.script for a temporary file name is bad (for multiple reasons — such as two people trying to run the script at the same time in the same directory, though it can also be a security risk), so use mktemp to create a temporary file name:
tmp=$(mktemp ${TMPDIR:-/tmp}/dircnt.XXXXXX)
trap "rm -f $tmp; exit 1" 0 1 2 3 13 15
{
printf '\\#^%s$#d\n' ${base:-.}
echo 's%/[^/]*$%%'
} > $tmp
find ${base:-.} -type d |
sed -f $tmp |
sort -u
rm -f $tmp
trap 0
This deals with the most common signals (HUP, INT, QUIT, PIPE, TERM) and removes the temporary file even if one of those arrives.
Clearly, if you want to simply count the number of directories, you can pipe the output from the commands above through wc -l to get the count.
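Where find supports -mindepth (GNU and BSD find do; it is not in POSIX), the degenerate case can be sidestepped without a sed script. If the starting directory is excluded from the listing, it only shows up in the result when some subdirectory's path reduces to it. A sketch under that assumption, for a single starting directory and names without newlines; dirs_with_subdirs is a made-up name:

```shell
# dirs_with_subdirs [DIR]: print each directory at or below DIR that
# contains at least one subdirectory. Hypothetical helper name.
dirs_with_subdirs() {
    # List every directory below the start point (but not the start point
    # itself), strip the last path component to get its parent, and
    # de-duplicate the parents.
    find "${1:-.}" -mindepth 1 -type d |
        sed 's%/[^/]*$%%' |
        sort -u
}
```

Piping the result through wc -l, as in dirs_with_subdirs /some/tree | wc -l, gives the count.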
ls -1d */*/. | cut -d / -f1 | uniq

How to compare contents of two directories in bash?

Let's say there are two dirs
/path1 and /path2
for example
/path1/bin
/path1/lib
/path1/...
/path2/bin
/path2/lib
/path2/...
And one needs to know whether they are identical by contents (names of files and content of files) and, if not, to have the differences listed.
How to do this in Linux?
Is there some Bash/Zsh command for it?
The diff command can show all the differences between two directories:
diff -qr /path1 /path2
Someone suggested this already but deleted their answer, not sure why. Try using rsync:
rsync -avni /path1/ /path2
This program will normally sync two folders, but with -n it will do a dry-run instead.
I'm using this script for such a task:
diff <(cd "$dir1"; find . -type f -printf "%p %s\n" | sort) \
<(cd "$dir2"; find . -type f -printf "%p %s\n" | sort)
Feel free to adjust the script in the <(...) part to your specific needs. This version uses find to print the directory contents by printing the paths and the sizes of the files it found therein. Other things are possible of course.
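As one example of such an adjustment: two files can have the same size and still differ, so a checksum per file is a stricter comparison. A sketch using the POSIX cksum utility, with temporary files in place of process substitution. The function names manifest and compare_trees are made up, and file names are assumed not to contain newlines:

```shell
# manifest DIR: print "CRC SIZE PATH" for every regular file under DIR,
# with paths relative to DIR, sorted by path so two manifests line up.
manifest() {
    (cd "$1" && find . -type f -exec cksum {} + | sort -k 3)
}

# compare_trees DIR1 DIR2: show manifest lines that differ between the trees.
# Returns diff's exit status (0 = same, 1 = different).
compare_trees() {
    m1=$(mktemp) && m2=$(mktemp) || return 2
    manifest "$1" > "$m1"
    manifest "$2" > "$m2"
    diff "$m1" "$m2"
    status=$?
    rm -f "$m1" "$m2"
    return "$status"
}
```

This reads every file in full, so it is slower than comparing sizes; which trade-off is right depends on how large the trees are.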

Unix: traverse a directory

I need to traverse a directory, starting in one directory and going deeper into its different subdirectories. However, I also need access to each individual file so that I can modify it. Is there already a command to do this, or will I have to write a script? Could someone provide some code to help me with this task? Thanks.
The find command is just the tool for that. Its -exec flag or -print0 in combination with xargs -0 allows fine-grained control over what to do with each file.
Example: Replace all foo's by bar's in all files in /tmp and subdirectories (the g flag replaces every occurrence on a line, not just the first).
find /tmp -type f -exec sed -i -e 's/foo/bar/g' '{}' ';'
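The -print0 / xargs -0 combination could look like the following sketch. It assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do) and GNU sed for -i without a suffix argument; the scratch directory exists only for the demonstration:

```shell
work=$(mktemp -d)                       # scratch tree for the demonstration
mkdir "$work/sub dir"
printf 'foo and foo\n' > "$work/sub dir/a file.txt"
printf 'no match\n'    > "$work/b.txt"

# NUL-separated names survive spaces and newlines in paths; xargs batches
# many files into few sed invocations.
find "$work" -type f -print0 | xargs -0 sed -i -e 's/foo/bar/g'

cat "$work/sub dir/a file.txt"          # prints: bar and bar
```

Compared with -exec ... ';', which starts one sed process per file, this form starts far fewer processes on large trees.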
for i in $(find .) ; do   # note: breaks on names containing whitespace
if [ -d "$i" ] ; then echo "directory: $i" ; fi   # do something with a directory
if [ -f "$i" ] ; then echo "file: $i" ; fi   # do something with a file etc.
done
This will return the whole tree (recursively) in the current directory in a list that the loop will go through.
This can be easily achieved by mixing find, xargs, sed (or other file modification command).
For example:
$ find /path/to/base/dir -type f -name '*.properties' | xargs sed -i -e '/^#/d'
The find command selects all files with the extension .properties.
The xargs command feeds the file paths generated by find to the sed command.
The sed command deletes all lines starting with # in the files (fed by xargs).
Combining commands in this way is very flexible.
For example, find has many parameters, so you can filter by user name, file size, file path (e.g. under a /test/ subfolder), or file modification time.
Another dimension of flexibility is how and what to change in your files. For example, sed lets you apply substitutions specified via regular expressions. Similarly, you can use gzip to compress the files. And so on ...
You would usually use the find command. On Linux, you have the GNU version, of course, which has many extra (and useful) options. Both the POSIX and the GNU versions allow you to execute a command (e.g. a shell script) on the files as they are found.
The exact details of how to make changes to the file depend on the change you want to make to the file. That is probably best scripted, with find running the script:
POSIX or GNU:
find . -type f -exec your_script '{}' +
This will run your script once for a group of files with those names provided as arguments. If you want to do it one file at a time, replace the + with ';' (or \;).
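When the per-file work is short, the script can also be given inline with sh -c instead of living in a separate file. A sketch of that pattern follows; the work done here (printing each file's line count) is only a placeholder, and count_lines_each is a made-up name:

```shell
# count_lines_each [DIR]: print "PATH: LINES" for every regular file under DIR.
count_lines_each() {
    # The word "sh" after the script fills $0; the file names that find
    # substitutes for {} arrive as "$@", which "for f do" iterates over.
    find "${1:-.}" -type f -exec sh -c '
        for f do
            printf "%s: %s\n" "$f" "$(wc -l < "$f" | tr -d " ")"
        done
    ' sh {} +
}
```

Because the names are passed as arguments rather than parsed from text, files with spaces in their names need no special handling.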
I am assuming SearchMe is the example directory name you need to traverse completely.
I am also assuming, since it was not specified, that the files you want to modify are all text files. Is this correct?
In such scenario I would suggest using the command:
find SearchMe -type f -exec vi {} \;
If you are not familiar with vi editor, just use another one (nano, emacs, kate, kwrite, gedit, etc.) and it should work as well.
Bash 4+
shopt -s globstar
for file in **
do
if [ -f "$file" ];then
# do some processing to your file here
# where the find command can't do conveniently
fi
done

Searching for information in files in several directories

I need to check several files which are in different locations for a specific information.
So, how to make a script which checks for the argument word through several directories?
The directories are in different locations. For ex.
/home/check1/
/opt/log/
/var/status/
Besides find, you could also loop over the directories:
for DIR in /home/check1 /opt/log /var/status ; do
grep -R searchword "$DIR";
done
At the very simplest, it boils down to
find . -name '*.c' | xargs grep word
to find a given word in all the .c files in the current directory and below.
grep -R may also work for you, but it can be a problem if you don't want to search all files.
Use the grep -R (recursive) option and give grep multiple directory arguments.
Try find (http://content.hccfl.edu/pollock/Unix/FindCmd.htm) with your search words and the directories.
The man page of grep should explain what you need. Anyway, if you need to search recursively you can use:
grep -R --include=PATTERN "string_to_search" $directory
You can also use:
--exclude=PATTERN to skip some files
--exclude-dir=PATTERN to skip some directories
The other option is to use find to get the files and pipe them to grep to search for the strings.
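Putting these suggestions together, a small wrapper that takes the search word as its argument might look like this. It is a sketch: search_dirs is a made-up name, the default directory list simply repeats the three paths from the question, and grep -R is assumed to be available (GNU and BSD grep have it).

```shell
# search_dirs WORD [DIR...]: recursively search each directory for WORD.
search_dirs() {
    word=$1
    shift
    # Fall back to the directories from the question when none are given.
    [ "$#" -gt 0 ] || set -- /home/check1 /opt/log /var/status
    for dir do
        [ -d "$dir" ] || { echo "skipping $dir: not a directory" >&2; continue; }
        # -n prints line numbers; -- stops option parsing so WORD may start with "-".
        grep -R -n -- "$word" "$dir"
    done
}
```

Adding --include=PATTERN or --exclude-dir=PATTERN to the grep call narrows the search in the same way as described above.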

Find the number of files in a directory

Is there any method in Linux to calculate the number of files in a directory (that is, immediate children) in O(1) (independently of the number of files) without having to list the directory first? If not O(1), is there a reasonably efficient way?
I'm searching for an alternative to ls | wc -l.
readdir is not as expensive as you may think. The trick is to avoid stat'ing each file and, optionally, to avoid sorting the output of ls.
/bin/ls -1U | wc -l
avoids aliases in your shell, doesn't sort the output, and lists 1 file-per-line (not strictly necessary when piping the output into wc).
The original question can be rephrased as "does the data structure of a directory store a count of the number of entries?", to which the answer is no. There isn't a more efficient way of counting files than readdir(2)/getdents(2).
One can get the number of subdirectories of a given directory without traversing the whole list by stat'ing (stat(1) or stat(2)) the given directory and observing the number of links to that directory. A given directory with N child directories will have a link count of N+2, one link for the ".." entry of each subdirectory, plus two for the "." and ".." entries of the given directory.
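The link-count shortcut can be tried out with the sketch below, which reads the count from the second column of ls -ld and compares it with an actual scan. Both function names are made up. Not every file system maintains the N+2 convention (Btrfs, for one, reports a link count of 1 for directories), so the first function should be treated as an estimate.

```shell
# subdir_count_by_links DIR: estimate the number of child directories from
# the hard-link count (second column of ls -ld). Hypothetical helper name.
subdir_count_by_links() {
    links=$(ls -ld "$1" | awk '{ print $2 }')
    echo $((links - 2))
}

# subdir_count_by_scan DIR: the same number obtained by actually reading
# the directory, for comparison.
subdir_count_by_scan() {
    (cd "$1" && find . ! -name . -prune -type d | grep -c /)
}
```

When the two functions disagree for a directory, the file system does not follow the convention and only the scan can be trusted.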
However one cannot get the number of all files (whether regular files or subdirectories) without traversing the whole list -- that is correct.
The "/bin/ls -1U" command will not get all entries however. It will get only those directory entries that do not start with the dot (.) character. For example, it would not count the ".profile" file found in many login $HOME directories.
One can use either the "/bin/ls -f" command or the "/bin/ls -Ua" command to avoid the sort and get all entries.
Perhaps unfortunately for your purposes, either the "/bin/ls -f" command or the "/bin/ls -Ua" command will also count the "." and ".." entries that are in each directory. You will have to subtract 2 from the count to avoid counting these two entries, such as in the following:
expr `/bin/ls -f | wc -l` - 2 # Those are back ticks, not single quotes.
The --format=single-column (-1) option is not necessary on the "/bin/ls -Ua" command when piping the "ls" output, as in to "wc" in this case. The "ls" command will automatically write its output in a single column if the output is not a terminal.
The -U option for ls is not in POSIX, and in OS X's ls it has a different meaning from GNU ls, which is that it makes -t and -l use creation times instead of modification times. -f is in POSIX as an XSI extension. The manual of GNU ls describes -f as do not sort, enable -aU, disable -ls --color and -U as do not sort; list entries in directory order.
POSIX describes -f like this:
Force each argument to be interpreted as a directory and list the name found in each slot. This option shall turn off -l, -t, -s, and -r, and shall turn on -a; the order is the order in which entries appear in the directory.
Commands like ls|wc -l give the wrong result when filenames contain newlines.
In zsh you can do something like this:
a=(*(DN));echo ${#a}
D (glob_dots) includes files whose name starts with a period and N (null_glob) causes the command to not result in an error in an empty directory.
Or the same in bash:
shopt -s dotglob nullglob;a=(*);echo ${#a[@]}
If IFS contains ASCII digits, add double quotes around ${#a[@]}. Add shopt -u failglob to ensure that failglob is unset.
A portable option is to use find:
find . ! -name . -prune|grep -c /
grep -c / can be replaced with wc -l if filenames do not contain newlines. ! -name . -prune is a portable alternative to -mindepth 1 -maxdepth 1.
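Why counting slashes is safe can be seen in a small experiment. In this sketch the helper name count_entries and the scratch directory are made up; it creates a file whose name contains a newline and counts the entries with the portable find form:

```shell
# count_entries DIR: number of immediate children, including dot files.
# Each entry is printed as "./name", so exactly one output line per entry
# contains a slash even when the name itself spans several lines.
count_entries() {
    (cd "$1" && find . ! -name . -prune | grep -c /)
}

demo=$(mktemp -d)
touch "$demo/plain" "$demo/.hidden"
touch "$demo/two
lines"                                  # a name containing a newline
count_entries "$demo"                   # prints 3
```

A plain line count of the same find output would report 4 here, because the two-line name occupies two lines.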
Or here's another alternative that does not usually include files whose name starts with a period:
set -- *;[ -e "$1" ]&&echo "$#"
The command above does however include files whose name starts with a period when an option like dotglob in bash or glob_dots in zsh is set. When * matches no file, the command results in an error in zsh with the default settings.
I used this command and it works like a charm. You only need to change -maxdepth to control how many levels of subdirectories are covered.
find * -maxdepth 0 -type d -exec sh -c "echo -n {} ' ' ; ls -lR {} | wc -l" \;
I think you can have more control on this using find:
find <path> -maxdepth 1 -type f -printf "." | wc -c
find -maxdepth 1 will not go deeper into the hierarchy of files.
-type f allows filtering to just files. Similarly, you can use -type d for directories.
-printf "." prints a dot for every match.
wc -c counts the characters, so it counts the dots created by the print... which means counting how many files exist in the given path.
For the number of all files in the current directory, try this:
ls -lR * | wc -l
As far as I know, there is no better alternative. This may be off-topic and you may already know it, but under Linux (and Unix in general) directories are just special files that contain the list of other files (the exact details depend on the specific file system, but this is the general idea). And there is no call to find the total number of entries without traversing the whole list. Please correct me if I'm wrong.
use ls -1 | wc -l
