displaying disk space usage of the current directory excluding size of subdirectories - linux

I want to write a command to display the disk space usage of the current directory, excluding the size of subdirectories. [The original question included an image of the directory tree: the current directory contains a subdirectory dir1 and files matching file*.] I tried:
du ./ --exclude='./file*'
The output is:
4 ./dir1
4 .
I am getting the first line of output but not the second.

du -Sd 1
Here -S (--separate-dirs) reports each directory's size without including its subdirectories, and -d 1 limits the output to one level deep. Output will be:
4 ./dir1
4 .

Suppose the current directory is /tmp/foo, which has no files, except for a single directory /tmp/foo/bar, into which is put a copy of bash (1113504 bytes). Running the tree util:
tree --du "$(pwd)"
...reports:
/tmp/foo
└── [ 1117600]  bar
    └── [ 1113504]  bash
1121696 bytes used in 1 directory, 1 file
To get the size in bytes of /tmp/foo, (but not /tmp/foo/bar), this works:
du -bSd 1 "$(pwd)" | grep -w "$(pwd)$"
Output:
4096 /tmp/foo
The same line of code can be reused, just cd to any directory:
cd foo/bar/
du -bSd 1 "$(pwd)" | grep -w "$(pwd)$"
Output:
1117600 /tmp/foo/bar

The command will be:
du -S
The output is shown in this screenshot: https://i.stack.imgur.com/fxqkC.jpg

Try this:
du -S ./ --exclude='./file*'
OUTPUT
4 ./dir1
4 ./

Related

How to use xargs curl to read lines of a text file in subdirectory and keep downloaded files in subdirectory?

I have several subdirectories within a parent folder, each with a URLs.txt file inside:
$ ls
H3K4me1_assay H3K4me2_assay H3K4me3_assay ... +30 or so more *_assay directories
Each *_assay directory contains one URLs.txt file:
$ cat URLs.txt
https://www.encodeproject.org/files/ENCFF052HMX/@@download/ENCFF052HMX.bed.gz
https://www.encodeproject.org/files/ENCFF052HMX/@@download/ENCFF052HMX.bed.gz
https://www.encodeproject.org/files/ENCFF466DMK/@@download/ENCFF466DMK.bed.gz
... +200 or more URLs
Is there any way I can execute a command from the parent folder that reads and curls the URLs.txt file in each subdirectory, and then downloads the file within each subdirectory?
I can cd into each directory and run the following commands to download all of the files:
$ cd ~/largescale/H3K4me3_assay
$ ls URL* | xargs -L 1 -d '\n' zcat | xargs -L 1 curl -O -J -L
But I will have to run this command for experiments with +300 folders, so cd'ing in each time isn't really practical.
I have tried to run this, it does download the correct files but within the parent folder rather than the subdirectories. Any idea what I am doing wrong?
$ for i in ./*_assay; do cd ~/largescale/"$i" | ls URL* | xargs -L 1 -d '\n' zcat | xargs -L 1 curl -O -J -L; done
Thanks, Steven
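The bug is the pipe after cd: each stage of a pipeline runs in its own subshell, so the cd never takes effect for the ls that follows it. A sketch of one fix (assuming the ~/largescale/*_assay layout from the question, and keeping the asker's zcat/curl pipeline verbatim) runs each iteration in a parenthesized subshell so the cd cannot leak between iterations:
# Sketch: '( ... )' confines cd to one iteration; '&&' replaces the bad pipe.
for d in ~/largescale/*_assay/; do
    (cd "$d" && ls URL* | xargs -L 1 -d '\n' zcat | xargs -L 1 curl -O -J -L)
done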

Pretty recursive directory and file print

I am building a Java app that saves the output of bash commands into a list.
root# ~/a $ zipinfo -1 data.zip
a.txt
b.txt
test/
test/c.txt
root# ~/a $ find .
.
./test
./test/c.txt
./b.txt
./data.zip
./a.txt
The idea is to compare the files and directories from the zip to what is on the disk and remove any differences from the disk. In this example, only test/c.txt should be removed.
As you can see the format is different. Which command do I need to have the same style as zipinfo -1?
I tried commands like:
ls -R
ls -LR | grep ""
find . -exec ls -dl \{\} \; | awk '{print $9}'
One way to remove the . prefix is to use sed. For example:
find . ! -name . | sed -e 's|^\./||'
The ! -name . removes the . entry from the list. The sed part removes the ./ from the beginning of each line.
It should give you pretty much the same format as your zipinfo, though perhaps not the same order.
Note: the above suggestion is compatible with both Linux and other unix versions such as MacOS X. On Linux specifically you can achieve the same result with:
find . ! -name . -printf "%P\n"
The -printf predicate is specific to the Linux findutils, and the %P format prints each found file, removing the search directory from its beginning.
The easiest thing might be to ask zip itself for the list. Its -sf option means to show the files it would add if it were going to, without actually creating a zip file. So:
$ zip add fake.zip -r -sf *
zip warning: name not matched: fake.zip
Would Add/Update:
a.txt
b.txt
test/
test/c.txt
Total 4 entries (0 bytes)
Your Java code would then have to skip the extraneous header/footer lines that it also outputs, but the file list itself will be in the same format that you're getting from your other execution of zip.
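If you would rather strip those lines in the shell before the output reaches Java, a sketch (assuming the warning line goes to stderr and the header and footer are the first and last stdout lines, as shown above):
# Drop the 'Would Add/Update:' header and the 'Total ...' footer.
zip add fake.zip -r -sf * 2>/dev/null | sed -e '1d' -e '$d'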

Recursively counting files in a Linux directory

How can I recursively count files in a Linux directory?
I found this:
find DIR_NAME -type f ¦ wc -l
But when I run this it returns the following error.
find: paths must precede expression: ¦
This should work:
find DIR_NAME -type f | wc -l
Explanation:
-type f to include only files.
| (and not ¦) pipes the find command's standard output into the wc command's standard input.
wc (short for word count) counts newlines, words and bytes on its input.
-l to count just newlines.
Notes:
Replace DIR_NAME with . to execute the command in the current folder.
You can also remove the -type f to include directories (and symlinks) in the count.
It's possible this command will overcount if filenames can contain newline characters.
Explanation of why your example does not work:
In the command you showed, you do not use the "Pipe" (|) to kind-of connect two commands, but the broken bar (¦) which the shell does not recognize as a command or something similar. That's why you get that error message.
For the current directory:
find -type f | wc -l
If you want a breakdown of how many files are in each dir under your current dir:
for i in */ .*/ ; do
    echo -n $i": " ;
    (find "$i" -type f | wc -l) ;
done
That can go all on one line, of course. The parentheses clarify whose output wc -l is supposed to be watching (find "$i" -type f in this case).
On my computer, rsync is a little bit faster than find | wc -l in the accepted answer:
$ rsync --stats --dry-run -ax /path/to/dir /tmp
Number of files: 173076
Number of files transferred: 150481
Total file size: 8414946241 bytes
Total transferred file size: 8414932602 bytes
The second line has the number of files, 150,481 in the above example. As a bonus you get the total size as well (in bytes).
Remarks:
the first line is a count of files, directories, symlinks, etc. all together; that's why it is bigger than the second line.
the --dry-run (or -n for short) option is important to not actually transfer the files!
I used the -x option to "don't cross filesystem boundaries", which means if you execute it for / and you have external hard disks attached, it will only count the files on the root partition.
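To grab just the file count programmatically, a sketch (assuming the 'Number of files transferred:' label shown above; newer rsync versions word it 'Number of regular files transferred:'):
# Print only the value after the 'files transferred' label.
rsync --stats --dry-run -ax /path/to/dir /tmp | awk -F': ' '/files transferred/ {print $2}'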
You can use
$ tree
after installing the tree package with
$ sudo apt-get install tree
(on a Debian / Mint / Ubuntu Linux machine).
The command shows not only the count of the files, but also the count of the directories, separately. The option -L can be used to specify the maximum display level (which, by default, is the maximum depth of the directory tree).
Hidden files can be included too by supplying the -a option.
Since filenames in UNIX may contain newlines (yes, newlines), wc -l might count too many files. I would print a dot for every file and then count the dots:
find DIR_NAME -type f -printf "." | wc -c
Note: the -printf option only works with find from GNU findutils. You may need to install it, on a Mac for example.
Combining several of the answers here together, the most useful solution seems to be:
find . -maxdepth 1 -type d -print0 |
xargs -0 -I {} sh -c 'echo -e $(find "{}" -printf "\n" | wc -l) "{}"' |
sort -n
It can handle odd things like file names that include spaces, parentheses, and even newlines. It also sorts the output by the number of files.
You can increase the number after -maxdepth to get sub directories counted too. Keep in mind that this can potentially take a long time, particularly if you have a highly nested directory structure in combination with a high -maxdepth number.
If you want to know how many files and sub-directories exist from the present working directory you can use this one-liner
find . -maxdepth 1 -type d -print0 | xargs -0 -I {} sh -c 'echo -e $(find {} | wc -l) {}' | sort -n
This will work with the GNU tools; just omit the -e from the echo command on BSD systems (e.g. macOS).
You can use the command ncdu. It will recursively count how many files a Linux directory contains, and it has a progress bar, which is convenient if you have many files. [The original answer included screenshots of ncdu's output.]
To install it on Ubuntu:
sudo apt-get install -y ncdu
Benchmark: I used https://archive.org/details/cv_corpus_v1.tar (380390 files, 11 GB) as the folder where one has to count the number of files.
find . -type f | wc -l: around 1m20s to complete
ncdu: around 1m20s to complete
If what you need is to count a specific file type recursively, you can do:
find YOUR_PATH -name '*.html' -type f | wc -l
-l is just to display the number of lines in the output.
If you need to exclude certain folders, use -not -path
find . -not -path './node_modules/*' -name '*.js' -type f | wc -l
tree $DIR_PATH | tail -1
Sample Output:
5309 directories, 2122 files
If you want to avoid error cases, don't allow wc -l to see files with newlines (which it will count as 2+ files)
e.g. Consider a case where we have a single file with a single EOL character in it
> mkdir emptydir && cd emptydir
> touch $'file with EOL(\n) character in it'
> find -type f
./file with EOL(?) character in it
> find -type f | wc -l
2
Since at least GNU wc does not appear to have an option to read/count a null-terminated list (except from a file), the easiest solution is to not pass it filenames, but a static output each time a file is found, e.g. in the same directory as above
> find -type f -exec printf '\n' \; | wc -l
1
Or if your find supports it
> find -type f -printf '\n' | wc -l
1
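Another newline-proof sketch, assuming your find supports -print0 and your tr understands the '\0' escape (GNU tr does): count the NUL separators instead of lines.
# Each file name ends in a NUL byte; delete everything else and count bytes.
find . -type f -print0 | tr -dc '\0' | wc -c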
To determine how many files there are in the current directory, put in ls -1 | wc -l. This uses wc to do a count of the number of lines (-l) in the output of ls -1. It doesn't count dotfiles. Please note that ls -l (that's an "L" rather than a "1" as in the previous examples) which I used in previous versions of this HOWTO will actually give you a file count one greater than the actual count. Thanks to Kam Nejad for this point.
If you want to count only files and NOT include symbolic links (just an example of what else you could do), you could use ls -l | grep -v ^l | wc -l (that's an "L" not a "1" this time, we want a "long" listing here). grep checks for any line beginning with "l" (indicating a link), and discards that line (-v).
Relative speed: "ls -1 /usr/bin/ | wc -l" takes about 1.03 seconds on an unloaded 486SX25 (/usr/bin/ on this machine has 355 files). "ls -l /usr/bin/ | grep -v ^l | wc -l" takes about 1.19 seconds.
Source: http://www.tldp.org/HOWTO/Bash-Prompt-HOWTO/x700.html
With bash:
Create an array of entries with ( ) and get the count with ${#arr[@]}.
FILES=(./*); echo ${#FILES[@]}
Ok that doesn't recursively count files but I wanted to show the simple option first. A common use case might be for creating rollover backups of a file. This will create logfile.1, logfile.2, logfile.3 etc.
CNT=(./logfile*); mv logfile logfile.${#CNT[@]}
Recursive count with bash 4+ globstar enabled (as mentioned by @tripleee)
FILES=(**/*); echo ${#FILES[@]}
To get the count of files recursively we can still use find in the same way.
FILES=(`find . -type f`); echo ${#FILES[@]}
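Note that the command-substitution version above word-splits, so it miscounts names containing spaces or newlines. A NUL-safe sketch (assuming bash >= 4.4 for mapfile -d ''):
# Read NUL-delimited names into an array, then count its elements.
mapfile -d '' FILES < <(find . -type f -print0); echo "${#FILES[@]}"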
For directories with spaces in the name ... (based on various answers above) -- recursively print directory name with number of files within:
find . -mindepth 1 -type d -print0 | while IFS= read -r -d '' i ; do echo -n $i": " ; ls -p "$i" | grep -v / | wc -l ; done
Example (formatted for readability):
pwd
/mnt/Vancouver/Programming/scripts/claws/corpus
ls -l
total 8
drwxr-xr-x 2 victoria victoria 4096 Mar 28 15:02 'Catabolism - Autophagy; Phagosomes; Mitophagy'
drwxr-xr-x 3 victoria victoria 4096 Mar 29 16:04 'Catabolism - Lysosomes'
ls 'Catabolism - Autophagy; Phagosomes; Mitophagy'/ | wc -l
138
## 2 dir (one with 28 files; other with 1 file):
ls 'Catabolism - Lysosomes'/ | wc -l
29
The directory structure is better visualized using tree:
tree -L 3 -F .
.
├── Catabolism - Autophagy; Phagosomes; Mitophagy/
│   ├── 1
│   ├── 10
│   ├── [ ... SNIP! (138 files, total) ... ]
│   ├── 98
│   └── 99
└── Catabolism - Lysosomes/
    ├── 1
    ├── 10
    ├── [ ... SNIP! (28 files, total) ... ]
    ├── 8
    ├── 9
    └── aaa/
        └── bbb
3 directories, 167 files
man find | grep mindep
-mindepth levels
       Do not apply any tests or actions at levels less than levels
       (a non-negative integer). -mindepth 1 means process all files
       except the starting-points.
ls -p | grep -v / (used below) is from answer 2 at https://unix.stackexchange.com/questions/48492/list-only-regular-files-but-not-directories-in-current-directory
find . -mindepth 1 -type d -print0 | while IFS= read -r -d '' i ; do echo -n $i": " ; ls -p "$i" | grep -v / | wc -l ; done
./Catabolism - Autophagy; Phagosomes; Mitophagy: 138
./Catabolism - Lysosomes: 28
./Catabolism - Lysosomes/aaa: 1
Application: I want to find the max number of files among several hundred directories (all depth = 1) [output below again formatted for readability]:
date; pwd
Fri Mar 29 20:08:08 PDT 2019
/home/victoria/Mail/2_RESEARCH - NEWS
time find . -mindepth 1 -type d -print0 | while IFS= read -r -d '' i ; do echo -n $i": " ; ls -p "$i" | grep -v / | wc -l ; done > ../../aaa
0:00.03
[victoria@victoria 2_RESEARCH - NEWS]$ head -n5 ../../aaa
./RNA - Exosomes: 26
./Cellular Signaling - Receptors: 213
./Catabolism - Autophagy; Phagosomes; Mitophagy: 138
./Stress - Physiological, Cellular - General: 261
./Ancient DNA; Ancient Protein: 34
[victoria@victoria 2_RESEARCH - NEWS]$ sed -r 's/(^.*): ([0-9]{1,8}$)/\2: \1/g' ../../aaa | sort -V | (head; echo ''; tail)
0: ./Genomics - Gene Drive
1: ./Causality; Causal Relationships
1: ./Cloning
1: ./GenMAPP 2
1: ./Pathway Interaction Database
1: ./Wasps
2: ./Cellular Signaling - Ras-MAPK Pathway
2: ./Cell Death - Ferroptosis
2: ./Diet - Apples
2: ./Environment - Waste Management
988: ./Genomics - PPM (Personalized & Precision Medicine)
1113: ./Microbes - Pathogens, Parasites
1418: ./Health - Female
1420: ./Immunity, Inflammation - General
1522: ./Science, Research - Miscellaneous
1797: ./Genomics
1910: ./Neuroscience, Neurobiology
2740: ./Genomics - Functional
3943: ./Cancer
4375: ./Health - Disease
sort -V is a natural sort. ... So, my max number of files in any of those (Claws Mail) directories is 4375 files. If I left-pad (https://stackoverflow.com/a/55409116/1904943) those filenames -- they are all named numerically, starting with 1, in each directory -- and pad to 5 total digits, I should be ok.
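A sketch of that left-padding (my own, under the stated assumption that every name in the directory is purely numeric; mv -n avoids clobbering on a collision):
for f in [0-9]*; do
    p=$(printf '%05d' "$f")            # e.g. 138 -> 00138
    [ "$f" = "$p" ] || mv -n "$f" "$p" # skip names that are already padded
done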
Addendum
Find the total number of files, subdirectories in a directory.
$ date; pwd
Tue 14 May 2019 04:08:31 PM PDT
/home/victoria/Mail/2_RESEARCH - NEWS
$ ls | head; echo; ls | tail
Acoustics
Ageing
Ageing - Calorie (Dietary) Restriction
Ageing - Senescence
Agriculture, Aquaculture, Fisheries
Ancient DNA; Ancient Protein
Anthropology, Archaeology
Ants
Archaeology
ARO-Relevant Literature, News
Transcriptome - CAGE
Transcriptome - FISSEQ
Transcriptome - RNA-seq
Translational Science, Medicine
Transposons
USACEHR-Relevant Literature
Vaccines
Vision, Eyes, Sight
Wasps
Women in Science, Medicine
$ find . -type f | wc -l
70214 ## files
$ find . -type d | wc -l
417 ## subdirectories
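As an aside, both counts can be had in one traversal on GNU find (a sketch): tally the type letter that -printf '%y' emits ('f' for files, 'd' for directories).
# -mindepth 1 skips the starting directory itself.
find . -mindepth 1 -printf '%y\n' | sort | uniq -c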
There are many correct answers here. Here's another!
find . -type f | sort | uniq -w 10 -c
where . is the folder to look in and 10 is the number of leading characters by which to group paths; entries whose first 10 characters match are counted together.
I have written ffcnt to speed up recursive file counting under specific circumstances: rotational disks and filesystems that support extent mapping.
It can be an order of magnitude faster than ls or find based approaches, but YMMV.
Suppose you want per-directory file totals; try:
for d in `find YOUR_SUBDIR_HERE -type d`; do
    printf "$d - files > "
    find $d -type f | wc -l
done
For the current dir, try this:
for d in `find . -type d`; do printf "$d - files > "; find $d -type f | wc -l; done;
If you have long names containing spaces, you need to change IFS, like this:
OIFS=$IFS; IFS=$'\n'
for d in `find . -type d`; do printf "$d - files > "; find $d -type f | wc -l; done
IFS=$OIFS
We can use the tree command; it displays all the files and folders recursively, and it shows the count of folders and files on the last line of its output.
$ tree path/to/folder/
path/to/folder/
├── a-first.html
├── b-second.html
├── subfolder
│   ├── readme.html
│   ├── code.cpp
│   └── code.h
└── z-last-file.html
1 directories, 6 files
To get only the last line of tree's output, we can use the tail command:
$ tree path/to/folder/ | tail -1
1 directories, 6 files
To install tree, we can use the command below
$ sudo apt-get install tree
This alternative approach, filtering on the file extension, counts all available grub kernel modules:
ls -l /boot/grub/*.mod | wc -l
Based on the responses given above and the comments, I've come up with the following file-count listing. It's essentially a combination of the solution provided by @Greg Bell, with comments from @Arch Stanton
& @Schneems
Count all files in the current directory & subdirectories
function countit { find . -maxdepth 1000000 -type d -print0 | while IFS= read -r -d '' i ; do file_count=$(find "$i" -type f | wc -l) ; echo "$file_count: $i" ; done }; countit | sort -n -r >file-count.txt
Count all files of given name in the current directory & subdirectories
function countit { find . -maxdepth 1000000 -type d -print0 | while IFS= read -r -d '' i ; do file_count=$(find "$i" -type f | grep <enter_filename_here> | wc -l) ; echo "$file_count: $i" ; done }; countit | sort -n -r >file-with-name-count.txt
find -type f | wc -l
Or, if the directory is the current directory:
find . -type f | wc -l
This works completely fine, and it's simple and short. If you just want to count the number of files present in a folder (non-recursively):
ls | wc -l
ls -l | grep -e -x -e -dr | wc -l
This produces a long listing, filters for the file and directory lines, and counts the filtered lines.

How to list the size of each file and directory and sort by descending size in Bash?

I found that there is no easy way to get the size of a directory in Bash.
I want something like ls -<some options> that lists files and directories at the same time (each directory with the recursive sum of its file sizes), sorted by size.
Is that possible?
Simply navigate to the directory and run the following command:
du -a --max-depth=1 | sort -n
Or add -h for human-readable sizes and -r to print the bigger directories/files first:
du -a -h --max-depth=1 | sort -hr
Apparently the --max-depth option is not in Mac OS X's version of the du command. You can use the following instead.
du -h -d 1 | sort -n
du -s -- * | sort -n
(this will not show hidden (.dotfiles) files)
Use du -sm for Mb units etc. I always use
du -smc -- * | sort -n
because the total line (-c) will end up at the bottom for obvious reasons :)
PS:
See comments for handling dotfiles
I frequently use e.g. du -smc /home/*/ | sort -n | tail to get a feel for where exactly the large bits are sitting.
Command
du -h --max-depth=0 * | sort -hr
Output
3,5M asdf.6000.gz
3,4M asdf.4000.gz
3,2M asdf.2000.gz
2,5M xyz.PT.gz
136K xyz.6000.gz
116K xyz.6000p.gz
88K test.4000.gz
76K test.4000p.gz
44K test.2000.gz
8,0K desc.common.tcl
8,0K wer.2000p.gz
8,0K wer.2000.gz
4,0K ttree.3
Explanation
du displays "disk usage"
h is for "human readable" (both in sort and in du)
max-depth=0 means du will not show sizes of subfolders (remove that if you want to show all sizes of every file in every sub-, subsub-, ..., folder)
r is for "reverse" (biggest file first)
ncdu
When I came to this question, I wanted to clean up my file system. The command line tool ncdu is way better suited to this task.
Installation on Ubuntu:
$ sudo apt-get install ncdu
Usage:
Just type ncdu [path] in the command line. After a few seconds for analyzing the path, you will see something like this:
$ ncdu 1.11 ~ Use the arrow keys to navigate, press ? for help
--- / ---------------------------------------------------------
. 96,1 GiB [##########] /home
. 17,7 GiB [# ] /usr
. 4,5 GiB [ ] /var
1,1 GiB [ ] /lib
732,1 MiB [ ] /opt
. 275,6 MiB [ ] /boot
198,0 MiB [ ] /storage
. 153,5 MiB [ ] /run
. 16,6 MiB [ ] /etc
13,5 MiB [ ] /bin
11,3 MiB [ ] /sbin
. 8,8 MiB [ ] /tmp
. 2,2 MiB [ ] /dev
! 16,0 KiB [ ] /lost+found
8,0 KiB [ ] /media
8,0 KiB [ ] /snap
4,0 KiB [ ] /lib64
e 4,0 KiB [ ] /srv
! 4,0 KiB [ ] /root
e 4,0 KiB [ ] /mnt
e 4,0 KiB [ ] /cdrom
. 0,0 B [ ] /proc
. 0,0 B [ ] /sys
# 0,0 B [ ] initrd.img.old
# 0,0 B [ ] initrd.img
# 0,0 B [ ] vmlinuz.old
# 0,0 B [ ] vmlinuz
Delete the currently highlighted element with d, exit with CTRL + c
ls -S sorts by size. Then, to show the size too, ls -lS gives a long (-l), sorted by size (-S) display. I usually add -h too, to make things easier to read, so, ls -lhS.
Simple and fast:
find . -mindepth 1 -maxdepth 1 -type d | parallel du -s | sort -n
(Requires GNU Parallel.)
I think I might have figured out what you want to do. This will give a list of all the files and all the directories, sorted by file size and by the size of the content in the directories.
(find . -depth 1 -type f -exec ls -s {} \;; find . -depth 1 -type d -exec du -s {} \;) | sort -n
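Note that -depth with a numeric argument is BSD find syntax; GNU find rejects it. An equivalent sketch for GNU find uses -mindepth/-maxdepth:
(find . -mindepth 1 -maxdepth 1 -type f -exec ls -s {} \; ; find . -mindepth 1 -maxdepth 1 -type d -exec du -s {} \;) | sort -n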
[enhanced version]
This is going to be much faster and more precise than the initial version below, and will output the sum of all the file sizes in the current directory:
echo `find . -type f -exec stat -c %s {} \; | tr '\n' '+' | sed 's/+$//g'` | bc
The stat -c %s command on a file returns its size in bytes. The tr command here is used to overcome xargs command limitations (apparently piping to xargs splits results onto more lines, breaking the logic of my command). Hence tr takes care of replacing the line feeds with + (plus) signs. sed has the only goal of removing the last + sign from the resulting string, to avoid complaints from the final bc (basic calculator) command that, as usual, does the math.
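On GNU findutils, a variant of the same idea (my own sketch, not from the answer above) avoids spawning stat once per file by letting find print each size and awk do the sum:
find . -type f -printf '%s\n' | awk '{s += $1} END {print s}'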
Performance: I tested it on several directories, ~150,000 files at most (the current number of files on my Fedora 15 box), with what I believe is an amazing result:
# time echo `find / -type f -exec stat -c %s {} \; | tr '\n' '+' | sed 's/+$//g'` | bc
12671767700
real 2m19.164s
user 0m2.039s
sys 0m14.850s
Just in case you want to make a comparison with the du -sb / command, it will output an estimated disk usage in bytes (-b option)
# du -sb /
12684646920 /
As I expected, it is a little larger than my command's calculation, because the du utility returns the allocated space of each file and not the actual consumed space.
[initial version]
You cannot use the du command if you need to know the exact total size of your folder, because (per the man page) du estimates file space usage. It will therefore lead you to a wrong result, an approximation (maybe close to the real total, but most likely greater than the actual size you are looking for).
I think there might be different ways to answer your question but this is mine:
ls -l $(find . -type f | xargs) | cut -d" " -f5 | xargs | sed 's/\ /+/g'| bc
It finds all files under the . directory (change . to whatever directory you like), hidden files included, and (using xargs) outputs their names on a single line, then produces a detailed list using ls -l. This (sometimes huge) output is piped to the cut command, and only the fifth field (-f5), the file size in bytes, is kept, then piped again through xargs, which again produces a single line of sizes separated by blanks. Then comes the sed magic, which replaces each blank space with a plus (+) sign, and finally bc (basic calculator) does the math.
It might need additional tuning, and you may have the ls command complaining about an argument list that is too long.
Another simple solution.
$ for entry in $(ls); do du -s "$entry"; done | sort -n
the result will look like
2900 tmp
6781 boot
8428 bin
24932 lib64
34436 sbin
90084 var
106676 etc
125216 lib
3313136 usr
4828700 opt
changing "du -s" to "du -sh" will show human readable size, but we won't be able to sort in this method.
You can use the commands below to list files by size:
du -h | sort -hr | more
or
du -h --max-depth=0 * | sort -hr | more
I tend to use du in a simple way.
du -sh */ | sort -h
(sort -h rather than -n is needed here so the human-readable sizes order correctly.) This provides me with an idea of which directories are consuming the most space. I can then run more precise searches later.
sudo du -hsx 2>/dev/null * | sort -hr | less
4.9G var
2.2G usr
61M root
9.0M etc
6.5M home
824K init
36K run
16K lost+found
4.0K tmp
4.0K srv
4.0K opt
4.0K mnt
4.0K media
4.0K boot
0 sys
0 sbin
0 proc
0 libx32
0 lib64
0 lib32
0 lib
0 dev
0 bin

diff to output only the file names

I'm looking to run a Linux command that will recursively compare two directories and output only the file names of what is different. This includes anything that is present in one directory and not the other or vice versa, and text differences.
From the diff man page:
-q Report only whether the files differ, not the details of the differences.
-r When comparing directories, recursively compare any subdirectories found.
Example command:
diff -qr dir1 dir2
Example output (depends on locale):
$ ls dir1 dir2
dir1:
same-file different only-1
dir2:
same-file different only-2
$ diff -qr dir1 dir2
Files dir1/different and dir2/different differ
Only in dir1: only-1
Only in dir2: only-2
You can also use rsync
rsync -rv --size-only --dry-run /my/source/ /my/dest/ > diff.out
If you want a list of files that are only in one directory (not in their subdirectories), with only their file names:
diff -q /dir1 /dir2 | grep /dir1 | grep -E "^Only in*" | sed -n 's/[^:]*: //p'
If you want to recursively list all the files and directories that are different with their full paths:
diff -rq /dir1 /dir2 | grep -E "^Only in /dir1*" | sed -n 's/://p' | awk '{print $3"/"$4}'
This way you can apply different commands to all the files.
For example I could remove all the files and directories that are in dir1 but not dir2:
diff -rq /dir1 /dir2 | grep -E "^Only in /dir1*" | sed -n 's/://p' | awk '{print $3"/"$4}' | xargs -I {} rm -r {}
The approach of running diff -qr old/ new/ has one major drawback: it may miss files in newly created directories. E.g. in the example below, the file data/pages/playground/playground.txt is not in the output of diff -qr old/ new/, whereas the directory data/pages/playground/ is (compare the two outputs below). I also posted the following solution on Unix & Linux Stack Exchange, but I'll copy it here as well:
To create a list of new or modified files programmatically the best solution I could come up with is using rsync, sort, and uniq:
(rsync -rcn --out-format="%n" old/ new/ && rsync -rcn --out-format="%n" new/ old/) | sort | uniq
Let me explain with this example: we want to compare two dokuwiki releases to see which files were changed and which ones were newly created.
We fetch the tars with wget and extract them into the directories old/ and new/:
wget http://download.dokuwiki.org/src/dokuwiki/dokuwiki-2014-09-29d.tgz
wget http://download.dokuwiki.org/src/dokuwiki/dokuwiki-2014-09-29.tgz
mkdir old && tar xzf dokuwiki-2014-09-29.tgz -C old --strip-components=1
mkdir new && tar xzf dokuwiki-2014-09-29d.tgz -C new --strip-components=1
Running rsync one way might miss newly created files as the comparison of rsync and diff shows here:
rsync -rcn --out-format="%n" old/ new/
yields the following output:
VERSION
doku.php
conf/mime.conf
inc/auth.php
inc/lang/no/lang.php
lib/plugins/acl/remote.php
lib/plugins/authplain/auth.php
lib/plugins/usermanager/admin.php
Running rsync in only one direction misses the newly created files, and the other way round would miss deleted files; compare the output of diff:
diff -qr old/ new/
yields the following output:
Files old/VERSION and new/VERSION differ
Files old/conf/mime.conf and new/conf/mime.conf differ
Only in new/data/pages: playground
Files old/doku.php and new/doku.php differ
Files old/inc/auth.php and new/inc/auth.php differ
Files old/inc/lang/no/lang.php and new/inc/lang/no/lang.php differ
Files old/lib/plugins/acl/remote.php and new/lib/plugins/acl/remote.php differ
Files old/lib/plugins/authplain/auth.php and new/lib/plugins/authplain/auth.php differ
Files old/lib/plugins/usermanager/admin.php and new/lib/plugins/usermanager/admin.php differ
Running rsync both ways and sorting the output to remove duplicates reveals that the directory data/pages/playground/ and the file data/pages/playground/playground.txt were missed initially:
(rsync -rcn --out-format="%n" old/ new/ && rsync -rcn --out-format="%n" new/ old/) | sort | uniq
yields the following output:
VERSION
conf/mime.conf
data/pages/playground/
data/pages/playground/playground.txt
doku.php
inc/auth.php
inc/lang/no/lang.php
lib/plugins/acl/remote.php
lib/plugins/authplain/auth.php
lib/plugins/usermanager/admin.php
rsync is run with these arguments:
-r to "recurse into directories",
-c to also compare files of identical size and only "skip based on checksum, not mod-time & size",
-n to "perform a trial run with no changes made", and
--out-format="%n" to "output updates using the specified FORMAT", which is "%n" here for the file name only
The output (list of files) of rsync in both directions is combined and sorted using sort, and this sorted list is then condensed by removing all duplicates with uniq.
On my Linux system, to get just the filenames:
diff -q /dir1 /dir2 | cut -f2 -d' '
I have a directory.
$ tree dir1
dir1
├── a
│   └── 1.txt
├── b
│   └── 2.txt
└── c
    ├── 3.txt
    ├── 4.txt
    └── d
        └── 5.txt
4 directories, 5 files
I have another directory.
$ tree dir2
dir2
├── a
│   └── 1.txt
├── b
└── c
    ├── 3.txt
    ├── 5.txt
    └── d
        └── 5.txt
4 directories, 4 files
I can diff two directories.
$ diff -u <(cd dir1; find . -type f | sort) <(cd dir2; find . -type f | sort)
--- /dev/fd/11 2022-01-21 20:27:15.000000000 +0900
+++ /dev/fd/12 2022-01-21 20:27:15.000000000 +0900
@@ -1,5 +1,4 @@
./a/1.txt
-./b/2.txt
./c/3.txt
-./c/4.txt
+./c/5.txt
./c/d/5.txt
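If you only want the differing names without diff's header and hunk lines, comm is an alternative (a sketch; it assumes both listings are sorted, as above). Column 1 holds names only in dir1, column 2 names only in dir2, and -3 suppresses the lines common to both:
comm -3 <(cd dir1 && find . -type f | sort) <(cd dir2 && find . -type f | sort)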
rsync -rvc --delete --size-only --dry-run <source dir> <target dir>
