How to replace spaces in file names using a bash script - Linux

Can anyone recommend a safe solution to recursively replace spaces with underscores in file and directory names starting from a given root directory? For example:
$ tree
.
|-- a dir
|   `-- file with spaces.txt
`-- b dir
    |-- another file with spaces.txt
    `-- yet another file with spaces.pdf
becomes:
$ tree
.
|-- a_dir
|   `-- file_with_spaces.txt
`-- b_dir
    |-- another_file_with_spaces.txt
    `-- yet_another_file_with_spaces.pdf

I use:
for f in *\ *; do mv "$f" "${f// /_}"; done
Though it's not recursive, it's quite fast and simple. I'm sure someone here could update it to be recursive.
The ${f// /_} part utilizes bash's parameter expansion mechanism to replace a pattern within a parameter with a supplied string.
The relevant syntax is ${parameter/pattern/string}. See: https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html or http://wiki.bash-hackers.org/syntax/pe .
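For example, a quick sanity check of the expansion in an interactive shell (a minimal illustration, not part of the original answer):
f="file with spaces.txt"
echo "${f// /_}"
# prints: file_with_spaces.txt
The doubled slash (//) replaces every occurrence; a single slash would replace only the first.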

Use rename (aka prename), a Perl script which may already be on your system. Do it in two steps (when given no file arguments, the script reads the names to rename from standard input, which is why piping from find works):
find . -name "* *" -type d | rename 's/ /_/g' # do the directories first
find . -name "* *" -type f | rename 's/ /_/g'
Based on Jürgen's answer and able to handle multiple layers of files and directories in a single bound using the "Revision 1.5 1998/12/18 16:16:31 rmb1" version of /usr/bin/rename (a Perl script):
find /tmp/ -depth -name "* *" -execdir rename 's/ /_/g' "{}" \;

find . -depth -name '* *' \
| while IFS= read -r f ; do mv -i "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done
I failed to get it right at first, because I didn't think of directories.
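To see what one pass of that pipeline does to a single path (a worked example, using the tree from the question):
f='./a dir/file with spaces.txt'
echo "$(dirname "$f")/$(basename "$f" | tr ' ' _)"
# prints: ./a dir/file_with_spaces.txt
Only the last component is renamed each time; the parent directory keeps its spaces until find reaches it, which is why the -depth (children-first) order matters.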

You can use detox by Doug Harple:
detox -r <folder>

A find/rename solution. rename is part of util-linux.
You need to descend depth first, because a whitespace filename can be part of a whitespace directory:
find /tmp/ -depth -name "* *" -execdir rename " " "_" "{}" ";"
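To illustrate why, consider a hypothetical two-level tree. With -depth the renames happen in this order:
mv "./a dir/file with spaces.txt" "./a dir/file_with_spaces.txt"   # child first
mv "./a dir" "./a_dir"                                             # parent last
A top-down traversal would rename "a dir" to "a_dir" first, and the recorded child path "./a dir/file with spaces.txt" would no longer exist.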

You can use this:
find . -depth -name '* *' | while IFS= read -r fname
do
dir=$(dirname "$fname")
new_fname="$dir/$(basename "$fname" | tr ' ' '_')"
if [ -e "$new_fname" ]
then
echo "File $new_fname already exists. Not replacing $fname"
else
echo "Creating new file $new_fname to replace $fname"
mv "$fname" "$new_fname"
fi
done

Bash 4.0:
#!/bin/bash
shopt -s globstar
for file in **/*\ *
do
mv "$file" "${file// /_}"
done
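Note that ** only recurses once globstar is set, and globstar requires Bash 4.0 or newer. A quick check before relying on it (a minimal sketch):
bash -c 'shopt -s globstar 2>/dev/null && echo globstar available'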

A recursive version of Naidim's answer, sorting paths longest-first so that entries are renamed before their parent directories:
find . -name "* *" | awk '{ print length, $0 }' | sort -nr -s | cut -d" " -f2- | while read -r f; do base=$(basename "$f"); newbase="${base// /_}"; mv "$f" "$(dirname "$f")/$newbase"; done

On macOS
Just like the chosen answer:
brew install rename
#
cd <your dir>
find . -name "* *" -type d | rename 's/ /_/g' # do the directories first
find . -name "* *" -type f | rename 's/ /_/g'

For those struggling through this using macOS, first install all the tools:
brew install tree findutils rename
Then, when you need to rename, alias GNU find (gfind) as find and run the code from @Michel Krelin:
alias find=gfind
find . -depth -name '* *' \
| while IFS= read -r f ; do mv -i "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done

Here's a (quite verbose) find -exec solution which writes "file already exists" warnings to stderr:
function trspace() {
declare dir name bname dname newname replace_char
[ $# -lt 1 -o $# -gt 2 ] && { echo "usage: trspace dir char"; return 1; }
dir="${1}"
replace_char="${2:-_}"
find "${dir}" -xdev -depth -name $'*[ \t\r\n\v\f]*' -exec bash -c '
for ((i=1; i<=$#; i++)); do
name="${#:i:1}"
dname="${name%/*}"
bname="${name##*/}"
newname="${dname}/${bname//[[:space:]]/${0}}"
if [[ -e "${newname}" ]]; then
echo "Warning: file already exists: ${newname}" 1>&2
else
mv "${name}" "${newname}"
fi
done
' "${replace_char}" '{}' +
}
trspace rootdir _

This one does a little bit more. I use it to rename my downloaded torrents (removing non-ASCII characters, spaces, multiple dots, etc.).
#!/usr/bin/perl
&rena(`find . -type d`);
&rena(`find . -type f`);
sub rena
{
($elems)=@_;
@t=split /\n/,$elems;
for $e (@t)
{
$_=$e;
# remove ./ of find
s/^\.\///;
# non ascii transliterate
tr [\200-\377][_];
tr [\000-\40][_];
# special characters we do not want in paths
s/[ \-\,\;\?\+\'\"\!\[\]\(\)\@\#]/_/g;
# multiple dots except for extension
while (/\..*\./)
{
s/\./_/;
}
# only one _ consecutive
s/_+/_/g;
next if ($_ eq $e ) or ("./$_" eq $e);
print "$e -> $_\n";
rename ($e,$_);
}
}

An easy alternative to a recursive version is to extend the for loop's glob depth step by step (n times for n sub-levels, irrespective of the number of sub-directories at each level), i.e. from the outermost directory run these:
for f in *; do mv "$f" "${f// /_}"; done
for f in */*; do mv "$f" "${f// /_}"; done
for f in */*/*; do mv "$f" "${f// /_}"; done
To check/understand what's being done, run the following before and after the above steps.
for f in *;do echo "$f";done
for f in */*;do echo "$f";done
for f in */*/*;do echo "$f";done

I found this script somewhere; it may be interesting :)
IFS=$'\n';for f in `find .`; do file=$(echo "$f" | tr '[:blank:]' '_'); [ -e "$f" ] && [ ! -e "$file" ] && mv "$f" "$file";done;unset IFS

Here's a reasonably sized bash script solution
#!/bin/bash
(
IFS=$'\n'
for y in $(ls "$1")
do
mv "$1/$y" "$1/$(echo "$y" | sed 's/ /_/g')"
done
)

This only finds files inside the current directory and renames them. I have this aliased.
find ./ -name "* *" -type f -d 1 | perl -ple '$file = $_; $file =~ s/\s+/_/g; rename($_, $file);'

I just made one for my own purposes.
You may use it as a reference.
#!/bin/bash
cd /vzwhome/c0cheh1/dev_source/UB_14_8
for file in *
do
echo "$file"
cd "/vzwhome/c0cheh1/dev_source/UB_14_8/$file/Configuration/$file"
echo "==> `pwd`"
for subfile in *\ *; do [ -d "$subfile" ] && mv "$subfile" "$(echo "$subfile" | sed -e 's/ /_/g')"; done
ls
cd /vzwhome/c0cheh1/dev_source/UB_14_8
done

For files in a folder named /files:
find /files -name '* *' > /tmp/list
while read -r line
do
mv "$line" "$(echo "$line" | sed 's/ /_/g')"
done < /tmp/list
rm /tmp/list

My solution to the problem is a bash script:
#!/bin/bash
directory=$1
cd "$directory"
while [ "$(find ./ -regex '.* .*' | wc -l)" -gt 0 ];
do filename="$(find ./ -regex '.* .*' | head -n 1)"
mv "$filename" "$(echo "$filename" | sed 's|'" "'|_|g')"
done
Just pass the directory you want the script to work on as an argument when executing it.

Use the command below to replace spaces with underscores in file names as well as directory names.
find -name "* *" -print0 | sort -rz | \
while IFS= read -r -d '' f; do mv -v "$f" "$(dirname "$f")/$(basename "${f// /_}")"; done

If you only need to rename files in a single directory, replacing all spaces, you can use this command with rename.ul:
for i in *' '*; do rename.ul ' ' '_' *; done
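Note that rename.ul replaces only the first occurrence of the string per invocation, so names with several spaces need several passes (the loop above happens to provide one pass per matching file). A more direct way to repeat until no spaces remain, sketched under the assumption of a bash shell (compgen is a bash builtin that succeeds when the glob matches):
while compgen -G '* *' > /dev/null; do rename.ul ' ' '_' *' '*; done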

Actually, there's no need to use the Perl rename script:
find . -depth -name "* *" -execdir bash -c 'mv "$1" "$(echo "$1" | sed "s/ /_/g")"' -- {} \;

Related

Replacing both spaces and " - " in filenames and directorys

I have been reading this discussion, and this: find . -depth -name '* *' | while IFS= read -r f ; do mv -i "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done
helps me delete spaces in file and directory names.
Beat Boy becomes Beat_Boy. This is ok.
What I don't get right is how to deal with this:
Beat Boy - Best of becomes Beat_Boy_-_Best_of while I want it to be Beat_Boy-Best_of.
I would appreciate any hint which way to go...
Regards
You can add sed to substitute "_-_" with "-":
f="Beat Boy - Best of"
echo $f | tr ' ' _ | sed 's/_-_/-/g'
#Beat_Boy-Best_of
In your case, you would want:
find . -depth -name '* *' |
while IFS= read -r f ; do
mv -i "$f" "$(dirname "$f")/$(basename "$f" |
tr ' ' _ |
sed 's/_-_/-/g')" ;
done
Edit
You can also replace tr ' ' _ | sed 's/_-_/-/g' with sed 's/ /_/g ; s/_-_/-/g'.
f="Beat Boy - Best of"
f1=${f// - /-}
f1=${f1// /_}
echo $f1
Beat_Boy-Best_of
This solution just replaces any number of [ ' ' or '-' ] with a single '_'. I assume that's probably what you want.
The find command still only searches for files with spaces in them, but you can change that to suit your needs.
while IFS= read -r f ; do
mv -i "$f" "$(dirname "$f")/$(basename "$f" | sed -re 's/[ -]+/_/g')";
done < <(find . -depth -name '* *')
Full credit: this takes LC-datascientist's solution and replaces the somewhat awkward combination of tr and sed. Even Doyousketch2's comment about sed didn't use the s///g option to make it simpler.


assigning files in a directory to sub-directories

I have thousands of files in a directory and I want to be able to divide them into sub-directories, with each sub-directory containing a specific number of files. I don't care which files go into which directories, just as long as each contains a specific number. All the file names have a common ending (e.g. .txt), but what goes before varies.
Does anyone know an easy way to do this?
Assuming you only have files ending in *.txt, no hidden files and no directories:
#!/bin/bash
shopt -s nullglob
maxf=42
files=( *.txt )
for ((i=0;maxf*i<${#files[@]};++i)); do
s=subdir$i
mkdir -p "$s"
mv -t "$s" -- "${files[#]:i*maxf:maxf}"
done
This will create directories subdirX with X an integer starting from 0, and will put 42 files in each directory.
You can tweak the thing to have padded zeroes for X:
#!/bin/bash
shopt -s nullglob
files=( *.txt )
maxf=42
((l=${#files[@]}/maxf))
p=${#l}
for ((i=0;maxf*i<${#files[@]};++i)); do
printf -v s "subdir%0${p}d" "$i"
mkdir -p "$s"
mv -t "$s" -- "${files[@]:i*maxf:maxf}"
done
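To make the arithmetic concrete, a hypothetical run with 100 *.txt files and maxf=42: l = 100/42 = 2, so p = 1 digit, and the loop creates subdir0 and subdir1 with 42 files each, plus subdir2 with the remaining 16.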
max_per_subdir=1000
start=1
while [ -e $(printf %03d $start) ]; do
start=$((start + 1))
done
find -maxdepth 1 -type f ! -name '.*' -name '*.txt' -print0 \
| xargs -0 -n $max_per_subdir echo \
| while read -r -a files; do
subdir=$(printf %03d $start)
mkdir $subdir || exit 1
mv "${files[#]}" $subdir/ || exit 1
start=$((start + 1))
done
How about
find *.txt -print0 | xargs -0 -n 100 | xargs -I {} echo cp {} '$(md5sum <<< "{}")' | sh
This will create several directories each containing 100 files. The name of each created directory is a md5 hash of the filenames it contains.

Finding the number of files in a directory for all directories in pwd

I am trying to list all directories and place its number of files next to it.
I can find the total number of files with ls -lR | grep .*.mp3 | wc -l. But how can I get an output like this:
dir1 34
dir2 15
dir3 2
...
I don't mind writing to a text file or CSV to get this information if it's not possible to get it on screen.
Thank you all for any help on this.
This seems to work assuming you are in a directory where some subdirectories may contain mp3 files. It omits the top level directory. It will list the directories in order by largest number of contained mp3 files.
find . -mindepth 2 -name \*.mp3 -print0| xargs -0 -n 1 dirname | sort | uniq -c | sort -r | awk '{print $2 "," $1}'
I updated this with print0 to handle filenames with spaces and other tricky characters and to print output suitable for CSV.
find . -type f -iname '*.mp3' -printf "%h\n" | uniq -c
Or, if order (dir-> count instead of count-> dir) is really important to you:
find . -type f -iname '*.mp3' -printf "%h\n" | uniq -c | awk '{print $2" "$1}'
There's probably much better ways, but this seems to work.
Put this in a shell script:
#!/bin/sh
for f in *
do
if [ -d "$f" ]
then
cd "$f"
c=`ls -l *.mp3 2>/dev/null | wc -l`
if test $c -gt 0
then
echo "$f $c"
fi
cd ..
fi
done
With Perl:
perl -MFile::Find -le'
find {
wanted => sub {
return unless /\.mp3$/i;
++$_{$File::Find::dir};
}
}, ".";
print "$_,$_{$_}" for
sort {
$_{$b} <=> $_{$a}
} keys %_;
'
Here's yet another way to handle even file names containing unusual (but legal) characters, such as newlines:
# count .mp3 files (using GNU find)
find . -xdev -type f -iname "*.mp3" -print0 | tr -dc '\0' | wc -c
# list directories with number of .mp3 files
find "$(pwd -P)" -xdev -depth -type d -exec bash -c '
for ((i=1; i<=$#; i++ )); do
d="${#:i:1}"
mp3s="$(find "${d}" -xdev -type f -iname "*.mp3" -print0 | tr -dc "${0}" | wc -c )"
[[ $mp3s -gt 0 ]] && printf "%s\n" "${d}, ${mp3s// /}"
done
' "'\\0'" '{}' +

How to find duplicate files with same name but in different case that exist in same directory in Linux?

How can I return a list of files that are duplicates, i.e. that have the same name but different case, and exist in the same directory?
I don't care about the contents of the files. I just need to know the location and name of any files that have a duplicate of the same name.
Example duplicates:
/www/images/taxi.jpg
/www/images/Taxi.jpg
Ideally I need to search all files recursively from a base directory. In above example it was /www/
The other answer is great, but instead of the "rather monstrous" perl script I suggest:
perl -pe 's!([^/]+)$!lc $1!e'
Which will lowercase just the filename part of the path.
Edit 1: In fact the entire problem can be solved with:
find . | perl -ne 's!([^/]+)$!lc $1!e; print if 1 == $seen{$_}++'
Edit 3: I found a solution using sed, sort and uniq that will also print out the duplicates, but it only works if there is no whitespace in the filenames:
find . |sed 's,\(.*\)/\(.*\)$,\1/\2\t\1/\L\2,'|sort|uniq -D -f 1|cut -f 1
Edit 2: And here is a longer script that will print out the names. It takes a list of paths on stdin, as given by find. Not so elegant, but still:
#!/usr/bin/perl -w
use strict;
use warnings;
my %dup_series_per_dir;
while (<>) {
my ($dir, $file) = m!(.*/)?([^/]+?)$!;
push @{$dup_series_per_dir{$dir||'./'}{lc $file}}, $file;
}
for my $dir (sort keys %dup_series_per_dir) {
my @all_dup_series_in_dir = grep { @{$_} > 1 } values %{$dup_series_per_dir{$dir}};
for my $one_dup_series (@all_dup_series_in_dir) {
print "$dir\{" . join(',', sort @{$one_dup_series}) . "}\n";
}
}
Try:
ls -1 | tr '[A-Z]' '[a-z]' | sort | uniq -c | grep -v " 1 "
Simple, really :-) Aren't pipelines wonderful beasts?
The ls -1 gives you the files one per line, the tr '[A-Z]' '[a-z]' converts all uppercase to lowercase, the sort sorts them (surprisingly enough), uniq -c removes subsequent occurrences of duplicate lines whilst giving you a count as well and, finally, the grep -v " 1 " strips out those lines where the count was one.
When I run this in a directory with one "duplicate" (I copied qq to qQ), I get:
2 qq
For the "this directory and every subdirectory" version, just replace ls -1 with find . or find DIRNAME if you want a specific directory starting point (DIRNAME is the directory name you want to use).
This returns (for me):
2 ./.gconf/system/gstreamer/0.10/audio/profiles/mp3
2 ./.gconf/system/gstreamer/0.10/audio/profiles/mp3/%gconf.xml
2 ./.gnome2/accels/blackjack
2 ./qq
which are caused by:
pax> ls -1d .gnome2/accels/[bB]* .gconf/system/gstreamer/0.10/audio/profiles/[mM]* [qQ]?
.gconf/system/gstreamer/0.10/audio/profiles/mp3
.gconf/system/gstreamer/0.10/audio/profiles/MP3
.gnome2/accels/blackjack
.gnome2/accels/Blackjack
qq
qQ
Update:
Actually, on further reflection, the tr will lowercase all components of the path so that both of
/a/b/c
/a/B/c
will be considered duplicates even though they're in different directories.
If you only want duplicates within a single directory to show as a match, you can use the (rather monstrous):
perl -ne '
chomp;
@flds = split (/\//);
$lstf = $flds[-1];
$lstf =~ tr/A-Z/a-z/;
for ($i =0; $i ne $#flds; $i++) {
print "$flds[$i]/";
};
print "$lstf\n";'
in place of:
tr '[A-Z]' '[a-z]'
What it does is to only lowercase the final portion of the pathname rather than the whole thing. In addition, if you only want regular files (no directories, FIFOs and so forth), use find -type f to restrict what's returned.
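Putting those pieces together (a sketch that combines this answer's pipeline with the shorter perl one-liner suggested in the other answer):
find DIRNAME -type f | perl -pe 's!([^/]+)$!lc $1!e' | sort | uniq -c | grep -v " 1 "
This restricts the scan to regular files and lowercases only the final path component before counting.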
I believe
ls | sort -f | uniq -i -d
is simpler, faster, and will give the same result
Following up on the response of mpez0, to detect recursively just replace "ls" with "find .".
The only problem I see with this is that if a directory is duplicated, then you have 1 entry for each file in that directory. Some human brain is required to process the output.
But anyway, you're not automatically deleting these files, are you?
find . | sort -f | uniq -i -d
This is a nice little command-line app called findsn that you get if you compile fslint (the deb package does not include it).
It will find any files with the same name, it's lightning fast, and it can handle different case.
/findsn --help
find (files) with duplicate or conflicting names.
Usage: findsn [-A -c -C] [[-r] [-f] paths(s) ...]
If no arguments are supplied the $PATH is searched for any redundant
or conflicting files.
-A reports all aliases (soft and hard links) to files.
If no path(s) specified then the $PATH is searched.
If only path(s) specified then they are checked for duplicate named
files. You can qualify this with -C to ignore case in this search.
Qualifying with -c is more restrictive as only files (or directories)
in the same directory whose names differ only in case are reported.
I.E. -c will flag files & directories that will conflict if transferred
to a case insensitive file system. Note if -c or -C specified and
no path(s) specified the current directory is assumed.
Here is an example of how to find all duplicate jar files:
find . -type f -name "*.jar" -printf "%f\n" | sort -f | uniq -i -d
Replace *.jar with whatever duplicate file type you are looking for.
Here's a script that worked for me (I am not the author). The original and discussion can be found here:
http://www.daemonforums.org/showthread.php?t=4661
#! /bin/sh
# find duplicated files in directory tree
# comparing by file NAME, SIZE or MD5 checksum
# --------------------------------------------
# LICENSE(s): BSD / CDDL
# --------------------------------------------
# vermaden [AT] interia [DOT] pl
# http://strony.toya.net.pl/~vermaden/links.htm
__usage() {
echo "usage: $( basename ${0} ) OPTION DIRECTORY"
echo " OPTIONS: -n check by name (fast)"
echo " -s check by size (medium)"
echo " -m check by md5 (slow)"
echo " -N same as '-n' but with delete instructions printed"
echo " -S same as '-s' but with delete instructions printed"
echo " -M same as '-m' but with delete instructions printed"
echo " EXAMPLE: $( basename ${0} ) -s /mnt"
exit 1
}
__prefix() {
case $( id -u ) in
(0) PREFIX="rm -rf" ;;
(*) case $( uname ) in
(SunOS) PREFIX="pfexec rm -rf" ;;
(*) PREFIX="sudo rm -rf" ;;
esac
;;
esac
}
__crossplatform() {
case $( uname ) in
(FreeBSD)
MD5="md5 -r"
STAT="stat -f %z"
;;
(Linux)
MD5="md5sum"
STAT="stat -c %s"
;;
(SunOS)
echo "INFO: supported systems: FreeBSD Linux"
echo
echo "Porting to Solaris/OpenSolaris"
echo " -- provide values for MD5/STAT in '$( basename ${0} ):__crossplatform()'"
echo " -- use digest(1) instead for md5 sum calculation"
echo " $ digest -a md5 file"
echo " -- pfexec(1) is already used in '$( basename ${0} ):__prefix()'"
echo
exit 1
;;
(*)
echo "INFO: supported systems: FreeBSD Linux"
exit 1
;;
esac
}
__md5() {
__crossplatform
:> ${DUPLICATES_FILE}
DATA=$( find "${1}" -type f -exec ${MD5} {} ';' | sort -n )
echo "${DATA}" \
| awk '{print $1}' \
| uniq -c \
| while read LINE
do
COUNT=$( echo ${LINE} | awk '{print $1}' )
[ ${COUNT} -eq 1 ] && continue
SUM=$( echo ${LINE} | awk '{print $2}' )
echo "${DATA}" | grep ${SUM} >> ${DUPLICATES_FILE}
done
echo "${DATA}" \
| awk '{print $1}' \
| sort -n \
| uniq -c \
| while read LINE
do
COUNT=$( echo ${LINE} | awk '{print $1}' )
[ ${COUNT} -eq 1 ] && continue
SUM=$( echo ${LINE} | awk '{print $2}' )
echo "count: ${COUNT} | md5: ${SUM}"
grep ${SUM} ${DUPLICATES_FILE} \
| cut -d ' ' -f 2-10000 2> /dev/null \
| while read LINE
do
if [ -n "${PREFIX}" ]
then
echo " ${PREFIX} \"${LINE}\""
else
echo " ${LINE}"
fi
done
echo
done
rm -rf ${DUPLICATES_FILE}
}
__size() {
__crossplatform
find "${1}" -type f -exec ${STAT} {} ';' \
| sort -n \
| uniq -c \
| while read LINE
do
COUNT=$( echo ${LINE} | awk '{print $1}' )
[ ${COUNT} -eq 1 ] && continue
SIZE=$( echo ${LINE} | awk '{print $2}' )
SIZE_KB=$( echo ${SIZE} / 1024 | bc )
echo "count: ${COUNT} | size: ${SIZE_KB}KB (${SIZE} bytes)"
if [ -n "${PREFIX}" ]
then
find ${1} -type f -size ${SIZE}c -exec echo " ${PREFIX} \"{}\"" ';'
else
# find ${1} -type f -size ${SIZE}c -exec echo " {} " ';' -exec du -h " {}" ';'
find ${1} -type f -size ${SIZE}c -exec echo " {} " ';'
fi
echo
done
}
__file() {
__crossplatform
find "${1}" -type f \
| xargs -n 1 basename 2> /dev/null \
| tr '[A-Z]' '[a-z]' \
| sort -n \
| uniq -c \
| sort -n -r \
| while read LINE
do
COUNT=$( echo ${LINE} | awk '{print $1}' )
[ ${COUNT} -eq 1 ] && break
FILE=$( echo ${LINE} | cut -d ' ' -f 2-10000 2> /dev/null )
echo "count: ${COUNT} | file: ${FILE}"
FILE=$( echo ${FILE} | sed -e s/'\['/'\\\['/g -e s/'\]'/'\\\]'/g )
if [ -n "${PREFIX}" ]
then
find ${1} -iname "${FILE}" -exec echo " ${PREFIX} \"{}\"" ';'
else
find ${1} -iname "${FILE}" -exec echo " {}" ';'
fi
echo
done
}
# main()
[ ${#} -ne 2 ] && __usage
[ ! -d "${2}" ] && __usage
DUPLICATES_FILE="/tmp/$( basename ${0} )_DUPLICATES_FILE.tmp"
case ${1} in
(-n) __file "${2}" ;;
(-m) __md5 "${2}" ;;
(-s) __size "${2}" ;;
(-N) __prefix; __file "${2}" ;;
(-M) __prefix; __md5 "${2}" ;;
(-S) __prefix; __size "${2}" ;;
(*) __usage ;;
esac
If the find command is not working for you, you may have to change it. For example
OLD : find "${1}" -type f | xargs -n 1 basename
NEW : find "${1}" -type f -printf "%f\n"
You can use:
find -type f -exec readlink -m {} \; | gawk 'BEGIN{FS="/";OFS="/"}{$NF=tolower($NF);print}' | uniq -c
Where:
find -type f
recursively prints each file's full path.
-exec readlink -m {} \;
resolves it to an absolute path.
gawk 'BEGIN{FS="/";OFS="/"}{$NF=tolower($NF);print}'
lowercases only the filename (the last path component).
uniq -c
deduplicates the paths; -c prefixes each line with the count, so duplicates show a count greater than 1.
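Since uniq -c only collapses adjacent lines, adding a sort first makes the counts reliable, and a final filter keeps just the duplicated names (a hedged refinement of the pipeline above):
find -type f -exec readlink -m {} \; | gawk 'BEGIN{FS="/";OFS="/"}{$NF=tolower($NF);print}' | sort | uniq -c | awk '$1 > 1'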
Little bit late to this one, but here's the version I went with:
find . -type f | awk -F/ '{print $NF}' | sort -f | uniq -i -d
Here we are using:
find - find all files under the current dir
awk - remove the file path part of the filename
sort - sort case insensitively
uniq - find the dupes from what makes it through the pipe
(Inspired by @mpez0's answer, and @SimonDowdles' comment on @paxdiablo's answer.)
You can check duplicates in a given directory with GNU awk:
gawk 'BEGINFILE {if ((seen[tolower(FILENAME)]++)) print FILENAME; nextfile}' *
This uses BEGINFILE to perform some action before going on and reading a file. In this case, it keeps track of the names that have appeared in an array seen[] whose indexes are the names of the files in lowercase.
If a name has already appeared, no matter its case, it prints it. Otherwise, it just jumps to the next file.
See an example:
$ tree
.
├── bye.txt
├── hello.txt
├── helLo.txt
├── yeah.txt
└── YEAH.txt
0 directories, 5 files
$ gawk 'BEGINFILE {if ((a[tolower(FILENAME)]++)) print FILENAME; nextfile}' *
helLo.txt
YEAH.txt
I just used fdupes on CentOS to clean up a whole buncha duplicate files...
yum install fdupes
