I need a bash script that goes through a text file, finds lines starting with "Common subdirectories: ", and removes (rm -rf) the two subdirectories. Example line:
Common subdirectories: /dir1/dirA and /dir1/dirB
I'm quite new to bash scripting so any help would be great.
grep 'Common subdirectories: ' < in.txt |\
cut -d: -f2 | cut -d" " -f2,4 |\
while read a b
do
rm -rf "$a" "$b"
done
Edit: added quoting and used a single rm command for both.
A more succinct version:
awk '/^Common subdirectories:/{ system("rm -rf "$3" "$5) }' input.txt
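Since system() splices the fields straight into a shell command, it can be worth doing a dry run first that only prints what would be removed; a minimal sketch:
awk '/^Common subdirectories:/ { print "rm -rf " $3 " " $5 }' input.txt
# review the output, then pipe it to sh (or switch print back to system) once it looks right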
Here's a more complete example:
for F in `grep 'Common subdirectories' input.txt | cut -d: -f2 | awk 'BEGIN{RS=" "}{ print }' | tr -d ' '`
do
[ -d "$F" ] && rm -rf $F
done
A bit shorter command:
awk '/Common subdirectories:/ { print $3 " " $5 }' in.txt | xargs -n1 rm -rf
I have these files. Imagine that each "test" represents the name of one server:
test10.txt
test11.txt
test12.txt
test13.txt
test14.txt
test15.txt
test16.txt
test17.txt
test18.txt
test19.txt
test1.txt
test20.txt
test21.txt
test22.txt
test23.txt
test24.txt
test25.txt
test26.txt
test27.txt
test28.txt
test29.txt
test2.txt
test30.txt
test31.txt
test32.txt
test33.txt
test34.txt
test35.txt
test36.txt
test37.txt
test38.txt
test39.txt
test3.txt
test40.txt
test4.txt
test5.txt
test6.txt
test7.txt
test8.txt
test9.txt
In each txt file, I have this type of data:
2019-10-14-00-00;/dev/hd1;1024.00;136.37;/
2019-10-14-00-00;/dev/hd2;5248.00;4230.53;/usr
2019-10-14-00-00;/dev/hd3;2560.00;481.66;/var
2019-10-14-00-00;/dev/hd4;3584.00;67.65;/tmp
2019-10-14-00-00;/dev/hd5;256.00;26.13;/home
2019-10-14-00-00;/dev/hd1;1024.00;476.04;/opt
2019-10-14-00-00;/dev/hd5;384.00;0.38;/usr/xxx
2019-10-14-00-00;/dev/hd4;256.00;21.39;/xxx
2019-10-14-00-00;/dev/hd2;512.00;216.84;/opt
2019-10-14-00-00;/dev/hd3;128.00;21.46;/var/
2019-10-14-00-00;/dev/hd8;256.00;75.21;/usr/
2019-10-14-00-00;/dev/hd7;384.00;186.87;/var/
2019-10-14-00-00;/dev/hd6;256.00;0.63;/var/
2019-10-14-00-00;/dev/hd1;128.00;0.37;/admin
2019-10-14-00-00;/dev/hd4;256.00;179.14;/opt/
2019-10-14-00-00;/dev/hd3;2176.00;492.93;/opt/
2019-10-14-00-00;/dev/hd1;256.00;114.83;/opt/
2019-10-14-00-00;/dev/hd9;256.00;41.73;/var/
2019-10-14-00-00;/dev/hd1;3200.00;954.28;/var/
2019-10-14-00-00;/dev/hd10;256.00;0.93;/var/
2019-10-14-00-00;/dev/hd10;64.00;1.33;/
2019-10-14-00-00;/dev/hd2;1664.00;501.64;/opt/
2019-10-14-00-00;/dev/hd4;256.00;112.32;/opt/
2019-10-14-00-00;/dev/hd9;2176.00;1223.1;/opt/
2019-10-14-00-00;/dev/hd11;22784.00;12325.8;/opt/
2019-10-14-00-00;/dev/hd12;256.00;2.36;/
2019-10-14-06-00;/dev/hd12;1024.00;137.18;/
2019-10-14-06-00;/dev/hd1;256.00;2.36;/
2019-10-14-00-00;/dev/hd1;1024.00;136.37;/
2019-10-14-00-00;/dev/hd2;5248.00;4230.53;/usr
2019-10-14-00-00;/dev/hd3;2560.00;481.66;/var
2019-10-14-00-00;/dev/hd4;3584.00;67.65;/tmp
2019-10-14-00-00;/dev/hd5;256.00;26.13;/home
2019-10-14-00-00;/dev/hd1;1024.00;476.04;/opt
2019-10-14-00-00;/dev/hd5;384.00;0.38;/usr/xxx
2019-10-14-00-00;/dev/hd4;256.00;21.39;/xxx
2019-10-14-00-00;/dev/hd2;512.00;216.84;/opt
2019-10-14-00-00;/dev/hd3;128.00;21.46;/var/
2019-10-14-00-00;/dev/hd8;256.00;75.21;/usr/
2019-10-14-00-00;/dev/hd7;384.00;186.87;/var/
2019-10-14-00-00;/dev/hd6;256.00;0.63;/var/
2019-10-14-00-00;/dev/hd1;128.00;0.37;/admin
2019-10-14-00-00;/dev/hd4;256.00;179.14;/opt/
2019-10-14-00-00;/dev/hd3;2176.00;492.93;/opt/
2019-10-14-00-00;/dev/hd1;256.00;114.83;/opt/
2019-10-14-00-00;/dev/hd9;256.00;41.73;/var/
2019-10-14-00-00;/dev/hd1;3200.00;954.28;/var/
2019-10-14-00-00;/dev/hd10;256.00;0.93;/var/
2019-10-14-00-00;/dev/hd10;64.00;1.33;/
2019-10-14-00-00;/dev/hd2;1664.00;501.64;/opt/
2019-10-14-00-00;/dev/hd4;256.00;112.32;/opt/
I would like to create a directory for each server, create in each directory a txt file for each FS, and put in each of these txt files the lines that correspond to that FS.
For that, I've tried this loop:
#!/bin/bash
directory=(ls *.txt | cut -d'.' -f1)
for d in $directory
do
if [ ! -d $d ]
then
mkdir $d
fi
done
for i in $(cat *.txt)
do
file=$(echo $i | awk -F';' '{print $2}' | sort | uniq | cut -d'/' -f3 )
data=$(echo $i | awk -F';' '{print $2}' )
echo $i | grep -w $data >> /xx/xx/xx/xx/xx/${directory/${file}.txt
done
But this loop doesn't work properly. The directories are created but not the files inside each directory.
I would like something like:
test1/hd1.txt (with each line for the hd1 FS in hd1.txt)
And the same thing for each server.
Can you show me how to do that?
#!/bin/bash
for src in *.txt; do
# start a subshell so we don't need to cd back afterwards
# make "$src" be stdin before cd, so we don't need full path
# be careful that in subshell only awk reads from stdin
(
# extract server name to use as directory
dir=/xx/xx/xx/xx/xx/"${src%.txt}"
# chain with "&&" so failures don't cause bad files
mkdir -p "$dir" &&
cd "$dir" &&
awk -F \; '{ split($2, dev, "/"); print > (dev[3] ".txt") }'
) < "$src"
done
The awk script reads lines whose fields are delimited by semicolons.
It splits the second field on slashes to extract the device name (the assumption is that the devices always have the form /dev/name).
Finally, the > sends output to the relevant file.
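For example, assuming test1.txt contains the sample data shown above, the result would look something like:
$ ls /xx/xx/xx/xx/xx/test1
hd1.txt  hd10.txt  hd11.txt  hd12.txt  hd2.txt  hd3.txt  hd4.txt  hd5.txt  hd6.txt  hd7.txt  hd8.txt  hd9.txt
$ head -2 /xx/xx/xx/xx/xx/test1/hd1.txt
2019-10-14-00-00;/dev/hd1;1024.00;136.37;/
2019-10-14-00-00;/dev/hd1;1024.00;476.04;/opt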
For reference, you can make your own script work by using directory=$(...); adding the prefix to mkdir (assuming the prefix directories already exist); closing the brace in ${directory}; and quoting all variable references for safety:
#!/bin/bash
directory=$(ls *.txt | cut -d'.' -f1)
for d in "$directory"
do
if [ ! -d "$d" ]
then
mkdir /xx/xx/xx/xx/xx/"$d"
fi
done
for i in $(cat *.txt)
do
file=$(echo "$i" | awk -F';' '{print $2}' | sort | uniq | cut -d'/' -f3 )
data=$(echo "$i" | awk -F';' '{print $2}' )
echo "$i" | grep -w "$data" >> /xx/xx/xx/xx/xx/"${directory}"/"${file}".txt
done
for file in `ls *.txt`
do
echo ${file}
directory=`echo ${file} | cut -d'.' -f1`
#echo ${directory}
if [ ! -d ${directory} ]
then
mkdir ${directory}
fi
FS=`cat ${file} | awk -F';' '{print $2}' | sort | uniq | cut -d'/' -f3`
#echo $FS
for f in $FS
do
cat ${file} |grep -w -e $f > ${directory}/${f}.txt
done
done
Explanation:
For each file in the current directory, the outer for loop will run.
For the selected file, a corresponding directory is created first.
Next, using the FS variable, we collect all the file systems present in that file.
Finally, an inner loop greps for each FS and writes a separate file for it into the directory.
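As an illustration, assuming test1.txt holds the sample data shown earlier, the FS variable would expand to the device names:
$ cat test1.txt | awk -F';' '{print $2}' | sort | uniq | cut -d'/' -f3
hd1
hd10
hd11
hd12
hd2
hd3
hd4
hd5
hd6
hd7
hd8
hd9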
This command works, but I want to run it on every document (input.txt) in every subdirectory.
tr -d '\n' < input.txt | awk '{gsub(/\. /,".\n");print}' | grep "\[" >> SingleOutput.txt
The code takes the input file and divides it into sentences with new lines. Then it finds all the sentences that contain a "[" and outputs those sentences to a single file.
I tried several looping techniques with find and for loops, but couldn't get it to run in this example. I tried
for dir in ./*; do
(cd "$dir" && tr -d '\n' < $dir | awk '{gsub(/\. /,".\n");print}' | grep “\[" >> /home/dan/SingleOutput.txt);
done;
and also
find ./ -execdir tr -d '\n' < . | awk '{gsub(/\. /,".\n");print}' | grep "\[" >> /home/dan/SingleOutput.txt;
but they didn't execute, just giving me > continuation prompts. Any ideas?
Try this:
cd "$dir"
find ./ | grep "input.txt$" | while read file; do tr -d '\n' < "$file" | awk '{gsub(/\. /,".\n");print}' | grep "\[" >> SingleOutput.txt; done
This will find all files called input.txt under $dir, then it will perform what you say is already working, sending the output to $dir/SingleOutput.txt.
Why not just something like this?
cat */input.txt | tr -d '\n' | awk '{gsub(/\. /,".\n");print}' | grep "\[" >> SingleOutput.txt
Or are you interested in keeping the output for each input.txt separate?
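If keeping them separate is the goal, one sketch (writing an output.txt next to each input.txt is just an assumed naming choice) would be:
find . -type f -name input.txt | while read -r file; do
    dir=$(dirname "$file")
    tr -d '\n' < "$file" | awk '{gsub(/\. /,".\n");print}' | grep "\[" > "$dir/output.txt"
done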
How can I rename files with titles like "Stargate SG-1 Season 01 Episode 01" to just "s01e01"? With variable numbering, of course.
I already have something like this:
for file in *.mkv; do mv "$file" "$(echo "$file" | sed -e "REGEX HERE")"; done
I just need the sed command that does what I need.
Thanks
No need for sed, try this:
#!/bin/bash
for f in *.mkv;
do
set -- $f
mv "$f" s${4}e${6}
done
in action:
$ ls
Stargate SG-1 Season 01 Episode 01.mkv
$ ./l.sh
$ ls
s01e01.mkv
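If the filenames are not guaranteed to have exactly that word layout, a sketch using bash's own regex matching (assuming the "Season NN Episode NN" text is always present; mv -n avoids clobbering an existing target) would be:
for f in *.mkv; do
    # capture the season and episode numbers, whatever words surround them
    if [[ $f =~ Season\ ([0-9]+)\ Episode\ ([0-9]+) ]]; then
        mv -n "$f" "s${BASH_REMATCH[1]}e${BASH_REMATCH[2]}.mkv"
    fi
done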
GNU sed
for file in *.mkv; do mv "$file" "$(echo "$file" | sed -e 's/.*\s\(\S\+\)\s\+\S\+\s\(\S\+\)$/s\1e\2/')"; done
Awk is also good for this
for file in *.mkv; do
mv "$file" $(awk '{print "s", $4, "e", $6}' <<<$file).mkv
done
I think that this is not a problem for sed :)
I would go this way to rename all *.mkv files:
ls *.mkv | awk '{print "mv \"" $0 "\" s" $4 "e" $6}' | sh
or
ls *.mkv | awk '{print "\"" $0 "\" s" $4 "e" $6}' | xargs -n 2 mv
I need to merge two files with a Bash script.
File_1.txt
TEXT01 TEXT02 TEXT03 TEXT04
TEXT05 TEXT06 TEXT07 TEXT08
TEXT09 TEXT10 TEXT11 TEXT12
File_2.txt
1993.0
1994.0
1995.0
Result.txt
TEXT01 TEXT02 1993.0 TEXT03 TEXT04
TEXT05 TEXT06 1994.0 TEXT07 TEXT08
TEXT09 TEXT10 1995.0 TEXT11 TEXT12
File_2.txt needs to be merged in at this specific position. I have tried different solutions with multiple do/while loops, but they have not worked so far.
awk '{
getline s3 < "file1"
printf "%s %s %s ",$1,$2,s3
for(i=3;i<=NF;i++){
printf "%s ",$i
}
print ""
}END{close("file1")}' file
output
# more file
TEXT01 TEXT02 TEXT03 TEXT04
TEXT05 TEXT06 TEXT07 TEXT08
TEXT09 TEXT10 TEXT11 TEXT12
$ more file1
1993.0
1994.0
1995.0
$ ./shell.sh
TEXT01 TEXT02 1993.0 TEXT03 TEXT04
TEXT05 TEXT06 1994.0 TEXT07 TEXT08
TEXT09 TEXT10 1995.0 TEXT11 TEXT12
Why, use cut and paste, of course! Give this a try:
paste -d" " <(cut -d" " -f 1-2 File_1.txt) File_2.txt <(cut -d" " -f 3-4 File_1.txt)
This was inspired by Dennis Williamson's answer, so if you like it give that a +1 too!
paste test1.txt test2.txt | awk '{print $1,$2,$5,$3,$4}'
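To see why those field numbers work, look at the intermediate paste output (tab-joined); with the question's data in test1.txt and test2.txt it would be:
$ paste test1.txt test2.txt
TEXT01 TEXT02 TEXT03 TEXT04	1993.0
TEXT05 TEXT06 TEXT07 TEXT08	1994.0
TEXT09 TEXT10 TEXT11 TEXT12	1995.0
awk's default field splitting treats the tab just like the spaces, so the year lands in $5 and is printed third.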
This is a solution without awk.
The interesting part is how to use file descriptors in bash.
#!/bin/bash
exec 5<test2.txt # open file descriptor 5
cat test1.txt | while read ln
do
read ln2 <&5
#change this three lines as you wish:
echo -n "$(echo $ln | cut -d ' ' -f 1-2) "
echo -n "$ln2 "
echo $ln | cut -d ' ' -f 3-4
done
exec 5>&- # Close fd 5
Since the question was tagged with 'sed', here's a variant of Vereb's answer using sed instead of awk:
paste File_1.txt File_2.txt | sed -r 's/( [^ ]* [^ ]*)\t(.*)/ \2\1/'
Or in pure sed ... :D
sed -r '/ /{H;d};G;s/^([^\n]*)\n*([^ ]* [^ ]*)/\2 \1/;P;s/^[^\n]*\n//;x;d' File_1.txt File_2.txt
Using perl, give file1 and file2 as arguments to:
#!/usr/local/bin/perl
open(TXT2, pop(@ARGV));
while (<>) {
chop($m = <TXT2>);
s/^((\w+\s+){2})/$1$m /;
print;
}
How can I return a list of files that are name duplicates, i.e. have the same name but in different case, and exist in the same directory?
I don't care about the contents of the files. I just need to know the location and name of any files that have a duplicate of the same name.
Example duplicates:
/www/images/taxi.jpg
/www/images/Taxi.jpg
Ideally I need to search all files recursively from a base directory. In the above example it was /www/.
The other answer is great, but instead of the "rather monstrous" Perl script I suggest
perl -pe 's!([^/]+)$!lc $1!e'
Which will lowercase just the filename part of the path.
Edit 1: In fact the entire problem can be solved with:
find . | perl -ne 's!([^/]+)$!lc $1!e; print if 1 == $seen{$_}++'
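Against the example from the question (a sketch, assuming only the two taxi files differ in case), it prints the lowercased path the second time a case-variant of it appears:
$ find /www | perl -ne 's!([^/]+)$!lc $1!e; print if 1 == $seen{$_}++'
/www/images/taxi.jpg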
Edit 3: I found a solution using sed, sort and uniq that will also print out the duplicates, but it only works if there is no whitespace in the filenames:
find . |sed 's,\(.*\)/\(.*\)$,\1/\2\t\1/\L\2,'|sort|uniq -D -f 1|cut -f 1
Edit 2: And here is a longer script that will print out the names. It takes a list of paths on stdin, as given by find. Not so elegant, but still:
#!/usr/bin/perl -w
use strict;
use warnings;
my %dup_series_per_dir;
while (<>) {
my ($dir, $file) = m!(.*/)?([^/]+?)$!;
push @{$dup_series_per_dir{$dir||'./'}{lc $file}}, $file;
}
for my $dir (sort keys %dup_series_per_dir) {
my @all_dup_series_in_dir = grep { @{$_} > 1 } values %{$dup_series_per_dir{$dir}};
for my $one_dup_series (@all_dup_series_in_dir) {
print "$dir\{" . join(',', sort @{$one_dup_series}) . "}\n";
}
}
Try:
ls -1 | tr '[A-Z]' '[a-z]' | sort | uniq -c | grep -v " 1 "
Simple, really :-) Aren't pipelines wonderful beasts?
The ls -1 gives you the files one per line, the tr '[A-Z]' '[a-z]' converts all uppercase to lowercase, the sort sorts them (surprisingly enough), uniq -c removes subsequent occurrences of duplicate lines whilst giving you a count as well and, finally, the grep -v " 1 " strips out those lines where the count was one.
When I run this in a directory with one "duplicate" (I copied qq to qQ), I get:
2 qq
For the "this directory and every subdirectory" version, just replace ls -1 with find . or find DIRNAME if you want a specific directory starting point (DIRNAME is the directory name you want to use).
This returns (for me):
2 ./.gconf/system/gstreamer/0.10/audio/profiles/mp3
2 ./.gconf/system/gstreamer/0.10/audio/profiles/mp3/%gconf.xml
2 ./.gnome2/accels/blackjack
2 ./qq
which are caused by:
pax> ls -1d .gnome2/accels/[bB]* .gconf/system/gstreamer/0.10/audio/profiles/[mM]* [qQ]?
.gconf/system/gstreamer/0.10/audio/profiles/mp3
.gconf/system/gstreamer/0.10/audio/profiles/MP3
.gnome2/accels/blackjack
.gnome2/accels/Blackjack
qq
qQ
Update:
Actually, on further reflection, the tr will lowercase all components of the path so that both of
/a/b/c
/a/B/c
will be considered duplicates even though they're in different directories.
If you only want duplicates within a single directory to show as a match, you can use the (rather monstrous):
perl -ne '
chomp;
@flds = split (/\//);
$lstf = $flds[-1];
$lstf =~ tr/A-Z/a-z/;
for ($i = 0; $i < $#flds; $i++) {
print "$flds[$i]/";
};
print "$lstf\n";'
in place of:
tr '[A-Z]' '[a-z]'
What it does is to only lowercase the final portion of the pathname rather than the whole thing. In addition, if you only want regular files (no directories, FIFOs and so forth), use find -type f to restrict what's returned.
I believe
ls | sort -f | uniq -i -d
is simpler, faster, and will give the same result
Following up on the response of mpez0, to detect recursively just replace "ls" by "find .".
The only problem I see with this is that if a directory itself is duplicated, you get one entry for every file inside it, so some human judgement is needed to interpret the output.
But anyway, you're not automatically deleting these files, are you?
find . | sort -f | uniq -i -d
There is a nice little command-line app called findsn that you get if you compile fslint (the deb package does not include it).
It will find any files with the same name; it's lightning fast and it can handle different case.
/findsn --help
find (files) with duplicate or conflicting names.
Usage: findsn [-A -c -C] [[-r] [-f] paths(s) ...]
If no arguments are supplied the $PATH is searched for any redundant
or conflicting files.
-A reports all aliases (soft and hard links) to files.
If no path(s) specified then the $PATH is searched.
If only path(s) specified then they are checked for duplicate named
files. You can qualify this with -C to ignore case in this search.
Qualifying with -c is more restrictive as only files (or directories)
in the same directory whose names differ only in case are reported.
I.E. -c will flag files & directories that will conflict if transfered
to a case insensitive file system. Note if -c or -C specified and
no path(s) specified the current directory is assumed.
Here is an example of how to find all duplicate jar files:
find . -type f -name "*.jar" -printf "%f\n" | sort -f | uniq -i -d
Replace *.jar with whatever duplicate file type you are looking for.
Here's a script that worked for me (I am not the author). The original and discussion can be found here:
http://www.daemonforums.org/showthread.php?t=4661
#! /bin/sh
# find duplicated files in directory tree
# comparing by file NAME, SIZE or MD5 checksum
# --------------------------------------------
# LICENSE(s): BSD / CDDL
# --------------------------------------------
# vermaden [AT] interia [DOT] pl
# http://strony.toya.net.pl/~vermaden/links.htm
__usage() {
echo "usage: $( basename ${0} ) OPTION DIRECTORY"
echo " OPTIONS: -n check by name (fast)"
echo " -s check by size (medium)"
echo " -m check by md5 (slow)"
echo " -N same as '-n' but with delete instructions printed"
echo " -S same as '-s' but with delete instructions printed"
echo " -M same as '-m' but with delete instructions printed"
echo " EXAMPLE: $( basename ${0} ) -s /mnt"
exit 1
}
__prefix() {
case $( id -u ) in
(0) PREFIX="rm -rf" ;;
(*) case $( uname ) in
(SunOS) PREFIX="pfexec rm -rf" ;;
(*) PREFIX="sudo rm -rf" ;;
esac
;;
esac
}
__crossplatform() {
case $( uname ) in
(FreeBSD)
MD5="md5 -r"
STAT="stat -f %z"
;;
(Linux)
MD5="md5sum"
STAT="stat -c %s"
;;
(SunOS)
echo "INFO: supported systems: FreeBSD Linux"
echo
echo "Porting to Solaris/OpenSolaris"
echo " -- provide values for MD5/STAT in '$( basename ${0} ):__crossplatform()'"
echo " -- use digest(1) instead for md5 sum calculation"
echo " $ digest -a md5 file"
echo " -- pfexec(1) is already used in '$( basename ${0} ):__prefix()'"
echo
exit 1
;;
(*)
echo "INFO: supported systems: FreeBSD Linux"
exit 1
;;
esac
}
__md5() {
__crossplatform
:> ${DUPLICATES_FILE}
DATA=$( find "${1}" -type f -exec ${MD5} {} ';' | sort -n )
echo "${DATA}" \
| awk '{print $1}' \
| uniq -c \
| while read LINE
do
COUNT=$( echo ${LINE} | awk '{print $1}' )
[ ${COUNT} -eq 1 ] && continue
SUM=$( echo ${LINE} | awk '{print $2}' )
echo "${DATA}" | grep ${SUM} >> ${DUPLICATES_FILE}
done
echo "${DATA}" \
| awk '{print $1}' \
| sort -n \
| uniq -c \
| while read LINE
do
COUNT=$( echo ${LINE} | awk '{print $1}' )
[ ${COUNT} -eq 1 ] && continue
SUM=$( echo ${LINE} | awk '{print $2}' )
echo "count: ${COUNT} | md5: ${SUM}"
grep ${SUM} ${DUPLICATES_FILE} \
| cut -d ' ' -f 2-10000 2> /dev/null \
| while read LINE
do
if [ -n "${PREFIX}" ]
then
echo " ${PREFIX} \"${LINE}\""
else
echo " ${LINE}"
fi
done
echo
done
rm -rf ${DUPLICATES_FILE}
}
__size() {
__crossplatform
find "${1}" -type f -exec ${STAT} {} ';' \
| sort -n \
| uniq -c \
| while read LINE
do
COUNT=$( echo ${LINE} | awk '{print $1}' )
[ ${COUNT} -eq 1 ] && continue
SIZE=$( echo ${LINE} | awk '{print $2}' )
SIZE_KB=$( echo ${SIZE} / 1024 | bc )
echo "count: ${COUNT} | size: ${SIZE_KB}KB (${SIZE} bytes)"
if [ -n "${PREFIX}" ]
then
find ${1} -type f -size ${SIZE}c -exec echo " ${PREFIX} \"{}\"" ';'
else
# find ${1} -type f -size ${SIZE}c -exec echo " {} " ';' -exec du -h " {}" ';'
find ${1} -type f -size ${SIZE}c -exec echo " {} " ';'
fi
echo
done
}
__file() {
__crossplatform
find "${1}" -type f \
| xargs -n 1 basename 2> /dev/null \
| tr '[A-Z]' '[a-z]' \
| sort -n \
| uniq -c \
| sort -n -r \
| while read LINE
do
COUNT=$( echo ${LINE} | awk '{print $1}' )
[ ${COUNT} -eq 1 ] && break
FILE=$( echo ${LINE} | cut -d ' ' -f 2-10000 2> /dev/null )
echo "count: ${COUNT} | file: ${FILE}"
FILE=$( echo ${FILE} | sed -e s/'\['/'\\\['/g -e s/'\]'/'\\\]'/g )
if [ -n "${PREFIX}" ]
then
find ${1} -iname "${FILE}" -exec echo " ${PREFIX} \"{}\"" ';'
else
find ${1} -iname "${FILE}" -exec echo " {}" ';'
fi
echo
done
}
# main()
[ ${#} -ne 2 ] && __usage
[ ! -d "${2}" ] && __usage
DUPLICATES_FILE="/tmp/$( basename ${0} )_DUPLICATES_FILE.tmp"
case ${1} in
(-n) __file "${2}" ;;
(-m) __md5 "${2}" ;;
(-s) __size "${2}" ;;
(-N) __prefix; __file "${2}" ;;
(-M) __prefix; __md5 "${2}" ;;
(-S) __prefix; __size "${2}" ;;
(*) __usage ;;
esac
If the find command is not working for you, you may have to change it. For example
OLD : find "${1}" -type f | xargs -n 1 basename
NEW : find "${1}" -type f -printf "%f\n"
You can use:
find -type f -exec readlink -m {} \; | gawk 'BEGIN{FS="/";OFS="/"}{$NF=tolower($NF);print}' | sort | uniq -c
Where:
find -type f
recursively prints every file's path.
-exec readlink -m {} \;
turns each path into an absolute, canonical path.
gawk 'BEGIN{FS="/";OFS="/"}{$NF=tolower($NF);print}'
lowercases only the filename part of each path.
sort | uniq -c
sorts so identical (lowercased) paths are adjacent, then collapses them; -c prints the count, so duplicates show up with a count greater than 1.
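Run over the question's example tree (a sketch, assuming only the two taxi files differ in case), the output would look like:
$ find /www -type f -exec readlink -m {} \; | gawk 'BEGIN{FS="/";OFS="/"}{$NF=tolower($NF);print}' | sort | uniq -c
      2 /www/images/taxi.jpg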
Little bit late to this one, but here's the version I went with:
find . -type f | awk -F/ '{print $NF}' | sort -f | uniq -i -d
Here we are using:
find - find all files under the current dir
awk - remove the file path part of the filename
sort - sort case insensitively
uniq - find the dupes from what makes it through the pipe
(Inspired by @mpez0's answer, and @SimonDowdles' comment on @paxdiablo's answer.)
You can check duplicates in a given directory with GNU awk:
gawk 'BEGINFILE {if ((seen[tolower(FILENAME)]++)) print FILENAME; nextfile}' *
This uses BEGINFILE to perform some action before going on and reading a file. In this case, it keeps track of the names that have appeared in an array seen[] whose indexes are the names of the files in lowercase.
If a name has already appeared, no matter its case, it prints it. Otherwise, it just jumps to the next file.
See an example:
$ tree
.
├── bye.txt
├── hello.txt
├── helLo.txt
├── yeah.txt
└── YEAH.txt
0 directories, 5 files
$ gawk 'BEGINFILE {if ((seen[tolower(FILENAME)]++)) print FILENAME; nextfile}' *
helLo.txt
YEAH.txt
I just used fdupes on CentOS to clean up a whole buncha duplicate files...
yum install fdupes
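Note that fdupes compares file contents rather than names, so it solves a slightly different problem than case-insensitive name matching; a typical recursive run looks like:
fdupes -r /www/images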