How get get xargs to split string to lines? - linux

Can someone explain why the below
$ echo $a
1 2 3
$ echo $a | xargs -n1 -I{} echo {}
1 2 3
isn't outputted as?
1
2
3
I would like to end up with
cp 1 1.old
cp 2 2.old
cp 3 3.old
and understand how -I works in xargs in the process.

Regarding your question-I and -n cannot work together but if you put it separately, it can work
a="1 2 3"; echo $a | xargs -n1 | xargs -I{} echo cp {} {}.old

You can use this printf | xargs:
a='1 2 3'
printf '%s\n' $a | xargs -I{} echo cp {} {}.old
Once satisfied with the output, you can remove echo before cp.
or else, without using printf, you can do this in xargs only:
xargs -I{} echo cp {} {}.old <<< "${a// /$'\n'}
cp 1 1.old
cp 2 2.old
cp 3 3.old
Here is a pure bash way of doing this:
for v in $a; do
echo cp "$v" "$v.old"
done

Related

I want to check if some given files contain more then 3 words from an input file in a shell script

My first parameter is the file that contains the given words and the rest are the other directories in which I'm searching for files, that contain at least 3 of the words from the 1st parameter
I can successfully print out the number of matching words, but when testing if it's greater then 3 it gives me the error: test: too many arguments
Here's my code:
#!/bin/bash
file=$1
shift 1
for i in $*
do
for j in `find $i`
do
if test -f "$j"
then
if test grep -o -w "`cat $file`" $j | wc -w -ge 3
then
echo $j
fi
fi
done
done
You first need to execute the grep | wc, and then compare that output with 3. You need to change your if statement for that. Since you are already using the backquotes, you cannot nest them, so you can use the other syntax $(command), which is equivalent to `command`:
if [ $(grep -o -w "`cat $file`" $j | wc -w) -ge 3 ]
then
echo $j
fi
I believe your problem is that you are trying to get the result of grep -o -w "cat $file" $j | wc -w to see if it's greater or equal to three, but your syntax is incorrect. Try this instead:
if test $(grep -o -w "`cat $file`" $j | wc -w) -ge 3
By putting the grep & wc commands inside the $(), the shell executes those commands and uses the output rather than the text of the commands themselves. Consider this:
> cat words
western
found
better
remember
> echo "cat words | wc -w"
cat words | wc -w
> echo $(cat words | wc -w)
4
> echo "cat words | wc -w gives you $(cat words | wc -w)"
cat words | wc -w gives you 4
>
Note that the $() syntax is equivalent to the double backtick notation you're already using for the cat $file command.
Hope this helps!
Your code can be refactored and corrected at few places.
Have it this way:
#!/bin/bash
input="$1"
shift
for dir; do
while IFS= read -r d '' file; do
if [[ $(grep -woFf "$input" "$file" | sort -u | wc -l) -ge 3 ]]; then
echo "$file"
fi
done < <(find "$dir" -type f -print0)
done
for dir loops through all the arguments
Use of sort -u is to remove duplicate words from output of grep.
Usewc -linstead ofwc -wsincegrep -o` prints matching words in separate lines.
find ... -print0 is to take care of file that may have whitespaces.
find ... -type f is to retrieve only files and avoid checking for -f later.

How to delete a directory based on date and time

The below script will be called from a cronjob and my desire is to remove directories older than 10 mins of the current time.
I'm having trouble with the following in my script:
Trimming \n from the beginning of a variable and not from the end (dumping the $oldDirday variable below to hexdump shows a leading \n
Adding 600 seconds to the current time
Getting if test to work properly
This is what I have:
#!/bin/sh
i=1
homeDirData="/home/user"$i"/dirToClean/"
while [ -d $homeDirData ]
do
echo "Checking user"$i""
for oldDir in $(find . -type d)
do
oldDirDay=$(ls -l $oldDir | awk '{print $7}')
oldDirTime=$(ls -l $oldDir | awk '{print $8}' | tr -d ':')
curDay=$(date +%d%t%R | awk '{print $1}')
curTime=$(date +%d%t%R | awk '{print $2}')
if [ $oldDirDay -lt $curDay ] && [ $oldDirTime -lt $(($curTime+600)) ]
then
find $homeDirData -type f -exec shred -f --remove {} \;
rm -rf $homeDirData
else
echo "Nothing to remove"
fi
((i+1))
done
done
Error
line 17: [: too many arguments (where the if statement is)
Issue
Nothing is being deleted
You can use something like this:
find $homeDirData -type d -mmin +10 -print0 | xargs -0 rmdir

File name printed twice when using wc

For printing number of lines in all ".txt" files of current folder, I am using following script:
for f in *.txt;
do l="$(wc -l "$f")";
echo "$f" has "$l" lines;
done
But in output I am getting:
lol.txt has 2 lol.txt lines
Why is lol.txt printed twice (especially after 2)? I guess there is some sort of stream flush required, but I dont know how to achieve that in this case.So what changes should i make in the script to get the output as :
lol.txt has 2 lines
You can remove the filename with 'cut':
for f in *.txt;
do l="$(wc -l "$f" | cut -f1 -d' ')";
echo "$f" has "$l" lines;
done
The filename is printed twice, because wc -l "$f" also prints the filename after the number of lines. Try changing it to cat "$f" | wc -l.
wc prints the filename, so you could just write the script as:
ls *.txt | while read f; do wc -l "$f"; done
or, if you really want the verbose output, try
ls *.txt | while read f; do wc -l "$f" | awk '{print $2, "has", $1, "lines"}'; done
There is a trick here. Get wc to read stdin and it won't print a file name:
for f in *.txt; do
l=$(wc -l < "$f")
echo "$f" has "$l" lines
done

Using -0 with xargs

I am trying to give an input to xargs that is NUL separated. To this effect I have this:
$ echo -n abc$'\000'def$'\000' | xargs -0 -L 1
I get
abcdef
I wonder why doesn't it print o/p as
abc
def
Your main problem is that you forgot -e:
$ echo -n abc$'\000'def$'\000' |cat -v
abcdef
No zero bytes are seen. But this:
$ echo -en abc'\000'def'\000' |cat -v
abc^#def^#
is more like it, the ^# is how cat -v shows a zero byte. And now for xargs:
$ echo -en abc'\000'def'\000' | xargs -0 -L 1
abc
def
Try help echo from your bash prompt.
Try treating the input as a single quoted string.
echo -ne "abc\0def\0" | xargs -0 -L 1

How to find duplicate files with same name but in different case that exist in same directory in Linux?

How can I return a list of files that are named duplicates i.e. have same name but in different case that exist in the same directory?
I don't care about the contents of the files. I just need to know the location and name of any files that have a duplicate of the same name.
Example duplicates:
/www/images/taxi.jpg
/www/images/Taxi.jpg
Ideally I need to search all files recursively from a base directory. In above example it was /www/
The other answer is great, but instead of the "rather monstrous" perl script i suggest
perl -pe 's!([^/]+)$!lc $1!e'
Which will lowercase just the filename part of the path.
Edit 1: In fact the entire problem can be solved with:
find . | perl -ne 's!([^/]+)$!lc $1!e; print if 1 == $seen{$_}++'
Edit 3: I found a solution using sed, sort and uniq that also will print out the duplicates, but it only works if there are no whitespaces in filenames:
find . |sed 's,\(.*\)/\(.*\)$,\1/\2\t\1/\L\2,'|sort|uniq -D -f 1|cut -f 1
Edit 2: And here is a longer script that will print out the names, it takes a list of paths on stdin, as given by find. Not so elegant, but still:
#!/usr/bin/perl -w
use strict;
use warnings;
my %dup_series_per_dir;
while (<>) {
my ($dir, $file) = m!(.*/)?([^/]+?)$!;
push #{$dup_series_per_dir{$dir||'./'}{lc $file}}, $file;
}
for my $dir (sort keys %dup_series_per_dir) {
my #all_dup_series_in_dir = grep { #{$_} > 1 } values %{$dup_series_per_dir{$dir}};
for my $one_dup_series (#all_dup_series_in_dir) {
print "$dir\{" . join(',', sort #{$one_dup_series}) . "}\n";
}
}
Try:
ls -1 | tr '[A-Z]' '[a-z]' | sort | uniq -c | grep -v " 1 "
Simple, really :-) Aren't pipelines wonderful beasts?
The ls -1 gives you the files one per line, the tr '[A-Z]' '[a-z]' converts all uppercase to lowercase, the sort sorts them (surprisingly enough), uniq -c removes subsequent occurrences of duplicate lines whilst giving you a count as well and, finally, the grep -v " 1 " strips out those lines where the count was one.
When I run this in a directory with one "duplicate" (I copied qq to qQ), I get:
2 qq
For the "this directory and every subdirectory" version, just replace ls -1 with find . or find DIRNAME if you want a specific directory starting point (DIRNAME is the directory name you want to use).
This returns (for me):
2 ./.gconf/system/gstreamer/0.10/audio/profiles/mp3
2 ./.gconf/system/gstreamer/0.10/audio/profiles/mp3/%gconf.xml
2 ./.gnome2/accels/blackjack
2 ./qq
which are caused by:
pax> ls -1d .gnome2/accels/[bB]* .gconf/system/gstreamer/0.10/audio/profiles/[mM]* [qQ]?
.gconf/system/gstreamer/0.10/audio/profiles/mp3
.gconf/system/gstreamer/0.10/audio/profiles/MP3
.gnome2/accels/blackjack
.gnome2/accels/Blackjack
qq
qQ
Update:
Actually, on further reflection, the tr will lowercase all components of the path so that both of
/a/b/c
/a/B/c
will be considered duplicates even though they're in different directories.
If you only want duplicates within a single directory to show as a match, you can use the (rather monstrous):
perl -ne '
chomp;
#flds = split (/\//);
$lstf = $f[-1];
$lstf =~ tr/A-Z/a-z/;
for ($i =0; $i ne $#flds; $i++) {
print "$f[$i]/";
};
print "$x\n";'
in place of:
tr '[A-Z]' '[a-z]'
What it does is to only lowercase the final portion of the pathname rather than the whole thing. In addition, if you only want regular files (no directories, FIFOs and so forth), use find -type f to restrict what's returned.
I believe
ls | sort -f | uniq -i -d
is simpler, faster, and will give the same result
Following up on the response of mpez0, to detect recursively just replace "ls" by "find .".
The only problem I see with this is that if this is a directory that is duplicating, then you have 1 entry for each files in this directory. Some human brain is required to treat the output of this.
But anyway, you're not automatically deleting these files, are you?
find . | sort -f | uniq -i -d
This is a nice little command line app called findsn you get if you compile fslint that the deb package does not include.
it will find any files with the same name, and its lightning fast and it can handle different case.
/findsn --help
find (files) with duplicate or conflicting names.
Usage: findsn [-A -c -C] [[-r] [-f] paths(s) ...]
If no arguments are supplied the $PATH is searched for any redundant
or conflicting files.
-A reports all aliases (soft and hard links) to files.
If no path(s) specified then the $PATH is searched.
If only path(s) specified then they are checked for duplicate named
files. You can qualify this with -C to ignore case in this search.
Qualifying with -c is more restrictive as only files (or directories)
in the same directory whose names differ only in case are reported.
I.E. -c will flag files & directories that will conflict if transfered
to a case insensitive file system. Note if -c or -C specified and
no path(s) specified the current directory is assumed.
Here is an example how to find all duplicate jar files:
find . -type f -printf "%f\n" -name "*.jar" | sort -f | uniq -i -d
Replace *.jar with whatever duplicate file type you are looking for.
Here's a script that worked for me ( I am not the author). the original and discussion can be found here:
http://www.daemonforums.org/showthread.php?t=4661
#! /bin/sh
# find duplicated files in directory tree
# comparing by file NAME, SIZE or MD5 checksum
# --------------------------------------------
# LICENSE(s): BSD / CDDL
# --------------------------------------------
# vermaden [AT] interia [DOT] pl
# http://strony.toya.net.pl/~vermaden/links.htm
__usage() {
echo "usage: $( basename ${0} ) OPTION DIRECTORY"
echo " OPTIONS: -n check by name (fast)"
echo " -s check by size (medium)"
echo " -m check by md5 (slow)"
echo " -N same as '-n' but with delete instructions printed"
echo " -S same as '-s' but with delete instructions printed"
echo " -M same as '-m' but with delete instructions printed"
echo " EXAMPLE: $( basename ${0} ) -s /mnt"
exit 1
}
__prefix() {
case $( id -u ) in
(0) PREFIX="rm -rf" ;;
(*) case $( uname ) in
(SunOS) PREFIX="pfexec rm -rf" ;;
(*) PREFIX="sudo rm -rf" ;;
esac
;;
esac
}
__crossplatform() {
case $( uname ) in
(FreeBSD)
MD5="md5 -r"
STAT="stat -f %z"
;;
(Linux)
MD5="md5sum"
STAT="stat -c %s"
;;
(SunOS)
echo "INFO: supported systems: FreeBSD Linux"
echo
echo "Porting to Solaris/OpenSolaris"
echo " -- provide values for MD5/STAT in '$( basename ${0} ):__crossplatform()'"
echo " -- use digest(1) instead for md5 sum calculation"
echo " $ digest -a md5 file"
echo " -- pfexec(1) is already used in '$( basename ${0} ):__prefix()'"
echo
exit 1
(*)
echo "INFO: supported systems: FreeBSD Linux"
exit 1
;;
esac
}
__md5() {
__crossplatform
:> ${DUPLICATES_FILE}
DATA=$( find "${1}" -type f -exec ${MD5} {} ';' | sort -n )
echo "${DATA}" \
| awk '{print $1}' \
| uniq -c \
| while read LINE
do
COUNT=$( echo ${LINE} | awk '{print $1}' )
[ ${COUNT} -eq 1 ] && continue
SUM=$( echo ${LINE} | awk '{print $2}' )
echo "${DATA}" | grep ${SUM} >> ${DUPLICATES_FILE}
done
echo "${DATA}" \
| awk '{print $1}' \
| sort -n \
| uniq -c \
| while read LINE
do
COUNT=$( echo ${LINE} | awk '{print $1}' )
[ ${COUNT} -eq 1 ] && continue
SUM=$( echo ${LINE} | awk '{print $2}' )
echo "count: ${COUNT} | md5: ${SUM}"
grep ${SUM} ${DUPLICATES_FILE} \
| cut -d ' ' -f 2-10000 2> /dev/null \
| while read LINE
do
if [ -n "${PREFIX}" ]
then
echo " ${PREFIX} \"${LINE}\""
else
echo " ${LINE}"
fi
done
echo
done
rm -rf ${DUPLICATES_FILE}
}
__size() {
__crossplatform
find "${1}" -type f -exec ${STAT} {} ';' \
| sort -n \
| uniq -c \
| while read LINE
do
COUNT=$( echo ${LINE} | awk '{print $1}' )
[ ${COUNT} -eq 1 ] && continue
SIZE=$( echo ${LINE} | awk '{print $2}' )
SIZE_KB=$( echo ${SIZE} / 1024 | bc )
echo "count: ${COUNT} | size: ${SIZE_KB}KB (${SIZE} bytes)"
if [ -n "${PREFIX}" ]
then
find ${1} -type f -size ${SIZE}c -exec echo " ${PREFIX} \"{}\"" ';'
else
# find ${1} -type f -size ${SIZE}c -exec echo " {} " ';' -exec du -h " {}" ';'
find ${1} -type f -size ${SIZE}c -exec echo " {} " ';'
fi
echo
done
}
__file() {
__crossplatform
find "${1}" -type f \
| xargs -n 1 basename 2> /dev/null \
| tr '[A-Z]' '[a-z]' \
| sort -n \
| uniq -c \
| sort -n -r \
| while read LINE
do
COUNT=$( echo ${LINE} | awk '{print $1}' )
[ ${COUNT} -eq 1 ] && break
FILE=$( echo ${LINE} | cut -d ' ' -f 2-10000 2> /dev/null )
echo "count: ${COUNT} | file: ${FILE}"
FILE=$( echo ${FILE} | sed -e s/'\['/'\\\['/g -e s/'\]'/'\\\]'/g )
if [ -n "${PREFIX}" ]
then
find ${1} -iname "${FILE}" -exec echo " ${PREFIX} \"{}\"" ';'
else
find ${1} -iname "${FILE}" -exec echo " {}" ';'
fi
echo
done
}
# main()
[ ${#} -ne 2 ] && __usage
[ ! -d "${2}" ] && __usage
DUPLICATES_FILE="/tmp/$( basename ${0} )_DUPLICATES_FILE.tmp"
case ${1} in
(-n) __file "${2}" ;;
(-m) __md5 "${2}" ;;
(-s) __size "${2}" ;;
(-N) __prefix; __file "${2}" ;;
(-M) __prefix; __md5 "${2}" ;;
(-S) __prefix; __size "${2}" ;;
(*) __usage ;;
esac
If the find command is not working for you, you may have to change it. For example
OLD : find "${1}" -type f | xargs -n 1 basename
NEW : find "${1}" -type f -printf "%f\n"
You can use:
find -type f -exec readlink -m {} \; | gawk 'BEGIN{FS="/";OFS="/"}{$NF=tolower($NF);print}' | uniq -c
Where:
find -type f
recursion print all file's full path.
-exec readlink -m {} \;
get file's absolute path
gawk 'BEGIN{FS="/";OFS="/"}{$NF=tolower($NF);print}'
replace the all filename's to lower case
uniq -c
unique the path, -c output the count of duplicate.
Little bit late to this one, but here's the version I went with:
find . -type f | awk -F/ '{print $NF}' | sort -f | uniq -i -d
Here we are using:
find - find all files under the current dir
awk - remove the file path part of the filename
sort - sort case insensitively
uniq - find the dupes from what makes it through the pipe
(Inspired by #mpez0 answer, and #SimonDowdles comment on #paxdiablo answer.)
You can check duplicates in a given directory with GNU awk:
gawk 'BEGINFILE {if ((seen[tolower(FILENAME)]++)) print FILENAME; nextfile}' *
This uses BEGINFILE to perform some action before going on and reading a file. In this case, it keeps track of the names that have appeared in an array seen[] whose indexes are the names of the files in lowercase.
If a name has already appeared, no matter its case, it prints it. Otherwise, it just jumps to the next file.
See an example:
$ tree
.
├── bye.txt
├── hello.txt
├── helLo.txt
├── yeah.txt
└── YEAH.txt
0 directories, 5 files
$ gawk 'BEGINFILE {if ((a[tolower(FILENAME)]++)) print FILENAME; nextfile}' *
helLo.txt
YEAH.txt
I just used fdupes on CentOS to clean up a whole buncha duplicate files...
yum install fdupes

Resources