I have 400 folders with several files inside. I am interested in:
counting how many files with the extension .solution there are in each folder, and
then outputting only those folders that have fewer than 440 such files.
Point 1) is easy to get with the command:
for folder in $(ls -d */ | grep "sol_cv_"); do
    a=$(ls -1 "$folder"/*.solution | wc -l)
    echo "$folder has ${a} files"
done
But is there an easy way to list only the folders with fewer than 440 such files?
This simple script could work for you:
#!/bin/bash
MAX=440

for folder in sol_cv_*; do
    COUNT=$(find "$folder" -type f -name "*.solution" | wc -l)
    ((COUNT < MAX)) && echo "$folder"
done
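Note that find here also descends into any sub-folders of each sol_cv_* directory. If the .solution files only sit directly inside each folder, a -maxdepth 1 variant (a sketch under that assumption) restricts the count accordingly:

#!/bin/bash
MAX=440

for folder in sol_cv_*/; do
    # count only .solution files directly inside $folder
    COUNT=$(find "$folder" -maxdepth 1 -type f -name "*.solution" | wc -l)
    ((COUNT < MAX)) && echo "$folder"
done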
The script below
counterfun() {
    count=$(find "$1" -maxdepth 1 -type f -iname "*.solution" | wc -l)
    (( count < 440 )) && echo "$1"
}
export -f counterfun

find /YOUR/BASE/FOLDER/ -maxdepth 1 -type d -iname "sol_cv_*" -exec bash -c 'counterfun "$1"' _ {} \;

# -maxdepth 1 in both finds above, as you've confirmed there are no sub-folders
should do it
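If you'd rather not export the function, the same logic can be inlined; this sketch (same assumptions as above) uses -exec ... + to batch many directories into a single bash invocation:

find /YOUR/BASE/FOLDER/ -maxdepth 1 -type d -iname "sol_cv_*" -exec bash -c '
    for d; do
        # count .solution files directly inside each directory passed by find
        count=$(find "$d" -maxdepth 1 -type f -iname "*.solution" | wc -l)
        (( count < 440 )) && echo "$d"
    done
' _ {} +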
Avoid parsing the output of ls; use printf '%q\n' for counting files:
for folder in *sol_cv_*/; do
    # if the folder has 440 or more .solution files then skip it
    (( $(printf '%q\n' "$folder"*.solution | wc -l) >= 440 )) && continue
    # otherwise print the count using the safer printf '%q\n'
    echo "$folder has $(printf '%q\n' "$folder"*.solution | wc -l) files"
done
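Why %q is safer than echo or ls for counting: it quotes each name, so a filename that itself contains a newline still occupies exactly one output line. A quick illustration with a hypothetical awkward name:

$ touch $'bad\nname.solution' good.solution
$ printf '%s\n' *.solution | wc -l    # 3 -- the embedded newline adds a line
$ printf '%q\n' *.solution | wc -l    # 2 -- each name is quoted onto one line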
Related
I have a directory with many subdirs and about 7000+ files in total. I need to find all duplicates among these files. For any given file, its duplicates might be scattered around various subdirs and may or may not have the same file name. A duplicate is a file for which the diff command returns exit code 0.
The simplest thing to do is to run a double loop over all the files in the directory tree. But that's 7000^2 sequential diffs and not very efficient:
for f in `find /path/to/root/folder -type f`; do
    for g in `find /path/to/root/folder -type f`; do
        if [ "$f" = "$g" ]; then
            continue
        fi
        diff "$f" "$g" > /dev/null
        if [ $? -eq 0 ]; then
            echo "$f" MATCHES "$g"
        fi
    done
done
Is there a more efficient way to do it?
On Debian 11:
% mkdir files; (cd files; echo "one" > 1; echo "two" > 2a; cp 2a 2b)
% find files/ -type f -print0 | xargs -0 md5sum | tee listing.txt | \
awk '{print $1}' | sort | uniq -c | awk '$1>1 {print $2}' > dups.txt
% grep -f dups.txt listing.txt
c193497a1a06b2c72230e6146ff47080 files/2a
c193497a1a06b2c72230e6146ff47080 files/2b
Find and print all files, null-terminated (-print0).
Use xargs to run md5sum on them.
Save a copy of the sums and filenames in the listing.txt file.
Extract the sums, sort them, and count repeats with uniq -c; awk then writes the sums that occur more than once into dups.txt.
Finally, grep the duplicate sums against listing.txt to recover the sums and filenames.
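If you'd rather avoid the two temporary files, a single pass over the sorted sums can print the duplicate groups directly; a sketch under the same assumptions (GNU tools, filenames without newlines):

find files/ -type f -print0 | xargs -0 md5sum | sort |
  awk '$1 == prev { if (!shown) print prevline; print; shown = 1; next }
       { prev = $1; prevline = $0; shown = 0 }'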
stat part:
$ find * -depth -exec stat --format '%n %U %G' {} + | sort -d > acl_file
$ cat acl_file
xfce4/desktop/icons screen0-3824x1033.rc john john
Code/CachedData/f30a9b73e8ffc278e71575118b6bf568f04587c8/index-ec362010a4d520491a88088c200c853d.code john john
VirtualBox/selectorwindow.log.6 john john
md5sum part:
$ find * -depth -exec md5sum {} + | sort -d > md5_file
$ cat md5_file
3da180c2d9d1104a17db0749d527aa4b xfce4/desktop/icons screen0-3824x1033.rc
3de44d64a6ce81c63f9072c0517ed3b9 Code/CachedData/f30a9b73e8ffc278e71575118b6bf568f04587c8/index-ec362010a4d520491a88088c200c853d.code
3f85bb5b59bcd13b4fc63d5947e51294 VirtualBox/selectorwindow.log.6
How can I combine stat --format '%n %U %G' and md5sum, and output both to a file line by line, such as:
3da180c2d9d1104a17db0749d527aa4b xfce4/desktop/icons screen0-3824x1033.rc john john
3de44d64a6ce81c63f9072c0517ed3b9 Code/CachedData/f30a9b73e8ffc278e71575118b6bf568f04587c8/index-ec362010a4d520491a88088c200c853d.code john john
3f85bb5b59bcd13b4fc63d5947e51294 VirtualBox/selectorwindow.log.6 john john
This is really just a minor variation on @Zilog80's solution. My time testing had it a few seconds faster, by skipping reads, on a smallish dataset of a few hundred files running on a Windows laptop under git bash. YMMV.
mapfile -t lst < <( find . -type f -exec md5sum "{}" \; -exec stat --format '%U %G' "{}" \; )
for ((i=0; i < ${#lst[@]}; i++)); do
    if (( i%2 )); then echo "${lst[i]}"; else printf "%s " "${lst[i]}"; fi
done | sort -d
Edit
My original solution was pretty broken: it skipped files in hidden subdirectories, and the printf botched filenames with spaces. If you don't have hidden directories to deal with, or if you want to skip them (e.g., you're working in a git repo and would rather skip the .git tree), here's a rework.
shopt -s dotglob # check hidden files
shopt -s globstar # process at arbitrary depth
for f in **/*; do # this properly handles odd names
[[ -f "$f" ]] && echo "$(md5sum "$f") $(stat --format "%U %G" "$f")"
done | sort -d
The quickest way should be:
find * -type f -exec stat --format '%n %U %G' "{}" \; -exec md5sum "{}" \; |
{ while read -r line1 && read -r line2; do printf "%s %s\n" "${line2/ */}" "${line1}";done; } |
sort -d
We use two -exec to apply stat and md5sum file by file, then read the two output lines for each file and use printf to format them into a single output line combining both. We finally pipe the whole output to sort.
Warning: as we pipe the whole output to sort, you may have to wait until all the stat/md5sum calls have finished before any output appears on the console.
Also, if md5sum fails on a file while stat does not (or vice versa), the line pairing breaks and the output will be garbled.
Edit: a slightly safer way to produce the output:
find * -type f -exec md5sum "{}" \; -exec stat --format '%n %U %G' "{}" \; |
{ while read -r line; do
mdsum="${line/[0-9a-f]* /}";
[ "${mdsum}" != "${line}" ] &&
{ mdsumdisp="${line% ${mdsum}}"; mdsumfile="${mdsum}"; } ||
{ [ "${line#${mdsumfile}}" != "${line}" ] &&
printf "%s %s\n" "${mdsumdisp}" "${line}"; };
done; } | sort -d
Here, at least, we check that the expected line starts with something that looks like an md5sum, and that the following line refers to the same file.
Ubuntu 18.04 LTS with bash 4.4.20
I am trying to count the number of files in each directory, starting in the directory where I executed the script. Borrowing from other coders, I found this script and modified it. I am trying to modify it to print a total at the end, but I can't seem to get it. Also, the script runs the same count twice in each loop iteration, which is inefficient. I inserted the extra find command because I could not get the result of the nested find | wc -l to store in a variable, and it still didn't work.
Thanks!
#!/bin/bash
count=0
find . -maxdepth 1 -mindepth 1 -type d | sort -n | while read dir; do
    printf "%-25.25s : " "$dir"
    find "$dir" -type f | wc -l
    filesthisdir=$(find "$dir" -type f | wc -l)
    count=$count+$filesthisdir
done
echo "Total files : $count"
Here are the results. It should total them up at the end; otherwise this works well.
./1800wls1 : 1086
./1800wls2 : 1154
./1900wls-in1 : 780
./1900wls-in2 : 395
./1900wls-in3 : 0
./1900wls-out1 : 8
./1900wls-out2 : 304
./1900wls-out3 : 160
./test : 0
Total files : 0
This doesn't work because the while loop is executed in a subshell. By using <<< you make sure it's executed in the current shell:
#!/bin/bash
count=0
while read dir; do
    printf "%-25.25s : " "$dir"
    filesthisdir=$(find "$dir" -type f | wc -l)
    echo "$filesthisdir"
    ((count+=filesthisdir))
done <<< "$(find . -maxdepth 1 -mindepth 1 -type d | sort -n)"
echo "Total files : $count"
Of course you can also make use of a for loop:
for i in $(find . -maxdepth 1 -mindepth 1 -type d | sort -n); do
    # do something
done
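Note the command substitution must stay unquoted so the shell splits it into one word per directory, which in turn breaks on names containing whitespace. For top-level directories, a glob sidesteps the issue entirely (a sketch):

for dir in ./*/; do
    printf "%-25.25s : " "${dir%/}"
    find "$dir" -type f | wc -l
done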
Use (( count += filesthisdir )) and think about counting files whose names contain newlines. For that, you should change your find command:
filesthisdir=$(find "$dir" -type f -exec echo . \; | wc -l)
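The -exec echo . trick prints one dot per file instead of the file name, so a name containing a newline still counts as a single line. A quick check with a hypothetical awkward name (with GNU find, -printf '.\n' gives the same effect without spawning a process per file):

$ touch $'evil\nname'
$ find . -type f | wc -l                    # 2 -- the newline splits one name
$ find . -type f -exec echo . \; | wc -l    # 1 -- one dot per file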
I have thousands of files in a directory and I want to be able to divide them into sub-directories, each containing a specific number of files. I don't care which files go into which directories, just as long as each contains the specified number. All the file names have a common ending (e.g. .txt), but what goes before varies.
Does anyone know an easy way to do this?
Assuming you only have files ending in *.txt, no hidden files and no directories:
#!/bin/bash
shopt -s nullglob
maxf=42
files=( *.txt )
for ((i=0; maxf*i < ${#files[@]}; ++i)); do
    s=subdir$i
    mkdir -p "$s"
    mv -t "$s" -- "${files[@]:i*maxf:maxf}"
done
This will create directories subdirX, with X an integer starting from 0, and will put up to 42 files in each directory.
You can tweak the thing to have zero-padded X:
#!/bin/bash
shopt -s nullglob
files=( *.txt )
maxf=42
((l=${#files[@]}/maxf))
p=${#l}
for ((i=0; maxf*i < ${#files[@]}; ++i)); do
    printf -v s "subdir%0${p}d" "$i"
    mkdir -p "$s"
    mv -t "$s" -- "${files[@]:i*maxf:maxf}"
done
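The heavy lifting in both versions is the array slice ${files[@]:offset:length}, which expands to at most length elements starting at offset; a quick illustration:

$ a=(one two three four five)
$ echo "${a[@]:2:2}"
three four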
max_per_subdir=1000
start=1
while [ -e $(printf %03d $start) ]; do
start=$((start + 1))
done
find -maxdepth 1 -type f ! -name '.*' -name '*.txt' -print0 \
| xargs -0 -n $max_per_subdir echo \
| while read -a files; do
subdir=$(printf %03d $start)
mkdir $subdir || exit 1
mv "${files[#]}" $subdir/ || exit 1
start=$((start + 1))
done
How about:
find . -maxdepth 1 -name '*.txt' -print0 | xargs -0 -n 100 | while read -r batch; do
    # name each directory after the md5 hash of the batch of filenames it receives
    dir=$(printf '%s' "$batch" | md5sum | cut -d' ' -f1)
    mkdir -p "$dir" && cp $batch "$dir"/
done
This will create several directories, each containing up to 100 files; the name of each created directory is the md5 hash of the filenames it contains. Like the xargs answer above, it breaks on filenames containing whitespace.
I am trying to list all directories and place the number of files in each next to its name.
I can find the total number of files with ls -lR | grep .*.mp3 | wc -l, but how can I get an output like this:
dir1 34
dir2 15
dir3 2
...
I don't mind writing to a text file or CSV to get this information if it's not possible to get it on screen.
Thank you all for any help on this.
This seems to work, assuming you are in a directory whose subdirectories may contain mp3 files. It omits the top-level directory and lists the directories in descending order by number of contained mp3 files.
find . -mindepth 2 -name \*.mp3 -print0 | xargs -0 -n 1 dirname | sort | uniq -c | sort -rn | awk '{print $2 "," $1}'
I updated this with -print0 to handle filenames with spaces and other tricky characters, and to print output suitable for CSV.
find . -type f -iname '*.mp3' -printf "%h\n" | sort | uniq -c
(The sort matters: uniq -c only merges adjacent lines, and find does not guarantee that all files from one directory come out contiguously.)
Or, if the order (dir -> count instead of count -> dir) is really important to you:
find . -type f -iname '*.mp3' -printf "%h\n" | sort | uniq -c | awk '{print $2" "$1}'
There are probably much better ways, but this seems to work.
Put this in a shell script:
#!/bin/sh
for f in *; do
    if [ -d "$f" ]; then
        cd "$f"
        c=$(ls -l *.mp3 2>/dev/null | wc -l)
        if [ "$c" -gt 0 ]; then
            echo "$f $c"
        fi
        cd ..
    fi
done
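A minimal rework of the same idea in bash, avoiding the cd round-trips by counting with a glob (a sketch; nullglob makes the array empty when a directory holds no mp3 files):

#!/bin/bash
shopt -s nullglob
for d in */; do
    files=( "$d"*.mp3 )    # mp3 files directly inside $d
    (( ${#files[@]} > 0 )) && echo "${d%/} ${#files[@]}"
done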
With Perl:
perl -MFile::Find -le'
find {
wanted => sub {
return unless /\.mp3$/i;
++$_{$File::Find::dir};
}
}, ".";
print "$_,$_{$_}" for
sort {
$_{$b} <=> $_{$a}
} keys %_;
'
Here's yet another way that even handles file names containing unusual (but legal) characters, such as newlines:
# count .mp3 files (using GNU find)
find . -xdev -type f -iname "*.mp3" -print0 | tr -dc '\0' | wc -c
# list directories with number of .mp3 files
find "$(pwd -P)" -xdev -depth -type d -exec bash -c '
    for ((i=1; i<=$#; i++ )); do
        d="${@:i:1}"
        mp3s="$(find "${d}" -xdev -type f -iname "*.mp3" -print0 | tr -dc "${0}" | wc -c )"
        [[ $mp3s -gt 0 ]] && printf "%s\n" "${d}, ${mp3s// /}"
    done
' "'\\0'" '{}' +