Linux bash: count size of files in a folder

I saw a few posts in the forum, but I can't manage to make them work for me.
I have a script that runs in a folder, and I want it to count the size of only the files in that folder, without the folders inside it.
So if I have
file1.txt
folder1
file2.txt
it should return the size in bytes of file1 + file2, without folder1.
find . -maxdepth 1 -type f
gives me a list of all the files I want to count, but how can I get the total size of all these files?

The tool for this is xargs:
find "$dir" -maxdepth 1 -type f -print0 | xargs -0 wc -c
Note that find -print0 and xargs -0 are GNU extensions, but if you know they are available, they are well worth using in your script - you don't know what characters might be present in the filenames in the target directory.
You will need to post-process the output of wc; alternatively, use cat to give it a single input stream, like this:
find "$dir" -maxdepth 1 -type f -print0 | xargs -0 cat | wc -c
That gives you a single number you can use in following commands.
(I've assumed you meant "size" in bytes; obviously substitute wc -m if you meant characters or wc -l if you meant lines).
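If you want to use that number later in the script, a minimal sketch capturing it in a variable (assuming $dir is set as above; the variable name total is just an example):
total=$(find "$dir" -maxdepth 1 -type f -print0 | xargs -0 cat | wc -c)
echo "Total size: $total bytes"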

Related

UNIX: Use a single find command to search files larger than 4 MiB, then pipe the output to a sort command

I'm trying to answer the question below. Here is what I have come up with, but it doesn't appear to be working:
find /usr/bin -type f -size +4194304c | sort -n
Am I on the right track with the above?
Question:
Use a single find command to search for all files larger than 4 MiB in
/usr/bin, printing the listing in a long format. Pipe this output to a sort command
which will sort the list from largest to smallest
I'd use the -printf command-line switch, something like this:
find YOUR_CONDITION_HERE -printf '%s %p\n' | sort -n
Here %s stands for the size in bytes and %p for the file name.
You can trim the sizes later, e.g. using cut, e.g.:
find -type f -size +4194304c -printf '%s %p\n' | sort -n | cut -f 2 -d ' '
But given that you need the long listing format, I guess you'll be adding more fields to -printf's argument.
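For instance, a rough sketch sorted largest first (the extra -printf fields here are only an illustration of a long-style listing, not the exact ls -l format):
find /usr/bin -type f -size +4194304c -printf '%s %M %u %g %TY-%Tm-%Td %p\n' | sort -rn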
Related topic: https://superuser.com/questions/294161/unix-linux-find-and-sort-by-date-modified
You are on the right track, but the find command will only output the name of each file, not its size. That is why sort sorts them alphabetically.
To sort by size, you can output the file list and then pass it to ls with xargs like this:
find /usr/bin -type f -size +4194304c | xargs ls -S
If you want ls to output the file list in a single column, you can add -1 to the -S, i.e. use -S1. The command would become:
find /usr/bin -type f -size +4194304c | xargs ls -S1
To make your command robust against any filename, I would suggest using -print0 (it separates paths with the null character, which can never appear in a file path on Linux). The command would become:
find /usr/bin -type f -size +4194304c -print0 | xargs -0 ls -S1
You could also try
find /usr/bin -type f -size +4194304c -ls | sort -n -k7
and if you want the results reversed then try
find /usr/bin -type f -size +4194304c -ls | sort -r -n -k7
Or another option
find /usr/bin -type f -size +4194304c -exec ls -lSd {} +

Count number of files in several folders with Unix command

I'd like to count the number of files in each folder. For one folder, I can do it with:
find ./folder1/ -type f | wc -l
I can repeat this command for every folder (folder2, folder3, ...), but I'd like to know if it is possible to get the information with one command. The output should look like this:
folder1 13
folder2 4
folder3 1254
folder4 327
folder5 2145
I can get the list of my folders with:
find . -maxdepth 1 -type d
which returns:
./folder1
./folder2
./folder3
./folder4
./folder5
Then, I thought about combining this command with the first one, but I don't know exactly how. Maybe with "-exec" or "xargs"?
Many thanks in advance.
A possible solution using xargs is to use the -I option, which replaces occurrences of replace-str (% in the code sample below) in the initial-arguments with names read from standard input:
find . -maxdepth 1 -type d -print0 | xargs -0 -I% sh -c 'echo -n "%: "; find "%" -type f | wc -l'
You also need to run the inner find command through sh if you want to pipe it to wc; otherwise wc will count the files of all directories together.
Another solution (maybe less cryptic) is to use a one-liner for loop:
for d in */; do echo -n "$d: "; find "$d" -type f | wc -l; done
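If you prefer to let find drive the whole thing (so unusual directory names are still handled), a sketch using -exec with a small inline shell script:
find . -maxdepth 1 -type d -exec sh -c '
  for d in "$@"; do
    printf "%s %s\n" "$d" "$(find "$d" -type f | wc -l)"
  done
' sh {} +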

Delete files 100 at a time and count total files

I have written a bash script to delete 100 files at a time from a directory because I was getting an "argument list too long" error, but now I want to count the total number of files that were deleted from the directory.
Here is the script
echo /example-dir/* | xargs -n 100 rm -rf
What I want is to write the total number of deleted files for each directory into a file, along with the path, for example: Deleted <count> files from <path>
How can I achieve this with my current setup?
You can do this by enabling verbose output from rm and then counting the output lines with wc -l.
If you have whitespace or special characters in the file names, using echo to pass the list of files to xargs will not work.
Better to use find with -print0, which uses the NUL character as a delimiter between the individual files:
find /example-dir -type f -print0 | xargs --null -n 100 rm -vrf | wc -l
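Putting it together in the Deleted <count> files from <path> format you asked for (the directory and the log file name deleted.log are only examples):
dir=/example-dir
count=$(find "$dir" -type f -print0 | xargs --null -n 100 rm -vf | wc -l)
echo "Deleted $count files from $dir" >> deleted.log
rm -v prints one line per removed file, which is exactly what wc -l counts.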
You can avoid xargs and do this in a simple while loop and use a counter:
destdir='/example-dir/'
count=0
while IFS= read -r -d '' file; do
    rm -rf "$file"
    ((count++))
done < <(find "$destdir" -type f -print0)
echo "Deleted $count files from $destdir"
Note the use of -print0 to take care of file names with whitespace, newlines, glob characters, etc.
By the way, if you really have lots of files and you do this often, it might be useful to look at some other options:
Use find's built-in -delete
time find . -name \*.txt -print -delete | wc -l
30000
real 0m1.244s
user 0m0.055s
sys 0m1.037s
Use find's ability to build up maximal length argument list
time find . -name \*.txt -exec rm -v {} + | wc -l
30000
real 0m0.979s
user 0m0.043s
sys 0m0.920s
Use GNU Parallel's ability to build long argument lists
time find . -name \*.txt -print0 | parallel -0 -X rm -v | wc -l
30000
real 0m1.076s
user 0m1.090s
sys 0m1.223s
Use a single Perl process to read filenames and delete whilst counting
time find . -name \*.txt -print0 | perl -0ne 'unlink;$i++;END{print $i}'
30000
real 0m1.049s
user 0m0.057s
sys 0m1.006s
For testing, you can create 30,000 files really fast with GNU Parallel, which allows -X to also build up long argument lists. For example, I can create 30,000 files in 8 seconds on my Mac with:
seq -w 0 29999 | parallel -X touch file{}.txt

Bash script that prints subdirectories which have more than 5 files

While I was trying to practice my Linux skills, I could not solve this question.
It's basically saying: "Write a bash script that takes the name of a
directory as a command argument and prints the names of the subdirectories
that have more than 5 files in them."
I thought we would use the find command, but I still could not figure it out. My code is:
find directory -type d -mindepth5
but it's not working.
You can use find twice:
First you can use find and wc to count the number of files in a given directory:
nb=$(find directory -maxdepth 1 -type f -printf "x\n" | wc -l)
This just asks find to output an x on a line for each file in the directory directory, proceeding non-recursively; wc -l then counts the number of lines, so nb is the number of files in directory.
If you want to know whether a directory contains more than 5 files, it's a good idea to stop find as soon as 6 files are found:
nb=$(find directory -maxdepth 1 -type f -printf "x\n" | head -6 | wc -l)
Here nb has an upper threshold of 6.
Now if for each subdirectory of a directory directory you want to output the number of files (threshold at 6), you can do this:
find directory -type d -exec bash -c 'nb=$(find "$0" -maxdepth 1 -type f -printf "x\n" | head -6 | wc -l); echo "$nb"' {} \;
where $0 is the 0-th argument of the inner bash, namely the {} that find replaces with each subdirectory of directory.
Finally, you only want to display the subdirectory name if the number of files is more than 5:
find . -type d -exec bash -c 'nb=$(find "$0" -maxdepth 1 -type f -printf "x\n" | head -6 | wc -l); ((nb>5))' {} \; -print
The final test ((nb>5)) returns success or failure whether nb is greater than 5 or not, and in case of success, find will -print the subdirectory name.
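Wrapped up as the script the exercise asks for (just a sketch; the script name and usage line are illustrative):
#!/bin/bash
# usage: ./subdirs_over_5.sh directory
dir=${1:?usage: $0 directory}
find "$dir" -type d -exec bash -c '
  nb=$(find "$0" -maxdepth 1 -type f -printf "x\n" | head -6 | wc -l)
  ((nb > 5))
' {} \; -print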
This should do the trick:
find directory/ -type f | sed 's/\(.*\)\/.*/\1/g' | sort | uniq -c | sort -n | awk '{if($1>5) print($2)}'
Using -mindepth is useless here, since it only lists directories at a depth of at least 5, while you need subdirectories with more than 5 files in them.
find directory -type f prints all files in the subdirectories
sed 's/\(.*\)\/.*/\1/g' removes the file names, leaving only the list of subdirectories
sort sorts that list so we can use uniq
uniq -c merges duplicate lines and writes how many times each occurred
sort -n sorts it by the number of occurrences (so you end up with a list of (count, subdirectory) pairs)
awk '{if($1>5) print($2)}' prints only those whose first column is > 5 (and it prints only the second column)
So you end up with a list of subdirectories with more than 5 files inside.
EDIT:
A fix for paths with spaces was proposed:
Instead of awk '{if($1>5) print($2)}' there should be awk '{if($1>5){ $1=""; print(substr($0,2)) }}', which sets the first field to "" and then prints the whole line without the leading space (which was the delimiter). Put together we get this:
find directory/ -type f | sed 's/\(.*\)\/.*/\1/g' | sort | uniq -c | sort -n | awk '{if($1>5){ $1=""; print(substr($0,2)) }}'
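If your find is GNU find, you could also skip the sed step and have find print each file's parent directory directly with -printf '%h\n' (a sketch, keeping the same awk handling for spaces):
find directory/ -type f -printf '%h\n' | sort | uniq -c | sort -n | awk '{if($1>5){ $1=""; print(substr($0,2)) }}'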

Linux: Find a List of Files in a Directory Recursively

I have a text file with one filename per line:
Interpret 1 - Song 1.mp3
Interpret 2 - Song 2.mp3
...
(About 200 Filenames)
Now I want to search a folder recursively for these filenames to get the full path for each filename in Filenames.txt.
How can I do this? :)
(Purpose: I copied files to my MP3 player, but some of them are broken, and I want to recopy them all without spending hours digging them out of my music folder.)
The easiest way may be the following:
cat orig_filenames.txt | while read -r file ; do find /dest/directory -name "$file" ; done > output_file_with_paths
A much faster way is to run the find command only once and use fgrep:
find . -type f -print0 | fgrep -zFf ./file_with_filenames.txt | xargs -0 -J % cp % /path/to/destdir
You can use a while read loop along with find:
filecopy.sh
#!/bin/bash
while IFS= read -r line
do
    find . -iname "$line" -exec cp '{}' /where/to/put/your/files \;
done < list_of_files.txt
Where list_of_files.txt is the list of files line by line, and /where/to/put/your/files is the location you want to copy to. You can just run it like so in the directory:
$ bash filecopy.sh
+1 for #jm666's answer, but the -J option doesn't work with my flavor of xargs, so I changed it to:
find . -type f -print0 | fgrep -zFf ./file_with_filenames.txt | xargs -0 -I{} cp "{}" /path/to/destdir/