How to count all the lines of files - linux

I also need the directory name to be output as well. What I was able to do so far is output the total number of lines in all directories along with the directory name:
find . -name '*.c' | xargs wc -l | xargs -I{} dirname {} | xargs -I{} dirname {}

I have jumbled up a mixture of bash commands, mostly GNU-specific, so make sure you have them: GNU grep and GNU Awk.
find . -type f -print0 | xargs -0 grep -c ';$' | \
awk -F":" '$NF>0{cmd="dirname "$1; while ( ( cmd | getline result ) > 0 ) {printf "%s\t%s\n",result,$2} close(cmd) }'
The idea is that grep -c returns the pattern count in the format file-name:count, which I pass to GNU Awk to filter the files whose count is greater than zero, printing the directory of each matching file and the count itself.
As a fancy one-liner, as they call it these days:
find . -type f -print0 | xargs -0 grep -c ';$' | awk -F":" '$NF>0{cmd="dirname "$1; while ( ( cmd | getline result ) > 0 ) {printf "%s\t%s\n",result,$2} close(cmd) }'
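A hedged variation on the same idea, for reference: it avoids shelling out to dirname from inside awk and strips the last path component with sub() instead. This is only a sketch; like the original it assumes GNU grep and GNU Awk and that file names contain no colons or newlines.
find . -type f -print0 | xargs -0 grep -c ';$' |
awk -F':' '$NF > 0 {
    dir = $1
    sub(/\/[^\/]*$/, "", dir)          # drop the file name, keep its directory
    printf "%s\t%s\n", dir, $NF
}'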

Here is a script:
#!/usr/bin/env bash
for dir in */; do (
    cd "$dir"
    count=$(find . -name '*.c' -print0 | xargs -0 grep '[;]$' | wc -l)
    echo -e "${count}\t${dir}"
) done
If you want numbers for each sub-directory:
#!/usr/bin/env bash
for dir in $(find . -type d); do (
    cd "$dir"
    count=$(find . -maxdepth 1 -name '*.c' -print0 | \
        xargs -0 grep '[;]$' | wc -l)
    echo -e "${count}\t${dir}"
) done
Using -maxdepth 1 makes sure the calculation is only done in the current directory, not its sub-directories. So each file is counted once.
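If the goal is the plain wc -l line total per directory, as in the original attempt, a hedged one-pass sketch with GNU find and awk might look like this (it assumes paths contain no whitespace, since awk splits the wc output on spaces):
find . -name '*.c' -exec wc -l {} + | awk '
    $2 == "total" { next }              # skip the per-batch "total" lines wc prints
    {
        dir = $2                        # wc output: "<count> <path>"
        sub(/\/[^\/]*$/, "", dir)       # strip the file name, keep the directory
        lines[dir] += $1
    }
    END { for (d in lines) printf "%s\t%d\n", d, lines[d] }'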

Related

Bash command to recursively find directories with the newest file older than 3 days

I was wondering if there was a single command which would recursively find the directories whose newest file is older than 3 days. Other solutions seem to only print the newest file across all subdirectories; I was wondering if there was a way to do it recursively and print all the matching subdirectories. I tried find -newermt "aug 27, 2022" -ls, but this only gets me directories that have files newer than the date specified, not the newest file for each directory.
A long one-liner: sort files by date, get unique directory names, and list each by modification time, keeping the first entry:
find ~/.config -type f -newermt "aug 29, 2022" -print0 | xargs -r0 ls -l --time-style=+%s | sort -r -k 6 | gawk '{ print $7}' | xargs -I {} dirname {} | sort | uniq | xargs -I {} bash -c "ls -lt --time-style=full-iso {} | head -n2" | grep -v 'total '
With comments:
find ~/.config -type f -newermt "aug 29, 2022" -print0 |
xargs -r0 ls -l --time-style=+%s | sort -r -k 6 |                    # newer files sorted by reverse date
gawk '{ print $7}' | xargs -I {} dirname {} |                        # get directory names
sort | uniq |                                                        # get unique directory names
xargs -I {} bash -c "ls -lt --time-style=full-iso {} | head -n2" |   # list each directory by time, keep first
grep -v 'total '
If I'm understanding the requirements correctly, would you please try:
#!/bin/bash
find dir -type d -print0 | while IFS= read -r -d "" d; do  # traverse "dir" recursively for subdirectories, assigning each directory name to "$d"
    if [[ -n $(find "$d" -maxdepth 1 -type f) \
       && -z $(find "$d" -maxdepth 1 -type f -mtime -3) ]]; then  # if "$d" contains file(s) and does not contain files newer than 3 days
        echo "$d"                                                 # then print the directory name "$d"
    fi
done
A one-liner version:
find dir -type d -print0 | while IFS= read -r -d "" d; do if [[ -n $(find "$d" -maxdepth 1 -type f) && -z $(find "$d" -maxdepth 1 -type f -mtime -3) ]]; then echo "$d"; fi; done
Please modify the top directory name dir according to your file location.
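Along the same lines, here is a hedged sketch that computes each directory's newest mtime explicitly and compares it against a cutoff. It assumes GNU find (for -printf) and GNU date, and dir is again a placeholder for your top directory:
#!/bin/bash
cutoff=$(date -d '3 days ago' +%s)                 # epoch seconds, 3 days ago
find dir -type d -print0 | while IFS= read -r -d '' d; do
    # newest mtime among the regular files directly inside "$d"
    newest=$(find "$d" -maxdepth 1 -type f -printf '%T@\n' | sort -nr | head -n 1)
    [[ -z $newest ]] && continue                   # skip directories with no files
    (( ${newest%.*} < cutoff )) && printf '%s\n' "$d"
done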

How can I delete files over (n) days old but leave (n) files regardless of age?

I wrote the following in PHP but I was wondering if there is an elegant way to do this in a Linux shell script? Basically delete files over (n) days old, but leave the (n) newest files regardless of age.
PHP
foreach (glob("backup/*.db") as $file) {
    $a[$file] = date("Y-m-d", filemtime($file));
}
$i = 0;
arsort($a);
foreach ($a as $file => $date) {
    if ($i++ >= 10) {
        if ($date <= date("Y-m-d", strtotime("-10 days"))) {
            unlink($file);
            xmessage("PURGED: $file");
        }
    }
}
My idea was to delete with "find -mtime +(n) exec rm" but only pipe in the files that are NOT in "head -n +(n)"? But "head -n" does not seem to do what I thought it would. Thanks.
SHELL SCRIPT
find -mtime +10 | ls -t *.DB.tar.gz | head -n -10
Try this using all GNU find, sort, awk, and xargs:
find . -type f -printf '%Ts %p\0' |
sort -k1,1nr -sz |
awk -v days=10 -v cnt=10 '
BEGIN { RS=ORS="\0"; secs=systime()-(days*24*60*60) }
(NR>cnt) && ($1<secs) { print gensub(/\S+\s+/,"",1) }
' |
xargs -0 ls --
Change ls to rm when you're done testing and sure it's giving you the expected output.
If you have a version of find that cannot use -printf (f.ex. busybox), you can use this:
find ${FOLDER} -type f -exec stat -c "%y %-25n" {} \; |
sort -n |
head -n -${MIN_FILES_TO_KEEP} |
cut -d ' ' -f3- |
xargs -r -I FILE find FILE -mtime +${MIN_DAYS_TO_KEEP} |
xargs -n1 rm
I had to refactor the command to use it in docker-compose, and this was the result. It is not the most elegant or efficient solution, though.
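For completeness, here is a hedged plain-bash sketch of the same policy (keep the 10 newest backup/*.db files, and delete anything beyond those that is also older than 10 days). It assumes GNU find, sort, stat and date, bash 4+, and file names without newlines:
#!/usr/bin/env bash
keep=10
days=10
cutoff=$(date -d "-$days days" +%s)
mapfile -t files < <(
    find backup -maxdepth 1 -type f -name '*.db' -printf '%T@ %p\n' |
    sort -rn | cut -d' ' -f2-                 # newest first, timestamps stripped
)
for f in "${files[@]:keep}"; do               # everything past the $keep newest
    mtime=$(stat -c %Y -- "$f")               # GNU stat: mtime in epoch seconds
    (( mtime < cutoff )) && echo rm -- "$f"   # drop "echo" once the output looks right
done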

Get filenames sorted by mtime

How to get file names sorted by the modification timestamp descending?
I should add that file names may potentially contain any special character except \0.
Here is what I have so far: a loop that reads each file name and its mtime, but the output is unsorted:
while IFS= read -r -d '' fname; do
    read -r -d '' mtime
done < <(find . -maxdepth 3 -printf '%p\0%T@\0')
If you reorder your find printf, it becomes easy to sort:
find . -maxdepth 3 -printf '%T@ :: %p\0' |\
sort -zrn |\
sed -z 's/^[0-9.]* :: //' |\
xargs -0 -n1 echo
The sed and xargs lines are just examples of stripping out the mtime and then doing something with the filenames.
For files within the same folder, this will do:
$ ls -t
If you want to cross a tree, one of these will do depending on your variant of Linux (stat command has different syntaxes):
$ find . -type f -exec stat -c '%Y %n' {} \; | sort -nr | cut -d' ' -f2-
Or:
$ find . -type f -exec stat -f '%m %N' {} \; | sort -nr | cut -d' ' -f2-
I hope this helps.
Assuming that you want a list of files and timestamps, ordered by timestamp:
while IFS=: read mtime fname ; do
    echo "mtime = [$mtime] / fname = [$fname]"
done < <(find . -printf '%T@:%f\n' | sort -t:)
I've chosen : as the delimiter as it is quite rare as a character in filenames, being even prohibited in DOS/NTFS.
With requirements this strict (filenames that may contain : or \n as characters), you can try:
while IFS= read -r -d '' mtime; do
    read -r -d '' fname
    echo "[$mtime][$fname]"
done < <(find . -maxdepth 3 -printf '%T@\0%p\0' ) | sort -nr
Trying to solve the newlines embedded in the filenames:
while IFS= read -r -d '' mtime; do
    read -r -d '' fname
    printf "[%s][%s]\0" "$mtime" "$fname"
done < <(find . -maxdepth 3 -printf '%T@\0%p\0' ) \
    | sort -nrz | tr \\0 \\n
All you need is:
find . -maxdepth 3 -printf '%T@\t%p\0' | sort -zn
and if you want just the filenames, newline-terminated, then pipe it to awk to remove the timestamp and tab and to replace each NUL with a newline:
find . -maxdepth 3 -printf '%T@\t%p\0' | sort -zn | awk -v RS='\0' '{sub(/^[^\t]+\t/,"")}1'
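If you then want the sorted names in a bash array rather than a stream, a hedged sketch (assuming GNU find, sort and cut, plus bash 4.4+ for mapfile -d ''):
mapfile -d '' sorted < <(
    find . -maxdepth 3 -type f -printf '%T@\t%p\0' |
    sort -znr |
    cut -z -f2-                    # drop the "mtime<TAB>" prefix, keep NUL delimiters
)
printf '%q\n' "${sorted[@]}"       # %q quotes names so embedded newlines stay visible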

Linux bash sum of two numbers from two commands

I have two commands which both return numbers.
For example:
cat `find -name \*.cd -print` | wc -l
cat `find -name \*.c -print` | wc -l
Let's say that the first one returns 10, the other 5.
What would a command that returns their sum, without changing these commands, look like?
I need something like this:
cat `find -name \*.cd -print` | wc -l + cat `find -name \*.c -print` | wc -l
and it should return 15 in this case.
How can I do that?
This command will execute your two commands and print the sum of the results:
echo $(($(cat `find -name \*.cd -print` | wc -l) + $(cat `find -name \*.c -print` | wc -l)))
EDIT:
As @Karoly Horvath commented, it would be more readable if it's not a one-liner:
cd_count=$(cat `find -name \*.cd -print` | wc -l)
c_count=$(cat `find -name \*.c -print` | wc -l)
echo $(($cd_count + $c_count))
It's better to combine the two searches:
cat $(find -regex '.*\.cd?$') | wc -l
or
find -regex '.*\.cd?$' | xargs cat | wc -l
or, if your filenames can contain spaces:
find -regex '.*\.cd?$' -print0 | xargs -0 cat | wc -l
$ expr $(echo 5) + $(echo 10)
15
Just replace the echo statements with your commands.
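Another hedged option is to let awk do the addition, which sidesteps shell arithmetic entirely; this is only a sketch and inherits the same caveats as the original backtick commands (word splitting, and cat blocking if a find matches nothing):
{
    cat `find -name \*.cd -print` | wc -l     # first count
    cat `find -name \*.c -print` | wc -l      # second count
} | awk '{ sum += $1 } END { print sum }'     # add whatever numbers arrive on stdin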

Finding the number of files in a directory for all directories in pwd

I am trying to list all directories and place its number of files next to it.
I can find the total number of files with ls -lR | grep .*.mp3 | wc -l. But how can I get an output like this:
dir1 34
dir2 15
dir3 2
...
I don't mind writing to a text file or CSV to get this information if it's not possible to get it on screen.
Thank you all for any help on this.
This seems to work, assuming you are in a directory where some subdirectories may contain mp3 files. It omits the top-level directory and lists the directories in descending order by the number of contained mp3 files.
find . -mindepth 2 -name \*.mp3 -print0 | xargs -0 -n 1 dirname | sort | uniq -c | sort -r | awk '{print $2 "," $1}'
I updated this with print0 to handle filenames with spaces and other tricky characters and to print output suitable for CSV.
find . -type f -iname '*.mp3' -printf "%h\n" | sort | uniq -c
Or, if order (dir -> count instead of count -> dir) is really important to you:
find . -type f -iname '*.mp3' -printf "%h\n" | sort | uniq -c | awk '{print $2" "$1}'
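A hedged variation on the same %h idea that does the counting in awk instead, so it also copes with directory names containing spaces (it still assumes names contain no newlines):
find . -type f -iname '*.mp3' -printf '%h\n' |
awk '{ count[$0]++ } END { for (d in count) printf "%s %d\n", d, count[d] }'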
There are probably much better ways, but this seems to work.
Put this in a shell script:
#!/bin/sh
for f in *
do
    if [ -d "$f" ]
    then
        cd "$f"
        c=`ls -l *.mp3 2>/dev/null | wc -l`
        if test $c -gt 0
        then
            echo "$f $c"
        fi
        cd ..
    fi
done
With Perl:
perl -MFile::Find -le'
    find {
        wanted => sub {
            return unless /\.mp3$/i;
            ++$_{$File::Find::dir};
        }
    }, ".";
    print "$_,$_{$_}" for
        sort {
            $_{$b} <=> $_{$a}
        } keys %_;
'
Here's yet another way that even handles file names containing unusual (but legal) characters, such as newlines:
# count .mp3 files (using GNU find)
find . -xdev -type f -iname "*.mp3" -print0 | tr -dc '\0' | wc -c
# list directories with number of .mp3 files
find "$(pwd -P)" -xdev -depth -type d -exec bash -c '
    for ((i=1; i<=$#; i++ )); do
        d="${@:i:1}"
        mp3s="$(find "${d}" -xdev -type f -iname "*.mp3" -print0 | tr -dc "${0}" | wc -c )"
        [[ $mp3s -gt 0 ]] && printf "%s\n" "${d}, ${mp3s// /}"
    done
' "'\\0'" '{}' +
