Counting number of files in multiple subdirectories from command line - linux

I have a directory which contains a large number of subdirectories. Each subdirectory is named something like "treedir_xxx" where xxx is a number. I would like to run a command (preferably from the command line as I have no experience with batch scripts) that will count the number of files in each subdirectory named 'treedir_xxx' and write these numbers to a text file. I feel this should not be very difficult but so far I have been unsuccessful.
I have tried things like find *treedir* -maxdepth 1 -type f | wc -l, but this just returns the total number of files rather than the number of files in each individual folder.

Instead of using find, use a for loop. I am assuming that you are using bash or similar, since that is the most common shell on modern Linux distros:
for i in treedir_*; do ls "$i" | wc -l; done
Given the following structure:
treedir_001
|__ a
|__ b
|__ c
treedir_002
|__ d
|__ e
treedir_003
|__ f
The result is:
3
2
1
You can get fancy and print whatever you want around the numbers:
for i in treedir_*; do echo $i: $(ls "$i" | wc -l); done
gives
treedir_001: 3
treedir_002: 2
treedir_003: 1
This uses $(...) to get the output of a command as a string and pass it to echo, which can then print everything on one line.
for i in treedir_*; do echo $i; ls "$i" | wc -l; done
gives
treedir_001
3
treedir_002
2
treedir_003
1
This one illustrates the use of multiple commands in a single loop.
for can be redirected to a file or piped just like any other command, so you can do
for i in treedir_*; do ls "$i" | wc -l; done > list.txt
or better yet
for i in treedir_*; do ls "$i" | wc -l; done | tee list.txt
The second version sends the output to the program tee, which prints it to standard output and also redirects it to a file. This is sometimes nicer for debugging than a simple redirect with >.
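Putting the pieces together, a small sketch (not from the original answer) that writes labelled counts to the file, reusing list.txt from above:
for i in treedir_*; do echo "$i: $(ls "$i" | wc -l)"; done | tee list.txt
Each line of list.txt then looks like treedir_001: 3.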
find is a powerful hammer, but not everything is a nail...

Related

Total number of lines in a directory

I have a directory with thousands of files (100K for now). When I use wc -l ./*, I'll get:
c1 ./test1.txt
c2 ./test2.txt
...
cn ./testn.txt
c1+c2+...+cn total
Because there are a lot of files in the directory, I just want to see the total count and not the details. Is there any way to do so?
I tried several ways and got the following error:
Argument list too long
If what you want is the total number of lines and nothing else, then I would suggest the following command:
cat * | wc -l
This concatenates the contents of all of the files in the current working directory and pipes the resulting blob of text through wc -l.
I find this to be quite elegant. Note that the command produces no extraneous output.
UPDATE:
I didn't realize your directory contained so many files. In light of this information, you should try this command:
for file in *; do cat "$file"; done | wc -l
Most people don't know that you can pipe the output of a for loop directly into another command.
Beware that this could be very slow. If you have 100,000 or so files, my guess would be around 10 minutes. This is a wild guess because it depends on several parameters that I'm not able to check.
If you need something faster, you should write your own utility in C. You could make it surprisingly fast if you use pthreads.
Hope that helps.
LAST NOTE:
If you're interested in building a custom utility, I could help you code one up. It would be a good exercise, and others might find it useful.
Credit: this builds on #lifecrisis's answer, and extends it to handle large numbers of files:
find . -maxdepth 1 -type f -exec cat {} + | wc -l
find will find all of the files in the current directory, break them into groups as large as can be passed as arguments, and run cat on the groups.
awk 'END {print NR" total"}' ./*
This would make for an interesting comparison, to find out how many lines don't end with a newline.
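For example, here is a minimal sketch (the filename is made up) showing the difference: wc -l counts newline characters, while awk counts records, including a final line that lacks a trailing newline:
printf 'a\nb\nc' > no_trailing_newline.txt
wc -l no_trailing_newline.txt                  # reports 2
awk 'END {print NR}' no_trailing_newline.txt   # reports 3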
Combining the awk and Gordon's find solutions, while avoiding the "." (hidden) files:
find ./* -maxdepth 0 -type f -exec awk 'END {print NR}' {} +
No idea if this is better or worse, but it does give a more accurate count (for me) and does not count lines in "." files. Using ./* is just a guess that appears to work.
You still need a depth limit, and ./* requires a depth of 0.
I did get the same result with the "cat" and "awk" solutions (using the same find), since "cat *" takes care of the newline issue. I don't have a directory with enough files to measure time. Interesting; I'm liking the "cat" solution.
This will give you the total line count for all the files (including hidden files) in your current directory:
$ find . -maxdepth 1 -type f | xargs wc -l | grep total
1052 total
To count lines in files excluding hidden files, use:
find . -maxdepth 1 -type f -not -path "*/\.*" | xargs wc -l | grep total
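If the directory holds so many files that xargs has to split them into several batches (each batch printing its own "total" line), or if names contain spaces, here is a sketch in the spirit of the find/cat answer above that also skips hidden files:
find . -maxdepth 1 -type f -not -name ".*" -exec cat {} + | wc -l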
(Apologies for adding this as an answer—but I do not have enough reputation for commenting.)
A comment on #lifecrisis's answer. Perhaps cat is slowing things down a bit. We could replace cat with wc -l and then use awk to add the numbers. (This could be faster since much less data needs to go through the pipe.)
That is
for file in *; do wc -l "$file"; done | awk '{sum += $1} END {print sum}'
instead of
for file in *; do cat "$file"; done | wc -l
(Disclaimer: I am not incorporating many of the improvements in other answers, but I thought the point was valid enough to write down.)
Here are my results for comparison (I ran the newer version first so that any cache effects would go against the newer candidate).
$ time for f in `seq 1 1500`; do head -c 5M </dev/urandom >myfile-$f |sed -e 's/\(................\)/\1\n/g'; done
real 0m50.360s
user 0m4.040s
sys 0m49.489s
$ time for file in myfile-*; do wc -l "$file"; done | awk '{sum += $1} END {print sum}'
30714902
real 0m3.455s
user 0m2.093s
sys 0m1.515s
$ time for file in myfile-*; do cat "$file"; done | wc -l
30714902
real 0m4.481s
user 0m2.544s
sys 0m4.312s
If you want to know only the total number of lines in the directory listing (i.e. the number of entries), excluding ls's "total" line:
ls -ltr | sed -n '/total/!p' | awk 'END {print NR}'
The command below will give the total count of lines from all files in the path:
for i in `ls -ltr | awk '$1~"^-rw"{print $9}'`; do wc -l "$i" | awk '{print $1}'; done >> /var/tmp/filelinescount.txt
cat /var/tmp/filelinescount.txt | sed -r "s/\s+//g" | tr "\n" "+" | sed "s:+$::g" | sed 's/^/"/g' | sed 's/$/"/g' | awk '{print "echo " $0 " | bc"}' | sh

Printing the number of lines

I have a directory that contains only .txt files. I want to print the number of lines for every file. When I write cat file.txt | wc -l, the number of lines appears, but when I try to do it in a script it gets more complicated. I have this code:
for fis in `ls -R $1`
do
echo `cat $fis | wc -l`
done
I tried wc -l $fis, and variations with awk and grep, but it doesn't work. It tells me:
cat: fis1: No such file or directory
0
How can I do to print the number of lines?
To find files recursively in subdirectories, use the find command, not ls -R, which is mainly intended for human reading.
find "$1" -type f -exec wc -l {} +
The problems with looping over the output of ls -R are:
Filenames with whitespace won't be parsed correctly.
It prints other output besides just the filenames.
Not the problem here, but the echo command is unnecessary:
You can use
wc -l "${fis}"
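If you do want to keep a loop of your own, one sketch (assuming bash and GNU find) is to read find's NUL-separated output instead of looping over ls -R:
find "$1" -type f -name '*.txt' -print0 |
while IFS= read -r -d '' fis; do
    wc -l "$fis"
done
This handles filenames with spaces, which the ls -R loop does not.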
What goes wrong?
You have a subdir called fis1. Look at the output of ls:
# ls -R fis1
fis1:
file1_in_fis1.txt
When you are parsing this output, your script will try
echo `cat fis1: | wc -l`
The cat will tell you No such file or directory and wc counts 0.
As #Barmar explained, ls prints additional output you do not want.
Do not try to patch your attempt with | grep .txt or if [ -f "${fis}" ]; then ..; these will still fail on filename with spaces.txt. So use find or shopt (and accept the answer of #Barmar or #Cyrus).
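For completeness, a minimal sketch of the shopt route mentioned above (bash 4 or later; ** needs the globstar option):
shopt -s globstar nullglob
for fis in "$1"/**/*.txt; do
    wc -l "$fis"
done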

Modify ls output to display [+] in front of directories

I am looking for a way to modify the ls output so that every directory displays [+] in front of the directory name. Ideally this would be done via .bashrc.
me#computer[~]$ ls
[+]directory [+]directory
[+]directory file.png
file file.txt
readme
Currently I am just customizing the color output:
LS_COLORS=$LS_COLORS:'di=1;37;4' ; export LS_COLORS
This might help you, but it gives you only one column output:
ls | sed -r "$(find -maxdepth 1 -type d | cut -d/ -f2 | sed "1 d; 2~1 { s:.*:s/^\\(&\\)$/[+]\\\\1/;:g}")"
It works by piping the output of ls through sed, where the sed script is dynamically built by a pipeline that converts a list of directories into a list of s/^dirname$/[+]dirname/; sed script lines.
Just try out all the parts individually to see how it works.
For example, when run in /etc the output starts like this:
[+]acpi
adduser.conf
[+]adobe
[+]akonadi
aliases
aliases.db
You might want to alias the command in your bashrc.
And you might want to look into the tree command.
You can use:
ls -l : directories will start with d.
ls -p : a slash will be added after directory names, like dir/
ls -F : will also add a slash after directory names, plus other marks for other file types (*, etc.)
ls -d */ : as advised in the comments, will list only directory names with a slash at the end. Remove -d to also see subdirectory contents.
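For example, with hypothetical directory contents, ls -F marks directories with a trailing slash:
$ ls -F
directory1/  directory2/  file.png  file.txt  readme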
In terms of manipulating the ls output, you could do something like:
ls -l |awk '/^d/{print "[+]"$NF}; /^[^d]/{print $NF}' |column
You can also use find and avoid parsing ls, since it has been said that parsing ls might break if file names contain strange characters like newlines.
find in this format will produce output identical to the ls version above:
find . -maxdepth 1 -printf '%Y %f\n' |awk '/^d/{print "[+]"$NF}; /^[^d]/{print $NF}' |column
You could also try this with a bash script:
#!/usr/bin/env bash
myls() {
    for i in *; do
        [[ -d "${i}" ]] && {
            printf "%s\n" "[+] ${i}"
            continue
        }
        printf "%s\n" "${i}"
    done
}
Source the script in your .bashrc file. Whenever you want to use this, just call myls in the directory.
Note that it does not give you colored output.
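For example, in a directory with hypothetical contents, calling it prints one entry per line:
$ myls
[+] directory1
[+] directory2
file.png
file.txt
readme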

Trying to delete lines beginning with a specific string from files where the file meets a target condition, in bash/linux

I am writing a bash script that will run a couple of times a minute. What I would like it to do is find all files in a specified directory that contain a specified string, then search that list of files and delete any line beginning with a different specific string (in this case it's "<meta").
Here's what I've tried so far, but these aren't working:
ls -1t /the/directory | head -10 | grep -l "qualifying string" * | sed -i '/^<meta/d' *'
ls -1t /the/directory | head -10 | grep -l "qualifying string" * | sed -i '/^<meta/d' /the/directory'
The only reason I added in the head -10 is so that every time the script runs, it will start by only looking at the 10 most recent files. I don't want it to spend a lot of time searching needlessly through the entire directory since it will be going through and removing the line many times a minute.
The script has to be run out of a different directory than the files are in. Also, would the modified date on the files change if the "<meta" string doesn't exist in the file?
There are a variety of problems with this part of the command...
ls -1t /the/directory | head -10 | grep -l "qualifying string" * ...
First, you appear to be trying to pipe the output of ls ... | head -10 into grep, which would cause grep to search for "qualifying string" in the output of ls. Except then you turn around and provide * as a command line argument to grep, causing it to search in all the files, and completely ignoring the ls and head commands.
You probably want to read about the xargs command, which reads a list of files on stdin and then runs a given command against that list. For example, you ought to be able to generate your file list like this:
ls -1t /the/directory | head -10 |
xargs grep -l "qualifying string"
And to apply sed to those files:
ls -1t /the/directory | head -10 |
xargs grep -l "qualifying string" |
xargs sed -i 's/something/else/g'
Modifying the files with sed will update the modification time on the files.
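Putting the pieces together, here is a sketch of the whole task (10 most recent files, only those containing the string, delete lines starting with <meta). It assumes GNU find and xargs, and filenames without spaces or newlines:
cd /the/directory || exit 1
find . -maxdepth 1 -type f -printf '%T@ %p\n' |  # modification time + name
    sort -rn | head -10 | cut -d' ' -f2- |       # keep the 10 most recent
    xargs -r grep -l "qualifying string" |       # only files containing the string
    xargs -r sed -i '/^<meta/d'                  # delete lines beginning with <meta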
You can use globbing with the * character to expand file names and loop through the directory.
n=0
for file in /the/directory/*; do
if [ -f "$file" ]; then
grep -q "qualifying string" "$file" && sed -i '/^<meta/d' "$file"
n=$((n+1))
fi
[ $n -eq 10 ] && break
done

How to group bash command into one function?

Here is what I am trying to achieve: I want to run a sequence of commands on each file, so for example
ls * | xargs (cat - | calculateforfile)
I want to run (cat - | calculateforfile) on each of the files separately. So basically, how do I group a list of commands as if they were one single function?
No need to use xargs. Just use a loop. You also don't need to use cat; just redirect the command's input from the file.
for A in *; do
calculateforfile < "$A"
done
As a single line:
for A in *; do calculateforfile < "$A"; done
If you're looking for an xargs solution for this (for example, with the find command):
find . -name "*.txt" | xargs -I % cat %
This will cat all the files found under the current directory that end in .txt.
The -I option is the key there.
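To feed each file to a command that reads standard input, like the calculateforfile command from the question, one sketch is to let find spawn a small shell per file:
find . -name "*.txt" -exec sh -c 'calculateforfile < "$1"' _ {} \;
Here _ fills $0 and each found file becomes $1 inside the inline sh -c script.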
