How to count lines of code including sub-directories [duplicate] - linux

Suppose I want to count the lines of code in a project. If all of the files are in the same directory I can execute:
cat * | wc -l
However, if there are sub-directories, this doesn't work. For this to work cat would have to have a recursive mode. I suspect this might be a job for xargs, but I wonder if there is a more elegant solution?

First, you do not need cat to count lines; piping cat into another tool is an antipattern called Useless Use of Cat (UUoC). To count lines in files in the current directory, use wc:
wc -l *
Then the find command recurses the sub-directories:
find . -name "*.c" -exec wc -l {} \;
. is the name of the top directory to start searching from
-name "*.c" is the pattern of the file you're interested in
-exec gives a command to be executed
{} is a placeholder replaced by each file name found, which is passed to the command (here wc -l)
\; indicates the end of the command
This command produces a list of all files found with their line counts. If you want the sum for all the files found, you can use find to list the files (with the -print option) and then use xargs to pass this list as arguments to wc -l.
find . -name "*.c" -print | xargs wc -l
EDIT to address Robert Gamble's comment (thanks): if you have spaces or newlines (!) in file names, then you have to use the -print0 option instead of -print, and xargs -0 (or --null), so that the list of file names is exchanged as null-terminated strings.
find . -name "*.c" -print0 | xargs -0 wc -l
The Unix philosophy is to have tools that do one thing only, and do it well.
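For example, here is a quick way to see the difference in a throwaway directory (the file names are illustrative):

```shell
# Scratch directory with a space in one file name (names are illustrative).
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/a b.c"
printf 'three\n'    > "$dir/plain.c"

# Whitespace-split version: "a b.c" is broken into two bogus names,
# so its two lines are never counted (errors go to stderr).
find "$dir" -name '*.c' -print | xargs wc -l 2>/dev/null

# NUL-terminated version counts all three lines correctly.
find "$dir" -name '*.c' -print0 | xargs -0 wc -l

rm -r "$dir"
```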

If you want a code-golfing answer:
grep '' -R . | wc -l
The problem with just using wc -l on its own is that it can't descend into directories, and the one-liners using
find . -exec wc -l {} \;
Won't give you a total line count, because wc is run once for every file (lol!),
and
find . -exec wc -l {} +
Will get confused as soon as find hits the ~200k-character argument limit (see the footnote on limits below) and instead calls wc multiple times, each time giving you only a partial summary.
Additionally, the above grep trick will not add more than 1 line to the output when it encounters a binary file, which could be circumstantially beneficial.
For the cost of 1 extra command character, you can ignore binary files completely:
grep '' -IR . | wc -l
If you want to run line counts on binary files too:
grep '' -aR . | wc -l
Footnote on limits:
The docs are a bit vague as to whether it's a string-size limit or a number-of-tokens limit.
cd /usr/include;
find -type f -exec perl -e 'printf qq[%s => %s\n], scalar @ARGV, length join q[ ], @ARGV' {} +
# 4066 => 130974
# 3399 => 130955
# 3155 => 130978
# 2762 => 130991
# 3923 => 130959
# 3642 => 130989
# 4145 => 130993
# 4382 => 130989
# 4406 => 130973
# 4190 => 131000
# 4603 => 130988
# 3060 => 95435
This implies it's going to chunk very, very easily.

I think you're probably stuck with xargs
find -name '*php' | xargs cat | wc -l
chromakode's method gives the same result but is much, much slower. If you use xargs, your cat and wc processes can start running as soon as find starts finding.
Good explanation at Linux: xargs vs. exec {}

Try using the find command, which recurses directories by default:
find . -type f -execdir cat {} \; | wc -l

The correct way is:
find . -name "*.c" -print0 | xargs -0 cat | wc -l
You must use -print0 because there are only two invalid characters in Unix filenames: The null byte and "/" (slash). So for example "xxx\npasswd" is a valid name. In reality, you're more likely to encounter names with spaces in them, though. The commands above would count each word as a separate file.
You might also want to use "-type f" instead of -name to limit the search to files.

Using cat or grep in the solutions above is wasteful if you can use relatively recent GNU tools, including Bash:
wc -l --files0-from=<(find . -name \*.c -print0)
This handles file names with spaces, arbitrary recursion and any number of matching files, even if they exceed the command line length limit.
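A sketch of how you might use it (assumes GNU wc and bash; the *.c pattern is illustrative); the final line of wc's output carries the grand total:

```shell
# Count lines in all *.c files, then keep only the grand total
# (GNU wc for --files0-from, bash for process substitution).
wc -l --files0-from=<(find . -name '*.c' -print0) | tail -n 1
```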

wc -cl `find . -name "*.php" -type f`

I like to use find and head together for a "recursive cat" on all the files in a project directory, for example:
find . -name "*rb" -print0 | xargs -0 head -10000
The advantage is that head will add the filename and path:
==> ./recipes/default.rb <==
DOWNLOAD_DIR = '/tmp/downloads'
MYSQL_DOWNLOAD_URL = 'http://cdn.mysql.com/Downloads/MySQL-5.6/mysql-5.6.10-debian6.0-x86_64.deb'
MYSQL_DOWNLOAD_FILE = "#{DOWNLOAD_DIR}/mysql-5.6.10-debian6.0-x86_64.deb"
package "mysql-server-5.5"
...
==> ./templates/default/my.cnf.erb <==
#
# The MySQL database server configuration file.
#
...
==> ./templates/default/mysql56.sh.erb <==
PATH=/opt/mysql/server-5.6/bin:$PATH
For the complete example, please see my blog post:
http://haildata.net/2013/04/using-cat-recursively-with-nicely-formatted-output-including-headers/
Note I used 'head -10000'; clearly, if I have files over 10,000 lines this is going to truncate the output ... however, I could use head -100000. For "informal project/directory browsing" this approach works very well for me.

If you want to generate only a total line count and not a line count for each file something like:
find . -type f -exec wc -l {} \; | awk '{total += $1} END{print total}'
works well. This saves you the need to do further text filtering in a script.

Here's a Bash script that counts the lines of code in a project. It traverses a source tree recursively, and it excludes blank lines and single line comments that use "//".
# $excluded is a regex for paths to exclude from line counting
excluded="spec\|node_modules\|README\|lib\|docs\|csv\|XLS\|json\|png"
countLines(){
# $total is the total lines of code counted
total=0
# -mindepth excludes the current directory (".")
for file in `find . -mindepth 1 -name "*.*" |grep -v "$excluded"`; do
# First sed expression: delete lines commented with //
# Second sed expression: delete blank lines
# $numLines is the lines of code
numLines=`sed -e '/\/\//d' -e '/^\s*$/d' "$file" | wc -l`
total=$(($total + $numLines))
echo " " $numLines $file
done
echo " " $total in total
}
echo Source code files:
countLines
echo Unit tests:
cd spec
countLines
Here's what the output looks like for my project:
Source code files:
2 ./buildDocs.sh
24 ./countLines.sh
15 ./css/dashboard.css
53 ./data/un_population/provenance/preprocess.js
19 ./index.html
5 ./server/server.js
2 ./server/startServer.sh
24 ./SpecRunner.html
34 ./src/computeLayout.js
60 ./src/configDiff.js
18 ./src/dashboardMirror.js
37 ./src/dashboardScaffold.js
14 ./src/data.js
68 ./src/dummyVis.js
27 ./src/layout.js
28 ./src/links.js
5 ./src/main.js
52 ./src/processActions.js
86 ./src/timeline.js
73 ./src/udc.js
18 ./src/wire.js
664 in total
Unit tests:
230 ./ComputeLayoutSpec.js
134 ./ConfigDiffSpec.js
134 ./ProcessActionsSpec.js
84 ./UDCSpec.js
149 ./WireSpec.js
731 in total
Enjoy! --Curran

find . -name "*.h" -print | xargs wc -l

Related

How do we concatenate all files in Linux excluding the directories?

I am trying to get the total number of lines in all files in a directory.
I tried to do the following:
cat * | wc -l
to get the total number of lines in the directory, but it gives me a message that some of the files are directories. ('cat : some_dir: Is a directory')
How can I exclude directories when concatenating all files?
To get just the sum, you can try something like the following: get the count for each file, then sum them.
find . -type f -exec wc -l {} \; | awk '{ SUM += $1} END { print SUM }'
add -maxdepth 1, which skips scanning subdirectories
-type f filters for regular files only
Here are the test results:
$ seq 1 4 >file1
$ seq 1 5 >file2
$ cat file1
1
2
3
4
$ cat file2
1
2
3
4
5
$ find . -type f -exec wc -l {} \;
5 ./file2
4 ./file1
$ find . -type f -exec wc -l {} \; | awk '{ SUM += $1} END { print SUM }'
9
$ find . -type f -exec wc -l {} +
5 ./file2
4 ./file1
9 total
$ find . -type f -exec wc -l {} + | awk 'END{print $1}'
9
Under bash:
shopt -s globstar
wc -l **/*
From bash's man page:
globstar
If set, the pattern ** used in a pathname expansion con‐
text will match all files and zero or more directories
and subdirectories. If the pattern is followed by a /,
only directories and subdirectories match.
But beware: this will match symlinks too!
Some tricks about find
If you want to read files only:
find . -type f -exec wc -l {} +
and, for total only:
find . -type f -exec wc -l {} + | tail -n 1
Huge dirs:
The find ... -exec ... + syntax limits argument grouping to the maximum command-line length. So if your tree is really big, the previous command will generate more than one fork of wc.
-exec command {} +
This variant of the -exec action runs the specified command on
the selected files, but the command line is built by appending
each selected file name at the end; the total number of invoca‐
tions of the command will be much less than the number of
matched files. The command line is built in much the same way
that xargs builds its command lines. Only one instance of `{}'
is allowed within the command, and (when find is being invoked
from a shell) it should be quoted (for example, '{}') to protect
it from interpretation by shells. The command is executed in
the starting directory. If any invocation returns a non-zero
value as exit status, then find returns a non-zero exit status.
If find encounters an error, this can sometimes cause an immedi‐
ate exit, so some pending commands may not be run at all. This
variant of -exec always returns true.
find . -type f -exec wc -l {} + | awk 'BEGIN{t=0};/total/{t+=$1};END{print t}'
This will compute the sum of all the per-chunk "total" lines.
Alternative: using bash to summarize
For fun, as there is no real improvement:
tot=0
while read val nam;do
[ "$nam" = "total" ] && ((tot+=val))
done < <( find . -type f -exec wc -l {} + )
echo $tot

Find the longest file name in Linux

I am searching for the longest filename from my root directory to the very bottom.
I have coded a C program that will calculate the longest file name's length and its name.
However, I cannot get the shell to redirect the long list of file names to standard input for my program to receive it.
Here is what I did:
ls -Rp | grep -v / | grep -v "Permission denied" | ./home/user/findlongest
findlongest has been compiled, and I checked it in one of my IDEs to make sure it's working correctly. No run-time errors were detected so far.
How do I get the list of file names into my 'findlongest' code by redirecting stdin?
Try this:
find / -type f -printf '%f\n' 2>/dev/null | /home/user/findlongest
The 2>/dev/null will discard all data written to stderr (which is where you're seeing the 'Permission denied' messages from).
Or the following to remove the dependency on your application (from here):
find / -type f -printf '%f\n' 2>/dev/null | \
awk 'length > max_length {
max_length = length; longest_line = $0
}
END {
print length(longest_line) " " longest_line
}'
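A quick sanity check of the awk approach in a scratch directory (GNU find for -printf; the file names are made up):

```shell
dir=$(mktemp -d)
touch "$dir/short" "$dir/much-longer-name"

# Track the longest name seen; print its length and the name.
find "$dir" -type f -printf '%f\n' |
awk 'length > max_length { max_length = length; longest_line = $0 }
     END { print length(longest_line) " " longest_line }'
# prints: 16 much-longer-name

rm -r "$dir"
```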
What about
find / -type f | /home/user/findlongest
It will list all files from root with absolute path and print only those files you have permissions to list.
Based on the command:
find -exec basename '{}' ';'
which recursively prints only the filenames of all the files, starting from the directory you are in.
This bash line will provide the file with the longest name and its number of characters:
Note that the loop involved will make the process slow.
for i in $(find -exec basename '{}' ';'); do printf $i" " && echo -e -n $i | wc -c; done | sort -nk 2 | tail -1
By parts:
Prints the name of the file followed by a single space:
printf $i" "
Prints the number of characters of such file:
echo -e -n $i | wc -c
Sorts the output by number of characters and takes the longest one (the very latest):
sort -nk 2 | tail -1
All this inside a for loop to handle line by line.
The for sentence can be also changed by:
for i in $(find -type f -printf '%f\n');
As stated in @Attie's answer

merge find command output with another command output and redirect to file

I am looking to combine the output of the Linux find and head commands (to derive a list of filenames) with output of another Linux/bash command and save the result in a file such that each filename from the "find" occurs with the other command output on a separate line.
So for example,
- if a dir testdir contains files a.txt, b.txt and c.txt,
- and the output of the other command is some number say 10, the desired output I'm looking for is
10 a.txt
10 b.txt
10 c.txt
On searching here, I saw folks recommending paste for doing similar merging, but I couldn't figure out how to do it in this scenario, as paste seems to expect files. I tried
paste $(find testdir -maxdepth 1 -type f -name "*.text" | head -2) $(echo "10") > output.txt
paste: 10: No such file or directory
Would appreciate any pointers as to what I'm doing wrong. Any other ways of achieving the same thing are also welcome.
Note that if I wanted to make everything appear on the same line, I could use xargs and that does the job.
$find testdir -maxdepth 1 -type f -name "*.text" | head -2 |xargs echo "10" > output.txt
$cat output.txt
10 a.txt b.txt
But my requirement is to merge the two command outputs as shown earlier.
Thanks in advance for any help!
find can handle both the -exec and -print directives, you just need to merge the output:
$ find . -maxdepth 1 -type f -name \*.txt -exec echo hello \; -print | paste - -
hello ./b.txt
hello ./a.txt
hello ./all.txt
Assuming your "command" requires the filename (here's a very contrived example):
$ find . -maxdepth 1 -type f -name \*.txt -exec sh -c 'wc -l <"$1"' _ {} \; -print | paste - -
4 ./b.txt
4 ./a.txt
7 ./all.txt
Of course, that's executing the command for each file. To restrict myself to your question:
cmd_out=$(echo 10)
for file in *.txt; do
echo "$cmd_out $file"
done
Try this,
$ find testdir -maxdepth 1 -type f -name "*.text" | head -2 | sed 's/^/10 /' > output.txt
You can make xargs operate on one line at a time using -L1:
find testdir -maxdepth 1 -type f -name "*.text" | xargs -L1 echo "10" > output.txt
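For instance, in a scratch directory (the value 10 and the file names are illustrative):

```shell
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt"

# -L1 makes xargs run echo once per input line,
# so each file name gets its own "10 <name>" line.
find "$dir" -maxdepth 1 -type f -name '*.txt' | sort | xargs -L1 echo 10

rm -r "$dir"
```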

File with the most lines in a directory NOT bytes

I'm trying to to wc -l an entire directory and then display the filename in an echo with the number of lines.
To add to my frustration, the directory has to come from a passed argument. So without looking stupid, can someone first tell me why a simple wc -l $1 doesn't give me the line count for the directory I type in the argument? I know i'm not understanding it completely.
On top of that I need validation too, if the argument given is not a directory or there is more than one argument.
wc works on files rather than directories so, if you want the word count on all files in the directory, you would start with:
wc -l $1/*
With various gyrations to get rid of the total, sort it and extract only the largest, you could end up with something like (split across multiple lines for readability but should be entered on a single line):
pax> wc -l $1/* 2>/dev/null
| grep -v ' total$'
| sort -n -k1
| tail -1l
2892 target_dir/big_honkin_file.txt
As to the validation, you can check the number of parameters passed to your script with something like:
if [[ $# -ne 1 ]] ; then
echo 'Whoa! Wrong parameter count'
exit 1
fi
and you can check if it's a directory with:
if [[ ! -d $1 ]] ; then
echo 'Whoa!' "[$1]" 'is not a directory'
exit 1
fi
Is this what you want?
> find ./test1/ -type f|xargs wc -l
1 ./test1/firstSession_cnaiErrorFile.txt
77 ./test1/firstSession_cnaiReportFile.txt
14950 ./test1/exp.txt
1 ./test1/test1_cnaExitValue.txt
15029 total
so your directory which is the argument should go here:
find $your_complete_directory_path/ -type f|xargs wc -l
I'm trying to to wc -l an entire directory and then display the
filename in an echo with the number of lines.
You can do a find on the directory and use -exec option to trigger wc -l. Something like this:
$ find ~/Temp/perl/temp/ -exec wc -l '{}' \;
wc: /Volumes/Data/jaypalsingh/Temp/perl/temp/: read: Is a directory
11 /Volumes/Data/jaypalsingh/Temp/perl/temp//accessor1.plx
25 /Volumes/Data/jaypalsingh/Temp/perl/temp//autoincrement.pm
12 /Volumes/Data/jaypalsingh/Temp/perl/temp//bless1.plx
14 /Volumes/Data/jaypalsingh/Temp/perl/temp//bless2.plx
22 /Volumes/Data/jaypalsingh/Temp/perl/temp//classatr1.plx
27 /Volumes/Data/jaypalsingh/Temp/perl/temp//classatr2.plx
7 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee1.pm
18 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee2.pm
26 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee3.pm
12 /Volumes/Data/jaypalsingh/Temp/perl/temp//ftp.plx
14 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit1.plx
16 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit2.plx
24 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit3.plx
33 /Volumes/Data/jaypalsingh/Temp/perl/temp//persisthash.pm
Nice question!
I saw the answers. Some are pretty good. The find ...|xargs one is my favorite. It could be simplified anyway using the find ... -exec wc -l {} + syntax. But there is a problem: when the command-line buffer fills up, wc -l ... is called again, and each invocation prints its own <number> total line. As wc has no option to disable this feature, wc has to be reimplemented. Filtering these lines out with grep is not nice:
So my complete answer is
#!/usr/bin/bash
[ $# -ne 1 ] && echo "Bad number of args">&2 && exit 1
[ ! -d "$1" ] && echo "Not dir">&2 && exit 1
find "$1" -type f -exec awk '{++n[FILENAME]}END{for(i in n) printf "%8d %s\n",n[i],i}' {} +
Or using less temporary space, but a little bit larger code in awk:
find "$1" -type f -exec awk 'function pr(){printf "%8d %s\n",n,f}FNR==1{f&&pr();n=0;f=FILENAME}{++n}END{pr()}' {} +
Misc
If it should not be called for subdirectories then add -maxdepth 1 before -type to find.
It is pretty fast. I was afraid that it would be much slower than the find ... wc + version, but for a directory containing 14770 files (in several subdirs) the wc version ran in 3.8 sec and the awk version in 5.2 sec.
awk and wc treat a final line without a trailing \n differently: wc does not count it. I prefer to count it, as awk does.
It does not print empty files.
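A minimal check of the awk-based counter in a scratch directory (file names are illustrative):

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/x"
printf 'c\n'    > "$dir/y"

# awk counts lines per FILENAME, so no spurious "total" lines appear
# even if find splits the file list across several awk invocations.
find "$dir" -type f -exec awk \
  '{++n[FILENAME]} END {for (i in n) printf "%8d %s\n", n[i], i}' {} +

rm -r "$dir"
```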
To find the file with most lines in the current directory and its subdirectories, with zsh:
lines() REPLY=$(wc -l < "$REPLY")
wc -l -- **/*(D.nO+lines[1])
That defines a lines function which is going to be used as a glob sorting function that returns in $REPLY the number of lines of the file whose path is given in $REPLY.
Then we use zsh's recursive globbing **/* to find regular files (.), numerically (n) reverse sorted (O) with the lines function (+lines), and select the first one [1]. (D to include dotfiles and traverse dotdirs).
Doing it with standard utilities is a bit tricky if you don't want to make assumptions on what characters file names may contain (like newline, space...). With GNU tools as found on most Linux distributions, it's a bit easier as they can deal with NUL terminated lines:
find . -type f -exec sh -c '
for file do
size=$(wc -l < "$file") &&
printf "%s\0" "$size:$file"
done' sh {} + |
tr '\n\0' '\0\n' |
sort -rn |
head -n1 |
tr '\0' '\n'
Or with zsh or GNU bash syntax:
biggest= max=-1
find . -type f -print0 |
{
while IFS= read -rd '' file; do
size=$(wc -l < "$file") &&
((size > max)) &&
max=$size biggest=$file
done
[[ -n $biggest ]] && printf '%s\n' "$max: $biggest"
}
Here's one that works for me with the git bash (mingw32) under windows:
find . -type f -print0| xargs -0 wc -l
This will list the files and line counts in the current directory and subdirectories. You can also direct the output to a text file and import it into Excel if needed:
find . -type f -print0| xargs -0 wc -l > fileListingWithLineCount.txt

Use wc on all subdirectories to count the sum of lines

How can I count all lines of all files in all subdirectories with wc?
cd mydir
wc -l *
..
11723 total
man wc suggests wc -l --files0-from=-, but I do not know how to generate the list of all files as NUL-terminated names
find . -print | wc -l --files0-from=-
did not work.
You probably want this:
find . -type f -print0 | wc -l --files0-from=-
If you only want the total number of lines, you could use
find . -type f -exec cat {} + | wc -l
Perhaps you are looking for exec option of find.
find . -type f -exec wc -l {} \; | awk '{total += $1} END {print total}'
To count all lines for a specific file extension, you can use:
find . -name '*.fileextension' | xargs wc -l
If you want it on two or more different types of files, you can add the -o option:
find . -name '*.fileextension1' -o -name '*.fileextension2' | xargs wc -l
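One caveat: find's implicit -a binds tighter than -o, so if you combine the patterns with another test such as -type f, group them in \( ... \). A sketch (the *.c / *.h patterns are illustrative):

```shell
# Without the parentheses, -type f would apply to only one of the
# two -name branches; with them, both branches are limited to files.
find . -type f \( -name '*.c' -o -name '*.h' \) -print0 | xargs -0 wc -l
```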
Another option would be to use a recursive grep:
grep -hRc '' . | awk '{k+=$1}END{print k}'
The awk simply adds the numbers. The grep options used are:
-c, --count
Suppress normal output; instead print a count of matching lines
for each input file. With the -v, --invert-match option (see
below), count non-matching lines. (-c is specified by POSIX.)
-h, --no-filename
Suppress the prefixing of file names on output. This is the
default when there is only one file (or only standard input) to
search.
-R, --dereference-recursive
Read all files under each directory, recursively. Follow all
symbolic links, unlike -r.
The grep, therefore, counts the number of lines matching anything (''), so essentially just counts the lines.
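A quick check in a scratch directory (file names are illustrative):

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/x"
printf 'c\n'    > "$dir/y"

# grep emits one count per file (2 and 1); awk sums them.
grep -hRc '' "$dir" | awk '{k += $1} END {print k}'
# prints: 3

rm -r "$dir"
```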
I would suggest something like
find ./ -type f | xargs wc -l | cut -c 1-8 | awk '{total += $1} END {print total}'
Based on ДМИТРИЙ МАЛИКОВ's answer:
Example for counting lines of java code with formatting:
one liner
find . -name '*.java' -exec wc -l {} \; | awk '{printf ("%3d: %6d %s\n",NR,$1,$2); total += $1} END {printf ("     %6d\n",total)}'
awk part:
{
printf ("%3d: %6d %s\n",NR,$1,$2);
total += $1
}
END {
printf (" %6d\n",total)
}
example result
1: 120 ./opencv/NativeLibrary.java
2: 65 ./opencv/OsCheck.java
3: 5 ./opencv/package-info.java
190
Bit late to the game here, but wouldn't this also work? find . -type f | wc -l
This counts all lines output by the find command, i.e. the number of files found rather than the lines inside them. You can fine-tune the find to show whatever you want. I am using it to count the number of subdirectories, in one specific subdir, in a deep tree: find ./*/*/*/*/*/*/TOC -type d | wc -l . Output: 76435. (Just doing a find without all the intervening asterisks yielded an error.)
