I am trying to get the total number of lines in all files in a directory.
I tried to do the following:
cat * | wc -l
to get the total number of lines in the directory, but it gives me a message that some of the files are directories ('cat: some_dir: Is a directory').
How can I exclude directories when concatenating all files?
To get just the sum, you can try something like the following: get the line count of each file and sum them.
find . -type f -exec wc -l {} \; | awk '{ SUM += $1} END { print SUM }'
Add -maxdepth 1 if you do not want find to descend into subdirectories (see the example below).
-type f filters for regular files only.
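For example, a sketch that counts only the files directly under the current directory (the one-level depth is the assumption here):
find . -maxdepth 1 -type f -exec wc -l {} \; | awk '{ SUM += $1 } END { print SUM }'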
Here are the test results:
$ seq 1 4 >file1
$ seq 1 5 >file2
$ cat file1
1
2
3
4
$ cat file2
1
2
3
4
5
$ find . -type f -exec wc -l {} \;
5 ./file2
4 ./file1
$ find . -type f -exec wc -l {} \; | awk '{ SUM += $1} END { print SUM }'
9
$ find . -type f -exec wc -l {} +
5 ./file2
4 ./file1
9 total
$ find . -type f -exec wc -l {} + | awk 'END{print $1}'
9
Under bash:
shopt -s globstar
wc -l **/*
From bash's man page:
globstar
If set, the pattern ** used in a pathname expansion context
will match all files and zero or more directories and
subdirectories. If the pattern is followed by a /, only
directories and subdirectories match.
But beware: this will follow symlinks too, and wc will still complain about any directories it is handed!
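If you want to skip directories and symlinks entirely, a small loop works (a sketch, assuming bash with globstar set as above):
total=0
for f in **/*; do
    [ -f "$f" ] && [ ! -L "$f" ] && total=$(( total + $(wc -l < "$f") ))
done
echo "$total"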
Some tricks with find
If you want to read files only:
find . -type f -exec wc -l {} +
and, for total only:
find . -type f -exec wc -l {} + | tail -n 1
Huge dirs:
The find ... -exec ... + syntax groups arguments only up to the maximum command-line length. So if your tree is really big, the previous command will fork wc more than once, and each invocation prints its own total.
-exec command {} +
This variant of the -exec action runs the specified command on
the selected files, but the command line is built by appending
each selected file name at the end; the total number of
invocations of the command will be much less than the number of
matched files. The command line is built in much the same way
that xargs builds its command lines. Only one instance of `{}'
is allowed within the command, and (when find is being invoked
from a shell) it should be quoted (for example, '{}') to protect
it from interpretation by shells. The command is executed in
the starting directory. If any invocation returns a non-zero
value as exit status, then find returns a non-zero exit status.
If find encounters an error, this can sometimes cause an
immediate exit, so some pending commands may not be run at all.
This variant of -exec always returns true.
find . -type f -exec wc -l {} + | awk 'BEGIN{t=0};/total/{t+=$1};END{print t}'
This will compute the sum of the total lines, even when wc runs more than once.
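As an aside, you can check the limit that drives this grouping on your system; the exact value is platform-dependent:
getconf ARG_MAX    # maximum bytes of arguments plus environment for exec()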
Alternative: using bash to summarize
For fun, as there is no real improvement:
tot=0
while read -r val nam; do
    # accumulate only the per-chunk "total" lines from wc
    [ "$nam" = "total" ] && (( tot += val ))
done < <( find . -type f -exec wc -l {} + )
echo $tot
Related
I have a folder and I want count all regular files in it, and for this I use this bash command:
find pathfolder -type f 2> err.txt | wc -l
In the folder there are 3 empty text files and a subfolder with inside it other text files.
For this reason I should get 3 as a result, but I get 6, and I don't understand why. Maybe there are some options that I did not set.
If I remove the subfolder, I get 4 as a result.
To grab all the files and directories in the current directory, excluding dot files:
shopt -u dotglob
all=(*)
To grab only directories:
dirs=(*/)
To count only non-dot files in current directory:
echo $(( ${#all[@]} - ${#dirs[@]} ))
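One caveat: if a glob matches nothing (for example, no subdirectories exist), it stays as a literal string and still counts as one element. Setting nullglob avoids that (a sketch):
shopt -s nullglob   # unmatched globs expand to nothing
shopt -u dotglob    # keep dot files excluded (the default)
all=(*)             # files and directories
dirs=(*/)           # directories only
echo $(( ${#all[@]} - ${#dirs[@]} ))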
To do this with find use:
find . -maxdepth 1 -type f ! -name '.*' -exec printf '%.0s.\n' {} + | wc -l
The solutions below ignore filenames starting with a dot.
To count the files in pathfolder only:
find pathfolder -maxdepth 1 -type f -not -path '*/\.*' | wc -l
To count the files in ALL child directories of pathfolder:
find pathfolder -mindepth 2 -maxdepth 2 -type f -not -path '*/\.*' | wc -l
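If you instead want a per-subdirectory breakdown, a loop such as the following works (a sketch; it inherits the newline-in-filename caveat discussed just below):
for d in pathfolder/*/; do
    printf '%s %s\n' "$(find "$d" -maxdepth 1 -type f -not -name '.*' | wc -l)" "$d"
done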
UPDATE: Converting comments into an answer
As anubhava suggested, if you create a dummy file with a newline in its name using the command touch $'foo\nbar', then wc -l counts that filename twice, as in the example below:
$> touch $'foo\nbar'
$> find . -type f
./foo?bar
$> find . -type f | wc -l
2
To avoid this, get rid of the newlines before calling wc (anubhava's solution):
$> find . -type f -exec printf '%.0sbla\n' {} +
bla
$> find . -type f -exec printf '%.0sbla\n' {} + | wc -l
1
or avoid calling wc at all:
$> find . -type f -exec sh -c 'i=0; for f; do i=$((i+1)); done; echo $i' sh {} +
1
I am looking to combine the output of the Linux find and head commands (to derive a list of filenames) with the output of another Linux/bash command, and save the result in a file such that each filename from find appears with the other command's output on a separate line.
So for example,
- if a dir testdir contains files a.txt, b.txt and c.txt,
- and the output of the other command is some number say 10, the desired output I'm looking for is
10 a.txt
10 b.txt
10 c.txt
On searching here, I saw folks recommending paste for similar merging, but I couldn't figure out how to do it in this scenario, as paste seems to expect files. I tried
paste $(find testdir -maxdepth 1 -type f -name "*.text" | head -2) $(echo "10") > output.txt
paste: 10: No such file or directory
Would appreciate any pointers as to what I'm doing wrong. Any other ways of achieving the same thing are also welcome.
Note that if I wanted to make everything appear on the same line, I could use xargs and that does the job.
$ find testdir -maxdepth 1 -type f -name "*.text" | head -2 | xargs echo "10" > output.txt
$ cat output.txt
10 a.txt b.txt
But my requirement is to merge the two command outputs as shown earlier.
Thanks in advance for any help!
find can handle both the -exec and -print directives; you just need to merge the output:
$ find . -maxdepth 1 -type f -name \*.txt -exec echo hello \; -print | paste - -
hello ./b.txt
hello ./a.txt
hello ./all.txt
Assuming your "command" requires the filename (here's a very contrived example):
$ find . -maxdepth 1 -type f -name \*.txt -exec sh -c 'wc -l <"$1"' _ {} \; -print | paste - -
4 ./b.txt
4 ./a.txt
7 ./all.txt
Of course, that's executing the command for each file. To restrict myself to your question:
cmd_out=$(echo 10)
for file in *.txt; do
echo "$cmd_out $file"
done
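If you do need the find | head selection from your attempt, a while read loop keeps one filename per line (a sketch; cmd_out again stands in for your other command):
cmd_out=$(echo 10)
find testdir -maxdepth 1 -type f -name '*.text' | head -2 | while read -r file; do
    echo "$cmd_out $file"
done > output.txt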
Try this:
find testdir -maxdepth 1 -type f -name "*.text" | head -2 | sed 's/^/10 /' > output.txt
You can make xargs operate on one line at a time using -L1:
find testdir -maxdepth 1 -type f -name "*.text" | xargs -L1 echo "10" > output.txt
How can I count all lines of all files in all subdirectories with wc?
cd mydir
wc -l *
..
11723 total
man wc suggests wc -l --files0-from=-, but I do not know how to generate the list of all files as NUL-terminated names
find . -print | wc -l --files0-from=-
did not work.
You probably want this:
find . -type f -print0 | wc -l --files0-from=-
If you only want the total number of lines, you could use
find . -type f -exec cat {} + | wc -l
Perhaps you are looking for the -exec option of find.
find . -type f -exec wc -l {} \; | awk '{total += $1} END {print total}'
To count all lines for a specific file extension, you can use:
find . -name '*.fileextension' | xargs wc -l
If you want it for two or more different types of files, you can add the -o option:
find . -name '*.fileextension1' -o -name '*.fileextension2' | xargs wc -l
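If the filenames may contain spaces, a null-delimited variant is safer (a sketch; note the parentheses grouping the -o expression):
find . \( -name '*.fileextension1' -o -name '*.fileextension2' \) -print0 | xargs -0 wc -l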
Another option would be to use a recursive grep:
grep -hRc '' . | awk '{k+=$1}END{print k}'
The awk simply adds the numbers. The grep options used are:
-c, --count
Suppress normal output; instead print a count of matching lines
for each input file. With the -v, --invert-match option (see
below), count non-matching lines. (-c is specified by POSIX.)
-h, --no-filename
Suppress the prefixing of file names on output. This is the
default when there is only one file (or only standard input) to
search.
-R, --dereference-recursive
Read all files under each directory, recursively. Follow all
symbolic links, unlike -r.
The grep, therefore, counts the number of lines matching anything (''), so it essentially just counts the lines.
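To convince yourself that grep -c '' agrees with wc -l on a single file (demo.txt is just a throwaway example, and the file must end with a newline):
$ printf 'a\nb\nc\n' > demo.txt
$ grep -c '' demo.txt
3
$ wc -l < demo.txt
3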
I would suggest something like the following; the awk skips the "total" lines that wc prints once per xargs batch, so they are not counted twice:
find ./ -type f | xargs wc -l | awk '$2 != "total" {total += $1} END {print total}'
Based on ДМИТРИЙ МАЛИКОВ's answer:
Example for counting lines of java code with formatting:
one liner
find . -name '*.java' -exec wc -l {} \; | awk '{printf ("%3d: %6d %s\n",NR,$1,$2); total += $1} END {printf (" %6d\n",total)}'
awk part:
{
printf ("%3d: %6d %s\n",NR,$1,$2);
total += $1
}
END {
printf (" %6d\n",total)
}
example result
1: 120 ./opencv/NativeLibrary.java
2: 65 ./opencv/OsCheck.java
3: 5 ./opencv/package-info.java
190
Bit late to the game here, but wouldn't this also work? find . -type f | wc -l
This counts all lines output by the 'find' command. You can fine-tune 'find' to show whatever you want. I am using it to count the number of subdirectories of one specific subdir in a deep tree: find ./*/*/*/*/*/*/TOC -type d | wc -l. Output: 76435. (Just doing a find without all the intervening asterisks yielded an error.)
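A glob-free way to pin the same depth with find options (a sketch; depth 7 corresponds to the six asterisks plus TOC, and unlike the glob, find will also descend into dot-directories):
find . -mindepth 7 -maxdepth 7 -type d -name TOC | wc -l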
I need to write a script for a web server that will clean out files/folders older than 14 days, but keep the last 7 files/directories. I've been doing my research so far and here is what I came up with (I know the syntax and commands are incorrect but just so you get an idea):
ls -ldt /data/deployments/product/website.com/*/ | tail -n +8 | xargs find /data/deployments/product/website.com/ -type f -type d -mtime +14 -exec rm -R {} \;
This is my thought process as to how the script should behave (I'm more a windows batch guy):
List the directory contents
If contents is less than or equal to 7, goto END
If contents is > 7 goto CLEAN
:CLEAN
ls -ldt /data/deployments/product/website.com/*/
keep last 7 entries (tail -n +8)
output of that "tail" -> find -type f -type d (both files and directories) -mtime +14 (not older than 14 days) -exec rm -R (delete)
I've seen a bunch of examples, using xargs and sed but I just can't figure out how to put it all together.
#!/bin/bash
find you_dir -mindepth 1 -maxdepth 1 -printf "%T# %p\n" | \
sort -nrk1,1 |sed '1,7d' | cut -d' ' -f2 | \
xargs -n1 -I fname \
find fname -maxdepth 0 -mtime +14 -exec echo rm -rf {} \;
Remove the echo if you're happy with the output...
Explanation (line-by-line):
find exactly in your_dir and print seconds_since_Unix_epoch (%T@) plus the file/dir name, one entry per line
sort by the first field (seconds_since_Unix_epoch) descending, throw the first seven lines away, and from the rest extract just the name (the second field)
xargs passes on to new find process argument-by-argument (-n1) and uses fname to represent argument
-maxdepth 0 limits find to just fname
You could store the minNrOfFiles and the ageLimit in Bash-Variables or pass in to the script with just few changes:
minNrOfFiles=7 # or $1
ageLimit=14 # or $2
change: sed '1,'"$minNrOfFiles"'d' and -mtime +"$ageLimit"
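Putting that together (a sketch; your_dir is a placeholder, the echo keeps it a dry run until you remove it, and -f2- keeps names containing spaces):
#!/bin/bash
minNrOfFiles=${1:-7}   # keep at least this many newest entries
ageLimit=${2:-14}      # only delete entries older than this many days
find your_dir -mindepth 1 -maxdepth 1 -printf "%T@ %p\n" | \
    sort -nrk1,1 | sed '1,'"$minNrOfFiles"'d' | cut -d' ' -f2- | \
    xargs -n1 -I fname \
    find fname -maxdepth 0 -mtime +"$ageLimit" -exec echo rm -rf {} \;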
Suppose I want to count the lines of code in a project. If all of the files are in the same directory I can execute:
cat * | wc -l
However, if there are sub-directories, this doesn't work. For this to work cat would have to have a recursive mode. I suspect this might be a job for xargs, but I wonder if there is a more elegant solution?
First, you do not need to use cat to count lines; this is an antipattern called Useless Use of Cat (UUoC). To count lines in files in the current directory, use wc:
wc -l *
Then the find command recurses the sub-directories:
find . -name "*.c" -exec wc -l {} \;
. is the name of the top directory to start searching from
-name "*.c" is the pattern of the file you're interested in
-exec gives a command to be executed
{} is the result of the find command to be passed to the command (here wc -l)
\; indicates the end of the command
This command produces a list of all files found with their line counts. If you want the sum for all the files found, you can use find to list the files (with the -print option) and then use xargs to pass this list as arguments to wc -l.
find . -name "*.c" -print | xargs wc -l
EDIT to address Robert Gamble's comment (thanks): if you have spaces or newlines (!) in file names, then you have to use the -print0 option instead of -print, and xargs -0, so that the list of file names is exchanged as null-terminated strings.
find . -name "*.c" -print0 | xargs -0 wc -l
The Unix philosophy is to have tools that do one thing only, and do it well.
If you want a code-golfing answer:
grep '' -R . | wc -l
The problem with just using wc -l on its own is that it can't descend into subdirectories, and the one-liners using
find . -exec wc -l {} \;
won't give you a total line count, because they run wc once for every file (lol!)
and
find . -exec wc -l {} +
will get confused as soon as find hits the roughly 200k-character argument limit (see the footnote on limits below) and instead calls wc multiple times, each time only giving you a partial summary.
Additionally, the above grep trick will not add more than 1 line to the output when it encounters a binary file, which could be circumstantially beneficial.
For the cost of 1 extra command character, you can ignore binary files completely:
grep '' -IR . | wc -l
If you want to run line counts on binary files too
grep '' -aR . | wc -l
Footnote on limits:
The docs are a bit vague as to whether it's a string-size limit or a number-of-tokens limit.
cd /usr/include;
find -type f -exec perl -e 'printf qq[%s => %s\n], scalar @ARGV, length join q[ ], @ARGV' {} +
# 4066 => 130974
# 3399 => 130955
# 3155 => 130978
# 2762 => 130991
# 3923 => 130959
# 3642 => 130989
# 4145 => 130993
# 4382 => 130989
# 4406 => 130973
# 4190 => 131000
# 4603 => 130988
# 3060 => 95435
This implies it's going to chunk very, very easily.
I think you're probably stuck with xargs:
find -name '*php' | xargs cat | wc -l
chromakode's method gives the same result but is much, much slower. If you use xargs, your cat-ing and wc-ing can start as soon as find starts finding.
Good explanation at Linux: xargs vs. exec {}
Try using the find command, which recurses directories by default:
find . -type f -execdir cat {} \; | wc -l
The correct way is:
find . -name "*.c" -print0 | xargs -0 cat | wc -l
You must use -print0 because there are only two invalid characters in Unix filenames: the null byte and "/" (slash). So, for example, "xxx\npasswd" is a valid name. In reality, you're more likely to encounter names with spaces in them, though; without -print0, the commands above would count each word as a separate file.
You might also want to add "-type f" alongside -name to limit the search to regular files.
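Combining the two suggestions (a sketch):
find . -name "*.c" -type f -print0 | xargs -0 cat | wc -l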
Using cat or grep in the solutions above is wasteful if you can use relatively recent GNU tools, including Bash:
wc -l --files0-from=<(find . -name \*.c -print0)
This handles file names with spaces, arbitrary recursion and any number of matching files, even if they exceed the command line length limit.
wc -cl `find . -name "*.php" -type f`
I like to use find and head together for "a recursively cat" on all the files in a project directory, for example:
find . -name "*rb" -print0 | xargs -0 head -10000
The advantage is that head will add the filename and path:
==> ./recipes/default.rb <==
DOWNLOAD_DIR = '/tmp/downloads'
MYSQL_DOWNLOAD_URL = 'http://cdn.mysql.com/Downloads/MySQL-5.6/mysql-5.6.10-debian6.0-x86_64.deb'
MYSQL_DOWNLOAD_FILE = "#{DOWNLOAD_DIR}/mysql-5.6.10-debian6.0-x86_64.deb"
package "mysql-server-5.5"
...
==> ./templates/default/my.cnf.erb <==
#
# The MySQL database server configuration file.
#
...
==> ./templates/default/mysql56.sh.erb <==
PATH=/opt/mysql/server-5.6/bin:$PATH
For the complete example, please see my blog post:
http://haildata.net/2013/04/using-cat-recursively-with-nicely-formatted-output-including-headers/
Note I used 'head -10000'; clearly, if I have files with over 10,000 lines this is going to truncate the output... I could use head -100000 instead, but for "informal project/directory browsing" this approach works very well for me.
If you want to generate only a total line count and not a line count for each file something like:
find . -type f -exec wc -l {} \; | awk '{total += $1} END{print total}'
works well. This saves you the need to do further text filtering in a script.
Here's a Bash script that counts the lines of code in a project. It traverses a source tree recursively, and it excludes blank lines and single line comments that use "//".
# $excluded is a regex for paths to exclude from line counting
excluded="spec\|node_modules\|README\|lib\|docs\|csv\|XLS\|json\|png"

countLines(){
  # $total is the total lines of code counted
  total=0
  # -mindepth excludes the current directory (".")
  for file in `find . -mindepth 1 -name "*.*" | grep -v "$excluded"`; do
    # First sed: only count lines of code that are not commented with //
    # Second sed: don't count blank lines
    # $numLines is the lines of code
    numLines=`sed '/\/\//d' "$file" | sed '/^\s*$/d' | wc -l`
    total=$(($total + $numLines))
    echo "  " $numLines $file
  done
  echo "  " $total in total
}
echo Source code files:
countLines
echo Unit tests:
cd spec
countLines
Here's what the output looks like for my project:
Source code files:
2 ./buildDocs.sh
24 ./countLines.sh
15 ./css/dashboard.css
53 ./data/un_population/provenance/preprocess.js
19 ./index.html
5 ./server/server.js
2 ./server/startServer.sh
24 ./SpecRunner.html
34 ./src/computeLayout.js
60 ./src/configDiff.js
18 ./src/dashboardMirror.js
37 ./src/dashboardScaffold.js
14 ./src/data.js
68 ./src/dummyVis.js
27 ./src/layout.js
28 ./src/links.js
5 ./src/main.js
52 ./src/processActions.js
86 ./src/timeline.js
73 ./src/udc.js
18 ./src/wire.js
664 in total
Unit tests:
230 ./ComputeLayoutSpec.js
134 ./ConfigDiffSpec.js
134 ./ProcessActionsSpec.js
84 ./UDCSpec.js
149 ./WireSpec.js
731 in total
Enjoy! --Curran
find . -name "*.h" -print | xargs wc -l