How to use "find" and "grep" to get the file size too? - linux

I have this script:
find test -type f \( -iname \*.html -o -iname \*.htm -o -iname \*.xhtml \) -exec grep -il ".swf" {} \; -printf '%k KB - \t %p\n' > result-swf-files.csv
This searches the directory "test" (and its subdirectories) for all HTML files that contain the string ".swf", and writes a CSV file with the results.
But I want the file size on the same line. Right now the script outputs the grep result (which doesn't have the file size) on one line, and the printf result (which includes the file size) on another.
How do I add an option to grep to get the file size?

A less verbose way is to use recursive grep (if your system supports it):
grep -rl --include="*.htm*" ".swf" test | xargs ls -l | awk '{ print $9 "," $5 }'
Explanation:
Grep recursively using the "rl" flags
include only files matching the pattern "*.htm*"
search for the string ".swf" in each htm* file
search only under the "test" directory
pipe the result to xargs, where each filename becomes an argument to the "ls -l" command
Then use awk to keep only the filename and file size. The comma "," between the 9th and 5th columns in the awk print gives the CSV output.
Feel free to replace "ls -l" with human-readable variants such as "ls -lk" or "ls -lh".
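If parsing ls output feels fragile (column positions can shift with unusual file names), a variant built on stat gives the same name,size pairs; this is just a sketch, assuming GNU grep, GNU xargs (for -d) and GNU stat (for -c):
grep -rl --include='*.htm*' '.swf' test | xargs -d '\n' stat -c '%n,%s'
# -d '\n' makes xargs split on newlines only, so spaces in file names survive;
# %n prints the file name and %s the size in bytes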
Alternatively, in your script, you can keep only the second output line for each file (the one that contains the size) by piping into grep like this: grep "[0-9] [KB]"
Below is the complete command:
find . -type f \( -iname \*.html -o -iname \*.htm -o -iname \*.xhtml \) -exec grep -il ".swf" {} \; -printf '%k KB - \t %p\n'| grep "[0-9] [KB]"
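Another way to get a single output line per matching file, without the extra grep filter, is to silence grep so that only the -printf output remains; a minimal sketch based on the original command, assuming GNU find (for -printf):
find test -type f \( -iname '*.html' -o -iname '*.htm' -o -iname '*.xhtml' \) \
    -exec grep -qi '.swf' {} \; -printf '%k KB - \t %p\n' > result-swf-files.csv
# grep -qi exits with success on a match but prints nothing,
# so only the size/path line from -printf is written for matching files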

find . -name '*PATTERN*.gz' -print0 | xargs -0 ls -lh
So you get ls output for all the files you want (the quotes keep the shell from expanding the pattern before find sees it).

Related

Count all occurrences of a tag in lots of log files using grep

I need to count occurrences of a tag, for example "103=16", across lots of files, but only for the files that have one or more occurrences.
I'm using:
find . /opt/FIXLOGS/l51prdsrv\* -iname "TRADX_*oe*.log" -type f -exec grep -F 103=16 -c {} /dev/null \;
which finds the files where the tag is and shows the number of matches, but it also shows the files with 0 occurrences
returns
file1.log:0
file2.log:0
file3.log:6
file4.log:0
Using -i to exclude the 0, or grep -v :0, hasn't worked for me; I get the result:
grep: :0: No such file or directory
How can I get only the files where the count is more than 0?
Have you tried piping into grep to negate the ones with zeroes after the find/exec?
E.g., this works for me:
find . -type f -iname "TRADX_oe.log" -exec grep -cFH "103=16" {} \; | grep -v ":0"
Using awk to do everything in one place
find . -type f -iname "TRADX_oe.log" -exec awk '/103=16/{c++} END { if(c)print FILENAME, c}' {} \;
That is how the -c option of grep works:
-c, --count
Suppress normal output; instead print a count of matching lines for each input file. With the -v, --invert-match
option (see below), count non-matching lines.
So it will print the 0 counts too; the only option is to remove them with another grep using -v, or to use awk:
awk '/search_pattern/{f[FILENAME]+=1} END {for(i in f){print i":"f[i]}}' /path/to/files*
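If the log files are spread over subdirectories, the same awk script can also be driven by find; a sketch, assuming the path and file name pattern from the question:
find /opt/FIXLOGS -type f -iname 'TRADX_*oe*.log' \
    -exec awk '/103=16/{f[FILENAME]+=1} END {for(i in f){print i":"f[i]}}' {} +
# -exec ... + hands many files to a single awk run; the array is keyed by
# FILENAME, so only files with at least one match are printed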
It worked when I piped into grep after the \; to exclude the zeros with | grep -v ":0", ending up like this:
find . route -iname "TRADX_*oe*.log" -type f -exec grep -cHF "103=16" {} \; | grep -v ":0"

Output file size for all files of certain type in directory recursively?

I am trying to get the total size of all PDF files recursively within a directory. I have tried running the command below from within the directory, but the recursive part does not seem to work properly: it only appears to report on the files in the current directory and not those in the directories within. I am expecting the result to be near 100 GB; however, the command is only reporting about 200 MB of files.
find . -name "*.pdf" | xargs du -sch
Please help!
Use stat -c %n,%s to get the file name and size of the individual files. Then use awk to sum the size and print.
$ find . -name '*.pdf' -exec stat -c %n,%s {} \; | awk -F, '{sum+=$2}END{print sum}'
In fact you don't need %n, since you want only the sum:
$ find . -name '*.pdf' -exec stat -c %s {} \; | awk '{sum+=$1}END{print sum}'
You can get the sum of sizes of all files using:
find . -name '*.pdf' -exec stat -c %s {} \; | tr '\n' '+' | sed 's/+$/\n/'| bc
First, find finds all the files as specified and runs stat for each one, which prints the file size. Then I substitute every newline with a '+' using tr, replace the trailing '+' with a newline again using sed, and pass the result to bc, which prints the sum.
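With GNU find, the per-file stat calls can be avoided entirely by letting find print each size itself; a sketch assuming GNU find's -printf is available:
find . -name '*.pdf' -printf '%s\n' | awk '{sum+=$1} END {print sum}'
# %s prints each file's size in bytes on its own line, and awk adds them up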

search a string in a file with case insensitive file name

I want to grep for a string in all the files whose names match a particular pattern, case-insensitively.
For example, if I have two files ABC.txt and aBc.txt, then I want something like
grep -i 'test' *ABC*
The above command should look in both files.
You can use find and then grep on the results of that:
find . -iname "*ABC*" -exec grep -i "test" {} \;
Note that this will run grep once on each file found. If you want to pass many files to each grep invocation (find splits them into batches, so the command line length limit is not exceeded), you can use a plus at the end:
find . -iname "*ABC*" -exec grep -i "test" {} \+
You can also use xargs to process a really large number of results more efficiently:
find . -iname "*ABC*" -print0 | xargs -0 grep -i test
The -print0 makes find output 0-terminated results, and the -0 makes xargs able to deal with this format, which means you don't need to worry about any special characters in the filenames. However, it is not totally portable, since it's a GNU extension.
If you don't have a find that supports -print0 (for example SVR4), you can still use -exec as above or just
find . -iname "*ABC*" | xargs grep -i test
But you should be sure your filenames don't contain whitespace or quote characters, otherwise xargs will split them into multiple arguments.
You can use find to match the files and grep (which supports regular expressions) to search for the string you want; for your question, the command would look like this:
find . -iname "*ABC*" -exec grep \<test\> {} \;

prevent space from splitting filenames using backticks

Using find to select files to pass to another command using backticks/backquotes, I've noticed that filenames containing spaces get split, and are therefore not found.
Is it possible to avoid this behaviour? The command I issued looks like this
wc `find . -name '*.txt'`
but, for example, when there is a file named "a b c.txt" in directory x, it reports
$ wc `find . -name '*.txt'`
wc: ./x/a: No such file or directory
wc: b: No such file or directory
wc: c.txt: No such file or directory
When used with multiple files, wc shows the output for each file and a final summary line with the totals of all files; that's why I want to execute wc only once.
I tried escaping spaces with sed, but wc produces the same output (splits filenames with spaces).
wc `find . -name '*.txt' | sed 's/ /\\\ /pg'`
Use the -print0 option to find and the corresponding -0 option to xargs:
find . -name '*.txt' -print0 | xargs -0 wc
You can also use the -exec option to find:
find . -name '*.txt' -exec wc {} +
From this very similar question (should I flag my question as a duplicate?) I found another answer to this using bash's ** expansion:
wc **/*.txt
For this to work I first had to run
shopt -s globstar
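Put together, the globstar approach looks like the following in bash 4 or later; note that ** follows the shell's globbing rules rather than find's (for example, directories starting with a dot are skipped unless dotglob is also set):
shopt -s globstar      # make ** match files at any depth (bash 4+)
wc **/*.txt            # each name is passed as a separate argument, spaces intact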

how to find files containing a string using egrep

I would like to find the files containing a specific string under Linux.
I tried something like the following, but could not get it to work:
find . -name *.txt | egrep mystring
Here you are sending the file names (output of the find command) as input to egrep; you actually want to run egrep on the contents of the files.
Here are a couple of alternatives:
find . -name "*.txt" -exec egrep mystring {} \;
or even better
find . -name "*.txt" -print0 | xargs -0 egrep mystring
Check the find man page to see what the individual arguments do.
The first approach will spawn a new process for every file, while the second will pass more than one file as an argument to egrep; the -print0 and -0 flags are needed to deal with potentially nasty file names (they allow file names to be separated correctly even if a file name contains a space, for example).
try:
find . -name '*.txt' | xargs egrep mystring
There are two problems with your version:
Firstly, *.txt will be expanded by the shell before find ever runs, giving you a listing of files in the current directory which end in .txt; so for instance, if you have the following:
[dsm#localhost:~]$ ls *.txt
test.txt
[dsm#localhost:~]$
your find command will turn into find . -name test.txt. Just try the following to illustrate:
[dsm#localhost:~]$ echo find . -name *.txt
find . -name test.txt
[dsm#localhost:~]$
Secondly, egrep does not take filenames from STDIN. To convert them to arguments you need to use xargs.
find . -name *.txt | egrep mystring
That will not work, as egrep will be searching for mystring within the output generated by find . -name *.txt, which is just the paths to the *.txt files.
Instead, you can use xargs:
find . -name *.txt | xargs egrep mystring
You could use
find . -iname '*.txt' -exec egrep mystring {} \;
Here's an example that will return the file paths of all *.log files that have a line that begins with ERROR:
find . -name "*.log" -exec egrep -l '^ERROR' {} \;
There's a recursive option to egrep you can use:
egrep -R "pattern" *.log
If you only want the filenames:
find . -type f -name '*.txt' -exec egrep -l pattern {} \;
If you want filenames and matches:
find . -type f -name '*.txt' -exec egrep pattern {} /dev/null \;
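The extra /dev/null argument is there only so that egrep always sees at least two files and therefore prefixes each match with its filename; with GNU grep (and most modern greps) the -H flag has the same effect, a sketch assuming that flag is available:
find . -type f -name '*.txt' -exec egrep -H pattern {} \;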
