Count files in a directory with filename matching a string - linux

The command:
ls /some/path/some/dir/ | grep some_mask_*.txt | wc -l
returns the correct number of files when run via ssh in bash. When I put this into a .sh script
iFiles=`ls /some/path/some/dir/ | grep some_mask_*.txt | wc -l`
echo "iFiles: ${iFiles}"
it is always 0. What's wrong here?
Solution:
While working on it I found that my wildcard mask was the problem. Using grep some_mask_ | grep \.txt instead of the single grep above solved the problem for now.
I marked as the solution the answer that pretty much describes exactly what I did wrong. I'm going to edit my script now. Thanks everyone.

The problem here is that some_mask_*.txt is expanded by the shell and not by grep, so most likely you have a file in the directory where grep is executed which matches some_mask_*.txt, and that filename is then used by grep as the filter.
If you want to ensure that the pattern is used by grep, you need to enclose it in single quotes. In addition, you need to write the pattern as a regexp and not as a wildcard match (which is what bash uses for matching). Putting this together, your command-line version should be:
ls /some/path/some/dir/ | grep 'some_mask_.*\.txt' | wc -l
and the script:
iFiles=`ls /some/path/some/dir/ | grep 'some_mask_.*\.txt' | wc -l`
echo "iFiles: ${iFiles}"
Note that . needs to be prefixed with a backslash since it has special significance as a regexp that matches a single character.
I would also suggest that you suffix the regexp with $ in order to anchor it to the end (thus ensuring that the regexp only matches filenames that end with ".txt"):
ls /some/path/some/dir/ | grep 'some_mask_.*\.txt$' | wc -l
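To see the expansion for yourself, prefix the pipeline's grep with echo (a hypothetical illustration; the result depends on what happens to be in your current directory):
# In a directory that happens to contain a file named some_mask_example.txt:
echo grep some_mask_*.txt
# prints: grep some_mask_example.txt
# i.e. grep would filter the ls output for that one literal filename.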

Parsing ls is not a good thing. If you want to find files, use find:
find /some/path/some/dir/ -maxdepth 1 -name "some_mask_*.txt" -print0
This will print the files matching the condition within that directory, without descending into subdirectories. Using -print0 prevents problems when a file name contains unusual characters:
-print0
True; print the full file name on the standard output, followed by a null character (instead of the newline character that -print uses). This allows file names that contain newlines or other types of white space to be correctly interpreted by programs that process the find output. This option corresponds to the -0 option of xargs.
Then count the matches. Note that -print0 terminates each name with a NUL character instead of a newline, so piping straight into wc -l (which counts newlines) would report 0; count the NUL terminators instead.
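A minimal sketch of that count, using tr to keep only the NUL bytes and wc -c to count them:
find /some/path/some/dir/ -maxdepth 1 -name "some_mask_*.txt" -print0 | tr -cd '\0' | wc -c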
By the way, note that
ls /some/path/some/dir/ | grep some_mask_*.txt
can be reduced to a simple
ls /some/path/some/dir/some_mask_*.txt
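If all you need is the count, a bash glob stored in an array avoids parsing ls entirely (a sketch; shopt -s nullglob makes the array empty, and the count 0, when nothing matches):
shopt -s nullglob
files=(/some/path/some/dir/some_mask_*.txt)   # expand the glob into an array
echo "iFiles: ${#files[@]}"                   # number of array elements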

A simple solution (for bash) is:
find -name "*pattern*" | wc -l
"*" matches anything (as a prefix, anything before the pattern; as a suffix, anything after it).
wc -l gives the count.
find -name finds files whose names match the quoted pattern (GNU find defaults to the current directory when no path is given).

I suggest using find as shown below. The reason is that filenames may contain newlines, which would break a script that uses wc -l. Instead, I print just a dot per filename and count the dots with wc -c:
find /some/path/some/dir/ -maxdepth 1 -name 'some_mask_*.txt' -printf '.' | wc -c
or, if you want to store the result in a variable:
ifiles=$(find /some/path/some/dir/ -maxdepth 1 -name 'some_mask_*.txt' -printf '.' | wc -c)

Try this:
iFiles=$(ls /some/path/some/dir/ | grep 'some_mask_.*\.txt' | wc -l)
echo "iFiles: ${iFiles}"

I don't think this is a shell version problem.
Try using an escape character in your command, like below:
ls /some/path/some/dir/ | grep some_mask_\*.txt | wc -l

Your problem is due to shell expansion. You probably tested the command line in the original directory, but if you try it from another directory then it will not work anymore.
When you type:
grep *.txt
then the shell replaces *.txt with all the file names that match the pattern and then executes the command (something like grep a.txt dummy.txt). But you want the pattern to be interpreted by grep, not expanded by the shell, so:
ls /tmp | grep '.*\.cpp'
will do it. Here the pattern is in the syntax of the grep command (each command has its own syntax) and is not expanded, because it is protected by the surrounding single quotes.
Modify your command like:
a=`ls /tmp | grep '.*\.cpp'`

This is quite similar to the other answers, but with a bit more robustness:
iFiles=$( find /some/path/ -name "some_mask_*.txt" -type f 2> /dev/null | wc -l )
echo "Number of files: $iFiles"
This limits find to regular files and also redirects stderr to /dev/null, so if the find command fails or hits permission problems, the error messages don't clutter the output.

I was writing a shell script to count the files of the same format in a directory. For that I used the commands below:
LOCATION=/home/students/run_date/FILENAME                   # store the location in a variable
DIRECTORYCOUNT=$(find $LOCATION -type d -print | wc -l)     # count directories
FILECOUNT=$(find $LOCATION -type f -print | wc -l)          # count files
I used the above commands and they worked well.

Related

How to count the number of files whose name contains a vowel

I was trying to code a script that counts the number of files with a vowel in a directory.
If I use
find $1 -type f | wc -l
I get the number of files in the directory $1, but I do not know how to use grep to count just the ones with a vowel. I was trying something like this:
find $1 -type f | grep -l '[a,e,i,o,u,A,E,I,O,U]' | wc -l
You can use this GNU find command to count all the files with at least one vowel:
find . -maxdepth 1 -type f -iname '*[aeiou]*' -printf ".\n" | wc -l
The -iname '*[aeiou]*' glob pattern matches only filenames containing at least one of a, e, i, o, u (ignoring case).
Remove -maxdepth 1 if you want to count files in subdirectories as well.
If you can accept counting directories:
ls -d *a* *e* *i* *o* *u* *y* *A* *E* *I* *O* *U* *Y* | wc -l
Otherwise:
find $1 -type f | grep -i '[aeiouy]' | wc -l
Your attempt fails for two reasons. First, -l does not make sense when grep is reading from a pipeline, since the purpose of -l is to print only the names of input files that matched, and here the only input is stdin. Second, your syntax is wrong. Try:
... | grep -i '[aeiou]' | ...
Please don't use commas in a character group expression (the thing in [] brackets).
The best way is to first do a find(1) to get the files you want to scan. Then you need the base names, since the directory part of the path could itself contain vowels and give false matches. Finally, grep with [aeiouAEIOU] to keep only the lines with a vowel, and use wc(1) to count lines.
find ${DIRECTORY} -type f -print | sed -e 's#^.*/##' | grep '[aeiouAEIOU]' | wc -l
-type f selects just files (not directories). The sed(1) command edits the output line by line, removing everything up to the last / character. The grep keeps names with at least one vowel and discards the others, and finally wc -l counts the lines.
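With GNU find, the sed step can be dropped, since -printf '%f\n' prints just the base name; a sketch of that variant (grep -c replaces the grep | wc -l pair):
find ${DIRECTORY} -type f -printf '%f\n' | grep -c '[aeiouAEIOU]'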

Script /bin/bash argument too long and grep section not working in find

I'm trying to write a simple script that will find if a field in a file is blank, e.g.
x_Field=""
find /mnt/sdb1/*/*/ -name 'files.txt' -type f -follow -print0 {} \; | xargs -0 grep -o -P '(?<=x_Field).*(?=y_Field)' | cut -c 3 | awk '{sub(/..$/,"")}1'
I think the command works without the find, but not with it:
grep -o -P '(?<=x_Field).*(?=y_Field)' | cut -c 3 | awk '{sub(/..$/,"")}1'
Also, when I get this half working, it seems I have too many files to scan, so it gives an "argument list too long" error :-(
Sorry, I should also add that I need to go through hundreds of subfolders, hence the wildcards.
Your find command has a number of errors.
Apparently, /mnt/sdb1/*/* expands to a list which is too long for your shell. You can replace that with /mnt/sdb1 -mindepth 2 (assuming you want to avoid finding anything in the directories immediately below sdb1).
The {} \; would be useful if you had an -exec option, but you don't.
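Putting those two fixes together, a hedged sketch of the corrected find invocation (keeping the OP's grep stage exactly as written):
find /mnt/sdb1 -mindepth 2 -type f -follow -name 'files.txt' -print0 | xargs -0 grep -o -P '(?<=x_Field).*(?=y_Field)'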
Also, the grep | cut | awk can probably be refactored into a single Awk script, but without properly understanding what it's supposed to accomplish, it's hard to write a replacement.

Use grep to find text, if found in file echo another string inside that file

I am wondering if there is a one-liner for what I am looking for, or if I have to write a bash script. I want to search recursively in a directory; if the string I am looking for is found, then search that same file for another string and print it to the screen. In this example, I want to find out if the score of the "X-Spam-Status" header is within a range in the email, and if it is, print out the sender or the subject header of that email.
Example:
The command I am using is:
grep 'X-Spam-Status: .* score=[5-9]\.' /var/email/example.com/example/cur/* | wc -l
Here is the header that I need to locate this information in:
X-Spam-Status: No, score=6.5 required=5.0 tests=HTML_MESSAGE,
RCVD_IN_DNSWL_NONE,T_DKIM_INVALID,URIBL_BLOCKED autolearn=ham version=3.3.2
If grep finds a match in the header above, find and echo this header from the same email:
From: "From the Desk of Allen Watson" <FromtheDeskofAllenWatson#emadest.eu>
Subject: Don't Live in Fear of Loud Noises
It can be either the subject or the from. It does not need to be both.
This solution first uses find, which filters the files.
To make sure spaces in filenames are handled correctly, the -print0 switch is used to create a list of null-terminated filenames.
This list is used by xargs to pass the arguments to grep, which outputs null-terminated lines too. You can chain as many xargs/grep combinations as you like.
The last command in the pipe drops the -Z so the output is readable. In this case I just used head -2 to output the first two lines of each file.
function grep2strings_recursive() {
    if [ '0' = "$#" ]; then
        echo "Usage: $FUNCNAME <dir> <string1> <string2>"
        return
    fi
    find "$1" -type f -print0 | xargs -0 grep -lZ "$2" | xargs -0 grep -lZ "$3" | xargs -0 head -2
}
grep2strings_recursive '/var/email/example.com/example/cur' 'X-Spam-Status: .* score=[5-9]\.' 'From'
It is important to use the null output terminators (-print0 and -Z) and the null input terminator (-0) to ensure correct behaviour in the presence of space/tab/newline characters in filenames.
Something like this?
find /path/to/files -type f | xargs -i grep "searchterm" {} | grep -o "othersearchterm"
i.e.
find /path/to/files -type f | xargs -i grep "X-Spam-Status" {} | egrep -o '(From|Subject).*'
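Note that -i is deprecated in GNU xargs in favour of the explicit -I, so an equivalent modern spelling would be:
find /path/to/files -type f | xargs -I{} grep "X-Spam-Status" {} | egrep -o '(From|Subject).*'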

Unix Command to List files containing string but *NOT* containing another string

How do I recursively view a list of files that has one string and specifically doesn't have another string? Also, I mean to evaluate the text of the files, not the filenames.
Conclusion:
As per comments, I ended up using:
find . -name "*.html" -exec grep -lR 'base\-maps' {} \; | xargs grep -L 'base\-maps\-bot'
This returned files with "base-maps" and not "base-maps-bot". Thank you!!
Try this:
grep -rl <string-to-match> | xargs grep -L <string-not-to-match>
Explanation: grep -lr makes grep recursively (r) output a list (l) of all files that contain <string-to-match>. xargs loops over these files, calling grep -L on each one of them. grep -L will only output the filename when the file does not contain <string-not-to-match>.
The use of xargs in the answers above is not necessary; you can achieve the same thing like this:
find . -type f -exec grep -q <string-to-match> {} \; -not -exec grep -q <string-not-to-match> {} \; -print
grep -q means run quietly but return an exit code indicating whether a match was found; find can then use that exit code to determine whether to keep evaluating the rest of its options. If -exec grep -q <string-to-match> {} \; returns 0, it goes on to evaluate -not -exec grep -q <string-not-to-match> {} \;. If that also succeeds, it goes on to -print, which prints the name of the file.
As another answer has noted, using find in this way has major advantages over grep -Rl where you only want to search files of a certain type. If, on the other hand, you really want to search all files, grep -Rl is probably quicker, as it uses one grep process to perform the first filter for all files, instead of a separate grep process for each file.
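For the strings from the original question, that find-only approach would look like this:
find . -name "*.html" -type f -exec grep -q 'base-maps' {} \; -not -exec grep -q 'base-maps-bot' {} \; -print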
These answers seem off, as they match BOTH strings. The following command should work better:
grep -l <string-to-match> * | xargs grep -c <string-not-to-match> | grep '\:0'
Here is a more generic construction:
find . -name <nameFilter> -print0 | xargs -0 grep -Z -l <patternYes> | xargs -0 grep -L <patternNo>
This command outputs files whose name matches <nameFilter> (adjust find predicates as you need) which contain <patternYes>, but do not contain <patternNo>.
The enhancements are:
It works with filenames containing whitespace.
It lets you filter files by name.
If you don't need to filter by name (one often wants to consider all the files in the current directory), you can drop find and add -R to the first grep:
grep -R -Z -l <patternYes> | xargs -0 grep -L <patternNo>
find . -maxdepth 1 -name "*.py" -exec grep -L "string-not-to-match" {} \;
This command lists all ".py" files in the current directory that don't contain "string-not-to-match".
To match string A while excluding lines that also contain string B or string C, I use the following (with quotes, so a search string may contain a space):
grep -r <string A> | grep -v -e <string B> -e "<string C>" | awk -F ':' '{print $1}'
Explanation: grep -r recursively prints all matching lines in the output format
filename: line
grep -v then excludes from those lines the ones that also contain either string B (-e) or string C. Finally, awk prints only the first field (the filename), using the colon as the field separator (-F ':').
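Since a file can match on several lines, the same filename may be printed repeatedly; appending sort -u de-duplicates the list (a small assumed extension of the same pipeline, with placeholder strings):
grep -r "string A" | grep -v -e "string B" -e "string C" | awk -F ':' '{print $1}' | sort -u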

Regarding grep in Solaris

I want to grep for a particular word in multiple files. The multiple files are stored in the variable TESTING.
TESTING=$(ls -tr *.txt)
echo $TESTING
test.txt ab.txt bc.txt
grep "word" "$TESTING"
grep: can't open test.txt
ab.txt
bc.txt
This gives me an error. Is there any other way to do it, other than a for loop?
Take the double quotes out from around $TESTING.
grep "word" $TESTING
The double quotes are making your whole file list expand to a single argument to grep. The right way to do this is:
find . -name \*.txt -print0 | xargs -0 grep "word"
No quotes needed, I guess.
grep "word" $TESTING
works for me (Ubuntu, bash).
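If you want to keep the file list in a variable and still cope with spaces in names, a bash array is safer than a whitespace-split string (a sketch; this is bash-specific, so it won't work in a plain Solaris /bin/sh):
files=( *.txt )                # one array element per file, spaces preserved
grep "word" "${files[@]}"      # each element becomes its own argument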
