Find Specific Value (Zeros) in .csv Files - search

I'm error-checking directories below the current location, looking for files whose names end in SPD-daily.csv that have a line containing only a "0". The grep string I'm using finds every zero instead.
I'm using this:
grep -R --include "*SPD-daily.csv" 0 ./
and I get just about everything with a 0 in it. Thanks,

It's not clear to me if you want to find any files that contain any line with just a zero, or any files that contain only a single line with zero. For the former case:
grep -Rx --include "*SPD-daily.csv" 0 .
The -x tells grep to find an exact match, so lines with a zero and other chars will be ignored.
For the latter case, where the file must contain only one line containing zero:
grep -Rxc --include "*SPD-daily.csv" 0 . | grep ':1$'
The -c tells grep to print file:count for every file it searches. That output is piped to grep again, where the anchored ':1$' keeps only the files whose count is exactly 1 (without the anchor, counts such as 10 or 21 would match as well).
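A quick sanity check (the directory and file names here are made up):
$ cat sub/aSPD-daily.csv
10,0,3
0
0,5
$ grep -Rx --include "*SPD-daily.csv" 0 .
./sub/aSPD-daily.csv:0
$ grep -Rxc --include "*SPD-daily.csv" 0 . | grep ':1$'
./sub/aSPD-daily.csv:1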

Related

Unable to run cat command in CentOS (argument list too long)

I have a folder with around 300k files, each 2-3 MB in size.
Now I want to count the occurrences of the character { from the shell.
My command:
nohup cat *20200119*| grep "{" | wc -l > /mpt_sftp/mpt_cdr_ocs/file.txt
This works fine with a small number of files.
When I run it in the location that holds all 300k files, it shows:
Argument list too long
Would you please try the following:
find . -maxdepth 1 -type f -name "*20200119*" -print0 | xargs -0 grep -F -o "{" | wc -l > /mpt_sftp/mpt_cdr_ocs/file.txt
I have actually tested with 300,000 files of 10-character-long filenames and it is working well.
xargs automatically adjusts the length of argument list fed to grep and we don't need to worry about it. (You can see how the grep command is executed by putting -t option to xargs.)
The -F option drastically speeds up grep because it searches for a fixed string rather than a regex.
The -o option will be needed if the character { appears multiple times in a line and you want to count them individually.
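To see the difference -o makes (a contrived two-line input, not from the original data): grep -c counts matching lines, while grep -o piped to wc -l counts individual occurrences:
$ printf 'a{b{c\n{d\n' | grep -F -c '{'
2
$ printf 'a{b{c\n{d\n' | grep -F -o '{' | wc -l
3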
The maximum size of the argument list varies, but it is usually something like 128 KiB or 256 KiB. That means you have an awful lot of files if the *20200119* part alone overflows it. With around 300,000 files, each name containing at least the 8-character date string plus enough other characters to make it unique, the list of file names will be far too long for even the largest plausible maximum argument list size.
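You can check the actual limit on your system with getconf; ARG_MAX is the ceiling, in bytes, on the combined length of the argument list and the environment passed to a new process:
getconf ARG_MAX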
Note that the nohup cat part of your command is not sensible (see UUoC: Useless Use of Cat); you should be using grep '{' *20200119* to save transferring all that data down a pipe unnecessarily. However, that too would run into problems with the argument list being too long.
You will probably have to use a variant of the following command to get the desired result without overflowing your command line:
find . -depth 1 -name '*20200119*' -exec grep '{' {} + | wc -l
This uses the feature of POSIX find that groups as many arguments as will fit on the command line without overflowing, runs grep on large (but not too large) batches of files, and then passes the output of the grep commands to wc. If you're worried about the file names appearing in the output, suppress them with grep's -h option.
Or you might use:
find . -depth 1 -name '*20200119*' -exec grep -c -h '{' {} + |
awk '{sum += $1} END {print sum}'
The grep -c -h on macOS produces a simple number (the count of lines containing at least one {) on its standard output for each file listed in its argument list; so does GNU grep. The awk script adds up those numbers and prints the result.
Using -depth 1 is supported by find on macOS, as is -maxdepth 1; on that platform they are equivalent. GNU find does not appear to support -depth 1, and POSIX find only supports -depth with no number. It would be better to use -maxdepth 1: with a find that only supports POSIX's rather minimal set of options, -maxdepth 1 will probably also get you a better error message than -depth 1 would.
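With GNU find, then, the first command above would be spelled:
find . -maxdepth 1 -name '*20200119*' -exec grep '{' {} + | wc -l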

Looping through a file with path and file names and within these file search for a pattern

I have a file called lookupfile.txt with the following info:
path, including filename
Within bash I would like to search through the files listed in lookupfile.txt for a pattern: myerrorisbeinglookedat. When it is found, output the matching lines into another record file. All of the results can land in the same file.
Please help.
You can write a single grep statement to achieve this:
grep myerrorisbeinglookedat $(< lookupfile.txt) > outfile
Assuming:
the number of entries in lookupfile.txt is small (tens or hundreds)
the file names contain no whitespace or wildcard characters
Otherwise:
while IFS= read -r file; do
    # print each file name terminated by a NUL byte '\0'
    # so that xargs -0 can split the list safely
    printf '%s\0' "$file"
done < lookupfile.txt | xargs -0 grep myerrorisbeinglookedat > outfile
xargs takes the output of the loop, tokenizes it correctly, and invokes the grep command. It batches up the files based on operating-system limits in case there is a large number of them.
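If your xargs is the GNU version, you can skip the loop entirely: its -a option reads the arguments from a file, and -d '\n' makes newline the only delimiter (both are GNU extensions, so this is not portable):
xargs -a lookupfile.txt -d '\n' grep myerrorisbeinglookedat > outfile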

Linux command to count number of occurrences of a string in a specific file type within the whole dir

I used the following command to count the lines on which class appears within .h files:
grep -rc 'class' --include \*.h mydirc | wc -l
However, the result is wrong: when I add up the per-file counts it reports, the total doesn't match. I found that wc -l was actually counting the number of files that were searched and printed on the screen. For example:
/afs/eos/dist/ds5-2013.06/FastModelsTools_8.2/OSCI/Syst...sc_buffer.h:6
I added the :N numbers up, and the total didn't match the final value. The final value actually matches the number of lines printed on the screen, which is the same as the number of .h files searched.
How about
find . -name '*.h' -print0 | xargs -0 grep class | wc -l
The -print0/-0 pair keeps file names containing spaces intact.
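If what you actually want is the sum of the per-file counts from your grep -rc output (the :N numbers you were adding up by hand), you can let awk do the summing; a sketch, assuming file names contain no newlines:
grep -rc 'class' --include '*.h' mydirc | awk -F: '{sum += $NF} END {print sum}'
Because $NF is the last colon-separated field, path components that contain : don't throw the sum off.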

grep an empty value in a binary file in linux

I have a binary file on a Linux machine with the values AB=^] (^] is an empty value), AB=N, and AB=Y. I want to get the count of occurrences of AB=^] in the file.
I am using the following command:
zcat Logfile|grep 'AB=^]' |wc -l
but it gives the count 0. The above command works fine for AB=N and AB=Y, so I guess I am searching for the wrong pattern. What should I search for, if not AB=^]?
Output for the above command:
gzip: Logfile: unexpected end of file
0
here 0 indicates the number of occurrences of tag AB=^]
Basically the deleted answers should work. Apart from escaping the ^ and ] in your regex, you can also use their hexadecimal notation:
grep -o 'AB='$'\x5E'$'\x5D' file | wc -l
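One caveat: if ^] in your viewer is caret notation for the control character GS (ASCII 0x1D) rather than the two literal characters ^ and ], then the byte itself has to go into the pattern. A variant under that assumption, again using bash's $'...' quoting (-a makes grep treat binary input as text, and -o with wc -l counts every occurrence rather than matching lines):
zcat Logfile | grep -a -o 'AB='$'\x1d' | wc -l
Separately, the gzip: unexpected end of file warning means the gzip stream in Logfile is truncated (perhaps the file is still being written); zcat still emits whatever it could decompress, but that is worth checking on its own.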

Listing entries in a directory using grep

I'm trying to list all entries in a directory whose names contain ONLY upper-case letters. Directories need "/" appended.
#!/bin/bash
cd ~/testfiles/
ls | grep -r *.*
Since grep by default looks for upper-case letters only (right?), I'm just recursively searching through the directories under testfiles for all names that contain only upper-case letters.
Unfortunately this doesn't work.
As for appending directories, I'm not sure why I need to do this. Does anyone know where I can start with some detailed explanations of what I can do with grep, and how to tackle my problem?
No, grep does not only consider uppercase letters.
Your question is a bit unclear. For example:
from your usage of the -r option, it seems you want to search recursively, however you don't say so. For simplicity I assume you don't need to; consider looking into #twm's answer if you need recursion.
you want to look for uppercase letters only. Does that mean you don't want to accept any other (non-letter) characters, even though they are still valid in file names (like digits, dashes, dots, etc.)?
since you don't say that having one file per line is not permissible, I am assuming it is OK (thus using ls -1).
The naive solution would be:
ls -1 | grep "^[[:upper:]]\+$"
That is, print all lines containing only uppercase letters. In my TEMP directory that prints, for example:
ALLBIG
LCFEM
WPDNSE
This however would exclude files like README.TXT or FILE001, which depending on your requirements (see above) should most likely be included.
Thus, a better solution would be:
ls -1 | grep -v "[[:lower:]]\+"
That is, print all lines not containing a lowercase letter. In my TEMP directory that prints, for example:
ALLBIG
ALLBIG-01.TXT
ALLBIG005.TXT
CRX_75DAF8CB7768
LCFEM
WPDNSE
~DFA0214428CD719AF6.TMP
Finally, to "properly mark" directories with a trailing '/', you could use the -F (or --classify) option.
ls -1F | grep -v "[[:lower:]]\+"
Again, example output:
ALLBIG
ALLBIG-01.TXT
ALLBIG005.TXT
CRX_75DAF8CB7768
LCFEM/
WPDNSE/
~DFA0214428CD719AF6.TMP
Note that a different option would be to use find (e.g. find ! -regex ".*[a-z].*", where -regex is a GNU find extension), though its output format differs: full paths rather than bare names.
The exact regular expression depends on the output format of your ls command. Assuming that you do not use an alias for ls, you can try this:
ls -R | grep -o -w "[A-Z]*"
Note that with -R, ls recursively lists directories and files under the current directory. The -o option tells grep to print only the matched part of the text, and the -w option tells grep to match whole words only. "[A-Z]*" is a regexp that keeps only upper-case words.
Note that this will produce output for TEST.txt (printing just the uppercase run TEST) as well as for TEXT.TXT. In other words, it considers only the runs of upper-case letters within each name.
It's ls which lists the files, not grep, so ls is where you need to specify that you want "/" appended to directories: use ls --classify (or -F) for that.
grep is used to process the results from ls (or some other source, generally speaking) and only show lines that match the pattern you specify. It is not limited to uppercase characters. You can limit it to just upper-case characters and "/" with grep -E '^[A-Z/]*$', or, if you also want numbers, periods, etc., you could instead filter out lines that contain lowercase characters with grep -v -E '[a-z]'.
As grep is not the program which lists the files, it is not where you want to perform the recursion. ls can list paths recursively if you use ls -R. However, you're just going to get the last component of the file paths that way.
You might want to consider using find to handle the recursion. This works for me:
find . -exec ls -d --classify {} \; | egrep -v '[a-z][^/]*/?$'
I should note, using ls --classify to append "/" to the end of directories may also append some other characters to other types of paths that it can classify. For instance, it may append "*" to the end of executable files. If that's not OK, but you're OK with listing directories and other paths separately, this could be worked around by running find twice - once for the directories and then again for other paths. This works for me:
find . -type d | egrep -v '[a-z][^/]*$' | sed -e 's#$#/#'
find . -not -type d | egrep -v '[a-z][^/]*$'
