grep an empty value in a binary file in Linux

I have a binary file on a Linux machine with values: AB=^] (^] is an empty value), AB=N and AB=Y. I want to get the count of occurrences of AB=^] in the file.
I am using the following command:
zcat Logfile|grep 'AB=^]' |wc -l
but it gives a count of 0. The same command works fine for AB=N and AB=Y, so I guess I am searching for the wrong pattern. What should I search for, if not AB=^]?
Output for the above command:
gzip: Logfile: unexpected end of file
0
Here 0 is the number of occurrences of the tag AB=^].

Basically the deleted answers should work. Besides escaping the ^ and ] in your regex, you can also use their hexadecimal notation:
grep -o 'AB='$'\x5E'$'\x5D' file | wc -l
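If the ^] shown in the file is caret notation for the single control character 0x1D (Ctrl-], the ASCII group separator) rather than the two literal characters ^ and ], the pattern has to contain that byte itself. A minimal sketch under that assumption; the sample data and the file name sample.log are made up for illustration:

```shell
# Assumption: "^]" is how the viewer displays the control byte 0x1D (octal 035).
workdir=$(mktemp -d)

# Hypothetical sample data: two records with the control byte, one with N, one with Y.
printf 'AB=\035|x\nAB=N|x\nAB=\035|y\nAB=Y|z\n' > "$workdir/sample.log"

# Build the pattern with printf so the byte is inserted portably.
pattern=$(printf 'AB=\035')

# -a treats binary input as text; -o prints every match on its own line.
count=$(grep -a -o "$pattern" "$workdir/sample.log" | wc -l | tr -d ' ')
echo "occurrences: $count"

rm -rf "$workdir"
```

For the compressed log in the question, the same pattern would go after zcat, for example zcat Logfile | grep -a -o "$pattern" | wc -l.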

Related

grep search for pipe term Argument list too long

I have something like
grep ... | grep -f - *orders*
where the first grep ... gives a list of order numbers like
1393
3435
5656
4566
7887
6656
and I want to find those orders in multiple files (a_orders_1, b_orders_3 etc.), these files look something like
1001|strawberry|sam
1002|banana|john
...
However, when the first grep... returns too many order numbers I get the error "Argument list too long".
I also tried to give the grep command one order number at a time using a while loop but that's just way too slow. I did
grep ... | while read order; do grep $order *orders*; done
I'm very new to Unix clearly, explanations would be greatly appreciated, thanks!
The problem is the expansion of *orders* in grep ... | grep -f - *orders*. Your shell expands the pattern to the full list of files before passing that list to grep.
So we need to pass fewer "orders" files to each grep invocation. The find program is one way to do that, because it accepts wildcards and expands them internally:
find . -name '*orders*' # note this searches subdirectories too
Now that you know how to generate the list of filenames without running into the command line length limit, you can tell find to execute your second grep:
grep ... | find . -name '*orders*' -exec grep -f - {} +
The {} is where find places the filenames. The + terminates the command and tells find that it may pass multiple filenames to each invocation of grep -f. find still respects the command line length limit: if the list of files exceeds what a single command allows, it invokes grep -f more than once.
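One caveat with reading the patterns from standard input (-f -) under find: standard input can only be consumed once, so if find splits the file list across several grep invocations, only the first would be expected to see the patterns. A minimal sketch that saves the order numbers to a temporary file first; the sample files and the printf standing in for the first grep ... are hypothetical:

```shell
workdir=$(mktemp -d)

# Hypothetical order files resembling the ones in the question.
printf '1001|strawberry|sam\n1393|banana|john\n' > "$workdir/a_orders_1"
printf '5656|cherry|ann\n1002|kiwi|bob\n' > "$workdir/b_orders_3"

# Stand-in for the first "grep ..." that produces the order numbers.
patterns=$(mktemp)
printf '1393\n5656\n' > "$patterns"

# Every grep invocation started by find reads the same pattern file.
# -F treats the order numbers as fixed strings rather than regexes.
matches=$(find "$workdir" -type f -name '*orders*' -exec grep -F -f "$patterns" {} + | sort)
echo "$matches"

rm -rf "$workdir" "$patterns"
```

Note that -F matches the number anywhere on the line, not only in the first field, so this is a sketch rather than a strict lookup by order number.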

Unable to run cat command in CentOS (argument list too long)

I have a folder containing around 300k files, each 2-3 MB in size.
Now I want to run a shell command to count the occurrences of the character {.
My command:
nohup cat *20200119*| grep "{" | wc -l > /mpt_sftp/mpt_cdr_ocs/file.txt
This works fine with a small number of files.
When I run it in the directory that holds all the files (300k files), it shows:
Argument too long
Would you please try the following:
find . -maxdepth 1 -type f -name "*20200119*" -print0 | xargs -0 grep -F -o "{" | wc -l > /mpt_sftp/mpt_cdr_ocs/file.txt
I have actually tested with 300,000 files of 10-character-long filenames and it is working well.
xargs automatically adjusts the length of the argument list fed to grep, so we don't need to worry about it. (You can see how the grep command is executed by adding the -t option to xargs.)
The -F option drastically speeds up grep because it searches for a fixed string, not a regex.
The -o option will be needed if the character { appears multiple times in a line and you want to count them individually.
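The difference that -o makes can be seen on a small scale. This is only a sketch: the directory and file names are invented to mimic the *20200119* pattern, and -h is added so that file names do not appear in the grep output.

```shell
workdir=$(mktemp -d)

# Hypothetical sample files; the first line of the first file holds two braces.
printf '{"a":{"b":1}}\n{"c":2}\n' > "$workdir/cdr_20200119_1.txt"
printf 'no braces here\n{\n' > "$workdir/cdr_20200119_2.txt"

# Without -o, wc -l counts the lines that contain at least one brace.
lines=$(find "$workdir" -maxdepth 1 -type f -name '*20200119*' -print0 |
  xargs -0 grep -F -h '{' | wc -l | tr -d ' ')

# With -o, every brace is printed on its own line, so wc -l counts characters.
braces=$(find "$workdir" -maxdepth 1 -type f -name '*20200119*' -print0 |
  xargs -0 grep -F -o -h '{' | wc -l | tr -d ' ')

echo "lines with a brace: $lines, braces in total: $braces"
rm -rf "$workdir"
```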
The maximum size of the argument list varies. It is often quoted as something like 128 KiB or 256 KiB, and current Linux kernels typically allow about 2 MiB. Either way, you need an awful lot of files for the *20200119* part to overflow it. But you say "around 300k files": each file name contains at least the 8-character date string, plus enough other characters to make the name unique, so the list of file names will be far too long for even the largest plausible maximum argument list size.
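The limit on a particular machine can be checked rather than guessed. A small sketch using getconf; the value it reports differs between systems:

```shell
# Ask the system for the limit on the combined size of arguments
# and environment passed to a new process, in bytes.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX is $arg_max bytes"
```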
Note that the nohup cat part of your command is not sensible (see UUoC: Useless Use of Cat); you should be using grep '{' *20200119* to save transferring all that data down a pipe unnecessarily. However, that too would run into problems with the argument list being too long.
You will probably have to use a variant of the following command to get the desired result without overflowing your command line:
find . -depth 1 -name '*20200119*' -exec grep '{' {} + | wc -l
This uses the feature of POSIX find that groups as many arguments as will fit on the command line without overflowing, so grep runs on large (but not too large) batches of files, and the output of all the grep commands is passed to wc. If you're worried about the file names appearing in the output, suppress them with grep -h.
Or you might use:
find . -depth 1 -name '*20200119*' -exec grep -c -h '{' {} + |
awk '{sum += $1} END {print sum}'
The grep -c -h on macOS produces a simple number (the count of the number of lines containing at least one {) on its standard output for each file listed in its argument list; so too does GNU grep. The awk script adds up those numbers and prints the result.
Using -depth 1 is supported by find on macOS; so too is -maxdepth 1 — they are equivalent. GNU find does not appear to support -depth 1. It would be better to use -maxdepth 1. POSIX find only supports -depth with no number. You'd probably get a better error message from using -maxdepth 1 with a find that only supports POSIX's rather minimal set of options than you would when using -depth 1.
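As a rough illustration of the grep -c -h plus awk variant with -maxdepth 1 in place of -depth 1, here is a sketch on two invented files. It assumes a find that supports -maxdepth, such as GNU or macOS find.

```shell
workdir=$(mktemp -d)

# Hypothetical sample files: two matching lines in the first, one in the second.
printf '{\nplain\n{ {\n' > "$workdir/x_20200119_a"
printf 'plain\n{\n' > "$workdir/x_20200119_b"

# grep -c -h prints one count of matching lines per file; awk adds them up.
total=$(find "$workdir" -maxdepth 1 -type f -name '*20200119*' \
  -exec grep -c -h '{' {} + | awk '{sum += $1} END {print sum + 0}')

echo "lines containing a brace: $total"
rm -rf "$workdir"
```

The third line of the first file contains two braces but contributes 1 to the total, because -c counts lines rather than characters.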

linux grep command is not returning accurate results

grep command is not returning accurate results.
I have a text file which has some HTML content. I want to get the count of a specific word using the grep command. OS: Red Hat Enterprise Linux Server release 6.6 (Santiago)
Below is the content of the input file test.txt.
This file has two occurrences of the word "Tomcat":
<html><title>Tomcat Server</title><body><font face="Verdana, Arial" size="-1"><p>Tomcat Server</p></body></html>
grep command
cat test.txt|grep -c Tomcat
cat test.txt|grep -c "Tomcat"
Note: It's the same result with or without quotes
Expected Result: count - 2
Actual Result: count - 1
Note the difference between "accurate" and "desired." The grep man page says of the -c flag:
Suppress normal output; instead print a count of matching lines for each input file. With the -v, --invert-match option (see below), count non-matching lines.
So it's counting that one line had a match, and that's what it tells you.
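To count occurrences rather than matching lines, one common approach is to print each match on its own line with -o and count those lines. A minimal sketch that recreates test.txt from the question in a temporary directory:

```shell
workdir=$(mktemp -d)

# The single-line input file from the question.
printf '%s\n' '<html><title>Tomcat Server</title><body><font face="Verdana, Arial" size="-1"><p>Tomcat Server</p></body></html>' > "$workdir/test.txt"

# -c counts matching lines.
lines=$(grep -c Tomcat "$workdir/test.txt")

# -o prints every match on its own line, so wc -l counts occurrences.
words=$(grep -o Tomcat "$workdir/test.txt" | wc -l | tr -d ' ')

echo "matching lines: $lines, occurrences: $words"
rm -rf "$workdir"
```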

Find Specific Value (Zeros) in .csv Files

I'm using the following to error-check directories below the current location to find files with a "0" value on a single line. The grep string I'm using finds all the zeros but I need to find any files with a single "0" on a line in files ending with SPD-daily.csv.
I'm using this -
grep -R --include "*SPD-daily.csv" 0 ./
and I get just about everything with a 0 in it. Thanks,
It's not clear to me if you want to find any files that contain any line with just a zero, or any files that contain only a single line with zero. For the former case:
grep -Rx --include "*SPD-daily.csv" 0 .
The -x tells grep to find an exact match, so lines with a zero and other chars will be ignored.
For the latter case, where the file must contain only one line containing zero:
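grep -Rxn --include "*SPD-daily.csv" 0 . | grep ':1:0$'
The -n tells grep to print the line number, so each match is printed as file:line:0. This output is piped to grep again, which looks for the "line 1" bit; anchoring the pattern as ':1:0$' keeps it from also matching lines 10, 11 and so on.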
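If "only one line containing zero" is read as "exactly one line in the file is just 0, wherever it appears", counting the matches per file is another option. This is a sketch under that reading of the question; the directory layout and file names are invented:

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/site1" "$workdir/site2"

# Hypothetical sample data.
printf '12\n0\n7\n' > "$workdir/site1/A-SPD-daily.csv"   # one line that is just 0
printf '0\n0\n10\n' > "$workdir/site2/B-SPD-daily.csv"   # two such lines
printf '0\n' > "$workdir/site1/other.csv"                # name does not match

# -c prints file:count for each searched file; keep the files whose count is exactly 1.
single=$(grep -Rxc --include='*SPD-daily.csv' 0 "$workdir" | grep ':1$' | sed 's/:1$//')

echo "$single"
rm -rf "$workdir"
```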

Difference in output when executing system(...) in program and actual command

I have a program written in C that creates an output file with lines of characters. My intention is to count the number of unique lines of characters in this output file (excluding "ABC").
I can do it manually via the Linux command line, using
cat output/output.txt | grep -v "ABC" | sort | uniq -c > uniq_stats/stats.txt
I also put this command into my program so I don't have to do it manually.
memset(command, 0, 500);
sprintf(command, "cat %s | grep -v \"ABC\" | sort | uniq -c > uniq_stats/%s", out_filename, filename);
system(command);
out_filename is output/output.txt and filename is stats.txt
I expect a particular line to be seen 1351 times. The method of using the command line gave this correct value. However, the system(command) method gave only 1349 times. Also, there was another line that was incomplete using the system(command) method, i.e. only a portion of the string was printed out.
Why is it that I got different output from the 2 methods? I have only seen this problem once, as I have tried 4 or 5 other files and both methods gave me the correct results.
