Diff command along with Grep gives "Binary file (standard input) matches" - linux

I am trying to use the diff command in conjunction with the grep command to find the difference between 2 files. In other words, I have yesterday's file and today's file, and I need to find the lines that are new in today's file, i.e. lines that were not in yesterday's file.
I am using the command below to write my required output to the file 'diff.TXT':
diff <(sed '1d' 'todayFile.txt' | sort ) <(sed '1d' yesterdayFile.txt | sort ) | grep "^<" >> 'diff.TXT'
This worked fine until today, when it produced a 'diff.TXT' containing only:
Binary file (standard input) matches
This happened in my prod environment but it works in test environment.
So I tried to do some debugging on this by breaking up the command in test environment.
I broke my initial command into 2 parts :
diff <(sed '1d' 'todayFile.txt' | sort ) <(sed '1d' yesterdayFile.txt | sort ) > temp.txt
grep "^<" temp.txt
And alas, I now get the same error in the test environment that I was getting in prod:
Binary file (standard input) matches
This seems very strange to me.
One strange thing I noticed in the test environment after splitting the command: running file -i temp.txt reports the file as binary.
Can someone please help out with this?

From man grep:
-a, --text
Process a binary file as if it were text; this is equivalent to the --binary-files=text option.
--binary-files=TYPE
If the first few bytes of a file indicate that the file contains binary data, assume that the file is of type TYPE. By default, TYPE is
binary, and grep normally outputs either a one-line message saying
that a binary file matches, or no message if there is no match. If
TYPE is without-match, grep assumes that a binary file does not match;
this is equivalent to the -I option. If TYPE is text, grep processes a
binary file as if it were text; this is equivalent to the -a option.
Warning: grep --binary-files=text might output binary garbage, which
can have nasty side effects if the output is a terminal and if the
terminal driver interprets some of it as commands.
grep scans the file, and if it finds any non-text bytes (such as NUL), it assumes the file is binary. Add the -a switch to make grep treat the file as readable text. Most probably your input files contain some non-text characters.
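A minimal runnable sketch of the point above (sample file name made up): a single NUL byte is enough for grep to classify a file as binary, and -a forces it to be treated as text anyway.

```shell
# A single NUL byte makes grep treat the whole file as binary;
# without -a it would print "Binary file ... matches" instead of the line.
printf 'match line\n\000stray binary byte\n' > mixed.txt
grep -a 'match' mixed.txt    # prints: match line
```

Running `grep 'match' mixed.txt` without -a on the same file is what produces the "Binary file ... matches" message seen in the question.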
diff <(sed '1d' 'todayFile.txt' | sort ) <(sed '1d' yesterdayFile.txt | sort ) | grep -a "^<"
Wouldn't comm -13 <(...) <(...) be faster and simpler?
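A hedged sketch of that comm alternative (sample contents made up; the sed '1d' header-stripping from the question is omitted for brevity): with yesterday's sorted file as the first argument, comm -13 prints lines that appear only in today's file.

```shell
# comm needs sorted inputs; -13 suppresses lines unique to file 1
# and lines common to both, leaving lines unique to file 2 (today).
printf 'a\nb\nc\n' > todayFile.txt
printf 'a\nb\n' > yesterdayFile.txt
sort yesterdayFile.txt > y.sorted
sort todayFile.txt > t.sorted
comm -13 y.sorted t.sorted    # prints: c
```

Unlike the diff | grep pipeline, comm never prefixes lines with "<", so no post-filtering (and no binary-detection surprise in a second grep) is needed.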

Related

How to capture a file name when using unzip -c and doing multiple greps

I am running the following command:
for file in 2017120[1-9]/54100_*.zip; do unzip -c "$file" | grep "3613825" | grep '3418665' ; done
This does a grep job of pulling the data that matches my grep parameters, but I can't figure out how to capture which file the results came from.
I have tried adding grep -H but the result comes back with (standard input).
How can I capture the file name?
When I need to do something like this I just add an echo of the file name to the for loop like this:
for file in 2017120[1-9]/54100_*.zip; do echo "$file"; unzip -c "$file" | grep "3613825" | grep '3418665' ; done
This prints out the list of files, and the grep line that matches will print immediately after the file that the match is in. like this:
file_1
file_2
file_3
matching line
file_4
file_5
another matching line
file_6
...
Thus I know the matching lines occurred in file_3 and file_5.
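An alternative sketch that labels each match directly instead of interleaving file names (sample file and data made up; `cat` stands in for `unzip -c "$file"`, since -H on a pipe can only report "(standard input)"):

```shell
# Prefix every surviving match with the file it came from,
# so only files that actually match produce output.
printf 'alpha 3613825 3418665\nbeta\n' > sample.txt
for file in sample.txt; do
  cat "$file" | grep '3613825' | grep '3418665' | sed "s|^|$file: |"
done
# prints: sample.txt: alpha 3613825 3418665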

chaining grep commands to return values from different lines

I have a large log file where I currently run two commands. I search the log file and create a filtered file containing each match plus the following 3 lines:
cat testFile.log | grep 'Text I am looking for' -A 3 > filter.txt
Then once I have my filtered file, I scan through that file to create myself a final file of the values I want:
cat filter.txt | grep -E 'Data\w{7}' -o > final.txt
My aim is to do this in one line if possible so I can wrap a bunch of these checks together in a script so I can jump in and search x,y,z and then I get a set of finalised files at the end for each one.
You just need to make use of pipes. That is the core UNIX way of thinking: small pieces combined to build a powerful tool.
In this case you have two commands:
grep 'Text I am looking for' -A 3 testFile.log > filter.txt # 1
grep -E 'Data\w{7}' -o filter.txt > final.txt # 2
(Note I removed the cat file | grep '...' since it is the same as saying grep '...' file)
Since the output of #1 is to feed #2, just use pipes:
grep -A 3 'Text I am looking for' testFile.log | grep -Eo 'Data\w{7}' > final.txt
This way you prevent the use of an unnecessary intermediate file.
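A runnable sketch of the combined pipeline with made-up sample data ("Data" followed by exactly 7 word characters, within 3 lines of the marker text):

```shell
# Stage 1 keeps the marker line plus 3 trailing lines;
# stage 2 extracts only the Data tokens from that window.
printf 'Text I am looking for\nfoo DataABC1234 bar\nnothing\nalso nothing\n' > testFile.log
grep -A 3 'Text I am looking for' testFile.log | grep -Eo 'Data\w{7}' > final.txt
cat final.txt    # prints: DataABC1234
```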

Linux - numerical sort then overwrite file

I have a csv file with a general format
date,
2013.04.04,
2013.04.04,
2012.04.02,
2013.02.01,
2013.04.05,
2013.04.02,
A script I run adds data to this file, not necessarily in date order. How can I sort the file into date order (ignoring the header) and overwrite the existing file, rather than writing to STDOUT?
I have used awk
awk 'NR == 1; NR > 1 {print $0 | "sort -n"}' file > file_sorted
mv file_sorted file
Is there a more effective way to do this without creating an additional file and moving it?
You can do the following:
sort -n -o your_file your_file
-o defines the output file and is defined by POSIX, so it is safe to use (no original file mangled).
Output
$ cat s
date,
2013.04.04,
2013.04.04,
2012.04.02,
2013.02.01,
2013.04.05,
2013.04.02,
$ sort -n -o s s
$ cat s
date,
2012.04.02,
2013.02.01,
2013.04.02,
2013.04.04,
2013.04.04,
2013.04.05,
Note that there exists a race condition if the script and the sorting is running at the same time.
If the file header sorts before the data, you can use the solution suggested by fedorqui as sort -o file file is safe (at least with GNU sort, see info sort).
Running sort from within awk seems kind of convoluted, another alternative would be to use head and tail (assuming bash shell):
{ head -n1 file; tail -n+2 file | sort -n; } > file_sorted
Now, about replacing the existing file. AFAIK, you have two options: create a new file and replace the old file with the new one as you describe in your question, or use sponge from moreutils like this:
{ head -n1 file; tail -n+2 file | sort -n; } | sponge file
Note that sponge still creates a temporary file.
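A runnable sketch of the head/tail approach using the question's sample data, with the conventional temp-file-and-mv replacement:

```shell
# Keep the header line first, numerically sort everything below it,
# then replace the original file with the sorted result.
printf 'date,\n2013.04.04,\n2012.04.02,\n2013.02.01,\n' > file
{ head -n1 file; tail -n+2 file | sort -n; } > file_sorted
mv file_sorted file
cat file
# prints:
# date,
# 2012.04.02,
# 2013.02.01,
# 2013.04.04,
```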

Preserve colouring after piping grep to grep

There is a similar question in Preserve ls colouring after grep’ing, but it annoys me that if you pipe colored grep output into another grep, the coloring is not preserved.
As an example, grep --color WORD * | grep -v AVOID does not keep the color of the first output. But for me, ls | grep FILE does keep the color. Why the difference?
grep sometimes disables the color output, for example when writing to a pipe. You can override this behavior with grep --color=always
The correct command line would be
grep --color=always WORD * | grep -v AVOID
This is pretty verbose; alternatively you can just add the line
alias cgrep="grep --color=always"
to your .bashrc, for example, and use cgrep as the colored grep. If you alias grep itself, you might run into trouble with scripts that rely on grep's exact output and don't expect ANSI escape codes.
A word of advice:
When using grep --color=always, the actual strings being passed on to the next pipe will be changed. This can lead to the following situation:
$ grep --color=always -e '1' * | grep -ve '12'
11
12
13
Even though the option -ve '12' should exclude the middle line, it will not, because there are color escape codes between the 1 and the 2.
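A runnable sketch of that pitfall (sample file made up; behavior checked with GNU grep): forcing color early inserts escape sequences that defeat the downstream filter, while filtering first behaves as expected.

```shell
printf '11\n12\n13\n' > nums.txt
# Color codes land between '1' and '2', so "12" is not matched literally
# and the -v filter drops nothing: all 3 lines survive.
grep --color=always '1' nums.txt | grep -cv '12'    # prints: 3
# Filter on the plain text first, colorize (or count) last: correct result.
grep '1' nums.txt | grep -v '12' | grep -c '1'      # prints: 2
```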
The existing answers only address the case when the FIRST command is grep (as asked by the OP, but this problem arises in other situations too).
More general answer
The basic problem is that the command BEFORE | grep tries to be "smart" by disabling color when it realizes its output is going to a pipe. This is usually what you want so that ANSI escape codes don't interfere with your downstream program.
But if you want colorized output emanating from earlier commands, you need to force color codes to be produced regardless of the output sink. The forcing mechanism is program-specific.
Git: use -c color.status=always
git -c color.status=always status | grep -v .DS_Store
Note: the -c option must come BEFORE the subcommand status.
Others
(this is a community wiki post so feel free to add yours)
Simply repeat the same grep command at the end of your pipe.
grep WORD * | grep -v AVOID | grep -v AVOID2 | grep WORD

Pipe output to use as the search specification for grep on Linux

How do I pipe the output of grep as the search pattern for another grep?
As an example:
grep <Search_term> <file1> | xargs grep <file2>
I want the output of the first grep as the search term for the second grep. The above command is treating the output of the first grep as the file name for the second grep. I tried using the -e option for the second grep, but it does not work either.
You need to use xargs's -i switch:
grep ... | xargs -ifoo grep foo file_in_which_to_search
This takes the string after -i (foo in this case) and replaces every occurrence of it in the command with a line of output from the first grep.
This is the same as:
grep `grep ...` file_in_which_to_search
Try
grep ... | fgrep -f - file1 file2 ...
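A runnable sketch of that -f - technique (sample files made up; works with GNU grep, while the Darwin/BSD fgrep mentioned later in this thread rejects "-" as a file name). Note that fgrep is a deprecated spelling of grep -F:

```shell
# Feed fixed-string patterns to grep on stdin with -f -;
# each line of the piped input becomes one search pattern.
printf 'problem\n' > test
printf 'no issue here\nbig problem here\n' > file
cat test | grep -F -f - file    # prints: big problem here
```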
If using Bash then you can use backticks:
> grep -e "`grep ... ...`" files
the -e flag and the double quotes are there to ensure that any output from the initial grep that starts with a hyphen isn't then interpreted as an option to the second grep.
Note that the double quoting trick (which also ensures that the output from grep is treated as a single parameter) only works with Bash. It doesn't appear to work with (t)csh.
Note also that backticks are the standard way to get the output from one program into the parameter list of another. Not all programs have a convenient way to read parameters from stdin the way that (f)grep does.
I wanted to search for text in files (using grep) that had a certain pattern in their file names (found using find) in the current directory. I used the following command:
grep -i "pattern1" $(find . -name "pattern2")
Here pattern2 is the pattern in the file names and pattern1 is the pattern searched for
within files matching pattern2.
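A hedged refinement of the same idea (sample names made up): the unquoted $(find ...) breaks on file names containing spaces, whereas find's -exec passes each name intact.

```shell
# -exec hands the matched file names straight to grep,
# avoiding the shell word-splitting of $(find ...).
mkdir -p tree
printf 'hello pattern1 world\n' > tree/pattern2_a.txt
find tree -name '*pattern2*' -exec grep -i 'pattern1' {} +
# prints: hello pattern1 world
```

(Add -H to the grep if you always want the file name prefixed, even when only one file matches.)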
edit: Not strictly piping but still related and quite useful...
This is what I use to search for a file from a listing:
ls -la | grep 'file-in-which-to-search'
Okay breaking the rules as this isn't an answer, just a note that I can't get any of these solutions to work.
% fgrep -f test file
works fine.
% cat test | fgrep -f - file
fgrep: -: No such file or directory
fails.
% cat test | xargs -ifoo grep foo file
xargs: illegal option -- i
usage: xargs [-0opt] [-E eofstr] [-I replstr [-R replacements]] [-J replstr]
[-L number] [-n number [-x]] [-P maxprocs] [-s size]
[utility [argument ...]]
fails. Note that a capital I is necessary. If I use that, all is good.
% grep "`cat test`" file
kinda works in that it returns a line for the terms that match but it also returns a line grep: line 3 in test: No such file or directory for each file that doesn't find a match.
Am I missing something or is this just differences in my Darwin distribution or bash shell?
I tried this way , and it works great.
[opuser#vjmachine abc]$ cat a
not problem
all
problem
first
not to get
read problem
read not problem
[opuser#vjmachine abc]$ cat b
not problem xxy
problem abcd
read problem werwer
read not problem 98989
123 not problem 345
345 problem tyu
[opuser#vjmachine abc]$ grep -e "`grep problem a`" b --col
not problem xxy
problem abcd
read problem werwer
read not problem 98989
123 not problem 345
345 problem tyu
[opuser#vjmachine abc]$
You should run grep so that it extracts file names only; see the -l parameter (the lowercase L):
grep -l someSearch * | xargs grep otherSearch
With a plain grep, the output contains much more than file names. For instance, when you do
grep someSearch *
You will pipe to xargs info like this
filename1: blablabla someSearch blablabla something else
filename2: bla someSearch bla otherSearch
...
Passing any of those lines to xargs makes no sense.
But when you do grep -l someSearch *, your output will look like this:
filename1
filename2
Such output can now be passed to xargs.
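A hedged refinement of the -l | xargs pipeline (sample files made up; -Z and -0 are GNU options): if any file names contain spaces, NUL-terminate the list so xargs splits it correctly.

```shell
# -Z makes grep end each file name with a NUL byte instead of a newline;
# xargs -0 reads that format safely, even for names with spaces.
mkdir -p demo
printf 'someSearch and otherSearch\n' > 'demo/file one.txt'
printf 'someSearch only\n' > 'demo/file two.txt'
grep -rlZ someSearch demo | xargs -0 grep -l otherSearch
# prints: demo/file one.txt
```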
I have found the following command to work, using $() around my first command so that the shell executes it first:
grep "$(dig +short <hostname>)" <file>
I use this to look through files for an IP address when I am given a host name.
