chaining grep commands to return values from different lines

I have a large log file on which I currently run two commands. First I search the log and make a filtered file containing each match plus the 3 lines that follow it:
cat testFile.log |grep 'Text I am looking for' -A 3 > filter.txt
Then once I have my filtered file, I scan through that file to create myself a final file of the values I want:
cat filter.txt | grep -E 'Data\w{7}' -o > final.txt
My aim is to do this in one line if possible, so I can wrap a bunch of these checks together in a script: I jump in, search for x, y, z, and end up with a finalised file for each check.

You just need to make use of pipes. That is the core UNIX way of thinking, in which small pieces are combined to build a powerful tool.
In this case you have two commands:
grep 'Text I am looking for' -A 3 testFile.log > filter.txt # 1
grep -E 'Data\w{7}' -o filter.txt > final.txt # 2
(Note I removed cat file | grep '...', since it is the same as saying grep '...' file.)
Since the output of #1 feeds #2, just connect them with a pipe:
grep -A 3 'Text I am looking for' testFile.log | grep -Eo 'Data\w{7}' > final.txt
This way you avoid an unnecessary intermediate file.
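If you then want to bundle several of these checks into one script, a minimal sketch is below (the patterns and output names are placeholders to adjust):
#!/bin/bash
# Each entry pairs a search pattern with its output file -- placeholders.
checks=(
    'Text I am looking for|final1.txt'
    'Other text to search|final2.txt'
)
for check in "${checks[@]}"; do
    pattern=${check%%|*}    # text before the first |
    outfile=${check##*|}    # text after the last |
    grep -A 3 "$pattern" testFile.log | grep -Eo 'Data\w{7}' > "$outfile"
done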

Related

Diff command along with Grep gives "Binary file (standard input) matches"

I am trying to use the diff command in conjunction with the grep command to find the difference between 2 files. In other words, I have yesterday's file and today's file, and I need to find the lines that are new in today's file, i.e. the ones that were not in yesterday's file.
I am using the command below to send my required output to the file 'diff.TXT':
diff <(sed '1d' 'todayFile.txt' | sort ) <(sed '1d' yesterdayFile.txt | sort ) | grep "^<" >> 'diff.TXT'
This worked fine until today, when it produced 'diff.TXT' containing only:
Binary file (standard input) matches
This happened in my prod environment, but the same command works in the test environment.
So I tried to debug this by breaking up the command in the test environment.
I broke my initial command into 2 parts:
diff <(sed '1d' 'todayFile.txt' | sort ) <(sed '1d' yesterdayFile.txt | sort ) > temp.txt
grep "^<" temp.txt
And alas, I now get the same error in the test environment that I was getting in prod.
Binary file (standard input) matches
This seems very strange to me.
One strange thing I noticed in the test environment after splitting the command: running file -i temp.txt reports the file as binary.
Can someone please help out with this?
From man grep:
-a, --text
Process a binary file as if it were text; this is equivalent to the --binary-files=text option.
--binary-files=TYPE
If the first few bytes of a file indicate that the file contains binary data, assume that the file is of type TYPE. By default, TYPE is binary, and grep normally outputs either a one-line message saying that a binary file matches, or no message if there is no match. If TYPE is without-match, grep assumes that a binary file does not match; this is equivalent to the -I option. If TYPE is text, grep processes a binary file as if it were text; this is equivalent to the -a option. Warning: grep --binary-files=text might output binary garbage, which can have nasty side effects if the output is a terminal and if the terminal driver interprets some of it as commands.
grep scans the file, and if it finds bytes it cannot interpret as text, it assumes the file is binary. Add the -a switch to make grep treat the file as readable text. Most probably your input files contain some non-text characters.
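For example, adding -a to the grep in the original pipeline (nothing else changes) keeps the matching lines printing even when the input looks binary:
diff <(sed '1d' 'todayFile.txt' | sort ) <(sed '1d' yesterdayFile.txt | sort ) | grep -a "^<" >> 'diff.TXT'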
diff <(sed '1d' 'todayFile.txt' | sort ) <(sed '1d' yesterdayFile.txt | sort ) | grep "^<"
Wouldn't comm -13 <(...) <(...) be faster and simpler?
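A minimal sketch of that variant, assuming the goal is the lines that exist only in today's file (comm -13 suppresses the lines unique to the first file and the lines common to both, leaving only the lines unique to the second):
comm -13 <(sed '1d' yesterdayFile.txt | sort ) <(sed '1d' 'todayFile.txt' | sort ) >> 'diff.TXT'
Note that comm requires both inputs to be sorted, which the existing sort calls already guarantee.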

How to find the files (pwd of file) which have a particular word below another particular word, in directories and subdirectories, in Linux

I have 200 folders. Each folder has multiple shell and SQL files, and my requirement is to grep/find all the directories and files which contain the following:
Insert into dbname.table_name
Select
I want to know all the files (the pwd of each file) that have insert into ${dbname}.${table_name} followed by select on the next line. The db name and table name are the same for all.
You could use grep -r -i -A1 "insert.into" | grep -i -B1 select
-r will grep on all files in the current directory and recursively in all subdirectories.
-A1 prints one line After the matching line,
-B1 prints one line Before the matching line.
So the first grep above will print all lines matching insert.into plus the next; the second grep will keep only those pairs that have a select on their second line.
(-i to ignore case)
You may then append | grep -i insert.into | cut -d: -f1 | sort -u to get only the file names.
Note this makes some assumptions:
options -A/-B are GNU extensions, available on Linux/GNU but not on some plain Unixes like HP-UX.
if you have lines containing both insert.into and select, you'll get some funky output.
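If those caveats bite, a single awk pass can test each pair of lines directly. A sketch, assuming GNU awk (for nextfile) and that only .sh and .sql files matter:
find . -type f \( -name '*.sh' -o -name '*.sql' \) -exec awk '
    FNR == 1 { prev = "" }   # reset so a match cannot span two files
    tolower(prev) ~ /insert into/ && tolower($0) ~ /select/ { print FILENAME; nextfile }
    { prev = $0 }
' {} +
This prints each matching file's path once and then moves on to the next file.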

How to capture a file name when using unzip -c and doing multiple greps

I am running the following command:
for file in 2017120[1-9]/54100_*.zip; do unzip -c "$file" | grep "3613825" | grep '3418665' ; done
This does a good job of pulling the data that matches my grep parameters, but I can't figure out how to capture which file the results came from.
I have tried adding grep -H, but the file name comes back as (standard input).
How can I capture the file name?
When I need to do something like this I just add an echo of the file name to the for loop like this:
for file in 2017120[1-9]/54100_*.zip; do echo "$file"; unzip -c "$file" | grep "3613825" | grep '3418665' ; done
This prints out the list of files, and any matching line prints immediately after the name of the file it was found in, like this:
file_1
file_2
file_3
matching line
file_4
file_5
another matching line
file_6
...
Thus I know the matching lines occurred in file_3 and file_5.
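A variation that tags each matching line with its source file instead, so only real matches produce output (a sketch using sed to prefix the zip's name):
for file in 2017120[1-9]/54100_*.zip; do
    unzip -c "$file" | grep "3613825" | grep '3418665' | sed "s|^|$file: |"
done
Each matching line then comes out prefixed with the file it was found in.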

Find line number in a text file - without opening the file

In a very large file I need to find the position (line number) of a string, then extract the 2 lines above and below that string.
To do this right now, I launch vi, find the string, note its line number, exit vi, then use sed to extract the lines surrounding that string.
Is there a way to streamline this process... ideally without having to run vi at all.
Maybe using grep like this:
grep -n -2 your_searched_for_string your_large_text_file
will give you almost what you expect:
-n : tells grep to print the line number
-2 : print 2 lines of context above and below each match (plus the matching line itself, of course)
You can do
grep -C 2 yourSearch yourFile
To send the output to a file, do
grep -C 2 yourSearch yourFile > result.txt
Use grep -n string file to find the line number without opening the file.
You can use cat -n to prefix line numbers, then grep for the word and use awk to extract the line number:
cat -n FILE | grep WORD | awk '{print $1;}'
although grep already does what you mention if you give -C 2 (above/below 2 lines):
grep -C 2 WORD FILE
You can do it with grep's -A and -B options, like this:
grep -B 2 -A 2 "searchstring" your_large_text_file | sed 3d
grep finds the line and shows two lines of context before and after it; sed then deletes the third line of the output, i.e. the match itself (this assumes a single match with two full lines above it).
If you want to automate this, you can simply use a shell script. You may try the following:
#!/bin/bash
VAL="your_search_keyword"
NUM1=$(grep -n "$VAL" file.txt | cut -f1 -d ':')
echo "$NUM1"                     # show the line number of the matched keyword
MYNUMUP=$((NUM1 - 1))            # line above the keyword
MYNUMDOWN=$((NUM1 + 1))          # line below the keyword
sed -n "${MYNUMUP}p" file.txt    # display the line above
sed -n "${MYNUMDOWN}p" file.txt  # display the line below
The plus point of the script is that you can change the keyword in the VAL variable as you like and rerun it to get the needed output.
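One caveat: if the keyword occurs more than once, grep -n returns several line numbers and the arithmetic above breaks. A sketch that handles every match instead:
#!/bin/bash
VAL="your_search_keyword"
grep -n "$VAL" file.txt | cut -f1 -d ':' | while read -r num; do
    [ "$num" -gt 1 ] && sed -n "$((num - 1))p" file.txt   # line above this match
    sed -n "$((num + 1))p" file.txt                       # line below this match
done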

How do I grep multiple lines (output from another command) at the same time?

I have a Linux driver running in the background that can return the current system data/stats. I view the data by running a console utility (let's call it dump-data). All the data is dumped every time I run dump-data. The output of the utility looks like this:
- A=reading1
- B=reading2
- C=reading3
- D=reading4
- E=reading5
...
- variableX=readingX
...
The list of readings returned by the utility can be really long. Depending on the scenario, certain readings would be useful while everything else would be useless.
I need a way (via a bash script) to grep only the useful readings, whose names might have nothing in common. I.e. sometimes I'll need to collect A, D, E; and other times I'll need C, D, E.
I'm attempting to graph the readings over time to look for trends, so I can't run something like this:
# forgive my pseudocode
Loop
dump-data | grep A
dump-data | grep D
dump-data | grep E
End Loop
to collect A, D, and E, as that would actually give me readings from 3 separate calls of dump-data, which would not be accurate.
If you want to save all the grep results in the same file, you can just join all the expressions into one:
grep -E 'expr1|expr2|expr3'
But if you want the results (for expr1, expr2 and expr3) in separate files, things get more interesting.
You can do this using tee >(command).
For example, here I process the same pipe with three different commands:
$ echo abc | tee >(sed s/a/_a_/ > file1) | tee >(sed s/b/_b_/ > file2) | sed s/c/_c_/ > file3
$ grep "" file[123]
file1:_a_bc
file2:a_b_c
file3:ab_c_
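Applied to the question (a sketch, assuming the '- A=reading1' output format shown above), a single dump-data call can feed one grep per reading:
dump-data | tee >(grep '^- A=' > A.txt) >(grep '^- D=' > D.txt) | grep '^- E=' > E.txt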
But that command seems too complex to me. I would rather save the dump-data results to a file and then grep that:
TEMP=$(mktemp /tmp/dump-data-XXXXXXXX)
dump-data > ${TEMP}
grep A ${TEMP}
grep B ${TEMP}
grep C ${TEMP}
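The same idea scales to any set of readings with a loop, one output file per pattern (a sketch reusing the TEMP file above; the '^- A=' anchors come from the output format in the question):
for pat in A D E; do
    grep "^- ${pat}=" "${TEMP}" > "${pat}.txt"   # one file per reading
done
rm -f "${TEMP}"   # clean up the temp file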
You can use dump-data | grep -E "A|D|E". Note the -E option of grep. Alternatively you could use egrep without the -E option.
You can simply use:
dump-data | grep -E 'A|D|E'
awk '/MY PATTERN/{print > ("matches-" FILENAME)}' myfile{1,3}
This writes each matching line to a file named after its source (e.g. matches-myfile1). Thanks to Guru at Stack Exchange.
