Grep a "text" and print all the lines before and after the text. Each log session is separated by 2 blank lines - linux

1
1
1
1
Text
11
1
1
1
The "Text" above can appear anywhere, so grep -C won't help.
I have tried this with AWK, but I want to do it using grep:
zgrep -C25 "Text" engine.*.log.gz
This doesn't work, as the text may appear anywhere within a session.
I know the awk option, but it doesn't work on .gz files; I'd have to decompress them first, which is a lengthy process:
awk -v RS= '/Text/' engine.*.log.gz

Just do it the trivial, obvious, robust, portable way:
zcat engine.*.log.gz | awk -v RS= '/Text/{print; exit}'
That should also be pretty efficient since the SIGPIPE that zcat gets when awk exits on finding the first Text should terminate the zcat.
Or if Text can appear multiple times in the input and you want all associated records output:
zcat engine.*.log.gz | awk -v RS= -v ORS='\n\n' '/Text/'
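For anyone unfamiliar with awk's paragraph mode: RS= (the empty string) makes awk treat each blank-line-separated block as one record, so the whole session containing "Text" comes out in one piece. A quick check on synthetic data (not the real logs):
printf '1\n1\n\n\n2\nText\n2\n\n\n3\n3\n' | awk -v RS= -v ORS='\n\n' '/Text/'
prints only the middle block:
2
Text
2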

Thank you so much for the help, but I figured out the solution:
zgrep -C34 "text" engine.*.log.gz |awk '/20yy-mm-dd hh:mm:ss/,/SENT MESSAGES (asynchronous)/'
This searches for the "text" along with 34 lines before and after it, which on average is enough to cover all the lines required.
All of my log sessions always begin with a date and time, so I used AWK to match that.
All of my log sessions always end with "SENT MESSAGES (asynchronous):", so AWK did the trick for me.

Related

bash: awk print within print

I need to grep some pattern and then print some output from within the match. Currently I am using the command below, which works fine, but I would like to eliminate the multiple pipes and use a single awk command to achieve the same output. Is there a way to do it using awk?
root@Server1 # cat file
Jenny:Mon,Tue,Wed:Morning
David:Thu,Fri,Sat:Evening
root@Server1 # awk '/Jenny/ {print $0}' file | awk -F ":" '{ print $2 }' | awk -F "," '{ print $1 }'
Mon
I want to get this output using a single awk command. Any help?
You can try something like:
awk -F: '/Jenny/ {split($2,a,","); print a[1]}' file
Try this
awk -F'[:,]+' '/Jenny/{print $2}' file.txt
It uses multiple field-separator characters inside the [ ].
The + means one or more since it is treated as a regex.
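Either one-liner, run against the sample file from the question, should print just the first day for the matching name:
$ awk -F'[:,]+' '/Jenny/{print $2}' file
Mon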
For this particular job, I find grep to be slightly more robust.
Unless your company has a policy not to hire people named Eve.
(Try it out if you don't understand.)
grep -oP '^[^:]*Jenny[^:]*:\K[^,:]+' file
Or to do a whole-word match:
grep -oP '^[^:]*\bJenny\b[^:]*:\K[^,:]+' file
Or when you are confident that "Jenny" is the full name:
grep -oP '^Jenny:\K[^,:]+' file
Output:
Mon
Explanation:
The stuff up until \K speaks for itself: it selects the line(s) with the desired name.
[^,:]+ captures the day of week (in this case Mon).
\K cuts off everything preceding Mon.
-o cuts off anything following Mon.
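To make the Eve remark concrete, suppose the file gained a hypothetical extra record (purely for illustration):
Eve:Sun,Mon:Night
An awk match on the bare name also hits David's line, because "Evening" contains "Eve":
$ awk -F'[:,]+' '/Eve/{print $2}' file
Thu
Sun
whereas the whole-word grep only pulls the record whose name field is Eve:
$ grep -oP '^[^:]*\bEve\b[^:]*:\K[^,:]+' file
Sun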

zcat file not working for gzip file

I have a .gz file which I need to merge and do other manipulations with (without decompressing it), but I am having trouble just using zcat or gzip -dc or awk; for example, when I pass the output to less -S like this:
awk '{print $1}' <(gzip -dc file.gz) | less -S
I get the incorrect column printed. When I use just less -S to view the file, only the last few columns are printed. So I thought it was a problem with the delimiter, but I have tried importing some lines into R (the file is too big to import whole), and it seems to be space-delimited, since all the columns show up when I do this:
x=read.table("file.gz", header=T, nrows=100)
But how do I read the lines correctly to use this file with zcat?
Thank you so much for your help!
If you want the whole line to be printed, try $0.
awk '{print $0}' <(gzip -dc file.gz) | less -S
If you want specific columns to be printed, use -F to specify the field separator. For example, if you want the first field of ':'-separated fields from each line (like in /etc/passwd), try this command:
awk -F':' '{print $1}' <(gzip -dc passwd.gz) |less -S
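The same commands also work as a plain pipe, which can be handy in shells that lack <(...) process substitution:
gzip -dc file.gz | awk '{print $0}' | less -S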

Empty string as an output field separator for cut

How can I use cut with --output-delimiter=""? I want to join two columns using cut.
I tried the following command. However, cat -v shows that there are non-printable characters, specifically "^@". Any suggestions for how I can overcome this?
cut -d, -f 3,6 --output-delimiter="" file1.csv | cat -v
This is the content of my file
011,IBM,Palmisano,t,t,t
012,INTC,Otellini,t,t,t
013,SAP,Snabe,t,t,t
014,VMW,Maritz,t,t,t
015,ORCL,Ellison,t,t,t
017,RHT,Whitehurst,t,t,t
When I run my command I'm seeing
Palmisano^@t
Otellini^@t
Snabe^@t
Maritz^@t
Ellison^@t
Whitehurst^@t
Expected output: basically I want to exclude the ^@ in the output
Palmisanot
Otellinit
Snabet
Maritzt
Ellisont
Whitehurstt
Thank you.
The output delimiter is not an empty string, but probably the NULL character. You might want to try
cut -d, -f 3,6 --output-delimiter=$'\00' file1.csv
(Assuming your shell supports $'...'-quoting; bash and zsh are fine here, not sure about others).
edit:
cut apparently puts the NULL character if the output separator is set to the empty string. I do not see a way around it.
If awk is an acceptable solution, this will do the trick:
awk -F, '{print $3 $6}' file*
If you want to be more verbose and explicit:
awk 'BEGIN{FS=","; OFS=""}; {print $3,$6}' file*
FS="," sets the field separator to ,.
OFS="" sets the Output Field Separator to the empty string.
You probably don't want to cut by fields but instead by characters or perhaps bytes. See the description of -c and/or -b in the man page, instead of using -f.

Sed, Awk for combining the output of two cut statements

I'm trying to combine the outputs below into one command. The issue is that the fields I'm trying to grab are in reverse order. I was told that cut doesn't support a "reverse" option and to use AWK for this purpose, but it didn't end up working for me. I'm trying to take the output of ls -l against /dev/block to return the partitions and automatically build a dd if= / of= command for each output line.
I tried piping the output to awk:
cut -d' ' -f23,25 ... | awk '{print $2,$1}'
however, when I then used sed to add the prefix and suffix, the result wasn't in the appropriate order.
I built the two statements below, which individually return the expected output; I'm just looking for the "right" way to combine them in the most efficient manner using sed / awk.
ls -l /dev/block/platform/msm_sdcc.1/by-name/ | cut -d' ' -f 25 | sed "s/^/dd if=/"
ls -l /dev/block/platform/msm_sdcc.1/by-name/ | cut -d' ' -f 23 | sed "s/.*/of=\/external_sd\/&.dsk/"
Any assistance will be appreciated.
Thank you.
If you're already using awk, I don't think you'll need cut or sed. You can probably do something like the following, though I'll have to trust you on the field numbers:
ls -l /dev/block/platform/msm_sdcc.1/by-name | awk '{print "dd if=/"$25 " of=/" $23 ".dsk"}'
awk will split on all whitespace, not just the space character, so it's possible the fields will shift some, though it may be more reliable too.
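If the field numbers ever shift (different date formats, longer owner names, and so on), counting from the end of the line is usually steadier. Assuming the usual ls -l symlink layout of "... name -> target" (an assumption about your output, not something verified here), the target is the last field and the name is the third-from-last:
ls -l /dev/block/platform/msm_sdcc.1/by-name | awk 'NF > 2 {print "dd if=" $NF " of=/external_sd/" $(NF-2) ".dsk"}'
The NF > 2 guard skips the "total" line, and the /external_sd/ prefix mirrors the sed in your second statement.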

grep a large list against a large file

I am currently trying to grep a large list of ids (~5000) against an even larger csv file (3,000,000 lines).
I want all the csv lines, that contain an id from the id file.
My naive approach was:
cat the_ids.txt | while read line
do
cat huge.csv | grep $line >> output_file
done
But this takes forever!
Are there more efficient approaches to this problem?
Try
grep -f the_ids.txt huge.csv
Additionally, since your patterns seem to be fixed strings, supplying the -F option might speed up grep.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by
newlines, any of which is to be matched. (-F is specified by
POSIX.)
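Combined, that would look like:
grep -F -f the_ids.txt huge.csv > output_file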
Use grep -f for this:
grep -f the_ids.txt huge.csv > output_file
From man grep:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero
patterns, and therefore matches nothing. (-f is specified by POSIX.)
If you provide some sample input maybe we can even improve the grep condition a little more.
Test
$ cat ids
11
23
55
$ cat huge.csv
hello this is 11 but
nothing else here
and here 23
bye
$ grep -f ids huge.csv
hello this is 11 but
and here 23
grep -f filter.txt data.txt gets unruly when filter.txt is larger than a couple of thousand lines and hence isn't the best choice for such a situation. Even while using grep -f, we need to keep a few things in mind (a combined example follows this list):
use -x option if there is a need to match the entire line in the second file
use -F if the first file has strings, not patterns
use -w to prevent partial matches while not using the -x option
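For example, to match an id only when it is the entire line in data.txt, treating the ids as literal strings:
grep -xFf filter.txt data.txt
or, to stop an id like 11 from matching inside 113 while still allowing it to appear anywhere in a line:
grep -wFf filter.txt data.txt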
This post has a great discussion on this topic (grep -f on large files):
Fastest way to find lines of a file from another larger file in Bash
And this post talks about grep -vf:
grep -vf too slow with large files
In summary, the best way to handle grep -f on large files is:
Matching entire line:
awk 'FNR==NR {hash[$0]; next} $0 in hash' filter.txt data.txt > matching.txt
Matching a particular field in the second file (using ',' delimiter and field 2 in this example):
awk -F, 'FNR==NR {hash[$1]; next} $2 in hash' filter.txt data.txt > matching.txt
and for grep -vf:
Matching entire line:
awk 'FNR==NR {hash[$0]; next} !($0 in hash)' filter.txt data.txt > not_matching.txt
Matching a particular field in the second file (using ',' delimiter and field 2 in this example):
awk -F, 'FNR==NR {hash[$1]; next} !($2 in hash)' filter.txt data.txt > not_matching.txt
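For readers unfamiliar with the FNR==NR idiom used in all four one-liners above, here is the first one written out with comments (behaviour unchanged):
awk '
    FNR == NR { hash[$0]; next }   # first file (filter.txt): store each line as a key, then skip to the next line
    $0 in hash                     # second file (data.txt): print lines whose whole text is a stored key
' filter.txt data.txt > matching.txt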
You may get a significant search speedup with ugrep to match the strings in the_ids.txt in your large huge.csv file:
ugrep -F -f the_ids.txt huge.csv
This works with GNU grep too, but I expect ugrep to run several times faster.
