grep match only lines in a specified range - linux

Is it possible to use grep to match only lines with numbers in a pre-specified range?
For instance I want to list all lines with numbers in the range [1024, 2048] of a log that contain the word 'error'.
I would like to keep the '-n' functionality i.e. have the number of the matched line in the file.

Use sed first:
sed -ne '1024,2048p' file | grep ...
-n says don't print lines; the 'x,yp' address form prints lines x through y inclusive (the p overrides -n).

sed -n '1024,2048{/error/{=;p}}' file | paste - -
Here /error/ is the pattern to match, = prints the line number, and paste - - joins each line number with its line.
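As a runnable sketch (the 3000-line log below is hypothetical test data): you can also keep grep's original -n numbering by numbering first and filtering on the line number afterwards:

```shell
# Build a hypothetical 3000-line log with 'error' on lines 1500 and 2500.
awk 'BEGIN { for (i = 1; i <= 3000; i++)
               if (i == 1500 || i == 2500) print "error here"; else print "ok line" }' > app.log

# Number every match first, then keep only matches whose original
# line number falls in [1024, 2048].
grep -n 'error' app.log | awk -F: '$1 >= 1024 && $1 <= 2048'
# -> 1500:error here
```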

Awk is a good tool for the job:
$ awk 'NR>=1024 && NR<=2048 && /error/ {print NR,$0}' file
In awk the variable NR contains the current line number and $0 contains the line itself.
The benefit of using awk is that you can easily change the output to display however you want. For instance, to separate the line number from the line with a colon followed by a TAB:
$ awk 'NR>=1024 && NR<=2048 && /error/ {print NR,$0}' OFS=':\t' file


grep with two or more words, one line by file with many files

Hello everyone. I have
file 1.log:
text1 value11 text
text text
text2 value12 text
file 2.log:
text1 value21 text
text text
text2 value22 text
I want:
value11;value12
value21;value22
For now I grep the values into separate files and paste them together later into another file, but I think this is not a very elegant solution because I need to read all the files more than once. So I tried to extract all the data with a single cat | grep line, but the result is not what I expected.
I use:
cat *.log | grep -oP "(?<=text1 ).*?(?= )|(?<=text2 ).*?(?= )" | tr '\n' '; '
or
cat *.log | grep -oP "(?<=text1 ).*?(?= )|(?<=text2 ).*?(?= )" | xargs
but I get in each case:
value11;value12;value21;value22
value11 value12 value21 value22
Thank you so much.
Try:
$ awk -v RS='[[:space:]]+' '$0=="text1" || $0=="text2"{getline; printf "%s%s",sep,$0; sep=";"} ENDFILE{if(sep)print""; sep=""}' *.log
value11;value12
value21;value22
For those who prefer their commands spread over multiple lines:
awk -v RS='[[:space:]]+' '
$0=="text1" || $0=="text2" {
getline
printf "%s%s",sep,$0
sep=";"
}
ENDFILE {
if(sep)print""
sep=""
}' *.log
How it works
-v RS='[[:space:]]+'
This tells awk to treat any sequence of whitespace (newlines, blanks, tabs, etc) as a record separator.
$0=="text1" || $0=="text2"{getline; printf "%s%s",sep,$0; sep=";"}
This tells awk to look for records that match either text1 or text2. For those records, and those records only, the commands in curly braces are executed. Those commands are:
getline tells awk to read in the next record.
printf "%s%s",sep,$0 tells awk to print the variable sep followed by the word in the record.
After we print the first match, the command sep=";" is executed which tells awk to set the value of sep to a semicolon.
As we start each file, sep is empty. This means that the first match from any file is printed with no separator preceding it. All subsequent matches from the same file will have a ; to separate them.
ENDFILE{if(sep)print""; sep=""}
After the end of each file is reached, we print a newline if sep is not empty and then we set sep back to an empty string.
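A runnable sketch of the same idea (file names taken from the question): if ENDFILE is not available, running awk once per file lets plain END stand in for it. Note that a regex RS is itself a gawk/mawk extension, not POSIX.

```shell
# sample inputs from the question
printf 'text1 value11 text\ntext text\ntext2 value12 text\n' > 1.log
printf 'text1 value21 text\ntext text\ntext2 value22 text\n' > 2.log

# one awk invocation per file, so END fires at each file's end
for f in 1.log 2.log; do
  awk -v RS='[[:space:]]+' '
    $0 == "text1" || $0 == "text2" { getline; printf "%s%s", sep, $0; sep = ";" }
    END { print "" }' "$f"
done
# -> value11;value12
#    value21;value22
```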
Alternative: Printing the second word if the first word ends with a number
In an alternative interpretation of the question (hat tip: David C. Rankin), we want to print the second word on any line for which the first word ends with a number. In that case, try:
$ awk '$1~/[0-9]$/{printf "%s%s",sep,$2; sep=";"} ENDFILE{if(sep)print""; sep=""}' *.log
value11;value12
value21;value22
In the above, $1~/[0-9]$/ selects the lines for which the first word ends with a number and printf "%s%s",sep,$2 prints the second field on that line.
Discussion
The original command was:
$ cat *.log | grep -oP "(?<=text1 ).*?(?= )|(?<=text2 ).*?(?= )" | tr '\n' '; '
value11;value12;value21;value22;
Note that, when using most unix commands, cat is rarely ever needed. In this case, for example, grep accepts a list of files. So, we could easily do without the extra cat process and get the same output:
$ grep -hoP "(?<=text1 ).*?(?= )|(?<=text2 ).*?(?= )" *.log | tr '\n' '; '
value11;value12;value21;value22;
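In the same spirit, a per-file loop keeps matches from different files on separate lines. This sketch assumes GNU grep (for -P lookbehind) and the sample file names from the question:

```shell
printf 'text1 value11 text\ntext text\ntext2 value12 text\n' > 1.log
printf 'text1 value21 text\ntext text\ntext2 value22 text\n' > 2.log

for f in 1.log 2.log; do
  # join each file's matches with ';' on a single line
  grep -oP '(?<=text[12] )\S+' "$f" | paste -sd';' -
done
# -> value11;value12
#    value21;value22
```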
I agree with @John1024, and how you approach this problem will really depend on what the actual text is that you are looking for. If, for instance, your lines of concern start with text{1,2,...} and what you want in the second field can be anything, then his approach is optimal. However, if the values in the first field can vary and what you are really interested in is records with valueXX in the second field, then an approach keying off the second field may be what you are looking for.
Taking for example your second field: if the text you are interested in has the form valueXX (where XX is two or more digits at the end of the field), you can process only the records whose second field matches, use a simple FNR == 1 conditional to control the ';' delimiter, and use ENDFILE to control the newline, similar to:
awk '$2 ~ /^value[0-9][0-9][0-9]*$/ {
printf "%s%s", (FNR == 1) ? "" : ";", $2
}
ENDFILE {
print ""
}' file1.log file2.log
Example Use/Output
$ awk '$2 ~ /^value[0-9][0-9][0-9]*$/ {
printf "%s%s", (FNR == 1) ? "" : ";", $2
}
ENDFILE {
print ""
}' file1.log file2.log
value11;value12
value21;value22
Look things over and consider your actual input files and then either one of these two approaches should get you there.
If I understood you correctly, you want the values but search for the text[12], i.e. you want the word after the matching search word, not the matching search word itself:
$ awk -v s="^text[12]$" ' # set the search regex *
FNR==1 { # in the beginning of each file
b=b (b==""?"":"\n") # terminate current buffer with a newline
}
{
for(i=1;i<NF;i++) # iterate all but last word
if($i~s) # if current word matches search pattern
b=b (b~/^$|\n$/?"":";") $(i+1) # add following word to buffer
}
END { # after searching all files
print b # output buffer
}' *.log
Output:
value11;value12
value21;value22
* regex could be for example ^(text1|text2)$, too.

Write a file using AWK on linux

I have a file that has several lines of which one line is
-xxxxxxxx()xxxxxxxx
I want to add the contents of this line to a new file
I did this :
awk ' /^-/ {system("echo" $0 ">" "newline.txt")} '
but this does not work , it returns an error that says :
Unexpected token '('
I believe this is due to the () present in the line. How to overcome this issue?
You need to add proper spaces!
With your erroneous awk ' /^-/ {system("echo" $0 ">" "newline.txt")} ', the shell command is essentially echo-xxxxxxxx()xxxxxxxx>newline.txt, which surely doesn't work. You need to construct a proper shell command inside the awk string and obey awk's string-concatenation rules, i.e. your intended script should look like this (which is still broken, because $0 is not properly quoted in the resulting shell command):
awk '/^-/ { system("echo " $0 " > newline.txt") }'
However, if you really just need to echo $0 into a file, you can simply do:
awk '/^-/ { print $0 > "newline.txt" }'
Or even more simply
awk '/^-/' file > newline.txt
This applies the default action to all records matching /^-/; the default action is print, which prints the current record. In other words, this script simply filters out the desired records, and the > newline.txt redirection outside awk puts them into a file.
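A quick sanity check of the filter-only version (sample line from the question, surrounding lines made up):

```shell
printf 'abc\n-xxxxxxxx()xxxxxxxx\ndef\n' > input.txt
awk '/^-/' input.txt > newline.txt
cat newline.txt   # -> -xxxxxxxx()xxxxxxxx
```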
You don't need the system, echo commands, simply:
awk '/^-/ {print $1}' file > newfile
This will capture lines starting with - and truncate the rest if there's a space.
awk '/^-/ {print $0}' file > newfile
Would capture the entire line including spaces.
You could use grep also:
grep -o '^-.*' file > newfile
Captures any lines starting with -
grep -o '^-.*().*' file > newfile
Would be more specific and capture lines starting with - also containing ()
First of all, for simple extraction of patterns from a file you do not need awk; it is overkill, and grep is more than enough for the task:
INPUT:
$ more file
123
-xxxxxxxx()xxxxxxxx
abc
-xyxyxxux()xxuxxuxx
123
abc
123
command:
$ grep -oE '^-[^(]+\(\).*' file
-xxxxxxxx()xxxxxxxx
-xyxyxxux()xxuxxuxx
explanations:
Options: -o outputs only the matched part instead of the whole line (can be removed), and -E enables extended regular expressions.
Regex: ^-[^(]+\(\).* selects lines that start with - and contain ()
You can redirect your output to a new_file by adding > new_file at the end of your command.

Get first word of match of last line

I want to parse through a log file formatted like this:
INFO: Successfully received REQUEST_ID: 1111 from 164.12.1.11
INFO: Successfully received REQUEST_ID: 2222 from 164.12.2.22
ERROR: Some error
INFO: Successfully received REQUEST_ID: 3333 from 164.12.3.33
INFO: Successfully received REQUEST_ID: 4444 from 164.12.4.44
WARNING: Some warning
INFO: Some other info
I want a script that outputs 4444. So extract the next word after ^.*REQUEST_ID: from the last line that contains the pattern ^.*REQUEST_ID.
What I have so far:
ID=$(sed -n -e 's/^.*REQUEST_ID: //p' $logfile | tail -n 1)
For lines that match the pattern, it deletes the text up to and including the match, leaving only the text after it, and prints that. Then I tail it to get the last line. How do I make it so it only prints the first word?
And is there a more efficient way of doing this then having it piped to tail?
With awk:
awk '
$4 ~ /REQUEST_ID:/{val=$5}
END {print val}
' file.csv
$4 ~ /REQUEST_ID:/ : Match lines in which Field # 4 match REQUEST_ID:.
{val=$5} : Store the value of field 5 in the variable val.
END {print val} : On closing the file, print the last value stored.
I have used a regex match to allow for some variance on the string, and yet get a match. A more lenient match will be (a match at any place of the line):
awk ' /REQUEST_ID/ {val=$5}
END {print val}
' file.csv
If you value (or need) more speed than robustness, then use (Quoting needed):
awk '
$4 == "REQUEST_ID:" {val=$5}
END {print val}
' file.csv
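A runnable check of the regex-match variant against the sample log from the question (the stricter == form gives the same result here; the file name file.csv is taken from the answer):

```shell
cat > file.csv <<'EOF'
INFO: Successfully received REQUEST_ID: 1111 from 164.12.1.11
INFO: Successfully received REQUEST_ID: 2222 from 164.12.2.22
ERROR: Some error
INFO: Successfully received REQUEST_ID: 3333 from 164.12.3.33
INFO: Successfully received REQUEST_ID: 4444 from 164.12.4.44
WARNING: Some warning
INFO: Some other info
EOF

awk '$4 ~ /REQUEST_ID:/ { val = $5 } END { print val }' file.csv   # -> 4444
```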
With GNU sed:
sed -nE 's/.* REQUEST_ID: ([0-9]+) .*/\1/p' file | tail -n 1
Output:
4444
With GNU grep:
grep -Po 'REQUEST_ID: \K[0-9]+' file | tail -n 1
Output:
4444
-P: Interpret PATTERN as a Perl regular expression.
-o: Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
\K: Drop everything matched so far, so only what follows is reported as the match.
sed '/^.*REQUEST_ID: \([0-9]\{1,\}\) .*/ {s//\1/;h;}
$!d
x' ${logfile}
POSIX version.
It prints an empty line if there is no occurrence, otherwise the next word (assumed here to be a number).
Principle:
if the line contains REQUEST_ID, extract the following number
put it in the hold buffer
if this is not the last line, delete the current content (and cycle to the next line)
on the last line, exchange in the hold buffer (and print it, ending the cycle)
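A runnable check of the hold-buffer script (the log content is a shortened version of the question's sample; the file name app.log is assumed):

```shell
cat > app.log <<'EOF'
INFO: Successfully received REQUEST_ID: 1111 from 164.12.1.11
ERROR: Some error
INFO: Successfully received REQUEST_ID: 4444 from 164.12.4.44
INFO: Some other info
EOF

sed '/^.*REQUEST_ID: \([0-9]\{1,\}\) .*/ {s//\1/;h;}
$!d
x' app.log   # -> 4444
```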
You can match the number and replace the line with that value, suppressing non-matching lines and keeping the last match:
sed -n 's/^.*REQUEST_ID: \([0-9]*\).*$/\1/p' "$logfile" | tail -n 1
Print field where line and column meet.
awk 'FNR == 5 {print $5}' file
4444
Another awk alternative if you don't know the position of the search word.
tac file | awk '{for(i=1;i<NF;i++) if($i=="REQUEST_ID:") {print $(i+1);exit}}'
yet another one, without looping:
tac file | awk -vRS=" " 'n{print;exit} /REQUEST_ID:/{n=1}'

Add text between two patterns in File using sed command

I want to add Some large code between two patterns:
File1.txt
This is text to be inserted into the File.
infile.txt
Some Text here
First
Second
Some Text here
I want to add File1.txt content between First and Second :
Desired Output:
Some Text here
First
This is text to be inserted into the File.
Second
Some Text here
I can search using two patterns with sed command ,But I don't have idea how do I add content between them.
sed '/First/,/Second/!d' infile
Since the r command stands for reading in a file, use:
sed '/First/r file1.txt' infile.txt
You can find some info here: Reading in a file with the 'r' command.
Add -i (that is, sed -i '/First/r file1.txt' infile.txt) for in-place editing.
To perform this action no matter the case of the characters, use the I mark as suggested in Use sed with ignore case while adding text before some pattern:
sed 's/first/last/Ig' file
As indicated in comments, the above solution is just printing a given string after a pattern, without taking into consideration the second pattern.
To do so, I'd go for an awk with a flag:
awk -v data="$(<patt_file)" '/First/ {f=1} /Second/ && f {print data; f=0}1' file
Given these files:
$ cat patt_file
This is text to be inserted
$ cat file
Some Text here
First
First
Second
Some Text here
First
Bar
Let's run the command:
$ awk -v data="$(<patt_file)" '/First/ {f=1} /Second/ && f {print data; f=0}1' file
Some Text here
First # <--- no line appended here
First
This is text to be inserted # <--- line appended here
Second
Some Text here
First # <--- no line appended here
Bar
I think you can try this:
$ sed -n 'H;${x;s/Second.*\n/This is text to be inserted into the File\
&/;p;}' infile.txt
awk flavor:
awk '/First/ { print $0; getline < "File1.txt" }1' File2.txt
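A quick check of this getline variant with the question's data (file names as in the answer):

```shell
printf 'This is text to be inserted into the File.\n' > File1.txt
printf 'Some Text here\nFirst\nSecond\nSome Text here\n' > File2.txt

# print the matching line, then pull the next line of File1.txt into $0;
# the trailing 1 prints every (possibly replaced) record
awk '/First/ { print $0; getline < "File1.txt" }1' File2.txt
```

Note that getline < "File1.txt" reads one line per match, so a multi-line insert file is consumed one line per occurrence of First, not re-read each time.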
Here's a chunk of bash code that I wrote to insert a pattern from patt_file. Essentially, I had to delete some repetitious data using uniq and then add some stuff back in. I copy the stuff I need to put back in using lineNum values, save it to patt_file, and then match patMatch in the file I'm adding the stuff to.
#This pulls the line number from row k, column 2 of the reduced repetitious file
lineNum1=$(awk -v i=$k -v j=2 'FNR == i {print $j}' test.txt)
#This pulls the line number from row k + 1, column 2 of the reduced repetitious file
lineNum2=$(awk -v i=$((k+1)) -v j=2 'FNR == i {print $j}' test.txt)
#This prints fields 2 and 3 of row k, tab-separated (important), from the reduced repetitious file
awk -v i=$k -v j=2 -v h=3 'FNR == i {print $j"	"$h}' test.txt > closeJ.txt
#This substitutes every period (dot) with \. so that sed will match it literally
patMatch=$(sed 's/\./\\./g' closeJ.txt)
#This selects the text in the full data file between lineNum1 and lineNum2 and copies it to a file
awk -v awkVar1=$((lineNum1 + 1)) -v awkVar2=$((lineNum2 - 1)) 'NR >= awkVar1 && NR <= awkVar2 { print }' nice.txt > patt_file.txt
#This inserts the contents of the pattern-matched file into the reduced repetitious file
#The reduced repetitious file will now grow
sed -i.bak "/$patMatch/ r patt_file.txt" test.txt

UNIX Shell Script remove one column from the file

I have a file like the following:
Header1:value1|value2|value3|
Header2:value4|value5|value6|
The column number is unknown and I have a function which can return the column number.
And I want to write a script which can remove one column from the file. For exampple, after removing column 1, I will get:
Header1:value2|value3|
Header2:value5|value6|
I use cut to achieve this, and so far I can produce the values after removing one column, but without the headers. For example:
value2|value3|
value5|value6|
Could anyone tell me how can I add headers back? Or any command can do that directly? Thanks.
Replace the colon with a pipe, do your cut command, then replace the first pipe with a colon again:
sed 's/:/|/' input.txt | cut ... | sed 's/|/:/'
You may need to adjust the column number for the cut command, to ensure you don't count the header.
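Concretely, to remove the first data column (which becomes field 2 once the ':' has been turned into a '|'), a sketch using the sample lines from the question:

```shell
printf 'Header1:value1|value2|value3|\nHeader2:value4|value5|value6|\n' > input.txt

sed 's/:/|/' input.txt | cut -d'|' -f1,3- | sed 's/|/:/'
# -> Header1:value2|value3|
#    Header2:value5|value6|
```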
Turn the ':' into '|', so that the header is another field, rather than part of the first field. You can do that either in whatever generates the data to begin with, or by passing the data through tr ':' '|' before cut. The rest of your fields will be offset by +1 then, but that should be easy enough to compensate for.
Your problem is that HeaderX is followed by ':', which is not the '|' delimiter you use in cut.
You could first split your lines into two parts at the ':', with something like
cut -f 1 --delimiter=: YOURFILE, then remove the first column, and then put the headers back.
awk can handle multiple delimiters. So another alternative is...
jkern#ubuntu:~/scratch$ cat ./data188
Header1:value1|value2|value3|
Header2:value4|value5|value6|
jkern#ubuntu:~/scratch$ awk -F"[:|]" '{ print $1 $3 $4 }' ./data188
Header1value2value3
Header2value5value6
you can do it just with sed without cut:
sed 's/:[^|]*|/:/' input.txt
My solution:
$ sed 's,:,|,' data | awk -F'|' 'BEGIN{OFS="|"}{$2=""; print}' | sed 's,||,:,'
Header1:value2|value3|
Header2:value5|value6|
replace : with |
-F'|' tells awk to use | symbol as field separator
in each line we replace the 2nd field (because the header has now become the first) with an empty string and print the resulting line with the new field separator (|)
return back header by replacing first | with :
Not perfect, but it should work.
$ grep 'Header1' file.txt | awk -F'|' '{ print $1, $2, $3 }'
This will print all the values in separate columns. You can print any number of columns.
Just chiming in with a Perl solution:
(rearrange/remove fields as needed)
-l effectively adds a newline to every print statement
-a autosplit mode splits each line using the -F expression into array @F
-n adds a loop around the -e code
-e your 'one liner' follows this option
$ perl -F'[:|]' -lane 'print "$F[0]:$F[1]|$F[2]|$F[3]"' input.txt
