Output text between two PATTERNS if text matches a CRITERIA [duplicate] - linux

This question already has answers here:
grep lines matching a pattern, and the lines before and after the matching until different pattern
(4 answers)
Closed 4 years ago.
I want to select blocks between PATTERN1 and PATTERN2 if text inside the block contains CRITERIA, otherwise discard the whole block.
Sample task:
Select text between PATTERN1='start' and PATTERN2='end'; if some text between 'start' and 'end' matches CRITERIA='DCE', output the whole block from 'start' to 'end'.
Sample input:
start
123
ABC
123
end
start
123
DCE
123
end
start
123
EFG
123
end
Sample output:
start
123
DCE
123
end
I've tried the following using awk, but couldn't work out how to apply CRITERIA to the text between the two patterns:
awk '/start/,/end/' input_file

EDIT: As per the OP, the Input_file may have a match in its last block, and that block may not have an end string, so here is code that handles that case too.
awk '
/start/{
if(val) { print value };
flag=1;
value=val=""}
/[dD][cC][eE]/ && flag { val=1 }
flag{
value=value?value ORS $0:$0
}
/end/ { flag="" }
END{
if(val) { print value }}
' Input_file
Explanation:
awk '
/start/{ ##Looking for string start in a line; if found, do the following.
if(val) { print value }; ##If variable val is NOT NULL, the previous block matched, so print variable value.
flag=1; ##Setting variable flag to 1 here.
value=val=""} ##Nullifying variables value and val here.
/[dD][cC][eE]/ && flag { val=1 } ##Searching for DCE in any letter case; if variable flag is NOT NULL, setting variable val to 1.
flag{ ##Checking if variable flag is SET (NOT NULL) here.
value=value?value ORS $0:$0 ##Appending the current line to variable value, separated by ORS.
}
/end/ { flag="" } ##Searching for string end in the current line; if found, nullifying flag (the end line itself was already appended above).
END{ ##Starting the END block of awk here.
if(val) { print value }} ##If variable val is NOT NULL, the last block matched, so print variable value.
' Input_file
Could you please try the following awk and let me know if it helps.
awk '/start/{if(val){print value};flag=1;value=val=""} /[dD][cC][eE]/ && flag{val=1} /end/{flag=""} {value=value?value ORS $0:$0}' Input_file
Here is a non-one-liner form of the solution too.
awk '
/start/{
if(val) { print value };
flag=1;
value=val=""}
/[dD][cC][eE]/ && flag{ val=1 }
/end/ { flag="" }
{
value=value?value ORS $0:$0
}
' Input_file
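A parameterized sketch of the same buffering idea, in case it needs to be reused: the shell function name blocks_matching and its argument order are hypothetical, the patterns are passed in with -v instead of being hard-coded, and case-insensitivity uses tolower() instead of a bracket expression such as [dD][cC][eE]. It assumes CRITERIA is a simple regex whose meaning does not change when lowercased (a range such as [A-Z] would).

```shell
# blocks_matching START END CRITERIA  (hypothetical helper)
# Reads stdin and prints every START..END block, both lines included, that
# contains a line matching the regex CRITERIA, ignoring letter case.
blocks_matching() {
    awk -v s="$1" -v e="$2" -v c="$3" '
        $0 ~ s { inblk = 1; buf = ""; hit = 0 }      # a new block starts
        inblk {
            buf = (buf == "" ? $0 : buf ORS $0)      # buffer every line of the block
            if (tolower($0) ~ tolower(c)) hit = 1    # criteria seen somewhere in it
        }
        inblk && $0 ~ e {                            # the block ends on this line
            if (hit) print buf
            inblk = 0
        }
        END { if (inblk && hit) print buf }          # last block had no end line
    '
}
```

Usage: blocks_matching start end DCE < Input_file. Note that -v interprets backslash escapes in its value, so a CRITERIA containing backslashes needs them doubled.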

Since the start-end blocks are separated by empty lines, you can use those to separate records instead (RS="" is awk's paragraph mode). Here with awk:
$ awk 'BEGIN{RS=""}/DCE/' file
start
123
DCE
123
end
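Edit: Since the empty lines were not there after all, let's separate the records on end instead (a multi-character RS works in GNU awk and mawk, but its behavior is unspecified by POSIX):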
$ awk 'BEGIN{RS=ORS="end\n"}/DCE/' file
start
123
DCE
123
end
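If the blocks really are separated by blank lines, paragraph mode (RS="") is specified by POSIX, whereas a multi-character RS is not. A minimal sketch with a hypothetical function name, grep_paragraphs; setting ORS to two newlines keeps a blank line between the printed blocks:

```shell
# grep_paragraphs PATTERN  (hypothetical helper)
# Prints every blank-line-separated paragraph on stdin that matches PATTERN.
grep_paragraphs() {
    # RS = "" makes each run of non-blank lines a single record
    awk -v pat="$1" 'BEGIN { RS = ""; ORS = "\n\n" } $0 ~ pat'
}
```

Usage: grep_paragraphs DCE < file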

Related

Replace each nth occurrence of 'foo' and 'bar' with the respective nth line of a supplied file, taking the values from two distinct columns

I have a source.txt file like the one below, containing two columns of data. Each column value in source.txt is enclosed in [ ] (square brackets), as shown:
[hot] [water]
[16] [boots and, juice]
I also have a target.txt file that contains empty lines and has a full stop at the end of each line:
the weather is today (foo) but we still have (bar).
= (
the next bus leaves at (foo) pm, we can't forget to take the (bar).
I want to replace foo in each nth line of target.txt with the respective content of the first column of source.txt, and also replace bar in each nth line of target.txt with the respective content of the second column of source.txt.
I tried searching other sources to understand how to do it. I already have a command that I use to "replace each nth occurrence of 'foo' by the numerically respective nth line of a supplied file", but I couldn't adapt it:
awk 'NR==FNR {a[NR]=$0; next} /foo/{gsub("foo", a[++i])} 1' source.txt target.txt > output.txt;
I remember seeing a way to use gsub with two columns of data, but I don't remember exactly what the difference was.
EDIT: The target.txt text sometimes contains the symbols =, ( and ). I added them to the sample because some answers do not work when these symbols are present in target.txt.
Note: the number of lines in target.txt, and therefore the number of occurrences of foo and bar, can vary; I just showed a sample. However, foo and bar each occur only once per line.
With your shown samples, please try the following answer. Written and tested in GNU awk.
awk -F'\\[|\\] \\[|\\]' '
FNR==NR{
foo[FNR]=$2
bar[FNR]=$3
next
}
NF{
gsub(/\<foo\>/,foo[++count])
gsub(/\<bar\>/,bar[count])
}
1
' source.txt FS=" " target.txt
Explanation: Adding a detailed explanation for the above.
awk -F'\\[|\\] \\[|\\]' ' ##Setting field separator as [ OR ] [ OR ] here.
FNR==NR{ ##Checking condition FNR==NR, which is TRUE only while source.txt is being read.
foo[FNR]=$2 ##Creating foo array with index of FNR and value of 2nd field here.
bar[FNR]=$3 ##Creating bar array with index of FNR and value of 3rd field here.
next ##next will skip all further statements from here.
}
NF{ ##If line is NOT empty then do following.
gsub(/\<foo\>/,foo[++count]) ##Globally substituting foo with array foo value, whose index is count.
gsub(/\<bar\>/,bar[count]) ##Globally substituting bar with array of bar with index of count.
}
1 ##printing line here.
' source.txt FS=" " target.txt ##Mentioning Input_files names here.
EDIT: Adding the following solution as well, which handles any number of [...] occurrences in source.txt and matches them in the target file. Since the OP confirmed in comments that it works, I am adding it here. Fair warning: it will fail when source.txt contains a &.
awk '
FNR==NR{
while(match($0,/\[[^]]*\]/)){
arr[++count]=substr($0,RSTART+1,RLENGTH-2)
$0=substr($0,RSTART+RLENGTH)
}
next
}
{
line=$0
while(match(line,/\(?[[:space:]]*(\<foo\>|\<bar\>)[[:space:]]*\)?/)){
val=substr(line,RSTART,RLENGTH)
sub(val,arr[++count1])
line=substr(line,RSTART+RLENGTH)
}
}
1
' source.txt target.txt
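One way to sidestep the & caveat, and regex metacharacters in text handed to sub(), is literal string replacement with index() and substr() instead of sub()/gsub(). A sketch, assuming the tags appear literally as (foo) and (bar), that the parentheses should be replaced too, and that only lines containing a tag consume a row of source.txt (so a line like "= (" is passed through); fill_tags and replace_first are hypothetical names:

```shell
# fill_tags SOURCE TARGET  (hypothetical helper)
# For the n-th TARGET line containing "(foo)" or "(bar)", replace "(foo)" with
# column 1 and "(bar)" with column 2 of SOURCE line n.
fill_tags() {
    awk '
        # Replace the first literal occurrence of old in str with rep.
        # index()/substr() do plain string matching, so regex metacharacters
        # in the tag and "&" in the replacement text need no escaping.
        function replace_first(str, old, rep,    i) {
            i = index(str, old)
            if (i == 0) return str
            return substr(str, 1, i - 1) rep substr(str, i + length(old))
        }
        BEGIN { FS = "[][]" }                        # "[a] [b]" gives $2 = "a", $4 = "b"
        NR == FNR { foo[FNR] = $2; bar[FNR] = $4; next }
        index($0, "(foo)") || index($0, "(bar)") {   # only tagged lines consume a source row
            n++
            $0 = replace_first($0, "(foo)", foo[n])
            $0 = replace_first($0, "(bar)", bar[n])
        }
        { print }
    ' "$1" "$2"
}
```

Usage: fill_tags source.txt target.txt > output.txt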
Using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN {
FS="[][]"
tags["foo"]
tags["bar"]
}
NR==FNR {
map["foo",NR] = $2
map["bar",NR] = $4
next
}
{
found = 0
head = ""
while ( match($0,/\([^)]+)/) ) {
tag = substr($0,RSTART+1,RLENGTH-2)
if ( tag in tags ) {
if ( !found++ ) {
lineNr++
}
val = map[tag,lineNr]
}
else {
val = substr($0,RSTART,RLENGTH)
}
head = head substr($0,1,RSTART-1) val
$0 = substr($0,RSTART+RLENGTH)
}
print head $0
}
$ awk -f tst.awk source.txt target.txt
the weather is today hot but we still have water.
= (
the next bus leaves at 16 pm, we can't forget to take the boots and, juice.
awk '
NR==FNR { # build lookup
# delete gumph
gsub(/(^[[:space:]]*\[)|(\][[:space:]]*$)/, "")
# split
split($0, a, /\][[:space:]]+\[/)
# store
foo[FNR] = a[1]
bar[FNR] = a[2]
next
}
!/[^[:space:]]/ { next } # ignore blank lines
{ # do replacements
VFNR++ # FNR - (ignored lines)
# can use sub if foo/bar only appear once
gsub(/\<foo\>/, foo[VFNR])
gsub(/\<bar\>/, bar[VFNR])
print
}
' source.txt target.txt
Note: \< and \> are not in POSIX, but some versions of awk (e.g. gawk) accept them. POSIX EREs have no word-boundary operator.
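A portable workaround for the missing word boundary is to find each literal occurrence with index() and check the neighbouring characters by hand. A sketch, assuming "word characters" means [A-Za-z0-9_] (roughly what gawk's \< and \> use, minus locale-specific letters) and that the searched word consists only of such characters; replace_words and replace_word are hypothetical names:

```shell
# replace_words WORD REPLACEMENT  (hypothetical helper)
# Replaces whole-word occurrences of WORD on each stdin line.
replace_words() {
    awk -v word="$1" -v repl="$2" '
        # A match counts as a whole word when the characters on both sides
        # (if any) are not in [A-Za-z0-9_].
        function replace_word(str, w, rep,    out, i, before, after) {
            if (w == "") return str
            out = ""
            while ((i = index(str, w)) > 0) {
                if (i > 1) before = substr(str, i - 1, 1)
                else if (out != "") before = substr(out, length(out), 1)
                else before = ""
                after = substr(str, i + length(w), 1)
                if (before !~ /[A-Za-z0-9_]/ && after !~ /[A-Za-z0-9_]/)
                    out = out substr(str, 1, i - 1) rep               # whole word: replace
                else
                    out = out substr(str, 1, i - 1 + length(w))       # inside a longer word: keep
                str = substr(str, i + length(w))
            }
            return out str
        }
        { print replace_word($0, word, repl) }
    '
}
```

For example, replace_words foo X turns "foo food (foo) foo_bar xfoo foo" into "X food (X) foo_bar xfoo X".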

Combining rows of data

File.csv
1234,1
6789,1
I'm trying to transform the file above into the output below:
1234,1
6789,1
I'm looking to merge rows using an array or a loop.
Could you please try the following, written and tested with the shown samples in GNU awk.
awk '
BEGIN{
FS=OFS=","
}
{
sub(/ +$/,"")
first=$1
sub(/^[^,]*,/,"")
arr[first]=(arr[first]?arr[first] OFS:"")$0
}
END{
for(i in arr){
print i,arr[i]
}
}' Input_file
Explanation: Adding detailed explanation for above solution:
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
FS=OFS="," ##Setting field separator and output field separator as , here.
}
{
sub(/ +$/,"") ##Removing trailing spaces from the line, since the OP samples have them.
first=$1 ##Setting $1 value to first variable here.
sub(/^[^,]*,/,"") ##Substituting everything till first , with NULL here.
arr[first]=(arr[first]?arr[first] OFS:"")$0 ##Creating array arr with index of first and keep on adding values to it.
}
END{ ##Starting END block of this awk program from here.
for(i in arr){ ##Traversing through arr here for all elements here.
print i,arr[i] ##Printing i and value of arr with index of i here.
}
}' Input_file ##Mentioning Input_file name here.
One way, using a perl one-liner:
$ perl -F, -lanE '
push @{$g{$F[0]}}, @F[1..$#F];
END { print join(",", $_, $g{$_}->@*) for (sort { $a <=> $b } keys %g) }
' input.csv
1234,1,5,No,4,1,Not Applicable,2,5,6,8,6,1,3
6789,1,5,No,4,1,Not Applicable,2,5,6,8,6,1,3
Splits each line on commas, appends all fields after the first to an array stored in a hash keyed on the first field, and then prints all the combined lines sorted numerically by key.

Find a pattern and replace

This is the input in my file:
Number : 123
PID : IIT/123/Dakota
The expected output is:
Number : 111
PID : IIT/111/Dakota
I want to replace 123 with 111. To solve this, I have tried the following:
awk '/Number/{$NF=111} 1' log.txt
awk -F '[/]' '/PID/{$2="123"} 1' log.txt
Why not use sed for something this simple?
Print the change to the screen (test with this):
sed -e 's:123:111:g' f2.txt
Update the file in place (with this; a bare -i is GNU sed, while BSD/macOS sed needs -i ''):
sed -i 's:123:111:g' f2.txt
Example:
$ sed -i 's:123:111:g' f2.txt
$ cat f2.txt
Number : 111
PID : IIT/111/Dakota
EDIT2: Or, if you want to substitute 123 with 111 on every line without checking any condition (unlike your awk attempts), simply do:
awk '{sub(/123/,"111")} 1' Input_file
Change sub to gsub in case 123 can occur more than once in a single line.
Explanation of the new_value awk code:
awk -v new_value="111" ' ##Creating an awk variable named new_value that holds the new value the OP wants in the line.
/^Number/ { $NF=new_value } ##Checking if a line starts with string Number; if so, setting its last field to variable new_value.
/^PID/ { num=split($NF,array,"/"); ##If a line starts with PID, splitting its last field on / into an array named array.
array[2]=new_value; ##Setting the second item of array to variable new_value here.
for(i=1;i<=num;i++){ val=val?val "/" array[i]:array[i] }; ##Looping over array and joining its items with / into variable val, to re-create the last field of the current line.
$NF=val; ##Setting last field value to variable val here.
val="" ##Nullifying variable val here.
}
1' Input_file ##Mentioning 1 to print the line and mentioning Input_file name here too.
EDIT: In case you need to keep the / in your output too, use the following awk.
awk -v new_value="111" '
/^Number/ { $NF=new_value }
/^PID/ { num=split($NF,array,"/");
array[2]=new_value;
for(i=1;i<=num;i++){ val=val?val "/" array[i]:array[i] };
$NF=val;
val=""
}
1' Input_file
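A shorter alternative to the split-and-rejoin loop, as a sketch: sub() can be applied directly to $NF so that only the first /.../ component changes. set_number is a hypothetical name; it assumes the PID value contains at least two slashes, and, like the code above, assigning to a field rebuilds the line with single spaces.

```shell
# set_number VALUE  (hypothetical helper)
# Sets the number on Number lines and inside IIT/<number>/... on PID lines.
set_number() {
    awk -v v="$1" '
        /^Number/ { $NF = v }                             # last field is the number
        /^PID/    { sub("/[^/]*/", "/" v "/", $NF) }      # replace the first /.../ part
        1'
}
```

Usage: set_number 111 < Input_file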
The following awk may help you here. (It seems your sample input changed a bit after I applied code tags to it, so I have edited my code accordingly.)
awk -F"[ /]" -v new_value="111" '/^Number/{$NF=new_value} /^PID/{$(NF-1)=new_value}1' Input_file
In case you want to save the changes into Input_file itself, append > temp_file && mv temp_file Input_file to the above code.
Explanation:
awk -F"[ /]" -v new_value="111" ' ##Setting the field separator to a space or / and creating awk variable new_value, holding the new value the OP wants.
/^Number/{ $NF=new_value } ##Checking if a line starts with string Number; if so, changing its last field to the new_value value.
/^PID/ { $(NF-1)=new_value } ##Checking if a line starts with string PID; if so, setting its second-last field to variable new_value.
1 ##awk works on condition-then-action, so 1 makes the condition TRUE and, with no action given, the default action prints the current line.
' Input_file ##Mentioning Input_file name here.
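For the save-in-place step, a sketch of a helper that does not rely on sed -i: it writes to a temporary file from mktemp (assumed to be available) and copies the result back with cat, which keeps the original file's inode and permissions, whereas mv would replace them with the temporary file's. replace_in_place is a hypothetical name; OLD is treated as an awk regex and & in NEW is special, exactly as with gsub().

```shell
# replace_in_place FILE OLD NEW  (hypothetical helper)
# Portable stand-in for `sed -i`: OLD is an awk regex, and & in NEW means
# "the matched text", exactly as in gsub().
replace_in_place() {
    file=$1 old=$2 repl=$3
    tmp=$(mktemp) || return 1
    if awk -v old="$old" -v repl="$repl" '{ gsub(old, repl) } 1' "$file" > "$tmp"; then
        cat "$tmp" > "$file"    # write back over the same file: keeps its inode and permissions
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

Usage: replace_in_place log.txt 123 111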

move lines into a file by number of columns using awk

I have a sample file with '||o||' as field separator.
www.google.org||o||srScSG2C5tg=||o||bngwq
farhansingla.it||o||4sQVj09gpls=||o||
ngascash||o||||o||
ms-bronze.com.br||o||||o||
I want to move the lines with only 1 field into 1.txt and those with more than 1 field into not_1.txt. I am using the following command:
sed 's/\(||o||\)\+$//g' sample.txt | awk -F '[|][|]o[|][|]' '{if (NF == 1) print > "1.txt"; else print > "not_1.txt" }'
The problem is that it moves the modified lines rather than the original ones.
The output I am getting is (not_1.txt):
td@the-end.org||o||srScSG2C5tg=||o||bnm
erba01@tiscali.it||o||4sQVj09gpls=
1.txt:
ngas
ms-inside@bol.com.br
As you can see, the original lines are modified. I don't want to modify the lines.
Any help would be highly appreciated.
Awk solution:
awk -F '[|][|]o[|][|]' \
'{
c = 0;
for (i=1; i<=NF; i++) if ($i != "") c++;
print > ((c == 1 ? "1" : "not_1") ".txt")
}' sample.txt
Results:
$ head 1.txt not_1.txt
==> 1.txt <==
ngascash||o||||o||
ms-bronze.com.br||o||||o||
==> not_1.txt <==
www.google.org||o||srScSG2C5tg=||o||bngwq
farhansingla.it||o||4sQVj09gpls=||o||
The following awk may help you with this.
awk -F'\\|\\|o\\|\\|' '{for(i=1;i<=NF;i++){count=$i?++count:count};if(count==1){print > "1_field_only"};if(count>1){print > "not_1_field"};count=""}' Input_file
Here is a non-one-liner form of the solution too.
awk -F'\\|\\|o\\|\\|' '
{
for(i=1;i<=NF;i++){ count=$i?++count:count };
if(count==1) { print > "1_field_only" };
if(count>1) { print > "not_1_field" };
count=""
}
' Input_file
Explanation of the above code:
awk -F'\\|\\|o\\|\\|' ' ##Setting the field separator to ||o||, escaping each | so that it is taken as a literal character.
{
for(i=1;i<=NF;i++){ count=$i?++count:count }; ##Looping through all fields and incrementing variable count for each field that is NOT null.
if(count==1) { print > "1_field_only" }; ##If count is 1, the line has only one non-empty field, so printing the current line into the 1_field_only file.
if(count>1) { print > "not_1_field" }; ##If count is more than 1, printing the current line into the output file named not_1_field.
count="" ##Nullifying variable count here.
}
' Input_file ##Mentioning Input_file name here.
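The question's own idea (strip the trailing empty fields, then count) also works if the stripping is done on a copy of the line and the untouched $0 is printed. A sketch; split_by_fields is a hypothetical name, output goes to 1.txt and not_1.txt in the current directory, and blank lines are dropped:

```shell
# split_by_fields  (hypothetical helper)
# Reads ||o||-separated lines on stdin; writes lines with exactly one field
# (after trailing empty fields are ignored) to 1.txt, the rest to not_1.txt.
split_by_fields() {
    awk -v sep='[|][|]o[|][|]' '
    {
        line = $0
        sub("(" sep ")+$", "", line)     # strip trailing separators on a COPY of the line
        n = split(line, parts, sep)      # count the fields that remain
        # print the untouched original record, which is what the question wants
        if (n == 1) print > "1.txt"; else if (n > 1) print > "not_1.txt"
    }'
}
```

Usage: split_by_fields < sample.txt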

AWK remove blank lines

The /./ removes blank lines only for the first condition { print "a"$0 }. How do I make the script remove blank lines for every condition?
awk -F, '/./ { print "a"$0 } NR!=1 { print "b"$0 } { print "c"$0 } END { print "d"$0 }' MyFile
A shorter form of the already proposed answer could be the following:
awk NF file
Any awk script follows the syntax condition {statement}. If the statement block is omitted, awk prints the whole record (line) whenever the condition is true (non-zero or non-empty).
The NF variable holds the number of fields in the line. When the line is non-empty, NF holds a positive value, which triggers the default awk action (print the whole line). For an empty line, NF is zero, so the condition is not met and awk does nothing.
Note that you don't even need quotes, because this two-letter awk script contains no spaces or characters that the shell would interpret.
or
awk '!/^$/' file
^$ is the regex for an empty line. The two slashes tell awk that the string is a regex, and ! is the standard negation.
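The NF and !/^$/ filters differ on lines that contain only spaces or tabs: with the default FS such a line has NF == 0, but it does not match ^$. A small sketch to compare them (both function names are hypothetical):

```shell
# Hypothetical helpers wrapping the two filters.
drop_blank_nf()    { awk NF; }          # drops empty AND whitespace-only lines
drop_blank_empty() { awk '!/^$/'; }     # drops only truly empty lines
```

For the input "a", "", "  ", "b", drop_blank_nf prints a and b, while drop_blank_empty also keeps the line holding two spaces.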
Awk command to remove blank lines from a file:
awk 'NF > 0' filename
If you want to ignore all blank lines, put this at the beginning of the script:
/^$/ {next}
Put the following conditions inside the first one and check them with if statements, like this:
awk -F, '
/./ {
print "a"$0;
if (NR!=1) { print "b"$0 }
print "c"$0
}
END { print "d"$0 }
' MyFile
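A variation on the next-based approach, as a sketch: one skip rule at the top applies to every rule after it, but NR still counts the skipped lines, so a separate counter n is used for the "not the first line" test. The last non-blank line is also saved for END, because in gawk and most modern awks $0 in END holds the final record even when that record was blank. label_lines is a hypothetical name, and the blank test assumes spaces and tabs are the only whitespace of interest.

```shell
# label_lines  (hypothetical helper)
# Applies the question's a/b/c/d rules to non-blank lines of stdin only.
label_lines() {
    awk -F, '
        /^[ \t]*$/ { next }        # skip blank lines for every rule below
        { n++; last = $0 }         # n counts only non-blank lines
        { print "a" $0 }
        n != 1 { print "b" $0 }    # NR would also count skipped blank lines
        { print "c" $0 }
        END { print "d" last }     # last non-blank line, not a trailing blank one
    '
}
```

Usage: label_lines < MyFile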
