This is the input in my file:
Number : 123
PID : IIT/123/Dakota
The expected output is:
Number : 111
PID : IIT/111/Dakota
I want to replace 123 with 111. To solve this I have tried the following:
awk '/Number/{$NF=111} 1' log.txt
awk -F '[/]' '/PID/{$2="123"} 1' log.txt
Use sed for something this simple?
Print the change to the screen (test with this):
sed -e 's:123:111:g' f2.txt
Update the file in place (with this):
sed -i 's:123:111:g' f2.txt
Example:
$ sed -i 's:123:111:g' f2.txt
$ cat f2.txt
Number : 111
PID : IIT/111/Dakota
EDIT2: If you simply want to substitute each line's 123 with 111 without the condition checks you tried in your awk, then do:
awk '{sub(/123/,"111")} 1' Input_file
Change sub to gsub if 123 can occur more than once on a single line.
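A quick check of the difference, echoing a sample line with two occurrences rather than reading a file:

```shell
# sub() replaces only the first occurrence on each line
echo 'Number : 123 and 123' | awk '{sub(/123/,"111")} 1'
# → Number : 111 and 123

# gsub() replaces every occurrence on the line
echo 'Number : 123 and 123' | awk '{gsub(/123/,"111")} 1'
# → Number : 111 and 111
```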
Explanation of above code:
awk -v new_value="111" ' ##Creating an awk variable named new_value to hold the new value that should appear in the line.
/^Number/ { $NF=new_value } ##If a line starts with the string Number, set its last field to the new_value variable.
/^PID/ { num=split($NF,array,"/"); ##If a line starts with PID, split the last field on the / delimiter into an array named array.
array[2]=new_value; ##Setting the second item of array to variable new_value here.
for(i=1;i<=num;i++){ val=val?val "/" array[i]:array[i] }; ##Looping from 1 to the number of array elements, building variable val to re-create the last field of the current line.
$NF=val; ##Setting the last field to variable val here.
val="" ##Nullifying variable val here.
}
1' Input_file ##Mentioning 1 to print the line, and mentioning the Input_file name here too.
EDIT: In case you need the / in your output too, then use the following awk.
awk -v new_value="111" '
/^Number/ { $NF=new_value }
/^PID/ { num=split($NF,array,"/");
array[2]=new_value;
for(i=1;i<=num;i++){ val=val?val "/" array[i]:array[i] };
$NF=val;
val=""
}
1' Input_file
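For reference, here is an end-to-end check of the EDIT version against the sample input (using a throwaway Input_file):

```shell
# recreate the sample input
printf 'Number : 123\nPID : IIT/123/Dakota\n' > Input_file

# run the EDIT version: last field replaced for Number lines,
# 2nd /-separated component replaced for PID lines
awk -v new_value="111" '
/^Number/ { $NF=new_value }
/^PID/ { num=split($NF,array,"/");
array[2]=new_value;
for(i=1;i<=num;i++){ val=val?val "/" array[i]:array[i] };
$NF=val;
val=""
}
1' Input_file
# → Number : 111
# → PID : IIT/111/Dakota
```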
The following awk may help you here. (It seems your sample input changed a bit after I applied code tags to your samples, so I am editing my code accordingly now.)
awk -F"[ /]" -v new_value="111" '/^Number/{$NF=new_value} /^PID/{$(NF-1)=new_value}1' Input_file
In case you want to save the changes into Input_file itself, append > temp_file && mv temp_file Input_file to the above command.
Explanation:
awk -F"[ /]" -v new_value="111" ' ##Setting the field separator to space and / for each line, and creating an awk variable new_value holding the desired new value.
/^Number/{ $NF=new_value } ##If a line starts with the string Number, change its last field to new_value.
/^PID/ { $(NF-1)=new_value } ##If a line starts with the string PID, set its second-last field to new_value.
1 ##awk works on condition-then-action; 1 makes the condition TRUE, and with no action given, the default action (printing the current line) happens.
' Input_file ##Mentioning the Input_file name here.
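As a concrete sketch of the save-back step (file names illustrative, and using the simpler substitution for brevity, since awk has no portable in-place flag):

```shell
# recreate a throwaway sample file
printf 'Number : 123\nPID : IIT/123/Dakota\n' > Input_file

# write the result to a temporary file, then replace the original
awk '{sub(/123/,"111")} 1' Input_file > temp_file && mv temp_file Input_file

cat Input_file
# → Number : 111
# → PID : IIT/111/Dakota
```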
I am looking preferably for a bash/Linux method for the problem below.
I have a text file (input.txt) that looks like so (and many many more lines):
TCCTCCGC+TAGTTAGG_Vel_24_CC_LlanR_34 CC_LlanR
GGAGTATG+TCTATTCG_Vel_24_CC_LlanR_22 CC_LlanR
TTGACTAG+TGGAGTAC_Vel_02_EN_DavaW_11 EN_DavaW
TCGAATAA+TGGTAATT_Vel_24_CC_LlanR_23 CC_LlanR
CTGCTGAA+CGTTGCGG_Vel_02_EN_DavaW_06 EN_DavaW
index_07_barcode_04_PA-17-ACW-04 17-ACW
index_09_barcode_05_PA-17-ACW-05 17-ACW
index_08_barcode_37_PA-21-YC-15 21-YC
index_09_barcode_04_PA-22-GB-10 22-GB
index_10_barcode_37_PA-28-CC-17 28-CC
index_11_barcode_29_PA-32-MW-07 32-MW
index_11_barcode_20_PA-32-MW-08 32-MW
I want to produce a file that looks like
CC_LlanR(TCCTCCGC+TAGTTAGG_Vel_24_CC_LlanR_34,GGAGTATG+TCTATTCG_Vel_24_CC_LlanR_22,TCGAATAA+TGGTAATT_Vel_24_CC_LlanR_23)
EN_DavaW(TTGACTAG+TGGAGTAC_Vel_02_EN_DavaW_11,CTGCTGAA+CGTTGCGG_Vel_02_EN_DavaW_06)
17-ACW(index_07_barcode_04_PA-17-ACW-04,index_09_barcode_05_PA-17-ACW-05)
21-YC(index_08_barcode_37_PA-21-YC-15)
22-GB(index_09_barcode_04_PA-22-GB-10)
28-CC(index_10_barcode_37_PA-28-CC-17)
32-MW(index_11_barcode_29_PA-32-MW-07,index_11_barcode_20_PA-32-MW-08)
I thought that I could do something along the lines of this.
cat input.txt | awk '{print $1}' | grep -e "CC_LlanR" | paste -sd',' > intermediate_file
cat input.txt | awk '{print $2"("}' something something??
But I only know how to grep one pattern at a time? Is there a way to find all the matching lines at once and output them in this format?
Thank you!
(Happy Easter/ long weekend to all!)
With your shown samples please try following.
awk '
FNR==NR{
arr[$2]=(arr[$2]?arr[$2]",":"")$1
next
}
($2 in arr){
print $2"("arr[$2]")"
delete arr[$2]
}
' Input_file Input_file
2nd solution: To do it within a single read of Input_file, try the following.
awk '{arr[$2]=(arr[$2]?arr[$2]",":"")$1} END{for(i in arr){print i"("arr[i]")"}}' Input_file
Explanation (1st solution): Adding a detailed explanation for the 1st solution here.
awk ' ##Starting awk program from here.
FNR==NR{ ##Condition FNR==NR is TRUE only while the first pass over Input_file is being read.
arr[$2]=(arr[$2]?arr[$2]",":"")$1 ##Creating an array indexed by the 2nd field, appending each $1 value separated by commas.
next ##next skips all further statements for this line.
}
($2 in arr){ ##If the 2nd field is present in arr, do the following.
print $2"("arr[$2]")" ##Printing the 2nd field followed by arr[$2] wrapped in parentheses.
delete arr[$2] ##Deleting the arr entry indexed by the 2nd field so each group is printed only once.
}
' Input_file Input_file ##Mentioning Input_file names here.
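A minimal reproduction with a tiny made-up file shows the two-pass grouping, and that the output preserves first-appearance order of the keys:

```shell
# throwaway sample: key in column 2, value in column 1
cat > Input_file <<'EOF'
a1 K1
b1 K2
a2 K1
EOF

# pass 1 builds arr[key]="v1,v2,..."; pass 2 prints each group once
awk '
FNR==NR{ arr[$2]=(arr[$2]?arr[$2]",":"")$1; next }
($2 in arr){ print $2"("arr[$2]")"; delete arr[$2] }
' Input_file Input_file
# → K1(a1,a2)
# → K2(b1)
```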
Assuming your input is grouped by the $2 value as shown in your example (if it isn't, just run sort -k2,2 on your input first), this uses one pass, stores only one token at a time in memory, and produces the output in the same $2 order as the input:
$ cat tst.awk
BEGIN { ORS="" }
$2 != prev {
printf "%s%s(", ORS, $2
ORS = ")\n"
sep = ""
prev = $2
}
{
printf "%s%s", sep, $1
sep = ","
}
END { print "" }
$ awk -f tst.awk input.txt
CC_LlanR(TCCTCCGC+TAGTTAGG_Vel_24_CC_LlanR_34,GGAGTATG+TCTATTCG_Vel_24_CC_LlanR_22)
EN_DavaW(TTGACTAG+TGGAGTAC_Vel_02_EN_DavaW_11)
CC_LlanR(TCGAATAA+TGGTAATT_Vel_24_CC_LlanR_23)
EN_DavaW(CTGCTGAA+CGTTGCGG_Vel_02_EN_DavaW_06)
17-ACW(index_07_barcode_04_PA-17-ACW-04,index_09_barcode_05_PA-17-ACW-05)
21-YC(index_08_barcode_37_PA-21-YC-15)
22-GB(index_09_barcode_04_PA-22-GB-10)
28-CC(index_10_barcode_37_PA-28-CC-17)
32-MW(index_11_barcode_29_PA-32-MW-07,index_11_barcode_20_PA-32-MW-08)
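If the input were not grouped, the sort -k2,2 preprocessing mentioned above restores one output line per key. A tiny made-up sample to illustrate:

```shell
# the one-pass grouping script from above
cat > tst.awk <<'EOF'
BEGIN { ORS="" }
$2 != prev {
    printf "%s%s(", ORS, $2
    ORS = ")\n"
    sep = ""
    prev = $2
}
{
    printf "%s%s", sep, $1
    sep = ","
}
END { print "" }
EOF

# ungrouped sample: K1 rows are separated by a K2 row
printf 'a1 K1\nb1 K2\na2 K1\n' > input.txt

# sorting by the key column first groups the rows, then tst.awk merges them
sort -k2,2 input.txt | awk -f tst.awk
# → K1(a1,a2)
# → K2(b1)
```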
This might work for you (GNU sed):
sed -E 's/^(\S+)\s+(\S+)/\2(\1)/;H
x;s/(\n\S+)\((\S+)\)(.*)\1\((\S+)\)/\1(\2,\4)\3/;x;$!d;x;s/.//' file
Append each manipulated line to the hold space.
Before moving on to the next line, accumulate like keys into a single line.
Delete every line except the last.
Replace the last line by the contents of the hold space.
Remove the first character (a newline artefact introduced by the H command) and print the result.
N.B. The final solution is unsorted and in the original order.
I want to replace the ">" with variable names starting with ">" and ending with ".". But the following code is not printing the variable names.
for f in *.fasta;
do
nam=$(basename $f .fasta);
awk '{print $f}' $f | awk '{gsub(">", ">$nam."); print $0}'; done
Input of first file sample01.fasta:
cat sample01.fasta:
>textofDNA
ATCCCCGGG
>textofDNA2
ATCCCCGGGTTTT
Output expected:
>sample01.textofDNA
ATCCCCGGG
>sample01.textofDNA2
ATCCCCGGGTTTT
$ awk 'FNR==1{fname=FILENAME; sub(/[^.]+$/,"",fname)} sub(/^>/,""){$0=">" fname $0} 1' *.fasta
>sample01.textofDNA
ATCCCCGGG
>sample01.textofDNA2
ATCCCCGGGTTTT
Compared to the other answers you've got so far, the above will work in any awk, only does the file-name calculation once per input file rather than once per line or once per >-line, won't fail if the file name contains other .s, won't fail if the file name contains &, and won't fail if the file name doesn't contain the string fasta.
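If the goal is to rewrite each file in place rather than print to stdout, one way (a sketch; the temp-file names are illustrative, since no portable in-place flag is assumed) is to wrap the same awk in a loop:

```shell
# create a throwaway sample file
printf '>textofDNA\nATCCCCGGG\n' > sample01.fasta

# rewrite each .fasta file via a temporary file
for f in *.fasta; do
  awk 'FNR==1{fname=FILENAME; sub(/[^.]+$/,"",fname)} sub(/^>/,""){$0=">" fname $0} 1' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

cat sample01.fasta
# → >sample01.textofDNA
# → ATCCCCGGG
```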
Or like this? You don't really need the loop, basename, or two awk invocations.
awk '{stub=gensub( /^([^.]+\.)fasta.*/ , "\\1", "1",FILENAME ) ; gsub( />/, ">"stub); print}' *.fasta
>sample01.textofDNA
ATCCCCGGG
>sample01.textofDNA2
ATCCCCGGGTTTT
Explanation: awk knows the file it currently operates on through the built-in variable FILENAME; I strip the .fasta extension using gensub and store the result in the variable stub. Then I invoke gsub to replace ">" with ">" followed by the content of my variable stub. After that I print the line.
As Ed points out in the comments: gensub is a GNU extension and won't work on other awk implementations.
Could you please try the following too.
awk '/^>/{split(FILENAME,array,".");print substr($0,1,1) array[1]"." substr($0,2);next} 1' Input_file
Explanation: Adding explanation for above code here.
awk '
/^>/{ ##If a line starts with >, do the following.
split(FILENAME,array,".") ##Using awk's split function to split the current file name (held in the built-in variable FILENAME) on dots into array.
print substr($0,1,1) array[1]"." substr($0,2) ##Printing the 1st character (>), then the array's 1st element and a dot, then the rest of the line from the 2nd character onward.
next ##next will skip all further statements from here.
}
1 ##1 will print all lines(except line that are starting from >).
' sample01.fasta ##Mentioning Input_file name here.
I need to grep the values of ErrCode, ErrAttKey and ErrDesc from the input file below,
and display them as shown below in another file.
How can I do this using a shell script?
Required output
ErrCode|ErrAtkey|ErrDesc
003010|A3|The Unique Record IDalreadyExists
008024|A8|Prepaid / Postpaid not specified
Input File
<TariffRecords><Tariff><UniqueID>TT07PMST0088</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>The' Unique Record ID already 'Exists</ErrDesc></Tariff><Tariff><UniqueID>TT07PMST0086</UniqueID><SubStat>Success</SubStat><ErrCode>000000</ErrCode><ErrAttKey></ErrAttKey><ErrDesc>SUCCESS</ErrDesc></Tariff><Tariff><UniqueID>TT07PMCM0048</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>The' Unique Record ID already 'Exists</ErrDesc></Tariff><Tariff><UniqueID>TT07PMCM0049</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>The' Unique Record ID already 'Exists</ErrDesc></Tariff><Tariff><UniqueID>TT07PMPV0188</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>The' Unique Record ID already 'Exists</ErrDesc></Tariff><Tariff><UniqueID>TT07PMTP0060</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>The' Unique Record ID already 'Exists</ErrDesc></Tariff><Tariff><UniqueID>TT07PMVS0072</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>The' Unique Record ID already 'Exists</ErrDesc></Tariff><Tariff><UniqueID>TT07PMPO0073</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>The' Unique Record ID already 'Exists</ErrDesc></Tariff><Tariff><UniqueID>TT07PMPO0073</UniqueID><SubStat>Failure</SubStat><ErrCode>008024</ErrCode><ErrAttKey>A8</ErrAttKey><ErrDesc>Prepaid' / Postpaid not 'specified</ErrDesc></Tariff><Tariff><UniqueID>TT07PMSK0005</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>The' Unique Record ID already 'Exists</ErrDesc></Tariff><Tariff><UniqueID>TT07PMSK0005</UniqueID><SubStat>Failure</SubStat><ErrCode>005020</ErrCode><ErrAttKey>A5</ErrAttKey><ErrDesc>Invalid' LSA 
'Name</ErrDesc></Tariff><Tariff><UniqueID>TT07PMSK0005</UniqueID><SubStat>Failure</SubStat><ErrCode>008024</ErrCode><ErrAttKey>A8</ErrAttKey><ErrDesc>Prepaid' / Postpaid not 'specified</ErrDesc></Tariff><Tariff><UniqueID>TT07PMSK0005</UniqueID><SubStat>Failure</SubStat><ErrCode>015038</ErrCode><ErrAttKey>A15</ErrAttKey><ErrDesc>Regular' / Promotional is 'compulsory</ErrDesc></Tariff><Tariff><UniqueID>TT07PMSK0005</UniqueID><SubStat>Failure</SubStat><ErrCode>018048</ErrCode><ErrAttKey>A18</ErrAttKey><ErrDesc>Special' Eligibility Conditions cannot be left blank. If no conditions, please enter '`NIL`</ErrDesc></Tariff><Tariff><UniqueID>TT07PMTP0080</UniqueID><SubStat>Success</SubStat><ErrCode>000000</ErrCode><ErrAttKey></ErrAttKey><ErrDesc>SUCCESS</ErrDesc></Tariff></TariffRecords>
EDIT: As per OP, all results should be shown even when they occur multiple times in Input_file, so in that case the following may help.
awk '{gsub(/></,">"RS"<")} 1' Input_file |
awk -F"[><]" -v time="$(date +%r)" -v date="$(date +%d/%m/%Y)" '
/ErrCode/||/ErrAttKey/||/ErrDesc/{
val=val?val OFS $3:$3
}
/<\/Tariff>/{
print val,date,time,FILENAME;
val=""
}' OFS="|"
I am surprised to learn that your input is actually all on a single line.
So in case you want to change it into multiple lines (which should really be the case), do the following in a single awk:
awk '{gsub(/></,">"RS"<")} 1' Input_file > temp_file && mv temp_file Input_file
awk -F"[><]" '/ErrCode/{value=$3;a[value]++} a[value]==1 && NF>3 &&(/ErrCode/||/ErrAttKey/||/ErrDesc/){val=val?val OFS $3:$3} /<\/Tariff>/{if(val && val ~ /^[0-9]/){print val};val=""}' Input_file
In case you don't want to change your Input_file into the multiple-line form, run these 2 commands joined by a pipe as follows.
awk '{gsub(/></,">"RS"<")} 1' Input_file |
awk -F"[><]" '
/ErrCode/{
value=$3;
a[value]++
}
a[value]==1 && NF>3 && (/ErrCode/||/ErrAttKey/||/ErrDesc/){
val=val?val OFS $3:$3
}
/<\/Tariff>/{
if(val && val ~ /^[0-9]/){
print val};
val=""
}'
NOTE: 2 points to be noted here. 1st: if any tag's ErrCode value is null or does not start with digits, that tag's values will not be printed. 2nd: it will not print duplicate ErrCode values.
Assuming the content of your XML is in a file file.txt, the following will work:
echo "ErrCode|ErrAtkey|ErrDesc" && cat file.txt | sed 's/<Tariff>/\n/g' | sed 's/.*<ErrCode>//g;s/<.*<ErrAttKey>/|/g;s/<.*<ErrDesc>/|/g;s/<.*//g' | grep -v '^$'
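A minimal check of that pipeline on a throwaway file with a single Tariff block (GNU sed assumed, for \n in the replacement text):

```shell
# one-Tariff sample, content shortened for illustration
cat > file.txt <<'EOF'
<TariffRecords><Tariff><UniqueID>U1</UniqueID><SubStat>Failure</SubStat><ErrCode>003010</ErrCode><ErrAttKey>A3</ErrAttKey><ErrDesc>Duplicate record</ErrDesc></Tariff></TariffRecords>
EOF

# split on <Tariff>, strip everything but the three wanted values
echo "ErrCode|ErrAtkey|ErrDesc" && cat file.txt | sed 's/<Tariff>/\n/g' | sed 's/.*<ErrCode>//g;s/<.*<ErrAttKey>/|/g;s/<.*<ErrDesc>/|/g;s/<.*//g' | grep -v '^$'
# → ErrCode|ErrAtkey|ErrDesc
# → 003010|A3|Duplicate record
```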
I am trying to process a file using awk.
sample data:
233;20180514;1;00;456..;m
233;1111;2;5647;6754;..;n
233;1111;2;5647;2342;..;n
233;1111;2;5647;p234;..;n
233;20180211;1;00;780..;m
233;1111;2;5647;3434;..;n
233;1111;2;5647;4545;..;n
233;1111;2;5647;3453;..;n
The problem statement is: I need to copy the second column of the record matching "1;00;" to the following records until the next "1;00;" match, and then copy that record's second column onward until the next "1;00;" match. The match pattern "1;00;" could change as well.
It could be say "2;20;" . In that case I need to copy the second column until there is either "1;00;" or "2;20;" match.
I can do this using a while loop but I really need to do this using awk or sed as the file is huge and while may take a lot of time.
Expected output:
233;20180514;1;00;456..;m
233;20180514;1111;2;5647;6754;..;n+1
233;20180514;1111;2;5647;2342;..;n+1
233;20180514;1111;2;5647;p234;..;n+1
233;20180211;1;00;780..;m
233;20180211;1111;2;5647;3434;..;n+1
233;20180211;1111;2;5647;4545;..;n+1
233;20180211;1111;2;5647;3453;..;n+1
Thanks in advance.
EDIT: Since OP has changed the sample Input_file in the question, adding code as per the new sample now.
awk -F";" '
length($2)==8 && !($3=="1" && $4=="00"){
flag=""}
($3=="1" && $4=="00"){
val=$2;
flag=1;
print;
next
}
flag{
$2=val OFS $2;
$NF=$NF"+1"
}
1
' OFS=";" Input_file
Basically this checks whether the length of the 2nd field is 8 and whether the 3rd and 4th fields are "1" and "00" as separate fields, rather than matching the literal string 1;00;.
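A quick end-to-end check with the first two sample lines, using a condensed one-liner of this approach (the flag-reset rule is omitted since this small sample has no non-matching 8-digit lines):

```shell
# throwaway two-line sample: a 1;00; marker line, then a detail line
printf '233;20180514;1;00;456..;m\n233;1111;2;5647;6754;..;n\n' > Input_file

# marker lines print unchanged; following lines get the stored date
# prepended to field 2 and "+1" appended to the last field
awk -F";" '($3=="1" && $4=="00"){val=$2; flag=1; print; next} flag{$2=val OFS $2; $NF=$NF"+1"} 1' OFS=";" Input_file
# → 233;20180514;1;00;456..;m
# → 233;20180514;1111;2;5647;6754;..;n+1
```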
If your actual Input_file is the same as the shown samples, the following may help you.
awk -F";" 'NF==5 || !/pay;$/{flag=""} /1;00;$/{val=$2;$2="";sub(/;;/,";");flag=1} flag{$2=val OFS $2} 1' OFS=";" Input_file
Explanation:
awk -F";" ' ##Setting the field separator to a semicolon for all lines here.
NF==5 || !/pay;$/{ ##If a line has 5 fields OR does NOT end with pay; then do the following.
flag=""} ##Setting variable flag to NULL here.
/1;00;$/{ ##Searching for the string 1;00; at the end of a line; if found, do the following:
val=$2; ##Creating a variable named val whose value is $2 (the 2nd field of the current line).
$2=""; ##Nullifying the 2nd column of the current line.
sub(/;;/,";"); ##Substituting two consecutive semicolons with a single one to remove the 2nd column's now-empty slot.
flag=1} ##Setting the value of variable flag to 1 here.
flag{ ##If variable flag is set, do the following.
$2=val OFS $2} ##Re-creating $2 as val OFS $2, basically prepending the 2nd column of the pay; line here.
1 ##awk works on condition-then-action; 1 makes the condition TRUE, and with no action given, the line is printed.
' OFS=";" Input_file ##Setting OFS to a semicolon and mentioning the Input_file name here.
This question already has answers here:
grep lines matching a pattern, and the lines before and after the matching until different pattern
(4 answers)
Closed 4 years ago.
I want to select blocks between PATTERN1 and PATTERN2 if text inside the block contains CRITERIA, otherwise discard the whole block.
Sample task:
Select text between PATTERN1='start' and PATTERN2='end', if some text between 'start' and 'end' matches CRITERIA='DCE', then output the whole block between 'start' and 'end'.
Sample input:
start
123
ABC
123
end
start
123
DCE
123
end
start
123
EFG
123
end
Sample output:
start
123
DCE
123
end
I've tried the following using awk, but couldn't find how to use CRITERIA between two patterns:
awk '/start/,/end/' input_file
EDIT: As per OP, the Input_file may have a match in its last block and may be missing the closing end string, so adding code for that case too now.
awk '
/start/{
if(val) { print value };
flag=1;
value=val=""}
/[dD][cC][eE]/ && flag { val=1 }
flag{
value=value?value ORS $0:$0
}
/end/ { flag="" }
END{
if(val) { print value }}
' Input_file
Explanation:
awk '
/start/{ ##Looking for the string start in a line; if found, do the following.
if(val) { print value }; ##If variable val is NOT NULL, print variable value.
flag=1; ##Setting variable named flag to 1 here.
value=val=""} ##Nullifying variables value and val here.
/[dD][cC][eE]/ && flag { val=1 } ##Searching for the string DCE/dce in a line; if variable flag is set, set variable val to 1.
flag{ ##If variable flag is set, do the following.
value=value?value ORS $0:$0 ##Appending the current line to variable value; this runs before the /end/ rule below so the end line itself is captured.
}
/end/ { flag="" } ##Searching for the string end in the current line; if found, nullify flag here.
END{ ##Starting the END block of awk here.
if(val) { print value }} ##If variable val is NOT NULL, print variable value here.
' Input_file
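A self-contained check of this approach on a two-block sample where the matching block is the last one, so the END handling is exercised (the /end/ rule is placed after the accumulation so the end line itself is captured):

```shell
# throwaway sample: first block has no DCE, second (final) block does
cat > Input_file <<'EOF'
start
123
ABC
123
end
start
123
DCE
123
end
EOF

awk '
/start/{ if(val){print value}; flag=1; value=val="" }
/[dD][cC][eE]/ && flag{ val=1 }
flag{ value=value?value ORS $0:$0 }
/end/{ flag="" }
END{ if(val){print value} }
' Input_file
# → start
# → 123
# → DCE
# → 123
# → end
```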
Could you please try the following awk and let me know if this helps you.
awk '/start/{if(val){print value};flag=1;value=val=""} /[dD][cC][eE]/ && flag{val=1} /end/{flag=""} {value=value?value ORS $0:$0}' Input_file
Adding a non-one-liner form of the solution too here.
awk '
/start/{
if(val) { print value };
flag=1;
value=val=""}
/[dD][cC][eE]/ && flag{ val=1 }
/end/ { flag="" }
{
value=value?value ORS $0:$0
}
' Input_file
Since the start-end blocks are separated by empty records, you can use those for separating instead. Here with awk:
$ awk 'BEGIN{RS=""}/DCE/' file
start
123
DCE
123
end
Edit: Since the empty records were not there after all, let's separate on end instead (a multi-character RS requires GNU awk):
$ awk 'BEGIN{RS=ORS="end\n"}/DCE/' file
start
123
DCE
123
end