Printing all matches in a line using regular expression in awk - linux

Say I have a line:
Terminal="123" Pwd="567"
I want to select only the numeric portions using awk:
awk 'match($1, /[0-9]+/){print substr($1, RSTART, RLENGTH)};match($2, /[0-9]+/){print substr($2, RSTART, RLENGTH)}' file
This gives the desired result (each number on its own line):
123
567
However, there must be a better way to select both numbers without writing two match statements.
Thanks.

Does grep work for you?
kent$ echo 'Terminal="123" Pwd="567"'|grep -o '[0-9]\+'
123
567
Quick and dirty with awk:
awk -F'[^0-9]*' '{$1=$1}7'
test:
kent$ awk -F'[^0-9]*' '{$1=$1}7'<<< 'Terminal="123" Pwd="567"'
123 567
or:
kent$ awk '{gsub(/[^0-9 ]/,"")}7'<<< 'Terminal="123" Pwd="567"'
123 567

Here is a nice little solution with awk:
awk '{gsub("[^0-9]+"," "); print}'
It converts each run of consecutive non-digit characters into a single space, so the output has one space before the digit sequence 123 and one trailing space after 567.

Here is another way to do it with awk. We set the field separator to a double quote ("), so the numbers end up in fields 2 and 4:
$ echo 'Terminal="123" Pwd="567"' | awk -F\" '{print $2, $4}'
123 567

I ran into a similar problem but my patterns were more complex so I couldn't brush off my problems with gsub or such. I wrote a recursive function and a wrapper to it. It finds multiple matches in one variable and prints them out separated with a space:
awk '
function rec_wrap(str)
{
    matches=""
    return rec_func(str)
}
function rec_func(str2)
{
    where=match(str2, /RE/)
    if(where!=0) {
        matches=(matches substr(str2, RSTART, RLENGTH) " ")
        rec_func(substr(str2, RSTART+RLENGTH, length(str2)))
    }
    return matches
}
{print rec_wrap($1)}
' file.txt
The wrapper rec_wrap is needed to empty the global variable matches before each new search. The function match writes the position and length of the leftmost match to the variables RSTART and RLENGTH; the match is extracted with substr and appended to matches. Then rec_func calls itself with the rest of the string str2 as its parameter until match fails to find any more matches.
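For comparison, the same idea can be written as a loop instead of a recursion. This is only a sketch: the names all_matches and find_all and the pattern [0-9]+ are placeholders chosen for illustration and are not part of the answer above. It assumes the pattern never matches the empty string, otherwise the loop would not terminate.

```shell
# all_matches is a hypothetical wrapper; [0-9]+ stands in for your own RE.
all_matches() {
    awk '
    # out is declared as an extra parameter so that it is local to the function.
    function find_all(str, re,    out) {
        while (match(str, re)) {
            out = out (out == "" ? "" : " ") substr(str, RSTART, RLENGTH)
            str = substr(str, RSTART + RLENGTH)   # continue after this match
        }
        return out
    }
    { print find_all($0, "[0-9]+") }
    '
}

echo 'Terminal="123" Pwd="567"' | all_matches   # prints: 123 567
```

Because out is local, no wrapper is needed to reset it between calls.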

Related

How to split on several delimiters but keep those in between square brackets?

I am trying to split the following text string by dash, square brackets and colon delimiters but keep those in square brackets
Input:
10:100 - [10/09/21:12:23:22]
Desired output:
100, 10/09/21:12:23:22
My current code:
awk -F '[- ":]' '{print $1, $2, $3, $4, $5}'
1st solution: With GNU awk you could try the following code.
awk '
match($0,/:([^[:space:]]+)[[:space:]]+-[[:space:]]+\[([^]]*)\]/,arr){
print arr[1],arr[2]
}
' Input_file
2nd solution: Using sed's s (substitution) operation along with its capturing groups, try the following:
sed -E 's/^[^:]*:([^[:space:]]+)[[:space:]]+-[[:space:]]+\[([^]]*)\]/\1 \2/' Input_file
3rd solution: With any awk you could use the following code, which applies sub and gsub to the first and last fields.
awk '{sub(/.*:/,"",$1);gsub(/^\[|\]$/,"",$NF);print $1,$NF}' Input_file
4th solution: With a Perl one-liner, one could try the following substitution, which uses the lazy match .*? at the start.
perl -pe 's/^.*?:([^[:space:]]+)[[:space:]]+-[[:space:]]+\[([^]]*)\]/\1 \2/' Input_file
If the string contains several of these patterns and their order does not matter, you can use awk to match the patterns you are interested in and then remove the surrounding delimiters.
In this case, you can match
\[[^][]+]|:[0-9]+
The pattern matches:
\[[^][]+] Match from [...]
| Or
:[0-9]+ Match : and 1+ digits
The gsub pattern ^[:\[]|\]$ matches either : or [ at the start of the string, or ] at the end of the string, and replaces it with an empty string.
awk '
{
while(match($0,/\[[^][]+]|:[0-9]+/)){
v = substr($0,RSTART,RLENGTH)
gsub(/^[:\[]|\]$/, "", v)
print v
$0=substr($0,RSTART+RLENGTH)
}
}
' file
Output
100
10/09/21:12:23:22
Assuming there are no empty lines in the input data:
echo '10:100 - [10/09/21:12:23:22]' |
nawk 'sub("^[^:]*:",_, $!--NF)' FS='[ -]*[][]' OFS=', '
or
gawk 'NF -= sub("^[^:]*:",_)' FS='[ -]*[][]' OFS=', '
or
mawk 'NF -= sub("^[^:]*:",_)' FS='[][ -]+' OFS=', '
100, 10/09/21:12:23:22

How to remove double quotes in a specific column by using sub() in AWK

My sample data is
cat > myfile
"a12","b112122","c12,d12"
a13,887988,c13,d13
a14,b14121,c79,d13
When I try to remove " from column 2 with
awk -F, 'BEGIN { OFS = FS } $2 ~ /"/ { sub(/"/, "", $2) }1' myfile
"a12",b112122","c12,d12"
a13,887988,c13,d13
a14,b14121,c79,d13
It removes only one double quote: instead of b112122 I am getting b112122"
How can I remove all " in the 2nd column?
From the documentation:
Search target, which is treated as a string, for the leftmost, longest substring matched by the regular expression regexp.[...] Return the number of substitutions made (zero or one).
It is quite clear that the function sub makes at most one replacement and does not replace all occurrences.
Instead, use gsub:
Search target for all of the longest, leftmost, nonoverlapping matching substrings it can find and replace them with replacement. The ‘g’ in gsub() stands for “global,” which means replace everywhere.
So you can add a 'g' to your line and it works fine:
awk -F, 'BEGIN { OFS = FS } $2 ~ /"/ { gsub(/"/, "", $2) }1' myfile
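A quick way to see the difference is to print the return values, which report how many replacements were made. The following is a minimal sketch; the shell variable field is an assumption introduced here and simply holds the second column of the first sample line.

```shell
# Sample value: column 2 of the first input line, quotes included.
field='"b112122"'

# sub() replaces only the first match and returns 0 or 1.
echo "$field" | awk '{ n = sub(/"/, ""); print n, $0 }'    # prints: 1 b112122"

# gsub() replaces every match and returns the number of replacements.
echo "$field" | awk '{ n = gsub(/"/, ""); print n, $0 }'   # prints: 2 b112122
```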
When you are dealing with a CSV file and not using FPAT, it will break sooner or later.
Here is a GNU awk command that does the job.
awk -v OFS="," -v FPAT="([^,]+)|(\"[^\"]+\")" '{gsub(/"/,"",$2)}1' file
"a12",b112122,"c12,d12"
a13,887988,c13,d13
a14,b14121,c79,d13
It will work fine on any column, including number 3.
Here is an example that removes " from column 3 and at the same time changes the separator to |
awk -v OFS="|" -v FPAT="([^,]+)|(\"[^\"]+\")" '{gsub(/"/,"",$3);$1=$1}1' file
"a12"|"b112122"|c12,d12
a13|887988|c13|d13
a14|b14121|c79|d13

Replace last character in specific column with value 0

How to replace the last character in column 2 with value 0
input
1232;1001;1
2231;2007;1
2234;2009;2
2003;1114;1
output desired
1232;1000;1
2231;2000;1
2234;2000;2
2003;1110;1
Modifying Input with gensub()
You can use any number of GNU awk string functions to do this, but the gensub() function is particularly useful. It has the signature:
gensub(regexp, replacement, how [, target])
which makes it extremely flexible for these sorts of transformations.
Converting Your Example
# Store your input in a shell variable for MCVE convenience, although
# you can have this data in a file or pass it on standard input if you
# prefer.
example_input='1232;1001;1
2231;2007;1
2234;2009;2
2003;1114;1'
# Use awk's gensub() string function.
echo "$example_input" | awk '{print gensub(/.;/, "0;", 2, $1)}'
This results in the following output:
1232;1000;1
2231;2000;1
2234;2000;2
2003;1110;1
awk approach:
awk -F';' '{ sub(/.$/,0,$2) }1' OFS=';' file
The output:
1232;1000;1
2231;2000;1
2234;2000;2
2003;1110;1
Or the same with substr() function:
awk -F';' '{ $2=substr($2,1,length($2)-1) "0" }1' OFS=';' file
Not necessarily better, but a mathematical approach for numerical data:
$ awk 'BEGIN{FS=OFS=";"} {$2=int($2/10)*10}1'
This rounds down the last digit (the ones). To round down the last two digits (ones and tens), replace 10 with 100.
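To make the 10 versus 100 remark concrete, here is a small sketch with the divisor passed in as an awk variable. The wrapper name round_col2 and the variable d are assumptions made for this example. Note that the arithmetic treats the field as a number, so leading zeros in column 2 would be lost.

```shell
# round_col2 is a hypothetical wrapper; d is the divisor
# (10 zeroes the ones digit, 100 zeroes the ones and tens digits).
round_col2() {
    awk -v d="$1" 'BEGIN { FS = OFS = ";" } { $2 = int($2 / d) * d } 1'
}

printf '1232;1001;1\n2003;1114;1\n' | round_col2 10
# 1232;1000;1
# 2003;1110;1

printf '1232;1001;1\n2003;1114;1\n' | round_col2 100
# 1232;1000;1
# 2003;1100;1
```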
Or, simple replacement is easier with GNU sed
$ sed 's/.;/0;/2'
I would do that with sed:
sed -e 's/^\([^;]*;[^;]*\).;/\10;/' filename

How To Sed Search Replace Entire Word With String Match In File

I have modified the code found here: sed whole word search and replace
I have been trying to use the proper syntax \< and \> for the sed to match multiple terms in a file.
echo "Here Is My Example Testing Code" | sed -e "$(sed 's:\<.*\>:s/&//ig:' file.txt)"
However, I think, because it's looking into the file, it doesn't match the full word (only exact match) leaving some split words and single characters.
Does anyone know the proper syntax?
Example:
Input:
Here Is My Example Testing Code
File.txt:
example
test
Desired output:
Here Is My Code
Modifying your sed command as follows should produce what you want:
sed -e "$(sed 's:\<.*\>:s/&\\w*\\s//ig:' file.txt)"
Brief explanation:
\b matches the position between a word character and a non-word character. In this case, the pattern 'test' from file.txt, matched as a whole word, would not match 'Testing'.
Appending \w* to the search pattern should therefore work; \w matches [a-zA-Z0-9_].
And don't forget to remove the space after each matched word, so \s should be added as well.
The following awk could help you with the same task.
awk 'FNR==NR{a[$0]=$0;next} {for(i=1;i<=NF;i++){for(j in a){if(tolower($i)~ a[j]){$i=""}}}} 1' file.txt input
***OR***
awk '
FNR==NR{
a[$0]=$0;
next
}
{
for(i=1;i<=NF;i++){
for(j in a){
if(tolower($i)~ a[j]){
$i=""}
}}}
1
' file.txt input
Output will be as follows.
Here Is My Code
Also, if your Input_file is always delimited by single spaces and you don't want the unnecessary spaces shown in the output above, then you could use the following.
awk 'FNR==NR{a[$0]=$0;next} {for(i=1;i<=NF;i++){for(j in a){if(tolower($i)~ a[j]){$i=""}}};gsub(/ +/," ")} 1' file.txt input
***OR***
awk '
FNR==NR{
a[$0]=$0;
next
}
{
for(i=1;i<=NF;i++){
for(j in a){
if(tolower($i)~ a[j]){
$i=""}
}};
gsub(/ +/," ")
}
1
' file.txt input
Output will be as follows.
Here Is My Code
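The awk answers above compare with ~, which treats each line of file.txt as a regular expression and matches it anywhere inside a word. If only words that start with a listed term should be dropped, a plain prefix test with index() is one alternative. This is a sketch under that assumption; the helper name drop_words and the temporary word-list file are made up for illustration, and the word list is assumed to be non-empty.

```shell
# drop_words is a hypothetical helper: $1 = word-list file, text arrives on stdin.
drop_words() {
    awk '
    # First file: remember each non-empty line, lower-cased, as an array key.
    FNR == NR { if ($0 != "") words[tolower($0)]; next }
    {
        out = ""
        for (i = 1; i <= NF; i++) {
            keep = 1
            for (w in words)
                if (index(tolower($i), w) == 1) { keep = 0; break }   # word starts with w
            if (keep) out = out (out == "" ? "" : " ") $i
        }
        print out
    }
    ' "$1" -
}

list=$(mktemp)
printf 'example\ntest\n' > "$list"
echo "Here Is My Example Testing Code" | drop_words "$list"   # prints: Here Is My Code
rm -f "$list"
```

Rebuilding the line from the kept fields also avoids the extra spaces left behind by assigning "" to a field.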

Awk using index with Substring

I have a command to cut a string, and I am wondering about the details of controlling the index in the Linux awk command.
I have two different cases.
I want to get the word "Test" from the example strings below.
1. "Test-01-02-03"
2. "01-02-03-Test-Ref1-Ref2"
For the first one I can use
substr('Test-01-02-03',0,index('Test-01-02-03',"-"))
-> which returns only "Test".
For the second case I am not sure how to get Test using the index function.
Do you have any idea how to do this using awk?
Thanks!
This is how to use index() to find/print a substring:
$ cat file
Test-01-02-03
01-02-03-Test-Ref1-Ref2
$ awk -v tgt="Test" 's=index($0,tgt){print substr($0,s,length(tgt))}' file
Test
Test
but that may not be the best solution for whatever your actual problem is.
For comparison here's how to do the equivalent with match() for an RE:
$ awk -v tgt="Test" 'match($0,tgt){print substr($0,RSTART,RLENGTH)}' file
Test
Test
and if you like the match() synopsis, here's how to write your own function to do it for strings:
awk -v tgt="Test" '
function strmatch(source,target) {
SSTART = index(source,target)
SLENGTH = length(target)
return SSTART
}
strmatch($0,tgt){print substr($0,SSTART,SLENGTH)}
' file
If these lines are the direct input to awk, then the following work:
echo 'Test-01-02-03' | awk -F- '{print $1}' # First field
echo '01-02-03-Test-Ref1-Ref2' | awk -F- '{print $(NF-2)}' # Third field from the end.
If these lines are pulled out of a larger line in an awk script and need to be split again then the following snippets will do that:
str="Test-01-02-03"; split(str, a, /-/); print a[1]
str="01-02-03-Test-Ref1-Ref2"; numfields=split(str, a, /-/); print a[numfields-2]
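If the position of the word is not known in advance, one option is to loop over the dash-separated fields and report where the target sits. The sketch below assumes the word appears as a complete field; the helper name find_field is invented for this example.

```shell
# find_field is a hypothetical helper: prints the 1-based field number
# of the first field equal to $1, or 0 if no field matches.
find_field() {
    awk -F- -v tgt="$1" '{
        pos = 0
        for (i = 1; i <= NF; i++)
            if ($i == tgt) { pos = i; break }
        print pos
    }'
}

echo 'Test-01-02-03' | find_field Test             # prints: 1
echo '01-02-03-Test-Ref1-Ref2' | find_field Test   # prints: 4
```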

Resources