AWK--Print From End of Line till string is found - linux

Using awk or sed, how would one print from the end of a line back to the first instance of a string (searching from the end)? For instance, if flow were the string, then flow.com would be extracted from both www.stackoverflow.com and www.flow.stackoverflow.com.

sed is an excellent tool for simple substitutions on a single line:
sed 's/.*\(flow\)/\1/' file
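One thing to watch with the sed command above: lines that do not contain the string are printed unchanged. A minimal sketch, assuming you only want the matching lines, combines -n with the p flag (the function name is made up for illustration):

```shell
# from_last_flow is a hypothetical helper name; "flow" is the string from the question.
from_last_flow() {
    # -n suppresses automatic printing; the p flag prints only lines
    # on which the substitution actually happened.
    sed -n 's/.*\(flow\)/\1/p'
}

printf '%s\n' 'www.stackoverflow.com' 'nothing here' 'www.flow.stackoverflow.com' | from_last_flow
```

With this input the "nothing here" line is dropped and flow.com is printed twice.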

try this line if it works for you:
awk -F'flow' 'NF>1{print FS$NF}' file
alternative one-liner:
awk 'sub(/.*flow/,"flow")' file
Test (I appended a number to the end of each line to show which input line each output line came from):
kent$ cat f
www.stackoverflow.com1
and similarly for 2
www.flow.stackoverflow.com3
kent$ awk -F'flow' 'NF>1{print FS$NF}' f
flow.com1
flow.com3
kent$ awk 'sub(/.*flow/,"flow")' f
flow.com1
flow.com3
Note that if the string contains characters with a special meaning in a regex, such as *, | or [, you need to escape them.
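If escaping is a concern, one possible way around it is to avoid regexes altogether. The following is only a sketch, assuming a literal search string: it uses awk's index(), which compares plain characters, and walks along the line to find the last occurrence. The helper name last_from is hypothetical.

```shell
# last_from is a hypothetical helper: last_from STRING < input
# Prints each line from the last literal occurrence of STRING to the end of the line.
last_from() {
    # Caveat: awk -v still interprets backslash escapes in the assigned value.
    awk -v s="$1" '
    {
        pos = 0; off = 0; rest = $0
        # index() compares plain characters, so *, | and [ need no escaping.
        while ((i = index(rest, s)) > 0) {
            pos = off + i                # position of this occurrence in $0
            off = pos
            rest = substr(rest, i + 1)   # keep searching after this occurrence
        }
        if (pos) print substr($0, pos)   # lines without the string print nothing
    }'
}

printf '%s\n' 'www.a.b*c.com' 'x.b*c.y.b*c.z' | last_from 'b*c'
```

Here b*c is treated as three literal characters rather than as a pattern.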

GNU grep can do it:
grep -oP 'flow(?!.*flow).*' <<END
www.stackoverflow.com
nothing here
www.flow.stackoverflow.com
END
flow.com
flow.com
That regular expression finds "flow" where, looking ahead, "flow" is not found, and then the rest of the line.
This would also work: simpler regex but more effort:
rev filename | grep -oP '^.*?wolf' | rev

Related

Replace substring of a string of characters with other characters in awk

I have a file which contains a very long string of characters and I would like to replace a substring of it with Ns. Example:
test
ABCDABCDABCD
I would like to replace a substring of it with the letter N, using awk or sed: all the characters from index 5 to 8, so four Ns in total.
Output
ABCDNNNNABCD
I tried something like this:
awk '{ v=substr($0,5,4); sed -i "s/$v/N/g";print substr($0,1,4)""v""substr($0,9,12)}' test
however, this command seems to give this output:
ABCDABCDABC
and no substitution was made.
I would like the code to take the index at which the substitution starts (here, for example, 4) and the length of the substitution (here also 4), so that I can change these numbers to start at another position or use a different length. In reality I have a string of thousands of letters and want to replace hundreds of characters, so substituting by pattern does not work in my case.
You want to use awk and sed? Seems like you actually want one of:
$ echo ABCDABCDABCD | perl -pe 'substr($_,4,4)="NNNN"'
ABCDNNNNABCD
$ echo ABCDABCDABCD | perl -pe 'substr($_,4,4)="N"x4'
ABCDNNNNABCD
$ echo 'ABCDABCDABCD' |
awk -v b=5 -v e=8 '{
t=substr($0,b,e-b+1); gsub(/./,"N",t); print substr($0,1,b-1) t substr($0,e+1)
}'
ABCDNNNNABCD
$ echo ABCDABCDABCD | gawk '{$0=gensub(/ABCD/,"NNNN",2)}1'
ABCDNNNNABCD
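Since the question asks for a start position and a length rather than a begin and end index, here is a rough sketch of the awk approach reshaped that way. The helper name mask_range and its argument order are assumptions made for illustration; positions are 1-based, as in awk's substr().

```shell
# mask_range is a hypothetical helper: mask_range START LENGTH [CHAR] < input
mask_range() {
    awk -v start="$1" -v len="$2" -v ch="${3:-N}" '
    {
        fill = ""
        for (i = 0; i < len; i++) fill = fill ch   # build LENGTH copies of CHAR
        # text before the range, then the fill, then the text after the range
        print substr($0, 1, start - 1) fill substr($0, start + len)
    }'
}

echo ABCDABCDABCD | mask_range 5 4
```

Note that this sketch does not clamp the range: if START plus LENGTH runs past the end of the line, the output becomes longer than the input.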

sed only print substring in a string

I am trying to get a substring in a string that is in a large line of data.
The regex (INC............) matches the substring I am trying to get the value of at https://regexr.com/, but I am unable to get the value of the substring into a variable or print it out.
The part of the string around this value is
......TemplateID2":null,"Incident Number":"INC000006743193","Priority":"High","mc_ueid":null,"Assint......
When I try the commands below, I either get the error "char 26: unknown option to `s'" or the entire string is printed out.
cat /tmp/file1 | sed -n 's/\(INC............\)/\1/p'
cat /tmp/file1 | sed -n 's/./*\(INC............).*/\1/'
Using sed, you need to remove what precedes and follows the string:
sed 's/.*\(INC............\).*/\1/' file
But you can also use grep, if your implementation supports the -o option:
grep -o 'INC............' file
Perl can be used, too:
perl -lne 'print $1 if /(INC............)/' file
That looks like JSON. If it's got {braces} around it which you cut out before posting (tsk tsk), you should definitely use jq if it's available. That said, this page needs some awk!
POSIX (works everywhere):
awk 'match($0, /INC[^"]+/) {print substr($0, RSTART, RLENGTH)}' /tmp/file1
GNU (works on GNU/Linux):
gawk 'match($0, /INC[^"]+/, a) {print a[0]}' /tmp/file1
If you have more than one match per line (GNU):
gawk '{while(match($0=substr($0, RSTART+RLENGTH), /INC[0-9]+/, a)) print a[0]}' /tmp/file1
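The multiple-match loop can also be written without the gawk-only third argument to match(). This is a sketch under the assumption that the IDs are INC followed by digits; extract_all is a made-up name.

```shell
# extract_all is a hypothetical helper: prints every INC<digits> match, one per line.
extract_all() {
    awk '{
        s = $0
        while (match(s, /INC[0-9]+/)) {
            print substr(s, RSTART, RLENGTH)   # the matched text
            s = substr(s, RSTART + RLENGTH)    # continue after this match
        }
    }'
}

printf '%s\n' '"Incident Number":"INC000006743193","Other":"INC42"' | extract_all
```

Working on a copy of the line (s) leaves $0 untouched in case the rest of the script needs it.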

cut a particular number from the url in linux

I've a file with the below header generated by certain process
Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8>; rel="last"
I want to cut just the number 8 from page=8 in the above content. How to go about it? Appreciate any help.
Try this -
$ cat f
Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8>; rel="last"
$ awk -F'[&=<>]' '{for(i=1;i<=NF;i++) if($i ~ /^page$/) {print $(i+1)}}' f
2
8
If lines keep getting appended to the file, this awk prints only the last page value:
$ awk -F'[&=<>]' '{for(i=1;i<=NF;i++) if($i ~ /^page$/) {kk=$(i+1)}} END{print kk}' ff
8
Limitation: the input currently has both page=2 and page=8, and the command above prints only the last page value.
If you always want the value from the second line, "8" (I added an extra line to the file, on the assumption that it keeps growing and you always need the value from line 2), use this:
$ cat f
Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8>; rel="last"
<https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8>; rel="last"
$ awk -v k=1 -F'[&=<>]' '{for(i=1;i<=NF;i++) if(($i ~ /^page$/) && (k==2) ) {print $(i+1)} k++}' f
8
Following is an implementation using grep:
grep -Po "&page=[0-9]*" <file_name> | grep -Po "[0-9]*"
Example:
echo 'Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8000>; rel="last"' | grep -Po "&page=[0-9]*" | grep -Po "[0-9]*"
This produces the expected result.
echo 'Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=12345>; rel="last"' | grep -Po "&page=[0-9]*" |grep -Po "[0-9]*"| awk '2 == NR % $ct'
With awk: reverse the text with rev, remove the first [0-9]+=egap, then reverse the output again:
$ rev foo | awk 'sub(/[0-9]+=egap/,"")||1' |rev
Output:
Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&>; rel="last"
try:
awk '{gsub(/.*page=/,"page=");sub(/>.*/,"");print}' Input_file
The gsub replaces everything up to and including the last page= with page= (the * is greedy, so the match extends to the last occurrence of page= on the line). The sub then deletes everything from > to the end of the line. What gets printed is page=8, or whatever the last page value is. Of course, I am assuming that your Input_file looks like the example shown.
awk -F'[= >]' '{print $12}' file
8
awk -F= '{split($8,a,">");print a[1]}' file
8
awk -F= '$8=="8>; rel"{print substr($8,1,1)}' file
8
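The field-number answers above depend on the exact position of the value in the header. As a rough alternative sketch, assuming the header always separates its links with a comma and a space and marks the final page with rel="last", the value can be selected by that marker instead. The helper name last_page is hypothetical.

```shell
# last_page is a hypothetical helper: reads a Link header on stdin and prints
# the page number of the entry marked rel="last".
last_page() {
    awk -F', ' '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /rel="last"/ && match($i, /[?&]page=[0-9]+/))
                # skip the 6 characters of "&page=" (or "?page=")
                print substr($i, RSTART + 6, RLENGTH - 6)
    }'
}

printf '%s\n' 'Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8>; rel="last"' | last_page
```

Requiring ? or & directly before page keeps per_page=100 from matching.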
The fact that a greedy regex is needed here (only the last occurrence of &page= should be matched) enables a simple sed solution:
sed -E 's/^.*&page=([0-9]+).*$/\1/' file
^.*&page= matches everything up to and including the last occurrence of &page= on the line.
([0-9]+) matches one or more digits and, thanks to the enclosing (...), stores the match in the 1st (and only) capture group, which the replacement string then references as \1.
.*$ matches any remaining character on the line.
By virtue of the regex having matched the entire line, \1 therefore results in just the captured number as the output.
The above works with both GNU and BSD/macOS sed and takes advantage of modern extended regular expressions (-E), but in case you need a POSIX-compliant solution (which must use basic regular expressions and is therefore more cumbersome):
sed 's/^.*&page=\([0-9]\{1,\}\).*$/\1/' file
With GNU grep (on Linux, as requested), a single-pass grep -Po solution is also possible; like the sed solution, it relies on greedily matching up to the last &page=:
grep -Po "^.*&page=\K[0-9]+" file
-P activates support for PCREs (Perl-compatible Regular Expressions).
-o only outputs the matching part of the line.
\K drops everything matched so far, so that what [0-9]+ matches - one or more digits - is the only output.
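In a script you will usually want the number in a variable. A minimal sketch using the POSIX form of the sed command, with -n and the p flag added so that a line without &page= yields an empty result instead of the whole line (the variable names are made up for illustration):

```shell
# header and last_page_number are hypothetical variable names.
header='Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8>; rel="last"'

# The greedy .* runs to the last &page=, so the captured digits are the last page value.
last_page_number=$(printf '%s\n' "$header" | sed -n 's/^.*&page=\([0-9]\{1,\}\).*$/\1/p')

echo "$last_page_number"
```

An empty last_page_number then signals that no page value was found.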

search and replace a string in first occurrence

I am using the below sed command for search and replace operation.
sed -i '/searchstring/s|find string|replace string|g' filename
It changes all the occurrences in the input file. How can I make it change only the first one?
Thanks.
for example ,
a
a
b
b
a
c
d
This is the input file.
The command I used is:
sed -i '/a/s|a|changed|g' filename
The output I got is:
changed
changed
b
b
changed
c
d
That is, it made the change 3 times,
but I need it to change only the first one.
The expected output is:
changed
a
b
b
a
c
d
You can use this sed:
sed -i '/searchstring/s|find string|replace string|' filename
Note: the g (global substitution) flag has been removed.
To match your updated question:
sed -i -e '/search/{ s/search/changed/' -e ':loop' -e 'n; b loop' -e '}' yourfile
Removing the g is the correct approach, but just to show how it is done with awk
awk '/searchstring/ {sub(/find string/,"replace string")}1' file
It may not be the most elegant solution but it's very easy to understand:
tiago#dell:/tmp$ o="a";n="changed"; line=$(cat file | grep -n "$o" | cut -d: -f1| sort -n | head -1); sed -i.bak "$line s/$o/$n/g" file; cat file
changed
a
b
b
a
c
d
Explanation:
Find the line number of the first line that matches, then run the substitution on that line only.
With GNU sed:
sed '0,/searchstring/ { /searchstring/ s|find string|replace string|g }' filename
Keep or drop the g flag as needed.
If you want to replace the first occurrence in the whole file:
awk '!f&&/search/{sub(/find/,"replace");f=7}7' file
or, to replace every occurrence on that first matching line:
awk '!f&&/search/{gsub(/find/,"replace");f=7}7' file
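To see the flag technique on the sample input from the question, here is a small sketch. It assumes the pattern and replacement are passed in as arguments; replace_first is a hypothetical name.

```shell
# replace_first is a hypothetical helper: replace_first PATTERN REPLACEMENT < input
# PATTERN is used as a regex; & and backslashes in REPLACEMENT keep their sub() meaning.
replace_first() {
    awk -v pat="$1" -v rep="$2" '
    !done && $0 ~ pat { sub(pat, rep); done = 1 }   # only the first matching line is edited
    { print }                                       # every line is printed'
}

printf '%s\n' a a b b a c d | replace_first a changed
```

Once done is set, the first rule never fires again, so later matching lines pass through unchanged.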

Strings extraction from text file with sed command

I have a text file which contains some lines as the following:
ASDASD2W 3ASGDD12 SDADFDFDDFDD W11 ACC=PNO23 DFSAEFEA EAEDEWRESAD ASSDRE
AERREEW2 3122312 SDADDSADADAD W12 ACC=HH34 23SAEFEA EAEDEWRESAD ASEEWEE
A15ECCCW 3XCXXF12 SDSGTRERRECC W43 ACC=P11 XXFSAEFEA EAEDEWRESAD ASWWWW
ASDASD2W 3122312 SDAFFFDEEEEE SD3 ACC=PNI22 ABCEFEA EAEDEWRESAD ASWEDSSAD
...
I have to extract the substring between the '=' character and the following blank space for each line , i.e.
PNO23
HH34
P11
PNI22
I've been using the sed command but cannot figure out how to ignore all characters following the blank space.
Any help?
Use the right tool for the job.
$ awk -F '[= ]+' '{ print $6 }' input.txt
PNO23
HH34
P11
PNI22
Sorry, but I have to add another one because I feel the existing answers are just too complicated:
sed 's/.*=//; s/ .*//;' inputfile
This might work for you:
sed -n 's/.*=\([^ ]*\).*/\1/p' file
or, if you prefer:
sed 's/.*=\([^ ]*\).*/\1/p;d' file
Put the string you want to capture in a backreference:
sed 's/.*=\([^ =]*\) .*/\1/'
or do the substitution piecemeal:
sed -e 's/.*=//' -e 's/ .*//'
sed 's/[^=]*=\([^ ]*\) .*/\1/' inputfile
Match all the non-equal-sign characters and an equal sign. Capture a sequence of non-space characters. Match a space and the rest of the line. Substitute the captured string.
A chain of grep can do the trick.
grep -o '[=][a-zA-Z0-9]*' file | grep -o '[a-zA-Z0-9]*'
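Most of the answers above key on the = sign or on a fixed field number. As one more sketch, assuming the value always sits in a whitespace-separated field that starts with ACC=, awk can select that field by its prefix, wherever it appears on the line. The helper name acc_values is made up.

```shell
# acc_values is a hypothetical helper: prints the text after ACC= for each such field.
acc_values() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ACC=/) print substr($i, 5)   # drop the 4 characters of "ACC="
    }'
}

printf '%s\n' \
    'ASDASD2W 3ASGDD12 SDADFDFDDFDD W11 ACC=PNO23 DFSAEFEA EAEDEWRESAD ASSDRE' \
    'AERREEW2 3122312 SDADDSADADAD W12 ACC=HH34 23SAEFEA EAEDEWRESAD ASEEWEE' | acc_values
```

This keeps working if the number of columns before ACC= changes from line to line.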
