How to search and delete a pattern from a line?

I need to write a simple bash script that takes a text line
some-pattern something-else
and erases some-pattern and returns only something-else. I wrote a script that does the opposite with grep -o, but I don't know how to handle this case. Any help is very much appreciated.
sample input:
"SNMPv2::sysLocation.0 = STRING: someLocation"
Desired Output:
"someLocation"

Assuming the double quotes are NOT part of your sample Input_file or expected output, please try the following with GNU grep.
grep -oP '.*STRING: \K(.*)' Input_file
someLocation
For \K explanation:
\K is a PCRE extension to regex syntax discarding content prior to
that point from being included in match output

You can use sed to delete the part in front of what you want to keep.
Given:
$ echo "$s"
"SNMPv2::sysLocation.0 = STRING: someLocation"
You can do:
$ echo "$s" | sed -nE 's/^.*(someLocation)/\1/p'
someLocation
And if you want to add quotes:
$ echo "$s" | sed -nE 's/^.*(someLocation)/"\1"/p'
"someLocation"
If the portion after STRING: is variable, not fixed, you can match on STRING: and use a capture group:
$ echo "$s" | sed -nE 's/^.*STRING:[[:space:]]*(.*)/"\1"/p'
"someLocation"
Or, sed to capture and print the last word after the last space:
$ echo "$s" | sed -nE 's/^.*[[:space:]]([^[:space:]]*)$/\1/p'
You can also use awk if the last word is space separated from the other fields:
$ echo "$s" | awk '{print $NF}'
Or a pipeline with cut and rev works too:
$ echo "$s" | rev | cut -d' ' -f 1 | rev

You can use: echo "${STRING}" | awk -F" " '/someLocation/ { print $NF }'
-F" " sets the field separator to a space (which is also awk's default, splitting on runs of whitespace); /someLocation/ selects lines containing your location; { print $NF } prints the last field of the line (which, I believe, is where the location is).
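A pure-shell variant is also possible when the marker text is known. The sketch below assumes the line always contains the marker "STRING: "; the function name strip_to_marker is made up for illustration and is not part of any tool.

```shell
# strip_to_marker is a hypothetical helper.
# It removes everything up to and including the first "STRING: ".
strip_to_marker() {
  # ${1#pattern} deletes the shortest prefix that matches the pattern.
  printf '%s\n' "${1#*STRING: }"
}

strip_to_marker 'SNMPv2::sysLocation.0 = STRING: someLocation'   # prints: someLocation
```

If the marker is missing, the line is printed unchanged, so a caller that needs to tell the two cases apart would have to check for the marker first.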

Related

How to extract substring in the double quotes by using awk or other methods?

I want to extract you in this sample string:
See [ "you" later
However, my attempt does not work as expected:
awk '{ sub(/.*\"/, ""); sub(/\".*/, ""); print }' <<< "See [ \"you\" later"
result:
later
Using awk or other methods, how can I extract the substring in the double quotes?

1st solution: You can use awk's gsub function here. A single gsub call with an alternation makes two substitutions with an empty string: the first removes everything up to and including the first ", and the second removes everything from the next " to the end of the line. The trailing 1 then prints the modified line.
awk '{gsub(/^[^"]*"|".*/,"")} 1' Input_file
2nd solution: Using GNU grep, with -o to print only the matched part and -P to enable PCRE. The regex matches lazily from the start of the line up to the first ", uses \K to discard what has been matched so far, and then matches everything before the next ", which prints the text between the two " characters as required.
grep -oP '^.*?"\K[^"]*' Input_file

You can also use cut here:
cut -d\" -f 2 <<< 'See [ "you" later '
It splits the string with a double quote and gets the second item.
Output:
you

Using bash
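IFS='"' read -ra arr <<< "See [ \"you\" later"
echo "${arr[1]}"
gives output
you
Explanation: setting IFS to " for the read command tells bash to split at ", read -ra reads the split text into the array arr, and echo prints the 2nd element (which is [1], as [0] denotes the 1st element). Placing the assignment directly before read limits the IFS change to that one command.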
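The same IFS-splitting idea can be sketched without arrays, so that it also runs in a plain POSIX sh. The function name second_quoted_field is hypothetical; the body runs in a subshell (note the parentheses) so the IFS change cannot leak into the calling script.

```shell
# second_quoted_field is a hypothetical helper.
# The ( ... ) body is a subshell, so IFS and set -f are undone on return.
second_quoted_field() (
  IFS='"'      # split on double quotes only
  set -f       # disable globbing so characters like [ and * stay literal
  set -- $1    # unquoted on purpose: split the argument into $1, $2, ...
  printf '%s\n' "$2"
)

second_quoted_field 'See [ "you" later'   # prints: you
```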

Just a few ways using GNU awk for:
multi-char RS and RT:
$ echo 'See [ "you" later' |
awk -v RS='"[^"]*"' 'RT{ print substr(RT,2,length(RT)-2) }'
you
the 3rd arg to match():
$ echo 'See [ "you" later' |
awk 'match($0,/"([^"]*)"/,a){ print a[1] }'
you
gensub() (assuming the quoted string is always present):
$ echo 'See [ "you" later' |
awk '{print gensub(/.*"([^"]*)".*/,"\\1",1)}'
you
FPAT:
$ echo 'See [ "you" later' |
awk -v FPAT='[^"]*' 'NF>2{print $2}'
you
$ echo 'See [ "you" later' |
awk -v FPAT='"[^"]*"' 'NF{print substr($1,2,length($1)-2)}'
you
patsplit():
$ echo 'See [ "you" later' |
awk 'patsplit($0,f,/"[^"]*"/,s){print substr(f[1],2,length(f[1])-2)}'
you
the 4th arg to split():
$ echo 'See [ "you" later' |
awk 'split($0,f,/"[^"]*"/,s)>1{print substr(s[1],2,length(s[1])-2)}'
you

Here is an awk solution without any regex:
s='See [ "you" later'
awk -F '"' 'NF>2 {print $2}' <<< "$s"
you
Or a sed solution with regex:
sed -E 's/[^"]*"([^"]*)".*/\1/' <<< "$s"
you
Another awk with match:
awk 'match($0, /"[^"]*"/) {print substr($0, RSTART+1, RLENGTH-2)}' <<< "$s"
you

Using sed
$ sed -n 's/[^"]*"\([[:alpha:]]\+\)"[^"]*/\1 /gp' input_file
you

$ grep -oP '(?<=").*(?=")' <<< "See [ \"you\" later"
you

Extract all quoted substrings, and remove the quotes:
echo 'See [ "you" later, "" "a" "b" "c' |
grep -o '"[^"]*"' | tr -d \"
Gives:
you
a
b
"" is matched as an empty string on the second line of output (use grep -o '"[^"]\+"' to skip empty strings)
"c is not fully quoted, so it doesn't match
For a small string, you may want to use pure shell. This extracts the first quoted substring in $str:
str='Example "a" and "b".'
str=${str#*\"} # Cut up to first quote
case $str in
*\"*) str=${str%%\"*};; # Cut from second quote onwards
*) str= # $str contains less than two quotes
esac
echo "$str"
Gives
a
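The pure-shell approach can be extended to print every fully quoted substring, not only the first. This is a sketch under the same assumptions (no escaped quotes inside the string); the name all_quoted is made up for illustration.

```shell
# all_quoted is a hypothetical helper; it runs in a subshell so that
# the working variable "rest" does not leak into the caller.
all_quoted() (
  rest=$1
  while :; do
    case $rest in
      *\"*\"*) ;;        # at least two quotes left: keep going
      *) break ;;        # fewer than two quotes: nothing more to extract
    esac
    rest=${rest#*\"}                 # cut up to the opening quote
    printf '%s\n' "${rest%%\"*}"     # print up to the closing quote
    rest=${rest#*\"}                 # cut past the closing quote
  done
)

all_quoted 'Example "a" and "b".'
# prints:
# a
# b
```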

hands-free driving with awk :
echo 'See [ "you" later' |
gawk ++NF OFS= FS='^[^\"]*\"|\".*$' # any one of these 3,
# specific for this case
gawk '$_ = $--NF' FS='\"'
mawk '$!--NF=$NF' FS='\"'
you

perl -nE 'say $& if /(?<=")\w+/' <<< "See [ \"you\" later"

An awk example that works in BSD and Linux:
% echo 'See [ "you" later ' | awk -F\" '{print $2}'
you
For sed, use for both BSD and Linux:
% echo 'See [ "you" later ' | sed -e 's/^[^"]*"//' -e 's/"[^"]*$//'
you
For php, make a file called you.php:
<?php
$you = explode('"', 'See [ "you" later ');
echo $you[1], PHP_EOL;
Then run it:
% php you.php
you
For perl, find the content inside quotes:
% echo 'See [ "you" later' | perl -pe 's/^[^"]*"(.*)"[^"]*(\n)$/$1$2/'
Remove the (\n) and $2 if you don't wish to keep the newline.

Extract field after colon for lines where field before colon matches pattern

I have a file file1 which looks as below:
tool1v1:1.4.4
tool1v2:1.5.3
tool2v1:1.5.2.c8.5.2.r1981122221118
tool2v2:32.5.0.abc.r20123433554
I want to extract value of tool2v1 and tool2v2
My output should be 1.5.2.c8.5.2.r1981122221118 and 32.5.0.abc.r20123433554.
I have written the following awk but it is not giving correct result:
awk -F: '/^tool2v1/ {print $2}' file1
awk -F: '/^tool2v2/ {print $2}' file1

grep -E can also do the job:
grep -E "tool2v[12]" file1 |sed 's/^.*://'

If you have a grep that supports Perl compatible regular expressions such as GNU grep, you can use a variable-sized look-behind:
$ grep -Po '^tool2v[12]:\K.*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
The -o option is to retain just the match instead of the whole matching line; \K is the same as "the line must match the things to the left, but don't include them in the match".
You could also use a normal look-behind:
$ grep -Po '(?<=^tool2v[12]:).*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
And finally, to fix your awk which was almost correct (and as pointed out in a comment):
$ awk -F: '/^tool2v[12]/ { print $2 }' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554

You can filter with grep:
grep '\(tool2v1\|tool2v2\)'
And then remove the part before the : with sed:
sed 's/^.*://'
This sed operation means:
^ - match from beginning of string
.* - all characters
up to and including the :
... and replace this matched content with nothing.
The format is sed 's/<MATCH>/<REPLACE>/'
Whole command:
grep '\(tool2v1\|tool2v2\)' file1|sed 's/^.*://'
Result:
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
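One detail of s/^.*:// worth keeping in mind: .* is greedy, so the match runs up to the last colon on the line. That makes no difference for file1, where each line has a single colon, but it matters if a value can itself contain a colon. A small sketch with a made-up input line; both function names are hypothetical:

```shell
# Greedy: .* extends to the LAST colon, so only the final piece survives.
after_last_colon()  { sed 's/^.*://'; }
# [^:]* cannot cross a colon, so the match stops at the FIRST colon.
after_first_colon() { sed 's/^[^:]*://'; }

printf '%s\n' 'key:1.2:extra' | after_last_colon    # prints: extra
printf '%s\n' 'key:1.2:extra' | after_first_colon   # prints: 1.2:extra
```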

The question has already been answered, but you can also use pure bash to achieve the desired result:
#!/usr/bin/env bash
while IFS= read -r line; do
if [[ "$line" =~ ^tool2v ]]; then
echo "${line#*:}"
fi
done < ./file1
The while loop reads every line of file1, =~ does a regexp match to check whether the value of $line starts with tool2v, and ${line#*:} then removes everything up to and including the first :.
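The loop can also be written with case instead of a regexp match, which keeps it runnable in a plain POSIX sh. In this sketch the function name values_for_prefix and its prefix argument are hypothetical; the input is read from standard input.

```shell
# values_for_prefix is a hypothetical helper.
# $1 is the key prefix to look for; lines are read from stdin.
values_for_prefix() {
  while IFS= read -r line; do
    case $line in
      # "$1" is quoted, so the prefix is matched literally, not as a glob.
      "$1"*:*) printf '%s\n' "${line#*:}" ;;
    esac
  done
}

values_for_prefix tool2v <<'EOF'
tool1v1:1.4.4
tool1v2:1.5.3
tool2v1:1.5.2.c8.5.2.r1981122221118
tool2v2:32.5.0.abc.r20123433554
EOF
# prints:
# 1.5.2.c8.5.2.r1981122221118
# 32.5.0.abc.r20123433554
```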

printing "grep -o" output in single line

How to print output of grep -o in a single line ? I am trying to print :
$ echo "Hello Guys!" |grep -E '[A-Z]'
Hello Guys!
$ echo "Hello Guys!" |grep -Eo '[A-Z]' <----Multiple lines
H
G
$ echo "Hello Guys!" |grep -Eo '[A-Z]'
Desired output:
HG
I am able to cheaply achieve it using following command ,but the issue is that number of letters(3 in this case) could be dynamic. So this approach cannot be used.
echo "HEllo Guys!" |grep -oE '[A-Z]' |xargs -L3 |sed 's/ //g'
HEG

You could do it all with this sed instruction
echo "Hello Guys!" |sed 's/[^A-Z]//g'
UPDATE
Breakdown of sed command:
The s/// is sed's substitute command. It replaces whatever matches the RegEx (the part between the first and the second slash) with the expression between the second and third slash. The trailing g stands for global, i.e., do this for every match of the RegEx in the current line. Without the g it would stop after the first match. The RegEx itself matches any character that is not a capital letter, and each such character is replaced with nothing, i.e., effectively deleted.
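The effect of the g flag can be seen by running the substitution with and without it. This is a small sketch; the function names are made up, and LC_ALL=C is an added assumption so that the A-Z range is interpreted byte-wise regardless of the user's locale.

```shell
# Hypothetical helper names; both read from stdin.
strip_first_noncap() { LC_ALL=C sed 's/[^A-Z]//'; }    # no g: first match only
strip_all_noncap()   { LC_ALL=C sed 's/[^A-Z]//g'; }   # g: every match

echo "Hello Guys!" | strip_first_noncap   # prints: Hllo Guys!
echo "Hello Guys!" | strip_all_noncap     # prints: HG
```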

You can use awk:
echo "Hello Guys!" | awk '{ gsub(/[^A-Z]/,"", $0); print;}'
HG
Also with tr:
echo "Hello Guys!" | tr -cd '[:upper:]'
HG
Also with sed:
echo "Hello Guys!" | sed 's/[^[:upper:]]//g'
HG

You just need to remove the newline characters. You can use tr for that:
echo "HEllo Guys!" |grep -Eo '[A-Z]' |tr -d '\n'
HEG
Though, it cuts the last newline too.
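If the missing final newline is a problem, one possible workaround is to print a newline after tr has finished. The sketch below assumes a grep that supports -o (GNU and BSD grep do); the function name caps_on_one_line is hypothetical, and LC_ALL=C is an added assumption to keep the A-Z range byte-wise.

```shell
# caps_on_one_line is a hypothetical helper; it reads from stdin.
caps_on_one_line() {
  LC_ALL=C grep -o '[A-Z]' | tr -d '\n'   # join the matches
  echo                                    # put back one trailing newline
}

echo "HEllo Guys!" | caps_on_one_line   # prints: HEG
```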

You can use perl instead of grep
echo 'HEllo Guys!' | perl -lne 'print /([A-Z])/g'
HEG

Extract substring after a character

I'm trying to extract substring after the last period (dot).
examples below.
echo "filename..txt" should return "txt"
echo "filename.txt." should return ""
echo "filename" should return ""
echo "filename.xml" should return "xml"
I tried below. but works only if the character(dot) exists once. But my filename may have (dot) for 0 or more times.
echo "filename.txt" | cut -d "." -f2

Let's use awk!
awk -F"." '{print (NF>1)? $NF : ""}' file
This sets field separator to . and prints the last one. But if there is none, it prints an empty string.
Test
$ cat file
filename..txt
filename.txt.
filename
filename.xml
$ awk -F"." '{print (NF>1)? $NF : ""}' file
txt
xml

One can make this portable (so it's not Linux-only), avoiding an ERE dependency, with the following:
$ sed -ne 's/.*\.//p' <<< "file..txt"
txt
$ sed -ne 's/.*\.//p' <<< "file.txt."
$ sed -ne 's/.*\.//p' <<< "file"
$ sed -ne 's/.*\.//p' <<< "file.xml"
xml
Note that for testing purposes, I'm using a "here-string" in bash. If your shell is not bash, use whatever your shell uses to feed data to sed.
The important bit here is the use of sed's -n option, which tells it not to print anything by default, combined with the substitute command's explicit p flag, which tells sed to print only upon a successful substitution, which obviously requires a dot to be included in the pattern.
With this solution, the difference between "file.txt." and "file" is that the former returns the input line replaced with null (so you may still get a newline depending on your usage), whereas the latter returns nothing, as sed is not instructed to print, as no . is included in the input. The end result may well be the same, of course:
$ printf "#%s#\n" $(sed -ne 's/.*\.//p' <<< "file.txt.")
##
$ printf "#%s#\n" $(sed -ne 's/.*\.//p' <<< "file")
##
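To make the difference between the two cases visible, the number of lines sed emits can be counted. This is a sketch using a pipe instead of a here-string so that it also runs in a plain POSIX sh; count_printed_lines is a hypothetical name.

```shell
# count_printed_lines is a hypothetical helper.
# It reports how many lines sed prints for the given input string.
count_printed_lines() {
  # tr strips the padding that some wc implementations add.
  printf '%s\n' "$1" | sed -ne 's/.*\.//p' | wc -l | tr -d ' '
}

count_printed_lines 'file.txt.'   # prints: 1  (an empty line is printed)
count_printed_lines 'file'        # prints: 0  (nothing is printed)
```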

Simple to do with awk:
awk -F"." '{ print $NF }'
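What this does: With dot as a delimiter, extract the last field from the input. Note that a line with no dot is a single field, so filename prints filename rather than an empty string.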

Use sed in 2 steps: first blank out any line without a dot, then remove everything up to the last dot:
sed -e 's/^[^.]*$//' -e 's/.*\.//'
Test:
for s in file.txt.. file.txt. file.txt filename file.xml; do
echo "$s -> $(echo "$s" | sed -e 's/^[^.]*$//' -e 's/.*\.//')"
done
Test result:
file.txt.. ->
file.txt. ->
file.txt -> txt
filename ->
file.xml -> xml
Actually, the answer by @ghoti is roughly the same, just a bit shorter (better).
This solution can still be useful to readers who want to do something like this in another language.
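The same two-step logic (no dot means empty, otherwise keep what follows the last dot) can be sketched in pure shell with case and parameter expansion. The function name ext_after_last_dot is made up for illustration.

```shell
# ext_after_last_dot is a hypothetical helper.
ext_after_last_dot() {
  case $1 in
    # ${1##*.} removes the longest prefix that ends in a dot.
    *.*) printf '%s\n' "${1##*.}" ;;
    # No dot at all: print an empty line.
    *)   printf '\n' ;;
  esac
}

ext_after_last_dot 'filename..txt'   # prints: txt
ext_after_last_dot 'filename.txt.'   # prints an empty line
ext_after_last_dot 'filename'        # prints an empty line
ext_after_last_dot 'filename.xml'    # prints: xml
```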

Need to grab data inbetween tilde character

Can anyone advise how to search on Linux for some data between tilde characters? I need to get the IP address, but the data is formatted as below.
Details:
20110906000418~118.221.246.17~DATA~DATA~DATA

One more:
echo '20110906000418~118.221.246.17~DATA~DATA~DATA' | sed -r 's/[^~]*~([^~]+)~.*/\1/'

echo "20110906000418~118.221.246.17~DATA~DATA~DATA" | cut -d'~' -f2
This uses the cut command with the delimiter set to ~. The -f2 switch then outputs just the 2nd field.
If the text you give is in a file (called filename), try:
grep "[0-9]*~" filename | cut -d'~' -f2

With cut:
echo "20110906000418~118.221.246.17~DATA~DATA~DATA" | cut -d~ -f2
With awk:
echo "20110906000418~118.221.246.17~DATA~DATA~DATA" | awk -F~ '{ print $2 }'

In awk:
echo '20110906000418~118.221.246.17~DATA~DATA~DATA' | awk -F~ '{print $2}'

Just use bash
$ string="20110906000418~118.221.246.17~DATA~DATA~DATA"
$ echo ${string#*~}
118.221.246.17~DATA~DATA~DATA
$ string=${string#*~}
$ echo ${string%%~*}
118.221.246.17

one more, using perl:
$ perl -F~ -lane 'print $F[1]' <<< '20110906000418~118.221.246.17~DATA~DATA~DATA'
118.221.246.17
bash:
#!/bin/bash
while IFS='~' read -ra array
do
echo "${array[1]}"
done < ip
If the string has a fixed layout, the following parameter expansion performs substring extraction:
$ a=20110906000418~118.221.246.17~DATA~DATA~DATA
$ echo ${a:15:14}
118.221.246.17
or using regular expression matching with expr:
$ expr "$a" : '[^~]*~\([^~]*\)~.*'
118.221.246.17
last one, again using pure bash methods:
$ tmp=${a#*~}
$ echo $tmp
118.221.246.17~DATA~DATA~DATA
$ echo ${tmp%%~*}
118.221.246.17
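The two-step parameter expansion above generalizes to any field position. The following is a sketch only: nth_field and its three arguments (string, delimiter, 1-based field number) are hypothetical, and it assumes a single-character delimiter.

```shell
# nth_field is a hypothetical helper: nth_field STRING DELIM N
# The ( ... ) body is a subshell, so s, d and n stay local.
nth_field() (
  s=$1 d=$2 n=$3
  while [ "$n" -gt 1 ]; do
    case $s in
      *"$d"*) s=${s#*"$d"} ;;   # drop the leading field and its delimiter
      *) s=''; break ;;         # fewer than N fields: result is empty
    esac
    n=$((n - 1))
  done
  s=${s%%"$d"*}                 # drop everything after the wanted field
  printf '%s\n' "$s"
)

nth_field '20110906000418~118.221.246.17~DATA~DATA~DATA' '~' 2   # prints: 118.221.246.17
```

Quoting "$d" inside the expansion makes the delimiter literal, which sidesteps any question of how the shell treats a leading ~ in the pattern.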
