Extract text between two given different delimiters in a given text in bash [duplicate]

This question already has answers here:
Print text between delimiters using sed
(2 answers)
Closed 2 years ago.
I have a line of text that looks like hh^ay-pau+h#ow. I want to extract the text between - and +, which in this case is pau. This should be done in bash. Any help would be appreciated.
EDIT: I want to extract the text between the first occurrence of each token.
PS: My Google search didn't take me anywhere. I apologize if this question has already been asked.

The way to do this in pure bash is with parameter expansion:
$ a=hh^ay-pau+h#ow
$ b=${a%%+*}
$ c=${b#*-}
$ echo $c
pau
b: removes everything from the first + onward (the + included)
c: removes everything up to and including the first -
More info about substring removal in bash parameter expansion
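The two expansions above can be wrapped in a small function. This is only a sketch: the name between and its argument order are made up for illustration, and it looks for the first opening delimiter and then the first closing delimiter after it.

```bash
#!/usr/bin/env bash
# between TEXT OPEN CLOSE  (hypothetical helper, not a bash builtin)
# Prints the text after the first OPEN and before the next CLOSE.
# Returns 1 if either delimiter is missing.
between() {
  local text=$1 open=$2 close=$3
  local rest=${text#*"$open"}            # drop everything through the first OPEN
  [[ $rest == "$text" ]] && return 1     # nothing removed: OPEN not found
  [[ $rest == *"$close"* ]] || return 1  # CLOSE not found after OPEN
  printf '%s\n' "${rest%%"$close"*}"     # drop from the first CLOSE onward
}

between 'hh^ay-pau+h#ow' - +   # prints: pau
```

Quoting "$open" and "$close" inside the expansions keeps characters such as * or ? in a delimiter from being treated as glob patterns.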

Try GNU grep with Perl-compatible regular expressions (-P is a GNU extension):
grep -Po "(?<=\-).*?(?=\+)"
For example,
echo "hh^ay-pau+h#ow" | grep -Po "(?<=\-).*?(?=\+)"
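One thing to keep in mind with the grep approach: -o prints every match, so a line with several -...+ pairs yields several lines of output. A rough sketch of keeping only the first one follows; the function name first_between_grep is an assumption, it needs a grep that supports -P, and head keeps only the first match of the whole input, not of each line.

```bash
#!/usr/bin/env bash
# first_between_grep: hypothetical helper; requires GNU grep built with PCRE (-P).
first_between_grep() {
  # -o prints each match on its own line; head keeps only the first one.
  grep -Po '(?<=-).*?(?=\+)' | head -n 1
}

printf '%s\n' 'a-b+c-d+e' | first_between_grep   # prints: b
```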

If you have only one occurrence of - and + you can use cut:
$ echo "hh^ay-pau+h#ow" | cut -d "-" -f 2 | cut -d "+" -f 1
pau

Assuming one occurrence of + and -, you can stick to bash:
IFS=+- read -r _ x _ <<<'hh^ay-pau+h#ow'
echo $x
pau

If you're guaranteed to have only one - and one +:
% echo "hh^ay-pau+h#ow" | sed -e 's/.*-//' -e 's/+.*//'
pau
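If more than one - or + can appear, a single substitution with negated bracket expressions is one way to target the first - and the first + after it. This is a sketch using only POSIX basic regular expressions; the function name first_between_sed is made up for illustration.

```bash
#!/usr/bin/env bash
# first_between_sed: hypothetical helper; reads lines on stdin.
first_between_sed() {
  # [^-]* cannot cross a '-', so the match stops at the first one;
  # [^+]* then captures everything up to the first '+' after it.
  # -n with the p flag prints only lines where the substitution happened.
  sed -n 's/^[^-]*-\([^+]*\)+.*/\1/p'
}

printf '%s\n' 'hh^ay-pau+h#ow' | first_between_sed   # prints: pau
```

Lines that lack either delimiter produce no output at all, which may or may not be what you want.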

echo "hh^ay-pau+h#ow" | awk -F'-' '{print $2}' |awk -F'+' '{print $1}'
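The two awk calls can probably be folded into one by giving awk a bracket expression as the field separator. A minimal sketch, assuming the line has exactly one - followed by one + (the function name between_awk is hypothetical):

```bash
#!/usr/bin/env bash
# between_awk: hypothetical helper; splits each line on either '-' or '+'.
between_awk() {
  # In the bracket expression [-+], the leading '-' is a literal hyphen.
  awk -F'[-+]' '{ print $2 }'
}

printf '%s\n' 'hh^ay-pau+h#ow' | between_awk   # prints: pau
```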

Related

how to print text between two specific words using awk, sed? [duplicate]

This question already has answers here:
How to use sed/grep to extract text between two words?
(14 answers)
Closed 4 years ago.
How can I print the text between two specific words using awk or sed?
$ ofed_info | awk '/MLNX_OFED_LINUX/{print}'
MLNX_OFED_LINUX-4.1-1.0.2.0 (OFED-4.1-1.0.2):
$
Required output:
4.1-1.0.2.0

The following awk may help (assuming the input to awk always looks like the sample shown):
your_command | awk '{sub(/[^-]*/,"");sub(/ .*/,"");sub(/-/,"");print}'
Solution 2: using sed.
your_command | sed 's/\([^-]*\)-\([^ ]*\).*/\2/'
Solution 3: using awk's match() function:
your_command | awk 'match($0,/[0-9]+\.[0-9]+-[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/){print substr($0,RSTART,RLENGTH)}'

You may use this sed:
echo 'MLNX_OFED_LINUX-4.1-1.0.2.0 (OFED-4.1-1.0.2):' |
sed -E 's/^[^-]*-| .*//g'
4.1-1.0.2.0
This sed command removes either the text from the start of the line through the first hyphen, or the text from the first space to the end of the line.

Try this:
ofed_info | sed -n 's/^MLNX_OFED_LINUX-\([^ ]\+\).*/\1/p'
The sed command only selects lines starting with the keyword and prints the version attached to it.
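For comparison, the same selection can be sketched in pure bash with parameter expansion. The function name extract_version is hypothetical, and the sample line is piped in by hand because ofed_info is not assumed to be installed.

```bash
#!/usr/bin/env bash
# extract_version: hypothetical helper; reads lines on stdin and prints the
# version that follows the MLNX_OFED_LINUX- keyword.
extract_version() {
  local line
  while IFS= read -r line; do
    [[ $line == MLNX_OFED_LINUX-* ]] || continue   # skip unrelated lines
    line=${line#MLNX_OFED_LINUX-}                  # drop the keyword and its hyphen
    printf '%s\n' "${line%% *}"                    # drop from the first space onward
  done
}

printf '%s\n' 'MLNX_OFED_LINUX-4.1-1.0.2.0 (OFED-4.1-1.0.2):' | extract_version
# prints: 4.1-1.0.2.0
```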

How to remove "-" and a space from the beginning in a bash script? [duplicate]

This question already has answers here:
Editing/Replacing content in multiple files in Unix AIX without opening it
(2 answers)
Closed 6 years ago.
I have an output that looks as below
- 0.1-1
- 0.1-2
- 0.1-3
- 0.1-6
- 0.1-7
- 0.1-9
How can I use grep or some other tool to remove the "-" and the space from the beginning of each line, so that the output looks like this?
0.1-1
0.1-2
0.1-3
0.1-6
0.1-7
0.1-9

With sed:
sed -e 's/^- //' input.txt
Or with GNU grep:
grep -oP '^- \K.*' input.txt

You may use grep also,
grep -oE '[0-9].*' file

With awk:
awk '{print $2}' file

You can use cut to remove the first two characters of every line:
cut -c3- input.txt
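If the text is already in a shell variable or loop, a prefix-removal expansion avoids the external command. A small sketch, with the hypothetical function name strip_dash_prefix; unlike cut -c3-, it leaves lines that do not start with "- " untouched.

```bash
#!/usr/bin/env bash
# strip_dash_prefix: hypothetical helper; reads lines on stdin.
strip_dash_prefix() {
  local line
  # The '|| [[ -n $line ]]' part also handles a final line without a newline.
  while IFS= read -r line || [[ -n $line ]]; do
    printf '%s\n' "${line#- }"   # remove a leading "- " if present
  done
}

printf '%s\n' '- 0.1-1' '- 0.1-2' | strip_dash_prefix
```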

How to grep for specific pattern in a file [duplicate]

This question already has answers here:
Can grep show only words that match search pattern?
(15 answers)
Closed 8 years ago.
I have a bash file that contains the line below, along with other lines.
var BUILD_VERSION = '2014.17.10_23';
I just want to extract 2014.17.10_23. The value may change, so I tried something like grep for 2014*. However, when I do that, I get the whole line back instead of just the value 2014.17.10_23.
What would be the best way to achieve this?
Thanks

Using awk:
awk -F= '/BUILD_VERSION/{print $2}' input | tr -d "' ;"
And with sed:
sed -n "/BUILD_VERSION/s/.*'\([^']*\)'.*/\1/p" input

grep 'BUILD_VERSION' <your file> | sed -e 's/var BUILD_VERSION = //g'
This gets you '2014.17.10_23'; (quotes and semicolon included). Tweak the sed expression (or pipe the result through a few more) to strip them.
It would be a one-line regex in Perl...

Here is another awk solution:
awk -F' = ' '/BUILD_VERSION/ {gsub(/\x27|;/,""); print $NF}'

You can use this awk
awk -F\' '/BUILD_VERSION/ {print $2}' file
2014.17.10_23
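A pure-bash variant is also possible with the =~ operator and BASH_REMATCH. The sketch below assumes the line has the exact shape shown in the question (a single-quoted value after BUILD_VERSION =); the function name build_version is hypothetical.

```bash
#!/usr/bin/env bash
# build_version: hypothetical helper; reads lines on stdin and prints the
# single-quoted value assigned to BUILD_VERSION, or returns 1 if none is found.
build_version() {
  # Keeping the regex in a variable avoids quoting problems inside [[ ]].
  local re="BUILD_VERSION[[:space:]]*=[[:space:]]*'([^']*)'"
  local line
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"   # first capture group: the value
      return 0
    fi
  done
  return 1
}

printf '%s\n' "var BUILD_VERSION = '2014.17.10_23';" | build_version
# prints: 2014.17.10_23
```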

How to find the last field using 'cut'

Without using sed or awk, only cut, how do I get the last field when the number of fields is unknown or changes with every line?

You could try something like this:
echo 'maps.google.com' | rev | cut -d'.' -f 1 | rev
Explanation:
rev reverses "maps.google.com" to moc.elgoog.spam
cut uses the dot (i.e. '.') as the delimiter and chooses the first field, which is moc
lastly, rev reverses it again to give com

Use a parameter expansion. This is much more efficient than any kind of external command, cut (or grep) included.
data=foo,bar,baz,qux
last=${data##*,}
See BashFAQ #100 for an introduction to native string manipulation in bash.
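To make the difference between the four removal operators concrete, here is a short sketch on the same sample string. The variable names rest, init and first are only illustrative.

```bash
#!/usr/bin/env bash
data=foo,bar,baz,qux

# '#' strips a prefix and '%' strips a suffix; doubling the operator
# removes the longest match instead of the shortest.
last=${data##*,}     # longest prefix matching "*," removed
rest=${data#*,}      # shortest prefix matching "*," removed
init=${data%,*}      # shortest suffix matching ",*" removed
first=${data%%,*}    # longest suffix matching ",*" removed

printf '%s\n' "$last" "$rest" "$init" "$first"
# prints:
# qux
# bar,baz,qux
# foo,bar,baz
# foo
```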

It is not possible using just cut. Here is a way using grep:
grep -o '[^,]*$'
Replace the comma with whatever delimiter you need.
Explanation:
-o (--only-matching) only outputs the part of the input that matches the pattern (the default is to print the entire line if it contains a match).
[^,] is a character class that matches any character other than a comma.
* matches the preceding pattern zero or more times, so [^,]* matches zero or more non-comma characters.
$ matches the end of the string.
Putting this together, the pattern matches zero or more non-comma characters at the end of the string.
When there are multiple possible matches, grep prefers the one that starts earliest. So the entire last field will be matched.
Full example:
If we have a file called data.csv containing
one,two,three
foo,bar
then grep -o '[^,]*$' < data.csv will output
three
bar

Without awk ?...
But it's so simple with awk:
echo 'maps.google.com' | awk -F. '{print $NF}'
AWK is a far more powerful tool to have in your pocket.
-F sets the field separator
NF is the number of fields, so $NF refers to the last field
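Because NF is just a number, the same idea extends to other positions counted from the end. A brief sketch follows; the function names are hypothetical, and the NF guards are an addition so that empty lines (where NF is 0) print nothing.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; both read dot-separated lines on stdin.
last_field_awk() {
  awk -F. 'NF { print $NF }'            # skip lines with no fields at all
}
penultimate_field_awk() {
  awk -F. 'NF > 1 { print $(NF-1) }'    # needs at least two fields
}

printf '%s\n' 'maps.google.com' | last_field_awk          # prints: com
printf '%s\n' 'maps.google.com' | penultimate_field_awk   # prints: google
```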

There are multiple ways. You may use this too.
echo "Your string here"| tr ' ' '\n' | tail -n1
> here
Obviously, the blank space input for tr command should be replaced with the delimiter you need.

This is the only solution possible for using nothing but cut:
echo "s.t.r.i.n.g." | cut -d'.' -f2-
[repeat_following_part_forever_or_until_out_of_memory:] | cut -d'.' -f2-
With this solution, the number of fields can indeed be unknown and vary from line to line. However, since a line must not exceed LINE_MAX characters (the newline included), the number of fields is never truly arbitrary, so a finite chain of cut commands is enough in principle.
Yes, a very silly solution, but I think it is the only one that meets the criteria.

If your input string doesn't contain forward slashes then you can use basename and a subshell:
$ basename "$(echo 'maps.google.com' | tr '.' '/')"
This doesn't use sed or awk, but it doesn't use cut either, so I'm not sure it qualifies as an answer to the question as it's worded.
This doesn't work well if the input strings can contain forward slashes. A workaround is to replace the forward slash with some other character that you know isn't part of any valid input string. For example, if the pipe (|) character never appears in your input, this would work:
$ basename "$(echo 'maps.google.com/some/url/things' | tr '/' '|' | tr '.' '/')" | tr '|' '/'

The following implements a friend's suggestion:
#!/bin/bash
rcut(){
    # keep dropping the first field until cut no longer changes the string
    nu="$( echo "$1" | cut -d"$DELIM" -f 2- )"
    if [ "$nu" != "$1" ]
    then
        rcut "$nu"
    else
        echo "$nu"
    fi
}
$ export DELIM=.
$ rcut a.b.c.d
d

An alternative using perl would be:
perl -pe 's/(.*)\t(.*)$/$2/' file
where you may change \t to whichever delimiter the file uses

It is better to use awk when working with tabular data. You don't have to limit yourself to one command. If it can be achieved with awk, why not use that? I suggest you don't waste your precious time, and use a handful of commands to get the job done.
Example:
# $NF refers to the last column in awk
# (ll is a common shell alias for a long ls listing)
ll | awk '{print $NF}'

If you have a file named filelist.txt that contains a list of paths such as the following:
c:/dir1/dir2/file1.h
c:/dir1/dir2/dir3/file2.h
then you can do this:
rev filelist.txt | cut -d"/" -f1 | rev

Adding an approach to this old question just for the fun of it:
$ cat input.file # file containing input that needs to be processed
a;b;c;d;e
1;2;3;4;5
no delimiter here
124;adsf;15454
foo;bar;is;null;info
$ cat tmp.sh # showing off the script to do the job
#!/bin/bash
delim=';'
while read -r line; do
    while [[ "$line" =~ "$delim" ]]; do
        line=$(cut -d"$delim" -f 2- <<<"$line")
    done
    echo "$line"
done < input.file
$ ./tmp.sh # output of above script/processed input file
e
5
no delimiter here
15454
info
Besides bash, only cut is used.
Well, and echo, I guess.

choose -1
choose supports negative indexing (the syntax is similar to Python's slices).

I realized that if we just ensure a trailing delimiter exists, it works. In my case I have comma and whitespace delimiters, so I add a space at the end:
$ ans="a, b"
$ ans+=" "; echo ${ans} | tr ',' ' ' | tr -s ' ' | cut -d' ' -f2
b
