Bash - How to split a string with multiple words


I have a long string like the following:
string='<span id="/yourid/12345" class="noname">lala1</span><span id="/yourid/34567" class="noname">lala2</span><span id="/yourid/39201" class="noname">lala3</span>'
The objective is to loop through each 'yourid' and echo the IDs 12345, 34567 and 39201 for further processing. How can this be achieved in a bash shell?

GNU grep:
grep -oP '(?<=/yourid/)\d+' <<< "$string"
12345
34567
39201

Use a real XML parser. For instance, if you have XMLStarlet installed...
while read -r id; do
    [[ $id ]] || continue
    printf '%s\n' "${id#/yourid/}"
done < <(xmlstarlet sel -t -m '//span[@id]' -v ./@id -n <<<"<root>${string}</root>")

With Perl:
declare -a ids
ids=( $(perl -lne 'while(m!yourid/(\w+)!g){print $1}' <<< "$string") )
echo "${ids[@]}"
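
If no external tool is available, the same extraction can be sketched in bash alone by matching repeatedly with [[ =~ ]] and trimming the consumed prefix after each match. This is only a sketch: the names rest and re are illustrative, and it assumes the IDs are purely numeric, as in the question.

```shell
#!/usr/bin/env bash
string='<span id="/yourid/12345" class="noname">lala1</span><span id="/yourid/34567" class="noname">lala2</span><span id="/yourid/39201" class="noname">lala3</span>'

ids=()
rest=$string
re='/yourid/([0-9]+)'   # assumption: IDs consist of digits only

# Each iteration captures the first remaining ID, then drops everything
# up to and including that match so the next iteration finds the next one.
while [[ $rest =~ $re ]]; do
    ids+=( "${BASH_REMATCH[1]}" )
    rest=${rest#*"${BASH_REMATCH[0]}"}
done

printf '%s\n' "${ids[@]}"
```

Like the regex-based answers above, this treats the markup as plain text, so the XML parser approach remains the safer choice for arbitrary input.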

Related

how to extract grep and cut into a bash array

I tried the following. Here is the content of file.txt:
some other text
#1.something1=kjfk
#2.something2=dfkjdk
#3.something3=3232
some other text
bash script:
ids=( `grep "something" file.txt | cut -d'.' -f1` )
for id in "${ids[@]}"; do
echo $id
done
result:
(nothing newline...)
(nothing newline...)
(nothing newline...)
But all it prints is an empty line for every ID found. What am I missing?

Your grep and cut should be working but you can use awk and reduce 2 commands into one:
while read -r id; do
    echo "$id"
done < <(awk -F '\\.' '/something/{print $1}' file.txt)
To populate an array:
ids=()
while read -r id; do
    ids+=( "$id" )
done < <(awk -F '\\.' '/something/{print $1}' file.txt)
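
On bash 4 or later, the mapfile builtin can fill the array in one step instead of a read loop. A minimal sketch, which recreates the sample file in a temporary location (the tmpfile variable is only for illustration):

```shell
#!/usr/bin/env bash
# Recreate the sample file from the question in a temporary location.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
some other text
#1.something1=kjfk
#2.something2=dfkjdk
#3.something3=3232
some other text
EOF

# mapfile -t stores one line per array element and strips the trailing newline.
mapfile -t ids < <(awk -F '\\.' '/something/{print $1}' "$tmpfile")
rm -f "$tmpfile"

printf '%s\n' "${ids[@]}"
```

Unlike ids=( $(...) ), this does not word-split or glob-expand the output, so each line stays one element even if it contains spaces.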

You can use grep's -o option to output only the text matched by a regular expression:
$ ids=($(grep -Eo '^#[0-9]+' file.txt))
$ echo "${ids[@]}"
#1 #2 #3
This of course doesn't check for the existence of a period on the line... If that's important, then you could either expand things with another pipe:
$ ids=($(grep -Eo '^#[0-9]+\.something' file.txt | grep -o '^#[0-9]*'))
or you could trim the array values after populating the array:
$ ids=($(grep -Eo '^#[0-9]+\.something' file.txt))
$ echo "${ids[@]}"
#1.something #2.something #3.something
$ for key in "${!ids[@]}"; do ids[key]="${ids[key]%.*}"; done
$ echo "${ids[@]}"
#1 #2 #3

How to search through a string and extract the required value in unix

I have a string like the one below:
QUERY_RESULT='88371087|COB-A#2014-04-22,COB-C#2014-04-22,2014-04-22,2014-04-23 88354188|COB-W#2014-04-22,2014-04-22,2014-04-22 88319898|COB-X#2014-04-22,COB-K#2014-04-22,2014-04-22,2014-04-22'
This is the result of a database query. Now I want to take all the values before each pipe and separate them with commas. So the output needed is:
A='88371087,88354188,88319898'
The database values can be different every time; there can be just one value, or two or more.
How do I do it?

Using awk
A=`echo $QUERY_RESULT | awk '{ nreg=split($0,reg);for(i=1;i<=nreg;i++){split(reg[i],fld,"|");printf("%s%s",(i==1?"":","),fld[1]);}}'`
echo $A
88371087,88354188,88319898

Using grep -oP
grep -oP '(^| )\K[^|]+' <<< "$QUERY_RESULT"
88371087
88354188
88319898
OR to get comma separated value:
A=$(grep -oP '(^| )\K[^|]+' <<< "$QUERY_RESULT"|sed '$!s/$/,/'|tr -d '\n')
echo "$A"
88371087,88354188,88319898

$ words=( $( grep -oP '\S+(?=\|)' <<< "$QUERY_RESULT") )
$ A=$(IFS=,; echo "${words[*]}")
$ echo "$A"
88371087,88354188,88319898

Bash only.
shopt -s extglob
result=${QUERY_RESULT//|+([^ ]) /,}
result=${result%|*}
echo "$result"
Output:
88371087,88354188,88319898
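
Another bash-only variant, sketched here without extglob: split the string into records on whitespace, keep the part of each record before the first pipe, then join with commas. The names records and keys are illustrative, and it assumes that records never contain spaces, as in the sample.

```shell
#!/usr/bin/env bash
QUERY_RESULT='88371087|COB-A#2014-04-22,COB-C#2014-04-22,2014-04-22,2014-04-23 88354188|COB-W#2014-04-22,2014-04-22,2014-04-22 88319898|COB-X#2014-04-22,COB-K#2014-04-22,2014-04-22,2014-04-22'

# Split on whitespace into one array element per record.
read -ra records <<< "$QUERY_RESULT"

keys=()
for rec in "${records[@]}"; do
    # %%|* removes the longest suffix starting at the first pipe.
    keys+=( "${rec%%|*}" )
done

# "${keys[*]}" joins elements with the first character of IFS;
# the subshell keeps the IFS change local.
A=$(IFS=,; echo "${keys[*]}")
echo "$A"
```

This also works when the query returns a single record, in which case no comma is emitted.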

How to extract a part of string?

I have a string that contains a path:
string="toto.titi.1.tata.2.abc.def"
I want to extract the substring that comes after toto.titi.1.tata.2., but 1 and 2 here are examples and could be other numbers.
In general, I want to extract the substring that comes after toto.titi.[i].tata.[j]., where [i] and [j] are numbers.
How can I do it?

Pure bash solution:
[[ $string =~ toto\.titi\.[0-9]+\.tata\.[0-9]+\.(.*$) ]] && result="${BASH_REMATCH[1]}"
echo "$result"

An alternate bash solution that uses parameter expansion instead of a regular expression:
echo "${string#toto.titi.[0-9].tata.[0-9].}"
If the numbers can be multi-digit values (i.e., greater than 9), you would need to use an extended pattern:
shopt -s extglob
echo "${string#toto.titi.+([0-9]).tata.+([0-9]).}"
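
The difference between the two patterns shows up with multi-digit numbers. A small sketch with a made-up input (the values 12 and 345, and the variable names plain and ext, are only for illustration):

```shell
#!/usr/bin/env bash
shopt -s extglob

string="toto.titi.12.tata.345.abc.def"

# [0-9] matches exactly one digit, so the prefix does not match here
# and the string is left unchanged.
plain=${string#toto.titi.[0-9].tata.[0-9].}

# +([0-9]) matches one or more digits, so the prefix is removed.
ext=${string#toto.titi.+([0-9]).tata.+([0-9]).}

echo "$plain"
echo "$ext"
```

When a prefix pattern does not match, parameter expansion silently returns the original value, which is why the first result is the whole string rather than an error.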

You can use cut
echo $string | cut -f6- -d'.'

This does it:
echo ${string} | sed -re 's/^toto\.titi\.[[:digit:]]+\.tata\.[[:digit:]]+\.//'

Maybe like this:
echo "$string" | cut -d '.' -f 6-

You can use sed. Like this:
string="toto.titi.1.tata.2.abc.def"
string=$(sed 's/toto\.titi\.[0-9]\.tata\.[0-9]\.//' <<< "$string")
echo "$string"
Output:
abc.def

try this awk line:
awk -F'toto\\.titi\\.[0-9]+\\.tata\\.[0-9]+\\.' '{print $2}' file
with your example:
kent$ echo "toto.titi.1.tata.2.abc.def"|awk -F'toto\\.titi\\.[0-9]+\\.tata\\.[0-9]+\\.' '{print $2}'
abc.def

Script on bash, can you?

There is a string $STRING in which syllables are separated by spaces. If the variable $WORD contains at least one syllable from this string, report this in any way.

Your solution checks to see if $WORD exists in $STRING when it should be the other way around. Try this:
string="run walk stand"
word=walking
if echo "$string" | sed -e 's/ /\n/g' | grep -Fqif - <(echo "$word")
then
echo "Match!"
fi
As you can see, you can test the result of the grep without having to save the output in a variable.
By the way, -n is the same as ! -z.
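
The same check can be sketched without sed and grep by looping over the syllables and testing each one as a substring of the word. This is an assumption-laden sketch: the variable names syllables and match are illustrative, and the ${var,,} lowercasing (bash 4+) only approximates grep's -i flag.

```shell
#!/usr/bin/env bash
string="run walk stand"
word=Walking

# Split the space-separated syllables into an array.
read -ra syllables <<< "$string"

match=""
for syllable in "${syllables[@]}"; do
    # Compare lowercased copies so that "Walking" still matches "walk".
    if [[ ${word,,} == *"${syllable,,}"* ]]; then
        match=$syllable
        break
    fi
done

if [[ -n $match ]]; then
    echo "Match: $match"
fi
```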

Need to grab data inbetween tilde character

Can anyone advise how to extract data between tilde characters on Linux? I need to get the IP address, but the data is formatted like the example below.
Details:
20110906000418~118.221.246.17~DATA~DATA~DATA

One more:
echo '20110906000418~118.221.246.17~DATA~DATA~DATA' | sed -r 's/[^~]*~([^~]+)~.*/\1/'

echo "20110906000418~118.221.246.17~DATA~DATA~DATA" | cut -d'~' -f2
This uses the cut command with the delimiter set to ~. The -f2 switch then outputs just the 2nd field.
If the text you give is in a file (called filename), try:
grep "[0-9]*~" filename | cut -d'~' -f2

With cut:
echo "20110906000418~118.221.246.17~DATA~DATA~DATA" | cut -d~ -f2
With awk:
echo "20110906000418~118.221.246.17~DATA~DATA~DATA" | awk -F~ '{ print $2 }'

In awk:
echo '20110906000418~118.221.246.17~DATA~DATA~DATA' | awk -F~ '{print $2}'

Just use bash
$ string="20110906000418~118.221.246.17~DATA~DATA~DATA"
$ echo ${string#*~}
118.221.246.17~DATA~DATA~DATA
$ string=${string#*~}
$ echo ${string%%~*}
118.221.246.17

one more, using perl:
$ perl -F~ -lane 'print $F[1]' <<< '20110906000418~118.221.246.17~DATA~DATA~DATA'
118.221.246.17
bash, reading from a file named ip:
#!/bin/bash
# Scope IFS to read so the rest of the script keeps the default separator.
while IFS='~' read -ra array
do
    echo "${array[1]}"
done < ip
If the field always starts at the same offset and has the same length, the following parameter expansion extracts the substring:
$ a=20110906000418~118.221.246.17~DATA~DATA~DATA
$ echo ${a:15:14}
118.221.246.17
or using a regular expression with expr:
$ echo $(expr "$a" : '[^~]*~\([^~]*\)~.*')
118.221.246.17
last one, again using pure bash methods:
$ tmp=${a#*~}
$ echo $tmp
118.221.246.17~DATA~DATA~DATA
$ echo ${tmp%%~*}
118.221.246.17
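
The read builtin can also split a single string into named fields when IFS is set only for that one command. A minimal sketch; the field names timestamp, ip and rest are assumptions about what the columns mean:

```shell
#!/usr/bin/env bash
line='20110906000418~118.221.246.17~DATA~DATA~DATA'

# IFS applies only to this read; the last variable receives the
# remainder of the line, tildes included.
IFS='~' read -r timestamp ip rest <<< "$line"

echo "$ip"
```

Prefixing the assignment to read avoids changing IFS for the rest of the script.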
