Bash extract after substring and before substring - string

Say I have a string:
random text before authentication_token = 'pYWastSemJrMqwJycZPZ', gravatar_hash = 'd74a97f
I want a shell command to extract everything after "authentication_token = '" and before the next '.
So basically, I want to return pYWastSemJrMqwJycZPZ.
How do I do this?

Use parameter expansion:
#!/bin/bash
text="random text before authentication_token = 'pYWastSemJrMqwJycZPZ', gravatar_hash = 'd74a97f"
token=${text##* authentication_token = \'} # Remove the left part.
token=${token%%\'*} # Remove the right part.
echo "$token"
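Because ## removes the longest matching prefix, everything up to the last occurrence of the marker is removed, so this works even if the random text itself contains authentication_token = '...'.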
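The two expansions generalize to a small function that takes the start and end markers as arguments. This is a sketch; the name extract_between is hypothetical. Quoting "$start" and "$end" inside the patterns makes them match literally, even when they contain glob characters such as [ or *.

```bash
#!/usr/bin/env bash
# extract_between TEXT START END  (hypothetical helper)
# Prints the text after the LAST occurrence of START and before the
# first occurrence of END that follows it. Returns 1 if START is absent.
extract_between() {
    local text=$1 start=$2 end=$3 rest
    [[ $text == *"$start"* ]] || return 1
    rest=${text##*"$start"}           # remove the longest prefix ending in START
    printf '%s\n' "${rest%%"$end"*}"  # remove the longest suffix starting at END
}

text="random text before authentication_token = 'pYWastSemJrMqwJycZPZ', gravatar_hash = 'd74a97f"
extract_between "$text" "authentication_token = '" "'"
```

The return status lets a caller distinguish "marker missing" from "empty value", which the bare two-line version cannot do.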

If your grep supports -P (Perl-compatible regular expressions), you can use this regex:
$ echo "random text before authentication_token = 'pYWastSemJrMqwJycZPZ', gravatar_hash = 'd74a97f" | grep -oP "authentication_token = '\K[^']*"
pYWastSemJrMqwJycZPZ
$ echo "random text before authentication_token = 'pYWastSemJrMqwJycZPZ', gravatar_hash = 'd74a97f" | grep -oP "authentication_token = '\K[^']*(?=')"
pYWastSemJrMqwJycZPZ
\K discards the previously matched characters from the printed result.
[^']* is a negated character class that matches zero or more characters other than '.
(?=') is a positive lookahead that asserts the match must be followed by a single quote.
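If your grep does not support -P, a sed sketch gives the same result with basic regular expressions only, under the same assumption as the lookahead version (the token is followed by a closing quote). The -n option suppresses automatic printing and the p flag prints a line only when the substitution succeeded, which mimics grep -o. The wrapper name get_token is hypothetical.

```bash
#!/usr/bin/env bash
# get_token (hypothetical): read lines on stdin and print the value of
# authentication_token = '...' from each line that contains one.
get_token() {
    # -n: no automatic output; p: print only lines where s/// matched.
    sed -n "s/.*authentication_token = '\([^']*\)'.*/\1/p"
}

text="random text before authentication_token = 'pYWastSemJrMqwJycZPZ', gravatar_hash = 'd74a97f"
printf '%s\n' "$text" | get_token
```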

My simple version (-r is GNU sed's flag for extended regular expressions; -E works in both GNU and BSD sed):
sed -r "s/(.*authentication_token = ')([^']*)(.*)/\2/"

IMO, grep -oP is the best solution. For completeness, a couple of alternatives:
sed 's/.*authentication_token = '\''//; s/'\''.*//' <<<"$string"
awk -F "'" '{for (i=1; i<NF; i+=2) if ($i ~ /authentication_token = $/) {print $(i+1); break}}' <<< "$string"

Use bash's regular expression matching facilities.
$ regex="_token = '([^']+)'"
$ string="random text before authentication_token = 'pYWastSemJrMqwJycZPZ', gravatar_hash = 'd74a97f'"
$ [[ $string =~ $regex ]] && hash=${BASH_REMATCH[1]}
$ echo "$hash"
pYWastSemJrMqwJycZPZ
Using a variable in place of a literal regular expression simplifies quoting the spaces and single quotes.
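The same approach can be wrapped in a function that builds the regex from a key name. This is a sketch with a hypothetical name, get_quoted. One design choice differs from the answer: the regex "_token = '" would also match a key such as refresh_token, so this version requires the key to start at a word boundary, using (^|[^[:alnum:]_]), which shifts the value into capture group 2. The key is spliced into the regex, so it should be a plain identifier.

```bash
#!/usr/bin/env bash
# get_quoted KEY TEXT (hypothetical helper): print the value in KEY = '...'.
# (^|[^[:alnum:]_]) stops "token" from matching inside "authentication_token".
get_quoted() {
    local regex="(^|[^[:alnum:]_])$1 = '([^']*)'"
    [[ $2 =~ $regex ]] || return 1      # no match: print nothing, status 1
    printf '%s\n' "${BASH_REMATCH[2]}"  # group 1 is the boundary, group 2 the value
}

string="random text before authentication_token = 'pYWastSemJrMqwJycZPZ', gravatar_hash = 'd74a97f'"
get_quoted authentication_token "$string"
get_quoted gravatar_hash "$string"
```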

Related

How to add single quotes in a shell script using sed

Need help in making a sed script to find and replace user input along with single quotes. Input file admins.py:
Script:
read adminsid
while [[ $adminsid == "" ]];
do
echo "You did not enter anything. Please re-enter AdminID"
read adminsid
done
## Please enter Admin's ID
9999999999,8888888888,1111111111
## Script To Replace ADMIN_IDS = [] to ADMIN_IDS = ['9999999999,8888888888,1111111111'] in file
sed -i "s|ADMIN_IDS = \[.*\]|ADMIN_IDS = ['$adminsid']|g" "$file"
## Current results:
ADMIN_IDS = ['9999999999,8888888888,1111111111']
## Expected results:
ADMIN_IDS = ['9999999999','8888888888','1111111111']
Assign the data to the variable:
adminsid=9999999999,8888888888,1111111111
Then use several sed -e expressions to add the quotes and the square brackets:
echo "$adminsid" | sed -e "s/,/','/g" -e "s/^/['/" -e "s/$/']/"
or, if $file contains only the comma-separated IDs, edit it in place (GNU sed):
sed -i -e "s/,/','/g" -e "s/^/['/" -e "s/$/']/" "$file"
You can do this with awk too:
Suppose you have assigned the variable as:
adminsid=9999999999,8888888888,1111111111
Then the solution:
echo "$adminsid"| awk -F"," -v quote="'" -v OFS="','" '$1=$1 {print "["quote $0 quote"]"}'
-F"," -v OFS="','" :: split the line on , and, because the assignment $1=$1 rebuilds the record, rejoin the fields with ','
print "["quote $0 quote"]" :: add [ and a single quote at the start of the line, and a single quote and ] at the end
This might work for you (GNU sed & bash):
<<<"$adminsid" sed 's/[^,]\+/'\''&'\''/g;s/.*/[&]/'
Surround all non-comma characters by single quotes and then surround the entire string by square brackets.
Replace the , with ',' in the variable and add characters at the beginning and at the end.
sed "s/.*/['&']/" <<< "${adminsid//,/','}"
echo "('${adminsid//,/\\',\\'}')"
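Combining the quoting step with the asker's original sed command gives a sketch that rewrites the ADMIN_IDS line directly. The function name quote_ids is hypothetical; it does the quoting with bash builtins instead of sed. The file edit assumes GNU sed -i and IDs that contain no | or & (both are special in that s command).

```bash
#!/usr/bin/env bash
# quote_ids (hypothetical helper): turn "a,b,c" into ['a','b','c'].
quote_ids() {
    local -a ids
    local id out=""
    IFS=, read -ra ids <<< "$1"      # split the list on commas
    for id in "${ids[@]}"; do
        out+="${out:+,}'$id'"        # comma before every item except the first
    done
    printf '[%s]\n' "$out"
}

adminsid=9999999999,8888888888,1111111111
quote_ids "$adminsid"

# Assumption: $file names a file with a line like ADMIN_IDS = [...].
file=admins.py
if [[ -f $file ]]; then
    sed -i "s|ADMIN_IDS = \[.*\]|ADMIN_IDS = $(quote_ids "$adminsid")|" "$file"
fi
```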

Field separation with adding quotes

I am a beginner in shell scripting.
I have a variable whose value contains an = character.
I want to add quotes around the field after the = character.
abc="source=TDG"
echo $abc|awk -F"=" '{print $2}'
My code prints only one field.
My expected output is
source='TDG'
$ abc='source=TDG'
$ echo "$abc" | sed 's/[^=]*$/\x27&\x27/'
source='TDG'
[^=]*$ match non = characters at end of line
\x27&\x27 add single quotes around the matched text (\x27 is a GNU sed escape for ')
With awk
$ echo "$abc" | awk -F= '{print $1 FS "\047" $2 "\047"}'
source='TDG'
-F= input field separator is =
print $1 FS "\047" $2 "\047" prints the first field, the input field separator, a single quote (\047 is its octal code), the second field, and another single quote
See "how to escape single quote in awk inside printf" for more ways of handling single quotes in print.
With bash parameter expansion
$ echo "${abc%=*}='${abc#*=}'"
source='TDG'
${abc%=*} deletes the last = and everything after it
${abc#*=} deletes everything up to and including the first =
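The two expansions above split at different = signs (the last one for the key, the first one for the value). That only matters when the value itself contains =: for query=a=b they would print query=a='a=b'. A sketch that splits both sides at the first = uses %% for the key; quote_value is a hypothetical name.

```bash
#!/usr/bin/env bash
# quote_value KEY=VALUE (hypothetical helper): print KEY='VALUE',
# splitting at the FIRST "=" on both sides so values may contain "=".
quote_value() {
    local kv=$1
    if [[ $kv != *=* ]]; then          # no "=": print unchanged
        printf '%s\n' "$kv"
        return
    fi
    printf "%s='%s'\n" "${kv%%=*}" "${kv#*=}"
}

quote_value "source=TDG"   # source='TDG'
quote_value "query=a=b"    # query='a=b'
```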
Sed would be the better choice:
echo "$abc" | sed "s/[^=]*$/'&'/"
Awk can do it but needs extra bits:
echo "$abc" | awk -F= 'gsub(/(^|$)/,"\047",$2)' OFS==
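Here gsub inserts \047 (a single quote) at both the start (^) and the end ($) of $2; its nonzero return value makes the pattern true, so awk prints the record rebuilt with OFS set to =.
Alternatively, use sub to surround TDG with single quotes, written as the octal escape \047 to avoid shell-quoting problems: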
echo "$abc" | awk '{sub(/TDG/,"\047TDG\047")}1'
source='TDG'

Extract email string from string in bash

I have a variable: $change.
I have tried to extract email from it (find the string between "by" and "#"):
change="Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'"
email=$(echo $change|sed -e 's/\by\(.*\)#/\1/')
It did not work.
You have a backslash before b, which turns it into \b, a word boundary in GNU sed, and that is not what you want here.
See the difference:
$ echo "$change" | sed -e 's/\by\(.*\)#/\1/'    # \by: no word boundary before "y", so no match
Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'
$ echo "$change" | sed -e 's/by\(.*\)#/\1/'     # plain "by": matches
Change 1234 on 2016/08/31  namecompany.com 'cdex abcd'
# "by" and "#" are gone; the captured " name" (with its leading space) remains
To also drop everything before by, start the pattern with .*:
$ echo "$change" | sed -e 's/.*by\(.*\)#/\1/'
 namecompany.com 'cdex abcd'
Finally, if you want just the data between "by " (note the trailing space) and #, use either of these (with -r you don't have to escape the parentheses of the capture group):
sed -e 's/.*by \(.*\)#.*/\1/'
sed -r 's/.*by (.*)#.*/\1/'
With your input:
$ sed -e 's/.*by \(.*\)#.*/\1/' <<< "Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'"
name
Using grep -oP you can use match reset \K:
grep -oP ' by \K[^#]*' <<< "$change"
name
or using lookbehind:
grep -oP '(?<= by )[^#]*' <<< "$change"
name
There is no need to resort to sed, awk, grep, etc.; use bash's built-in regular expression matching:
[[ $change =~ by\ ([^#]*)# ]] && email=${BASH_REMATCH[1]}
From the bash man page:
An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)). The return value is 0 if the string matches the pattern, and 1 otherwise. If the regular expression is syntactically incorrect, the conditional expression's return value is 2. If the shell option nocasematch is enabled, the match is performed without regard to the case of alphabetic characters. Any part of the pattern may be quoted to force the quoted portion to be matched as a string. Bracket expressions in regular expressions must be treated carefully, since normal quoting characters lose their meanings between brackets. If the pattern is stored in a shell variable, quoting the variable expansion forces the entire pattern to be matched as a string. Substrings matched by parenthesized subexpressions within the regular expression are saved in the array variable BASH_REMATCH. The element of BASH_REMATCH with index 0 is the portion of the string matching the entire regular expression. The element of BASH_REMATCH with index n is the portion of the string matching the nth parenthesized subexpression.
It may be surprising that the pattern is written without surrounding quotes, which is why it is probably a good idea to store the pattern in a variable instead:
regex='by ([^#]*)#'
[[ $change =~ $regex ]] && email=${BASH_REMATCH[1]}
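The man page rule about quoting a stored pattern can be checked directly. A short sketch reusing this answer's regex and input: the unquoted $regex is used as an extended regular expression, while the quoted "$regex" is searched for as literal text and therefore does not match.

```bash
#!/usr/bin/env bash
# Unquoted $regex: treated as an ERE. Quoted "$regex": treated as a literal string.
change="Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'"
regex='by ([^#]*)#'

if [[ $change =~ $regex ]]; then
    echo "unquoted: matched, BASH_REMATCH[1] is '${BASH_REMATCH[1]}'"
fi
if ! [[ $change =~ "$regex" ]]; then
    echo "quoted: no match, because the literal text $regex does not occur"
fi
```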
With sed:
sed -E 's/.* by ([^#]+).*/\1/' <<<"$change"
With awk:
awk -F# '{sub(".* ", "", $1); print $1}' <<<"$change"
Example:
$ sed -E 's/.* by ([^#]+).*/\1/' <<<"Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'"
name
$ awk -F# '{sub(".* ", "", $1); print $1}' <<<"Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'"
name
An awk version: awk's built-in split function splits the 6th field on "#" into an array named a, and a[1] holds the part before the #.
echo "$change" | awk '{ split($6, a, "#"); print a[1] }'
name
If you need the complete email address:
echo "$change" | awk '{print $6}'
name#company.com
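The same field-based idea works without awk: read splits its input on whitespace (with the default IFS). A sketch, assuming as the awk answer does that the address is always the sixth word:

```bash
#!/usr/bin/env bash
# "_" is a throwaway variable; the last variable receives the rest of the line.
change="Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'"
read -r _ _ _ _ _ addr _ <<< "$change"
name=${addr%%#*}          # everything before the first "#"
printf '%s\n' "$addr" "$name"
```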
Solution with Parameter Expansion
First, use a temporary variable to delete everything up to and including "by " (by followed by a space):
$ change="Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'"
$ tmp="${change#*by }"
$ echo "$tmp"
name#company.com 'cdex abcd'
Then, extract either the string before #
$ email="${tmp%#*}"
$ echo "$email"
name
Or, extract complete email address
$ email="${tmp%% *}"
$ echo "$email"
name#company.com
To extract multiple names (the text before each #) and join them with commas:
$ change="Change 1234 on 2016/08/31 by name#company.com 'cdex abcd'"
$ email=$(echo "$change" | perl -ne 'print join(",",/(\S+)#/g)')
$ echo "$email"
name
$ change="by name#company.com asd abcd#xyz.net 123 tom#xyz asdf"
$ email=$(echo "$change" | perl -ne 'print join(",",/(\S+)#/g)')
$ echo "$email"
name,abcd,tom
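A pure-bash sketch of the same idea repeats the =~ match on the remainder of the string; the function name is hypothetical. One difference from the Perl version: the character class here excludes #, so for a#b#c it collects a and b separately, whereas Perl's greedy \S+ captures a#b.

```bash
#!/usr/bin/env bash
# names_before_hash TEXT (hypothetical): print every run of non-space,
# non-# characters that is immediately followed by "#", joined with commas.
names_before_hash() {
    local rest=$1 regex='([^[:space:]#]+)#' out=""
    while [[ $rest =~ $regex ]]; do
        out+="${out:+,}${BASH_REMATCH[1]}"
        rest=${rest#*"${BASH_REMATCH[0]}"}   # drop text up to the end of this match
    done
    printf '%s\n' "$out"
}

names_before_hash "by name#company.com asd abcd#xyz.net 123 tom#xyz asdf"
```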

Parse value between two strings in bash

I am pretty new to Linux, and even though I need something simple, I don't know where to start. In a bash script I need to parse the value from an HTML page between the string "VOL. " and "," and pass it to a variable.
newvar=$(grep -oP 'VOL\. \K.*?(?=,)' file.txt)
echo "$newvar"
or from a string:
newvar=$(grep -oP 'VOL\. \K.*?(?=,)' <<< "$string")
echo "$newvar"
If you need something more portable (Perl 5.10 or later supports \K):
newvar=$(perl -lne '/VOL\. \K.*?(?=,)/ && print $&' <<< "$string")
echo "$newvar"
Explanation of the regex
VOL\. (with the trailing space) = the literal text "VOL. "; without the backslash, . would match any character
\K = resets the start of the reported match, so only what follows is printed, see https://stackoverflow.com/a/13543042/465183
.*? = any characters, zero or more times, non-greedy because of the ?
(?=,) = a positive lookahead assertion requiring a , right after the match
This can be done using bash's built-in regex matching:
if [[ "$var" =~ "VOL. "([^,]*)"," ]]; then
match="${BASH_REMATCH[1]}"
fi
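For an HTML file, the built-in regex match can be applied line by line. A sketch with a hypothetical function name; unlike grep -o, which prints every match, it stops at the first matching line, and it assumes "VOL. " and its comma sit on the same line.

```bash
#!/usr/bin/env bash
# first_vol FILE (hypothetical helper): print the text between "VOL. " and
# the next "," from the first line of FILE that contains it.
first_vol() {
    local line regex='VOL\. ([^,]*),'
    # "|| [[ -n $line ]]" also handles a last line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $regex ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
            return 0
        fi
    done < "$1"
    return 1                               # no line matched
}
```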

Linux delete spaces after a character in a line

In Linux, if I have a file with entries like:
My Number is = 1234; #This is a random number
Can I use sed or anything else to replace all spaces after '#' with '+', so that the output looks like:
My Number is = 1234; #This+is+a+random+number
One way using awk:
awk -F# 'OFS=FS { gsub(" ", "+", $2) }1' file.txt
Result:
My Number is = 1234; #This+is+a+random+number
If a line can contain more than one #, try this instead:
awk -F# 'OFS=FS { for (i=2; i <= NF; i++) gsub(" ", "+", $i); print }' file.txt
You can do this in pure shell...
$ foo="My Number is = 1234; #This is a random number"
$ echo -n "${foo%%#*}#"; echo "${foo#*#}" | tr ' ' '+'
My Number is = 1234; #This+is+a+random+number
$
Capturing this data to variables for further use is left as an exercise for the reader. :-)
Note that this also withstands multiple # characters on the line:
$ foo="My Number is = 1234; #This is a # random number"
$ echo -n "${foo%%#*}#"; echo "${foo#*#}" | tr ' ' '+'
My Number is = 1234; #This+is+a+#+random+number
$
Or if you'd prefer to create a variable rather than pipe through tr:
$ echo -n "${foo%%#*}#"; bar="${foo#*#}"; echo "${bar// /+}"
My Number is = 1234; #This+is+a+#+random+number
And finally, if you don't mind subshells with pipes, you could do this:
$ bar=$(echo -n "$foo" | tr '#' '\n' | sed -ne '2,$s/ /+/g;p' | tr '\n' '#')
$ echo "$bar"
My Number is = 1234; #This+is+a+#+random+number
$
And for the fun of it, here's a short awk solution:
$ echo $foo | awk -vRS=# -vORS=# 'NR>1 {gsub(/ /,"+")} 1'
My Number is = 1234; #This+is+a+#+random+number
#$
Note the trailing ORS. I don't know if it's possible to avoid a final record separator. I suppose you could get rid of that by piping the line above through head -1, assuming you're only dealing with the one line of input data.
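One way to avoid the final separator, sketched here: leave ORS alone, write the # yourself with printf before every record except the first, and print each record with printf "%s" so awk adds no separator of its own. echo's trailing newline ends up inside the last record, so the output still ends with a newline. Because RS=# makes newlines ordinary characters, this is suited to a single line of input, like the original.

```bash
#!/usr/bin/env bash
# Records are split on "#"; a "#" is written back only in front of
# records 2..N, so nothing follows the last record.
foo="My Number is = 1234; #This is a # random number"
echo "$foo" | awk -v RS='#' 'NR > 1 { gsub(/ /, "+"); printf "#" } { printf "%s", $0 }'
```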
Not terribly efficient, but:
perl -pe '1 while (s/(.*#[^ ]*) /$1+/);'
This might work for you (GNU sed):
echo 'My Number is = 1234; #This is a random number' |
sed 's/#/\n&/;h;s/.*\n//;y/ /+/;H;g;s/\n.*\n//'
My Number is = 1234; #This+is+a+random+number
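Here is yet another perl one-liner:
echo 'My Number is = 1234; #This is a random number' \
| perl -F'#' -lane '$rest = join "#", @F[1..$#F]; $rest =~ s/ /+/g; print $F[0], "#", $rest'
-F specifies the pattern used to split each line into the @F array.
-l chomps each input line and adds a newline to each print; -a turns on the autosplit and -n wraps the code in a loop over the input, roughly:
while (<>) {
    chomp;
    @F = split('#');
    # code from -e goes here
}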
