Extract substring within a given string - linux

I've read and attempted to extract a substring from a given string with awk, sed or grep but I am unable to get it working or think how to accomplish this.
I have the string below which describes configurations of my VMs:
config: diskSizeGb: 100 diskType: pd-standard imageType: COS_CONTAINERD machineType: e2-micro metadata: disable-legacy-endpoints: 'true' preemptible: true status: RUNNING version: 1.19.9
How can I extract a substring for example, "preemptible: true" or "status: RUNNING" knowing that the values can be different for each VM?
Thank you!

Assumptions:
the VM config name/value pairs may not be in the same order
config names and values are single strings with no embedded white space
each config name is preceded by (at least) one space, and followed immediately by a colon (:)
there may be multiple spaces between the colon (:) and the config value; we want to maintain these spaces in the output
One idea using sed and a capture group:
# note: extra spaces placed between 'version:' and '1.19.9'
cfg_string="config: diskSizeGb: 100 diskType: pd-standard imageType: COS_CONTAINERD machineType: e2-micro metadata: disable-legacy-endpoints: 'true' preemptible: true status: RUNNING version:    1.19.9"
for config in preemptible status version
do
echo "++++++++++++++ ${config}"
sed -nE "s/.* (${config}:[ ]*[^ ]*).*/\1/p" <<< "${cfg_string}"
done
sed details:
-nE - disable default printing of the input (we'll use /p to explicitly print our capture group) and enable extended regex support
.* (${config}:[ ]*[^ ]*).* - match variable number of characters (.*) + a space ( ) + ${config} + a colon (:) + zero or more spaces ([ ]*) + everything that follows that is not a space ([^ ]*) + the rest of the input (.*); the parens mark the start/end of the capture group (only one capture group in this case)
\1 - reference capture group #1 (ie, everything inside of the parens)
/p - print (the capture group)
This generates:
++++++++++++++ preemptible
preemptible: true
++++++++++++++ status
status: RUNNING
++++++++++++++ version
version:    1.19.9 # extra spaces maintained
NOTES:
obviously an invalid config name (eg, stat, versions) is going to produce no output
the sed results could be captured in a variable for further testing/processing (would address issue of an invalid config name)
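A minimal sketch of that last note, assuming bash. The function name get_cfg is hypothetical, and the config name is assumed to contain no regex metacharacters:

```shell
#!/bin/bash
cfg_string="config: diskSizeGb: 100 diskType: pd-standard imageType: COS_CONTAINERD machineType: e2-micro metadata: disable-legacy-endpoints: 'true' preemptible: true status: RUNNING version: 1.19.9"

# get_cfg NAME: print "NAME: value", or return non-zero if NAME is absent.
get_cfg() {
    local pair
    pair=$(sed -nE "s/.* ($1:[ ]*[^ ]*).*/\1/p" <<< "${cfg_string}")
    [[ -n ${pair} ]] || return 1
    printf '%s\n' "${pair}"
}

for config in status stat; do
    if pair=$(get_cfg "${config}"); then
        echo "found: ${pair}"
    else
        echo "no such config: ${config}"
    fi
done
```

Here stat is reported as missing because the pattern requires the colon to follow the name immediately, so it does not match status:.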

Here is a possible solution (it relies on the fields always appearing in the same positions):
#!/bin/bash
data="config: diskSizeGb: 100 diskType: pd-standard imageType: COS_CONTAINERD machineType: e2-micro metadata: disable-legacy-endpoints: 'true' preemptible: true status: RUNNING version: 1.19.9"
preemptible=$(echo ${data} | cut -d ' ' -f 14)
echo "preemptible = ${preemptible}"
status=$(echo ${data} | cut -d ' ' -f 16)
echo "status = ${status}"
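If the order of the pairs can change, a lookup by name may be safer than a fixed field number. The following is only a sketch, assuming bash and assuming each value is the single word right after "name:"; the function name cfg_value is hypothetical:

```shell
#!/bin/bash
data="config: diskSizeGb: 100 diskType: pd-standard imageType: COS_CONTAINERD machineType: e2-micro metadata: disable-legacy-endpoints: 'true' preemptible: true status: RUNNING version: 1.19.9"

# cfg_value NAME: print the word that follows "NAME:" in $data.
cfg_value() {
    local -a words
    local i
    read -r -a words <<< "${data}"          # split the string on whitespace
    for (( i = 0; i < ${#words[@]} - 1; i++ )); do
        if [[ ${words[i]} == "$1:" ]]; then
            printf '%s\n' "${words[i+1]}"
            return 0
        fi
    done
    return 1                                 # name not found
}

echo "preemptible = $(cfg_value preemptible)"
echo "status = $(cfg_value status)"
```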

Related

Retrieve substrings from individual components of an array in Bash

For a command such as
grubby --info=ALL
the output received is something of the sort -
index=3
kernel="/boot/vmlinuz-4.18.0-80.el8.x86_64"
args="ro crashkernel=auto resume=/dev/mapper/cl-swap rd.lvm.lv=cl/root rd.lvm.lv=cl/swap rhgb quiet"
root="/dev/mapper/cl-root"
initrd="/boot/initramfs-4.18.0-80.el8.x86_64.img"
title="CentOS Linux (4.18.0-80.el8.x86_64) 8 (Core)"
id="d7fe995b9d09403896e1e56a2b02a947-4.18.0-80.el8.x86_64"
index=4
kernel="/boot/vmlinuz-0-rescue-d7fe995b9d09403896e1e56a2b02a947"
args="ro crashkernel=auto resume=/dev/mapper/cl-swap rd.lvm.lv=cl/root rd.lvm.lv=cl/swap rhgb quiet"
root="/dev/mapper/cl-root"
initrd="/boot/initramfs-0-rescue-d7fe995b9d09403896e1e56a2b02a947.img"
title="CentOS Linux (0-rescue-d7fe995b9d09403896e1e56a2b02a947) 8 (Core)"
id="d7fe995b9d09403896e1e56a2b02a947-0-rescue"
I can pipe this output into an array, to try and refer to each block individually -
mapfile -t my_array < <(grubby --info=ALL )
However, this saves each element as a separate string in my_array, such as
printf '%s\n' "${my_array[0]}"
Output -
index=3
Perhaps I can access these elements for comparison based on regularity of offset (7 in this case for each subitem in subsequent blocks).
However, I'd like to retrieve the string value of these array components, doing a
printf '%s\n' "${my_array[1]}" gives me
kernel="/boot/vmlinuz-4.18.0-80.el8.x86_64" from which I'd like to get the value...
Also, if someone could suggest a better way, such as by accessing the value individually in each file, maybe something like -
cd /boot/loader/entries
for filename in $(find -type f -name '*.conf'); do
//Access this field
Not sure how to do it, though..
Subject to the comments about your output of grubby containing multiple index= lines where the name=value pairs have the same names under each index, the general way you handle parsing values from string variables in bash is with a parameter expansion (with substring removal). I say "general" way because the following parameter expansions are also provided in POSIX shell so your script will be portable to other shells. (bash provides an additional number of expansions that are bash-only)
A summary of the parameter expansions with substring removal:
${var#pattern} Strip shortest match of pattern from front of $var
${var##pattern} Strip longest match of pattern from front of $var
${var%pattern} Strip shortest match of pattern from back of $var
${var%%pattern} Strip longest match of pattern from back of $var
(note: pattern can contain the normal globbing characters such as '*'; "front" above is the same as "from the left" and "back" is the same as "from the right", and you will see these used interchangeably)
For your output above, you can loop over the lines separating the name=value pairs into name and value. Since the names repeat under each index, you can't use an associative array array[name]="value" directly or you will only end up with the last values. (you can save the index=X and use array[X name]="value", but that gets messy when you want to retrieve things)
There is another caveat with the args name, whose value itself contains '='. To split it correctly, isolate the value with the ${var#pattern} form, which removes only up to the first '=' from the front (left).
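To make the four forms and the args caveat concrete, here is a small sketch applied to a shortened, illustrative args line (the shortened value is an assumption, not real grubby output):

```shell
#!/bin/sh
line='args="ro crashkernel=auto quiet"'

name=${line%%=*}     # longest match of '=*' removed from the back
value=${line#*=}     # shortest match of '*=' removed from the front

echo "$name"         # prints: args
echo "$value"        # prints: "ro crashkernel=auto quiet"

# The other two forms split at the LAST '=' instead, which is wrong here:
echo "${line%=*}"    # prints: args="ro crashkernel
echo "${line##*=}"   # prints: auto quiet"
```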
As an example, you could redirect the output of grubby directly as you have done using a process substitution (bash-only) or redirect it to a file and read line from the file with something similar to:
#!/bin/bash
while read -r line; do
name="${line%%=*}" ## strip longest match from the right: everything from the first = onward
value="${line#*=}" ## strip shortest match from the left: everything through the first =
## if double quoted -- remove double quotes
[ "${value:0:1}" = '"' ] && value="${value:1:$((${#value}-2))}"
printf "name: %-8s value: '%s'\n" "$name" "$value"
done < "$1"
Example Use/Output
Reading your grubby data from a file would result in:
$ bash readnameval.sh info
name: index value: '3'
name: kernel value: '/boot/vmlinuz-4.18.0-80.el8.x86_64'
name: args value: 'ro crashkernel=auto resume=/dev/mapper/cl-swap rd.lvm.lv=cl/root rd.lvm.lv=cl/swap rhgb quiet'
name: root value: '/dev/mapper/cl-root'
name: initrd value: '/boot/initramfs-4.18.0-80.el8.x86_64.img'
name: title value: 'CentOS Linux (4.18.0-80.el8.x86_64) 8 (Core)'
name: id value: 'd7fe995b9d09403896e1e56a2b02a947-4.18.0-80.el8.x86_64'
name: index value: '4'
name: kernel value: '/boot/vmlinuz-0-rescue-d7fe995b9d09403896e1e56a2b02a947'
name: args value: 'ro crashkernel=auto resume=/dev/mapper/cl-swap rd.lvm.lv=cl/root rd.lvm.lv=cl/swap rhgb quiet'
name: root value: '/dev/mapper/cl-root'
name: initrd value: '/boot/initramfs-0-rescue-d7fe995b9d09403896e1e56a2b02a947.img'
name: title value: 'CentOS Linux (0-rescue-d7fe995b9d09403896e1e56a2b02a947) 8 (Core)'
name: id value: 'd7fe995b9d09403896e1e56a2b02a947-0-rescue'
So that is one way to approach the separation. The other would be to use awk which allows the same approach to simulating 2D arrays using a ',' to separate multiple index values (see SUBSEP in man awk). However, if you can get what you need without storing all values -- then you eliminate the simulated 2D array issue altogether.
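A rough sketch of the awk variant, wrapped in a shell function. The sample text is a shortened excerpt of the output shown in the question, and the names lookup, store, want_idx and want_name are hypothetical:

```shell
#!/bin/sh
grubby_sample='index=3
kernel="/boot/vmlinuz-4.18.0-80.el8.x86_64"
root="/dev/mapper/cl-root"
index=4
kernel="/boot/vmlinuz-0-rescue-d7fe995b9d09403896e1e56a2b02a947"
root="/dev/mapper/cl-root"'

# lookup INDEX NAME: print the value stored for NAME under INDEX.
lookup() {
    printf '%s\n' "$grubby_sample" | awk -v want_idx="$1" -v want_name="$2" '
        {
            eq = index($0, "=")              # position of the first "="
            if (eq == 0) next
            name  = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            gsub(/^"|"$/, "", value)         # drop enclosing double quotes
            if (name == "index") { idx = value; next }
            store[idx, name] = value         # real key is idx SUBSEP name
        }
        END {
            if ((want_idx, want_name) in store)
                print store[want_idx, want_name]
        }'
}

lookup 4 kernel
```

In practice the printf would be replaced by grubby --info=ALL piped into the same awk program.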
Look things over and let me know if you have further questions.
In bash, it is possible to create an associative array, say named grubby, and access its elements like ${grubby[index,key]}. For instance ${grubby[3,kernel]} should expand to /boot/vmlinuz-4.18.0-80.el8.x86_64.
Example script:
#!/bin/bash
declare -A grubby
while read -r line; do
if [[ $line = index=* ]]; then
index=${line#index=}
continue
fi
[[ $line = *=* ]] || continue
key=${line%%=*}
value=${line#*=}
value=${value#\"}
value=${value%\"}
grubby[$index,$key]=$value
done
# Examples
echo "3,kernel = ${grubby[3,kernel]}"
echo "4,root = ${grubby[4,root]}"
The output of the grubby command should be redirected to the script:
grubby --info=ALL | ./script
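Once the array is filled, the keys themselves can be inspected too. A small sketch, assuming bash 4 or later; the three entries are a hand-filled stand-in for what the script above would store, and keys_for is a hypothetical helper:

```shell
#!/bin/bash
declare -A grubby
grubby[3,kernel]=/boot/vmlinuz-4.18.0-80.el8.x86_64
grubby[3,root]=/dev/mapper/cl-root
grubby[4,root]=/dev/mapper/cl-root

# keys_for INDEX: print the key names stored under that index, sorted.
keys_for() {
    local k
    for k in "${!grubby[@]}"; do
        [[ $k == "$1",* ]] && printf '%s\n' "${k#*,}"
    done | sort
}

keys_for 3
```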

How to use sed to replace a comma followed by 0 or more spaces in bash

I can't figure out how to replace a comma followed by 0 or more spaces in a bash variable. here's what i have:
base="test00 test01 test02 test03"
options="test04,test05, test06"
for b in $(echo $options | sed "s/, \+/ /g")
do
base="${base} $b"
done
What i'm trying to do is append the "options" to the "base". Options is user input which can be empty or a csv list however that list can be
"test04, test05, test06" -> space after the comma
"test04,test05,test06" -> no spaces
"test04,test05, test06" -> mixture
what i need is my output "base" to be a space delimited list however no matter what i try my list keeps getting cut off after the first word.
My expected output is
"test00 test01 test02 test03 test04 test05 test06"
If your goal is to generate a command, this technique is wrong altogether: As described in BashFAQ #50, command arguments should be stored in an array, not a whitespace-delimited string.
base=( test00 test01 test02 test03 )
IFS=', ' read -r -a options_array <<<"$options"
# ...and, to execute the result:
"${base[@]}" "${options_array[@]}"
That said, even this isn't adequate for many legitimate use cases: Consider what happens if you want to pass an option that contains literal whitespace -- for instance, running ./your-base-command "base argument with spaces" "second base argument" "option with spaces" "second option with spaces". For that, you need something like the following:
base=( ./your-base-command "base argument with spaces" "second base argument" )
options="option with spaces, second option with spaces"
# read options into an array, splitting on commas
IFS=, read -r -a options_array <<<"$options"
# trim leading and trailing spaces from array elements
options_array=( "${options_array[@]% }" )
options_array=( "${options_array[@]# }" )
# ...and, to execute the result:
"${base[@]}" "${options_array[@]}"
No need for sed; bash has built-in pattern substitution via parameter expansion. With bash 3.0 or later, the extglob option adds extended pattern-matching operators such as *(pattern) and +(pattern) (these are glob patterns, not regular expressions).
# Enable extended pattern matching, e.g. *(pattern) and +(pattern)
shopt -s extglob
# Replace each comma plus any spaces that follow it (zero or more) with a single space
options="${options//,*( )/ }"
If you don't have bash 3.0+ available or don't like enabling extglob, simply strip all spaces first, which works as long as no individual option contains a space:
# Remove all spaces
options="${options// /}"
# Then replace commas with spaces
options="${options//,/ }"
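As a quick check of the extglob variant against the three input shapes from the question (plus empty input), here is a sketch assuming bash. It uses *( ) so that commas with no following space are replaced as well; csv_to_words is a hypothetical helper name:

```shell
#!/bin/bash
shopt -s extglob   # needed for the *( ) operator below

base="test00 test01 test02 test03"

# csv_to_words CSV: replace each comma plus any following spaces with one space.
csv_to_words() {
    printf '%s\n' "${1//,*( )/ }"
}

for options in "test04, test05, test06" "test04,test05,test06" "test04,test05, test06" ""; do
    words=$(csv_to_words "$options")
    # Append the options only when there are any.
    echo "${base}${words:+ $words}"
done
```

The first three iterations should print the same space-delimited list, and the last one prints base unchanged.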

How to pass quoted arguments but with blank spaces in linux

I have a file with these arguments and their values this way
# parameters.txt
VAR1 001
VAR2 aaa
VAR3 'Hello World'
and another file to configure like this
# example.conf
VAR1 = 020
VAR2 = kab
VAR3 = ''
when I want to get the values in a function I use this command
while read p; do
VALUE=$(echo $p | awk '{print $2}')
done < parameters.txt
The first two arguments yield the right values, but the last one only gets 'Hello because of the blank space. My question is: how do I get the entire 'Hello World' value?
If you can use bash, there is no need to use awk: read and shell parameter expansion can be combined to solve your problem:
while read -r name rest; do
# Drop the '= ' part, if present.
[[ $rest == '= '* ]] && value=${rest:2} || value=$rest
# $value now contains the line's value,
# but *including* any enclosing ' chars, if any.
# Assuming that there are no *embedded* ' chars., you can remove them
# as follows:
value=${value//\'/}
done < parameters.txt
read by default also breaks a line into fields by whitespace, like awk, but unlike awk it has the ability to assign the remainder of the line to a variable, namely the last one, if fewer variables than fields found are specified;
read's -r option is generally worth specifying to avoid unexpected interpretation of \ chars. in the input.
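A tiny illustration of that remainder behavior, assuming bash (for the here-string); the input line is taken from parameters.txt in the question:

```shell
#!/bin/bash
# "rest" receives everything after the first field, inner whitespace intact.
read -r name rest <<< "VAR3 'Hello World'"
echo "name=[$name] rest=[$rest]"   # prints: name=[VAR3] rest=['Hello World']
```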
As for your solution attempt:
awk doesn't know about quoting in input - by default it breaks input into fields by whitespace, irrespective of quotation marks.
Thus, a string such as 'Hello World' is simply broken into fields 'Hello and World'.
However, in your case you can split each input line into its key and value using a carefully crafted FS value (FS is the input field separator, which can also be set via option -F; the command again assumes bash, this time for use of <(...), a so-called process substitution, and $'...', an ANSI C-quoted string):
while IFS= read -r value; do
# Work with $value...
done < <(awk -F$'^[[:alnum:]]+ (= )?\'?|\'' '{ print $2 }' parameters.txt)
Again the assumption is that values contain no embedded ' instances.
Field separator regex $'^[[:alnum:]]+ (= )?\'?|\'' splits each line so that $2, the 2nd field, contains the value, stripped of enclosing ' chars., if any.
xargs is the rare exception among the standard utilities in that it does understand single- and double-quoted strings (yet also without support for embedded quotes).
Thus, you could take advantage of xargs' ability to implicitly strip enclosing quotes when it passes arguments to the specified command, which defaults to echo (again assumes bash):
while read -r name rest; do
# Drop the '= ' part, if present.
[[ $rest == '= '* ]] && value=${rest:2} || value=$rest
# $value now contains the line's value, stripped of any enclosing
# single quotes by `xargs`.
done < <(xargs -L1 < parameters.txt)
xargs -L1 processes one (1) line (-L) at a time and implicitly invokes echo with all tokens found on each line, with any enclosing quotes removed from the individual tokens.
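To see the quote stripping on its own, here is a short sketch assuming bash; the sample variable is a stand-in for two lines of parameters.txt:

```shell
#!/bin/bash
sample=$(printf '%s\n' "VAR1 001" "VAR3 'Hello World'")

# xargs runs echo once per input line and removes the enclosing quotes.
stripped=$(xargs -L1 <<< "$sample")
printf '%s\n' "$stripped"
```

The second output line should read VAR3 Hello World, without the single quotes.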
By default awk splits its input into fields on whitespace, so $2 holds only the first word of a value that contains a space.
You can specify the field separator on the command line with -F[field separator]
Example, setting the field separator to a comma:
$ echo "Hello World" | awk -F, '{print $1}'
Hello World

Shell Extract Text Before Digits in a String

I've found several examples of extractions before a single character and examples of extracting numbers, but I haven't found anything about extracting characters before numbers.
My question:
Some of the strings I have look like this:
NUC320 Syllabus Template - 8wk
SLA School Template - UL
CJ101 Syllabus Template - 8wk
TECH201 Syllabus Template - 8wk
Test Clone ID17
In cases where the string doesn't contain the data I want, I need it to be skipped. The desired output would be:
NUC-320
CJ-101
TECH-201
SLA School Template - UL & Test Clone ID17 would be skipped.
I imagine the process being something to the effect of:
Extract text before " "
Condition - Check for digits in the string
Extract text before digits and assign it to a variable x
Extract digits and assign to a variable y
Concatenate $x"-"$y and assign to another variable z
More information:
The strings are extracted from a line in a couple thousand text docs using a loop. They will be used to append to a hyperlink and rename a file during the loop.
Edit:
#!/bin/sh
# my files are named 1.txt through 9999.txt i both
# increments the loop and sets the filename to be searched
i=1
while [ $i -lt 10000 ]
do
x=$(head -n 31 $i.txt | tail -1 | cut -c 7-)
if [ ! -z "$x" -a "$x" != " " ]; then
# I'd like to insert the hyperlink with the output on the
# same line (1.txt;cj101 Syllabus Template - 8wk;www.link.com/cj101)
echo "$i.txt;$x" >> syllabus.txt
# else
# rm $i.txt
fi
i=`expr $i + 1`
sleep .1
done
sed for printing lines starting with capital letters followed by digits. It also adds a - between them:
sed -n 's/^\([A-Z]\+\)\([0-9]\+\) .*/\1-\2/p' input
Gives:
NUC-320
CJ-101
TECH-201
A POSIX-compliant awk solution:
awk '{ if (match($1, /[0-9]+$/)) print substr($1, 1, RSTART-1) "-" substr($1, RSTART) }' \
file |
while IFS= read -r token; do
# Process token here (append to hyperlink, ...)
echo "[$token]"
done
awk is used to extract the reformatted tokens of interest, which are then processed in a shell while loop.
match($1, /[0-9]+$/) matches the 1st whitespace-separated field ($1) against extended regex [0-9]+$, i.e., matches only if the field ends in one or more digits.
substr($1, 1, RSTART-1) "-" substr($1, RSTART) joins the part before the first digit with the run of digits using -, via the special RSTART variable, which indicates the 1-based character position where the most recent match() invocation matched.
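The same match/RSTART logic can be tried on single lines by wrapping it in a shell function. This is only a sketch; split_code is a hypothetical name:

```shell
#!/bin/sh
# split_code LINE: print "PREFIX-DIGITS" when the first word ends in digits.
split_code() {
    printf '%s\n' "$1" | awk '{
        if (match($1, /[0-9]+$/))
            print substr($1, 1, RSTART - 1) "-" substr($1, RSTART)
    }'
}

split_code "TECH201 Syllabus Template - 8wk"   # RSTART is 5, prints: TECH-201
split_code "SLA School Template - UL"          # no digits, prints nothing
```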
Another awk option, assuming the first word always ends in exactly three digits:
awk '$1 ~/[0-9]/{sub(/...$/,"-&",$1);print $1}' file
NUC-320
CJ-101
TECH-201
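For use inside the question's loop, the steps listed there (take the first word, check for digits, split, join with "-") can also be done without an external tool. A sketch assuming bash rather than sh, since it relies on [[ =~ ]] and BASH_REMATCH; course_code is a hypothetical name:

```shell
#!/bin/bash
# course_code LINE: print e.g. "CJ-101" if the first word is letters then digits.
course_code() {
    local first=${1%% *}                     # text before the first space
    if [[ $first =~ ^([A-Za-z]+)([0-9]+)$ ]]; then
        printf '%s-%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        return 1                             # caller can skip this line
    fi
}

for x in "NUC320 Syllabus Template - 8wk" "SLA School Template - UL" "Test Clone ID17"; do
    if z=$(course_code "$x"); then
        echo "$z"
    fi
done
```

The non-zero return status gives the loop a simple way to skip lines without a code.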

Overwriting /boot/grub/menu.lst?

Before overwriting I have copied /boot/grub/menu.lst to /home/san. I am running sed on the /home/san/menu.lst just for testing.
How can i overwrite
default 0
to
default 1
with the help of sed. I used the following commands but none worked. It's most probably because I don't know how many spaces there are between "default" and "0". I thought there were two spaces and one tab but I was wrong.
sed -e 's/default \t0/default \t1/' /home/san/menu.lst
sed -e 's/default\t0/default\t1/' /home/san/menu.lst
sed -e 's/default \t0/default \t1/' /home/san/menu.lst
sed -e 's/default 0/default 1/' /home/san/menu.lst
I actually want to write a script that may see if 'default 0' is written in menu.lst then replace it with 'default 1' and if 'default 1' is written then replace it with 'default 2'.
Thanks
Update:
How can I use a conditional statement to see if the line starting with 'default' has "0" or "1" written after it? If "0" replace with "1" and if "1" replace it with "0"?
This works for me (with another file, of course ;)
sed -re 's/^default([ \t]+)0$/default\11/' /home/san/menu.lst
How it works:
Passing -r to sed allows us to use extended regular expressions, thus no need to escape the parentheses and plus sign.
[ \t]+ matches one or more tabs or spaces, in any order.
Putting parentheses around this, that is ([ \t]+), turns this into a group, to which we can refer.
This is the only group in this case, thus \1. That is what happens in the replacement string.
We don't want to replace default 0 as part of a larger string. Thus we use ^ and $, which match the start and end of the line, respectively.
Note that:
Using a group is not strictly necessary. You can also opt to replace all those tabs and spaces by a single tab or space. In that case just omit the parentheses and replace \1 with a space:
sed -re 's/^default[ \t]+0$/default 1/' /home/san/menu.lst
This will fail if there is trailing whitespace after default 0. If that is a concern, then you can match [ \t]* (zero or more tabs and spaces) just before the EOL:
sed -re 's/^default([ \t]+)0[ \t]*$/default\11/' /home/san/menu.lst
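The last variant can be tested without touching any file by feeding it sample lines on stdin. A sketch with two substitutions of my own: -E in place of -r, and [[:blank:]] in place of [ \t]; bump_default is a hypothetical name:

```shell
#!/bin/bash
# bump_default: rewrite "default<blanks>0" to "default<blanks>1" on stdin,
# keeping the original blanks and tolerating trailing blanks.
bump_default() {
    sed -E 's/^default([[:blank:]]+)0[[:blank:]]*$/default\11/'
}

printf 'default \t0\ntimeout 10\ndefault 10\n' | bump_default
```

Only the first line should change; "default 10" is left alone because the pattern requires the 0 to come directly after the blanks.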
Firstly, you could use several lines like this:
sed -re 's/default([ \t]*)0/default\11/' /home/san/menu.lst
However, you might be better off with one line like this:
awk '/^default/ { if ($2 == 1) print "default 0" ; else print "default 1" } !/^default/ {print}' /boot/grub/menu.lst
This substitutes 1 for 0, 0 for 1:
sed -r '/^default[ \t]+[01]$/{s/0/1/;t;s/1/0/}'
This substitutes 1 for 0, 2 for 1, 0 for 2:
sed -r '/^default[ \t]+[012]$/{s/0/1/;t;s/1/2/;t;s/2/0/}'
Briefly:
/^default[ \t]+[01]$/ uses addressing to select lines that consist only of "default 0" or "default 1" (or "default 2" in the second form) with any non-zero amount of spaces and/or tabs between "default" and the number
s/0/1/ - substitute a "1" for a "0"
t - branch to the end if a substitution was made, if not then continue with the next command
followed by more substitution(s) (and branches)
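A sketch of the 0/1 toggle that can be run on sample input. It splits the commands into separate -e arguments and anchors each substitution with $, both of which are my own choices rather than part of the answer; toggle_default is a hypothetical name:

```shell
#!/bin/bash
# toggle_default: swap 0 and 1 on a "default N" line read from stdin.
toggle_default() {
    sed -E -e '/^default[[:blank:]]+[01]$/{' \
           -e 's/0$/1/' \
           -e 't' \
           -e 's/1$/0/' \
           -e '}'
}

printf 'default 0\n' | toggle_default   # prints: default 1
printf 'default 1\n' | toggle_default   # prints: default 0
```

Without the t, a line containing 0 would be changed to 1 by the first substitution and then straight back to 0 by the second.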
Edit:
Here's another way to do it:
sed -r '/^default[ \t]+[012]$/y/012/120/'
