bash 4: Generic access to substring (n) of string by arbitrary delimiter?

Let's assume I have the following string: x="number 1;number 2;number 3".
Access to the first substring is successful via ${x%%";"*}, and access to the last substring via ${x##*";"}:
$ x="number 1;number 2;number 3"
$ echo "front : ${x%%";"*}" #front-most-part
front : number 1
$ echo "back : ${x##*";"}" #back-most-part
back : number 3
$
How do I access the middle part (e.g. number 2)?
Is there a better way to do this if I have many more parts than just three?
In other words: is there a generic way of accessing substring no. n of string yyy, delimited by string xxx, where xxx is an arbitrary string/delimiter?
I have read How do I split a string on a delimiter in Bash?, but I specifically do not want to iterate over the string; I want to access a given substring directly.
This question specifically does not ask for a split into arrays, but into substrings.

With a fixed index:
x="number 1;number 2;number 3"
# Split input into fields by ';' and read the 2nd field into $f2
# Note the need for the *2nd* `unused`, otherwise f2 would
# receive the 2nd field *plus the remainder of the line*.
IFS=';' read -r unused f2 unused <<<"$x"
echo "$f2"
Generically, using an array:
x="number 1;number 2;number 3"
# Split input into fields by ';' and read all resulting fields
# into an *array* (-a).
IFS=';' read -r -a fields <<<"$x"
# Access the desired field.
ndx=1
echo "${fields[ndx]}"
Constraints:
Using IFS, the special variable specifying the Internal Field Separator characters, invariably means:
Only single, literal characters can act as field separators.
However, you can specify multiple characters, in which case any of them is treated as a separator.
The default separator characters are $' \t\n' - i.e., space, tab, and newline, and runs of them (multiple contiguous instances) are always considered a single separator; e.g., 'a   b' has 2 fields - the multiple spaces count as a single separator.
By contrast, with any other character, characters in a run are considered separately, and thus separate empty fields; e.g., 'a;;b' has 3 fields - each ; is its own separator, so there's an empty field between ;;.
The read -r -a ... <<<... technique generally works well, as long as:
the input is single-line
you're not concerned about a trailing empty field getting discarded
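The separator rules above can be checked with a few short commands. This is only a minimal sketch for bash; the array names ws, sc and tr are made up for illustration.

```shell
# Requires bash (arrays, here-strings).

# Whitespace separator: a run of spaces counts as ONE separator.
IFS=' ' read -r -a ws <<<'a   b'
echo "${#ws[@]}"   # 2

# Non-whitespace separator: every ';' separates, so ';;' encloses an empty field.
IFS=';' read -r -a sc <<<'a;;b'
echo "${#sc[@]}"   # 3

# A single trailing separator does not produce a trailing empty field.
IFS=';' read -r -a tr <<<'a;b;'
echo "${#tr[@]}"   # 2
```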
If you need a fully generic, robust solution that addresses the issues above,
use the following variation, which is explained in @gniourf_gniourf's answer here:
sep=';'
IFS="$sep" read -r -d '' -a fields < <(printf "%s${sep}\0" "$x")
Note the need to use -d '' to read multi-line input all at once, and the need to terminate the input with another separator instance to preserve a trailing empty field; the trailing \0 is needed to ensure that read's exit code is 0.
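Since IFS only supports single-character separators, a truly arbitrary (multi-character) delimiter string needs a different approach. The following is a sketch using only parameter expansion, not part of the answer above; the function name nth_field and its argument order are hypothetical, and it assumes a non-empty delimiter.

```shell
# Requires bash.
# nth_field STRING SEP N: print the N-th (1-based) field of STRING,
# where SEP is a literal delimiter that may be several characters long.
# Returns 1 if STRING has fewer than N fields.
nth_field() {
  local rest=$1 sep=$2 n=$3 i
  for (( i = 1; i < n; i++ )); do
    # No delimiter left means there is no further field.
    [[ $rest == *"$sep"* ]] || return 1
    # Drop the shortest prefix that ends in SEP, i.e. the current first field.
    rest=${rest#*"$sep"}
  done
  # Keep only what precedes the next SEP (or everything, if none is left).
  printf '%s\n' "${rest%%"$sep"*}"
}

x="number 1;number 2;number 3"
nth_field "$x" ';' 2          # number 2
nth_field 'a::b::c' '::' 3    # c
```

Quoting "$sep" inside the expansion makes the shell treat it literally, so glob characters in the delimiter are not interpreted as a pattern.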

Don't use:
Create an array with a delimiter of ;:
x="number 1;number 2;number 3"
_IFS=$IFS; IFS=';'
arr=($x)
IFS=$_IFS
echo ${arr[0]} # number 1
echo ${arr[1]} # number 2
echo ${arr[2]} # number 3

Related

How do you interpret ${VAR#*:*:*} in Bourne Shell

I am using Bourne Shell. Need to confirm if my understanding of following is correct?
$ echo $SHELL
/bin/bash
$ VAR="NJ:NY:PA" <-- declare an array with semicolon as separator?
$ echo ${VAR#*} <-- show entire array without separator?
NJ:NY:PA
$ echo ${VAR#*:*} <-- show array after first separator?
NY:PA
$ echo ${VAR#*:*:*} <-- show string after two separator
PA

${var#pattern} is a parameter expansion that expands to the value of $var with the shortest possible match for pattern removed from the front of the string.
Thus, ${VAR#*:} removes everything up to and including the first :; ${VAR#*:*:} removes everything up to and including the second :.
The trailing *s on the end of the expansions given in the question don't have any use, and should be avoided: There's no reason whatsoever to use ${var#*:*:*} instead of ${var#*:*:} -- since these match the smallest amount of text possible, and * is allowed to expand to 0 characters, the final * matches and removes nothing.
If what you really want is an array, you might consider using a real array instead.
# read contents of string VAR into an array of states
IFS=: read -r -a states <<<"$VAR"
echo "${states[0]}" # will echo NJ
echo "${states[1]}" # will echo NY
echo "${#states[@]}" # count states; will emit 3
...which also gives you the ability to write:
printf ' - %s\n' "${states[@]}" # put *all* state names into an argument list
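To make the prefix-removal rules concrete, here is a short sketch that compares the expansions discussed above. The variable names after_first, after_second, last and with_star are illustrative only.

```shell
# Requires bash or any POSIX shell.
VAR="NJ:NY:PA"

after_first=${VAR#*:}      # shortest match up to the first ':' removed
after_second=${VAR#*:*:}   # shortest match up to the second ':' removed
last=${VAR##*:}            # LONGEST match removed: everything up to the last ':'
with_star=${VAR#*:*:*}     # trailing * matches nothing, so same as after_second

echo "$after_first"    # NY:PA
echo "$after_second"   # PA
echo "$last"           # PA
echo "$with_star"      # PA
```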

How to use sed to replace a comma followed by 0 or more spaces in bash

I can't figure out how to replace a comma followed by 0 or more spaces in a bash variable. here's what i have:
base="test00 test01 test02 test03"
options="test04,test05, test06"
for b in $(echo $options | sed "s/, \+/ /g")
do
base="${base} $b"
done
What i'm trying to do is append the "options" to the "base". Options is user input which can be empty or a csv list however that list can be
"test04, test05, test06" -> space after the comma
"test04,test05,test06" -> no spaces
"test04,test05, test06" -> mixture
what i need is my output "base" to be a space delimited list however no matter what i try my list keeps getting cut off after the first word.
My expected out is
"test00 test01 test02 test03 test04 test05 test06"

If your goal is to generate a command, this technique is wrong altogether: As described in BashFAQ #50, command arguments should be stored in an array, not a whitespace-delimited string.
base=( test00 test01 test02 test03 )
IFS=', ' read -r -a options_array <<<"$options"
# ...and, to execute the result:
"${base[@]}" "${options_array[@]}"
That said, even this isn't adequate to many legitimate use cases: Consider what happens if you want to pass an option that contains literal whitespace -- for instance, running ./your-base-command "base argument with spaces" "second base argument" "option with spaces" "second option with spaces". For that, you need something like the following:
base=( ./your-base-command "base argument with spaces" "second base argument" )
options="option with spaces, second option with spaces"
# read options into an array, splitting on commas
IFS=, read -r -a options_array <<<"$options"
# trim leading and trailing spaces from array elements
options_array=( "${options_array[@]% }" )
options_array=( "${options_array[@]# }" )
# ...and, to execute the result:
"${base[@]}" "${options_array[@]}"

No need for sed; bash has built-in pattern substitution in parameter expansion. With bash 3.0 or later, the extglob option adds support for extended pattern-matching operators (these are glob patterns, not regular expressions).
# Enables extended glob patterns such as *(pattern)
shopt -s extglob
# Replaces each comma, plus any spaces that follow it, with a single space
options="${options//,*( )/ }"
If you don't have bash 3.0+ available or don't like enabling extglob, simply strip all spaces which will work most of the time:
# Remove all spaces
options="${options// /}"
# Then replace commas with spaces
options="${options//,/ }"
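Putting this together for the question's actual goal (appending the options to base as a space-delimited string), a sketch could look like the following. It assumes bash with extglob available; the variable name normalized is made up for illustration.

```shell
# Requires bash.
shopt -s extglob

base="test00 test01 test02 test03"
options="test04,test05, test06"

# Turn each comma, together with any spaces after it, into one space.
normalized=${options//,*( )/ }

# Append only if there is something to append, so that an empty
# $options does not leave a trailing space in $base.
base="${base}${normalized:+ $normalized}"

echo "$base"   # test00 test01 test02 test03 test04 test05 test06
```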

Concatenating remaining arguments beyond the first N in bash

I did not have to write any bash script before. Here is what I need to do.
My script will be run with a set of string arguments. The number of strings will be more than 8. I will have to concatenate strings 9 and onward and make a single string from those. Like this...
myscript s1 s2 s3 s4 s5 s6 s7 s8 s9 s10....(total unknown)
in the script, I need to do this...
new string = s9 + s10 + ...
I am trying something like this...(from web search).
array="${@}"
tLen=${#array[@]}
# use for loop to read string beyond 9
for (( i=8; i<${tLen}; i++ ));
do
echo ${array[$i]} --> just to show string beyond 9
done
Not working. It prints out if i=0. Here is my input.
./tastest 1 2 3 4 5 6 7 8 A B C
I am expecting A B C to be printed. Finally I will have to make ABC.
Can anyone help?

It should be a lot simpler than the looping in the question:
shift 8
echo "$*"
Lose arguments 1-8; print all the other arguments as a single string with a single space separating arguments (and spaces within arguments preserved).
Or, if you need it in a variable, then:
nine_onwards="$*"
Or if you can't throw away the first 8 arguments in the main shell process:
nine_onwards="$(shift 8; echo "$*")"
You can check that there are at least 9 arguments, of course, complaining if there aren't. Or you can accept an empty string instead — with no error.
And if the arguments must be concatenated with no space (as in the amendment to the question), then you have to juggle with $IFS:
nine_onwards="$(shift 8; IFS=""; echo "$*")"
If I'm interpreting the comments from below this answer correctly, then you want to save the first 8 arguments in 8 separate simple (non-array) variables, and then arguments 9 onwards in another simple variable with no spaces between the argument values.
That's trivially doable:
var1="$1"
var2="$2"
var3="$3"
var4="$4"
var5="$5"
var6="$6"
var7="$7"
var8="$8"
var9="$(shift 8; IFS=""; echo "$*")"
The names don't have to be as closely related as those are. You could use:
teflon="$1"
absinthe="$2"
astronomy="$3"
lobster="$4"
darkest_peru="$5"
mp="$6"
culinary="$7"
dogma="$8"
concatenation="$(shift 8; IFS=""; echo "$*")"
You don't have to do them in that order, either; any sequence (permutation) will do nicely.
Note, too, that in the question, you have:
array="${@}"
Despite the name, that creates a simple variable containing the arguments. To create an array, you must use parentheses like this, where the spaces are optional:
array=( "$@" )
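The argument-count check mentioned above can be combined with the shift and IFS technique in one function. This is a sketch only; the function name join_from_ninth and the wording of the error message are hypothetical.

```shell
# Requires bash (for 'local' and '(( ))').
# join_from_ninth ARGS...: print arguments 9 onwards concatenated
# with no separator; fail if fewer than 9 arguments are given.
join_from_ninth() {
  if (( $# < 9 )); then
    echo "need at least 9 arguments, got $#" >&2
    return 1
  fi
  shift 8          # discard arguments 1-8 (only inside this function)
  local IFS=''     # "$*" joins with the first char of IFS; empty means no separator
  printf '%s\n' "$*"
}

join_from_ninth 1 2 3 4 5 6 7 8 A B C   # ABC
```

Because both shift and the IFS change are confined to the function, the caller's positional parameters and IFS are left untouched without needing a command substitution subshell.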

# Create a 0-index-based copy of the array of input arguments.
# (You could, however, work with the 1-based pseudo array $@ directly.)
array=( "${@}" )
# Print a concatenation of all input arguments starting with the 9th
# (starting at 0-based index 8), which are passed *individually* to
# `printf`, due to use of `@` to reference the array [slice]
# `%s` as the `printf` format then joins the elements with no separator
# (and no trailing \n).
printf '%s' "${array[@]:8}"
# Alternative: Print the elements separated with a space:
# Note that using `*` instead of `@` causes the array [slice] to be expanded
# to a *single* string using the first char. in `$IFS` as the separator,
# which is a space by default; here you could add a trailing \n by using
# '%s\n' as the `printf` format string.
printf '%s' "${array[*]:8}"
Note that array="${@}" does not create an array - it simply creates a string scalar comprising the concatenation of the input array's elements (invariably) separated by a space each; to create an array, you must enclose it in (...).
To create a space-separated single string from the arguments starting with the 9th enclosed in double quotes, as you request in your follow-up question, use the following:
printf -v var10 '"%s"' "${array[*]:8}"
With the last sample call from your question $var10 will then contain literal "A B C", including the double quotes.
As for assigning arguments 1 through 8 to individual variables:
Jonathan Leffler's helpful answer shows how to save the first 8 arguments in individual variables.
Here's an algorithmic alternative that creates individual variables based on a given name prefix and sequence number:
n=8 # how many arguments to assign to individual variables
# Create n 'var<i>' variables capturing the first n arguments.
i=0 # variable sequence number
for val in "${array[@]:0:n}"; do
declare "var$((++i))=$val" # create $var<i>, starting with index 1
done
# Print the variables created and their values, using variable indirection.
printf "\nvar<i> variables:\n"
for varName in "${!var@}"; do
printf '%s\n' "$varName=${!varName}"
done
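As the comment near the top of this answer hints, the positional parameters can also be sliced directly, without copying them into an array first. A small sketch, with the function name show_tail chosen only for illustration:

```shell
# Requires bash.
# show_tail ARGS...: print arguments 9 onwards, first joined with no
# separator, then joined with spaces.
show_tail() {
  # "${@:9}" expands to one word per argument, starting at the 9th
  # (positional parameters are 1-based); '%s' glues them together.
  printf '%s' "${@:9}"
  printf '\n'
  # "${*:9}" expands to a single word, joined by the first char of $IFS.
  printf '%s\n' "${*:9}"
}

show_tail 1 2 3 4 5 6 7 8 A B C
# ABC
# A B C
```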

You are close - something like this would work:
array=( ${*} )
# use for loop to read string beyond 9
for (( i=8; i<${#array[*]}; i++ ));
do
echo -n ${array[$i]}
done

How to pass quoted arguments but with blank spaces in linux

I have a file with these arguments and their values this way
# parameters.txt
VAR1 001
VAR2 aaa
VAR3 'Hello World'
and another file to configure like this
# example.conf
VAR1 = 020
VAR2 = kab
VAR3 = ''
when I want to get the values in a function I use this command
while read p; do
VALUE=$(echo $p | awk '{print $2}')
done < parameters.txt
The first arguments yield the right values, but the last one only yields 'Hello because of the blank space. How do I get the entire 'Hello World' value?

If you can use bash, there is no need to use awk: read and shell parameter expansion can be combined to solve your problem:
while read -r name rest; do
# Drop the '= ' part, if present.
[[ $rest == '= '* ]] && value=${rest:2} || value=$rest
# $value now contains the line's value,
# but *including* any enclosing ' chars, if any.
# Assuming that there are no *embedded* ' chars., you can remove them
# as follows:
value=${value//\'/}
done < parameters.txt
read by default also breaks a line into fields by whitespace, like awk, but unlike awk it has the ability to assign the remainder of the line to a variable, namely the last one, if fewer variables than fields found are specified;
read's -r option is generally worth specifying to avoid unexpected interpretation of \ chars. in the input.
As for your solution attempt:
awk doesn't know about quoting in input - by default it breaks input into fields by whitespace, irrespective of quotation marks.
Thus, a string such as 'Hello World' is simply broken into fields 'Hello and World'.
However, in your case you can split each input line into its key and value using a carefully crafted FS value (FS is the input field separator, which can also be set via option -F; the command again assumes bash, this time for use of <(...), a so-called process substitution, and $'...', an ANSI C-quoted string):
while IFS= read -r value; do
# Work with $value...
done < <(awk -F$'^[[:alnum:]]+ (= )?\'?|\'' '{ print $2 }' parameters.txt)
Again the assumption is that values contain no embedded ' instances.
Field separator regex $'^[[:alnum:]]+ (= )?\'?|\'' splits each line so that $2, the 2nd field, contains the value, stripped of enclosing ' chars., if any.
xargs is the rare exception among the standard utilities in that it does understand single- and double-quoted strings (yet also without support for embedded quotes).
Thus, you could take advantage of xargs' ability to implicitly strip enclosing quotes when it passes arguments to the specified command, which defaults to echo (again assumes bash):
while read -r name rest; do
# Drop the '= ' part, if present.
[[ $rest == '= '* ]] && value=${rest:2} || value=$rest
# $value now contains the line's value, stripped of any enclosing
# single quotes by `xargs`.
done < <(xargs -L1 < parameters.txt)
xargs -L1 processes one (1) line (-L) at a time and implicitly invokes echo with all tokens found on each line, with any enclosing quotes removed from the individual tokens.
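The read-based approach can also be wrapped in a function that strips only one enclosing pair of single quotes, instead of every ' in the value. This is a sketch under that assumption; parse_value is a hypothetical name and is not part of the answer above.

```shell
# Requires bash.
# parse_value LINE: print the value part of a "NAME value" or
# "NAME = value" line, without one enclosing pair of single quotes.
parse_value() {
  local name rest
  # name gets the first field; rest gets the remainder of the line.
  read -r name rest <<<"$1"
  # Drop the '= ' part, if present.
  if [[ $rest == '= '* ]]; then
    rest=${rest:2}
  fi
  # Strip the enclosing quotes only if the value both starts and ends with '.
  if [[ $rest == \'*\' ]]; then
    rest=${rest:1:${#rest}-2}
  fi
  printf '%s\n' "$rest"
}

parse_value "VAR3 'Hello World'"   # Hello World
parse_value "VAR1 = 020"           # 020
```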

The default field separator in awk is the space. So you are only printing the first word in the string passed to awk.
You can specify the field separator on the command line with -F[field separator]
Example, setting the field separator to a comma:
$ echo "Hello World" | awk -F, '{print $1}'
Hello World

How to split words in bash

Good evening, People
Currently I have an array called inputArray which stores the 7 lines of an input file, line by line. I have a word, 70000($s0); how do I split it so that 70000 and ($s0) are separate?
I looked at an answer that is already on this website, but I couldn't understand it. The answer I looked at was:
s='1000($s3)'
IFS='()' read a b <<< "$s"
echo -e "a=<$a>\nb=<$b>"
giving the output a=<1000> b=<$s3>

Let me give this a shot.
In certain circumstances, the shell will perform "word splitting", where a string of text is broken up into words. The word boundaries are defined by the IFS variable. The default value of IFS is: space, tab, newline. When a string is to be split into words, any sequence of this set of characters is removed to extract the words.
In your example, the set of characters that delimit words are ( and ). So the words in that string that are bounded by the IFS set of characters are 1000 and $s3
What is <<< "$s"? This is a here-string. It's used to send a string to some command's standard input. It's like doing
echo "$s" | read a b
except that form doesn't work as expected in bash. read a b <<< "$s" works well.
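The reason the pipeline form fails is that bash runs each part of a pipeline in a subshell by default, so the variables set by read vanish when the pipeline ends. The following sketch shows the difference; it assumes the lastpipe shell option is off (the default), and the variable names piped and direct are made up for illustration.

```shell
# Requires bash.
s='1000($s3)'

# Pipeline: read runs in a subshell, so a and b are not set afterwards.
unset a b
echo "$s" | IFS='()' read -r a b
piped="a=<${a-}> b=<${b-}>"

# Here-string: read runs in the current shell, so a and b persist.
IFS='()' read -r a b <<<"$s"
direct="a=<$a> b=<$b>"

echo "$piped"    # a=<> b=<>
echo "$direct"   # a=<1000> b=<$s3>
```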
Now, what are the circumstances where word splitting occurs? One is when a variable is unquoted. A demo:
IFS='()'
echo "$s" | wc # 1 line, 1 word and 10 characters
echo $s | wc # 1 line, 2 words and 9 characters
The read command also splits a string into words, in order to assign words to the named variables. The variable a gets the first word, and b gets all the rest.
The command, broken down, is:
IFS='()' read a b <<< "$s"
#^^^^^^^ 1
#                 ^^^^^^^^ 2
#        ^^^^^^^^ 3
1. only for the duration of the read command, assign the variable IFS the value ()
2. send the string "$s" to read's stdin
3. from stdin, use $IFS to split the input into words: assign the first word to variable a and the rest of the string to variable b. Trailing characters from $IFS at the end of the string are discarded.
Documentation:
Word splitting
Here strings
Simple command execution, describing why this assignment of IFS is only in effect for the duration of the read command.
read command
Hope that helps.
