Find a word that only occurs once in a string

How do I go about finding the one word that is not repeated in a string in bash? I'd like to know if there is a "native" bash way of doing this, or if I need to use another command line utility (like awk, sed, grep, ...).
For instance, var1="thrice once twice twice thrice";. I need something that will pick out the word 'once' since it only occurs once (i.e., no duplicates).

You could use sort, uniq after splitting the string by whitespace:
tr ' ' '\n' <<< "$var1" | sort | uniq -u
This would produce once for your input.
(If the input contains punctuation, you might want to remove it before anything else in order to avoid unexpected results.)
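For example, one minimal sketch that strips punctuation first (assuming punctuation should simply be dropped rather than treated as part of a word):
tr -d '[:punct:]' <<< "$var1" | tr ' ' '\n' | sort | uniq -u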

@devnull's answer is the better choice (both for simplicity and probably performance), but if you're looking for a bash-only solution:
Caveats:
Uses associative arrays, which are only available in bash 4 or higher.
Using a literal * in the input word list won't work (other glob-like strings are OK, however).
Deals correctly with multi-line input and input with multiple whitespace characters between words.
# Define the input word list.
# Bonus: multi-line input with multiple inter-word spaces.
var1=$'thrice once twice twice thrice\ntwice again'
# Declare associative array.
declare -A wordCounts
# Read all words and count the occurrence of each.
while read -r w; do
[[ -n $w ]] && (( wordCounts[$w]+=1 ))
done <<<"${var1// /$'\n'}" # split input list into lines for easy parsing
# Output result.
# Note that the output list will NOT automatically be sorted, because the keys of an
# associative array are not 'naturally sorted'; hence piping to `sort`.
echo "Words that only occur once in '$var1':"
echo "---"
for w in "${!wordCounts[#]}"; do
(( wordCounts[$w] == 1 )) && echo "$w"
done | sort
# Expected output:
# again
# once

Just for fun, awk:
awk '{
for (i=1; i<=NF; i++) c[$i]++
for (word in c) if (c[word]==1) print word
}' <<< "$var1"
once

Related

Bash - replace in file strings from first array with strings from second array

I have two arrays. The first one is filled with values grepped from the file, which I want to replace with the newly downloaded ones.
Please note that I don't know exactly what the first array will look like: some values contain _, others -, and some have neither, with a : (colon) placed immediately after the name.
Example arrays:
array1:
[account:123 shoppingcart-1:123 notification-core_1:123 notification-dispatcher_core_1:123 notification-dispatcher-smschannel_core_1:123]
array2:
[account_custom_2:124 shoppingcart_custom_2:124 notification_custom_2:124 notification-dispatcher_custom_2:124 notification-dispatcher-smschannel_custom_2:124]
Those arrays are the only example, there are more than 50 values to replace.
I am doing a comparison of every item in the second array with every item in the first array as shown below:
file_name="<path_to_file>/file.txt"
for i in "${!array1[#]}"
do
for j in "${!array2[#]}"
do
array_2_base="`echo ${array2[$j]} | awk -F_ '{print $1}'`"
if [[ "${array1[$i]}" == *"$array_2_base"* ]]
then
sed -i "s=${array1[$i]}=${array2[$j]}=g" $file_name
fi
done
done
Here I take only the first part of each item in the second array as a substring, so I can compare it with the items in the first array.
e.g. account_custom_2:124 -> account or notification-dispatcher_custom_2:124 -> notification-dispatcher.
This works nicely, but I run into a problem when notification is in notification-core_1:123, notification-dispatcher_core_1:123 and notification-dispatcher-smschannel_core_1:123.
Can you please give advice on how to fix this or if you can suggest another approach to this?
The point is that the base of one array2 element may include the base of another element as a substring, which will cause an improper replacement depending on the order of matching.
To avoid this, you can sort the array in descending order so that the longer pattern comes first.
Assuming the strings in the arrays do not contain tab characters, would
you please try:
file_name="<path_to_file>/file.txt"
array1=(account:123 shoppingcart-1:123 notification-core_1:123 notification-dispatcher_core_1:123 notification-dispatcher-smschannel_core_1:123)
array2=(account_custom_2:124 shoppingcart_custom_2:124 notification_custom_2:124 notification-dispatcher_custom_2:124 notification-dispatcher-smschannel_custom_2:124)
# insert the following block to sort array2 in descending order
array2=( $(for j in "${array2[@]}"; do
array_2_base=${j%%_*}
printf "%s\t%s\n" "$array_2_base" "$j"
done | sort -r | cut -f2-) )
# the following code will work "as is"
for i in "${!array1[@]}"
do
for j in "${!array2[@]}"
do
array_2_base="`echo ${array2[$j]} | awk -F_ '{print $1}'`"
if [[ "${array1[$i]}" == *"$array_2_base"* ]]
then
sed -i "s=${array1[$i]}=${array2[$j]}=g" "$file_name"
delete="${array1[$i]}"
array1=( "${array1[#]/$delete}" )
fi
done
done
The script above will be inefficient in execution time due to the repeated
invocation of the sed -i command.
The script below will run faster
by pre-generating the sed script and executing it just once.
file_name="<path_to_file>/file.txt"
array1=(
account:123
shoppingcart-1:123
notification-core_1:123
notification-dispatcher_core_1:123
notification-dispatcher-smschannel_core_1:123
)
array2=(
account_custom_2:124
shoppingcart_custom_2:124
notification_custom_2:124
notification-dispatcher_custom_2:124
notification-dispatcher-smschannel_custom_2:124
)
while IFS=$'\t' read -r base a2; do # read the sorted list line by line
for a1 in "${array1[@]}"; do
if [[ $a1 == *$base* ]]; then
scr+="s=$a1=$a2=g;" # generate sed script by appending the "s" command
continue 2
fi
done
done < <(for j in "${array2[@]}"; do
array_2_base=${j%%_*} # substring before the 1st "_"
printf "%s\t%s\n" "$array_2_base" "$j"
# print base and original element side by side
done | sort -r)
sed -i "$scr" "$file_name" # execute the replacement at once
If the number of items in your arrays is equal, then you can process them in one loop:
for i in "${!array1[@]}"; {
value=${array1[$i]}
new_value=${array2[$i]}
sed -i "s/$value/$new_value/" file
}
I found a way to fix this:
I delete the string from the first array once it has been replaced.
file_name="<path_to_file>/file.txt"
for i in "${!array1[@]}"
do
for j in "${!array2[@]}"
do
array_2_base="`echo ${array2[$j]} | awk -F_ '{print $1}'`"
if [[ "${array1[$i]}" == *"$array_2_base"* ]]
then
sed -i "s=${array1[$i]}=${array2[$j]}=g" $file_name
delete="${array1[$i]}"
array1=( "${array1[@]/$delete}" )
fi
done
done

bash remove duplicate string from list [duplicate]

I want to delete the duplicate words in a string. Example:
A="Dog Cat Horse Dog Dog Cat"
The string A should look like this:
A="Dog Cat Horse"
How can I write a Shell script for that?
You could use this,
echo "a a b b c c" | tr ' ' '\n' | sort | uniq | tr '\n' ' ' | sed -e 's/[[:space:]]*$//'
If order is not important, you can use an associative array:
declare -A uniq
for k in $A ; do uniq[$k]=1 ; done
echo ${!uniq[@]}
(Safely) split the string on blanks, creating an array with each word:†
read -r -d '' -a words < <(printf '%s\0' "$A")
Loop on the fields of the array, storing the words into an associative array; if the word was already seen, ignore it
declare -A Aseen
Aunique=()
for w in "${words[#]}"; do
[[ ${Aseen[$w]} ]] && continue
Aunique+=( "$w" )
Aseen[$w]=x
done
You can print the Aunique array to standard output:
printf '%s\n' "${Aunique[#]}"
which yields:
Dog
Cat
Horse
or create a new string with it
Anew="${Aunique[*]}"
printf '%s\n' "$Anew"
which yields:
Dog Cat Horse
or join the array with a separator, e.g., with the character ,:‡
IFS=, eval 'Asep="${Aunique[*]}"'
printf '%s\n' "${Asep[#]}"
which yields:
Dog,Cat,Horse
All these use Bash ≥ 4 features. If you're stuck on older Bash versions, there are workarounds, but they won't be as safe, nice, and easy…
Note. This method will not sort the string: the words remain in the original order, only with the duplicates removed.
†This is the canonical (and safe!) way to split a string on space characters (or, more generally on the characters contained in the special variable IFS, which has default value space-tab-newline). Don't use horrors like words=( $A ): it's subject to filename expansion (globbing). Another method widely encountered is read -r -a words <<< "$A"; this is fine (i.e., safe), but will not handle newlines in A.
‡The use of eval here is 100% safe (because of the single quotes); it's actually the canonical way to join the elements of an array in Bash (or to join the positional parameters in POSIX shells).
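To illustrate the point of the † footnote, here is a small sketch (what the unquoted * expands to depends on the files in your current directory):
A='Dog * Cat Dog'
bad=( $A )                                    # word-split AND glob-expanded: * may turn into filenames
read -r -d '' -a good < <(printf '%s\0' "$A") # safe split: no globbing
printf '%s\n' "${good[@]}"                    # Dog, a literal *, Cat, Dog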
With gawk:
awk -v RS="[ \n]" -v ORS=" " '!($0 in a){print;a[$0]}' <(echo $A)

How can I display unique words contained in a Bash string?

I have a string that has duplicate words. I would like to display only the unique words. The string is:
variable="alpha bravo charlie alpha delta echo charlie"
I know several tools that can do this together. This is what I figured out:
echo $variable | tr " " "\n" | sort -u | tr "\n" " "
What is a more effective way to do this?
Use a Bash Substitution Expansion
The following shell parameter expansion will substitute spaces with newlines, and then pass the results into the sort utility to return only the unique words.
$ echo -e "${variable// /\\n}" | sort -u
alpha
bravo
charlie
delta
echo
This has the side-effect of sorting your words, because uniq requires sorted input to detect duplicates (and sort -u sorts as it deduplicates). If that's not what you want, I also posted a Ruby solution that preserves the original word order.
Rejoining Words
If, as one commenter pointed out, you're trying to reassemble your unique words back into a single line, you can use command substitution to do this. For example:
$ echo $(echo -e "${variable// /\\n}" | sort -u)
alpha bravo charlie delta echo
The lack of quotes around the command substitution is intentional. If you quote it, the newlines will be preserved because Bash won't do word-splitting. Unquoted, the shell returns the results as a single line, however unintuitive that may seem.
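A small sketch of that difference, reusing the same expansion as above:
one_per_line=$(echo -e "${variable// /\\n}" | sort -u)
echo "$one_per_line"   # quoted: newlines preserved, one word per line
echo $one_per_line     # unquoted: word-split by the shell, printed as a single line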
You may use xargs:
echo "$variable" | xargs -n 1 | sort -u | xargs
Note: This solution assumes that all unique words should be output in the order they're encountered in the input. By contrast, the OP's own solution attempt outputs a sorted list of unique words.
A simple Awk-only solution (POSIX-compliant) that is efficient by avoiding a pipeline (which invariably involves subshells).
awk -v RS=' ' '{ if (!seen[$1]++) { printf "%s%s",sep,$1; sep=" " } }' <<<"$variable"
# The above prints without a trailing \n, as in the OP's own solution.
# To add a trailing newline, append `END { print }` to the end
# of the Awk script.
Note how $variable is double-quoted to prevent it from accidental shell expansions, notably pathname expansion (globbing), and how it is provided to Awk via a here-string (<<<).
-v RS=' ' tells Awk to split the input into records by a single space.
Note that the last word will have the input line's trailing newline included, which is why we don't use $0 - the entire record - but $1, the record's first field, which has the newline stripped due to Awk's default field-splitting behavior.
seen[$1]++ is a common Awk idiom that either creates an entry for $1, the input word, in associative array seen, if it doesn't exist yet, or increments its occurrence count.
!seen[$1]++ therefore only returns true for the first occurrence of a given word (where seen[$1] is implicitly zero/the empty string; the ++ is a post-increment, and therefore doesn't take effect until after the condition is evaluated)
{printf "%s%s",sep,$1; sep=" "} prints the word at hand $1, preceded by separator sep, which is implicitly the empty string for the first word, but a single space for subsequent words, due to setting sep to " " immediately after.
Here's a more flexible variant that handles any run of whitespace between input words; it works with GNU Awk and Mawk[1]:
awk -v RS='[[:space:]]+' '{if (!seen[$0]++){printf "%s%s",sep,$0; sep=" "}}' <<<"$variable"
-v RS='[[:space:]]+' tells Awk to split the input into records by any mix of spaces, tabs, and newlines.
[1] Unfortunately, BSD/OSX Awk (in strict compliance with the POSIX spec) doesn't support using regular expressions or even multi-character literals as RS, the input record separator.
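A possible workaround sketch for those Awks: normalize the whitespace with tr first, then reuse the single-space-RS variant from above (assuming the input has no leading whitespace):
tr -s '[:space:]' ' ' <<<"$variable" | awk -v RS=' ' '{ if (!seen[$1]++) { printf "%s%s",sep,$1; sep=" " } }'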
Preserve Input Order with a Ruby One-Liner
I posted a Bash-specific answer already, but if you want to return only unique words while preserving the word order of the original string, then you can use the following Ruby one-liner:
$ echo "$variable" | ruby -ne 'puts $_.split.uniq'
alpha
bravo
charlie
delta
echo
This will split the input string on whitespace, and then return unique elements from the resulting array.
Unlike the sort or uniq utilities, Ruby doesn't need the words to be sorted to detect duplicates. This may be a better solution if you don't want your results to be sorted, although given your input sample it makes no practical difference for the posted example.
Rejoining Words
If, as one commenter pointed out, you're then trying to reassemble the words back into a single line after deduplication, you can do that too. For that, we just append the Array#join method:
$ echo "$variable" | ruby -ne 'puts $_.split.uniq.join(" ")'
alpha bravo charlie delta echo
You can use awk:
$ echo "$variable" | awk '{for(i=1;i<=NF;i++){if (!seen[$i]++) printf $i" "}}'
alpha bravo charlie delta echo
If you do not want the trailing space and do want a trailing newline, you can do:
$ echo "$variable" | awk 'BEGIN{j=""} {for(i=1;i<=NF;i++){if (!seen[$i]++)j=j==""?j=$i:j=j" "$i}} END{print j}'
alpha bravo charlie delta echo
Using associative arrays in BASH 4+ you can simplify this:
variable="alpha bravo charlie alpha delta echo charlie"
# declare an associative array
declare -A unq
# read sentence into an indexed array
read -ra arr <<< "$variable"
# iterate each word and populate associative array with word as key
for w in "${arr[#]}"; do
unq["$w"]=1
done
# print unique results
printf "%s\n" "${!unq[#]}"
delta
bravo
echo
alpha
charlie
## if you want results in same order as original string
for w in "${arr[@]}"; do
[[ ${unq["$w"]} ]] && echo "$w" && unset unq["$w"]
done
alpha
bravo
charlie
delta
echo
pure, ugly bash:
for x in $variable; do
if [ "$(eval echo $(echo \$un__$x))" = "" ]; then
echo -n $x
eval un__$x=1
__usv="$__usv un__$x"
fi
done
unset $__usv

Shell script - check length when splitting string to array

I am using a bash script and I am trying to split a string with URLs inside, for example:
str="firsturl.com/123416 secondurl.com/634214"
So these URLs are separated by spaces. I already used the IFS command to split the string and it is working great; I can iterate through the two URLs with:
for url in $str; do
#some stuff
done
But my problem is that I need to get how many items this splitting has, so for the str example it should return 2, but using this:
${#str[@]}
returns the length of the string (40 for the current example), i.e. the number of characters, when I need to get 2.
Also iterating with a counter won't work, because I need the number of elements before iterating the array.
Any suggestions?
Split the string up into an array and use that instead:
str="firsturl.com/123416 secondurl.com/634214"
array=( $str )
echo "Number of elements: ${#array[#]}"
for item in "${array[#]}"
do
echo "$item"
done
You should never have a space separated list of strings though. If you're getting them line by line from some other command, you can use a while read loop:
while IFS='' read -r url
do
array+=( "$url" )
done
For properly encoded URLs, this probably won't make much of a difference, but in general, this will prevent glob expansion and some whitespace issues, and it's the canonical format that other commands (like wget -i) work with.
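For example, here is a sketch of feeding that loop from another command via process substitution (some_command_printing_urls is a hypothetical placeholder for whatever produces your URLs, one per line):
while IFS='' read -r url
do
    array+=( "$url" )
done < <(some_command_printing_urls)   # hypothetical command, one URL per line
echo "Number of elements: ${#array[@]}"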
You should use something like this
declare -a a=( $str )
n=${#a[*]} # number of elements
Several ways:
$ str="firsturl.com/123416 secondurl.com/634214"
bash array:
$ while read -a ary; do echo ${#ary[@]}; done <<< "$str"
2
awk:
$ awk '{print NF}' <<< "$str"
2
*nix utility:
$ printf "%s\n" $(printf "$str" | wc -w)
2
bash without array:
$ set -- $str
$ echo $#
2
If you create a function that echoes $#, then that will give you the number of items the string splits into.
count_params () { echo $#; }
Then passing $str to this function will give you the result
str="firsturl.com/123416 secondurl.com/634214"
count_params $str

Bash: Split string into character array

I have a string in a Bash shell script that I want to split into an array of characters, not based on a delimiter but just one character per array index. How can I do this? Ideally it would not use any external programs. Let me rephrase that. My goal is portability, so things like sed that are likely to be on any POSIX compatible system are fine.
Try
echo "abcdefg" | fold -w1
Edit: Added a more elegant solution suggested in comments.
echo "abcdefg" | grep -o .
You can access each letter individually already without an array conversion:
$ foo="bar"
$ echo ${foo:0:1}
b
$ echo ${foo:1:1}
a
$ echo ${foo:2:1}
r
If that's not enough, you could use something like this:
$ bar=($(echo $foo|sed 's/\(.\)/\1 /g'))
$ echo ${bar[1]}
a
If you can't even use sed or something like that, you can use the first technique above combined with a while loop using the original string's length (${#foo}) to build the array.
Warning: the code below does not work if the string contains whitespace. I think Vaughn Cato's answer has a better chance at surviving with special chars.
thing=($(i=0; while [ $i -lt ${#foo} ] ; do echo ${foo:$i:1} ; i=$((i+1)) ; done))
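A whitespace-safe sketch of the same idea: assign the array elements directly inside the loop instead of capturing echoed output, so spaces (and even newlines) in $foo survive:
thing=()
i=0
while [ "$i" -lt "${#foo}" ]; do
    thing[i]=${foo:i:1}
    i=$((i+1))
done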
As an alternative to iterating over 0 .. ${#string}-1 with a for/while loop, there are two other ways I can think of to do this with only bash: using =~ and using printf. (There's a third possibility using eval and a {..} sequence expression, but this lacks clarity.)
With the correct environment and NLS enabled in bash these will work with non-ASCII as hoped, removing potential sources of failure with older system tools such as sed, if that's a concern. These will work from bash-3.0 (released 2005).
Using =~ and regular expressions, converting a string to an array in a single expression:
string="wonkabars"
[[ "$string" =~ ${string//?/(.)} ]] # splits into array
printf "%s\n" "${BASH_REMATCH[#]:1}" # loop free: reuse fmtstr
declare -a arr=( "${BASH_REMATCH[#]:1}" ) # copy array for later
The way this works is to perform an expansion of string which substitutes each single character for (.), then match this generated regular expression with grouping to capture each individual character into BASH_REMATCH[]. Index 0 is set to the entire string; since that special array is read-only you cannot remove it, hence the :1 when the array is expanded, to skip over index 0 if needed.
Some quick testing for non-trivial strings (>64 chars) shows this method is substantially faster than one using bash string and array operations.
The above will work with strings containing newlines, =~ supports POSIX ERE where . matches anything except NUL by default, i.e. the regex is compiled without REG_NEWLINE. (The behaviour of POSIX text processing utilities is allowed to be different by default in this respect, and usually is.)
Second option, using printf:
string="wonkabars"
ii=0
while printf "%s%n" "${string:ii++:1}" xx; do
((xx)) && printf "\n" || break
done
This loop increments index ii to print one character at a time, and breaks out when there are no characters left. This would be even simpler if the bash printf returned the number of character printed (as in C) rather than an error status, instead the number of characters printed is captured in xx using %n. (This works at least back as far as bash-2.05b.)
With bash-3.1 and printf -v var you have slightly more flexibility, and can avoid falling off the end of the string should you be doing something other than printing the characters, e.g. to create an array:
declare -a arr
ii=0
while printf -v cc "%s%n" "${string:(ii++):1}" xx; do
((xx)) && arr+=("$cc") || break
done
If your string is stored in variable x, this produces an array y with the individual characters:
i=0
while [ $i -lt ${#x} ]; do y[$i]=${x:$i:1}; i=$((i+1));done
The most simple, complete and elegant solution:
$ read -a ARRAY <<< $(echo "abcdefg" | sed 's/./& /g')
and test
$ echo ${ARRAY[0]}
a
$ echo ${ARRAY[1]}
b
Explanation: read -a reads stdin into the array variable ARRAY, treating whitespace as the delimiter between array items.
Echoing the string through sed just adds the needed spaces between the characters.
We are using Here String (<<<) to feed the stdin of the read command.
I have found that the following works the best:
array=( `echo string | grep -o . ` )
(note the backticks)
then if you do: echo ${array[@]} ,
you get: s t r i n g
or: echo ${array[2]} ,
you get: r
Pure Bash solution with no loop:
#!/usr/bin/env bash
str='The quick brown fox jumps over a lazy dog.'
# Need extglob for the replacement pattern
shopt -s extglob
# Split string characters into array (skip first record)
# Character 037 is the octal representation of ASCII Record Separator
# so it can capture all other characters in the string, including spaces.
IFS= mapfile -s1 -t -d $'\37' array <<<"${str//?()/$'\37'}"
# Strip out captured trailing newline of here-string in last record
array[-1]="${array[-1]%?}"
# Debug print array
declare -p array
string=hello123
for i in $(seq 0 ${#string})
do array[$i]=${string:$i:1}
done
echo "zero element of array is [${array[0]}]"
echo "entire array is [${array[#]}]"
The zero element of array is [h]. The entire array is [h e l l o 1 2 3 ].
Yet another one :). The stated question simply says 'Split string into character array' and doesn't say much about the state of the receiving array, nor about special and control characters.
My assumption is that if I want to split a string into an array of chars, I want the receiving array to contain just that string and no leftovers from previous runs, yet preserve any special chars.
For instance, the proposed family of solutions like
for (( i=0 ; i < ${#x} ; i++ )); do y[i]=${x:i:1}; done
leaves leftovers in the target array:
$ y=(1 2 3 4 5 6 7 8)
$ x=abc
$ for (( i=0 ; i < ${#x} ; i++ )); do y[i]=${x:i:1}; done
$ printf '%s ' "${y[@]}"
a b c 4 5 6 7 8
Instead of writing that long line each time we want to split a string, why not hide all of this in a function we can keep in a package source file, with an API like
s2a "Long string" ArrayName
I got this one that seems to do the job.
$ s2a()
> { [ "$2" ] && typeset -n __=$2 && unset $2;
> [ "$1" ] && __+=("${1:0:1}") && s2a "${1:1}"
> }
$ a=(1 2 3 4 5 6 7 8 9 0) ; printf '%s ' "${a[@]}"
1 2 3 4 5 6 7 8 9 0
$ s2a "Split It" a ; printf '%s ' "${a[@]}"
S p l i t I t
If the text can contain spaces:
eval a=( $(echo "this is a test" | sed "s/\(.\)/'\1' /g") )
$ echo hello | awk NF=NF FS=
h e l l o
Or
$ echo hello | awk '$0=RT' RS=[[:alnum:]]
h
e
l
l
o
I know this is a "bash" question, but please let me show you the perfect solution in zsh, a shell very popular these days:
string='this is a string'
string_array=(${(s::)string}) #Parameter expansion. And that's it!
print ${(t)string_array} -> type array
print $#string_array -> 16 items
This is an old post/thread, but here is an answer using a new feature of bash v5.2+: the shell option patsub_replacement together with the =~ operator for regex. It is more or less the same as @mr.spuratic's post/answer.
str='There can be only one, the Highlander.'
regexp="${str//?/(&)}"
[[ "$str" =~ $regexp ]] &&
printf '%s\n' "${BASH_REMATCH[#]:1}"
Or by just: (which includes the whole string at index 0)
declare -p BASH_REMATCH
If that is not desired, one can remove the value of the first index (index 0), with
unset -v 'BASH_REMATCH[0]'
instead of using printf or echo to print the value of the array BASH_REMATCH
One can check/see the value of the variable "$regexp" with either
declare -p regexp
Output
declare -- regexp="(T)(h)(e)(r)(e)( )(c)(a)(n)( )(b)(e)( )(o)(n)(l)(y)( )(o)(n)(e)(,)( )(t)(h)(e)( )(H)(i)(g)(h)(l)(a)(n)(d)(e)(r)(.)"
or
echo "$regexp"
Using it in a script, one might want to test if the shopt is enabled or not, although the manual says it is on/enabled by default.
Something like.
if ! shopt -q patsub_replacement; then
shopt -s patsub_replacement
fi
But yeah, check the bash version too! If you're not sure which version of bash is in use.
if ! (( BASH_VERSINFO[0] > 5 || (BASH_VERSINFO[0] == 5 && BASH_VERSINFO[1] >= 2) )); then
printf 'No dice! bash version 5.2+ is required!\n' >&2
exit 1
fi
Space can be excluded from regexp variable, change it from
regexp="${str//?/(&)}"
To
regexp="${str//[! ]/(&)}"
and the output is:
declare -- regexp="(T)(h)(e)(r)(e) (c)(a)(n) (b)(e) (o)(n)(l)(y) (o)(n)(e) (t)(h)(e) (H)(i)(g)(h)(l)(a)(n)(d)(e)(r)(.)"
Maybe not as efficient as the other post/answer but it is still a solution/option.
If you want to store this in an array, you can do this:
string=foo
unset chars
declare -a chars
while read -N 1
do
chars[${#chars[@]}]="$REPLY"
done <<<"$string"x
unset chars[$((${#chars[@]} - 1))]
unset chars[$((${#chars[@]} - 1))]
echo "Array: ${chars[@]}"
Array: f o o
echo "Array length: ${#chars[@]}"
Array length: 3
The final x is necessary to handle the fact that a newline is appended after $string if it doesn't contain one.
If you want to use NUL-separated characters, you can try this:
echo -n "$string" | while read -N 1
do
printf %s "$REPLY"
printf '\0'
done
AWK is quite convenient:
a='123'; echo $a | awk 'BEGIN{FS="";OFS=" "} {print $1,$2,$3}'
where FS and OFS are the delimiters for reading in and printing out, respectively
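If you don't want to hardcode the field numbers, a sketch that rebuilds the whole record instead (FS="" per-character splitting is an extension supported by gawk and mawk, not strictly POSIX):
a='12345'; echo "$a" | awk 'BEGIN{FS="";OFS=" "} {$1=$1; print}'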
For those who landed here searching how to do this in fish:
We can use the builtin string command (since v2.3.0) for string manipulation.
↪ string split '' abc
a
b
c
The output is a list, so array operations will work.
↪ for c in (string split '' abc)
echo char is $c
end
char is a
char is b
char is c
Here's a more complex example iterating over the string with an index.
↪ set --local chars (string split '' abc)
for i in (seq (count $chars))
echo $i: $chars[$i]
end
1: a
2: b
3: c
zsh solution: To put the scalar string variable into arr, which will be an array:
arr=(${(ps::)string})
If you also need support for strings with newlines, you can do:
str2arr(){ local string="$1"; mapfile -d $'\0' Chars < <(for i in $(seq 0 $((${#string}-1))); do printf '%s\u0000' "${string:$i:1}"; done); printf '%s' "(${Chars[*]@Q})" ;}
string=$(printf '%b' "apa\nbepa")
declare -a MyString=$(str2arr "$string")
declare -p MyString
# prints declare -a MyString=([0]="a" [1]="p" [2]="a" [3]=$'\n' [4]="b" [5]="e" [6]="p" [7]="a")
As a response to Alexandro de Oliveira, I think the following is more elegant or at least more intuitive:
while read -r -n1 c ; do arr+=("$c") ; done <<<"hejsan"
declare -r some_string='abcdefghijklmnopqrstuvwxyz'
declare -a some_array
declare -i idx
for ((idx = 0; idx < ${#some_string}; ++idx)); do
some_array+=("${some_string:idx:1}")
done
for idx in "${!some_array[@]}"; do
echo "$((idx)): ${some_array[idx]}"
done
Pure bash, no loop.
Another solution, similar to/adapted from Léa Gris' solution, but using read -a instead of readarray/mapfile :
#!/usr/bin/env bash
str='azerty'
# Need extglob for the replacement pattern
shopt -s extglob
# Split string characters into array
# ${str//?()/$'\x1F'} replace each character "c" with "^_c".
# ^_ (Control-_, 0x1f) is Unit Separator (US), you can choose another
# character.
IFS=$'\x1F' read -ra array <<< "${str//?()/$'\x1F'}"
# now, array[0] contains an empty string and the rest of array (starting
# from index 1) contains the original string characters :
declare -p array
# Or, if you prefer to keep the array "clean", you can delete
# the first element and pack the array :
unset array[0]
array=("${array[#]}")
declare -p array
However, I prefer the shorter version below (and easier to understand, at least for me), where we remove the initial 0x1f before assigning the array:
#!/usr/bin/env bash
str='azerty'
shopt -s extglob
tmp="${str//?()/$'\x1F'}" # same as code above
tmp=${tmp#$'\x1F'} # remove initial 0x1f
IFS=$'\x1F' read -ra array <<< "$tmp" # assign array
declare -p array # verification
