How to split a list by comma not space - linux

I want to split text on commas, not spaces, in a for foo in list loop. Suppose I have a CSV file CSV_File with the following text inside it:
Hello,World,Questions,Answers,bash shell,script
...
I used the following code to split it into several words:
for word in $(cat CSV_File | sed -n 1'p' | tr ',' '\n')
do echo $word
done
It prints:
Hello
World
Questions
Answers
bash
shell
script
But I want it to split the text by commas not spaces:
Hello
World
Questions
Answers
bash shell
script
How can I achieve this in bash?

Set IFS to ,:
sorin@sorin:~$ IFS=',' ;for i in `echo "Hello,World,Questions,Answers,bash shell,script"`; do echo $i; done
Hello
World
Questions
Answers
bash shell
script
sorin@sorin:~$
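Setting IFS at the prompt like this changes it for the rest of the session. A minimal sketch, assuming bash, that limits the change to a single read command instead (the names line and fields are illustrative, not from the answer):

```shell
#!/usr/bin/env bash
# Sketch: split on commas without changing IFS globally.
line="Hello,World,Questions,Answers,bash shell,script"

# The IFS=, prefix applies only to this one read command.
IFS=, read -r -a fields <<< "$line"

for field in "${fields[@]}"; do
    echo "$field"
done
```

Because the assignment is a prefix to read, IFS keeps its previous value once the command finishes.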

Using command substitution to produce the word list undoes all the work you are doing to keep the spaces together, because its output is split again on whitespace.
Try instead:
cat CSV_File | sed -n 1'p' | tr ',' '\n' | while read -r word; do
    echo "$word"
done
That also increases parallelism. Command substitution, as in your question, forces the entire inner pipeline to finish before you can start iterating over its output. Piping into a loop (as in my answer) lets both sides work in parallel. This matters only if you have many lines in the file, of course.
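To see the difference, the following sketch counts how many items each approach yields for the question's sample line. It assumes bash with the default IFS; the variable names are illustrative.

```shell
#!/usr/bin/env bash
line="Hello,World,Questions,Answers,bash shell,script"

# Command substitution: the result is split again on whitespace (default IFS),
# so "bash shell" becomes two words.
count_subst=0
for word in $(printf '%s\n' "$line" | tr ',' '\n'); do
    count_subst=$((count_subst + 1))
done

# while read: one item per line, so "bash shell" stays together.
# Process substitution keeps the loop in the current shell, so the counter survives.
count_read=0
while IFS= read -r word; do
    count_read=$((count_read + 1))
done < <(printf '%s\n' "$line" | tr ',' '\n')

echo "command substitution: $count_subst items"
echo "while read: $count_read items"
```

The sketch feeds the loop through process substitution rather than a pipe because, by default, bash runs each part of a pipeline in a subshell, so a counter set inside a piped while loop would be lost when the loop ends.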

I think the canonical method is:
while IFS=, read -r field1 field2 field3 field4 field5 field6; do
    echo "$field1 $field5"   # do stuff with the fields here
done < CSV.file
If you don't know or don't care about how many fields there are:
IFS=,
while read line; do
    # split into an array
    field=( $line )
    for word in "${field[@]}"; do echo "$word"; done
    # or use the positional parameters
    set -- $line
    for word in "$@"; do echo "$word"; done
done < CSV.file
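The unquoted $line expansions in this loop also undergo pathname expansion, so a field such as * would be replaced by file names. A sketch of a variant that lets read do the splitting, assuming bash (csv_file and the sample data are made up for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical sample data written to a temporary file.
csv_file=$(mktemp)
printf '%s\n' 'Hello,World,bash shell' 'a,*,c' > "$csv_file"

total=0
while IFS=, read -r -a fields; do
    # fields holds one line's values; no globbing or extra word splitting occurs.
    for word in "${fields[@]}"; do
        echo "$word"
        total=$((total + 1))
    done
done < "$csv_file"

rm -f "$csv_file"
```

Here IFS is changed only for each read call, so the rest of the script keeps the default separators.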

kent$ echo "Hello,World,Questions,Answers,bash shell,script"|awk -F, '{for (i=1;i<=NF;i++)print $i}'
Hello
World
Questions
Answers
bash shell
script

Create a bash function
split_on_commas() {
    local IFS=,
    local WORD_LIST=($1)
    for word in "${WORD_LIST[@]}"; do
        echo "$word"
    done
}
split_on_commas "this,is a,list" | while read item; do
    # Custom logic goes here
    echo Item: ${item}
done
... this generates the following output:
Item: this
Item: is a
Item: list
(Note, this answer has been updated according to some feedback)

Read: http://linuxmanpages.com/man1/sh.1.php
& http://www.gnu.org/s/hello/manual/autoconf/Special-Shell-Variables.html
IFS The Internal Field Separator that is used for word splitting
after expansion and to split lines into words with the read
builtin command. The default value is ``<space><tab><newline>''.
IFS is a shell variable, so a value you assign in your shell script stays in effect for the rest of that script, but it is not passed on to child processes unless you export it. Also be aware that IFS will most likely not be inherited from your environment at all: see this gnu post for the reasons and more info on IFS.
Your code written like this:
IFS=","
for word in $(cat tmptest | sed -n 1'p' | tr ',' '\n'); do echo $word; done;
should work, I tested it on command line.
sh-3.2#IFS=","
sh-3.2#for word in $(cat tmptest | sed -n 1'p' | tr ',' '\n'); do echo $word; done;
World
Questions
Answers
bash shell
script
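One thing worth checking with this loop: once IFS is a comma, the newlines produced by tr are no longer separators, so the loop body may run only once and echo the whole text with its newlines intact, which looks the same on screen. A small sketch, assuming bash, that prints the default IFS and counts iterations (text, old_ifs and iterations are illustrative names):

```shell
#!/usr/bin/env bash
# Show the default IFS in a readable, quoted form.
printf 'default IFS: %q\n' "$IFS"

# Newline-separated text, like the output of tr ',' '\n'.
text=$'Hello\nbash shell\nscript'

old_ifs=$IFS
IFS=,
iterations=0
for word in $text; do
    # With IFS=, there is nothing to split on, so this runs once.
    iterations=$((iterations + 1))
done
IFS=$old_ifs

echo "iterations with IFS=',': $iterations"
```

Saving and restoring IFS, as done here with old_ifs, is one way to keep the change from leaking into the rest of a script.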

You can use:
cat f.csv | sed 's/,/ /g' | awk '{print $1 " / " $4}'
or
echo "Hello,World,Questions,Answers,bash shell,script" | sed 's/,/ /g' | awk '{print $1 " / " $4}'
This is the part that replaces commas with spaces:
sed 's/,/ /g'

For me, splitting into an array is simpler:
IN="bla@some.com;john@home.com"
arrIN=(${IN//;/ })
echo ${arrIN[1]}
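Replacing the separator with spaces relies on default word splitting, so it breaks when an element itself contains whitespace. A hedged alternative using read -a, assuming bash (the sample string is hypothetical):

```shell
#!/usr/bin/env bash
IN="first item;second item"   # hypothetical sample with embedded spaces

# Split on ';' only; the spaces inside each element are preserved.
IFS=';' read -r -a arrIN <<< "$IN"

echo "${arrIN[1]}"
```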

Using readarray(mapfile):
$ cat csf
Hello,World,Questions,Answers,bash shell,script
$ readarray -td, arr < csf
$ printf '%s\n' "${arr[@]}"
Hello
World
Questions
Answers
bash shell
script
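When the input is a file, the last element keeps the line's trailing newline, because the comma is the only delimiter. A sketch of one way to avoid that when the data is already in a variable, assuming bash 4.4 or later for readarray -d (line is an illustrative name):

```shell
#!/usr/bin/env bash
line="Hello,World,Questions,Answers,bash shell,script"

# printf '%s' emits the string without a trailing newline,
# so the last element is exactly "script".
readarray -td, arr < <(printf '%s' "$line")

printf '%s\n' "${arr[@]}"
```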

Related

How to read comma separated variables and append them in string buffer in unix

I'm looking for a way to read a variable that holds comma-separated fields (e.g. a,b,c,d,e,f) and generate another variable from it (e.g. a,'a',b,'b',c,'c',d,'d',e,'e',f,'f'). I have tried a for loop, but it adds a comma at the end.
E.g.:
Var1=a,b,c,d,e,f
Expected output:
Var2=a,'a',b,'b',c,'c',d,'d',e,'e',f,'f'
for i in $(echo $Var1 | sed "s/,/ /g")
do
    Var2="$Var2$i,'$i',"
done
I'm getting Var2=a,'a',b,'b',c,'c',d,'d',e,'e',f,'f', ending with a comma.
Is there a good approach that gets this done without making it more complex?
Thanks
DMP
Here is a way to do this in sed.
$ var1="a,b,c,d,e,f,"
$ var2=$(sed -e "s/[a-z]/&,\'&\'/g" -e 's/,$//g' <<<"$var1")
$ echo $var2
a,'a',b,'b',c,'c',d,'d',e,'e',f,'f'
The first -e expression follows each single character with a quoted copy of itself, and the second -e removes the trailing comma.
The above will not work if var1 has more than one character between commas. For that, use sed's -E (extended regular expression) option:
$ var1='abc,192,hk3,def,HoZ,'
$ var2=$(sed -E -e "s/[a-zA-Z0-9]+/&,\'&\'/g" -e 's/,$//g' <<<"$var1")
$ echo "$var2"
abc,'abc',192,'192',hk3,'hk3',def,'def',HoZ,'HoZ'
You'll have to deal with an extra comma one way or another.
Here's what I'd offer as a solution. I'm also using an actual array to make sure we can process strings with spaces:
#!/usr/bin/env bash
# Input variable
VAR1=a,b,c,d,e,f
# Read VAR1 into an array
IFS=',' read -r -a VAR1_ARRAY <<< "${VAR1}"
VAR2=''
for EL in "${VAR1_ARRAY[@]}"; do
    VAR2="${VAR2},${EL},'${EL}'"
done
# Remove the leading comma
VAR2=${VAR2:1:${#VAR2} - 1}
echo "${VAR2}"
The output:
a,'a',b,'b',c,'c',d,'d',e,'e',f,'f'
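One way to avoid the extra comma altogether is to emit the separator only before the second and later elements. A sketch assuming bash; the function name dup_quoted is hypothetical:

```shell
#!/usr/bin/env bash
# dup_quoted: print each comma-separated element followed by its quoted copy.
dup_quoted() {
    local -a parts
    local out='' sep='' el
    IFS=, read -r -a parts <<< "$1"
    for el in "${parts[@]}"; do
        out+="${sep}${el},'${el}'"
        sep=,   # empty for the first element, a comma afterwards
    done
    printf '%s\n' "$out"
}

Var2=$(dup_quoted "a,b,c,d,e,f")
echo "$Var2"
```

Since the array expansion is quoted, elements that contain spaces are handled as single items.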

Add suffix to comma-separated strings in bash ecosystem

Is there a way of transforming a comma-delimited variable to add a suffix to each token using standard gnu tools? e.g.
VARIABLE=aaa,bbb,ccc
suffix=-foo
Expected output: aaa-foo,bbb-foo,ccc-foo
Additionally, if I have only one token, the transformation should behave in the same way
e.g. aaa -> aaa-foo
echo "aaa,bbb,ccc" | sed -E 's/([^,]+)/\1-foo/g'
It matches each group of characters that are not "," and appends -foo to it.
With variables:
suffix="-foo"; VARIABLE="aaa,bbb,ccc"; echo ${VARIABLE} | sed -E "s/([^,]+)/\1${suffix}/g"
echo "$VARIABLE" | tr "," "\n" | awk '{print $1"-foo"}' | paste -sd "," -
explanation:
put each token on single line
tr "," "\n"
append "-foo" to each token
awk '{print $1"-foo"}'
join back up with the original comma
paste -sd "," -
Try:
answer=`echo $VARIABLE | sed "s/,/-foo,/g" | sed "s/$/-foo/"`
If you need to have the suffix as a variable then try:
answer=`echo $VARIABLE | sed "s/,/${suffix},/g" | sed "s/$/${suffix}/"`
I don't have access to a Unix box at the moment to prove this works.
The following:
s="aaa,bbb,ccc"
IFS=,
a=( $s )
mapfile -t b < <(printf '%s-foo\n' "${a[@]}")
should give us:
$ declare -p b
declare -a b=([0]="aaa-foo" [1]="bbb-foo" [2]="ccc-foo")
From there, you can reconstruct the original format in a number of ways...
IFS=, eval 'JOINED="${b[*]}"'
Or if you don't like using eval, perhaps:
d=""; o=""
for x in "${b[@]}"; do
    printf -v o '%s%s%s' "$o" "$d" "$x"
    d=,
done
... which will put the complete modified string in $o.
With bash Parameter Expansion
var='aaa,bbb,ccc';[ -n "$var" ] && printf "%s\n" "${var//,/-foo,}-foo"
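The parameter expansion can also be wrapped in a small function that takes the suffix as an argument. A sketch, assuming bash; add_suffix is a hypothetical name:

```shell
#!/usr/bin/env bash
add_suffix() {
    local list=$1 suffix=$2
    # An empty list stays empty rather than becoming just the suffix.
    [ -n "$list" ] || return 0
    # Insert the suffix before every comma, then once more at the end.
    printf '%s\n' "${list//,/$suffix,}$suffix"
}

add_suffix "aaa,bbb,ccc" "-foo"
add_suffix "aaa" "-foo"
```

A single token contains no comma, so only the final suffix is added, which matches the aaa -> aaa-foo requirement in the question.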

How to extract first letters of dashed separated words in a bash variable?

I would like to extract the first letter of dashed separated words value of my bash variable, like this:
MY_TEXT=this-is-my-custom-text
I would like to create a second variable like this:
MY_INITIALS=timct
This isn't the shortest method, but it doesn't require any external processes.
IFS=- read -r -a words <<< "$MY_TEXT"
for word in "${words[@]}"; do MY_INITIALS+=${word:0:1}; done
You can use grep -oP:
MY_TEXT='this-is-my-custom-text'
MY_INITIAL=$(grep -oP '(?<=-|^)\w' <<< "$MY_TEXT" | tr -d '\n')
echo "$MY_INITIAL"
timct
Or using awk:
MY_INITIAL=$(awk -v RS='-' '{printf substr($0,1,1)}' <<< "$MY_TEXT")
echo "$MY_INITIAL"
timct
Try this:
MY_TEXT=this-is-my-custom-text
MY_INITIALS=$(sed 's_\([^-]\)[^-]\+\(-\?\)_\1\2_g' <<< "$MY_TEXT")
echo "$MY_INITIALS"
This should do it:
MY_TEXT=this-is-my-custom-text
MY_INITIALS="${MY_TEXT:0:1}$(grep -oP '(?<=[-])\w' <<< $MY_TEXT |tr -d '\n')"
echo $MY_INITIALS
Output:
timct
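The read-based approach can be packaged as a function so that the result does not accumulate if the code runs more than once. A sketch assuming bash; initials_of is a hypothetical name:

```shell
#!/usr/bin/env bash
initials_of() {
    local -a words
    local word result=''
    # Split the argument at hyphens into an array.
    IFS=- read -r -a words <<< "$1"
    for word in "${words[@]}"; do
        result+=${word:0:1}   # first character of each word
    done
    printf '%s\n' "$result"
}

MY_INITIALS=$(initials_of "this-is-my-custom-text")
echo "$MY_INITIALS"
```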

Shell script tokenizer

I'm writing a script that queries my JBoss server for some database related data. The thing that is returned after the query looks like this:
ConnectionCount=7
ConnectionCreatedCount=98
MaxConnectionsInUseCount=10
ConnectionDestroyedCount=91
AvailableConnectionCount=10
InUseConnectionCount=0
MaxSize=10
I would like to tokenize this data so the numbers on the right hand side are stored in a variable in the format 7,98,10,91,10,0,10. I tried to use IFS with the equals sign, but that still keeps the parameter names (only the equals signs are eliminated).
I put your input data into file d.txt. The one-liner below extracts the numbers, comma-delimits them and assigns all that to variable TAB (tested with Korn shell):
$ TAB=$(awk -F= '{print $2}' d.txt | xargs echo | sed 's/ /,/g')
$ echo $TAB
7,98,10,91,10,0,10
Or just use cut/tr:
F=($(cut -d'=' -f2 input | tr '\n' ' '))
You can do it with one sed command too:
sed -n 's/^.*=\(.*\)/\1,/;H;${g;s/\n//g;s/,$//;p;}' file
7,98,10,91,10,0,10
A simple cut without any pipes:
arr=( $(cut -d'=' -f2 file) )
Output
printf '%s\n' "${arr[@]}"
7
98
10
91
10
0
10
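Since IFS with the equals sign was the asker's first attempt, here is a sketch of how that idea can work in plain bash: read each line into a name and a value, and keep only the value. The temporary file and the variable names are illustrative; the sample lines are taken from the question.

```shell
#!/usr/bin/env bash
data_file=$(mktemp)
printf '%s\n' 'ConnectionCount=7' 'ConnectionCreatedCount=98' 'MaxSize=10' > "$data_file"

values=''
while IFS='=' read -r name value; do
    # Append a comma before every value except the first.
    values+="${values:+,}$value"
done < "$data_file"
rm -f "$data_file"

echo "$values"
```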

linux shell title case

I am writing a shell script and have a variable like this: something-that-is-hyphenated.
I need to use it in various points in the script as:
something-that-is-hyphenated, somethingthatishyphenated, SomethingThatIsHyphenated
I have managed to change it to somethingthatishyphenated by stripping out - using sed "s/-//g".
I am sure there is a simpler way, and also, need to know how to get the camel cased version.
Edit: Working function derived from @Michał's answer
function hyphenToCamel {
tr '-' '\n' | awk '{printf "%s%s", toupper(substr($0,1,1)), substr($0,2)}'
}
CAMEL=$(echo something-that-is-hyphenated | hyphenToCamel)
echo $CAMEL
Edit: Finally, a sed one liner thanks to @glenn
echo a-hyphenated-string | sed -E "s/(^|-)([a-z])/\u\2/g"
a GNU sed one-liner
echo something-that-is-hyphenated |
sed -e 's/-\([a-z]\)/\u\1/g' -e 's/^[a-z]/\u&/'
\u in the replacement string is documented in the sed manual.
Pure bashism:
var0=something-that-is-hyphenated
var1=(${var0//-/ })
var2=${var1[*]^}
var3=${var2// /}
echo $var3
SomethingThatIsHyphenated
Line 1 is trivial.
Line 2 is the bashism for replaceAll or 's/-/ /g', wrapped in parens, to build an array.
Line 3 uses ${foo^}, which means uppercase (while ${foo,} would mean 'lowercase' [note how ^ points up while , points down]), but to operate on the first letter of every word, we address the whole array with ${foo[*]} (or ${foo[@]}, if you would prefer that).
Line 4 is again a replace-all: blank with nothing.
Line 5 is trivial again.
You can define a function:
hypenToCamel() {
tr '-' '\n' | awk '{printf "%s%s", toupper(substr($0,1,1)), substr($0,2)}'
}
CAMEL=$(echo something-that-is-hyphenated | hypenToCamel)
echo $CAMEL
In the shell you are stuck with being messy:
aa="aaa-aaa-bbb-bbb"
echo " $aa" | sed -e 's/--*/ /g' -e 's/ a/A/g' -e 's/ b/B/g' ... -e 's/ *//g'
Note the carefully placed space in the echo and the double space in the last -e.
I leave it as an exercise to complete the code.
In perl it is a bit easier as a one-line shell command:
perl -e 'print map{ $a = ucfirst; $a =~ s/ +//g; $a} split( /-+/, $ARGV[0] ), "\n"' $aa
For the record, here's a pure Bash method that is safe (it is not subject to pathname expansion), using Bash≥4:
var0=something-that-is-hyphenated
IFS=- read -r -d '' -a var1 < <(printf '%s\0' "${var0,,}")
printf '%s' "${var1[@]^}"
This (safely) splits the lowercase expansion of var0 at the hyphens, with each split part in array var1. Then we use the ^ parameter expansion to uppercase the first character of the fields of this array, and concatenate them.
If your variable may also contain spaces and you want to act on them too, change IFS=- into IFS='- '.
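The same idea can be wrapped in a function. This is a simplified sketch that assumes single-line input and Bash 4 or later (for ${var,,} and ${var^}); it uses a here-string instead of the NUL-delimited read above, and hyphen_to_camel is a hypothetical name:

```shell
#!/usr/bin/env bash
hyphen_to_camel() {
    local -a parts
    # Split the lowercased input at hyphens.
    IFS=- read -r -a parts <<< "${1,,}"
    # Uppercase the first character of each part and join with no separator.
    printf '%s' "${parts[@]^}"
    printf '\n'
}

CAMEL=$(hyphen_to_camel "something-that-is-hyphenated")
echo "$CAMEL"
```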

Resources