Bash: transform key-value lines to CSV format

Editor's note: I've clarified the problem definition, because I think the problem is an interesting one, and this question deserves to be reopened.
I've got a text file containing key-value lines in the following format - note that the # lines below are only there to show repeating blocks and are NOT part of the input:
Country:United Kingdom
Language:English
Capital city:London
#
Country:France
Language:French
Capital city:Paris
#
Country:Germany
Language:German
Capital city:Berlin
#
Country:Italy
Language:Italian
Capital city:Rome
#
Country:Russia
Language:Russian
Capital city:Moscow
Using shell commands and utilities, how can I transform such a file to CSV format, so it will look like this?
Country,Language,Capital city
United Kingdom,English,London
France,French,Paris
Germany,German,Berlin
Italy,Italian,Rome
Russia,Russian,Moscow
In other words:
Make the key names the column names of the CSV header row.
Make the values from each block into a data row.
[OP's original] Edit: My idea would be to separate the entries, e.g. Country:France would become Country France, and then grep/sed the heading. However, I have no idea how to move the headings from a single column to several separate ones.

A simple solution with cut, paste, and head (assumes input file file, outputs to file out.csv):
#!/usr/bin/env bash
{ cut -d':' -f1 file | head -n 3 | paste -d, - - -;
cut -d':' -f2- file | paste -d, - - -; } >out.csv
cut -d':' -f1 file | head -n 3 creates the header line:
cut -d':' -f1 file extracts the first :-based field from each input line, and head -n 3 stops after 3 lines, given that the headers repeat every 3 lines.
paste -d, - - - takes 3 input lines from stdin (one for each -) and combines them into a single, comma-separated output line (-d,)
cut -d':' -f2- file | paste -d, - - - creates the data lines:
cut -d':' -f2- file extracts everything after the : from each input line.
As above, paste then combines 3 values into a single, comma-separated output line.
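To see concretely how paste groups stdin lines into rows, here is a quick illustration with placeholder values:
$ printf '%s\n' one two three four five six | paste -d, - - -
one,two,three
four,five,six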
agc points out in a comment that the column count (3) and the paste operands (- - -) are hard-coded above.
The following solution parameterizes the column count (set it via n=...):
{ n=3; pasteOperands=$(printf '%.s- ' $(seq $n))
cut -d':' -f1 file | head -n $n | paste -d, $pasteOperands;
cut -d':' -f2- file | paste -d, $pasteOperands; } >out.csv
printf '%.s- ' $(seq $n) is a trick that produces as many space-separated - characters as there are columns ($n).
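For example, with n=3:
$ printf '%.s- ' $(seq 3)
- - -
(The trailing space is harmless, because the shell's word splitting discards it when the result is used unquoted as paste's operands.)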
While the previous solution is now parameterized, it still assumes that the column count is known in advance; the following solution dynamically determines the column count (requires Bash 4+ due to use of readarray, but could be made to work with Bash 3.x):
# Determine the unique list of column headers and
# read them into a Bash array.
readarray -t columnHeaders < <(awk -F: 'seen[$1]++ { exit } { print $1 }' file)
# Output the header line.
(IFS=','; echo "${columnHeaders[*]}") >out.csv
# Append the data lines.
cut -d':' -f2- file | paste -d, $(printf '%.s- ' $(seq ${#columnHeaders[@]})) >>out.csv
awk -F: 'seen[$1]++ { exit } { print $1 }' outputs each input line's column name (the 1st :-separated field), remembers the column names in associative array seen, and stops at the first column name that is seen for the second time.
readarray -t columnHeaders reads awk's output line by line into array columnHeaders
(IFS=','; echo "${columnHeaders[*]}") >out.csv prints the array elements using a comma as the separator (specified via $IFS); note the use of a subshell ((...)) so as to localize the effect of modifying $IFS, which would otherwise have global effects.
The cut ... pipeline uses the same approach as before, with the operands for paste being created based on the count of the elements of array columnHeaders (${#columnHeaders[@]}).
To wrap the above up in a function that outputs to stdout and also works with Bash 3.x:
toCsv() {
local file=$1 columnHeaders
# Determine the unique list of column headers and
# read them into a Bash array.
IFS=$'\n' read -d '' -ra columnHeaders < <(awk -F: 'seen[$1]++ { exit } { print $1 }' "$file")
# Output the header line.
(IFS=','; echo "${columnHeaders[*]}")
# Append the data lines.
cut -d':' -f2- "$file" | paste -d, $(printf '%.s- ' $(seq ${#columnHeaders[@]}))
}
# Sample invocation
toCsv file > out.csv

My bash script for this would be :
#!/bin/bash
count=0
echo "Country,Language,Capital city"
while read line
do
    (( count++ ))
    (( count < 3 )) && printf "%s," "${line##*:}"
    (( count == 3 )) && printf "%s\n" "${line##*:}" && (( count = 0 ))
done < file
Output
Country,Language,Capital city
United Kingdom,English,London
France,French,Paris
Germany,German,Berlin
Italy,Italian,Rome
Russia,Russian,Moscow
Edit
Replaced [ stuff ] with (( stuff )), i.e. replaced test with double parentheses, which are used for arithmetic evaluation.
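For comparison (note that (( )) uses C-style operators and needs no $ on variable names):
count=2
[ "$count" -lt 3 ] && echo yes # test(1) syntax
(( count < 3 )) && echo yes # arithmetic evaluation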

You can also write a slightly more generalized version of the script that takes the number of repeating rows holding the data and produces output on that basis, avoiding hardcoded header values and handling additional fields. (You could also just scan the field names for the first repeat and set the repeat count that way.)
#!/bin/bash
declare -i rc=0 ## record count
declare -i hc=0 ## header count
record=""
header=""
fn="${1:-/dev/stdin}" ## filename as 1st arg (default: stdin)
repeat="${2:-3}" ## number of repeating rows (default: 3)
while read -r line; do
    record="$record,${line##*:}"
    ((hc == 0)) && header="$header,${line%%:*}"
    if ((rc < (repeat - 1))); then
        ((rc++))
    else
        ((hc == 0)) && { printf "%s\n" "${header:1}"; hc=1; }
        printf "%s\n" "${record:1}"
        record=""
        rc=0
    fi
done <"$fn"
There are any number of ways to approach the problem, and you will have to experiment to find the most efficient one for your data file size, etc. Whether you use a script or a combination of shell tools (cut, paste, etc.) is to a large extent left to you. (See the awk sketch after the examples below for yet another angle.)
Output
$ bash readcountry.sh country.txt
Country,Language,Capital city
United Kingdom,English,London
France,French,Paris
Germany,German,Berlin
Italy,Italian,Rome
Russia,Russian,Moscow
Output with 4 Fields
Example input file adding a Population field:
$ cat country2.txt
Country:United Kingdom
Language:English
Capital city:London
Population:20000000
<snip>
Output
$ bash readcountry.sh country2.txt 4
Country,Language,Capital city,Population
United Kingdom,English,London,20000000
France,French,Paris,10000000
Germany,German,Berlin,150000000
Italy,Italian,Rome,9830000
Russia,Russian,Moscow,622000000
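As promised above, here is a hedged awk sketch that needs neither the column count nor the header names in advance; like the dynamic solution earlier, it assumes every block starts with the same first key:
awk -F: '
  NR == 1 { first = $1 }               # remember the first key
  $1 == first && NR > 1 {              # a new block starts here
    if (!hdr) { print h; hdr = 1 }     # header is complete after block 1
    print row; row = ""
  }
  {
    v = substr($0, index($0, ":") + 1) # everything after the first ":"
    row = (row == "" ? v : row "," v)
    if (!hdr) h = (NR == 1 ? $1 : h "," $1)
  }
  END { print row }                    # flush the final block
' country.txt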

Using datamash, tr, and join:
datamash -t ':' -s -g 1 collapse 2 < country.txt | tr ',' ':' |
datamash -t ':' transpose |
join -t ':' -a1 -o 1.2,1.3,1.1 - /dev/null | tr ':' ','
Output:
Country,Language,Capital city
United Kingdom,English,London
France,French,Paris
Germany,German,Berlin
Italy,Italian,Rome
Russia,Russian,Moscow
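For reference, here is roughly what the first stage produces with the sample country.txt (GNU datamash; -s sorts, so the groups come out alphabetically):
$ datamash -t ':' -s -g 1 collapse 2 < country.txt
Capital city:London,Paris,Berlin,Rome,Moscow
Country:United Kingdom,France,Germany,Italy,Russia
Language:English,French,German,Italian,Russian
tr ',' ':' then makes every line uniformly :-separated, datamash transpose flips rows and columns, the join against /dev/null is a no-op whose only job is to reorder the fields via -o 1.2,1.3,1.1, and the final tr produces the commas.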

Related

Replace part of a file name in Linux

I have a file name in Linux that is something like:
out-06307963554982-8091-20220726-121922-1658834362.208826.wav
I need to replace the first number in bash with X's and am struggling to find the right solution. The number can vary in size but the 'out' will always remain the same.
Final file name should look like
out-XXXXXXXXXXXXXX-8091-20220726-121922-1658834362.208826.wav
OR alternatively we can cut everything in front of the second -
20220726-121922-1658834362.208826.wav
Try this:
#!/bin/bash
filename="out-06307963554982-8091-20220726-121922-1658834362.208826.wav"
echo "ORIGINAL=$filename"
IFS='-' read -r -a fields <<< "$filename"
new_second_field=$( echo "${fields[1]}" | tr '[:digit:]' 'X' )
fields[1]="$new_second_field"
new_filename=$(echo "${fields[@]}" | tr ' ' '-')
echo "NEW======$new_filename"
####################################################
# Cut everything in front of the second '-'
echo "$filename" | cut -d'-' -f4-
IFS='-' read -r -a fields <<< "$filename": splits the filename on - into an array of fields.
then tr is used to replace each digit in the second field with an X. This ensures that the number of X's equals the number of digits.
the second field is replaced by the new value consisting only of X's.
the new filename is "rebuilt" by echoing the fields array and replacing all spaces with -.
For your alternative method, simply split on - with cut and display the fields from the 4th one on (-f4-).
The output of this script is:
ORIGINAL=out-06307963554982-8091-20220726-121922-1658834362.208826.wav
NEW======out-XXXXXXXXXXXXXX-8091-20220726-121922-1658834362.208826.wav
20220726-121922-1658834362.208826.wav
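A shorter sketch of the same rename using only parameter expansion (variable names here are illustrative, and the out-<digits>-rest layout is assumed):
filename="out-06307963554982-8091-20220726-121922-1658834362.208826.wav"
rest=${filename#out-}                    # drop the leading "out-"
digits=${rest%%-*}                       # the first "-"-delimited field (the number)
xs=$(printf 'X%.0s' $(seq ${#digits}))   # one X per digit
echo "out-${xs}-${rest#*-}"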

Storing command values into a variable Bash Script

I am trying to loop through files in a directory to find an animal and its value. The command is supposed to only display the animal and total value. For example:
File1 has:
Monkey 11
Bear 4
File2 has:
Monkey 12
If I wanted the total value of monkeys then I would do:
for f in *; do
total=$(grep $animal $f | cut -d " " -f 2- | paste -sd+ | bc)
done
echo $animal $total
This would return the correct value of:
Monkey 23
However, if there is only one instance of an animal like for example Bear, the variable total doesn't return any value, I only get echoed:
Bear
Why is this the case and how do I fix it?
Note: I'm not allowed to use the find command.
you could use this little awk script instead of the for/grep/cut/paste/bc combination:
awk -v animal="Bear" '
$1 == animal { count += $2 }
END { print count + 0 }
' *
Comments on OP's question about why code behaves as it does:
total is reset on each pass through the loop so ...
upon leaving the loop total will have the count from the 'last' file processed
in the case of Bear the 'last' file processed is File2 and since File2 does not contain any entries for Bear we get total='', which is what's printed by the echo
if the Bear entry is moved from File1 to File2 then OP's code should print Bear 4
OP's current code effectively ignores all input files and prints whatever's in the 'last' file (File2 in this case)
OP's current code generates the following:
# Monkey
Monkey 12 # from File2
# Bear
Bear # no match in File2
I'd probably opt for replacing the whole grep/cut/paste/bc (4x subprocesses) with a single awk (1x subprocess) call (and assuming no matches we report 0):
for animal in Monkey Bear Hippo
do
total=$(awk -v a="${animal}" '$1==a {sum+=$2} END {print sum+0}' *)
echo "${animal} ${total}"
done
This generates:
Monkey 23
Bear 4
Hippo 0
NOTES:
I'm assuming OP's real code does more than echo the count to stdout, hence the need for the total variable; otherwise we could eliminate total and have awk print the animal/sum pairs directly to stdout
if OP's real code has a parent loop processing a list of animals, a single awk call could likely process all of the animals at once; the objective would be to have awk generate the entire set of animal/sum pairs, which could then be fed to the looping construct. If this is the case, and OP has issues implementing a single-awk solution, a new question should be asked
Why is this the case
grep outputs nothing, so nothing is propagated through the pipe and empty string is assigned to total.
Because total is reset every loop (total=anything without referencing previous value), it just has the value for the last file.
how do I fix it?
Do not try to do everything at once; do less at each step.
total=0
for f in *; do
    count=$(grep "$animal" "$f" | cut -d " " -f 2-)
    total=$((total + ${count:-0})) # reuse total, referencing its previous value; default to 0 on no match
done
echo "$animal" "$total"
A programmer fluent in shell will most probably jump to AWK for such problems. Remember to check your scripts with shellcheck.
With what you were trying to do, you could do all files at once:
total=$(
{
echo 0 # to have at least nice 0 if animal is not found
grep "$animal" * |
cut -d " " -f 2-
} |
paste -sd+ |
bc
)
With just bash:
declare -A animals=()
for f in *; do
while read -r animal value; do
(( animals[$animal] = ${animals[$animal]:-0} + value ))
done < "$f"
done
declare -p animals
outputs
declare -A animals=([Monkey]="23" [Bear]="4" )
With this approach, you have all the totals for all the animals by processing each file exactly once
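Individual totals can then be looked up directly, e.g.:
echo "Monkey ${animals[Monkey]:-0}" # prints: Monkey 23 (0 if the key is absent)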
$ head File*
==> File1 <==
Monkey 11
Bear 4
==> File2 <==
Monkey 12
==> File3 <==
Bear
Monkey
Using awk and bash array
#!/bin/bash
sumAnimals(){
awk '
{ if (NF == 1) a[$1]++; else a[$1] += $2 }
END{
for (i in a ) printf "[%s]=%d\n",i, a[i]
}
' File*
}
# storing all animals in bash array
declare -A animalsArr="( $(sumAnimals) )"
# show array content
declare -p animalsArr
# getting total from array
echo "Monkey: ${animalsArr[Monkey]}"
echo "Bear: ${animalsArr[Monkey]}"
Output
declare -A animalsArr=([Bear]="5" [Monkey]="24" )
Monkey: 24
Bear: 5

String split and extract the last field in bash

I have a text file FILENAME. I want to split the string at - of the first column field and extract the last element from each line. Here "$(echo $line | cut -d, -f1 | cut -d- -f4)" alone is not giving me the right result.
FILENAME:
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram
code I tried:
while read line; do \
DNA="$(echo $line | cut -d, -f1 | cut -d- -f4)";
echo $DNA
done < ${FILENAME}
Result I want
1195060301
1195060302
1195060311
Would you please try the following:
while IFS=, read -r f1 _; do # set field separator to ",", assign the 1st field to f1 and the rest to _
dna=${f1##*-} # removes everything before the rightmost "-" from "$f1"
echo "$dna"
done < "$FILENAME"
Well, I had to do it with two lines of code. Maybe someone has a better approach.
while read line; do \
DNA="$(echo $line| cut -d, -f1| rev)"
DNA="$(echo $DNA| cut -d- -f1 | rev)"
echo $DNA
done < ${FILENAME}
I do not know the constraints on your input file, but if what you are looking for is a 10-digit number, and there is only ever one 10-digit number per line... this should do nicely:
grep -Eo '[0-9]{10,}' input.txt
1195060301
1195060302
1195060311
This essentially says: show me all 10-digit numbers in this file.
input.txt
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram
A sed approach:
sed -nE 's/.*-([[:digit:]]+)\,.*/\1/p' input_file
sed options:
-n: do not print the whole file back, only lines explicitly printed via the p flag.
-E: use extended regex, avoiding the need to escape its grammar.
The sed extended regex:
's/.*-([[:digit:]]+)\,.*/\1/p': search for one or more digits preceded by anything and a dash, followed by a comma and anything; capture the digits in group 1 and print only the captured group.
Using awk:
awk -F[,] '{ split($1,arr,"-");print arr[length(arr)] }' FILENAME
Using , as a separator, take the first delimited "piece" of data and further split it into an arr using - as the delimiter and awk's split function. We then print the element at the last index of arr.

Split string at special character in bash

I'm reading filenames from a textfile line by line in a bash script. However, the lines look like this:
/path/to/myfile1.txt 1
/path/to/myfile2.txt 2
/path/to/myfile3.txt 3
...
/path/to/myfile20.txt 20
So there is a second column containing an integer number separated by a space. I only need the part of the string before the space.
I found only solutions using a "for-loop". But I need a function that explicitly looks for the " " character (space) in my string and splits it at that point.
In principle I need the equivalent of Matlab's "strsplit(str,delimiter)".
If you are already reading the file with something like
while read -r line; do
(and you should be), then pass two arguments to read instead:
while read -r filename somenumber; do
read will split the line on whitespace and assign the first field to filename and any remaining field(s) to somenumber.
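A minimal check of that behavior (assuming the lines are in file.txt):
while read -r filename somenumber; do
    printf 'file=%s number=%s\n' "$filename" "$somenumber"
done < file.txt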
Three (of many) solutions:
# Using awk
echo "$string" | awk '{ print $1 }'
# Using cut
echo "$string" | cut -d' ' -f1
# Using sed
echo "$string" | sed 's/\s.*$//g'
If you need to iterate through each line of the file anyways, you can cut off everything behind the space with bash:
while read -r line ; do
# bash string manipulation removes everything
# from the first space on
echo ${line// *}
done < file
This should work too:
line="${line% *}"
This cuts the string at its last occurrence (from the left) of a space. So it will work even if the path contains spaces (as long as it is followed by a space at the end).
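For example, with a path that itself contains a space:
line="/path/to/my file.txt 20"
echo "${line% *}" # -> /path/to/my file.txt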
while read -r line
do
{ rev | cut -d' ' -f2- | rev >> result.txt; } <<< "$line"
done < input.txt
This solution will work even if you have spaces in your filenames.

How can I get unique values from an array in Bash?

I've got almost the same question as here.
I have an array which contains aa ab aa ac aa ad, etc.
Now I want to select all unique elements from this array.
I thought this would be simple with sort | uniq, or with sort -u as mentioned in that other question, but nothing changed in the array...
The code is:
echo `echo "${ids[@]}" | sort | uniq`
What am I doing wrong?
A bit hacky, but this should do it:
echo "${ids[#]}" | tr ' ' '\n' | sort -u | tr '\n' ' '
To save the sorted unique results back into an array, do Array assignment:
sorted_unique_ids=($(echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))
If your shell supports herestrings (bash should), you can spare an echo process by altering it to:
tr ' ' '\n' <<< "${ids[@]}" | sort -u | tr '\n' ' '
A note as of Aug 28 2021:
According to the ShellCheck wiki entry for SC2207, read -a should be used to avoid unwanted splitting.
Thus, in bash the command would be:
IFS=" " read -r -a ids <<< "$(echo "${ids[#]}" | tr ' ' '\n' | sort -u | tr '\n' ' ')"
or
IFS=" " read -r -a ids <<< "$(tr ' ' '\n' <<< "${ids[#]}" | sort -u | tr '\n' ' ')"
Input:
ids=(aa ab aa ac aa ad)
Output:
aa ab ac ad
Explanation:
"${ids[#]}" - Syntax for working with shell arrays, whether used as part of echo or a herestring. The # part means "all elements in the array"
tr ' ' '\n' - Convert all spaces to newlines. Because your array is seen by shell as elements on a single line, separated by spaces; and because sort expects input to be on separate lines.
sort -u - sort and retain only unique elements
tr '\n' ' ' - convert the newlines we added in earlier back to spaces.
$(...) - Command Substitution
Aside: tr ' ' '\n' <<< "${ids[@]}" is a more efficient way of doing: echo "${ids[@]}" | tr ' ' '\n'
If you're running Bash version 4 or above (which should be the case in any modern version of Linux), you can get unique array values in bash by creating a new associative array that contains each of the values of the original array. Something like this:
$ a=(aa ac aa ad "ac ad")
$ declare -A b
$ for i in "${a[@]}"; do b["$i"]=1; done
$ printf '%s\n' "${!b[@]}"
ac ad
ac
aa
ad
This works because in any array (associative or traditional, in any language), each key can only appear once. When the for loop arrives at the second value of aa in a[2], it overwrites b[aa] which was set originally for a[0].
Doing things in native bash can be faster than using pipes and external tools like sort and uniq, though for larger datasets you'll likely see better performance if you use a more powerful language like awk, python, etc.
If you're feeling confident, you can avoid the for loop by using printf's ability to recycle its format for multiple arguments, though this seems to require eval. (Stop reading now if you're fine with that.)
$ eval b=( $(printf ' ["%s"]=1' "${a[@]}") )
$ declare -p b
declare -A b=(["ac ad"]="1" [ac]="1" [aa]="1" [ad]="1" )
The reason this solution requires eval is that array values are determined before word splitting. That means that the output of the command substitution is considered a single word rather than a set of key=value pairs.
While this uses a subshell, it uses only bash builtins to process the array values. Be sure to evaluate your use of eval with a critical eye. If you're not 100% confident that chepner or glenn jackman or greycat would find no fault with your code, use the for loop instead.
I realize this was already answered, but it showed up pretty high in search results, and it might help someone.
printf "%s\n" "${IDS[#]}" | sort -u
Example:
~> IDS=( "aa" "ab" "aa" "ac" "aa" "ad" )
~> echo "${IDS[@]}"
aa ab aa ac aa ad
~>
~> printf "%s\n" "${IDS[@]}" | sort -u
aa
ab
ac
ad
~> UNIQ_IDS=($(printf "%s\n" "${IDS[@]}" | sort -u))
~> echo "${UNIQ_IDS[@]}"
aa ab ac ad
~>
If your array elements have whitespace or any other shell special character (and can you be sure they don't?), then first of all you have to capture those (and you should just always do this): express your array in double quotes, e.g. "${a[@]}". Bash will literally interpret this as "each array element in a separate argument". Within bash this simply always works, always.
Then, to get a sorted (and unique) array, we have to convert it to a format sort understands and be able to convert it back into bash array elements. This is the best I've come up with:
eval a=($(printf "%q\n" "${a[@]}" | sort -u))
Unfortunately, this fails in the special case of the empty array, turning the empty array into an array of 1 empty element (because printf had 0 arguments but still prints as though it had one empty argument - see explanation). So you have to catch that in an if or something.
Explanation:
The %q format for printf "shell escapes" the printed argument, in just such a way as bash can recover in something like eval!
Because each element is printed shell-escaped on its own line, the only separator between elements is the newline, and the array assignment takes each line as an element, parsing the escaped values into literal text.
e.g.
> a=("foo bar" baz)
> printf "%q\n" "${a[#]}"
'foo bar'
baz
> printf "%q\n"
''
The eval is necessary to strip the escaping off each value going back into the array.
'sort' can be used to order the output of a for-loop:
for i in ${ids[@]}; do echo $i; done | sort
and eliminate duplicates with "-u":
for i in ${ids[@]}; do echo $i; done | sort -u
Finally you can just overwrite your array with the unique elements:
ids=( `for i in ${ids[@]}; do echo $i; done | sort -u` )
this one will also preserve order:
echo ${ARRAY[@]} | tr '[:space:]' '\n' | awk '!a[$0]++'
and to modify the original array with the unique values:
ARRAY=($(echo ${ARRAY[@]} | tr '[:space:]' '\n' | awk '!a[$0]++'))
To create a new array consisting of unique values, ensure your array is not empty then do one of the following:
Remove duplicate entries (with sorting)
readarray -t NewArray < <(printf '%s\n' "${OriginalArray[@]}" | sort -u)
Remove duplicate entries (without sorting)
readarray -t NewArray < <(printf '%s\n' "${OriginalArray[@]}" | awk '!x[$0]++')
Warning: Do not try to do something like NewArray=( $(printf '%s\n' "${OriginalArray[@]}" | sort -u) ). It will break on spaces.
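A quick demonstration of that breakage, with made-up values containing a space:
OriginalArray=("a b" "a b" "c")
NewArray=( $(printf '%s\n' "${OriginalArray[@]}" | sort -u) )
declare -p NewArray # -> declare -a NewArray=([0]="a" [1]="b" [2]="c")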
Without losing the original ordering:
uniques=($(tr ' ' '\n' <<<"${original[@]}" | awk '!u[$0]++' | tr '\n' ' '))
If you want a solution that only uses bash internals, you can set the values as keys in an associative array, and then extract the keys:
declare -A uniqs
list=(foo bar bar "bar none")
for f in "${list[#]}"; do
uniqs["${f}"]=""
done
for thing in "${!uniqs[@]}"; do
echo "${thing}"
done
This will output
bar
foo
bar none
cat number.txt
1 2 3 4 4 3 2 5 6
Print the fields as a column: cat number.txt | awk '{for(i=1;i<=NF;i++) print $i}'
1
2
3
4
4
3
2
5
6
find the duplicate records: cat number.txt | awk '{for(i=1;i<=NF;i++) print $i}' |awk 'x[$0]++'
4
3
2
Remove duplicate records: cat number.txt | awk '{for(i=1;i<=NF;i++) print $i}' | awk '!x[$0]++'
1
2
3
4
5
6
Find only unique records: cat number.txt | awk '{for(i=1;i<=NF;i++) print $i | "sort|uniq -u"}'
1
5
6
How about this variation?
printf '%s\n' "${ids[#]}" | sort -u
Another option for dealing with embedded whitespace, is to null-delimit with printf, make distinct with sort, then use a loop to pack it back into an array:
input=(a b c "$(printf "d\ne")" b c "$(printf "d\ne")")
output=()
while read -rd $'' element
do
output+=("$element")
done < <(printf "%s\0" "${input[@]}" | sort -uz)
At the end of this, input and output contain the desired values (provided order isn't important):
$ printf "%q\n" "${input[#]}"
a
b
c
$'d\ne'
b
c
$'d\ne'
$ printf "%q\n" "${output[#]}"
a
b
c
$'d\ne'
All the following work in bash and pass shellcheck without error, but you need to suppress SC2207:
arrOrig=("192.168.3.4" "192.168.3.4" "192.168.3.3")
# NO SORTING
# shellcheck disable=SC2207
arr1=($(tr ' ' '\n' <<<"${arrOrig[@]}" | awk '!u[$0]++' | tr '\n' ' ')) # @estani
len1=${#arr1[@]}
echo "${len1}"
echo "${arr1[*]}"
# SORTING
# shellcheck disable=SC2207
arr2=($(printf '%s\n' "${arrOrig[@]}" | sort -u)) # @das.cyklone
len2=${#arr2[@]}
echo "${len2}"
echo "${arr2[*]}"
# SORTING
# shellcheck disable=SC2207
arr3=($(echo "${arrOrig[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' ')) # @sampson-chen
len3=${#arr3[@]}
echo "${len3}"
echo "${arr3[*]}"
# SORTING
# shellcheck disable=SC2207
arr4=($(for i in "${arrOrig[@]}"; do echo "${i}"; done | sort -u)) # @corbyn42
len4=${#arr4[@]}
echo "${len4}"
echo "${arr4[*]}"
# NO SORTING
# shellcheck disable=SC2207
arr5=($(echo "${arrOrig[@]}" | tr "[:space:]" '\n' | awk '!a[$0]++')) # @faustus
len5=${#arr5[@]}
echo "${len5}"
echo "${arr5[*]}"
# OUTPUTS
# arr1
2 # length
192.168.3.4 192.168.3.3 # items
# arr2
2 # length
192.168.3.3 192.168.3.4 # items
# arr3
2 # length
192.168.3.3 192.168.3.4 # items
# arr4
2 # length
192.168.3.3 192.168.3.4 # items
# arr5
2 # length
192.168.3.4 192.168.3.3 # items
Output for all of these is 2 and correct. This answer basically summarises and tidies up the other answers in this post and is a useful quick reference. Attribution to original answer is given.
In zsh you can use (u) flag:
$ ids=(aa ab aa ac aa ad)
$ print ${(u)ids}
aa ab ac ad
Try this to get the unique values of the first column in a file:
awk -F, '{a[$1];}END{for (i in a)print i;}' file
# Read a file into variable
lines=$(cat /path/to/my/file)
# Go through each line of the file stored in the variable, assigning it to a variable called $line
for line in $lines; do
# Print the line
echo $line
# End the loop, then sort it (add -u to have unique lines)
done | sort -u
