I've got almost the same question as here.
I have an array which contains aa ab aa ac aa ad, etc.
Now I want to select all unique elements from this array.
I thought this would be simple with sort | uniq or with sort -u, as mentioned in that other question, but nothing changed in the array...
The code is:
echo `echo "${ids[@]}" | sort | uniq`
What am I doing wrong?
A bit hacky, but this should do it:
echo "${ids[#]}" | tr ' ' '\n' | sort -u | tr '\n' ' '
To save the sorted unique results back into an array, use array assignment:
sorted_unique_ids=($(echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))
If your shell supports herestrings (bash should), you can spare an echo process by altering it to:
tr ' ' '\n' <<< "${ids[@]}" | sort -u | tr '\n' ' '
A note as of Aug 28 2021:
According to the ShellCheck wiki (SC2207), read -a should be used to avoid unwanted splitting and globbing.
Thus, in bash the command would be:
IFS=" " read -r -a ids <<< "$(echo "${ids[#]}" | tr ' ' '\n' | sort -u | tr '\n' ' ')"
or
IFS=" " read -r -a ids <<< "$(tr ' ' '\n' <<< "${ids[#]}" | sort -u | tr '\n' ' ')"
Input:
ids=(aa ab aa ac aa ad)
Output:
aa ab ac ad
Explanation:
"${ids[#]}" - Syntax for working with shell arrays, whether used as part of echo or a herestring. The # part means "all elements in the array"
tr ' ' '\n' - Convert all spaces to newlines. Because your array is seen by shell as elements on a single line, separated by spaces; and because sort expects input to be on separate lines.
sort -u - sort and retain only unique elements
tr '\n' ' ' - convert the newlines we added in earlier back to spaces.
$(...) - Command Substitution
Aside: tr ' ' '\n' <<< "${ids[@]}" is a more efficient way of doing: echo "${ids[@]}" | tr ' ' '\n'
If you're running Bash version 4 or above (which should be the case in any modern version of Linux), you can get unique array values in bash by creating a new associative array that contains each of the values of the original array. Something like this:
$ a=(aa ac aa ad "ac ad")
$ declare -A b
$ for i in "${a[@]}"; do b["$i"]=1; done
$ printf '%s\n' "${!b[@]}"
ac ad
ac
aa
ad
This works because in any array (associative or traditional, in any language), each key can only appear once. When the for loop arrives at the second value of aa in a[2], it overwrites b[aa] which was set originally for a[0].
Doing things in native bash can be faster than using pipes and external tools like sort and uniq, though for larger datasets you'll likely see better performance if you use a more powerful language like awk, python, etc.
If you're feeling confident, you can avoid the for loop by using printf's ability to recycle its format for multiple arguments, though this seems to require eval. (Stop reading now if you're fine with that.)
$ eval b=( $(printf ' ["%s"]=1' "${a[@]}") )
$ declare -p b
declare -A b=(["ac ad"]="1" [ac]="1" [aa]="1" [ad]="1" )
The reason this solution requires eval is that array values are determined before word splitting. That means that the output of the command substitution is considered a single word rather than a set of key=value pairs.
While this uses a subshell, it uses only bash builtins to process the array values. Be sure to evaluate your use of eval with a critical eye. If you're not 100% confident that chepner or glenn jackman or greycat would find no fault with your code, use the for loop instead.
I realize this was already answered, but it showed up pretty high in search results, and it might help someone.
printf "%s\n" "${IDS[#]}" | sort -u
Example:
~> IDS=( "aa" "ab" "aa" "ac" "aa" "ad" )
~> echo "${IDS[@]}"
aa ab aa ac aa ad
~>
~> printf "%s\n" "${IDS[@]}" | sort -u
aa
ab
ac
ad
~> UNIQ_IDS=($(printf "%s\n" "${IDS[@]}" | sort -u))
~> echo "${UNIQ_IDS[@]}"
aa ab ac ad
~>
If your array elements have white space or any other shell special character (and can you be sure they don't?), then to preserve those (and you should just always do this) express your array in double quotes! e.g. "${a[@]}". Bash will interpret this as "each array element as a separate argument". Within bash this simply always works, always.
Then, to get a sorted (and unique) array, we have to convert it to a format sort understands and be able to convert it back into bash array elements. This is the best I've come up with:
eval a=($(printf "%q\n" "${a[@]}" | sort -u))
Unfortunately, this fails in the special case of the empty array, turning the empty array into an array of 1 empty element (because printf had 0 arguments but still prints as though it had one empty argument - see explanation). So you have to catch that in an if or something.
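A minimal sketch of that guard (the array name a and the sample values here are just placeholders): only run the eval round-trip when the array is non-empty, so the empty array does not come back as one empty element.

```shell
# Non-empty case: the round-trip dedupes as expected
a=("foo bar" baz "foo bar")
if ((${#a[@]})); then
  eval a=($(printf "%q\n" "${a[@]}" | sort -u))
fi
n_nonempty=${#a[@]}   # 2 unique elements

# Empty case: the guard skips the eval entirely
a=()
if ((${#a[@]})); then
  eval a=($(printf "%q\n" "${a[@]}" | sort -u))
fi
n_empty=${#a[@]}      # stays 0 instead of becoming 1
```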
Explanation:
The %q format for printf "shell escapes" the printed argument, in just such a way as bash can recover in something like eval!
Because each element is printed shell-escaped on its own line, the only separator between elements is the newline, and the array assignment takes each line as an element, parsing the escaped values into literal text.
e.g.
> a=("foo bar" baz)
> printf "%q\n" "${a[#]}"
'foo bar'
baz
> printf "%q\n"
''
The eval is necessary to strip the escaping off each value going back into the array.
'sort' can be used to order the output of a for-loop:
for i in ${ids[@]}; do echo $i; done | sort
and eliminate duplicates with "-u":
for i in ${ids[@]}; do echo $i; done | sort -u
Finally you can just overwrite your array with the unique elements:
ids=( `for i in ${ids[@]}; do echo $i; done | sort -u` )
this one will also preserve order:
echo ${ARRAY[@]} | tr '[:space:]' '\n' | awk '!a[$0]++'
and to modify the original array with the unique values:
ARRAY=($(echo ${ARRAY[@]} | tr '[:space:]' '\n' | awk '!a[$0]++'))
To create a new array consisting of unique values, ensure your array is not empty then do one of the following:
Remove duplicate entries (with sorting)
readarray -t NewArray < <(printf '%s\n' "${OriginalArray[@]}" | sort -u)
Remove duplicate entries (without sorting)
readarray -t NewArray < <(printf '%s\n' "${OriginalArray[@]}" | awk '!x[$0]++')
Warning: Do not try to do something like NewArray=( $(printf '%s\n' "${OriginalArray[@]}" | sort -u) ). It will break on spaces.
Without losing the original ordering:
uniques=($(tr ' ' '\n' <<<"${original[@]}" | awk '!u[$0]++' | tr '\n' ' '))
If you want a solution that only uses bash internals, you can set the values as keys in an associative array, and then extract the keys:
declare -A uniqs
list=(foo bar bar "bar none")
for f in "${list[@]}"; do
uniqs["${f}"]=""
done
for thing in "${!uniqs[@]}"; do
echo "${thing}"
done
This will output
bar
foo
bar none
cat number.txt
1 2 3 4 4 3 2 5 6
Print each field on its own line: cat number.txt | awk '{for(i=1;i<=NF;i++) print $i}'
1
2
3
4
4
3
2
5
6
Find the duplicate records: cat number.txt | awk '{for(i=1;i<=NF;i++) print $i}' | awk 'x[$0]++'
4
3
2
Remove duplicate records: cat number.txt | awk '{for(i=1;i<=NF;i++) print $i}' | awk '!x[$0]++'
1
2
3
4
5
6
Find only the records that occur exactly once: cat number.txt | awk '{for(i=1;i<=NF;i++) print $i | "sort | uniq -u"}'
1
5
6
How about this variation?
printf '%s\n' "${ids[#]}" | sort -u
Another option for dealing with embedded whitespace, is to null-delimit with printf, make distinct with sort, then use a loop to pack it back into an array:
input=(a b c "$(printf "d\ne")" b c "$(printf "d\ne")")
output=()
while IFS= read -r -d '' element
do
output+=("$element")
done < <(printf "%s\0" "${input[@]}" | sort -uz)
At the end of this, input and output contain the desired values (provided order isn't important):
$ printf "%q\n" "${input[#]}"
a
b
c
$'d\ne'
b
c
$'d\ne'
$ printf "%q\n" "${output[#]}"
a
b
c
$'d\ne'
All the following work in bash and are without error in shellcheck, but you need to suppress SC2207:
arrOrig=("192.168.3.4" "192.168.3.4" "192.168.3.3")
# NO SORTING
# shellcheck disable=SC2207
arr1=($(tr ' ' '\n' <<<"${arrOrig[@]}" | awk '!u[$0]++' | tr '\n' ' ')) # @estani
len1=${#arr1[@]}
echo "${len1}"
echo "${arr1[*]}"
# SORTING
# shellcheck disable=SC2207
arr2=($(printf '%s\n' "${arrOrig[@]}" | sort -u)) # @das.cyklone
len2=${#arr2[@]}
echo "${len2}"
echo "${arr2[*]}"
# SORTING
# shellcheck disable=SC2207
arr3=($(echo "${arrOrig[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' ')) # @sampson-chen
len3=${#arr3[@]}
echo "${len3}"
echo "${arr3[*]}"
# SORTING
# shellcheck disable=SC2207
arr4=($(for i in "${arrOrig[@]}"; do echo "${i}"; done | sort -u)) # @corbyn42
len4=${#arr4[@]}
echo "${len4}"
echo "${arr4[*]}"
# NO SORTING
# shellcheck disable=SC2207
arr5=($(echo "${arrOrig[@]}" | tr "[:space:]" '\n' | awk '!a[$0]++')) # @faustus
len5=${#arr5[@]}
echo "${len5}"
echo "${arr5[*]}"
# OUTPUTS
# arr1
2 # length
192.168.3.4 192.168.3.3 # items
# arr2
2 # length
192.168.3.3 192.168.3.4 # items
# arr3
2 # length
192.168.3.3 192.168.3.4 # items
# arr4
2 # length
192.168.3.3 192.168.3.4 # items
# arr5
2 # length
192.168.3.4 192.168.3.3 # items
Output for all of these is 2 and correct. This answer basically summarises and tidies up the other answers in this post and is a useful quick reference. Attribution to original answer is given.
In zsh you can use the (u) flag:
$ ids=(aa ab aa ac aa ad)
$ print ${(u)ids}
aa ab ac ad
Try this to get unique values for the first column in a file:
awk -F, '{a[$1];}END{for (i in a)print i;}'
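For instance, applied to an invented two-column CSV (the file name and its contents below are made up for illustration; awk's for-in key order is unspecified, so the result is sorted for a stable display):

```shell
# Invented sample data: repeats in column 1
printf 'a,1\nb,2\na,3\nc,4\n' > sample.csv

# Collect the distinct first-column values
cols=$(awk -F, '{a[$1];}END{for (i in a)print i;}' sample.csv | sort)
echo "$cols"
```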
# Read a file into variable
lines=$(cat /path/to/my/file)
# Go through each line the file put in the variable, and assign it a variable called $line
for line in $lines; do
# Print the line
echo $line
# End the loop, then sort it (add -u to have unique lines)
done | sort -u
Related
I have a string with emails, some duplicated. For example only:
"aaa#company.com,bbb#company.com,aaa#company.com,bbb#company.com,ccc#company.com"
I would like string to contain only unique emails, comma separated. Result should be:
"aaa#company.com,bbb#company.com,ccc#company.com"
Any easy way to do this?
P.S. emails vary, and I don't know what they will contain.
How about this:
echo "aaa#company.com,bbb#company.com,aaa#company.com,bbb#company.com,ccc#company.com" |
tr ',' '\n' |
sort |
uniq |
tr '\n' ',' |
sed -e 's/,$//'
I convert the separating commas into newlines so that I can then use tools (like sort, uniq, and grep) that work with lines.
Using awk and process substitution, rather than sort and other tools:
awk -vORS="," '!seen[$1]++' < <(echo "aaa@company.com,bbb@company.com,aaa@company.com,bbb@company.com,ccc@company.com" | tr ',' '\n')
aaa@company.com,bbb@company.com,ccc@company.com
Or another way to use pure-bash and avoid tr completely would be
# Read into a bash array with field-separator as ',' read with '-a' for reading to an array
IFS=',' read -ra myArray <<< "aaa@company.com,bbb@company.com,aaa@company.com,bbb@company.com,ccc@company.com"
# Printing the array elements new line and feeding it to awk
awk -vORS="," '!seen[$1]++' < <(printf '%s\n' "${myArray[@]}")
aaa@company.com,bbb@company.com,ccc@company.com
With perl
$ s="aaa#company.com,bbb#company.com,aaa#company.com,bbb#company.com,ccc#company.com"
$ echo $s | perl -MList::MoreUtils=uniq -F, -le 'print join ",",uniq(#F)'
aaa#company.com,bbb#company.com,ccc#company.com
Getting the strings in an array:
IFS=','; read -r -a lst <<< "aaa@company.com,bbb@company.com,aaa@company.com,bbb@company.com,ccc@company.com"
Sorting and filtering:
IFS=$'\n' sort <<< "${lst[*]}" | uniq
This question already has answers here:
Sorting and removing duplicate words in a line
(7 answers)
Closed 6 years ago.
I want to delete duplicate strings from a String. Example:
A="Dog Cat Horse Dog Dog Cat"
The string A should look like this:
A="Dog Cat Horse"
How can I write a Shell script for that?
You could use this,
echo "a a b b c c" | tr ' ' '\n' | sort | uniq | tr '\n' ' ' | sed -e 's/[[:space:]]*$//'
If order is not important, you can use an associative array:
declare -A uniq
for k in $A ; do uniq[$k]=1 ; done
echo ${!uniq[@]}
(Safely) split the string on blanks, creating an array with each word:†
read -r -d '' -a words < <(printf '%s\0' "$A")
Loop on the fields of the array, storing the words into an associative array; if the word was already seen, ignore it
declare -A Aseen
Aunique=()
for w in "${words[@]}"; do
[[ ${Aseen[$w]} ]] && continue
Aunique+=( "$w" )
Aseen[$w]=x
done
You can print the Aunique array to standard output:
printf '%s\n' "${Aunique[#]}"
which yields:
Dog
Cat
Horse
or create a new string with it
Anew="${Aunique[*]}"
printf '%s\n' "$Anew"
which yields:
Dog Cat Horse
or join the array with a separator, e.g., with the character ,:‡
IFS=, eval 'Asep="${Aunique[*]}"'
printf '%s\n' "${Asep[#]}"
which yields:
Dog,Cat,Horse
All these use Bash≥4 features. If you're stuck on older Bash versions, there are workarounds but it won't be as safe and nice and easy…
Note. This method will not sort the string: the words remain in the original order, only with the duplicates removed.
†This is the canonical (and safe!) way to split a string on space characters (or, more generally on the characters contained in the special variable IFS, which has default value space-tab-newline). Don't use horrors like words=( $A ): it's subject to filename expansion (globbing). Another method widely encountered is read -r -a words <<< "$A"; this is fine (i.e., safe), but will not handle newlines in A.
‡The use of eval here is 100% safe (because of the single quotes); it's actually the canonical way to join the elements of an array in Bash (or to join the positional parameters in POSIX shells).
With gawk:
awk -v RS="[ \n]" -v ORS=" " '!($0 in a){print;a[$0]}' <(echo "$A")
I have string contains a path
string="toto.titi.12.tata.2.abc.def"
I want to extract only the numbers from this string.
To extract the first number:
tmp="${string#toto.titi.*.}"
num1="${tmp%.tata*}"
To extract the second number:
tmp="${string#toto.titi.*.tata.*.}"
num2="${tmp%.abc.def}"
So to extract a parameter I have to do it in 2 steps. How to extract a number with one step?
You can use tr to delete all of the non-digit characters, like so:
echo toto.titi.12.tata.2.abc.def | tr -d -c 0-9
To extract all the individual numbers and print one number per line, pipe the input through:
tr '\n' ' ' | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | sed 's/ /\n/g'
Breakdown:
Replaces all line breaks with spaces: tr '\n' ' '
Replaces all non numbers with spaces: sed -e 's/[^0-9]/ /g'
Remove leading white space: -e 's/^ *//g'
Remove trailing white space: -e 's/ *$//g'
Squeeze spaces in sequence to 1 space: tr -s ' '
Replace remaining space separators with line break: sed 's/ /\n/g'
Example:
echo -e " this 20 is 2sen\nten324ce 2 sort of" | tr '\n' ' ' | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | sed 's/ /\n/g'
Will print out
20
2
324
2
Here is a short one:
string="toto.titi.12.tata.2.abc.def"
id=$(echo "$string" | grep -o -E '[0-9]+')
echo $id   # => output: 12 2
with space between the numbers.
Hope it helps...
Parameter expansion would seem to be the order of the day.
$ string="toto.titi.12.tata.2.abc.def"
$ read num1 num2 <<<${string//[^0-9]/ }
$ echo "$num1 / $num2"
12 / 2
This of course depends on the format of $string. But at least for the example you've provided, it seems to work.
This may be superior to anubhava's awk solution which requires a subshell. I also like chepner's solution, but regular expressions are "heavier" than parameter expansion (though obviously way more precise). (Note that in the expression above, [^0-9] may look like a regex atom, but it is not.)
You can read about this form of Parameter Expansion in the bash man page. Note that ${string//this/that} (as well as <<<) is a bashism, and is not compatible with traditional Bourne or POSIX shells.
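If you do need a POSIX-compatible variant, a rough sketch (assuming, as with the other answers here, that only the digits matter and the numbers contain no glob characters) could squeeze the non-digits to spaces with tr and let `set --` word-split the result into positional parameters:

```shell
# POSIX sh sketch: tr -c '0-9' ' ' turns every non-digit into a space,
# then the unquoted expansion in `set --` splits the surviving numbers
string="toto.titi.12.tata.2.abc.def"
set -- $(printf '%s' "$string" | tr -c '0-9' ' ')
num1=$1
num2=$2
echo "$num1 / $num2"   # 12 / 2
```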
This would be easier to answer if you provided exactly the output you're looking to get. If you mean you want to get just the digits out of the string, and remove everything else, you can do this:
d#AirBox:~$ string="toto.titi.12.tata.2.abc.def"
d#AirBox:~$ echo "${string//[a-z,.]/}"
122
If you clarify a bit I may be able to help more.
You can also use sed:
echo "toto.titi.12.tata.2.abc.def" | sed 's/[0-9]*//g'
Here, sed replaces
any digits (class [0-9])
repeated any number of times (*)
with nothing (nothing between the second and third /),
and g stands for globally.
Output will be:
toto.titi..tata..abc.def
Convert your string to an array like this:
$ str="toto.titi.12.tata.2.abc.def"
$ arr=( ${str//[!0-9]/ } )
$ echo "${arr[#]}"
12 2
Use regular expression matching:
string="toto.titi.12.tata.2.abc.def"
[[ $string =~ toto\.titi\.([0-9]+)\.tata\.([0-9]+)\. ]]
# BASH_REMATCH[0] would be "toto.titi.12.tata.2.", the entire match
# Successive elements of the array correspond to the parenthesized
# subexpressions, in left-to-right order. (If there are nested parentheses,
# they are numbered in depth-first order.)
first_number=${BASH_REMATCH[1]}
second_number=${BASH_REMATCH[2]}
Using awk:
arr=( $(echo $string | awk -F "." '{print $3, $5}') )
num1=${arr[0]}
num2=${arr[1]}
Adding yet another way to do this, using cut:
echo $string | cut -d'.' -f3,5 | tr '.' ' '
This gives you the following output:
12 2
Fixing newline issue (for mac terminal):
cat temp.txt | tr '\n' ' ' | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | sed $'s/ /\\\n/g'
Assumptions:
there is no embedded white space
the string of text always has 7 period-delimited strings
the string always contains numbers in the 3rd and 5th period-delimited positions
One bash idea that does not require spawning any subprocesses:
$ string="toto.titi.12.tata.2.abc.def"
$ IFS=. read -r x1 x2 num1 x3 num2 rest <<< "${string}"
$ typeset -p num1 num2
declare -- num1="12"
declare -- num2="2"
In a comment OP has stated they wish to extract only one number at a time; the same approach can still be used, eg:
$ string="toto.titi.12.tata.2.abc.def"
$ IFS=. read -r x1 x2 num1 rest <<< "${string}"
$ typeset -p num1
declare -- num1="12"
$ IFS=. read -r x1 x2 x3 x4 num2 rest <<< "${string}"
$ typeset -p num2
declare -- num2="2"
A variation on anubhava's answer that uses parameter expansion instead of a subprocess call to awk, and still working with the same set of initial assumptions:
$ arr=( ${string//./ } )
$ num1=${arr[2]}
$ num2=${arr[4]}
$ typeset -p num1 num2
declare -- num1="12"
declare -- num2="2"
I have two variables whose values may overlap. I would like to create a unique list from the two variables.
VAR1="SERVER1 SERVER2 SERVER3"
VAR2="SERVER1 SERVER5"
I am trying to get a result of:
"SERVER1 SERVER2 SERVER3 SERVER5"
The following pipes a combination of the two lists through the sort program with the unique parameter -u:
UNIQUE=$(echo "$VAR1 $VAR2" | tr ' ' '\n' | sort -u)
This gives the output:
> echo $UNIQUE
SERVER1 SERVER2 SERVER3 SERVER5
Edit:
As William Purcell points out in the comments below, this separates the strings by new-lines. If you wish to separate by white space again you can pipe the output from sort back through tr '\n' ' ':
> UNIQUE=$(echo "$VAR1 $VAR2" | tr ' ' '\n' | sort -u | tr '\n' ' ')
> echo "$UNIQUE"
SERVER1 SERVER2 SERVER3 SERVER5
And of course you have
$ var1="a b c"
$ result=$var1" d e f"
$ echo $result
With that you achieve the concatenation.
Also with variables:
$ var1="a b c"
$ var2=" d e f"
$ result=$var1$var2
$ echo $result
Putting one variable after another is the simplest way of concatenating I know. Maybe it's not enough for your plans, but it works and is useful for easy tasks.
It will work for any variable.
If you need to maintain the order, you cannot use sort, but you can do:
for i in $VAR1 $VAR2; do echo "$VAR3" | grep -qF "$i" || VAR3="$VAR3${VAR3:+ }$i"; done
This appends to VAR3, so you probably want to clear VAR3 first. Also, you may need to be more careful in terms of putting word boundaries on the grep, as FOO will not be added if FOOSERVER is already in the list, but this is a good technique.
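One hedged way to tighten those word boundaries (a sketch; it assumes list items contain no newlines, and the FOO/FOOSERVER values are invented to show the difference) is to print the list one item per line and match with grep -qxF, which requires the whole line to match literally:

```shell
VAR1="SERVER1 SERVER2 SERVER3"
VAR2="SERVER1 SERVER5 FOO"
VAR3="FOOSERVER"   # pre-existing entry that FOO must NOT match

for i in $VAR1 $VAR2; do
  # unquoted $VAR3 makes printf emit one word per line; -x matches the
  # entire line and -F treats the item as a literal string, not a regex
  printf '%s\n' $VAR3 | grep -qxF "$i" || VAR3="$VAR3${VAR3:+ }$i"
done
echo "$VAR3"
```

With plain grep -qF, FOO would have been swallowed by FOOSERVER; with -x it is appended as its own entry.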
Use printf '%s\n' to avoid having to pipe through tr ' ' '\n':
printf '%s\n' "$TEST_VAR1" "$TEST_VAR2" | sort -u
I am trying to make a simple script to find the largest word and its length in a text file using bash. I know it's simple and straightforward with awk, but I want to try this method... Let's say I know a=wmememememe; to find its length I can use echo ${#a}, and for the word itself, echo ${a}. But I want to apply that to the loop below
for i in `cat so.txt`; do
Where so.txt contains words, I hope it makes sense.
bash one liner.
sed 's/ /\n/g' YOUR_FILENAME | sort | uniq | awk '{print length, $0}' | sort -nr | head -n 1
read file and split the words (via sed)
remove duplicates (via sort | uniq)
prefix each word with its length (awk)
sort the list by the word length
print the single word with greatest length.
Yes, this will be slower than some of the solutions above, but it also doesn't require remembering the semantics of bash for loops.
Normally, you'd want to use a while read loop instead of for i in $(cat), but since you want all the words to be split, in this case it would work out OK.
#!/bin/bash
longest=0
for word in $(<so.txt)
do
len=${#word}
if (( len > longest ))
then
longest=$len
longword=$word
fi
done
printf 'The longest word is %s and its length is %d.\n' "$longword" "$longest"
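For comparison, a while-read sketch of the same loop (the sample so.txt contents here are invented; the real file is assumed to hold whitespace-separated words):

```shell
# invented stand-in for so.txt
printf 'alpha beta\ngamma delta epsilon\n' > so.txt

longest=""
# read -a splits each line on IFS into an array, so no glob
# expansion happens on the words (unlike for word in $(<so.txt))
while read -r -a words; do
  for word in "${words[@]}"; do
    if (( ${#word} > ${#longest} )); then
      longest=$word
    fi
  done
done < so.txt
printf 'The longest word is %s and its length is %d.\n' "$longest" "${#longest}"
```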
Another solution:
for item in $(cat "$infile"); do
length[${#item}]=$item # use word length as index
done
maxword=${length[@]: -1} # select last array element
printf "longest word '%s', length %d" ${maxword} ${#maxword}
longest=""
for word in $(cat so.txt); do
if [ ${#word} -gt ${#longest} ]; then
longest=$word
fi
done
echo $longest
awk script:
#!/usr/bin/awk -f
# Initialize two variables
BEGIN {
maxlength=0;
maxword=0
}
# Loop through each word on the line
{
for(i=1;i<=NF;i++)
# Assign the maxlength variable if length of word found is greater. Also, assign
# the word to maxword variable.
if (length($i)>maxlength)
{
maxlength=length($i);
maxword=$i;
}
}
# Print out the maxword and the maxlength
END {
print maxword,maxlength;
}
Textfile:
[jaypal:~/Temp] cat textfile
AWK utility is a data_extraction and reporting tool that uses a data-driven scripting language
consisting of a set of actions to be taken against textual data (either in files or data streams)
for the purpose of producing formatted reports.
The language used by awk extensively uses the string datatype,
associative arrays (that is, arrays indexed by key strings), and regular expressions.
Test:
[jaypal:~/Temp] ./script.awk textfile
data_extraction 15
Relatively speedy bash function using no external utils:
# Usage: longcount < textfile
longcount ()
{
declare -a c;
while read x; do
c[${#x}]="$x";
done;
echo ${#c[@]} "${c[${#c[@]}]}"
}
Example:
longcount < /usr/share/dict/words
Output:
23 electroencephalograph's
Modified POSIX shell version of jimis' xargs-based
answer; still very slow, takes two or three minutes:
tr "'" '_' < /usr/share/dict/words |
xargs -P$(nproc) -n1 -i sh -c 'set -- {} ; echo ${#1} "$1"' |
sort -n | tail | tr '_' "'"
Note the leading and trailing tr bit to get around GNU xargs
difficulty with single quotes.
for i in $(cat so.txt); do echo ${#i}; done | paste - so.txt | sort -n | tail -1
Slow because of the gazillion of forks, but pure shell, does not require awk or special bash features:
$ cat /usr/share/dict/words | \
xargs -n1 -I '{}' -d '\n' sh -c 'echo `echo -n "{}" | wc -c` "{}"' | \
sort -n | tail
23 Pseudolamellibranchiata
23 pseudolamellibranchiate
23 scientificogeographical
23 thymolsulphonephthalein
23 transubstantiationalist
24 formaldehydesulphoxylate
24 pathologicopsychological
24 scientificophilosophical
24 tetraiodophenolphthalein
24 thyroparathyroidectomize
You can easily parallelize, e.g. to 4 CPUs by providing -P4 to xargs.
EDIT: modified to work with the single quotes that some dictionaries have. Now it requires GNU xargs because of -d argument.
EDIT2: for the fun of it, here is another version that handles all kinds of special characters, but requires the -0 option to xargs. I also added -P4 to compute on 4 cores:
cat /usr/share/dict/words | tr '\n' '\0' | \
xargs -0 -I {} -n1 -P4 sh -c 'echo ${#1} "$1"' wordcount {} | \
sort -n | tail