Sorting space delimited numbers with Linux/Bash

Is there a Linux utility or a Bash command I can use to sort a space delimited string of numbers?

Here's a simple example to get you going:
echo "81 4 6 12 3 0" | tr " " "\n" | sort -g
tr translates the spaces delimiting the numbers into newlines, because sort works on lines of text (newlines are its record separators). The -g option tells sort to sort by "general numerical value".
man sort for further details about sort.
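For comparison, the default sort compares the lines as text, so 12 sorts before 3; -g (or -n) avoids that:
$ echo "81 4 6 12 3 0" | tr " " "\n" | sort
0
12
3
4
6
81
$ echo "81 4 6 12 3 0" | tr " " "\n" | sort -g
0
3
4
6
12
81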

This is a variation on @JamesMorris's answer:
echo "81 4 6 12 3 0" | xargs -n1 | sort -g | xargs
Instead of tr, I use xargs -n1 to convert to new lines. The final xargs is to convert back, to a space separated sequence of numbers.
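For example:
$ echo "81 4 6 12 3 0" | xargs -n1 | sort -g | xargs
0 3 4 6 12 81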

This is a variation on ghostdog74's answer that's too big to fit in a comment. It shows digits instead of names of numbers and both the original string and the result are in space-delimited strings (instead of an array which becomes a newline-delimited string).
$ s="3 2 11 15 8"
$ sorted=$(echo $(printf "%s\n" $s | sort -n))
$ echo $sorted
2 3 8 11 15
$ echo "$sorted"
2 3 8 11 15
If you didn't use the echo when setting the value of sorted, then the string has newlines in it. In that case echoing it without quotes puts it all on one line, but, as echoing it with quotes would show, each number would appear on its own line. This is the case whether the original is an array or a string.
# demo
$ s="3 2 11 15 8"
$ sorted=$(printf "%s\n" $s | sort -n)
$ echo $sorted
2 3 8 11 15
$ echo "$sorted"
2
3
8
11
15

$ s=(one two three four)
$ sorted=$(printf "%s\n" ${s[@]} | sort)
$ echo $sorted
four one three two

Using Bash parameter expansion (to replace spaces with newlines) we can do:
str="3 2 11 15 8"
sort -n <<< "${str// /$'\n'}"
# alternative
NL=$'\n'
str="3 2 11 15 8"
sort -n <<< "${str// /${NL}}"
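Either form prints one number per line:
$ str="3 2 11 15 8"
$ sort -n <<< "${str// /$'\n'}"
2
3
8
11
15
If you need the result back on a single line, append | xargs as in the earlier answers.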

If you actually have a space-delimited string of numbers, then one of the other answers provided would work fine. If your list is a bash array, then:
oldIFS="$IFS"
IFS=$'\n'
array=($(sort -g <<< "${array[*]}"))
IFS="$oldIFS"
might be a better solution. The newline delimiter would help if you want to generalize to sorting an array of strings instead of numbers.
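For example, with the numbers from the first answer:
array=(81 4 6 12 3 0)
oldIFS="$IFS"
IFS=$'\n'
array=($(sort -g <<< "${array[*]}"))
IFS="$oldIFS"
echo "${array[@]}"
# 0 3 4 6 12 81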

Improving on Evan Krall's nice Bash "array sort" by confining the IFS change to the command substitution's subshell, so the parent shell's IFS is never touched:
printf "%q\n" "${IFS}"
array=(3 2 11 15 8)
array=($(IFS=$'\n'; sort -n <<< "${array[*]}"))
echo "${array[@]}"
printf "%q\n" "${IFS}"

$ awk 'BEGIN{split(ARGV[1], numbers);for(i in numbers) {print numbers[i]} }' \
"6 7 4 1 2 3" | sort -n

I added this to my .zshrc (or .bashrc) file:
#sort a space-separated list of words (e.g. a list of HTML classes)
sortwords() {
    echo "$1" | xargs -n1 | sort -g | xargs
}
Call it from the terminal like this:
sortwords "banana date apple cherry"
# apple banana cherry date
Thanks to @FranMowinckel and others for inspiration.

Related

Linux terminal command line variable using in grep

I want to display the number of words that have exactly 14, 15 and 16 unique letters. I want to use a for loop, and it has to be a one-liner.
This is what I have so far:
for i in {14..16}; do echo "There are $(cat /usr/share/dict/dutch | grep -P '^.{"$i"}$' | grep -vP -c '(.).*\1') words with exactly $i unique letters"; done
Result:
There are 0 words with exactly 14 unique letters
There are 0 words with exactly 15 unique letters
There are 0 words with exactly 16 unique letters
This means the loop itself works. But when I run the commands separately like this:
echo "There are $(cat /usr/share/dict/dutch | grep -P '^.{14}$' | grep -vP -c '(.).*\1') words with exactly 14 unique letters" &&
echo "There are $(cat /usr/share/dict/dutch | grep -P '^.{15}$' | grep -vP -c '(.).*\1') words with exactly 15 unique letters" &&
echo "There are $(cat /usr/share/dict/dutch | grep -P '^.{16}$' | grep -vP -c '(.).*\1') words with exactly 16 unique letters"
The results are:
There are 13 words with exactly 14 unique letters
There are 2 words with exactly 15 unique letters
There are 0 words with exactly 16 unique letters
This shows that I am doing something wrong with the variable ($i) inside the grep command, but I don't know how to solve it.
Thanks in advance
It looks like you need to use single quotes instead of double quotes around the variable in your first regex:
for i in {14..16}; do
echo "there are $(cat /usr/share/dict/dutch | grep -P '^.{'$i'}$' | grep -vP -c '(.).*\1') words with exactly $i unique letters";
done
Because your first regex was wrapped in single quotes, it was used literally, without any variable expansion. In the fixed version the single quotes close just before $i and reopen just after it, so the regex is built from two literal pieces with the expanded variable in between (put another way, the variable isn't actually inside any quotes at all).
Edit: This is what I get on my system:
$ for i in {14..16}; do echo "there are $(cat /usr/share/dict/dutch | grep -P '^.{'$i'}$' | grep -vP -c '(.).*\1') words with exactly $i unique letters"; done
there are 13 words with exactly 14 unique letters
there are 2 words with exactly 15 unique letters
there are 0 words with exactly 16 unique letters
Edit 2: This might demonstrate the issue more clearly:
$ i=12
$ echo '^.{"$i"}$'
^.{"$i"}$
$ echo '^.{'$i'}$'
^.{12}$

Sort the tab-delimited numbers on each line of a file

I'm trying to sort the numbers on each line of a file individually. The numbers within one line are separated by tabs. (I used spaces but they're actually tabs.)
For example, for the following input
5 8 7 6
1 5 6 8
8 9 7 1
the desired output would be:
5 6 7 8
1 5 6 8
1 7 8 9
My attempt so far is:
let i=1
while read line
do
echo "$line" | tr " " "\n" | sort -g
cut -f $i fileName | paste -s >> tempFile$$
((++i))
done < fileName
This is the best I got - I'm sure it can be done in 6 characters with awk/sed/perl:
while read line
do
echo $(printf "%d\n" $line | sort -n) | tr ' ' \\t >> another-file.txt
done < my-input-file.txt
Using a few features that are specific to GNU awk:
$ awk 'BEGIN{ PROCINFO["sorted_in"] = "@ind_num_asc" }
{ delete(a); n = 0; for (i=1;i<=NF;++i) a[$i];
for (i in a) printf "%s%s", i, (++n<NF?FS:RS) }' file
5 6 7 8
1 5 6 8
1 7 8 9
Each field is set as a key in the array a. In GNU awk it is possible to specify the order in which the for (i in a) loop traverses the array - here, I've set it to do so in ascending numerical order.
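A stripped-down illustration of the same mechanism (GNU awk only; the input here is just a made-up line of numbers):
$ echo "10 9 2" | gawk 'BEGIN{PROCINFO["sorted_in"]="@ind_num_asc"} {for(i=1;i<=NF;++i) a[$i]; for(k in a) printf "%s ", k; print ""}'
2 9 10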
Here is a bash script that can do it. It takes a filename argument or reads stdin, was tested on CentOS and assumes IFS=$' \t\n'.
#!/bin/bash
if [ "$1" ] ; then exec < "$1" ; fi
cat - | while read line
do
set $line
echo $(for var in "$@"; do echo $var; done | sort -n) | tr " " "\t"
done
If you want to put the output in another file run it as:
cat input_file | sorting_script > another_file
or
sorting_script input_file > another_file
Consider using perl for this:
perl -ape '@F=sort @F;$_="@F\n"' input.txt
Here -a turns on automatic field splitting (like awk does) into the array @F, -p makes it execute the script for each line and print $_ each time, and -e specifies the script directly on the command line.
Not quite 6 characters, I'm afraid, Sean.
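Assuming input.txt holds the sample from the question, this gives (fields re-joined with single spaces, Perl's default list separator):
$ perl -ape '@F=sort @F;$_="@F\n"' input.txt
5 6 7 8
1 5 6 8
1 7 8 9
Note that sort @F compares as strings, which is fine for single-digit fields; for multi-digit numbers you would want a numeric comparison, e.g. @F=sort {$a<=>$b} @F.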
This should have been simple in awk, but it doesn't quite have the features needed. If there had been an array $@ corresponding to the fields $1, $2, etc., then the solution would have been awk '{asort $@}' input.txt, but sadly no such array exists. The loops required to move the fields into an array and out of it again make it longer than the bash version:
awk '{for(i=1;i<=NF;i++)a[i]=$i;asort(a);for(i=1;i<=NF;i++)printf("%s ",a[i]);printf("\n")}' input.txt
So awk isn't the right tool for the job here. It's also a bit odd that sort itself doesn't have a switch for sorting the fields within a line rather than the lines themselves.
Using awk
$ cat file
5 8 7 6
1 5 6 8
8 9 7 1
$ awk '{c=1;while(c!=""){c=""; for(i=1;i<NF;i++){n=i+1; if($i>$n){c=$i;$i=$n;$n=c}}}}1' file
5 6 7 8
1 5 6 8
1 7 8 9
More readable version
awk '{
    c=1
    while(c!="")
    {
        c=""
        for(i=1;i<NF;i++)
        {
            n=i+1
            if($i>$n)
            {
                c=$i
                $i=$n
                $n=c
            }
        }
    }
}1
' file
If you have ksh, you may try this
#!/usr/bin/env ksh
while read line ; do
set -s +A cols $line
echo ${cols[*]}
done < "input_file"
Test
[akshay@localhost tmp]$ cat test.ksh
#!/usr/bin/env ksh
cat <<EOF | while read line ; do set -s +A cols $line; echo ${cols[*]};done
5 8 7 6
1 5 6 8
8 9 7 1
EOF
[akshay@localhost tmp]$ ksh test.ksh
5 6 7 8
1 5 6 8
1 7 8 9

Last n words from a string in bash

I have a string containing many words (the total number varies), and I need to get the last 10 of them. How do I do it? I'm looking at awk, grep and cut, but nothing really comes to mind.
An example (although it seems to me that the question is clear):
aaa bda fdkfj fds fsd ... dsad dsas dsad zrthd shshh
I want the last 10 words of this string.
Again, the total number of words in the initial string isn't defined.
Just play with tr, tail and xargs:
$ echo "1 2 3 4 5 6 7 8 9 10" | tr ' ' '\n' | tail -5 | xargs -n5
6 7 8 9 10
This prints the words one per line, so that tail gets the desired number of them. Then xargs "remerges" them onto a single line.
You can also set awk's NF to the value you want after reversing the text:
$ echo "1 2 3 4 5 6 7 8 9 10" | rev | awk '{NF=5}1' | rev
6 7 8 9 10
When you're trying to match words or characters at the end, it's better to use the end-of-line anchor $ in your regex.
$ echo "aaa bda fdkfj fds fsd bar dsad dsas dsad zrthd shshh" | grep -o '[^[:space:]]\+\([[:space:]]\+[^[:space:]]\+\)\{9\} *$'
bda fdkfj fds fsd bar dsad dsas dsad zrthd shshh
You could use the same regex in sed also.
OR
$ echo "aaa bda fdkfj fds fsd bar dsad dsas dsad zrthd shshh" | grep -oP '\S+(?:\s+\S+){9} *$'
bda fdkfj fds fsd bar dsad dsas dsad zrthd shshh
In awk, the builtin variable NF is set to the number of fields (which are by default words) on each line. So you can:
echo "${STRING}" | awk '{
for (i = NF - 9; i <= NF; i++) {printf "%s ", $i}
printf "\n"
}'
assuming that you always have at least 10 words on the line. If not, you can add extra checks for that. And do something more if you don't want the extra space at the end of the line.
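One way to add that check and drop the trailing space (a sketch, not part of the original answer):
echo "${STRING}" | awk '{
    start = (NF > 10) ? NF - 9 : 1
    for (i = start; i <= NF; i++) printf "%s%s", $i, (i < NF ? " " : "\n")
}'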
The canonical, pure Bash way of doing this is to use read:
string='one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen forty two'
read -r -d '' -a array < <(printf '%s\0' "$string")
# Print only ten last words:
printf '%s\n' "${array[*]: -10}"
If there are fewer than 10 words, the last expansion fails, but this can easily be fixed:
printf '%s\n' "${array[*]:${#array[@]}<10?0:-10}"
You want shell? This is pure shell. No awk, no cut, no sed, no perl. You can't get more shell than this. (Okay, I do use wc, which is a utility and not part of the Bash shell, but everything else is part of Bash.)
FOO="one two three four five six seven eight nine ten eleven twelve thirteen"
set $FOO
((shift=$(wc -w<<<$FOO)-10))
shift $shift
echo $*
The set sets the positional parameters. (The $1, $2, etc. in command line arguments).
The $(wc -w<<<$FOO) finds the number of parameters.
I subtract 10 from that number to get the count of parameters beyond the first ten, and assign it to $shift.
I then shift $shift parameters. This leaves the last ten parameters, which I echo.
You don't really need wc. $# expands to the number of positional parameters set. – gniourf_gniourf
Oh, I forgot about that. Now, we have a pure Bash answer:
FOO='one two three four five six seven eight nine ten eleven twelve thirteen'
set $FOO
((shift=$#-10))
shift $shift
echo $*
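If the string might contain fewer than ten words, the shift would fail; a guarded variation (a sketch, not from the original answer):
FOO='one two three'
set $FOO
(( $# > 10 )) && shift $(( $# - 10 ))
echo $*
# one two three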
echo $string | perl -lanE 'say join " ", @F[-10..-1]'
Your string:
string="Lorem ipsum dolor sit amet"
The last four words by using a pure Bash/Shell one-liner:
echo ${string/${string% * * * *} /}
Repeat or remove the " *" pattern to fetch more or fewer words.
Explanation
We use the shell parameter expansion ${parameter/pattern/string} to replace the pattern with nothing. The pattern ${string% * * * *} expands to everything in front of the last 4 words, so the replacement removes the leading "Lorem " from our string.
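Step by step:
$ string="Lorem ipsum dolor sit amet"
$ echo "${string% * * * *}"
Lorem
$ echo "${string/${string% * * * *} /}"
ipsum dolor sit amet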

How to read n-th line from a text file in bash?

Say I have a text file called "demo.txt" which looks like this:
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
Now I want to read a certain line, say line 2, with a command which will look something like this:
Line2 = read 2 "demo.txt"
So when I'll print it:
echo "$Line2"
I'll get:
5 6 7 8
I know how to use the 'sed' command to print the n-th line of a file, but not how to read it into a variable. I also know the 'read' command, but I don't know how to use it to read a certain line.
Thanks in advance for the help.
Using head and tail
$ head -2 inputFile | tail -1
5 6 7 8
OR
a generalized version
$ line=2
$ head -"$line" input | tail -1
5 6 7 8
Using sed
$ sed -n '2 p' input
5 6 7 8
$ sed -n "$line p" input
5 6 7 8
What does it do?
-n suppresses normal printing of the pattern space.
'2 p' specifies the line number, 2 ($line for the general case); the p command prints the current pattern space.
input is the input file.
Edit
To get the output into a variable, use command substitution:
$ content=`sed -n "$line p" input`
$ echo $content
5 6 7 8
OR
$ content=$(sed -n "$line p" input)
$ echo $content
5 6 7 8
To read the output into a bash array
$ content=( $(sed -n "$line p" input) )
$ echo ${content[0]}
5
$ echo ${content[1]}
6
Using awk
Perhaps an awk solution might look like
$ awk -v line=$line 'NR==line' input
5 6 7 8
Thanks to Fredrik Pihl for the suggestion.
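On a big file you can stop reading as soon as the line has been printed (a small variation, not from the original answer):
$ awk -v line=$line 'NR==line{print; exit}' input
5 6 7 8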
Perl has convenient support for this, too, and it's actually the most intuitive!
The flip-flop operator can be used with line numbers:
$ printf "0\n1\n2\n3\n4" | perl -ne 'printf if 2 .. 4'
1
2
3
Note that it's 1-based.
You can also mix regular expressions:
$ printf "0\n1\nfoo\n3\n4" | perl -ne 'printf if /foo/ .. -1'
foo
3
4
(-1 never equals the current line number $., so the range stays open until the last line)

Cannot get this simple sed command

This sed command is described as follows
Delete the cars that are $10,000 or more. Pipe the output of the sort into a sed to do this, by quitting as soon as we match a regular expression representing 5 (or more) digits at the end of a record (DO NOT use repetition for this):
So far the command is:
$ grep -iv chevy cars | sort -nk 5
I think I have to add another pipe at the end of that command which "quits as soon as we match a regular expression representing 5 or more digits at the end of a record".
I tried things like
$ grep -iv chevy cars | sort -nk 5 | sed "/[0-9][0-9][0-9][0-9][0-9]/ q"
and other variations within the // but nothing works! What is the command which matches a regular expression representing 5 or more digits and quits according to this question?
Nominally, you should add a $ before the second / to match 5 digits at the end of the record. If you omit the $, then any sequence of 5 digits will cause sed to quit, so if there is another number (a VIN, perhaps) before the price, it might match when you didn't intend it to.
grep -iv chevy cars | sort -nk 5 | sed '/[0-9][0-9][0-9][0-9][0-9]$/q'
On the whole, it's safer to use single quotes around the regex, unless you need to substitute a shell variable into it (or unless the regex contains single quotes itself). You can also specify the repetition:
grep -iv chevy cars | sort -nk 5 | sed '/[0-9]\{5,\}$/q'
The \{5,\} part matches 5 or more digits. If for any reason that doesn't work, you might find you're using GNU sed and you need to do something like sed --posix to get it working in the normal mode. Or you might be able to just remove the backslashes. There certainly are options to GNU sed to change the regex mechanism it uses (as there are with GNU grep too).
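To see why the anchor matters, here are some made-up lines (not the asker's actual cars file) where a mileage field also has five digits:
$ printf 'plym fury 77 73 2500\ndodge trash 78 60000 3000\nford ltd 83 15 10500\n' | sed '/[0-9]\{5,\}/q'
plym fury 77 73 2500
dodge trash 78 60000 3000
$ printf 'plym fury 77 73 2500\ndodge trash 78 60000 3000\nford ltd 83 15 10500\n' | sed '/[0-9]\{5,\}$/q'
plym fury 77 73 2500
dodge trash 78 60000 3000
ford ltd 83 15 10500
Without the $, sed quits after the second line because 60000 matches, even though that car's price is under 10,000. Note that q still prints the line it matches before quitting; the d variant shown below drops such lines instead.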
Another way.
As you didn't post a file sample, I did this as a guess.
Here I'm looking for lines with the word "chevy" where field 5 is less than 10000.
awk '/chevy/ {if ( $5 < 10000 ) print $0} ' cars
I forgot about grep's -i flag, so the correct version is:
awk 'BEGIN{IGNORECASE=1} /chevy/ {if ( $5 < 10000 ) print $0} ' cars
$ cat > cars
Chevy 2 3 4 10000
Chevy 2 3 4 5000
chEvy 2 3 4 1000
CHEVY 2 3 4 10000
CHEVY 2 3 4 2000
Prevy 2 3 4 1000
Prevy 2 3 4 10000
$ awk 'BEGIN{IGNORECASE=1} /chevy/ {if ( $5 < 10000 ) print $0} ' cars
Chevy 2 3 4 5000
chEvy 2 3 4 1000
CHEVY 2 3 4 2000
grep -iv chevy cars | sort -nk 5 | sed '/[0-9][0-9][0-9][0-9][0-9]$/d'
