How does sort -k 1,2 work? - linux

Can someone explain what sort -k 1,1 and sort -k 1,2 does?
$ echo -e "9 3 5\n8 2 6\n7 4 1\n"
9 3 5
8 2 6
7 4 1
$ echo -e "9 3 5\n8 2 6\n7 4 1\n" | sort -k 2 -t " " -i
8 2 6
9 3 5
7 4 1
$ echo -e "9 3 5\n8 2 6\n7 4 1\n" | sort -k 1,1 -t " " -i
7 4 1
8 2 6
9 3 5
$ echo -e "9 3 5\n8 2 6\n7 4 1\n" | sort -k 1,2 -t " " -i
7 4 1
8 2 6
9 3 5

Quoting from man sort:
-k, --key=POS1[,POS2]
start a key at POS1 (origin 1), end it at POS2 (default end of
line). See POS syntax below
So:
-k 2
starts the key at field 2 and runs to the end of the line.
-k 1,1
starts and ends the key at field 1. Likewise, -k 1,2 covers fields 1 through 2.
Your sample input doesn't show the difference, but if you modify it slightly, it becomes clearer:
$ echo -e "9 3 5\n9 2 6\n7 4 1" | sort -k1,1 -t' '
7 4 1
9 2 6
9 3 5
$ echo -e "9 3 5\n9 2 6\n7 4 1" | sort -k1,2 -t' '
7 4 1
9 2 6
9 3 5
$ echo -e "9 3 5\n9 2 6\n7 4 1" | sort -k1,1 -t' ' -s
7 4 1
9 3 5
9 2 6
Particularly observe cases 1 and 3. In case 1 the output was affected by the rest of the line even though the key was restricted to field 1: when keys compare equal, sort falls back to a last-resort comparison on the whole line. Use the -s option to stabilize the sort:
-s, --stable
stabilize sort by disabling last-resort comparison

Note the --debug option to GNU sort available since version 8.6 (2010-10-15)
$ echo -e "9 3 5\n8 2 6\n7 4 1" | sort --debug -k 2 -t " " -i
sort: using `en_US.utf8' sorting rules
8 2 6
___
_____
9 3 5
___
_____
7 4 1
___
_____
$ echo -e "9 3 5\n8 2 6\n7 4 1" | sort --debug -k 1,1 -t " " -i
sort: using `en_US.utf8' sorting rules
7 4 1
_
_____
8 2 6
_
_____
9 3 5
_
_____
$ echo -e "9 3 5\n8 2 6\n7 4 1" | sort --debug -k 1,2 -t " " -i
sort: using `en_US.utf8' sorting rules
7 4 1
___
_____
8 2 6
___
_____
9 3 5
___
_____
Note the last _____ under each line: it shows a second comparison applied to the whole line. That is the last-resort comparison, and it can be suppressed with the -s option.


Put variable into sed command

I want to replace the 2 in "2p" with the value of the i variable on each iteration of the for loop.
count1=0
for i in 2 3 4 5 6 7 8 9 10 11
do
var=`cat participantes | cut -d '-' -f4 | sed -n 2p`
echo $var
count1=`expr $count1 + $var`
done
echo $count1
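For what it's worth, the usual fix is to double-quote the sed script so the shell expands $i before sed runs. A minimal sketch, using a made-up three-line participantes file (the real file's layout is assumed to keep the number in the 4th dash-separated field):

```shell
# Hypothetical stand-in for the real "participantes" file,
# with the numeric value in the 4th dash-separated field.
printf '%s\n' 'a-b-c-10' 'a-b-c-20' 'a-b-c-30' > participantes

count1=0
for i in 1 2 3
do
    # Double quotes let the shell expand $i, so sed -n "${i}p"
    # prints line $i instead of a hardcoded line 2.
    var=$(cut -d '-' -f4 participantes | sed -n "${i}p")
    echo "$var"
    count1=$(expr "$count1" + "$var")
done
echo "$count1"   # 10 + 20 + 30 = 60
```

The same "${i}p" trick works in the original loop over 2..11 unchanged.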

Negative arguments to head

I was trying out the head command on macOS using zsh; code below.
a.txt:
1
2
3
4
5
6
7
8
9
10
tail -n +5 a.txt   # line 5 to end
tail -n -5 a.txt   # last 5 lines
head -n +5 a.txt   # line 1 to line 5
head -n -5 a.txt   # what does this do?
The last command shows an error.
head: illegal line count -- -5
What did head -n -5 actually do?
Some implementations of head, like GNU head, support negative arguments to -n. But that's not standard, and your implementation clearly doesn't support it.
Where it is supported, a negative argument means: print everything except the last 5 lines.
It becomes clearer if you use 3 instead of 5. Note the signs!
# print 10 lines:
seq 10
1
2
3
4
5
6
7
8
9
10
#-------------------------
# get the last 3 lines:
seq 10 | tail -n 3
8
9
10
#--------------------------------------
# start at line 3 (skip first 2 lines)
seq 10 | tail -n +3
3
4
5
6
7
8
9
10
#-------------------------
# get the first 3 lines:
seq 10 | head -n 3
1
2
3
#-------------------------
# skip the last 3 lines:
seq 10 | head -n -3
1
2
3
4
5
6
7
btw, man tail and man head explain this behavior.
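On a head without negative-count support (macOS included), one portable workaround is to compute the line count yourself. A sketch of the same skip-the-last-3 effect:

```shell
# Portable stand-in for GNU "head -n -3": print all but the last 3 lines.
seq 10 > a.txt
total=$(wc -l < a.txt)
head -n "$((total - 3))" a.txt   # prints 1 through 7
```

Arithmetic expansion swallows the leading whitespace that BSD wc emits, so this works unchanged on macOS.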

How to sort a group of data in a columnwise manner?

I have a group of data like the attached raw data. When I sort the raw data with sort -n, the data are sorted line by line and the output looks like this:
3 6 9 22
2 3 4 5
1 7 16 20
I want to sort the data in a columnwise manner, the output would look like this:
1 2 4 3
3 6 9 16
5 7 20 22
OK, I did try something.
My idea was to extract the data column by column, sort each column, and then paste them back together, but I can't get it to work. Here is my script:
for ((i=1; i<=4; i=i+1))
do
awk '{print $i}' file | sort -n >>output
done
The output:
1 7 20 16
3 6 9 22
5 2 4 3
1 7 20 16
3 6 9 22
5 2 4 3
1 7 20 16
3 6 9 22
5 2 4 3
1 7 20 16
3 6 9 22
5 2 4 3
It seems that $i never changes and behaves the same as $0.
Thanks a lot.
raw data1
3 6 9 22
5 2 4 3
1 7 20 16
raw data2
488.000000 1236.000000 984.000000 2388.000000 788.000000 704.000000
600.000000 1348.000000 872.000000 2500.000000 900.000000 816.000000
232.000000 516.000000 1704.000000 1668.000000 68.000000 16.000000
244.000000 504.000000 1716.000000 1656.000000 56.000000 28.000000
2340.000000 3088.000000 868.000000 4240.000000 2640.000000 2556.000000
2588.000000 3336.000000 1116.000000 4488.000000 2888.000000 2804.000000
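As an aside, the loop in the question fails because $i sits inside single quotes: the shell never expands it, awk's own variable i is unset (numerically 0), and $0 is the whole line. A sketch of the fix, passing the counter in with awk's -v option (using raw data1):

```shell
# Demo input matching raw data1 (tab-separated).
printf '3\t6\t9\t22\n5\t2\t4\t3\n1\t7\t20\t16\n' > file

for i in 1 2 3 4
do
    # -v hands the shell loop counter to awk as the awk variable c,
    # so $c really selects column i of each line.
    awk -v c="$i" '{print $c}' file | sort -n
done
```

This prints each sorted column stacked vertically, which is the intermediate result the question's paste step was meant to consume.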
Let me introduce a flexible solution using cut and sort that you can use on any M-by-N tab-delimited input matrix.
$ cat -vTE data_to_sort.in
3^I6^I9^I22$
5^I2^I4^I3$
1^I7^I20^I16$
$ col=4; line=3;
$ for i in $(seq ${col}); do cut -f$i data_to_sort.in |\
> sort -n; done | paste $(for i in $(seq ${line}); do echo -n "- "; done) |\
> datamash transpose
1 2 4 3
3 6 9 16
5 7 20 22
If the input file is not tab-delimited, you need to pass the proper delimiter with -d"$DELIM_CHAR" for cut to work properly.
for i in $(seq ${col}); do cut -f$i data_to_sort.in | sort -n; done will separate each column of the file and sort it
paste $(for i in $(seq ${line}); do echo -n "- "; done) then recreates a matrix structure from that single sorted column
datamash transpose is needed to transpose the intermediate matrix
Thanks to feedback from Sundeep, here is a better solution using pr instead of paste to generate the columns:
$ col=4; line=3
$ for i in $(seq ${col}); do cut -f$i data_to_sort.in |\
> sort -n; done | pr -${line}ats | datamash transpose
Last but not least,
$ col=4; for i in $(seq ${col}); do cut -f$i data_to_sort.in |\
> sort -n; done | pr -${col}ts
1 2 4 3
3 6 9 16
5 7 20 22
The solution above doesn't need datamash at all!
(many thanks to Sundeep)
Proof that it works, for the skeptics and the downvoters:
2nd run with 6 columns:
$ col=6; for i in $(seq ${col}); do cut -f$i <(sed 's/^ \+//g;s/ \+/\t/g' data2) | sort -n; done | pr -${col}ts | tr '\t' ' '
232.000000 504.000000 868.000000 1656.000000 56.000000 16.000000
244.000000 516.000000 872.000000 1668.000000 68.000000 28.000000
488.000000 1236.000000 984.000000 2388.000000 788.000000 704.000000
600.000000 1348.000000 1116.000000 2500.000000 900.000000 816.000000
2340.000000 3088.000000 1704.000000 4240.000000 2640.000000 2556.000000
2588.000000 3336.000000 1716.000000 4488.000000 2888.000000 2804.000000
awk to the rescue!!
awk '{f1[NR]=$1; f2[NR]=$2; f3[NR]=$3; f4[NR]=$4}
END{asort(f1); asort(f2); asort(f3); asort(f4);
for(i=1;i<=NR;i++) print f1[i],f2[i],f3[i],f4[i]}' file
1 2 4 3
3 6 9 16
5 7 20 22
there may be a smarter way of doing this as well...

Sum each element of a row from two files

I want to write shell script in which each row's column element from file1 and file2 are added.
file1:
A 10 12 13 14
B 2 5 6 10
C 1
file2:
A 11 13 11 15
B 3 1 1 1
C 2
output:
A 21 25 24 29
B 5 6 7 11
C 3
I have tried to write this, but it seems very chaotic.
So I'd like to get some help to make it better!
awk '{getline v < "file1"; split( v, a );
for (i = 2; i <= NF; i++)
{print a[1], a[i]+ $i}
}' file2 > temp
awk '{a[$1]=a[$1]" "$2}
END{for(i in a)print i,a[i]
}' temp > out
file1
A 10 12 13 14
B 2 5 6 10
C 1
file2
A 11 13 11 15
B 3 1 1 1
C 2
Program
cat file1 file2 | cut -d" " -f1 | sort -u | while read i
do
line1="`grep ^$i file1 | sed -e "s/  */ /g" | cut -d" " -f2-` "
line2="`grep ^$i file2 | sed -e "s/  */ /g" | cut -d" " -f2-` "
(
echo $i
while [ "${line1}${line2}" != "" ]
do
v1=0`echo "$line1" | cut -d" " -f1`
v2=0`echo "$line2" | cut -d" " -f1`
line1="`echo "$line1" | cut -d" " -f2-`"
line2="`echo "$line2" | cut -d" " -f2-`"
echo `expr $v1 + $v2`
done
) | xargs
done > file3
file3
A 21 25 24 29
B 5 6 7 11
C 3
This solution remains valid when the numbers of columns or lines are not identical; missing values are treated as 0.
#file1
A 10 12 13 14
B 2 5 6
C 1 10
D 1 1
#file2
A 11 13 11 15
B 3 1 1 1 5
C 2
F 3 3
#file3
A 21 25 24 29
B 5 6 7 1 5
C 3 10
D 1 1
F 3 3
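When the two files do line up row for row with equal field counts, the job can also be sketched with paste and a single awk pass; note this does not handle the mismatched-size cases the solution above supports:

```shell
printf 'A 10 12 13 14\nB 2 5 6 10\nC 1\n' > file1
printf 'A 11 13 11 15\nB 3 1 1 1\nC 2\n' > file2

# paste glues matching lines together, so on each combined line
# column i of file1 is field i and the matching column of file2
# is field i + NF/2.
paste -d' ' file1 file2 | awk '{
    n = NF / 2
    printf "%s", $1
    for (i = 2; i <= n; i++) printf " %s", $i + $(i + n)
    print ""
}'
```

This relies on the two label columns lining up, which holds whenever both files list the same keys in the same order.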

How can I separate some repeated patterns in a row into multiple rows using bash script?

I have a problem with a bash script.
I've got a string which has some repeated patterns like this:
1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 ...
Each field is separated by a tab.
I want it to look like this...
1 2 3 4
1 2 3 4
1 2 3 4
…
How can I solve this problem using tools like cut, sed, awk...?
I've tried some command like cut -f 'seq 4, 4, 40' example.txt
It doesn't work...
It looks very easy but so difficult to me...
You can use sed like this:
s='1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4'
p='1 2 3 4'
echo "$s"|sed "s/$p\s*/&\n/g"
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
Live Demo: http://ideone.com/P59OCJ
Here's a pure bash solution:
IFS=$'\t' set -- $(<input_file)
seen=()
while [[ $1 ]]; do
if (( ${seen[$1]} )); then # If we've seen the value before, start a new line.
echo
unset seen
fi
printf '%s ' "$1"
seen[$1]=1
shift
done
If you know the ending number of your sequence beforehand, you can do something like:
LAST_NUMBER=4
sed -e "s/$LAST_NUMBER\t*/&\n/g" < example.txt
Just replace 4 with the last number from the sequence
If you don't know the number, you have to search through it using the following:
#!/bin/bash
declare -A CHECKED_NUMBERS
LAST_NUMBER=
while read LINE; do
SPLIT_LINE=$(cut -d" " -f1- <<< "$LINE")
for number in $SPLIT_LINE; do
if [ "${CHECKED_NUMBERS[$number]}" == "1" ]; then
LAST_NUMBER=$number
else
CHECKED_NUMBERS[$number]=1
fi
done
done < example.txt
# do the replacement
sed -e "s/$LAST_NUMBER\t*/&\n/g" < example.txt
An awk version
awk '{for (i=1;i<=NF;i++) {printf "%s"(i%4?" ":"\n"),$i}}' file
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
A GNU awk version
awk -v RS="\t" '{printf "%s"(NR%4?" ":"\n"),$0}' file
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
xargs may help:
kent$ echo "1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4"|xargs -n4
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
This might work for you:
printf "%s\t%s\t%s\t%s\n" $string
or, if you want the fields space-separated:
printf "%s %s %s %s\n" $string
