linux sort inside a column

I want to sort a file by the second character of the second column, in numeric order.
A sample file looks like this:
aa 19
aa 189
aa 167
ab 13
nd 23
at 32
ca 90
I expect the result like
ca 90
at 32
ab 13
nd 23
aa 167
aa 189
aa 19
I use the command sort -n -k 2.2,2.2 [filename].
But it gives me this result:
aa 167
aa 189
aa 19
ab 13
nd 23
at 32
ca 90
It is not the right answer. Does anybody know what's wrong with my command?

The problem is that you didn't specify the column delimiter explicitly, so sort falls back to its default field splitting instead of splitting on a single space.
sort -t ' ' -nk 2.2,2.2
works just fine.
Edit: my man page says that any run of whitespace counts as a delimiter by default, but with that default each field keeps its leading blanks, so the character offsets shift by one: -k 2.2 then points at the first digit instead of the second. Specifying -t ' ' makes field 2 start at the first digit, which is why it solves it.
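If your sort is GNU sort, you can see exactly which bytes each key uses with the --debug flag, which underlines the key in every output line (illustrative; the flag is GNU-specific):
sort --debug -n -k 2.2,2.2 file          # key starts one byte early: the leading blank is part of field 2
sort --debug -t ' ' -n -k 2.2,2.2 file   # key is the second digit, as intended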

sort -t ' ' -k2.2,2.2 filename
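Either way, running the corrected command against the sample file should reproduce the expected order from the question:
$ sort -t ' ' -nk 2.2,2.2 file
ca 90
at 32
ab 13
nd 23
aa 167
aa 189
aa 19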

Related

How to put tables side by side in linux

I have the following code to merge several tables in Linux based on the first column. But now I'm looking for a way to put several tables side by side (or beneath each other) without joining them on any column or row: all columns and rows of each table should be kept intact, just placed next to or below one another.
For example, if I have these three tables:
AA BB CC
25 40 20
13 36 19

DD EE
16 35
17 30

FF GG
15 35
17 38
So I would want this resulting table:
AA BB CC DD EE FF GG
25 40 20 16 35 15 35
13 36 19 17 30 17 38
I'd appreciate it if you could help me.
# build the key column (first column of the sorted master file)
LANG=en_EN sort AFGEN_2018.txt | sed 's/  */\t/g' | cut -f 1 > tmp.tmp
# repeatedly join each result file onto the accumulated table
for f in results/*.txt
do
    join tmp.tmp "$f" > tmpf
    mv tmpf tmp.tmp
done
mv tmp.tmp GSN_ALL.txt
cat GSN_ALL.txt
I found this code helpful for putting tables below each other (where *.txt covers tables 1, 2 and 3):
cat *.txt >Table.txt
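For the side-by-side part, paste is the tool built for exactly this: it concatenates corresponding lines of its input files. A minimal sketch, assuming the three tables are stored in table1.txt, table2.txt and table3.txt (placeholder names):
paste -d' ' table1.txt table2.txt table3.txt > Table_side_by_side.txt
On the sample tables above this produces the requested seven-column result. If one file has fewer lines than the others, paste simply leaves those positions empty.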

Datamash: Transposing the column into rows based on group in bash

I have a tab-delimited file with 2 columns, like the following:
A 123
A 23
A 45
A 67
B 88
B 72
B 50
B 23
C 12
C 14
I want to transpose the above data based on the first column, like the following:
A 123 23 45 67
B 88 72 50 23
C 12 14
I tried datamash transpose < input-file.txt but it didn't yield the expected output.
One awk version:
awk '{printf ($1!=f?"\n%s":" "$2),$0;f=$1}' file
A 123 23 45 67
B 88 72 50 23
C 12 14
With this version you get one leading blank line, but it should be fast and handle large data, since no loops or array variables are used.
$1!=f?"\n%s":" "$2),$0 — if the first field is not equal to f, print a newline followed by the whole line; if $1 equals f, print only a space and the second field.
f=$1 — save the first field to compare against the next line.
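If the leading blank line is a problem, a slightly longer variant of the same idea (a sketch, not from the original answer) prints the record separator only between groups and terminates the last line:
awk '$1!=f{printf "%s%s", (NR>1?ORS:""), $0; f=$1; next}{printf " %s", $2}END{print ""}' file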
datamash --group=1 --field-separator=' ' collapse 2 <file | tr ',' ' '
Output:
A 123 23 45 67
B 88 72 50 23
C 12 14
Input must be sorted, as in the question.
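For reference, the tr step is needed because collapse joins each group's values with commas; before tr the datamash output looks like this:
A 123,23,45,67
B 88,72,50,23
C 12,14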
This might work for you (GNU sed):
sed -E ':a;N;s/^((\S+)\s+.*)\n\2/\1/;ta;P;D' file
Append the next line; if the first field of the first line matches the first field of the appended line, remove the newline and the repeated first field, then try again. Otherwise print the first line in the pattern space, delete it together with its newline, and restart the cycle on what remains.

awk split adds whole string to array position 1 (reason unknown)

So I have a .txt file that looks like this:
mona 70 77 85 77
john 85 92 78 80
andreja 89 90 85 94
jasper 84 64 81 66
george 54 77 82 73
ellis 90 93 89 88
I have created a grades.awk script that contains the following code:
{
    FS=" "
    names=$1
    vi1=$2
    vi2=$3
    vi3=$4
    rv=$5
    #printf("%s ",names);
    split(names,nameArray," ");
    printf("%s\t",nameArray[1]);  # prints the whole array of names for some reason, instead of just the name at position 1 in the array ("john")
}
So my question is: how do I split this correctly? Am I doing something wrong?
How do you read a file line by line and word by word correctly? I need to read each column into its own array. I've been searching for an answer for quite some time now and can't fix my problem.
Here is a template to calculate the average grade per student:
$ awk '{sum=0; for(i=2;i<=NF;i++) sum+=$i;
printf "%s\t%5.2f\n", $1, sum/(NF-1)}' file
mona 77.25
john 83.75
andreja 89.50
jasper 73.75
george 71.50
ellis 90.00
printf("%s\t",nameArray[1])
is doing exactly what you want it to do, but you aren't printing any newline between invocations. It gets called once per input line and outputs one word at a time, and since you aren't outputting any newlines between the words, you end up with a single line of output. Change it to:
printf("%s\n",nameArray[1])
There are a few other issues with your code, of course (e.g. you're setting FS in the wrong place and unnecessarily, and names only ever contains one word, so splitting it into an array doesn't make sense), but I think that's what you were asking about specifically.
If that's not all you want then edit your question to clarify what you're trying to do and add concise, testable sample input and expected output.
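As for reading each column into its own array: in awk the fields of each line are already split for you, so you can index per-column arrays by line number instead of calling split(). A minimal sketch (the array names just mirror the variables in the question's script):
awk '{names[NR]=$1; vi1[NR]=$2; vi2[NR]=$3; vi3[NR]=$4; rv[NR]=$5}
END{for(i=1;i<=NR;i++) print names[i], vi1[i], vi2[i], vi3[i], rv[i]}' file.txt
After the main loop, the END block can walk all rows, since each array holds one column indexed 1..NR.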

How to identify lines ending with 5 in a file

I have a file test.lst whose contents are like below.
I am using CYGWIN_NT-6.1-WOW64.
I need to select only those lines which do not end with 5.
12
23
45
56
45
23
09
12
99
100
0000
9999999
The output should be:
12
23
56
23
09
12
99
100
0000
9999999
With grep -v '5$' test.txt, I am getting the output below:
[2014-11-28 17:42.57] /drives/d/Shantanu/MyScript
[463615.PC172645] ➤ grep -v '5$' test.txt
12
23
45
56
45
23
09
12
99
100
0000
9999999
[2014-11-28 17:43.21]
Just grep them out:
grep -v '5$' file
This looks for lines ending with 5 ($ anchors the match to the end of the line); -v then inverts the match.
For your input it returns:
12
23
56
23
09
12
99
100
0000
9999999
You could use an inverted grep search, or an inverted match using sed, like below:
sed -n '/5$/!p' test.txt
or
grep -v "5$" test.txt
You could use a negated character class.
grep '[^5]$' file
[^5]$ matches a line whose last character is not 5. By default grep prints all the lines that match.
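One caveat, since the session in the question runs under Cygwin and grep -v '5$' visibly fails to drop the 45 lines there: if test.lst has Windows (CRLF) line endings, every line really ends in a carriage return, so 5$ never matches and -v keeps everything. That's an assumption about the file, but it would explain the output shown. Strip the \r first, or match it explicitly:
tr -d '\r' < test.lst | grep -v '5$'
or, with GNU grep and bash's ANSI-C quoting:
grep -v $'5\r\?$' test.lst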

merging two files based on two columns

I have a question very similar to a previous post:
Merging two files by a single column in unix
but I want to merge my data based on two columns (the orders are the same, so there is no need to sort).
Example:
subjectid subID2 name age
12 121 Jane 16
24 241 Kristen 90
15 151 Clarke 78
23 231 Joann 31
subjectid subID2 prob_disease
12 121 0.009
24 241 0.738
15 151 0.392
23 231 1.2E-5
And the output to look like
subjectid subID2 prob_disease name age
12 121 0.009 Jane 16
24 241 0.738 Kristen 90
15 151 0.392 Clarke 78
23 231 1.2E-5 Joann 31
When I use join, it only considers the first column (subjectid) and repeats the subID2 column.
Is there a way of doing this with join, or some other way? Thank you.
The join command doesn't have an option to use more than one field as the joining criterion, so you will have to add some intelligence into the mix. Assuming your files have a FIXED number of fields on each line, you can use something like this:
join f1 f2 | awk '{print $1" "$2" "$3" "$4" "$6}'
provided that the field counts are as given in your examples. Otherwise, you need to adjust the scope of the print in the awk command by adding or removing fields.
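Another common workaround is to fuse the two key columns into a single field, join on that, and split them apart again. A sketch under the question's assumptions (both files in the same order; file names are placeholders):
join <(sed 's/ /_/' file_a) <(sed 's/ /_/' file_b) | sed 's/_/ /'
The first sed glues subjectid and subID2 into one key such as 12_121, join matches on that genuinely two-column key, and the final sed restores the space. If your (GNU) join complains that the input is not sorted, either sort both streams the same way first or add --nocheck-order.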
If the orders are identical, you could still merge by a single column and specify the format of which columns to output, like:
join -o '1.1 1.2 2.3 1.3 1.4' file_a file_b
as described in join(1).
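On the sample files, that -o list prints the columns in exactly the requested order (illustrative, using the data from the question):
subjectid subID2 prob_disease name age
12 121 0.009 Jane 16
24 241 0.738 Kristen 90
15 151 0.392 Clarke 78
23 231 1.2E-5 Joann 31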
