Sort range Linux - linux

Hello everyone. I have some questions about sorting in bash. I am working with Ubuntu 14.04.
The first question is: why, if I have a file some.txt with this content:
b 8
b 9
a 8
a 9
and I type this:
sort -n -k 2 some.txt
the result will be:
a 8
b 8
a 9
b 9
which means that the file is sorted first by the second field and then by the first field, but I thought that it would stay stable, i.e.
b 8
a 8
...
...
Maybe if two rows are equal, a lexicographical sort is applied, or what?
The second question is: why doesn't the following work?
sort -n -k 1,2 try.txt
The file try.txt is like this:
8 2
8 11
8 0
8 5
9 2
9 0
The third question is not actually about sorting, but it comes up when I try to do this:
sort blank.txt > blank.txt
After this, the blank.txt file is empty. Why is that?

Apparently GNU sort is not stable by default; add the -s option. From the manual:
Finally, as a last resort when all keys compare equal, sort compares entire lines as if no ordering options other than --reverse (-r) were specified. The --stable (-s) option disables this last-resort comparison so that lines in which all fields compare equal are left in their original relative order.
(https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html)
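A minimal sketch of the difference, assuming GNU sort. The scratch file from mktemp stands in for some.txt, and LC_ALL=C is only there to make the tie-breaking comparison predictable:

```shell
# Recreate the sample file from the question in a temporary location.
tmp=$(mktemp)
printf '%s\n' 'b 8' 'b 9' 'a 8' 'a 9' > "$tmp"

# Default: ties on field 2 are broken by comparing whole lines.
default_out=$(LC_ALL=C sort -n -k 2 "$tmp")

# With -s: ties on field 2 keep their original input order.
stable_out=$(LC_ALL=C sort -s -n -k 2 "$tmp")

printf 'default:\n%s\n\nstable:\n%s\n' "$default_out" "$stable_out"
rm -f "$tmp"
```

The default run prints a 8, b 8, a 9, b 9, while the -s run prints b 8, a 8, b 9, a 9.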
There's no way to answer your question if you don't show the text file
For #3: redirections are handled by the shell before it hands control to the program. The > redirection truncates the file if it exists, so by the time sort runs it is reading an empty file.
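Two common ways to sort a file in place without the shell truncating it first, sketched here with a scratch file from mktemp standing in for blank.txt (the sample contents are made up):

```shell
# Scratch file standing in for blank.txt.
f=$(mktemp)
printf '%s\n' 3 1 2 > "$f"

# Option 1: let sort open the output itself. -o may name an input file,
# because sort reads all of its input before it writes the result.
sort -n -o "$f" "$f"
after_o=$(cat "$f")

# Option 2: write to a temporary file, then replace the original.
printf '%s\n' 3 1 2 > "$f"
sort -n "$f" > "$f.tmp" && mv "$f.tmp" "$f"
after_tmp=$(cat "$f")

printf '%s\n' "$after_o"
rm -f "$f"
```

Both variants leave the file containing 1, 2, 3 on separate lines.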
For #2, you don't actually explain what's not working. Expanding your sample data, this happens:
$ cat try.txt
8 2
8 11
9 2
9 0
11 11
11 2
$ sort -n -k 1,2 try.txt
8 11
8 2
9 0
9 2
11 11
11 2
I assume you want to know why the 2nd column is not sorted numerically. Let's go back to the sort manual:
‘-n’
‘--numeric-sort’
‘--sort=numeric’
Sort numerically. The number begins each line and consists of ...
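It looks like -n with -k 1,2 only sorts the first column numerically: the key runs from field 1 through field 2 as a single string, and the numeric comparison reads just the number at its start. After some trial and error, I found this combination that sorts each column numerically: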
$ sort -k1,1n -k2,2n try.txt
8 2
8 11
9 0
9 2
11 2
11 11
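As a side note, the n modifier does not have to be repeated on each key: keys that carry no ordering options of their own inherit the global ones. A small sketch, using a scratch file in place of try.txt:

```shell
# Sample data from the answer above, written to a scratch file.
tmp=$(mktemp)
printf '%s\n' '8 2' '8 11' '9 2' '9 0' '11 11' '11 2' > "$tmp"

# Per-key modifier: each key is limited to one field and compared numerically.
per_key=$(sort -k1,1n -k2,2n "$tmp")

# Global -n: keys without their own ordering options inherit it.
global_n=$(sort -n -k1,1 -k2,2 "$tmp")

printf '%s\n' "$per_key"
rm -f "$tmp"
```

Both commands should give the same result here; what matters is that each key is restricted to a single field.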

Related

Merge one-line texts into a data-frame with basic ubuntu shell commands

I have let's say two files Input1.txt and Input2.txt. Each of them is a text file containing a single line of 5 numbers separated by a tab.
For instance Input1.txt is
1 2 3 4 5
and Input2.txt is
6 7 8 9 10
The output that I desire is Output.txt :
Input1 1 2 3 4 5
Input2 6 7 8 9 10
So I want to merge the files in a table with an extra first column containing the names of the original files. Obviously I have more than 2 files (actually 1000) and I would like to make it with a for loop. You can assume that all my files are named as Input*.txt with * between 1 and 1000 and that they are all in the same directory.
I know how to do it with R, but I would like to make it with a basic line of commands in the ubuntu shell. Is it feasible ? Thanks for any help.
Assuming the line in Input1.txt, Input2.txt, etc. is terminated with a newline character, you can use
for i in Input*.txt
do
printf "%s " "$i"
cat "$i"
done > Output.txt
The result is
Input1.txt 1 2 3 4 5
Input2.txt 6 7 8 9 10
If you want to get Input1 etc. without .txt you can use
printf "%s " "${i%.txt}"
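One thing to watch with the glob: Input*.txt expands in text order, so Input10.txt comes before Input2.txt. If the rows should follow the numbers, a sketch like the following walks the numbers instead. It assumes the files are named Input1.txt to Input1000.txt as in the question, skips any that are missing, and uses a tab after the name (an assumption, to match the tab-separated input). The scratch directory and file contents are made up for the demo:

```shell
# Demo setup in a scratch directory: three single-line, tab-separated files.
dir=$(mktemp -d)
printf '1\t2\n' > "$dir/Input1.txt"
printf '3\t4\n' > "$dir/Input2.txt"
printf '5\t6\n' > "$dir/Input10.txt"

# Walk the numbers 1..1000 in order, skipping any file that does not exist.
i=1
while [ "$i" -le 1000 ]; do
    f="$dir/Input$i.txt"
    if [ -f "$f" ]; then
        printf 'Input%s\t' "$i"   # file name without .txt, then a tab
        cat "$f"
    fi
    i=$((i + 1))
done > "$dir/Output.txt"

merged=$(cat "$dir/Output.txt")
printf '%s\n' "$merged"
rm -rf "$dir"
```

With the three demo files the rows come out as Input1, Input2, Input10, in that order.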

How to find average and maximum in an interval using Shell [closed]

I would like to extract the sum, average and maximum for each interval of 6 numbers in a column.
I found many discussions related to this problem, but they all deal with the whole column, e.g.
To compute sum of a column:
awk '{sum+=$1} END { print sum}'
To calculate Average:
awk '{sum+=$1} END { print sum/NR}'
To find the maximum or minimum, the sort command can be used.
I need all of these per interval, e.g., my input file is
inputfile.txt
1 3
2 5
3 4
4 3
5 2
6 1
7 3
8 3
9 4
10 2
11 2
12 2
13 5
14 4
15 2
16 3
17 7
18 3
Output files are
sum.txt
1 18
2 16
3 24
average.txt
1 3
2 2.67
3 4
maximum.txt
1 5
2 4
3 7
Please search before you ask a question; many posts already cover this.
You can try something like the script below and modify it as needed.
Input
[akshay#localhost tmp]$ cat input.txt
1 3
2 5
3 4
4 3
5 2
6 1
7 3
8 3
9 4
10 2
11 2
12 2
13 5
14 4
15 2
16 3
17 7
18 3
Script
[akshay#localhost tmp]$ cat test.awk
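# Runs for every line: accumulate column 2 and track its maximum.
{
    sum += $2
    max = max > $2 ? max : $2
}
# Every 6th line: write the results for this interval, then reset.
!(FNR%6){
    print ++c,sum > "sum.txt"
    print c,sum/6 > "average.txt"
    print c,max > "maximum.txt"
    sum = max = ""
}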
Output
[akshay#localhost tmp]$ awk -f test.awk input.txt
Sum
[akshay#localhost tmp]$ cat sum.txt
1 18
2 16
3 24
Average
[akshay#localhost tmp]$ cat average.txt
1 3
2 2.66667
3 4
Maximum
[akshay#localhost tmp]$ cat maximum.txt
1 5
2 4
3 7
This site isn't meant for writing whole programs, and that's basically what you're asking us to do.
What you need to do is keep track of how many lines you've read and then every 6th line you produce output. Consider something like this:
awk '{sum += $2} (NR%6)==0 {print(sum); sum=0}' input.txt
I'm not going to explain what I did, because I expect you to please search the internet for awk tutorials, and get an understanding of what I am doing yourself.
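For reference, the "every 6th line" idea can be sketched with the interval size as a parameter and a final flush for a trailing group that has fewer lines. The function name summarize and the single-line output layout (group, sum, average, maximum) are choices made for this example, not something from the answers above:

```shell
# Print "group sum average max" for every block of n lines of column 2.
# A final block with fewer than n lines is reported as well.
summarize() {
    awk -v n="${2:-6}" '
        { sum += $2; cnt++
          if (cnt == 1 || $2 > max) max = $2 }   # first line of a block resets max
        cnt == n { print ++g, sum, sum / cnt, max; sum = cnt = 0 }
        END { if (cnt > 0) print ++g, sum, sum / cnt, max }
    ' "$1"
}

# Demo: the first 8 lines of the question's input, in a scratch file.
tmp=$(mktemp)
printf '%s\n' '1 3' '2 5' '3 4' '4 3' '5 2' '6 1' '7 3' '8 3' > "$tmp"
summary=$(summarize "$tmp" 6)
printf '%s\n' "$summary"
rm -f "$tmp"
```

For the demo input this prints "1 18 3 5" for the full group and "2 6 3 3" for the two leftover lines.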

Count occurrence of numbers in linux

I have a .txt file with 25,000 lines. Each line contains a number from 1 to 20. I want to compute the total number of occurrences of each number in the file. I don't know whether I should use grep or awk, or how to use them. I'm also worried about confusing 1 and 11, since both contain a 1. Thank you very much for helping!
I tried this, but it double counts my numbers:
grep -o '1' degreeDistirbution.txt | wc -l
With grep you can match the beginning and end of a line with '^' and '$' respectively. For the whole thing I'll use an array, but to illustrate this point I'll just use one variable:
one="$(grep -c '^1$' "./$inputfile")"
Then we put that together with a bash loop and go through all the numbers with a while, like so:
i=1
while [[ $i -le 20 ]]
do
    arr[i]="$(grep -c "^$i$" "./$inputfile")"
    i=$((i + 1))
done
If you like, you can of course use a for loop as well.
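A possible for-loop version, as a sketch: it uses grep -x (match the whole line) in place of the ^ and $ anchors, and seq to generate 1 to 20. The scratch file and its contents are made up for the demo:

```shell
# Sample input: one number per line.
tmp=$(mktemp)
printf '%s\n' 1 11 1 2 11 20 1 > "$tmp"

# Count whole-line matches for each number and report the non-zero ones.
report=$(
    for i in $(seq 1 20); do
        n=$(grep -c -x "$i" "$tmp")
        if [ "$n" -gt 0 ]; then
            printf '%s %s\n' "$i" "$n"
        fi
    done
)

printf '%s\n' "$report"
rm -f "$tmp"
```

Because -x requires the whole line to match, the 11 lines are not counted as 1.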
An easier method is:
sort -n file | uniq -c
Which will count the occurrences of each number in the sorted file and display the results like:
$ sort -n dat/twenty.txt | uniq -c
3 1
3 2
3 3
4 4
4 5
4 6
4 7
4 8
4 9
4 10
4 11
3 12
2 13
2 14
4 15
4 16
4 17
2 18
2 19
2 20
This shows I have 3 ones, 3 twos, etc. in the sample file.
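Since the question also mentions awk, here is a single-pass sketch that tallies each value in an array and prints "value count" pairs. The sample file is made up, and the trailing sort -n is only there to order the report:

```shell
# Sample input: one number per line.
tmp=$(mktemp)
printf '%s\n' 1 11 1 2 11 20 1 > "$tmp"

# Tally each value, then print "value count" sorted numerically by value.
counts=$(awk '{ count[$1]++ } END { for (v in count) print v, count[v] }' "$tmp" | sort -n)

printf '%s\n' "$counts"
rm -f "$tmp"
```

awk compares whole fields, so 1 and 11 are counted separately.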

Preserve original order if numeric value is equal in coreutils sort?

Consider this snippet:
echo '7 a
3 c
3 b
2 first
2 second
2 third
2 fourth
2 fifth
9 d
2 sixth
' | sort -n -k 1
It gives an output of:
2 fifth
2 first
2 fourth
2 second
2 sixth
2 third
3 b
3 c
7 a
9 d
While the list is correctly ordered numerically by the first field, the original order of lines whose values are equal has been shuffled. I would like to obtain:
2 first
2 second
2 third
2 fourth
2 fifth
2 sixth
3 c
3 b
7 a
9 d
Is this possible to do with sort? If not, what would be the easiest way to achieve this kind of sorting using shell tools?
Just add the -s (stable sort) flag; this disables the last-resort comparison:
echo '7 a
3 c
3 b
2 first
2 second
2 third
2 fourth
2 fifth
9 d
2 sixth
' | sort -k 1,1n -s
2 first
2 second
2 third
2 fourth
2 fifth
2 sixth
3 c
3 b
7 a
9 d
Add line numbers with nl, pipe to sort -k2,2n -k1,1n so the line numbers act as the secondary key, then cut the numbers off with cut. Or use sort -s. :p
You're looking for a "stable" sort. Try the sort -s option (or better yet, check the man page on your system).
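For a sort without -s, the line-number approach can be sketched as below. It assumes nl's default output format (a right-aligned number followed by a tab), and the input is a shortened version of the question's list:

```shell
input=$(printf '%s\n' '7 a' '3 c' '3 b' '2 first' '2 second' '2 third' '9 d' '2 sixth')

# 1. nl -ba prefixes every line with its line number and a tab.
# 2. sort orders by the original number (now field 2), then by line number.
# 3. cut drops the line-number column again.
ordered=$(printf '%s\n' "$input" | nl -ba | sort -k2,2n -k1,1n | cut -f2-)

printf '%s\n' "$ordered"
```

Lines with the same number come out in their input order (first, second, third, sixth, then 3 c before 3 b), because the line number decides every tie.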

sort multiple column file

I have a file a.dat as following.
1 0.246102 21 1 0.0408359 0.00357267
2 0.234548 21 2 0.0401056 0.00264361
3 0.295771 21 3 0.0388905 0.00305116
4 0.190543 21 4 0.0371858 0.00427217
5 0.160047 21 5 0.0349674 0.00713894
I want to sort the file according to values in second column. i.e. output should look like
5 0.160047 21 5 0.0349674 0.00713894
4 0.190543 21 4 0.0371858 0.00427217
2 0.234548 21 2 0.0401056 0.00264361
1 0.246102 21 1 0.0408359 0.00357267
3 0.295771 21 3 0.0388905 0.00305116
How can I do this from the command line? I read that the sort command can be used for this purpose, but I could not figure out how to use it.
Use sort -k to indicate the column you want to use:
$ sort -k2 file
5 0.160047 21 5 0.0349674 0.00713894
4 0.190543 21 4 0.0371858 0.00427217
2 0.234548 21 2 0.0401056 0.00264361
1 0.246102 21 1 0.0408359 0.00357267
3 0.295771 21 3 0.0388905 0.00305116
This works in this case, because every value in column 2 has the same 0.xxxxxx shape, so text order matches numeric order.
For future reference, note (as indicated by 1_CR) that you can also give the range of columns to use, e.g. sort -k2,2 (just column 2) or sort -k2,5 (columns 2 through 5).
Note that you need to specify the start and end fields for sorting (2 and 2 in this case), and if you need numeric sorting, add n.
sort -k2,2n file.txt
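A small sketch of why the n matters once the values differ in width. The three-line sample is made up, and LC_ALL=C is used so the text comparison is plain byte order:

```shell
tmp=$(mktemp)
printf '%s\n' '1 10.5 x' '2 9.25 y' '3 0.5 z' > "$tmp"

# Text comparison from field 2 to end of line: "10.5" sorts before "9.25".
as_text=$(LC_ALL=C sort -k2 "$tmp")

# Numeric comparison restricted to field 2.
as_number=$(LC_ALL=C sort -k2,2n "$tmp")

printf 'text:\n%s\n\nnumeric:\n%s\n' "$as_text" "$as_number"
rm -f "$tmp"
```

The text sort puts 10.5 ahead of 9.25, while the numeric sort gives 0.5, 9.25, 10.5.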

Resources