How do I find the number of elements in a column greater than a given number in Linux?

I have a text file listing students and their marks, and I want to find how many of them scored more than 80 in Maths, in Physics, and in Maths and Physics combined. What Linux command would do this?
The text file is:
#name maths phy
Manila 78 29
Shikhar 49 78
Vandana 65 87
Priyansh 75 22
Bina 52 69
Chitransh 98 93
William 88 73
Kaushal 38 85
Dilruba 65 94
Lalremruata 34 45
Qasim 58 62
Nitya 81 89
Jennita 96 91
Shobha 71 63
Talim 77 88

This can be done with awk (grep is a poor fit because it matches text and cannot do numeric comparisons). An example:
cat test.txt | awk '{if ($2>80 || $3>80) print $1 " " $2 " " $3}'
This still needs improving: the cat is unnecessary (awk can read test.txt directly), the sum of the two columns is not checked, and the header line is still printed. But it gives you something to start from.
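A minimal sketch of one way to address those points and print the counts the question asks for. The helper name count_above is hypothetical, and "combined" is read here as "both marks above the limit", which is an assumption; for the sum of the two marks, test $2 + $3 > limit instead.

```shell
# count_above is a hypothetical helper, not part of the original answers.
# Usage: count_above [FILE...]   (reads standard input when no file is given)
count_above() {
    awk '
        BEGIN { limit = 80 }                     # threshold from the question
        /^#/ || NF < 3 { next }                  # skip the header and short lines
        $2 > limit { maths++ }                   # maths column
        $3 > limit { phy++ }                     # physics column
        $2 > limit && $3 > limit { both++ }      # assumption: "combined" = both
        END { printf "maths=%d phy=%d both=%d\n", maths, phy, both }
    ' "$@"
}
```

Called as count_above test.txt on the data in the question, this prints maths=4 phy=7 both=3.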

Try this, and adapt it to your taste:
$ awk '/^[^#]/{
limit = 80;
comb = $2 + $3;
if ($2 > limit && $3 > limit)
print $1, $2, $3, "both";
else if ($2 > limit)
print $1, $2, $3, "maths";
else if ($3 > limit)
print $1, $2, $3, "physics";
else if (comb > limit)
print $1, $2, $3, "combined";
}' <<EOF
#name maths phy
Manila 78 29
Shikhar 49 78
Vandana 65 87
Priyansh 75 22
Bina 52 69
Chitransh 98 93
William 88 73
Kaushal 38 85
Dilruba 65 94
Lalremruata 34 45
Qasim 58 62
Nitya 81 89
Jennita 96 91
Shobha 71 63
Talim 77 88
EOF
which produces the following:
Manila 78 29 combined
Shikhar 49 78 combined
Vandana 65 87 physics
Priyansh 75 22 combined
Bina 52 69 combined
Chitransh 98 93 both
William 88 73 maths
Kaushal 38 85 physics
Dilruba 65 94 physics
Qasim 58 62 combined
Nitya 81 89 both
Jennita 96 91 both
Shobha 71 63 combined
Talim 77 88 physics
To read the input from a file instead, pass the file name:
$ awk '/^[^#]/{
limit = 80;
comb = $2 + $3;
if ($2 > limit && $3 > limit)
print $1, $2, $3, "both";
else if ($2 > limit)
print $1, $2, $3, "maths";
else if ($3 > limit)
print $1, $2, $3, "physics";
else if (comb > limit)
print $1, $2, $3, "combined";
}' marks.txt
The awk script:
- processes only non-empty lines whose first character is not #, which lets you add comments such as the header with the course names;
- lets you configure the threshold easily, because the limit variable is set in one place and used as a constant;
- can be adapted to whatever criterion you need, even listing the students who don't pass either of the two courses.
Note:
If you want to substitute a shell variable inside the awk script, beware that awk uses the $<n> notation to refer to input fields, and the shell expands the same notation inside double quotes. The safest approach is to close the single quotes, put only the variable to be expanded in double quotes, and reopen the single quotes, so there is no confusion about which variables are expanded by the shell and which by awk. Example:
$ export TRIGGER=95
$ awk '/^[^#]/{
limit = '"$TRIGGER"';
comb = $2 + $3;
if ($2 > limit && $3 > limit)
print $1, $2, $3, "both";
else if ($2 > limit)
print $1, $2, $3, "maths";
else if ($3 > limit)
print $1, $2, $3, "physics";
else if (comb > limit)
print $1, $2, $3, "combined";
}' marks.txt
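An alternative to splicing quotes is awk's -v option, which assigns a value to an awk variable before the script starts, so the script itself can stay entirely inside single quotes. The sketch below is only an illustration: the helper name above_limit is hypothetical, and it simply prints the rows where either mark exceeds the limit.

```shell
# above_limit is a hypothetical helper, not part of the original answer.
# Usage: above_limit LIMIT [FILE...]   (reads standard input when no file is given)
above_limit() {
    limit=$1
    shift
    # -v copies the shell value into the awk variable "limit";
    # $2 and $3 inside the single quotes are seen only by awk.
    awk -v limit="$limit" '
        /^[^#]/ && ($2 > limit || $3 > limit) { print $1, $2, $3 }
    ' "$@"
}
```

For example, above_limit "$TRIGGER" marks.txt uses the shell variable from the example above.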

Related

Use printf to format list that is uneven

I have a small list of student grades that I need to format side by side depending on the gender of the student, so one column is Male and the other Female. The problem is that the list doesn't alternate male, female, male, female; it is uneven.
I've tried using printf to format the output so the 2 columns are side by side, but the format is ruined because of the uneven list.
Name Gender Mark1 Mark2 Mark3
AA M 20 15 35
BB F 22 17 44
CC F 19 14 25
DD M 15 20 42
EE F 18 22 30
FF M 0 20 45
This is the list I am talking about ^^
awk 'BEGIN {print "Male" " Female"} {if (NR!=1) {if ($2 == "M") {printf "%-s %-s %-s", $3, $4, $5} else if ($2 == "F") {printf "%s %s %s\n", $3, $4 ,$5}}}' text.txt
So I'm getting results like
Male Female
20 15 35 22 17 44
19 14 25
15 20 42 18 22 30
0 20 45
But I want it like this:
Male Female
20 15 35 22 17 44
15 20 42 19 14 25
0 20 45 18 22 30
I haven't added separators yet; I'm just trying to figure this out. I'm not sure whether it would be better to put the marks into two arrays depending on gender and then print them out.
Another solution, which also handles the case where the numbers of M and F rows are not equal:
$ awk 'NR==1 {print "Male\tFemale"}
NR>1 {k=$2;$1=$2="";sub(/ +/,"");
if(k=="M") m[++mc]=$0; else f[++fc]=$0}
END {max=mc>fc?mc:fc;
for(i=1;i<=max;i++) print (m[i]?m[i]:"-") "\t" (f[i]?f[i]:"-")}' file |
column -ts$'\t'
Male Female
20 15 35 22 17 44
15 20 42 19 14 25
0 20 45 18 22 30
Something like this?
awk 'BEGIN{format="%2s %2s %2s %2s\n";printf("Male Female\n"); }NR>1{if (s) { if ($2=="F") {printf(format, s, $3, $4, $5);} else {printf(format, $3,$4,$5,s);} s=""} else {s=sprintf("%2s %2s %2s", $3, $4, $5)}}' file
Another approach using awk
awk '
BEGIN {
print "Male\t\tFemale"
}
NR > 1 {
I = ++G[$2]
A[$2 FS I] = sprintf("%2d %2d %2d", $(NF-2), $(NF-1), $NF)
}
END {
M = ( G["M"] > G["F"] ? G["M"] : G["F"] )
for ( i = 1; i <= M; i++ )
print A["M" FS i] ? A["M" FS i] : OFS, A["F" FS i] ? A["F" FS i] : OFS
}
' OFS='\t' file
This might work for you (GNU sed):
sed -Ee '1c\Male Female' -e 'N;s/^.. M (.*)\n.. F(.*)/\1\2/;s/^.. F(.*)\n.. M (.*)/\2\1/' file
Change the header line. Then compare a pair of lines and re-arrange them as appropriate.

Compare two files having different column numbers and print the requirement to a new file if condition satisfies

I have two files with more than 10000 rows:
File1 (1 column)    File2 (4 columns)
23                  23 88 90 0
34                  43 74 58 5
43                  54 87 52 3
54                  73 52 35 4
.                   .
.                   .
I want to compare each value in File1 with the first column of File2. If the value exists there, print it along with the other three values from File2. In this example the output will be:
23 88 90 0
43 74 58 5
54 87 52 3
.
.
I have written the following script, but it takes too long to execute.
s1=1; s2=$(wc -l < File1.txt)
while [ $s1 -le $s2 ]
do n=$(awk 'NR=="$s1" {print $1}' File1.txt)
p1=1; p2=$(wc -l < File2.txt)
while [ $p1 -le $p2 ]
do awk '{if ($1==$n) printf ("%s %s %s %s\n", $1, $2, $3, $4);}'> ofile.txt
(( p1++ ))
done
(( s1++ ))
done
Is there a shorter or easier way to do it?
You can do it very concisely with awk:
awk 'FNR==NR{found[$1]++; next} $1 in found'
Test
>>> cat file1
23
34
43
54
>>> cat file2
23 88 90 0
43 74 58 5
54 87 52 3
73 52 35 4
>>> awk 'FNR==NR{found[$1]++; next} $1 in found' file1 file2
23 88 90 0
43 74 58 5
54 87 52 3
What does it do?
FNR==NR Checks whether FNR, the record number within the current file, equals NR, the total number of records read so far. The two are equal only while reading the first file, file1, because FNR is reset to 1 when awk starts reading a new file.
{found[$1]++; next} If the check is true, stores $1, the first column of file1, as an index in the associative array found, then skips to the next record.
$1 in found This check is reached only for the second file, file2. If the column 1 value, $1, is an index in the associative array found, awk prints the entire line (no action is written because printing is the default action).
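To see the FNR==NR test at work, a small sketch like the following prints both counters for every record. The helper name show_counters and the labels are made up for illustration.

```shell
# show_counters is a hypothetical helper for illustration only.
# Usage: show_counters FILE1 FILE2
show_counters() {
    awk '{
        # NR keeps counting across files; FNR restarts at 1 for each file.
        print FILENAME, NR, FNR, (FNR == NR ? "first-file" : "later-file")
    }' "$@"
}
```

One caveat with this idiom: if the first file is empty, FNR==NR is also true while reading the second file, so the lookup array is filled from the wrong file.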

Cut a file between two lines numbers using awk

Say I have a file with 100 lines (not including header). I want to cut that file down, only keeping the content between line 51 and 70 (inclusive), as well as the header so that the resulting file is 20+1 lines.
So far, I have this code:
awk 'NR==1 {h=$0; next} (NR-1)>50 && (NR-1)<71 {filename = "file20.csv"; print h >> filename} {print >> filename}' file100.csv
But it's giving me an error:
fatal: expression for `>>' redirection has null string value
Can somebody help me understand where my syntax is wrong?
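The error comes from the last block, {print >> filename}: it has no pattern, so it runs for every line after the header, including the early lines where filename has not been set yet. You can avoid the redirection inside awk entirely and use: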
awk 'NR==1 || (NR>=51 && NR<=70)'
Note that this evaluates the condition on NR. When it is true, awk performs its default action, {print $0}, so you do not have to write it explicitly.
Then you can redirect to another file:
awk 'NR==1 || (NR>=51 && NR<=70)' file > new_file
Test
$ seq 100 | awk 'NR==1 || (NR>=51 && NR<=70)'
1
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
It returns 21 lines:
$ seq 100 | awk 'NR==1 || (NR>=51 && NR<=70)' | wc -l
21
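Note that NR counts the header as line 1. If the line numbers are meant to exclude the header, as the (NR-1) in the question suggests, the range shifts by one. A minimal sketch under that assumption, with a hypothetical helper name keep_rows:

```shell
# keep_rows is a hypothetical helper, not part of the original answer.
# Usage: keep_rows FIRST LAST [FILE...]
# Keeps the header plus data rows FIRST..LAST, counting rows after the header.
keep_rows() {
    first=$1
    last=$2
    shift 2
    awk -v first="$first" -v last="$last" '
        NR == 1 || (NR - 1 >= first && NR - 1 <= last)
    ' "$@"
}
```

For example, keep_rows 51 70 file100.csv > file20.csv would keep the header and data rows 51 to 70.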

Linux-Change line between two awk scripts

I have two awk scripts to run in Linux. The output of each one is in one line.
How can I separate the two outputs into two lines?
For example:
awk '{printf $1}' f.txt >> a.txt
awk '{printf $3}' f.txt >> a.txt
The output of the first script is:
35 56 40 28 57
And the second output is:
29 48 73 26
If I run them one after another, the output will become:
35 56 40 28 57 29 48 73 26
Is there any way to get the result to:
35 56 40 28 57
29 48 73 26
Thank you!~
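Although I don't understand how you get the spaces between fields with that code, you can add an END block to the first script so that it finishes its line (printf "\n" rather than print "\n", which would add an extra blank line):
awk '{printf $1} END{printf "\n"}' f.txt >> a.txt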
You can also do this with a single awk command:
awk -v ORS=" " 'BEGIN{ARGV[ARGC++] = ARGV[1]; i = 1 }
NR!=FNR && FNR==1 { printf "\n"; i=3 }
{ print $i }
END { printf "\n" }' f.txt
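Another option is to read the file only once and collect each column into its own string, printing both at the end. This is a sketch rather than a drop-in replacement: the helper name two_rows is hypothetical, and it assumes whitespace-separated fields where some lines may have no third field.

```shell
# two_rows is a hypothetical helper for illustration only.
# Usage: two_rows [FILE...]   (reads standard input when no file is given)
two_rows() {
    awk '
        { a = a (NR > 1 ? " " : "") $1 }             # collect column 1
        NF >= 3 { b = b (b == "" ? "" : " ") $3 }    # collect column 3 when present
        END { print a; print b }                     # one output line per column
    ' "$@"
}
```

Used as two_rows f.txt >> a.txt, it writes the first column on one line and the third column on the next.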

Gawk print largest value from each column

I am writing an awk script that takes some columns of input from a text file and prints the largest value in each column.
Input:
$ cat numbers
10 20 30.3 40.5
20 30 45.7 66.1
40 75 107.2 55.6
50 20 30.3 40.5
60 30 45.O 66.1
70 1134.7 50 70
80 75 107.2 55.6
Output:
80 1134.7 107.2 70
Script:
BEGIN {
val=0;
line=1;
}
{
if( $2 > $3 )
{
if( $2 > val )
{
val=$2;
line=$0;
}
}
else
{
if( $3 > val )
{
val=$3;
line=$0;
}
}
}
END{
print line
}
Current output:
60 30 45.O 66.1
What am I doing wrong? This is my first awk script.
=======SOLUTION======
END {
for (i = 0; ++i <= NF;)
printf "%s", (m[i] (i < NF ? FS : RS))
}
{
for (i = 0; ++i <= NF;)
$i > m[i] && m[i] = $i
}
Thanks for the help
Since you have four columns, you'll need at least four variables, one for each column (or an array if you prefer). And you won't need to hold any line in its entirety. Treat each column independently.
You need to adapt something like the following for your purposes which will find the maximum in a particular column (the second in this case).
awk 'BEGIN {max = 0} {if ($2>max) max=$2} END {print max}' numbers.dat
The approach you are taking with $2 > $3 seems to be comparing two columns with each other.
You can create one user defined function and then pass individual column arrays to it to retrieve the max value. Something like this -
[jaypal:~/Temp] cat numbers
10 20 30.3 40.5
20 30 45.7 66.1
40 75 107.2 55.6
50 20 30.3 40.5
60 30 45.O 66.1
70 1134.7 50.0 70
80 75 107.2 55.6
[jaypal:~/Temp] awk '
function max(x){i=0;for(val in x){if(i<=x[val]){i=x[val];}}return i;}
{a[$1]=$1;b[$2]=$2;c[$3]=$3;d[$4]=$4;next}
END{col1=max(a);col2=max(b);col3=max(c);col4=max(d);print col1,col2,col3,col4}' numbers
80 1134.7 107.2 70
or
awk 'a<$1{a=$1}b<$2{b=$2}c<$3{c=$3}d<$4{d=$4} END{print a,b,c,d}' numbers
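The last two commands hard-code four columns and start from an empty or zero maximum, so they would need changes for a different number of columns or for all-negative data. A minimal sketch of a more general variant follows; the helper name col_max is hypothetical, and non-numeric fields are assumed to be acceptable as whatever number awk converts them to.

```shell
# col_max is a hypothetical helper, not part of the original answers.
# Usage: col_max [FILE...]   (reads standard input when no file is given)
col_max() {
    awk '
        {
            if (NF > ncol) ncol = NF              # widest row seen so far
            for (i = 1; i <= NF; i++)
                # $i + 0 forces a numeric comparison; the first value seen
                # in a column is always taken, so negatives work too.
                if (!(i in max) || $i + 0 > max[i])
                    max[i] = $i + 0
        }
        END {
            for (i = 1; i <= ncol; i++)
                printf "%s%s", max[i], (i < ncol ? " " : "\n")
        }
    ' "$@"
}
```

Run as col_max numbers on the input from the question, it prints 80 1134.7 107.2 70.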
