Difference between the averages of two files using a shell script or awk (Linux)

I have two files. Each has one column, with missing data marked as 9999 or 9000, e.g.:
ifile1.txt ifile2.txt
30 20
9999 10
10 40
40 30
10 31
29 9000
9000 9999
9999 9999
31 1250
550 29
I would like to calculate the difference between the averages of the two files above, ignoring the missing values, i.e.:
average (ifile1.txt) - average (ifile2.txt)
I tried the following, but I'm not getting the result:
ave1=$(awk '!/\9999/ && !/\9000/{sum += $1; count++} END {print count ? (sum/count) : count;sum=count=0}' ifile1.txt)
ave2=$(awk '!/\9999/ && !/\9000/{sum += $1; count++} END {print count ? (sum/count) : count;sum=count=0}' ifile2.txt)
result=$(ave1-ave2)
echo $result

awk '!/9000|9999/{a[FILENAME]+=$0;b[FILENAME]++}END{for(i in a)c=c?c-a[i]/b[i]:a[i]/b[i];print c}' file1 file2
Update:
awk '!/9000|9999/{a[ARGIND]+=$0;b[ARGIND]++}END{print a[1]/b[1]-a[2]/b[2]}' file1 file2
or
awk '!/9000|9999/{a[ARGIND]+=$0;b[ARGIND]++}END{for(i=1;i<=ARGIND;i++)c=c?c-a[i]/b[i]:a[i]/b[i];print c}' file1 file2
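ARGIND is a gawk extension, and the order in which for (i in a) visits array elements is unspecified, so the FILENAME version may subtract in the wrong order. Below is a minimal sketch of a portable variant, assuming a POSIX awk such as mawk or BusyBox awk. It counts input files with FNR == 1 instead of ARGIND, and it compares the value exactly against 9999 and 9000, because the substring regex would also drop values such as 19999. The function name avg_diff is hypothetical.

```shell
# avg_diff FILE1 FILE2 (hypothetical helper)
# Prints average(FILE1) - average(FILE2), skipping blank lines and the
# missing-value markers 9999 and 9000. FNR == 1 marks the start of each
# file, replacing gawk's ARGIND; an empty file would shift the numbering.
avg_diff() {
    awk '
        FNR == 1 { f++ }                          # next input file started
        NF && $1 != 9999 && $1 != 9000 {          # exact match, not a substring
            sum[f] += $1; n[f]++
        }
        END {
            if (n[1] && n[2])                     # both files had valid data
                print sum[1] / n[1] - sum[2] / n[2]
        }' "$1" "$2"
}
```

For the sample data, avg_diff ifile1.txt ifile2.txt prints -101.429 (100 minus 201.428571..., rounded by awk's default output format).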

Your awk commands compute the averages, but bash won't do floating-point arithmetic. You can use bc for the subtraction, though:
$ echo "$ave1 - $ave2" | bc
-101.429
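Also, arithmetic expressions need $(( ... )), not $( ... ): $(ave1-ave2) tries to run a command named ave1-ave2. Even $(( ... )) only handles integers, which is why bc is needed here.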
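Putting those two points together, here is a sketch of the original script with the subtraction done in awk instead of bc, which may help if bc is not installed. The helper names ave and diff_of_averages are hypothetical, and the missing-value test assumes the markers are always exactly 9999 or 9000.

```shell
# ave FILE (hypothetical helper): mean of column 1, skipping blank lines
# and the 9999/9000 markers; prints 0 if there is no valid data.
ave() {
    awk 'NF && $1 != 9999 && $1 != 9000 { sum += $1; count++ }
         END { print (count ? sum / count : 0) }' "$1"
}

# diff_of_averages FILE1 FILE2 (hypothetical helper): bash arithmetic is
# integer-only, so the subtraction is also done in awk, as an alternative to bc.
diff_of_averages() {
    ave1=$(ave "$1")
    ave2=$(ave "$2")
    awk -v a="$ave1" -v b="$ave2" 'BEGIN { print a - b }'
}
```

Note that ave2 is already rounded to six significant digits when awk prints it, just as in the bc version.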

Related

How to write awk code with a specific condition

I want to write code that processes a column of numbers. I only want to change the negative numbers, making them positive by multiplying each one by itself (squaring it).
Example data:
10
11
-12
-13
-14
Expected output:
10
11
144
169
196
This is what I've been trying:
awk 'int($0)<0 {$4 = int($0) + 360}
END {print $4}' data.txt
but I don't get any output at all. Can anyone help me?
awk '$0 < 0 { $0 = $0 * $0 } 1' data.txt
The first rule squares the value when it is negative. The pattern 1 is always true, and a pattern without an action prints the current line, so every line (modified or not) is printed.
Also:
awk '{print ($0<0 ? $0*$0 : $0)}' input
$ awk '{print $0 ^ (/-/ ? 2 : 1)}' file
10
11
144
169
196
You could also match only lines that consist of a minus sign followed by digits, and square those:
awk '{print (/^-[0-9]+$/ ? $0 * $0 : $0)}' data.txt
Output
10
11
144
169
196
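If the numbers sit in one column of a multi-column file, the same idea works on a chosen field. This is a sketch assuming whitespace-separated columns; square_neg_col is a hypothetical name. Assigning to a field makes awk rebuild the line with OFS (a single space by default), so modified lines lose any extra spacing they had.

```shell
# square_neg_col N [FILE...] (hypothetical helper)
# Squares the value in column N when it is negative; other lines are
# printed unchanged. Reads stdin when no FILE is given.
square_neg_col() {
    col=$1; shift
    awk -v col="$col" '$col < 0 { $col = $col * $col } 1' "$@"
}
```

For the single-column example, square_neg_col 1 data.txt gives the expected output.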

Sum of two maximum patterns in linux file

I am new to Linux and need help with a command.
I have a file with the following values:
2-1
2-10
2-11
2-12
2-2
2-3
1-1
1-10
1-11
1-2
1-3
1-9
The output I need is 23: the sum of the maximum values for the 1- and 2- prefixes, i.e. 11 from 1-11 plus 12 from 2-12.
awk -F"-" 'BEGIN{a=0; b=0;} {if(int($1)==1 && int($2)>a){a=int($2)}; if(int($1)==2 && int($2)>b){b=int($2)}}END{print a+b}' file
output:
23
Another awk solution, using the ternary operator:
awk -v FS='-' '{m1=($1==1?(m1>$2?m1:$2):m1);m2=($1==2?(m2>$2?m2:$2):m2)}END{print m1+m2}' file
sort + awk pipeline:
sort -t- -k2 -n file | awk -F'-' '{a[$1]=$2}END{ print a[1]+a[2] }'
The output:
23
$ awk -F'-' '{max[$1] = ($2 > max[$1] ? $2 : max[$1])} END{for (key in max) sum+=max[key]; print sum}' file
23
$ awk -F- 'a[$1]<$2{a[$1]=$2}END{for(i in a)s+=a[i]; print s}' infile
23
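The last two answers sum the maximum of every prefix in the file. If the file may also contain other prefixes (for example a line like 3-100), here is a sketch that only sums the prefixes you list. sum_max_for and its space-separated key list are hypothetical, and the sketch seeds each maximum from the first value seen for that prefix instead of relying on an unset variable being 0.

```shell
# sum_max_for "KEYS" FILE (hypothetical helper)
# For lines like PREFIX-NUMBER, sums the maximum NUMBER of each listed
# prefix; a listed prefix that never appears contributes 0.
sum_max_for() {
    awk -F- -v keys="$1" '
        BEGIN { n = split(keys, want, " ") }                   # e.g. "1 2"
        !($1 in max) || $2 + 0 > max[$1] { max[$1] = $2 + 0 }  # max per prefix
        END {
            for (i = 1; i <= n; i++) s += max[want[i]]
            print s + 0
        }' "$2"
}
```

With the sample file, sum_max_for "1 2" file prints 23.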

Sum all the numbers in a file given by positional parameter

I want to sum all the numbers in a file (across all columns and lines), where the file is given as the first parameter, but my program shows sum=sum+$i instead of the numeric sum:
sum=0;
file=$1
for i in $file
do
sum=sum+$i;
done;
echo "The sum is: " $sum
Input file:
$ cat file.txt
10 20 10
40
50
Expected output:
The sum is: 130
Is there perhaps an awk way to solve this?
Try this:
$ cat file1.txt
10 20 10
40
50
$ awk '{for(i=1;i<=NF;i++) {sum+=$i}} END {print sum}' file1.txt
130
OR
$ xargs < file1.txt | tr ' ' + | bc
130
cat file.txt | xargs | sed -e 's/\ /+/g' | bc
You can also use a simple read loop and an array to sum the values, relying on word splitting with the default IFS (Internal Field Separator) to separate the values into an array, e.g.:
#!/bin/bash
declare -i sum=0
fn="${1:-/dev/stdin}" ## read from file as 1st argument (default stdin)
while read -r line; do ## read each line
a=( $line ) ## separate values into array
for i in "${a[@]}"; do ## for each value in array
((sum += i)) ## add to sum
done
done <"$fn"
echo "sum: $sum"
Example Input File
$ cat dat/numfile.txt
10 20 10
40
50
Example Use/Output
$ bash sumnumfile.sh dat/numfile.txt
sum: 130
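Another option for some awks (at least mawk and gawk), which treat a multi-character RS as a regular expression. Because every non-digit character separates records, this only works for non-negative integers: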
$ awk -v RS="[^0-9]" '{s+=$1}END{print s}' file
130
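For reference, the original loop iterates over the file name rather than the file's contents, and sum=sum+$i assigns a string instead of adding. Here is a sketch of a corrected POSIX sh version of that loop. It relies on word splitting of $(cat ...) and on $(( ... )) integer arithmetic, so it handles integers only; the function name sum_file is hypothetical.

```shell
# sum_file FILE (hypothetical helper): sums all whitespace-separated
# integers in FILE, across all lines and columns.
sum_file() {
    sum=0
    set -f                     # disable globbing so a stray * is not expanded
    for i in $(cat "$1"); do   # unquoted on purpose: split into words
        sum=$((sum + $i))      # arithmetic expansion, integers only
    done
    set +f                     # re-enable globbing
    echo "The sum is: $sum"
}
```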

Difference between column sum in file 1 and column sum in file 2

I have this input
file 1 file 2
A 10 222 77.11 11
B 20 2222 1.215 22
C 30 22222 12.021 33
D 40 222222 145.00 44
The output I need is (11+22+33+44) - (10+20+30+40) = 110 - 100 = 10.
Thank you in advance for your help.
Here you go:
paste file1.txt file2.txt | awk '{s1+=$5; s2+=$2} END {print s1-s2}'
Or better yet (credit to @falsetru's answer for the clever idea of doing the summing with a single variable):
paste file1.txt file2.txt | awk '{sum+=$5-$2} END {print sum}'
If you want to work with column N in file1 and column M in file2, this might be "easier", but it is less efficient. Here N=2 and M=3, passed in with -v; the <( ... ) process substitution requires bash or another shell that supports it:
paste <(awk -v c=2 '{print $c}' file1.txt) <(awk -v c=3 '{print $c}' file2.txt) | awk '{sum+=$2-$1} END {print sum}'
It's easier in the sense that you don't have to work out where the second file's column ends up in the pasted output, but less efficient because of the extra awk subprocesses.
Using awk:
$ cat test.txt
file 1 file 2
A 10 222 77.11 11
B 20 2222 1.215 22
C 30 22222 12.021 33
D 40 222222 145.00 44
$ tail -n +2 test.txt | awk '{s += $5 - $2} END {print s}'
10
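If the two files are separate rather than already side by side, awk can read both directly without paste. This sketch assumes, from the layout above, that file 1 has two columns (a letter and a value) and file 2 has three, with the wanted values in the third; col_sum_diff is a hypothetical name. The NR == FNR test is only true while the first file is being read, so an empty first file would break it.

```shell
# col_sum_diff FILE1 FILE2 (hypothetical helper)
# Prints sum(FILE2 column 3) - sum(FILE1 column 2).
col_sum_diff() {
    awk 'NR == FNR { s1 += $2; next }   # first file: sum column 2
         { s2 += $3 }                   # second file: sum column 3
         END { print s2 - s1 }' "$1" "$2"
}
```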

Print $i unless $i is less than 10, using awk or otherwise

I have some data with a series of values on each line like this:
49.01024263 49.13389087 49.38177387 (more numbers...)
42.71585143 43.48711477 44.25625756 (etc.)
43.18826160 43.15332580 43.13094893
30.69076014 28.74489096 26.85725970
Eventually the numbers drop below 10; at that point I'd like to delete all the remaining numbers on that line.
So far I have this, but it's returning several errors:
awk '{for (i=1;i++)do{if ($i > 10.0 ) print $i ; next ; else ; exit}}' input > output
What could I be doing wrong?
Any better ways to carry out this task?
Try this line:
awk '{for(i=1;i<=NF;i++)if($i>10)printf "%s ",$i;else break;print ""}' file
Test with an example:
kent$ cat f
30 20 15 9 8
50 40 30 20 7 2000
100 200 300 400 5 444
kent$ awk '{for(i=1;i<=NF;i++)if($i>10)printf "%s ",$i;else break;print ""}' f
30 20 15
50 40 30 20
100 200 300 400
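The answer above keeps only values strictly greater than 10 and leaves a trailing space on each line. Here is a sketch that follows the title more literally: it stops at the first value less than 10 (so exactly 10 is kept) and joins the kept fields with single spaces. keep_until_below and its LIMIT argument are hypothetical.

```shell
# keep_until_below LIMIT FILE (hypothetical helper)
# Prints each line's fields up to, but not including, the first field
# below LIMIT, joined with single spaces and no trailing space.
keep_until_below() {
    awk -v lim="$1" '{
        out = ""
        for (i = 1; i <= NF; i++) {
            if ($i + 0 < lim + 0) break        # first value below the limit
            out = (i == 1 ? $i : out " " $i)
        }
        print out
    }' "$2"
}
```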
