Using integers only (no floating point), is there a way to determine which of two fractions is greater?
For example, say we have these two fractions:
1000/51 ≈ 19.61 and 1000/52 ≈ 19.23
With floating-point numbers the first fraction is obviously greater; however, with integers only, both fractions equal 19. How might one find out which is greater without using floating-point math?
I have tried to get the remainder using the % operator, but it does not seem to work in all cases.
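One integer-only approach that builds on the remainder idea: compare the integer parts first and, when they tie, compare the fractional parts by flipping them over (for positive values, r/b > s/d exactly when d/s > b/r). This repeats Euclid-style until the quotients differ. The sketch below assumes non-negative numerators and positive denominators; `compare_fractions` is a hypothetical helper name. Cross-multiplying (`a*d` vs `c*b`) is the simpler alternative when the products cannot overflow.

```python
def compare_fractions(a: int, b: int, c: int, d: int) -> int:
    """Hypothetical helper: return 1, 0 or -1 as a/b is greater than,
    equal to or less than c/d.

    Assumes a, c >= 0 and b, d > 0. Only integer divmod is used, and no
    intermediate value exceeds the inputs, so in a fixed-width integer
    language the loop cannot overflow.
    """
    while True:
        qa, ra = divmod(a, b)  # a/b = qa + ra/b
        qc, rc = divmod(c, d)  # c/d = qc + rc/d
        if qa != qc:
            return 1 if qa > qc else -1
        # Integer parts tie, so compare the fractional parts ra/b and rc/d.
        if ra == 0 or rc == 0:
            if ra == rc:
                return 0  # both fractions are the same whole number
            return 1 if ra > 0 else -1
        # For positive values ra/b > rc/d exactly when d/rc > b/ra, so keep
        # going with the reciprocals (the denominators shrink each round).
        a, b, c, d = d, rc, b, ra


print(compare_fractions(1000, 51, 1000, 52))    # 1
print(compare_fractions(1213, 109, 1245, 115))  # 1
print(compare_fractions(1, 2, 2, 4))            # 0
```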
1/2 can be thought of as one apple given to two people, so each person gets 0.5 apple.
So 1000/51 can be thought of as 1000 apples given to 51 people.
1000/51 > 1000/52, because the number of apples is the same but we want to share them among more people.
That is a simple example; here is a more complex one:
1213/109 vs. 1245/115: which is greater?
1245 is greater than 1213 and 115 is greater than 109. Take the differences:
1245 - 1213 = 32 and 115 - 109 = 6. Because 1245/115 = (1213 + 32)/(109 + 6), its value always lies between 1213/109 and 32/6, so 32/6 can replace 1245/115: compare 1213/109 to 32/6.
32/6 ≈ 5, which is less than 6, and 6*109 = 654 < 1213, so 1213/109 > 6 > 32/6, and therefore 1213/109 > 1245/115.
1213/109  vs  1245/115
1213/109  vs  32/6     # differences: 1245 - 1213 = 32, 115 - 109 = 6
# compare 1213/109 with 32/6 (which is < 6)
1213 > 109 * 6
# therefore
1213/109 > 1245/115
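The difference argument works because (a + e)/(b + f) always lies between a/b and e/f (it is their mediant), so comparing a/b with e/f decides the original comparison; for positive denominators that test is simply a*f > e*b. A minimal sketch, assuming positive denominators with c >= a and d > b as in the example; `greater_by_difference` is a hypothetical name:

```python
def greater_by_difference(a: int, b: int, c: int, d: int) -> bool:
    """Hypothetical helper: return True if a/b > c/d, following the
    difference trick. Assumes a >= 0, b > 0, c >= a and d > b."""
    e, f = c - a, d - b  # e.g. 1245 - 1213 = 32 and 115 - 109 = 6
    # c/d = (a + e)/(b + f) lies between a/b and e/f, so a/b > c/d exactly
    # when a/b > e/f, which for positive denominators is a*f > e*b.
    return a * f > e * b


print(greater_by_difference(1213, 109, 1245, 115))  # True: 1213*6 > 32*109
print(greater_by_difference(1000, 51, 1000, 52))    # True
```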
I was following this question to address a similar situation:
How to Calculate Area Under the Curve in Spotfire?
My data is in the following format:
PLANT  OBS_DATE_RECORDED  TRAIT_VALUE  period
A      3/16/2021          225          A3/16/2021
A      3/23/2021          227          A3/23/2021
A      3/30/2021          220          A3/30/2021
A      4/7/2021           240          A4/7/2021
A      4/13/2021          197          A4/13/2021
A      4/20/2021          197          A4/20/2021
A      4/27/2021          218          A4/27/2021
B      3/16/2021          253          B3/16/2021
B      3/23/2021          274          B3/23/2021
B      3/30/2021          271          B3/30/2021
B      4/7/2021           257          B4/7/2021
B      4/13/2021          250          B4/13/2021
A      4/20/2021          241          A4/20/2021
B      4/27/2021          255          B4/27/2021
Following the answer's formula as a calculated column:
([TRAIT_VALUE] + Avg([TRAIT_VALUE]) over (Intersect(NextPeriod([period]),[PLANT]))) / 2 * (Avg([OBS_DATE_RECORDED]) over (Intersect(NextPeriod([period]),[PLANT])) - [OBS_DATE_RECORDED])
However, the results don't appear correct.
AUDPC
1603.19:59:59.928
1608.17:59:59.956
2924.20:0:0.100
7732.21:0:0.000
1395.14:41:44.404
1461.23:30:0.050
-4393.7:59:59.712
I think the problem might be the date format, but I don't understand the formula well enough to troubleshoot it. In Excel I usually compute the AUDPC with SUMPRODUCT, multiplying the number of days between two consecutive dates by the average TRAIT_VALUE over those two dates.
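For reference, the SUMPRODUCT-style calculation described above (days between consecutive observations times the average of the two TRAIT_VALUEs, summed per plant) is the trapezoidal rule. Below is a minimal Python sketch, using only plant A's first seven rows from the table as sample data; `audpc_by_plant` is a hypothetical name. Note the explicit conversion of each date difference to a plain number of days: the Spotfire results above appear to be formatted as time spans (days.hours:minutes:seconds), which suggests the date difference there is not being converted to a number.

```python
from collections import defaultdict
from datetime import date


def audpc_by_plant(rows):
    """Hypothetical helper: trapezoidal AUDPC per plant.

    rows: iterable of (plant, observation_date, trait_value) tuples.
    Returns {plant: sum over consecutive dates of days * mean value}.
    """
    by_plant = defaultdict(list)
    for plant, day, value in rows:
        by_plant[plant].append((day, value))

    result = {}
    for plant, obs in by_plant.items():
        obs.sort()  # chronological order within each plant
        total = 0.0
        for (d0, y0), (d1, y1) in zip(obs, obs[1:]):
            days = (d1 - d0).days  # interval as a plain number of days
            total += days * (y0 + y1) / 2
        result[plant] = total
    return result


rows = [
    ("A", date(2021, 3, 16), 225),
    ("A", date(2021, 3, 23), 227),
    ("A", date(2021, 3, 30), 220),
    ("A", date(2021, 4, 7), 240),
    ("A", date(2021, 4, 13), 197),
    ("A", date(2021, 4, 20), 197),
    ("A", date(2021, 4, 27), 218),
]
print(audpc_by_plant(rows))  # {'A': 9129.0}
```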
This might sound bonkers, but I'm looking for ideas on how to do this.
I have N categories (say 7) into which a set number of people (say 1000) have to be allocated. From historical data I know the minimum and maximum for each category. The historical data is limited (say 15 samples); with a larger sample I would try to generate a distribution for each category from all the samples, but there isn't one. The data looks like this:
-Year 1: [78 97 300 358 132 35 0]
-Year 2: [24 74 346 300 148 84 22]
-.
-.
-Year 15:[25 85 382 302 146 52 8]
The min and max for each category over these 15 years of data is:
Min: [25 74 252 278 112 27 0 ]
Max: [132 141 382 360 177 84 22]
I am trying to scale this up by simulation: allocating the 1000 people across the categories within the min and max limits, and repeating this many times. The only condition is that the allocations across the seven categories in each simulation must sum to 1000.
Any ideas would be greatly appreciated!
The distribution you want is called the multinomial distribution. You can use the RandMultinomial function in SAS/IML to produce random samples from the multinomial distribution. To use the multinomial distribution, you need to know the probability of an individual in each category. If this probability has not changed over time, the best estimate of this probability is to take the average proportion in each category.
Thus, I would recommend using ALL the data to estimate the probability, not just max and min:
proc iml;
X = {...}; /* X is a 15 x 7 matrix of counts, each row is a year */
mean = mean(X);
p = mean / sum(mean);
/* simulate new counts by using the multinomial distribution */
numSamples = 10;
SampleSize = 1000;
Y = randmultinomial(numSamples, SampleSize, p);
print Y;
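For readers working in Python rather than SAS/IML, the same mean-based approach can be sketched with NumPy's `Generator.multinomial`. Only the three years printed in the question are included below (the other twelve are elided there), so the resulting probabilities are illustrative only; `simulate_allocations` is a hypothetical helper.

```python
import numpy as np


def simulate_allocations(X, num_samples=10, sample_size=1000, seed=None):
    """Hypothetical helper: draw multinomial allocations whose probabilities
    are the mean proportions of the historical counts X
    (rows = years, columns = categories)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)   # average count per category
    p = mean / mean.sum()   # estimated probability of each category
    rng = np.random.default_rng(seed)
    # Each row of the result is one simulated year and sums to sample_size.
    return rng.multinomial(sample_size, p, size=num_samples)


X = [
    [78, 97, 300, 358, 132, 35, 0],    # Year 1
    [24, 74, 346, 300, 148, 84, 22],   # Year 2
    [25, 85, 382, 302, 146, 52, 8],    # Year 15
]
Y = simulate_allocations(X, seed=1)
print(Y)
print(Y.sum(axis=1))  # every entry is 1000
```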
Now, if you insist on using the max/min, you could use the midrange to estimate the most likely value and use that to estimate the probability, as follows:
Min = {25 74 252 278 112 27 0};
Max = {132 141 382 360 177 84 22};
/* use midrange to estimate probabilities */
midrange = (Min + Max)/2;
p = midrange / sum(midrange);
/* now use RandMultinomial, as before */
If you use the second method, there is no guarantee that the simulated values will not exceed the Min/Max values, although in practice many of the samples will obey that criterion.
Personally, I advocate the first method, which uses the average count. Or you can use a time-weighted count, if you think recent observations are more relevant than observations from 15 years ago.
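To see how often the midrange-based probabilities stay inside the historical limits, one can simulate many years and count them. Keeping only the in-range draws (simple rejection sampling, which is not part of the answer above) is one way to enforce the Min/Max constraint if it must hold strictly. A NumPy sketch; the seed and the number of draws are arbitrary choices:

```python
import numpy as np

Min = np.array([25, 74, 252, 278, 112, 27, 0])
Max = np.array([132, 141, 382, 360, 177, 84, 22])

# Midrange as the most likely value, normalized into probabilities.
midrange = (Min + Max) / 2
p = midrange / midrange.sum()

rng = np.random.default_rng(2021)  # arbitrary seed
Y = rng.multinomial(1000, p, size=10_000)  # 10,000 simulated years

# A draw is acceptable only if every category lies within [Min, Max].
inside = np.all((Y >= Min) & (Y <= Max), axis=1)
print(f"{inside.mean():.1%} of the draws respect all limits")

accepted = Y[inside]  # rejection step: keep only the in-range draws
```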
I am trying to normalize my data, and I have tried the conventional scaling techniques readily available in sklearn. However, I am looking to implement something called decimal scaling.
I read about it in this research paper, and it looks like a technique that can improve the results of a neural-network regression. As I understand it, this is what needs to be done:
Suppose the range of attribute X is −4856 to 28. The maximum absolute value of X is 4856.
To normalize by decimal scaling I will need to divide each value by 10000 (c = 4). In this case, −4856 becomes −0.4856 while 28 becomes 0.0028.
So for all values: new value = old value / 10^c, where c is the smallest integer such that the largest absolute new value is below 1.
How can I write this as a Python function that normalizes all the features (column by column) in my data set?
Input:
A B C
30 90 75
56 168 140
28 84 70
369 1107 922.5
485 1455 1212.5
4856 14568 12140
40 120 100
56 168 140
45 135 112.5
78 234 195
899 2697 2247.5
Output:
A B C
0.003 0.0009 0.00075
0.0056 0.00168 0.0014
0.0028 0.00084 0.0007
0.0369 0.01107 0.009225
0.0485 0.01455 0.012125
0.4856 0.14568 0.1214
0.004 0.0012 0.001
0.0056 0.00168 0.0014
0.0045 0.00135 0.001125
0.0078 0.00234 0.00195
0.0899 0.02697 0.022475
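A vectorized sketch of the rule with NumPy, assuming c should be the smallest integer that brings every absolute value in a column below 1, i.e. c = floor(log10(max|x|)) + 1. Under that rule column C, whose largest value is 12140, needs c = 5. `decimal_scale` is a hypothetical name, and all-zero columns simply stay zero.

```python
import numpy as np


def decimal_scale(X):
    """Hypothetical helper: decimal scaling per column. Divides by 10**c,
    where c is the smallest integer with max(|x|) / 10**c < 1.
    Returns (scaled, c)."""
    X = np.asarray(X, dtype=float)
    max_abs = np.abs(X).max(axis=0)
    # Avoid log10(0) for all-zero columns; they stay zero after scaling.
    safe = np.where(max_abs > 0, max_abs, 1.0)
    c = np.floor(np.log10(safe)).astype(int) + 1  # digits before the point
    return X / 10.0 ** c, c


data = np.array([
    [30, 90, 75], [56, 168, 140], [28, 84, 70], [369, 1107, 922.5],
    [485, 1455, 1212.5], [4856, 14568, 12140], [40, 120, 100],
    [56, 168, 140], [45, 135, 112.5], [78, 234, 195], [899, 2697, 2247.5],
])
scaled, c = decimal_scale(data)
print(c)          # [4 5 5]
print(scaled[5])  # the row with the largest values; all end up below 1
```

Using log10 rather than counting characters keeps the digit count correct for float columns such as C, where `str()` would also count the decimal point and decimals.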
Thank you all for asking questions that led me to think about my problem more clearly and break it into steps. I have arrived at a solution. Here's what it looks like:
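import math

def Dec_scale(df):
    for x in df:
        # Largest absolute value, so negative columns are handled too.
        p = df[x].abs().max()
        # Digits before the decimal point; len(str(p)) would also count
        # "-", "." and decimals for float columns such as C.
        q = math.floor(math.log10(p)) + 1 if p > 0 else 0
        df[x] = df[x] / 10**q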
I hope this solution looks agreeable!
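import math
import numpy as np

def decimal_scaling(df):
    df_abs = abs(df)
    max_valus = df_abs.max()  # largest absolute value in each column
    log_num = []
    for i in range(max_valus.shape[0]):
        # number of digits before the decimal point of the column maximum
        log_num.append(int(math.log10(max_valus.iloc[i])) + 1)
    log_num = np.array(log_num)
    log_num = [pow(10, number) for number in log_num]
    X_full = df / log_num  # divides each column by its own power of 10
    return X_full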
I have a file with 4 columns separated by spaces, like the one below:
1_86500000 50 1_87500000 19
1_87500000 13 1_89500000 42
1_89500000 25 1_90500000 10
1_90500000 3 1_91500000 11
1_91500000 23 1_92500000 29
1_92500000 34 1_93500000 4
1_93500000 39 1_94500000 49
1_94500000 35 1_95500000 26
2_35500000 1 2_31500000 81
2_31500000 12 2_4150000 50
The first and third columns are not in phase, so I cannot simply divide the value of one by the other.
Since each key occurs in $1, in $3, or in both, a solution would be to look up each key in both columns and divide the value next to it in $3 by the value next to it in $1, treating a missing value as 0, as this expected result shows:
P.S. The second field in this expected result is only there to illustrate the division.
1_86500000 0/50 0
1_87500000 19/13 1.46154
1_89500000 42/25 1.68
1_90500000 10/3 3.333
1_91500000 11/23 0.47826
1_92500000 29/34 0.85294
1_93500000 4/39 0.10256
1_94500000 49/35 1.4
2_35500000 0/1 0
2_31500000 81/12 6.75
2_4150000 50/0 50
I have not achieved anything by myself other than this, so I do not have a starting point yet.
I tried separating the fields joined by _ to see if I could match rows by subtracting the coordinates: a difference of 0 would mean the columns were in phase. But I could not go further.
awk '{if( ($5-$2)==0) print $1,$2,$3,$4,$5,$6}' file
I also tried to match both columns, but I only got the rows that were already in phase:
awk '{if(($1==$3)) print $1,$4/$2}' file
Can you help me?
awk to the rescue!
$ awk '{d[$1]=$2; n[$3]=$4}
END {for(k in n)
if(k in d) {print k,n[k]"/"d[k],n[k]/d[k]; delete d[k]}
else print k,n[k]"/0",n[k];
for(k in d) print k,"0/"d[k],0}' file | sort
1_86500000 0/50 0
1_87500000 19/13 1.46154
1_89500000 42/25 1.68
1_90500000 10/3 3.33333
1_91500000 11/23 0.478261
1_92500000 29/34 0.852941
1_93500000 4/39 0.102564
1_94500000 49/35 1.4
1_95500000 26/0 26
2_31500000 81/12 6.75
2_35500000 0/1 0
2_4150000 50/0 50
Your division-by-zero convention is a little strange, though!
Explanation: keep two arrays, one for numerators and one for denominators. Once the file has been scanned, go over the numerator array, find the corresponding denominator, and do the division. For numerators without a denominator, and for the denominators left unused, apply the given conventions.
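The same two-table idea translated to Python, as a sketch: values paired with $1 are denominators, values paired with $3 are numerators, and the conventions from the expected output are applied (a missing numerator gives 0, a missing denominator keeps the numerator). It assumes every line has exactly four fields and no denominator is 0; `ratios` and `SAMPLE` are hypothetical names.

```python
# Hypothetical sample: the input lines from the question.
SAMPLE = """\
1_86500000 50 1_87500000 19
1_87500000 13 1_89500000 42
1_89500000 25 1_90500000 10
1_90500000 3 1_91500000 11
1_91500000 23 1_92500000 29
1_92500000 34 1_93500000 4
1_93500000 39 1_94500000 49
1_94500000 35 1_95500000 26
2_35500000 1 2_31500000 81
2_31500000 12 2_4150000 50
"""


def ratios(lines):
    """Hypothetical helper: sorted (key, value) pairs, dividing the $4 value
    by the $2 value recorded for the same key."""
    den, num = {}, {}
    for line in lines:
        k1, v1, k3, v3 = line.split()
        den[k1] = float(v1)  # value paired with column 1
        num[k3] = float(v3)  # value paired with column 3
    out = []
    for k in sorted(den.keys() | num.keys()):
        if k in num and k in den:
            out.append((k, num[k] / den[k]))
        elif k in num:
            out.append((k, num[k]))  # "n/0" convention: keep the numerator
        else:
            out.append((k, 0.0))     # "0/d" convention: 0
    return out


for key, value in ratios(SAMPLE.splitlines()):
    print(key, round(value, 5))
```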
170! approaches the limit of a floating point double: 171! will overflow.
However 170! is over 300 digits long.
There is, therefore, no way that 170! can be represented precisely in floating point.
Yet Excel returns the correct answer for 170! / 169!.
Why is this? I'd expect some error to creep in, but it returns an integral value. Does Excel somehow know how to optimise this calculation?
If you find the closest doubles to 170! and 169!, they are
double oneseventy = 5818033100654137.0 * 256;
double onesixtynine = 8761273375102700.0;
times the same power of two. The closest double to the quotient of these is exactly 170.0.
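The claim about the quotient is easy to check with IEEE 754 doubles: multiplying both values by the same power of two does not change their quotient (barring overflow), and division is correctly rounded. A short Python check; the `math.factorial` line assumes, as the answer states, that the values above are the nearest doubles to 170! and 169!.

```python
import math

# Significands from the answer; both true values carry an extra common
# power-of-two factor, which cancels in the quotient.
oneseventy = 5818033100654137.0 * 256
onesixtynine = 8761273375102700.0
print(oneseventy / onesixtynine == 170.0)  # True

# Python converts big ints to the nearest double, so this divides the
# closest doubles to 170! and 169! directly.
print(float(math.factorial(170)) / float(math.factorial(169)))
```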
Also, Excel may compute 170! by multiplying 169! by 170.
William Kahan has a paper called "How Futile are Mindless Assessments of Roundoff in Floating-Point Computation?" where he discusses some of the insanity that goes on in Excel. It may be that Excel is not computing 170 exactly, but rather it's hiding an ulp of reality from you.
tmyklebu's answer is already perfect, but I wanted to know more.
What if the implementation of n! were something as trivial as return double(n)*(n-1)!?
Here is a Smalltalk snippet; it can be translated into many other languages, but that's not the point:
(2 to: 170) count: [:n |
| num den |
den := (2 to: n - 1) inject: 1.0 into: [:p :e | p*e].
num := n*den.
num / den ~= n].
And the answer is 12.
So you have not been particularly lucky: thanks to the good properties of the round-to-nearest-even rounding mode, only 12 of these 169 numbers don't behave as expected.
Which ones? Replace count: by select: and you get:
#(24 47 59 61 81 96 101 104 105 114 122 146)
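A Python translation of the Smalltalk experiment, as a sketch. Python floats are IEEE 754 doubles with round-to-nearest-even, and the loop multiplies in the same left-to-right order as `inject:into:`, so it lists the values of n for which the naive quotient is not exactly n. `naive_failures` is a hypothetical name.

```python
def naive_failures(limit=170):
    """Hypothetical helper: values n in 2..limit where n*(n-1)! / (n-1)! != n
    in double precision, with (n-1)! built by repeated float multiplication."""
    failures = []
    for n in range(2, limit + 1):
        den = 1.0
        for e in range(2, n):  # same order as (2 to: n - 1) inject: 1.0 into:
            den *= e
        num = n * den
        if num / den != n:
            failures.append(n)
    return failures


bad = naive_failures()
print(len(bad), bad)
```

Up to 22! every factorial is exactly representable as a double, so no failure can appear below n = 23.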
If I had Excel handy, I would ask it to evaluate 146!/145!.
Curiously (only apparently curiously), a less naive solution that computes the exact factorial with large-integer arithmetic and then converts it to the nearest float does not perform better!
(2 to: 170) reject: [:n |
n factorial asFloat / (n-1) factorial asFloat = n]
leads to:
#(24 31 34 40 41 45 46 57 61 70 75 78 79 86 88 92 93 111 115 116 117 119 122 124 141 144 147 164)
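The less naive variant translates just as directly, because Python's `math.factorial` returns an exact integer and `float()` rounds it to the nearest double (ties to even). Again a sketch, with a hypothetical function name:

```python
import math


def rounded_exact_failures(limit=170):
    """Hypothetical helper: values n in 2..limit where
    float(n!) / float((n-1)!) != n."""
    return [
        n
        for n in range(2, limit + 1)
        if float(math.factorial(n)) / float(math.factorial(n - 1)) != n
    ]


print(rounded_exact_failures())
```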