How to get Apache commons math SummaryStatistics to use a better minimum?

org.apache.commons.math4.stat.descriptive.SummaryStatistics
SummaryStatistics appears to use a naive implementation of min(). It uses the default constructor of the internal container, which defaults to a zero value. If all values in your data set are greater than zero, the statistics will never represent the true minimum.
I'm hoping there is a way to initialize it with a known value to avoid this, but I am not seeing that. Is there a way around this without using my own implementation for statistics?
thanks

SummaryStatistics uses the Min univariate statistic object to compute minimums.
Based on the implementation for the 3.6.1 release, Min is initialized to Double.NaN.
When adding new values to SummaryStatistics, Min checks if a new value is less than the current minimum as well as checks if the current minimum is Double.NaN. If either of those conditions is true, the new value becomes the minimum.
In short, SummaryStatistics correctly computes the minimum even when all added values are positive.
As example:
SummaryStatistics summary = new SummaryStatistics();
System.out.println("Initial Minimum (should be NaN): " + summary.getMin());
summary.addValue(10.0);
System.out.println("First Value Minimum (should be 10): " + summary.getMin());
summary.addValue(5.0);
System.out.println("Smaller Value Minimum (should be 5): " + summary.getMin());
summary.addValue(20.0);
System.out.println("Larger Value Minimum (should be 5): " + summary.getMin());
generates the following output:
Initial Minimum (should be NaN): NaN
First Value Minimum (should be 10): 10.0
Smaller Value Minimum (should be 5): 5.0
Larger Value Minimum (should be 5): 5.0

Related

Round Function Erratic -- Micropython

I am working with an MPU-6050 3-axis accelerometer and using this code to read the current Z axis value with 1/10 second between readings:
az=round(imu.accel.z,2) + 0.04 (0.04 is the calibration value)
print(str(az))
Most times the value displayed with the print statement is correct (i.e., 0.84). But sometimes the value printed is the full seven-decimal place value (0.8400001). Is there a way to correct this so the two-decimal place value is displayed consistently?
Simply perform the math with the calibration value first, and round after:
az=round(float(imu.accel.z) + 0.04,2)
print(str(az))
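The reason the order matters is binary floating point: 0.04 has no exact binary representation, so adding it after rounding reintroduces a tiny representation error, which then shows up when the value is printed. A minimal sketch of the two orders of operation (the literal 0.8 stands in for a sensor reading):

```python
raw = 0.8  # stand-in for a rounded sensor reading

# Rounding first, then adding the calibration offset reintroduces
# binary floating-point representation error in the displayed value:
print(round(raw, 2) + 0.04)   # may display as 0.8400000000000001

# Adding first and rounding last keeps the display at two decimals:
print(round(raw + 0.04, 2))   # 0.84
```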

Should Grad-Cam attributions be greater than 1?

I'm using the captum library to calculate LayerGradCam.
layer_gc = LayerGradCam(model, model.layer4)
attr = layer_gc.attribute(x, class_idx, relu_attributions=True)
Some values in attr have values greater than 1. Is this supposed to be the case?
If it is, then is it valid to minmax normalize the attributions in a batch?
I'm using these attributions to calculate Dice loss using pixel maps where the max value is 1. So when the attributions are greater than 1, Dice loss becomes negative, which is not valid.
So,
Question 1: Are the attributions supposed to be going over 1 or am I doing something wrong?
Question 2: If they're supposed to go over 1, is it valid to normalize the them per batch?
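For what per-batch min-max normalization would look like, here is a sketch with a made-up numpy batch standing in for the captum attributions (whether normalizing is appropriate for your Dice loss is a separate judgment call):

```python
import numpy as np

# Hypothetical attribution maps for a batch of two samples; some
# values exceed 1, as in the question.
attr = np.array([[0.5, 2.0],
                 [1.5, 3.0]])

# Min-max normalize over the whole batch so values land in [0, 1].
a_min, a_max = attr.min(), attr.max()
attr_norm = (attr - a_min) / (a_max - a_min)
print(attr_norm.min(), attr_norm.max())  # 0.0 1.0
```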

why is np.exp(x) not equal to np.exp(1)**x

Why is np.exp(x) not equal to np.exp(1)**x?
For example:
np.exp(400)
>>>5.221469689764144e+173
np.exp(1)**400
>>>5.221469689764033e+173
np.exp(400)-np.exp(1)**400
>>>1.1093513018771065e+160
This difference comes from the way numpy evaluates the two expressions.
Recall how Euler's number is defined in math:
e = (1 + 1/n)**n as n → ∞.
I think numpy stops at a certain order: the numpy exp documentation here is not very clear about how the exponential is calculated.
Because that order is not infinity, you get this small difference between the two calculations.
Indeed, the value np.exp(400) can be compared to the limit formula (1 + 400/n)**n with a large but finite n:
>>> n = 1000000000000
>>> (1 + 400/n)**n
5.221642085428121e+173
>>> numpy.exp(400)
5.221469689764144e+173
Here n = 1000000000000, which is still very small compared to infinity, and that produces a relative difference of the order of 1e-5.
Indeed, there is no exact value of Euler's number: like pi, you can only have an approximate value.
It looks like a rounding issue. In the first case it's internally using a very precise value of e, while in the second you get a slightly less precise value of e, and when it is raised to the 400th power the precision issues become more apparent.
The actual result when using the Windows calculator is 5.2214696897641439505887630066496e+173, so you can see your first outcome is fine, while the second is not.
5.2214696897641439505887630066496e+173 // calculator
5.221469689764144e+173 // exp(400)
5.221469689764033e+173 // exp(1)**400
Starting from your result, it looks like it's using a value with 15 digits of precision.
2.7182818284590452353602874713527 // e
2.7182818284590450909589085441968 // 400th root of the 2nd result
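The size of the discrepancy can be checked directly; on a typical platform it comes out to a relative error of a few parts in 1e14, i.e. a handful of rounding errors accumulated over the repeated multiplications (exact digits may vary by platform):

```python
import numpy as np

a = np.exp(400)       # exp evaluated in one call
b = np.exp(1) ** 400  # a rounded value of e raised to the 400th power
rel_err = abs(a - b) / a
print(rel_err)        # on the order of 1e-14 on a typical platform
```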

Very large float in python

I'm trying to construct a neural network for the MNIST database. When computing the softmax function I receive an error along the lines of "you can't store a float that size".
The code is as follows:
def softmax(vector):  # REQUIRES a unidimensional numpy array
    adjustedVals = [0] * len(vector)
    totalExp = np.exp(vector)
    print("totalExp equals")
    print(totalExp)
    totalSum = totalExp.sum()
    for i in range(len(vector)):
        adjustedVals[i] = (np.exp(vector[i])) / totalSum
    return adjustedVals  # this throws back an error sometimes?!?!
After inspection, most recommend using the decimal module. However when I've messed around with the values being used in the command line with this module, that is:
from decimal import Decimal
import math
test = Decimal(math.exp(720))
I receive a similar error for any values which are math.exp(>709).
OverflowError: (34, 'Numerical result out of range')
My conclusion is that even decimal cannot handle this number. Does anyone know of another method I could use to represent these very large floats?
There is a technique which makes the softmax function more feasible computationally for this kind of value distribution in your vector. Namely, you can subtract the maximum value in the vector (let's call it x_max) from each of its elements. If you recall the softmax formula, this operation doesn't affect the outcome, as it reduces to multiplying the result by e^(x_max) / e^(x_max) = 1. This way the highest intermediate value you get is e^(x_max - x_max) = 1, so you avoid the overflow.
For additional explanation I recommend the following article: https://nolanbconaway.github.io/blog/2017/softmax-numpy
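The max-subtraction trick described above can be sketched as follows (a rewrite of the question's function, not its original code):

```python
import numpy as np

def softmax(vector):
    # Subtracting the max leaves the result unchanged but caps the
    # largest intermediate at e**0 == 1, avoiding overflow.
    shifted = vector - np.max(vector)
    exps = np.exp(shifted)
    return exps / exps.sum()

x = np.array([720.0, 719.0, 0.0])  # plain np.exp would overflow here
print(softmax(x))  # finite probabilities that sum to 1
```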
With a value above 709 the function 'math.exp' exceeds the floating point range and throws this overflow error.
If, instead of math.exp, you use numpy.exp for such large exponents you will see that it evaluates to the special value inf (infinity).
All that apart, I wonder why you would want to produce such a big number (I'm not sure you are aware how big it is; just to give you an idea, the number of atoms in the universe is estimated to be in the range of 10 to the power of 80, and the number you are trying to produce is MUCH larger than that).

Min function while using strings in python

I'm using the chisquare test for my data, appending the results in a loop as shown below.
My .txt file has 180 rows with strings like those below. Now I want to find the minimum value among those 180 rows, which is contained in parentheses, as in the example (15.745037950673217,), but I don't want to lose the information in the rest of that row's string, which is 201701241800 Chi for 75 degree model.
...
201701241800 Chi for 75 degree model (15.745037950673217,)
201701241800 Chi for 76 degree model (16.014744332924252,)
...
The code I use looks like this:
o = chisquare(f_obs=fin, f_exp=y)
rows = str(Date) + str(Start_time_hours_format) + str(Start_time_minutes_format) + " Chi for {} degree model ".format(r) + str(o[0:1])
table.append(rows)
The problem is that the number of those calculations is enormous. My task is to find the minimum value in each iteration, which is defined by a for loop. The example above came from one iteration (there are 180 degree models in each iteration). The problem is I cannot use min(table) because it contains strings, but I cannot erase them, because that information is important. Do you have any ideas how to find the min value here? I mean specifically the min value in parentheses.
If you have a list lst, min(lst) returns the minimum value without modifying the list. If you don't have a list, but objects from which you want to consider a value, let's say obj[i].myvalue, then you can do something like
min_val = 1000  # a huge number, much bigger than your expected values
for o in obj:
    if o.myvalue < min_val:
        min_val = o.myvalue
which assigns the minimum value to min_val (probably not the best way, but it works for sure; note that naming the variable min would shadow the built-in min function).
[I would be more specific, but it is not clear what kind of object you have to find the minimum of. Please consider updating your question to be more explicit.]
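An alternative sketch, assuming the rows always end with the "(value,)" text shown in the question: min accepts a key function, so the numeric part can drive the comparison while the full string is kept intact.

```python
import re

# Example rows in the format shown in the question.
table = [
    "201701241800 Chi for 75 degree model (15.745037950673217,)",
    "201701241800 Chi for 76 degree model (16.014744332924252,)",
]

def chi_value(row):
    # Extract the float inside the trailing "(...,)" tuple text.
    return float(re.search(r"\(([^,]+),\)", row).group(1))

best = min(table, key=chi_value)
print(best)  # the row with the smallest chi value, string intact
```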
Ok, so I've found a way to solve this problem. Code below:
o = chisquare(f_obs=fin, f_exp=y)
rows = str(Date) + str(Start_time_hours_format) + str(Start_time_minutes_format) + " Chi for {} degree model ".format(r) + str(o[0:1])
print(rows)
table.append(rows)
with open('measurements.txt', 'a') as j:
    j.write(min(table))
    j.write('\n')
# no j.close() needed: the with block closes the file automatically
