Why does the RuntimeWarning encountered depend on the position of variables? - python-3.x

In the following code I receive the message "RuntimeWarning: invalid value encountered in log". I know this message appears when the values passed to log are too small. But why does it depend on the position of the variable? In the code below, when defining s, if I use np.log(Q[j]/np.log(P[j])) I get the warning, but if I replace the numerator with the denominator the message disappears. Why is that?
import numpy as np

Q = np.array([0., 0., 2.02575004])
P = np.array([0.90014722, 0.93548378, 0.92370304])
for i in range(len(Spectrum_bins)):  # Spectrum_bins is defined elsewhere in the asker's code
    for j in range(len(P)):
        if Q[j] != 0:
            s = P[j] * np.log(Q[j] / np.log(P[j]))
            print(s)

Because the values of P are all below 1, np.log(P[j]) is negative, which makes the quotient Q[j] / np.log(P[j]) negative as well. The logarithm of a negative number is not a real number, so numpy returns nan (Not a Number) and emits the RuntimeWarning.
That is where the first warning comes from.
To address your second question, I assume you are changing the expression to
np.log(np.log(P[j]) / np.log(P[j]))
which is the natural log of 1, which equals 0. That is a real number, so no warning is raised.
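A minimal sketch reproducing both cases (array values taken from the question):

import numpy as np

Q = np.array([0., 0., 2.02575004])
P = np.array([0.90014722, 0.93548378, 0.92370304])

# np.log(P[2]) is negative because P[2] < 1, so Q[2] / np.log(P[2]) is
# negative and the outer log returns nan with the RuntimeWarning.
print(np.log(Q[2] / np.log(P[2])))           # nan

# Replacing the numerator with the denominator gives log(1) = 0: no warning.
print(np.log(np.log(P[2]) / np.log(P[2])))   # 0.0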

Related

Round Function Erratic -- MicroPython

I am working with an MPU-6050 3-axis accelerometer, using this code to read the current Z-axis value with 1/10 second between readings:
az = round(imu.accel.z, 2) + 0.04  # 0.04 is the calibration value
print(str(az))
Most of the time the value displayed by the print statement is correct (e.g., 0.84), but sometimes the printed value carries the full seven decimal places (0.8400001). Is there a way to correct this so the two-decimal-place value is displayed consistently?
Simply perform the math with the calibration value first and round afterwards:
az = round(float(imu.accel.z) + 0.04, 2)
print(str(az))
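The root cause is binary floating-point representation: rounding first and then adding 0.04 can land on a value with no exact binary representation, which rounding the sum avoids. A minimal sketch with a made-up raw reading standing in for imu.accel.z (on a MicroPython port with 32-bit floats the stray digits show up as 0.8400001 rather than the CPython output below):

raw = 0.8012                   # hypothetical raw accelerometer reading

# Round first, then add: the sum reintroduces representation error.
print(round(raw, 2) + 0.04)    # 0.8400000000000001

# Add the calibration first, then round: two decimals, consistently.
print(round(raw + 0.04, 2))    # 0.84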

why is np.exp(x) not equal to np.exp(1)**x

Why is np.exp(x) not equal to np.exp(1)**x?
For example:
np.exp(400)
>>>5.221469689764144e+173
np.exp(1)**400
>>>5.221469689764033e+173
np.exp(400)-np.exp(1)**400
>>>1.1093513018771065e+160
This difference is raised by the way numpy approximates the exponential.
Indeed, you have to recall how Euler's number is defined in mathematics:
e = (1 + 1/n)**n as n goes to infinity.
I think numpy stops at a certain order:
The numpy exp documentation here is not very clear about how Euler's number is calculated.
Because that order is not infinity, you get this small difference between the two calculations.
Under that assumption, the value np.exp(400) would be calculated as (1 + 400/n)**n:
>>> (1 + 400/n)**n
5.221642085428121e+173
>>> numpy.exp(400)
5.221469689764144e+173
Here n = 1000000000000, which is still small compared to infinity and produces this difference of about 1e-5.
Indeed, there is no exact value of Euler's number; like pi, you can only have an approximate value.
It looks like a rounding issue. In the first case numpy internally uses a very precise implementation of exp, while in the second you start from a less precise value of e, and when it is raised to the 400th power the precision issues become more apparent.
The actual result computed with the Windows calculator is 5.2214696897641439505887630066496e+173, so you can see your first outcome is fine, while the second is not.
5.2214696897641439505887630066496e+173 // calculator
5.221469689764144e+173 // exp(400)
5.221469689764033e+173 // exp(1)**400
Starting from your result, it looks like the second computation uses a value of e with about 16 significant digits, the precision of a 64-bit float.
2.7182818284590452353602874713527 // e
2.7182818284590450909589085441968 // 400th root of the 2nd result
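A quick way to confirm the rounding explanation is to redo both computations at higher precision with Python's decimal module (a sketch; 50 digits is an arbitrary choice, and 2.718281828459045 is the repr of np.exp(1)):

from decimal import Decimal, getcontext

getcontext().prec = 50
e_exact = Decimal(1).exp()            # e to 50 significant digits
e_float = Decimal(2.718281828459045)  # the 64-bit float value of e

print(e_exact ** 400)  # 5.2214696897641439505...E+173, matches np.exp(400)
print(e_float ** 400)  # 5.2214696897640...E+173, matches np.exp(1)**400

The tiny relative error in the 64-bit value of e (about 1 part in 10**16) is amplified 400-fold by the exponentiation, which is exactly the size of the discrepancy observed above.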

Very large float in python

I'm trying to construct a neural network for the MNIST database. When computing the softmax function I receive an error along the lines of "you can't store a float that size".
code is as follows:
import numpy as np

def softmax(vector):  # REQUIRES a unidimensional numpy array
    adjustedVals = [0] * len(vector)
    totalExp = np.exp(vector)
    print("totalExp equals")
    print(totalExp)
    totalSum = totalExp.sum()
    for i in range(len(vector)):
        adjustedVals[i] = np.exp(vector[i]) / totalSum
    return adjustedVals  # this throws back an error sometimes?!?!
After some searching, most recommendations are to use the decimal module. However, when I've experimented with the values in question on the command line using this module, that is:
from decimal import Decimal
import math
test = Decimal(math.exp(720))
I receive a similar error for any call math.exp(x) with x > 709:
OverflowError: (34, 'Numerical result out of range')
My conclusion is that even decimal cannot handle this number. Does anyone know of another method I could use to represent these very large floats?
There is a technique which makes the softmax function more feasible computationally for this kind of value distribution in your vector. Namely, you can subtract the maximum value in the vector (let's call it x_max) from each of its elements. If you recall the softmax formula, this operation doesn't affect the outcome, as it reduces to multiplying the result by e^(x_max) / e^(x_max) = 1. This way the highest intermediate value you get is e^(x_max - x_max) = 1, so you avoid the overflow.
For additional explanation I recommend the following article: https://nolanbconaway.github.io/blog/2017/softmax-numpy
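A minimal sketch of that technique (plain numpy; the function name is my own):

import numpy as np

def stable_softmax(vector):
    shifted = vector - np.max(vector)  # largest exponent becomes 0
    exps = np.exp(shifted)             # every value is in (0, 1], no overflow
    return exps / exps.sum()

print(stable_softmax(np.array([720.0, 1.0, 0.5])))  # no inf, sums to 1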
With a value above 709, math.exp exceeds the double-precision floating-point range and throws this overflow error.
If, instead of math.exp, you use numpy.exp for such large exponents, you will see that it evaluates to the special value inf (infinity).
All this aside, I wonder why you would want to produce such a big number (I am not sure you are aware of how big it is; just to give you an idea, the number of atoms in the universe is estimated to be in the range of 10 to the power of 80, and the number you are trying to produce is MUCH larger than that).
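For completeness: the question's Decimal attempt fails only because math.exp(720) overflows before Decimal is even constructed. Decimal has its own exp method with a much larger exponent range, so the number itself is representable (a sketch):

from decimal import Decimal

# The exponentiation happens inside decimal, never touching a 64-bit float.
print(Decimal(720).exp())  # roughly 4.92E+312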

Warning of divide by zero encountered in log2 even after filtering out negative values

We get the warning "divide by zero encountered in log2" when a value is not positive. I am facing the warning even when I exclude the non-positive values using a where statement.
a = pd.Series([1,0,5,6,8])
np.where(a<=0, 1, np.log2(a))
np.where does not short-circuit: both branches are evaluated for the entire array before the selection happens, so np.log2(a) is computed on every element, including the zero. This doesn't actually affect your output, which is correct. To suppress the warning, replace the non-positive values with 1 first, and apply log2 afterwards.
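A sketch of that reordering, using the data from the question:

import numpy as np
import pandas as pd

a = pd.Series([1, 0, 5, 6, 8])

# Substitute 1 for non-positive entries BEFORE calling log2,
# so log2 never sees a zero and no warning is emitted.
print(np.log2(np.where(a <= 0, 1, a)))  # [0. 0. 2.32192809 2.5849625 3.]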
Try the following:
import pandas as pd
import numpy as np

a = pd.Series([1, 0, 5, 6, 8])
for digit in a:
    print(np.where(digit <= 0, 1, np.log2(digit)))
It seems the issue is that if you just do
np.where(a<=0, 1, np.log2(a))
numpy sees the entire series at once rather than iterating through a element by element.

numpy.cov returning a matrix which is not positive semi-definite

I'm calculating a covariance matrix from a 2D array using np.cov, and using it to get nearest neighbors with Mahalanobis distance.
c = np.cov(arr)
neigh = NearestNeighbors(100, metric='mahalanobis', metric_params={'VI': np.linalg.inv(c)})
neigh.fit(dfeatures)
But for some reason, I'm getting
/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py:131: RuntimeWarning: invalid value encountered in sqrt
and the values of the distance of any query point returns nan.
Instead of passing c to NearestNeighbors, if I pass an identity matrix, NearestNeighbors works as expected. I suspected that c might actually not be positive semidefinite, and that the sqrt in the Mahalanobis distance might therefore be getting a negative value as input.
I checked the eigenvalues of the resulting c and many of them turned out to be negative (and complex) but close to 0.
I had a few questions:
Is this entirely because of numerical errors (or am I doing something wrong)?
If it is because of numerical errors, is there a way to fix it?
It turns out this is in fact because of numerical error. A workaround is to add a small number to the diagonal elements of the covariance matrix. The larger this number, the closer the distance will be to the Euclidean distance, so one must be careful when choosing it.
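A sketch of that workaround (the shapes and eps value are illustrative choices, not prescriptions):

import numpy as np

rng = np.random.default_rng(0)
arr = rng.normal(size=(200, 50))      # 200 variables, 50 samples: rank-deficient covariance

c = np.cov(arr)
eps = 1e-6                            # small diagonal "jitter"
c_reg = c + eps * np.eye(c.shape[0])  # shifts every eigenvalue up by eps

# Any eigenvalues that were slightly negative due to rounding become positive,
# so the sqrt inside the Mahalanobis distance no longer sees negative input.
print(np.linalg.eigvalsh(c).min(), np.linalg.eigvalsh(c_reg).min())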
