Why is pyTorch "destroying" my numpy arrays? - pytorch

I am working with numpy arrays of shape (N,2,128,128).
When I try to visualize these as images (I reconstruct them via ifft2), numpy and PyTorch seem to mix things up in a crazy manner.
I have checked with small dummy arrays: when I pass a numpy ndarray to a torch.FloatTensor, the values are exactly the same at the same positions (same shape!), but when I do an ifft2 on the torch tensor, the result is different from the one on the non-torch array! Can someone help me make sense of this?
A small reproducible example is:
import numpy as np
import torch
import matplotlib.pyplot as plt

x=np.random.rand(3,2,2,2)
xTorch=torch.FloatTensor(x)
#visualize them in the interpreter, they are the same!
#
#now show the magnitude of an inverse Fourier transform
plt.imshow(np.abs(np.fft.ifft2(xTorch[0,0,:,:]+1j*xTorch[0,1,:,:])))
plt.show()
plt.imshow(np.abs(np.fft.ifft2(x[0,0,:,:]+1j*x[0,1,:,:])))
plt.show()
#they are not the same! What is the problem!?
I found out that if I use: torch.Tensor.cpu(xTorch).detach().numpy() I can get the same result, but what does that mean?!
P.S.
Also, note that I know the correct visualization is the one with x and not the one with xTorch, so it seems that torch is changing something when I do the ifft2, or when I recombine the 2 channels, or maybe there is a problem/bug with complex numbers.
If you look inside np.abs(np.fft.ifft2(x[0,0,:,:]+1j*x[0,1,:,:])) and the xTorch version, the values are so different that it is not just floating-point error. It is something serious, but I can't figure it out and it's driving me crazy.

Related

Robotic arm python code for 3R spatial partitioned getting errors

Hello everyone, I wrote this sample code for the 3R spatial geometry used in the inverse kinematics of a 6-axis robot.
Although the math looks fine to me, I am getting unexpected values in my result.
import numpy as np
x=795
y=0
z=1264
d1=450
a1=155
a2=614
a3=np.sqrt(200**2+640**2)
print(a3)
#first joint angle
alpha=np.arctan(200/640)
#print(alpha)
theta1=np.arctan(y/x)
v1=np.cos(theta1)-a1
exd=x/v1
ezd=z-d1
v2=exd*exd+ezd*ezd-a2*a2-a3*a3
v3=2*a2*a3
v4=v2/v3
#print('\n',v4)
theta3=-np.arccos(v4)
theta2=np.arctan2(ezd,exd)-np.arctan2((a3*np.sin(theta3)),(a2+(a3*np.cos(theta3))))
theta1t1=theta1*180/np.pi
theta2t2=theta2*180/np.pi
theta3t3=(theta3+(np.pi/2-alpha))*180/np.pi
print(theta1t1,'\n')
print(theta2t2,'\n')
print(theta3t3)
My calculations are based on the following snippets:
[image: illustration]
[image: calculations]
The output is coming out to be
670.5221845696084
0.0
100.9534267368054
52.12008227279429
and I do not understand why.
I tried figuring out the mathematics many times but can't get any other result.
Maybe something is wrong with the interpreted output or I am not handling some cases.
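One way to narrow this kind of problem down is to plug the computed angles back into forward kinematics and see whether they reproduce the target. The sketch below does this for a generic planar two-link arm, using the same law-of-cosines form as the code above. The helper names two_link_ik and two_link_fk are hypothetical, and the target (x - a1, z - d1) is only an assumption about the intended geometry; the question's code computes exd as x/(cos(theta1)-a1) instead, which may be worth comparing against the derivation.

```python
import math

def two_link_ik(px, pz, l1, l2):
    """Inverse kinematics of a planar two-link arm (hypothetical helper)."""
    cos_t3 = (px**2 + pz**2 - l1**2 - l2**2) / (2 * l1 * l2)
    if abs(cos_t3) > 1:
        raise ValueError("target is out of reach")
    t3 = -math.acos(cos_t3)  # same sign convention as the question's theta3
    t2 = math.atan2(pz, px) - math.atan2(l2 * math.sin(t3), l1 + l2 * math.cos(t3))
    return t2, t3

def two_link_fk(t2, t3, l1, l2):
    """Forward kinematics used to check the angles (hypothetical helper)."""
    px = l1 * math.cos(t2) + l2 * math.cos(t2 + t3)
    pz = l1 * math.sin(t2) + l2 * math.sin(t2 + t3)
    return px, pz

# Link lengths taken from the question.
a2 = 614.0
a3 = math.hypot(200.0, 640.0)

# Assumed target in the arm plane: x - a1 and z - d1 (not confirmed by the question).
target_x = 795.0 - 155.0
target_z = 1264.0 - 450.0

theta2, theta3 = two_link_ik(target_x, target_z, a2, a3)
check_x, check_z = two_link_fk(theta2, theta3, a2, a3)
print(math.degrees(theta2), math.degrees(theta3))
print(check_x, check_z)
```

If the forward-kinematics check does not return the target, the error is in the angle formulas; if it does, the error is more likely in how the target coordinates were derived from x, y, z.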

Does fitting a Weibull distribution to data using scipy.stats perform poorly?

I am working on fitting a Weibull distribution to some integer data and estimating the relevant shape, scale, and location parameters. However, I noticed poor performance of the scipy.stats library while doing so.
So I took a different direction and checked the fit performance using the code below. I first generate 100 numbers from a Weibull distribution with parameters shape=3, scale=200, location=1. Subsequently, I estimate the best distribution fit using the fitter library.
from fitter import Fitter
import numpy as np
from scipy.stats import weibull_min
# generate numbers
x = weibull_min.rvs(3, scale=200, loc=1, size=100)
# make them integers
data = np.asarray(x, dtype=int)
# fit one of the four distributions
f = Fitter(data, distributions=["gamma", "rayleigh", "uniform", "weibull_min"])
f.fit()
f.summary()
I expect the best fit to be Weibull distribution. I have tried re-running this test. Sometimes Weibull fit is a good estimate. However, most of the time Weibull fit is reported as the worst result. In this case, the estimated parameters are = (0.13836651040093312, 66.99999999999999, 1.3200752378443505). I assume these parameters correspond to shape, scale, location in order. Below is the summary of the fit procedure.
$ f.summary()
             sumsquare_error          aic          bic  kl_div
gamma               0.001601  1182.739756 -1090.410631     inf
rayleigh            0.001819  1154.204133 -1082.276256     inf
uniform             0.002241  1113.815217 -1061.400668     inf
weibull_min         0.004992  1558.203041  -976.698452     inf
Additionally, the following plot is produced.
Also, Rayleigh distribution is a special case of Weibull with shape parameter = 2. So, I expect the resulting Weibull fit to be at least as good as Rayleigh.
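The Rayleigh/Weibull relationship can be checked numerically. The sketch below assumes SciPy's parameterization, in which a Rayleigh distribution with scale sigma should coincide with weibull_min with shape 2 and scale sigma*sqrt(2); the value of sigma and the evaluation grid are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import rayleigh, weibull_min

sigma = 150.0                        # arbitrary Rayleigh scale, for illustration only
x = np.linspace(0.1, 800.0, 200)     # arbitrary evaluation grid

pdf_rayleigh = rayleigh.pdf(x, scale=sigma)
pdf_weibull = weibull_min.pdf(x, 2, scale=sigma * np.sqrt(2))

# Largest pointwise difference between the two densities.
max_abs_diff = np.max(np.abs(pdf_rayleigh - pdf_weibull))
print(max_abs_diff)
```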
Update
I ran the tests above on Linux/Ubuntu 20.04 machine with numpy version 1.19.2 and scipy version 1.5.2. The code above seems to run as expected and return proper results for Weibull distribution on a Mac machine.
I have also tested fitting a Weibull distribution on data x generated above on the Linux machine by using an R library fitdistrplus as:
fit.weib <- fitdist(x, "weibull")
and observed that the estimated shape and scale values are found to be very close to the initially given values. The best guess so far is that the problem is due to some Python-Ubuntu bug/incompatibility.
I can be considered as a newbie in this area. So, I am wondering, am I doing something wrong here? Or is this result somehow expected? Any help is greatly appreciated.
Thank you.
The fitter library does not allow specifying distribution parameters such as a, loc, etc. Strangely, the fit is better on Mac, while on Linux the result for the best fit is much worse, for the same versions of NumPy and SciPy. Possible underlying reasons include different BLAS/LAPACK implementations on Linux and Mac (https://stackoverflow.com/a/49274049/6806531), weibull_min not initializing the parameter a = 1 (which is discussed online), or the default floating-point accuracy. However, the error can be worked around inside the fitter library. Since weibull_min is exponweib with the parameter a fixed at 1, change the run function inside the _timed_run function in fitter.py to
def run(self):
    try:
        if distribution == "exponweib":
            self.result = func(args, floc=0, fa=1, **kwargs)
        else:
            self.result = func(args, floc=0, **kwargs)
    except Exception as err:
        self.exc_info = sys.exc_info()
and use exponweib in place of weibull_min; this gives nearly the same results as R's fitdist.
I am not familiar with the Fitter library, but in order to draw some conclusions I would suggest:
1. Retry your code with size=10,000. In this case, there are sufficient data points for the fitting methods to use. Theoretically, you would then expect the Weibull to deliver the best fit.
2. I noticed that the location parameter can sometimes be a pain. You could try to run your fits with the location parameter fixed via floc=1 (i.e. equal to the location you sampled with). What do you get? Additionally, FYI, with MLE it suffices to take loc=min(x), where x is your dataset. For the exponential distribution, this is in fact the MLE of the location parameter. For other distributions I am not sure, but I wouldn't be surprised if it holds for them as well. This would reduce the fitting procedure by one parameter.
3. Lastly, I noticed that if you take small values for location/scale/shape for some distributions, the logpdf and logcdf functions of scipy.stats distributions result in np.inf values. In this scenario, you could perhaps use the Powell optimization algorithm and set bounds on the values of your parameters.
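The first two suggestions can be tried directly with scipy.stats, without going through fitter. This is a minimal sketch under a few assumptions: it uses the continuous sample rather than the integer-cast data (a fixed loc must not exceed the smallest data point), and a fixed random_state so the run is repeatable. Note that SciPy's fit returns parameters in the order (shape, loc, scale), so if fitter passes SciPy's tuple through unchanged, the triple in the question would read as shape ≈ 0.138, loc ≈ 67, scale ≈ 1.32.

```python
from scipy.stats import weibull_min

# Continuous sample with the question's parameters, but a larger size.
x = weibull_min.rvs(3, loc=1, scale=200, size=10_000, random_state=0)

# Fit with the location pinned to the value used for sampling.
# SciPy returns the estimates in the order (shape, loc, scale).
shape_hat, loc_hat, scale_hat = weibull_min.fit(x, floc=1)

print("shape:", shape_hat)
print("loc:  ", loc_hat)
print("scale:", scale_hat)
```

With the location fixed, only shape and scale are optimized, which tends to make the fit much less sensitive to the starting values.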

numpy ifft output has much larger power than original signal

I'm having a weird problem using the numpy fft module. I have the following bit of test code:
import numpy as np
import scipy.io.wavfile
import matplotlib.pyplot as plt
fs, a = scipy.io.wavfile.read('test.wav') # import audio file
spectrum = np.fft.fft(a) # create spectrum
b = np.real(np.fft.ifft(spectrum)) # reconstruct signal
# Print power of original and output signal
print(np.average(a**2))
print(np.average(b**2))
It outputs:
1497.887578558565
4397203.934254291
As expected for these values, the output is much louder than the input. The documentation for numpy.fft.ifft states:
"This function computes the inverse of the one-dimensional n-point discrete Fourier transform computed by fft. In other words, ifft(fft(a)) == a to within numerical accuracy."
Thus the signal should be nearly identical. Yet they are obviously not.
What am I doing wrong here?
Okay I managed to find the solution myself in the end.
The problem arises because the output of wavfile.read is an integer array. For some reason, the fft function handles integers in a different manner than floats. The problem is solved by typecasting a to an np.float64 type.
Why this happens is still not quite clear to me though.
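A likely explanation, assuming the WAV file holds 16-bit PCM samples so that wavfile.read returns an int16 array: the FFT round trip itself is fine, but a**2 is evaluated in int16 and silently wraps around, so the "power" of the original signal is the wrong number. The sketch below uses a synthetic int16 array as a stand-in for the audio data, since the original test.wav is not available.

```python
import numpy as np

# Synthetic stand-in for 16-bit PCM samples (assumption: the WAV data is int16).
rng = np.random.default_rng(0)
a = rng.integers(-3000, 3000, size=1024).astype(np.int16)

spectrum = np.fft.fft(a)               # the transform is computed in floating point
b = np.real(np.fft.ifft(spectrum))     # reconstructed signal, float64

roundtrip_ok = np.allclose(a, b)       # the samples themselves are recovered

power_int16 = np.average(a**2)                       # a**2 stays int16 and can wrap around
power_float = np.average(a.astype(np.float64)**2)    # squares computed in float64
power_b = np.average(b**2)

print(roundtrip_ok)
print(power_int16, power_float, power_b)
```

Under this assumption, casting a to np.float64 fixes the comparison because the squares no longer overflow, not because fft treats integers differently.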

handling large non-sparse matrices for computing SVD

I have a large matrix (right now about 450000 x 50, and it might get even larger) whose SVD I want to compute. The matrix isn't sparse, and numpy can't seem to handle it: it exits with a MemoryError.
I tried using np.float16 and it didn't help. Python's table package can't seem to help either (since I need to use the whole matrix later to find eigenvalues).
Do any of you have an idea how I can compute and use such massive matrices?
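One thing worth checking, assuming np.linalg.svd was called with its defaults: with full_matrices=True (the default) the left factor U has shape (M, M), which for M = 450000 would need roughly 1.6 TB in float64. Passing full_matrices=False requests the reduced SVD, where U has the same shape as the input. The sketch below uses a small random matrix as a stand-in for the real data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 50))    # small stand-in for the 450000 x 50 matrix

# Reduced ("economy") SVD: U is (M, K) instead of (M, M), with K = min(M, N).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)      # (2000, 50) (50,) (50, 50)

# The reduced factors still reconstruct A.
reconstruction_ok = np.allclose(A, (U * s) @ Vt)
print(reconstruction_ok)
```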

Can I avoid using `Theano.scan`?

I have 3-dimensional tensor ("tensor3" -- an array of matrices), and I'd like to compute the determinant (theano.sandbox.linalg.det) of each matrix. Is there a way to compute each determinant without using theano.scan? When I try calling det directly on the tensor I get the error
3-dimensional array given. Array must be two-dimensional.
But I read that scan is slow and doesn't parallelize well, and that one should use only tensor operations if possible. Is that so? Can I avoid using scan in this case?
I see 3 possibilities:
1. If you know the number of matrices in the tensor3 variable before compiling the Theano function, you could use the split() op or just call det() on each matrix in the tensor3.
2. If you don't know the shape, you can make your own op that loops over the input and calls the numpy function. See for an example on how to make an op.
3. Use scan. It is easy to use for this case. See this example, just change the call from tensordot to det().
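Whichever option is used, the result can be checked against NumPy. This is not Theano code, only a reference computation: np.linalg.det accepts a stack of matrices and returns one determinant per matrix, which is what a split- or scan-based Theano graph should reproduce. The array shape below is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(0)
stack = rng.standard_normal((4, 3, 3))   # stand-in for a tensor3 of four 3x3 matrices

# Explicit loop over the leading axis, analogous to what scan would do.
dets_loop = np.array([np.linalg.det(m) for m in stack])

# NumPy applies det over the leading dimension directly.
dets_batched = np.linalg.det(stack)

print(dets_loop)
print(dets_batched)
```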
