numpy ifft output has much larger power than original signal - python-3.x

I'm having a weird problem using the numpy fft module. I have the following bit of test code:
import numpy as np
import scipy.io.wavfile
import matplotlib.pyplot as plt
fs, a = scipy.io.wavfile.read('test.wav') # import audio file
spectrum = np.fft.fft(a) # create spectrum
b = np.real(np.fft.ifft(spectrum)) # reconstruct signal
# Print power of original and output signal
print(np.average(a**2))
print(np.average(b**2))
It outputs:
1497.887578558565
4397203.934254291
As these values suggest, the output is much louder than the input. The documentation for numpy.fft.ifft states:
"This function computes the inverse of the one-dimensional n-point discrete Fourier transform computed by fft. In other words, ifft(fft(a)) == a to within numerical accuracy."
Thus the signal should be nearly identical. Yet they are obviously not.
What am I doing wrong here?

Okay, I managed to find the solution myself in the end.
The problem arises because the output of wavfile.read is an integer array. For some reason the calculation behaves differently for integer input than for float input, and the problem is solved by casting a to np.float64.
Why this happens is still not quite clear to me, though.
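A minimal sketch of the fix (assuming a 16-bit WAV file): cast to float before doing any arithmetic. This also hints at the likely cause: squaring a 16-bit integer array overflows and wraps around, so the power printed for the original signal comes out too small, while b is float64 and its power is computed correctly.
import numpy as np
import scipy.io.wavfile
fs, a = scipy.io.wavfile.read('test.wav') # import audio file
a = a.astype(np.float64) # cast to float before any arithmetic
spectrum = np.fft.fft(a) # create spectrum
b = np.real(np.fft.ifft(spectrum)) # reconstruct signal
# The two power values should now agree to within numerical accuracy
print(np.average(a**2))
print(np.average(b**2))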

Related

Why is pyTorch "destroying" my numpy arrays?

I am working with numpy tensors of shape (N,2,128,128).
When trying to visualize these as images (I reconstruct via ifft2), numpy and pyTorch seem to mix things up in a crazy manner ...
I have checked with small dummy arrays: when I pass a numpy ndarray to a torch.FloatTensor, the values are exactly the same at the same positions (same shape!), but when I do an ifft2 on the torch tensor, the result is different from the one on the non-torch tensor! Can someone help me make sense of this?
A small reproducible example is:
import numpy as np
import torch
import matplotlib.pyplot as plt
x = np.random.rand(3,2,2,2)
xTorch = torch.FloatTensor(x)
# Visualize them in the interpreter: they are the same!
# Now show the magnitude of an inverse Fourier transform
plt.imshow(np.abs(np.fft.ifft2(xTorch[0,0,:,:]+1j*xTorch[0,1,:,:])))
plt.show()
plt.imshow(np.abs(np.fft.ifft2(x[0,0,:,:]+1j*x[0,1,:,:])))
plt.show()
# They are not the same! What is the problem?!
I found out that if I use torch.Tensor.cpu(xTorch).detach().numpy() I can get the same result, but what does that mean?!
P.S.
Also, note that I know the correct visualization is the one from x and not from xTorch, so it seems that torch is changing something when I do the ifft2, or when I reconstruct the two channels... or maybe there is a problem/bug with complex numbers ...
If you look inside np.abs(np.fft.ifft2(x[0,0,:,:]+1j*x[0,1,:,:])) and the xTorch version, the values are so different that it is not just floating-point error; it is something serious, but I can't figure it out and it's driving me crazy.
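A sketch of the workaround mentioned above: convert the tensor back to a plain NumPy array before calling np.fft.ifft2, so NumPy is not handed a torch tensor directly. Note that torch.FloatTensor is float32, so small precision differences against the float64 original are still expected.
import numpy as np
import torch
import matplotlib.pyplot as plt
x = np.random.rand(3,2,2,2)
xTorch = torch.FloatTensor(x)
# Convert back to a NumPy array (float32) before the FFT call
xBack = xTorch.cpu().detach().numpy()
plt.imshow(np.abs(np.fft.ifft2(xBack[0,0,:,:]+1j*xBack[0,1,:,:])))
plt.show()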

Wav audio level is too large

I have a mono wav file of a 'glass breaking' sound. When I display its levels graphically in Python using the librosa library, it shows a very large range of amplitudes, between +/- 20000 instead of +/- 1. When I open the same wav file with Audacity, the levels are between +/- 1.
My question is: what causes this difference in displayed amplitude levels, and how can I correct it in Python? Min-max scaling will distort the sound and I want to avoid it if possible.
The code is:
from scipy.io import wavfile
fs1, glass_break_data = wavfile.read('test_break_glass_normalized.wav')
%matplotlib inline
import matplotlib.pyplot as plt
import librosa.display
sr=44100
x = glass_break_data.astype('float')
plt.figure(figsize=(14, 5))
librosa.display.waveplot(x, sr=sr)
These are the images from the notebook and Audacity:
WAV usually uses integer values to represent individual samples, not floats. So what you see in the librosa plot is accurate for a 16 bit/sample audio file.
Programs like VLC show the format, including bit depth per sample in their info dialog, so you can easily check.
Another way to check the format might be using soxi or ffmpeg.
Audacity normalizes everything to floats in the range of -1 to 1—it does not show you the original format.
The same is true for librosa.load()—it also normalizes to [-1,1]. wavfile.read() on the other hand, does not normalize. For more info on ways to read WAV audio, please see for example this answer.
If you use librosa.load instead of wavfile.read, it will normalize the range to [-1, 1]:
glass_break_data, fs1 = librosa.load('test_break_glass_normalized.wav')
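Alternatively, if you want to keep wavfile.read, here is a minimal sketch of scaling the integer samples yourself (assuming 16-bit PCM data, whose samples span [-32768, 32767]):
from scipy.io import wavfile
import numpy as np
fs1, glass_break_data = wavfile.read('test_break_glass_normalized.wav')
# Divide by the int16 full-scale value to get floats roughly in [-1, 1]
x = glass_break_data.astype(np.float64) / 32768.0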

Generating a 2d mesh file from points

I have to generate a 2D mesh in a format compatible with optimesh, in order to refine it with the algorithms included in that library (in particular centroidal Voronoi tessellation smoothing). I'm starting from a set of unordered points, so I'm trying to understand the easiest chain of tools to do the job. I have no familiarity at all with geometry processing, so forgive me if my questions are stupid.
I found a lot of libraries for processing a mesh from a file in a huge variety of formats, but I'm missing how to generate one from points.
I've seen that with scipy I can get a triangulation, but the object returned by scipy can't be fed directly to optimesh.
So, my problem now is basically something like this:
import numpy as np
from scipy.spatial import Delaunay,delaunay_plot_2d
points = np.random.random((100,2))
delaun = Delaunay(points)
#Magic code that I wish
delaun.to_meshfile('meshfile.xxx')
#
with a file format that I can process later with optimesh.
optimesh author here. Your delaun object has delaun.points and delaun.simplices. Those can be fed into optimesh:
import numpy as np
from scipy.spatial import Delaunay, delaunay_plot_2d
import optimesh
points = np.random.random((100, 2))
delaun = Delaunay(points)
points, cells = optimesh.cvt.quasi_newton_uniform_blocks(
delaun.points, delaun.simplices, tol=1.0e-5, max_num_steps=100
)
If you really want to store them in a file, check out meshio.
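If you go the meshio route, here is a rough sketch of that last step (the exact API may differ between meshio versions):
import meshio
# points: (n, 2) coordinates, cells: (m, 3) triangle indices from the snippet above
meshio.write_points_cells('meshfile.vtk', points, [('triangle', cells)])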

Reconstruction of original image using the Laplacian Filter output

I have applied a Laplacian filter to an image in order to detect its edges.
import matplotlib.pyplot as plt
from skimage import filters
# 'image' is assumed to be loaded earlier, e.g. with skimage.io.imread
output = filters.laplace(image)
plt.imshow(output, cmap='gray')
plt.title('Laplace', size=20)
plt.show()
I need to reconstruct the original image using the output obtained from the code above.
I'm not sure if 'filters.inverse' works or if there is any other method available.
What you are looking for is called deconvolution. If you search for "scikit-image deconvolution", you will probably land at the documentation for the Richardson-Lucy deconvolution function, or at this example usage. Note: it is not always theoretically possible to reconstruct the original signal (it's a little bit like unmixing paint), but you can get reasonable approximations, especially if your convolution is exactly known.
You can look at the source code for the Laplace filter, where you see that the image is convolved with a laplacian kernel. That is the kernel we need to deconvolve the image. (Note that you can always regenerate the kernel by convolving an image containing just a 1 in the center and 0s everywhere else. That's why the kernel in deconvolution is referred to as the point-spread function.)
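A quick way to see the point-spread-function idea in practice (a small sketch, assuming the default 3x3 kernel of filters.laplace):
import numpy as np
from skimage import filters
# An impulse image: a single 1 surrounded by zeros
impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0
# Filtering the impulse returns the filter's kernel (its point-spread function)
psf = filters.laplace(impulse, ksize=3)
print(psf[1:4, 1:4]) # central 3x3 block: the Laplacian kernel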
So, to restore your image:
from skimage.restoration.uft import laplacian
from skimage.restoration import richardson_lucy
kernel_size = 3 # default for filters.laplace, increase if needed
# laplacian() returns (transfer function, spatial kernel); keep the spatial kernel
_, kernel = laplacian(output.ndim, (kernel_size,) * output.ndim)
restored = richardson_lucy(output, kernel)

n_components doesn't seem to truncate the number of components calculated

I'm trying to perform Kernel Principal Component Analysis (KPCA) on a large data set; I will want to find the pre-image after removing the low-energy/high-entropy components.
I would have assumed that specifying the n_components parameter would prevent the n x n calculation (and its storage), but that doesn't seem to be the case; at least kpca.alphas_ and kpca.lambdas_ still have n x n components calculated and stored.
Is there something I'm doing wrong, or can this function not operate similarly to TruncatedSVD?
I've read up on streaming KPCA approaches that would ease the memory and processing-time issue, but then I would need to work out a way to form the pre-image, which I don't feel well equipped to do.
from sklearn.decomposition import KernelPCA as KPCA
from sklearn.datasets import make_blobs as mb
import numpy as np
X, y = mb(n_samples=400, cluster_std=[1, 2, .25, .5, 0.1], centers=5, n_features=2)
kpca = KPCA(kernel='rbf', fit_inverse_transform=True, gamma=10, n_components=50)
Xk = kpca.fit_transform(X)
print(np.shape(kpca.lambdas_))
It occurred to me that telling sklearn to fit the inverse might also require all eigenvalues/eigenvectors to be calculated.
Without that parameter, it performs the same way as TruncatedSVD.
I suppose I'll need to find or devise a pre-image approximation scheme after all.
If you know of any, feel free to post in the comments.
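A minimal sketch of that check, comparing how many eigenvalues are stored with and without fit_inverse_transform (lambdas_ is the attribute name in older scikit-learn releases; newer ones call it eigenvalues_):
from sklearn.decomposition import KernelPCA
from sklearn.datasets import make_blobs
import numpy as np
X, y = make_blobs(n_samples=400, cluster_std=[1, 2, .25, .5, 0.1], centers=5, n_features=2)
# Truncated fit: only n_components eigenpairs requested
kpca_trunc = KernelPCA(kernel='rbf', gamma=10, n_components=50).fit(X)
# Fit that also learns the inverse transform for pre-images
kpca_inv = KernelPCA(kernel='rbf', fit_inverse_transform=True, gamma=10, n_components=50).fit(X)
# Compare the stored spectra on your scikit-learn version
print(np.shape(kpca_trunc.lambdas_), np.shape(kpca_inv.lambdas_))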
