Fastest way to find indices of a 3-D numpy array - python-3.x

I have a 3-D numpy array and I need to find the indices of all locations whose values are greater than 0.
voxel_space = np.random.rand(100, 240, 180)
I currently use the following numpy method to solve my problem.
vol_vectors = np.argwhere(voxel_space > 0)
While this works, it is quite slow for my application. I was wondering if there was a faster way to do the same.
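For reference, np.argwhere(a) is documented as equivalent to np.transpose(np.nonzero(a)), so if whatever consumes the indices can work with a tuple of per-axis index arrays instead of an (N, 3) array, calling np.nonzero directly and skipping the transpose is one thing worth trying. A minimal sketch (timings depend on your data, so this is not a guaranteed speed-up):
import numpy as np

voxel_space = np.random.rand(100, 240, 180)

# argwhere gives an (N, 3) array of coordinates, one row per matching voxel
coords = np.argwhere(voxel_space > 0)

# nonzero gives a tuple of three 1-D index arrays (one per axis),
# which avoids stacking them into a single 2-D array
ii, jj, kk = np.nonzero(voxel_space > 0)

# both describe the same locations
assert np.array_equal(coords, np.stack([ii, jj, kk], axis=1))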

Related

np.where issue above a certain value (#Numpy)

I'm facing two issues in the following snippet using np.where (looking for the indices where A[:, 0] is identical to a value in B):
1. a Numpy error when n is above a certain value:
DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
2. the code is quite slow.
So I'm wondering what I'm missing and/or misunderstanding, how to fix it, and how to speed up the code. This is a basic example I've made to mimic my code, but in fact I'm dealing with arrays having (dozens of) millions of rows.
Thanks for your support
Paul
import numpy as np
import time
n=100_000 # with n=10_000 it works but is quite slow
m=2_000_000
#matrix A
# A=np.random.random ((n, 4))
A = np.arange(1, 4*n+1, dtype=np.uint64).reshape((n, 4), order='F')
#Matrix B
B=np.random.randint(1, m+1, size=(m), dtype=np.uint64)
B=np.unique(B) # duplicate values are generally generated, so the actual size ends up lower than m
# use of np.where
t0=time.time()
ind=np.where(A[:, 0].reshape(-1, 1) == B)
# ind2=np.where(B == A[:, 0].reshape(-1, 1))
t1=time.time()
print(f"duration={t1-t0}")
In your current implementation, A[:, 0] is just
np.arange(1, n + 1, dtype=np.uint64)
And if you are interested only in the row indices where A[:, 0] is in B, then you can get them like this (with first_col_of_A = A[:, 0]):
row_indices = np.where(np.isin(first_col_of_A, B))[0]
If you then want to select the rows of A with these indices, you don't even have to convert the boolean mask to index locations. You can just select the rows with the boolean mask: A[np.isin(first_col_of_A, B)]
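As a rough, self-contained illustration (using the same sizes as in the question; exact timings will of course vary): the broadcasted == tries to materialize an (n, len(B)) comparison matrix, which is what makes it slow and memory-hungry, while np.isin only needs a boolean mask of length n:
import numpy as np
import time

n = 100_000
m = 2_000_000
A = np.arange(1, 4 * n + 1, dtype=np.uint64).reshape((n, 4), order='F')
B = np.unique(np.random.randint(1, m + 1, size=m, dtype=np.uint64))

t0 = time.time()
mask = np.isin(A[:, 0], B)          # boolean mask of length n
row_indices = np.where(mask)[0]     # convert to index locations if needed
print(f"np.isin duration = {time.time() - t0:.3f} s, matches = {len(row_indices)}")

selected_rows = A[mask]             # or select the rows directly with the mask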
There are better ways to select random elements from an array. For example, you could use numpy.random.Generator.choice with replace=False. Also, Numpy: Get random set of rows from 2D array.
I feel there is almost certainly a better way to do the whole thing that you are trying to do with these index locations.
I recommend you study the Numpy User Guide and the Pandas User Guide to see what cool things are available there.
Honestly, with your current implementation you don't even need np.isin at all, because A[:, 0] holds the values 1..n in order, so the row index of a value v is simply v - 1. Here:
row_indices = B[B <= n] - 1
row_indices.sort()
print(row_indices)

Flip zeros (with probability alpha) and ones (with a probability beta) of a numpy array?

I am working with a numpy array A of size 15000x15000. I want to flip zeros of this array with a probability alpha and ones of this array with a probability beta. Is there a better way to do this?
cn = 0
n = A.shape[0]
ans = np.zeros((n, n))
for i in A:
    zeros = np.where(i == 0)[0]
    ones = np.where(i == 1)[0]
    fl_zeros = list(np.random.choice(zeros, round(len(zeros) * alpha), replace=False))
    fl_ones = list(np.random.choice(ones, round(len(ones) * beta), replace=False))
    ind = np.array(fl_zeros + fl_ones).astype(np.int32)
    tmp = np.zeros(n)
    np.put_along_axis(tmp, ind, 1.0, axis=0)
    ans[cn] = np.logical_xor(i, tmp)
    cn += 1
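One common way to vectorize this kind of random flipping is to draw a single uniform random array for the whole matrix and XOR it with a flip mask. This is only a sketch: it assumes A contains only 0s and 1s, and it flips each entry independently with probability alpha or beta rather than flipping an exact per-row count as the loop above does:
import numpy as np

rng = np.random.default_rng()
n = 1000                      # 15000 in the question; smaller here so the example runs quickly
alpha, beta = 0.1, 0.05       # hypothetical flip probabilities
A = rng.integers(0, 2, size=(n, n))

# per-element flip probability: alpha where A == 0, beta where A == 1
p_flip = np.where(A == 0, alpha, beta)
flip_mask = rng.random(A.shape) < p_flip

# XOR with the mask flips exactly the selected entries
ans = np.logical_xor(A, flip_mask).astype(A.dtype)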

Efficient way to find unique numpy arrays from a large set having the same shape and dtype

I have a large set (~ 10000) of numpy arrays, (a1, a2, a3,...,a10000). Each array has the same shape (10, 12) and all are of dtype = int. In any row of any array, the 12 values are unique.
Now, there are many doubles, triples, etc. I suspect only about a tenth of the arrays are actually unique (i.e., the duplicates have the same values in the same positions).
Could I get some advice on how I might isolate the unique arrays? I suspect numpy.array_equal will be involved, but I'm new enough to the language that I'm struggling with how to implement it.
numpy.unique can be used to find the unique elements of an array. Supposing your data is contained in a list, first stack the arrays to generate a 3D array, then use np.unique along axis 0 to find the unique 2D arrays:
import numpy as np
# dummy list of numpy array to simulate your data
list_of_arrays = [np.stack([np.random.permutation(12) for i in range(10)]) for i in range(10000)]
# stack arrays to form a 3D array
arr = np.stack(list_of_arrays)
# find unique arrays
unq = np.unique(arr, axis = 0)
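If you also need to map the unique arrays back to the originals, np.unique can return the index of each first occurrence and the duplicate counts as well; a small usage sketch:
# for each unique array, the index of its first occurrence in arr and how often it appears
unq, first_idx, counts = np.unique(arr, axis=0, return_index=True, return_counts=True)
print(len(unq), "unique arrays out of", len(arr))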

Finding the maximum slope in a Lat/Lon array

I'm plotting sea surface height by latitude for 20 different longitudes.
The result is a line plot with 20 lines. I need to find which line has the steepest slope and then pinpoint that lat/lon.
I've tried so far with np.gradient and then max(), but I keep getting an error (ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()).
I have a feeling there's a much better way to do it. Thanks to those willing to help.
example of plot
slice3lat= lat[20:40]
slice3lon= lon[20:40]
slice3ssh=ssh[:,0,20:40,20:40]
plt.plot(slice3lat,slice3ssh)
plt.xlabel("Latitude")
plt.ylabel("SSH (m)")
plt.legend()
When you say max(), I assume you mean Python's built-in max function. This works on numpy arrays only if they are one-dimensional, because iterating over such an array yields scalars that can be compared. If you have a 2D array like in your case, iterating yields the rows of the array, and comparing rows produces boolean arrays, which fails with the message you presented.
In this case, you should use np.max on the array or call the arr.max() method directly.
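A tiny made-up example of the difference:
import numpy as np

a = np.array([[1, 2], [3, 4]])
# max(a) iterates over the rows and compares them elementwise, producing a boolean
# array whose truth value is ambiguous -- exactly the ValueError you are seeing
print(a.max())    # 4: reduces over all elements
print(np.max(a))  # 4: same reduction, as a function call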
Here's some example code using np.gradient, combining the gradients in each direction into a magnitude and obtaining the max together with its coordinate position in the original data (note that the snippet below assumes ssh is a 2-D lat/lon array):
grad_y, grad_x = np.gradient(ssh)
grad_total = np.sqrt(grad_y**2 + grad_x**2) # or just grad_y ?
max_grad = grad_total.max()
max_grad_pos = np.unravel_index(grad_total.argmax(), grad_total.shape)
print("Gradient max is {} at pos {}.".format(max_grad, max_grad_pos))
You might of course still need to fiddle with it a little.

np.fft.fft not working properly

I'm trying to use Python 3.x to compute an FFT of some data, but when I plot the result I get my original data back (?), not its FFT. I'm also doing it in MATLAB so I can compare the results.
I've already tried many examples from this site but nothing seems to work. I'm not used to working with Python. How can I get a plot similar to MATLAB's? I don't care whether I get the -f/2 to f/2 or the 0 to f/2 spectrum.
My data
import scipy.io
import numpy as np
import matplotlib.pyplot as plt
mat = scipy.io.loadmat('sinal2.mat')
sinal2 = mat['sinal2']
Fs = 1000
L = 1997
T = 1.0/1000.0
fsig = np.fft.fft(sinal2)
freq = np.fft.fftfreq(len(sinal2), 1/Fs)
plt.figure()
plt.plot( freq, np.abs(fsig))
plt.figure()
plt.plot(freq, np.angle(fsig))
plt.show()
FFT from Python:
FFT from MATLAB:
The imported signal sinal2 has a shape of (1997, 1). For 2-dimensional arrays like this, numpy.fft.fft by default computes the FFT along the last axis. In this case that means computing 1997 FFTs of size 1. As you may know, a 1-point FFT is an identity mapping (the FFT of a single value is that same value), hence the resulting 2D array is identical to the original array.
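You can see this identity effect on a small made-up column vector:
import numpy as np

x = np.random.rand(5, 1)              # same column-vector layout as sinal2
print(np.allclose(np.fft.fft(x), x))  # True: five separate 1-point FFTs, each an identity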
To avoid this, you can either specify the other axis explicitly:
fsig = np.fft.fft(sinal2, axis=0)
Or otherwise convert the data to a single dimensional array, then compute the FFT of a 1D array:
sinal2 = sinal2[:,0]
fsig = np.fft.fft(sinal2)
On a final note, your FFT plot shows a horizontal line connecting the upper and lower halves of the frequency spectrum. See my answer to another question to address this problem. Since you mention that you really only need half the spectrum, you could also truncate the result to the first N//2+1 points:
plt.plot( freq[0:len(freq)//2+1], np.abs(fsig[0:len(fsig)//2+1]))
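Since only the non-negative half of the spectrum is needed, np.fft.rfft and np.fft.rfftfreq are another option; they compute just that half directly. A minimal sketch reusing the variables from the question (and assuming sinal2 has been flattened to 1D as above):
sinal2_1d = sinal2[:, 0] if sinal2.ndim == 2 else sinal2
fsig_half = np.fft.rfft(sinal2_1d)                    # one-sided spectrum
freq_half = np.fft.rfftfreq(len(sinal2_1d), 1.0 / Fs)
plt.figure()
plt.plot(freq_half, np.abs(fsig_half))
plt.show()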
