I'm plotting sea surface height (SSH) against latitude for 20 different longitudes.
The result is a line plot with 20 lines. I need to find which line has the steepest slope and then pinpoint the corresponding lat/lon.
I've tried np.gradient followed by max(), but I keep getting an error (ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()).
I have a feeling there's a much better way to do it. Thanks to those willing to help.
Example of the plot:
slice3lat= lat[20:40]
slice3lon= lon[20:40]
slice3ssh=ssh[:,0,20:40,20:40]
plt.plot(slice3lat,slice3ssh)
plt.xlabel("Latitude")
plt.ylabel("SSH (m)")
plt.legend()
When you say max(), I assume you mean Python's built-in max function. It works on NumPy arrays only if they are one-dimensional (flat), because iterating over the elements then yields scalars that can be compared. With a 2D array like yours, the top-level elements of the array are its rows, and comparing those rows is what fails with the message you quoted.
In this case, you should use np.max on the array or call the arr.max() method directly.
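To illustrate the difference (a minimal, self-contained example):

import numpy as np

arr = np.array([[1.0, 2.0],
                [3.0, 4.0]])

# max(arr) iterates over the rows and tries to compare them, which raises
# "ValueError: The truth value of an array with more than one element is ambiguous"
print(np.max(arr))  # 4.0 -- reduces over all elements
print(arr.max())    # same result via the method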
Here's some example code using np.gradient, combining the gradients in each direction into a magnitude and obtaining the maximum together with its coordinate position in the original data:
grad_y, grad_x = np.gradient(ssh)            # assumes ssh is a 2D (lat, lon) array
grad_total = np.sqrt(grad_y**2 + grad_x**2)  # gradient magnitude; or just grad_y if only the latitude direction matters
max_grad = grad_total.max()
max_grad_pos = np.unravel_index(grad_total.argmax(), grad_total.shape)
print("Gradient max is {} at pos {}.".format(max_grad, max_grad_pos))
You might of course still need to fiddle with it a little.
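If you need the actual coordinates rather than the array indices, you can map max_grad_pos back through the latitude/longitude vectors. A minimal sketch, assuming the gradient was taken over the 2D (lat, lon) slice so that the indices line up with slice3lat and slice3lon from your code:

i_lat, i_lon = max_grad_pos
print("Steepest slope near lat={}, lon={}".format(slice3lat[i_lat], slice3lon[i_lon]))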
I have about 100 spectra that I need to eventually interpolate onto a single x-axis range. The data are stored as a "velocity" array of 100 sub-arrays, and a "flux" array of 100 sub-arrays. Each sub-array has a different length (in this particular case varying from about 100 to 1700), that eventually need to be interpolated onto a single velocity range, for comparison and plotting purposes. Currently, I have been trying to interpolate the arrays with respect to the shortest-length array (i.e. 100 in this case), resulting in my code looking something like this:
import numpy as np
from scipy.interpolate import interp1d

lengths = [len(VELOCITY[i]) for i in range(len(VELOCITY))]
minlen = min(lengths)
idx = np.where(np.asarray(lengths) == minlen)[0][0]
vel_new = np.linspace(VELOCITY[idx][0], VELOCITY[idx][-1], num=len(VELOCITY[idx]), endpoint=True)

flux = []
for i in range(len(VELOCITY)):
    interp = interp1d(VELOCITY[i], FLUX[i], kind='cubic', fill_value='extrapolate')
    flux.append(interp(vel_new))
flux = np.asarray(flux)
However, the resulting plot isn't quite what I expect; what I can only assume are extrapolated points from the interpolation become dominant. I'm surprised by this, considering I selected the smallest velocity array, so the extrapolation should be minimal.
My question is: am I going about the interpolation incorrectly? Alternatively, do you have a suggestion for a better method than the one I am pursuing? Thanks in advance for your feedback!
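One way to sidestep the extrapolation problem entirely (a sketch, not from the original post) is to restrict the common grid to the velocity range shared by all spectra, so interp1d never has to extrapolate. This assumes VELOCITY and FLUX are the lists of sub-arrays described above and that each velocity array is sorted ascending:

import numpy as np
from scipy.interpolate import interp1d

v_lo = max(v[0] for v in VELOCITY)    # highest starting velocity
v_hi = min(v[-1] for v in VELOCITY)   # lowest ending velocity
vel_new = np.linspace(v_lo, v_hi, num=min(len(v) for v in VELOCITY))

flux = np.asarray([interp1d(v, f, kind='cubic')(vel_new)
                   for v, f in zip(VELOCITY, FLUX)])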
I'm trying to extract a mask with an "unknown" shape. Let me explain myself better:
My original data is a matrix with NaNs that, more or less, surround the true data. I have used a Sobel operator to detect the edge:
import numpy as np

# data is a matrix with NaNs
mask = np.isnan(data)
data[mask] = 0
data[~mask] = 1
out = sobel(data)  # sobel is a function that returns the gradient
The figure shows the output of the Sobel operation. Since the original data also has NaNs among the true data, the Sobel operator detects inner edges as well.
I want to find a method that detects only the outer edge (the shape that looks like a rhombus). Note that not only can this shape vary (it can be a square or a rectangle), but its position can also change (i.e. it can be off-center, or very small with respect to the image dimensions). The result I would like to obtain is a mask with all outer pixels set to True (or False), and all inner pixels set to False (or True).
Thanks!
A possible, partial solution is an opening operation, defined as an erosion followed by a dilation. I used the one provided by skimage:
import numpy as np
from skimage.morphology import opening

# data has shape (shape_1, shape_2)
mask_data = np.ones((shape_1, shape_2), dtype=bool)
mask = np.isnan(data)
mask_data[mask] = False
mask_data = opening(mask_data).astype(bool)
This method returns something similar to what I'm looking for. As the picture suggests, it still leaves some black inner dots, but it is the best I have found.
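If the leftover inner dots matter, one option beyond the opening (not part of the original answer) is to fill any holes that are not connected to the image border, for example with scipy.ndimage.binary_fill_holes applied to the mask computed above:

from scipy.ndimage import binary_fill_holes

# Fill any False regions that are fully enclosed by True pixels,
# so only the outer NaN frame remains False.
outer_mask = binary_fill_holes(mask_data)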
This is my problem:
The first input is the observed data from MUSE, an astronomical instrument that provides cubes, i.e. an image for each wavelength within a certain range. This means that, taking all the wavelengths corresponding to pixel (i, j), I can extract the spectrum for this pixel. Since these images are observed, each pixel comes with an error.
The second input is a spectrum template, i.e. a model of a spectrum. This template is assumed to be error-free. I map this spectrum to various redshifts (this means multiplying the wavelengths by a factor of 1+z, where z belongs to a certain range).
The core of my code is the cross-correlation between the cube, i.e. the spectra extracted from each pixel, and the template mapped to the different redshifts. The result is a cross-correlation function for each pixel and each z; let's call this computed function f(z). Taking, for each pixel, the argmax of f(z), I get the best redshift.
This is a common and widely used process, and indeed it works well.
My question:
Since my input, i.e. the MUSE cube, has an error, I have propagated this error through the cross-correlation, obtaining an error on f(z), i.e. each f_i has an error sigma_i. So, how can I compute the error on z_max, the value of z corresponding to the maximum of f?
Maybe a solution could be a bootstrap method: I can draw, within the error of f, a certain number of realizations of the function, compute the argmax of each of them, and so get an idea of the scatter of z_max.
By the way, I'm using Python (3.x), and TensorFlow has been used to compute the cross-correlation function.
Thanks!
EDIT
Following @TF_Support's suggestion, I'm adding some code and some figures to better explain the problem. But before that, a little math.
I compute the cross-correlation with this expression (for pixel i and redshift index k):

    f_ik = sum_j S_ij T_jk / N_ik,    with    N_ik = sqrt( (sum_j S_ij^2) * (sum_j T_jk^2) ),

where S is the spectrum, T is the template and N is the normalization coefficient. Since S has an error, I propagated these errors through the previous relation, finding:

    sigma_f_ik^2 = sum_j T_jk^2 sigma_ij^2 / N_ik^2
                 + f_ik^2 SST_k^2 sum_j (S_ij sigma_ij)^2 / N_ik^4
                 - 2 f_ik SST_k sum_j S_ij sigma_ij^2 T_jk / N_ik^3,

where SST_k is the sum of the template squared (SST_k = sum_j T_jk^2) and sigma_ij is the error on S_ij (actually, I should have written sigma_S_ij).
The following function (implemented with TensorFlow 2.1) computes the cross-correlation between one template and the spectra of a batch of pixels, together with the error on the cross-correlation function:
import tensorflow as tf

@tf.function
def make_xcorr_err1(T, S, sigma_S):
    # S, sigma_S have shape (batch, Nwave); T has shape (Nwave, Nz)
    sum_spectra_sq = tf.reduce_sum(tf.square(S), 1)   # shape (batch,)
    sum_template_sq = tf.reduce_sum(tf.square(T), 0)  # shape (Nz,)
    norm = tf.sqrt(tf.reshape(sum_spectra_sq, (-1, 1)) * tf.reshape(sum_template_sq, (1, -1)))  # shape (batch, Nz)
    xcorr = tf.matmul(S, T) / norm
    foo1 = tf.matmul(sigma_S**2, T**2) / norm**2
    foo2 = xcorr**2 * tf.reshape(sum_template_sq**2, (1, -1)) * tf.reshape(tf.reduce_sum((S*sigma_S)**2, 1), (-1, 1)) / norm**4
    foo3 = -2 * xcorr * tf.reshape(sum_template_sq, (1, -1)) * tf.matmul(S*sigma_S**2, T) / norm**3
    sigma_xcorr = tf.sqrt(tf.maximum(foo1 + foo2 + foo3, 0.))
    return xcorr, sigma_xcorr  # the cross-correlation and its propagated error
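For reference, a hypothetical call with dummy shapes (the sizes below are made up purely for illustration):

batch, Nwave, Nz = 64, 3000, 200
S = tf.random.normal((batch, Nwave))                    # spectra of a batch of pixels
sigma_S = tf.random.uniform((batch, Nwave), 0.01, 0.1)  # their errors
T = tf.random.normal((Nwave, Nz))                       # template already mapped to the Nz redshifts

xcorr, sigma_xcorr = make_xcorr_err1(T, S, sigma_S)     # both have shape (batch, Nz)
z_best_idx = tf.argmax(xcorr, axis=1)                   # index of the best redshift per pixel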
Maybe, in order to understand my problem, an image of the output is more important than the code. This is the cross-correlation function for a single pixel; in red is the maximum value, let's call it z_best, i.e. the best cross-correlated value. The figure also shows the 3-sigma errors (the grey limits are +3 sigma and -3 sigma).
If I zoom in near the peak, I get this:
As you can see, the maximum (like any other value) oscillates within a certain range. I would like to find a way to map these fluctuations of the maximum (or the fluctuations around the maximum, or the fluctuations of the whole function) to an error on the value corresponding to the maximum, i.e. an error on z_best.
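One concrete way to do that, along the lines of the bootstrap idea above (a sketch, not part of the original post): draw many realizations of f(z) within sigma_xcorr, take the argmax of each, and use the spread of the resulting redshifts as the error on z_best. Here zgrid is the redshift grid, and xcorr_pix / sigma_pix are the f(z) and sigma arrays for a single pixel:

import numpy as np

def zbest_error(zgrid, xcorr_pix, sigma_pix, n_draws=1000, rng=None):
    # Monte Carlo estimate of the error on z_best for one pixel,
    # assuming independent Gaussian errors in each z bin.
    rng = np.random.default_rng() if rng is None else rng
    draws = rng.normal(xcorr_pix, sigma_pix, size=(n_draws, len(zgrid)))
    z_draws = zgrid[np.argmax(draws, axis=1)]
    return np.median(z_draws), np.std(z_draws)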
I'm trying to use KBinsDiscretizer from sklearn.preprocessing, but it returns integer values 1, 2, ..., N (representing the interval). Is it possible to return the actual interval, such as (0.2, 0.5), or is this not implemented yet?
Based on the docs: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html:

Attributes:

n_bins_ : int array, shape (n_features,)
    Number of bins per feature. Bins whose width are too small (i.e., <= 1e-8) are removed with a warning.

bin_edges_ : array of arrays, shape (n_features,)
    The edges of each bin. Contain arrays of varying shapes (n_bins_, ). Ignored features will have empty arrays.

This means the answer in your case is no. There is also another hint in the docs:

The inverse_transform function converts the binned data into the original feature space. Each value will be equal to the mean of the two bin edges.
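If you do want interval labels, you can build them yourself from bin_edges_ after fitting. A minimal sketch (the toy data and parameters are made up for illustration):

import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

X = np.array([[0.1], [0.4], [0.7], [0.95]])             # single feature
est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
codes = est.fit_transform(X).astype(int).ravel()        # integer bin indices 0..n_bins-1

edges = est.bin_edges_[0]                               # bin edges for feature 0
intervals = [(edges[c], edges[c + 1]) for c in codes]   # (left, right) edge of each sample's bin
print(intervals)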
I know that scipy.cluster.hierarchy is focused on dealing with distance matrices, but now I have a similarity matrix... After I plot the dendrogram, something weird happens.
Here is the code:
import numpy as np
import scipy.cluster.hierarchy as sch
import matplotlib.pyplot as plt

similarityMatrix = np.array([[1, 0.75, 0.75, 0, 0, 0, 0],
                             [0.75, 1, 1, 0.25, 0, 0, 0],
                             [0.75, 1, 1, 0.25, 0, 0, 0],
                             [0, 0.25, 0.25, 1, 0.25, 0.25, 0],
                             [0, 0, 0, 0.25, 1, 1, 0.75],
                             [0, 0, 0, 0.25, 1, 1, 0.75],
                             [0, 0, 0, 0, 0.75, 0.75, 1]])
Here is the linkage step:
Z_sim = sch.linkage(similarityMatrix)

plt.figure(1)
plt.title('similarity')
sch.dendrogram(
    Z_sim,
    labels=['1', '2', '3', '4', '5', '6', '7']
)
plt.show()
But here is the outcome:
My questions are:
Why are the labels of this dendrogram not right?
I am giving a similarity matrix to the linkage method, but I cannot fully understand what the vertical axis means. For example, since the maximum similarity is 1, why is the maximum value on the vertical axis almost 1.6?
Thank you very much for your help!
linkage expects "distances", not "similarities". To convert your matrix to something like a distance matrix, you can subtract it from 1:
dist = 1 - similarityMatrix
linkage does not accept a square distance matrix. It expects the distance data to be in "condensed" form. You can get that using scipy.spatial.distance.squareform:
from scipy.spatial.distance import squareform
dist = 1 - similarityMatrix
condensed_dist = squareform(dist)
Z_sim = sch.linkage(condensed_dist)
(When you pass a two-dimensional array with shape (m, n) to linkage, it treats the rows as points in n-dimensional space, and computes the distances internally.)
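Putting the two fixes together with the plotting code from the question (the matrix and labels are the ones given above):

import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import squareform
import matplotlib.pyplot as plt

dist = 1 - similarityMatrix        # similarities -> distances
condensed_dist = squareform(dist)  # condensed form expected by linkage
Z_sim = sch.linkage(condensed_dist)

plt.figure(1)
plt.title('similarity')
sch.dendrogram(Z_sim, labels=['1', '2', '3', '4', '5', '6', '7'])
plt.show()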