Python: create a mask with unknown shape - python-3.x

I'm trying to extract a mask with an "unknown" shape. Let me explain myself better:
My original data is a matrix in which NaNs more or less surround the true data. I have used a Sobel operator to detect the edge:
import numpy as np

# data is a matrix containing NaNs
mask = np.isnan(data)
data[mask] = 0   # NaNs -> 0
data[~mask] = 1  # valid data -> 1
out = sobel(data)  # sobel is a function that returns the gradient (e.g. skimage.filters.sobel)
The figure shows the output of the Sobel operation. Since the original data also has NaNs among the true data, the Sobel operator detects inner edges as well.
I want a method that detects only the outer edge (the figure that looks like a rhombus). Consider that not only can this shape vary (it can be a square or a rectangle), but its position can also change (i.e. it can be off-centre, or very small with respect to the image dimensions). The result I would like to obtain is a mask with all outer pixels set to True (or False) and all inner pixels set to False (or True).
Thanks!

A possible, partial solution is an opening operation, defined as an erosion followed by a dilation. I used the one provided by skimage:
import numpy as np
from skimage.morphology import opening

# data has shape (shape_1, shape_2)
mask_data = np.ones((shape_1, shape_2), dtype=bool)
mask = np.isnan(data)
mask_data[mask] = False  # NaN positions -> False
mask_data = opening(mask_data).astype(bool)
This method returns something similar to what I'm looking for. As the picture suggests, it still leaves some black inner dots, but it is the best I have found.
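If the leftover inner dots matter, a complementary idea (just a sketch, not part of the original answer) is to fill the holes of the opened mask with scipy.ndimage.binary_fill_holes, so that only the outer edge shapes the result:
import numpy as np
from scipy.ndimage import binary_fill_holes
from skimage.morphology import opening

# data: the 2D array with NaNs, as in the question
valid = ~np.isnan(data)                    # True where real data exists
outer = binary_fill_holes(opening(valid))  # remove ragged border, fill inner NaN holes
# outer is True inside the outer shape (rhombus, square, ...), False outside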

Related

Change the mask for a few numpy arrays

So, I have a few masked arrays as input. For the computation I use slices: top left, top right, bottom left and bottom right:
dy0 = dy__[:-1, :-1]
dy1 = dy__[:-1, 1:]
dy2 = dy__[1:, 1:]
dy3 = dy__[1:, :-1]
The same is done with the dx and g values.
To compute sums or differences correctly I need the mask to be the same for all of them. For now I sum the masks of the 4 arrays (converted to int) and check whether the total is greater than one, i.e. I mask an element if it is masked in more than one input:
import functools

mask_sum = functools.reduce(lambda x1, x2: x1.astype('int') + x2.astype('int'),
                            list_of_masks)
mask = mask_sum > 1  # mask output if more than 1 input is masked
But when I assign masks like dy0.mask = new_mask, they don't change.
Also, when I replace 0 elements in one array with 1 using numpy.where(), the mask disappears, so I can set a new one. But for the arrays that stay the same, the mask still doesn't change (I checked the numpy.ma documentation, and it should).
The problem is that in some functions there are too many arrays whose masks might need to be changed, so I'd like a reliable way to set the same mask on several arrays in one operation.
Is there any way to do this, or to find out why it doesn't work as it should?
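As a minimal sketch (assuming dy__ is a numpy.ma masked array; the toy data below is illustrative only), one way to give all slices the same mask is to rebuild each slice as a new masked array with np.ma.array, instead of assigning to the .mask attribute of a view:
import numpy as np
import numpy.ma as ma

# toy masked input standing in for dy__
dy__ = ma.array(np.arange(16.0).reshape(4, 4),
                mask=np.random.rand(4, 4) > 0.7)

slices = [dy__[:-1, :-1], dy__[:-1, 1:], dy__[1:, 1:], dy__[1:, :-1]]

# combined mask: masked wherever more than one slice is masked
mask_sum = sum(ma.getmaskarray(s).astype(int) for s in slices)
common_mask = mask_sum > 1

# rebuild the slices with the shared mask; keep_mask=False discards each
# slice's original mask so all four end up with exactly the same one
dy0, dy1, dy2, dy3 = (ma.array(s, mask=common_mask, keep_mask=False) for s in slices)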

What is the error on the value corresponding to the maximum of a function?

This is my problem:
The first input is the observed data from MUSE, an astronomical instrument that provides cubes, i.e. an image for each wavelength within a certain range. This means that, taking all the wavelengths corresponding to pixel i,j, I can extract the spectrum for that pixel. Since these images are observed, each pixel also has an error.
The second input is a spectrum template, i.e. a model of a spectrum. This template is assumed to have no error. I map this spectrum to various redshifts (this means multiplying the wavelengths by a factor 1+z, where z belongs to a certain range).
The core of my code is the cross-correlation between the cube, i.e. the spectra extracted from each pixel, and the template mapped to the different redshifts. The result is a cross-correlation function for each pixel and each z; let's call this computed function f(z). Taking, for each pixel, the argmax of f(z), I get the best redshift.
This is a common and widely used process and, indeed, it works well.
My question:
Since my input, i.e. the MUSE cube, has an error, I have propagated this error through the cross-correlation, obtaining an error on f(z), i.e. each f_i has an error sigma_i. So, how can I compute the error on z_max, the value of z corresponding to the maximum of f?
Maybe a solution could be a bootstrap method: I can draw, within the errors of f, a certain number of functions, compute the argmax of each of them, and so get an idea of the scatter of z_max.
By the way, I'm using Python (3.x), and TensorFlow has been used to compute the cross-correlation function.
Thanks!
EDIT
Following @TF_Support's suggestion, I'm trying to add some code and some figures to better explain the problem. But before that, a little math may help.
I computed the cross-correlation with this expression:
C_ik = sum_j S_ij T_jk / N_ik,   with   N_ik = sqrt( (sum_j S_ij^2) * (sum_j T_jk^2) )
where S is the spectra, T is the template and N is the normalization coefficient. Since S has an error, I propagated these errors through the previous relation, finding:
sigma_C_ik^2 = sum_j [ sigma_ij^2 T_jk^2 / N_ik^2 + C_ik^2 SST_k^2 S_ij^2 sigma_ij^2 / N_ik^4 - 2 C_ik SST_k S_ij sigma_ij^2 T_jk / N_ik^3 ]
where SST_k is the sum of the template squared (sum_j T_jk^2) and sigma_ij is the error on S_ij (strictly, it should be written sigma_S_ij).
The following function (implemented with TensorFlow 2.1) computes the cross-correlation between one template and the spectra of a batch of pixels, together with the error on the cross-correlation function:
import tensorflow as tf

@tf.function
def make_xcorr_err1(T, S, sigma_S):
    sum_spectra_sq = tf.reduce_sum(tf.square(S), 1)   # shape (batch,)
    sum_template_sq = tf.reduce_sum(tf.square(T), 0)  # shape (Nz,)
    norm = tf.sqrt(tf.reshape(sum_spectra_sq, (-1, 1)) * tf.reshape(sum_template_sq, (1, -1)))  # shape (batch, Nz)
    xcorr = tf.matmul(S, T) / norm
    foo1 = tf.matmul(sigma_S**2, T**2) / norm**2
    foo2 = xcorr**2 * tf.reshape(sum_template_sq**2, (1, -1)) * tf.reshape(tf.reduce_sum((S * sigma_S)**2, 1), (-1, 1)) / norm**4
    foo3 = -2 * xcorr * tf.reshape(sum_template_sq, (1, -1)) * tf.matmul(S * sigma_S**2, T) / norm**3
    sigma_xcorr = tf.sqrt(tf.maximum(foo1 + foo2 + foo3, 0.))
    return xcorr, sigma_xcorr  # cross-correlation and its propagated error
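A hypothetical usage sketch, with random tensors standing in for the real spectra, errors and redshifted templates (the sizes are illustrative only):
import tensorflow as tf

batch, Nlambda, Nz = 32, 3000, 200               # illustrative sizes
S = tf.random.uniform((batch, Nlambda))          # spectra of a batch of pixels
sigma_S = tf.random.uniform((batch, Nlambda))    # per-pixel errors
T = tf.random.uniform((Nlambda, Nz))             # template mapped to Nz redshifts

xcorr, sigma_xcorr = make_xcorr_err1(T, S, sigma_S)
print(xcorr.shape, sigma_xcorr.shape)            # both (batch, Nz)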
Perhaps more important than the code, for understanding my problem, is an image of the output. This is the cross-correlation function for a single pixel, with the maximum value in red; let's call it z_best, i.e. the best cross-correlated value. The figure also shows the 3-sigma errors (the grey limits are +3 sigma and -3 sigma).
If I zoom in near the peak, I get this:
As you can see, the maximum (like any other value) oscillates within a certain range. I would like to find a way to map these fluctuations of the maximum (or the fluctuations around the maximum, or the fluctuations of the whole function) to an error on the value corresponding to the maximum, i.e. an error on z_best.
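As a rough sketch of the Monte Carlo / bootstrap idea mentioned above (the names z_grid, f_z and sigma_f are placeholders for the redshift grid, one pixel's cross-correlation curve and its propagated error), one could perturb f(z) within its errors many times and look at the scatter of the resulting argmax:
import numpy as np

def z_best_error(z_grid, f_z, sigma_f, n_draws=1000, seed=0):
    """Estimate the error on z_best by redrawing f(z) within its errors."""
    rng = np.random.default_rng(seed)
    # draw n_draws perturbed versions of the cross-correlation function
    draws = f_z + rng.normal(0.0, 1.0, size=(n_draws, f_z.size)) * sigma_f
    # argmax of each perturbed curve -> distribution of z_best
    z_best_samples = z_grid[np.argmax(draws, axis=1)]
    return z_best_samples.mean(), z_best_samples.std()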

What is the difference between cv2.addWeighted and numpy mean, in this case?

Suppose I have two grayscale images img1 and img2 loaded with OpenCV (Python package cv2), both with the same dimensions. Now I wish to take the mean of img1 and img2. Here are two ways to do it:
# Method 1
mean = (img1 * 0.5) + (img2 * 0.5)
# Method 2
mean = cv2.addWeighted(img1, 0.5, img2, 0.5, 0)
However, mean looks visually different between the two methods when I display it using cv2.imshow. Why is this so?
I am glad that you have found a working solution to your problem, but it seems to be a workaround; the real reason for this behaviour lies elsewhere. The problem is that mean = (img1 * 0.5) + (img2 * 0.5) returns a matrix with a floating-point dtype, containing values in the range 0.0-255.0. You can verify this with print(mean.dtype). Since the matrix values have been converted to float unintentionally, we can revert this with (img1 * 0.5 + img2 * 0.5).astype("uint8"). cv2.addWeighted(), on the other hand, automatically returns a matrix of dtype uint8, so everything works fine.
My concern is with the conclusion that you have drawn:
The issue is that the cv2.imshow() method used to display images,
expects your image arrays to be normalized, i.e. in the range [0,1].
cv2.imshow() works just fine with ranges of [0-255] and [0.0-1.0]; the issue arises when you pass a matrix whose values are in the range [0-255] but whose dtype is float instead of uint8.
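A minimal sketch of the dtype point, assuming two same-sized 8-bit grayscale images on disk (the file names are placeholders):
import cv2

img1 = cv2.imread("a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("b.png", cv2.IMREAD_GRAYSCALE)

mean_np = (img1 * 0.5) + (img2 * 0.5)               # float dtype, values 0.0-255.0
mean_cv = cv2.addWeighted(img1, 0.5, img2, 0.5, 0)  # uint8, values 0-255

print(mean_np.dtype, mean_cv.dtype)                 # e.g. float64 vs uint8

cv2.imshow("numpy mean (cast back to uint8)", mean_np.astype("uint8"))
cv2.imshow("cv2.addWeighted", mean_cv)
cv2.waitKey(0)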
Answering my own question, to help others who get confused by this:
Methods 1 and 2 yield the same result. You can verify this by writing the mean image to disk using cv2.imwrite; the issue is not with the methods.
The issue is that the cv2.imshow method used to display images expects your image arrays to be normalized, i.e. in the range [0,1]. In my case, both image arrays are 8-bit unsigned integers, so their pixel values are in the range [0,255]. Since mean is an average of the two arrays, its pixel values are also in the range [0,255]. So when I passed mean to cv2.imshow, pixels with values greater than 1 were displayed as if they had the value 255, resulting in vastly different visuals.
The solution is to normalize mean before passing it to cv2.imshow:
# Method 1
mean = (img1 * 0.5) + (img2 * 0.5)
# Method 2
mean = cv2.addWeighted(img1, 0.5, img2, 0.5, 0)
# Note that the division by 255 squeezes the image array values into [0, 1].
cv2.imshow("Averaged", mean / 255.)

Using GeoPandas, how to randomly select 5 points in each polygon with a sampling method

I want to select 5 points in each polygon using a random sampling method, and I need the coordinates (lat, long) of those 5 points in each polygon in order to identify which crop is grown there.
Any ideas on how to do this using GeoPandas?
Many thanks.
My suggestion involves sampling random x and y coordinates within the shape's bounding box and then checking whether the sampled point is actually within the shape. If the sampled point is within the shape then return it, otherwise repeat until a point within the shape is found. For sampling, we can use the uniform distribution, such that all points in the shape have the same probability of being sampled. Here is the function:
import numpy as np
from shapely.geometry import Point

def random_point_in_shp(shp):
    """Draw a uniformly distributed random point inside the shapely geometry shp."""
    within = False
    while not within:
        # sample uniformly inside the bounding box, then test membership
        x = np.random.uniform(shp.bounds[0], shp.bounds[2])
        y = np.random.uniform(shp.bounds[1], shp.bounds[3])
        within = shp.contains(Point(x, y))
    return Point(x, y)
and here is an example of how to apply this function to a GeoDataFrame called geo_df to get 5 random points for each entry:
for num in range(5):
    geo_df['Point{}'.format(num)] = geo_df['geometry'].apply(random_point_in_shp)
There might be more efficient ways to do this, but depending on your application the algorithm could be sufficiently fast. With my test file, which contains ~2300 entries, generating five random points for each entry took around 15 seconds on my machine.
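Since the question also asks for the (lat, long) coordinates, a small follow-up (assuming geo_df is in a geographic CRS such as EPSG:4326, so that a point's x is the longitude and y the latitude) could extract them from the sampled points:
# extract (lat, long) tuples from the shapely Points created above
for num in range(5):
    geo_df['coords{}'.format(num)] = geo_df['Point{}'.format(num)].apply(
        lambda p: (p.y, p.x))  # (latitude, longitude)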

Finding the maximum slope in a Lat/Lon array

I'm plotting sea surface height by latitude for 20 different longitudes.
The result is a line plot with 20 lines. I need to find which line has the steepest slope and then pinpoint that lat/lon.
So far I've tried np.gradient and then max(), but I keep getting an error (ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()).
I have a feeling there's a much better way to do it. Thanks to those willing to help.
example of plot
# lat, lon and ssh come from the dataset
slice3lat = lat[20:40]
slice3lon = lon[20:40]
slice3ssh = ssh[:, 0, 20:40, 20:40]
plt.plot(slice3lat, slice3ssh)
plt.xlabel("Latitude")
plt.ylabel("SSH (m)")
plt.legend()
When you say max(), I assume you mean Python's built-in max function. It works on NumPy arrays only when they are one-dimensional/flat, because iterating over the elements then yields comparable scalars. If you have a 2D array, as in your case, the top-level elements of the array are its rows, and comparing those fails with the message you quoted.
In this case you should use np.max on the array, or call the arr.max() method directly.
Here's some example code that uses np.gradient, combines the gradients in each direction into a total gradient magnitude, and obtains the maximum together with its position in the original data:
import numpy as np

# assumes ssh is a 2D (lat x lon) array here
grad_y, grad_x = np.gradient(ssh)
grad_total = np.sqrt(grad_y**2 + grad_x**2)  # or just grad_y ?
max_grad = grad_total.max()
max_grad_pos = np.unravel_index(grad_total.argmax(), grad_total.shape)
print("Gradient max is {} at pos {}.".format(max_grad, max_grad_pos))
You might of course still need to fiddle with it a little.
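If you also need the actual latitude/longitude of that point, something like the following should work, assuming lat and lon are 1D coordinate arrays matching the two axes of the 2D ssh slice used above:
# map the index position back to coordinate values
max_lat = lat[max_grad_pos[0]]
max_lon = lon[max_grad_pos[1]]
print("Steepest slope near lat {}, lon {}.".format(max_lat, max_lon))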
