Cosmic ray removal in spectra - python-3.x

Python developers
I am working on spectroscopy at a university. My experimental 1-D data sometimes shows "cosmic rays": roughly 3-pixel-wide, ultra-high-intensity spikes, which are not what I want to analyze. So I want to remove this kind of weird peak.
Does anybody know how to fix this issue in Python 3?
Thanks in advance!!

A simple solution could be to use the algorithm proposed by Whitaker and Hayes, in which they use modified z-scores on the derivative of the spectrum. This Medium post explains how it works and its implementation in Python: https://towardsdatascience.com/removing-spikes-from-raman-spectra-8a9fdda0ac22 .
The idea is to calculate the modified z-scores of the spectrum's derivative and apply a threshold to detect the cosmic spikes. Afterwards, a fixer is applied to remove the spike points and replace them with the mean value of the surrounding pixels.
import numpy as np

# Definition of a function to calculate the modified z-score.
def modified_z_score(intensity):
    median_int = np.median(intensity)
    mad_int = np.median(np.abs(intensity - median_int))  # median absolute deviation
    modified_z_scores = 0.6745 * (intensity - median_int) / mad_int
    return modified_z_scores

# Once the spike detection works, the spectrum can be fixed by averaging the
# non-spike points around each spike. y holds the intensity values of a
# spectrum, m is the half-window used to calculate the mean.
def fixer(y, m):
    threshold = 7  # binarization threshold
    spikes = abs(np.array(modified_z_score(np.diff(y)))) > threshold
    y_out = y.copy()  # so we don't overwrite y
    for i in np.arange(len(spikes)):
        if spikes[i] != 0:  # if we have a spike at position i
            # select up to 2m + 1 points around the spike, clipped to the array bounds
            w = np.arange(max(0, i - m), min(len(spikes), i + 1 + m))
            w2 = w[spikes[w] == 0]  # from that window, keep the points that are not spikes
            y_out[i] = np.mean(y[w2])  # and average their values
    return y_out
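A minimal usage sketch, assuming a synthetic spectrum stored in a NumPy array (the peak shape, spike position and window size are made up for illustration):

import numpy as np

wavelength = np.linspace(100, 200, 500)
spectrum = 100 * np.exp(-((wavelength - 150) ** 2) / 50)  # synthetic Gaussian band
spectrum[250:253] += 5000                                 # fake 3-pixel cosmic spike
despiked = fixer(spectrum, m=3)                           # threshold and window from the answer above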

The answer depends on what your data looks like:
If you have access to the two-dimensional CCD readouts that the one-dimensional spectra were created from, you can use the lacosmic module to get rid of the cosmic rays there.
If you have only one-dimensional spectra, but multiple spectra from the same source, a quick ad-hoc fix is to make a rough normalisation of the spectra and remove those pixels that are several times brighter than the corresponding pixels in the other spectra.
If you have only one one-dimensional spectrum from each source, a less reliable option is to remove all pixels that are much brighter than their neighbours. (Depending on the shape of your cosmics, you may even want to remove the nearest 5 pixels or so, to catch the wings of the cosmic ray peak as well.) A rough sketch of this last option follows below.
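A minimal sketch of the neighbour-comparison idea, assuming a 1-D NumPy flux array; the window size and brightness factor here are hypothetical and would need tuning for real data:

import numpy as np
from scipy.ndimage import median_filter

def despike(flux, size=5, factor=3.0):
    # compare each pixel to the median of its neighbourhood
    smooth = median_filter(flux, size=size)
    mask = flux > factor * smooth
    cleaned = flux.copy()
    cleaned[mask] = smooth[mask]  # replace flagged pixels with the local median
    return cleaned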

Related

FFT loss in PyTorch

I want to compute the loss between the GT and the output of my network (called TDN) in the frequency domain by computing 2D FFT. The tensors are of dim batch x channel x height x width
amp_ip, phase_ip = 2DFFT(TDN(ip))
amp_gt, phase_gt = 2DFFT(TDN(gt))
loss = ||amp_ip - amp_gt||
For computing the FFT I can use torch.fft(ip, signal_ndim=2). But the output is in a + jb format, i.e. rectangular coordinates, and NOT decomposed into phase and amplitude. How can I convert a + jb into amp * exp(j * phase) format in PyTorch? A side concern is also whether signal_ndim should be kept at 2 to compute a 2D FFT, or something else?
The following description of the loss that I plan to implement may be useful.
The question is answered by the GitHub code file shared by @akshayk07 in the comments. Extracting the relevant information from that code, the concise answer to the question is:
fft_im = torch.rfft(img.clone(), signal_ndim=2, onesided=False)
# fft_im: size should be bx3xhxwx2
fft_amp = fft_im[:,:,:,:,0]**2 + fft_im[:,:,:,:,1]**2
fft_amp = torch.sqrt(fft_amp) # this is the amplitude
fft_pha = torch.atan2( fft_im[:,:,:,:,1], fft_im[:,:,:,:,0] ) # this is the phase
As of PyTorch 1.7.1, choose torch.rfft over torch.fft, as the latter does not work off the shelf with real-valued tensors propagating in CNNs. It is also a good idea to use the normalisation flag of torch.rfft.
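For newer PyTorch releases (1.8+), where torch.rfft has been removed, a roughly equivalent sketch using the torch.fft module and complex tensors could look like this (the input shape is illustrative):

import torch

img = torch.randn(4, 3, 32, 32)             # hypothetical b x c x h x w input
fft_im = torch.fft.fft2(img, norm="ortho")  # complex tensor, shape b x c x h x w
fft_amp = fft_im.abs()                      # amplitude
fft_pha = fft_im.angle()                    # phase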

What is the upsampling method called 'area' used for?

The PyTorch function torch.nn.functional.interpolate contains several modes for upsampling, such as: nearest, linear, bilinear, bicubic, trilinear, area.
What is the area upsampling mode used for?
As jodag said, it is resizing using adaptive average pooling. While the answer at the link aims to explain what adaptive average pooling is, I find the explanation a bit vague.
TL;DR: the area mode of torch.nn.functional.interpolate is probably one of the most intuitive ways to downsample an image.
You can think of it as applying an averaging Low-Pass Filter (LPF) to the original image and then sampling. Applying an LPF before sampling prevents potential aliasing in the downsampled image. Aliasing can result in Moiré patterns in the downscaled image.
It is probably called "area" because it (roughly) preserves the area ratio between the input and output shapes when averaging the input pixels: every pixel in the output image is the average of a respective region in the input image, and the area of that region is roughly the ratio between the input image's area and the output image's area.
Furthermore, the interpolate function with mode = 'area' calls the source function adaptive_avg_pool2d (implemented in C++), which assigns each pixel in the output tensor the average of all pixel intensities within a computed region of the input. That region is computed per pixel and can vary in size between pixels. It is found by multiplying the output pixel's height and width indices by the ratio between the input and output (in that order) height and width (respectively), then taking the floor of the result for the region's starting index and the ceil for its ending index.
Here's an in-depth analysis of what happens in nn.AdaptiveAvgPool2d:
First of all, as stated there you can find the source code for adaptive average pooling (in C++) here: source
Taking a look at the function where the magic happens (or at least the magic on CPU for a single frame), static void adaptive_avg_pool2d_single_out_frame, we have 5 nested loops, running over the channel dimension, then the width, then the height; within the body of the 3rd loop the interesting part happens:
First compute the region within the input image which is used to calculate the value of the current pixel (recall we had width and height loop to run over all pixels in the output).
How is this done?
Using a simple computation of start and end indices for height and width as follows: floor((input_height/output_height) * current_output_pixel_height) for the start and ceil((input_height/output_height) * (current_output_pixel_height+1)) and similarly for the width.
Then, all that is done is to simply average the intensities of all pixels in that region and current channel and place the result in the current output pixel.
I wrote a simple Python snippet that does the same thing, in the same fashion (loops, naive), and produces equivalent results. It takes tensor a and uses adaptive average pooling to resize a to shape out_shape in 2 ways - once using the built-in nn.AdaptiveAvgPool2d and once with my Python translation of the C++ source function static void adaptive_avg_pool2d_single_out_frame. The built-in function's result is saved into b and my translation's into b_hat. You can see that the results are equivalent (you can further play with the spatial shapes to validate this):
import torch
from math import floor, ceil
from torch import nn

a = torch.randn(1, 3, 15, 17)
out_shape = (10, 11)
b = nn.AdaptiveAvgPool2d(out_shape)(a)
b_hat = torch.zeros(b.shape)
for d in range(a.shape[1]):
    for w in range(b_hat.shape[3]):
        for h in range(b_hat.shape[2]):
            startW = floor(w * a.shape[3] / out_shape[1])
            endW = ceil((w + 1) * a.shape[3] / out_shape[1])
            startH = floor(h * a.shape[2] / out_shape[0])
            endH = ceil((h + 1) * a.shape[2] / out_shape[0])
            b_hat[0, d, h, w] = torch.mean(a[0, d, startH:endH, startW:endW])
'''
Prints Mean Squared Error = 0 (or a very small number, due to precision error)
as both outputs are the same, proof of output equivalence:
'''
print(nn.MSELoss()(b_hat, b))
Looking at the source code, it appears that area interpolation is equivalent to resizing a tensor via adaptive average pooling. You can refer to this question for an explanation of adaptive average pooling. Therefore, area interpolation is more applicable to downsampling than to upsampling.
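To check the equivalence directly, a small sketch (the shapes are arbitrary):

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 15, 17)
y_area = F.interpolate(x, size=(10, 11), mode='area')
y_pool = F.adaptive_avg_pool2d(x, (10, 11))
print(torch.allclose(y_area, y_pool))  # expected: True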

using Geopandas, How to randomly select in each polygon 5 Points by sampling method

I want to select 5 points in each polygon based on a random sampling method, and I need the coordinates (lat, long) of those 5 points in each polygon to identify which crop is grown.
Any ideas how to do this using geopandas?
Many thanks.
My suggestion involves sampling random x and y coordinates within the shape's bounding box and then checking whether the sampled point is actually within the shape. If the sampled point is within the shape then return it, otherwise repeat until a point within the shape is found. For sampling, we can use the uniform distribution, such that all points in the shape have the same probability of being sampled. Here is the function:
import numpy as np
from shapely.geometry import Point

def random_point_in_shp(shp):
    within = False
    while not within:
        x = np.random.uniform(shp.bounds[0], shp.bounds[2])
        y = np.random.uniform(shp.bounds[1], shp.bounds[3])
        within = shp.contains(Point(x, y))
    return Point(x, y)
and here's an example of how to apply this function to an example GeoDataFrame called geo_df to get 5 random points for each entry:
for num in range(5):
    geo_df['Point{}'.format(num)] = geo_df['geometry'].apply(random_point_in_shp)
There might be more efficient ways to do this, but depending on your application the algorithm could be sufficiently fast. With my test file, which contains ~2300 entries, generating five random points for each entry took around 15 seconds on my machine.
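Since the question asks for coordinates, the sampled Points can afterwards be unpacked into separate columns; the column names here are hypothetical:

# unpack each sampled Point into longitude/latitude columns
for num in range(5):
    geo_df['Point{}_long'.format(num)] = geo_df['Point{}'.format(num)].apply(lambda p: p.x)
    geo_df['Point{}_lat'.format(num)] = geo_df['Point{}'.format(num)].apply(lambda p: p.y)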

What is the fastest way to find the center of an irregular convex polygon?

I'm interested in a fast way to calculate the rotation-independent center of a simple, convex (non-intersecting) 2D polygon.
The example below (on the left) shows the mean center (sum of all points divided by the total), and the desired result on the right.
Some options I've already considered:
Bounding-box center (depends on rotation, and ignores points based on their relation to the axis).
Straight skeleton - too slow to calculate.
I've found a way which works reasonably well (weight the points by the edge lengths), but this means a square-root call for every edge, which I'd like to avoid. (Will post it as an answer, even though I'm not entirely satisfied with it.)
Note, I'm aware of this question's similarity with: What is the fastest way to find the "visual" center of an irregularly shaped polygon?
However, having to handle concave polygons increases the complexity of the problem significantly.
The points of the polygon can be weighted by their edge length, which compensates for an uneven point distribution.
This works for concave polygons too, but in that case the center point isn't guaranteed to be inside the polygon.
Pseudo-code:
def poly_center(poly):
    sum_center = (0, 0)
    sum_weight = 0.0
    for point in poly:
        weight = ((point - point.next).length +
                  (point - point.prev).length)
        sum_center += point * weight
        sum_weight += weight
    return sum_center / sum_weight
Note, we can pre-calculate all edge lengths to halve the number of length calculations, or reuse the previous edge length to do half + 1 length calculations. This is just written as an example to show the logic; a runnable version is sketched below.
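For reference, a NumPy sketch of the same logic, assuming poly is an (N, 2) array of vertices in order; each edge length is computed only once, as the note above suggests:

import numpy as np

def poly_center(poly):
    pts = np.asarray(poly, dtype=float)
    edges = np.roll(pts, -1, axis=0) - pts     # edge i connects vertex i to vertex i + 1
    edge_len = np.linalg.norm(edges, axis=1)   # each edge length computed once
    weights = edge_len + np.roll(edge_len, 1)  # summed length of the two edges at each vertex
    return (pts * weights[:, None]).sum(axis=0) / weights.sum()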
Including this answer for completeness, since it's the best method I've found so far.
There is not a much better way than accumulating the coordinates weighted by edge length, which indeed takes N square roots.
If you accept an approximation, it is possible to skip some of the vertices by curve simplification, as follows:
decide on a deviation tolerance;
start from vertex 0 and jump to vertex M (say M = N/2);
check whether the deviation along the polyline from 0 to M exceeds the tolerance (for this, compute the height of the triangle formed by the vertices 0, M/2 and M);
if the deviation is exceeded, repeat recursively with 0, M/4, M/2 and M/2, 3M/4, M;
if the deviation is not exceeded, assume that the shape is straight between 0 and M;
continue until the end of the polygon.
Where the points are dense (like the left edge in your example), you should get some speedup. A rough sketch of this recursion is given below.
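A hypothetical sketch of the recursive deviation test (not the author's code; the tolerance handling and index conventions are assumptions):

import numpy as np

def select_vertices(pts, i, j, tol, keep):
    # keep the midpoint vertex wherever the polyline between pts[i] and pts[j]
    # deviates from a straight chord by more than tol, then recurse on both halves
    if j - i < 2:
        return
    m = (i + j) // 2
    chord = pts[j] - pts[i]
    d = pts[m] - pts[i]
    # height of vertex m above the chord from i to j (2D cross product / chord length)
    h = abs(chord[0] * d[1] - chord[1] * d[0]) / (np.linalg.norm(chord) + 1e-12)
    if h > tol:
        keep.append(m)
        select_vertices(pts, i, m, tol, keep)
        select_vertices(pts, m, j, tol, keep)

# usage: keep = [0, len(pts) - 1]; select_vertices(pts, 0, len(pts) - 1, tol, keep);
# then run the weighted-center accumulation on pts[sorted(keep)] only.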
I think it's easiest to do something with the centers of mass of the Delaunay triangulation of the polygon points, i.e.:
import numpy as np
from scipy import spatial

def _centroid_poly(poly):
    # poly: (N, 2) array of polygon points
    T = spatial.Delaunay(poly).simplices
    n = T.shape[0]
    W = np.zeros(n)
    C = 0
    for m in range(n):
        sp = poly[T[m, :], :]
        W[m] = spatial.ConvexHull(sp).volume  # area of triangle m
        C += W[m] * np.mean(sp, axis=0)       # area-weighted triangle centroid
    return C / np.sum(W)
This works well for me!
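Example usage, assuming a small convex polygon given as an (N, 2) NumPy array:

import numpy as np

poly = np.array([[0, 0], [2, 0], [3, 1], [2, 2], [0, 2]], dtype=float)
print(_centroid_poly(poly))  # area-weighted centroid of the triangulation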

Given a set of points, how do I approximate the major axis of its shape?

Given a "shape" drawn by the user, I would like to "normalize" it so they all have similar size and orientation. What we have is a set of points. I can approximate the size using bounding box or circle, but the orientation is a bit more tricky.
The right way to do it, I think, is to calculate the majoraxis of its bounding ellipse. To do that you need to calculate the eigenvector of the covariance matrix. Doing so likely will be way too complicated for my need, since I am looking for some good-enough estimate. Picking min, max, and 20 random points could be some starter. Is there an easy way to approximate this?
Edit:
I found the power method to iteratively approximate an eigenvector (Wikipedia article).
So far I am liking David's answer.
You'd be calculating the eigenvectors of a 2x2 matrix, which can be done with a few simple formulas, so it's not that complicated. In pseudocode:
// sums are over all points (translated to zero mean, see the edit below)
b = (sum(x * x) - sum(y * y)) / (2 * sum(x * y))
evec1_x = b + sqrt(b ** 2 + 1)
evec1_y = 1
evec2_x = b - sqrt(b ** 2 + 1)
evec2_y = 1
// the eigenvalue of each eigenvector is sum(x * y) * evec_x + sum(y * y);
// the one with the larger eigenvalue gives the major axis
You could even do this by summing over only some of the points to get an estimate, if you expect that your chosen subset of points would be representative of the full set.
Edit: I think x and y must be translated to zero mean, i.e. subtract the mean from all x, y first (eed3si9n).
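A runnable NumPy sketch of the closed form above, including the zero-mean translation from the edit (the degenerate axis-aligned case sum(x * y) == 0 is not handled):

import numpy as np

def major_axis_estimate(pts):
    # pts: (N, 2) array of points
    p = pts - pts.mean(axis=0)              # translate to zero mean first
    sxx = np.sum(p[:, 0] * p[:, 0])
    syy = np.sum(p[:, 1] * p[:, 1])
    sxy = np.sum(p[:, 0] * p[:, 1])
    b = (sxx - syy) / (2.0 * sxy)
    t1 = b + np.sqrt(b * b + 1.0)           # x-components of the two eigenvectors
    t2 = b - np.sqrt(b * b + 1.0)
    t = t1 if sxy * t1 >= sxy * t2 else t2  # pick the larger eigenvalue (sxy * t + syy)
    v = np.array([t, 1.0])
    return v / np.linalg.norm(v)            # unit vector along the major axis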
Here's a thought... what if you performed a linear regression on the points and used the slope of the resulting line? If not all of the points, at least a sample of them.
The r^2 value would also give you information about the general shape. The closer to 0, the more circular/uniform the shape is (circle/square); the closer to 1, the more stretched out the shape is (oval/rectangle).
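A quick sketch of this idea (a hypothetical helper; note that a near-vertical shape gives an unstable slope):

import numpy as np

def regression_axis(x, y):
    # slope of the least-squares line through the points (x, y)
    slope, _ = np.polyfit(x, y, 1)
    return np.array([1.0, slope]) / np.hypot(1.0, slope)  # unit direction vector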
The ultimate solution to this problem is running PCA
I wish I could find a nice little implementation for you to refer to...
Here you go! (assuming x is an n x 2 array):

import numpy as np

def majAxis(x):
    # eigenvector of the covariance matrix with the largest eigenvalue
    e, v = np.linalg.eig(np.cov(x.T))
    return v[:, np.argmax(e)]
