What is the upsampling method called 'area' used for? - pytorch

The PyTorch function torch.nn.functional.interpolate contains several modes for upsampling, such as: nearest, linear, bilinear, bicubic, trilinear, area.
What is the area upsampling mode used for?

As jodag said, it is resizing using adaptive average pooling. While the answer at the link aims to explain what adaptive average pooling is, I find the explanation a bit vague.
TL;DR: the area mode of torch.nn.functional.interpolate is probably one of the most intuitive choices when you want to downsample an image.
You can think of it as applying an averaging Low-Pass Filter(LPF) to the original image and then sampling. Applying an LPF before sampling is to prevent potential aliasing in the downsampled image. Aliasing can result in Moiré patterns in the downscaled image.
It is probably called "area" because it (roughly) preserves the area ratio between the input and output shapes when averaging the input pixels: every pixel in the output image is the average of a respective region in the input image, and the area of this region is roughly the ratio between the input image's area and the output image's area.
Furthermore, the interpolate function with mode='area' calls the source function adaptive_avg_pool2d (implemented in C++), which assigns each pixel in the output tensor the average of all pixel intensities within a computed region of the input. That region is computed per pixel and can vary in size across pixels. It is computed by multiplying the output pixel's height and width indices by the ratio between the input and output height and width (respectively), taking the floor of the result for the region's starting index and the ceil for the region's ending index.
Here's an in-depth analysis of what happens in nn.AdaptiveAvgPool2d:
First of all, as stated there, you can find the source code for adaptive average pooling (in C++) here: source
Taking a look at the function where the magic happens (or at least the magic on CPU for a single frame), static void adaptive_avg_pool2d_single_out_frame, we have 5 nested loops running over the channel dimension, then width, then height, and within the body of the 3rd loop the magic happens:
First compute the region within the input image which is used to calculate the value of the current pixel (recall we had width and height loop to run over all pixels in the output).
How is this done?
Using a simple computation of start and end indices for height and width, as follows: floor((input_height/output_height) * current_output_pixel_height) for the start and ceil((input_height/output_height) * (current_output_pixel_height + 1)) for the end, and similarly for the width. For example, with input height 15 and output height 10, output row 3 covers input rows floor(1.5 * 3) = 4 up to ceil(1.5 * 4) = 6, i.e. rows 4 and 5.
Then, all that remains is to average the intensities of all pixels in that region and current channel, and place the result in the current output pixel.
I wrote a simple Python snippet that does the same thing, in the same fashion (naive loops), and produces equivalent results. It takes a tensor a and uses adaptive average pooling to resize a to shape out_shape in two ways: once using the built-in nn.AdaptiveAvgPool2d, and once with my Python translation of the C++ source function static void adaptive_avg_pool2d_single_out_frame. The built-in function's result is saved into b and my translation's into b_hat. You can see that the results are equivalent (you can further play with the spatial shapes to validate this):
import torch
from math import floor, ceil
from torch import nn

a = torch.randn(1, 3, 15, 17)
out_shape = (10, 11)
b = nn.AdaptiveAvgPool2d(out_shape)(a)
b_hat = torch.zeros(b.shape)
for d in range(a.shape[1]):
    for w in range(b_hat.shape[3]):
        for h in range(b_hat.shape[2]):
            startW = floor(w * a.shape[3] / out_shape[1])
            endW = ceil((w + 1) * a.shape[3] / out_shape[1])
            startH = floor(h * a.shape[2] / out_shape[0])
            endH = ceil((h + 1) * a.shape[2] / out_shape[0])
            b_hat[0, d, h, w] = torch.mean(a[0, d, startH:endH, startW:endW])
'''
Prints Mean Squared Error = 0 (or a very small number, due to precision error)
as both outputs are the same, proof of output equivalence:
'''
print(nn.MSELoss()(b_hat, b))

Looking at the source code, it appears that area interpolation is equivalent to resizing a tensor via adaptive average pooling. You can refer to this question for an explanation of adaptive average pooling. Therefore, area interpolation is more applicable to downsampling than to upsampling.
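As a quick sanity check of this equivalence (a minimal sketch, not part of the original answers), you can compare the two calls directly:

import torch
import torch.nn.functional as F
from torch import nn

a = torch.randn(1, 3, 15, 17)
out_shape = (10, 11)
# mode='area' downsampling should match adaptive average pooling to the same shape
b_area = F.interpolate(a, size=out_shape, mode='area')
b_pool = nn.AdaptiveAvgPool2d(out_shape)(a)
print(torch.allclose(b_area, b_pool))  # expected: True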

Related

What is the error of the value corresponding to the maximum of a function?

This is my problem:
The first input is the observed data of MUSE, an astronomical instrument that provides cubes, i.e. an image for each wavelength within a certain range. This means that, taking all the wavelengths corresponding to pixel (i, j), I can extract the spectrum for this pixel. Since these images are observed, for each pixel I have an error.
The second input is a spectrum template, i.e. a model of a spectrum. This template is assumed to be without error. I map this spectrum to various redshifts (this means multiplying the wavelengths by a factor 1+z, where z belongs to a certain range).
The core of my code is the cross-correlation between the cube, i.e. the spectra extracted from each pixel, and the template mapped to different redshifts. The result is a cross-correlation function for each pixel and each z; let's call this computed function f(z). Taking, for each pixel, the argmax of f(z), I get the best redshift.
This is a common and widely used process, and indeed it works well.
My question:
Since my input, i.e. the MUSE cube, has an error, I have propagated this error through the cross-correlation, obtaining an error on f(z), i.e. each f_i has an error sigma_i. So, how can I compute the error on z_max, the value of z corresponding to the maximum of f?
Maybe a solution could be the bootstrap method: I can extract, within the errors of f, a certain number of functions, compute the argmax of each of them, and so get an idea of the scatter of z_max.
By the way, I'm using python (3.x) and tensorflow has been used to compute the cross-correlation function.
Thanks!
EDIT
Following #TF_Support's suggestion, I'm trying to add some code and some figures to better understand the problem. But before this, maybe a little math is in order.
With this expression I had computed the cross-correlation:

f_ik = (sum_j S_ij * T_jk) / N_ik,    with  N_ik = sqrt( (sum_j S_ij^2) * (sum_j T_jk^2) )

where S is the spectra, T is the template and N is the normalization coefficient. Since S has an error, I had propagated these errors through the previous relation, finding:

sigma_f_ik^2 = (sum_j sigma_ij^2 * T_jk^2) / N_ik^2
             + f_ik^2 * SST_k^2 * (sum_j S_ij^2 * sigma_ij^2) / N_ik^4
             - 2 * f_ik * SST_k * (sum_j S_ij * sigma_ij^2 * T_jk) / N_ik^3

where SST_k = sum_j T_jk^2 is the sum of the template squared and sigma_ij is the error on S_ij (actually, I should have written sigma_S_ij).
The following function (implemented with TensorFlow 2.1) computes the cross-correlation between one template and the spectra of a batch of pixels, together with the error on the cross-correlation function:
import tensorflow as tf

@tf.function
def make_xcorr_err1(T, S, sigma_S):
    sum_spectra_sq = tf.reduce_sum(tf.square(S), 1)   # shape (batch,)
    sum_template_sq = tf.reduce_sum(tf.square(T), 0)  # shape (Nz,)
    norm = tf.sqrt(tf.reshape(sum_spectra_sq, (-1, 1)) * tf.reshape(sum_template_sq, (1, -1)))  # shape (batch, Nz)
    xcorr = tf.matmul(S, T) / norm
    foo1 = tf.matmul(sigma_S**2, T**2) / norm**2
    foo2 = xcorr**2 * tf.reshape(sum_template_sq**2, (1, -1)) * tf.reshape(tf.reduce_sum((S * sigma_S)**2, 1), (-1, 1)) / norm**4
    foo3 = -2 * xcorr * tf.reshape(sum_template_sq, (1, -1)) * tf.matmul(S * sigma_S**2, T) / norm**3
    sigma_xcorr = tf.sqrt(tf.maximum(foo1 + foo2 + foo3, 0.))
    return xcorr, sigma_xcorr  # return added here; the original snippet ended without one
Maybe, in order to understand my problem, an image of the output is more important than the code. This is the cross-correlation function for a single pixel; in red is the maximum value, let's call it z_best, i.e. the best cross-correlated value. The figure also shows the 3-sigma errors (the grey limits are +3 sigma and -3 sigma).
If I zoom in near the peak, I get this:
As you can see, the maximum (like any other value) oscillates within a certain range. I would like to find a way to map these fluctuations of the maximum (or the fluctuations around the maximum, or the fluctuations of the whole function) to an error on the value corresponding to the maximum, i.e. an error on z_best.
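As a sketch of the bootstrap idea mentioned in the question (a hypothetical helper, not from the original post; it assumes z, f and sigma_f are 1-D NumPy arrays of equal length):

import numpy as np

def z_best_scatter(z, f, sigma_f, n_draws=1000, rng=None):
    # perturb f(z) within its errors and collect the argmax of each draw
    rng = np.random.default_rng() if rng is None else rng
    draws = f + rng.standard_normal((n_draws, f.size)) * sigma_f
    z_max = z[np.argmax(draws, axis=1)]
    return z_max.mean(), z_max.std()  # central value and scatter of z_best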

What is the difference between cv2.addWeighted and numpy mean, in this case?

Suppose I have two grayscale images img1 and img2 loaded with OpenCV (Python package cv2), both of the same dimensions. Now I wish to take the mean of img1 and img2. Here are two ways to do it:
# Method 1
mean = (img1 * 0.5) + (img2 * 0.5)
# Method 2
mean = cv2.addWeighted(img1, 0.5, img2, 0.5, 0)
However, mean is visually different in both methods, when I display them using cv2.imshow. Why is this so?
I am glad that you have found a working solution to your problem, but it seems to be a workaround; the real reason for this behaviour lies elsewhere. The problem is that mean = (img1 * 0.5) + (img2 * 0.5) returns a matrix with a floating-point dtype (float64) whose values are still in the range 0.0-255.0. You can verify this with print(mean.dtype). Since the new matrix values have been converted to float unintentionally, we can revert this with (img1 * 0.5 + img2 * 0.5).astype("uint8"). cv2.addWeighted(), on the other hand, automatically returns a matrix of dtype uint8 and everything works fine.
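A minimal check of the dtype behaviour described above (a plain NumPy sketch; the example arrays are made up):

import numpy as np

img1 = np.full((2, 2), 200, dtype=np.uint8)
img2 = np.full((2, 2), 100, dtype=np.uint8)

mean = (img1 * 0.5) + (img2 * 0.5)
print(mean.dtype)                  # float64, values still in 0.0-255.0
print(mean.astype("uint8").dtype)  # uint8, safe to display with cv2.imshow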
My concern is with the conclusion that you have drawn:
The issue is that the cv2.imshow() method used to display images,
expects your image arrays to be normalized, i.e. in the range [0,1].
cv2.imshow() works just fine with ranges of [0, 255] and [0.0, 1.0], but the issue arises when you pass a matrix whose values are in the range [0, 255] while its dtype is float instead of uint8.
Answering my own question, to help others who get confused by this:
Both methods 1 and 2 yield the same result. You can verify this by writing the mean image to disk using cv2.imwrite. The issue is not with the methods.
The issue is that the cv2.imshow method used to display images expects your image arrays to be normalized, i.e. in the range [0, 1]. In my case, both image arrays are 8-bit unsigned integers, so their pixel values are in the range [0, 255]. Since mean is an average of the two arrays, its pixel values are also in the range [0, 255]. So when I passed mean to cv2.imshow, pixels with values greater than 1 were interpreted as having a value of 255, resulting in vastly different visuals.
The solution is to normalize mean before passing it to cv2.imshow:
# Method 1
mean = (img1 * 0.5) + (img2 * 0.5)
# Method 2
mean = cv2.addWeighted(img1, 0.5, img2, 0.5, 0)
# Note that the division by 255 squeezes the image array values into [0, 1].
cv2.imshow("Averaged", mean / 255.)

Using Geopandas, how to randomly select 5 points in each polygon by a sampling method

I want to select 5 points in each polygon using a random sampling method, and I need the coordinates (lat, long) of these 5 points in each polygon to identify which crop is grown.
Any ideas how to do this using geopandas?
Many thanks.
My suggestion involves sampling random x and y coordinates within the shape's bounding box and then checking whether the sampled point is actually within the shape. If the sampled point is within the shape then return it, otherwise repeat until a point within the shape is found. For sampling, we can use the uniform distribution, such that all points in the shape have the same probability of being sampled. Here is the function:
import numpy as np
from shapely.geometry import Point

def random_point_in_shp(shp):
    within = False
    while not within:
        x = np.random.uniform(shp.bounds[0], shp.bounds[2])
        y = np.random.uniform(shp.bounds[1], shp.bounds[3])
        within = shp.contains(Point(x, y))
    return Point(x, y)
and here's an example of how to apply this function to an example GeoDataFrame called geo_df, to get 5 random points for each entry:

for num in range(5):
    geo_df['Point{}'.format(num)] = geo_df['geometry'].apply(random_point_in_shp)
There might be more efficient ways to do this, but depending on your application the algorithm could be sufficiently fast. With my test file, which contains ~2300 entries, generating five random points for each entry took around 15 seconds on my machine.
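Since the question also asks for the coordinates themselves, a possible follow-up (a sketch; the coords{} column names are made up) is to unpack each shapely Point:

# unpack the sampled shapely Points into (x, y) coordinate tuples
for num in range(5):
    geo_df['coords{}'.format(num)] = geo_df['Point{}'.format(num)].apply(lambda p: (p.x, p.y))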

Cosmic ray removal in spectra

Python developers
I am working on spectroscopy at a university. My experimental 1-D data sometimes shows "cosmic rays": ultra-high-intensity spikes about 3 pixels wide, which are not what I want to analyze. So I want to remove this kind of weird peak.
Does anybody know how to fix this issue in Python 3?
Thanks in advance!!
A simple solution could be to use the algorithm proposed by Whitaker and Hayes, in which they use modified z scores on the derivative of the spectrum. This Medium post explains how it works and its implementation in Python: https://towardsdatascience.com/removing-spikes-from-raman-spectra-8a9fdda0ac22 .
The idea is to calculate the modified z scores of the spectrum's derivative and apply a threshold to detect the cosmic spikes. Afterwards, a fixer is applied to remove the spike points and replace them with the mean value of the surrounding pixels.
import numpy as np

# Definition of a function to calculate the modified z score.
def modified_z_score(intensity):
    median_int = np.median(intensity)
    mad_int = np.median([np.abs(intensity - median_int)])
    modified_z_scores = 0.6745 * (intensity - median_int) / mad_int
    return modified_z_scores

# Once the spike detection works, the spectrum can be fixed by averaging the
# non-spike points around each spike. y is the intensity values of a spectrum,
# m is the window which we will use to calculate the mean.
def fixer(y, m):
    threshold = 7  # binarization threshold
    spikes = abs(np.array(modified_z_score(np.diff(y)))) > threshold
    y_out = y.copy()  # so we don't overwrite y
    for i in np.arange(len(spikes)):
        if spikes[i] != 0:  # if we have a spike at position i
            w = np.arange(i - m, i + 1 + m)  # we select 2m + 1 points around our spike
            w2 = w[spikes[w] == 0]  # from that interval, choose the ones which are not spikes
            y_out[i] = np.mean(y[w2])  # and average their values
    return y_out
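For illustration, a hypothetical usage sketch (the synthetic spectrum below is made up):

import numpy as np

# synthetic spectrum with an artificial high-intensity spike (made-up data)
y = np.sin(np.linspace(0, 10, 200)) + 1.5
y[100:103] += 50.0

y_clean = fixer(y, m=3)  # detected spike points are replaced by local means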
The answer depends on what your data looks like. If you have access to the two-dimensional CCD readouts that the one-dimensional spectra were created from, then you can use the lacosmic module to get rid of the cosmic rays there. If you have only one-dimensional spectra, but multiple spectra from the same source, then a quick ad-hoc fix is to make a rough normalisation of the spectra and remove those pixels that are several times brighter than the corresponding pixels in the other spectra. If you have only one one-dimensional spectrum from each source, then a less reliable option is to remove all pixels that are much brighter than their neighbours. (Depending on the shape of your cosmics, you may even want to remove the nearest 5 pixels or so, to catch the wings of the cosmic-ray peak as well.)

svg feGaussianBlur: correlation between stdDeviation and size

When I blur an object in Inkscape by, let's say, 10%, it gets a filter with an feGaussianBlur with a stdDeviation of 10% * size / 2.
However, the filter has a size of 124% (it is actually that big; Inkscape doesn't just add a bit to be on the safe side).
Where does this number come from? My guess would be 100% + 2.4 * (2*stdDeviation/size), but then where does this 2.4 come from?
From the SVG 1.1 spec:
This filter primitive performs a Gaussian blur on the input image.
The Gaussian blur kernel is an approximation of the normalized convolution:
G(x,y) = H(x)I(y)
where
H(x) = exp(-x^2 / (2s^2)) / sqrt(2 * pi * s^2)
and
I(y) = exp(-y^2 / (2t^2)) / sqrt(2 * pi * t^2)
with 's' being the standard deviation in the x direction and 't' being the standard deviation in the y direction, as specified by ‘stdDeviation’.
The value of ‘stdDeviation’ can be either one or two numbers. If two numbers are provided, the first number represents a standard deviation value along the x-axis of the current coordinate system and the second value represents a standard deviation in Y. If one number is provided, then that value is used for both X and Y.
Even if only one value is provided for ‘stdDeviation’, this can be implemented as a separable convolution.
For larger values of 's' (s >= 2.0), an approximation can be used: Three successive box-blurs build a piece-wise quadratic convolution kernel, which approximates the Gaussian kernel to within roughly 3%.
let d = floor(s * 3*sqrt(2*pi)/4 + 0.5)
... if d is odd, use three box-blurs of size 'd', centered on the output pixel.
... if d is even, two box-blurs of size 'd' (the first one centered on the pixel boundary between the output pixel and the one to the left, the second one centered on the pixel boundary between the output pixel and the one to the right) and one box blur of size 'd+1' centered on the output pixel.
Note: the approximation formula also applies correspondingly to 't'.
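To see what the spec's approximation means numerically, here is a small sketch (not from the spec itself) computing the box-blur size d for a given standard deviation:

from math import floor, pi, sqrt

def box_blur_size(s):
    # d from the SVG 1.1 spec: three successive box-blurs of this size
    # approximate a Gaussian of standard deviation s (for s >= 2.0)
    return floor(s * 3 * sqrt(2 * pi) / 4 + 0.5)

print(box_blur_size(2.0))  # -> 4, so three box-blurs of size 4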
