svg feGaussianBlur: correlation between stdDeviation and size - svg

When I blur an object in Inkscape by, let's say, 10%, it gets a filter with an feGaussianBlur whose stdDeviation is 10% * size / 2.
However the filter has a size of 124% (it is actually that big, Inkscape doesn't add a bit just to be on the safe-side).
Where does this number come from? My guess would be 100% + 2.4 * (2*stdDeviation/size), but then where does this 2.4 come from?

From the SVG 1.1 spec:
This filter primitive performs a Gaussian blur on the input image.
The Gaussian blur kernel is an approximation of the normalized convolution:
G(x,y) = H(x)I(y)
where
H(x) = exp(-x^2 / (2*s^2)) / sqrt(2*pi*s^2)
and
I(y) = exp(-y^2 / (2*t^2)) / sqrt(2*pi*t^2)
with 's' being the standard deviation in the x direction and 't' being the standard deviation in the y direction, as specified by ‘stdDeviation’.
The value of ‘stdDeviation’ can be either one or two numbers. If two numbers are provided, the first number represents a standard deviation value along the x-axis of the current coordinate system and the second value represents a standard deviation in Y. If one number is provided, then that value is used for both X and Y.
Even if only one value is provided for ‘stdDeviation’, this can be implemented as a separable convolution.
For larger values of 's' (s >= 2.0), an approximation can be used: Three successive box-blurs build a piece-wise quadratic convolution kernel, which approximates the Gaussian kernel to within roughly 3%.
let d = floor(s * 3*sqrt(2*pi)/4 + 0.5)
... if d is odd, use three box-blurs of size 'd', centered on the output pixel.
... if d is even, two box-blurs of size 'd' (the first one centered on the pixel boundary between the output pixel and the one to the left, the second one centered on the pixel boundary between the output pixel and the one to the right) and one box blur of size 'd+1' centered on the output pixel.
Note: the approximation formula also applies correspondingly to 't'.
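To make the quoted approximation easier to inspect, here is a minimal Python sketch that computes d from the spec formula and builds the combined 1-D kernel of the three box blurs. The function names (box_size, convolve, triple_box_kernel) are invented for this illustration, and the half-pixel offsets of the even case are ignored because they only shift the individual boxes and do not change the combined weights. It says nothing about how Inkscape chooses its filter region.

```python
from math import floor, pi, sqrt

def box_size(s):
    """Box width d from the SVG 1.1 formula for standard deviation s."""
    return floor(s * 3 * sqrt(2 * pi) / 4 + 0.5)

def convolve(a, b):
    """Full discrete convolution of two 1-D weight lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def triple_box_kernel(s):
    """Weights of three successive box blurs (illustrative helper)."""
    d = box_size(s)
    box_d = [1.0 / d] * d
    if d % 2 == 1:
        boxes = [box_d, box_d, box_d]            # three boxes of size d
    else:
        box_d1 = [1.0 / (d + 1)] * (d + 1)
        boxes = [box_d, box_d, box_d1]           # two of size d, one of size d+1
    kernel = boxes[0]
    for box in boxes[1:]:
        kernel = convolve(kernel, box)
    return kernel

kernel = triple_box_kernel(10.0)
print(box_size(10.0), len(kernel), (len(kernel) - 1) // 2)
```

For s = 10 this gives d = 19 and a kernel that is 55 samples wide, i.e. it reaches 27 pixels to each side of the output pixel.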

Related

Custom smoothing kernel

I would like to use Smooth.ppp in spatstat to calculate a sort of "moving average" according to a specific function. The specific distance-dependent weights I would like to use are given by a function wt; for simplicity
wt=function(x,y) exp(-1e5*(x-y)^2)
In the extreme case where wt=kernel, I'd expect no smoothing (i.e. input marks = smoothed estimates). What am I misunderstanding about the kernel and how it is applied?
remotes::install_github("spatstat/spatstat.core")
n=4; PPP=ppp(rep(1:n,each=n),rep(1:n,n), c(1,n),c(1,n), marks=1:n^2);
smo=Smooth.ppp(PPP,cutoff=2,kernel=wt,at="points")
rbind(marks(PPP),smo)
(I'm using the latest spatstat build to allow estimates at points using a custom kernel)
This example may have been misinterpreted.
The kernel should be a function(x, y) in the R language which gives the value, at a spatial location (x,y), of the kernel centred at the origin (0,0). Generally the kernel takes its largest values when (x,y) is close to (0,0), and drops to zero when (x,y) is far from (0,0).
The function wt defined in your example has values close to 1 along the diagonal line x = y, and drops to zero rapidly away from the diagonal.
That is unusual. It means that a data point at location (a,b) will be 'smoothed' along the infinite line through the data point with unit slope, with equation y = x + b-a, rather than being smoothed over a region close to (a,b) as it normally would.
The example point pattern PPP consists of points along the diagonal y=x.
The smoothed value at a data point is the weighted average of the mark values at all data points, with weights proportional to the kernel value. In your example, the kernel value for each pair of data points, wt(x1-x2, y1-y2), is equal to 1 because all the data and query points lie on the same line with slope 1.
The kernel weights are all equal in this example, so the smoothed values should all be equal to the average mark value, if leaveoneout=FALSE, and if leaveoneout=TRUE then the smoothed value at data point i is the average of the mark values at the data points excluding point i.
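The weighted-average argument above can be checked with a small Python sketch. This is not spatstat's implementation: kernel_smooth is a hypothetical helper written for this illustration, and the four diagonal points are an assumed example pattern. It only reproduces the rule "smoothed value = kernel-weighted average of the marks".

```python
from math import exp

def wt(x, y):
    """The kernel from the question, evaluated at an offset (x, y)."""
    return exp(-1e5 * (x - y) ** 2)

def kernel_smooth(points, marks, kernel, leave_one_out=False):
    """Kernel-weighted average of the marks at each data point (illustrative)."""
    smoothed = []
    for i, (xi, yi) in enumerate(points):
        num = den = 0.0
        for j, (xj, yj) in enumerate(points):
            if leave_one_out and i == j:
                continue
            w = kernel(xi - xj, yi - yj)   # kernel centred at the query point
            num += w * marks[j]
            den += w
        smoothed.append(num / den)
    return smoothed

# Assumed example: four points on the diagonal y = x with marks 1..4.
points = [(1, 1), (2, 2), (3, 3), (4, 4)]
marks = [1.0, 2.0, 3.0, 4.0]
print(kernel_smooth(points, marks, wt))
print(kernel_smooth(points, marks, wt, leave_one_out=True))
```

Every offset between two diagonal points has x = y, so every weight is 1 and each smoothed value is the plain mean of the marks (or the mean of the other marks when leave_one_out is set).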

What is the upsampling method called 'area' used for?

The PyTorch function torch.nn.functional.interpolate contains several modes for upsampling, such as: nearest, linear, bilinear, bicubic, trilinear, area.
What is the area upsampling mode used for?
As jodag said, it is resizing using adaptive average pooling. While the answer at the link aims to explain what adaptive average pooling is, I find the explanation a bit vague.
TL;DR the area mode of torch.nn.functional.interpolate is probably one of the most intuitive ways to think of when one wants to downsample an image.
You can think of it as applying an averaging Low-Pass Filter(LPF) to the original image and then sampling. Applying an LPF before sampling is to prevent potential aliasing in the downsampled image. Aliasing can result in Moiré patterns in the downscaled image.
It is probably called "area" because it (roughly) preserves the area ratio between the input and output shapes when averaging the input pixels. More specifically, every pixel in the output image is the average of a corresponding region in the input image, and the reciprocal of that region's area is roughly the ratio between the output image's area and the input image's area.
Furthermore, the interpolate function with mode = 'area' calls the source function adaptive_avg_pool2d (implemented in C++), which assigns each pixel in the output tensor the average of all pixel intensities within a computed region of the input. That region is computed per pixel and can vary in size for different pixels. It is found by multiplying the output pixel's row and column indices by the ratio of input to output height and width (respectively): the floor of that product gives the region's start index, and the ceil of the same product computed for the next index gives its end index.
Here's an in-depth analysis of what happens in nn.AdaptiveAvgPool2d:
First of all, as stated there you can find the source code for adaptive average pooling (in C++) here: source
Taking a look at the function where the magic happens (or at least the magic on CPU for a single frame), static void adaptive_avg_pool2d_single_out_frame, we have 5 nested loops, running over channel dimension, then width, then height and within the body of the 3rd loop the magic happens:
First compute the region within the input image which is used to calculate the value of the current pixel (recall we had width and height loop to run over all pixels in the output).
How is this done?
Using a simple computation of start and end indices for height and width as follows: floor((input_height/output_height) * current_output_pixel_height) for the start and ceil((input_height/output_height) * (current_output_pixel_height+1)) for the end, and similarly for the width.
Then, all that is done is to simply average the intensities of all pixels in that region and current channel and place the result in the current output pixel.
I wrote a simple Python snippet that does the same thing, in the same fashion (loops, naive) and produces equivalent results. It takes tensor a and uses adaptive average pool to resize a to shape output_shape in 2 ways - once using the built-in nn.AdaptiveAvgPool2d and once with my translation into Python of the source function in C++: static void adaptive_avg_pool2d_single_out_frame. Built-in function's result is saved into b and my translation is saved into b_hat. You can see that the results are equivalent (you can further play with the spatial shapes and validate this):
import torch
from math import floor, ceil
from torch import nn
a = torch.randn(1, 3, 15, 17)
out_shape = (10, 11)
b = nn.AdaptiveAvgPool2d(out_shape)(a)
b_hat = torch.zeros(b.shape)
for d in range(a.shape[1]):
    for w in range(b_hat.shape[3]):
        for h in range(b_hat.shape[2]):
            startW = floor(w * a.shape[3] / out_shape[1])
            endW = ceil((w + 1) * a.shape[3] / out_shape[1])
            startH = floor(h * a.shape[2] / out_shape[0])
            endH = ceil((h + 1) * a.shape[2] / out_shape[0])
            b_hat[0, d, h, w] = torch.mean(a[0, d, startH: endH, startW: endW])
'''
Prints Mean Squared Error = 0 (or a very small number, due to precision error)
as both outputs are the same, proof of output equivalence:
'''
print(nn.MSELoss()(b_hat, b))
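To see how the regions vary in size and overlap, the same start/end rule can be tried in one dimension without PyTorch. This is only a sketch: pooling_regions and adaptive_avg_1d are hypothetical helper names, and integer arithmetic is used for the floor and ceil to avoid floating-point rounding.

```python
def pooling_regions(in_size, out_size):
    """(start, end) input index ranges, one per output element."""
    regions = []
    for i in range(out_size):
        start = (i * in_size) // out_size              # floor(i * in / out)
        end = -((-(i + 1) * in_size) // out_size)      # ceil((i + 1) * in / out)
        regions.append((start, end))
    return regions

def adaptive_avg_1d(values, out_size):
    """Average each region of `values`; a 1-D analogue of the loops above."""
    return [sum(values[s:e]) / (e - s)
            for s, e in pooling_regions(len(values), out_size)]

print(pooling_regions(5, 3))                   # [(0, 2), (1, 4), (3, 5)]
print(adaptive_avg_1d([1, 2, 3, 4, 5], 3))     # [1.5, 3.0, 4.5]
```

Going from 5 samples to 3, the middle output averages three inputs while the outer ones average two, and neighbouring regions share an input sample.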
Looking at the source code it appears area interpolation is equivalent to resizing a tensor via adaptive average pooling. You can refer to this question for an explanation of adaptive average pooling. Therefore area interpolation is more applicable to downsampling than upsampling.

How do I QUICKLY find the closest intersection in 2D between a ray and m polylines?

How do I find the closest intersection in 2D between a ray:
x = x0 + t*cos(a), y = y0 + t*sin(a)
and m polylines:
{(x1,y1), (x2,y2), ..., (xn,yn)}
QUICKLY?
I started by looping through all line segments, and for each line segment
{(x1,y1),(x2,y2)} solving:
x1 + u*(x2-x1) = x0 + t*cos(a)
y1 + u*(y2-y1) = y0 + t*sin(a)
by Cramer's rule, and afterwards sorting the intersections by distance, but that was slow :-(
BTW: the polylines happen to be monotonically increasing in x.
Coordinate system transformation
I suggest you first transform your setup to something with easier coordinates:
Take your point p = (x, y).
Move it by (-x0, -y0) so that the ray now starts at the center.
Rotate it by -a so that the ray now lies on the x axis.
So far the above operations have cost you four additions and four multiplications per point:
ca = cos(a) # computed only once
sa = sin(a) # likewise
x' = x - x0
y' = y - y0
x'' = x'*ca + y'*sa
y'' = y'*ca - x'*sa
Checking for intersections
Now you know that a segment of the polyline will only intersect the ray if the sign of its y'' value changes, i.e. y1'' * y2'' < 0. You could even postpone the computation of the x'' values until after this check. Furthermore, the segment will only intersect the ray if the intersection of the segment with the x axis occurs for x > 0, which can only happen if either value is greater than zero, i.e. x1'' > 0 or x2'' > 0. If both x'' are greater than zero, then you know there is an intersection.
The following paragraph is kind of optional, don't worry if you don't understand it, there is an alternative noted later on.
If one x'' is positive but the other is negative, then you have to check further. Suppose that the sign of y'' changed from negative to positive, i.e. y1'' < 0 < y2''. The line from p1'' to p2'' will intersect the x axis at x > 0 if and only if the triangle formed by p1'', p2'' and the origin is oriented counter-clockwise. You can determine the orientation of that triangle by examining the sign of the determinant x1''*y2'' - x2''*y1'', which is positive for a counter-clockwise triangle. If the direction of the sign change is different, the orientation has to be different as well. Putting this together, you can check whether
(x1'' * y2'' - x2'' * y1'') * y2'' > 0
If that is the case, then you have an intersection. Notice that there were no costly divisions involved so far.
Computing intersections
As you want to not only decide whether an intersection exists, but actually find a specific one, you now have to compute that intersection. Let's call it p3. It must satisfy the equations
(x2'' - x3'')/(y2'' - y3'') = (x1'' - x3'')/(y1'' - y3'') and
y3'' = 0
which results in
x3'' = (x1'' * y2'' - x2'' * y1'')/(y2'' - y1'')
Instead of the triangle orientation check from the previous paragraph, you could always compute this x3'' value and discard any results where it turns out to be negative. Less code, but more divisions. Benchmark if in doubt about performance.
To find the point closest to the origin of the ray, you take the result with minimal x3'' value, which you can then transform back into its original position:
x3 = x3''*ca + x0
y3 = x3''*sa + y0
There you are.
Note that all of the above assumed that all numbers were either positive or negative. If you have zeros, it depends on the exact interpretation of what you actually want to compute, how you want to handle these border cases.
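Put together, the transform, the sign check and the intersection formula might look like the following Python sketch. The function name closest_hit and the polyline representation (a list of (x, y) tuples) are assumptions made for this example. It takes the simpler variant that always computes x3'' and discards non-positive values, and it treats points lying exactly on the ray (y'' = 0) as no intersection.

```python
from math import cos, sin

def closest_hit(x0, y0, a, polylines):
    """Closest intersection of the ray (x0, y0, angle a) with any polyline.

    Returns the hit point in original coordinates, or None if nothing is hit.
    """
    ca, sa = cos(a), sin(a)          # computed only once
    best = None                      # smallest positive x3'' found so far
    for poly in polylines:
        prev = None
        for x, y in poly:
            dx, dy = x - x0, y - y0
            cur = (dx * ca + dy * sa, dy * ca - dx * sa)   # (x'', y'')
            if prev is not None:
                (x1, y1), (x2, y2) = prev, cur
                if y1 * y2 < 0:      # segment crosses the x'' axis
                    x3 = (x1 * y2 - x2 * y1) / (y2 - y1)
                    if x3 > 0 and (best is None or x3 < best):
                        best = x3
            prev = cur
    if best is None:
        return None
    return (best * ca + x0, best * sa + y0)   # transform back

print(closest_hit(0.0, 0.0, 0.0,
                  [[(2, -1), (2, 1)], [(1, -1), (1, 1), (3, 2)]]))
```

Only the smallest x3'' is kept, so no sorting of the intersections is needed.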
To avoid checking for an intersection with every segment, some kind of space partitioning is needed, such as a quadtree or a BSP tree. With space partitioning, the ray is first tested against the partitions themselves.
In this case, since the points are sorted by x-coordinate, the space can be partitioned with boxes (min x, min y)-(max x, max y) covering parts of a polyline. The root box is the min-max of all points, and it is split into 2 boxes for the first and second part of the polyline. The two parts have the same number of segments, or one box has one more segment. This box splitting is done recursively until only one segment is left in a box.
To check for a ray intersection, start with the root box and check whether the ray intersects it. If it does, check the 2 sub-boxes for an intersection, testing the closer sub-box first and then the farther one.
Checking a ray-box intersection means checking whether the ray crosses an axis-aligned line between 2 positions. That is done for the 4 box boundaries.
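One common way to write the ray-box test is the "slab" method, which clips the ray's parameter range against the x and y extents of the box in turn. It is a different formulation from testing the 4 boundaries one by one, but answers the same question. The sketch below is an assumption-laden illustration: ray_hits_box is a made-up name and the ray is given by an origin and a direction vector (dx, dy) rather than an angle.

```python
from math import inf

def ray_hits_box(x0, y0, dx, dy, xmin, ymin, xmax, ymax):
    """True if the ray (x0, y0) + t * (dx, dy), t >= 0, touches the box."""
    t_near, t_far = 0.0, inf
    for origin, direction, lo, hi in ((x0, dx, xmin, xmax),
                                      (y0, dy, ymin, ymax)):
        if direction == 0.0:
            # Ray is parallel to this pair of boundaries.
            if origin < lo or origin > hi:
                return False
        else:
            t1 = (lo - origin) / direction
            t2 = (hi - origin) / direction
            if t1 > t2:
                t1, t2 = t2, t1
            t_near = max(t_near, t1)   # latest entry into a slab
            t_far = min(t_far, t2)     # earliest exit from a slab
            if t_near > t_far:
                return False
    return True

print(ray_hits_box(0.0, 0.0, 1.0, 0.0, 1.0, -1.0, 2.0, 1.0))   # True
print(ray_hits_box(0.0, 0.0, -1.0, 0.0, 1.0, -1.0, 2.0, 1.0))  # False
```

In the box hierarchy described above, a False result lets the whole sub-polyline inside that box be skipped.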

How do you pack a 3-floats (space vector) into 4 bytes (pixel)?

I've successfully packed floats with values in [0,1] without losing too much precision using:
byte packedVal = floatVal * 255.0f ; // [0,1] -> [0,255]
Then when I want to unpack the packedVal back into a float, I simply do
float unpacked = packedVal / 255.0f ; // [0,255] -> [0,1]
That works fine, as long as the floats are between 0 and 1.
Now here's the real deal. I'm trying to turn a 3d space vector (with 3 float components) into 4 bytes. The reason I'm doing this is because I am using a texture to store these vectors, with 1 pixel per vector. It should be something like a "normal map", (but not exactly this, you'll see why after the jump)
So there, each pixel represents a 3d space vector. Where the value is very red, the normal vector's direction is mostly +x (to the right).
So of course, normals are normalized. So they don't require a magnitude (scaling) vector. But I'm trying to store a vector with arbitrary magnitude, 1 vector per pixel.
Because textures have 4 components (rgba), I am thinking of storing a scaling vector in the w component.
Any other suggestions for packing an arbitrary sized 3 space vector, (say with upper limit on magnitude of 200 or so on each of x,y,z), into a 4-byte pixel color value?
Storing the magnitude in the 4th component sounds very reasonable. As long as the magnitude is bounded to something reasonable and not completely arbitrary.
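As a rough sketch of this idea, written in Python rather than shader code, the direction can go into the first three bytes and the magnitude, scaled by an assumed upper bound, into the fourth. MAX_MAG, pack and unpack are names chosen for this example; the bound of 200 per component is taken from the question.

```python
from math import sqrt

# Assumed bound: each component is at most 200, so the length is at most 200*sqrt(3).
MAX_MAG = 200.0 * sqrt(3.0)

def pack(v):
    """Pack a 3-vector into four bytes: direction in rgb, magnitude in a."""
    x, y, z = v
    mag = sqrt(x * x + y * y + z * z)
    if mag == 0.0:
        return (128, 128, 128, 0)
    def to_byte(c):
        return round((c / mag * 0.5 + 0.5) * 255)   # [-1,1] -> [0,255]
    return (to_byte(x), to_byte(y), to_byte(z),
            round(min(mag / MAX_MAG, 1.0) * 255))

def unpack(p):
    """Inverse of pack, up to quantisation error."""
    r, g, b, a = p
    mag = a / 255.0 * MAX_MAG
    d = [c / 255.0 * 2.0 - 1.0 for c in (r, g, b)]  # [0,255] -> [-1,1]
    n = sqrt(sum(c * c for c in d))
    if n == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(c / n * mag for c in d)            # renormalise, then scale

print(unpack(pack((100.0, -50.0, 25.0))))
```

With 8 bits for the magnitude the length is only resolved in steps of MAX_MAG / 255 (about 1.4 units here), so this suits data where that precision is acceptable.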
If you want a more flexible range of magnitudes you can pre-multiply the normalized direction vector by (0.5, 1.0] when you store it, and when you unpack it multiply it by pow(2, w).
Such a method is used for storing high dynamic range images - RGBM encoding (M stands for magnitude). One of its drawbacks is that interpolation gives wrong results, so you can't use bilinear filtering for your texture.
You can look for other options among HDR encodings: here is a short list of a few of the most popular

How can I translate an image with subpixel accuracy?

I have a system that requires moving an image on the screen. I am currently using a png and just placing it at the desired screen coordinates.
Because of a combination of the screen resolution and the required frame rate, some frames are identical because the image has not yet moved a full pixel. Unfortunately, the resolution of the screen is not negotiable.
I have a general understanding of how sub-pixel rendering works to smooth out edges but I have been unable to find a resource (if it exists) as to how I can use shading to translate an image by less than a single pixel.
Ideally, this would be usable with any image but if it was only possible with a simple shape like a circle or a ring, that would also be acceptable.
Sub-pixel interpolation is relatively simple. Typically you apply what amounts to an all-pass filter with a constant phase shift, where the phase shift corresponds to the required sub-pixel image shift. Depending on the required image quality you might use e.g. a 5 point Lanczos or other windowed sinc function and then apply this in one or both axes depending on whether you want an X shift or a Y shift or both.
E.g. for a 0.5 pixel shift the coefficients might be [ 0.06645, 0.18965, 0.27713, 0.27713, 0.18965 ]. (Note that the coefficients are normalised, i.e. their sum is equal to 1.0.)
To generate a horizontal shift you would convolve these coefficients with the pixels from x - 2 to x + 2, e.g.
const float kCoeffs[5] = { 0.06645f, 0.18965f, 0.27713f, 0.27713f, 0.18965f };
for (int y = 0; y < height; ++y) // for each row
    for (int x = 2; x < width - 2; ++x) // for each col (apart from 2 pixel border)
    {
        float p = 0.0f; // convolve pixel with Lanczos coeffs
        for (int dx = -2; dx <= 2; ++dx)
            p += in[y][x + dx] * kCoeffs[dx + 2];
        out[y][x] = p; // store interpolated pixel
    }
Conceptually, the operation is very simple. First you scale up the image (using any method of interpolation, as you like), then you translate the result, and finally you subsample down to the original image size.
The scale factor depends on the precision of sub-pixel translation you want. If you want to translate by 0.5 pixels, you need to scale up the original image by a factor of 2 and then translate the resulting image by 1 pixel; if you want to translate by 0.25 pixels, you need to scale up by a factor of 4, and so on.
Note that this implementation is not efficient because when you scale up you end up calculating pixel values that you won't actually use because they're just dropped when you subsample back to the original image size. The implementation in Paul's answer is more efficient.
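The scale-up, translate, subsample idea can be shown on a single row of pixels with a short Python sketch. It assumes linear interpolation for the upscaling and clamps at the edges; upsample_linear and shift_subpixel are hypothetical names for this illustration, not part of any library.

```python
def upsample_linear(row, factor):
    """Insert linearly interpolated samples; the last pixel is held at the edge."""
    out = []
    for i in range(len(row)):
        nxt = row[min(i + 1, len(row) - 1)]
        for k in range(factor):
            t = k / factor
            out.append((1 - t) * row[i] + t * nxt)
    return out

def shift_subpixel(row, shift_fine, factor):
    """Shift `row` to the right by shift_fine / factor pixels."""
    fine = upsample_linear(row, factor)                      # 1. scale up
    shifted = [fine[max(j - shift_fine, 0)]                  # 2. whole-pixel shift
               for j in range(len(fine))]                    #    on the fine grid
    return shifted[::factor]                                 # 3. subsample

print(shift_subpixel([0, 10, 20, 30], 1, 2))   # [0.0, 5.0, 15.0, 25.0]
```

Here a factor of 2 and a fine shift of 1 move the ramp by half a pixel, so each output pixel takes the value halfway between two neighbouring input pixels.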
