radial basis function (RBF) kernel - svm

Suppose we use the following radial basis function (RBF) kernel: K(x_i, x_j) = exp(-(1/2) * ||x_i - x_j||^2), which has some implicit unknown mapping φ(x).
• Prove that the mapping φ(x) corresponding to the RBF kernel is infinite-dimensional.
• Prove that for any two input examples x_i and x_j, the squared Euclidean distance of their corresponding points in the higher-dimensional space defined by φ is less than 2, i.e., ||φ(x_i) − φ(x_j)||^2 ≤ 2.
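A sketch of the two identities these proofs usually rest on (my own derivation, written in LaTeX; it uses the standard factoring of the RBF kernel through e^{x_i^T x_j}):

K(x_i, x_j) = e^{-\frac{1}{2}\|x_i\|^2} \, e^{-\frac{1}{2}\|x_j\|^2} \, e^{x_i^\top x_j}
            = e^{-\frac{1}{2}\|x_i\|^2} \, e^{-\frac{1}{2}\|x_j\|^2} \sum_{k=0}^{\infty} \frac{(x_i^\top x_j)^k}{k!}

\|\phi(x_i) - \phi(x_j)\|^2 = K(x_i, x_i) + K(x_j, x_j) - 2\,K(x_i, x_j)
                            = 2 - 2\, e^{-\frac{1}{2}\|x_i - x_j\|^2} \le 2

The Taylor series contains monomial features of every degree k, so no finite-dimensional φ can realize it; and because the exponential term is strictly positive, the squared distance stays strictly below 2.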

Related

Is X′X/(n − 1) an unbiased estimator of Σ when mean is 0

For example, suppose X is multivariate normally distributed with mean vector equal to zero. Is X′X/(n − 1) an unbiased estimator of Σ? Whether the answer is yes or no, can you explain or prove it mathematically?
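A short sketch of the expectation computation, assuming (as the question suggests) that the n rows x_i of X are i.i.d. with mean vector zero and covariance Σ:

E[X'X] = \sum_{i=1}^{n} E[x_i x_i'] = n\,\Sigma
\quad\Longrightarrow\quad
E\!\left[\frac{X'X}{n-1}\right] = \frac{n}{n-1}\,\Sigma \ne \Sigma

Under these assumptions X'X/n is the unbiased estimator; the n − 1 divisor is only needed when the mean itself is estimated from the data.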

Custom smoothing kernel

I would like to use Smooth.ppp in spatstat to calculate a sort of "moving average" according to a specific function. The specific distance-dependent weights I would like to use are given by a function wt; for simplicity,
wt <- function(x, y) exp(-1e5 * (x - y)^2)
In the extreme case where wt = kernel, I'd expect no smoothing (i.e., input marks = smoothed estimates). I'm wondering what I am misunderstanding here about the kernel and how it is applied.
remotes::install_github("spatstat/spatstat.core")
library(spatstat)
n <- 4
PPP <- ppp(rep(1:n, each = n), rep(1:n, n), c(1, n), c(1, n), marks = 1:n^2)
smo <- Smooth.ppp(PPP, cutoff = 2, kernel = wt, at = "points")
rbind(marks(PPP), smo)
(I'm using the latest spatstat build to allow estimates at points using a custom kernel)
This example may have been misinterpreted.
The kernel should be a function(x, y) in the R language which gives the value, at a spatial location (x,y), of the kernel centred at the origin (0,0). Generally the kernel takes its largest values when (x,y) is close to (0,0), and drops to zero when (x,y) is far from (0,0).
The function wt defined in your example has values close to 1 along the diagonal line x = y, and drops to zero rapidly away from the diagonal.
That is unusual. It means that a data point at location (a,b) will be 'smoothed' along the infinite line through the data point with unit slope, with equation y = x + (b − a), rather than being smoothed over a region close to (a,b) as it normally would be.
The example point pattern PPP consists of points along the diagonal y=x.
The smoothed value at a data point is the weighted average of the mark values at all data points, with weights proportional to the kernel value. In your example, the kernel value for each pair of data points, wt(x1-x2, y1-y2), is equal to 1 because all the data and query points lie on the same line with slope 1.
The kernel weights are all equal in this example, so the smoothed values should all be equal to the average mark value if leaveoneout=FALSE; if leaveoneout=TRUE, then the smoothed value at data point i is the average of the mark values at the data points excluding point i.
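For contrast, here is a minimal sketch of a kernel of the form the answer describes, i.e., a function of the spatial offset (x, y) from the origin. The bandwidth sigma is an illustrative choice, not something from the question:

# Isotropic Gaussian kernel centred at the origin: largest at (0,0),
# decaying with distance from the origin, as the answer describes.
sigma <- 0.5
wt2 <- function(x, y) exp(-(x^2 + y^2) / (2 * sigma^2))
smo2 <- Smooth.ppp(PPP, cutoff = 2, kernel = wt2, at = "points")
rbind(marks(PPP), smo2)

With this kernel, each smoothed value is a weighted average of nearby marks, with weights decaying over a spatial neighbourhood rather than along a line.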

What are the parameters to the kernel function in an SVM?

I'm trying to understand kernel functions, particularly the Gaussian/RBF function K(a, b) = exp(-gamma * ||a - b||^2).
As I understand it, this computes a similarity measure for vectors a and b, based in part on their Euclidean distance. My question isn't about the specifics of this kernel, though.
What I don't understand: what are vectors a and b when you use this kernel in an SVM?
SVM is a supervised learning algorithm, so there is a training phase and a testing phase, in each of which you use a sample of collected data.
A sample of data used for training is usually denoted {x_i, y_i}, where the x are real-valued attribute vectors for each datum and the y are the corresponding labels (see the Wikipedia SVM page, section "Linear SVM", for example).
In the kernel K(a, b), the values a and b are the x_i and x_j of the data you have.
In the testing phase you will have only the set {x_i} and you want to estimate the corresponding y. In this case too, a and b are data points: each test point is paired with the training points.
EDIT
K(a, b) is calculated for every pair (a, b) = (x_i, x_j), varying i and j. The kernel represents a dot product (the kernel trick) in the feature space defined by the so-called mapping phi.
The SVM needs the dot products of all the pairs because its (dual) objective contains a sum over i and j of all the dot products, that is, of all the K(x_i, x_j).
For example, if you have the set {x_i} = {x_1, x_2} you need
K(x_1, x_1), K(x_1, x_2), K(x_2, x_1), K(x_2, x_2)
(Every kernel is symmetric, K(a, b) = K(b, a), since it is a dot product, so in the end you don't need K(x_2, x_1).)
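As a concrete illustration, here is a small R sketch (toy data and an illustrative gamma, not from the question) of the Gram matrix of all pairwise kernel values that an SVM consumes during training:

# Three training vectors x_1, x_2, x_3 stored as the rows of X.
X <- rbind(c(0, 0), c(1, 0), c(0, 2))
gamma <- 0.5                       # illustrative kernel width
D2 <- as.matrix(dist(X))^2         # squared Euclidean distances ||x_i - x_j||^2
K <- exp(-gamma * D2)              # Gram matrix: K[i, j] = K(x_i, x_j)
K                                  # symmetric, with ones on the diagonal

Because K is symmetric with unit diagonal, only the entries above the diagonal actually need to be computed.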

svg feGaussianBlur: correlation between stdDeviation and size

When I blur an object in Inkscape by, let's say, 10%, it gets a filter with an feGaussianBlur whose stdDeviation is 10% * size / 2.
However, the filter has a size of 124% (it is actually that big; Inkscape doesn't just add a bit to be on the safe side).
Where does this number come from? My guess would be 100% + 2.4 * (2*stdDeviation/size), but then where does this 2.4 come from?
From the SVG 1.1 spec:
This filter primitive performs a Gaussian blur on the input image.
The Gaussian blur kernel is an approximation of the normalized convolution:
G(x,y) = H(x)I(y)
where
H(x) = exp(-x^2 / (2*s^2)) / sqrt(2*pi*s^2)
and
I(y) = exp(-y^2 / (2*t^2)) / sqrt(2*pi*t^2)
with 's' being the standard deviation in the x direction and 't' being the standard deviation in the y direction, as specified by ‘stdDeviation’.
The value of ‘stdDeviation’ can be either one or two numbers. If two numbers are provided, the first number represents a standard deviation value along the x-axis of the current coordinate system and the second value represents a standard deviation in Y. If one number is provided, then that value is used for both X and Y.
Even if only one value is provided for ‘stdDeviation’, this can be implemented as a separable convolution.
For larger values of 's' (s >= 2.0), an approximation can be used: Three successive box-blurs build a piece-wise quadratic convolution kernel, which approximates the Gaussian kernel to within roughly 3%.
let d = floor(s * 3*sqrt(2*pi)/4 + 0.5)
... if d is odd, use three box-blurs of size 'd', centered on the output pixel.
... if d is even, two box-blurs of size 'd' (the first one centered on the pixel boundary between the output pixel and the one to the left, the second one centered on the pixel boundary between the output pixel and the one to the right) and one box blur of size 'd+1' centered on the output pixel.
Note: the approximation formula also applies correspondingly to 't'.
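The box size d from the spec can be computed directly. A small R sketch (the connection from the total box-blur support to Inkscape's 124% region is only the question's open guess, not something the spec states):

# Box-blur size per the SVG 1.1 approximation, and the total support of
# three successive box blurs of that size (3d, roughly 5.64 * s).
d_box <- function(s) floor(s * 3 * sqrt(2 * pi) / 4 + 0.5)
s <- 10
c(d = d_box(s), total_support = 3 * d_box(s))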

Is bilinear filtering reversible?

When using a bilinear filter to magnify an image (by some non-integer factor), is that process lossless? That is, is there some way to calculate the original image, as long as the original resolution, the upscaled image and the exact algorithm used are known, and there is no loss in precision when upscaling (no rounding errors)?
My guess would be that it is, but that is based on some calculations on a napkin regarding the one-dimensional case only.
Take the 1D case as a simplification. Each output point can be expressed as a linear combination of two of the input points, i.e.:
y_n = k_n * x_m + (1-k_n) * x_{m+1}
You have a whole set of these equations, which can be expressed in vector notation as:
Y = K * X
where X is a length-M vector of input points, Y is a length-N vector of output points, and K is a sparse matrix (size NxM) containing the (known) values of k.
For the interpolation to be reversible, K must be invertible (in the left-inverse sense, since N ≥ M). This means that it must have at least M linearly independent rows, i.e., full column rank. This is true if and only if there is at least one output point in between each pair of adjacent input points.
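A quick numerical check of this argument in R (1D linear interpolation at a non-integer factor; sizes and positions are illustrative):

# Upsample M input samples to N output samples by linear interpolation,
# then recover the input by inverting Y = K X.
set.seed(1)
M <- 5; N <- 8                     # non-integer upscaling factor N/M
x <- rnorm(M)                      # original 1D signal
pos <- seq(1, M, length.out = N)   # output positions in input coordinates
K <- matrix(0, N, M)
for (n in 1:N) {
  m <- min(floor(pos[n]), M - 1)   # index of the left input neighbour
  w <- pos[n] - m                  # fractional offset towards the right neighbour
  K[n, m]     <- 1 - w
  K[n, m + 1] <- w
}
y <- K %*% x                       # upscaled signal
x_rec <- qr.solve(K, y)            # least-squares inversion; exact because K has full column rank
max(abs(x_rec - x))                # ~ 1e-16: the original signal is recovered

Here every pair of adjacent input points has at least one output point between them, so K has the M linearly independent rows the argument requires.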
