Why is string interpolation named the way it is?

The term interpolation is usually used with mathematical functions, when determining a function's values between given data points, which makes perfect sense. I don't see how that applies to strings; what is being interpolated? Am I missing something obvious?

Interpolation in mathematics is simply working out what lies between two known points(a). For example, cubic spline fitting over a series of points will give you a curve of some description (I consider a straight line to be a degenerate curve here, so don't bother pointing out that some formulae generate such a beast) between each pair of points, even though you have no actual data there.
Contrast this with extrapolation, which gives you data beyond the endpoints. An example of that is observing that, based on history, the stock market indices rise at x percent per annum, so in a hundred years they will be much higher than they are now.
So it's a short step to the most likely explanation as to why variable substitution within strings is called interpolation, since you're changing things within the bounds of the data:
xyzzy="42"
plugh="abc${xyzzy}xyz"
// now plugh is equal to "abc42xyz"
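The snippet above is pseudocode; as a concrete, runnable illustration, here is the same substitution written as a minimal Python sketch using an f-string:
xyzzy = 42
plugh = f"abc{xyzzy}xyz"  # the value of xyzzy is interpolated into the string
print(plugh)              # prints "abc42xyz"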
(a) The actual roots of the word are the Latin inter + polare, which translate to "within" and "polish" (in the sense of modify or improve). See here for more detail.

Related

How do I analyze the change in the relationship between two variables?

I'm working on a simple project in which I'm trying to describe the relationship between two positively correlated variables and determine if that relationship is changing over time, and if so, to what degree. I feel like this is something people probably do pretty often, but maybe I'm just not using the correct terminology because google isn't helping me very much.
I've plotted the variables on a scatter plot and know how to determine the correlation coefficient and plot a linear regression. I thought this may be a good first step because the linear regression tells me what I can expect y to be for a given x value. This means I can quantify how "far away" each data point is from the regression line (I think this is called the squared error?). Now I'd like to see what the error looks like for each data point over time. For example, if I have 100 data points and the most recent 20 are much farther away from where the regression line/function says they should be, maybe I could say that the relationship between the variables is showing signs of changing? Does that make any sense at all or am I way off base?
I have a suspicion that there is a much simpler way to do this and/or that I'm going about it in the wrong way. I'd appreciate any guidance you can offer!
I can suggest two strands of literature that study changing relationships over time. Typing these names into Google should provide you with a large number of references, so I'll stick to concise descriptions.
(1) Structural break modelling. As the name suggests, this assumes that there has been a sudden change in parameters (e.g. a correlation coefficient). This is applicable if there has been a policy change, a change in measurement device, etc. The estimation approach is indeed very close to the procedure you suggest. Namely, you would estimate the squared error (or some other measure of fit) on the full sample and on the two sub-samples (before and after the break). If the gains in fit are large when dividing the sample, then you would favour the model with the break and use different coefficients before and after the structural change.
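As a rough sketch of that fit comparison, assuming a simple linear relationship and a candidate break point chosen in advance (the function names and the simulated data below are purely illustrative):
import numpy as np
def sse_of_linear_fit(x, y):
    # sum of squared residuals from an ordinary least-squares line
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return float(residuals @ residuals)
def break_improvement(x, y, break_idx):
    # compare the full-sample fit with separate fits before/after a candidate break
    sse_full = sse_of_linear_fit(x, y)
    sse_split = (sse_of_linear_fit(x[:break_idx], y[:break_idx])
                 + sse_of_linear_fit(x[break_idx:], y[break_idx:]))
    return sse_full - sse_split  # a large positive value favours the break model
# simulated example: the slope changes for the last 20 of 100 points
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = np.where(np.arange(100) < 80, 2.0 * x, 0.5 * x) + rng.normal(scale=0.3, size=100)
print(break_improvement(x, y, break_idx=80))
A formal procedure such as the Chow test compares this gain in fit against what you would expect by chance, but the mechanics are the same.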
(2) Time-varying coefficient models. This approach is more subtle as coefficients will now evolve more slowly over time. These changes can originate from the time evolution of some observed variables or they can be modeled through some unobserved latent process. In the latter case the estimation typically involves the use of state-space models (and thus the Kalman filter or some more advanced filtering techniques).
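Before reaching for full state-space models, a rolling-window regression is a quick way to see whether a coefficient drifts over time; a minimal sketch (the window length is an arbitrary choice):
import numpy as np
def rolling_slope(x, y, window=30):
    # OLS slope of y on x estimated in a sliding window; plotting the result
    # against time is a crude time-varying-coefficient diagnostic
    slopes = []
    for start in range(len(x) - window + 1):
        xs, ys = x[start:start + window], y[start:start + window]
        X = np.column_stack([np.ones(window), xs])
        coef, *_ = np.linalg.lstsq(X, ys, rcond=None)
        slopes.append(coef[1])
    return np.array(slopes)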
I hope this helps!

Linear interpolation in PromQL or MetricsQL

I am evaluating VictoriaMetrics for an IoT application where we sometimes have gaps in a series due to hardware or communication issues. In some time series reporting situations it is helpful for us to interpolate values for the missing time intervals. I see that MetricsQL (which extends PromQL) has a keep_last_value() function that will fill gaps by holding the last observed value until a new one appears (which will be helpful to us) but in some situations a linear interpolation between the values before and after the gap is a more realistic estimate for the missing portion. Is there a function in PromQL or MetricsQL that will do linear interpolation of missing data in a series, or is it possible to construct a more complex query that will achieve this?
Clarifying the desired interpolation
What I would like is a simple interpolation between the points immediately before and after the gap; this is, I believe, what TimescaleDB's interpolate() function does. In other words, if my time series is:
(1:00, 2)
(2:00, 4)
(3:00, NaN)
(4:00, 5)
(5:00, 1)
I would like the interpolated 3:00 value to be 4.5, halfway between the points immediately before and after. I don't want it to be 6 (which is what I would get by extrapolating from the points before the missing one, ignoring the points after), and I don't want whatever value I would get if I did linear regression on the whole series and interpolated at 3:00 (presumably 3, or something close to it).
Of course, this is a simple illustration, and it's also possible that the gap could last more than one time step. But in that case I would still like the interpolation to be based solely on the points immediately before and immediately after the gap, ignoring the rest of the series.
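For reference, the behaviour I'm after is easy to reproduce offline; a minimal pandas sketch of the same series (purely illustrative, not a PromQL or VictoriaMetrics feature, and the timestamps are placeholders):
import pandas as pd
series = pd.Series(
    [2.0, 4.0, None, 5.0, 1.0],
    index=pd.to_datetime(["2023-01-01 01:00", "2023-01-01 02:00",
                          "2023-01-01 03:00", "2023-01-01 04:00",
                          "2023-01-01 05:00"]),
)
# fill the gap by drawing a straight line between the neighbouring points
filled = series.interpolate(method="linear")
print(filled["2023-01-01 03:00"])  # 4.5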
Final answer
Use the interpolate function, now available in VictoriaMetrics starting from v1.38.0.
Original suggestion
This does not achieve the exact interpolation requested in the revised question, but it may be useful for others with slightly different requirements.
Try combining the predict_linear function with the default operator from MetricsQL in the following way:
metric default predict_linear(metric[1h], 0)
Try modifying the value in square brackets in order to get the desired level of interpolation.

Excel - how to find an exact f(x) value for a given x from a measurement graph

I am calculating the dynamic resistance of a diode. I have a lot of measurements and have created a graph from them. The question is: how do I find an exact value from this graph for a given argument? For example, I want to obtain the value of f(x) for x=5, where I only have measurements at exact values such as x=10 -> y=213 and x=1 -> y=110, and a graph curve drawn through them. How do I find f(5)?
This is not trivial: it will depend on your interpolation scheme and Excel does not expose the scheme it uses when drawing a graph.
Unless you tell it otherwise, Excel (I think) uses a Bézier curve with two control points to perform its graphing.
This interpolation scheme transforms, via some linear algebra, to a cubic spline interpolation.
But to use cubic spline interpolation, you need more than two data points.
Since you've only given us two points, the best thing you can do is to interpolate linearly, but that will not be what Excel does.
A more detailed answer than this would, if anything, only underline the rather broad nature of your question. Do Google any of the terms I've used: armed with a bit of time and a good internet connection, you ought to be able to solve this problem adequately.
See https://en.wikipedia.org/wiki/Spline_interpolation, https://en.wikipedia.org/wiki/B%C3%A9zier_curve
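As a rough illustration of that linear fallback, here is a minimal Python sketch using the two measurements from the question (this is not what Excel does when drawing its curve):
import numpy as np
x_measured = [1, 10]     # known x values from the question
y_measured = [110, 213]  # corresponding measured y values
y_at_5 = np.interp(5, x_measured, y_measured)  # linear interpolation between the two known points
print(y_at_5)  # roughly 155.8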
I think that you can use a preinstalled add-on named Solver. You have to activate it as shown here.
Then you have to follow one of the tutorials you can find on the Internet (like this one), but instead of finding a min or max, find the exact value you want.

Interpolation technique for weirdly spaced point data

I have a spatial dataset that consists of a large number of point measurements (n=10^4) that were taken along regular grid lines (500m x 500m) and some arbitrary lines and blocks in between. The single measurements were taken with a spacing of about 0.3-1.0m (varying) along these lines (see the example showing every 10th point).
The data can be assumed to be normally distributed but shows a strong small-scale variability in some regions. And there is some trend with elevation (r=0.5) that can easily be removed.
Regardless of the coding platform, I'm looking for a good or "the optimal" way to interpolate these points to a regular 25 x 25m grid over the entire area of interest (5000 x 7000m). I know about the wide range of kriging techniques but I wondered if somebody has a specific idea on how to handle the "oversampling along lines" with rather large gaps between the lines.
Thank you for any advice!
Leo
The kriging technique does not perform well when the points to interpolate from are taken on a regular grid, because a wide range of different inter-point distances is needed to estimate the covariance model well.
Your case is a bit particular... The oversampling along the lines is not a problem at all. The main problem is the big holes you have in your grid. I think that these holes will create problems whatever interpolation technique you use.
However, it is difficult to predict a priori whether kriging will behave well. I advise you to try it anyway.
Kriging is only suited for interpolating. You cannot extrapolate with a kriging metamodel, so you won't be able to predict values in the bottom-left part of your figure, for example (because you have no points there).
To perform kriging, I advise you to use the following tools (depending on the language you're most familiar with):
DiceKriging package in R (the one I prefer to use)
fields package in R (which is more specialized for spatial fields)
DACE toolbox in Matlab
Bonus: a link to a reference book about kriging which is available online: http://www.gaussianprocess.org/
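If Python is also an option, here is a minimal ordinary-kriging sketch with the pykrige package (my own suggestion, not one of the tools listed above; the random data and the variogram model are placeholders you would replace and tune):
import numpy as np
from pykrige.ok import OrdinaryKriging
# x, y: point coordinates in metres; z: detrended measurements (placeholder data here)
rng = np.random.default_rng(0)
x = rng.uniform(0, 5000, 500)
y = rng.uniform(0, 7000, 500)
z = rng.normal(size=500)
# target 25 m x 25 m grid over the 5000 m x 7000 m area of interest
grid_x = np.arange(0.0, 5000.0, 25.0)
grid_y = np.arange(0.0, 7000.0, 25.0)
ok = OrdinaryKriging(x, y, z, variogram_model="spherical")
z_grid, variance = ok.execute("grid", grid_x, grid_y)  # interpolated values and kriging variance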
PS: This type of question is more statistics oriented than programming and may be better suited to the stats.stackexchange.com website.

Two Dimensional Curve Approximation

Here is what I want to do (preferably with Matlab):
Basically I have several traces of cars driving through an intersection. Each one is noisy, so I want to take the mean over all measurements to get a better approximation of the real route. In other words, I am looking for a way to approximate the curve that has the smallest distance to all of the measured traces (in a least-squares sense).
At first glance, this is quite similar to what can be achieved with spap2 from the Curve Fitting Toolbox (there is a good example in the Least-Squares Approximation section here).
But this algorithm has a major drawback: it assumes a function (with exactly one y(x) for every x), but what I want is a curve in 2D (which may have several y values for one x). This leads to problems when cars turn right or left by more than 90 degrees.
Furthermore, it uses vertical offsets rather than perpendicular offsets (according to the definition on Wolfram).
Does anybody have an idea how to solve this problem? I thought of using a B-spline and changing the number of knots and the degree until I reached a certain fitting quality, but I can't find a way to solve this problem analytically or with the functions provided by the Curve Fitting Toolbox. Is there a way to solve this without numerical optimization?
mbeckish is right. In order to get sufficient flexibility in the curve shape, you must use a parametric curve representation (x(t), y(t)) instead of an explicit representation y(x). See Parametric equation.
Given n successive points on the curve, assign them their true time if you know it or just integers 0..n-1 if you don't. Then call spap2 twice with vectors T, X and T, Y instead of X, Y. Now for arbitrary t you get a point (x, y) on the curve.
This won't give you a true least squares solution, but should be good enough for your needs.
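For reference, the same two-call idea can be sketched in Python with scipy's parametric spline fitting; this is an analogue of the approach described above, not the Matlab spap2 call itself, and the data and smoothing factor below are arbitrary placeholders:
import numpy as np
from scipy.interpolate import splprep, splev
# x, y: one noisy measured trace through the intersection (placeholder data)
t = np.linspace(0, 1, 200)
x = np.cos(t * np.pi) + np.random.normal(scale=0.02, size=t.size)
y = np.sin(t * np.pi) + np.random.normal(scale=0.02, size=t.size)
# fit a smoothing parametric B-spline (x(u), y(u)); s controls how far the
# least-squares fit may deviate from the data
tck, u = splprep([x, y], s=0.5)
# evaluate the fitted curve at arbitrary parameter values
x_fit, y_fit = splev(np.linspace(0, 1, 500), tck)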

Resources