Linear interpolation in PromQL or MetricsQL

I am evaluating VictoriaMetrics for an IoT application where we sometimes have gaps in a series due to hardware or communication issues. In some time-series reporting situations it is helpful for us to interpolate values for the missing intervals. I see that MetricsQL (which extends PromQL) has a keep_last_value() function that fills gaps by holding the last observed value until a new one appears, which will be helpful to us. But in some situations a linear interpolation between the values before and after the gap is a more realistic estimate for the missing portion. Is there a function in PromQL or MetricsQL that will do linear interpolation of missing data in a series, or is it possible to construct a more complex query that achieves this?
Clarifying the desired interpolation
What I would like is a simple interpolation between the points immediately before and after the gap; this is, I believe, what TimescaleDB's interpolate() function does. In other words, if my time series is:
(1:00, 2)
(2:00, 4)
(3:00, NaN)
(4:00, 5)
(5:00, 1)
I would like the interpolated 3:00 value to be 4.5, halfway between the points immediately before and after. I don't want it to be 6 (which is what I would get by extrapolating from the points before the missing one, ignoring the points after), and I don't want whatever value I would get by doing linear regression on the whole series and interpolating at 3:00 (presumably 3, or something close to it).
Of course, this is a simple illustration and it's also possible that the gap could last more than one time step. But in that case I would still like the interpolation to be based solely on the points immediately before and immediately after the gap, ignoring the rest of the series.
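To make the desired behaviour concrete, here is a small numpy sketch (illustrative only, not PromQL) that produces exactly this neighbour-based fill:

import numpy as np

# Sample series from above: timestamps in hours, NaN marks the gap.
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
v = np.array([2.0, 4.0, np.nan, 5.0, 1.0])

known = ~np.isnan(v)
# np.interp draws a straight line between the known points on either side
# of each gap, ignoring the rest of the series.
filled = np.interp(t, t[known], v[known])
print(filled)  # [2.  4.  4.5 5.  1. ]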

Final answer
Use the interpolate() function, available in MetricsQL starting from VictoriaMetrics v1.38.0.
Original suggestion
This does not achieve the exact interpolation requested in the revised question, but it may be useful for others with slightly different requirements.
Try combining the predict_linear() function with the default operator from MetricsQL in the following way:
metric default predict_linear(metric[1h], 0)
This fills each gap with a value extrapolated from the hour of data preceding it; modify the window in square brackets to control how much history the extrapolation uses.

Related

Normalizing audio waveforms code implementation (Peak, RMS)

I have some audio data (array of floats) which I use to plot a simple waveform.
When plotted, the waveform doesn't max out at the edges.
No problem - the data just needs to be normalized. I iterate once to find the max, and then iterate again dividing each by the max. Plot again and everything looks great!
But wait: videos which have a loud intro or a loud explosion cause the rest of the waveform to stay tiny.
After some research, I came across RMS, which is supposed to address this. I iterate through the samples and calculate the RMS, and again divide each sample by the RMS value. This results in considerable "clipping".
What is the best method to solve this?
Intuitively, it seems I might need to calculate a local max or average based on a moving window (rather than the entire set) but I'm not entirely sure. Help?
Note: The waveform is purely for visual purposes (the audio will not be played back to the user).
You could transform it (effectively making the y-axis non-linear; you can think of it as a form of companding).
Assuming the signal is within the range [-1, 1].
One popular quick and simple solution is to apply the hyperbolic tangent function (tanh). This will limit values to [-1, 1] by penalizing higher values more. If you amplify the signal before applying tanh, the effect will be more pronounced.
Another alternative is a logarithmic transform. As the signal changes sign, some pre-processing has to be performed.
If r is a series of sample values, one approach could be something like this:
r.log1p <- log2(1.1 * (abs(r) + 1)) * sign(r)
That is, for every value take its absolute value, add one, multiply by a small constant, take the log, and finally multiply by the sign of the corresponding original value.
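Here is a rough Python sketch of both options; the gain and the 1.1 constant are arbitrary illustrative choices:

import numpy as np

def compand_tanh(r, gain=3.0):
    # Amplify, then squash back into [-1, 1]; larger gain flattens peaks more.
    return np.tanh(gain * r)

def compand_log(r):
    # Same transform as the one-liner above: log of the shifted magnitude,
    # with the original sign restored afterwards.
    return np.log2(1.1 * (np.abs(r) + 1.0)) * np.sign(r)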

Why is string interpolation named the way it is?

The term interpolation is usually used in mathematical functions when determining a function for given values, which makes perfect sense. I don't see how that applies to strings: what is being interpolated? Am I missing something obvious?
Interpolation in mathematics is simply working out the things between two points (a). For example, cubic spline fitting over a series of points will give you a curve of some description (I consider a straight line to be a degenerate curve here, so don't bother pointing out that some formulae generate such a beast) between each pair of points, even though you have no actual data there.
Contrast this with extrapolation, which gives you data beyond the endpoints. An example of that is seeing that, based on history, stock market indices rise at x percent per annum, so in a hundred years they will be much higher than they are now.
So it's a short step to the most likely explanation as to why variable substitution within strings is called interpolation, since you're changing things within the bounds of the data:
xyzzy="42"
plugh="abc${xyzzy}xyz"
// now plugh is equal to "abc42xyz"
(a) The actual roots of the word are the Latin inter + polare, which translate to "within" and "polish" (in the sense of modify or improve).

D3 - Difference between basis and linear interpolation in SVG line

I implemented a multi-series line chart like the one given here by M. Bostock and ran into a curious issue which I cannot explain. When I choose linear interpolation and set my scales and axes, everything is correct and values are well-aligned.
But when I change my interpolation to basis, without any modification of my axes and scales, the values no longer line up with the axes.
What is happening here? With the monotone setting I can achieve pretty much the same effect as the basis interpolation but without the syncing problem between lines and axes. Still, I would like to understand what is happening.
The basis interpolation implements a B-spline, which people like to use as an interpolation function precisely because it smooths out extreme peaks. This is useful when you are modeling something you expect to vary smoothly but only have sharp, infrequently sampled data. A consequence of this is that the resulting line will not connect all data points, changing the appearance of extreme values.
In your case, the sharp peaks are the interesting features, the exception to the typically 0 baseline value. When you use a spline interpolation, you are smoothing over these peaks.
Here is a fun demo to play with the different types of line interpolation:
http://bl.ocks.org/mbostock/4342190
You can drag the data points around so they resemble a sharp peak like yours, and even click to add new points. Then switch to a basis interpolation and watch the peak get averaged out.
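To see the numbers behind this, here is a small standalone Python sketch (not d3 code) of the uniform cubic B-spline basis that the basis mode evaluates; the data points act as control points, so a sharp peak gets pulled down:

def bspline_segment(p0, p1, p2, p3, t):
    # Uniform cubic B-spline basis functions for one curve segment,
    # evaluated at parameter t in [0, 1].
    return ((1 - t) ** 3 * p0
            + (3 * t ** 3 - 6 * t ** 2 + 4) * p1
            + (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) * p2
            + t ** 3 * p3) / 6.0

# A peak of height 10 surrounded by zeros: the curve tops out near 6.67,
# well below the actual data value.
print(bspline_segment(0, 0, 10, 0, 1.0))  # 6.666...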

Working Out Big O Of Functions Using Java + Excel

I have been trying to get the big O of four different functions using Java and Excel. I have no idea what these functions are, as they have been hidden. I am not sure if this is the right place/forum to ask.
I got the functions to output data using some Java and put it into Excel along with the steps (1 to n). I then put them straight into graphs, plotting n against the arbitrary measure of time each took. If the output was always the same (for example, if n = 1 always yielded 200 on every run), I used that value directly; for the ones that varied each time the function was run, I ran the function 10 times and averaged each step.
After I had the data I created a graph for each one and put a trendline on it. My f(1), for example, was best fitted by a polynomial trendline of order 2, which I assume means quadratic (O(n^2))? But I needed to prove it was n^2, so I did =Steps/LOG(N), which made it fit best to a polynomial trendline of order 3, which I assume is cubic (O(n^3))? (Is that right?)
I really have no idea what to do next to 'prove' that this function is Quadratic or Cubic or how to prove its best case / worst case.
So basically I am trying to work out what the missing step is.
Computation
Graph
Trendline
??? - Proof that the function has big O(?)
When you say "if n = 1 always equal 200", does that mean that if n = 1 it takes 200 steps to run? If that's the case, this function would be 200n and thus O(n).
I think to solve this you should call each function on different values (I'd start with 10, 20, 30, etc.) up to some high number. Capture these values and plot them in Excel. Then use the built-in trendline function. This should give you a rough estimate of what the run time is. From there you should be able to get the Big-O.
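As a sketch of that workflow outside Excel, assuming a hypothetical mystery(n) stand-in for one of your hidden functions, the slope of a straight-line fit on the log-log data estimates the exponent directly:

import time
import numpy as np

def mystery(n):
    # Hypothetical stand-in for a hidden function; this one is O(n^2).
    return sum(i * j for i in range(n) for j in range(n))

sizes = np.array([100, 200, 400, 800, 1600])
times = []
for n in sizes:
    start = time.perf_counter()
    mystery(n)
    times.append(time.perf_counter() - start)

# If cost ~ c * n^k, then log(t) = log(c) + k * log(n), so the fitted
# slope approximates the exponent k.
slope, intercept = np.polyfit(np.log(sizes), np.log(times), 1)
print(f"estimated exponent: {slope:.2f}")  # close to 2 here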

Reverse engineer a new set of points from an original set by altering moments, skew, and/or Kurtosis?

I don't know if this is even possible, but I'd like to be able to take a set of points, run something on them that calculates the moments, skew and kurtosis values, and have another function that takes those elements and reverse-engineers a new set of points using modified values for the moments, skew and/or kurtosis. I already have the analytical function in Delphi Pro 6, which is:
procedure MomentSkewKurtosis(const Data: array of Double; var M1, M2, M3, M4, Skew, Kurtosis: Extended);
I'm looking for a partner function that could return a new Data array after I make alterations to any of the output parameters "var" in MomentSkewKurtosis() and pass them back in to the partner function as input parameters. For example, suppose I wanted to increase the Skew of the data and get a new set of points back that would be the original set of points altered just enough to generate the new Skew value.
The problem is not easy, and probably better targeted at stats, but I'll give you a pointer to a paper that I think is very good and straight to the mark: Towards the Optimal Reconstruction of a Distribution from its Moments
Hope this helps!
Obviously you can't reconstruct an arbitrary density distribution from a finite number of variables. You can create a distribution which fits the parameters, but it's not necessarily the original distribution.
And as far as I remember, mean, variance, skew and kurtosis are just functions of the first 4 moments, so you can't choose them independently from the moments.
On the other hand, there exists a function which you can apply to each data member to produce a new dataset with the desired properties. I suspect that, since you fixed the first 4 moments, it's a polynomial of degree 3.
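For reference, a minimal Python analogue of the analysis half (the equivalent of MomentSkewKurtosis); the reconstruction half is the hard part, per the paper linked above:

import numpy as np
from scipy.stats import skew, kurtosis

data = np.random.default_rng(0).normal(size=1000)  # hypothetical sample

m1 = data.mean()
m2 = data.var()                 # second central moment
m3 = ((data - m1) ** 3).mean()  # third central moment
m4 = ((data - m1) ** 4).mean()  # fourth central moment
print(m1, m2, m3, m4, skew(data), kurtosis(data))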
