Azure Stream Analytics Window based on variable time

I know I can create Stream Analytics windows as follows:
TumblingWindow(second, 30)
This would make fixed windows every 30 seconds.
Is it possible to make the 30 seconds dynamic? This would mean we get multiple windows running concurrently, each on a different schedule.
I'm experimenting with reference input files, and I would like to get the number of seconds from the reference file rather than fix it in the query.
If I create the Window with input from a reference file, I get the error:
Error : Invalid window duration: 'timespanInSeconds'. Window duration must be a positive float constant.
This happens even though it seems to be a valid JSON number. Is what I'm trying to do even possible?
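For context, the failing query looked roughly like the sketch below; the input, reference, and column names are illustrative, but the point is that the duration argument comes from the reference data instead of being a constant:
SELECT COUNT(*) AS EventCount
INTO Output
FROM Input i TIMESTAMP BY i.EventTime
JOIN Config r ON i.DeviceId = r.DeviceId
GROUP BY TumblingWindow(second, r.timespanInSeconds) -- rejected: not a constant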

Something in the docs that I've found:
https://msdn.microsoft.com/en-us/azure/stream-analytics/reference/tumbling-window-azure-stream-analytics
It states:
A big integer which describes the size of the window. The windowsize is static and cannot be changed dynamically at runtime.

Related

Set Azure Batch MaxWallClockTime Node SDK

I'm trying to set a maxWallClockTime of 72 hours using the ISO 8601 duration format. The documentation for this property is useless, so my guess at using the 8601 format is based on that being the way to set the same property at the Batch job level when using the CLI. My constraints object is as follows:
const taskConstraints = {
    maxWallClockTime: 'P3D' // ISO 8601 duration format, e.g. P3Y6M4DT12H30M5S = three years, six months, four days, twelve hours, thirty minutes, and five seconds
};
However, this results in the following error:
task.constraints.maxWallClockTime must be a TimeSpan/Duration.
I cannot find any examples that set this property using the JavaScript API; any pointers to better documentation or example code would be greatly appreciated.
Agreed, the docs are lacking here. I haven't tested this out locally yet, but from looking at the code I believe the answer depends on whether you are using the older Node.js-specific azure-batch package or the newer @azure/batch, which also runs in web browsers.
For the "azure-batch" package, it looks like it takes a Moment.js duration object. Here's the related JSDoc string:
* @property {moment.duration} [maxWallClockTime] The maximum elapsed time
* that the Task may run, measured from the time the Task starts. If the Task
* does not complete within the time limit, the Batch service terminates it.
* If this is not specified, there is no time limit on how long the Task may
* run.
For the newer "@azure/batch" package, it should take an ISO 8601 duration string. If you're using that package then the value you're trying to use looks right to me, and it may be a bug (I'd have to try to repro it).
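If that's right, a minimal sketch for the older "azure-batch" package would look like this (untested, and based only on the JSDoc above):
const moment = require('moment');

const taskConstraints = {
    // Pass a Moment.js duration rather than an ISO 8601 string
    // (assumption based on the JSDoc for the azure-batch package).
    maxWallClockTime: moment.duration(72, 'hours')
};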

How to aggregate data by period in a rrdtool graph

I have an RRD file with average ping times to a server (GAUGE) every minute, and when the server is offline (which is very frequent, for reasons that don't matter now) it stores a NaN/unknown.
I'd like to create a graph with the percentage the server is offline each hour which I think can be achieved by counting every NaN within 60 samples and then dividing by 60.
For now I get to the point where I define a variable that is 1 when the server is offline and 0 otherwise, but I already read the docs and don't know how to aggregate this:
DEF:avg=server.rrd:rtt:AVERAGE
CDEF:offline=avg,UN,1,0,IF
Is it possible to do this when creating a graph? Or will I have to store that info in another RRD?
I don't think you can do exactly what you want, but you have a couple of options.
You can define a sliding-window average that shows the percentage of the previous hour that was unknown, and graph that, using TREND.
DEF:avg=server.rrd:rtt:AVERAGE:step=60
CDEF:offline=avg,UN,100,0,IF
CDEF:pcoffline=offline,3600,TREND
LINE:pcoffline#ff0000:Offline
This defines avg as the 1-minute time series of ping data. Note we use step=60 to ensure we get the best resolution of data even in a smaller graph. Then we define offline as 100 when the server is offline (the ping value is unknown) and 0 when it is up. Finally, pcoffline is a 1-hour sliding-window average of this, which is in effect the percentage of time during the previous hour that the server was offline.
However, there's a problem: RRDTool will silently consolidate the source data before you get your hands on it if there are many data points per pixel in the graph (this won't happen when doing a fetch, of course). To get around that, you'd need to have the offline CDEF done at store time, i.e., have a COMPUTE-type DS that is 100 or 0 depending on whether the rtt DS is known. Then, any averaging will preserve the data (normal averaging omits the unknowns, or the xff setting makes the whole CDP unknown).
rrdtool create ...
    DS:rtt:GAUGE:120:0:9999
    DS:offline:COMPUTE:rtt,UN,100,0,IF
rrdtool graph ...
    DEF:offline=server.rrd:offline:AVERAGE:step=3600
    LINE:offline#ff0000:Offline
If you are able to modify your RRD, and do not need historical data, then use of a COMPUTE in this way will allow you to display your data in a 1-hour stepped graph as you wanted.
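For reference, a fuller sketch of those two commands; the step, heartbeat, and RRA values here are illustrative and would need adjusting to your retention requirements:
rrdtool create server.rrd --step 60 \
    DS:rtt:GAUGE:120:0:9999 \
    DS:offline:COMPUTE:rtt,UN,100,0,IF \
    RRA:AVERAGE:0.5:1:1440 \
    RRA:AVERAGE:0.5:60:720

rrdtool graph offline.png --start -86400 \
    DEF:offline=server.rrd:offline:AVERAGE:step=3600 \
    LINE:offline#ff0000:Offline
The 120-second heartbeat means a missed update also shows up as unknown, and hence as offline, which matches the intent here.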

Tracking a counter value in application insights

I'm trying to use application insights to keep track of a counter of number of active streams in my application. I have 2 goals to achieve:
Show the current (or at least recent) number of active streams in a dashboard
Activate a kind of warning if the number exceeds a certain limit.
These streams can be quite long lived, and sometimes brief. So the number can sometimes change say 100 times a second, and sometimes remain unchanged for many hours.
I have been trying to track this active streams count as an application insights metric.
I'm incrementing a counter in my application when a new stream opens, and decrementing when one closes. On each change I use the telemetry client, something like this:
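// Record the latest counter value; GetMetric pre-aggregates values locally before sending.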
var myMetric = myTelemetryClient.GetMetric("Metricname");
myMetric.TrackValue(myCount);
When I query my metric values with Kusto, I see that because of these clusters of activity within a 10-second period, my metric values get aggregated. For the purposes of my alarm I can live with that, as I can look at the max value of the aggregate. But I can't present a dashboard of the number of active streams, as I have no way of knowing the number of active streams between my measurement points. I know the min, max, and average of the aggregate period, but not its last value, and since that can be anywhere between 0 and 1000, it's no help.
Since the solution I have doesn't serve my needs, I thought of a couple of changes:
Adding a scheduled pump to my counter component, which would send the current counter value once every, say, 5 minutes. But I don't like that I then have to add a thread for each of these counters.
Adding a timer that sends the current value once, 5 minutes after the last change, with the countdown reset each time the counter changes. This has the same problem as above, and does an excessive amount of work resetting the timer when the counter could be changing thousands of times a second.
In the end, I don't think my needs are all that exotic, so I wonder if I'm using app insights incorrectly.
Is there some way I can change the metric's behavior to suit my purposes? I appreciate that it's pre-aggregating before sending data in order to reduce ingest costs, but it's preventing me from solving a simple problem.
Is a metric even the right way to do this? Are there alternative approaches within app insights?
You can use TrackMetric instead of the GetMetric ceremony to track individual values without aggregation. From the docs:
Microsoft.ApplicationInsights.TelemetryClient.TrackMetric is not the preferred method for sending metrics. Metrics should always be pre-aggregated across a time period before being sent. Use one of the GetMetric(..) overloads to get a metric object for accessing SDK pre-aggregation capabilities. If you are implementing your own pre-aggregation logic, you can use the TrackMetric() method to send the resulting aggregates.
But you can also use events as described next:
If your application requires sending a separate telemetry item at every occasion without aggregation across time, you likely have a use case for event telemetry; see TelemetryClient.TrackEvent (Microsoft.ApplicationInsights.DataContracts.EventTelemetry).
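As a rough sketch of either route (the metric and event names here are illustrative):
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;

var telemetry = new TelemetryClient();

// Option 1: send one metric item per change, bypassing SDK pre-aggregation.
telemetry.TrackMetric(new MetricTelemetry("ActiveStreams", myCount));

// Option 2: send an event carrying the count as a measurement.
var change = new EventTelemetry("ActiveStreamsChanged");
change.Metrics["count"] = myCount;
telemetry.TrackEvent(change);
Either way you get the raw value at each change, at the cost of higher ingest volume than the pre-aggregated GetMetric path.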

In the Azure Anomaly Detector API, why does changing the sensitivity parameter not change the detected anomalies in the response?

With reference to the notebook available on the Azure site, I have created an experiment where I am pushing some 5000 records of the parameter. I tried changing the sensitivity from 95 to 25 but I cannot see any changes in the output Bokeh plot.
sensitivity = 95
sensitivity = 25
I even checked the JSON that is loaded before calling the Anomaly Detector API, and the sensitivity value is being updated in the required format.
Can you suggest me what could be the reason? Where should I look into to resolve the issue?
Thanks!
What are the other parameters you set? From the figure you attached, it seems that your data is periodic. Can you try setting the period value, and setting maxAnomalyRatio to a small value?
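For example, a sketch of the request body with those fields set; the period and maxAnomalyRatio values here are illustrative and depend on your data:
{
    "granularity": "minutely",
    "period": 1440,
    "maxAnomalyRatio": 0.01,
    "sensitivity": 95,
    "series": [
        { "timestamp": "2021-01-01T00:00:00Z", "value": 32.1 },
        { "timestamp": "2021-01-01T00:01:00Z", "value": 31.8 }
    ]
}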

Tumbling window with dynamic duration

While passing a reference data field as the duration in TumblingWindow, I am getting a compile-time error saying the window duration must be a positive float constant.
Can anyone please guide?
group by TumblingWindow(minute, referencetable.EntryTime)
At the moment we don't support variable time windows, so you need to set the time explicitly and not load it from the reference data. Sorry for the inconvenience.
A workaround, in case you have only a few different durations, would be to have different steps/subqueries for the different durations and use a WHERE clause to decide whether each step produces output; see the sketch below.
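For illustration, a sketch of that workaround with two supported durations; the input, reference, and column names are hypothetical, and the reference data is assumed to carry a WindowSeconds field:
WITH ThirtySecondWindows AS (
    SELECT i.DeviceId, COUNT(*) AS EventCount
    FROM Input i TIMESTAMP BY i.EventTime
    JOIN ReferenceConfig r ON i.DeviceId = r.DeviceId
    WHERE r.WindowSeconds = 30
    GROUP BY i.DeviceId, TumblingWindow(second, 30)
),
SixtySecondWindows AS (
    SELECT i.DeviceId, COUNT(*) AS EventCount
    FROM Input i TIMESTAMP BY i.EventTime
    JOIN ReferenceConfig r ON i.DeviceId = r.DeviceId
    WHERE r.WindowSeconds = 60
    GROUP BY i.DeviceId, TumblingWindow(second, 60)
)
SELECT DeviceId, EventCount INTO Output FROM ThirtySecondWindows
SELECT DeviceId, EventCount INTO Output FROM SixtySecondWindows
Each step aggregates over a fixed, constant window, and the WHERE clause against the reference data decides which step actually emits results.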
Let me know if you have further questions.
JS (from the Azure Stream Analytics team)
