How to sum all values between a certain start time and the current time in Azure Stream Analytics? - azure

I'm continuously streaming the difference between two sensor values. I'd like to display the running sum of those differences across a shift live, and also log the final total at the end of the shift.
I've looked into tumbling windows but it seems I can't specify an actual time for those. I can specify the amount of time but not an actual time to start the window. If I could specify a time I could at least capture the historical values across shifts.
For the live data, I'd need to sum the data from start of tumbling window to the current data point, which to my knowledge is also not an option. Only the final aggregate is output after the window has ended.
I also took a look at using LAG() for this but that also does not allow specifying a timestamp and it would only allow comparing two values, not aggregate values.
Based on the documentation, these are the two functions that come closest to the functionality I need, but neither accomplishes the job.
Is there any way to aggregate all values from a certain start time to a current time (keep in mind the start time changes at the beginning of each shift)?
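To make the desired behaviour concrete, here is a minimal sketch in plain Python (not Azure Stream Analytics query language) of a running sum that resets whenever a new shift begins. The shift schedule in `SHIFT_STARTS` and the function names are assumptions for illustration only; the question does not state the real shift times.

```python
from datetime import datetime, time, timedelta

# Hypothetical shift start times; the real schedule is not given in the question.
SHIFT_STARTS = [time(6, 0), time(14, 0), time(22, 0)]


def shift_start_for(ts):
    """Return the datetime at which the shift containing ts began."""
    candidates = [datetime.combine(ts.date(), s) for s in SHIFT_STARTS]
    # Timestamps before the first start of the day belong to yesterday's last shift.
    candidates.append(
        datetime.combine(ts.date() - timedelta(days=1), SHIFT_STARTS[-1])
    )
    return max(c for c in candidates if c <= ts)


def running_shift_sums(events):
    """events: iterable of (timestamp, value) pairs in time order.

    Returns (live, totals):
      live   - list of (timestamp, sum since the start of that shift)
      totals - dict mapping each shift start to the sum seen for that shift
    """
    live = []
    totals = {}
    for ts, value in events:
        start = shift_start_for(ts)
        totals[start] = totals.get(start, 0.0) + value
        live.append((ts, totals[start]))
    return live, totals
```

The `live` list corresponds to the value shown on a live display, and each entry in `totals` is the number to log once its shift has ended.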

Related

Use Excel to calculate average and stdev of time differences in a time series?

EDIT1: download file with 2 days of real data
My home automation controller collects data from several 4-in-1 motion sensors in different rooms of my house. The sensor prioritizes motion, sending motion reports every few seconds, but also independently reports temperature, humidity, and illuminance. I am trying to determine if the temp and humidity reports are sent frequently enough to automate control of heaters and exhaust fans.
Sensors independently report each category to the controller, which sends data to excel. Sample data below, but without motion reports that clutter up the real data.
A pivot table generated from the raw data:
Answering the question of frequency takes me several manual steps: sorting/filtering the dataset for temp/humidity by room, then manually adding a time diff column where time diff = (<current Date-Time cell> - <prev Date-Time cell>)*24*60. I then calculate the average and stdev of minutes between reports by manually selecting, in turn, each room/category subset in the time diff column; once for the average and once for the stdev.
After a few more manual steps, I end up with this desired result:
BUT I have to do it all over every time new data is added to the table. I'm certain excel can do this automatically, but I didn't find a solution through pivots, power pivots, slicing, or queries. I'm hoping one of you excel gurus can help. Thanks!
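For reference, the manual steps described above (group by room and category, sort by time, take consecutive differences in minutes, then average and stdev) can be sketched outside Excel as follows. This is only an illustration of the calculation, not an Excel solution; the row layout `(timestamp, room, category)` and the function name are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def report_interval_stats(rows):
    """rows: iterable of (timestamp, room, category) tuples.

    Returns {(room, category): (mean_minutes, stdev_minutes)} for the
    gaps between consecutive reports in each room/category group.
    """
    times = defaultdict(list)
    for ts, room, category in rows:
        times[(room, category)].append(ts)

    stats = {}
    for key, stamps in times.items():
        stamps.sort()
        # Minutes between each report and the previous one in the same group.
        gaps = [
            (later - earlier).total_seconds() / 60
            for earlier, later in zip(stamps, stamps[1:])
        ]
        if len(gaps) >= 2:  # a sample standard deviation needs two gaps
            stats[key] = (mean(gaps), stdev(gaps))
    return stats
```

`statistics.stdev` is the sample standard deviation, which matches Excel's `STDEV`. Groups with fewer than three reports are skipped because they do not yield two gaps.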

Locate When a Range of Values Begin to Increase

I'm performing a temperature profile curve and need to automate the process as much as possible. I need to determine when the part actually begins to heat up over a time period. I currently do this manually by scrolling through the thousands of data points until I see an increase. The part will be at room temp for a time and then slowly start to increase.
The increase should be slight at ~1°C meaning if B327 reads 27.5°C and B328 reads 28.6°C this would be my start point.
Once the temperature increases, the time would be displayed and further calculations will be performed.
I'm using Excel for the data analysis and would prefer to use formulas vs VBA so the document is easier to maintain for future use.
You can use an array formula like:
=MATCH(TRUE,(A2:A11-A1:A10)>0.1,0) '<< Use Ctrl+Shift+Enter
this calculates an array of TRUE/FALSE where the difference between consecutive rows is >0.1 (adjust to suit) and uses MATCH to find the position of the first TRUE value
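The same logic as the array formula can be sketched in Python for anyone checking the result outside Excel. The function name and default threshold are illustrative assumptions; the returned position is 1-based to mirror what MATCH returns.

```python
def first_rise_index(readings, threshold=0.1):
    """Return the 1-based position of the first consecutive difference
    that exceeds threshold, or None if no such rise occurs.

    Position i means readings[i] - readings[i - 1] (0-based indexing)
    was the first difference larger than threshold.
    """
    pairs = zip(readings, readings[1:])
    for position, (previous, current) in enumerate(pairs, start=1):
        if current - previous > threshold:
            return position
    return None
```

With a threshold of 1.0, a series that moves from 27.5 to 28.6 is reported at the position of that step, which corresponds to the B327/B328 example in the question.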

Using QDigest over a date range

I need to keep a 28 day history for some dashboard data. Essentially I have an event/action that is recorded through our BI system. I want to count the number of events and the distinct users who do that event for the past 1 day, 7 days and 28 days. I also use grouping sets (cube) to get the fully segmented data by country/browser/platform etc.
The old way was to do this keeping a 28 day history per user, for all segments. So if a user accessed the site from mobile and desktop every day for all 28 days they would have 56 rows in the DB. This ends up being a large table and is time consuming even to calculate approx_distinct and not distinct. But the issue is that I also wish to calculate approx_percentiles.
So I started investigating the use of HyperLogLog https://prestodb.io/docs/current/functions/hyperloglog.html
This works great; it's much more efficient to store the sketches daily rather than the entire list of unique users per day. As I am using approx_distinct the values are close enough and it works.
I then noticed a similar function for medians. Qdigest.
https://prestodb.io/docs/current/functions/qdigest.html
Unfortunately the documentation is not nearly as good on this page as it is on previous pages, so it took me a while to figure it out. This works great for calculating daily medians. But it does not work if I want to calculate the median actions per user over the longer time period. The examples in HyperLogLog demonstrate how to calculate approx_distinct users over a time period but the Qdigest docs do not give such an example.
When I try something similar to the HLL date-range example with Qdigest, I get results similar to the 1-day results.
Because you're in need of medians that are aggregated (summed) across multiple days on a per user basis, you'll need to perform that aggregation prior to insertion into the qdigest in order for this to work for 7- and 28-day per-user counts. In other words, the units of the data need to be consistent, and if daily values are being inserted into qdigest, you can't use that qdigest for 7- or 28-day per-user counts of the events.
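A small illustration of why the units matter, using exact medians from Python's standard library rather than Presto's qdigest functions. The row layout `(day, user_id, actions)` and the function names are assumptions made for this sketch.

```python
from collections import Counter
from statistics import median


def median_actions_per_user(daily_counts):
    """daily_counts: iterable of (day, user_id, actions) rows for the window.

    Sums actions per user across the whole window first, then takes the
    median of those per-user totals.
    """
    totals = Counter()
    for _day, user, actions in daily_counts:
        totals[user] += actions
    return median(totals.values())


def median_of_daily_values(daily_counts):
    """Median over the raw daily rows, i.e. what pooling daily values gives."""
    return median(actions for _day, _user, actions in daily_counts)
```

The second function mimics what happens when daily per-user values are pooled across days: the result stays on the scale of a single day, which is consistent with the "similar to 1 day results" symptom described in the question.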

Series Design in Influxdb for showing number of repeat customers

Consider an analytics use case where you need to find repeat customers for a date range. For a given date range, repeat customers are defined as customers who used the service in the 3*(given date range interval) before the start of the range and also used the service within the range.
For example, the repeat customers for this week are all customers who used the service in the 3 weeks before the start of this week and also used it this week.
I am using influxdb. I haven't decided the series yet, I am looking for inputs into how I can define a series such that I can do available operations in influxdb to obtain this analytics.
Data available to me is the timestamp at which user used the facility, user_id , service_category, service_instance_id, and a json dump of further details about service.
Maybe my thought process is limited; I need some help on how to approach this, and any input is welcome.
So I thought about this and came to a decent solution. I have to save the last time a user visited along with the entry. So at least one reference will be there for any time period if user is repeating for that time period.
This is similar to a linkedlist except that we have direct access to time based filtering of nodes.
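One possible reading of this approach, sketched in plain Python rather than as an InfluxDB query: each entry carries the timestamp of the user's previous visit, so a single time-filtered pass over the entries in the range is enough. The tuple layout `(user_id, timestamp, previous_visit)` and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta


def repeat_customers(events, start, end):
    """events: iterable of (user_id, timestamp, previous_visit) tuples,
    where previous_visit is None for a user's first recorded visit.

    A user counts as a repeat customer for [start, end) if one of their
    visits in the range points back to a previous visit that falls in
    the lookback period of 3 * (end - start) before start.
    """
    lookback_start = start - 3 * (end - start)
    return {
        user
        for user, ts, previous in events
        if start <= ts < end
        and previous is not None
        and lookback_start <= previous < start
    }
```

Only a user's first visit inside the range can point back before `start`, so that one entry is the "reference" that identifies them as repeating.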

Grouping data in chap-links-library

I am using chap-links-library in the project I am working on. We are trying to show the different kinds of activities that a user performed. We want to display the activities and the number of times each has been performed by the user.
The initial zoom will be for 12 months and the user can zoom in to day level. I want to group all activities of the same type performed in a month and show the cumulative count on initial display. When the user zooms in, they should be able to see the breakdown of activities and their counts based on the visible range.
Any pointers on how to handle this in chap-links-library will be useful.
You can listen for the rangechange or rangechanged events (see docs), and when the visible range exceeds a certain threshold, replace all data in the Timeline with more coarse grained data when above it, or with the detailed data when below the threshold.
You may want to have a look at the successor of chap-links-library, vis.js. You can find an overview of the main changes here: http://almende.github.io/chap-links-library/#successor.
