Azure Time Series Insights: query multiple time series in one query

Azure Time Series Insights (TSI) will no longer be supported in 2025, but for now it is a good, cheap solution for something I am working on. There is one major problem, though: I need to query over 100 time series in one query and I can't find a way to do this. For now, I have 100 API calls running in a loop, which is terrible. On top of that, I need to do this every 60 seconds. Is there a way to grab all of these time series in a single query? If not, is there another solution besides TSI that isn't expensive? ADX is an alternative, but it is far more expensive than TSI.
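As far as I can tell, the Query API only accepts a single Time Series ID per call, so the usual workaround is to keep the one-call-per-series pattern but fire the calls concurrently instead of sequentially in a loop. Below is a minimal Python sketch of that using aiohttp; the environment FQDN, API version, "Value" variable name and payload shape are assumptions based on my reading of the TSI Query API docs, so verify them against your environment.

# Sketch only: endpoint, api-version, auth handling and payload shape are assumptions.
import asyncio
import aiohttp

ENV_FQDN = "<your-env>.env.timeseries.azure.com"   # hypothetical environment FQDN
QUERY_URL = f"https://{ENV_FQDN}/timeseries/query?api-version=2020-07-31"

async def fetch_series(session, ts_id, search_span):
    # One getSeries request per Time Series ID; TSI IDs are arrays (composite keys),
    # so a single-key ID is wrapped in a list here.
    payload = {
        "getSeries": {
            "timeSeriesId": [ts_id],
            "searchSpan": search_span,
            "projectedVariables": ["Value"],   # assumed variable name
        }
    }
    async with session.post(QUERY_URL, json=payload) as resp:
        resp.raise_for_status()
        return ts_id, await resp.json()

async def fetch_all(ts_ids, token, search_span):
    headers = {"Authorization": f"Bearer {token}"}
    async with aiohttp.ClientSession(headers=headers) as session:
        tasks = [fetch_series(session, ts_id, search_span) for ts_id in ts_ids]
        return await asyncio.gather(*tasks)

# Example: poll 100 series for the last minute, once every 60 seconds.
# results = asyncio.run(fetch_all(my_100_ids, my_token,
#     {"from": "2024-01-01T00:00:00Z", "to": "2024-01-01T00:01:00Z"}))

This doesn't reduce the number of calls, but it turns 100 sequential round-trips into roughly one round-trip's worth of wall-clock time per polling cycle.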

Related

Node JS architecture to handle huge amount of Data returned by DB in better possible way

We have a Node.js application and a SQL Server database, and there are a couple of badly written queries with a lot of inner joins.
Problem and Use Case
We have a use case of generating reports (15-20 thousand of them) in PDF/Excel format, and there is a query with a lot of joins which takes 8-9 seconds because of the sheer amount of data: the 2-3 tables used in the query have a few million rows each.
For report generation we don't need real-time data; day-old or week-old data is fine.
What I'm looking for: a few suggestions on how to handle this situation in a better way.
We have a few options on the table:
Dump the data from multiple queries into a separate table and use that (we are planning to do this periodically with the help of a scheduler or something along those lines).
Use a time series DB to store the result of the query with the help of a scheduler, and use it at report-generation time.
Limit report generation to, at most, the last year of data.
Implement sharding in SQL Server.
And yes, improving the query is also something we are working on, but I think there is scope to make this better, which is why I'm reaching out here for a few more suggestions.
Denormalization is a tried-and-true method of speeding up reporting. As Preben suggested, creating an indexed view in SQL Server is an efficient way to do this with minimal plumbing. Alternatively, it may be worth thinking about whether a data warehouse implementation is needed for future queries.
If this is a one-off issue, put together your indexed view (pay attention to its requirements) and move on. If this is the first of many reports that you need to optimize, think about building a more substantial solution.
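For illustration, here is a minimal sketch of what such an indexed view might look like, submitted from Python via pyodbc. The table and column names (dbo.Transactions, CustomerId, Amount) and the connection string are placeholders; the real view body should mirror the aggregations your report query performs.

# Sketch only: object names and the connection string are placeholders.
import pyodbc

VIEW_DDL = """
CREATE VIEW dbo.vCustomerTotals WITH SCHEMABINDING AS
SELECT t.CustomerId,
       COUNT_BIG(*)             AS RowCnt,        -- COUNT_BIG(*) is required with GROUP BY
       SUM(ISNULL(t.Amount, 0)) AS TotalAmount    -- SUM over a nullable column must be wrapped
FROM dbo.Transactions AS t                        -- SCHEMABINDING requires two-part names
GROUP BY t.CustomerId;
"""

INDEX_DDL = """
CREATE UNIQUE CLUSTERED INDEX IX_vCustomerTotals
ON dbo.vCustomerTotals (CustomerId);              -- this index is what materializes the view
"""

with pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};SERVER=...;DATABASE=...;") as cn:
    cur = cn.cursor()
    cur.execute(VIEW_DDL)
    cur.execute(INDEX_DDL)
    cn.commit()

Once the unique clustered index exists, SQL Server maintains the aggregates as the base table changes, so the report can read them without re-running the heavy joins.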

How can I aggregate data in Time Series Insights preview using the hierarchy?

I am storing 15 minute electricity consumption measurements in a TSI preview environment. Is it possible to aggregate the total energy consumption per day of multiple meters using the TSI query API?
I have configured a hierarchy as Area-Building and the Time Series ID is the 'MeterId' of the Meter.
The query API (https://learn.microsoft.com/en-us/rest/api/time-series-insights/preview-query#aggregate-series-api) enabled me to aggregate to consumption per day for a single meter. Then I expected to find an API to aggregate the electricity consumption to Building and Area, but could only find the aggregate operation with a single "timeSeriesId" or "timeSeriesName" as required parameter. Is aggregation to a level in the hierarchy not possible? If not, what would be a good alternative (within or outside TSI) to obtain these aggregated values?
What you can do is get all the instances you need with the Instances Search API (see the docs; note that the documented URL is wrong, it should contain "search" instead of "suggest").
Then loop through the instances returned in the response and call the aggregates by ID, one by one. Finally, sum the results yourself to get a daily total for all the telemetry sensors matching your search.
Note: you can only make 9 aggregate calls at the same time (see the limitations).
I hope they fix aggregates soon. In the meantime, I hope this helps you out.
Good luck!
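To make that workflow concrete, here is a rough Python sketch: search for the meter instances, call Aggregate Series for each one with at most 9 calls in flight, then sum the daily buckets client-side. The URLs, payload shapes, the EnergySum variable and the response parsing are assumptions based on my reading of the Query API docs, so treat it as a starting point rather than working code.

# Sketch only: URLs, payloads and response shapes are assumptions.
from concurrent.futures import ThreadPoolExecutor
import requests

ENV_FQDN = "<your-env>.env.timeseries.azure.com"   # hypothetical environment FQDN
API = "api-version=2020-07-31"
HEADERS = {"Authorization": "Bearer <token>"}

def find_meters(search_string):
    # Instances Search API ("search", not "suggest") to find the meters under an Area/Building.
    body = {"searchString": search_string, "instances": {"pageSize": 100}}
    r = requests.post(f"https://{ENV_FQDN}/timeseries/instances/search?{API}",
                      json=body, headers=HEADERS)
    r.raise_for_status()
    return [hit["timeSeriesId"] for hit in r.json()["instances"]["hits"]]

def daily_consumption(ts_id, search_span):
    # One Aggregate Series call per meter: daily buckets of summed consumption.
    body = {
        "aggregateSeries": {
            "timeSeriesId": ts_id,
            "searchSpan": search_span,
            "interval": "P1D",
            "inlineVariables": {
                "EnergySum": {"kind": "numeric",
                              "value": {"tsx": "$event.Consumption.Double"},
                              "aggregation": {"tsx": "sum($value)"}}},
            "projectedVariables": ["EnergySum"],
        }
    }
    r = requests.post(f"https://{ENV_FQDN}/timeseries/query?{API}",
                      json=body, headers=HEADERS)
    r.raise_for_status()
    return r.json()

def building_daily_total(building, search_span):
    ids = find_meters(building)
    # Respect the limit of 9 concurrent aggregate calls mentioned above.
    with ThreadPoolExecutor(max_workers=9) as pool:
        results = list(pool.map(lambda i: daily_consumption(i, search_span), ids))
    # Sum the per-meter daily values into a single series for the building.
    timestamps = results[0]["timestamps"]
    totals = [0.0] * len(timestamps)
    for res in results:
        values = res["properties"][0]["values"]   # assumed response shape
        totals = [t + (v or 0.0) for t, v in zip(totals, values)]
    return timestamps, totals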

Spark streaming - Does reduceByKeyAndWindow() use constant memory?

I'm playing with the idea of having long-running aggregations (possibly a one day window). I realize other solutions on this site say that you should use batch processing for this.
I'm specifically interested in understanding this function, though. It sounds like it would use constant space to do an aggregation over the window, one interval at a time. If that is true, a day-long aggregation would be viable (especially since it uses checkpointing in case of failure).
Does anyone know if this is the case?
This function is documented as follows (https://spark.apache.org/docs/2.1.0/streaming-programming-guide.html):
A more efficient version of the above reduceByKeyAndWindow() where the reduce value of each window is calculated incrementally using the reduce values of the previous window. This is done by reducing the new data that enters the sliding window, and “inverse reducing” the old data that leaves the window. An example would be that of “adding” and “subtracting” counts of keys as the window slides. However, it is applicable only to “invertible reduce functions”, that is, those reduce functions which have a corresponding “inverse reduce” function (taken as parameter invFunc). Like in reduceByKeyAndWindow, the number of reduce tasks is configurable through an optional argument. Note that checkpointing must be enabled for using this operation.
After researching this on the MapR forums, it seems that it would indeed use a constant amount of memory, making a daily window possible, assuming you can fit one day of data in your allocated resources.
The two downsides are:
A stand-alone daily aggregation might only take 20 minutes of cluster time. Keeping a one-day streaming window open means you are using those cluster resources permanently rather than for 20 minutes a day, so stand-alone batch aggregations are far more resource-efficient.
It's hard to deal with late data when you're streaming over exactly one day. If your data is tagged with dates, then you need to wait until all of it arrives. A one-day streaming window is only really suitable if you literally want an analysis of the last 24 hours of data, regardless of its content.
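For reference, here is a minimal PySpark sketch of the incremental variant the quoted documentation describes; the socket source, word-count pairs and durations are purely illustrative and not the asker's setup.

# Sketch only: source, key/value choice and durations are illustrative.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="DayLongWindowedCounts")
ssc = StreamingContext(sc, 60)                       # 60-second batch interval
ssc.checkpoint("hdfs:///tmp/streaming-checkpoint")   # checkpointing is required for the invFunc variant

words = ssc.socketTextStream("localhost", 9999).flatMap(lambda line: line.split())
pairs = words.map(lambda w: (w, 1))

# Incremental window: add counts entering the window and subtract counts leaving it,
# so only running totals per key are kept in state rather than a full day of raw events.
daily_counts = pairs.reduceByKeyAndWindow(
    func=lambda a, b: a + b,        # reduce
    invFunc=lambda a, b: a - b,     # inverse reduce ("subtracts" data sliding out of the window)
    windowDuration=24 * 60 * 60,    # one-day window
    slideDuration=60,               # recompute every batch
)

daily_counts.pprint()
ssc.start()
ssc.awaitTermination()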

Simple select count(id) uses 100% of Azure SQL DTUs

This started off as a different question, but it now seems more appropriate to ask it separately, since I realised it is a DTU-related question.
Basically, running:
select count(id) from mytable
is taking between 8 and 30 minutes to run (whereas the same query on a local copy of SQL Server takes about 4 seconds).
EDIT: Adding a where clause does not seem to help.
Below is a screen shot of the MONITOR tab in the Azure portal when I run this query. Note I did this after not touching the Database for about a week and Azure reporting I had only used 1% of my DTUs.
A couple of extra things:
In this particular test, the query took 8 minutes 27 seconds to run.
While it was running, the above chart actually showed the DTU line at 100% for a period.
The database is configured Standard Service Tier with S1 performance level.
The database is about 3.3 GB and this is its largest table (the count returns approximately 2,000,000 rows).
I appreciate it might just be my limited understanding, but if somebody could clarify whether this is really the expected behaviour (i.e. a simple count taking so long to run and maxing out my DTUs), it would be much appreciated.
From the query stats in your previous question we can see:
300ms CPU time
8000 physical reads
8:30 is about 500 seconds. We are certainly not CPU-bound: 300 ms of CPU over 500 seconds is almost no utilization. We get roughly 16 physical reads per second, which is far below what any physical disk can deliver. Also, the table is not fully cached, as evidenced by the presence of physical IO.
I'd say you are throttled. S1 corresponds to
934 transactions per minute
for some definition of transaction. That's about 15 transactions per second. Maybe you are hitting a limit of one physical IO per transaction?! 15 and 16 are suspiciously similar numbers.
Test this theory by upgrading the instance to a higher scale factor. You might find that SQL Azure Database cannot deliver the performance you want at an acceptable price.
You also should find that repeatedly scanning half of the table results in a fast query because the allotted buffer pool seems to fit most of the table (just not all of it).
I had the same issue. Updating the statistics with fullscan on the table solved it:
update statistics mytable with fullscan
A select count should perform a clustered index scan if one is available and it's up to date. Azure SQL should update statistics automatically, but it does not rebuild indexes automatically if they are completely out of date.
If there's a lot of INSERT/UPDATE/DELETE traffic on that table, I suggest manually rebuilding the indexes every once in a while.
http://blogs.msdn.com/b/dilkushp/archive/2013/07/28/fragmentation-in-sql-azure.aspx
See also this SO post for more info: SQL Azure and Indexes
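If you want to script that maintenance rather than run it by hand, a sketch along these lines (Python with pyodbc, run on a schedule) would cover both suggestions; the connection string and the dbo schema prefix are assumptions, and mytable is the table from the question.

# Sketch only: connection string and schema are placeholders.
import pyodbc

MAINTENANCE = [
    "UPDATE STATISTICS dbo.mytable WITH FULLSCAN;",   # refresh statistics
    "ALTER INDEX ALL ON dbo.mytable REBUILD;",        # rebuild fragmented indexes
]

with pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};SERVER=...;DATABASE=...;",
                    autocommit=True) as cn:
    cur = cn.cursor()
    for stmt in MAINTENANCE:
        cur.execute(stmt)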

Adding and retrieving sorted counts in Cassandra

I have a case where I need to record a user action in Cassandra, then later retrieve a sorted list of users with the highest number of that action in an arbitrary time period.
Can anyone suggest a way to store and retrieve this data in a pre-aggregated method?
Outside of Cassandra, I would recommend using a stream-summary or a count-min sketch; you would be able to solve this with much less space and get immediate results. Just update it on each action and periodically serialize and persist it (assuming you don't need guaranteed accuracy).
In Cassandra you can keep one row per period of time (by hour, say) with a counter per user in that row, incrementing the counters on use. Then use a batch job to run through them and find the heavy hitters. You would be constrained to a minimum queryable window of one hour, and it won't be particularly cheap or fast to compute, but it would work.
Generally it is good to treat these as a log of operations: every time there is an event, store it, and have batch jobs run analytics against it with Hadoop or custom code. If you need it in real time, I'd recommend the above approach of keeping stream summaries in memory.
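A rough sketch of the per-hour counter idea above, using the Python cassandra-driver; the keyspace, table and column names are illustrative.

# Sketch only: keyspace/table/column names are illustrative.
from datetime import datetime, timezone
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS metrics
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("metrics")
session.execute("""
    CREATE TABLE IF NOT EXISTS user_actions_by_hour (
        hour_bucket timestamp,
        user_id     text,
        actions     counter,                 -- counter columns can only be updated, not inserted
        PRIMARY KEY (hour_bucket, user_id)   -- one partition per hour, one row per user
    )
""")

def record_action(user_id):
    # Truncate the event time to the hour so each hour gets its own partition.
    bucket = datetime.now(timezone.utc).replace(minute=0, second=0, microsecond=0)
    session.execute(
        "UPDATE user_actions_by_hour SET actions = actions + 1 "
        "WHERE hour_bucket = %s AND user_id = %s",
        (bucket, user_id),
    )

# A periodic batch job can then read each hour's partition in the requested range and
# merge/sort the counts client-side to get the heaviest users for that period.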
