In Azure TSI there are three main query types: getSeries, getEvents, and getAggregate. I am trying to query all of the historical data for many series. I have already found out that I must run these queries in a loop, one per series, which is terrible. However, I now need to be able to downsample the query with an interval. For example, if I am sending TSI data every 5 seconds and I want one month's worth of data, I don't need a data point for every 5 seconds; one per day would do. If I use getAggregate with filter: null and interval: "P1D", it returns a null value for every 1-day interval and no actual data. The same thing happens if I reduce the interval to PT60M or even PT1M. I then used getEvents, and it returns all the data points. I could write a function to aggregate those myself, but performance would be much worse; I would prefer to do this in the query itself. Is there a way to achieve this?
Ideally, if there are 20 data points 5 seconds apart and nothing else for that day, they would be averaged into one data point for the day. Currently, getAggregate returns null values instead.
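Until the getAggregate behaviour is sorted out, a client-side fallback is to pull the raw points with getEvents and average them per day yourself. A minimal sketch, assuming the events have already been parsed into (timestamp, value) pairs (that shape, and the function name, are assumptions, not TSI API details):

```python
from collections import defaultdict
from datetime import datetime

def daily_averages(events):
    # events: iterable of (timestamp, value) pairs, e.g. parsed out of a
    # getEvents response; the (timestamp, value) shape is an assumption.
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[ts.date()].append(value)
    # one averaged data point per calendar day
    return {day: sum(vals) / len(vals) for day, vals in buckets.items()}
```

This gives exactly the "20 points, 5 seconds apart, averaged into 1 daily point" behaviour described above, at the cost of transferring the raw events.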
I have data on customer care executives that records the calls they attended, each with a start time and an end time. I need to find out whether a particular executive is busy or free during a particular period. Office hours are 10:00 to 17:00, so I have sliced the day into one-hour slots from 10:00 to 17:00.
The data I have looks like this:
Note:
The data given here is only part of the original data; there are 20 executives, each with 5 to 10 rows. For simplicity we show 3 executives with fewer than 5 rows each.
The start times do not follow any ascending or descending order.
Please suggest formulas that work without any sorting or filtering of the given data.
Required: the result table should show whether a particular executive is busy or free in every hour. If he is on a call for even one minute, that entire one-hour slot should be marked busy.
The result should look like this:
Thanks In Advance!!!
You need to put an extra logical test into your OR function that checks for start times before the interval start combined with end times at or after the interval end (a call spanning the whole hour). So in cell G31 your formula should read:
=IF(OR(COUNTIFS($A$3:$A$14,A31,$C$3:$C$14,">0",$D$3:$D$14,">=14:00",$D$3:$D$14,"<15:00"),COUNTIFS($A$3:$A$14,A31,C$3:$C$14,">0",$E$3:$E$14,">=14:00",$E$3:$E$14,"<15:00"),COUNTIFS($A$3:$A$14,A31,C$3:$C$14,">0",$D$3:$D$14,"<14:00",$E$3:$E$14,">=15:00")),"Busy","Free")
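The same busy/free logic, written out as a sketch in Python with times expressed as minutes since midnight (the function name and the (start, end) pair shape are illustrative):

```python
def is_busy(calls, slot_start, slot_end):
    # calls: list of (start, end) call times in minutes since midnight.
    # Mirrors the three COUNTIFS branches in the formula above: the call
    # starts in the slot, ends in the slot, or spans the slot entirely.
    return any(
        slot_start <= s < slot_end
        or slot_start <= e < slot_end
        or (s < slot_start and e >= slot_end)
        for s, e in calls
    )
```

A call from 13:00 to 14:05 marks the 14:00-15:00 slot busy (it ends inside the slot), which matches the "on a call for even one minute" requirement.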
I have a dataset:
I want to group the data into clusters of 5 minutes each and calculate the average of the last column, i.e. percentage congestion.
How can I create such 5-minute clusters? I want to use this analysis for decision making, based on the calculated average percentage.
That is simple aggregation, not clustering.
Use a loop: read one record at a time, and every 5 minutes output the average and reinitialize the accumulators.
Or round every timestamp down to 5-minute granularity, then take the average over the now-identical keys. That is a SQL GROUP BY.
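The rounding approach above can be sketched in Python (the (timestamp, percentage) row shape is an assumption about the dataset):

```python
from collections import defaultdict
from datetime import datetime

def five_minute_averages(rows):
    # rows: iterable of (timestamp, percentage_congestion) pairs -- the
    # column names are assumptions about the dataset described above.
    buckets = defaultdict(list)
    for ts, pct in rows:
        # round the timestamp down to its 5-minute bucket
        key = ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)
        buckets[key].append(pct)
    # average over the now-identical keys (the GROUP BY step)
    return {k: sum(v) / len(v) for k, v in buckets.items()}
```

In SQL the same thing would be a GROUP BY on the rounded timestamp with AVG over the congestion column.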
Fairly new to MongoDB; I have a collection that holds multiple documents. I need a query that gives a document value over multiple time intervals. For instance, I need the value from 1 minute ago, 5 minutes ago, 10 minutes ago, and so on.
Currently I do this to get what I need.
I create an object of the different timeIntervals I need
var timeIntervals = {
"1Min" : 60000,
"5Min" : 300000,
"10Min" : 600000,
"30Min" : 1800000,
}
and then loop through them and perform this query on each interval to get the value I need.
db.collection.findOne({TimeStamp: {$gt: new Date(currentTime - timeInterval)}})
While this does what I need, it is terribly inefficient, especially if I need to add larger or more precise intervals. Is there a better, more efficient way of performing this operation?
Thanks!
EDIT
Here is a more thorough example of what I'm trying to do.
I have a backend that receives stock ticker prices in real time for particular symbols, so it receives hundreds per second. Each time a new price comes in, it gets timestamped and stored in MongoDB. Each time a new price comes in, I also need the price change for the last 1 minute, 5 minutes, 10 minutes, etc., so I can see how the price varies over those intervals, and then store that data in a different collection.
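One way to avoid the per-interval queries is to fetch the ticks once, covering only the largest interval, and derive all the per-interval changes from that single result set. A sketch in Python rather than the mongo shell, with illustrative names; it assumes the ticks arrive sorted oldest-first, as a single find with an ascending sort on TimeStamp would return them:

```python
from datetime import datetime, timedelta

# interval labels and lengths in seconds, as in the question
INTERVALS = {"1Min": 60, "5Min": 300, "10Min": 600, "30Min": 1800}

def price_changes(ticks, now, current_price):
    # ticks: (timestamp, price) pairs sorted oldest-first, fetched once
    # with a single query covering only the largest interval (30 min).
    changes = {}
    for label, seconds in INTERVALS.items():
        cutoff = now - timedelta(seconds=seconds)
        window = [price for ts, price in ticks if ts >= cutoff]
        if window:
            # change = current price minus the oldest price in the window
            changes[label] = current_price - window[0]
    return changes
```

One query per tick instead of one query per interval; adding a new interval then costs nothing extra on the database side as long as it is no larger than the biggest one fetched.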
Let's say I have an application which periodically receives some measurement data.
I know the exact time the data was measured, and I want every piece of data to be deleted 30 days after it was measured.
I'm not inserting the data into the database immediately, but I want to use Cassandra's time-to-live functionality.
Is there a way to manipulate Cassandra's internal timestamp of a row, so that I can set the time-to-live to 30 days but have it measure each row's lifespan from my timestamp?
E.g. I measure something at 27.08.2014 19:00, insert it at 27.08.2014 20:00, and set the time-to-live to 1 day. I want the row to be deleted at 28.08.2014 19:00, not at 28.08.2014 20:00 as it normally would be.
Is something like this possible?
I suggest the following approach, based on your example:
before insertion, calculate Δx = insertTime - measureTime
set TTL = 1 day - Δx for the inserted row
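The two steps above can be sketched as follows (the helper name is illustrative; the returned number of seconds is what you would pass in the CQL `USING TTL` clause of the INSERT):

```python
from datetime import datetime, timedelta

def adjusted_ttl(measure_time, insert_time, lifespan=timedelta(days=1)):
    # TTL = intended lifespan minus the measure-to-insert delay, so the
    # row expires relative to the measurement time, not the insert time.
    remaining = lifespan - (insert_time - measure_time)
    return max(int(remaining.total_seconds()), 0)  # Cassandra TTLs are in seconds
```

With the example above (measured 19:00, inserted 20:00, lifespan 1 day) this yields 23 hours, so the row expires at 19:00 the next day.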
Addition, based on a comment:
You can use the Astyanax client with batch mutation "to simultaneously enter multiple values at once". It is possible to set a TTL on each column or on the whole row at once.
So let's say I have 24 hours of data split into 1-hour chunks, and the data is per second, sequential. I want to graph a rolling hour at a time, such that every second the graph updates to reflect the most recent hour of data. I want this to look as seamless as possible.
I guess there are two questions here: first, how do you do the physical drawing of the data? Second, how can you progressively load the files? I assume you would load hour 1 and hour 2 at first, chart hour 1, then "queue" the seconds of hour 2, taking an element and drawing it every second. At some point you have to load the next hour of data and continue the process... not really sure how to do this in a web context.
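For the rolling-window part, a fixed-size buffer that drops the oldest second as each new one arrives gives the seamless effect. A minimal sketch in Python (the class and method names are illustrative, and the same idea ports directly to a JavaScript array in the browser):

```python
from collections import deque

class RollingHour:
    # Fixed-size window of the most recent samples; at one sample per
    # second, size 3600 holds exactly one hour of data.
    def __init__(self, size=3600):
        self.window = deque(maxlen=size)

    def push(self, sample):
        # appending to a full deque drops the oldest sample automatically
        self.window.append(sample)
        return list(self.window)  # snapshot to hand to the charting code
```

Each second, push the next queued sample and redraw the chart from the returned snapshot; when the queue of the preloaded hour runs low, fetch the next hour's file in the background and append its seconds to the queue.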
I appreciate the help!