I'm fairly new to MongoDB. I have a collection that holds multiple documents, and I need a query that gives me a document's value at multiple time intervals. For instance, I need the value from 1 minute ago, 5 minutes ago, 10 minutes ago, and so on.
Currently I do this to get what I need.
I create an object of the different time intervals I need:
var timeIntervals = {
"1Min" : 60000,
"5Min" : 300000,
"10Min" : 600000,
"30Min" : 1800000,
}
and then loop through them, performing this query on each interval to get the value I need:
db.collection.findOne({TimeStamp: {$gt: new Date(currentTime - timeInterval)}})
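In full, the loop is something like this (a sketch, assuming the mongo shell, with currentTime as the current epoch in milliseconds):

var currentTime = Date.now();
var results = {};
for (var label in timeIntervals) {
    // one round trip per interval: first document newer than the window start
    results[label] = db.collection.findOne({
        TimeStamp: { $gt: new Date(currentTime - timeIntervals[label]) }
    });
}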
While this does what I need, it is terribly inefficient, especially if I need to add larger or more precise intervals. Is there a better, more efficient way of performing this operation?
Thanks!
EDIT
Here is a more thorough example of what I'm trying to do.
I have a backend that receives stock ticker prices in real time for particular symbols, so it receives hundreds per second. Each time a new price comes in, it gets timestamped and stored in MongoDB. Each time a new price comes in, I need to get the price change for the last 1 minute, 5 minutes, 10 minutes, etc., so I can see how the price varies over those intervals, and then store that data in a different collection.
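One idea I've been considering to collapse this into a single round trip is an aggregation with $facet (MongoDB 3.4+): match the widest window first so an index on TimeStamp can be used, then run one sub-pipeline per interval. A minimal sketch, where prices and the Symbol field are placeholder names:

var now = Date.now();
db.prices.aggregate([
    // narrow to the widest window first so the TimeStamp index is used
    { $match: { Symbol: "XYZ", TimeStamp: { $gt: new Date(now - 1800000) } } },
    { $sort: { TimeStamp: 1 } },
    // one sub-pipeline per interval, all answered in a single round trip
    { $facet: {
        "1Min":  [ { $match: { TimeStamp: { $gt: new Date(now - 60000) } } },  { $limit: 1 } ],
        "5Min":  [ { $match: { TimeStamp: { $gt: new Date(now - 300000) } } }, { $limit: 1 } ],
        "10Min": [ { $match: { TimeStamp: { $gt: new Date(now - 600000) } } }, { $limit: 1 } ],
        "30Min": [ { $limit: 1 } ]
    } }
])

Each facet returns the oldest price inside its window, so subtracting it from the latest price gives the change per interval.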
Related
In Azure TSI, we have 3 main query types: getSeries, getEvents, and getAggregate. I am trying to query all of the historical data for many series. I have already found out that I must do these queries in a loop, one per series, which is terrible. However, I now need to be able to run the query with an interval. For example, if I am sending TSI data every 5 seconds and I want to get one month's worth of data, I don't need a data point for every 5 seconds; one per day would do. If I use getAggregate with filter: null and interval: "P1D", it returns a ton of null values, one per day, and doesn't return any data. The same thing happens if I reduce this interval to 60M or even 1M. I then used getEvents and it returns all the data points. I could write a function to parse these myself, but the query speed will be much worse; I would prefer to do the downsampling in the query itself. Is there a way to achieve this?
Ideally, if there are 20 data points 5 seconds apart and nothing else for that day, it would average these into one data point for the day. Currently, getAggregate returns null values instead.
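If nothing query-side works, the client-side fallback I mentioned would look roughly like this; everything here (the event shape, field names) is an assumption:

// hypothetical fallback: downsample getEvents results to one point per day
// assumes each event looks like { timestamp: "<ISO string>", value: <number> }
function dailyAverages(events) {
    var buckets = {};
    events.forEach(function (e) {
        var day = e.timestamp.slice(0, 10);               // "YYYY-MM-DD"
        (buckets[day] = buckets[day] || []).push(e.value);
    });
    return Object.keys(buckets).map(function (day) {
        var vals = buckets[day];
        var avg = vals.reduce(function (a, b) { return a + b; }, 0) / vals.length;
        return { day: day, avg: avg };
    });
}

I'd just rather not pull every raw point across the wire to do this.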
I have data for customer care executives which shows how many calls they have attended, continuously from one time to another. I need to find out whether a particular executive is busy or free in a particular period. Office hours are 10:00 to 17:00, so I have sliced the time into one-hour slices from 10:00 to 17:00.
The data that I have looks like this:
Note:
The data given here is a sample of the original data; we have 20 executives, each with 5 to 10 rows of data. For simplicity we have used 3 executives with fewer than 5 rows each.
The start timings do not follow any ascending or descending order.
Please suggest formulas that work without any sorting or filtering of the given data.
Required: the result table should show whether a particular executive is busy or free in every hour. If he is on a call for even one minute, it should show busy for that entire one-hour period.
The result should be like:
Thanks In Advance!!!
You need to add an extra logical test to your OR function that catches calls spanning the whole hour: a start time before the interval start and an end time after the interval end. So in cell G31 your formula should read:
=IF(OR(COUNTIFS($A$3:$A$14,A31,$C$3:$C$14,">0",$D$3:$D$14,">=14:00",$D$3:$D$14,"<15:00"),COUNTIFS($A$3:$A$14,A31,$C$3:$C$14,">0",$E$3:$E$14,">=14:00",$E$3:$E$14,"<15:00"),COUNTIFS($A$3:$A$14,A31,$C$3:$C$14,">0",$D$3:$D$14,"<14:00",$E$3:$E$14,">=15:00")),"Busy","Free")
I need to keep a 28 day history for some dashboard data. Essentially I have an event/action that is recorded through our BI system. I want to count the number of events and the distinct users who do that event for the past 1 day, 7 days and 28 days. I also use grouping sets (cube) to get the fully segmented data by country/browser/platform etc.
The old way was to do this by keeping a 28-day history per user, for all segments. So if a user accessed the site from mobile and desktop every day for all 28 days, they would have 56 rows in the DB. This ends up being a large table, and it is time-consuming to calculate even approx_distinct, never mind distinct. The bigger issue is that I also wish to calculate approx_percentiles.
So I started investigating the use of HyperLogLog: https://prestodb.io/docs/current/functions/hyperloglog.html
This works great; it's much more efficient to store the sketches daily rather than the entire list of unique users per day. As I am using approx_distinct, the values are close enough, and it works.
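For reference, the pattern from the HLL docs that makes this work: store each day's sketch as varbinary and merge across the window at query time (table and column names here are placeholders):

-- merge the stored daily sketches into one 7-day distinct count
SELECT cardinality(merge(cast(sketch AS HyperLogLog))) AS weekly_unique_users
FROM daily_user_sketches
WHERE day >= current_date - interval '7' day;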
I then noticed a similar function for medians: qdigest.
https://prestodb.io/docs/current/functions/qdigest.html
Unfortunately the documentation on this page is not nearly as good as on the previous one, so it took me a while to figure it out. It works great for calculating daily medians, but it does not work if I want to calculate the median actions per user over a longer time period. The HyperLogLog examples demonstrate how to calculate approx_distinct users over a time period, but the qdigest docs give no such example.
When I try something similar to the HLL date-range example with qdigest, I get results that look like the 1-day results.
Because you need medians that are aggregated (summed) across multiple days on a per-user basis, you'll need to perform that aggregation prior to insertion into the qdigest in order for this to work for 7- and 28-day per-user counts. In other words, the units of the data need to be consistent: if daily values are being inserted into the qdigest, you can't use that qdigest for 7- or 28-day per-user counts of the events.
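A sketch of what that looks like (table and column names are made up): sum each user's events across the whole window first, then build the qdigest from those totals rather than merging the daily digests:

-- per-user totals over the full window, then one qdigest over those totals
WITH per_user AS (
    SELECT user_id, count(*) AS actions
    FROM events
    WHERE day >= current_date - interval '28' day
    GROUP BY user_id
)
SELECT value_at_quantile(qdigest_agg(actions), 0.5) AS median_actions_per_user
FROM per_user;

Merging the stored daily digests instead treats each user-day as its own observation, which is why your 7- and 28-day numbers come out looking like the 1-day ones.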
I'm trying to get two date-limited result counts from dateRestrict and calculate the difference between them.
So, for example, I get the counts for January 1 and January 7, and subtract the second result from the first.
Theoretically I should end up with the number of new references to the given search terms over that week, right?
The thing is, sometimes I get a negative result, like -60000. How could 60,000 references disappear from one week to the next?
Am I misinterpreting the way dateRestrict works?
My URLs so far look like:
first query:
https://www.googleapis.com/customsearch/v1?q=MYSEARCH&dateRestrict=d209&key=MYKEY&cx=MYID
second query:
https://www.googleapis.com/customsearch/v1?q=MYSEARCH&dateRestrict=d203&key=MYKEY&cx=MYID
The field of the response I'm using is totalResults.
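The comparison itself, as a sketch (same placeholder query/key/cx as above):

// fetch both counts and diff them
var base = "https://www.googleapis.com/customsearch/v1?q=MYSEARCH&key=MYKEY&cx=MYID";

async function weeklyDelta() {
    var first  = await fetch(base + "&dateRestrict=d209").then(r => r.json());
    var second = await fetch(base + "&dateRestrict=d203").then(r => r.json());
    // totalResults comes back as a string, and it is only an estimate,
    // which may be part of why the difference can go negative
    return Number(first.searchInformation.totalResults)
         - Number(second.searchInformation.totalResults);
}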
So let's say I have 24 hours of data split into 1-hour chunks, and the data is sequential, one point per second. I want to graph a rolling hour at a time, such that every second the graph updates to reflect the latest hour of data. I want this to look as seamless as possible.
I guess there are two questions here. One: how do you do the physical drawing of the data? Two: how can you progressively load the files? I assume you would load hour 1 and hour 2 at first, chart hour 1, then "queue" the seconds in hour 2, taking an element and drawing it every second. At some point you have to load the next hour of data and continue the process... I'm not really sure how to do this in a web context; a sketch of the idea is below.
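To make the second part concrete, here is a rough sketch of the queue idea, assuming each hour is served as a JSON array of per-second values; fetchChunk and drawWindow are hypothetical placeholders:

var windowData = [];   // the 3600 points currently drawn
var queue = [];        // upcoming points from the next chunk
var nextHour = 3;
var loading = false;

async function fetchChunk(hour) {
    // assumption: each hour of data is a JSON array of numbers
    var res = await fetch("/data/hour-" + hour + ".json");
    return res.json();
}

function drawWindow(points) {
    // placeholder: hand the visible hour of points to whatever
    // charting library is in use and redraw
}

async function start() {
    windowData = await fetchChunk(1);    // hour 1 fills the first window
    queue = await fetchChunk(2);         // hour 2 feeds the rolling updates

    setInterval(function () {
        windowData.shift();              // drop the oldest second
        windowData.push(queue.shift());  // append the next second
        drawWindow(windowData);

        // top up the queue well before it runs dry
        if (queue.length < 60 && !loading && nextHour <= 24) {
            loading = true;
            fetchChunk(nextHour++).then(function (chunk) {
                queue = queue.concat(chunk);
                loading = false;
            });
        }
    }, 1000);
}

Whether this looks seamless will mostly come down to keeping the redraw itself cheap.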
I appreciate the help!