First of all, I am a beginner with MongoDB, so here is my problem. I am using the following model with mongoengine:
class Stats(Document):
    Name = StringField(max_length=250)
    # pass a callable, otherwise the default is computed once at import time
    timestamp = LongField(default=lambda: mktime(datetime.now().timetuple()))
    count = IntField()
    # <some other fields>
What I want is to filter by the name (that part is clear) and sum the count field with an aggregation, but grouped by hour/day/month. For example, given records with timestamps [1532970603, 1532972103, 153293600, 1532974500], records 1-2 form the first group and records 3-4 form the second.
And that is where I am stuck. I have some ideas, such as grouping every n records or dividing the timestamp by 3600 (1 hour = 3600 seconds), but how do I do that with mongoengine? And how can I insert Python expressions into a pipeline? I would appreciate any help.
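As a side note on the divide-by-3600 idea: the bucketing logic can be prototyped in plain Python first (a sketch of the grouping logic only, not mongoengine code):

```python
from collections import defaultdict

def sum_by_hour(records):
    """Sum counts of (timestamp, count) pairs, bucketed by hour since the epoch."""
    buckets = defaultdict(int)
    for ts, count in records:
        buckets[ts // 3600] += count  # integer division puts each timestamp in its hour bucket
    return dict(buckets)

# 1532970603 and 1532972103 share an hour bucket; 1532974500 falls in the next one
totals = sum_by_hour([(1532970603, 5), (1532972103, 3), (1532974500, 2)])
```

The same division trick can be pushed into a pipeline, but grouping on `$hour`/`$dayOfMonth` etc. over a real date field is usually simpler.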
I would recommend storing a complete datetime in timestamp rather than a raw epoch number. Here is your model:
from mongoengine import Document, StringField, DateTimeField, FloatField

class Stats(Document):
    Name = StringField(max_length=250)
    timestamp = DateTimeField(default=datetime.utcnow)  # pass the callable so each document gets its own time
    count = FloatField()
    meta = {'strict': False}
Now you can aggregate them accordingly.
pipeline = [
    {
        '$group': {
            '_id': {
                'year': {'$year': '$timestamp'},
                'month': {'$month': '$timestamp'},
                'day': {'$dayOfMonth': '$timestamp'},
                'hour': {'$hour': '$timestamp'},
            },
            'count': {'$sum': '$count'},  # sum of count per hour bucket
        }
    }
]
Stats.objects.aggregate(pipeline)
I am using NestJS and Prisma (4.4.0).
My table:
id: int
created_at: Timestamp
first_active: Timestamp
The query I want to implement:
select count(*) from {table} where id = {id} and first_active <= {created_at} + 48hours
I want to get a count of users which were active within 48 hours of creation.
With https://www.prisma.io/docs/reference/api-reference/prisma-client-reference#compare-columns-in-the-same-table I can now reference another column in the same table.
Example
where: {
    id: id,
    first_active: {
        lte: this.prisma.table.fields.created_at, // not sure how to add the "+ 48 hours" part here
    },
},
Any suggestion on how I can add 48 hours to created_at?
For now, you will need two queries to accomplish this: first retrieve created_at, then add the 48 hours in application code and use the result in the count query.
You could create a feature request if you would like to see column arithmetic like this supported directly in Prisma.
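In outline: query created_at, compute created_at + 48 hours client-side, then count rows with first_active at or below that cutoff. The date arithmetic itself is trivial (sketched here in Python just to show the condition; in the NestJS service these would be two Prisma client calls):

```python
from datetime import datetime, timedelta

def active_within_48h(created_at: datetime, first_active: datetime) -> bool:
    """The condition from the SQL: first_active <= created_at + 48 hours."""
    return first_active <= created_at + timedelta(hours=48)

# Step 1 would fetch created_at for the row; step 2 counts rows with
# first_active <= cutoff, where cutoff is computed in application code.
cutoff = datetime(2023, 1, 1, 12, 0) + timedelta(hours=48)
```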
I need to get data from MongoDB between two given dates. The same query works for the (yy-mm-dd hh:mm:ss.ms) format, but it is not working for the (dd-mm-yy hh:mm:ss) format.
Sample Data in DB
{
"name":"user1",
"Value":"Success",
"Date": "02-06-2020 00:00:00",
"Status":"available",
"Updated_on":"2021-01-09 00:00:00.0000"
}
Python:
start_date = "02-06-2020 00:00:00"
end_date = "11-06-2020 10:16:41"
data = list(db.collection.find({"Date": {"$gte": start_date, "$lte": end_date}, "Value": "Success"},
                               {'_id': False, "Date": 1, "name": 1, "Value": 1}))
print(data)
I need to get the data based on the "Date" field.
The problem is that it returns data outside the start_date and end_date range.
For example, with start_date "02-06-2020 00:00:00" and end_date "11-06-2020 10:16:41", it returns data from "02-04-2020 00:00:00" to "11-06-2020 10:16:41".
Any idea how to achieve this? And please explain why the dates are not compared correctly.
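One likely reason for the extra rows (besides the missing `$` on the operators): the Date field is stored as a string, and MongoDB compares strings lexicographically, character by character. A dd-mm-yyyy string therefore orders by day first, then month, then year, which is not chronological order. A quick sketch of the effect in plain Python, which applies the same string-comparison rules:

```python
from datetime import datetime

start = "02-06-2020 00:00:00"   # 2 June 2020

# "03-01-2020..." compares greater than "02-06-2020..." ('3' > '2' at index 1),
# even though 3 January is months earlier, so a string range match includes it.
print("03-01-2020 00:00:00" >= start)  # True

# Parsing into datetimes (or storing ISO "yyyy-mm-dd" strings, which do sort
# chronologically) restores correct ordering:
fmt = "%d-%m-%Y %H:%M:%S"
print(datetime.strptime("03-01-2020 00:00:00", fmt) < datetime.strptime(start, fmt))  # True
```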
I need to find the greatest and the smallest flight_time for every category (for every line).
I think I should use max, min and group, but I'm not sure.
Is there any way to find the max and min at once, and then group the data by category?
EDIT:
In my exercise:
db.myflights.insert([
    // 1
    {
        start_time: new Date("2020-05-18T06:15:00Z"),
        land_time: new Date("2020-05-18T07:30:00Z"),
        flight_time: 1.25,
        passengers: "90",
        line_name: "WizzAir"
    },
    // 2
    {
        start_time: new Date("2020-06-18T07:30:00Z"),
        land_time: new Date("2020-06-18T09:30:00Z"),
        flight_time: 2,
        passengers: "111",
        line_name: "Lufthansa"
    }
])
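A single $group stage can compute both aggregates at once, e.g. {'_id': '$line_name', 'min_time': {'$min': '$flight_time'}, 'max_time': {'$max': '$flight_time'}}. The semantics of that stage, sketched in plain Python (the third document is an invented extra sample so that one group has more than one flight):

```python
from collections import defaultdict

flights = [
    {"flight_time": 1.25, "line_name": "WizzAir"},
    {"flight_time": 2, "line_name": "Lufthansa"},
    {"flight_time": 3.5, "line_name": "WizzAir"},  # hypothetical extra flight for illustration
]

def min_max_by_line(docs):
    """What $group with $min/$max does: per line_name, smallest and largest flight_time."""
    times = defaultdict(list)
    for doc in docs:
        times[doc["line_name"]].append(doc["flight_time"])
    return {line: {"min_time": min(ts), "max_time": max(ts)} for line, ts in times.items()}
```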
Convert all human-readable times to durations in seconds in your applications, store the durations in the database (in addition to or instead of the human-readable text times). Query based on the durations in seconds.
Based on the following collections:
data_invoices (document, 100,000 total records, 2 tenants)
hash: tenantId
persistent: createdOn
data_jobs (document, 10,000 total records, 2 tenants)
hash: tenantId
persistent: createdOn
data_links (edge, 100,000 total records)
persistent: createdOn
persistent (sparse): replacedOn
The links collection connects each invoice to a random job, so a job may have zero or more invoices. An invoice should have one or more jobs, but in my data each invoice is matched to exactly one job. The date filter does not actually exclude any data (every createdOn is less than the specified date value), and neither does the tenantId filter, since every record is either xxx or yyy.
The generic structure of data_jobs and data_invoices is:
tenantId: string;
createdOn: number;
data: [{
createdOn: number;
values: {
...collection specific data here...
};
}];
The collection specific data structure for data_invoices is:
number: number;
amount: number;
The collection specific data structure for data_jobs is:
name: string;
The structure of the data_links table is:
createdOn: number;
replacedOn?: number; // though I don't have any records with this value set
The createdOn field is the date value represented as ticks from 1970, and is a random date from 01 Jan 2000 to today.
The amount field is a random currency value (2 decimal places) from 10 to 10,000.
The number field is an autonumber type field.
I have two very similar (in my opinion) queries; one direction (jobs to invoices) runs very, very quickly, while the other takes ages.
This query takes 1.85 seconds:
LET date = 1495616898128
FOR job IN data_jobs
FILTER job.tenantId IN ['xxx', 'yyy']
FILTER job.createdOn<=date
LET jobData = (job.data[* FILTER CURRENT.createdOn<=date LIMIT 1])[0]
FILTER CONTAINS(jobData.values.name, 'a')
LET invoices = (
FOR invoice, link IN 1 INBOUND job data_links
FILTER link.createdOn<=date AND (link.replacedOn == NULL OR
link.replacedOn>date)
LET invoiceData = (invoice.data[* FILTER CURRENT.createdOn<=date LIMIT 1])[0]
FILTER invoiceData.values.amount>1000
COLLECT WITH COUNT INTO count
RETURN {
count
}
)[0]
FILTER invoices.count>0
SORT jobData.values.name ASC
LIMIT 0,8
RETURN job
This query takes 8.5 seconds:
LET date = 1495616898128
FOR invoice IN data_invoices
FILTER invoice.tenantId IN ['xxx', 'yyy']
FILTER invoice.createdOn<=date
LET invoiceData = (invoice.data[* FILTER CURRENT.createdOn<=date LIMIT 1])[0]
FILTER invoiceData.values.amount>1000
LET jobs = (
FOR job, link IN 1 OUTBOUND invoice data_links
FILTER link.createdOn<=date AND (link.replacedOn == NULL
OR link.replacedOn>date)
LET jobData = (job.data[* FILTER CURRENT.createdOn<=date LIMIT 1])[0]
FILTER CONTAINS(jobData.values.name, 'a')
COLLECT WITH COUNT INTO count
RETURN {
count
}
)[0]
FILTER jobs.count>0
SORT invoiceData.values.amount ASC
LIMIT 0,8
RETURN invoice
I realise that the two queries return different data, but shouldn't the processing time be roughly the same? Both filter through the links collection and aggregate over the opposite collection. I don't understand why one direction is so much faster than the other. Is there anything I can do to improve the performance of these queries?
Okay, this is strange, but I have stumbled upon a very counter-intuitive (at least to me) solution: sort first, then filter.
This query now takes 1.4 seconds:
LET date = 1495616898128
FOR invoice IN data_invoices
FILTER invoice.tenantId IN ['xxx', 'yyy']
FILTER invoice.createdOn<=date
LET invoiceData = (invoice.data[* FILTER CURRENT.createdOn<=date LIMIT 1])[0]
SORT invoiceData.values.amount ASC
FILTER invoiceData.values.amount>1000
LET jobs = (
FOR job, link IN 1 OUTBOUND invoice data_links
FILTER link.createdOn<=date AND (link.replacedOn == NULL
OR link.replacedOn>date)
LET jobData = (job.data[* FILTER CURRENT.createdOn<=date LIMIT 1])[0]
FILTER CONTAINS(jobData.values.name, 'a')
COLLECT WITH COUNT INTO count
RETURN {
count
}
)[0]
FILTER jobs.count>0
LIMIT 0,8
RETURN invoice
Despite adding a persistent index on data[*].values.amount, it is still not used (I even tried SORT invoice.data[0].values.amount ASC and it still didn't seem to use the index).
Can anyone explain this, please?
I would like to query the Flights table, which contains a datetime field Arrival, group the results by date, and then add paging, using LINQ to Entities.
How can we group flights by arrival date?
Since you're looking for something simple, try this out.
int recordsPerPage = 10, currentIndex = 0;
var groupQuery =
    myDC.Flights.
    GroupBy(f => EntityFunctions.TruncateTime(f.Arrival)).
    OrderBy(g => g.Key).  // LINQ to Entities requires an ordering before Skip/Take
    Skip(recordsPerPage * currentIndex).
    Take(recordsPerPage);
This will give you back a page of groups of Flight objects, keyed by arrival date, which you can use however you plan. Note that the paging here is over the groups (dates), not over individual flights.
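For what it's worth, the same group-then-page shape in plain Python (group by the date part, order the group keys, slice one page) shows what the LINQ query evaluates to; the arrival times below are made-up sample data:

```python
from collections import defaultdict
from datetime import datetime

def page_of_groups(arrivals, page_size, page_index):
    """Group datetimes by calendar date, order the groups, return one page of groups."""
    groups = defaultdict(list)
    for dt in arrivals:
        groups[dt.date()].append(dt)  # TruncateTime equivalent: drop the time component
    ordered = sorted(groups.items())  # Skip/Take need a defined order
    start = page_size * page_index
    return ordered[start:start + page_size]

arrivals = [
    datetime(2024, 5, 1, 9, 30),
    datetime(2024, 5, 1, 18, 0),
    datetime(2024, 5, 2, 7, 15),
]
```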