SQLAlchemy dynamic loading of related entities from query - python-3.x

The use case is pretty simple: I have 3 cascading entities:

class Customer():
    users = relationship(
        'User',
        backref=backref('customer', lazy="subquery"),
        cascade=DbConstants.RELATIONSHIP_CASCADE_ALL_DELETE)

class User():
    reports = relationship(
        'Report',
        backref=backref('user', lazy="subquery"),
        lazy="subquery",
        cascade=DbConstants.RELATIONSHIP_CASCADE_ALL_DELETE)

class Report():
    date_time_start = Column(DateTime())
    date_time_end = Column(DateTime())
I want to get all these entities in one query, but I want to filter the reports by their date.
customers = session.query(Customer).join(
    User, Customer.users, isouter=True
).join(
    Report,
    # this is where the reports should be filtered
    and_(Report.user_id == User.id,
         Report.date_time_start > date_start,
         Report.date_time_start < date_end),
).all()
From this I get the expected entity tree:
[
    customers:
        users: [ reports: [] ]
]
Except I get ALL the reports of the user in the array, no matter the start date.
This means if I check the result like customers[0].users[0].reports, every report belonging to this user is output. Is there a way to have the reports attribute populated only with the rows returned by the query?
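There is no accepted answer in the thread, but a common approach for this kind of "populate the collection from the filtered join" requirement is SQLAlchemy's contains_eager() option. The sketch below is only that, a sketch: it assumes the models, session and date variables from the question above.

from sqlalchemy import and_
from sqlalchemy.orm import contains_eager

# Sketch only: the joins carry the date filter, and contains_eager() tells the
# ORM to populate Customer.users and User.reports from these joined rows
# instead of lazy-loading the full, unfiltered collections.
customers = (
    session.query(Customer)
    .outerjoin(User, Customer.users)
    .outerjoin(
        Report,
        and_(
            Report.user_id == User.id,
            Report.date_time_start > date_start,
            Report.date_time_start < date_end,
        ),
    )
    .options(contains_eager(Customer.users).contains_eager(User.reports))
    .all()
)

The caveat is that the loaded objects then carry the filtered collections for the rest of the session, so this pattern is best kept to read-only views of the data.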

Related

How to add multiple fields' reference to "unique_together" error message

I have a model with multiple fields being checked for uniqueness:
class AudQuestionList(BaseTimeStampModel):
    aud_ques_list_id = models.AutoField(primary_key=True,...
    aud_ques_list_num = models.CharField(max_length=26,...
    aud_ques_list_doc_type = models.ForeignKey(DocType,...
    short_text = models.CharField(max_length=55,...
    aud_scope_standards = models.ForeignKey(ScopeStandard, ...
    aud_freqency = models.ForeignKey(AuditFrequency, ...
    aud_process = models.ForeignKey(AuditProcesses, ...

    unique_together = [['aud_scope_standards', 'aud_freqency', 'aud_process',],]
My model form is as described below:
class CreateAudQuestionListForm(forms.ModelForm):
    class Meta:
        model = AudQuestionList
        fields = ('aud_ques_list_doc_type', 'aud_scope_standards', 'aud_freqency', 'aud_process', 'short_text', ...

    def validate_unique(self):
        try:
            self.instance.validate_unique()
        except ValidationError:
            self._update_errors({'aud_scope_standards': _('Record exists for the combination of key values.')})
The scenario works perfectly well, except that the field names (labels) themselves are missing from the message.
Is there a way to add the field names to the message above, say something like:
Record exists for the combination of key fields + %(field_labels)s.
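The thread has no accepted answer, but one possible sketch (the field list, the label lookup and the message wording are assumptions, not taken from the post) is to read the verbose names of the fields involved from the model's metadata and interpolate them into the message:

from django import forms
from django.core.exceptions import ValidationError
from django.utils.translation import gettext as _


class CreateAudQuestionListForm(forms.ModelForm):
    class Meta:
        model = AudQuestionList
        fields = ('aud_ques_list_doc_type', 'aud_scope_standards',
                  'aud_freqency', 'aud_process', 'short_text')

    def validate_unique(self):
        try:
            self.instance.validate_unique()
        except ValidationError:
            # Assumed to mirror the unique_together definition above.
            unique_fields = ('aud_scope_standards', 'aud_freqency', 'aud_process')
            # Resolve the human-readable labels from the model field metadata.
            labels = ', '.join(
                str(self.instance._meta.get_field(name).verbose_name)
                for name in unique_fields
            )
            self._update_errors({
                'aud_scope_standards': _(
                    'Record exists for the combination of key fields: %(field_labels)s.'
                ) % {'field_labels': labels},
            })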

pandas aggregation based on timestamp threshold

I hope somebody can help me to solve this issue.
I have a csv file structured as follows (sample rows are shown further down).
I am trying to group the events based on message, name and userID if the events occur within a 10-minute threshold starting from the first matched event.
The output I expect from the csv is only 3 rows, because the second and third events fall within the 10-minute threshold and share the same message, name and ID, so they should be grouped; there should also be an extra column named event_count that reports how many times that event occurred.
I started working on this, and my script looks like this:
import csv
import pandas as pd

# 0. sort data by timestamp if not already sorted
file_csv = 'test.csv'
f = pd.read_csv(file_csv)
f['#timestamp'] = pd.to_datetime(f['#timestamp'])
f = f.sort_values('#timestamp')

# lazy groupby
groups = f.groupby(['message', 'name', 'userID'])

# 1. compute the time differences `timediff` and compare to threshold
f['timediff'] = groups['#timestamp'].diff() < pd.Timedelta(minutes=10)

# 2. find the blocks with cumsum
f['event_count'] = groups['timediff'].cumsum()

# 3. groupby the blocks
out = (f.groupby(['message', 'name', 'userID'])
        .agg({'#timestamp': 'first', 'timediff': 'count'}))

keep_col = ['#timestamp', 'message', 'name', 'userID', 'event_count']
new_f = f[keep_col]
new_f.to_csv("aggregationtest.csv", index=False)
But the aggregation is totally wrong, because it groups all the events together even if they don't fall within the 10-minute threshold.
I am really struggling to understand what I am doing wrong; I would appreciate help understanding the issue.
UPDATE:
After some testing I managed to get output closer to what I am expecting, but it is still wrong.
I made some updates to the out variable, as follows:

out = (f.groupby(['message', 'name', 'userID', 'timediff'])
        .agg({'#timestamp': 'first', 'message': 'unique', 'name': 'unique',
              'userID': 'unique', 'timediff': 'count'}))

This bit of code now produces output that is grouped, but the count is wrong. Given this csv file:
#timestamp,message,name,userID
2021-07-13 21:36:18,Failed to download file,Failed to download file,admin
2021-07-14 03:46:16,Successful Logon for user "user1",Logon Attempt,1
2021-07-14 03:51:16,Successful Logon for user "user1",Logon Attempt,1
2021-07-14 03:54:16,Successful Logon for user "user1",Logon Attempt,1
2021-07-14 04:55:16,Successful Logon for user "user1",Logon Attempt,1
I am expecting the following event_count values:
1
3
1
But I am getting a different outcome.
You'll have to somehow identify the different periods within the groups. The solution below gives each period within the group a name, which can then be included in the groupby that generates the count:
import pandas as pd

file_csv = 'test.csv'
f = pd.read_csv(file_csv)
f['#timestamp'] = pd.to_datetime(f['#timestamp'])
f = f.sort_values('#timestamp')

def check(item):  # taken from https://stackoverflow.com/a/53189777/11380795
    diffs = item - item.shift()
    laps = diffs > pd.Timedelta('10 min')
    periods = laps.cumsum().apply(lambda x: 'period_{}'.format(x + 1))
    return periods

# create period names
f['period'] = f.groupby(['message', 'name', 'userID'])['#timestamp'].transform(check)

# groupby and count
(f.groupby(['message', 'name', 'userID', 'period'])
   .agg({'#timestamp': 'first', 'period': 'count'})
   .rename(columns={"period": "timediff"})
   .reset_index())
Output:
   message                            name                      userID  period    #timestamp           timediff
0  Failed to download file            Failed to download file   admin   period_1  2021-07-13 21:36:18         1
1  Successful Logon for user "user1"  Logon Attempt             1       period_1  2021-07-14 03:46:16         3
2  Successful Logon for user "user1"  Logon Attempt             1       period_2  2021-07-14 04:55:16         1
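A functionally equivalent variant (a sketch, not part of the original answer) detects the periods without a helper function: flag every row whose gap to the previous row in its (message, name, userID) group exceeds 10 minutes, then cumulatively sum the flags per group to obtain period ids.

import pandas as pd

f = pd.read_csv('test.csv')
f['#timestamp'] = pd.to_datetime(f['#timestamp'])
f = f.sort_values('#timestamp')

keys = ['message', 'name', 'userID']
# True whenever a row starts a new period within its (message, name, userID) group.
gaps = f.groupby(keys)['#timestamp'].diff() > pd.Timedelta('10 min')
f['period'] = gaps.astype(int).groupby([f[k] for k in keys]).cumsum()

out = (f.groupby(keys + ['period'])
         .agg(first_seen=('#timestamp', 'first'), event_count=('#timestamp', 'count'))
         .reset_index())

On the five sample rows above this also yields event_count values of 1, 3 and 1.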

Odoo 12: How to prevent a default field method from being executed

I scheduled a cron job that executes every 1st of the month; the purpose is to allocate leave for all employees according to their tags. Here is a sample of my code:
for leave in leave_type_ids:
    for employee_tag in employee_tag_ids:
        values = {
            'name': 'Allocation mensuelle %s %s' % (now.strftime('%B'), now.strftime('%Y')),
            'holiday_status_id': leave.id,
            'number_of_days': employee_tag.allocation,
            'holiday_type': 'category',
            'category_id': employee_tag.id,
        }
        try:
            self.create(values).action_approve()
        except Exception as e:
            _logger.critical(e)
I want to point out that self is an instance of 'hr.leave.allocation'.
The problem is that when I create the record, the field employee_id is automatically filled with the user/employee OdooBot (the one who executed the program in the cron), and that is not all: the OdooBot employee is also allocated the leave.
This behavior is due to the following code in the Odoo native modules:
def _default_employee(self):
    return self.env.context.get('default_employee_id') or self.env['hr.employee'].search([('user_id', '=', self.env.uid)], limit=1)

employee_id = fields.Many2one(
    'hr.employee', string='Employee', index=True, readonly=True,
    states={'draft': [('readonly', False)], 'confirm': [('readonly', False)]},
    default=_default_employee, track_visibility='onchange')
So my question is: how do I prevent this when the cron runs, and keep the normal behavior when the record is created from the form view?
The field "employé" (employee) should be empty here, because it is an allocation by tag.
You will have to loop over hr.employee records, because then you can do either of the following:
self.with_context({'default_employee_id': employee.id}).create(...)
OR
self.sudo(employee.id).create(...)
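Applied to the cron code from the question, a possible sketch looks like the following (the category_ids search is an assumption about how the employee tag is linked to hr.employee; it is not stated in the question, and the values reuse the variables from the cron above):

# Sketch only: resolve the employees carrying the tag, then create one
# allocation per employee with default_employee_id injected into the context,
# so _default_employee() no longer falls back to the cron user (OdooBot).
for leave in leave_type_ids:
    for employee_tag in employee_tag_ids:
        employees = self.env['hr.employee'].search(
            [('category_ids', 'in', [employee_tag.id])])  # assumed tag field
        for employee in employees:
            values = {
                'name': 'Allocation mensuelle %s %s' % (now.strftime('%B'), now.strftime('%Y')),
                'holiday_status_id': leave.id,
                'number_of_days': employee_tag.allocation,
                'holiday_type': 'employee',
                'employee_id': employee.id,
            }
            try:
                self.with_context(default_employee_id=employee.id) \
                    .create(values).action_approve()
            except Exception as e:
                _logger.critical(e)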

When working with the Stripe API, is it better to sort each request or store locally and perform queries?

This is my first post; I've been lurking for a while.
Some context to my question:
I'm working with the Stripe API to pull transaction data and match it with booking numbers from another API source (property reservations --> funds received, for reconciliation).
I started by just making calls to the API and sorting the data in place using Python 3, but it started to get very complicated, so I thought I should persist the data in a MongoDB database on localhost. I began to do this, but then decided that storing the sorted data was still just as complicated and the request times were getting quite long, so I thought: maybe I should pull all the Stripe data, store it locally, and then query whatever I needed.
So here I am, with a bunch of code I've written for both and still not a lot of progress. I'm a bit lost on the next move. I feel like I should probably pick a path and stick with it. I'm a little unsure what the "best practice" is when working with APIs; usually I would turn to YouTube, but I haven't been able to find a video which covers this specific scenario. The amount of data being pulled from the API would be around 100 kB per request.
Here is the original code, which would grab each query. Recently I've learnt I can use the expand feature (I think this is what it's called) so I don't need to dig down so many levels in my for loop.
The goal was to get just the metadata, which contains the booking reference numbers that can then be matched against a response from my property management system's API. My code is a bit embarrassing; I've kinda just learnt it over the last little while in my downtime from work.
import csv
import datetime
import os
import pymongo
import stripe

"""
We need to find a Valid reservation_ref or reservation_id in the booking.com Metadata. Then we need to match this to a property ID from our list of properties in the book file.
"""

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
stripe_payouts = mydb["stripe_payouts"]

stripe.api_key = "sk_live_thisismyprivatekey"
r = stripe.Payout.list(limit=4)
payouts = []

for data in r['data']:
    if data['status'] == 'paid':
        p_id = data['id']
        amount = data['amount']
        meta = []
        txn = stripe.BalanceTransaction.list(payout=p_id)
        amount_str = str(amount)
        amount_dollar = str(amount / 100)
        txn_len = len(txn['data'])
        for x in range(txn_len):
            if x != 0:
                charge = (txn['data'][x]['source'])
                if charge.startswith("ch_"):
                    meta_req = stripe.Charge.retrieve(charge)
                    meta = list(meta_req['metadata'])
                elif charge.startswith("re_"):
                    meta_req = stripe.Refund.retrieve(charge)
                    meta = list(meta_req['metadata'])
        if stripe_payouts.find({"_id": p_id}).count() == 0:
            payouts.append(
                {
                    "_id": str(p_id),
                    "payout": str(p_id),
                    "transactions": txn['data'],
                    "metadata": {
                        charge: [meta]
                    }
                }
            )

# TODO: Add error exception to check for po id already in the database.
if len(payouts) != 0:
    x = stripe_payouts.insert_many(payouts)
    print("Inserted into Database ", len(x.inserted_ids), x.inserted_ids)
else:
    print("No entries made")
"_id": str(p_id),
"payout": str(p_id),
"transactions": txn['data'],
"metadata": {
charge: [meta]
This last section doesn't work properly, this is kinda where I stopped and starting calling all the data and storing it in mongodb locally.
I appreciate it if you've read this wall of text this far.
Thanks
EDIT:
I'm unsure what the best practice is for adding additional information, but I've messed with the code below per the answer given. I'm now getting a "Key error" when trying to insert the entries into the database. I feel like it's duplicating keys somehow.
payouts = []

def add_metadata(payout_id, transaction_type):
    transactions = stripe.BalanceTransaction.list(
        payout=payout_id, type=transaction_type, expand=['data.source'])
    for transaction in transactions.auto_paging_iter():
        meta = [transaction.source.metadata]
        if stripe_payouts.Collection.count_documents({"_id": payout_id}) == 0:
            payouts.append(
                {
                    transaction.id: transaction
                }
            )

for data in r['data']:
    p_id = data['id']
    add_metadata(p_id, 'charge')
    add_metadata(p_id, 'refund')

# TODO: Add error exception to check for po id already in the database.
if len(payouts) != 0:
    x = stripe_payouts.insert_many(payouts)
    # print(payouts)
    print("Inserted into Database ", len(x.inserted_ids), x.inserted_ids)
else:
    print("No entries made")
To answer your high level question. If you're frequently accessing the same data and that data isn't changing much then it can make sense to try to keep your local copy of the data in sync and make your frequent queries against your local data.
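As a minimal illustration of that advice (a sketch, not part of this answer: the collection name and the stored fields are assumptions), keeping the local copy in sync usually means upserting by the Stripe object id, so that repeated runs refresh existing documents instead of duplicating them:

import pymongo
import stripe

stripe.api_key = "sk_live_..."  # placeholder key
db = pymongo.MongoClient("mongodb://localhost:27017/")["mydatabase"]
local_payouts = db["stripe_payouts"]  # assumed collection name

# Upsert each paid payout keyed by its Stripe id; re-running the sync job
# updates existing documents rather than inserting duplicates.
for payout in stripe.Payout.list(status='paid', limit=100).auto_paging_iter():
    local_payouts.replace_one(
        {"_id": payout.id},
        {"_id": payout.id, "amount": payout.amount, "arrival_date": payout.arrival_date},
        upsert=True,
    )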
No need to be embarrassed by your code :) we've all been new at something at some point.
Looking at your code I noticed a few things:
Rather than fetching all payouts and then using an if statement to skip everything except the paid ones, you can pass another filter to query only the paid payouts:
r = stripe.Payout.list(limit=4, status='paid')
You mentioned the expand [B] feature of the API, but didn't use it, so I wanted to share how you can do that here with an example. In this case, you're making 1 API call to get the list of payouts, then 1 API call per payout to get the transactions, then 1 API call per charge or refund to get its metadata. This results in roughly 1 + (n payouts) + (n × m charges or refunds) API calls, which is a pretty big number. To cut this down, let's pass expand=['data.source'] when fetching transactions, which will include all of the metadata about the charge or refund along with the transaction.
transactions = stripe.BalanceTransaction.list(payout=p_id, expand=['data.source'])
Fetching the BalanceTransaction list like this will only work as long as your results fit on one "page" of results. The API returns paginated [A] results, so if you have more than 10 transactions per payout, this will miss some. Instead, you can use an auto-pagination feature of the stripe-python library to iterate over all results from the BalanceTransaction list.
for transaction in transactions.auto_paging_iter():
I'm not quite sure why we're skipping over index 0 with if x != 0: so that may need to be addressed elsewhere :D
I didn't see how or where amount_str or amount_dollar was actually used.
Rather than determining the type of the object by checking the ID prefix like ch_ or re_, you'll want to use the type attribute. Again in this case, it's better to filter by type so that you only get exactly the data you need from the API:
transactions = stripe.BalanceTransaction.list(payout=p_id, type='charge', expand=['data.source'])
I'm unable to test because I lack the same database that you have, but wanted to share a refactoring of your code that you may consider.
r = stripe.Payout.list(limit=4, status='paid')
payouts = []

for data in r['data']:
    p_id = data['id']
    amount = data['amount']
    meta = []
    amount_str = str(amount)
    amount_dollar = str(amount / 100)

    transactions = stripe.BalanceTransaction.list(payout=p_id, type='charge', expand=['data.source'])
    for transaction in transactions.auto_paging_iter():
        meta = list(transaction.source.metadata)
        if stripe_payouts.find({"_id": p_id}).count() == 0:
            payouts.append(
                {
                    "_id": str(p_id),
                    "payout": str(p_id),
                    "transactions": transactions,
                    "metadata": {
                        transaction.source.id: [meta]  # key metadata by the source (charge) id
                    }
                }
            )

    transactions = stripe.BalanceTransaction.list(payout=p_id, type='refund', expand=['data.source'])
    for transaction in transactions.auto_paging_iter():
        meta = list(transaction.source.metadata)
        if stripe_payouts.find({"_id": p_id}).count() == 0:
            payouts.append(
                {
                    "_id": str(p_id),
                    "payout": str(p_id),
                    "transactions": transactions,
                    "metadata": {
                        transaction.source.id: [meta]  # key metadata by the source (refund) id
                    }
                }
            )

# TODO: Add error exception to check for po id already in the database.
if len(payouts) != 0:
    x = stripe_payouts.insert_many(payouts)
    print("Inserted into Database ", len(x.inserted_ids), x.inserted_ids)
else:
    print("No entries made")
Here's a further refactoring using functions defined to encapsulate just the bit adding to the database:
r = stripe.Payout.list(limit=4, status='paid')
payouts = []

def add_metadata(payout_id, transaction_type):
    transactions = stripe.BalanceTransaction.list(
        payout=payout_id, type=transaction_type, expand=['data.source'])
    for transaction in transactions.auto_paging_iter():
        meta = list(transaction.source.metadata)
        if stripe_payouts.find({"_id": payout_id}).count() == 0:
            payouts.append(
                {
                    "_id": str(payout_id),
                    "payout": str(payout_id),
                    "transactions": transactions,
                    "metadata": {
                        transaction.source.id: [meta]  # key metadata by the source (charge/refund) id
                    }
                }
            )

for data in r['data']:
    p_id = data['id']
    add_metadata(p_id, 'charge')
    add_metadata(p_id, 'refund')

# TODO: Add error exception to check for po id already in the database.
if len(payouts) != 0:
    x = stripe_payouts.insert_many(payouts)
    print("Inserted into Database ", len(x.inserted_ids), x.inserted_ids)
else:
    print("No entries made")
[A] https://stripe.com/docs/api/pagination
[B] https://stripe.com/docs/api/expanding_objects

Arango query, collection with Edge count

Really new to Arango and I'm experimenting with it, so please bear with me. I have a feed collection, and every feed can be liked by a user:
[user]---likes_feed--->[feed]
I'm trying to create a query that will return a feed by its author and add the number of likes to the result. This is what I have so far, and it seems to work, but it only returns feeds that have at least 1 like (an edge exists between the feed and a user).
Below is my query:
FOR f IN feed
    SORT f.creationDate
    FILTER f.author == #user_key
    LIMIT #start_index, #end_index
    FOR x IN INBOUND CONCAT('feed', '/', f._key) likes_feed
        OPTIONS {bfs: true, uniqueVertices: 'global'}
        COLLECT feed = f WITH COUNT INTO counter
        RETURN {
            'feed': feed,
            likes: counter
        }
This is an example of the result:

[
    {
        "feed": {
            "_key": "string",
            "_id": "users_feed/1680835",
            "_rev": "_W8zRPqe--_",
            "author": "author_a",
            "creationDate": "98879845467787979",
            "title": "some title",
            "caption": "some caption"
        },
        "likes": 1
    }
]
If a feed has no likes (no inbound edge to that feed), how do I return the likes count as 0?
Something like this:
[
    {
        "feed": {
            "_key": "string",
            "_id": "users_feed/1680835",
            "_rev": "_W8zRPqe--_",
            "author": "author_a",
            "creationDate": "98879845467787979",
            "title": "some title",
            "caption": "some caption"
        },
        "likes": 0
    }
]
So finally I found the solution: I had to create a graph and traverse it. The final result is below:
FOR f IN users_feed
    SORT f.creationDate
    FILTER f.author == #author_id
    LIMIT #start_index, #end_index
    COLLECT feed = f
    LET inboundEdges = LENGTH(FOR v IN 1..1 INBOUND feed GRAPH 'likes_graph' RETURN 1)
    RETURN {
        feed: feed,
        likes: inboundEdges
    }
