Arango query, collection with edge count

Really new to Arango and I'm experimenting with it, so please bear with me. I have a feed collection, and every feed can be liked by a user:
[user]---likes_feed--->[feed]
I'm trying to create a query that returns a feed by its author and adds the number of likes to the result. What I have so far seems to work, but it only returns feeds that have at least one like (an edge exists between the feed and a user).
Below is my query:
FOR f IN feed
    SORT f.creationDate
    FILTER f.author == #user_key
    LIMIT #start_index, #end_index
    FOR x IN INBOUND CONCAT('feed', '/', f._key) likes_feed
        OPTIONS { bfs: true, uniqueVertices: 'global' }
        COLLECT feed = f WITH COUNT INTO counter
        RETURN {
            'feed': feed,
            likes: counter
        }
This is an example of the result:
[
    {
        "feed": {
            "_key": "string",
            "_id": "users_feed/1680835",
            "_rev": "_W8zRPqe--_",
            "author": "author_a",
            "creationDate": "98879845467787979",
            "title": "some title",
            "caption": "some caption"
        },
        "likes": 1
    }
]
If a feed has no likes (no inbound edge to that feed), how do I return the likes count as 0? Something like this:
[
    {
        "feed": {
            "_key": "string",
            "_id": "users_feed/1680835",
            "_rev": "_W8zRPqe--_",
            "author": "author_a",
            "creationDate": "98879845467787979",
            "title": "some title",
            "caption": "some caption"
        },
        "likes": 0
    }
]

So finally I found the solution: I had to create a graph and traverse it. The final result is below.
FOR f IN users_feed
    SORT f.creationDate
    FILTER f.author == #author_id
    LIMIT #start_index, #end_index
    COLLECT feed = f
    LET inboundEdges = LENGTH(
        FOR v IN 1..1 INBOUND feed GRAPH 'likes_graph' RETURN 1
    )
    RETURN {
        feed: feed,
        likes: inboundEdges
    }
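For anyone running this from Python, here is a minimal sketch using the python-arango driver, with the #-prefixed placeholders above swapped for proper AQL bind parameters (@author_id, @offset, @count are renamed by me); the connection details, database name, credentials, and page size are assumptions, not part of the original setup. Note that AQL's LIMIT takes an offset and a count, not a start and end index.
from arango import ArangoClient

# Connection details and database name are assumptions; adjust to your setup.
client = ArangoClient(hosts='http://localhost:8529')
db = client.db('mydb', username='root', password='secret')

AQL = """
FOR f IN users_feed
    SORT f.creationDate
    FILTER f.author == @author_id
    LIMIT @offset, @count
    COLLECT feed = f
    LET inboundEdges = LENGTH(
        FOR v IN 1..1 INBOUND feed GRAPH 'likes_graph' RETURN 1
    )
    RETURN { feed: feed, likes: inboundEdges }
"""

# Bind variables keep the query plan cacheable and avoid string interpolation.
cursor = db.aql.execute(AQL, bind_vars={'author_id': 'author_a', 'offset': 0, 'count': 20})
for row in cursor:
    print(row['feed']['_key'], row['likes'])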

Related

SqlAlchemy dynamic loading of related entities from query

The use case is pretty simple: I have 3 cascading entities:
class Customer():
    users = relationship(
        'User',
        backref=backref('customer', lazy="subquery"),
        cascade=DbConstants.RELATIONSHIP_CASCADE_ALL_DELETE)

class User():
    reports = relationship(
        'Report',
        backref=backref('user', lazy="subquery"),
        lazy="subquery",
        cascade=DbConstants.RELATIONSHIP_CASCADE_ALL_DELETE)

class Report():
    date_time_start = Column(DateTime())
    date_time_end = Column(DateTime())
I want to get all these entities in one query, but I want to filter the reports by their date.
customers = session.query(Customer).join(
    User, Customer.users, isouter=True
).join(
    Report,
    # this is where the reports should be filtered
    and_(Report.user_id == User.id,
         Report.date_time_start > date_start,
         Report.date_time_start < date_end),
).all()
From this I get the expected entity tree:
[
    customers:
        users: [ reports: [] ]
]
Except I get ALL the reports of the user in the array, no matter the start date. This means that if I check the result like customers[0].users[0].reports, all the reports belonging to this user are output. Is there a way to populate the reports attribute only with the rows from the query?
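One possible approach (a sketch, not tested against the models above) is SQLAlchemy's contains_eager() option, which tells the ORM to populate the relationship attributes from the rows of this filtered join instead of running its own loads; contains_eager() should override the relationships' lazy="subquery" strategy for this one query. Adding isouter=True on the second join is my assumption, so that users without matching reports still appear.
from sqlalchemy import and_
from sqlalchemy.orm import contains_eager

customers = session.query(Customer).join(
    User, Customer.users, isouter=True
).join(
    Report,
    and_(Report.user_id == User.id,
         Report.date_time_start > date_start,
         Report.date_time_start < date_end),
    isouter=True
).options(
    # Fill Customer.users and User.reports from the joined, filtered rows
    # rather than issuing separate unfiltered relationship loads.
    contains_eager(Customer.users).contains_eager(User.reports)
).all()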

How can we run a loop inside a feature file to get data from one API and pass it to another API [duplicate]

I am using Karate version 0.8.0.1 and I want to perform the following steps to test some responses.
1. I make a GET request to web service 1.
2. I find the value for currencies in the response of web service 1 using the JsonPath $.currencies.
3. Step 2 gives me the following result: ["USD","HKD","SGD","INR","GBP"]
4. Now I make a GET request to web service 2.
5. From the response of web service 2 I want to get the value of the price field with JsonPath something like below (passing in the values from step 3 above):
$.holding[?(@.currency=='USD')].price
$.holding[?(@.currency=='HKD')].price
$.holding[?(@.currency=='SGD')].price
$.holding[?(@.currency=='INR')].price
$.holding[?(@.currency=='GBP')].price
There are many currencies, but I want to verify the price for only the currencies returned by web service 1 (which will always be random), passing them on to the output of web service 2 to get the price.
Once I get the price I will match each price value with the value returned from the DB.
I am not sure if there is a simple way to pass the values returned by service 1 into the JsonPath of service 2 one by one and get the required results. Any suggestions will be helpful, as this will be the case for most of the web services I will be automating.
There are multiple ways to do this in Karate, and the below should give you a few pointers. Note the magic variable _$ when you use match each. And since you can reference any other JSON in scope, you have some very powerful options.
* def expected = { HKD: 1, INR: 2, USD: 3}
* def response1 = ['USD', 'HKD', 'INR']
* def response2 = [{ currency: 'INR', price: 2 }, { currency: 'USD', price: 3 }, { currency: 'HKD', price: 1 }]
* match response2[*].currency contains only response1
* match each response2 contains { price: '#(expected[_$.currency])' }
You probably have already seen how you can call a second feature file in a loop, which may be needed for your particular use case. One more piece of the puzzle may be this: it is very easy to transform any JSON array into the form Karate expects for calling a feature file in a loop:
* def response = ['USD', 'HKD', 'INR']
* def data = karate.map(response, function(x){ return { code: x } })
* match data == [{code: 'USD'}, {code: 'HKD'}, {code: 'INR'}]
EDIT - there is a short-cut to convert an array of primitives to an array of objects now: https://stackoverflow.com/a/58985917/143475
Also see this answer: https://stackoverflow.com/a/52845718/143475

When working with the Stripe API, is it better to sort each request or store locally and perform queries?

This is my first post; I've been lurking for a while.
Some context to my question:
I'm working with the Stripe API to pull transaction data and match it with booking numbers from another API source (property reservations --> funds received, for reconciliation).
I started by just making calls to the API and sorting the data in place using Python 3, but it started to get very complicated, so I thought I should persist the data in a MongoDB stored on localhost. I began to do this, but storing the sorted data was still just as complicated and the request times were getting quite long, so I thought: maybe I should pull all the Stripe data, store it locally, and then query whatever I needed.
So here I am, with a bunch of code I've written for both and still not a lot of progress. I'm a bit lost about the next move. I feel like I should probably pick a path and stick with it. I'm a little unsure what the "best practice" is when working with APIs; usually I would turn to YouTube, but I haven't been able to find a video which covers this specific scenario. The amount of data being pulled from the API would be around 100 KB per request.
Here is the original code, which grabs each query. Recently I've learnt I can use the expand feature (I think this is what it's called) so I don't need to dig down so many levels in my for loop.
The goal is to get just the metadata, which contains the booking reference numbers that can then be matched against a response from my property management system's API. My code is a bit embarrassing; I've kinda just learnt it over the last little while in my downtime from work.
import csv
import datetime
import os

import pymongo
import stripe

"""
We need to find a valid reservation_ref or reservation_id in the booking.com metadata.
Then we need to match this to a property ID from our list of properties in the book file.
"""

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
stripe_payouts = mydb["stripe_payouts"]

stripe.api_key = "sk_live_thisismyprivatekey"

r = stripe.Payout.list(limit=4)
payouts = []

for data in r['data']:
    if data['status'] == 'paid':
        p_id = data['id']
        amount = data['amount']
        meta = []
        txn = stripe.BalanceTransaction.list(payout=p_id)
        amount_str = str(amount)
        amount_dollar = str(amount / 100)
        txn_len = len(txn['data'])

        for x in range(txn_len):
            if x != 0:
                charge = txn['data'][x]['source']
                if charge.startswith("ch_"):
                    meta_req = stripe.Charge.retrieve(charge)
                    meta = list(meta_req['metadata'])
                elif charge.startswith("re_"):
                    meta_req = stripe.Refund.retrieve(charge)
                    meta = list(meta_req['metadata'])

        if stripe_payouts.find({"_id": p_id}).count() == 0:
            payouts.append(
                {
                    "_id": str(p_id),
                    "payout": str(p_id),
                    "transactions": txn['data'],
                    "metadata": {
                        charge: [meta]
                    }
                }
            )

# TODO: Add error exception to check for po id already in the database.
if len(payouts) != 0:
    x = stripe_payouts.insert_many(payouts)
    print("Inserted into Database ", len(x.inserted_ids), x.inserted_ids)
else:
    print("No entries made")
"_id": str(p_id),
"payout": str(p_id),
"transactions": txn['data'],
"metadata": {
charge: [meta]
This last section doesn't work properly, this is kinda where I stopped and starting calling all the data and storing it in mongodb locally.
I appreciate it if you've read this far through this wall of text.
Thanks
EDIT:
I'm unsure what the best practice is for adding additional information, but I've messed with the code below per the answer given. I'm now getting a "KeyError" when trying to insert the entries into the database. I feel like it's duplicating keys somehow.
payouts = []

def add_metadata(payout_id, transaction_type):
    transactions = stripe.BalanceTransaction.list(payout=payout_id, type=transaction_type, expand=['data.source'])
    for transaction in transactions.auto_paging_iter():
        meta = [transaction.source.metadata]
        if stripe_payouts.Collection.count_documents({"_id": payout_id}) == 0:
            payouts.append(
                {
                    transaction.id: transaction
                }
            )

for data in r['data']:
    p_id = data['id']
    add_metadata(p_id, 'charge')
    add_metadata(p_id, 'refund')

# TODO: Add error exception to check for po id already in the database.
if len(payouts) != 0:
    x = stripe_payouts.insert_many(payouts)
    # print(payouts)
    print("Inserted into Database ", len(x.inserted_ids), x.inserted_ids)
else:
    print("No entries made")
To answer your high-level question: if you're frequently accessing the same data and that data isn't changing much, then it can make sense to keep a local copy of the data in sync and run your frequent queries against that local copy.
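If you go the local-copy route, one way to keep re-runs idempotent (a sketch; the payouts list and stripe_payouts collection follow the question's code) is to upsert by payout ID instead of inserting blindly:
# Upsert each payout document keyed by its Stripe payout ID, so re-running
# the sync updates existing records instead of hitting duplicate-key errors.
for doc in payouts:
    stripe_payouts.update_one({"_id": doc["_id"]}, {"$set": doc}, upsert=True)
This would replace both the find(...).count() == 0 checks and the insert_many call.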
No need to be embarrassed by your code :) We've all been new at something at some point.
Looking at your code, I noticed a few things:
Rather than fetching all payouts and then using an if statement to skip everything except paid ones, you can pass another filter to query only the paid payouts:
r = stripe.Payout.list(limit=4, status='paid')
You mentioned the expand [B] feature of the API, but didn't use it, so I wanted to share how you can do that here with an example. In this case, you're making 1 API call to get the list of payouts, then 1 API call per payout to get the transactions, then 1 API call per charge or refund to get its metadata. That is 1 + n + (n × m) calls for n payouts with m charges or refunds each, which is a pretty big number. To cut this down, let's pass expand=['data.source'] when fetching transactions, which will include all of the metadata about the charge or refund along with the transaction.
transactions = stripe.BalanceTransaction.list(payout=p_id, expand=['data.source'])
Fetching the BalanceTransaction list like this will only work as long as all your results fit on one "page". The API returns paginated [A] results, so if a payout has more than 10 transactions, this will miss some. Instead, you can use the auto-pagination feature of the stripe-python library to iterate over all results from the BalanceTransaction list:
for transaction in transactions.auto_paging_iter():
I'm not quite sure why we're skipping over index 0 with if x != 0: so that may need to be addressed elsewhere :D
I didn't see how or where amount_str or amount_dollar were actually used.
Rather than determining the type of the object by checking the ID prefix like ch_ or re_, you'll want to use the type attribute. Again in this case, it's better to filter by type so that you only get exactly the data you need from the API:
transactions = stripe.BalanceTransaction.list(payout=p_id, type='charge', expand=['data.source'])
I'm unable to test because I lack the database that you have, but I wanted to share a refactoring of your code that you may consider.
r = stripe.Payout.list(limit=4, status='paid')
payouts = []

for data in r['data']:
    p_id = data['id']
    amount = data['amount']
    meta = []
    amount_str = str(amount)
    amount_dollar = str(amount / 100)

    transactions = stripe.BalanceTransaction.list(payout=p_id, type='charge', expand=['data.source'])
    for transaction in transactions.auto_paging_iter():
        meta = list(transaction.source.metadata)
    if stripe_payouts.find({"_id": p_id}).count() == 0:
        payouts.append(
            {
                "_id": str(p_id),
                "payout": str(p_id),
                "transactions": transactions,
                "metadata": {
                    "charge": [meta]
                }
            }
        )

    transactions = stripe.BalanceTransaction.list(payout=p_id, type='refund', expand=['data.source'])
    for transaction in transactions.auto_paging_iter():
        meta = list(transaction.source.metadata)
    if stripe_payouts.find({"_id": p_id}).count() == 0:
        payouts.append(
            {
                "_id": str(p_id),
                "payout": str(p_id),
                "transactions": transactions,
                "metadata": {
                    "refund": [meta]
                }
            }
        )

# TODO: Add error exception to check for po id already in the database.
if len(payouts) != 0:
    x = stripe_payouts.insert_many(payouts)
    print("Inserted into Database ", len(x.inserted_ids), x.inserted_ids)
else:
    print("No entries made")
Here's a further refactoring, using a function to encapsulate just the bit that adds to the database:
r = stripe.Payout.list(limit=4, status='paid')
payouts = []

def add_metadata(payout_id, transaction_type):
    transactions = stripe.BalanceTransaction.list(payout=payout_id, type=transaction_type, expand=['data.source'])
    for transaction in transactions.auto_paging_iter():
        meta = list(transaction.source.metadata)
    if stripe_payouts.find({"_id": payout_id}).count() == 0:
        payouts.append(
            {
                "_id": str(payout_id),
                "payout": str(payout_id),
                "transactions": transactions,
                "metadata": {
                    transaction_type: [meta]
                }
            }
        )

for data in r['data']:
    p_id = data['id']
    add_metadata(p_id, 'charge')
    add_metadata(p_id, 'refund')

# TODO: Add error exception to check for po id already in the database.
if len(payouts) != 0:
    x = stripe_payouts.insert_many(payouts)
    print("Inserted into Database ", len(x.inserted_ids), x.inserted_ids)
else:
    print("No entries made")
[A] https://stripe.com/docs/api/pagination
[B] https://stripe.com/docs/api/expanding_objects

PyMongo: how to query a series and find the closest match

This is a simplified example of how the data of a single athlete is stored in MongoDB:
{
    "_id": ObjectId('5bd6eab25f74b70e5abb3326'),
    "Result": 12,
    "Race": [0.170, 4.234, 9.170],
    "Painscore": 68
}
Now, when this athlete has performed a race, I want to search for the stored race that is MOST similar to the current one, and then compare both pain scores.
In order to get the best 'match' I tried this:
query = [0.165, 4.031, 9.234]

closestBelow = db[athlete].find({'Race': {"$lte": query}}, {"_id": 1, "Race": 1}).sort("Race", -1).limit(2)
for i in closestBelow:
    print(i)

closestAbove = db[athlete].find({'Race': {"$gte": query}}, {"_id": 1, "Race": 1}).sort("Race", 1).limit(2)
for i in closestAbove:
    print(i)
This does not seem to work.
Question 1: How can I query Mongo to find the race that matches best/closest, taking into account that a race is almost never exactly the same?
Question 2: How can I see a percentage of match per document, so that an athlete knows how 'seriously' he must interpret the pain score?
Thank you.
Thanks to this website I found a solution: http://dataaspirant.com/2015/04/11/five-most-popular-similarity-measures-implementation-in-python/
Step 1: find your query;
Step 2: make a first selection based on the query and append the results into a list (for example, filtering on the average);
Step 3: use a for loop to compare every item in the list with your query, using Euclidean distance;
Step 4: once the matching is processed, store the best match in a variable.
from pymongo import MongoClient
import numpy

client = MongoClient('mongodb://localhost:27017/')
Database = 'Firstclass'

def euclidean_distance(a, b):
    # Straight-line distance between two equal-length race vectors.
    return numpy.sqrt(numpy.sum((numpy.array(a) - numpy.array(b)) ** 2))

def newSearch(Athlete):
    # STEP 1
    db = client[Database]
    lastDoc = [i for i in db[Athlete].find({}, {'_id': 1, 'Race': 1, 'Average': 1}).sort('_id', -1).limit(1)]
    query = {'$and': [{'Average': {'$gte': lastDoc[0].get('Average') * 0.9}},
                      {'Average': {'$lte': lastDoc[0].get('Average') * 1.1}}]}
    funnel = [x for x in db[Athlete].find(query, {'_id': 1, 'Race': 1}).sort('_id', -1).limit(15)]
    # STEP 2
    compareListID = []
    compareListRace = []
    for x in funnel:
        if lastDoc[0].get('_id') != x.get('_id'):
            compareListID.append(x.get('_id'))
            compareListRace.append(x.get('Race'))
    # STEP 3
    ESlist = []
    for y in compareListRace:
        ED = euclidean_distance(lastDoc[0].get('Race'), y)
        ESlist.append(ED)
    # STEP 4: the smallest Euclidean distance is the closest match.
    matchObjID = compareListID[numpy.argmin(ESlist)]
    matchRace = compareListRace[numpy.argmin(ESlist)]

newSearch('Jim')
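Question 2 is not covered above. One simple option (a sketch using an arbitrary normalisation I chose, not a calibrated statistical measure) is to scale the Euclidean distances against the worst match in the funnel:
import numpy

def match_percentages(distances):
    # Map Euclidean distances to 0-100 scores: the closest race scores
    # highest, the farthest race in the funnel scores 0. The score is
    # relative to the current funnel only, not an absolute similarity.
    d = numpy.array(distances, dtype=float)
    if d.max() == 0:
        return numpy.full(len(d), 100.0)  # every race identical to the query
    return 100.0 * (1.0 - d / d.max())
For example, match_percentages(ESlist) would give one score per candidate race, which could be shown next to the pain score.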

ArangoDB: GRAPH_EDGES command very slow (more than 20 sec) on small collections

I'm evaluating ArangoDB and I see that the GRAPH_EDGES and GRAPH_VERTICES commands are very slow on small collections (300 vertices).
I have 3 collections:
TactiveService (300 vertices) --> TusesCommand (300 edges) --> Tcommand (1 vertex)
Using GRAPH_EDGES, this query takes 24 sec:
FOR service IN TactiveService
    LET usesCommand = (
        RETURN FIRST(GRAPH_EDGES("topvision", {}, { edgeExamples: [{_from: service._id}], edgeCollectionRestriction: "TusesCommand", includeData: true, maxDepth: 1 }))
    )
    LET command = DOCUMENT(usesCommand[0]._to)
    RETURN { service: service, usesCommand: usesCommand[0], command: command }
For the same result, this query takes 0.020 sec:
FOR service IN TactiveService
    LET usesCommand = (
        FOR usesCommand IN TusesCommand
            FILTER usesCommand._from == service._id
            RETURN usesCommand
    )
    LET command = DOCUMENT(usesCommand[0]._to)
    RETURN { service: service, usesCommand: usesCommand[0], command: command }
GRAPH_EDGES is unusable for me in a FOR statement (same problem with GRAPH_VERTICES).
Ideas on the reason for this slowness are welcome.
We are well aware that GRAPH_EDGES is not well suited to being used like this in a query.
We therefore introduced AQL pattern-matching traversals, which should perform significantly better.
You could formulate your query like this, replacing GRAPH_EDGES with a traversal:
FOR service IN TactiveService
    LET usesCommand = (
        FOR v, e IN 1..1 OUTBOUND service "TusesCommand"
            FILTER e._from == service._id
            RETURN e
    )
    LET command = DOCUMENT(usesCommand[0]._to)
    RETURN { service: service, usesCommand: usesCommand[0], command: command }
Please note that the specified filter is implicitly true, because we queried for OUTBOUND edges starting from service, so e._from will always be equal to service._id. Instead of specifying GRAPH "topvision" and then restricting the edge collections taken into account in the traversal, we use an anonymous traversal that only considers the edge collection TusesCommand, as you did.
So, simplifying it a little more, the query could look like:
FOR service IN TactiveService
    LET usesCommand = (
        FOR v, e IN 1..1 OUTBOUND service "TusesCommand" RETURN { v: v, e: e }
    )
    RETURN { service: service, usesCommand: usesCommand }
This may return more vertices than your query, but it will only fetch them once; so the result set may be bigger, but the number of index lookups is reduced by removing the DOCUMENT calls from the query.
As you already noticed and formulated in your second query, if your actual problem works better with a classic join, ArangoDB offers you the freedom to work with your data like that.
Edit: Michael is right, for sure; the direction has to be OUTBOUND.
If for some reason you do not want to upgrade to 2.8 as @dothebart suggests, you can also fix the old query.
Original:
FOR service IN TactiveService
    LET usesCommand = (
        RETURN FIRST(GRAPH_EDGES("topvision", {}, { edgeExamples: [{_from: service._id}], edgeCollectionRestriction: "TusesCommand", includeData: true, maxDepth: 1 }))
    )
    LET command = DOCUMENT(usesCommand[0]._to)
    RETURN { service: service, usesCommand: usesCommand[0], command: command }
The slow part of the query is finding the starting point. The API of GRAPH_EDGES uses the second parameter as a start example, and {} matches all start points. So it first computes all outbound edges for all vertices (this is expensive, as it means collecting all edges for every vertex in the start collection), and then post-filters the found edges with the example you gave (which removes almost all of the edges again).
If you replace the start example with the _id of the start vertex, it will collect only the edges of this specific vertex.
Since you are also interested in the edges of only one direction (OUTBOUND), you can give that in the options as well, so only edges with _from == service._id are fetched by GRAPH_EDGES in the first place:
FOR service IN TactiveService
    LET usesCommand = (
        RETURN FIRST(GRAPH_EDGES("topvision", service._id, { edgeCollectionRestriction: "TusesCommand", includeData: true, maxDepth: 1, direction: 'outbound' }))
    )
    LET command = DOCUMENT(usesCommand[0]._to)
    RETURN { service: service, usesCommand: usesCommand[0], command: command }
However, I still expect that @dothebart's version is faster in 2.8, and I would also recommend switching to the newest version.
