ArangoDB: can I use a subquery to determine the starting node for a graph traversal?

I am using ArangoDB to build a permissions system.
I would like to use a subquery to determine the starting node for a graph traversal. This is what I attempted:
FOR vert IN 1..5
OUTBOUND (
FOR v, e, p IN 1..2
OUTBOUND 'users/jill'
GranterGrant, GrantGrantee
FILTER e.edgeType == 'GrantGrantee'
AND v.type == 'User'
AND v._id == 'users/jim'
LIMIT 1
RETURN p.vertices[1]
)
RolePrivilege, GrantRole, GrantPrivilege
FILTER vert.type == 'Privilege'
AND vert._id == 'privileges/JournalRead'
LIMIT 1
RETURN vert._id
The inner query is meant to find a permissions grant node from jill to jim, and the outer query determines whether there is a path from that grant node to the privilege JournalRead.
The subquery works when run by itself and returns a vertex that I would like to use as the starting node for the outer query. The outer query also works if I hard-code the starting node.
However, when I run the entire query above, Arango responds with:
Warnings:
[10], 'Invalid input for traversal: Only id strings or objects with _id are allowed'
Result:
[]
Note: I also tried RETURN p.vertices[1]._id within the subquery, with the same outcome.
So, is it possible to use a subquery to determine the starting node of a graph traversal?

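You can get the list of starting nodes first and assign it to a variable, instead of using the subquery inline as the start vertex. A subquery always returns an array (even with LIMIT 1), while the traversal expects a single id string or an object with an _id, which is what the warning is complaining about.
So in your case, it would be something like: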
let initialNodes = (
FOR v, e, p IN 1..2
OUTBOUND 'users/jill'
GranterGrant, GrantGrantee
FILTER e.edgeType == 'GrantGrantee'
AND v.type == 'User'
AND v._id == 'users/jim'
LIMIT 1
RETURN p.vertices[1]
)
FOR initialNode IN initialNodes
FOR vert IN 1..5
OUTBOUND initialNode
RolePrivilege, GrantRole, GrantPrivilege
FILTER vert.type == 'Privilege'
AND vert._id == 'privileges/JournalRead'
LIMIT 1
RETURN vert._id
Note that you are not limited to a single initial node: the subquery can return several if needed, and the outer FOR runs the traversal once for each of them.
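To make the two-phase shape concrete outside the database, here is a plain-Python sketch. The vertex ids come from the question, but the edges, the adjacency-dict representation and the `outbound` helper are hypothetical stand-ins for the edge collections, and the `e.edgeType`/`v.type` filters are left out. Phase 1 collects the start vertices into a list; phase 2 runs the second traversal once per element, which is what `FOR initialNode IN initialNodes` does.

```python
from collections import deque

# Hypothetical adjacency lists standing in for the edge collections.
GRANT_EDGES = {
    "users/jill": ["grants/g1"],
    "grants/g1": ["users/jim"],
}
PRIVILEGE_EDGES = {
    "grants/g1": ["roles/editor"],
    "roles/editor": ["privileges/JournalRead"],
}

def outbound(start, edges, min_depth, max_depth):
    """Yield (vertex, path) for vertices min_depth..max_depth hops from start (BFS)."""
    queue = deque([(start, [start])])
    while queue:
        vertex, path = queue.popleft()
        depth = len(path) - 1
        if depth >= min_depth:
            yield vertex, path
        if depth < max_depth:  # the depth cap also bounds any cycles
            for nxt in edges.get(vertex, []):
                queue.append((nxt, path + [nxt]))

# Phase 1: compute the starting vertices first (the LET initialNodes = (...) step).
initial_nodes = [path[1]
                 for vertex, path in outbound("users/jill", GRANT_EDGES, 1, 2)
                 if vertex == "users/jim"][:1]

# Phase 2: run the second traversal once per starting vertex.
found = [vertex
         for start in initial_nodes
         for vertex, _ in outbound(start, PRIVILEGE_EDGES, 1, 5)
         if vertex == "privileges/JournalRead"][:1]
```

With this made-up data, `initial_nodes` holds the grant vertex and `found` holds the privilege id, mirroring what the AQL query returns when a path exists.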

Related

How to execute a Python function at a given datetime

I have a list of dictionaries. Each item contains a datetime field in string format:
items = [{"Name":"Fooo","Time":"2 Jun, 7:20PM","Location":"LA"},
{"Name":"Yeam","Time":"27 Jun, 9:20PM","Location":"CA"},
{"Name":"Bar","Time":"12 Aug, 7:50PM","Location":"NY"},
{"Name":"Ahoy","Time":"20 Jul, 3:20AM","Location":"TX"}]
def myawesomefunc(item):
# Do something awesome
# and return the result
pass
Now I want to invoke myawesomefunc for each item that satisfies:
datetime.now() >= datetime.strptime(item['Time'], '%d %b, %I:%M%p')
I can't sort items because the list changes continuously. Since it may contain 30k+ items, iterating through every item in items each time would be very time-consuming.
So how do I do this?
I suggest using a data structure that makes search and insertion more efficient, for example a binary search tree (BST).
Let us define some notation:
SubTree(N) : returns the set of descendant nodes of N, including N.
Parent(N) : returns the parent of N.
X.left, X.right : the left and right children of a node X.
With a BST, the search key is each item's timestamp. At regular intervals, search for a node X whose key is less than or equal to datetime.now(), and execute myawesomefunc for each node in the set S:
S = {X} ⋃ SubTree(X.left) ⋃ (SubTree(X.right) if X.right <= datetime.now() else {})
Then update the tree to exclude all processed nodes:
Parent(X).left = None if X.right <= datetime.now() else X.right
Inserting a new item is straightforward (ordinary BST insertion).
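Python's standard library has no BST, so this sketch swaps in a different structure: a binary min-heap from `heapq`. It supports the operation the answer relies on, namely "take every item whose time is <= now" without scanning the whole list, with O(log n) insertion and O(log n) per removed item. One pitfall in the question's condition: `strptime` with '%d %b, %I:%M%p' fills in the year 1900, so `datetime.now() >=` it is always true. The sketch therefore prepends a year, assuming all items fall in one known year.

```python
import heapq
from datetime import datetime

def parse_time(text, year):
    # strptime would default a missing year to 1900, so the year is prepended
    # explicitly (assumption: every item falls in the same, known year).
    return datetime.strptime(f"{year} {text}", "%Y %d %b, %I:%M%p")

def build_queue(items, year):
    # Min-heap of (due time, counter, item); the counter breaks ties so that
    # two items with the same time never fall through to comparing dicts.
    heap = []
    for counter, item in enumerate(items):
        heapq.heappush(heap, (parse_time(item["Time"], year), counter, item))
    return heap

def pop_due(heap, now):
    # Remove and return every item whose time is <= now, earliest first.
    due = []
    while heap and heap[0][0] <= now:
        due.append(heapq.heappop(heap)[2])
    return due

items = [{"Name": "Fooo", "Time": "2 Jun, 7:20PM", "Location": "LA"},
         {"Name": "Yeam", "Time": "27 Jun, 9:20PM", "Location": "CA"},
         {"Name": "Bar", "Time": "12 Aug, 7:50PM", "Location": "NY"},
         {"Name": "Ahoy", "Time": "20 Jul, 3:20AM", "Location": "TX"}]
heap = build_queue(items, 2024)
due = pop_due(heap, datetime(2024, 7, 1))
```

In a real loop you would call `pop_due(heap, datetime.now())` periodically. For items added later, keep a single `itertools.count()` as the tie-breaker so counters stay unique across insertions.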
Now, regarding the execution of myawesomefunc, there are two cases:
myawesomefunc is an I/O-bound operation: use ThreadPoolExecutor.
myawesomefunc is a CPU-bound operation: use ProcessPoolExecutor.
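A minimal sketch of the I/O-bound case with `concurrent.futures.ThreadPoolExecutor`. The body of `myawesomefunc` here is a hypothetical placeholder, since the question leaves it unspecified.

```python
from concurrent.futures import ThreadPoolExecutor

def myawesomefunc(item):
    # Hypothetical placeholder body; the real work is unspecified in the question.
    return item["Name"].upper()

def run_due(due_items, max_workers=4):
    # Run myawesomefunc over the due items on a thread pool.
    # pool.map returns results in the same order as the input items.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(myawesomefunc, due_items))
```

For the CPU-bound case, `ProcessPoolExecutor` can be swapped in with the same `map` call, as long as `myawesomefunc` is defined at module level (so it can be pickled) and, on platforms that spawn worker processes, the pool is created under an `if __name__ == "__main__":` guard.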

Peewee 3 - Order by and Recursive common table expression (cte)

Tools: Peewee 3, SQLite, Python 3
Official documentation for Peewee 3 recursive common table expression (cte):
http://docs.peewee-orm.com/en/latest/peewee/querying.html#common-table-expressions
I am storing a family tree in a simple self-referencing table called Person.
Structure (see below): id, name, parent, custom_order
Notes:
- the parent field is null if this person is an ancestor / root item; otherwise it holds the id of the parent record
- custom_order is a float (a score that determines who is the user's favourite person)
Objective:
I would like to retrieve the whole family tree and ORDER the results FIRST by parent and SECOND by custom_order.
Issue:
I managed to get the results list but the ORDER is wrong.
DB model
from peewee import Model, CharField, ForeignKeyField, FloatField
class Person(Model):
    name = CharField()
    parent = ForeignKeyField('self', backref='children', null=True)
    custom_order = FloatField()
Note: if parent field is null then it's a root item
Query code
# Define the base case of our recursive CTE. This will be people that have a null parent foreign-key.
Base = Person.alias()
base_case = (Base
.select(Base)
.where(Base.parent.is_null())
.cte('base', recursive=True))
# Define the recursive terms.
RTerm = Person.alias()
recursive = (RTerm
.select(RTerm)
.join(base_case, on=(RTerm.parent == base_case.c.id)))
# The recursive CTE is created by taking the base case and UNION ALL with the recursive term.
cte = base_case.union_all(recursive)
# We will now query from the CTE to get the people
query = cte.select_from(cte.c.id, cte.c.name, cte.c.parent_id, cte.c.custom_order).order_by(cte.c.parent_id, cte.c.custom_order)
print(query.sql())
Printed query syntax
('WITH RECURSIVE "base" AS
(
SELECT "t1"."id", "t1"."name", "t1"."parent_id", "t1"."custom_order" FROM "person" AS "t1" WHERE ("t1"."parent_id" IS ?)
UNION ALL
SELECT "t2"."id", "t2"."name", "t2"."parent_id", "t2"."custom_order" FROM "person" AS "t2" INNER JOIN "base" ON ("t2"."parent_id" = "base"."id")
)
SELECT "base"."id", "base"."name", "base"."parent_id" FROM "base"
ORDER BY "base"."parent_id", "base"."custom_order"',
[None])
Root of the problem
The code posted in the question works correctly. I verified it by printing the query results to the console:
import json
query = cte.select_from(cte.c.id, cte.c.name, cte.c.parent_id, cte.c.custom_order).order_by(cte.c.parent_id, cte.c.custom_order).dicts()
print(json.dumps(list(query), indent=4))
The problem came from the fact that I was passing the query results into a nested Python dictionary before printing them to the console, and a plain Python dict did not guarantee insertion order (that guarantee only exists from Python 3.7 onward). So, no wonder the printed results came out in a different order than the database returned them.
Solution
Use a Python Ordered Dictionary if you want to store the query results in a fixed order:
import collections
treeDictionary = collections.OrderedDict()
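To check the ordering end to end without Peewee, this sketch uses the standard-library `sqlite3` module to run the same recursive CTE as the printed SQL against a few made-up rows, and loads the ordered rows into an `OrderedDict`. The table contents are hypothetical; the point is that the ORDER BY order survives into the dictionary. SQLite sorts NULLs first in ascending order, so the root items come first.

```python
import sqlite3
from collections import OrderedDict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, "
             "parent_id INTEGER, custom_order REAL)")
# Hypothetical sample family tree: two roots, each with children.
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", [
    (1, "Root A", None, 2.0),
    (2, "Root B", None, 1.0),
    (3, "Child A1", 1, 0.5),
    (4, "Child A2", 1, 0.1),
    (5, "Child B1", 2, 3.0),
])

sql = """
WITH RECURSIVE base AS (
    SELECT id, name, parent_id, custom_order FROM person WHERE parent_id IS NULL
    UNION ALL
    SELECT p.id, p.name, p.parent_id, p.custom_order
    FROM person AS p JOIN base ON p.parent_id = base.id
)
SELECT id, name, parent_id, custom_order FROM base
ORDER BY parent_id, custom_order
"""

# Keys are inserted in the order the database returns them.
tree = OrderedDict()
for person_id, name, parent_id, custom_order in conn.execute(sql):
    tree[person_id] = {"name": name, "parent": parent_id, "custom_order": custom_order}
```

On Python 3.7+ a plain `dict` would keep the same order; `OrderedDict` still makes the intent explicit.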

Neo4j Attempting to perform second MATCH query on results from first MATCH

In Neo4j, I am using a community detection algorithm and returning only the nodes and relationships belonging to the community assigned with the id '10', as seen below.
MATCH p=(a:Function)-[:BASED_ON]->(b:Requirement)-[:RESULT_OF]->(c:Scenario)
WHERE a.community = 10 AND b.community = 10 AND c.community = 10
RETURN p
I now want to further filter this subset of the graph database and display nodes and relationships belonging to community '10' that have a PageRank, determined using the PageRank centrality algorithm, that is greater than a specified value, for example 1.
I have attempted to do so using the following:
MATCH p=(a:Function)-[:BASED_ON]->(b:Requirement)-[:RESULT_OF]->(c:Scenario)
WHERE a.community = 10 AND b.community = 10 AND c.community = 10 AND c.pagerank >1
RETURN p
However, this doesn't return the required result. Nodes of type 'Function' and 'Requirement' that themselves have a pagerank greater than 1 are excluded if they are related to a 'Scenario' node with a pagerank less than 1, because those paths do not satisfy the WHERE clause.
What query could I use to display ONLY nodes belonging to community '10' that have a pagerank greater than 1, independent of the pagerank of the nodes they are connected to? In other words, I want to return nodes with a pagerank greater than 1 even if they are connected to another node, such as a 'Scenario', with a pagerank less than 1, as in the previous code sample.
Any assistance would be greatly appreciated.
Without knowing much more, I'd suggest creating a Community node with an id, linking it to the Function, Requirement and Scenario nodes, and starting the query there. The traversal will be much quicker.
MATCH (:Community {Id: 10})-[]-(x)
WHERE (x:Function OR x:Requirement OR x:Scenario) AND x.pagerank > ...
Wouldn't you achieve what you want with an OR clause?
MATCH p=(a:Function)-[:BASED_ON]->(b:Requirement)-[:RESULT_OF]->(c:Scenario)
WHERE a.community = 10 AND b.community = 10 AND c.community = 10 AND (a.pagerank >1 OR b.pagerank >1 OR c.pagerank >1)
RETURN p
That would return the paths in which at least one of the nodes has a pagerank > 1, wouldn't it?
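The difference between filtering paths and filtering nodes can be sketched in plain Python. The nodes, values and path below are hypothetical, chosen to reproduce the situation in the question (a Function and a Requirement above 1, attached to a Scenario below 1). The first function mimics the question's query, the second the OR answer, which keeps the whole path, so the low-pagerank Scenario is still returned. The third judges each node on its own properties, which is what the question asks for.

```python
# Hypothetical stand-in for the graph: node id -> properties.
nodes = {
    "f1": {"label": "Function", "community": 10, "pagerank": 1.7},
    "r1": {"label": "Requirement", "community": 10, "pagerank": 1.2},
    "s1": {"label": "Scenario", "community": 10, "pagerank": 0.4},
    "s2": {"label": "Scenario", "community": 7, "pagerank": 2.5},
}
# Function -> Requirement -> Scenario paths, as produced by the MATCH pattern.
paths = [("f1", "r1", "s1")]

def in_community(path, community):
    return all(nodes[n]["community"] == community for n in path)

def scenario_filter(community, threshold):
    # Like the question's query: a path survives only if its Scenario passes.
    return [p for p in paths
            if in_community(p, community) and nodes[p[2]]["pagerank"] > threshold]

def any_node_filter(community, threshold):
    # Like the OR answer: the whole path survives if any node on it passes.
    return [p for p in paths
            if in_community(p, community)
            and any(nodes[n]["pagerank"] > threshold for n in p)]

def node_filter(community, threshold):
    # Per-node filter: each node is judged only on its own properties.
    return sorted(n for n, props in nodes.items()
                  if props["community"] == community and props["pagerank"] > threshold)
```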

How to get the root node's key of a graph in ArangoDB?

I have a graph in ArangoDB whose root node is 'X'. The nodes "a, b, c, d, e, f" are descendants of 'X', either direct children or further down (grandchildren, great-grandchildren, and so on). From any of these nodes I want to get to node 'X'. Is there a general AQL query that traverses directly to the root node of any graph?
To provide an exact example I would need to know a bit more, but this is one of several solutions.
Assuming that the nodes are connected by "child" edges that point from parent to child, you would traverse up the tree, i.e. INBOUND:
FOR v,e,p IN 1..50 INBOUND '(id of starting node a,b,etc.)' child RETURN p.vertices
If you know the maximum number of hops to the root, change the 50 to that value.
This statement returns all paths, including the intermediate ones, from the starting node through the child edges up to the root node. To return only the path to the root node, you have to filter out the intermediate paths. This can be done by checking that the last vertex has no parent vertex:
FOR v,e,p IN 1..50 INBOUND '(id of starting node a,b,etc.)' child
FILTER LENGTH(EDGES(child,v._id,'inbound'))==0 RETURN p.vertices
which will filter out all paths that do not end at a root vertex.
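The same walk can be sketched in plain Python by inverting the child edges into a child-to-parent map. This assumes, as the answer does, that edges point from parent to child and that every vertex has at most one parent (a tree); the edge data is hypothetical.

```python
# Hypothetical "child" edges pointing parent -> child.
child_edges = [
    {"_from": "nodes/X", "_to": "nodes/a"},
    {"_from": "nodes/a", "_to": "nodes/b"},
    {"_from": "nodes/b", "_to": "nodes/c"},
]
# Invert the edges: each vertex maps to its single parent (assumes a tree).
parent_of = {e["_to"]: e["_from"] for e in child_edges}

def path_to_root(start, parent_of, max_hops=50):
    # Walk INBOUND one hop at a time until a vertex with no parent is reached;
    # max_hops mirrors the 1..50 bound in the AQL and stops runaway cycles.
    path = [start]
    while path[-1] in parent_of:
        if len(path) - 1 >= max_hops:
            raise ValueError(f"no root within {max_hops} hops of {start!r}")
        path.append(parent_of[path[-1]])
    return path
```

As with p.vertices in the AQL version, the returned list starts at the given vertex and ends at the root.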
RHSMan's answer helped me, but here it is cleaned up a bit:
LET ref_people = (
FOR p IN people RETURN p._id
)
LET l = (
FOR id IN ref_people
FOR link IN links FILTER id == link._from RETURN id
)
RETURN MINUS(ref_people, l)
I came across this because I had the same problem, but the above is outdated.
I did the following:
let ref_skills = (for s in skills
return s._id)
let c = (for item in ref_skills
for sk in skill_skill
filter item == sk._to
return item)
return MINUS(ref_skills, c)
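The MINUS approach is a set difference: all vertex ids minus those that appear as some edge's _to. A plain-Python sketch of the same idea, with hypothetical data shaped like the skills and skill_skill collections (edges assumed to point parent to child):

```python
def find_roots(vertex_ids, edges):
    # Same idea as MINUS(ref_skills, c): drop every vertex that appears as
    # some edge's _to (i.e. has a parent); what remains has no parent.
    has_parent = {e["_to"] for e in edges}
    return [v for v in vertex_ids if v not in has_parent]

# Hypothetical data shaped like the skills / skill_skill collections.
skills = ["skills/a", "skills/b", "skills/c", "skills/d"]
skill_skill = [{"_from": "skills/a", "_to": "skills/b"},
               {"_from": "skills/a", "_to": "skills/c"}]
```

Note that a vertex with no edges at all (skills/d here) is also reported as a root, in this sketch as in the AQL version; the sketch keeps the input order of vertex_ids.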

GRAPH_COMMON_NEIGHBORS returning only common neighbors without the path?

I'm using GRAPH_COMMON_NEIGHBORS, but I only need the resulting vertices returned, without the nested structure:
[
{collection/1:{
collection/2:[
{_id:3 ...},
{_id:7 ...}
]
}
}
]
I need only the part _id:3 and _id:7 returned:
[
{_id:3 ...},
{_id:7 ...}
]
I'm racking my brain over how to approach this: since it's not a list, I cannot FLATTEN it. Is there some hidden feature or option to return only the resulting vertices? Or should I do this manually with two GRAPH_NEIGHBORS queries, limiting the second query with the results of the first, since I believe that is what it does internally?
OK, here is the final query that returns results in the format I want:
FOR entry
IN GRAPH_COMMON_NEIGHBORS('nodes', "node/137789480179", "node/137987398899", {direction: 'outbound'}, {direction: 'outbound'})
FOR a in ATTRIBUTES(entry) FOR b in ATTRIBUTES(entry[a]) RETURN entry[a][b]
It returns the result I wanted, though it could still be improved, e.g. by sorting the results alphabetically by title. I tried that, and it does not work.
The reason for the result structure is that you can call it with examples instead of single vertices. In your case, of course, it is obvious which vertex pair the neighbors belong to.
To sort your result you could do the following:
FOR entry IN GRAPH_COMMON_NEIGHBORS("nodes", "node/137789480179", "node/137987398899", {maxDepth : 2}, {maxDepth : 2}) LET X = entry["node/137789480179"]["node/137987398899"] FOR v IN X SORT v.name DESC RETURN v
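If the nested result has already reached the client, the same flattening and sorting can be done there. This is a plain-Python sketch that assumes the result shape shown in the question (a list of {vertex1: {vertex2: [neighbors]}} objects) and that every neighbor document has the sort attribute; the sample data is hypothetical.

```python
def flatten_common_neighbors(result, sort_key="name", descending=False):
    # Collapse [{v1: {v2: [neighbor, ...]}}, ...] into one flat list of
    # neighbor documents, then sort by the given attribute.
    flat = [vertex
            for entry in result
            for inner in entry.values()
            for neighbors in inner.values()
            for vertex in neighbors]
    return sorted(flat, key=lambda v: v[sort_key], reverse=descending)

# Hypothetical result in the shape shown in the question.
result = [{"node/1": {"node/2": [{"_id": "node/7", "name": "b"},
                                 {"_id": "node/3", "name": "a"}]}}]
```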
