I am new to R-tree/B-tree data structures. Building the tree is a bottom-up process, but searching for a node, range search, and kNN search are all top-down processes. I am using kNN search but want to make an improvement: my data are a trajectory of points, which are spatially close to each other. To find the kNNs for every point on the trajectory, I want to search for one point first; then, for the other points, instead of starting from the root again, I want to start from the results of the first point and move upward to their parents. This would let me avoid searching a lot of unnecessary pages. The problem is: how can I move upward from a child to its parent in an R-tree/B-tree structure? Should I change the tree-creation process so that whenever a split happens, the parent[] property of each child gets filled in? Is there a simpler way to do this?
You could:
1. Store a pointer to the parent node in each child node so you know how to move upward through the node structure. Between queries, keep a pointer to the last leaf node; from there, follow the parent pointers upward, checking each parent node in turn, until you reach a node where a different subtree should be picked.
2. Store only pointers to child nodes in every node, but between queries save the whole path of nodes used to get from the root to the leaf in the last query. You can then walk this collection backwards, which amounts to going upward from the last query's leaf to the node where you should pick a different subtree.
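A minimal sketch of the second option in Python; everything here is hypothetical rather than a real R-tree library: mbr is a node's bounding box, and covers and best_child stand in for your tree's own containment test and child-selection rule.

def descend_with_restart(root, point, last_path=None):
    # Reuse the root-to-leaf path from the previous query instead of
    # descending from the root every time.
    path = [root]
    if last_path:
        # Walk the old path bottom-up to the deepest ancestor whose
        # bounding box still covers the new query point.
        for i in range(len(last_path) - 1, -1, -1):
            if covers(last_path[i].mbr, point):   # hypothetical containment test
                path = last_path[:i + 1]
                break
    node = path[-1]
    while node.children:                  # normal top-down descent from here
        node = best_child(node, point)    # hypothetical: e.g. least-enlargement child
        path.append(node)
    return node, path                     # leaf, plus the path to reuse next time

For kNN you would then run the usual branch-and-bound from that restart node; since consecutive trajectory points are close together, the restart node is typically only a level or two above the previous leaf. Note that an exact kNN search may still need to back up further when the kth-neighbor distance reaches beyond the covering node's box.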
Given nodes represented as rectangles, how can I place child nodes in a circle around their parent node without overlapping any other nodes?
The additional criteria are as follows:
there is one and only one origin node
each node can add a new child
there is no limit on the number of offspring
the closer a node is to the origin, the higher its placement priority
The following image is an example of what the end result may look like.
I want to create new nodes (event nodes) among a set of nodes (report nodes) according to the indicator nodes (each report node has several indicator nodes related to it). I want to create the new event nodes with these rules:
a report node is connected to only one event node
if more than one indicator node has the same property "pattern", then they belong to the same event node
Here is my query code:
OPTIONAL MATCH
(indicator_1_1:indicator)<-[:REFERS_TO]-(report_1:report)-[:REFERS_TO]->(indicator_1_2:indicator),
(indicator_2_1:indicator)<-[:REFERS_TO]-(report_2:report)-[:REFERS_TO]->(indicator_2_2:indicator)
WHERE
indicator_1_1.pattern=indicator_2_1.pattern
and
indicator_1_2.pattern=indicator_2_2.pattern
MERGE
(report_1)-[:related_to]->(event:EVENT)<-[:related_to]-(report_2)
and get the result shown below.
But I want the three report nodes to belong to one event node.
I want to know what changes I should make to my query, or what next step I should take after getting the two event nodes.
What's more, I want to know whether there is a more efficient query than mine.
Thanks!
I don't have any data to confirm, but I think a small change to your Cypher query will produce what you want.
From the Neo4j Cypher Manual chapter on MERGE (my emphasis added).
When using MERGE on full patterns, the behavior is that either the
whole pattern matches, or the whole pattern is created. MERGE will
not partially use existing patterns — it’s all or nothing. If
partial matches are needed, this can be accomplished by splitting a
pattern up into multiple MERGE clauses.
So, following this, I think if you change
MERGE (report_1)-[:related_to]->(event:EVENT)<-[:related_to]-(report_2)
to
MERGE (report_1)-[:related_to]->(event:EVENT)
MERGE (event)<-[:related_to]-(report_2)
... you will prevent the extra :EVENT nodes from being created and get the graph you are looking for.
Finally, I found the answer. My solution is to merge the :EVENT nodes first, and then the duplicate relationships.
Step 1: merge the :EVENT nodes
MATCH ()-[r_1:related_to]->(event_1:EVENT)<-[r_2:related_to]-()-[r_3:related_to]->(event_2:EVENT)<-[r_4:related_to]-()
CALL apoc.refactor.mergeNodes([event_1, event_2]) YIELD node
RETURN node
Step 2: remove the duplicate relationships
MATCH (X)-[r]->(Y)
WITH X, Y, type(r) AS t, TAIL(collect(r)) AS rr
FOREACH (r IN rr | DELETE r)
I am trying to build a Watson Conversation for an application. I have created a single intent, and it has multiple child dialog nodes. Two of the sibling dialog nodes have the same child nodes, so the hierarchy is repeated.
Is there any way to handle this situation? (I mean, to reduce the duplicate nodes or to reuse the existing ones.) As it stands, the nodes are repeated for each sibling dialog node.
The image below is self-explanatory: you can see that the same two dialog nodes (#boolean:yes / #boolean:no) are duplicated under both sibling nodes.
So, without creating two similar nodes, how can I create a common node that is used by both siblings?
Any help, please...
To solve your issue, you can use a "Continue from" and point it to the input node just before the place where you want to continue with the tree.
I have a large amount of data consisting of users who visit web sites, with a timestamp for each visit. Using the http://jexp.de/blog/2012/10/parallel-batch-inserter-with-neo4j/ script, I created a graph that has a separate path for each page:
U1-->T1-->P1
|
--->T2-->P2
etc.
Now I want to have the following structure instead:
U1->T1->P1->T2->P2...
Obviously, each user visits a different number of pages. I have a file that looks like this:
person,time,place
U1,t1,P1
U1,t2,P2
U1,t3,P3
U2,t4,P1
U2,t5,P6
Each user's sequence is ordered by visit time, so t1 < t2 < t3, and the places form the user's click path (e.g., home -> about me -> blog, etc.).
Is the above structure U1->T1->P1->T2->P2 a good approach? (I have around 30 million entries)
I need to modify the groovy script so that it automatically adds the relationships and nodes in the right sequence. I was thinking of keeping the previous user id in memory: if the new user id equals the old id, I add only the relationship and the place; otherwise, I create a new user and build a new path.
I assume that your nodes are labeled U for users, T for timestamps, and P for pages.
You do not need timestamp nodes. You can, instead, put the timestamp value in the relationship between a U and a P. This will greatly reduce the number of nodes and relationships.
For example, instead of this (I am making up the relationship types):
(:U)-[:VISITED_AT]->(:T {timestamp: 123})-[:PAGE]->(:P)
you can use this, which saves you 1 node and 1 relationship per visit:
(:U)-[:VISITED {timestamp: 123}]->(:P)
What you describe seems reasonable, BUT you could create multiple nodes for the same page (e.g., P1 in your example file, since it appears twice), whereas you really want to have one node per page. Also, if the file were to contain another U1 row after the U2 rows, you'd create a second U1 node. To prevent such duplication, you should use MERGE instead of CREATE for your U and P nodes. MERGE will create a node only if it does not already exist, else it just returns the existing node. Once you have the nodes, you can go ahead and CREATE the relationship (with the timestamp as a property) linking them together.
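The question uses a groovy batch inserter, but as a rough illustration of the MERGE-for-nodes / CREATE-for-relationships split, here is a minimal Python sketch using the official neo4j driver; the URI, credentials, and file name are assumptions:

import csv
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))  # assumed connection details

CYPHER = """
MERGE (u:U {id: $person})
MERGE (p:P {id: $place})
CREATE (u)-[:VISITED {timestamp: $time}]->(p)
"""

with driver.session() as session, open("visits.csv") as f:  # assumed file name
    for row in csv.DictReader(f):            # header: person,time,place
        session.run(CYPHER, person=row["person"],
                    time=row["time"], place=row["place"])

driver.close()

For around 30 million rows you would batch these writes (a few thousand per transaction) and create indexes on the id properties, but the duplicate prevention comes entirely from using MERGE for the nodes.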
I need a way to cache searches in my node.js app. I had an idea that uses Redis, but I am not sure how to implement it.
What I want to do is have a hard limit on how many searches are going to be cached because I have a limited amount of RAM. For each search, I want to store the search query and the corresponding search results.
Let's say that my hard limit on the number of cached searches is 4. Each search query is a box in the following diagram:
If there was a new search that was not cached, the new search gets pushed to the top, and the search query at the bottom gets removed.
But if there was a search that was cached, the cached search query gets removed from its position and added to the top of the cache. For example, if search 3 was searched.
By doing this, I use a roughly constant amount of memory, while the most-searched queries always stay in the cache and less popular searches pass through the cache and get removed.
My question is: how exactly would I do this? I thought I might be able to do it with lists, but I am not sure how to check whether a value exists in a list. I also thought I might be able to do it with sorted sets, where I would set each member's score to be its index, but then if a search query gets moved within the cache, I would need to change the score of every single element in the set.
The simplest approach is to spin up a new Redis instance just for the search cache. For that instance you can set maxmemory as needed, and set its maxmemory-policy to allkeys-lru. Redis will then automatically evict the least recently used cache entry, which is exactly what you want. Note that you will really be limiting by memory usage, not by a maximum number of cache entries.
Into this Redis instance you then insert keys of the form search:$searchterm => $cachedvalue, and set an expiry of a few minutes on each key (so you don't serve stale answers). Redis does the hard work for you.
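For illustration, here is the application side of that pattern; the sketch is in Python rather than node.js, and the port, key prefix, and five-minute expiry are assumptions:

import json
import redis

# Dedicated cache instance, configured in its redis.conf with e.g.
#   maxmemory 256mb
#   maxmemory-policy allkeys-lru
cache = redis.Redis(host="localhost", port=6380)

def cached_search(term, run_search):
    key = "search:" + term
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)                   # served from the cache
    results = run_search(term)                   # the real (slow) search
    cache.set(key, json.dumps(results), ex=300)  # expire so answers don't go stale
    return results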
You definitely want to use a sorted set.
Here's what you do:
1st query: select the top element from your sorted set with ZREVRANGE key 0 0 WITHSCORES.
2nd query: in a MULTI/EXEC transaction, do:
A. Insert your element (ZADD) with the score that you retrieved + 1. If the element already exists in the set, it will simply be re-scored, not added twice.
B. ZREMRANGEBYRANK. I haven't tested this, but I think the parameters you want are (0, -maxListSize - 1), which trims everything below the maxListSize highest-scored members.
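Putting that together, a small Python sketch; the key name and the maximum size of 4 (matching the example above) are assumptions:

import redis

r = redis.Redis()
KEY = "search:lru"
MAX_SIZE = 4                                      # hard limit from the example

def touch(term):
    # Current highest score; zrevrange returns [(member, score)] pairs.
    top = r.zrevrange(KEY, 0, 0, withscores=True)
    next_score = top[0][1] + 1 if top else 1
    pipe = r.pipeline()                           # MULTI ... EXEC
    pipe.zadd(KEY, {term: next_score})            # re-scores if term is already cached
    # Rank 0 is the lowest score, so this keeps only the MAX_SIZE
    # most recently touched members.
    pipe.zremrangebyrank(KEY, 0, -MAX_SIZE - 1)
    pipe.execute()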
Take a look at ZREMRANGEBYRANK. You can limit the amount of data in your sorted set to a given size.
http://redis.io/commands/zremrangebyrank