Neo4j : How to create a unique node instead of set of nodes

I want to create new nodes (event nodes) for a set of existing nodes (report nodes), based on their indicator nodes (each report node has several indicator nodes related to it). The event nodes should follow these rules:
a report node is connected to only one event node
if more than one indicator node has the same "pattern" property, then they belong to the same event node
Here is my query:
OPTIONAL MATCH
(indicator_1_1:indicator)<-[:REFERS_TO]-(report_1:report)-[:REFERS_TO]->(indicator_1_2:indicator),
(indicator_2_1:indicator)<-[:REFERS_TO]-(report_2:report)-[:REFERS_TO]->(indicator_2_2:indicator)
WHERE
indicator_1_1.pattern=indicator_2_1.pattern
and
indicator_1_2.pattern=indicator_2_2.pattern
MERGE
(report_1)-[:related_to]->(event:EVENT)<-[:related_to]-(report_2)
and I get the result below.
But I want the three report nodes to belong to one event node.
I want to know what changes I should make to my query, or what step I should take next after getting the two event nodes.
I would also like to know whether there is a more efficient query than mine.
Thanks!

I don't have any data to confirm, but I think a small change to your Cypher query will produce what you want.
From the Neo4j Cypher Manual chapter on MERGE (my emphasis added):
When using MERGE on full patterns, the behavior is that either the
whole pattern matches, or the whole pattern is created. MERGE will
not partially use existing patterns — it’s all or nothing. If
partial matches are needed, this can be accomplished by splitting a
pattern up into multiple MERGE clauses.
So, following this, I think if you change
MERGE (report_1)-[:related_to]->(event:EVENT)<-[:related_to]-(report_2)
to
MERGE (report_1)-[:related_to]->(event:EVENT)
MERGE (event)<-[:related_to]-(report_2)
... you will prevent the extra :EVENT nodes from being created and get the graph you are looking for.

Finally, I found the answer. My solution is to merge the :EVENT nodes first, and then the duplicate relationships.
Step 1: merge the :EVENT nodes
MATCH ()-[r_1:related_to]->(event_1:EVENT)<-[r_2:related_to]-()-[r_3:related_to]->(event_2:EVENT)<-[r_4:related_to]-()
call apoc.refactor.mergeNodes([event_1,event_2]) YIELD node
RETURN node
Step 2: merge the duplicate relationships
MATCH (X)-[r]-(Y)
WITH X,Y, TAIL (collect(r)) as rr
FOREACH (r IN rr | DELETE r)
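The goal in this thread (one event per group of reports that share indicator patterns) amounts to finding connected components. Below is a minimal Python sketch of that grouping, outside Neo4j, using a union-find structure. The function name group_reports, the min_shared threshold of 2 (mirroring the two pattern equalities in the question's WHERE clause), and the sample data are all assumptions made for illustration.

```python
from itertools import combinations


def group_reports(report_patterns, min_shared=2):
    """Map each report to an event id.

    Reports that share at least `min_shared` patterns, directly or
    through a chain of other reports, end up with the same event id.
    """
    parent = {report: report for report in report_patterns}

    def find(report):
        # Follow parent links to the representative, shortening the path.
        while parent[report] != report:
            parent[report] = parent[parent[report]]
            report = parent[report]
        return report

    for a, b in combinations(report_patterns, 2):
        if len(report_patterns[a] & report_patterns[b]) >= min_shared:
            parent[find(a)] = find(b)

    return {report: find(report) for report in report_patterns}


# Hypothetical data: each report maps to the set of its indicator patterns.
reports = {
    "report_1": {"p1", "p2", "p3"},
    "report_2": {"p1", "p2"},
    "report_3": {"p2", "p3"},
    "report_4": {"p9"},
}
events = group_reports(reports)
```

Here report_2 and report_3 share only one pattern with each other, but both share two with report_1, so all three land in the same event, which is the behaviour the question asks for.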

Selecting from multiple tables in Cassandra CQL

So I have two tables in the query I am using:
SELECT
R.dst_ap, B.name
FROM airports as A, airports as B, routes as R
WHERE R.src_ap = A.iata
AND R.dst_ap = B.iata;
However it is throwing the error:
mismatched input 'as' expecting EOF (..., B.name FROM airports [as] A...)
Is there any way I can do what I am attempting to do (which is how it works relationally) in Cassandra CQL?

The short answer is that there are no joins in Cassandra. Period. So using SQL-based JOIN syntax will yield an error similar to what you posted above.
The idea with Cassandra (or any distributed database) is to ensure that your queries can be served by a single node (cutting down on network time). There really isn't a way to guarantee that data from different tables could be queried from a single node. For this reason, distributed joins are typically seen as an anti-pattern. To that end, Cassandra simply doesn't allow them.
In Cassandra you need to take a query-based modeling approach. So you could solve this by building a table from your post-join result set, consisting of desired combinations of dst_ap and name. You would have to find an appropriate way to partition this table, but ultimately you would want to build it based on A) the result set you expect to see and B) the properties you expect to filter on in your WHERE clause.
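As a rough illustration of building a table from the post-join result set, here is a small Python sketch that precomputes the join in application code and groups the rows by a partition key. Partitioning by src_ap, the name routes_by_source, the dst_name field, and the sample rows are assumptions; the right partition key depends on the queries you actually need to serve.

```python
from collections import defaultdict

# Hypothetical source data: iata code -> airport name, and (src_ap, dst_ap) routes.
airports = {
    "JFK": "John F Kennedy Intl",
    "LHR": "Heathrow",
    "CDG": "Charles de Gaulle",
}
routes = [("JFK", "LHR"), ("JFK", "CDG"), ("LHR", "JFK")]


def build_routes_by_source(airports, routes):
    """Precompute the join: one partition per src_ap, one row per destination."""
    table = defaultdict(list)
    for src_ap, dst_ap in routes:
        table[src_ap].append({"dst_ap": dst_ap, "dst_name": airports[dst_ap]})
    return dict(table)


routes_by_source = build_routes_by_source(airports, routes)
```

Each key of routes_by_source corresponds to one partition of the denormalized table, so a lookup by src_ap touches a single partition instead of joining two tables at read time.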

kademlia closest good nodes won't intersect enough between two requests

I am working on a BEP44 implementation and use the defined Kademlia algorithm to find the closest good nodes for a given hash ID.
Using my program, I run go run main.go -put "Hello World!" -kname mykey -salt foobar2 -b public and get the value stored on over a hundred nodes (good).
Now, when I run it multiple consecutive times, the sets of IPs written to by the put requests intersect poorly.
This is a problem because when I try to do a get request, the set of IPs queried does not intersect with the put set, so the value is not found.
In my tests I use the public DHT bootstrap nodes:
"router.utorrent.com:6881",
"router.bittorrent.com:6881",
"dht.transmissionbt.com:6881",
When I query the nodes, I select the 8 closest nodes (nodes := s.ClosestGoodNodes(8, msg.InfoHash())), which usually end up in a list of ~1K queries after a recursive traversal.
In my understanding, where the addresses for an info hash are stored in the DHT is deterministic given the state of the table. As I am doing consecutive queries I expect the table to change, indeed, but not that much.
How can it happen that the sets of storing nodes do not intersect?

Since BEP44 is an extension it is only supported by a subset of the DHT nodes, which means the iterative lookup mechanism needs to take support into account when determining whether the set of closest nodes is stable and the lookup can be terminated.
If a node returns a token, v or seq field in a get response then it is eligible for the closest-set of a read-only get.
If a node returns a token then it is eligible for the closest-set for a get that will be followed by a put operation.
So your lookup may home in on a set of nodes in the keyspace that is closest to the target ID but not eligible for the operations in question. As long as you have candidates that are closer than the best known eligible contacts you have to continue searching. I call this perimeter widening, as it conceptually broadens the search area around the target.
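The termination rule described above can be sketched roughly as follows in Python. This is one reading of it, not the code of any particular DHT implementation: the function names, the small integer IDs (real IDs are 160 bits), and the choice to compare against the farthest member of the k closest eligible nodes are assumptions.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two node IDs."""
    return a ^ b


def lookup_should_continue(target, eligible, unqueried, k=8):
    """Decide whether an iterative lookup still has work to do.

    eligible:  IDs of nodes that answered and qualify (e.g. returned a token)
    unqueried: IDs of known candidates that have not been queried yet
    """
    if not unqueried:
        return False  # nothing left to ask
    best = sorted(eligible, key=lambda n: xor_distance(n, target))[:k]
    if len(best) < k:
        return True  # closest set is not full yet, keep widening
    farthest_eligible = xor_distance(best[-1], target)
    # Continue while some candidate is closer than the current eligible set.
    return any(xor_distance(c, target) < farthest_eligible for c in unqueried)
```

A lookup that only tracks the closest responding nodes, without the eligibility filter, can stop on a set of nodes that do not support BEP44 at all, which would explain put and get sets that do not overlap.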
Additionally you also need to take error responses or the absence of a response into account when performing put requests. You can either retry the node or try the next eligible node instead.
I have written down some additional constraints that one might want to put on the closest set in lookups for robustness and security reasons in the documentation of my own DHT implementation.
"which usually end up in a list of ~1K queries after a recursive traversal."
This suggests something is wrong with your lookup algorithm. In my experience a lookup should only take somewhere between 60 and 200 udp requests to find its target if you're doing a lookup with concurrent requests, maybe even fewer when it is sequential.
Verbose logs of the terminal sets, which let me eyeball how the lookups make progress and how much junk I am getting from peers, have served me well.
"In my tests I use the public DHT bootstrap nodes"
You should write your routing table to disk and reload it from there and only perform bootstrapping when none of the persisted nodes in your RT are reachable. Otherwise you are wasting the bootstrap nodes' resources and also waste time by having to re-populate your routing table first before performing any lookup.
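A minimal Python sketch of that startup logic, assuming the routing table can be reduced to a list of (host, port) pairs stored as JSON. The function names and the is_reachable callback (which in a real client would be a DHT ping) are hypothetical.

```python
import json
from pathlib import Path

# The three public routers named in the question.
BOOTSTRAP_NODES = [
    ("router.utorrent.com", 6881),
    ("router.bittorrent.com", 6881),
    ("dht.transmissionbt.com", 6881),
]


def save_routing_table(path, nodes):
    """Persist the known (host, port) pairs, e.g. on shutdown."""
    Path(path).write_text(json.dumps([list(node) for node in nodes]))


def load_start_nodes(path, is_reachable):
    """Prefer persisted nodes; fall back to bootstrapping only if none respond."""
    file = Path(path)
    if file.exists():
        saved = [tuple(entry) for entry in json.loads(file.read_text())]
        alive = [node for node in saved if is_reachable(node)]
        if alive:
            return alive
    return list(BOOTSTRAP_NODES)
```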

Get rid of duplicate dialog nodes in Watson Conversation

I am trying to build a Watson Conversation for an application. I have created a single intent and it has multiple child dialog nodes. Two sibling dialog nodes have the same child nodes, so the hierarchy is repeated.
So, is there any way to handle this situation (I mean, to reduce duplicate nodes or to reuse the existing nodes)? As it stands, the nodes are repeated for each sibling dialog node.
The image below is self-explanatory.
When you look at the image below, you see that two dialog nodes are the same for both sibling nodes (#boolean:yes / #boolean:no).
So, without creating two similar nodes, how can I create a common node that is used by both siblings?
Any help, please...

To solve your issue, you can use a "continue from" and point it to the input node prior to where you want to continue on with the tree.

neo4j and groovy: automatic loading of paths of variable length

I have a large amount of data that consists of users who visit web sites. I have a timestamp for each visit. Using the http://jexp.de/blog/2012/10/parallel-batch-inserter-with-neo4j/ script, I created a graph that has a separate path for each page
U1-->T1-->P1
|
--->T2-->P2
etc.
Now, I would like to have the following structure:
U1->T1->P1->T2->P2...
Obviously, each user visits a different number of pages. I have a file that looks like this:
person,time,place
U1,t1,P1
U1,t2,P2
U1,t3,P3
U2,t4,P1
U2,t5,P6
each user sequence is ordered by visit time, so t1about me->blog etc.
Is the above structure U1->T1->P1->T2->P2 a good approach? (I have around 30 million entries)
I need to modify the Groovy script so that it automatically adds relationships and nodes in the same sequence. I was thinking of keeping the previous user ID in memory: if the new user ID equals the old ID, I add only the relationship and the place. Otherwise, I create a new user and build a new path.

I assume that your nodes are labeled U for users, T for timestamps, and P for pages.
You do not need timestamp nodes. You can, instead, put the timestamp value in the relationship between a U and a P. This will greatly reduce the number of nodes and relationships.
For example, instead of this (I am making up the relationship types):
(:U)-[:VISITED_AT]->(:T {timestamp: 123})-[:PAGE]->(:P)
you can use this, which saves you 1 node and 1 relationship per visit:
(:U)-[:VISITED {timestamp: 123}]->(:P)
What you describe seems reasonable, BUT you could create multiple nodes for the same page (e.g., P1 in your example file, since it appears twice), whereas you really want to have one node per page. Also, if the file were to contain another U1 row after the U2 rows, you'd create a second U1 node. To prevent such duplication, you should use MERGE instead of CREATE for your U and P nodes. MERGE will create a node only if it does not already exist, else it just returns the existing node. Once you have the nodes, you can go ahead and CREATE the relationship (with the timestamp as a property) linking them together.
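To make the get-or-create behaviour concrete, here is a small Python sketch that mimics it in memory with dictionaries. It is not the Neo4j API and not a replacement for the batch inserter script; load_visits and the dictionary-based "nodes" are assumptions, and the input is the sample file from the question.

```python
import csv
import io


def load_visits(csv_text):
    """Create each user and page once; record one visit per input row."""
    users, pages, visits = {}, {}, []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # setdefault plays the role of MERGE: reuse the node if it exists.
        user = users.setdefault(row["person"], {"id": row["person"]})
        page = pages.setdefault(row["place"], {"id": row["place"]})
        # The visit plays the role of CREATE for the relationship,
        # with the timestamp stored on the relationship itself.
        visits.append((user["id"], row["time"], page["id"]))
    return users, pages, visits


data = """person,time,place
U1,t1,P1
U1,t2,P2
U1,t3,P3
U2,t4,P1
U2,t5,P6
"""
users, pages, visits = load_visits(data)
```

P1 appears twice in the file but yields a single page entry, while every row still produces its own visit.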

Routing table creation at a node in a Pastry P2P network

This question is about the routing table creation at a node in a p2p network based on Pastry.
I'm trying to simulate this scheme of routing table creation in a single JVM. I can't seem to understand how these routing tables are created from the point of joining of the first node.
I have N independent nodes, each with a 160-bit nodeId generated as a SHA-1 hash, and a function to determine the proximity between these nodes. Let's say the 1st node starts the ring and joins it. The protocol says that this node should have had its routing tables set up at this time. But I do not have any other nodes in the ring at this point, so how does it even begin to create its routing tables?
When the 2nd node wishes to join the ring, it sends a Join message (containing its nodeID) to the 1st node, which it passes around in hops to the closest available neighbor for this 2nd node, already existing in the ring. These hops contribute to the creation of routing table entries for this new 2nd node. Again, in the absence of a sufficient number of nodes, how do all these entries get created?
I'm just beginning to take a look at the FreePastry implementation to get these answers, but it doesn't seem very apparent at the moment. If anyone could provide some pointers here, that'd be of great help too.

My understanding of Pastry is not complete, by any stretch of the imagination, but it was enough to build a more-or-less working version of the algorithm. Which is to say, as far as I can tell, my implementation functions properly.
To answer your first question:
"The protocol says that this [first] node should have had its routing tables set up at this time. But I do not have any other nodes in the ring at this point, so how does it even begin to create its routing tables?"
I solved this problem by first creating the Node and its state/routing tables. The routing tables, when you think about it, are just information about the other nodes in the network. Because this is the only node in the network, the routing tables are empty. I assume you have some way of creating empty routing tables?
To answer your second question:
"When the 2nd node wishes to join the ring, it sends a Join message (containing its nodeID) to the 1st node, which it passes around in hops to the closest available neighbor for this 2nd node, already existing in the ring. These hops contribute to the creation of routing table entries for this new 2nd node. Again, in the absence of a sufficient number of nodes, how do all these entries get created?"
You should take another look at the paper (PDF warning!) that describes Pastry; it does a rather good job of explaining the process for nodes joining and exiting the cluster.
If memory serves, the second node sends a message that not only contains its node ID, but actually uses its node ID as the message's key. The message is routed like any other message in the network, which ensures that it quickly winds up at the node whose ID is closest to the ID of the newly joined node. Every node that the message passes through sends their state tables to the newly joined node, which it uses to populate its state tables. The paper explains some in-depth logic that takes the origin of the information into consideration when using it to populate the state tables in a way that, I believe, is intended to reduce the computational cost, but in my implementation, I ignored that, as it would have been more expensive to implement, not less.
To answer your question specifically, however: the second node will send a Join message to the first node. The first node will send its state tables (empty) to the second node. The second node will add the sender of the state tables (the first node) to its state tables, then add the appropriate nodes in the received state tables to its own state tables (no nodes, in this case). The first node would forward the message on to a node whose ID is closer to that of the second node's, but no such node exists, so the message is considered "delivered", and both nodes are considered to be participating in the network at this time.
Should a third node join and route a Join message to the second node, the second node would send the third node its state tables. Then, assuming the third node's ID is closer to the first node's, the second node would forward the message to the first node, who would send the third node its state tables. The third node would build its state tables out of these received state tables, and at that point it is considered to be participating in the network.
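The two walkthroughs above can be reproduced with a very small Python simulation. It is a simplification under stated assumptions: node IDs are small integers compared by absolute difference rather than 160-bit IDs, the "state tables" are a single set of known IDs, and the final step where the new node announces itself to the nodes it learned about is taken from the Pastry paper rather than from this answer. The names Node, distance and join are made up for the example.

```python
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.state = set()  # IDs of known nodes; empty for the first node


def distance(a, b):
    # Stand-in for Pastry's ID closeness; real IDs are 160-bit values.
    return abs(a - b)


def join(network, new_node, contact_id):
    """Route a Join keyed by the new node's ID, starting at contact_id.

    network maps node ID -> Node. Returns the list of hops taken.
    """
    key = new_node.node_id
    current = network[contact_id]
    hops = []
    while True:
        hops.append(current.node_id)
        # Each hop sends its state; the new node records the sender
        # and every entry in the received state.
        new_node.state.add(current.node_id)
        new_node.state |= current.state - {key}
        # Forward only to a known node that is strictly closer to the key.
        closer = [n for n in current.state
                  if distance(n, key) < distance(current.node_id, key)]
        if not closer:
            break  # no closer node known: the message is "delivered"
        current = network[min(closer, key=lambda n: distance(n, key))]
    # The new node announces itself to every node it now knows about.
    for known_id in new_node.state:
        network[known_id].state.add(key)
    network[key] = new_node
    return hops


first = Node(10)
network = {10: first}  # the first node starts with empty state
second = Node(50)
hops_second = join(network, second, contact_id=10)
third = Node(20)
hops_third = join(network, third, contact_id=50)
```

With these IDs the third node's Join enters at the second node and is forwarded to the first, matching the scenario in the last paragraph.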
Hope that helps.
