How to merge nodes that represent synonyms? - nlp

Suppose I have a graph like this:
[Image: Cytoscape graph]
The nodes 'lipid nanoparticle' and 'LNP' are synonyms. How can I identify nodes that represent synonyms and merge them in Cytoscape?
I searched the Cytoscape App Store but didn't find a suitable app.

Sorry -- I don't know of an app that would automatically detect synonyms and merge them. You can certainly merge nodes using groups, but you would have to do that manually by selecting the two nodes, grouping them, and then collapsing them. All of the edges for the two nodes would then connect (as meta-edges) to the group node.
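If you can supply the synonym pairs yourself, another option is to merge them before the network ever reaches Cytoscape. The sketch below is not a Cytoscape feature or app. It assumes a directed edge list of (source, target) pairs and a hand-written alias-to-canonical dictionary (detecting synonyms automatically is not attempted). Only 'lipid nanoparticle' and 'LNP' come from the question; the other node names are made up. It rewrites every alias to its canonical name, drops duplicate edges, and drops the self-loop that an edge between two synonyms would turn into.

```python
def merge_synonyms(edges, synonyms):
    """Collapse synonym nodes into one canonical node.

    edges:    list of (source, target) pairs, treated as directed.
    synonyms: dict mapping an alias to its canonical name (built by hand).
    Returns a sorted list of unique edges with aliases replaced.
    """
    def canonical(node):
        return synonyms.get(node, node)

    merged = set()
    for src, dst in edges:
        a, b = canonical(src), canonical(dst)
        if a != b:  # an edge between two synonyms would become a self-loop
            merged.add((a, b))
    return sorted(merged)


# Hypothetical example data.
edges = [
    ("lipid nanoparticle", "mRNA"),
    ("LNP", "mRNA"),
    ("LNP", "ionizable lipid"),
    ("LNP", "lipid nanoparticle"),
]
synonyms = {"LNP": "lipid nanoparticle"}
print(merge_synonyms(edges, synonyms))
# → [('lipid nanoparticle', 'ionizable lipid'), ('lipid nanoparticle', 'mRNA')]
```

The merged edge list could then be written to a table file and imported into Cytoscape as a new network. Unlike the group approach, this discards the original alias nodes instead of keeping them inside a collapsible group.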
-- scooter

Related

With ArangoDB what is the practical difference between 1 named graph with x edge definitions vs x named graphs with 1 edge definition?

Is the difference only logical / housekeeping related?
I did read this question, but its answer only deals with one edge definition vs. multiple edge definitions within a graph, which the documentation now covers. So I'm curious.
I have used Arango for 6 years and don't use Graph objects; all my queries are plain AQL queries. You don't need a Graph to get the benefits of a graph database or to perform traversals.
The way I think of a 'Graph' in Arango is as a limited, curated view of your collections that is queryable, and that is also helpful if you want the database to manage some level of integrity on deletes.
In my experience, it slows down traversals, so I find it better to avoid them. A key driver for my decision is that I don't need views, and I don't need the system to delete edges when I delete a connected vertex, but that's just my use case.

Organizing a graph by node types: What is the most efficient way to collect nodes by type?

Problem
I'm creating a graph with many nodes where each node can represent one of many types of things: 5 cat nodes, 30 dog nodes, 40 people nodes, etc.
I'm currently assigning a 'node_type' attribute to each node and filtering over all nodes to collect nodes of a given type. I have to do this operation a lot and it feels very expensive and wrong. I've looked into bipartite graphs but they only support two groupings of nodes.
Question
What is the best, most efficient way to organize and collect nodes of a given type?
I've looked at similar questions but this use case seems too common for a full graph search to be the correct answer.
Instead of assigning each node a type value and iterating over every node whenever you need the nodes of one type, why not keep a separate list for each node type?
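A minimal sketch of that idea, assuming a plain Python adjacency-set graph (the question doesn't name a library, and the class and method names here are hypothetical): alongside the adjacency sets, keep a dictionary from type to the set of nodes of that type, updated whenever a node is added or re-typed. Looking up one type then costs time proportional to the number of matching nodes instead of a scan over the whole graph.

```python
from collections import defaultdict


class TypedGraph:
    """Undirected graph that keeps a per-type index of its nodes."""

    def __init__(self):
        self.adj = defaultdict(set)      # node -> set of neighbours
        self.node_type = {}              # node -> its type
        self.by_type = defaultdict(set)  # type -> set of nodes with that type

    def add_node(self, node, node_type):
        old = self.node_type.get(node)
        if old is not None:
            # Re-typing a node: remove it from its old bucket first.
            self.by_type[old].discard(node)
        self.node_type[node] = node_type
        self.by_type[node_type].add(node)
        self.adj.setdefault(node, set())  # register isolated nodes too

    def add_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def nodes_of_type(self, node_type):
        # Cost is proportional to the result size, not to the whole graph.
        return frozenset(self.by_type.get(node_type, ()))


g = TypedGraph()
for name, kind in [("rex", "dog"), ("fido", "dog"), ("tom", "cat"), ("alice", "person")]:
    g.add_node(name, kind)
g.add_edge("alice", "rex")
print(sorted(g.nodes_of_type("dog")))  # → ['fido', 'rex']
```

The trade-off is a little extra memory and bookkeeping on every insert or type change, in exchange for cheap repeated lookups by type. The same dictionary can sit next to a networkx graph if you already use one.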

question about data model design in arangodb

Update 2:
The original question is too long, so here is a simpler version:
In the City Graph, how can I query the cities that can be reached directly from Berlin via germanHighway? I don't want internationalHighway.
Original Question:
I am using ArangoDB to store a graph and have a question about data model design.
Take the knows_graph example, for instance, or social_graph.
My original idea was to design two kinds of collections: a document collection, person, and edge collections such as marriedWith or friendWith.
But when I want to query the people who are married to someone, I can't filter out the unwanted friendWith edges (I'm not very familiar with AQL, so maybe this is not true).
In contrast, the examples in the AQL documentation define a more generic edge collection, for example relation in social_graph, and store the more specific type in an attribute, for example "type": "married" on a relation edge.
In AQL, I can then use FILTER p.edges[0].type == 'married' to filter out the unwanted relations.
My question is:
Which data model design is better? Any suggestions?
Now I think that treating married as a type, with a single relation edge collection, may be more flexible and easier to extend to student, neighbour, and so on.
Otherwise, many edge collections (isStudent, neighbourWith, ...) would have to be created.
Can AQL filter by edge type (i.e. by edge collection) rather than by attribute? Maybe something like:
FILTER 'isStudent' edge
Update:
I just tried it: one relation can only be used for two node types.
For example, if an isFriend edge is used for person and dog nodes, you can't use isFriend edges for dog and cat!
So many edge collections are needed.
For the original question:
If you have a finite, well-defined number of edge types, using multiple edge collections is fine, especially if you expect a large number of edges of each type. If, on the other hand, you foresee a large number of relationship types (friend, best friend, wife, etc.) and the number of relationships of each type is not huge, then a single edge collection with a type indicator is fine and may simplify things.
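To make the single-collection option concrete, here is an in-memory Python sketch, not ArangoDB code. It assumes a list of edge documents from one hypothetical "relation" collection, each carrying _from, _to and a "type" attribute; the IDs and types are invented. Following only one relationship type from a start vertex is then a filter on that attribute, roughly what a 1..1 OUTBOUND traversal over relation with FILTER e.type == 'married' would return.

```python
# Hypothetical documents for a single "relation" edge collection; the
# relationship kind is stored in a "type" attribute on each edge.
relation = [
    {"_from": "persons/alice", "_to": "persons/bob", "type": "married"},
    {"_from": "persons/alice", "_to": "persons/charlie", "type": "friend"},
    {"_from": "persons/bob", "_to": "persons/dave", "type": "bestFriend"},
]


def outbound_by_type(edges, start, edge_type):
    """IDs one OUTBOUND hop from start, following only edges of edge_type."""
    return sorted(e["_to"] for e in edges
                  if e["_from"] == start and e["type"] == edge_type)


print(outbound_by_type(relation, "persons/alice", "married"))  # → ['persons/bob']
print(outbound_by_type(relation, "persons/alice", "friend"))   # → ['persons/charlie']
```

With this layout, a new relationship kind such as "bestFriend" is just a new type value; no new collection or graph definition is needed. The cost is that every query touching the collection has to filter on the attribute.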
The ways I can think of to filter edges in a traversal are:
The IS_SAME_COLLECTION function. This tells you whether a document belongs to a particular collection. Keep an eye on performance if you use it on a big dataset, though.
Adding a type attribute to each edge collection's documents that indicates which collection they belong to. Yes, it is basically a static field and a bit of a waste of space, but it works, and space is cheap nowadays.
Using anonymous graph traversals, where you can explicitly define which edge collections to use.
Having said that, Arango is a multi-model DB, so you could ignore the traversal syntax altogether and just join the collections you need, which would work fine as well. That is the great thing about multi-model DBs: you can use them in whatever way you need.
In terms of your last update, you could check the edge collection by doing something like:
FILTER IS_SAME_COLLECTION('internationalHighway', e._id) == false
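IS_SAME_COLLECTION compares a collection name with the collection part of a document's _id (the text before the "/"). The in-memory Python sketch below mimics that filter on the City Graph question: one hop from Berlin in either direction, skipping any edge whose _id is in internationalHighway. It is an illustration, not a driver call; the edge documents are invented and only loosely modeled on ArangoDB's example city graph.

```python
def collection_of(doc_id):
    """Collection part of an ArangoDB-style document _id ("collection/key")."""
    return doc_id.split("/", 1)[0]


def reachable_directly(edges, start, excluded_collection):
    """Vertices one hop from start (either direction), skipping edges whose
    _id belongs to excluded_collection; mirrors the IS_SAME_COLLECTION filter."""
    result = set()
    for e in edges:
        if collection_of(e["_id"]) == excluded_collection:
            continue
        if e["_from"] == start:
            result.add(e["_to"])
        elif e["_to"] == start:
            result.add(e["_from"])
    return sorted(result)


# Hypothetical edges drawn from two edge collections.
edges = [
    {"_id": "germanHighway/1", "_from": "germanCity/Berlin", "_to": "germanCity/Cologne"},
    {"_id": "germanHighway/2", "_from": "germanCity/Berlin", "_to": "germanCity/Hamburg"},
    {"_id": "internationalHighway/1", "_from": "germanCity/Berlin", "_to": "frenchCity/Lyon"},
]
print(reachable_directly(edges, "germanCity/Berlin", "internationalHighway"))
# → ['germanCity/Cologne', 'germanCity/Hamburg']
```

The anonymous-traversal option avoids the per-edge check entirely by naming only the wanted edge collection, for example FOR v IN 1..1 ANY 'germanCity/Berlin' germanHighway RETURN v, assuming Berlin's vertex _id is germanCity/Berlin.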
I think the right data model depends on your business. If your model is more or less stable and has few edge types, you can choose the many-edge-collections approach, since the set of edge types is finite.
But I don't know how to filter by edge collection names :-)
Otherwise, I think fewer edge collections and more attributes would be better.

ArangoDB graph options in modeling a social network

We are building an app that is partly a social network, on top of ArangoDB. We are at the point where we need to decide how to construct our graphs, and we have some questions, but we could not find anything relevant in the docs.
We will be creating several relationships between users. There will be:
Friend request edges
Friend edges
Close friend edges
Block edges
Mute edges, etc.
As a first option, we considered using the SmartGraph functionality. However, we will not know the users' locations in advance, and even if we did, users might relocate; since their location would be part of the shard key, it would be immutable (to our current understanding).
The second option is to create a separate named graph for each edge type: friend request graph, friend graph, etc.
The third option is to create a bigger named graph containing all the relationships (edges) and if we need a particular subset of this graph, to use anonymous graphs. However we cannot find any performance data comparing small graphs with large graphs.
Given that we cannot create multiple graphs with the same edges, we have to decide up front which solution is the most performant and stick to it, since a later change would require changing all our AQL queries (something we want to avoid this close to release).
Which option would be the recommended one?

What is the difference between the AQL EDGES, NEIGHBORS, etc. and GRAPH_EDGES, GRAPH_NEIGHBORS

In ArangoDB, there seem to be two sets of functions for working with graphs. On one side you have EDGES, NEIGHBORS, TRAVERSAL, SHORTEST_PATH and more (https://docs.arangodb.com/Aql/GraphFunctions.html).
On the other hand, there are the graph operations (https://docs.arangodb.com/Aql/GraphOperations.html), which seem to have the same functions prefixed by GRAPH and with some different parameters, such as GRAPH_EDGES, GRAPH_NEIGHBORS, GRAPH_TRAVERSAL, GRAPH_SHORTEST_PATH.
What is the difference between these? Are they used in different scenarios? Are there performance differences, etc.?
There is no general recommendation which to choose over the other - it depends on your requirements.
The EDGES family of functions can work on collections that are not managed by the graph module, and which therefore may not be visible in the graph viewer (but you can also use them on collections that are managed). They have less overhead because they do no graph management.
The GRAPH_EDGES family is the more recent implementation. It only works on managed graphs, which you can also browse in the graph viewer. As you already noted, the latter have many more options, e.g. filtering the graphs by examples.
With ArangoDB 3 the GRAPH_* family of functions was removed. We explain in this cookbook how their functionality can be achieved with AQL in ArangoDB 3.
