I am using the networkx package to analyse IMDb data and compute centrality (closeness and betweenness). The problem is that the graph has two types of nodes, namely actors and movies. I want to calculate the centrality for the actors only, not for the graph overall.
The code:
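import networkx as nx

T = nx.Graph()
T.add_nodes_from(demo_df.primaryName, bipartite=1)   # actors
T.add_nodes_from(demo_df.primaryTitle, bipartite=0)  # movies
# Add edges to the existing graph; reassigning T with nx.from_pandas_edgelist
# would discard the bipartite node attributes set above.
T.add_edges_from(zip(demo_df.primaryName, demo_df.primaryTitle))
nx.closeness_centrality(T)
nx.betweenness_centrality(T)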
I don't want it to calculate or display the betweenness and closeness of the movies (Wings of Desire, Dopey Dicks, Studio Stoops). I want them to be calculated only for the actors.
For bipartite graphs, NetworkX provides counterparts in networkx.algorithms.bipartite.centrality. For instance, the bipartite closeness_centrality returns a dictionary keyed by node with the bipartite closeness centrality as the value. In the nodes argument, specify the nodes in one bipartite node set:
from networkx.algorithms import bipartite
part0_nodes, part1_nodes = bipartite.sets(T)
cs_partition0 = bipartite.centrality.closeness_centrality(T, part0_nodes)
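For disconnected graphs, bipartite.sets cannot split the nodes unambiguously and raises an error, so you can instead obtain the nodes of a given partition from the bipartite node attribute: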
partition = nx.get_node_attributes(T, 'bipartite')
part0_nodes = [node for node, p in partition.items() if p==0]
Note that the returned dictionary will still contain all nodes, even though you passed only the nodes of one partition in nodes. You can keep just the entries for one set by filtering the result with part0_nodes. This is mentioned in the notes section of the documentation:
The nodes input parameter must contain all nodes in one bipartite node set,
but the dictionary returned contains all nodes from both bipartite node
sets. See the bipartite documentation (networkx.algorithms.bipartite)
for further details on how bipartite graphs are handled in NetworkX.
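A minimal sketch of the filtering step, assuming a small hypothetical edge list in place of demo_df (the actor and movie names are made up for illustration). It computes the bipartite closeness and betweenness for the whole graph and then keeps only the actor entries:

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical (actor, movie) pairs standing in for the rows of demo_df.
edges = [
    ("Actor A", "Movie X"),
    ("Actor B", "Movie X"),
    ("Actor B", "Movie Y"),
    ("Actor C", "Movie Y"),
]
actors = {actor for actor, _ in edges}
movies = {movie for _, movie in edges}

T = nx.Graph()
T.add_nodes_from(actors, bipartite=1)
T.add_nodes_from(movies, bipartite=0)
T.add_edges_from(edges)

# The returned dictionaries contain every node of both sets ...
closeness_all = bipartite.closeness_centrality(T, actors)
betweenness_all = bipartite.betweenness_centrality(T, actors)

# ... so filter them down to the actors afterwards.
actor_closeness = {n: c for n, c in closeness_all.items() if n in actors}
actor_betweenness = {n: b for n, b in betweenness_all.items() if n in actors}

for name in sorted(actor_closeness):
    print(name, actor_closeness[name], actor_betweenness[name])
```

Passing the actor set explicitly avoids relying on bipartite.sets, which only works when the split is unambiguous.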
Given an arbitrary directed graph DG in python, is it possible to elegantly make it an undirected one? (This is all in terms of the networkx library).
I was trying to compute some statistics like average clustering, number of triangles etc. However these all are defined for the undirected graphs, so I was wondering if it is trivial to convert the directed graph into an undirected one.
You can use:
H = DG.to_undirected()
See the Networkx documentation.
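As a hedged illustration of the conversion followed by the statistics mentioned in the question, here is a small sketch on a made-up directed graph (the edges are assumptions chosen for the example). Note that a reciprocal pair of directed edges collapses into a single undirected edge:

```python
import networkx as nx

# Hypothetical directed graph; (1, 2) and (2, 1) form a reciprocal pair.
DG = nx.DiGraph([(1, 2), (2, 1), (2, 3), (3, 1), (3, 4)])

# to_undirected() returns a new undirected copy and leaves DG unchanged.
UG = DG.to_undirected()

triangles = nx.triangles(UG)               # triangles per node
avg_clustering = nx.average_clustering(UG)

print(DG.number_of_edges(), UG.number_of_edges())
print(triangles, avg_clustering)
```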
In ArangoDB if I have a vertex with multiple classes of edges that are not all contained in a single graph, is there any way to safely delete that vertex with a single command? What is the recommended way to delete such a vertex?
Rephrasing the question: assume I have a document collection D1 and two edge collections, E1 and E2. I create a graph that contains D1 and E1. If I delete a vertex from D1 it will delete the document and the edges connected to D1 of type E1 but not of type E2. Is there a way to delete a vertex and ALL the connecting edges if all those edges are not included within the graph?
More generally, I've read a few related questions but I am confused about when I should be combining multiple classes of edges into a single graph: in particular, the advice here to set up graphs in a way that aligns with the intended graph queries seems contradictory to my perception that I need to combine all types of edges into one graph in order to ensure safe deletion of vertex documents.
When you use the graph API to delete vertices, the following is guaranteed:
If you remove a vertex from a graph, all its connected edges in this graph are removed as well.
If you remove a vertex, all its connected edges in all other graphs are removed as well.
Edge collections not included in any graph's definition are NOT modified in any way.
So, to remove vertices consistently, it is sufficient to reference every edge collection in at least one graph and to use the graph API to delete vertices.
I am trying to use AgglomerativeClustering from scikit-learn to cluster points on a plane. Points are defined by coordinates (X,Y) stored in _XY.
Clusters are limited to a few neighbours through the connectivity matrix defined by
_C = kneighbors_graph(_XY, n_neighbors = 20).
I want some points not to be part of the same cluster, even if they are neighbours, so I modified the connectivity matrix to put 0 between these points.
The algorithm runs smoothly but, at the end, some clusters contain points that should not be together, i.e. some pairs for which I imposed _C = 0.
From the children_ attribute, I can see that the problem arises when a cluster of two points (i, j) is already formed and k then joins (i, j) even if _C[i,k]=0.
So I was wondering how the connectivity constraint is propagated when the size of some clusters is larger than 2, since _C is not defined in that case.
Thanks !
What seems to be happening in your case is that, although you actively disconnected the points you do not want in the same cluster, these points are still part of the same connected component, and the data associated with them still imply that they should be merged into the same cluster from a certain level up.
In general, AgglomerativeClustering (with its default Ward linkage) works as follows: at the beginning, all data points are separate clusters. Then, at each iteration, two adjacent clusters are merged such that the overall increase in discrepancy with the original data is minimal when the original data are compared with the cluster means in L2 distance.
Hence, although you sever the direct link between two nodes, they can still be clustered together one level higher through an intermediate node.
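The effect can be reproduced with a small sketch. The five points and the chain-shaped connectivity matrix below are made up for illustration: points 0 and 2 have no direct connection, yet both are connected to point 1, so they end up in the same cluster.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import AgglomerativeClustering

# Hypothetical data: three points close together, two far away.
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [100.0, 0.0], [101.0, 0.0]])

# Chain connectivity 0-1, 1-2, 2-3, 3-4; the entry for (0, 2) stays 0.
C = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    C[i, j] = C[j, i] = 1

model = AgglomerativeClustering(
    n_clusters=2, linkage="ward", connectivity=csr_matrix(C)
)
labels = model.fit_predict(X)

# Points 0 and 2 share a label even though C[0, 2] == 0.
print(labels, C[0, 2])
```

A zero in the connectivity matrix only rules out merging the two points directly; it does not act as a cannot-link constraint once one of them belongs to a larger cluster.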
Academically speaking, what's the essential difference between the data structure Tree and Graph? And how about the tree based search and Graph based search?
A Tree is just a restricted form of a Graph.
Trees have direction (parent / child relationships) and don't contain cycles.
They fit within the category of Directed Acyclic Graphs (DAGs).
So Trees are DAGs with the restriction that a child can only have one parent.
One thing that is important to point out, Trees aren't a recursive data structure.
They cannot be implemented as a recursive data structure because of the above restrictions. But any DAG implementation, which is generally not recursive, can also be used.
My preferred Tree implementation is a centralized map representation and is non recursive.
Graphs are generally searched breadth first or depth first. The same applies to Tree.
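To make the two search orders concrete, here is a minimal sketch that stores both a tree and a graph as plain adjacency dictionaries (the node names are made up). The same functions work for both; the seen set is what keeps the search from looping forever on a graph with cycles.

```python
from collections import deque


def bfs(adjacency, start):
    """Return the nodes reachable from start in breadth-first order."""
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


def dfs(adjacency, start):
    """Return the nodes reachable from start in depth-first order."""
    order = []
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so the first listed neighbour is explored first.
        stack.extend(reversed(adjacency.get(node, [])))
    return order


# A tree: every node except "root" appears as a child exactly once.
tree = {"root": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
# An undirected graph with a cycle p-q-r.
graph = {"p": ["q", "r"], "q": ["p", "r"], "r": ["p", "q", "s"], "s": ["r"]}

print(bfs(tree, "root"), dfs(tree, "root"))
print(bfs(graph, "p"), dfs(graph, "p"))
```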
Instead of explaining, I prefer to show it in pictures.
[Image: a tree in real life]
[Image: a graph in real-life use]
Yes, a map can be visualised as a graph data structure.
Seeing them like this makes life easier. Trees are used in places where we know that each node has only one parent. But nodes in a graph can have multiple predecessors (the term parent is generally not used for graphs).
In the real world, you can represent almost anything using graphs. I used a map as an example. If you consider each city as a node, it can be reached from multiple points. The points which lead to this node are called predecessors, and the points which this node leads to are called successors.
An electrical circuit diagram, the plan of a house, a computer network or a river system are a few more examples of graphs. Many real-world examples can be considered as graphs.
[Images: technical diagrams of a tree and of a graph]
Make sure to refer to the links below. They will answer almost all your questions on trees and graphs.
References :
http://www.introprogramming.info/english-intro-csharp-book/read-online/chapter-17-trees-and-graphs/#_Toc362296541
http://www.community-of-knowledge.de/beitrag/data-trees-as-a-means-of-presenting-complex-data-analysis/
Wikipedia
The other answers are useful, but they're missing the properties of each:
Graph
Undirected graph, image source: Wikipedia
Directed graph, image source: Wikipedia
Consists of a set of vertices (or nodes) and a set of edges connecting some or all of them
Any edge can connect any two vertices that aren't already connected by an identical edge (in the same direction, in the case of a directed graph)
Doesn't have to be connected (the edges don't have to connect all vertices together): a single graph can consist of a few disconnected sets of vertices
Could be directed or undirected (which would apply to all edges in the graph)
As per Wikipedia:
For example, if the vertices represent people at a party, and there is an edge between two people if they shake hands, then this graph is undirected because any person A can shake hands with a person B only if B also shakes hands with A. In contrast, if any edge from a person A to a person B corresponds to A admiring B, then this graph is directed, because admiration is not necessarily reciprocated.
Tree
Image source: Wikipedia
A type of graph
Vertices are more commonly called "nodes"
Edges are directed and represent an "is child of" (or "is parent of") relationship
Each node (except the root node) has exactly one parent (and zero or more children)
Has exactly one "root" node (if the tree has at least one node), which is a node without a parent
Has to be connected
Is acyclic, meaning it has no cycles: "a cycle is a path [AKA sequence] of edges and vertices wherein a vertex is reachable from itself"
There is some overlap in the above properties. Specifically, for a finite tree, either of the last two properties follows from the other together with the rest. But all of them are worth noting nonetheless.
TREE :
1. Only one path exists between any two vertices (nodes).
2. The root node is the starting node of the tree.
3. A tree doesn't have loops.
4. Number of edges: n-1 (where n is the number of nodes).
5. A tree looks like a hierarchy.
6. All trees are graphs.
GRAPH :
1. More than one path is allowed between two vertices.
2. There is no root node concept (we can start from any node).
3. There can be loops in a graph.
4. The number of edges is not fixed.
5. A graph looks like a network.
6. Not all graphs are trees.
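A few of the points above (the n-1 edge count, the single path between two vertices, and the absence of loops) can be checked with networkx. This is a small sketch on made-up undirected graphs, not a general proof:

```python
import networkx as nx

# Hypothetical tree with 4 nodes and 3 edges.
tree = nx.Graph([("a", "b"), ("a", "c"), ("b", "d")])

# The same graph with one extra edge, which closes the loop a-b-d-c-a.
graph = tree.copy()
graph.add_edge("c", "d")

tree_paths = list(nx.all_simple_paths(tree, "a", "d"))
graph_paths = list(nx.all_simple_paths(graph, "a", "d"))

print(nx.is_tree(tree), tree.number_of_edges(), len(tree_paths))
print(nx.is_tree(graph), graph.number_of_edges(), len(graph_paths))
```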
A more detailed explanation can be found in this video: https://www.youtube.com/watch?v=KVHrjVTp9_w
A tree is a special form of graph, i.e. a minimally connected graph with exactly one path between any two vertices.
In a graph there can be more than one path between two nodes, and the edges can be uni-directional or bi-directional.
Also you can see more details:
http://freefeast.info/difference-between/difference-between-trees-and-graphs-trees-vs-graphs/
A tree is basically a connected undirected graph which does not contain a cycle, so we can say that a tree is a more restricted form of graph.
However, trees and graphs have different applications when implementing algorithms in programming.
For example, a graph can be used to model a road map, and a tree can be used to implement any hierarchical data structure.
Put simply, a tree has no cycles and its edges run in one direction, whereas a graph may contain cycles and may be bidirectional in some cases and unidirectional in others.
A tree is a digraph such that:
a) with edge directions removed, it is connected and acyclic
You can remove the assumption that it is acyclic
If it is finite, you can alternatively remove the assumption that it is connected
b) every vertex but one, the root, has indegree 1
c) the root has indegree 0
If there are only finitely many nodes, you can remove either the assumption that the root has indegree 0 or the assumption that the nodes other than the root have indegree 1
Reference: http://www.cs.cornell.edu/courses/cs2800/2016sp/lectures/lec27-29-graphtheory.pdf
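Conditions a) to c) can be turned into a small check for finite digraphs. The helper name is_rooted_tree and the example graphs are assumptions made for this sketch. The third example satisfies b) and c) but fails a), which shows why the indegree conditions alone are not enough:

```python
import networkx as nx


def is_rooted_tree(dg):
    """Check conditions a) to c) on a finite networkx DiGraph."""
    n = dg.number_of_nodes()
    if n == 0:
        return False
    # a) with edge directions removed, connected and acyclic
    if not nx.is_tree(dg.to_undirected()):
        return False
    # b) and c): one vertex has indegree 0, all others have indegree 1
    in_degrees = sorted(degree for _, degree in dg.in_degree())
    return in_degrees == [0] + [1] * (n - 1)


good = nx.DiGraph([("r", "a"), ("r", "b"), ("a", "c")])

# Undirected shape is a tree, but "a" has two incoming edges.
two_parents = nx.DiGraph([("r", "a"), ("b", "a")])

# Isolated root plus a 2-cycle: indegrees are 0, 1, 1, yet not connected.
root_plus_cycle = nx.DiGraph([("x", "y"), ("y", "x")])
root_plus_cycle.add_node("r")

print(is_rooted_tree(good))
print(is_rooted_tree(two_parents))
print(is_rooted_tree(root_plus_cycle))
```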
Trees are obvious: they're recursive data structures consisting of nodes with children.
Maps (aka dictionaries) store key/value pairs. Give a map a key and it will return the associated value.
Maps can be implemented using trees, I hope you don't find that confusing.
UPDATE: Confusing "graph" for "map" is very confusing.
Graphs are more complex than trees. Trees imply recursive parent/child relationships. There are natural ways to traverse a tree: depth-first, breadth-first, level-order, etc.
Graphs can have uni-directional or bi-directional paths between nodes, be cyclic or acyclic, etc. I would consider graphs to be more complex.
I think a cursory search in any decent data structures text (e.g. "The Algorithm Design Manual") would give more and better information than any number of SO answers. I would recommend that you not take the passive route and start doing some research for yourself.
A tree has one root node, and each child has only one parent. A graph, however, has no concept of a root node. Another difference is that a tree is a hierarchical model, whereas a graph is a network model.
In a tree, each node (except the root node) has exactly one predecessor node and zero or more successor nodes (at most two in a binary tree). It can be traversed using in-order, pre-order, post-order, and breadth-first traversals. A tree is a special kind of graph that has no cycles, so a rooted tree is a DAG (Directed Acyclic Graph). A tree is a hierarchical model.
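The three depth-first traversal orders can be sketched on a binary tree as follows. The Node class and the sample values are assumptions made for this example:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def pre_order(node: Optional[Node]) -> List[int]:
    """Visit the node, then its left subtree, then its right subtree."""
    if node is None:
        return []
    return [node.value] + pre_order(node.left) + pre_order(node.right)


def in_order(node: Optional[Node]) -> List[int]:
    """Visit the left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


def post_order(node: Optional[Node]) -> List[int]:
    """Visit the left subtree, then the right subtree, then the node."""
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.value]


# Hypothetical binary tree:
#        4
#      /   \
#     2     6
#    / \   / \
#   1   3 5   7
root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))

print(pre_order(root))
print(in_order(root))
print(post_order(root))
```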
In a graph, each node can have any number of predecessor and successor nodes. A graph is traversed using the Depth First Search (DFS) and Breadth First Search (BFS) algorithms. A graph can contain cycles, so it is more complex than a tree. A graph is a network model. There are two kinds of graph: directed graphs and undirected graphs.
In mathematics, a graph is a representation of a set of objects where some pairs of the objects are connected by links. The interconnected objects are represented by mathematical abstractions called vertices, and the links that connect some pairs of vertices are called edges.[1] Typically, a graph is depicted in diagrammatic form as a set of dots for the vertices, joined by lines or curves for the edges. Graphs are one of the objects of study in discrete mathematics.