graph_edit_distance of two graphs using networkx - python-3.x

I have two non-isomorphic graphs:
a MultiDiGraph with 779 nodes and 20 edges, and a MultiDiGraph with 146 nodes and 28 edges.
n = nx.graph_edit_distance(g, h, timeout=10)
The above code returns None. What does this mean?
My guess is that the edit distance between these two graphs cannot be calculated because the difference in the number of nodes is large. My idea is to compute the graph edit distance using edge transformations instead, since the graphs differ a lot in node count but only a little in edge count.
To use edge transformations, there are these cost functions:
edge_subst_cost(G1[u1][v1], G2[u2][v2]), edge_del_cost(G1[u1][v1]), edge_ins_cost(G2[u2][v2])
My question is: how do I supply the parameters G1 and G2 to edge_subst_cost()?
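As a minimal sketch of how these cost functions are usually wired in: they are passed to nx.graph_edit_distance as keyword arguments, and NetworkX itself calls them with the edge attribute dictionaries, so G1 and G2 are never supplied by hand. The tiny graphs, the "weight" attribute, and the cost values below are assumptions for illustration. The sketch uses plain DiGraph objects; for a MultiDiGraph the dictionaries passed in may be keyed differently, which is not checked here.

```python
import networkx as nx

# Two tiny directed graphs whose single edge differs only in its "weight" attribute
# (hypothetical data, chosen to keep the search trivial).
G1 = nx.DiGraph()
G1.add_edge("a", "b", weight=1.0)
G2 = nx.DiGraph()
G2.add_edge("x", "y", weight=1.5)

def edge_subst_cost(attrs1, attrs2):
    # Called by NetworkX with the attribute dicts G1[u1][v1] and G2[u2][v2].
    return abs(attrs1["weight"] - attrs2["weight"])

def edge_del_cost(attrs1):
    # Called with G1[u1][v1]; a flat cost is an assumption.
    return 1.0

def edge_ins_cost(attrs2):
    # Called with G2[u2][v2]; a flat cost is an assumption.
    return 1.0

distance = nx.graph_edit_distance(
    G1,
    G2,
    edge_subst_cost=edge_subst_cost,
    edge_del_cost=edge_del_cost,
    edge_ins_cost=edge_ins_cost,
)
print(distance)
```

Here the cheapest edit path maps a to x and b to y and substitutes the edge, so only the weight difference is charged.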

Related

Networkx (or Graphviz) layout with fixed y positions

Are there any layout algorithms in networkx (or that I can call in Graphviz) that allow me to fix the Y-position of nodes in a DAG to a potentially different floating point value for each node, but spread out the X positions in some reasonable way (ideally attempting to minimise edge lengths or crossovers, although I suspect this might not be possible)? I can only find layouts that require nodes to be on discrete layers.
Added: Below is an example of the sort of graph topology I have, plotted using nx.kamada_kawai_layout. The thing is that these nodes have a "time" value (not shown here), which I want to plot on the Y axis. The edges are directed in time, so that a parent node (e.g. 54 here) is always older than its children (here 52 and 53). So I want to lay this out with the Y position given by the node "time", and the X position such that crossings are minimised, inasmuch as that's possible (I know this is NP-hard in general, but the layout below is actually doing a pretty good job).
p.s. usually all the leaf nodes, e.g. 2, 3, 7 here, are at time 0, so should be laid out at the bottom of the final layout.
p.p.s. Essentially what I would like to do is to imagine this as a spring diagram, "pick up" the root node (54) in the plot above and place it at the top of the page, with the topology dangling down, then adjust the Y-position of the children to their internal "time" values.
Edit 2. Thanks to @sroush below, I can get a decent layout with the dot graphviz engine:
A = nx.nx_agraph.to_agraph(G)
fig = plt.figure(1, figsize=(10, 10))
A.add_subgraph(ts.samples(), level="same", name="cluster")
A.layout(prog="dot")
pos = {n: [float(x) for x in A.get_node(n).attr["pos"].split(",")] for n in G.nodes()}
nx.draw_networkx(G, pos, with_labels=True)
But I then want to reposition the nodes slightly so instead of ranked times (the numbers) they use their actual, floating point times. Like this:
true_times = nx.get_node_attributes(G, 'time')
reposition = {node_id: np.array([pos[node_id][0], true_times[node_id]]) for node_id in true_times}
nx.draw_networkx(G, reposition, with_labels=True)
As you can see, that squashed the nodes together rather a lot. Is there any way to spread out the horizontal positions of those nodes so that they don't bump into one another? I could perhaps cluster some onto the same layer and iterate, but that seems quite expensive.
The Graphviz dot engine can get you pretty close. This is usually described as a "timeline" issue. Here is a graph that is part of the Graphviz source that seems to do what you want: https://www.flickr.com/photos/kentbye/1155560169

Compute maximum edge length for subdividing mesh

I have two triangulated meshes m and m1 with the following properties:
m_faces: 16640
m_points: 49920
m_surface_area: 178.82696989147524
m1_faces: 8
m1_points: 24
m1_surface_area: 1.440205667851934
Now I would like to subdivide m1 so that it has approximately the same number of faces as m, i.e. 16640. I am using the vtk library, and more specifically the vtkAdaptiveSubdivisionFilter() function, which according to its description:
…is a filter that subdivides triangles based on maximum edge length and/or triangle area.
My question is how to compute the maximum edge length. By trial and error I found that it needs to be a value in the range [0.0188, 0.0265], which gives me 16384 faces. However, I couldn't find any formula that gives a number in this range and is consistent across different cases. Any idea how to calculate this maximum edge length each time?
On another example I have the following two meshes:
Sph1_faces: 390
Sph1_points: 1170
Sph1_surface_area: 1.9251713393584104
Sph2_faces: 1722
Sph2_points: 5166
Sph2_surface_area: 10.59400389764954
To get the number of faces of Sph1 close to that of Sph2, the maximum edge length should be in the range [0.089, 0.09], which gives me 1730 faces for Sph1.
I've tried using the area formula for an equilateral triangle (making the corresponding assumption), solving for the side length, and dividing by the number of faces or points, but it didn't seem to work. Any other ideas would be appreciated.
Thanks.

Find all the planar surfaces in an rgbd image using depth and normal data

Many questions deal with generating normal from depth or depth from normal, but I want to ask about a simple way to generate all the planar surfaces given the depth and normal of an image.
I already have depth and normal of each pixel in the image. For each pixel (ui, vi), assume that we can get its 3D coordinates (xi, yi, zi) with zi as the depth and normal vector (nix, niy, niz). Thus, a unique tangent plane is defined by: nix(x - xi) + niy(y - yi) + niz(z - zi) = 0. Then, for each pixel we can define a unique planar surface by the above equation.
What is a common practice for finding the function f such that f(u, v) = (x, y, z) (from pixel to 3D coordinates)? Is the pinhole model (plus the depth data) an effective and accurate one?
How does one generate all the planar surfaces efficiently? One way is to iterate through all the pixels in the image and find all the planes, but this seems like an inefficient method.
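For reference, the pixel-to-3D mapping f asked about is commonly written as follows under a pinhole model. This is only a sketch: the intrinsics fx, fy, cx, cy are hypothetical values, the depth is assumed to be the z coordinate along the optical axis (not the length of the viewing ray), and lens distortion is ignored.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) with depth z to camera-frame (x, y, z) under a pinhole model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Hypothetical intrinsics: focal lengths in pixels and principal point.
fx = fy = 500.0
cx, cy = 320.0, 240.0

# A pixel 100 columns to the right of the principal point, 2 units away.
point = backproject(420.0, 240.0, 2.0, fx, fy, cx, cy)
print(point)
```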
1. If it's a pinhole model
Make sure your 3D data is not distorted by projection.
2. Group your points by normal
This is easy or hard depending on the accuracy of the points/normals. Simply sort the points by normal, which takes O(n.log(n)), where n is the number of points.
3. Test/group by planes within a single normal group
The idea is to pick 3 points from a group, compute a plane from them, and test which points of the group belong to it. If the count is too low, you picked the wrong points (not belonging to the same plane) and need to pick different ones. Also, if the picked points are too close to each other or lie on the same line, you cannot get a correct plane from them.
The equation of the plane is:
x*nx + y*ny + z*nz + d = 0
where (nx,ny,nz) is the normal of the group (unit vector) and (x,y,z) is the point position. So you just compute d from a known point (one of the picked ones (x0,y0,z0) ) ...
d = -x0*nx -y0*ny -z0*nz
and then just test which points satisfy this condition:
threshold=1e-20; // just accuracy margin
fabs(x*nx + y*ny + z*nz + d) <= threshold
Now remove the matched points from the group (move them into the found plane object) and apply this step again to the remaining points until their count is low or no valid plane is found...
Then test another group until no groups are left...
I think RANSAC can speed things up and avoid brute force in this case, but I have never used it myself, so google it...
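The plane test within one normal group can be sketched in Python as below. This is a simplified illustration, not the answerer's code: it takes the group normal as given and uses a single seed point to compute d (rather than fitting from three points), and the threshold of 1e-6 and the sample points are assumptions.

```python
def plane_offset(normal, point):
    """Compute d so that x*nx + y*ny + z*nz + d = 0 holds at the given point."""
    return -(point[0] * normal[0] + point[1] * normal[1] + point[2] * normal[2])

def split_by_plane(points, normal, seed, threshold=1e-6):
    """Split points into those on the plane through seed (with the group normal) and the rest."""
    d = plane_offset(normal, seed)
    on_plane, rest = [], []
    for p in points:
        dist = abs(p[0] * normal[0] + p[1] * normal[1] + p[2] * normal[2] + d)
        if dist <= threshold:
            on_plane.append(p)
        else:
            rest.append(p)
    return on_plane, rest

# One normal group (all points share the unit normal (0, 0, 1)), hypothetical data.
normal = (0.0, 0.0, 1.0)
points = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0), (0.0, 0.0, 2.0), (3.0, 1.0, 2.0)]

# Repeatedly peel off one plane until no points remain in the group.
planes = []
remaining = points
while remaining:
    found, remaining = split_by_plane(remaining, normal, remaining[0])
    planes.append(found)
print(planes)
```

With noisy data the seed choice matters, which is where picking several points (or RANSAC) comes in.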
A possible approach for the planes is to consider the set of normal vectors and perform clustering on them (for instance by k-means). Then every cluster can correspond to several parallel surfaces. By evaluating the distance from the origin (a scalar function), you can form sub-clusters which will separate those surfaces. Finally, points at constant distance can belong to different coplanar patches, which you can separate by connected component labelling.
It is likely that clustering on the normal vectors and distance simultaneously (hence in a 4D space) will yield better results and be simpler. Be sure to normalize the vectors. Another option is to represent the vectors by just two parameters (such as spherical angles), but this will lead to a quite non-uniform mapping, and create phase wrapping issues.
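A rough sketch of the 4D idea, under stated assumptions: each point is turned into a feature (nx, ny, nz, d), where d is the distance of its tangent plane from the origin, and features are then grouped. A simple greedy threshold grouping stands in for k-means here to keep the example short; the tolerance and the sample data are hypothetical, and in practice the normal and distance components may need different scaling.

```python
def plane_features(points, normals):
    """Return one (nx, ny, nz, d) tuple per point, with a unit normal and d = n . p."""
    feats = []
    for p, n in zip(points, normals):
        length = sum(c * c for c in n) ** 0.5
        unit = tuple(c / length for c in n)  # normalize the normal vector
        d = sum(a * b for a, b in zip(unit, p))
        feats.append(unit + (d,))
    return feats

def greedy_cluster(feats, tol=0.05):
    """Label each feature with the first cluster representative within tol (Euclidean, 4D)."""
    reps, labels = [], []
    for f in feats:
        for i, r in enumerate(reps):
            if sum((a - b) ** 2 for a, b in zip(f, r)) ** 0.5 <= tol:
                labels.append(i)
                break
        else:
            # No existing cluster is close enough: start a new one.
            reps.append(f)
            labels.append(len(reps) - 1)
    return labels

points = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 0.0, 2.0), (1.0, 5.0, 0.0)]
normals = [(0.0, 0.0, 2.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
labels = greedy_cluster(plane_features(points, normals))
print(labels)
```

Points sharing a label lie on the same infinite plane; separating coplanar patches would still need the connected component labelling mentioned above.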

Python: How to find the MAXIMUM spanning tree of a graph [duplicate]

Does the opposite of Kruskal's algorithm for minimum spanning tree work for it? I mean, choosing the max weight (edge) every step?
Any other idea to find maximum spanning tree?
Yes, it does.
One method for computing the maximum weight spanning tree of a network G, due to Kruskal, can be summarized as follows.
1. Sort the edges of G into decreasing order by weight. Let T be the set of edges comprising the maximum weight spanning tree. Set T = ∅.
2. Add the first edge to T.
3. Add the next edge to T if and only if it does not form a cycle in T. If there are no remaining edges, exit and report G to be disconnected.
4. If T has n−1 edges (where n is the number of vertices in G), stop and output T. Otherwise go to step 3.
Source: https://web.archive.org/web/20141114045919/http://www.stats.ox.ac.uk/~konis/Rcourse/exercise1.pdf.
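The steps quoted above can be sketched in Python roughly as follows. The union-find cycle check, the (weight, u, v) edge format with vertices numbered 0..n-1, and the sample graph are assumptions made for this illustration.

```python
def maximum_spanning_tree(n, edges):
    """Kruskal with edges in decreasing weight order.

    edges: list of (weight, u, v) with vertices 0..n-1.
    Returns the list of tree edges, or None if the graph is disconnected.
    """
    parent = list(range(n))

    def find(x):
        # Find the component representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so it forms no cycle
            parent[ru] = rv
            tree.append((w, u, v))
            if len(tree) == n - 1:
                break
    return tree if len(tree) == n - 1 else None

edges = [(1, 0, 1), (4, 1, 2), (3, 2, 3), (2, 0, 3), (5, 0, 2)]
tree = maximum_spanning_tree(4, edges)
print(tree, sum(w for w, _, _ in tree))
```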
From Maximum Spanning Tree at Wolfram MathWorld:
"A maximum spanning tree is a spanning tree of a weighted graph having maximum weight. It can be computed by negating the weights for each edge and applying Kruskal's algorithm (Pemmaraju and Skiena, 2003, p. 336)."
If you invert the weight on every edge and minimize, do you get the maximum spanning tree? If that works you can use the same algorithm. Zero weights will be a problem, of course.
Although this thread is old, I have another approach for finding the maximum spanning tree (MST) of a graph G=(V,E).
We can apply a variant of Prim's algorithm to find the MST. For that I have to define the cut property for the maximum-weight edge.
Cut property: Say that at some point we have a set S containing the vertices that are already in the MST (for now assume it is calculated somehow). Now consider the set V\S (the vertices not yet in the MST):
Claim: The maximum-weight edge from S to V\S is in every MST (assuming it is the unique maximum; with ties, it is in some MST).
Proof: Say that at some point while adding vertices to S, the maximum-weight edge from S to V\S is e=(u,v), where u is in S and v is in V\S. Now consider an MST which does not contain e. Add the edge e to that MST; this creates a cycle. Traverse the cycle along the tree path from u to v and find vertices u' in S and v' in V\S such that u' is the last vertex in S before the path enters V\S and v' is the first vertex in V\S on that path.
Remove the edge e'=(u',v'). The resulting graph is still connected, and the weight of e is greater than that of e' [as e is the maximum-weight edge from S to V\S at this point], so this gives a spanning tree whose total weight is greater than that of the original MST. This is a contradiction, so the edge e must be in every MST.
Algorithm to find MST:
Start from S={s} //s is the start vertex
while S does not contain all vertices
do
{
among all edges e=(u,v) with u in S and v in V\S, pick one of maximum weight
add v to S and e to the MST
}
end while
Implementation:
We can implement this using a max heap/priority queue where the key is the maximum weight of an edge from a vertex in S to a vertex in V\S and the value is the vertex itself. Adding a vertex to S corresponds to an Extract_Max from the heap, and after every Extract_Max we change the keys of the vertices adjacent to the vertex just added.
So it takes at most m Change_Key operations and n Extract_Max operations.
Extract_Max and Change_Key can both be implemented in O(log n), where n is the number of vertices.
So this takes O(m log n) time, where m is the number of edges in the graph.
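A minimal Python sketch of this Prim-style approach is shown below. Python's heapq is a min-heap without Change_Key, so this version pushes negated weights and skips stale entries instead of updating keys; that substitution, the adjacency-list format, and the sample graph are assumptions of the sketch.

```python
import heapq

def prim_maximum_spanning_tree(adj, start):
    """adj: {vertex: [(weight, neighbor), ...]} for an undirected graph.

    Returns (total_weight, edges) for the component containing start.
    """
    in_tree = {start}
    # Negate weights so the min-heap pops the heaviest crossing edge first.
    heap = [(-w, start, v) for w, v in adj[start]]
    heapq.heapify(heap)
    total, edges = 0, []
    while heap and len(in_tree) < len(adj):
        neg_w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale entry: v was already reached by a heavier edge
        in_tree.add(v)
        total += -neg_w
        edges.append((u, v, -neg_w))
        for w, x in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (-w, v, x))
    return total, edges

adj = {
    0: [(1, 1), (2, 3), (5, 2)],
    1: [(1, 0), (4, 2)],
    2: [(4, 1), (3, 3), (5, 0)],
    3: [(3, 2), (2, 0)],
}
total, edges = prim_maximum_spanning_tree(adj, 0)
print(total, edges)
```

The lazy-deletion heap can hold up to m entries, which still gives O(m log m) = O(m log n) time.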
Let me provide an improvement algorithm:
First construct an arbitrary spanning tree (using BFS or DFS).
Then pick an edge outside the tree and add it to the tree; it will form a cycle. Drop the smallest-weight edge in the cycle.
Continue doing this until all the remaining edges have been considered.
Thus, we'll get the maximum spanning tree.
This tree satisfies the following: any edge outside the tree, if added, forms a cycle, and the weight of that outside edge is <= the weight of every edge in the cycle.
In fact, this is a necessary and sufficient condition for a spanning tree to be a maximum spanning tree.
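The exchange procedure just described can be sketched in Python as below. This is an unoptimized illustration that assumes a connected simple graph with vertices 0..n-1 and edges given as (u, v, weight) tuples; the helper names and the sample graph are hypothetical.

```python
def tree_path(tree_edges, src, dst):
    """Return the tree edges on the unique path from src to dst."""
    adj = {}
    for e in tree_edges:
        u, v, _ = e
        adj.setdefault(u, []).append((v, e))
        adj.setdefault(v, []).append((u, e))
    stack = [(src, None, [])]
    while stack:
        node, prev, path = stack.pop()
        if node == dst:
            return path
        for nxt, e in adj.get(node, []):
            if nxt != prev:  # in a tree, not stepping back is enough to avoid loops
                stack.append((nxt, node, path + [e]))
    return None

def improve_to_maximum_spanning_tree(n, edges):
    # Step 1: build an arbitrary spanning tree by a graph search from vertex 0.
    adj = {i: [] for i in range(n)}
    for e in edges:
        u, v, _ = e
        adj[u].append((v, e))
        adj[v].append((u, e))
    seen, stack, tree = {0}, [0], set()
    while stack:
        u = stack.pop()
        for v, e in adj[u]:
            if v not in seen:
                seen.add(v)
                tree.add(e)
                stack.append(v)
    # Step 2: for each edge outside the tree, swap it in if it beats the
    # lightest edge on the cycle it would close.
    for e in edges:
        if e in tree:
            continue
        u, v, w = e
        lightest = min(tree_path(tree, u, v), key=lambda t: t[2])
        if lightest[2] < w:
            tree.remove(lightest)
            tree.add(e)
    return tree

edges = [(0, 1, 1), (1, 2, 4), (2, 3, 3), (0, 3, 2), (0, 2, 5)]
result = improve_to_maximum_spanning_tree(4, edges)
print(sorted(result), sum(w for _, _, w in result))
```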
Pf.
Necessary: This is clearly necessary; otherwise we could swap edges to make a tree with a larger sum of edge weights.
Sufficient: Suppose tree T1 satisfies this condition, and T2 is a maximum spanning tree.
Then among the edges of T1 ∪ T2 there are T1-only edges, T2-only edges, and T1 ∩ T2 edges. If we add a T1-only edge (x1, xk) to T2, we know it will form a cycle, and we claim that in this cycle there must exist a T2-only edge that has the same weight as (x1, xk). Exchanging these edges then produces a tree with one more edge in common with T2 and the same sum of edge weights; repeating this, we get T2. So T1 is also a maximum spanning tree.
Proof of the claim:
Suppose it's not true. The cycle must contain a T2-only edge, since T1 is a tree. If none of the T2-only edges has a weight equal to that of (x1, xk), then each of the T2-only edges makes a loop with tree T1, so T1 has a loop, which is a contradiction.
This algorithm is taken from UTD professor R. Chandrasekaran's notes. You can refer here: Single Commodity Multi-terminal Flows
Negating the weights of the original graph and computing the minimum spanning tree on the negated graph will give the right answer. Here is why: for the same spanning tree in both graphs, the weighted sum in one graph is the negation of the weighted sum in the other. So the minimum spanning tree of the negated graph gives the maximum spanning tree of the original one.
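A small sketch of the negation trick, assuming a connected graph with edges given as (weight, u, v) and vertices 0..n-1: an ordinary minimum spanning tree routine (plain Kruskal here, written out only so the example is self-contained) is run on the negated weights, and the weights are negated back afterwards.

```python
def minimum_spanning_tree(n, edges):
    """Plain Kruskal in increasing weight order; edges are (weight, u, v)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

def maximum_spanning_tree_by_negation(n, edges):
    # Negate every weight, minimize, then restore the original weights.
    negated = [(-w, u, v) for w, u, v in edges]
    return [(-w, u, v) for w, u, v in minimum_spanning_tree(n, negated)]

edges = [(1, 0, 1), (4, 1, 2), (3, 2, 3), (2, 0, 3), (5, 0, 2)]
max_tree = maximum_spanning_tree_by_negation(4, edges)
print(max_tree, sum(w for w, _, _ in max_tree))
```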
Merely reversing the sort order and choosing the heaviest edge across a vertex cut does not guarantee a maximum spanning forest (Kruskal's algorithm generates a forest, not a tree). If all edges have negative weights, the spanning forest obtained from reversed Kruskal would still have negative total weight. However, the ideal answer is a forest of disconnected vertices, i.e. a forest of |V| singleton trees, or |V| components with a total weight of 0 (not the least negative).
Reverse the order of the weights (you can achieve this by negating each weight and adding a large number, whose purpose is to keep the weights non-negative). Then run your familiar greedy minimum spanning tree algorithm.

Minimum cost arborescence of a specific subset of vertices

I can use the Chu-Liu/Edmonds algorithm to get the minimum cost arborescence of a directed weighted graph. I want to apply this to a pre-specified subset of k vertices. (I know exactly which k vertices must be included in the tree.)
What are the steps required to apply the Chu-Liu/Edmonds algorithm?
Similar to Construct a minimum spanning tree covering a specific subset of the vertices, but for directed graphs / minimum cost arborescence (vs. undirected graphs / minimum spanning tree).

Resources