Color nodes by networkx - python-3.x

I am generating a network topology diagram through the data in a csv file where s0..s2 and c1..c3 are nodes of the diagram.
network.csv:
source,port,destination
s1,1,c3
s2,1,c1
s0,1,c2
s1,2,s2
s2,2,s0
I need all source nodes to be blue and all destination nodes to be green.
How can I do this without overriding the color of the source nodes (some nodes appear as both source and destination)?

Following is a working solution:
import csv
import networkx as nx
from matplotlib import pyplot as plt
with open('../resources/network.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    edges = {(row['source'], row['destination']) for row in reader }
print(edges) # {('s1', 'c3'), ('s1', 's2'), ('s0', 'c2'), ('s2', 's0'), ('s2', 'c1')}
G = nx.DiGraph()
source_nodes = set([edge[0] for edge in edges])
G.add_edges_from(edges)
for n in G.nodes():
    G.nodes[n]['color'] = 'b' if n in source_nodes else 'g'
pos = nx.spring_layout(G)
colors = [node[1]['color'] for node in G.nodes(data=True)]
nx.draw_networkx(G, pos, with_labels=True, node_color=colors)
plt.show()
We first read the csv into an edge list, which is later used to construct G. To define the colors unambiguously, we set each source node to blue and all remaining nodes to green (i.e., all destination nodes that are not also source nodes).
We also use nx.draw_networkx to get a more compact implementation for drawing the graph.
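The coloring rule itself can be checked without drawing anything. The following is only a minimal sketch using the standard library; the inlined CSV text and the helper name node_colors are assumptions made for illustration, not part of the original answer.

```python
import csv
import io

# Same rows as network.csv in the question, inlined so the sketch runs on its own.
CSV_TEXT = """source,port,destination
s1,1,c3
s2,1,c1
s0,1,c2
s1,2,s2
s2,2,s0
"""

def node_colors(csv_text):
    """Map every node to 'b' if it ever appears as a source, otherwise 'g'."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    sources = {row['source'] for row in rows}
    destinations = {row['destination'] for row in rows}
    # Sources win: a node that is both a source and a destination stays blue.
    return {n: ('b' if n in sources else 'g')
            for n in sorted(sources | destinations)}

colors = node_colors(CSV_TEXT)
print(colors)
```

Here s2 and s0 also appear as destinations, but they stay blue because membership in the source set is tested first.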
The result should be something like:


Deleting nodes while plotting 3D network using mayavi

I've a graph network created using Networkx and plotted using Mayavi.
After the graph is created, I'm deleting nodes with degree < 2, using G.remove_nodes_from(). Once the nodes are deleted, the edges connected to these nodes are deleted, but the nodes still appear in the final output (image below).
import matplotlib.pyplot as plt
from mayavi import mlab
import networkx as nx
import numpy as np
import pandas as pd
pos = [[0.1, 2, 0.3], [40, 0.5, -10],
       [0.1, -40, 0.3], [-49, 0.1, 2],
       [10.3, 0.3, 0.4], [-109, 0.3, 0.4]]
pos = pd.DataFrame(pos, columns=['x', 'y', 'z'])
ed_ls = [(x, y) for x, y in zip(range(0, 5), range(1, 6))]
G = nx.Graph()
G.add_edges_from(ed_ls)
remove = [node for node, degree in dict(G.degree()).items() if degree < 2]
G.remove_nodes_from(remove)
pos.drop(pos.index[remove], inplace=True)
print(G.edges)
nx.draw(G)
plt.show()
mlab.figure(1, bgcolor=bgcolor)
mlab.clf()
for i, e in enumerate(G.edges()):
    # ----------------------------------------------------------------------------
    # the x,y, and z co-ordinates are here
    pts = mlab.points3d(pos['x'], pos['y'], pos['z'],
                        scale_mode='none',
                        scale_factor=1)
    # ----------------------------------------------------------------------------
    pts.mlab_source.dataset.lines = np.array(G.edges())
    tube = mlab.pipeline.tube(pts, tube_radius=edge_size)
    mlab.pipeline.surface(tube, color=edge_color)
mlab.show() # interactive window
I'd like to ask for suggestions on how to remove the deleted nodes and the corresponding positions and display the rest in the output.
Secondly, I would like to know how to delete nodes and the edges connected to them interactively. For instance, to delete nodes of degree < 2 and their edges, I would first like to display an interactive graph with all nodes of degree < 2 highlighted. The user can then select the nodes to delete: clicking a highlighted node deletes the node and its connected edges.
EDIT:
I tried to remove the positions of the deleted nodes from the dataframe pos by including pos.drop(pos.index[remove], inplace=True) updated in the complete code posted above.
But I still don't get the correct output.
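One possible cause, offered as an assumption rather than a confirmed diagnosis: the edge list assigned to pts.mlab_source.dataset.lines is interpreted as row indices into the plotted points, so after dropping rows the remaining node labels (1..4) no longer match the point indices (0..3). A rough pure-Python sketch of pruning and relabeling follows; prune_and_relabel is a hypothetical helper name, and the positions are the ones from the code above.

```python
# Chain graph 0-1-2-3-4-5, as built by zip(range(0, 5), range(1, 6)) in the question.
edges = [(i, i + 1) for i in range(5)]
positions = {0: (0.1, 2, 0.3), 1: (40, 0.5, -10), 2: (0.1, -40, 0.3),
             3: (-49, 0.1, 2), 4: (10.3, 0.3, 0.4), 5: (-109, 0.3, 0.4)}

def prune_and_relabel(edges, positions, min_degree=2):
    """Drop nodes below min_degree (single pass) and renumber the rest 0..k-1."""
    degree = {n: 0 for n in positions}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    kept = [n for n in sorted(positions) if degree[n] >= min_degree]
    # Map old node labels to consecutive indices matching the kept position rows.
    new_index = {old: new for new, old in enumerate(kept)}
    kept_positions = [positions[n] for n in kept]
    kept_edges = [(new_index[a], new_index[b])
                  for a, b in edges if a in new_index and b in new_index]
    return kept, kept_positions, kept_edges

kept, kept_positions, kept_edges = prune_and_relabel(edges, positions)
print(kept)        # original labels that survive
print(kept_edges)  # edges expressed as indices into kept_positions
```

The relabeled edges and the filtered positions could then be passed to the plotting calls in place of G.edges() and pos.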
Here is a solution for interactive removal of network nodes and edges in Mayavi
(I think matplotlib might be sufficient and easier but anyways...).
The solution is inspired by this Mayavi example.
However, the example is not directly transferable: a glyph (used to visualize a node) consists of many points, and when each glyph/node is plotted by itself, the point_id cannot be used to identify the glyph/node. Moreover, the example does not include an option to hide/delete objects. To avoid these problems, I used four ideas:
1. Each node/edge is plotted as a separate object, so it is easier to adjust its (visibility) properties.
2. Instead of deleting nodes/edges, they are just hidden when clicked upon. Moreover, clicking twice makes the node visible again (this does not work for the edges with the code below, but you could implement that if required; it just needs keeping track of the visible nodes). The visible nodes can be collected at the end (see code below).
3. As in the example, the mouse position is captured using a picker callback. But instead of using the point_id of the closest point, its coordinates are used directly.
4. The node to be deleted/hidden is found by computing the minimum Euclidean distance between the mouse position and all nodes.
PS: In your original code, the for-loop is quite redundant because it plots all nodes and edges many times on top of each other.
Hope that helps!
# import modules
from mayavi import mlab
import numpy as np
import pandas as pd
import networkx as nx
# set number of nodes
number = 6
# create random node positions
np.random.seed(5)
pos = 100*np.random.rand(6, 3)
pos = pd.DataFrame(pos, columns=['x', 'y', 'z'])
# create chain graph links
links = [(x, y) for x, y in zip(range(0, number-1), range(1, number))]
# create graph (not strictly needed, link list above would be enough)
graph = nx.Graph()
graph.add_edges_from(links)
# setup mayavi figure
figure = mlab.gcf()
mlab.clf()
# add nodes as individual glyphs
# store glyphs in dictionary to allow interactive adjustments of visibility
color = (0.5, 0.0, 0.5)
nodes = dict()
texts = dict()
for ni, n in enumerate(graph.nodes()):
    xyz = pos.loc[n]
    n = mlab.points3d(xyz['x'], xyz['y'], xyz['z'], scale_factor=5, color=color)
    label = 'node %s' % ni
    t = mlab.text3d(xyz['x'], xyz['y'], xyz['z']+5, label, scale=(5, 5, 5))
    # each glyph consists of many points
    # arr = n.glyph.glyph_source.glyph_source.output.points.to_array()
    nodes[ni] = n
    texts[ni] = t
# add edges as individual tubes
edges = dict()
for ei, e in enumerate(graph.edges()):
    xyz = pos.loc[np.array(e)]
    edges[ei] = mlab.plot3d(xyz['x'], xyz['y'], xyz['z'], tube_radius=1, color=color)
# define picker callback for figure interaction
def picker_callback(picker):
    # get coordinates of mouse click position
    cen = picker.pick_position
    # compute Euclidean distance between mouse position and all nodes
    dist = np.linalg.norm(pos-cen, axis=1)
    # get closest node
    ni = np.argmin(dist)
    # hide/show node and text
    n = nodes[ni]
    n.visible = not n.visible
    t = texts[ni]
    t.visible = not t.visible
    # hide/show edges
    # must be adjusted if double-clicking should hide/show both nodes and edges in a reasonable way
    for ei, edge in enumerate(graph.edges()):
        if ni in edge:
            e = edges[ei]
            e.visible = not e.visible
# add picker callback
picker = figure.on_mouse_pick(picker_callback)
picker.tolerance = 0.01
# show interactive window
# mlab.show()
# collect visibility/deletion status of nodes, e.g.
# [(0, True), (1, False), (2, True), (3, True), (4, True), (5, True)]
[(key, node.visible) for key, node in nodes.items()]
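The nearest-node lookup used in the picker callback can be tried in isolation, without a Mayavi window. This is only a small NumPy sketch; the function name closest_node and the sample coordinates are made up for illustration.

```python
import numpy as np

def closest_node(node_positions, click_position):
    """Return the row index of the node nearest to the clicked 3D position."""
    pts = np.asarray(node_positions, dtype=float)
    click = np.asarray(click_position, dtype=float)
    # Euclidean distance from the click to every node (one distance per row).
    dist = np.linalg.norm(pts - click, axis=1)
    return int(np.argmin(dist))

# Hypothetical node coordinates and click position.
positions = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 5.0]]
picked = closest_node(positions, (9.0, 1.0, 0.5))
print(picked)
```

Note that this always returns some node, even for a click far away from every node; a distance threshold could be added if stray clicks should be ignored.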

How to draw half-filled nodes in a networkx graph?

I am drawing a graph using the NetworkX library where I want semi-circular nodes.
The node_shape attribute in nx.draw_networkx_nodes refers to the matplotlib.scatter marker specifications, but there is no option for a half-filled circle. Moreover, matplotlib.lines has the attribute fillStyles, but I am confused about how to use it in my code.
nx.draw_networkx_nodes(G,pos,
                       nodelist = nodes.keys(),
                       node_size = [n for n in nodes.values()],
                       node_color = '#78CCF0',
                       node_shape = '.',
                       alpha = 0.77)
Here's a quick look: https://imgur.com/a/wsyQls3
import matplotlib.markers
import matplotlib.pyplot as plt
import networkx as nx
G=nx.dodecahedral_graph()
nodes=nx.draw_networkx_nodes(G,pos=nx.spring_layout(G),
                             node_shape=matplotlib.markers.MarkerStyle(marker='o',
                                                                        fillstyle='top'))
plt.show()
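If a second color is wanted for the unfilled half, one option is to draw the node markers with plain matplotlib instead, since Line2D supports both fillstyle and markerfacecoloralt. This is only a sketch under assumptions: the coordinates below are hypothetical stand-ins for the values of a networkx layout dict, and edges would still have to be drawn separately.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical node coordinates (in practice, taken from a layout such as spring_layout).
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 0.0]

fig, ax = plt.subplots()
(markers,) = ax.plot(
    xs, ys,
    linestyle='none',             # markers only, no connecting line
    marker='o',
    markersize=25,
    fillstyle='top',              # fill only the top half with the main color
    markerfacecolor='#78CCF0',
    markerfacecoloralt='white',   # color of the other half
    markeredgecolor='black',
)
ax.margins(0.3)
fig.canvas.draw()
```

Unlike scatter, plot uses a single marker size for all points, so per-node sizes would need one plot call per node.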

louvain community detection in complete weighted networks returns only 1 partition

Referring to : https://stackoverflow.com/a/44907357/305883
I am using the python-louvain implementation to detect communities in a complete weighted graph.
But I only get one partition, containing all nodes.
Code:
import community # this is pip install python-louvain
import networkx as nx
import matplotlib.pyplot as plt
# Replace this with your networkx graph loading depending on your format !
# using graph g as a completed graph, weights between 0 and 1
#first compute the best partition
partition = community.best_partition(g)
#drawing
size = float(len(set(partition.values())))
pos = nx.spring_layout(g)
count = 0.
for com in set(partition.values()) :
    count = count + 1.
    list_nodes = [nodes for nodes in partition.keys() if partition[nodes] == com]
    nx.draw_networkx_nodes(g, pos, list_nodes, node_size = 20, node_color = str(count / size))
nx.draw_networkx_edges(g, pos, alpha=0.1)
plt.show()
I would like to extract communities from a complete weighted network.
I also tried girvan_newman (https://networkx.github.io/documentation/networkx-2.0/reference/algorithms/generated/networkx.algorithms.community.centrality.girvan_newman.html) but could only detect 2 communities out of a complete graph of 200 nodes (with 198 and 2 nodes).
Is Louvain working correctly to detect communities in a complete graph?
Better suggestions?
It is possible that the model selection used in this case returns a single block containing all nodes, which would mean that there is not enough statistical evidence for more blocks.
You could try Peixoto's graph-tool package, which has an implementation of the weighted stochastic block model.
If you have a weighted network you need to use the weight='weight' argument:
import networkx as nx
import community
import numpy as np
np.random.seed(0)
W = np.random.rand(15,15)
np.fill_diagonal(W,0.0)
G = nx.from_numpy_array(W)
louvain_partition = community.best_partition(G, weight='weight')
modularity2 = community.modularity(louvain_partition, G, weight='weight')
print("The modularity Q based on networkx is {}".format(modularity2))
The modularity Q based on networkx is 0.0849022950503318
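To get a feeling for what the weights contribute, the standard weighted (Newman) modularity can be computed directly with NumPy. This is a sketch of the textbook formula, not python-louvain's own code path; the function name weighted_modularity and the toy weight matrix are assumptions for illustration.

```python
import numpy as np

def weighted_modularity(W, labels):
    """Q = (1/2m) * sum_ij (W_ij - k_i*k_j/2m) * [c_i == c_j] for a symmetric W."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    strength = W.sum(axis=1)          # weighted degree k_i
    two_m = W.sum()                   # 2m: every edge is counted twice in W
    same = labels[:, None] == labels[None, :]
    expected = np.outer(strength, strength) / two_m
    return float(((W - expected) * same).sum() / two_m)

# Toy complete graph: two groups of 3 nodes, strong weights inside, weak across.
n = 6
groups = np.array([0, 0, 0, 1, 1, 1])
W = np.where(groups[:, None] == groups[None, :], 1.0, 0.1)
np.fill_diagonal(W, 0.0)

q_single = weighted_modularity(W, np.zeros(n, dtype=int))  # everything in one block
q_two = weighted_modularity(W, groups)                     # the two planted groups
print(q_single, q_two)
```

A single all-nodes partition always scores Q = 0, so any partition that Louvain prefers over it must have positive modularity under the given weights. If the weights are ignored, or are nearly uniform, a complete graph offers little structure to find.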

Color of the node of tree with graphviz using class_names

Expanding on a prior question:
Changing colors for decision tree plot created using export graphviz
How would I color the nodes of the tree based on the dominant class (species of iris), instead of a binary distinction? This should require a combination of iris.target_names, the strings describing the classes, and iris.target, the class labels.
import pydotplus
from sklearn.datasets import load_iris
from sklearn import tree
import collections
clf = tree.DecisionTreeClassifier(random_state=42)
iris = load_iris()
clf = clf.fit(iris.data, iris.target)
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data)
nodes = graph.get_node_list()
edges = graph.get_edge_list()
colors = ('brown', 'forestgreen')
edges = collections.defaultdict(list)
for edge in graph.get_edge_list():
    edges[edge.get_source()].append(int(edge.get_destination()))
for edge in edges:
    edges[edge].sort()
    for i in range(2):
        dest = graph.get_node(str(edges[edge][i]))[0]
        dest.set_fillcolor(colors[i])
graph.write_png('tree.png')
The code from the example looks so familiar and is therefore easy to modify :)
For each node, Graphviz tells us how many samples from each group we have, i.e. whether it is a mixed population or the tree came to a decision. We can extract this information and use it to derive a color.
values = [int(ii) for ii in node.get_label().split('value = [')[1].split(']')[0].split(',')]
Alternatively you can map the GraphViz nodes back to the sklearn nodes:
values = clf.tree_.value[int(node.get_name())][0]
We only have 3 classes, so each one gets its own color (red, green, blue), mixed populations get mixed colors according to their distribution.
values = [int(255 * v / sum(values)) for v in values]
color = '#{:02x}{:02x}{:02x}'.format(values[0], values[1], values[2])
We can now see the separation nicely, the greener it gets the more of the 2nd class we have, same for blue and the 3rd class.
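The count-to-color mapping described above can be tried on its own, without building a tree. The sketch below uses only the standard library; the helper names and the sample label string are made up for illustration, and the parser uses float() on the assumption that some scikit-learn versions may print non-integer values in the label.

```python
def parse_value_label(label):
    """Extract the numbers inside 'value = [...]' from a node label string."""
    inner = label.split('value = [')[1].split(']')[0]
    return [float(x) for x in inner.split(',')]

def class_counts_to_hex(values):
    """Map three class counts to an RGB hex color weighted by their share."""
    if len(values) != 3:
        raise ValueError('this sketch handles exactly three classes')
    total = sum(values)
    if total <= 0:
        raise ValueError('class counts must sum to a positive number')
    r, g, b = (int(255 * v / total) for v in values)
    return '#{:02x}{:02x}{:02x}'.format(r, g, b)

# Hypothetical label text in the style produced by export_graphviz.
label = '<gini = 0.168<br/>samples = 54<br/>value = [0, 49, 5]>'
counts = parse_value_label(label)
mixed = class_counts_to_hex(counts)
pure = class_counts_to_hex([50, 0, 0])
print(mixed, pure)
```

A pure first-class node comes out fully red, while the mixed node is mostly green with a little blue.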
import pydotplus
from sklearn.datasets import load_iris
from sklearn import tree
clf = tree.DecisionTreeClassifier(random_state=42)
iris = load_iris()
clf = clf.fit(iris.data, iris.target)
dot_data = tree.export_graphviz(clf,
                                feature_names=iris.feature_names,
                                out_file=None,
                                filled=True,
                                rounded=True,
                                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data)
nodes = graph.get_node_list()
for node in nodes:
    if node.get_label():
        values = [int(ii) for ii in node.get_label().split('value = [')[1].split(']')[0].split(',')]
        values = [int(255 * v / sum(values)) for v in values]
        color = '#{:02x}{:02x}{:02x}'.format(values[0], values[1], values[2])
        node.set_fillcolor(color)
graph.write_png('colored_tree.png')
A general solution for more than 3 classes, which colors only the final (pure) nodes:
import numpy
colors = ('lightblue', 'lightyellow', 'forestgreen', 'lightcoral', 'white')
for node in nodes:
    if node.get_name() not in ('node', 'edge'):
        values = clf.tree_.value[int(node.get_name())][0]
        # color only nodes where only one class is present
        if max(values) == sum(values):
            node.set_fillcolor(colors[numpy.argmax(values)])
        # mixed nodes get the default color
        else:
            node.set_fillcolor(colors[-1])
Great answers, guys. Just to add to @Maximilian Peters's answer: another way to identify leaf nodes for specific coloration is to check the split criteria (threshold) values. Leaf nodes have no child nodes and therefore no split criterion either.
https://github.com/scikit-learn/scikit-learn/blob/a24c8b464d094d2c468a16ea9f8bf8d42d949f84/sklearn/tree/_tree.pyx
TREE_UNDEFINED = -2
thresholds = clf.tree_.threshold
for node in nodes:
    if node.get_name() not in ('node', 'edge'):
        value = clf.tree_.value[int(node.get_name())][0]
        # color only nodes where only one class is present or if it is a leaf
        # node
        if (max(value) == sum(value) or
                thresholds[int(node.get_name())] == TREE_UNDEFINED):
            node.set_fillcolor(colors[numpy.argmax(value)])
        # mixed nodes get the default color
        else:
            node.set_fillcolor(colors[-1])
Not completely related to the question, but adding some more info in case it is helpful to others.
Continuing on this idea of understanding the decision stumps of a tree-based classifier, Skater has added support to summarize all forms of tree-based models using tree surrogates. Check out the examples here.
https://github.com/datascienceinc/Skater/blob/master/examples/rule_list_notebooks/explanation_using_tree_surrogate.ipynb

Multiple heatmaps with fixed grid size

I am using seaborn (v.0.7.1) together with matplotlib (v.1.5.1) and pandas (v.0.18.1) to plot clusters of data of different sizes as heat maps within a for loop, as shown in the following code.
My issue is that since each cluster contains a different number of rows, the final figures are of different sizes (i.e. the height and width of each box in the heat map differ across heat maps) (see figures). Eventually, I would like the boxes to have the same size in all figures.
I have checked some parts of the seaborn and matplotlib documentation as well as stackoverflow, but since I do not know the exact keywords to look for (as evident in the question title itself), I have not been able to find an answer. [EDIT: Now I have updated the title based on a suggestion from @ImportanceOfBeingErnest. Previously the title read: "Enforcing the same width across multiple plots".]
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
clusters = pd.DataFrame([(1,'aaaaaaaaaaaaaaaaa'),(1,'b'), (1,'c'), (1,'d'), (2,'e'), (2,'f')])
clusters.columns = ['c', 'p']
clusters.set_index('c', inplace=True)
g = pd.DataFrame(np.ones((6,4)))
c= pd.DataFrame([(1,'aaaaaaaaaaaaaaaaa'),(2,'b'), (3,'c'), (4,'d'), (5,'e'), (6,'f')])
c.columns = ['i', 'R']
for i in range(1,3,1):
    ee = clusters[clusters.index==i].p
    inds = []
    for v in ee:
        inds.append(np.where(c.R.values == v)[0][0])
    f, ax = plt.subplots(1, figsize=(13, 15))
    ax = sns.heatmap(g.iloc[inds], square=True, ax=ax, cbar=True, linewidths=2, linecolor='k', cmap="Reds", cbar_kws={"shrink": .5},
                     vmin = math.floor(g.values.min()), vmax =math.ceil(g.values.max()))
    null = ax.set_xticklabels(['a', 'b', 'c', 'd'], fontsize=15)
    null = ax.set_yticklabels(c.R.values[inds][::-1], fontsize=15, rotation=0)
    plt.tight_layout(pad=3)
[EDIT]: Now I have added some code to create a minimal, functional example as suggested by @Brian. I have also noticed that the issue might have been caused by the text!
Under the following two conditions, the solution is rather straightforward:
1. Only the squares in the saved images need to have the same size, and we don't care about the plot on screen.
2. We can omit the colorbar.
One would define the size that one square should have in the final image (squaresize = 50), find out the number of squares to draw in each dimension (n, m), and adjust the figure size as
figwidth = m*squaresize/float(dpi)
figheight = n*squaresize/float(dpi)
where dpi denotes the pixels per inch.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
dpi=100
squaresize = 50 # pixels
n = 3
m = 4
data = np.random.rand(n,m)
figwidth = m*squaresize/float(dpi)
figheight = n*squaresize/float(dpi)
f, ax = plt.subplots(1, figsize=(figwidth, figheight), dpi=dpi)
f.subplots_adjust(left=0, right=1, bottom=0, top=1)
ax = sns.heatmap(data, square=True, ax=ax, cbar=False)
plt.savefig(__file__+".png", dpi=dpi, bbox_inches="tight")
The bbox_inches="tight" makes sure that the labels etc. are still drawn (i.e. the final figure size will be larger than the one calculated here, depending on how much space the labels need).
To apply this example to your case, you'd still need to find out how many rows and columns each heatmap has, depending on the dataframe, but as I don't know its structure, it's hard to provide a general solution.
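As a rough illustration of that last step, the figure size can be derived per cluster from the number of rows and columns. The helper name figsize_for_grid is made up here, and the shapes are assumed from the minimal example in the question (cluster 1 has 4 rows, cluster 2 has 2 rows, and g has 4 columns).

```python
def figsize_for_grid(n_rows, n_cols, squaresize=50, dpi=100):
    """Return (width, height) in inches so that each cell is squaresize pixels."""
    figwidth = n_cols * squaresize / float(dpi)
    figheight = n_rows * squaresize / float(dpi)
    return (figwidth, figheight)

# Assumed (rows, columns) of the heatmap drawn for each cluster in the question.
cluster_shapes = {1: (4, 4), 2: (2, 4)}

sizes = {cluster: figsize_for_grid(rows, cols)
         for cluster, (rows, cols) in cluster_shapes.items()}
print(sizes)
```

Each tuple would then be passed as figsize to plt.subplots for the corresponding cluster, with the same dpi used again in plt.savefig.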
