Reproduce same graph in NetworkX - python-3.x

I would like to improve my graph.
The problems are as follows:
How to create a consistent graph. The graph is not consistent: every time I run the code, it generates a different image. The inconsistent graph is shown in the url.
How to customize the size of the whole graph / picture and make it bigger.
How to set a permanent position for node 'a' so that it consistently appears in the first / top position.
How to customize the length of the arrow for each relationship.
I would appreciate any notes or advice.
This is my code:
import networkx as nx
Unique_liss = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm']
# node names must match Unique_liss exactly ('c ' and 'i ' with a trailing space would create extra nodes)
edgesList = [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('d', 'f'), ('e', 'g'), ('f', 'g'), ('g', 'h'), ('h', 'i'), ('i', 'j'), ('j', 'k'), ('j', 'l'), ('k', 'm'), ('l', 'm')]
g = nx.DiGraph()
g.add_nodes_from(Unique_liss)
g.add_edges_from(edgesList)
nx.to_pandas_adjacency(g)
G = nx.DiGraph()
for node in edgesList:
    G.add_edge(*node, sep=',')
A = nx.adjacency_matrix(G).toarray()
nx.draw(G, with_labels=True, node_size=2000,
        node_color='skyblue')

In order to have deterministic node layouts, you can use one of NetworkX's layouts, which allow you to specify a seed. Here's an example using nx.spring_layout for the above graph:
from matplotlib import pyplot as plt
seed = 31
pos = nx.spring_layout(G, seed=seed)
plt.figure(figsize=(10,6))
nx.draw(G, pos=pos, with_labels=True, node_size = 1500, node_color = 'skyblue')
You'll get the exact same layout if you re-run the above.
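As a quick check of the claim above, here is a minimal sketch that computes the layout twice with the same seed and compares the coordinates. The shortened edge list and the names pos_first, pos_second and same are only for illustration.

```python
import networkx as nx

# A shortened version of the graph from the question (illustrative only)
edges = [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('d', 'f')]
G = nx.DiGraph(edges)

# Two independent layout computations with the same seed
pos_first = nx.spring_layout(G, seed=31)
pos_second = nx.spring_layout(G, seed=31)

# Compare the coordinates node by node
same = all((pos_first[n] == pos_second[n]).all() for n in G)
print(same)
```

If the seed is omitted, the two layouts will generally differ, which is the behaviour described in the question.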
In order to customize the graph size you have several options. The simplest one is setting the figure size with plt.figure(figsize=(x,y)), as above. You can also control the size of the graph within the figure using the scale parameter of nx.spring_layout.
As for the last point, it looks like you cannot set specific arrow sizes for each edge. From the docs, what you have is:
arrowsize : int, optional (default=10)
So you can only set this value to an int, which will result in an equal size for all edge arrows.
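The answer above does not cover pinning node 'a'. One possible approach, sketched here under assumptions, is to pass an initial position for 'a' through the pos argument of nx.spring_layout and list it in fixed, so that the layout algorithm leaves it where it is. The coordinate (0, 1) is an arbitrary choice, and the other nodes are not guaranteed to end up below it.

```python
import networkx as nx

edges = [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('d', 'f'),
         ('e', 'g'), ('f', 'g'), ('g', 'h'), ('h', 'i'), ('i', 'j'),
         ('j', 'k'), ('j', 'l'), ('k', 'm'), ('l', 'm')]
G = nx.DiGraph(edges)

# Hand-picked coordinate for 'a' (an assumption, not part of the question)
initial_pos = {'a': (0.0, 1.0)}

# Nodes listed in `fixed` keep their initial position; the rest are laid out around them
pos = nx.spring_layout(G, pos=initial_pos, fixed=['a'], seed=31)
print(pos['a'])
```

The resulting pos dictionary can be passed to nx.draw(G, pos=pos, ...) exactly as in the seeded example.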

For anyone who doesn't get to read the comments on the answers: as @amj mentions in the comments, on top of @yatu's response above, you also need to set the seed in NumPy.
import numpy as np
seed = 31
np.random.seed(seed)

Related

What is the best way to calculate centralities for a single node in networkx?

I am able to calculate different kinds of centralities such as degree, betweenness, closeness, and eigenvector for all the nodes in graph G. For instance, this code calculates the betweenness centrality for all of the included nodes of graph G:
import networkx as nx
# build up a graph
G = nx.Graph()
G.add_nodes_from(['A', 'B', 'C', 'D', 'E'])
G.add_edges_from([('A', 'B'), ('B','C'), ('C', 'D'), ('D', 'E')])
bw_centrality = nx.betweenness_centrality(G, normalized=True)
print(bw_centrality)
For large networks, it is very time consuming to calculate some of the centralities, such as betweenness and closeness. Therefore, I would like to calculate the centrality of only a subset of nodes instead of all of them. In the example above, how can I calculate the betweenness of node A using the NetworkX library in Python?
I found a solution for calculating the closeness centrality of a single node in a graph, but not for betweenness. Let's first calculate a node's closeness centrality. Here is how the closeness of all the nodes can be calculated:
import networkx as nx
# build up a graph
G = nx.Graph()
G.add_nodes_from(['A', 'B', 'C', 'D', 'E'])
G.add_edges_from([('A', 'B'), ('B','C'), ('C', 'D'), ('D', 'E')])
cc_centrality = nx.closeness_centrality(G)
print(cc_centrality)
The above code produces the following result:
{'A': 0.4, 'B': 0.5714285714285714, 'C': 0.6666666666666666, 'D': 0.5714285714285714, 'E': 0.4}
In the next step, we will calculate the closeness of node A separately. According to the NetworkX source code, closeness centrality has the following signature:
closeness_centrality(G, u=None, distance=None, wf_improved=True)
Here G is the graph and u is the single node whose closeness you want to compute. To calculate the closeness of node A, call:
nx.closeness_centrality(G, u='A')
The result is 0.4. Similarly, nx.closeness_centrality(G, u='B') gives you 0.5714285714285714.
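To cover the "subset of nodes" part of the question, the u argument can be called once per node of interest. This is a minimal sketch; the names subset and closeness_subset are illustrative.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E')])

# Only the nodes we care about
subset = ['A', 'C']

# One call per node, using the `u` argument shown above
closeness_subset = {node: nx.closeness_centrality(G, u=node) for node in subset}
print(closeness_subset)
```

Each call only needs the shortest paths from that one node, so the cost grows with the size of the subset rather than with the number of nodes in the graph.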
NetworkX has introduced a form of centrality called group centrality, which calculates the centrality of a group of nodes. For example, if you ask for the centrality of 3 nodes, it returns a single number indicating the combined centrality of those 3 nodes. To calculate the centrality of a single node, treat that node as a group of size one. Based on the above question, we can calculate the betweenness centrality of all nodes as follows:
G = nx.Graph()
G.add_nodes_from(['A', 'B', 'C', 'D', 'E'])
G.add_edges_from([('A', 'B'), ('B','C'), ('C', 'D'), ('D', 'E')])
bw_centrality = nx.betweenness_centrality(G, normalized=True)
print(bw_centrality)
The code above produces the following result:
{'A': 0.0, 'B': 0.5, 'C': 0.6666666666666666, 'D': 0.5, 'E': 0.0}
Now, we use the group centrality feature to calculate node C's centrality, as follows:
nx.group_betweenness_centrality(G, ['C'])
The betweenness of node C calculated this way is 0.6666666666666666. Visit the following link for more information:
https://networkx.org/documentation/stable/reference/algorithms/centrality.html
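As a small sanity check of the approach above, this sketch compares the single-node group value for C with the value from the full betweenness computation. The variable names are illustrative, and it has not been checked here whether the single-node call is actually faster on large graphs.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E')])

# Betweenness of every node, as in the question
full = nx.betweenness_centrality(G, normalized=True)

# Betweenness of node C alone, treated as a group of size one
single_c = nx.group_betweenness_centrality(G, ['C'])

print(full['C'], single_c)
```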

Coloring sns barplot based on condition from another dataframe

I am using the following code to generate a barplot from a dataframe df1(x,y) as below. (For simplicity I have added sample values in the chart code itself.)
sns.barplot(x=['A','B','C','D','E','F','G'],y=[10,40,20,5,60,30,80],palette='Blues_r')
This generates a beautiful chart with shades of blue color in descending order for bars A to G.
However, I need the colors to be in the order determined based on another dataframe df2 where there are values against A to G. I do not wish to change the order of A to G in this chart, so sorting the dataframe df1 based on values of df2 will not work for me.
So, say df2 is like this:
A 90
B 70
C 40
D 30
E 30
F 20
G 80
Notice that df2 can contain equal values (D and E), in which case I do not care whether D and E get the same color or adjacent colors from the palette, but no other bar should have a color in between those of D and E. That is, I need the bars to run from A to G (fixed order), while the colors follow the order of the df2 values.
How do we do this?
You can use hue= with the values of the second dataframe. You'll also need dodge=False to tell Seaborn that you want a full bar per x-position.
import seaborn as sns
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'x': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
'y': [10, 40, 20, 5, 60, 30, 80]})
df2 = pd.DataFrame({'x': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
'y': [90, 70, 40, 30, 30, 20, 80]})
sns.barplot(data=df1, x='x', y='y', palette='Blues_r', hue=df2['y'], dodge=False, legend=False)
Note that this uses the values in df2['y'] to make the relative coloring. If you just want to use the order, use the rank of each value, e.g. df2['y'].rank(method='dense'). (np.argsort(df2['y']) returns the indices that would sort the array, not the rank of each element, so it does not give the intended coloring.)
ax = sns.barplot(data=df1, x='x', y='y', palette='Blues_r', hue=df2['y'].rank(method='dense'), dodge=False)
ax.legend_.remove() # remove the legend, which consists of the rank values
You can try vanilla plt.bar:
import matplotlib.pyplot as plt
x = ['A','B','C','D','E','F','G']
y = [10,40,20,5,60,30,80]
# assuming the columns of df2 are 'x' and 'color'
# (a colormap expects floats in the range 0 to 1)
colors = df2.set_index('x').loc[x, 'color']
cmap = plt.get_cmap('Blues_r')
plt.bar(x, y, color=[cmap(c) for c in colors])
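The plt.bar snippet above assumes that df2 already has a 'color' column. As a sketch of how such values could be derived from the df2 values in the question, the following uses matplotlib.colors.Normalize to scale them to the range 0 to 1 before passing them to the colormap. The names values, norm and colors are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import Normalize

x = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
y = [10, 40, 20, 5, 60, 30, 80]
df2 = pd.DataFrame({'x': x, 'y': [90, 70, 40, 30, 30, 20, 80]})

# Look up the df2 value for each bar, in the plotting order
values = df2.set_index('x').loc[x, 'y']

# Scale the values linearly to [0, 1] so the colormap can consume them
norm = Normalize(vmin=values.min(), vmax=values.max())
cmap = plt.get_cmap('Blues_r')
colors = [cmap(norm(v)) for v in values]

fig, ax = plt.subplots()
ax.bar(x, y, color=colors)
```

Bars with equal df2 values (D and E here) receive the same color, and the bar order A to G is untouched.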

Determine all the possible combinations between the main elements of the parent list

I'm working on designing a dataset and I'm facing an issue with a specific part of it. I provided the example below to simplify and relate to my issue.
I have a list of lists
list = ['b',['c','g','d'],['h','l']]
and I'm interested in a general solution to determine all the possible combinations between the main elements of the parent list
Solution needed:
['b','c','h']
['b','c','l']
['b','g','h']
['b','g','l']
['b','d','h']
['b','d','l']
You can use itertools.product():
import itertools
my_list = ['b', ['c','g','d'], ['h','l']]
print(list(itertools.product(*my_list)))
output:
[('b', 'c', 'h'), ('b', 'c', 'l'), ('b', 'g', 'h'),
('b', 'g', 'l'), ('b', 'd', 'h'), ('b', 'd', 'l')]
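One caveat with the call above: itertools.product iterates over each element, so the bare string 'b' only works because it is a single character. A longer string such as 'bx' would be split into 'b' and 'x'. A minimal sketch that wraps bare strings in a list first (the helper name combos and the sample value 'bx' are illustrative):

```python
import itertools

def combos(items):
    # Wrap bare strings so product() does not iterate over their characters
    pools = [[item] if isinstance(item, str) else item for item in items]
    # Return lists to match the format requested in the question
    return [list(combo) for combo in itertools.product(*pools)]

result = combos(['bx', ['c', 'g', 'd'], ['h', 'l']])
for combo in result:
    print(combo)
```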

Label and color glyph in bokeh

I am trying out bokeh. It's quite fun so far. But I am not totally getting the hang of it. My goal is to make a simple but interactive scatter chart.
I have three main issues:
I want to label the scatter plot with names
I want the scatter to be colored in accordance to colors
I would love widgets where I can decide if the colors and names are displayed.
Here is what I have done so far. I tried to use LabelSet but I am stuck. Any help is greatly appreciated!
# interactive widget bokeh figure
from bokeh.io import curdoc
from bokeh.layouts import row, widgetbox
from bokeh.models import ColumnDataSource
from bokeh.models.widgets import Slider, TextInput
from bokeh.plotting import figure
from bokeh.models import Range1d, LabelSet, Label
import numpy as np
# data
x = [-4, 3, 2, 4, 10, 11, -2, 6]
y = [-3, 2, 2, 9, 11, 12, -5, 6]
names = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
colors =['r', 'y', 'y', 'r', 'g', 'g', 'g', 'g']
p = figure(plot_height=400, plot_width=400, title="a little interactive chart",
           tools="crosshair,pan,reset,save,wheel_zoom",
           x_range=[-10, 10], y_range=[-10, 10])
labels = LabelSet(x='x', y='y', text='names', level='glyph',
                  x_offset=5, y_offset=5)
p.add_layout(labels)
p.circle(x, y, fill_color="red", line_color="red", size=6)
# Set up widgets
text = TextInput(title="title", value='a little interactive chart')
# Set up callbacks
def update_title(attrname, old, new):
    p.title.text = text.value
text.on_change('value', update_title)
# # Set up layouts and add to document
inputs = widgetbox(text, names)
curdoc().add_root(row(inputs, p, width=800))
curdoc().title = "Sliders"
Typically you use LabelSet by configuring it with the same data source as some glyph renderer. I find that whenever you share column data sources, it's best to also create them explicitly. Here is an updated version of your code that renders:
# interactive widget bokeh figure
from bokeh.io import curdoc
from bokeh.layouts import row, widgetbox
from bokeh.models import ColumnDataSource, Range1d, LabelSet, Label
from bokeh.models.widgets import Slider, TextInput
from bokeh.plotting import figure
# data
x = [-4, 3, 2, 4, 10, 11, -2, 6]
y = [-3, 2, 2, 9, 11, 12, -5, 6]
names = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
colors =['r', 'y', 'y', 'r', 'g', 'g', 'g', 'g']
# create a CDS by hand
source = ColumnDataSource(data=dict(x=x, y=y, names=names, colors=colors))
p = figure(plot_height=400, plot_width=400, title="a little interactive chart",
           tools="crosshair,pan,reset,save,wheel_zoom",
           x_range=[-10, 10], y_range=[-10, 10])
# pass the CDS here, and column names (not the arrays themselves)
p.circle('x', 'y', fill_color="red", line_color="red", size=6, source=source)
# pass the CDS here too
labels = LabelSet(x='x', y='y', text='names', level='glyph',
                  x_offset=5, y_offset=5, source=source)
p.add_layout(labels)
# Set up widgets
text = TextInput(title="title", value='a little interactive chart')
# Set up callbacks
def update_title(attrname, old, new):
    p.title.text = text.value
text.on_change('value', update_title)
# # Set up layouts and add to document
inputs = widgetbox(text)
curdoc().add_root(row(inputs, p, width=800))
curdoc().title = "Sliders"
I also removed names from the widgetbox because widget boxes can only contain widget models. Maybe you intend to use the names in a Select widget or something?
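The updated code still draws every point in red. For the coloring part of the question, one possible approach is to store a color name per point in the same data source and refer to that column from the glyph. This is a sketch only: the color_names mapping and the point_color column name are assumptions, and figure sizing and widgets are left out to keep it short.

```python
from bokeh.models import ColumnDataSource, LabelSet
from bokeh.plotting import figure

x = [-4, 3, 2, 4, 10, 11, -2, 6]
y = [-3, 2, 2, 9, 11, 12, -5, 6]
names = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
colors = ['r', 'y', 'y', 'r', 'g', 'g', 'g', 'g']

# Assumed mapping from the one-letter codes to named colors
color_names = {'r': 'red', 'y': 'yellow', 'g': 'green'}

source = ColumnDataSource(data=dict(
    x=x, y=y, names=names,
    point_color=[color_names[c] for c in colors],
))

p = figure(title="a little interactive chart")
# 'point_color' is a column name, so each point takes its color from the data source
p.scatter('x', 'y', color='point_color', size=6, source=source)

labels = LabelSet(x='x', y='y', text='names',
                  x_offset=5, y_offset=5, source=source)
p.add_layout(labels)
```

The figure p can then be placed in the same row(...) layout as in the code above.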

How to add column next to Seaborn heat map

Given the code below, which produces a heat map, how can I get column "D" (the total column) to display as a column to the right of the heat map with no color, just aligned total values per cell? I'm also trying to move the labels to the top. I don't mind that the labels on the left are horizontal as this does not occur with my actual data.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
%matplotlib inline
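df = pd.DataFrame(
    {'A' : ['A', 'A', 'B', 'B','C', 'C', 'D', 'D'],
     'B' : ['A', 'B', 'A', 'B','A', 'B', 'A', 'B'],
     'C' : [2, 4, 5, 2, 0, 3, 9, 1],
     'D' : [6, 6, 7, 7, 3, 3, 10, 10]})
# keyword arguments work across pandas versions (positional arguments were removed in pandas 2.0)
df = df.pivot(index='A', columns='B', values='C')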
fig, ax = plt.subplots(1, 1, figsize =(4,6))
sns.heatmap(df, annot=True, linewidths=0, cbar=False)
plt.show()
Here's the desired result:
Thanks in advance!
I think the cleanest way (although probably not the shortest), would be to plot Total as one of the columns, and then access colors of the facets of the heatmap and change some of them to white.
The element that is responsible for color on heatmap is matplotlib.collections.QuadMesh. It contains all facecolors used for each facet of the heatmap, from left to right, bottom to top.
You can modify some colors and pass them back to QuadMesh before you plt.show().
There is a slight problem that seaborn changes text color of some of the annotations to make them visible on dark background, and they become invisible when you change to white color. So for now I set color of all text to black, you will need to figure out what is best for your plots.
Finally, to put x axis ticks and label on top, use:
ax.xaxis.tick_top()
ax.xaxis.set_label_position('top')
The final version of the code:
import matplotlib.pyplot as plt
from matplotlib.collections import QuadMesh
from matplotlib.text import Text
import seaborn as sns
import pandas as pd
import numpy as np
%matplotlib inline
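df = pd.DataFrame(
    {'A' : ['A', 'A', 'B', 'B','C', 'C', 'D', 'D'],
     'B' : ['A', 'B', 'A', 'B','A', 'B', 'A', 'B'],
     'C' : [2, 4, 5, 2, 0, 3, 9, 1],
     'D' : [6, 6, 7, 7, 3, 3, 10, 10]})
# keyword arguments work across pandas versions (positional arguments were removed in pandas 2.0)
df = df.pivot(index='A', columns='B', values='C')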
# create "Total" column
df['Total'] = df['A'] + df['B']
fig, ax = plt.subplots(1, 1, figsize =(4,6))
sns.heatmap(df, annot=True, linewidths=0, cbar=False)
# find your QuadMesh object and get array of colors
quadmesh = ax.findobj(QuadMesh)[0]
facecolors = quadmesh.get_facecolors()
# make colors of the last column white
facecolors[np.arange(2,12,3)] = np.array([1,1,1,1])
# set modified colors (set_facecolors is a method, so call it rather than assign to it)
quadmesh.set_facecolors(facecolors)
# set color of all text to black
for i in ax.findobj(Text):
    i.set_color('black')
# move x ticks and label to the top
ax.xaxis.tick_top()
ax.xaxis.set_label_position('top')
plt.show()
P.S. I am on Python 2.7, some syntax adjustments might be required, though I cannot think of any.
