Assign edge weights to a networkx graph using pandas dataframe - python-3.x

I am constructing a networkx graph in Python 3, using a pandas DataFrame to supply the edges and nodes. Here is what I have so far:
test = pd.read_csv("/home/Desktop/test_call1", delimiter = ';')
g_test = nx.from_pandas_edgelist(test, 'number', 'contactNumber', edge_attr='callDuration')
I want the "callDuration" column of the DataFrame to act as the weight of the edges in the networkx graph, and the thickness of the drawn edges to change accordingly.
I also want to get the 'n' maximum-weighted edges.

Let's try:
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
df = pd.DataFrame({'number':['123','234','345'],'contactnumber':['234','345','123'],'callduration':[1,2,4]})
df
G = nx.from_pandas_edgelist(df,'number','contactnumber', edge_attr='callduration')
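# Collect each edge's callduration in the order G.edges iterates, which is the order used for drawing
durations = [d['callduration'] for _, _, d in G.edges(data=True)]
# Label every node with its own name
labels = {n: n for n in G.nodes}
fig, ax = plt.subplots(figsize=(12,5))
pos = nx.spring_layout(G)
# draw_networkx_nodes has no labels argument; node labels are drawn separately below
nx.draw_networkx_nodes(G, pos, ax=ax)
# Edge thickness follows callduration
nx.draw_networkx_edges(G, pos, width=durations, ax=ax)
_ = nx.draw_networkx_labels(G, pos, labels, ax=ax)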

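I do not fully agree with the answer above. Metrics that take edge weights into account, such as PageRank or betweenness centrality, read the weight from the edge attribute named by their weight parameter. pagerank defaults to "weight", and betweenness_centrality ignores weights unless you pass one, so a duration stored under callDuration is not used automatically.
Either pass weight='callDuration' to the algorithm, or store the value under the name weight when adding edges:
G.add_edge(source, target, weight=w)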
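A minimal sketch covering both requests from the question (edge thickness from callDuration and the n heaviest edges). It assumes a small made-up DataFrame in place of the CSV file; the rows and the value of n are illustrative only.

```python
import networkx as nx
import pandas as pd

# Hypothetical sample rows using the question's column names
test = pd.DataFrame({
    "number": ["123", "234", "345", "123"],
    "contactNumber": ["234", "345", "123", "456"],
    "callDuration": [1, 2, 4, 3],
})

g_test = nx.from_pandas_edgelist(
    test, "number", "contactNumber", edge_attr="callDuration"
)

# Copy callDuration into the conventional "weight" attribute so that
# weight-aware functions can find it under the usual name
for u, v, data in g_test.edges(data=True):
    data["weight"] = data["callDuration"]

# One width per edge, in edge order; pass this as
# nx.draw_networkx_edges(g_test, pos, width=widths)
widths = [w for _, _, w in g_test.edges(data="weight")]

# The n heaviest edges as (u, v, weight) tuples, heaviest first
n = 2
top_edges = sorted(
    g_test.edges(data="weight"), key=lambda e: e[2], reverse=True
)[:n]
print(top_edges)

# Weighted degree: sum of the weights of the edges touching each node
weighted_degree = dict(g_test.degree(weight="weight"))
print(weighted_degree)
```

If the durations are large, the widths will probably need rescaling (for example dividing by the maximum) before drawing.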

Related

Is there a library that will help me fit data easily? I found fitter and I will provide the code, but it shows some errors

So, here is my code:
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator
from fitter import Fitter, get_common_distributions
df = pd.read_csv("project3.csv")
bins = [282.33, 594.33, 906.33, 1281.33, 15030.33, 1842.33, 2154.33, 2466.33, 2778.33, 3090.33, 3402.33]
#declaring
facecolor = '#EAEAEA'
color_bars = '#3475D0'
txt_color1 = '#252525'
txt_color2 = '#004C74'
fig, ax = plt.subplots(1, figsize=(16, 6), facecolor=facecolor)
ax.set_facecolor(facecolor)
n, bins, patches = plt.hist(df.City1, color=color_bars, bins=10)
#grid
minor_locator = AutoMinorLocator(2)
plt.gca().xaxis.set_minor_locator(minor_locator)
plt.grid(which='minor', color=facecolor, lw = 0.5)
xticks = [(bins[idx+1] + value)/2 for idx, value in enumerate(bins[:-1])]
xticks_labels = [ "{:.0f}-{:.0f}".format(value, bins[idx+1]) for idx, value in enumerate(bins[:-1])]
plt.xticks(xticks, labels=xticks_labels, c=txt_color1, fontsize=13)
#beautify
ax.tick_params(axis='x', which='both',length=0)
plt.yticks([])
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
for idx, value in enumerate(n):
    if value > 0:
        plt.text(xticks[idx], value+5, int(value), ha='center', fontsize=16, c=txt_color1)
plt.title('Histogram of rainfall in City1\n', loc = 'right', fontsize = 20, c=txt_color1)
plt.xlabel('\nCentimeters of rainfall', c=txt_color2, fontsize=14)
plt.ylabel('Frequency of occurrence', c=txt_color2, fontsize=14)
plt.tight_layout()
#plt.savefig('City1_Raw.png', facecolor=facecolor)
plt.show()
city1 = df['City1'].values
f = Fitter(city1, distributions=get_common_distributions())
f.fit()
fig = f.plot_pdf(names=None, Nbest=4, lw=1, method='sumsquare_error')
plt.show()
print(f.get_best(method = 'sumsquare_error'))
The issue is with the plots it shows. The first histogram it generates is
Next I get another graph with best fitted distributions which is
Then an output statement
{'chi2': {'df': 10.692966790090342, 'loc': 16.690849400411103, 'scale': 118.71595997157786}}
Process finished with exit code 0
I have a couple of questions. Why is chi2, the best-fitted distribution, not plotted on the graph?
How do I plot these distributions on top of the histogram and not separately? The hist() function in the fitter library can do that, but there I don't get to control the bins, so I end up with something like 100 bins and flat-looking data.
How do I solve this issue? I need to plot the best-fit curve on a histogram that looks like image1. Can I use any other module/package to get the work done in a similar way? This uses a least-squares fit, but I am OK with maximum likelihood or log likelihood too.
A simple way of plotting things on top of each other (using some properties of the Fitter class):
import scipy.stats as st
import matplotlib.pyplot as plt
from fitter import Fitter, get_common_distributions
from scipy import stats
numberofpoints=50000
df = stats.norm.rvs( loc=1090, scale=500, size=numberofpoints)
fig, ax = plt.subplots(1, figsize=(16, 6))
n, bins, patches = ax.hist( df, bins=30, density=True)
f = Fitter(df, distributions=get_common_distributions())
f.fit()
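# Rank the fitted distributions by their fitting error and keep the four best
errorlist = sorted(
    [
        [f._fitted_errors[dist], dist]
        for dist in get_common_distributions()
    ]
)[:4]
# Overlay each fitted PDF on the density-normalized histogram
for err, dist in errorlist:
    ax.plot( f.x, f.fitted_pdf[dist] )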
plt.show()
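This works because the histogram is normalized (density=True), so the bars and the fitted PDFs share the same scale. To overlay the curves on a histogram of raw counts instead, each PDF has to be rescaled, that is, multiplied by the number of samples times the bin width.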
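A sketch of that rescaling for a histogram of raw counts with a chosen number of bins. It uses scipy.stats directly rather than fitter, fits a normal distribution by maximum likelihood, and runs on synthetic data standing in for df.City1; all of these are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Synthetic stand-in for df.City1 (assumption: the real data is not available here)
rng = np.random.default_rng(0)
data = rng.normal(loc=1090, scale=500, size=5000)

# Histogram of raw counts with a bin count we control
counts, edges = np.histogram(data, bins=10)
bin_width = edges[1] - edges[0]

# Maximum likelihood fit; any scipy.stats continuous distribution works the same way
params = stats.norm.fit(data)

# A PDF integrates to 1, while the count histogram has total area N * bin_width,
# so multiplying by N * bin_width puts the curve on the same scale as the bars
x = np.linspace(edges[0], edges[-1], 200)
pdf = stats.norm.pdf(x, *params)
scaled_pdf = pdf * len(data) * bin_width

fig, ax = plt.subplots(figsize=(16, 6))
ax.hist(data, bins=edges, color="#3475D0")
ax.plot(x, scaled_pdf, color="black", lw=2)
# plt.show()  # uncomment when running interactively
```

The same factor applies to the curves stored in f.fitted_pdf, as long as the bin width is taken from the histogram that is actually drawn.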

Working with Grid Cells in a GeoDataFrame

I have a GeoDataFrame filled with Points from a given city xy, which I loaded with the osmnx package.
If I plot it, I get longitude and latitude on the plot as the x and y axes. (See picture)
I want to create a better grid, one based on 100x100 meter cells and not on longitude and latitude.
I also want to access these grid cells so that I can iterate through them and index them. For example, the top-left grid cell should have Cell_ID = "1", the next one "2", and so on.
What I have so far
Packages:
import pandas as pd
import geopandas as gpd
import numpy as np
from shapely.geometry import Point, Polygon, LineString
%matplotlib inline
import matplotlib.pyplot as plt
import shapely
import plotly_express as px
import networkx as nx
import osmnx as ox
import cartopy.crs as ccrs  # needed for ccrs.PlateCarree() in the plotting code below
ox.config(use_cache=True, log_console=True)
Create the Graph Function
def create_graph(loc, dist, transport_mode, loc_type="address"):
    """Transport mode = 'walk', 'bike', 'drive', 'drive_service', 'all', 'all_private', 'none'"""
    if loc_type == "address":
        V = ox.graph_from_address(loc, dist=dist, network_type=transport_mode)
    elif loc_type == "points":
        V = ox.graph_from_point(loc, dist=dist, network_type=transport_mode)
    return V
Enter City:
V = create_graph("Enter a city here", 2500, "drive")
ox.plot_graph(V)
# Retrieve nodes and edges
nodes, edges = ox.graph_to_gdfs(V)
Put a Grid on it and plot it:
pcproj = ccrs.PlateCarree()
fig = plt.figure(figsize=(12, 8))
extent =[16.01,16.10, 48.305, 48.345] #lonmin, lonmax, latmin, latmax
ax = plt.axes(projection= pcproj )
ax.set_extent(extent, crs=pcproj)
lon_grid = np.arange(16.0, 16.09, 0.01)
lat_grid = np.arange(48.310, 48.340, 0.005)
gl = ax.gridlines(draw_labels=True,
xlocs=lon_grid, ylocs=lat_grid,
x_inline=False, y_inline=False,
color='r', linestyle='dotted')
ax = nodes.plot(ax=ax, edgecolor='k', lw=0.9)
ax.set_title("Gridded Version : Some_City Points")
plt.show()
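One possible way to get metric grid cells, sketched under assumptions: the points are first projected to a metric CRS, and each point is then assigned a cell number by integer division of its coordinates. The helper assign_cell_ids and the sample coordinates are hypothetical and only illustrate the numbering (row by row, starting at the top-left cell).

```python
import numpy as np

def assign_cell_ids(x, y, cell_size=100.0):
    """Return (cell_ids, n_cols) for projected coordinates given in metres.

    Cells are numbered from 1, row by row, starting at the top-left corner
    of the bounding box of the points. (Hypothetical helper for illustration.)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    min_x, max_y = x.min(), y.max()
    # Number of columns needed to cover the full x extent
    n_cols = int(np.floor((x.max() - min_x) / cell_size)) + 1
    col = np.floor((x - min_x) / cell_size).astype(int)
    # Rows count downwards from the top edge
    row = np.floor((max_y - y) / cell_size).astype(int)
    return row * n_cols + col + 1, n_cols

# Made-up projected coordinates in metres
xs = [0.0, 50.0, 150.0, 250.0, 20.0]
ys = [300.0, 290.0, 300.0, 180.0, 90.0]
cell_ids, n_cols = assign_cell_ids(xs, ys)
print(cell_ids)

# With the question's GeoDataFrame this could look as follows
# (assumes a geopandas version that provides estimate_utm_crs):
# nodes_m = nodes.to_crs(nodes.estimate_utm_crs())
# nodes["Cell_ID"], _ = assign_cell_ids(nodes_m.geometry.x, nodes_m.geometry.y)
```

Iterating over the cells is then a groupby on Cell_ID. Empty cells do not appear in the result; if they are needed, the full range 1..n_rows*n_cols can be enumerated separately.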

Networkx: how to add edge labels from csv file in a graph

How can I add edge labels from a csv/excel file to a networkx directed graph?
I want to add labels to my networkx graph from the column Edge_label present in the csv file.
import pandas as pd
import matplotlib.pyplot as plt
#%matplotlib inline
import networkx as nx
df = pd.read_csv('Trail_data.csv')
g = nx.from_pandas_edgelist(df,
'Source',
'Target',
create_using=nx.DiGraph() # For Directed Route arrows
)
plt.figure( figsize=(40, 40)
)
nx.draw(g,
with_labels=True,
node_size= 3000,#k=200,
node_color='#82CAFF',##00b4d9
font_size=16,
font_weight ='bold',
font_color='black',
edge_color = ('#E55451','#810541','#00FF00'),
node_shape='o',
width=4 ,
arrows=True, #Show arrow From and To
pos=nx.random_layout(g),iterations=20,
connectionstyle='arc3, rad =0.11' #To avoid overlapping edges
)
plt.savefig('Visualization.jpeg',
dpi = (100)
)
I also want to convert this directed graph to an interactive graph with python-dash.
According to the documentation of from_pandas_edgelist, you can simply specify a column name or a list of columns with edge_attr.
In your case, you get the desired graph with:
g = nx.from_pandas_edgelist(df,
    'Source',
    'Target',
    edge_attr='Edge_label',
    create_using=nx.DiGraph(),)
For drawing you currently only draw node labels. You can add edge labels with draw_networkx_edge_labels
pos = nx.random_layout(g)
nx.draw(g,
pos=pos, ...) # add other parameters
edge_labels = nx.get_edge_attributes(g, "Edge_label")
nx.draw_networkx_edge_labels(g, pos, edge_labels)
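Put together, a self-contained version might look like the sketch below. The three-row DataFrame stands in for Trail_data.csv and its values are made up; circular_layout is used only to make the result reproducible.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

# Made-up stand-in for Trail_data.csv
df = pd.DataFrame({
    "Source": ["A", "B", "C"],
    "Target": ["B", "C", "A"],
    "Edge_label": ["walk", "bus", "train"],
})

# Keep the Edge_label column as an edge attribute of the directed graph
g = nx.from_pandas_edgelist(
    df, "Source", "Target", edge_attr="Edge_label", create_using=nx.DiGraph()
)

# Compute the layout once and reuse it, so labels land on the drawn edges
pos = nx.circular_layout(g)

fig, ax = plt.subplots(figsize=(6, 6))
nx.draw(g, pos=pos, ax=ax, with_labels=True, node_color="#82CAFF", arrows=True)

# Mapping {(source, target): label} taken from the edge attribute
edge_labels = nx.get_edge_attributes(g, "Edge_label")
nx.draw_networkx_edge_labels(g, pos, edge_labels=edge_labels, ax=ax)
# plt.show()  # uncomment when running interactively
```

The important detail is that the same pos is passed to both drawing calls; calling nx.random_layout separately for each would place the labels away from their edges.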

Specifying the color Increments of heat-map in python

Is there a way to specify the color increments of a heatmap color scale in Seaborn or Matplotlib? For instance, for a DataFrame that contains normalized values between 0 and 1, I would like to specify 100 discrete color increments so that each value is distinguished from the other values.
Thank you in advance
There are two principal approaches to discretize a heatmap into n colors:
Supply the data rounded to the n values.
Use a discrete colormap.
The following code shows those two options.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
x, y = np.meshgrid(range(15),range(6))
v = np.random.rand(len(x.flatten()))
df = pd.DataFrame({"x":x.flatten(), "y":y.flatten(),"value":v})
df = df.pivot(index="y", columns="x", values="value")
n = 4  # number of discrete colors (an integer, since it is also used as the colormap size)
fig, (ax0, ax, ax2) = plt.subplots(nrows=3)
### original
im0 = ax0.imshow(df.values, cmap="viridis", vmin=0, vmax=1)
ax0.set_title("original")
### Discretize array
arr = np.floor(df.values * n)/n
im = ax.imshow(arr, cmap="viridis", vmin=0, vmax=1)
ax.set_title("discretize values")
### Discretize colormap
cmap = plt.get_cmap("viridis", int(n))  # plt.cm.get_cmap is no longer available in recent Matplotlib
im2 = ax2.imshow(df.values, cmap=cmap, vmin=0, vmax=1 )
ax2.set_title("discretize colormap")
#colorbars
fig.colorbar(im0, ax=ax0)
fig.colorbar(im, ax=ax)
fig.colorbar(im2, ax=ax2, ticks=np.arange(0,1,1./n), )
plt.tight_layout()
plt.show()
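For the 100 increments asked about in the question, a variant of the discrete-colormap option with explicit bin edges could look like this. It is a sketch on random data; the use of BoundaryNorm and the choice of evenly spaced boundaries are assumptions, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import BoundaryNorm

n = 100  # number of discrete color increments

# Random values in [0, 1) standing in for the normalized DataFrame
rng = np.random.default_rng(0)
values = rng.random((6, 15))

# Colormap resampled to n colors, plus a norm that maps each of the
# n equal-width intervals of [0, 1] to exactly one of those colors
cmap = plt.get_cmap("viridis", n)
boundaries = np.linspace(0, 1, n + 1)
norm = BoundaryNorm(boundaries, ncolors=cmap.N)

fig, ax = plt.subplots()
im = ax.imshow(values, cmap=cmap, norm=norm)
fig.colorbar(im, ax=ax)
# plt.show()  # uncomment when running interactively
```

With 100 steps the neighbouring colors are hard to tell apart by eye, so the practical difference from a continuous colormap is small.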

heatmap based on ratios in Python's seaborn

I have data in Cartesian coordinates. Each coordinate also has a binary variable attached. I want to make a heatmap where, in each polygon (hexagon, rectangle, etc.), the color strength is the ratio of the number of occurrences where the boolean is True to the total number of occurrences in that polygon.
The data can for example look like this:
df = pd.DataFrame([[1,2,False],[-1,5,True], [51,52,False]])
I know that seaborn can generate heatmaps via seaborn.heatmap, but the color strength is based by default on the total occurrences in each polygon, not the above ratio. Is there perhaps another plotting tool that would be more suitable?
You could also use the pandas groupby functionality to compute the ratios and then pass the result to seaborn.heatmap. With the example data borrowed from @ImportanceOfBeingErnest it would look like this:
import numpy as np
import pandas as pd
import seaborn as sns
np.random.seed(0)
x = np.random.poisson(5, size=200)
y = np.random.poisson(7, size=200)
z = np.random.choice([True, False], size=200, p=[0.3, 0.7])
df = pd.DataFrame({"x" : x, "y" : y, "z":z})
res = df.groupby(['y','x'])['z'].mean().unstack()
ax = sns.heatmap(res)
ax.axis('equal')
ax.invert_yaxis()
If your x and y values aren't integers you can cut them into the desired number of categories for grouping:
bins = 10
res = df.groupby([pd.cut(df.y, bins),pd.cut(df.x,bins)])['z'].mean().unstack()
An option would be to calculate two histograms, one for the complete dataframe and one for the dataframe filtered for the True values. Dividing the latter by the former then gives the ratio you're after.
from __future__ import division
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
x = np.random.poisson(5, size=200)
y = np.random.poisson(7, size=200)
z = np.random.choice([True, False], size=200, p=[0.3, 0.7])
df = pd.DataFrame({"x" : x, "y" : y, "z":z})
dftrue = df[df["z"] == True]
bins = np.arange(0,22)
hist, xbins, ybins = np.histogram2d(df.x, df.y, bins=bins)
histtrue, _ ,__ = np.histogram2d(dftrue.x, dftrue.y, bins=bins)
plt.imshow(histtrue/hist, cmap=plt.cm.Reds)
plt.colorbar()
plt.show()
