Using Python and NetworkX to find the probability density function

I'm struggling to draw a power-law graph for some Facebook data that I found online. I'm using NetworkX, and I've found how to draw a degree histogram and a degree-rank plot. The problem I'm having is that I want the y-axis to be a probability, so I'm assuming I need to divide each y value (count) by the sum of all counts, i.e. the total number of nodes? Can anyone please help me do this? Once I've got this, I'd like to draw a log-log graph to see whether I obtain a straight line. I'd really appreciate any help! Here's my code:
import collections
import networkx as nx
import matplotlib.pyplot as plt
# read_edgelist's second positional parameter is `comments`, not a file mode, so "r" is dropped
g = nx.read_edgelist("/Users/Michael/Desktop/anaconda3/facebook_combined.txt", nodetype=int)
print(g)  # nx.info() was removed in NetworkX 3.0
degree_sequence = sorted([d for n, d in g.degree()], reverse=True)
degreeCount = collections.Counter(degree_sequence)
deg, cnt = zip(*degreeCount.items())
fig, ax = plt.subplots()
plt.bar(deg, cnt, width=0.80, color='b')  # bars are centred on each degree by default, so no +0.4 tick offset
plt.title("Degree Histogram for Facebook Data")
plt.ylabel("Count")
plt.xlabel("Degree")
plt.show()
plt.loglog(degree_sequence, 'b-', marker='o')
plt.title("Degree rank plot")
plt.ylabel("Degree")
plt.xlabel("Rank")
plt.show()

You seem to be on the right track, but some simplifications will likely help you. The code below uses only two libraries.
Without access to your graph, we can use some graph generators instead. I've chosen two qualitatively different types here, and deliberately given them different sizes so that the histograms need to be normalized.
import networkx as nx
import matplotlib.pyplot as plt
g1 = nx.scale_free_graph(1000)
g2 = nx.watts_strogatz_graph(2000, 6, p=0.8)
# we don't need to sort the values since the histogram will handle it for us
# (since NetworkX 2.0, g.degree() yields (node, degree) pairs and has no .values())
deg_g1 = [d for _, d in g1.degree()]
deg_g2 = [d for _, d in g2.degree()]
# there are smarter ways to choose bin locations, but since
# degrees must be discrete, we can be lazy and use one bin per integer degree
max_degree = max(deg_g1 + deg_g2)
# start at 1 (degree 0 cannot be shown on a log axis); the last edge is
# max_degree + 1 so the largest degree gets its own bin (range replaces Python 2's xrange)
bins = range(1, max_degree + 2)
# plot different styles to see both
fig = plt.figure()
ax = fig.add_subplot(111)
ax.hist(deg_g1, bins=bins, density=True, histtype='bar', rwidth=0.8, label='scale-free (g1)')
ax.hist(deg_g2, bins=bins, density=True, histtype='step', lw=3, label='Watts-Strogatz (g2)')
# set up the axes to be log/log scaled
ax.set_yscale('log')
ax.set_xscale('log')
ax.set_xlabel('degree')
ax.set_ylabel('relative density')
ax.legend()  # needs the label= arguments above
plt.show()
This produces an output plot like this (g1 and g2 are both randomised, so your plot won't be identical):
Here we can see that the degree distribution of g1 decays along an approximately straight line, as expected for a scale-free distribution on log-log axes. Conversely, g2 does not have a scale-free degree distribution.
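If you prefer to do the normalization exactly as the question describes (divide each degree count by the total number of nodes) instead of relying on hist(density=True), a minimal sketch could look like the following. It uses nx.degree_histogram, which returns a list whose index is the degree and whose value is the number of nodes with that degree. degree_pmf is a hypothetical helper name, and the Barabasi-Albert graph is only an assumed stand-in for the Facebook data.

```python
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np


def degree_pmf(g):
    """Empirical degree distribution of g as (degrees, probabilities).

    probabilities[i] is the fraction of all nodes whose degree is degrees[i].
    Degrees with probability 0, and degree 0 itself, are dropped because they
    cannot be drawn on log-log axes.
    """
    counts = np.array(nx.degree_histogram(g), dtype=float)  # counts[k] = nodes with degree k
    probs = counts / g.number_of_nodes()  # divide each count by the total number of nodes
    degrees = np.arange(len(counts))
    keep = (degrees > 0) & (probs > 0)
    return degrees[keep], probs[keep]


if __name__ == "__main__":
    # stand-in graph (assumption); for the question's data use nx.read_edgelist(...)
    g = nx.barabasi_albert_graph(4000, 3, seed=42)
    k, p = degree_pmf(g)
    fig, ax = plt.subplots()
    ax.loglog(k, p, 'o', markersize=3)  # a power law shows up as a roughly straight line
    ax.set_xlabel('degree k')
    ax.set_ylabel('P(k)')
    plt.show()
```

Dropping the zero entries only affects the plot; if your graph has isolated nodes, the returned probabilities will sum to slightly less than 1.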
To make a more formal statement, you could look at Aaron Clauset's toolboxes (http://tuvalu.santafe.edu/~aaronc/powerlaws/), which implement model fitting and statistical testing of power-law distributions.
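For a rough first number before reaching for those toolboxes, the exponent can be estimated by maximum likelihood. For discrete data, Clauset, Shalizi and Newman (2009) give the approximation alpha ≈ 1 + n / sum(ln(x_i / (x_min - 1/2))), summed over the n values x_i >= x_min. The sketch below assumes x_min is already known; choosing it well (which the toolboxes do automatically) and testing goodness of fit are left out. estimate_alpha is a hypothetical helper name.

```python
import math


def estimate_alpha(values, x_min):
    """Approximate discrete power-law MLE of the exponent alpha.

    Uses alpha ~= 1 + n / sum(ln(x_i / (x_min - 0.5))) over the n values x_i >= x_min.
    x_min must be chosen by the caller.
    """
    if x_min < 1:
        raise ValueError("x_min must be at least 1 for degree data")
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no values >= x_min")
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    import networkx as nx

    g = nx.barabasi_albert_graph(5000, 3, seed=7)
    degrees = [d for _, d in g.degree()]
    # Barabasi-Albert graphs have a degree exponent of 3 in the large-n limit
    print(estimate_alpha(degrees, x_min=5))
```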

Related

What kind of plot from matplotlib should I use?

I am programming in Python 3 and I have data structured like this:
coordinates = [(0.15,0.25),(0.35,0.25),(0.55,0.45),(0.65,0.10),(0.15,0.25)]
These are coordinates. Within each pair, the first number is the x coordinate and the second one the y coordinate. Some of the coordinates repeat themselves. I want to plot these data like this:
The coordinates that are most frequently found should appear either as higher intensity (i.e., brighter) points or as points with a different color (for example, red for very frequent coordinates and blue for very infrequent coordinates). Don't worry about the circle and semicircle. That's irrelevant. Is there a matplotlib plot that can do this? Scatter plots do not work because they do not report on the frequency with which each coordinate is found. They just create a cloud.
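The answer is to estimate the density of the points with a Gaussian kernel density estimate (KDE) and draw it as a color map, so denser regions appear brighter. The example uses random points rather than the coordinates list above: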
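import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde  # the scipy.stats.kde module path is deprecated
import numpy as np
# example data: 50,000 random points (replace with your own x and y values)
xvalues = np.random.normal(loc=0.5,scale=0.01,size=50000)
yvalues = np.random.normal(loc=0.25,scale=0.1,size=50000)
nbins=300
# fit a 2-D Gaussian kernel density estimate to the (x, y) points
k = gaussian_kde([xvalues,yvalues])
# evaluate the density on a regular nbins x nbins grid covering [0, 1] x [0, 1]
xi, yi = np.mgrid[0:1:nbins*1j,0:1:nbins*1j]
zi = k(np.vstack([xi.flatten(),yi.flatten()]))
fig, ax = plt.subplots()
# brighter colors mark regions where points are denser
ax.pcolormesh(xi, yi, zi.reshape(xi.shape), shading='auto', cmap=plt.cm.hot)
# dashed white semicircle of radius 0.5 centred at (0.5, 0)
x = np.arange(0.0,1.01,0.01,dtype=np.float64)
y = np.sqrt((0.5*0.5)-((x-0.5)*(x-0.5)))
ax.axis([0,1,0,0.55])
ax.set_ylabel('S', fontsize=16)
ax.set_xlabel('G', fontsize=16)
ax.tick_params(labelsize=12, width=3)
ax.plot(x,y,'w--')
plt.show()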
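The KDE smooths the points into a continuous density. If you instead want the exact number of times each coordinate pair occurs, as in the question's coordinates list, a minimal alternative is to count duplicates with collections.Counter and color a scatter plot by that count. This relies on repeated coordinates being exactly equal floats (round them first if they are only approximately equal). count_coordinates is a hypothetical helper name, and the 'bwr' colormap (blue for rare, red for frequent) is just one choice.

```python
from collections import Counter

import matplotlib.pyplot as plt


def count_coordinates(coordinates):
    """Return (xs, ys, counts) with one entry per distinct (x, y) pair."""
    counter = Counter(coordinates)  # tuples are hashable, so identical pairs are merged
    xs = [x for x, _ in counter]
    ys = [y for _, y in counter]
    counts = [counter[pair] for pair in counter]
    return xs, ys, counts


if __name__ == "__main__":
    coordinates = [(0.15, 0.25), (0.35, 0.25), (0.55, 0.45), (0.65, 0.10), (0.15, 0.25)]
    xs, ys, counts = count_coordinates(coordinates)
    fig, ax = plt.subplots()
    # color encodes frequency: blue = rare, red = frequent
    sc = ax.scatter(xs, ys, c=counts, cmap='bwr', s=80, edgecolors='k')
    fig.colorbar(sc, ax=ax, label='occurrences')
    plt.show()
```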

Python: how to create a smoothed version of a 2D binned "color map"?

I would like to create a version of this 2D binned "color map" with smoothed colors.
I am not even sure this would be the correct nomenclature for the plot, but, essentially, I want my figure to be color coded by the median values of a third variable for points that reside in each defined bin of my (X, Y) space.
Even though I am able to accomplish that to a certain degree (see example), I would like to find a way to create a version of the same plot with a smoothed color gradient. That would allow me to visualize the overall behavior of my distribution.
I tried ideas described here: Smoothing 2D map in python
and here: Python: binned_statistic_2d mean calculation ignoring NaNs in data
as well as links therein, but could not find a clear solution to the problem.
This is what I have so far:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic_2d
np.random.seed(999)  # seed NumPy's generator (random.seed only seeds the stdlib random module)
x = np.random.normal (0,10,5000)
y = np.random.normal (0,10,5000)
z = np.random.uniform(0,10,5000)
fig = plt.figure(figsize=(20, 20))
plt.rcParams.update({'font.size': 10})
ax = fig.add_subplot(3,3,1)
ax.set_axisbelow(True)
plt.grid(True, lw=0.5, zorder=-1)  # the `b` keyword was renamed `visible` in Matplotlib 3.5
x_bins = np.arange(-50., 50.5, 1.)
y_bins = np.arange(-50., 50.5, 1.)
cmap = plt.get_cmap('jet_r',1000) #just a colormap
ret = binned_statistic_2d(x, y, z, statistic=np.median, bins=[x_bins, y_bins]) # Bin (X, Y) and create a map of the medians of "Colors"
plt.imshow(ret.statistic.T, origin='lower', extent=(-50, 50, -50, 50), cmap=cmap)  # origin must be 'upper' or 'lower'
plt.xlim(-40,40)
plt.ylim(-40,40)
plt.xlabel("X", fontsize=15)
plt.ylabel("Y", fontsize=15)
ax.set_yticks([-40,-30,-20,-10,0,10,20,30,40])
bounds = np.arange(2.0, 20.0, 1.0)
plt.colorbar(ticks=bounds, label="Color", fraction=0.046, pad=0.04)
# save plots
plt.savefig("Whatever_name.png", bbox_inches='tight')
Which produces the following image (from random data):
So the simple question is: how can I smooth these colors?
Thanks in advance!
PS: sorry for the amount of code, but I believe a clear visualization is crucial for this particular problem.
Thanks to everyone who viewed this issue and tried to help!
I ended up solving the problem myself. In the end, it was all about image smoothing with a Gaussian kernel.
The question "Gaussian filtering a image with Nan in Python" gave me the insight for the solution.
I basically implemented the same code but, at the end, mapped the NaN pixels of the original 2D array back onto the smoothed version. The trick is to filter two arrays with the same Gaussian kernel: the data with NaNs replaced by 0, and a weight array that is 1 for valid pixels and 0 for NaNs. Dividing the first result by the second gives a Gaussian-weighted average of the valid neighbours only. Unlike the solution from the link, my version does NOT fill NaN pixels with a value derived from the surrounding pixels. Or rather, it does, but then I erase those values again.
Here is the final figure produced for the example I provided:
Final code, for reference, for anyone who might need it in the future:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic_2d
from scipy.ndimage import gaussian_filter
np.random.seed(999)  # seed NumPy's generator, which produces x, y and z
x = np.random.normal(0,10,5000)
y = np.random.normal(0,10,5000)
z = np.random.uniform(0,10,5000)
fig = plt.figure(figsize=(20, 20))
plt.rcParams.update({'font.size': 10})
ax = fig.add_subplot(3,3,1)
ax.set_axisbelow(True)
plt.grid(True, lw=0.5, zorder=-1)
x_bins = np.arange(-50., 50.5, 1.)
y_bins = np.arange(-50., 50.5, 1.)
cmap = plt.get_cmap('jet_r',1000) #just a colormap
ret = binned_statistic_2d(x, y, z, statistic=np.median, bins=[x_bins, y_bins]) # Bin (X, Y) and create a map of the medians of "Colors"
sigma=1 # standard deviation for Gaussian kernel
truncate=5.0 # truncate filter at this many sigmas
U = ret.statistic.T.copy()
nan_mask = np.isnan(U)  # empty bins
V = np.where(nan_mask, 0, U)  # data with empty bins set to 0
W = (~nan_mask).astype(float)  # weights: 1 for filled bins, 0 for empty ones
VV = gaussian_filter(V, sigma=sigma, truncate=truncate)
WW = gaussian_filter(W, sigma=sigma, truncate=truncate)
# VV/WW is a Gaussian-weighted mean over filled bins only; WW is 0 far from any data
with np.errstate(divide='ignore', invalid='ignore'):
    Z = VV/WW
Z[nan_mask] = np.nan  # set the originally empty bins back to NaN
plt.imshow(Z, origin='lower', extent=(-50, 50, -50, 50), cmap=cmap)
plt.xlim(-40,40)
plt.ylim(-40,40)
plt.xlabel("X", fontsize=15)
plt.ylabel("Y", fontsize=15)
ax.set_yticks([-40,-30,-20,-10,0,10,20,30,40])
bounds = np.arange(2.0, 20.0, 1.0)
plt.colorbar(ticks=bounds, label="Color", fraction=0.046, pad=0.04)
# save plots
plt.savefig("Whatever_name.png", bbox_inches='tight')
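The NaN handling in the final code is a form of normalized convolution. Wrapped as a reusable function, it might look like the sketch below; nan_gaussian_filter is a hypothetical name, not a SciPy function, and the default truncate=4.0 simply mirrors gaussian_filter's own default.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def nan_gaussian_filter(arr, sigma, truncate=4.0):
    """Gaussian-smooth a 2-D array that contains NaNs, keeping NaN cells as NaN."""
    arr = np.asarray(arr, dtype=float)
    nan_mask = np.isnan(arr)
    data = np.where(nan_mask, 0.0, arr)  # NaN cells contribute nothing to the weighted sum
    weights = (~nan_mask).astype(float)  # 1 for valid cells, 0 for NaN cells
    num = gaussian_filter(data, sigma=sigma, truncate=truncate)
    den = gaussian_filter(weights, sigma=sigma, truncate=truncate)
    with np.errstate(divide='ignore', invalid='ignore'):
        out = num / den  # weighted mean of the nearby valid cells
    out[nan_mask] = np.nan  # restore the original gaps
    return out
```

With this helper, the smoothing step above reduces to Z = nan_gaussian_filter(ret.statistic.T, sigma=1, truncate=5.0). A valid cell always has a nonzero denominator because its own kernel weight is included, so the division can only fail at cells that are NaN anyway.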

I want to generate a mesh from a point cloud in Python

I have a point cloud of different parts of the human body, such as an eye, and I want to build a mesh from it. I tried Mayavi and Delaunay triangulation, but I don't get a good mesh. The points of the cloud are in total disorder.
I have my point cloud in an .npz file.
[Image: result using Mayavi]
Then I want to save my model in an OBJ or STL file, but first I need to generate the mesh.
What do you recommend I use? Do I need a special library?
You can use pyvista to build the mesh from a 3D Delaunay triangulation. However, you need to tune the alpha parameter by hand; it controls the distance under which two points are linked.
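import numpy as np
import pyvista as pv
# points: a NumPy array of shape (n_points, 3); for illustration, the vertices of a sphere
points = np.asarray(pv.Sphere(radius=5.0).points)
cloud = pv.PolyData(points)
cloud.plot()
# alpha: only simplices whose circumsphere radius is below alpha are kept; tune it to your data
volume = cloud.delaunay_3d(alpha=2.)
shell = volume.extract_geometry()
shell.plot()
shell.save("mesh.stl")  # the output format is chosen from the file extension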
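The question says the cloud is stored in an .npz file. A minimal loader might look like this, assuming the archive holds the (n_points, 3) coordinates. The key name is unknown here, so the sketch takes the first array in the archive unless a key is given; load_point_cloud is a hypothetical helper name.

```python
import numpy as np


def load_point_cloud(path, key=None):
    """Load an (n_points, 3) float array from an .npz archive.

    If key is None, the first array stored in the archive is used.
    """
    with np.load(path) as archive:  # NpzFile supports the context-manager protocol
        name = key if key is not None else archive.files[0]
        points = np.asarray(archive[name], dtype=float)
    if points.ndim != 2 or points.shape[1] != 3:
        raise ValueError(f"expected shape (n, 3), got {points.shape}")
    return points
```

The result can be passed straight to pv.PolyData in place of the sphere vertices, e.g. pv.PolyData(load_point_cloud("cloud.npz")), where the file name is a placeholder.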
Data
Let's use the capitals of Europe. We read them in from Excel with Pandas:
import pandas as pd
dg0 = pd.read_excel('psc_StaedteEuropa_coord.xlsx') # ,header=None
dg0.head()
      City    Inhabit         xK         yK
0  Andorra    24574.0  42.506939   1.521247
1    Athen   664046.0  37.984149  23.727984
2  Belgrad  1373651.0  44.817813  20.456897
3   Berlin  3538652.0  52.517037  13.388860
4     Bern   122658.0  46.948271   7.451451
Grid by triangulation
We use SciPy for that. For a 3-dimensional example, see HERE and HERE or here (CGAL has a Python wrapper).
import numpy as np
from scipy.spatial import Delaunay
# column xK holds the latitude and yK the longitude, so swap them: x = longitude, y = latitude
yk, xk, city = np.array(dg0['xK']), np.array(dg0['yK']), np.array(dg0['City'])
X1 = np.vstack((xk,yk)).T
tri = Delaunay(X1)
Graphics
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
#--- graphics -------
figX = 25; figY = 18
fig1 = plt.figure(figsize=(figX, figY), facecolor='white')
myProjection = ccrs.PlateCarree()
ax = plt.axes(projection=myProjection)
ax.stock_img()
ax.set_extent([-25, 40, 35, 65], crs=myProjection)
plt.triplot(X1[:,0], X1[:,1], tri.simplices.copy(), color='r', linestyle='-',lw=2)
plt.plot(X1[:,0], X1[:,1], 's', color='w')
plt.scatter(xk,yk,s=1000,c='w')
for i, txt in enumerate(city):
    ax.annotate(txt, (X1[i,0], X1[i,1]), color='k', fontweight='bold')
plt.savefig('Europe_A.png')
plt.show()
If your points are "in total disorder" and you want to generate a mesh, then you need some interpolation from the cloud of points onto the (more or less) structured grid points of the mesh.
In the 2-dimensional case, matplotlib's triangulation (matplotlib.tri) can help.
In the 3-dimensional case there are two options. Depending on the data, you might want to interpolate them onto a 3-dimensional surface; then matplotlib's trisurf3d example (Axes3D.plot_trisurf) can help.
If you need a 3-dimensional volume grid, then you will probably have to look for a FEM (finite element) mesh, e.g. FEniCS.
An example of interpolating a 3-dimensional field with SciPy for contouring can be found here.
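For a roughly convex, closed shape (an eyeball, for instance), one simple SciPy-only option is to take the triangles of the convex hull as the surface mesh and write them to a Wavefront OBJ file. This is a swapped-in technique, not one of the methods above: it cannot represent concave regions, for which an alpha shape (as in the pyvista answer) or a proper surface reconstruction is needed. hull_to_obj is a hypothetical helper name; it also flips triangles so that each face winds counter-clockwise when seen from outside, using the outward facet normals in ConvexHull.equations.

```python
import numpy as np
from scipy.spatial import ConvexHull


def hull_to_obj(points, path):
    """Write the convex-hull triangles of an (n, 3) point cloud to a Wavefront OBJ file.

    Returns the number of triangular faces written. Interior points are written
    as vertices too, but no face uses them.
    """
    points = np.asarray(points, dtype=float)
    hull = ConvexHull(points)  # Qhull triangulates the facets, so every simplex is a triangle
    with open(path, "w") as f:
        for x, y, z in points:
            f.write(f"v {x} {y} {z}\n")
        for (a, b, c), eq in zip(hull.simplices, hull.equations):
            normal = np.cross(points[b] - points[a], points[c] - points[a])
            if np.dot(normal, eq[:3]) < 0:  # eq[:3] is the outward facet normal
                b, c = c, b  # flip so the face is counter-clockwise seen from outside
            f.write(f"f {a + 1} {b + 1} {c + 1}\n")  # OBJ vertex indices are 1-based
    return len(hull.simplices)
```

For example, hull_to_obj(points, "eye.obj") writes the mesh for your (n, 3) array; the file name is a placeholder.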
Have you tried this example? https://docs.enthought.com/mayavi/mayavi/auto/example_surface_from_irregular_data.html
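The relevant part is below. Note that delaunay2d triangulates the points as projected onto the x-y plane, so it suits surfaces that are single-valued in z (height fields) rather than closed shapes such as an eye: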
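import numpy as np
from mayavi import mlab
x, y = 4 * (np.random.random((2, 500)) - 0.5)  # illustrative data: x, y in [-2, 2]
z = np.exp(-(x ** 2 + y ** 2))  # z is a Gaussian function of x and y
# Visualize the points
pts = mlab.points3d(x, y, z, z, scale_mode='none', scale_factor=0.2)
# Create and visualize the mesh
mesh = mlab.pipeline.delaunay2d(pts)
surf = mlab.pipeline.surface(mesh)
mlab.show()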

How to project certain values from the graph on the axis in Python?

I am trying to plot a normal distribution curve in Python using matplotlib. I followed the accepted answer in the post python pylab plot normal distribution in order to generate the graph.
I would like to know if there is a way of projecting the mu - 3*sigma, mu + 3*sigma and the mean values on both the x-axis and y-axis.
Thanks
EDIT 1
Image explaining the projection:
In the image, I am trying to project the mean value onto the x- and y-axes. I would like to know whether there is a way to do this and also obtain the values where the projections meet the x- and y-axes (the blue circles in the image).
The following script shows how to achieve what you want:
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
mu = 2
variance = 9
sigma = np.sqrt(variance)
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 500)
y = stats.norm.pdf(x, mu, sigma)
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlim([min(x), max(x)])
ax.set_ylim([min(y), max(y)+0.02])
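ax.hlines(y=max(y), xmin=min(x), xmax=mu, color='r')  # from the y-axis to the peak: projection onto y
ax.vlines(x=mu, ymin=min(y), ymax=max(y), color='r')  # from the x-axis up to the peak: projection onto x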
plt.show()
The produced plot is
If you are familiar with the properties of the normal distribution, it is easy to see that the intersection with the x-axis is just mu, i.e., the distribution mean. The intersection with the y-axis is the maximum of the pdf, 1/(sigma*sqrt(2*pi)), which max(y) in the code approximates (the 500-point grid does not hit mu exactly).
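The script only projects the mean, but the same hlines/vlines idea extends to mu - 3*sigma and mu + 3*sigma as the question asks. In the sketch below, projection_points is a hypothetical helper that returns the (x, pdf(x)) pairs to project, which also answers the "obtain the values" part. The plot range is widened to ±4 sigma so the ±3 sigma points do not sit on the plot edges, and marking the axis intersections with blue circles via clip_on=False is one choice among several.

```python
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats


def projection_points(mu, sigma, multiples=(-3, 0, 3)):
    """Return [(x, pdf(x)), ...] for x = mu + k*sigma with k in multiples."""
    return [(mu + k * sigma, stats.norm.pdf(mu + k * sigma, mu, sigma)) for k in multiples]


if __name__ == "__main__":
    mu, sigma = 2, 3.0
    x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 500)
    fig, ax = plt.subplots()
    ax.plot(x, stats.norm.pdf(x, mu, sigma))
    ax.set_xlim(x.min(), x.max())
    ax.set_ylim(0, 1.1 * stats.norm.pdf(mu, mu, sigma))
    for xp, yp in projection_points(mu, sigma):
        ax.vlines(xp, 0, yp, color='r')  # projection onto the x-axis
        ax.hlines(yp, x.min(), xp, color='r')  # projection onto the y-axis
        # blue circles where the projections meet the axes
        ax.plot([xp, x.min()], [0, yp], 'bo', clip_on=False)
        print(f"x = {xp:.3f}, pdf(x) = {yp:.5f}")
    plt.show()
```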

All Matplotlib points appearing at bottom of graph, regardless of y-value

I'm following this linear regression tutorial. Here's my code:
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
dataframe = pd.read_fwf('brain_body.txt')
x_values = dataframe[['Brain']]
y_values = dataframe[['Body']]
body_reg = linear_model.LinearRegression()
body_reg.fit(x_values, y_values)
plt.scatter(x_values, y_values)
plt.plot(x_values, body_reg.predict(x_values))
plt.show()
When I run the script, I get no errors, but the graph doesn't seem to account for the y-values. I reduced the data points to three so it's easier to see:
I tried to manually change the y-axis with plt.ylim([-1000,7000]) but no luck.
Thanks for any suggestions!
There's nothing wrong with the code; you just have a few values that are very extreme compared with the rest of your data. Matplotlib expands the axes to show the extreme values, which bunches all the other points together near the origin. Broadening your ylim will only increase the effect; try a much smaller ylim and xlim instead:
plt.ylim([0, 20])
plt.xlim([0, 2])
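Another option, which keeps every point visible instead of zooming in, is to put both axes on a log scale. This suits data that span several orders of magnitude, although a log axis cannot show zero or negative values. In the sketch below, log_log_scatter is a hypothetical helper, and the synthetic values are an assumed stand-in for brain_body.txt, not the real data.

```python
import matplotlib.pyplot as plt
import numpy as np


def log_log_scatter(x, y, ax=None):
    """Scatter y against x with both axes on a logarithmic scale and return the Axes."""
    if ax is None:
        _, ax = plt.subplots()
    ax.scatter(x, y)
    # each factor of 10 gets the same width, so tiny and huge values are both visible
    ax.set_xscale('log')
    ax.set_yscale('log')
    return ax


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # synthetic stand-in for brain_body.txt (assumption): values spanning 5 orders of magnitude
    brain = 10 ** rng.uniform(-2, 3, size=60)
    body = 50 * brain * 10 ** rng.normal(0, 0.3, size=60)  # arbitrary relation plus noise
    ax = log_log_scatter(brain, body)
    ax.set_xlabel('Brain')
    ax.set_ylabel('Body')
    plt.show()
```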
