How do I calculate the global efficiency of a graph in igraph (Python)? - python-3.x

I am trying to calculate the global efficiency of a graph in igraph, but I am not sure whether I am using the module correctly. There is a solution that might make a bit of sense, but it is in R, and I wasn't able to decipher what they were doing.
I have tried writing the code in a networkx fashion, trying to emulate the way networkx calculates global efficiency, but I have been unsuccessful so far. I am using igraph because I am dealing with large graphs. Any help would be really appreciated :D
This is what I have tried:
import igraph
import pandas as pd
import numpy as np
from itertools import permutations

datasafe = pd.read_csv("b1.csv", index_col=0)
D = datasafe.values
g = igraph.Graph.Adjacency((D > 0).tolist())
g.es['weight'] = D[D.nonzero()]

def efficiency_weighted(g):
    weights = g.es["weight"][:]
    eff = (1.0 / np.array(g.shortest_paths_dijkstra(weights=weights)))
    return eff

def global_efficiecny_weighted(g):
    n = 180.0
    denom = n * (n - 1)
    g_eff = sum(efficiency_weighted(g) for u, v in permutations(g, 2))
    return g_eff

global_efficiecny_weighted(g)
The error message I am getting is: TypeError: 'Graph' object is not iterable

Assuming that you want the nodal efficiency for all nodes, you can do this:
import numpy as np
from igraph import *

np.seterr(divide='ignore')

# Example using a random graph with 20 nodes
g = Graph.Erdos_Renyi(20, 0.5)

# Assign weights on the edges. Here 1s everywhere
g.es["weight"] = np.ones(g.ecount())

def nodal_eff(g):
    weights = g.es["weight"][:]
    sp = (1.0 / np.array(g.shortest_paths_dijkstra(weights=weights)))
    np.fill_diagonal(sp, 0)
    N = sp.shape[0]
    ne = (1.0 / (N - 1)) * np.apply_along_axis(sum, 0, sp)
    return ne

eff = nodal_eff(g)
print(eff)
#[0.68421053 0.81578947 0.73684211 0.76315789 0.76315789 0.71052632
# 0.81578947 0.81578947 0.81578947 0.73684211 0.71052632 0.68421053
# 0.71052632 0.81578947 0.84210526 0.76315789 0.68421053 0.68421053
# 0.78947368 0.76315789]
To get the global efficiency, just take the mean:
np.mean(eff)
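For completeness, here is a minimal sketch of a single weighted global-efficiency function, reusing the graph g and the numpy import from above (the helper name global_eff_weighted is mine):

def global_eff_weighted(g):
    # Inverse of the weighted shortest-path distances; 1/inf evaluates to 0
    # for disconnected pairs, so they simply drop out of the sum.
    inv_sp = 1.0 / np.array(g.shortest_paths_dijkstra(weights=g.es["weight"]))
    np.fill_diagonal(inv_sp, 0)  # remove the 1/0 = inf entries on the diagonal
    n = inv_sp.shape[0]
    return inv_sp.sum() / (n * (n - 1))  # average over all ordered node pairs

print(global_eff_weighted(g))

This computes the same quantity as np.mean(eff) above.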

Related

Converting Normal Distribution to Lognormal distribution

I have been following the lectures of the MIT open course on the Application of Mathematics in Finance. I am trying to code out the concepts for better understanding.
According to the lectures (from what I understand), if a random variable X is normally distributed, then exp(X) is log-normally distributed, and vice versa (please correct me if I am wrong here).
Here is what I tried:
I have a list of integers that are normally distributed:
import numpy as np
import matplotlib.pyplot as plt
from math import sqrt

X = np.array(l)  # l is the list of elements linked below

mu = np.mean(X)
sigma = np.std(X)

count, bins, ignored = plt.hist(X, 35, density=True)
plt.plot(bins, 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(bins - mu)**2 / (2 * sigma**2)),
         linewidth=2, color='r')
plt.show()
Output:
Normally distributed curve
Now I want to get a log-normal distribution from the above data; here is what I have tried:
import numpy as np
import matplotlib.pyplot as plt
from math import sqrt

X = np.array(l)

ln = []
for x in X:
    val = np.e**x
    ln.append(val)

X_ln = np.array(ln)
X_ln = np.array(X_ln) / np.min(X_ln)

mu = np.mean(X_ln)
sigma = np.std(X_ln)

count, bins, ignored = plt.hist(X_ln, 10, density=True)
x = np.linspace(min(bins), max(bins), 10000)
pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi)))
plt.plot(x, pdf, color='r', linewidth=2)
plt.show()
Output :
Not so clean Output
I know there is a better way to do this, but I can't figure out how. Any suggestions would be highly appreciated.
Here are a couple of references:
Log normal distribution in Python
MIT lecture notes(topic-1.1)
In case this is relevant, here is a list of elements I am trying to process:
List of elements
Update 1:
I have normalized X before adding values to ln. This fixed the distribution of the histogram; however, I can't seem to get the red line to show the log-normal distribution. Also, the new histogram distribution is not very different from the normal distribution, and I can't think of a suitable reason for that.
This is the block of code I have added:
def normalize(v):
    norm = np.linalg.norm(v, ord=1)
    if norm == 0:
        norm = np.finfo(v.dtype).eps
    return v / norm

X = np.array(l)
X = normalize(X)
New Output:
Slightly better result
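For reference, here is a minimal self-contained sketch of the relationship stated above, using synthetic normal data (the mu/sigma values are arbitrary) instead of the linked list l: if X is normal, then exp(X) is log-normal, and the parameters of its pdf are the mean and standard deviation of log(exp(X)) = X, not of exp(X) itself:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=0.4, size=5000)  # synthetic normal sample
Y = np.exp(X)                                  # should be log-normally distributed

# The log-normal pdf takes the mean/std of log(Y), i.e. of X itself
mu, sigma = np.mean(np.log(Y)), np.std(np.log(Y))

count, bins, _ = plt.hist(Y, 50, density=True)
x = np.linspace(bins[0], bins[-1], 1000)
pdf = np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi))
plt.plot(x, pdf, color='r', linewidth=2)
plt.show()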

Is there any equivalent of hyperopt's lognormal in Optuna?

I am trying to use Optuna for hyperparameter tuning of my model.
I am stuck at a point where I want to define a search space with a lognormal/normal distribution. This is possible in hyperopt using hp.lognormal. Is it possible to define such a space using a combination of the existing suggest_ APIs of Optuna?
You could perhaps make use of inverse transforms from suggest_float(..., 0, 1) (i.e. U(0, 1)), since Optuna currently doesn't provide suggest_ variants for those two distributions directly. This example might be a starting point: https://gist.github.com/hvy/4ef02ee2945fe50718c71953e1d6381d
Please find the code below:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
from scipy.special import erfcinv
import optuna

def objective(trial):
    # Suggest from U(0, 1) with Optuna.
    x = trial.suggest_float("x", 0, 1)

    # Inverse transform into normal.
    y0 = norm.ppf(x, loc=0, scale=1)

    # Inverse transform into lognormal.
    y1 = np.exp(-np.sqrt(2) * erfcinv(2 * x))

    return y0, y1

if __name__ == "__main__":
    n_objectives = 2  # Normal and lognormal.

    study = optuna.create_study(
        sampler=optuna.samplers.RandomSampler(),
        # Could be "maximize". Does not matter for this demonstration.
        directions=["minimize"] * n_objectives,
    )
    study.optimize(objective, n_trials=10000)

    fig, axs = plt.subplots(n_objectives)
    for i in range(n_objectives):
        axs[i].hist(list(t.values[i] for t in study.trials), bins=100)

    plt.show()
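If the goal is instead to draw a single log-normally distributed hyperparameter inside an ordinary single-objective study, a minimal sketch of the same inverse-transform idea could look like this (the learning-rate use case and the mu/sigma values are hypothetical placeholders):

import numpy as np
from scipy.stats import norm
import optuna

def objective(trial):
    # U(0, 1) from Optuna, kept strictly inside (0, 1) so norm.ppf stays finite.
    u = trial.suggest_float("lr_u", 1e-9, 1 - 1e-9)
    mu, sigma = -3.0, 1.0                          # placeholder log-normal parameters
    lr = np.exp(norm.ppf(u, loc=mu, scale=sigma))  # lognormal(mu, sigma) sample
    # ... train a model with learning rate `lr` and return its validation loss ...
    return (np.log(lr) - mu) ** 2                  # dummy objective for illustration

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)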

Overflow error subclassing a distribution using scipy.stats.rv_continuous

In the documentation page of rv_continuous, we can find a 'custom' Gaussian being subclassed as follows.
from scipy.stats import rv_continuous
import numpy as np

class gaussian_gen(rv_continuous):
    "Gaussian distribution"
    def _pdf(self, x):
        return np.exp(-x**2 / 2.) / np.sqrt(2.0 * np.pi)

gaussian = gaussian_gen(name='gaussian')
In turn, I attempted to create a class for an exponential distribution with base 2, to model some nuclear decay:
class time_dist(rv_continuous):
    def _pdf(self, x):
        return 2**(-x)

random_var = time_dist(name='decay')
The purpose was to then call random_var.rvs() in order to generate a randomly distributed sample of values according to the pdf I defined. However, when I run this, I receive an OverflowError, and I don't quite understand why. Initially I thought it had to do with the fact that the function was not normalized, but I keep making changes to the _pdf definition to no avail. Is there anything wrong with the code, or is this method just ill-advised for defining functions of this sort?
According to Wikipedia, the pdf of an exponential distribution is:
lambda * exp(-lambda * x)  for x >= 0
0                          for x < 0
So, probably the function should be changed as follows:
from scipy.stats import rv_continuous
import numpy as np
import matplotlib.pyplot as plt

class time_dist(rv_continuous):
    def _pdf(self, x):
        # log(2) * 2**(-x) integrates to 1 on [0, inf), so the pdf is normalized
        return np.log(2) * 2**(-x) if x >= 0 else 0

random_var = time_dist(name='decay')

plt.hist(random_var.rvs(size=500))
plt.show()
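As an aside, a sketch of an arguably more idiomatic variant: rv_continuous accepts a and b constructor arguments that declare the support of the distribution, so the pdf needs no conditional at all (which also keeps _pdf safe if scipy evaluates it on arrays):

from scipy.stats import rv_continuous
import numpy as np
import matplotlib.pyplot as plt

class time_dist(rv_continuous):
    def _pdf(self, x):
        # Support is [0, inf) thanks to a=0 below, so no branching is needed.
        return np.log(2) * 2.0**(-x)

random_var = time_dist(a=0, name='decay')
plt.hist(random_var.rvs(size=500))
plt.show()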

How does one save torch.nn.Sequential models in pytorch properly?

I am very well aware of loading the dictionary and then having an instance of the class be loaded with the old dictionary of parameters (e.g. this great question & answer). Unfortunately, when I have a torch.nn.Sequential, I of course do not have a class definition for it.
So I wanted to double check: what is the proper way to do it? I believe torch.save is sufficient (so far my code has not collapsed), though these things can be more subtle than one might expect (e.g. I get a warning when I use pickle, but torch.save uses it internally, so it's confusing). Also, numpy has its own save functions (e.g. see this answer) which tend to be more efficient, so there might be a subtle trade-off I might be overlooking.
My test code:
# creating data and running through a nn and saving it
import torch
import torch.nn as nn
from pathlib import Path
from collections import OrderedDict
import numpy as np
import pickle

path = Path('~/data/tmp/').expanduser()
path.mkdir(parents=True, exist_ok=True)

num_samples = 3
Din, Dout = 1, 1
lb, ub = -1, 1

x = torch.distributions.Uniform(low=lb, high=ub).sample((num_samples, Din))

f = nn.Sequential(OrderedDict([
    ('f1', nn.Linear(Din, Dout)),
    ('out', nn.SELU())
]))
y = f(x)

# save data torch to numpy
x_np, y_np = x.detach().cpu().numpy(), y.detach().cpu().numpy()
np.savez(path / 'db', x=x_np, y=y_np)
print(x_np)

# save model with pickle
with open('db_saving_seq', 'wb') as file:
    pickle.dump({'f': f}, file)

# load model
with open('db_saving_seq', 'rb') as file:
    db = pickle.load(file)
    f2 = db['f']

# test that it outputs the right thing
y2 = f2(x)
y_eq_y2 = y == y2
print(y_eq_y2)

# save model with torch.save
db2 = {'f': f, 'x': x, 'y': y}
torch.save(db2, path / 'db_f_x_y')
print('Done')

db3 = torch.load(path / 'db_f_x_y')
f3 = db3['f']
x3 = db3['x']
y3 = db3['y']

yy3 = f3(x3)
y_eq_y3 = y == y3
print(y_eq_y3)
y_eq_yy3 = y == yy3
print(y_eq_yy3)
Related:
related question from forum: https://discuss.pytorch.org/t/how-to-save-nn-sequential-as-a-model/89117/14
As can be seen in the code, torch.nn.Sequential is based on torch.nn.Module:
https://pytorch.org/docs/stable/_modules/torch/nn/modules/container.html#Sequential
So you can use
f = torch.nn.Sequential(...)
torch.save(f.state_dict(), path)
just like with any other torch.nn.Module.
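To restore it later, rebuild the same Sequential (the architecture itself is not stored in the state_dict) and load the parameters back in. A minimal sketch, reusing the Din/Dout layer layout and the save path from the question:

import torch
import torch.nn as nn
from collections import OrderedDict

Din, Dout = 1, 1
f2 = nn.Sequential(OrderedDict([
    ('f1', nn.Linear(Din, Dout)),
    ('out', nn.SELU())
]))
f2.load_state_dict(torch.load(path))  # path is the file passed to torch.save above
f2.eval()  # switch to inference mode if you are done training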

Python 3: "ndarray is not contiguous" error when constructing a regression function

This code is designed to calculate a linear regression by defining a function "standRegres" that we build ourselves. Although we could do the regression with functions in sklearn or statsmodels, here we try to construct the function on our own. Unfortunately, I hit an error and can't get past it, so I'm asking for your help.
The whole code runs without any problem until the last row. If I run the last row, an error emerges: "ValueError: ndarray is not contiguous".
import os
import pandas as pd
import numpy as np
import pylab as pl
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# load data
iris = load_iris()

# Define a DataFrame
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# take a look
df.head()
#len(df)

# rename the column name
df.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']

X = df[['petal_length']]
y = df['petal_width']

from numpy import *

#########################
# Define function to do matrix calculation
def standRegres(xArr, yArr):
    xMat = mat(xArr); yMat = mat(yArr).T
    xTx = xMat.T * xMat
    if linalg.det(xTx) == 0.0:
        print("this matrix is singular, cannot do inverse!")
        return NA
    else:
        ws = xTx.I * (xMat.T * yMat)
        return ws

# test
x0 = np.ones((150, 1))
x0 = pd.DataFrame(x0)
X0 = pd.concat([x0, X], axis=1)

# test
standRegres(X0, y)
I tried to solve it but don't know how. Could you help me? Much appreciated!
Your problem stems from using the mat function. Stick to array.
In order to use array, you'll need to use the @ operator for matrix multiplication, not *. Finally, you have a line that says xTx.I, but that attribute isn't defined for general arrays, so we can use numpy.linalg.inv instead.
def standRegres(xArr, yArr):
    xMat = array(xArr); yMat = array(yArr).T
    xTx = xMat.T @ xMat
    if linalg.det(xTx) == 0.0:
        print("this matrix is singular, cannot do inverse!")
        return None
    else:
        ws = linalg.inv(xTx) @ (xMat.T @ yMat)
        return ws

# test
x0 = np.ones((150, 1))
x0 = pd.DataFrame(x0)
X0 = pd.concat([x0, X], axis=1)

# test
standRegres(X0, y)
# Output: array([-0.36651405, 0.41641913])
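As a quick cross-check (a sketch, assuming the same X0 and y built above), an ordinary least-squares solve should give the same coefficients:

import numpy as np

# Least-squares solution of X0 @ w = y; should match standRegres(X0, y)
w, residuals, rank, sv = np.linalg.lstsq(np.asarray(X0, dtype=float),
                                         np.asarray(y, dtype=float), rcond=None)
print(w)  # expected to be close to [-0.36651405, 0.41641913]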
