Python3, scipy.optimize: Fit model to multiple datas sets - python-3.x

I have a model which is defined as:
m(x,z) = C1*x^2*sin(z)+C2*x^3*cos(z)
I have multiple data sets for different z (z=1, z=2, z=3), in which they give me m(x,z) as a function of x.
The parameters C1 and C2 have to be the same for all z values.
So I have to fit my model to the three data sets simultaneously otherwise I will have different values of C1 and C2 for different values of z.
It this possible to do with scipy.optimize.
I can do it for just one value of z, but can't figure out how to do it for all z's.
For one z I just write this:
def my_function(x,C1,C1):
z=1
return C1*x**2*np.sin(z)+ C2*x**3*np.cos(z)
data = 'some/path/for/data/z=1'
x= data[:,0]
y= data[:,1]
from lmfit import Model
gmodel = Model(my_function)
result = gmodel.fit(y, x=x, C1=1.1)
print(result.fit_report())
How can I do it for multiple set of datas (i.e different z values?)

So what you want to do is fit a multi-dimensional fit (2-D in your case) to your data; that way for the entire data set you get a single set of C parameters that bests describes your data. I think the best way to do this is using scipy.optimize.curve_fit().
So your code would look something like this:
import scipy.optimize as optimize
import numpy as np
def my_function(xz, *par):
""" Here xz is a 2D array, so in the form [x, z] using your variables, and *par is an array of arguments (C1, C2) in your case """
x = xz[:,0]
z = xz[:,1]
return par[0] * x**2 * np.sin(z) + par[1] * x**3 * np.cos(z)
# generate fake data. You will presumable have this already
x = np.linspace(0, 10, 100)
z = np.linspace(0, 3, 100)
xx, zz = np.meshgrid(x, z)
xz = np.array([xx.flatten(), zz.flatten()]).T
fakeDataCoefficients = [4, 6.5]
fakeData = my_function(xz, *fakeDataCoefficients) + np.random.uniform(-0.5, 0.5, xx.size)
# Fit the fake data and return the set of coefficients that jointly fit the x and z
# points (and will hopefully be the same as the fakeDataCoefficients
popt, _ = optimize.curve_fit(my_function, xz, fakeData, p0=fakeDataCoefficients)
# Print the results
print(popt)
When I do this fit I get precisely the fakeDataCoefficients I used to generate the function, so the fit works well.
So the conclusion is that you don't do 3 fits independently, setting the value of z each time, but instead you do a 2D fit which takes the values of x and z simultaneously to find the best coefficients.

Your code is incomplete and has a few syntax errors.
But I think that you want to build a model that concatenates the models for the different data sets, and then fit the concatenated data to that model. Within the context of lmfit (disclosure: author and maintainer), I often find it easier to use minimize() and an objective function for multiple data set fits rather than the Model class. Perhaps start with something like this:
import lmfit
import numpy as np
# define the model function for each dataset
def my_function(x, c1, c2, z=1):
return C1*x**2*np.sin(z)+ C2*x**3*np.cos(z)
# Then write an objective function like this
def f2min(params, x, data2d, zlist):
ndata, npts = data2d.shape
residual = 0.0*data2d[:]
for i in range(ndata):
c1 = params['c1_%d' % (i+1)].value
c2 = params['c2_%d' % (i+1)].value
residual[i,:] = data[i,:] - my_function(x, c1, c2, z=zlist[i])
return residual.flatten()
# now build that `data2d`, `zlist` and build the `Parameters`
data2d = []
zlist = []
x = None
for fname in dataset_names:
d = np.loadtxt(fname) # or however you read / generate data
if x is None: x = d[:, 0]
data2d.append(d[:, 1])
zlist.append(z_for_dataset(fname)) # or however ...
data2d = np.array(data2d) # turn list into nd array
ndata, npts = data2d.shape
params = lmfit.Parameters()
for i in range(ndata):
params.add('c1_%d' % (i+1), value=1.0) # give a better starting value!
params.add('c2_%d' % (i+1), value=1.0) # give a better starting value!
# now you're ready to do the fit and print out the results:
result = lmfit.minimize(f2min, params, args=(x, data2d, zlist))
print(results.fit_report())
That code really a sketch and is all untested, but hopefully will give you a good starting foundation.

Related

Function to Convert Square Matrix to Upper Hessenberg with Similarity Transformations

I am attempting to translate a MATLAB function to Python from Timothy Sauer,
Numerical Analysis Second Edition, page 546, Program 12.8. The original function
receives a square matrix and returns a matrix with the same eigenvalues but in
Upper Hessenberg form. The original function creates Householder reflectors to produce zeros in the
offdiagonals of the matrix and performs similarity transformations on the original matrix to
get it to upper hessenberg form.
My Python translation succeeds only in obtaining the eigenvalues for 3x3 matrices
but not for 4x4 matrices. Would anyone know the cause of the error? I pasted my code with success and failing cases below. Thank you.
import numpy as np
import math
norm = lambda v:math.sqrt(np.sum(v**2))
def upper_hessenberg(A):
'''
Translated from Timothy Sauer, Numerical Analysis Second Edition, page 546, Program 12.8
Input: Square Matrix, A
Output: B, a Similar Matrix with Same Eigenvalues as A except in Upper Hessenberg form
V, a matrix containing the reflectors used to produce zeros in the off diagonals
'''
rows, columns = A.shape
B = A[:,:].astype(np.float) #will store the similar matrix
V = np.zeros(shape=(rows,columns),dtype=float) #will store the reflectors
for column in range(columns-2): #start from the 1st column end at the third to last column
row = column
x = B[row+1: ,column] #decapitate the column
reflection_of_x = np.zeros(len(x)) #first entry is the norm, followed by 0s
if abs(norm(x)) <= np.finfo(float).eps: #if there are already 0s inthe offdiagonals skip this column
continue
reflection_of_x[0] = norm(x)
v = reflection_of_x - x # v, (the difference vector) represents the line connecting the original column to the reflection of the column (see Timothy Sauer Num Analysis 2nd Edition Figure 4.11 Householder reflector)
v = v/norm(v) #normalize to length of 1 (unit vector)
V[:len(v), column] = v #save the reflector in an upper triangular matrix called V
#verify with x-2*(x # v * v) should equal a vector with all zeros except the leading entry
column_projections = np.outer(v , v # B[row+1:, column:]) #project each col onto difference vector
B[row+1:, column:] = B[row+1:, column:] - (2 * column_projections)
row_projections = np.outer(v, B[row:, column + 1:] # v).T #project each row onto difference vector
B[row:, column + 1:] = B[row:, column + 1:] - (2 * row_projections)
return V, B
# Algorithm succeeds only with 3x3 matrices
eigvectors = np.array([
[1,3,2],
[4,5,6],
[7,8,9],
])
eigvalues = np.array([
[4,0,0],
[0,3,0],
[0,0,2]
])
M = eigvectors # eigvalues # np.linalg.inv(eigvectors)
print("The expected eigvals :", np.linalg.eigvals(M))
V,B = upper_hessenberg(M)
print("For 3x3 matrices, The function successfully produces these eigvals",np.linalg.eigvals(B))
#But with 4x4 matrices it fails
eigvectors = np.array([
[1,3,2,4],
[4,5,6,2],
[7,8,9,5],
[5,2,7,8]
])
eigvalues = np.array([
[4,0,0,0],
[0,3,0,0],
[0,0,2,0],
[0,0,0,1]
])
M = eigvectors # eigvalues # np.linalg.inv(eigvectors)
print("The expected eigvals :", np.linalg.eigvals(M))
V,B = upper_hessenberg(M)
print("For 4x4 matrices, The function fails to obtain correct eigvals",np.linalg.eigvals(B))
Your error is that you try to be too efficient. While the last rows are indeed increasingly reduced with leading zeros, this is not the case for the last columns. So in row_projections you need to remove the limiter row:, change to B[:, column + 1:].
You are using the unstable variant of the "improved" Householder reflector. The older version would use the larger of x_refl - x and x_refl + x by setting reflection_of_x[0] = -np.sign(x[0])*norm(x) (or remove all minus signs there).
The stable variant of the improved reflector would use the binomial trick in the normalization of x_refl - x if this difference becomes too small.
x_refl - x = [ norm(x) - x[0], - x[1:] ]
= [ norm(x[1:])^2/(norm(x) + x[0]), - x[1:] ]
(x_refl - x)/norm(x_refl - x)
[ norm(x[1:]), - (norm(x)+x[0])*(x[1:]/norm(x[1:])) ]
= -----------------------------------------------------
sqrt(2*norm(x)*(norm(x)+x[0]))
While the parts may have wildly different scales, no catastrophic cancellation happens for x[0]>0.
See the discussion about the same algorithm from Golub/van Loan 4th ed. in for further details and opinions and the code from that book.

How can I interpolate values from two lists (in Python)?

I am relatively new to coding in Python. I have mainly used MatLab in the past and am used to having vectors that can be referenced explicitly rather than appended lists. I have a script where I generate a list of x- and y- (z-, v-, etc) values. Later, I want to interpolate and then print a table of the values at specified points. Here is a MWE. The problem is at line 48:
yq = interp1d(x_list, y_list, xq(nn))#interp1(output1(:,1),output1(:,2),xq(nn))
I'm not sure I have the correct syntax for the last two lines either:
table[nn] = ('%.2f' %xq, '%.2f' %yq)
print(table)
Here is the full script for the MWE:
#This script was written to test how to interpolate after data was created in a loop and stored as a list. Can a list be accessed explicitly like a vector in matlab?
#
from scipy.interpolate import interp1d
from math import * #for ceil
from astropy.table import Table #for Table
import numpy as np
# define the initial conditions
x = 0 # initial x position
y = 0 # initial y position
Rmax = 10 # maxium range
""" initializing variables for plots"""
x_list = [x]
y_list = [y]
""" define functions"""
# not necessary for this MWE
"""create sample data for MWE"""
# x and y data are calculated using functions and appended to their respective lists
h = 1
t = 0
tf = 10
N=ceil(tf/h)
# Example of interpolation without a loop: https://docs.scipy.org/doc/scipy/tutorial/interpolate.html#d-interpolation-interp1d
#x = np.linspace(0, 10, num=11, endpoint=True)
#y = np.cos(-x**2/9.0)
#f = interp1d(x, y)
for i in range(N):
x = h*i
y = cos(-x**2/9.0)
""" appends selected data for ability to plot"""
x_list.append(x)
y_list.append(y)
## Interpolation after x- and y-lists are already created
intervals = 0.5
nfinal = ceil(Rmax/intervals)
NN = nfinal+1 # length of table
dtype = [('Range (units?)', 'f8'), ('Drop? (units)', 'f8')]
table = Table(data=np.zeros(N, dtype=dtype))
for nn in range(NN):#for nn = 1:NN
xq = 0.0 + (nn-1)*intervals #0.0 + (nn-1)*intervals
yq = interp1d(x_list, y_list, xq(nn))#interp1(output1(:,1),output1(:,2),xq(nn))
table[nn] = ('%.2f' %xq, '%.2f' %yq)
print(table)
Your help and patience will be greatly appreciated!
Best regards,
Alex
Your code has some glaring issues that made it really difficult to understand. Let's first take a look at some things I needed to fix:
for i in range(N):
x = h*1
y = cos(-x**2/9.0)
""" appends selected data for ability to plot"""
x_list.append(x)
y_list.append(y)
You are appending a single value without modifying it. What I presume you wanted is down below.
intervals = 0.5
nfinal = ceil(Rmax/intervals)
NN = nfinal+1 # length of table
dtype = [('Range (units?)', 'f8'), ('Drop? (units)', 'f8')]
table = Table(data=np.zeros(N, dtype=dtype))
for nn in range(NN):#for nn = 1:NN
xq = 0.0 + (nn-1)*intervals #0.0 + (nn-1)*intervals
yq = interp1d(x_list, y_list, xq(nn))#interp1(output1(:,1),output1(:,2),xq(nn))
table[nn] = ('%.2f' %xq, '%.2f' %yq)
This is where things get strange. First: use pandas tables, this is the more popular choice. Second: I have no idea what you are trying to loop over. What I presume you wanted was to vary the number of points for the interpolation, which I have done so below. Third: you are trying to interpolate a point, when you probably want to interpolate over a range of points (...interpolation). Lastly, you are using the interp1d function incorrectly. Please take a look at the code below or run it here; let me know what you exactly wanted (specifically: what should xq / xq(nn) be?), because the MRE you provided is quite confusing.
from scipy.interpolate import interp1d
from math import *
import numpy as np
Rmax = 10
h = 1
t = 0
tf = 10
N = ceil(tf/h)
x = np.arange(0,N+1)
y = np.cos(-x**2/9.0)
interval = 0.5
NN = ceil(Rmax/interval) + 1
ip_list = np.arange(1,interval*NN,interval)
xtable = []
ytable = []
for i,nn in enumerate(ip_list):
f = interp1d(x,y)
x_i = np.arange(0,nn+interval,interval)
xtable += [x_i]
ytable += [f(x_i)]
[print(i) for i in xtable]
[print(i) for i in ytable]

Sort simmilarity matrix according to plot colors

I have this similarity matrix plot of some documents. I want to sort the values of the matrix, which is a numpynd array, to group colors, while maintaining their relative position (diagonal yellow line), and labels as well.
path = "C:\\Users\\user\\Desktop\\texts\\dataset"
text_files = os.listdir(path)
#print (text_files)
tfidf_vectorizer = TfidfVectorizer()
documents = [open(f, encoding="utf-8").read() for f in text_files if f.endswith('.txt')]
sparse_matrix = tfidf_vectorizer.fit_transform(documents)
labels = []
for f in text_files:
if f.endswith('.txt'):
labels.append(f)
pairwise_similarity = sparse_matrix * sparse_matrix.T
pairwise_similarity_array = pairwise_similarity.toarray()
fig, ax = plt.subplots(figsize=(20,20))
cax = ax.matshow(pairwise_similarity_array, interpolation='spline16')
ax.grid(True)
plt.title('News articles similarity matrix')
plt.xticks(range(23), labels, rotation=90);
plt.yticks(range(23), labels);
fig.colorbar(cax, ticks=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1])
plt.show()
Here is one possibility.
The idea is to use the information in the similarity matrix and put elements next to each other if they are similar. If two items are similar they should also be similar with respect to other elements ie have similar colors.
I start with the element which has the most in common with all other elements (this choice is a bit arbitrary) [a] and as next element I choose from the remaining elements the one which is closest to the current [b].
import numpy as np
import matplotlib.pyplot as plt
def create_dummy_sim_mat(n):
sm = np.random.random((n, n))
sm = (sm + sm.T) / 2
sm[range(n), range(n)] = 1
return sm
def argsort_sim_mat(sm):
idx = [np.argmax(np.sum(sm, axis=1))] # a
for i in range(1, len(sm)):
sm_i = sm[idx[-1]].copy()
sm_i[idx] = -1
idx.append(np.argmax(sm_i)) # b
return np.array(idx)
n = 10
sim_mat = create_dummy_sim_mat(n=n)
idx = argsort_sim_mat(sim_mat)
sim_mat2 = sim_mat[idx, :][:, idx] # apply reordering for rows and columns
# Plot results
fig, ax = plt.subplots(1, 2)
ax[0].imshow(sim_mat)
ax[1].imshow(sim_mat2)
def ticks(_ax, ti, la):
_ax.set_xticks(ti)
_ax.set_yticks(ti)
_ax.set_xticklabels(la)
_ax.set_yticklabels(la)
ticks(_ax=ax[0], ti=range(n), la=range(n))
ticks(_ax=ax[1], ti=range(n), la=idx)
After meTchaikovsky's answer I also tested my idea on a clustered similarity matrix (see first image) this method works but is not perfect (see second image).
Because I use the similarity between two elements as approximation to their similarity to all other elements, it is quite clear why this does not work perfectly.
So instead of using the initial similarity to sort the elements one could calculate a second order similarity matrix which measures how similar the similarities are (sorry).
This measure describes better what you are interested in. If two rows / columns have similar colors they should be close to each other. The algorithm to sort the matrix is the same as before
def add_cluster(sm, c=3):
idx_cluster = np.array_split(np.random.permutation(np.arange(len(sm))), c)
for ic in idx_cluster:
cluster_noise = np.random.uniform(0.9, 1.0, (len(ic),)*2)
sm[ic[np.newaxis, :], ic[:, np.newaxis]] = cluster_noise
def get_sim_mat2(sm):
return 1 / (np.linalg.norm(sm[:, np.newaxis] - sm[np.newaxis], axis=-1) + 1/n)
sim_mat = create_dummy_sim_mat(n=100)
add_cluster(sim_mat, c=4)
sim_mat2 = get_sim_mat2(sim_mat)
idx = argsort_sim_mat(sim_mat)
idx2 = argsort_sim_mat(sim_mat2)
sim_mat_sorted = sim_mat[idx, :][:, idx]
sim_mat_sorted2 = sim_mat[idx2, :][:, idx2]
# Plot results
fig, ax = plt.subplots(1, 3)
ax[0].imshow(sim_mat)
ax[1].imshow(sim_mat_sorted)
ax[2].imshow(sim_mat_sorted2)
The results with this second method are quite good (see third image)
but I guess there exist cases where this approach also fails, so I would be happy about feedback.
Edit
I tried to explain it and did also link the ideas to the code with [a] and [b], but obviously I did not do a good job, so here is a second more verbose explanation.
You have n elements and a n x n similarity matrix sm where each cell (i, j) describes how similar element i is to element j. The goal is to order the rows / columns in such a way that one can see existing patterns in the similarity matrix. My idea to achieve this is really simple.
You start with an empty list and add elements one by one. The criterion for the next element is the similarity to the current element. If element i was added in the last step, I chose the element argmax(sm[i, :]) as next, ignoring the elements already added to the list. I ignore the elements by setting the values of those elements to -1.
You can use the function ticks to reorder the labels:
labels = np.array(labels) # make labels an numpy array, to index it with a list
ticks(_ax=ax[0], ti=range(n), la=labels[idx])
#scleronomic's solution is very elegant, but it also has one shortage, which is we cannot set the number of clusters in the sorted correlation matrix. Assume we are working with a set of variables, in which some of them are weakly correlated
import string
import numpy as np
import pandas as pd
n_variables = 20
n_clusters = 10
n_samples = 100
np.random.seed(100)
names = list(string.ascii_lowercase)[:n_variables]
belongs_to_cluster = np.random.randint(0,n_clusters,n_variables)
latent = np.random.randn(n_clusters,n_samples)
variables = np.random.rand(n_variables,n_samples)
for ind in range(n_clusters):
mask = belongs_to_cluster == ind
# weakening the correlation
if ind % 2 == 0:variables[mask] += latent[ind]*0.1
variables[mask] += latent[ind]
df = pd.DataFrame({key:val for key,val in zip(names,variables)})
corr_mat = np.array(df.corr())
As you can see, there are 10 clusters of variables by construction, however, variables within clusters that has an even index are weakly correlated. If we only want to see roughly 5 clusters in the sorted correlation matrix, maybe we need to find another way.
Based on this post, which is the accepted answer to the question "Clustering a correlation matrix", to sort a correlation matrix into blocks, what we need to find are blocks, where correlations within blocks are high and correlations between blocks are low. However, the solution provided by this accepted answer works best when we know how many blocks are there in the first place, and more importantly, the sizes of the underlying blocks are the same, or at least similar. Therefore, I improved the solution with a new function sort_corr_mat
def sort_corr_mat(corr_mat,clusters_guess):
def _swap_rows(corr_mat, var1, var2):
rs = corr_mat.copy()
rs[var2, :],rs[var1, :]= corr_mat[var1, :],corr_mat[var2, :]
cs = rs.copy()
cs[:, var2],cs[:, var1] = rs[:, var1],rs[:, var2]
return cs
# analysis
max_iter = 500
best_score,current_score,best_count = -1e8,-1e8,0
num_minimua_to_visit = 20
best_corr = corr_mat
best_ordering = np.arange(n_variables)
for i in range(max_iter):
for row1 in range(n_variables):
for row2 in range(n_variables):
if row1 == row2: continue
option_ordering = best_ordering.copy()
option_ordering[row1],option_ordering[row2] = best_ordering[row2],best_ordering[row1]
option_corr = _swap_rows(best_corr,row1,row2)
option_score = score(option_corr,n_variables,clusters_guess)
if option_score > best_score:
best_corr = option_corr
best_ordering = option_ordering
best_score = option_score
if best_score > current_score:
best_count += 1
current_corr = best_corr
current_ordering = best_ordering
current_score = best_score
if best_count >= num_minimua_to_visit:
return best_corr#,best_ordering
return best_corr#,best_ordering
With this function and the corr_mat constructed in the first place, I compared the result obtained with my function (on the right) with that obtained with #scleronomic's solution (in the middle)
sim_mat_sorted = corr_mat[argsort_sim_mat(corr_mat), :][:, argsort_sim_mat(corr_mat)]
corr_mat_sorted = sort_corr_mat(corr_mat,clusters_guess=5)
# Plot results
fig, ax = plt.subplots(1,3,figsize=(18,6))
ax[0].imshow(corr_mat)
ax[1].imshow(sim_mat_sorted)
ax[2].imshow(corr_mat_sorted)
Clearly, #scleronomic's solution works much better and faster, but my solution offers more control to the pattern of the output.

Python, scipy - How to fit a curve using a piecewise function with a conditional parameter that also needs to be calculated?

As the title suggests, I'm trying to fit a piecewise equation to a large data set. The equations I would like to fit to my data are as follows:
y(x) = b, when x < c
else:
y(x) = b + exp(a(x-c)) - 1, when x >= c
There are multiple answers to how such an issue can be addressed, but as a Python beginner I can't figure out how to apply them to my problem:
Curve fit with a piecewise function?
Conditional curve fit with scipy?
The problem is that all variables (a,b and c) have to be calculated by the fitting algorithm.
Thank you for your help!
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Reduced Dataset
y = np.array([23.032, 21.765, 20.525, 21.856, 21.592, 20.754, 20.345, 20.534,
23.502, 21.725, 20.126, 21.381, 20.217, 21.553, 21.176, 20.976,
20.723, 20.401, 22.898, 22.02 , 21.09 , 22.543, 22.584, 22.799,
20.623, 20.529, 20.921, 22.505, 22.793, 20.845, 20.584, 22.026,
20.621, 23.316, 22.748, 20.253, 21.218, 23.422, 23.79 , 21.371,
24.318, 22.484, 24.775, 23.773, 25.623, 23.204, 25.729, 26.861,
27.268, 27.436, 29.471, 31.836, 34.034, 34.057, 35.674, 41.512,
48.249])
x = np.array([3756., 3759., 3762., 3765., 3768., 3771., 3774., 3777., 3780.,
3783., 3786., 3789., 3792., 3795., 3798., 3801., 3804., 3807.,
3810., 3813., 3816., 3819., 3822., 3825., 3828., 3831., 3834.,
3837., 3840., 3843., 3846., 3849., 3852., 3855., 3858., 3861.,
3864., 3867., 3870., 3873., 3876., 3879., 3882., 3885., 3888.,
3891., 3894., 3897., 3900., 3903., 3906., 3909., 3912., 3915.,
3918., 3921., 3924.])
# Simple exponential function without conditions (works so far)
def exponential_fit(x,a,b,c):
return b + np.exp(a*(x-c))
popt, pcov = curve_fit(exponential_fit, x, y, p0 = [0.1, 20,3800])
plt.plot(x, y, 'bo')
plt.plot(x, exponential_fit(x, *popt), 'r-')
plt.show()
You should change your function to something like
def exponential_fit(x, a, b, c):
if x >= c:
return b + np.exp(a*(x-c))-1
else:
return b
Edit: As chaosink pointed out in the comments, this approach no longer works as the the above function assumes that x is a scalar. However, curve_fit evaluates the function for array-like x. Consequently, one should use vectorised operations instead, see here for more details. To do so, one can either use
def exponential_fit(x, a, b, c):
return np.where(x >= c, b + np.exp(a*(x-c))-1, b)
or chaosink's suggestion in the comments:
def exponential_fit(x, a, b, c):
mask = (x >= c)
return mask * (b + np.exp(a*(x-c)) - 1) + ~mask * b
Both give:

Finding conditional mutual information from 3 discrete variable

I am trying to find conditional mutual information between three discrete random variable using pyitlib package for python with the help of the formula:
I(X;Y|Z)=H(X|Z)+H(Y|Z)-H(X,Y|Z)
The expected Conditional Mutual information value is= 0.011
My 1st code:
import numpy as np
from pyitlib import discrete_random_variable as drv
X=[0,1,1,0,1,0,1,0,0,1,0,0]
Y=[0,1,1,0,0,0,1,0,0,1,1,0]
Z=[1,0,0,1,1,0,0,1,1,0,0,1]
a=drv.entropy_conditional(X,Z)
##print(a)
b=drv.entropy_conditional(Y,Z)
##print(b)
c=drv.entropy_conditional(X,Y,Z)
##print(c)
p=a+b-c
print(p)
The answer i am getting here is=0.4632245116328402
My 2nd code:
import numpy as np
from pyitlib import discrete_random_variable as drv
X=[0,1,1,0,1,0,1,0,0,1,0,0]
Y=[0,1,1,0,0,0,1,0,0,1,1,0]
Z=[1,0,0,1,1,0,0,1,1,0,0,1]
a=drv.information_mutual_conditional(X,Y,Z)
print(a)
The answer i am getting here is=0.1583445441575102
While the expected result is=0.011
Can anybody help? I am in big trouble right now. Any kind of help will be appreciable.
Thanks in advance.
I think that the library function entropy_conditional(x,y,z) has some errors. I also test my samples, the same problem happens.
however, the function entropy_conditional with two variables is ok.
So I code my entropy_conditional(x,y,z) as entropy(x,y,z), the results is correct.
the code may be not beautiful.
def gen_dict(x):
dict_z = {}
for key in x:
dict_z[key] = dict_z.get(key, 0) + 1
return dict_z
def entropy(x,y,z):
x = np.array([x,y,z]).T
x = x[x[:,-1].argsort()] # sorted by the last column
w = x[:,-3]
y = x[:,-2]
z = x[:,-1]
# dict_w = gen_dict(w)
# dict_y = gen_dict(y)
dict_z = gen_dict(z)
list_z = [dict_z[i] for i in set(z)]
p_z = np.array(list_z)/sum(list_z)
pos = 0
ent = 0
for i in range(len(list_z)):
w = x[pos:pos+list_z[i],-3]
y = x[pos:pos+list_z[i],-2]
z = x[pos:pos+list_z[i],-1]
pos += list_z[i]
list_wy = np.zeros((len(set(w)),len(set(y))), dtype = float , order ="C")
list_w = list(set(w))
list_y = list(set(y))
for j in range(len(w)):
pos_w = list_w.index(w[j])
pos_y = list_y.index(y[j])
list_wy[pos_w,pos_y] += 1
#print(pos_w)
#print(pos_y)
list_p = list_wy.flatten()
list_p = np.array([k for k in list_p if k>0]/sum(list_p))
ent_t = 0
for j in list_p:
ent_t += -j * math.log2(j)
#print(ent_t)
ent += p_z[i]* ent_t
return ent
X=[0,1,1,0,1,0,1,0,0,1,0,0]
Y=[0,1,1,0,0,0,1,0,0,1,1,0]
Z=[1,0,0,1,1,0,0,1,1,0,0,1]
a=drv.entropy_conditional(X,Z)
##print(a)
b=drv.entropy_conditional(Y,Z)
c = entropy(X, Y, Z)
p=a+b-c
print(p)
0.15834454415751043
Based on the definitions of conditional entropy, calculating in bits (i.e. base 2) I obtain H(X|Z)=0.784159, H(Y|Z)=0.325011, H(X,Y|Z) = 0.950826. Based on the definition of conditional mutual information you provide above, I obtain I(X;Y|Z)=H(X|Z)+H(Y|Z)-H(X,Y|Z)= 0.158344. Noting that pyitlib uses base 2 by default, drv.information_mutual_conditional(X,Y,Z) appears to be computing the correct result.
Note that your use of drv.entropy_conditional(X,Y,Z) in your first example to compute conditional entropy is incorrect, you can however use drv.entropy_conditional(XY,Z), where XY is a 1D array representing the joint observations about X and Y, for example XY = [2*xy[0] + xy[1] for xy in zip(X,Y)].

Resources