Numpy: division by zero error, but mathematically function is apparently defined - python-3.x

I'm testing out some functions to fit with data, and one of them (in 2-D) is
f(x) = (1/(1-x)) / (1 + 1/(1-x))
Which, according to Wolfram and the Google plotters, gives you the result
f(1) = 1
I've tried to get this to work without hard coding the case
if x == 1:
    return 1
but I end up with a nan and a RuntimeWarning informing me that I have indeed divided by zero.
import numpy as np
def f(x):
    return 1/(1-x) / (1 + 1/(1-x))
x_range = np.linspace(0, 1, 50)
y = f(x_range)
print(y)
Is there a more elegant solution than to simply introduce a hard-coded if?

Is there a reason to keep it in this form? You can simplify it to:
def f(x):
    return 1/(2-x)
Wolfram and Google probably do some sort of algebraic simplification too.

Just simplify the equation f(x) = (1/(1-x)) / (1 + 1/(1-x)). Multiplying the numerator and denominator by (1-x) gives 1/((1-x) + 1), so the simplified equation is 1/(2-x). Now update the program as:
import numpy as np
def f(x):
    return 1/(2-x)
x_range = np.linspace(0, 1, 50)
y = f(x_range)
print(y)
output:
[0.5 0.50515464 0.51041667 0.51578947 0.5212766 0.52688172
0.5326087 0.53846154 0.54444444 0.5505618 0.55681818 0.56321839
0.56976744 0.57647059 0.58333333 0.59036145 0.59756098 0.60493827
0.6125 0.62025316 0.62820513 0.63636364 0.64473684 0.65333333
0.66216216 0.67123288 0.68055556 0.69014085 0.7 0.71014493
0.72058824 0.73134328 0.74242424 0.75384615 0.765625 0.77777778
0.79032258 0.80327869 0.81666667 0.83050847 0.84482759 0.85964912
0.875 0.89090909 0.90740741 0.9245283 0.94230769 0.96078431
0.98 1. ]
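As a quick sanity check on the simplification, here is a minimal sketch that compares the two forms away from x = 1 and evaluates the simplified form at x = 1. The function names `f_original` and `f_simplified` are made up for this example.

```python
import numpy as np

def f_original(x):
    # Original form: 1/(1-x) divides by zero when x == 1.
    return 1 / (1 - x) / (1 + 1 / (1 - x))

def f_simplified(x):
    # Numerator and denominator multiplied by (1 - x): 1 / ((1 - x) + 1).
    return 1 / (2 - x)

# Drop the last point (x == 1) so the original form stays finite.
x = np.linspace(0, 1, 50)[:-1]
same_away_from_one = np.allclose(f_original(x), f_simplified(x))
value_at_one = f_simplified(1.0)

print(same_away_from_one, value_at_one)
```

The two forms agree everywhere the original is defined, and only the simplified one can be evaluated at x = 1 without a warning.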

Related

Is there a way to return tuple of mixed variables in Jax helper function?

On my path learning Jax, I tried to achieve something like
def f(x):
    return [x + 1, [1,2,3], "Hello"]
x = 1
new_x, a_list, str = jnp.where(
    x > 0,
    f(x),
    f(x + 1)
)
Well, Jax clearly does not support this. I tried searching online and went through quite a few docs, but I couldn't find a good answer.
Any help on how I can achieve this in Jax?
In general, JAX functions like jnp.where only accept array arguments, not list or string arguments. Since your function is not compatible with JAX in the first place, it might be better to avoid JAX conditionals and use standard Python conditionals instead:
import jax.numpy as jnp
def f(x):
    return [x + 1, [1,2,3], "Hello"]
x = 1
new_x, a_list, str_ = f(x) if x > 0 else f(x + 1)

Trying to rule out astrology but something is wrong

I am trying to show that a possible astrology effect on populations is statistically insignificant, but to no avail. I am using Pearson's chi-squared test on two distributions of sun signs from two different populations, one of astronaut pilots and the other of celebrities. Something must be wrong, but I have failed to find it; the problem is probably on the statistics side.
import numpy as np
import pandas as pd
import ephem
from collections import Counter, namedtuple
import matplotlib.pyplot as plt
from scipy import stats
models = pd.read_csv('models.csv', delimiter=',')
astronauts = pd.read_csv('astronauts.csv', delimiter=',')
models = models.sample(229)
astronauts = astronauts.sample(229)
sun = ephem.Sun()
def get_planet_constellation(planet, dataset):
    person_planet_constellation = []
    for person in dataset['Birth Date']:
        planet.compute(person)
        person_planet_constellation += [ephem.constellation(planet)[1]]
    return person_planet_constellation
def plot_bar_group(planet, data1, data2):
    fig, ax = plt.subplots()
    plt.bar(data1.keys(), data1.values(), alpha=0.5)
    plt.bar(data2.keys(), data2.values(), alpha=0.5)
    plt.legend(['astronauts', 'models'])
    ylabel = 'Percentages of ' + planet.name + ' in constellation'
    ax.set_ylabel(ylabel)
    title = 'Histogram of ' + planet.name + ' in constellation by group'
    ax.set_title(title)
    plt.show()
astronaut_sun_constellation = Counter(
    get_planet_constellation(sun, astronauts))
model_sun_constellation = Counter(get_planet_constellation(sun, models))
plot_bar_group(sun, astronaut_sun_constellation, model_sun_constellation)
a = list(astronaut_sun_constellation.values())
b = list(model_sun_constellation.values())
s = np.array([a, b])
stat, p, dof, expected = stats.chi2_contingency(s)
print(stat, p, dof, expected)
prob = 0.95
critical = stats.chi2.ppf(prob, dof)
if abs(stat) >= critical:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')
# interpret p-value
alpha = 1.0 - prob
if p <= alpha:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')
https://www.dropbox.com/s/w7rye6m5lbihjlh/astronauts.csv
https://www.dropbox.com/s/xlxanr0pxqtxcvv/models.csv
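I eventually found the bug: I was passing each Counter's values as a list to the chi-square function without sorting them first. A Counter keeps its values in the order the keys were first seen, so the two lists did not line up by constellation and the test saw a major difference between them. Once both lists are sorted by constellation, all astrology effects are insignificant, as expected, at the 0.95 level.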
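To illustrate the fix described above, here is a minimal sketch of putting two Counters into the same key order before building the contingency table. The helper name `aligned_counts` and the toy data are hypothetical and not taken from the original script.

```python
from collections import Counter

def aligned_counts(counter_a, counter_b):
    """Return (labels, row_a, row_b) with both rows in the same label order."""
    # Use the union of keys so a sign missing from one group counts as 0.
    labels = sorted(set(counter_a) | set(counter_b))
    row_a = [counter_a.get(label, 0) for label in labels]
    row_b = [counter_b.get(label, 0) for label in labels]
    return labels, row_a, row_b

# Hypothetical toy data: the same signs, first seen in a different order.
astronaut_signs = Counter(["Leo", "Leo", "Virgo", "Aries"])
model_signs = Counter(["Aries", "Virgo", "Virgo", "Leo"])

labels, a, b = aligned_counts(astronaut_signs, model_signs)
print(labels)
print(a, b)
```

The two rows can then be stacked with np.array([a, b]) and passed to stats.chi2_contingency as in the script above, so that each column refers to a single constellation.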

Python3, scipy.optimize: Fit model to multiple data sets

I have a model which is defined as:
m(x,z) = C1*x^2*sin(z)+C2*x^3*cos(z)
I have multiple data sets for different z (z=1, z=2, z=3), in which they give me m(x,z) as a function of x.
The parameters C1 and C2 have to be the same for all z values.
So I have to fit my model to the three data sets simultaneously otherwise I will have different values of C1 and C2 for different values of z.
Is this possible to do with scipy.optimize?
I can do it for just one value of z, but can't figure out how to do it for all z's.
For one z I just write this:
def my_function(x,C1,C1):
    z=1
    return C1*x**2*np.sin(z)+ C2*x**3*np.cos(z)
data = 'some/path/for/data/z=1'
x= data[:,0]
y= data[:,1]
from lmfit import Model
gmodel = Model(my_function)
result = gmodel.fit(y, x=x, C1=1.1)
print(result.fit_report())
How can I do it for multiple data sets (i.e. different z values)?
So what you want to do is a multi-dimensional fit (2-D in your case) to your data; that way you get a single set of C parameters that best describes the entire data set. I think the best way to do this is using scipy.optimize.curve_fit().
So your code would look something like this:
import scipy.optimize as optimize
import numpy as np
def my_function(xz, *par):
    """ Here xz is a 2D array, so in the form [x, z] using your variables, and *par is an array of arguments (C1, C2) in your case """
    x = xz[:,0]
    z = xz[:,1]
    return par[0] * x**2 * np.sin(z) + par[1] * x**3 * np.cos(z)
# generate fake data. You will presumably have this already
x = np.linspace(0, 10, 100)
z = np.linspace(0, 3, 100)
xx, zz = np.meshgrid(x, z)
xz = np.array([xx.flatten(), zz.flatten()]).T
fakeDataCoefficients = [4, 6.5]
fakeData = my_function(xz, *fakeDataCoefficients) + np.random.uniform(-0.5, 0.5, xx.size)
# Fit the fake data and return the set of coefficients that jointly fit the x and z
# points (and will hopefully be the same as the fakeDataCoefficients)
popt, _ = optimize.curve_fit(my_function, xz, fakeData, p0=fakeDataCoefficients)
# Print the results
print(popt)
When I do this fit I get precisely the fakeDataCoefficients I used to generate the function, so the fit works well.
So the conclusion is that you don't do 3 fits independently, setting the value of z each time, but instead you do a 2D fit which takes the values of x and z simultaneously to find the best coefficients.
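If the data comes as three separate sets, one per z value, rather than on a grid, the same idea can be sketched by stacking them into one long (x, z) array. This is only an illustration: the names `model` and `datasets` are made up, and the noise-free synthetic values stand in for the real measurements.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(xz, c1, c2):
    # xz has shape (n_points, 2): column 0 holds x, column 1 holds z.
    x, z = xz[:, 0], xz[:, 1]
    return c1 * x**2 * np.sin(z) + c2 * x**3 * np.cos(z)

# Hypothetical stand-ins for the three measured data sets (z = 1, 2, 3).
true_c1, true_c2 = 4.0, 6.5
x = np.linspace(0, 10, 50)
datasets = {}
for z in (1.0, 2.0, 3.0):
    xz_single = np.column_stack([x, np.full_like(x, z)])
    datasets[z] = (x, model(xz_single, true_c1, true_c2))

# Stack every (x, z) pair and its measured value into one long data set.
xz_all = np.vstack([np.column_stack([xs, np.full_like(xs, z)])
                    for z, (xs, _) in datasets.items()])
y_all = np.concatenate([ys for _, ys in datasets.values()])

# One fit over all points yields a single shared (c1, c2).
popt, pcov = curve_fit(model, xz_all, y_all, p0=[1.0, 1.0])
print(popt)
```

Because every data point carries its own z, the data sets do not need to share the same x values or the same number of points.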
Your code is incomplete and has a few syntax errors.
But I think that you want to build a model that concatenates the models for the different data sets, and then fit the concatenated data to that model. Within the context of lmfit (disclosure: author and maintainer), I often find it easier to use minimize() and an objective function for multiple data set fits rather than the Model class. Perhaps start with something like this:
import lmfit
import numpy as np
# define the model function for each dataset
def my_function(x, c1, c2, z=1):
    return c1*x**2*np.sin(z) + c2*x**3*np.cos(z)
# Then write an objective function like this
def f2min(params, x, data2d, zlist):
    ndata, npts = data2d.shape
    residual = 0.0*data2d[:]
    for i in range(ndata):
        c1 = params['c1_%d' % (i+1)].value
        c2 = params['c2_%d' % (i+1)].value
        residual[i,:] = data2d[i,:] - my_function(x, c1, c2, z=zlist[i])
    return residual.flatten()
# now build that `data2d`, `zlist` and build the `Parameters`
data2d = []
zlist = []
x = None
for fname in dataset_names:
    d = np.loadtxt(fname) # or however you read / generate data
    if x is None: x = d[:, 0]
    data2d.append(d[:, 1])
    zlist.append(z_for_dataset(fname)) # or however ...
data2d = np.array(data2d) # turn list into nd array
ndata, npts = data2d.shape
params = lmfit.Parameters()
for i in range(ndata):
    params.add('c1_%d' % (i+1), value=1.0) # give a better starting value!
    params.add('c2_%d' % (i+1), value=1.0) # give a better starting value!
# tie each data set's c1/c2 to the first so all z values share the same C1 and C2
for i in range(1, ndata):
    params['c1_%d' % (i+1)].expr = 'c1_1'
    params['c2_%d' % (i+1)].expr = 'c2_1'
# now you're ready to do the fit and print out the results:
result = lmfit.minimize(f2min, params, args=(x, data2d, zlist))
print(lmfit.fit_report(result))
That code is really a sketch and is all untested, but hopefully it will give you a good starting foundation.

Python, scipy - How to fit a curve using a piecewise function with a conditional parameter that also needs to be calculated?

As the title suggests, I'm trying to fit a piecewise equation to a large data set. The equations I would like to fit to my data are as follows:
y(x) = b, when x < c
else:
y(x) = b + exp(a(x-c)) - 1, when x >= c
There are multiple answers to how such an issue can be addressed, but as a Python beginner I can't figure out how to apply them to my problem:
Curve fit with a piecewise function?
Conditional curve fit with scipy?
The problem is that all variables (a,b and c) have to be calculated by the fitting algorithm.
Thank you for your help!
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Reduced Dataset
y = np.array([23.032, 21.765, 20.525, 21.856, 21.592, 20.754, 20.345, 20.534,
23.502, 21.725, 20.126, 21.381, 20.217, 21.553, 21.176, 20.976,
20.723, 20.401, 22.898, 22.02 , 21.09 , 22.543, 22.584, 22.799,
20.623, 20.529, 20.921, 22.505, 22.793, 20.845, 20.584, 22.026,
20.621, 23.316, 22.748, 20.253, 21.218, 23.422, 23.79 , 21.371,
24.318, 22.484, 24.775, 23.773, 25.623, 23.204, 25.729, 26.861,
27.268, 27.436, 29.471, 31.836, 34.034, 34.057, 35.674, 41.512,
48.249])
x = np.array([3756., 3759., 3762., 3765., 3768., 3771., 3774., 3777., 3780.,
3783., 3786., 3789., 3792., 3795., 3798., 3801., 3804., 3807.,
3810., 3813., 3816., 3819., 3822., 3825., 3828., 3831., 3834.,
3837., 3840., 3843., 3846., 3849., 3852., 3855., 3858., 3861.,
3864., 3867., 3870., 3873., 3876., 3879., 3882., 3885., 3888.,
3891., 3894., 3897., 3900., 3903., 3906., 3909., 3912., 3915.,
3918., 3921., 3924.])
# Simple exponential function without conditions (works so far)
def exponential_fit(x,a,b,c):
    return b + np.exp(a*(x-c))
popt, pcov = curve_fit(exponential_fit, x, y, p0 = [0.1, 20,3800])
plt.plot(x, y, 'bo')
plt.plot(x, exponential_fit(x, *popt), 'r-')
plt.show()
You should change your function to something like
def exponential_fit(x, a, b, c):
    if x >= c:
        return b + np.exp(a*(x-c))-1
    else:
        return b
Edit: As chaosink pointed out in the comments, this approach does not work because the above function assumes that x is a scalar, whereas curve_fit evaluates the function for array-like x. Consequently, one should use vectorised operations instead. To do so, one can either use
def exponential_fit(x, a, b, c):
    return np.where(x >= c, b + np.exp(a*(x-c))-1, b)
or chaosink's suggestion in the comments:
def exponential_fit(x, a, b, c):
    mask = (x >= c)
    return mask * (b + np.exp(a*(x-c)) - 1) + ~mask * b
Both give the same fitted curve.
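To make the scalar versus array point concrete, here is a small sketch that evaluates the three variants on an array. The function names and the parameter values are made up for illustration; they are not fitted values.

```python
import numpy as np

def piecewise_where(x, a, b, c):
    # np.where evaluates both branches, then selects element-wise.
    return np.where(x >= c, b + np.exp(a * (x - c)) - 1, b)

def piecewise_mask(x, a, b, c):
    # Boolean mask used as 0/1 weights for the two branches.
    mask = x >= c
    return mask * (b + np.exp(a * (x - c)) - 1) + ~mask * b

def piecewise_scalar(x, a, b, c):
    # Scalar-only version: `if` on a multi-element array is ambiguous.
    if x >= c:
        return b + np.exp(a * (x - c)) - 1
    return b

x = np.array([3800.0, 3850.0, 3900.0])
a, b, c = 0.03, 21.0, 3850.0  # hypothetical parameter values

agree = np.allclose(piecewise_where(x, a, b, c), piecewise_mask(x, a, b, c))
try:
    piecewise_scalar(x, a, b, c)
    scalar_fails = False
except ValueError:
    scalar_fails = True

print(agree, scalar_fails)
```

Either vectorised version can be passed to curve_fit in place of the unconditional exponential_fit, with a starting guess p0 for a, b and c.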

Trapezoidal approximation error plotting in python

I'm trying to code a function that plots the error of the composite trapezoidal rule against the step size.
Obviously this doesn't look too good since I'm just starting to learn these things.
Anyhow, I managed to get the plot and everything, but I'm supposed to get a plot with slope 2, so I need help figuring out where I went wrong.
from scipy import *
from pylab import *
from matplotlib import *
def f(x): #define function to integrate
    return exp(x)
a=int(input("Integrate from? ")) #input for a
b=int(input("to? ")) #input for b
n=1
def ctrapezoidal(f,a,b,n): #define the function ctrapezoidal
    h=(b-a)/n #assign h
    s=0 #clear sum1-value
    for i in range(n): #create the sum of the function
        s+=f(a+((i)/n)*(b-a)) #iterate the sum
    I=(h/2)*(f(a)+f(b))+h*s #the function + the sum
    return (I, h) #returns the approximation of the integral
val=[] #start list
stepsize=[]
error=[]
while len(val)<=2 or abs(val[-1]-val[-2])>1e-2:
    I, h=ctrapezoidal(f,a,b,n)
    val.append(I)
    stepsize.append(h)
    n+=1
for i in range(len(val)):
    error.append(abs(val[i]-(e**b-e**a)))
error=np.array(error)
stepsize=np.array(stepsize)
plt.loglog(stepsize, error, basex=10, basey=10)
plt.grid(True,which="both",ls="steps")
plt.ylabel('error')
plt.xlabel('h')
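One likely reason for the wrong slope is the summation range: the loop over range(n) includes i = 0, so f(a) is added with weight h on top of the h/2 it already gets from the endpoint term, which leaves an error proportional to h rather than h^2. A minimal sketch of the rule with the sum restricted to interior nodes follows; the name `composite_trapezoid` and the interval [0, 1] are assumptions for this example.

```python
import math

def composite_trapezoid(f, a, b, n):
    """Composite trapezoidal rule on n equal subintervals."""
    h = (b - a) / n
    # Interior nodes only: i = 1 .. n-1. The endpoints get weight h/2 below.
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior), h

exact = math.exp(1.0) - math.exp(0.0)  # integral of exp on [0, 1]
errors = []
for n in (10, 20, 40, 80):
    approx, h = composite_trapezoid(math.exp, 0.0, 1.0, n)
    errors.append(abs(approx - exact))

# Halving h should divide the error by about 4 for a second-order method,
# so log2 of each successive error ratio estimates the slope on a log-log plot.
orders = [math.log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)]
print(orders)
```

Doubling n at each step, instead of incrementing it by one, also spreads the points evenly along the log-scaled h axis.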
