Failing a simple Cosine fit in Python - python-3.x

Here's how I generate my data and attempt the fit:
import matplotlib.pyplot as plt
from scipy import optimize
import numpy as np
def f(t,a,b):
    return a*np.cos(b*t)
v = 0
x = 0.03
t = 0
dt = 0.001
time = []
pos = []
while t<3:
    a = (-5*x)/0.1
    v = v + a*dt
    x = x + v*dt
    time.append(t)
    pos.append(x)
    t = t+dt
pop, pcov = optimize.curve_fit(f,time,pos)
print(pop)
Even when I indicate initial values for the parameters (such as 0.03 for "a" and "7" for b), the resulting fit is still way off (see below, dashed line is the fit function).
Am I using the wrong library, or have I made an obvious blunder?
Thanks for any hints.

As Tyberius noted, you need to provide better initial values.
Why is that? optimize.curve_fit performs a nonlinear least-squares fit (by default with the Levenberg-Marquardt algorithm), which only finds a local minimum of the cost function.
I believe in your case you are stuck in such a local minimum (one that is not the global minimum). If you look at your diagram, your fit is approximately y=0. (It is a bit wavy because it is a cosine.)
If you were to increase a slightly, the error would go up, so a stays close to zero. And if you were to increase b to match the frequency of the data, the cost function would go up as well, so b stays low too.
If you don't provide initial values, the parameters start at 1 each so it looks like this:
plt.plot(time, pos, 'black', label="data")
a,b = 1,1
init = [a*np.cos(b*t) for t in time]
plt.plot(time, init, 'b', label="a,b=1,1")
plt.legend()
plt.show()
From this starting point, a shrinks toward zero while b barely moves. I believe the scale is an additional problem. If you normalized your data to have an amplitude of 1, the humps might be more pronounced and easier to fit.
If you start with a convenient value for a, b can find its way from an initial value as low as 5:
plt.plot(time, pos, 'black', label="data")
for i in [1, 4.8, 4.9, 5]:
    pop, pcov = optimize.curve_fit(f,time,pos, p0=(0.035,i))
    a,b = pop
    fit = [a*np.cos(b*t) for t in time]
    plt.plot(time, fit, label=f"$b_0 = {i}$")
plt.legend()
plt.show()
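A starting value for b can also be estimated from the data instead of being guessed. The following is a minimal sketch that is not part of the original answer: it assumes evenly spaced samples and takes the angular frequency from the largest peak of the FFT. The names a0 and b0 are illustrative, and the data is regenerated the same way as in the question.

```python
import numpy as np
from scipy import optimize

def f(t, a, b):
    return a * np.cos(b * t)

# Regenerate the data from the question (oscillator with omega^2 = 5/0.1).
dt = 0.001
x, v, t = 0.03, 0.0, 0.0
time, pos = [], []
while t < 3:
    acc = (-5 * x) / 0.1
    v = v + acc * dt
    x = x + v * dt
    time.append(t)
    pos.append(x)
    t = t + dt
time = np.array(time)
pos = np.array(pos)

# Amplitude guess: half the peak-to-peak range of the data.
a0 = (pos.max() - pos.min()) / 2

# Frequency guess: strongest non-zero FFT bin, converted to angular frequency.
spectrum = np.abs(np.fft.rfft(pos - pos.mean()))
freqs = np.fft.rfftfreq(len(pos), d=dt)
b0 = 2 * np.pi * freqs[np.argmax(spectrum[1:]) + 1]

popt, pcov = optimize.curve_fit(f, time, pos, p0=(a0, b0))
print("initial guess:", a0, b0)
print("fitted a, b:", popt)
```

With only about three seconds of data the FFT bins are coarse, so b0 is only a rough estimate; it just needs to land close enough for the local optimizer to reach the right minimum.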

Related

Python matplotlib fails to draw the Acnode (isolated point) on the Elliptic Curve y^2+x^3+x^2=0

I'm using the below code to draw the ECC curve y^2+x^3+x^2 =0
import numpy as np
import matplotlib.pyplot as plt
import math
def main():
    fig = plt.figure()
    ax = fig.add_subplot(111)
    y, x = np.ogrid[-2:2:1000j, -2:2:1000j]
    ax.contour(x.ravel(), y.ravel(), pow(y, 2) + pow(x, 3) + pow(x, 2) , [0],colors='red')
    ax.grid()
    plt.show()
if __name__ == '__main__':
    main()
The output is
The expected image, however, is this
As we can see, the isolated point at (0,0) is not drawn. Any suggestions to solve this issue?
As already mentioned in the comment, it seems that a single point is not displayed as a contour. The best solution would be if the application indicates such points in some way by itself. Perhaps the library allows this, but I have not found a way and therefore show two workarounds here:
Option 1:
The isolated point at (0,0) could be marked explicitly:
ax.plot(0, 0, color="red", marker = "o", markersize = 2.5, zorder = 10)
If there are multiple isolated points, a masked array is a good choice here.
Option 2:
The plot can be slightly varied around z = 0, e.g. z = 0.0002:
z = pow(y,2) + pow(x, 2) + pow(x, 3)
ax.contour(x.ravel(), y.ravel(), z, [0.0002], colors='red', zorder=10)
This will move the whole plot. Alternatively, the area around the isolated point alone could be shifted (by adding a second contour call with a small x,y grid around the isolated point at (0,0)). This does not change the rest.
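Option 1 can be automated when the isolated points are not known in advance. The sketch below is an assumption-laden illustration, not something the library provides: it assumes the isolated point lies exactly on a grid node (hence the odd point count of 1001) and flags nodes where z is numerically zero while all four direct neighbours are positive. The tolerance and the names iso_x and iso_y are illustrative.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# An odd number of points puts (0, 0) exactly on the grid.
y, x = np.ogrid[-2:2:1001j, -2:2:1001j]
z = y**2 + x**3 + x**2

fig, ax = plt.subplots()
ax.contour(x.ravel(), y.ravel(), z, [0], colors="red")

# Interior nodes where z is ~0 but every direct neighbour is positive:
# the zero level touches such a node without crossing it, so contour() skips it.
interior = z[1:-1, 1:-1]
neighbours_positive = (
    (z[:-2, 1:-1] > 0) & (z[2:, 1:-1] > 0)
    & (z[1:-1, :-2] > 0) & (z[1:-1, 2:] > 0)
)
rows, cols = np.nonzero(np.isclose(interior, 0.0, atol=1e-12) & neighbours_positive)
iso_x = x.ravel()[cols + 1]
iso_y = y.ravel()[rows + 1]

# Mark the detected isolated points explicitly, as in Option 1.
ax.plot(iso_x, iso_y, "o", color="red", markersize=2.5, zorder=10)
ax.grid()
print(list(zip(iso_x, iso_y)))
```

Call fig.savefig(...) or switch to an interactive backend to see the result. Points on the ordinary branch of the curve are not flagged, because at least one of their neighbours has a negative z.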

Interpolating using a cubic function gives a negative value for probability

I have a set of data which correspond to ages (in steps of 0.1) along the x axis, and probabilities along the y axis. I'm trying to interpolate the data so I can find the maximum and a range of ages which covers 95% of the probability.
I've tried a simple interpolation using the code below, taken from the SciPy help pages, and it produces good results (I change the x and y variables to read my data), except for one feature.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
x = np.linspace(72, 100, num=29, endpoint=True)
y = df.iloc[:,0].values  # probabilities read from my DataFrame
f = interp1d(x, y)
f2 = interp1d(x, y, kind='cubic')
# evaluate inside the range of x; interp1d raises a ValueError outside it
xnew = np.linspace(72, 100, num=281, endpoint=True)
plt.plot(x, y, 'o', xnew, f(xnew), '-', xnew, f2(xnew), '--')
plt.legend(['data', 'linear', 'cubic'], loc='best')
plt.show()
The problem is, the cubic function works best, with the smoothest fit. However, it gives negative values for some parts of the probability curve, which is obviously not acceptable. Is there some way of setting a floor at y=0? I thought maybe switching to a quadratic kind would fix it, but it doesn't seem to. The linear fit does, but it's not smoothed, so is not a very good match.
I'm also not sure how to perform the second part of what I'm trying to do. It's probably very simple, but I don't know how to find the mean when I don't have a frequency table, but a grid of interpolated points which form a function. If I knew the function, I could integrate it, but I'm not sure how to do that in Python.
EDIT to include some data:
This is what my y data looks like:
array([3.41528917e-08, 7.81041275e-05, 9.60711716e-04, 5.75868934e-05,
6.50260297e-05, 2.95556411e-05, 2.37331370e-05, 9.11990619e-05,
1.08003254e-04, 4.16800419e-05, 6.63673113e-05, 2.57934035e-04,
3.42235937e-03, 5.07534495e-03, 1.76603165e-02, 1.69535370e-01,
2.67624254e-01, 4.29420872e-01, 8.25165926e-02, 2.08367339e-02,
2.01227453e-03, 1.15405995e-04, 5.40163098e-07, 1.66905537e-10,
8.31862858e-18, 4.14093219e-23, 8.32103362e-29, 5.65637769e-34,
7.93547444e-40])
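One possible approach, offered here as a hedged sketch rather than an answer from the original thread, is a shape-preserving interpolant such as scipy.interpolate.PchipInterpolator. It is monotone between neighbouring data points, so it cannot dip below zero when the data is non-negative. The sketch assumes the x grid from the code above (72 to 100 in 29 steps) and integrates with a simple trapezoid rule; the names p, cdf, mode, lo and hi are illustrative.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

x = np.linspace(72, 100, num=29)
y = np.array([3.41528917e-08, 7.81041275e-05, 9.60711716e-04, 5.75868934e-05,
              6.50260297e-05, 2.95556411e-05, 2.37331370e-05, 9.11990619e-05,
              1.08003254e-04, 4.16800419e-05, 6.63673113e-05, 2.57934035e-04,
              3.42235937e-03, 5.07534495e-03, 1.76603165e-02, 1.69535370e-01,
              2.67624254e-01, 4.29420872e-01, 8.25165926e-02, 2.08367339e-02,
              2.01227453e-03, 1.15405995e-04, 5.40163098e-07, 1.66905537e-10,
              8.31862858e-18, 4.14093219e-23, 8.32103362e-29, 5.65637769e-34,
              7.93547444e-40])

# Shape-preserving cubic: stays between neighbouring data values on each interval.
pchip = PchipInterpolator(x, y)
xs = np.linspace(72, 100, 2801)
p = pchip(xs)

# Normalise to unit area and build a cumulative distribution (trapezoid rule).
steps = 0.5 * (p[1:] + p[:-1]) * np.diff(xs)
p = p / steps.sum()
cdf = np.concatenate(([0.0], np.cumsum(steps / steps.sum())))

mode = xs[np.argmax(p)]                       # location of the maximum
lo, hi = np.interp([0.025, 0.975], cdf, xs)   # central 95% range
print("mode:", mode, "95% range:", lo, hi)
```

The central interval between the 2.5% and 97.5% quantiles is only one way to define a 95% range; a highest-density interval would be another.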

When plotting the Wigner function of a coherent state using QuTiP strange patterns appear

I noticed something strange today when I plotted the Wigner function of a coherent state using the open-source quantum toolbox QuTiP in Python.
In the plot there are strange patterns around the edge that are not supposed to be there. I believe it's some sort of numerical error, but I don't know how to get rid of them or minimize them or, most important, what's causing them.
Here is the code
# import packages
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib as mpl
from matplotlib import cm
from qutip import *
N = 60 # number of levels in Hilbert space
# density matrix of a coherent state
rho_coherent = coherent_dm(N, 1-1j)
X = np.linspace(-3, 3, 300)
Y = np.linspace(-3, 3, 300)
# Wigner function
W = wigner(rho_coherent, X, Y, 'iterative', 2)
X, Y = np.meshgrid(X, Y)
# Color Normalization
class MidpointNormalize(colors.Normalize):
    def __init__(self, vmin=None, vmax=None, midpoint=None, clip=False):
        self.midpoint = midpoint
        colors.Normalize.__init__(self, vmin, vmax, clip)
    def __call__(self, value, clip=None):
        x, y = [self.vmin, self.midpoint, self.vmax], [0, 0.5, 1]
        return np.ma.masked_array(np.interp(value, x, y))
# contour plot
plt.subplot(111, aspect='equal')
plt.contourf(X, Y, W, 100, cmap = cm.RdBu_r, norm = MidpointNormalize(midpoint=0.))
plt.show()
and here is the plot
The blue spots around the edges, as you can clearly see, are not supposed to be there! Blue indicates that the Wigner function is negative at that point, but a coherent state should have a Wigner function that is positive everywhere!
I also noticed that when I reduce the number of linspace steps from 300 to 100, the blue parts disappear.
I would very much appreciate it if someone could explain what's causing this problem.
This is simply due to truncation. When using a finite number of modes (in your case N=60), the Wigner function will go negative at some point.
Reducing the linspace steps brings the negative regions you see on the plot into the zero value increment and displays these regions as zero. Reducing the linspace steps is probably the best solution to your problem. Your plot will only be as accurate as the errors introduced by truncation, so simply reduce the resolution until those errors disappear.

1-D interpolation using python 3.x

I have data that looks like a sigmoid curve flipped about a vertical line.
But the plot is the result of plotting 1D data rather than some sort of function.
My goal is to find the x value at which the y value is 50%. As you can see, there is no data point where y is exactly 50%.
Interpolation comes to mind, but I'm not sure whether interpolation lets me find the x value at which y is 50%. So my questions are: 1) can you use interpolation to find the x where y is 50%? or 2) do you need to fit the data to some sort of function?
Below is what I currently have in my code
import numpy as np
import matplotlib.pyplot as plt
my_x = [4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66]
my_y_raw=np.array([0.99470977497817203, 0.99434995886145172, 0.98974611323163653, 0.961630837657524, 0.99327633558441175, 0.99338952769251909, 0.99428263292577534, 0.98690514212711611, 0.99111667721533181, 0.99149418924880861, 0.99133773062680464, 0.99143506380003499, 0.99151080464011454, 0.99268261743308517, 0.99289757252812316, 0.99100207861144063, 0.99157171773324027, 0.99112571824824358, 0.99031608691035722, 0.98978104266076905, 0.989782674787969, 0.98897835092187614, 0.98517540405423909, 0.98308943666187076, 0.96081810781994603, 0.85563541881892147, 0.61570811548079107, 0.33076276040577052, 0.14655134838124245, 0.076853147122142126, 0.035831324928136087, 0.021344669212790181])
my_y=my_y_raw/np.max(my_y_raw)
plt.plot(my_x, my_y,color='k', markersize=40)
plt.scatter(my_x,my_y,marker='*',label="myplot", color='k', edgecolor='k', linewidth=1,facecolors='none',s=50)
plt.legend(loc="lower left")
plt.xlim([4,102])
plt.show()
Using SciPy
The most straightforward way to do the interpolation is to use the SciPy interpolate.interp1d function. SciPy is closely related to NumPy and you may already have it installed. The advantage of interp1d is that it can sort the data for you, at the cost of somewhat funky syntax. Most interpolation functions assume that you are interpolating a y value from an x value, and they generally need the "x" values to be monotonically increasing. In your case, we swap the usual roles of x and y. The y values have an outlier, as @Abhishek Mishra has pointed out. With your data you are lucky and can get away with leaving the outlier in.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
my_x = [4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,
48,50,52,54,56,58,60,62,64,66]
my_y_raw=np.array([0.99470977497817203, 0.99434995886145172,
0.98974611323163653, 0.961630837657524, 0.99327633558441175,
0.99338952769251909, 0.99428263292577534, 0.98690514212711611,
0.99111667721533181, 0.99149418924880861, 0.99133773062680464,
0.99143506380003499, 0.99151080464011454, 0.99268261743308517,
0.99289757252812316, 0.99100207861144063, 0.99157171773324027,
0.99112571824824358, 0.99031608691035722, 0.98978104266076905,
0.989782674787969, 0.98897835092187614, 0.98517540405423909,
0.98308943666187076, 0.96081810781994603, 0.85563541881892147,
0.61570811548079107, 0.33076276040577052, 0.14655134838124245,
0.076853147122142126, 0.035831324928136087, 0.021344669212790181])
# assume_sorted=False tells SciPy to sort the first argument (here the y values) for you
f = interp1d(my_y_raw, my_x, assume_sorted = False)
xnew = f(0.5)
print('interpolated value is ', xnew)
plt.plot(my_x, my_y_raw,'x-', markersize=10)
plt.plot(xnew, 0.5, 'x', color = 'r', markersize=20)
plt.plot((0, xnew), (0.5,0.5), ':')
plt.grid(True)
plt.show()
which gives
interpolated value is 56.81214249272691
Using NumPy
NumPy also has an interp function, but it doesn't do the sort for you. And if you don't sort, you'll be sorry:
Does not check that the x-coordinate sequence xp is increasing. If xp
is not increasing, the results are nonsense.
The only way I could get np.interp to work was to shove the data into a structured array.
import numpy as np
import matplotlib.pyplot as plt
my_x = np.array([4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,
48,50,52,54,56,58,60,62,64,66], dtype = float)
my_y_raw=np.array([0.99470977497817203, 0.99434995886145172,
0.98974611323163653, 0.961630837657524, 0.99327633558441175,
0.99338952769251909, 0.99428263292577534, 0.98690514212711611,
0.99111667721533181, 0.99149418924880861, 0.99133773062680464,
0.99143506380003499, 0.99151080464011454, 0.99268261743308517,
0.99289757252812316, 0.99100207861144063, 0.99157171773324027,
0.99112571824824358, 0.99031608691035722, 0.98978104266076905,
0.989782674787969, 0.98897835092187614, 0.98517540405423909,
0.98308943666187076, 0.96081810781994603, 0.85563541881892147,
0.61570811548079107, 0.33076276040577052, 0.14655134838124245,
0.076853147122142126, 0.035831324928136087, 0.021344669212790181],
dtype = float)
# np.float was removed from NumPy; the builtin float is equivalent
dt = np.dtype([('x', float), ('y', float)])
data = np.zeros( (len(my_x)), dtype = dt)
data['x'] = my_x
data['y'] = my_y_raw
data.sort(order = 'y') # sort data in place by y values
print('numpy interp gives ', np.interp(0.5, data['y'], data['x']))
which gives
numpy interp gives 56.81214249272691
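The structured array is not strictly required. A shorter variant, shown here as a sketch, sorts both arrays with a single np.argsort index; the names order and x_at_half are illustrative.

```python
import numpy as np

my_x = np.arange(4, 68, 2, dtype=float)  # 4, 6, ..., 66
my_y_raw = np.array([0.99470977497817203, 0.99434995886145172,
    0.98974611323163653, 0.961630837657524, 0.99327633558441175,
    0.99338952769251909, 0.99428263292577534, 0.98690514212711611,
    0.99111667721533181, 0.99149418924880861, 0.99133773062680464,
    0.99143506380003499, 0.99151080464011454, 0.99268261743308517,
    0.99289757252812316, 0.99100207861144063, 0.99157171773324027,
    0.99112571824824358, 0.99031608691035722, 0.98978104266076905,
    0.989782674787969, 0.98897835092187614, 0.98517540405423909,
    0.98308943666187076, 0.96081810781994603, 0.85563541881892147,
    0.61570811548079107, 0.33076276040577052, 0.14655134838124245,
    0.076853147122142126, 0.035831324928136087, 0.021344669212790181])

# Indices that sort the y values ascending; apply them to both arrays
# so that each x stays paired with its y.
order = np.argsort(my_y_raw)
x_at_half = np.interp(0.5, my_y_raw[order], my_x[order])
print('numpy interp gives ', x_at_half)
```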
As you said, your data looks like a flipped sigmoidal. Can we make the assumption that your function is a strictly decreasing function? If that is the case, we can try the following methods:
Remove all the points where the data is not strictly decreasing. For example, for your data that point will be near 0.
Use the binary search to find the location where y=0.5 should be put in.
Now you know two (x, y) pairs where your desired y=0.5 should lie.
You can use simple linear interpolation if (x, y) pairs are very close.
Otherwise, you can see what is the approximation of sigmoid near those pairs.
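The steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses the tail of the posted data from x = 46 onward, which is already strictly decreasing, and it relies on np.searchsorted for the binary search (applied to the negated values, because searchsorted expects ascending input). The names target, i and x_star are illustrative.

```python
import numpy as np

# Strictly decreasing tail of the posted data (x = 46 ... 66).
x = np.array([46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66], dtype=float)
y = np.array([0.98897835092187614, 0.98517540405423909, 0.98308943666187076,
              0.96081810781994603, 0.85563541881892147, 0.61570811548079107,
              0.33076276040577052, 0.14655134838124245, 0.076853147122142126,
              0.035831324928136087, 0.021344669212790181])
assert np.all(np.diff(y) < 0), "data must be strictly decreasing"

target = 0.5
# Binary search: first index whose y value is at or below the target.
i = np.searchsorted(-y, -target)

# The target lies between the bracketing pairs (x[i-1], y[i-1]) and (x[i], y[i]).
x0, x1 = x[i - 1], x[i]
y0, y1 = y[i - 1], y[i]

# Linear interpolation between the two bracketing points.
x_star = x0 + (y0 - target) * (x1 - x0) / (y0 - y1)
print(x_star)
```

The sketch assumes the target lies strictly inside the range of y; otherwise i would be 0 or len(y) and the bracketing step would need a guard.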
You might not need to fit any functions to your data. Simply find the following two elements:
The smallest x for which y<50%
The largest x for which y>50%
Then interpolate between these two points to find x*. Below is the code:
my_x = np.array([4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66])
my_y=np.array([0.99470977497817203, 0.99434995886145172, 0.98974611323163653, 0.961630837657524, 0.99327633558441175, 0.99338952769251909, 0.99428263292577534, 0.98690514212711611, 0.99111667721533181, 0.99149418924880861, 0.99133773062680464, 0.99143506380003499, 0.99151080464011454, 0.99268261743308517, 0.99289757252812316, 0.99100207861144063, 0.99157171773324027, 0.99112571824824358, 0.99031608691035722, 0.98978104266076905, 0.989782674787969, 0.98897835092187614, 0.98517540405423909, 0.98308943666187076, 0.96081810781994603, 0.85563541881892147, 0.61570811548079107, 0.33076276040577052, 0.14655134838124245, 0.076853147122142126, 0.035831324928136087, 0.021344669212790181])
tempInd1 = my_y<.5 # This will only work if the values are monotonic
x1 = my_x[tempInd1][0]
y1 = my_y[tempInd1][0]
x2 = my_x[~tempInd1][-1]
y2 = my_y[~tempInd1][-1]
np.interp(0.5, [y1, y2], [x1, x2])  # np.interp replaces the deprecated scipy.interp alias

Fitting distribution functions to dataset in Python 3

I'm trying to find the probability distribution that best fits my data. I've tried code I found in different threads, but the results are not what I'm expecting.
The descriptive statistics and histogram for my data are as follows:
Data Histogram
count 865.000000
mean 43.476713
std 12.486362
min 4.075682
25% 34.934609
50% 41.917304
75% 51.271708
max 88.843940
I tried to find a proper distribution function using the following code, but the results are not what I expected.
size = 865
kappa=99
x = scipy.arange(size)
y = scipy.int_(scipy.round_(st.vonmises.rvs(kappa,size=size)*100))
h = plt.hist(df['spreadMaizChicagoAtlantico'],bins=100,color='b')
dist_names = ['gamma', 'beta', 'rayleigh', 'norm', 'pareto']
for dist_name in dist_names:
    dist = getattr(scipy.stats, dist_name)
    param = dist.fit(y)
    pdf_fitted = dist.pdf(x, *param[:-2], loc=param[-2], scale=param[-1]) * size
    plt.plot(pdf_fitted, label=dist_name)
plt.xlim(0,100)
plt.legend(loc='upper right')
plt.show()
Data histogram with functions
Can anyone please tell me what I'm doing wrong and help me understand this solution better?
Thanks to the reply from before I found my mistake.
I got all the values from the DataFrame and made a numpy array.
ser=df.values
Then I ran code similar to before, this time fitting the distributions to the actual data:
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
size = 867
x = np.arange(size)
h = plt.hist(ser, bins=range(80))
dist_names = ['beta', 'rayleigh', 'norm']
for dist_name in dist_names:
    dist = getattr(scipy.stats, dist_name)
    param = dist.fit(ser)  # fit to the real data, not to random samples
    # scale the density by the sample size so it overlays the count histogram (bin width 1)
    pdf_fitted = dist.pdf(x, *param[:-2], loc=param[-2], scale=param[-1]) * size
    plt.plot(pdf_fitted, label=dist_name)
plt.xlim(0,100)
plt.legend(loc='upper right')
plt.show()
The result is as follows, showing the histogram and three probability density functions.
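Judging the fits by eye can be complemented with a goodness-of-fit number. The sketch below is not from the original answer and runs on synthetic data, since the real series is not available here: the gamma-distributed sample is an assumption chosen only to resemble the posted summary statistics. It fits each candidate with scipy.stats and compares the Kolmogorov-Smirnov statistic, where a smaller value means a closer fit.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the real series (assumption: roughly mean 43, std 12.5).
rng = np.random.default_rng(0)
data = rng.gamma(shape=12.0, scale=3.6, size=865)

results = {}
for name in ["norm", "gamma", "rayleigh"]:
    dist = getattr(stats, name)
    params = dist.fit(data)  # maximum-likelihood fit: shape(s), loc, scale
    # KS statistic: largest gap between the empirical and the fitted CDF.
    ks_stat, p_value = stats.kstest(data, name, args=params)
    results[name] = ks_stat

best = min(results, key=results.get)
for name, ks_stat in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} KS = {ks_stat:.4f}")
print("closest fit:", best)
```

Because the parameters are estimated from the same data, the p-values returned by kstest are optimistic; the statistic is used here only to rank the candidates.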
The distfit library can do this job as it searches for the best fit among 89 theoretical distributions.
pip install distfit
import numpy as np
from distfit import distfit
# Example data
X = np.random.normal(10, 3, 2000)
# Initialize
dfit = distfit()
# Search for best theoretical fit on your empirical data
dfit.fit_transform(X)
# The plot function will now also include the predictions of y
dfit.plot(chart='PDF',
emp_properties={'linewidth': 4, 'color': 'k'},
bar_properties={'edgecolor':'k', 'color':'g'},
pdf_properties={'linewidth': 4, 'color': 'r'})
