Problems with numpy polyfit - python-3.x

For some reason, my polyfit curve is way off, and I cannot figure out why. My scatter plot looks normal.
[Image: scatter plot]
[Image: polyfit plot]
How can I fix this? Here is my code:
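import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

def plot(data, x_axis, y_axis, title):
    x = data[0]
    y = data[1]
    ## Plot data
    plt.figure(figsize=(8, 4))
    plt.scatter(x, y)
    # Keep only pairs where both x and y are finite
    idx = np.isfinite(x) & np.isfinite(y)
    # Problem line: this passes the np.poly1d object itself to plt.plot, which treats it
    # as the array of its 4 coefficients and plots them against 0..3 (see the EDIT below)
    plt.plot(np.poly1d(np.polyfit(x[idx], y[idx], 3)))
    ## Format graph
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
    plt.gca().xaxis.set_major_locator(mdates.YearLocator(3))
    plt.gcf().autofmt_xdate()
    ## Define labels
    plt.xlabel(x_axis)
    plt.ylabel(y_axis)
    plt.title(title)
    ## Graph data
    plt.show()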
I can link my data if needed; there is too much of it to post here.
Inspecting the filtered arrays with
print(x[idx])
print(y[idx])
shows the correct values; nothing seems off.
[Image: plot of x[idx] and y[idx]]
EDIT:
I have figured out the solution. I was not using polyfit correctly: I was plotting the np.poly1d object itself instead of evaluating it at x.
idx = np.isfinite(x) & np.isfinite(y)
avgTrend = np.poly1d(np.polyfit(x[idx], y[idx], 3))
plt.plot(x, avgTrend(x), color='red')
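The fix in the EDIT (call the poly1d object on x to get y values) can be wrapped in a small helper. This is a minimal sketch, assuming x and y are plain numeric arrays (for example Matplotlib date numbers rather than datetime objects); the function name plot_with_trend and the synthetic data are hypothetical, not the asker's.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_with_trend(x, y, deg=3):
    """Scatter (x, y) and overlay a least-squares polynomial of degree `deg`."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Keep only pairs where both coordinates are finite (polyfit cannot handle NaN)
    idx = np.isfinite(x) & np.isfinite(y)
    # np.poly1d wraps the fitted coefficients in a callable polynomial object
    trend = np.poly1d(np.polyfit(x[idx], y[idx], deg))
    # Sort x so the trend line is drawn left to right without zig-zags
    xs = np.sort(x[idx])
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.scatter(x, y, s=10)
    # Key step: plot trend(xs), the polynomial's values, not `trend` itself
    ax.plot(xs, trend(xs), color="red")
    return fig, ax, trend


if __name__ == "__main__":
    # Hypothetical example data: a noisy cubic with one NaN
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 200)
    y = 0.5 * x**3 - 4 * x**2 + x + rng.normal(0, 5, x.size)
    y[10] = np.nan
    plot_with_trend(x, y)
    plt.show()
```

Sorting matters when the data are not already ordered by x; otherwise the red line jumps back and forth. If x holds large values such as date numbers, numpy.polynomial.Polynomial.fit, which maps x onto a [-1, 1] window internally, may be a better-conditioned alternative.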

The problem seems to be the degree of the polynomial. With this many data points, it may simply be impossible to fit a good degree-3 polynomial. You could try a higher degree (though that is unlikely to behave the way you want), or you could try a spline.
For example, you could try the csaps package, which implements smoothing splines and which I can recommend.
Hope this helps
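As an illustration of the spline route, here is a minimal sketch using SciPy's UnivariateSpline, swapped in for csaps because it ships with SciPy; it is not the csaps API. The helper name smoothing_spline is hypothetical, and noise_std is an assumed estimate of the noise in y: setting the weights to 1/noise_std and leaving s at its default follows the SciPy documentation's guidance for choosing the smoothing factor.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def smoothing_spline(x, y, noise_std):
    """Fit a cubic smoothing spline to scattered (x, y) data.

    noise_std is an assumed estimate of the standard deviation of y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.isfinite(x) & np.isfinite(y)
    order = np.argsort(x[idx])              # UnivariateSpline needs increasing x
    xs, ys = x[idx][order], y[idx][order]
    # Weights of 1/sigma with the default s (= number of points) let the spline
    # deviate from each point by roughly one noise standard deviation
    w = np.full(xs.size, 1.0 / noise_std)
    return UnivariateSpline(xs, ys, w=w, k=3)


if __name__ == "__main__":
    # Hypothetical example: noisy sine samples
    rng = np.random.default_rng(1)
    x = rng.uniform(0, 10, 300)
    y = np.sin(x) + rng.normal(0, 0.1, x.size)
    spl = smoothing_spline(x, y, noise_std=0.1)
    print(spl(np.linspace(0, 10, 5)))
```

The returned object is callable like a poly1d, so it plugs into the same plt.plot(xs, spl(xs)) pattern.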

Related

How to plot data like points and draw line of linear fit in python?

I have data plotted on a graph. How do I draw a linear fit line on it?
I would be grateful if you could suggest a solution. Thank you!
You can use NumPy's built-in polyfit function to get a least-squares linear fit for your data.
However, I don't know how to keep the aspect ratio of the inner grid at 1:1.
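import numpy as np
import matplotlib.pyplot as plt

# x, y: your data as NumPy arrays
# np.polyfit returns coefficients highest power first: [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
y_line = intercept + slope * x
fig, ax = plt.subplots(figsize=(8, 8))
ax.scatter(x, y)
ax.plot(x, y_line, 'r')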
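For the open 1:1 question, one option is Axes.set_aspect("equal"). This is a minimal sketch, assuming "1:1" means one x unit and one y unit get the same on-screen length; the helper name scatter_with_linear_fit and the example data are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def scatter_with_linear_fit(x, y):
    """Scatter (x, y), overlay the least-squares line, and use a 1:1 data aspect."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.polyfit returns coefficients highest power first: [slope, intercept]
    slope, intercept = np.polyfit(x, y, 1)
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.scatter(x, y)
    xs = np.array([x.min(), x.max()])  # two points are enough for a straight line
    ax.plot(xs, intercept + slope * xs, "r")
    # One unit on x gets the same on-screen length as one unit on y
    ax.set_aspect("equal", adjustable="box")
    ax.grid(True)
    return fig, ax, slope, intercept


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 50)  # hypothetical example data
    y = 0.8 * x + 1 + rng.normal(0, 0.5, x.size)
    scatter_with_linear_fit(x, y)
    plt.show()
```

Equal aspect alone makes grid cells square only if the tick spacing is also the same on both axes; matplotlib.ticker.MultipleLocator applied to both axes is one way to enforce that.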

Modify position of colorbar so that extend triangle is above plot

I have to make a series of contourf plots for different days that need to share the same colorbar range. That was easy to set up, but sometimes the maximum value for a given date is above the colorbar range, and that changes the look of the plot in a way I don't want. When that happens, I want the extend triangle to be added above the "original" colorbar, as shown in the attached picture.
I need the code to run automatically: right now I only feed in the data and the colorbar range, and it outputs the images, so fitting the colorbar must also be automatic. I can't add padding by hand, because the figure size changes depending on the area being plotted.
I need this behavior because eventually I want to make a .gif, and the colorbar can't move in that short video. When needed, the triangle should be added at the top (and bottom) without altering the "main" colorbar.
Thanks!
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
from matplotlib import cm
###############
## Finds the appropriate option for variable "extend" in fig colorbar
def find_extend(vmin, vmax, datamin, datamax):
    # extend is one of {'neither', 'both', 'min', 'max'}
    if datamin >= vmin:
        if datamax <= vmax:
            extend = "neither"
        else:
            extend = "max"
    else:
        if datamax <= vmax:
            extend = "min"
        else:
            extend = "both"
    return extend
###########
vmin = 0
vmax = 30
nlevels = 8
# matplotlib.cm.get_cmap was removed in newer Matplotlib releases; plt.get_cmap still works
cmap = plt.get_cmap("rainbow")
### Creating data
z_1 = 30 * abs(np.random.rand(5, 5))
z_2 = 37 * abs(np.random.rand(5, 5))
data = {1: z_1, 2: z_2}
x = range(5)
y = range(5)
## Plot
for day in [1, 2]:
    fig = plt.figure(figsize=(4, 4))
    ## Normally figsize=get_figsize(bounds) and bounds is retrieved from gdf.total_bounds
    ## The function creates the figure size based on the x/y ratio of the bounds
    ax = fig.add_subplot(1, 1, 1)
    norm = BoundaryNorm(np.linspace(vmin, vmax, nlevels + 1), ncolors=cmap.N)
    z = data[day]
    # norm already fixes the color range, so vmin/vmax are not passed again
    cs = ax.contourf(x, y, z, cmap=cmap, norm=norm)
    extend = find_extend(vmin, vmax, np.nanmin(z), np.nanmax(z))
    fig.colorbar(cm.ScalarMappable(norm=norm, cmap=cmap), ax=ax, extend=extend)
    plt.close(fig)
You can do something like this, drawing a triangle on top of the colorbar manually:
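import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

fig, ax = plt.subplots()
pc = ax.pcolormesh(np.random.randn(20, 20))
cb = fig.colorbar(pc)
# Vertices in colorbar-axes coordinates: base along the top edge, apex just above it
trixy = np.array([[0, 1], [1, 1], [0.5, 1.05]])
p = mpatches.Polygon(trixy, transform=cb.ax.transAxes,
                     clip_on=False, edgecolor='k', linewidth=0.7,
                     facecolor='m', zorder=4, snap=True)
cb.ax.add_patch(p)
plt.show()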
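The same idea can be made automatic: always create the colorbar with extend="neither" (so its size never changes between frames) and add the top and/or bottom triangle according to a value such as the one find_extend returns. This is a minimal sketch, assuming a vertical colorbar; the helper name add_extend_triangles and the default height of 5% of the colorbar are hypothetical choices, and the triangles take the colormap's end colors rather than any custom over/under colors.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches


def add_extend_triangles(cb, cmap, extend, height=0.05):
    """Draw extend triangles outside a vertical colorbar without resizing it.

    extend: 'neither', 'min', 'max' or 'both'.
    height: triangle height as a fraction of the colorbar's height (assumed default).
    Returns the list of Polygon patches that were added.
    """
    # Vertices are in colorbar-axes coordinates: (0, 0) is the bottom-left
    # corner of the colorbar and (1, 1) its top-right corner
    shapes = []
    if extend in ("max", "both"):
        shapes.append(([[0, 1], [1, 1], [0.5, 1 + height]], cmap(1.0)))  # top color
    if extend in ("min", "both"):
        shapes.append(([[0, 0], [1, 0], [0.5, -height]], cmap(0.0)))     # bottom color
    added = []
    for verts, color in shapes:
        tri = mpatches.Polygon(np.array(verts), closed=True,
                               transform=cb.ax.transAxes, clip_on=False,
                               facecolor=color, edgecolor="k", linewidth=0.7,
                               zorder=4)
        cb.ax.add_patch(tri)
        added.append(tri)
    return added


if __name__ == "__main__":
    cmap = plt.get_cmap("rainbow")
    fig, ax = plt.subplots()
    pc = ax.pcolormesh(37 * np.random.rand(20, 20), cmap=cmap, vmin=0, vmax=30)
    cb = fig.colorbar(pc, ax=ax)  # extend stays "neither", so the bar never moves
    add_extend_triangles(cb, cmap, "max")
    plt.show()
```

Because the triangles sit outside the colorbar axes (clip_on=False), leave enough figure margin above and below the colorbar, or they may be cut off when the figure is saved.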

How to fit a curve to this data using scipy curve_fit

I am hoping someone can help me see where I'm going wrong in fitting a curve to this data. I am using the method in this link, so I have the following code:
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, L, x0, k, b):
    y = L / (1 + np.exp(-k*(x-x0))) + b
    return y

p0 = [max(y1), np.median(x2), 1, min(y1)]  # initial guesses for L, x0, k, b
popt, pcov = curve_fit(sigmoid, xdata=x2, ydata=y1, p0=p0, method='dogbox')
predictions = sigmoid(x2, *popt)
My plotted "curve" looks like this:
But I am expecting a more S-shaped curve. I have experimented with different p0 values but am not getting the required output (and to be honest, I'm not sure how I'm supposed to find the ideal starting parameters).
Using p0 = [max(y1), np.median(x2), 0.4, 1] and method='trf', I got the following, which is closer but still missing the curve in the middle.
Any help greatly appreciated!
That is because your y-axis uses a log scale. If you change the y-axis to a linear scale, you'll see that the fit is actually quite good.
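To see the effect described in the answer, the sketch below fits the question's sigmoid model to synthetic data and draws the same fit on a linear and a log y-axis side by side. It is a minimal sketch: the data are made up, the helper names fit_sigmoid and plot_fit_linear_and_log are hypothetical, and the starting guess uses max(y) - min(y) for L (rather than max(y)) as one reasonable heuristic, not the only choice.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit


def sigmoid(x, L, x0, k, b):
    # Same model as in the question: height L, midpoint x0, steepness k, offset b
    return L / (1 + np.exp(-k * (x - x0))) + b


def fit_sigmoid(x, y):
    # Starting guesses read off the data: height = range of y, midpoint = median x,
    # unit steepness, offset = smallest y
    p0 = [np.max(y) - np.min(y), np.median(x), 1.0, np.min(y)]
    popt, pcov = curve_fit(sigmoid, x, y, p0=p0, maxfev=10000)
    return popt, pcov


def plot_fit_linear_and_log(x, y, popt):
    """Plot data and fitted curve twice: linear y-axis (left), log y-axis (right)."""
    fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))
    for ax in (ax_lin, ax_log):
        ax.scatter(x, y, s=8)
        ax.plot(x, sigmoid(x, *popt), color="red")  # x must be sorted for a clean line
    ax_lin.set_title("linear y-axis")
    ax_log.set_yscale("log")  # the S-shape looks distorted here
    ax_log.set_title("log y-axis")
    return fig, (ax_lin, ax_log)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(-5, 10, 200)  # hypothetical example data
    y = sigmoid(x, 5.0, 2.0, 1.5, 0.5) + rng.normal(0, 0.05, x.size)
    popt, _ = fit_sigmoid(x, y)
    plot_fit_linear_and_log(x, y, popt)
    plt.show()
```

The fitted parameters are identical in both panels; only the axis scaling changes how the S-shape looks.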

Problem in ploting multiple lines in a graph for precision recall curve

I am trying to plot precision-recall curves for all classes of a multiclass problem in one figure. For this purpose, I used the code below:
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, average_precision_score

def plot_prc(y_test, y_score, N_classes):
    precision = dict()
    recall = dict()
    average_precision = dict()
    for i in range(N_classes):
        precision[i], recall[i], _ = precision_recall_curve(y_test[:, i], y_score[:, i])
        average_precision[i] = average_precision_score(y_test[:, i], y_score[:, i])
    for i in range(N_classes):
        plt.plot(recall[i], precision[i], lw=2,
                 label='class {} (AP = {:.2f})'.format(i, average_precision[i]))
        #plt.plot(recall[i], precision[i], lw=2, label='class {}'.format(i))
        plt.plot([0, 1], [0, 1], 'k--')
        plt.xlabel("recall")
        plt.ylabel("precision")
        plt.legend(loc="best")
        plt.title("precision vs. recall curve")
        plt.show()  # inside the loop, so it runs once per class
But I am getting a separate figure for each class, and I cannot work out what the error in my code is.
I get a single line per figure, like this one
for the class 1 curve,
but I want it to look like this figure,
with multiple lines (one per class) in a single figure.
I would appreciate any help with this problem.
In the second for loop, which iterates over the classes, plt.show() is called inside the loop, so a plot for a single class is shown on every iteration instead of all classes being drawn in the same plot. The solution is to move plt.show() out of the loop.
You are plotting the lines inside the for loop, but your plt.show() call is also inside the for loop, so each figure shows only the line drawn in that iteration.
Put your plt.show() call outside the for loop.
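Putting the answers' fix into code: everything after the per-class plt.plot call moves out of the loop, so all curves land on one Axes and the figure is shown once. This is a minimal sketch, assuming y_test is one-hot encoded with shape (n_samples, n_classes) as in the question; the ax parameter and the synthetic example data are additions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, average_precision_score
from sklearn.preprocessing import label_binarize


def plot_prc(y_test, y_score, n_classes, ax=None):
    """Draw one precision-recall curve per class on a single Axes."""
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 5))
    for i in range(n_classes):
        precision, recall, _ = precision_recall_curve(y_test[:, i], y_score[:, i])
        ap = average_precision_score(y_test[:, i], y_score[:, i])
        ax.plot(recall, precision, lw=2, label=f"class {i} (AP = {ap:.2f})")
    # Everything below runs once, after the loop has drawn every class
    ax.plot([0, 1], [0, 1], "k--")  # dashed reference line kept from the question
    ax.set_xlabel("recall")
    ax.set_ylabel("precision")
    ax.set_title("precision vs. recall curve")
    ax.legend(loc="best")
    return ax


if __name__ == "__main__":
    # Hypothetical 3-class example: random scores nudged towards the true class
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 3, 300)
    y_test = label_binarize(labels, classes=[0, 1, 2])
    y_score = rng.random((300, 3)) + 0.5 * y_test
    plot_prc(y_test, y_score, 3)
    plt.show()  # called once, outside the loop
```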

Interpolating using a cubic function gives a negative value for probability

I have a set of data with ages (in steps of 0.1) along the x-axis and probabilities along the y-axis. I'm trying to interpolate the data so I can find the maximum and a range of ages that covers 95% of the probability.
I've tried a simple interpolation using the code below, taken from the SciPy documentation, and it produces good results (I changed the x and y variables to read my data), except for one feature.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d

x = np.linspace(72, 100, num=29, endpoint=True)
y = df.iloc[:, 0].values  # df holds my probabilities
f = interp1d(x, y)
f2 = interp1d(x, y, kind='cubic')
# Evaluate inside the data range: by default interp1d raises a ValueError outside [72, 100]
xnew = np.linspace(72, 100, num=281, endpoint=True)
plt.plot(x, y, 'o', xnew, f(xnew), '-', xnew, f2(xnew), '--')
plt.legend(['data', 'linear', 'cubic'], loc='best')
plt.show()
The problem is that the cubic interpolation works best and gives the smoothest fit, but it produces negative values in some parts of the probability curve, which is obviously not acceptable. Is there a way to set a floor at y = 0? I thought switching to kind='quadratic' might fix it, but it doesn't seem to. The linear interpolation does stay non-negative, but it isn't smooth, so it's not a very good match.
I'm also not sure how to do the second part of what I'm trying to do. It's probably very simple, but I don't know how to find the mean when, instead of a frequency table, I have a grid of interpolated points that form a function. If I knew the function, I could integrate it, but I'm not sure how to do that in Python.
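Two ways to keep the curve at or above zero are sketched below, using the y values posted in the question's EDIT and assuming x is the 29 ages from the question's code. Clipping keeps the cubic curve but can leave a kink where it touches zero. PchipInterpolator (a shape-preserving piecewise cubic, swapped in here as an alternative to kind='cubic') makes each segment monotone between its two data points, so it cannot dip below the smaller neighboring value; the trade-off is that it is only once continuously differentiable.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator, interp1d

# y values posted in the question's EDIT; x assumed from the question's code
y = np.array([3.41528917e-08, 7.81041275e-05, 9.60711716e-04, 5.75868934e-05,
              6.50260297e-05, 2.95556411e-05, 2.37331370e-05, 9.11990619e-05,
              1.08003254e-04, 4.16800419e-05, 6.63673113e-05, 2.57934035e-04,
              3.42235937e-03, 5.07534495e-03, 1.76603165e-02, 1.69535370e-01,
              2.67624254e-01, 4.29420872e-01, 8.25165926e-02, 2.08367339e-02,
              2.01227453e-03, 1.15405995e-04, 5.40163098e-07, 1.66905537e-10,
              8.31862858e-18, 4.14093219e-23, 8.32103362e-29, 5.65637769e-34,
              7.93547444e-40])
x = np.linspace(72, 100, num=29)
xnew = np.linspace(72, 100, num=281)  # 0.1-year steps

# Option 1: keep the cubic spline and clip negative values to zero
cubic_floored = np.clip(interp1d(x, y, kind="cubic")(xnew), 0, None)

# Option 2: PCHIP, which does not overshoot the data between points
smooth = PchipInterpolator(x, y)(xnew)
```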
EDIT to include some data:
This is what my y data looks like:
array([3.41528917e-08, 7.81041275e-05, 9.60711716e-04, 5.75868934e-05,
6.50260297e-05, 2.95556411e-05, 2.37331370e-05, 9.11990619e-05,
1.08003254e-04, 4.16800419e-05, 6.63673113e-05, 2.57934035e-04,
3.42235937e-03, 5.07534495e-03, 1.76603165e-02, 1.69535370e-01,
2.67624254e-01, 4.29420872e-01, 8.25165926e-02, 2.08367339e-02,
2.01227453e-03, 1.15405995e-04, 5.40163098e-07, 1.66905537e-10,
8.31862858e-18, 4.14093219e-23, 8.32103362e-29, 5.65637769e-34,
7.93547444e-40])
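For the second part, the interpolated curve can be integrated numerically with the trapezoidal rule. The sketch below, assuming x is the 29 ages from the question's code and reusing the posted y values, normalizes the PCHIP-interpolated curve into a density, then reads off the mode (maximum), the mean (the integral of x·p(x)), and an equal-tailed 95% interval from the cumulative integral. An equal-tailed interval is one choice; a highest-density interval is another and would need different code.

```python
import numpy as np
from scipy.integrate import trapezoid, cumulative_trapezoid
from scipy.interpolate import PchipInterpolator

# y values posted in the question's EDIT; x assumed from the question's code
y = np.array([3.41528917e-08, 7.81041275e-05, 9.60711716e-04, 5.75868934e-05,
              6.50260297e-05, 2.95556411e-05, 2.37331370e-05, 9.11990619e-05,
              1.08003254e-04, 4.16800419e-05, 6.63673113e-05, 2.57934035e-04,
              3.42235937e-03, 5.07534495e-03, 1.76603165e-02, 1.69535370e-01,
              2.67624254e-01, 4.29420872e-01, 8.25165926e-02, 2.08367339e-02,
              2.01227453e-03, 1.15405995e-04, 5.40163098e-07, 1.66905537e-10,
              8.31862858e-18, 4.14093219e-23, 8.32103362e-29, 5.65637769e-34,
              7.93547444e-40])
x = np.linspace(72, 100, num=29)

xs = np.linspace(72, 100, num=2801)       # fine grid, step 0.01
p = PchipInterpolator(x, y)(xs)
p = np.clip(p, 0, None)                   # guard against tiny round-off negatives
p /= trapezoid(p, xs)                     # normalize so the density integrates to 1

mode = xs[np.argmax(p)]                   # age with the highest probability
mean = trapezoid(xs * p, xs)              # integral of x * p(x) dx
cdf = cumulative_trapezoid(p, xs, initial=0)
cdf /= cdf[-1]
lo, hi = np.interp([0.025, 0.975], cdf, xs)  # equal-tailed 95% interval

print(f"mode={mode:.2f}, mean={mean:.2f}, 95% interval=({lo:.2f}, {hi:.2f})")
```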
