1-D interpolation using python 3.x

I have data that looks like a sigmoidal plot, but mirrored about a vertical line.
But the plot comes from plotting 1D data points rather than from some sort of function.
My goal is to find the x value at which y is 50%. As you can see, there is no data point where y is exactly 50%.
Interpolation comes to mind, but I'm not sure whether interpolation lets me find the x value at which y is 50%. So my questions are: 1) can you use interpolation to find x when y is 50%? or 2) do you need to fit the data to some sort of function?
Below is what I currently have in my code:
import numpy as np
import matplotlib.pyplot as plt
my_x = [4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66]
my_y_raw=np.array([0.99470977497817203, 0.99434995886145172, 0.98974611323163653, 0.961630837657524, 0.99327633558441175, 0.99338952769251909, 0.99428263292577534, 0.98690514212711611, 0.99111667721533181, 0.99149418924880861, 0.99133773062680464, 0.99143506380003499, 0.99151080464011454, 0.99268261743308517, 0.99289757252812316, 0.99100207861144063, 0.99157171773324027, 0.99112571824824358, 0.99031608691035722, 0.98978104266076905, 0.989782674787969, 0.98897835092187614, 0.98517540405423909, 0.98308943666187076, 0.96081810781994603, 0.85563541881892147, 0.61570811548079107, 0.33076276040577052, 0.14655134838124245, 0.076853147122142126, 0.035831324928136087, 0.021344669212790181])
my_y=my_y_raw/np.max(my_y_raw)
plt.plot(my_x, my_y,color='k', markersize=40)
plt.scatter(my_x,my_y,marker='*',label="myplot", color='k', edgecolor='k', linewidth=1,facecolors='none',s=50)
plt.legend(loc="lower left")
plt.xlim([4,102])
plt.show()

Using SciPy
The most straightforward way to do the interpolation is to use the SciPy interpolate.interp1d function. SciPy is closely related to NumPy, and you may already have it installed. The advantage of interp1d is that it can sort the data for you, at the cost of somewhat funky syntax. Most interpolation functions assume that you are trying to interpolate a y value from an x value, and they generally need the "x" values to be monotonically increasing. In your case, we swap the normal sense of x and y. The y values have an outlier, as @Abhishek Mishra has pointed out. With your data you are lucky: you can get away with leaving the outlier in.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
my_x = [4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,
48,50,52,54,56,58,60,62,64,66]
my_y_raw=np.array([0.99470977497817203, 0.99434995886145172,
0.98974611323163653, 0.961630837657524, 0.99327633558441175,
0.99338952769251909, 0.99428263292577534, 0.98690514212711611,
0.99111667721533181, 0.99149418924880861, 0.99133773062680464,
0.99143506380003499, 0.99151080464011454, 0.99268261743308517,
0.99289757252812316, 0.99100207861144063, 0.99157171773324027,
0.99112571824824358, 0.99031608691035722, 0.98978104266076905,
0.989782674787969, 0.98897835092187614, 0.98517540405423909,
0.98308943666187076, 0.96081810781994603, 0.85563541881892147,
0.61570811548079107, 0.33076276040577052, 0.14655134838124245,
0.076853147122142126, 0.035831324928136087, 0.021344669212790181])
# set assume_sorted=False to have scipy automatically sort for you
f = interp1d(my_y_raw, my_x, assume_sorted = False)
xnew = f(0.5)
print('interpolated value is ', xnew)
plt.plot(my_x, my_y_raw,'x-', markersize=10)
plt.plot(xnew, 0.5, 'x', color = 'r', markersize=20)
plt.plot((0, xnew), (0.5,0.5), ':')
plt.grid(True)
plt.show()
which gives
interpolated value is 56.81214249272691
Using NumPy
Numpy also has an interp function, but it doesn't do the sort for you. And if you don't sort, you'll be sorry:
Does not check that the x-coordinate sequence xp is increasing. If xp
is not increasing, the results are nonsense.
The only way I could get np.interp to work was to shove the data into a structured array and sort it by y.
import numpy as np
import matplotlib.pyplot as plt
my_x = np.array([4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,
48,50,52,54,56,58,60,62,64,66], dtype = float)
my_y_raw=np.array([0.99470977497817203, 0.99434995886145172,
0.98974611323163653, 0.961630837657524, 0.99327633558441175,
0.99338952769251909, 0.99428263292577534, 0.98690514212711611,
0.99111667721533181, 0.99149418924880861, 0.99133773062680464,
0.99143506380003499, 0.99151080464011454, 0.99268261743308517,
0.99289757252812316, 0.99100207861144063, 0.99157171773324027,
0.99112571824824358, 0.99031608691035722, 0.98978104266076905,
0.989782674787969, 0.98897835092187614, 0.98517540405423909,
0.98308943666187076, 0.96081810781994603, 0.85563541881892147,
0.61570811548079107, 0.33076276040577052, 0.14655134838124245,
0.076853147122142126, 0.035831324928136087, 0.021344669212790181],
dtype = float)
# the np.float alias is gone from recent NumPy releases; the builtin float is equivalent
dt = np.dtype([('x', float), ('y', float)])
data = np.zeros( (len(my_x)), dtype = dt)
data['x'] = my_x
data['y'] = my_y_raw
data.sort(order = 'y') # sort data in place by y values
print('numpy interp gives ', np.interp(0.5, data['y'], data['x']))
which gives
numpy interp gives 56.81214249272691
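If the structured array feels heavy, one possible alternative (a sketch, not part of the original answer) is to sort both arrays with the index order returned by np.argsort. To keep it short, the snippet below only uses the tail of the data, which is the part that crosses 0.5; the names order and x_at_half are just illustrative.

```python
import numpy as np

# Tail of the data from the question (the part that crosses y = 0.5)
x = np.array([52, 54, 56, 58, 60, 62, 64, 66], dtype=float)
y = np.array([0.96081810781994603, 0.85563541881892147,
              0.61570811548079107, 0.33076276040577052,
              0.14655134838124245, 0.076853147122142126,
              0.035831324928136087, 0.021344669212790181])

order = np.argsort(y)  # indices that put y in increasing order
# np.interp needs increasing "xp" values, so pass y and x in that order
x_at_half = np.interp(0.5, y[order], x[order])
print('numpy interp gives ', x_at_half)
```

This should agree with the value printed above, since only the two points on either side of y = 0.5 matter for linear interpolation.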

As you said, your data looks like a flipped sigmoid. Can we assume that your function is strictly decreasing? If so, we can try the following:
Remove all the points where the data is not strictly decreasing. For example, for your data that point will be near 0.
Use binary search to find the location where y=0.5 would be inserted.
Now you know the two (x, y) pairs between which your desired y=0.5 should lie.
You can use simple linear interpolation if the (x, y) pairs are very close.
Otherwise, you can see what the approximation of the sigmoid is near those pairs.
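The steps above could be sketched roughly as follows. This is only one way to read them: the function name x_at_level and the "keep a point only if it is below every earlier point" rule are my own assumptions, and on noisy data that rule can throw away a lot of points from the flat part of the curve.

```python
import numpy as np

def x_at_level(x, y, level):
    """Estimate x where a decreasing curve y(x) crosses `level` (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 1: keep only points strictly below every earlier point
    running_min = np.minimum.accumulate(y)
    keep = np.ones(len(y), dtype=bool)
    keep[1:] = y[1:] < running_min[:-1]
    x, y = x[keep], y[keep]
    if level > y[0] or level < y[-1]:
        raise ValueError("level is outside the range of the data")
    # Step 2: binary search; -y is increasing, which searchsorted requires
    i = max(int(np.searchsorted(-y, -level)), 1)
    # Step 3: linear interpolation between the two neighbouring points
    x0, x1, y0, y1 = x[i - 1], x[i], y[i - 1], y[i]
    return x0 + (level - y0) * (x1 - x0) / (y1 - y0)

# Small made-up example with one non-monotonic point (1.1) that gets dropped
result = x_at_level([0, 1, 2, 3, 4], [1.0, 1.1, 0.75, 0.25, 0.0], 0.5)
print(result)
```

Here the crossing lies halfway between the points (2, 0.75) and (3, 0.25).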

You might not need to fit any function to your data. Since your y values decrease with x, simply find the following two elements:
The smallest x for which y<50%
The largest x for which y>50%
Then interpolate linearly between them to find x*. Below is the code:
import numpy as np
my_x = np.array([4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66])
my_y=np.array([0.99470977497817203, 0.99434995886145172, 0.98974611323163653, 0.961630837657524, 0.99327633558441175, 0.99338952769251909, 0.99428263292577534, 0.98690514212711611, 0.99111667721533181, 0.99149418924880861, 0.99133773062680464, 0.99143506380003499, 0.99151080464011454, 0.99268261743308517, 0.99289757252812316, 0.99100207861144063, 0.99157171773324027, 0.99112571824824358, 0.99031608691035722, 0.98978104266076905, 0.989782674787969, 0.98897835092187614, 0.98517540405423909, 0.98308943666187076, 0.96081810781994603, 0.85563541881892147, 0.61570811548079107, 0.33076276040577052, 0.14655134838124245, 0.076853147122142126, 0.035831324928136087, 0.021344669212790181])
tempInd1 = my_y<.5 # This will only work if the values are monotonic
x1 = my_x[tempInd1][0]
y1 = my_y[tempInd1][0]
x2 = my_x[~tempInd1][-1]
y2 = my_y[~tempInd1][-1]
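# np.interp replaces the deprecated scipy.interp alias; [y1, y2] is increasing, as np.interp requires
print(np.interp(0.5, [y1, y2], [x1, x2]))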

Related

Python scipy interpolation meshgrid data

Dear all, I want to interpolate experimental data to make it look higher resolution, but apparently it does not work. I followed the example in this link for mgrid data. The CSV data is given below.
My code:
import pandas as pd
import numpy as np
import scipy.interpolate
import matplotlib.pyplot as plt
x=np.linspace(0,2.8,15)
y=np.array([2.1,2,1.9,1.8,1.7,1.6,1.5,1.4,1.3,1.2,1.1,0.9,0.7,0.5,0.3,0.13])
[X, Y]=np.meshgrid(x,y)
Vx_df=pd.read_csv("Vx.csv", header=None)
Vx=Vx_df.to_numpy()
tck=scipy.interpolate.bisplrep(X,Y,Vx)
plt.pcolor(X,Y,Vx, shading='nearest');
plt.show()
xi=np.linspace(0.1, 2.5, 30)
yi=np.linspace(0.15, 2.0, 50)
[X1, Y1]=np.meshgrid(xi,yi)
VxNew = scipy.interpolate.bisplev(X1[:,0], Y1[0,:], tck, dx=1, dy=1)
plt.pcolor(X1,Y1,VxNew, shading='nearest')
plt.show()
CSV DATA:
0.73,,,-0.08,-0.19,-0.06,0.02,0.27,0.35,0.47,0.64,0.77,0.86,0.90,0.93
0.84,,,0.13,0.03,0.12,0.23,0.32,0.52,0.61,0.72,0.83,0.91,0.96,0.95
1.01,1.47,,0.46,0.46,0.48,0.51,0.65,0.74,0.80,0.89,0.99,0.99,1.07,1.06
1.17,1.39,1.51,1.19,1.02,0.96,0.95,1.01,1.01,1.05,1.06,1.05,1.11,1.13,1.19
1.22,1.36,1.42,1.44,1.36,1.23,1.24,1.17,1.18,1.14,1.14,1.09,1.08,1.14,1.19
1.21,1.30,1.35,1.37,1.43,1.36,1.33,1.23,1.14,1.11,1.05,0.98,1.01,1.09,1.15
1.14,1.17,1.22,1.25,1.23,1.16,1.23,1.00,1.00,0.93,0.93,0.80,0.82,1.05,1.09
,0.89,0.95,0.98,1.03,0.97,0.94,0.84,0.77,0.68,0.66,0.61,0.48,,
,0.06,0.25,0.42,0.55,0.55,0.61,0.49,0.46,0.56,0.51,0.40,0.28,,
,0.01,0.05,0.13,0.23,0.32,0.33,0.37,0.29,0.30,0.32,0.27,0.25,,
,-0.02,0.01,0.07,0.15,0.21,0.23,0.22,0.20,0.19,0.17,0.20,0.21,0.13,
,-0.07,-0.05,-0.02,0.06,0.07,0.07,0.16,0.11,0.08,0.12,0.08,0.13,0.16,
,-0.13,-0.14,-0.09,-0.07,0.01,-0.03,0.06,0.02,-0.01,0.00,0.01,0.02,0.04,
,-0.16,-0.23,-0.21,-0.16,-0.10,-0.08,-0.05,-0.11,-0.14,-0.17,-0.16,-0.11,-0.05,
,-0.14,-0.25,-0.29,-0.32,-0.31,-0.33,-0.31,-0.34,-0.36,-0.35,-0.31,-0.26,-0.14,
,-0.02,-0.07,-0.24,-0.36,-0.39,-0.45,-0.45,-0.52,-0.48,-0.41,-0.43,-0.37,-0.22,
The image of the low resolution data (without interpolation) is Low resolution, and the image I get after interpolation is High resolution.
Can you please give me some advice? Why does it not interpolate properly?
OK, so to interpolate we need to set up an input grid and an output grid, and possibly remove values from the grid that are missing. We do that like so:
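from io import StringIO
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import interpolate
# csv_string is assumed to hold the CSV text from the question
array = pd.read_csv(StringIO(csv_string), header=None).to_numpy()
def interp(array, scale=1, method='cubic'):
    # input grid: one node per original pixel, spaced `scale` apart
    x = np.arange(array.shape[1]*scale)[::scale]
    y = np.arange(array.shape[0]*scale)[::scale]
    x_in_grid, y_in_grid = np.meshgrid(x,y)
    # output grid: every integer position spanned by the input grid
    x_out, y_out = np.meshgrid(np.arange(max(x)+1),np.arange(max(y)+1))
    # mask the missing (NaN) values so they are not used as input points
    array = np.ma.masked_invalid(array)
    x_in = x_in_grid[~array.mask]
    y_in = y_in_grid[~array.mask]
    return interpolate.griddata((x_in, y_in), array[~array.mask].reshape(-1),(x_out, y_out), method=method)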
Now we need to call this function 3 times. First we fill the missing values in the middle with spline interpolation. Then we fill the boundary values with nearest neighbor interpolation. And finally we size it up by interpreting the pixels as being a few pixels apart and filling in gaps with spline interpolation.
array = interp(array)
array = interp(array, method='nearest')
array = interp(array, 50)
plt.imshow(array)
And we get the following result

Python: how to create a smoothed version of a 2D binned "color map"?

I would like to create a version of this 2D binned "color map" with smoothed colors.
I am not even sure this would be the correct nomenclature for the plot, but, essentially, I want my figure to be color coded by the median values of a third variable for points that reside in each defined bin of my (X, Y) space.
Even though I am able to accomplish that to a certain degree (see example), I would like to find a way to create a version of the same plot with a smoothed color gradient. That would allow me to visualize the overall behavior of my distribution.
I tried ideas described here: Smoothing 2D map in python
and here: Python: binned_statistic_2d mean calculation ignoring NaNs in data
as well as links therein, but could not find a clear solution to the problem.
This is what I have so far:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from scipy.stats import binned_statistic_2d
np.random.seed(999) # seed NumPy's generator; random.seed() does not affect np.random
x = np.random.normal (0,10,5000)
y = np.random.normal (0,10,5000)
z = np.random.uniform(0,10,5000)
fig = plt.figure(figsize=(20, 20))
plt.rcParams.update({'font.size': 10})
ax = fig.add_subplot(3,3,1)
ax.set_axisbelow(True)
plt.grid(True, lw=0.5, zorder=-1) # the b= keyword is not accepted by recent Matplotlib
x_bins = np.arange(-50., 50.5, 1.)
y_bins = np.arange(-50., 50.5, 1.)
cmap = plt.get_cmap('jet_r',1000) #just a colormap
ret = binned_statistic_2d(x, y, z, statistic=np.median, bins=[x_bins, y_bins]) # Bin (X, Y) and create a map of the medians of "Colors"
plt.imshow(ret.statistic.T, origin='lower', extent=(-50, 50, -50, 50), cmap=cmap)
plt.xlim(-40,40)
plt.ylim(-40,40)
plt.xlabel("X", fontsize=15)
plt.ylabel("Y", fontsize=15)
ax.set_yticks([-40,-30,-20,-10,0,10,20,30,40])
bounds = np.arange(2.0, 20.0, 1.0)
plt.colorbar(ticks=bounds, label="Color", fraction=0.046, pad=0.04)
# save plots
plt.savefig("Whatever_name.png", bbox_inches='tight')
Which produces the following image (from random data):
Therefore, the simple question would be: how to smooth these colors?
Thanks in advance!
PS: sorry for excessive coding, but I believe a clear visualization is crucial for this particular problem.
Thanks to everyone who viewed this issue and tried to help!
I ended up being able to solve my own problem. In the end, it was all about image smoothing with a Gaussian kernel.
This link: Gaussian filtering a image with Nan in Python gave me the insight for the solution.
I basically implemented the exact same code but, at the end, mapped the previously known NaN pixels from the original 2D array onto the resulting smoothed version. Unlike the solution from the link, my version does NOT fill NaN pixels with a value derived from the surrounding pixels. Or rather, it does, but then I erase those again.
Here is the final figure produced for the example I provided:
Final code, for reference, for those who might need it in the future:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from scipy.stats import binned_statistic_2d
import scipy.stats as st
import scipy.ndimage
import scipy as sp
np.random.seed(999) # seed NumPy's generator; random.seed() does not affect np.random
x = np.random.normal (0,10,5000)
y = np.random.normal (0,10,5000)
z = np.random.uniform(0,10,5000)
fig = plt.figure(figsize=(20, 20))
plt.rcParams.update({'font.size': 10})
ax = fig.add_subplot(3,3,1)
ax.set_axisbelow(True)
plt.grid(True, lw=0.5, zorder=-1) # the b= keyword is not accepted by recent Matplotlib
x_bins = np.arange(-50., 50.5, 1.)
y_bins = np.arange(-50., 50.5, 1.)
cmap = plt.get_cmap('jet_r',1000) #just a colormap
ret = binned_statistic_2d(x, y, z, statistic=np.median, bins=[x_bins, y_bins]) # Bin (X, Y) and create a map of the medians of "Colors"
sigma=1 # standard deviation for Gaussian kernel
truncate=5.0 # truncate filter at this many sigmas
U = ret.statistic.T.copy()
V=U.copy()
V[np.isnan(U)]=0
VV=sp.ndimage.gaussian_filter(V,sigma=sigma)
W=0*U.copy()+1
W[np.isnan(U)]=0
WW=sp.ndimage.gaussian_filter(W,sigma=sigma)
np.seterr(divide='ignore', invalid='ignore')
Z=VV/WW
Z[np.isnan(U)] = np.nan # restore the NaN pixels of the original map
plt.imshow(Z, origin='lower', extent=(-50, 50, -50, 50), cmap=cmap)
plt.xlim(-40,40)
plt.ylim(-40,40)
plt.xlabel("X", fontsize=15)
plt.ylabel("Y", fontsize=15)
ax.set_yticks([-40,-30,-20,-10,0,10,20,30,40])
bounds = np.arange(2.0, 20.0, 1.0)
plt.colorbar(ticks=bounds, label="Color", fraction=0.046, pad=0.04)
# save plots
plt.savefig("Whatever_name.png", bbox_inches='tight')

Find all positive-going zero-crossings in a large quasi-periodic array

I need to find zero-crossings in a 1D array of a roughly periodic function. It will be the points where an orbiting satellite crosses the Earth's equator going north.
I've worked out a simple solution based on finding points where one value is zero or negative and the next is positive, then using a quadratic or cubic interpolator with scipy.optimize.brentq to find the nearby zeros.
The interpolator does not go beyond cubic, and before I learn to use a better interpolator I'd first like to check if there already exists a fast method in numpy or scipy to find all of the zero crossings in a large array (n = 1E+06 to 1E+09).
Question: Is there already a faster method in numpy or scipy to find all of the zero crossings in a large array (n = 1E+06 to 1E+09) than the way I've done it here?
The plot shows the errors between the interpolated zeros and the actual value of the function; the smaller line is the cubic interpolation, the larger is the quadratic.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from scipy.optimize import brentq
def f(x):
    return np.sin(x + np.sin(x*e)/e) # roughly periodic function
halfpi, pi, twopi = [f*np.pi for f in (0.5, 1, 2)]
e = np.exp(1)
x = np.arange(0, 10000, 0.1)
y = np.sin(x + np.sin(x*e)/e)
crossings = np.where((y[1:] > 0) * (y[:-1] <= 0))[0]
Qd = interp1d(x, y, kind='quadratic', assume_sorted=True)
Cu = interp1d(x, y, kind='cubic', assume_sorted=True)
x0sQd = [brentq(Qd, x[i-1], x[i+1]) for i in crossings[1:-1]]
x0sCu = [brentq(Cu, x[i-1], x[i+1]) for i in crossings[1:-1]]
y0sQd = [f(x0) for x0 in x0sQd]
y0sCu = [f(x0) for x0 in x0sCu]
if True:
    plt.figure()
    plt.plot(x0sQd, y0sQd)
    plt.plot(x0sCu, y0sCu)
    plt.show()
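For comparison, here is a rough, fully vectorized sketch that is not from the original post: it finds the same sign changes and then estimates each crossing by linear interpolation between the two bracketing samples. It is likely less accurate than the quadratic or cubic interpolators above, so it is probably best seen as a cheap first estimate. The function name rising_zero_crossings and the plain sine test signal are my own choices for illustration.

```python
import numpy as np

def rising_zero_crossings(x, y):
    """Linear estimate of every x where y goes from <= 0 to > 0."""
    idx = np.flatnonzero((y[:-1] <= 0) & (y[1:] > 0))  # sample just before each crossing
    x0, x1 = x[idx], x[idx + 1]
    y0, y1 = y[idx], y[idx + 1]
    # y1 > 0 >= y0, so the denominator is never zero
    return x0 - y0 * (x1 - x0) / (y1 - y0)

# Simple test signal with known upward crossings at 0, 2*pi, 4*pi, 6*pi
x = np.arange(0, 20, 0.01)
y = np.sin(x)
roots = rising_zero_crossings(x, y)
print(roots)
```

Everything here is array arithmetic, so there is no Python-level loop over the crossings.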

Matplotlib compute values when plotting - python3

I want to plot only positive values when plotting a graph (like the ReLU function in ML).
This may well be a dumb question. I hope not.
In the code below I iterate over and change the underlying list data. I really want to change the values only at plot time, without changing the source list data. Is that possible?
#create two lists in range -10 to 10
x = list(range(-10, 11))
y = list(range(-10, 11))
#this function changes the underlying data to remove negative values
#I really want to do this at plot time
#I don't want to change the source list. Can it be done?
for idx, val in enumerate(y):
    y[idx] = max(0, val)
#a bunch of formatting to make the plot look nice
plt.figure(figsize=(6, 6))
plt.axhline(y=0, color='silver')
plt.axvline(x=0, color='silver')
plt.grid(True)
plt.plot(x, y, 'rx')
plt.show()
I'd suggest using numpy and filtering the data when plotting:
import numpy as np
import matplotlib.pyplot as plt
#create two lists in range -10 to 10
x = list(range(-10, 11))
y = list(range(-10, 11))
x = np.array(x)
y = np.array(y)
#a bunch of formatting to make the plot look nice
plt.figure(figsize=(6, 6))
plt.axhline(y=0, color='silver')
plt.axvline(x=0, color='silver')
plt.grid(True)
# plot only those values where y is positive
plt.plot(x[y>0], y[y>0], 'rx')
plt.show()
This will not plot points with y < 0 at all. If instead, you want to replace any negative value by zero, you can do so as follows
plt.plot(x, np.maximum(0,y), 'rx')
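To convince yourself that this leaves the source list alone, here is a small check (just a sketch; the name clipped is only for illustration). np.maximum returns a new array, so the original y is not modified.

```python
import numpy as np

y = list(range(-10, 11))    # source data stays a plain Python list
clipped = np.maximum(0, y)  # new array with negatives replaced by 0

print(y[:3])        # [-10, -9, -8]
print(clipped[:3])  # [0 0 0]
```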
It may look a bit complicated, but you can filter the data on the fly:
plt.plot(*zip(*[(x1, y1) for (x1, y1) in zip(x, y) if y1 > 0]), 'rx')
Explanation: it is safer to handle the data as pairs so that (x,y) stay in sync, and then you have to convert the pairs back to a separate xlist and ylist, which is what the outer zip(*...) does.

Highlighting arbitrary points in a matplotlib plot

I am new to python and matplotlib.
I am trying to highlight a few points that match a certain criteria in an already existing plot in matplotlib.
The code for the initial plot is as below:
pl.plot(t,y)
pl.title('Damped Sine Wave with %.1f Hz frequency' % f)
pl.xlabel('t (s)')
pl.ylabel('y')
pl.grid()
pl.show()
In the above plot I wanted to highlight some specific points which match the criteria abs(y)>0.5. The code coming up with the points is as below:
markers_on = [x for x in y if abs(x)>0.5]
I tried using the argument 'markevery', but it throws an error saying
'markevery' is iterable but not a valid form of numpy fancy indexing;
The code that was giving the error is as below:
pl.plot(t,y,'-gD',markevery = markers_on)
pl.title('Damped Sine Wave with %.1f Hz frequency' % f)
pl.xlabel('t (s)')
pl.ylabel('y')
pl.grid()
pl.show()
The markevery argument to the plotting function accepts different types of inputs. Depending on the input type, they are interpreted differently. Find a nice list of possibilities in this matplotlib example.
In the case where you have a condition for the markers to show, there are two options. Assuming t and y are numpy arrays and one has imported numpy as np,
Either specify a boolean array,
plt.plot(t,y,'-gD',markevery = np.where(y > 0.5, True, False))
or an array of indices,
plt.plot(t,y,'-gD',markevery = np.arange(len(t))[y > 0.5])
Complete example
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
t = np.linspace(0,3,14)
y = np.random.rand(len(t))
plt.plot(t,y,'-gD',markevery = np.where(y > 0.5, True, False))
# or
#plt.plot(t,y,'-gD',markevery = np.arange(len(t))[y > 0.5])
plt.xlabel('t (s)')
plt.ylabel('y')
plt.show()
resulting in
markevery uses boolean values to mark every point where the boolean is True,
so instead of markers_on = [x for x in y if abs(x)>0.5]
you'd do markers_on = [abs(x)>0.5 for x in y], which returns a list of boolean values the same size as y that is True at every point where |x| > 0.5.
Then you'd use your code as is:
pl.plot(t,y,'-gD',markevery = markers_on)
pl.title('Damped Sine Wave with %.1f Hz frequency' % f)
pl.xlabel('t (s)')
pl.ylabel('y')
pl.grid()
pl.show()
I know this question is old, but I found this solution while trying to apply the top answer; I'm not familiar with numpy, and it seemed to overcomplicate things.
The markevery argument only takes indices of type None, integer or boolean arrays as input. Since I was passing the values directly it was throwing the error.
I know it is not very pythonic but I used the below code to come up with the indices.
marker_indices = []
for x in range(len(y)):
    if abs(y[x]) > 0.5:
        marker_indices.append(x)
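If numpy is available, the same indices can probably be obtained in one line with np.flatnonzero. The sketch below uses a made-up damped sine for y (the original t and y are not shown in the question) and checks the result against a plain Python loop.

```python
import numpy as np

# Hypothetical data standing in for the question's damped sine wave
t = np.linspace(0, 3, 14)
y = np.exp(-t) * np.sin(2 * np.pi * t)

# Indices where the condition holds, as an integer array
marker_indices = np.flatnonzero(np.abs(y) > 0.5)

# The equivalent explicit loop, for comparison
loop_indices = [i for i in range(len(y)) if abs(y[i]) > 0.5]
print(marker_indices, loop_indices)
```

The result can then be passed as markevery=marker_indices.tolist().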
I was having this issue because I was trying to mark some points that were out of the bounds of the data frame.
For example:
some_df.shape
-> (276, 9)
markers = [1000, 1080, 1120]
some_df.plot(
    x='date',
    y=['speed'],
    figsize=(17, 7), title="Performance",
    legend=True,
    marker='o',
    markersize=10,
    markevery=markers,
)
-> ValueError: markevery=[1000, 1080, 1120] is iterable but not a valid numpy fancy index
Just make sure that the values you are giving as markers are within the bounds of the data frame you want to plot.
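One simple way to guard against this, as a sketch: drop any marker index that falls outside the number of rows before plotting. The numbers below are made up apart from the 276 rows from the example, and n_rows stands in for some_df.shape[0].

```python
n_rows = 276  # stands in for some_df.shape[0]
markers = [100, 250, 1000, 1080, 1120]

# Keep only the indices that actually exist in the data
valid_markers = [m for m in markers if 0 <= m < n_rows]
print(valid_markers)  # [100, 250]
```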
