How to interpolate 2D spatial data with kriging in Python?
I have a spatial 2D domain, say [0,1]×[0,1]. In this domain, there are 6 points where some scalar quantity of interest has been observed (e.g., temperature, mechanical stress, or fluid density). How can I predict the quantity of interest at unobserved points? In other words, how may I interpolate spatial data in Python?
For example, consider the following coordinates for points in the 2D domain (inputs) and corresponding observations of the quantity of interest (outputs).
import numpy as np
coordinates = np.array([[0.0,0.0],[0.5,0.0],[1.0,0.0],[0.0,1.0],[0.5,1.],[1.0,1.0]])
observations = np.array([1.0,0.5,0.75,-1.0,0.0,1.0])
The X and Y coordinates can be extracted with:
x = coordinates[:,0]
y = coordinates[:,1]
The following script creates a scatter plot where yellow (resp. blue) represents high (resp. low) output values.
import matplotlib.pyplot as plt
fig = plt.figure()
plt.scatter(x, y, c=observations, cmap='viridis')
plt.colorbar()
plt.show()
I would like to use Kriging to predict the scalar quantity of interest on a regular grid within the 2D input domain. How can I do this in Python?
In OpenTURNS, the KrigingAlgorithm class can estimate the hyperparameters of a Gaussian process model based on the known output values at specific input points. The getMetaModel method of the resulting KrigingResult then returns a function which interpolates the data.
First, we need to convert the Numpy arrays coordinates and observations to OpenTURNS Sample objects:
import openturns as ot
input_train = ot.Sample(coordinates)
output_train = ot.Sample(observations, 1)
The array coordinates has shape (6, 2), so it is turned into a Sample of size 6 and dimension 2. The array observations has shape (6,), which is ambiguous: Is it going to be a Sample of size 6 and dimension 1, or a Sample of size 1 and dimension 6? To clarify this, we specify the dimension (1) in the call to the Sample class constructor.
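As a quick check, here is a minimal sketch using the standard Sample accessors getSize and getDimension to confirm that the shapes came out as intended:
# Verify sizes and dimensions of the training Samples
print(input_train.getSize(), input_train.getDimension())    # expected: 6 2
print(output_train.getSize(), output_train.getDimension())  # expected: 6 1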
In the following, we define a Gaussian process model with constant trend function and squared exponential covariance kernel:
inputDimension = 2
basis = ot.ConstantBasisFactory(inputDimension).build()
covariance_kernel = ot.SquaredExponential([1.0]*inputDimension, [1.0])
algo = ot.KrigingAlgorithm(input_train, output_train,
covariance_kernel, basis)
We then fit the value of the trend and the parameters of the covariance kernel (amplitude parameter and scale parameters) and obtain a metamodel:
# Fit
algo.run()
result = algo.getResult()
krigingMetamodel = result.getMetaModel()
The resulting krigingMetamodel is a Function which takes a 2D Point as input and returns a 1D Point. It predicts the quantity of interest. To illustrate this, let us build the 2D domain [0,1]×[0,1] and discretize it with a regular grid:
# Create the 2D domain
myInterval = ot.Interval([0.0, 0.0], [1.0, 1.0])
# Define the number of intervals in each direction of the box
nx = 20
ny = 10
myIndices = [nx - 1, ny - 1]
myMesher = ot.IntervalMesher(myIndices)
myMeshBox = myMesher.build(myInterval)
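As a sanity check (a sketch using the getVerticesNumber accessor of the Mesh class), the box mesh should contain nx × ny = 200 vertices:
# The regular grid has nx * ny = 200 vertices
print(myMeshBox.getVerticesNumber())  # expected: 200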
Using our krigingMetamodel to predict the values taken by the quantity of interest on this mesh can be done with the following statements. We first get the vertices of the mesh as a Sample, and then evaluate the predictions with a single call to the metamodel (there is no need for a for loop here):
# Predict
vertices = myMeshBox.getVertices()
predictions = krigingMetamodel(vertices)
In order to see the result with Matplotlib, we first have to create the data required by the pcolor function:
# Format for plot
X = np.array(vertices[:, 0]).reshape((ny, nx))
Y = np.array(vertices[:, 1]).reshape((ny, nx))
predictions_array = np.array(predictions).reshape((ny,nx))
The following script produces the plot:
# Plot
import matplotlib.pyplot as plt
fig = plt.figure()
plt.pcolor(X, Y, predictions_array)
plt.colorbar()
plt.show()
We see that the predictions of the metamodel are equal to the observations at the observed input points.
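This can be verified numerically with a minimal check, reusing the training data from above; kriging is an exact interpolator, so the two outputs should match up to numerical precision:
# Evaluate the metamodel at the training points
print(krigingMetamodel(input_train))
print(output_train)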
This metamodel is a smooth function of the coordinates: its smoothness increases with the smoothness of the covariance kernel, and the squared exponential kernel happens to be very smooth.
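If a rougher interpolant is desired, another covariance kernel can be swapped in before calling run(). A sketch using the Matérn kernel from OpenTURNS (the regularity value 1.5 is an arbitrary choice for illustration):
# Matérn kernel: a lower regularity nu yields a less smooth interpolant
covariance_kernel = ot.MaternModel([1.0] * inputDimension, [1.0], 1.5)
algo = ot.KrigingAlgorithm(input_train, output_train, covariance_kernel, basis)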
Related
Python scipy interpolation meshgrid data
I want to interpolate experimental data to make it look higher resolution, but apparently it does not work. I followed the example in this link for mgrid data; the CSV data goes as follows.
My code:
import pandas as pd
import numpy as np
import scipy.interpolate
import matplotlib.pyplot as plt
x = np.linspace(0, 2.8, 15)
y = np.array([2.1, 2, 1.9, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1, 0.9, 0.7, 0.5, 0.3, 0.13])
[X, Y] = np.meshgrid(x, y)
Vx_df = pd.read_csv("Vx.csv", header=None)
Vx = Vx_df.to_numpy()
tck = scipy.interpolate.bisplrep(X, Y, Vx)
plt.pcolor(X, Y, Vx, shading='nearest')
plt.show()
xi = np.linspace(0.1, 2.5, 30)
yi = np.linspace(0.15, 2.0, 50)
[X1, Y1] = np.meshgrid(xi, yi)
VxNew = scipy.interpolate.bisplev(X1[:, 0], Y1[0, :], tck, dx=1, dy=1)
plt.pcolor(X1, Y1, VxNew, shading='nearest')
plt.show()
CSV data:
0.73,,,-0.08,-0.19,-0.06,0.02,0.27,0.35,0.47,0.64,0.77,0.86,0.90,0.93
0.84,,,0.13,0.03,0.12,0.23,0.32,0.52,0.61,0.72,0.83,0.91,0.96,0.95
1.01,1.47,,0.46,0.46,0.48,0.51,0.65,0.74,0.80,0.89,0.99,0.99,1.07,1.06
1.17,1.39,1.51,1.19,1.02,0.96,0.95,1.01,1.01,1.05,1.06,1.05,1.11,1.13,1.19
1.22,1.36,1.42,1.44,1.36,1.23,1.24,1.17,1.18,1.14,1.14,1.09,1.08,1.14,1.19
1.21,1.30,1.35,1.37,1.43,1.36,1.33,1.23,1.14,1.11,1.05,0.98,1.01,1.09,1.15
1.14,1.17,1.22,1.25,1.23,1.16,1.23,1.00,1.00,0.93,0.93,0.80,0.82,1.05,1.09
,0.89,0.95,0.98,1.03,0.97,0.94,0.84,0.77,0.68,0.66,0.61,0.48,,
,0.06,0.25,0.42,0.55,0.55,0.61,0.49,0.46,0.56,0.51,0.40,0.28,,
,0.01,0.05,0.13,0.23,0.32,0.33,0.37,0.29,0.30,0.32,0.27,0.25,,
,-0.02,0.01,0.07,0.15,0.21,0.23,0.22,0.20,0.19,0.17,0.20,0.21,0.13,
,-0.07,-0.05,-0.02,0.06,0.07,0.07,0.16,0.11,0.08,0.12,0.08,0.13,0.16,
,-0.13,-0.14,-0.09,-0.07,0.01,-0.03,0.06,0.02,-0.01,0.00,0.01,0.02,0.04,
,-0.16,-0.23,-0.21,-0.16,-0.10,-0.08,-0.05,-0.11,-0.14,-0.17,-0.16,-0.11,-0.05,
,-0.14,-0.25,-0.29,-0.32,-0.31,-0.33,-0.31,-0.34,-0.36,-0.35,-0.31,-0.26,-0.14,
,-0.02,-0.07,-0.24,-0.36,-0.39,-0.45,-0.45,-0.52,-0.48,-0.41,-0.43,-0.37,-0.22,
The image of the low-resolution data (without interpolation) and the image I get after interpolation are attached ("Low resolution" and "High resolution"). Can you please give me some advice? Why does it not interpolate properly?
OK, so to interpolate we need to set up an input and an output grid, and possibly remove values that are missing from the grid. We do that like so:
from io import StringIO
import numpy as np
import pandas as pd
from scipy import interpolate
import matplotlib.pyplot as plt
# csv_string holds the CSV data from the question
array = pd.read_csv(StringIO(csv_string), header=None).to_numpy()
def interp(array, scale=1, method='cubic'):
    # Input grid: the known pixel positions, spread `scale` pixels apart
    x = np.arange(array.shape[1] * scale)[::scale]
    y = np.arange(array.shape[0] * scale)[::scale]
    x_in_grid, y_in_grid = np.meshgrid(x, y)
    # Output grid: every integer position up to the largest input coordinate
    x_out, y_out = np.meshgrid(np.arange(max(x) + 1), np.arange(max(y) + 1))
    # Drop missing (NaN) cells from the input
    array = np.ma.masked_invalid(array)
    x_in = x_in_grid[~array.mask]
    y_in = y_in_grid[~array.mask]
    return interpolate.griddata((x_in, y_in), array[~array.mask].reshape(-1),
                                (x_out, y_out), method=method)
Now we need to call this function three times. First we fill the missing values in the middle with spline interpolation. Then we fill the boundary values with nearest-neighbour interpolation. And finally we size it up by interpreting the pixels as being a few pixels apart and filling in the gaps with spline interpolation:
array = interp(array)                     # fill interior NaNs (cubic)
array = interp(array, method='nearest')   # fill boundary NaNs (nearest neighbour)
array = interp(array, 50)                 # upscale by a factor of 50 (cubic)
plt.imshow(array)
plt.show()
And we get the following result.
Visualizing multiple linear regression predictions using a heatmap
I am using multiple linear regression to predict the temperature in every region of a field where wireless sensors are deployed. The sensors are as follows: 42 sensors deployed over a 1000×600 m² surface, collecting the temperature at each of these 42 locations per hour (see picture: Sensors placement). We have two features here (the location, i.e. x and y) and the output, which is the temperature. I fit my model on 70% of the dataset, for the sake of later accuracy computations. However, after fitting my model I want to have temperature predictions over the whole surface, specifically a heat map that gives me the temperature as a function of x and y (see picture: Heatmap). I am stuck on the visualization part: as my dataset contains only the 42 known locations and their respective temperatures, how can I plot predictions for every x in [0, 1000] and y in [0, 600]? Do I have to make an n×2 matrix iterating over all the values of x and y and then feed it to my fitted model, or is there a simpler way?
You can use np.meshgrid to create a grid of points, then use your model to predict on this grid of points:
import numpy as np
import matplotlib.pyplot as plt
grid_x, grid_y = np.meshgrid(np.linspace(0, 1000, 100),
                             np.linspace(0, 600, 60))
X = np.stack([grid_x.ravel(), grid_y.ravel()]).T
y_pred = model.predict(X)  # use your scikit-learn model here
image = np.reshape(y_pred, grid_x.shape)
plt.imshow(image, origin="lower")
plt.colorbar()
plt.show()
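For completeness, model above is assumed to be an already-fitted scikit-learn regressor. A minimal sketch of how it might be created from the sensor readings (the names sensor_coords and temperatures are hypothetical stand-ins for your data):
from sklearn.linear_model import LinearRegression
# sensor_coords: (42, 2) array of x/y positions; temperatures: (42,) readings
model = LinearRegression().fit(sensor_coords, temperatures)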
Plot 3D density plot from many 2D arrays
I am trying to plot a 3D density plot from many 2D numpy arrays of the same shape. Each [x, y] coordinate returns an intensity (how dense it is at that point). I cannot figure out how to plot this using matplotlib. I am able to successfully get a contour plot by plotting just one 2D array, or use imshow to get a nice slice of my density at a certain 'z' cut, but again only plotting that one 2D array. I have an object, data, which when I apply the method slice() and pass in an integer from 0 to 480, gives me a 2D array of that 'z' cross section:
plt.imshow(data.slice(200))
I want to be able to plot a density map by iterating over data.slice(n) for n from 0 to 480 and plot that on a single image. I am not sure how to do such a thing.
If you have lots of slices that you want to view as a density map from one side, you can average over all the cells along a given axis and then view that as an image:
import numpy as np
import matplotlib.pyplot as plt
def plot_projections(d):
    # Project onto each plane by averaging along the corresponding axis
    plt.subplot(1, 3, 1)
    plt.imshow(d.mean(axis=0), cmap='rainbow')
    plt.subplot(1, 3, 2)
    plt.imshow(d.mean(axis=1), cmap='rainbow')
    plt.subplot(1, 3, 3)
    plt.imshow(d.mean(axis=2), cmap='rainbow')
    plt.show()
# random 10x10x10 array
d = np.random.randint(0, 10, size=(10, 10, 10))
plot_projections(d)
# pack the matrix with 10s along one plane
for i in range(len(d)):
    d[2][i] = np.array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
plot_projections(d)
Plotting residuals of masked values with `statsmodels`
I'm using statsmodels.api to compute the statistical parameters for an OLS fit between two variables:
import numpy as np
import statsmodels.api as sm
from scipy import stats
def computeStats(x, y, yName):
    '''
    Takes as arguments an array and a string for the array name.
    Uses Ordinary Least Squares to compute the statistical parameters for
    the array against log(z), and determines the equation for the line of
    best fit. Returns the results summary, residuals, statistical
    parameters in a list, and the best-fit equation.
    '''
    # Mask NaN values in both axes
    mask = ~np.isnan(y) & ~np.isnan(x)
    # Compute model parameters
    model = sm.OLS(y, sm.add_constant(x), missing='drop')
    results = model.fit()
    residuals = results.resid
    # Compute fit parameters
    params = stats.linregress(x[mask], y[mask])
    fit = params[0] * x + params[1]
    fitEquation = '$(%s)=(%.4g \pm %.4g) \\times redshift+%.4g$' % (
        yName,
        params[0],  # slope
        params[4],  # stderr in slope
        params[1])  # y-intercept
    return results, residuals, params, fit, fitEquation
The second part of the function (using stats.linregress) plays nicely with the masked values, but statsmodels does not. When I try to plot the residuals against the x values with plt.scatter(x, resids), the dimensions do not match:
ValueError: x and y must be the same size
because there are 29007 x-values and 11763 residuals (that's how many y-values made it through the masking process). I tried changing the model variable to
model = sm.OLS(y[mask], sm.add_constant(x[mask]), missing='drop')
but this had no effect. How can I scatter-plot the residuals against the x-values they match with?
Hi @jim421616, since statsmodels drops the missing values, you should use the model's exog variable to plot the scatter, as shown:
plt.scatter(model.model.exog[:, 1], model.resid)
For reference, a complete dummy example:
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
# Generate data
x = np.random.rand(1000)
y = np.sin(x * 25) + 0.1 * np.random.rand(1000)
# Make some values NaN
y[np.random.choice(np.arange(1000), size=100)] = np.nan
x[np.random.choice(np.arange(1000), size=80)] = np.nan
# Fit the model (statsmodels drops the NaN rows)
model = sm.OLS(y, sm.add_constant(x), missing='drop').fit()
print(model.summary())
# Plot residuals against the x values that were kept
plt.scatter(model.model.exog[:, 1], model.resid)
plt.show()
sklearn - label points of PCA
I am generating a PCA using scikit-learn, numpy and matplotlib. I want to know how to label each point (row in my data). I found annotate in matplotlib, but this seems to be for labeling specific coordinates, or just putting text on arbitrary points in the order they appear. I'm trying to abstract away from this but struggling due to the PCA sections that appear before the matplotlib stuff. Is there a way I can do this with sklearn, while I'm still generating the plot, so I don't lose its connection to the row I got it from? Here's my code:
from sklearn import decomposition
import numpy as np
import matplotlib.pyplot as plt
# x: feature matrix, y: binary target (defined elsewhere)
# Create a randomized PCA model that takes two components
randomized_pca = decomposition.RandomizedPCA(n_components=2)
# Fit and transform the data to the model
reduced_data_rpca = randomized_pca.fit_transform(x)
# Create a regular PCA model
pca = decomposition.PCA(n_components=2)
# Fit and transform the data to the model
reduced_data_pca = pca.fit_transform(x)
# Inspect the shape
reduced_data_pca.shape
# Print out the data
print(reduced_data_rpca)
print(reduced_data_pca)
def rand_jitter(arr):
    stdev = .01 * (max(arr) - min(arr))
    return arr + np.random.randn(len(arr)) * stdev
colors = ['red', 'blue']
for i in range(len(colors)):
    w = reduced_data_pca[:, 0][y == i]
    z = reduced_data_pca[:, 1][y == i]
    plt.scatter(w, z, c=colors[i])
targ_names = ["Negative", "Positive"]
plt.legend(targ_names, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.title("PCA Scatter Plot")
plt.show()
PCA is a projection, not a clustering method (you tagged this as clustering). There is no concept of a label in PCA. You can draw text labels onto a scatter plot, but it usually becomes too crowded. You can already find answers to this on Stack Overflow.
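For instance, a minimal sketch of the text-on-scatter-plot idea (assuming a hypothetical list labels with one string per row of the data, and reduced_data_pca as in the question):
# Annotate each projected point with the label of the row it came from
for i, label in enumerate(labels):
    plt.annotate(label, (reduced_data_pca[i, 0], reduced_data_pca[i, 1]),
                 fontsize=8)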