How to create a variable number of dimensions for mgrid - python-3.x

I would like to create a meshgrid with a variable number of dimensions, specified by a variable (e.g. dim=2), rather than manually editing the expression as in the example below, which builds a 2D mesh grid.
How would I implement a wrapper function for this?
The problem stems from my unfamiliarity with the syntax that mgrid uses (index_tricks).
import numpy as np
mgrid = np.mgrid[
    -5:5:5j,
    -5:5:5j,
]
I have looked at the documentation for mgrid, but there seems to be no information on setting the number of dimensions with a variable.

You can create a tuple containing the slice manually, and repeat it some number of times:
import numpy as np
num_dims = 2
mgrid = np.mgrid[(slice(-5, 5, 5j),) * num_dims]
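A minimal wrapper along those lines (the function name and defaults here are illustrative assumptions, not from the original answer):
import numpy as np

def make_mgrid(num_dims, start=-5, stop=5, num=5):
    # A complex step (num * 1j) tells mgrid to treat it as a point
    # count rather than a step size, matching -5:5:5j above.
    return np.mgrid[(slice(start, stop, num * 1j),) * num_dims]

grid = make_mgrid(2)
print(grid.shape)  # (2, 5, 5)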

Python 3 - Scipy and KDEpy

I am using python-3.x and I want to evaluate an estimated pdf on a provided set of points using KDEpy, but I can't get it right.
I used scipy.stats.gaussian_kde and it works very well when I apply the pdf method, since evaluating the estimated pdf on a provided set of points is exactly what I am interested in.
So the question is: how do I get the same result as scipy.stats.kde when using KDEpy's FFTKDE?
Here is a small example that describes what I am looking for:
from scipy.stats.kde import gaussian_kde
import numpy as np

data = np.array([[-1.84134663, -1.42036525, -1.38819347],
                 [-2.58165693, -2.49423057, -1.57609454],
                 [-0.78776371, -0.79168188,  0.21967791],
                 [-1.0165618 , -1.78509185, -0.68373997],
                 [-1.21764947, -0.43215885, -0.34393573]])
my_pdf = gaussian_kde(data.T, bw_method=None)
print(my_pdf.pdf(data.T))  # evaluate the estimated pdf on a provided set of points
The result is:
[0.24234078 0.22071922 0.23802877 0.22474656 0.25402297]
How can I get the same result using KDEpy's FFTKDE?
from KDEpy import FFTKDE
my_pdf2 = FFTKDE(kernel="gaussian").fit(data.T).evaluate()
but I don't know how to evaluate the estimated pdf on a provided set of points, the way scipy.stats.kde does with its pdf method.
You can create an equidistant grid using e.g. numpy.linspace and pass it to .evaluate():
from KDEpy import FFTKDE
import numpy as np
x_grid = np.linspace(-10, 10, num=2**10)
my_pdf = FFTKDE(kernel="gaussian").fit(data.T).evaluate(x_grid)
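If you need the density at arbitrary points rather than on the equidistant grid, one option is to interpolate the grid densities back onto those points. A minimal sketch for 1-D data (the sample values here are hypothetical, not from the question):
from KDEpy import FFTKDE
import numpy as np

samples = np.random.randn(100)  # hypothetical 1-D data
# With an integer argument, evaluate() returns both the automatic
# equidistant grid and the density values on it.
x_grid, y_grid = FFTKDE(kernel="gaussian").fit(samples).evaluate(2**10)
# Linearly interpolate the density at the original sample points.
pdf_at_samples = np.interp(samples, x_grid, y_grid)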

Is index function in to_csv I have used correctly according to problem statement?

Problem Statement:
1) Create another Series named heights_B from a 1-D numpy array of 5 elements derived from the normal distribution of mean 170.0 and standard deviation 25.0.
Note: Set random seed to 100 before creating the heights_B series. Use numpy.
2) Create another Series named weights_B from a 1-D numpy array of 5 elements derived from the normal distribution of mean 75.0 and standard deviation 12.0.
Note: Set random seed to 100 again before creating the weights_B series. Use numpy.
3) Label both Series elements with s1, s2, s3, s4 and s5.
4) Create a dataframe df_B containing the height and weight of students s1, s2, s3, s4 and s5 belonging to class B.
5) Label the columns as Student_height and Student_weight respectively.
6) Write the contents of df_B without the index to a CSV file named classB.csv.
Note: Use the index argument of the to_csv method.
Solution :
import pandas as pd
import numpy as np
import random
height_A=np.array([176.2,158.4,167.6,156.2,161.4])
s1=pd.Series(height_A,index=['s1','s2','s3','s4','s5'])
weight_A=np.array([85.1,90.2,76.8,80.4,78.9])
s2=pd.Series(weight_A,index=['s1','s2','s3','s4','s5'])
df={'Student_height':s1,'Student_weight':s2}
hdf=pd.DataFrame(df)
random.seed(100)
x=np.random.normal(loc=170.0,scale=25.0,size=5)
s3=pd.Series(x,index=['s1','s2','s3','s4','s5'])
random.seed(100)
y=np.random.normal(loc=75.0,scale=12.0,size=5)
s4=pd.Series(y,index=['s1','s2','s3','s4','s5'])
df1=df={'Student_height':s3,'Student_weight':s4}
hdf1=pd.DataFrame(df1)
hdf1.to_csv('classB.csv',index=False)
I have written the code according to the problem statement, but the online compiler is not accepting my solution. Please tell me if I have made any mistake.
Add one more line to your code:
m = pd.read_csv("classB.csv"); print(m)
Use this code:
import os
import numpy as np
import pandas as pd

# Key fix: seed NumPy's generator with np.random.seed, not random.seed.
np.random.seed(100)
x = np.random.normal(loc=170.0, scale=25.0, size=5)
heights_B = pd.Series(x, index=['s1', 's2', 's3', 's4', 's5'])
np.random.seed(100)
y = np.random.normal(loc=75.0, scale=12.0, size=5)
weights_B = pd.Series(y, index=['s1', 's2', 's3', 's4', 's5'])
df_B = pd.DataFrame({'Student_height': heights_B, 'Student_weight': weights_B}, index=weights_B.index)
df_B.to_csv("classB.csv", index=False)
os.system("cat classB.csv")
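For context on why the original attempt failed: random.seed seeds Python's built-in random module, which is a separate generator from the one numpy.random uses. A minimal illustration:
import random
import numpy as np

random.seed(100)
a = np.random.normal(size=3)  # NOT reproducible: numpy's RNG was never seeded
np.random.seed(100)
b = np.random.normal(size=3)  # reproducible: same values on every run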

Reprojecting Xarray Dataset

I'm trying to reproject a Lambert Conformal dataset to Plate Carree. I know that this can easily be done visually using cartopy. However, I'm trying to create a new dataset rather than just show a reprojected image. Below is the methodology I have mapped out, but I'm unable to subset the dataset properly (Python 3.5, macOS).
from siphon.catalog import TDSCatalog
import xarray as xr
from xarray.backends import NetCDF4DataStore
import numpy as np
import cartopy.crs as ccrs
from scipy.interpolate import griddata
import numpy.ma as ma
from pyproj import Proj, transform
import metpy
# Declare bounding box
min_lon = -78
min_lat = 36
max_lat = 40
max_lon = -72
boundinglat = [min_lat, max_lat]
boundinglon = [min_lon, max_lon]
# Load the dataset
cat = TDSCatalog('https://thredds.ucar.edu/thredds/catalog/grib/NCEP/HRRR/CONUS_2p5km/latest.xml')
dataset_name = sorted(cat.datasets.keys())[-1]
dataset = cat.datasets[dataset_name]
ds = dataset.remote_access(service='OPENDAP')
ds = NetCDF4DataStore(ds)
ds = xr.open_dataset(ds)
# parse the temperature at 850 hPa and the 0z reftime
tempiso = ds.metpy.parse_cf('Temperature_isobaric')
t850 = tempiso[0][2]
# transform bounding lat/lons to src_proj
src_proj = tempiso.metpy.cartopy_crs #aka lambert conformal conical
extents = src_proj.transform_points(ccrs.PlateCarree(), np.array(boundinglon), np.array(boundinglat))
# subset the data using the indexes of the closest values to the src_proj extents
t850_subset = t850[
    (np.abs(tempiso.y.values - extents[1][0])).argmin():(np.abs(tempiso.y.values - extents[1][1])).argmin()
][
    (np.abs(tempiso.x.values - extents[0][1])).argmin():(np.abs(tempiso.x.values - extents[0][0])).argmin()
]
# t850_subset should be a small, reshaped dataset, but its shape is 0x2145
# now use np.linspace, np.meshgrid & scipy.interpolate to reproject
My transform-points / find-nearest-value subsetting isn't working: it claims the closest points are outside the extent of the dataset. As noted, I plan to use np.linspace, np.meshgrid and scipy.interpolate to create a new, square lat/lon dataset from t850_subset.
Is there an easier way to resize & reproject an xarray dataset?
Your easiest path forward is to take advantage of xarray's ability to do pandas-like data selection; this is IMO the best part of xarray. Replace your last two lines with:
# By transposing the result of transform_points, we can unpack the
# x and y coordinates into individual arrays.
x_lim, y_lim, _ = src_proj.transform_points(ccrs.PlateCarree(),
                                            np.array(boundinglon),
                                            np.array(boundinglat)).T
t850_subset = t850.sel(x=slice(*x_lim), y=slice(*y_lim))
You can find more information in the documentation on xarray's selection and indexing functionality. You would probably also be interested in xarray's built-in support for interpolation. And if interpolation methods beyond SciPy's are of interest, MetPy also has a suite of other interpolation methods.
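For the interpolation step mentioned above, a minimal sketch using xarray's interp (the target grid size of 100 is an arbitrary assumption; the result stays in the source projection's x/y coordinates):
import numpy as np

# Evenly spaced target coordinates spanning the subset.
x_new = np.linspace(float(t850_subset.x.min()), float(t850_subset.x.max()), 100)
y_new = np.linspace(float(t850_subset.y.min()), float(t850_subset.y.max()), 100)
t850_regular = t850_subset.interp(x=x_new, y=y_new)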
We have various "regridding" methods in Iris, if that isn't too much of a context switch for you.
Xarray explains its relationship to Iris here, and provides a to_iris() method.

Create named list from matrix using rpy2

I have a 2D numpy array which I converted to an R matrix, and now I need to convert it further to a named list:
import rpy2.robjects as robjects
import rpy2.robjects.numpy2ri

rpy2.robjects.numpy2ri.activate()
nr, nc = counts.shape  # counts is the 2D numpy array
r_mtx = robjects.r.matrix(counts, nrow=nr, ncol=nc)
So I got the matrix r_mtx, but I am not sure how to make a named list out of it, similar to how we do it in R:
named_list <- list(counts=mtx)
I need it to feed into SingleCellExperiment object to do dataset normalization:
https://bioconductor.org/packages/devel/bioc/vignettes/scran/inst/doc/scran.html
I tried using rpy2.rlike.container, both TaggedList and OrdDict, but couldn't figure out how to apply them to my case.
Ultimately I solved it by skipping the conversion of the numpy array to an R matrix and making the named list directly from the numpy array:
named_list = robjects.r.list(counts=counts)
where counts is a 2D numpy array.
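A quick way to check the result (a hedged sketch; assumes the numpy2ri conversion is still active):
# The list should have a single element named "counts".
print(list(named_list.names))    # ['counts']
print(named_list.rx2("counts"))  # the array stored under that name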

What to pass to clf.predict()?

I started playing with Decision Trees lately and I wanted to train my own simple model with some manufactured data. I wanted to use this model to predict some further mock data, just to get a feel of how it works, but then I got stuck. Once your model is trained, how do you pass data to predict()?
http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
Docs state:
clf.predict(X)
Parameters:
X : array-like or sparse matrix of shape = [n_samples, n_features]
But when trying to pass an np.array, np.ndarray, list, tuple or DataFrame, it just throws an error. Can you help me understand why, please?
Code below:
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
import graphviz
import pandas as pd
import numpy as np
import random
from sklearn import tree
pd.options.display.max_seq_items=5000
pd.options.display.max_rows=20
pd.options.display.max_columns=150
length = 50000
miles_commuting = [random.choice([2,3,4,5,7,10,20,25,30]) for x in range(length)]
salary = [random.choice([1300,1600,1800,1900,2300,2500,2700,3300,4000]) for x in range(length)]
full_time = [random.choice([1,0,1,1,0,1]) for x in range(length)]
DataFrame = pd.DataFrame({'CommuteInMiles':miles_commuting,'Salary':salary,'FullTimeEmployee':full_time})
DataFrame['Moving'] = np.where((DataFrame.CommuteInMiles > 20) & (DataFrame.Salary > 2000) & (DataFrame.FullTimeEmployee == 1),1,0)
DataFrame['TargetLabel'] = np.where((DataFrame.Moving == 1),'Considering move','Not moving')
target = DataFrame.loc[:,'Moving']
data = DataFrame.loc[:,['CommuteInMiles','Salary','FullTimeEmployee']]
target_names = DataFrame.TargetLabel
features = data.columns.values
clf = tree.DecisionTreeClassifier()
clf = clf.fit(data, target)
clf.predict(?????) #### <===== What should go here?
clf.predict([30,4000,1])
ValueError: Expected 2D array, got 1D array instead:
array=[3.e+01 4.e+03 1.e+00].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
clf.predict(np.array(30,4000,1))
ValueError: only 2 non-keyword arguments accepted
Where is your "mock data" that you want to predict?
Your data should be of the same shape that you used when calling fit(). From the code above, I see that your X has three columns ['CommuteInMiles','Salary','FullTimeEmployee']. You need to have those many columns in your prediction data, number of rows can be arbitrary.
Now when you do
clf.predict([30,4000,1])
The model is not able to tell whether these are the columns of a single row or values from different rows.
So you need to convert the input into a 2-D array, where each inner array represents a single row.
Do this:
clf.predict([[30,4000,1]]) #<== Observe the two square brackets
You can have multiple rows to be predicted, each in inner list. Something like this:
X_test = [[30,4000,1],
          [35,15000,0],
          [40,2000,1]]
clf.predict(X_test)
Now, as for your last error, clf.predict(np.array(30,4000,1)): this has nothing to do with predict(). You are using np.array() incorrectly.
According to the documentation, the signature of np.array is:
(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
All parameters after the first (object) are meant to be used as keyword arguments. But when you do np.array(30,4000,1), each value is treated as input to a separate parameter: object=30, dtype=4000, copy=1. This is not allowed, hence the error. If you want to make a numpy array from a list, you need to pass a list:
np.array([30,4000,1])
Now this will be correctly taken as the object parameter.
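Putting both fixes together (a sketch assuming the clf and feature order defined above):
import numpy as np

# One sample, three features, as a 2-D array of shape (1, 3).
sample = np.array([[30, 4000, 1]])
print(clf.predict(sample))  # expected: [1], given the rule used to label the mock data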
