Reprojecting Xarray Dataset - python-3.x

I'm trying to reproject a Lambert Conformal dataset to Plate Carree. I know that this can easily be done visually using cartopy. However, I'm trying to create a new dataset rather than just show a reprojected image. Below is the methodology I have mapped out, but I'm unable to subset the dataset properly (Python 3.5, macOS).
from siphon.catalog import TDSCatalog
import xarray as xr
from xarray.backends import NetCDF4DataStore
import numpy as np
import cartopy.crs as ccrs
from scipy.interpolate import griddata
import numpy.ma as ma
from pyproj import Proj, transform
import metpy
# Declare bounding box
min_lon = -78
min_lat = 36
max_lat = 40
max_lon = -72
boundinglat = [min_lat, max_lat]
boundinglon = [min_lon, max_lon]
# Load the dataset
cat = TDSCatalog('https://thredds.ucar.edu/thredds/catalog/grib/NCEP/HRRR/CONUS_2p5km/latest.xml')
dataset_name = sorted(cat.datasets.keys())[-1]
dataset = cat.datasets[dataset_name]
ds = dataset.remote_access(service='OPENDAP')
ds = NetCDF4DataStore(ds)
ds = xr.open_dataset(ds)
# parse the temperature, then take the 850 hPa level at the 0z reftime
tempiso = ds.metpy.parse_cf('Temperature_isobaric')
t850 = tempiso[0][2]
# transform bounding lat/lons to src_proj
src_proj = tempiso.metpy.cartopy_crs #aka lambert conformal conical
extents = src_proj.transform_points(ccrs.PlateCarree(), np.array(boundinglon), np.array(boundinglat))
# subset the data using the indexes of the closest values to the src_proj extents
t850_subset = t850[(np.abs(tempiso.y.values - extents[1][0])).argmin():(np.abs(tempiso.y.values - extents[1][1])).argmin()][(np.abs(tempiso.x.values - extents[0][1])).argmin():(np.abs(tempiso.x.values - extents[0][0])).argmin()]
# t850_subset should be a small, reshaped dataset, but its shape is 0x2145
# now use np.linspace, np.meshgrid & scipy.interpolate to reproject
My transform-points-then-nearest-value subsetting isn't working: it claims the closest points are outside the extent of the dataset. As noted, I plan to use np.linspace, np.meshgrid and scipy.interpolate to create a new, square lat/lon dataset from t850_subset.
Is there an easier way to resize & reproject an xarray dataset?

Your easiest path forward is to take advantage of xarray's ability to do pandas-like data selection; this is IMO the best part of xarray. Replace your last two lines with:
# By transposing the result of transform_points, we can unpack the
# x and y coordinates into individual arrays.
x_lim, y_lim, _ = src_proj.transform_points(ccrs.PlateCarree(),
                                            np.array(boundinglon), np.array(boundinglat)).T
t850_subset = t850.sel(x=slice(*x_lim), y=slice(*y_lim))
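The empty result in the question is most likely caused by chained indexing rather than by the projection math: t850[a:b][c:d] applies both slices to the first (y) axis, so the x indices end up slicing the already-reduced rows. A small numpy illustration, where the array size and index values are made up for the demo (only the 2145 width is taken from the question's 0x2145 result):

```python
import numpy as np

# Stand-in for t850 with dims (y, x); the y size is an arbitrary assumption,
# the x size 2145 matches the width in the question's 0x2145 result.
t850 = np.zeros((1000, 2145))

# Hypothetical row/column indices, as the argmin searches might return them
y0, y1 = 400, 600
x0, x1 = 1500, 1800

# Chained indexing: the second slice is applied to axis 0 again, and the
# 200 remaining rows have no index 1500, so the result is empty.
chained = t850[y0:y1][x0:x1]

# Indexing both axes in one subscript slices y and x separately.
combined = t850[y0:y1, x0:x1]

print(chained.shape)   # (0, 2145)
print(combined.shape)  # (200, 300)
```

With a DataArray, the positional equivalent is t850[y0:y1, x0:x1] or t850.isel(y=slice(y0, y1), x=slice(x0, x1)); the label-based .sel call above avoids the index search altogether.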
You can find more information in the documentation on xarray's selection and indexing functionality. You would probably also be interested in xarray's built-in support for interpolation. And if interpolation methods beyond SciPy's are of interest, MetPy also has a suite of other interpolation methods.
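For the np.linspace / np.meshgrid / scipy.interpolate route mentioned in the question, a minimal sketch could look like the following. It uses a synthetic, slightly skewed grid as a stand-in for the 2-D longitudes/latitudes of t850_subset (an assumption; the real ones would come from transforming the subset's x/y with ccrs.PlateCarree().transform_points(src_proj, X, Y)). Because the synthetic field is linear, linear interpolation reproduces it exactly, which makes the result easy to check.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical curvilinear source grid standing in for the subset's lon/lat
ny, nx = 40, 50
jj, ii = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
src_lon = -78 + 0.12 * ii + 0.01 * jj   # slightly skewed, like a projected grid
src_lat = 36 + 0.1 * jj - 0.005 * ii
values = src_lat + 0.5 * src_lon        # synthetic linear field

# Regular (square) lat/lon target grid built with np.linspace + np.meshgrid
tgt_lon, tgt_lat = np.meshgrid(np.linspace(-77, -73, 41),
                               np.linspace(37, 39, 21))

# Scattered-data interpolation from the source points onto the target grid
regridded = griddata(
    (src_lon.ravel(), src_lat.ravel()),
    values.ravel(),
    (tgt_lon, tgt_lat),
    method="linear",
)
print(regridded.shape)  # (21, 41)
```

Target points that fall outside the convex hull of the source points come back as NaN with method="linear", so the target box should sit inside the subset.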

We have various "regridding" methods in Iris, if that isn't too much of a context switch for you.
The xarray documentation explains its relationship to Iris, and xarray provides a to_iris() method.


How to read specific keypoints in COCOEval

I need to calculate the mean average precision (mAP) for specific keypoints only (not for all keypoints, as is done by default).
Here's my code:
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
# https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb
cocoGt = COCO('annotations/person_keypoints_val2017.json') # initialize COCO ground truth api
cocoDt = cocoGt.loadRes('detections/results.json') # initialize COCO pred api
cat_ids = cocoGt.getCatIds(catNms=['person'])
imgIds = cocoGt.getImgIds(catIds=cat_ids)
cocoEval = COCOeval(cocoGt, cocoDt, 'keypoints')
cocoEval.params.imgIds = imgIds
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()
print(cocoEval.stats[0])
This code prints the mAP over all keypoints ['nose', ..., 'right_ankle'], but I need it only for a few specific keypoints, such as ['nose', 'left_hip', 'right_hip'].
I recently solved this by evaluating only 13 keypoints, leaving out the eyes and ears, as my application required.
Open cocoeval.py in the pycocotools package and go to the computeOks method. There you will find two sets of keypoints, the ground-truth keypoints and the detection keypoints, each converted to a NumPy array.
Slice these 51-element lists (17 keypoints × 3 values: x, y, visibility) so that only the keypoints you care about remain.
For example, if you only want the mAP for the nose, the slicing would be:
g = np.array(gt['keypoints'][0:3])
Do the same for the detection keypoints (dt['keypoints']).
Also, set the sigma values of the unwanted keypoints to 0.
You are all set!
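Before patching pycocotools, it can help to see what restricting the keypoints means for a single ground-truth/detection pair. The sketch below is my own standalone re-implementation of the per-pair OKS formula used in computeOks (not a pycocotools function): it keeps only the chosen keypoint indices and slices the sigmas to the same indices, since the sigma array has to line up with whatever keypoints remain. The sigma values are COCO's defaults in the standard 17-keypoint order (nose = 0, left_hip = 11, right_hip = 12).

```python
import numpy as np

# COCO's default per-keypoint sigmas, in the order nose, left_eye, right_eye,
# left_ear, right_ear, shoulders, elbows, wrists, hips, knees, ankles (L/R pairs).
COCO_SIGMAS = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72,
                        .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0


def oks_subset(gt_kpts, dt_kpts, area, keep, sigmas=COCO_SIGMAS):
    """OKS between one ground truth and one detection, using only keypoints in keep."""
    g = np.asarray(gt_kpts, dtype=float).reshape(-1, 3)[keep]  # rows of (x, y, v)
    d = np.asarray(dt_kpts, dtype=float).reshape(-1, 3)[keep]
    k2 = (2 * sigmas[keep]) ** 2  # corresponds to vars = (sigmas * 2)**2
    dx = d[:, 0] - g[:, 0]
    dy = d[:, 1] - g[:, 1]
    e = (dx ** 2 + dy ** 2) / k2 / (area + np.spacing(1)) / 2
    visible = g[:, 2] > 0  # only labeled ground-truth keypoints count
    if not visible.any():
        # pycocotools falls back to a box-based distance here; this sketch doesn't
        return float("nan")
    return float(np.sum(np.exp(-e[visible])) / visible.sum())


# Usage: nose, left_hip and right_hip only
keep = [0, 11, 12]
```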

Python 3 - Scipy and KDEpy

I am using Python 3 and want to evaluate the estimated PDF at a given set of points using KDEpy, but I couldn't get it right.
With scipy.stats.gaussian_kde this works well: its pdf method evaluates the estimated PDF at exactly the points I pass in.
So the question is: how do I get the same result as scipy's gaussian_kde using KDEpy's FFTKDE?
Here is a small example that describes what I am looking for:
import numpy as np
from scipy.stats import gaussian_kde
data = np.array([[-1.84134663, -1.42036525, -1.38819347],
[-2.58165693, -2.49423057, -1.57609454],
[-0.78776371, -0.79168188, 0.21967791],
[-1.0165618 , -1.78509185, -0.68373997],
[-1.21764947, -0.43215885, -0.34393573]])
my_pdf = gaussian_kde(data.T, bw_method=None)
print(my_pdf.pdf(data.T))  # evaluate the estimated pdf at the provided set of points
The result is:
[0.24234078 0.22071922 0.23802877 0.22474656 0.25402297]
How can I get the same result using KDEpy's FFTKDE?
from KDEpy import FFTKDE
my_pdf2 = FFTKDE(kernel="gaussian").fit(data.T).evaluate()
but I don't know how to evaluate the estimated PDF at a provided set of points, the way scipy's gaussian_kde.pdf does.
You can create an equidistant grid using e.g. numpy.linspace and pass it to .evaluate():
from KDEpy import FFTKDE
import numpy as np
x_grid = np.linspace(-10, 10, num=2**10)
my_pdf = FFTKDE(kernel="gaussian").fit(data.T).evaluate(x_grid)
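FFTKDE evaluates only on an equidistant grid, so one way to mimic gaussian_kde.pdf at arbitrary points is to evaluate on a fine grid and interpolate with np.interp. The sketch below is one-dimensional (it uses the first column of data) and, so that it runs without KDEpy, uses scipy's gaussian_kde as a stand-in for the grid values FFTKDE would return. It also extracts the kernel standard deviation scipy chose with Scott's rule: the two libraries generally won't agree unless the bandwidths match, and passing that value as FFTKDE's bw is my assumption about how to line them up, worth checking against the KDEpy documentation.

```python
import numpy as np
from scipy.stats import gaussian_kde

# First column of the question's data, as a 1-D sample
data_1d = np.array([-1.84134663, -2.58165693, -0.78776371,
                    -1.0165618, -1.21764947])

kde = gaussian_kde(data_1d)  # Scott's rule bandwidth by default
# Kernel standard deviation scipy actually uses (covariance is a 1x1 matrix here)
kernel_std = float(np.sqrt(kde.covariance[0, 0]))

x_grid = np.linspace(-10, 10, num=2**10)
# Stand-in for: FFTKDE(kernel="gaussian", bw=kernel_std).fit(data_1d).evaluate(x_grid)
density_on_grid = kde(x_grid)

# Evaluate at arbitrary points by linear interpolation on the equidistant grid
points = data_1d
approx = np.interp(points, x_grid, density_on_grid)
exact = kde(points)
print(np.max(np.abs(approx - exact)))  # small, and shrinks as the grid gets finer
```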

Is there a way using librosa's waveplot to store the coordinates of the graph rather than show the image of the waveplot?

I am working on an audio project using librosa, starting from the following example code found online. Rather than opening an image with a graph of amplitude versus time, I want to store the coordinates that make up the graph in an array. I have tried many different examples from Stack Overflow and other websites with no luck. I am relatively new to Python and this is my first question on Stack Overflow, so please be kind.
import librosa.display
import matplotlib.pyplot as plt
from IPython.display import display, Audio
filename = 'queen2.mp3'
samples, sampleRate = librosa.load(filename)
display(Audio(filename))
plt.figure(figsize=(12, 4))
librosa.display.waveplot(samples, sr=sampleRate, max_points=200)  # plot the loaded samples
plt.show()
librosa is open source (under the ISC license), so you can look at the code to see how it does this. The documentation for each function has a handy [source] link that takes you to the code. In librosa.display.waveplot you will see that it calls a function __envelope() to compute the envelope. Presumably these are the coordinates you are after.
import numpy as np
from librosa import util

def __envelope(x, hop):
    '''Compute the max-envelope of non-overlapping frames of x at length hop
    x is assumed to be multi-channel, of shape (n_channels, n_samples).
    '''
    x_frame = np.abs(util.frame(x, frame_length=hop, hop_length=hop))
    return x_frame.max(axis=1)

# inside waveplot, y is the signal array being plotted
hop_length = 1
y = __envelope(y, hop_length)
y_top = y[0]
y_bottom = -y[-1]
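If you'd rather not depend on a private helper, here is a numpy-only sketch of the same idea for a mono signal (librosa.load returns mono by default): take the maximum absolute value over non-overlapping frames. The function name max_envelope is hypothetical, and the way the hop is derived from max_points is my own assumption, not librosa's formula.

```python
import numpy as np


def max_envelope(samples, sr, max_points=200):
    """Return (times, y_top, y_bottom) for a max-abs envelope of a mono signal."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim != 1:
        raise ValueError("expected a mono (1-D) signal")
    n = samples.shape[0]
    # Assumed hop: enough samples per frame to end up with about max_points frames
    hop = max(1, int(np.ceil(n / max_points)))
    n_frames = int(np.ceil(n / hop))
    # Zero-pad so the signal splits evenly into frames of length hop
    padded = np.zeros(n_frames * hop)
    padded[:n] = np.abs(samples)
    env = padded.reshape(n_frames, hop).max(axis=1)
    times = np.arange(n_frames) * hop / sr  # start time of each frame in seconds
    return times, env, -env
```

Usage with the question's variables would be times, y_top, y_bottom = max_envelope(samples, sampleRate); plt.fill_between(times, y_bottom, y_top) redraws a similar plot from the stored arrays.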

Using weighted adjacency matrices to calculate global efficiency of said matrix using networkx

I have been trying to study the impact on a network by looking at deletions of different combinations of nodes.
To study this I have used the networkx graph-theory metric global efficiency. However, I noticed that networkx ignores edge weights when calculating global efficiency, so I modified the source code to take weights into account. It seems to work and gives different values from the unweighted approach, but it is exceptionally slow (about 20 times slower).
How can I speed up these computations?
##The code I am running
import networkx
import numpy as np
from networkx import algorithms
from networkx.algorithms import efficiency
from networkx.algorithms.efficiency import global_efficiency
import pandas
data=pandas.read_csv("ones.csv")
lol = data.values.tolist()
data=pandas.read_csv("twos.csv")
lol2 = data.values.tolist()
combo=[["10pp", "10d"]]
GE_list=[]
for row in combo:
    values = row
    datasafe=pandas.read_csv("b1.csv", index_col=0)
    datasafe.loc[values, :] = 0
    datasafe[values] = 0
    g=networkx.from_pandas_adjacency(datasafe)
    ge=global_efficiency(g)
    GE_list.append(ge)
extra=[""]
extra2=["full"]
combo.append(extra)
combo.append(extra2)
datasafe=pandas.read_csv("b1.csv", index_col=0)
g=networkx.from_pandas_adjacency(datasafe)
ge=global_efficiency(g)
GE_list.append(ge)
values = ["s6-8","p9-46v","p47r","p10p","IFSp","IFSa",'IFJp','IFJa','i6-8','a9-46v','a47r','a10p','9p','9a','9-46d','8C','8BL','8AV','8AD','47s','47L','10pp','10d','46','45','44']
datasafe=pandas.read_csv("b1.csv", index_col=0)
datasafe.loc[values, :] = 0
datasafe[values] = 0
g=networkx.from_pandas_adjacency(datasafe)
ge=global_efficiency(g)
GE_list.append(ge)
output=pandas.DataFrame(list(zip(combo, GE_list)))
output.to_csv('delete 1.csv',index=None)
##The change I made to the original networkx code
try:
    eff = 1 / nx.shortest_path_length(G, u, v)
## changed to
try:
    eff = 1 / nx.shortest_path_length(G, u, v, weight='weight')
Previously, with my unweighted graphs, I was able to process my data in 2 hours; now the same time only gets through a twentieth of the data. Please do suggest any improvements to my code, or any other code that I can run.
P.S. I don't have a great understanding of Python, so please bear with me :)
With weights, breadth-first search is replaced by Dijkstra's algorithm, which increases the runtime by a factor of roughly log|V|; see the second comment on https://stackoverflow.com/a/25449911
If runtime is a problem, you should rather replace networkx, which is implemented in Python, with a C/C++ implementation such as graph-tool or igraph; see e.g. https://graph-tool.skewed.de/performance for a (probably biased) performance comparison.
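Part of the slowdown may also come from calling shortest_path_length once per node pair, which runs a separate Dijkstra search for every pair. The sketch below instead runs one Dijkstra search per source node via nx.all_pairs_dijkstra_path_length; weighted_global_efficiency is a hypothetical helper of my own, not a networkx function. Note that Dijkstra treats the weight as a distance, so if your adjacency values are connection strengths you may need to convert them (for example to 1/weight) first.

```python
import networkx as nx


def weighted_global_efficiency(G, weight="weight"):
    """Mean of 1/d(u, v) over ordered node pairs, with d a weighted shortest-path length."""
    n = len(G)
    denom = n * (n - 1)
    if denom == 0:
        return 0.0
    total = 0.0
    # One Dijkstra run per source node instead of one per (u, v) pair;
    # unreachable targets are simply absent and contribute 0.
    for source, lengths in nx.all_pairs_dijkstra_path_length(G, weight=weight):
        for target, dist in lengths.items():
            if target != source and dist > 0:
                total += 1.0 / dist
    return total / denom
```

Edges without a weight attribute count as weight 1, so on an unweighted graph this matches nx.global_efficiency.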

Robust statistics linear regression in seaborn pairplot

I'm trying to use robust regression instead of ordinary least squares (OLS) fitting so that outliers aren't such a problem for my fits.
I was hoping to do this in seaborn's pairplot function, but I can't see an easy way to add it from the API documentation, as there doesn't seem to be a keyword argument for the fit.
The SciPy lectures suggest the following, but I guess that's for regplot, where you can define the fit using:
`fit = statsmodels.formula.api.rlm()`
Here is some sample code
import seaborn as sns; sns.set(style="ticks", color_codes=True)
import matplotlib.pyplot as plt
%matplotlib inline
iris = sns.load_dataset("iris")
sns.pairplot(iris, kind="reg")#, robust = True)
plt.show()
Thanks in advance!
Edit: I found a workaround, but apparently I lose the 'hue' option that pairplot offers. It would be a nice feature to add a robust option to pairplot.
Code:
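import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

def corrfunc(x, y, **kws):
    # annotate the current panel with the Pearson correlation coefficient
    r, _ = stats.pearsonr(x, y)
    ax = plt.gca()
    ax.annotate("r = {:.2f}".format(r), xy=(.1, .9), xycoords=ax.transAxes)

g = sns.PairGrid(df1, palette=["red"])
g.map_upper(sns.regplot, robust=True)  # robust regression (requires statsmodels)
g.map_diag(sns.distplot, kde=True)
g.map_lower(sns.kdeplot, cmap="Blues_d")
g.map_lower(corrfunc)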
Extra keywords, such as robust=True, can be passed to regplot via the plot_kws argument:
sns.pairplot(df1,kind='reg',hue='species',plot_kws=dict(robust=True,n_boot=50))
NB: In this example I have also decreased n_boot to reduce the runtime (see "robust" in the regplot documentation), so the confidence intervals might be less accurate.
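To see roughly what robust=True does: seaborn hands the fit to statsmodels' RLM, an M-estimator that down-weights large residuals (its default norm is Huber's T). Below is a numpy-only sketch of the same idea, Huber weights with iteratively reweighted least squares. It is illustrative only; the function name huber_irls is hypothetical, and its scale estimate and stopping rule are my own choices rather than statsmodels' exact implementation.

```python
import numpy as np


def huber_irls(x, y, c=1.345, n_iter=50, tol=1e-8):
    """Fit y = b0 + b1 * x with Huber weights via iteratively reweighted least squares."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the OLS fit
    for _ in range(n_iter):
        resid = y - X @ beta
        # Robust scale: median absolute deviation, rescaled for normal data
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0:
            break
        u = np.abs(resid / scale)
        # Huber weights: 1 for |u| <= c, c / |u| beyond (no division by zero)
        w = c / np.maximum(u, c)
        sw = np.sqrt(w)
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.allclose(new_beta, beta, rtol=0.0, atol=tol)
        beta = new_beta
        if converged:
            break
    return beta  # [intercept, slope]


# Usage: a straight line with one large outlier
x = np.arange(20, dtype=float)
y = 2 * x + 1
y[-1] += 100
ols_slope = np.polyfit(x, y, 1)[0]  # pulled upward by the outlier
robust_intercept, robust_slope = huber_irls(x, y)
```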
