I am working on creating cross sections of HRRR model output. I read in the GRIB files using xarray with pynio as the engine and then converted those files to netCDF so I can work with them on my Windows machine, so I am wondering whether that conversion is causing these issues.
Here is what my dataset looks like after reading in the netCDF with xarray (screenshot on Imgur):
After reading in the data I try to follow the MetPy cross-section and xarray tutorials by parsing the data:
data = ds.metpy.parse_cf()
Which yields this new dataset (screenshot on Imgur):
It created the crs coordinate so I assumed it worked somewhat correctly.
Following this, I created a contour map of 700 mb RH, winds, and elevation (a different dataset), where I parsed the RH from the data dataset and also pulled out the x and y coordinates:
RH = data.metpy.parse_cf('RH_P0_L100_GLC0')
x, y = RH.metpy.coordinates('x', 'y')
This all worked and I could produce a nice looking plot no problem. So next I wanted to make a cross section. Following the example in the documentation:
start = (40.3847, -120.5676)
end = (39.2692, -122.3784)
cross = cross_section(data, start, end)
which gave these errors (screenshot on Imgur):
So then I instead tried using the RH variable from above since
RH.metpy.x
gave the x-dimension. But running
cross = cross_section(RH, start, end)
gave this error instead (screenshot on Imgur):
So I'm just wondering if I missed a step in parsing the original dataset, if the GRIB-to-netCDF conversion messed something up, or if this is even possible using MetPy.
In general I am just working towards creating a cross section like the one in the example: https://unidata.github.io/MetPy/latest/examples/cross_section.html#sphx-glr-examples-cross-section-py
As a bonus question, would it be possible to fill terrain under the plots?
Currently, MetPy's cross-section interpolation relies on the x and y dimensions being present in the Dataset/DataArray as dimension coordinates (see the description in xarray's documentation). In your dataset, the x and y dimensions (xgrid_0 and ygrid_0) are listed as "dimensions without coordinates", hence the problem.
However, since this situation is commonly encountered in meteorological data files, MetPy's current implementation may be too stringent. I would suggest opening an issue on MetPy's issue tracker.
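In the meantime, a possible workaround (a sketch under assumptions, not an official MetPy recipe) is to build 1-D projection x/y values from the file's 2-D latitude/longitude fields and attach them as dimension coordinates. The gridlat_0/gridlon_0 names and the (ygrid_0, xgrid_0) dimension order below are guesses based on typical pynio-decoded HRRR output, so check them against your dataset:
import cartopy.crs as ccrs

proj = data['RH_P0_L100_GLC0'].metpy.cartopy_crs   # cartopy CRS from the parsed crs coordinate
pts = proj.transform_points(ccrs.PlateCarree(),
                            data['gridlon_0'].values,
                            data['gridlat_0'].values)
data = data.assign_coords(xgrid_0=pts[0, :, 0],    # projection x along the first row
                          ygrid_0=pts[:, 0, 1])    # projection y along the first column
cross = cross_section(data['RH_P0_L100_GLC0'], start, end)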
In regard to your bonus question: as long as you have terrain data in the same vertical coordinate as your plotted data, you can use matplotlib's fill_between() method to fill in terrain under the plots.
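A rough sketch of what that could look like (the variable and coordinate names below are illustrative, not taken from your dataset):
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# cross_section output has an 'index' dimension along the section; 'isobaric'
# and 'surface_pressure' are placeholder names for the vertical coordinate and
# a terrain field interpolated along the same section.
ax.contourf(cross['index'], cross['isobaric'], cross['RH'])
ax.fill_between(cross['index'], cross['surface_pressure'],
                float(cross['isobaric'].max()), color='saddlebrown', zorder=3)
ax.set_ylim(float(cross['isobaric'].max()), float(cross['isobaric'].min()))  # pressure decreases upward
plt.show()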
I have nearly the same problem. If I try this:
cross = cross_section(data, start, end)
I get:
ValueError: Data missing required coordinate information. Verify that your data have been parsed by MetPy with proper x and y dimension coordinates and added crs coordinate of the correct projection for each variable.
The xarray Dataset looks like this:
<xarray.Dataset>
Dimensions: (bnds: 2, height: 61, height_2: 1, height_3: 60, height_4: 61, height_5: 1, lat: 101, lev: 1, lev_2: 1, lev_3: 1, lon: 121, time: 24)
Coordinates:
* height (height) float64 1.0 2.0 3.0 4.0 ... 58.0 59.0 60.0 61.0
* height_3 (height_3) float64 1.0 2.0 3.0 4.0 ... 57.0 58.0 59.0 60.0
* lev (lev) float64 0.0
* lev_2 (lev_2) float64 400.0
* lev_3 (lev_3) float64 800.0
* lon (lon) float64 -30.0 -29.5 -29.0 -28.5 ... 29.0 29.5 30.0
* lat (lat) float64 -10.0 -9.5 -9.0 -8.5 ... 38.5 39.0 39.5 40.0
crs object Projection: latitude_longitude
* height_2 (height_2) float64 10.0
* time (time) float64 2.017e+07 2.017e+07 ... 2.017e+07 2.017e+07
* height_4 (height_4) float64 1.0 2.0 3.0 4.0 ... 58.0 59.0 60.0 61.0
* height_5 (height_5) float64 2.0
Dimensions without coordinates: bnds
Data variables:
height_bnds (height, bnds) float64 ...
height_3_bnds (height_3, bnds) float64 ...
lev_bnds (lev, bnds) float64 ...
lev_2_bnds (lev_2, bnds) float64 ...
lev_3_bnds (lev_3, bnds) float64 ...
z_ifc (height, lat, lon) float32 ...
topography_c (lat, lon) float32 ...
fis (lat, lon) float32 ...
con_gust (time, height_2, lat, lon) float32 ...
gust10 (time, height_2, lat, lon) float32 ...
u (time, height_3, lat, lon) float32 ...
I mean, there is a lat/lon grid... Is there a workaround to use cross_section on a lat/lon grid?
Or can I rename lat/lon to x and y?
Best
Related
I would like to plot a line plot (source: pandas DataFrame) over an hvplot (source: xarray/netCDF).
The xarray looks like this:
dataDIR = 'ceilodata.nc'
DS = xr.open_dataset(dataDIR)
DS = DS.transpose()
print(DS)
<xarray.Dataset>
Dimensions: (range_hr: 32, range: 1024, layer: 3, time: 5760)
Coordinates:
* range_hr (range_hr) float32 0.001 4.995 9.99 ... 144.9 149.9 154.8
* range (range) float32 14.98 29.97 44.96 ... 1.533e+04 1.534e+04
* layer (layer) int32 1 2 3
* time (time) datetime64[ns] 2022-03-18 ... 2022-03-18T23:59:46
Data variables: (12/41)
zenith float32 ...
wavelength float32 ...
scaling float32 ...
range_gate_hr float32 ...
range_gate float32 ...
longitude float32 ...
... ...
cbe (layer, time) int16 ...
beta_raw_hr (range_hr, time) float32 ...
beta_raw (range, time) float32 ...
bcc (time) int8 ...
base (time) float32 ...
average_time (time) int32 ...
Attributes: (12/13)
comment:
software_version: 15.06.1 2.13 1.040 1
title: CHM15k Nimbus
wmo_id: 10865
month: 3
source: CHM160138
... ...
serlom: TUB160038
location: muenchen
year: 2022
device_name: CHM160138
institution: DWD
day: 18
The pandas dataframe source looks like this:
df = pd.read_csv('PTU.csv')
print(df)
Unnamed: 0 PTU
0 2022-03-18 07:38:56 451.839
1 2022-03-18 07:38:57 468.826
2 2022-03-18 07:38:58 469.093
3 2022-03-18 07:38:59 469.356
4 2022-03-18 07:39:00 469.623
... ... ...
6140 2022-03-18 09:21:16 31690.600
6141 2022-03-18 09:21:17 31694.700
6142 2022-03-18 09:21:18 31692.900
6143 2022-03-18 09:21:19 31712.000
6144 2022-03-18 09:21:20 31711.500
[6145 rows x 2 columns]
Both are time-dependent datasets but have different time stamps and frequencies. Time is the index in each dataset.
I tried to plot them together, with additional imports of holoviews. While each individual plot is no problem, plotting them together does not seem to work the way I tried it:
import hvplot.pandas
import holoviews as hv
# colormap plot of the xarray variable:
ceilo = DS.b_r.hvplot(cmap="viridis_r", width=850, height=600, title='title', clim=(5, 80))
# line plot of the data frame:
p = df.hvplot.line()
# add the pressure line plot to the colormesh plot using *, which overlays the line on the plot
ceilo * p
but this ended in an error message with the following complete traceback:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-10-2b1c6baca339> in <module>
24 p = df.hvplot.line()
25 # add pressure line plot to pcolormeshplot using * which overlays the line on the plot
---> 26 ceilo * df
c:\python38\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
68 other = item_from_zerodim(other)
69
---> 70 return method(self, other)
71
72 return new_method
c:\python38\lib\site-packages\pandas\core\arraylike.py in __rmul__(self, other)
118 #unpack_zerodim_and_defer("__rmul__")
119 def __rmul__(self, other):
--> 120 return self._arith_method(other, roperator.rmul)
121
122 #unpack_zerodim_and_defer("__truediv__")
c:\python38\lib\site-packages\pandas\core\frame.py in _arith_method(self, other, op)
6936 other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))
6937
-> 6938 self, other = ops.align_method_FRAME(self, other, axis, flex=True, level=None)
6939
6940 new_data = self._dispatch_frame_op(other, op, axis=axis)
c:\python38\lib\site-packages\pandas\core\ops\__init__.py in align_method_FRAME(left, right, axis, flex, level)
275 elif is_list_like(right) and not isinstance(right, (ABCSeries, ABCDataFrame)):
276 # GH 36702. Raise when attempting arithmetic with list of array-like.
--> 277 if any(is_array_like(el) for el in right):
278 raise ValueError(
279 f"Unable to coerce list of {type(right[0])} to Series/DataFrame"
c:\python38\lib\site-packages\holoviews\core\element.py in __iter__(self)
94 def __iter__(self):
95 "Disable iterator interface."
---> 96 raise NotImplementedError('Iteration on Elements is not supported.')
97
98
NotImplementedError: Iteration on Elements is not supported.
Is the different time frequency a problem here? The line plot should be oriented along the x- and y-axes so that its time stamps and altitudes line up with the underlying colormap (matplotlib) plot.
To illustrate what I am aiming for, here is a picture of my goal (screenshot).
Thanks for reading / helping.
I found a solution for this case:
Both datasets' time columns have to have the same format. In my case it is datetime64[ns] (to match the netCDF xarray). That is why I converted the dataframe time column to datetime64[ns]:
df.Datetime = df.Datetime.astype('datetime64[ns]')
Also, I found the data to be of type "object", so I converted it to "float":
df.PTU = df.PTU.astype(float)  # convert to correct data type
The last step was choosing hvplot, as this helps with plotting xarray data:
import hvplot.xarray
and then using the .hvplot.quadmesh() method.
And here is my final solution:
title = ('Ceilo data' + '\ndate: ' + str(DS.year) + '-' + str(DS.month) + '-' + str(DS.day))
ceilo = (DS.br.hvplot.quadmesh(cmap="viridis_r", width = 850, height = 600, title = title,
clim = (1000, 10000), # set colorbar limits
cnorm = ('log'), # choose log scale
clabel = ('colorbar title'),
rot = 0 # degree rotation of ticks
)
)
# from: https://justinbois.github.io/bootcamp/2020/lessons/l27_holoviews.html
# take care! may take 2-3 minutes to be plotted:
p = hv.Points(data=df,
kdims=['Datetime', 'PTU'],
).opts(#alpha=0.7,
color='red',
size=1,
ylim=(0, 5000))
# add PTU line plot to quadmesh plot using * which overlays the line on the plot
ceilo * p
I'm trying to clip a rioxarray dataset to a shapefile, but get the following error:
> data_clipped = data.rio.clip(shape.geometry.apply(mapping))
MissingCRS: CRS not found. Please set the CRS with 'set_crs()' or 'write_crs()'. Data variable: precip
This error seems straightforward, but I can't figure out which CRS needs to be set. Both the dataset and the shapefile have CRS values that rio can find:
> print(data.rio.crs)
EPSG:4326
> print(shape.crs)
epsg:4326
The dataarray within the dataset, called 'precip', does not have a CRS, but it also doesn't seem to respond to the set_crs() command:
> print(data.precip.rio.crs)
None
> data.precip.rio.set_crs(data.rio.crs)
> print(data.precip.rio.crs)
None
What am I missing here?
For reference, the rioxarray set_crs() documentation shows set_crs() working on DataArrays, unlike my experience with data.precip.
My data, in case I have something unusual:
> print(data)
<xarray.Dataset>
Dimensions: (x: 541, y: 411)
Coordinates:
* y (y) float64 75.0 74.9 74.8 74.7 74.6 ... 34.3 34.2 34.1 34.0
* x (x) float64 -12.0 -11.9 -11.8 -11.7 ... 41.7 41.8 41.9 42.0
time object 2020-01-01 00:00:00
spatial_ref int64 0
Data variables:
precip (y, x) float64 nan nan nan ... 1.388e-17 1.388e-17 1.388e-17
Attributes:
Conventions: CF-1.6
history: 2021-01-05 01:36:52 GMT by grib_to_netcdf-2.16.0: /opt/ecmw...
> print(shape)
ID name orgn_name geometry
0 Albania Shqipëria MULTIPOLYGON (((19.50115 40.96230, 19.50563 40...
1 Andorra Andorra POLYGON ((1.43992 42.60649, 1.45041 42.60596, ...
2 Austria Österreich POLYGON ((16.00000 48.77775, 16.00000 48.78252...
This issue is resolved if set_crs() is used in the same statement as the clip operation:
data_clipped = data.precip.rio.set_crs('WGS84').rio.clip(shape.geometry.apply(mapping))
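A hedged note on why the earlier in-place attempt printed None: set_crs() (and write_crs()) return a modified copy rather than changing the object in place, so the result has to be captured or chained. A small sketch using write_crs(), which the error message also suggests:
from shapely.geometry import mapping

precip = data.precip.rio.write_crs('EPSG:4326')   # returns a new DataArray with the CRS written
data_clipped = precip.rio.clip(shape.geometry.apply(mapping))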
I have a DataFrame that stores results from a regression, like this:
feats = ['X1', 'X2', 'X3']
betas = [0.5, 0.7, 0.9]
ses = [0.05, 0.03, 0.02]
data = {
"Feature": feats,
"Beta": betas,
"Error":ses
}
data = pd.DataFrame(data)
It looks like this:
Beta Error Feature
0 0.5 0.05 X1
1 0.7 0.03 X2
2 0.9 0.02 X3
I want to make a graph of the coefficients for each feature, with the height being "Beta" and the error line being "Error".
Is there a way to get this working in matplotlib?
I have tried an error plot but maybe did it wrong.
You can use plt.errorbar as follows (matplotlib 2.2.2):
plt.errorbar(data.Feature, data.Beta, yerr=data.Error, capthick=2, capsize=2)
If the above line somehow doesn't work for you, you can use this workaround:
plt.errorbar(range(len(data.Feature)), data.Beta, yerr=data.Error, capthick=2, capsize=2)
plt.xticks(range(len(data.Feature)), data.Feature)
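If a bar-style coefficient plot is what you had in mind (bar height = Beta, error line = Error), plt.bar also accepts yerr directly with a reasonably recent matplotlib; a minimal sketch:
import matplotlib.pyplot as plt

plt.bar(data.Feature, data.Beta, yerr=data.Error, capsize=4)
plt.ylabel('Beta')
plt.show()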
I have a very large dataset in a NetCDF file.
RZSC = xr.open_dataset('/home/chandra/data/RZSC_250m_SA.nc')
RZSC = RZSC.Band1
RZSC
[Output]:
<xarray.DataArray 'Band1' (lat: 32093, lon: 20818)>
[668112074 values with dtype=float32]
Coordinates:
* lat (lat) float64 -58.36 -58.36 -58.35 -58.35 ... 13.71 13.71 13.71
* lon (lon) float64 -81.38 -81.37 -81.37 -81.37 ... -34.63 -34.63 -34.62
Attributes:
long_name: GDAL Band Number 1
grid_mapping: crs
########################
Treecover = xr.open_dataset('/home/chandra/data/Treecover_MOD44B_2000_250m_AMAZON.nc')
Treecover = Treecover.Band1
Treecover
[Output]:
<xarray.DataArray 'Band1' (lat: 32093, lon: 20818)>
[668112074 values with dtype=float64]
Coordinates:
* lat (lat) float64 -58.36 -58.36 -58.35 -58.35 ... 13.71 13.71 13.71
* lon (lon) float64 -81.38 -81.37 -81.37 -81.37 ... -34.63 -34.63 -34.62
Attributes:
long_name: GDAL Band Number 1
grid_mapping: crs
####
np.nanmax(Treecover[:,:])
[Output]: 85.0625
np.nanmin(Treecover[:,:])
[Output]: 0.0
I am not able to visualize the dataset or filter it with a command like RZSC[:,:].where(Treecover[:,:] > 1000).shape, which is quite frustrating (the output is (32093, 20818), the same as the original array size).
Does anyone have any suggestion for this?
I was not able to share the data as the size of the netcdf file is > 6 GB.
.where() will always return an array the same shape as the one you call it on; it sets the points where the condition is False to NaN, so the (32093, 20818) shape you see is expected. You can also set those points to any fill value you like via the other= argument. Note, too, that Treecover maxes out around 85, so Treecover > 1000 is never True and the whole result will be NaN. Did you try visualizing the masked result? For example:
RZSC.where(Treecover > 1000)                 # RZSC where the condition holds, NaN elsewhere
RZSC.where(Treecover > 1000, other=0.0)      # or choose the fill value yourself
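Beyond that, a hedged sketch (the threshold and coarsening factors are placeholders): with roughly 670 million cells it helps to pick a threshold inside Treecover's actual 0-85 range and to coarsen before plotting; opening the files with dask chunks (chunks= in open_dataset) may also be needed to keep memory in check.
import matplotlib.pyplot as plt

masked = RZSC.where(Treecover > 50)                      # placeholder threshold within the 0-85 range
coarse = masked.coarsen(lat=50, lon=50, boundary='trim').mean()
coarse.plot()                                            # far cheaper than plotting every cell
plt.show()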
I am new here.
First of all, I am very thankful for your time and consideration.
I have two questions regarding managing two different netCDF files in Python.
I searched a lot but unfortunately I couldn't find a solution.
1- I have a netcdf file which has coordinates like below:
time datetime64[ns] 2016-08-16T22:00:00
* y (y) int32 220000 ... 620000
* x (x) int32 20000 ... 720000
lat (y, x) float64 dask.array<shape=(401, 701),
lon (y, x) float64 dask.array<shape=(401, 701),
I need to change the coords to lon/lat so that I can slice an area based on specific lon/lat coordinates (using xarray), but I don't know how to change x and y to lon/lat.
Here is my code:
import xarray as xr
import matplotlib.pyplot as plt
p = "R_201608.nc"
ds = xr.open_mfdataset(p)
q=ds.RR.sel(time='2016-08-16T21:00:00')
2- Similar to 1, I have another netcdf file which has coordinates like below:
* X (X) float32 557600.0 .. 579400.0
* Y (Y) float32 5190600 ... 5205400.0
* time (time) datetime64[ns] 2007-01I
How can I convert X and Y to the lon/lat system so that I can plot the data in lon/lat coordinates?
Edit related to @Ryan:
1- Yes, this file contains rainfall over a large area. I want to cut it down to a smaller area (similar to the area of the file in question 2) and compare the two using bias, RMSE, etc. Here is the full information for this file:
<xarray.Dataset>
Dimensions: (time: 2976, x: 701, y: 401)
Coordinates:
* time (time) datetime64[ns] 2016-08-31T23:45:00
* y (y) int32 220000 221000 ... 619000 620000
* x (x) int32 20000 21000 ... 719000 720000
lat (y, x) float64 dask.array<shape=(401, 701),chunksize=(401, 701)>
lon (y, x) float64 dask.array<shape=(401, 701), chunksize=(401, 701)
Data variables:
RR (time, y, x) float32 dask.array<shape=(2976, 401, 701), chunksize=(2976, 401, 701)>
lambert_conformal_conic int32 ...
Conventions: CF-1.5
Edit related to @Ryan: 2- And here is the full information about the second file (smaller area):
<xarray.DataArray 'Precip' (time: 8928, Y: 75, X: 110)>
dask.array<shape=(8928, 75, 110), dtype=float32, chunksize=(288, 75, 110)>
Coordinates:
sensor_height_precip float32 1.5
sensor_height_P float32 1.5
* X (X) float32 557600.0 557800.0 ... 579200.0 579400.0
* Y (Y) float32 5190600.0 5190800.0 ... 5205400.0
* time (time) datetime64[ns] 2007-01-31T23:55:00
Attributes:
grid_mapping: UTM33N
ancillary_variables: QFlag_Precip QGrid_Precip
long_name: Precipitation Amount
standard_name: precipitation_amount
cell_methods: time:sum
units: mm
In problem 1), it is not possible to convert lon and lat to dimension coordinates, because they are two-dimensional (both have dimension x, y). Dimension coordinates, used for slicing, can only be one-dimensional. If you can be more specific about what you want to do after slicing, we can provide more suggestions about how to proceed. Do you want to select a particular latitude / longitude range and then calculate some statistics (e.g. mean / variance)?
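If so, one common approach is to mask with the 2-D lat/lon coordinate fields and drop the rows/columns that fall entirely outside the box. A hedged sketch (the bounds are placeholders; the variable names come from the printout above):
subset = ds.RR.where(
    (ds.lat >= 46.0) & (ds.lat <= 48.0) &
    (ds.lon >= 11.0) & (ds.lon <= 14.0),
    drop=True,
)
print(float(subset.mean()))   # e.g. a simple statistic over the selected box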
In problem 2) it looks like you have a map projection. Without more information about the projection, it is impossible to convert to lat / lon coordinates or plot on a map. Is there more information contained in your dataset about the map projection used? Can you post the full output of print(ds)?
I have solved my problem with your help. Thanks a lot.
I could change the coords of both datasets to lon/lat using pyproj, as @Bart mentioned. Creating a meshgrid from the original coordinates and transforming it was the key point.
import numpy as np
from pyproj import Proj

nx, ny = ds.x.values, ds.y.values            # the 1-D projected x/y coordinates from the dataset
nxv, nyv = np.meshgrid(nx, ny)               # 2-D grids of projected x/y
unausp = Proj('+proj=lcc +lat_1=49 +lat_2=46 +lat_0=47.5 +lon_0=13.33333333333333 +x_0=400000 +y_0=400000 +ellps=bessel +towgs84=577.326,90.129,463.919,5.137,1.474,5.297,2.4232 +units=m +no_defs')
nlons, nlats = unausp(nxv, nyv, inverse=True)  # inverse transform: projected metres -> 2-D lon/lat grids
Since I want to compare two rainfall data sets with different spatial resolution (different grid size), I have to upscale one of them by using xarray interpolation:
upnew_lon = np.linspace(w.X[0], w.X[-1], w.dims['X'] // 5)
upnew_lat = np.linspace(w.Y[0], w.Y[-1], w.dims['Y'] //5)
uppds = w.interp(Y=upnew_lat, X=upnew_lon)
As far as I know, this interpolation is linear. I compared the upscaled dataset with the original one; the mean rainfall decreases by about 0.03 mm/day after upscaling. I just want to know: do you think this upscaling method is reliable for sub-hourly rainfall?