In Python 3.6 I have imported a netCDF4 file containing global precipitation values. I have also imported a shapefile which contains the shape of the Colorado River basin. My goal is to be able to read/extract precipitation data only within my shapefile. I have looked up multiple examples, but none have really helped.
Here is my code so far:
from netCDF4 import Dataset
import numpy as np
import geopandas as gpd
nc = Dataset('filename.nc')
long = nc.variables['lon'][:]
lati = nc.variables['lat'][:]
rainfall = nc.variables['precip'][:]
shapefile = gpd.read_file('filename.shp')
The code above runs without any error messages.
Oh, look, hydrologist in the house! ;)
Well, so far you haven't done much with your code; all you've done is read the files into memory.
When I was trying to perform the same analysis (only with GRIB files), I found a great Python library for exactly this purpose, called RasterStats.
It supports working with ndarray raster objects as well as most GDAL-supported raster file types (which should include netCDF), and it generates exactly what you want.
For more, see its very neat manual, and let me know if you get stuck somewhere!
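For example, a minimal sketch of how that might look with your files (a regular lat/lon grid and a (time, lat, lon) precip array are assumptions on my part):
from netCDF4 import Dataset
import numpy as np
from affine import Affine
from rasterstats import zonal_stats
nc = Dataset('filename.nc')
lon = nc.variables['lon'][:]
lat = nc.variables['lat'][:]
precip = nc.variables['precip'][:]  # assumed shape: (time, lat, lon)
# Affine transform of the grid, assuming regularly spaced cell centers.
dx, dy = lon[1] - lon[0], lat[1] - lat[0]
transform = Affine.translation(lon[0] - dx / 2, lat[0] - dy / 2) * Affine.scale(dx, dy)
# Mean precipitation inside the basin polygon(s) for the first time step;
# depending on your file you may need to handle the mask/fill value first.
stats = zonal_stats('filename.shp', np.asarray(precip[0]), affine=transform, stats=['mean'])
print(stats)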
I'm using Styler in pandas to display a DataFrame containing a timestamp in a Jupyter notebook.
The displayed value, 1623838447949609984, turned out to be different from the input, 1623838447949609899.
pandas version: 1.4.2.
Can someone please explain the reason for the following output?
Thanks.
import pandas as pd
pd.DataFrame([[1623838447949609899]]).style
Pandas Styler, within its render script, contains the line return f"{x:,.0f}" when x is an integer.
In python if you execute
>>> "{:.0f}".format(1623838447949609899)
'1623838447949609984'
you obtain the result you cite. The f format spec converts the integer to a 64-bit float first, and floats can represent integers exactly only up to 2**53, so 1623838447949609899 gets rounded to the nearest representable value, 1623838447949609984. This is a property of float formatting in general, not of Styler specifically.
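You can see the rounding directly:
>>> int(float(1623838447949609899))  # the 'f' format spec goes through a 64-bit float
1623838447949609984
>>> 2 ** 53  # floats represent integers exactly only up to 2**53
9007199254740992
If you need the exact integer displayed, you can override Styler's default formatter, e.g. pd.DataFrame([[1623838447949609899]]).style.format('{:d}').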
I'm working on some code to select and export geodata based on a bounding box. The data I want to select comes from two separate layers in a huge File GDB (16 GB) covering the entire Netherlands. I use a bounding box to avoid reading the entire dataset before making a selection.
This method works great when applied to a GeoPackage, but with a File Geodatabase the processing time is far longer (0.2 s vs 300 s for a 200x200 meter selection). The File GDB I'm using has a spatial index set for the layers I'm reading. I'm using geopandas to read and select. Below you'll find an example for the layer 'Adres':
import geopandas as gpd
def ImportGeodata(FilePath, BoundingBox):
    importBag = gpd.read_file(FilePath, layer='Adres', bbox=BoundingBox)
    importBag['mergeid'] = importBag['identificatie']
    return importBag
Am I overlooking something? Or is this a limitation when importing from a huge File GDB? I can't find an obvious mistake here. For now the workaround is another script that imports and dumps the layers I need into a GeoPackage, as sketched below. The problem is that this runs for 3 to 4 hours (the GeoPackage result is almost 6 GB). I don't want to keep doing that; it would be needed once every month or so to process a new version of this dataset.
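For reference, the conversion script boils down to something like this (paths and the second layer name are placeholders):
import geopandas as gpd
SRC = 'source.gdb'    # placeholder: the 16 GB File GDB
DST = 'extract.gpkg'  # placeholder: the GeoPackage dump
for layer in ['Adres', 'Pand']:  # 'Pand' is a placeholder layer name
    gdf = gpd.read_file(SRC, layer=layer)  # full read, no bbox: this is the slow part
    gdf.to_file(DST, layer=layer, driver='GPKG')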
Curious what you guys come up with.
I have to generate a 2D mesh in a format compatible with optimesh, in order to refine it with the algorithms included in that library (in particular, centroidal Voronoi tessellation smoothing). I'm starting from a set of unordered points, so I'm trying to understand the easiest chain of tools to do the job. I have no familiarity at all with geometry processing, so forgive me if my questions are stupid.
I found a lot of libraries to process a mesh from a file in a huge variety of formats, but I'm missing how to generate one from points.
I've seen that with scipy I can get a triangulation, but the object returned by scipy can't be fed directly to optimesh.
So, my problem now is basically something like this:
import numpy as np
from scipy.spatial import Delaunay, delaunay_plot_2d
points = np.random.random((100,2))
delaun = Delaunay(points)
#Magic code that I wish
delaun.to_meshfile('meshfile.xxx')
#
with a file format that I can process later with optimesh.
optimesh author here. Your delaun object has delaun.points and delaun.simplices. Those can be fed into optimesh:
import numpy as np
from scipy.spatial import Delaunay, delaunay_plot_2d
import optimesh
points = np.random.random((100, 2))
delaun = Delaunay(points)
points, cells = optimesh.cvt.quasi_newton_uniform_blocks(
    delaun.points, delaun.simplices, tol=1.0e-5, max_num_steps=100
)
If you really want to store them in a file, check out meshio.
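For example, with recent meshio versions (a minimal, self-contained sketch; VTK is just one of the supported formats):
import numpy as np
from scipy.spatial import Delaunay
import meshio
points = np.random.random((100, 2))
delaun = Delaunay(points)
# The file extension selects the format; meshio pads 2D points for VTK.
meshio.write_points_cells('meshfile.vtk', delaun.points, [('triangle', delaun.simplices)])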
I'm new to medical image processing. How can I convert 3D DICOM medical images to a numerical matrix format using either Python or C++?
Another option, if you really want "3D" DICOM image support (i.e. CT/MR/NM/PET 3D series, as opposed to purely 2D image handling) and you want to do anything really 3D-related and/or more complex, is to check out SimpleITK.
That gives you very powerful, true 3D handling and is fast (it's a wrapper around compiled C++). It includes, for example, full 3D image registration and various filters/tools.
It can read an entire series at once and automatically create a fully spatially aware 3D numpy array for you (i.e. it takes care of processing all the DICOM 3D spatial orientation/spacing tags for you).
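As a rough sketch of that workflow (the directory path is a placeholder):
import SimpleITK as sitk
# Read a whole DICOM series from a folder into one spatially aware 3D image.
reader = sitk.ImageSeriesReader()
reader.SetFileNames(reader.GetGDCMSeriesFileNames('path/to/dicom_dir'))
image = reader.Execute()
# Convert to numpy; axis order is (slice, row, column).
array = sitk.GetArrayFromImage(image)
print(array.shape, image.GetSpacing())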
However, because it's a lot more powerful than pydicom, it also has a much steeper learning curve, but it does have many examples and online Jupyter notebook tutorials.
...so, depending on your needs, it might be good for you. However, if you only really want basic 2D image-at-a-time processing, pydicom is the way to go.
You can use the pydicom package in Python. You can install it with:
pip install pydicom
Here is a simple example of reading DICOM images and converting them to a numpy array:
import os
import pydicom
import numpy as np
dicom_dir = 'your_dicom_folder_of_slices'  # placeholder: folder containing the slices
file_names = os.listdir(dicom_dir)
file_names.sort()  # note: assumes the file names sort in slice order
dicom_data = []
for name in file_names:
    path = os.path.join(dicom_dir, name)
    dicom_data.append(pydicom.dcmread(path))  # read_file is a deprecated alias of dcmread
array = [data.pixel_array for data in dicom_data]
array = np.stack(array, axis=-1)  # or 0 if 'channel_first'
Here is a detailed example.
I prefer using SimpleElastix for medical image processing. It has many methods for segmentation and many other helpful features, and it is available in both Python and C++. In my experience, SimpleElastix handled DICOMs and NIfTIs better than other packages.
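For example, a basic registration with SimpleElastix looks roughly like this (it requires the SimpleElastix build of SimpleITK; the file names are placeholders):
import SimpleITK as sitk  # the SimpleElastix build exposes ElastixImageFilter
elastix = sitk.ElastixImageFilter()
elastix.SetFixedImage(sitk.ReadImage('fixed.nii'))    # placeholder file
elastix.SetMovingImage(sitk.ReadImage('moving.nii'))  # placeholder file
elastix.Execute()  # uses the default parameter maps
result = elastix.GetResultImage()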
I just updated pandas from 0.17.1 to 0.21.0 to take advantage of some new functionality, and ran into a compatibility issue with matplotlib (which I also updated to the latest 2.1.0). In particular, the Timestamp object seems to have changed significantly.
I happen to have another machine still running the older versions of pandas (0.17.1) and matplotlib (1.5.1), which I used to compare the differences:
Both versions show my DataFrame index to be dtype='datetime64[ns]':
DatetimeIndex(['2017-03-13', '2017-03-14', ... '2017-11-17'], dtype='datetime64[ns]', name='dates', length=170, freq=None)
But when calling type(df.index[0]), 0.17.1 gives pandas.tslib.Timestamp and 0.21.0 gives pandas._libs.tslib.Timestamp.
When plotting with df.index as x-axis:
plt.plot(df.index, df['data'])
matplotlib by default formats the x-axis labels as dates for pandas 0.17.1, but fails to recognize them for pandas 0.21.0 and simply gives the raw number 1.5e18 (epoch time in nanoseconds).
I also have a customized cursor that reports the clicked location on the graph using matplotlib.dates.DateFormatter on the x-value; it fails for 0.21.0 with:
OverflowError: signed integer is greater than maximum
I can see in the debugger that the reported x-value is around 736500 (i.e. the day count since year 0) for 0.17.1 but around 1.5e18 (i.e. nanosecond epoch time) for 0.21.0.
I am surprised at this break of compatibility between matplotlib and pandas as they are obviously used together by most people. Am I missing something in the way I call the plot function above for the newer versions?
Update: as I mentioned above, I prefer directly calling plot with a given axes object, but just for the heck of it I tried calling the plot method of the DataFrame itself, df.plot(). As soon as this is done, all subsequent plots correctly recognize the Timestamp within the same Python session. It's as if an environment variable were set, because I can reload another DataFrame or create another axes with subplots, and nowhere does the 1.5e18 show up. This really smells like a bug, as the latest pandas doc says:
The plot method on Series and DataFrame is just a simple wrapper around plt.plot()
But clearly it does something to the python session such that subsequent plots deal with the Timestamp index properly.
In fact, simply running the example at the above pandas link:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
Depending on whether ts.plot() is called or not, the following plot either correctly formats x-axis as dates or not:
plt.plot(ts.index,ts)
plt.show()
Once a member plot is called, subsequent calls to plt.plot on new Series or DataFrames will autoformat correctly without needing to call the member plot method again.
There is an issue with pandas datetimes and matplotlib coming from the recent release of pandas 0.21, which no longer registers its converters on import. Once you use those converters once (within pandas), they'll be registered and automatically used by matplotlib as well.
A workaround would be to register them manually,
import pandas.plotting._converter as pandacnv
pandacnv.register()
In any case, the issue is well known on both the pandas and matplotlib side, so there will be some kind of fix in the next releases. Pandas is thinking about re-adding the registration in an upcoming release, so this issue may be there only temporarily. An option is also to revert to pandas 0.20.x, where this should not occur.
Update: this is no longer an issue with current versions of matplotlib (2.2.2) and pandas (0.23.1), and likely many that have been released since roughly December 2017, when this was fixed.
Update 2: As of pandas 0.24 or higher the recommended way to register the converters is
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
or if pandas is already imported as pd,
pd.plotting.register_matplotlib_converters()
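Putting this together with the example from the question:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()  # make matplotlib understand pandas Timestamps again
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
plt.plot(ts.index, ts)  # the x-axis now formats as dates
plt.show()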
After opening an issue on pandas' GitHub, I learned that this was indeed a known issue between pandas and matplotlib regarding the auto-registration of unit converters. In fact, it was listed on the what's new page, which I had failed to see before, along with the proper way to register the converters:
from pandas.tseries import converter
converter.register()
This is also done the first time a member plot method is called on a Series or DataFrame, which explains what I observed above.
It appears this was done with the intention that matplotlib should implement some basic support for pandas datetimes itself, but a deprecation warning of some sort would have been useful for such a break. However, until matplotlib actually implements such support (or some sort of lazy registration mechanism), I'm in practice always putting those two lines right after the pandas import. So I'm not sure why pandas would want to disable the automatic registration on import before things are ready on the matplotlib side.
It looks like this issue has been fixed in later versions of matplotlib.
Try running "pip install --upgrade matplotlib".
I ran into the same issue, "AttributeError: 'numpy.datetime64' object has no attribute 'toordinal'", and it was fixed by upgrading matplotlib.