Am I scrambling my data when writing to netCDF? (netcdf4)

I have created a 2D array that when plotted looks like this:
Basically it is an array of shape [101, 365], with values ranging from 0.0 to 1.2, and it contains NaNs.
I am writing it to a netCDF4 file in this manner:
nc_out = Dataset(nc_out_file, 'w', format='NETCDF4')
#Create Dimensions
y = nc_out.createDimension('y',101)
x = nc_out.createDimension('x',365)
#Create Variables
latitudes = nc_out.createVariable('latitude', np.float32, ('y'))
days = nc_out.createVariable('days', np.float32,('x'))
on2_climo = nc_out.createVariable('on2_climo', np.float32, ('x', 'y'))
#Fill Variables
latitudes[:] = lat
days[:] = day
on2_climo[:] = data
nc_out.close()
However, when I plot the data I've saved in the file it looks nothing like the original data:
What is going on here? The faint diagonal lines make me think something weird is happening...
Is there a better way to code a netCDF4 file? I'd share a copy of the original data with you... but I can't seem to get a faithful version of it saved...
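A likely cause worth checking: the variable is created with dimensions ('x', 'y'), i.e. shape (365, 101), while data has shape (101, 365). If the flat values get laid out against the wrong dimension order, rows and columns interleave, which produces exactly this kind of diagonal striping. A minimal sketch of the fix, assuming data is indexed as data[lat, day]:
#Declare the variable with dimensions in the same order as the array's shape (101, 365)
on2_climo = nc_out.createVariable('on2_climo', np.float32, ('y', 'x'))
on2_climo[:] = data
#Alternatively, keep ('x', 'y') and write the transpose instead:
#on2_climo[:] = data.T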

Related

Expand netcdf to the whole globe with xarray

I have a dataset that looks like this:
As you can see, it only covers Latitudes between -55.75 and 83.25. I would like to expand that dataset so that it covers the whole globe (-89.75 to 89.75 in my case) and fill it with an arbitrary NA value.
Ideally I would want to do this with xarray. I have looked at .pad(), .expand_dims() and .assign_coords(), but did not really get a handle on how either of those works.
If someone can provide an alternative solution with cdo, I would also be grateful for that.
You could do this with nctoolkit (https://nctoolkit.readthedocs.io/en/latest/), which uses CDO as a backend.
The example below shows how you could do it. It starts by cropping a global temperature dataset to latitudes between -50 and 50. You would then need to regrid it to a global grid at whatever resolution you need. This uses CDO, which will extrapolate at the edges, so you probably want to set everything outside the original dataset's coverage to NA; to do that, my code calls masklonlatbox from CDO.
import nctoolkit as nc
ds = nc.open_thredds("https://psl.noaa.gov/thredds/dodsC/Datasets/COBE2/sst.mon.ltm.1981-2010.nc")
ds.subset(time = 0)
ds.crop(lat = [-50, 50])
ds.to_latlon(lon = [-179.5, 179.5], lat = [-89.5, 89.5], res = 1)
ds.mask_box(lon = [-179.5, 179.5], lat = [-50, 50])
ds.plot()
# convert to xarray dataset
ds_xr = ds.to_xarray()
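Since the question asks for xarray first, here is a minimal sketch of the same idea using Dataset.reindex, which inserts a fill value wherever the new coordinates have no existing data (the file name, the 'lat' coordinate name and the 0.5-degree spacing are assumptions based on the question):
import numpy as np
import xarray as xr

ds = xr.open_dataset("input.nc")
#full global latitude axis, -89.75 to 89.75 at 0.5-degree spacing as in the question
full_lat = np.arange(-89.75, 90.0, 0.5)
#reindex keeps the existing latitudes and fills every new one with fill_value
ds_global = ds.reindex(lat=full_lat, fill_value=np.nan)
ds_global.to_netcdf("expanded.nc")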

Open and plot several data files on the same plot in Python

Newbie here, first question.
I have several data files that I want to open, extract the relevant data (x and y) from, and plot on the same plot.
I know how to do it if I type out a plot statement for each of them, but what I would want to create is a single function or script that takes the filenames as input, extracts the data (this part depends on the type of file, but I think I know how to do it) and then creates one single plot with the different datasets. It should be pretty basic, but all my attempts return a plot for each file.
I think that my problem is that I have not understood how the whole ax, fig, gca, plot loop works, as I have been learning mostly by adapting things and doing.
So far I have created a for loop that opens each file, gets the data and stores it in a dataframe (one dataframe per file), then uses plt.plot to plot. Outside the loop, I have a plt.gca() that, in my intention, would gather things together so I could then modify the plot, add stuff to it and save it. I have also tried changing the position of the gca and using ax and fig, playing around with a few tutorials, but never with satisfying results.
I get different kinds of errors, depending on the different iterations of the script, here is one of my attempts. If there's an electrochemist among you they might recognize the datatype :) but the datatype should not be important.
EDIT: I modified the script, as it had a couple of errors. The current version returns an empty plot; the dataframe is created properly, from what I can see.
files = ['file1.i2b', 'file2.i2b']
colors = []
fig_name = ''
file_type = 'i2b'
norm = []
if len(colors) != len(files):
    l = len(files)
    col_list = ['b', 'g', 'r', 'c', 'm', 'y', 'k']
    color_list = col_list[0:l]
if len(norm) != len(files):
    norm = [1]*len(files)
if file_type == 'i2b':
    for filename, norm_factor, col in zip(files, norm, color_list):
        flnm1 = os.path.splitext(filename)[0]
        data_xrd = pd.read_csv(filename, sep=' ', decimal='.', skiprows=10,
                               header=None, names=['Freq', 'Real_part', 'Imm_part'])
        data_xrd['norm_Imm_part'] = (0 - data_xrd['Imm_part'])*norm_factor
        data_xrd['norm_Re_part'] = data_xrd['Real_part']*norm_factor
        plt.plot(x=data_xrd['norm_Re_part'], y=data_xrd['norm_Imm_part'],
                 legend=flnm1, style='-', color=col)
#plt.show
plt.gca()
#plt.axhline(y=0, color='k', linestyle='--')
#plt.set_xlabel('Z_real [Ohm]')
#plt.set_ylabel('Z_imm [Ohm]')
#plt.set_aspect('equal')
plt.savefig(fig_name + '.png')
Now, it might be better to split the data extraction into a separate function, so that the plotting function is more flexible and can be paired with different kinds of data input, but at the moment I'd just like to understand how to plot multiple files on a single plot simply by using a list of their names as input, in order to make grouping and plotting lots of data files easier.
Thanks for the help and please let me know how to improve my question!
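For what it's worth, the empty plot is consistent with the plt.plot call: x=, y=, legend= and style= are keyword arguments of pandas' DataFrame.plot, not of plt.plot, so with no positional arguments nothing is actually drawn. A minimal sketch of one way to structure this with the explicit fig/ax interface (the column handling is copied from the question's code; the .i2b layout is assumed):
import os
import pandas as pd
import matplotlib.pyplot as plt

def plot_files(files, norm=None, colors=None, fig_name='combined'):
    #one figure and one axes; every ax.plot call below draws into the same axes
    fig, ax = plt.subplots()
    norm = norm or [1]*len(files)
    colors = colors or ['b', 'g', 'r', 'c', 'm', 'y', 'k'][:len(files)]
    for filename, norm_factor, col in zip(files, norm, colors):
        label = os.path.splitext(filename)[0]
        data = pd.read_csv(filename, sep=' ', decimal='.', skiprows=10,
                           header=None, names=['Freq', 'Real_part', 'Imm_part'])
        #x and y are positional in ax.plot; label feeds ax.legend()
        ax.plot(data['Real_part']*norm_factor, -data['Imm_part']*norm_factor,
                '-', color=col, label=label)
    ax.set_xlabel('Z_real [Ohm]')
    ax.set_ylabel('Z_imm [Ohm]')
    ax.legend()
    fig.savefig(fig_name + '.png')

plot_files(['file1.i2b', 'file2.i2b'])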

'numpy.ndarray' object has no attribute 'write'

I am writing Python code to calculate the background of an astronomical image of the globular cluster M15 (M15 reduced). My code can calculate the background and plot it using plt.imshow(). To save the background-subtracted image I have to convert it to a str from a numpy.ndarray. I have tried many things, including the np.array2string used here. The file just stays as an array, which can't be saved as I need it to save as a .fits file. Any ideas how to get this to a str?
The code:
#sigma clip is the number of standard deviations from centre value that value can be before being rejected
sigma_clip = SigmaClip(sigma=2.)
#used to estimate the background in each of the meshes
bkg_estimator = MedianBackground()
#define path for reading in images
M15red_path = Path('.', 'ObservingData/M15normalised/')
M15red_images = ccdp.ImageFileCollection(M15red_path)
M15reduced = M15red_images.files_filtered(imagetyp='Light Frame', include_path=True)
M15backsub_path = Path('.', 'ObservingData/M15backsub/')
for n in range(0, 59):
    bkg = Background2D(CCDData.read(M15reduced[n]).data, box_size=(20, 20),
                       filter_size=(3, 3),
                       edge_method='pad',
                       sigma_clip=sigma_clip,
                       bkg_estimator=bkg_estimator)
    M15subback = CCDData.read(M15reduced[n]).data - bkg.background
    np.array2string(M15subback)
    #M15subback.write(M15backsub_path / 'M15backsub{}.fits'.format(n))
    print(type(M15subback[1]))
You could try using numpy.save (but it saves a '.npy' file). In your case:
import numpy as np
...
for n in range(0, 59):
    ...
    np.save('M15backsub{}.npy'.format(n), M15subback)
Since you need to store a numpy array, this should work.
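Since the end goal is a .fits file rather than .npy, here is an alternative sketch using astropy.io.fits (astropy is already in play via CCDData); note that the array never needs to become a str:
from astropy.io import fits

#inside the loop, wrap the background-subtracted array in a FITS HDU and write it out
hdu = fits.PrimaryHDU(M15subback)
hdu.writeto(str(M15backsub_path / 'M15backsub{}.fits'.format(n)), overwrite=True)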

Perform a calculation in a newly added column in a CSV in Python

I am trying to add a new column to a CSV file in Python 3. The CSV file has a header row, and I don't need the first two columns at this point. The other 8 columns contain the 4 coordinate pairs of a polygon. I am trying to add a new column that calculates the area from the points in the CSV. I have seen several similar questions on Stack Overflow and have tried to use the information there in my code; however, at the moment only the last line of the CSV is displaying, and I don't think the area is calculating correctly either. Any suggestions? (FYI, this is my first code with a CSV.)
Here is my code:
with open('poly.csv', 'rU') as input:
    with open('polyout.csv', 'w') as output:
        writer = csv.writer(output, lineterminator='\n')
        reader = csv.reader(input)
        coords = []
        row = next(reader)
        row = next(reader, None)
        coords = row[2:]
        prev_de = float(coords[-2])
        prev_dn = float(coords[-1])
        areasq = float(0)
        for de, dn in zip(coords[:-1:2], coords[1::2]):
            areasq += (float(de)*float(prev_dn)) - (float(dn)*float(prev_de))
            prev_de, prev_dn = de, dn
        area = abs(areasq)/2
        for row in reader:
            row.append(area)
            coords.append(row)
        writer.writerows(coords)
        print(row)
I would recommend you use pandas for this.
import pandas as pd
df = pd.read_csv('./poly.csv')
df['area'] = calculate_area(df) # implement calculate_area
df.to_csv('polyout.csv')
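A sketch of what calculate_area might look like, assuming the shoelace formula and the column names (x1, y1 ... x4, y4) used in the answer below; adjust the names to your actual header:
def calculate_area(df):
    #vectorized shoelace formula over the four corner columns; abs() makes it order-independent
    return (df['x1']*df['y2'] + df['x2']*df['y3'] + df['x3']*df['y4'] + df['x4']*df['y1']
            - df['y1']*df['x2'] - df['y2']*df['x3'] - df['y3']*df['x4'] - df['y4']*df['x1']).abs() / 2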
You're probably better off actually just using plain numpy, see the answer to this question Calculate area of polygon given (x,y) coordinates
My data: the first quadrangle is given clockwise, the second anticlockwise.
$ cat a.csv
a,b,x1,y1,x2,y2,x3,y3,x4,y4
a,b,3,3,3,9,4,9,4,3
e,f,0,0,5,0,5,5,0,5
$
Imports; I also import stdout to be able to show my results on screen:
from csv import reader, writer
from sys import stdout
use the csv classes
data = reader(open('a.csv'))
out = writer(stdout)
process the headers (assuming one row of headers)
headers = next(data)
headers = headers+['A']
out.writerow(headers)
loop on data, process data, output processed data
for row in data:
    # the list comprehension is unpacked into aptly named variables
    x1, y1, x2, y2, x3, y3, x4, y4 = [int(v) for v in row[2:]]
    # https://en.wikipedia.org/wiki/Shoelace_formula#Examples
    a = (x1*y2 + x2*y3 + x3*y4 + x4*y1 - y1*x2 - y2*x3 - y3*x4 - y4*x1)/2
    row.append(a)
    out.writerow(row)
I have saved the above in a file named area.py and finally we have
$ python3 area.py
a,b,x1,y1,x2,y2,x3,y3,x4,y4,A
a,b,3,3,3,9,4,9,4,3,-6.0
e,f,0,0,5,0,5,5,0,5,25.0
$
To use the shoelace formula as-is, remember that the points must be ordered consistently: with this formula, the anticlockwise quadrangle comes out positive (25.0) and the clockwise one negative (-6.0). If the sign is the wrong way round for your data, just write a = -(... or take abs(a).
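If the polygons may have more than four vertices, the same formula generalizes to a loop; a minimal sketch:
def shoelace_area(points):
    #signed shoelace area; positive when the vertices run anticlockwise
    s = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  #wrap around so the polygon closes
        s += x1*y2 - x2*y1
    return s / 2

print(abs(shoelace_area([(0, 0), (5, 0), (5, 5), (0, 5)])))  #25.0, matching the output above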

Bokeh Mapping Counties

I am attempting to modify this example with county data for Michigan. In short, it's working, but it seems to be adding some extra shapes here and there in the process of drawing the counties. I'm guessing that in some instances (where there are counties with islands), the island part needs to be listed as a separate "county", but I'm not sure about the other case, such as with Wayne county in the lower right part of the state.
Here's a picture of what I currently have:
Here's what I did so far:
Get county data from Bokeh's sample county data, just to get the state abbreviation per state number (my second, main data source only has state numbers). For this example, I'll simplify it by just filtering for state number 26.
Get state coordinates ('500k' file) by county from the U.S. Census site.
Use the following code to generate an 'interactive' map of Michigan.
Note: To pip install shapefile (really pyshp), I think I had to download the .whl file from here and then do pip install [path to .whl file].
import pandas as pd
import numpy as np
import shapefile
from bokeh.models import HoverTool, ColumnDataSource
from bokeh.palettes import Viridis6
from bokeh.plotting import figure, show, output_notebook
shpfile=r'Path\500K_US_Counties\cb_2015_us_county_500k.shp'
sf = shapefile.Reader(shpfile)
shapes = sf.shapes()
#Here are the rows from the shape file (plus lat/long coordinates)
rows=[]
lenrow=[]
for i, j in zip(sf.shapeRecords(), sf.shapes()):
    rows.append(i.record + [j.points])
    if len(i.record + [j.points]) != 10:
        print("Found record with irregular number of columns")
fields1=sf.fields[1:] #Ignore first field as it is not used (maybe it's a meta field?)
fields=[seq[0] for seq in fields1]+['Long_Lat']#Take the first element in each tuple of the list
c=pd.DataFrame(rows,columns=fields)
try:
    c['STATEFP'] = c['STATEFP'].astype(int)
except:
    pass
#cns=pd.read_csv(r'Path\US_Counties.csv')
#cns=cns[['State Abbr.','STATE num']]
#cns=cns.drop_duplicates('State Abbr.',keep='first')
#c=pd.merge(c,cns,how='left',left_on='STATEFP',right_on='STATE num')
c['Lat']=c['Long_Lat'].apply(lambda x: [e[0] for e in x])
c['Long']=c['Long_Lat'].apply(lambda x: [e[1] for e in x])
#c=c.loc[c['State Abbr.']=='MI']
c=c.loc[c['STATEFP']==26]
#latitude=x, longitude=y
county_xs = c['Lat']
county_ys = c['Long']
county_names = c['NAME']
county_colors = [Viridis6[np.random.randint(1, 6, size=1).tolist()[0]] for _ in range(len(c))]  #one random color per county ('aland' in the original snippet was undefined)
randns=np.random.randint(1,6, size=1).tolist()[0]
#county_colors = [Viridis6[e] for e in randns]
#county_colors = 'b'
source = ColumnDataSource(data=dict(
    x=county_xs,
    y=county_ys,
    color=county_colors,
    name=county_names,
    #rate=county_rates,
))
output_notebook()
TOOLS="pan,wheel_zoom,box_zoom,reset,hover,save"
p = figure(title="Title", tools=TOOLS,
           x_axis_location=None, y_axis_location=None)
p.grid.grid_line_color = None
p.patches('x', 'y', source=source,
          fill_color='color', fill_alpha=0.7,
          line_color="white", line_width=0.5)
hover = p.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = [
    ("Name", "@name"),
    #("Unemployment rate", "@rate%"),
    ("(Long, Lat)", "($x, $y)"),
]
show(p)
I'm looking for a way to avoid the extra lines and shapes.
Thanks in advance!
I have a solution to this problem, and I think I might even know why it is correct. First, let me quote Bryan Van de Ven from a Google Groups Bokeh discussion:
there is no built-in support for dealing with shapefiles. You will have to convert the data to the simple format that Bokeh understands. (As an aside: it would be great to have a contribution that made dealing with various GIS formats easier).
The format that Bokeh expects for patches is a "list of lists" of points. So something like:
xs = [ [patch0 x-coords], [patch1 x-coords], ... ]
ys = [ [patch0 y-coords], [patch1 y-coords], ... ]
Note that if a patch is comprised of multiple polygons, this is currently expressed by putting NaN values in the sublists. So, the task is basically to convert whatever form of polygon data you have to this format, and then Bokeh can display it.
So it seems like somehow you are ignoring NaNs or otherwise not handling multiple polygons properly. Here is some code that will download US census data, unzip it, read it properly for Bokeh, and make a data frame of lat, long, state, and county.
def get_map_data(shape_data_file, local_file_path):
    url = "http://www2.census.gov/geo/tiger/GENZ2015/shp/" + \
          shape_data_file + ".zip"
    zfile = local_file_path + shape_data_file + ".zip"
    sfile = local_file_path + shape_data_file + ".shp"
    dfile = local_file_path + shape_data_file + ".dbf"
    if not os.path.exists(zfile):
        print("Getting file: ", url)
        response = requests.get(url)
        with open(zfile, "wb") as code:
            code.write(response.content)
    if not os.path.exists(sfile):
        uz_cmd = 'unzip ' + zfile + " -d " + local_file_path
        print("Executing command: " + uz_cmd)
        os.system(uz_cmd)
    shp = open(sfile, "rb")
    dbf = open(dfile, "rb")
    sf = shapefile.Reader(shp=shp, dbf=dbf)
    lats = []
    lons = []
    ct_name = []
    st_id = []
    for shprec in sf.shapeRecords():
        st_id.append(int(shprec.record[0]))
        ct_name.append(shprec.record[5])
        lat, lon = map(list, zip(*shprec.shape.points))
        indices = shprec.shape.parts.tolist()
        #split the flat point list into one sublist per polygon part,
        #appending NaN after each part so Bokeh draws them as separate polygons
        lat = [lat[i:j] + [float('NaN')] for i, j in zip(indices, indices[1:] + [None])]
        lon = [lon[i:j] + [float('NaN')] for i, j in zip(indices, indices[1:] + [None])]
        lat = list(itertools.chain.from_iterable(lat))
        lon = list(itertools.chain.from_iterable(lon))
        lats.append(lat)
        lons.append(lon)
    map_data = pd.DataFrame({'x': lats, 'y': lons, 'state': st_id, 'county_name': ct_name})
    return map_data
The inputs to this function are the name of the shape file and the local directory where you want to download the map data. I know there are at least two available maps from the url in the function above that you could call:
map_low_res = "cb_2015_us_county_20m"
map_high_res = "cb_2015_us_county_500k"
If the US Census changes their url, which they certainly will one day, then you will need to change the input file name and the url variable. So, you can call the function above:
map_output = get_map_data(map_low_res, ".")
Then you could plot it just as the code in the original question does. Add a color data column first ("county_colors" in the original question), and then set it to the source like this:
source = ColumnDataSource(map_output)
To make this all work you will need to import libraries such as requests, os, itertools, shapefile, bokeh.models.ColumnDataSource, etc...
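For reference, a sketch of the import block that function assumes, taken straight from that list:
import os
import itertools

import requests
import shapefile
import pandas as pd
from bokeh.models import ColumnDataSource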
One solution:
Use the 1:20,000,000 shape file instead of the 1:500,000 file.
It loses some detail around the shape of each county but does not have any extra shapes (and just a couple of extra lines).
