NCO netcdf4 operations - ncwa (Averaging) - linux

I am having trouble trying to combine three files to be averaged. I am not so sure how to even start. I have three files
"nday1.06.nc , nday1.07.nc, nday.08.nc"
each with the variables
"filling on), ('SST', <class 'netCDF4._netCDF4.Variable'>
float32 SST(time, nlat, nlon)
long_name: Surface Potential Temperature
units: degC
coordinates: TLONG TLAT time
grid_loc: 2110
cell_methods: time: mean time: mean time: mean
_FillValue: 9.96921e+36
missing_value: 9.96921e+36
unlimited dimensions: time
current shape = (1, 2400, 3600)
I just need to average the SST variable across the three files and write the averages to an output file.

You need ncra not ncwa
http://nco.sourceforge.net/nco.html#ncra
ncra nday1.06.nc nday1.07.nc nday.08.nc out.nc

Similarly, you could use cdo, but you first need to merge the files:
cdo mergetime nday1.06.nc nday1.07.nc nday.08.nc mergedfile.nc
and then average:
cdo timmean mergedfile.nc out.nc
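For intuition, here is a rough NumPy sketch of what averaging over the time (record) dimension amounts to. The small arrays are made-up stand-ins for the SST fields of the three files, and the sketch assumes that cells equal to _FillValue are left out of the mean, which is my understanding of the default behaviour of both tools; it is not a replacement for ncra or cdo.

```python
import numpy as np

FILL = 9.96921e36  # _FillValue reported for SST in the question

# Made-up stand-ins for SST from three files, each with shape (time=1, nlat=2, nlon=2)
sst_a = np.array([[[1.0, 2.0], [FILL, 4.0]]])
sst_b = np.array([[[3.0, 2.0], [5.0, 4.0]]])
sst_c = np.array([[[5.0, FILL], [7.0, 4.0]]])

# Concatenate along the time axis, then mask the fill values
stacked = np.ma.masked_values(np.concatenate([sst_a, sst_b, sst_c], axis=0), FILL)

# Average over time; masked cells do not contribute to the mean
mean_sst = stacked.mean(axis=0)
print(mean_sst)
```

The result has shape (nlat, nlon), one averaged value per grid cell.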

Related

python3: Split time series by diurnal periods

I have the following dataset:
01/05/2020,00,26.3,27.5,26.3,80,81,73,22.5,22.7,22.0,993.7,993.7,993.0,0.0,178,1.2,-3.53,0.0
01/05/2020,01,26.1,26.8,26.1,79,80,75,22.2,22.4,21.9,994.4,994.4,993.7,1.1,22,2.0,-3.54,0.0
01/05/2020,02,25.4,26.1,25.4,80,81,79,21.6,22.3,21.6,994.7,994.7,994.4,0.1,335,2.3,-3.54,0.0
01/05/2020,03,23.3,25.4,23.3,90,90,80,21.6,21.8,21.5,994.7,994.8,994.6,0.9,263,1.5,-3.54,0.0
01/05/2020,04,22.9,24.2,22.9,89,90,86,21.0,22.1,21.0,994.2,994.7,994.2,0.3,268,2.0,-3.54,0.0
01/05/2020,05,22.8,23.1,22.8,90,91,89,21.0,21.4,20.9,993.6,994.2,993.6,0.7,264,1.5,-3.54,0.0
01/05/2020,06,22.2,22.8,22.2,92,92,90,20.9,21.2,20.8,993.6,993.6,993.4,0.8,272,1.6,-3.54,0.0
01/05/2020,07,22.6,22.6,22.0,91,93,91,21.0,21.2,20.7,993.4,993.6,993.4,0.4,284,2.3,-3.49,0.0
01/05/2020,08,21.6,22.6,21.5,92,92,90,20.2,20.9,20.1,993.8,993.8,993.4,0.4,197,2.1,-3.54,0.0
01/05/2020,09,22.0,22.1,21.5,92,93,92,20.7,20.8,20.2,994.3,994.3,993.7,0.0,125,2.1,-3.53,0.0
01/05/2020,10,22.7,22.7,21.9,91,92,91,21.2,21.2,20.5,995.0,995.0,994.3,0.0,354,0.0,70.99,0.0
01/05/2020,11,25.0,25.0,22.7,83,91,82,21.8,22.1,21.1,995.5,995.5,995.0,0.8,262,1.5,744.8,0.0
01/05/2020,12,27.9,28.1,24.9,72,83,70,22.3,22.8,21.6,996.1,996.1,995.5,0.7,228,1.9,1392.,0.0
01/05/2020,13,30.4,30.4,27.7,58,72,55,21.1,22.6,20.4,995.9,996.2,995.9,1.6,134,3.7,1910.,0.0
01/05/2020,14,31.7,32.3,30.1,50,58,48,20.2,21.3,19.7,995.8,996.1,995.8,3.0,114,5.4,2577.,0.0
01/05/2020,15,32.9,33.2,31.8,44,50,43,19.1,20.5,18.6,994.9,995.8,994.9,0.0,128,5.6,2853.,0.0
01/05/2020,16,33.2,34.4,32.0,46,48,41,20.0,20.0,18.2,994.0,994.9,994.0,0.0,125,4.3,2700.,0.0
01/05/2020,17,33.1,34.5,32.7,44,46,39,19.2,19.9,18.5,993.4,994.1,993.4,0.0,170,1.6,2806.,0.0
01/05/2020,18,33.6,34.2,32.6,41,47,40,18.5,20.0,18.3,992.6,993.4,992.6,0.0,149,0.0,2319.,0.0
01/05/2020,19,33.5,34.7,32.1,43,49,39,19.2,20.4,18.3,992.3,992.6,992.3,0.3,168,4.1,1907.,0.0
01/05/2020,20,32.1,33.9,32.1,49,51,41,20.2,20.7,18.5,992.4,992.4,992.3,0.1,192,3.7,1203.,0.0
01/05/2020,21,29.9,32.2,29.9,62,62,49,21.8,21.9,20.2,992.3,992.4,992.2,0.0,188,2.9,408.0,0.0
01/05/2020,22,28.5,29.9,28.4,67,67,62,21.8,22.0,21.7,992.5,992.5,992.3,0.4,181,2.3,6.817,0.0
01/05/2020,23,27.8,28.5,27.8,71,71,66,22.1,22.1,21.5,993.1,993.1,992.5,0.0,225,1.6,-3.39,0.0
02/05/2020,00,27.4,28.2,27.3,75,75,68,22.5,22.5,21.7,993.7,993.7,993.1,0.5,139,1.5,-3.54,0.0
02/05/2020,01,27.3,27.7,27.3,72,75,72,21.9,22.6,21.9,994.3,994.3,993.7,0.0,126,1.1,-3.54,0.0
02/05/2020,02,25.4,27.3,25.2,85,85,72,22.6,22.8,21.9,994.4,994.5,994.3,0.1,256,2.6,-3.54,0.0
02/05/2020,03,25.5,25.6,25.3,84,85,82,22.5,22.7,22.1,994.3,994.4,994.2,0.0,329,0.7,-3.54,0.0
02/05/2020,04,24.5,25.5,24.5,86,86,82,22.0,22.5,21.9,993.9,994.3,993.9,0.0,290,1.2,-3.54,0.0
02/05/2020,05,24.0,24.5,23.5,87,88,86,21.6,22.1,21.3,993.6,993.9,993.6,0.7,285,1.3,-3.54,0.0
02/05/2020,06,23.7,24.1,23.7,87,87,85,21.3,21.6,21.3,993.1,993.6,993.1,0.1,305,1.1,-3.51,0.0
02/05/2020,07,22.7,24.1,22.5,91,91,86,21.0,21.7,20.7,993.1,993.3,993.1,0.6,220,1.1,-3.54,0.0
02/05/2020,08,22.9,22.9,22.6,92,92,91,21.5,21.5,21.0,993.2,993.2,987.6,0.0,239,1.5,-3.53,0.0
02/05/2020,09,22.9,23.0,22.8,93,93,92,21.7,21.7,21.4,993.6,993.6,993.2,0.0,289,0.4,-3.53,0.0
02/05/2020,10,23.5,23.5,22.8,92,93,92,22.1,22.1,21.6,994.3,994.3,993.6,0.0,256,0.0,91.75,0.0
02/05/2020,11,26.1,26.2,23.5,80,92,80,22.4,23.1,22.2,995.0,995.0,994.3,1.1,141,1.9,789.0,0.0
02/05/2020,12,28.7,28.7,26.1,69,80,68,22.4,22.7,22.1,995.5,995.5,995.0,0.0,116,2.2,1468.,0.0
02/05/2020,13,31.4,31.4,28.6,56,69,56,21.6,22.9,21.0,995.5,995.7,995.4,0.0,65,0.0,1762.,0.0
02/05/2020,14,32.1,32.4,30.6,48,58,47,19.8,22.0,19.3,995.0,995.6,990.6,0.0,105,0.0,2657.,0.0
02/05/2020,15,34.0,34.2,31.7,43,48,42,19.6,20.1,18.6,993.9,995.0,993.9,3.0,71,6.0,2846.,0.0
02/05/2020,16,34.7,34.7,32.3,38,48,38,18.4,20.3,18.3,992.7,993.9,992.7,1.4,63,6.3,2959.,0.0
02/05/2020,17,34.0,34.7,32.7,42,46,38,19.2,20.0,18.4,991.7,992.7,991.7,2.2,103,4.8,2493.,0.0
02/05/2020,18,34.3,34.7,33.6,41,42,38,19.1,19.4,18.0,991.2,991.7,991.2,2.0,141,4.8,2593.,0.0
02/05/2020,19,33.5,34.5,32.5,42,47,39,18.7,20.0,18.4,990.7,991.4,989.9,1.8,132,4.2,1317.,0.0
02/05/2020,20,32.5,34.2,32.5,47,48,40,19.7,20.3,18.7,990.5,990.7,989.8,1.3,191,4.2,1250.,0.0
02/05/2020,21,30.5,32.5,30.5,59,59,47,21.5,21.6,20.0,979.8,990.5,979.5,0.1,157,2.9,345.5,0.0
02/05/2020,22,28.6,30.5,28.6,67,67,59,21.9,21.9,21.5,978.9,980.1,978.7,0.6,166,2.2,1.122,0.0
02/05/2020,23,27.2,28.7,27.2,74,74,66,22.1,22.2,21.6,978.9,979.3,978.6,0.0,246,1.7,-3.54,0.0
03/05/2020,00,26.5,27.2,26.0,77,80,74,22.2,22.5,22.0,979.0,979.1,978.7,0.0,179,1.4,-3.54,0.0
03/05/2020,01,26.0,26.6,26.0,80,80,77,22.4,22.5,22.1,979.1,992.4,978.7,0.0,276,0.6,-3.54,0.0
03/05/2020,02,26.0,26.5,26.0,79,81,75,22.1,22.5,21.7,978.8,979.1,978.5,0.0,290,0.6,-3.53,0.0
03/05/2020,03,25.3,26.0,25.3,83,83,79,22.2,22.4,21.8,978.6,989.4,978.5,0.5,303,1.0,-3.54,0.0
03/05/2020,04,25.3,25.6,24.6,81,85,81,21.9,22.5,21.7,978.1,992.7,977.9,0.7,288,1.5,-3.00,0.0
03/05/2020,05,23.7,25.3,23.7,88,88,81,21.5,21.9,21.5,977.6,991.8,977.3,1.2,256,1.8,-3.54,0.0
03/05/2020,06,23.3,23.7,23.3,91,91,88,21.7,21.7,21.5,976.9,977.6,976.7,0.4,245,1.8,-3.54,0.0
03/05/2020,07,23.0,23.6,23.0,91,91,89,21.4,21.9,21.3,976.7,977.0,976.4,0.9,257,1.9,-3.54,0.0
03/05/2020,08,23.4,23.4,22.9,90,92,90,21.7,21.7,21.3,976.8,976.9,976.5,0.4,294,1.6,-3.52,0.0
03/05/2020,09,23.0,23.5,23.0,88,90,87,21.0,21.6,20.9,992.1,992.1,976.7,0.8,263,1.6,-3.54,0.0
03/05/2020,10,23.2,23.2,22.5,91,92,88,21.6,21.6,20.8,993.0,993.0,992.2,0.1,226,1.5,29.03,0.0
03/05/2020,11,26.0,26.1,23.2,77,91,76,21.6,22.1,21.5,993.8,993.8,982.1,0.0,120,0.9,458.1,0.0
03/05/2020,12,26.6,27.0,25.5,76,80,76,22.1,22.5,21.4,982.7,994.3,982.6,0.3,121,2.3,765.3,0.0
03/05/2020,13,28.5,28.7,26.6,66,77,65,21.5,23.1,21.2,982.5,994.2,982.4,1.4,130,3.2,1219.,0.0
03/05/2020,14,31.1,31.1,28.5,55,66,53,21.0,21.8,19.9,982.3,982.7,982.1,1.2,129,3.7,1743.,0.0
03/05/2020,15,31.6,31.8,30.7,50,55,49,19.8,20.8,19.2,992.9,993.5,982.2,1.1,119,5.1,1958.,0.0
03/05/2020,16,32.7,32.8,31.1,46,52,46,19.6,20.7,19.2,991.9,992.9,991.9,0.8,122,4.4,1953.,0.0
03/05/2020,17,32.3,33.3,32.0,44,49,42,18.6,20.2,18.2,990.7,991.9,979.0,2.6,133,5.9,2463.,0.0
03/05/2020,18,33.1,33.3,31.9,44,50,44,19.3,20.8,18.9,989.9,990.7,989.9,1.1,170,5.4,2033.,0.0
03/05/2020,19,32.4,33.2,32.2,47,47,44,19.7,20.0,18.7,989.5,989.9,989.5,2.4,152,5.2,1581.,0.0
03/05/2020,20,31.2,32.5,31.2,53,53,46,20.6,20.7,19.4,989.5,989.7,989.5,1.7,159,4.6,968.6,0.0
03/05/2020,21,29.7,32.0,29.7,62,62,51,21.8,21.8,20.5,989.7,989.7,989.4,0.8,154,4.0,414.2,0.0
03/05/2020,22,28.3,29.7,28.3,69,69,62,22.1,22.1,21.7,989.9,989.9,989.7,0.3,174,2.0,6.459,0.0
03/05/2020,23,26.9,28.5,26.9,75,75,67,22.1,22.5,21.7,990.5,990.5,989.8,0.2,183,1.0,-3.54,0.0
The second column is time (hour). I want to separate the dataset by morning (06-11), afternoon (12-17), evening (18-23) and night (00-05). How can I do it?
You can use pd.cut:
bins = [-1,5,11,17,24]
labels = ['night', 'morning', 'afternoon', 'evening']
df['day_part'] = pd.cut(df['hour'], bins=bins, labels=labels)
I added column names, including Hour for the second column.
Then I used read_csv, which parses the hour values as integers (dropping
the leading zeroes), so the Hour column is just int.
To mark each row with its diurnal period in a new column, use:
df['period'] = pd.cut(df.Hour, bins=[0, 6, 12, 18, 24], right=False,
labels=['night', 'morning', 'afternoon', 'evening'])
Then you can e.g. use groupby to process your groups.
Because I used right=False parameter, the bins are closed on the left
side, thus bin limits are more natural (no need for -1 as an hour).
And bin limits (except for the last) are just starting hours of
each period - quite natural notation.
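A minimal end-to-end sketch of this approach, using a few rows and only the first three columns of the question's data. The column names Date, Hour and Temp are placeholders of mine, since the question does not name the columns.

```python
import io
import pandas as pd

# A few rows in the question's layout, truncated to three columns for brevity.
# The column names are assumptions, not given in the question.
raw = """01/05/2020,00,26.3
01/05/2020,07,22.6
01/05/2020,13,30.4
01/05/2020,19,33.5
01/05/2020,23,27.8
"""
df = pd.read_csv(io.StringIO(raw), header=None, names=['Date', 'Hour', 'Temp'])

# Left-closed bins: [0, 6) night, [6, 12) morning, [12, 18) afternoon, [18, 24) evening
df['period'] = pd.cut(df['Hour'], bins=[0, 6, 12, 18, 24], right=False,
                      labels=['night', 'morning', 'afternoon', 'evening'])

# Example of processing each group: mean of the Temp column per period
mean_temp = df.groupby('period', observed=True)['Temp'].mean()
print(df)
print(mean_temp)
```

To get separate DataFrames instead of a summary, iterate over df.groupby('period', observed=True), which yields one (label, sub-frame) pair per period.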

How to create a time array in python for seasonal data

I am working with paleoclimate data (536-550 CE) in NetCDF format, which I imported with xarray. The time format is a bit strange:
import xarray as xr
ds_tas_01 = xr.open_dataset('ue536a01_temp2_seasmean.nc')
ds_tas_01['time']
<xarray.DataArray 'time' (time: 61)>
array([15360215.25, 15360430.75, 15360731.75, 15361031.75, 15370131.75,
15370430.75, 15370731.75, 15371031.75, 15380131.75, 15380430.75,
15380731.75, 15381031.75, 15390131.75, 15390430.75, 15390731.75,
15391031.75, 15400131.75, 15400430.75, 15400731.75, 15401031.75,
15410131.75, 15410430.75, 15410731.75, 15411031.75, 15420131.75,
15420430.75, 15420731.75, 15421031.75, 15430131.75, 15430430.75,
15430731.75, 15431031.75, 15440131.75, 15440430.75, 15440731.75,
15441031.75, 15450131.75, 15450430.75, 15450731.75, 15451031.75,
15460131.75, 15460430.75, 15460731.75, 15461031.75, 15470131.75,
15470430.75, 15470731.75, 15471031.75, 15480131.75, 15480430.75,
15480731.75, 15481031.75, 15490131.75, 15490430.75, 15490731.75,
15491031.75, 15500131.75, 15500430.75, 15500731.75, 15501031.75,
15501231.75])
Coordinates:
* time (time) float64 1.536e+07 1.536e+07 1.536e+07 ... 1.55e+07 1.55e+07
Attributes:
standard_name: time
bounds: time_bnds
units: day as %Y%m%d.%f
calendar: proleptic_gregorian
axis: T
So I want to make my own time array that I can use to plot the climate data. For monthly data I used:
import numpy as np
time = np.arange('0536-01-31', '0551-01-31', dtype='datetime64[M]')
which gives me an array with the years and months between those two dates.
Now I have grouped my data by season using cdo seasmean ('djf', 'mam', 'jja', 'son') and got 61 values instead of 180. Is there a way to regroup the 'time' array to seasonal values, or to create a new time array that corresponds to the seasonal data?
I made it work by setting the step size in np.arange:
time = np.arange('0536-01-31', '0551-01-31', step=3, dtype='datetime64[M]')
This gives a time step every three months, so essentially every 'season'.
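The same idea can be written with explicit datetime64 and timedelta64 objects, which makes the three-month step unambiguous. This is only a sketch: the start and end months are taken from the question, and whether the first entry should line up with a particular season is something to check against the file.

```python
import numpy as np

start = np.datetime64('0536-01')   # month precision, as in the question's range
stop = np.datetime64('0551-01')    # exclusive upper bound
step = np.timedelta64(3, 'M')      # one entry every three months

time = np.arange(start, stop, step)
print(time[:4])
print(len(time))
```

Note that 180 months in steps of three give 60 entries, while the seasonal file in the question has 61 time steps, so the array may need one more entry (or a shifted start) before it matches the data.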

Resampling Time Series Data (Pandas Python 3)

Trying to convert data at daily frequency to weekly frequency.
In:
weeklyaapl = pd.DataFrame()
weeklyaapl['Open'] = aapl.Open.resample('W').iloc[0]
#here I am trying to take the first value of the aapl.Open,
#that falls within the week.
Out:
ValueError: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)
I want the true open (the first open that prints for the week) (the open of the first day in that week).
It instead wants me to take the mean of the daily open values for a given week using .mean(), which is not the information I need.
Can't seem to interpret the error, documentation isn't helping either.
I think you need first(), which takes the first value in each weekly bin:
aapl.resample('W').first()
Output:
Open High Low Close Volume
Date
2010-01-10 30.49 30.64 30.34 30.57 123432050
2010-01-17 30.40 30.43 29.78 30.02 115557365
2010-01-24 29.76 30.74 29.61 30.72 182501620
2010-01-31 28.93 29.24 28.60 29.01 266424802
2010-02-07 27.48 28.00 27.33 27.82 187468421
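If a full weekly bar is wanted rather than the first value of every column, one option is to choose an aggregation per column. The sketch below uses made-up daily data, not the real AAPL prices, and the choice of aggregations (first open, highest high, lowest low, last close, summed volume) is my assumption about what a weekly bar should contain.

```python
import pandas as pd

# Made-up daily bars for two business weeks (Mon 2010-01-04 to Fri 2010-01-15)
dates = pd.date_range('2010-01-04', periods=10, freq='B')
opens = [10.0, 11.0, 12.0, 13.0, 14.0, 20.0, 21.0, 22.0, 23.0, 24.0]
daily = pd.DataFrame({
    'Open': opens,
    'High': [o + 1.0 for o in opens],
    'Low': [o - 1.0 for o in opens],
    'Close': [o + 0.5 for o in opens],
    'Volume': [100] * 10,
}, index=dates)

# One aggregation per column; 'W' bins end on Sunday
weekly = daily.resample('W').agg({
    'Open': 'first',    # open of the first trading day in the week
    'High': 'max',
    'Low': 'min',
    'Close': 'last',    # close of the last trading day in the week
    'Volume': 'sum',
})
print(weekly)
```

For the Open column alone, daily['Open'].resample('W').first() gives the same values as the 'Open' column here.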

Writing to a NetCDF3 file using module netcdf4 in python

I'm having an issue writing to a netcdf3 file using the netcdf4 functions. I tried using the createVariable function but it gives me this error: NetCDF: Attempting netcdf-4 operation on netcdf-3 file
nc = Dataset(root.fileName,'a',format="NETCDF4")
Hycom_U = nc.createVariable('/variables/Hycom_U','float',('time','lat','lon',))
Hycom_V = nc.createVariable('/variables/Hycom_V','f4',('time','lat','lon',))
nc=
root group (NETCDF3_CLASSIC data model, file format NETCDF3):
netcdf_library_version: 4.1.3
format_version: HFRNet_1.0.0
product_version: HFRNet_1.1.05
Conventions: CF-1.0
title: Near-Real Time Surface Ocean Velocity, Hawaii,
2 km Resolution
institution: Scripps Institution of Oceanography
source: Surface Ocean HF-Radar
history: 22-Feb-2017 00:55:46: NetCDF file created
22-Feb-2017 00:55:46: Filtered U and V by GDOP < 1.25 ;
FMRC Best Dataset
references: Terrill, E. et al., 2006. Data Management and Real-time
Distribution in the HF-Radar National Network. Proceedings
of the MTS/IEEE Oceans 2006 Conference, Boston MA,
September 2006.
creator_name: Mark Otero
creator_email: motero#ucsd.edu
creator_url: http://cordc.ucsd.edu/projects/mapping/
summary: Surface ocean velocities estimated from HF-Radar are
representative of the upper 0.3 - 2.5 meters of the
ocean. The main objective of near-real time
processing is to produce the best product from
available data at the time of processing. Radial
velocity measurements are obtained from individual
radar sites through the U.S. HF-Radar Network.
Hourly radial data are processed by unweighted
least-squares on a 2 km resolution grid of Hawaii
to produce near real-time surface current maps.
geospatial_lat_min: 20.487279892
geospatial_lat_max: 21.5720806122
geospatial_lon_min: -158.903594971
geospatial_lon_max: -157.490005493
grid_resolution: 2km
grid_projection: equidistant cylindrical
regional_description: Unites States, Hawaiian Islands
cdm_data_type: GRID
featureType: GRID
location: Proto fmrc:HFRADAR,_US_Hawaii,_2km_Resolution,_Hourly_RTV
History: Translated to CF-1.0 Conventions by Netcdf-Java CDM (NetcdfCFWriter)
Original Dataset = fmrc:HFRADAR,_US_Hawaii,_2km_Resolution,_Hourly_RTV; Translation Date = Thu Feb 23 13:35:32 GMT 2017
dimensions(sizes): time(25), lat(61), lon(77)
variables(dimensions): float32 u(time,lat,lon), float64 time_run(time), float64 time(time), float32 lat(lat), float32 lon(lon), float32 v(time,lat,lon)
groups:
What are the netcdf 3 operations I can use to add data into the file? I found out that I could manually add data by simply doing nc.variables["Hycom_U"] = U2, which directly adds the data, but nothing else. Is there a better way to do this?
I believe the issue is that you're claiming the file to be netCDF4 format:
nc = Dataset(root.fileName,'a',format="NETCDF4")
but you really want to indicate that it's netCDF3:
nc = Dataset(root.fileName,'a',format="NETCDF3_CLASSIC")
Additional documentation can be found here.
I figured it out! I simply couldn't use a path as a varname.
Hycom_U = nc.createVariable('Hycom_U','float',('time','lat','lon',))
It properly created a variable for me.

How to determine a formula for execution time given quantitative data, Excel, trendlines, monte carlo simulation

Can I get your help on some Maths and possibly Excel?
I have benchmarked my app increasing the number of iterations and number of obligors recording the time taken in seconds with the following result:
200 400 600 800 1000 1200 1400 1600 1800 2000
20000 15.627681 30.0968663 44.7592684 60.9037558 75.8267358 90.3718977 105.8749983 121.0030672 135.9191249 150.3331682
40000 31.7202111 62.3603882 97.2085204 128.8111731 156.2443206 186.6374271 218.324317 249.2699288 279.6008184 310.9970803
60000 47.0708635 92.4599437 138.874287 186.0576007 231.2181381 280.541207 322.9836878 371.3076757 413.4058622 459.6208335
80000 60.7346238 120.3216303 180.471169 241.668982 300.4283548 376.9639188 417.5231669 482.6288981 554.9740194 598.0394434
100000 76.7535915 150.7479245 227.5125656 304.3908046 382.5900043 451.6034296 526.0730786 609.0358776 679.0268121 779.6887277
120000 90.4174626 179.5511355 269.4099593 360.2934453 448.4387573 537.1406039 626.7325734 727.6132992 807.4767327 898.307638
How can I now come up with a function for T (time taken in seconds) as an expression of number of obligors O and number of iterations I
Thanks
I'm not quite sure of the data involved due to the question construction/presentation.
Assuming you're looking for y = f(x): if you load the data into Excel, you can use the SLOPE and INTERCEPT functions on the data ranges to derive an expression of the form
y = mx+c
and thus a linear function.
If you want a quadratic or cubic, you can use LINEST with a column of time data squared/cubed etc. to give you quadratic/cubic parameters, and thus derive an appropriate higher order function.
I spoke to one of the quants here: the function is of the form T = KNO, where T is time, K is some constant, N is the number of iterations and O is the number of obligors.
Rearrange to K = T/(NO), evaluate this for every sample point, take the average as K, and use the standard deviation as the error.
I did this for my data and get:
T = 3.81524E-06 * N * O (with 1.9% error), this is a pretty good approximation.
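The same calculation can be sketched in Python with NumPy, using the benchmark table from the question. The question does not say which axis holds obligors and which holds iterations; since only the product N * O enters the formula, the sketch just calls them row and column values.

```python
import numpy as np

col_values = np.arange(200, 2001, 200)        # header row of the table
row_values = np.arange(20000, 120001, 20000)  # first column of the table

# Timings in seconds, one row per row value, one column per column value
times = np.array([
    [15.627681, 30.0968663, 44.7592684, 60.9037558, 75.8267358,
     90.3718977, 105.8749983, 121.0030672, 135.9191249, 150.3331682],
    [31.7202111, 62.3603882, 97.2085204, 128.8111731, 156.2443206,
     186.6374271, 218.324317, 249.2699288, 279.6008184, 310.9970803],
    [47.0708635, 92.4599437, 138.874287, 186.0576007, 231.2181381,
     280.541207, 322.9836878, 371.3076757, 413.4058622, 459.6208335],
    [60.7346238, 120.3216303, 180.471169, 241.668982, 300.4283548,
     376.9639188, 417.5231669, 482.6288981, 554.9740194, 598.0394434],
    [76.7535915, 150.7479245, 227.5125656, 304.3908046, 382.5900043,
     451.6034296, 526.0730786, 609.0358776, 679.0268121, 779.6887277],
    [90.4174626, 179.5511355, 269.4099593, 360.2934453, 448.4387573,
     537.1406039, 626.7325734, 727.6132992, 807.4767327, 898.307638],
])

# K = T / (N * O) for every sample point, via broadcasting
k = times / (row_values[:, None] * col_values[None, :])

k_mean = k.mean()
k_rel_spread = k.std() / k_mean  # standard deviation relative to the mean

print(f"K = {k_mean:.5e}, relative spread = {k_rel_spread:.1%}")
```

A small relative spread supports the T = K * N * O model; a large or systematically drifting one would suggest trying a different functional form.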
Create a chart in Excel, add a trendline, and select to have the equation displayed on the chart.
To clarify: You have tabular data below which you want to fit to some function f(O,I)=t?
200 400 600 800 1000 1200 1400 1600 1800 2000
20000 15.627681 30.0968663 44.7592684 60.9037558 75.8267358 90.3718977 105.8749983 121.0030672 135.9191249 150.3331682
40000 31.7202111 62.3603882 97.2085204 128.8111731 156.2443206 186.6374271 218.324317 249.2699288 279.6008184 310.9970803
60000 47.0708635 92.4599437 138.874287 186.0576007 231.2181381 280.541207 322.9836878 371.3076757 413.4058622 459.6208335
80000 60.7346238 120.3216303 180.471169 241.668982 300.4283548 376.9639188 417.5231669 482.6288981 554.9740194 598.0394434
100000 76.7535915 150.7479245 227.5125656 304.3908046 382.5900043 451.6034296 526.0730786 609.0358776 679.0268121 779.6887277
120000 90.4174626 179.5511355 269.4099593 360.2934453 448.4387573 537.1406039 626.7325734 727.6132992 807.4767327 898.307638
A rough guess looks like both O & I are linear. So f is in the form t = aO + bI + c. Plug in a few (O,I,t) and see what a,b,c should be.
