transform timestamp (seconds since 1.1.1904) from NetCDF file - python-3.x

Even though there are several posts concerning NetCDF files and timestamp conversion, I am drawing a blank today.
I read in a NetCDF data set (version 3) and then query the time variable's information:
# Load required Python packages
import netCDF4 as nc
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import pandas as pd
#read in a NetCDF data set
ds = nc.Dataset(fn)
# call time variable information
print(ds['time'])
The output is:
<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
units: seconds since 1904-01-01 00:00:00.000 00:00
long_name: time UTC
axis: T
unlimited dimensions: time
current shape = (5760,)
filling on, default _FillValue of 9.969209968386869e+36 used
Now I would like to transform the seconds-since-1.1.1904 timestamps into a DD.MM.YYYY HH:MM:SS.sss format. (By the way: why is there a second 00:00 after the timestamp in the units string?)
(1) I tried:
t = ds['time'][:]
dtime = []
dtime = (pd.to_datetime(t, format='%d.%m.%Y %H:%M:%S.micros') - datetime(1904, 1, 1)).total_seconds()
And I get the error:
pandas_libs\tslibs\strptime.pyx in pandas._libs.tslibs.strptime.array_strptime()
time data '3730320000' does not match format '%d.%m.%Y %H:%M:%S' (match)
(2) I tried:
d = datetime.strptime("01-01-1904", "%m-%d-%Y")
dt = d + timedelta(seconds=(t))
And I get the error:
TypeError: unsupported type for timedelta seconds component: MaskedArray
(3) I tried:
d = datetime.strptime("%m-%d-%Y", "01-01-1904")
dt = d + timedelta(seconds=(ds['time']))
And I get the error:
unsupported type for timedelta seconds component: netCDF4._netCDF4.Variable
Does somebody have a clearer view of the solution than I have at the moment?
Thanks,
Swawa

The netCDF4 Python library has a method for this: num2date() (https://unidata.github.io/netcdf4-python/#num2date). There is no need for the datetime module.
NetCDF4 variables carry metadata attributes which describe the variable, as seen in the output of your print:
print(ds['time']) # In particular, note the time variable's `units` attribute.
# t contains just the numeric time values, in `seconds since 1904-01-01 00:00:00.000 00:00`
t = ds['time'][:]
# t_var is the NetCDF4 variable itself, which carries the `units` attribute.
t_var = ds['time']
dtime = nc.num2date(t, t_var.units)
The above should give you all the times in dtime as datetime objects.
print(dtime[0].isoformat())
print(dtime[-1].isoformat())
A simpler way would be:
dtime = nc.num2date(ds['time'][:], ds['time'].units)
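For completeness, the end-to-end conversion to the DD.MM.YYYY HH:MM:SS.sss format you asked about might look like this (a minimal sketch, assuming the dataset from the question is open as ds; only_use_cftime_datetimes=False asks num2date for plain Python datetime objects, which works here because the calendar is standard):
import netCDF4 as nc

dtime = nc.num2date(ds['time'][:], ds['time'].units,
                    only_use_cftime_datetimes=False)

# %f yields microseconds; keep the first three digits for milliseconds
stamps = [d.strftime('%d.%m.%Y %H:%M:%S.%f')[:-3] for d in dtime]
print(stamps[0])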


Python datetime conversion

import datetime as dt
from dateutil.tz import gettz
import time
timezone_a = "Japan"
timezone_b = "Europe/London"
unix_time = 1619238722
result = dt.datetime.fromtimestamp(unix_time, gettz(timezone_a)).strftime("%Y-%m-%d-%H-%M-%S")
print(result, timezone_a)
result = dt.datetime.fromtimestamp(unix_time, gettz(timezone_b)).strftime("%Y-%m-%d-%H-%M-%S")
print(result, timezone_b)
# This code prints
"""
2021-04-24-13-32-02 Japan
2021-04-24-05-32-02 Europe/London
I am trying to reverse it backwards so that input is
2021-04-24-13-32-02 Japan
2021-04-24-05-32-02 Europe/London
And output is 1619238722
"""
Hello, I am trying to figure out how to convert a string with a timezone into Unix time. Any help would be appreciated. Thanks!
afaik, there is no built-in method in the standard lib to parse IANA time zone names. But you can do it yourself like this:
from datetime import datetime
from zoneinfo import ZoneInfo # Python 3.9+
t = ["2021-04-24-13-32-02 Japan", "2021-04-24-05-32-02 Europe/London"]
# split strings into tuples of date/time + time zone
t = [elem.rsplit(' ', 1) for elem in t]
# parse first element of the resulting tuples to datetime
# add time zone (second element from tuple)
# and take unix time
unix_t = [datetime.strptime(elem[0], "%Y-%m-%d-%H-%M-%S")
              .replace(tzinfo=ZoneInfo(elem[1]))
              .timestamp()
          for elem in t]
# unix_t
# [1619238722.0, 1619238722.0]
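As a quick round-trip check (a hypothetical snippet, reusing unix_t from above), converting the recovered Unix time back to a zone-aware string reproduces the input:
from datetime import datetime
from zoneinfo import ZoneInfo

dt_back = datetime.fromtimestamp(unix_t[0], ZoneInfo("Japan"))
print(dt_back.strftime("%Y-%m-%d-%H-%M-%S"))  # 2021-04-24-13-32-02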
See if this code works. (Note: strptime's %Z only matches a limited set of zone names, such as UTC and GMT, so it may not parse IANA names like Europe/London.)
# convert string to datetime format
date = datetime.datetime.strptime(date, "%Y-%m-%d-%H-%M-%S %Z")
# convert datetime to timestamp
unixtime = datetime.datetime.timestamp(date)

How to show alternative calendar dates in mplfinance?

TL;DR - The issue
I have an mplfinance plot based on a pandas dataframe in which the indices are in Gregorian calendar format, and I need to have them displayed in Jalali format.
My data and code
My data looks like this:
open high low close
date
2021-03-15 67330.0 69200.0 66870.0 68720.0
2021-03-16 69190.0 71980.0 69000.0 71620.0
2021-03-17 72450.0 73170.0 71700.0 71820.0
2021-03-27 71970.0 73580.0 70000.0 73330.0
2021-03-28 73330.0 73570.0 71300.0 71850.0
... ... ... ... ...
The first column is both the date and the index; this is required by mplfinance to plot the data correctly.
I can plot it with something like this:
import mplfinance as mpf
mpf.plot(chart_data.tail(7), figratio=(16,9), type="candle", style='yahoo', ylabel='', tight_layout=True, xrotation=90)
Where chart_data is the data above and the rest are pretty much formatting stuff.
What I have now
My chart looks like this:
However, I need the dates to look like this: 1400-01-12. Here's a table of equivalents to further demonstrate my case.
2021-03-15 1399-12-25
2021-03-16 1399-12-26
2021-03-17 1399-12-27
2021-03-27 1400-01-07
2021-03-28 1400-01-08
What I've tried
Setting Jdates as my indices:
chart_data.index = history.jdate
mpf.plot(chart_data_j)
Throws this exception:
TypeError('Expect data.index as DatetimeIndex')
So I tried converting the jdates into datetimes:
chart_data_j.index = pd.to_datetime(history.jdate)
Which threw an out of bounds exception:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1398-03-18 00:00:00
So I thought maybe changing the timezone/locale would be an option, and I tried changing the timezones, following the official docs:
pd.to_datetime(history.date).tz_localize(tz='US/Eastern')
But I got this exception:
raise TypeError(f"{ax_name} is not a valid DatetimeIndex or PeriodIndex")
And finally I tried using libraries such as PersianTools and pandas_jalali to no avail.
You can get this to work by creating your own custom DateFormatter class, and using mpf.plot() kwarg returnfig=True to gain access to the Axes objects to be able to install your own custom DateFormatter.
I have written a custom DateFormatter (see code below) that is aware of the special way that MPLfinance handles the x-axis when show_nontrading=False (i.e. the default value).
import pandas as pd
import mplfinance as mpf
import jdatetime as jd
import matplotlib.dates as mdates
from matplotlib.ticker import Formatter
class JalaliDateTimeFormatter(Formatter):
    """
    Formatter for JalaliDate in mplfinance.
    Handles both `show_nontrading=False` and `show_nontrading=True`.
    When show_nontrading=False, the x-axis is indexed by an integer
    representing the row number in the dataframe. Thus this is a
    formatter for an axis indexed by integers, where the integers
    represent the index location of the datetime object that should be
    formatted at that location. This formatter is typically used when
    plotting datetimes on an axis but the user does NOT want to see gaps
    where days (or times) are missing. To use: plot the data against
    a range of integers equal in length to the array of datetimes that
    you would otherwise plot on that axis. Construct this formatter
    by providing the array of datetimes (as matplotlib floats). When
    the formatter receives an integer in the range, it will look up the
    datetime and format it.
    """

    def __init__(self, dates=None, fmt='%b %d, %H:%M', show_nontrading=False):
        self.dates = dates
        self.len = len(dates) if dates is not None else 0
        self.fmt = fmt
        self.snt = show_nontrading

    def __call__(self, x, pos=0):
        '''
        Return label for time x at position pos
        '''
        if self.snt:
            jdate = jd.date.fromgregorian(date=mdates.num2date(x))
            return jdate.strftime(self.fmt)
        ix = int(round(x, 0))
        if ix >= self.len or ix < 0:
            formatted_date = ''
        else:
            date = self.dates[ix]
            jdate = jd.date.fromgregorian(date=mdates.num2date(date))
            formatted_date = jdate.strftime(self.fmt)
        return formatted_date

# ---------------------------------------------------

df = pd.read_csv('so_67001540.csv', index_col=0, parse_dates=True)

mpf.plot(df, figratio=(16, 9), type="candle", style='yahoo', ylabel='', xrotation=90)

dates = [mdates.date2num(d) for d in df.index]
formatter = JalaliDateTimeFormatter(dates=dates, fmt='%Y-%m-%d')

fig, axlist = mpf.plot(df, figratio=(16, 9),
                       type="candle", style='yahoo',
                       ylabel='', xrotation=90,
                       returnfig=True)

axlist[0].xaxis.set_major_formatter(formatter)

mpf.show()
The file 'so_67001540.csv' looks like this:
date,open,high,low,close,alt_date
2021-03-15,67330.0,69200.0,66870.0,68720.0,1399-12-25
2021-03-16,69190.0,71980.0,69000.0,71620.0,1399-12-26
2021-03-17,72450.0,73170.0,71700.0,71820.0,1399-12-27
2021-03-27,71970.0,73580.0,70000.0,73330.0,1400-01-07
2021-03-28,73330.0,73570.0,71300.0,71850.0,1400-01-08
When you run the above script, you should get the following two plots:
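As a quick sanity check on the Gregorian-to-Jalali conversion the formatter relies on, jdatetime reproduces the alt_date column directly (a hypothetical snippet, not part of the original answer):
import datetime
import jdatetime as jd

# First row of the csv: 2021-03-15 should map to 1399-12-25
print(jd.date.fromgregorian(date=datetime.date(2021, 3, 15)))  # 1399-12-25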
Have you tried making these dates
1399-12-25
1399-12-26
1399-12-27
1400-01-07
1400-01-08
the index of the dataframe (maybe that's what you mean by "swapping the indices"?) and setting the kwarg datetime_format='%Y-%m-%d'?
I think that should work.
UPDATE:
It appears to me that the problem is that:
mplfinance requires a Pandas.DatetimeIndex as the index of your dataframe, and
Pandas.DatetimeIndex is made up of Pandas.Timestamp objects, and
Pandas.Timestamp has limits which preclude dates having years earlier than 1677:
In [1]: import pandas as pd
In [2]: pd.Timestamp.max
Out[2]: Timestamp('2262-04-11 23:47:16.854775807')
In [3]: pd.Timestamp.min
Out[3]: Timestamp('1677-09-21 00:12:43.145225')
I am going to poke around and see if I can come up with another solution. Internally Matplotlib dates can go to year zero.

How to calculate total precipitation per day using hourly data for whole year?

I have hourly data from ERA5 for each day of a specific year. I want to convert that data from hourly to daily. I know the long and hard way to do it, but I need something that does it easily.
Copernicus has code for this here: https://confluence.ecmwf.int/display/CKB/ERA5%3A+How+to+calculate+daily+total+precipitation, which works fine if the data set covers only one day, but I am having problems converting the whole year.
The ERA5 dataset is available for download at https://cds.climate.copernicus.eu/cdsapp#!/home
Follow the steps to use the Copernicus server here:
https://confluence.ecmwf.int/display/CKB/How+to+download+ERA5
This script downloads the hourly data for only 2 days (1st and 2nd of January 2017):
#!/usr/bin/env python
"""
Save as get-tp.py, then run "python get-tp.py".
Input file : None
Output file: tp_20170101-20170102.nc
"""
import cdsapi
c = cdsapi.Client()
r = c.retrieve(
    'reanalysis-era5-single-levels', {
        'variable'    : 'total_precipitation',
        'product_type': 'reanalysis',
        'year'        : '2017',
        'month'       : '01',
        'day'         : ['01', '02'],
        'time'        : [
            '00:00','01:00','02:00',
            '03:00','04:00','05:00',
            '06:00','07:00','08:00',
            '09:00','10:00','11:00',
            '12:00','13:00','14:00',
            '15:00','16:00','17:00',
            '18:00','19:00','20:00',
            '21:00','22:00','23:00'
        ],
        'format'      : 'netcdf'
    })
r.download('tp_20170101-20170102.nc')
## Add multiple days and multiple months to download more data
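For instance, a whole-year request might look like this (a sketch, not from the original post; as far as I know the CDS API skips day/month combinations that don't exist, such as February 30):
r = c.retrieve(
    'reanalysis-era5-single-levels', {
        'variable'    : 'total_precipitation',
        'product_type': 'reanalysis',
        'year'        : '2017',
        'month'       : ['%02d' % m for m in range(1, 13)],
        'day'         : ['%02d' % d for d in range(1, 32)],
        'time'        : ['%02d:00' % h for h in range(24)],
        'format'      : 'netcdf'
    })
r.download('tp_2017.nc')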
The script below will create a netCDF file for only one day:
#!/usr/bin/env python
"""
Save as file calculate-daily-tp.py and run "python calculate-daily-tp.py".
Input file : tp_20170101-20170102.nc
Output file: daily-tp_20170101.nc
"""
import time, sys
from datetime import datetime, timedelta
from netCDF4 import Dataset, date2num, num2date
import numpy as np
day = 20170101
d = datetime.strptime(str(day), '%Y%m%d')

f_in = 'tp_%d-%s.nc' % (day, (d + timedelta(days = 1)).strftime('%Y%m%d'))
f_out = 'daily-tp_%d.nc' % day

time_needed = []
for i in range(1, 25):
    time_needed.append(d + timedelta(hours = i))

with Dataset(f_in) as ds_src:
    var_time = ds_src.variables['time']
    time_avail = num2date(var_time[:], var_time.units,
                          calendar = var_time.calendar)

    indices = []
    for tm in time_needed:
        a = np.where(time_avail == tm)[0]
        if len(a) == 0:
            sys.stderr.write('Error: precipitation data is missing/incomplete - %s!\n'
                             % tm.strftime('%Y%m%d %H:%M:%S'))
            sys.exit(200)
        else:
            print('Found %s' % tm.strftime('%Y%m%d %H:%M:%S'))
            indices.append(a[0])

    var_tp = ds_src.variables['tp']
    tp_values_set = False
    for idx in indices:
        if not tp_values_set:
            data = var_tp[idx, :, :]
            tp_values_set = True
        else:
            data += var_tp[idx, :, :]

    with Dataset(f_out, mode = 'w', format = 'NETCDF3_64BIT_OFFSET') as ds_dest:
        # Dimensions
        for name in ['latitude', 'longitude']:
            dim_src = ds_src.dimensions[name]
            ds_dest.createDimension(name, dim_src.size)
            var_src = ds_src.variables[name]
            var_dest = ds_dest.createVariable(name, var_src.datatype, (name,))
            var_dest[:] = var_src[:]
            var_dest.setncattr('units', var_src.units)
            var_dest.setncattr('long_name', var_src.long_name)

        ds_dest.createDimension('time', None)
        var = ds_dest.createVariable('time', np.int32, ('time',))
        time_units = 'hours since 1900-01-01 00:00:00'
        time_cal = 'gregorian'
        var[:] = date2num([d], units = time_units, calendar = time_cal)
        var.setncattr('units', time_units)
        var.setncattr('long_name', 'time')
        var.setncattr('calendar', time_cal)

        # Variables
        var = ds_dest.createVariable(var_tp.name, np.double, var_tp.dimensions)
        var[0, :, :] = data
        var.setncattr('units', var_tp.units)
        var.setncattr('long_name', var_tp.long_name)

        # Attributes
        ds_dest.setncattr('Conventions', 'CF-1.6')
        ds_dest.setncattr('history', '%s %s'
                          % (datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                             ' '.join(time.tzname)))

print('Done! Daily total precipitation saved in %s' % f_out)
What I want is code that follows the same steps as above, but assuming I have an input file with one year of hourly data, converting it to one year of daily data.
The result should be daily values of the calculated variable (such as precipitation) for the whole year.
Example: let's say I have precipitation data of 1 mm/hr for every day of the year; I would have 2928 values for the whole year.
What I want is 24 mm/day for the whole year, with only 365 values for a non-leap year.
Example input dataset: a subset of the data (1st and 2nd January 2017) can be downloaded from https://www.dropbox.com/sh/0vdfn20p355st3i/AABKYO4do_raGHC34VnsXGPqa?dl=0. Just use the 2nd script above to check the code. (The data for the whole year is >10 GB and thus can't be uploaded.)
Thanks in advance
xarray resample is just the tool for you. It converts netCDF data from one temporal resolution (e.g. hourly) to another (e.g. daily) in one line. Using your sample data file, we can create daily-means using the following code:
import xarray as xr
ds = xr.open_dataset('./tp_20170101-20170102.nc')
tp = ds['tp'] # dimensions [time: 48, latitude: 721, longitude: 1440]
tp_daily = tp.resample(time='D').mean(dim='time') # dimensions (time: 2, latitude: 721, longitude: 1440)
You'll see that the resample command takes in a temporal code, in this case 'D' which means daily and then we specify that we want to compute the mean for each day using the hourly data of that day with .mean(dim='time').
If instead, for example, you wanted to compute the daily max rather than the daily mean, you'd replace .mean(dim='time') with .max(dim='time'). You can also go from hourly to monthly (MS or month-start), annual (AS or annual-start), and many more. The temporal frequency codes can be found in the Pandas docs.
An alternative quick method from the command line using CDO would be:
cdo daysum -shifttime,-1hour era5_hourly.nc era5_daily.nc
Note, as per the answer/discussion here: Calculating ERA5 Daily Total Precipitation using CDO,
the ERA5 hourly data has its timestep at the end of the hourly window, so you need to shift the timestamps before taking the sum; I'm not sure the xarray solution handles that. Also, to get mm/day, I think one needs to sum, not take the mean.
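Putting those two caveats together in xarray might look like this (a sketch, assuming the same sample file; shifting the timestamps back one hour mirrors CDO's shifttime so each accumulation is counted toward the day it belongs to, and sum replaces mean to get a daily total):
import pandas as pd
import xarray as xr

ds = xr.open_dataset('./tp_20170101-20170102.nc')

# Shift each timestamp back one hour, then total per day
tp = ds['tp'].assign_coords(time=ds['tp']['time'] - pd.Timedelta(hours=1))
tp_daily_total = tp.resample(time='D').sum(dim='time')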

How to fix the error "ValueError: Julian Day must be positive" in netCDF4.num2date?

Here is the partial code:
import netCDF4
import pandas as pd
import matplotlib.pyplot as plt
file='/Users/dedeco/Downloads/_grib2netcdf-atls12-95e2cf679cd58ee9b4db4dd119a05a8d-OzkfHp.nc'
nc = netCDF4.Dataset(file)
nc.variables.keys()
lat = nc.variables['latitude'][:]
lon = nc.variables['longitude'][:]
time_var = nc.variables['time']
dtime = netCDF4.num2date(time_var[:],time_var.units)
The file can be downloaded from this link: https://stream.ecmwf.int/data/atls12/data/data01/scratch/84/bc/_grib2netcdf-atls12-95e2cf679cd58ee9b4db4dd119a05a8d-OzkfHp.nc
So, I got this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-47-3647c36af24c> in <module>()
2 lon = nc.variables['longitude'][:]
3 time_var = nc.variables['time']
----> 4 dtime = netCDF4.num2date(time_var[:],time_var.units)
cftime/_cftime.pyx in cftime._cftime.num2date()
cftime/_cftime.pyx in cftime._cftime.utime.num2date()
cftime/_cftime.pyx in cftime._cftime.DateFromJulianDay()
ValueError: Julian Day must be positive
How can I fix this? Any ideas?
I fixed the problem by setting the calendar parameter (the default is 'standard'), which describes the calendar used in the time calculations.
So replace this:
dtime = netCDF4.num2date(time_var[:],time_var.units)
with this (in this case the year has 365 days):
dtime = netCDF4.num2date(time_var[:],time_var.units,'365_day')
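If the file records its own calendar attribute, you can pass that through instead of hard-coding it (a small sketch; getattr covers files where the attribute is absent):
cal = getattr(time_var, 'calendar', 'standard')
dtime = netCDF4.num2date(time_var[:], time_var.units, calendar=cal)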
Here is the documentation (for the companion function date2num, which shares the calendar keyword):
def date2num(
...)
date2num(dates, units, calendar='standard')
Return numeric time values given datetime objects. The units of the
numeric time values are described by the units argument and the
calendar keyword. The datetime objects must be in UTC with no
time-zone offset. If there is a time-zone offset in units, it will be
applied to the returned numeric values.
dates: A datetime object or a sequence of datetime objects. The
datetime objects should not include a time-zone offset.
units: a string of the form '<time units> since <time origin>'
describing the time units. <time units> can be days, hours, minutes,
seconds, milliseconds or microseconds. <time origin> is the time
origin.
calendar: describes the calendar used in the time calculations. All
the values currently defined in the CF metadata convention are valid:
'standard', 'gregorian', 'proleptic_gregorian', 'noleap',
'365_day', '360_day', 'julian', 'all_leap', '366_day'. Default is
'standard', which is a mixed Julian/Gregorian calendar.
returns a numeric time value, or an array of numeric time values with
approximately millisecond accuracy.
A complementary explanation of the conversion can be found here.

Conversion of datetime to numpy datetime without timezone info

Suppose I have a datetime variable:
dt = datetime.datetime(2001,1,1,0,0)
and I convert it to numpy as follows: numpy.datetime64(dt). I get
numpy.datetime64('2000-12-31T19:00:00.000000-0500')
with dtype('<M8[us]')
But this automatically takes my time zone into account (i.e. EST in this case) and gives me back a date of 2000-12-31 and a time of 19:00 hours.
How can I convert it to datetime64[D] in numpy that ignores the timezone information and simply gives me
numpy.datetime64('2001-01-01')
with dtype('<M8[D]')
The numpy datetime64 doc page gives no information on how to ignore the time zone or make the default time zone UTC.
I was just playing around with this the other day. I think there are two issues: how the datetime.datetime object is converted to np.datetime64, and how the latter is displayed.
The numpy doc talks about creating a datetime64 object from a date string. It appears that when given a datetime.datetime object, it first produces a string.
np.datetime64(dt) == np.datetime64(dt.isoformat())
I found that I could add timezone info to that string
np.datetime64(dt.isoformat()+'Z') # default assumption
np.datetime64(dt.isoformat()+'-0500')
Numpy 1.7.0 reads ISO 8601 strings w/o TZ as local (ISO specifies this)
Datetimes are always stored based on POSIX time with an epoch of 1970-01-01T00:00Z
As for display, the test_datetime.py file offers some clues as to the undocumented behavior.
https://github.com/numpy/numpy/blob/280f6050d2291e50aeb0716a66d1258ab3276553/numpy/core/tests/test_datetime.py
e.g.:
def test_datetime_array_str(self):
    a = np.array(['2011-03-16', '1920-01-01', '2013-05-19'], dtype='M')
    assert_equal(str(a), "['2011-03-16' '1920-01-01' '2013-05-19']")

    a = np.array(['2011-03-16T13:55Z', '1920-01-01T03:12Z'], dtype='M')
    assert_equal(np.array2string(a, separator=', ',
                 formatter={'datetime': lambda x:
                            "'%s'" % np.datetime_as_string(x, timezone='UTC')}),
                 "['2011-03-16T13:55Z', '1920-01-01T03:12Z']")
So you can customize the print behavior of an array with np.array2string, and np.datetime_as_string. np.set_printoptions also takes a formatter parameter.
The pytz module is used to add further timezone handling:
@dec.skipif(not _has_pytz, "The pytz module is not available.")
def test_datetime_as_string_timezone(self):
    # timezone='local' vs 'UTC'
    a = np.datetime64('2010-03-15T06:30Z', 'm')
    assert_equal(np.datetime_as_string(a, timezone='UTC'),
                 '2010-03-15T06:30Z')
    assert_(np.datetime_as_string(a, timezone='local') !=
            '2010-03-15T06:30Z')
    ....
Examples:
In [48]: np.datetime_as_string(np.datetime64(dt),timezone='local')
Out[48]: '2000-12-31T16:00:00.000000-0800'
In [49]: np.datetime64(dt)
Out[49]: numpy.datetime64('2000-12-31T16:00:00.000000-0800')
In [50]: np.datetime_as_string(np.datetime64(dt))
Out[50]: '2001-01-01T00:00:00.000000Z'
In [51]: np.datetime_as_string(np.datetime64(dt),timezone='UTC')
Out[51]: '2001-01-01T00:00:00.000000Z'
In [52]: np.datetime_as_string(np.datetime64(dt),timezone='local')
Out[52]: '2000-12-31T16:00:00.000000-0800'
In [81]: np.datetime_as_string(np.datetime64(dt),timezone=pytz.timezone('US/Eastern'))
Out[81]: '2000-12-31T19:00:00.000000-0500'
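For the specific datetime64[D] result the question asks for, casting the unit down also works (a sketch; note that in modern numpy, datetime64 is timezone-naive, so the local-time display issue above no longer arises):
import datetime
import numpy as np

dt = datetime.datetime(2001, 1, 1, 0, 0)
d = np.datetime64(dt).astype('datetime64[D]')
print(d, d.dtype)  # 2001-01-01 datetime64[D]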
