I have a numpy array which contains hours from 4 days:
s = np.array([0.0, 1.0, 2.0, 3.0, 4.0 ....96.0])
I want to create a datetime object from that.
I know that the first element is at timestamp 2021-03-21 00:00,
so:
start_date = datetime.datetime.strptime('2021-03-21 00:00', '%Y-%m-%d %H:%M')
How can I create a new array of datetimes, each offset from start_date by the number of hours given in the s array?
Use timedelta to build your new array:
>>> import numpy as np
>>> from datetime import datetime, timedelta
>>> s = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 96.0])
>>> start_date = datetime.strptime('2021-03-21 00:00', '%Y-%m-%d %H:%M')
>>> [start_date + timedelta(hours=diff) for diff in s]
[datetime.datetime(2021, 3, 21, 0, 0), datetime.datetime(2021, 3, 21, 1, 0), datetime.datetime(2021, 3, 21, 2, 0), datetime.datetime(2021, 3, 21, 3, 0), datetime.datetime(2021, 3, 21, 4, 0), datetime.datetime(2021, 3, 25, 0, 0)]
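If the result should stay a NumPy array, a vectorized variant with datetime64 is also possible. The following is a minimal sketch, not part of the original answer; it assumes the hours are whole numbers (fractional hours would be truncated by the integer cast), and it uses np.arange to stand in for the elided array.

```python
import numpy as np

# Hours since the start timestamp; np.arange stands in for the elided array.
s = np.arange(0.0, 97.0)
start = np.datetime64("2021-03-21T00:00")

# Whole hours only: cast to int, then scale by a one-hour timedelta64.
stamps = start + s.astype("int64") * np.timedelta64(1, "h")

# Convert back to Python datetime objects if needed.
as_datetimes = stamps.tolist()
```

The list comprehension above is easier to read for small inputs; the datetime64 form avoids a Python-level loop for large ones.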
I am trying to come up with a way to create a list of dates n months back from a given date dt. However, it seems to be tricky depending on what dt is. Below I illustrate the dilemma through a few examples (look especially at the tricky case-3 below):
from datetime import datetime
from dateutil.relativedelta import relativedelta
# Simple case.
dt = datetime(2021, 2, 15)
dt - relativedelta(months=1) # n=1 gives datetime.datetime(2021, 1, 15, 0, 0)
dt - relativedelta(months=2) # n=2 gives datetime.datetime(2020, 12, 15, 0, 0)
# Simple case-2
dt = datetime(2021, 3, 31)
dt - relativedelta(months=1) # n=1 gives datetime.datetime(2021, 2, 28, 0, 0)
dt - relativedelta(months=2) # n=2 gives datetime.datetime(2021, 1, 31, 0, 0)
dt - relativedelta(months=3) # n=3 gives datetime.datetime(2020, 12, 31, 0, 0)
dt - relativedelta(months=4) # n=4 gives datetime.datetime(2020, 11, 30, 0, 0)
# Tricky case-3
dt = datetime(2021, 2, 28)
dt - relativedelta(months=1) # n=1 gives datetime.datetime(2021, 1, 28, 0, 0) and not datetime.datetime(2021, 1, 31, 0, 0)
dt - relativedelta(months=2) # n=2 gives datetime.datetime(2020, 12, 28, 0, 0) and not datetime.datetime(2020, 12, 31, 0, 0)
dt - relativedelta(months=3) # n=3 gives datetime.datetime(2020, 11, 28, 0, 0) and not datetime.datetime(2020, 11, 30, 0, 0)
dt - relativedelta(months=4) # n=4 gives datetime.datetime(2020, 10, 28, 0, 0) and not datetime.datetime(2020, 10, 31, 0, 0)
relativedelta does not do what you want in the corner case where the date is the last day of a month that has fewer than 31 days: it keeps the day number instead of moving to the end of the target month. Here's a work-around:
check if date is end of month
if not, simply use relativedelta
if so, use relativedelta but make sure the day is the last of the month by setting the day attribute explicitly
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta
# add_month adds n months to datetime object dt
def add_month(dt, n):
    # we can add a day without month changing - not end of month:
    if (dt + timedelta(1)).month == dt.month:
        return dt + relativedelta(months=n)
    # implicit else: end of month
    return (dt + relativedelta(months=n+1)).replace(day=1) - timedelta(1)
Examples:
d = datetime(2021, 3, 15)
print(add_month(d, -1).date(), d.date(), add_month(d, 1).date())
# 2021-02-15 2021-03-15 2021-04-15
d = datetime(2021, 3, 31)
print(add_month(d, -1).date(), d.date(), add_month(d, 1).date())
# 2021-02-28 2021-03-31 2021-04-30
d = datetime(2021,2,28)
print(add_month(d, -1).date(), d.date(), add_month(d, 1).date())
# 2021-01-31 2021-02-28 2021-03-31
d = datetime(2021,11,30)
print(add_month(d, -1).date(), d.date(), add_month(d, 1).date())
# 2021-10-31 2021-11-30 2021-12-31
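The same end-of-month rule can also be sketched with the standard library only. This is an illustrative alternative rather than the answer's method; the name shift_month is hypothetical, and it relies on calendar.monthrange to look up the last day of a month. The final line builds the list of dates n months back that the question asks for.

```python
import calendar
from datetime import datetime

def shift_month(dt, n):
    """Shift dt by n months, keeping end-of-month dates at end of month."""
    # Count months from year 0 so that divmod handles year rollover.
    total = dt.year * 12 + (dt.month - 1) + n
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day_src = calendar.monthrange(dt.year, dt.month)[1]
    last_day_dst = calendar.monthrange(year, month)[1]
    if dt.day == last_day_src:
        # Source date is the last day of its month: stay on the last day.
        day = last_day_dst
    else:
        # Otherwise keep the day number, clipped to the target month length.
        day = min(dt.day, last_day_dst)
    return dt.replace(year=year, month=month, day=day)

# Dates 1 to 4 months back from the tricky case-3 date.
months_back = [shift_month(datetime(2021, 2, 28), -k) for k in range(1, 5)]
```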
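import datetime
from datetime import timedelta
import numpy as np
# year, month, day are defined elsewhere (2020, 5, 7 in the output below)
minute = datetime.datetime(year, month, day, 9, 15, 0, 0)
minute1 = minute + timedelta(minutes=1)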
main_array = np.array([[681985, minute , 2.0, 3.0], [70913, minute , 5.0, 6.0]])
temp_array = np.array([[681985, minute1 , 2.0, 3.0]])
main_array = np.concatenate((main_array, temp_array))
main_array
array([[681985, datetime.datetime(2020, 5, 7, 9, 15), 2.0, 3.0],
[70913, datetime.datetime(2020, 5, 7, 9, 15), 5.0, 6.0],
[681985, datetime.datetime(2020, 5, 7, 9, 16), 2.0, 3.0]],
dtype=object)
I tried assigning a new value, 234.7, to the selected element, but it does not work:
main_array[(main_array[:,0] == 681985) & (main_array[:,1] == minute)][0][2] = 234.7
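Indexing with a boolean mask returns a copy, so the chained [0][2] assignment changes that copy and not main_array. To change the original array, move the column index 2 into the same indexing expression as the mask, so that the assignment is applied to main_array itself. Note that this assigns to every row matching the mask (here only one):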
main_array[(main_array[:,0] == 681985) & (main_array[:,1] == minute), 2] = 234.7
output:
[[681985 datetime.datetime(2020, 5, 7, 9, 15) 234.7 3.0]
[70913 datetime.datetime(2020, 5, 7, 9, 15) 5.0 6.0]
[681985 datetime.datetime(2020, 5, 7, 9, 16) 2.0 3.0]]
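The difference between the two forms can be reproduced on a small array. This is a minimal sketch with made-up values (the arrays a and b are not from the question); it also shows one way to update only the first matching row, which is what the original [0] was presumably meant to do.

```python
import numpy as np

a = np.array([[1.0, 10.0], [2.0, 20.0], [1.0, 30.0]])
mask = a[:, 0] == 1.0

# Chained indexing: a[mask] is a new array, so this write is lost.
a[mask][0][1] = -1.0
unchanged = a[0, 1]  # a was not modified

# Single indexing expression: assigns into a itself, for every masked row.
a[mask, 1] = -1.0

# To touch only the first matching row, look up its position first.
b = np.array([[1.0, 10.0], [2.0, 20.0], [1.0, 30.0]])
first = np.flatnonzero(b[:, 0] == 1.0)[0]
b[first, 1] = -1.0
```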
I want to create a heatmap that will have year across the x axis and month across the y axis. The cells of the heatmap will be % returns. Here's roughly what I am after.
So I have some data and I turn them into pct_change() series.
import pandas_datareader.data as web
import pandas as pd
from datetime import datetime as dt
import numpy as np
import seaborn as sns
start = dt(year = 2000, month = 1, day = 1)
df = web.DataReader('GDP', 'fred', start = '2000')
df.pct_change()
df.tail()
So here's what we are working with. Important to note that the index is a Datetime object.
GDP
DATE
2016-10-01 18905.545
2017-01-01 19057.705
2017-04-01 19250.009
2017-07-01 19500.602
2017-10-01 19736.491
I want to do something like this, but I don't know how to implement it with the datetime index:
gdp = df.pivot(df.index.month, df.index.year, "GDP")
ax = sns.heatmap(gdp)
Which (expectedly) doesn't work...
KeyError: "Int64Index([ 1, 4, 7, 10, 1, 4, 7, 10, 1, 4, 7, 10, 1, 4, 7, 10, 1,\n 4, 7, 10, 1, 4, 7, 10, 1, 4, 7, 10, 1, 4, 7, 10, 1, 4,\n 7, 10, 1, 4, 7, 10, 1, 4, 7, 10, 1, 4, 7, 10, 1, 4, 7,\n 10, 1, 4, 7, 10, 1, 4, 7, 10, 1, 4, 7, 10, 1, 4, 7, 10,\n 1, 4, 7, 10],\n dtype='int64', name='DATE') not in index"
It's not working because you are extracting the month and year inside the pivot call, and pivot expects the names of columns that exist in the df you passed, not the extracted values.
You can specify them beforehand:
# DATE is the index, so read the year and month from df.index
df["Year"] = df.index.year
df["Month"] = df.index.strftime("%b")  # abbreviated names, matching the list below
pt = df.pivot_table(index="Month", columns="Year", values="GDP", aggfunc="sum").fillna(0)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# reindex replaces reindex_axis, which has been removed from recent pandas versions
pt = pt.reindex(months)
sns.heatmap(pt, annot=True)
I'm reindexing the rows because when calling pivot_table, it sorts columns or rows in ascending order, which is not how the month names are usually sorted.
Above gives me:
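Another way to get calendar order without a hand-written month list is to pivot on the month number and relabel afterwards. The sketch below is an assumption-laden illustration: it uses a synthetic quarterly series in place of the FRED download, and it pivots the percent change, since that is what the question wants to plot.

```python
import calendar
import numpy as np
import pandas as pd

# Synthetic quarterly series standing in for the FRED GDP data.
idx = pd.date_range("2015-01-01", periods=12, freq="QS")
gdp = pd.Series(np.linspace(100.0, 155.0, 12), index=idx)
returns = gdp.pct_change() * 100  # percent change per quarter

frame = pd.DataFrame(
    {"Year": idx.year, "Month": idx.month, "ret": returns.to_numpy()}
)
# Month numbers sort in calendar order, so no manual reindexing is needed.
table = frame.pivot(index="Month", columns="Year", values="ret")
# Relabel the rows with abbreviated month names for display.
table.index = [calendar.month_abbr[m] for m in table.index]
```

Passing table to sns.heatmap(table, annot=True) would then draw the months from top to bottom in calendar order. The first cell is NaN because pct_change has no previous value for it.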
I have set the following xlim on my x axis:
axA.set_xlim(datetime.date(2016, 12, 1), datetime.date(2018, 1, 30))
and now I would like to get the position of the 12th of October (2017-10-12) on my X axis, so that I can then put an annotation there.
I tried to figure that out using date2num and datestr2num:
release_date = datetime.datetime(2017, 10, 12)
print(mdates.date2num(release_date))
print(mdates.datestr2num('2017-10-12'))
print(axA.get_xlim())
The above code output:
-736614.0
736614.0
(17136.0, 17561.0)
First it seems like date2num and datestr2num don't give an identical result, but more problematically, those results are not within the range of xlim.
How can I find the X position of a date (to place an annotation), given the xlim I set above?
Code to reproduce the problem:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime
def get_dataframe():
    values = [12, 16, 20]
    dates = [
        datetime(2017, 12, 24),
        datetime(2017, 12, 23),
        datetime(2017, 12, 22)
    ]
    df = pd.DataFrame(data={'date': dates, 'value': values})
    df = df.set_index(['date']).sort_index()
    return df

def plot(dataA):
    fig, axA = plt.subplots()
    dataA.plot(ax=axA)
    axA.set_xlim(datetime(2016, 12, 1), datetime(2018, 1, 30))
    release = datetime(2017, 10, 12)
    print(mdates.date2num(release))
    print(mdates.datestr2num('2017-10-12'))
    print(axA.get_xlim())

df = get_dataframe()
plot(df)
plt.show()
You can use a date object directly if you have a date xaxis:
ax.annotate('hello', xy=(datetime.datetime(2017, 10, 12), 1),
xytext=(datetime.datetime(2017, 10, 12), 5),
arrowprops={'facecolor': 'r'})
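For reference, here is a self-contained sketch of that approach. It is an illustration under assumptions: it plots with plain matplotlib rather than DataFrame.plot, uses the non-interactive Agg backend so it runs without a display, and the label text and y positions are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
from datetime import datetime

dates = [datetime(2017, 12, 22), datetime(2017, 12, 23), datetime(2017, 12, 24)]
values = [20, 16, 12]

fig, ax = plt.subplots()
ax.plot(dates, values)  # plotting datetimes makes the x axis a date axis
ax.set_xlim(datetime(2016, 12, 1), datetime(2018, 1, 30))

release = datetime(2017, 10, 12)
# Both xy and xytext take the datetime directly; no date2num call is needed.
ann = ax.annotate("release", xy=(release, 16), xytext=(release, 19),
                  arrowprops={"facecolor": "r"})
fig.savefig("annotated.png")
```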
I have a numpy array of milliseconds in integers, which I want to convert to an array of Python datetimes via a timedelta operation.
The following MWE works, but I'm convinced there is a more elegant approach, or one with better performance than multiplication by 1 ms.
start = pd.Timestamp('2016-01-02 03:04:56.789101').to_pydatetime()
dt = np.array([ 19, 14980, 19620, 54964615, 54964655, 86433958])
time_arr = start + dt * timedelta(milliseconds=1)
So your approach produces:
In [56]: start = pd.Timestamp('2016-01-02 03:04:56.789101').to_pydatetime()
In [57]: start
Out[57]: datetime.datetime(2016, 1, 2, 3, 4, 56, 789101)
In [58]: dt = np.array([ 19, 14980, 19620, 54964615, 54964655, 86433958])
In [59]: time_arr = start + dt * timedelta(milliseconds=1)
In [60]: time_arr
Out[60]:
array([datetime.datetime(2016, 1, 2, 3, 4, 56, 808101),
datetime.datetime(2016, 1, 2, 3, 5, 11, 769101),
datetime.datetime(2016, 1, 2, 3, 5, 16, 409101),
datetime.datetime(2016, 1, 2, 18, 21, 1, 404101),
datetime.datetime(2016, 1, 2, 18, 21, 1, 444101),
datetime.datetime(2016, 1, 3, 3, 5, 30, 747101)], dtype=object)
The equivalent using np.datetime64 types:
In [61]: dt.astype('timedelta64[ms]')
Out[61]: array([ 19, 14980, 19620, 54964615, 54964655, 86433958], dtype='timedelta64[ms]')
In [62]: np.datetime64(start)
Out[62]: numpy.datetime64('2016-01-02T03:04:56.789101')
In [63]: np.datetime64(start) + dt.astype('timedelta64[ms]')
Out[63]:
array(['2016-01-02T03:04:56.808101', '2016-01-02T03:05:11.769101',
'2016-01-02T03:05:16.409101', '2016-01-02T18:21:01.404101',
'2016-01-02T18:21:01.444101', '2016-01-03T03:05:30.747101'], dtype='datetime64[us]')
I can produce the same array from your time_arr with np.array(time_arr, dtype='datetime64[us]').
tolist converts these datetime64 items to datetime objects:
In [97]: t1=np.datetime64(start) + dt.astype('timedelta64[ms]')
In [98]: t1.tolist()
Out[98]:
[datetime.datetime(2016, 1, 2, 3, 4, 56, 808101),
datetime.datetime(2016, 1, 2, 3, 5, 11, 769101),
datetime.datetime(2016, 1, 2, 3, 5, 16, 409101),
datetime.datetime(2016, 1, 2, 18, 21, 1, 404101),
datetime.datetime(2016, 1, 2, 18, 21, 1, 444101),
datetime.datetime(2016, 1, 3, 3, 5, 30, 747101)]
or wrap it back in an array to get your time_arr:
In [99]: np.array(t1.tolist())
Out[99]:
array([datetime.datetime(2016, 1, 2, 3, 4, 56, 808101),
...
datetime.datetime(2016, 1, 3, 3, 5, 30, 747101)], dtype=object)
For the calculation alone, datetime64 is faster, but once the conversions are included it may not be the fastest overall.
https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html
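Since the question already uses pandas to build start, a pandas-only variant is also possible. This is a sketch of an alternative, not something from the answer above, and no claim is made here about its speed relative to the other approaches.

```python
import numpy as np
import pandas as pd

start = pd.Timestamp("2016-01-02 03:04:56.789101")
dt = np.array([19, 14980, 19620, 54964615, 54964655, 86433958])

# Timestamp + TimedeltaIndex gives a DatetimeIndex in one vectorized step.
stamps = start + pd.to_timedelta(dt, unit="ms")

# Object array of datetime.datetime, like time_arr in the question.
time_arr = stamps.to_pydatetime()
```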