Month difference YYYYMM Pandas - python-3.x

I had two date columns in the data frame, which were of float type, so I converted them into the date format YYYYMM. Now I have to find the difference in months between
them. I tried the code below, but it gives an error.
df['Date_1'] = pd.to_datetime(df['Date_1'], format = '%Y%m%d').dt.strftime('%Y%m') #Convert float to YYYYMM Format
df['Date_2'] = pd.to_datetime(df['Date_2'], format='%Y%m.0').dt.strftime('%Y%m') #Convert float to YYYYMM Format
df['diff'] = df['Date_1'] - df['Date_2'] #Gives error

I think you need to subtract periods created by to_period:
import pandas as pd
df = pd.DataFrame({'Date_1':[20150810, 20160804],
'Date_2':[201505.0, 201602.0]})
print (df)
Date_1 Date_2
0 20150810 201505.0
1 20160804 201602.0
df['Date_1'] = pd.to_datetime(df['Date_1'], format = '%Y%m%d').dt.to_period('m')
df['Date_2'] = pd.to_datetime(df['Date_2'], format='%Y%m.0').dt.to_period('m')
df['diff'] = df['Date_1'] - df['Date_2']
print (df)
Date_1 Date_2 diff
0 2015-08 2015-05 3
1 2016-08 2016-02 6
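Depending on the pandas version, subtracting two Period columns may give offset objects such as <3 * MonthEnds> instead of the plain integers shown above. A minimal sketch that always yields an integer month count, assuming the same sample data: it parses both columns through strings (so the float column does not need the '%Y%m.0' trick) and computes the difference from the year and month parts.

```python
import pandas as pd

df = pd.DataFrame({'Date_1': [20150810, 20160804],
                   'Date_2': [201505.0, 201602.0]})

# Date_1 holds YYYYMMDD integers, Date_2 holds YYYYMM floats; go through strings
d1 = pd.to_datetime(df['Date_1'].astype(str), format='%Y%m%d')
d2 = pd.to_datetime(df['Date_2'].astype(int).astype(str), format='%Y%m')

# Whole-month difference built only from the year and month components
df['diff'] = (d1.dt.year - d2.dt.year) * 12 + (d1.dt.month - d2.dt.month)
print(df)
```

The day of the month is ignored on purpose, which matches the YYYYMM granularity the question works with.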
Another solution is to convert Date_1 to the first day of the month:
df['Date_1'] = pd.to_datetime(df['Date_1'], format = '%Y%m%d') - pd.offsets.MonthBegin()
df['Date_2'] = pd.to_datetime(df['Date_2'], format='%Y%m.0')
df['diff'] = df['Date_1'] - df['Date_2']
print (df)
Date_1 Date_2 diff
0 2015-08-01 2015-05-01 92 days
1 2016-08-01 2016-02-01 182 days
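One caveat with subtracting pd.offsets.MonthBegin(): a date that already falls on the 1st is rolled back to the previous month. A small sketch comparing it with going through a monthly period, using two example dates chosen only for illustration:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2015-08-10', '2015-08-01']))

# Subtracting MonthBegin moves a date that is already on the 1st back a full month
rolled = s - pd.offsets.MonthBegin()
print(rolled)

# Converting through a monthly period always lands on the 1st of the same month
first_of_month = s.dt.to_period('M').dt.to_timestamp()
print(first_of_month)
```

If Date_1 can contain first-of-month dates, the period round trip is the safer of the two.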

Related

How to find the next month and current month datetime in python

I need to find the next month and current month Datetime in python
I have a dataframe with a date column, and I need to filter its rows based on today's date.
If today's day of the month is less than 15, I need all rows starting from the current month (2019-08-01 00:00:00).
If today's day of the month is >= 15, I need all rows starting from the next month (2019-09-01 00:00:00).
Dataframe:
PC GEO Month Values
A IN 2019-08-01 00:00:00 1
B IN 2019-08-02 00:00:00 1
C IN 2019-09-14 00:00:00 1
D IN 2019-10-01 00:00:00 1
E IN 2019-07-01 00:00:00 1
if today's date is < 15
PC GEO Month Values
A IN 2019-08-01 00:00:00 1
B IN 2019-08-02 00:00:00 1
C IN 2019-09-14 00:00:00 1
D IN 2019-10-01 00:00:00 1
if today's date is >= 15
PC GEO Month Values
C IN 2019-09-14 00:00:00 1
D IN 2019-10-01 00:00:00 1
I am getting today's day of the month like this:
from datetime import date
dat = date.today()
dat = dat.strftime("%d")
dat = int(dat)
Use:
#convert column to datetimes
df['Month'] = pd.to_datetime(df['Month'])
#get today timestamp
dat = pd.to_datetime('now')
print (dat)
2019-08-27 13:40:54.272257
#convert datetime to month periods
per = df['Month'].dt.to_period('m')
#convert timestamp to period
today_per = dat.to_period('m')
#compare day and filter
if dat.day < 15:
    df = df[per >= today_per]
    print (df)
else:
    df = df[per > today_per]
    print (df)
PC GEO Month Values
2 C IN 2019-09-14 1
3 D IN 2019-10-01 1
Test with values <15:
df['Month'] = pd.to_datetime(df['Month'])
dat = pd.to_datetime('2019-08-02')
print (dat)
2019-08-02 00:00:00
per = df['Month'].dt.to_period('m')
today_per = dat.to_period('m')
if dat.day < 15:
    df = df[per >= today_per]
    print (df)
else:
    df = df[per > today_per]
PC GEO Month Values
0 A IN 2019-08-01 1
1 B IN 2019-08-02 1
2 C IN 2019-09-14 1
3 D IN 2019-10-01 1
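A variant of the same idea compares timestamps instead of periods: compute the first day of the month the result should start from, then keep rows on or after it. filter_from_cutoff is a hypothetical helper name, and the sample frame rebuilds the question's data.

```python
import pandas as pd

def filter_from_cutoff(df, today):
    """Hypothetical helper: keep rows from the current month when today's
    day of month is < 15, otherwise from the next month onwards."""
    # First day of today's month, e.g. 2019-08-27 -> 2019-08-01
    start = today.to_period('M').to_timestamp()
    if today.day >= 15:
        # Move to the first day of the next month
        start = start + pd.offsets.MonthBegin(1)
    return df[df['Month'] >= start]

df = pd.DataFrame({
    'PC': list('ABCDE'),
    'GEO': ['IN'] * 5,
    'Month': pd.to_datetime(['2019-08-01', '2019-08-02', '2019-09-14',
                             '2019-10-01', '2019-07-01']),
    'Values': [1] * 5,
})

print(filter_from_cutoff(df, pd.Timestamp('2019-08-27')))
print(filter_from_cutoff(df, pd.Timestamp('2019-08-02')))
```

For the real filter, pass pd.Timestamp.today() as today; passing fixed dates as above makes both branches easy to test.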

Format datetime values in pandas by stripping

I have a df['timestamp'] column which has values in format: yyyy-mm-ddThh:mm:ssZ. The dtype is object.
Now I want to split the value into 3 new columns: one for the day, one for the day index (Mon, Tue, Wed, ...) and one for the hour, like this:
Current: column=timestamp
yyyy-mm-ddThh:mm:ssZ
Desired:
New Col1|New Col2|New Col3
dd|hh|day_index
What function should I use?
Since you said column timestamp is of type object, I assume it's string. Since the format is fixed, use str.slice to get corresponding chars. To get the week days, use dt.day_name() on datetime64, which is converted from timestamp.
import pandas as pd
data = {'timestamp': ['2019-07-01T05:23:33Z', '2019-07-03T02:12:33Z', '2019-07-23T11:05:23Z', '2019-07-12T08:15:51Z'], 'Val': [1.24,1.259, 1.27,1.298] }
df = pd.DataFrame(data)
# format='%Y-%m-%d' does not cover the 'T05:23:33Z' part, so let pandas parse the full ISO 8601 string
ds = pd.to_datetime(df['timestamp'], utc=True, errors='coerce')
df['datetime'] = ds
df['dd'] = df['timestamp'].str.slice(start=8, stop=10)
df['hh'] = df['timestamp'].str.slice(start=11, stop=13)
df['weekday'] = df['datetime'].dt.day_name()
print(df)
The output:
timestamp Val datetime dd hh weekday
0 2019-07-01T05:23:33Z 1.240 2019-07-01 05:23:33+00:00 01 05 Monday
1 2019-07-03T02:12:33Z 1.259 2019-07-03 02:12:33+00:00 03 02 Wednesday
2 2019-07-23T11:05:23Z 1.270 2019-07-23 11:05:23+00:00 23 11 Tuesday
3 2019-07-12T08:15:51Z 1.298 2019-07-12 08:15:51+00:00 12 08 Friday
First convert the df['timestamp'] column to a DateTime object. Then extract Year, Month & Day from it. Code below.
df['timestamp'] = pd.to_datetime(df['timestamp'], utc=True, errors='coerce')
df['Year'] = df['timestamp'].dt.year
df['Month'] = df['timestamp'].dt.month
df['Day'] = df['timestamp'].dt.day
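For the three fields the question actually asks for, a sketch that parses the strings once and reads everything from the .dt accessor, assuming the timestamps are ISO 8601 strings ending in Z as in the first answer. Note that dd and hh come back as integers here, not zero-padded strings.

```python
import pandas as pd

df = pd.DataFrame({'timestamp': ['2019-07-01T05:23:33Z', '2019-07-03T02:12:33Z',
                                 '2019-07-23T11:05:23Z', '2019-07-12T08:15:51Z']})

# Parse the full ISO 8601 string; the trailing 'Z' makes the result UTC-aware
ts = pd.to_datetime(df['timestamp'], utc=True)

df['dd'] = ts.dt.day                # day of month as an integer
df['hh'] = ts.dt.hour               # hour as an integer
df['day_index'] = ts.dt.day_name()  # weekday name, e.g. 'Monday'
print(df)
```

If zero-padded strings are preferred, ts.dt.strftime('%d') and ts.dt.strftime('%H') return them.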

day of Year values starting from a particular date

I have a dataframe with a date column. The duration is 365 days starting from 02/11/2017 and ending at 01/11/2018.
Date
02/11/2017
03/11/2017
05/11/2017
.
.
01/11/2018
I want to add an adjacent column called Day_Of_Year as follows:
Date Day_Of_Year
02/11/2017 1
03/11/2017 2
05/11/2017 4
.
.
01/11/2018 365
I apologize if it's a very basic question, but unfortunately I haven't been able to start with this.
I could use datetime(), but its day of year would return values such as 1 for 1st January, 2 for 2nd January and so on, irrespective of the year, so that wouldn't work for me.
First convert the column with to_datetime, then subtract the start date, convert the result to days and add 1:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Day_Of_Year'] = df['Date'].sub(pd.Timestamp('2017-11-02')).dt.days + 1
print (df)
Date Day_Of_Year
0 2017-11-02 1
1 2017-11-03 2
2 2017-11-05 4
3 2018-11-01 365
Or subtract by first value of column:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Day_Of_Year'] = df['Date'].sub(df['Date'].iat[0]).dt.days + 1
print (df)
Date Day_Of_Year
0 2017-11-02 1
1 2017-11-03 2
2 2017-11-05 4
3 2018-11-01 365
Using strftime with '%j'
s=pd.to_datetime(df.Date,dayfirst=True).dt.strftime('%j').astype(int)
s-s.iloc[0]
Out[750]:
0 0
1 1
2 3
Name: Date, dtype: int32
#df['new']=s-s.iloc[0]
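A caveat on the '%j' approach: day-of-year numbering restarts at 001 every 1 January, so for this 365-day range the offset turns negative after New Year (01/11/2018 gives -1 rather than 364). A small comparison with the subtraction approach, using the four sample dates from the question:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['02/11/2017', '03/11/2017', '05/11/2017', '01/11/2018']})
d = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# '%j' restarts at 001 every 1 January, so the offset goes negative after New Year
j = d.dt.strftime('%j').astype(int)
df['via_j'] = j - j.iloc[0]

# Subtracting the start date keeps counting across the year boundary
df['via_sub'] = (d - d.iloc[0]).dt.days + 1
print(df)
```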
pandas has dayofyear. So convert your column to the right format with pd.to_datetime and then apply Series.dt.dayofyear. Lastly, use some modulo arithmetic to express everything relative to your original date:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['day of year'] = df['Date'].dt.dayofyear - df['Date'].dt.dayofyear[0] + 1
df['day of year'] = df['day of year'] + 365*((365 - df['day of year']) // 365)
Output
Date day of year
0 2017-11-02 1
1 2017-11-03 2
2 2017-11-05 4
3 2018-11-01 365
But I'm doing essentially the same as Jezrael in more lines of code, so my vote goes to her/him

Python Subtracting two columns with date data, from csv to get number of weeks , months?

I have a CSV with two columns representing the start date, st_dt, and the end date, end_dt. I have to subtract these columns to get the number of weeks. I tried iterating through the columns using pandas, but my output seems to be wrong.
st_dt end_dt
---------------------------------------
20100315 20100431
Use read_csv with parse_dates to get datetimes, then subtract the columns and take the days:
df = pd.read_csv(file, parse_dates=[0,1])
print (df)
st_dt end_dt
0 2010-03-15 2010-04-30
df['diff'] = (df['end_dt'] - df['st_dt']).dt.days
print (df)
st_dt end_dt diff
0 2010-03-15 2010-04-30 46
If some dates are invalid, like 20100431, use to_datetime with the parameter errors='coerce' to convert them to NaT:
df = pd.read_csv(file)
print (df)
st_dt end_dt
0 20100315 20100431
1 20100315 20100430
df['st_dt'] = pd.to_datetime(df['st_dt'], errors='coerce', format='%Y%m%d')
df['end_dt'] = pd.to_datetime(df['end_dt'], errors='coerce', format='%Y%m%d')
df['diff'] = (df['end_dt'] - df['st_dt']).dt.days
print (df)
st_dt end_dt diff
0 2010-03-15 NaT NaN
1 2010-03-15 2010-04-30 46.0
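The question asks for weeks (and months in the title), whereas the answer stops at days. A minimal sketch of both, assuming the same two sample rows kept as strings; 'months' here counts calendar-month boundaries crossed (year * 12 + month difference), which is one possible definition and not the only one.

```python
import pandas as pd

df = pd.DataFrame({'st_dt': ['20100315', '20100315'],
                   'end_dt': ['20100431', '20100430']})

# Invalid dates such as 20100431 become NaT
st = pd.to_datetime(df['st_dt'], format='%Y%m%d', errors='coerce')
end = pd.to_datetime(df['end_dt'], format='%Y%m%d', errors='coerce')

days = (end - st).dt.days
df['weeks'] = days / 7          # fractional weeks
df['full_weeks'] = days // 7    # completed weeks only
# Calendar-month difference from the year and month parts
df['months'] = (end.dt.year - st.dt.year) * 12 + (end.dt.month - st.dt.month)
print(df)
```

Rows with NaT propagate as NaN through every derived column, so they can be dropped or filled afterwards.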

Select the data from between two timestamp in python

My question is about getting the data between two given timestamps in Python.
I need an input field where I can enter the two timestamps; then, from the CSV I read, I need to retrieve the rows for that particular input range.
Actual data (CSV):
Daily_KWH_System PowerScout Temperature Timestamp Visibility Daily_electric_cost kW_System
0 4136.900384 P371602077 0 07/09/2016 23:58 0 180.657705 162.224216
1 3061.657187 P371602077 66 08/09/2016 23:59 10 133.693074 174.193804
2 4099.614033 P371602077 63 09/09/2016 05:58 10 179.029562 162.774013
3 3922.490275 P371602077 63 10/09/2016 11:58 10 171.297701 169.230047
4 3957.128982 P371602077 88 11/09/2016 17:58 10 172.806125 164.099307
Example:
Input:
start date : 2-1-2017
end date :10-1-2017
Output
Timestamp Value
2-1-2017 10
3-1-2017 35
.
.
.
.
10-1-2017 25
The original CSV would contain all the data
Timestamp Value
1-12-2016 10
2-12-2016 25
.
.
.
1-1-2017 15
2-1-2017 10
.
.
.
10-1-2017 25
.
.
31-1-2017 50
Use pd.read_csv to read the file:
df = pd.read_csv('my.csv', index_col='Timestamp', parse_dates=[0])
Then use your inputs to slice
df[start_date:end_date]
It seems you need dayfirst=True in read_csv, and then you can select with [] if both the start and end dates are in df.index:
import pandas as pd
from io import StringIO
temp=u"""Timestamp;Value
1-12-2016;10
2-12-2016;25
1-1-2017;15
2-1-2017;10
10-1-2017;25
31-1-2017;50"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
#if necessary add sep
#index_col=[0] convert first column to index
#parse_dates=[0] parse first column to datetime
df = pd.read_csv(StringIO(temp), sep=";", index_col=[0], parse_dates=[0], dayfirst=True)
print (df)
Value
Timestamp
2016-12-01 10
2016-12-02 25
2017-01-01 15
2017-01-02 10
2017-01-10 25
2017-01-31 50
print (df.index.dtype)
datetime64[ns]
print (df.index)
DatetimeIndex(['2016-12-01', '2016-12-02', '2017-01-01', '2017-01-02',
'2017-01-10', '2017-01-31'],
dtype='datetime64[ns]', name='Timestamp', freq=None)
start_date = pd.to_datetime('2-1-2017', dayfirst=True)
end_date = pd.to_datetime('10-1-2017', dayfirst=True)
print (df[start_date:end_date])
Value
Timestamp
2017-01-02 10
2017-01-10 25
If some dates are not in index you need boolean indexing:
start_date = pd.to_datetime('3-1-2017', dayfirst=True)
end_date = pd.to_datetime('10-1-2017', dayfirst=True)
print (df[(df.index >= start_date) & (df.index <= end_date)])
Value
Timestamp
2017-01-10 25
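For completeness, a sketch of an inclusive range selection where the start date is not in the index. It assumes the index is sorted (sort_index() makes sure of that); on a sorted DatetimeIndex, .loc label slicing also accepts endpoints that are not present, so it should match the boolean mask. The data repeats the sample above.

```python
import pandas as pd

df = pd.DataFrame(
    {'Value': [10, 25, 15, 10, 25, 50]},
    index=pd.to_datetime(['1-12-2016', '2-12-2016', '1-1-2017',
                          '2-1-2017', '10-1-2017', '31-1-2017'],
                         format='%d-%m-%Y'),
).sort_index()

start_date = pd.to_datetime('3-1-2017', dayfirst=True)
end_date = pd.to_datetime('10-1-2017', dayfirst=True)

# Inclusive boolean mask: start_date <= index <= end_date
by_mask = df[(df.index >= start_date) & (df.index <= end_date)]

# Label slicing on a sorted DatetimeIndex; both ends are inclusive
by_slice = df.loc[start_date:end_date]

print(by_mask)
print(by_slice)
```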
