I have a dataset with a datetime column, and I want the time difference between consecutive rows of my CSV file.
I wrote code to get the time difference in minutes, and now I want to convert that difference to hours.
For example:
a time difference of 30 minutes should be 0.5 h
a time difference of 120 minutes should be 2 h
When I tried this, the result did not match the format I need. I simply divided the time difference by 60.
My code:
df1['time_diff'] = pd.to_datetime(df1["time"])
print(df1['time_diff'])
0 2019-08-09 06:15:00
1 2019-08-09 06:45:00
2 2019-08-09 07:45:00
3 2019-08-09 09:00:00
4 2019-08-09 09:25:00
5 2019-08-09 09:30:00
6 2019-08-09 11:00:00
7 2019-08-09 11:30:00
8 2019-08-09 13:30:00
9 2019-08-09 13:50:00
10 2019-08-09 15:00:00
11 2019-08-09 15:25:00
12 2019-08-09 16:25:00
13 2019-08-09 18:00:00
df1['delta'] = (df1['time_diff']-df1['time_diff'].shift()).fillna(0)
df1['t'] = df1['delta'].apply(lambda x: x / np.timedelta64(1,'m')).astype('int64')% (24*60)
then the result:
After dividing by 60:
df1['t'] = df1['delta'].apply(lambda x: x / np.timedelta64(1,'m')).astype('int64')% (24*60)/60
result:
Comparing the two results, the first shows the 30-minute difference, but after I convert to hours that value is lost and only 1 is shown.
A 30-minute difference should convert to 0.5 hr.
Expected output:
time_diff in min expected output of time_diff in hour
0 0
30 0.5
60 1
75 1.25
25 0.4167
5 0.083
90 1.5
30 0.5
120 2
20 0.333
70 1.167
25 0.4167
60 1
95 1.583
Can anyone help me to solve this error?
I suggest using Series.dt.total_seconds and dividing by 60 for minutes and by 3600 for hours:
df1['datetimes'] = pd.to_datetime(df1['date']+ ' ' + df1['time'], dayfirst=True)
df1['delta'] = df1['datetimes'].diff().fillna(pd.Timedelta(0))
td = df1['delta'].dt.total_seconds()
df1['time_diff in min'] = td.div(60).astype(int)
df1['time_diff in hour'] = td.div(3600)
print (df1)
datetimes delta time_diff in min time_diff in hour
0 2019-08-09 06:15:00 00:00:00 0 0.000000
1 2019-08-09 06:45:00 00:30:00 30 0.500000
2 2019-08-09 07:45:00 01:00:00 60 1.000000
3 2019-08-09 09:00:00 01:15:00 75 1.250000
4 2019-08-09 09:25:00 00:25:00 25 0.416667
5 2019-08-09 09:30:00 00:05:00 5 0.083333
6 2019-08-09 11:00:00 01:30:00 90 1.500000
7 2019-08-09 11:30:00 00:30:00 30 0.500000
8 2019-08-09 13:30:00 02:00:00 120 2.000000
9 2019-08-09 13:50:00 00:20:00 20 0.333333
10 2019-08-09 15:00:00 01:10:00 70 1.166667
11 2019-08-09 15:25:00 00:25:00 25 0.416667
12 2019-08-09 16:25:00 01:00:00 60 1.000000
13 2019-08-09 18:00:00 01:35:00 95 1.583333
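As a self-contained variation on the same idea, the timedelta column can also be divided by a pd.Timedelta directly, which yields floats without going through seconds. This is only a minimal sketch: the sample rows are copied from the question, and the column names `minutes` and `hours` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: the first five timestamps from the question.
df1 = pd.DataFrame({"time": ["2019-08-09 06:15:00", "2019-08-09 06:45:00",
                             "2019-08-09 07:45:00", "2019-08-09 09:00:00",
                             "2019-08-09 09:25:00"]})

stamps = pd.to_datetime(df1["time"])

# Row-by-row difference; the first row has no predecessor, so use 0.
delta = stamps.diff().fillna(pd.Timedelta(0))

# Dividing a timedelta Series by a Timedelta gives plain floats.
df1["minutes"] = delta / pd.Timedelta(minutes=1)
df1["hours"] = delta / pd.Timedelta(hours=1)

print(df1)
```

Keeping the hours as floats (no integer cast) is what preserves values such as 0.5 for a 30-minute gap.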
Related
I'm totally new to Time Series Analysis and I'm trying to work on examples available online
this is what I have currently:
# Time based features
data = pd.read_csv('Train_SU63ISt.csv')
data['Datetime'] = pd.to_datetime(data['Datetime'],format='%d-%m-%Y %H:%M')
data['Hour'] = data['Datetime'].dt.hour
data['Minute'] = data['Datetime'].dt.minute
data.head()
ID Datetime Count Hour Minute
0 0 2012-08-25 00:00:00 8 0 0
1 1 2012-08-25 01:00:00 2 1 0
2 2 2012-08-25 02:00:00 6 2 0
3 3 2012-08-25 03:00:00 2 3 0
4 4 2012-08-25 04:00:00 2 4 0
What I'm looking for is something like this:
ID Datetime Count Hour Minute 4-Hour-window
0 0 2012-08-25 00:00:00 20 4 0 00:00:00 - 04:00:00
1 1 2012-08-25 04:00:00 22 8 0 04:00:00 - 08:00:00
2 2 2012-08-25 08:00:00 18 12 0 08:00:00 - 12:00:00
3 3 2012-08-25 12:00:00 16 16 0 12:00:00 - 16:00:00
4 4 2012-08-25 16:00:00 18 20 0 16:00:00 - 20:00:00
5 5 2012-08-25 20:00:00 14 24 0 20:00:00 - 00:00:00
6 6 2012-08-25 00:00:00 20 4 0 00:00:00 - 04:00:00
7 7 2012-08-26 04:00:00 24 8 0 04:00:00 - 08:00:00
8 8 2012-08-26 08:00:00 20 12 0 08:00:00 - 12:00:00
9 9 2012-08-26 12:00:00 10 16 0 12:00:00 - 16:00:00
10 10 2012-08-26 16:00:00 18 20 0 16:00:00 - 20:00:00
11 11 2012-08-26 20:00:00 14 24 0 20:00:00 - 00:00:00
I think what you are looking for is the resample function, see here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html
Something like this should work (not tested):
sampled_data = data.resample(
'4H',
kind='timestamp',
on='Datetime',
label='left'
).sum()
The function is very similar to groupby: it groups the data into chunks based on the column specified in on=. Here we use timestamps and chunks of 4 hours.
Finally, you need some kind of aggregation, in this case sum(), to reduce all elements of each group to a single element per time chunk.
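A small runnable sketch of the resample approach is shown here, under a few assumptions: the eight hourly rows are invented sample data, the rule is passed as a pd.Timedelta so that it does not depend on the spelling of the hour alias, and the `4-Hour-window` label is built by hand from each bin's left edge.

```python
import pandas as pd

# Hypothetical hourly data shaped like the question's Train file.
data = pd.DataFrame({
    "Datetime": pd.date_range("2012-08-25 00:00", periods=8,
                              freq=pd.Timedelta(hours=1)),
    "Count": [8, 2, 6, 2, 2, 2, 6, 2],
})

# Sum Count within consecutive 4-hour bins, labelled by the bin start.
windows = (
    data.resample(pd.Timedelta(hours=4), on="Datetime")["Count"]
        .sum()
        .reset_index()
)

# Build a readable "start - end" label for each bin.
end = windows["Datetime"] + pd.Timedelta(hours=4)
windows["4-Hour-window"] = (
    windows["Datetime"].dt.strftime("%H:%M:%S")
    + " - "
    + end.dt.strftime("%H:%M:%S")
)

print(windows)
```

Other aggregations (mean(), max(), and so on) can replace sum() in the same position.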
I have a dataset with a date, a time and one input column. First I need to pick out the times at which the input has one particular value. Then, starting from each of those times, I want to add a timedelta one hour at a time, up to a range of 6 hours.
Here only the value 5 matters: I need to separate the times that belong to the value 5 and then add timedelta(hours=1*6) to each of those times.
The result should not be written as another column of the data CSV file.
data['date_time']= pd.to_datetime(data['date'] + " " + data['time'],
format='%d/%m/%Y %H:%M:%S', dayfirst=True)
t = data.loc[data['date_time'] == 5]
for it in range(6):
time=[]
time= t+timedelta(hours=1*it)
But it gave me an error.
Here is my expected output:
First, separate the times at which the value is 5:
date time x3 first expected time
10/3/2018 6:15:00 7
10/3/2018 6:45:00 5 first separate the time related to the 5: 6:45:00 = 1st time of 5
10/3/2018 7:45:00 7
10/3/2018 9:00:00 7
10/3/2018 9:25:00 7
10/3/2018 9:30:00 5 9:30:00 = 2nd time of 5
10/3/2018 11:00:00 7
10/3/2018 11:30:00 7
10/3/2018 13:30:00 7
10/3/2018 13:50:00 5 13:50:00 = 3rd time of 5
10/3/2018 15:00:00 7
10/3/2018 15:25:00 7
After separating them, add the timedelta to each of those times individually.
Final expected output:
time final output
6:45:00 6:45:00 +timedelta(hours=1*6)
09:30:00 9:30:00 +timedelta(hours=1*6)
13:50:00 13:50:00 +timedelta(hours=1*6)
Then for the value 7:
date time x3 first separate time of 7
10/3/2018 6:15:00 7 6:15:00
10/3/2018 6:45:00 5
10/3/2018 7:45:00 7 7:45:00
10/3/2018 9:00:00 7 9:00:00
10/3/2018 9:25:00 7 9:25:00
10/3/2018 9:30:00 5
10/3/2018 11:00:00 7 11:00:00
10/3/2018 11:30:00 7 11:30:00
10/3/2018 13:30:00 7 13:30:00
Then add timedelta into that time separately:
6:15:00 6:15:00 +timedelta(hours=1*2)
7:45:00 7:45:00 +timedelta(hours=1*2)
9:00:00 9:00:00 +timedelta(hours=1*2)
I want to add this timedelta inside a for loop.
Subset of my csv file:
date time x3
10/3/2018 6:15:00 7
10/3/2018 6:45:00 5
10/3/2018 7:45:00 7
10/3/2018 9:00:00 7
10/3/2018 9:25:00 7
10/3/2018 9:30:00 5
10/3/2018 11:00:00 7
10/3/2018 11:30:00 7
10/3/2018 13:30:00 7
10/3/2018 13:50:00 5
10/3/2018 15:00:00 7
10/3/2018 15:25:00 7
10/3/2018 16:25:00 7
10/3/2018 18:00:00 5
10/3/2018 19:00:00 7
10/3/2018 19:30:00 7
10/3/2018 20:00:00 7
10/3/2018 22:05:00 7
For jezrael's reference:
I want to compute a new value for each 5 using a summation equation over a 6-hour timedelta range, counted from the start time of that 5.
Call the new input X.
First I take the start time t of the value 5.
Then:
x=[]
x3 = data['X3']
for _ in range (x3):
if x3.all()==5:
for i in range(t+timedelta(hours=1*it)for it in range(1,6)):
X1 = 5 - 0.006 *np.sum(i*5)
x.append(X1)
So there is one x value per hour, up to a range of 6 hours.
For this I need the time, and I need to add the timedelta to it inside the for loop.
For reference:
I only want to read the start time and the value from my CSV file. Then, starting from that time, I add one hour at a time to get each new value of x.
Take the first value of 5 and its time.
My start time = t
X value
start time x
0 5
1 hr 4.97
2 hr 4.94
3 hr 4.91
4 hr 4.88
5 hr 4.85
6 hr 4.82
If the time between two values of 5 is less than the 6 hr range,
then add
the new 5 to the value at 5 hr to get the new value:
start time x
0 hr 5+4.85
1hr 9.82
If the time between two values of 5 is greater than the 6 hr range:
start time x
0 5
1 hr 4.97
2 hr 4.94
3 hr 4.91
4 hr 4.88
5 hr 4.85
6 hr 4.82
then the code runs normally.
This should continue for the rest of the data.
Use:
data['date_time']= pd.to_datetime(data['date'] + " " + data['time'],
format='%d/%m/%Y %H:%M:%S', dayfirst=True)
mask = data['x3'].eq(5)
data['duration'] = data['date_time'].mask(mask, data['date_time'].dt.floor('d'))
m = mask.cumsum()
mask1 = (m != 0) & ~mask
td = pd.to_timedelta(np.arange(mask1.sum()) % 5 + 1, unit='h')
data['hours'] = (pd.Series(td, index=data.index[mask1])
.reindex(data.index, fill_value=pd.Timedelta(0)))
data['new'] = data['hours'].dt.total_seconds() / 3600
data['new1'] = 5 - 0.006 *data['new']
print (data)
date time x3 date_time duration hours \
0 10/3/2018 6:15:00 7 2018-03-10 06:15:00 2018-03-10 06:15:00 00:00:00
1 10/3/2018 6:45:00 5 2018-03-10 06:45:00 2018-03-10 00:00:00 00:00:00
2 10/3/2018 7:45:00 7 2018-03-10 07:45:00 2018-03-10 07:45:00 01:00:00
3 10/3/2018 9:00:00 7 2018-03-10 09:00:00 2018-03-10 09:00:00 02:00:00
4 10/3/2018 9:25:00 7 2018-03-10 09:25:00 2018-03-10 09:25:00 03:00:00
5 10/3/2018 9:30:00 5 2018-03-10 09:30:00 2018-03-10 00:00:00 00:00:00
6 10/3/2018 11:00:00 7 2018-03-10 11:00:00 2018-03-10 11:00:00 04:00:00
7 10/3/2018 11:30:00 7 2018-03-10 11:30:00 2018-03-10 11:30:00 05:00:00
8 10/3/2018 13:30:00 7 2018-03-10 13:30:00 2018-03-10 13:30:00 01:00:00
9 10/3/2018 13:50:00 5 2018-03-10 13:50:00 2018-03-10 00:00:00 00:00:00
10 10/3/2018 15:00:00 7 2018-03-10 15:00:00 2018-03-10 15:00:00 02:00:00
11 10/3/2018 15:25:00 7 2018-03-10 15:25:00 2018-03-10 15:25:00 03:00:00
12 10/3/2018 16:25:00 7 2018-03-10 16:25:00 2018-03-10 16:25:00 04:00:00
13 10/3/2018 18:00:00 5 2018-03-10 18:00:00 2018-03-10 00:00:00 00:00:00
14 10/3/2018 19:00:00 7 2018-03-10 19:00:00 2018-03-10 19:00:00 05:00:00
15 10/3/2018 19:30:00 7 2018-03-10 19:30:00 2018-03-10 19:30:00 01:00:00
16 10/3/2018 20:00:00 7 2018-03-10 20:00:00 2018-03-10 20:00:00 02:00:00
17 10/3/2018 22:05:00 7 2018-03-10 22:05:00 2018-03-10 22:05:00 03:00:00
new new1
0 0.0 5.000
1 0.0 5.000
2 1.0 4.994
3 2.0 4.988
4 3.0 4.982
5 0.0 5.000
6 4.0 4.976
7 5.0 4.970
8 1.0 4.994
9 0.0 5.000
10 2.0 4.988
11 3.0 4.982
12 4.0 4.976
13 0.0 5.000
14 5.0 4.970
15 1.0 4.994
16 2.0 4.988
17 3.0 4.982
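On one reading of the question, the goal is a separate table that lists, for every row where x3 equals 5, the start time plus 0 to 6 hours together with the decayed value. The table in the question (5, 4.97, 4.94, ...) corresponds to 5 - 0.006 * 5 * k for k hours after the start, so the following sketch uses that formula. The four sample rows and the names `starts`, `steps`, `hours` and `x` are assumptions made for illustration, not part of the original data.

```python
import pandas as pd

# Hypothetical subset of the CSV from the question.
data = pd.DataFrame({
    "date": ["10/3/2018"] * 4,
    "time": ["6:15:00", "6:45:00", "7:45:00", "9:30:00"],
    "x3": [7, 5, 7, 5],
})
data["date_time"] = pd.to_datetime(data["date"] + " " + data["time"],
                                   format="%d/%m/%Y %H:%M:%S")

# Start times: every row whose x3 value is 5.
starts = data.loc[data["x3"].eq(5), "date_time"]

rows = []
for start in starts:
    for k in range(7):  # offsets of 0, 1, ..., 6 hours
        rows.append({
            "start": start,
            "time": start + pd.Timedelta(hours=k),
            "hours": k,
            # Assumed formula, matching 5, 4.97, 4.94, ... in the question.
            "x": 5 - 0.006 * 5 * k,
        })

steps = pd.DataFrame(rows)
print(steps)
```

Because the result lives in its own DataFrame, nothing is written back as a new column of the original data.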
I have a dataset with three inputs named X1, X2 and X3, plus date and time. For the X3 value I created a summation equation that produces a new value every hour, depending on the X3 value.
My summation equation is:
I want to put the A values into a NumPy array named X.
So I wrote this code for the summation equation, to create the new data A:
Y = df['X3'].astype(float)
X=list()
for _ in range(len(Y)):
A=0
if Y.all() ==5:
for i in range(1,16):
A=np.sum(5)*(i)
elif Y.all() ==7:
for i in range(1,16):
A=np.sum(7)*(i)
X.append(A)
print(X)
Then I got only 0 values :
My data:
date time x3
10/3/2018 6:00:00 0
10/3/2018 7:00:00 5
10/3/2018 8:00:00 0
10/3/2018 9:00:00 7
10/3/2018 10:00:00 0
10/3/2018 11:00:00 0
10/3/2018 12:00:00 0
10/3/2018 13:45:00 0
10/3/2018 15:00:00 0
10/3/2018 16:00:00 0
10/3/2018 17:00:00 0
10/3/2018 18:00:00 0
10/3/2018 19:00:00 5
10/3/2018 20:00:00 0
10/3/2018 21:30:00 7
10/4/2018 6:00:00 0
10/4/2018 7:00:00 0
10/4/2018 8:00:00 5
10/4/2018 9:00:00 7
10/4/2018 11:00:00 5
10/4/2018 12:00:00 5
10/4/2018 13:00:00 5
10/4/2018 16:00:00 0
10/4/2018 17:00:00 0
10/4/2018 18:00:00 7
10/5/2018 7:00:00 5
10/5/2018 8:00:00 0
My desired output is:
Assume that at 7:00:00 a.m. X3 has the value 5. Then the summation equation should produce an A value for each hour, up to 16 hours.
Then assume X3 has the value 7. Again the summation equation should produce an A value for each hour, up to 16 hours.
Likewise, whatever value X3 holds should be fed into the summation equation when the code runs.
Not sure if it's the desired output, but at least it should get you started.
First create a DatetimeIndex from the date and time columns, then use resample to group by that index.
df.index = pd.DatetimeIndex(df['date'].astype(str) + ' ' + df['time'].astype(str))
df.resample('16H').sum()
x3
2018-10-03 00:00:00 12
2018-10-03 16:00:00 12
2018-10-04 08:00:00 34
2018-10-05 00:00:00 5
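A likely reason the loop in the question yields only zeros is the test `Y.all() == 5`: `Y.all()` collapses the whole column into a single boolean, and a boolean never equals 5, so neither branch runs and A keeps its initial value 0. The sketch below only illustrates that point on a made-up four-row column; it does not attempt to reproduce the summation equation, which is not shown in the question.

```python
import pandas as pd

# Hypothetical sample of the X3 column from the question.
Y = pd.Series([0, 5, 0, 7], dtype=float)

# Y.all() reduces the whole column to one boolean, so comparing it
# with 5 gives False on every pass through the loop.
whole_column_check = bool(Y.all() == 5)

# An element-wise comparison gives one True/False per row instead.
is_five = Y.eq(5)
is_seven = Y.eq(7)

print(whole_column_check)
print(is_five.tolist())
print(is_seven.tolist())
```

Boolean Series such as `is_five` can then be used to select the rows that should start a new run of hourly values.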
This question is related to this question
How to get time difference in specifc rows include in one column data using python
I have three inputs, X1, X2 and X3, and I want to find the time difference for the X3 input only.
Code:
df=pd.read_csv('data6.csv')
df['date'] = pd.to_datetime(df['date'] + " " + df['time'], format='%d/%m/%Y %H:%M:%S', dayfirst=True)
df.time = pd.to_datetime(df.time, format="%H:%M:%S")
df = df[df['X3'] != 0]
values_others_rows = np.NaN
sub_df = df[df.X3 != 0]
out_values = (sub_df.time.dt.hour - sub_df.shift().time.dt.hour) \
.to_frame() \
.fillna(sub_df.time.dt.hour.iloc[0]) \
.rename(columns={'time': 'out'}) # Rename column
print(out_values)
df = df.join(out_values) # Add out values
print(df)
With this code I get a time difference, but some values are negative because the data spans different days.
I got negative values:
For example:
date time x3
10/3/2018 6:00:00 0
10/3/2018 7:00:00 2 start time =0
10/3/2018 8:00:00 0 time difference=2
10/3/2018 9:00:00 50 first time =9:00:00
10/3/2018 10:00:00 0 :
10/3/2018 11:00:00 0 :
10/3/2018 12:00:00 0 :
10/3/2018 13:45:00 0
10/3/2018 15:00:00 0
10/3/2018 16:00:00 0
10/3/2018 17:00:00 0
10/3/2018 18:00:00 0
10/3/2018 19:00:00 20
10/3/2018 20:00:00 0
10/4/2018 6:00:00 50 new day : start time=0
10/4/2018 7:00:00 50 first time: 7:00:00 time difference=1
10/4/2018 8:00:00 0
10/4/2018 9:00:00 0
10/4/2018 11:00:00 10 second time: 11:00:00 time difference=4
10/4/2018 12:00:00 20
10/4/2018 13:00:00 50
So I want to write this in my code. But I don't know how to write this. Can anyone help me to solve this problem?
After using the new code, no time difference is displayed after print(df).
When I used jezrael's code, negative values still show up:
df=pd.read_csv('data6.csv')
df['time'] = pd.to_datetime(df['date'] + " " + df['time'], format='%d/%m/%Y %H:%M:%S', dayfirst=True)
df.time = pd.to_datetime(df.time, format="%d/%m/%Y %H:%M:%S")
df1 = df[df.X3!= 0]
df['new'] = df1['time'].dt.minute.groupby(df1['date']).diff()
df['new'] = df['new'].fillna(0).astype(int)
print(df)
But my expected time difference is:
date time x3 time_difference
10/3/2018 6:00:00 0 -
10/3/2018 7:00:00 2 start_time=0
10/3/2018 8:00:00 0
10/3/2018 9:00:00 50 2hr
10/3/2018 10:00:00 0
10/3/2018 11:00:00 0
10/3/2018 12:00:00 0
10/3/2018 13:45:00 0
10/3/2018 15:00:00 0
10/3/2018 16:00:00 0
10/3/2018 17:00:00 0
10/3/2018 18:00:00 0
10/3/2018 19:00:00 20 12hr from starting time
10/3/2018 20:00:00 0
10/4/2018 6:00:00 50 start_time=0
10/4/2018 7:00:00 50 1hr
10/4/2018 8:00:00 0
10/4/2018 9:00:00 0
10/4/2018 11:00:00 10 5hr
10/4/2018 12:00:00 20 6hr
10/4/2018 13:00:00 0
Filter the rows by condition, use GroupBy.diff for the difference, and finally replace missing values with 0:
df = pd.read_csv('data6 - data6.csv')
#print (df)
df.time = pd.to_datetime(df.time, format="%H:%M:%S")
df1 = df[df.x3 != 0]
df['new'] = df1['time'].dt.hour.groupby(df1['date']).diff()
df['new'] = df['new'].fillna(0).astype(int)
print(df.head(20))
date time x1 x2 x3 new
0 10/3/2018 1900-01-01 06:00:00 63 0 0 0
1 10/3/2018 1900-01-01 07:00:00 63 0 2 0
2 10/3/2018 1900-01-01 08:00:00 104 11 0 0
3 10/3/2018 1900-01-01 09:00:00 93 0 50 2
4 10/3/2018 1900-01-01 10:00:00 177 0 0 0
5 10/3/2018 1900-01-01 11:00:00 133 0 0 0
6 10/3/2018 1900-01-01 12:00:00 70 0 0 0
7 10/3/2018 1900-01-01 13:45:00 83 0 0 0
8 10/3/2018 1900-01-01 15:00:00 127 0 0 0
9 10/3/2018 1900-01-01 16:00:00 205 0 0 0
10 10/3/2018 1900-01-01 17:00:00 298 0 0 0
11 10/3/2018 1900-01-01 18:00:00 234 0 0 0
12 10/3/2018 1900-01-01 19:00:00 148 0 20 10
13 10/3/2018 1900-01-01 20:00:00 135 0 0 0
14 10/3/2018 1900-01-01 21:30:00 100 0 50 2
15 10/4/2018 1900-01-01 06:00:00 166 0 0 0
16 10/4/2018 1900-01-01 07:00:00 60 0 0 0
17 10/4/2018 1900-01-01 08:00:00 120 10 10 0
18 10/4/2018 1900-01-01 09:00:00 80 40 20 1
19 10/4/2018 1900-01-01 11:00:00 60 70 50 2
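The answer above compares only the hour component, so minutes are ignored (21:30 minus 19:00 shows as 2). As a hedged alternative, the difference can be taken on the full timestamp within each date and converted to hours. This is a sketch on invented sample rows; the day-first format string follows the question's own code, and the names `stamp` and `nonzero` are illustrative.

```python
import pandas as pd

# Hypothetical rows in the same layout as data6.csv.
df = pd.DataFrame({
    "date": ["10/3/2018", "10/3/2018", "10/3/2018", "10/3/2018",
             "10/4/2018", "10/4/2018", "10/4/2018"],
    "time": ["7:00:00", "8:00:00", "9:00:00", "21:30:00",
             "6:00:00", "7:00:00", "11:00:00"],
    "x3": [2, 0, 50, 50, 50, 50, 10],
})

# Full timestamp per row, parsed day-first as in the question.
stamp = pd.to_datetime(df["date"] + " " + df["time"],
                       format="%d/%m/%Y %H:%M:%S")

# Only rows with a non-zero x3 take part in the difference.
nonzero = df["x3"].ne(0)

# Difference to the previous non-zero row of the same date, in hours.
diff_hours = (
    stamp[nonzero]
    .groupby(df.loc[nonzero, "date"])
    .diff()
    .dt.total_seconds()
    .div(3600)
)

# Rows that were filtered out, and the first row of each day, get 0.
df["new"] = diff_hours.reindex(df.index).fillna(0)
print(df)
```

Grouping by the date column restarts the difference on each new day, which is what avoids the negative values mentioned in the question.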
I have a dataset with three inputs x1, x2 and x3, plus date and time. In the x3 column, the same values repeat across rows.
I want the time difference between rows that share the same value, measured from the first occurrence, whose start time counts as 0.
The code I used also gave me time differences for rows I did not want.
Here is my code:
df['time_diff']= pd.to_datetime(df['date'] + " " + df['time'], format='%d/%m/%Y %H:%M:%S', dayfirst=True)
df['Duration'] = df.groupby('x3')['time_diff'].diff()
This gave me a time difference, but not the result I am looking for.
But my expected output is:
date time x3 Expected output of time difference
10/3/2018 6:00:00 0 NaN
10/3/2018 7:00:00 5 0 =start time for 5
10/3/2018 8:00:00 0 NaN
10/3/2018 9:00:00 7 0=start time for 7
10/3/2018 10:00:00 0 NaN
10/3/2018 11:00:00 0 NaN
10/3/2018 12:00:00 0 NaN
10/3/2018 13:45:00 0 NaN
10/3/2018 15:00:00 0 NaN
10/3/2018 16:00:00 0 NaN
10/3/2018 17:00:00 0 NaN
10/3/2018 18:00:00 0 NaN
10/3/2018 19:00:00 5 12 hr =from starting time of 5
10/3/2018 20:00:00 0 NaN
10/3/2018 21:30:00 7 12.30hr = from starting time of 7
10/4/2018 6:00:00 0 NaN
10/4/2018 7:00:00 0 NaN
10/4/2018 8:00:00 5 0 = starting time of 5 because new day
10/4/2018 9:00:00 7 0 = starting time of 7 because new day
10/4/2018 11:00:00 5 3hr
10/4/2018 12:00:00 5 4hr
10/4/2018 13:00:00 5 5hr
10/4/2018 16:00:00 0 NaN
10/4/2018 17:00:00 0 NaN
10/4/2018 18:00:00 7 11hr
Filter out rows with x3 == 0, then group by both columns and use GroupBy.transform with GroupBy.first to repeat the first value across each group, so it can be subtracted from the original column and converted to hours:
df['time_diff']= pd.to_datetime(df['date'] + " " + df['time'],
format='%d/%m/%Y %H:%M:%S', dayfirst=True)
mask = df['x3'].ne(0)
df['Duration'] = df[mask].groupby(['date','x3'])['time_diff'].transform('first')
df['Duration'] = df['time_diff'].sub(df['Duration']).dt.total_seconds().div(3600)
print (df)
date time x3 Expected time_diff Duration
0 10/3/2018 6:00:00 0 NaN 2018-03-10 06:00:00 NaN
1 10/3/2018 7:00:00 5 0 2018-03-10 07:00:00 0.0
2 10/3/2018 8:00:00 0 NaN 2018-03-10 08:00:00 NaN
3 10/3/2018 9:00:00 7 0 2018-03-10 09:00:00 0.0
4 10/3/2018 10:00:00 0 NaN 2018-03-10 10:00:00 NaN
5 10/3/2018 11:00:00 0 NaN 2018-03-10 11:00:00 NaN
6 10/3/2018 12:00:00 0 NaN 2018-03-10 12:00:00 NaN
7 10/3/2018 13:45:00 0 NaN 2018-03-10 13:45:00 NaN
8 10/3/2018 15:00:00 0 NaN 2018-03-10 15:00:00 NaN
9 10/3/2018 16:00:00 0 NaN 2018-03-10 16:00:00 NaN
10 10/3/2018 17:00:00 0 NaN 2018-03-10 17:00:00 NaN
11 10/3/2018 18:00:00 0 NaN 2018-03-10 18:00:00 NaN
12 10/3/2018 19:00:00 5 12hr 2018-03-10 19:00:00 12.0
13 10/3/2018 20:00:00 0 NaN 2018-03-10 20:00:00 NaN
14 10/3/2018 21:30:00 7 12.30hr 2018-03-10 21:30:00 12.5
15 10/4/2018 6:00:00 0 NaN 2018-04-10 06:00:00 NaN
16 10/4/2018 7:00:00 0 NaN 2018-04-10 07:00:00 NaN
17 10/4/2018 8:00:00 5 0 2018-04-10 08:00:00 0.0
18 10/4/2018 9:00:00 7 0 2018-04-10 09:00:00 0.0
19 10/4/2018 11:00:00 5 3hr 2018-04-10 11:00:00 3.0
20 10/4/2018 12:00:00 5 4hr 2018-04-10 12:00:00 4.0
21 10/4/2018 13:00:00 5 5hr 2018-04-10 13:00:00 5.0
22 10/4/2018 16:00:00 0 NaN 2018-04-10 16:00:00 NaN
23 10/4/2018 17:00:00 0 NaN 2018-04-10 17:00:00 NaN
24 10/4/2018 18:00:00 7 11hr 2018-04-10 18:00:00 9.0
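Another option is GroupBy.apply, subtracting the first timestamp of each group:
from datetime import timedelta
mask = df['x3'].ne(0)
# group_keys=False keeps the original row index so the result aligns with df
df['Duration'] = df[mask].groupby(['date','x3'], group_keys=False)['time_diff'].apply(lambda x : (((x-x.iloc[0])//timedelta(minutes=1))/60))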
Output
date time x3 time_diff Duration
10/3/2018 6:00:00 0 2018-03-10 06:00:00 NaN
10/3/2018 7:00:00 5 2018-03-10 07:00:00 0.0
10/3/2018 8:00:00 0 2018-03-10 08:00:00 NaN
10/3/2018 9:00:00 7 2018-03-10 09:00:00 0.0
10/3/2018 10:00:00 0 2018-03-10 10:00:00 NaN
10/3/2018 11:00:00 0 2018-03-10 11:00:00 NaN
10/3/2018 12:00:00 0 2018-03-10 12:00:00 NaN
10/3/2018 13:45:00 0 2018-03-10 13:45:00 NaN
10/3/2018 15:00:00 0 2018-03-10 15:00:00 NaN
10/3/2018 16:00:00 0 2018-03-10 16:00:00 NaN
10/3/2018 17:00:00 0 2018-03-10 17:00:00 NaN
10/3/2018 18:00:00 0 2018-03-10 18:00:00 NaN
10/3/2018 19:00:00 5 2018-03-10 19:00:00 12.0
10/3/2018 20:00:00 0 2018-03-10 20:00:00 NaN
10/3/2018 21:30:00 7 2018-03-10 21:30:00 12.5
10/4/2018 6:00:00 0 2018-04-10 06:00:00 NaN
10/4/2018 7:00:00 0 2018-04-10 07:00:00 NaN
10/4/2018 8:00:00 5 2018-04-10 08:00:00 0.0
10/4/2018 9:00:00 7 2018-04-10 09:00:00 0.0
10/4/2018 11:00:00 5 2018-04-10 11:00:00 3.0
10/4/2018 12:00:00 5 2018-04-10 12:00:00 4.0
10/4/2018 13:00:00 5 2018-04-10 13:00:00 5.0
10/4/2018 16:00:00 0 2018-04-10 16:00:00 NaN
10/4/2018 17:00:00 0 2018-04-10 17:00:00 NaN
10/4/2018 18:00:00 7 2018-04-10 18:00:00 9.0