How to select rows with a specific time in one column using Python - python-3.x

I have a dataset with one input column plus a date and a time column. I want to set the time to 00:00:00 for rows where the input column holds a specific value, and leave the other times as they are.
I wrote the code for that first.
Next I want to select only the rows whose time is 00:00:00. The code I wrote for that raised the error `'RangeIndex' object has no attribute 'strftime'`.
Can anyone help me solve this?
My code:
df['time_diff']= pd.to_datetime(df['date'] + " " + df['time'],
format='%d/%m/%Y %H:%M:%S', dayfirst=True)
mask = df['x3'].eq(5)
df['Duration'] = np.where(df['x3'].eq(5), np.timedelta64(0), pd.to_timedelta(df['time']))
Then I got the output:
date time x3 Duration
0 10/3/2018 6:15:00 0 06:15:00
1 10/3/2018 6:45:00 5 00:00:00
2 10/3/2018 7:45:00 0 07:45:00
3 10/3/2018 9:00:00 0 09:00:00
4 10/3/2018 9:25:00 0 09:25:00
5 10/3/2018 9:30:00 0 09:30:00
6 10/3/2018 11:00:00 0 11:00:00
7 10/3/2018 11:30:00 0 11:30:00
8 10/3/2018 13:30:00 0 13:30:00
9 10/3/2018 13:50:00 5 00:00:00
10 10/3/2018 15:00:00 0 15:00:00
11 10/3/2018 15:25:00 0 15:25:00
12 10/3/2018 16:25:00 0 16:25:00
13 10/3/2018 18:00:00 0 18:00:00
14 10/3/2018 19:00:00 0 19:00:00
15 10/3/2018 19:30:00 0 19:30:00
16 10/3/2018 20:00:00 0 20:00:00
17 10/3/2018 22:05:00 0 22:05:00
18 10/3/2018 22:15:00 5 00:00:00
19 10/3/2018 23:40:00 0 23:40:00
20 10/4/2018 6:58:00 5 00:00:00
21 10/4/2018 13:00:00 0 13:00:00
22 10/4/2018 16:00:00 0 16:00:00
23 10/4/2018 17:00:00 0 17:00:00
Then I want to select only the rows with the 00:00:00 time:
match_time="00:00:00"
time = data['duration'].loc[data.index.strftime("%H:%M:%S") == match_time]
This raised the error `'RangeIndex' object has no attribute 'strftime'`.
Expected output :
time
00:00:00
00:00:00
That is, read only the rows with the 00:00:00 time.
A subset of my CSV:
date time x3
10/3/2018 6:15:00 0
10/3/2018 6:45:00 5
10/3/2018 7:45:00 0
10/3/2018 9:00:00 0
10/3/2018 9:25:00 0
10/3/2018 9:30:00 0
10/3/2018 11:00:00 0
10/3/2018 11:30:00 0
10/3/2018 13:30:00 0
10/3/2018 13:50:00 5
10/3/2018 15:00:00 0
10/3/2018 15:25:00 0
10/3/2018 16:25:00 0
10/3/2018 18:00:00 0
10/3/2018 19:00:00 0
10/3/2018 19:30:00 0
10/3/2018 20:00:00 0
10/3/2018 22:05:00 0
10/3/2018 22:15:00 5
10/3/2018 23:40:00 0
10/4/2018 6:58:00 5
10/4/2018 13:00:00 0
10/4/2018 16:00:00 0
10/4/2018 17:00:00 0

Because the values in the Duration column are timedeltas, convert the string to a timedelta too before comparing:
print (data['Duration'].dtype)
#timedelta64[ns]
match_time="00:00:00"
time = data[data['Duration'] == pd.to_timedelta(match_time)]
print (time)
date time x3 Duration
1 10/3/2018 6:45:00 5 0 days
9 10/3/2018 13:50:00 5 0 days
18 10/3/2018 22:15:00 5 0 days
20 10/4/2018 6:58:00 5 0 days
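As a minimal, self-contained sketch of the comparison above: the three-row sample frame and the name zero_rows are assumptions made for illustration, not part of the original data.

```python
import pandas as pd

# Hypothetical sample standing in for the real CSV.
data = pd.DataFrame({
    "x3": [0, 5, 0],
    "Duration": pd.to_timedelta(["06:15:00", "00:00:00", "07:45:00"]),
})

match_time = "00:00:00"
# Compare timedelta with timedelta, not timedelta with str.
zero_rows = data[data["Duration"] == pd.to_timedelta(match_time)]
print(zero_rows)
```

Only the row at index 1 is kept, because it is the only one whose Duration equals a zero timedelta.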
EDIT: If the timedeltas are always shorter than 1 day:
First convert the timedeltas to strings (this adds a leading 0 days):
print (df['Duration'].astype(str))
#0 0 days 06:15:00.000000000
#1 0 days 00:00:00.000000000
#2 0 days 07:45:00.000000000
#3 0 days 09:00:00.000000000
#4 0 days 09:25:00.000000000
#5 0 days 09:30:00.000000000
#6 0 days 11:00:00.000000000
#7 0 days 11:30:00.000000000
#8 0 days 13:30:00.000000000
#9 0 days 00:00:00.000000000
#10 0 days 15:00:00.000000000
#11 0 days 15:25:00.000000000
#12 0 days 16:25:00.000000000
#13 0 days 18:00:00.000000000
#14 0 days 19:00:00.000000000
#15 0 days 19:30:00.000000000
#16 0 days 20:00:00.000000000
#17 0 days 22:05:00.000000000
#18 0 days 00:00:00.000000000
#19 0 days 23:40:00.000000000
#20 0 days 00:00:00.000000000
#21 0 days 13:00:00.000000000
#22 0 days 16:00:00.000000000
#23 0 days 17:00:00.000000000
#Name: Duration, dtype: object
And then remove the first and last parts of the strings by slicing:
print (df['Duration'].astype(str).str[-18:-10])
#0 06:15:00
#1 00:00:00
#2 07:45:00
#3 09:00:00
#4 09:25:00
#5 09:30:00
#6 11:00:00
#7 11:30:00
#8 13:30:00
#9 00:00:00
#10 15:00:00
#11 15:25:00
#12 16:25:00
#13 18:00:00
#14 19:00:00
#15 19:30:00
#16 20:00:00
#17 22:05:00
#18 00:00:00
#19 23:40:00
#20 00:00:00
#21 13:00:00
#22 16:00:00
#23 17:00:00
#Name: Duration, dtype: object
df['Duration'] = df['Duration'].astype(str).str[-18:-10]
match_time="00:00:00"
time = df[df['Duration'] == match_time]
print (time)
date time x3 Duration
1 10/3/2018 6:45:00 5 00:00:00
9 10/3/2018 13:50:00 5 00:00:00
18 10/3/2018 22:15:00 5 00:00:00
20 10/4/2018 6:58:00 5 00:00:00
Solution for all timedeltas:
def f(x):
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return ('{:02d}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds))
df['Duration'] = df['Duration'].apply(f)
match_time="00:00:00"
time = df[df['Duration'] == match_time]
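The total_seconds approach can be tried on its own with a small sketch. The sample values and the name format_hms are assumptions for illustration. It does not depend on the text that astype(str) produces, which may differ between pandas versions, and it also handles durations of a day or more.

```python
import pandas as pd

def format_hms(td):
    """Format a Timedelta as HH:MM:SS, letting hours exceed 23."""
    total = int(td.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return "{:02d}:{:02d}:{:02d}".format(hours, minutes, seconds)

# Hypothetical durations, including one longer than a day.
durations = pd.Series(pd.to_timedelta(["00:00:00", "06:15:00", "1 days 02:30:00"]))
formatted = durations.apply(format_hms)
print(formatted.tolist())  # ['00:00:00', '06:15:00', '26:30:00']
```

A duration of 1 day 2.5 hours is rendered as 26:30:00, so it cannot be mistaken for a match on 00:00:00.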

You are trying to convert the DataFrame index (0, 1, 2, ..., 23) to a time string, not the values in the 'Duration' column.
First convert each item in the 'Duration' column, then compare it to 'match_time', and finally keep the matching rows, all in one step:
match_time="00:00:00"
df=data.loc[data['Duration'].apply(lambda x: x.strftime("%H:%M:%S"))==match_time]
You then get all rows that match your desired 'match_time':
date time x3 Duration
1 2018-10-03 2018-10-03 00:00:00 5 00:00:00
9 2018-10-03 2018-10-03 00:00:00 5 00:00:00
18 2018-10-03 2018-10-03 00:00:00 5 00:00:00
20 2018-10-04 2018-10-04 00:00:00 5 00:00:00
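To see where the error comes from, here is a small sketch under the assumption that the column holds datetimes. The column name stamp and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical frame with a default integer index and a datetime column.
data = pd.DataFrame({
    "stamp": pd.to_datetime(["2018-10-03 00:00:00", "2018-10-03 07:45:00"]),
})

print(type(data.index).__name__)        # RangeIndex
print(hasattr(data.index, "strftime"))  # False, hence the AttributeError

# Format the column values instead of the index, then filter.
as_text = data["stamp"].dt.strftime("%H:%M:%S")
midnight = data.loc[as_text == "00:00:00"]
print(midnight)
```

The .dt accessor formats the whole column at once, so no apply with a lambda is needed for datetime values.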

Related

How to add timedelta for time in csv file using

I have a dataset with a date, a time and one input column. First I need to pick out the times at which the input has a particular value. Then I want to add a timedelta to each of those times in one-hour steps, up to a range of 6 hours.
Only the rows with the value 5 are affected: I need to separate the times that belong to the 5 values and then add timedelta(hours=1*6) to each of them.
The result should not be written as another column of the CSV data.
data['date_time']= pd.to_datetime(data['date'] + " " + data['time'],
format='%d/%m/%Y %H:%M:%S', dayfirst=True)
t = data.loc[data['date_time'] == 5]
for it in range(6):
    time=[]
    time= t+timedelta(hours=1*it)
But it gave me an error.
My expected output:
First, separate the times of the rows with the value 5:
date time x3 first expected time
10/3/2018 6:15:00 7
10/3/2018 6:45:00 5 6:45:00 = 1st time of 5 (first separate the times related to the 5)
10/3/2018 7:45:00 7
10/3/2018 9:00:00 7
10/3/2018 9:25:00 7
10/3/2018 9:30:00 5 9:30:00 = 2nd time of 5
10/3/2018 11:00:00 7
10/3/2018 11:30:00 7
10/3/2018 13:30:00 7
10/3/2018 13:50:00 5 13:50:00 = 3rd time of 5
10/3/2018 15:00:00 7
10/3/2018 15:25:00 7
After separating them, add the timedelta to each of those times separately.
Final expected output:
time final output
6:45:00 6:45:00 +timedelta(hours=1*6)
09:30:00 9:30:00 +timedelta(hours=1*6)
13:50:00 13:50:00 +timedelta(hours=1*6)
Then for the value 7:
date time x3 first separate time of 7
10/3/2018 6:15:00 7 6:15:00
10/3/2018 6:45:00 5
10/3/2018 7:45:00 7 7:45:00
10/3/2018 9:00:00 7 9:00:00
10/3/2018 9:25:00 7 9:25:00
10/3/2018 9:30:00 5
10/3/2018 11:00:00 7 11:00:00
10/3/2018 11:30:00 7 11:30:00
10/3/2018 13:30:00 7 13:30:00
Then add timedelta into that time separately:
6:15:00 6:15:00 +timedelta(hours=1*2)
7:45:00 7:45:00 +timedelta(hours=1*2)
9:00:00 9:00:00 +timedelta(hours=1*2)
I want to add this timedelta inside a for loop.
Subset of my csv file:
date time x3
10/3/2018 6:15:00 7
10/3/2018 6:45:00 5
10/3/2018 7:45:00 7
10/3/2018 9:00:00 7
10/3/2018 9:25:00 7
10/3/2018 9:30:00 5
10/3/2018 11:00:00 7
10/3/2018 11:30:00 7
10/3/2018 13:30:00 7
10/3/2018 13:50:00 5
10/3/2018 15:00:00 7
10/3/2018 15:25:00 7
10/3/2018 16:25:00 7
10/3/2018 18:00:00 5
10/3/2018 19:00:00 7
10/3/2018 19:30:00 7
10/3/2018 20:00:00 7
10/3/2018 22:05:00 7
In reply to jezrael:
For each value of 5, I want to compute new values with a summation equation over a timedelta range of 6 hours, starting from that value's time.
Call my new input X.
First I take the time t of a value of 5 as the start time.
Then:
x=[]
x3 = data['X3']
for _ in range (x3):
    if x3.all()==5:
        for i in range(t+timedelta(hours=1*it)for it in range(1,6)):
            X1 = 5 - 0.006 *np.sum(i*5)
            x.append(X1)
So there is one x value per hour, up to a range of 6 hours.
For this I need the time, and I need to add the timedelta to it inside the for loop.
For reference:
I only want to read the start time and the value from my CSV file. Then, starting from that time, I add one hour at a time to get the new value of x.
Take the first 5 value and its time.
My start time = t
X value
start time x
0 5
1 hr 4.97
2 hr 4.94
3 hr 4.91
4 hr 4.88
5 hr 4.85
6 hr 4.82
If the time between two values of 5 is less than the 6-hour range,
then add the new 5 value to the current value (here the 5 hr value) to get the new value:
start time x
0 hr 5+4.85
1hr 9.82
If the time between two values of 5 is greater than the 6-hour range:
start time x
0 5
1 hr 4.97
2 hr 4.94
3 hr 4.91
4 hr 4.88
5 hr 4.85
6 hr 4.82
In that case the code runs as normal.
This should continue in the same way for the rest of the data.
Use:
data['date_time']= pd.to_datetime(data['date'] + " " + data['time'],
format='%d/%m/%Y %H:%M:%S', dayfirst=True)
mask = data['x3'].eq(5)
data['duration'] = data['date_time'].mask(mask, data['date_time'].dt.floor('d'))
m = mask.cumsum()
mask1 = (m != 0) & ~mask
td = pd.to_timedelta(np.arange(mask1.sum()) % 5 + 1, unit='h')
data['hours'] = (pd.Series(td, index=data.index[mask1])
.reindex(data.index, fill_value=pd.Timedelta(0)))
data['new'] = data['hours'].dt.total_seconds() / 3600
data['new1'] = 5 - 0.006 *data['new']
print (data)
date time x3 date_time duration hours \
0 10/3/2018 6:15:00 7 2018-03-10 06:15:00 2018-03-10 06:15:00 00:00:00
1 10/3/2018 6:45:00 5 2018-03-10 06:45:00 2018-03-10 00:00:00 00:00:00
2 10/3/2018 7:45:00 7 2018-03-10 07:45:00 2018-03-10 07:45:00 01:00:00
3 10/3/2018 9:00:00 7 2018-03-10 09:00:00 2018-03-10 09:00:00 02:00:00
4 10/3/2018 9:25:00 7 2018-03-10 09:25:00 2018-03-10 09:25:00 03:00:00
5 10/3/2018 9:30:00 5 2018-03-10 09:30:00 2018-03-10 00:00:00 00:00:00
6 10/3/2018 11:00:00 7 2018-03-10 11:00:00 2018-03-10 11:00:00 04:00:00
7 10/3/2018 11:30:00 7 2018-03-10 11:30:00 2018-03-10 11:30:00 05:00:00
8 10/3/2018 13:30:00 7 2018-03-10 13:30:00 2018-03-10 13:30:00 01:00:00
9 10/3/2018 13:50:00 5 2018-03-10 13:50:00 2018-03-10 00:00:00 00:00:00
10 10/3/2018 15:00:00 7 2018-03-10 15:00:00 2018-03-10 15:00:00 02:00:00
11 10/3/2018 15:25:00 7 2018-03-10 15:25:00 2018-03-10 15:25:00 03:00:00
12 10/3/2018 16:25:00 7 2018-03-10 16:25:00 2018-03-10 16:25:00 04:00:00
13 10/3/2018 18:00:00 5 2018-03-10 18:00:00 2018-03-10 00:00:00 00:00:00
14 10/3/2018 19:00:00 7 2018-03-10 19:00:00 2018-03-10 19:00:00 05:00:00
15 10/3/2018 19:30:00 7 2018-03-10 19:30:00 2018-03-10 19:30:00 01:00:00
16 10/3/2018 20:00:00 7 2018-03-10 20:00:00 2018-03-10 20:00:00 02:00:00
17 10/3/2018 22:05:00 7 2018-03-10 22:05:00 2018-03-10 22:05:00 03:00:00
new new1
0 0.0 5.000
1 0.0 5.000
2 1.0 4.994
3 2.0 4.988
4 3.0 4.982
5 0.0 5.000
6 4.0 4.976
7 5.0 4.970
8 1.0 4.994
9 0.0 5.000
10 2.0 4.988
11 3.0 4.982
12 4.0 4.976
13 0.0 5.000
14 5.0 4.970
15 1.0 4.994
16 2.0 4.988
17 3.0 4.982
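For the simpler request in the question (take each start time where x3 is 5 and add one hour at a time up to six hours), a loop-based sketch might look like the following. The three-row sample, the names starts and schedule, and the value formula 5 - 0.006 * 5 * h (one reading of the asker's table, where 1 hr gives 4.97) are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample in the same layout as the CSV subset.
data = pd.DataFrame({
    "date": ["10/3/2018", "10/3/2018", "10/3/2018"],
    "time": ["6:15:00", "6:45:00", "9:30:00"],
    "x3": [7, 5, 5],
})
data["date_time"] = pd.to_datetime(
    data["date"] + " " + data["time"], format="%d/%m/%Y %H:%M:%S"
)

# Start times of the rows where x3 equals 5.
starts = data.loc[data["x3"].eq(5), "date_time"]

rows = []
for start in starts:
    for h in range(1, 7):  # 1, 2, ..., 6 hours after the start
        rows.append({
            "start": start,
            "hours_after": h,
            "time": start + pd.Timedelta(hours=h),
            "value": 5 - 0.006 * 5 * h,  # assumed decay rule, see lead-in
        })
schedule = pd.DataFrame(rows)
print(schedule)
```

The result is a separate frame with six rows per start time, so nothing is written back into the original data.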

Why are only 0 values displayed when using a summation equation in Python

I have a dataset with three inputs named X1, X2 and X3, plus a date and a time. For my X3 value I created a summation equation that produces a new value every hour according to the X3 value.
My summation equation is:
I want to put those A values into a NumPy array named X.
So I wrote this code for the summation equation to create the new data A:
Y = df['X3'].astype(float)
X=list()
for _ in range(len(Y)):
    A=0
    if Y.all() ==5:
        for i in range(1,16):
            A=np.sum(5)*(i)
    elif Y.all() ==7:
        for i in range(1,16):
            A=np.sum(7)*(i)
    X.append(A)
print(X)
Then I got only 0 values :
My data:
date time x3
10/3/2018 6:00:00 0
10/3/2018 7:00:00 5
10/3/2018 8:00:00 0
10/3/2018 9:00:00 7
10/3/2018 10:00:00 0
10/3/2018 11:00:00 0
10/3/2018 12:00:00 0
10/3/2018 13:45:00 0
10/3/2018 15:00:00 0
10/3/2018 16:00:00 0
10/3/2018 17:00:00 0
10/3/2018 18:00:00 0
10/3/2018 19:00:00 5
10/3/2018 20:00:00 0
10/3/2018 21:30:00 7
10/4/2018 6:00:00 0
10/4/2018 7:00:00 0
10/4/2018 8:00:00 5
10/4/2018 9:00:00 7
10/4/2018 11:00:00 5
10/4/2018 12:00:00 5
10/4/2018 13:00:00 5
10/4/2018 16:00:00 0
10/4/2018 17:00:00 0
10/4/2018 18:00:00 7
10/5/2018 7:00:00 5
10/5/2018 8:00:00 0
My desired output:
Suppose X3 has the value 5 at 7:00:00 a.m. Then the summation equation should produce an A value for every hour, for up to 16 hours.
Now suppose X3 has the value 7. Again the summation equation should produce an A value for every hour, for up to 16 hours.
Likewise, whatever value X3 holds should be fed into the summation equation when the code runs.
Not sure if it's the desired output, but at least it should get you started.
First create a DatetimeIndex from the date and time columns, then use resample to group by that datetime index.
df.index = pd.DatetimeIndex(df['date'].astype(str) + ' ' + df['time'].astype(str))
df.resample('16H').sum()
x3
2018-10-03 00:00:00 12
2018-10-03 16:00:00 12
2018-10-04 08:00:00 34
2018-10-05 00:00:00 5
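On the literal question of why the list holds only zeros: Series.all() returns a single boolean for the whole column, so comparing it with 5 or 7 never succeeds and A stays at 0. The sketch below checks each value instead. The four-value sample is an assumption, and the loop keeps the question's arithmetic, in which only the last i survives.

```python
import pandas as pd

# Hypothetical sample of the X3 column.
Y = pd.Series([0, 5, 0, 7], dtype=float)

print(Y.all())       # False, because the column contains zeros
print(Y.all() == 5)  # False: a boolean never equals 5

X = []
for value in Y:      # test each element, not the whole column
    A = 0
    if value in (5, 7):
        for i in range(1, 16):
            A = value * i  # as in the question, A ends at value * 15
    X.append(A)
print(X)  # [0, 75.0, 0, 105.0]
```

If one A value per hour is wanted rather than only the last one, the inner loop would need to append each A instead of overwriting it.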

How to convert time as 00:00:00 for specific value using python

I have a dataset with one input column plus a date and a time column.
I want to set the time to 00:00:00 for rows where the input column holds a specific value, and show the other times as they are.
The code I tried gives 00:00:00 for the specific value, but the other times show as NaT.
Can anyone help me fix this?
My code:
df['time_diff']= pd.to_datetime(df['date'] + " " + df['time'],
format='%d/%m/%Y %H:%M:%S', dayfirst=True)
mask = df['x3'].eq(5)
df['Duration'] = np.where(df['x3']== 5, df['time_diff'], np.datetime64('NaT') )
df['Duration'] = df['time_diff'].sub(df['Duration']).dt.total_seconds().div(3600)
Then it gave me this output:
date time x3 duration
10/3/2018 6:15:00 0 NaN
10/3/2018 6:45:00 5 00:00:00
10/3/2018 7:45:00 0 NaN
10/3/2018 9:00:00 0 NaN
10/3/2018 9:25:00 0 NaN
10/3/2018 9:30:00 0 NaN
10/3/2018 11:00:00 0 NaN
10/3/2018 11:30:00 0 NaN
10/3/2018 13:30:00 0 NaN
10/3/2018 13:50:00 5 00:00:00
10/3/2018 15:00:00 0 NaN
10/3/2018 15:25:00 0 NaN
10/3/2018 16:25:00 0 NaN
10/3/2018 18:00:00 0 NaN
10/3/2018 19:00:00 0 NaN
10/3/2018 19:30:00 0 NaN
10/3/2018 20:00:00 0 NaN
10/3/2018 22:05:00 0 NaN
10/3/2018 22:15:00 5 00:00:00
10/3/2018 23:40:00 0 NaN
10/4/2018 6:58:00 5 00:00:00
10/4/2018 13:00:00 0 NaN
10/4/2018 16:00:00 0 NaN
10/4/2018 17:00:00 0 NaN
But my expected output is:
date time x3 duration expected output is
10/3/2018 6:15:00 0 NaN 6:15:00
10/3/2018 6:45:00 5 00:00:00 00:00:00
10/3/2018 7:45:00 0 NaN 7:45:00
10/3/2018 9:00:00 0 NaN 9:00:00
10/3/2018 9:25:00 0 NaN 9:25:00
10/3/2018 9:30:00 0 NaN 9:30:00
10/3/2018 11:00:00 0 NaN 11:00:00
10/3/2018 11:30:00 0 NaN 11:30:00
10/3/2018 13:30:00 0 NaN 13:30:00
10/3/2018 13:50:00 5 00:00:00 00:00:00
10/3/2018 15:00:00 0 NaN 15:00:00
10/3/2018 15:25:00 0 NaN 15:25:00
10/3/2018 16:25:00 0 NaN 16:25:00
10/3/2018 18:00:00 0 NaN 18:00:00
10/3/2018 19:00:00 0 NaN 19:00:00
10/3/2018 19:30:00 0 NaN 19:30:00
10/3/2018 20:00:00 0 NaN 20:00:00
10/3/2018 22:05:00 0 NaN 22:05:00
10/3/2018 22:15:00 5 00:00:00 00:00:00
10/3/2018 23:40:00 0 NaN 23:40:00
10/4/2018 6:58:00 5 00:00:00 00:00:00
10/4/2018 13:00:00 0 NaN 13:00:00
10/4/2018 16:00:00 0 NaN 16:00:00
10/4/2018 17:00:00 0 NaN 17:00:00
Use numpy.where to create a new column by condition: a zero timedelta where the condition holds, and otherwise the time column converted to timedeltas:
df['Duration'] = np.where(df['x3'].eq(5), np.timedelta64(0), pd.to_timedelta(df['time']))
print (df)
date time x3 Duration
0 10/3/2018 6:15:00 0 06:15:00
1 10/3/2018 6:45:00 5 00:00:00
2 10/3/2018 7:45:00 0 07:45:00
3 10/3/2018 9:00:00 0 09:00:00
4 10/3/2018 9:25:00 0 09:25:00
5 10/3/2018 9:30:00 0 09:30:00
6 10/3/2018 11:00:00 0 11:00:00
7 10/3/2018 11:30:00 0 11:30:00
8 10/3/2018 13:30:00 0 13:30:00
9 10/3/2018 13:50:00 5 00:00:00
10 10/3/2018 15:00:00 0 15:00:00
11 10/3/2018 15:25:00 0 15:25:00
12 10/3/2018 16:25:00 0 16:25:00
13 10/3/2018 18:00:00 0 18:00:00
14 10/3/2018 19:00:00 0 19:00:00
15 10/3/2018 19:30:00 0 19:30:00
16 10/3/2018 20:00:00 0 20:00:00
17 10/3/2018 22:05:00 0 22:05:00
18 10/3/2018 22:15:00 5 00:00:00
19 10/3/2018 23:40:00 0 23:40:00
20 10/4/2018 6:58:00 5 00:00:00
21 10/4/2018 13:00:00 0 13:00:00
22 10/4/2018 16:00:00 0 16:00:00
23 10/4/2018 17:00:00 0 17:00:00

How to set the start time to 0 on every day for a specific input column using pandas in Python

This question is related to this question:
How to get time difference in specifc rows include in one column data using python
I have three inputs: X1, X2 and X3. I want to find the time difference for the X3 input only.
Code:
df=pd.read_csv('data6.csv')
df['date'] = pd.to_datetime(df['date'] + " " + df['time'], format='%d/%m/%Y %H:%M:%S', dayfirst=True)
df.time = pd.to_datetime(df.time, format="%H:%M:%S")
df = df[df['X3'] != 0]
values_others_rows = np.NaN
sub_df = df[df.X3 != 0]
out_values = (sub_df.time.dt.hour - sub_df.shift().time.dt.hour) \
.to_frame() \
.fillna(sub_df.time.dt.hour.iloc[0]) \
.rename(columns={'time': 'out'}) # Rename column
print(out_values)
df = df.join(out_values) # Add out values
print(df)
When I use this code I get the time difference, but with negative values, because my data spans different days.
I got negative values:
For example:
date time x3
10/3/2018 6:00:00 0
10/3/2018 7:00:00 2 start time =0
10/3/2018 8:00:00 0 time difference=2
10/3/2018 9:00:00 50 first time =9:00:00
10/3/2018 10:00:00 0 :
10/3/2018 11:00:00 0 :
10/3/2018 12:00:00 0 :
10/3/2018 13:45:00 0
10/3/2018 15:00:00 0
10/3/2018 16:00:00 0
10/3/2018 17:00:00 0
10/3/2018 18:00:00 0
10/3/2018 19:00:00 20
10/3/2018 20:00:00 0
10/4/2018 6:00:00 50 new day : start time=0
10/4/2018 7:00:00 50 first time: 7:00:00 time difference=1
10/4/2018 8:00:00 0
10/4/2018 9:00:00 0
10/4/2018 11:00:00 10 second time: 11:00:00 time difference=4
10/4/2018 12:00:00 20
10/4/2018 13:00:00 50
So I want to write this in my code. But I don't know how to write this. Can anyone help me to solve this problem?
After using the new code, no time difference is displayed after print(df).
When I used jezrael's code, the negative values appeared again:
df=pd.read_csv('data6.csv')
df['time'] = pd.to_datetime(df['date'] + " " + df['time'], format='%d/%m/%Y %H:%M:%S', dayfirst=True)
df.time = pd.to_datetime(df.time, format="%d/%m/%Y %H:%M:%S")
df1 = df[df.X3!= 0]
df['new'] = df1['time'].dt.minute.groupby(df1['date']).diff()
df['new'] = df['new'].fillna(0).astype(int)
print(df)
But my expected time difference is:
date time x3 time_difference
10/3/2018 6:00:00 0 -
10/3/2018 7:00:00 2 start_time=0
10/3/2018 8:00:00 0
10/3/2018 9:00:00 50 2hr
10/3/2018 10:00:00 0
10/3/2018 11:00:00 0
10/3/2018 12:00:00 0
10/3/2018 13:45:00 0
10/3/2018 15:00:00 0
10/3/2018 16:00:00 0
10/3/2018 17:00:00 0
10/3/2018 18:00:00 0
10/3/2018 19:00:00 20 12hr from starting time
10/3/2018 20:00:00 0
10/4/2018 6:00:00 50 start_time=0
10/4/2018 7:00:00 50 1hr
10/4/2018 8:00:00 0
10/4/2018 9:00:00 0
10/4/2018 11:00:00 10 5hr
10/4/2018 12:00:00 20 6hr
10/4/2018 13:00:00 0
Filter rows by condition and use DataFrameGroupBy.diff for the difference, then replace missing values with 0:
df = pd.read_csv('data6 - data6.csv')
#print (df)
df.time = pd.to_datetime(df.time, format="%H:%M:%S")
df1 = df[df.x3 != 0]
df['new'] = df1['time'].dt.hour.groupby(df1['date']).diff()
df['new'] = df['new'].fillna(0).astype(int)
print(df.head(20))
date time x1 x2 x3 new
0 10/3/2018 1900-01-01 06:00:00 63 0 0 0
1 10/3/2018 1900-01-01 07:00:00 63 0 2 0
2 10/3/2018 1900-01-01 08:00:00 104 11 0 0
3 10/3/2018 1900-01-01 09:00:00 93 0 50 2
4 10/3/2018 1900-01-01 10:00:00 177 0 0 0
5 10/3/2018 1900-01-01 11:00:00 133 0 0 0
6 10/3/2018 1900-01-01 12:00:00 70 0 0 0
7 10/3/2018 1900-01-01 13:45:00 83 0 0 0
8 10/3/2018 1900-01-01 15:00:00 127 0 0 0
9 10/3/2018 1900-01-01 16:00:00 205 0 0 0
10 10/3/2018 1900-01-01 17:00:00 298 0 0 0
11 10/3/2018 1900-01-01 18:00:00 234 0 0 0
12 10/3/2018 1900-01-01 19:00:00 148 0 20 10
13 10/3/2018 1900-01-01 20:00:00 135 0 0 0
14 10/3/2018 1900-01-01 21:30:00 100 0 50 2
15 10/4/2018 1900-01-01 06:00:00 166 0 0 0
16 10/4/2018 1900-01-01 07:00:00 60 0 0 0
17 10/4/2018 1900-01-01 08:00:00 120 10 10 0
18 10/4/2018 1900-01-01 09:00:00 80 40 20 1
19 10/4/2018 1900-01-01 11:00:00 60 70 50 2

How to calculate the time difference between two different values continuously in pandas using Python

I have a dataset with three inputs x1, x2 and x3, plus a date and a time. The x3 column has the same values repeated across rows.
For each repeated value, I want the time difference between its rows, with the start time counted as 0.
I used the code below, but it also gave me time differences from other columns.
Here is my code:
df['time_diff']= pd.to_datetime(df['date'] + " " + df['time'], format='%d/%m/%Y %H:%M:%S', dayfirst=True)
df['Duration'] = df.groupby('x3')['time_diff'].diff()
This gave me a time difference, but it is not the result I am looking for.
My expected output is:
date time x3 Expected output of time difference
10/3/2018 6:00:00 0 NaN
10/3/2018 7:00:00 5 0 =start time for 5
10/3/2018 8:00:00 0 NaN
10/3/2018 9:00:00 7 0=start time for 7
10/3/2018 10:00:00 0 NaN
10/3/2018 11:00:00 0 NaN
10/3/2018 12:00:00 0 NaN
10/3/2018 13:45:00 0 NaN
10/3/2018 15:00:00 0 NaN
10/3/2018 16:00:00 0 NaN
10/3/2018 17:00:00 0 NaN
10/3/2018 18:00:00 0 NaN
10/3/2018 19:00:00 5 12 hr =from starting time of 5
10/3/2018 20:00:00 0 NaN
10/3/2018 21:30:00 7 12.30hr = from starting time of 7
10/4/2018 6:00:00 0 NaN
10/4/2018 7:00:00 0 NaN
10/4/2018 8:00:00 5 0 = starting time of 5 because new day
10/4/2018 9:00:00 7 0 = starting time of 7 because new day
10/4/2018 11:00:00 5 3hr
10/4/2018 12:00:00 5 4hr
10/4/2018 13:00:00 5 5hr
10/4/2018 16:00:00 0 NaN
10/4/2018 17:00:00 0 NaN
10/4/2018 18:00:00 7 11hr
Filter out rows with x3==0 and group by both columns, using GroupBy.transform with GroupBy.first to repeat the first value across all rows of each group, so it can be subtracted from the original column and converted to hours:
df['time_diff']= pd.to_datetime(df['date'] + " " + df['time'],
format='%d/%m/%Y %H:%M:%S', dayfirst=True)
mask = df['x3'].ne(0)
df['Duration'] = df[mask].groupby(['date','x3'])['time_diff'].transform('first')
df['Duration'] = df['time_diff'].sub(df['Duration']).dt.total_seconds().div(3600)
print (df)
date time x3 Expected time_diff Duration
0 10/3/2018 6:00:00 0 NaN 2018-03-10 06:00:00 NaN
1 10/3/2018 7:00:00 5 0 2018-03-10 07:00:00 0.0
2 10/3/2018 8:00:00 0 NaN 2018-03-10 08:00:00 NaN
3 10/3/2018 9:00:00 7 0 2018-03-10 09:00:00 0.0
4 10/3/2018 10:00:00 0 NaN 2018-03-10 10:00:00 NaN
5 10/3/2018 11:00:00 0 NaN 2018-03-10 11:00:00 NaN
6 10/3/2018 12:00:00 0 NaN 2018-03-10 12:00:00 NaN
7 10/3/2018 13:45:00 0 NaN 2018-03-10 13:45:00 NaN
8 10/3/2018 15:00:00 0 NaN 2018-03-10 15:00:00 NaN
9 10/3/2018 16:00:00 0 NaN 2018-03-10 16:00:00 NaN
10 10/3/2018 17:00:00 0 NaN 2018-03-10 17:00:00 NaN
11 10/3/2018 18:00:00 0 NaN 2018-03-10 18:00:00 NaN
12 10/3/2018 19:00:00 5 12hr 2018-03-10 19:00:00 12.0
13 10/3/2018 20:00:00 0 NaN 2018-03-10 20:00:00 NaN
14 10/3/2018 21:30:00 7 12.30hr 2018-03-10 21:30:00 12.5
15 10/4/2018 6:00:00 0 NaN 2018-04-10 06:00:00 NaN
16 10/4/2018 7:00:00 0 NaN 2018-04-10 07:00:00 NaN
17 10/4/2018 8:00:00 5 0 2018-04-10 08:00:00 0.0
18 10/4/2018 9:00:00 7 0 2018-04-10 09:00:00 0.0
19 10/4/2018 11:00:00 5 3hr 2018-04-10 11:00:00 3.0
20 10/4/2018 12:00:00 5 4hr 2018-04-10 12:00:00 4.0
21 10/4/2018 13:00:00 5 5hr 2018-04-10 13:00:00 5.0
22 10/4/2018 16:00:00 0 NaN 2018-04-10 16:00:00 NaN
23 10/4/2018 17:00:00 0 NaN 2018-04-10 17:00:00 NaN
24 10/4/2018 18:00:00 7 11hr 2018-04-10 18:00:00 9.0
from datetime import timedelta
mask = df['x3'].ne(0)
df['Duration'] = df[mask].groupby(['date','x3'])['time_diff'].apply(lambda x : (((x-x.iloc[0])//timedelta(minutes=1))/60))
Output
date time x3 time_diff Duration
10/3/2018 6:00:00 0 2018-03-10 06:00:00 NaN
10/3/2018 7:00:00 5 2018-03-10 07:00:00 0.0
10/3/2018 8:00:00 0 2018-03-10 08:00:00 NaN
10/3/2018 9:00:00 7 2018-03-10 09:00:00 0.0
10/3/2018 10:00:00 0 2018-03-10 10:00:00 NaN
10/3/2018 11:00:00 0 2018-03-10 11:00:00 NaN
10/3/2018 12:00:00 0 2018-03-10 12:00:00 NaN
10/3/2018 13:45:00 0 2018-03-10 13:45:00 NaN
10/3/2018 15:00:00 0 2018-03-10 15:00:00 NaN
10/3/2018 16:00:00 0 2018-03-10 16:00:00 NaN
10/3/2018 17:00:00 0 2018-03-10 17:00:00 NaN
10/3/2018 18:00:00 0 2018-03-10 18:00:00 NaN
10/3/2018 19:00:00 5 2018-03-10 19:00:00 12.0
10/3/2018 20:00:00 0 2018-03-10 20:00:00 NaN
10/3/2018 21:30:00 7 2018-03-10 21:30:00 12.5
10/4/2018 6:00:00 0 2018-04-10 06:00:00 NaN
10/4/2018 7:00:00 0 2018-04-10 07:00:00 NaN
10/4/2018 8:00:00 5 2018-04-10 08:00:00 0.0
10/4/2018 9:00:00 7 2018-04-10 09:00:00 0.0
10/4/2018 11:00:00 5 2018-04-10 11:00:00 3.0
10/4/2018 12:00:00 5 2018-04-10 12:00:00 4.0
10/4/2018 13:00:00 5 2018-04-10 13:00:00 5.0
10/4/2018 16:00:00 0 2018-04-10 16:00:00 NaN
10/4/2018 17:00:00 0 2018-04-10 17:00:00 NaN
10/4/2018 18:00:00 7 2018-04-10 18:00:00 9.0
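One detail visible in the outputs above: with format='%d/%m/%Y', the string 10/3/2018 is parsed as 10 March and 10/4/2018 as 10 April, so consecutive dates in the file end up a month apart. Whether the file is day-first or month-first is not stated in the question, so the sketch below only shows the difference between the two readings. The names day_first and month_first are illustrative.

```python
import pandas as pd

dates = pd.Series(["10/3/2018", "10/4/2018"])

# Day-first reading, as used in the code above.
day_first = pd.to_datetime(dates, format="%d/%m/%Y")
# Month-first reading, for comparison.
month_first = pd.to_datetime(dates, format="%m/%d/%Y")

print((day_first.iloc[1] - day_first.iloc[0]).days)      # 31
print((month_first.iloc[1] - month_first.iloc[0]).days)  # 1
```

The per-day results are unaffected when grouping by the date string, but any calculation that spans dates depends on choosing the reading that matches the file.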
