As the title suggests, I have an hourly df that looks like this:
date_time traffic_volume
date_time
2012-10-02 09:00:00 2012-10-02 09:00:00 5545.0
2012-10-02 10:00:00 2012-10-02 10:00:00 4516.0
2012-10-02 11:00:00 2012-10-02 11:00:00 NaN
2012-10-02 12:00:00 2012-10-02 12:00:00 NaN
2012-10-02 13:00:00 2012-10-02 13:00:00 NaN
2012-10-02 14:00:00 2012-10-02 14:00:00 NaN
2012-10-02 15:00:00 2012-10-02 15:00:00 5584.0
2012-10-02 16:00:00 2012-10-02 16:00:00 6015.0
The majority of the NaNs I imputed using
df['traffic_volume'] = df['traffic_volume'].interpolate(method='time')
The problem now is that for a certain subset of the time series (the remaining NaNs), I want to impute each missing value with the value from the same day and hour one year earlier.
I used
df['traffic_volume'] = df.apply(lambda x: df.loc[ x['date_time'] + pd.offsets.DateOffset(years=-1)]['traffic_volume'] if x['traffic_volume']==np.NaN else x['traffic_volume'], axis=1)
The line of code ran, but my NaNs weren't imputed. Why is that? And if there is a better way, what is it?
Thank you.
P.S. The reason I don't want to use bfill, ffill or interpolate is that the runs of consecutive NaNs are too long and the data would lose granularity.
The fix is to use pd.isna(x['traffic_volume']) instead of x['traffic_volume']==np.NaN for the if condition in the lambda. I still don't understand why the initial line ran but didn't impute.
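On why the original line ran without imputing: under IEEE 754 rules NaN never compares equal to anything, itself included, so x['traffic_volume']==np.NaN is always False and the lambda always takes the else branch. As for a better way, here is a minimal sketch (not a definitive implementation) that avoids apply by shifting a copy of the series forward one year and passing it to fillna. The 2011 sample values are invented for illustration, and the sketch assumes a unique DatetimeIndex; a 29 February in the source year would collide with 28 February after shifting and would need separate handling.

```python
import numpy as np
import pandas as pd

# NaN is never equal to anything, so an == test cannot detect it.
# (np.nan is used here; the np.NaN alias was removed in NumPy 2.0.)
print(np.nan == np.nan)  # False
print(pd.isna(np.nan))   # True

# Hypothetical sample: three hours in 2011 and the same hours in 2012,
# with one value missing in 2012.
s = pd.Series(
    [5000.0, 5100.0, 5200.0, 5545.0, np.nan, 5584.0],
    index=pd.to_datetime([
        "2011-10-02 09:00", "2011-10-02 10:00", "2011-10-02 11:00",
        "2012-10-02 09:00", "2012-10-02 10:00", "2012-10-02 11:00",
    ]),
    name="traffic_volume",
)

# Relabel every observation with the timestamp one year later, so that
# last year's value sits at this year's timestamp.
last_year = s.copy()
last_year.index = last_year.index + pd.DateOffset(years=1)

# fillna aligns on the index and only touches the missing entries.
filled = s.fillna(last_year)
print(filled)
```

Any NaN whose counterpart one year earlier is also missing (or absent) stays NaN, so it is worth checking filled.isna().sum() afterwards.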
I would like to sum the sales according to the time range of the day or night in which they occur. For example, I would like to sum all sales that happened between 22:00 and 2:00.
Hour Sales
18:58 49
18:00 49.5
03:01 31
20:00 139
09:15 61.5
11:36 5
08:00 24
16:32 25
12:30 96.5
17:30 75.5
09:00 80
00:10 24
15:00 24
18:00 216
09:30 24
06:30 47.5
So if I use SUMIFS where the hour is >=22:00 and <23:00, the formula works. However, if I try to sum the values between 22:00 and 2:00, in other words the first criterion is ">=22:00" and the second is "<2:00", SUMIFS returns nothing, because no single time can satisfy both conditions. I understand why, but I'm struggling to find an alternative way to solve this task.
As stated, we need to add 1 (one day) when the range rolls over to the next day, which means SUMPRODUCT. With the start time in D2 and the end time in E2:
=SUMPRODUCT($B$2:$B$17,((E2<D2)+$A$2:$A$17>=D2)*(($A$2:$A$17<D2)+$A$2:$A$17<(E2<D2)+E2))
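For comparison, the same wrap-around logic can be sketched in Python. This is only an illustration of the idea, using the sample data from the question; the helper names parse_hhmm, in_window and window_total are made up for this example. When the start is later than the end, the window crosses midnight, so a time matches if it is at or after the start or before the end.

```python
from datetime import time

def parse_hhmm(text):
    """Turn an 'HH:MM' string into a datetime.time."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))

def in_window(t, start, end):
    """True if start <= t < end, allowing the window to cross midnight."""
    if start <= end:
        return start <= t < end
    # Window wraps past midnight, e.g. 22:00 to 02:00.
    return t >= start or t < end

# Sample data from the question: (hour, sales).
sales = [
    ("18:58", 49), ("18:00", 49.5), ("03:01", 31), ("20:00", 139),
    ("09:15", 61.5), ("11:36", 5), ("08:00", 24), ("16:32", 25),
    ("12:30", 96.5), ("17:30", 75.5), ("09:00", 80), ("00:10", 24),
    ("15:00", 24), ("18:00", 216), ("09:30", 24), ("06:30", 47.5),
]

def window_total(rows, start, end):
    """Sum the sales whose time falls inside the window."""
    return sum(amount for hhmm, amount in rows
               if in_window(parse_hhmm(hhmm), start, end))

total = window_total(sales, time(22, 0), time(2, 0))
print(total)  # 24 (only the 00:10 sale falls in 22:00-02:00)
```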
I have a list of data with the total number of orders per date, and I would like to calculate the average number of orders per day of the week. For example, the average number of orders on Mondays.
0 2018-01-01 00:00:00 3162
1 2018-01-02 00:00:00 1146
2 2018-01-03 00:00:00 396
3 2018-01-04 00:00:00 848
4 2018-01-05 00:00:00 1624
5 2018-01-06 00:00:00 3052
6 2018-01-07 00:00:00 3674
7 2018-01-08 00:00:00 1768
8 2018-01-09 00:00:00 1190
9 2018-01-10 00:00:00 382
10 2018-01-11 00:00:00 3170
1. Make sure your date column is in datetime format (looks like it already is).
2. Add a column that converts the date to the day of the week.
3. Group by the day of the week and take the average.
df['Date'] = pd.to_datetime(df['Date']) # Step 1
df['DayofWeek'] = df['Date'].dt.day_name() # Step 2
df.groupby(['DayofWeek']).mean() # Step 3
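As a self-contained check of these steps, here is a small sketch built from the sample rows in the question. The column name 'Orders' is an assumption (the question does not show column names), and the reindex at the end is only there to list the days in calendar order rather than alphabetically.

```python
import pandas as pd

# Sample rows from the question; 'Orders' is an assumed column name.
df = pd.DataFrame({
    "Date": pd.date_range("2018-01-01", periods=11, freq="D"),
    "Orders": [3162, 1146, 396, 848, 1624, 3052, 3674,
               1768, 1190, 382, 3170],
})

# Group the order counts by weekday name and average them.
avg = df.groupby(df["Date"].dt.day_name())["Orders"].mean()

# Put the result in calendar order instead of alphabetical order.
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
avg = avg.reindex(weekdays)
print(avg)
```

Selecting the 'Orders' column before calling mean() keeps the date column itself out of the aggregation.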
I have an issue when I try to convert local date/times to UTC around the winter clock change.
I have a list of every date/time of a given year in 30-minute steps:
Those dates are in Local time.
Then I am converting them into UTC. I use
local = pytz.timezone("Europe/Paris")
utc = pytz.timezone("UTC")
Local_time.astimezone(utc).strftime()
It worked well for the first clock change of the year.
Here is the local time (-1 hour):
2019-03-31 01:00:00 2019-03-31 01:30:00
2019-03-31 01:30:00 2019-03-31 03:00:00
2019-03-31 03:00:00 2019-03-31 03:30:00
Then the output is:
2019-03-31T00:00:00Z 2019-03-31T00:30:00Z
2019-03-31T00:30:00Z 2019-03-31T01:00:00Z
2019-03-31T01:00:00Z 2019-03-31T01:30:00Z
So here everything is fine.
But for the second clock change:
Here is the local time (+1 hour):
2019-10-27 01:30:00 2019-10-27 02:00:00
2019-10-27 02:00:00 2019-10-27 02:30:00
2019-10-27 02:30:00 2019-10-27 02:00:00
2019-10-27 02:00:00 2019-10-27 02:30:00
2019-10-27 02:30:00 2019-10-27 03:00:00
Then in UTC, it gives me :
2019-10-26T23:30:00Z 2019-10-27T00:00:00Z
2019-10-27T00:00:00Z 2019-10-27T00:30:00Z
2019-10-27T00:30:00Z 2019-10-27T00:00:00Z
2019-10-27T00:00:00Z 2019-10-27T00:30:00Z
2019-10-27T00:30:00Z 2019-10-27T02:00:00Z
I cannot figure out why it subtracts 2 hours everywhere, when it should subtract only one hour from 03:00 local time onwards.
I'm sorry to have asked this question; I actually found the answer myself.
It was quite obvious.
I used the UTC offset that I already had in my input, parsed with the dateutil library.
It worked!
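To illustrate why carrying the offset removes the ambiguity, here is a minimal sketch. It is not the original poster's code: it uses the standard library's datetime.fromisoformat as a stand-in for dateutil, and the input strings with explicit +02:00 / +01:00 offsets are assumed for illustration. The two local 02:00 and 02:30 readings differ only in their offset, which is exactly the information a naive local time lacks.

```python
from datetime import datetime, timezone

# Assumed input: local Paris times around the 2019-10-27 change,
# each carrying its own UTC offset (+02:00 before, +01:00 after).
stamps = [
    "2019-10-27T01:30:00+02:00",
    "2019-10-27T02:00:00+02:00",
    "2019-10-27T02:30:00+02:00",
    "2019-10-27T02:00:00+01:00",  # same wall-clock time, one hour later
    "2019-10-27T02:30:00+01:00",
    "2019-10-27T03:00:00+01:00",
]

def to_utc(stamp):
    """Parse an ISO 8601 string with an offset and format it in UTC."""
    aware = datetime.fromisoformat(stamp)
    return aware.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

converted = [to_utc(s) for s in stamps]
for local, utc in zip(stamps, converted):
    print(local, "->", utc)
```

With the offset present, the UTC values increase steadily in 30-minute steps instead of repeating.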
I need to write an 'if' statement to output either DAY, NIGHT, or WEEKEND based on the day of the week and times as below:
output DAY if the date and time is Monday to Friday 7am to 9pm
output NIGHT if the date and time is Monday to Thursday 9pm to 7am
output WEEKEND if the date and time is Friday 9pm to Monday 7am.
My data comes in half-hourly increments, like this:
24/04/2015 16:30
24/04/2015 18:00
24/04/2015 18:30
24/04/2015 20:30
24/04/2015 21:00
24/04/2015 21:30
24/04/2015 23:00
24/04/2015 23:30
25/04/2015 0:00
25/04/2015 0:30
25/04/2015 1:00
25/04/2015 10:00
25/04/2015 11:30
25/04/2015 22:00
25/04/2015 22:30
25/04/2015 23:00
25/04/2015 23:30
26/04/2015 0:00
26/04/2015 0:30
26/04/2015 18:30
26/04/2015 19:00
26/04/2015 19:30
26/04/2015 20:00
26/04/2015 20:30
26/04/2015 21:00
26/04/2015 21:30
26/04/2015 23:00
26/04/2015 23:30
27/04/2015 0:00
27/04/2015 0:30
27/04/2015 1:00
27/04/2015 6:30
27/04/2015 7:00
27/04/2015 7:30
(There's a total of 17,000 rows of half-hourly data for an entire year, so I have altered some of the days and times to make it easier to work with; there should be some data that matches each of the three DAY, NIGHT and WEEKEND criteria.)
I have studied this solution here https://stackoverflow.com/a/15754238/1602250, and it makes sense but I can't get it to work.
I've output the day of the week into a 2nd column and tried this:
=IF(AND(A2="Fri",A1=">9:00:01 p.m.",A1="<7:00:01 a.m."),"WEEKEND") - but this needs to say between Fri9pm and Mon7am.
I've also tried this one, doesn't work either.
=IF(OR(A2="Mon",A2="Tue",A2="Wed",A2="Thu",A2="Fri"),IF(A1=">7:00:00 a.m.", A1="<9:00:00 p.m.", "DAY", IF(AND(OR(A2="Sat",A2="Sun", "WEEKEND")))
Please can someone help, I'm going half crazy...
I assume your data is in column A and its data type is text.
So first I extract the date and the time into separate columns.
Column B: Get date: =DATE(MID(A2,7,4),MID(A2,4,2),LEFT(A2,2))
Column C: Get time: =RIGHT(A2,LEN(A2)-11)
Column D: Apply the rules: =IF(AND(WEEKDAY(B2)>=2,WEEKDAY(B2)<=6,TIMEVALUE(C2)>=TIMEVALUE("7:00"),TIMEVALUE(C2)<TIMEVALUE("21:00")),"DAY",IF(OR(AND(WEEKDAY(B2)>=2,WEEKDAY(B2)<=5,TIMEVALUE(C2)>=TIMEVALUE("21:00")),AND(WEEKDAY(B2)>=3,WEEKDAY(B2)<=6,TIMEVALUE(C2)<TIMEVALUE("7:00"))),"NIGHT","WEEKEND"))
Please see the attachment. Hope this helps.
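For anyone who wants to cross-check the spreadsheet result, the three rules from the question can be sketched in Python as follows. This is only one reading of the rules: it assumes 7:00 counts as DAY and 21:00 counts as the start of NIGHT or WEEKEND, and the function name classify is made up for this example.

```python
from datetime import datetime, time

DAY_START = time(7, 0)   # assumed inclusive
DAY_END = time(21, 0)    # assumed exclusive

def classify(stamp):
    """Return DAY, NIGHT or WEEKEND for a 'dd/mm/yyyy H:MM' string."""
    dt = datetime.strptime(stamp, "%d/%m/%Y %H:%M")
    weekday = dt.weekday()  # Monday is 0, Sunday is 6
    t = dt.time()
    # DAY: Monday to Friday, 7am up to 9pm.
    if weekday <= 4 and DAY_START <= t < DAY_END:
        return "DAY"
    # NIGHT: Monday to Thursday evenings from 9pm, plus the early
    # mornings that follow them (Tuesday to Friday before 7am).
    if (weekday <= 3 and t >= DAY_END) or (1 <= weekday <= 4 and t < DAY_START):
        return "NIGHT"
    # Everything else lies between Friday 9pm and Monday 7am.
    return "WEEKEND"

for stamp in ["24/04/2015 16:30", "24/04/2015 21:00",
              "25/04/2015 10:00", "27/04/2015 6:30", "27/04/2015 7:00"]:
    print(stamp, classify(stamp))
```

Note that Monday before 7am falls under WEEKEND and Friday before 7am falls under NIGHT, which follows from the three ranges as stated in the question.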