Python pandas Calculate difference between 2 timestamp columns in a dataframe excluding weekends in seconds - python-3.x

I have two timestamp columns and need the difference between them, in seconds, in a third column, excluding weekends. How can I do this in Python/pandas?
I want to exclude Saturday and Sunday from the timeline.
Examples:
1. Starts on Thursday/Friday and ends on Monday/Tuesday: count only the time that falls on Thursday/Friday, then continue directly from Monday/Tuesday.
2. Starts on Saturday and ends on Monday: count only Monday.
3. Starts on Friday and ends on Sunday: count only Friday.
4. Starts and ends on Saturday/Sunday: the result is 0 seconds.

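First, convert the timestamps in both columns to datetime if they are not already in that format.
Second, check whether the start day is a weekday or a weekend day, and use the total_seconds method to get the difference in seconds:
def find_diff_in_secs(row):
    day_of_week = row["start"].weekday()
    if day_of_week < 5:
        # Start is Monday-Friday: return the raw difference in seconds
        diff_in_secs = (row["end"] - row["start"]).total_seconds()
        return diff_in_secs
    else:
        return "NA"
# axis=1 passes each row (not each column) to the function
df["diff_secs"] = df.apply(find_diff_in_secs, axis=1)
Note that this only looks at the weekday of the start timestamp; weekend time that falls inside the interval is not subtracted.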
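To actually drop the weekend portion of an interval, one option is to walk the interval day by day and sum only the Monday-Friday parts. The following is a minimal sketch, assuming timezone-naive pandas timestamps and the column names start and end used above; weekday_seconds is a hypothetical helper name, and the sample rows are made up to mirror the four cases in the question.

```python
import pandas as pd

def weekday_seconds(start, end):
    """Seconds between start and end, skipping Saturdays and Sundays."""
    if end <= start:
        return 0.0
    total = 0.0
    day = start.normalize()  # midnight of the start day
    while day < end:
        next_day = day + pd.Timedelta(days=1)
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            # Clip the interval to the part that lies within this day
            lo = max(start, day)
            hi = min(end, next_day)
            total += (hi - lo).total_seconds()
        day = next_day
    return total

# Illustrative data: 2022-09-02 is a Friday, 2022-09-05 is a Monday
df = pd.DataFrame({
    "start": pd.to_datetime(["2022-09-02 18:00", "2022-09-03 10:00", "2022-09-03 10:00"]),
    "end": pd.to_datetime(["2022-09-05 06:00", "2022-09-05 06:00", "2022-09-04 12:00"]),
})
df["diff_secs"] = df.apply(lambda r: weekday_seconds(r["start"], r["end"]), axis=1)
print(df)
```

A row-wise loop like this is easy to check by hand but may be slow on very large frames; a vectorized variant would be worth considering there.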

Related

how to calculate the date from the year, week-of-year and day-of-week in EXCEL?

I have the Year, Week-of-Year and Day-of-the-Week as follows:
Year = 2022 (A2) ; Week Year = 35 (B2); Week Day = 4 or Thursday (C2)
and I would like to estimate the Date as dd.mm.yyyy, which is highlighted in yellow as it shows in the EXCEL picture.
I tried many formulas, but I am sure there might be an easy one.
I think you are counting the weeks starting from zero, because for 9/1/2022 (M/D/YYYY format) the corresponding week is 36 according to WEEKNUM(DATE(2022,9,1)). To use the logic of multiplying the number of weeks by 7, you need a reference date that lets you count entire weeks: the first day of the year if it was a Sunday, otherwise the previous Sunday. Bottom line: use the Sunday of the first week of the year as the reference date, not the first day of the year (YYYY/1/1).
Here is the approach we use in cell E2:
=LET(y, A2:A6, wk, B2:B6, wDay, C2:C6, fDay, DATE(y,1,1), seq, SEQUENCE(7),
fDay - IF(WEEKDAY(fDay)=1,0, WEEKDAY(fDay,2)) + 7*wk
+ XLOOKUP(wDay, TEXT(seq,"dddd"), seq-1))
We use the LET function to avoid repeating the same calculation. The following expression finds the previous Sunday if the first day of the year (fDay) was not a Sunday:
fDay - IF(WEEKDAY(fDay)=1,0, WEEKDAY(fDay,2))
The XLOOKUP function is used to get the numeric representation of the weekday and use the TEXT function to generate the weekdays in a long format. Since we count the entire week, if the weekday is a Sunday (column C in my screenshot), then we don't need to add any day to our reference date, that is why we use seq-1.
Here is the output for several sample inputs. It assumes the week count starts at zero; if it does not, both the formula and the input data need to be adjusted.
Notice that the year 2021 started on a Friday, so asking for a day before Friday in the first week (week 0), such as Monday, returns a date from the previous year. If you want an error message instead, the formula can be modified as follows:
=LET(y, A2:A6, wk, B2:B6, wDay, C2:C6, fDay, DATE(y,1,1), seq, SEQUENCE(7),
result, fDay - IF(WEEKDAY(fDay)=1,0, WEEKDAY(fDay,2)) + 7*wk
+ XLOOKUP(wDay, TEXT(seq,"dddd"), seq-1),
IF(YEAR(result) <> y, "ERROR: Date from previous year", result))
I found the solution:
Year = 2022 (A2) ; Week Year = 35 (B2); Week Day = 4 or Thursday (C2)
=DATE (A2,1,3)-WEEKDAY(DATE(A2,1,3)) + 7 * B2 + C2 - 6
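As a cross-check outside Excel, Python's standard library can build a date from a year, week number, and weekday. This is only a sketch, and it assumes the inputs follow ISO 8601 week numbering (weeks start on Monday, Monday=1), which may not match how your workbook counts weeks.

```python
from datetime import date

year, week, weekday = 2022, 35, 4  # the values from A2, B2, C2; 4 = Thursday

# date.fromisocalendar (Python 3.8+) interprets the inputs as ISO year/week/day
result = date.fromisocalendar(year, week, weekday)
formatted = result.strftime("%d.%m.%Y")  # dd.mm.yyyy
print(formatted)  # 01.09.2022
```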
I found this solution, but you need to do further testing if it really works.
I calculate month from week: =+MONTH(DATE(YEAR(A2);1;1)+B2*7-1)
I calculate week day number from week day name: =MATCH(D2;{"Monday";"Tuesday";"Wednesday";"Thursday";"Friday";"Saturday";"Sunday"};0)
And then make date using: =DATE(A2;C2;E2)

Is there any function in excel to find day time between two date and time in Excel?

I need a formula to calculate the time between two dates and times, excluding lunch time, holidays, and weekends, and counting only the 09:00 to 18:00 work hours.
For example, 25/07/2022 12:00 and 29/07/2022 10:00 and answer has to be 1 day, 06:00
Thanks in advance.
I had a formula, but it didn't work when the difference was bigger than 24 hours.
I don't know how you got to 1 day and 6 hours, but here is a customizable way to filter your time difference calculation:
=LET(
start,E3,
end,E4,
holidays,$B$3:$B$5,
array,SEQUENCE(INT(end)-INT(start)+1,24,INT(start),TIME(1,0,0)),
crit_1,array>=start,
crit_2,array<=end,
crit_3,WEEKDAY(array,2)<6,
crit_4,HOUR(array)>=9,
crit_5,HOUR(array)<=18,
crit_6,HOUR(array)<>13,
crit_7,ISERROR(MATCH(DATE(YEAR(array),MONTH(array),DAY(array)),holidays,0)),
result,SUM(crit_1*crit_2*crit_3*crit_4*crit_5*crit_6*crit_7),
result
)
Limitation
This solution only works on an hourly level, i.e. the start and end dates and times will only be considered on a full hour basis. When providing times like 12:45 as input, the 15 minute increment won't be accounted for.
Explanation
The 4th item in the LET() function SEQUENCE(INT(end)-INT(start)+1,24,INT(start),TIME(1,0,0)) creates an array that contains all hours within the start and end date of the range:
(transposed for illustrative purposes)
then, based on that array, the different 'crit_n' statements are the individual criteria you mentioned. For example, crit_1,array>=start means that only the dates and times after the start date and time will be counted, or crit_6,HOUR(array)<>13 is the lunch break (assuming the 13th hour is lunch time), ...
All of the individual crit_n's are then arrays of the same size containing TRUE and FALSE elements.
At the end of the LET() function, by multiplying all the individual crit_n arrays, the product returns a single array that will then only contain those hours where all individual criteria statements are TRUE:
So then the SUM() function is simply returning the total number of hours that fit all criteria.
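The same filter-and-count idea can be sketched in Python for readers who want to test it outside Excel. This is an illustration rather than a translation of the formula: count_working_hours and its parameter names are hypothetical, it treats 18:00 as closing time (the slot starting at 18:00 is not counted), and it excludes the slot that starts exactly at the end time, so its count may differ from the formula above. Like the formula, it only works at whole-hour resolution.

```python
from datetime import date, datetime, timedelta

def count_working_hours(start, end, holidays=frozenset(),
                        open_hour=9, close_hour=18, lunch_hour=13):
    """Count hourly slots starting in [start, end) that pass every filter."""
    hours = 0
    # Begin at the top of the hour that contains `start`
    slot = start.replace(minute=0, second=0, microsecond=0)
    while slot < end:
        if (slot >= start                                # not before the start time
                and slot.weekday() < 5                   # Monday-Friday only
                and open_hour <= slot.hour < close_hour  # inside work hours
                and slot.hour != lunch_hour              # skip the lunch hour
                and slot.date() not in holidays):        # skip holidays
            hours += 1
        slot += timedelta(hours=1)
    return hours

# Dates from the question; the 28th is assumed to be a holiday
total = count_working_hours(
    datetime(2022, 7, 25, 12, 0),
    datetime(2022, 7, 29, 10, 0),
    holidays={date(2022, 7, 28)},
)
print(total)  # 22
```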
Example
I assumed lunch hour to be hour 13, and I assumed the 28th to be a holiday within the given range. With those assumptions and the other criteria you already specified above, I'm getting the following result:
Which looks like this when going into the formula bar:
In cell G2, you can put the following formula:
=LET(from,A2:A4,to,B2:B4,holidays,C2:C2,startHr,E1,endHr,E2, lunchS, E3, lunchE, E4,
CALC, LAMBDA(date,isFrom, LET(noWkDay, NETWORKDAYS(date,date,holidays)=0,
IF(noWkDay, 0, LET(d, INT(date), start, d + startHr, end, d + endHr,
noOverlap, IF(isFrom, date > end, date < start), lunchDur, lunchE-lunchS,
ls, d + lunchS, le, d + lunchE,
isInner, IF(isFrom, date > start, date < end),
diff, IF(isFrom, end-date-1 - IF(date < ls, lunchDur, 0),
date-start-1 - IF(date > le, lunchDur, 0)),
IF(noOverlap, -1, IF(isInner, diff, 0)))))),
MAP(from,to,LAMBDA(ff,tt, LET(wkdays, NETWORKDAYS(ff,tt,holidays),
duration, wkdays + CALC(ff, TRUE) + CALC(tt, FALSE),
days, INT(duration), time, duration - TRUNC(duration),
TEXT(days, "d") &" days "& TEXT(time, "hh:mm") &" hrs "
)))
)
and here is the output:
Explanation
We use the LET function for easy reading and composition. The main idea is first to calculate the number of working days, excluding holidays, from the from column value to the to column value. We use the NETWORKDAYS function for that. Once we have this value for each row, we need to adjust it for the first and last day of the interval, in case they cannot be counted as full days and must be counted in hours instead. Inner days (not the start or end of the interval) are counted as entire days.
We use MAP function to do the calculation over all values of from and to names. For each corresponding value (ff, tt) we calculate the working days (wkdays). Once we have this value, we use the user LAMBDA function CALC to adjust it. The function has a second input argument isFrom to consider both scenarios, i.e. adjustment at the beginning of the interval (isFrom = TRUE) or to the end of the interval (isFrom=FALSE). The first input argument is the given date.
In case the input date of CALC is a non-working day, we don't need to make any adjustment. We check it with the name noWkDay. If that is not the case, then we need to determine if there is no overlap (noOverlap):
IF(isFrom, date > end, date < start)
where start, end names correspond to the same date as date, but with different hours corresponding to start Hr and end Hr (E1:E2). For example for the first row, there is no overlap, because the end date doesn't have hour information, i.e. (12:00 AM), in such case the corresponding date should not be taken into account and CALC returns -1, i.e. one day needs to be subtracted.
In case we have overlap, then we need to consider the case the working hours are lower than the maximum working hours (from 9:00 to 18:00). It is identified with the name isInner. If that is the case, we calculate the actual hours. We need to subtract 1 because it is going to be one less full working day and instead to consider the corresponding hours (that should be less than 9hrs, which is the maximum workday duration). The calculation is carried under the name diff:
IF(isFrom, end-date-1 - IF(date < ls, lunchDur, 0),
date-start-1 - IF(date > le, lunchDur, 0))
If the actual start is before the start of the lunch time (ls), then we need to subtract lunch duration (lunchDur). Similarly, if the actual end is after lunch time, we need to discount it too.
Finally, we use CALC to calculate the interval duration:
wkdays + CALC(ff, TRUE) + CALC(tt, FALSE)
Once we have this information, it is just to put in the specified format indicating days and hours.
Now let's review some of the sample input data and results:
The interval starts on Monday 7/25 and ends on Friday 7/29, therefore we have 5 working days, but 7/26 is a holiday, so the maximum number of working days will be 4 days.
The interval [7/25, 7/29] starts and ends at midnight (12:00 AM), therefore the last day of the interval should not be considered, so the actual working days will be 3.
Interval [7/25 10:00, 7/29 17:00]. For the start of the interval we cannot count one full day, only 8hrs, and the same applies to the end of the interval, again 8hrs. So instead of 4 days we are going to have 2 days plus 16hrs, but we need to subtract the lunch duration (1hr) in both cases, so the final result will be 2 days 14hrs.

Get difference between two week days that are in string

Problem Statement:
I am developing a custom job scheduler that needs to run on given days. It takes a start date and an end date as strings, and the third parameter is a list of weekdays on which the job should run.
The start day may not be one of the given days, in which case the first job should run on the next valid day.
Suppose the start date is 2022-09-07 (a Wednesday) but the given frequency days are ["Monday", "Friday", "Saturday"]. I then need to run my first job on the coming Friday, and for this I need to calculate the difference between the start date and the first valid day (in this case Friday).
So how can I do this in Python, so that my first job runs on a valid day (which can be at any position in the list of frequency days), and so that after one job completes I can also get the next valid day? I did some work, but unfortunately it's not working. Here is what I did:
import datetime
sorted_week_days_list = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
start_date = "2022-09-07"
valid_frequency_days = ["Monday", "Tuesday", "Friday"]  # It can be any days in sorted order
start_date_object = datetime.datetime.strptime(start_date, "%Y-%m-%d")
given_start_day = start_date_object.strftime("%A")
if given_start_day not in valid_frequency_days:
    # Need help to implement logic to get date for valid day
    pass
You should use the datetime.weekday() method to pull out the day of the week for days of interest. Assuming that you have dates similar to the format you show above, it is easy to convert, and also just use the day index for your "allowable start days" (Monday=0).
Then you can jig up a little function to look for the next start date in your sorted list and figure out how many days you need to wait.
Example below does that and also "rolls over" the weekend as needed.
Code:
from datetime import datetime, timedelta
from bisect import bisect_left
start_date = "2022-09-09"
valid_start_dates = [1, 4]  # It can be any days in sorted order
start_date_object = datetime.strptime(start_date, "%Y-%m-%d")
d = start_date_object.weekday()
print(f'the numbered day of the week is: {d}')
def days_till_start(day, valid_start_days):
    idx = bisect_left(valid_start_days, day)
    if idx >= len(valid_start_days):  # wrap around to next start
        return valid_start_days[0] + 7 - day
    elif day == valid_start_days[idx]:
        return 0
    else:
        return valid_start_days[idx] - day
print(days_till_start(d, valid_start_dates))
start_dates = ['2022-09-05', '2022-09-06', '2022-09-07', '2022-09-08', '2022-09-09', '2022-09-10']
start_wkdys = [datetime.strptime(d, "%Y-%m-%d").weekday() for d in start_dates]
for d in start_wkdys:
    print(f'day index is: {d}')
    print(f'next start date is {days_till_start(d, valid_start_dates)} away')
    print()
Output:
the numbered day of the week is: 4
0
day index is: 0
next start date is 1 away
day index is: 1
next start date is 0 away
day index is: 2
next start date is 2 away
day index is: 3
next start date is 1 away
day index is: 4
next start date is 0 away
day index is: 5
next start date is 3 away
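If the schedule is given as day names, as in the question, the same idea can return the actual date of the next run. This is a minimal sketch: next_valid_date and its include_start flag are hypothetical names, and it simply tries each of the next seven days instead of using bisect.

```python
from datetime import datetime, timedelta

WEEKDAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

def next_valid_date(start, valid_day_names, include_start=True):
    """Return the first date on or after start whose weekday is allowed."""
    valid = {WEEKDAYS.index(name) for name in valid_day_names}
    first_step = 0 if include_start else 1  # skip start itself when asked
    for step in range(first_step, first_step + 7):
        candidate = start + timedelta(days=step)
        if candidate.weekday() in valid:
            return candidate
    raise ValueError("valid_day_names must contain at least one day")

start = datetime.strptime("2022-09-07", "%Y-%m-%d")  # a Wednesday
days = ["Monday", "Friday", "Saturday"]

first_run = next_valid_date(start, days)
second_run = next_valid_date(first_run, days, include_start=False)
print(first_run.date(), second_run.date())  # 2022-09-09 2022-09-10
```

Calling it again with include_start=False after each job completes gives the following run date, which covers the "next valid day" part of the question.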

What is the purpose of offset in conjunction with datetime in Python 3.x?

I'm new to python and I'm looking to work with datetime. I have some files generated every Sunday and I like to move the furthest Sunday out of the current folder eg: 2020-04-12, 2020-04-19, 2020-04-26.
I have found some examples on getting a specific date from today's date and I was able to modify it a tab bit. Eg. I can go back and get last week's Sunday with a specific date:
from datetime import date
from datetime import timedelta
import datetime
today = datetime.datetime(2020,4,13)
offset = (today.weekday() + 1) % 7
sunday = today - timedelta(days=offset)
#print (offset)
print(sunday)
I am confused by the offset variable. What is (today.weekday() + 1) % 7 doing? I have read the Python doc and not quite wrapping my head around it. With +1, I get the date 2020-04-12, which is a Sunday, great. When I do -1 (the other thing is if I set it to (today.weekday() - 1) % 7), I get 2020-04-07, a Tuesday. How did it jump from Sunday the 12th to Tuesday the 7th?
Additionally, how do I get it to jump back 3 weeks? that's where I'm also stuck.
Alright, so if today's Wednesday, then today.weekday() is 2, because it starts counting from 0 on Monday. Not sure why, but that's life.
So (2 + 1) % 7 = 3. That means that 3 days ago was Sunday. Hence your code:
offset = (today.weekday() + 1) % 7 # How many days ago was sunday
sunday = today - timedelta(days=offset) # Go backwards from today that many days
You'll notice that if you subtract one instead of adding one, we go backwards (because we're subtracting the timedelta object) by two fewer days than before (because 2 - 1 is equivalent to (2 + 1) - 2, that is, two fewer days). If you started by going backwards enough days to get to Sunday, and now you're going backwards two fewer days, you'll end up on Tuesday, which is two days later than Sunday. In your example today is a Monday, so today.weekday() is 0 and (0 - 1) % 7 wraps around to 6; going back 6 days from Monday the 13th lands on Tuesday the 7th.
The easiest way to shift which week you're headed to is to set the weeks argument in timedelta:
n_weeks = 3
sunday = today - timedelta(days=offset, weeks=n_weeks)
that's equivalent to, but much prettier than:
sunday = today - timedelta(days=offset + n_weeks * 7)
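Putting the pieces together, here is a small sketch that wraps the offset calculation in a function; previous_sunday and weeks_back are hypothetical names chosen for illustration.

```python
from datetime import date, timedelta

def previous_sunday(day, weeks_back=0):
    """Most recent Sunday on or before day, optionally shifted back by whole weeks."""
    offset = (day.weekday() + 1) % 7  # Monday=0 -> 1 day back, Sunday=6 -> 0 days back
    return day - timedelta(days=offset, weeks=weeks_back)

today = date(2020, 4, 13)  # a Monday
print(previous_sunday(today))                # 2020-04-12
print(previous_sunday(today, weeks_back=3))  # 2020-03-22
print(previous_sunday(date(2020, 4, 12)))    # 2020-04-12 (already a Sunday)
```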

Blue Prism How to calculate text date times?

is it possible to calculate text dates? Or to remove just the rest of my text?
I need to calculate days of leave. I have two data items and I must subtract one from the other. For example 11/3/2017 12:00:00 AM - 3/31/2017 12:00:00 AM. How can I do this? So far I've tried the Replace function. I wanted to use Mid, Trim and Left, but the number of characters in my date items varies, so sometimes I will have 9 characters (11/3/2017) and sometimes 8 (1/3/2017).
Which function could I use ?
this format is month/day/year and mine is day/month/year
Please see below how this could be achieved using the out of the box function available in Blue Prism Calculate Stage.
DateDiff(9, ToDate([Date1]), ToDate([Date2]))
DateDiff is an out of the box function which is used to find the difference in date.
DateDiff (interval,date1,date2)
Parameters
The three parameters are as follows:
Interval - A code specifying the desired units of the return value.
date1 - The first of the two dates for comparison.
date2 - The second of the two dates for comparison.
Interval Definition
0 - Year
1 - Week of year (Calendar week)
2 - Weekday (Full 7 day week)
3 - Second
4 - Quarter
5 - Month
6 - Minute
7 - Hour
8 - Day of year
9 - Day
ToDate([Text]) - This function is used to convert text to Date, provided the format matches.
Date1 - 3/31/2017 12:00:00 AM
Date2 - 11/3/2017 12:00:00 AM
DateDiff Result is 217 days as we are using interval 9 to find days. (Please see above the interval option for more options)
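For comparison outside Blue Prism, the same day count can be reproduced in Python. This sketch assumes month/day/year text with a 12-hour clock, as in the sample values; FMT is a hypothetical constant, and for day/month/year text you would swap %m and %d in it. Parsing with an explicit format also copes with one- or two-digit day and month values, so no Mid/Left trimming is needed.

```python
from datetime import datetime

FMT = "%m/%d/%Y %I:%M:%S %p"  # assumed layout: month/day/year, 12-hour clock

date1 = datetime.strptime("3/31/2017 12:00:00 AM", FMT)
date2 = datetime.strptime("11/3/2017 12:00:00 AM", FMT)

days_between = (date2 - date1).days
print(days_between)  # 217
```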
