From time products to numeric products - python-3.x

I have a list of products which are defined by time (half-hourly). I would like to convert from time to numbers without having to write 48 if statements:
From: To:
0000 1
0030 2
0100 3
0130 4
.... ...
2330 48
Do you guys have any smart shortcut to this?

You need to split the string into the hours and minutes parts. This can be done by slicing the string and then converting to integers.
From here, each hour contains 2 half-hours, so we multiply the hours by 2. Since you number 00:00 as 1 rather than 0, we add 1 to account for that initial offset, or 2 if the minutes part is 30. To check whether the minutes part is 30 we could write == 30, but since the only other case is 0 we can just test the truthiness of the minutes value in a conditional expression.
So, the one-liner:
def half_hour(t: str):
    return int(t[:2]) * 2 + (2 if int(t[2:]) else 1)
and some tests:
>>> half_hour('0000')
1
>>> half_hour('0030')
2
>>> half_hour('0100')
3
>>> half_hour('0130')
4
>>> half_hour('2330')
48
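As a quick check of the arithmetic above, here is a minimal sketch that builds all 48 half-hour labels and maps each one to its number. It uses a variant of the formula (minutes // 30 instead of the truthiness test), and the names labels and slots are only illustrative, not part of the original answer.

```python
def half_hour(t: str) -> int:
    """Map an 'HHMM' string on a half-hour boundary to a slot number 1..48."""
    hours, minutes = int(t[:2]), int(t[2:])
    # 2 slots per hour, plus 0 or 1 for the half hour, plus 1 for the offset
    return hours * 2 + minutes // 30 + 1

# Build every half-hour label of the day ('0000', '0030', ..., '2330')
labels = [f"{h:02d}{m:02d}" for h in range(24) for m in (0, 30)]

# Map each label to its slot number
slots = {label: half_hour(label) for label in labels}
```

If the input may contain minutes other than 00 or 30, this variant silently buckets them into the enclosing half hour, which may or may not be what you want.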

You could do this with the datetime module if you wanted to. It's perhaps a little clumsy since time-only datetime instances are assumed to be on January 1, 1900, but it doesn't require that you do the string parsing yourself:
import datetime
_epoch = datetime.datetime(1900, 1, 1)
_halfhour = datetime.timedelta(minutes=30)
def half_hour(time_string):
    # strptime with only %H%M yields a datetime on 1900-01-01
    parsed_time = datetime.datetime.strptime(time_string, '%H%M')
    delta = parsed_time - _epoch
    return delta // _halfhour + 1

Related

Sequentially comparing groupby values conditionally

Given a dataframe
data = [['Bob','25'],['Alice','46'],['Alice','47'],['Charlie','19'],
['Charlie','19'],['Charlie','19'],['Doug','23'],['Doug','35'],['Doug','35.5']]
df = pd.DataFrame(data, columns = ['Customer','Sequence'])
Calculate the following:
First Sequence in each group is assigned a GroupID of 1.
Compare first Sequence to subsequent Sequence values in each group.
If difference is greater than .5, increment GroupID.
If GroupID was incremented, instead of comparing subsequent values to the first, use the current Sequence.
In the desired results table below...
Bob only has 1 record so the GroupID is 1.
Alice has 2 records and the difference between the two Sequence values (46 & 47) is greater than .5 so the GroupID is incremented.
Charlie's Sequence values are all the same, so all records get GroupID 1.
For Doug, the difference between the first two Sequence values (23 & 35) is greater than .5, so the GroupID for the second Sequence becomes 2. Now, since the GroupID was incremented, I want to compare the next value of 35.5 to 35, not 23, which means the last two rows share the same GroupID.
Desired results:
CustomerID  Sequence  GroupID
Bob         25        1
Alice       46        1
Alice       47        2
Charlie     19        1
Charlie     19        1
Charlie     19        1
Doug        23        1
Doug        35        2
Doug        35.5      2
My implementation:
# generate unique ID based on each customers Sequence
df['EventID'] = df.groupby('Customer')[
'Sequence'].transform(lambda x: pd.factorize(x)[0]) + 1
# impute first Sequence for each customer for comparison
df['FirstSeq'] = np.where(
df['EventID'] == 1, df['Sequence'], np.nan
)
# groupby and fill first Sequence forward
df['FirstSeq'] = df.groupby('Customer')[
'FirstSeq'].transform(lambda v: v.ffill())
# get difference of first Sequence and all others
df['FirstSeqDiff'] = abs(df['FirstSeq'] - df['Sequence'])
# create unique GroupID based on Sequence difference from first Sequence
df["GroupID"] = np.cumsum(df.FirstSeqDiff > 0.5) + 1
The above works for cases like Bob, Alice and Charlie but not Doug because it is always comparing to the first Sequence. How can I modify the code to change the compared Sequence value if the GroupID is incremented?
EDIT:
The dataframe will always be sorted by Customer and Sequence. I guess a better way to explain my goal is to assign a unique ID to all Sequence values that differ by .5 or less, grouping by Customer.
The code has errors: Sequence holds strings, so adding df = df.astype({'Customer':str,'Sequence':np.float64}) fixes that. But you still cannot get what you want with this design. Try defining your own function myfunc, which solves your problem directly:
import numpy as np
import pandas as pd
data = [['Bob','25'],['Alice','46'],['Alice','47'],['Charlie','19'],
        ['Charlie','19'],['Charlie','19'],['Doug','23'],['Doug','35'],['Doug','35.5']]
df = pd.DataFrame(data, columns = ['Customer','Sequence'])
df = df.astype({'Customer':str,'Sequence':np.float64})
def myfunc(series):
    # number the sorted values, starting a new ID whenever the gap
    # to the previous value exceeds 0.5
    ret = []
    series = series.sort_values().values
    for i,val in enumerate(series):
        if i==0:
            ret.append(1)
        else:
            ret.append(ret[-1]+(series[i]-series[i-1]>0.5))
    return ret
df['EventID'] = df.groupby('Customer')[
    'Sequence'].transform(lambda x: myfunc(x))
print (df)
Happy coding my friend.
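For comparison, the same consecutive-gap rule can probably be written without an explicit loop. This is only a sketch, assuming (as stated in the question's EDIT) that the frame is already sorted by Customer and Sequence; the intermediate name jump is illustrative.

```python
import pandas as pd

data = [['Bob', '25'], ['Alice', '46'], ['Alice', '47'], ['Charlie', '19'],
        ['Charlie', '19'], ['Charlie', '19'], ['Doug', '23'], ['Doug', '35'],
        ['Doug', '35.5']]
df = pd.DataFrame(data, columns=['Customer', 'Sequence'])
df['Sequence'] = df['Sequence'].astype(float)

# True where a value is more than 0.5 above the previous value of the same
# customer; the first row of each customer has no predecessor (NaN), and
# NaN > 0.5 evaluates to False.
jump = df.groupby('Customer')['Sequence'].diff().gt(0.5)

# Running count of jumps per customer, shifted so the first group is 1.
df['GroupID'] = jump.astype(int).groupby(df['Customer']).cumsum() + 1
```

Like myfunc, this compares each value with the one immediately before it, so a long chain of small steps stays in one group even if its first and last values are far apart.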

Comparison between dates starts with -1

I have the following code:
import pandas as pd
from datetime import datetime, timedelta
df = pd.DataFrame ({
'Date':['4/22/2020 14:32:10','4/21/2020 4:32:10','4/20/2020 1:32:10']
})
date ='04/22/2020'
datetime_object = datetime.strptime(date, '%m/%d/%Y')
df['Date'] = pd.to_datetime(df['Date'],format='%m/%d/%Y %H:%M:%S')
days_diff = (datetime_object - df['Date']).dt.days
print(days_diff)
0 -1
1 0
2 1
Why does the result not look like the one below? Why does the number of days start with -1 and not with 0?
0 0
1 1
2 2
This is because the day counts are floored.
For the first case, '4/22/2020 14:32:10', the difference is about -14.5/24 ≈ -0.6 days.
Output: -1
For the second case, '4/21/2020 4:32:10', the difference is about 19.5/24 ≈ 0.8 days.
Output: 0
For the third case, '4/20/2020 1:32:10', the difference is about 46.5/24 ≈ 1.9 days.
Output: 1
I hope it helps.
The solution is to convert all the datetimes to dates, as I have done with the 'Date' column in the following line:
days_diff = (datetime_object.date() - df['Date'].dt.date ).dt.days
In [32]: days_diff
Out[32]:
0 0
1 1
2 2
Name: Date, dtype: int64
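A variant of the same idea that keeps everything in pandas datetime types is to drop the time of day with dt.normalize() before subtracting. This is a sketch rather than the answer's own method; the name reference is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['4/22/2020 14:32:10', '4/21/2020 4:32:10', '4/20/2020 1:32:10']
})
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y %H:%M:%S')

# Midnight of the reference day
reference = pd.Timestamp('2020-04-22')

# normalize() sets each timestamp to midnight, so the subtraction
# yields whole-day differences and flooring no longer matters.
days_diff = (reference - df['Date'].dt.normalize()).dt.days
```

Because both sides sit at midnight, every difference is an exact number of days.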
The issue is that you are subtracting a later datetime from an earlier one, which leaves you with a negative result. In the datetime module, subtracting one datetime object from another creates a timedelta object like so
days1 = self.toordinal()
days2 = other.toordinal()
secs1 = self._second + self._minute * 60 + self._hour * 3600
secs2 = other._second + other._minute * 60 + other._hour * 3600
base = timedelta(days1 - days2,
secs1 - secs2,
self._microsecond - other._microsecond)
If we mimic that with your dates we see the following days and secs created for each date object
737537 0
737537 52330
Subtracting days2 from days1 and secs2 from secs1 means we pass the following to the timedelta object
0 -52330
So we are asking for a timedelta whose difference is 0 days and negative 52,330 seconds, which is quite correct. However, timedelta is a flexible object: it accepts fractional values and other units such as weeks or minutes, and it does not limit the values you pass. In the seconds part you can pass 10 seconds or 100,000 seconds, and 100,000 seconds is more than there are in a day. The code takes this into account and uses divmod on the seconds to work out whether they contain any whole days.
days, seconds = divmod(seconds, 24*3600)
d += days
s += int(seconds) # can't overflow
The issue lies in understanding what divmod does: it returns the floor division and the remainder of the calculation. In the positive case that's fine.
print(divmod(52330, 24*3600))
print(divmod(-52330, 24*3600))
(0, 52330)
(-1, 34070)
In the positive case the floor division rounds down to 0 days and returns the remaining seconds. In the negative case, however, -52330 / 86400 is -0.6056..., so floor division rounds this down to -1, and the remainder is the difference between 86400 and 52330, which leaves 34070 seconds.
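The normalization described above can be observed directly on a timedelta. A minimal sketch using only the standard library (the variable names are illustrative):

```python
from datetime import datetime, timedelta

# Ask for 0 days and -52330 seconds, as in the subtraction above
delta = timedelta(days=0, seconds=-52330)

# timedelta normalizes so that 0 <= seconds < 86400,
# which pushes the sign into the days field.
days, seconds = delta.days, delta.seconds

# The same value comes out of subtracting the two datetimes
same = datetime(2020, 4, 22) - datetime(2020, 4, 22, 14, 32, 10)

# total_seconds() recovers the signed duration without the day/second split
total = delta.total_seconds()
```

Here days is -1 and seconds is 34070, while total is still -52330.0, so total_seconds() is the safer thing to inspect when a difference may be negative.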
So you wouldn't face this issue if you always subtract the older date from the newer date, so that you never end up with a negative difference. In fact, it doesn't make sense to subtract a newer date from an older date.
For the other cases you listed, the difference between 4/21/2020 4:32:10 and 4/22/2020 00:00:00 is indeed 0 days, since it is actually only about 19.5 hours. This behavior is correct: the difference is not 1 day, it is about 19.5 hours.

How Do I Subtract A Total From Its Next Highest Multiple Of 10

So I have a total, say 24, and I need my code to find the next highest multiple of 10. This would be 30 of course, so I need the code to calculate (30-24). If the number is 20, the multiple would be 20 because it is already a multiple of 10. I then need to store the result for later use.
>>> def nh(val):
... return 9 - ((val + 9) % 10)
...
>>> nh(24)
6
>>> nh(20)
0
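An equivalent and perhaps easier-to-read form relies on Python's modulo always returning a non-negative result for a positive divisor. This is a sketch; the function name to_next_multiple_of_10 is made up for illustration.

```python
def nh(val: int) -> int:
    # The formula from the answer above
    return 9 - ((val + 9) % 10)

def to_next_multiple_of_10(val: int) -> int:
    # (-val) % 10 is the distance up to the next multiple of 10,
    # and 0 when val is already a multiple of 10.
    return -val % 10

# Store the result for later use
shortfall = to_next_multiple_of_10(24)
```

Both functions give 6 for 24 and 0 for 20.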

In Excel, how do you generate random times so that they only contain even-numbered hours (e.g., 2:46pm, 4:43pm, 6:32pm, etc.) between 10 am and 8pm?

For your hours, you need to generate a random number between 5 and 9 and then multiply it by 2. The minutes can be completely random, and it looks like you don't care about the seconds, so I will choose 0.
=TIME(RANDBETWEEN(5,9)*2,RANDBETWEEN(0,59),0)
If you want to include 8:00 PM, the following should make it equally random:
=IF(RANDBETWEEN(1,5*60+1)=1,TIME(20,0,0),TIME(RANDBETWEEN(5,9)*2,RANDBETWEEN(0,59),0))
Consider (minutes capped at 59, since a minute value of 60 would roll over into the next, odd hour):
=TIME(RANDBETWEEN(5,9)*2,RANDBETWEEN(0,59),0)
Try this:
=TIME(INT(RAND()*5)*2+10,INT(RAND()*60),INT(RAND()*60))
UPDATE
To include 8:00 PM, here is another way, using two columns:
A: =TIME(0, INT(RAND()*(5*60+1)), 0) -- from 00:00:00 to 05:00:00 (including the 5*60-minute endpoint)
-- RAND() is >= 0 but < 1, so INT(RAND()*(5*60+1)) has the same effect as RANDBETWEEN(0,5*60)
B: =TIME(10,0,0) + TIME(HOUR(A1)*2,MINUTE(A1), 0) -- from 10:00:00 to 18:59:00 with even hours
-- and 20:00:00 included
-- more concise version, thanks to @MarkBalhoff's reminder in the comments
B: =TIME(10 + HOUR(A1)*2, MINUTE(A1), 0)
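The two-column trick may be easier to follow outside a spreadsheet. The sketch below mirrors it in Python rather than Excel: draw one of 301 minute offsets uniformly, then double the hour part. The function name random_even_hour_time is an assumption made for illustration.

```python
import random

def random_even_hour_time(rng: random.Random) -> tuple:
    """Return (hour, minute) with an even hour from 10:00 through 20:00."""
    # Column A: a uniform offset of 0..300 minutes (00:00 to 05:00 inclusive)
    offset = rng.randint(0, 5 * 60)
    hours, minutes = divmod(offset, 60)
    # Column B: double the hour part and shift to start at 10:00
    return 10 + 2 * hours, minutes

rng = random.Random(0)
samples = [random_even_hour_time(rng) for _ in range(1000)]
```

Since offset 300 is the only value whose hour part is 5, 20:00 appears with the same probability as any other single minute.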

Rounding Up Minutes above 8, 23, 38, 53 to the nearest quarter hour

Here’s a challenge for someone. I’m trying to round session times up to the nearest quarter hour (I report my total client hours for license credentialing)
8 minutes or above: round up to 15
23 minutes or above: round up to 30
38 minutes or above: round up to 45
53 minutes or above: round up to 60
Ex: in the first quarter hour, minutes below 8 will be their exact value: 1=1, 2=2, 3=3.
When 8 is entered, it is automatically rounded up to 15. The same holds true for the rest of the hour, ex: 16=16, 17=17, 18=18, 19=19, 20=20, 21=21, 22=22, but 23 through 29 are all rounded up to 30.
Ideally, I could enter both hours and minutes in a single column, ex: 1.54
However, I realize that it may be necessary to create a separate column for hours and minutes in order to make this work (i.e., so that my formula is only concerned with rounding up minutes. I can add my hours and minutes together after the minutes are rounded.) Thus:
Column A = Hours (3 hours maximum)
Column B = Minutes
Column C = Minutes Rounded up to nearest ¼ hour
Column D = Col A + Col C
In column B I would like to enter minutes as 1 through 60 (no decimal- i.e., in General, not Time format)
38 minutes in column B would automatically be rounded up to 45 minutes in column C
Does anyone have any ideas? How can I do this using the fewest number of columns?
[A previously posted question, "Round up to nearest quarter", introduces the concept of Math.Ceiling. Is this something I should use? I couldn't wrap my head around the answer.]
With Grateful Thanks,
~ Jay
How's this go?
DECLARE @time DATETIME = '2014-03-19T09:59:00'
SELECT CASE
    WHEN DATEPART(mi, @time) BETWEEN 8 AND 15 THEN DATEADD(mi, 15-DATEPART(mi, @time), @time)
    WHEN DATEPART(mi, @time) BETWEEN 23 AND 30 THEN DATEADD(mi, 30-DATEPART(mi, @time), @time)
    WHEN DATEPART(mi, @time) BETWEEN 38 AND 45 THEN DATEADD(mi, 45-DATEPART(mi, @time), @time)
    WHEN DATEPART(mi, @time) BETWEEN 53 AND 59 THEN DATEADD(mi, 60-DATEPART(mi, @time), @time)
    ELSE @time
END
Assume "sessions" is your table (CTE below contains 2 sample records), with session start time & end time stored (as noted in comments above, just store these data points, don't store the calculated values). You might be able to do the rounding as below. (not sure if this is what you want, since it either rounds up or down... do you not want to round down?)
;WITH sessions AS (
SELECT CAST('20140317 12:00' AS DATETIME) AS session_start, CAST('20140317 12:38' AS DATETIME) AS session_end
UNION ALL
SELECT CAST('20140317 12:00' AS DATETIME), CAST('20140317 12:37:59' AS DATETIME) AS session_end
)
SELECT *, DATEDIFF(MINUTE, session_start, session_end) AS session_time
, ROUND(DATEDIFF(MINUTE, session_start, session_end)/15.0, 0) * 15.0 AS bill_time
FROM sessions;
EDIT:
Hi Jay, I don't think you mentioned it is an Excel problem! I was assuming SQL. As Stuart suggested in a comment above, it would be helpful if you modified your question to indicate it is for Excel, so that others can possibly get help from this dialog in the future.
With Excel, you can do it with two columns that contain the session start date and time (column A) and session end date and time (column B), plus two formulas:
Column C (Actual Minutes) = ROUND((B1-A1) * 1440,0)
Column D (Billing Minutes) = (FLOOR(C1/15, 1) * 15) + IF(MOD(C1,15) >= 8, 15, MOD(C1,15))
This is what my table looks like:
Session start    Session end      Actual minutes  Billing minutes
3/18/2014 12:00  3/18/2014 12:38  38              45
3/18/2014 14:00  3/18/2014 14:37  37              37
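The Billing Minutes formula can also be checked outside Excel. Here is a small Python sketch of the same rule; the function name billing_minutes is hypothetical.

```python
def billing_minutes(actual: int) -> int:
    """Round up to the next quarter hour once 8 or more minutes into it."""
    # Whole quarter hours, and the minutes left over within the current one
    quarters, remainder = divmod(actual, 15)
    # 8 or more leftover minutes bill as a full quarter; fewer bill as-is
    return quarters * 15 + (15 if remainder >= 8 else remainder)

# The two rows from the table above
rows = [billing_minutes(38), billing_minutes(37)]
```

This gives 45 for 38 minutes and leaves 37 minutes unchanged, matching the table.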

Resources