Comparison between dates starts with -1 - python-3.x

I have the following code:
import pandas as pd
from datetime import datetime, timedelta
df = pd.DataFrame ({
'Date':['4/22/2020 14:32:10','4/21/2020 4:32:10','4/20/2020 1:32:10']
})
date ='04/22/2020'
datetime_object = datetime.strptime(date, '%m/%d/%Y')
df['Date'] = pd.to_datetime(df['Date'],format='%m/%d/%Y %H:%M:%S')
days_diff = (datetime_object - df['Date']).dt.days
print(days_diff)
0 -1
1 0
2 1
Why doesn't the result look like the one below? Why does the number of days start at -1 rather than 0?
0 0
1 1
2 2

This is because .dt.days floors the difference to whole days. It rounds toward negative infinity, not toward zero.
For the first case, '4/22/2020 14:32:10', the diff is about -14.5/24 ≈ -0.61 days
output: -1
For the second case, '4/21/2020 4:32:10', the diff is about 19.5/24 ≈ 0.81 days
output: 0
For the third case, '4/20/2020 1:32:10', the diff is about 46.5/24 ≈ 1.94 days
output: 1
I hope it helps.
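Plain Python timedelta objects use the same convention as pandas' .dt.days, so the flooring can be reproduced with the standard library alone. The minimal sketch below uses the question's three timestamps and compares each whole-day count with math.floor of the fractional number of days:

```python
import math
from datetime import datetime

reference = datetime(2020, 4, 22)  # '04/22/2020' parsed as midnight, as in the question
stamps = [
    datetime(2020, 4, 22, 14, 32, 10),
    datetime(2020, 4, 21, 4, 32, 10),
    datetime(2020, 4, 20, 1, 32, 10),
]

for stamp in stamps:
    delta = reference - stamp
    fractional_days = delta.total_seconds() / 86400
    # delta.days equals math.floor(fractional_days), so -0.61 becomes -1, not 0
    print(f"{fractional_days:+.3f} days -> .days = {delta.days}, floor = {math.floor(fractional_days)}")
```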
The solution is to convert all the datetimes to dates before subtracting, as I have done with the 'Date' column in the following line:
days_diff = (datetime_object.date() - df['Date'].dt.date ).dt.days
In [32]: days_diff
Out[32]:
0 0
1 1
2 2
Name: Date, dtype: int64
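An alternative keeps the column as datetime64 instead of converting it to Python date objects: reset every timestamp to midnight with Series.dt.normalize() and then subtract. This is a sketch that rebuilds the question's DataFrame so it runs on its own. Series.dt.floor('D') would serve the same purpose.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({'Date': ['4/22/2020 14:32:10', '4/21/2020 4:32:10', '4/20/2020 1:32:10']})
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y %H:%M:%S')
reference = datetime.strptime('04/22/2020', '%m/%d/%Y')

# normalize() sets each timestamp's time to 00:00:00 and keeps the datetime64 dtype,
# so the difference is always a whole number of days and flooring has no effect
days_diff = (reference - df['Date'].dt.normalize()).dt.days
print(days_diff)
```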

The issue comes from the fact that you are subtracting the later date from the earlier date, which leaves you with a negative result. In the datetime module, subtracting one datetime object from another creates a timedelta object as follows (from datetime.__sub__ in CPython's pure-Python datetime implementation):
days1 = self.toordinal()
days2 = other.toordinal()
secs1 = self._second + self._minute * 60 + self._hour * 3600
secs2 = other._second + other._minute * 60 + other._hour * 3600
base = timedelta(days1 - days2,
secs1 - secs2,
self._microsecond - other._microsecond)
If we mimic that with your dates, we get the following days and secs for each datetime object (datetime_object first, then the first row of df['Date']):
737537 0
737537 52330
Subtracting days2 from days1 and secs2 from secs1 means we pass the following to the timedelta constructor:
0 -52330
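The excerpt uses private attributes such as _second, but the days and secs values above can also be reproduced with public ones. The sketch below does this. ordinal_and_seconds is a hypothetical helper name.

```python
from datetime import datetime


def ordinal_and_seconds(dt):
    """Mirror the days/secs split used by datetime.__sub__, via public attributes."""
    return dt.toordinal(), dt.second + dt.minute * 60 + dt.hour * 3600


a = datetime(2020, 4, 22)              # datetime_object
b = datetime(2020, 4, 22, 14, 32, 10)  # first row of df['Date']
days1, secs1 = ordinal_and_seconds(a)
days2, secs2 = ordinal_and_seconds(b)
print(days1, secs1)
print(days2, secs2)
print(days1 - days2, secs1 - secs2)  # the arguments passed on to timedelta
```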
So we are asking for a timedelta whose difference is 0 days and -52,330 seconds, which is quite correct. However, the timedelta constructor is flexible. It accepts fractional values and other units such as weeks or minutes, and it does not restrict any argument to its natural range, so in the seconds part you can pass 10 seconds or 100,000 seconds. 100,000 seconds is more than there are in a day, so the constructor normalizes its arguments and uses divmod to work out whether the seconds contain any extra whole days:
days, seconds = divmod(seconds, 24*3600)
d += days
s += int(seconds) # can't overflow
Here the issue lies in understanding what divmod does: it performs a floor division and returns both the quotient and the remainder. In the positive case that's fine:
print(divmod(52330, 24*3600))
print(divmod(-52330, 24*3600))
(0, 52330)
(-1, 34070)
In the positive case, the floor division rounds down to 0 days and returns the remaining seconds unchanged. In the negative case, however, -52330 / 86400 is -0.6056..., so floor division rounds it down to -1. The remainder is the difference between 86400 and 52330, which leaves 34070 seconds. The result is stored as -1 day plus 34070 seconds, and .days reports only the -1.
So you wouldn't face this issue if you always subtract the oldest date from the newest date, so that you never end up with a negative difference. In fact, it doesn't make sense to subtract a newer date from an older date.
For the other cases you listed, the difference between 4/21/2020 4:32:10 and 4/22/2020 00:00:00 is indeed 0 whole days, because the actual difference is just under 20 hours. This behavior is correct: the difference is about 19.5 hours, not 1 day.
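The normalization described above can be checked directly with the standard library. In this minimal sketch, constructing timedelta(seconds=-52330) yields the same (-1 day, 34070 seconds) split that divmod produced. total_seconds() keeps the signed fractional value, and negating the delta (which is the same as subtracting in the other order) gives 0 whole days.

```python
from datetime import timedelta

raw = timedelta(days=0, seconds=-52330)  # what datetime.__sub__ passes in for row 0
print(raw.days, raw.seconds)             # normalized: -1 day plus 34070 seconds
print(divmod(-52330, 24 * 3600))         # the same floor-division split
print(raw.total_seconds() / 86400)       # signed fractional days, about -0.606
print((-raw).days)                       # subtracting in the other order: 0 whole days
```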

Related

How can I use the epoch?

I need to print out “Your birthday is 31 March 2001 (a years, b days, c hours, d minutes and e seconds ago).”
I read the input like this:
birth_day = int(input("your birth day?"))
birth_month = int(input("your birth month?"))
birth_year = int(input("your birth year?"))
and I understand how to print the first sentence:
print("Your birthday is", birth_day, birth_month, birth_year)
but I have a problem with the second part, (a years, b days, c hours, d minutes and e seconds ago).
I guess I have to use “the epoch” and some variables like the ones below:
year_sec=365*60*60*24
day_sec=60*60*24
hour_sec=60*60
min_sec=60
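and calculate how many seconds separate the date from 1 January 1970 00:00:00 UTC (time.mktime() interprets the time tuple as local time, so the result depends on your time zone):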
import datetime, time
t = datetime.datetime(2001, 3, 31, 0, 0)
time.mktime(t.timetuple())
985960800.0
Could anyone help me solve this problem, please?
Thanks a lot.
EDIT: See this answer in the thread kaya3 mentioned above for a more consistently reliable way of doing the same thing. I'm leaving my original answer below since it's useful to understand how to think about the problem, but just be aware that my answer below might mess up in tricky situations due to the quirks of the Gregorian calendar, in particular:
Every year that is exactly divisible by four is a leap year, except for years that are exactly divisible by 100, but these centurial years are leap years if they are exactly divisible by 400. For example, the years 1700, 1800, and 1900 are not leap years, but the years 1600 and 2000 are.
ORIGINAL ANSWER:
You can try using the time module:
import time
import datetime
def main(ask_for_hour_and_minute, convert_to_integers):
    year, month, day, hour, minute = ask_for_birthday_info(ask_for_hour_and_minute)
    calculate_time_since_birth(year, month, day, hour, minute, convert_to_integers)
def ask_for_birthday_info(ask_for_hour_and_minute):
    birthday_year = int(input('What year were you born in?\n'))
    birthday_month = int(input('What month were you born in?\n'))
    birthday_day = int(input('What day were you born on?\n'))
    if ask_for_hour_and_minute:
        birthday_hour = int(input('What hour were you born?\n'))
        birthday_minute = int(input('What minute were you born?\n'))
    else:
        birthday_hour = 0  # set to 0 as default
        birthday_minute = 0  # set to 0 as default
    return (birthday_year, birthday_month, birthday_day, birthday_hour, birthday_minute)
def calculate_time_since_birth(birthday_year, birthday_month, birthday_day, birthday_hour, birthday_minute, convert_to_integers):
    year = 31557600  # seconds in an average year (365.25 days)
    day = 86400  # seconds in a day
    hour = 3600  # seconds in an hour
    minute = 60  # seconds in a minute
    # provide user info to datetime.datetime()
    birthdate = datetime.datetime(birthday_year, birthday_month, birthday_day, birthday_hour, birthday_minute)
    # despite the name, this is a float: seconds since the epoch (mktime treats the time as local time)
    birthdate_tuple = time.mktime(birthdate.timetuple())
    # figure out how many seconds ago birth was
    seconds_since_birthday = time.time() - birthdate_tuple
    # start calculations
    years_ago = seconds_since_birthday // year
    # note: % 365 assumes 365-day years, so this drifts by about one day every 4 years relative to years_ago
    days_ago = seconds_since_birthday // day % 365
    hours_ago = seconds_since_birthday // hour % 24
    minutes_ago = seconds_since_birthday // minute % 60
    seconds_ago = seconds_since_birthday % minute
    # convert calculated values to integers if convert_to_integers is True
    if convert_to_integers:
        years_ago = int(years_ago)
        days_ago = int(days_ago)
        hours_ago = int(hours_ago)
        minutes_ago = int(minutes_ago)
        seconds_ago = int(seconds_ago)
    # print calculations
    print(f'Your birthday was {years_ago} years, {days_ago} days, {hours_ago} hours, {minutes_ago} minutes, {seconds_ago} seconds ago.')
# pick ONE of the following calls (each one asks for the birthday again)
# to ask for just the year, month, and day
main(False, False)
# to ask for just the year, month, and day AND convert the answer to integer values
# main(False, True)
# to ask for the year, month, day, hour, and minute
# main(True, False)
# to ask for the year, month, day, hour, and minute AND convert the answer to integer values
# main(True, True)
I tried to use descriptive variable names, so the variables should make sense, but the operators might need some explaining:
10 // 3 # the // operator divides the numerator by the denominator and DROPS the remainder (floor division), so the answer is 3
10 % 3 # the % operator divides the numerator by the denominator and RETURNS the remainder, so the answer is 1
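Each // is paired with a %, and Python's built-in divmod(a, b) returns (a // b, a % b), so it can do both at once. The sketch below breaks a number of seconds into days, hours, minutes and seconds by chaining divmod. split_seconds is a hypothetical helper name.

```python
def split_seconds(total_seconds):
    """Break a count of seconds into (days, hours, minutes, seconds) with chained divmod."""
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return days, hours, minutes, seconds


print(split_seconds(90061))  # 90061 = 86400 + 3600 + 60 + 1
```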
After understanding the operators, the rest of the code should make sense. For clarity, let's walk through it:
Create birthdate by asking user for their information in the ask_for_birthday_info() function
Provide the information the user provided to the calculate_time_since_birth() function
Convert birthdate to seconds since the epoch with time.mktime() (a float, despite the variable name) and store it in birthdate_tuple
Figure out how many seconds have passed since the birthday and store it in seconds_since_birthday
Figure out how many years have passed since the birthday by dividing seconds_since_birthday by the number of seconds in a year
Figure out how many days have passed since the birthday by dividing seconds_since_birthday by the number of seconds in a day and keeping only the most recent 365 days (that's the % 365 in days_ago)
Figure out how many hours have passed since the birthday by dividing seconds_since_birthday by the number of seconds in an hour and keeping only the most recent 24 hours (that's the % 24 in hours_ago)
Figure out how many minutes have passed since the birthday by dividing seconds_since_birthday by the number of seconds in a minute and keeping only the most recent 60 minutes (that's the % 60 in minutes_ago)
Figure out how many seconds have passed since the birthday by taking the remainder of seconds_since_birthday divided by 60, which keeps only the most recent 60 seconds (that's the % minute in seconds_ago)
Then, we just need to print the results:
print(f'Your birthday was {years_ago} years, {days_ago} days, {hours_ago} hours, {minutes_ago} minutes, {seconds_ago} seconds ago.')
# if you're using a version of python before 3.6, use something like
print('Your birthday was ' + str(years_ago) + ' years, ' + str(days_ago) + ' days, ' + str(hours_ago) + ' hours, ' + str(minutes_ago) + ' minutes, ' + str(seconds_ago) + ' seconds ago.')
Finally, you can add some error checking to make sure that the user enters valid information, so that if they say they were born in month 15 or month -2, your program tells them they provided an invalid answer. For example, you could do something like this inside main(), AFTER getting the birthday information from the user but BEFORE calling the calculate_time_since_birth() function:
def main(ask_for_hour_and_minute, convert_to_integers):
    year, month, day, hour, minute = ask_for_birthday_info(ask_for_hour_and_minute)
    if not (1 <= month <= 12):
        print('ERROR! You provided an invalid month!')
        return
    if not (1 <= day <= 31):
        # note this isn't a robust check, if user provides February 30 or April 31, that should be an error - but this won't catch that
        # you'll need to make it more robust to catch those errors
        print('ERROR! You provided an invalid day!')
        return
    if not (0 <= hour <= 23):
        print('ERROR! You provided an invalid hour!')
        return
    if not (0 <= minute <= 59):
        print('ERROR! You provided an invalid minute!')
        return
    # if you also ask the user for seconds, check them the same way:
    # if not (0 <= second <= 59): print('ERROR! You provided an invalid second!'); return
    calculate_time_since_birth(year, month, day, hour, minute, convert_to_integers)
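The EDIT above warns that fixed-length years drift. The sketch below shows one calendar-aware alternative. It is not the code from the linked answer, which isn't reproduced here. The idea is to count whole years by comparing dates and then split only the leftover timedelta into smaller units. The helper names anniversary and time_since_birth are hypothetical. Treating a 29 February birthday as 28 February in non-leap years is an assumption.

```python
from datetime import datetime


def anniversary(birth: datetime, year: int) -> datetime:
    """The birthday moved into `year`; Feb 29 falls back to Feb 28 in non-leap years (assumption)."""
    try:
        return birth.replace(year=year)
    except ValueError:  # only Feb 29 can fail here
        return birth.replace(year=year, day=28)


def time_since_birth(birth: datetime, now: datetime):
    """Return (years, days, hours, minutes, seconds) elapsed from birth to now.

    Years are whole calendar years, so leap days are handled by datetime itself;
    the leftover timedelta is split into days, hours, minutes and seconds.
    """
    years = now.year - birth.year
    if anniversary(birth, birth.year + years) > now:
        years -= 1  # this year's birthday hasn't happened yet
    remainder = now - anniversary(birth, birth.year + years)
    hours, rest = divmod(remainder.seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return years, remainder.days, hours, minutes, seconds


if __name__ == "__main__":
    birth = datetime(2001, 3, 31)
    years, days, hours, minutes, seconds = time_since_birth(birth, datetime.now())
    print(f"Your birthday is 31 March 2001 ({years} years, {days} days, {hours} hours, "
          f"{minutes} minutes and {seconds} seconds ago).")
```

Because the year count comes from comparing actual dates, the days value resets to 0 on each birthday instead of drifting.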

Non-standard Julian day time stamp

I have a timestamp in a non-standard format; it's a concatenation of a number of elements. I'd like to convert at least the last part of the string into hours/minutes/seconds/decimal seconds so I can calculate the time gap between them (typically on the order of 2-5 seconds).
I have looked at this link but it assumes a 'proper' Julian time. How to convert Julian date to standard date?
My time stamp looks like this
1380643373
It is set up as ddd hh mm ss.s
This timestamp represents the 138th day, 06:43:37.3
Is there a datetime method for working with this, or do I need to strip out the various parts (hh, mm, ss.s) and concatenate them in some way? As I am only interested in the seconds, if I can just extract them I could deal with them by adding 60 if the second timestamp is smaller than the first, i.e. when the event crosses a minute boundary.
If you're only interested in seconds, you can do:
timestamp = 1380643373
seconds = (timestamp % 1000) / 10 # Gives 37.3
timestamp % 1000 gives you the last three digits of timestamp. Then you divide that by 10 to get seconds.
If it's a string, you can take the last three characters by slicing it.
timestamp = "1380643373"
seconds = int(timestamp[-3:]) / 10 # Gives 37.3
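The question's idea of adding 60 when the second stamp's seconds are smaller is what Python's % 60 already does for a negative difference, because the result of % takes the sign of the divisor. The minimal sketch below assumes string stamps and gaps shorter than one minute. seconds_field and gap_from_seconds_only are hypothetical names. round(..., 1) removes float noise from the tenths.

```python
def seconds_field(stamp: str) -> float:
    """The trailing ss.s field of a 'dddhhmmsss' stamp, e.g. '1380643373' -> 37.3."""
    return int(stamp[-3:]) / 10


def gap_from_seconds_only(first: str, second: str) -> float:
    """Gap in seconds using only the ss.s fields.

    % 60 adds 60 when the second stamp has rolled over a minute boundary;
    this is only valid for gaps shorter than one minute.
    """
    return round((seconds_field(second) - seconds_field(first)) % 60, 1)


print(gap_from_seconds_only("1380643373", "1380643413"))  # same minute
print(gap_from_seconds_only("1380643573", "1380644003"))  # crosses 06:43 -> 06:44
```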
It's pretty easy to convert the timestamp to a datetime using the divmod() function repeatedly:
import datetime
base_date = datetime.datetime(2000, 1, 1, 0, 0, 0) # Midnight on Jan 1 2000 (day 1 of the year)
timestamp = 1380643373
timestamp, seconds = divmod(timestamp, 1000) # Gives 1380643, 373
seconds = seconds / 10 # Gives 37.3
timestamp, minutes = divmod(timestamp, 100) # Gives 13806, 43
days, hours = divmod(timestamp, 100) # Gives 138, 6
# ddd counts from 1 (day 1 is base_date itself), so add days - 1
tdelta = datetime.timedelta(days=days - 1, hours=hours, minutes=minutes, seconds=seconds) # Gives datetime.timedelta(days=137, seconds=24217, microseconds=300000)
new_date = base_date + tdelta # 2000-05-17 06:43:37.300000 with this base date
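If the gap can cross a minute, hour or day boundary, the same divmod approach can parse each stamp into a timedelta and subtract the two. This sketch assumes integer stamps in the question's dddhhmmsss layout, a last digit that counts tenths of a second, and a 1-based day of the year in ddd. parse_stamp and stamp_gap are hypothetical names. Keeping the tenths as integer milliseconds avoids float rounding inside the timedelta.

```python
from datetime import timedelta


def parse_stamp(stamp: int) -> timedelta:
    """Convert a dddhhmmsss stamp to a timedelta measured from the start of day 1."""
    rest, tenths = divmod(stamp, 1000)  # e.g. 1380643373 -> 1380643, 373
    rest, minutes = divmod(rest, 100)   # 13806, 43
    day, hours = divmod(rest, 100)      # 138, 6
    return timedelta(days=day - 1, hours=hours, minutes=minutes, milliseconds=tenths * 100)


def stamp_gap(first: int, second: int) -> float:
    """Seconds between two stamps, valid across minute, hour and day boundaries."""
    return (parse_stamp(second) - parse_stamp(first)).total_seconds()


print(parse_stamp(1380643373))
print(stamp_gap(1382359593, 1390000013))  # 138 23:59:59.3 -> 139 00:00:01.3
```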

Calculating time to earn a specific amount as interest

I cannot figure out how to approach this, because the principal amount changes after every year (if interest is calculated annually, which would be the easiest case). The eventual goal is to calculate the exact number of years, months and days needed to earn, say, 150000 as interest on a deposit of 1000000 at an interest rate of, say, 6.5%. I have tried, but I cannot figure out how to increment the year/month/day in the loop. I don't mind if this is downvoted because I have not posted any code (well, it's wrong). This is not as simple as it might seem to beginners here.
It is a pure maths question. Compound interest is calculated as follows:
$$P_{total} = P_{initial} \cdot (1 + rate/100)^{time}$$
where P_total is the new total. rate is usually given as a percentage, so divide it by 100; time is in years. You are interested in the difference, though, so use
$$interest = P_{initial} \cdot (1 + rate/100)^{time} - P_{initial}$$
instead, which in Python is:
def compound_interest(P, rate, time):
    interest = P*(1+rate/100)**time - P
    return interest
A basic inversion of this to yield time, given P, r, and target instead, is
$$time = \frac{\log\left((target + P_{initial})/P_{initial}\right)}{\log(1 + rate/100)}$$
and this will immediately return the number of years. Converting the fraction to days is simple – an average year has 365.25 days – but for months you'll have to approximate.
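Written out step by step, the inversion sets the interest equal to target, divides by P_{initial} and takes logarithms of both sides:

$$
\begin{aligned}
target &= P_{initial}(1 + rate/100)^{time} - P_{initial} \\
\frac{target + P_{initial}}{P_{initial}} &= (1 + rate/100)^{time} \\
\log\frac{target + P_{initial}}{P_{initial}} &= time \cdot \log(1 + rate/100) \\
time &= \frac{\log\left((target + P_{initial})/P_{initial}\right)}{\log(1 + rate/100)}
\end{aligned}
$$

With P_initial = 2500000, rate = 6.5 and target = 400000 this is log(1.16)/log(1.065) ≈ 2.357 years.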
At the bottom, the result is fed back into the standard compound interest formula to show it indeed returns the expected yield.
import math
def compound_interest(P, rate, time):
    interest = P*(1+rate/100)**time - P
    return interest
def reverse_compound_interest(P, rate, target):
    time = math.log((target+P)/P)/math.log(1+rate/100)
    return time
timespan = reverse_compound_interest(2500000, 6.5, 400000)
print('time in years', timespan)
years = math.floor(timespan)
months = math.floor(12*(timespan - years))
days = math.floor(365.25*(timespan - years - months/12))
print(years, 'y', months, 'm', days, 'd')
print(compound_interest(2500000, 6.5, timespan))
will output
time in years 2.356815854829652
2 y 4 m 8 d
400000.0
Can we do better? Yes. datetime lets you add an arbitrary timedelta to the current date, so assuming you start earning today (now), you can immediately get your date of $$$:
from datetime import datetime,timedelta
# ... original script here ...
timespan *= 31556926 # seconds in a year (31556926 s is about 365.2422 days)
print ('time in seconds',timespan)
print (datetime.now() + timedelta(seconds=timespan))
which shows for me (your target date will differ):
time in years 2.356815854829652
time in seconds 74373863.52648607
2022-08-08 17:02:54.819492
You could do something like this (compounding daily):
def how_long_till_i_am_rich(investment, profit_goal, interest_rate):
    profit = 0
    days = 0
    daily_interest = interest_rate / 100 / 365
    while profit < profit_goal:
        days += 1
        profit += (investment + profit) * daily_interest
    years = days // 365
    months = days % 365 // 30
    days = days - (months * 30) - (years * 365)
    return years, months, days
years, months, days = how_long_till_i_am_rich(2500000, 400000, 8)
print(f"It would take {years} years, {months} months, and {days} days")
OUTPUT
It would take 1 years, 10 months, and 13 days
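The daily loop can also be replaced by the closed-form inversion used in the previous answer, applied with a daily rate. After n days the profit is investment * ((1 + d)**n - 1), with d = interest_rate / 100 / 365. The loop stops at the first n where that reaches the goal, which is the ceiling of the logarithm ratio. This is a sketch under that model. days_to_reach_goal is a hypothetical name, and the 30-day month is the same approximation the loop uses.

```python
import math


def days_to_reach_goal(investment, profit_goal, interest_rate):
    """Smallest whole number of days for daily-compounded profit to reach profit_goal."""
    daily_interest = interest_rate / 100 / 365
    n = math.log(1 + profit_goal / investment) / math.log(1 + daily_interest)
    return math.ceil(n)


total_days = days_to_reach_goal(2500000, 400000, 8)
years, rest = divmod(total_days, 365)
months, days = divmod(rest, 30)  # same 30-day-month approximation as the loop
print(f"It would take {years} years, {months} months, and {days} days")
```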

Handle different time formats in a dataframe

I am working on a dataframe with a column containing different time formats, like:
Time ID ...
0 1 hrs 1 min 1 sec 1
1 1 min 1 sec 2
2 1 sec 1
I would like to calculate the mean of the time column grouped by ids.
My problem is that the time format depends on the row.
I tried to use the mean() function on the Time column
df[["ID", "Time"]].groupby(["ID"]).agg(lambda x: x.mean())
but it does not work.
I tried to convert it to a datetime and then calculate the mean, but
format="%H hrs %M min %S sec" only applies to the first case, and I get an error:
ValueError: time data '1 min 1 sec' does not match format '%H hrs %M min %S sec' (search)
Convert Time to a timedelta, convert that to seconds, and call mean. Before doing that, you need to replace hrs with hours.
s = pd.to_timedelta(df.Time.replace('hrs', 'hours', regex=True)).dt.total_seconds()
s.groupby(df.ID).mean()
Out[110]:
ID
1 1831.0
2 61.0
Name: Time, dtype: float64
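If you'd rather not rely on pandas' timedelta string parser, you can pull each number/unit pair out with a regular expression and sum the seconds yourself. This sketch assumes the only unit labels in the column are hrs, min and sec. to_seconds and UNIT_SECONDS are hypothetical names. It rebuilds the question's sample data.

```python
import re

import pandas as pd

# Seconds per unit; assumes these three labels are the only ones in the column
UNIT_SECONDS = {"hrs": 3600, "min": 60, "sec": 1}


def to_seconds(text: str) -> int:
    """Sum every '<number> <unit>' pair found in strings like '1 hrs 1 min 1 sec'."""
    return sum(int(value) * UNIT_SECONDS[unit]
               for value, unit in re.findall(r"(\d+)\s*(hrs|min|sec)", text))


df = pd.DataFrame({"Time": ["1 hrs 1 min 1 sec", "1 min 1 sec", "1 sec"],
                   "ID": [1, 2, 1]})
mean_seconds = df["Time"].map(to_seconds).groupby(df["ID"]).mean()
mean_as_timedelta = pd.to_timedelta(mean_seconds, unit="s")  # back to a timedelta, if preferred
print(mean_seconds)
```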

Round Pandas date to nearest year/month

I am trying to round a pandas datetime column to its nearest year or month but I cannot figure out how to do it. For instance, this minimal example rounds to the closest hour:
pd.Timestamp.now().round('60min')
What I'd like is a way to replace the '60min' in order to round pd.Timestamp.now() to obtain either 2020-01-01 (for the year case) or 2019-08-01 (for the month case) (note that now() is exactly 2019-07-30 16:41:23.612004 at the time of asking!).
The pandas.Series.dt.round docs suggest a freq argument, linking to this page, but trying the month/year options there returns this error:
ValueError: is a non-fixed frequency
Any idea what I am missing?
If the column really is a datetime column (check with df.dtypes), you can get the year, month and day with the code below.
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
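df['round_Year'] = df['Date'] + pd.offsets.YearBegin(-1)
rolls back to the start of the current year. Changing -1 to 0 rolls forward to the start of the next year (a date already on 1 January stays put). Strictly speaking, these take the floor and ceiling rather than rounding to the nearest year, and with -1 a date that already falls on 1 January moves back a whole year.
df['round_Month'] = df['Date'] + pd.offsets.MonthBegin(-1)
rolls back to the start of the current month. Changing -1 to 0 rolls forward to the start of the next month.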
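To round to the nearest month or year, as the question asks, one option is to compute the period start on either side and keep the closer one. This sketch assumes timezone-naive timestamps and sends exact ties to the later boundary. round_to_month and round_to_year are hypothetical names. It uses Timestamp.to_period, Period.to_timestamp and the MonthBegin/YearBegin offsets.

```python
import pandas as pd


def round_to_month(ts: pd.Timestamp) -> pd.Timestamp:
    """Nearer of this month's start and next month's start (ties go to the later one)."""
    start = ts.to_period("M").to_timestamp()       # first day of ts's month, midnight
    next_start = start + pd.offsets.MonthBegin(1)  # first day of the following month
    return start if ts - start < next_start - ts else next_start


def round_to_year(ts: pd.Timestamp) -> pd.Timestamp:
    """Nearer of this year's 1 January and next year's 1 January."""
    start = ts.to_period("Y").to_timestamp()
    next_start = start + pd.offsets.YearBegin(1)
    return start if ts - start < next_start - ts else next_start


now = pd.Timestamp("2019-07-30 16:41:23.612004")  # the question's example
print(round_to_year(now), round_to_month(now))
```

For a whole column, something like df['Date'].map(round_to_month) applies the same function element-wise.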
Example of rounding a pandas Timestamp to the nearest half year (working at month granularity):
from pandas import Timestamp
def round_date_to_nearest_half_year(ts: Timestamp) -> Timestamp:
    # all of September is closer to 1 July (at most 91 days) than to 1 January (at least 93 days)
    if 4 <= ts.month <= 9:
        return Timestamp(ts.year, 7, 1)
    elif ts.month >= 10:
        return Timestamp(ts.year + 1, 1, 1)
    elif ts.month <= 3:
        return Timestamp(ts.year, 1, 1)
    else:
        raise Exception("Logic error.")
Test:
print(round_date_to_nearest_half_year(Timestamp("2022-6-5")))
print(round_date_to_nearest_half_year(Timestamp("2022-7-3")))
print(round_date_to_nearest_half_year(Timestamp("2022-12-15")))
print(round_date_to_nearest_half_year(Timestamp("2023-1-5")))
Out:
2022-07-01 00:00:00
2022-07-01 00:00:00
2023-01-01 00:00:00
2023-01-01 00:00:00

Resources