Python Calculate New Date Based on Date Range - python-3.x

I have a Python Pandas DataFrame containing birth dates of hockey players that looks like this:
Player Birth Year Birth Date
Player A 1990 1990-05-12
Player B 1991 1991-10-30
Player C 1992 1992-09-10
Player D 1990 1990-11-15
I want to create a new column labeled 'Draft Year' that is calculated based on this rule:
If MM-DD is before 09-15, Draft Year = Birth Year + 18
Else if MM-DD is after 09-15 Draft Year = Birth Year + 19
This would make the output from the example:
Player Birth Year Birth Date Draft Year
Player A 1990 1990-05-12 2008
Player B 1991 1991-10-30 2010
Player C 1992 1992-09-10 2010
Player D 1990 1990-11-15 2009
I've tried separating the MM-DD from the date format by using
Data['Birth Date'] = Data['Birth Date'].str.split('-').str[1:]
But that returns me a list of [mm, dd] which is tricky to work with. Any suggestions on how to do this concisely would be greatly appreciated!

Use numpy.where:
import numpy as np
import pandas as pd
data['Birth Date'] = pd.to_datetime(data['Birth Date'])  # convert to datetime
# on or after 09-15: September from the 15th on, or any later month
cond = (data['Birth Date'].dt.month == 9) & (data['Birth Date'].dt.day >= 15)
cond2 = data['Birth Date'].dt.month >= 10
data['Draft Year'] = np.where(cond | cond2, data['Birth Year'] + 19, data['Birth Year'] + 18)
print(data)
Output
Player Birth Year Birth Date Draft Year
0 PlayerA 1990 1990-05-12 2008
1 PlayerB 1991 1991-10-30 2010
2 PlayerC 1992 1992-09-10 2010
3 PlayerD 1990 1990-11-15 2009

Dates in the form yyyy-mm-dd sort correctly as strings, and so does the zero-padded mm-dd part on its own. This solution takes advantage of that fact:
df['Draft Year'] = df['Birth Year'] + np.where(df['Birth Date'].dt.strftime('%m-%d') < '09-15', 18, 19)
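As a self-contained check of the string-comparison approach, the sketch below rebuilds the sample table from the question and applies the same one-liner. The DataFrame construction is an assumption about how the data is loaded. Note that 'Birth Date' has to be converted with pd.to_datetime before .dt.strftime can be used, and that a birthday exactly on 09-15 lands in the +19 group here.

```python
import numpy as np
import pandas as pd

# Sample data from the question (the construction itself is an assumption).
df = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C", "Player D"],
    "Birth Year": [1990, 1991, 1992, 1990],
    "Birth Date": ["1990-05-12", "1991-10-30", "1992-09-10", "1990-11-15"],
})

# .dt accessors need a datetime column, not strings.
df["Birth Date"] = pd.to_datetime(df["Birth Date"])

# Zero-padded 'mm-dd' strings compare in calendar order.
before_cutoff = df["Birth Date"].dt.strftime("%m-%d") < "09-15"
df["Draft Year"] = df["Birth Year"] + np.where(before_cutoff, 18, 19)

print(df)
```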

Quick and Dirty
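Compute 100 * month + day, so that 09-15 becomes 915, and compare against that cutoff. Note that `> 915` places a birthday exactly on 09-15 in the +18 group; use `>=` to place it in the +19 group instead.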
cutoff = df['Birth Date'].pipe(lambda d: d.dt.month * 100 + d.dt.day)
df['Draft Year'] = df['Birth Year'] + 18 + (cutoff > 915)
df
Player Birth Year Birth Date Draft Year
0 Player A 1990 1990-05-12 2008
1 Player B 1991 1991-10-30 2010
2 Player C 1992 1992-09-10 2010
3 Player D 1990 1990-11-15 2009

Related

Handle ValueError while creating date in pd

I'm reading a csv file with p, day, month, and put it in a df. The goal is to create a date from day, month, current year, and I run into this error for 29th of Feb:
ValueError: cannot assemble the datetimes: day is out of range for month
When this error occurs, I would like to replace the day with the day before. How can I do that? Below are a few rows of my DataFrame; the datex column at the end is what I would like to get:
p day month year datex
0 p1 29 02 2021 28Feb-2021
1 p2 18 07 2021 18Jul-2021
2 p3 12 09 2021 12Sep-2021
Right now, my code for the date is only the below, so I have nan where the date doesn't exist.
df['datex'] = pd.to_datetime(df[['year', 'month', 'day']], errors='coerce')
You could try something like this :
df['datex'] = pd.to_datetime(df[['year', 'month', 'day']], errors='coerce')
Indeed, you get NA :
p day year month datex
0 p1 29 2021 2 NaT
1 p2 18 2021 7 2021-07-18
2 p3 12 2021 9 2021-09-12
You could then make a particular case for these NA :
df.loc[df.datex.isnull(), 'previous_day'] = df.day -1
p day year month datex previous_day
0 p1 29 2021 2 NaT 28.0
1 p2 18 2021 7 2021-07-18 NaN
2 p3 12 2021 9 2021-09-12 NaN
df.loc[df.datex.isnull(), 'datex'] = pd.to_datetime(df[['previous_day', 'year', 'month']].rename(columns={'previous_day': 'day'}))
p day year month datex previous_day
0 p1 29 2021 2 2021-02-28 28.0
1 p2 18 2021 7 2021-07-18 NaN
2 p3 12 2021 9 2021-09-12 NaN
You have to create a new day column if you want to keep day = 29 in the day column.
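An alternative that avoids the NaT round trip is to cap the day at the length of its month before building the date. This is only a sketch under assumptions: the sample frame is rebuilt by hand with integer columns, and capping at month end (rather than subtracting exactly one day) is a design choice that also covers cases such as 31 April.

```python
import pandas as pd

# Sample rows from the question, with integer day/month/year (an assumption).
df = pd.DataFrame({
    "p": ["p1", "p2", "p3"],
    "day": [29, 18, 12],
    "month": [2, 7, 9],
    "year": [2021, 2021, 2021],
})

# The first day of each month always exists, so this never fails.
first_of_month = pd.to_datetime(df[["year", "month"]].assign(day=1))

# Cap the day at the number of days in that month (29 Feb 2021 -> 28).
capped_day = df["day"].clip(upper=first_of_month.dt.days_in_month)

# Offset from the first of the month to get the final date.
df["datex"] = first_of_month + pd.to_timedelta(capped_day - 1, unit="D")

print(df)
```

The original day column is left untouched, so the uncorrected value 29 is still available afterwards.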

Filter and display all duplicated rows based on multiple columns in Pandas [duplicate]

Given a dataset as follows:
name month year
0 Joe December 2017
1 James January 2018
2 Bob April 2018
3 Joe December 2017
4 Jack February 2018
5 Jack April 2018
I need to filter and display all duplicated rows based on columns month and year in Pandas.
With code below, I get:
df = df[df.duplicated(subset = ['month', 'year'])]
df = df.sort_values(by=['name', 'month', 'year'], ascending = False)
Out:
name month year
3 Joe December 2017
5 Jack April 2018
But I want the result as follows:
name month year
0 Joe December 2017
1 Joe December 2017
2 Bob April 2018
3 Jack April 2018
How could I do that in Pandas?
Adding keep=False marks every row of a duplicate group, not only the later occurrences, so the following code works:
df = df[df.duplicated(subset = ['month', 'year'], keep = False)]
df = df.sort_values(by=['name', 'month', 'year'], ascending = False)
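To see the effect of keep=False end to end, here is a minimal sketch on the sample data. The DataFrame construction, the variable name dupes, and the sort order (year, month, name, chosen so that the rows come out grouped as in the desired result) are assumptions.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "name": ["Joe", "James", "Bob", "Joe", "Jack", "Jack"],
    "month": ["December", "January", "April", "December", "February", "April"],
    "year": [2017, 2018, 2018, 2017, 2018, 2018],
})

# keep=False flags all rows that share (month, year) with another row.
mask = df.duplicated(subset=["month", "year"], keep=False)

# Group the duplicates together and renumber the index from 0.
dupes = (
    df[mask]
    .sort_values(by=["year", "month", "name"])
    .reset_index(drop=True)
)

print(dupes)
```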

How to get week number from specified year date in Python?

I have time-series data and I want to get the week number counted from the initial date:
date
20180401
20180402
20180902
20190130
20190401
Things Tried
Code
df["date"]= pd.to_datetime(df.date,format='%Y%m%d')
df["week_no"]= df.date.dt.week
But the week number resets in 2019, so dates in 2019 get the same week numbers as dates in 2018.
Is there anything we can do about it?
You can use this function, which calculates the difference between two dates in whole weeks:
import numpy as np
import pandas as pd
def Wdiff(fromdate, todate):
    d = pd.to_datetime(todate) - pd.to_datetime(fromdate)
    return int(d / np.timedelta64(1, 'W'))
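The same idea can be applied to the whole column at once. The sketch below assumes that "week number from the initial date" means whole 7-day blocks since the earliest date, numbered from 1; both the 1-based numbering and the use of the minimum as the start date are assumptions.

```python
import pandas as pd

# Sample dates from the question.
df = pd.DataFrame({"date": ["20180401", "20180402", "20180902", "20190130", "20190401"]})
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")

# Days elapsed since the earliest date, grouped into 7-day blocks.
days_since_start = (df["date"] - df["date"].min()).dt.days
df["week_no"] = days_since_start // 7 + 1

print(df)
```

Because the count is based on elapsed days, it keeps increasing across the year boundary instead of resetting in January.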
You can create a datetime object with the specified date, then retrieve the week number using the isocalendar method:
import datetime
myDate = datetime.date(2018, 4, 1)
week = myDate.isocalendar()[1]
print(week)
You could then calculate the total number of remaining weeks in 2018, then add the total number of weeks in each year in between, and finally add the week number of the current date.
For example, this code would print the number of weeks from the 1st of April 2018 to the 6th May 2020:
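import datetime
myDate = datetime.date(2018, 4, 1)
currentDate = datetime.date(2020, 5, 6)
# weeks remaining in the start year (28 December is always in the last ISO week)
weeks = (datetime.date(myDate.year, 12, 28).isocalendar()[1]
         - myDate.isocalendar()[1])
# all weeks of each full year strictly between the two dates
for i in range(myDate.year + 1, currentDate.year):
    weeks += datetime.date(i, 12, 28).isocalendar()[1]
# weeks elapsed so far in the current year
weeks += currentDate.isocalendar()[1]
print(weeks)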
Note that because of the way isocalendar works, the 28th of December will always be in the last week of the given year.
The ISO year consists of 52 or 53 full weeks, and where a week starts on a Monday and ends on a Sunday. The first week of an ISO year is the first (Gregorian) calendar week of a year containing a Thursday. This is called week number 1, and the ISO year of that Thursday is the same as its Gregorian year.
You can get more information about isocalendar here: https://docs.python.org/3/library/datetime.html
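In pandas, the same ISO calendar information is available for a whole column through Series.dt.isocalendar() (pandas 1.1 or later). The sketch below is illustrative only; the three sample dates are chosen to show that pairing the ISO year with the ISO week keeps weeks from different years apart.

```python
import pandas as pd

# Three illustrative dates, including one at the year boundary.
dates = pd.to_datetime(pd.Series(["20180401", "20181231", "20190130"]), format="%Y%m%d")

# Returns a DataFrame with columns 'year', 'week' and 'day' (ISO calendar).
iso = dates.dt.isocalendar()

iso_years = iso["year"].astype(int).tolist()
iso_weeks = iso["week"].astype(int).tolist()

print(iso)
```

Note that 2018-12-31 belongs to ISO year 2019, week 1, which is why the ISO year should be used alongside the ISO week rather than the calendar year.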
To get the week number, but as a 2-digit string (with leading zero),
you can run:
df['week_no'] = df.date.dt.strftime('%W')
The result, for slightly extended source data is:
date week_no
0 2018-04-01 13
1 2018-04-02 14
2 2018-09-02 35
3 2018-12-30 52
4 2018-12-31 53
5 2019-01-01 00
6 2019-01-02 00
7 2019-01-03 00
8 2019-01-04 00
9 2019-01-05 00
10 2019-01-06 00
11 2019-01-07 01
12 2019-01-30 04
13 2019-04-01 13
Note that the last day of 2018 (a Monday) has week number 53, and the first days of 2019 (up to Sunday 2019-01-06) have week number 00, because %W puts all days before the first Monday of a year in week 0.
If you want this column as int, append .astype(int) to the above code.

day of Year values starting from a particular date

I have a dataframe with a date column. The duration is 365 days starting from 02/11/2017 and ending at 01/11/2018.
Date
02/11/2017
03/11/2017
05/11/2017
.
.
01/11/2018
I want to add an adjacent column called Day_Of_Year as follows:
Date Day_Of_Year
02/11/2017 1
03/11/2017 2
05/11/2017 4
.
.
01/11/2018 365
I apologize if it's a very basic question, but unfortunately I haven't been able to start with this.
I could use datetime(), but that would return values such as 1 for 1st january, 2 for 2nd january and so on.. irrespective of the year. So, that wouldn't work for me.
First convert the column with to_datetime, then subtract the start date, convert the difference to days and add 1:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Day_Of_Year'] = df['Date'].sub(pd.Timestamp('2017-11-02')).dt.days + 1
print (df)
Date Day_Of_Year
0 2017-11-02 1
1 2017-11-03 2
2 2017-11-05 4
3 2018-11-01 365
Or subtract by first value of column:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Day_Of_Year'] = df['Date'].sub(df['Date'].iat[0]).dt.days + 1
print (df)
Date Day_Of_Year
0 2017-11-02 1
1 2017-11-03 2
2 2017-11-05 4
3 2018-11-01 365
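For a version that can be run as-is, the sketch below builds a small frame from the dates shown in the question and counts days from the earliest date. Using min() instead of the first row is an assumption that makes the result independent of row order.

```python
import pandas as pd

# A few of the dates from the question, in dd/mm/yyyy format.
df = pd.DataFrame({"Date": ["02/11/2017", "03/11/2017", "05/11/2017", "01/11/2018"]})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Day 1 is the earliest date; later dates count up from there.
start = df["Date"].min()
df["Day_Of_Year"] = (df["Date"] - start).dt.days + 1

print(df)
```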
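Using strftime with '%j' (day of the year). This gives a 0-based offset from the first row and only works while all dates fall in the same calendar year: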
s=pd.to_datetime(df.Date,dayfirst=True).dt.strftime('%j').astype(int)
s-s.iloc[0]
Out[750]:
0 0
1 1
2 3
Name: Date, dtype: int32
#df['new']=s-s.iloc[0]
pandas has dayofyear. So put your column in the right format with pd.to_datetime and then apply Series.dt.dayofyear. Lastly, use some modulo arithmetic to express everything relative to your original date (the constant 365 assumes a 365-day year):
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['day of year'] = df['Date'].dt.dayofyear - df['Date'].dt.dayofyear[0] + 1
df['day of year'] = df['day of year'] + 365*((365 - df['day of year']) // 365)
Output
Date day of year
0 2017-11-02 1
1 2017-11-03 2
2 2017-11-05 4
3 2018-11-01 365
But I'm doing essentially the same as Jezrael in more lines of code, so my vote goes to her/him

Dynamically Lookup Value with Between - Excel

I have a chronological list of Product, Year, Month, Profit (like below).
Summary Table
Product Year Month Profit
TV 2018 1 10
TV 2018 2 20
TV 2018 3 30
TV 2018 4 50
TV 2018 5 35
TV 2018 6 60
TV 2018 7 90
Heater 2018 1 20
Heater 2018 2 3
Heater 2018 3 8
Heater 2018 4 4
Heater 2018 5 6
Heater 2018 6 11
Heater 2018 7 1
What I want to do is look up another sheet that lists all of the price changes by month and year, as the table below shows.
Sale Price
Product Year Month Price
TV 2018 1 $1,000.00
TV 2018 4 $800.00
TV 2018 7 $950.00
Heater 2018 1 $20.00
Heater 2018 2 $60.00
Heater 2018 5 $45.00
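For example, for TV with Month = 2 and Year = 2018, I want to pull in $1,000.00 (the most recent price on or before that month) to use in my profit calculation.
To get the correct price, use the following formula. It assumes the summary table starts at A2 (Product, Year, Month in columns A to C) and the Sale Price table is in G2:J7 (Product, Year, Month, Price):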
=INDEX(J:J,AGGREGATE(14,6,ROW($I$2:$I$7)/(($G$2:$G$7=A2)*($H$2:$H$7=B2)*($I$2:$I$7<=C2)),1))
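For readers doing the same lookup in pandas rather than Excel, a rough equivalent is an "as of" merge: for each summary row, take the latest price row for the same product whose month is on or before the summary month. This is a sketch, not a translation of the formula; the helper column key (year * 12 + month) and the DataFrame names are hypothetical.

```python
import pandas as pd

# Summary and price tables from the question.
summary = pd.DataFrame({
    "Product": ["TV"] * 7 + ["Heater"] * 7,
    "Year": [2018] * 14,
    "Month": list(range(1, 8)) * 2,
    "Profit": [10, 20, 30, 50, 35, 60, 90, 20, 3, 8, 4, 6, 11, 1],
})
prices = pd.DataFrame({
    "Product": ["TV", "TV", "TV", "Heater", "Heater", "Heater"],
    "Year": [2018] * 6,
    "Month": [1, 4, 7, 1, 2, 5],
    "Price": [1000.0, 800.0, 950.0, 20.0, 60.0, 45.0],
})

# Hypothetical helper: a single sortable month index per row.
summary["key"] = summary["Year"] * 12 + summary["Month"]
prices["key"] = prices["Year"] * 12 + prices["Month"]

# merge_asof needs both frames sorted by the 'on' column.
result = pd.merge_asof(
    summary.sort_values("key"),
    prices.sort_values("key")[["Product", "key", "Price"]],
    on="key",
    by="Product",
    direction="backward",  # latest price at or before the summary month
).sort_values(["Product", "key"]).reset_index(drop=True)

print(result[["Product", "Year", "Month", "Profit", "Price"]])
```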
