Extract Day of Week More Pythonically - python-3.x

I have a df with fields year, month, day, formatted as integers. I have used the following to extract the day of the week.
How can I do this more pythonically?
### First Attempt - Succeeds
lst = []
for i in zip(df['day'], df['month'], df['year']):
    lst.append(calendar.weekday(i[2], i[1], i[0]))
df['weekday'] = lst
### Second Attempt -- Fails
df['weekday'] = df.apply(lambda x: calendar.weekday(x.year, x.month, x.day))
AttributeError: ("'Series' object has no attribute 'year'", 'occurred at index cons_conf')

Try pd.to_datetime and the .dt accessor:
import pandas as pd
data = pd.DataFrame({'year': [2018, 2018, 2018], 'month': [12, 12, 12], 'day': [1, 2, 3]})
data['weekday'] = pd.to_datetime(data[['year', 'month', 'day']]).dt.weekday
print(data)
Giving:
   year  month  day  weekday
0  2018     12    1        5
1  2018     12    2        6
2  2018     12    3        0
Note that weekday is zero-indexed, with Monday as 0 and Sunday as 6.
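The second attempt in the question fails because DataFrame.apply passes whole columns to the function by default, so x is a column and has no year attribute. Below is a minimal sketch, reusing the sample data above, of the row-wise fix with axis=1 next to the vectorized version. The names row_wise and vectorized are only illustrative.

```python
import calendar

import pandas as pd

df = pd.DataFrame({'year': [2018, 2018, 2018], 'month': [12, 12, 12], 'day': [1, 2, 3]})

# axis=1 hands each row to the lambda, so the year/month/day labels resolve.
# int(...) keeps the result a plain integer regardless of the Python version.
row_wise = df.apply(
    lambda row: int(calendar.weekday(int(row['year']), int(row['month']), int(row['day']))),
    axis=1,
)

# Vectorized equivalent: assemble datetimes from the three columns, then read the weekday.
vectorized = pd.to_datetime(df[['year', 'month', 'day']]).dt.weekday

print(row_wise.tolist())
print(vectorized.tolist())
```

Both give the same weekdays; the vectorized form avoids a Python-level call per row, which usually matters on larger frames.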

Related

Get sum of group subset using pandas groupby

I have a dataframe as shown. Using Python, I want to get the sum of 'Value' for each 'Id' group up to the first occurrence of 'Stage' 12.
df = pd.DataFrame({'Id':[1,1,1,2,2,2,2],
'Date': ['2020-04-23', '2020-04-25', '2020-04-28', '2020-04-20', '2020-05-01', '2020-05-05', '2020-05-12'],
'Stage': [11, 12, 15, 11, 14, 12, 12],
'Value': [5, 4, 6, 12, 2, 8, 3]})
Id        Date  Stage  Value
 1  2020-04-23     11      5
 1  2020-04-25     12      4
 1  2020-04-28     15      6
 2  2020-04-20     11     12
 2  2020-05-01     14      2
 2  2020-05-05     12      8
 2  2020-05-12     12      3
My desired output:
Id Value
1 9
2 22
Would be very thankful if someone could help.
Let us try groupby with transform('idxmax') to filter the dataframe, then do another round of groupby:
idx = df['Stage'].eq(12).groupby(df['Id']).transform('idxmax')
output = df[df.index <= idx].groupby('Id')['Value'].sum().reset_index()
Detail
transform with idxmax returns, for every row, the index of the first row in its group where Stage equals 12. We then keep only the rows whose index is less than or equal to that value, which gives the data up to and including the first 12 in each group.
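As a cross-check, here is a minimal sketch of a different technique, a cumulative-count mask rather than idxmax. It counts how many 12s were seen strictly before each row within its group and keeps the rows where that count is zero. The names is_12 and seen_before are illustrative. If a group has no 12 at all, this variant keeps every row of that group.

```python
import pandas as pd

df = pd.DataFrame({'Id': [1, 1, 1, 2, 2, 2, 2],
                   'Date': ['2020-04-23', '2020-04-25', '2020-04-28', '2020-04-20',
                            '2020-05-01', '2020-05-05', '2020-05-12'],
                   'Stage': [11, 12, 15, 11, 14, 12, 12],
                   'Value': [5, 4, 6, 12, 2, 8, 3]})

# 1 where Stage is 12, else 0
is_12 = df['Stage'].eq(12).astype(int)

# Running count of 12s per Id, minus the current row, gives the count of 12s
# seen strictly before this row in its group.
seen_before = is_12.groupby(df['Id']).cumsum() - is_12

# Keep rows up to and including the first 12, then sum per Id.
result = df[seen_before.eq(0)].groupby('Id')['Value'].sum().reset_index()
print(result)
```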

Pandas Dataframe: Reduce the value of a 'Days' by 1 if the corresponding 'Year' is a leap year

If 'Days' is greater than, e.g., 10 and the corresponding 'Year' is a leap year, then reduce 'Days' by 1 in that row only. I tried some operations but couldn't get it to work. I am new to pandas and would appreciate any help.
sample data:
data = [['1', '2005'], ['2', '2006'], ['3', '2008'],['50','2009'],['70','2008']]
df=pd.DataFrame(data,columns=['Days','Year'])
I want 'Days' of row 5 to become 69 and everything else remains the same.
In [98]: import calendar
In [99]: data = [['1', '2005'], ['2', '2006'], ['3', '2008'], ['50', '2009'], ['70', '2008']]; df = pd.DataFrame(data, columns=['Days', 'Year'])
In [100]: df = df.astype(int)
In [102]: df["New_Days"] = df.apply(lambda x: x["Days"] - 1 if (x["Days"] > 10 and calendar.isleap(x["Year"])) else x["Days"], axis=1)
In [103]: df
Out[103]:
   Days  Year  New_Days
0     1  2005         1
1     2  2006         2
2     3  2008         3
3    50  2009        50
4    70  2008        69
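The same rule can also be written without a row-wise apply. The sketch below is an alternative, not the answer's method: it spells out the Gregorian leap-year test with column arithmetic and uses Series.mask to subtract 1 only where both conditions hold. The threshold of 10 comes from the question; the names is_leap and adjust are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Days': [1, 2, 3, 50, 70],
                   'Year': [2005, 2006, 2008, 2009, 2008]})

# Gregorian rule: divisible by 4, and not by 100 unless also by 400.
is_leap = (df['Year'] % 4 == 0) & ((df['Year'] % 100 != 0) | (df['Year'] % 400 == 0))

# Rows that should be reduced by one day.
adjust = is_leap & (df['Days'] > 10)

# mask replaces values where the condition is True and keeps the rest.
df['New_Days'] = df['Days'].mask(adjust, df['Days'] - 1)
print(df)
```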

how to merge month day year columns in date column?

The date is in separate columns
Month Day Year
8 12 1993
8 12 1993
8 12 1993
I want to merge it in one column
Date
8/12/1993
8/12/1993
8/12/1993
I tried
df_date = df.Timestamp((df_filtered.Year*10000+df_filtered.Month*100+df_filtered.Day).apply(str),format='%Y%m%d')
I get this error
AttributeError: 'DataFrame' object has no attribute 'Timestamp'
Use pd.to_datetime with astype(str), joining the parts with a separator so the format string can tell the month from the day:
1. as string type:
df['Date'] = pd.to_datetime(df['Month'].astype(str) + '/' + df['Day'].astype(str) + '/' + df['Year'].astype(str), format='%m/%d/%Y').dt.strftime('%m/%d/%Y')
   Month  Day  Year        Date
0      8   12  1993  08/12/1993
1      8   12  1993  08/12/1993
2      8   12  1993  08/12/1993
2. as datetime type:
df['Date'] = pd.to_datetime(df['Month'].astype(str) + '/' + df['Day'].astype(str) + '/' + df['Year'].astype(str), format='%m/%d/%Y')
   Month  Day  Year       Date
0      8   12  1993 1993-08-12
1      8   12  1993 1993-08-12
2      8   12  1993 1993-08-12
Here is the solution:
df = pd.DataFrame({'Month': [8, 8, 8], 'Day': [12, 12, 12], 'Year': [1993, 1993, 1993]})
# This way dates will be a DataFrame
dates = df.apply(lambda row:
                 pd.Series(pd.Timestamp(row['Year'], row['Month'], row['Day']), index=['Date']),
                 axis=1)
# And this way dates will be a Series:
# dates = df.apply(lambda row:
#                  pd.Timestamp(row['Year'], row['Month'], row['Day']),
#                  axis=1)
The apply method generates a new Series or DataFrame by applying the provided function (a lambda in this case) to each row in turn and joining the results.
You can read about apply method in official documentation.
And here is the explanation of lambda expressions.
EDIT:
@JohnClements suggested a better solution, using the pd.to_datetime method:
dates = pd.to_datetime(df).to_frame('Date')
Also, if you want your output to be a string, you can use
dates = df.apply(lambda row: f"{row['Month']}/{row['Day']}/{row['Year']}",
                 axis=1)
You can try:
df = pd.DataFrame({'Month': [8,8,8], 'Day': [12,12,12], 'Year': [1993, 1993, 1993]})
df['date'] = pd.to_datetime(df)
Result:
   Month  Day  Year       date
0      8   12  1993 1993-08-12
1      8   12  1993 1993-08-12
2      8   12  1993 1993-08-12
Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
Month 3 non-null int64
Day 3 non-null int64
Year 3 non-null int64
date 3 non-null datetime64[ns]
dtypes: datetime64[ns](1), int64(3)
memory usage: 176.0 bytes
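To get both forms the question asks about in one place, here is a minimal sketch that builds a real datetime column with pd.to_datetime and, separately, an unpadded 'M/D/YYYY' string column by plain concatenation. The column names date and date_str are illustrative. String concatenation is used for the text form because unpadded strftime codes differ between platforms.

```python
import pandas as pd

df = pd.DataFrame({'Month': [8, 8, 8], 'Day': [12, 12, 12], 'Year': [1993, 1993, 1993]})

# Datetime column assembled from the Year/Month/Day columns.
df['date'] = pd.to_datetime(df[['Year', 'Month', 'Day']])

# Plain-text column in Month/Day/Year order, without zero padding.
df['date_str'] = (df['Month'].astype(str) + '/'
                  + df['Day'].astype(str) + '/'
                  + df['Year'].astype(str))
print(df)
```

Keeping the datetime column is usually the better choice for sorting or date arithmetic; the string column is only for display.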

Python3 How to convert date into monthly periods where the first period is September

I'm working with a group whose fiscal year starts in September. I have a dataframe with a bunch of dates, and I want to calculate a monthly period that equals 1 in September.
What works:
# Convert date column to datetime format
df['Hours_Date'] = pd.to_datetime(df['Hours_Date'])
# First quarter starts in September - Yes!
df['Quarter'] = pd.PeriodIndex(df['Hours_Date'], freq='Q-Aug').strftime('Q%q')
What doesn't work:
# Gives me monthly periods starting in January. Don't want.
df['Period'] = pd.PeriodIndex(df['Hours_Date'], freq='M').strftime('%m')
# Gives me an error
df['Period'] = pd.PeriodIndex(df['Hours_Date'], freq='M-Aug').strftime('%m')
Is there a way to adjust the monthly frequency?
I think it is not implemented; check anchored offsets.
A possible solution is to subtract 8, or to use Index.shift, to shift by 8 months:
rng = pd.date_range('2017-04-03', periods=10, freq='m')
df = pd.DataFrame({'Hours_Date': rng})
df['Period'] = (pd.PeriodIndex(df['Hours_Date'], freq='M') - 8).strftime('%m')
Or:
df['Period'] = pd.PeriodIndex(df['Hours_Date'], freq='M').shift(-8).strftime('%m')
print (df)
  Hours_Date Period
0 2017-04-30     08
1 2017-05-31     09
2 2017-06-30     10
3 2017-07-31     11
4 2017-08-31     12
5 2017-09-30     01
6 2017-10-31     02
7 2017-11-30     03
8 2017-12-31     04
9 2018-01-31     05
I think 'M-Aug' is not a valid monthly frequency, so you can adjust the month number yourself with np.where (using the data from Jez's answer):
np.where(df['Hours_Date'].dt.month-8<=0,df['Hours_Date'].dt.month+4,df['Hours_Date'].dt.month-8)
Out[271]: array([ 8, 9, 10, 11, 12, 1, 2, 3, 4, 5], dtype=int64)
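The np.where adjustment above can also be written as modular arithmetic. This is a minimal sketch under the same assumption as the question (the fiscal year starts in September); the constant FISCAL_START_MONTH and the sample dates are illustrative, chosen to match the month-end dates used in the answers.

```python
import pandas as pd

FISCAL_START_MONTH = 9  # September is period 1 (assumption taken from the question)

dates = pd.Series(pd.to_datetime([
    '2017-04-30', '2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31',
    '2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31', '2018-01-31',
]))

# Shift the calendar month so the fiscal start month maps to 0, wrap with
# modulo 12, then move back to a 1-based period number.
period = (dates.dt.month - FISCAL_START_MONTH) % 12 + 1
print(period.tolist())
```

Changing the constant adapts the same expression to a fiscal year that starts in any other month.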

Split out if > value, divide, add value to column - Python/Pandas

import pandas as pd
df = pd.DataFrame([['Dog', 10, 6], ['Cat', 7 ,5]], columns=('Name','Amount','Day'))
Name  Amount  Day
 Dog      10    6
 Cat       7    5
I would like to make the DataFrame look like the following:
Name  Amount  Day
Dog1     6.0    6
Dog2     2.5    7
Dog3     1.5    8
 Cat     7.0    5
First step: For any Amount > 8, split into 3 different rows, with new name of 'Name1', 'Name2','Name3'
Second step:
For Dog1, 60% of Amount, Day = Day.
For Dog2, 25% of Amount, Day = Day + 1.
For Dog3, 15% of Amount, Day = Day + 2.
Keep Cat the same because Cat Amount < 8
Any ideas? Any help would be appreciated.
df = pd.DataFrame([['Dog', 10, 6], ['Cat', 7, 5]], columns=['Name', 'Amount', 'Day'])
# Each template row holds a name suffix, a share of Amount, and a Day offset
template = pd.DataFrame([
    ['1', .6, 0],
    ['2', .25, 1],
    ['3', .15, 2]
], columns=df.columns)
def apply_template(r, t):
    t = t.copy()
    t['Name'] = t['Name'].radd(r['Name'])  # 'Dog' + '1' -> 'Dog1'
    t['Amount'] *= r['Amount']
    t['Day'] += r['Day']
    return t
# DataFrame.append was removed in pandas 2.0, so combine everything with pd.concat
pd.concat([apply_template(r, template) for _, r in df.query('Amount > 8').iterrows()]
          + [df.query('Amount <= 8')], ignore_index=True)
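The same split can be sketched without iterating over rows by cross-joining the large rows with the template. This is an alternative to the answer above, not a restatement of it, and it assumes pandas 1.2 or later for merge(how='cross'). The template column names suffix, share, and offset are illustrative.

```python
import pandas as pd

df = pd.DataFrame([['Dog', 10, 6], ['Cat', 7, 5]], columns=['Name', 'Amount', 'Day'])

# One row per part: name suffix, share of Amount, and Day offset.
template = pd.DataFrame({'suffix': ['1', '2', '3'],
                         'share': [0.6, 0.25, 0.15],
                         'offset': [0, 1, 2]})

# Pair every row with Amount > 8 with every template row.
big = df[df['Amount'] > 8].merge(template, how='cross')
big['Name'] = big['Name'] + big['suffix']
big['Amount'] = big['Amount'] * big['share']
big['Day'] = big['Day'] + big['offset']

# Put the split rows first, then the rows that were left untouched.
result = pd.concat([big[['Name', 'Amount', 'Day']], df[df['Amount'] <= 8]],
                   ignore_index=True)
print(result)
```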
