Insert column in multi-header DataFrame using loc - python-3.x

I am trying to insert a column into a multi-header DataFrame that is the output of a pandas pivot. I used pandas' .loc for this, but I am not able to insert the column at a specific location.
Here is my code:
import numpy as np
import pandas as pd

data = {'Commander': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'Date': ['2012, 02, 08', '2012, 02, 08', '2012, 02, 08',
                 '2012, 02, 08', '2012, 02, 08'],
        'Hour': ['00', '01', '02', '03', '04'],
        'Subject': ['Maths', 'Science', 'Biology', 'Chemistry', 'Physics'],
        'Score': [4, 24, 31, 3, 1],
        'Grade': [1, 2, 1, 4, 5],
        'credit': [20, 50, 40, 20, 10]}
df = pd.DataFrame(data)
# Pivot so that Grade/Score/credit become the top column level and Date the second
df1 = pd.pivot_table(df, index=['Commander', 'Hour'], columns=['Date'],
                     values=['Score', 'Grade', 'credit'], aggfunc=np.max)
I am trying to insert another sub-column under Grade. I tried the code below, which does insert the column, but it is placed last rather than under Grade. Can anyone please guide me on how to achieve this?
df1.loc[:,('Grade','subcredit')]=df1.loc[:,('Grade','2012, 02, 08')]*5

You just need to add one more line of code to sort the column index (the parameter axis=1 sorts the columns). sort_index returns a new DataFrame rather than sorting in place, so assign the result back:
df1 = df1.sort_index(axis=1)
                      Grade                  Score       credit
Date           2012, 02, 08 subcredit 2012, 02, 08 2012, 02, 08
Commander Hour
Amy       04              5        25            1           10
Jake      03              4        20            3           20
Jason     00              1         5            4           20
Molly     01              2        10           24           50
Tina      02              1         5           31           40

Alternatively, insert the column exactly where you want it. This is useful if sorting could reorder other columns that you don't want to change.
Find the last column that belongs to 'Grade' by calling argmax on the reversed boolean array; the new column is inserted right after that position.
i = int(len(df1.columns) - (df1.columns.get_level_values(0) == 'Grade')[::-1].argmax())
df1.insert(i, ('Grade', 'subcredit'), df1.loc[:,('Grade','2012, 02, 08')]*5)
                      Grade                  Score       credit
Date           2012, 02, 08 subcredit 2012, 02, 08 2012, 02, 08
Commander Hour
Amy       04              5        25            1           10
Jake      03              4        20            3           20
Jason     00              1         5            4           20
Molly     01              2        10           24           50
Tina      02              1         5           31           40
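The positional insert can be wrapped in a small helper. This is only a sketch: the function name insert_after_group and the two-row demo frame are made up for illustration, and the columns are deliberately left unsorted to show that nothing else gets reordered.

```python
import pandas as pd


def insert_after_group(df, top_label, sub_label, values):
    """Insert (top_label, sub_label) right after the last column under top_label.

    Hypothetical helper for illustration; modifies df in place like DataFrame.insert.
    """
    in_group = df.columns.get_level_values(0) == top_label
    if not in_group.any():
        raise KeyError(top_label)
    # argmax on the reversed mask gives the distance of the last match from the end
    pos = int(len(df.columns) - in_group[::-1].argmax())
    df.insert(pos, (top_label, sub_label), values)


date = '2012, 02, 08'
columns = pd.MultiIndex.from_tuples(
    [('Score', date), ('Grade', date), ('credit', date)], names=[None, 'Date']
)
demo = pd.DataFrame([[4, 1, 20], [24, 2, 50]], columns=columns)

insert_after_group(demo, 'Grade', 'subcredit', demo[('Grade', date)] * 5)
print(list(demo.columns))
```

Note that DataFrame.insert raises a ValueError if the column already exists, so this should be used instead of the .loc assignment, not after it.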

Related

Summing an array of values based on multiple criteria and look up table

I am given the following sales table, which provides the sales that each employee made. Instead of their name it lists their ID, and each ID may have more than one row.
To map the ID back to the name, I have a look up table with each employee's name and ID. One thing to keep in mind is that any given name could potentially have more than one ID assigned to it, as described in the example below:
Sales Table:
Year  ID  North  South  West  East
2020  A      58     30    74    72
2020  A      85     40    90    79
2020  B       9     82    20     5
2020  B      77     13    49    21
2020  C      85     55    37    11
2020  C      29     70    21    22
2021  A      61     37    21    42
2021  A      22     39     2    34
2021  B      62     55     9    72
2021  B      59     11     2    37
2021  C      41     22    64    47
2021  C      83     18    56    83
ID table:
ID  Name
A   Allison
B   Brandon
C   Brandon
I am trying to sum up each employee's sales by a given year, and aggregate all their transactions by their name (rather than ID), so that my result looks like the following:
Result:
Report   2021
Allison   258
Brandon   721
I want the user to be able to select the year, and the report would automatically sum up each person's sales by the year and their name. Again, Brandon was assigned ID B and C, so the report should be able to obtain all 2021 sales under B and C.
I posted a similar question which did not include the added complexity of having a name tied to more than one ID. In that thread, I was provided a solution with the following formula:
=SUMPRODUCT($C$2:$F$13*($B$2:$B$13=INDEX($I$2:$I$4,MATCH(N3,$J$2:$J$4,0)))*($A$2:$A$13=$N$2))
While this formula works for names that have only one ID tied to them, I believe the INDEX and MATCH component breaks down once it encounters a duplicate name in the ID table, because MATCH returns only the first match.
I am currently using Excel 2016, so any solution would need to be compatible with that version at least. Thanks in advance for any guidance on this.
Try this formula, which works in Excel 2016.
In L4, enter the formula and copy it down:
=SUMPRODUCT(($A$3:$A$14=K$3)*(VLOOKUP(T(IF({1},$B$3:$B$14)),$H$3:$I$5,2,0)=K4)*$C$3:$F$14)
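For readers who want to check the expected totals outside Excel, the same logic (look up each row's name from its ID, keep the chosen year, sum all four regions per name) can be sketched in pandas. This is an illustration only, not part of the Excel solution; the function name sales_report is made up here.

```python
import pandas as pd

sales = pd.DataFrame(
    [
        (2020, 'A', 58, 30, 74, 72), (2020, 'A', 85, 40, 90, 79),
        (2020, 'B', 9, 82, 20, 5), (2020, 'B', 77, 13, 49, 21),
        (2020, 'C', 85, 55, 37, 11), (2020, 'C', 29, 70, 21, 22),
        (2021, 'A', 61, 37, 21, 42), (2021, 'A', 22, 39, 2, 34),
        (2021, 'B', 62, 55, 9, 72), (2021, 'B', 59, 11, 2, 37),
        (2021, 'C', 41, 22, 64, 47), (2021, 'C', 83, 18, 56, 83),
    ],
    columns=['Year', 'ID', 'North', 'South', 'West', 'East'],
)
ids = pd.DataFrame({'ID': ['A', 'B', 'C'],
                    'Name': ['Allison', 'Brandon', 'Brandon']})


def sales_report(year):
    """Total sales per employee name for one year (hypothetical helper)."""
    regions = ['North', 'South', 'West', 'East']
    # Attach the name to every sales row, so several IDs can share one name
    rows = sales[sales['Year'] == year].merge(ids, on='ID', how='left')
    return rows.groupby('Name')[regions].sum().sum(axis=1)


report = sales_report(2021)
print(report)
```

Mapping every sales row to a name, rather than mapping each name to a single ID, is the same idea the VLOOKUP inside SUMPRODUCT uses.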

Modelling a moving window with a shift( ) function in python problem

Problem: Let's suppose that we supply robots to a factory. Each robot is programmed to switch into work mode after 3 days (e.g. if it arrives on day 1, it starts working on day 3) and then works for 5 days. After that, the battery runs out and it stops working. The number of robots supplied each day varies.
The following code sets up the supplies for the first 15 days:
import pandas as pd
df = pd.DataFrame({
    'date': ['01', '02', '03', '04', '05', '06',
             '07', '08', '09', '10', '11', '12', '13', '14', '15'],
    'value': [10, 20, 20, 30, 20, 10, 30, 20, 10, 20, 30, 40, 20, 20, 20]
})
df.set_index('date',inplace=True)
df
Let's now estimate the number of working robots on each of these days (we move two days back and sum only the numbers within the past 5 days):
04 10
05 20+10 = 30
06 20+20 = 40
07 30+20 = 50
08 20+30 = 50
09 10+20 = 30
10 30+10 = 40
11 20+30 = 50
12 10+20 = 30
13 20+10 = 30
14 30+20 = 50
15 40+30 = 70
Is it possible to model this in Python? I have tried the following, which is close but not quite right:
df_p = (((df.rolling(2)).sum())).shift(5).rolling(1).mean().shift(-3)
P.S. If you don't think it's complicated enough, I also need to include the trailing 7-day average of each of these numbers for my real problem.
Let's first shift forward by the window (5) less the rolling window length (2), i.e. by 3 rows, and then take a rolling sum over 2 rows with min_periods set to 1:
shift_window = 5
rolling_window = 2
df['new_col'] = (
    df['value'].shift(shift_window - rolling_window)
               .rolling(rolling_window, min_periods=1).sum()
)
Or with hard coded values:
df['new_col'] = df['value'].shift(3).rolling(2, min_periods=1).sum()
df:
      value  new_col
date
01       10      NaN
02       20      NaN
03       20      NaN
04       30     10.0
05       20     30.0
06       10     40.0
07       30     50.0
08       20     50.0
09       10     30.0
10       20     40.0
11       30     50.0
12       40     30.0
13       20     30.0
14       20     50.0
15       20     70.0
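The 7-day average mentioned in the question's P.S. is not covered above. One possible sketch is a second rolling window over new_col. The column name avg_7d and the choice of min_periods=1 (average over however many of the last 7 days have a value) are assumptions; use min_periods=7 if only full windows should count.

```python
import pandas as pd

df = pd.DataFrame({
    'date': [f'{day:02d}' for day in range(1, 16)],
    'value': [10, 20, 20, 30, 20, 10, 30, 20, 10, 20, 30, 40, 20, 20, 20],
}).set_index('date')

# Working robots per day, as computed in the answer
df['new_col'] = df['value'].shift(3).rolling(2, min_periods=1).sum()

# Trailing 7-day average of the working-robot count (assumed definition)
df['avg_7d'] = df['new_col'].rolling(7, min_periods=1).mean()
print(df)
```

The first three rows stay NaN because new_col has no values there yet.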

How to join a series into a dataframe

I counted the frequency of the column 'address' in the dataframe 'df_two' and saved the result as a dict, then used that dict to create a series, 'new_series'. Now I want to join this series to the dataframe to make 'df_three', so that I can do some maths with the column 'new_count' from 'new_series' and the column 'number' from 'df_two'.
I have tried merge and concat, but the values of 'new_count' were changed to NaN.
What I got (NaN):
df_three
number  address  name  new_Count
14      12 ab    pra   NaN
49      03 cd    ken   NaN
97      07 ef    dhi   NaN
91      10 fg    rav   NaN
Input:
new_series
       new_count
12 ab  8778
03 cd  6499
07 ef  5923
10 fg  5631
df_two
number  address  name
14      12 ab    pra
49      03 cd    ken
97      07 ef    dhi
91      10 fg    rav
Expected output:
df_three
number  address  name  new_Count
14      12 ab    pra   8778
49      03 cd    ken   6499
97      07 ef    dhi   5923
91      10 fg    rav   5631
It seems you forgot the on parameter:
df = df_two.join(new_series, on='address')
print (df)
   number address name  new_count
0      14   12 ab  pra       8778
1      49   03 cd  ken       6499
2      97   07 ef  dhi       5923
3      91   10 fg  rav       5631
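A self-contained version of the above, built from the tables in the question. By default DataFrame.join matches the other object's index against the calling frame's index; passing on='address' matches it against that column instead, which is why the values line up. The Series needs a name for join to use as the new column label. The map call at the end is an alternative added here for comparison, not something from the answer.

```python
import pandas as pd

df_two = pd.DataFrame({
    'number': [14, 49, 97, 91],
    'address': ['12 ab', '03 cd', '07 ef', '10 fg'],
    'name': ['pra', 'ken', 'dhi', 'rav'],
})

# Series indexed by address, as built from the frequency dict in the question
new_series = pd.Series(
    {'12 ab': 8778, '03 cd': 6499, '07 ef': 5923, '10 fg': 5631},
    name='new_count',
)

# Match the series index against the 'address' column, not the row index
df_three = df_two.join(new_series, on='address')

# Alternative: look up each address in the series directly
mapped = df_two['address'].map(new_series)
print(df_three)
```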

Python3 How to convert date into monthly periods where the first period is September

I am working with a group whose fiscal year starts in September. I have a dataframe with a bunch of dates, and I want to calculate a monthly period for each date such that September is period 1.
What works:
# Convert date column to datetime format
df['Hours_Date'] = pd.to_datetime(df['Hours_Date'])
# First quarter starts in September - Yes!
df['Quarter'] = pd.PeriodIndex(df['Hours_Date'], freq='Q-Aug').strftime('Q%q')
What doesn't work:
# Gives me monthly periods starting in January. Don't want.
df['Period'] = pd.PeriodIndex(df['Hours_Date'], freq='M').strftime('%m')
# Gives me an error
df['Period'] = pd.PeriodIndex(df['Hours_Date'], freq='M-Aug').strftime('%m')
Is there a way to adjust the monthly frequency?
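I think this is not implemented; check the anchored offsets in the pandas documentation, which exist for quarterly and annual frequencies but not for monthly ones.
A possible solution is to subtract 8, or to use Index.shift, to move each period back by 8 months: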
# Month-end frequency; newer pandas versions spell this alias 'ME'
rng = pd.date_range('2017-04-03', periods=10, freq='M')
df = pd.DataFrame({'Hours_Date': rng})
df['Period'] = (pd.PeriodIndex(df['Hours_Date'], freq='M') - 8).strftime('%m')
Or:
df['Period'] = pd.PeriodIndex(df['Hours_Date'], freq='M').shift(-8).strftime('%m')
print (df)
Hours_Date Period
0 2017-04-30 08
1 2017-05-31 09
2 2017-06-30 10
3 2017-07-31 11
4 2017-08-31 12
5 2017-09-30 01
6 2017-10-31 02
7 2017-11-30 03
8 2017-12-31 04
9 2018-01-31 05
I think 'M-Aug' is not a valid monthly frequency, so you can adjust the month number yourself with np.where (using the data from jez's answer):
np.where(df['Hours_Date'].dt.month-8<=0,df['Hours_Date'].dt.month+4,df['Hours_Date'].dt.month-8)
Out[271]: array([ 8, 9, 10, 11, 12, 1, 2, 3, 4, 5], dtype=int64)
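The same shift can also be written as modular arithmetic on the calendar month, which avoids the two-branch np.where. This is a sketch rather than either answerer's method; the constant FISCAL_START_MONTH and the column name Period_str are made up here, and month-start dates ('MS') are used only to build sample data.

```python
import pandas as pd

df = pd.DataFrame(
    {'Hours_Date': pd.date_range('2017-04-01', periods=10, freq='MS')}
)

FISCAL_START_MONTH = 9  # September is fiscal period 1

month = df['Hours_Date'].dt.month
# Shift so the start month maps to 0, wrap around the year, then go back to 1-based
df['Period'] = (month - FISCAL_START_MONTH) % 12 + 1
# Zero-padded string form, matching the strftime('%m') style used above
df['Period_str'] = df['Period'].astype(str).str.zfill(2)
print(df)
```

Changing FISCAL_START_MONTH adapts this to a fiscal year that starts in any other month.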

Excel indexmatch, vlookup

I have a holiday calendar for several years in one table. Can anyone help – How to arrange this data by week and show holiday against week? I want to reference this data in other worksheets and hence arranging this way will help me to use formulae on other sheets. I want the data to be: col A having week numbers and column B showing holiday for year 1, col. C showing holiday for year 2, etc.
                   Fiscal Week
                   2015  2014  2013  2012
Valentine's Day       2     2     2     3
President's Day       3     3     3     4
St. Patrick's Day     7     7     7     7
Easter               10    12     9    11
Mother's Day         15    15    15    16
Memorial Day         17    17    17    18
Flag Day             20    19    19    20
Father's Day         21    20    20    21
Independence Day     22    22    22    23
Labor Day            32    31    31    32
Columbus Day         37    37    37    37
Thanksgiving         43    43    43    43
Christmas            47    47    47    48
New Year's Day       48    48    48    49
ML King Day          51    51    51    52
It's not too clear what year 1 is, so I'm going to assume that's 2015, and year 2 is 2014, etc.
Here's how you could set it up, if I understand correctly. Use this INDEX/MATCH formula (pseudo-formula):
=Iferror(Index([holiday names range],match([week number],[2015's week numbers in your table],0)),"")
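With the holiday names in A3:A17, the 2015 week numbers in B3:B17 and the week numbers to look up in column H, the formula looks like this:
=IFERROR(INDEX($A$3:$A$17,MATCH($H3,B$3:B$17,0)),"")
Enter it in the cell next to the week numbers. You can then drag the formula across, and the range being matched (B3:B17 above) will "slide over" to the next year's column because its column reference is relative.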
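To check the target layout (one row per week, one column per year, holiday names as values), the same reshaping can be sketched in pandas. This only illustrates the layout and is not part of the Excel answer. It uses the first four holidays from the table, and it assumes no two holidays share a week within one year, which holds for the data shown.

```python
import pandas as pd

holidays = pd.DataFrame(
    {
        '2015': [2, 3, 7, 10],
        '2014': [2, 3, 7, 12],
        '2013': [2, 3, 7, 9],
        '2012': [3, 4, 7, 11],
    },
    index=["Valentine's Day", "President's Day", "St. Patrick's Day", "Easter"],
)
holidays.index.name = 'Holiday'

# Long form: one row per (holiday, year) pair with its week number
long_form = holidays.reset_index().melt(
    id_vars='Holiday', var_name='Year', value_name='Week'
)

# Wide form: weeks as rows, years as columns, holiday names as values
by_week = long_form.pivot(index='Week', columns='Year', values='Holiday').fillna('')
print(by_week)
```

Weeks with no holiday in a given year come out as empty strings, like the "" fallback in the IFERROR formula.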
