Reshaping Multi Indexed DF [duplicate] - python-3.x

I have a dataframe that is structured like so (similar to a pivot table):
A   B   December 2022  January 2023
A1  B1  100            200
A1  B2  101            201
I'd like to transpose my dataframe so that it reads:
Month          A   B   Value
December 2022  A1  B1  100
December 2022  A1  B2  101
January 2023   A1  B1  200
January 2023   A1  B2  201
etc. I've attempted
df.T
But it gives me:
A              A1   A1
B              B1   B2
December 2022  100  101
January 2023   200  201

You should use pd.melt:
>>> df.melt(id_vars=['A', 'B'], var_name='Month', value_name='Value')
A B Month Value
0 A1 B1 December 2022 100
1 A1 B2 December 2022 101
2 A1 B1 January 2023 200
3 A1 B2 January 2023 201
Then, to reorder the columns, you can use this hack:
>>> df.melt(id_vars=['A', 'B'], var_name='Month', value_name='Value') \
.set_index('Month').reset_index()
Month A B Value
0 December 2022 A1 B1 100
1 December 2022 A1 B2 101
2 January 2023 A1 B1 200
3 January 2023 A1 B2 201
Update, following @sammywemmy's comment:
var_cols = ['A', 'B']
out = df.melt(id_vars=var_cols, var_name='Month', value_name='Value') \
[['Month'] + var_cols + ['Value']]
print(out)
# Output
Month A B Value
0 December 2022 A1 B1 100
1 December 2022 A1 B2 101
2 January 2023 A1 B1 200
3 January 2023 A1 B2 201
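
For anyone who wants to run the answer end to end, here is a minimal sketch that rebuilds the question's sample table and applies the same melt-and-reorder steps. The names `id_cols` and `long_df` are illustrative and do not come from the original posts.

```python
import pandas as pd

# Sample data copied from the question (wide format: one column per month)
df = pd.DataFrame({
    "A": ["A1", "A1"],
    "B": ["B1", "B2"],
    "December 2022": [100, 101],
    "January 2023": [200, 201],
})

id_cols = ["A", "B"]  # hypothetical name for the identifier columns

# melt turns every non-identifier column into (Month, Value) rows
long_df = df.melt(id_vars=id_cols, var_name="Month", value_name="Value")

# Select the columns explicitly to put Month first
long_df = long_df[["Month"] + id_cols + ["Value"]]

print(long_df)
```

Selecting the columns by name keeps the reordering explicit, which is arguably easier to read than the set_index/reset_index round trip.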

Related

Excel function to dynamically SUM UP data based on matching rows and columns

I have a table with metrics shown as rows and month shown as columns.
Example is below:
Quarter  2022-01-01  2022-01-01  2022-01-01  2022-04-01  2022-04-01  2022-04-01  2022-07-01  2022-07-01  2022-07-01  2022-10-01  2022-10-01  2022-10-01
Month    2022-01-01  2022-02-01  2022-03-01  2022-04-01  2022-05-01  2022-06-01  2022-07-01  2022-08-01  2022-09-01  2022-10-01  2022-11-01  2022-12-01
Metrics  Jan 2022    Feb 2022    Mar 2022    Apr 2022    May 2022    Jun 2022    Jul 2022    Aug 2022    Sep 2022    Oct 2022    Nov 2022    Dec 2022
Revenue  1000        1000        1000        500         500         500         100         100         100         0           0           0
Cost     10          10          10          10          10          10          20          20          20          0           5           10
I want a dynamic summary table of quarterly data. I can use SUMIFS and look up the quarter with this function:
SUMIFS([Value row range],[Quarter range],[Quarter wanted])
However, I still have to select the correct value row range manually. Is it possible to select the entire table and then pick the row to sum by matching its label (the metric, in this case)?
Insert Report Month  Dec-22

Last 3 quarter report
Metrics  Q2 2022  Q3 2022  Q4 2022
Revenue  1500     300      0
Cost     30       60       15
I'm aware of the INDEX and MATCH functions, but they only find the first match and do not sum up all the months in the same quarter.
Thanks for helping!
Excel 365 for Mac should have the BYCOL function.
Given:
Your data table is a Table named Metrics.
Report_Month is a Named Range containing a real date that falls in the final month of the last quarter you want reported.
The following formulas will return your output and will adjust as you add columns to the data table.
A11: =Metrics[[#All],[Metrics]]
B11: =LET(x,EDATE(Report_Month,SEQUENCE(,3,-6,3)),TEXT(MONTH(x)/3,"\Q0 ") & YEAR(x))
B12: =BYCOL(XLOOKUP(TEXT(DATE(YEAR(Report_Month),MONTH(Report_Month)-9+SEQUENCE(3,,1,1)+SEQUENCE(,3,0,3),1),"mmm-yy"),Metrics[#Headers],INDEX(Metrics,XMATCH(A12,Metrics[Metrics]),0)),LAMBDA(arr,SUM(arr)))
Select B12 and fill down as far as needed.
Notes
DATE(YEAR(Report_Month),MONTH(Report_Month)-9+SEQUENCE(3,,1,1)+SEQUENCE(,3,0,3),1)
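creates a 3x3 matrix of first-of-month dates covering the nine months that end with Report_Month, with each column holding the months of one quarter.
So for 12/1/2022 =>
4/1/2022  7/1/2022  10/1/2022
5/1/2022  8/1/2022  11/1/2022
6/1/2022  9/1/2022  12/1/2022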
The TEXT function then formats the same as the column headers in the Metrics table.
XLOOKUP will then return the appropriate columns from the table into that matrix, and using the BYCOL allows us to SUM by column which is the relevant quarter.

Handle ValueError while creating date in pd

I'm reading a CSV file with the columns p, day, and month into a DataFrame. The goal is to build a date from the day, the month, and the current year, but I run into this error for 29 February:
ValueError: cannot assemble the datetimes: day is out of range for month
When this error occurs, I would like to replace the day with the day before. How can I do that? Below are a few lines of my DataFrame; the datex column at the end is what I would like to get.
p day month year datex
0 p1 29 02 2021 28Feb-2021
1 p2 18 07 2021 18Jul-2021
2 p3 12 09 2021 12Sep-2021
Right now, my code for the date is only the below, so I have nan where the date doesn't exist.
df['datex'] = pd.to_datetime(df[['year', 'month', 'day']], errors='coerce')
You could try something like this:
df['datex'] = pd.to_datetime(df[['year', 'month', 'day']], errors='coerce')
As expected, the invalid date comes back as NaT:
p day year month datex
0 p1 29 2021 2 NaT
1 p2 18 2021 7 2021-07-18
2 p3 12 2021 9 2021-09-12
You could then handle these missing dates as a special case:
df.loc[df.datex.isnull(), 'previous_day'] = df.day -1
p day year month datex previous_day
0 p1 29 2021 2 NaT 28.0
1 p2 18 2021 7 2021-07-18 NaN
2 p3 12 2021 9 2021-09-12 NaN
df.loc[df.datex.isnull(), 'datex'] = pd.to_datetime(df[['previous_day', 'year', 'month']].rename(columns={'previous_day': 'day'}))
p day year month datex previous_day
0 p1 29 2021 2 2021-02-28 28.0
1 p2 18 2021 7 2021-07-18 NaN
2 p3 12 2021 9 2021-09-12 NaN
You have to create a new day column if you want to keep day = 29 in the day column.
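
As a variation on the answer above, the sketch below clips each day to the last valid day of its month instead of subtracting one, which also covers inputs such as 30 or 31 February. This is a different technique from the one in the answer, and the helper names `month_start` and `safe_day` are illustrative. It assumes that `month` is always between 1 and 12.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "p": ["p1", "p2", "p3"],
    "day": [29, 18, 12],
    "month": [2, 7, 9],
    "year": [2021, 2021, 2021],
})

# First day of each row's month; this is always a valid date
month_start = pd.to_datetime(df[["year", "month"]].assign(day=1))

# Clip the day to the number of days in that month (29 Feb 2021 -> 28)
safe_day = df["day"].clip(upper=month_start.dt.days_in_month)

# Assemble the final date from the clipped day; the original day column is kept
df["datex"] = pd.to_datetime(
    pd.DataFrame({"year": df["year"], "month": df["month"], "day": safe_day})
)

print(df)
```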

Look up a date value from each cell in a column and return a year date dependent upon where date falls between two dates

I want to add a formula that finds the Policy Year for each cell in column B (starting in B2). The Policy Year is determined by taking the date in the corresponding cell of column A and checking which range (inception date to expiry date) in D2:E5 it falls in. The Policy Years are in C2:C5. Column B below shows the values I'd expect the formula to return from column C.
COLUMN A          COLUMN B (EXPECTED VALUE)  COLUMN C  COLUMN D         COLUMN E
2 April 2017      2016                       2016      5 December 2016  4 December 2017
5 June 2017       2016                       2017      5 December 2017  4 December 2018
6 December 2017   2017                       2018      5 December 2018  4 December 2019
4 January 2018    2017                       2019      5 December 2019  4 December 2020
6 August 2018     2017
4 December 2018   2017
29 December 2018  2018
6 March 2020      2019
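
The question asks for an Excel formula and has no answer here. For readers who have the same data in pandas, the lookup logic can be sketched as below. It is a rough illustration, not an Excel solution. It assumes that the policy periods do not overlap, and the names `policies`, `dates` and `result` are made up for the example.

```python
import numpy as np
import pandas as pd

# Policy periods from columns C:E of the question, sorted by inception date
policies = pd.DataFrame({
    "policy_year": [2016, 2017, 2018, 2019],
    "inception": pd.to_datetime(["2016-12-05", "2017-12-05", "2018-12-05", "2019-12-05"]),
    "expiry": pd.to_datetime(["2017-12-04", "2018-12-04", "2019-12-04", "2020-12-04"]),
}).sort_values("inception", ignore_index=True)

# Dates from column A
dates = pd.Series(pd.to_datetime([
    "2017-04-02", "2017-06-05", "2017-12-06", "2018-01-04",
    "2018-08-06", "2018-12-04", "2018-12-29", "2020-03-06",
]))

# Position of the last inception date that is <= each date (-1 if there is none)
pos = np.searchsorted(policies["inception"].to_numpy(), dates.to_numpy(), side="right") - 1

# Fetch the candidate period, then keep it only if the date is on or before its expiry
safe_pos = pos.clip(min=0)
expiry = policies["expiry"].to_numpy()[safe_pos]
year = policies["policy_year"].to_numpy()[safe_pos]
in_range = (pos >= 0) & (dates.to_numpy() <= expiry)

# Dates outside every period become missing values
result = pd.Series(year, index=dates.index).where(in_range)
print(result.tolist())
```

For the sample dates, every value falls inside a period, so the result should match the expected values shown in column B.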

How to fill empty cell value in pandas with condition

My sample dataset is below. The actual data runs through 2020.
Item  Year  Amount  final_sales
A1    2016  123     400
A2    2016  23      40
A3    2016  6
A4    2016  10      100
A5    2016  5       200
A1    2017  123     400
A2    2017  23
A3    2017  6
A4    2017  10
A5    2017  5       200
For every Item, I have to carry the 2016 final_sales value over to 2017 (and subsequent years) when that year's value is not available.
In the dataset above, final_sales is missing for A2 and A4 in 2017 but is available for 2016. How can I bring in the 2016 final_sales value when the corresponding year's final_sales is missing?
The expected results are below. Thanks.
Item  Year  Amount  final_sales
A1    2016  123     400
A2    2016  23      40
A3    2016  6
A4    2016  10      100
A5    2016  5       200
A1    2017  123     400
A2    2017  23      40
A3    2017  6
A4    2017  10      100
A5    2017  5       200
It looks like you want to fill forward where there is missing data.
You can do this with 'fillna', which is available on pd.DataFrame objects.
In your case, you only want to fill forward for each item, so first group by item, and then use fillna. The method 'pad' just carries forward in order (hence why we sort first).
df['final_sales'] = df.sort_values('Year').groupby('Item')['final_sales'].fillna(method='pad')
Note that on your example data, A3 is missing for 2016 as well, so there is nothing to carry forward and it remains missing for 2017.
GroupBy.ffill works for me; the rows only need to be sorted by Year, as in the question's sample data:
# Sort by both columns if necessary
df = df.sort_values(['Year', 'Item'])
df['final_sales'] = df.groupby('Item')['final_sales'].ffill()
print(df)
Item Year Amount final_sales
0 A1 2016 123 400.0
1 A2 2016 23 40.0
2 A3 2016 6 NaN
3 A4 2016 10 100.0
4 A5 2016 5 200.0
5 A1 2017 123 400.0
6 A2 2017 23 40.0
7 A3 2017 6 NaN
8 A4 2017 10 100.0
9 A5 2017 5 200.0
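Something like this?
def fill_final(x):
    # Keep the row's own value for 2016 rows or when it is already present
    if x['Year'] == 2016 or pd.notna(x['final_sales']):
        return x['final_sales']
    # Otherwise take the same Item's 2016 value (stays missing if there is none)
    base = df[(df['Year'] == 2016) & (df['Item'] == x['Item'])]['final_sales']
    return base.iloc[0] if not base.empty else x['final_sales']
df['final_sales'] = df.apply(fill_final, axis=1)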
did not test this but should set you on the right path
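
The lookup idea in the last answer can also be written without a row-wise apply. The sketch below builds a per-Item table of 2016 values and uses it to fill only the missing entries. It assumes each Item appears exactly once in 2016, and `base_2016` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are represented as NaN
df = pd.DataFrame({
    "Item": ["A1", "A2", "A3", "A4", "A5"] * 2,
    "Year": [2016] * 5 + [2017] * 5,
    "Amount": [123, 23, 6, 10, 5] * 2,
    "final_sales": [400, 40, np.nan, 100, 200, 400, np.nan, np.nan, np.nan, 200],
})

# 2016 final_sales indexed by Item (assumes one 2016 row per Item)
base_2016 = df.loc[df["Year"] == 2016].set_index("Item")["final_sales"]

# Fill only the missing values with the matching Item's 2016 value
df["final_sales"] = df["final_sales"].fillna(df["Item"].map(base_2016))

print(df)
```

Unlike a forward fill, this always falls back to the 2016 value, so it does not depend on row order. A3 stays missing because it has no 2016 value to copy.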

How to set values based on a date range in Excel?

I want to set values based on an arrival and a departure date:
Idx    Arrive   Depart   01. Jan  02. Jan  03. Jan  04. Jan  05. Jan  ...
1      01. Jan  04. Jan  1        1        1
2      02. Jan  04. Jan           1        1
3      02. Jan  05. Jan           1        1        1
4      01. Jan  05. Jan  1        1        1        1
5      03. Jan  05. Jan                    1        1
...    ...      ...      ...      ...      ...      ...      ...      ...
Total                    2        4        5        3
For example, Idx 1:
Arrives on 01 January
Departs on 04 January
A total of 3 nights' accommodation is needed (a value of '1' in the columns for 01, 02 and 03 January). Note that no '1' is entered in the 04 January column, as this is the date of departure and no accommodation is required that night.
How can I achieve this in Excel?
Assuming that Arrive is in column A and the column headers (Arrive, Depart, 01. Jan) are on row 1, you want to put the following formula into cell C2:
=IF(AND(C$1>=$A2,C$1<$B2),1,"")
From there, you can copy the formula into the other cells. The formula assumes that the dates on the left and at the top are proper date values, i.e. Excel treats them as dates.
