Dynamically Lookup Value with Between - Excel - excel-formula
I have a chronological list of Product, Year, Month, Profit (like below).
Summary Table
Product Year Month Profit
TV 2018 1 10
TV 2018 2 20
TV 2018 3 30
TV 2018 4 50
TV 2018 5 35
TV 2018 6 60
TV 2018 7 90
Heater 2018 1 20
Heater 2018 2 3
Heater 2018 3 8
Heater 2018 4 4
Heater 2018 5 6
Heater 2018 6 11
Heater 2018 7 1
What I want to do is look up another sheet that lists every price change by product, year, and month, as the table below shows.
Sale Price
Product Year Month Price
TV 2018 1 $1,000.00
TV 2018 4 $800.00
TV 2018 7 $950.00
Heater 2018 1 $20.00
Heater 2018 2 $60.00
Heater 2018 5 $45.00
For example, for TV with Month = 2 and Year = 2018, I want to pull in $1,000 (the most recent price on or before that month) to use in my profit calculation.
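To get the correct Price, use the formula below. Judging by its references, it assumes the Sale Price table is in G1:J7 (Product in G, Year in H, Month in I, Price in J), listed chronologically, and that the summary row being priced is in A2:C2 (Product, Year, Month). It returns the price from the last row that matches the product and year and has a month on or before the summary month: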
=INDEX(J:J,AGGREGATE(14,6,ROW($I$2:$I$7)/(($G$2:$G$7=A2)*($H$2:$H$7=B2)*($I$2:$I$7<=C2)),1))
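For readers working with the same data outside Excel, the lookup can be sketched in pandas. This is only an illustration, not part of the original answer: it swaps the INDEX/AGGREGATE formula for pandas.merge_asof, and the frame names (summary, prices) and the helper column ym are assumptions made for the example.

```python
import pandas as pd

# Summary table from the question
summary = pd.DataFrame({
    "Product": ["TV"] * 7 + ["Heater"] * 7,
    "Year": [2018] * 14,
    "Month": list(range(1, 8)) * 2,
    "Profit": [10, 20, 30, 50, 35, 60, 90, 20, 3, 8, 4, 6, 11, 1],
})

# Sale Price table from the question (one row per price change)
prices = pd.DataFrame({
    "Product": ["TV", "TV", "TV", "Heater", "Heater", "Heater"],
    "Year": [2018] * 6,
    "Month": [1, 4, 7, 1, 2, 5],
    "Price": [1000.0, 800.0, 950.0, 20.0, 60.0, 45.0],
})

# Hypothetical helper key: a single sortable year-month number
summary["ym"] = summary["Year"] * 12 + summary["Month"]
prices["ym"] = prices["Year"] * 12 + prices["Month"]

# For each summary row, take the latest price change on or before that month,
# matched within the same product
result = (
    pd.merge_asof(
        summary.sort_values("ym"),
        prices[["Product", "ym", "Price"]].sort_values("ym"),
        on="ym",
        by="Product",
        direction="backward",
    )
    .sort_values(["Product", "ym"])
    .reset_index(drop=True)
)

print(result[["Product", "Year", "Month", "Profit", "Price"]])
```

Unlike the formula, which also requires the year to match, this sketch carries a price forward across year boundaries because the key combines year and month.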
Related
How to calculate cumulative sum based on months in a pandas dataframe?
I want to calculate cumulative sum of values in a pandas dataframe column based on months.

Code:

import pandas as pd
import numpy as np

data = {'month': ['April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March'],
        'kpi': ['sales', 'sales quantity', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales quantity', 'sales', 'sales', 'sales', 'sales'],
        're_o': [1, 1, 1, 11, 11, 11, 12, 12, 12, 13, 13, 13]
        }

# Create DataFrame
df = pd.DataFrame(data)
df['Q-Total'] = 0
df['Q-Total'] = np.where((df['month'] == 'April') | (df['month'] == 'May') | (df['month'] == 'June'),
                         df.groupby(['kpi'], sort=False)['re_o'].cumsum(), df['Q-Total'])
df['Q-Total'] = np.where((df['month'] == 'July') | (df['month'] == 'August') | (df['month'] == 'September'),
                         df.groupby(['kpi'], sort=False)['re_o'].cumsum(), df['Q-Total'])
df['Q-Total'] = np.where((df['month'] == 'October') | (df['month'] == 'November') | (df['month'] == 'December'),
                         df.groupby(['kpi'], sort=False)['re_o'].cumsum(), df['Q-Total'])
df['Q-Total'] = np.where((df['month'] == 'January') | (df['month'] == 'February') | (df['month'] == 'March'),
                         df.groupby(['kpi'], sort=False)['re_o'].cumsum(), df['Q-Total'])
print(df)

My required output is given below:

        month             kpi  re_o  Q-Total
0       April           sales     1        1
1         May  sales quantity     1        1
2        June           sales     1        2
3        July           sales    11       11
4      August           sales    11       22
5   September           sales    11       33
6     October           sales    12       12
7    November  sales quantity    12       12
8    December           sales    12       24
9     January           sales    13       13
10   February           sales    13       26
11      March           sales    13       39

But when I run this code, I get the output below:

        month             kpi  re_o  Q-Total
0       April           sales     1        1
1         May  sales quantity     1        1
2        June           sales     1        2
3        July           sales    11       13
4      August           sales    11       24
5   September           sales    11       35
6     October           sales    12       47
7    November  sales quantity    12       13
8    December           sales    12       59
9     January           sales    13       72
10   February           sales    13       85
11      March           sales    13       98

I want to calculate the cumulative sum in the following manner:

If the months are April, May and June, take the cumulative sum only from April, May and June.
If the months are July, August and September, take the cumulative sum only from July, August and September.
If the months are October, November and December, take the cumulative sum only from October, November and December.
If the months are January, February and March, take the cumulative sum only from January, February and March.

Can anyone suggest a solution?
You can create quarter periods for the groups and then use GroupBy.cumsum:

g = pd.to_datetime(df['month'], format='%B').dt.to_period('Q')
df['Q-Total'] = df.groupby([g,'kpi'])['re_o'].cumsum()
print (df)

        month             kpi  re_o  Q-Total
0       April           sales     1        1
1         May  sales quantity     1        1
2        June           sales     1        2
3        July           sales    11       11
4      August           sales    11       22
5   September           sales    11       33
6     October           sales    12       12
7    November  sales quantity    12       12
8    December           sales    12       24
9     January           sales    13       13
10   February           sales    13       26
11      March           sales    13       39

Details:

print (df.assign(q = g))

        month             kpi  re_o  Q-Total       q
0       April           sales     1        1  1900Q2
1         May  sales quantity     1        1  1900Q2
2        June           sales     1        2  1900Q2
3        July           sales    11       11  1900Q3
4      August           sales    11       22  1900Q3
5   September           sales    11       33  1900Q3
6     October           sales    12       12  1900Q4
7    November  sales quantity    12       12  1900Q4
8    December           sales    12       24  1900Q4
9     January           sales    13       13  1900Q1
10   February           sales    13       26  1900Q1
11      March           sales    13       39  1900Q1
You can define custom groups from a list of lists:

groups = [['January', 'February', 'March'],
          ['April', 'May', 'June'],
          ['July', 'August', 'September'],
          ['October', 'November', 'December'],
          ]

# make mapper
d = {k:v for v,l in enumerate(groups) for k in l}

df['Q-Total'] = df.groupby([df['month'].map(d), 'kpi'])['re_o'].cumsum()

output:

        month             kpi  re_o  Q-Total
0       April           sales     1        1
1         May  sales quantity     1        1
2        June           sales     1        2
3        July           sales    11       11
4      August           sales    11       22
5   September           sales    11       33
6     October           sales    12       12
7    November  sales quantity    12       12
8    December           sales    12       24
9     January           sales    13       13
10   February           sales    13       26
11      March           sales    13       39
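The two grouping ideas above can be checked side by side. The following is a minimal sketch rather than either answerer's exact code: it uses .dt.quarter in place of .dt.to_period('Q') for the first approach, and the names by_quarter, by_mapping and expected are introduced here only for the comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "month": ["April", "May", "June", "July", "August", "September",
              "October", "November", "December", "January", "February", "March"],
    "kpi": ["sales", "sales quantity", "sales", "sales", "sales", "sales",
            "sales", "sales quantity", "sales", "sales", "sales", "sales"],
    "re_o": [1, 1, 1, 11, 11, 11, 12, 12, 12, 13, 13, 13],
})

# Approach 1: derive the calendar quarter (1-4) from the month name
quarter = pd.to_datetime(df["month"], format="%B").dt.quarter
by_quarter = df.groupby([quarter, "kpi"])["re_o"].cumsum()

# Approach 2: explicit month -> group number mapping
groups = [
    ["January", "February", "March"],
    ["April", "May", "June"],
    ["July", "August", "September"],
    ["October", "November", "December"],
]
mapper = {month: i for i, names in enumerate(groups) for month in names}
by_mapping = df.groupby([df["month"].map(mapper), "kpi"])["re_o"].cumsum()

# Required output from the question
expected = [1, 1, 2, 11, 22, 33, 12, 12, 24, 13, 26, 39]
print(by_quarter.tolist() == expected, by_mapping.tolist() == expected)
```

The mapping approach is the more flexible of the two, since the groups do not have to coincide with calendar quarters.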
Excel: Dynamic Range Date used in other fields: Sumproduct
I am using a SUMPRODUCT formula to get the net sales of the first four months, then the second four months, then the third four months, up to one month before today. This is the formula I use:

=IFERROR(SUMPRODUCT($B3:$Y3*(COLUMN($B3:$Y3)>=AGGREGATE(15,6,COLUMN($B3:$Y3)/($B3:$Y3<>0),1)+4*(COLUMNS(B3)-1))*(COLUMN($B3:$Y3)<AGGREGATE(15,6,COLUMN($B3:$Y3)/($B3:$Y3<>0),1)+4*(COLUMNS(B3)))*($B$1:$Y$1<EOMONTH(TODAY(),-1)+1)),0)

However, I need to capture the same range that I use for net sales for other measures, such as COGS in my example. I cannot use the formula above for the other measures, because they are sometimes zero within the same range as Net Sales, and I need to capture those zeros as well.

Example 1 Example 2

Net Sales
Jan  Feb  Mar  Apr  May  June July Aug  Sept Oct  Nov  Dec
0    0    2    3    4    5    2    3    2    3    2    4
---> 1st period = 14, 2nd period = 10

COGS (follows the same date range as Net Sales)
Jan  Feb  Mar  Apr  May  June July Aug  Sept Oct  Nov  Dec
0    0    0    0    0    2    1    4    2    3    2    4
---> 1st period = 2, 2nd period = 11
You can leave the entire range check logic from the first formula and change just the value range, i.e. the first formula in my sample:

=IFERROR(SUMPRODUCT($A3:$L3*(COLUMN($A3:$L3)>=AGGREGATE(15,6,COLUMN($A3:$L3)/($A3:$L3<>0),1)+4*(COLUMN(A3)-1))*(COLUMN($A3:$L3)<AGGREGATE(15,6,COLUMN($A3:$L3)/($A3:$L3<>0),1)+4*(COLUMN(A3)))*($A$2:$L$2<EOMONTH(TODAY(),-1)+1)),0)

The second formula, for COGS:

=IFERROR(SUMPRODUCT($O3:$Z3*(COLUMN($A3:$L3)>=AGGREGATE(15,6,COLUMN($A3:$L3)/($A3:$L3<>0),1)+4*(COLUMN(A3)-1))*(COLUMN($A3:$L3)<AGGREGATE(15,6,COLUMN($A3:$L3)/($A3:$L3<>0),1)+4*(COLUMN(A3)))*($A$2:$L$2<EOMONTH(TODAY(),-1)+1)),0)
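The idea behind the two formulas (anchor the windows on the first non-zero Net Sales column, but sum a different row) can be sketched in plain Python. This is an illustrative approximation only: the function name period_sum and its parameters are made up for the example, and the EOMONTH(TODAY(),-1) date cutoff from the formula is deliberately left out.

```python
def period_sum(anchor, values, period, width=4):
    """Sum `values` over the period-th window of `width` columns.

    Windows start at the first non-zero entry of `anchor` (the Net Sales row),
    so zeros inside `values` are still counted as part of the window.
    """
    # Position of the first non-zero anchor value, or None if all are zero
    start = next((i for i, v in enumerate(anchor) if v != 0), None)
    if start is None:
        return 0
    lo = start + width * (period - 1)
    return sum(values[lo:lo + width])


# Rows from the question, Jan through Dec
net_sales = [0, 0, 2, 3, 4, 5, 2, 3, 2, 3, 2, 4]
cogs = [0, 0, 0, 0, 0, 2, 1, 4, 2, 3, 2, 4]

print(period_sum(net_sales, net_sales, 1))  # Net Sales, 1st period
print(period_sum(net_sales, net_sales, 2))  # Net Sales, 2nd period
print(period_sum(net_sales, cogs, 1))       # COGS, 1st period, anchored on Net Sales
```

Passing the Net Sales row as the anchor for every measure mirrors what the COGS formula does by keeping $A3:$L3 in the range checks while summing $O3:$Z3.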
Find earliest date within daterange
I have the following market data:

data = pd.DataFrame({'year': [2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020],
                     'month': [10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11],
                     'day': [1,2,5,6,7,8,9,12,13,14,15,16,19,20,21,22,23,26,27,28,29,30,2,3,5,6,9,10,11,12,13,16,17,18,19,20,23,24,25,26,27,30]})
data['date'] = pd.to_datetime(data)
data['spot'] = [77.3438,78.192,78.1044,78.4357,78.0285,77.3507,76.78,77.13,77.0417,77.6525,78.0906,77.91,77.6602,77.3568,76.7243,76.5872,76.1374,76.4435,77.2906,79.2239,78.8993,79.5305,80.5313,79.3615,77.0156,77.4226,76.288,76.5648,77.1171,77.3568,77.374,76.1758,76.2325,76.0401,76.0529,76.1992,76.1648,75.474,75.551,75.7018,75.8639,76.3944]
data = data.set_index('date')

I'm trying to find the spot value for the first day of the month in the date column. I can find the first business day with the code below:

def get_month_beg(d):
    month_beg = (d.index + pd.offsets.BMonthEnd(0) - pd.offsets.MonthBegin(normalize=True))
    return month_beg

data['month_beg'] = get_month_beg(data)

However, due to data issues, sometimes the earliest date in my data does not match the first business day of the month. We'll call the earliest spot value of each month the "strike", which is what I'm trying to find. So for October, the spot value would be 77.3438 (10/1/21) and in Nov it would be 80.5313 (which is on 11/2/21 NOT 11/1/21).

I tried the code below, which only works if my data's earliest date matches the first business date of the month (e.g. it works in Oct, but not in Nov):

data['strike'] = data.month_beg.map(data.spot)

As you can see, I get NaN in Nov because the first business day in my data is 11/2 (spot rate 80.5313), not 11/1.

Does anyone know how to find the earliest date within a date range (in this case the earliest date of each month)? I was hoping the final df would look like the one below:

data = pd.DataFrame({'year': [2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020],
                     'month': [10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11],
                     'day': [1,2,5,6,7,8,9,12,13,14,15,16,19,20,21,22,23,26,27,28,29,30,2,3,5,6,9,10,11,12,13,16,17,18,19,20,23,24,25,26,27,30]})
data['date'] = pd.to_datetime(data)
data['spot'] = [77.3438,78.192,78.1044,78.4357,78.0285,77.3507,76.78,77.13,77.0417,77.6525,78.0906,77.91,77.6602,77.3568,76.7243,76.5872,76.1374,76.4435,77.2906,79.2239,78.8993,79.5305,80.5313,79.3615,77.0156,77.4226,76.288,76.5648,77.1171,77.3568,77.374,76.1758,76.2325,76.0401,76.0529,76.1992,76.1648,75.474,75.551,75.7018,75.8639,76.3944]
data['strike'] = [77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313]
data = data.set_index('date')
I believe we can take first() for every year and month combination and later join that with the main data.

data2 = data.groupby(['year','month']).first().reset_index()
# join data2 with data on month and year later on

   year  month  day     spot
0  2020     10    1  77.3438
1  2020     11    2  80.5313

Based on the question, my understanding is that we need each month's first day and the corresponding 'spot' value. Correct me if I have misunderstood.
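The join that this answer leaves for later could look like the sketch below. It is an assumption about what the answerer had in mind, not their code: it uses a shortened copy of the question's data (three rows per month) and the illustrative names first_rows and merged.

```python
import pandas as pd

# Shortened copy of the question's data: three rows per month
data = pd.DataFrame({
    "year": [2020] * 6,
    "month": [10, 10, 10, 11, 11, 11],
    "day": [1, 2, 5, 2, 3, 5],
    "spot": [77.3438, 78.192, 78.1044, 80.5313, 79.3615, 77.0156],
})

# First spot value per (year, month), renamed so it can be joined back as the strike
first_rows = (
    data.groupby(["year", "month"], as_index=False)["spot"]
    .first()
    .rename(columns={"spot": "strike"})
)

# Left join keeps every original row and repeats the month's strike on each of them
merged = data.merge(first_rows, on=["year", "month"], how="left")
print(merged)
```

This produces the same strike column as a groupby transform, at the cost of building an intermediate frame.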
Strike = Spot value from the first day of each month

To do this, we need to do the following:

Step 1: Get the Year/Month value from the Date column. Alternatively, we can use the Year and Month columns you already have in the DataFrame.

Step 2: Group by Year and Month. That gives all the records for each Year+Month. From each group, we take the first record (which will be the earliest date of the month, provided the rows are in chronological order). The earliest date can be the 1st, 2nd or 3rd of the month, depending on the data in the column.

Step 3: By using transform in groupby, pandas sends back a result that matches the length of the DataFrame, so every record in a group receives the same value. In this example, we have only 2 months (Oct & Nov) but 42 rows. Transform will send us back 42 rows.

The code groupby(['year','month'])['date'].transform('first') will give the first day of the month.

Use this:

data['dy'] = data.groupby(['year','month'])['date'].transform('first')

or:

data['dx'] = data.date.dt.to_period('M') #to get yyyy-mm value

Step 4: Using transform, we can also get the Spot value. This can be assigned to Strike, giving us the desired result. Instead of getting the first day of the month, we change it to return the Spot value.
The code will be: groupby(['year','month'])['spot'].transform('first')

Use this:

data['strike'] = data.groupby(['year','month'])['spot'].transform('first')

or:

data['strike'] = data.groupby('dx')['spot'].transform('first')

Putting all this together

The full code to get the Strike Price using the Spot Price from the first day of the month:

import pandas as pd
import numpy as np

data = pd.DataFrame({'year': [2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020],
                     'month': [10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11],
                     'day': [1,2,5,6,7,8,9,12,13,14,15,16,19,20,21,22,23,26,27,28,29,30,2,3,5,6,9,10,11,12,13,16,17,18,19,20,23,24,25,26,27,30]})
data['date'] = pd.to_datetime(data)
data['spot'] = [77.3438,78.192,78.1044,78.4357,78.0285,77.3507,76.78,77.13,77.0417,77.6525,78.0906,77.91,77.6602,77.3568,76.7243,76.5872,76.1374,76.4435,77.2906,79.2239,78.8993,79.5305,80.5313,79.3615,77.0156,77.4226,76.288,76.5648,77.1171,77.3568,77.374,76.1758,76.2325,76.0401,76.0529,76.1992,76.1648,75.474,75.551,75.7018,75.8639,76.3944]

#Pick the first day of month Spot price as the Strike price
data['strike'] = data.groupby(['year','month'])['spot'].transform('first')

#Show every row with its month's strike
print (data)

The output of this will be:

    year  month  day       date     spot   strike
0   2020     10    1 2020-10-01  77.3438  77.3438
1   2020     10    2 2020-10-02  78.1920  77.3438
2   2020     10    5 2020-10-05  78.1044  77.3438
3   2020     10    6 2020-10-06  78.4357  77.3438
4   2020     10    7 2020-10-07  78.0285  77.3438
5   2020     10    8 2020-10-08  77.3507  77.3438
6   2020     10    9 2020-10-09  76.7800  77.3438
7   2020     10   12 2020-10-12  77.1300  77.3438
8   2020     10   13 2020-10-13  77.0417  77.3438
9   2020     10   14 2020-10-14  77.6525  77.3438
10  2020     10   15 2020-10-15  78.0906  77.3438
11  2020     10   16 2020-10-16  77.9100  77.3438
12  2020     10   19 2020-10-19  77.6602  77.3438
13  2020     10   20 2020-10-20  77.3568  77.3438
14  2020     10   21 2020-10-21  76.7243  77.3438
15  2020     10   22 2020-10-22  76.5872  77.3438
16  2020     10   23 2020-10-23  76.1374  77.3438
17  2020     10   26 2020-10-26  76.4435  77.3438
18  2020     10   27 2020-10-27  77.2906  77.3438
19  2020     10   28 2020-10-28  79.2239  77.3438
20  2020     10   29 2020-10-29  78.8993  77.3438
21  2020     10   30 2020-10-30  79.5305  77.3438
22  2020     11    2 2020-11-02  80.5313  80.5313
23  2020     11    3 2020-11-03  79.3615  80.5313
24  2020     11    5 2020-11-05  77.0156  80.5313
25  2020     11    6 2020-11-06  77.4226  80.5313
26  2020     11    9 2020-11-09  76.2880  80.5313
27  2020     11   10 2020-11-10  76.5648  80.5313
28  2020     11   11 2020-11-11  77.1171  80.5313
29  2020     11   12 2020-11-12  77.3568  80.5313
30  2020     11   13 2020-11-13  77.3740  80.5313
31  2020     11   16 2020-11-16  76.1758  80.5313
32  2020     11   17 2020-11-17  76.2325  80.5313
33  2020     11   18 2020-11-18  76.0401  80.5313
34  2020     11   19 2020-11-19  76.0529  80.5313
35  2020     11   20 2020-11-20  76.1992  80.5313
36  2020     11   23 2020-11-23  76.1648  80.5313
37  2020     11   24 2020-11-24  75.4740  80.5313
38  2020     11   25 2020-11-25  75.5510  80.5313
39  2020     11   26 2020-11-26  75.7018  80.5313
40  2020     11   27 2020-11-27  75.8639  80.5313
41  2020     11   30 2020-11-30  76.3944  80.5313

Previous Answer to get the first day of each month (within the column data)

One way to do it is to create a helper column that stores the year-month period of each row, then use drop_duplicates() and retain only the first row for each period.

Note: drop_duplicates(keep='first') keeps the first occurrence of every value, so a month with only one row is still included. The logic does assume the rows are in chronological order.

That will give you the first day of each month.
import pandas as pd
import numpy as np

data = pd.DataFrame({'year': [2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020],
                     'month': [10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11],
                     'day': [1,2,5,6,7,8,9,12,13,14,15,16,19,20,21,22,23,26,27,28,29,30,2,3,5,6,9,10,11,12,13,16,17,18,19,20,23,24,25,26,27,30]})
data['date'] = pd.to_datetime(data)
data['spot'] = [77.3438,78.192,78.1044,78.4357,78.0285,77.3507,76.78,77.13,77.0417,77.6525,78.0906,77.91,77.6602,77.3568,76.7243,76.5872,76.1374,76.4435,77.2906,79.2239,78.8993,79.5305,80.5313,79.3615,77.0156,77.4226,76.288,76.5648,77.1171,77.3568,77.374,76.1758,76.2325,76.0401,76.0529,76.1992,76.1648,75.474,75.551,75.7018,75.8639,76.3944]

#create a helper column holding the year-month period of each row
data['dx'] = data.date.dt.to_period('M')

#drop duplicates while retaining only the first row of each month
dx = data.drop_duplicates('dx',keep='first')

#This will give you the first row of each month
print (dx)

The output of this will be:

    year  month  day       date     spot       dx
0   2020     10    1 2020-10-01  77.3438  2020-10
22  2020     11    2 2020-11-02  80.5313  2020-11

Alternatively, you can group by the month and take the first record:

data.groupby(['dx']).first()

This will give you:

         year  month  day       date     spot
dx
2020-10  2020     10    1 2020-10-01  77.3438
2020-11  2020     11    2 2020-11-02  80.5313
I think this can be achieved without creating any other DataFrame:

data['strike'] = data.groupby(['year','month'])['spot'].transform('first')
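One caveat worth illustrating: transform('first') takes the first row of each group in the frame's current order, so it only yields the earliest date's spot if the rows are sorted by date. The sketch below uses a small, deliberately shuffled subset of the question's values; grouping on a monthly period instead of separate year and month columns is a choice made for this example.

```python
import pandas as pd

# Four rows from the question's data, deliberately out of chronological order
data = pd.DataFrame({
    "date": pd.to_datetime(["2020-11-03", "2020-10-02", "2020-11-02", "2020-10-01"]),
    "spot": [79.3615, 78.192, 80.5313, 77.3438],
})

# Sort first so that "first row of the group" means "earliest date in the month"
data = data.sort_values("date")

# Group on the year-month period and broadcast the first spot to every row
month = data["date"].dt.to_period("M")
data["strike"] = data.groupby(month)["spot"].transform("first")

print(data)
```

Without the sort_values call, the November rows in this shuffled sample would pick up 79.3615 (the value from 2020-11-03) instead of 80.5313.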
Summing a years worth of data that spans two years pandas
I have a DataFrame that contains data similar to this:

Name  Date        A   B   C
John  19/04/2018  10  11  8
John  20/04/2018  9   7   9
John  21/04/2018  22  15  22
…     …           …   …   …
John  16/04/2019  8   8   9
John  17/04/2019  10  11  18
John  18/04/2019  8   9   11
Rich  19/04/2018  18  7   6
…     …           …   …   …
Rich  18/04/2019  19  11  17

The data can start on any day and contains at least 365 days of data, sometimes more. What I want to end up with is a DataFrame like this:

Name  Date   Sum
John  April  356
John  May    276
John  June   209
Rich  April  452

I need to sum up all of the months to get a year's worth of data (April - March), but I need to be able to handle taking part of April's total (in this example) from 2018 and part from 2019.

What I would also like to do is shift the days so they are consecutive and follow on in sequence, so rather than:

John  16/04/2019  8   8   9   Tuesday
John  17/04/2019  10  11  18  Wednesday
John  18/04/2019  8   9   11  Thursday
John  19/04/2019  10  11  8   Thursday (was 19/04/2018)
John  20/04/2019  9   7   9   Friday (was 20/04/2018)

It becomes:

John  16/04/2019  8   8   9   Tuesday
John  17/04/2019  10  11  18  Wednesday
John  18/04/2019  8   9   11  Thursday
John  19/04/2019  9   7   9   Friday (was 20/04/2018)

This would happen prior to summing to get the final DataFrame. Is this possible?

Additional information requested in comments

Here is a link to the initial data set https://github.com/stottp/exampledata/blob/master/SOExample.csv and the required output would be:

Name  Month      Total
John  March      11634
John  April      11470
John  May        11757
John  June       10968
John  July       11682
John  August     11631
John  September  11085
John  October    11924
John  November   11593
John  December   11714
John  January    11320
John  February   10167
Rich  March      11594
Rich  April      12383
Rich  May        12506
Rich  June       11112
Rich  July       11636
Rich  August     11303
Rich  September  10667
Rich  October    10992
Rich  November   11721
Rich  December   11627
Rich  January    11669
Rich  February   10335
Let's see if I understood correctly. If you want to sum, I suppose you mean summing the values of columns ['A', 'B', 'C'] for each day and getting the total value monthly. If that's right, the first thing to do is set the ['Date'] column as the index so that the data frame is easier to work with:

df.set_index(df['Date'], inplace=True, drop=True)
del df['Date']

Next, you will want to add the new column ['Sum'] by re-sampling your data frame (from days to months) whilst summing the values of ['A', 'B', 'C']:

df['Sum'] = df['A'].resample('M').sum() + df['B'].resample('M').sum() + df['C'].resample('M').sum()
df['Sum'].head()

Out[37]:
Date
2012-11-30    1956265
2012-12-31    2972076
2013-01-31    2972565
2013-02-28    2696121
2013-03-31    2970687
Freq: M, dtype: int64

The last part, about squashing February of 2018 and 2019 together as if they were a single month, might be achieved with:

df['2019-02'].merge(df['2018-02'], how='outer', on=['Date', 'A', 'B', 'C'])

Test this last step and see if it works for you. Cheers
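An alternative way to get one total per person and month name, regardless of which year each day falls in, is to group on the month name directly instead of resampling. The sketch below is not the answerer's method. It reuses a few rows from the question plus one invented May row (marked in the code) so that more than one month appears, and the column names Month and Total follow the question's required output.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "John", "John", "John", "Rich", "Rich"],
    # The 01/05/2018 row is invented for illustration; the others are from the question
    "Date": ["19/04/2018", "20/04/2018", "17/04/2019", "01/05/2018",
             "19/04/2018", "18/04/2019"],
    "A": [10, 9, 10, 1, 18, 19],
    "B": [11, 7, 11, 2, 7, 11],
    "C": [8, 9, 18, 3, 6, 17],
})

# Dates in the question are day-first
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Month name ignores the year, so April 2018 and April 2019 land in the same group
df["Month"] = df["Date"].dt.month_name()
df["Total"] = df[["A", "B", "C"]].sum(axis=1)

# sort=False keeps groups in order of first appearance rather than alphabetical order
monthly = df.groupby(["Name", "Month"], sort=False)["Total"].sum().reset_index()
print(monthly)
```

This only covers the monthly totals. It assumes the data holds exactly one year per person, and it does not attempt the weekday-shifting step described in the question.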
PowerPivot Cohort Analysis
I'm trying to do cohort analysis using Excel's PowerPivot. I have a table recording which users have purchased which products in which months, e.g.

UserID  Product  Date    Quantity
1       Ham      Mar 15  2
1       Cheese   Jan 15  7
2       Ham      Mar 15  8
3       Fish     Mar 15  2
2       Cheese   Apr 15  8

I want to use a calculated field to filter for a cohort of users who purchased a given product in a given month, but still be able to analyse all their purchases. E.g. cohort Ham, March 15 --> Users 1, 2

UserID  Product  Date    Quantity
1       Ham      Mar 15  2
1       Cheese   Jan 15  7
2       Ham      Mar 15  8
2       Cheese   Apr 15  8

I know this could be done easily using SQL, but I am working with colleagues who prefer to use Excel over Access or some SQL interface. Thank you
Create a calculated column like this:

=if([UserID]&SlicerValue=[UserID]&[Product],[UserID])

where HAM would be selected from a slicer created from a table of unique products.
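To make the intended cohort logic concrete, here is the same filter written in pandas rather than PowerPivot. It is only a sketch of the logic, not a DAX translation, and the names purchases, cohort_users and cohort_purchases are made up for the example.

```python
import pandas as pd

# Purchase table from the question
purchases = pd.DataFrame({
    "UserID": [1, 1, 2, 3, 2],
    "Product": ["Ham", "Cheese", "Ham", "Fish", "Cheese"],
    "Date": ["Mar 15", "Jan 15", "Mar 15", "Mar 15", "Apr 15"],
    "Quantity": [2, 7, 8, 2, 8],
})

# Step 1: users who bought the chosen product in the chosen month
in_cohort = (purchases["Product"] == "Ham") & (purchases["Date"] == "Mar 15")
cohort_users = purchases.loc[in_cohort, "UserID"].unique()

# Step 2: every purchase made by those users, whatever the product or month
cohort_purchases = purchases[purchases["UserID"].isin(cohort_users)]
print(cohort_purchases)
```

The two-step shape (first find the users, then keep all of their rows) is the part any PowerPivot solution would also need to reproduce.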