PowerPivot Cohort Analysis - excel

I'm trying to do cohort analysis using Excel's PowerPivot. I have a table recording which users have purchased which products in which months eg.
UserID Product Date Quantity
1 Ham Mar 15 2
1 Cheese Jan 15 7
2 Ham Mar 15 8
3 Fish Mar 15 2
2 Cheese Apr 15 8
I want to use a calculated field to filter for a cohort of users who purchased a given product in a given month but be able to analyse all their purchases.
E.g. the cohort Ham, March 15
--> Users 1, 2
UserID Product Date Quantity
1 Ham Mar 15 2
1 Cheese Jan 15 7
2 Ham Mar 15 8
2 Cheese Apr 15 8
I know this could be done easily using SQL, but I am working with colleagues who prefer Excel over Access or some other SQL interface.
Thank you

Create a calculated column like this:
=if([UserID]&SlicerValue=[UserID]&[Product],[UserID])
where Ham would be selected from a slicer created from a table of unique products.
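The intent of the question (pick the cohort first, then keep every purchase made by those users) can be sketched outside Excel in Python. This only illustrates the logic and is not a PowerPivot formula; the function name cohort_rows is hypothetical and the rows are the sample data from the question.

```python
# Sample rows from the question: (UserID, Product, Date, Quantity)
purchases = [
    (1, "Ham", "Mar 15", 2),
    (1, "Cheese", "Jan 15", 7),
    (2, "Ham", "Mar 15", 8),
    (3, "Fish", "Mar 15", 2),
    (2, "Cheese", "Apr 15", 8),
]


def cohort_rows(rows, product, date):
    """Return every purchase made by users who bought `product` in `date`."""
    # Step 1: the cohort is the set of users with a matching purchase.
    cohort = {user for user, prod, when, _ in rows
              if prod == product and when == date}
    # Step 2: keep all purchases by those users, whatever the product or month.
    return [row for row in rows if row[0] in cohort]


ham_march = cohort_rows(purchases, "Ham", "Mar 15")
for row in ham_march:
    print(row)
```

For the Ham, Mar 15 cohort this keeps the four rows belonging to users 1 and 2 and drops user 3's Fish purchase.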

Related

Excel: Dynamic Range Date used in other fields: Sumproduct

I am using a SUMPRODUCT formula to get net sales for the first four months, then the second four months, then the third four months, and so on up to one month before today. This is the formula I use:
=IFERROR(SUMPRODUCT($B3:$Y3*(COLUMN($B3:$Y3)>=AGGREGATE(15,6,COLUMN($B3:$Y3)/($B3:$Y3<>0),1)+4*(COLUMNS(B3)-1))*(COLUMN($B3:$Y3)<AGGREGATE(15,6,COLUMN($B3:$Y3)/($B3:$Y3<>0),1)+4*(COLUMNS(B3)))*($B$1:$Y$1<EOMONTH(TODAY(),-1)+1)),0)
However, I need to capture the same range for other measures, such as COGS in my example. I cannot use the formula above for those measures because they are sometimes zero within the range where Net Sales is non-zero, and I need to capture those zeros as well.
Example 1
Example 2
Net Sales
Jan  Feb  Mar  Apr  May  June  July  Aug  Sept  Oct  Nov  Dec
0    0    2    3    4    5     2     3    2     3    2    4
---> 1st period = 14, 2nd period = 10
COGS (follows the same date range as Net Sales)
Jan  Feb  Mar  Apr  May  June  July  Aug  Sept  Oct  Nov  Dec
0    0    0    0    0    2     1     4    2     3    2    4
---> 1st period = 2, 2nd period = 11
You can keep the entire range-check logic from the first formula and change only the value range. The first formula in my sample:
=IFERROR(SUMPRODUCT($A3:$L3*(COLUMN($A3:$L3)>=AGGREGATE(15,6,COLUMN($A3:$L3)/($A3:$L3<>0),1)+4*(COLUMN(A3)-1))*(COLUMN($A3:$L3)<AGGREGATE(15,6,COLUMN($A3:$L3)/($A3:$L3<>0),1)+4*(COLUMN(A3)))*($A$2:$L$2<EOMONTH(TODAY(),-1)+1)),0)
The second formula, for COGS, sums $O3:$Z3 but still derives the period boundaries from the Net Sales range $A3:$L3:
=IFERROR(SUMPRODUCT($O3:$Z3*(COLUMN($A3:$L3)>=AGGREGATE(15,6,COLUMN($A3:$L3)/($A3:$L3<>0),1)+4*(COLUMN(A3)-1))*(COLUMN($A3:$L3)<AGGREGATE(15,6,COLUMN($A3:$L3)/($A3:$L3<>0),1)+4*(COLUMN(A3)))*($A$2:$L$2<EOMONTH(TODAY(),-1)+1)),0)
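The idea behind the two formulas (find the period boundaries from Net Sales, then sum a different row over those same boundaries) can be sketched in Python. This is a rough illustration under stated assumptions, not a translation of the Excel formula: the function name period_sums and its parameters are hypothetical, and last_col stands in for the EOMONTH(TODAY(),-1) cutoff.

```python
def period_sums(anchor_row, value_row, period_len=4, periods=2, last_col=None):
    """Sum value_row over consecutive windows that start at the first
    non-zero entry of anchor_row (the Net Sales row)."""
    # Position of the first non-zero anchor value, the role played by
    # AGGREGATE(15,6,...,1) in the formula.
    start = next((i for i, v in enumerate(anchor_row) if v != 0), None)
    if start is None:
        # No non-zero anchor value: mirror IFERROR(..., 0).
        return [0] * periods
    # Columns at or beyond last_col are ignored (hypothetical date cutoff).
    end = len(value_row) if last_col is None else last_col
    sums = []
    for k in range(periods):
        lo = start + k * period_len
        hi = min(lo + period_len, end)
        sums.append(sum(value_row[lo:hi]))  # zeros inside the window count
    return sums


# Sample rows from the question, Jan to Dec.
net_sales = [0, 0, 2, 3, 4, 5, 2, 3, 2, 3, 2, 4]
cogs = [0, 0, 0, 0, 0, 2, 1, 4, 2, 3, 2, 4]

net_periods = period_sums(net_sales, net_sales)
cogs_periods = period_sums(net_sales, cogs)
```

With the sample rows, net_periods is [14, 10], and the first COGS period is 2 because the window still starts in March even though COGS is zero there.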

Excel - Count based on criteria in 3 other columns

I'm looking for help in getting a count based on criteria in 3 other columns. I started to do a pivot table, but I cannot see how to add an IF statement to the distinct count there.
I need a count of each customer within the customer type, by each supplier, if the Cases > 0 for that year.
Here's a sample data set:
Supplier  Customer    Type     2019 Cases  2020 Cases
ABC       Al's Store  Package  3           2
ABC       Ben's       Package  0           6
ABC       Kroger      Grocery  2           1
ABC       Publix      Grocery  1           0
XYZ       Al's Store  Package  0           5
XYZ       Ben's       Package  4           0
XYZ       Kroger      Grocery  0           1
XYZ       Publix      Grocery  3           7
I need a result like this. My actual report will have each supplier on their own tab.
Supplier  Type     2019 Customer Count  2020 Customer Count  My Reason
ABC       Package  1                    2                    Al's bought in both years, but Ben's only in 2020
ABC       Grocery  2                    1                    Kroger bought in both years, but Publix only in 2019
XYZ       Package  1                    1                    Al only bought in 2020, Ben only bought in 2019
XYZ       Grocery  1                    2                    Kroger only bought in 2020
Thanks!
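The counting rule described above (a customer counts for a year only if its Cases for that year are greater than 0) can be sketched in Python to check the expected numbers. This is an illustration of the logic rather than an Excel or pivot-table solution; the function name customer_counts is hypothetical and the rows are the sample data from the question.

```python
from collections import defaultdict

# Sample rows: (Supplier, Customer, Type, 2019 Cases, 2020 Cases)
rows = [
    ("ABC", "Al's Store", "Package", 3, 2),
    ("ABC", "Ben's", "Package", 0, 6),
    ("ABC", "Kroger", "Grocery", 2, 1),
    ("ABC", "Publix", "Grocery", 1, 0),
    ("XYZ", "Al's Store", "Package", 0, 5),
    ("XYZ", "Ben's", "Package", 4, 0),
    ("XYZ", "Kroger", "Grocery", 0, 1),
    ("XYZ", "Publix", "Grocery", 3, 7),
]


def customer_counts(rows):
    """Count distinct customers with Cases > 0, per (Supplier, Type) and year."""
    buyers = defaultdict(lambda: {2019: set(), 2020: set()})
    for supplier, customer, ctype, cases_2019, cases_2020 in rows:
        group = buyers[(supplier, ctype)]  # create the group even if all zero
        if cases_2019 > 0:
            group[2019].add(customer)
        if cases_2020 > 0:
            group[2020].add(customer)
    return {key: (len(years[2019]), len(years[2020]))
            for key, years in buyers.items()}


counts = customer_counts(rows)
for (supplier, ctype), (n_2019, n_2020) in counts.items():
    print(supplier, ctype, n_2019, n_2020)
```

Using sets makes the count distinct, so a customer that appears on several rows for the same supplier and type is still counted once per year.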

Summing a years worth of data that spans two years pandas

I have a DataFrame that contains data similar to this:
Name Date A B C
John 19/04/2018 10 11 8
John 20/04/2018 9 7 9
John 21/04/2018 22 15 22
… … … … …
John 16/04/2019 8 8 9
John 17/04/2019 10 11 18
John 18/04/2019 8 9 11
Rich 19/04/2018 18 7 6
… … … … …
Rich 18/04/2019 19 11 17
The data can start on any day and contains at least 365 days of data, sometimes more. What I want to end up with is a DataFrame like this:
Name Date Sum
John April 356
John May 276
John June 209
Rich April 452
I need to sum up all of the months to get a year’s worth of data (April - March) but I need to be able to handle taking part of April’s total (in this example) from 2018 and part from 2019. What I would also like to do is shift the days so they are consecutive and follow on in sequence so rather than:
John 16/04/2019 8 8 9 Tuesday
John 17/04/2019 10 11 18 Wednesday
John 18/04/2019 8 9 11 Thursday
John 19/04/2019 10 11 8 Thursday (was 19/04/2018)
John 20/04/2019 9 7 9 Friday (was 20/04/2018)
It becomes
John 16/04/2019 8 8 9 Tuesday
John 17/04/2019 10 11 18 Wednesday
John 18/04/2019 8 9 11 Thursday
John 19/04/2019 9 7 9 Friday (was 20/04/2018)
This shift should happen before summing to get the final DataFrame. Is this possible?
Additional information requested in comments
Here is a link to the initial data set https://github.com/stottp/exampledata/blob/master/SOExample.csv and the required output would be:
Name Month Total
John March 11634
John April 11470
John May 11757
John June 10968
John July 11682
John August 11631
John September 11085
John October 11924
John November 11593
John December 11714
John January 11320
John February 10167
Rich March 11594
Rich April 12383
Rich May 12506
Rich June 11112
Rich July 11636
Rich August 11303
Rich September 10667
Rich October 10992
Rich November 11721
Rich December 11627
Rich January 11669
Rich February 10335
Let's see if I understood correctly. If you want to sum, I suppose you mean sum the values of columns ['A', 'B', 'C'] for each day and get the total value monthly.
If that's right, the first thing to do is parse the ['Date'] column and set it as the index so that the data frame is easier to work with:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)  # dates are dd/mm/yyyy
df = df.set_index('Date')
Next, you will want to compute the monthly totals by re-sampling your data frame (from days to months) whilst summing the values of ['A', 'B', 'C']. The result is a separate monthly series rather than a new column, because its index is month ends and not days:
monthly_sum = df[['A', 'B', 'C']].resample('M').sum().sum(axis=1)
monthly_sum.head()
Out[37]:
Date
2012-11-30 1956265
2012-12-31 2972076
2013-01-31 2972565
2013-02-28 2696121
2013-03-31 2970687
Freq: M, dtype: int64
The last part, squashing February of 2018 and 2019 together as if they were a single month, might be achieved with:
df.loc['2019-02'].merge(df.loc['2018-02'], how='outer', on=['Date', 'A', 'B', 'C'])
Test this last step and see if it works for you.
Cheers
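As a complement to the steps above, here is a minimal sketch that also groups by Name and pools the split month (April in the question's example) across the two calendar years. It assumes the columns Name, Date, A, B and C from the question; the function name monthly_totals is hypothetical, trimming each Name to its most recent 365 days is an assumption about what "a year's worth" means, and the sketch does not attempt the weekday shift.

```python
import pandas as pd


def monthly_totals(df):
    """Sum A + B + C per Name and calendar month over the last 365 days."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"], dayfirst=True)
    # Assumption: keep only the most recent 365 days for each Name.
    last = out.groupby("Name")["Date"].transform("max")
    out = out[out["Date"] > last - pd.Timedelta(days=365)].copy()
    out["Total"] = out[["A", "B", "C"]].sum(axis=1)
    # Grouping on the month name pools e.g. April 2018 and April 2019.
    out["Month"] = out["Date"].dt.month_name()
    return (out.groupby(["Name", "Month"], sort=False)["Total"]
               .sum()
               .reset_index())


# A few rows in the shape of the question's data (values partly made up).
sample = pd.DataFrame({
    "Name": ["John", "John", "John", "John", "Rich", "Rich"],
    "Date": ["19/04/2018", "20/04/2018", "01/05/2018",
             "18/04/2019", "19/04/2018", "18/04/2019"],
    "A": [10, 9, 1, 8, 18, 19],
    "B": [11, 7, 1, 9, 7, 11],
    "C": [8, 9, 1, 11, 6, 17],
})
result = monthly_totals(sample)
print(result)
```

Passing sort=False keeps the months in order of first appearance, so a year that starts in April is listed from April onwards instead of alphabetically.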

Dynamically Lookup Value with Between - Excel

I have a chronological list of Product, Year, Month, Profit (like below).
Summary Table
Product Year Month Profit
TV 2018 1 10
TV 2018 2 20
TV 2018 3 30
TV 2018 4 50
TV 2018 5 35
TV 2018 6 60
TV 2018 7 90
Heater 2018 1 20
Heater 2018 2 3
Heater 2018 3 8
Heater 2018 4 4
Heater 2018 5 6
Heater 2018 6 11
Heater 2018 7 1
What I want to do is look up another sheet that lists all of the price changes by month and year, as the table below shows.
Sale Price
Product Year Month Price
TV 2018 1 $1,000.00
TV 2018 4 $800.00
TV 2018 7 $950.00
Heater 2018 1 $20.00
Heater 2018 2 $60.00
Heater 2018 5 $45.00
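So, for example, for TV with Month = 2 and Year = 2018, I want it to pull in $1,000.00 to be part of my profit calculation.
To get the correct price, use the following (here the Sale Price columns Product, Year, Month and Price are in G:J, and the Summary Table row being priced has Product, Year and Month in A2:C2):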
=INDEX(J:J,AGGREGATE(14,6,ROW($I$2:$I$7)/(($G$2:$G$7=A2)*($H$2:$H$7=B2)*($I$2:$I$7<=C2)),1))
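The rule the formula applies (among rows for the same product and year, take the last one whose month is at or before the month being priced) can be sketched in Python. This is an illustration only; the function name price_for is hypothetical, and the rows are the Sale Price table from the question.

```python
# Sale Price rows in sheet order: (Product, Year, Month, Price)
price_changes = [
    ("TV", 2018, 1, 1000.00),
    ("TV", 2018, 4, 800.00),
    ("TV", 2018, 7, 950.00),
    ("Heater", 2018, 1, 20.00),
    ("Heater", 2018, 2, 60.00),
    ("Heater", 2018, 5, 45.00),
]


def price_for(product, year, month, changes=price_changes):
    """Return the price from the last listed change for `product` in `year`
    whose month is at or before `month`, or None if there is none."""
    matches = [price for prod, yr, mo, price in changes
               if prod == product and yr == year and mo <= month]
    # The formula picks the largest matching row number, i.e. the last match.
    return matches[-1] if matches else None


print(price_for("TV", 2018, 2))      # price in force for TV in February 2018
print(price_for("Heater", 2018, 4))  # price in force for Heater in April 2018
```

Like the formula, this relies on the price changes being listed chronologically within each product, and it only looks inside the same year.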

Merge columns from different sheets into specific order

I have an Excel file with three sheets of annual data. For example:
Sheet 1 is for year 2006
Site1 Site2 Site4
Jan 10 12 14
Feb 0 15 9
Sheet 2 is for year 2007
Site1 Site3 Site4
Jan 14 10 18
Feb 4 16 2
Sheet 3 is for year 2008
Site2 Site3 Site4 Site5
Jan 12 13 7 12
Feb 5 13 5 16
In Sheet 4, I want to combine these data under the specific Site_number (if the Site_number is unique, I want to add a column for that data). For example:
Sheet 4 should look like this:
           Site1  Site2  Site3  Site4  Site5
2006  Jan  10     12            14
      Feb  0      15            9
2007  Jan  14            10     18
      Feb  4             16     2
2008  Jan         12     13     7      12
      Feb         5      13     5      16
What would be a good way to go about this?
There are many ways of achieving your objective, and with the columns apparently already sorted I would be tempted merely to add blank columns until each sheet has each Site in the same column. Alternatively, a lookup formula something like:
=IFERROR(INDEX($C$10:$F$12,ROW(),IFERROR(MATCH(C$1,$C$10:$F$10,0),"")),"")
copied across and down to suit should work, provided Row1 has a complete list of unique Sites and, for the purposes of illustration, your original data is in the same sheet but moved down to start at Row10 and across one column (the latter to allow for manual addition of the year).
I'd suggest one sheet at a time and then merely copy and add/append into a new sheet as required.
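The same alignment (one column per unique Site, blanks where a year has no data for that Site) can be sketched in Python. This illustrates the target layout and is not an Excel solution; the function name combine and the nested-dictionary layout are hypothetical, and the values are the sample data from the question.

```python
# One entry per sheet: year -> month -> {site: value}
sheets = {
    2006: {"Jan": {"Site1": 10, "Site2": 12, "Site4": 14},
           "Feb": {"Site1": 0, "Site2": 15, "Site4": 9}},
    2007: {"Jan": {"Site1": 14, "Site3": 10, "Site4": 18},
           "Feb": {"Site1": 4, "Site3": 16, "Site4": 2}},
    2008: {"Jan": {"Site2": 12, "Site3": 13, "Site4": 7, "Site5": 12},
           "Feb": {"Site2": 5, "Site3": 13, "Site4": 5, "Site5": 16}},
}


def combine(sheets):
    """Build one table with a column for every Site seen in any sheet."""
    # Complete list of unique Sites, the role of Row1 in the formula approach.
    sites = sorted({site
                    for months in sheets.values()
                    for row in months.values()
                    for site in row})
    table = [["Year", "Month"] + sites]
    for year, months in sheets.items():
        for month, row in months.items():
            # Missing Sites become empty cells, like the outer IFERROR(..., "").
            table.append([year, month] + [row.get(site, "") for site in sites])
    return table


combined = combine(sheets)
for line in combined:
    print(line)
```

The plain sorted() call orders Site names as text, which is fine for Site1 to Site5 but would need a numeric sort key once names such as Site10 appear.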
