DAX help: % monthly share of another table - pivot

I have a DAX formula in Power Pivot that I cannot get to work and was hoping for help.
I have two pivot tables that are already connected:
Table 1: a cohort of actions taken within Month 1, ..., X after the sign-up month
Table 2: total sign-ups on a monthly basis
I tried to attach the sheet here but could not, so I have added a screenshot of the sheet instead.
What I have so far is:
=DIVIDE(
SUM(Range[conversion to KYC completed]),
SUM('Range 1'[Sum of signups]))
But this does not give me what I want as I think I’m missing the monthly grouping somehow.
Question 1:
What I want is the share of actions completed within 1, ..., X months out of the total sign-ups for that given sign-up month (e.g. Jan), where the totals come from Table 2.
Question 2:
Ideally I would also like to show the total sign-ups at the start of each cohort row to make the cohort easier to understand, that is, the monthly total sign-ups that the cohort is calculated from. At the moment I cannot get just the totals month by month. Is there any way to add a monthly total column to the pivot without these numbers being applied as a value across all columns?
Something like this is the ultimate outcome for me.
UPDATED WITH SAMPLE DATA
Signup month, KYC completed month, Age by month, signups, conversion to KYC completed
Jan-17 Jul-18 18 97 75
Jan-17 Jul-18 18 99 79
Jan-17 Dec-18 23 95 80
Feb-17 May-18 15 99 74
Feb-17 Jul-18 17 90 75
Feb-17 Jul-18 17 95 76
Feb-17 Aug-18 18 92 71
Mar-17 May-18 14 94 73
Apr-17 Jul-18 15 93 75
May-17 Sep-18 16 94 70
May-17 Oct-18 17 98 72
Jun-17 May-18 11 95 79
Jul-17 Oct-18 15 97 74
Jul-17 Jul-18 12 94 78
Aug-17 Sep-18 13 96 74
Sep-17 Nov-18 14 95 80
Sep-17 Oct-18 13 94 79
DESIRED OUTCOME
The % for Month 1, ..., X is calculated as KYC completed / monthly sign-ups.
OUTPUT WITH THIS CODE
=VAR SignUpMonth = IF(HASONEVALUE('Range 1'[Row Labels]), BLANK())
RETURN
DIVIDE(CALCULATE(SUM([conversion to KYC completed])),
CALCULATE(SUM('Range 1'[Sum of signups]),
FILTER(ALL(Range), Range[Signup month (Month Index)] = SignUpMonth)))

Thanks for the sample data Franzi. Still not too clear what you're asking for, but perhaps this will help a little.
Signed Up to Signed In Ratio =
VAR SignUpMonth = SELECTEDVALUE(Table1[Signup month], BLANK())
RETURN
DIVIDE(CALCULATE(SUM([conversion to KYC completed])),
CALCULATE(SUM(Table1[ signups]),
FILTER(ALL(Table1), Table1[Signup month] = SignUpMonth)))
So, let's break it down.
If I understand correctly, you want the number of sign-ins for each combination of a given month (x axis) and signup month (y axis), divided by the total signups for that signup month (y axis).
Number of sign-ins for a given month (x axis) and signup month (y axis) combination:
CALCULATE(SUM([conversion to KYC completed]))
Total signups (y axis) per signup month:
CALCULATE(SUM(Table1[ signups]),
FILTER(ALL(Table1), Table1[Signup month] = SignUpMonth))
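To sanity-check the measure outside Power Pivot, the same arithmetic can be sketched in pandas. This is only an illustration, assuming the Jan-17 and Feb-17 rows of the sample data. The column names signup_month, age_months, signups and kyc_completed are made up for the sketch, and signups is summed over every row of a signup month, which is my reading of what the FILTER(ALL(Table1), ...) denominator does.

```python
import pandas as pd

# Jan-17 and Feb-17 rows from the sample data (column names are hypothetical).
rows = [
    ("Jan-17", 18, 97, 75),
    ("Jan-17", 18, 99, 79),
    ("Jan-17", 23, 95, 80),
    ("Feb-17", 15, 99, 74),
    ("Feb-17", 17, 90, 75),
    ("Feb-17", 17, 95, 76),
    ("Feb-17", 18, 92, 71),
]
df = pd.DataFrame(rows, columns=["signup_month", "age_months", "signups", "kyc_completed"])

# Numerator: KYC completions per (signup month, age in months) cell.
kyc = df.pivot_table(index="signup_month", columns="age_months",
                     values="kyc_completed", aggfunc="sum")

# Denominator: total signups per signup month, ignoring the age columns.
total_signups = df.groupby("signup_month")["signups"].sum()

# Divide each cohort row by that month's total.
share = kyc.div(total_signups, axis=0)

# Question 2: show the monthly total once, as a leading column.
report = pd.concat([total_signups.rename("total_signups"), share], axis=1)
print(report.round(3))
```

Cells with no rows for a given month and age come out as NaN, which plays the role of the blank cells in the pivot.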

Related

Performing Calculations From A Pandas Data Frame with Multiple Conditions

Forgive the question as I'm a science major, not computer science and I'm teaching myself Python to help with a class project.
I have a Pandas data frame that I've imported from a .csv that looks like:
Item_ID Event_ID Value
27 83531 2533501.8
28 83531 1616262
31 83531 269829
32 83531 55.8
33 83531 269829
34 83531 4882
35 83531 269829
36 83531 4882
37 83531 55.8
38 83531 55.8
27 83532 7137904.8
28 83532 5873877.6
31 83532 497381
32 83532 55.7
33 83532 497381
34 83532 7568
35 83532 497381
36 83532 7568
37 83532 55.7
38 83532 55.7
This data is from a manual entry that is done multiple times daily where the Item_ID is type of measurement, Event_ID is the unique identifier for each "data entry event" by the user, and value is the value of the measurement.
I need to perform a number of calculations on each unique Event_Id.
Calc1 = ([28]/[27])*(([31]*[32])/[28])*(([33]-[34])/[33])
Calc2 = [36]/[35]
Calc3 = ([35]-[113])/[35]
Calc4 = [37]
Calc5 = [38]
Each number in the above formulas represents an Item_ID. I want to replace each Item_ID in the formula with the Value from the matching row for each Event_ID.
This project was started a month ago and will run for 6 more weeks. By then, there will be too many data points to perform the calculations by hand.
As these calculations cannot be performed across Event_IDs, the formulas for Event_ID 83531 would look like:
Calc1_Data = ([1616262]/[2533501.8])*(([269829]*[55.8])/[1616262])*(([269829]-[4882])/[269829])
Calc2_Data = [4882]/[269829]
Calc3_Data = ([269829]-[0])/[269829] ***0 would be placed here as Item_ID 113 does not exist for this Event_ID
Calc4_Data = [55.8]
Calc5_Data = [55.8]
The results would then be put into a new data frame that I could then perform my analysis on.
Event_ID Calc1_Result Calc2_Result Calc3_Result Calc4_Result Calc5_Result
83529
83530
83531 RESULTS HERE
83532 RESULTS HERE
83533
83534
This is my first go at asking a question here since I've been able to find all of my other answers in the library docs or previously asked questions. If I didn't provide enough information let me know and I'll clarify if possible.
Thanks
You can use groupby followed by agg methods to do that.
First, define your calculations as functions:
# Define calculations
def Calc1(x):
    return (x[28]/x[27])*((x[31]*x[32])/x[28])*((x[33]-x[34])/x[33])
def Calc2(x):
    return x[36]/x[35]
# Calc3 = lambda x: (x[35]-x[113])/x[35] # commenting out because there's no 113 in the provided example
def Calc4(x):
    return x[37]
def Calc5(x):
    return x[38]
Then, perform the calculations using the groupby and agg:
df = df.set_index('Item_ID') # set 'Item_ID' as the index so the functions can look up values by Item_ID with less code
df = df.groupby('Event_ID').agg([Calc1, Calc2, Calc4, Calc5]) # group by Event_ID and perform the specified calculations on each group
df.columns = df.columns.droplevel(0) # drop the outer 'Value' level so the columns are named after the functions
Output:
Calc1 Calc2 Calc4 Calc5
Event_ID
83531 5.835418 0.018093 55.8 55.8
83532 3.822212 0.015216 55.7 55.7
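Calc3 was left out above because Item_ID 113 is missing from the sample. One possible way to include it, sketched here with a different technique (pivoting to one column per Item_ID instead of groupby/agg), is to add the missing item as a column of zeros, which is what the question says should happen. The items list and the wide and results names are assumptions for illustration.

```python
import pandas as pd

# Sample rows from the question: (Item_ID, Event_ID, Value).
rows = [
    (27, 83531, 2533501.8), (28, 83531, 1616262), (31, 83531, 269829),
    (32, 83531, 55.8), (33, 83531, 269829), (34, 83531, 4882),
    (35, 83531, 269829), (36, 83531, 4882), (37, 83531, 55.8), (38, 83531, 55.8),
    (27, 83532, 7137904.8), (28, 83532, 5873877.6), (31, 83532, 497381),
    (32, 83532, 55.7), (33, 83532, 497381), (34, 83532, 7568),
    (35, 83532, 497381), (36, 83532, 7568), (37, 83532, 55.7), (38, 83532, 55.7),
]
df = pd.DataFrame(rows, columns=["Item_ID", "Event_ID", "Value"])

# One row per Event_ID, one column per Item_ID.
wide = df.pivot(index="Event_ID", columns="Item_ID", values="Value")

# Add any Item_ID that never occurs (here 113) as a column of zeros.
items = [27, 28, 31, 32, 33, 34, 35, 36, 37, 38, 113]
wide = wide.reindex(columns=items, fill_value=0)

results = pd.DataFrame({
    "Calc1": (wide[28] / wide[27]) * ((wide[31] * wide[32]) / wide[28])
             * ((wide[33] - wide[34]) / wide[33]),
    "Calc2": wide[36] / wide[35],
    "Calc3": (wide[35] - wide[113]) / wide[35],
    "Calc4": wide[37],
    "Calc5": wide[38],
})
print(results)
```

Note that fill_value only fills columns that reindex creates. If an Item_ID were present for some events but not others, those gaps would stay NaN unless wide.fillna(0) is also applied.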

Cohort in Excel with aggregated monthly data

I'm trying to build a cohort in an Excel pivot table from a dataset that has:
the aggregated number of monthly sign-ups (month by month), the aggregated number of users who completed the next step, and the number of months between sign-up and the next action taken.
What I can't figure out when I build the pivot for the cohort is what to put into the value field. Normally I would use the Customer IDs as the value, but since I only have data at an aggregated monthly level, I'm not sure whether to use the number of sign-ups or the number of next steps completed.
Also, how do I get the sum of each cohort so I can calculate the retention rate?
I hope this makes sense.
Signup month Action completed month Months between sign up and action completed signups conversion to Action completed
Jan-17 Sep-18 20 95 71
Jan-17 Jan-18 12 95 77
Jan-17 Jun-18 17 96 72
Jan-17 Jan-18 12 92 78
Jan-17 Dec-18 23 91 78
Jan-17 Jul-18 18 100 73
Jan-17 Oct-18 21 92 79
Jan-17 Feb-18 13 95 70
Jan-17 Jan-18 12 91 79
Jan-17 May-18 16 93 71
Jan-17 Jun-18 17 95 72
Is this what you are looking to achieve?
REVISION #1
This layout shows the total number of signups, by the month in which the signup occurred, distributed by the number of months between the signup and the action completed. The action completed month may be omitted and will still achieve the same result; it is there FYI only.
REVISION #2
This is an example of the average months between the signup and action. Is this what you are looking for?

Ranking Dates Based on Another Column - Spotfire

Does anyone know of a way to circumvent the Spotfire limitation on using the OVER function to RANK or order dates in a custom expression?
Providing a little background, I am trying to number each tenant's leases as 1, 2, 3, etc. based on the data below. For example, since 63 appears twice in the left column, I would like to return a 1 and a 2 to identify the two different leases, starting on 1/1/2016 and 8/1/2017. Then a 1 and a 2 for 72, a 1 for 140, and so on. Unfortunately, OVER functions can only be used with aggregation methods, and I don't know of another method to produce the result that I am looking for.
Tenant Lease_From Lease_To Tenant_status
63 1/1/2016 1/31/2017 Current
63 8/1/2017 7/31/2018 Current
72 10/1/2016 7/31/2017 Current
72 8/1/2017 7/31/2018 Current
140 2/1/2017 7/31/2018 Current
149 8/1/2016 7/31/2017 Current
149 8/1/2017 7/31/2018 Current
156 1/15/2017 3/31/2018 Current
156 4/1/2018 3/31/2019 Current
Use this:
Rank([Lease_From], [Tenant])
Gives this as the result:
Tenant Lease_From Lease_To Tenant_status Rank([Lease_From], [Tenant])
63 1/1/2016 1/31/2017 Current 1
63 8/1/2017 7/31/2018 Current 2
72 10/1/2016 7/31/2017 Current 1
72 8/1/2017 7/31/2018 Current 2
140 2/1/2017 7/31/2018 Current 1
149 8/1/2016 7/31/2017 Current 1
149 8/1/2017 7/31/2018 Current 2
156 1/15/2017 3/31/2018 Current 1
156 4/1/2018 3/31/2019 Current 2
Please consider @blakeoft's answer as the correct one!
That said, as an FYI, First() is considered an aggregation method, and OVER statements can be included inside of an If()! So you can accomplish the same thing with an expression like:
If([Lease_From] = First([Lease_From]) OVER ([Tenant]), 1, 2)
When you combine If() and OVER in this way, you can get some really cool and powerful visualizations, BUT you do lose the ability to mark data effectively. This is because the expression is evaluated from the context of the If() rather than the OVER; in other words, all rows are considered instead of only the ones selected.
You can get around this with some black magic (AKA data functions), but it's a bit contrived.
Again, in this situation, Rank() is absolutely the correct solution.
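For comparison outside Spotfire, the same per-tenant numbering can be sketched in pandas. This is an illustration only, not a Spotfire feature: it assumes the lease table has been exported to a DataFrame, and the Lease_Rank column name is made up. It numbers leases by sorting and counting within each tenant, so tied start dates would get consecutive numbers, which may differ from how Rank() treats ties.

```python
import pandas as pd

# Tenant and Lease_From columns from the table in the question.
df = pd.DataFrame({
    "Tenant": [63, 63, 72, 72, 140, 149, 149, 156, 156],
    "Lease_From": ["1/1/2016", "8/1/2017", "10/1/2016", "8/1/2017", "2/1/2017",
                   "8/1/2016", "8/1/2017", "1/15/2017", "4/1/2018"],
})
df["Lease_From"] = pd.to_datetime(df["Lease_From"], format="%m/%d/%Y")

# Sort by start date, then number the leases 1, 2, ... within each tenant.
# The assignment aligns on the original row index, so row order is preserved.
df["Lease_Rank"] = (
    df.sort_values("Lease_From").groupby("Tenant").cumcount() + 1
)
print(df)
```

For the nine sample rows this gives 1, 2, 1, 2, 1, 1, 2, 1, 2, matching the Rank([Lease_From], [Tenant]) column shown above.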

Dividing excel chart into sections

I want to create this type of chart in Excel:
With the vertical gridlines dividing the chart by year, and the labels for each year. The guy who made this chart said he thinks he just drew in the lines and added the labels manually somehow. But can this be done any other way? Drawing lines in charts isn't very exact, and the only other solutions I've found can't really produce the same result.
If you have data that looks something like:
Jan-14 4
Feb-14 30
Mar-14 56
Apr-14 23
May-14 3
Jun-14 62
Jul-14 74
Aug-14 12
Sep-14 3
Oct-14 15
Nov-14 63
Dec-14 74
Jan-15 45
Feb-15 3
Mar-15 4
Apr-15 56
May-15 23
Jun-15 3
Jul-15 62
Aug-15 74
Sep-15 12
Oct-15 3
Nov-15 15
Dec-15 63
Jan-16 74
You can select that data and add a new scatter-plot chart. By default it will look very similar to the one above. To get vertical lines at the years, right-click the x-axis and choose "Format Axis". Click "Fixed" for the "Major Unit" and enter 365 (the number of days in a year) as the number.
Right click again on the x-axis and choose "Add Major Gridlines". You should get a vertical line for each year.
As for the boxes/labels with the years, you may have to do that manually or get creative with VBA.

Excel date/product count to specified limit

Column A "Sales Dates", Column B "=A2-A1" for "Date Diff", Column C "Customer Name", Column D "Item", Column E "Items Ordered Count"
My issue is I have to do a running 30 day total for each customer to see that specific items are not being ordered above "x" number within any 30-day period.
Does anyone have any ideas?
I may not be fully understanding your question, but I don't think you can do what you ask in Excel. This might be a situation where a database that can do SQL might come in handy.
The best I can come up with in Excel is a Pivot Table, with the customers as rows, dates as columns (grouped by month), and the sum of Items Ordered in the data area. Then apply conditional formatting to the data area to highlight values greater than your limit.
Perhaps if you provide some sample data & output I can come up with something more like what you need.
The formula would look something like this:
{=SUM(IF((A$2:A2>=A2-29)*(D$2:D2=D2),E$2:E2,0))}
It should be entered into cell F2 and copied down to the last row of your data. I pasted in a test spreadsheet below so you can see where things go (sorry for the formatting--hopefully it will look better if you paste it into Excel).
IMPORTANT: This is an array formula, so after you type in the formula (without the braces {}), you must press Ctrl-Shift-Enter instead of just Enter.
What does the formula do? It does two loops:
First, it loops through all the Sales Dates from the beginning of the log to the current row and checks if each date is between the date of the current row and 29 days earlier (which makes a 30-day window). (By "current row" I mean the row where the formula is located.)
Second, it loops through all the Items from the beginning of the log to the current row and checks if there is a match with the Item of the current row.
For any row where both checks are true (the "*" in the formula does an "and" operation), Items Ordered Count is added to the sum, otherwise zero is added to the sum. So, when it's finished, you have a count for each row of how many orders there were in the past 30 days for that item.
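The two checks described above can be sketched in plain Python to make the logic concrete. This is only an illustration of what the array formula computes, not a replacement for it. The function name running_30_day_counts is made up, and the rows are the first two rows of the test spreadsheet plus the other 11336 rows through 2/2/2009.

```python
from datetime import date, timedelta

# (Sales Date, Item, Items Ordered Count) rows taken from the test spreadsheet.
orders = [
    (date(2009, 1, 1), 11336, 70),
    (date(2009, 1, 2), 10218, 121),
    (date(2009, 1, 9), 11336, 143),
    (date(2009, 1, 20), 11336, 128),
    (date(2009, 2, 2), 11336, 119),
]

def running_30_day_counts(rows):
    """For each row, sum the counts of rows up to and including it that
    have the same item and a date within the 30-day window ending on it."""
    totals = []
    for i, (day, item, _) in enumerate(rows):
        window_start = day - timedelta(days=29)  # same as A2-29 in the formula
        total = sum(
            count
            for d, other_item, count in rows[: i + 1]  # A$2:A2, i.e. log start to current row
            if d >= window_start and other_item == item  # both checks must hold
        )
        totals.append(total)
    return totals

print(running_30_day_counts(orders))  # [70, 121, 213, 341, 390]
```

The printed totals agree with the 30-Day Count column for those rows in the spreadsheet below.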
HTH,
-Dan
Sales Dates Date Diff Customer Name Item Items Ordered Count 30-Day Count
1/1/2009 0 dfsadf 11336 70 70
1/2/2009 1 asdfd 10218 121 121
1/3/2009 1 fsdfjkfl 10942 101 101
1/6/2009 3 slkdjflsk 13710 80 80
1/7/2009 1 slkdjls 10480 127 127
1/9/2009 2 sdjjf 11336 143 213
1/11/2009 2 woieuriwe 11501 84 84
1/14/2009 3 owqieyurtn 10191 78 78
1/15/2009 1 weisd 10480 113 240
1/16/2009 1 woieuriwe 12024 133 133
1/17/2009 1 vkcjl 13818 125 125
1/20/2009 3 sdflkj 11336 128 341
1/23/2009 3 jnbkdl 10480 141 381
1/25/2009 2 pqcvnlz 10480 137 518
1/27/2009 2 hwodkjgfh 12878 80 80
1/28/2009 1 zjdnfg;pwlkd 10942 123 224
1/31/2009 3 zlkdjnf;psod 13173 93 93
2/2/2009 2 zlknpdodfg 11336 119 390
2/4/2009 2 zjhdfpwskjh 12004 57 57
2/5/2009 1 asdfd 10218 121 121
2/8/2009 3 fsdfjkfl 10942 101 224
2/11/2009 3 slkdjflsk 13710 80 80
2/14/2009 3 slkdjls 10480 127 405
2/16/2009 2 sdjjf 11336 143 390
2/18/2009 2 woieuriwe 11501 84 84
2/21/2009 3 owqieyurtn 10191 78 78
2/24/2009 3 weisd 10480 113 240
2/25/2009 1 woieuriwe 12024 133 133
2/27/2009 2 vkcjl 13818 125 125
2/28/2009 1 sdflkj 11336 128 390
