Can I define different aggregation methods for subtotals in different dimensions in an Excel pivot table?
The following example shows a result I'm trying to obtain. The metric to aggregate is, let's say, lines of code of a software project. The 2 dimensions in question are Date and Organization. In source data, Organization is broken down into 2 columns, Department and Project, while Date is a single column and Excel makes up the Months/Years summaries automatically when making the ODBC data connection.
A metric such as this one should be aggregated differently along the different dimensions. For the Organization dimension, the subtotal for all projects of the department is the SUM, but in the date dimension, the subtotal for all months of the year is the MAX of any given month (or perhaps AVG, or last etc. but certainly not SUM).
I've tried to define the different aggregation methods in Excel in the field settings, but it always selects one or the other method for both dimensions. Is there a way to do it, preferably using standard Pivot Table mechanisms or at worst a UDF in Excel?
To tackle this problem, I would add both aggregation functions, Sum and Max, to the Values area and then hide (or shrink to a minimal width) the columns you do not want to display.
In the above example I shrank columns B, D, F and I because their values are out of scope for your requirements.
"Total Max of Loc" displays a value consistent with the function applied throughout the entire column, that is, the maximum number of lines of code reached by any single project in each department. This can lead to misunderstandings when reading the subtotals and the grand total: "Grand Total - Total Max of Loc" is not the "Total Max of Sum of Loc". In the example it shows 18, which is the absolute maximum value of Loc for a single project across the departments. In the same way, the Total Max of Loc for Department 2 is 18 and for Department 1 is 12.
If you need a different behavior, as requested in the comment to this answer, I think we are entering heavy-customization territory. A solution could be found by writing a custom macro and leveraging the GETPIVOTDATA function or, if that is acceptable in your case, simply by adding a new column with a MAX() formula and possibly hiding the "Total Max of Loc" column.
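To make the distinction concrete, here is a minimal Python sketch of the two calculations, done outside Excel. The records are invented for illustration (chosen so the per-project maxima match the 12 and 18 mentioned above), and the function names are hypothetical, not part of any Excel API.

```python
from collections import defaultdict

# Hypothetical records: (department, project, month, lines of code).
records = [
    ("Dept 1", "Project A", "2013-01", 10),
    ("Dept 1", "Project A", "2013-02", 12),
    ("Dept 1", "Project B", "2013-01", 5),
    ("Dept 1", "Project B", "2013-02", 4),
    ("Dept 2", "Project C", "2013-01", 18),
    ("Dept 2", "Project C", "2013-02", 15),
]

def yearly_subtotal(rows):
    """SUM across projects within each month, then MAX across months."""
    per_month = defaultdict(int)
    for _dept, _project, month, loc in rows:
        per_month[month] += loc
    return max(per_month.values())

def max_of_single_values(rows):
    """What a plain 'Max of Loc' field reports: the largest single row."""
    return max(loc for *_, loc in rows)

dept1 = [r for r in records if r[0] == "Dept 1"]
print(yearly_subtotal(dept1))       # max over months of the monthly sums
print(max_of_single_values(dept1))  # max over individual project values
```

With this data the two functions disagree for Department 1, which is exactly the mismatch between "Total Max of Loc" and the desired "Max of Sum of Loc".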
In PowerBI, I need to create a Performance Indicator (KPI) measure which evaluates dataset values in a scale from 0 to 1, with target (1) being the MAX value in a 20 years history. It's a national airport trip records open database. The formula is basically [value]/[max value].
My dataset has a lot of fields and I wish I could filter it by any of these fields, with a line chart showing the 0-1 indicator for each month based on the filters.
This is my workaround test solution:
Table 1 - Original dataset: if I filter something here, the tables below also update (there are more fields to the left, including YEAR and MONTH)
Table 2 - Reference to original dataset, aggregating YEAR-MONTH by the sum of "take-offs" (decolagens)
Table 3 - Reference to above (sum) table, aggregating MONTH by the max of "take-offs" (decolagens)
Table 4 - 'Sum table' merged to 'Max table' by MONTH as new table: then do [Value]/[Max] and we've got the indicator
So if I filter the original dataset by any field, all the other tables update accordingly and the indicator always stays between 0 and 1. It works like a charm.
TL;DR
The problem is: I need to build a dashboard of this in Power BI, so I need this calculation to be in a measure or some other workaround.
My possible solution: use pure DAX code in the measure to produce Tables 2 and 3, so that I can divide the month sum values by their month max value (both of which will respond to the Power BI dashboard slicers) and get the indicator produced dynamically.
I'm stuck at: I don't understand how to reference a sum/max aggregate table in DAX code. Something like = SUM (dataset[take-offs]) / MAX (SUM (dataset[take-offs])). Of course these functions do not work like that, but I hope I made my point clear: how can I produce this four-table effect with a single measure?
Other solutions are welcome.
Link to the original dataset: https://www.anac.gov.br/assuntos/dados-e-estatisticas/dados-estatisticos/arquivos/DadosEstatsticos.csv
It's an open dataset, so I guess there's no problem sharing it. Please help! :)
EDIT: please download the dataset and try to solve this. Personally I think it's a good statistics question that will eventually help others. The calculation works; it only needs to be ported to a Power BI measure.
Use the ALL function:
Measure = SUMX(ALL('Table'),[Valor])/SUM('Table'[Max])
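For reference, the four-table calculation described in the question can be sketched in plain Python. This is only an illustration of the intended arithmetic, not a DAX translation; the rows and the function name `monthly_indicator` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (year, month, take-offs); the real dataset has more fields.
rows = [
    (2018, 1, 100), (2018, 1, 50), (2018, 2, 80),
    (2019, 1, 120), (2019, 2, 200),
]

def monthly_indicator(rows):
    # "Table 2": sum of take-offs per (year, month).
    sums = defaultdict(int)
    for year, month, takeoffs in rows:
        sums[(year, month)] += takeoffs
    # "Table 3": max of those sums per calendar month, across all years.
    maxima = defaultdict(int)
    for (_year, month), total in sums.items():
        maxima[month] = max(maxima[month], total)
    # "Table 4": indicator = month sum / month max, so it lies between 0 and 1.
    return {key: total / maxima[key[1]] for key, total in sums.items()}

indicator = monthly_indicator(rows)
for key in sorted(indicator):
    print(key, indicator[key])
```

Filtering corresponds to passing a subset of `rows` into the function: both the sums and the maxima are recomputed from the filtered rows, which is what keeps the indicator within 0 and 1.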
Example
Overall goal for my report:
I am creating a pivot table in excel right now (eventually in Power Bi) that will update daily through data imports to reflect weekly changes in sales. I am then trying to perform a Z score analysis on each week to see if there are any outliers within the data.
What I will need to do is be able to subtract a mean of all of the data from each weekly set of sales, then divide it by the standard deviation.
Current thought process for data:
If I can get the grand total at the bottom, could I get that as a value entered for each row in another column? Can I do it as a total average and a total standard deviation? I can do it outside of a pivot table, but I want something in a pivot table so it auto-populates.
Current Data
Desired Data
You can tackle this with at least two approaches:
Dynamic calculation using measures
Back-end calculation
The first approach consists of defining measures in the following context:
CALCULATE([MEASURES], All('Calendar'), VALUES('Calendar'[Year]), VALUES('Calendar'[Month]))
This allows you to calculate a measure in a context that considers the entire month. Therefore, for each day you would have a measure that gives you the stdev of the entire month.
Pro: dynamic; fast to implement; can be based on measures already defined
Cons: more calculation in front-end slows down your report
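As a rough illustration of what "a context that considers the entire month" means, the following Python sketch attaches a month-level standard deviation to every day. The daily figures are made up, and the use of the population standard deviation (`pstdev`) is an assumption, since the text above does not say which variant is intended.

```python
from collections import defaultdict
from statistics import pstdev

# Hypothetical daily sales: (year, month, day, amount).
daily = [
    (2020, 1, 1, 10.0), (2020, 1, 2, 14.0),
    (2020, 2, 1, 5.0), (2020, 2, 2, 5.0),
]

# Group the amounts by (year, month), ignoring the day.
by_month = defaultdict(list)
for year, month, _day, amount in daily:
    by_month[(year, month)].append(amount)

# Population standard deviation per month (assumption: population, not sample).
month_stdev = {key: pstdev(values) for key, values in by_month.items()}

# Every day row carries the value computed over its whole month.
per_day = [(y, m, d, month_stdev[(y, m)]) for y, m, d, _amount in daily]
for row in per_day:
    print(row)
```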
The second approach consists of pre-calculating these values in the back-end. Here you have two options:
Data source: add these new columns in the data source (e.g. Database)
Pro: best-practices and clean approach
Cons: static; cannot use measures already defined
Calculated Column in DAX: define the value as a Calculated Column in the back-end of Power BI using the same structure defined for the Measure:
CALCULATE([MEASURES], All('Calendar'), VALUES('Calendar'[Year]), VALUES('Calendar'[Month]))
Pro: fast to implement
Cons: static; really against best-practices
In Power BI I used following measures (replace 'stack' with 'your table name')
Total StdDev = CALCULATE(STDEV.P(stack[sum of sales]), ALL(stack))
TotalMean = CALCULATE(AVERAGE(stack[sum of sales]),ALL(stack))
Z score = (SUM(stack[sum of sales]) - [TotalMean])/[Total StdDev]
I used AVERAGE to calculate the mean and I get a different result from yours (please see below).
If you can share the formula that you used to calculate 'TotalMean', maybe I can update it.
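The three measures above amount to the following arithmetic, shown here as a small Python sketch. The weekly figures are invented, and `pstdev` is used because the measure above uses STDEV.P (population standard deviation).

```python
from statistics import mean, pstdev

# Hypothetical weekly sales totals.
weekly_sales = {"Week 1": 100.0, "Week 2": 120.0, "Week 3": 80.0, "Week 4": 100.0}

values = list(weekly_sales.values())
total_mean = mean(values)     # mean over all weeks
total_stdev = pstdev(values)  # population standard deviation over all weeks

# Z score per week: (weekly value - overall mean) / overall standard deviation.
z_scores = {week: (value - total_mean) / total_stdev
            for week, value in weekly_sales.items()}

for week, z in z_scores.items():
    print(week, round(z, 3))
```

If the desired result uses the sample standard deviation instead, swapping `pstdev` for `statistics.stdev` (STDEV.S in DAX) would change every Z score by a constant factor, which may explain a mismatch in results.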
Is there a way to let a pivot table calculate the difference between 2 columns automatically when the values are shown as a % of the parent column total?
Right now I have to do it manually, but the table is dynamic and the number of competitors may vary. The function seems so simple, but I can't find it after googling etc...
See example picture below of what I want to achieve.
(Column F automated by the pivot table is the goal)
If trying to solve this with PivotTables, you've got a couple of options:
Use a 'Traditional' PivotTable that's based on a range. This will give you percentage differences, but you can't get percentage point differences like you're asking for without using external formulas.
Use an 'OLAP' PivotTable that's based on data you've added to the Excel Data Model. This will give you both percentage differences and percentage point differences, without having to resort to using external formulas.
In both cases, I recommend that you unpivot your data first, so that it is in what's known as a Flat File. Currently you're using a cross-tabulated data source (i.e. your source has columns called Year 1, Year 2), and the type of percentage comparisons across years you want to do doesn't work if your data is a crosstab. Basically, PivotTables aren't meant to consume cross-tabulated data.
Instead, you really want your data laid out so that you have a column called Amount and a column called Year, and then you can use the Show Values As options available from the right-click menu to show as percentage differences across years. To transform your data into a flat file, see my answers at convert cross table to list to make pivot table
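To illustrate what unpivoting a crosstab into a flat file means, here is a minimal Python sketch. The sample data, the column names `Competitor`, `Year` and `Amount`, and the helper `unpivot` are assumptions for illustration; in Excel itself you would do this with the approach linked above.

```python
# Hypothetical cross-tabulated source: one column per year.
crosstab = [
    {"Competitor": "A", "Year 1": 40, "Year 2": 50},
    {"Competitor": "B", "Year 1": 60, "Year 2": 50},
]

def unpivot(rows, id_column, value_columns):
    """Turn each (row, year column) pair into its own flat record."""
    flat = []
    for row in rows:
        for column in value_columns:
            flat.append({
                id_column: row[id_column],
                "Year": column,          # the old column header becomes a value
                "Amount": row[column],   # the old cell becomes the amount
            })
    return flat

flat_file = unpivot(crosstab, "Competitor", ["Year 1", "Year 2"])
for record in flat_file:
    print(record)
```

Each source row becomes one record per year, so a PivotTable can treat Year as an ordinary field.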
That said, you can still use the GETPIVOTDATA function on your existing (cross-tabulated) data layout in a way that is somewhat more robust to changes in your PivotTable structure than just subtracting one reference from the other:
But again, I recommend transforming your data into a Flat File. Then you can additionally do the following:
Using a 'Traditional' PivotTable:
You can kinda solve your problem entirely within a self contained 'Traditional' PivotTable if you drag the Amount column to the Values area, put the Year column in the Columns area, put your Competitors in the Rows area, and choose one of the percentage Show Values As options you'll see when you right-click a cell in the Values area.
I say kinda, because without using external formulas (or without calculating the percentages back in your source data), you can only get it to show percent increases (see far right column), not percentage point increase like you want (see far left column). That said, I think percent increase is less confusing. But I guess it depends on what you want to show. If you want to show say change in market share from one year to the next, then percentage points make sense.
Of course, you could always use the GETPIVOTDATA function to do the additional math for you like we did earlier, like I've done on that left hand side.
Using an OLAP PivotTable based on the DataModel
Calculating percentage point increases likely requires using PivotTables built using the Data Model. See my answer at https://stackoverflow.com/a/49973465/2507160 that explains a little about the Data Model (although it doesn't answer this specific question).
Here's the result:
Here are the measures I used to do this:
Total Year 1:
=CALCULATE(SUM(Table2[Value]),ALLSELECTED(Table2[Competitor]),Table2[Year] = "Year 1")
Total Year 2:
=CALCULATE(SUM(Table2[Value]),ALLSELECTED(Table2[Competitor]),Table2[Year] = "Year 2")
% Year 1:
=CALCULATE(SUM(Table2[Value]),Table2[Year] = "Year 1")/[Total Year 1]
% Year 2:
=CALCULATE(SUM(Table2[Value]),Table2[Year] = "Year 2")/[Total Year 2]
p.p. Diff:
= [% Year 2] -[% Year 1]
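The arithmetic behind these measures can be checked with a short Python sketch. The data is hypothetical and the helper `share` is an illustrative name, not a DAX function; it computes each competitor's share of the year total and then the difference in percentage points.

```python
# Hypothetical flat-file rows: (competitor, year, value).
flat = [
    ("A", "Year 1", 40), ("B", "Year 1", 60),
    ("A", "Year 2", 50), ("B", "Year 2", 50),
]

def share(rows, competitor, year):
    """Competitor's value divided by the total of all competitors that year."""
    total = sum(value for _c, y, value in rows if y == year)
    own = sum(value for c, y, value in rows if y == year and c == competitor)
    return own / total

# Percentage point difference: share in Year 2 minus share in Year 1.
pp_diff = {c: share(flat, c, "Year 2") - share(flat, c, "Year 1")
           for c in ("A", "B")}

for competitor, diff in pp_diff.items():
    print(competitor, f"{diff * 100:+.1f} p.p.")
```

Competitor A goes from a 40% share to a 50% share, which is a change of 10 percentage points but a 25% relative increase. That is the distinction between the two kinds of difference discussed above.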
You can add Calculated Fields to Pivot Tables, of varying levels of complexity. Finding the difference between two fields is about as simple as it gets.
The example below is borrowed from contextures.com, where there are many more examples and further information.
To add a calculated field:
Select a cell in the pivot table, and on the Excel Ribbon, under the PivotTable Tools tab, click the Options tab (Analyze tab in Excel 2013).
In the Calculations group, click Fields, Items, & Sets, and then click Calculated Field.
Type a name for the calculated field, for example, RepBonus.
In the Formula box, type =Total * 3%
Click Add to save the calculated field, and click Close.
The RepBonus field appears in the Values area of the pivot table, and in the field list in the PivotTable Field List.
(Source)
EDIT:
@jeffreyweir - I'm not gonna lie, I don't know off the top of my head how to make this work (and don't have time to experiment), but by the looks of these options, isn't a calculated field with a "straight subtraction" of existing fields (i.e., 3%-2%=1%) very possible with Difference from? (As opposed to % Difference from, which is also an option but gives a different result.)
In fact, automatic year-over-year difference reporting should be readily possible with the <previous> and <next> comparison operators...?
Also, did you see the link where I got the example? Kind of a hokey site, but it has some more complex pivot table instructions.
I'm trying to get the average number of "on time shipment" based on items rolled up to "ship numbers" and then by "order number". I have one order number in this scenario that is shipped via multiple shipments. It seems to me that after rolling it up via PowerPivot and then creating a pivot table, it's calculating the average based on the total lines of the "order number" instead of the pivot.
PowerPivot Data:
Pivot based on data above:
How can I get the average number based on the pivot table rather than the PowerPivot total data of the order number? I'm probably not making any sense, but hopefully the images below explain it better. As you can see, when you roll up the items by ship number then by order number, you'll see that the actual average is 0.6 but the pivot is showing 0.5.
Help!
Technically speaking, the average is correct - if you look at the source data, for some reason all rows are duplicated and if you do regular average calculation, it's actually 0.5.
What you are looking for is calculating average for distinct values, which can be done easily with AVERAGEX function.
I have copied your table and created those 2 Calculated Fields (in Excel 2010, it's Measures):
Average on Time:
=AVERAGE(Table1[On Time])
Average on Time (UNIQUE)
=AVERAGEX(VALUES(Table1[Ship Number]), [Average on Time])
Using AverageX with VALUES() function makes it easier to calculate any expression ONLY for unique values.
If you then put both measures on PivotTable, you should get this:
First column is same as yours (using "regular" AVERAGE function). The second one shows the average calculated over distinct (unique) values of Ship Numbers.
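The difference between the two measures can be reproduced with a small Python sketch. The rows below are hypothetical, chosen so that the plain average is 0.5 and the per-ship-number average is 0.6, as in the question.

```python
from collections import defaultdict

# Hypothetical rows: (ship number, on time flag). Ship S4 has two item rows.
rows = [("S1", 1), ("S2", 1), ("S3", 1), ("S4", 0), ("S4", 0), ("S5", 0)]

# Plain AVERAGE: every row counts once.
plain_average = sum(flag for _ship, flag in rows) / len(rows)

# AVERAGEX over VALUES(Ship Number): first average per ship number,
# then average those per-ship results so each ship counts once.
per_ship = defaultdict(list)
for ship, flag in rows:
    per_ship[ship].append(flag)
ship_averages = [sum(flags) / len(flags) for flags in per_ship.values()]
unique_average = sum(ship_averages) / len(ship_averages)

print(plain_average, unique_average)
```

The two results differ because ship S4 contributes two rows to the plain average but only one value to the per-ship average.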
Hope this helps.
PS: This great article by Kasper de Jonge helped me quite a bit with similar scenarios.
I have created a Power Pivot table as shown in the picture. I want to calculate the quarter-over-quarter change in sales share. To do that, I have to divide, for example, the 2012Q4 sales of corporate family "Acer" by the sum of sales for all corporate families. I am using a calculated measure to do this, but I am not sure what formula to use.
I need to create two columns, one for the 2012Q4 percent of total and one for the 2013Q1 percent of total. Then I will create another measure to find the difference. So the formula for 2012Q4 should look like this: 1624442 / (1624442+22449+1200+16123). Any idea which function can help me do it?
It sounds like you are measuring the change in the percent of total for each corporate family from quarter to quarter. You will need to create 3 calculated measures. I'm not sure what your model looks like so I can't give you the exact formula, but here is the idea.
CurrentQtr%ofTotal:= DIVIDE(SUM('Sales'[Units]), CALCULATE(SUM('Sales'[Units]), ALL('Product'[Corporate Family])))
PrevQtr%ofTotal:= DIVIDE(CALCULATE(SUM('Sales'[Units]), DATEADD(DimDate[DateKey], -1, QUARTER)),
CALCULATE(SUM('Sales'[Units]), DATEADD(DimDate[DateKey], -1, QUARTER), ALL('Product'[Corporate Family])))
Change%ofTotal:= DIVIDE(([CurrentQtr%ofTotal]-[PrevQtr%ofTotal]),[PrevQtr%ofTotal])
I used the DIVIDE function because it handles divide-by-zero errors. The ALL function removes the filter on the Corporate Family column from the filter context. The Change%ofTotal measure just finds the difference. I'm calculating the % change, but you may just want to subtract.
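As a sanity check on the arithmetic, here is a Python sketch of the three measures. The 2012Q4 figures are the ones quoted in the question; the family names other than Acer and all of the 2013Q1 figures are hypothetical. `safe_divide` is an illustrative stand-in for DIVIDE, returning `None` where DAX would return BLANK.

```python
# 2012Q4 units from the question; names other than "Acer" are hypothetical.
q4_2012 = {"Acer": 1624442, "Family B": 22449, "Family C": 1200, "Family D": 16123}
# Hypothetical 2013Q1 units, for illustration only.
q1_2013 = {"Acer": 1500000, "Family B": 30000, "Family C": 1500, "Family D": 18500}

def safe_divide(numerator, denominator):
    """Return None instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else None

def pct_of_total(quarter, family):
    """Family's units divided by the units of all families in that quarter."""
    return safe_divide(quarter[family], sum(quarter.values()))

prev_share = pct_of_total(q4_2012, "Acer")  # PrevQtr%ofTotal
curr_share = pct_of_total(q1_2013, "Acer")  # CurrentQtr%ofTotal
point_change = curr_share - prev_share      # plain subtraction
relative_change = safe_divide(point_change, prev_share)  # Change%ofTotal

print(prev_share, curr_share, point_change, relative_change)
```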
Here's the link to a good blog post on time intelligence. And here's one on calculating percent of total.
For percentages, please follow the tutorial on Tech on the Net.
Adding another column that calculates the difference between two pivot columns will not work: such a column is "unpivotable", as it relies on a column definition. You would need to copy and paste the pivot as values to another worksheet and do the extra calculation there.