Why is my SUMX DAX function returning this result? - excel

Suppose I have 2 tables:
fTransactions
ProdID RepID Revenue
1 1 10
1 1 10
1 2 10
dSalesReps
RepID RepName
1 joe
2 sue
With dSalesReps having the following measures with no filters applied yet:
RepSales:=CALCULATE(SUM(fTransactions[Revenue]))
RepSales2:=SUMX(fTransactions, CALCULATE(SUM(fTransactions[Revenue])))
The first measure performs how I think it would. It goes to the fTransactions table and sums up the Revenue column.
The second measure, after a lot of trial and error to figure it out, seems to sort of group itself on unique rows in fTransactions. In the above example, fTransactions has 2 rows where everything is identical, then a last row where something is different. This seems to result in the following:
(10 + 10) first iteration that sums the first "grouping"
+
(10 + 10) second iteration that sums the first "grouping" again
+
(10) last iteration that sums the second "grouping"
= 20 + 20 + 10 = 50
At least that's how it looks to be operating. I just don't understand why. I thought it would go to the fTransactions table, sum all of Revenue for each iteration, then sum those sums as a final step.

This is caused by something called "context transition" (see the sqlbi article for a more detailed explanation).
In practice, your formula "RepSales2" uses a row context (created by SUMX) that CALCULATE turns into an equivalent filter context. Since the table has no unique key, that filter matches multiple rows in some iterations, as explained below.
For the first row, the row context is ProdID=1, RepID=1, Revenue=10. CALCULATE turns it into an equivalent filter context on the same column values, but a filter context applies to the whole table, and two rows (the first two) match this filter.
This is repeated for each row.
It does not happen with the formula "RepSales" because that formula does not iterate over the table (as you already noticed).
This is your current situation:
To prove that, just add a rowID to the transaction table:
The duplication no longer happens because the equivalent filter context now also includes the RowID column, which matches only one row.
Hope this helps. Use the sqlbi article as a reference; it is an exhaustive guide to this topic.
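To make the iteration concrete, here is a rough Python simulation of the behaviour described above. It is only a sketch of context transition, not how the DAX engine is implemented; the function names are made up for illustration, and rows are plain tuples of (ProdID, RepID, Revenue).

```python
# Rows of fTransactions as (ProdID, RepID, Revenue); there is no unique key.
f_transactions = [
    (1, 1, 10),
    (1, 1, 10),
    (1, 2, 10),
]

def calculate_sum_revenue(table, current_row):
    """Mimic CALCULATE(SUM(Revenue)) after context transition:
    sum Revenue over every row whose columns all equal current_row."""
    return sum(row[-1] for row in table if row == current_row)

def sumx_with_transition(table):
    """Mimic SUMX(table, CALCULATE(SUM(Revenue)))."""
    return sum(calculate_sum_revenue(table, row) for row in table)

rep_sales2 = sumx_with_transition(f_transactions)

# Prepending a unique RowID makes each row's filter match exactly one row.
with_row_id = [(i,) + row for i, row in enumerate(f_transactions, start=1)]
rep_sales2_keyed = sumx_with_transition(with_row_id)

print(rep_sales2)        # 50 = 20 + 20 + 10
print(rep_sales2_keyed)  # 30 = 10 + 10 + 10
```

The two identical rows each "see" both duplicates, which is where the extra 20 comes from.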

Related

How to return count of number in sequence after validating three values from 3 different columns even though there will be matched data found?

I have a range of cells in excel. How to increment numbers when meeting data validation from three different columns?
I tried using formula COUNTIF($A$2:A2,A2) which creates a number sequence. But I have other data to validate from another column for it to return the correct number sequence.
First validation: count the emp no in column range A1:A5 which return a result under Hierarchy column.
Second validation: check the % value under column L as per below level of hierarchy in which the problem comes from.
1 - 0.25
2 - 0.25
3 - 0.5
4 - 0.5
5 - 1
Third validation: check the type of Relation (see Relation column) that needs to check when returning number of sequence too. Below is the Relation Level table.
I don't know how to join these three conditions for the result to be as below.
My real problem is how to get a sequence number when a person has 3 children, who should be tagged 2, 3, 4 (after the spouse, who is 1). The next relation, parent, should then be tagged with the next number after the last child, which is 5. Per the relation table the Parent level is 3, but the number is adjusted according to how many relations the person has. In this specific instance, even though the parent's sequence number is 5, it should still have 0.5 EE % (see relation table level vs. % hierarchy level). I hope this makes sense, but let me know if you have any questions.
I hope someone can help me with this because I am not an expert when it comes to Excel formulas. Thank you!

Excel Array formula to count moving average outliers

I've tried a few things on this and settled on a 'cheap' solution. Wanted to know if this can be done directly and more elegantly.
Problem Statement and Sample Data
Assume we have a table in excel with ~200 columns and a large number of rows (~10k).
Sample Data:
identifier val1 val2 val3 ... val200
ID_1 100 102 34 ... 89
We want to add a column at the end that shows how many "moving average" outliers exist in each row. A moving average outlier is defined as a point that is outside the range (mean - 2 * std deviations, mean + 2 * std deviations), where the mean and std dev are calculated using the previous 10 values (therefore it is a moving average outlier).
We will not test the first 10 values. But from val11, the previous 10 values will be used to form the window and we want to test if the value is an outlier.
My Solution so far
I created another table of same dimensions as the original. In cells from val11 (to val200, for all columns), I put in the formula below in the new table. And then, I can simply sum the columns in each row in the new table.
Assume val11 is on X2 in the "shocks" worksheet (for first row):
=IF(OR(shocks!X2<AVERAGEA(shocks!D2:W2)-2*STDEVA(shocks!D2:W2),shocks!X2>AVERAGEA(shocks!D2:W2)+2*STDEVA(shocks!D2:W2)),1,"")
But if possible, I want to avoid having a second table since it bloats and slows down the file. Any help would be greatly appreciated.
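For reference, the outlier rule described above can be written out in a few lines of Python. This is only a sketch of the definition, not an Excel solution. It assumes all values are numeric and uses the sample standard deviation, which is what STDEVA computes for numeric input; the function name and the sample row are made up for illustration.

```python
from statistics import mean, stdev

def count_moving_outliers(values, window=10, k=2.0):
    """Count points lying outside mean +/- k * stdev of the
    `window` values immediately before them."""
    count = 0
    for i in range(window, len(values)):
        previous = values[i - window:i]
        m = mean(previous)
        s = stdev(previous)  # sample standard deviation
        if values[i] < m - k * s or values[i] > m + k * s:
            count += 1
    return count

# Hypothetical row: ten calm values, one spike, then a normal value.
row = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 150, 100]
print(count_moving_outliers(row))  # 1 (only the 150 is flagged)
```

The first `window` values are never tested, matching the statement that testing starts at val11.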

Why does the DAX formula in my calculated column use propagation to filter in one instance and not in another?

Suppose I have a couple of tables:
fTransactions
Index ProdID RepID Revenue
1 1 1 10
2 1 1 20
3 2 2 30
4 2 2 10
dSalesReps
RepID RepName CC1 CC2
1 joe 40 70
2 sue 30 70
3 bob 70
CC1 contains a calculated column with:
CALCULATE(SUM(fTransactions[Revenue]))
It's my understanding that it's taking the row context and changing to filter context to filter the fTransaction table down to the RepID and summing. Makes sense per an sqlbi article on the subject:
"because the filter context containing the current product is automatically propagated to sales due to the relationship between the two tables"
CC2 contains a calculated column with:
SUMX(fTransactions, CALCULATE(SUM(fTransactions[Revenue])))
However, this one puts the same value in all the rows and doesn't seem to propagate the RepID like the other example. The same sqlbi article mentions that a filter is made on the entire fTransactions row. My question is why does it do that here and not in the other example, and what happened to the propagation of RepID?
"CALCULATE places a filter on all the columns of the table to identify a single row, not on its row number"
A calculated column is created in a loop: power pivot goes row by row and calculates the results. CALCULATE converts each row into a filter context (context transition).
In the second formula, however, you have 2 loops, not one:
First, it loops dSalesReps table (because that's where you are creating the column);
Second, it loops fTransactions table, because you are using SUMX function, which is an iterator.
CALCULATE function is used only in the second loop, forcing context transition for each row in fTransactions table. But there is no CALCULATE that can force context transition for the rows in the dSalesReps. Hence, there is no filtering by Sale Reps.
Fixing the problem is easy: just wrap the second formula in CALCULATE. Better yet, also drop the inner CALCULATE - it's not necessary and makes the formula slow:
CC2 =
CALCULATE(
    SUMX(fTransactions, fTransactions[Revenue])
)
This formula is essentially identical to the first one: behind the scenes the first formula translates to the second, because SUM is just syntactic sugar for SUMX.
You could also write the formula as:
CC2 = SUMX( RELATEDTABLE( fTransactions ), fTransactions[Revenue] )
or
CC2 = SUMX( CALCULATETABLE( fTransactions ), fTransactions[Revenue] )
The key is that fTransactions, as the first argument of SUMX, needs to be filtered for each SalesRep (i.e. for the current row). Without the filter, you are just iterating the entire fTransactions table for each SalesRep. Somehow SUMX needs to know that you only want the fTransactions rows for the SalesRep whose revenue you are trying to compute.
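The "two loops" point can be illustrated with a small Python simulation. This is only a sketch of the evaluation order, not of the engine itself. The function names are hypothetical, and a rep without transactions gets 0 here, where DAX would show a blank.

```python
f_transactions = [
    {"Index": 1, "ProdID": 1, "RepID": 1, "Revenue": 10},
    {"Index": 2, "ProdID": 1, "RepID": 1, "Revenue": 20},
    {"Index": 3, "ProdID": 2, "RepID": 2, "Revenue": 30},
    {"Index": 4, "ProdID": 2, "RepID": 2, "Revenue": 10},
]
rep_ids = [1, 2, 3]  # RepID column of dSalesReps

def sumx_unfiltered(rep_id):
    """SUMX(fTransactions, CALCULATE(SUM(Revenue))) in a calculated column:
    rep_id (the outer row context) never becomes a filter, so the whole
    transactions table is iterated for every rep."""
    total = 0
    for row in f_transactions:
        # Inner CALCULATE: filter on all columns of the current row.
        total += sum(r["Revenue"] for r in f_transactions if r == row)
    return total

def sumx_related(rep_id):
    """SUMX(RELATEDTABLE(fTransactions), Revenue): the outer row is turned
    into a filter first, so only this rep's transactions are iterated."""
    related = [r for r in f_transactions if r["RepID"] == rep_id]
    return sum(r["Revenue"] for r in related)

for rep_id in rep_ids:
    print(rep_id, sumx_unfiltered(rep_id), sumx_related(rep_id))
```

The first function returns 70 for every rep, which is the "same value in every row" symptom from the question; the second varies with the rep because the table is filtered before it is iterated.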

Is there a non-VBA way to calculate the average of the sum of two sets of columns?

I'm creating an excel spreadsheet to track when an item is received as well as when a response to the item having been received has been made (ie: my mail was delivered at 1:00pm (item received) but I didn't check the mail until 5:00pm (response to item having been received)).
I need to track both the date and time of the item being received and want to separate these in two separate columns. At the moment this translates to:
Column A: Date item received
Column B: Time item received
Column L: Date item was responded to having been received
Column M: Time item was responded to having been received
In essence I'm looking to run calculations on the response time between when the item is received and when it has been responded to (ie: average response time, number of responses in less than an hour, and even things like the number of responses that took between 2 and 3 hours where Bob was the person who responded).
The per-line pseudo code would look something like:
(Lr + Mr) - (Ar + Br) ' where L,M,A,B are the columns and 'r' is the row number.
An example, with the following data:
1. A B L M
2. 1/5/19 10:00 1/5/19 12:00
3. 1/5/19 21:00 1/6/19 1:00
4. 1/5/19 22:00 1/5/19 23:00
5. 1/6/19 3:00 1/6/19 4:00
The outcome for the average response time would be 2 hours (average(rows 2-5) = average(2, 4, 1, 1) = 2)
The number of items in each response-time range would be as follows:
(<=1 hour) = 2
(>1 & <=2) = 1
(>2 & <=3) = 0
(>3) = 1
I don't know of (and can't find) a function that will perform this and then let me use it within something like a countifs() or averageifs() function.
While I could do this (fairly easily) in VBA, the practical implementation of this spreadsheet limits me to standard Excel. I suspect that sumproduct() will be fundamental to making this work, but I feel that I need something like a sumsum() function (which doesn't exist), and I'm not familiar enough with sumproduct() to know what to even look for to set something like this up.
If you are not so familiar with SUMPRODUCT() or the likes I would suggest one helper column. Like so:
You can see the formula used is:
=((C2+D2)-(A2+B2))
You can probably do all types of calculations on this helper column. Note that the column is formatted as hh:mm. However, if you want to look into SUMPRODUCT(), you could think about these:
Formula in H2:
=SUMPRODUCT(--(ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)<=1))
Formula in H3:
=SUMPRODUCT((ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)>1)*(ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)<=2))
Formula in H4:
=SUMPRODUCT((ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)>2)*(ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)<=3))
Formula in H5:
=SUMPRODUCT(--(ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)>3))
The helper column is the easiest approach. It gives you the time differences that you can then easily analyse however you want. Analysis without the helper column is possible, but the approach differs depending on what type of analysis you want to do.
For the example you provided, which is counting the number of time differences grouped into ranges, you would use the FREQUENCY function:
=FREQUENCY(C2:C5+D2:D5-A2:A5-B2:B5,F2:F4)
In F2:F4 (called the "bins"), enter the upper limit of each range you want to count. The Frequency function counts up to and including the first value, then counts from there up to and including the second value, and so on. Enter the bins as times, e.g. 1:00 for 1 hour.
Note that Frequency is an array-entered and array-returning function. This means you need to first select the range that will contain all output values (G2:G5 in this example), then enter the function, then press CTRL+SHIFT+ENTER.
Also note that Frequency returns an array that is one element larger than the number of bins specified. The extra element is the count of all values greater than the largest bin specified.
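As a cross-check of the numbers, here is a small Python sketch that computes the response times from the sample rows and bins them the way FREQUENCY is described above (each bin includes its upper limit, plus one extra overflow element). The `frequency` helper is a hand-written imitation for illustration, not an Excel API, and it assumes the bins are sorted in ascending order.

```python
from datetime import datetime

FMT = "%m/%d/%y %H:%M"

# (received, responded) pairs from the sample data in the question.
rows = [
    ("1/5/19 10:00", "1/5/19 12:00"),
    ("1/5/19 21:00", "1/6/19 1:00"),
    ("1/5/19 22:00", "1/5/19 23:00"),
    ("1/6/19 3:00", "1/6/19 4:00"),
]

def hours_between(received, responded):
    """Elapsed hours, i.e. (L + M) - (A + B) in the question's notation."""
    delta = datetime.strptime(responded, FMT) - datetime.strptime(received, FMT)
    return delta.total_seconds() / 3600

def frequency(values, bins):
    """Count values per bin (upper limit inclusive); the last element
    counts everything above the largest bin."""
    counts = [0] * (len(bins) + 1)
    for v in values:
        for i, upper in enumerate(bins):
            if v <= upper:
                counts[i] += 1
                break
        else:
            counts[-1] += 1
    return counts

hours = [hours_between(a, b) for a, b in rows]
print(hours)                        # [2.0, 4.0, 1.0, 1.0]
print(sum(hours) / len(hours))      # 2.0
print(frequency(hours, [1, 2, 3]))  # [2, 1, 0, 1]
```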

Sum of Averages in Excel Pivot Table

I am measuring room utilization (time used/time available) from a data dump. Each row contains the available time for the day and the time used for a particular case.
The image is a simplified version of the data.
If you read the yellow and green highlights (Room 1):
In room 1, there are 200 available minutes on 1/1/2016.
Case 1 took 60 minutes, case 2 took 50 minutes.
There are 500 available minutes on 1/2/2016, and only one case occurred that day, using 350 minutes.
Room 1 utilization = (60 + 50 + 350)/(200 + 500)
The problem with summing the available time is that it double counts the 200 minutes for 1/1/2016, giving: Utilization = (60+50+350)/(200+200+500)
There are hundreds of rows in this data (and there will be multiple data dumps of differing #'s of rows) with multiple cases occurring each day.
I am trying to use a pivot table, but I cannot obtain the 'sum of averages' for a particular room (see image). I am using a macro to pull the numbers out of the grand total column.
Is this possible? Do you see another way to obtain utilization?
(note: there are lots of other columns in the data, like case start, case end, day of week, etc, that are not used in this calculation but are available)
You're getting 300 for both Average of Available Time columns because the grand total is the average over all underlying rows, not a sum of the averages.
Room 1: (200 + 200 + 500) / 3 = 300
Room 2: (300 + 300 + 300) / 3 = 300
I could not comment on the original question, so my solution is based on a few assumptions.
Assumption #1: The data will always be grouped, e.g. all cases in room 1 on a given day will be grouped in sequential rows.
Assumption #2: The available time column is a single value for the whole day; there will never be differing available times on the same day.
Solution: Use column E as the Actual Available Time. This column uses a formula to determine whether the current row's combination (Date + Room + Available Time) differs from the previous row's. If it does, the cell contains that row's available time; otherwise it contains 0.
Formula to use in E2:
=IF(AND($A1 = $A2, $B1 = $B2, $C1 = $C2), 0, $C2)
Extend the formula as far down as necessary and then include the new column in your PivotTable data range.
End Result
I created a unique reference by combining columns and then used sumif/countif/countif.
So the formula in column E would be:
=sumif(colB,cellB,ColC)/Countif(colB,cellE)/Countif(colB,cellE)
Doesn't matter if the data is in order or not then.
Extend the formula as far down as necessary and then include the new column in your PivotTable data range.
The easiest method I would recommend is this.
=SUM(H:H)-GETPIVOTDATA("Average of Available Time",$G$3)
The first term sums the H column, and the second term subtracts the grand total value. It is a dynamic solution, and will change to fit the size of the pivot table.
My assumption is that the PivotTable was originally placed in cell G3.
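To sanity-check whichever approach you pick, the intended utilization figure can be computed outside Excel. The Python sketch below counts each day's available time once per room and date, using the Room 1 numbers from the question. The tuple layout and function name are assumptions for illustration, and like Assumption #2 above it expects a single available time per room per day.

```python
# (room, date, available_minutes, used_minutes), one tuple per case.
rows = [
    ("Room 1", "1/1/2016", 200, 60),
    ("Room 1", "1/1/2016", 200, 50),
    ("Room 1", "1/2/2016", 500, 350),
]

def utilization(rows, room):
    """Total used time divided by available time, where the available
    time of each (room, date) pair is counted only once."""
    used = 0
    available_by_day = {}
    for r, date, available, minutes_used in rows:
        if r != room:
            continue
        used += minutes_used
        available_by_day[date] = available  # repeated rows overwrite, not add
    return used / sum(available_by_day.values())

print(utilization(rows, "Room 1"))  # (60 + 50 + 350) / (200 + 500)
```

Unlike the previous-row formula, this does not depend on the rows being sorted.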
