I have this table, which has foreign keys referencing several other tables:
Basically, this table shows which students registered in which module run by which teacher in what term.
I want to query the following:
How many students have registered for more than one module run by a given tutor?
It will look something like this:
For example, Vasiliy Kuznetsov runs two modules: FunPro and NO. If one student registers for both of them, he is counted as one.
My SQL-oriented mind is telling me this: count all the rows in which student_id and tutor_id are the same. For example, in one row student_id is 5 and tutor_id is 10, and the same is true for the third row. I would then count that student once.
How can I do that with DAX formulas?
RowCount:=
COUNTROWS( ModuleRegistration )
StudentsWithTwoOrMoreRegistrations:=
COUNTROWS(
    FILTER(
        VALUES( ModuleRegistration[Student_ID] )
        ,[RowCount] >= 2
    )
)
I refer to arguments positionally: the first argument to a function is (1), the second is (2), and so on.
So, [RowCount] is trivial.
[StudentsWithTwoOrMoreRegistrations] is a bit more involved. DAX, being a functional language, is best understood inside-out.
FILTER() takes a table expression in (1) and evaluates a boolean predicate, (2), for each row in (1). It returns all rows from (1) for which (2) evaluates to true.
Our FILTER()'s (1) is VALUES( ModuleRegistration[Student_ID] ). VALUES() returns the unique rows from a field based on current filter context (it respects slicers and filters in the pivot table). Thus, we will return some subset of the unique list of [Student_ID]s.
Our FILTER()'s (2) is [RowCount] >= 2. For each [Student_ID] in (1), we'll evaluate [RowCount], checking how many times that student appears in ModuleRegistration. [RowCount] is evaluated in the combination of filter context from the pivot table (the [Faculty Name] field in your sample pivot provides filter context) and row context from FILTER()'s (1). Thus it counts how many times the student appears in ModuleRegistration for the [Faculty Name] on the pivot table row.
We check that [RowCount] is >= 2.
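The steps above can be modelled outside DAX. This is only a plain-Python sketch of what the measure computes for a single tutor on a pivot row, not DAX itself; the sample rows and the function name are made up for illustration.

```python
from collections import Counter

# Hypothetical ModuleRegistration rows, already filtered to one tutor
# (the pivot row's filter context): (student_id, module).
registrations = [
    (5, "FunPro"),
    (7, "FunPro"),
    (5, "NO"),
    (9, "NO"),
]

def students_with_two_or_more(rows):
    """Model of COUNTROWS( FILTER( VALUES( Student_ID ), [RowCount] >= 2 ) )."""
    # [RowCount] per distinct Student_ID: how many rows that student has.
    row_count = Counter(student_id for student_id, _ in rows)
    # FILTER keeps students with at least two rows; COUNTROWS counts them.
    return sum(1 for count in row_count.values() if count >= 2)

print(students_with_two_or_more(registrations))  # 1 (only student 5 qualifies)
```

One difference to keep in mind: when no student qualifies, this sketch returns 0, whereas COUNTROWS() over an empty table returns BLANK in DAX.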
You've not indicated if your measure needs to handle grand totals, or how you might want to see that. If you need more help for the grand total to get it to behave the way you like, let me know.
Edit for grand total
There are a few ways you might want to handle grand totals. I'm going to assume that you want a unique count of students.
StudentsWithTwoOrMoreRegistrations:=
COUNTROWS(
    SUMMARIZE(
        FILTER(
            SUMMARIZE(
                ModuleRegistration
                ,ModuleRegistration[Tutor_ID]
                ,ModuleRegistration[Student_ID]
            )
            ,[RowCount] >= 2
        )
        ,ModuleRegistration[Student_ID]
    )
)
WTF happened to our measure?
Let's examine:
Starting with the innermost SUMMARIZE(). SUMMARIZE() navigates relationships outward from the table in (1) and groups by the columns listed in (2)-(N) (these don't have to be from the table in (1), but must be reachable by navigating relationships).
This is equivalent to the following in SQL:
SELECT
    mr.Tutor_ID
    ,mr.Student_ID
FROM ModuleRegistration mr
GROUP BY
    mr.Tutor_ID
    ,mr.Student_ID
We use FILTER() on this table like earlier. [RowCount] is evaluated in the combination of filter context from the pivot table and the row in the table, defined by our SUMMARIZE() above.
Our row context is now a student-tutor pair rather than just a student. A pair will have a [RowCount] >= 2 when the student has taken more than one module from that tutor.
Our FILTER() returns the pairs which have a [RowCount] >= 2. This output table has two fields, [Tutor_ID] and [Student_ID], but we want to count distinct [Student_ID]s out of this.
Thus, we use the table from FILTER() as our (1) in the outer SUMMARIZE(). We group only by the values of [Student_ID]. We then count the rows of this table.
When only one [Faculty_Name] is in context, e.g. on a pivot table row, then our inner SUMMARIZE() is grouping by a single value of [Tutor_ID] and whatever [Student_ID]s are associated with it. This is identical to our earlier measure.
When we have many [Tutor_ID]s in context, like in the grand total, then we'll see the appropriate behavior of only counting each [Student_ID] once.
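To see why the grand total differs from the sum of the per-tutor rows, here is a plain-Python model of the nested SUMMARIZE()/FILTER() logic. It is a sketch with made-up sample rows and a hypothetical function name, not a translation that Power Pivot can run.

```python
from collections import Counter

# Hypothetical ModuleRegistration rows: (tutor_id, student_id, module).
registrations = [
    (10, 5, "FunPro"),
    (10, 5, "NO"),
    (10, 7, "FunPro"),
    (11, 5, "Algebra"),
    (11, 5, "Logic"),
    (11, 8, "Algebra"),
    (11, 8, "Logic"),
]

def students_with_repeat_tutor(rows):
    # Inner SUMMARIZE plus [RowCount]: number of rows per (tutor, student) pair.
    pair_counts = Counter((tutor, student) for tutor, student, _ in rows)
    # FILTER: keep pairs where the student took two or more modules from that tutor.
    qualifying_pairs = [pair for pair, n in pair_counts.items() if n >= 2]
    # Outer SUMMARIZE plus COUNTROWS: count distinct students among those pairs.
    return len({student for _, student in qualifying_pairs})

# One tutor in context (a pivot row) versus all tutors (the grand total).
tutor_10 = [r for r in registrations if r[0] == 10]
tutor_11 = [r for r in registrations if r[0] == 11]
print(students_with_repeat_tutor(tutor_10))       # 1 (student 5)
print(students_with_repeat_tutor(tutor_11))       # 2 (students 5 and 8)
print(students_with_repeat_tutor(registrations))  # 2, not 3: student 5 is counted once
```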
In Excel connected to SSAS, I am trying to build a pivot table and add a custom Measure Calculation using "OLAP Tools" and/or "OLAP Pivot Table Extensions". I am trying to add a calculation that is really simple in my mind, but I cannot get it to work. The calc I need is:
GOAL: A record count of the [Items] dimension records grouped by any of the
[Items] dimension fields.
In particular I am trying to group by [Items].[Items Groups] and [Items].[Item]. Item is the lowest grain, so at that level the count should return the value "1". I have created a couple of calculations that are kind of in the ballpark (see below), but the calcs don't appear to be working as desired.
What I have tried:
Attempt #1 -- [Measures].[Items Count (With net amount values)]
DISTINCTCOUNT( {[Items].[Item].MEMBERS} )
The calc 'Items Count (With net amount values)' appears to be
returning a decent count value, but it appears it only counts the Item
if there are transactional records found (not sure why). Also, when
at the lowest grain level the calc returns that value for the parent
group, not the dimension level selected on the rows.
Attempt #2 -- [Measures].[Items Count (All)]
[Items].[Item].[Item].Count
This calc returns the TOTAL item count for the entire dimension
regardless of the dimension level placed on the rows.
Attempt #3 -- [Measures].[Items Count]
COUNT ( { [Items].[Item].MEMBERS}, EXCLUDEEMPTY)
This calc freezes up Excel and I have to quit Excel. No idea why. I have seen this syntax recommended on a few different sites.
Screenshot:
Help please? This seems really simple, but I am not very skilled with MDX. In DAX and SSAS TABULAR this would be very simple expression. But I'm struggling to count the rows with MDX in SSAS MD.
The "Outside Purchased Beef" group has 18 items with transactions, but 41 items in total. I do not know how to calculate the "41" value.
[Image: SSAS Excel-CalcMeasure-CountRows.png]
Take a look at the following samples on AdventureWorks.
with member [Measures].[CountTest]
as
    count(existing [Product].[Subcategory].members - [Product].[Subcategory].[All])
select
{
    [Measures].[Internet Sales Amount],[Measures].[CountTest]
}
on columns,
{
    ([Product].[Category].[Category]
    ,[Product].[Subcategory].[Subcategory] -- comment this line for the second result
    )
}
on rows
from [Adventure Works]
Now comment out the indicated line to get the parent (category-level) view.
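As a rough illustration of what the calculated member is doing, the following plain-Python sketch models "count the dimension members that exist in the current context", independent of whether they have fact rows. The dimension contents and function names are invented for the example; it is not MDX and says nothing about how SSAS evaluates the query internally.

```python
# Hypothetical product dimension: subcategory -> parent category.
subcategory_to_category = {
    "Mountain Bikes": "Bikes",
    "Road Bikes": "Bikes",
    "Touring Bikes": "Bikes",
    "Helmets": "Accessories",
    "Locks": "Accessories",
}

# Hypothetical set of subcategories that have at least one sales row.
subcategories_with_sales = {"Mountain Bikes", "Road Bikes", "Helmets"}

def count_existing(category=None, subcategory=None):
    """Count dimension members restricted to the current row's coordinates."""
    members = [
        sub
        for sub, cat in subcategory_to_category.items()
        if (category is None or cat == category)
        and (subcategory is None or sub == subcategory)
    ]
    return len(members)

def count_with_sales(category):
    """Count only members that also have fact rows, for comparison."""
    return sum(
        1
        for sub, cat in subcategory_to_category.items()
        if cat == category and sub in subcategories_with_sales
    )

print(count_existing(category="Bikes"))                              # 3
print(count_with_sales("Bikes"))                                     # 2
print(count_existing(category="Bikes", subcategory="Touring Bikes")) # 1
```

The gap between the first two numbers corresponds to the 41 versus 18 difference described in the question: one count follows the dimension, the other follows the transactions.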
I can rank my data with this formula, which groups by Year, Trust and ID, and ranks the Areas.
rankx(
    filter(Table,
        [Year]=earlier([Year])&&[Trust]=earlier([Trust])&&[ID]=earlier([ID])),
    [Area], ,1,Dense)
This works fine - unless you have data where the same Area appears more than once in the same group, whereupon it gives all rows the rank of 1. Is there any way to force unique rank values? So two rows that have the same Area would be given the rank of 1 and 2 (in an arbitrary order)? Thank you for your time.
Assuming you don't have duplicate rows in your table, you can add another column as a tie-breaker in your expression.
Suppose your table has an additional column, [Name], that is distinct between your multiple [Area] rows. Then you could write your formula like this:
= RANKX(
    FILTER(Table,
        [Year] = EARLIER([Year]) &&
        [Trust] = EARLIER([Trust]) &&
        [ID] = EARLIER([ID])),
    [Area] & [Name], , 1, Dense)
You can append as many columns as you need to get the tie-breaking done.
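The effect of the tie-breaker can be sketched in plain Python. This models an ascending dense rank within each (Year, Trust, ID) group; the sample rows and the helper name are hypothetical, and the sketch sorts on the pair (Area, Name) where the DAX formula concatenates the two columns into one string.

```python
from collections import defaultdict

# Hypothetical rows; the first two share the same Area within one group.
rows = [
    {"Year": 2020, "Trust": "A", "ID": 1, "Area": "North", "Name": "x"},
    {"Year": 2020, "Trust": "A", "ID": 1, "Area": "North", "Name": "y"},
    {"Year": 2020, "Trust": "A", "ID": 1, "Area": "South", "Name": "z"},
    {"Year": 2021, "Trust": "A", "ID": 1, "Area": "North", "Name": "x"},
]

def dense_rank(rows, key):
    """Ascending dense rank of each row within its (Year, Trust, ID) group."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["Year"], row["Trust"], row["ID"])].append(row)
    ranks = []
    for row in rows:
        peers = groups[(row["Year"], row["Trust"], row["ID"])]
        # Dense ranking: position among the distinct key values in the group.
        distinct_keys = sorted({key(peer) for peer in peers})
        ranks.append(distinct_keys.index(key(row)) + 1)
    return ranks

area_only = dense_rank(rows, key=lambda r: r["Area"])
with_tiebreak = dense_rank(rows, key=lambda r: (r["Area"], r["Name"]))
print(area_only)      # [1, 1, 2, 1]  (duplicate Area values share a rank)
print(with_tiebreak)  # [1, 2, 3, 1]  (the extra column makes ranks unique)
```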
I have a situation where I kind of need a many-to-many join - which I know isn't possible.
I have one fact table and two dimension tables.
The fact table contains account numbers (as in GL accounts) and amounts. Plus a date field, so the account numbers are not unique.
The first dimension table has just one column listing the reports that can be created by combining the accounts in different ways.
The second dimension table could be called a "roll-up" table. It has 3 columns: report, account, and a line item description field. The latter defines which line on the respective report that the account should be mapped to.
So I want to have a pivot table that has the line item description in the row area and the amount in the values area. With a mechanism for the user to specify which report they want to view. But the join on the account field between the roll-up table and the fact table is many-to-many. If the roll-up table were somehow filtered based on the specific report that the user has selected, THEN it would become one-to-many. Hence the "dynamic" joins in my title.
I've been trying to come up with a connecting table of some kind, but without any luck so far. If anybody has any suggestions/pointers, that would be much appreciated.
I figured out a way to do it using a DAX formula that calculates the field to be placed in the Values area. It uses FILTER and CROSSJOIN combinations to effect the dynamic joins. Note that in order to use a CROSSJOIN I added prefix letters to a couple of the field names (to make them unique). Also, I set up the Report table (the first dimension table I described) so that it has only one row, containing the report that the user wishes to view.
The DAX formula is as follows:
SUMX (
    FILTER (
        CROSSJOIN (
            fBalances,
            FILTER (
                CROSSJOIN (
                    dRollUp,
                    dReport
                ), dRollup[Report] = dReport[uReport]
            )
        ), fBalances[fAccount] = dRollUp[Account]
    ), fBalances[Amount]
)
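For readers who find the nested CROSSJOIN/FILTER hard to follow, here is a plain-Python model of the same idea: restrict the roll-up table to the selected report first, then match accounts, then sum. The sample data, the column order in the tuples, and the optional line_item argument (standing in for the pivot row's filter) are assumptions made for the sketch.

```python
from itertools import product

# Hypothetical fact rows: (fAccount, Amount).
f_balances = [("1000", 50.0), ("1000", 25.0), ("2000", 10.0), ("3000", 7.0)]

# Hypothetical roll-up rows: (Report, Account, LineItem).
d_rollup = [
    ("Balance Sheet", "1000", "Cash"),
    ("Balance Sheet", "2000", "Receivables"),
    ("Cash Flow", "1000", "Opening cash"),
]

# One-row report table holding the user's selection.
d_report = ["Balance Sheet"]

def report_total(balances, rollup, report, line_item=None):
    # Inner CROSSJOIN + FILTER: roll-up rows belonging to the selected report.
    selected = [r for r, rep in product(rollup, report) if r[0] == rep]
    # Assumed stand-in for the pivot row's filter on the line item description.
    if line_item is not None:
        selected = [r for r in selected if r[2] == line_item]
    # Outer CROSSJOIN + FILTER: pair each balance with its roll-up row by account.
    matched = [(b, r) for b, r in product(balances, selected) if b[0] == r[1]]
    # SUMX over the matched rows.
    return sum(b[1] for b, _ in matched)

print(report_total(f_balances, d_rollup, d_report))          # 85.0
print(report_total(f_balances, d_rollup, d_report, "Cash"))  # 75.0
```

Because the roll-up table is filtered to one report before the account match, each account maps to at most one line item, which is what turns the many-to-many join into a one-to-many one.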
Subsequent update: I moved it into Power BI where I added a parameter (called myReport) for the user to specify the report. Consequently I deleted the dReport table.
So the Power BI DAX formula becomes:
SUMX (
    FILTER (
        CROSSJOIN (
            fBalances,
            FILTER (
                CROSSJOIN (
                    dRollUp,
                    myReport
                ), dRollup[Report] = FIRSTNONBLANK ( myReport[myReport], TRUE() )
            )
        ), fBalances[fAccount] = dRollUp[Account]
    ), fBalances[Amount]
)
I would like to calculate the sum of open positions in a receivables account. The entries in the accounting system provide three relevant columns in the source table to that end:
booking date
due (=pay) date
amount due
I would like to have a measure that I can use for a graph, showing the total of all open positions on each day.
An open position is an amount booked with a booking date before "today" and with a due date after "today".
I tried the following approach in my Power Pivot model (with three calendar tables):
booking date related to "calendar table 1"
due date related to "calendar table 2"
Date columns of "calendar table 1" and "calendar table 2" related to a third "calendar table main"
For that formula I am getting an error message:
Hm, not sufficiently proficient in PowerPivot to solve this problem.
SumAmt:=
SUM( Source_Table[Amount] )
OpenPositions:=
CALCULATE(
    [SumAmt]
    ;FILTER(
        VALUES( Source_Table[Booking_Date] )
        ;Source_Table[Booking_Date] < MAX( Calendar_Main[Calendar_Date] )
    )
    ;FILTER(
        VALUES( Source_Table[Due_Date] )
        ;Source_Table[Due_Date] > MAX( Calendar_Main[Calendar_Date] )
    )
)
Your error is pretty self-explanatory. If you use a direct column predicate as a filter argument to CALCULATE(), it can only reference a single column. You are referencing two, Calendar_Main[Calendar_Date] and either Source_Table[Booking_Date] or Source_Table[Due_Date]. This is not allowed, so it throws the error.
The workaround is to wrap complex filtering logic in table expressions and use those as arguments to CALCULATE(). Pretty much, unless you are hard-coding a literal predicate for a single column, you should be using some sort of table expression, like FILTER(), as your arguments to CALCULATE().
What we do is call FILTER() twice to check the dates. We use MAX()s because we cannot compare one column reference against another; the inequality has to be between scalars.
Since we're FILTER()ing over Source_Table[Booking_Date] and Source_Table[Due_Date], the references to these are evaluated in row context and refer to the value of the current row in FILTER()'s iteration. The reference to Calendar_Main[Calendar_Date] is just a column reference, so we wrap it in MAX() to get a scalar value for our inequality. The MAX() refers to the current filter context coming in from the pivot table, which would be the current row label or column label.
If you aggregate to the month level, this will give you essentially the closing balance, since we're using MAX()s. At the month level the value will be identical to that on the last date of the month.
Finally, with the inequalities you've set up, you're ignoring anything opened on the current day or due on the current day. I'd expect you want [Booking_Date] <= [Calendar_Date] and [Due_Date] > [Calendar_Date].
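The difference between the two sets of inequalities is easy to check with a small plain-Python model of the measure. The sample bookings and the function name are hypothetical; as_of plays the role of MAX( Calendar_Main[Calendar_Date] ) in the current filter context.

```python
from datetime import date

# Hypothetical source rows: (booking_date, due_date, amount).
source_table = [
    (date(2024, 1, 1), date(2024, 1, 10), 100.0),
    (date(2024, 1, 5), date(2024, 1, 20), 40.0),
    (date(2024, 1, 12), date(2024, 1, 15), 60.0),
]

def open_positions(rows, as_of, inclusive_booking=True):
    """Sum of amounts booked on/before as_of and not yet due on as_of."""
    total = 0.0
    for booking, due, amount in rows:
        # inclusive_booking=True models [Booking_Date] <= [Calendar_Date];
        # False models the strict '<' used in the measure as written.
        booked = booking <= as_of if inclusive_booking else booking < as_of
        if booked and due > as_of:
            total += amount
    return total

print(open_positions(source_table, date(2024, 1, 5)))                           # 140.0
print(open_positions(source_table, date(2024, 1, 5), inclusive_booking=False))  # 100.0
print(open_positions(source_table, date(2024, 1, 12)))                          # 100.0
```

With the strict comparison, the amount booked on 5 January does not show up as open until the 6th, which is the behaviour flagged above.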
I have a requirement in Power Pivot where I need to show value based on the Dimension Column value.
If the value is Selling Price, then the Selling Price amount from Table1 should display; if it is Cost Price, then the Cost Price amount should display; if it is Profit, then ((SellingPrice-CostPrice)/SellingPrice) should display.
My Table Structure is
Table1:-
Table2:-
Required Output:-
I tried the option below:-
1. Calculated Measure:=If(Table[Category]="CostPrice",[CostValue],If(Table1[category]="SellingPrice",[SalesValue],([SalesValue]-[CostValue]/[SalesValue])))
*[CostValue]:=Calculate(Sum(Table1[Amount]),Table1[Category]="CostPrice")
*[Sales Value]:=Calculate(Sum(Table1[Amount]),Table1[Category]="SellingPrice")
I tried this as both a calculated column and a measure, but neither gives me the required output.
Cost:=
CALCULATE(
    SUM( Table1[Amount] )
    ,Table1[Category] = "CostPrice"
)
Selling:=
CALCULATE(
    SUM( Table1[Amount] )
    ,Table1[Category] = "SellingPrice"
)
Profit:=
DIVIDE(
    [Selling] - [Cost]
    ,[Selling]
)
ConditionalMeasure:=
IF(
    HASONEFILTER( Table2[Category] )
    ,SWITCH(
        VALUES( Table2[Category] )
        ,"CostPrice"
        ,[Cost]
        ,"SellingPrice"
        ,[Selling]
        ,"Profit"
        ,[Profit]
    )
    ,[Profit]
)
HASONEFILTER() checks that there is filter context on the named field and that the filter context includes only a single distinct value.
This is just a guard to allow our SWITCH() to refer to VALUES( Table2[Category] ). VALUES() returns a table of the distinct values in the named column. When that table is 1x1, it can be implicitly converted to a scalar, which is what SWITCH() needs.
SWITCH() is a case statement.
Our else condition in the IF() is just returning [Profit]. You might want something else, but it's unclear what should happen at the grand total level. You can leave this off, and the measure will be blank in IF()'s else condition.
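The branching can be modelled in plain Python to check the arithmetic. This is a sketch with invented amounts and a hypothetical function name; None stands in for DAX's BLANK, and the length check is only an approximation of the HASONEFILTER() guard.

```python
# Hypothetical Table1 rows: (Category, Amount).
table1 = [
    ("CostPrice", 60.0),
    ("CostPrice", 20.0),
    ("SellingPrice", 100.0),
    ("SellingPrice", 60.0),
]

def conditional_measure(rows, selected_categories):
    """Model of [ConditionalMeasure] for the Table2[Category] values in context."""
    cost = sum(amount for category, amount in rows if category == "CostPrice")
    selling = sum(amount for category, amount in rows if category == "SellingPrice")
    # DIVIDE() returns BLANK on division by zero; None models that here.
    profit = (selling - cost) / selling if selling else None
    # Approximation of HASONEFILTER(): exactly one category value in context.
    if len(selected_categories) == 1:
        only = selected_categories[0]
        # SWITCH() with no else branch yields BLANK for an unmatched value.
        return {"CostPrice": cost, "SellingPrice": selling, "Profit": profit}.get(only)
    # Else branch of IF(): e.g. the grand total row.
    return profit

print(conditional_measure(table1, ["CostPrice"]))     # 80.0
print(conditional_measure(table1, ["SellingPrice"]))  # 160.0
print(conditional_measure(table1, ["Profit"]))        # 0.5
```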
I was thinking about this a little. I'm not sure why you have your categories on rows. Usually the data set would have columns like: item | CostPrice | SellingPrice | Profit. Then you can just use the columns to define your fields. The model becomes easier and more maintainable.