I have run into performance problems with MDX measure calculations for a summary report, using SQL Server 2008 R2.
I have a Person dimension, and a related fact table containing multiple records per person. (Qualifications)
Eg [Measures].[Other Qual Count] would give me the number of qualifications of a certain type.
Each person could have multiple, so [Measures].[Other Qual Count] > 1 for one person.
However on my summary report I would like to indicate this as 1 per person only. (To indicate the number of persons with Other qualifications.)
The summary report rolls up the values against some other dimensions including an unknown Region hierarchy (it can be one of 3 hierarchies).
I have done this as follows:
MEMBER [Measures].[Other Count2]
AS
SUM(
EXISTING [Person].[Staff Code].[Staff Code].Members,
IIF([Measures].[Other Count] > 0, 1, NULL)
)
However, I have to create several more derived measures - deriving from each other, and all at Person level to avoid unwanted multiple counts. The query slows down from <1 second to 1min+ (my goal is <3s).
The reason for all the derivations is a lot of logic to determine within which one of 6 mutually exclusive column a person will be reported in.
I have also tried to create a Cube Calculation, but this gives me the same value as [Other Count].
SCOPE (({[Person].[Staff Code].[Staff Code].MEMBERS}, [Measures].[Has Other Qual]));
THIS = ([Person].[Staff Code].[Staff Code], [Measures].[Has Other Qual]).Count;
END SCOPE;
Is there a better MDX/Cube calculation that can be used, or any suggestions on improving performance?
This is unfortunately my first time working with MDX and ran into this problem close to a deadline, so I am trying to make this work if possible without changes to the cube.
I have resolved the issue by changing the cube, which was simpler than expected.
On the Data Source View, I created a named query which summarizes the existing fact table at Person level. I also derive all the columns which I will need on my reports.
Treating this named query as a separate fact table, I added a measure group for it and that resolved all my problems.
Related
Goal
Create a working relationship between my Category Sales and Voids PivotTables so I can leverage one slicer for all data.
Background
Using two PowerQueries, I pull in data from SQL to Excel. Because Sales and Voids have DateStamp and StoreID columns in common, I essentially concatenate these in the SQL query to create an ID. For example:
select concat(StoreID,convert(int,DateStamp)) as ID, DateStamp, StoreID, Category, Sales from...
select concat(StoreID,convert(int,DateStamp)) as ID, DateStamp, StoreID, Voids from...
This is a one-to-many relationship between the two (Sales --> Voids)
Problem
Despite creating the relationship in Excel (through Manage Relationships, as PowerPivot is not available) I can't get it to apply and Excel tells me relationships between tables may be needed. I've no idea what I'm doing wrong.
Workaround
The only workaround I can think of is to take the void value for a given day and divide by the number of categories that have sales, then just do a join to create one table that I pull into Excel. It would technically work for my application, but I'd love to know why the relationship isn't working.
Thanks.
The answer is to export your data into the data model so that you can use power pivot, PLUS a export another power query (or several) into the data model that is a deduplicated table of keys.
Then, in the data model editor, set up the data relationships so that there is a one to many relationship between your deduplicated key table(s) and the "actual data".
Then, in a power pivot, use those "key" tables as much as possible, maybe even to the ruthless ideal(1) of using ONLY key tables in your primary row and column headers, and if you have a second level of categorization then a deduplicated table of primary to secondary, and so on, then using the real data tables only in the body of your power pivot.
(1) - Keep in mind that this is just an ideal I'm just explaining to help you understand and maybe start moving towards as much as actually makes sense. As with most things, in reality, the ideal is almost never worth reaching because there are other factors (like your own patience and time).
Due to performance issues I need to remove a few distinct counts on my DAX. However, I have a particular scenario and I can't figure out how to do it.
As example, let's say one or more restaurants can be hired at one or more feasts and prepare one or more menus (see data below).
I want a PowerPivot table that shows in how many feasts each restaurant was present (see table below). I achieved this by using distinctcount.
Why not precalculating this on Power Query? The real data I have is a bit more complex (more ID columns) and in order to be able to pivot the data I would have to calculate thousands of possible combinations.
I tried adding to my model a Feast dimensional table (on the example this would only be 1 column of 2 rows). I was hoping to use that relationship to be able to make a straight count, but I haven't been able to come up with the right DAX to do so.
You could use COUNTROWS() combined with VALUES().
Specifically, COUNTROWS() will give you the count of rows in a table. That means COUNTROWS is expecting a table is input. Here's the magic part: VALUES() will return a table as results, and the table it returns are the distinct values in the table/column that you provide as the argument for VALUES().
I'm not sure if I'm explaining it well, so for the sample data you provided, the measure would look like this (assuming the table is named Table1):
Unique Feasts:=COUNTROWS(VALUES('Table1'[Feast Id]))
You can then create a pivot table from Powerpivot, and drag Restaurant Id into Rows, and drag the measure above into Values. Same result as DISTINCTCOUNT, but with less performance overhead (I think).
I have 2 tables.
First table: dimensional table to show available units of cars at start of selling cycle and for how long these units will be available.
Second table: to show how many cars were sold on a given month within their "available cycle".
I'd like to compare the "selling behaviour" within each cycle. Thus, I want to display the total initial units available next to the units sold at each stage within the cycle. The second dimension works fine, but not the first one.
This is what I get:
And this the desired output (note rows 4 and 5 for available_units)
I tried the below DAX code without success:
SumAvailableUnits:=CALCULATE(SUM([available_units]),FILTER(ALL(Table1[month_within_cycle]),[month_within_cycle]>=MAX([months_available])))
First, DAX Formatter is your friend. You may like writing unreadable single line measures, but no one else likes reading them.
I've also taken the liberty of cleaning up your table names and adding fully qualified column references. (Ignoring that your dimension isn't a pure dimension, as it holds numeric values that you aggregate in a measure)
SumAvailableUnits :=
CALCULATE (
SUM ( DimCar[available_units] ),
FILTER (
ALL ( FactSale[month_within_cycle] ),
FactSale[month_within_cycle] >= MAX ( DimCar[months_available] )
)
)
And immediately we see a problem. With the fully qualified column references, it is clear that you're trying to filter the lookup table (the one side) by the base table (the many side). In Power Pivot for Excel, we do not have bi-directional relationships (though they're available in Power BI and coming for Excel 2016). Our relationships' filter context only flows from the lookup table to the base table, typically from the dimension to the fact.
Additionally, your DimCar, by holding [available_units] and [months_available] encodes an implicit assumption that a specific [car_id] can only ever refer to a single, unchanging lot. You will never see another row with [car_id] = 1. This strikes me as highly unlikely. Even if it is the case the better solution is a model change.
In general, anything that goes onto a row or column label should come from a dimension, not a fact. Similarly, anything you're aggregating in a measure should live in a fact table. The usual exception is dimension counts - either bare, or as a denominator in a measure. Following these will get you 80% of the way in terms of dimensional modeling. You can see the tables and model diagram I've ended up with in the image below.
Here are the measure definitions
SumAvailableUnits:=SUM( FactAvailability[available_units] )
SumSold:=SUM( FactSale[cars_sold] )
Here are my source tables, my model diagram with relationships, and a pivot table built from these pieces and the measures above. Note the source of [month_within_cycle] in the pivot.
Finally, you might notice that my grand total behaves in a different way than in your original. Since the values are repeated monthly, we get a much larger grand total. If you need to instead end with the sum from the latest month (which it looks like you have in your sample), you can use an alternate measure, provided below. I don't understand why this would be your desired grand total, but you can achieve it fairly easily. Personally, I'd probably blank the measure at the grand total level.
SumAvailableUnits - GrandTotal:=
SUMX(
TOPN(
1
,FactAvailability
,FactAvailability[month_within_cycle]
,0
)
,FactAvailability[available_units]
)
This uses SUMX() to step through the table provided, defined by TOPN(). TOPN() returns the first row (in this case) in FactAvailability, including ties, after sorting by [month_within_cycle], out of all rows available in the filter context. In the context of a specific month, you get all the rows associated with that month - identical to the simple sum. In the grand total context, you get the rows associated with the last month.
SUMX() iterates over that table and accumulates the values of [available_units] in a sum.
I have a simple SSRS report which has one group and details. The grouping is by employee and the details are performance data on each. After tedious calculation, it just comes down to
select * from table
and I have SSRS do the grouping on the employee column. There are several tasks for each employee, so that is why the grouping in the first place.
My problem is, the user would like to be able to distribute these stats to the employees, and it would be easier if there were some white space between these groups (between each employee).
I've tried adding a blank row inside or outside the group, but I can't find a way to do that so it won't put a row between each task. I tried using a list, but in the end, got the same problem--the group still forced it to behave that way.
I know I can insert a page break between groups, but that would be a huge waste of paper, having each employee on a separate sheet.
Is there a way to essentially have each employee (group), be in a separate "table"--such that I would have maybe a half dozen on a sheet that could easily be guillotined?
EDIT: Here's a screenshot of it as it is now:
Table
I don't know how I could use a rectangle because the results (groups) are all in the same table. The idea would be to insert a space between each group (person).
I have several documents which contain statistical data of performance of companies. There are about 60 different excel sheets representing different months and I want to collect data into one big table. Original tables looks something like this, but are bigger:
Each company takes two rows which represent their profit from the sales of the product and cost to manufacture the product.I need both of these numbers.
As I said, there are ~60 these tables and I want to extract information about Product2. I want to put everything into one table where columns would represent months and rows - profit and costs of each company. It could be easily done (I think) with INDEX function as all sheets are named similarly. The problem I faced is that at some periods of time other companies enter the market:
Some of them stay, some of them fail. I would like to collect information on all companies that exist today or ever existed, but newly found companies distort the list (in second picture we see, that company BA is in 4th row, not BB). As row of a company changes from time to time, using INDEX becomes problematic, because in some cases results of different companies get into one row. Adjusting them one by one seems very painful.
Maybe there is some quick and efficient method to solve such problem?
Any help or ideas would be appreciated.
One think you may want to try is linking the Excel spreadsheets as tables in Access. From there you can create a query that ties the tables together. As data changes in the spreadsheets, the query will reflect those changes.