PowerPivot DAX query slow with one fact table but not with another - excel

I have 2 lookup tables: Members (83,000 rows) and Groups (2,500 rows).
There are 2 fact tables: GroupMembers (190,000 rows) and Metrics (650,000 rows).
I created two separate models:
1. Members, Groups and GroupMembers
2. Members, Groups and Metrics
Both models have only one calculated measure, a COUNTROWS over the fact table. If I bring Members > MemberID onto Rows, filter on Groups > GroupID, and use the COUNTROWS measure for Values, model 2 performs super fast while model 1 is really slow.
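For reference, a minimal sketch of those measures (the measure names are my own; the table names come from the models above):

CountGroupMembers := COUNTROWS ( GroupMembers )  -- model 1's fact table
CountMetrics := COUNTROWS ( Metrics )  -- model 2's fact table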
Here are the DAX query results for Model 1 and Model 2 (screenshots omitted here).
The only difference between the two fact tables is that GroupMembers only has unique combinations of Groups and Members, whereas the Metrics table has other columns so the Groups and Members combinations are not unique.
Both Excel files can be found here: http://1drv.ms/1GdK1WK
Please help!
[EDIT]
I did some further testing and found that if I duplicate the data in GroupMembers (i.e. load the same 190,000 rows twice so that the GroupID/UserID combination is no longer unique), the performance is great. Go figure! :)
I'm opening a support case with MS and will update this thread.

From the Microsoft support team: apparently the UserID int values are too big for the query. They gave me a workaround: create an identity key column on the user table and use that as the relationship between Users and UserGroups. I posted the workaround file in a new folder on OneDrive: http://1drv.ms/1BEEmDJ
The support team will talk to the product team regarding the root issue and they'll investigate why large int columns are causing a performance issue.
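For anyone who can't open the workaround file: the idea is simply to add a small surrogate key. A minimal Power Query sketch (the query, step, and column names here are assumptions, not the exact ones from the support file):

// Add a sequential integer key to the Users table; UserGroups would then
// need a merge against Users on UserID to pick up UserKey as its foreign key.
let
    Source = Excel.CurrentWorkbook(){[Name = "Users"]}[Content],
    AddedKey = Table.AddIndexColumn(Source, "UserKey", 1, 1)
in
    AddedKey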

Related

Excel 2016 Relationship

Goal
Create a working relationship between my Category Sales and Voids PivotTables so I can leverage one slicer for all data.
Background
Using two Power Query queries, I pull in data from SQL to Excel. Because Sales and Voids have DateStamp and StoreID columns in common, I essentially concatenate these in the SQL query to create an ID. For example:
select concat(StoreID, convert(int, DateStamp)) as ID, DateStamp, StoreID, Category, Sales from ...
select concat(StoreID, convert(int, DateStamp)) as ID, DateStamp, StoreID, Voids from ...
This is a one-to-many relationship between the two (Sales --> Voids)
Problem
Despite creating the relationship in Excel (through Manage Relationships, as PowerPivot is not available) I can't get it to apply and Excel tells me relationships between tables may be needed. I've no idea what I'm doing wrong.
Workaround
The only workaround I can think of is to take the void value for a given day and divide by the number of categories that have sales, then just do a join to create one table that I pull into Excel. It would technically work for my application, but I'd love to know why the relationship isn't working.
Thanks.
The answer is to export your data into the data model so that you can use Power Pivot, plus export another Power Query query (or several) into the data model as a deduplicated table of keys.
Then, in the data model editor, set up the data relationships so that there is a one to many relationship between your deduplicated key table(s) and the "actual data".
Then, in a PivotTable over that model, use those "key" tables as much as possible, maybe even to the ruthless ideal(1) of using ONLY key tables in your primary row and column headers. If you have a second level of categorization, add a deduplicated table mapping primary to secondary values, and so on, using the real data tables only in the body (Values area) of the pivot.
(1) - Keep in mind that this is just an ideal, explained to help you understand and to move towards only as far as actually makes sense. As with most things, in reality the ideal is almost never worth reaching completely, because there are other factors (like your own patience and time).
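Applied to the Sales/Voids example above, the deduplicated key table could come straight from SQL. A hedged sketch reusing the asker's concatenated ID (the Sales and Voids table names are assumed, since the original queries elide them):

-- Key table: every distinct StoreID/DateStamp pair across both sources
-- (UNION, unlike UNION ALL, removes duplicate rows)
select concat(StoreID, convert(int, DateStamp)) as ID, DateStamp, StoreID from Sales
union
select concat(StoreID, convert(int, DateStamp)), DateStamp, StoreID from Voids

In the model, this key table gets a one-to-many relationship on ID to each of Sales and Voids, and the shared slicer points at the key table.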

How to use the unique items in a column from several tables as the row criteria of a pivot table in the Excel data model using DAX?

I have 3 different tables with a customer name column; there are duplicates as well as unique customers across the 3 tables, and I need the unique set across all 3 to use as the row criteria in the pivot table.
I've been trying to find a way to do this, but I cannot figure it out.
The measure I tried is: Customers := DISTINCT(UNION(VALUES('Test1 - Invoice'[CustomerID]), VALUES('Test2 - Invoice'[CustomerID]), VALUES('Test3 - Invoice'[CustomerID])))
But I get the error below:
Semantic Error: Too many arguments were passed to the VALUES function.
The maximum argument count for the function is 1.
I am quite new to DAX and have no idea how to do this. I believe it is because measures are only for the Values area, if I'm not mistaken.
I read that to place something on the other fields of the pivot table it has to be a calculated column, although I do not see how this could be a calculated column either.
One approach is to create a separate table to store the Customer Name dimension - then create relationships between that Customer dimension table and your 3 fact tables. This would be most effective at the Power Query stage, but can be done using DAX.
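For the DAX route, a hedged sketch of that Customer dimension as a calculated table, built from the column names in the question (note this works in Power BI / Analysis Services; classic Excel Power Pivot has no calculated tables, so there you would build the same list in Power Query):

-- One-column table of unique CustomerIDs across all three fact tables;
-- the outer DISTINCT removes duplicates that appear in more than one table
Customers =
DISTINCT (
    UNION (
        DISTINCT ( 'Test1 - Invoice'[CustomerID] ),
        DISTINCT ( 'Test2 - Invoice'[CustomerID] ),
        DISTINCT ( 'Test3 - Invoice'[CustomerID] )
    )
)

You would then relate Customers[CustomerID] one-to-many to each of the three invoice tables and put Customers[CustomerID] on Rows.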
An alternative is to merge your 3 fact tables - again, this would be best done with Power Query, but is possible with DAX.

Power BI Desktop

I have 2 tables I am trying to create a relationship between in Power BI. Both tables have the same kind of values, for example:
Table1 has a Location column and Table2 has a Location column, but the values are not unique. Every time I try to connect them, it says I need a unique value. Can someone please help me connect them together?
Here is a passage from the documentation regarding relationships in Power BI (Create and manage relationships in Power BI Desktop). In short, one of the tables you choose for the relationship should have unique values in the join column. So far in Power BI, you can define 1:*, 1:1 and *:1 relationships.
Note that you'll see an error that states "One of the columns must have unique values" if none of the tables selected for the relationship has unique values. At least one table in a relationship must have a distinct, unique list of key values, which is a common requirement for all relational database technologies.
If you encounter that error, there are a couple ways to fix the issue:
Use "Remove Duplicate Rows" to create a column with unique values. The drawback to this approach is that you will lose information when duplicate rows are removed, and often a key (row) is duplicated for good reason.
Add an intermediary table made of the list of distinct key values to the model, which will then be linked to both original columns in the relationship.
One of your tables has to have unique values in Location (the primary key) while the other can have duplicate values in Location (the foreign key). Plus, the table with duplicates (normally the fact table) must have values that are present in the other table (in the dimension).
In my opinion, to match your needs, you should add all the possible locations to the table that will have unique values (the dimension); a sketch follows.
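One hedged way to build that dimension in Power BI Desktop is a calculated table (assuming the two tables are literally named Table1 and Table2, as in the question):

-- Location dimension: unique values drawn from both tables
Locations =
DISTINCT (
    UNION (
        DISTINCT ( Table1[Location] ),
        DISTINCT ( Table2[Location] )
    )
)

Then create a one-to-many relationship from Locations[Location] to each of Table1[Location] and Table2[Location], and use Locations in your slicers.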
I hope I made myself clear.

Avoid DISTINCTCOUNT in PowerPivot

Due to performance issues I need to remove a few distinct counts from my DAX. However, I have a particular scenario and I can't figure out how to do it.
As an example, let's say one or more restaurants can be hired at one or more feasts and prepare one or more menus (see data below).
I want a PowerPivot table that shows in how many feasts each restaurant was present (see table below). I achieved this by using DISTINCTCOUNT.
Why not precalculate this in Power Query? The real data I have is a bit more complex (more ID columns), and in order to be able to pivot the data I would have to precalculate thousands of possible combinations.
I tried adding a Feast dimension table to my model (in the example this would be only 1 column with 2 rows). I was hoping to use that relationship to make a straight count, but I haven't been able to come up with the right DAX to do so.
You could use COUNTROWS() combined with VALUES().
Specifically, COUNTROWS() gives you the count of rows in a table; that means COUNTROWS expects a table as input. Here's the magic part: VALUES() returns a table as its result, and the table it returns contains the distinct values of the column you provide as the argument to VALUES().
I'm not sure if I'm explaining it well, so for the sample data you provided, the measure would look like this (assuming the table is named Table1):
Unique Feasts:=COUNTROWS(VALUES('Table1'[Feast Id]))
You can then create a pivot table from PowerPivot, drag Restaurant Id into Rows, and drag the measure above into Values. Same result as DISTINCTCOUNT, but with less performance overhead (I think).
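As for the Feast dimension table idea from the question, one hedged possibility (assuming a Feast table related one-to-many to Table1 on Feast Id, and Excel 2016 or later, which is when CROSSFILTER became available) is a plain row count pushed through a bidirectional cross-filter:

-- Count dimension rows; the BOTH direction lets the restaurant filter
-- on Table1 propagate up to Feast, so only that restaurant's feasts count
Unique Feasts (dim) :=
CALCULATE (
    COUNTROWS ( Feast ),
    CROSSFILTER ( 'Table1'[Feast Id], Feast[Feast Id], BOTH )
)

Whether this actually beats DISTINCTCOUNT depends on the model, so it is worth profiling both.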

Wide rows vs Collections in Cassandra

I am trying to model many-to-many relationships in Cassandra, something like an Item-User relationship. A user can like many items and an item can be bought by many users. Let us also assume that the order in which the "like" events occur is not a concern, and that the most used query is simply returning the "likes" based on the item as well as the user.
There are a couple of posts discussing this kind of data modeling:
http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
An alternative would be to store a collection of ItemIDs in the User table to denote the items liked by that user, and to do something similar in the Items table in CQL3.
Questions
Are there any performance hits in using collections? I think they translate to composite columns, so the read pattern, caching and other factors should be similar?
Are collections less performant for write-heavy applications? Is updating a collection frequently less performant?
There are a couple of advantages of using wide rows over collections that I can think of:
The number of elements allowed in a collection is 65,535 (an unsigned short). If it's possible to have more than that many records in your collection, using wide rows is probably better, as the limit there is much higher (2 billion cells (rows * columns) per partition).
When reading a collection column, the entire collection is read every time. Compare this to a wide row, where you can limit the number of rows being read in your query, or constrain the query by clustering key (e.g. date > 2015-07-01).
For your particular use case, I think modeling an 'items_by_user' table would be a better fit than a list<item> column on a 'users' table.
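A hedged CQL3 sketch of that table, plus its mirror for the query-by-item side (column names and types are assumptions):

-- One partition per user; each liked item is a clustering row
CREATE TABLE items_by_user (
    user_id bigint,
    item_id bigint,
    PRIMARY KEY (user_id, item_id)
);

-- Mirror table so likes can also be queried by item
CREATE TABLE users_by_item (
    item_id bigint,
    user_id bigint,
    PRIMARY KEY (item_id, user_id)
);

Each like then costs two inserts (one per table), which is the usual Cassandra trade-off: duplicate on write so that every read stays within a single partition.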
