Power BI - Anti Left join - powerbi-desktop

Data:
I have two datasets set up in Excel as matrices: the first column is an ID with lots of rows, and the remaining columns have headers that match one-to-one between the two datasets, so roughly 500 rows and around 45 columns.
Like ID, ColumnB, ColumnC
The other matrix has the same headers, but different order. It does not seem to matter.
Challenge:
So I need to find the differences between the two. I made a left anti join on ID, which gives me the IDs that are in one dataset and not the other, right? I make one for each direction, so I get the IDs that are missing from each respective dataset (/matrix).
I need to do the same trick even when both IDs are present, so that I get only the rows with a difference in any column. For example, if a row has an "X" in ColumnB in dataset 1 but no "X" in ColumnB in dataset 2, I want to include it in my new table. If the two compared rows differ in even one of the columns, I need to know, and only the rows with a difference should end up in my new data.
Tried:
I tried selecting not only the ID column but all of the columns in the left anti join setup, but it does not seem to work at all.
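For what it's worth, the "anti join on every column" idea is sound: merging on all columns at once means a row only matches when every cell agrees. Here is a minimal pandas sketch of that logic with made-up data (Power Query's Merge dialog should behave the same way when all columns are Ctrl-selected on both sides with a Left Anti join kind):

```python
import pandas as pd

# Toy stand-ins for the two 500-row / 45-column matrices.
ds1 = pd.DataFrame({"ID": [1, 2, 3],
                    "ColumnB": ["X", "", "X"],
                    "ColumnC": ["", "X", ""]})
ds2 = pd.DataFrame({"ID": [1, 2, 4],
                    "ColumnB": ["X", "X", ""],
                    "ColumnC": ["", "X", "X"]})

# Merge on ALL columns: a row only counts as a match when every cell
# agrees. indicator=True adds a _merge column saying where each row
# came from, which gives both anti joins in one pass.
merged = ds1.merge(ds2, on=list(ds1.columns), how="outer", indicator=True)

diff_in_ds1 = merged[merged["_merge"] == "left_only"]    # no identical row in ds2
diff_in_ds2 = merged[merged["_merge"] == "right_only"]   # no identical row in ds1
```

Here IDs 2 and 3 come back for dataset 1 (ID 2 differs in a column, ID 3 is missing from dataset 2), which is exactly the "any difference in any column" behaviour described above.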

Related

Power BI - RLS: adding label to a visual and aggregate values

I need to set some RLS rules on some tables and columns, and for a specific matrix I want to show all the other values that are not visible to that specific user, summed up together as "Others". Let me give you an example.
I have a matrix that has different product categories on the rows and columns, and the values are the impact in $ that each category has on the others (the sum of multiple rows under the IMPACT column in the source table). In particular, the rows are the impacted products while the columns are the impacting ones.
The Matrix looks like this:
I want to create an additional label, OTHERS, displayed both as a column and as a row, which sums up all the non-shown values. For instance, for the RED RLS categories I want to get a matrix like this:
FYI, I want to keep the totals of the rows and columns.
Does anybody have any ideas?
I tried to create some measures and calculated tables but I was not able to get the final view I want.
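In Power BI itself this would need a calculated table or measures, but the reshaping being asked for is easy to pin down in isolation: relabel every hidden category as OTHERS on both axes, then pivot with totals. A pandas sketch with made-up data, just to show the intended aggregation:

```python
import pandas as pd

# Hypothetical source table: one row per impacting -> impacted pair.
df = pd.DataFrame({
    "impacted":  ["RED", "RED", "BLUE", "GREEN", "BLUE"],
    "impacting": ["BLUE", "GREEN", "RED", "RED", "GREEN"],
    "IMPACT":    [10, 20, 5, 15, 25],
})

visible = {"RED"}  # categories this user's RLS role may see

# Relabel every hidden category as OTHERS on both axes; the pivot then
# collapses all hidden rows/columns into a single OTHERS row and column,
# while margins=True keeps the row/column totals.
for col in ("impacted", "impacting"):
    df[col] = df[col].where(df[col].isin(visible), "OTHERS")

matrix = df.pivot_table(index="impacted", columns="impacting",
                        values="IMPACT", aggfunc="sum", fill_value=0,
                        margins=True, margins_name="Total")
```

All of the RED row and column values survive individually, everything hidden collapses into OTHERS, and the Total margins match the unfiltered grand total.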

Spark - partitioning/bucketing of n-tables with overlapping but not identical ids

I'm currently trying to optimize a query joining 2 rather large tables, which are characterized like this:
Table 1: id column - alphanumerical, about 300mil unique ids, more than 1bil rows overall
Table 2: id column - identical semantics, about 200mil unique ids, more than 1bil rows overall
Let's say on a given day (17.03) I want to join those two tables on id.
Table 1 is left, table 2 is right, and I get about 90% matches, meaning table 2 has about 90% of the ids present in table 1.
One week later, table 1 did not change (it could, but to keep the explanation simple, assume it didn't) while table 2 was updated and now contains more records. I do the join again, and some of the formerly missing ids now show up, so I get about 95% matches.
In general, table1.id has some matches with table2.id at a given time, and this can change on a day-by-day basis.
I now want to optimize this join and came across the bucketing feature. Is this possible?
Example:
1st join: id "ABC123" is present in table1 but not in table2. ABC123 gets sorted into a certain bucket, e.g. "1".
2nd join (a week later): id "ABC123" has now shown up in table2; how can I ensure it goes into the same bucket in table 2, which is then co-located with table 1?
Or do I have a general misunderstanding of how it works?
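Conceptually, bucketing guarantees this by itself: the bucket is a pure function of the id value (Spark hashes the id and takes it modulo the bucket count), so an id lands in the same bucket regardless of which table it appears in or when it arrives. A dependency-free Python sketch of that property (CRC32 stands in for Spark's internal Murmur3 hash, and the bucket count is hypothetical; both tables must be bucketed with the same count and columns):

```python
import zlib

N_BUCKETS = 8  # hypothetical; both tables must use the same bucket count

def bucket_for(record_id: str, n_buckets: int = N_BUCKETS) -> int:
    # The bucket is a pure function of the id value: hash it, take the
    # modulo. Spark uses Murmur3 internally; CRC32 stands in here just
    # to keep the sketch dependency-free. Nothing about the bucket
    # depends on which table the id sits in, or on when it was loaded.
    return zlib.crc32(record_id.encode("utf-8")) % n_buckets

# "ABC123" lands in the same bucket whether it is in table 1 today or
# first shows up in table 2 a week later, so a bucketed join of the two
# tables stays co-located without any coordination between loads.
table1_bucket = bucket_for("ABC123")
table2_bucket = bucket_for("ABC123")
```

So there is no need to "ensure" anything across loads: as long as both tables are written with the same bucketing spec on the id column, a newly arriving id is routed to the matching bucket automatically.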

Python - Join/ Merge Based on multiple column match

I would like to join two data frames based on multiple columns because there are duplicate IDs in the data sets.
I have tried a few ways, one of which is listed below.
However, I cannot get it right. The option below gives me all rows from both data frames. I figure this should be easy, but for some reason it is not working.
I checked the results. There are matches, and instead of joining on the match, I just get both rows in the final data frame.
I am comparing two different data sets to ensure the same data exists in both sets. There can be more than one transaction with the same ID, but I need to make sure that everything that exists in one data frame also exists in the other.
new_df = Enterprise.merge(
    Tableau,
    left_on=['ID', 'AID', 'Amount', 'Tax', 'CC'],
    right_on=['ID', 'AID', 'Amount', 'Tax', 'CC'],
    how='left')
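One thing that often helps for this "does the same data exist on both sides" check: an outer merge with indicator=True labels every row as both, left_only, or right_only, so the mismatches can be filtered out directly (deduplicate first if repeated transactions should only count once). A sketch with made-up columns and data:

```python
import pandas as pd

# Made-up stand-ins for the Enterprise and Tableau frames.
enterprise = pd.DataFrame({"ID": [1, 1, 2, 3],
                           "Amount": [10.0, 10.0, 20.0, 30.0]})
tableau = pd.DataFrame({"ID": [1, 2, 4],
                        "Amount": [10.0, 20.0, 40.0]})

# An outer merge with indicator=True tags every row as 'both',
# 'left_only' or 'right_only'; filtering out 'both' leaves exactly
# the records that are missing from one side or the other.
merged = enterprise.merge(tableau,
                          on=["ID", "Amount"],  # same names on both sides
                          how="outer",
                          indicator=True)
mismatches = merged[merged["_merge"] != "both"]
```

Note that with how='left' and duplicate keys, matched rows are multiplied rather than collapsed, which is why the original attempt appeared to return everything; a drop_duplicates() on the key columns before merging avoids that.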

How to merge 2 BIG Tables into 1 adding up existing values with PowerQuery

I have 2 big tables (the 1st has 690K rows, the 2nd has 890K rows).
They have the same format and columns:
Username - Points - Bonuses - COLUMN D... COLUMN - K.
Let's say in the first table I have the "Original" usernames, and in the 2nd table I have "New" usernames plus some of the "Original" usernames (so people who are still playing plus people who are new to the game).
What I'm trying to do is merge them so that I have their values summed up in a single table.
I've already made my tables proper System Tables.
I created their connection in the workbook.
I've tried to merge them, but I keep getting fewer rows than I expect, so some records are being left out or not being summed.
I've tried Left Outer, Right Outer, Full Outer with no success.
This is where I'm standing:
As #Jenn said, I had to append the tables instead of merging them, and I also used a filter inside PowerQuery to remove all blanks/zeros before loading into Excel. I was left with 500K unique rows instead of 1.6 million. Thanks for the comment!
I would append the tables, as indicated above. First load each table separately into PowerQuery, and then append one table into the other one. The column names look a little long and it may make sense to simplify the column names so that the system doesn't read them as different columns due to an inadvertent typo.
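The append-then-aggregate approach described above can be pinned down outside Power Query too. Here is the same idea as a pandas sketch with hypothetical columns (in Power Query, the equivalent of the groupby is a Group By step applied after the Append):

```python
import pandas as pd

# Hypothetical slices of the two player tables.
original = pd.DataFrame({"Username": ["ann", "bob"],
                         "Points": [100, 50],
                         "Bonuses": [5, 2]})
new = pd.DataFrame({"Username": ["bob", "cara"],
                    "Points": [30, 70],
                    "Bonuses": [1, 4]})

# Append (stack) the tables, then group by Username and sum: returning
# players get their old and new values added together, while players
# present in only one table keep their single row.
combined = pd.concat([original, new], ignore_index=True)
totals = combined.groupby("Username", as_index=False).sum()
```

This is why a merge (join) keeps losing rows: a join pairs rows side by side and drops or blanks the non-matches, whereas append + group-by-sum keeps every username exactly once.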

Pivot Table with multiple rows all having the same level hierarchy

I have imported a bunch of data using PowerQuery into a single table and am building dashboard reporting. I have been using Pivot Tables to build my reports, which has worked fine so far.
However, I've come to a point where I want to simply show the count of multiple columns (calculated fields). So I have columns A, B, C, D, and want to show the count of each. But I don't want them to be subsets (or children) of one another, and I don't want to build a bunch of Pivot Tables (the file is already getting pretty big, and I want them row by row for easy viewing). Any suggestions?
Also, I am using the "Columns" field already to show the counts by certain weeks (week one, week two, etc.).
Thanks,
-A
Thanks for the follow-up. Within PowerPivot, I have four calculated fields/columns that are True/False for each column. I want to know how many times each of those columns was marked "True" (I can rename the "True" field to distinguish which field it's referencing). But I don't want four pivot tables. Right now I can only think of making four pivot tables, filtering out the false values for each one, then hiding the rows so the "True" values stack on top of one another. If I put all four fields together in the same Pivot, the three below the first become subsets. I don't want subsets, just occurrence counts.
Does this help provide clarification?
If I understand you correctly, here's an example that shows what you're trying to achieve:
The table on the left has the TRUE/FALSE entries and the PivotTable on the right just shows the number of true items in each of those columns.
The format of the DAX measure to produce these count totals is:
Count of A := CALCULATE(COUNTROWS(PetFacts), PetFacts[A] = TRUE())
(Apologies to any parrot owners who may get upset that I have inadvertently re-classified their pets as cold-blooded!)
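An alternative to four filtered pivot tables is to unpivot the flag columns first, so each field becomes a row label at the same level of the hierarchy. Here is that reshaping as a pandas sketch with made-up data (in Power Query, this corresponds to selecting the four columns and choosing Unpivot Columns):

```python
import pandas as pd

# Made-up stand-in for the PetFacts table with four True/False flags.
pet_facts = pd.DataFrame({
    "Pet": ["cat", "dog", "parrot"],
    "A": [True, False, True],
    "B": [True, True, False],
    "C": [False, False, False],
    "D": [True, True, True],
})

# Unpivot the flag columns into (Field, Flag) pairs, then count the
# True entries per field: one row per field, no nesting or subsets.
long = pet_facts.melt(id_vars="Pet", value_vars=["A", "B", "C", "D"],
                      var_name="Field", value_name="Flag")
true_counts = long.groupby("Field")["Flag"].sum()
```

Because A, B, C and D all live in a single "Field" column after the unpivot, a plain pivot on that column lists them row by row at the same level, which is exactly the non-nested layout asked for.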
