Power Query: Merge Only Unique Values Once - excel

I am trying to figure out how to go about merging two tables and only having the matching rows in the first table appear once with a unique value from the second table.
I have two tables. For the sake of this question, I put everything into one picture here. I have an Orders table which is green. I have several order numbers for specific parts for each order. Each order is sorted by a priority for the order; 1 being the highest priority which will need to be fulfilled first.
In blue is my Stock table. This is what is currently being held in the warehouse. Each part has a serial number and some parts are located on different shelf locations.
The non-colored table is what I want my end result to be using Power Query. The orders with the highest priority get filled with the first serial number available in the inventory. Then the next order is filled with the next serial number and so on until there is no inventory left; in which case the query will just show blank.
For the sake of the question, this is what the tables look like before my attempts to merge.
I have been trying all types of merges and sorting combinations, but no matter what I do I end up with duplicates, entire order rows removed, or incorrect priority fulfillment. I have a working array formula that I use in Excel, which I can post if it will help, but since we are moving towards Power Query, I really want to learn how to do this there if possible. I am sure there is something simple and logical about this task that I am overlooking, so any assistance would help. Thank you.

If you create a grouped index column you can then merge on that.
For both your Order table and your Stock table, perform a Group By operation on the PART# column.
New Column Name = All, Operation = All Rows
With the All column still grouped, add a new custom column named GROUP_INDEX. Use this code:
Table.AddIndexColumn([All],"Index",1)
You can now delete the All column. Expand the GROUP_INDEX column to expand the other columns (except for PART#)
Now you can merge both tables. Make sure your Order table is the left (upper) table. Merge on the PART# AND Index columns. Select Left Outer join.
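The logic of the grouped index and the two-column merge can be sketched outside Power Query. The following is only an illustration in Python, not the M code itself: the sample rows and the column names ORDER#, PRIORITY and SERIAL# are assumptions made up for the example (only PART# and Index come from the steps above).

```python
from collections import defaultdict

def add_group_index(rows, key):
    """Return copies of rows with a 1-based 'Index' that restarts for each key value."""
    counters = defaultdict(int)
    indexed = []
    for row in rows:
        counters[row[key]] += 1
        indexed.append({**row, "Index": counters[row[key]]})
    return indexed

# Hypothetical sample data; orders are assumed to be sorted by priority already.
orders = [
    {"ORDER#": "A1", "PART#": "P100", "PRIORITY": 1},
    {"ORDER#": "A2", "PART#": "P100", "PRIORITY": 2},
    {"ORDER#": "A3", "PART#": "P200", "PRIORITY": 3},
    {"ORDER#": "A4", "PART#": "P100", "PRIORITY": 4},
]
stock = [
    {"PART#": "P100", "SERIAL#": "S-001"},
    {"PART#": "P100", "SERIAL#": "S-002"},
    {"PART#": "P200", "SERIAL#": "S-101"},
]

# Index the stock per part, then key it on (PART#, Index) for the lookup.
stock_lookup = {
    (s["PART#"], s["Index"]): s["SERIAL#"]
    for s in add_group_index(stock, "PART#")
}

# Left outer join: every order row is kept; unmatched rows get None (blank).
merged = [
    {**o, "SERIAL#": stock_lookup.get((o["PART#"], o["Index"]))}
    for o in add_group_index(orders, "PART#")
]

for row in merged:
    print(row["ORDER#"], row["PART#"], row["SERIAL#"])
```

Because the index restarts for each part in both tables, the nth order for a part can only ever match the nth stock row for that part, which is what prevents duplicates.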

Related

Create calculated column in one table using data from another table in Spotfire?

Apologies for asking this, I see a few similar questions but unfortunately I don't really understand the answers given. I am fairly new to Spotfire.
I am trying to create a calculated column in one table based off of data from another table. The first table (which I am trying to add the new column to) is a monthly data census of all contracts the company has. Each contract has one row. Call it "monthly table". The second table is a complete transaction history for all contracts, so each contract can have hundreds of rows. Call it "total table".
I am not sure if I can link the data tables. I read in a few similar questions to insert columns from the table with data into the table where the new column will go, or to merge them beforehand, but the monthly table has about 13,000 rows compared to the total table's 550,000, so I am not sure how this would work.
I am trying to create a column to sum all the transaction amounts for each contract if the transaction type (also in the total table) is equal to a certain string. Like a "net transactions to date" column. I successfully created this column in the total table, but then each cell for a contract that meets the conditions had the transaction sum. I want it in the monthly table, where each contract just has one row, and it only displays the sum once. This is the code I have:
IF(
Concatenate([Contract_Number],"(transaction type string)",[Month],Year([Date]))
=[Type],
Sum([Amount]),
NULL)
It is currently in the total table. If possible I would like it in the monthly table, and for the [Number] and [Date] columns to refer to the monthly table, and [Type] and [Amount] to the total table.
Sorry if this is too much information or confusing! Also I know there is a problem with my summation (it is summing all transactions and only displaying in the correct rows, but it needs to only sum correct rows), but I think I can figure that one out, I just need help getting it in the right table.
I am currently working on transitioning a process/workbook from Excel to Spotfire.
If it would help I can provide the current formula used for this process in Excel.
After a few more hours of tinkering, I figured it out! I thought I would explain what I did here in case anyone else encounters a similar conundrum.
TL;DR: I made the desired column in the total table, then added the new column to the monthly table with a left single join on the contract number.
My month and date are tied to a document property which is a drop-down of unique date values in my monthly table. Within the total table, I used a similar code to what I've typed above, but instead of referencing the Date column directly, I referenced the document property, so it was pulling the date from the monthly table into the total table. I also switched from the form IF(condition, SUM()) to SUM(IF()) OVER (). This ultimately summed the correct values. For example, there might be five different types of transactions for each contract, but I only wanted the sum of two types. This resulted in the correct sum being displayed. The sum was displayed in every single cell corresponding to the correct contract number in the new column, so (in the same example) in all 5 contract #1 rows, the sum of the two correct types was displayed.
Then, I went to the data canvas for my monthly table, and added the new column. I chose a left single join, as each contract had only one row in the new table, so that the correct sum would only be displayed once.
End result code:
SUM(IF(Concatenate([Contract_Number],"type-string",[Month_ValDt],[Year_ValDt]) =
[Type], [Amount],0)) OVER ([Contract_Number])
Where [Month_ValDt] and [Year_ValDt] are new columns I made in the total table that display the month and year from the document property that is tied to the date in the monthly table.
Reasoning for the property is that we have a few years of data but I was told to make it dynamic so only one month of data is visible at a time, hence the drop-down.
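For readers who find the SUM(IF()) OVER pattern followed by a left single join hard to picture, here is a rough Python sketch of the same two steps. It is not Spotfire code, and it simplifies the condition: instead of comparing a concatenated key to [Type], it checks the type against a hypothetical set called wanted_types. All sample values are invented.

```python
from collections import defaultdict

# Hypothetical "total table": many transaction rows per contract.
total_table = [
    {"Contract_Number": "C1", "Type": "deposit", "Amount": 100.0},
    {"Contract_Number": "C1", "Type": "fee", "Amount": -5.0},
    {"Contract_Number": "C1", "Type": "withdrawal", "Amount": -30.0},
    {"Contract_Number": "C2", "Type": "deposit", "Amount": 40.0},
    {"Contract_Number": "C2", "Type": "fee", "Amount": -2.0},
]

# Assumed stand-in for the type condition in the Spotfire expression.
wanted_types = {"deposit", "withdrawal"}

# SUM(IF(cond, [Amount], 0)) OVER ([Contract_Number]):
# non-matching rows contribute 0, so only the wanted types are summed.
net_by_contract = defaultdict(float)
for row in total_table:
    contribution = row["Amount"] if row["Type"] in wanted_types else 0.0
    net_by_contract[row["Contract_Number"]] += contribution

# Hypothetical "monthly table": exactly one row per contract.
monthly_table = [
    {"Contract_Number": "C1"},
    {"Contract_Number": "C2"},
    {"Contract_Number": "C3"},  # no transactions at all
]

# Left single join: each monthly row picks up one value, or None if unmatched.
joined = [
    {**m, "Net_Transactions": net_by_contract.get(m["Contract_Number"])}
    for m in monthly_table
]

for row in joined:
    print(row)
```

Putting the condition inside the sum, rather than around it, is what keeps the unwanted transaction types out of the total.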

How to get distinct count within pivot table(Excel for Mac) having filters?

Excel for Mac doesn't support Power Pivot and thereby doesn't have distinct count feature.
What is the best workaround to get distinct count in such cases?
Sample Excel Columns:
Period Criteria1 Criteria2 Criteria3 Data
Sample Pivot table:
Different values in 'Period' will be pivot columns.
'Criteria1' can be a filter to pivot table.
Both 'Criteria2'&'Criteria3' columns can be pivot rows.
Now, count of 'Data' can be obtained directly through pivot.
How to obtain distinct count of 'Data' ?
Answer Options
Using 'Countif' on raw data - Cons: Very slow on large data.
Counting unique keys made by concatenating Criteria columns - Cons: Gets complex and takes more effort in large data with many criteria columns
Are there any better workarounds to obtain a distinct count within a pivot table (Excel for Mac) that has filters or multiple criteria?
I think I had a comparable problem and here's how I "fixed" it.
1. Add a column to the table named "DistinctValue" (or "Crit2:Crit3"; the name doesn't matter).
2. Add a formula that concatenates the values from all fields you use as rows in your pivot table: =[@Criteria2]&":"&[@Criteria3]. The separator depends a little on your values; for me, ":" works fine. A space or no separator may also work.
3. Add a column "Distinct".
4. Add a formula: =IF(MATCH([@DistinctValue],Y:Y,0)=ROW([@DistinctValue]), 1, 0) (this assumes Y:Y is the column holding DistinctValue).
5. Use this column in the pivot as a value with SUM and name it "Count".
This gives you a 1 for the first row of each distinct value in the data table behind your pivot. Each section of rows used by the pivot table should then contain exactly one row with a 1, so summing the column gives exactly the distinct count.
If you add a new row field to the pivot, you need to add it to the formula in step 2 to get distinct values again.
Edit: You probably have to exchange , with ; for other languages in the formulas when also translating the formula names to German for instance.
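The MATCH(...)=ROW(...) test above is a "first occurrence" flag. A minimal Python sketch of the same idea follows; the key strings are invented sample values standing in for the DistinctValue column, and the function name is hypothetical.

```python
def first_occurrence_flags(keys):
    """Return 1 for the first time each key appears and 0 for every repeat."""
    seen = set()
    flags = []
    for key in keys:
        flags.append(0 if key in seen else 1)
        seen.add(key)
    return flags

# Hypothetical concatenated keys, e.g. Criteria2 & ":" & Criteria3.
distinct_values = ["A:x", "A:y", "A:x", "B:x", "A:y"]

flags = first_occurrence_flags(distinct_values)
distinct_count = sum(flags)  # summing the flags is what the pivot's SUM does

print(flags, distinct_count)
```

Note that this sketch tracks seen keys in a set, whereas the worksheet formula searches the column from the top for every row, which is why the Excel version slows down on large data.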
The following solution has a few steps, but the payoff is that it scales very nicely, because the formula is just as performant on large sets as on small ones, as opposed to lookup/match which get slower the larger the set is.
(1) Custom sort on the fields with the duplicate values, e.g. "Email Address".
n.b. If you prefer to count a particular instance (i.e. the record with the latest creation date), set the sort so that those duplicates will appear first.
(2) Create a new column, Call it Unique Count or what have you. In that column use a formula that is 1 if the preceding value and current value are not equal. E.g. =IF(EXACT(A2,A3),0,1)
(3) Fill down.
(4) Pivot on this table. Now when you do Count/Sum, use the Unique Count.
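Steps (1) to (3) can be sketched in Python as follows. The e-mail addresses are made-up sample data; the point is only that, once the values are sorted, each row needs to look at just one neighbour.

```python
# Hypothetical values from the column that contains duplicates.
emails = [
    "b@example.com",
    "a@example.com",
    "b@example.com",
    "c@example.com",
    "a@example.com",
]

# (1) Sort so that duplicates sit next to each other.
ordered = sorted(emails)

# (2)+(3) Flag a row with 1 when it differs from the row before it.
# The first row has no predecessor, so it always counts as new.
unique_count_column = [
    1 if i == 0 or ordered[i] != ordered[i - 1] else 0
    for i in range(len(ordered))
]

# (4) Summing the flag column gives the distinct count.
print(unique_count_column, sum(unique_count_column))
```

Each comparison touches only the adjacent row, so the cost of the flag column grows linearly with the data, in contrast to a lookup that rescans the column for every row.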

Combining Non-Zero Values from Multiple Columns into One in Excel

Every month I get given a budget from one of our clients in a Google sheet, which I need to convert into a SQL query so it can be uploaded into our database. As the number of rows and columns changes, I want to write some formula to semi-automate the process for time saving and mistake elimination.
This budget has spends in multiple columns, which I've managed to write formulas to combine into one column, with the correct details in the columns next to it (see example links below).
How I've transformed the data so far
The issue is this budget per country and partner, then has to be split again across multiple options. This leaves me with three columns worth of spend values, that I'd really like to combine back into one column, and ideally skip out all the zero values.
I've found an array formula on this site that will skip the zeroes, but I can't get it to work on more than one column.
=IFERROR(INDEX($U:$U,SMALL(ROW(myRange)*(myRange<>0),SUMPRODUCT(N(myRange=0))+ROWS($1:1))),"")
From this Question's Answer
Is it possible to write a formula, that skips the zero values down one column, and then starts at the next? And that will also allow me to keep the correct matching details from the other columns alongside it, as well as bring in the column headers for the options as entries in a new column?
Thanks
Edit:
Here is the final format I'm looking for:
There is a concatenated field off the end that combines all the columns. Most of the values are populated by various Vlookups, to transform from the text version, into the database IDs, needed to fill the table.
It's also worth saying, that not being able to skip the zeros, is OK, as I can manually delete them fairly easily.
But as the number of countries and partners can and will change, I want the formula to be able to move column at the end of the dataset.
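The reshaping asked for here is essentially an "unpivot": walk down one spend column, skip the zeros, then move on to the next column, carrying the detail columns and the column header along. A rough Python sketch of that target shape, with entirely hypothetical column names and figures (the real sheet's headers are not shown in the question), might look like this:

```python
# Hypothetical option columns and budget rows; real headers will differ.
option_columns = ["Option A", "Option B", "Option C"]
budget = [
    {"Country": "FR", "Partner": "P1", "Option A": 100, "Option B": 0, "Option C": 50},
    {"Country": "DE", "Partner": "P2", "Option A": 0, "Option B": 75, "Option C": 0},
]

# Go column by column, then row by row, keeping only non-zero spends.
# The column header becomes a value in the new "Option" field.
long_rows = [
    {
        "Country": row["Country"],
        "Partner": row["Partner"],
        "Option": column,
        "Spend": row[column],
    }
    for column in option_columns
    for row in budget
    if row[column] != 0
]

for r in long_rows:
    print(r)
```

Because the list of option columns is data rather than hard-coded logic, adding a country, partner or option only changes the inputs, not the procedure.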

Count values for each row with a unique ID

I have a bunch of rows in a table. Each row reflects an event in a patient. However, one patient can have experienced multiple events, so there can be multiple rows with the same patient number. Now I'd like to count the number of male patients in my database, counting each patient only once even if they had multiple events. Each patient is identified by a unique patient ID that could be used for this.
This shouldn't be all that complicated if not for the fact that I'm using a table that also has several filters, so I need to use SUBTOTAL for any counting functions.
I literally have no idea where to start, so I can't really provide any code...
Any function that could point me in the right direction would be greatly appreciated.
Thanks for the help.
~Laurens
Use a Pivot Table to filter and count the patients in your data. Select your data and choose Insert -> Tables -> PivotTable. Put your filters in the Filters section of the table and the Patient ID in the Rows section. Then you can use COUNT to get the number of patients.
For more information about Pivot Tables, you can check this: https://support.office.com/en-us/article/Create-a-PivotTable-to-analyze-worksheet-data-a9a84538-bfe9-40a9-a8e9-f99134456576
To get the number of unique IDs in the same column, if the IDs are numeric, you can use SUM with FREQUENCY:
=SUM(IF(FREQUENCY($A$1:$A$1000,$A$1:$A$1000)>0,1))
If they're text and numbers mixed, you can get unique IDs with this one:
=SUM(IF(FREQUENCY(MATCH($A$1:$A$1000,$A$1:$A$1000,0),MATCH($A$1:$A$1000,$A$1:$A$1000,0))>0,1))
(From here)
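To see why the MATCH/FREQUENCY formula counts unique IDs, it can help to replay it step by step. The sketch below does that in Python with invented patient IDs: MATCH(...,0) turns every ID into the position of its first occurrence, and FREQUENCY then counts how often each position appears; every position with a count above zero is one distinct ID.

```python
from collections import Counter

# Hypothetical ID column with repeats (text IDs, as in the second formula).
patient_ids = ["P1", "P2", "P1", "P3", "P2"]

# Like MATCH(range, range, 0): position of the first occurrence of each value.
first_positions = [patient_ids.index(value) for value in patient_ids]

# Like FREQUENCY(positions, positions): how often each position occurs.
frequency = Counter(first_positions)

# Like SUM(IF(frequency > 0, 1)): one per distinct first position.
unique_ids = sum(1 for count in frequency.values() if count > 0)

print(first_positions, unique_ids)
```

The result matches a plain distinct count of the IDs; the detour through positions exists only because FREQUENCY works on numbers, not text.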
Here you go
You haven't mentioned whether an event is optional.
If it is, you might want to add an extra column H holding 1 or 0, with a formula like H2=IF(C2="",0,1), and multiply it into G as well.
Basically if column G contains a 1 you include it
Here's what the results of the formula look like:
Revision
The table is sorted by patient ID.
Column H contains a 1 wherever the patient ID changes and a 0 otherwise.
So H2 is hard-coded to 1, and H3, H4 and H6 will evaluate to 1.
So now G2=H2*E2, and so on. You can filter by column H.
The beauty of mapping things into binary zeros and ones is that you can use multiplication to achieve a logical AND, while at the same time breaking a complex task into a series of steps. You can then apply a filter to the data to get the rows where column G is not zero, and see the total count. Normally I'd insert a row between the header and the data, on row 2, and then have G2=SUM(G3:G9).
Sum column H for number of patients.

Count number of unique combinations of two columns

I have a spreadsheet of statistics from sports games over a season, for different leagues - each row holds a single event that happened in a game, such as a penalty. There are many rows of events for each individual game. One of the columns is the league, another is the home team and another is the away team. How can I count the total number of games in a given league? In other words, I would need to count the number of unique pairs of strings from Home and Away, where League = "Ligue 1".
EDIT
I have tried:
=SUMPRODUCT(1/(COUNTIFS(E2:E81078,"Ligue 1",F2:F81078,F2:F81078,G2:G81078,G2:G81078)))
which returns a #DIV/0! error (it does work if I don't include the column E = "Ligue 1" criterion).
This is similar to your formula but deals with the division by zero
=SUM(IFERROR((1/COUNTIFS(E2:E81078,"Ligue 1",F2:F81078,F2:F81078,G2:G81078,G2:G81078)),0))
Enter it with Ctrl+Shift+Enter rather than just Enter. If done correctly you will see {} around the formula
Try not to use ranges that are bigger than your data, because that slows these kinds of formulas down significantly.
Update
This might also work if your data is ordered the way you show in your question. It counts the number of times the home team changes in Ligue 1 data :
=SUMPRODUCT((F3:F81079<>F2:F81078)*(E2:E81078="Ligue 1"))
Note that the ranges in column F are offset by one row
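The 1/COUNTIFS trick in the first formula works because every row contributes one divided by the number of rows sharing its combination, so each combination adds up to exactly 1. Rows whose count is zero are the ones that raise the division error, and IFERROR turns them into 0. A small Python sketch of that arithmetic, using invented fixtures and exact fractions to avoid rounding noise:

```python
from collections import Counter
from fractions import Fraction

# Hypothetical event rows: (league, home team, away team).
events = [
    ("Ligue 1", "PSG", "Lyon"),
    ("Ligue 1", "PSG", "Lyon"),
    ("Ligue 1", "Lyon", "PSG"),
    ("Serie A", "Roma", "Lazio"),
    ("Ligue 1", "Nice", "Lille"),
    ("Ligue 1", "Nice", "Lille"),
    ("Ligue 1", "Nice", "Lille"),
]
league = "Ligue 1"

# COUNTIFS(E,"Ligue 1",F,home,G,away): rows in the league with this pairing.
pair_counts = Counter((home, away) for lg, home, away in events if lg == league)

games = Fraction(0)
for _, home, away in events:
    n = pair_counts.get((home, away), 0)
    if n:  # n == 0 is the #DIV/0! case that IFERROR replaces with 0
        games += Fraction(1, n)

print(games)
```

In this sample the three Ligue 1 pairings each sum to 1, and the Serie A row is the zero-count case that is skipped.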
You can do this with a Pivot Table.
Add a "helper" column where you concatenate the two teams, preferably with a delimiter in between, eg:
=CONCATENATE(B2, "|", C2)
Use, for example Teams for the column header
Then, Insert ► Pivot Table and be sure to select to Add to Data Model
This adds the option for Distinct Counts to the Values Settings
Then Drag "league" to the Rows area, "Teams" to the Values area, and select Distinct Count for the Value Setting
You might get a table similar to below, which you can format in many different ways:
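What the helper column plus Distinct Count does can be sketched in Python as follows. The fixtures are invented sample data; the small check at the end illustrates why a delimiter between the two team names is preferable, since without one two different pairings could produce the same combined text.

```python
from collections import defaultdict

# Hypothetical event rows: (league, home team, away team).
events = [
    ("Ligue 1", "PSG", "Lyon"),
    ("Ligue 1", "PSG", "Lyon"),
    ("Ligue 1", "Lyon", "PSG"),
    ("Serie A", "Roma", "Lazio"),
    ("Ligue 1", "Nice", "Lille"),
]

# Helper column: home and away joined with "|", collected per league.
teams_by_league = defaultdict(set)
for league, home, away in events:
    teams_by_league[league].add(f"{home}|{away}")

# Distinct Count of the helper column, grouped by league.
distinct_games = {league: len(keys) for league, keys in teams_by_league.items()}
print(distinct_games)

# Without a delimiter, different pairs can collide: "AB"+"C" equals "A"+"BC".
collides_without_delimiter = ("AB" + "C") == ("A" + "BC")
distinct_with_delimiter = ("AB" + "|" + "C") != ("A" + "|" + "BC")
```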
Try this: =SUMPRODUCT(1/COUNTIFS($B$1:$B$7,B1:B7,$C$1:$C$7,C1:C7))
