How do I lock an additional column to rows imported from Power Query in Excel 2016 without a unique key column? - excel

I am using Power Query in Excel 2016 to combine data from 12 different workbooks in the same folder into one table. I need to add a column to the master table that tracks the status of each row. However, when I refresh the data, the values I entered manually in the Status column no longer stay with the rows they were entered against.
I have already looked at "Inserting text manually in a custom column and should be visible on refresh of the report", but that solution only works with a unique ID column. Each of the 12 workbooks is edited separately, and no single column is guaranteed to have unique values across all the spreadsheets, so I don't have a key to join the data to the additional column.

I believe there is almost always a way to build a unique ID. Once you get your head around this, the problem is not that difficult to solve.
In my example below, I used three sample workbooks saved in a Test folder. The exact steps depend on how you add them to the Query Editor. I used From Folder, followed the prompts without changing anything, and let Power Query combine the tables automatically. Once combined, a Source.Name column is added automatically. I suggest keeping this column in your output table, because it can form part of the unique ID if your data is very similar across the workbooks.
An optional step (not in my screenshot) is to add an Index column and concatenate the index number with a product/task name, so that each row is even more likely to be unique.
Once you have added the Status column and entered data manually in the master table, load the master table back into the Query Editor.
Then go back to the original query (Test (Input) in my example) and merge it with the reloaded output query. See my screenshot for how to merge the two tables on the columns that together form the unique ID.
The rest is self-explanatory. The key is identifying the elements of the unique ID and incorporating them into the merge.
Let me know if you have any questions. Cheers :)
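To make the idea concrete, here is a rough Python sketch (standard library only, so it illustrates the logic rather than Power Query M). It builds a composite key from Source.Name, a row index, and a task column, then left-joins the refreshed rows to the previously saved output on that key so each manually entered Status follows its row. The Task column name is hypothetical. Unlike the answer, which adds one Index column to the combined table, this sketch numbers rows within each source file (an assumption on my part), so adding a row to one workbook does not shift the keys of rows in the others. Any positional index still breaks if rows are inserted, deleted, or re-sorted within a file, so real data columns are the safer parts of the key.

```python
from collections import defaultdict


def add_keys(rows, key_cols):
    """Return copies of rows with a composite Key made of Source.Name, a
    row number within that source file, and the values of key_cols."""
    counters = defaultdict(int)  # next row number for each source file
    keyed = []
    for row in rows:
        source = row["Source.Name"]
        index = counters[source]  # restarts at 0 in every source file
        counters[source] += 1
        parts = [source, str(index)] + [str(row[c]) for c in key_cols]
        keyed.append({**row, "Key": "|".join(parts)})
    return keyed


def carry_status(fresh_rows, previous_output, key_cols):
    """Left-join the refreshed rows to the previous output on Key so each
    manually entered Status stays with its row; unmatched rows get None."""
    status_by_key = {r["Key"]: r.get("Status") for r in previous_output}
    return [
        {**row, "Status": status_by_key.get(row["Key"])}
        for row in add_keys(fresh_rows, key_cols)
    ]
```

Rows whose key has no match (new rows, or rows whose key changed) come back with an empty Status. This mirrors what a Left Outer merge does in Power Query.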

Related

Excel - List of key values created from external files in power query, trouble with editing mapped values

I am attempting to create a standardized list of names for a long list of free-typed values in a set of CSVs pulled from Jira.
So far I have tried using Get Data -> From File -> From Folder, then narrowing it down to just the column I need and removing all duplicate rows.
After loading that, I tried adding a column that contains just an empty string. I have done this both in Power Query and in the Data Model, with the same effect. I want the second column so the user can map the values in the key column on a worksheet. This table will be used as a map for pivot tables to standardize names. When I update a value in the worksheet and then refresh to see that change in the Data Model, the value just reverts to an empty string.
Obviously I'm going about this the wrong way. The goal is to maintain this key/value map over the months as new keys are added, so that only the new entries need mapping, rather than doing a lot of work comparing every time to see what's new. Is there a better way to achieve this while keeping it expandable over the months, without having to redo the entire workbook?

How to get specific cell contents of an Excel sheet at another sheet dynamically?

I have an Excel file with thousands of rows, as follows. The first column contains names and the second column contains each name's group. I want to list all names that belong to group "A" on another sheet dynamically, because the name and group lists may change. In other words, what command or function should I use to list all names that belong to group "A"?
There are three ways to do this, listed below. One thing your question does not say is what the results should look like.
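1. Formulas such as =FILTER(A:B,B:B="A"). FILTER is a dynamic-array function, so it is only available in newer versions such as Microsoft 365 and Excel 2021, not in Excel 2016.
2. Pivot tables: to use this, convert the data to a table, then create a pivot table. This requires a refresh when new data is added.
3. Power Query: to use this, convert the data to a table, then go to Data > From Table/Range. This also requires a refresh when data is added, but you can change the refresh settings in Connection Properties under the Data > Refresh All drop-down.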
If instead you want all names in group "A" listed within one cell, I would use option 3 with Group By, as discussed here.
If the answer works for you, the expectation is that you accept (checkmark) it and upvote it. If it does not work for you, add a comment below describing the problem you run into. You will need to adapt the answer to your situation.
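The two result shapes, a filtered list of names and one concatenated cell per group, can be sketched in Python as follows. This uses the standard library only and illustrates the logic rather than Excel or Power Query code. The sample names are made up. In Power Query itself, the one-cell version is commonly built with Group By, followed by combining each group's names with Text.Combine.

```python
def names_in_group(rows, group):
    """Like =FILTER(A:A, B:B=<group>): every name whose group equals `group`."""
    return [name for name, g in rows if g == group]


def names_by_group(rows, sep=", "):
    """Like a Group By that joins each group's names into a single text value."""
    grouped = {}
    for name, g in rows:
        grouped.setdefault(g, []).append(name)  # keeps the original row order
    return {g: sep.join(names) for g, names in grouped.items()}


rows = [("Ann", "A"), ("Bob", "B"), ("Cy", "A")]  # (name, group) sample data
print(names_in_group(rows, "A"))  # ['Ann', 'Cy']
print(names_by_group(rows))       # {'A': 'Ann, Cy', 'B': 'Bob'}
```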

Excel: find and order matches by column

I'm currently working with a huge epidemiological dataset spread across several Excel files. The files contain pathology and clinical reports for almost 30k patients. Each patient can have several pathology and clinical reports, and each patient is assigned a unique ID.
I want to combine all the files into one so that the ID for patient X001 contains all the information from all the files. I cannot just copy/paste, because the number of rows (IDs) varies between files.
Here is an example of what I want to accomplish.
I want to combine two lists as follows.
As you can see, List1 and List2 have different numbers of rows. Also, some IDs in List1 are not found in List2, and vice versa.
I want to merge them so that they align and match (see image below). Can someone provide code for this? I cannot do this manually, since I have 100k rows in List1 and 30k rows in List2; that would take several weeks and risk errors.
You can merge the tables using Excel's built-in Power Query, which can be found under the Data tab.
Note: Photos are taken from Excel 2016
The first step is to create the queries:
In the Get & Transform section of the Data tab, click New Query -> From File -> From Workbook and select the workbook that contains the table you want to merge
Select the appropriate sheets in which your tables are found, and confirm that they are displaying properly
If you notice that the table is not correct, you can make changes to it via the Edit button below.
For example, if your column headers are being treated as ordinary values, you can click Use First Row as Headers on the Power Query Editor Home tab, in the Transform group
I would also recommend renaming the query so it is easier to identify later
Once you are happy with the way the query looks, click the Close & Load drop-down on the Power Query Editor Home tab and select Close & Load To...
Select Only Create Connection to add it into your Workbook Queries without duplicating the table.
Repeat the above steps for each table you want to merge.
Once you have all of your tables linked via Queries, you can now move on to merging them:
Under the same New Query menu, select Combine Queries -> Merge
Select the two queries you want to merge, one in each box
Confirm that they are correct via the preview window (don't worry if not all rows show)
As a rule of thumb, select your largest query first and the smaller one second
Next, select the column(s) you want to match on; in your example, that is ID. You do this by clicking the column in each preview
Finally change the Join Kind to Full Outer and click OK
From here you should be back in the Power Query Editor
The final steps shape this merged query into your desired output
You should see that a new column has been added to your first table, with the name of the second query as its header. Next to that name is a button that lets you expand the column.
Select the appropriate columns you would like to merge into the other table and click OK
If at any point you make a mistake, you can retrace your changes under Applied Steps within the Query Settings Pane
Once you are happy with the way your newly merged query looks, go ahead and click on Close and Load
You should now have a new merged query that updates from the original connected files whenever you refresh it
If you want to make any additional changes from this point on, just click anywhere inside the table and the Table Tools and Query Tools tabs should appear at the top
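For readers who want to see what Join Kind = Full Outer does to the rows, here is a small Python sketch. It uses the standard library only and is an illustration, not Power Query code. Column names other than ID are made up. One practical difference from Power Query: there, rows that exist only in the second query come back with a null in the first query's ID column, so you may want to expand the second query's ID as well and fill the gaps from it. The sketch sidesteps that by keeping each unmatched right-hand row's own ID. If an ID occurs several times in both lists, every matching pair is produced, as a real merge would do. This matters here because one patient can have several reports.

```python
def full_outer_merge(list1, list2, key="ID"):
    """Full outer join of two lists of dicts on `key`.

    Matched IDs yield one combined row per matching pair; IDs found in only
    one list are kept as they are.
    """
    index2 = {}
    for row in list2:
        index2.setdefault(row[key], []).append(row)
    merged, matched = [], set()
    for left in list1:
        rights = index2.get(left[key])
        if rights:
            matched.add(left[key])
            for right in rights:
                merged.append({**right, **left})  # left values win on name clashes
        else:
            merged.append(dict(left))  # ID only in list1
    for row in list2:
        if row[key] not in matched:
            merged.append(dict(row))  # ID only in list2, keeps its own ID
    return merged
```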

PowerQuery duplicate rows from external source

I have prepared a master Excel file that uses Power Query to pull data from several smaller Excel worksheets, one per employee, all containing the same set of columns.
Today I noticed that for some employees, some of the data is duplicated in the master table, even though said duplicates do not exist in their separate worksheets.
The master query is made up of separate "Connection Only" queries, each pointing to an individual file. Regardless of how many times I click Refresh All (including from Manage Data Model), the duplicates remain.
Has anyone encountered anything similar or would you have any ideas what could be the reason behind this and how to get it sorted out?
Thank you!
You haven't really provided enough information about your design, but I'm guessing you are using Merge Queries steps to combine the "smaller Excel worksheets"? If so, the typical issue is that you have not specified the correct columns to match on in the Merge Queries step definition.
If the combination of columns you have chosen on at least one side of the merge is not unique, duplicated rows will appear after the subsequent Expand step.
The way to find these is to start a new query against each source table in turn, select the columns you are matching on, and use Keep Rows / Keep Duplicates. You should see no rows; any rows that do appear are the source of your duplicates.
I usually save such queries and include them in the Refresh as an automated test going forwards. I put them in a separate Query Group e.g. "Tests - should return 0 rows".
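The Keep Duplicates check, and the reason duplicates appear, can be sketched in Python. This uses the standard library only and illustrates the logic rather than Power Query code. The Name and Rate columns are hypothetical. duplicate_keys plays the role of Keep Rows > Keep Duplicates on the match columns. merged_row_count shows that a Left Outer merge followed by Expand returns one row per matching right-hand row, so a non-unique key on the right side multiplies rows.

```python
from collections import Counter


def _key(row, key_cols):
    return tuple(row[c] for c in key_cols)


def duplicate_keys(rows, key_cols):
    """Like Keep Rows > Keep Duplicates on the match columns: every row whose
    key occurs more than once. An empty result means the key is unique."""
    counts = Counter(_key(r, key_cols) for r in rows)
    return [r for r in rows if counts[_key(r, key_cols)] > 1]


def merged_row_count(left, right, key_cols):
    """Rows produced by a Left Outer merge followed by Expand: each left row
    appears once per matching right row, or once if it has no match."""
    right_counts = Counter(_key(r, key_cols) for r in right)
    return sum(max(1, right_counts[_key(r, key_cols)]) for r in left)
```

A "Tests - should return 0 rows" query corresponds to asserting that duplicate_keys returns an empty list for every source.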

Two Excel tables created from SQL queries collide in formula auto-fill

I have an Excel 2010 workbook with two SQL queries, each returning data to a separate worksheet as a named table. They return the same db fields, but one is constrained on the values of one of the fields. I have additional columns that use formulas to transform the data in these fields, and these are also identical between worksheets.
Upon refresh, Excel autofills the formulas per the conventions of a named table. One of the sheets/tables (call it Table 1) autofills with native references (e.g., for a field/column named variable, the corresponding formula uses [#[variable]] as its reference). However, the other table (call it Table 2) autofills with references to Table 1, i.e., 'Table 1'[#[variable]].
I have searched and replaced these several times, and rewritten the formulas, but each time I refresh the data query these references reappear. I also tried replacing Table 1 with Table 2, as it occurred to me this may be a namespace collision and Excel just takes the first-created table as canon. This, though, doesn't fix the issue, and neither did changing the column names to create a non-colliding namespace.
The only other thing I can think is that I'd copied the formulas from Table 1 and even though I removed the table name perhaps Excel has held onto the reference. Is there a table cache or such that Excel references to keep pulling these? Should I create a new query and new table and manually create the formulae, or would that run into the same issue?
[Entering this as an answer so it's not shown as an outstanding question.]
Creating the relevant tables from scratch has produced no such namespace collision or other odd behavior so far, as we'd expect. I realized that I'd left something out of my initial question: I had copied, in whole or part (likely whole), the tab containing Table 1 to create Table 2. Even after editing the resulting new SQL query and the formulas on Table 2, it seems Excel, in its effort to help, retains several components of the original table and does not update this cached information.
