Power Query in Excel to Select Specific Cells from a column - excel

I'm using Power Query in Excel to reference a table within the same workbook. I want to select specific columns within that table. I know that can be accomplished by loading the table into Power Query and then choosing the columns I want to use. The resulting query is:
let
    Source = Excel.CurrentWorkbook(){[Name="Legend_Data_Merged"]}[Content],
    #"Removed Other Columns" = Table.SelectColumns(
        Source,
        {
            "Observation number",
            "First Sales Offer - Products",
            "Middle Sales Offer(s) - Products",
            "Last Sales Offer - Products"
        }
    )
in
    #"Removed Other Columns"
So, here's my question/issue:
I think this approach first pulls the entire table into Power Query and then strips it down. What I want is to define "Legend_Data_Merged" as the source table but choose which columns to pull in the same operation, so that the entire table never has to be loaded into Power Query. The table is about 120 columns wide and I only need three of them; I have about 20 similar queries, and they are starting to hog memory. Am I wrong in my logic here? If not, does anyone have an idea what the query would be?
Could there be a way to define the columns in the [Content] part of the Source step?
Thanks.

It may be a very simple approach, but why not add a worksheet "DataTransfer" that contains only references to the columns you need, and read this small table with Power Query?
If your columns are adjacent, you could also define a named range and read only that range with Power Query.
In any case, when the workbook is open, your big table is already in memory. Reading the table with Power Query and selecting the three columns should not allocate much additional memory.
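The named-range idea could be sketched roughly as follows. This is only an illustration: "LegendSubset" is a hypothetical named range covering just the needed columns, and whether it actually lowers memory use has not been measured here.

```powerquery
let
    // "LegendSubset" is a hypothetical named range; replace it with your own
    Source = Excel.CurrentWorkbook(){[Name="LegendSubset"]}[Content],
    // Named ranges arrive with generic headers (Column1, Column2, ...),
    // so promote the first row of the range to column names
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
    Promoted
```

Unlike an Excel table, a named range does not carry its own header row, which is why the promote step is needed.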

It's possible there's some problem in Excel or Power Query. How much memory are you seeing used by the excel.exe and Microsoft.Mashup.Container.NetFX40.exe processes?
The only way to directly remove the columns from [Content] is to modify the actual data of the Excel table. You could try that to see if it makes a difference, but Power Query generally tries to be smart about only loading columns it needs.
If your query is using a lot of memory, you might get better performance by saving your data in a more efficient format (I'd try CSV). In any case, try turning off "load to worksheet" and loading only to the data model instead.
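If you try the CSV route, a query along these lines might serve as a starting point. The file path is a hypothetical placeholder, the delimiter and encoding are assumptions about how the file was exported, and the column names are taken from the question.

```powerquery
let
    // Hypothetical path; point this at wherever the exported CSV lives
    Source = Csv.Document(
        File.Contents("C:\Data\Legend_Data_Merged.csv"),
        // Assumes a comma-delimited, UTF-8 (code page 65001) export
        [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.Csv]
    ),
    // Use the first row of the file as column names
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Keep only the columns the report needs
    Selected = Table.SelectColumns(
        Promoted,
        {
            "Observation number",
            "First Sales Offer - Products",
            "Middle Sales Offer(s) - Products",
            "Last Sales Offer - Products"
        }
    )
in
    Selected
```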

You can refer to my question and answer here.
What you will want to do is use the Table.SelectColumns function instead of Table.RemoveColumns.
let
    db = Sql.Databases("sqlserver.database.url"){[Name="DatabaseName"]}[Data],
    Sales_vDimCustomer = Table.SelectColumns(
        db{[Schema="Sales",Item="vDimCustomer"]}[Data],
        {
            "Name",
            "Representative",
            "Status",
            "DateLastModified",
            "UserLastModified",
            "ExtractionDate"
        }
    )
in
    Sales_vDimCustomer
When you view the raw SQL in Express Profiler, you can see that the column selection is folded into a single statement:
SELECT
    $Table.Name,
    $Table.Representative,
    $Table.Status,
    $Table.DateLastModified,
    $Table.UserLastModified,
    $Table.ExtractionDate
FROM
    Sales.vDimCustomer as $Table
Power BI and Power Query will also now show an error or warning message with this recommendation when you try to import a large number of columns.

Related

Excel Data Queries - Ignore missing table / assign specific table number for every query

I am having a bit of trouble creating an automated report based on an HTML file. The file contains tables with data structured from the web page, and I simply create tables from the tables Excel recognizes. So far it does what I need, but sometimes one or more tables are missing from the HTML file, which causes the remaining tables to shift position. For example, if table 0 is missing, table 1 takes its place and breaks the entire sheet, because the wrong table now sits where table 0 should be.
What I want to know is whether there is a way to assign each query to a specific table, so that Table 0 gets its value from the specified table rather than from whichever one comes first in the list. The code so far in the Power Query Editor is:
let
    Source = Web.Page(File.Contents("D:\AUTO.html")),
    Data0 = Source{0}[Data]
in
    Data0
I use this code because the columns or rows will not always be the same. Sometimes one is missing, and the code originally generated when getting the data from the page then raises errors and fails to load the table.
Any help is appreciated.
MissingField.Ignore
When you use functions like Table.SelectColumns, Table.RenameColumns or Table.ReorderColumns, you can pass the MissingField.Ignore option to keep a missing-field error from stopping your query.
For example:
= Table.SelectColumns(#"blah",{"column1", "column2", "column3"}, MissingField.Ignore)
documentation:
https://learn.microsoft.com/en-us/powerquery-m/missingfield-error
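For the table-shifting part of the question, one possible approach is to pick each table by an identifying value rather than by its position. The sketch below assumes the tables in the HTML file carry distinguishing id attributes, which show up in the Id column that Web.Page returns; "Table 0" is a hypothetical Id value, so inspect the Source step to see what is actually available (the Caption column may work instead).

```powerquery
let
    Source = Web.Page(File.Contents("D:\AUTO.html")),
    // "Table 0" is a hypothetical Id; check the Id and Caption columns of Source
    Matches = Table.SelectRows(Source, each [Id] = "Table 0"),
    // Fall back to an empty table when the expected table is absent from the file
    Data0 = if Table.IsEmpty(Matches) then #table({}, {}) else Matches{0}[Data]
in
    Data0
```

Because the lookup no longer depends on row position, a missing table yields an empty result for its own query instead of shifting every other query onto the wrong data.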

Freeze columns from Excel SQL Query

I have an Excel SQL query made using the Query wizard / power query. Sometimes, when I refresh the data, the columns shuffle order! I have already tried checking/unchecking "preserve column sort" as suggested here: https://www.mrexcel.com/board/threads/sql-changing-column-order-when-put-into-excel.207385/
@Nathan brings up a good point. Try specifying the order of the columns in the query itself.
If that does not work, the solution would be to accommodate a different column order as a possible outcome on every refresh by loading the query result into a ListObject table. (I believe you can select this in the Query Wizard, to import as a table.)
Then you can use the name of the columns without knowing the range address in your worksheet formulas and VBA code. You could simply refer to the column name in the format shown in this tutorial, then get the properties for column or row number using any number of methods.
Tutorial on using ListObject Tables

Power query data load into model and table does not sort correctly

I have a table that I fetch via a connection through Power Query, which has a list of names. I apply some steps, including sorting the names column alphabetically, and then load it to a table and the "Data Model". However, the table that is loaded onto the worksheet contains the names sorted in a completely different order; it's as if Excel is ignoring my sorting preference completely. I tried sorting the data in the "Data Model", re-sorting it in Power Query, and even sorting the table in the worksheet itself, but after I hit refresh it reverts to the wrong order.
Try wrapping the sort in Table.Buffer:
= Table.Buffer(Table.Sort(Source,{{"date", Order.Ascending}}))
Alternatively, add an index column at the start and re-sort on the index when done.
I can confirm buffering doesn't work. Incredible bug.

PowerQuery to Excel sorting

How do I force an Excel table to keep the same sorting I've applied in Power Query?
I have loaded a data model query from an Access database file, which I have then shaped and sorted using Power Query.
Afterwards, I imported it as an Excel table using "Existing Connections" and made sure that the "Preserve column sort/filter/layout" box is checked.
However, the data I see in Excel is not sorted and seems to be thrown in completely at random.
I have also checked the "Preserve column sort/filter/layout" box under external connections in "Design - Table tools".
I usually just add an index column in PQ and re-sort in Excel after linking to the existing connection.
The same issue happens in reverse when you bring sorted data into PQ and it re-sorts the data without being asked. An index column in the initial table import solves that as well.
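A minimal sketch of the index-column workaround is shown below. The inline sample table and the column names "Name" and "SortIndex" are illustrative assumptions that stand in for the real source.

```powerquery
let
    // Inline sample data standing in for the real source (hypothetical values)
    Source = #table({"Name"}, {{"Carol"}, {"Alice"}, {"Bob"}}),
    Sorted = Table.Sort(Source, {{"Name", Order.Ascending}}),
    // Record each row's position in the sorted result so the order
    // can be restored after the table is loaded
    Indexed = Table.AddIndexColumn(Sorted, "SortIndex", 0, 1)
in
    Indexed
```

After loading, sorting the Excel table by SortIndex brings back the Power Query order, and the column can be hidden if it should not appear in the report.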

Querying single data points from the Excel Data Model / Power Query (Get & Transform Data)

I'm using an up-to-date version of Excel 2016 (via O365 E3 license) and using Power Query / Get & Transform Data. I can successfully create queries and load them to the page. I have also successfully created Power Pivot reports.
I would like to query single data points from the data loaded via Power Query. For instance, imagine a dataset called DivisionalRevenue with:
Date        Division  Revenue
2016-01-01  Alpha     1000
2016-01-02  Alpha     1500
2016-01-01  Beta      2000
2016-01-02  Beta      400
I could easily load that to an Excel workbook or include it in the data model and create a power pivot. However, Power Pivot doesn't always meet my requirements, particularly around how the data is displayed on the page. In order to achieve my goal I may want to be able to query individual data points.
I would like to have a cell on the page with a formula in it that I can use to query individual data points. If it was in a pivot table I could use something like:
=GETPIVOTDATA("Revenue",$A$3,"Date",DATE(2016,1,1),"Division","Alpha")
The lookup values (date and division) could be retrieved from a cell on the page or hard-coded into the formula. This is a requirement for several reports I'm working on.
Or, I could add a combined lookup column with Date and Division concatenated and use a vlookup to pull the values like:
=VLOOKUP("42371Alpha",I9:L13,4,FALSE)
Finally, I could use a combination of INDEX and MATCH to identify the correct row number and then pull the data.
All of these solutions require the data to be loaded onto a sheet. One requires a pivot table that has to be refreshed to work properly. The other two require creating arbitrary lookup columns so that you can match a row based on more than one field (date and division in this example), and you have to ensure that that lookup field's formula is properly extended down the length of the data table. In both cases I would have concerns when sharing this workbook with my colleagues in case someone affects the rather fragile setup of the pivot table or the lookup.
So, what I truly want to find is something equivalent to pivot table querying against a dataset.
** This doesn't exist, but I would like to know if something like it does **
=GETQUERYDATA("Revenue","DivisionalRevenue","Date",DATE(2016,1,1),"Division","Alpha")
Does such a thing exist? Can such a thing be done? Can I retrieve arbitrary data points from the dataset created through Power Query / Get & Transform Data?
I think that what you want are cube functions:
Some Background
How to easy create cubefunctions from a pivot table
There is a feature in Excel that allows you to query a Power Pivot model directly, but it's not highly advertised for some reason.
Once you have the data in your Power Pivot model, go to Excel -> Data tab -> Existing Connections -> Tables tab.
From there, choose the table that you want to start with. Once that table's data is on your Excel sheet, you can right-click the table -> go to "Table" -> "Edit DAX".
From there you can enter the following DAX function, as an example
EVALUATE
FILTER(
    SampleData,
    SampleData[Date] = DATE(2016,1,1) && SampleData[Division] = "Alpha"
)
Make sure to choose Command Type=DAX in the drop-down.
To further improve your querying power, you can install the optional "DAX Studio" plugin for Excel, which allows you to write custom DAX queries and then export the results directly back to an Excel sheet.
