Power Query to SQL using cell value as parameter is very slow - excel

Scenario:
I am running an Excel workbook that uses a single Power Query to retrieve data from a SQL Server. The resulting dataset is used in two subqueries. All three datasets are then loaded into pivot tables in the workbook.
Objective:
Send the query with 2 parameters read from the Excel sheet, thereby reducing the size of the returned dataset. The filtering should fold back to the SQL Server rather than happen post-retrieval in Power Query.
Description of problem:
If I use the configuration below for the main query:
let
    dbQuery = "SELECT * FROM dbo.somequery",
    Source = Sql.Database("<server>", "<database>", [Query=dbQuery])
in
    Source
This works fine and returns about 6500 rows almost instantly.
The following function is defined as 'GetRange' in PQ to retrieve a cell value:
(rangeName) =>
Excel.CurrentWorkbook(){[Name=rangeName]}[Content]{0}[Column1]
Two parameters are each retrieved using the syntax below:
= GetRange("<named cell>")
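A slightly more defensive variant of the GetRange function above might look like the following. This is only a sketch: the explicit lookup and the error text are assumptions added for illustration, and it does not by itself address the slowness described below.

```powerquery
// Hypothetical variant of GetRange; the error reason and message are illustrative.
(rangeName as text) as any =>
let
    // Excel.CurrentWorkbook() returns one row per table or named range,
    // with the columns Name and Content.
    Matches = Table.SelectRows(Excel.CurrentWorkbook(), each [Name] = rangeName),
    Result =
        if Table.IsEmpty(Matches) then
            // Fail with a readable message instead of a generic key lookup error.
            error Error.Record("GetRange", "Named range not found", rangeName)
        else
            // First cell of the named range.
            Matches{0}[Content]{0}[Column1]
in
    Result
```

It is called the same way as before, for example = GetRange("<named cell>").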
If I now change the dbQuery to:
dbQuery = "SELECT * FROM dbo.xlPAS_PivotOutput
WHERE parameter1=" & Text.From(parameter1) & " and parameter2=" & Text.From(parameter2),
The query compiles and results are returned, but Power Query in Excel takes about 1 to 2 minutes to return them.
It appears that simply retrieving the 2 cell values and using them as input parameters takes a huge effort for some reason.

I've been trying to debug a query that was pulling in data from a URL that I pick up from a cell in the spreadsheet. It was running ridiculously slowly, whereas using the URL explicitly in the query was running fine. I kept trying to simplify the spreadsheet to troubleshoot up to the point I had a query that was just selecting the text value from a single cell in an otherwise empty spreadsheet and this was still slow as hell.
The Name Manager revealed that there were a couple of invalid name references and one referencing a range in another Excel file in another location which I couldn't access. Deleting these obsolete names seems to have fixed the problem.
The fact that name references that are unused in the spreadsheet cause this type of problem is definitely an Excel bug, but at least knowing this, searching for invalid name references and deleting them seems to fix the problem.

Related

Functions pulling from a refreshing power query giving REF! error

I have a Power Query whose output size changes based on the number of PDFs in a folder. I have formulas on a separate sheet tab that pull data from a range of cells on the Power Query tab. The main problem is that if I load only a few of these PDFs, my references get deleted because part of the table range is removed. I have been trying to find workarounds for this; any help would be appreciated.
I tried building a pivot table from the data I need, but I am trying to match certain criteria, such as an invoice and specific words, to get a value. This did not work for my situation.
Example of Power Query
this is the data I am pulling that will turn into REF! if I take a pdf out and the rows shrink.

Reference an excel cell in a Where statement to change the Query

I would like to use an Excel cell to change the reference data in a Where statement so that I don't have to keep going into power query to change the statement.
Instead of the 31690 in the below code I would like to reference cell B7 in sheet1 of the same Workbook instead.
Is this possible, and if so, how?
Thanks in advance.
WHERE ORDERDATE >= #Month13#(lf)#(tab)and STOCKCODE is not null#(lf)#(tab)AND SALESORD_HDR.ACCNO = '31690'
Maybe something like this?
For this approach to work, you need to make sure your spreadsheet has a table and the table's range starts with A1 and spans beyond the cell with the value in it--in this case, B7. Here's an example:
I started by creating this spreadsheet with a table named Table1:
Then, I used Table1 as the source in Power Query.
Notice that with the table above, what was row 7 is row 6. This is because the column headers don't have row numbers in Power Query. This change in row numbering matters for finding your targeted cell.
Then I added some custom M code. This code first extracts the second column's name from the list of column names. (Because the second column would be column B of the spreadsheet.) Then it uses that second column's name to create a table of that column's values, from which it then extracts the sixth row entry. (Because that sixth row entry would be the seventh row entry in the spreadsheet.) Note that the {1} points to the second column and the {5} points to the sixth row. That's because Power Query indexing starts at 0.
I went into Advanced Editor and renamed the step from Custom to DateVariable:
DateVariable = Table.Column(Table1_Table, Table.ColumnNames(Table1_Table){1}){5},
Then I added some more custom M code to concatenate the DateVariable with the rest of your SQL statement as an example:
Here's my M code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    DateVariable = Table.Column(Source, Table.ColumnNames(Source){1}){5},
    SQL_Statement = "WHERE ORDERDATE >= #Month13#(lf)#(tab)and STOCKCODE is not null#(lf)#(tab)AND SALESORD_HDR.ACCNO = '" & Text.From(DateVariable) &"'"
in
    SQL_Statement
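The zero-based lookup can also be tried without a workbook table. Here is a sketch using a made-up in-memory table in place of Table1; the column names and cell values are assumptions for illustration only.

```powerquery
let
    // Hypothetical stand-in for Table1: two columns, six data rows.
    // The header row is not counted, so spreadsheet row 7 is the sixth data row.
    Sample = #table(
        {"A", "B"},
        {{1, "r2"}, {2, "r3"}, {3, "r4"}, {4, "r5"}, {5, "r6"}, {6, "31690"}}
    ),
    // {1} selects the second column name because indexing starts at 0.
    SecondColumnName = Table.ColumnNames(Sample){1},
    // {5} selects the sixth value of that column.
    CellValue = Table.Column(Sample, SecondColumnName){5}
in
    CellValue
```

With this sample data the query returns the text "31690", which is the value that would sit in cell B7 of the sheet.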
Simple solution using a named range
First, select cell B7 and enter a name in the Name Box (e.g. CellReference). Then right-click on the cell and click on Get Data from Table/Range.
This opens the Power Query Editor with a query that returns a table containing the cell from the named range. Open the Advanced Editor, delete the entire content of the query, type Text.From(Excel.CurrentWorkbook(){[Name="CellReference"]}[Content][Column1]{0}) and click Done. This is what it should look like:
Note: Text.From() is used so that the value returned by CellReference can be concatenated with the SQL query using &. Also, this function is preferable to Number.ToText(), which does not work with text values.
Finally, insert the query name in your SQL query: WHERE ORDERDATE >= #Month13#(lf)#(tab)and STOCKCODE is not null#(lf)#(tab)AND SALESORD_HDR.ACCNO = "&CellReference
Note that if the cell contained a text value instead, then you would need to adjust the syntax like this: ... SALESORD_HDR.ACCNO = '"&CellReference_Text&"'"
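When the cell holds text, an apostrophe inside the value would end the quoted SQL literal early. One way to guard against that is to double any single quotes before concatenating. This is a minimal sketch: CellValue is a hard-coded stand-in for CellReference_Text, and the sample value is made up.

```powerquery
let
    // Hypothetical cell content; in the workbook this would be CellReference_Text.
    CellValue = "O'Brien",
    // Double each single quote so it stays inside the SQL string literal.
    Escaped = Text.Replace(CellValue, "'", "''"),
    Clause = "AND SALESORD_HDR.ACCNO = '" & Escaped & "'"
in
    Clause
```

This returns AND SALESORD_HDR.ACCNO = 'O''Brien'. Escaping quotes narrows the problem but does not replace the care about cell content that the permission warning is meant to enforce.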
How to deal with warnings: new query permission and Formula.Firewall
How to give permission to run all new native database queries
Depending on your Query Options settings, you may get this warning message regarding the permission to run the modified SQL query each time CellReference contains a new value:
If you are certain that the cell will never contain a string of characters that could modify the database, you can disable this warning message by going to File -> Options and settings -> Query Options. Under GLOBAL, go to Security and uncheck Require user approval for new native database queries.
Note that this is a global setting that is immediately applied to all your Excel files, including those that are currently open.
How to disable the Formula.Firewall warning message
Depending on your Privacy Levels settings, you may get a Formula.Firewall warning message preventing the query from being executed:
If you are in a situation where you can disregard privacy levels, you can disable this message by going to File -> Options and settings -> Query Options. Under CURRENT WORKBOOK, go to Privacy and select Ignore the Privacy Levels and potentially improve performance.
Click on OK and refresh the query.
If, on the other hand, your workbook needs to preserve a privacy level of Private or Organizational, to my knowledge there is currently no way of integrating CellReference into a SQL query (even using a SQL parameter set with the Value.NativeQuery function or a Power Query Parameter) without raising this warning message. The only solution would be to include CellReference in another step in the query, but then the filtering will occur in Power Query and not at the server level: query folding is interrupted when a step includes a query/function/parameter that is linked to an external data source, including a named range in the workbook itself.
If your workbook privacy level is set to Public, you should be able to avoid this warning message by using the Value.NativeQuery function (you can even enable query folding for further query steps if you are using a SQL Server or PostgreSQL database). If you still get the warning message, you can try combining the two queries accessing each data source (the database and the worksheet) into a single query.
Note: these steps were tested with Excel Microsoft 365 (Version 2107) on Windows 10 64-bit connected to a local SQL Server 2019 (15.x) database.
This answer was prepared by referring to many blog posts by Chris Webb and by Ken Puls.
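For reference, the Value.NativeQuery approach mentioned above could look roughly like this when both data sources are combined in a single query. This is an untested sketch, not part of the steps described in the answer: the SELECT statement is simplified, the server and database names are placeholders, and the parameter name accno is an assumption. Depending on the privacy levels, Formula.Firewall may still intervene.

```powerquery
let
    // Placeholder server and database names; replace with your own.
    Source = Sql.Database("<server>", "<database>"),
    // Read the named range directly in this query, as in the CellReference query.
    AccNo = Text.From(Excel.CurrentWorkbook(){[Name="CellReference"]}[Content][Column1]{0}),
    // The value is passed as the SQL parameter @accno rather than concatenated into the text.
    Result = Value.NativeQuery(
        Source,
        "SELECT * FROM SALESORD_HDR WHERE ACCNO = @accno",
        [accno = AccNo],
        // Lets later steps fold on SQL Server or PostgreSQL sources.
        [EnableFolding = true]
    )
in
    Result
```

Passing the value as a parameter also avoids having to quote or escape it inside the SQL text.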

Excel Data Queries - Ignore missing table / assign specific table number for every query

I am having a bit of trouble creating an automated report based on an HTML file. The file contains tables with structured data from the web page, and I just create tables from the tables recognized by Excel. So far it does what I need, but sometimes one or more tables are missing from the HTML file, which causes the remaining tables to shift: if table 0 is missing, table 1 takes its place and breaks the entire sheet because the wrong table is now in the place of table 0.
I would like to know if there is a way to assign each query to a specific table number, so that Table 0 gets the value from the specified query, not the first one that comes in the list of queries. The code so far in the Power Query Editor is this:
let
    Source = Web.Page(File.Contents("D:\AUTO.html")),
    Data0 = Source{0}[Data]
in
    Data0
I use this code because the columns or rows will not always be the same, sometimes one can be missing and if I use the original code that is generated when getting the data from the page it will give errors and not load the table if there is a missing column/row.
Any help is appreciated.
MissingField.Ignore
When you use functions like Table.SelectColumns, Table.RenameColumns or Table.ReorderColumns, you can pass the MissingField.Ignore option so that a missing field does not raise an error and stop your query.
e.g.:
= Table.SelectColumns(#"blah",{"column1", "column2", "column3"}, MissingField.Ignore)
documentation:
https://learn.microsoft.com/en-us/powerquery-m/missingfield-error
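Here is a small self-contained sketch of that behaviour. Sample is a made-up in-memory table, not the table from the HTML file, and it deliberately lacks column2.

```powerquery
let
    // Hypothetical input that is missing "column2".
    Sample = #table({"column1", "column3"}, {{1, "a"}, {2, "b"}}),
    // Without MissingField.Ignore this step would raise an error for "column2".
    Selected = Table.SelectColumns(
        Sample,
        {"column1", "column2", "column3"},
        MissingField.Ignore
    )
in
    Selected
```

The result keeps only column1 and column3. If downstream formulas expect a fixed layout, MissingField.UseNull can be passed instead, which adds the missing column filled with null values.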

Merging .csv files in query to retrieve its data from Excel through a UDF and ADODB connection

My goal is to run some VBA code and call the function from an Excel cell to retrieve data from a closed .csv or .xlsx file.
This could be done in several ways, but all I've tried have an important constraint.
I start with a very large .csv file. Very large is around 4,000 rows and more than 1,000 columns.
First try:
Save the .csv in an Excel worksheet and use ExecuteExcel4Macro to retrieve the data. This works fine when running a Sub, and even when running a Function. But, unfortunately, you can't use ExecuteExcel4Macro and call it from an Excel cell. The first try is done.
Second try
Use an ADODB Connection and run a query directly from the .csv file or from the saved .xlsx file. This can be used from a cell, but, surprise, surprise, it has a limit of 255 columns or fields. I mean, when you run a query and try to read a field that is positioned in a column number greater than 255, the function does nothing. Second try done.
Third (and last, for now) try. Need your help here!
Ok, I could divide the original table, which has too many fields into several tables containing a maximum of 255 fields each.
Note: the first column contains the ids of firms, banks or whatever. The rest of the fields are named x1, x2,...x1050 and they correspond to fields of financial statements, so they are all numeric and they are all useful for the analysis.
If I split the big table in different ones, the aspect would be like:
Table 1:
Name     x1      x2     x5     x6   x15...
myName1  15025   1546   6546   548  98663
myName2  867486  4684   68786  876  68997
myName3  87663   43397  87987  457  -4554
etc.
...
Table n:
Name     x928    x929    x940   x1005   x1250
myName1  765454  541546  76546  74548   18663
myName2  6564    544684  686    41876   58997
myName3  4687    64397   9887   879457  8554
I can do this by running some vba before I store the files, so now I have n .csv files. The point is that I want the formula called from a cell like this:
=GetData(path,file,name,operations)
I mean, the user wants to locate a name in a file and perform some operations with "all" the fields available, from 1 to 1250.
Let's suppose the first split table goes from field x1 to field x250. The second would go from x251 to x500, etc. All of the tables would have a first column with the names field, of course, and all tables would have the same number of rows (not the same number of columns, as not all x fields exist).
But, important, the operations called by user could be like:
"x3" --> User requests only one field.
"x5+x150" --> User requests the sum of two fields that would be in the same table (as the x150 field is not greater than x250 field)
"x452+x535-x900+x1200-x1" --> User requests operations with many fields that would be kept in different files.
When the user requests only one field, I can write a small routine at the beginning of the function to determine which .csv file that field is stored in, like:
If singleField <= 250 Then
    fileToLookAt = "SplittedCSV_1"
ElseIf singleField <= 500 Then
    fileToLookAt = "SplittedCSV_2"
End If
Then, using an ADODB Connection and Microsoft.Jet.OLEDB.4.0 provider, I would run the query like:
MyQuery = "SELECT x" & singleField & " AS MyData FROM [" & fileToLookAt & ".csv] WHERE Name='" & name & "'"
But what happens when the user wants an operation involving x fields stored in different files, like the third example I gave? I would have to "merge" all the tables, using the Name field as the key for merging.
How would you proceed? Is merging the tables in the SELECT the best option? What would the SELECT look like?
I mean the Query would be like:
MyQuery = "SELECT x452+x535-x900+x1200-x1 AS MyData FROM [" & MergedTable & ".csv] WHERE Name='" & name & "'"
Thanks a lot for your time.
You could stuff the data into an .mdb file using ADO and bypass the 255-column limitation. However, using UDFs to retrieve data directly from any external data source is going to be very slow if you have more than a few of them. I would create a class to hold the data, with a load method called when the spreadsheet opens, and have your functions query the object. So your load method takes your CSV as a data stream and fills a disconnected ADO recordset defined as a static variable, and then you define a getdata method that returns the desired value based on the parameters passed to it.

How to Link an Excel Table with Access and prevent NULL Values due to wrong Data Type Conversion?

In my current project I need to link an Excel file, which receives values from a machine, to an Access database so that I can work with the values and import them into the data model.
The problem is that some of the values give invalid results because of the way they are saved. For example, the timestamp is saved like
030420 instead of 03:04:20, and Access can't handle that and gives me a #NUMBER
I cannot simply change the data type in Excel because the whole Excel file gets refreshed every hour by a source that I can't influence.
Any help appreciated.
If Erik's proposal does not work, you can
- create a backup copy of your Excel source
- tweak the file: enter text in the first row of the problematic columns
- link the tweaked file into Access
- put back the real file in place.
Now the problematic columns should be read as Text, and you can build a query that solves any issue like conversion, null handling...
Link, don't import, the Excel file, and you have a linked table.
Now, use this linked table as the source in a simple select query where you modify the data and alias the fields as needed. For example:
Select
F1 As SomeName,
F2 As OtherName,
TimeSerial(Mid([F5],1,2),Mid([F5],3,2),Mid([F5],5,2)) As TrueTime
From
LinkedTable
Where
F7 Is Not Null
Then use this query for your import.
Consider querying the Excel file instead of using a linked table.
The query can directly query an Excel range:
SELECT * FROM
[Excel 12.0 XML;DATABASE=PathToMyExcel;HDR=Yes;IMEX=1].[MyRange] t
Then, you can use functions like TimeSerial to cast numbers to time values.
