I'm fairly new to Power Query and finding my feet largely by trial and error.
I have built a master query returning ~2000 rows of data covering different regions. I want to create sub-reports on different tabs for each region. I can easily do this by copying the original table and applying a filter on region for each new query. As my spreadsheet is already 10 MB, I am trying to do this as efficiently as possible in terms of performance. I understand I can also do this by creating a "reference only" to the master query instead of duplicating and filtering the master query (which would mean creating 10 versions of the master query with different filters).
I have been trying to do this via the Query / Reference menu, but I'm not sure whether I end up with a "connection only" query, as it doesn't say so in the Queries panel on the right.
Anyway, I guess the questions are:
1. What is the difference between queries and "connection only" queries (especially with regard to performance / spreadsheet size)?
2. When is it best to use a "connection only" query?
3. How do I create a "connection only" query (ideally via the menu, not code), and how do I check whether a query is connection only?
Connection only signifies that the data is not being materialised anywhere. It may still be referenced by other queries, but the data isn't being loaded to a worksheet table or to the data model.
You can control the behaviour of queries with the Load To dialog, invoked by Close and Load To from the Query Editor, or right click > Load To from the Workbook Queries pane.
Connection only queries reduce workbook size. A good rule of thumb is not to materialise any query unless you really need to. In your example, it looks like you want to materialise each regional query, but unless you also need a master table of all regions, your 'master' query can be connection only.
In performance and size terms, it makes more sense to only have your master query, loaded to the data model, with a regional slicer on your report...
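To make the idea concrete, here is a rough analogy in Python (not Power Query M, and not how Excel is implemented). It assumes the master query is just a definition that stores nothing by itself, and that each regional query references it and adds one filter step. All names and sample rows are hypothetical.

```python
# Analogy only: plain Python standing in for Power Query behaviour.
# All names and the sample rows are hypothetical.

def master_query():
    """Definition of the 'master' query: nothing is stored until it is evaluated."""
    return [
        {"region": "North", "sales": 120},
        {"region": "South", "sales": 80},
        {"region": "North", "sales": 45},
        {"region": "East", "sales": 60},
    ]

def regional_query(region):
    """A 'reference' query: reuses the master definition and adds one filter step."""
    return [row for row in master_query() if row["region"] == region]

# Only the regional results are materialised (the equivalent of loading to a sheet).
loaded_tables = {region: regional_query(region) for region in ("North", "South", "East")}

for region, rows in loaded_tables.items():
    print(region, len(rows))
```

The master definition exists once and is never stored as a table of its own; only the filtered outputs take up space.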
Related
In Excel 2021, what exactly is a "data connection", "query" and "domain source name"?
Let's say I have a Workbook "Manahil_Customer_Database.xlsm" in which I have a sheet "sht_Customer_Cities" that has a table "tbl_Customer_Cities". In a new sheet "sht_Report" I want to run two queries using one connection via MS Query. Now when I go through the MS Query route I get one Domain Name Source File "Manahil_Customer_Database.dsn" and one MS Query file "Customer_Countries_Cities.dqy" and one Connection file "Customer_Countries_Cities.odc".
However, when I look at "Queries & Connections", it says 0 Queries and 1 Connection named "Customer_Countries_Cities". I want to be able to establish a single Data Connection via MS Query from the "sht_Report" to the Workbook "Manahil_Customer_Database.xlsm" and then run multiple queries using the same connection.
Power Query replaced MS Query from Excel 2016 onwards. The objects and panes you are describing relate to Power Query, not MS Query.
Power Query is far more functional, reliable, flexible and performant than MS Query.
For example, depending on your exact requirement, you might create a base query that gathers all the required data, then refer to that base query in Reference queries that filter the output needed for each destination table.
Here's a starting point for Power Query:
https://support.microsoft.com/en-us/office/about-power-query-in-excel-7104fbee-9e62-4cb9-a02e-5bfb1a6c536a
Power Query is a Microsoft tool that assists you with your ETL (extract, transform, load) tasks.
As mentioned in a previous answer, it is based on the M language.
To import, modify, or connect to your data, the command is:
Data > Get Data, then select your input
Check this link for a quick introduction:
https://learn.microsoft.com/en-us/power-query/power-query-what-is-power-query
If I understand the situation correctly, you are working internally, within a single Excel file. Data connections, queries, and domain sources are all used to connect to external data.
Internally, I would think you could use a pivot table and/or a slicer.
If you provide additional details on what specifically you are trying to do, a better answer could be provided.
Some additional reading below may help further:
Power Query Help
Data Connections
Queries
External Links
I need to delete query steps after loading the data into the data model. The reason is to hide the sources, protect our know-how, or maybe I'm just not very proud of what I've done ;).
But when I delete the PQ connections or change the "Load To" option, the tables also disappear from the data model and the pivot table becomes unresponsive. It's also not possible to modify or delete the connection created in Power Query from the Power Pivot window, or even to view the table properties.
I could use Review > Protect Workbook > Protect Structure to disable viewing and editing queries / connections, but the steps are still visible, and the user cannot modify the workbook; even pivot table drill-through function doesn't work as it needs to create a new sheet to show data rows.
If you need to remove the query steps, then you have to store the data within the Excel file (since a query is just a set of instructions for how to connect to the data and transform it).
What you can do is create a query, load it to a table in an Excel sheet and then delete the query, leaving a static table. You can then create a pivot table using this static table as the source and it should function normally (though you obviously won't be able to refresh the data). I.e. don't create a data model until you've loaded your data and removed the query.
I am supposed to optimize the performance of an old Access DB in my company. It contains several tables with about 20 columns and 50000 rows. The speed is very slow, because the people work with the whole table and set the filters afterwards.
Now I want to compose a query to reduce the amount of data in Excel before transferring the complete rows, but the speed is still very slow.
First I tried the new power query editor from Excel. I first reduced the rows by selecting only the last few ones (by date). Then I made an inner join with the 2nd table.
Finally I got less than 20 rows returned, and I thought I was fine.
But when I started Excel to perform the query, it took 10 - 20 seconds to read the data. I could see that Excel loads the complete tables before applying the filters.
My next try was to create the same query directly inside the Access DB, with the same settings. Then I opened this query in Excel, and the time to load the rows is nearly zero. You select "refresh", and the result is shown instantly.
My question is: Is there any way to perform a query in Excel only (without touching the Access file), that is nearly as fast as a query in Access itself?
Best regards,
Stefan
Of course.
Just run an SQL query from MS Query in Excel. You can create the query in Access, and copy-paste the SQL in MS Query. They're executed by the same database engine, and should run at exactly the same speed.
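To illustrate why pushing the filter and join into the SQL matters, here is a minimal Python sketch. It uses an in-memory SQLite database purely as a stand-in for the Access file, so the table names, column names and data are all hypothetical, and Access SQL differs in places (date literals, for example). It compares fetching whole tables and filtering afterwards with letting the database engine return only the matching rows.

```python
import sqlite3

# sqlite3 (in-memory) is only a stand-in for the Access database;
# table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"Customer {i}") for i in range(1, 101)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, (i % 100) + 1, f"2024-01-{(i % 28) + 1:02d}")
                  for i in range(1, 50001)])

# Slow pattern: pull every row across, then filter and join on the client.
all_orders = conn.execute("SELECT id, customer_id, order_date FROM orders").fetchall()
all_customers = dict(conn.execute("SELECT id, name FROM customers").fetchall())
client_side = sorted(
    (oid, all_customers[cid], date)
    for oid, cid, date in all_orders
    if date >= "2024-01-28"
)

# Fast pattern: the filter and join are part of the SQL,
# so only the matching rows are returned to the caller.
sql = """
    SELECT o.id, c.name, o.order_date
    FROM orders AS o
    INNER JOIN customers AS c ON c.id = o.customer_id
    WHERE o.order_date >= ?
    ORDER BY o.id
"""
server_side = conn.execute(sql, ("2024-01-28",)).fetchall()

print(len(all_orders), "rows fetched before client-side filtering")
print(len(server_side), "rows fetched when the database applies the filter")
```

Both approaches give the same result set, but the second transfers only a small fraction of the rows, which is the effect described in the question when the query was defined inside Access.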
See this support page on how to run queries using MS Query in Excel.
More complex solutions using VBA are available, but shouldn't be needed.
Suppose I have two tables, TableA(embedded data), TableB(external data).
Scenario 1:
TableB is set to On-Demand based on the markings from TableA. When you mark something in TableA, it takes some "n" seconds to populate the data in TableB. The On-Demand setting on the external table is as in the screenshot named LOD.png
Scenario 2:
On-Demand settings have not been applied to TableB (please note TableB is still External). A relationship has been created between TableA and TableB. TableB is now limited based on the marking from TableA by the option "Limit data using Markings" (screenshot named ss2).
Questions:
1. Which scenario fetches data quicker?
2. From the debug log, the query passed in both scenarios is the same. Does that mean both scenarios are the same, or are they different?
Scenario 1 is good if Table B is really large, or records take a long time to fetch from the database. In this case, Table B is only returning rows that are based on what you marked in Table A. This means that the number of rows could be significantly less, but this also means that every time the marking changes, those rows have to be fetched at that time. If the database takes a long time, this can become frustrating. On the flip side, if the database is really fast and you are limiting rows down enough, this can be almost seamless. At the end of the day, you are pulling this data into memory after the query runs, so all Spotfire functionality is available.
Scenario 2 is good if calculations are highly complex and need to take advantage of the power of the external DB. This means that any change to the report, such as a change of visualization, will require a new query to be sent to the external data source, resulting in a new table of aggregated data. It also means that no changes to a visualization using an in-db data table can be made when you are not connected to the external data source. Please note, there is functionality in Spotfire, such as data on demand, that is available to in-memory data but not to external data.
I personally like to keep data close to Spotfire to take advantage of all Spotfire functionality, but I cannot tell you exactly which is the correct method in your case. Perhaps these TIBCO links on the difference between in memory data and external data can help:
https://docs.tibco.com/pub/spotfire/6.5.1/doc/html/data/data_overview.htm
https://docs.tibco.com/pub/spotfire/6.5.1/doc/html/data/data_working_with_in-database_data.htm
I have two queries in my workbook that rely on each other. One is set to a connection only, the other is set to load to a table after performing some merging and expanding operations. I noticed that when refreshing, the query set to "Connection only" does not have a visual indication of refreshing.
When I refresh the secondary query that relies on information from the connection only one, does it actually refresh both of them? I am having a hard time finding clear documentation on this. A link to where the information is would also be appreciated.
Further information on the queries themselves:
Both link to SQL tables.
One pulls the latest data available in the table.
The other pulls recent information from a different table.
The second one merges the two tables together (by the key).
The second one then only grabs information from the first when there is missing information in the second.
I am specifically asking: when the second table calls a refresh, does the first table also refresh even though it is connection only?
Yes, effectively the first query is also refreshed: its query definition is run and the result is pulled into your second query.
Note that in the Query Editor window you will see a "Preview" dataset for your first query, which would not be refreshed by your refresh of the second query. That "Preview" dataset is only a design tool; it doesn't affect your results when you actually refresh and deliver data into an Excel table.
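As a rough analogy in Python (this is not Power Query M and not how Excel is implemented), you can think of the connection only query as a function with no stored output. Refreshing the loaded query calls that function as part of its own evaluation. The function names and sample data below are hypothetical, and the merge logic follows the description in the question: take values from the first query only where the second has none.

```python
# Analogy only: plain Python standing in for Power Query refresh behaviour.
# Function names and sample data are hypothetical.
evaluations = {"latest": 0}

def latest_query():
    """'Connection only' query: a definition with no table of its own."""
    evaluations["latest"] += 1
    return {1: "A-new", 2: "B-new", 3: "C-new"}

def recent_query():
    """Second source; some keys have missing values."""
    return {1: "A-recent", 2: None, 3: None}

def merged_query():
    """Loaded query: merges by key, using the first query only where data is missing."""
    latest = latest_query()  # the upstream definition runs as part of this refresh
    return {key: value if value is not None else latest[key]
            for key, value in recent_query().items()}

table = merged_query()  # "refreshing" the loaded query
print(table)
print("upstream evaluations:", evaluations["latest"])
```

The counter shows that the upstream definition was evaluated even though nothing ever asked for it directly, which matches the behaviour described above.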
I also had a tough time finding information about this. There is a post by Ken Puls on PowerPivotPro.com that helps drive some (good) conclusions about this. Net-net, the "connection only" queries ARE refreshed before the merge query; you just don't see it (which, btw, I think should be implemented in Excel). Hope this helps.