I have developed a workbook merging data from a couple of small queries. These queries (tiles) are hidden (conditionally visible, condition: 1=2). It seems to work during development when I run the queries manually. But when I open the workbook another day in read mode, the resulting query (the merge) never starts/finishes (?). It shows as running but hangs endlessly until I run the source queries manually and then rerun the resulting query.
It looks like a pretty standard configuration, so why doesn't it work for me?
source queries
merge query
In workbooks, queries will run if they are
a) visible
-or-
b) referenced by a merge that is visible
We currently do not support merges referencing hidden merges.
But fear not! A lot of people miss that a merge step can actually do many merges at once, so you almost never need merges that reference another merge.
Related
I currently have approximately 10M rows and ~50 columns in a table that I wrap up and share as a pivot. However, this also means it takes approximately 30 minutes to an hour to download the CSV, or much longer to use a Power Query ODBC connection directly to Redshift.
So far the best solution I've found is to use Python's redshift_connector to run update queries and UNLOAD a zipped result set to an S3 bucket, then use boto3/gzip to download and unzip the file, and finally perform a refresh from the CSV. This results in a 600MB Excel file compiled in ~15-20 minutes.
However, this process still feels clunky, and sharing a 600MB Excel file among teams isn't great either. I've searched for several days but I'm no closer to finding an alternative: what would you use if you had to share a drillable table/pivot among a team with a 10GB datastore?
As a last note: I thought about programming a couple of PHP scripts, but my office doesn't have the infrastructure to support that.
Any help or ideas would be most appreciated!
Call a meeting with the team and let them know about the constraints; you will get some suggestions and can offer some of your own.
Suggestions from my side:
For the file part:
reduce the data; for example, if it is time dependent, increase the interval, so hourly data can be aggregated to daily data (see the sketch after this list)
if the data is related to some groups, you can divide the file into different parts, one file per group
or send them only the final reports and numbers they require; don't send them the full data.
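For the hourly-to-daily idea, a minimal Power Query sketch (the HourlyData query and the Timestamp/Value column names here are placeholders, since the asker is already using Power Query):

let
    Source = HourlyData,  // hypothetical query holding the hourly rows
    // derive the calendar day from the hourly timestamp
    WithDay = Table.AddColumn(Source, "Day", each Date.From([Timestamp]), type date),
    // collapse each day's rows into a single aggregated row
    Daily = Table.Group(WithDay, {"Day"}, {{"Value", each List.Sum([Value]), type number}})
in
    Daily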
For a fully functional app:
you can buy a desktop PC (if budget is a constraint, buy a used one or use any desktop or laptop from old inventory) and create a PHP/Python web application that can do all the steps automatically
create a local database and link it with the application
create the charting, pivoting, etc. modules in that application, and remove Excel from your process altogether
you can even use some pre-built applications for the charting and pivoting part; Oracle APEX is one example.
I have an Excel file connected to an Access database. I have created a query through Power Query that simply brings the target table into the file and does a couple of minor things to it. I don't load this to a worksheet but maintain a connection only.
I then have a number of other queries linking to the table created in the first query.
In one of these linked queries, I apply a variety of filters to exclude certain products, customers and so on. This reduces the 400,000 records in the original table in the first query down to around 227,000 records. I then load this table to a worksheet to do some analysis.
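For illustration, the filtering query is essentially of this shape (the query and column names below are placeholders, not my real ones):

let
    Source = BaseTable,  // the connection-only query on the Access table
    // keep only the rows that survive the product/customer exclusions
    Filtered = Table.SelectRows(
        Source,
        each not List.Contains({"Product A", "Product B"}, [Product])
            and [Customer] <> "Internal"
    )
in
    Filtered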
Finally I have a couple of queries looking at the 227,000 record table. However, I notice that when I refresh these queries and watch the progress in the right hand pane, they still go through 400,000 records as if they are looking through to the original table.
Is there any way to stop this happening in the expectation that doing so would help to speed up queries that refer to datasets that have themselves already been filtered?
Alternatively is there a better way to do what I’m doing?
Thanks
First: How are you refreshing your queries? If you execute them one at a time then yes, they're all independent. However, when using Excel 2016 on a workbook where "Fast Data Load" is disabled on all queries, I've found that a Refresh All does cache and share query results with downstream queries!
Failing that, you could try the following:
1. Move the query that makes the 227,000-row table into its own group called "Refresh First".
2. Place your cursor in your 227,000-row table and click Data - Get & Transform - From Table.
3. Change all of your queries to pull from this new query rather than the source.
4. Create another group called "Refresh Second" that contains every query that is downstream of the query you created in step 2 and loads data to the workbook.
5. Move any remaining queries that load to the workbook into "Refresh First", "Refresh Second", or some other group. (By the way: I usually also have a "Connections" group that holds every query that doesn't load data to the workbook.)
Unfortunately, once you do this, "Refresh All" would have to be done twice to ensure all source changes are fully propagated, because those 227,000 rows will be used before they've been updated from the 400,000. If you're willing to put up with this and refresh manually, then you're all set: just right-click and refresh the first query group, wait, then right-click and refresh the second one.
For a more idiot-proof way of refreshing... you could try automating it with VBA, but queries normally refresh in the background; it will take some extra work to ensure that the second group of queries isn't started before all of the queries in your "Refresh First" group have completed.
Or... I've learned to strike a balance between real-world fidelity and development speed by doing the following:
Create a query called "ProductionMode" that returns true if you want full data, or false if you're just testing each query. This can just be a parameter if you like.
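If you go the plain-query route rather than a parameter, the entire body of the "ProductionMode" query can just be the literal below:

// "ProductionMode": flip to false while developing to read from the cached worksheet tables
true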
Create a query called "fModeSensitiveQuery" defined as
let
    // Get this once per time this function is retrieved and cached, OUTSIDE of what happens each time the returned function is executed
    queryNameSuffix = if ProductionMode then "" else " Cached",

    // We can now use the pre-rendered queryNameSuffix value as a private variable that's not computed each time it's called
    returnedFunction = (queryName as text) as table =>
        Expression.Evaluate(
            Expression.Identifier(queryName & queryNameSuffix),
            #shared
        )
in
    returnedFunction
For each slow query ("YourQueryName") that loads to a table:
Create "YourQueryName Cached" as a query that pulls straight from the results table (see the sketch after this list).
Create "modeYourQueryName" as a query defined as fModeSensitiveQuery("YourQueryName")
Change all queries that use YourQueryName to use modeYourQueryName instead.
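A minimal sketch of one of these cached queries, assuming the table that "YourQueryName" loads to the worksheet is itself named YourQueryName in the workbook (adjust to whatever your loaded table is actually called):

let
    // read the rows already loaded to the worksheet instead of recomputing the upstream query
    Source = Excel.CurrentWorkbook(){[Name = "YourQueryName"]}[Content]
in
    Source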
Now you can flip ProductionMode to true and changes propagate completely, or flip ProductionMode to false and you can test small changes quickly; if you're refreshing just one query it isn't recomputing the entire upstream to test it! Plus, I don't know why but when doing a Refresh All I'm pretty sure it also speeds up the whole thing even when ProductionMode is true!!!
This method has three caveats that I'm aware of:
Be sure to update your "YourQueryName Cached" query any time the "YourQueryName" query's resulting columns are added, removed, renamed, or typed differently. Or better yet, delete and recreate them. You can do this because of the next caveat:
Power Query won't recognize your "YourQueryName" and "YourQueryName Cached" queries as dependencies of "modeYourQueryName". The Query Dependencies diagram won't be quite right, you'll be able to delete "YourQueryName" or "YourQueryName Cached" without Power Query stopping you, and renaming "YourQueryName" will break things instead of Power Query automatically changing all of your other queries accordingly.
While faster, the user-experience is a rougher ride, too! The UI gets a little jerky because (and I'm totally guessing, here) this technique seems to cause many more queries to finish simultaneously, flooding Excel with too many repaint requests at the same time. (This isn't a problem, really, but it sure looks like one when you aren't expecting it!)
I know it might be a bit of a confusing title, but I couldn't come up with anything better.
The problem ...
I have an ADF pipeline with 3 activities: first a Copy to a DB, then 2 stored procedures. All are triggered daily and use WindowEnd to read the right directory or to pass a date to the SP.
There is no way I can get an import date into the XML files that we are receiving,
so I'm trying to add it in the first SP.
The problem is that once the first activity of the pipeline is done, the 2 others are started.
The 2nd activity in the same slice is the SP that adds the dates, but when history is being loaded the same pipeline also starts a copy for another slice.
So I'm getting mixed-up data.
As you can see in the 'Last Attempt Start'.
Does anybody have an idea how to avoid this?
ADF Monitoring
In case somebody hits a similar problem...
I've solved it by working with daily-named tables.
Each slice puts its data into a staging table with a _YYYYMMDD suffix, which can be set as "tableName": "$$Text.Format('[stg].[filesin_1_{0:yyyyMMdd}]', SliceEnd)".
So there is no longer any problem with parallelism.
The only disadvantage is that the SPs that come after this first step have to work with dynamic SQL, as the table name they select from is variable.
But that wasn't a big coding problem.
Works like a charm !
I currently have an Excel-based data extraction method using Power Query and VBA (for docs with passwords). Ideally this would be programmed to run once or twice a day.
My current solution involves setting up a spare laptop on the network that will run the extraction twice a day on its own. This works but I am keen to understand the other options. The task itself seems to be quite a struggle for our standard hardware. It is 6 network locations across 2 servers with around 30,000 rows and increasing.
Any suggestions would be greatly appreciated
Thanks
If you are going to work with increasing data and dedicate a laptop exclusively to the process, I would think about installing a database on that laptop (MySQL, for example). You can use Access too... but Access file corruption is a risk.
Download into this DB all the data you need for your report, using incremental downloads (only new, modified and deleted info).
Then run the Excel report extracting from this database on the same computer (a sketch follows below).
This should improve your solution's performance.
Probably your biggest problem is that you query ALL the data on each report generation.
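On the Excel side, Power Query can then read from that local database. A minimal sketch, where the ODBC DSN LocalReportDb and the table extract_data are only placeholders for whatever you actually set up:

let
    // hypothetical DSN pointing at the local MySQL instance; the table name is also a placeholder
    Source = Odbc.Query("dsn=LocalReportDb", "SELECT * FROM extract_data")
in
    Source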
I am working on a project within Excel and am starting to prepare my document for future performance-related problems. The Excel file contains large amounts of data and large amounts of images which are all in sets, i.e., 40 images belong to one function of the program, another 50 belong to another, etc., and only one set of them is used at a time.
This file is only going to get bigger as the number of jobs/functions it has to handle increases. Now, I could just make multiple Excel files and let the user choose which one is appropriate for the job, but it is requested that this is all done from one file.
Bearing this in mind, I started thinking about methods of creating such a large file whilst keeping its performance levels high, and had an idea which I am not sure is possible. The idea is to have multiple protected workbooks, each one containing the information for one job "set", and a main workbook which accesses these files depending on the user inputs. This will result in many Excel files which take time to download initially, but whilst being used this should eliminate the low-performance issues, as the computer only has to access a subset of these files.
From what I understand this is sort of like what DLLs are for, but I am not sure if the same can be done with Excel and, if it can, whether the performance increase would be significant.
If anyone has any other suggestions or elegant solutions on how this can be done please let me know.
Rather than saving data such as images in the Excel file itself, write your macro to load the appropriate images from files and have your users select which routine to run. This way, you load only the files you need. If your data is text/numbers, you can store it in a CSV or, if your data gets very large, use a Microsoft Access database and retrieve the data using the ADODB library.
Inserting Images: How to insert a picture into Excel at a specified cell position with VBA
More on using ADODB: http://msdn.microsoft.com/en-us/library/windows/desktop/ms677497%28v=vs.85%29.aspx