I'm inserting about 6,000 values into Azure Table Storage, 100 at a time, using TableBatchOperations. The values are inserted in an async method, which is awaited.
Now many of my integration tests fail. They try to retrieve the previously inserted values, but instead of the 6K values, only 1K or 2K are returned. If I add a Task.Delay of several seconds to my test, it succeeds.
So table.ExecuteBatchAsync() runs to completion for all of my 60 batches. Does anybody know why there's still so much time between the (completed) insertion and being able to retrieve the data?
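For reference, the insert loop looks roughly like this (a simplified sketch; the entity type, table setup, and SDK, Microsoft.Azure.Cosmos.Table, are illustrative assumptions):

    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Microsoft.Azure.Cosmos.Table; // assumption: the classic table SDK; WindowsAzure.Storage exposes the same types

    // Simplified sketch of the batching described above; entity type and table setup are illustrative.
    async Task InsertInBatchesAsync(CloudTable table, IEnumerable<TableEntity> entities)
    {
        var batch = new TableBatchOperation();
        foreach (var entity in entities) // ~6,000 entities; every entity in a batch must share one partition key
        {
            batch.Add(TableOperation.Insert(entity));
            if (batch.Count == 100) // a single batch accepts at most 100 operations
            {
                await table.ExecuteBatchAsync(batch); // awaited and runs to completion
                batch.Clear();
            }
        }
        if (batch.Count > 0)
            await table.ExecuteBatchAsync(batch);
    }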
Note: you can reproduce this behavior with Microsoft Azure Table Explorer. During insertion, hit the refresh button for the table.
Note 2: I've searched a lot for this phenomenon, but I can't find any Microsoft documentation specifying the time between insertion and the data being retrievable. I also couldn't find any similar posts on Stack Overflow.
Related
I have an ADF pipeline which executes a Data Flow.
The Data Flow has a Source (table A) with around 1 million rows,
a Filter with a query that selects only yesterday's records from the source table,
Alter Row settings that use upsert,
and a Sink, the archival table into which the records are upserted.
This whole pipeline takes around 2 hours, which is not acceptable, even though only around 3,000 records are actually transferred/upserted.
The core count is 16. I have tried round-robin partitioning with 20 partitions.
A similar archival for another table with around 100K records doesn't take more than 15 minutes.
I thought of creating a source that would select only yesterday's records, but in the dataset you can only select a table.
Please suggest anything I am missing that could optimize this.
The table of the Data Set really doesn't matter. Whichever activity you use to access that Data Set can be toggled to use a query instead of the whole table, so that you can pass in a value to select only yesterday's data from the database.
Of course, if you have the ability to create a stored procedure on the source, you could also do that.
When migrating really large sets of data, you'll get much better performance by using a Copy activity to stage the data into an Azure Storage Blob and then using another Copy activity to pull from that Blob into the destination. But for what you're describing here, that doesn't seem necessary.
I am supposed to optimize the performance of an old Access DB at my company. It contains several tables with about 20 columns and 50,000 rows. The speed is very slow because people work with the whole table and set filters afterwards.
Now I want to compose a query to reduce the amount of data before transferring the complete rows into Excel, but the speed is still very slow.
First I tried the new Power Query editor in Excel. I first reduced the rows by selecting only the most recent ones (by date), then made an inner join with the second table.
In the end I got fewer than 20 rows returned, and I thought I was fine.
But when I ran the query from Excel, it took 10-20 seconds to read the data. I could see that Excel loads the complete tables before applying the filters.
My next try was to create the same query directly inside the Access DB, with the same settings. Then I opened this query in Excel, and the time to load the rows was nearly zero: you select "Refresh" and the result is shown instantly.
My question is: is there any way to perform a query in Excel only (without touching the Access file) that is nearly as fast as a query in Access itself?
Best regards,
Stefan
Of course.
Just run an SQL query from MS Query in Excel. You can create the query in Access, and copy-paste the SQL in MS Query. They're executed by the same database engine, and should run at exactly the same speed.
See this support page on how to run queries using MS Query in Excel.
More complex solutions using VBA are available, but shouldn't be needed.
I am using a form site, Cognito Forms. Multiple students can register per form. The following happens using Microsoft Azure Logic Apps: the form is linked to a webhook, and for each student a new row should be inserted into a Google Sheet. When I look at the run data, the input and output data are correct and the correct number of rows are being created. However, when I check the Google Sheet, sometimes only some of the rows have been inserted, and in no particular order.
This is an old question, but maybe someone can still benefit from an answer. I think I had a similar problem and I was able to solve it by adding the rows sequentially.
I am assuming you are trying to add the rows using "For each". You can configure it to insert the rows sequentially by going to its settings, switching Concurrency Control on, and setting the degree of parallelism to 1.
My guess is that if you don't add the rows sequentially, several inserts hit the Google API at more or less the same time and some of the inserts overwrite others; the sketch below illustrates the idea.
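To make that guess concrete, here's a purely illustrative C# sketch (it does not model the real Google Sheets connector) of how parallel appends that each compute their own target row can overwrite one another, and why forcing them to run sequentially avoids the problem:

    using System;
    using System.Collections.Concurrent;
    using System.Linq;
    using System.Threading.Tasks;

    class LostUpdateDemo
    {
        // A dictionary standing in for the sheet: row number -> row contents.
        static readonly ConcurrentDictionary<int, string> Sheet = new ConcurrentDictionary<int, string>();

        static async Task AppendRowAsync(string value)
        {
            int targetRow = Sheet.Count;   // read the current "next free row"
            await Task.Delay(10);          // simulated network latency between read and write
            Sheet[targetRow] = value;      // write; a concurrent call may have picked the same row
        }

        static async Task Main()
        {
            var students = Enumerable.Range(1, 10).Select(i => "Student " + i).ToList();

            // Parallel appends: several calls read the same row count and overwrite each other.
            await Task.WhenAll(students.Select(AppendRowAsync));
            Console.WriteLine($"Parallel:   {Sheet.Count} of {students.Count} rows survived");

            Sheet.Clear();

            // Sequential appends (degree of parallelism = 1): each call sees the previous write.
            foreach (var s in students)
                await AppendRowAsync(s);
            Console.WriteLine($"Sequential: {Sheet.Count} of {students.Count} rows survived");
        }
    }

In the parallel run most calls pick the same target row, so only a handful of rows survive; in the sequential run all ten land on their own line, which is exactly what setting the degree of parallelism to 1 buys you.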
I have two queries in my workbook that rely on each other. One is set to a connection only, the other is set to load to a table after performing some merging and expanding operations. I noticed that when refreshing, the query set to "Connection only" does not have a visual indication of refreshing.
When I refresh the secondary query that relies on information from the connection only one, does it actually refresh both of them? I am having a hard time finding clear documentation on this. A link to where the information is would also be appreciated.
Further information on the queries themselves:
Both link to SQL tables.
One pulls the latest data available in the table.
The other pulls recent information from a different table.
The second one merges the two tables together (by the key).
The second one then only grabs information from the first when there is missing information in the second.
I am specifically asking: when the second query triggers a refresh, does the first query also refresh, even though it is connection only?
Yes, effectively the first query is also refreshed: its query definition is run and the result is pulled into your second query.
Note that in the Query Editor window you will see a "Preview" dataset for your first query, which would not be refreshed by your refresh of the second query. That "Preview" dataset is only a design tool; it doesn't affect your results when you actually refresh and deliver data into an Excel table.
I also had a tough time finding information about this. There is a post by Ken Puls on PowerPivotPro.com that helps draw some (good) conclusions about it. Net-net, the "connection only" queries ARE refreshed before the merge query; you just don't see it (which, by the way, I think Excel should indicate). Hope this helps.
Here is the situation we have:
a) I have an Access database / application that records a significant amount of data. Significant fields would be hours, # of sales, # of unreturned calls, etc
b) I have an Excel document that connects to the Access database and pulls data in to visualize it
As it stands now, the Excel file has a Refresh button that loads new data. The data is loaded into a large PivotTable. The main 'visual form' then uses VLOOKUP to get the results from the form, based on the related hours.
This operation is slow (~10 seconds) and seems to be redundant and inefficient.
Is there a better way to do this?
I am willing to go just about any route - just need directions.
Thanks in advance!
Update: I have confirmed (thanks to the helpful comments/responses) that the problem is with the data loading itself. Removing all the VLOOKUPs only took a second or two off the load time. So the question stands: how can I rapidly and reliably get the data without so much time involved? (It loads around 3,000 records into the PivotTables.)
You need to find out whether it's the PivotTable refresh or the VLOOKUP that's taking the time.
(Try removing the VLOOKUPs to see how long it takes just to do the refresh.)
If it's the VLOOKUP, you can usually speed that up.
(See http://www.decisionmodels.com/optspeede.htm for some hints.)
If it's the PivotTable refresh, then it depends on which method you are using to get the data (Microsoft Query, ADO/DAO, ...) and how much data you are transferring.
One way to speed this up is to minimize the amount of data you read into the pivot cache by reducing the number of columns and/or predefining a query to subset the rows.