ADF Mapping Data Flows failing with BatchUpdateException - azure

I have a number of Mapping Data flows that have been running regularly for the past several months, and some of them started failing yesterday.
The data flow pattern is -
Source: 2 Azure SQL DB tables, a lookup table in Synapse
Sink: 1 table in Synapse (Azure SQL DB)
We have enabled Polybase Staging for better performance, as each activity takes too long without it, and have a linked service to an Azure Blob Storage account for this.
Last night's run failed midway for some of our larger tables with the following error, but the smaller tables were all successful. Nothing has changed on any of these pipelines or linked services in several months.
Going into debug mode, I can't look at the data preview for any of the Synapse sink activities unless I disable the 'Staging' option in settings. If I try with Staging enabled, it says "Blob storage staging properties should be specified", even though I have entered them in the debug settings, and I still get the error.
The strange thing is that this problem only occurs on the data flows moving larger amounts of data; the smaller tables are fine in debug mode as well. All of these data flows were successful two days ago, so could this perhaps be a space issue in Blob Storage?
The pipeline activity error code:
{"StatusCode":"DFExecutorUserError",
"Message":"Job failed due to reason: at Sink 'SinkIntoSynapse':
java.sql.BatchUpdateException: There are no batches in the input script.",
"Details":"at Sink 'SinkIntoSynapse':
java.sql.BatchUpdateException: There are no batches in the input script."}

I have seen this caused by having a commented-out SQL statement in the Pre-copy script section of the sink settings.
If you have anything in the Pre-copy script section, try removing it before publishing and running the data factory again.
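For illustration only (dbo.StagingTable is a placeholder name): a Pre-copy script that consists entirely of comments leaves the sink with no executable batch, which is what the "There are no batches in the input script" error is complaining about.
-- Problematic Pre-copy script: every line is commented out, so no batch remains.
-- TRUNCATE TABLE dbo.StagingTable;

-- A valid Pre-copy script contains at least one live statement, for example:
TRUNCATE TABLE dbo.StagingTable;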

I can confirm what Kevin said: in my case, I had started writing a SQL script in the Pre-copy script section, and even after I cancelled it, I was still getting the error.
Try clicking the Recycle Bin icon (shown in the screenshot) to clear it.
Worked for me.

Related

Why is my DB locked when an ADF copy activity is running?

I just want to know why an Azure Data Factory (ADF) copy activity that's transferring data from the cloud to an on-prem DB locks the whole table.
When I click on the + button of the tables, a timeout error message appears. I'm dealing with huge amounts of data.
I'm not using a ForEach activity, but all the copy activities run in parallel.
If you are using a ForEach activity, check the sequential box to make sure all tasks are not running in parallel and locking the tables. See this documentation for more info.
If "isSequential" is set to False, ensure that there is a correct configuration to run multiple executables. Otherwise, this property should be used with caution to avoid incurring write conflicts. For more information, see Parallel execution section.

Azure - never-ending Full Backup Uploading in Database Migration Service

I have been migrating some databases from a SQL Server to a SQL Managed Instance. 13 of 14 DBs have been successfully restored. There is only one remaining, the biggest one at almost 600 GB. It has been continuously uploading the initial full backup for more than a week and it is still running.
It is a big database, but it has been a long time and it should have finished by now. For this reason I have been trying some cmd/az commands, but I don't get anything more than a 'running' status.
The strange thing is that I can't see the DB (in recovery mode) in SQL Server Management Studio, and the file has not been created yet in the Storage Account container. All the other databases appear in SSMS and in the storage account.
I had around 75 GB more than the total size of the databases in the Storage Account, so I guess that was not the issue. In any case, I added 500 GB more, but still no results.
Is it possible to stop the task and restart it to see if this helps? Obviously I would not like to upload all the databases again if possible.
Could you please help?
Thank you!
As explained in the comments above, the best options for migrating old SQL Servers in my case were:
Regularly check the CPU and network performance of the server.
When you configure your SQL MI, use at least double the storage size of the full DB backups.
Finally, if you have big DBs (in my case more than 400 GB), create separate activities* to split the small ones from the big ones. This also helps if any errors happen with the big DBs: you won't need to upload all of them again.
*NOTE: I had some issues when I had more than two activities: some of them stayed in "Queued" status and after a day still had not run, even when the other activities were already completed. To fix this, I had to delete all the activities and create the remaining one again.
Have a good day.
I would recommend opening a case with Support to make sure there is no patching or failover happening on the SQL MI during the migration.
I have seen this happen before, where the restore of a VLDB is in progress and then patching on the SQL MI causes the restore to start over.
Hopefully this will help.

Is Azure geo-replication a good fit to reduce downtime during updates?

We currently have an Azure SQL database that supports a web application. It is read-only from a user perspective. The problem is that we have to drop the tables and reload them with new data each hour, which makes the application unavailable for 5 minutes every hour; that is unacceptable.
The information on Azure active geo-replication seems a little vague. I thought I might be able to use active geo-replication to fail over to a secondary database, take the primary offline to do the update, and then, when the update is complete, switch back to the original primary and let the secondary auto-sync. However, it is not clear from what I have read about active geo-replication whether that scenario is possible.
Could someone provide some help with this problem or direct me toward another possible solution?
Thanks
You can give Azure Data Factory a try, since it allows you to append data to a destination table or invoke a stored procedure with custom logic during a copy when SQL Server/SQL Azure is used as a "sink". You can learn more here.
Azure Data Factory allows you to incrementally load data (the delta) after an initial full data load by using a watermark column that has the last-updated timestamp or an incrementing key. The delta-loading solution loads the changed data between an old watermark and a new watermark. You can learn more about how to do that with Azure Data Factory in this article.
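As a rough sketch of that watermark pattern in plain T-SQL (dbo.Orders, LastModifiedTime and dbo.WatermarkTable are placeholder names; the article shows how to build these steps into the pipeline itself):
-- Read the old watermark and compute the new one from the source table.
DECLARE @old datetime2 = (SELECT WatermarkValue FROM dbo.WatermarkTable WHERE TableName = 'dbo.Orders');
DECLARE @new datetime2 = (SELECT MAX(LastModifiedTime) FROM dbo.Orders);

-- Delta query: only the rows changed between the two watermarks are copied.
SELECT *
FROM dbo.Orders
WHERE LastModifiedTime > @old AND LastModifiedTime <= @new;

-- After the copy succeeds, persist the new watermark for the next run.
UPDATE dbo.WatermarkTable SET WatermarkValue = @new WHERE TableName = 'dbo.Orders';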
If the setup takes 5 minutes and the data is read-only, then I would make a new empty DB every hour, using some C# code or a PowerShell script, run the data import into this new empty DB, and once it is done, change the connection string in the running production application to point to the new, updated version of the DB, and then drop the old DB. This way you won't have any downtime, because while the import is running, the application will connect to the old DB, and when the import is done, it will connect to the new DB.
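A minimal T-SQL sketch of the database-side steps in that approach (ReportDb_new and ReportDb_old are placeholder names; the orchestration and the connection-string switch would live in the C# or PowerShell job and the application, as described above):
-- Create a fresh database for this hour's load (name is a placeholder).
CREATE DATABASE ReportDb_new;

-- ... run the hourly data import into ReportDb_new here ...

-- Once the application's connection string points at ReportDb_new,
-- the previous hour's database can be dropped.
DROP DATABASE ReportDb_old;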

Azure Data Factory - Maximum call stack size exceeded

I created a linked service for creating an HDInsight cluster a few weeks back and was able to modify it after the fact as well (number of nodes, cluster type, etc.).
When I go to edit the HDInsight linked service today, it throws back "Maximum call stack size exceeded". I tried creating a fresh linked service today with the same parameters and I encounter the same error there.
I tried the same thing on a different data factory under a different subscription, and there it was successful; no errors were thrown. Then I tried it with a different linked service there and saw the same error... I'm wondering if anybody has encountered this particular scenario and has insights.
The issue should now be fixed. Please refresh the UI and try again. Thanks for your patience.

SQL Azure distributing heavy read queries for reporting

We are using SQL Azure for our application and need some input on how to handle queries that scan a lot of data for reporting. Our application is both read- and write-intensive, so we don't want the report queries to block the rest of the operations.
To avoid connection-pooling issues caused by long-running queries, we put the code that queries the DB for reporting onto a worker role. This still does not prevent the database from getting hit with a bunch of read-only queries.
Is there something we are missing here? Could we set up a read-only replica that all the reporting calls hit?
Any suggestions would be greatly appreciated.
Have a look at SQL Azure Data Sync. It will allow you to incrementally update your reporting database.
Here are a couple of links to get you started:
http://msdn.microsoft.com/en-us/library/hh667301.aspx
http://social.technet.microsoft.com/wiki/contents/articles/1821.sql-data-sync-overview.aspx
I think it is still in CTP though.
How about this:
Create a separate connection string for reporting, for example by using a different Application Name
For your reporting queries use SET TRANSACTION ISOLATION LEVEL SNAPSHOT
This should prevent your long-running queries from blocking your operational queries. This will also allow your reports to get a consistent read.
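A small sketch of what the reporting session could look like (dbo.Sales is a placeholder table; snapshot isolation is enabled by default on SQL Azure databases):
-- Run this only on the reporting connection.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;
    -- The report reads a consistent snapshot and does not take shared locks
    -- that would block the operational writes.
    SELECT CustomerId, SUM(Amount) AS Total
    FROM dbo.Sales
    GROUP BY CustomerId;
COMMIT TRANSACTION;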
Since you're talking about reporting, I'm assuming you don't need real-time data. In that case, you can consider creating a copy of your production database at a regular interval (every 12 hours, for example).
In SQL Azure it's very easy to create a copy:
-- Execute on the master database.
-- Start copying.
CREATE DATABASE Database1B AS COPY OF Database1A;
Your reporting would happen on Database1B without impacting the actual production database (Database1A).
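As a side note, while a copy is still in progress you should be able to check on it from the master database:
-- Executed in the master database: lists in-flight database copies and their progress.
SELECT * FROM sys.dm_database_copies;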
You say you have a lot of read-only queries... any possibility of caching them? (Perfect, since they are read-only.)
What reporting tool are you using? You can output-cache the queries as well if needed.
