I have created an ADF pipeline to copy around 18 files from an FTP location into an Azure Blob container. First, a Get Metadata activity retrieves the list of files from the FTP location. Then a ForEach activity loops through all the files, and inside it a Copy Data activity copies each file from the FTP location to the Blob location.
When the pipeline runs, some of the files are copied successfully, but others fail with the error message below:
"ErrorCode=UserErrorFileNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The remote server returned an error: (550) File unavailable (e.g., file not found, no access).,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (550) File unavailable (e.g., file not found, no access).,Source=System,'"
I am not sure what is wrong here, because most of the files are copied successfully while a few are not. I have run the pipeline multiple times, but there is still no guarantee that all files will be copied.
When I test the connection of the FTP linked service, it connects successfully. The FTP linked service is SSL enabled and configured to retrieve its password from Azure Key Vault.
Refer to the output below from when I ran the pipeline:
Any thoughts on what is going wrong here? Is there a limit on the number of files that can be copied at one time?
Thank you in advance.
As Joel Cochran said, the issue may be a concurrency limit.
When Sequential is checked on the ForEach activity, the Copy activity runs single-threaded. When it is unchecked, the inner Copy activities run in parallel and throughput is greatly improved.
So our solution is:
Uncheck Sequential
Increase the Batch count, i.e. the maximum number of inner activities that run in parallel.
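For reference, here is a rough sketch of how those two settings appear in the ForEach activity's JSON; the activity names and the batch count value are assumptions, not taken from the pipeline in question:

```json
{
    "name": "ForEachFile",
    "type": "ForEach",
    "typeProperties": {
        "isSequential": false,
        "batchCount": 20,
        "items": {
            "value": "@activity('Get Metadata1').output.childItems",
            "type": "Expression"
        },
        "activities": [
            { "name": "CopyFtpToBlob", "type": "Copy" }
        ]
    }
}
```

Unchecking Sequential in the UI corresponds to isSequential: false, and Batch count corresponds to batchCount.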
I have a set of files stored in Azure Blob Storage, and I am trying to index them on a daily basis. Sometimes the indexer runs fail with errors; I am not sure why a run fails one time and succeeds another.
I have tried to resolve the error but have not been able to, precisely because the indexer sometimes runs successfully and sometimes does not.
Looking at the error, it seems that some of the files in your storage are either not supported for Azure Search indexing or may be corrupted. I suggest checking whether any files are corrupted or in a format that is not supported for indexing, as mentioned here.
I tried this from my side; below are the steps I followed.
I have a list of files, in several different formats, in a storage account.
Created an index, data source, skillset, and indexer.
Since the data source contains different formats, I configured the allowed formats in the indexer as shown below.
With this configuration, the indexer will not fail on unsupported formats.
Also, if you don't want an indexer run to stop when individual documents fail, you can configure that on the indexer as shown below; the reference is here.
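As a rough sketch, the relevant pieces of the indexer definition could look something like this; the names, extension lists, and thresholds are placeholder assumptions:

```json
{
    "name": "blob-indexer",
    "dataSourceName": "blob-datasource",
    "targetIndexName": "blob-index",
    "parameters": {
        "maxFailedItems": -1,
        "maxFailedItemsPerBatch": -1,
        "configuration": {
            "indexedFileNameExtensions": ".pdf,.docx,.csv",
            "excludedFileNameExtensions": ".png,.jpg",
            "failOnUnsupportedContentType": false,
            "failOnUnprocessableDocument": false
        }
    }
}
```

Here indexedFileNameExtensions/excludedFileNameExtensions restrict which formats are picked up, while the failOn* flags and maxFailedItems keep the run going when individual documents cannot be processed.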
I'm using an Azure Release Pipeline (classic) to upload build files to an FTP (not FTPS) server.
It transfers all of the files successfully, but one file shows the error below.
That one file cannot be overwritten.
Can anyone please help me?
Thanks in advance.
Please check whether the file that failed to be overwritten is read-only. I can reproduce the same error when a file is read-only on my FTP server. See this thread.
The above error can also be caused by the file being locked because another process is using it. See here.
You can also check whether the agent account has Modify permission on this file. See here.
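If the read-only attribute turns out to be the cause and your FTP server happens to support the SITE CHMOD command, a small sketch like the following could clear it before the next release run; the host, credentials, and file path are placeholders, not values from the question:

```python
# Hypothetical helper: relax permissions on the one stuck file so the release
# pipeline can overwrite it. SITE CHMOD is not supported by every FTP server,
# so treat this as a sketch rather than a guaranteed fix.
from ftplib import FTP

ftp = FTP("ftp.example.com")                     # placeholder host
ftp.login(user="deploy-user", passwd="secret")   # placeholder credentials
ftp.sendcmd("SITE CHMOD 666 /site/wwwroot/stuck-file.html")  # placeholder path
ftp.quit()
```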
I am trying to implement the following flow in an Azure Data Factory pipeline:
Copy files from an SFTP to a local folder.
Create a comma-separated file in the local folder with the list of files and their sizes.
The first step was easy enough, using a 'Copy Data' step with 'SFTP' as source and 'File System' as sink.
The files are being copied, but in the output of this step, I don't see any file information.
I also don't see an option to create a file using data from a previous step.
Maybe I'm using the wrong technology?
One of the reasons I'm using Azure Data Factory is the integration runtime, which gives us a single fixed IP for connecting to the external SFTP (easier firewall configuration).
Is there a way to implement step 2?
Thanks for any insight!
There is no built-in feature to achieve this.
You need to use ADF together with another service; I suggest first using an Azure Function to inspect the files and then doing the copy.
The structure should be like this:
You can get the sizes of the files and save them to a CSV file.
Get the size of the files (Python):
How to fetch sizes of all SFTP files in a directory through Paramiko
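Following the Paramiko approach in that link, a minimal sketch (host, credentials, and folder are placeholders) for collecting file names and sizes might look like this:

```python
# List every file in an SFTP directory together with its size in bytes.
import paramiko

transport = paramiko.Transport(("sftp.example.com", 22))   # placeholder host
transport.connect(username="user", password="password")    # placeholder credentials
sftp = paramiko.SFTPClient.from_transport(transport)

# listdir_attr returns SFTPAttributes objects that expose the size as st_size
files = [(entry.filename, entry.st_size) for entry in sftp.listdir_attr("/incoming")]

sftp.close()
transport.close()
```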
And use pandas to save the list as a CSV (Python):
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
Writing a pandas DataFrame to CSV file
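Continuing the sketch above, writing those (name, size) pairs to a CSV with pandas is a one-liner; the output path is an assumption:

```python
import pandas as pd

# 'files' is the list of (file name, size) tuples from the Paramiko sketch above;
# shown here with sample values so the snippet runs on its own.
files = [("invoice_001.txt", 1024), ("invoice_002.txt", 2048)]

pd.DataFrame(files, columns=["FileName", "SizeInBytes"]).to_csv(
    "file_list.csv", index=False
)
```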
Simple HTTP trigger for an Azure Function (Python):
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook-trigger?tabs=python
(Put the processing logic in the body of the Azure Function. Basically, you can do anything you want there, apart from anything needing a graphical interface and a few unsupported things. You can choose whichever language you are familiar with, but in short, there is no single ADF feature that satisfies your idea.)
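A bare-bones skeleton of such an HTTP-triggered function (Python v1 programming model) could look like this, with the Paramiko/pandas logic from the sketches above placed inside main():

```python
import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    # 1. List the SFTP files and their sizes (Paramiko sketch above).
    # 2. Write the list to a CSV on the target file system (pandas sketch above).
    return func.HttpResponse("File list generated", status_code=200)
```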
I am trying to create a Logic App that transfers files from my FTP server to my Azure file share as they are created. The folder my trigger watches is structured by date (see below). Each day that a file is added, a new folder is created, so I need the trigger to check new subfolders without my having to go into the app every day and change which folder it looks at. Is this possible?
Here is the structure of my folder (called DATA); each day that a file is added, a new folder is created:
-DATA-
2016-10-01
2016-10-02
2016-10-03
...
The FTP connector uses configurable polling, where you set how often it should look for a file. The trigger currently does not support dynamic folders. However, what you could try is the following:
Trigger your logic app by recurrence (same principle as the FTP trigger in fact)
Action: Initialize a variable that stores today's date in the format used in your folder naming (see the sketch after this list)
Action: Do a list files in folder (here you should be able to dynamically set the folder name using the variable you created)
For-each file in folder
Action: Get File Content
Whatever you need to do with the file (calling a nested Logic App is smart if you need to run multiple processing actions on each file or want to handle resubmits of the flow per file)
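As a sketch of the two pieces that make the folder dynamic, the date variable could be initialized like this in the workflow definition (variable name and folder root are assumptions):

```json
"Initialize_folderDate": {
    "type": "InitializeVariable",
    "inputs": {
        "variables": [
            {
                "name": "folderDate",
                "type": "String",
                "value": "@{formatDateTime(utcNow(), 'yyyy-MM-dd')}"
            }
        ]
    },
    "runAfter": {}
}
```

The List files action would then take a folder path expression along the lines of @{concat('/DATA/', variables('folderDate'))}.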
To avoid picking up every file on each run, you will need a way to exclude files that were processed in an earlier run: either rename each file after it is processed to an extension you can exclude in the next run, or move it to a subfolder "Processed\datetime" in the root.
This solution will require more actions and thus will be more expensive. I haven't tried it out, but I think this should work. Or at least it's the approach I would try to set up.
Unfortunately, what you're asking is not possible with the current FTP connector, and there isn't any really great solution right now... :(
As an aside, I've seen this pattern several times and, as you are seeing, it just causes more problems than it could solve, which realistically is 0. :)
If you own the FTP Server, the best thing to do is put the files in one folder.
If you do not own the FTP server, politely mention to the owner that this pattern is causing problems and doesn't help you in any way, so please put the files in one folder ;)
I have an SSIS package that does a Bulk Insert, then executes a SQL Task, and then finally writes some database data to a flat file on our network. The package runs fine if I run it from Visual Studio 2012. However, if I deploy the package to the Integration Services Catalog on a 2012 SQL Server and run it from there, the Bulk Insert and SQL Task run fine, but when the package tries to output to the flat file, I get these error messages:
Cannot open the datafile "\\nyfil006\Projects\Accounting\CostRecovery\Cafe de Novo\HospitalityCharges.csv".
HospitalityCharges Flat File failed the pre-execute phase and returned error code 0xC020200E.
I'm able to output the System::UserName to an errorlog, and it's what I think it should be: an account that has full permissions to the folder in the flat file destination (and its parent folders). I've tried creating a blank version of HospitalityCharges.csv, and DelayValidation is set to True for the Data Flow Task that outputs the flat file. My system admin has granted Network Service permissions to the folder as per this link and this link, but that doesn't help. I've also made the connection string an expression as described here. We've also created a mapped drive and used that for the Destination Connection String instead of a UNC path. No joy. Does anyone know why this is happening?
Another note: if I change the flat file destination to point to the C: drive, the package runs fine, whether I run it from Visual Studio or from the Integration Services Catalog.