Azure Logic App FTP Connector not running for files modified before the current date

I am working with the FTP connector in an Azure Logic App, and it works fine if I upload a file whose last modified date is today.
But the FTP connector is not triggered for files that were modified before the current date.
In the trigger history I found that the trigger is skipped and a 202 status code is returned.
Kindly suggest a solution so that the FTP connector fires whenever any file (even one modified a year ago) is added to the FTP server.

The FTP connector maintains a trigger state, which is always the date of its last run, or the date it was created (for the very first run). It therefore only fires for files whose modified date is later than that trigger state.
A potential workaround is not to use the FTP trigger at all, but a Recurrence trigger, and then use the FTP connector's List files in folder action. This gives you all the files that currently exist there. You can then get the content of each one and, if your processing succeeds, delete the file.
HTH

Related

Trigger Logic App only once when FTP files are changed or modified

I am using a Logic App to detect any change in an FTP folder. There are 30+ files, and whenever there is a change the file is copied to blob storage. The issue is that it fires for each file: if 30 files are changed, it fires 30 times. I want it to fire only once, no matter how many files in the folder have changed. After the blobs are copied, I fire a GET request so that my website is updated as well. Am I using the wrong approach?
Below you can see my whole logic.
Your description says you are using the FTP connector, but your screenshot (which includes the File content property on the trigger) suggests you are actually using the SFTP-SSH connector trigger, since the FTP trigger doesn't have that property. Please correct me if that understanding is wrong.
If you are using the When a file is added or modified trigger, then it is expected behavior that your workflow runs once for every file that is added or modified.
But if you are using the When files are added or modified (properties only) trigger, it has a Split On setting which you can disable (it is enabled by default), so that your workflow executes only once for all the files that were added or modified within the polling interval you configured under How often do you want to check for the item?.
If it is in fact the FTP connector, the same still holds: you need to disable the Split On setting. For more details refer to this section.

"When A File Is Added Or Modified" trigger works for updated files but not created ones

I have a Logic App set up to monitor an FTP site; it should trigger an action when a file is uploaded. If I upload a file, change the original and upload it again, the trigger fires. Simply adding a new file doesn't work.
Chances are your FTP client is preserving the timestamp of the last file change. Change this to have it trigger on newly added files, too.
FTP triggers work by polling the FTP file system and looking for any file that was changed since the last poll. Some tools let you preserve the timestamp when the files change. In these cases, you have to disable this feature so your trigger can work. Here are some common settings:
WinSCP: go to Options > Preferences > Transfer > Edit > Preserve timestamp and disable it.
FileZilla: go to Transfer > Preserve timestamps of transferred files and disable it.
Taken from HOW FTP TRIGGERS WORK.
In addition to this, please be aware that the trigger can have a delay of up to twice its polling interval:
When a trigger finds a new file, the trigger checks that the new file is complete, and not partially written. For example, a file might have changes in progress when the trigger checks the file server. To avoid returning a partially written file, the trigger notes the timestamp for the file that has recent changes, but doesn't immediately return that file. The trigger returns the file only when polling the server again. Sometimes, this behavior might cause a delay that is up to twice the trigger's polling interval.

Iterate through files in Data Factory

I have a Data Lake Gen1 with the folder structure /Test/{currentyear}/{Files}.
{Files} example format:
2020-07-29.csv
2020-07-30.csv
2020-07-31.csv
Every day one new file gets added to the folder.
I need to create an ADF pipeline to load the files into SQL Server.
Conditions:
When my ADF runs for the first time it needs to iterate over all the files and load them into SQL Server.
When ADF runs from the second time onwards (once daily) it needs to pick up only today's file and load it into SQL Server.
Can anyone tell me how to design the ADF pipeline with the above conditions?
This should be designed in two parts.
When my ADF runs for the first time it needs to iterate over all the files and load them into SQL Server
You should create a temporary, one-off pipeline to achieve this. (I think you know how to do this, so I will not go into that part.)
When ADF runs from the second time onwards (once daily) it needs to pick up only today's file and load it into SQL Server
This needs you to create another pipeline which runs continuously.
Two points to achieve this:
First, trigger this pipeline with an event trigger (when a file is uploaded, the pipeline runs).
Second, filter the files by the specific date format:
For your requirement, the expression should be @{formatDateTime(utcnow(),'yyyy-MM-dd')} (see the example below).
On my side I can do this successfully. Please have a try on your side.
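For instance (this is an assumption about where the expression is applied, since the answer only gives the expression itself): if the copy activity's source uses a wildcard file name, it could be set to
@{formatDateTime(utcnow(),'yyyy-MM-dd')}.csv
so that each daily run only matches the file named after the current date.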

Azure ADF: how to ensure that the same files that are copied are also deleted?

Using Azure ADF, and currently my setup is as follows:
An event-based trigger on an input blob container fires on file upload. The file upload triggers a copy activity to an output blob, and this action is followed by a delete operation on the input blob. The input blob container can receive one or many files at once (I'm not sure how often it is scanned / how quickly the event triggers the pipeline). Reading up on the Delete activity documentation, it says:
Make sure you are not deleting files that are being written at the same time.
Would my current setup delete files that are being written?
Event based trigger on file upload >> Write from input Blob to Output Blob >> Delete input Blob
I've made an alternative solution which does a Get Metadata activity at the beginning of the pipeline, based on the event, and then a For Each loop which deletes the files at the end; I'm not sure if this is necessary, though. Would my original solution suffice in the unlikely event that I'm receiving files every 15 seconds or so?
Also, while I'm at it: in a Get Metadata activity, how can I get the actual path to the file, not just the file name?
Thank you for the help.
The Delete activity documentation says:
Make sure you are not deleting files that are being written at the same time.
Your setup is:
Event based trigger on file upload >> Write from input Blob to Output Blob >> Delete input Blob
The Delete input Blob step only runs after the Write from input Blob to Output Blob activity has finished, so by then the files being deleted are no longer being written.
Your question: would my current setup delete files that are being written?
Have you tested these steps? Test it yourself and you will get the answer.
Please note:
The Delete activity does not support deleting a list of folders described by a wildcard.
One other suggestion:
You don't need to use a Delete activity to remove the input blob after Write from input Blob to Output Blob has finished.
You can use a Data Flow instead: its source settings support deleting the source file (the input blob) after the copy has completed.
Hope this helps.
I could not use Leon Yue's solution because my source dataset was an SFTP one, which is not supported by Azure Data Flows.
To deal with this problem, I used the Filter by last modified setting on the source. I set the end time to the time the pipeline started.
With this solution, only the files added to the source before the pipeline started are consumed by both the copy and delete activities.
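As a note on how this can be expressed (an assumption about the concrete fields, based on the standard ADF copy source options): the pipeline start time can be referenced with the system variable @pipeline().TriggerTime, so the Filter by last modified end time can be set to
@pipeline().TriggerTime
and any file whose last-modified timestamp is later than the moment the run was triggered is left for the next run.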

Azure DML Data slices?

I have 40 million blobs totalling 10 TB in blob storage. I am using DML CopyDirectory to copy these into another storage account for backup purposes. It took nearly two weeks to complete. Now I am worried about the date up to which the blobs were copied to the target directory. Is it the date the job started or the date the job finished?
Does DML use anything like data slices?
Now I am worried about the date up to which the blobs were copied to the target directory. Is it the date the job started or the date the job finished?
As far as I know, when you start the CopyDirectory method, it just sends requests telling the Azure storage account to copy files from the other storage account. All of the copy work is done by Azure Storage.
When we run the method to start copying the directory, Azure Storage first creates each target file with size 0.
After the job has finished, you will find that the size has been updated.
So the result is: when the job starts, the file is created in the target directory, but its size is 0, and its last modified time reflects that moment.
Azure Storage then continues copying the file content to the target directory.
When the copy of a file finishes, its last modified time is updated.
So the DML SDK just tells the storage service to copy the files, and then keeps sending requests to Azure Storage to check each file's copy status.
Thanks. But what happens if files are added to the source directory during this copy operation? Do the new files get copied to the target directory as well?
The short answer is yes.
The DML does not fetch the whole blob list and send requests to copy all the files at once.
It first gets a part of the file name list and sends requests to tell the storage service to copy those files.
The list is sorted by file name.
For example, if the DML has already copied the files whose names start with 0 and you then add another file whose name starts with 0 to the folder, it will not be copied.
If you add a file that sorts after the point the DML has scanned so far, it will be copied to the new folder.
So during those two weeks at least a million blobs must have been added to the container, with very random names. So I think DML doesn't work in the case of large containers?
As far as I know, the DML is designed for high-performance uploading, downloading and copying of Azure Storage blobs and files.
When you use DML CopyDirectoryAsync to copy blob files, it first sends a request to list the folder's current files, then it sends requests to copy them.
By default, each listing request returns 250 file entries.
After getting a list page, it generates a marker, which is the name from which the next listing starts. It then lists the next batch of file names in the folder and starts copying again.
Also, by default the .NET HTTP connection limit is 2, which implies that only two concurrent connections can be maintained.
This means that if you don't raise the .NET HTTP connection limit, CopyDirectoryAsync will only fetch around 500 records (2 connections x 250 entries) before it starts copying.
After those copies complete, the operation moves on to the next batch of files.
I suggest you first increase the maximum number of HTTP connections so that more blob files are detected per batch:
ServicePointManager.DefaultConnectionLimit = Environment.ProcessorCount * 8;
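For reference, here is a minimal sketch of how such a backup copy might be wired up with the Data Movement library (Microsoft.Azure.Storage.DataMovement). The connection strings and container names are placeholders, and depending on the library version the third parameter of CopyDirectoryAsync is a bool isServiceCopy (as below) or a CopyMethod enum:

using System;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;
using Microsoft.Azure.Storage.DataMovement;

class BackupCopy
{
    static async Task Main()
    {
        // Raise the .NET HTTP connection limit before starting any transfer.
        ServicePointManager.DefaultConnectionLimit = Environment.ProcessorCount * 8;

        // Placeholder connection strings and container names.
        CloudStorageAccount sourceAccount = CloudStorageAccount.Parse("<source-connection-string>");
        CloudStorageAccount destAccount = CloudStorageAccount.Parse("<destination-connection-string>");

        CloudBlobDirectory sourceDir = sourceAccount.CreateCloudBlobClient()
            .GetContainerReference("source-container")
            .GetDirectoryReference(string.Empty);
        CloudBlobDirectory destDir = destAccount.CreateCloudBlobClient()
            .GetContainerReference("backup-container")
            .GetDirectoryReference(string.Empty);

        CopyDirectoryOptions options = new CopyDirectoryOptions { Recursive = true };
        DirectoryTransferContext context = new DirectoryTransferContext();
        context.FileTransferred += (sender, e) =>
            Console.WriteLine($"Copied: {e.Source} -> {e.Destination}");

        // isServiceCopy = true: the storage service moves the data itself;
        // DML only issues the copy requests and then polls each blob's copy status.
        await TransferManager.CopyDirectoryAsync(sourceDir, destDir, true, options, context);
    }
}

You can also raise TransferManager.Configurations.ParallelOperations if you want more concurrent copy requests.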
Besides, I suggest you create multiple folders to store the files.
For example, you could create a folder that stores one week's files.
The next week, you start a new folder.
Then you can back up the old folder's files without new files being written into that folder.
Finally, you could also write your own code to achieve your requirement. You first need to get the list of the folder's files (see the sketch below).
The maximum number of results returned by one list request is 5000.
Then you send requests to tell the storage service to copy each file.
Any file uploaded to the folder after you fetched the list will not be copied to the new folder.
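A rough sketch of that manual approach with the classic blob SDK (Microsoft.Azure.Storage.Blob); the connection string and container names are placeholders, and since the names are listed page by page up front, anything uploaded afterwards simply never appears in the list:

using System.Threading.Tasks;
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;

class ManualBackup
{
    static async Task Main()
    {
        // Placeholder connection string; source and backup containers are assumed
        // to be reachable with the same credentials (use a SAS-authorized source
        // URI instead if the backup lives in a different storage account).
        CloudStorageAccount account = CloudStorageAccount.Parse("<connection-string>");
        CloudBlobClient client = account.CreateCloudBlobClient();
        CloudBlobContainer source = client.GetContainerReference("source-container");
        CloudBlobContainer target = client.GetContainerReference("backup-container");

        BlobContinuationToken token = null;
        do
        {
            // Each listing request returns at most 5000 blobs; the continuation
            // token points at the next page of names.
            BlobResultSegment segment = await source.ListBlobsSegmentedAsync(
                prefix: null,
                useFlatBlobListing: true,
                blobListingDetails: BlobListingDetails.None,
                maxResults: 5000,
                currentToken: token,
                options: null,
                operationContext: null);

            foreach (IListBlobItem item in segment.Results)
            {
                if (item is CloudBlockBlob srcBlob)
                {
                    // Start a server-side copy of this blob into the backup container.
                    CloudBlockBlob destBlob = target.GetBlockBlobReference(srcBlob.Name);
                    await destBlob.StartCopyAsync(srcBlob);
                }
            }

            token = segment.ContinuationToken;
        } while (token != null);
    }
}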
