Filter recent files in Logic Apps' SFTP when files are added/modified trigger - azure

I have this Logic App that connects to an SFTP server and it's triggered by the "files are added or modified" trigger. It's set to run every 10 minutes, looking for new/modified files and copying them to an Azure storage account.
The problem is that this SFTP server path is set to overwrite a set of files every X minutes (I have no control over this) and so, pretty often the Logic App overlaps with the update process of these files and downloads files that are still being written. The result is corrupted files.
Is there a way to add a filter to the "When files are added or modified (properties only)" trigger so that it only takes into consideration files with a modified date at least 1 minute old?
That way, files that are currently being written won't be added to the list of files to download, and the next run of the Logic App would then fetch these ignored files, and so on.
UPDATE
I've found a Trigger Conditions field in the trigger's settings, but I can't find any documentation about it.

From testing the "When files are added or modified" trigger, it seems we cannot add a filter in the trigger itself so that it only considers records modified at least 1 minute ago. We can only get the list of files with their LastModified datetime, loop over them, and use an "If" condition to decide whether to download each one.
Update:
The expression in the screenshot is:
sub(ticks(utcNow()), ticks(triggerBody()?['LastModified']))
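Since ticks are 100-nanosecond units (so one minute is 600000000 ticks), the "at least 1 minute old" check in the "If" condition could look roughly like this (the exact property reference depends on where you read LastModified from, e.g. the trigger body or the current loop item):
greater(sub(ticks(utcNow()), ticks(triggerBody()?['LastModified'])), 600000000)
If it evaluates to true, the file has not been touched for at least a minute and should be safe to download.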
Update workaround
Is it possible to add a "Delay" action when the last modified time is less than 1 minute old? For example, if the file was modified less than 60 seconds ago, use "Delay" to wait 5 minutes until the overwrite operation completes, then do the download.
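A rough shape for that workaround: a Condition whose expression checks whether the file was modified within the last 60 seconds, with a Delay action in the true branch (the 5-minute value is just an example):
less(sub(ticks(utcNow()), ticks(triggerBody()?['LastModified'])), 600000000)
If this is true, add a Delay action with Count 5 and Unit Minute before the download step, then proceed to download.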

I checked the sample @equals(triggers().code, 'InternalServerError'); it uses the condition functions from the logical comparison functions, so the key point is to make sure the property you want to filter on exists in the trigger or triggerBody, or you will get the error below.
So I changed the expression to something like @greater(triggerBody().LastModified, '2020-04-20T11:23:00Z'). With this, a file modified before 2020-04-20T11:23:00Z does not trigger the flow.
You could also use other functions such as less, greaterOrEquals, etc. from the logical comparison functions.
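Putting this together, a trigger condition for the original "at least 1 minute old" requirement could look roughly like this (assuming LastModified is exposed in triggerBody for this trigger, as the test above suggests):
@greater(sub(ticks(utcNow()), ticks(triggerBody()?['LastModified'])), 600000000)
Note that a file skipped this way only fires the trigger again when a later modification is detected, so test this against the overwrite cadence of your server.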

Related

Rotate logfiles on an hourly basis by appending date and hour

I want to implement log rotation on Linux. I have a *.trc file where all the logs are written, and I want a new log file to be created every hour. I have done some analysis and found the following.
From that analysis I got to know about the logrotate option, where we need to add the rotation details for a specific file in the logrotate.conf file.
I would like to know if there is an option that doesn't use logrotate. I want to rotate the log files on an hourly basis, so something like appending date and hour information to the log file name and creating new files based on the current hour.
I'm looking for suggestions on how to implement the log rotation using the second option described above.
Any details on the above would be really helpful.
If you have control over the process that creates the logs, you could just timestamp the file at the moment of creation. This removes the need to rename the log.
Before you write each line, check the time: if one hour has passed since the file was created, close the current file and open a new one with a new timestamp.
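A minimal sketch of that idea in shell, assuming the producing process writes to stdout and the log directory is /var/log/myapp (both are illustrative):

#!/bin/sh
# hourly_log.sh - append each incoming line to a file named after the
# current date and hour, so a new *.trc file starts every hour.
while IFS= read -r line; do
    printf '%s\n' "$line" >> "/var/log/myapp/app-$(date +%Y%m%d-%H).trc"
done

Run it as: my_process | ./hourly_log.sh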
If you do not have control over the process, you can pipe the output of your process (stdout, stderr) to multilog, which is a binary that is part of the daemontools package in most Linux distros.
https://cr.yp.to/daemontools/multilog.html
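A typical invocation looks roughly like this (the size, count, and directory values are illustrative; check the multilog documentation for the exact script syntax):

my_process 2>&1 | multilog t s1048576 n20 /var/log/myapp

Here t prepends a timestamp to each line, s1048576 caps each log file at about 1 MB, n20 keeps at most 20 rotated files, and the final argument is the log directory. Note that multilog rotates by size rather than strictly by the hour, so the shell approach above is closer to the literal hourly requirement.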

Azure Data Factory - Event Triggers on Files In Multiple Folders

We are invoking an ADF pipeline with an event-based trigger.
Is there a way to trigger this pipeline only when a file arrives in both of these child folders?
e.g
ParentFolder
-- ChildFolder1
-- ChildFolder2
Now we would like to trigger our pipeline only if a new file arrives in both of these folders, i.e. ChildFolder1 and ChildFolder2.
There is no out-of-the-box approach for this. I can think of the alternatives below.
First Approach
You can set a trigger on ChildFolder2.
You can use a Lookup activity or Get Metadata activity that looks for the expected file name in ChildFolder1, to see whether the file has been created there.
If you would like to check again after some time, say 10-15 minutes, you could make use of the Wait activity.
Now, if the file exists, you can continue with the rest of the pipeline execution. If the file has not been created in ChildFolder1, you can end the pipeline with no activity carried out.
The pipeline will eventually be triggered when the file is created in ChildFolder2, and the execution flow changes based on an If Condition activity and the existence of the file in ChildFolder1, as sketched below.
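A minimal sketch of the existence check, assuming a Get Metadata activity named CheckChildFolder1File that points at a dataset for the expected file in ChildFolder1 (all names are illustrative):

{
    "name": "CheckChildFolder1File",
    "type": "GetMetadata",
    "typeProperties": {
        "fieldList": [ "exists" ],
        "dataset": { "referenceName": "ChildFolder1FileDataset", "type": "DatasetReference" }
    }
}

The If Condition activity can then branch on @activity('CheckChildFolder1File').output.exists.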
Second Approach
If you don't have the file name and would like to pick up files dynamically based on the time they were created:
In the same way as above, you could set an event trigger on ChildFolder2.
In the pipeline execution, you filter the files in ChildFolder1 based on their timestamp relative to the pipeline start. This is slightly tricky.
You do a Get Metadata on ChildFolder1 and filter the results using a ForEach and an If condition (see: get the latest added file in a folder [Azure Data Factory]).
If there is such a file, execute the rest of the activities; otherwise, end the pipeline execution. A rough sketch follows.
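A rough sketch of that filtering step, assuming a Get Metadata activity GetChildItems that returns childItems for ChildFolder1 and, inside the ForEach, a second Get Metadata activity GetFileModified that returns lastModified for the current file (the names and the 10-minute window are illustrative):

ForEach over: @activity('GetChildItems').output.childItems
    GetFileModified: Get Metadata on the current file with "fieldList": [ "lastModified" ]
    If condition: @greaterOrEquals(ticks(activity('GetFileModified').output.lastModified), ticks(addMinutes(pipeline().TriggerTime, -10)))

If the condition is true for any file, continue with the remaining activities; otherwise end the run.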

Quartz Scheduling to delete files

I am using the file component with the quartz scheduler option in order to pull some files from a given directory every hour. Then I transform the data from the files and move the content to other files in another directory. After that I move the input files to an archive directory. When a file is moved to this directory it should stay there only a week and then be deleted automatically. The problem is that I'm not really sure how I can start a new cron job, because I don't really know when any of the files is moved to that archive directory. Maybe it's something really trivial, but I am pretty new to Camel and I don't know the solution. Thank you in advance.
Use option "filterFile"
Every file has a modified timestamp, and you can use this timestamp to filter files that are older than 1 week. The file component has an option filterFile:
filterFile=${date:file:yyyyMMdd}<${date:now-7d:yyyyMMdd}
The above expression uses the File Language: ${date:file:yyyyMMdd} denotes the modified timestamp of the file in the form (year)(month)(day), and ${date:now-7d:yyyyMMdd} denotes the current time minus 7 days in the same form.
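For example, a consumer endpoint on the archive directory that only picks up week-old files and removes them after processing might look roughly like this (the path and the delete option are illustrative; only filterFile comes from the answer above):

file:/data/archive?filterFile=${date:file:yyyyMMdd}<${date:now-7d:yyyyMMdd}&delete=true

Because the file consumer polls on its own schedule (or via the quartz scheduler you already use), you don't need to know when individual files land in the archive; each poll simply removes whatever has aged past 7 days.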

Using Logic Apps to get specific files from all sub(sub)folders, load them to SQL-Azure

I'm quite new to Data Factory and Logic Apps (but I have many years of experience with SSIS).
I succeeded in loading a folder with 100 text files into SQL Azure with Data Factory,
but the files themselves are untouched.
Now, another requirement is that I loop through the folders to get all files with a certain file extension.
In the end I should move (= copy & delete) all the files from the 'To_be_processed' folder to the 'Processed' folder.
I cannot find where to put 'wildcards' and such:
For example, get all files with file extensions .001, .002, .003, .004, .005, ... up to ... .996, .997, .998, .999 (a thousand files)
--> also searching in the subfolders.
Is it possible to call a Data Factory from within a Logic App? (Although this seems unnecessary.)
Please find some more detailed information in this screenshot:
Thanks in advance for helping me out with exploring this new technology!
Interesting situation.
I agree that using Logic Apps just for this additional layer of file handling seems unnecessary, but Azure Data Factory may currently be unable to deal with exactly what you need...
In terms of adding wildcards to your Azure Data Factory datasets, you have 3 attributes available within the JSON typeProperties block, as follows.
Folder Path - specifies the directory, which can work with a partition by clause for a time slice start and end. Required.
File Name - specifies the file, which again can work with a partition by clause for a time slice start and end. Not required.
File Filter - this is where wildcards can be used for single and multiple characters: (*) for multiple and (?) for single. Not required.
More info here: https://learn.microsoft.com/en-us/azure/data-factory/data-factory-onprem-file-system-connector
I have to say that, separately, none of the above is ideal for what you require, and I've already fed back to Microsoft that we need a more flexible attribute that combines the 3 values above into 1, allowing wildcards in various places and a partition by condition that works with more than just date/time values.
That said, try something like the below.
"typeProperties": {
"folderPath": "TO_BE_PROCESSED",
"fileFilter": "17-SKO-??-MD1.*" //looks like 2 middle values in image above
}
On a side note, there is already a Microsoft feedback item that's been raised for a file move activity, which is currently under review.
See here: https://feedback.azure.com/forums/270578-data-factory/suggestions/13427742-move-activity
Hope this helps
We have used a C# application which we call through 'App Services' -> WebJobs.
It is much easier to iterate through folders that way. To load the data into SQL we used SQL bulk insert.

How to retrieve files generated in the past 120 minutes in Linux and move them to another location

For one of my projects, I have a challenge where I need to take all the reports generated under a certain path, and I want this to be an automated process in Linux. I know how to get the names of the files which have been updated in the past 120 minutes, but not the files directly. My requirements are:
Take the files that have been updated in the past 120 minutes from the path
/source/folder/which/contains/files
Now do some business logic on these generated files, which I can take care of
Move these files to
/destination/folder/where/files/should/go
I know how to achieve #2 and #3 but am not sure about #1. Can someone help me with how I can achieve this?
Thanks in advance.
Write a shell script. A sample is below. I haven't provided the commands to get the actual list of file names, as you said you know how to do that.
#!/bin/sh
# Replace <my file list> with your list of recently updated files.
files="<my file list>"
for file in $files; do
    cp "$file" "<destination_directory>"
done
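For step 1, a common approach (assuming GNU find or another find that supports -mmin; the paths are taken from the question) is:

#!/bin/sh
# List regular files under the source path modified within the last 120 minutes.
find /source/folder/which/contains/files -type f -mmin -120 -print

The same -mmin -120 test can also drive the copy directly once your business logic has run, e.g. find /source/folder/which/contains/files -type f -mmin -120 -exec cp {} /destination/folder/where/files/should/go/ \;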
