Creating a dependency pipeline to check files are the latest in ADF - Azure

I am trying to create a dependency pipeline for files before executing my model refresh (a Web activity). I want to make sure all the related files are present in their respective folders and that all the files are the latest.
Suppose my model refresh uses the following files present in ADLS:
myadls/raw/master/file1.csv
myadls/raw/dim/file2.csv
myadls/raw/dim/file3.csv
myadls/raw/reporting/file4.csv
We need to compare each file's last-modified date with today's date. If they are equal, the file is the latest. If any of the files is not the latest, I need an email with the name of that file, and I shouldn't trigger my Web activity, which does the model refresh.
I have created this pipeline using Get Metadata, ForEach, If Condition, Web, and Set Variable activities. But the problem is that I am not able to get an email for the file which is not the latest.
How can I get an email for a file that is not the latest, per my scenario?
Note, the above folders can have more than 100 files, but I am only looking for the specific files I use in my model.

We use SendGrid API to send emails at my company.
You can easily pass the FileNames in the body of the email using any email API out there. You can write the FileNames to a variable and then reference the variable in the body. It sounds like you have built almost everything out, so within your ForEach, just add an Append Variable activity that writes a new value to your array variable. Then you can use those array values in your Send Email Web activity, or use a string conversion function; there are many ways to do it.
I will update this post with an example later.
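A minimal sketch of what that could look like, assuming the array variable is named notLatestFiles and the email goes through the SendGrid v3 mail/send endpoint with an Authorization: Bearer <API key> header (the variable name and addresses are hypothetical):

    Append Variable (inside the ForEach, when a file is not the latest):
        Variable name: notLatestFiles
        Value: @item().name

    Web activity body (POST https://api.sendgrid.com/v3/mail/send), as dynamic content:
        @concat('{"personalizations":[{"to":[{"email":"team@example.com"}]}],',
            '"from":{"email":"adf@example.com"},',
            '"subject":"ADF: files not refreshed today",',
            '"content":[{"type":"text/plain","value":"Stale files: ',
            join(variables('notLatestFiles'), ', '),
            '"}]}')

join() turns the array into a comma-separated string, so the offending file names appear directly in the email body.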

As per your current architecture, you can create a variable per ForEach activity to store the file names.
So within each ForEach activity, if the file is not the latest, you can save the file name using an Append Variable activity,
and then in the final validation you can concat all the ForEach loop variables to get the final list of files that were not modified.
But ideally I would suggest the approach below:
Have the list of files produced as a Lookup activity output.
Provide that to a single ForEach activity running in sequential execution.
Within the ForEach, check whether each file is the latest or not via a Get Metadata activity and an If Condition activity.
If it is not, append the file name via an Append Variable activity.
Once out of the ForEach, use an If Condition to check whether the file-name variable is blank or has values.
If it has values, then you can send an email, and the file-name variable holds all the non-updated file names (see the expression sketch below).
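A rough sketch of the expressions involved, assuming the Get Metadata activity is named Get Metadata1 (with Last modified in its field list) and the array variable is named staleFiles (both names are hypothetical):

    If Condition (inside the ForEach) - is the file refreshed today?
        @equals(formatDateTime(activity('Get Metadata1').output.lastModified, 'yyyy-MM-dd'),
                formatDateTime(utcnow(), 'yyyy-MM-dd'))

    Append Variable (False branch of that If Condition):
        Variable name: staleFiles
        Value: @item().name  (or whichever field of the ForEach item holds the file name)

    If Condition (after the ForEach) - were any stale files found?
        @greater(length(variables('staleFiles')), 0)

Both lastModified and utcnow() are in UTC, so the comparison assumes "today" is meant in UTC.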

Related

Azure logic apps: Only attach file if condition met outlook v2

I am creating an automated email that is sent every day on a schedule. Within this, 4 reports are created using individual SQL queries that then generate CSV files with essentially a list of names; all this works fine. When some of these are generated they are empty, so I have used parallel branches to set a variable to true or false depending on whether the CSV is created or not.
So at the end of this I have 3 or 4 files and a variable for each stating whether or not it has been created.
What I want to know is how to attach a file only if its variable is true, because if a null/not-created CSV is attached, it errors.
Any additional information required just ask.
Thanks
What I want to know is how to attach a file only if its variable is true, because if a null/not-created CSV is attached, it errors.
When you use parallel branches, each branch is executed regardless of whether the variable is true or false and whether the file is created or not. In this case, you can use the Condition action instead of parallel branches, where the email is sent only when the variable is true. Below is the flow of my Logic app.
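A minimal sketch of that Condition, assuming a boolean variable named Report1Created (a hypothetical name):

    Condition expression:
        @equals(variables('Report1Created'), true)

    If true:  add the generated CSV to the Attachments of the Send an email (V2) action
    If false: send the email without that attachment (or skip the send for that report)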

ADF Storage event trigger when there are multiple files in different folders

I need to trigger my ADF pipeline when three files arrive in the paths: container/folder1/file1.parquet
container/folder2/file2.parquet
container/folder3/file3.parquet
Only when these 3 subfolders get new files (the files will be overwritten) should the ADF pipeline trigger.
How can we achieve this?
Update: This should be an AND condition, i.e. the pipeline should be triggered only after all 3 files arrive or get updated.
Update:
There is no out-of-the-box feature to achieve this; you can share your idea here.
What you can do is:
Set up a Storage event triggered pipeline on the first destination, i.e. container/folder1/file1.parquet, as I have explained earlier.
Then, after perhaps waiting a few seconds using a Wait activity, use a Get Metadata activity with the Field list argument set to Child items to get the list of files in the folder,
or
a Lookup activity chain to look for files at container/folder2/file2.parquet and container/folder3/file3.parquet with the file list path property (see the file list examples).
Then you can hold the results in variables for convenience and, using conditional activities like the If Condition activity, compare to see whether all the files exist; if true, you can proceed with the further activities you plan to run in the pipeline once the three files have arrived. A sketch of that check follows below.
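A rough sketch of that final check, assuming two Get Metadata activities named GM_file2 and GM_file3, each with Exists in its field list (the activity names are hypothetical):

    If Condition expression:
        @and(activity('GM_file2').output.exists, activity('GM_file3').output.exists)

If this evaluates to true, all three files are present (folder1's file is implied by the trigger itself), and the rest of the pipeline can run.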
In case the 3 files should be handled separately, with each arrival triggering the pipeline on its own:
You can simply use 3 different triggers for the same pipeline, each with a different folder in its Blob path ends with property in the trigger.
Here is a sample trigger for the first folder, i.e. container/folder1/file1.parquet.
You can also specify just .parquet in Blob path ends with to match files with different names dynamically.
Note: Blob path begins with and Blob path ends with are the only pattern matching allowed in a Storage event trigger.
Similarly, you can create 2 more triggers for container/folder2/file2.parquet and container/folder3/file3.parquet.
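A minimal sketch of what one such trigger's type properties can look like (the paths here are illustrative):

    "typeProperties": {
        "blobPathBeginsWith": "/container/blobs/folder1/",
        "blobPathEndsWith": "file1.parquet",
        "ignoreEmptyBlobs": true,
        "events": ["Microsoft.Storage.BlobCreated"]
    }

The other two triggers would differ only in the folder and file name.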

Azure Data Factory - Copy list of files to flat directory and preserve filename

This is sort of following on from this question asked:
Azure Data Factory - Read in a list of filepaths from a fileshare and save to Blob
What I have is essentially a list of filepaths saved inside a text file, as below:
eg: filepaths.txt ==
C:\Docs\subfolder1\test1.txt
C:\Docs\subfolder2\test2.txt
C:\Docs\subfolder3\test3.txt
The files I want to copy can be in different subfolders. I want to copy all these files to Blob, so that the output in Blob looks like below:
/CombinedSubfolder/test1.txt
/CombinedSubfolder/test2.txt
/CombinedSubfolder/test3.txt
Where all my files from the original file share are in the same subfolder (regardless of what subfolder they were in before) AND they keep their original filename.
I've been trying to mess with lookups and for each loops but I can't seem to figure out the best approach.
Thanks,
Step 1: Use a Lookup activity to read data from your text file where you have the file paths saved.
The Lookup activity in an Azure Data Factory pipeline is most commonly used for
configuration lookups. It reads data from a source dataset and saves it as
the activity's output. The output of the Lookup activity is often
used later in the pipeline to make a decision and configure subsequent activities
accordingly.
Using: @activity('activityName').output
Step 2: Use a ForEach activity to iterate over each file path.
In your pipeline, the ForEach activity defines a repeating control
flow. It can be used to loop through a list of items and
perform the defined tasks for each one.
Step 3: Use a Copy activity with dynamic datasets as source and
sink. Make the datasets point dynamically to your desired
locations (see the sketch below).
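A rough sketch of the dynamic content involved, assuming the Lookup returns rows with a column named filepath, the source dataset exposes a single sourcePath parameter, and the sink dataset exposes a sinkFileName parameter with its folder fixed to CombinedSubfolder (all of these names are hypothetical):

    ForEach Items:
        @activity('Lookup1').output.value

    Source dataset parameter value (in the Copy activity inside the ForEach):
        sourcePath: @item().filepath

    Sink dataset parameter value:
        sinkFileName: @last(split(item().filepath, '\'))

split() breaks the Windows path on backslashes and last() keeps only the file name, so every copied file lands in the same Blob folder while keeping its original name.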
Here is the flow I followed using a data flow.
My source:
/Input1/file1.txt
/Input2/file2.txt
and I want to move the files to a folder called DesiredOutputFolder.
A Lookup activity gets the source file paths.
Then a ForEach loop iterates through them.
Inside the ForEach, I have a data flow, and under the source settings I point to the source file using a data flow parameter.
In a derived column, I use this expression to rewrite the folder portion of the path while keeping the file name:
replace(replace(filename,'/Input/','/DesiredOutputFolder/'),'/Output/','/DesiredOutputFolder/')
Under the Sink settings tab, once I run the pipeline, the files are moved to a single folder.

Can Kofax TotalAgility populate a template document with values from CSV and generate a PDF?

I currently have a requirement where I need to extract values from a CSV file onto a template within Kofax TotalAgility Designer and generate a PDF from it accordingly.
The process would pick up CSV files from a shared folder on the network as input, fill a template (a Word document with merge fields) with the corresponding values, and generate a PDF from the populated Word document as output to another shared folder.
Any help is much appreciated.
Reading CSV
There is nothing built into KTA that will handle the CSV file. I would recommend you handle this in C# (preferably in your own DLL rather than a script activity). The specifics of how you store the CSV data you read will probably depend a lot on how exactly your template is organized and the specifics of your data. But ultimately you will want the data in separate KTA variables to map into your merge fields.
Document Creation (Word Document from Word Template)
The primary KTA functionality relevant to your goal is the Document Creation activity (under "Other" when choosing activity type). You will want to read the help topics for a full understanding of the options, but it will allow you to map variables into merge fields from a Word Template (dotx).
The configuration interface of the activity does not make this immediately apparent, but the Document Save Location can be a document variable instead of a path. Once you provide a document variable, the interface will expand to also allow you to choose a folder variable to which the document will be added. Then you can map your data from variables into the merge fields.
Create and Export PDF
Note that using a document variable for the Document Save Location, rather than a file path, is essential because PDF generator works on Document/Folders within KTA, not file paths.
After your Document Creation activity, you can add an Image Processing activity (to convert the merged Word document to tif), and then a PDF Generator activity to create your PDF. Use an Export activity to export the PDF to the location of your choosing.
I think PDF Generator requires TIF pages to be created first, but you could try sending the Word document without the IP activity if you want to confirm.

How to use a list of values for a parameter?

I am using the test plugin for VS 2012 (although have just installed 2013), and need to know:
Is it possible to have a parameter pass a different value from a selected list while load testing?
I have used the sample load test located here: http://www.visualstudio.com/get-started/load-test-your-app-vs and created a new web test that meets my needs as below.
I have a simple journey recorded that is an email registration web page. The journey is essentially completing name & address, email, conf email, password, conf password. On submission of the form, a verification email is sent.
I need to check that this process can handle around 3000 users. The email to actually send the verification has been hardcoded for test purposes, but I need a unique email to submit the form. I would essentially like to run 3000 test cases through, and just change the email address each time.
What is the best way to do this?
The simple answer is to do a web search for data driving (or data-driven) Visual Studio web performance tests. You should find many articles and tutorials.
In more detail:
Outline of how to data drive a test
Firstly, Visual Studio distinguishes different types of test. A Load Test is a way of running individual test cases many times, as if by many simultaneous users, gathering data about the test executions and producing a report. The test cases that a load test can execute include Web Performance Tests and Coded UI Tests; both of these can be data driven.
Data driving a Web Performance Test requires a data source. The data can be CSV, XML, Spreadsheet, database and in TFS. I will describe using CSV.
Create a CSV file, containing something similar to the following. Note that the top line of field names is required and those names are used within the test.
Name,Email,Telephone
Fred,fred@example.com,0123 456789
George,george@example.com,0123 456790
Harry,harry@example.com,0123 456791
See also CodedUI test does not read data from CSV input file for some notes on CSV file creation.
Open the test project in Visual Studio and open the .webtest file for the test. Use the context (right-click) menu of the top node of the test, ie the test's name (or use the corresponding icon) and select "Add data source ...". Follow the prompts to add the CSV file into the project.
Within the Web Performance Test expand the request to show the form parameters or query string or whatever that is to use the data. View the properties panel of the relevant field and select the appropriate property, in many cases it is the Value property. Click the little triangle for choosing a value for the property. The popup should show the data source, expand the items shown and select the required field. After selecting the field the property will show a value such as {{DataSource1.FileName#csv.Email}}. The doubled curly braces ({{ and }}) indicate the use of a context parameter. All the used data source fields are available as context parameters. All of the data source fields can be made available by altering the Select Columns property of the data source file. Data source field can be used as part of a property value by using values such as
SomeText{{DataSource1.FileName#csv.Email}}AndMoreText
Data source access methods
The data from the datasource can be read and used in four ways. The default is Sequential. Other orders are selected using Solution Explorer to access the properties of the file (eg FileName#csv). The Access Method property can be set to one of:
Sequential: data is read sequentially through the file. After the last line of the file is read, the first line of the file will be the next line read. Thus each line may be read more than once.
Random: data is read randomly.
Unique: data is read sequentially through the file. After the end of the file is read, the test will not be executed again. Thus each line can only be read once.
Do not move cursor automatically: intended for more complex tests where the cursor is moved via calls from plugins.
A web test may use more than one data source file. These files may have different access methods. For example one file containing login names and passwords could be accessed Sequentially and another file with other data could be accessed Randomly. This would allow each login to try many different sets of the other data.
Data sources and loops
Web performance tests may contain loops. The properties of a loop include Advance data cursors. This allows, for example, a data source file to contain items to be found and added to a shopping basket such that each loop iteration adds a new item.
