Using Logic Apps to get specific files from all sub(sub)folders and load them into SQL Azure

I'm quite new to Data Factory and Logic Apps (though I have many years of experience with SSIS).
I succeeded in loading a folder of 100 text files into SQL Azure with Data Factory,
but the files themselves are left untouched.
Now, another requirement is that I loop through the folders to get all files with a certain file extension.
In the end I should move (= copy and delete) all the files from the 'To_be_processed' folder to the 'Processed' folder.
I cannot find where to put wildcards and such.
For example, get all files with file extensions .001, .002, .003, .004, .005, ... up to ... .996, .997, .998, .999 (a thousand files)
--> also searching in the subfolders.
Is it possible to call a Data Factory from within a Logic App? (Although this seems unnecessary.)
Please find some more detailed information in this screenshot:
Thanks in advance for helping me explore this new technology!

Interesting situation.
I agree that using Logic Apps just for this additional layer of file handling seems unnecessary, but Azure Data Factory may currently be unable to do exactly what you need.
To add wildcards to your Azure Data Factory datasets, you have three attributes available within the JSON typeProperties block:
Folder Path - specifies the directory. It can work with a partition-by clause for a time slice start and end. Required.
File Name - specifies the file. Again, it can work with a partition-by clause for a time slice start and end. Not required.
File Filter - this is where wildcards can be used: (*) matches multiple characters and (?) matches a single character. Not required.
More info here: https://learn.microsoft.com/en-us/azure/data-factory/data-factory-onprem-file-system-connector
I have to say that, taken separately, none of the above is ideal for what you require. I've already fed back to Microsoft that we need a more flexible attribute that combines the three values above into one, allowing wildcards in various places and a partition-by condition that works with more than just date/time values.
That said, try something like the following.
    "typeProperties": {
        "folderPath": "TO_BE_PROCESSED",
        "fileFilter": "17-SKO-??-MD1.*" // looks like 2 middle values in image above
    }
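To try a pattern before putting it into a dataset, the wildcard rules described above (* for any run of characters, ? for exactly one) can be approximated locally. This is a minimal Python sketch, not Data Factory code: it uses the standard-library fnmatch module, which is assumed to behave closely enough to fileFilter for simple patterns, and the file names are invented for illustration.

```python
from fnmatch import fnmatchcase


def filter_names(names, pattern):
    """Return the names matching a '*' / '?' wildcard pattern, in input order."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical file names, only for illustration.
sample = [
    "17-SKO-01-MD1.001",
    "17-SKO-AB-MD1.txt",
    "17-SKO-123-MD1.001",  # three middle characters, so '??' does not match
    "18-SKO-01-MD1.001",   # different prefix
]

matched = filter_names(sample, "17-SKO-??-MD1.*")
print(matched)
```

Only the first two sample names pass, because ?? requires exactly two characters between the fixed parts of the name.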
On a side note, a Microsoft feedback item has already been raised for a file move activity, and it is currently under review.
See here: https://feedback.azure.com/forums/270578-data-factory/suggestions/13427742-move-activity
Hope this helps

We used a C# application that we run as a WebJob under App Services.
That makes it much easier to iterate through folders. To load the data into SQL we used a SQL bulk insert.
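The folder iteration mentioned here can be sketched roughly as follows. This is Python rather than the C# the answer used, it works on a local file system only, and the function names are made up for illustration. It assumes the destination folder is not inside the source folder and leaves out the database load.

```python
import re
import shutil
from pathlib import Path

# Extensions .001 through .999: exactly three digits, excluding .000.
NUMERIC_EXT = re.compile(r"^\.(?!000$)\d{3}$")


def find_numbered_files(root):
    """Return files under root (recursively) whose extension is .001 to .999."""
    return [
        path
        for path in sorted(Path(root).rglob("*"))
        if path.is_file() and NUMERIC_EXT.match(path.suffix)
    ]


def move_processed(src_root, dst_root):
    """Move matching files to dst_root, keeping the subfolder layout."""
    moved = []
    for path in find_numbered_files(src_root):
        target = Path(dst_root) / path.relative_to(src_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

A call such as move_processed("To_be_processed", "Processed") would be run only after the load has succeeded, so that a failed load leaves the source files in place.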


Copy files from AWS s3 sub folder to Azure Blob

I am trying to copy files out of an S3 bucket using Azure Data Factory. First, I want a list of the directories.
With the CLI I would use: aws s3 ls
From there I can iterate over the list in a ForEach and push each entry into a variable.
In ADF, I have tried to use Get Metadata. Although this works in theory, in practice there are 76 files in each of the 20,000 directories, so the loop runs over more than 1.5 million items. This just isn't worth it; it takes far too long, especially as listing the 20,000 directories themselves takes only about 20 seconds.
Is there a way to produce this listing? When creating the dataset we get a 'no permissions' error; however, when we use a specific location it works.
Many thanks
I have found another way of completing this task.
To begin with, I use Get Metadata with the child items option. It produces an array.
I push this array into a string variable. A stored procedure can then pick the string apart, using OPENJSON to get just the values, which can be broken down further to get the directory names.
I then merge these into a table.
Using a Lookup, I can then run another stored procedure to return the value I require from the table. This whole process runs in a couple of minutes.
If anyone wants a further explanation, please ask, and I will try to create a walkthrough to assist.
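The "pick the array apart" step can be illustrated outside SQL. This is a minimal Python sketch, not the stored procedure the answer describes: it assumes the Get Metadata output is available as a JSON string with a childItems array whose entries carry name and type fields, and the sample values are invented.

```python
import json


def folder_names(get_metadata_output):
    """Extract folder names from a Get Metadata-style JSON string."""
    payload = json.loads(get_metadata_output)
    return [
        item["name"]
        for item in payload.get("childItems", [])
        if item.get("type") == "Folder"
    ]


# Invented sample resembling the activity output.
sample_output = json.dumps({
    "childItems": [
        {"name": "2020-01", "type": "Folder"},
        {"name": "readme.txt", "type": "File"},
        {"name": "2020-02", "type": "Folder"},
    ]
})

print(folder_names(sample_output))
```

In the approach above, the same filtering happens in the database with OPENJSON, which keeps the per-directory work out of a ForEach loop.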

How to reference the most current Physical Sequential (PS) file in JCL

I want to create a job that takes the latest available file as its input file.
The file name format is as follows: FILE1.TEST.TYYMMDD
Is there any way to identify the latest file, based on the date in the file name, via JCL?
P.S. GDG versions are not created in the existing process. Only a PS file is created.
Thank you
I wanted to create a job where I need to consider the latest file available as input file. File [name] format is as below: FILE1.TEST.TYYMMDD is there any way to identify latest file based on date present in file name via JCL.
No.
You indicate that GDGs are not created in the existing process. GDGs would be the best way to accomplish your goal. Absent GDGs, you must write code.
You could accomplish your goal by writing (C, clist, COBOL, PL/I, Rexx) code using the LMDINIT and LMDLIST ISPF services. Then you would execute your code by running ISPF in batch. Many mainframe shops have a cataloged procedure to execute ISPF in batch.
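Whatever language the code is written in, the selection logic itself is small. The following Python sketch only illustrates that logic on a plain list of names. It does not call LMDINIT or LMDLIST, and it assumes the names have already been listed by some other means. The function name and sample names are hypothetical.

```python
from datetime import datetime


def latest_dataset(names, prefix="FILE1.TEST.T"):
    """Return the name with the most recent YYMMDD suffix, or None."""
    dated = []
    for name in names:
        suffix = name[len(prefix):]
        if not name.startswith(prefix) or len(suffix) != 6 or not suffix.isdigit():
            continue
        try:
            # %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999.
            stamp = datetime.strptime(suffix, "%y%m%d")
        except ValueError:
            continue  # six digits, but not a valid calendar date
        dated.append((stamp, name))
    return max(dated)[1] if dated else None


# Hypothetical catalog listing.
names = [
    "FILE1.TEST.T191231",
    "FILE1.TEST.T200115",
    "FILE1.TEST.T200102",
    "OTHER.FILE.T210101",
    "FILE1.TEST.TABCDEF",
]
print(latest_dataset(names))
```

Comparing parsed dates rather than raw strings avoids mistakes around the two-digit year.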
I agree with @cschneid that there is no platform way to handle this. However, I want to point out that GDGs are the platform way of managing PS files for access in a relative form.
Your comment:
GDG versions are not created in existing process. Only PS file is created.
That statement didn't make sense to me. A GDG is not a file type like physical sequential (PS) or partitioned (PO). It is a convention that allows relative reference to files created over time, which sounds like what you want. I've only seen GDGs used for PS files.
Putting the date in the file name can have its uses, but to z/OS it is only part of the file name and not meta-information that it operates on (unlike the G0000V00 qualifiers in GDGs).

How can I copy an existing overthere.SshHost file in XL Deploy UI using Puppet?

The Infra team in my company has provided us with sample overthere.SshHost under 'Infrastructure' in XL-Deploy UI that has a predefined private key file and passphrase which is not shared with us.
We are asked to duplicate this file manually in the UI, rename it and create infra entries for our application.
How can I achieve this with Puppet?
Let's say the sample file is placed under: Infrastructure/Project1/COMMONS/Template_SshHost
and I need to create an overthere.SshHost under Infrastructure/Project1/UAT/Uat_SshHost and Infrastructure/Project1/PREPROD/Preprod_SshHost by copying the sample file.
Thanks in advance!
You can sync a target file with another file accessible via the local file system by using a File resource whose source attribute specifies the path to the original. You can produce a modified copy in a variety of ways, such as by applying one or more File_line resources (from stdlib) or by applying an appropriate script via an Exec resource.
But if you go that route then you have to either
accept that the target file will be re-synced on every Puppet run, OR
set the File resource's replace attribute to false, in which case changes to the original file will not be propagated into the customized copy.
The latter is probably the more acceptable choice for most people. Its file-copying part might look something like this:
    $project_dir = '/path/to/Infrastructure/Project1'
    file { "${project_dir}/UAT/Uat_SshHost/overthere.SshHost":
      ensure  => 'file',
      source  => "${project_dir}/COMMONS/Template_SshHost/overthere.SshHost",
      replace => false,
    }
But you might want to consider instead writing a custom type and provider for the target file. That would allow you to incorporate changes from the original template without re-syncing the file on every run, and it would give you a lot more flexibility with respect to the customizations you need to apply. It would also present a simpler interface for you to use in your manifests, which could make managing these easier. But, of course, that's offset by the cost of writing and maintaining a custom type and provider. Only you can determine whether that would be a worthwhile trade-off.

Parametrization using Azure Data Factory

I have a pipeline in Azure Data Factory, and I want to run it so that it processes only the files for a specific month, for example.
I have a folder called 2020/01; inside this folder are numerous files with different names.
The question is: Can one pass a parameter through to only extract and load the files for 2020/01/01 and 2020/01/02 if that makes sense?
Excellent, thanks Jay. It worked, and I can now run my pipeline jobs passing in the month or even the day.
Really appreciate your response, have a fantastic day.
Regards
Rayno
The question is: Can one pass a parameter through to only extract and load the files for 2020/01/01 and 2020/01/02 if that makes sense?
You didn't mention which connector you are using in the pipeline, but you mentioned a folder in your question. As far as I know, the folder path can be parameterized for most connectors in the ADF copy activity configuration.
You could create a pipeline parameter and then apply it in the wildcard folder path.
Even if your file names share the same prefix, you can apply a pattern such as 01*.json in the wildcard file name property.
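The effect of combining a parameterized folder path with a wildcard file name can be tried out locally. This is a hedged Python sketch and not ADF expression syntax: the function and its parameters are hypothetical, the paths are invented, and fnmatch is assumed to approximate ADF wildcard matching for simple patterns.

```python
from fnmatch import fnmatchcase


def select_files(paths, year, month, day="*", name_pattern="*"):
    """Keep paths whose folder matches year/month/day and whose name matches."""
    folder_pattern = f"{year}/{month}/{day}"
    selected = []
    for path in paths:
        folder, _, name = path.rpartition("/")
        if fnmatchcase(folder, folder_pattern) and fnmatchcase(name, name_pattern):
            selected.append(path)
    return selected


# Invented blob paths.
paths = [
    "2020/01/01/01_sales.json",
    "2020/01/02/01_stock.json",
    "2020/01/02/02_stock.json",
    "2020/02/01/01_sales.json",
]

print(select_files(paths, "2020", "01"))                             # whole month
print(select_files(paths, "2020", "01", "02", name_pattern="01*.json"))  # one day, one prefix
```

Passing only the month selects every day folder beneath it, while adding the day and a name pattern narrows the run to specific files.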

How to run one feature file as initialization (i.e. before all other feature files) in cucumber-jvm?

I have a cucumber feature file 'A' that serves as setting up environment (data clean up and initialization). I want to have it executed before all other feature files can run.
It is kind of like the @Before hook described at http://zsoltfabok.com/blog/2012/09/cucumber-jvm-hooks/. However, that does not work because my feature file 'A' contains hundreds of Cucumber steps and it is not as simple as:
    @Before
    public void beforeScenario() {
        tomcat.start();
        tomcat.deploy("munger");
        browser = new FirefoxDriver();
    }
Instead, it would be better to run 'A' as a whole feature file.
I've searched around but did not find an answer. I am surprised that no one has had this type of requirement before.
The closest I found is 'background'. But that means I can have only one huge feature file with the content of 'A' as 'background' at the top, and the rest of my tests in the same file. I really do not want to do that.
Any suggestions?
By default, Cucumber features are run single-threaded, in this order:
Alphabetically by feature file directory
Alphabetically by feature file name within the directory
Scenarios are then executed in the order in which they appear within the feature file.
So put your initialization feature in the first directory (alphabetically), with a file name that sorts first (alphabetically) in that directory.
That being said it is generally a bad practice to require an execution order in your feature files. We run our feature files in parallel so order is meaningless. For Jenkins or TeamCity you could add a build step that executes the one feature file followed by a second build step that executes the rest of your feature files.
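To predict which feature file would come first under the ordering described in this answer, the rule can be reproduced in a few lines. This is a Python sketch of the sort rule only, with invented paths. It does not invoke Cucumber, and the real runner may differ between versions.

```python
from pathlib import PurePosixPath


def execution_order(feature_paths):
    """Sort by directory first, then by file name within the directory."""
    def key(path):
        p = PurePosixPath(path)
        return (str(p.parent), p.name)
    return sorted(feature_paths, key=key)


# Invented feature file layout.
features = [
    "features/b_orders/login.feature",
    "features/a_setup/zzz_cleanup.feature",
    "features/a_setup/000_init.feature",
]

for path in execution_order(features):
    print(path)
```

With this layout, 000_init.feature in a_setup sorts ahead of everything else, which is the placement the answer recommends for an initialization feature.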
I also have a project in which a single feature file contains a very long scenario, called Scenario: Test data, with many very long steps like this:
Given the system knows about the following employees
|uuid|user-key|name|nickname|
|1|0101140000|Anna|annie|
... hundreds of lines like this follow ...
We see these long SystemKnows scenarios as quite valuable, because they give our testers, Product Owner and developers a baseline of what data is in the system. Our domain is quite complex, and we need this baseline of reference data for everyone to be able to understand the tests.
(These reference data become almost like well-known personas, and are a shared team metaphor.)
In the beginning, we relied on the alphabetic naming convention to have AAA.feature run first.
Later, we discovered that this setup was brittle, and decided to use the following trick, inspired by the PageObject pattern:
Add a background with the single line Given(~'^I set test data for all feature files$')
In the step definition, have a factory create the test data, and make sure inside the factory method that it is only created once, like testFactory.createTestData()
In this way, you get both the convenience of expressing the reference setup as a scenario, which enhances team communication, and a stable test setup.
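The create-once guard inside the factory can be sketched like this. It is a language-neutral illustration written in Python, not the Groovy or Java step definition from the project. The class, its counter and the step function are hypothetical names, and the actual data loading is left as a comment.

```python
class ReferenceDataFactory:
    """Builds shared reference data the first time it is asked, then does nothing."""

    def __init__(self):
        self._created = False
        self.create_calls = 0  # counts how often the data was actually built

    def create_test_data(self):
        if self._created:
            return False  # already set up by an earlier scenario
        self.create_calls += 1
        # Real code would insert the reference employees and so on here.
        self._created = True
        return True


# One shared instance for the whole test run.
test_factory = ReferenceDataFactory()


def set_test_data_for_all_feature_files():
    """Body of the shared Background step; safe to run before every scenario."""
    test_factory.create_test_data()


# Simulate the Background running before three scenarios.
for _ in range(3):
    set_test_data_for_all_feature_files()
print(test_factory.create_calls)
```

Because the guard lives in the factory, the Background line can appear in every feature file without depending on which file happens to run first.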
Hope this is helpful!
Agata
