SSIS: Loop on multiple directories given in an Excel file

I have to read several input files of the same structure but different origins with SSIS. The files are stored in multiple directories; each directory contains the files that belong to a specific company location.
Factory A --> file A, file B,...
Factory B --> file C, file D,..
There is also an Excel file with some details on Factories A and B. We plan to push the data to SAP, and each file needs a specific accounting code depending on its location/factory.
E.g.:
Factory A, account 12345
Factory B, account 54321
The idea was to first read the Excel file line by line to get the names of the factories, and then do a Foreach Loop on each directory to read the files.
I managed to read a single directory by filling a variable with the directory name: e.g. I start with gstrSubdir="Factory A" and then do a simple Foreach Loop on that subdirectory. So far, I am fine.
Now I need a loop that reads the first line of the Excel file, sets the subdir variable, loops over the subdir, and then reads the next line.
In principle, I need to know how to do a nested Foreach Loop in SSIS without C#.
I hope my explanation is somewhat understandable.
Would be more than happy to get some advice.
Regards,
Lars

Related

How to get files from a subfolder present under nested parent folder in azure data factory?

My folder structure is like below:
Container/xx56585/DST_1/2021-03-26/xxxxxxxx.csv
Container/xx56585/DST_1/2021-03-26/xxxxxxxx.ctl
Container/xx56585/DST_2/2021-03-26/yyyyyyyyy.csv
Container/xx56585/DST_2/2021-03-26/yyyyyyyyy.ctl
Container/xx56585/DST_3/2021-03-26/zzzzzzzzz.csv
Container/xx56585/DST_3/2021-03-26/zzzzzzzzz.ctl
Container/xx56585/DST_4/2021-03-26/sssssssssss.csv
Container/xx56585/DST_4/2021-03-26/sssssssssss.ctl
I need to copy the .csv and .ctl files to an SFTP target and then move these files to an archive folder (in the blob storage, after the copy activity).
Please help me with this.
Update:
We can use a Get Metadata activity (Get Metadata1) to check whether the .ctl file exists.
Add the dynamic content @concat('xx56585/',item(),'/',substring(adddays(utcnow(),-3),0,10),'/') to the path.
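For illustration only, such a Get Metadata activity could be configured roughly as in the sketch below. This is an assumption-laden sketch: the dataset name CtlFileDataset is hypothetical (a parameterised dataset pointing at the expected .ctl file), and the activity is assumed to sit inside the ForEach described further down so that item() resolves.
{
    "name": "Get Metadata1",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "CtlFileDataset", //hypothetical parameterised dataset
            "type": "DatasetReference",
            "parameters": {
                "folderPath": "@concat('xx56585/', item(), '/', substring(adddays(utcnow(), -3), 0, 10), '/')"
            }
        },
        "fieldList": [ "exists" ]
    }
}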
I created a simple test to copy the files under the <rundate> folders to a target folder.
My folder structure:
Input/xx56585/DST_1/2021-03-26/xxxxxxxx.csv
Input/xx56585/DST_2/2021-03-26/yyyyyyyyy.csv
Input/xx56585/DST_3/2021-03-26/zzzzzzzzz.csv
Input/xx56585/DST_4/2021-03-26/sssssssssss.csv
Define an Array-type variable Array1 and assign it the value ["DST_1","DST_2","DST_3","DST_4"].
In the ForEach1 activity, we can add the dynamic content @variables('Array1') to traverse this array.
Inside the ForEach1 activity, we can use a Copy activity to copy the files under the dynamic path via the expression @concat('xx56585/',item(),'/',substring(adddays(utcnow(),-3),0,10),'/').
My current date is 2021-03-29, so I use adddays(utcnow(),-3) to get 2021-03-26 in the steps above.
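Putting these steps together, the ForEach1/Copy portion of the pipeline JSON might look roughly like the sketch below. The dataset names SourceFolderDataset and TargetFolderDataset are hypothetical, and the Copy activity's source/sink type properties are omitted for brevity; this is a sketch of the structure, not a complete pipeline.
{
    "name": "ForEach1",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@variables('Array1')",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "Copy data1",
                "type": "Copy",
                "inputs": [
                    {
                        "referenceName": "SourceFolderDataset", //hypothetical dataset with a folderPath parameter
                        "type": "DatasetReference",
                        "parameters": {
                            "folderPath": "@concat('xx56585/', item(), '/', substring(adddays(utcnow(), -3), 0, 10), '/')"
                        }
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "TargetFolderDataset", //hypothetical sink dataset
                        "type": "DatasetReference"
                    }
                ]
            }
        ]
    }
}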
That's all.
I think we can add a Filter activity before the Copy activity, in which we can use a string function such as substring to check whether the file name contains .ctl or .csv.
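For example, a Filter activity along those lines could look roughly like the sketch below. The activity name Filter1 is hypothetical, it assumes a Get Metadata activity that also returns the folder's childItems, and it uses endswith rather than substring to test the extension.
{
    "name": "Filter1",
    "type": "Filter",
    "typeProperties": {
        "items": {
            "value": "@activity('Get Metadata1').output.childItems",
            "type": "Expression"
        },
        "condition": {
            "value": "@or(endswith(item().name, '.csv'), endswith(item().name, '.ctl'))",
            "type": "Expression"
        }
    }
}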

How to reference the most current Physical Sequential (PS) file in JCL

I wanted to create a job where I need to consider the latest file available as the input file.
The file name format is as below: FILE1.TEST.TYYMMDD
Is there any way to identify the latest file based on the date in the file name via JCL?
P.S. GDG versions are not created in the existing process; only a PS file is created.
Thank you
I wanted to create a job where I need to consider the latest file available as the input file. The file [name] format is as below: FILE1.TEST.TYYMMDD. Is there any way to identify the latest file based on the date in the file name via JCL?
No.
You indicate that GDGs are not created in the existing process. GDGs would be the best way to accomplish your goal. Absent GDGs, you must write code.
You could accomplish your goal by writing (C, clist, COBOL, PL/I, Rexx) code using the LMDINIT and LMDLIST ISPF services. Then you would execute your code by running ISPF in batch. Many mainframe shops have a cataloged procedure to execute ISPF in batch.
Agree with @cschneid that there is not a built-in platform way to handle this. However, I want to point out that GDGs are the platform's way of managing PS files for access in a relative form.
Your comment:
"GDG versions are not created in existing process. Only PS file is created."
That statement didn't make sense to me. GDGs are not a file type like physical sequential (PS) or partitioned (PO); a GDG is a convention that allows relative references to files created over time, which sounds like what you want. I've only seen GDGs used for PS files.
Putting the date in the file name can have its uses, but to z/OS it is only part of the file name and not meta information that it operates on (like the G0000V00 generations in GDGs).

SSIS won't execute foreach loop for dynamic xlsx filename [duplicate]

This question already has answers here:
SSIS - How to loop through files in folder and get path+file names and finally execute stored Procedure with parameter as Path + Filename
(2 answers)
Closed 3 years ago.
I have an xlsx file that will be dropped into a folder on a monthly basis. The filename changes every month based on the date (e.g. filename_8292019), which I cannot change.
I want to build a foreach loop to pick up the xlsx file and manipulate it (load it into a SQL Server table, then move the file to an archive folder). I cannot figure out how to do this with a dynamic filename (where the date changes).
I was able to successfully run the package when converting the xlsx to CSV, and also when pointing directly to the xlsx filename.
[Flat File Destination [219]] Error: Cannot open the datafile "filename"
or errors relating to the file not being found.
The Files: entry on the Collection tab of the Foreach Loop container will accept wildcard characters.
The general pattern here is to create a variable, say, FileName. Set your Files: to something like:
Files:
BaseFileName*
or, if you want to be sure to only pick up spreadsheets, maybe:
Files:
BaseFileName*.xlsx
Select either Name and extension or Fully qualified, which will include the full file path. I usually just use Name and extension and put the file path into another variable so when Ops tells me they're moving my drop location, I can change a parameter instead of editing the package. This step tells the container to remember the name of the file it just found so you can use it later for a variable mapping.
On the Variable Mappings tab, select your variable name and assign it to Index 0.
Then, for each spreadsheet, the container will loop, pick up the name of the first file it finds that matches your pattern, and assign the full name, with the date extension (and path, if you go that way), to your variable. Pass the variable as an input parameter to the tasks inside the loop and use that to process the file, including moving it to the archive, or you'll get yourself into an infinite loop, processing the same file(s) over and over. <-- Does that sound like the voice of experience? Yeah. Been there, done that.
Edit:
Here, the FullFilePath variable is just the folder name, without a file reference. (Red variable to red entry in the Folder box).
The FileBaseName variable drives what shows up in the Files box. (Blue to blue).
Another variable picks up the actual file name, with the date extension. Later, say in a File System Task, if I need the folder & file name together, I concatenate the variables.
As far as the Excel Connection Manager error you're getting, unfortunately I'm no help. I don't use it. We have SentryOne's Task Factory for SSIS which includes a much more resilient Excel connector.

Using Logic Apps to get specific files from all sub(sub)folders, load them to SQL-Azure

I'm quite new to Data Factory and Logic Apps (but I have many years of experience with SSIS).
I succeeded in loading a folder with 100 text files into SQL Azure with Data Factory, but the files themselves are untouched.
Now, another requirement is that I loop through the folders to get all files with a certain file extension,
In the end I should move (=copy & delete) all the files from the 'To_be_processed' folder to the 'Processed' folder
I cannot find where to put 'wildcards' and such:
For example, get all files with the file extensions .001, .002, .003, .004, .005, ... up to .996, .997, .998, .999 (a thousand files)
--> also searching in the subfolders.
Is it possible to call a Data Factory from within a Logic App? (Although this seems unnecessary.)
Please find some more detailed information in this screenshot:
Thanks in advance for helping me explore this new technology!
Interesting situation.
I agree that using Logic Apps just for this additional layer of file handling seems unnecessary, but Azure Data Factory may currently be unable to deal with exactly what you need...
In terms of adding wildcards to your Azure Data Factory datasets, you have 3 attributes available within the JSON typeProperties block, as follows.
Folder Path - to specify the directory; this can work with a partition-by clause for a time slice start and end. Required.
File Name - to specify the file; again, this can work with a partition-by clause for a time slice start and end. Not required.
File Filter - this is where wildcards can be used for single and multiple characters: (*) for multiple and (?) for single. Not required.
More info here: https://learn.microsoft.com/en-us/azure/data-factory/data-factory-onprem-file-system-connector
I have to say that, separately, none of the above are ideal for what you require, and I've already fed back to Microsoft that we need a more flexible attribute that combines the 3 values above into 1, allowing wildcards in various places and a partition-by condition that works with more than just date/time values.
That said, try something like the below.
"typeProperties": {
"folderPath": "TO_BE_PROCESSED",
"fileFilter": "17-SKO-??-MD1.*" //looks like 2 middle values in image above
}
On a side note: there is already a Microsoft feedback item that has been raised for a file move activity, which is currently under review.
See here: https://feedback.azure.com/forums/270578-data-factory/suggestions/13427742-move-activity
Hope this helps
We have used a C# application which we call through App Services -> WebJobs.
It is much easier to iterate through folders that way. To write to SQL we used SQL bulk insert.

node.js rename the files incrementally

I have been using the Node.js file system module to perform various file-related operations. I need to check whether a file name already exists in a directory and, if it does, add a suffix at the end of the new file's name, the way Windows does with duplicate file names.
If TestFile.txt already exists and another file with the same name comes in during processing, the new file should be renamed to TestFile (1).txt, and the next file with the same name should be renamed to TestFile (2).txt.
What could be the best way to achieve this? Do I have to use a temporary array to keep all the file names and traverse through it for each one? This is a multi-threaded environment and there could be 50,000+ documents coming in for processing.
Thanks a ton.
