Multiple Excel files using SSIS [duplicate] - excel

I have a source location from which files are to be processed. Multiple files arrive at that location randomly, at any time (the package should run every 2 hours). I have to process only the new files; I cannot delete or move the already-processed files from that location. I can only copy the files to an Archive location. How can I achieve this?

You can achieve this using the following steps.
1. Use a Foreach File enumerator on your incoming folder and save the file name in an "IncomingFile" variable. Configure it to retrieve "Name and Extension" (the code below assumes that; otherwise you will need to modify the script slightly).
2. Create two SSIS variables: "ArchivePath" as String and "IsLoaded" as Boolean (defaulted to False).
3. Add an SSIS Script Task with "IncomingFile" and "ArchivePath" as read-only variables and "IsLoaded" as a read-write variable.
4. Put the following code in the Script Task. It sets "IsLoaded" to True if the file already exists in the archive, otherwise False.
// Requires "using System.IO;" at the top of the script.
public void Main()
{
    var archivePath = Dts.Variables["ArchivePath"].Value.ToString();
    var incomingFile = Dts.Variables["IncomingFile"].Value.ToString();

    // Path the file would have if it had already been archived.
    var fileFullPath = string.Format(@"{0}\{1}", archivePath, incomingFile);

    // True if the file is already in the archive, i.e. already processed.
    bool isLoaded = File.Exists(fileFullPath);
    Dts.Variables["IsLoaded"].Value = isLoaded;

    Dts.TaskResult = (int)ScriptResults.Success;
}
Connect the Script Task to your Data Flow Task with a precedence constraint, set its evaluation operation to "Expression", and use something like the following in the expression box:
@[User::IsLoaded] == False
Hope this helps.

Your package should process the files in a given directory, then move them to another directory once processed. That way, each time the package runs, it only has to process whatever is currently in the source directory.
To process each file in a directory, use the Foreach Loop container. You can specify a folder to look in and an expression to filter the files. If, for instance, your file name contains a timestamp, you could use that timestamp to filter files in or out.
Use a Flat File Source to read each file, then use the File System Task to move it.

To start, take a look at the answer here: Enumerate files in a folder using SSIS Script Task
The Script Task can enumerate all the files in the folder, compare them against a log table where you record what has already been processed, skip the already-processed ones, and return only the unprocessed files in an object variable for a Foreach Loop to consume.
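A rough sketch of what that Script Task's body could look like in C#. The log table dbo.ProcessedFiles, the *.xlsx mask, and the variable names (User::IncomingFolder, User::LogDbConnectionString, User::UnprocessedFiles) are placeholders for whatever exists in your own package, not anything SSIS prescribes:
// Requires: using System.IO; using System.Linq;
//           using System.Collections.Generic; using System.Data.SqlClient;
public void Main()
{
    string incomingFolder = Dts.Variables["User::IncomingFolder"].Value.ToString();
    string logConnString = Dts.Variables["User::LogDbConnectionString"].Value.ToString();

    // Load the names of files that were already processed from the log table.
    var processed = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    using (var conn = new SqlConnection(logConnString))
    using (var cmd = new SqlCommand("SELECT FileName FROM dbo.ProcessedFiles", conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
                processed.Add(reader.GetString(0));
        }
    }

    // Keep only the files that have not been logged yet.
    string[] newFiles = Directory.GetFiles(incomingFolder, "*.xlsx")
                                 .Where(f => !processed.Contains(Path.GetFileName(f)))
                                 .ToArray();

    // Hand the list to a Foreach Loop (Foreach From Variable Enumerator) via an object variable.
    Dts.Variables["User::UnprocessedFiles"].Value = newFiles;
    Dts.TaskResult = (int)ScriptResults.Success;
}
An Execute SQL Task at the end of the loop can then insert each processed file name into the log table so the next run skips it.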

Related

Azure Synapse Analytics - deleting pipeline Folder

I am new to Synapse and I have to make a pipeline that will delete files from folders in a hierarchy like the attached image (expected hierarchy). The red half circles mark the files I would like to delete, for example files older than 2 months.
For now I have made a pipeline for a single folder, and using a ForEach loop I can get to the files and delete the matching ones, and it works. Since I have about 60-70 folders and even more files, I wanted to go one level higher and make a pipeline that runs for each folder, and that is where the problem is. When I use a Get Metadata activity on the top folder and a ForEach loop to take the folder names, I can only access the folders, not the files inside them. Could someone help me solve this?
(Screenshot: deleting pipeline for a single folder using a ForEach loop)
We can achieve this using nested ForEach activities with the help of the Execute Pipeline activity. As mentioned, Get Metadata with wildcards returns all files without folders, and the Delete activity cannot handle wildcard folder paths (Folder/*).
I have created a similar folder structure for the demo. In my pipeline, I first created an array parameter req_files (sample1.csv and sample2.csv) with the names of the required files.
Note: if you want to build this dynamically, you can use an Append Variable activity to build the required file names (file09/22 and file08/22).
I used one Get Metadata activity to get the folder names (the folders inside the root folder). I iterate through its output in my ForEach activity (the items value is @activity('root folder contents').output.childItems).
Inside my ForEach, I used another Get Metadata activity on each sub folder (to get its file contents).
Now I have the folder name and the list of files inside it. I am going to use Execute Pipeline to implement the nested ForEach. Create 3 parameters, current_folder, folder_files and files_needed, in a new pipeline called delete_pipeline (where the delete is performed).
Pass the following dynamic content for each of them from the parent pipeline:
current_folder: @item().name
folder_files: @activity('sub folder contents').output.childItems
files_needed: @pipeline().parameters.req_files
Now in delete_pipeline, I have a ForEach loop over the list of files we are passing (the items value is @pipeline().parameters.folder_files).
Inside this ForEach, I am using an If Condition activity, because I want to delete only the files which are not in my req_files parameter (the array from the parent pipeline, passed to the files_needed parameter of delete_pipeline). The expression for the If Condition activity is:
@contains(pipeline().parameters.files_needed,item().name)
We need to delete a file only when it is not present in req_files (files_needed), so the Delete activity goes in the False branch of the If Condition.
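Equivalently, you could negate the check and put the Delete activity in the True branch instead:
@not(contains(pipeline().parameters.files_needed, item().name))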
I have created 2 parameters, file_namepath_of_file_to_delete and file_name_to_delete, in the dataset used by the Delete activity, with the following dynamic content:
file_namepath_of_file_to_delete: Folder/@{pipeline().parameters.current_folder}
file_name_to_delete: @item().name
When I run the pipeline, it keeps the required files and deletes the rest. The following are output images for reference.
Debug output: https://i.imgur.com/E6GNVHW.png
My folder after I run the pipeline: https://i.imgur.com/bqN00Dw.png

SSIS won't execute foreach loop for dynamic xlsx filename [duplicate]

This question already has answers here:
SSIS - How to loop through files in folder and get path+file names and finally execute stored Procedure with parameter as Path + Filename
(2 answers)
Closed 3 years ago.
I have an xlsx file that will be dropped into a folder on a monthly basis. The file name changes every month (e.g. filename_8292019) based on the date, and I cannot change that.
I want to build a Foreach Loop to pick up the xlsx file and manipulate it (load it into a SQL Server table, then move the file to an archive folder). I cannot figure out how to do this with a dynamic file name (where the date changes).
I was able to run the package successfully when converting the xlsx to CSV, and also when pointing directly to the exact xlsx file name.
[Flat File Destination [219]] Error: Cannot open the datafile "filename"
OR errors relating to file not found
The Files: entry on the Collection tab of the Foreach Loop container will accept wildcard characters.
The general pattern here is to create a variable, say, FileName. Set your Files: to something like:
Files:
BaseFileName*
or, if you want to be sure to only pick up spreadsheets, maybe:
Files:
BaseFileName*.xlsx
Select either Name and extension or Fully qualified, which will include the full file path. I usually just use Name and extension and put the file path into another variable so when Ops tells me they're moving my drop location, I can change a parameter instead of editing the package. This step tells the container to remember the name of the file it just found so you can use it later for a variable mapping.
On the Variable Mappings tab, select your variable name and assign it to Index 0.
Then, for each spreadsheet, the container will loop, pick up the name of the first file it finds that matches your pattern, and assign the full name, with the date suffix (and path, if you go that way), to your variable. Pass the variable as an input parameter to the tasks inside the loop and use that to process the file, including moving it to the archive, or you'll get yourself into an infinite loop, processing the same file(s) over and over. <-- Does that sound like the voice of experience? Yeah. Been there, done that.
Edit:
Here, the FullFilePath variable is just the folder name, without a file reference; it goes in the Folder box of the Foreach Loop editor.
The FileBaseName variable drives what shows up in the Files box.
Another variable picks up the actual file name, with the date suffix. Later, say in a File System Task, if I need the folder and file name together, I concatenate the variables.
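For example, assuming the file-name variable mapped in the loop is called FileName (use whatever name you actually used), the concatenation expression would look something like:
@[User::FullFilePath] + "\\" + @[User::FileName]
(The doubled backslash is how a single backslash is written in an SSIS expression string literal.)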
As far as the Excel Connection Manager error you're getting, unfortunately I'm no help. I don't use it. We have SentryOne's Task Factory for SSIS which includes a much more resilient Excel connector.

How to share a variable between 2 pyRevit scripts?

I am using the latest version of pyRevit, v45.
I'm writing some info in temporary files with
myTempFile = script.get_instance_data_file("id")
This creates a file named pyRevit_2018_xxxx_id.tmp in which I store useful info. If I'm not mistaken, the "xxxx" part changes every time I reload Revit. Now, I need to get access to this information from another pyRevit script.
How can I retrieve the name of the temp file I need to read? In other words, how do I access "myTempFile" from within the second script, which has no idea of the name of "myTempFile"?
I guess I can somehow share that variable between my scripts, but what's the proper way to do this? I know this must be a very basic programming question, but I'm indeed not a programmer ;)
Thanks a lot,
Arnaud.
Ok, I realise now that my variables in the 1st script cease to exist after its execution.
So for now I wrote the file name in another file whose name I do know. That works.
But if there's a cleaner way to do this, I'd be glad to learn ;)
Arnaud
pyrevit.script module provides 4 different methods for creating temporary files based on their use case:
get_instance_data_file:
for data files marked with Revit instance pid. This means that scripts running on another instance will not see this temp file.
http://pyrevit.readthedocs.io/en/latest/pyrevit/script.html#pyrevit.script.get_instance_data_file
get_universal_data_file:
for temp files accessible to all Revit instances and versions
http://pyrevit.readthedocs.io/en/latest/pyrevit/script.html#pyrevit.script.get_universal_data_file
get_data_file:
Base method to get a standard temp file for current revit version
http://pyrevit.readthedocs.io/en/latest/pyrevit/script.html#pyrevit.script.get_data_file
get_document_data_file:
temp file marked with active document (so scripts working on another document will not see this)
http://pyrevit.readthedocs.io/en/latest/pyrevit/script.html#pyrevit.script.get_document_data_file
Each method uses a pattern to create the temp file name, so as long as the call to the method is the same across different scripts, the method generates the same file name.
Example:
Script 1:
from pyrevit import script
tfile = script.get_data_file('mydata')
Script 2:
from pyrevit import script
tempfile = script.get_data_file('mydata')
In this example tempfile and tfile hold the same path, since the file id ('mydata') is the same.
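If the goal is to share actual values rather than just agree on a path, one simple option (a sketch, not the only way) is to write structured data into that file in the first script and read it back in the second, for example with the json module. The wall_count value here is just a made-up example:
Script 1:
from pyrevit import script
import json

tfile = script.get_data_file('mydata')    # same id as in the example above
with open(tfile, 'w') as f:
    json.dump({'wall_count': 42}, f)       # whatever info you need to share
Script 2:
from pyrevit import script
import json

tfile = script.get_data_file('mydata')    # same id, so the same file path
with open(tfile, 'r') as f:
    data = json.load(f)
print(data['wall_count'])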
There is documentation on each so make sure you take a look at those and pick the flavor that serves your purpose.

Alternative to fs.readdirSync for a large directory in Node

I have a single directory with a few million json files in it. I ultimately want to iterate over each file in the directory, read it, do something with the information and then write something into a database.
My script works perfectly when I use a test directory with a few hundred files. However, it stalls when I use the real directory. I strongly believe that I have pinpointed the problem to the use of:
fs.readdirSync('my dir path')
Converting this to the Async function would not help anything since I need the file names before anything else can happen anyways. However, my belief is that this operation hangs because it simply "takes too long" for it to read the entire directory.
For reference here is a broader portion of the function:
var fs = require('fs');

function traverseFS() {
    var path = 'my dir name and path';
    // Blocks until the entire directory listing has been read into memory.
    var files = fs.readdirSync(path);
    for (var i = 0; i < files.length; i++) {
        var currentFile = path + '/' + files[i];
        var fileText = fs.readFileSync(currentFile, 'utf8');
        var json = JSON.parse(fileText);
        if (json) {
            // do something
        }
    }
}
My question is:
Is there something I can do to get this to work using readdirSync?
Is there another operation I should be using instead?
You would need to either use a child process (easiest) that creates a directory listing and parse that, or write your own streamable binding to scandir() (on *nix) and/or whatever the Windows equivalent is, and use that. For the latter, you may want to use the libuv code (*nix, Windows) as a guide.
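Below is a minimal sketch of the child-process approach (*nix only), just to illustrate the idea: stream the output of ls line by line instead of materializing millions of names in one array. dirPath and handleFile are placeholders for your own path and per-file logic:
var spawn = require('child_process').spawn;
var readline = require('readline');

function streamDirectory(dirPath, handleFile) {
    // -1: one entry per line; for huge directories you may also want a flag that skips sorting
    var ls = spawn('ls', ['-1', dirPath]);
    var rl = readline.createInterface({ input: ls.stdout });

    rl.on('line', function (name) {
        handleFile(dirPath + '/' + name);   // called for each entry as it arrives
    });

    ls.on('close', function (code) {
        console.log('directory listing finished, exit code ' + code);
    });
}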

How to sync two folders with Gradle?

I need to keep a folder called inside synced with a folder called outside. The inside folder needs to be an exact copy of the outside folder: all subdirectories, files, etc.
The Copy task works great, except that it only overwrites files; it does not delete files that are still in the inside folder when those files are no longer in the outside folder.
Right now I am using a Delete task, which the Copy task depends on. The Delete task fails every other build with the error below. The inside folder does get deleted, but the new files from the Copy task are not copied over.
Error:(117) A problem occurred evaluating project ':android'.
> Cannot convert the provided notation to a File or URI: true.
The following types/formats are supported:
- A String or CharSequence path, e.g 'src/main/java' or '/usr/include'
- A String or CharSequence URI, e.g 'file:/usr/include'
- A File instance.
- A URI or URL instance.
I am guessing this happens because of some type of Gradle caching issue - how do I fix this, or design the process better? thanks!
It looks like there's a big difference between using a task's parameters to declare the task type/dependencies and doing so in the task method body.
Something like this worked great:
task deleteFiles(type: Delete) {
    // destinationHtmlFolder is defined elsewhere in the build script
    delete destinationHtmlFolder
}

task copyFiles(dependsOn: tasks.withType(Copy)) {
    println "copying all files"
}
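As a side note, Gradle also ships a built-in Sync task type that copies a source tree into a destination and removes anything in the destination it did not copy, which is exactly the inside/outside mirroring described in the question. A minimal sketch, assuming the folder names from the question:
task syncFolders(type: Sync) {
    from 'outside'   // source tree to mirror
    into 'inside'    // destination; files not present in 'outside' are removed
}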
