Pentaho, multiple outputs for multiple inputs - excel

I have been trying to figure out how to get Pentaho to write a DIFFERENT file for each input of the job.
My transformation will soon be able to fetch a varying number of .txt files from an FTP server. The way the transformation is right now, whatever the number of files it gets from the folder (FTP or local), it generates one big XLS output. The information on the output side is all correct and matches the data I want to extract with precision, but for organizing those files I need Pentaho to create a single output file from a single input file.
If the files //PentahoIn0001.txt, //PentahoIn0002.txt and //PentahoIn0003.txt are processed, I want //PentahoOut0001.xls, //PentahoOut0002.xls and //PentahoOut0003.xls to be created; the way it is right now, it only creates a single file with the data of all three inputs.
So far I have tried several approaches with no result, including posts from here and elsewhere with helper transformations and jobs meant to do this, and none of them work.

Save the output filename in each row and make sure the rows are sorted on that filename, then call the Transformation Executor step with a new transformation that saves the data. Make sure to enable Row grouping on the filename field, and also pass the filename as a parameter to the new transformation.
In the child transformation, start with a Get rows from result step and save the rows to a file named by the passed filename parameter.
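As a very rough illustration of that grouping idea outside of PDI, the same logic can be sketched in Python with pandas: keep the target filename on every row, then write one Excel file per distinct filename. The directory, separator and naming rule below are placeholders, not part of the actual transformation.

import glob
import os
import pandas as pd

frames = []
for in_path in sorted(glob.glob(os.path.join("input", "PentahoIn*.txt"))):
    df = pd.read_csv(in_path, sep="\t")  # assumes tab-separated .txt inputs
    # Derive the output filename and keep it on every row, playing the role of
    # the filename field used for row grouping in the Transformation Executor.
    base = os.path.basename(in_path)
    df["out_file"] = base.replace("PentahoIn", "PentahoOut").replace(".txt", ".xlsx")
    frames.append(df)

all_rows = pd.concat(frames, ignore_index=True)
for out_file, group in all_rows.groupby("out_file", sort=True):
    # One output workbook per input file (.xlsx here, since recent pandas
    # no longer writes the legacy .xls format).
    group.drop(columns="out_file").to_excel(out_file, index=False)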

Related

import multiple excel files to database in pentaho 6

I want to import multiple Excel files into my database in a loop. For example, I put all the Excel files into a for loop and import each file into my database.
The reason is that when I try to import all the files in the folder at once, it only works with a maximum of 2 files; with three files I get errors related to RAM.
Thank you in advance.
You can use a Get file names step as an input to get all the Excel files.
Feed the information from Get file names into the Microsoft Excel input step; that step has a checkbox to accept filenames from the previous step.
To make this work, all Excel files must have the same structure. If they have different structures, you'll have to inject metadata with the differences for each file, and you'll have to build logic in previous transformations to determine the metadata to inject.
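If it helps to see the flow outside of PDI, here is a minimal Python sketch of the same idea: list the Excel files, read them one at a time (they must share the same column layout), and append each one to the target table so only one workbook is in memory at once. The folder, sheet, table name and connection string are assumptions for illustration.

import glob
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///target.db")  # hypothetical target database

for path in sorted(glob.glob("excel_files/*.xlsx")):
    df = pd.read_excel(path, sheet_name=0)  # every file must have the same structure
    # Append this file's rows to the same table, one file at a time,
    # which also keeps memory usage low.
    df.to_sql("imported_rows", engine, if_exists="append", index=False)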

How to control execution order for unrelated Alteryx IO tasks?

I have 3 completely unrelated Excel files. Each needs to be uploaded to a separate database table. Unrelated files, unrelated tables. So I have 3 completely independent Input --> Output structures.
Once all these Input --> Output routines complete, then I have other code I need to execute.
The problem is I want to guarantee my "other" code doesn't start until ALL 3 Excel files have been uploaded. How can I BlockUntilComplete for all 3 of these Excel files?
Something like the below might work... input the file paths; use multiple Block Until Done tools... filter the filename to work with, use a Dynamic Input to grab it, then do your upload or whatever... then later on continue to the rest of the workflow. (See picture)
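For comparison only (this is not Alteryx), the same "block until all three finish" behaviour can be sketched in Python: run the three unrelated uploads concurrently, wait for all of them, then run the follow-up work. upload_file() and the file/table names are hypothetical placeholders.

from concurrent.futures import ThreadPoolExecutor, as_completed

def upload_file(path, table):
    # placeholder for the real Excel -> database upload
    print(f"uploading {path} into {table}")

jobs = [("a.xlsx", "table_a"), ("b.xlsx", "table_b"), ("c.xlsx", "table_c")]

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(upload_file, path, table) for path, table in jobs]
    for future in as_completed(futures):
        future.result()  # re-raises if any upload failed

# Only reached once all three uploads have completed.
print("running the rest of the workflow")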

azure data factory: iterate over millions of files

Previously I had a problem with merging several JSON files into one single file,
which I was able to resolve with the answer to this question.
At first I tried with just some files, using wildcards in the file name in the connection section of the input dataset. But when I remove the file name, in theory all of the files in all folders should be loaded recursively, since I checked the "copy recursively" option in the source section of the copy activity.
The problem is that when I manually trigger the pipeline after removing the file name from the input dataset, only some of the files get loaded: the task ends successfully but loads only around 400+ files, while each folder has 1M+ files. I want to create BIG CSV files by merging all the small JSON files of the source (I was already able to create a CSV file by mapping the schemas in the copy activity).
It is probably stopping due to a timeout or out of memory exception.
One solution is to loop over the contents of the directory using
Directory.EnumerateFiles(searchDir)
This way you can process all the files without holding the list or contents of all the files in memory at the same time.
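Directory.EnumerateFiles is .NET; the same lazy-enumeration idea looks like this in Python, where os.scandir() yields entries one at a time, so the millions of small JSON files can be streamed into one big CSV without ever building the full file list in memory. The folder name and field names are placeholders.

import csv
import json
import os

source_dir = "json_source"                 # hypothetical folder of small JSON files
fieldnames = ["id", "timestamp", "value"]  # hypothetical keys to keep in the CSV

with open("merged.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    with os.scandir(source_dir) as entries:
        for entry in entries:              # lazily yields one entry at a time
            if not entry.name.endswith(".json"):
                continue
            with open(entry.path, encoding="utf-8") as f:
                record = json.load(f)      # one small JSON object per file
            writer.writerow(record)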

Application.LoadfromText...load from string instead?

I was wondering if it is possible to take the code saved to a .txt file by Application.SaveAsText, store that code in a table, and then use Application.LoadFromText to build the object from a string rather than from a .txt file.
Does that make any sense? Basically, I want to store all the object definitions in a table on separate rows and allow users to select the relevant row and build the object without having to import the .txt file.
Yes and no. You would have to write the field content to a (temp) file, then use LoadFromText to read in the object.
But it doesn't make much sense, and I think you are on the wrong track. You could just as well have the objects ready-made in the application.
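A minimal sketch of that "temp file + LoadFromText" approach, driven from Python through COM automation (pywin32); inside Access itself the equivalent would be VBA. The database path, form name and the acForm constant value are assumptions for illustration.

import os
import tempfile
import win32com.client

AC_FORM = 2  # Access AcObjectType.acForm (assumed constant value)

def rebuild_form(definition_text, form_name, accdb_path):
    # LoadFromText only accepts a file name, so the definition stored in the
    # table has to be written to a temporary .txt file first.
    fd, tmp_path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(definition_text)
    try:
        app = win32com.client.Dispatch("Access.Application")
        app.OpenCurrentDatabase(accdb_path)
        app.LoadFromText(AC_FORM, form_name, tmp_path)
        app.CloseCurrentDatabase()
        app.Quit()
    finally:
        os.remove(tmp_path)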

Input data received from an email into a CSV/Excel/LibreOffice Calc file

Having a bit of trouble with a script I am trying to create. Basically, I would like it to send out a reminder email asking for the hours I worked that day; I send a reply, the script reads the email for the date, start time and end time, and then it writes this data into a CSV/Excel/LibreOffice Calc file, with a new line for each date. I have managed to sort out the email sending and reading part, and putting the data into a variable for the next subroutine (the Excel bit) to read, but I am not sure how to go about this part. I have seen many suggestions to use Text::CSV and other modules, but I'm not certain how to use them. Also, how would I go about making it append to the end of the document instead of just overwriting it?
Thanks in advance guys
CSV is very easy to read and parse, and Text::CSV is very easy to use too. What specific problems are you having with Text::CSV, and what have you tried?
If you want true Excel format, it looks like you'd need to read the existing contents with something like Spreadsheet::XLSX, write them back out using something like Excel::Writer::XLSX, and append your data as you write the original data back out.
CSV is simpler, though, if you can live with it. You could open the file in append mode and just write to it. To do so, build up your data into columns and "combine" them ($csv->combine(@columns)), then create a string out of that ($csv->string()) that you can write to your original file.
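The answer above is about Perl's Text::CSV; purely as a point of comparison, the same append-a-row pattern in Python's csv module looks like this (the file name and the date/start/end values are placeholders).

import csv

row = ["2024-01-15", "09:00", "17:30"]  # date, start time, end time (placeholders)

# mode="a" appends a new line instead of overwriting the existing file
with open("hours.csv", "a", newline="", encoding="utf-8") as f:
    csv.writer(f).writerow(row)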

Resources