Excel output in pentaho showing last month - excel

I'm working with PDI 4.1. I've created transformations and jobs, and I have an Excel file with data from a database. The columns in my Excel file are name, date and hour, and I need to bring in the data from last month. Can I do something like this?
Name_july_hour.xls==name_june_hour.xls
Thanks in advance.

You've likely figured this out by now, but what you need to do is flow the data from last month's Excel file into a transformation with a Microsoft Excel Input step. Then you can do whatever you want with it (aggregate it, join it with another file, join it with a database table, and so on) before writing it to the new month's file with the Excel Writer step.
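The file-name half of the question (July's run needs June's file) comes down to a small piece of date arithmetic. Here is a minimal Python sketch of that logic on its own, outside PDI. The function name and the `<name>_<month>_<hour>.xls` pattern are assumptions taken from the example in the question, and how the resulting name is handed to the Excel Input step is left out.

```python
from datetime import date

# Spelled out explicitly so the result does not depend on the system locale.
MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]


def previous_month_filename(prefix: str, suffix: str, today: date) -> str:
    """Build the name of the file that belongs to the month before `today`."""
    # Months are 1-based; stepping back from January wraps to December.
    prev_month = 12 if today.month == 1 else today.month - 1
    return f"{prefix}_{MONTH_NAMES[prev_month - 1]}_{suffix}.xls"


print(previous_month_filename("name", "hour", date(2011, 7, 15)))
# prints: name_june_hour.xls
```

The January case is the one that is easy to get wrong, which is why it is handled explicitly rather than by subtracting 1 from the month number.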

Related

How to import local excel file into snowflake database table? I will need to do this daily

All -
I've done some research, but I'm having trouble finding a clear answer.
Problem to solve for: I have a dependency where a co-worker updates a local Excel file, and I need the information in that file to be imported into a Snowflake table for analysis.
The data structure of the Excel file will always be consistent, but I will need to import the new file into Snowflake daily, and it can have 200+ rows every day.
I've attached screenshots of the Excel file structure. What is the simplest way to enable my co-worker or myself to update the Snowflake table with the new file every day?
The Excel workbook will have 2 sheets. I've attached the sample data below. Please help :/
I would likely create a small Python application that loads the Excel file into a pandas DataFrame and then loads that DataFrame into Snowflake. Something like this might work: https://pandas.pydata.org/pandas-docs/stable/reference/api/…. Once that script is written, you could schedule it to run every day or just run it manually every day.
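A rough sketch of what such a script could look like, assuming pandas (with openpyxl for .xlsx files) and the Snowflake Python connector with its pandas extras are installed. The function name and the sheet-to-table mapping are hypothetical. It also assumes the target tables already exist, that the DataFrame column names match the table columns, and that `conn` is an open connection created with `snowflake.connector.connect(...)`.

```python
def load_workbook_to_snowflake(xlsx_path, conn, table_for_sheet):
    """Read every sheet of the workbook and append it to its Snowflake table.

    table_for_sheet maps a sheet name to a table name (hypothetical mapping).
    Returns {table name: number of rows written}.
    """
    # Imported here so the module loads even without the optional dependencies.
    import pandas as pd
    from snowflake.connector.pandas_tools import write_pandas

    # sheet_name=None makes read_excel return {sheet name: DataFrame}.
    sheets = pd.read_excel(xlsx_path, sheet_name=None)

    loaded = {}
    for sheet_name, frame in sheets.items():
        table = table_for_sheet[sheet_name]
        # write_pandas returns (success, number of chunks, number of rows, output).
        success, _chunks, n_rows, _output = write_pandas(conn, frame, table)
        if not success:
            raise RuntimeError(f"Loading sheet {sheet_name!r} into {table} failed")
        loaded[table] = n_rows
    return loaded
```

With two sheets, the call would look something like `load_workbook_to_snowflake("daily.xlsx", conn, {"Sheet1": "TABLE_ONE", "Sheet2": "TABLE_TWO"})`, where the file, sheet, and table names are placeholders.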

I need to pull specific cells from a single excel file to create a single row in SSIS

I'm provided with a folder of Excel files. Each represents one form with data entered in specific cells. Each file has the same format, and each would form ONE row of information to be imported into my SQL Server database.
I believe I can loop through each Excel file in the folder; however, I am having trouble finding the right tools to extract these specific cells and merge them into a single row to insert into the table.
Power Query to the rescue! :)
http://excelunplugged.com/2015/02/10/get-data-from-folder-in-power-query/
Ended up writing some VBA instead to move the data into a tabular/list form in one Excel sheet, then used that document to feed SSIS. So far, it does not seem like SSIS can do that initial part.
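For comparison, the same "one form file becomes one row" flattening can be sketched in Python instead of Power Query or VBA. This is an alternative technique, not what either answer used. It assumes .xlsx files and the third-party openpyxl package; the cell addresses and column names in `CELL_MAP` are made up for illustration.

```python
from pathlib import Path

# Hypothetical mapping: target column name -> cell that holds its value.
CELL_MAP = {"customer": "B2", "order_date": "B3", "total": "E10"}


def row_from_cells(get_cell, cell_map=CELL_MAP):
    """Build one row (a dict) by looking up each mapped cell address."""
    return {column: get_cell(ref) for column, ref in cell_map.items()}


def rows_from_folder(folder):
    """Yield one row per .xlsx file found in `folder`."""
    from openpyxl import load_workbook  # third-party, imported lazily

    for path in sorted(Path(folder).glob("*.xlsx")):
        # data_only=True returns cached formula results instead of formulas.
        sheet = load_workbook(path, data_only=True).active
        yield row_from_cells(lambda ref: sheet[ref].value)
```

The resulting rows could be written to a single flat file for SSIS to pick up, which mirrors the tabular sheet the VBA approach produced.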

Change pivot table source data to newest data File

I would like my pivot table in Excel to update automatically, either on opening or in the background, to the newest edition of the data, which is stored as a CSV in a folder.
The CSV files have the same columns and follow the same naming convention, csvFile_ddmmyy, where the date is substituted. They're generated every day. I would like Excel to update the pivot table's source data to the data from the newest date.
Preferably this will be done automatically, but I can also type the date into a certain cell and have a macro take this date and put it in the connection string.
If you can propose any solution to this problem, I'd greatly appreciate it.
Make a copy of the latest CSV and use the same file name every time. Point the data connection to that one file that never changes its file name.
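The copy step can be automated with a small script that runs after the daily CSV is produced. A minimal sketch, assuming the files are named like csvFile_ddmmyy.csv (the .csv extension and the fixed name csvFile_latest.csv are assumptions). Because ddmmyy does not sort chronologically as text, the date is parsed rather than compared by file name.

```python
import re
import shutil
from datetime import datetime
from pathlib import Path

# Matches names such as csvFile_010122.csv and captures the ddmmyy part.
NAME_PATTERN = re.compile(r"^csvFile_(\d{6})\.csv$")


def newest_csv(folder):
    """Return the path of the dated CSV with the most recent date."""
    dated = []
    for path in Path(folder).iterdir():
        match = NAME_PATTERN.match(path.name)
        if not match:
            continue
        try:
            stamp = datetime.strptime(match.group(1), "%d%m%y")
        except ValueError:
            continue  # six digits that do not form a valid date
        dated.append((stamp, path))
    if not dated:
        raise FileNotFoundError(f"no dated CSV files found in {folder}")
    return max(dated, key=lambda item: item[0])[1]


def refresh_latest(folder, target_name="csvFile_latest.csv"):
    """Copy the newest dated CSV over the fixed-name file Excel points at."""
    source = newest_csv(folder)
    shutil.copyfile(source, Path(folder) / target_name)
    return source
```

The pivot table's data connection then always points at the fixed-name file and only needs a refresh.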

How to concatenate query result executed in Excel

I have an Excel workbook that collects product data from an Oracle database. This process is executed once a day, and one of its columns is the execution date.
The problem is that every time I run the query, the existing data is replaced. What I want is to append the result below the existing data, so I can generate graphs showing some product information over time.
How could I do that?
Thanks in advance!
Can you not just change the date in your query to a specific date that you would like to set as the begin date for your data pool? You would probably have to batch it occasionally, as the file would probably get rather large, but it's just a suggestion if I'm understanding the question correctly. Otherwise, you could take @Dank's advice and just copy the data into another worksheet to create a "master file" to populate your graphs.
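The "master file" idea amounts to appending each day's result to a file that is never overwritten. A minimal sketch of the append step using a CSV as the master file; the function name and the column names in the usage example are hypothetical, and fetching the rows from Oracle is outside the sketch.

```python
import csv
from pathlib import Path


def append_rows(master_path, header, rows):
    """Append rows to the master CSV, writing the header only once."""
    master = Path(master_path)
    # A missing or empty file still needs its header row.
    write_header = not master.exists() or master.stat().st_size == 0
    with master.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if write_header:
            writer.writerow(header)
        writer.writerows(rows)
```

Each daily run would call something like `append_rows("master.csv", ["product", "price", "run_date"], todays_rows)`, and the graphs would read from the master file rather than from the query result.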

Generating summaries automatically

Part of my job is to pull a report weekly that lists patching information for around 75000 PCs. I have to filter some erroneous data, based on certain criteria, and then summarize this data myself and update it in a separate spreadsheet. I am comfortable with pivot tables / formulas, but it ends up taking a good couple of hours.
Is there a way to import data from a CSV file into a template that already has my formulas, settings, etc. in place, if the data has the same columns but a different number of rows each time?
If you're comfortable with programming, you can use macros. In this case, you would connect to your CSV file, extract the information, and put it in the corresponding places on your spreadsheet. This question has most of what you need to get started: macro to Import csv file into an excel non active worksheet.
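The filter-then-summarize part can also be scripted outside Excel. The sketch below uses Python rather than the VBA macro the answer suggests, so treat it as an alternative, not the answer's method. The column names in the usage example are invented, since the question does not show the report layout.

```python
import csv
from collections import Counter


def summarize(csv_path, keep, group_by):
    """Count the rows that pass `keep`, grouped by the `group_by` column."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # DictReader uses the first line as column names, so the number of
        # data rows can change from week to week without any adjustment.
        for row in csv.DictReader(handle):
            if keep(row):
                counts[row[group_by]] += 1
    return counts
```

For example, `summarize("report.csv", lambda r: r["hostname"] != "", "patch_status")` would drop rows with an empty hostname and count the remaining machines per patch status, assuming columns with those names exist.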

Resources