Hi, I need help with the requirement below.
In Teradata, I have a set of SQL scripts that have to run one after another. Once they have all run, the resulting data has to be stored in a file named after the run date. I have to automate this so that the same work is triggered once every day and the data is stored in a new folder/file with the date included in the name.
Example: after the scripts run one after another, a final table is created, and that table has to be stored as tableYYYYMMDD. The scripts have to run every day, and either a new table has to be created each day, or there should be a single table to which the data is appended every day.
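One way this could be wired up is a small Python script scheduled daily with cron (or Windows Task Scheduler). The sketch below is only an outline, not a tested solution: it assumes the teradatasql driver, that each .sql file contains a single statement, and that the host, credentials, script names, table name, and output folder are placeholders you would replace.

import csv
import datetime

import teradatasql  # Teradata's Python driver ("pip install teradatasql")

RUN_DATE = datetime.date.today().strftime("%Y%m%d")

with teradatasql.connect(host="tdhost", user="tduser", password="tdpass") as con:
    cur = con.cursor()

    # Run the scripts one after another, in a fixed order.
    for script in ["step1.sql", "step2.sql", "step3.sql"]:
        with open(script) as f:
            cur.execute(f.read())

    # Export the final table to a CSV named after the run date (tableYYYYMMDD.csv).
    cur.execute("SELECT * FROM final_table")
    with open(f"output/table{RUN_DATE}.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow([col[0] for col in cur.description])
        writer.writerows(cur.fetchall())

For the table variant, the last step could instead run a CREATE TABLE ... AS statement with the date in the table name, or a plain INSERT into one permanent table for the append-only option.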
All -
I've done some research, but I'm having trouble finding a clear answer.
Problem to solve for: I have a dependency where a co-worker updates a local Excel file, and I need the information in that file to be imported into a Snowflake table for analysis.
The structure of the Excel file will always be consistent, but I will need to import the new file into Snowflake daily, and it can have as many as 200+ rows every day.
I've attached screenshots of the Excel file structure. What is the simplest way for my co-worker or me to update the Snowflake table with the new file every day?
The Excel workbook will have 2 sheets. I've attached the sample data below. Please help :/
I would likely create a little Python application that loads the Excel file into a pandas DataFrame and then loads that DataFrame into Snowflake. Something like this might work: https://pandas.pydata.org/pandas-docs/stable/reference/api/…. Once that script is written, you could schedule it to run every day or just run it manually every day.
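To make that a bit more concrete, here is a minimal sketch, assuming the snowflake-connector-python package with its pandas extras plus openpyxl for reading .xlsx files, and assuming the target table already exists; the account, credentials, file path, and table name are all placeholders.

import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Read the first sheet; the second sheet can be loaded the same way (sheet_name=1).
df = pd.read_excel(r"C:\data\daily_report.xlsx", sheet_name=0)

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="my_wh",
    database="my_db",
    schema="my_schema",
)
try:
    # Append the DataFrame's rows to the existing Snowflake table.
    write_pandas(conn, df, "DAILY_REPORT")
finally:
    conn.close()

write_pandas simply appends the DataFrame to the named table, so the column names need to line up with the table definition.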
I recently modernized all my Excel files and started using the magic of Power Query and Power Pivot.
Background: I have 2 files:
- The first one is a "master" with all sales and production logs; everything works inside that Excel file, with Power Query connections to tables stored in that same file.
- The second one holds a mostly different set of data, about continuous improvement, but I'd like to start linking it to the master file with charts that compare efficiency to production, etc.
As it is now, I am linking by entering a direct reference to the cells/ranges in the master file (e.g., [Master.xlsm]!$A1:B2). However, with every new version of the master file I have to update the links, and that won't scale if I have more documents in the future.
Options:
- Is it possible to store all the queries or data from the master file in a separate file in the same folder and "call" it when needed, either from my Sales/Production master file or from the Manufacturing file? That could be a database or connection file that holds the queries to the data stored in the master file.
- If not, what is the best way to connect my Manufacturing file to my master file without hard-coding the filename?
My fear is that as soon as the master file name changes (date, version), I will have to go into the queries and fix all the links again. Additionally, I want to make this future-proof early on, as I plan to gather large amounts of data and add more measurements.
Thanks for your help!
Once you have a data model built, you can create a connection to it from other Excel files. If you are looking for a visible way to control the source path of the connected file, you can add a named range to the Excel file that is connecting to the data model, and in the named range, enter the file path. In Power Query, add a new query that returns your named range (the file path), and swap out the static file path in your queries with the new named range query.
Here is sample M code that gets the contents of a named range. This query is named "folderPath_filesToBeAudited".
let
    Source = Excel.CurrentWorkbook(){[Name="folderPath_filesToBeAudited"]}[Content]{0}[Column1]
in
    Source
Here is an example of M code showing how to use the new query to reference the file path.
Folder.Files(folderPath_filesToBeAudited)
Here is a step-by-step article.
https://accessanalytic.com.au/powerquery_namedcells_parameters/
I have 4 sheets with similar fields. I intend to merge these sheets to create a master file that has all the information in one sheet. However, I need Tableau to connect to the final merged file so I can create dashboards off it. This works locally, as I have an Access program that appends the tables together and creates a new table that Tableau connects to.
The main issue is that I am trying to take this process out of my local setup (move it from running locally to running online), meaning I need a database that can:
1- Drop the contents of the tables, pick up the sheets from a specified folder, and import them into the specified tables.
2- Append the new tables to the master tables.
All of this should be done automatically at a scheduled time.
I tried using SQL Server (SQL Server Agent for scheduling the import/append jobs, etc.) for this requirement, but I need to know whether something else is out there that can serve this purpose efficiently.
Thank you
As long as the sheets have exactly the same fields, you should be able to use Tableau's Union feature. It allows you to do a wildcard search for sheets within a folder structure. Any time the data is refreshed in Tableau, it will reach back out to the folder and update/union whatever is currently there.
I have a Redshift table and it is storing a lot of data. Every weekend I go in manually, using Workbench, and TRUNCATE the last week of data that I no longer need.
I have to run this manually:
DELETE FROM tableName WHERE created_date BETWEEN timeStamp1 AND timeStamp2;
Is there some way to tell the table, or set an expiration policy, to remove that data every Sunday for me?
If not, is there a way to automate the delete process every 7 days? Some sort of shell script, or a cron job in Node.js, that does this?
No, there is no built-in ability to run commands on a regular basis on Amazon Redshift. You could, however, run a script on another system that connects to Redshift and runs the command.
For example, a cron job that calls psql to connect to Redshift and execute the command. This could be done in a one-line script.
Alternatively, you could configure an AWS Lambda function to connect to Redshift and execute the command. (You would need to write the function yourself, but there are libraries that make this easier.) Then, you would configure Amazon CloudWatch Events to trigger the Lambda function on a desired schedule (eg once a week).
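As a rough illustration (not a drop-in solution), such a script could look like the sketch below. It assumes the psycopg2 driver, since Redshift speaks the PostgreSQL protocol; the connection details are placeholders, and the date window merely stands in for your timeStamp1/timeStamp2 values. The same purge function could be called from a cron job or used as the Lambda handler.

import datetime

import psycopg2  # Redshift is reachable with the standard PostgreSQL driver

def purge_last_week():
    end = datetime.date.today()
    start = end - datetime.timedelta(days=7)
    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="mydb",
        user="myuser",
        password="mypassword",
    )
    try:
        with conn.cursor() as cur:
            cur.execute(
                "DELETE FROM tableName WHERE created_date BETWEEN %s AND %s",
                (start, end),
            )
        conn.commit()
        # Reclaim the space left by the deleted rows (see the note on VACUUM below).
        conn.autocommit = True  # VACUUM cannot run inside a transaction block
        with conn.cursor() as cur:
            cur.execute("VACUUM tableName")
    finally:
        conn.close()

def handler(event, context):
    # Entry point when deployed as an AWS Lambda function.
    purge_last_week()

if __name__ == "__main__":
    # Entry point when run directly, eg from a weekly cron job.
    purge_last_week()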
A common strategy is to actually store data in separate tables per time period (eg a month, but in your case it would be a week). Then, define a view that combines several tables. To delete a week of data, simply drop the table that contains that week of data, create a new table for this week's data, then update the view to point to the new table but not the old table.
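To sketch that rotation (again with placeholder connection details; the weekly table and view names are invented for illustration, and the week labels would be computed from the run date in practice):

import psycopg2

NEW_WEEK, PREV_WEEK, OLD_WEEK = "20240107", "20231231", "20231203"

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="mydb", user="myuser", password="mypassword",
)
with conn.cursor() as cur:
    # Create this week's table with the same definition as an existing weekly table.
    cur.execute(f"CREATE TABLE events_{NEW_WEEK} (LIKE events_{PREV_WEEK})")
    # Repoint the combined view so it no longer references the old week.
    cur.execute(
        f"CREATE OR REPLACE VIEW events AS "
        f"SELECT * FROM events_{PREV_WEEK} "
        f"UNION ALL SELECT * FROM events_{NEW_WEEK}"
    )
    # Dropping the old table removes that week of data instantly, with no DELETE or VACUUM needed.
    cur.execute(f"DROP TABLE events_{OLD_WEEK}")
conn.commit()
conn.close()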
By the way...
Your example uses the DELETE command, which is not the same as the TRUNCATE command.
TRUNCATE removes all data from a table. It is an efficient way to completely empty a table.
DELETE is good for removing part of a table but it simply marks rows as deleted. The data still occupies space on disk. Therefore, it is recommended that you VACUUM the table after deleting a significant quantity of data.
EDIT: I think this question belongs over at Super User, not here on Stack Exchange.
What I would like to do is have a single Excel file that pulls in data from every Excel file in a given directory. Specifically, if I have time sheet Excel files from multiple people working on multiple different job numbers, I would like that data populated in a single file covering everyone's time. The directory where the files are stored would be updated weekly, so I would want the "master" Excel file to reflect the weekly changes automatically... hopefully. Is there an easy way to do this that I would be able to teach someone else?
Import every file into a database table using a stored procedure and export a single Excel file. You can schedule this as a job. Use OPENROWSET and xp_cmdshell. What technology are you using?
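If a database isn't strictly required, one lighter-weight alternative to the OPENROWSET/xp_cmdshell route is a small pandas script that stacks every workbook in the folder into one sheet; this is only a sketch, and the folder path and output name are placeholders. It could be scheduled as a weekly job the same way.

import glob

import pandas as pd

# Collect every time sheet workbook in the shared folder.
files = glob.glob(r"C:\timesheets\*.xlsx")

# Read the first sheet of each workbook and stack them into a single table,
# keeping track of which file each row came from.
frames = [pd.read_excel(f).assign(source_file=f) for f in files]
master = pd.concat(frames, ignore_index=True)

# Write the combined data to the "master" workbook.
master.to_excel(r"C:\timesheets\master\combined_timesheets.xlsx", index=False)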