I have created a small data warehouse with the help of Tableau. First I entered my information in Excel and built my fact table there, then imported it into Tableau, where I created my queries.
I would like to know: is the creation of a fact table the ETL process? (I know what ETL means; I just want to know where it happened in my project.)
In principle you do Extract, Transform and Load, but mostly manually. Your extract step is done by hand, while gathering the information you need to create your Excel sheet. The transformation is again a manual step: you build the Excel sheet from the data you collected from your various sources. And at last, you load the finished Excel sheet into your BI system, Tableau.
Tableau is a data analytics package that helps you look at already-gathered data and query it for business intelligence. It is not itself an ETL tool.
The extract-transform-load process is one where data is extracted from a source system (a database, a customer relationship management system, whatever), then transformed/converted so it can be loaded into a data warehouse. For example, Excel spreadsheets are converted to CSV, or date formats in Oracle DB data are changed. Once the data is in a format the warehouse can process, it is loaded into the data warehouse.
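As a concrete sketch of what such a transform/load step could look like in T-SQL (the staging and warehouse table names, columns, and date format here are hypothetical, not from the question):

```sql
-- Hypothetical transform + load: the staging table holds dates as text,
-- the warehouse fact table expects proper DATE values.
INSERT INTO dw.FactSales (OrderID, OrderDate, Amount)
SELECT
    s.OrderID,
    TRY_CONVERT(date, s.OrderDateText, 103),  -- style 103 = dd/mm/yyyy source format
    s.Amount
FROM staging.RawSales AS s
WHERE TRY_CONVERT(date, s.OrderDateText, 103) IS NOT NULL;  -- skip unparseable rows
```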
Tableau can be used to query and analyze the data in a data warehouse to help discover trends or problems in a business. In and of itself, it is not an ETL tool.
Fact table creation is not part of the ETL concept; it is related to data modeling.
There is no ETL happening in your process.
I apologize beforehand if this question turns out to be not specific enough. The issue is as follows:
I have an Excel file with several sheets containing lots of calculations (mostly financials). I have access to the same database from which the raw Excel file was downloaded. Now I want to reproduce the calculations and executive summary in Power BI, getting the data directly from the database (most likely using DirectQuery mode). But I am not sure how I should go about it. Should/can I use the existing Excel file to somehow copy the work that has already been done and just change the source to the database? Or will I have to do it all over again? One main consideration here is whether Power BI will be able to do all the complex calculations previously done in Excel.
Via search I came across a few videos where they say you can upload the Excel file into Power BI, then add the same tables from the database, and finally use the Advanced Editor to change the Excel tables' sources to the database. But the thing is that the database doesn't have the kind of tables I have in Excel (lots of changes and calculations were applied to the raw data downloaded from the database). So I am not sure how this method can work.
I've created a report that uses SSAS to create a Pivot table. I have to authenticate with username/password when I refresh it. Once it's refreshed I want to send it to someone else.
However when they open it they can't drill down in the Pivot table because it asks them to authenticate as well.
I can't remove the connection from the file, because then you don't have the data behind the PivotTable, so it doesn't let you drill down either.
Is there a way to work around that, to make the Pivot table available for use (to drill down, no need to change the fields) to the other person?
If you want to provide a self-contained Excel file with the detail data to support an interactive PivotTable then please look at Power Pivot. In newer versions of Excel it is called the Excel Data Model. You load the model with detail data, define your calculations and relationships between tables. The data is compressed and stored in the Excel file so except during refresh from your relational source (which you could do before sending the Excel file) the user doesn’t need any access to servers.
You will have to rebuild the data model in Power Pivot. If your SSAS model is a Tabular model then the concepts should be pretty similar.
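As a sketch, the detail data you load into the Data Model might come from a query like the one below; the fact and dimension names are placeholders, since the actual tables behind your SSAS model aren't known here:

```sql
-- Hypothetical detail query to load into the Excel Data Model / Power Pivot.
-- Pull grain-level rows rather than pre-aggregated results, so the
-- PivotTable can drill down without a live connection to the server.
SELECT
    f.OrderID,
    f.OrderDate,
    c.CustomerName,
    p.ProductName,
    f.Quantity,
    f.Amount
FROM dbo.FactSales AS f
JOIN dbo.DimCustomer AS c ON c.CustomerKey = f.CustomerKey
JOIN dbo.DimProduct  AS p ON p.ProductKey = f.ProductKey
WHERE f.OrderDate >= DATEADD(year, -2, CAST(GETDATE() AS date));  -- limit file size
```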
I use MS Excel 2016 for data visualization.
I understand that Extract means getting data into an Excel spreadsheet, and that Transform means manipulating that data in Power Query.
QUESTION:
But if I decide to load data into Power Pivot (the Data Model), doesn't that fall back under Transform? Because you can:
Create a calendar table
Create measures (or calculated columns if necessary)
Or does using Power Pivot (the Data Model) fall under data modelling, because you are no longer formatting or merging pre-existing data; rather, you are creating new data (i.e. a calendar table, measures, etc.) to combine with the pre-existing data?
Kindly clarify
Power Query (now standard in Excel 2016, on the Data tab) is an ETL (Extract, Transform, Load) tool. A standard example: you connect it to your source ERP system and build a product table (sketched in SQL after this answer). That wouldn't be an exact copy of a single source table, but could consist of several tables joined together, keeping only the relevant columns.
Power Pivot is a data modelling tool: it allows you to create relationships between data and attribute tables, and it gives you the ability to use time-related measures (YTD, previous year, ...).
In general, when you build your model in Power Pivot, you can choose to load the data directly into Power Pivot (without Power Query). This is useful if you already have a data warehouse in which the ETL process is done.
If you have an ETL process to execute, it's better to use Power Query and load the result into Power Pivot (option: Load to Data Model).
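For illustration, the relational equivalent of that Power Query product table could look like the following; the ERP table and column names are made up for the example:

```sql
-- Hypothetical SQL equivalent of the Power Query product table:
-- join several ERP tables and keep only the relevant columns.
SELECT
    p.ProductID,
    p.ProductName,
    g.GroupName AS ProductGroup,
    u.UnitName  AS UnitOfMeasure
FROM erp.Product AS p
JOIN erp.ProductGroup AS g ON g.GroupID = p.GroupID
JOIN erp.Unit AS u ON u.UnitID = p.UnitID;
```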
I am developing an SSAS Tabular project for Power BI. As part of the requirements, I need to automate the process below:
1. Every week, delete the last two weeks of data in the SSAS table.
2. Reload the last two weeks of data.
Please advise. Thanks in advance.
For this, you have to create an SSIS package to delete the last two weeks of data and then process the cube:
1. The SSIS package deletes the last two weeks of data.
2. A scheduled job then processes your SSAS cube.
SSAS Tabular, Power Pivot and Power BI don't provide facilities for a partial refresh, a sliding window, or any type of refresh other than a full data refresh (Power BI Premium does, but I'm assuming you're not using that).
You need to control the data getting into the data model by controlling the data in the source tables underlying the model.
This is commonly done using SSIS, TSQL and/or stored procedures.
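A minimal T-SQL sketch of that sliding-window refresh on the source table, assuming a date column on the fact table (all object names here are hypothetical); a scheduled full process of the Tabular model then just picks up whatever is in the table:

```sql
-- Hypothetical weekly refresh of the table underlying the Tabular model:
-- remove the trailing two weeks, re-insert the corrected rows, then let
-- the scheduled SSAS full process pick up the result.
DECLARE @cutoff date = DATEADD(week, -2, CAST(GETDATE() AS date));

BEGIN TRANSACTION;

DELETE FROM dbo.FactSales
WHERE OrderDate >= @cutoff;                 -- drop the last two weeks

INSERT INTO dbo.FactSales (OrderID, OrderDate, Amount)
SELECT OrderID, OrderDate, Amount
FROM staging.FactSales                      -- freshly extracted source rows
WHERE OrderDate >= @cutoff;

COMMIT TRANSACTION;
```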
Background:
I currently run reporting on a monthly basis from a source CSV file that's roughly 50 columns by 15k rows. I have an existing system where I import the data into SQL, use multiple stored procedures to handle the data transformations, then use Excel connections to view the reports in Excel after the transformations. These transformations are relatively complex, comprising ~4 stored procedures of ~5 pages and around 200 lines of code each.
Problem:
The amount of code and tables in SQL needed to handle the transformations is becoming overwhelming. QA is a pain in the ass: tracking through all the tables and stored procedures to find out where a problem lies. This whole process, including extensive QA, takes me 3 days to complete, where ideally I'd like it to take half a day total. I can run through all the stored procedures and Excel connections/formatting in a few hours, but currently it's more efficient to run QA after every single step.
Potential Solutions:
Would integrating SSIS help the automation and QA process?
I am new to SSIS, so how do data transformations work with SSIS?
Do I just link a stored proc as a step in the SSIS flow?
Note: I should specify that the results need to be displayed in Excel on a heavily formatted worksheet. Currently, there is a feeder sheet in Excel that fetches data from SQL views, and the report page has formula links to that feeder sheet.
I appreciate all the help in advance.
I've done something similar and partially converted a SQL stored procedure (SP) solution to SSIS.
You can call existing SPs using the SSIS Execute SQL Task, so I would start with an SSIS "Master" package that just executes all your SPs. Right out of the box that gives you control over dependencies, parallel execution, restartability and logging.
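As a sketch, the pure T-SQL equivalent of that master package might look like this (the procedure names are placeholders for your existing SPs); in SSIS, each EXEC becomes its own Execute SQL Task, with precedence constraints supplying the ordering:

```sql
-- Hypothetical T-SQL equivalent of the SSIS master package: run the
-- existing transformation procedures in dependency order and stop on
-- the first failure.
BEGIN TRY
    EXEC dbo.usp_Stage_RawImport;     -- placeholder names for your SPs
    EXEC dbo.usp_Transform_Sales;
    EXEC dbo.usp_Transform_Costs;
    EXEC dbo.usp_Build_ReportViews;
END TRY
BEGIN CATCH
    THROW;  -- surface the error so the scheduled job reports the failure
END CATCH;
```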
Then I would incrementally chip away at replacing the SPs with SSIS Data Flow Tasks - this opens up the full range of SSIS transformation capabilities and is almost always a lot faster to build and run than SPs.
I would replace the Excel layer with a Reporting Services report, but this would probably be a lower priority.