There is a use case in my company to let business users with no technical knowledge work with data from the Azure cloud. Back in the SQL Server days this was easily solved with OLAP cubes: you could write a query for the data backing the cube, and business people could then just connect to the cube and pull the data down as a pivot table. The only real problem with large datasets was compute (the larger the data, the slower the pivot table), but not a row limit.
With the current Azure Synapse setup it seems that Excel tries to download the entire dataset and obviously always hits the 1M row limit. Is there any way to use the data directly in a pivot table without bringing it in full into Excel? Because all my tables are >1M rows.
UPD: You can load data directly into a pivot table, but it does load the data into RAM and the actual loading takes time. I am looking for a cube-like solution, where the pivot table is available immediately and the querying happens once you add fields and calculations to the pivot table.
Related
I have 100-150 Azure databases with the same table schema. There are 300-400 tables in each database. Separate reports are enabled on all these databases.
Now I want to merge these databases into a centralized database and generate some different Power BI reports from this centralized database.
The approach I am thinking of is:
1. There will be a Master table on the target database holding DatabaseID and Name.
2. All the tables on the target database will have a composite primary key created from the source primary key and the DatabaseID.
3. There will be multiple (30-35) instances of an Azure Data Factory pipeline, and each instance will be responsible for merging data from 10-15 databases.
4. These ADF pipelines will be scheduled to run weekly.
Can anyone please advise whether the above approach is feasible in this scenario, or whether there is another option we could go for?
Thanks in advance.
You are trying to create a data warehouse.
I hope you never try to merge 150 Azure SQL Databases as-is, because as soon as you try to query that beefy archive you will run into errors.
This is because Power BI, like any other tool, comes with limitations:
Distinct values in a column: there is a limit of 1,999,999,997 distinct values that can be stored in a column.
Row limitation: If the query sent to the data source returns more than one million rows, you see an error and the query fails.
Column limitation: The maximum number of columns allowed in a dataset, across all tables in the dataset, is 16,000 columns.
A data warehouse is not just a merge of ALL of your data. You need to clean the data and import only the most useful parts.
So the approach you are proposing is overall OK, but just import what you need.
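To make the composite-key idea concrete, here is a minimal T-SQL sketch of the schema described in the question; the names (dbo.SourceDatabase, dbo.Orders, OrderID) are illustrative only, not taken from the actual databases:

-- Master table: one row per source database.
CREATE TABLE dbo.SourceDatabase (
    DatabaseID   int           NOT NULL PRIMARY KEY,
    DatabaseName nvarchar(128) NOT NULL
);

-- Example merged table: the source primary key (OrderID) alone is no longer
-- unique across 100-150 databases, so DatabaseID is added to form a
-- composite primary key.
CREATE TABLE dbo.Orders (
    DatabaseID int           NOT NULL
        REFERENCES dbo.SourceDatabase (DatabaseID),
    OrderID    int           NOT NULL, -- primary key in the source database
    OrderDate  date          NOT NULL,
    Amount     decimal(18,2) NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY (DatabaseID, OrderID)
);

Each ADF pipeline instance would then stamp its rows with the DatabaseID of the source it is copying from.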
I have a pivot table that shows the sales data of the company since 2019, and I need to update it with June's information. However, the problem is that Excel cannot handle adding 200,000 rows to the 1,046,984 in my current database. How can I do this without using Access? Should I use a CSV file instead?
The only way I am aware of to make this work within Excel is by leveraging Power Pivot: load the rows into the workbook's Data Model instead of onto a worksheet, since the Data Model is not bound by the 1,048,576-row sheet limit.
I have a rather large data model in Excel. It consists of an imported data mart featuring one fact table and around 20 dimension tables.
I also have 3 tables directly in the Excel sheet, where users can enter data that then gets merged into the existing data model using Power Query.
I would like to be able to update the data model, thereby updating the content of my pivot tables and my calculations, without refreshing the actual data coming from my external server.
Is this possible without having to disable external data connections in the sheet? (I'd still like to periodically update that data.)
For clarification, I am building a KPI that will be measured monthly on data present on the 1st of every month, but will have to be analyzed, commented on, and have outliers handled throughout the month.
You've not mentioned VBA in your question, but going by the fact you've tagged your question as VBA, I'm guessing that's what you're using?
The VBA code to refresh a single query is:
Sheets("sheetName").ListObjects("tableName").Refresh ' the ListObject is named after the table the query loads to
If you're trying to do it manually, then it's just a question of selecting a cell within the table the query is pulling to, and then Query > Refresh.
I use MS Excel 2016 for data visualization.
I can understand that Extract means saving data onto an Excel spreadsheet, and that transforming data means manipulating it in Power Query.
QUESTION:
But if I decide to load data to Power Pivot (the Data Model), doesn't that fall back into Transform, because you can:
Create a Calendar Table
Create Measures (or Calculated Columns if necessary)
Or does using Power Pivot (the Data Model) fall under data modelling, because you are no longer formatting or merging pre-existing data; rather, you are creating new data (i.e. a Calendar Table, Measures, etc.) to merge with the pre-existing data?
Kindly clarify
Power Query (now standard in Excel 2016, on the Data tab) is an ETL (Extract - Transform - Load) tool. A standard example would be connecting it to your source ERP system to build a product table. That wouldn't be an exact copy of one table, but could be built out of several tables that are joined, keeping only the relevant columns.
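For illustration only, here is the same shaping expressed as T-SQL against a hypothetical ERP schema (Product, ProductGroup and Supplier are made-up names); in Power Query you would achieve the equivalent with Merge Queries steps and by removing columns:

-- Build a product table from several joined source tables,
-- keeping only the relevant columns.
SELECT
    p.ProductID,
    p.ProductName,
    g.GroupName    AS ProductGroup,
    s.SupplierName
FROM dbo.Product AS p
JOIN dbo.ProductGroup AS g ON g.GroupID    = p.GroupID
JOIN dbo.Supplier     AS s ON s.SupplierID = p.SupplierID;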
Power Pivot: this is a data modelling tool. It allows you to create relationships between fact and attribute tables, and it gives you the possibility to use time-related measures (YTD, Previous Year, ...).
In general, when you build your model in Power Pivot, you can choose to load the data directly into Power Pivot (without Power Query). This is useful if you already have a data warehouse in which the ETL process is done.
If you have an ETL process to execute, it's better to use Power Query and load the data into Power Pivot (option: Load to Data Model).
I am developing an SSAS Tabular project for Power BI. As part of the requirements, I need to automate the process below:
1. Every week, delete the last two weeks of data in the SSAS table.
2. Update the last two weeks of data.
Thanks in advance. Please advise.
For this, you can create an SSIS package and schedule it:
1. The SSIS package deletes the last two weeks of data.
2. A scheduled job then processes your SSAS cube.
SSAS Tabular, Power Pivot and Power BI don't provide facilities for a partial refresh, a sliding window, or any type of refresh other than a full data refresh (Power BI Premium does, but I'm assuming you're not using that).
You need to control the data getting into the data model by controlling the data in the source tables underlying the model.
This is commonly done using SSIS, T-SQL and/or stored procedures.
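As a rough sketch of the T-SQL/stored-procedure route (dbo.FactSales, LoadDate and staging.FactSales are assumed names, not from your model), the procedure below replaces the last two weeks in the source table; a scheduled job would then do a full process of the Tabular model:

-- Sketch: replace the last two weeks of data in the table that feeds
-- the Tabular model, then reprocess the model in a separate job step.
CREATE PROCEDURE dbo.ReloadLastTwoWeeks
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @CutOff date = DATEADD(DAY, -14, CAST(GETDATE() AS date));

    BEGIN TRANSACTION;

    -- 1. Delete the last two weeks from the fact table.
    DELETE FROM dbo.FactSales
    WHERE LoadDate >= @CutOff;

    -- 2. Re-insert the same window from staging, assumed to hold fresh data.
    INSERT INTO dbo.FactSales (SaleID, LoadDate, Amount)
    SELECT SaleID, LoadDate, Amount
    FROM staging.FactSales
    WHERE LoadDate >= @CutOff;

    COMMIT TRANSACTION;
END;

Scheduling this procedure as the first step of a SQL Agent job, with a cube-processing step after it, matches the weekly cadence described in the question.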