Power BI performance - Excel

I need some help with improving Power BI performance when reading some data.
I currently import data from an Excel sheet: a table with lots of different data types. I was wondering whether it is viable to change the data source, since it would have to be a one-man job.
Does Power BI have better performance when importing from another data source? I'm considering Access because of the simplicity of the change. Using a proper database like SQL Server is on the table, but it wouldn't be as easy to do in a short time.

I would suggest using SQL Server Analysis Services (SSAS). You can create all the metrics outside of Power BI and then import only the summarized statistics, down to the drill-down level you need.
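Whatever the upstream layer ends up being (SSAS, SQL Server or Access), the idea is the same: push the summarization out of the workbook so Power BI only imports the aggregated result. As a rough sketch in T-SQL, with invented table and column names:

-- Hypothetical summary view: Power BI imports this instead of the raw worksheet rows.
-- The grain (day / product / region) is the lowest level the report drills down to.
CREATE VIEW dbo.vw_SalesSummary AS
SELECT
    CAST(s.SaleDate AS date) AS SaleDate,
    s.ProductId,
    s.Region,
    SUM(s.Amount) AS TotalAmount,
    COUNT(*)      AS SaleCount
FROM dbo.Sales AS s
GROUP BY CAST(s.SaleDate AS date), s.ProductId, s.Region;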

Related

Precalculate OLAP cube inside Azure Synapse

We have a dimensional model with fact tables of 100-300 GB each, stored in Parquet. We build PBI reports on top of Azure Synapse (DirectQuery) and experience performance issues on slicing/dicing and especially on calculating multiple KPIs. At the same time, the data volume is pretty expensive to keep in Azure Analysis Services. Because of the number of dimensions, the fact table can't be aggregated significantly, so PBI import mode or a composite model isn't an option either.
Azure Synapse Analytics facilitates OLAP operations, like GROUP BY ROLLUP/CUBE/GROUPING SETS.
How can I benefit from Synapse's OLAP operations support?
Is it possible to pre-calculate OLAP cubes inside Synapse in order to boost PBI report performance? How?
If the answer is yes, is it recommended to pre-calculate KPIs? That would mean moving KPI definitions to the DWH OLAP cube level - is that an anti-pattern?
P.S. Using separate aggregations for each PBI visualisation is not an option; it's more an exception to the rule. Synapse is clever enough to benefit from a materialized view's aggregation even when querying the base table, but this way you can't implement RLS, and managing that number of materialized views also looks cumbersome.
Update for @NickW
Could you please answer the following sub-questions:
1. Have I got it right that OLAP operations support is mainly for downstream cube providers, not for warehouse performance?
2. Is filling the warehouse with materialized views in order to boost performance considered a common practice or an anti-pattern? I've found (see the link) that Power BI can create materialized views automatically based on query patterns. Still, I'm afraid it won't be able to provide a stable, testable solution, and there is the RLS concern again.
3. Is KPI pre-calculation on the warehouse side considered a common approach or an anti-pattern? As I understand it, this is usually done on the cube provider side, but what if I don't have one?
4. Do you see any other options to boost performance? I can only think of reducing query parallelism by using a PBI composite model and importing all dimensions into PBI. Not sure if it would help.
Synapse Result Set Caching and Materialized Views can both help.
In the future, the creation and maintenance of Materialized Views will be automated:
Azure Synapse will automatically create and manage materialized views for larger Power BI Premium datasets in DirectQuery mode. The materialized views will be based on usage and query patterns. They will be automatically maintained as a self-learning, self-optimizing system. Power BI queries to Azure Synapse in DirectQuery mode will automatically use the materialized views. This feature will provide enhanced performance and user concurrency.
https://learn.microsoft.com/en-us/power-platform-release-plan/2020wave2/power-bi/synapse-integration
Power BI Aggregations can also help. If there are a lot of dimensions, select the most commonly used ones to create aggregations.
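As a rough illustration of Result Set Caching and a Materialized View in a Synapse dedicated SQL pool (database and table names are hypothetical; result set caching is switched on per database from master, and a Synapse materialized view needs an aggregate plus COUNT_BIG(*)):

-- Run against master: enable result set caching for the dedicated SQL pool.
ALTER DATABASE [MyDedicatedPool] SET RESULT_SET_CACHING ON;

-- A materialized view that pre-aggregates the fact table at a coarser grain;
-- Synapse maintains it automatically as the base table changes.
CREATE MATERIALIZED VIEW dbo.mv_SalesByDateRegion
WITH (DISTRIBUTION = HASH(DateKey))
AS
SELECT
    DateKey,
    RegionKey,
    SUM(SalesAmount) AS SalesAmount,
    COUNT_BIG(*)     AS RowCnt
FROM dbo.FactSales
GROUP BY DateKey, RegionKey;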
To hopefully answer some of your questions...
You can't pre-calculate OLAP cubes in Synapse; the closest you could get is creating aggregate tables, and you've stated that this is not a viable solution.
OLAP operations can be used in queries but don't "pre-build" anything that can be used by other queries (ignoring CTEs, sub-queries, etc.). So if you have existing queries that don't use these functions, then re-writing them to use these functions might improve performance - but only for each specific query.
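For example (hypothetical table and column names), a GROUPING SETS query computes several aggregation levels in one statement, but the rolled-up rows exist only in that result set; nothing is persisted for other queries to reuse:

-- Returns region/product totals, region totals and a grand total in one pass,
-- but only for this query.
SELECT
    RegionKey,
    ProductKey,
    SUM(SalesAmount) AS SalesAmount
FROM dbo.FactSales
GROUP BY GROUPING SETS
(
    (RegionKey, ProductKey),
    (RegionKey),
    ()
);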
I realise that your question was about OLAP but the underlying issue is obviously performance. Given that OLAP is unlikely to be a solution to your performance issues, I'd be happy to talk about performance tuning if you want?
Update 1 - Answers to additional numbered questions
1. I'm not entirely sure I understand the question, so this may not be an answer: the OLAP functions are there so that it is possible to write queries that use them. There can be an infinite number of reasons why people might need to write queries that use these functions.
2. Performance is the main (only?) reason for creating materialised views. They are very effective for creating datasets that will be used frequently, e.g. when base data is at day level but lots of reports are aggregated at week/month level. As stated by another user in the comments, Synapse can manage this process automatically, but whether it can actually create aggregates that are useful for a significant proportion of your queries is obviously entirely dependent on your particular circumstances.
3. KPI pre-calculation. In a DW, any measures that can be calculated in advance should be (by your ETL/ELT process). For example, if you have reports that use Net Sales Amount (Gross Sales - Tax) and your source system only provides Gross Sales and Tax amounts, then you should be calculating Net Sales as a measure when loading your fact table (see the sketch after this list). Obviously there are KPIs that can't be calculated in advance (e.g. probably anything involving averages) and these need to be defined in your BI tool.
4. Boosting performance: I'll cover this in the next section as it is a longer topic.
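To illustrate point 3, a minimal ELT sketch (the staging and fact tables are hypothetical):

-- Derive Net Sales once, while loading the fact table, so every report
-- reads the same pre-calculated measure.
INSERT INTO dbo.FactSales (DateKey, ProductKey, GrossSalesAmount, TaxAmount, NetSalesAmount)
SELECT
    s.DateKey,
    s.ProductKey,
    s.GrossSalesAmount,
    s.TaxAmount,
    s.GrossSalesAmount - s.TaxAmount AS NetSalesAmount
FROM staging.Sales AS s;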
Boosting Performance
Performance tuning is a massive subject - some areas are generic and some will be specific to your infrastructure; this is not going to be a comprehensive review but will highlight a few areas you might need to consider.
Bear in mind a couple of things:
There is always an absolute limit on performance - based on your infrastructure - so even in a perfectly tuned system there is always going to be a limit that may not be what you hoped to achieve. However, with modern cloud infrastructure the chances of you hitting this limit are very low
Performance costs money. If all you can afford is a Mini then regardless of how well you tune it, it is never going to be as fast as a Ferrari
Given these caveats, a few things you can look at:
Query plan. Have a look at how your queries are executing and whether there are any obvious bottlenecks you can then focus on. This link gives some further information: Monitor SQL Workloads. (A few example commands for this and the following points are sketched after this list.)
Scale up your Synapse SQL pool. If you throw more resources at your queries they will run quicker. Obviously this is a bit of a "blunt instrument" approach but worth trying once other tuning activities have been tried. If this does turn out to give you acceptable performance you'd need to decide if it is worth the additional cost. Scale Compute
Ensure your statistics are up to date
Check if the distribution mechanism (Round Robin, Hash) you've used for each table is still appropriate and, on a related topic, check the skew on each table
Indexing. Adding appropriate indexes will speed up your queries though they also have a storage implication and will slow down data loads. This article is a reasonable starting point when looking at your indexing: Synapse Table Indexing
Materialised Views. Covered previously but worth investigating. I think the automatic management of MVs may not be out yet (or is only in public preview) but may be something to consider down the line
Data Model. If you have some fairly generic facts and dimensions that support a lot of queries then you might need to look at creating additional facts/dimensions just to support specific reports. I would always (if possible) derive them from existing facts/dimensions but you can create new tables by dropping unused SKs from facts, reducing data volumes, sub-setting the columns in tables, combining tables, etc.
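Several of the points above (monitoring, scaling, statistics, skew, indexing) map onto concrete commands in a Synapse dedicated SQL pool. A hedged sketch, with hypothetical database and table names:

-- Monitoring: find the longest-running requests.
SELECT request_id, status, command, total_elapsed_time
FROM sys.dm_pdw_exec_requests
ORDER BY total_elapsed_time DESC;

-- Scale up the dedicated SQL pool (run against master).
ALTER DATABASE [MyDedicatedPool] MODIFY (SERVICE_OBJECTIVE = 'DW500c');

-- Keep statistics up to date on a large fact table.
UPDATE STATISTICS dbo.FactSales;

-- Check row counts per distribution to spot skew.
DBCC PDW_SHOWSPACEUSED('dbo.FactSales');

-- Indexing: add a clustered columnstore index if the table was created as a heap.
CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales ON dbo.FactSales;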
Hopefully this gives you at least a starting point for investigating your performance issues.

Issues while building Tabular Data Model from Vertica

I have been assigned a new project where I need to prepare a Power BI report using Azure Analysis Services (data mart). The flow is: data from Vertica DW -> Azure Analysis Services (via a tabular model) -> Power BI. I am pretty much new to the Tabular Model and Vertica.
Scenario:
1) The DW is in Vertica Platform online.
2) I am trying to build a data model using Analysis Services Tabular Project in VS 2019
3) This model will be deployed on Azure which will act as data source to PowerBI
4) I cannot select individual tables directly (from Vertica) while performing "Import from Data Source". I have to use a view here.
5) I have been given a single big table with around 30 columns as a source from Vertica
Concerns:
1) While importing data from Vertica, there is no option to "Transform" it as we have in the Power BI Query Editor when importing data. However, when I tried to import a local file, I could find this option.
2) With reference to Scenario #5, how can I split the big table into various dimensions in Model.bim? Currently, I am adding them as calculated tables. Is this the optimal way, or can you suggest something better?
Also, is there any good online material where I can get my hands dirty with modeling in an Analysis Services Tabular Project? (I can do it very well in Power BI.)
Thanks in advance
Regards
My personal suggestion is to avoid Visual Studio like the plague. Unfortunately, it is not only unhelpful here but actively works against you.
Instead, use Tabular Editor. From there you can easily work with the Tabular Model.
My personal suggestion is to avoid using calculated tables as dimensions; instead, create several tables in Tabular Editor and simply modify the source query / fields.
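For example (the view and column names are hypothetical), each dimension becomes its own table whose source query selects a distinct subset of the wide Vertica view, while the fact table keeps only keys and measures:

-- A Product dimension carved out of the single wide source view.
SELECT DISTINCT ProductKey, ProductName, ProductCategory
FROM public.vw_big_source;

-- The slimmed-down fact query: keys and measures only.
SELECT OrderDate, ProductKey, CustomerKey, SalesAmount, Quantity
FROM public.vw_big_source;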
In reference to the 1st question, I believe there is some bug when connecting Vertica with Power BI; it works perfectly elsewhere, just not with this combination.
For #2, I can choose "Import new tables" from the connected data source. It can be found under the Tabular Editor view.

Access Excel Data Model (Power Query) tables from ODBC

We can access Excel data using ODBC (the Excel ODBC driver). Can we also access the data in the data model (i.e. Power Query tables)? Basically, I am thinking about (mis)using Excel/Power Query as a database and letting an external application retrieve data from it (using SQL).
To read from Sheet1 I can do:
SELECT ... FROM [Sheet1$]
but
SELECT ... FROM [table in data model]
does not seem to work for me. Is this supposed to work or is this not supported at all?
There is a ton of information about Power Query using ODBC to import data. Here I am looking at the other way around.
You should distinguish between Power Query tables and Data Model (Power Pivot) tables. You can set up some PQ tables to be loaded into the Data Model, so data will be "transferred" from PQ to the DM only for those particular tables.
I'm pretty sure that it is impossible to get data from "PQ only" tables. You can only get the M queries (not their results), via VBA or by unpacking the Excel file.
Regarding PP (DM) tables: there is actually an Analysis Services (VertiPaq) engine inside Excel (and, just in case, inside Power BI Desktop as well). So as soon as you start Excel or PBI, you actually start an AS engine instance too. The data in it is reachable via:
Excel VBA (Visual Basic for Applications). You have the Thisworkbook.Model.DataModelConnection.* API and can get to the data itself as well as to the model. This is the only "official" way to get the data programmatically.
Power Query - connecting to the embedded Analysis Services instance as a data source. This is an unofficial way, but I have read that Microsoft said they are not going to close it in the future (but you never know :-)). E.g. DAX Studio can do that - https://www.sqlbi.com/tools/dax-studio/.
Unfortunately, while getting to the PBI AS instance is quite easy, I don't know how to get to the Excel AS instance without DAX Studio. As far as I understand, the main problem is how to get the port number of the AS instance launched by Excel. But I hope this info will at least help you understand the direction for further searching, if you want to go the Power Query way. Or maybe it is reasonable to use Power BI Desktop for the task.
An Excel file is just a zip archive, so the AS files are definitely inside it. I have never gone this way, but you can look at what is inside the Excel zip - possibly the AS files are there in some useful form.

Dynamic SQL versus using the model

We started using COGNOS about 3 years ago. We have used COGNOS 8 and are now on COGNOS 10. We are constantly being told that using dynamic SQL queries instead of the COGNOS model is extremely bad, in that it causes performance issues, and that it is not recommended by IBM. We have never had a problem that was specific to dynamic SQL, and those reports perform just as well as reports that use the model.
Are there any performance issues or drawbacks that are specific to dynamic SQL queries? Is it really recommended by IBM that they not be used?
I understand that the model is great for ad-hoc reporting and for users who do not know SQL. But for developers, dynamic SQL seems to be a better option, especially if they do not have any control over the COGNOS model. (We have to request and document any needed changes to the model.)
Appreciate your comments/feedback.
Manually building your queries with dynamic SQL may be worse for many reasons (extensibility, maintainability, reusability), but performance-wise it is only limited by your own SQL query-writing abilities. This means in some cases it will be faster than using the Cognos model. There are no speed disadvantages to using dynamic SQL.
That being said, you are missing a lot of the benefits of Cognos if you are not leveraging the model. Your ability to maintain consistency, make broad changes without rewriting reports, and quickly produce new reports will be severely diminished with dynamic SQL.
If your environment is small, dynamic SQL may meet your needs, especially for odd one-off reports that use tables and relationships that have little to do with your other reports. And if there is a specific way you want to force indexes to be used, this may be achievable with dynamic SQL.
Edit: It is important to note that criteria established in Report Studio filters will not be passed into your dynamic SQL queries until after the data has been retrieved. For large data sets this can be extremely inefficient. In order to pass criteria into your dynamic SQL from your prompts, use #prompt('yourPromptVariableNamehere')# or #promptmany('yourMultiSelectPromptVariablehere')#.
A rule of thumb is this: run your dynamic SQL query outside of Cognos and see how much data is being returned. If you have a giant sales query that at a minimum needs to be filtered on date or branch, put a prompt on the prompt page to force the user to select a specific date/period/date range/branch/etc., and add the criteria into your dynamic SQL statement with the prompt/promptmany syntax. Prompts can still be used as regular filters inside your Report Studio queries, but all of that criteria is applied AFTER the result set is returned from the database if you are using dynamic queries without prompt/promptmany.
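As a hedged illustration (the table, column and prompt names are hypothetical), the prompt macros sit directly inside the dynamic SQL, so the filtering happens in the database rather than after the full result set has been retrieved:

-- Dynamic SQL with Cognos prompt macros: the database only returns rows
-- for the branch(es) and date range the user picked on the prompt page.
SELECT
    s.branch_id,
    s.sale_date,
    SUM(s.sale_amount) AS total_sales
FROM sales s
WHERE s.sale_date BETWEEN #prompt('FromDate', 'date')# AND #prompt('ToDate', 'date')#
  AND s.branch_id IN (#promptmany('BranchIds', 'integer')#)
GROUP BY s.branch_id, s.sale_date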
When it comes to performance, once you introduce dynamic SQL it won't be able to use the caching abilities that Cognos offers (system-wise).
On the other hand, it's obvious that you can tune the SQL better than the machine can.
I wouldn't say dynamic SQL can cause performance issues in general.
IBM doesn't recommend dynamic SQL because only with a proper model, built with Framework Manager, can you use all the features of Cognos.

Are SharePoint and Analysis Services required for the BI Semantic Model?

I'm new to Report Models. I was planning to test them on our new SSRS 2012 and I just found out Microsoft has already deprecated this feature.
On further reading, it was replaced by the BI Semantic Model. Long story short, I can't seem to confirm whether we need to set up SharePoint for this to work.
1.) Are SharePoint services required for the BISM to work?
2.) Do we also need Analysis Services for BISM to work?
thanks
Perhaps this white paper will help you.
BISM refers to a couple of related technologies/tools, so the answers to your questions are not a simple yes/no. As usual, it depends...
BI Semantic models can be in the form of individual Power Pivot models inside of Excel, shared in the same manner you would share any Excel file. They can also refer to Power Pivot models inside of SharePoint. Or they can refer to an SSAS Tabular model, which can be consumed with Power View inside of SharePoint or Power View in Excel (or just base Excel, or for that matter SSRS).
So if you are using Power Pivot in SharePoint or using Power View in SharePoint, then you will need SharePoint services. If you are going to use SSAS Tabular, you will need SSAS. If you are using Power Pivot in SharePoint, you need to install SSAS for Power Pivot and configure Power Pivot.
