Dynamic SQL versus using the model - Cognos

We started using Cognos about 3 years ago. We used Cognos 8 and are now on Cognos 10. We are constantly being told that using dynamic SQL queries instead of the Cognos model is extremely bad, in that it causes performance issues, and that it is not recommended by IBM. We have never had a problem that was specific to dynamic SQL, and such reports perform just as well as reports that use the model.
Are there any performance issues or drawbacks that are specific to dynamic SQL queries? Is it really recommended by IBM that they not be used?
I understand that the model is great for ad-hoc reporting and for users who do not know SQL. But for developers, dynamic SQL seems to be a better option, especially if they do not have any control over the Cognos model. (We have to request and document any needed changes to the model.)
Appreciate your comments/feedback.

Manually building your queries with dynamic SQL may be worse for many reasons (extensibility, maintainability, reusability), but performance-wise it is limited only by your own SQL query-writing abilities. This means in some cases it will be faster than using the Cognos model. There are no speed disadvantages to using dynamic SQL.
That being said, you are missing a lot of the benefits of Cognos if you are not leveraging the model. Your ability to maintain consistency, make broad changes without rewriting reports, and quickly produce new reports will be severely diminished with dynamic SQL.
If your environment is small, dynamic SQL may meet your needs. This is especially true for odd one-off reports that use tables and relationships that have little to do with your other reports, or when there is a specific way you want to force indexes to be used, which may be achievable with dynamic SQL.
Edit: It is important to note that criteria established in Report Studio filters will not be pushed into your dynamic SQL queries; they are applied only after the data has been retrieved. For large data sets this can be extremely inefficient. In order to pass criteria into your dynamic SQL from your prompts, use #prompt('yourPromptVariableNamehere')# or #promptmany('yourMultiSelectPromptVariablehere')#. A rule of thumb: run your dynamic SQL query outside of Cognos and see how much data is returned. If you have a giant sales query that at a minimum needs to be filtered on date or branch, put a prompt on the prompt page to force the user to select a specific date/period/date range/branch/etc., and add the criteria to your dynamic SQL statement with the prompt/promptmany syntax. Prompts can still be used as regular filters inside your Report Studio queries, but all of that criteria is applied AFTER the result set is returned from the database if you are using dynamic queries without prompt/promptmany.
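As a minimal sketch of that pattern (the table, column, and parameter names here are hypothetical), the prompt macros are embedded directly in the dynamic SQL so that the database only returns the filtered rows:

    -- Prompt values are substituted into the SQL before it is sent to the database,
    -- so the date and branch filters are applied on the database side.
    SELECT s.branch_id,
           s.sale_date,
           s.net_amount
    FROM   sales s
    WHERE  s.sale_date >= #prompt('pStartDate', 'date')#
      AND  s.branch_id IN (#promptmany('pBranchList', 'integer')#)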

When it comes to performance, once you introduce dynamic SQL you can no longer use the caching abilities that Cognos offers at the system level.
On the other hand, it's obvious that you can often tune the SQL better than the machine can.
I wouldn't say dynamic SQL can cause performance issues in general.
IBM doesn't recommend dynamic SQL because only with a proper model, built with Framework Manager, can you use all the features of Cognos.

Precalculate OLAP cube inside Azure Synapse

We have a dimensional model with fact tables of 100-300 GB each in Parquet. We build PBI reports on top of Azure Synapse (DirectQuery) and experience performance issues when slicing/dicing and especially when calculating multiple KPIs. At the same time, the data volume is too expensive to keep in Azure Analysis Services. Because of the number of dimensions, the fact table can't be aggregated significantly, so PBI import mode or a composite model isn't an option either.
Azure Synapse Analytics facilitates OLAP operations, like GROUP BY ROLLUP/CUBE/GROUPING SETS.
How can I benefit from Synapse's OLAP operations support?
Is it possible to pre-calculate OLAP cubes inside Synapse in order to boost PBI report performance? How?
If the answer is yes, is it recommended to pre-calculate KPIs? That would mean moving the KPI definitions to the DWH OLAP cube level - is that an anti-pattern?
P.S. Using separate aggregations for each PBI visualisation is not an option; that's more an exception to the rule. Synapse is clever enough to benefit from a materialized view aggregation even when querying the base table, but this way you can't implement RLS, and managing that number of materialized views also looks cumbersome.
Update for @NickW
Could you please answer the following sub-questions:
Have I got it right that OLAP operations support is mainly for downstream cube providers, not for warehouse performance?
Is populating the warehouse with materialized views in order to boost performance considered a common practice or an anti-pattern? I've found (see the link) that Power BI can create materialized views automatically based on query patterns. Still, I'm afraid it won't provide a stable, testable solution, and there is the RLS support issue again.
Is KPI pre-calculation on the warehouse side considered a common approach or an anti-pattern? As I understand it, this is usually done on the cube provider side, but what if I don't have one?
Do you see any other options to boost the performance? I can only think of reducing query parallelism by using a PBI composite model and importing all dimensions into PBI. Not sure if it'd help.
Synapse Result Set Caching and Materialized Views can both help.
In the future, the creation and maintenance of Materialized Views will be automated.
Azure Synapse will automatically create and manage materialized views
for larger Power BI Premium datasets in DirectQuery mode. The
materialized views will be based on usage and query patterns. They
will be automatically maintained as a self-learning, self-optimizing
system. Power BI queries to Azure Synapse in DirectQuery mode will
automatically use the materialized views. This feature will provide
enhanced performance and user concurrency.
https://learn.microsoft.com/en-us/power-platform-release-plan/2020wave2/power-bi/synapse-integration
Power BI Aggregations can also help. If there are a lot of dimensions, select the most commonly used to create aggregations.
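As a minimal sketch of the result set caching point above (the database name is hypothetical), caching is enabled once per dedicated SQL pool from the master database and can then be toggled per session:

    -- Run from the master database of the Synapse workspace.
    ALTER DATABASE MySynapseDW
    SET RESULT_SET_CACHING ON;

    -- Optionally disable it for an individual session.
    SET RESULT_SET_CACHING OFF;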
To hopefully answer some of your questions...
You can't pre-calculate OLAP cubes in Synapse; the closest you could get is creating aggregate tables, and you've stated that this is not a viable solution.
OLAP operations can be used in queries but don't "pre-build" anything that can be used by other queries (ignoring CTEs, sub-queries, etc.). So if you have existing queries that don't use these functions, then re-writing them to use these functions might improve performance - but only for each specific query (see the sketch below).
I realise that your question was about OLAP but the underlying issue is obviously performance. Given that OLAP is unlikely to be a solution to your performance issues, I'd be happy to talk about performance tuning if you want?
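For illustration, a minimal GROUPING SETS sketch (the fact and dimension names are hypothetical); it computes several aggregation levels in one pass, but only for this one query:

    SELECT d.calendar_year,
           d.calendar_month,
           st.region,
           SUM(f.sales_amount) AS sales_amount
    FROM   dbo.FactSales f
    JOIN   dbo.DimDate   d  ON d.date_key   = f.date_key
    JOIN   dbo.DimStore  st ON st.store_key = f.store_key
    GROUP BY GROUPING SETS
    (
        (d.calendar_year, d.calendar_month, st.region),  -- detail level
        (d.calendar_year, st.region),                    -- year per region
        (d.calendar_year),                               -- year total
        ()                                               -- grand total
    );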
Update 1 - Answers to additional numbered questions
I'm not entirely sure I understand the question, so this may not be an answer: the OLAP functions are there so that it is possible to write queries that use them. There can be an infinite number of reasons why people might need to write queries that use these functions.
Performance is the main (only?) reason for creating materialised views. They are very effective for creating datasets that will be used frequently i.e. when base data is at day level but lots of reports are aggregated at week/month level. As stated by another user in the comments, Synapse can manage this process automatically but whether it can actually create aggregates that are useful for a significant proportion of your queries is obviously entirely dependent on your particular circumstances.
KPI pre-calculation. In a DW, any measures that can be calculated in advance should be (by your ETL/ELT process). For example, if you have reports that use Net Sales Amount (Gross Sales - Tax) and your source system is only providing Gross Sales and Tax amounts, then you should be calculating Net Sales as a measure when loading your fact table (a sketch follows this list). Obviously there are KPIs that can't be calculated in advance (e.g. probably anything involving averages) and these need to be defined in your BI tool.
Boosting Performance: I'll cover this in the next section as it is a longer topic
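To make the KPI pre-calculation point concrete, a minimal sketch of deriving the measure during the fact load (the staging and fact table names are hypothetical):

    -- Derive Net Sales once, at load time, so every report reads the same
    -- pre-calculated measure instead of repeating the formula.
    INSERT INTO dbo.FactSales
        (date_key, store_key, gross_sales_amount, tax_amount, net_sales_amount)
    SELECT s.date_key,
           s.store_key,
           s.gross_sales_amount,
           s.tax_amount,
           s.gross_sales_amount - s.tax_amount AS net_sales_amount
    FROM   stg.Sales s;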
Boosting Performance
Performance tuning is a massive subject - some areas are generic and some will be specific to your infrastructure; this is not going to be a comprehensive review but will highlight a few areas you might need to consider.
Bear in mind a couple of things:
There is always an absolute limit on performance - based on your infrastructure - so even in a perfectly tuned system there is always going to be a limit that may not be what you hoped to achieve. However, with modern cloud infrastructure the chances of you hitting this limit are very low
Performance costs money. If all you can afford is a Mini then regardless of how well you tune it, it is never going to be as fast as a Ferrari
Given these caveats, a few things you can look at:
Query plan. Have a look at how your queries are executing and whether there are any obvious bottlenecks you can then focus on. This link gives some further information: Monitor SQL Workloads
Scale up your Synapse SQL pool. If you throw more resources at your queries they will run quicker. Obviously this is a bit of a "blunt instrument" approach but worth trying once other tuning activities have been tried. If this does turn out to give you acceptable performance you'd need to decide if it is worth the additional cost. Scale Compute
Ensure your statistics are up to date (see the sketch after this list)
Check whether the distribution mechanism (Round Robin, Hash) you've used for each table is still appropriate and, on a related topic, check the skew on each table (also covered in the sketch after this list)
Indexing. Adding appropriate indexes will speed up your queries though they also have a storage implication and will slow down data loads. This article is a reasonable starting point when looking at your indexing: Synapse Table Indexing
Materialised Views. Covered previously but worth investigating; the sketch after this list includes an example. I think the automatic management of MVs may not be out yet (or is only in public preview) but may be something to consider down the line
Data Model. If you have some fairly generic facts and dimensions that support a lot of queries then you might need to look at creating additional facts/dimensions just to support specific reports. I would always (if possible) derive them from existing facts/dimensions but you can create new tables by dropping unused SKs from facts, reducing data volumes, sub-setting the columns in tables, combining tables, etc.
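To make the statistics, distribution, and materialised view points concrete, here is a minimal sketch against a dedicated SQL pool (all table and column names are hypothetical; check the current Synapse restrictions on materialized view definitions before relying on the last statement):

    -- Refresh optimiser statistics on a large fact table.
    UPDATE STATISTICS dbo.FactSales;

    -- Check how rows are spread across the 60 distributions (skew check).
    DBCC PDW_SHOWSPACEUSED('dbo.FactSales');

    -- If the table is skewed or badly distributed, rebuild it with CTAS
    -- using a higher-cardinality hash column.
    CREATE TABLE dbo.FactSales_new
    WITH
    (
        DISTRIBUTION = HASH(customer_key),
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT * FROM dbo.FactSales;

    -- A materialised view that pre-aggregates the transaction-level fact
    -- to day/store level, so frequent aggregate queries can use the view.
    CREATE MATERIALIZED VIEW dbo.mvSalesByDayStore
    WITH (DISTRIBUTION = HASH(store_key))
    AS
    SELECT f.date_key,
           f.store_key,
           SUM(f.net_sales_amount) AS net_sales_amount,
           COUNT_BIG(*)            AS row_count
    FROM   dbo.FactSales f
    GROUP BY f.date_key, f.store_key;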
Hopefully this gives you at least a starting point for investigating your performance issues.

JOOQ vs SQL Queries

I am looking at jOOQ queries now... I feel that plain SQL queries look more readable and maintainable, so why do we need to use jOOQ instead of native SQL queries?
Can someone explain a few reasons for using it?
Thanks.
Here are the top value propositions that you will never get with native (string based) SQL:
Dynamic SQL is what jOOQ is really really good at. You can compose the most complex queries dynamically based on user input, configuration, etc. and still be sure that the query will run correctly.
An often underestimated effect of dynamic SQL is the fact that you will be able to think of SQL as an algebra, because instead of writing difficult-to-compose native SQL syntax (with all the keywords, and weird parenthesis rules, etc.), you can think in terms of expression trees, because you're effectively building an expression tree for your queries. Not only will this allow you to implement more sophisticated features, such as SQL transformation for multi-tenancy or row-level security, but it also simplifies everyday things like transforming a set of values into a SQL set operation.
Vendor agnosticity. As soon as you have to support more than one SQL dialect, writing SQL manually is close to impossible because of the many subtle differences between dialects. The jOOQ documentation illustrates this e.g. with the LIMIT clause (see the sketch after this list). Once this is a problem you have, you have to use either JPA (a much more restricted query language: JPQL) or jOOQ (almost no limitations with respect to SQL usage).
Type safety. Now, you will get type safety when you write views and stored procedures as well, but very often, you want to run ad-hoc queries from Java, and there is no guarantee about table names, column names, column data types, or syntax correctness when you do SQL in a string based fashion, e.g. using JDBC or JdbcTemplate, etc. By the way: jOOQ encourages you to use as many views and stored procedures as you want. They fit perfectly in the jOOQ paradigm.
Code generation. Which leads to more type safety. Your database schema becomes part of your client code. Your client code no longer compiles when your queries are incorrect. Imagine someone renaming a column and forgetting to refactor the 20 queries that use it. IDEs only provide some degree of safety when writing the query for the first time, they don't help you when you refactor your schema. With jOOQ, your build fails and you can fix the problem long before you go into production.
Documentation. The generated code also acts as documentation for your schema. Comments on your tables and columns turn into Javadoc, which you can introspect in your client language, without the need to look them up on the server.
Data type bindings are very easy with jOOQ. Imagine using a library of 100s of stored procedures. Not only will you be able to access them type safely (through code generation), as if they were actual Java code, but you don't have to worry about the tedious and useless activity of binding each single in and out parameter to a type and value.
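As a rough illustration of the dialect problem mentioned above (the table name is hypothetical), here is the same "first 10 rows" query written for a few different databases - exactly the kind of difference jOOQ abstracts away:

    -- MySQL / PostgreSQL / SQLite
    SELECT * FROM book ORDER BY id LIMIT 10;

    -- SQL Server
    SELECT TOP 10 * FROM book ORDER BY id;

    -- Standard SQL / Oracle 12c+ / DB2
    SELECT * FROM book ORDER BY id FETCH FIRST 10 ROWS ONLY;

    -- Oracle 11g and older
    SELECT * FROM (SELECT * FROM book ORDER BY id) WHERE ROWNUM <= 10;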
There are a ton of more advanced features derived from the above, such as:
The availability of a parser and by consequence the possibility of translating SQL.
Schema management tools, such as diffing two schema versions
Basic ActiveRecord support, including some nice things like optimistic locking.
Synthetic SQL features like type safe implicit JOIN
Query By Example.
A nice integration in Java streams or reactive streams.
Some more advanced SQL transformations (this is work in progress).
Export and import functionality
Simple JDBC mocking functionality, including a file based database mock.
Diagnostics
And, if you occasionally think something is much simpler to do with plain native SQL, then just:
Use plain native SQL, also in jOOQ
Disclaimer: As I work for the vendor, I'm obviously biased.

Is Cassandra just a storage engine?

I've been evaluating Cassandra to replace MySQL in our microservices environment, due to MySQL being the only portion of the infrastructure that is not distributed. Our needs are both write and read intensive as it's a platform for exchanging raw data. A type of "bus" for lack of better description. Our selects are fairly simple and should remain that way, but I'm already struggling to get past some basic filtering due to the extreme limitations of select queries.
For example, if I need to filter data it has to be in the key. At that point I can't change data in the fields because they're part of the key. I can use a SASI index but then I hit a wall if I need to filter by more than one field. The hope was that materialized views would help with this but in another post I was told to avoid them, due to some instability and problematic behavior.
It would seem that Cassandra is good at storage but realistically, not good as a standalone database platform for non-trivial applications beyond very basic filtering (i.e. a single field.) I'm guessing I'll have to accept the use of another front-end like Elastic, Solr, etc. The other option might be to accept the idea of filtering data within application logic, which is do-able, as long as the data sets coming back remain small enough.
Apache Cassandra is far more than just a storage engine. It is designed as a distributed database oriented towards high availability and partition tolerance, which can limit query capability if you want good and reliable performance.
It has a query language, CQL, which is quite powerful, but it is deliberately limited in ways that guide users towards effective queries. In order to use it effectively you need to model your tables around your queries.
More often than not, you need to query your data in multiple ways, so users will often denormalize their data into multiple tables. Materialized views aim to make that user experience better, but they have had their share of bugs and limitations, as you indicated. At this point, if you consider using them, you should be aware of their limitations - although that is generally a good idea when evaluating anything.
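As a hedged illustration of that query-driven modelling (the table and column names here are hypothetical), the same sensor readings might be denormalised into two tables, one per query pattern, so that every query filters only on key columns:

    -- Query pattern 1: readings for one device over a time range.
    CREATE TABLE readings_by_device (
        device_id   uuid,
        reading_ts  timestamp,
        value       double,
        PRIMARY KEY ((device_id), reading_ts)
    ) WITH CLUSTERING ORDER BY (reading_ts DESC);

    -- Query pattern 2: readings for one day across all devices,
    -- partitioned by day to keep partition sizes bounded.
    CREATE TABLE readings_by_day (
        reading_date  date,
        reading_ts    timestamp,
        device_id     uuid,
        value         double,
        PRIMARY KEY ((reading_date), reading_ts, device_id)
    ) WITH CLUSTERING ORDER BY (reading_ts DESC, device_id ASC);

    -- Each query then hits exactly one table and filters only on key columns:
    -- SELECT * FROM readings_by_device WHERE device_id = ? AND reading_ts >= ?;
    -- SELECT * FROM readings_by_day    WHERE reading_date = ?;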
If you need advanced querying capabilities or do not have an ahead of time knowledge of what the queries will be, Cassandra may not be a good fit. You can build these capabilities using products like Spark and Solr on top of Cassandra (such as what DataStax Enterprise does), but it may be difficult to achieve using Cassandra alone.
On the other hand there are many use cases where Cassandra is a great fit, such as messaging, personalization, sensor data, and so on.

Using Cognos 10.1 which is better an Inner Join or an "IN" Filter?

I'm using Cognos 10.1 and I have a report that uses two queries each with the same primary key.
Query 1: UniqueIds
Query 2: DetailedInfo
I'm not sure how to tell whether it's better to build the report using the DetailedInfo query with a filter that says PrimaryKey in (UniqueIds.PrimaryKey), or whether I should create a third query that joins UniqueIds to DetailedInfo on PrimaryKey.
I'm new to Cognos and I'm learning to think differently. Using Microsoft SQL Server I'd just use an inner join.
So my question is, in Cognos 10.1 which way is better, and how can I tell what the performance differences are?
You'd better start from the beginning.
Your queries (I hope query subjects) should be joined in Framework Manager, in the model. Then you can easily filter the second query by applying filters to the first.
Joins in Report Studio are a last resort.
The report writer's ultimate weapon is a well-indexed data warehouse with a solid framework model built on top.
You want all of your filtering and joining to happen on the database side as much as possible. If not, then large data sets are brought over to the Cognos server before they are joined and filtered by Cognos.
The more work that happens on the database, the faster your reports will be. By building your reports in certain ways, you can mitigate Cognos side processing, and promote database side processing.
The first and best way to do this is with a good Framework Model, as Alexey pointed out. This will allow your reports to be simpler, and pushes most of the work to the database.
However a good model still exposes table keys to report authors so that they can have the flexibility to create unique data sets. Not every report warrants a new Star Schema, and sometimes you want to join the results of queries against two different Star Schema sources.
When using a join or a filter, Cognos attempts to push all of the work to the database as a default. It wants to have the final data set sent to it, and nothing else.
However, when creating your filters, you have two ways of referring to data... with explicit names that refer to modeled data sources (e.g. [Presentation View].[Sales].[Sales Detail].[Net Profit]) or by referring to a column in the current data set (such as [Net Profit]). Using explicit columns from the model will help ensure the filters are applied at the database.
Sometimes that is not possible, such as with a calculated column. For example, if you don't have Net Profit in your database or within your model, you may establish it with a calculated column. If you filter on [Net Profit] > 1000, Cognos will pull the data set into Cognos before applying your filter. Your final result will be the same, but depending on the size of the data before and after the filter is applied, you could see a performance decrease.
It is possible to have nested queries within your report, and Cognos will generate a single giant SQL statement for the highest-level query, which includes sub-queries for all the lower-level data. You can generate the SQL/MDX in order to see how Cognos is building the queries.
Also, try experimenting. Save your report with a new name, try it one way and time it. Run it a few times and take an average execution speed. Time it again with the alternate method and compare.
With smaller data sets, you are unlikely to see any difference. The larger your data set gets, the bigger the difference your chosen method will make to report speed.
Use joins to merge two queries together so that columns from both queries can be used in the report. Use IN() syntax if your only desire is to filter one query using the existence of corresponding rows in a second. That said, there are likely to be many cases that both methods will be equally performant, depending on the number of rows involved, indexes etc.
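As a rough sketch of the SQL Cognos aims to push to the database in each case (the table and column names are hypothetical), a join carries columns from both queries through to the report, while an IN filter only tests for the existence of matching keys:

    -- Join: columns from both queries are available to the report.
    SELECT d.primary_key, d.detail_col, u.flag_col
    FROM   detailed_info d
    JOIN   unique_ids    u ON u.primary_key = d.primary_key;

    -- IN filter: DetailedInfo is restricted to keys that exist in UniqueIds.
    SELECT d.primary_key, d.detail_col
    FROM   detailed_info d
    WHERE  d.primary_key IN (SELECT u.primary_key FROM unique_ids u);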
By the way, within a report Cognos only supports joins and unions between different queries. You can reference other queries directly in filters even without an established relationship but I've seen quirks with this, like it works when run interactively but not scheduled or exported. I would avoid doing this in reports.

Can I query SAP BO WEBI via Excel VBA? Can I do it fast enough?

Following up on my previous post, I need to be able to query a database of 6M+ rows in the fastest way possible, so that this DB can be effectively used as a "remote" data source for a dynamic Excel report.
Like I said, normally I would store the data I need on a separate (perhaps hidden) worksheet and I would manipulate it through a second "control" sheet. This time, the size (i.e. number of rows) of my database prevents me from doing so (as you all know, Excel cannot handle much more than 1M rows - 1,048,576 to be exact).
The solution my IT guy put in place consists of holding the data in a txt file inside a network folder. So far, I have managed to query this file through ADO (slow, but no maintenance needed) or to use it as a source to populate an indexed Access table, which I can then query (faster, but requires more maintenance & additional software).
I feel both solutions, although viable, are sub-optimal. Plus, it seems to me that all of this is an unnecessary overcomplication. The txt file is actually an export from SAP BO, which the IT guy has access to through WEBI. Now, can't I just query the BO database through WEBI myself in a "dynamic" kind of way?
What I'm trying to say is, why can't I extract only bits of information at a time, on a need-to-know basis and directly from the primary source, instead of having all of the data transfered in bulk on a secondary/duplicate database?
Is this sort of "dynamic" query even possible? Or will the processing times hinder the success of my approach? I need the whole thing to feel really instantaneous, as if the data were already there and I'm not actually retrieving it each time.
And most of all, can I do this through VBA? Unfortunately, that's the only thing I will have access to; I can't do this on the BO side.
I'd like to thank you guys in advance for whatever help you can grant me!
Webi (short for Web Intelligence) is a front-end analytical reporting application from Business Objects. Your IT contact apparently has created (or has access to) such a Webi document, which retrieves data through a universe (an abstraction layer) from a database.
One way that you could use the data retrieved by Web Intelligence as a source and dynamically request bits of it instead of retrieving all the information in one go is to use a feature called BI Web Service. This will make data from Webi available as a web service, which you can then consume from within Excel. You can even make this dynamic by adding prompts, which would put restrictions on the data retrieved.
Have a look at this page for a quick overview (or Google Web Intelligence BI Web Service for other tutorials).
Another approach could be to use the SDK, though as you're trying to manipulate Web Intelligence, your only language options are .NET or Java, as the Rebean SDK (used to talk to Webi) is not available for COM (i.e. VBA/VBScript/…).
Note: if you're using BusinessObjects BI 4.x, remember that the Rebean SDK is actually deprecated and replaced by a REST SDK. This could make it possible to approach Webi using VBA after all.
That being said, I'm not quite sure if this is the best approach, as you're actually introducing several intermediate layers:
Database (holding the data you want to retrieve)
Universe (semantic abstraction layer)
Web Intelligence
A way to get data out of Webi (manual export, web service, SDK, …)
Excel
Depending on your license and what you're trying to achieve, Xcelsius or Design Studio (BusinessObjects BI 4.x) could also be a viable alternative to the Excel front-end, thereby eliminating layers 3 to 4 (and replacing layer 5). The former's back-end is actually heavily based on Excel (although there's no VBA support). Design Studio allows scripting in JavaScript.
