Power BI data source, optimal query option: copy, duplicate, or reference?

I have a PBI file in which I made a Power Query with several steps (such as merging two files).
I would like to produce several output files originating from this query, each with its own specific steps (I made a series of specific changes to the data in these queries).
When I refresh my PBI file, I would like all the queries, from the original one to the three derived from it, to be updated.
I would also like the three other queries to pick up any new step I add to the original query.
So far I used copy:
I took my original query, right-clicked, and simply used the "Copy" option. However, this duplicates the previously merged files used to create the query at the same time.
I see that there are also "Duplicate" and "Reference" options in Power BI.
I tried doing some research and I read the following on "Duplicate":
"Duplicate copies a query with all the applied steps of it as a new query; an exact copy."
To me this seemed exactly the same as a "copy", so I expected that duplicating the query would also copy the previously merged files. But no: I tested it and only the selected query got duplicated.
When I did the test for "Reference", however, my query appeared with only the result this time (not the data used to create it), but it had no steps. When I click on "Source", I cannot "see" the source.
I am thus puzzled as to the best way forward and, more broadly, the best cases and practices to adopt.
Which option should I choose so that PBI repeats the same steps each time I refresh my source, i.e. merging the two files and then applying a series of specific steps to three copies of the result?

I suspect you want to do the following:
Load Universe 1 and Universe 2
Merge them into a single table Merge1
Create three new queries that reference Merge1
Create specific steps for each of the new queries
This way, each of the new queries starts at exactly the same place without having to load Universe 1 and Universe 2 three separate times.
If you go to your Query Dependencies view, you should see Universe 1 and Universe 2 feeding Merge1, with the three new queries branching off Merge1.
Notice that I've disabled load (to model) for the top 3 tables, since we probably only need them as staging tables.
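For illustration, the M code behind a referencing query is just a pointer to the staging query: everything after `Source` is specific to that output. A minimal sketch (the query names and the `[Region]` filter are hypothetical examples, not steps from your file):

```m
// Query: Output1, created via right-click on Merge1 → Reference
let
    // Re-uses Merge1's result; refreshing Merge1 flows through automatically
    Source = Merge1,
    // Hypothetical output-specific step; add your own steps here
    Filtered = Table.SelectRows(Source, each [Region] = "EMEA")
in
    Filtered
```

Any step added to Merge1 itself will run before `Source` in all three referencing queries, which is the propagation behavior you're after.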

Related

Join feature from two different indexes

We need data from two different Azure Search indexes. Since we were not able to find any option to join indexes currently, we are replicating data from the different indexes into another new index. Because of this we are facing issues keeping the redundant data in sync across multiple indexes, as well as the cost of maintaining data in the new index.
Is there any better option than our current solution for our use case?

Append Query and Filter Variables to Output in Oracle OBIEE

I frequently use OBIEE to create custom analyses from predefined subject areas. I often make extensive use of filters, as I'm typically pulling data on a very specific issue from a huge dataset. A problem I have repeatedly come across is trying to recreate a previous analysis complete with the filter variables.
Obviously, if it's something I know I'll come back to, I'll save the query. But that's often not the case. Maybe a month or two will go by, I'll need to generate a new version of a previous report to compare with the original, and I end up not being able to trust the output because I'm not sure it's using the same variables.
Is there a way to append the query and filter variables to the report itself so it can be easily referenced and recreated?
Additional info.
* I almost exclusively output the data from OBIEE as a .csv and do most of the work in Excel pivot tables I save as .xlsx.
* I'm not a DBA.
a) You can always save filters as presentation catalog objects instead of clicking them together from zero every single time. Think LEGO bricks for adults: OBIEE is based on the concept of stored and reusable objects.
b) You will never have your filters in your CSV output, since CSV is a raw data output and not a formatted/graphical one. If you were using a graphical analysis export, you could always add the "Filters View" to your results.
Long story short: since you're using OBIEE as a data-dump tool and circumventing what the tool is designed to do and how it is supposed to function, you're constraining yourself in terms of the benefits and functionality you can get from it.

How can I implement an iterative optimization problem in Spark

Assume I have the following two sets of data. I'm attempting to associate products on hand with their rolled-up tallies. For a roll-up tally you may have products made of multiple categories, with a primary and an alternate category. In a relational database I would load the second set of data into a temporary table and use a stored procedure to iterate through the rollup data, decrementing the quantities until they were zero or I had matched the tallies. I'm trying to implement a solution in Spark/PySpark and I'm not entirely sure where to start. I've attached a possible output solution that I'm trying to achieve, though I recognize there are multiple outputs that would work.
#Rolled Up Quantities#
owner,category,alternate_category,quantity
ABC,1,4,50
ABC,2,3,25
ABC,3,2,15
ABC,4,1,10
#Actual Stock On Hand#
owner,category,product_id,quantity
ABC,1,123,30
ABC,2,456,20
ABC,3,789,20
ABC,4,012,30
#Possible Solution#
owner,category,product_id,quantity
ABC,1,123,30
ABC,1,012,20
ABC,2,456,20
ABC,2,789,5
ABC,3,789,15
ABC,4,012,10
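Because the allocation is inherently sequential per owner, one way to start is to express the logic in plain Python first and then run it per owner group in Spark (e.g. with `groupBy(...).applyInPandas` or a grouped pandas UDF), since each owner's rows can be processed independently. A minimal greedy sketch that reproduces the possible solution above, drawing stock from the primary category first and then the alternate:

```python
from collections import defaultdict

# The sample data from the question; product_id kept as a string
# to preserve the leading zero in "012".
rollups = [  # (owner, category, alternate_category, quantity)
    ("ABC", 1, 4, 50),
    ("ABC", 2, 3, 25),
    ("ABC", 3, 2, 15),
    ("ABC", 4, 1, 10),
]
stock = [  # (owner, category, product_id, quantity)
    ("ABC", 1, "123", 30),
    ("ABC", 2, "456", 20),
    ("ABC", 3, "789", 20),
    ("ABC", 4, "012", 30),
]

def allocate(rollups, stock):
    # Remaining stock per (owner, category): mutable [product_id, qty] pairs.
    remaining = defaultdict(list)
    for owner, cat, pid, qty in stock:
        remaining[(owner, cat)].append([pid, qty])
    result = []
    for owner, cat, alt, need in rollups:
        # Draw from the primary category first, then the alternate.
        for source_cat in (cat, alt):
            for item in remaining[(owner, source_cat)]:
                if need == 0:
                    break
                take = min(need, item[1])
                if take > 0:
                    result.append((owner, cat, item[0], take))
                    item[1] -= take
                    need -= take
    return result
```

This greedy pass happens to match the possible solution shown above for the sample data, but it is only a sketch: if your real constraints require a globally optimal assignment rather than first-fit, you would need a linear-programming or matching formulation instead.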

Cassandra Schema Design - Handling Merging of Similar but Differing Source Data Sets

I'm working on a project to merge data from multiple database tables and files into Cassandra. This will come from different sources such as flat files, SQL DBs, etc.
Problem statement: most of these source files are similar; however, there are some differences, and I want to merge each of them into a single Cassandra table. There are about 50 similar fields and an extra 20 fields that don't coexist. My thought is that I can merge them all, add all of the fields, and simply leave the non-applicable ones unpopulated (unset columns take no space in Cassandra; only writing explicit nulls creates tombstones). The other option would be to merge the shared fields into regular Cassandra columns and put the differing fields into a map column; however, I don't know if there is really any benefit to doing this other than looking nicer.
Any ideas/advice from people who have dealt with this?
What you need is an ETL tool (Extract/Transform/Load) to combine, clean, and/or standardize the data, using Cassandra as your repository. There are multiple tools on the market that provide this functionality (a Google search for "ETL tools" will give you an overwhelming number of options to choose from).
As a personal preference, check https://nifi.apache.org/ ; you can define those transformations and filters as workflows.
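For reference, the map-column option the question describes could look like the sketch below. This is a hypothetical schema (table, column, and key names are made up for illustration); the ~50 shared fields become regular columns and the ~20 source-specific fields go into a map keyed by field name:

```sql
-- Hypothetical merged table for illustration only
CREATE TABLE merged_records (
    record_id  uuid,
    source     text,               -- which upstream system the row came from
    common_a   text,               -- ...the ~50 shared fields as regular columns...
    common_b   int,
    extras     map<text, text>,    -- the ~20 non-shared fields, keyed by name
    PRIMARY KEY (record_id)
);
```

One trade-off to weigh: a map keeps the schema tidy and avoids 20 mostly-empty columns, but map entries cannot be indexed or queried as flexibly as regular columns, and all values are coerced to one type (text here).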

Using Cognos 10.1 which is better an Inner Join or an "IN" Filter?

I'm using Cognos 10.1 and I have a report that uses two queries each with the same primary key.
Query 1: UniqueIds
Query 2: DetailedInfo
I'm not sure whether it's better to build a report using the DetailedInfo query with a filter that says PrimaryKey in (UniqueIds.PrimaryKey), or to create a third query that joins UniqueIds to DetailedInfo on PrimaryKey.
I'm new to Cognos and I'm learning to think differently. Using Microsoft SQL Server I'd just use an inner join.
So my question is: in Cognos 10.1, which way is better, and how can I tell what the performance differences are?
You'd better start from the beginning.
Your queries (I hope Query Subjects) should be joined in Framework Manager, in the model. Then you can easily filter the second query by applying filters to the first query.
Joins in Report Studio are the last resort.
The report writer's ultimate weapon is a well-indexed data warehouse with a solid framework model built on top.
You want all of your filtering and joining to happen on the database side as much as possible. If not, then large data sets are brought over to the Cognos server before they are joined and filtered by Cognos.
The more work that happens on the database, the faster your reports will be. By building your reports in certain ways, you can mitigate Cognos side processing, and promote database side processing.
The first and best way to do this is with a good Framework Model, as Alexey pointed out. This will allow your reports to be simpler, and pushes most of the work to the database.
However a good model still exposes table keys to report authors so that they can have the flexibility to create unique data sets. Not every report warrants a new Star Schema, and sometimes you want to join the results of queries against two different Star Schema sources.
When using a join or a filter, Cognos attempts to push all of the work to the database as a default. It wants to have the final data set sent to it, and nothing else.
However, when creating your filters, you have two ways of referring to columns: by explicit names that refer to modeled data sources (e.g. [Presentation View].[Sales].[Sales Detail].[Net Profit]) or by a column in the current data set (such as [Net Profit]). Using explicit columns from the model helps ensure the filters are applied at the database.
Sometimes that is not possible, such as with a calculated column. For example, if you don't have Net Profit in your database or within your model, you may establish it with a calculated column. If you filter on [Net Profit] > 1000, Cognos will pull the data set into Cognos before applying your filter. Your final result will be the same, but depending on the size of the data before and after the filter is applied, you could see a performance decrease.
It is possible to have nested queries within your report, and Cognos will generate a single giant SQL statement for the highest-level query, which includes subqueries for all the lower-level data. You can generate the SQL/MDX in order to see how Cognos is building the queries.
Also, try experimenting. Save your report with a new name, try it one way and time it. Run it a few times and take an average execution speed. Time it again with the alternate method and compare.
With smaller data sets you are unlikely to see any difference. The larger your data set gets, the bigger the difference your chosen method will make to report speed.
Use joins to merge two queries together so that columns from both queries can be used in the report. Use IN() syntax if your only desire is to filter one query using the existence of corresponding rows in a second. That said, there are likely to be many cases that both methods will be equally performant, depending on the number of rows involved, indexes etc.
By the way, within a report Cognos only supports joins and unions between different queries. You can reference other queries directly in filters even without an established relationship, but I've seen quirks with this: it works when run interactively but not when scheduled or exported. I would avoid doing this in reports.
