List unsold items per customer - cognos

I am working on a Cognos Report to display a list with Customers and Items that were never purchased by those customers, but I can't reverse the association to find the "excluded" Items.
My relevant tables and relationships are:
Customers 1..1 <--> 0..1 Sales 1..1 <--> 1..1 Items
I have customers A, B and C and products X, Y and Z.
A bought X and Y.
B bought Z.
C never bought anything.
The desired output would be:
| Customer | Item |
|----------|------|
| A        | Z    |
| B        | X    |
| B        | Y    |
| C        | X    |
| C        | Y    |
| C        | Z    |
Any out-of-the-box ideas on how to build a query for such report?
Thank you!

Your current model won't fit your needs. Try creating a custom data model for this query in your report:
1. Go to the Query Explorer tab in Report Studio and add three queries: Customer, Item, Sales.
2. Join Customer and Item on any field, press "Convert to expression", and set the condition to something like 1=1 to emulate a cross join.
3. Left-join Sales to the result of (1) by item_id and customer_id (you have something like this, right?).
4. Filter by "Sales. is null" to keep only the Item/Customer pairs with no sales.
The result is your dataset.
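The cross join + left join + NULL filter the steps describe can be sketched in SQL; here is a minimal runnable version using SQLite to stand in for the Cognos-generated query (table and column names are illustrative assumptions, not the asker's actual model):

```python
import sqlite3

# Sketch of the "cross join, then anti-join against Sales" approach.
# Table/column names (customers, items, sales, customer_id, item_id)
# are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id TEXT);
CREATE TABLE items     (item_id TEXT);
CREATE TABLE sales     (customer_id TEXT, item_id TEXT);
INSERT INTO customers VALUES ('A'), ('B'), ('C');
INSERT INTO items     VALUES ('X'), ('Y'), ('Z');
INSERT INTO sales     VALUES ('A','X'), ('A','Y'), ('B','Z');
""")

rows = conn.execute("""
SELECT c.customer_id, i.item_id
FROM customers c
CROSS JOIN items i                      -- every customer/item pair (the 1=1 join)
LEFT JOIN sales s
       ON s.customer_id = c.customer_id
      AND s.item_id     = i.item_id
WHERE s.customer_id IS NULL             -- keep pairs with no matching sale
ORDER BY c.customer_id, i.item_id
""").fetchall()
print(rows)  # [('A', 'Z'), ('B', 'X'), ('B', 'Y'), ('C', 'X'), ('C', 'Y'), ('C', 'Z')]
```

This reproduces the desired output table above: each customer paired with every item they never bought.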


Optimizing Theta Joins in Spark SQL

I have just two tables. I need to get the records from the first table (a big table, 10M rows) whose transaction date is less than or equal to the effective date in the second table (a small table with one row); this result set will then be consumed by downstream queries.
Table Transact:
tran_id | cust_id | tran_amt | tran_dt
--------+---------+----------+-----------
1234    | XYZ     | 12.55    | 10/01/2020
5678    | MNP     | 25.99    | 25/02/2020
5561    | XYZ     | 32.45    | 30/04/2020
9812    | STR     | 10.32    | 15/08/2020
Table REF:
eff_dt
----------
30/07/2020
Hence, as per the logic, I should get back the first three rows and discard the last record, since its transaction date is greater than the reference date (present in the REF table).
I have therefore used a non-equi Cartesian join between these tables:
select /*+ MAPJOIN(b) */
    a.tran_id,
    a.cust_id,
    a.tran_amt,
    a.tran_dt
from transact a
inner join ref b
    on a.tran_dt <= b.eff_dt
However, this SQL is taking forever to complete, due to the cross join with the transact table, even with the broadcast hint.
So is there any smarter way to implement the same logic that will be more efficient? In other words, is it possible to optimize the theta join in this query?
Thanks in advance.
Referring to https://databricks.com/session/optimizing-apache-spark-sql-joins:
Can you try bucketing on tran_dt (bucketed by year/month only) and writing two queries to do the same work?
First query: tran_dt (year/month) < eff_dt (year/month). This lets Spark actively pick up whole buckets earlier than 2020/07, rather than checking each and every record's tran_dt.
Second query: tran_dt (year/month) = eff_dt (year/month) and tran_dt (day) <= eff_dt (day).
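The two-query split amounts to decomposing the non-equi predicate into a coarse bucket-level test plus an exact test inside the boundary bucket. A minimal plain-Python sketch of that decomposition (function names are illustrative, data taken from the Transact/REF tables above):

```python
from datetime import date

# "tran_dt <= eff_dt" decomposed into two predicates, mirroring the
# bucket-pruning idea: one cheap year/month test, one exact test for
# rows that fall in the same year/month bucket as eff_dt.
def coarse_match(tran_dt: date, eff_dt: date) -> bool:
    """Bucket-level test: transaction bucket strictly before eff_dt's bucket."""
    return (tran_dt.year, tran_dt.month) < (eff_dt.year, eff_dt.month)

def boundary_match(tran_dt: date, eff_dt: date) -> bool:
    """Exact test inside the same year/month bucket as eff_dt."""
    return (tran_dt.year, tran_dt.month) == (eff_dt.year, eff_dt.month) \
        and tran_dt.day <= eff_dt.day

eff = date(2020, 7, 30)
trans = [date(2020, 1, 10), date(2020, 2, 25), date(2020, 4, 30), date(2020, 8, 15)]
kept = [d for d in trans if coarse_match(d, eff) or boundary_match(d, eff)]
print(kept)  # the first three transactions; 2020-08-15 is discarded
```

The union of the two predicates is exactly `tran_dt <= eff_dt`, so splitting the Spark job into two queries preserves the result.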

Making Switch function return a column in a table and not a measure (PowerBI DAX)

What I'm trying to do is change measures using slicers in Power BI Desktop. I have found some examples of people who have done this (for instance, this one).
What they do is create a table containing an ID and a measure name. The 'Measure Name' column of this table is used as the field value in the filter of the visualization. Then they create a SWITCH function that, given a certain value in the filter, maps that filter value to a measure.
You can see an example below:
Measure Value = SWITCH(
    MIN('Dynamic'[Measure ID]),
    1, [Max Temp],
    2, [Min Temp],
    3, [Air Pressure],
    4, [Rainfall],
    5, [Wind Speed],
    6, [Humidity]
)
Where 'Dynamic' is a table containing a Measure ID and a Measure Name:
Dynamic:
Measure ID | Measure Name
-----------+-------------
1          | Max Temp
2          | Min Temp
3          | Air Pressure
4          | Rainfall
5          | Wind Speed
6          | Humidity
All of the 'Measure Name' entries are measures as well.
My problem is: I have too many columns (400!) and I cannot turn them into measures one by one; it would take days. I was thinking that maybe I could use the SWITCH function so that it returns the column in the table and NOT the corresponding measure. However, I cannot just insert 'Name of the table'['Name of the column'] into the SWITCH function as a result parameter.
Does anyone know how to make SWITCH return a column and not a measure? (Or any other suggestion?)
DAX doesn't work well for lots of columns like this, so I'd suggest reshaping your data (in the query editor) by unpivoting all those columns you want to work with so that instead of a table that looks like this
ID | Max Temp | Min Temp | Air Pressure | Rainfall | Wind Speed | Humidity
---+----------+----------+--------------+----------+------------+----------
1 | | | | | |
...
you'd unpivot all those data columns so it looks more like this:
ID | ColumnName | Value
---+--------------+-------
1 | Max Temp |
1 | Min Temp |
1 | Air Pressure |
1 | Rainfall |
1 | Wind Speed |
1 | Humidity |
...
Then you can create a calculated table, Dynamic, to use as your slicer:
Dynamic = DISTINCT ( Unpivoted[ColumnName] )
Now you can write a switching measure like this:
SwitchingMeasure =
VAR ColName = SELECTEDVALUE ( Dynamic[ColumnName] )
RETURN
CALCULATE ( [BaseMeasure], Unpivoted[ColumnName] = ColName )
where [BaseMeasure] is whatever aggregation you're after, e.g., SUM ( TableName[Value] ).
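The unpivot reshaping suggested above (done in the Power BI query editor) can be sketched in plain Python to show exactly what happens to the data; column names are taken from the example, the values are made up for illustration:

```python
# Sketch of the unpivot ("melt") reshaping: each wide row (ID plus many
# data columns) becomes one narrow row per data column.
wide_rows = [
    {"ID": 1, "Max Temp": 31.0, "Min Temp": 18.5, "Humidity": 0.62},
    {"ID": 2, "Max Temp": 28.4, "Min Temp": 17.1, "Humidity": 0.71},
]

def unpivot(rows, id_col="ID"):
    """Turn wide rows into (ID, ColumnName, Value) records."""
    narrow = []
    for row in rows:
        for col, val in row.items():
            if col != id_col:
                narrow.append({"ID": row[id_col], "ColumnName": col, "Value": val})
    return narrow

narrow = unpivot(wide_rows)
print(len(narrow))  # 2 rows x 3 data columns = 6 narrow rows
```

After this reshape, the 400 columns become 400 distinct values in a single ColumnName column, which is why the one switching measure above can cover all of them without writing 400 separate measures.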

Nested filtering in Excel 2013 pivots

I have the following table in my Excel sheet (let's say that it's some kind of shop inventory):
Product | Type    | Producer | Cost per unit
--------+---------+----------+--------------
Apple   | fruit   | fruitCo  |  5,00
Apple   | fruit   | bananaCo |  6,00
Banana  | fruit   | bananaCo |  4,00
T-shirt | clothes | clothsCo | 60,00
Etc.
And I've created a PivotTable from this data that groups it by:
Filters: Producer, Type
Columns: Product
Rows: <empty>
Values: Sum of Cost
I've got two filters, Producer and Type. When I select a Producer from the list (e.g. bananaCo), the second filter still shows me every kind of Type, even those that are not present for the already-selected Producer. Is there any way to make this filtering nested, so that when I choose a Producer, only the types of product distributed by the selected producer appear in the Type filter list?
Not sure if this is the problem or not, but try clicking on the Product field in the pivot, then clicking the Field Settings button on the ribbon (under Options, Active Field), then Layout & Print in the window that appears.
Make sure "Show items with no data" is deselected.

postgres join list with $ delimiter

From these tables:
select group, group_ids
from some.groups_and_ids;
Result:
group   | group_ids
--------+----------
winners | 1$4
losers  | 4
others  | 2$3$4
and:
select id, name from some.ids_and_names;
id | name
---+--------
1  | bob
2  | robert
3  | dingus
4  | norbert
How would you go about returning something like:
winners | bob, norbert
losers  | norbert
others  | robert, dingus, norbert
with normalized (group_name, id) as (
    select group_name, unnest(string_to_array(group_ids, '$')::int[])
    from groups_and_ids
)
select n.group_name, string_agg(p.name, ',' order by p.name)
from normalized n
join ids_and_names p on p.id = n.id
group by n.group_name;
The first part (the common table expression) normalizes your broken table design by creating a proper view of the groups_and_ids table. The actual query then joins the ids_and_names table to the normalized version of your groups and then aggregates the names again.
Note: I renamed group to group_name because group is a reserved keyword.
SQLFiddle: http://sqlfiddle.com/#!15/2205b/2
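The normalize-then-aggregate idea is not Postgres-specific; here is a minimal plain-Python equivalent of the same two steps (split the $-delimited list into (group, id) pairs, then join on id and re-aggregate the names), using the sample data above:

```python
# Plain-Python sketch of the CTE approach: "unnest" the $-delimited ids,
# join against the id->name lookup, then re-aggregate names per group.
groups_and_ids = {
    "winners": "1$4",
    "losers":  "4",
    "others":  "2$3$4",
}
ids_and_names = {1: "bob", 2: "robert", 3: "dingus", 4: "norbert"}

# Step 1: normalize - one (group, id) pair per delimited element,
# mirroring unnest(string_to_array(group_ids, '$')::int[]).
normalized = [(g, int(i)) for g, ids in groups_and_ids.items()
                          for i in ids.split("$")]

# Step 2: join on id and aggregate names per group, mirroring string_agg.
result = {}
for g, i in normalized:
    result.setdefault(g, []).append(ids_and_names[i])
result = {g: ", ".join(names) for g, names in result.items()}
print(result["others"])  # robert, dingus, norbert
```

Doing the split in application code like this is also a reasonable fallback when the database at hand lacks unnest/string_to_array.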
Is it possible to redesign your database? Putting all the group_ids into one column makes life hard. If your table were, e.g.:
group   | group_id
--------+---------
winners | 1
winners | 4
losers  | 4
etc., this would be trivially easy. As it is, the query below will do it, although I hesitated to post it, since it encourages bad database design (IMHO)!
P.S. I took the liberty of renaming some columns, because they are reserved words. You can escape them, but why make life difficult for yourself?
select group_name, array_to_string(array_agg(username), ', ')  -- aggregate into an array and make it a string
from (
    select group_name, theids, username
    from ids_and_names
    inner join (
        select group_name, unnest(string_to_array(group_ids, '$')) as theids  -- unnest a string_to_array to get rows
        from groups_and_ids
    ) i on i.theids = cast(id as text)
) a
group by group_name

Excel: max() of count() with column grouping in a pivot table

I have a pivot table fed from a MySQL view. Each returned row is basically an instantiation of "a person, with a role, at a venue, on a date". Each cell then shows the count of persons (let's call it person_id).
When you pivot this in Excel, you get a nice table of the form:
        | Dates -->
--------+------------------
Venue   |
  Role  | -count of person-
This makes a lot of sense, and the end user likes this format, BUT the requirement has changed to group the columns (dates) into weeks.
When you group them in the normal way, the count is applied across the grouped columns as well. That is, of course, logical behaviour, but what I actually want is the max() of the original count().
So the question: does anyone know how to have the cells count(), but the grouping perform a max()?
To illustrate this, imagine the columns for a week, then imagine the max() grouped as a week, giving:
Old:
        | M | T | W | T | F | S | S |
--------+---+---+---+---+---+---+---+ .... for several weeks
Venue X |
 Role Y | 1 | 1 | 2 | 1 | 2 | 3 | 1 |
New (grouped by week):
        | Week 1 | ...
--------+--------+----
Venue X |
 Role Y |   3    | ...
I'm not at my PC, but the steps below should be broadly correct:
Right-click on the date field in the pivot table and select Group.
Then highlight Weeks; you may have to select Years also.
Lastly, right-click on the count data you already have, expand the "Summarise by" menu, and select Max.
