IBM Cognos: matching multiple columns (two foreign keys) - cognos

I am learning how to use IBM Cognos and my first task is to create relationships between the tables I have uploaded into Cognos.
Basically, I am trying to tell Cognos to link the id column in the Person Table with the person_id and related_person_id columns in the Relationship Table, as shown here:
However, this does not seem possible since the Match Selected Columns button becomes disabled when I try to also link the related_person_id column.
The reason I need to do this is because person_id and related_person_id are foreign keys - they point to people in the Person Table and explain how they are related.
How can this be accomplished in Cognos?
Thank you.

You can have any number of matches. You need to match a single query item from each side for each match. IIRC, a query item can be used in multiple matches, although that would only be really helpful once relational operators are implemented.
It isn't clear which of three things you want: to use person_id and related_person_id as a composite key; to have a 1..n relationship between id and person_id plus some other relationship (n..1?) between id and related_person_id; or just a single 1..n relationship between id and person_id, if that is sufficient for what you are trying to accomplish.
Editorial comment:
It would be really really nice if Cognos introduced relational operators Real Soon Now.
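To make the "two separate relationships" reading concrete, here is a minimal Python sketch. It is not Cognos syntax. Only the column names id, person_id and related_person_id come from the question; the sample rows, the name and relation columns, and the resolve helper are assumptions made up for illustration. The point is that each foreign key is followed to the Person table independently.

```python
# Hypothetical sample rows; only id, person_id and related_person_id
# are taken from the question.
persons = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
]
relationships = [
    {"person_id": 1, "related_person_id": 2, "relation": "parent"},
]

# One lookup structure serves both foreign keys.
person_by_id = {p["id"]: p for p in persons}


def resolve(rel):
    """Follow each foreign key to the Person table on its own."""
    return {
        "person": person_by_id[rel["person_id"]]["name"],
        "related_person": person_by_id[rel["related_person_id"]]["name"],
        "relation": rel["relation"],
    }


resolved = [resolve(r) for r in relationships]
```

This is two independent matches against the same table, not one composite-key match on both columns at once.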

Related

Excel 2016 Relationship

Goal
Create a working relationship between my Category Sales and Voids PivotTables so I can leverage one slicer for all data.
Background
Using two PowerQueries, I pull in data from SQL to Excel. Because Sales and Voids have DateStamp and StoreID columns in common, I essentially concatenate these in the SQL query to create an ID. For example:
select concat(StoreID,convert(int,DateStamp)) as ID, DateStamp, StoreID, Category, Sales from...
select concat(StoreID,convert(int,DateStamp)) as ID, DateStamp, StoreID, Voids from...
This is a one-to-many relationship between the two (Sales --> Voids)
Problem
Despite creating the relationship in Excel (through Manage Relationships, as PowerPivot is not available) I can't get it to apply and Excel tells me relationships between tables may be needed. I've no idea what I'm doing wrong.
Workaround
The only workaround I can think of is to take the void value for a given day and divide by the number of categories that have sales, then just do a join to create one table that I pull into Excel. It would technically work for my application, but I'd love to know why the relationship isn't working.
Thanks.
The answer is to export your data into the data model so that you can use Power Pivot, and also to export another Power Query (or several) into the data model as a deduplicated table of keys.
Then, in the data model editor, set up the relationships so that there is a one-to-many relationship between your deduplicated key table(s) and the "actual data".
Then, in a Power Pivot, use those "key" tables as much as possible, maybe even to the ruthless ideal(1) of using ONLY key tables in your primary row and column headers. If you have a second level of categorization, add a deduplicated table mapping primary to secondary, and so on. Use the real data tables only in the body of your Power Pivot.
(1) - Keep in mind that this is just an ideal I'm just explaining to help you understand and maybe start moving towards as much as actually makes sense. As with most things, in reality, the ideal is almost never worth reaching because there are other factors (like your own patience and time).
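The deduplicated key table described above can be sketched in a few lines of Python. This only illustrates the idea and is not Power Query or DAX. The sample rows and ID values are invented, and key_table is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical rows standing in for the two query results.
sales = [
    {"ID": "1-20240101", "Category": "Food", "Sales": 100.0},
    {"ID": "1-20240101", "Category": "Drinks", "Sales": 40.0},
    {"ID": "2-20240101", "Category": "Food", "Sales": 75.0},
]
voids = [
    {"ID": "1-20240101", "Voids": 5.0},
    {"ID": "2-20240101", "Voids": 2.0},
    {"ID": "3-20240101", "Voids": 1.0},
]


def key_table(*tables):
    """Return the sorted, deduplicated IDs found in any of the tables."""
    return sorted({row["ID"] for table in tables for row in table})


keys = key_table(sales, voids)

# How many Sales rows hang off each key (the "many" side).
sales_rows_per_key = Counter(row["ID"] for row in sales)
```

In this made-up sample the same ID appears on several Sales rows (one per category), so Sales cannot be the "one" side of a relationship. The key table can, because each ID appears in it exactly once.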

PowerBi desktop

I have 2 tables that I am trying to relate in Power BI. Both tables have the same kind of values, for example:
Table1 has location and Table2 has Location, however the locations are different. Every time I try to connect them, it says I need a unique value. Can someone please help me so I can connect them together?
Here is a passage from the documentation regarding relationships in Power BI (Create and manage relationships in Power BI Desktop). In short, one of the tables you choose for the relationship should have unique values in the join column. So far in Power BI, you can define 1:*, 1:1 and *:1 relationships.
Note that you'll see an error that states One of the columns must have unique values if none of the tables selected for the relationship has unique values. At least one table in a relationship must have a distinct, unique list of key values, which is a common requirement for all relational database technologies.
If you encounter that error, there are a couple ways to fix the issue:
Use "Remove Duplicate Rows" to create a column with unique values. The drawback to this approach is that you will lose information when duplicate rows are removed, and often a key (row) is duplicated for good reason.
Add an intermediary table made of the list of distinct key values to the model, which will then be linked to both original columns in the relationship.
One of your tables has to have unique values in Location (primary key) while the other can have duplicate values in Location (foreign key). In addition, every value in the table with duplicates (normally the fact table) must be present in the other table (the dimension).
In my opinion, to match your needs, you should add all the possible locations to the table that has unique values (the dimension).
I hope I made myself clear.
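The two conditions above (unique values on one side, and every fact value present in the dimension) can be checked before building the relationship. Below is a minimal Python sketch of those checks. It is not Power BI code, and the location values and function names are assumptions chosen for illustration.

```python
# Hypothetical Location columns from the two tables.
dimension_locations = ["Austin", "Boston", "Chicago"]
fact_locations = ["Austin", "Austin", "Boston", "Denver"]


def has_unique_values(values):
    """True if no value occurs more than once."""
    return len(values) == len(set(values))


def missing_from_dimension(fact_values, dimension_values):
    """Fact values that have no matching row in the dimension."""
    known = set(dimension_values)
    return sorted({v for v in fact_values if v not in known})


missing = missing_from_dimension(fact_locations, dimension_locations)
```

Here the dimension column is unique and the fact column is not, which is the shape a one-to-many relationship needs. "Denver" would have to be added to the dimension for every fact row to find a match.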

An outer join Excel Power Pivot Pivot table?

I have a PowerPivot model with two tables. One contains a list of facilities, their type (active/inactive) and whether they belong to org A or org B (FacilityID|Active/Inactive|ORG).
The other table has a list of users and the facilities assigned to them, plus their org, so it looks like (userID|FacilityID|ORG), where each userID is repeated once for each facility it has.
Initially I needed to report the number of facilities active and easily built a PivotTable for it.
Now I need to get a list of the facilities that each user is missing, so I basically need to do an outer join between the tables for each user, and I just can't figure out how to do it! I joined both tables on FacilityID and am able to see whether they have inactive facilities, but I can't figure out a way to show all the facilities they are missing!
Thanks
Nonexistence is hard. This is not the sort of thing that is best solved through measures, but through modeling. In your source, you should cross join Facility and User to get FacilityUser. In FacilityUser, every user has 1 row with every facility, and you add a flag to indicate whether the user is or isn't assigned to that facility. Then your problem becomes one of filtering on that flag value. This is solvable in DAX, but you don't want to do that.
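The cross join described above can be sketched as follows. This is plain Python standing in for whatever the source system offers (for example a SQL CROSS JOIN). The sample IDs and the assigned flag name are assumptions; FacilityUser, userID and FacilityID follow the names used in the question and answer.

```python
from itertools import product

# Hypothetical source data: all facilities and the existing assignments.
facilities = ["F1", "F2", "F3"]
assignments = {("u1", "F1"), ("u1", "F2"), ("u2", "F3")}
users = sorted({user for user, _ in assignments})

# FacilityUser: one row per (user, facility) pair, with a flag saying
# whether that assignment actually exists.
facility_user = [
    {"userID": u, "FacilityID": f, "assigned": (u, f) in assignments}
    for u, f in product(users, facilities)
]

# "Missing" facilities are now a plain filter on the flag.
missing = [
    (row["userID"], row["FacilityID"])
    for row in facility_user
    if not row["assigned"]
]
```

Once the table contains the non-assignments as explicit rows, a PivotTable filter or slicer on the flag is enough, and no measure has to reason about rows that do not exist.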

Cassandra many-to-many relationship modeling options

In this article the author illustrates several options for modeling many-to-many relationship in Cassandra. I would like to get some more clarifications on two of them:
Why option 4 would take more space? It seems like you are just "appending" Item_by_user to User column space.
Also, in option 4, how can you define composite columns as the author suggests? It seems like you have two groups of columns: 1) Name, Email and 2) Likes whereas the latter is wide(?). What would be the CQL code that defines Name, Email and wide columns for Likes for the User table?
Thanks.
The following images are taken from the article mentioned above:
As for the first question, it looks to me as if it will take the same amount of space, just with one row fewer per user and per item, because you keep everything in a single row.
As for the second question, you can take a look at static columns (here is cql documentation). Basically, a static column is a column whose value is shared by all rows in one partition (user details in the user table and item details in the items table), and you can update the value using only the partition key.
A second solution is to model the items a user liked as a map type (here is map documentation), and to do the same for items (create a map of the users who liked that item).
I suggest reading more about data modeling in Cassandra. I found A Big Data Modeling Methodology for Apache Cassandra and Basic Rules of Cassandra Data Modeling to be useful articles here. They will help you understand how to model tables based on your queries (the query-driven methodology), and the advantages and disadvantages of data duplication.

How to optimize Cassandra model while still supporting querying by contents of lists

I just switched from Oracle to Cassandra 2.0 with the DataStax driver, and I'm having difficulty structuring my model for this big data approach. I have a Persons table with UUID and serialized Persons. These Persons have lists of addresses, names, identifications, and DOBs. For each of these lists I have an additional table with a compound key on each value in the respective list and the additional person_UUID column. This model feels too relational to me, but I don't know how else to structure it so that I can index (that is, search by) address, name, identification, and DOB. If Cassandra supported indexes on lists I would have just the one Persons table containing indexed lists for each of these.
In my application we receive transactions, which can contain within them 0 or more of each of those address, name, identification, and DOB. The persons are scored based on which person matched which criteria. A single person with the highest score is matched to a transaction. Any additional address, name, identification, and DOB data from the transaction that was matched is then added to that person.
The problem I'm having is that this matching is taking too long and the processing is falling far behind. This is caused by having to loop through result sets performing additional queries, since I can't make complex queries in Cassandra, and I don't have sufficient memory to just do a huge select all and filter in Java. For instance, I would like to select all Persons having at least two names in common with the transaction (names can have their order scrambled, so there is no first, middle, last; that would just be three names), but this would require a 'group by', which Cassandra does not support. If I just selected all Persons having any of the names in common in order to filter in Java, the result set would be too large and I would run out of memory.
I'm currently searching by only Identifications and Addresses, which yield a smaller result set (although it could still be hundreds) and for each one in this result set I query to see if it also matches on names and/or DOB. Besides still being slow this does not meet the project's requirements as a match on Name and DOB alone would be sufficient to link a transaction to a person if no higher score is found.
I know in Cassandra you should model your tables by the queries you do, not by the relationships of the entities, but I don't know how to apply this while maintaining the ability to query individually by address, name, identification, and DOB.
Any help or advice would be greatly appreciated. I'm very impressed by Cassandra but I haven't quite figured out how to make it work for me.
Tables:
Persons
[UUID | serialized_Person]
addresses
[address | person_UUID]
names
[name | person_UUID]
identifications
[identification | person_UUID]
DOBs
[DOB | person_UUID]
I did a lot more reading, and I'm now thinking I should change these tables around to the following:
Persons
[UUID | serialized_Person]
addresses
[address | Set of person_UUID]
names
[name | Set of person_UUID]
identifications
[identification | Set of person_UUID]
DOBs
[DOB | Set of person_UUID]
But I'm afraid of going beyond the max storage for a set (65,536 UUIDs) for some names and DOBs. Instead I think I'll have to do a dynamic column family with the column names as the Person_UUIDs, or is a row with over 65k columns very problematic as well? Thoughts?
It looks like you can't have these dynamic column families in the new version of Cassandra; you have to alter the table to insert the new column with a specific name. I don't know how to store more than 64k values for a row then. With a perfect distribution I will run out of space for DOBs at 23 million persons, and I'm expecting to have over 200 million persons. Maybe I just have to have multiple set columns?
DOBs
[DOB | Set of person_UUID_A | Set of person_UUID_B | Set of person_UUID_C]
and I just check size and alter table if size = 64k? Anything better I can do?
I guess it's just CQL3 that enforces this and that if I really wanted I can still do dynamic columns with the Cassandra 2.0?
Ugh, this page from Datastax doc seems to say I had it right the first way...:
When to use a collection
This answer is not very specific, but I'll come back and add to it when I get a chance.
First thing - don't serialize your Persons into a single column. This complicates searching and updating any person info. OTOH, there are people that know what they're saying that disagree with this view. ;)
Next, don't normalize your data. Disk space is cheap. So, don't be afraid to write the same data to two places. Your code will need to make sure that the right thing is done.
Those items feed into this: If you want queries to be fast, consider what you need to make that query fast. That is, create a table just for that query. That may mean writing data to multiple tables for multiple queries. Pick a query, and build a table that holds exactly what you need for that query, indexed on whatever you have available for the lookup, such as an id.
So, if you need to query by address, build a table (really, a column family) indexed on address. If you need to support another query based on identification, index on that. Each table may contain duplicate data. This means when you add a new user, you may be writing the same data to more than one table. This seems unnatural if relational databases are the only kind you've ever used, but you get benefits in return - namely, horizontal scalability thanks to the CAP Theorem.
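The write fan-out described above can be sketched in plain Python. The in-memory dictionaries below stand in for Cassandra tables; this is not driver code, and the function names (add_person, persons_sharing_names) and sample values are assumptions made up for illustration. The second function shows one way to get the "at least two names in common" match from the question by counting person ids across per-name lookups on the client, since there is no server-side group by.

```python
from collections import Counter, defaultdict
import uuid

# Each dict stands in for one query-specific table.
persons = {}                          # person_id -> person record
names_index = defaultdict(set)        # name -> {person_id}
addresses_index = defaultdict(set)    # address -> {person_id}


def add_person(names, addresses):
    """Write the same person to every table that serves a query."""
    person_id = uuid.uuid4()
    persons[person_id] = {"names": set(names), "addresses": set(addresses)}
    for name in names:
        names_index[name].add(person_id)
    for address in addresses:
        addresses_index[address].add(person_id)
    return person_id


def persons_sharing_names(transaction_names, minimum=2):
    """Ids of persons sharing at least `minimum` names with the transaction."""
    counts = Counter()
    for name in set(transaction_names):
        # One lookup per name; only ids are counted, not full records.
        counts.update(names_index.get(name, ()))
    return {pid for pid, hits in counts.items() if hits >= minimum}


alice = add_person(["alice", "marie", "smith"], ["1 main st"])
bob = add_person(["bob", "smith"], ["2 oak ave"])
matches = persons_sharing_names(["smith", "alice", "jones"])
```

Only a counter per candidate id is held in memory, not the serialized persons. Whether that is small enough for very common names depends on the data.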
Edit:
The two column families in that last example could just hold identifiers into another table. So, voilà, you have made an index. OTOH, that means each query takes two reads, but that will still be a performance improvement in many cases.
Edit:
Attempting to explain the previous edit:
Say you have a users table/column family:
CREATE TABLE users (
id uuid PRIMARY KEY,
display_name text,
avatar text
);
And you want to find a user's avatar given a display name (a contrived example). Searching users will be slow. So, you could create a table/CF that serves as an index, let's call it users_by_name:
CREATE TABLE users_by_name (
display_name text PRIMARY KEY,
user_id uuid
);
The search on display_name is now done against users_by_name, and that gives you the user_id, which you use to issue a second query against users. In this case, user_id in users_by_name has the value of the primary key id in users. Both queries are fast.
Or, you could put avatar in users_by_name, and accomplish the same thing with one query by using more disk space.
CREATE TABLE users_by_name (
display_name text PRIMARY KEY,
avatar text
);
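The trade-off between the two users_by_name variants can be sketched with dictionaries standing in for the tables. This is an illustration only, not Cassandra driver code. The sample user and the function names are assumptions.

```python
# Stand-in for the users table, keyed by id.
users = {"id-1": {"display_name": "ann", "avatar": "ann.png"}}

# Variant 1: users_by_name holds only the user id (an index).
users_by_name_index = {"ann": "id-1"}


def avatar_two_reads(display_name):
    user_id = users_by_name_index[display_name]  # first read: index table
    return users[user_id]["avatar"]              # second read: users table


# Variant 2: users_by_name holds a copy of the avatar (denormalized).
users_by_name_denorm = {"ann": {"avatar": "ann.png"}}


def avatar_one_read(display_name):
    return users_by_name_denorm[display_name]["avatar"]  # single read
```

Both functions return the same avatar. The second saves a read at the cost of storing the avatar twice and having to update both copies when it changes.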
