I have created a flat table from my DB and defined a solr core on it.
It works excellently so far.
My problem is that my table has two hierarchies, so when flattened it is too big.
Let's consider the following example scenario.
My tables are:
School
Students (1:n with school)
Teachers (1:n with school)
EDIT: Consider that all tables in my example have two columns, Name & Description, which I would like to index and search. The searches are user-generated free-text searches over those columns.
Now, each school has many students and teachers, but each student/teacher has another multivalued field, i.e. the following tables:
studentHobbies - 1:N with students
teacherCourses - 1:N with teachers
My main entity is School, and that is what I want to get in the results.
Flattening does not help me much and is very expensive.
EDIT: Problems with the query
When you query a flat table by school name, as I described, if the school has 300 students with 300 studentHobbies each and 300 teachers with 300 teacherCourses each,
you get 8.1 billion rows (300*300*300*300). Searching for the school name will retrieve all 8.1 billion rows.
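The arithmetic behind that figure can be checked with a few lines of Python. This is only a sketch: it assumes each of the 300 students has 300 hobbies and each of the 300 teachers has 300 courses, which is one reading of the numbers above, and the variable names are made up for illustration.

```python
# Hypothetical counts, read off the example in the question.
students = 300
hobbies_per_student = 300
teachers = 300
courses_per_teacher = 300

# A flat join repeats every student/hobby pair for every teacher/course pair.
student_rows = students * hobbies_per_student
teacher_rows = teachers * courses_per_teacher
flat_rows = student_rows * teacher_rows

# Keeping the two branches separate stores each row exactly once:
# the school, its students, their hobbies, its teachers, their courses.
nested_rows = 1 + students + student_rows + teachers + teacher_rows

print(f"flat rows per school:   {flat_rows:,}")
print(f"nested rows per school: {nested_rows:,}")
```

The two branches (students and teachers) are independent of each other, so joining them multiplies their sizes, while storing them side by side only adds them.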
Can you direct me to how I define 1:n:n relationships in data-config.xml?
Thanks.
Found it:
documentation covering 1:n, 1:n:n and n:n relations.
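For readers who want the idea without the link: the goal is one document per school, with the child tables folded into multivalued fields instead of being joined into rows. The Python below is a rough sketch of that shape, not the data-config.xml syntax itself. All row data, field names such as student_hobby, and the helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows, as if selected from each table separately.
schools = [{"id": 1, "name": "Hilltop", "description": "Public school"}]
students = [{"id": 10, "school_id": 1, "name": "Ann"},
            {"id": 11, "school_id": 1, "name": "Bob"}]
student_hobbies = [{"student_id": 10, "name": "chess"},
                   {"student_id": 10, "name": "judo"},
                   {"student_id": 11, "name": "chess"}]
teachers = [{"id": 20, "school_id": 1, "name": "Ms. Lee"}]
teacher_courses = [{"teacher_id": 20, "name": "algebra"}]


def group_by(rows, key):
    """Index rows by a foreign-key column."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return grouped


def build_school_docs():
    """Return one document per school with multivalued child fields."""
    students_by_school = group_by(students, "school_id")
    hobbies_by_student = group_by(student_hobbies, "student_id")
    teachers_by_school = group_by(teachers, "school_id")
    courses_by_teacher = group_by(teacher_courses, "teacher_id")

    docs = []
    for school in schools:
        my_students = students_by_school[school["id"]]
        my_teachers = teachers_by_school[school["id"]]
        docs.append({
            "id": school["id"],
            "school_name": school["name"],
            "school_description": school["description"],
            "student_name": [s["name"] for s in my_students],
            # Sets drop duplicates such as two students sharing a hobby.
            "student_hobby": sorted({h["name"] for s in my_students
                                     for h in hobbies_by_student[s["id"]]}),
            "teacher_name": [t["name"] for t in my_teachers],
            "teacher_course": sorted({c["name"] for t in my_teachers
                                      for c in courses_by_teacher[t["id"]]}),
        })
    return docs


docs = build_school_docs()
print(docs[0])
```

A search on any of these fields then returns the school once, rather than once per joined row.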
I am learning how to use IBM Cognos and my first task is to create relationships between the tables I have uploaded into Cognos.
Basically, I am trying to tell Cognos to link the id column in the Person Table with the person_id and related_person_id columns in the Relationship Table, as shown here:
However, this does not seem possible since the Match Selected Columns button becomes disabled when I try to also link the related_person_id column.
I need to do this because person_id and related_person_id are both foreign keys: they point to people in the Person Table and explain how they are related.
How can this be accomplished in Cognos?
Thank you.
You can have any number of matches. You need to match a single query item from each side for each match. IIRC, a query item can be used in multiple matches, although that would only be really helpful once relational operators are implemented.
It isn't clear which of these you want in your case: to use person_id and related_person_id as a composite key; a 1..n relationship between id and person_id plus some other relationship (n..1?) between id and related_person_id; or just a 1..n relationship between id and person_id, if that would be sufficient for whatever you are trying to accomplish.
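To make the second reading concrete outside Cognos: two separate relationships to the same table correspond, in plain SQL, to joining Person twice under different aliases. The sketch below uses Python's built-in sqlite3 module purely for illustration. It says nothing about how Cognos models this internally, and the kind column and sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE relationship (
        person_id INTEGER REFERENCES person(id),
        related_person_id INTEGER REFERENCES person(id),
        kind TEXT  -- hypothetical column describing the relationship
    );
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO relationship VALUES (1, 2, 'parent of');
""")

# Person is joined twice: once per foreign key, each under its own alias.
rows = conn.execute("""
    SELECT p.name, r.kind, rp.name
    FROM relationship AS r
    JOIN person AS p  ON p.id  = r.person_id
    JOIN person AS rp ON rp.id = r.related_person_id
""").fetchall()

print(rows)
```

A composite key would instead be a single join matching both columns at once, which is a different thing and is probably not what two foreign keys to the same table call for.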
Editorial comment:
It would be really really nice if Cognos introduced relational operators Real Soon Now.
I'm going to try to make this as clear as possible; I apologize in advance if my attempt fails.
I work in education and am trying to create a predictive analysis document that will tell us how many class sections we will need to offer in a given semester.
I pulled data from the past five years and consolidated it into a pivot chart. I set the pivot chart to combine all courses with a common Subject, Course Title and Catalog Number (See image below for more detail), and output 3 different columns of values based on what we need.
The problem I am facing now is with curriculum changes throughout the years. Some courses in the list are no longer offered, and a new course with a new Course Title, Subject and Catalog Number can now be substituted for each previously needed course. Since the data has been pulled into one pivot chart, both the old curriculum courses and the new curriculum courses are in one list.
I would like to somehow create a relationship between the old curriculum courses and the new curriculum courses. If possible I would like the names of the courses to remain separate, but the values of the old and new to be averaged out together in their respective rows.
On a new page, I plan on putting an easy-to-use form where the user can select a course subject and name and enter some other necessary data, and the document will output the number of course sections needed.
Does anyone know of a way to make a relationship between two cells and have other cells affected by this relationship?
Thanks so much!
Mike
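One way to think about the old/new link is as a small mapping table from each retired course to its replacement, with the average taken over every course that maps to the same replacement. The sketch below shows that logic in Python rather than in the spreadsheet itself. The course keys, values, and the replaced_by table are all hypothetical.

```python
from statistics import mean

# Hypothetical per-course values, keyed by (subject, catalog number).
sections = {
    ("ENG", "101"): 4.0,   # old curriculum
    ("ENG", "110"): 6.0,   # new curriculum, replaces ENG 101
    ("MAT", "200"): 3.0,   # unchanged
}

# Hypothetical mapping table: old course -> the new course that replaces it.
replaced_by = {("ENG", "101"): ("ENG", "110")}


def combined_average(course):
    """Average a course with every course linked to it by the mapping."""
    canonical = replaced_by.get(course, course)
    linked = [c for c in sections if replaced_by.get(c, c) == canonical]
    return mean(sections[c] for c in linked)


# Each course keeps its own row; linked rows show the same combined value.
for course in sections:
    print(course, combined_average(course))
```

In a workbook, the same idea would likely be a helper column holding the replacement course's key for each row, with a conditional average such as AVERAGEIF over that column.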
In this article the author illustrates several options for modeling many-to-many relationships in Cassandra. I would like some clarification on two points:
Why would option 4 take more space? It seems like you are just "appending" Item_by_user to the User column space.
Also, in option 4, how can you define composite columns as the author suggests? It seems like you have two groups of columns: 1) Name, Email and 2) Likes, where the latter is wide(?). What would be the CQL code that defines Name, Email and wide columns for Likes for the User table?
Thanks.
The following images are taken from the article mentioned above:
As for the first question, it looks to me like it will take about the same amount of space, just with one fewer row per user and per item, because you keep everything in a single row.
As for the second question, you can take a look at static columns (here is cql documentation). Basically, a static column is shared by all rows in one partition (user details in the user table and item details in the items table), and you can update its value using only the partition key.
A second solution is to model the items a user liked as a map type (here is map documentation), and likewise for items (create a map of the users who liked that item).
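Both suggestions can be sketched as table definitions. The CQL below is held in Python strings only so the example is self-contained. The table names, column names and types are hypothetical and are not taken from the article.

```python
# CQL kept as Python strings; all table and column names are hypothetical.
USER_WITH_STATIC_COLUMNS = """
CREATE TABLE user_by_id (
    id uuid,
    name text STATIC,       -- stored once per partition (per user)
    email text STATIC,
    item_id uuid,           -- clustering column: one row per liked item
    item_title text,
    PRIMARY KEY (id, item_id)
);
"""

USER_WITH_MAP = """
CREATE TABLE user_with_likes (
    id uuid PRIMARY KEY,
    name text,
    email text,
    likes map<uuid, text>   -- item id -> item title
);
"""

for statement in (USER_WITH_STATIC_COLUMNS, USER_WITH_MAP):
    print(statement.strip())
```

In the first table the liked items form the wide part of the partition and the static columns hold the per-user details. In the second, everything sits in one row and the likes live inside a single collection column.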
I suggest you read more about data modeling in Cassandra. I found A Big Data Modeling Methodology for Apache Cassandra and Basic Rules of Cassandra Data Modeling useful here. They will help you understand how to model tables based on your queries (the query-driven methodology), as well as data duplication and its advantages and disadvantages.
I have 2 lookup tables: Members (83,000 rows) and Groups (2,500 rows).
There are 2 fact tables: GroupMembers (190,000 rows) and Metrics (650,000 rows).
I created two separate models:
1. Members, Groups and GroupMembers
2. Members, Groups and Metrics
Both models have only one calculated measure on the fact table (COUNTROWS). If I bring in the Members>MemberID on Rows and Filter on Groups>GroupID, using CountRows for values; model 2 performs super fast and model 1 is really slow.
Here are the DAX query results.
Model 1:
Model 2:
The only difference between the two fact tables is that GroupMembers only has unique combinations of Groups and Members, whereas the Metrics table has other columns so the Groups and Members combinations are not unique.
Both excel files can be found here: http://1drv.ms/1GdK1WK
Please help!
[EDIT]
I did some further testing and found that if I duplicate the data in GroupMembers (i.e. load the same 190,000 rows twice so GroupID/UserID combination is not unique), the performance is great. Go figure! :)
I'm opening a support case with MS and will update this thread.
From the Microsoft support team: apparently the UserID int values are too big for the query. They gave me a workaround: create an identity key column for the user table, and use that as the relationship between Users and UserGroups. I posted the workaround file in a new folder in OneDrive http://1drv.ms/1BEEmDJ
The support team will talk to the product team regarding the root issue and they'll investigate why large int columns are causing a performance issue.
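For anyone unfamiliar with the workaround, an identity key here just means replacing each large ID with a small consecutive number and relating the tables on that instead. The Python below sketches the idea only. The ID values are made up, and it does not reproduce what the support team's file actually does.

```python
# Hypothetical large integer IDs, standing in for the UserID values.
user_ids = [9_000_000_017, 9_000_000_003, 9_000_000_017, 9_000_000_042]

# Assign each distinct ID a small consecutive key, in first-seen order.
surrogate = {}
for user_id in user_ids:
    surrogate.setdefault(user_id, len(surrogate) + 1)

# Fact rows then carry the small key instead of the original ID.
fact_keys = [surrogate[user_id] for user_id in user_ids]

print(surrogate)
print(fact_keys)
```

The original ID stays on the lookup table as an ordinary column, so nothing is lost. Only the relationship column changes.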
Suppose I have a huge list of movies categorized by genre. Users can vote for movies, and each movie can be in multiple genres.
What is a good way to store this in Cassandra if I want to present the top X movies per category? Please ignore other use cases as I can have other column families as required (like presenting detailed movie information).
Action
  Movie A
  Movie B
  Movie C
Comedy
  Movie D
  Movie E
  Movie A
Based on the information you have presented, I'd say you have only given requirements for a single column family.
The columns would be movie name, category, and any other attributes of the movie.
It is quite common to create several column families with different structures that act like 'materialized views' of the original column family.
In Cassandra you design column families based on how the application is going to use them. So you design your queries first, then you design the column family to support them.
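As a rough illustration of designing for the "top X movies per category" query, the sketch below models one partition per genre with its rows kept in descending vote order, so the query becomes a read of the first X rows. It is plain Python rather than Cassandra, and the vote counts and names are invented.

```python
from collections import defaultdict

# Hypothetical vote counts and genre assignments.
votes = {"Movie A": 120, "Movie B": 45, "Movie C": 300,
         "Movie D": 10, "Movie E": 75}
genres = {"Action": ["Movie A", "Movie B", "Movie C"],
          "Comedy": ["Movie D", "Movie E", "Movie A"]}

# One "partition" per genre, rows sorted by votes descending.
# A movie in two genres is stored twice, once per partition.
movies_by_genre = defaultdict(list)
for genre, titles in genres.items():
    rows = [(votes[title], title) for title in titles]
    movies_by_genre[genre] = sorted(rows, reverse=True)


def top_movies(genre, x):
    """Return the x highest-voted titles in a genre."""
    return [title for _, title in movies_by_genre[genre][:x]]


print(top_movies("Action", 2))
print(top_movies("Comedy", 2))
```

The duplication of Movie A is deliberate. It is the price of answering the per-genre query from a single partition, and it means a vote has to be applied in every genre the movie belongs to.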