Cassandra Data Model Design

Suppose I have a huge list of movies categorized by genre. Users can vote for movies, and each movie can be in multiple genres.
What is a good way to store this in Cassandra if I want to present the top X movies per category? Please ignore other use cases, since I can add other column families as required (for example, for presenting detailed movie information).
Action
- Movie A
- Movie B
- Movie C
Comedy
- Movie D
- Movie E
- Movie A

Based on the information you have presented, I'd say you have only given requirements for a single column family.
The columns would be the movie name, the category, and any other attributes of the movie (such as its vote count).
It is quite common to create several column families with different structures that act like 'materialized views' of the original column family.
In Cassandra you design column families based on how the application is going to use them. In other words, you design your queries first and then design the column family to support them.
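To illustrate the query-first approach, here is a minimal Python sketch that simulates, in memory, one partition per genre kept sorted by vote count. It is not Cassandra driver code: it assumes a hypothetical table movies_by_genre with partition key genre and clustering columns (votes DESC, movie), and the class and method names are illustrative only. Each movie is written once per genre it belongs to, which is the kind of duplication the answer describes.

```python
import bisect
from collections import defaultdict

# Hypothetical CQL this sketch mimics (not from the original answer):
#   CREATE TABLE movies_by_genre (
#       genre text, votes int, movie text,
#       PRIMARY KEY ((genre), votes, movie)
#   ) WITH CLUSTERING ORDER BY (votes DESC, movie ASC);

class MoviesByGenre:
    """In-memory stand-in: one sorted partition per genre."""

    def __init__(self):
        # genre -> sorted list of (-votes, movie); negating votes gives DESC order
        self._partitions = defaultdict(list)

    def insert(self, genre, movie, votes):
        bisect.insort(self._partitions[genre], (-votes, movie))

    def delete(self, genre, movie, votes):
        self._partitions[genre].remove((-votes, movie))

    def set_votes(self, genres, movie, old_votes, new_votes):
        # A clustering column cannot be updated in place, so a vote change is
        # a delete of the old row plus an insert of the new one, in every genre.
        for genre in genres:
            self.delete(genre, movie, old_votes)
            self.insert(genre, movie, new_votes)

    def top(self, genre, limit):
        # Equivalent of: SELECT movie, votes FROM movies_by_genre WHERE genre = ? LIMIT ?
        return [(movie, -neg_votes) for neg_votes, movie in self._partitions[genre][:limit]]
```

For example, after inserting the Action and Comedy lists from the question with some vote counts, table.top("Action", 2) returns the two highest-voted Action movies, and a later vote on Movie A has to be applied to both the Action and the Comedy partitions.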

Related

Best way to store data for multiple categories?

Right now I use sqflite, but I can only store data for one of the categories, and I don't know if that's the best approach. How can I make it so that when I add an animal, I choose a category from a DropdownMenuItem, and if I choose Dog, the data is stored under the Dog menu only?
The only tutorials I can find online are for notepad-style apps that store a single category. If someone could explain this to me or point me to a tutorial that helps, I would be very thankful.
Something like this

Use a normalized schema with a many-to-many join table:
- you have animals(id,name)
- you have categories(id,name)
- create animal_categories(animal_id,category_id): each row links one animal to one category; add multiple rows to give an animal more than one category.
You have to insert the animal first to get its id, and then insert rows into animal_categories linking it to each of its categories.
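sqflite is a SQLite plugin, so the schema above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not Flutter code, and the function names (create_schema, add_animal, animals_in) are hypothetical. It follows the order described: insert the animal, read back its generated id, then add one join row per chosen category.

```python
import sqlite3

def create_schema(conn):
    conn.executescript("""
        CREATE TABLE animals (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE animal_categories (
            animal_id   INTEGER NOT NULL REFERENCES animals(id),
            category_id INTEGER NOT NULL REFERENCES categories(id),
            PRIMARY KEY (animal_id, category_id)
        );
    """)

def add_animal(conn, name, category_names):
    # 1. Insert the animal first so SQLite assigns its id.
    animal_id = conn.execute("INSERT INTO animals (name) VALUES (?)", (name,)).lastrowid
    # 2. One join row per chosen category (creating the category if needed).
    for cat in category_names:
        conn.execute("INSERT OR IGNORE INTO categories (name) VALUES (?)", (cat,))
        (cat_id,) = conn.execute("SELECT id FROM categories WHERE name = ?", (cat,)).fetchone()
        conn.execute(
            "INSERT INTO animal_categories (animal_id, category_id) VALUES (?, ?)",
            (animal_id, cat_id),
        )
    conn.commit()
    return animal_id

def animals_in(conn, category_name):
    # Everything shown on one category's screen, e.g. the Dog menu.
    rows = conn.execute("""
        SELECT a.name
        FROM animals a
        JOIN animal_categories ac ON ac.animal_id = a.id
        JOIN categories c ON c.id = ac.category_id
        WHERE c.name = ?
        ORDER BY a.name
    """, (category_name,))
    return [name for (name,) in rows]
```

With this layout the Dog screen is just a join filtered on the category name, so an animal filed under several categories shows up on each of their screens without being stored twice.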

Creating a relationship between two different columns: a relationship that affects other values

I'm going to try to make this as clear as possible; I apologize in advance if my attempt fails.
I work in education and am trying to create a predictive analysis document that will tell us how many class sections we will need to offer in a given semester.
I pulled data from the past five years and consolidated it into a pivot chart. I set the pivot chart to combine all courses with a common Subject, Course Title and Catalog Number (see image below for more detail) and to output three different columns of values based on what we need.
The problem I am facing now is curriculum changes over the years. Some courses in the list are no longer offered, and a new course with a new Course Title, Subject and Catalog Number can now be substituted for the previously required course. Since the data has been pulled into one pivot chart, both the old and the new curriculum courses are in one list.
I would like to somehow create a relationship between the old curriculum courses and the new curriculum courses. If possible, I would like the course names to remain separate, but the values of the old and new courses to be averaged together in their respective rows.
On a new page, I plan to put an easy-to-use form where the user can select a course subject and name, enter some other necessary data, and have the document output the number of course sections needed.
Does anyone know of a way to make a relationship between two cells and have other cells affected by this relationship?
Thanks so much!
Mike
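One way to express the relationship described above is a crosswalk table that pairs each retired course with its replacement. The minimal Python sketch below shows only the averaging logic, assuming one-to-one old/new pairs; the course names and values are hypothetical, and in the spreadsheet itself the crosswalk could be a two-column lookup range.

```python
from statistics import mean

# Hypothetical crosswalk: retired course -> replacement course (one-to-one).
CROSSWALK = {
    "BIO 101 Intro Biology": "BIO 110 Foundations of Biology",
}

def averaged_rows(values, crosswalk):
    """Return a copy of `values` (course -> value) in which each linked
    old/new pair shares the average of the two values. Course names stay
    separate, and unlinked courses keep their own value."""
    result = dict(values)
    for old, new in crosswalk.items():
        present = [k for k in (old, new) if k in values]
        if not present:
            continue
        avg = mean(values[k] for k in present)
        for k in present:
            result[k] = avg
    return result
```

For instance, if the old biology course averaged 10 sections and its replacement 20, both rows would show 15, while an unrelated course keeps its own value.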

Cassandra many-to-many relationship modeling options

In this article the author illustrates several options for modeling a many-to-many relationship in Cassandra. I would like some clarification on two of them:
Why would option 4 take more space? It seems like you are just "appending" Item_by_user to the User column space.
Also, in option 4, how can you define composite columns as the author suggests? It seems like you have two groups of columns, 1) Name and Email, and 2) Likes, where the latter is wide(?). What would the CQL look like that defines Name, Email and wide Likes columns for the User table?
Thanks.
The following images are taken from the article mentioned above:

As far as the first question goes, it looks to me like it will take the same amount of space, just with one row less per user and per item, because you keep everything in a single row.
As for the second question, take a look at static columns (see the CQL documentation). A static column is shared by all rows within one partition (user details in the user table, item details in the items table), and its value can be updated using only the partition key.
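As a minimal sketch of the static-column idea, the Python classes below simulate a hypothetical users partition in memory. The CQL in the comment is an assumed schema for option 4 (name and email declared STATIC, one clustered row per liked item), not code from the article. In real Cassandra a table needs at least one clustering column before it can have static columns.

```python
from dataclasses import dataclass, field

# Assumed CQL for option 4 (illustrative names, not from the article):
#   CREATE TABLE users (
#       user_id  text,
#       name     text STATIC,
#       email    text STATIC,
#       item_id  text,
#       liked_at timestamp,
#       PRIMARY KEY ((user_id), item_id)
#   );

@dataclass
class UserPartition:
    static: dict = field(default_factory=dict)  # one value per partition (name, email)
    likes: dict = field(default_factory=dict)   # clustered rows: item_id -> liked_at

class UsersTable:
    def __init__(self):
        self.partitions = {}

    def _partition(self, user_id):
        return self.partitions.setdefault(user_id, UserPartition())

    def set_static(self, user_id, **columns):
        # Like: UPDATE users SET email = ? WHERE user_id = ?
        # Only the partition key is needed, because the value is per partition.
        self._partition(user_id).static.update(columns)

    def add_like(self, user_id, item_id, liked_at):
        # Like: INSERT INTO users (user_id, item_id, liked_at) VALUES (?, ?, ?)
        self._partition(user_id).likes[item_id] = liked_at

    def select(self, user_id):
        # Like: SELECT * FROM users WHERE user_id = ?
        # Every clustered row carries the same static values. (Simplification:
        # a partition with static values but no likes returns [] here.)
        p = self.partitions.get(user_id, UserPartition())
        return [dict(p.static, item_id=i, liked_at=t) for i, t in sorted(p.likes.items())]
```

Changing the user's email once through set_static changes it on every returned row, which is the "shared by all rows in the partition" behavior described above.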
A second solution is to model the items a user liked as a map column (see the map documentation), and do the same for items (a map of the users who liked that item).
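The map-based alternative can be sketched the same way. This minimal Python sketch assumes hypothetical users.likes and items.liked_by map columns (declared in the comment) and shows that every like is written to both sides, since each table has to answer its own query. Cassandra reads a collection in full, so this layout suits a modest number of likes per user or item.

```python
from collections import defaultdict

# Assumed CQL (illustrative names):
#   CREATE TABLE users (user_id text PRIMARY KEY, name text, likes map<text, timestamp>);
#   CREATE TABLE items (item_id text PRIMARY KEY, title text, liked_by map<text, timestamp>);
#   UPDATE users SET likes[?] = ? WHERE user_id = ?;
#   UPDATE items SET liked_by[?] = ? WHERE item_id = ?;

class LikesStore:
    def __init__(self):
        self._likes = defaultdict(dict)     # user_id -> {item_id: liked_at}
        self._liked_by = defaultdict(dict)  # item_id -> {user_id: liked_at}

    def like(self, user_id, item_id, liked_at):
        # Denormalized: the same fact is written into both maps.
        self._likes[user_id][item_id] = liked_at
        self._liked_by[item_id][user_id] = liked_at

    def unlike(self, user_id, item_id):
        # Like: DELETE likes[?] FROM users WHERE user_id = ? (plus the items side)
        self._likes[user_id].pop(item_id, None)
        self._liked_by[item_id].pop(user_id, None)

    def items_liked_by_user(self, user_id):
        return dict(self._likes.get(user_id, {}))

    def users_who_liked(self, item_id):
        return dict(self._liked_by.get(item_id, {}))
```

The cost of this design is that like and unlike must always touch both tables; the benefit is that "what did this user like" and "who liked this item" are each a single-row read.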

I suggest reading more about data modeling in Cassandra. I found "A Big Data Modeling Methodology for Apache Cassandra" and "Basic Rules of Cassandra Data Modeling" useful in this case. They will help you understand how to model tables based on your queries (the query-driven methodology), as well as data duplication and its advantages and disadvantages.

Retrieving just the joined entities from Core Data

I have a movie showtimes entity which has a one-to-one relationship to a movie entity. The inverse (movie -> movie showtime) relationship is a one-to-many relationship. If a movie is deleted, the associated movie showtimes will also be deleted, but if a movie showtime is deleted the associated movie will stay. (I'm not sure how much of that is relevant, but I wanted to describe the situation as fully as I could.)
Now, is there a way to query Core Data to get only the unique movies for which I have showtimes?
Is it possible to select from the movie showtimes and somehow restrict the results to just the associated unique movies? Or would selecting from the movie entity bring back only the movies with a matching row in the movie showtime entity?

Sure. Write a fetch request on Movie with no restrictions, and you'll get all Movie instances.

Taking your questions in turn:
Now, is there a way to query Core Data to get only the unique movies for which I have showtimes?
You could use a predicate to select movies where the count of showtimes is greater than zero:
"showtimes.@count > 0"
Is it possible to select from the movie showtimes and somehow restrict the results to just the associated unique movies?
If you have (courtesy of your first query) an array (say scheduledMovies) of movies which have showtimes, then you can fetch the associated showtimes using a predicate like this:
"movie IN %@", scheduledMovies
Or would selecting from the movie entity bring back only the movies with a matching row in the movie showtime entity?
If you fetch movies, you will get ALL movies, unless you specify a predicate as in your first question. But if you fetch showtimes, you can get an array of the associated movies using key-value coding with the key path:
"@distinctUnionOfObjects.movie"
The resulting array will not contain any movies without showtimes. (The first part of this key path removes any duplicates, since several showtimes might have the same movie.)
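The three techniques in this answer can be mimicked over plain Python objects to show what each one returns. This is an analogue, not Core Data code: the Movie and Showtime classes and the helper functions are hypothetical stand-ins for the two entities, the predicates "showtimes.@count > 0" and "movie IN %@", and the key path "@distinctUnionOfObjects.movie".

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based equality and hashing, like managed objects
class Movie:
    title: str
    showtimes: list = field(default_factory=list, repr=False)  # to-many inverse

@dataclass(eq=False)
class Showtime:
    time: str
    movie: Movie  # to-one relationship

def add_showtime(movie, time):
    showtime = Showtime(time, movie)
    movie.showtimes.append(showtime)  # keep the inverse relationship in sync
    return showtime

def movies_with_showtimes(movies):
    # Analogue of fetching Movie with the predicate "showtimes.@count > 0"
    return [m for m in movies if len(m.showtimes) > 0]

def showtimes_for(showtimes, scheduled_movies):
    # Analogue of fetching Showtime with "movie IN %@", scheduledMovies
    wanted = set(scheduled_movies)
    return [s for s in showtimes if s.movie in wanted]

def distinct_movies(showtimes):
    # Analogue of the key path "@distinctUnionOfObjects.movie": duplicates are
    # removed. This sketch keeps first-seen order; do not rely on any
    # particular order from the KVC operator.
    seen, result = set(), []
    for s in showtimes:
        if s.movie not in seen:
            seen.add(s.movie)
            result.append(s.movie)
    return result
```

A movie with no showtimes never appears in distinct_movies, because it is never reached from any showtime; that is why the KVC route and the "@count > 0" predicate give the same set of movies.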

solr - define 1:n:n relationship

I have created a flat table from my DB and defined a Solr core on it.
It works very well so far.
My problem is that my data has two hierarchies, so when flattened the table is too big.
Let's consider the following example scenario.
My tables are:
- School
- Students (1:n with School)
- Teachers (1:n with School)
EDIT: Assume all tables in my example have two columns, Name and Description, which I would like to index and search; the searches are user-generated free-text searches over those columns.
Now, each school has many students and teachers, but each student/teacher also has another multivalued field, i.e. the following tables:
- studentHobbies (1:n with Students)
- teacherCourses (1:n with Teachers)
My main entity is School, and that is what I want to get in the results.
Flattening does not help me much and is very expensive.
EDIT: Problems with the query
When you query a flat table by school name, as I described, if the school has 300 students, 300 teachers, 300 teacherCourses and 300 studentHobbies,
you get 8.1 billion rows (300*300*300*300). Searching for the school name will retrieve 8.1 billion rows.
Can you point me to how to define 1:n:n relationships in data-config.xml?
Thanks.
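The row count in the question can be checked with a few lines of arithmetic. The sketch assumes one reading of the question that reproduces its 300*300*300*300 figure: each student has 300 hobbies, each teacher has 300 courses, and the flat table is a single join of both branches. For comparison it also counts how many records exist if each entity is stored once rather than joined; the function names are hypothetical.

```python
def flattened_rows(students, hobbies_per_student, teachers, courses_per_teacher):
    # A single flat join pairs every (student, hobby) row with every
    # (teacher, course) row of the same school: a Cartesian product.
    return (students * hobbies_per_student) * (teachers * courses_per_teacher)

def records_stored_once(students, hobbies_per_student, teachers, courses_per_teacher):
    # One school record plus one record per student, teacher, hobby and course.
    return (1 + students + teachers
            + students * hobbies_per_student
            + teachers * courses_per_teacher)

print(flattened_rows(300, 300, 300, 300))       # 8100000000 (8.1 billion)
print(records_stored_once(300, 300, 300, 300))  # 180601
```

The gap comes from multiplying the two independent branches (students and teachers) together; keeping each branch as its own 1:n:n chain under the school avoids that product.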

Found it:
documentation of 1:n, 1:n:n and n:n relations