I am migrating RDBMS tables to Cassandra. We track customers who subscribe to different categories. Some categories already exist, but new ones may be added over time. Right now we join the tables.
details table1 - custid, name, address, phone
details table2 - custid, cat1, cat2, cat3, cat4
category details - catid, catname, catregion, catdescription, iscatmanadatory
In Cassandra I plan to use the customer id and name as the primary key, and to keep the categories each customer subscribes to in a map. If new categories are added, will the collection column create any bottlenecks?
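For reference, the table described above might look like the following sketch. The table name, the column types, and the sample values are assumptions, not part of the original schema. It only illustrates that subscribing a customer to a newly introduced category is a write to one map entry rather than a schema change; it says nothing about performance.

```cql
-- Hypothetical table: one row per customer, with subscriptions held in a map
-- keyed by category id (catid) and the category name as the value.
CREATE TABLE customer_subscriptions (
    custid int,
    name text,
    categories map<int, text>,
    PRIMARY KEY (custid, name)
);

-- Subscribing customer 42 to category 7 touches only that map entry.
UPDATE customer_subscriptions
SET categories[7] = 'sports'
WHERE custid = 42 AND name = 'Alice';
```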
I am using Cassandra for storing contest data.
Currently I have a contest table like this (table contest_score):
And I created a materialized view for ranking users in a contest (table contest_ranking):
To get the top 10 users of a contest I can simply query select top 10 from contest_ranking;
But how can I get the rank of a specific user? For example, user_id = 4 will have rank 2.
The principal philosophy of data modelling in Cassandra is that you need to design a CQL table for each application query. It is a one-to-one mapping between app queries and CQL tables.
Since you have a completely different application query, you need to create a separate table for it. Here's an example schema:
CREATE TABLE rank_by_userid (
    user_id int,
    rank int,
    PRIMARY KEY (user_id)
);
You can then get the rank of a user with this query:
SELECT rank FROM rank_by_userid WHERE user_id = ?;
You have to manually create and maintain this new table because you won't be able to populate it with materialized views. Cheers!
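A minimal sketch of what maintaining the table by hand could look like, assuming the application computes the ranks itself (for example, whenever a contest's standings are recalculated) and writes them back. The values are the ones from the question; everything else is illustrative.

```cql
-- Written by the application whenever ranks are recalculated.
-- An INSERT in Cassandra is an upsert, so it overwrites the previous rank.
INSERT INTO rank_by_userid (user_id, rank) VALUES (4, 2);

-- Single-partition read by primary key.
SELECT rank FROM rank_by_userid WHERE user_id = 4;
```

If ranks are tracked per contest, the key would presumably need to include a contest id as well. That column is not shown in the question, so it is left out here.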
I've been using Node.js with Sequelize to work with a PostgreSQL database, and for a while now I have been facing an issue.
I have two tables:
FIRST TABLE: Purchases. This table has a column that holds the foreign key of the Products table, because the two are associated. But as I kept coding, I realized that I need to "insert" more than one product at once, like an array, for people who buy more than one product in a single purchase.
SECOND TABLE: Products.
I want something like this: allow a purchase in Purchases to have more than one product associated with it. But all I can do is make the product foreign key column in the purchases table accept a single integer (the ID of only one product).
For example:
User X bought multiple products, so the product column in Purchases would hold [1,3,5], where these numbers are the product IDs that I would like to associate with the Products table.
Screenshot of the Purchases model (the Sequelize model, not the migration)
Screenshot of the Purchases table structure
Screenshot of the Products table structure
The conclusion I have reached is that I should use "Belongs to Many" or "Has Many", but I don't know how.
Thanks.
I suggest adding another table to support multiple products in one order:
ORDER - stores one record per customer order (all columns that relate to the order as a whole)
ORDER_ITEMS - stores the items inside each order (columns: a link to ORDER, a link to PRODUCT, a quantity, a price, and other related columns such as a discount)
PRODUCT - stores the catalog of products available to buy
I'm confused as to how primary keys in Cassandra allow for quick data access. Say for example I create a table of Students with the following schema columns:
I choose Student Id as the primary key. My understanding is that the students will be distributed around the cluster based on some hash of this value. Say I also choose Country as a clustering column. Then within each partition of the students (who have been split based on their Id) they will be ordered by Country (presumably alphabetically).
So if I then want to retrieve all students for a specific country will I have to visit multiple nodes in the cluster? While the students have been ordered by Country within each node there is nothing to say that all the students for a specific country have been stored on the same node? Is this type of query even supported?
If I had only added 5 students to a 5-node cluster, would it be possible that all the students would be stored on separate nodes if the Student Id was a UUID?
So if I then want to retrieve all students for a specific country will I have to visit multiple nodes in the cluster?
Yes.
While the students have been ordered by Country within each node there is nothing to say that all the students for a specific country have been stored on the same node?
Correct.
Is this type of query even supported?
It is, but it is considered an anti-pattern in Cassandra. The coordinator (the node that receives the request from the client) has to query all the other nodes, because every row in that column family must be scanned.
If I had only added 5 students to a 5-node cluster, would it be possible that all the students would be stored on separate nodes if the Student Id was a UUID?
Yes.
The way to solve your problem is to have a column family for each query (one for selecting by Student ID and one for selecting by Country, each with a different primary key) and to duplicate the rows: when you create a student, you insert it into both column families.
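As a rough sketch of that approach, the two column families could be declared as follows. The table names, the name column, and the sample values are assumptions made for illustration, since the original schema is not shown.

```cql
-- Query 1: look up a student by id.
CREATE TABLE students_by_id (
    student_id uuid,
    name text,
    country text,
    PRIMARY KEY (student_id)
);

-- Query 2: list all students of a country.
-- country is the partition key, so one country's rows live in one partition;
-- student_id is a clustering column that keeps each row unique.
CREATE TABLE students_by_country (
    country text,
    student_id uuid,
    name text,
    PRIMARY KEY (country, student_id)
);

-- Creating a student means writing to both tables.
BEGIN BATCH
    INSERT INTO students_by_id (student_id, name, country)
    VALUES (123e4567-e89b-12d3-a456-426614174000, 'Ana', 'Portugal');
    INSERT INTO students_by_country (country, student_id, name)
    VALUES ('Portugal', 123e4567-e89b-12d3-a456-426614174000, 'Ana');
APPLY BATCH;

-- Served from a single partition instead of a cluster-wide scan.
SELECT student_id, name FROM students_by_country WHERE country = 'Portugal';
```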
Folks,
How would you model the following data in Apache Cassandra?
Customers (customerID), can purchase many items (itemID) and hold them in their shopping cart. Also a timestamp should be kept to keep track of when things were placed in the shopping cart.
Requirements:
Fetch a specific customerID's shopping cart.
Fetch customers that currently have itemID in their cart.
My knee-jerk thought would be to have 2 tables. One would map customerID to an array of itemIDs. I'm not sure how to meet the second requirement.
The question is specific to Cassandra, and maybe Dynamo. Please, no relational DB suggestions.
For Cassandra it sounds like two tables to me. One partitioned by customerID with itemID as a clustering column and timestamp as a non-key field. The second table would be partitioned by itemID with customerID as a clustering column.
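A sketch of those two tables, assuming integer ids and hypothetical table and column names:

```cql
-- Requirement 1: fetch a customer's shopping cart.
CREATE TABLE cart_by_customer (
    customer_id int,
    item_id int,
    added_at timestamp,   -- when the item was placed in the cart
    PRIMARY KEY (customer_id, item_id)
);

-- Requirement 2: fetch the customers that currently have an item in their cart.
CREATE TABLE customers_by_item (
    item_id int,
    customer_id int,
    PRIMARY KEY (item_id, customer_id)
);

-- Each requirement becomes a single-partition read.
SELECT item_id, added_at FROM cart_by_customer WHERE customer_id = 1001;
SELECT customer_id FROM customers_by_item WHERE item_id = 55;
```

Adding or removing a cart item then means writing to both tables from the application.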
If you can wait for Cassandra 3.0, then the second table could be defined as a materialized view of the first table. Then Cassandra would take care of updating the second table automatically. Otherwise you'll have to keep both tables consistent in your application.
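With materialized views available, the second table might be declared roughly like this. The base table is restated so the example is self-contained; all names and types are assumptions.

```cql
-- Base table: partitioned by customer, clustered by item.
CREATE TABLE IF NOT EXISTS cart_by_customer (
    customer_id int,
    item_id int,
    added_at timestamp,
    PRIMARY KEY (customer_id, item_id)
);

-- View keyed the other way round. Every primary key column of the view
-- must be restricted with IS NOT NULL.
CREATE MATERIALIZED VIEW customers_by_item_mv AS
    SELECT item_id, customer_id, added_at
    FROM cart_by_customer
    WHERE item_id IS NOT NULL AND customer_id IS NOT NULL
    PRIMARY KEY (item_id, customer_id);

-- Writes go to cart_by_customer only; reads by item go to the view.
SELECT customer_id FROM customers_by_item_mv WHERE item_id = 55;
```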
I have a column family like
object
(
    object_id,
    company_id,
    group_id,
    family_id,
    description,
    ..
);
I want to query it by object id, company id, group id, and any combination of these.
My question is: should I make a composite primary key (object id, company id, group id) or create separate column families?
Only object id is unique in the CF. Company id can repeat in multiple rows, but group id does not repeat in many rows.
You may well want to duplicate your data in multiple CFs depending on your query patterns. This is quite common practice.
If a common query is "Get all objects by company_id", then you might want to store all objects in a CF partitioned by company_id alone as the row key. If you need to do individual object lookups as well, then you store that data duplicated in another CF, with each object partitioned by object_id. If groups are always a subset of a specific company, perhaps you want to use company as the row key and then cluster by group.
You should be designing your Cassandra schema based on the queries you need to run, rather than the data that needs to go in it.
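As an illustration of that advice, a sketch of two query-driven tables for the object data follows. The column types and table names are assumptions, and the clustering by group relies on the stated condition that a group always belongs to one company.

```cql
-- "Get all objects by company_id": partition by company,
-- cluster by group and then by object so every row stays unique.
CREATE TABLE objects_by_company (
    company_id int,
    group_id int,
    object_id int,
    family_id int,
    description text,
    PRIMARY KEY (company_id, group_id, object_id)
);

-- Individual object lookups: the same data duplicated, partitioned by object_id.
CREATE TABLE objects_by_id (
    object_id int,
    company_id int,
    group_id int,
    family_id int,
    description text,
    PRIMARY KEY (object_id)
);

-- Queries each table is designed to serve.
SELECT * FROM objects_by_company WHERE company_id = 10;
SELECT * FROM objects_by_company WHERE company_id = 10 AND group_id = 3;
SELECT * FROM objects_by_id WHERE object_id = 777;
```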