Many-To-Many Cassandra Database - cassandra

Let's say I have users. Those users can have access to multiple projects, and a project can also allow multiple users.
So I modeled four tables: users (by id), projects (by id), projects_by_user and users_by_project.
-----------   ------------   --------------------   --------------------
| users   |   | projects |   | projects_by_user |   | users_by_project |
|---------|   |----------|   |------------------|   |------------------|
| id    K |   | id     K |   | user_id        K |   | project_id     K |
| name    |   | name     |   | project_id     C |   | user_id        C |
-----------   ------------   | project_name   S |   | user_name      S |
                             --------------------   --------------------
I store the user_name in the users_by_project table and the project_name in the projects_by_user table for querying.
The problem I have is that when a user updates the project_name, this will of course update the projects table. But for data consistency I also need to update each partition in the projects_by_user table.
As far as I can see, this is only possible by querying all the users from the users_by_project table and doing an update for each user.
Is there any better way without first reading lots of data?
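For concreteness, the read-then-update approach described above might look like the following sketch. It uses plain Python dictionaries as stand-ins for the Cassandra tables; the data, the dictionary layout and the function name rename_project are assumptions made for illustration, not driver code.

```python
# In-memory stand-ins for the Cassandra tables (hypothetical data).
projects = {123: "Old name"}                  # project_id -> name
users_by_project = {123: [1, 2]}              # project_id -> user_ids
projects_by_user = {                          # user_id -> {project_id: project_name}
    1: {123: "Old name", 456: "Other"},
    2: {123: "Old name"},
}


def rename_project(project_id: int, new_name: str) -> int:
    """Rename a project and fan the change out to every user's partition.

    Returns the number of denormalized rows that were rewritten.
    """
    projects[project_id] = new_name
    # Step 1: one read of the users_by_project partition for this project.
    user_ids = users_by_project.get(project_id, [])
    # Step 2: one write per user, addressed by (user_id, project_id).
    for user_id in user_ids:
        projects_by_user[user_id][project_id] = new_name
    return len(user_ids)


updated = rename_project(123, "New name")
print(updated, projects_by_user)
```

The cost is one read plus one write per user who has access to the project, which is exactly the overhead the question is asking about.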

I don't see why you need four tables. Your users and projects tables could contain all of the data.
If you define the tables like this:
CREATE TABLE users (
    user_id int PRIMARY KEY,
    name text,
    project_ids list<int> );
CREATE TABLE projects (
    project_id int PRIMARY KEY,
    name text,
    user_ids list<int> );
Then each user would have a list of project ids they have access to, and each project would have a list of users that have access to it.
To add access to project 123 to user 1 you would run:
BEGIN BATCH
UPDATE users SET project_ids = project_ids + [123] WHERE user_id=1;
UPDATE projects SET user_ids = user_ids + [1] WHERE project_id=123;
APPLY BATCH;
To change a project name, you would just do:
UPDATE projects SET name = 'New project name' WHERE project_id=123;
For simplicity I showed the id fields as ints, but normally you would use uuids for that.

I don't think there is a better way. Cassandra has a lot of limitations on the queries you can make. In your case, you have to create a compound key (user_id, project_id), and in order to update a row you have to provide both parts in the WHERE clause, which means you have to read all users for the specific project and update each of them. If you have a large database and this scenario happens often, this would be significant overhead, so I guess it would be better to remove the project_name field from the projects_by_user table and perform the join of projects and projects_by_user at the application level.
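As a rough illustration of the application-level join, here is a sketch that again uses Python dictionaries in place of the tables. The sample data and the function names are hypothetical; the point is only that projects_by_user no longer carries project_name, so a rename touches a single row.

```python
# In-memory stand-ins for the tables (hypothetical data).
projects = {123: "Apollo", 456: "Gemini"}       # project_id -> name
projects_by_user = {1: [123, 456], 2: [123]}    # user_id -> project_ids only


def rename_project(project_id: int, new_name: str) -> None:
    """A rename is a single write, because the name is stored in one place."""
    projects[project_id] = new_name


def project_names_for_user(user_id: int) -> list[tuple[int, str]]:
    """Join projects_by_user with projects in application code."""
    project_ids = projects_by_user.get(user_id, [])
    # One lookup per project id; ids without a projects row are skipped.
    return [(pid, projects[pid]) for pid in project_ids if pid in projects]


rename_project(123, "Apollo 11")
names_user_1 = project_names_for_user(1)
names_user_2 = project_names_for_user(2)
print(names_user_1, names_user_2)
```

The trade-off is that reads now need one extra lookup per project, in exchange for never having to rewrite every user's partition on a rename.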
BTW: The scenario you described is a better fit for a relational database model, so if the rest of your data model is similar to this, I would consider using a relational database instead.

Related

How to join two tables, apply where on both tables , apply pagination with bookshelf node js

I have two tables mentioned below:
Reports
Id | status
1  | Active
Reports_details
Id | Cntry | State | City
1  | IN    | UP    | Delhi
1  | US    | Texas | Salt lake
Now my requirement is
Select distinct r.Id from Reports r left join Reports_details rd on r.Id=rd.Id where r.status='Active' and contains(city,'"Del*"')
Note: using contains for full text search
Problem: how do I add a where clause on both tables' Bookshelf models simultaneously,
and how do I fetch the above query's data with pagination?
I tried creating two respective models with belongsTo and hasMany, but the issue comes when applying where on either model: it does not accept a where clause from both tables (error: Invalid column name).
I'd appreciate your suggestions for a workaround. Thank you.

Extract unique Tags and Count

Good morning,
I am trying to run a Kusto query to display unique owner tags, so that a chart shows the number of times each owner shows up in Azure. I want all the distinct owners and cost centers to show up, but I am having trouble figuring out the best way to do that. Below is an example, but I would need to specify multiple owners to produce a chart that shows all the unique owners in Azure:
resources
| where tags['owner'] =~ "Billy"
| summarize count()
This should give you a direction. You'll need to do something similar for the "cost center" property (or update your question to elaborate on which part of your data set it comes from)
resources
| extend owner = tostring(tags.owner)
| summarize count() by owner
| render barchart // you can choose a different type of chart
Given your modified description ("If I have two sets of tags, one that is (Owner) and the other is (owner), how would I combine them together in the query?"), you could do this (assuming you can't fix the source data, which would be better):
union (
resources
| extend owner = tostring(tags.owner) // lowercase 'o' in 'tags.owner'
| summarize count() by owner
), (
resources
| extend owner = tostring(tags.Owner) // uppercase 'O' in 'tags.Owner'
| summarize count() by owner
)
| summarize sum(count_) by owner
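To see what the union followed by the final summarize computes, here is a small Python sketch over made-up resource records. The records and the helper count_by are hypothetical, and the sketch simply skips resources that lack the tag, which may differ from how the query treats them.

```python
from collections import Counter

# Hypothetical resources; some use 'owner', some use 'Owner'.
resources = [
    {"tags": {"owner": "Billy"}},
    {"tags": {"Owner": "Billy"}},
    {"tags": {"owner": "Ana"}},
    {"tags": {}},
]


def count_by(tag_key: str) -> Counter:
    """Count resources per value of one tag key (like 'summarize count() by owner')."""
    return Counter(r["tags"][tag_key] for r in resources if tag_key in r["tags"])


# Adding the two counters plays the role of 'summarize sum(count_) by owner'.
combined = count_by("owner") + count_by("Owner")
print(combined)
```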

How do I implement a node js query that is supposed to display data for specific model from two different tables? I am using the loopback framework

I have a table that holds assessment data with the following headings:
Stud_id | SessionId | CourseId | ProgId | .....
And a student table that holds student data with the following headings:
id | IndexNo | S_name | f_name | ProgramId
The relationship between these two tables is that a student has many assessments and an assessment belongs to a student.
What I want to do is list all students who registered for a course, with these fields (IndexNo, S_name, f_name, Course), and expose the result as a JSON endpoint to a client that makes a request with courseId, SessionId and deptId.

Cassandra CQL searching for element in list

I have a table that has a column of list type (tags):
CREATE TABLE "Videos" (
video_id UUID,
title VARCHAR,
tags LIST<VARCHAR>,
PRIMARY KEY (video_id, upload_timestamp)
) WITH CLUSTERING ORDER BY (upload_timestamp DESC);
I have plenty of rows containing various values in the tags column, e.g. ["outdoor","funny cats","funny mice"].
I want to perform a SELECT query that will return all rows that contain "funny cats" in the tags column. How can I do that?
To directly answer your question, yes there is a way to accomplish this. As of Cassandra 2.1 you can create a secondary index on a collection. First, I'll re-create your column family definition (while adding a definition for upload_timestamp timeuuid) and put some values in it.
aploetz@cqlsh:stackoverflow> SELECT * FROM videos ;
 video_id                             | upload_timestamp                     | tags                                          | title
--------------------------------------+--------------------------------------+-----------------------------------------------+---------------------------
 2977b806-df76-4dd7-a57e-11d361e72ce1 | fc011080-64f9-11e4-a819-21b264d4c94d | ['sci-fi', 'action', 'adventure']             | Star Wars
 ab696e1f-78c0-45e6-893f-430e88db7f46 | 8db7c4b0-64fa-11e4-a819-21b264d4c94d | ['documentary']                               | The Witches of Whitewater
 15e6bc0d-6195-4d8b-ad25-771966c780c8 | 1680d120-64fa-11e4-a819-21b264d4c94d | ['dark comedy', 'action', 'language warning'] | Pulp Fiction
(3 rows)
Next, I'll create a secondary index on the tags column:
aploetz@cqlsh:stackoverflow> CREATE INDEX ON videos (tags);
Now, if I want to query the videos that contain the tag "action," I can accomplish this with the CONTAINS keyword:
aploetz@cqlsh:stackoverflow> SELECT * FROM videos WHERE tags CONTAINS 'action';
 video_id                             | upload_timestamp                     | tags                                          | title
--------------------------------------+--------------------------------------+-----------------------------------------------+--------------
 2977b806-df76-4dd7-a57e-11d361e72ce1 | fc011080-64f9-11e4-a819-21b264d4c94d | ['sci-fi', 'action', 'adventure']             | Star Wars
 15e6bc0d-6195-4d8b-ad25-771966c780c8 | 1680d120-64fa-11e4-a819-21b264d4c94d | ['dark comedy', 'action', 'language warning'] | Pulp Fiction
(2 rows)
With this all being said, I should pass along a couple of warnings:
Secondary indexes do not perform well at scale. They exist to provide convenience, not performance. If you are expecting to have to query by tag often, then the right way to solve this would be to create a videosbytag query table, with the same data but keyed like this: PRIMARY KEY (tag,video_id)
You don't need the double-quotes in your table name. In fact, having it in quotes may cause you problems (ok, maybe minor irritations) down the road.
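To make the query-table idea from the first warning concrete, here is a sketch in Python that derives a videosbytag-style structure from the video rows: one entry per (tag, video_id) pair. The short video ids and the function name are placeholders for illustration; in Cassandra the application would write these rows alongside the main table.

```python
from collections import defaultdict

# Hypothetical rows mirroring the sample data, with shortened ids.
videos = [
    {"video_id": "v1", "title": "Star Wars",
     "tags": ["sci-fi", "action", "adventure"]},
    {"video_id": "v2", "title": "The Witches of Whitewater",
     "tags": ["documentary"]},
    {"video_id": "v3", "title": "Pulp Fiction",
     "tags": ["dark comedy", "action", "language warning"]},
]


def build_videos_by_tag(rows):
    """Denormalize: one entry per (tag, video_id), like PRIMARY KEY (tag, video_id)."""
    table = defaultdict(dict)
    for row in rows:
        for tag in row["tags"]:
            table[tag][row["video_id"]] = row["title"]
    return table


videos_by_tag = build_videos_by_tag(videos)
# Looking up a tag is now a single-partition read, with no secondary index.
action_titles = sorted(videos_by_tag["action"].values())
print(action_titles)
```

The price of this layout is that each video is stored once per tag, so adding or removing a tag means writing to the query table as well.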

Relational Stores & Many-to-one joins

David, could I ask for some clarification on what you say about joins in this answer?
When you say "You cannot, using the join of the relational stores, join one entry to multiple ones", does that mean in any direction?
E.g. Store 1:
| Key 1 | Measure1 |
Store 2:
| Key 1 | SomeId1 | Measure2 | Measure3 |
| Key 1 | SomeId2 | Measure4 | Measure4 |
So is it not possible to join these two stores by putting the join from Store 2 to Store 1?
And if not, are you saying then that the only way to manage this is to duplicate the entries in Store 1? E.g.:
Store 1
| Key 1 | SomeId1 | Measure1 | Measure2 | Measure3 |
| Key 1 | SomeId2 | Measure1 | Measure4 | Measure4 |
The direction matters for the one-to-many: it depends on which store is the "parent" one.
The relational stores include the concept of an "ActivePivot Store", which is your main store (the one your schema is based on). This store can then be joined to one or more stores, given a set of key fields; we'll call these "child" stores for simplicity. Each of these child stores can in turn be joined with other stores, and so on (you can represent it with a directed graph).
The main rule to respect is that you should never have a "parent" store entry resolving to multiple "child" store entries (nor should you have any cyclic relationship, I believe).
The simplified idea behind the relational stores (as of RS 1.5.x / AP 4.4.x) is that when one entry is submitted into the "ActivePivot Store", it will recursively resolve the joins, starting from the ActivePivot Store, in order to retrieve at most one entry in each of the joined stores. Depending on your schema definition, these entries will then be used to populate the fact before inserting it in the cube.
If resolving a join results in more than one entry, then AP will not be able to choose which one to use to populate the fact and will throw an exception.
Coming back to your example, you can do the join between Store 1 and Store 2 only in the case where Store 2 is your ActivePivot Store or a "parent" of Store 1 (APStore->...->Store2->Store1), which seems to be your case.
If not (Store1->Store2), you will have to duplicate the entries of Store 1 in order to ensure that resolving the join always finds at most one entry. Store 1 will then look like:
| Key 1 | SomeId1 | Measure1 |
| Key 1 | SomeId2 | Measure1 |
Your join with Store 2 will then be done on the fields "Key, SomeId" instead of just "Key", which ensures that you find only one entry when resolving Store1->Store2.
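The "at most one child entry per parent entry" rule can be illustrated with a few lines of plain Python. This is not ActivePivot's API: resolve_join, the field names and the sample values are all assumptions chosen to mirror the example stores above.

```python
def resolve_join(parent_entry, child_store, key_fields):
    """Return the single child entry matching parent_entry on key_fields.

    Returns None when nothing matches and raises when the match is ambiguous.
    """
    wanted = tuple(parent_entry[f] for f in key_fields)
    matches = [e for e in child_store
               if tuple(e[f] for f in key_fields) == wanted]
    if len(matches) > 1:
        raise ValueError(f"{len(matches)} child entries match {wanted}")
    return matches[0] if matches else None


store1 = [{"Key": 1, "Measure1": 10.0}]
store2 = [
    {"Key": 1, "SomeId": "SomeId1", "Measure2": 1.0},
    {"Key": 1, "SomeId": "SomeId2", "Measure2": 2.0},
]

# Store2 -> Store1 on "Key": each Store 2 entry resolves to one Store 1 entry.
resolved = [resolve_join(e, store1, ["Key"]) for e in store2]

# Store1 -> Store2 on "Key": ambiguous, two Store 2 entries share the key.
try:
    resolve_join(store1[0], store2, ["Key"])
    ambiguous = False
except ValueError:
    ambiguous = True

# Store 1 duplicated per SomeId and joined on ("Key", "SomeId"): unambiguous.
store1_dup = [
    {"Key": 1, "SomeId": "SomeId1", "Measure1": 10.0},
    {"Key": 1, "SomeId": "SomeId2", "Measure1": 10.0},
]
resolved_dup = [resolve_join(e, store2, ["Key", "SomeId"]) for e in store1_dup]
```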
