CQL pagination through User Defined Type - cassandra

Let's say I have this table:
CREATE TABLE mykeyspace.post (
id uuid PRIMARY KEY,
title text,
photos set<frozen <photoIds>>
);
and this UDT:
CREATE TYPE mykeyspace.photoIds (
photoId uuid,
details text
);
How can I paginate through photos, i.e. fetch 10 photos at a time for a given post id?

Paging through collections is not supported.
See the reference manual:
Keep collections small to prevent delays during querying because Cassandra reads a collection in its entirety. The collection is not paged internally.
As discussed earlier, collections are designed to store only a small amount of data.

May I propose another schema for your post table:
CREATE TYPE mykeyspace.photo (
id uuid,
details text
);
CREATE TABLE mykeyspace.post (
id uuid,
title text static,
photo frozen<photo>,
PRIMARY KEY (id, photo)
);
This schema means:
There is one partition per id, so a partition is equivalent to a post
There is one title per partition/post
There are multiple photos per partition/post
This schema should serve your goal very well until you reach about 100,000 photos per partition/post.
If you have never used static columns before, you can refer to the DataStax documentation.
The driver can do the paging for you. See the Paging feature in the DataStax Java driver documentation.
The CQL query looks like this:
select photo.id, photo.details from post where id=*your_post_id*;
PS: I think you should avoid uppercase in your schema, since unquoted identifiers are folded to lowercase anyway.
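As a rough illustration of driver-side paging against the schema proposed above, here is a minimal Python sketch. It assumes the DataStax Python driver rather than the Java driver mentioned in the answer, and an already connected `session`; `iter_photo_pages` is a hypothetical helper name. It relies only on the session's `default_fetch_size` attribute and on the result set's `current_rows`, `has_more_pages` and `fetch_next_page()`.

```python
def iter_photo_pages(session, post_id, page_size=10):
    """Yield lists of at most page_size photo rows for one post.

    `session` is assumed to be a connected cassandra-driver Session.
    """
    # Assumption: changing the session-wide fetch size is acceptable here.
    # It applies to every later query on this session, not just this one.
    session.default_fetch_size = page_size
    result = session.execute(
        "SELECT photo.id, photo.details FROM mykeyspace.post WHERE id = %s",
        (post_id,),
    )
    while True:
        # current_rows holds only the page that has already been fetched
        yield list(result.current_rows)
        if not result.has_more_pages:
            break
        # Ask the driver for the next page of the same partition
        result.fetch_next_page()
```

Each yielded list corresponds to one page, so a caller can stop after the first page and never pull the remaining photos.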

Related

Hard time understanding Cassandra query

In Cassandra, I understand that tables are supposed to be created according to what needs to be queried. For example, I currently have a Users and Users_By_Status table.
##Users##
CREATE TABLE Users (
user_id uuid,
name text,
password text,
status int,
username text,
PRIMARY KEY (user_id)
);
CREATE INDEX user_username_idx ON Users (username);
##Users_By_Status##
CREATE TABLE Users_By_Status (
username text,
status int,
user_id uuid,
PRIMARY KEY (username, status, user_id)
);
In this case, if a user leaves, their record won't be deleted. Instead, status will be changed from 1 to 0.
If I insert data into the Users table, do I need to manually insert the data into Users_By_Status table too? What happens if I update the status in Users? Do I need to manually update the record in Users_By_Status table too?
I have a feeling I'm understanding Cassandra wrongly. Appreciate all the help I can get.
Short answer: yes, in your case you need to do it manually.
In Cassandra you need to write more code in your application layer to handle scenarios like that.
But there are other options, such as materialized views or BATCH statements.
For your solution, I think a materialized view is the best option. You can create one from your Users table. Note that a view's primary key must contain every primary key column of the base table and at most one other column, so username and status cannot both be key columns here. Something like this:
CREATE MATERIALIZED VIEW Users_By_Status
AS SELECT username, status, user_id
FROM Users
WHERE status IS NOT NULL AND user_id IS NOT NULL
PRIMARY KEY (status, user_id);
And yes, when you update table users, the update will happen in the Materialized View Users_By_Status too.
Reference: https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useCreateMV.html
Do I need to manually update the record in Users_By_Status table too?
So CoutinhoBR alluded to it, but I'll come right out and say it. You cannot update primary key values in Cassandra. So that's where a DELETE is required to get the old status value out of there, and then a write for the new one.
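To make the delete-then-write step concrete for the manually maintained table from the question (keyed on username, status, user_id), here is a minimal Python sketch. It assumes a connected DataStax Python driver `session`; `change_status` is a hypothetical helper, and the caller is assumed to know the user's previous status.

```python
def change_status(session, username, user_id, old_status, new_status):
    """Move a user from old_status to new_status in both tables."""
    # In Users, status is a regular column, so a plain UPDATE is enough
    session.execute(
        "UPDATE Users SET status = %s WHERE user_id = %s",
        (new_status, user_id),
    )
    # In Users_By_Status, status is part of the primary key:
    # remove the row that carries the old value ...
    session.execute(
        "DELETE FROM Users_By_Status "
        "WHERE username = %s AND status = %s AND user_id = %s",
        (username, old_status, user_id),
    )
    # ... and write a fresh row that carries the new one
    session.execute(
        "INSERT INTO Users_By_Status (username, status, user_id) "
        "VALUES (%s, %s, %s)",
        (username, new_status, user_id),
    )
```

As written, the three statements are sent one after another and are not atomic; grouping them is what the BATCH statements mentioned in the other answer are for.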

MongoDb integrating with external db

I have a database which contains data from two separate systems/servers. The first is generated locally [I develop and create this data] (users, activity logs, orders, ...). The second comes from a "product provider" [I only have READ access from API] These objects were created by MySQL and sent in JSON. They already have an "id" property.
With NodeJS, I use request to get a product by "id", and then store it with newProduct.save(), which appends an _id.
In products, "id" is necessary to form relationships with the other collections in my database (such as products_price), and to access dynamic endpoints, such as "products/:id/promos".
Note that products are constantly being updated externally and I need to be able to update my documents by "id" not by "_id" as the external server has no knowledge about "_id." [id is unique on a collection level, as each collection is a fresh iteration]
For my first question: should I treat "product.id" as a "regular" MongoDB field and use aggregate/lookup to merge documents from my collections? Or should I overwrite ObjectID() with id? (before saving rename "id" to "_id")
At some point, Orders (local) and Products (external) need to form a relationship where Order _id and Product id (or _id) are stored together for easy retrieval.
Which id do I use in this case?
If you are sure that the 'id' coming from your product provider API is unique, you are better off using it as _id (overwriting _id). It will save you:
an unneeded index ('_id' is indexed any way)
some CPU cycles that mongoDB would take to produce the ObjectID
some disk and memory space
(*) Even if you find yourself dealing with many different product providers, assuming each one uses its own unique product ids, you could use a combined _id to keep it unique, such as:
_id = {provider: 'foo', id: xxx}
or _id = provider_name + product_id
etc.
(An array such as [provider_name, product_id] will not work, because MongoDB does not accept arrays as _id values.)
With multiple providers, the best format depends on how you plan to fetch those products later.
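A minimal sketch of the "overwrite _id" approach, written in Python rather than the NodeJS used in the question. It assumes a pymongo-style `collection` object and uses only `replace_one(filter, replacement, upsert=True)`; `to_mongo_doc` and `upsert_product` are hypothetical helper names.

```python
def to_mongo_doc(product, provider=None):
    """Return a copy of an API product with its external id moved to _id."""
    doc = dict(product)  # leave the caller's object untouched
    external_id = doc.pop("id")
    if provider is None:
        doc["_id"] = external_id
    else:
        # With several providers, a sub-document keeps the pair unique
        doc["_id"] = {"provider": provider, "id": external_id}
    return doc


def upsert_product(collection, product, provider=None):
    """Insert the product, or overwrite the stored copy with the same _id."""
    doc = to_mongo_doc(product, provider)
    collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
    return doc["_id"]
```

Because the external id is the _id, repeated updates from the provider land on the same document, and an order only has to store that one value to reference a product.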

Cassandra Nested Data Model/Structure

I have a problem with the nested data model in my project.
Below are the scripts that create the table and the user-defined types in my project.
// main table for keeping journey information
CREATE TABLE journey (
journeyid uuid,
journeyname text,
createdate timestamp,
journeyassetdetail LIST<FROZEN<assettype>>, // this is materials for journey
journeylist LIST<FROZEN<subjourneylist>>, // any journey can be a sub journey in other journey (a journey can have one or more sub-journey)
PRIMARY KEY (journeyid)
);
CREATE TYPE subjourneylist (
action FROZEN<actions>,
product FROZEN<products>,
suborderjourney int,
subjourneyid uuid,
createdate timestamp
);
CREATE TYPE assettype (
type text,
file LIST<FROZEN<file>>
);
CREATE TYPE file (
assetfileid uuid,
filename text,
url text
);
As you can see, there are 2 UDTs on my journey table (assettype and subjourneylist), which means a journey row can hold one or many sub-journeys and asset details. I designed the data model like this because I am concerned about READ performance: my developers need to get everything in a single round trip to the database.
But looking at UPDATE, the problem is that when I need to update something in any asset or sub-journey, I have to apply the updated data to the main journey table, and we don't know an easy way to do that.
Right now, I have to use other tools or a self-developed program to prepare a script that re-creates the whole journey.
Do you have any suggestions for my data model, or do I have to consider another data model to support both my reads and writes?
Please feel free to give me an example or any suggestions.
Thank you very much.

Can we add primary key to collection datatypes?

When I tried to query the table using the CONTAINS keyword, I got "Cannot use CONTAINS relation on non collection column col1", but when I tried to create the table using
CREATE TABLE test (id int,address map<text, int>,mail list<text>,phone set<int>,primary key (id,address,mail,phone));
I got "Invalid collection type for PRIMARY KEY component phone"
One of the basics in Cassandra is that you can't modify primary keys. Always keep that in mind.
You can't use a collection as primary key unless it is frozen, meaning you can't modify it.
This will work
CREATE TABLE test (id int,address frozen<map<text, int>>,mail frozen<list<text>>,phone frozen<set<int>>,primary key (id,address,mail,phone));
However, I think you should take a look at this document: http://www.datastax.com/dev/blog/cql-in-2-1
You can put secondary indexes on collections as of Cassandra 2.1. You may want to use that functionality instead.
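A minimal sketch of that alternative: keep the collections out of the primary key, index the set, and query it with CONTAINS. It assumes a connected DataStax Python driver `session`; the index name `test_phone_idx` and both helper functions are hypothetical.

```python
# The collections stay regular (non-frozen) columns; only id is the key
SCHEMA = (
    "CREATE TABLE test (id int PRIMARY KEY, mail list<text>, phone set<int>)",
    "CREATE INDEX test_phone_idx ON test (phone)",
)


def create_schema(session):
    """Create the table and the secondary index on the phone set."""
    for statement in SCHEMA:
        session.execute(statement)


def find_ids_by_phone(session, number):
    """Return the rows whose phone set contains the given number."""
    rows = session.execute(
        "SELECT id FROM test WHERE phone CONTAINS %s",
        (number,),
    )
    return list(rows)
```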

Cassandra secondary index using collection type

Here is a cassandra table:
CREATE TABLE Account(
id uuid,
userRef uuid,
name map<text, text>,
dataStatus text,
dataVisibility text,
...
PRIMARY KEY( id, dataStatus, dataVisibility, userRef)
)
CREATE INDEX idx_xxx_account_name ON Account (name);
'name' is a cql3 column of (collection) type 'map'. My question is: is it possible to create secondary index on a map type, i.e., name?
Thanks.
As of Cassandra 1.2.6, custom indexes on collections are supported.
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-1.2.6
Your question is quite old. In cassandra 2.1 valid syntax is
CREATE INDEX on Account(keys(name));
No response? I have decided to rewrite the table as follows:
CREATE TABLE Account(
id uuid,
userRef uuid,
main_name text,
other_name map<text, text>,
dataStatus text,
dataVisibility text,
...
PRIMARY KEY( id, dataStatus, dataVisibility, userRef)
)
CREATE INDEX idx_xxx_account_name ON Account (main_name);
*_name could be anything, e.g. email, phone, etc. For example, main_name could be mandatory, whereas other_name could be optional.
Anyway now I can index main_name as a 'text' type instead of the map of text values.
To answer your initial question:
There is no support for secondary indexes on collections yet. Concretely, you could associate a set of tags to a user, but you cannot automatically index users by their tags yet. Adding that support is definitively on the roadmap but remains to be implemented.
Coming in 1.2: Collections support in CQL3
Also, I don't quite see why you use a map? Why not a simple set or list? Have a look at the reference provided below.
create index idx_name on Account(ENTRIES(name))
This lets you query for rows that have a particular entry in the map.
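The index variants mentioned in this thread each pair with a different WHERE clause. The Python sketch below only builds the CQL strings, with `%s` placeholders as used by the DataStax Python driver for simple statements; `map_index_cql` is a hypothetical helper, and whether a given Cassandra version accepts each variant is not checked here.

```python
# Index target and matching predicate for each way of indexing a map column
MAP_INDEX_KINDS = {
    "keys": ("KEYS({col})", "{col} CONTAINS KEY %s"),
    "values": ("{col}", "{col} CONTAINS %s"),
    "entries": ("ENTRIES({col})", "{col}[%s] = %s"),
}


def map_index_cql(table, column, kind):
    """Return (CREATE INDEX statement, SELECT statement) for a map column."""
    target, predicate = MAP_INDEX_KINDS[kind]
    create = "CREATE INDEX ON {} ({})".format(table, target.format(col=column))
    select = "SELECT * FROM {} WHERE {}".format(
        table, predicate.format(col=column)
    )
    return create, select
```

For example, `map_index_cql("Account", "name", "keys")` gives the `keys(name)` index from the earlier answer together with a `CONTAINS KEY` query.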

Resources