Is the primary key in Cassandra unique?

This may be a basic question, but does the primary key in Cassandra have to be unique?
For example, in the following table:
CREATE TABLE users (
name text,
surname text,
age int,
address text,
PRIMARY KEY(name, surname)
);
Is it possible to have 2 persons in my database with the same name and surname but different ages? That would mean they have the same primary key.

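Yes, the primary key has to be unique. Otherwise there would be no way to know which row to return when you query with a duplicate key. Note that Cassandra does not reject a second insert with the same primary key; it overwrites the existing row.
In your case you can have 2 rows with the same name or with the same surname, but not both.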
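To make the uniqueness rule concrete, here is a toy Python model (not Cassandra driver code) of a table keyed by (name, surname). The `users` dict and the `insert` helper are hypothetical stand-ins; they only mimic the fact that writing to an existing primary key updates that row instead of adding a second one.

```python
# Toy model: a table is a dict keyed by the full primary key.
# This only mimics Cassandra's upsert behaviour; it is not real driver code.
users = {}

def insert(table, key, **columns):
    """Write columns under `key`; an existing row with that key is updated in place."""
    table.setdefault(key, {}).update(columns)

insert(users, ("John", "Smith"), age=10, address="address1")
insert(users, ("John", "Smith"), age=30, address="address2")  # same key: overwrites
insert(users, ("John", "Doe"), age=30, address="address3")    # same name only: new row

print(len(users))                        # 2
print(users[("John", "Smith")]["age"])   # 30
```

The second insert does not fail; it silently replaces the age and address of the first John Smith.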

By definition, the primary key has to be unique. But that doesn't mean you can't accomplish your goals. You just need to change your approach/terminology.
First of all, if you relax your goal of having the name+surname be a primary key, you can do the following:
CREATE TABLE users ( name text, surname text, age int, address text, PRIMARY KEY((name, surname),age) );
insert into users (name,surname,age,address) values ('name1','surname1',10,'address1');
insert into users (name,surname,age,address) values ('name1','surname1',30,'address2');
select * from users where name='name1' and surname='surname1';
name | surname | age | address
-------+----------+-----+----------
name1 | surname1 | 10 | address1
name1 | surname1 | 30 | address2
If, on the other hand, you wanted to ensure that the address is shared as well, then you probably just want to store a collection of ages in the user record. That could be achieved by:
CREATE TABLE users2 ( name text, surname text, age set<int>, address text, PRIMARY KEY(name, surname) );
insert into users2 (name,surname,age,address) values ('name1','surname1',{10,30},'address2');
select * from users2 where name='name1' and surname='surname1';
name | surname | address | age
-------+----------+----------+----------
name1 | surname1 | address2 | {10, 30}
So it comes back to what you actually need to accomplish. Hopefully the above examples give you some ideas.
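The first table above can be pictured with a small Python sketch. It is only an illustration under the assumption that a partition behaves like a map from clustering key to row; the `users` structure and the `insert` and `select` helpers are made up for this example.

```python
from collections import defaultdict

# Toy model of PRIMARY KEY((name, surname), age):
# partition key -> {clustering key -> remaining columns}
users = defaultdict(dict)

def insert(name, surname, age, address):
    """Store one row; (name, surname) picks the partition, age picks the row."""
    users[(name, surname)][age] = {"address": address}

def select(name, surname):
    """Return the partition's rows ordered by the clustering column (age)."""
    partition = users.get((name, surname), {})
    return [(name, surname, age, row["address"])
            for age, row in sorted(partition.items())]

insert("name1", "surname1", 30, "address2")
insert("name1", "surname1", 10, "address1")

rows = select("name1", "surname1")
for row in rows:
    print(row)
# ('name1', 'surname1', 10, 'address1')
# ('name1', 'surname1', 30, 'address2')
```

Both rows share a partition key but differ in the clustering column, so the full primary key is still unique.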

The primary key is unique. With your data model, you can only have one age per (name, surname) combination.

Yes, as mentioned in the answers above, you can use a composite key of name, surname, and age to achieve your goal, but that still won't solve the problem completely: two users can share all three values. Instead, consider adding a new column userid and making it the primary key. Then, even if name, surname, and age are all duplicated, you don't have to revisit your data model.
CREATE TABLE users (
userid int,
name text,
surname text,
age int,
address text,
PRIMARY KEY(userid)
);

I would state specifically that the partition key (together with any clustering columns) should be unique. I could not find this stated in one place, but it follows from these statements:
Cassandra needs all the partition key columns to be able to compute the hash that will allow it to locate the nodes containing the partition.
The partition key has a special use in Apache Cassandra beyond showing the uniqueness of the record in the database.
Please note that there will not be any error if you insert same partition key again and again as there is no constraint check.
Queries that you'll run equality searches on should be in a partition key.
References
https://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause
how Cassandra chooses the coordinator node and the replication nodes?
Insert query replaces rows having same data field in Cassandra clustering column
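As a rough illustration of the first quoted statement, the sketch below hashes all partition key columns to pick a node. It rests on several assumptions: the node names are hypothetical, MD5 stands in for Cassandra's default Murmur3 partitioner, and a simple modulo replaces the real token ring.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster

def token(partition_key):
    """Hash all partition key columns into one integer (MD5 as a stand-in hash)."""
    joined = "\x00".join(str(part) for part in partition_key)
    digest = hashlib.md5(joined.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def node_for(partition_key):
    """Pick a node from the token; real clusters use token ranges, not modulo."""
    return NODES[token(partition_key) % len(NODES)]

first = node_for(("name1", "surname1"))
second = node_for(("name1", "surname1"))
print(first == second)  # True
```

Because the hash needs every partition key column, a query that supplies only part of the partition key cannot be routed this way.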

Related

cassandra data consistency issue

Hi, I'm new to Apache Cassandra and I found an article about the Basic Rules of Cassandra Data Modeling. In example 1, two tables are created:
CREATE TABLE users_by_username (
username text PRIMARY KEY,
email text,
age int
);
CREATE TABLE users_by_email (
email text PRIMARY KEY,
username text,
age int
);
These tables contain the same data (username, email and age). I don't understand how to insert data into both tables. I think I have to execute two separate inserts, one for users_by_username and one for users_by_email. But how do I maintain data consistency between the tables? For example, what if I insert data into the first table and forget to insert it into the second, or the other way around?

It's your job as the developer to make sure that the data stays in sync. However, you can use features such as materialized views to generate another "table" with a slightly different primary key (there are some rules on what can be changed). For your case, for example, you can have the following:
CREATE TABLE users_by_username (username text PRIMARY KEY, email text, age int);
CREATE MATERIALIZED VIEW users_by_email AS SELECT * FROM users_by_username
WHERE email IS NOT NULL AND username IS NOT NULL PRIMARY KEY (email, username);
If you then insert data with:
insert into users_by_username (username, email, age)
values ('test', 'test#domain.com', 30);
you can query the materialized view by email, in addition to querying the base table by username:
SELECT * from users_by_username where username = 'test' ;
username | age | email
----------+-----+-----------------
test | 30 | test#domain.com
SELECT * from users_by_email where email = 'test#domain.com';
email | username | age
-----------------+----------+-----
test#domain.com | test | 30
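If you keep two real tables instead of a materialized view, one way to avoid forgetting the second insert is to route every write through a single function. The sketch below is a toy in-memory model with a hypothetical `add_user` helper; in a real application the two writes would be CQL statements, and one option there is to group them in a logged BATCH.

```python
# Toy stand-ins for the two tables, each keyed by its own primary key.
users_by_username = {}
users_by_email = {}

def add_user(username, email, age):
    """Write the same user to both lookup tables in one place."""
    users_by_username[username] = {"email": email, "age": age}
    users_by_email[email] = {"username": username, "age": age}

add_user("test", "test#domain.com", 30)

print(users_by_username["test"]["email"])          # test#domain.com
print(users_by_email["test#domain.com"]["username"])  # test
```

Callers never touch the tables directly, so the two copies cannot drift apart through a forgotten insert.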

Ordering by username in Cassandra

Let's say I have this table:
CREATE TABLE "users" (
username text,
created_at timeuuid,
email text,
firstname text,
groups list<text>,
is_active boolean,
lastname text,
"password" text,
roles list<text>,
PRIMARY KEY (username, created_at)
);
I want to order users by username, which is not possible because ordering is only possible on a clustering column. How can I order my users by username?
I need to query users by username, which is why username is the partition key.
What is the right approach here?

If you absolutely must have the usernames sorted and return all of them in one query, then you will need to create another table for this purpose:
CREATE TABLE "users" (
field text,
value text,
PRIMARY KEY (field, value)
);
Unfortunately, this will put all the usernames in just one partition, but it's the only way of keeping them sorted. On the other hand, you could expand the table to store other values that you need to retrieve in the same way. For instance, the partition field="username" would hold all the usernames, and you could create another partition field="Surname" to store all the surnames sorted.
Cassandra is NoSQL, so duplication of data can be expected.
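The idea of one partition holding sorted values can be sketched in Python as follows. This is only a toy model: the `lookup` dict and `insert` helper are hypothetical, and a sorted list stands in for the clustering order that Cassandra maintains inside a partition.

```python
import bisect

# partition key (field) -> sorted list of clustering values
lookup = {}

def insert(field, value):
    """Insert value into its partition, keeping the partition sorted."""
    values = lookup.setdefault(field, [])
    i = bisect.bisect_left(values, value)
    if i == len(values) or values[i] != value:  # same (field, value) key: no duplicate
        values.insert(i, value)

for name in ["mallory", "alice", "bob", "alice"]:
    insert("username", name)

print(lookup["username"])  # ['alice', 'bob', 'mallory']
```

Reading the "username" partition returns every username in order, at the cost of keeping them all in one partition.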

Cassandra distributes data by hashing the partition key value.
So when rows are returned, they are ordered by those hash values and not by the data itself. Thus, you can't order on the partition key.
Coming back to your question, I'm not sure what kind of data this is or what kind of queries you want to run. Assuming multiple users per email, I'd create the following table:
CREATE TABLE "users" (
username text,
created_at timeuuid,
email text,
firstname text,
groups list<text>,
is_active boolean,
lastname text,
"password" text,
roles list<text>,
PRIMARY KEY (email, username)
);

Search in user defined type with Apache Cassandra

In this example:
CREATE TYPE address (
street text,
city text,
zip_code int,
phones set<text>
);
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
addresses map<text, frozen<address>>
);
How can I query users with city = newyork, or find a user with a specific phone number?

This is not really a problem of querying a user-defined type. Imagine that address were a single text column and that addresses contained a single address (i.e. addresses TEXT); the problem would be the same.
Your users table can only be queried by its primary key, which in this case is the partition key, a UUID; that makes it nearly useless for this kind of lookup.
If you want to query the users by name, I would denormalize (which implies some duplication) and make a users_by_name table:
CREATE TABLE users_by_name(
name TEXT,
id UUID,
addresses map<text, frozen<address>>,
PRIMARY KEY((name), id)
);
Here the users are stored by name (names should be unique), and the results are returned sorted by id (id is the clustering-key part of the primary key).
The same goes for querying by address:
CREATE TABLE users_by_city(
city TEXT,
street TEXT,
name TEXT,
id UUID,
PRIMARY KEY((city), street)
);
You might think that this does not really solve your problem, but it looks like you designed your data model from a relational (SQL) point of view, and that is not how Cassandra is meant to be used.

Cassandra: Is there a limit to amount of data that a collection column can hold?

In the table below, what is the maximum size the phone_numbers column can accommodate?
Like normal columns, is it 2GB?
Or is it 64K*64K, as mentioned here?
CREATE TABLE d2.employee (
id int PRIMARY KEY,
doj timestamp,
name text,
phone_numbers map<text, text>
);

Collection types in Cassandra are represented as a set of distinct cells in the internal data model: you will have a cell for each key of your phone_numbers column. Therefore they are not normal columns, but a set of columns. You can verify this by executing the following command in cassandra-cli (1001 stands for a valid employee id):
use d2;
get employee[1001];
The correct answer is your second option (64K*64K).

Set/Query columns form secondary compound key with CQL3

Say I have the following CF (column family) with a compound primary key:
CREATE TABLE dpt (
empID int,
deptID int,
PRIMARY KEY (deptID, empID));
Because of the compound PK, Cassandra will create one row for each dept, and the employee IDs that are members of the department will be stored as columns on that row with :empID as the column name.
Question #1: can I set a value on that column (e.g. the employee name) with CQL3? If so, how?
Question #2: can I see the value of the <individual_employ_ID>:empID column, if it exists, with CQL3?
Thanks

Question #1:
CREATE TABLE dpt (
empID int,
deptID int,
empName text,
PRIMARY KEY (deptID, empID));
Question #2:
Please take a look at the examples:
http://www.datastax.com/docs/1.1/references/cql/INSERT
http://www.datastax.com/docs/1.1/references/cql/UPDATE
