How do I add multiple unique keys to a table in MemSQL? - singlestore

Using MemSQL, I want to create a table that has one primary key and multiple unique keys, but I don't know how to add multiple unique keys to a table. This is my table:
CREATE TABLE IF NOT EXISTS `user_auth` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) NOT NULL,
`code` char(36) NOT NULL,
`rest_code` char(36) NOT NULL,
`password` varchar(100) NOT NULL,
`pswd_updated_on` datetime NOT NULL,
PRIMARY KEY (`id`)
);
I want to add a unique key on the user_id, code, rest_code columns.

You need to choose a shard key whose columns are contained in every unique key and in the primary key. If you don't specify a shard key, the primary key is used, so here the shard key is implicitly (id). Since (user_id, code, rest_code) does not contain id, it won't work. This is because MemSQL needs to be able to detect duplicates locally, that is, within a single shard.
I would try something like SHARD KEY (user_id), PRIMARY KEY (user_id, id), UNIQUE KEY (user_id, code, rest_code). Yes, adding user_id to the primary key is redundant, and I'm making some assumptions about your app, but my guess is that the database doesn't actually have to worry about multiple users "claiming the same id", so this will work.
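To see why the shard key has to be contained in the unique key, here is a toy Python model (not MemSQL code, and not how MemSQL is implemented internally): each row is routed to a shard by hashing its shard-key columns, and each shard checks uniqueness only against the rows it holds. The names shard_of and ToyShardedTable, the md5 hash, and the 4-shard layout are all hypothetical. MemSQL simply refuses to create a table like the first case; the model just shows the duplicate it would otherwise miss.

```python
import hashlib

def shard_of(values: tuple, num_shards: int) -> int:
    """Pick a shard by hashing the shard-key values (md5 is an arbitrary stand-in hash)."""
    digest = hashlib.md5(repr(values).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

class ToyShardedTable:
    """Hypothetical model: each shard checks uniqueness only against its own rows."""

    def __init__(self, shard_cols, unique_cols, num_shards=4):
        self.shard_cols = shard_cols
        self.unique_cols = unique_cols
        self.num_shards = num_shards
        self.shards = [set() for _ in range(num_shards)]

    def insert(self, row: dict) -> bool:
        """Return True if the row was stored, False if its shard already had the unique key."""
        shard_key = tuple(row[c] for c in self.shard_cols)
        local = self.shards[shard_of(shard_key, self.num_shards)]
        unique_key = tuple(row[c] for c in self.unique_cols)
        if unique_key in local:  # local check only; other shards are never consulted
            return False
        local.add(unique_key)
        return True

dup = {"user_id": 7, "code": "c-1", "rest_code": "r-1"}
# Find an id that routes to a different shard than id=1 when sharding on (id,).
other_id = next(i for i in range(2, 100) if shard_of((i,), 4) != shard_of((1,), 4))

# Shard key (id) is NOT contained in the unique key: the duplicate goes unnoticed.
by_id = ToyShardedTable(["id"], ["user_id", "code", "rest_code"])
by_id.insert({"id": 1, **dup})
leaked = by_id.insert({"id": other_id, **dup})

# Shard key (user_id) IS contained in the unique key: both copies meet on one shard.
by_user = ToyShardedTable(["user_id"], ["user_id", "code", "rest_code"])
by_user.insert({"id": 1, **dup})
caught = not by_user.insert({"id": other_id, **dup})

print(leaked, caught)  # True True
```

With SHARD KEY (user_id), every row with the same (user_id, code, rest_code) lands on the same shard, so a purely local check is enough.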
Good luck :)

Related

Order of column in composite partitioning key

I am using Scylla database and I have created a partitioning key composite of two columns.
Does the order of keys matter in this case?
Table definition
create table X(
user_id text,
city text,
name text,
PRIMARY KEY ((user_id, city))
);
will anything change if I write
PRIMARY KEY ((city, user_id))?
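For queries, the order of a composite partition key does not matter: you have to supply every partition-key column either way.
Switching the order may produce a different hash value (token) for the same row, so a given row can end up on a different node, but it shouldn't reduce how evenly the data is distributed.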
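A small Python sketch can illustrate both halves of that answer. It hashes the partition-key parts in the given order with md5, which is only a stand-in (Scylla's default partitioner uses Murmur3), and maps tokens to a hypothetical 8-node cluster with a simple modulo. The helper names token and spread are made up for this example.

```python
import hashlib
from collections import Counter

def token(*parts: str) -> int:
    """Toy token over the partition-key parts, in the order given (md5 stand-in hash)."""
    h = hashlib.md5()
    for part in parts:
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(2, "big"))  # length prefix: ("ab", "c") != ("a", "bc")
        h.update(data)
    return int.from_bytes(h.digest()[:8], "big")

def spread(keys, num_nodes: int, swap: bool) -> Counter:
    """Count partitions per node for ((user_id, city)) or, if swap, ((city, user_id))."""
    counts = Counter()
    for user_id, city in keys:
        parts = (city, user_id) if swap else (user_id, city)
        counts[token(*parts) % num_nodes] += 1
    return counts

keys = [(f"user{i}", city) for i in range(2500) for city in ("NYC", "LA", "SF", "CHI")]
original = spread(keys, 8, swap=False)
swapped = spread(keys, 8, swap=True)
moved = sum(1 for u, c in keys if token(u, c) % 8 != token(c, u) % 8)

print(sorted(original.values()))
print(sorted(swapped.values()))
print(f"{moved} of {len(keys)} partitions land on a different node after swapping")
```

Both column orders should give roughly 1,250 partitions per node, while most individual partitions move to a different node: the order changes the tokens, not the evenness.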

Understanding the relationship between primary key and partitioning in Cassandra

I am new to Cassandra and have a few novice level questions in the primary key.
Is the Primary key supposed to be unique per record? (My guess would be not.)
To elaborate. Suppose my table looks like this
CREATE TABLE user_action (
user_id int,
action text,
date_of_action date,
PRIMARY KEY (user_id)
)
I am guessing I can have multiple rows with the same user_id
If primary key is not one per record, can a primary key be split across many partitions?
Can a partition have multiple primary keys?
Is the primary key itself used to pick the partition, or is the hashCode of the primary key used to pick a partition?
Is it fair to think of a partition as a file?
The primary key and the partition key are sometimes the same but not always; it depends on how many columns the primary key has. Data is distributed based on the partition key, and each partition key value identifies exactly one partition in the Cassandra cluster. I won't explain every scenario and concept here, but you should go through the documentation; I am sure you will understand things very quickly after reading the links below.
https://www.datastax.com/blog/2016/02/most-important-thing-know-cassandra-data-modeling-primary-key
https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/useCompoundPrimaryKeyConcept.html
1>Is the Primary key supposed to be unique per record? (My guess would be not.) To elaborate. Suppose my table looks like this
CREATE TABLE user_action (
user_id int,
action text,
date_of_action date,
PRIMARY KEY (user_id)
)
The primary key is supposed to be unique per record/row. In the example you mentioned, you can have only one record per user_id. To allow multiple rows with the same user_id, you have to introduce a differentiating key. This key is called a clustering key in Cassandra, and it forms part of the primary key.
The primary key is a combination of the partition key and the clustering key(s). Cassandra uses the partition key to find a partition. If a clustering key is defined in the data model, it is used to differentiate the rows within that partition. If no clustering key is defined, as in your case, only one row per partition key value is kept in the database: a second insert with the same user_id overwrites the first.
In the example below, the same user_id can have several rows, one for each state the user lives in. Here the primary key is the combination (user_id, state): user_id is the partition key and state is the clustering key.
CREATE TABLE user_action (
user_id int,
state text,
action text,
date_of_action date,
PRIMARY KEY (user_id,state)
)
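The difference between the two table definitions can be sketched with a toy Python model (hypothetical, not a Cassandra driver): a table maps each partition key to a dict keyed by clustering key, and an insert with an existing primary key silently overwrites the row, mirroring Cassandra's upsert behavior. The class name ToyTable and the sample values are assumptions for illustration.

```python
from collections import defaultdict

class ToyTable:
    """Hypothetical model of a CQL table: partition key -> {clustering key -> row}."""

    def __init__(self, partition_cols, clustering_cols=()):
        self.partition_cols = tuple(partition_cols)
        self.clustering_cols = tuple(clustering_cols)
        self.partitions = defaultdict(dict)

    def insert(self, row: dict) -> None:
        pk = tuple(row[c] for c in self.partition_cols)
        ck = tuple(row[c] for c in self.clustering_cols)
        # Same full primary key -> the old row is replaced (an "upsert"), no error.
        self.partitions[pk][ck] = row

    def rows_for(self, *pk):
        part = self.partitions.get(tuple(pk), {})
        return [part[ck] for ck in sorted(part)]  # rows come back in clustering order

# PRIMARY KEY (user_id): at most one row per user_id.
only_pk = ToyTable(["user_id"])
only_pk.insert({"user_id": 1, "state": "RJ", "action": "login"})
only_pk.insert({"user_id": 1, "state": "GJ", "action": "logout"})

# PRIMARY KEY (user_id, state): one row per (user_id, state), all in partition user_id.
with_ck = ToyTable(["user_id"], ["state"])
with_ck.insert({"user_id": 1, "state": "RJ", "action": "login"})
with_ck.insert({"user_id": 1, "state": "GJ", "action": "logout"})

print(len(only_pk.rows_for(1)), len(with_ck.rows_for(1)))  # 1 2
```

In the first table the second insert replaces the first row; in the second table both rows live side by side in the single partition for user_id 1.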
I am guessing I can have multiple rows with the same user_id
As explained above, you can have multiple rows with the same user_id if you define a clustering key; with the table you quoted, it is not possible.
2>If primary key is not one per record, can a primary key be split across many partitions?
A primary key cannot be split across many partitions. As explained above, the partition-key part of the primary key always points to a single partition.
3>Can a partition have multiple primary keys?
In the example I have quoted, (1, RJ) and (1, GJ) are two possible primary keys pointing to the single partition identified by partition key value 1. So in that sense a partition can have multiple primary keys.
4>Is the primary key itself used to pick the partition, or is the hashCode of the primary key used to pick a partition?
The hash (token) of the partition key, which is part of the primary key, is used to pick the partition.
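As a rough sketch of that lookup, the Python below hashes only the partition-key values into a 64-bit token and walks a hypothetical four-node token ring to find the owner. The md5 hash, the evenly spaced ring tokens, and the names token and owner are assumptions; real Cassandra uses the Murmur3 partitioner and many virtual nodes per host.

```python
import bisect
import hashlib

def token(partition_key: tuple) -> int:
    """Toy 64-bit token computed from the partition-key values only (md5 stand-in)."""
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

# Hypothetical ring: each node owns the token range ending at its own token.
RING_TOKENS = [2**62, 2**63, 3 * 2**62, 2**64 - 1]
RING_NODES = ["node1", "node2", "node3", "node4"]

def owner(partition_key: tuple) -> str:
    """Walk clockwise to the first node whose token is >= the key's token."""
    i = bisect.bisect_left(RING_TOKENS, token(partition_key))
    return RING_NODES[i % len(RING_NODES)]

# Primary keys (1, 'RJ') and (1, 'GJ'): only user_id, the partition key, is hashed.
rows = [(1, "RJ"), (1, "GJ")]
print({owner((user_id,)) for user_id, _state in rows})  # a set with a single node
```

Because the clustering column (state) never enters the hash, every row of a partition maps to the same token and therefore to the same node.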
5>Is it fair to think of a partition as a file?
It will depend on your data model.

Is it necessary to use all the columns defined as the primary key to query a Cassandra database?

I am using a Cassandra database and need to define the primary key, which is a combination of partition key and clustering keys. As per the business requirement, the database needs to be queried on a combination of two fields: a customer number and createdAt (a Unix timestamp value). These columns cannot be used as the primary key on their own because they cannot uniquely identify a row in the database. So, is it correct to add the uuid column as a clustering key to make the primary key unique, so that the primary key becomes a combination of customerNumber (partition key), createdAt (clustering key), and uuid (clustering key)? However, the database will never be queried on the whole primary key. It will always be queried on part of it, i.e. customerNumber and createdAt; uuid will never be used to query the database.
So if I understand correctly, your PRIMARY KEY definition looks like this:
PRIMARY KEY (customerNumber,createdAt,uuid)
It will always be queried based on the part of the Primary key
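Yes, in your case querying by part of the PRIMARY KEY definition is fine. Cassandra tries to restrict each query to a single node, and it achieves this by ensuring that an entire partition is written to a single node (and then replicated). Because of this, you really only need to supply the partition key (customerNumber) in your queries, and they should work. You can also restrict clustering columns, as long as you restrict them in order: customerNumber plus createdAt works, but restricting uuid without createdAt is rejected (at least without ALLOW FILTERING).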
Supplying an additional PRIMARY KEY component, however, is helpful: in a high-throughput scenario, the smaller you can keep your result-set payloads, the better.
tl;dr;
Querying by customerNumber and createdAt will be just fine.
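The "partition key, then clustering columns in order" rule can be captured in a few lines of Python. This is a simplified, hypothetical checker (check_restrictions is not a real Cassandra API): it ignores operators such as ranges and IN, and only reports whether a WHERE clause restricting the given columns fits the prefix rule without ALLOW FILTERING.

```python
def check_restrictions(partition_cols, clustering_cols, restricted):
    """Toy version of the CQL key-restriction rule (operators are ignored).

    Accepts a WHERE clause only if it restricts every partition-key column and a
    prefix of the clustering columns. Returns None if accepted, else a reason.
    """
    key_cols = set(partition_cols) | set(clustering_cols)
    extra = sorted(set(restricted) - key_cols)
    if extra:
        return f"non-key column(s) {extra} need ALLOW FILTERING or an index"
    missing = [c for c in partition_cols if c not in restricted]
    if missing:
        return f"partition key column(s) {missing} not restricted"
    gap = None  # first clustering column left unrestricted
    for col in clustering_cols:
        if col not in restricted:
            gap = gap or col
        elif gap is not None:
            return f"{col} cannot be restricted while preceding column {gap} is not"
    return None

PK = ["customerNumber"]
CK = ["createdAt", "uuid"]
print(check_restrictions(PK, CK, {"customerNumber"}))               # None
print(check_restrictions(PK, CK, {"customerNumber", "createdAt"}))  # None
print(check_restrictions(PK, CK, {"customerNumber", "uuid"}))
# uuid cannot be restricted while preceding column createdAt is not
```

For PRIMARY KEY (customerNumber, createdAt, uuid), both query shapes from the question pass, and uuid never needs to appear in a WHERE clause.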

Retrieve rows based on column of type "time" in cassandra db

How do I retrieve rows based on a column of type "time" in a Cassandra database?
We tried with query
select *
from payment_transactions_by_transactiondate
where transaction_time>='00:00:00'
and transaction_time<='23:59:59'
and transaction_date='2018-03-21'
allow filtering;
but it's not fetching any rows (transaction_time is part of the primary key).
You can't do a normal range query on the partition key, because Cassandra distributes data across nodes based on the hashed partition key, not on the key's natural order. What you can do instead is make transaction_time a clustering key; see the difference between a partition key and a clustering key. From the query above, it seems you need the transactions for a particular date (transaction_date). So, for this query, make transaction_date the partition key and transaction_time the clustering key.
For example:
create table payment_transactions_by_transactiondate(
....
....
....
primary key (transaction_date, transaction_time)
);
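To show why that layout answers the query efficiently, here is a hypothetical in-memory model in Python: one partition per transaction_date, with rows kept sorted by transaction_time so a time range becomes two binary searches inside one partition. The helper names insert and time_range and the sample payloads are made up; like the real table, a second row with the same (date, time) overwrites the first.

```python
import bisect
from datetime import date, time

# Hypothetical model of PRIMARY KEY (transaction_date, transaction_time):
# one partition per date, rows kept sorted by the clustering column.
partitions: dict[date, list[tuple[time, str]]] = {}

def insert(tx_date: date, tx_time: time, payload: str) -> None:
    rows = partitions.setdefault(tx_date, [])
    times = [t for t, _ in rows]
    i = bisect.bisect_left(times, tx_time)
    if i < len(rows) and rows[i][0] == tx_time:
        rows[i] = (tx_time, payload)  # same primary key -> upsert, as in CQL
    else:
        rows.insert(i, (tx_time, payload))  # keep clustering order on write

def time_range(tx_date: date, start: time, end: time) -> list[str]:
    """Like WHERE transaction_date = ? AND transaction_time >= ? AND transaction_time <= ?."""
    rows = partitions.get(tx_date, [])
    times = [t for t, _ in rows]
    lo = bisect.bisect_left(times, start)
    hi = bisect.bisect_right(times, end)
    return [payload for _, payload in rows[lo:hi]]

d = date(2018, 3, 21)
insert(d, time(9, 15), "tx-a")
insert(d, time(23, 59, 59), "tx-b")
insert(d, time(12, 0), "tx-c")
insert(date(2018, 3, 22), time(10, 0), "tx-d")
print(time_range(d, time(0, 0), time(23, 59, 59)))  # ['tx-a', 'tx-c', 'tx-b']
print(time_range(d, time(10, 0), time(13, 0)))       # ['tx-c']
```

Since only one date is touched, the query stays within a single partition; the upsert behavior also hints that two transactions at the same time on the same date would need an extra clustering column to coexist.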

how to handle search by unique id in Cassandra

I have a table with a composite primary key made of name, description, and ID:
PRIMARY KEY (id, name, description)
Whenever I query Cassandra I need to provide all three keys, but now I have a use case where I want to delete, update, and get rows based only on the ID.
So I created a materialized view against this table, and reordered the keys to have ID first so I can search just based on ID.
But how do I delete or update a record with just the ID?
It's not clear whether you are using a partition key with 3 columns or a composite primary key.
If you are using a partition key with 3 columns:
CREATE TABLE tbl (
id uuid,
name text,
description text,
...
PRIMARY KEY ((id, name, description))
);
Notice the double parentheses: you need all 3 components to identify your data. So when you query your data by ID from the materialized view, you also need to retrieve the name and description fields, and then issue one delete per <id, name, description> tuple.
Instead, if you use a composite primary key with ID being the only PARTITION KEY:
CREATE TABLE tbl (
id uuid,
name text,
description text,
...
PRIMARY KEY (id, name, description)
);
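Notice the single parentheses. In this case you can simply issue one partition-level delete (WHERE id = ?), because the ID alone identifies the partition and you don't need anything else. An UPDATE, however, still has to specify the full primary key, so for updates you would still need name and description.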
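The cost difference between the two layouts can be sketched with plain Python dicts. This is a hypothetical model, not driver code: in the composite-partition-key case every row is its own partition, so deleting by ID means looking up (name, description) through a stand-in for the materialized view and deleting each tuple; in the single-partition-key case one partition-level delete is enough. Here the view is maintained by hand, whereas Cassandra keeps a real materialized view in sync automatically.

```python
rows = [
    ("id-1", "alice", "first"),
    ("id-1", "alice", "second"),
    ("id-2", "bob", "other"),
]

# PRIMARY KEY ((id, name, description)): every row is its own partition.
composite = {(i, n, d): {"id": i} for i, n, d in rows}
# Stand-in for a materialized view keyed by id, used to find the other components.
view = {}
for i, n, d in rows:
    view.setdefault(i, []).append((n, d))

def delete_composite(row_id):
    """One delete per full (id, name, description) tuple found via the view."""
    deletes = 0
    for n, d in view.pop(row_id, []):
        del composite[(row_id, n, d)]
        deletes += 1
    return deletes

# PRIMARY KEY (id, name, description): id alone is the partition key.
compound = {}
for i, n, d in rows:
    compound.setdefault(i, {})[(n, d)] = {"id": i}

def delete_compound(row_id):
    """A single partition-level delete removes every row for that id."""
    return 1 if compound.pop(row_id, None) is not None else 0

n_composite = delete_composite("id-1")
n_compound = delete_compound("id-1")
print(n_composite, n_compound)  # 2 1
```

The composite layout needs as many deletes as there are rows for the ID, plus a read to discover them; the single-partition-key layout needs one statement regardless of how many rows the ID has.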
Check this SO post for a clear explanation of primary key types.
Another thing to be aware of: the materialized view populates a table under the hood for you, so the same data-modeling rules and ideas apply to materialized views as well.
