Is there a way to add foreign key constraints in a QLDB table?
Suppose I have two open tables, one with a primary key and one without. Can I add a foreign key constraint referencing the one with a primary key?
Does the same work with indexes? If I don't have any primary keys, but only indexes on the tables, can I add a foreign key referencing these indexes?
There is no Primary Key or Foreign Key concept in QLDB at this time.
Document IDs are a closely related concept to Primary Keys in that each Document ID is a universally unique identifier.
To enforce foreign key constraints at the application level, you can SELECT before an UPDATE, DELETE, or DROP TABLE within the same transaction to verify the referential integrity of the foreign key.
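As a rough sketch of the SELECT-before-DELETE idea in Python: the function below assumes a transaction object `txn` exposing `execute_statement(statement, *parameters)` that returns an iterable of rows, in the style of pyqldb's transaction executor. The table names (`Customers`, `Orders`) and field names are hypothetical and only there for illustration.

```python
def delete_customer_if_unreferenced(txn, customer_id):
    """Delete a customer only when no order still references it.

    Assumption: `txn` exposes execute_statement(statement, *parameters)
    and returns an iterable of rows. Table and field names are hypothetical.
    """
    # Read first: look for any child document pointing at the parent.
    referencing = txn.execute_statement(
        "SELECT o.orderId FROM Orders AS o WHERE o.customerId = ?",
        customer_id,
    )
    if next(iter(referencing), None) is not None:
        # A reference exists, so refuse the delete to keep integrity.
        return False

    # No reference was found in this transaction, so issue the delete.
    txn.execute_statement(
        "DELETE FROM Customers AS c WHERE c.customerId = ?",
        customer_id,
    )
    return True
```

With pyqldb, a function like this would typically be handed to the driver so that both statements run in one transaction, for example `driver.execute_lambda(lambda txn: delete_customer_if_unreferenced(txn, "c1"))`.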
[Question posted by a user on YugabyteDB Community Slack]
In YSQL, if my table uses a primary key defined on multiple columns, what will the sharding key be? Will both columns be used to compute the hash? Also, can we specify columns to be used as partition/sharding keys without including them in the primary key? What if one of the tables does not have a primary key but needs to be sharded on one of its columns?
By default, when a primary key has multiple columns, the first column is hashed and the remaining columns are range-sorted in ascending order.
But you have full control over the primary key definition, which means that you can specify it in any way you like: https://docs.yugabyte.com/latest/api/ysql/the-sql-language/statements/ddl_create_table/#primary-key
Currently, you can only shard with columns that are in the PRIMARY KEY. If a table doesn’t have a primary key, an implicit one gets created internally and the table is sharded based on that. Thus, it is a best practice to always create a primary key.
I am new to Cassandra and have a few novice-level questions about the primary key.
Is the Primary key supposed to be unique per record? (My guess would be not.)
To elaborate. Suppose my table looks like this
CREATE TABLE user_action (
user_id int,
action text,
date_of_action date,
PRIMARY KEY (user_id)
)
I am guessing I can have multiple rows with the same user_id
If primary key is not one per record, can a primary key be split across many partitions?
Can a partition have multiple primary keys?
Is the primary key itself decided to pick the partition or is the hashCode of the primary key used to pick a partition?
Is it fair to think of a partition as a file?
The primary key and the partition key are the same in some cases but not always; it depends on how many columns make up the primary key. Data is distributed based on the partition key, which is unique across the Cassandra cluster. I am not going to explain every scenario and concept here, but you should go through the documentation; I am sure you will understand things quickly after reading the links below.
https://www.datastax.com/blog/2016/02/most-important-thing-know-cassandra-data-modeling-primary-key
https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/useCompoundPrimaryKeyConcept.html
1>Is the Primary key supposed to be unique per record? (My guess would be not.) To elaborate. Suppose my table looks like this
CREATE TABLE user_action (
user_id int,
action text,
date_of_action date,
PRIMARY KEY (user_id)
)
The primary key is supposed to be unique per record/row. In the example you mentioned, you can have only one record per user_id. To allow multiple rows with the same user_id, you have to introduce a differentiating key. This key is called a clustering key in Cassandra, and it forms part of the primary key.
The primary key is a combination of the partition key and the clustering key(s). Cassandra uses the partition key to find a partition. If a clustering key is defined in the data model, it is used to differentiate rows within that partition. If no clustering key is defined, as in your case, only one record is kept per partition key.
In the example below, you can have several records with the same user_id for users who live in different states. Here the primary key is the combination (user_id, state): user_id is the partition key and state is the clustering key.
CREATE TABLE user_action (
user_id int,
state text,
action text,
date_of_action date,
PRIMARY KEY (user_id,state)
)
I am guessing I can have multiple rows with the same user_id
As explained above, you can have multiple rows with the same user_id if you define a clustering key; with the example you quoted, it is not possible.
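The uniqueness rule can be illustrated with a small Python toy model. This is only a sketch of the behavior described above (a write to an existing primary key replaces the row), not how Cassandra stores data internally; the class `ToyTable` and its methods are hypothetical.

```python
class ToyTable:
    """Toy model of a table keyed by (partition key, clustering key).

    Illustrates the uniqueness rule only; it is not Cassandra's storage model.
    """

    def __init__(self):
        # partition key -> {clustering key -> row}
        self.partitions = {}

    def insert(self, partition_key, clustering_key, row):
        # Writing an existing primary key overwrites the old row (upsert).
        self.partitions.setdefault(partition_key, {})[clustering_key] = row

    def rows(self, partition_key):
        return list(self.partitions.get(partition_key, {}).values())


# PRIMARY KEY (user_id): no clustering key, so one row per user_id.
single = ToyTable()
single.insert(1, (), {"action": "login"})
single.insert(1, (), {"action": "logout"})  # replaces the first row

# PRIMARY KEY (user_id, state): state distinguishes rows in a partition.
compound = ToyTable()
compound.insert(1, ("RJ",), {"action": "login"})
compound.insert(1, ("GJ",), {"action": "logout"})

print(len(single.rows(1)))    # 1
print(len(compound.rows(1)))  # 2
```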
2>If primary key is not one per record, can a primary key be split across many partitions?
The primary key cannot be split across many partitions. As explained above, the partition key part of the primary key always points to a single partition.
3>Can a partition have multiple primary keys?
In the example I gave, (1,RJ) and (1,GJ) are possible primary keys that both belong to the single partition identified by partition key value 1. So in that sense, a partition can have multiple primary keys.
4>Is the primary key itself decided to pick the partition or is the hashCode of the primary key used to pick a partition?
The hash of the partition key (part of the primary key) is used to pick the partition.
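A minimal Python sketch of hash-based placement follows. It uses MD5 from `hashlib` as a stand-in hash (Cassandra's default partitioner is Murmur3-based), and the three-node ring and node names are hypothetical; the point is only that the partition key alone decides placement.

```python
import bisect
import hashlib


def token(partition_key):
    """Map a partition key to a 64-bit integer token (MD5 as a stand-in hash)."""
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


# Hypothetical ring: each node owns tokens up to and including its boundary.
MAX_TOKEN = 2**64
RING = [
    (MAX_TOKEN // 3, "node-a"),
    (2 * MAX_TOKEN // 3, "node-b"),
    (MAX_TOKEN - 1, "node-c"),
]


def owner(partition_key):
    """Return the node whose token range contains the key's token."""
    boundaries = [boundary for boundary, _ in RING]
    index = bisect.bisect_left(boundaries, token(partition_key))
    return RING[index][1]


def owner_of_row(primary_key):
    """Only the first component (the partition key) is hashed."""
    return owner(primary_key[0])


# (1, "RJ") and (1, "GJ") share partition key 1, so they land on the same node.
print(owner_of_row((1, "RJ")) == owner_of_row((1, "GJ")))  # True
```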
5>Is it fair to think of a partition as a file?
It will depend on your data model.
I am using a Cassandra database and need to define the primary key, which is a combination of the partition key and clustering keys. Per the business requirement, the database needs to be queried on the combination of two fields: a customer number and createdAt (a Unix timestamp value). These columns alone cannot be used as the primary key because they do not uniquely identify a row. So, is it correct to add the uuid column as a clustering key to make the primary key unique, so that the primary key becomes customerNumber (partition key), createdAt (clustering key), uuid (clustering key)? The database will never be queried on the whole primary key, though. It will always be queried based on part of the primary key, i.e. customerNumber and createdAt. uuid will never be used to query the database.
So if I understand correctly, your PRIMARY KEY definition looks like this:
PRIMARY KEY (customerNumber,createdAt,uuid)
It will always be queried based on the part of the Primary key
Yes, querying by part of the PRIMARY KEY definition is fine, in your case. Cassandra tries to restrict queries to a single node, and it achieves this by ensuring that an entire partition is written to a single node (and then replicated). Because of this, you really only need to supply the partition key on your queries (customerNumber), and they should work.
Supplying an additional PRIMARY KEY component, however, is helpful. In a high-throughput scenario, the smaller you can keep your result set payloads, the better.
tl;dr;
Querying by customerNumber and createdAt will be just fine.
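To make the idea concrete, here is a toy Python model of a table with PRIMARY KEY (customerNumber, createdAt, uuid). It is only a sketch: the functions `insert` and `query` are hypothetical, and real Cassandra storage works differently. It shows the uuid acting purely as a tie-breaker while queries filter on customerNumber and, optionally, createdAt.

```python
import uuid
from collections import defaultdict

# partition key (customerNumber) -> list of ((createdAt, uuid), payload)
partitions = defaultdict(list)


def insert(customer_number, created_at, payload):
    # The uuid is only there to make the full primary key unique.
    row_id = uuid.uuid4()
    rows = partitions[customer_number]
    rows.append(((created_at, row_id), payload))
    # Keep rows ordered by clustering key (createdAt, then uuid).
    rows.sort(key=lambda item: (item[0][0], str(item[0][1])))


def query(customer_number, created_at=None):
    """Look up by partition key, optionally narrowed by createdAt."""
    rows = partitions.get(customer_number, [])
    if created_at is None:
        return [payload for _, payload in rows]
    return [payload for (ts, _), payload in rows if ts == created_at]


insert(42, 1700000000, "order placed")
insert(42, 1700000000, "email sent")      # same customer and timestamp
insert(42, 1700000100, "order shipped")

print(len(query(42)))               # 3
print(len(query(42, 1700000000)))   # 2
```

Both rows with the same customerNumber and createdAt survive because their uuids differ, and neither lookup needs to supply a uuid.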
I find this a bit confusing. I am using a MemSQL columnstore. I am trying to understand whether there is a way to prevent duplicates on a specific key (e.g. eventId). I found some docs about unenforced unique constraints, but I didn't really understand their intention.
The point of unenforced unique keys is as a hint:
An unenforced unique constraint is informational: the query planner may use the unenforced unique constraint as a hint to choose better query plans.
from https://docs.memsql.com/v6.8/concepts/unenforced-unique-constraints/.
Unfortunately MemSQL does not support (enforced) unique constraints on columnstore tables.
MemSQL supports unique constraints on columnstore tables as of version 7+, but they can be applied to only a single column:
https://docs.memsql.com/v7.1/guides/use-memsql/physical-schema-design/creating-a-columnstore-table/creating-a-columnstore-table/
Your columnstore table definition can contain metadata-only unenforced unique keys, single-column hash keys (which may be UNIQUE), and a FULLTEXT key. You cannot define more than one unique key.
One hack to enable a UNIQUE constraint across multiple columns is to use a computed column that concatenates those columns and then apply UNIQUE to it, which indirectly enforces uniqueness on the combination.
Example (a columnstore table with a single-column UNIQUE hash key):
CREATE TABLE articles (
id INT UNSIGNED,
year int UNSIGNED,
title VARCHAR(200),
body TEXT,
SHARD KEY(title),
KEY (id) USING CLUSTERED COLUMNSTORE,
KEY (id) USING HASH,
UNIQUE KEY (title) USING HASH,
KEY (year) USING HASH);
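One caveat with the computed-column hack: the concatenation has to be unambiguous, or distinct column combinations can produce the same key. The Python sketch below illustrates this with a hypothetical helper `composite_key`; the separator choice is an assumption, and the MemSQL computed-column syntax itself is not shown here.

```python
def composite_key(*columns, sep="\x1f"):
    """Join column values into one string intended to carry a UNIQUE key.

    The separator (ASCII unit separator, an arbitrary choice) must not
    occur in the data, otherwise distinct tuples can still collide.
    """
    return sep.join(str(value) for value in columns)


# Naive concatenation is ambiguous: two different rows give the same key.
naive_a = "ab" + "c"
naive_b = "a" + "bc"
print(naive_a == naive_b)  # True

# With a separator the two rows stay distinct.
print(composite_key("ab", "c") == composite_key("a", "bc"))  # False
```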
I am not quite sure how to word this, so I will give an example. I have a program that reads in its database tables from user-defined csv files. (I am using SQLite with Python.)
Say we have tables:
Profile (
profile_name TEXT,
zoning TEXT,
share REAL,
PRIMARY KEY (profile_name, zoning)
)
ProfileAssign (
geography TEXT,
year INTEGER,
profile_name TEXT,
PRIMARY KEY (geography, year),
FOREIGN KEY (geography) REFERENCES geography
)
Where we want each geography to have an associated zoning profile. Now say we also wanted to ensure that the user doesn't assign a profile that doesn't exist to a geography, i.e. we want to have a foreign key constraint in the ProfileAssign table:
FOREIGN KEY (profile_name) REFERENCES profile
Now obviously this cannot happen since profile_name is not the primary key of the Profile table. My solution has been to create a separate table on which we can create a foreign key reference:
ProfileListing (
profile_name TEXT,
PRIMARY KEY (profile_name)
)
Which is quite annoying for the user because not only do they have to define these profiles, they now have to list all of their names in a separate file. Any ideas how to circumvent this?
Foreign keys need to identify a row in the parent table.
This is usually done with the primary key, but any other unique key works as well; the only requirement is that the uniqueness is enforced (with a UNIQUE constraint):
CREATE TABLE Profile (
profile_name TEXT UNIQUE,
zoning TEXT,
...
PRIMARY KEY (profile_name, zoning)
);
CREATE TABLE ProfileAssign (
...
FOREIGN KEY (profile_name) REFERENCES profile(profile_name)
);
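A minimal sketch of this with Python's sqlite3 module, assuming an in-memory database and leaving out the geography foreign key for brevity. The sample rows and the helper `try_assign` are hypothetical. Note that SQLite enforces foreign keys only when `PRAGMA foreign_keys` is turned on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Profile (
    profile_name TEXT UNIQUE,
    zoning TEXT,
    share REAL,
    PRIMARY KEY (profile_name, zoning)
);
CREATE TABLE ProfileAssign (
    geography TEXT,
    year INTEGER,
    profile_name TEXT,
    PRIMARY KEY (geography, year),
    FOREIGN KEY (profile_name) REFERENCES Profile(profile_name)
);
""")

conn.execute("INSERT INTO Profile VALUES ('residential', 'R1', 1.0)")
conn.execute("INSERT INTO ProfileAssign VALUES ('north', 2020, 'residential')")


def try_assign(geography, year, profile_name):
    """Return True if the row is accepted, False on a constraint violation."""
    try:
        conn.execute(
            "INSERT INTO ProfileAssign VALUES (?, ?, ?)",
            (geography, year, profile_name),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised for foreign key failures (and other constraint violations).
        return False


print(try_assign("south", 2020, "no_such_profile"))  # False
print(try_assign("south", 2020, "residential"))      # True
```

One consequence worth keeping in mind: with UNIQUE on profile_name alone, a second Profile row with the same profile_name and a different zoning is rejected, so this approach fits only when each profile name appears once.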