Unique names as primary keys in JHipster tables? - jhipster

I am evaluating JHipster and I am wondering why unique fields (login in T_USER and name in T_AUTHORITY) are primary keys instead of auto-generated numbers (IDs), as is the case in T_PERSISTENT_AUDIT_EVENT?
Thanks,
Milan

I think it's using the old-fashioned kind of primary key, where the key value has to be chosen up front instead of being generated from a sequence. There are advantages to using a meaningful value such as ROLE_ADMIN or ROLE_USER, rather than an integer such as 1 or 2, as the primary key.

Related

Is it necessary to use all the columns defined as the primary key to query a Cassandra database?

I am using a Cassandra database and need to define the primary key, which is a combination of a partition key and clustering keys. As per the business requirement, the database needs to be queried on the combination of two fields: a customer number and createdAt (a Unix timestamp value). These two columns alone cannot be used as the primary key because they cannot uniquely identify a row. So, is it correct to add the uuid column as a clustering key to make the primary key unique, so that the primary key becomes a combination of customerNumber (partition key), createdAt (clustering key), and uuid (clustering key)? However, the database will never be queried based on the whole primary key. It will always be queried based on the part of the Primary key, i.e. customer number and createdAt. uuid will never be used to query the database.
So if I understand correctly, your PRIMARY KEY definition looks like this:
PRIMARY KEY (customerNumber,createdAt,uuid)
It will always be queried based on the part of the Primary key
Yes, querying by part of the PRIMARY KEY definition is fine, in your case. Cassandra tries to restrict queries to a single node, and it achieves this by ensuring that an entire partition is written to a single node (and then replicated). Because of this, you really only need to supply the partition key on your queries (customerNumber), and they should work.
Supplying an additional PRIMARY KEY component, however, is helpful. In a high-throughput scenario, the smaller you can keep your result set payloads, the better.
tl;dr;
Querying by customerNumber and createdAt will be just fine.
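To make the point about partial keys concrete, here is a minimal Python sketch that models one table as a dictionary of partitions. It is only an in-memory stand-in, not Cassandra itself; the names partitions, insert, and query are hypothetical. Each partition is keyed by customerNumber, and rows inside it are keyed by the clustering columns (createdAt, uuid).

```python
from collections import defaultdict

# Hypothetical in-memory model: customerNumber -> {(createdAt, uuid): payload}
partitions = defaultdict(dict)

def insert(customer_number, created_at, row_uuid, payload):
    # The full primary key (partition key + clustering keys) identifies one row.
    partitions[customer_number][(created_at, row_uuid)] = payload

def query(customer_number, created_at=None):
    # Only the partition key is required; createdAt merely narrows the result.
    rows = partitions.get(customer_number, {})
    return [
        (key, payload)
        for key, payload in sorted(rows.items())  # clustering order
        if created_at is None or key[0] == created_at
    ]
```

Note that uuid never appears in a query: it exists only so that two rows with the same customerNumber and createdAt do not overwrite each other.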

Unenforced Unique vs enforced Unique in memsql

I find this a bit confusing. I am using a MemSQL columnstore. I am trying to understand whether there is a way to enforce uniqueness (prevent duplicates) on a specific key (e.g. eventId). I found some documentation on unenforced unique constraints, but I didn't really understand their intent.
The point of unenforced unique keys is as a hint:
An unenforced unique constraint is informational: the query planner may use the unenforced unique constraint as a hint to choose better query plans.
from https://docs.memsql.com/v6.8/concepts/unenforced-unique-constraints/.
Unfortunately MemSQL does not support (enforced) unique constraints on columnstore tables.
As of version 7+, MemSQL supports unique constraints, but they can be applied to only a single column:
https://docs.memsql.com/v7.1/guides/use-memsql/physical-schema-design/creating-a-columnstore-table/creating-a-columnstore-table/
Your columnstore table definition can contain metadata-only unenforced unique keys, single-column hash keys (which may be UNIQUE), and a FULLTEXT key. You cannot define more than one unique key.
One hack to enable a UNIQUE constraint across multiple columns is to use a computed column that concatenates those columns and then apply UNIQUE to it, which indirectly enforces uniqueness on the combination.
Example of a columnstore table with a single-column UNIQUE hash key:
CREATE TABLE articles (
id INT UNSIGNED,
year int UNSIGNED,
title VARCHAR(200),
body TEXT,
SHARD KEY(title),
KEY (id) USING CLUSTERED COLUMNSTORE,
KEY (id) USING HASH,
UNIQUE KEY (title) USING HASH,
KEY (year) USING HASH);
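The multi-column hack described above can be illustrated with a small Python sketch. It is a sketch under loud assumptions: it uses SQLite (via the standard sqlite3 module) rather than MemSQL, it builds the combined key in application code rather than as a computed column, and the names open_db, combined_key, insert_article, and year_title are hypothetical. The idea is the same: one column holds the concatenation, and a single-column UNIQUE constraint on it enforces uniqueness of the pair.

```python
import sqlite3

def open_db():
    # In-memory SQLite database standing in for the real table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles ("
        " year INTEGER, title TEXT, year_title TEXT UNIQUE)"
    )
    return conn

def combined_key(year, title):
    # The delimiter keeps (20, '21x') and (202, '1x') from colliding.
    return f"{year}|{title}"

def insert_article(conn, year, title):
    # Returns True on success, False if (year, title) already exists.
    try:
        with conn:
            conn.execute(
                "INSERT INTO articles (year, title, year_title)"
                " VALUES (?, ?, ?)",
                (year, title, combined_key(year, title)),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A plain concatenation without a delimiter can make two different pairs look identical, which is why the sketch inserts one.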

Will Cassandra do a multi-partition search for a compound primary key?

I have a scenario. Let's say below is my Cassandra table:
CREATE TABLE USER (
id TEXT,
name TEXT,
age int,
role TEXT,
PRIMARY KEY ((id, role), age));
Now I should be able to query the user table using either id or role, or both id and role. My question is: when I use only id or only role in the WHERE clause to find a user, will Cassandra search for the user record across different partitions (and nodes), given that I am not searching with both id and role, which together form the partition key of my table?
When you use a compound partition key like the one in your example, PRIMARY KEY ((id, role), age), Cassandra will concatenate the two values together. It's a technique used to create a more unique, or sometimes more granular, partition key to better control how evenly data gets distributed around the respective datacenter.
Because id and role are concatenated and then hashed, you must always provide both id AND role. Given only part of the compound partition key, Cassandra cannot compute the hash that locates the partition, so it does not know which node holds the data.
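A rough Python sketch of why both values are needed follows. It is a simplification with assumed names (NODES, node_for): Cassandra's default partitioner uses Murmur3 and token ranges, whereas this sketch uses MD5 and a modulo purely for illustration.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster

def node_for(id_, role):
    # Both components feed the hash; neither one alone identifies a partition.
    raw = f"{id_}:{role}".encode("utf-8")
    token = int.from_bytes(hashlib.md5(raw).digest()[:8], "big")
    return NODES[token % len(NODES)]
```

With only id (or only role) there is nothing to hash, so the only way to find matching rows would be to ask every node.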

Cassandra table based query and primary key uniqueness

I have read here that for a table like:
CREATE TABLE user (
username text,
password text,
email text,
company text,
PRIMARY KEY (username)
);
We can create a table like:
CREATE TABLE user_by_company (
company text,
username text,
email text,
PRIMARY KEY (company)
);
This is meant to support querying by company. But what about primary key uniqueness for the second table?
Modify your table's PRIMARY KEY definition and add username as a clustering key:
CREATE TABLE user_by_company (
company text,
username text,
email text,
PRIMARY KEY (company,username)
);
That will enforce uniqueness, as well as return all usernames for a particular company. Additionally, your result set will be sorted in ascending order by username.
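The effect of the two key definitions can be sketched in Python by modeling a table as a dictionary keyed by its primary key columns. This is a simplified model (real Cassandra resolves writes per column with last-write-wins, not per whole row), and upsert, by_company, and by_company_and_user are hypothetical names.

```python
def upsert(table, key_columns, row):
    """Write `row` into `table` (a dict), keyed by the primary key columns."""
    key = tuple(row[c] for c in key_columns)
    table[key] = row  # same key -> the earlier row is replaced

users = [
    {"company": "Acme", "username": "jayne", "email": "jayne@example.com"},
    {"company": "Acme", "username": "kaylee", "email": "kaylee@example.com"},
]

by_company = {}           # models PRIMARY KEY (company)
by_company_and_user = {}  # models PRIMARY KEY (company, username)
for u in users:
    upsert(by_company, ("company",), u)
    upsert(by_company_and_user, ("company", "username"), u)
```

In this model, by_company ends up holding a single row for Acme (the last one written), while by_company_and_user keeps both users.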
Data will be partitioned by the company name across nodes. What if there are a lot of users from one company and fewer from another? The data will be partitioned in an unbalanced way.
That's the balance that you have to figure out on your own. PRIMARY KEY definition in Cassandra is a give-and-take between data distribution and query flexibility. And unless the cardinality of company is very low (like single digits), you shouldn't have to worry about creating hot spots in your cluster.
Also, if one particular company gets too big, you can use a modeling technique known as "bucketing." If I were going to "bucket" your user_by_company table, I would first add a company_bucket column and use it as an additional (composite) partition key:
CREATE TABLE user_by_company (
company text,
company_bucket text,
username text,
email text,
PRIMARY KEY ((company,company_bucket),username)
);
As for what to put into that bucket, it's up to you. Maybe that particular company has East and West locations, so something like this might work:
INSERT INTO user_by_company (company,company_bucket,username,email)
VALUES ('Acme','West','Jayne','jcobb@serenity.com');
The drawback here is that you would then have to provide company_bucket whenever querying that table. But it is a solution that could help you if a company should get too big.
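The trade-off can be sketched in Python. Note the swapped-in technique: instead of a location such as East or West, this sketch derives the bucket from a hash of the username, which is one common alternative but not what the answer above uses. BUCKETS, bucket_for, insert_user, and users_of are hypothetical names.

```python
import zlib
from collections import defaultdict

BUCKETS = 4  # hypothetical number of buckets per company

def bucket_for(username):
    # Deterministic bucket derived from the username (an assumption).
    return zlib.crc32(username.encode("utf-8")) % BUCKETS

# (company, bucket) -> {username: email}; each key models one partition.
table = defaultdict(dict)

def insert_user(company, username, email):
    table[(company, bucket_for(username))][username] = email

def users_of(company):
    # Reading a whole company now means visiting every bucket.
    found = {}
    for bucket in range(BUCKETS):
        found.update(table.get((company, bucket), {}))
    return sorted(found)
```

The write path spreads one large company over several partitions, and the read path pays for it by having to name (or loop over) the buckets.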
I think there is a typo in the blog (the link you mentioned). You are right: with the table structure given for user_by_company, there will be an issue with uniqueness.
To support the typo theory:
In this case, creating a secondary index in the company field in the user table could be a solution because it has much lower cardinality than the user's email but let’s solve it with performance in mind. Secondary indexes are always slower than dedicated table approach.
These are the lines from the blog about querying users by company.
If you were to define company as the primary key OR part of the primary key, there should be no need to create a secondary index.

Cassandra - difference in efficiency between simple and compound key

I have a problem understanding one thing from this article - http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
Exercise - We want to get all users by groupname.
Solution:
CREATE TABLE groups (
groupname text,
username text,
email text,
age int,
PRIMARY KEY (groupname, username)
);
SELECT * FROM groups WHERE groupname = 'footballers';
But to find all users in group we can set: PRIMARY KEY (groupname) and it works too.
Why is a clustering key (username) needed in this case? I know that when we set username as the clustering key we can use it in a WHERE clause. But to find users only by groupname is there any difference between PRIMARY KEY (groupname) and PRIMARY KEY (groupname, username) in terms of query efficiency?
Clustering keys provide multiple benefits: Query flexibility, result set ordering (within a partition key) and extended uniqueness.
But to find all users in group we can set: PRIMARY KEY (groupname)
Try that once. Create a new table using only groupname as your PRIMARY KEY, and then try to insert multiple usernames for each group. You will find that there will only ever be one group, and that the username column will be overwritten for each new user within that group.
But to find users only by groupname is there any difference between PRIMARY KEY (groupname) and PRIMARY KEY (groupname, username) in terms of query efficiency?
If PRIMARY KEY (groupname) performs faster, the most likely reason is that only a single row can be returned.
In this case, defining username as a clustering key provides:
The ability to sort by username within a group.
The ability to query for a specific username within a group.
The ability to add multiple usernames within a group.
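The three abilities listed above can be mimicked with a small Python model of the groups table. It is only a sketch with hypothetical names (groups, add_user, users_in_group, get_user): the outer dictionary plays the role of the partition key and the inner one the role of the clustering key.

```python
groups = {}  # groupname -> {username: row}

def add_user(groupname, username, email, age):
    # Different usernames coexist within one group (extended uniqueness).
    groups.setdefault(groupname, {})[username] = {"email": email, "age": age}

def users_in_group(groupname):
    # Rows come back ordered by the clustering key (username).
    return sorted(groups.get(groupname, {}))

def get_user(groupname, username):
    # Point lookup on partition key + clustering key.
    return groups.get(groupname, {}).get(username)
```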
You don't need the clustering key if you want to query by groupname.
If you add a clustering key (username in this example), rows will be ordered by username within each groupname.
