I'm trying to model a column family in Cassandra 1.1 which logically looks like this:
Entp: //CF
//rowkey-> entp_name_xyz:
{entp_full_name: "full_name_xyz",
some_value: 1,
policy: {policy_name: "default policy",
type: "type 1",
prop_1: "prop 1",
...
},
rules: {rule_1:, rule_2:,rule_3:}
}
The queries I'm trying to model are:
1. Get all policies given an entp name
2. Get all rules given an entp name
3. Get all columns given an entp name
I'm planning to model this column family as having "wide rows" where one row would look like this:
RowKey:- entp_name_xyz,
column_name:- policy:p1
Value:-{JSON object - {policy_name: "default policy", type: "type 1", prop_1: "prop 1", ...}}
column_name:- policy:p2
Value:-{JSON object - {policy_name: "default policy2", type: "type 1", prop_1: "prop 1", ...}}
column_name: rule:r1 where r1 is a rowkey of a Rules column family
Value: Null
Now my questions are, in cqlsh or cassandra-cli:
1. How do I insert a composite column name such as policy:p1?
2. With this scheme, is it possible to have a query like select * from entp where column_name like 'policy:*' and entp_name = 'xyz', in order to read just the policy columns?
3. How do I set a null value for a column? I read in some forums that you shouldn't need to set a null because it's equivalent to not having a value. But consider the case where you have a static schema with col1, col2 and col3, and I want to insert a row with col3 = null but with col1 and col2 having some values. What is the cqlsh syntax for inserting such data (I could not find it in the documentation)? The following gives an error:
insert into entp (col1,col2,col3) values ("abc","xyz", null)
Thanks!
Composites are far, far easier to work with in CQL3, which is available to you in Cassandra 1.1, so I'll use that in my answer. Tables with multiple-component primary keys in CQL3 are equivalent to wide rows in the storage engine (Cassandra) layer.
If I've interpreted your policy and rules data correctly, then this is a possible answer:
CREATE TABLE entp_policies (
entp_name text,
policy_name text,
policy_value text,
PRIMARY KEY (entp_name, policy_name)
);
CREATE TABLE entp_rules (
entp_name text,
rule_name text,
rule_value text,
PRIMARY KEY (entp_name, rule_name)
);
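(This also answers your first question: with CQL3 you never build a composite column name like policy:p1 by hand; the storage engine composes it from the clustering column for you. Conceptually, one partition of entp_policies is laid out like this; a sketch of the wide-row layout, not literal cassandra-cli output:)
RowKey: entp_name_xyz
('p1', 'policy_value') -> '{policy_name: "default policy", ...}'
('p2', 'policy_value') -> '{policy_name: "default policy2", ...}'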
You'd use it like this:
INSERT INTO entp_policies (entp_name, policy_name, policy_value)
VALUES ('entp_name_xyz', 'p1',
'{policy_name: "default policy", type: "type 1", ...}');
INSERT INTO entp_policies (entp_name, policy_name, policy_value)
VALUES ('entp_name_xyz', 'p2',
'{policy_name: "default policy2", type: "type 1", ...}');
INSERT INTO entp_rules (entp_name, rule_name) VALUES ('entp_name_xyz', 'r1');
-- Get all policies given an entp name
SELECT * FROM entp_policies WHERE entp_name = 'entp_name_xyz';
-- Get all rules given an entp
SELECT * FROM entp_rules WHERE entp_name = 'entp_name_xyz';
-- Get all columns given an entp_name (both of the above)
With your scheme, yes, it would be possible to have a query like that, but it would be a bit more finicky than with my version, plus CQL2 is deprecated.
That's right: you just avoid inserting the value. There isn't an explicit NULL in CQL (yet), but you can just do:
insert into entp (col1,col2) values ('abc','xyz');
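If you later need to "null out" col3 on an existing row, deleting just that column has the same effect. A sketch, assuming col1 is the primary key of entp:
DELETE col3 FROM entp WHERE col1 = 'abc';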
Hope that helps!
You can keep both rules and policies in one table if you define another column in the composite key:
create table entp_details(
entp_name text,
type text,
name text,
value text,
primary key (entp_name, type, name));
Here, type is either 'Policy' or 'Rule'.
INSERT INTO entp_details (entp_name, type, name, value)
VALUES ('entp_name_xyz', 'Policy', 'p1',
'{policy_name: "default policy", type: "type 1", ...}');
INSERT INTO entp_details (entp_name, type, name, value)
VALUES ('entp_name_xyz', 'Policy', 'p2',
'{policy_name: "default policy2", type: "type 1", ...}');
INSERT INTO entp_details (entp_name, type, name, value) VALUES ('entp_name_xyz', 'Rule', 'r1', null);
And the queries look like this:
select * from entp_details WHERE entp_name = 'entp_name_xyz' and type = 'Policy';
select * from entp_details WHERE entp_name = 'entp_name_xyz' and type = 'Rule';
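And since policies and rules now share a partition, the third query (all columns given an entp_name) needs no type filter at all:
select * from entp_details WHERE entp_name = 'entp_name_xyz';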
I'm curious to find out the best way to generate relationship identities through ADF.
Right now, I'm consuming JSON data that does not have any identity information. This data is then transformed into multiple database sink tables with relationships (1..n, etc.). Due to FK constraints on some of the destination sink tables, these relationships need to be "built up" one at a time.
This approach seems a bit kludgy, so I'm looking to see if there are other options that I'm not aware of.
Note that I need to include surrogate key generation for each insert. If I don't, given the output database schema, I'll get a 'cannot insert PK null' error.
Also note that I turn IDENTITY_INSERT ON/OFF for each sink.
I would tend to take more of an ELT approach and use the native JSON abilities in Azure SQL DB, i.e. OPENJSON. You could land the JSON in a table in Azure SQL DB using ADF (e.g. a Stored Proc activity) and then call another stored proc to process the JSON, something like this:
-- Setup
DROP TABLE IF EXISTS #tmp
DROP TABLE IF EXISTS import.City;
DROP TABLE IF EXISTS import.Region;
DROP TABLE IF EXISTS import.Country;
GO
DROP SCHEMA IF EXISTS import
GO
-- Note: the three CREATE TABLEs below are part of the single CREATE SCHEMA
-- statement (no GO or semicolon between them), which is why they land in the import schema
CREATE SCHEMA import
    CREATE TABLE Country (
        CountryKey INT IDENTITY PRIMARY KEY,
        CountryName VARCHAR(50) NOT NULL UNIQUE
    )
    CREATE TABLE Region (
        RegionKey INT IDENTITY PRIMARY KEY,
        CountryKey INT NOT NULL FOREIGN KEY REFERENCES import.Country,
        RegionName VARCHAR(50) NOT NULL UNIQUE
    )
    CREATE TABLE City (
        CityKey INT IDENTITY(100,1) PRIMARY KEY,
        RegionKey INT NOT NULL FOREIGN KEY REFERENCES import.Region,
        CityName VARCHAR(50) NOT NULL UNIQUE
    )
GO
DECLARE #json NVARCHAR(MAX) = '{
"Cities": [
{
"Country": "England",
"Region": "Greater London",
"City": "London"
},
{
"Country": "England",
"Region": "West Midlands",
"City": "Birmingham"
},
{
"Country": "England",
"Region": "Greater Manchester",
"City": "Manchester"
},
{
"Country": "Scotland",
"Region": "Lothian",
"City": "Edinburgh"
}
]
}'
SELECT *
INTO #tmp
FROM OPENJSON( #json, '$.Cities' )
WITH
(
Country VARCHAR(50),
Region VARCHAR(50),
City VARCHAR(50)
)
GO
-- Add the Country first (has no foreign keys)
INSERT INTO import.Country ( CountryName )
SELECT DISTINCT Country
FROM #tmp s
WHERE NOT EXISTS ( SELECT * FROM import.Country t WHERE s.Country = t.CountryName )
-- Add the Region next including Country FK
INSERT INTO import.Region ( CountryKey, RegionName )
SELECT t.CountryKey, s.Region
FROM #tmp s
INNER JOIN import.Country t ON s.Country = t.CountryName
-- Now add the City with FKs
INSERT INTO import.City ( RegionKey, CityName )
SELECT r.RegionKey, s.City
FROM #tmp s
INNER JOIN import.Country c ON s.Country = c.CountryName
INNER JOIN import.Region r ON s.Region = r.RegionName
AND c.CountryKey = r.CountryKey
SELECT * FROM import.City;
SELECT * FROM import.Region;
SELECT * FROM import.Country;
This is a simple test script designed to show the idea; it should run end-to-end, but it is not production code.
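For instance, to make it re-runnable, the Region and City inserts would need the same guard as the Country one. A sketch for Region, assuming the same #tmp and import tables as above:
INSERT INTO import.Region ( CountryKey, RegionName )
SELECT DISTINCT t.CountryKey, s.Region
FROM #tmp s
INNER JOIN import.Country t ON s.Country = t.CountryName
WHERE NOT EXISTS ( SELECT * FROM import.Region r WHERE s.Region = r.RegionName )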
In Cassandra, suppose we need to access a map-type column at the key level. How do we do it?
Create statement:
create table collection_tab2(
empid int,
emploc map<text,text>,
primary key(empid));
Insert statement:
insert into collection_tab2 (empid, emploc ) VALUES ( 100,{'CHE':'Tata Consultancy Services','CBE':'CTS','USA':'Genpact LLC'} );
select:
select * from collection_tab2;
empid | emploc
------+--------------------------------------------------------------------------
100 | {'CBE': 'CTS', 'CHE': 'Tata Consultancy Services', 'USA': 'Genpact LLC'}
In that case, if I want to access the 'USA' key alone, what should I do?
I tried using an index, but all the values are still returned.
CREATE INDEX fetch_index ON killrvideo.collection_tab2 (keys(emploc));
select * from collection_tab2 where emploc CONTAINS KEY 'CBE';
empid | emploc
------+--------------------------------------------------------------------------
100 | {'CBE': 'CTS', 'CHE': 'Tata Consultancy Services', 'USA': 'Genpact LLC'}
But I expected only the matching entry:
'CBE': 'CTS'
Just as a data model change I would strongly recommend:
create table collection_tab2(
empid int,
emploc_key text,
emploc_value text,
primary key(empid, emploc_key));
Then you can query and page through it simply, since emploc_key is a clustering key rather than part of a CQL collection, which comes with several limits and negative performance impacts.
Then:
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CHE', 'Tata Consultancy Services');
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CBE', 'CTS');
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'USA', 'Genpact LLC');
You can also put the inserts in an unlogged batch, and they will still be applied efficiently and atomically because they are all in the same partition.
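For example:
BEGIN UNLOGGED BATCH
INSERT INTO collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CHE', 'Tata Consultancy Services');
INSERT INTO collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CBE', 'CTS');
INSERT INTO collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'USA', 'Genpact LLC');
APPLY BATCH;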
To do it the way you have it, you can, as of 4.0 (CASSANDRA-7396), use [] selectors like:
SELECT emploc['USA'] FROM collection_tab2 WHERE empid = 100;
But I would still strongly recommend the data model change, as it's significantly more efficient, and it works in existing versions with:
SELECT * FROM collection_tab2 WHERE empid = 100 AND emploc_key = 'USA';
I have the following table.
CREATE TABLE test_x (id text PRIMARY KEY, type frozen<mycustomtype>);
mycustomtype is defined as follows:
CREATE TYPE mycustomtype (
id uuid,
name text
);
And I have created the following materialized view for queries based on the mycustomtype field:
CREATE MATERIALIZED VIEW test_x_by_mycustomtype_name AS
SELECT id, type
FROM test_x
WHERE id IS NOT NULL AND type IS NOT NULL
PRIMARY KEY (id, type)
WITH CLUSTERING ORDER BY (type ASC)
With the above view I hope to execute the following query:
select id from test_x_by_mycustomtype_name where type =
{id: a3e64f8f-bd44-4f28-b8d9-6938726e34d4, name: 'Sample'};
But the query fails, saying I need to use ALLOW FILTERING. I created the view precisely to avoid ALLOW FILTERING. Why is this error happening, since I have used part of the primary key of the view?
In your view, the type column is still a clustering key, hence ALLOW FILTERING must be used. You can change the view as below and retry:
CREATE MATERIALIZED VIEW test_x_by_mycustomtype_name_2 AS
SELECT id, type
FROM test_x
WHERE type IS NOT NULL AND id IS NOT NULL
PRIMARY KEY (type, id)
WITH CLUSTERING ORDER BY (id ASC);
cqlsh:test> select id from test_x_by_mycustomtype_name_2 where type = {id: a3e64f8f-bd44-4f28-b8d9-6938726e34d4, name: 'Sample'};
id
----
Change the order of the primary key of the materialized view:
CREATE MATERIALIZED VIEW test_x_by_mycustomtype_name AS
SELECT id, type
FROM test_x
WHERE type IS NOT NULL AND id IS NOT NULL
PRIMARY KEY (type, id)
WITH CLUSTERING ORDER BY (id ASC);
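With type as the partition key, your original query is then served directly, without ALLOW FILTERING:
select id from test_x_by_mycustomtype_name where type = {id: a3e64f8f-bd44-4f28-b8d9-6938726e34d4, name: 'Sample'};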
I have the following JSON format for storing lawyers, and I have doubts about how to model the "specialties" field in Postgres: it is an array of objects, each with a title and a subarray of sub-specialties:
{
"id": 1,
"name": "John Johnson Johannes",
"gender": "f",
"specialties": [
{
"specialty": "Business law",
"sub-specialties": [
"Incorporation",
"Taxes",
"Fusions"
]
},
{
"specialty": "Criminal law",
"sub-specialties": [
"Property offenses",
"Personal offenses",
"Strict liability"
]
}
]
}
And I have made this lawyers table in Postgres:
DROP DATABASE IF EXISTS lawyers_db;
CREATE DATABASE lawyers_db;
\c lawyers_db;
CREATE TYPE gen AS ENUM ('f', 'm');
CREATE TABLE lawyers_tb (
ID SERIAL PRIMARY KEY,
name VARCHAR,
gender gen
);
INSERT INTO lawyers_tb (name, gender)
VALUES ('John Doe', 'm');
I'm using some Node.js libraries that return the data as JSON when reading from a Postgres table, so I would like to keep the relational model rather than storing my lawyers as JSONB documents.
Is it possible to achieve what I want without using JSONb type?
Forget about objects for a minute and really think through what your data are and how they relate to each other (we are after all using a relational database).
What you have here is simply a relationship.
You have lawyers and you have specialties. The relationship is that lawyers have specialties and specialties belong to lawyers (an n-to-n relationship) and the same goes for the relationship between specialties and subspecialties (n-to-n).
First, let's do the simpler structure of a 1-to-n relationship:
CREATE TABLE lawyers_tb (
ID SERIAL PRIMARY KEY,
name VARCHAR,
gender gen
);
CREATE TABLE specialties_tb (
ID SERIAL PRIMARY KEY,
name VARCHAR,
lawyer_ID INTEGER
);
CREATE TABLE subspecialties_tb (
ID SERIAL PRIMARY KEY,
name VARCHAR,
specialty_ID INTEGER
);
This works but results in duplicates because each specialty can only belong to one lawyer thus if two lawyers specialise in "Business law" you'd have to define "Business law" twice. Worse, for each specialty you also have to duplicate subspecialties.
The solution is a join table (also called a map/mapping table):
CREATE TABLE lawyers_tb (
ID SERIAL PRIMARY KEY,
name VARCHAR,
gender gen
);
CREATE TABLE lawyer_specialties_tb (
lawyer_ID INTEGER,
specialty_ID INTEGER
);
CREATE TABLE specialties_tb (
ID SERIAL PRIMARY KEY,
name VARCHAR
);
CREATE TABLE specialty_subspecialties_tb (
specialty_ID INTEGER,
subspecialty_ID INTEGER
);
CREATE TABLE subspecialties_tb (
ID SERIAL PRIMARY KEY,
name VARCHAR
);
This way each specialty can belong to more than one lawyer (true n-to-n relationship) and each subspecialty can belong to more than one specialty.
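To populate it, each entity is inserted once and the relationships are wired up in the join tables. A sketch using your sample data (the literal IDs assume fresh tables whose SERIAL columns start at 1):
INSERT INTO lawyers_tb (name, gender) VALUES ('John Johnson Johannes', 'f');
INSERT INTO specialties_tb (name) VALUES ('Business law'), ('Criminal law');
INSERT INTO subspecialties_tb (name) VALUES ('Incorporation'), ('Taxes'), ('Fusions'), ('Property offenses'), ('Personal offenses'), ('Strict liability');
INSERT INTO lawyer_specialties_tb (lawyer_ID, specialty_ID) VALUES (1, 1), (1, 2);
INSERT INTO specialty_subspecialties_tb (specialty_ID, subspecialty_ID) VALUES (1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (2, 6);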
You can use joins to fetch the whole dataset:
SELECT lawyers_tb.name as name,
lawyers_tb.gender as gender,
specialties_tb.name as specialty,
subspecialties_tb.name as subspecialty
FROM lawyers_tb LEFT JOIN lawyer_specialties_tb
ON lawyers_tb.ID=lawyer_specialties_tb.lawyer_ID
LEFT JOIN specialties_tb
ON specialties_tb.ID=lawyer_specialties_tb.specialty_ID
LEFT JOIN specialty_subspecialties_tb
ON specialties_tb.ID=specialty_subspecialties_tb.specialty_ID
LEFT JOIN subspecialties_tb
ON subspecialties_tb.ID=specialty_subspecialties_tb.subspecialty_ID;
Yes, it's a bit more complicated to query but the structure allows you to maintain each dataset individually and defines the proper relationships between them.
You may also want to define the keys in the join tables as foreign keys to enforce correctness of the dataset.
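For example, the join tables could be declared like this (a sketch; the composite primary keys are an extra choice that also prevents duplicate links):
CREATE TABLE lawyer_specialties_tb (
lawyer_ID INTEGER REFERENCES lawyers_tb (ID),
specialty_ID INTEGER REFERENCES specialties_tb (ID),
PRIMARY KEY (lawyer_ID, specialty_ID)
);
CREATE TABLE specialty_subspecialties_tb (
specialty_ID INTEGER REFERENCES specialties_tb (ID),
subspecialty_ID INTEGER REFERENCES subspecialties_tb (ID),
PRIMARY KEY (specialty_ID, subspecialty_ID)
);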
I am trying to insert values into a UDT but am getting this error message:
message="unconfigured columnfamily my_object"
Below is my statement:
INSERT INTO home.my_object (id,type,quantity ,critical,page_count,stock,outer_envelope ) VALUES ('3.MYF','COM','D','A','VV','','');
What am I doing wrong?
That error means that the keyspace "home" exists, but does not contain a table (column family) called "my_object". I also noticed that your insert statement does not contain a UDT literal.
UDTs define a type, but you must also define a table with a column of that type before inserting any data. I assume your UDT is called "my_object". Try this:
create table home.test (key int primary key, object frozen<my_object>);
insert into home.test (key, object) values (0, {id: 'value', type: 'othervalue'});
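For completeness, that assumes a UDT created along these lines (a sketch; your real my_object presumably also has the quantity, critical, page_count, stock and outer_envelope fields from your INSERT):
create type home.my_object (id text, type text);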