Data modelling in Cassandra

I am trying out Cassandra and looking at ways to model our data in it. Below I describe our data store requirements along with my thoughts on how to model them in Cassandra. Please let me know whether this makes sense and suggest changes.
I searched the web quite a bit but could not get a clear idea of how to model multi-valued columns and index them, which is quite a common requirement.
Any help would be greatly appreciated.
Our current data for each record:
{
  'id': <some uuid>,
  'title': text,
  'description': text,
  'images': [{'id': id1, 'caption': cap1}, {'id': id2, 'caption': cap2}, ...],
  'videos': ['video id1', 'video id2', ...],
  'keywords': ['keyword1', 'keyword2', ...],
  'updated_at': <timestamp>
}
Queries we would need
Lookup by id
Lookup by images.id
Lookup by keyword
All records where updated_at > <some timestamp>
Our current model:
Column Family: Article
  id: uuid
  title: varchar
  description: varchar
  images:
  videos:
  keywords:
  updated_at:
  updated_date: [e.g. '2013-05-06:02']
Column Family: Image-Article Index
{
  'id': <image id>,
  'article1 uuid': null,
  'article2 uuid': null,
  ...
}
Column Family: Keyword-Article Index
{
  'id': <keyword>,
  'article1 uuid': null,
  'article2 uuid': null,
  ...
}
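To make the intent of the two index column families concrete, here is a minimal in-memory sketch in JavaScript. It is not Cassandra code: plain Maps stand in for the index rows, and the function name buildIndexes is hypothetical. It only shows which article ids each index row is expected to hold.

```javascript
// Hypothetical in-memory stand-in for the two index column families.
// Each Map key plays the role of a row key; each Set holds the article ids
// that would be stored as column names in that row.
function buildIndexes(articles) {
  const byImage = new Map();   // image id -> Set of article ids
  const byKeyword = new Map(); // keyword  -> Set of article ids

  const add = (index, key, articleId) => {
    if (!index.has(key)) index.set(key, new Set());
    index.get(key).add(articleId);
  };

  for (const article of articles) {
    for (const image of article.images) add(byImage, image.id, article.id);
    for (const keyword of article.keywords) add(byKeyword, keyword, article.id);
  }
  return { byImage, byKeyword };
}
```

A lookup by image id or keyword then becomes two steps: read the set of article ids from the index, and fetch those articles by id.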
Sample queries:
Lookup by id => straightforward
Lookup by images.id =>
  ids = select * from 'Image-Article Index' where id=<image id>
  select * from Article where id in (ids)
Lookup by keyword =>
  ids = select * from 'Keyword-Article Index' where id=<keyword>
  select * from Article where id in (ids)
All records where updated_at > <some timestamp> =>
  Cassandra doesn't support range queries unless there is also an equality condition on one of the indexed columns, so:
  extract date and hour from the given timestamp;
  for each date:hour from start to current time
    ids = select * from Article where updated_date=date:hour and updated_at > <some timestamp>
    select * from Article where id in (ids)
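The "for each date:hour" loop above needs the list of hourly buckets between the given timestamp and now. A minimal sketch of that step in JavaScript follows. It assumes the buckets use UTC and the 'YYYY-MM-DD:HH' format of the updated_date example; the helper name hourBuckets is hypothetical.

```javascript
// Hypothetical helper: list every 'YYYY-MM-DD:HH' bucket (UTC) between two instants.
// Assumption: bucket values match the format of the updated_date column example.
function hourBuckets(start, end) {
  const HOUR_MS = 60 * 60 * 1000;
  const buckets = [];
  // Round the start down to the beginning of its hour.
  let t = Math.floor(start.getTime() / HOUR_MS) * HOUR_MS;
  for (; t <= end.getTime(); t += HOUR_MS) {
    const iso = new Date(t).toISOString(); // e.g. '2013-05-06T02:00:00.000Z'
    buckets.push(`${iso.slice(0, 10)}:${iso.slice(11, 13)}`);
  }
  return buckets;
}
```

Each returned bucket would be used as the equality value for updated_date, with the updated_at range condition applied within it. The number of queries grows with the number of hours covered, so this only suits reasonably recent timestamps.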

Related

Need a sequelize alternative to a find raw query

SELECT * FROM recommended_plan WHERE user_id = ? AND created_at = (SELECT MAX(created_at) FROM recommended_plan WHERE user_id = ?)
I am stuck at converting this raw query into a sequelize one.

CASE 1 - you have only one record with the max created_at value
First you can convert it to this equivalent query:
SELECT * FROM recommended_plan WHERE user_id = ?
ORDER BY created_at DESC
LIMIT 1
(Note that the LIMIT clause depends on the DBMS and could have a different name and/or syntax.)
Now you can easily construct the corresponding Sequelize query:
const plan = await RecommendedPlan.findAll({
  where: {
    user_id: userId
  },
  limit: 1,
  order: [['created_at', 'desc']]
})
CASE 2 - you have several records with max created_at value:
You can use Sequelize.literal to use a condition with a subquery:
const plan = await RecommendedPlan.findAll({
  where: {
    user_id: userId,
    created_at: Sequelize.literal('(SELECT MAX(created_at) FROM recommended_plan rp WHERE rp.user_id = $userId)')
  },
  bind: {
    userId: userId
  }
})

NodeJs : bulk insert into SQL Server one-to-many

I want to use the Node.js mssql package to bulk insert data from the JSON below:
[
  {
    "name": "Tom",
    "registerDate": "2021-10-10 00:00:00",
    "gender": 0,
    "consumeRecord": [
      {
        "date": "2021-10-11 00:00:00",
        "price": 102.5
      },
      {
        "date": "2021-10-12 00:00:00",
        "price": 200
      }
    ]
  },
  {
    "name": "Mary",
    "registerDate": "2021-06-10 00:00:00",
    "gender": 1,
    "consumeRecord": [
      {
        "date": "2021-07-11 00:00:00",
        "price": 702.5
      },
      {
        "date": "2021-12-12 00:00:00",
        "price": 98.2
      }
    ]
  }
]
I am trying to use mssql bulk insert for member records that each have multiple consume records.
Is there a way to bulk insert one-to-many data like below?
It seems I need to insert into the member table and get the id (primary key) first, then use that id for the related rows in the consume table.
const sql = require('mssql')

// member table
const memberTable = new sql.Table('Member')
memberTable.columns.add('name', sql.VarChar(50), {nullable: false})
memberTable.columns.add('registerDate', sql.VarChar(50), {nullable: false})
memberTable.columns.add('gender', sql.Int, {nullable: false})

// consume record table
const consumeTable = new sql.Table('ConsumeRecord')
consumeTable.columns.add('MemberId', sql.Int, {nullable: false})
consumeTable.columns.add('Date', sql.VarChar(50), {nullable: false})
consumeTable.columns.add('price', sql.Money, {nullable: false})

// fill both tables, one rows.add call per row
jsonList.forEach(data => {
  memberTable.rows.add(data.name, data.registerDate, data.gender)
  data.consumeRecord.forEach(record => {
    // data.memberId does not exist yet <---- should be the id generated by the member table
    consumeTable.rows.add(data.memberId, record.date, record.price)
  })
})

// insert into member table
const memberRequest = new sql.Request()
memberRequest.bulk(memberTable, (err, result) => {
})
// insert into consume record table
const consumeRequest = new sql.Request()
consumeRequest.bulk(consumeTable, (err, result) => {
})
Expected Record:
Member Table
id (auto increment) | name | registerDate        | gender
1                   | Tom  | 2021-10-10 00:00:00 | 0
2                   | Mary | 2021-06-10 00:00:00 | 1
Consume Record Table
id | MemberId | Date                | price
1  | 1        | 2021-10-10 00:00:00 | 102.5
2  | 1        | 2021-10-12 00:00:00 | 200
3  | 2        | 2021-07-11 00:00:00 | 702.5
4  | 2        | 2021-12-12 00:00:00 | 98.2

The best way to do this is to upload the whole thing in batch to SQL Server, and ensure that it inserts the correct foreign key.
You have two options.
Option 1:
1. Upload the main table as a Table Valued Parameter or JSON blob
2. Insert with OUTPUT clause to select the inserted IDs back to the client
3. Correlate those IDs back to the child table data
4. Bulk Insert that as well
Option 2 is a bit easier: do the whole thing in SQL
1. Upload everything as one big JSON blob
2. Insert main table with OUTPUT clause into table variable
3. Insert child table, joining the IDs from the table variable
CREATE TABLE Member(
  Id int IDENTITY PRIMARY KEY,
  name varchar(50),
  registerDate datetime NOT NULL,
  gender tinyint NOT NULL
);
CREATE TABLE ConsumeRecord(
  MemberId int NOT NULL REFERENCES Member (Id),
  Date datetime NOT NULL,
  price decimal(9,2)
);
Note the more sensible datatypes of the columns.
DECLARE @ids TABLE (jsonIndex nvarchar(5) COLLATE Latin1_General_BIN2 not null, memberId int not null);

WITH Source AS (
  SELECT
    j1.[key],
    j2.*
  FROM OPENJSON(@json) j1
  CROSS APPLY OPENJSON(j1.value)
    WITH (
      name varchar(50),
      registerDate datetime,
      gender tinyint
    ) j2
)
MERGE Member m
USING Source s
ON 1=0 -- never match
WHEN NOT MATCHED THEN
  INSERT (name, registerDate, gender)
  VALUES (s.name, s.registerDate, s.gender)
  OUTPUT s.[key], inserted.ID
  INTO @ids(jsonIndex, memberId);

INSERT ConsumeRecord (MemberId, Date, price)
SELECT
  i.memberId,
  j2.date,
  j2.price
FROM OPENJSON(@json) j1
CROSS APPLY OPENJSON(j1.value, '$.consumeRecord')
  WITH (
    date datetime,
    price decimal(9,2)
  ) j2
JOIN @ids i ON i.jsonIndex = j1.[key];
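Unfortunately, the OUTPUT clause of an INSERT can only reference columns of the inserted rows, not other columns of the source (here, the JSON array index). So we need to hack it with a MERGE whose ON condition never matches, because the OUTPUT clause of a MERGE can reference source columns as well.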
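For Option 1, where the inserted ids are returned to the client, the "correlate those IDs back to the child table data" step could be sketched as below. This is only an illustration under assumptions: the helper name buildConsumeRows is hypothetical, and idRows is assumed to be a list of { jsonIndex, memberId } pairs shaped like the table variable in the SQL above, however it was obtained.

```javascript
// Hypothetical helper: build ConsumeRecord rows once the Member ids are known.
// `members` is the original JSON array; `idRows` is assumed to contain one
// { jsonIndex, memberId } pair per member, keyed by the member's array index.
function buildConsumeRows(members, idRows) {
  const idByIndex = new Map(idRows.map(r => [Number(r.jsonIndex), r.memberId]));
  return members.flatMap((member, index) => {
    const memberId = idByIndex.get(index);
    if (memberId === undefined) {
      throw new Error(`No inserted id for member at index ${index}`);
    }
    // One [MemberId, Date, price] row per consume record.
    return member.consumeRecord.map(rec => [memberId, rec.date, rec.price]);
  });
}
```

Each resulting row could then be added to the child bulk table, for example with consumeTable.rows.add(...row), before sending the second bulk request. Matching by array index rather than by result order avoids relying on the order in which the database returns the inserted ids.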

JSONB Query Sequelize

I have a table in Postgres with the following structure:
ID
Name
Details
Context
CreatedDate
where Context is a JSONB field and CreatedDate is a timestamp.
I am saving data in Context this way: {"trade": {"id": 102}, "trader": {"id": 100}}
I am trying to select records from Context based on the trader id, and this is my query:
this.findAll({
  where: {
    context: {
      $contains: {
        trader: [{id: '100'}]
      }
    }
  }
})
I tried nested keys as well, but it yielded no result:
this.findAll({
  where: {
    'context.trader.id': {
      $eq: '100'
    }
  }
})
Can you please suggest how I can select the records based on my structure?
Following on from that, how can I get records based on two conditions, for example by adding CreatedDate to this where clause?

Sequelize - Join with multiple column

I would like to convert the following query into sequelize code:
select * from table_a
inner join table_b
  on table_a.column_1 = table_b.column_1
  and table_a.column_2 = table_b.column_2
I have tried many approaches and followed many of the suggested solutions, but I am unable to produce the desired query from sequelize code.
The most I can achieve is the following:
select * from table_a
inner join table_b
  on table_a.column_1 = table_b.column_1
I want the second condition as well:
  and table_a.column_2 = table_b.column_2
Is there a proper way to achieve it?

You need to define your own on clause of the JOIN statement:
ModelA.findAll({
  include: [
    {
      model: ModelB,
      on: {
        col1: sequelize.where(sequelize.col("ModelA.col1"), "=", sequelize.col("ModelB.col1")),
        col2: sequelize.where(sequelize.col("ModelA.col2"), "=", sequelize.col("ModelB.col2"))
      },
      attributes: [] // empty array means that no column from ModelB will be returned
    }
  ]
}).then((modelAInstances) => {
  // result...
});

Regarding @TophatGordon's question in the comments on the accepted answer, about whether any associations need to be set up in the model or not:
I also went through the GitHub issue raised back in 2012, which is still open.
I was in the same situation, trying to set up my own ON condition for a left outer join.
When I directly tried to use on: {...} inside Table1.findAll(...include Table2 with ON condition...), it didn't work.
It threw an error:
EagerLoadingError [SequelizeEagerLoadingError]: Table2 is not associated to Table1!
My use case was to match two non-primary-key columns from Table1 to two columns in Table2 in a left outer join. I will show how I did it and what I achieved.
Don't get confused by the table names and column names, as I had to change them from the original ones that I used.
So I had to create an association in Table1 (Task) like this:
Task.associate = (models) => {
  Task.hasOne(models.SubTask, {
    foreignKey: 'someId', // <--- one of the column of table2 - SubTask: not a primary key here in my case; can be primary key also
    sourceKey: 'someId', // <--- one of the column of table1 - Task: not a primary key here in my case; can be a primary key also
    scope: {
      [Op.and]: sequelize.where(sequelize.col("Task.some_id_2"),
        // '=',
        Op.eq, // or you can use '=',
        sequelize.col("subTask.some_id_2")),
    },
    as: 'subTask',
    // no constraints should be applied if sequelize will be creating tables and unique keys are not defined,
    // as it throws error of unique constraint
    constraints: false,
  });
};
So the find query looks like this:
Task.findAll({
  where: whereCondition,
  // attributes: ['id','name','someId','someId2'],
  include: [{
    model: SubTask, as: 'subTask', // <-- model name and alias name as defined in association
    attributes: [], // if no attributes needed from SubTask - empty array
  }],
});
Resultant query:
One matching condition is taken from [foreignKey] = [sourceKey].
The second matching condition comes from the sequelize.where(...) used in scope: {...}.
select
  "Task"."id",
  "Task"."name",
  "Task"."some_id" as "someId",
  "Task"."some_id_2" as "someId2"
from
  "task" as "Task"
left outer join "sub_task" as "subTask" on
  "Task"."some_id" = "subTask"."some_id"
  and "Task"."some_id_2" = "subTask"."some_id_2";
Another approach achieves the same result and also solves the issues that arise when Table1 itself is used in an include, i.e. when Table1 appears as a 2nd-level table that is included from another table, say Table0:
Task.associate = (models) => {
  Task.hasOne(models.SubTask, {
    foreignKey: 'someId', // <--- one of the column of table2 - SubTask: not a primary key here in my case; can be primary key also
    sourceKey: 'someId', // <--- one of the column of table1 - Task: not a primary key here in my case; can be a primary key also
    as: 'subTask',
    // <-- removed scope -->
    // no constraints should be applied if sequelize will be creating tables and unique keys are not defined,
    // as it throws error of unique constraint
    constraints: false,
  });
};
So the find query from Table0 looks like this. Note that foreignKey and sourceKey will not be considered here, as we now use a custom on: {...}:
Table0.findAll({
  where: whereCondition,
  // attributes: ['id','name','someId','someId2'],
  include: {
    model: Task, as: 'Table1AliasName', // if association has been defined as alias name
    include: [{
      model: SubTask, as: 'subTask', // <-- model name and alias name as defined in association
      attributes: [], // if no attributes needed from SubTask - empty array
      on: {
        [Op.and]: [
          sequelize.where(
            sequelize.col('Table1AliasName_OR_ModelName.some_id'),
            Op.eq, // '=',
            sequelize.col('Table1AliasName_OR_ModelName->subTask.some_id')
          ),
          sequelize.where(
            sequelize.col('Table1AliasName_OR_ModelName.some_id_2'),
            Op.eq, // '=',
            sequelize.col('Table1AliasName_OR_ModelName->subTask.some_id_2')
          ),
        ],
      },
    }],
  }
});
Skip the part below if your tables are already created.
Set constraints to false; otherwise, if sequelize tries to create the 2nd table (SubTask), it might throw an error (DatabaseError [SequelizeDatabaseError]: there is no unique constraint matching given keys for referenced table "task") due to the following query:
create table if not exists "sub_task" ("some_id" INTEGER, "some_id_2" INTEGER references "task" ("some_id") on delete cascade on update cascade, "data" INTEGER);
If we set constraints: false, it generates the query below instead, which does not throw the unique constraint error because it no longer contains the reference to the non-primary-key column:
create table if not exists "sub_task" ("some_id" INTEGER, "some_id_2" INTEGER, "data" INTEGER);

Distinct count with sequelize

I'm trying to get a distinct count with sequelize such as
'SELECT COUNT(DISTINCT(age)) AS `count` FROM `Persons` AS `Person`'
As long as I use a raw query, I get the desired result. However, as soon as I switch to the sequelize count function, the query breaks in Postgres:
Person.count({distinct:'age'}).then(...);
results in
'SELECT COUNT(DISTINCT(*)) AS `count` FROM `Persons` AS `Person`'
which leads to a syntax error. Solutions described in other posts, such as "How to get a distinct count with sequelize?", do not work unless you add an include statement or a where clause, which I do not have in this special case.
Does anybody know a proper solution for this?

You have to use Sequelize aggregation to make it work correctly.
Model.aggregate(field, aggregateFunction, [options])
Returns: the aggregate result cast to options.dataType, unless options.plain is false, in which case the complete data result is returned.
Example:
Person.aggregate('age', 'count', { distinct: true })
  .then(function(count) {
    //.. distinct count is here
  });
Executing (default):
SELECT count(DISTINCT("age")) AS "count" FROM "persons" AS "person";

You can do it like this:
models.User.findAll({
  attributes: ['user_status', [sequelize.fn('COUNT', sequelize.col('user_status')), 'total']],
  group: ['user_status']
});
That will return something like:
[
  { user_status : 1 , total : 2 },
  { user_status : 2 , total : 6 },
  { user_status : 3 , total : 9 },
  ...
]
Then you can loop through the returned data and check the status.
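The loop over the returned data could be sketched as follows. This assumes the rows are plain objects of the shape shown above (for example, fetched with raw: true); the helper name totalsByStatus is hypothetical. Number() is applied because some drivers return COUNT values as strings.

```javascript
// Hypothetical helper: turn the grouped rows into a lookup keyed by status.
// Assumes each row is a plain object with `user_status` and `total` fields.
function totalsByStatus(rows) {
  const totals = {};
  for (const row of rows) {
    totals[row.user_status] = Number(row.total);
  }
  return totals;
}
```

With the sample result above, the count for a given status is then available as totals[status], and a status that never occurs is simply absent from the lookup.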

In the latest version, you should use:
Person.count({distinct: true, col: 'age'}).then(...);
See: http://docs.sequelizejs.com/class/lib/model.js~Model.html#static-method-count
