Instagram timeline data model in Cassandra

I want to design a home timeline like Instagram's, but most samples, such as "twissandra-j", use the schema below:
-- Users user is following
CREATE TABLE following (
username text,
followed text,
PRIMARY KEY(username, followed)
);
-- Users who follow user
CREATE TABLE followers (
username text,
following text,
PRIMARY KEY(username, following)
);
-- Materialized view of tweets created by user
CREATE TABLE userline (
tweetid timeuuid,
username text,
body text,
PRIMARY KEY(username, tweetid)
);
-- Materialized view of tweets created by user, and users she follows
CREATE TABLE timeline (
username text,
tweetid timeuuid,
posted_by text,
body text,
PRIMARY KEY(username, tweetid)
);
In this design, every new post is also inserted into the timeline once for each follower. If a user has 10k followers and 1000 users are working with the application, the program fails. Is there a better way?
// Insert the tweet into follower timelines
for (String follower : getFollowers(username)) {
    execute("INSERT INTO timeline (username, tweetid, posted_by, body) VALUES ('%s', %s, '%s', '%s')",
            follower,
            id.toString(),
            username,
            body);
}

I guess one of these two suggestions could help:
1) First suggestion: insert into timeline in batch mode, for example 1000 INSERT statements per batch.
// chunk is a hypothetical List<String> holding up to 1000 followers
StringBuilder batch = new StringBuilder("BEGIN BATCH\n");
for (String follower : chunk) {
    batch.append(String.format(
        "INSERT INTO timeline (username, tweetid, posted_by, body) VALUES ('%s', %s, '%s', '%s');\n",
        follower, id.toString(), username, body));
}
batch.append("APPLY BATCH;");
execute(batch.toString());
Batching multiple statements saves network exchanges between the client and the server, and between the coordinator and the replicas.
One more thing: batches are atomic by default (in Cassandra 1.2 and later). In the context of a Cassandra batch operation, atomic means that if any part of the batch succeeds, all of it will; otherwise none of it will.
2) Second suggestion: perform the inserts into timeline asynchronously (with a success callback function in the front end).
And of course, maybe you can combine both of them.
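The second suggestion comes without code. Here is a minimal sketch of how the two ideas might be combined in Node.js: split the follower list into chunks, then write the chunks asynchronously with a cap on how many are in flight. The names `chunk`, `fanOut` and `writeChunk`, and the default limit of 4, are assumptions for illustration; `writeChunk` stands in for whatever driver call sends one batch of timeline inserts.

```javascript
// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Call writeChunk(chunkOfFollowers) for every chunk, keeping at most
// `limit` calls in flight at once. writeChunk is a hypothetical async
// function that sends one batch of timeline INSERTs.
async function fanOut(followers, chunkSize, writeChunk, limit = 4) {
  const queue = chunk(followers, chunkSize);
  let next = 0;

  // Each worker pulls the next unclaimed chunk until the queue is empty.
  async function worker() {
    while (next < queue.length) {
      const current = queue[next++];
      await writeChunk(current);
    }
  }

  const workerCount = Math.min(Math.max(1, limit), queue.length);
  const workers = [];
  for (let i = 0; i < workerCount; i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return queue.length; // number of chunks written
}
```

With 10k followers and a chunk size of 1000, this issues ten batch writes, at most four at a time. Both numbers would need tuning against a real cluster.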

Related

How do I post data from req.body into a CQL UDT column using the Node.js driver?

I am new to Cassandra and I need your help.
After creating a table using the CQL console, I am able to create new records and read them, but the POST operation using cassandra-driver in Node.js is not working; it only works when I use the CQL console.
I created the table:
CREATE TYPE event_info (
type text,
pagePath text,
ts text,
actionName text
);
CREATE TABLE journey_info_5 (
id uuid PRIMARY KEY,
user_id text,
session_start_ts timestamp,
event FROZEN<event_info>
);
Code for the POST operation:
export const pushEvent = async(req,res)=>{
const pushEventQuery = 'INSERT INTO user_journey.userjourney (id, user_id, session_start_ts,events)
VALUES ( ${types.TimeUuid.now()}, ${req.body.user_id},${types.TimeUuid.now()},
{ ${req.body.type},${req.body.pagePath},${req.body.ts},${req.body.actionName}} } );'
try {
await client.execute(pushEventQuery)
res.status(201).json("new record added successfully");
} catch (error) {
res.status(404).send({ message: error });
console.log(error);
}
}
It is giving errors. How can I get data from the user and insert it into this table?
Please help me if you have any idea.

The issue is that your CQL statement is invalid. The format for inserting values in a user-defined type (UDT) column is:
{ fieldname1: 'value1', fieldname2: 'value2', ... }
Note that the column names in your schema don't match up with the CQL statement in your code so I'm reposting the schema here for clarity:
CREATE TYPE community.event_info (
type text,
pagepath text,
ts text,
actionname text
);
CREATE TABLE community.journey_info_5 (
id uuid PRIMARY KEY,
event frozen<event_info>,
session_start_ts timestamp,
user_id text
);
Here's the CQL statement I used to insert a UDT into the table (formatted for readability):
INSERT INTO journey_info_5 (id, user_id, session_start_ts, event)
VALUES (
now(),
'thierry',
totimestamp(now()),
{
type: 'type1',
pagePath: 'pagePath1',
ts: 'ts1',
actionName: 'actionName1'
}
);
For reference, see Inserting or updating data into a UDT column. Cheers!
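In the Node.js code from the question, the same insert can be written with bind markers instead of string interpolation. This is a minimal sketch, assuming the community.journey_info_5 schema above and the DataStax cassandra-driver, which accepts a plain object for a UDT value when the statement is prepared. The helper name `buildPushEvent` and its `id` and `now` arguments are hypothetical.

```javascript
// Build the CQL text and bind parameters for one journey event.
// `body` is the parsed request body. `id` and `now` are passed in so this
// function needs no driver import (with cassandra-driver, `id` could come
// from types.Uuid.random() and `now` from new Date()).
function buildPushEvent(body, id, now) {
  const query =
    'INSERT INTO community.journey_info_5 ' +
    '(id, user_id, session_start_ts, event) VALUES (?, ?, ?, ?)';
  // Keys follow the lowercase field names of the event_info UDT.
  const event = {
    type: body.type,
    pagepath: body.pagePath,
    ts: body.ts,
    actionname: body.actionName,
  };
  return { query, params: [id, body.user_id, now, event] };
}

// Intended use inside the route handler (not executed here):
//   const { query, params } = buildPushEvent(req.body, types.Uuid.random(), new Date());
//   await client.execute(query, params, { prepare: true });
```

Binding the values also avoids quoting problems when a field in req.body contains a single quote.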

I can't get sqlite3 insert to work on nodejs

I can't get the following code to work. The SQL statement works when I test it with the sqlite binaries, but running it via the Node.js sqlite3 library always results in the following error. Can someone who has used the library before please help me?
[Error: SQLITE_RANGE: column index out of range
Emitted 'error' event on Statement instance at:
] {
errno: 25,
code: 'SQLITE_RANGE'
}
db.serialize(() => {
db.run("CREATE TABLE IF NOT EXISTS account(id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, password TEXT, email TEXT UNIQUE)");
db.run("INSERT INTO account(firstname, lastname, password, email) VALUES(#firstname, #lastname, #password, #email)", {firstname, lastname, password, email});
response.send('Successfully registered account');
response.end();
});

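The missing id is not the problem: in SQLite, an INTEGER PRIMARY KEY column is assigned automatically when it is left out of the INSERT, so the table definition can stay as it is. SQLITE_RANGE means a bind call was given a parameter index that does not exist in the statement. With the sqlite3 library, the keys of the parameter object must match the placeholder names in the SQL including their prefix, and the library documents $, : and @ as placeholder prefixes. Use $firstname style placeholders and the same names as object keys:
db.run("INSERT INTO account(firstname, lastname, password, email) VALUES($firstname, $lastname, $password, $email)", {$firstname: firstname, $lastname: lastname, $password: password, $email: email});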
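When the values already sit in a plain object such as {firstname, lastname, password, email}, the keys can be prefixed in one place, since the sqlite3 library matches object keys against the full placeholder name, prefix included. This is a minimal sketch; `toNamedParams` is a hypothetical helper, not part of the library, and the sample values are made up.

```javascript
// Return a copy of `values` whose keys carry the placeholder prefix,
// e.g. { firstname: 'Ada' } becomes { $firstname: 'Ada' }.
function toNamedParams(values, prefix = '$') {
  const params = {};
  for (const [key, value] of Object.entries(values)) {
    params[prefix + key] = value;
  }
  return params;
}

const sql =
  'INSERT INTO account(firstname, lastname, password, email) ' +
  'VALUES($firstname, $lastname, $password, $email)';

const params = toNamedParams({
  firstname: 'Ada',
  lastname: 'Lovelace',
  password: 'hashed-password',
  email: 'ada@example.com',
});

// Intended use with an open database handle (not executed here):
//   db.run(sql, params);
```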

How to insert new rows to a junction table Postgres

I have a many-to-many relationship set up with services and service_categories. Each has a table, and there is a third table to handle the relationship (junction table) called service_service_categories. I have created them like this:
CREATE TABLE services(
service_id SERIAL,
name VARCHAR(255),
summary VARCHAR(255),
profileImage VARCHAR(255),
userAgeGroup VARCHAR(255),
userType TEXT,
additionalNeeds TEXT[],
experience TEXT,
location POINT,
price NUMERIC,
PRIMARY KEY (service_id),
UNIQUE (name)
);
CREATE TABLE service_categories(
service_category_id SERIAL,
name TEXT,
description VARCHAR(255),
PRIMARY KEY (service_category_id),
UNIQUE (name)
);
CREATE TABLE service_service_categories(
service_id INT NOT NULL,
service_category_id INT NOT NULL,
PRIMARY KEY (service_id, service_category_id),
FOREIGN KEY (service_id) REFERENCES services(service_id) ON UPDATE CASCADE,
FOREIGN KEY (service_category_id) REFERENCES service_categories(service_category_id) ON UPDATE CASCADE
);
Now, in my application I would like to add a service_category to a service, from a select list for example, at the same time as I create or update a service. In my Node.js code I have this POST route set up:
// Create a service
router.post('/', async( req, res) => {
try {
console.log(req.body);
const { name, summary } = req.body;
const newService = await pool.query(
'INSERT INTO services(name,summary) VALUES($1,$2) RETURNING *',
[name, summary]
);
res.json(newService);
} catch (err) {
console.log(err.message);
}
})
How should I change this code to also add a row to the service_service_categories table when the new service has not been created yet, and so has no serial number?
If anyone could talk me through the approach for this I would be grateful.
Thanks.

You can do this in the database by adding a trigger to the services table to insert a row into the service_service_categories that fires on row insert. The "NEW" keyword in the trigger function represents the row that was just inserted, so you can access the serial ID value.
https://www.postgresqltutorial.com/postgresql-triggers/
Something like this:
CREATE TRIGGER insert_new_service_trigger
AFTER INSERT
ON services
FOR EACH ROW
EXECUTE PROCEDURE insert_new_service();
Then your trigger function looks something like this (noting that the trigger function needs to be created before the trigger itself):
CREATE OR REPLACE FUNCTION insert_new_service()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS
$$
BEGIN
-- check to see if service_id has been created
IF NEW.service_id NOT IN (SELECT service_id FROM service_service_categories) THEN
INSERT INTO service_service_categories(service_id)
VALUES(NEW.service_id);
END IF;
RETURN NEW;
END;
$$;
However, in your example data structure, it doesn't seem like there's a good way to link the service_categories.service_category_id serial value to this new row. You may need to change the schema a bit to accommodate that.

I managed to get it working to a point with multiple inserts and a small schema change on the services table, where I added a column, category_id INT:
ALTER TABLE services
ADD COLUMN category_id INT;
Then in my node query I did this and it worked:
const newService = await pool.query(
`
with ins1 AS
(
INSERT INTO services (name,summary,category_id)
VALUES ($1,$2,$3) RETURNING service_id, category_id
),
ins2 AS
(
INSERT INTO service_service_categories (service_id,service_category_id) SELECT service_id, category_id FROM ins1
)
select * from ins1
`,
[name, summary, category_id]
);
Ideally I want to have multiple categories, so the category_id column on the services table would become category_ids INT[], an array of ids.
How would I put the second insert into a foreach (one integer per array element), so that it creates a new service_service_categories row for each id in the array?
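One way to get a junction row per category without a loop is to pass the ids as a single array parameter and expand it with PostgreSQL's unnest. This is a sketch under the assumptions that `categoryIds` is an array of integers and that the extra category_id column on services is then no longer needed. The helper name `buildCreateService` is hypothetical.

```javascript
// Build a node-postgres query config that inserts one service and one
// junction row per category id in a single statement.
function buildCreateService(name, summary, categoryIds) {
  const text = `
    WITH ins1 AS (
      INSERT INTO services (name, summary)
      VALUES ($1, $2)
      RETURNING service_id
    ),
    ins2 AS (
      INSERT INTO service_service_categories (service_id, service_category_id)
      SELECT ins1.service_id, cat.id
      FROM ins1
      CROSS JOIN unnest($3::int[]) AS cat(id)
    )
    SELECT service_id FROM ins1`;
  // The JavaScript array is sent as one PostgreSQL array parameter.
  return { text, values: [name, summary, categoryIds] };
}

// Intended use in the route handler (not executed here):
//   const newService = await pool.query(buildCreateService(name, summary, [1, 2, 3]));
```

Because both inserts run in one statement, either the service and all of its junction rows are written, or none of them are.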

Proper Sequelize flow to avoid duplicate rows?

I am using Sequelize in my Node.js server. I am ending up with validation errors because my code tries to write the record twice instead of creating it once and then updating it, since it's already in the DB (PostgreSQL).
This is the flow I use when the request runs:
const latitude = req.body.latitude;
var metrics = await models.user_car_metrics.findOne({ where: { user_id: userId, car_id: carId } })
if (metrics) {
metrics.latitude = latitude;
.....
} else {
metrics = models.user_car_metrics.build({
user_id: userId,
car_id: carId,
latitude: latitude
....
});
}
var savedMetrics = await metrics.save();
return res.status(201).json(savedMetrics);
At times, if the client calls the endpoint twice or more in quick succession, the endpoint above tries to save two new rows in user_car_metrics with the same user_id and car_id, both FKs to the user and car tables.
I have a constraint:
ALTER TABLE user_car_metrics DROP CONSTRAINT IF EXISTS user_id_car_id_unique, ADD CONSTRAINT user_id_car_id_unique UNIQUE (car_id, user_id);
Point is, there can only be one entry for a given user_id and car_id pair.
Because of that, I started seeing validation issues. After looking into it and adding logs, I realized the code above adds duplicates to the table (without the constraint). If the constraint is there, I get validation errors when the code above tries to insert the duplicate record.
Question is, how do I avoid this problem? How do I structure the code so that it won't try to create duplicate records. Is there a way to serialize this?

If you have a unique constraint then you can use upsert to either insert or update the record depending on whether you have a record with the same primary key value or column values that are in the unique constraint.
await models.user_car_metrics.upsert({
user_id: userId,
car_id: carId,
latitude: latitude
....
})
See upsert
PostgreSQL - Implemented with ON CONFLICT DO UPDATE. If update data contains PK field, then PK is selected as the default conflict key. Otherwise, first unique constraint/index will be selected, which can satisfy conflict key requirements.
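Applied to the request flow from the question, the find/build/save sequence could collapse into something like the following. It is a sketch only: `saveMetrics` is a hypothetical helper, `model` stands for models.user_car_metrics, and the row is read back with findOne rather than relying on what upsert returns, since that differs between Sequelize versions and dialects.

```javascript
// Insert or update the metrics row for one (user_id, car_id) pair, then
// return the stored row. `model` is expected to be a Sequelize model with
// a unique constraint on (car_id, user_id).
async function saveMetrics(model, userId, carId, latitude) {
  // One statement on the database side, so two fast requests cannot
  // both take the "insert a new row" branch.
  await model.upsert({ user_id: userId, car_id: carId, latitude });
  return model.findOne({ where: { user_id: userId, car_id: carId } });
}

// Intended use in the route handler (not executed here):
//   const savedMetrics = await saveMetrics(models.user_car_metrics, userId, carId, latitude);
//   return res.status(201).json(savedMetrics);
```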

How will I know whether a record was a duplicate or was inserted successfully?

Here is my CQL table:
CREATE TABLE user_login (
userName varchar PRIMARY KEY,
userId uuid,
fullName varchar,
password text,
blocked boolean
);
I have this DataStax Java driver code:
PreparedStatement prepareStmt= instances.getCqlSession().prepare("INSERT INTO "+ AppConstants.KEYSPACE+".user_info(userId, userName, fullName, bizzCateg, userType, blocked) VALUES(?, ?, ?, ?, ?, ?);");
batch.add(prepareStmt.bind(userId, userData.getEmail(), userData.getName(), userData.getBizzCategory(), userData.getUserType(), false));
PreparedStatement pstmtUserLogin = instances.getCqlSession().prepare("INSERT INTO "+ AppConstants.KEYSPACE+".user_login(userName, userId, fullName, password, blocked) VALUES(?, ?, ?, ?, ?) IF NOT EXIST");
batch.add(pstmtUserLogin.bind(userData.getEmail(), userId, userData.getName(), passwordEncoder.encode(userData.getPwd()), false));
instances.getCqlSession().executeAsync(batch);
The problem is that if I remove IF NOT EXIST everything works fine, but if I put it back it simply does not insert records into the table, nor does it throw any error.
So how will I know that I am inserting a duplicate userName?
I am using Cassandra 2.0.1.

Use INSERT... IF NOT EXISTS, then you can use ResultSet#wasApplied() to check the outcome:
ResultSet rs = session.execute("insert into user (name) values ('foo') if not exists");
System.out.println(rs.wasApplied());
Notes:
this CQL query is a lightweight transaction, that carries performance implications. See this article for more information.
your example only has one statement, you don't need a batch
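The same check is available in the Node.js cassandra-driver, whose result object also has a wasApplied() method. This is a minimal sketch against the user_login table from the question, assuming `client` is a connected driver Client with the keyspace already selected. The function name `createLogin` and the shape of `login` are hypothetical.

```javascript
// Try to create a login row; resolve to true if it was written, or false
// if a row with the same userName already existed.
async function createLogin(client, login) {
  const query =
    'INSERT INTO user_login (userName, userId, fullName, password, blocked) ' +
    'VALUES (?, ?, ?, ?, ?) IF NOT EXISTS';
  const params = [login.userName, login.userId, login.fullName, login.password, false];
  const result = await client.execute(query, params, { prepare: true });
  // For a conditional (lightweight transaction) statement, wasApplied()
  // reports whether the condition held and the write happened.
  return result.wasApplied();
}

// Intended use (not executed here):
//   const created = await createLogin(client, { userName, userId, fullName, password });
//   if (!created) { /* report the duplicate userName to the caller */ }
```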

It looks like you need an ACID transaction, and Cassandra, simply put, is not ACID. You have absolutely no guarantee that, in the interval after you check whether the username exists, it will not be created by someone else.
Besides that, in CQL a standard INSERT and UPDATE do the same thing. They both write a "new" record, marking the old ones deleted. Whether there are old records or not is not important.
If you want to authenticate or create a new user on the fly, I suppose you can work with a composite key of username + password, and run your query as an update where username = datum AND password = datum.
In this way, if the user gives you a wrong password, your query fails.
If the user is new, he can't give a "wrong" password, and so his account is created.
You can then test a field like "alreadysubscribed", which you only set after the first login, so for a "just created" user it will be missing.
