How to handle many to many relationship DynamoDB - node.js

I'm new to DynamoDB and I'm trying to build an e-commerce store. I have a table with a user, product and order.
My access patterns are:
get all products in a user's order
I can then apply the same approach to a similar issue with the user's cart, but I'm not sure how. My user-to-order relationship is one-to-many and my product-to-order relationship is many-to-many.
My data looks like this:
type Variant = {
  size: Sizes;
  quantity: number;
  price: number;
}
type OrderProduct = {
  id: string;
  orderId: string;
  product: Product;
  status: string;
  trackingId: string;
}
export type Product = {
  id: string;
  name: string;
  description: string;
  category: string;
  createdAt: string;
  variants: Variant[];
}
export type Order = {
  id: string;
  userId: string;
  products: OrderProduct[];
  createdAt: string;
}
export type User = {
  id: string;
  name: string;
  address: string;
}
I've seen this on AWS for many-to-many relationships: aws many to many relationships
But this doesn't really explain how to do a one-to-many and then a many-to-many query. Any advice and help with the query would be great!

DynamoDB only allows you to query by partition key (and range key), or to query by indexes.
If you have different tables, you cannot do a join query. You might need to create a global secondary index and then do a query on that.
So, for instance, if your Product table had a global secondary index over a field called "order_id", you could do:
const AWS = require('aws-sdk');

const documentClient = new AWS.DynamoDB.DocumentClient();
const orderId = 1234; // the real order id
const options = {
  TableName: 'Product',
  IndexName: 'OrderIdIndex',
  KeyConditionExpression: 'order_id = :order_id',
  ExpressionAttributeValues: {
    ':order_id': orderId
  }
};
const response = await documentClient.query(options).promise();
Keep in mind that this example modifies your original structure: you might need to add that new index and attribute.
Edit
Keep in mind that there might be some delay for index propagation. For example, if you insert a new Product and immediately search by order_id using the index, DynamoDB might tell you that there is no product (because it's still propagating the data). If that small delay is not acceptable, you might prefer to first query the Order, and then query each product by id (you could use batchGet if needed).
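A minimal sketch of that alternative, assuming the data above lives in tables named 'Order' and 'Product', each keyed by id (the table and key names are assumptions):

const AWS = require('aws-sdk');
const documentClient = new AWS.DynamoDB.DocumentClient();

async function getOrderProducts(orderId) {
  // 1. Read the order itself from the base table (strongly consistent, no index delay).
  const { Item: order } = await documentClient.get({
    TableName: 'Order',
    Key: { id: orderId },
    ConsistentRead: true
  }).promise();

  // 2. Batch-fetch every product referenced by the order (max 100 keys per call).
  const { Responses } = await documentClient.batchGet({
    RequestItems: {
      Product: {
        Keys: order.products.map((p) => ({ id: p.product.id }))
      }
    }
  }).promise();

  return Responses.Product;
}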

You do not do relationship queries in Dynamo. It is not a Relational Database, it is a document database.
This means, most importantly, that your normal way of storing data in multiple tables, usually keyed by some unique auto-incremented identifier as in SQL, is a terrible way to do it in Dynamo.
Instead, you need to store your data based on your access patterns - and this may feel very weird coming from SQL! You may even feel like you are duplicating data at times.
Since a Dynamo query requires you to know the partition key in order to query (you cannot do a search or a conditional on the PK), the PK needs to be whatever you have at the start of your query.
So with your access pattern described, your PK must be the user. Then a separate entry for each item in their cart would be the way to proceed - basically something like:
(EDIT: you can switch User for OrderID very easily too of course)
PK: User
SK: ITEM#123456123
PK: User
SK: ITEM#123491239
PK: User
SK: ITEM#113322
and maybe even a
PK: User
SK: META
with attributes like "total items" or "login time" or "sales offered" or whatever else needs to be tracked.
Then if you query against the PK of User, you get back a list of all their items. If they remove an item, you remove the SK document associated with that item. If they increase the amount, you increase that item's quantity attribute. Etc.
This is in effect a one-to-many relationship: One (the PK of User) and Many (SKs prefixed with ITEM#) - you can then do a query of PK=User, SK begins_with ITEM# to retrieve all the items of a user, as sketched below.
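A minimal sketch of that query with the DocumentClient, assuming a single table named 'Cart' and generic PK/SK attribute names (all names here are assumptions):

const AWS = require('aws-sdk');
const documentClient = new AWS.DynamoDB.DocumentClient();

async function getCartItems(userId) {
  const { Items } = await documentClient.query({
    TableName: 'Cart', // assumed table name
    KeyConditionExpression: 'PK = :pk AND begins_with(SK, :prefix)',
    ExpressionAttributeValues: {
      ':pk': userId,     // the user's partition key value
      ':prefix': 'ITEM#' // only the item entries, not the META entry
    }
  }).promise();
  return Items;
}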
But as you may be able to see, this can get very complex very fast if you are trying to model many different relationships - Dynamo is not built for that. If you need to do anything deeper than a single relationship like this, or need to be able to dynamically decide the relationships/queries at run time, then Dynamo is not the solution; SQL is.

Related

Proper Sequelize flow to avoid duplicate rows?

I am using Sequelize in my Node.js server. I am ending up with validation errors because my code tries to write the record twice instead of creating it once and then updating it, since it's already in the DB (PostgreSQL).
This is the flow I use when the request runs:
const latitude = req.body.latitude;
var metrics = await models.user_car_metrics.findOne({ where: { user_id: userId, car_id: carId } });
if (metrics) {
  metrics.latitude = latitude;
  // .....
} else {
  metrics = models.user_car_metrics.build({
    user_id: userId,
    car_id: carId,
    latitude: latitude
    // ....
  });
}
var savedMetrics = await metrics.save();
return res.status(201).json(savedMetrics);
At times, if the client calls the endpoint twice or more in quick succession, the endpoint above tries to save two new rows in user_car_metrics with the same user_id and car_id, both FKs on the user and car tables.
I have a constraint:
ALTER TABLE user_car_metrics DROP CONSTRAINT IF EXISTS user_id_car_id_unique, ADD CONSTRAINT user_id_car_id_unique UNIQUE (car_id, user_id);
Point is, there can only be one entry for a given user_id and car_id pair.
Because of that, I started seeing validation issues. After looking into it and adding logs, I realized the code above adds duplicates to the table (without the constraint). If the constraint is there, I get validation errors when the code above tries to insert the duplicate record.
Question is, how do I avoid this problem? How do I structure the code so that it won't try to create duplicate records? Is there a way to serialize this?
If you have a unique constraint then you can use upsert to either insert or update the record depending on whether you have a record with the same primary key value or column values that are in the unique constraint.
await models.user_car_metrics.upsert({
  user_id: userId,
  car_id: carId,
  latitude: latitude
  // ....
})
See upsert
PostgreSQL - Implemented with ON CONFLICT DO UPDATE. If update data contains PK field, then PK is selected as the default conflict key. Otherwise, first unique constraint/index will be selected, which can satisfy conflict key requirements.

One to many relation in Dynamodb Node js (Dynamoose)

I am using DynamoDB with Node.js for my reservation system, and Dynamoose as the ORM. I have two tables, i.e. Table and Reservation. To create a relation between them, I have added a tableId attribute in Reservation which is of Model type (of type Table), as mentioned in the Dynamoose docs. Using document.populate I am able to get the Table data through the tableId attribute from the Reservation table. But how can I retrieve all Reservations for a Table? (Reservation and Table have a one-to-many relation.)
These are my Models:
Table Model:
const tableSchema = new Schema({
  tableId: {
    type: String,
    required: true,
    unique: true,
    hashKey: true
  },
  name: {
    type: String,
    default: null
  },
});
Reservation Model:
const reservationSchema = new Schema({
  id: {
    type: Number,
    required: true,
    unique: true,
    hashKey: true
  },
  tableId: table, // as per docs, attribute of Table (Model) type
  date: {
    type: String
  }
});
This is how I retrieve the Table data from the Reservation model:
reservationModel.scan().exec()
  .then(posts => {
    return posts.populate({
      path: 'tableId',
      model: 'Space'
    });
  })
  .then(populatedPosts => {
    console.log('pp', populatedPosts);
    return {
      allData: {
        message: "Executed successfully",
        data: populatedPosts
      }
    };
  })
Can anyone please help me retrieve all Reservation data for a Table?
As of v2.8.2, Dynamoose does not support this. Dynamoose is focused on one directional simple relationships. This is partly due to the fact that we discourage use of model.populate. It is important to note that model.populate does another completely separate request to DynamoDB. This increases the latency and decreases the performance of your application.
DynamoDB truly requires a shift in how you think about modeling your data compared to SQL. I recommend watching AWS re:Invent 2019: Data modeling with Amazon DynamoDB (CMY304) for a great explanation of how you can model your data in DynamoDB in a highly efficient manner.
At some point Dynamoose might add support for this, but it's really hard to say if we will.
If you truly want to do this, I'd recommend adding a global secondary index to the tableId property in your reservation schema. Then you can run something like the following:
async function code(id) {
  const table = await tableModel.get(id);
  const reservations = await reservationModel.query("tableId").eq(id).exec(); // This will be an array of `reservation` entries where `"tableId" = id`. Remember, it is required that you add an index on tableId for this to work.
}
Remember, this will cause multiple calls to DynamoDB and isn't as efficient. I'd highly recommend watching the video I linked above to get more information about how to model your data in a more efficient manner.
Finally, I'd like to point out that your unique: true code does nothing. As seen in the Dynamoose Attribute Settings Documentation, unique is not a valid setting. In your case, since you don't have a rangeKey, it's not possible for two items to have the same hashKey, so it's technically already a unique property based on that. However, it is important to note that you can overwrite existing items when creating an item. You can set overwrite to false for document.save or Model.create to prevent that behavior and throw an error instead of overwriting your document.
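For instance, a minimal sketch of Model.create with overwrite disabled (the attribute values and the myTable document are made up):

// With overwrite set to false, create throws if an item with the same
// hashKey already exists instead of silently replacing it.
await reservationModel.create(
  { id: 1, tableId: myTable, date: "2021-01-01" },
  { overwrite: false }
);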

DynamoDB: Query to find an item in an array of strings

It's possible that I'm not quite understanding how hash/primary keys work in DynamoDB, but I'm trying to create a model (using Serverless + Dynogels/NodeJS) for a messaging service.
The model looks like this:
const ConversationORM = dynogels.define('Conversation', {
  hashKey: 'id',
  timestamps: true,
  tableName: config.CONVERSATION_TABLE,
  schema: {
    id: Joi.string(),
    users: Joi.array(), // e.g. ['foo', 'bar', 'moo']
    messages: Joi.array()
  }
})
As you can see, users is an array, which lists the userIds of the conversation's participants.
I need to create a service which finds all conversations that a user is participating in. In MongoDB (which I'm far more familiar with), I'd do something like:
Conversation.find({users: {"$in": ['foo']} }).then(....
Is there something equivalent I can do in DynamoDB? This is an API call that will happen quite often so I'm hoping to make it as efficient as possible.
This answer takes into account a comment on Hunter Frazier's answer saying you don't want to use a Scan.
When using a Query you need to specify a single partition key in the operation. In your schema this would mean partitioning on the users attribute, which is an array. Partition keys in DynamoDB must be a top-level scalar attribute. As users is not scalar (it's an array), you cannot use this attribute in an index, and therefore you cannot do a Query for the conversations a user is part of.
If you need to do this Query, I would suggest revisiting your schema. Specifically I would suggest implementing the Adjacency list pattern which works well in databases containing many-to-many relationships.
You can see some additional notes I have written on the article above in this answer: DynamoDB M-M Adjacency List Design Pattern
In your case you would have:
Primary Key: ConversationID
Sort Key: UserID
GSI Primary Key: UserID
You can then use the GSI Primary key in a Query to return all conversations the user is part of.
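A minimal sketch of that GSI query with the DocumentClient (the index name is an assumption; the key names follow the design above):

const AWS = require('aws-sdk');
const documentClient = new AWS.DynamoDB.DocumentClient();

async function getConversationsForUser(userId) {
  const { Items } = await documentClient.query({
    TableName: 'Conversations',
    IndexName: 'UserIdIndex', // assumed GSI name, partition key: UserID
    KeyConditionExpression: 'UserID = :u',
    ExpressionAttributeValues: { ':u': userId }
  }).promise();
  return Items; // one item per (ConversationID, UserID) pair
}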
I'm not familiar with Dynogels or Serverless but if it uses the regular API this might work:
var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB();

var params = {
  ExpressionAttributeNames: {
    "#U": "users"
  },
  ExpressionAttributeValues: {
    ":a": {
      S: "John Doe"
    }
  },
  FilterExpression: "contains(#U, :a)",
  ProjectionExpression: "#U",
  TableName: "Conversations"
};
dynamodb.scan(params, function (err, data) {
  if (err) console.log(err, err.stack);
  else console.log(data);
});

Mongoose: Bulk upsert but only update records if they meet certain criteria

I am designing an item inventory system for a website that I am building.
The user's inventory is loaded from a Web API. This information is then processed so that it is more suited to my web app. I am trying to combine all the item records into one MongoDB collection - so other user inventories will be cached in the same place. What I have to deal with is deleting old item records if they are missing from the user's inventory (i.e. they sold it to someone) and also upserting the new items. Please note I have looked through several Stack Overflow questions about bulk upserts but I was unable to find anything about conditional updates.
Each item has two unique identifiers (classId and instanceId) that allow me to look them up (I have to use both IDs to match it) which remain constant. Some information about the item, such as its name, can change and therefore I want to be able to update those records when I fetch new inventory information. I also want new items that my site hasn't seen before to be added to my database.
Once the data returned from the Web API has been processed, it is left in a large array of objects. This means I am able to use bulk writing; however, I am unaware of how to upsert multiple records with conditions.
Here is part of my item schema:
const ItemSchema = new mongoose.Schema({
  ownerId: {
    type: String,
    required: true
  },
  classId: {
    type: String,
    required: true
  },
  instanceId: {
    type: String,
    required: true
  },
  name: {
    type: String,
    required: true
  }
  // rest of item attributes...
});
User inventories typically contain 600 or more items, with a max count of 2500.
What is the most efficient way of upserting this much data? Thank you
Update:
I have had trouble implementing the solution to the bulk insert problem. I made a few assumptions and I don't know if they were right. I interpreted _ as lodash, response.body as the JSON returned by the API, and myListOfItems as that same array of items.
import Item from "../models/item.model";
import _ from 'lodash';

async function storeInventory(items) {
  let bulkUpdate = Item.collection.initializeUnorderedBulkOp();
  _.forEach(items, (data) => {
    if (data !== null) {
      let newItem = new Item(data);
      bulkUpdate.find({
        classId: newItem.classId,
        instanceId: newItem.instanceId
      }).upsert().updateOne(newItem);
      items.push(newItem);
    }
  });
  await bulkUpdate.execute();
}
Whenever I run this code, it throws an error complaining about an _id field being changed, even though the objects I created don't specify anything to do with _id, and the few nested schema objects don't make a difference to the outcome when I change them to plain objects.
I understand that if no _id is sent to MongoDB it auto-generates one, but it wouldn't do that when updating a record anyway. I also tried setting _id to null on each item, but to no avail.
Have I misunderstood anything about the accepted answer? Or is my problem elsewhere in my code?
This is how I do it:
let bulkUpdate = MyModel.collection.initializeUnorderedBulkOp();
// myItems is your array of items
_.forEach(myItems, (item) => {
  if (item !== null) {
    let newItem = new MyModel(item);
    bulkUpdate.find({ yyy: newItem.yyy }).upsert().updateOne(newItem);
  }
});
await bulkUpdate.execute();
I think the code is pretty readable and understandable. You can adjust it to make it work with your case :)
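Adapted to the compound key in the question, and stripping the auto-generated _id before the update (a common cause of the "_id field changed" error), a sketch might look like this (the toObject call and the $set wrapper are assumptions, not part of the original answer):

async function storeInventory(items) {
  const bulkUpdate = Item.collection.initializeUnorderedBulkOp();
  items.forEach((data) => {
    if (data !== null) {
      // Convert the mongoose document to a plain object and drop its
      // freshly generated _id, so the upsert never tries to change the
      // _id of an existing record.
      const newItem = new Item(data).toObject();
      delete newItem._id;
      bulkUpdate.find({
        classId: newItem.classId,
        instanceId: newItem.instanceId
      }).upsert().updateOne({ $set: newItem });
    }
  });
  return bulkUpdate.execute();
}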

Mongoose: How to populate 2 level deep population without populating fields of first level? in mongodb

Here is my Mongoose Schema:
var SchemaA = new Schema({
  field1: String,
  // .......
  fieldB: { type: Schema.Types.ObjectId, ref: 'SchemaB' }
});
var SchemaB = new Schema({
  field1: String,
  // .......
  fieldC: { type: Schema.Types.ObjectId, ref: 'SchemaC' }
});
var SchemaC = new Schema({
  field1: String,
  // .......
  // .......
  // .......
});
While I access SchemaA using a find query, I want to have the fields/properties of SchemaA along with SchemaB and SchemaC, in the same way as we apply a join operation in a SQL database.
This is my approach:
SchemaA.find({})
  .populate('fieldB')
  .exec(function (err, result) {
    SchemaB.populate(result.fieldC, { path: 'fieldB' }, function (err, result) {
      // .............................
    });
  });
The above code is working perfectly, but the problem is:
I want to have the information/properties/fields of SchemaC through SchemaA, and I don't want to populate the fields/properties of SchemaB.
The reason for not wanting the properties of SchemaB is that the extra population slows the query unnecessarily.
Long story short:
I want to populate SchemaC through SchemaA without populating SchemaB.
Can you please suggest any way/approach?
As an avid mongodb fan, I suggest you use a relational database for highly relational data - that's what it's built for. You are losing all the benefits of mongodb when you have to perform 3+ queries to get a single object.
Buuuuuut, I know that comment will fall on deaf ears. Your best bet is to be as conscious as you can about performance. Your first step is to limit the fields to the minimum required. This is just good practice even with basic queries and any database engine - only get the fields you need (e.g. SELECT * FROM === bad... just stop doing it!). You can also try doing lean queries to help save a lot of the post-processing work mongoose does with the data. I didn't test this, but it should work...
SchemaA.find({}, 'field1 fieldB', { lean: true })
  .populate({
    path: 'fieldB',
    select: 'fieldC',
    options: { lean: true }
  }).exec(function (err, result) {
    // not sure how you are populating "result" in your example, as it should be an array,
    // but you said your code works... so I'll let you figure out what goes here.
  });
Also, a very "mongo" way of doing what you want is to save a reference in SchemaC back to SchemaA. When I say "mongo" way of doing it, you have to break away from your years of thinking about relational data queries. Do whatever it takes to perform fewer queries on the database, even if it requires two-way references and/or data duplication.
For example, if I had a Book schema and an Author schema, I would likely save the author's first and last name in the Books collection, along with an _id reference to the full profile in the Authors collection. That way I can load my Books in a single query, still display the author's name, and then generate a hyperlink to the author's profile: /author/{_id}. This is known as "data denormalization", and it has been known to give people heartburn. I try to use it on data that doesn't change very often - like people's names. On the occasion that a name does change, it's trivial to write a function to update all the names in multiple places.
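A sketch of that denormalized Book schema (the field names are illustrative):

var AuthorSchema = new Schema({
  firstName: String,
  lastName: String,
  bio: String
});

var BookSchema = new Schema({
  title: String,
  // Denormalized: the author's name lives on the book itself, plus a
  // reference back to the full profile for /author/{_id} hyperlinks.
  author: {
    _id: { type: Schema.Types.ObjectId, ref: 'Author' },
    firstName: String,
    lastName: String
  }
});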
SchemaA.find({})
  .populate({
    path: "fieldB",
    populate: { path: "fieldC" }
  }).exec(function (err, result) {
    // this is how you can get all key/value pairs of SchemaA, SchemaB and SchemaC
    // example: result.fieldB.fieldC._id (key of SchemaC)
  });
Why not add a ref to SchemaC on SchemaA? The way you currently have it, there is no way to bridge to SchemaC from SchemaA without SchemaB, unless you populate SchemaB with no other data than a ref to SchemaC.
As explained in the docs under Field Selection, you can restrict what fields are returned.
.populate('fieldB') becomes populate('fieldB', 'fieldC -_id'). The -_id is required to omit the _id field, just like when using select().
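Written out as a complete call, that suggestion would look something like this (an untested sketch):

SchemaA.find({})
  .populate('fieldB', 'fieldC -_id') // fetch only the fieldC ref from SchemaB
  .exec(function (err, result) {
    // each result's fieldB now carries just the reference to SchemaC
  });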
I think this is not possible. Because, when a document in A refers to a document in B and that document refers to another document in C, how can the document in A know which document to refer to from C without any help from B?
