I am using DynamoDB with Node.js for my reservation system, and Dynamoose as the ORM. I have two tables, Table and Reservation. To create a relation between them, I added a tableId attribute to Reservation which is of Model type (the Table model), as mentioned in the Dynamoose docs. Using document.populate I am able to get the Table data through the tableId attribute on Reservation. But how can I retrieve all Reservations for a Table? (Reservation and Table have a one-to-many relation.)
These are my models:
Table Model:
const tableSchema = new Schema({
  tableId: {
    type: String,
    required: true,
    unique: true,
    hashKey: true
  },
  name: {
    type: String,
    default: null
  },
});
Reservation Model:
const reservationSchema = new Schema({
  id: {
    type: Number,
    required: true,
    unique: true,
    hashKey: true
  },
  tableId: table, // as per the docs, an attribute of Table (Model) type
  date: {
    type: String
  }
});
This is how I retrieve the Table data from the Reservation model:
reservationModel.scan().exec()
  .then(posts => {
    return posts.populate({
      path: 'tableId',
      model: 'Space'
    });
  })
  .then(populatedPosts => {
    console.log('pp', populatedPosts);
    return {
      allData: {
        message: "Executed successfully",
        data: populatedPosts
      }
    };
  });
Can anyone please help me retrieve all Reservation data for a Table?
As of v2.8.2, Dynamoose does not support this. Dynamoose is focused on one directional simple relationships. This is partly due to the fact that we discourage use of model.populate. It is important to note that model.populate does another completely separate request to DynamoDB. This increases the latency and decreases the performance of your application.
DynamoDB truly requires a shift in how you think about modeling your data compared to SQL. I recommend watching AWS re:Invent 2019: Data modeling with Amazon DynamoDB (CMY304) for a great explanation of how you can model your data in DynamoDB in a highly efficient manner.
At some point Dynamoose might add support for this, but it's really hard to say if we will.
If you truly want to do this, I'd recommend adding a global secondary index to the tableId property in your reservation schema. Then you can run something like the following:
async function code(id) {
  const reservation = await reservationModel.get(id);
  const tables = await tableModel.query("tableId").eq(id).exec(); // This will be an array of `table` entries where `"tableId" = id`. Remember, it is required you add an index for this to work.
}
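For reference, here is a rough sketch of what declaring that global secondary index in the reservation schema might look like in Dynamoose v2. The index name is illustrative, and it assumes tableId is stored as a plain string rather than a Model-type attribute:

const reservationSchema = new Schema({
  id: {
    type: Number,
    hashKey: true
  },
  tableId: {
    type: String,
    index: {
      name: "tableIdIndex", // illustrative index name
      global: true          // global secondary index so you can query by tableId
    }
  },
  date: {
    type: String
  }
});

With that index in place, something like reservationModel.query("tableId").eq(someTableId).exec() should then return all reservations for a given table.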
Remember, this will cause multiple calls to DynamoDB and isn't as efficient. I'd highly recommend watching the video linked above to get more information about how to model your data in a more efficient manner.
Finally, I'd like to point out that your unique: true code does nothing. As seen in the Dynamoose Attribute Settings documentation, unique is not a valid setting. In your case, since you don't have a rangeKey, it's not possible for two items to have the same hashKey, so the property is effectively unique already. However, it is important to note that you can overwrite existing items when creating an item. You can set overwrite to false for document.save or Model.create to prevent that behavior and throw an error instead of overwriting your document.
Related
I am new to MongoDB and currently doing a project where MongoDB is my primary database management system, with Mongoose as the object data modeling layer. Suppose I have two collections called products and features, and each product may have multiple features, i.e. a one-to-many relationship.
// products schema
const products = mongoose.Schema({
  id: Number,
  name: String,
  description: String,
  category: String,
  price: Number,
}, {
  strict: false,
});

// features schema
const features = mongoose.Schema({
  id: Number,
  product_id: Number,
  feature: String,
  value: String,
}, {
  strict: false,
});
I have imported documents/records for both collections from external .csv files, and the number of records in each collection is more than 3 million. My client-side application requires data about a particular product with all of its features, like below:
{
  productId: 3,
  name: 'Denim Jeans',
  description: '',
  category: 'Cloth',
  price: 40.00,
  features: [
    {
      feature: 'Material',
      value: 'Cotton',
    },
    {
      feature: 'color',
      value: 'blue',
    },
    ....
  ]
}
Each product will not have more than 5-6 features. So, what I wanted to do is embed the features documents as subdocuments in the products document, like the response above. I wrote a piece of code like this. It's not the exact same code, as I deleted it when it wasn't working, but the logic is the same.
db.products.find({}, (err, product) => {
  // product -> array of all documents from the products collection
  // for each product, I am trying to find the corresponding features from
  // the features collection and embed them in each product document
  product.forEach(item => {
    db.features.find({ product_id: item.id }, (err, feature) => {
      // feature -> array of all the features of a product
      // embed the feature array in each individual product item
      item.features = feature;
    })
  })
})
Now, the issue is that when I run the above piece of code, I get errors like OutOfMemory, since it is trying to read millions of records and my memory cannot hold all of them. My question is: what is the best way to retrieve all the products and, for each individual product, query its corresponding features and embed them inside that product document?
I have a couple of ideas; kindly correct me if I am wrong. Instead of holding all the products and their features in memory, I could store them on disk and update the individual products using MongoDB's Bulk API, but in that case how would I achieve this, and I am concerned about the performance. What is the best practice to follow here? Or should I keep them in separate collections, make two queries from the application server, and package the response there? Or should I use some kind of aggregation pipeline at the database level? Thanks in advance.
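For what it's worth, a minimal sketch of the aggregation-pipeline idea mentioned above might look like this. The Product model name and the productId variable are illustrative, the field names come from the schemas above, and it assumes an index exists on features.product_id:

// Let MongoDB join the features into a single product on demand,
// instead of loading millions of records into application memory.
const result = await Product.aggregate([
  { $match: { id: productId } },        // one product per request
  {
    $lookup: {
      from: 'features',                 // the features collection
      localField: 'id',
      foreignField: 'product_id',
      as: 'features'
    }
  },
  { $project: { 'features._id': 0, 'features.product_id': 0 } } // trim the embedded docs
]);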
I have an app that allows users to use their own custom data, so I can't know what the data is. However, I do want to allow them to sort the data.
This can be a significant amount of data, and MongoDB ends up giving me memory errors (32 MB limit).
What would be the best way to approach this? How can I allow the user to sort a large amount of data by an unknown field?
MongoDB allows you to design the schema in such a way that it can store objects and object relations, so you can allow the user to store any kind of information. As @kevinadi said, there is a limit of 32 MB. As far as sorting is concerned, it can be done on your server side.
This is an example I tried when storing objects in MongoDB with the Mongoose ORM:
var mongoose = require("mongoose");

var userSchema = new mongoose.Schema({
  email: {
    type: String,
    unique: true,
    required: true,
    lowercase: true,
    trim: true,
    match: [/^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/, "Please fill a valid email address"]
  },
  custInfo: {
    type: Object,
    required: true
  },
  isConfirmed: {
    type: Boolean,
    required: true,
    default: false
  },
  confirmedOn: {
    type: Date,
    required: true,
    default: Date.now()
  }
});
module.exports = mongoose.model("user",userSchema);
Since you have tagged this question Meteor, I assume you have the default Meteor environment, where you can use the client-side lightweight Mongo collections.
This gives you the opportunity to publish (Publication) / return (Method) your data mostly unsorted and let the clients handle this task.
Think about this: just 100 clients asking for a publication that updates on every sort action (because the subscription parameters change, so the publication changes, too).
This already causes your server to consume a high amount of RAM to keep the observers (OPLOG etc.) running for 100 publications, each querying huge amounts of documents.
Possible performant solutions are described below. Please keep in mind that they are not bound to any front-end and are rather a conceptual description. You will have to include reactivity etc., based on your front-end environment.
Option A - Publish unsorted, let clients sort
server
Meteor.publish('hugeData', function () {
  return MyCollection.find({ ... })
})
client
const handle = Meteor.subscribe('hugeData')

if (handle.ready()) {
  const sortedData = MyCollection.find({ ... }, { sort: { someField: -1 } })
}
A big plus here is that you can inform the clients about the completeness status if you use cursor.observeChanges.
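For example, a rough sketch of that idea (names are illustrative):

// Count documents as they arrive on the client so the UI can show progress.
let receivedCount = 0

const observer = MyCollection.find({ /* same selector as the publication */ }).observeChanges({
  added() {
    receivedCount += 1
    // update a reactive var / progress indicator here
  }
})

// stop the observer when the view is torn down:
// observer.stop()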
Note that if you want to scan backwards (return the newest docs first) you can use the hint option on find:
Meteor.publish('hugeData', function () {
  return MyCollection.find({ ... }, { hint: { $natural: -1 } })
})
This is way more performant than { sort: { fieldName: -1} }.
Option B - return unsorted from Method, let clients sort
Now there may still be a problem with Option A, since it can still consume a lot of RAM if there are lots of subscribers. An alternative (especially if live data changes are not so relevant) is to use Meteor Methods:
server
Meteor.methods({
  hugeData() {
    return MyCollection.find({ ... }).fetch()
  }
})
Note that this requires fetching the docs, otherwise an unhandledPromiseRejection is thrown.
client
This requires a local collection on the client that is not in sync with your server-side collection, or you will get problems with document syncing:
const HugeData = new Mongo.Collection(null) // note the null as collection name!

const insertUpdate = document => {
  if (HugeData.findOne(document._id)) {
    const _id = document._id
    delete document._id
    return HugeData.update(_id, { $set: document })
  } else {
    return HugeData.insert(document)
  }
}

Meteor.call('hugeData', (err, data) => {
  data.forEach(insertUpdate)
})
Then you can use the local collection on the client for any projection of the received data.
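For instance, sorting by whatever field the user picked then becomes a simple local query (fieldName here is a hypothetical value coming from the UI):

const sorted = HugeData.find({}, { sort: { [fieldName]: -1 } }).fetch()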
All in all, it is a good tradeoff to move the load to the clients. As long as you keep them informed when projections take a while, it should be okay.
My current thought is an additional indexed collection holding (1) the entity id, (2) the field name, and (3) the field value.
Have that collection indexed, then pull ordered entity ids from there and later load the full relevant documents by id.
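A rough sketch of that idea in Meteor/Mongo terms (SortIndex, MyCollection, and chosenField are illustrative names):

// Side collection holding one row per (entity, field) pair: { entityId, fieldName, fieldValue }
const SortIndex = new Mongo.Collection('sortIndex')

// server: index it so ordered reads are cheap
Meteor.startup(() => {
  SortIndex.rawCollection().createIndex({ fieldName: 1, fieldValue: 1 })
})

// pull ordered entity ids for the field the user chose...
const ids = SortIndex.find(
  { fieldName: chosenField },
  { sort: { fieldValue: 1 }, fields: { entityId: 1 } }
).map(doc => doc.entityId)

// ...then load the full documents by id (re-order by ids on the client if needed)
const docs = MyCollection.find({ _id: { $in: ids } }).fetch()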
I have a node-express application that currently uses Mongoose to connect to MongoDB, and am attempting to migrate it to Azure Cosmos DB.
When I simply allow Mongoose to create the database, the application works fine, however the database is created with individual collection RU pricing.
If I create a new database with Shared throughput enabled and attempt to use that, I get the error Shared throughput collection should have a partition key
I have tried updating the collection schema to include a shard key like this:
const mongoose = require('mongoose');

module.exports = function() {
  const types = mongoose.Schema.Types;

  const messages = new mongoose.Schema({
    order: { type: types.ObjectId, required: true, ref: 'orders' },
    createdAt: { type: Date, default: Date.now },
    sender: { type: types.ObjectId, required: true, ref: 'users' },
    recipient: { type: types.ObjectId, ref: 'users' },
    text: { type: String, required: true },
    seen: { type: Boolean, default: false },
  }, { shardKey: { order: 1 } });

  return mongoose.model('messages', messages);
};
However this does not work.
Any ideas on how to create/use a partition key? Alternatively, the database is small, so if it's possible to remove the requirement for the partition key, that would also be fine.
Now, I don't have an exact answer for this question, so no need to accept this unless you feel it's correct.
The best solution I've found so far is that this is due to "Provision Throughput" being checked when the database is created in the Azure portal. If you delete and recreate the database with this box unchecked (it's right below the input for the database name), you should no longer encounter this error.
You specify the partition key when you're creating a collection in a database that you've opted in to Shared Throughput for.
Collection vs Database
If you're using individual collection pricing, you can set the throughput on each collection. If you're using the cheaper option, you get shared throughput (at the database level), which is less granular but less expensive.
Details here: https://azure.microsoft.com/en-us/blog/sharing-provisioned-throughput-across-multiple-containers-in-azure-cosmosdb/
Partition keys
If you're using shared throughput, you'll need a partition key for the collection that you're adding.
So: create a DB with Shared throughput (check the corresponding checkbox when creating the database).
After that, when you're adding a new collection, you should be able to create a partition key.
I have yet another not-quite-complete answer for you. It seems like, yes, it is required to use partitioned collections if you are using the shared/db-level throughput model in Cosmos DB. But it turns out it is possible to create a Cosmos DB collection with a partition key using only the MongoDB wire protocol (meaning no dependency on an Azure SDK, and no need to pre-create every collection via the Azure Portal).
The only remaining catch is that I don't think it's possible to run this command via Mongoose; it will probably have to be run directly via the MongoDB Node.js driver, but at least it can still be run from code.
From a MongoDb Shell:
db.runCommand({
  shardCollection: "myDbName.nameOfCollectionToCreate",
  key: { nameOfDesiredPartitionKey: "hashed" }
})
This command is meant to set the sharding key for a collection and start sharding it, but in Cosmos DB it works to create the collection with the desired partition key already set.
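A rough sketch of running that same command from the MongoDB Node.js driver (the connection string, database, collection, and key names are illustrative):

const { MongoClient } = require('mongodb');

async function createPartitionedCollection() {
  const client = await MongoClient.connect(process.env.COSMOSDB_CONNSTR);
  try {
    // Same wire-protocol command as in the shell example above.
    await client.db('myDbName').command({
      shardCollection: 'myDbName.nameOfCollectionToCreate',
      key: { nameOfDesiredPartitionKey: 'hashed' }
    });
  } finally {
    await client.close();
  }
}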
I have an even slightly more complete answer: you actually can do it with Mongoose. I usually do it like this in an Azure Function:
mongoose.connect(process.env.COSMOSDB_CONNSTR, {
  useUnifiedTopology: true,
  useNewUrlParser: true,
  auth: {
    user: process.env.COSMODDB_USER,
    password: process.env.COSMOSDB_PASSWORD,
  },
})
  .then(() => {
    mongoose.connection.db.admin().command({
      shardCollection: "mydb.mycollection",
      key: { _id: "hashed" }
    })
    console.log('Connection to CosmosDB successful 🚀')
  })
  .catch((err) => console.error(err))
I am designing an item inventory system for a website that I am building.
The user's inventory is loaded from a Web API. This information is then processed so that it is more suited to my web app. I am trying to combine all the item records into one MongoDB collection - so other user inventories will be cached in the same place. What I have to deal with is deleting old item records if they are missing from the user's inventory (i.e. they sold it to someone) and also upserting the new items. Please note I have looked through several Stack Overflow questions about bulk upserts but I was unable to find anything about conditional updates.
Each item has two unique identifiers (classId and instanceId) that allow me to look them up (I have to use both IDs to match it) which remain constant. Some information about the item, such as its name, can change and therefore I want to be able to update those records when I fetch new inventory information. I also want new items that my site hasn't seen before to be added to my database.
Once the data returned from the Web API has been processed, it is left in a large array of objects. This means I am able to use bulk writing; however, I am unsure how to upsert multiple records with conditions.
Here is part of my item schema:
const ItemSchema = new mongoose.Schema({
  ownerId: {
    type: String,
    required: true
  },
  classId: {
    type: String,
    required: true
  },
  instanceId: {
    type: String,
    required: true
  },
  name: {
    type: String,
    required: true
  }
  // rest of item attributes...
});
User inventories typically contain 600 or more items, with a max count of 2500.
What is the most efficient way of upserting this much data? Thank you
Update:
I have had trouble implementing the solution to the bulk insert problem. I made a few assumptions and I don't know if they were right. I interpreted _ as lodash, response.body as the JSON returned by the API, and myListOfItems as that same array of items.
import Item from "../models/item.model";
import _ from 'lodash';

async function storeInventory(items) {
  let bulkUpdate = Item.collection.initializeUnorderedBulkOp();

  _.forEach(items, (data) => {
    if (data !== null) {
      let newItem = new Item(data);

      bulkUpdate.find({
        classId: newItem.classId,
        instanceId: newItem.instanceId
      }).upsert().updateOne(newItem);

      items.push(newItem);
    }
  });

  await bulkUpdate.execute();
}
Whenever I run this code, it throws an error complaining that the _id field is being changed, even though the schema objects I created don't specify anything to do with it, and the few nested schema objects make no difference to the outcome when I change them to plain objects.
I understand that if no _id is sent, MongoDB auto-generates one, but if it is updating a record it wouldn't do that anyway. I also tried setting _id to null on each item, but to no avail.
Have I misunderstood anything about the accepted answer? Or is my problem elsewhere in my code?
This is how I do it:
let bulkUpdate = MyModel.collection.initializeUnorderedBulkOp();

// myItems is your array of items
_.forEach(myItems, (item) => {
  if (item !== null) {
    let newItem = new MyModel(item);
    bulkUpdate.find({ yyy: newItem.yyy }).upsert().updateOne(newItem);
  }
});

await bulkUpdate.execute();
I think the code is pretty readable and understandable. You can adjust it to make it work with your case :)
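For the two-key case above, a possible adaptation (a sketch, not tested against the asker's data) would be to match on both classId and instanceId and to strip the Mongoose-generated _id before the upsert, so the immutable _id field is never part of the update:

import _ from 'lodash';
import Item from '../models/item.model';

async function storeInventory(items) {
  const bulk = Item.collection.initializeUnorderedBulkOp();

  _.forEach(items, (data) => {
    if (data !== null) {
      // Convert to a plain object and drop _id so the upsert never tries to modify it.
      const doc = new Item(data).toObject();
      delete doc._id;

      bulk.find({ classId: doc.classId, instanceId: doc.instanceId })
        .upsert()
        .updateOne({ $set: doc });
    }
  });

  await bulk.execute();
}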
Here is my Mongoose Schema:
var SchemaA = new Schema({
  field1: String,
  .......
  fieldB: { type: Schema.Types.ObjectId, ref: 'SchemaB' }
});

var SchemaB = new Schema({
  field1: String,
  .......
  fieldC: { type: Schema.Types.ObjectId, ref: 'SchemaC' }
});

var SchemaC = new Schema({
  field1: String,
  .......
  .......
  .......
});
When I access SchemaA using a find query, I want to have the fields/properties of SchemaA along with SchemaB and SchemaC, in the same way we apply a join operation in a SQL database.
This is my approach:
SchemaA.find({})
  .populate('fieldB')
  .exec(function (err, result) {
    SchemaB.populate(result.fieldC, { path: 'fieldB' }, function (err, result) {
      .............................
    });
  });
The above code is working perfectly, but the problem is: I want to have the information/properties/fields of SchemaC through SchemaA, and I don't want to populate the fields/properties of SchemaB. The reason for not wanting the properties of SchemaB is that the extra population slows the query down unnecessarily.
Long story short:
I want to populate SchemaC through SchemaA without populating SchemaB.
Can you please suggest any way/approach?
As an avid mongodb fan, I suggest you use a relational database for highly relational data - that's what it's built for. You are losing all the benefits of mongodb when you have to perform 3+ queries to get a single object.
Buuuuuut, I know that comment will fall on deaf ears. Your best bet is to be as conscious as you can about performance. Your first step is to limit the fields to the minimum required. This is just good practice even with basic queries and any database engine - only get the fields you need (e.g. SELECT * FROM === bad... just stop doing it!). You can also try doing lean queries to help save a lot of the post-processing work Mongoose does with the data. I didn't test this, but it should work...
SchemaA.find({}, 'field1 fieldB', { lean: true })
  .populate({
    path: 'fieldB',
    select: 'fieldC',
    options: { lean: true }
  })
  .exec(function (err, result) {
    // not sure how you are populating "result" in your example, as it should be an array,
    // but you said your code works... so I'll let you figure out what goes here.
  });
Also, a very "mongo" way of doing what you want is to save a reference in SchemaC back to SchemaA. When I say "mongo" way of doing it, you have to break away from your years of thinking about relational data queries. Do whatever it takes to perform fewer queries on the database, even if it requires two-way references and/or data duplication.
For example, if I had a Book schema and Author schema, I would likely save the authors first and last name in the Books collection, along with an _id reference to the full profile in the Authors collection. That way I can load my Books in a single query, still display the author's name, and then generate a hyperlink to the author's profile: /author/{_id}. This is known as "data denormalization", and it has been known to give people heartburn. I try and use it on data that doesn't change very often - like people's names. In the occasion that a name does change, it's trivial to write a function to update all the names in multiple places.
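As a sketch of that Book/Author idea (the schema and field names are made up for the example):

// Authors keep the full profile...
var AuthorSchema = new Schema({
  firstName: String,
  lastName: String,
  bio: String
});

// ...while Books duplicate just the author's name next to the reference,
// so a book list can be rendered with a single query.
var BookSchema = new Schema({
  title: String,
  author: {
    _id: { type: Schema.Types.ObjectId, ref: 'Author' }, // link to the full profile
    firstName: String,
    lastName: String
  }
});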
SchemaA.find({})
  .populate({
    path: "fieldB",
    populate: { path: "fieldC" }
  })
  .exec(function (err, result) {
    // this is how you can get all key-value pairs of SchemaA, SchemaB and SchemaC
    // example: result.fieldB.fieldC._id (key of SchemaC)
  });
Why not add a ref to SchemaC on SchemaA? The way you currently have it, there is no way to bridge to SchemaC from SchemaA without SchemaB, unless you populate SchemaB with no other data than a ref to SchemaC.
As explained in the docs under Field Selection, you can restrict what fields are returned.
.populate('fieldB') becomes .populate('fieldB', 'fieldC -_id'). The -_id is required to omit the _id field, just like when using select().
I think this is not possible, because when a document in A refers to a document in B, and that document refers to another document in C, how can the document in A know which document to reference in C without any help from B?