I'm implementing query filters in my Node.js application.
My data model has this schema:
"clause": [
  {
    "description": "test1",
    "number": 200
  },
  {
    "description": "test2",
    "number": 201
  },
  {
    "description": "test3",
    "number": 202
  }
],
Basically, I need to pass an array of objects to DynamoDB and find out which records contain the objects I searched for.
I've had success filtering on just one object within the array, like this:
const params: QueryCommandInput = {
  TableName: config.CONTRACT_DB,
  KeyConditionExpression: 'pk = :i',
  FilterExpression: 'contains(#clause, :clause)',
  ExpressionAttributeNames: {
    '#clause': 'clause',
  },
  ExpressionAttributeValues: {
    ':i': `user#${user.id}`,
    ':clause': {
      number: 200,
      description: 'test1',
    },
  },
};
But this only works when I know both number and description; I could not get a result by supplying only one of the properties.
I also have no idea how I would implement a solution where the user enters multiple clauses.
Has anyone had success querying objects inside arrays in DynamoDB? I didn't find anything relevant here.
You cannot search by description alone with the schema you designed. When the attribute is a list, contains only matches a whole element, so an object with just one of the two properties never matches. To query by a value efficiently, that value must be in the partition key (pk) or sort key (sk), so the schema needs to change. I suggest splitting your array so that each element of the array becomes its own item in the table. Then you can set pk to a pointer to the parent object, and set sk to a number if you want to preserve the order of the elements in the array.
Next, create a GSI with the description as its pk; the sk is your choice, depending on your needs. You can then search for an exact match on description by querying the GSI. If you want a partial search (begins_with), set the GSI pk to a fixed value such as "clauseDescription" and the GSI sk to the description.
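As a rough sketch of the layout described above: the attribute names gsi1pk and gsi1sk, the index name descriptionIndex, and the parent key format are all hypothetical and do not come from the question. The helpers only build plain objects (items to write, and parameters shaped like the QueryCommandInput from the question), so nothing here talks to DynamoDB.

```typescript
// One element of the original "clause" array.
interface Clause {
  description: string;
  number: number;
}

// One table item per clause. gsi1pk/gsi1sk are hypothetical attribute
// names for the keys of the GSI.
interface ClauseItem {
  pk: string;     // pointer to the parent object
  sk: string;     // zero-padded position, preserves the array order
  gsi1pk: string; // GSI partition key: the description
  gsi1sk: string; // GSI sort key: here, the parent pointer
  description: string;
  number: number;
}

// Split the array so that each clause becomes its own item.
function toClauseItems(parentKey: string, clauses: Clause[]): ClauseItem[] {
  return clauses.map((clause, index) => ({
    pk: parentKey,
    sk: 'clause#' + ('0000' + index).slice(-4),
    gsi1pk: clause.description,
    gsi1sk: parentKey,
    description: clause.description,
    number: clause.number,
  }));
}

// Query parameters for an exact match on description through the GSI.
function exactDescriptionQuery(tableName: string, description: string) {
  return {
    TableName: tableName,
    IndexName: 'descriptionIndex', // hypothetical index name
    KeyConditionExpression: '#gpk = :d',
    ExpressionAttributeNames: { '#gpk': 'gsi1pk' },
    ExpressionAttributeValues: { ':d': description },
  };
}
```

Each item returned by such a query carries the parent pointer in gsi1sk, which answers the "which record contains this clause" part of the question. Multiple user-entered clauses would then be one query per description, with the parent pointers intersected or merged in application code.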
Summary:
I am building my first large-scale full-stack application (MERN stack), which tries to mimic a large clothing store. Each article of clothing has many 'tags' that represent its features: top/bottom/accessory/shoes/etc., then subcategories (for example, under top there is shirt/outerwear/sweatshirt/etc.), and sub-subcategories within those (for example, under shirt there is blouse/t-shirt/etc.). Each article also has tags for primary colors, hemline, pockets, technical features; the list goes on.
Main question:
How should I best organize the data in MongoDB with Mongoose schemas so that it is quickly searchable when I plan on having 50,000 or more articles? And, out of genuine curiosity, how do large clothing retailers typically design databases to be easily searchable by customers when items have so many identifying features?
Things I have tried or thought of:
On the MongoDB website there is a recommendation to use a tree structure with child references. Here is the link: https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-child-references/ I like this idea, but I read here: https://developer.mongodb.com/article/mongodb-schema-design-best-practices/ that when storing more than a few thousand pieces of data, using ObjectID references is no longer sufficient and could create issues because of data limits.
Further, each clothing item would fall into many different parts of the tree. For example, it could be a blouse, so it would be in the blouse "leaf" of the tree; if it's blue, it would also be in the blue "leaf"; and if it is sustainably sourced, it would fall into that "leaf" as well. Considering this, a tree-like data structure does not seem like the right way to go. It would store the same ObjectID in many different leaves.
My other idea was to store the article information (description, price, and picture) separately from the tagging/hierarchical information. Each tagging object would then have an ObjectID reference to the item. This way I could take advantage of Mongoose's populate method if I wanted to collect that information.
I also created part of the large tree structure as a proof of concept for a design idea I had. This is only for the front end right now, but it also makes for bad lookups, because they would look like taxonomy[0].options[0].options[0].options[0].title to get to 'blouse', which, from what I learned in my classes, doesn't seem like a good way to keep the code readable. This is only a snippet of a very long branching object. I was going to try to turn it into a Mongoose schema, but it's a lot of work and I want to make sure that I do it well.
const taxonomy = [
  {
    title: 'Category',
    selected: false,
    options: [
      {
        title: 'top',
        selected: false,
        options: [
          {
            title: 'Shirt',
            selected: false,
            options: [
              {
                title: 'Blouse',
                selected: false,
              },
              {
                title: 'polo',
                selected: false,
              },
              {
                title: 'button down',
                selected: false,
              },
            ],
          },
          {
            title: 'T-Shirt',
            selected: false,
          },
          {
            title: 'Sweater',
            selected: false,
          },
          {
            title: 'Sweatshirt and hoodie',
            selected: false,
          },
        ],
      },
Moving forward:
I am not looking for a perfect answer, but I am sure that someone has tackled this issue before (all big businesses that sell lots of categorized products have). If someone could just point me in the right direction, for example by giving me some terms to google, some articles to read, or some videos to watch, that would be great.
Thank you for any direction you can provide.
MongoDB is a document-based database. Each record in a collection is a document, and every document should be self-contained (it should contain all the information that you need inside it).
The best practice is to create one collection for each logical whole that you can think of. This matters most when your documents hold a lot of data, because it keeps the design scalable.
For example, you should create collections for: Products, Subproducts, Categories, Items, Providers, Discounts...
Now, when you are creating schemas, instead of building a nested structure, you can store a reference to a document of one collection as a property of a document in another collection.
NOTE: The maximum document size is 16 megabytes.
BAD PRACTICE
Let us first see what would be the bad practice. Consider this structure:
Product = {
  "name": "Product_name",
  "sub_products": [{
      "sub_product_name": "Subproduct_name_1",
      "sub_product_description": "Description",
      "items": [{
          "item_name": "item_name_1",
          "item_description": "Description",
          "discounts": [{
            "discount_name": "Discount_1",
            "percentage": 25
          }]
        },
        {
          "item_name": "item_name_2",
          "item_description": "Description",
          "discounts": [{
              "discount_name": "Discount_1",
              "percentage": 25
            },
            {
              "discount_name": "Discount_2",
              "percentage": 50
            }]
        },
      ]
    },
    ...
  ]
}
Here the product document has a sub_products property, which is an array of sub-products. Each sub-product has items, and each item has discounts. Because everything is nested inside a single document, that document grows with every sub-product, item, and discount, so a large catalog could eventually exceed the maximum document size.
GOOD PRACTICE
Consider this structure:
Product = {
  "name": "Product_name",
  "sub_products": [
    'sub_product_1_id',
    'sub_product_2_id',
    'sub_product_3_id',
    'sub_product_4_id',
    'sub_product_5_id',
    ...
  ]
}
Subproduct = {
  "id": "sub_product_1_id",
  "sub_product_name": "Subproduct_name",
  "sub_product_description": "Description",
  "items": [
    'item_1_id',
    'item_2_id',
    'item_3_id',
    'item_4_id',
    'item_5_id',
    ...
  ]
}
Item = {
  "id": "item_1_id",
  "item_name": "item_name_1",
  "item_description": "Description",
  "discounts": [
    'discount_1_id',
    'discount_2_id',
    'discount_3_id',
    'discount_4_id',
    'discount_5_id',
    ...
  ]
}
Discount = {
  "id": "discount_1_id",
  "discount_name": "Discount_1",
  "percentage": 25
}
Now you have a collection for each logical whole, and you are just storing a reference to a document of one collection as a property of a document in another collection.
This lets you use one of the best features of Mongoose, which is called population. If you store a reference to a document of one collection as a property of a document in another collection, then when you query the database and ask for that path to be populated, Mongoose will replace the references with the actual documents.
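To make the idea concrete without a database, here is a small in-memory sketch of what resolving references amounts to. The ItemCollection type and the populateItems helper are illustrative assumptions, not Mongoose API; in Mongoose the equivalent step is populate() on a path whose schema declares a ref.

```typescript
// A document as stored in the Item collection.
interface Item {
  id: string;
  item_name: string;
}

// A sub-product as stored: it holds only the ids of its items.
interface Subproduct {
  id: string;
  sub_product_name: string;
  items: string[]; // references into the Item collection
}

// The same sub-product after its references have been resolved.
interface PopulatedSubproduct {
  id: string;
  sub_product_name: string;
  items: Item[];
}

// Hypothetical in-memory stand-in for the Item collection, keyed by id.
type ItemCollection = { [id: string]: Item };

// Replace each stored id with the document it points to. This sketch
// skips ids that no longer resolve to a document.
function populateItems(sub: Subproduct, items: ItemCollection): PopulatedSubproduct {
  const resolved: Item[] = [];
  for (const id of sub.items) {
    const item = items[id];
    if (item !== undefined) {
      resolved.push(item);
    }
  }
  return { id: sub.id, sub_product_name: sub.sub_product_name, items: resolved };
}
```

The stored sub-product stays small because it only carries ids; the full item documents are attached at read time, at the price of an extra lookup.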
I am sending JSON from a Node.js API to an Angular app, to be displayed as part of an Angular component. The returned data is a simple list of Mongo ids and name strings. The data shows up in the debugger and in Postman like so:
{
  "httpAllNames": [
    {
      "_id": "5d5c54315be61d26c0b2afb8",
      "campaignTitle": "Make America Zombie Free Again"
    },
    {
      "_id": "5d5c54735be61d26c0b2afba",
      "campaignTitle": "Zmobie Free 2"
    },
    {
      "_id": "5d5d3fb280dead0604fe6f8c",
      "campaignTitle": "Universal Basic Income For All"
    },
    {
      "_id": "5d5eeaee3278d24b10093988",
      "campaignTitle": "Remove All Zombies from the US"
    }
  ]
}
I pass the data from my Campaign Service to the component without any trouble. The code in question for accessing it within the browseCampaigns.component is as follows:
browsingCampaignNames: httpAllNames[];
campaignCountDisplayed: number = 0;
onLoadCampaigns() {
  this.browsCampServ.fetchCampaignsForBrowsing().subscribe(camps => {
    this.browsingCampaignNames = camps;
  });
  this.campaignCountDisplayed = this.browsingCampaignNames.length;
}
What I'm expecting is to have an array of as many items as are within the httpAllNames object, however, Angular is treating that as a single array, with the desired array nested within.
I guess what I'm trying to do is 'unwrap' the outer layer so that my browsingCampaignNames property is able to access it.
I've tried adjusting the output of the API by removing the status code and unwrapping it from a generic httpResponse object. I've also tried this.browsingCampaignNames = camps[0]; and this.browsingCampaignNames = camps['httpAllNames']; as though to try to access the data by index, even though those methods are 'hacky.'
Thank you in advance.
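One possible reading of the problem, sketched under the assumption that the service emits the whole response body (an object with an httpAllNames property) rather than the inner array: unwrap the property before assigning it, and read the length inside the callback, because subscribe runs the callback asynchronously and the line after it executes before any data has arrived. The CampaignName and AllNamesResponse types and the unwrapNames helper are hypothetical names, not part of the original code.

```typescript
// Shape of one entry, taken from the JSON shown above.
interface CampaignName {
  _id: string;
  campaignTitle: string;
}

// Assumed shape of the full response body.
interface AllNamesResponse {
  httpAllNames: CampaignName[];
}

// Pull the inner array out of the wrapper, tolerating a missing property.
function unwrapNames(response: AllNamesResponse): CampaignName[] {
  return Array.isArray(response.httpAllNames) ? response.httpAllNames : [];
}

// Inside the component, the assignment and the count would both move
// into the subscribe callback, roughly:
//
//   this.browsCampServ.fetchCampaignsForBrowsing().subscribe(response => {
//     this.browsingCampaignNames = unwrapNames(response);
//     this.campaignCountDisplayed = this.browsingCampaignNames.length;
//   });
```

With this shape, browsingCampaignNames would be typed as CampaignName[] and the service method would need to be typed as returning the wrapper object.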
I'm building a rest api that allows users to submit and retrieve data produced by surveys: questions are not mandatory and each "submit" could be different from each other. Each submit is a json with data and a "survey id":
{
id: abc123,
surveyid: 123,
array: [],
object: {}
...
}
I have to store this data and allow retrieving and querying.
First approach: go without a schema and put everything in a single collection. It works, but each JSON field is treated as a "String", and making queries on numeric values is problematic.
Second approach: get the question datatypes for each survey, build a Mongoose schema from them, save it in a JSON file, and then keep this file updated.
Something like this, where the entry "schema : {}" represents a Mongoose schema used for inserting and querying/retrieving data.
[
{
"surveyid" : "123",
"schema" : {
"name": "string",
"username" : "string",
"value" : "number",
"start": "date",
"bool" : "boolean",
...
}
},
{ ... }
]
Hoping this is clear, I've some questions:
Right now I have a single collection for all "submits" and everything is treated as a string. Can I use a Mongoose schema, without other modifications, to specify that some fields are numeric (or a date, or whatever)? Is it allowed, and is it even a good idea?
Are there any disadvantages to using an external JSON file? Are Mongoose schemas loaded at run time when requested, or does the service need to be restarted when this file is updated?
How should I store data with a "schema" that could change often?
I hope it's clear!
Thank you!
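A library-free sketch of the second approach, for illustration only: keep a per-survey type map like the one above and cast each incoming string field before saving it. The names FieldType, TypeMap, castValue, and castSubmit are assumptions made up for this example; a Mongoose schema would perform similar casting on its own.

```typescript
// Types allowed in the per-survey map, matching the JSON file above.
type FieldType = 'string' | 'number' | 'date' | 'boolean';
type TypeMap = { [field: string]: FieldType };
type Submit = { [field: string]: unknown };

// Cast a single string value; anything that cannot be cast is kept as-is.
function castValue(value: unknown, type: FieldType): unknown {
  if (typeof value !== 'string' || value.trim() === '') return value;
  switch (type) {
    case 'number': {
      const n = Number(value);
      return isNaN(n) ? value : n;
    }
    case 'date': {
      const d = new Date(value);
      return isNaN(d.getTime()) ? value : d;
    }
    case 'boolean':
      if (value === 'true') return true;
      if (value === 'false') return false;
      return value;
    default:
      return value;
  }
}

// Apply the map to a whole submit. Fields missing from the map are
// copied through untouched, since questions are not mandatory.
function castSubmit(submit: Submit, types: TypeMap): Submit {
  const result: Submit = {};
  for (const field of Object.keys(submit)) {
    const type = types[field];
    result[field] = type ? castValue(submit[field], type) : submit[field];
  }
  return result;
}
```

Because the map is plain data, it can be reloaded from the JSON file whenever that file changes, without redefining anything in code.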
Document Role =
{ "_id": "12345",
  "Name": "Developer"
},
{ "_id": "67890",
  "Name": "Manager"
}
Document Employee =
{ "_id": "00000",
  "Name": "Jack",
  "Roles": [{ "_id": "12345" }, { "_id": "67890" }]
}
I want to select one role and list all the employees that have that role.
How can I do that?
I want to get something like this:
{ "_id": "12345",
  "Name": "Developer",
  "Employees": [{ "_id": "00000" }]
}
Is it possible to use populate to achieve this?
Mongoose .populate() and other methods you might find are not "join magic" for MongoDB. What they all in fact do is execute "additional" queries on the database and "merge" the results "under the hood" for you, as opposed to you doing the work yourself.
So your best option, as long as you can deal with it, is to use "embedding", which keeps the "related" information in the document it is "paired" with, such as for "Roles":
{
  "_id": "12345",
  "name": "Developer",
  "employees": [{ "_id": "00000", "name": "Jack" }]
}
This is simple, but of course it comes at its own cost: you have to deal with the "embedded" entries when "updating" or "reading", as appropriate. It's a single "read" operation, but "updates" may be more costly due to the need to update the embedded information in multiple places and in multiple documents.
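The update cost can be illustrated with plain objects. This is an in-memory sketch, not MongoDB code; renameEmbeddedEmployee is a hypothetical helper that shows how renaming one employee has to touch every role document that embeds that employee.

```typescript
// An employee as embedded inside a role document.
interface EmbeddedEmployee {
  _id: string;
  name: string;
}

// A role document that embeds its employees.
interface RoleDoc {
  _id: string;
  name: string;
  employees: EmbeddedEmployee[];
}

// Rename an employee everywhere it is embedded and report how many
// role documents had to be modified.
function renameEmbeddedEmployee(roles: RoleDoc[], employeeId: string, newName: string): number {
  let touched = 0;
  for (const role of roles) {
    let changed = false;
    for (const employee of role.employees) {
      if (employee._id === employeeId) {
        employee.name = newName;
        changed = true;
      }
    }
    if (changed) {
      touched += 1;
    }
  }
  return touched;
}

// Jack holds both roles, so his name is stored twice.
const sampleRoles: RoleDoc[] = [
  { _id: '12345', name: 'Developer', employees: [{ _id: '00000', name: 'Jack' }] },
  { _id: '67890', name: 'Manager', employees: [{ _id: '00000', name: 'Jack' }] },
];

// Two documents change for a single logical update.
const touchedRoles = renameEmbeddedEmployee(sampleRoles, '00000', 'John');
```

Reading a role with its employees is a single lookup here, which is the upside of embedding; the rename is the downside.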
If you can "live" with "referencing" and the cost it incurs then you can always do this:
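var mongoose = require('mongoose');
var Schema = mongoose.Schema;

// Each side stores references (ObjectIds) to the other collection.
var rolesSchema = new Schema({
  "name": String,
  "employees": [{ "type": Schema.Types.ObjectId, "ref": "Employee" }]
});
var employeesSchema = new Schema({
  "name": String,
  "roles": [{ "type": Schema.Types.ObjectId, "ref": "Role" }]
});
var Role = mongoose.model('Role', rolesSchema);
var Employee = mongoose.model('Employee', employeesSchema);

// "12345" stands in for a real ObjectId value.
// exec() returns a promise; recent Mongoose versions no longer accept a callback.
Role.find({ "_id": "12345" }).populate("employees").exec().then(function(docs) {
  // populated "joined" results in here
});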
What this does behind the scenes is effectively (basic JavaScript representation and "at best") :
var roles = db.role.find({ "_id": "12345" }).map(function(doc) {
  // the "other query": fetch all referenced employees for this role
  doc.employees = db.employees.find({ "_id": { "$in": doc.employees } }).toArray();
  return doc;
});
Mongoose works on the concept of using the "schema" definition to "know" which collection to execute the "other query" on and then return the "joined" results to you. But it is not a single query but multiple hits to the database.
Other schemes might "keep" the referenced collection information in the document itself, as opposed to relying on the "model code" to get that information. But the same principle applies where you need to make another call to the database and perform some type of "merge" in the API provided.
So it all comes down to your choice. Either you "embed" the data and live with that cost, or you "reference" the data and live with the network "cost" that is associated with multiple database hits.
The key point here is "nothing is free", not even the way that SQL RDBMS perform "joins", which also has a "cost" of its own. That cost is a large part of the reasoning why NoSQL solutions like MongoDB do it this way and "do not support joins" in a native fashion, given the "cost" involved in distributed data systems.
The main lesson here is to "do what suits you and your application", and not just choose the "coolest thing right now", but basically expect what you get from choosing different storage solutions. They all have their own purposes. Horses for courses, as the saying goes.