Elasticsearch js with Node.js: How to return aggregated results from multiple indexes? - node.js

We have two indexes: posts and users. We'd like to make queries on these two indexes, search for a post in the index "posts" and then go to the index "users" to get the user info, to eventually return an aggregated result of both the user info and the post we found.
Let me clarify it a bit with an example:
posts:
[
{
post: "this is a post about stack overflow",
username: "james_bond",
user_id: "007"
},
{...}
]
users:
[
{
username: "james_bond",
user_id: "007",
bio: "My name's James. James Bond.",
nb_posts: "7"
},
{...}
]
I want to search for all the posts which contain "stack overflow", and then display all the users who are talking about it and their info (from the "users" index), it could look something like this:
result: {
username: "james_bond",
user_id: "007",
post: "this is a post about stack overflow",
bio: "My name's James. James Bond"
}
I hope this is clear enough, I'm sorry if this question has already been answered but I honestly didn't find any answer anywhere.
So is it possible to do so with only ES js?

I don't believe it is possible to do exactly what you are asking, as it would be very costly to join across two indexes that are potentially sharded across different nodes (this is not a main use case for Elasticsearch). But if you control the data within Elasticsearch, you could structure it so that you can achieve a different type of join.
You can either use:
nested query
Where documents may contain fields of type nested. These fields are used to index arrays of objects, where each object can be queried (with the nested query) as an independent document.
has_child and has_parent queries
A join field relationship can exist between documents within a single index. The has_child query returns parent documents whose child documents match the specified query, while the has_parent query returns child documents whose parent document matches the specified query.
Denormalisation
Alternatively, you could store the user denormalised within the post document when you insert the document into the index. This becomes a balancing act between the time saved by not doing multiple reads every time a post is viewed (the cost of a fully normalised model) and the cost of updating all posts from user 007 every time his details change (the cost of a denormalised model). There is a tradeoff here: you don't need to denormalise everything, and as it stands you have already denormalised the username from users to posts.
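As a minimal sketch of the denormalised option, assuming the field names from the question and a hypothetical helper called buildPostDocument (it is not part of any Elasticsearch client), the user fields could be copied into each post before it is indexed:

```javascript
// Hypothetical helper: copy selected user fields into a post before indexing,
// so that a single search on "posts" also returns the user info.
function buildPostDocument(post, user) {
  return {
    post: post.post,
    user_id: user.user_id,
    username: user.username,
    // Denormalised copy: every post by this user must be rewritten if the bio changes.
    bio: user.bio,
  };
}

const user = {
  username: "james_bond",
  user_id: "007",
  bio: "My name's James. James Bond.",
  nb_posts: "7",
};
const post = {
  post: "this is a post about stack overflow",
  username: "james_bond",
  user_id: "007",
};

// This object is what would be sent to the "posts" index.
const doc = buildPostDocument(post, user);
console.log(doc);
```

Fields that are likely to change often (nb_posts is a plausible candidate) are left out of the copy here and would stay only in users.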
Here is a Question/Answer that gives more details on the options.

Related

Can mongoose batch update based on an array of objects that matches the collection?

I am working on a project in Express/Node, and I am utilizing a MongoDB database that has a collection of Course documents that represent a course in my school system that changes in real-time. The Course documents in my database each look like this:
Course Document
{
courseID: Number,
restrictions: String,
status: String,
}
My program has to check for changes in the school's course system and update my private MongoDB database with any changes it sees. To accomplish this, I currently have a script that looks at all the courses in the school system and records them in an array of objects, with each object corresponding to a course.
var allCourses =
[
{
courseID: 123456,
restrictions: "A and B",
status: "OPEN"
},
{
courseID: 678990,
restrictions: "A",
status: "FULL",
}
]
The goal now is to be able to go through my database, and skip the documents that are the same as the corresponding javascript object in the array, and update those that are not.
Obviously, I could just iterate through my array with forEach, and update every single course by filtering by 'courseID' and updating both fields one document at a time, but I can foresee that this would take a large amount of time.
I was wondering if there was a batch update function, similar to the insertMany operation, that can take my array of objects and update my database documents that correspond to an object within the array?
These are helpful links
Trying to do a bulk upsert with Mongoose. What's the cleanest way to do this?
https://docs.mongodb.com/manual/reference/method/db.collection.insertMany/
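For the batch update asked about above, one option is Mongoose's Model.bulkWrite, which accepts an array of operations. The following is only a sketch: buildCourseOps is a hypothetical helper, the model name Course is assumed, and setting upsert to true is an assumption about the desired behaviour for courses that are not yet stored.

```javascript
// Turn the scraped course array into one updateOne operation per course.
function buildCourseOps(courses) {
  return courses.map((course) => ({
    updateOne: {
      filter: { courseID: course.courseID },
      update: {
        $set: { restrictions: course.restrictions, status: course.status },
      },
      upsert: true, // assumption: insert courses that are not in the database yet
    },
  }));
}

const allCourses = [
  { courseID: 123456, restrictions: "A and B", status: "OPEN" },
  { courseID: 678990, restrictions: "A", status: "FULL" },
];

const ops = buildCourseOps(allCourses);
console.log(JSON.stringify(ops, null, 2));

// With a Mongoose model (assumed here to be named Course), the whole batch
// would be sent in one call:
//   await Course.bulkWrite(ops);
```

This sends every course in one request rather than one update per document. Documents whose fields already match should be left effectively unchanged, since a $set to identical values does not modify them.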

How to Create Relationships Using MongoDB?

I am working on a web application that uses a MongoDB database and Express/Node.js. I want to create a project in which I have users, and users can have posts, which can have many attributes, such as title, creator, and date. I am confused about how to do this so that I avoid duplication in my database. I tried references, storing the ids of all of a user's posts in a list like this: [postID1, postID2, postID3, etc...]. The problem is that I want to be able to query for all of a user's posts and display them in an EJS template, but I don't know how to do that. How would I use references? What should I do to make this modeling system optimal for relationships?
Any help would be greatly appreciated!
Thank you!
This is a classic parent-child relationship, and your problem is that you're storing the relationship in the wrong record :-). The parent should never contain the reference to the children. Instead, each child should have a reference to the parent. Why not the other way around? It's a bit of a historical quirk: it's done that way because a classic relational table can't have multiple values for a single field, which means you can't store multiple child IDs easily in a relational table, whereas since each child will only ever have one parent, it's easy to setup a single field in the child. A Mongo document can have multiple values within a single field by using arrays, but unless you really have a good reason to do so, it's just better to follow the historical paradigm.
How does this apply in your situation? What you're trying to do is to store references to all the children (i.e. the post IDs) as a list in the parent (i.e. an array in the user document). This is not the usual way to do this. Instead, in each child (i.e. in each post), have a field called user_id, and store the userID there.
Next, make sure you create an index on the user_id field.
With that setup, it's easy to take a post and figure out who the user was (just look at the user_id field). And if you want to find all of a user's posts, just do posts.find({user_id: 'XXXX'}). If you have an index on that field, the find will execute quickly.
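The index and lookup described above can be sketched with the official MongoDB Node.js driver. Here db is assumed to be an already connected Db handle, and findPostsByUser is a hypothetical helper name; the collection and field names follow the answer.

```javascript
// `db` is assumed to be a connected Db handle from the official mongodb driver.
async function findPostsByUser(db, userId) {
  const posts = db.collection("posts");
  // Index the parent reference so the lookup does not scan the whole collection.
  // Creating an index that already exists is a no-op.
  await posts.createIndex({ user_id: 1 });
  // Fetch every post whose parent reference points at this user.
  return posts.find({ user_id: userId }).toArray();
}
```

In a real application the index would normally be created once at startup rather than on every call; it is kept inside the function here only to keep the sketch short.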
Storing parent references in the child is almost always better than storing child references in the parent. Even though Mongo is flexible enough to allow you to structure it either way, it's not preferred unless you have a real reason for it.
EDIT
If you do have a valid reason for storing the child references in the parent, then assuming a structure like this:
user = {posts: [postID1, postID2, postID3, ...]}
You can find the user for a specific post by user.find({posts: "XXXX"}). MongoDB is smart enough to know that you're searching for a user whose posts array contains the element "XXXX". And if you create an index on the posts field, then the query should be pretty quick.
I would like to mention that there is nothing wrong with a parent containing child references, at least in NoSQL databases. It all depends on what suits your needs.
You have a one-to-many relationship between users and posts, and you can model your data in the following 3 ways:
Embedded Data Model
{
user: "username",
post: [
{
title: "Title-1",
creator: "creator",
published_date: ISODate("2010-09-24")
},
{
title: "Title-2",
creator: "creator",
published_date: ISODate("2010-09-24")
}
]
}
Parent containing child references
{
user: "username",
posts: [123456789, 234567890, ...]
}
{
_id: 123456789,
title: "Title-1",
creator: "creator",
published_date: ISODate("2010-09-24")
}
{
_id: 234567890,
title: "Title-2",
creator: "creator",
published_date: ISODate("2010-09-24")
}
Child containing parent reference
{
_id: "U123",
name: "username"
}
{
_id: 123456789,
title: "Title-1",
creator: "creator",
published_date: ISODate("2010-09-24"),
user: "U123"
}
{
_id: 234567890,
title: "Title-2",
creator: "creator",
published_date: ISODate("2010-09-24"),
user: "U123"
}
According to the MongoDB docs (I have edited the below paragraph according to your case)
When using references, the growth of the relationships determine where
to store the reference. If the number of posts per user is small
with limited growth, storing the post reference inside the user
document may sometimes be useful. Otherwise, if the number of posts
per user is unbounded, this data model would lead to mutable,
growing arrays.
Reference: https://docs.mongodb.com/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
Now you have to decide what is best for your project, keeping in mind that your model should satisfy all of your use cases.
Peace,

query filter for mongodb using node js

I have two collections. One is questions, which stores _id, title, options, result, and feedback; the second is child, in which I store question_id and score. I want to filter the questions collection by _id, but I don't know how to do this. Is it possible to set up a query for this, so that the next time I look up questions in the questions collection it returns only the filtered ones? In other words: return only those questions from the questions collection whose _id does not match any question_id in the second (child) collection.
This is my first collection, where I store the questions (_id, title, options, result, feedback):
_id:{type:String},
title:{type:String, required:true},
options:{type:Array, required:true},
result:{type:Array, required:true},
feedback:{type:String}
This is my second collection, where I store the attempted question_id and score:
quiz:[
{
questionId:{
type:mongoose.Schema.Types.ObjectId,
ref: 'Question',
index: true
},
score:{type:Number},
time:{type:String}
}
]
This is not my exact code; I just wrote an example of what I mean:
var query = {}
firstcollection.find({$and: [{_id: /* ... */}, {/* secondcollection question_id: ... */}]}, function(err, data){
// data should contain only the filtered questions (filtered by _id),
// and I send this data to the frontend
res.send(data);
});
The main problem is conceptual: you are trying to use MongoDB, which is a document store, in RDBMS style. Under community pressure, Mongo added some minimal join functionality in recent versions, but that doesn't make it a relational DB.
There is no good way to perform such a query. The idea behind a document store is simple: you have a collection of documents, and you query this collection and only this collection. All links between collections are "virtual" and are provided only by code logic, with no support from the DB engine.
So all you can do with Mongo is this: query the first collection for ids (with an appropriate projection, to fetch ids only), store the answer in an array, and then perform a second query against the other collection using this array.
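The two-step approach can be sketched as follows. Question and Child are assumed to be Mongoose models for the two collections, and findUnattemptedQuestions is a hypothetical helper name. For brevity the sketch collects the ids with distinct rather than a find with a projection.

```javascript
// `Question` and `Child` are assumed to be Mongoose models for the two collections.
async function findUnattemptedQuestions(Question, Child) {
  // Step 1: collect the ids of every question that already has an attempt.
  const attemptedIds = await Child.distinct("quiz.questionId");
  // Step 2: fetch only the questions whose _id is not in that list.
  return Question.find({ _id: { $nin: attemptedIds } }).exec();
}
```

Inside an Express handler, the result could then be passed to res.send once the promise resolves. The join itself happens in application code, as described above.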

Is a type property the correct way to store different data entities in CouchDB?

I'm trying to wrap my head around CouchDB. I'm trying to switch from MongoDB to CouchDB because the concept of views is more appealing to me. In CouchDB it looks like all records are stored in a single database. There is no concept of collections or anything like them, as there is in MongoDB. So, when storing different data entities such as users, blog posts, comments, etc., how do you differentiate between them from within your map reduce functions? I was thinking about just using some sort of type property, and for each item I'd just have to make sure to always specify the type. This line of thought was sort of reinforced when I read over the CouchDB cookbook website, in which an example does the same thing.
Is this the most reliable way of doing this, or is there a better method? I was thinking of alternatives, and I think the only other alternative way is to basically embed as much as I can into logical documents. Like, the immediate records inside of the database would all be User records, and each User would have an array of Posts, in which you just add all of the Posts to. The downside here would be that embedded documents wouldn't get their own id properties, correct?
Using type is convenient and fast when creating views. Alternatively you can consider using a part of the JSON document. I.e., instead of defining:
{
type: "user",
firstname: "John",
lastname: "Smith"
}
You would have:
{
user: {
firstname: "John",
lastname: "Smith"
}
}
And then in the view for emitting documents containing user information, instead of using:
function (doc) {
if (doc.type === "user") emit(null, doc);
}
You would write:
function (doc) {
if (doc.user) emit(null, doc);
}
As you can see, there is not much difference. As you have already realized, the first approach is the most widely used, but the second is (afaik) well accepted.
Regarding the question of storing all Posts of one User in a single document: it depends on how you plan to update the document. Remember that you need to write the whole document each time you update it (unless you use attachments). That means that each time a user writes a new Post, you need to retrieve the document containing the array of Posts, add or modify one element, and write the document back. That is probably too heavy.
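One way to avoid rewriting a large User document is to keep each Post as its own document and let a view group the posts by user. The sketch below assumes hypothetical user_id and title fields on post documents. It also defines a stand-in emit so it can run outside CouchDB, where emit is provided to map functions.

```javascript
// Stand-in for the emit() that CouchDB provides inside map functions,
// defined here only so the sketch runs outside CouchDB.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function: index each post under its author's id.
function postsByUser(doc) {
  if (doc.type === "post") emit(doc.user_id, doc.title);
}

const docs = [
  { _id: "u1", type: "user", firstname: "John", lastname: "Smith" },
  { _id: "p1", type: "post", user_id: "u1", title: "First post" },
  { _id: "p2", type: "post", user_id: "u1", title: "Second post" },
];

// CouchDB would run the map function over every document in the database.
docs.forEach(postsByUser);
console.log(rows);
```

With this layout, adding a post means writing one small new document. Each post also gets its own id.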

Mongoose: Only return one embedded document from array of embedded documents

I've got a model which contains an array of embedded documents. These embedded documents keep track of the points the user has earned in a given activity. Since a user can be part of several activities or just one, it makes sense to keep these activities in an array. Now I want to extract the hall of fame, the top ten users for a given activity. Currently I'm doing it like this:
userModel.find({ "stats.activity": "soccer" }, ["stats", "email"])
.desc("stats.points")
.limit(10)
.run (err, users) ->
(if you are wondering about the syntax, it's CoffeeScript)
where "stats" is the array of embedded documents/activities.
Now this actually works, but currently I'm only testing with accounts that have only one activity. I assume that something will go wrong (sorting-wise) once a user has more activities. Is there any way I can tell Mongoose to return only the embedded document where "activity" == "soccer" alongside the top-level document?
By the way, I realize I can do this another way, by having stats in its own collection with a db-ref to the relevant user, but I'm wondering if it's possible to do it like this before I consider any rewrites.
Thanks!
You are correct that this won't work once you have multiple activities in your array.
Specifically, since you can't return just an arbitrary subset of an array with the element, you'll get back all of it and the sort will apply across all points, not just the ones "paired" with "activity":"soccer".
There is a pretty simple tweak that you could make to your schema to get around this though. Don't store the activity name as a value, use it as the key.
{ _id: userId,
email: email,
stats: [
{soccer : points},
{rugby: points},
{dance: points}
]
}
Now you will be able to query and sort like so:
users.find({"stats.soccer":{$gt:0}}).sort({"stats.soccer":-1})
Note that when you move to version 2.2 (currently only available as the unstable development version 2.1), you will be able to use the aggregation framework to get the exact results you want (only the particular subset of an array or subdocument that matches your query) without changing your schema.
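A rough sketch of such an aggregation pipeline, written against the original schema from the question (activity stored as a value inside stats), might look like this. hallOfFamePipeline is a hypothetical helper name; the stages used are $match, $unwind, $sort, $limit and $project.

```javascript
// Build a pipeline that keeps only the stats entry for one activity per user.
function hallOfFamePipeline(activity, limit) {
  return [
    // Drop users who never took part in the activity.
    { $match: { "stats.activity": activity } },
    // Produce one document per stats entry.
    { $unwind: "$stats" },
    // Keep only the entry for the requested activity.
    { $match: { "stats.activity": activity } },
    // Sort by the points that belong to that activity only.
    { $sort: { "stats.points": -1 } },
    { $limit: limit },
    { $project: { email: 1, stats: 1 } },
  ];
}

const pipeline = hallOfFamePipeline("soccer", 10);
console.log(JSON.stringify(pipeline, null, 2));

// With Mongoose, the pipeline would be passed to the model, e.g.:
//   userModel.aggregate(pipeline)
```

Because $unwind runs before the sort, each result carries a single stats entry, so the ordering is no longer affected by the user's other activities.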
