I am working on a mobile site that lets you search for tags of a MongoDB collection of articles.
Basically, each article object has a tags property, which stores an array of tag strings.
The search works fine, but I also want to add logging to the searches.
The reason is that I want to see what visitors are searching for and what results they are getting in order to optimize the tags.
For example, if the user enters the tag grocery, then I want to save the query results.
I hope my question is clear. Thank you!
You can't optimize something without measuring it. You'll need to be able to compare new results with old ones, so you'll have to save a snapshot of all the information crucial to a search query. This obviously includes the search terms themselves, but also an accurate snapshot of the result.
You could create snapshots of entire articles, but it's probably more efficient to save only the information involved in determining the search results. In your case these are the article tags, but perhaps also the article description, if your search engine uses it.
After each search query you'll have to build a document similar to the following, and save this in a searchLog collection in MongoDB.
{
    query: "search terms",
    timestamp: new Date(), // time of the search
    results: [ // array of articles in the search result
        {
            articleId: 123, // _id of the original article
            name: "Lettuce", // name of the article, for easier analysis
            tags: [ "grocery", "lettuce" ] // snapshot of the article tags
            // snapshots of other article properties, if relevant
        },
        {
            articleId: 456,
            name: "Bananas",
            tags: [ "fruit", "banana", "yellow" ]
        }
    ]
}
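As a sketch of how such a document could be assembled before saving it: the function below builds the log entry from the query string and the array of matching articles. The function name and article shape (`_id`, `name`, `tags`) are assumptions for illustration.

```javascript
// Hypothetical sketch: build a search-log document from the query string
// and the array of matching articles.
function buildSearchLog(query, articles) {
  return {
    query: query,
    timestamp: new Date(), // time of the search
    results: articles.map(a => ({
      articleId: a._id,
      name: a.name,
      tags: a.tags.slice() // copy the tags, so later edits don't alter the log
    }))
  };
}

// usage (assumed driver call):
// db.collection('searchLog').insertOne(buildSearchLog("grocery", results));
```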
Related
We have two indexes: posts and users. We'd like to query across these two indexes: search for a post in the posts index, then go to the users index to get the user info, and eventually return an aggregated result of both the user info and the post we found.
Let me clarify it a bit with an example:
posts:
[
    {
        post: "this is a post about stack overflow",
        username: "james_bond",
        user_id: "007"
    },
    {...}
]
users:
[
    {
        username: "james_bond",
        user_id: "007",
        bio: "My name's James. James Bond.",
        nb_posts: "7"
    },
    {...}
]
I want to search for all the posts which contain "stack overflow", and then display all the users who are talking about it along with their info (from the "users" index). It could look something like this:
result: {
    username: "james_bond",
    user_id: "007",
    post: "this is a post about stack overflow",
    bio: "My name's James. James Bond."
}
I hope this is clear enough, I'm sorry if this question has already been answered but I honestly didn't find any answer anywhere.
So, is it possible to do this with only the Elasticsearch JS client?
I don't believe it is possible to do exactly what you are asking, as it would be very costly to join across two indexes that are potentially sharded across different nodes (this is not a main use case for Elasticsearch). But if you have control of the data within Elasticsearch, you could structure the data so that you can achieve a different kind of join.
You can use one of the following:
nested query
Where documents may contain fields of type nested. These fields are used to index arrays of objects, where each object can be queried (with the nested query) as an independent document.
has_child and has_parent queries
A join field relationship can exist between documents within a single index. The has_child query returns parent documents whose child documents match the specified query, while the has_parent query returns child documents whose parent document matches the specified query.
Denormalisation
Alternatively, you could store the user denormalised within the post document when you insert the document into the index. This becomes a balancing act between saving time by avoiding multiple reads every time a post is viewed (fully normalised) and the cost of updating all posts from user 007 every time his details change (denormalised). There is a tradeoff here: you don't need to denormalise everything, and as it stands you have already denormalised the username from users into posts.
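As a minimal sketch of the denormalisation step, assuming you control the insert path: copy the user fields you need into the post before indexing it, so a single query against the posts index returns both. The function name and field choices are assumptions.

```javascript
// Hypothetical sketch: copy selected user fields into the post document at
// insert time, so one query against the posts index returns both.
function denormalisePost(post, user) {
  return {
    ...post,
    username: user.username,
    user_id: user.user_id,
    bio: user.bio // denormalised: must be re-indexed whenever the user edits it
  };
}
```

The cost, as noted above, is that an edit to the user's bio now means updating every post document that carries the copy.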
Here is a Question/Answer that gives more details on the options.
I am new to ArangoDB. I have the following document in a collection named posts:
{
    _id: 56687,
    userid: "usr32",
    postcontent: "text here",
    comment: [
        { usrid: "usr32", msg: "good post", date: "date" },
        { usrid: "usr32", msg: "good post", date: "date" }
    ]
}
Basically this document corresponds to a Facebook-wall-like post and will take multiple comments from various users. Can someone please help with an AQL query to push data into this array?
AQL should be able to identify the document based on the _id field, which I was able to do with ease; the problem is pushing a new comment into the array.
I use Node.js.
Found a solution; I don't know if there is a better one:
FOR p IN posts
    FILTER p._key == '56687'
    LET arr = PUSH(p.comment, 'nice video')
    UPDATE p WITH { comment: arr } IN posts
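A variation of the same query, written as a sketch with bind parameters so it can push a full comment object (matching the shape of the existing comment entries) rather than a plain string. The `arangojs`-style driver call and the comment field values are assumptions.

```javascript
// Hypothetical sketch: the same update, parameterised with bind variables so
// it pushes a full comment object instead of a plain string.
const pushCommentAql = `
  FOR p IN posts
    FILTER p._key == @key
    UPDATE p WITH { comment: PUSH(p.comment, @comment) } IN posts
`;

// usage with an arangojs-style driver (assumed):
// db.query(pushCommentAql, {
//   key: "56687",
//   comment: { usrid: "usr99", msg: "nice video", date: new Date().toISOString() }
// });
```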
I am working on a search feature over mongoose documents where I have to search over 250,000 documents.
In this feature I have to add search indexes over multiple fields.
In the documents, some of the fields are strings and some are multi-level objects.
I have indexed all the possible fields.
Locally I have 100,000 documents, and searching over them takes around 300-400 ms.
But when I search over them on the server, it takes around 10-15 seconds to respond.
The search query is built up conditionally, but I am sharing a small code snippet.
{
    $and: [
        {
            $or: [
                { 'field1': { $regex: re } },
                { 'field2': { $regex: re } },
                { 'level1.level2.value': { $regex: re } }
            ]
        },
        {
            $and: [
                { lowAge: { $lte: parseInt(age) } },
                { highAge: { $gte: parseInt(age) } },
                {
                    $or: [
                        { gender: gender },
                        { gender: "N/A" }
                    ]
                }
            ]
        }
    ]
}
Can someone advise me on how I can speed up the process on the server?
To speed it up further, you can use a text index.
But text indexes come with the following storage requirements and performance costs:
Text indexes can be large. They contain one index entry for each unique post-stemmed word in each indexed field for each document inserted.
Building a text index is very similar to building a large multi-key index and will take longer than building a simple ordered (scalar) index on the same data.
When building a large text index on an existing collection, ensure that you have a sufficiently high limit on open file descriptors. See the recommended settings.
Text indexes will impact insertion throughput because MongoDB must add an index entry for each unique post-stemmed word in each indexed field of each new source document.
Additionally, text indexes do not store phrases or information about the proximity of words in the documents. As a result, phrase queries will run much more effectively when the entire collection fits in RAM.
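As a sketch of what the switch could look like for the snippet above: replace the unanchored `$regex` clauses with a single `$text` clause, backed by a compound text index such as `createIndex({ field1: "text", field2: "text", "level1.level2.value": "text" })`. The builder function and its name are assumptions; field names follow the original snippet.

```javascript
// Hypothetical sketch: build a filter that uses $text (served by a text
// index) instead of unanchored $regex clauses, which cannot use an index.
function buildSearchFilter(term, age, gender) {
  return {
    $text: { $search: term },            // matches any indexed text field
    lowAge: { $lte: parseInt(age, 10) }, // same range conditions as before
    highAge: { $gte: parseInt(age, 10) },
    $or: [{ gender: gender }, { gender: "N/A" }]
  };
}

// usage: collection.find(buildSearchFilter("doctor", "30", "M"))
```

Note that `$text` does word-level matching after stemming, so it behaves differently from a substring `$regex`; whether that trade-off is acceptable depends on your search requirements.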
Please see the references below:
https://docs.mongodb.com/manual/core/index-text/
https://www.tutorialspoint.com/mongodb/mongodb_text_search.htm
Hope it helps!
I am trying to write a blog engine for myself with node.js/express/mongodb (also a start to learn node.js). To go a little further than the tutorials on the Internet, I want to add tags support to the blog engine.
I want to do the following things with tags:
Viewers could see all the tags as a tag cloud on a "tag cloud page"
Viewers could see the tags that an article has on article list page and single article page
Viewers are able to click on a single tag to show the article list
What's more, viewers are able to search articles with particular tags in the SO way: [tag1][tag2] --> /tags/tag1+tag2 --> list of articles that have both tag1 and tag2
In a relational database, a post_tag table would be used for this. But how do I design this in MongoDB?
I have checked MongoDB design - tags
But as efdee comments, the design
db.movies.insert({
    name: "The Godfather",
    director: "Francis Ford Coppola",
    tags: [ "mafia", "wedding", "violence" ]
})
has a problem:
This doesn't seem to actually answer his question. How would you go about getting a distinct list of tags used in the entire movie collection?
That's also my concern: in my design, I need to show a list of all the tags; I also need to know how many articles each tag has. So is there a better way than the design shown above?
My concern with the design above is: if I want to show a list of the tags, the query will have to go over all the article items in the database. Is there a more efficient way?
You'd need to create a multi key index on tags to start with.
Then you will be able to find documents matching tags using this syntax:
db.movies.find({ "tags": { $all : [ /^this/, /^that/ ] }})
Because you're using the ^ (start of string) in the regex, Mongo will still use the index.
To get keyword densities using the aggregation framework, you could simply get a count:
db.movies.aggregate([
    { $project: { _id: 0, tags: 1 } },
    { $unwind: "$tags" },
    { $group: { _id: "$tags", occur: { $sum: 1 } } }
])
You would end up with a set of docs looking like:
{
    _id: "mytag",
    occur: 383
},
{
    _id: "anothertag",
    occur: 23
}
Using the aggregate command you get an inline result back, so it would be down to the client app (or server) to serialise or cache the result if it's frequently used.
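For comparison, here is a plain-JavaScript sketch of what that $unwind/$group pipeline computes, which may be handy if the client app caches the counts itself. The function name is an assumption.

```javascript
// Plain-JS equivalent of the $unwind/$group tag count, as a sketch.
function tagOccurrences(movies) {
  const counts = {};
  for (const movie of movies) {
    for (const tag of movie.tags || []) {
      counts[tag] = (counts[tag] || 0) + 1; // one increment per occurrence
    }
  }
  // same shape as the aggregation output: { _id: tag, occur: count }
  return Object.keys(counts).map(tag => ({ _id: tag, occur: counts[tag] }));
}
```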
Let me know how you get on with that.
Hth
Sam
How would you go about getting a distinct list of tags used in the entire movie collection?
db.movies.distinct("tags")
For efficient queries, I'd probably duplicate data. Tags are very unlikely to ever be edited, so I'd put the tags array in the article object, and then also put the tags in a tags collection, where each tag has either a count of articles containing it or an array of article ids.
db.movies.insert({
    id: 1,
    name: "The Godfather",
    director: "Francis Ford Coppola",
    tags: [ "mafia", "wedding", "violence" ]
});
db.tags.insert([
    { name: "mafia", movie_count: 1 },
    { name: "wedding", movie_count: 1 },
    { name: "violence", movie_count: 1 }
]);
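To keep the duplicated counts consistent, the tag counters could be bumped in the same write path that inserts the article. A sketch, assuming the Node.js driver's `bulkWrite` and a hypothetical helper name:

```javascript
// Hypothetical sketch: build bulkWrite operations that upsert one counter
// document per tag, incrementing movie_count for each new article.
function tagCountOps(tags) {
  return tags.map(tag => ({
    updateOne: {
      filter: { name: tag },
      update: { $inc: { movie_count: 1 } },
      upsert: true // first use of a tag creates its counter document
    }
  }));
}

// usage (assumed driver call):
// db.collection('tags').bulkWrite(tagCountOps(["mafia", "wedding", "violence"]));
```

With this scheme, the tag cloud and per-tag counts come straight from the tags collection, with no scan over the articles.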
You could perform your 4 tasks using MapReduce functions. For example, for the list of all tags you'd emit the tag as the key and then in the reduce function you'd count them all up and return the count. That would be the route I'd go down. It may require a little more thought, but it's definitely powerful.
http://cookbook.mongodb.org/patterns/count_tags/
I've got a model which contains an array of embedded documents. These embedded documents keep track of the points the user has earned in a given activity. Since a user can be part of several activities or just one, it makes sense to keep these activities in an array. Now I want to extract the hall of fame: the top ten users for a given activity. Currently I'm doing it like this:
userModel.find({ "stats.activity": "soccer" }, ["stats", "email"])
.desc("stats.points")
.limit(10)
.run (err, users) ->
(if you are wondering about the syntax, it's coffeescript)
where "stats" is the array of embedded documents/activities.
Now this actually works, but currently I'm only testing with accounts that have just one activity. I assume that something will go wrong (sorting-wise) once a user has more activities. Is there any way I can tell Mongoose to only return the embedded documents where "activity" == "soccer" alongside the top-level document?
By the way, I realize I can do this another way, by having stats in its own collection with a db-ref to the relevant user, but I'm wondering if it's possible to do it like this before I consider any rewrites.
Thanks!
You are correct that this won't work once you have multiple activities in your array.
Specifically, since you can't return just an arbitrary subset of an array, you'll get back all of it, and the sort will apply across all points, not just the ones "paired" with "activity": "soccer".
There is a pretty simple tweak that you could make to your schema to get around this though. Don't store the activity name as a value, use it as the key.
{
    _id: userId,
    email: email,
    stats: [
        { soccer: points },
        { rugby: points },
        { dance: points }
    ]
}
Now you will be able to query and sort like so:
users.find({"stats.soccer":{$gt:0}}).sort({"stats.soccer":-1})
Note that when you move to version 2.2 (currently only available as the unstable development version 2.1) you will be able to use the aggregation framework to get the exact results you want (only the particular subset of an array or subdocument that matches your query) without changing your schema.
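As a sketch of that aggregation-framework route with the original schema (where stats is an array of `{ activity, points }` subdocuments), the pipeline could look like this; the stage order follows the usual unwind/match/sort/limit pattern, and the projected field names are assumptions:

```javascript
// Hypothetical sketch: hall-of-fame pipeline for the original schema,
// where stats is an array of { activity, points } subdocuments.
const hallOfFamePipeline = [
  { $unwind: "$stats" },                              // one doc per activity entry
  { $match: { "stats.activity": "soccer" } },         // keep only soccer entries
  { $sort: { "stats.points": -1 } },                  // highest points first
  { $limit: 10 },                                     // top ten users
  { $project: { email: 1, points: "$stats.points" } } // trim to what's needed
];

// usage: userModel.aggregate(hallOfFamePipeline)
```

Because the match happens after the unwind, the sort only ever sees soccer entries, which avoids the cross-activity sorting problem described above.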