How can I find similar documents in MongoDB? - node.js

I have a food database with documents similar to:
{
Name: "burger",
ingredients: [
{Item:"bread"},
{Item:"cheese"},
{Item:"tomato"}
]
}
How can I find documents that have the most similar items in ingredients?

First of all, your data should be remodelled as below:
{
name: "Burger",
ingredients: [
"bread",
"cheese",
"tomato",
"beef"
]
}
The extra "Item" key adds no information, and it does not make the data any easier to access.
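If your collection already holds documents in the original { Item: ... } shape, the conversion is a simple map over the array. Below is a minimal plain JavaScript sketch that assumes every array element has an Item string. The helper name flattenIngredients is hypothetical.

```javascript
// Hypothetical helper: turns { ingredients: [{ Item: "bread" }, ...] }
// into { ingredients: ["bread", ...] } and leaves the other fields untouched.
function flattenIngredients(doc) {
  return {
    ...doc,
    // A missing ingredients field becomes an empty array.
    ingredients: (doc.ingredients || []).map((entry) => entry.Item),
  };
}

const oldDoc = {
  Name: "burger",
  ingredients: [{ Item: "bread" }, { Item: "cheese" }, { Item: "tomato" }],
};

console.log(flattenIngredients(oldDoc).ingredients); // [ 'bread', 'cheese', 'tomato' ]
```

On MongoDB 4.2 and later, an update with an aggregation pipeline, such as db.collection.updateMany({}, [{ $set: { ingredients: "$ingredients.Item" } }]), should perform the same conversion server-side. Try it on a copy of the data first.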
Next, you need to create a text index. The docs state that
text indexes can include any field whose value is a string or an array of string elements.
So we simply create one with createIndex (the older ensureIndex shell method is deprecated):
db.collection.createIndex({"ingredients":"text"})
Now we can do a $text search:
db.collection.find(
{ $text: { $search: "bread beef" } },
{ score: { $meta: "textScore" } }
).sort( { score: { $meta: "textScore" } } )
which should give you the most relevant documents.
However, you could also run a plain (non-text) query for exact matches:
db.collection.find({ingredients:"beef"})
or, for documents that contain all of several ingredients:
db.collection.find({ ingredients: { $all: ["beef","bread"] } })
So for searching by user input, you can use the text search and for search by selected ingredients, you can use the non-text search.
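Text search ranks documents by matched terms, and it treats ingredients as words. If what you literally want is "the documents that share the most ingredients", a different technique from the one in the answer above is to count the overlap with $setIntersection and $size in an aggregation and sort on that count. The sketch below assumes the remodelled string-array schema. buildSimilarityPipeline and rankBySharedIngredients are hypothetical names. The second function only mirrors the pipeline in plain JavaScript, so the logic can be checked without a server.

```javascript
// Hypothetical builder: scores each document by how many of the wanted
// ingredients it shares, then sorts by that score (highest first).
function buildSimilarityPipeline(wanted, limit = 10) {
  return [
    // Keep only documents containing at least one wanted ingredient
    // (this stage can use a regular multikey index on "ingredients").
    { $match: { ingredients: { $in: wanted } } },
    {
      $addFields: {
        // Both arrays are treated as sets; $size counts the overlap.
        sharedCount: {
          $size: { $setIntersection: [{ $ifNull: ["$ingredients", []] }, wanted] },
        },
      },
    },
    { $sort: { sharedCount: -1 } },
    { $limit: limit },
  ];
}

// Plain-JavaScript mirror of the same ranking, for checking the logic locally.
function rankBySharedIngredients(docs, wanted, limit = 10) {
  const wantedSet = new Set(wanted);
  return docs
    .map((doc) => ({
      ...doc,
      // Count distinct shared ingredients, like $setIntersection + $size.
      sharedCount: new Set((doc.ingredients || []).filter((i) => wantedSet.has(i))).size,
    }))
    // Mirrors the initial $match: keep documents sharing at least one item.
    .filter((doc) => doc.sharedCount > 0)
    .sort((a, b) => b.sharedCount - a.sharedCount)
    .slice(0, limit);
}

const foods = [
  { name: "Burger", ingredients: ["bread", "cheese", "tomato", "beef"] },
  { name: "Toast", ingredients: ["bread"] },
  { name: "Salad", ingredients: ["tomato", "lettuce"] },
];

console.log(rankBySharedIngredients(foods, ["bread", "beef"]).map((d) => d.name)); // [ 'Burger', 'Toast' ]

// In the shell (not run here):
// db.collection.aggregate(buildSimilarityPipeline(["bread", "beef"]))
```

Unlike $text, this approach compares ingredients as whole array values, so "goat cheese" and "cheese" count as different ingredients.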

Your best option is to store the ingredients in a single text field, i.e.:
{ingredients : "bread cheese tomato"}
Then create a text index on that field and query for similarity:
db.your_collection.find({$text: {$search: "tomato"}}, {score: {$meta: "textScore"}}).sort({score: {$meta: "textScore"}})
This returns the most relevant documents first.

Related

How to create an index for partial text search on MongoDB?

I'm following the tutorial at https://docs.mongodb.com/manual/core/index-text/
This is the sample data:
db.stores.insert(
[
{ _id: 1, name: "Java Hut", description: "Coffee and cakes" },
{ _id: 2, name: "Burger Buns", description: "Gourmet hamburgers" },
{ _id: 3, name: "Coffee Shop", description: "Just coffee" },
{ _id: 4, name: "Clothes Clothes Clothes", description: "Discount clothing" },
{ _id: 5, name: "Java Shopping", description: "Indonesian goods" }
]
)
Case 1: db.stores.find( { $text: { $search: "java coffee shop" } } ) => FOUND
Case 2: db.stores.find( { $text: { $search: "java" } } ) => FOUND
Case 3: db.stores.find( { $text: { $search: "coff" } } ) => NOT FOUND
I expected case 3 to be FOUND, because "coff" matches part of the word "coffee", just like the query in case 1.
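Case 3 will not work with the $text operator, and the reason is how MongoDB creates text indexes.
MongoDB splits the values of text-indexed fields into words and stores a separate index entry for each unique word, not for each character or substring. Language-specific stop words are dropped, and the remaining words are stemmed.
So in your case, for the first document (_id: 1):
the name field produces 2 index entries:
java
hut
the description field produces 2 index entries, because "and" is an English stop word and is not indexed (entries are shown unstemmed for readability):
coffee
cakes
The $text operator compares the $search terms against these whole-word entries, which is why "coff" does not match.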
If you want to take advantage of a text index, you have to use the $text operator, but it does not give you the partial-match flexibility you are looking for.
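To see why "coff" misses, here is a deliberately simplified model of word-level indexing in plain JavaScript. It makes several assumptions: it lowercases the text, splits on non-alphanumeric characters, and drops a tiny hand-picked stop-word list. It does not stem words or use MongoDB's real tokenizer. The names toIndexTerms and textMatches are hypothetical.

```javascript
// Tiny illustrative stop-word list; MongoDB's real English list is much longer.
const STOP_WORDS = new Set(["and", "the", "a", "an"]);

// Simplified model of what a text index stores for a string: whole words only.
function toIndexTerms(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0 && !STOP_WORDS.has(word));
}

// Like a multi-word $search, a document matches if ANY search word
// equals one of its indexed terms (no substring matching).
function textMatches(doc, search) {
  const terms = new Set([...toIndexTerms(doc.name), ...toIndexTerms(doc.description)]);
  return toIndexTerms(search).some((word) => terms.has(word));
}

const javaHut = { _id: 1, name: "Java Hut", description: "Coffee and cakes" };

console.log(toIndexTerms(javaHut.description)); // [ 'coffee', 'cakes' ]
console.log(textMatches(javaHut, "coffee")); // true
console.log(textMatches(javaHut, "coff")); // false: "coff" is not a whole term
```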
Solution:
You can use $regex with the case-insensitive option (i) and keep each query bounded with skip and limit.
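If you need to return all matching documents and the collection is large, $regex will cause performance issues: a case-insensitive, unanchored regular expression cannot use an index efficiently, so MongoDB has to examine many documents or index keys.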
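Below is a minimal sketch of the $regex approach. It assumes you want to search both name and description, and that the search term comes from user input, so regex metacharacters are escaped first. escapeRegex and buildPartialMatchFilter are hypothetical helpers. The filter object is what you would pass to db.stores.find(...). The last lines check the same pattern in plain JavaScript against the sample data.

```javascript
// Escape characters that have special meaning in regular expressions,
// so a user typing "c++" or "a.b" searches for those literal characters.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: case-insensitive substring match on either field.
function buildPartialMatchFilter(term) {
  const pattern = escapeRegex(term);
  return {
    $or: [
      { name: { $regex: pattern, $options: "i" } },
      { description: { $regex: pattern, $options: "i" } },
    ],
  };
}

// Usage in the shell (not run here):
// db.stores.find(buildPartialMatchFilter("coff")).skip(0).limit(20)

const stores = [
  { _id: 1, name: "Java Hut", description: "Coffee and cakes" },
  { _id: 2, name: "Burger Buns", description: "Gourmet hamburgers" },
  { _id: 3, name: "Coffee Shop", description: "Just coffee" },
  { _id: 4, name: "Clothes Clothes Clothes", description: "Discount clothing" },
  { _id: 5, name: "Java Shopping", description: "Indonesian goods" },
];

// Plain-JavaScript check of the same pattern against the sample data.
const re = new RegExp(escapeRegex("coff"), "i");
const hits = stores.filter((s) => re.test(s.name) || re.test(s.description));
console.log(hits.map((s) => s._id)); // [ 1, 3 ]
```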
You can also check this article https://medium.com/coding-in-depth/full-text-search-part-1-how-to-create-mongodb-full-and-partial-text-search-c09c0bae17a3 and perhaps use wildcard indexes for this, but I don't know whether that is good practice.

How to define an index to use in a Mango Query

I am trying to create a CouchDB Mango Query with an index with the hope that the query runs faster. At the moment I have the following Mango Query which returns what I am looking for but it's slow. Therefore, I assume, I need to create an index to make it faster. I need help figuring out how to create that index.
selector: {
categoryIds: {
$in: categoryIds,
},
},
sort: [{ publicationDate: 'desc' }],
You can assume that my documents are, say, news articles from different categories. Each document has a categoryIds field: an array with the ids of the one or more categories the article belongs to. My query needs to be optimized for requests like "Give me all news that have categoryId1 in their categoryIds array, sorted by publicationDate". What I don't know is: 1. how to define an index, 2. what that index should be, and 3. how to use that index in the "use_index" field of the Mango Query. Any help is appreciated.
Update after Alexis Côté's answer:
If I define the index like this:
{
"_id": "_design/0f11ca4ef1ea06de05b31e6bd8265916c1bbe821",
"_rev": "6-adce50034e870aa02dc7e1e075c78361",
"language": "query",
"views": {
"categoryIds-json-index": {
"map": {
"fields": {
"categoryIds": "asc"
},
"partial_filter_selector": {}
},
"reduce": "_count",
"options": {
"def": {
"fields": [
"categoryIds"
]
}
}
}
}
}
And run the Mango Query like this:
{
"selector": {
"categoryIds": {
"$in": [
"e0bd5f97ac35bdf6893351337d269230"
]
}
},
"use_index": "categoryIds-json-index"
}
It still returns the results, but they are not sorted by publicationDate the way I want. So I am not clear on what solution you are suggesting.
You can create an index as documented here
In your case, you will need an index on the "categoryIds" field.
You can specify the index using "use_index": "_design/<name>"
Note: the query planner should pick this index automatically if it is compatible with the query.

mongodb find with calculated field

I'm trying to create a MongoDB query whose filter uses a value computed from the document being filtered. For example:
var myIdVariable = '1jig23h34r34r30h';
var myVisibleVariable = false;
var myDistanceVariable = 100;
db.getCollection.find({
'_id': myIdVariable,
'isVisible': myVisibleVariable,
'distanceRange': {$lte: {myDistanceVariable - distanceRange}}
})
So I want to filter on distanceRange based on the result of (myDistanceVariable - distanceRange), where distanceRange is the value stored in the same document.
I hope that explains my problem. Is this possible?
Thank you.
Use the $expr operator to build a query expression that can compare fields of the same document. This lets you compare the distanceRange field with a calculation involving the field itself and your variables.
You need the $and expression operator to combine this with the other condition, so your final query would look like the following:
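db.getCollection('collectionName').find({
'$expr': {
'$and': [
// inside $expr, equality must be an aggregation $eq expression;
// a bare { isVisible: ... } object would always evaluate as true
{ '$eq': [ '$isVisible', myVisibleVariable ] },
{ '$lte': [
'$distanceRange', {
'$subtract': [
myDistanceVariable, '$distanceRange'
]
}
] }
]
}
})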
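Because the calculation only involves distanceRange itself, the condition can also be rearranged algebraically. distanceRange <= myDistanceVariable - distanceRange is the same as 2 * distanceRange <= myDistanceVariable, i.e. distanceRange <= myDistanceVariable / 2. Assuming every document has a numeric distanceRange (not NaN or missing), that gives a plain query without $expr, which can also use an ordinary index on distanceRange. The sketch below uses a hypothetical helper, buildSimplifiedFilter, and checks in plain JavaScript that both predicates agree on integer inputs.

```javascript
// Hypothetical helper: the same condition as the $expr query, rearranged so
// the stored field stands alone on one side (d <= m - d  <=>  d <= m / 2).
function buildSimplifiedFilter(myVisibleVariable, myDistanceVariable) {
  return {
    isVisible: myVisibleVariable,
    distanceRange: { $lte: myDistanceVariable / 2 },
  };
}

// The original per-document predicate, written in plain JavaScript.
const original = (d, m) => d <= m - d;
// The rearranged predicate used by buildSimplifiedFilter.
const simplified = (d, m) => d <= m / 2;

// Compare both forms over a grid of integer distances and limits.
let mismatches = 0;
for (let m = -20; m <= 20; m++) {
  for (let d = -20; d <= 20; d++) {
    if (original(d, m) !== simplified(d, m)) mismatches++;
  }
}
console.log(mismatches); // 0

// Usage in the shell (not run here):
// db.getCollection('collectionName').find(buildSimplifiedFilter(myVisibleVariable, myDistanceVariable))
```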
If your MongoDB server doesn't support the $expr operator (it was added in MongoDB 3.6), go the aggregation framework route with $redact:
db.getCollection('collectionName').aggregate([
{ "$redact": {
"$cond": [
{
'$and': [
{ '$eq': [ '$isVisible', myVisibleVariable ] },
{ '$lte': [
'$distanceRange', {
'$subtract': [
myDistanceVariable, '$distanceRange'
]
}
] }
]
},
"$$KEEP",
"$$PRUNE"
]
} }
])
Note
Including the _id in the query expressions means you are narrowing down your selection to just a single document and the query may not return any results since it's looking for a specific document with that _id AND the same document should satisfy the other query expressions.

Using $concat with $project is giving error : 'MongoError: $concat only supports strings, not double'?

I have a mongoose model in which some fields look like this:
var AssociateSchema = new Schema({
personalInformation: {
familyName: { type: String },
givenName: { type: String }
}
})
I want to perform a $regex on the concatenation of familyName and givenName (something like familyName + " " + givenName). For this purpose I'm using the aggregation framework, with $concat inside $project to produce a fullName field and then $regex inside $match to search on that field. The code in mongoose for my query is:
Associate.aggregate([
{ $project: {fullName: { $concat: [
'personalInformation.givenName','personalInformation.familyName']}}},
{ $match: { fullName: { 'active': true, $regex: param, $options: 'i' } } }
])
But it's giving me error:
"MongoError: $concat only supports strings, not double" on the first stage of my aggregate pipeline, i.e. the $project stage.
Can anyone point out what I'm doing wrong ?
I also got this error and then discovered that one of the documents in the collection was indeed to blame. The way I fished it out was by filtering on the field's type, as explained in the docs:
db.addressBook.find( { "zipCode" : { $type : "double" } } )
I found that the field had the value NaN, which to my eyes isn't a number, but MongoDB stores it as a double.
Looking at your code, I'm not sure why $concat isn't working for you unless some numbers have sneaked into your document fields. Have you tried putting a $ sign in front of your field paths, as in '$personalInformation.givenName'? Without it, $concat treats the value as a literal string. Are you sure every single familyName and givenName in your collection is a string, not a double? All it takes is one double for your $concat to fail.
In any case, I had a similar type-mismatch problem with actual doubles. $concat indeed supports only strings, and usually all you'd do is cast any non-strings to strings, but alas, at the time of this writing MongoDB 3.6.2 does not yet support integer/double => string casting, only date => string casting. Sad face. (MongoDB 4.0 later added the $toString operator for this.)
That said, try adding this projection hack at the top of your pipeline. It worked for me as a typecast. Just make sure you provide a long enough byte length (a 128-byte name is pretty long, so you should be okay).
{
$project: {
castedGivenName: {
$substrBytes: [ '$personalInformation.givenName', 0, 128 ]
},
castedFamilyName: {
$substrBytes: [ '$personalInformation.familyName', 0, 128 ]
}
}
},
{
$project: {
fullName: {
$concat: [
'$castedGivenName',
' ',
'$castedFamilyName'
]
}
}
},
{
$match: { fullName: { 'active': true, $regex: param, $options: 'i' } }
}
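On MongoDB 4.0 and later this hack should no longer be needed, because the $toString aggregation operator converts numeric values such as doubles to strings. The sketch below builds the same fullName projection on that assumption. buildFullNamePipeline is a hypothetical name. $ifNull stops a missing name part from turning the whole $concat into null. The question's 'active': true condition is left out because it is unclear which field it belongs to. The search term is assumed to be a regex-safe string.

```javascript
// Hypothetical builder for MongoDB 4.0+: cast each name part with $toString,
// fall back to "" when a part is missing or null, then join with a space.
function buildFullNamePipeline(param) {
  const asString = (path) => ({ $ifNull: [{ $toString: path }, ""] });
  return [
    {
      $project: {
        fullName: {
          $concat: [
            asString("$personalInformation.givenName"),
            " ",
            asString("$personalInformation.familyName"),
          ],
        },
      },
    },
    { $match: { fullName: { $regex: param, $options: "i" } } },
  ];
}

// Usage with the mongoose model from the question (not run here):
// const results = await Associate.aggregate(buildFullNamePipeline("smith"));

console.log(JSON.stringify(buildFullNamePipeline("smith")[1]));
// {"$match":{"fullName":{"$regex":"smith","$options":"i"}}}
```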
I managed to make it work by using the $substr operator, so the $project stage of my aggregate pipeline is now:
{ $project: {
fullName: {
$concat: [
{ $substr: ['$personalInformation.givenName', 0, -1] }, ' ', { $substr: ['$personalInformation.familyName', 0, -1] }
]
}
}
}

Most efficient way to check if element exists in a set

In my MongoDB database I have a collection holding user posts.
Each post has a field called "likes", which holds an array of the ids of the users who have liked that post. When querying, I would like to pass a user id to my query and get a boolean in the result telling me whether that id exists in the array, i.e. whether the user has already liked the post. I understand this would be easy with two queries, one to get the post and one to check whether the user has liked it, but I would like to find the most efficient way to do this.
For example, one of my documents looks like this
{
_id: 24jef247jos991,
post: "Test Post",
likes: ["userid1", "userid2"]
}
When I query as "userid1", I would like it to return:
{
_id: 24jef247jos991,
post: "Test Post",
likes: ["userid1", "userid2"],
userLiked: true
}
But when I query as, say, "userid3", I would like:
{
_id: 24jef247jos991,
post: "Test Post",
likes: ["userid1", "userid2"],
userLiked: false
}
You can add an $addFields stage that checks each document's likes array against the input user id:
db.collection.aggregate( [
{
$addFields: {
"userLiked":{ $in: [ "userid1", "$likes" ] }
}
}
] )
Starting from MongoDB 3.4, you can use the $in aggregation operator to check whether an array contains a given element, and the $addFields stage to add the newly computed value to your documents without explicitly listing the other fields.
db.collection.aggregate( [
{ "$addFields": { "userLiked": { "$in": [ "userid1", "$likes" ] } } }
])
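One caveat is worth guarding against: the $in aggregation operator raises an error if its second argument is not an array, for example when a post has no likes field yet. The sketch below defaults a missing likes field to an empty array with $ifNull. userLikedStage and withUserLiked are hypothetical helpers. The plain-JavaScript function only mirrors the stage, so the expected results can be checked without a server.

```javascript
// Hypothetical helper: $addFields stage that flags whether userId is in likes,
// treating a missing or null likes field as an empty array.
function userLikedStage(userId) {
  return {
    $addFields: {
      userLiked: { $in: [userId, { $ifNull: ["$likes", []] }] },
    },
  };
}

// Plain-JavaScript mirror of the same logic, applied to one document.
function withUserLiked(doc, userId) {
  return { ...doc, userLiked: (doc.likes ?? []).includes(userId) };
}

const post = { _id: "24jef247jos991", post: "Test Post", likes: ["userid1", "userid2"] };

console.log(withUserLiked(post, "userid1").userLiked); // true
console.log(withUserLiked(post, "userid3").userLiked); // false
console.log(withUserLiked({ _id: "x", post: "No likes yet" }, "userid1").userLiked); // false

// In the shell (not run here):
// db.collection.aggregate([userLikedStage("userid1")])
```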
In MongoDB 3.2, you can use the $setIsSubset operator with an array literal ([]) to do this. The downside of this approach is that you need to $project all the fields of your document manually. Also, $setIsSubset treats the arrays as sets (duplicates are ignored), which may not be what you want.
db.collection.aggregate([
{ "$project": {
"post": 1, "likes": 1,
"userLiked": { "$setIsSubset": [ [ "userid3" ], "$likes" ] }
}}
])
Finally, if your mongod version is 3.0 or older, you need to use the $literal operator instead of the [] array literal.
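On those older versions, the $setIsSubset projection would wrap the one-element array in $literal. The following is a sketch only: it has not been checked against a 3.0 server, and legacyUserLikedPipeline is a hypothetical name.

```javascript
// Hypothetical builder for mongod 3.0 or older: wrap the one-element array in
// $literal instead of writing a bare [] array literal inside the expression.
function legacyUserLikedPipeline(userId) {
  return [
    {
      $project: {
        post: 1,
        likes: 1,
        userLiked: { $setIsSubset: [{ $literal: [userId] }, "$likes"] },
      },
    },
  ];
}

// In the mongo shell (not run here):
// db.collection.aggregate(legacyUserLikedPipeline("userid3"))

console.log(JSON.stringify(legacyUserLikedPipeline("userid3")[0].$project.userLiked));
// {"$setIsSubset":[{"$literal":["userid3"]},"$likes"]}
```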

Resources