MongoDB full text search on string array - node.js

So I'm using Node.js with MongoDB for my web application. I'm having some trouble creating a text index for my schema and searching for text within an array. I've looked at the mongo docs but haven't found anything related to this specifically.
My current implementation searches successfully on regular String values, but queries for text inside [String] fields don't return anything.
Here's my REST call:
...console.log("Query string: " + str);
var qry = {
"$text": {
"$search": str
}
};
model.find(qry, function (err, results) {...
And when I create my schema:
var blah = new Schema({
foo : String,
bar : [String],
...
blah.index({
foo: 'text',
bar: 'text'
});
No query returns results that match only in bar, whereas a query string that matches something in foo works fine.

Double check that you've created the correct indexes on the correct collections and the queries are being issued to the correct collections. Indexing an array works for me:
> db.test.drop()
> db.test.insert({ "_id" : 0, "a" : "dogs are good" })
> db.test.insert({ "_id" : 1, "a" : "I like dogs", "b" : ["where's my dog?", "here, have a cat"] })
> db.test.insert({ "_id" : 2, "b" : ["she borrowed my dog", "my frogs are croaking"] })
> db.test.ensureIndex({ "a" : "text", "b" : "text" })
> db.test.find({ "$text" : { "$search" : "dogs" } }, { "_id" : 1 })
{ "_id" : 0 }
{ "_id" : 2 }
{ "_id" : 1 }

Okay, I finally figured it out! It turns out that restarting the app with grunt serve doesn't update existing indexes in the database. I had created a text index for "foo" only, and that index wasn't updated when I added "bar" to it. I had to run this in the mongo shell:
db.dropDatabase()
The next time I ran it, the database was recreated and the proper indexes were set. If anyone else runs into this issue, try running db.<collection>.getIndexes() to see which fields your text index actually covers.
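A collection can have at most one text index, so a text index created earlier on "foo" alone is not silently widened when the schema adds "bar". In the output of db.collection.getIndexes(), a text index shows its key as { _fts: 'text', _ftsx: 1 } and lists the indexed fields under weights. The helper below is a hypothetical sketch that reads that output (pasted in as plain objects) and returns the text-indexed fields; the sample input is illustrative, not captured from a real server.

```javascript
// Hypothetical helper: given the array returned by db.collection.getIndexes(),
// return the field names covered by the collection's text index (if any).
// Text indexes store their key as { _fts: 'text', _ftsx: 1 } and list the
// indexed fields (with their weights) in the `weights` property.
function textIndexedFields(indexes) {
  const textIndex = indexes.find((ix) => ix.key && ix.key._fts === 'text');
  if (!textIndex) return [];
  return Object.keys(textIndex.weights || {});
}

// Illustrative input resembling the stale state described above:
// the text index was built when only "foo" was part of it.
const staleIndexes = [
  { v: 1, key: { _id: 1 }, name: '_id_' },
  { v: 1, key: { _fts: 'text', _ftsx: 1 }, name: 'foo_text', weights: { foo: 1 } },
];

console.log(textIndexedFields(staleIndexes)); // [ 'foo' ]
```

If bar is missing from that list, dropping only the stale index (for example db.collection.dropIndex('foo_text'), using the name shown by getIndexes()) and letting the app recreate it is a less drastic option than dropping the whole database.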

Related

mongoose return null in the query

My MongoDB has a collection with data like this:
{
"_id" : ObjectId("62ead2a8dd6922cfd6f466e4"),
"t" : "d",
"u" : {
"_id" : ObjectId("621d3469dd01e282b9a62321"),
"username" : "helxsz"
},
"users" : [
ObjectId("621d3469dd01e282b9a62321"),
ObjectId("628ee99ed0a58e00496a0730")
],
"createdAt" : ISODate("2022-08-03T19:55:20.965Z"),
"updatedAt" : ISODate("2022-08-03T19:55:20.965Z")
}
I am using node.js and mongoose to query the document.
let query = {
u:{
_id: "621d3469dd01e282b9a62321",
username: "helxsz"
},
t:'d',
};
collection
.findOne(query, 'u t ')
.exec(getResult);
Why does the executed query return null?
Maybe it's because u._id is an ObjectId in your DB but a string in your query. Mongoose should cast it (if the schema declares that path as an ObjectId), but I've run into uncast values like this many times.
So try converting it to an ObjectId.
Another problem is matching the whole embedded object like this:
{
u:{
_id: "621d3469dd01e282b9a62321",
username: "helxsz"
}
}
Written this way, MongoDB looks for an embedded document that is exactly equal to the query object: the same fields, in the same order, and no extra fields. That is why this form does not always work. To match the individual fields reliably, use dot notation:
let query = {
"u._id": "621d3469dd01e282b9a62321", // maybe casting to ObjectId is necessary
"u.username": "helxsz",
t: "d"
}
For more info, see "Match an Embedded/Nested Document" in the MongoDB manual.
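If you build filters from nested objects, a small helper can turn them into dot-notation filters automatically. This is a hypothetical sketch, not a Mongoose or driver API: it recurses into plain objects only, and leaves operator objects such as { $in: [...] } untouched so they are not mangled into paths like 'n.$in'.

```javascript
// Hypothetical helper: flatten a nested filter such as { u: { _id, username } }
// into dot-notation keys ({ 'u._id': ..., 'u.username': ... }) so each field is
// matched on its own instead of requiring an exact match of the whole subdocument.
function toDotNotation(obj, prefix = '', out = {}) {
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    const isPlainObject =
      value !== null &&
      typeof value === 'object' &&
      Object.getPrototypeOf(value) === Object.prototype;
    // Keep operator objects (keys starting with "$") and empty objects as values.
    const hasOperator = isPlainObject && Object.keys(value).some((k) => k.startsWith('$'));
    if (isPlainObject && !hasOperator && Object.keys(value).length > 0) {
      toDotNotation(value, path, out);
    } else {
      out[path] = value; // arrays, dates, ObjectIds, primitives, operators
    }
  }
  return out;
}

const query = toDotNotation({
  u: { _id: '621d3469dd01e282b9a62321', username: 'helxsz' },
  t: 'd',
});
// query now has the keys 'u._id', 'u.username' and 't'
```

The _id is still a string here; if that path is not declared as an ObjectId in the schema, convert it first (for example with new mongoose.Types.ObjectId(id)) before building the filter.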

complicated mongoose pull list of data from api and insert into mongodb if it doesn't already exist

I am connecting to the Yelp API using the RapidAPI module in Nodejs. I am able to request a token, connect, and request data, retrieve that data, and insert the relevant information for each result it into mongodb. Here's where it gets complicated...
Let's say I make a Yelp API request and search for bars. I get a list of bars and insert them into the database. Let's say one of these in the list is "Joe's Bar & Grill". One of the fields in my mongodb is "type" and it's an array. So now, this particular document will look something like this:
{
id: 'joes-bar-and-grill',
name: 'Joe\'s Bar & Grill',
type: ['bar']
}
But then I run another request on the Yelp API on "restaurants", and in this list "Joe's Bar & Grill" shows up again. Instead of inserting a new duplicate document into mongodb, I'd like the existing document to end up looking like this:
{
id: 'joes-bar-and-grill',
name: 'Joe\'s Bar & Grill',
type: ['bar', 'restaurant']
}
In addition to this, let's say I run another request again for "bars", and "Joe's Bar & Grill" comes up again. I don't want it to automatically insert "bar" into the type array again, if "bar" already exists in its array.
I've tried findOneAndUpdate with upsert: true and a $push of new data into the array, but I cannot get it to work at all. Does anyone have any ideas?
You can use findOneAndUpdate, combined with $addToSet (to make sure that an entry in the array only exists once) and $each (to allow passing arrays to $addToSet):
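Bar.findOneAndUpdate({ id : 'joes-bar-and-grill' }, {
  // Plain fields go in $set; $addToSet + $each only appends values not yet in "type".
  $set : { id : 'joes-bar-and-grill', name : 'Joe\'s Bar & Grill' },
  $addToSet : { type : { $each : [ 'restaurant' ] } }
}, { upsert : true }).exec(); // exec() actually runs the query and returns a promise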
EDIT: now that you posted your entire code, the problem becomes more obvious.
For one, I'm not sure the third and fourth arguments you're passing to Location.update() make sense. As far as I know, the third should be an options object and the fourth a callback function.
Secondly, it looks like you're just ignoring any update errors.
And lastly, this isn't going to work:
for (var i = 0; i < payload.businesses.length; i++) { Location.update(...) }
Because Location.update() is asynchronous, its callbacks only run after the loop has finished, so any callback that reads i sees its final value (var is function-scoped; this is the classic closure-inside-a-loop problem, and there are many explanations of it here on SO).
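The scoping problem can be reproduced without MongoDB. In this minimal sketch, callbacks are stored and only invoked after the loop ends, the same way the callback of an asynchronous update would be; the function names are illustrative only.

```javascript
// Each callback is called after the loop has finished, like an async callback would be.
function withVar() {
  const callbacks = [];
  for (var i = 0; i < 3; i++) callbacks.push(() => i); // one shared `i` for all callbacks
  return callbacks.map((cb) => cb());
}

function withLet() {
  const callbacks = [];
  for (let i = 0; i < 3; i++) callbacks.push(() => i); // a fresh `i` per iteration
  return callbacks.map((cb) => cb());
}

console.log(withVar()); // [ 3, 3, 3 ]
console.log(withLet()); // [ 0, 1, 2 ]
```

Switching to let fixes the captured value, but it still fires every update at once, which is why limiting concurrency is worth doing as well.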
You're going to need a library that will provide you with better async support, and preferably one that will also help limiting the number of update queries.
One such library is async. Using it, your code would become something like this:
const async = require('async');
...
// Process at most 5 businesses at a time. Each upsert calls `callback` when it
// finishes (this assumes a Mongoose version that still accepts callbacks).
async.eachLimit(payload.businesses, 5, function(business, callback) {
  Location.update({ yelpID : business.id }, {
    $set : {
      name : business.name,
      latitude : business.location.latitude,
      longitude : business.location.longitude,
      address1 : business.location.address1,
      address2 : business.location.address2,
      address3 : business.location.address3,
      city : business.location.city,
      state : business.location.state,
      zip_code : business.location.zip_code,
      country : business.location.country,
      timezone : 'CST'
    },
    $addToSet : { type : 'bar' }
  }, { upsert : true }, callback);
}, function(err) {
  if (err) {
    console.error(err);
  } else {
    console.log('All documents inserted');
  }
});
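If you would rather not add the async dependency, a similar concurrency limit can be sketched with plain Promises. This is a swapped-in technique, not the async library's implementation: eachLimit below is a hypothetical helper, and in a promise-based Mongoose the worker might be something like (business) => Location.updateOne(filter, update, { upsert: true }).exec(). The usage part fakes the database call with a short timer.

```javascript
// Hypothetical helper: run `worker(item)` for every item with at most `limit`
// calls pending at any time. The returned promise rejects on the first error.
async function eachLimit(items, limit, worker) {
  let next = 0;
  // Each runner keeps claiming the next unprocessed item until none are left.
  async function runner() {
    while (next < items.length) {
      const item = items[next++];
      await worker(item);
    }
  }
  const runners = [];
  for (let r = 0; r < Math.min(limit, items.length); r++) runners.push(runner());
  await Promise.all(runners);
}

// Usage with a fake asynchronous "update" that just waits a few milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
let inFlight = 0;
let maxInFlight = 0;
const done = [];

const finished = eachLimit([...Array(12).keys()], 5, async (n) => {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await sleep(5);
  done.push(n);
  inFlight--;
});
```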
You can use the $addToSet operator. From the documentation:
The $addToSet operator adds a value to an array unless the value is already present, in which case $addToSet does nothing to that array. $addToSet only ensures that there are no duplicate items added to the set and does not affect existing duplicate elements. $addToSet does not guarantee a particular ordering of elements in the modified set. If the field is absent in the document to update, $addToSet creates the array field with the specified value as its element. If the field is not an array, the operation will fail.
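To make those rules concrete, here is a plain-JavaScript simulation (not MongoDB code) of $addToSet for primitive values such as strings. It always appends, although the real operator does not promise any particular order, and for embedded documents MongoDB compares whole documents rather than using includes().

```javascript
// Simulation of the $addToSet rules quoted above, for primitive values only.
// `doc` is modified in place and returned.
function addToSet(doc, field, value) {
  if (!(field in doc)) {
    doc[field] = [value]; // field absent: create the array with the value
  } else if (!Array.isArray(doc[field])) {
    throw new TypeError(`${field} is not an array`); // the real update fails
  } else if (!doc[field].includes(value)) {
    doc[field].push(value); // add only values that are not already present
  }
  // Existing duplicates (if any) are left alone, as with the real operator.
  return doc;
}

const joes = { id: 'joes-bar-and-grill' };
addToSet(joes, 'type', 'bar');
addToSet(joes, 'type', 'restaurant');
addToSet(joes, 'type', 'bar'); // no-op: 'bar' is already present
console.log(joes.type); // [ 'bar', 'restaurant' ]
```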
The solution below assumes that each update receives a single type, not an array. If the input is itself an array, you can use robertklep's solution with the $each operator.
db.mycoll.update(
{ "id" : "joes-bar-and-grill" },
{
$set:{
name : 'Joe\'s Bar & Grill',
},
$addToSet : { type : 'restaurant' }
},
true, false);
I have also used the $set operator, which replaces the value of a field with the specified value. The $set operator expression has the following form:
{ $set: { field1: value1, ... } }
Here is the mongo shell output to explain it further:
> db.mycoll.find({ "id" : "joes-bar-and-grill" });
// NO RESULT
> db.mycoll.update(
... { "id" : "joes-bar-and-grill" },
... {
... $set:{
... name : 'Joe\'s Bar & Grill',
... },
... $addToSet : { type : 'restaurant' }
... },
... true, false);
WriteResult({
"nMatched" : 0,
"nUpserted" : 1,
"nModified" : 0,
"_id" : ObjectId("58e719b4d543c5e30d615d59")
})
// INSERTED A NEW DOCUMENT AS IT DOES NOT EXIST
> db.mycoll.find({ "id" : "joes-bar-and-grill" }); // FINDING THE OBJECT
{ "_id" : ObjectId("58e719b4d543c5e30d615d59"), "id" : "joes-bar-and-grill", "name" : "Joe's Bar & Grill", "type" : [ "restaurant" ] }
> db.mycoll.update(
... { "id" : "joes-bar-and-grill" },
... {
... $set:{
... name : 'Joe\'s Bar & Grill',
... },
... $addToSet : { type : 'bar' }
... },
... true, false);
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
// UPDATING THE DOCUMENT WITH NEW TYPE : "bar"
> db.mycoll.findOne({ "id" : "joes-bar-and-grill" });
{
"_id" : ObjectId("58e719b4d543c5e30d615d59"),
"id" : "joes-bar-and-grill",
"name" : "Joe's Bar & Grill",
"type" : [
"restaurant",
"bar"
]
}

mongoose $match won't return document

I use two ways to retrieve documents from my collection. The first one:
db.comments.find({"nid" : req.body.data});
returns many documents like this:
{
"nid" : 20404,
"_id" : ObjectId("5638ba331294943d3d0a092b"),
"uid" : 1937,
"posted" : ISODate("2015-11-03T13:44:19.811Z"),
"text" : "txt",
"title" : "Test nid 2",
"stars" : 3,
"__v" : 0
}
For another query I need to use aggregate:
var pipeline = [
  {$match: {nid:req.body.data}}
];
Comments.aggregate(pipeline, function(err, rank){
  if(err) {
    // Return so the success response below is not also sent.
    return res.status(500).send("Error: " + String(err));
  }
  res.send(rank);
});
It returns [], an empty array.
Any ideas?
You can use the built-in function chaining Mongoose provides. Besides match(), the Aggregate object also has sort(), project(), group(), and several other helpers; see the Mongoose Aggregate API documentation for the full list.
Comments.aggregate().match({nid:req.body.data})
  .exec(function(err,rank){
    if(err) {
      // Return so the success response below is not also sent.
      return res.status(500).send("Error: " + String(err));
    }
    res.send(rank);
  });
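One likely cause, stated here as an assumption about the asker's setup: nid is stored as a number (20404), while req.body.data usually arrives as a string. Mongoose casts find() filters according to the schema but does not cast aggregation pipeline stages, so { $match: { nid: '20404' } } matches nothing. A sketch of converting the value before building the pipeline, with buildNidPipeline as a hypothetical helper:

```javascript
// Hypothetical helper: build the $match pipeline with `nid` converted to a
// number, since Mongoose passes aggregate() pipelines through without casting.
function buildNidPipeline(rawNid) {
  const nid = Number(rawNid);
  // Reject empty strings (Number('') is 0) and anything that is not a whole number.
  if (String(rawNid).trim() === '' || !Number.isInteger(nid)) {
    throw new TypeError('nid must be an integer, got ' + JSON.stringify(rawNid));
  }
  return [{ $match: { nid: nid } }];
}

// e.g. req.body.data === '20404' from a form or JSON body
const pipeline = buildNidPipeline('20404');
console.log(JSON.stringify(pipeline)); // [{"$match":{"nid":20404}}]
```

In the route this would be used as Comments.aggregate(buildNidPipeline(req.body.data)), ideally inside a try/catch that answers with a 400 status for invalid input.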

Sort descending String with Number in Mongo DB

I currently have a DB with two fields, but only one of them matters for the purpose of this question.
Imagine a DB with a single String field (let's call it "Tags") whose values follow the pattern [a-z]*[0-9]*, like:
test129
test130
some43
some44
some45
...
My application needs to generate new "Tags", given a prefix "identifier" (like test or some).
So let's say I input test as the prefix and 100 as the number of "Tags" to generate. The application then:
1. Finds the LAST tag with the prefix test.
2. Parses the number after the prefix.
3. Adds 1 to that number and generates 100 tags in sequence.
The output in this specific case would be: test131, test132, ..., test230.
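The generation step itself is simple once the last number is known correctly. A minimal sketch, where generateTags is a hypothetical name and lastNumber is assumed to come from the lookup step:

```javascript
// Sketch of the generation step: given the prefix, the last number in use and
// how many tags to create, return the next consecutive tags.
function generateTags(prefix, lastNumber, count) {
  const tags = [];
  for (let n = lastNumber + 1; n <= lastNumber + count; n++) {
    tags.push(prefix + n);
  }
  return tags;
}

const tags = generateTags('test', 130, 100);
console.log(tags[0], tags[tags.length - 1], tags.length); // test131 test230 100
```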
I implemented this, and it was working just great with Mongoose. However, when I tried to generate a "Tag" from an existing tag numbered above 1000, I found that the first step was flawed. It was returning, say, test999 instead of test1200, so the iteration started from 999 and produced errors, since the tags must be unique.
This is because sorting strings differs from sorting numbers. I know the problem, but how can I solve it in a simple way, without having to create extra fields?
UPDATE: Part of the code where I find the tag:
lastAliasNumber: function (next){
console.log('process.lastAliasNumber');
// Skip if prefix is not set (tags already have name)
if(!prefix) return next();
// Build RegExp to find tags with the prefix given
var regexp = new RegExp('^'+prefix+'[0-9]+$', 'i');
Models.Tag
.findOne()
.where({
alias: regexp
})
.sort('-alias')
.exec(function (err, tag){
if(err) return next(err);
// Remove prefix and try parsing number
var lastId = 100;
if(tag){
// Remove prefix
var number = tag.alias.toLowerCase().replace(prefix, '');
// Get number from it
number = parseInt(number, 10);
if(number) lastId = number;
}
console.log('lastAliasNumber', lastId);
next(null, lastId);
});
},
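The flaw comes from sort('-alias'): in descending string order 'test999' ranks above 'test1200', because '9' > '1' at the first digit. One hedged workaround that keeps a single field is to fetch only the aliases matching the regex (for example Models.Tag.find({ alias: regexp }).select('alias')) and take the numeric maximum in JavaScript. That reads every matching alias instead of one document, which may be acceptable for a modest number of tags. maxAliasNumber and escapeRegExp are hypothetical helpers; escaping the prefix also keeps characters like "." from acting as regex syntax.

```javascript
// Escape RegExp metacharacters so a prefix like "a.b" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: largest numeric suffix among aliases shaped like
// <prefix><digits> (case-insensitive), or `fallback` if none match.
function maxAliasNumber(aliases, prefix, fallback = 100) {
  const pattern = new RegExp('^' + escapeRegExp(prefix) + '([0-9]+)$', 'i');
  let max = null;
  for (const alias of aliases) {
    const match = pattern.exec(alias);
    if (match) {
      const n = parseInt(match[1], 10); // compare as numbers, not strings
      if (max === null || n > max) max = n;
    }
  }
  return max === null ? fallback : max;
}

const aliases = ['test129', 'test999', 'test1200', 'some43'];
// Descending string order picks the wrong "last" tag:
const testAliases = aliases.filter((a) => a.startsWith('test'));
console.log([...testAliases].sort().reverse()[0]); // test999
console.log(maxAliasNumber(aliases, 'test'));        // 1200
```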
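With a plain string sort there is no ready way to do this kind of sorting in MongoDB: as your field is a string field, it is sorted by the rules of string comparison, and a single field cannot be sorted as a mix of types. (MongoDB 3.4 and later do offer a collation with numericOrdering: true, e.g. .collation({ locale: 'en', numericOrdering: true }), which compares runs of digits numerically and covers this case.)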
Your best bet (assuming you cannot simply use an integer type and wish to keep only one field) would be to work out the theoretical maximum number of entries and pad the numeric part of your strings with leading zeros to a fixed width.
E.g., assuming six-digit numbers (fewer than 1,000,000 entries), your strings would be:
test000999
test001200
test000131
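A minimal sketch of the padding idea, assuming a fixed width of six digits as in the example; padTag is a hypothetical helper that refuses numbers which no longer fit, since those would break the ordering again.

```javascript
// Zero-pad the numeric part to a fixed width so that string order matches
// numeric order (valid only while numbers stay below 10^width).
function padTag(prefix, n, width) {
  const digits = String(n);
  if (digits.length > width) {
    throw new RangeError(`${n} does not fit in ${width} digits`);
  }
  return prefix + digits.padStart(width, '0');
}

const padded = [999, 1200, 131].map((n) => padTag('test', n, 6));
console.log(padded); // [ 'test000999', 'test001200', 'test000131' ]

const sorted = [...padded].sort(); // plain string sort is now numeric order
console.log(sorted); // [ 'test000131', 'test000999', 'test001200' ]
```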
Another option would be to store these entries as subdocuments with two distinct datatypes: a string part and a numeric part.
Consider my quick example documents below:
> db.bar.insert({x:{text:"test",num:1}})
WriteResult({ "nInserted" : 1 })
> db.bar.insert({x:{text:"test",num:100}})
WriteResult({ "nInserted" : 1 })
> db.bar.insert({x:{text:"test",num:2}})
WriteResult({ "nInserted" : 1 })
> db.bar.insert({x:{text:"sweet",num:2}})
WriteResult({ "nInserted" : 1 })
> db.bar.insert({x:{text:"sweet",num:1}})
WriteResult({ "nInserted" : 1 })
> db.bar.find().sort({x:1})
{ "_id" : ObjectId("55fa469d695632545d3aff1f"), "x" : { "text" : "sweet", "num" : 1 } }
{ "_id" : ObjectId("55fa469b695632545d3aff1e"), "x" : { "text" : "sweet", "num" : 2 } }
{ "_id" : ObjectId("55fa468a695632545d3aff1b"), "x" : { "text" : "test", "num" : 1 } }
{ "_id" : ObjectId("55fa4695695632545d3aff1d"), "x" : { "text" : "test", "num" : 2 } }
{ "_id" : ObjectId("55fa468f695632545d3aff1c"), "x" : { "text" : "test", "num" : 100 } }
> db.bar.find().sort({x:-1})
{ "_id" : ObjectId("55fa468f695632545d3aff1c"), "x" : { "text" : "test", "num" : 100 } }
{ "_id" : ObjectId("55fa4695695632545d3aff1d"), "x" : { "text" : "test", "num" : 2 } }
{ "_id" : ObjectId("55fa468a695632545d3aff1b"), "x" : { "text" : "test", "num" : 1 } }
{ "_id" : ObjectId("55fa469b695632545d3aff1e"), "x" : { "text" : "sweet", "num" : 2 } }
{ "_id" : ObjectId("55fa469d695632545d3aff1f"), "x" : { "text" : "sweet", "num" : 1 } }
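MongoDB compares embedded documents field by field in stored order, which is why sort({x: 1}) above orders by text first and then by num. The plain-JavaScript comparator below is a simulation of that ordering for these sample documents, not MongoDB code. In a real query, sorting on the explicit paths, e.g. sort({ 'x.text': -1, 'x.num': -1 }), expresses the same intent without depending on field order.

```javascript
// Simulates sort({ x: 1 }) for subdocuments shaped like { text, num }:
// compare `text` first, then `num` numerically.
function compareX(a, b) {
  if (a.x.text !== b.x.text) return a.x.text < b.x.text ? -1 : 1;
  return a.x.num - b.x.num;
}

const docs = [
  { x: { text: 'test', num: 1 } },
  { x: { text: 'test', num: 100 } },
  { x: { text: 'test', num: 2 } },
  { x: { text: 'sweet', num: 2 } },
  { x: { text: 'sweet', num: 1 } },
];

const label = (d) => `${d.x.text}${d.x.num}`;
const ascending = [...docs].sort(compareX).map(label);
const descending = [...docs].sort((a, b) => compareX(b, a)).map(label);
console.log(ascending.join(' '));  // sweet1 sweet2 test1 test2 test100
console.log(descending.join(' ')); // test100 test2 test1 sweet2 sweet1
```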

Remove duplicate array objects mongodb

I have an array that contains items duplicated in BOTH fields (roundId and money). Is there a way to remove one of the duplicate array items?
userName: "abc",
_id: 10239201141,
rounds:
[{
"roundId": "foo",
"money": "123"
},// Keep one of these
{// Keep one of these
"roundId": "foo",
"money": "123"
},
{
"roundId": "foo",
"money": "321" // Not a duplicate.
}]
I'd like to remove one of the first two, and keep the third because the id and money are not duplicated in the array.
Thank you in advance!
Edit: I found this:
db.users.ensureIndex({'rounds.roundId':1, 'rounds.money':1}, {unique:true, dropDups:true})
This doesn't help me. Can someone help me? I spent hours trying to figure this out.
The thing is, I ran my node.js website on two machines, so it was pushing the same data twice. Knowing this, each duplicate should be right next to its original (one index away). I made a simple for loop that can detect duplicate data in my situation; how could I implement this with MongoDB so it removes the array object AT that array index?
for (var i = 0; i < data.length; i++) {
  var rounds = data[i].rounds;
  // Start at 1 so each element is compared with the one right before it.
  for (var ii = 1; ii < rounds.length; ii++) {
    var current = rounds[ii];
    var previous = rounds[ii - 1];
    if (current.roundId == previous.roundId && current.money == previous.money) {
      console.log("Found a match at index " + ii);
    }
  }
}
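A hedged sketch of the client-side fix: build a deduplicated copy of each user's rounds array (keeping the first occurrence of every roundId/money pair, in the original order) and write it back with $set, e.g. db.users.updateOne({ _id: user._id }, { $set: { rounds: cleaned } }). Removing by value with $pull would not work here, since $pull removes every matching element, including the copy you want to keep. dedupeRounds is a hypothetical helper.

```javascript
// Hypothetical helper: return a copy of `rounds` without repeated
// (roundId, money) pairs, keeping the first occurrence and the original order.
function dedupeRounds(rounds) {
  const seen = new Set();
  const result = [];
  for (const round of rounds) {
    // Key built from named fields, so key order inside each object does not matter.
    const key = JSON.stringify([round.roundId, round.money]);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(round);
    }
  }
  return result;
}

const rounds = [
  { roundId: 'foo', money: '123' },
  { roundId: 'foo', money: '123' },
  { roundId: 'foo', money: '321' },
];
console.log(dedupeRounds(rounds).length); // 2
```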
Use the aggregation framework to compute a deduplicated version of each document (the example below calls the array field stats; in your documents it is rounds):
db.test.aggregate([
{ "$unwind" : "$stats" },
{ "$group" : { "_id" : "$_id", "stats" : { "$addToSet" : "$stats" } } }, // use $first to add in other document fields here
{ "$out" : "some_other_collection_name" }
])
Use $out to put the results in another collection, since aggregation cannot update documents. You can use db.collection.renameCollection with dropTarget to replace the old collection with the new deduplicated one. Be sure you're doing the right thing before you scrap the old data, though.
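Adapted to the question's schema, the same pipeline might look like the sketch below. The array field is rounds, userName is carried over with $first, and the output collection name users_deduped is a hypothetical choice. Note that $unwind drops documents whose rounds array is missing or empty (unless preserveNullAndEmptyArrays is set), so such users would not appear in the output collection.

```javascript
// Hypothetical helper: the $unwind / $group / $out pipeline for the question's
// documents (array field "rounds", plus the top-level "userName").
function buildDedupePipeline(targetCollection) {
  return [
    { $unwind: '$rounds' },
    {
      $group: {
        _id: '$_id',
        userName: { $first: '$userName' }, // carry other top-level fields with $first
        rounds: { $addToSet: '$rounds' },  // keeps one copy of each identical round
      },
    },
    { $out: targetCollection },
  ];
}

const pipeline = buildDedupePipeline('users_deduped');
// In the mongo shell: db.users.aggregate(pipeline)
console.log(JSON.stringify(pipeline[0])); // {"$unwind":"$rounds"}
```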
Warnings:
1: This does not preserve the order of elements in the stats array. If you need to preserve order, you will have to retrieve each document from the database, deduplicate the array client-side, then update the document in the database.
2: The following two objects won't be considered duplicates of each other:
{ "id" : "foo", "price" : 123 }
{ "price" : 123, "id" : "foo" }
If you think you have mixed key orders, use a $project to enforce a key order between the $unwind stage and the $group stage:
{ "$project" : { "stats" : { "id_" : "$stats.id", "price_" : "$stats.price" } } }
Make sure to change id -> id_ and price -> price_ in the rest of the pipeline and rename them back to id and price at the end, or rename them in another $project after the swap. I discovered that, if you do not give different names to the fields in the project, it doesn't reorder them, even though key order is meaningful in an object in MongoDB:
> db.test.drop()
> db.test.insert({ "a" : { "x" : 1, "y" : 2 } })
> db.test.aggregate([
{ "$project" : { "_id" : 0, "a" : { "y" : "$a.y", "x" : "$a.x" } } }
])
{ "a" : { "x" : 1, "y" : 2 } }
> db.test.aggregate([
{ "$project" : { "_id" : 0, "a" : { "y_" : "$a.y", "x_" : "$a.x" } } }
])
{ "a" : { "y_" : 2, "x_" : 1 } }
Since the key order is meaningful, I'd consider this a bug, but it's easy to work around.
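Warning 2 has a client-side counterpart, offered only as an analogy: JSON.stringify, like BSON comparison, is sensitive to key order, so the two objects from the warning serialize differently. Serializing with sorted keys first, as the hypothetical canonicalKey helper does, makes them compare equal.

```javascript
// Serialize an object with its top-level keys sorted, so objects with the same
// fields and values map to the same string regardless of key order.
// (Shallow: nested objects are serialized as they are.)
function canonicalKey(obj) {
  const sorted = {};
  for (const key of Object.keys(obj).sort()) sorted[key] = obj[key];
  return JSON.stringify(sorted);
}

const a = { id: 'foo', price: 123 };
const b = { price: 123, id: 'foo' };
console.log(JSON.stringify(a) === JSON.stringify(b)); // false
console.log(canonicalKey(a) === canonicalKey(b));     // true
```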
