MongoDB find field that is part of a longer string - node.js

I know how to search in MongoDB and Node for a record whose field contains my search term, i.e.
Record:
{
name: "Hello my name is robinson"
}
Query:
{
name: /robinson/i
}
However, I don't know how to do the reverse, or even whether it is possible, i.e.
Query:
{
name: "Hello my name is robinson"
}
Record:
{
name: "robinson"
}
I am trying to make rules to categorise strings based on their content. Any help is much appreciated. Content may not always be broken down into words, otherwise I could have just done a split by space and searched for each one.

With a text index you should be able to find documents by running a text search on the whole phrase.
http://docs.mongodb.org/manual/reference/operator/query/text/#match-any-of-the-search-terms
If the search string is a space-delimited string, the $text operator performs a logical OR search on each term and returns documents that contain any of the terms.
In your example, you create an index in the "name" field of your collection:
db.collection.createIndex( { name: "text" } )
Then you can query with the $text operator:
db.collection.find({$text: { $search: "Hello my name is robinson"}})
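As stated in the docs, the query returns documents that contain "Hello" or "my" or "name" or "is" or "robinson". Note that $text matches whole words rather than arbitrary substrings, so this only helps when the record's value appears as a separate word in the search string.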

Related

Text Searching and Text Indexing for nested fields in MongoDB

I have a document structure that looks something like this
{
name:
hobbies:[
{ tag: "food", description: "eating"},
{ tag: "soccer", description: "PL"}
]
}
Is it possible to achieve Text Indexing only on the tag subfield, so that I can attempt text searches with only the tag subfield being checked?
Currently I'm trying but it definitely ends up checking the description tag.
db.users.createIndex({"hobbies" : "text"})
Thanks for your time.

So I was able to get past this using multikey indexes, which basically allow us to create an index entry for every element of the array. In my case, I created the index on an array field that contains nested objects, which worked like this:
db.users.createIndex( { "hobbies.tag": "text" } )
You can read more about this from the docs here MongoDB Multikey Index Docs

Find and update case insensitive data in MongoDB [duplicate]

Example:
> db.stuff.save({"foo":"bar"});
> db.stuff.find({"foo":"bar"}).count();
1
> db.stuff.find({"foo":"BAR"}).count();
0

You could use a regex.
In your example that would be:
db.stuff.find( { foo: /^bar$/i } );
I must say, though, maybe you could just downcase (or upcase) the value on the way in rather than incurring the extra cost every time you find it. Obviously this won't work for people's names and such, but it may suit use cases like tags.

UPDATE:
The original answer is now obsolete. Mongodb now supports advanced full text searching, with many features.
ORIGINAL ANSWER:
It should be noted that searching with regex's case insensitive /i means that mongodb cannot search by index, so queries against large datasets can take a long time.
Even with small datasets, it's not very efficient. You take a far bigger cpu hit than your query warrants, which could become an issue if you are trying to achieve scale.
As an alternative, you can store an uppercase copy and search against that. For instance, I have a User table that has a username which is mixed case, but the id is an uppercase copy of the username. This ensures that duplicates differing only in case are impossible (having both "Foo" and "foo" will not be allowed), and I can search by id = username.toUpperCase() to get a case-insensitive search for username.
If your field is large, such as a message body, duplicating data is probably not a good option. I believe using an extraneous indexer like Apache Lucene is the best option in that case.

Starting with MongoDB 3.4, the recommended way to perform fast case-insensitive searches is to use a Case Insensitive Index.
I personally emailed one of the founders to please get this working, and he made it happen! It was an issue on JIRA since 2009, and many have requested the feature. Here's how it works:
A case-insensitive index is made by specifying a collation with a strength of either 1 or 2. You can create a case-insensitive index like this:
db.cities.createIndex(
{ city: 1 },
{
collation: {
locale: 'en',
strength: 2
}
}
);
You can also specify a default collation per collection when you create them:
db.createCollection('cities', { collation: { locale: 'en', strength: 2 } } );
In either case, in order to use the case-insensitive index, you need to specify the same collation in the find operation that was used when creating the index or the collection:
db.cities.find(
{ city: 'new york' }
).collation(
{ locale: 'en', strength: 2 }
);
This will return "New York", "new york", "New york" etc.
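As a rough illustration of what strength 2 means, JavaScript's built-in Intl.Collator with sensitivity 'accent' compares strings in a similar spirit: case differences are ignored while accent differences still count. This is only an analogy that runs without a database; sameAtStrength2 is a hypothetical helper name, and the sketch does not claim to reproduce MongoDB's collation behaviour exactly.

```javascript
// Analogy only: approximates a strength-2 (secondary) comparison in plain JavaScript.
// `sameAtStrength2` is a hypothetical helper, not part of MongoDB or its drivers.
const secondary = new Intl.Collator('en', { sensitivity: 'accent' });

function sameAtStrength2(a, b) {
  // compare() returns 0 when the collator treats the two strings as equal
  return secondary.compare(a, b) === 0;
}

const cities = ['New York', 'new york', 'New york', 'Newark'];
// Keep only the values that are equal to 'new york' when case is ignored
const matches = cities.filter((city) => sameAtStrength2(city, 'new york'));
console.log(matches);
```

With a real collection, the same effect comes from the collation option on the index and on the find call, as shown above.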
Other notes
The answers suggesting to use full-text search are wrong in this case (and potentially dangerous). The question was about making a case-insensitive query, e.g. username: 'bill' matching BILL or Bill, not a full-text search query, which would also match stemmed words of bill, such as Bills, billed etc.
The answers suggesting to use regular expressions are slow, because even with indexes, the documentation states:
"Case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilize case-insensitive indexes."
$regex answers also run the risk of user input injection.

If you need to create the regexp from a variable, this is a much better way to do it: https://stackoverflow.com/a/10728069/309514
You can then do something like:
var string = "SomeStringToFind";
var regex = new RegExp(["^", string, "$"].join(""), "i");
// Creates a regex of: /^SomeStringToFind$/i
db.stuff.find( { foo: regex } );
This has the benefit of being more programmatic, and you can get a performance boost by compiling it ahead of time if you're reusing it a lot.

Keep in mind that the previous example:
db.stuff.find( { foo: /bar/i } );
will cause every entry containing bar to match the query (bar1, barxyz, openbar), which could be very dangerous for a username search in an auth function.
You may need to make it match only the search term by using the appropriate regexp syntax as:
db.stuff.find( { foo: /^bar$/i } );
See http://www.regular-expressions.info/ for syntax help on regular expressions

db.company_profile.find({ "companyName" : { "$regex" : "Nilesh" , "$options" : "i"}});

db.zipcodes.find({city : "NEW YORK"}); // Case-sensitive
db.zipcodes.find({city : /NEW york/i}); // Note the 'i' flag for case-insensitivity

TL;DR
Correct way to do this in mongo
Do not Use RegExp
Go natural And use mongodb's inbuilt indexing , search
Step 1 :
db.articles.insert(
[
{ _id: 1, subject: "coffee", author: "xyz", views: 50 },
{ _id: 2, subject: "Coffee Shopping", author: "efg", views: 5 },
{ _id: 3, subject: "Baking a cake", author: "abc", views: 90 },
{ _id: 4, subject: "baking", author: "xyz", views: 100 },
{ _id: 5, subject: "Café Con Leche", author: "abc", views: 200 },
{ _id: 6, subject: "Сырники", author: "jkl", views: 80 },
{ _id: 7, subject: "coffee and cream", author: "efg", views: 10 },
{ _id: 8, subject: "Cafe con Leche", author: "xyz", views: 10 }
]
)
Step 2 :
You need to create a text index on whichever field you want to search; a $text query requires one.
db.articles.createIndex( { subject: "text" } )
step 3 :
db.articles.find( { $text: { $search: "coffee",$caseSensitive :true } } ) //FOR SENSITIVITY
db.articles.find( { $text: { $search: "coffee",$caseSensitive :false } } ) //FOR INSENSITIVITY

One very important thing to keep in mind when using a Regex based query - When you are doing this for a login system, escape every single character you are searching for, and don't forget the ^ and $ operators. Lodash has a nice function for this, should you be using it already:
db.stuff.find({ foo: { $regex: '^' + _.escapeRegExp(bar) + '$', $options: 'i' } })
Why? Imagine a user entering .* as his username. That would match all usernames, enabling a login by just guessing any user's password.
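The difference can be checked without a database. The sketch below uses only standard JavaScript; escapeRegExp and exactMatchInsensitive are hypothetical helper names, and the username list is made up for the demonstration.

```javascript
// Hypothetical helpers; only standard JavaScript is used.
function escapeRegExp(text) {
  // Prefix every regex metacharacter with a backslash
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function exactMatchInsensitive(userInput) {
  // Anchored with ^ and $ so only the whole value can match
  return new RegExp('^' + escapeRegExp(userInput) + '$', 'i');
}

const usernames = ['alice', 'Bob', 'b.b'];

// Unescaped input: '.*' turns into a pattern that matches every username
const naive = new RegExp('^' + '.*' + '$', 'i');
const naiveHits = usernames.filter((u) => naive.test(u));

// Escaped input: '.*' is treated as two literal characters
const safe = exactMatchInsensitive('.*');
const safeHits = usernames.filter((u) => safe.test(u));

console.log(naiveHits.length, safeHits.length);
```

The resulting RegExp can be passed as the field value in a find call, for example db.stuff.find({ foo: exactMatchInsensitive(input) }).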

Suppose you want to search "column" in "Table" and you want case insensitive search. The best and efficient way is:
//create empty JSON Object
mycolumn = {};
//check if column has valid value
if(column) {
mycolumn.column = {$regex: new RegExp(column), $options: "i"};
}
Table.find(mycolumn);
It just adds your search value as RegEx and searches in with insensitive criteria set with "i" as option.

Mongo (current version 2.0.0) doesn't allow case-insensitive searches against indexed fields - see their documentation. For non-indexed fields, the regexes listed in the other answers should be fine.

For searching a variable and escaping it:
const escapeStringRegexp = require('escape-string-regexp')
const name = 'foo'
db.stuff.find({name: new RegExp('^' + escapeStringRegexp(name) + '$', 'i')})
Escaping the variable protects the query against attacks with '.*' or other regex.
escape-string-regexp

The best method is in your language of choice, when creating a model wrapper for your objects, have your save() method iterate through a set of fields that you will be searching on that are also indexed; those set of fields should have lowercase counterparts that are then used for searching.
Every time the object is saved again, the lowercase properties are then checked and updated with any changes to the main properties. This will make it so you can search efficiently, but hide the extra work needed to update the lc fields each time.
The lower case fields could be a key:value object store or just the field name with a prefixed lc_. I use the second one to simplify querying (deep object querying can be confusing at times).
Note: you want to index the lc_ fields, not the main fields they are based off of.
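A minimal sketch of the idea in plain JavaScript, assuming the lc_ prefix described above. The helper name withLowercaseFields and the sample document are illustrative only; a real model wrapper would call something like this from its save() method.

```javascript
// Illustrative helper: adds an `lc_`-prefixed lowercase copy of each searchable string field.
function withLowercaseFields(doc, searchableFields) {
  const copy = { ...doc };
  for (const field of searchableFields) {
    // Only strings get a lowercase counterpart; other types are left alone
    if (typeof doc[field] === 'string') {
      copy['lc_' + field] = doc[field].toLowerCase();
    }
  }
  return copy;
}

// Document as it would be written to the collection
const toSave = withLowercaseFields({ username: 'RobinSon', age: 30 }, ['username', 'age']);

// Query filter: lowercase the user's input and match the lc_ field exactly
const lookup = { lc_username: 'ROBINSON'.toLowerCase() };

console.log(toSave, lookup);
```

Because the lookup is a plain equality match on lc_username, an ordinary index on that field can serve it.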

Using Mongoose this worked for me:
var find = function(username, next){
User.find({'username': {$regex: new RegExp('^' + username, 'i')}}, function(err, res){
if(err) throw err;
next(null, res);
});
}

If you're using MongoDB Compass:
Go to the collection, in the filter type -> {Fieldname: /string/i}
For Node.js using Mongoose:
Model.find({FieldName: {$regex: "stringToSearch", $options: "i"}})

The aggregation framework was introduced in mongodb 2.2 . You can use the string operator "$strcasecmp" to make a case-insensitive comparison between strings. It's more recommended and easier than using regex.
Here's the official document on the aggregation command operator: https://docs.mongodb.com/manual/reference/operator/aggregation/strcasecmp/#exp._S_strcasecmp .

You can use Case Insensitive Indexes:
The following example creates a collection with no default collation, then adds an index on the name field with a case insensitive collation. International Components for Unicode
/*
 * strength: CollationStrength.Secondary
 * Secondary level of comparison. Collation performs comparisons up to secondary
 * differences, such as diacritics. That is, collation performs comparisons of
 * base characters (primary differences) and diacritics (secondary differences).
 * Differences between base characters take precedence over secondary
 * differences.
 */
db.users.createIndex( { name: 1 }, { collation: { locale: 'tr', strength: 2 } } )
To use the index, queries must specify the same collation.
db.users.insert( [ { name: "Oğuz" },
{ name: "oğuz" },
{ name: "OĞUZ" } ] )
// does not use index, finds one result
db.users.find( { name: "oğuz" } )
// uses the index, finds three results
db.users.find( { name: "oğuz" } ).collation( { locale: 'tr', strength: 2 } )
// does not use the index, finds three results (different strength)
db.users.find( { name: "oğuz" } ).collation( { locale: 'tr', strength: 1 } )
or you can create a collection with default collation:
db.createCollection("users", { collation: { locale: 'tr', strength: 2 } } )
db.users.createIndex( { name : 1 } ) // inherits the default collation

I'm surprised nobody has warned about the risk of regex injection when using /^bar$/i if bar is a password or an account ID search (e.g. bar => .*#myhackeddomain.com). So here is my suggestion: use the \Q and \E regex special characters provided by Perl-style regular expressions:
db.stuff.find( { foo: /^\Qbar\E$/i } );
You should also escape any \ characters in the bar variable as \\, to prevent a \E exploit when, for example, bar = '\E.*#myhackeddomain.com\Q'
Another option is to use a regex escape char strategy like the one described here Javascript equivalent of Perl's \Q ... \E or quotemeta()

Use RegExp.
If none of the other options work for you, RegExp is a good option. It makes the match case-insensitive.
var username = new RegExp("^" + "John" + "$", "i");
Use username in your queries, and it's done.
I hope it will work for you too. All the Best.

If the query contains special characters, a simple regex will not work. You will need to escape those special characters.
The following helper function can help without installing any third-party library:
const escapeSpecialChars = (str) => {
return str.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
}
And your query will be like this:
db.collection.find({ field: { $regex: escapeSpecialChars(query), $options: "i" }})
Hope it will help!

Using a filter works for me in C#.
string s = "searchTerm";
var filter = Builders<Model>.Filter.Where(p => p.Title.ToLower().Contains(s.ToLower()));
var listSorted = collection.Find(filter).ToList();
var list = collection.Find(filter).ToList();
It may even use the index because I believe the methods are called after the return happens but I haven't tested this out yet.
This also avoids a problem of
var filter = Builders<Model>.Filter.Eq(p => p.Title.ToLower(), s.ToLower());
that mongodb will think p.Title.ToLower() is a property and won't map properly.

I had faced a similar issue and this is what worked for me:
const flavorExists = await Flavors.findOne({
'flavor.name': { $regex: flavorName, $options: 'i' },
});

Yes it is possible
You can use $expr like this:
$expr: {
$eq: [
{ $toLower: '$STRING_KEY' },
{ $toLower: 'VALUE' }
]
}
Please avoid regex here, because it can cause a lot of problems, especially if the string comes from the end user.
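The $expr fragment above can be wrapped in a small query builder. This is a sketch under assumptions: caseInsensitiveEquals is a hypothetical helper, the field name is expected to come from trusted code rather than user input, and the value is lowercased in JavaScript instead of with $toLower on the server, so the two lowercasing steps may disagree for non-ASCII text.

```javascript
// Hypothetical builder for a case-insensitive equality filter based on $expr.
function caseInsensitiveEquals(fieldName, value) {
  return {
    $expr: {
      $eq: [
        { $toLower: '$' + fieldName }, // lowercase the stored value on the server
        value.toLowerCase(),           // lowercase the search value on the client
      ],
    },
  };
}

// Filter object that could be passed to find()
const query = caseInsensitiveEquals('foo', 'BAR');
console.log(JSON.stringify(query));
```

It would be used as db.stuff.find(caseInsensitiveEquals('foo', 'BAR')).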

I've created a simple Func for the case insensitive regex, which I use in my filter.
private Func<string, BsonRegularExpression> CaseInsensitiveCompare = (field) =>
BsonRegularExpression.Create(new Regex(field, RegexOptions.IgnoreCase));
Then you simply filter on a field as follows.
db.stuff.find({"foo": CaseInsensitiveCompare("bar")}).count();

These have been tested for string searches
{'_id': /.*CM.*/}           // _id contains "CM"
{'_id': /^CM/}              // _id starts with "CM"
{'_id': /CM$/}              // _id ends with "CM"
{'_id': /.*UcM075237.*/i}   // _id contains "UcM075237", ignoring case
{'_id': /^UcM075237/i}      // _id starts with "UcM075237", ignoring case
{'_id': /UcM075237$/i}      // _id ends with "UcM075237", ignoring case

For anyone using Golang who wants case-insensitive search with MongoDB and the globalsign/mgo library:
collation := &mgo.Collation{
Locale: "en",
Strength: 2,
}
q := collection.Find(query).Collation(collation) // returns a query; run it with All, One or Iter

As you can see in mongo docs - since version 3.2 $text index is case-insensitive by default: https://docs.mongodb.com/manual/core/index-text/#text-index-case-insensitivity
Create a text index and use $text operator in your query.

Check if field is a substring of a longer string with MongoDB

I'm trying to find a way to return a document based on whether or not a field is a substring of a given string.
I got a prototype working that basically fetches everything from the collection and then does the needed logic in code. In code I can find what I want by iterating over every document and then returning a document based on search.includes(field). This is obviously not an ideal solution as fetching every document in a collection is an expensive operation that won't scale well.
Next thing I did was looking at text search using MongoDB indexes. This kind of works but it returns documents even if the field isn't a complete substring of the search.
Is there any way I can construct a query that checks if a field on a document is an exact substring of a given string?
As an example, here's three documents similar to those in my collection:
{
"_id": ObjectId("5b893f36e7e6ab1a88f87b39"),
"trigger": "hello",
"response": "World"
}
{
"_id": ObjectId("5b6ca6169cc009573bbc3571"),
"trigger": "stackoverflow",
"response": "Is awesome!"
}
{
"_id": ObjectId("5b6ca6169cc009573bbc3571"),
"trigger": "foo bar",
"response": "barfoo"
}
These are some cases with the output I expect:
The search strings stack or stackexchange should not return any documents as there is no trigger field which is a perfect substring of those.
The string hello stackexchange should get you only the first document as the trigger field is a substring of the search string.
The string hello stackoverflow would get you both documents as they both have a trigger field which is a substring of the search string.
EDIT: The query also has to deal with the fact that the trigger field may contain spaces. So the string foo bar foobar should match the last document but the string foo should not.
Any help is much appreciated!

After quite a bit of trial and error, I've found a way to achieve what I wanted. By using $indexOfBytes inside a $gt, I was able to check whether trigger exists as a substring of the search string by testing whether the result of $indexOfBytes is greater than -1. Here is my final Mongoose query:
Collection.find({
$expr: {
$gt: [
{
$indexOfBytes: [
search,
"$trigger"
]
},
-1
]
}
});
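The same check can be tried locally against the example documents from the question. In this sketch, triggerIsSubstringQuery and triggersFor are hypothetical names; the builder wraps the search string in $literal, which is an addition of mine so that a search string starting with $ is not read as a field path, and the local version uses String.prototype.indexOf as a stand-in for $indexOfBytes.

```javascript
// Hypothetical builder for the $expr filter shown above.
function triggerIsSubstringQuery(search) {
  return {
    $expr: {
      $gt: [{ $indexOfBytes: [{ $literal: search }, '$trigger'] }, -1],
    },
  };
}

// Local stand-in for the server-side check: is doc.trigger found inside search?
function matchesLocally(doc, search) {
  return search.indexOf(doc.trigger) > -1;
}

// Trigger values taken from the example documents in the question
const docs = [{ trigger: 'hello' }, { trigger: 'stackoverflow' }, { trigger: 'foo bar' }];

function triggersFor(search) {
  return docs.filter((doc) => matchesLocally(doc, search)).map((doc) => doc.trigger);
}

console.log(triggersFor('hello stackoverflow'));
```

With Mongoose the builder would be used as Collection.find(triggerIsSubstringQuery(search)). Note that this kind of $expr filter is evaluated per document, so it is a fit for modest collection sizes rather than a replacement for an index.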

Text search whitespace escape

I'm using nodeJs Mongoose to perform text search;
var mongoose = require('mongoose');
var config = require('../config');
var mongoosePaginate = require('mongoose-paginate');
var poiSchema = mongoose.Schema({
city:String,
cap:String,
country:String,
address: String,
description: String,
latitude: Number,
longitude: Number,
title: String,
url: String,
images:Array,
freeText:String,
owner:String,
});
poiSchema.index({'$**': 'text'});
poiSchema.plugin(mongoosePaginate);
mongoose.Promise = global.Promise;
mongoose.connect(config.database);
module.exports = mongoose.model('Poi', poiSchema);
As you can see here
poiSchema.index({'$**': 'text'});
I create a text index on every field inside my schema.
To perform a text search, I use this code:
var term = "a search term";
var query = {'$text':{'$search': term}};
Poi.paginate(query, {}, function(err, pois) {
if(!pois){
pois = {
docs:[],
total:0
};
}
res.json({search:pois.docs,total:pois.total});
});
Unfortunately, when the search term contains whitespace, the query fetches every document in the collection that matches any single word of the term, as if the term had been split on whitespace.
I imagine that the text index uses whitespace as its tokenizer.
I need to know how to escape whitespace so that the search matches only fields containing the entire term, without splitting it.
I tried replacing whitespace with \\ but nothing changes.
Could someone please help me?

MongoDB allows text search queries on string content with support for case insensitivity, delimiters, stop words and stemming. The terms in your search string are, by default, OR'ed. From the docs, the $search string is ...
A string of terms that MongoDB parses and uses to query the text index. MongoDB performs a logical OR search of the terms unless specified as a phrase.
So, if at least one term in your $search string matches then MongoDB returns that document and MongoDB searches using all terms (where a term is a string separated by whitespace).
You can change this behaviour by specifying a phrase, which you do by enclosing multiple terms in quotes. In your question, I think you want to search for the exact phrase: a search term, so just enclose that phrase in escaped string quotes.
Here are some examples:
Given these documents:
{ "_id" : ..., "name" : "search" }
{ "_id" : ..., "name" : "term" }
{ "_id" : ..., "name" : "a search term" }
The following queries will return ...
// returns the third document because that is the only
// document which contains the phrase: 'a search term'
db.collection.find({ $text: { $search: "\"a search term\"" } })
// returns all three documents because each document contains
// at least one of the 3 terms in this search string
db.collection.find({ $text: { $search: "a search term" } })
So, in summary you "escape whitespace" by enclosing your set of search terms in escaped string quotes ... instead of "a search term" use "\"a search term\"".
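In Node, the quoting can be done by a small helper before the query is built. This is a sketch: toPhraseSearch is a hypothetical name, and dropping any double quotes already present in the term is a simplistic assumption made so that user input cannot end the phrase early.

```javascript
// Hypothetical helper: turns a free-text term into a $text phrase query.
function toPhraseSearch(term) {
  // Remove embedded double quotes (simplistic) and surrounding whitespace
  const cleaned = term.replace(/"/g, '').trim();
  // Wrapping the term in double quotes asks $text for the exact phrase
  return { $text: { $search: '"' + cleaned + '"' } };
}

const query = toPhraseSearch('a search term');
console.log(query.$text.$search);
```

With the code from the question this would be used as Poi.paginate(toPhraseSearch(term), {}, callback).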

How to use same field multiple times in MongoDB find query in NodeJS

I have a MongoDB collection with records having a "name" field, I am trying to perform a find query where the name field appears twice in the query. I want to exclude certain names, via $nin, and perform regex search for other names. It doesn't seem to be working, as it returns all records. If I just have the regex search or the $nin search, it works as expected.
db.users.find({name:{$nin:[current_user]}}).cb(array) - works
db.users.find({name:new RegExp(/query/)}).cb(array) - works
db.users.find({name:{$nin:[current_user]}, name:new RegExp(/query/)}).cb(array) - does NOT work, the current_user is NOT excluded from the find result.
I have a feeling, the find command takes the last query for multiple occurrences of the same field, is that so? And how do I get around it?
Thanks for help,
Gary

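Your query object contains the name field twice, and that breaks the query: in a JavaScript object literal a repeated key overwrites the earlier one, so only the regex condition ever reaches MongoDB. Pay attention to the $and query operator. There are two ways to construct a correct query: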
1) db.users.find({ $and: [{ name: { $nin: [current_user] } }, { name: { $regex: new RegExp(/query/) } }] })
2) db.users.find({ name: { $nin: [current_user], $regex: new RegExp(/query/) } })
Also, if you exclude only one user, you can use $ne operator instead of $nin.
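The overwrite can be seen in plain JavaScript before any query is sent. In this sketch, currentUser and pattern are made-up values for the demonstration.

```javascript
const currentUser = 'gary'; // hypothetical value for the demonstration
const pattern = /query/;

// Duplicate key: the second `name` silently replaces the first,
// so the $nin condition is lost before the query is sent
const broken = { name: { $nin: [currentUser] }, name: pattern };

// Both conditions under a single `name` key, as in option 2 above
const combined = { name: { $nin: [currentUser], $regex: pattern } };

console.log(Object.keys(broken).length, Object.keys(combined.name));
```

The broken object ends up holding only the regex, which explains why the current user was not excluded.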
