Check if field is a substring of a longer string with MongoDB - node.js

I'm trying to find a way to return a document based on whether or not a field is a substring of a given string.
I got a prototype working that basically fetches everything from the collection and then does the needed logic in code. In code I can find what I want by iterating over every document and then returning a document based on search.includes(field). This is obviously not an ideal solution as fetching every document in a collection is an expensive operation that won't scale well.
Next, I looked at text search using MongoDB text indexes. This kind of works, but it returns documents even if the field isn't a complete substring of the search string.
Is there any way I can construct a query that checks if a field on a document is an exact substring of a given string?
As an example, here are three documents similar to those in my collection:
{
  "_id": ObjectId("5b893f36e7e6ab1a88f87b39"),
  "trigger": "hello",
  "response": "World"
}
{
  "_id": ObjectId("5b6ca6169cc009573bbc3571"),
  "trigger": "stackoverflow",
  "response": "Is awesome!"
}
{
  "_id": ObjectId("5b6ca6169cc009573bbc3571"),
  "trigger": "foo bar",
  "response": "barfoo"
}
These are some cases with the output I expect:
The search strings "stack" and "stackexchange" should not return any documents, as there is no trigger field that is an exact substring of either.
The string "hello stackexchange" should return only the first document, as its trigger field is a substring of the search string.
The string "hello stackoverflow" should return the first two documents, as they both have a trigger field that is a substring of the search string.
EDIT: The query also has to deal with the fact that the trigger field may contain spaces. So the string "foo bar foobar" should match the last document, but the string "foo" should not.
Any help is much appreciated!

After quite a bit of trial and error, I've found a way to achieve what I wanted. By using $indexOfBytes inside a $gt, I was able to check whether trigger exists as a substring of the search string: $indexOfBytes returns -1 when the substring is not found, so any result greater than -1 means a match. Here is my final Mongoose query:
Collection.find({
  $expr: {
    $gt: [
      {
        $indexOfBytes: [
          search,
          "$trigger"
        ]
      },
      -1
    ]
  }
});
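To make the matching rule concrete, here is a small in-memory sketch in plain JavaScript (no database involved) that applies the same test to the question's sample documents. buildTriggerQuery only returns the filter object used above; triggerMatches and matchingTriggers are hypothetical helper names for illustration, and JavaScript's indexOf stands in for $indexOfBytes.

```javascript
// Builds the same $expr filter as the Mongoose query above for a given search string.
function buildTriggerQuery(search) {
  return {
    $expr: {
      $gt: [{ $indexOfBytes: [search, "$trigger"] }, -1],
    },
  };
}

// In-memory stand-in for the per-document check the server performs:
// $indexOfBytes yields -1 when the trigger does not occur in the search string.
// (indexOf counts UTF-16 code units rather than UTF-8 bytes, but the
// "greater than -1" test gives the same yes/no answer.)
function triggerMatches(search, doc) {
  return search.indexOf(doc.trigger) > -1;
}

// Sample documents modelled on the question (_id values omitted).
const docs = [
  { trigger: "hello", response: "World" },
  { trigger: "stackoverflow", response: "Is awesome!" },
  { trigger: "foo bar", response: "barfoo" },
];

// Returns the trigger of every sample document whose trigger occurs in `search`.
function matchingTriggers(search) {
  return docs.filter((doc) => triggerMatches(search, doc)).map((doc) => doc.trigger);
}

console.log(matchingTriggers("hello stackoverflow")); // [ 'hello', 'stackoverflow' ]
console.log(matchingTriggers("foo bar foobar")); // [ 'foo bar' ]
```

Passing buildTriggerQuery(search) to Collection.find() gives the same filter as the hand-written query; the in-memory functions only mirror what the server checks for each document.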

Related

MongoDB nested array update multiple documents [duplicate]

I am trying to update a value in the nested array but can't get it to work.
My object looks like this:
{
  "_id": {
    "$oid": "1"
  },
  "array1": [
    {
      "_id": "12",
      "array2": [
        {
          "_id": "123",
          "answeredBy": [], // need to push "success"
        },
        {
          "_id": "124",
          "answeredBy": [],
        }
      ],
    }
  ]
}
I need to push a value to "answeredBy" array.
In the example below, I tried pushing the string "success" to the "answeredBy" array of the object with _id "123", but it does not work.
callback = function(err,value){
  if(err){
    res.send(err);
  }else{
    res.send(value);
  }
};
conditions = {
  "_id": 1,
  "array1._id": 12,
  "array2._id": 123
};
updates = {
  $push: {
    "array2.$.answeredBy": "success"
  }
};
options = {
  upsert: true
};
Model.update(conditions, updates, options, callback);
I found this link, but its answer only says I should use an object-like structure instead of arrays. That cannot be applied in my situation; I really need my object to be nested in arrays.
It would be great if you could help me out here. I've spent hours trying to figure this out.
Thank you in advance!
General Scope and Explanation
There are a few things wrong with what you are doing here. The first is your query conditions: you are referring to several _id values where you should not need to, and at least one of them is not on the top level.
In order to get into a "nested" value, and presuming that the _id value is unique and would not appear in any other document, your query should look like this:
Model.update(
{ "array1.array2._id": "123" },
{ "$push": { "array1.0.array2.$.answeredBy": "success" } },
function(err,numAffected) {
// something with the result in here
}
);
Now that would actually work, but really it is only a fluke that it does as there are very good reasons why it should not work for you.
The important reading is in the official documentation for the positional $ operator under the subject of "Nested Arrays". What this says is:
The positional $ operator cannot be used for queries which traverse more than one array, such as queries that traverse arrays nested within other arrays, because the replacement for the $ placeholder is a single value
Specifically what that means is the element that will be matched and returned in the positional placeholder is the value of the index from the first matching array. This means in your case the matching index on the "top" level array.
So if you look at the query notation as shown, we have "hardcoded" the first ( or 0 index ) position in the top level array, and it just so happens that the matching element within "array2" is also the zero index entry.
To demonstrate this, you can change the matching _id value to "124" and the update will $push a new entry onto the element with _id "123", as they are both in the zero index entry of "array1" and that is the value returned to the placeholder.
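The effect described above can be reproduced with a simplified in-memory model. This is not MongoDB's implementation; it is a sketch that assumes the rule stated in the documentation quote (the positional $ is bound to the index matched in the first array traversed), and matchedOuterIndex and positionalPush are hypothetical names.

```javascript
// Simplified model of the rule described above, not MongoDB itself: for a query
// on "array1.array2._id", $ is bound to the index matched in array1.
function matchedOuterIndex(doc, innerId) {
  return doc.array1.findIndex((outer) =>
    outer.array2.some((inner) => inner._id === innerId)
  );
}

// Mimics { $push: { "array1.0.array2.$.answeredBy": value } } under that rule.
function positionalPush(doc, innerId, value) {
  const pos = matchedOuterIndex(doc, innerId);
  if (pos === -1) return false; // the query matched nothing
  const target = doc.array1[0].array2[pos]; // $ replaced by the array1 index
  if (!target) return false; // this sketch simply skips out-of-range positions
  target.answeredBy.push(value);
  return true;
}

const doc = {
  _id: "1",
  array1: [
    {
      _id: "12",
      array2: [
        { _id: "123", answeredBy: [] },
        { _id: "124", answeredBy: [] },
      ],
    },
  ],
};

positionalPush(doc, "124", "success");
// "success" ends up on the element with _id "123", not "124".
console.log(doc.array1[0].array2.map((e) => [e._id, e.answeredBy]));
```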
So that is the general problem with nesting arrays. You could remove one of the levels and you would still be able to $push to the correct element in your "top" array, but there would still be multiple levels.
Try to avoid nesting arrays as you will run into update problems as is shown.
The general case is to "flatten" the things you "think" are "levels" and actually make these "attributes" on the final detail items. For example, the "flattened" form of the structure in the question should be something like:
{
  "answers": [
    { "by": "success", "type2": "123", "type1": "12" }
  ]
}
Or even, when accepting that the inner array is $push only and never updated:
{
  "array": [
    { "type1": "12", "type2": "123", "answeredBy": ["success"] },
    { "type1": "12", "type2": "124", "answeredBy": [] }
  ]
}
Both of these lend themselves to atomic updates within the scope of the positional $ operator.
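As a sketch of what flattening means in practice, the snippet below converts the question's nested shape into the second flattened form and builds a single-positional update for it. flattenArrays and buildFlatPush are hypothetical helpers; the $elemMatch filter is one reasonable way (assumed here) to make both conditions apply to the same array element so that $ points at it.

```javascript
// Hypothetical helper: converts the nested shape into the flattened form,
// turning the former "levels" into attributes on each detail item.
function flattenArrays(doc) {
  return {
    _id: doc._id,
    array: doc.array1.flatMap((outer) =>
      outer.array2.map((inner) => ({
        type1: outer._id,
        type2: inner._id,
        answeredBy: [...inner.answeredBy],
      }))
    ),
  };
}

// With a single array, one positional $ suffices: $elemMatch requires both
// conditions on the same element, and $ then refers to that element.
function buildFlatPush(type1, type2, value) {
  return {
    filter: { array: { $elemMatch: { type1, type2 } } },
    update: { $push: { "array.$.answeredBy": value } },
  };
}

const nested = {
  _id: "1",
  array1: [
    {
      _id: "12",
      array2: [
        { _id: "123", answeredBy: ["success"] },
        { _id: "124", answeredBy: [] },
      ],
    },
  ],
};

const flat = flattenArrays(nested);
console.log(JSON.stringify(flat));
console.log(JSON.stringify(buildFlatPush("12", "124", "success")));
```

The filter and update objects could then be passed to something like Model.updateOne(filter, update); only the object shapes are exercised here.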
MongoDB 3.6 and Above
From MongoDB 3.6 there are new features available to work with nested arrays. This uses the positional filtered $[<identifier>] syntax in order to match the specific elements and apply different conditions through arrayFilters in the update statement:
Model.update(
  {
    "_id": 1,
    "array1": {
      "$elemMatch": {
        "_id": "12", "array2._id": "123"
      }
    }
  },
  {
    "$push": { "array1.$[outer].array2.$[inner].answeredBy": "success" }
  },
  {
    "arrayFilters": [{ "outer._id": "12" }, { "inner._id": "123" }]
  }
)
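The selection behaviour of the filtered positional operators can be sketched in memory as follows. This is a simplified model, not MongoDB itself: it only handles equality on _id, whereas real arrayFilters accept full query conditions, and pushWithArrayFilters is a hypothetical name.

```javascript
// Simplified model of $[outer]/$[inner] with arrayFilters (not MongoDB itself).
// Every element matching its filter is updated, not just the first.
function pushWithArrayFilters(doc, outerId, innerId, value) {
  let updated = 0;
  for (const outer of doc.array1) {
    if (outer._id !== outerId) continue; // filter { "outer._id": outerId }
    for (const inner of outer.array2) {
      if (inner._id !== innerId) continue; // filter { "inner._id": innerId }
      inner.answeredBy.push(value);
      updated += 1;
    }
  }
  return updated;
}

const doc = {
  _id: "1",
  array1: [
    {
      _id: "12",
      array2: [
        { _id: "123", answeredBy: [] },
        { _id: "124", answeredBy: [] },
      ],
    },
  ],
};

// Targeting "124" now reaches the right element, unlike the positional $ case.
const count = pushWithArrayFilters(doc, "12", "124", "success");
console.log(count, doc.array1[0].array2.map((e) => e.answeredBy));
```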
The "arrayFilters" option, as passed to .update() or even .updateOne(), .updateMany(), .findOneAndUpdate() or .bulkWrite(), specifies the conditions to match on the identifiers given in the update statement. Any elements that match the given conditions will be updated.
Because the structure is "nested", we actually use "multiple filters" as is specified with an "array" of filter definitions as shown. The marked "identifier" is used in matching against the positional filtered $[<identifier>] syntax actually used in the update block of the statement. In this case inner and outer are the identifiers used for each condition as specified with the nested chain.
This new expansion makes the update of nested array content possible, but it does not really help with the practicality of "querying" such data, so the same caveats apply as explained earlier.
What you typically really "mean" to express are "attributes", even if your brain initially thinks "nesting". That is usually a reaction to how you believe the "previous relational parts" come together. In reality you need more denormalization.
Also see How to Update Multiple Array Elements in mongodb, since these new update operators actually match and update "multiple array elements" rather than just the first, which has been the previous action of positional updates.
NOTE: Somewhat ironically, since this is specified in the "options" argument for .update() and similar methods, the syntax is generally compatible with all recent driver versions.
However, this is not true of the mongo shell: because of the way the method is implemented there ("ironically for backward compatibility"), the arrayFilters argument is not recognized and is removed by an internal method that parses the options in order to deliver "backward compatibility" with prior MongoDB server versions and the "legacy" .update() API call syntax.
So if you want to use the command in the mongo shell or other "shell based" products (notably Robo 3T), you need a recent version, from either the development branch or a production release, of 3.6 or greater.
See also the all positional operator $[], which also updates "multiple array elements" but without applying specified conditions: it applies to all elements in the array, where that is the desired action.
I know this is a very old question, but I just struggled with this problem myself and found what I believe to be a better answer.
A way to solve this problem is to use sub-documents. This is done by nesting schemas within your schemas:
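const mongoose = require('mongoose');
// Define the innermost schema first so it exists when the outer schemas use it.
// _id is declared as String (assumption) to match the "12"/"123" ids in the question.
const Array2Schema = new mongoose.Schema({
  _id: String,
  answeredBy: [ /* ... */ ]
});
const Array1Schema = new mongoose.Schema({
  _id: String,
  array2: [Array2Schema]
});
const MainSchema = new mongoose.Schema({
  array1: [Array1Schema]
});
const Main = mongoose.model('Main', MainSchema);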
This way the object will look like the one you show, but now each array is filled with sub-documents. This makes it possible to dot your way into the sub-document you want. Instead of using .update(), you then use .find() or .findOne() to get the document you want to update, modify it, and save it.
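// Callback style as in the original answer (older Mongoose versions;
// Mongoose 7+ no longer accepts callbacks here).
Main.findOne({ _id: 1 }).exec(function (err, result) {
  if (err) return console.error(err);
  // .id() looks up a sub-document in a document array by its _id
  result.array1.id('12').array2.id('123').answeredBy.push('success');
  result.save(function (err) {
    if (err) return console.error(err);
    console.log(result);
  });
});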
I haven't used the .push() function this way myself, so the syntax might not be exactly right, but I have used both .set() and .remove() this way, and both work perfectly fine.

How to fuzzy query against multiple fields in elasticsearch?

Here's my query as it stands:
"query":{
  "fuzzy":{
    "author":{
      "value":query,
      "fuzziness":2
    },
    "career_title":{
      "value":query,
      "fuzziness":2
    }
  }
}
This is part of a callback in Node.js. Query (which is being plugged in as a value to compare against) is set earlier in the function.
What I need it to be able to do is to check both the author and the career_title of a document, fuzzily, and return any documents that match in either field. The above statement never returns anything, and whenever I try to access the object it should create, it says it's undefined. I understand that I could write two queries, one to check each field, then sort the results by score, but I feel like searching every object for one field twice will be slower than searching every object for two fields once.
https://www.elastic.co/guide/en/elasticsearch/guide/current/fuzzy-match-query.html
As shown there, in a multi_match query you can specify the fuzziness:
{
"query": {
"multi_match": {
"fields": [ "text", "title" ],
"query": "SURPRIZE ME!",
"fuzziness": "AUTO"
}
}
}
Something like this. Hope this helps.
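The fuzzy query in the question places two fields under a single fuzzy clause, while a fuzzy query targets one field, which is likely why it returns nothing. The sketch below adapts the multi_match body to the question's author and career_title fields; buildFuzzySearchBody is a hypothetical helper, and fuzziness 2 mirrors the question ("AUTO", as in the answer, is another option).

```javascript
// Builds a multi_match body that matches fuzzily on either field.
// `query` is the user's search text, set earlier in the asker's callback.
function buildFuzzySearchBody(query) {
  return {
    query: {
      multi_match: {
        query: query,
        fields: ["author", "career_title"],
        fuzziness: 2,
      },
    },
  };
}

console.log(JSON.stringify(buildFuzzySearchBody("jon smith"), null, 2));
```

The resulting object would be sent as the request body of a _search call with whatever client is already in use; a document matching in either field is returned.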

How can I create an autocomplete with MongoDB full text search

I want to create an autocomplete input box that shows word suggestions as users type.
Basically, my problem is that when I use the $text operator to search for strings in a document, the queries only match on complete stemmed words. For example, if a document field contains the word blueberry, a search on the term blue will not match the document. However, a search on either blueberry or blueberries would match.
find = {$text: { $search: 'blue' } };
(The query above doesn't match blueberry or bluebird in a document.)
I want to be able to do this: match 'blueberry' or 'bluebird' with 'blue'. Initially I thought this was possible by using a 'starts with' (^) regular expression, but it seems that $text and $search only accept a string, not a regexp.
I would like to know if there is a way to do this that is not excessively complex to implement/maintain. So far, I've only seen people trying to accomplish this by creating a new collection with the results of running a map/reduce across the collection with the text index.
I do not want to use ElasticSearch or Solr because I think it is overkill for what I am trying to do, and although I sometimes think that eventually I will have no other choice, I still cannot believe that there is not a simpler way to accomplish this.
MongoDB full text search matches whole words only, so it is inherently not suitable for auto complete.
The $text operator can search for words and phrases. The query matches on the complete stemmed words. For example, if a document field contains the word blueberry, a search on the term blue will not match the document. However, a search on either blueberry or blueberries will match.
(Source: http://docs.mongodb.org/manual/core/index-text/)
You can now use Atlas Search natively in MongoDB Atlas to achieve this. You first have to add the autocomplete field mapping to your index definition before you can use the autocomplete operator in your query. This can be done through the Visual Editor or the JSON editor; there's a tutorial that walks you through how to implement it.
Here's the index definition template from the docs:
{
  "mappings": {
    "dynamic": true|false,
    "fields": {
      "<field-name>": [
        {
          "type": "autocomplete",
          "analyzer": "lucene.standard",
          "tokenization": "edgeGram|rightEdgeGram|nGram",
          "minGrams": <2>,
          "maxGrams": <15>,
          "foldDiacritics": true|false
        }
      ]
    }
  }
}
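To see why an edgeGram mapping lets "blue" find "blueberry" when $text cannot, here is a conceptual approximation of edgeGram tokenization with the template's minGrams 2 and maxGrams 15: each word is indexed as all of its prefixes in that length range. It is not Atlas Search's analyzer (which also lowercases via lucene.standard, among other steps); edgeGrams and indexTerms are hypothetical names.

```javascript
// All prefixes of `word` whose length lies in [minGrams, maxGrams].
function edgeGrams(word, minGrams = 2, maxGrams = 15) {
  const grams = [];
  const upper = Math.min(word.length, maxGrams);
  for (let n = minGrams; n <= upper; n++) {
    grams.push(word.slice(0, n));
  }
  return grams;
}

// Lowercases, splits on whitespace and collects the edge grams of every word.
function indexTerms(text, minGrams, maxGrams) {
  const terms = new Set();
  for (const word of text.toLowerCase().split(/\s+/).filter(Boolean)) {
    for (const g of edgeGrams(word, minGrams, maxGrams)) terms.add(g);
  }
  return terms;
}

const terms = indexTerms("Blueberry muffin");
console.log(terms.has("blue")); // true: "blue" is a stored prefix of "blueberry"
console.log(terms.has("berry")); // false: edgeGram keeps only prefixes
```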
And the query, where you can also specify support for typo-tolerance via the fuzzy parameter:
{
  $search: {
    "index": "<index name>", // optional, defaults to "default"
    "autocomplete": {
      "query": "<search-string>",
      "path": "<field-to-search>",
      "tokenOrder": "any|sequential",
      "fuzzy": <options>,
      "score": <options>
    }
  }
}
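A concrete version of the template above could be built like this for use in an aggregation pipeline. This is a sketch under assumptions: the field name "title", the index name "default", the fuzzy setting { maxEdits: 1 } and the trailing $limit stage are illustrative choices, and buildAutocompletePipeline is a hypothetical helper.

```javascript
// Hypothetical helper: an aggregation pipeline using the autocomplete operator.
// "title" and "default" are placeholders for your own field and index names.
function buildAutocompletePipeline(prefix, { index = "default", path = "title", limit = 10 } = {}) {
  return [
    {
      $search: {
        index,
        autocomplete: {
          query: prefix,
          path,
          tokenOrder: "any",
          fuzzy: { maxEdits: 1 }, // typo tolerance, as mentioned above
        },
      },
    },
    { $limit: limit }, // keep the suggestion list short
  ];
}

console.log(JSON.stringify(buildAutocompletePipeline("blue"), null, 2));
```

With the Node.js driver, the array would be passed to collection.aggregate(); the suggestions then come back as ordinary documents.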

Elastic search synonym match involving numeric characters

I have documents indexed in an Elasticsearch cluster with the mapping below. Basically, I have a field named model which holds car model names like "Silverado 2500HD", "Silverado 1500HD", "LX 350", etc.
POST /location-test-no-boost
{
  "settings":{
    "analysis":{
      "analyzer":{
        "mysynonym":{
          "tokenizer":"standard",
          "filter":[
            "standard","lowercase","stop","mysynonym"
          ],
          "ignore_case":true
        }
      },
      "filter":{
        "mysynonym":{
          "type":"synonym",
          "synonyms": [
            "2500 HD=>2500HD",
            "chevy silverado=>Silverado"
          ]
        }
      }
    }
  },
  "mappings":{
    "vehicles":{
      "properties":{
        "id":{
          "type":"long",
          "ignore_malformed":true
        },
        "model":{
          "type":"String",
          "index_analyzer": "standard",
          "search_analyzer":"mysynonym"
        }
      }
    }
  }
}
The sample document content is:
POST /location-test-no-boost/vehicles/10
{
"model" : "Silverado 2500HD"
}
When I search with the query string "Chevy silverado", the synonym matches perfectly to Silverado and the result comes back. On the contrary, when I search with the query string "2500 HD", it returns 0 results. I tried different synonym combinations involving numbers and concluded that the Elasticsearch synonym filter does not support numbers. Is this correct?
Is there any way I can set up a mapping so that when a user searches for "2500 HD", the query is mapped to "2500HD"?
OK, here's your problem:
You are trying to define a synonym filter that merges "2500 HD" into "2500HD" for searching.
But the analyzer works like this:
Perform char_filter first (if any).
Perform the tokenizer next, which is standard in your definition, so "2500 HD" will be split into two terms: 2500 and HD.
Perform the filters after that, which will transform the terms into 2500 and hd. Your synonym rules will be ignored because none of them match the terms passed to the filter.
So when you query for "2500 HD", you actually search for 2500 and hd, and none of the documents match, since the indexed term is 2500hd.
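The mismatch can be simulated with a rough stand-in for the analysis chain. The assumptions: splitting on non-alphanumeric characters approximates the standard tokenizer (the real one follows Unicode word-break rules, which also keep "2500HD" together as one token), stop words are ignored, and standardTokenize and analyze are hypothetical names.

```javascript
// Rough stand-in for the standard tokenizer: split on anything that is not a
// letter or a digit.
function standardTokenize(text) {
  return text.split(/[^\p{L}\p{N}]+/u).filter((t) => t.length > 0);
}

// Tokenizer followed by a lowercase filter.
function analyze(text) {
  return standardTokenize(text).map((t) => t.toLowerCase());
}

const indexedTerms = new Set(analyze("Silverado 2500HD"));
const queryTerms = analyze("2500 HD");
const shared = queryTerms.filter((t) => indexedTerms.has(t));

console.log([...indexedTerms]); // [ 'silverado', '2500hd' ]
console.log(queryTerms); // [ '2500', 'hd' ]
console.log(shared); // []
```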
I suggest replacing your synonym filter with a word_delimiter filter, something like this:
"filter":{
"my_delimiter":{
"type":"word_delimiter",
"preserve_original": true
}
}
It will transform the 2500HD in your document into 2500hd, 2500 and hd, so it will match the query "2500 HD", which will be transformed into 2500 and hd. Please refer to the word_delimiter documentation for more options.
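The same simulation with a simplified word_delimiter step shows why the query then matches. This assumes the filter runs at both index and search time, followed by lowercase; only letter/digit transitions are split here, while the real filter has many more options. wordDelimiter and analyzeWithDelimiter are hypothetical names.

```javascript
// Rough stand-in for the standard tokenizer (see the caveats above).
function standardTokenize(text) {
  return text.split(/[^\p{L}\p{N}]+/u).filter((t) => t.length > 0);
}

// Simplified word_delimiter with preserve_original: true. Runs of letters and
// runs of digits become parts; the original token is kept when it was split.
function wordDelimiter(token) {
  const parts = token.match(/\p{L}+|\p{N}+/gu) || [];
  return parts.length > 1 ? [token, ...parts] : parts;
}

function analyzeWithDelimiter(text) {
  return standardTokenize(text)
    .flatMap((t) => wordDelimiter(t))
    .map((t) => t.toLowerCase());
}

const indexed = new Set(analyzeWithDelimiter("Silverado 2500HD"));
const query = analyzeWithDelimiter("2500 HD");

console.log([...indexed]); // [ 'silverado', '2500hd', '2500', 'hd' ]
console.log(query.every((t) => indexed.has(t))); // true
```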
You don't need to define a synonym filter like that. If you really want to transform terms the way your current definitions do, define a different tokenizer instead of the standard tokenizer.
P.S. You can install the inquisitor plugin to see how terms will be analyzed: https://github.com/polyfractal/elasticsearch-inquisitor

MongoDB: Issue querying for only a specific subdocument

I'm new to MongoDB (and stackoverflow) - I've been trying to build real-time analytics with Mongo + Node.js.
I've created a document structure following the example at http://docs.mongodb.org/ecosystem/use-cases/pre-aggregated-reports/, but now I'm unable to query for just the "second" value with a given DateTime - I think the entire document is being returned because there is only one parent object. My structure looks like this, with only one document in the collection:
"hour" -> "minute" -> "second": value
doc:{
"0": {
"0": {
"0": 0,
"1": 0,
"2": 0,
"3": 0...
}
}
}
I've been looking into aggregate $unwind and $(projection), and I've created a string like "12.22.59" ("hh.mm.ss"), but I have no idea where to start.
I'd appreciate any help!
Thanks,
Kevin
Queries in MongoDB always match the whole document that has the matching fields, whether those fields are arrays, embedded in subdocuments, etc. You always query for the whole document, but you can use a projection to return just part of the matching document. To return just the second value from your nested structure (as I interpret what that means), you would do:
db.collection.find({ /* your query */ }, { "_id" : 0, "0.0.1" : 1 })
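Since the question already builds an "hh.mm.ss" string, a small helper can turn it into that projection. This sketch assumes, as the answer does, that the hour keys are top-level fields and that keys are unpadded ("1", not "01"); secondProjection and readPath are hypothetical names, and the value 7 in the sample is made up.

```javascript
// Turns "hh.mm.ss" into a projection on the hour -> minute -> second path.
function secondProjection(hhmmss) {
  const path = hhmmss
    .split(".")
    .map((part) => String(Number(part))) // "01" -> "1", "00" -> "0"
    .join(".");
  return { _id: 0, [path]: 1 };
}

// In-memory equivalent of following that dotted path in a returned document.
function readPath(doc, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), doc);
}

const sample = { "0": { "0": { "0": 0, "1": 7, "2": 0 } } };
console.log(secondProjection("00.00.01")); // { _id: 0, '0.0.1': 1 }
console.log(readPath(sample, "0.0.1")); // 7
```

For example, db.collection.find({}, secondProjection("12.22.59")) would project only the "12.22.59" value.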
I'm not 100% sure that this is what you're looking for. If not, could you edit the question to identify exactly what you want to be returned and let me know this isn't right with a comment?
