Efficient text searching across multiple string fields in MongoDB using Node.js

There are about 10,000 records like the ones below stored in MongoDB. How can I search them efficiently in Node.js? The search criteria come from the front end, and a user can search by userID, employeeName, or department. Only one input is taken, i.e. a single search box whose value can match any of these fields, something like a Google search.
{
"userID": "01000900",
"employeeName": "A Abc",
"department": "Replenishment Boston"
},
{
"userID": "01001024",
"employeeName": "K Gbc",
"department": "Sales S-II - MA Core Urb"
},
{
"userID": "01001023",
"employeeName": "Ga Va",
"department": "Sales Phoenix"
},
{
"userID": "01000282",
"employeeName": "D Din",
"department": "Sales S-II - California - Me"
}

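Assume searchText is the value you receive from the client. You can use a regular expression (note that the field is userID with a capital "ID"; MongoDB field names are case-sensitive, so userId would never match):
{
  $or: [
    { userID: { $regex: searchText, $options: 'i' } },
    { employeeName: { $regex: searchText, $options: 'i' } },
    { department: { $regex: searchText, $options: 'i' } }
  ]
}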
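In Node.js you would normally build this filter in code. Passing raw user input to $regex lets characters such as . * ( ) act as regex syntax (and an unbalanced "(" makes the query fail with an invalid-regex error), so the sketch below escapes the input first. It assumes the official mongodb driver, where `collection` is a driver Collection you have already obtained; `escapeRegex`, `buildSearchFilter` and `searchEmployees` are illustrative names, not library functions.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const SEARCH_FIELDS = ["userID", "employeeName", "department"];

// Case-insensitive "contains" match on any of the three fields.
function buildSearchFilter(searchText) {
  const pattern = escapeRegex(String(searchText).trim());
  return {
    $or: SEARCH_FIELDS.map((field) => ({
      [field]: { $regex: pattern, $options: "i" },
    })),
  };
}

// `collection` is assumed to be a mongodb driver Collection.
async function searchEmployees(collection, searchText, limit = 20) {
  // An empty pattern would match every record, so skip the query.
  if (!String(searchText).trim()) return [];
  return collection.find(buildSearchFilter(searchText)).limit(limit).toArray();
}
```

The limit keeps the response small when a short input such as "a" matches thousands of records. Bear in mind that an unanchored, case-insensitive regex cannot use an index efficiently, so MongoDB examines every record; with around 10,000 records that is often acceptable, but it will not scale like an indexed query.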

You can create a text index on all three fields like this:
db.collection.createIndex(
{
userID: "text",
employeeName: "text",
department: "text"
}
)
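The query db.collection.find( { $text: { $search: "string from input" } } ) then behaves somewhat like a Google search: it matches whole words (with language stemming) in any of the indexed fields. It does not match partial words, so "Bost" will not find "Boston". See the docs for the text query syntax.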
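From Node.js, the same text search can be run through the mongodb driver and ordered by relevance using the textScore metadata. This is a sketch assuming `collection` is a driver Collection on which the text index above already exists; `textSearchEmployees` is a hypothetical name.

```javascript
// Full-text search over the text-indexed fields, best matches first.
function textSearchEmployees(collection, searchText, limit = 20) {
  const filter = { $text: { $search: String(searchText) } };
  // Project the relevance score so the results can be sorted by it.
  const options = { projection: { score: { $meta: "textScore" } } };
  return collection
    .find(filter, options)
    .sort({ score: { $meta: "textScore" } })
    .limit(limit)
    .toArray(); // resolves to an array of matching documents
}
```

Because text search works on whole words, a partial userID such as "0100" will not match "01000900"; for prefix-style matching the regex approach is the better fit.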

You can create an index on each field. That way, when you query for some data, MongoDB does not have to scan every document in the collection; it uses the index instead, so results come back much faster (provided the query can actually use the index: a case-insensitive or unanchored regex generally cannot). To create an index on employeeName, for example, do the following:
db.collectionName.createIndex({employeeName:1});
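For a regex search to benefit from these per-field indexes, it has to be a prefix expression: case-sensitive and anchored with ^. A sketch of such a prefix search over the three fields, with hypothetical names `buildPrefixFilter` and `INDEX_SPECS`:

```javascript
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// One ascending index per searchable field, e.g. for
// collection.createIndex(spec) with the mongodb driver.
const INDEX_SPECS = [{ userID: 1 }, { employeeName: 1 }, { department: 1 }];

// Case-sensitive, ^-anchored regex: a prefix expression that MongoDB can
// answer with a range scan on each field's index. When every $or clause is
// indexed, each clause can use its own index.
function buildPrefixFilter(searchText) {
  const pattern = "^" + escapeRegex(String(searchText).trim());
  return {
    $or: INDEX_SPECS.map((spec) => {
      const field = Object.keys(spec)[0];
      return { [field]: { $regex: pattern } };
    }),
  };
}
```

The trade-off is that matching becomes case-sensitive and prefix-only, so "sales" will not find "Sales Phoenix".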

Related

Cloudant Sorting on a nullable field

I want to sort on a field, let's say name, which is indexed in Cloudant DB. Using the index without a sort, I get all the documents, both those that have the name field and those that don't. But when I sort on the name field, the documents that don't have a name field are not returned.
Is there any way to do this using query indexes? I want all the documents in sorted order, including the ones without the name field.
For example, here are some documents:
{
"_id": 1234,
"classId": "abc",
"name": "Happa"
}
{
"_id": 12345,
"classId": "abc",
"name": "Prasanth"
}
{
"_id": 123456,
"classId": "abc",
}
Below is the query I am trying to execute:
{
"selector": {
"classId": "abc",
"name" :{
"or" : [
{"$exists": true},{"$exists": false}
]
}
},
"sort": [{ "classId": "asc" }, { "name": "asc" }],
"use_index": "idx-classId_name"
},
I am expecting all the documents to be returned in a sorted order including the document which doesn't have that name field.
Your query makes no sense to me as it stands. You're requesting a listing of documents which either have, or don't have a specific field (meaning every document), and expecting to sort those on this field that may or may not exist. Such an order isn't defined out of the box.
I'd remove the name clause from the selector and sort only on the classId field, which appears in every document, then do the secondary ordering on the client side, so you can decide how to mix the documents without a name field in with those that have one.
Another solution is to use a view instead of a Cloudant Query index. I've not tested this, but hopefully the intent is clear:
function(doc) {
if (doc && doc.classId) {
var name = doc.name || "[notfound]";
emit(doc.classId+"-"+name, 1);
}
}
This keys the docs on "classId-name", using a sentinel value in place of the name for docs that have none.
Querying the view should return the documents lexicographically ordered on this compound key (which you can reverse with a query parameter if you wish).
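The first suggestion, querying on classId alone and ordering by name on the client, could look like this in JavaScript. It is a sketch that assumes the documents have already been fetched (for example with a selector of {"classId": "abc"}) and that name, when present, is a string; `sortByClassIdThenName` is a hypothetical helper.

```javascript
// Order by classId, then by name. Documents without a string name go after
// the named ones by default; pass missingLast = false to put them first.
function sortByClassIdThenName(docs, missingLast = true) {
  return [...docs].sort((a, b) => {
    const byClass = String(a.classId).localeCompare(String(b.classId));
    if (byClass !== 0) return byClass;
    const aHas = typeof a.name === "string";
    const bHas = typeof b.name === "string";
    if (aHas && bHas) return a.name.localeCompare(b.name);
    if (aHas === bHas) return 0; // both missing: keep relative order
    // Exactly one has a name: it sorts first when missingLast is true.
    return aHas === missingLast ? -1 : 1;
  });
}
```

Copying the array before sorting leaves the fetched result untouched.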

How to define an index to use in a Mango Query

I am trying to create a CouchDB Mango Query with an index with the hope that the query runs faster. At the moment I have the following Mango Query which returns what I am looking for but it's slow. Therefore, I assume, I need to create an index to make it faster. I need help figuring out how to create that index.
selector: {
categoryIds: {
$in: categoryIds,
},
},
sort: [{ publicationDate: 'desc' }],
You can assume that my documents are, let's say, news articles from different categories. Each document has a field listing the one or more categories the article belongs to, i.e. an array of categoryIds. My query needs to be optimized for requests like "Give me all news that have categoryId1 in their array of categoryIds, sorted by publicationDate". What I don't know is: 1. how to define an index, 2. what that index should be, and 3. how to use that index in the "use_index" field of the Mango Query. Any help is appreciated.
Update after "Alexis Côté" answer:
If I define the index like this:
{
"_id": "_design/0f11ca4ef1ea06de05b31e6bd8265916c1bbe821",
"_rev": "6-adce50034e870aa02dc7e1e075c78361",
"language": "query",
"views": {
"categoryIds-json-index": {
"map": {
"fields": {
"categoryIds": "asc"
},
"partial_filter_selector": {}
},
"reduce": "_count",
"options": {
"def": {
"fields": [
"categoryIds"
]
}
}
}
}
}
And run the Mango Query like this:
{
"selector": {
"categoryIds": {
"$in": [
"e0bd5f97ac35bdf6893351337d269230"
]
}
},
"use_index": "categoryIds-json-index"
}
It still does return the results but they are not sorted in the order I want by publicationDate. So I am not clear what you are suggesting the solution is.
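You can create an index as described in the CouchDB documentation for the /{db}/_index endpoint.
In your case, you will need an index on the "categoryIds" field.
You can then specify the index with "use_index": "_design/<ddoc-name>", or with ["_design/<ddoc-name>", "<index-name>"] to pick a particular index in that design document. In your update you passed the index name on its own, which is not one of those forms.
Note: the query planner should pick this index automatically if it is compatible.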
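On the sorting problem from the update: as I understand the CouchDB docs, Mango only serves a sort from an index that contains the sort fields, and at least one sort field must also appear in the selector, so an index on categoryIds alone cannot return results ordered by publicationDate. A sketch of the two request bodies under that assumption, adding an always-true condition ({"$gt": null}) so publicationDate appears in the selector; the design document and index names are made up.

```javascript
// Hypothetical names for the design document and index.
const DDOC = "news-by-date";
const INDEX_NAME = "publicationDate-json-index";

// Body for POST /{db}/_index: a JSON index on the sort field.
function buildIndexBody() {
  return {
    index: { fields: ["publicationDate"] },
    ddoc: DDOC,
    name: INDEX_NAME,
    type: "json",
  };
}

// Body for POST /{db}/_find: newest articles in any of the given categories.
function buildFindBody(categoryIds) {
  return {
    selector: {
      // null collates before every other value, so this holds for any
      // document with a publicationDate; it puts the sort field in the selector.
      publicationDate: { $gt: null },
      categoryIds: { $in: categoryIds },
    },
    sort: [{ publicationDate: "desc" }],
    // use_index: "<design_document>" or ["<design_document>", "<index_name>"].
    use_index: [`_design/${DDOC}`, INDEX_NAME],
  };
}
```

The categoryIds condition is then applied as a filter on the documents read through the date index.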

Couchdb mango query speed

I have the following type of documents:
{
"_id": "0710b1dd6cc2cdc9c2ffa099c8000f7b",
"_rev": "1-93687d40f54ff6ca72e66ca7fc99caff",
"date": "2018-06-04T07:46:08.848Z",
"topic": "some topic",
}
The collection is not very large, only 20k documents.
However, the following query is very slow: it takes about 5 seconds!
{
selector: {
topic: 'some topic'
},
sort: ['date'],
}
I tried various indexes, e.g.
index: {
fields: ['topic', 'date']
}
but nothing really worked well.
What am I missing here?
When sorting in a Mango query, you need to ensure that the sort order you are asking for matches the index that you are using.
If you index the data set in topic,date order, you can use the following query on "topic" to get the data out in date order using the index:
{
"selector": {
"topic": "some topic"
},
"sort": [
"topic",
"date"
]
}
Because the sort matches the form of the data in the index, the index is used to answer the query which should speed up your query time considerably.
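A sketch of the matching _index and _find bodies for this case, assuming the field names from the question; `buildTopicByDateRequests` and the index name are hypothetical. The sort lists the index fields in index order, and since Mango expects all sort fields to share one direction, the helper applies a single direction to both.

```javascript
// Hypothetical helper: bodies for POST /{db}/_index and POST /{db}/_find.
function buildTopicByDateRequests(topic, direction = "asc") {
  if (direction !== "asc" && direction !== "desc") {
    throw new Error(`unsupported sort direction: ${direction}`);
  }
  const fields = ["topic", "date"]; // index order
  return {
    indexBody: { index: { fields }, name: "topic-date-index", type: "json" },
    findBody: {
      selector: { topic },
      // Same fields, same order as the index, one direction for all of them.
      sort: fields.map((field) => ({ [field]: direction })),
    },
  };
}
```

For newest first, call it with "desc" rather than sorting only date descending.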

Case insensitive search in mongodb and nodejs inside an array

I want to perform a tag search that is case insensitive against the tag keywords. I need this for a single-keyword search and also for multiple keywords. The problem is that when I search with the following queries I get nothing back. I am new to Node.js and MongoDB, so if there are any mistakes in the queries please correct me.
The tags can be 'tag1' or 'TAG1' or 'taG1'.
For the single tag keyword search I have used (I'm not getting any results):
db.somecollection.find({'Tags':{'TagText': new RegExp('Tag5',"i")}, 'Status':'active'})
for multiple tag keyword search (need to make this case insensitive too :( )
db.somecollection.find({'Tags':{'TagText': {"$in": ['Tag3','Tag5', 'Tag16']}}, 'Status':'active'})
the record-set in the db:
{
"results": {
"products": [
{
"_id": "5858cc242dadb72409000029",
"Permalink": "some-permalink-1",
"Tags": [
{"TagText":"Tag1"},
{"TagText":"Tag2"},
{"TagText":"Tag3"},
{"TagText":"Tag4"},
{"TagText":"Tag5"}
],
"Viewcount": 3791
},
{
"_id": "58523cc212dadb72409000029",
"Permalink": "some-permalink-2",
"Tags": [
{"TagText":"Tag8"},
{"TagText":"Tag2"},
{"TagText":"Tag1"},
{"TagText":"Tag7"},
{"TagText":"Tag2"}
],
"Viewcount": 1003
},
{
"_id": "5858cc242dadb11839084523",
"Permalink": "some-permalink-3",
"Tags": [
{"TagText":"Tag11"},
{"TagText":"Tag3"},
{"TagText":"Tag1"},
{"TagText":"Tag6"},
{"TagText":"Tag18"}
],
"Viewcount": 2608
},
{
"_id": "5850cc242dadb11009000029",
"Permalink": "some-permalink-4",
"Tags": [
{"TagText":"Tag14"},
{"TagText":"Tag12"},
{"TagText":"Tag4"},
{"TagText":"Tag5"},
{"TagText":"Tag7"}
],
"Viewcount": 6202
}
],
"count": 4
}
}
Create a text index on the field that you want to search. Text search is case insensitive by default.
db.somecollection.createIndex( { "Tags.TagText": "text" } )
For more options, https://docs.mongodb.com/v3.2/core/index-text/#index-feature-text
Use the $text operator with $search to search the content.
For more options, https://docs.mongodb.com/v3.2/reference/operator/query/text/#op._S_text
Search with single term
db.somecollection.find({$text: { $search: "Tag3"}});
Search with multiple search terms
db.somecollection.find({$text: { $search: "Tag3 Tag5 Tag16"}});
Update:
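It looks like you are looking for case-insensitive equality, which is easily achieved with a regex, so you don't need text search; you can drop the text index. Your original queries returned nothing because {'Tags': {'TagText': ...}} asks for an array element that is exactly the embedded document on the right, compared literally, so the regex and the $in are never evaluated. Dot notation ('Tags.TagText') queries the TagText field inside each array element.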
Search with single term
db.somecollection.find({'Tags.TagText': {$regex: /^Tag3$/i}}).pretty();
Search with multiple search terms
db.somecollection.find({'Tags.TagText': {$in: [/^Tag11$/i, /^Tag6$/i]}}).pretty();
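When the tags come from user input in Node.js, it is safer to escape them so that a tag containing characters like . or + is matched literally. The ^ and $ anchors matter too: without them /tag1/i would also match Tag11, Tag12, Tag14 and Tag18. A sketch with hypothetical helper names; the mongodb driver serializes JavaScript RegExp objects as BSON regular expressions, so the filter can be passed to find() as-is.

```javascript
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Case-insensitive exact match on any of the given tags. Dot notation
// ("Tags.TagText") looks inside each element of the Tags array.
// `extra` holds any other conditions, e.g. { Status: "active" }.
function buildTagFilter(tags, extra = {}) {
  const patterns = tags.map((tag) => new RegExp(`^${escapeRegex(tag)}$`, "i"));
  return { "Tags.TagText": { $in: patterns }, ...extra };
}
```

A single tag is simply a one-element array, e.g. buildTagFilter(["tag3"]).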

Query data where userID in multiples ID

I am trying to write a query and I don't know the right way to do it.
Each document in the collection contains multiple user IDs (uid), and I want a query that returns all the albums where a given user ID matches one of the uid values.
I do not know if the structure of the collection is good for that and I would like to know if I should do otherwise.
{
"_id": ObjectId("55814a9799677ba44e7826d1"),
"album": "album1",
"pictures": [
"1434536659272.jpg",
"1434552570177.jpg",
"1434552756857.jpg",
"1434552795100.jpg"
],
"uid": [
"12814a8546677ba44e745d85",
"e745d677ba4412814e745d7b",
"28114a85466e745d677d85qs"
],
"__v": 0
}
I searched online and found this documentation http://docs.mongodb.org/manual/reference/operator/query/in/ but I'm not certain it is the right approach.
In short, I need to know whether I am using the right structure for the collection, and whether the "$in" operator is the right solution (knowing that there may be a lot of user IDs: between 2 and 2000 at most).
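You don't need $in unless you are matching against more than one possible value for a field, and the field does not have to be an array for $in to apply. $in is, in effect, shorthand for an $or of equality conditions on the same field. Querying an array field with a plain value already matches any document whose array contains that value.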
You just need a simple query here:
Model.find({ "uid": "12814a8546677ba44e745d85" },function(err,results) {
})
If you want "multiple" user id's then you can use $in:
Model.find(
{ "uid": { "$in": [
"12814a8546677ba44e745d85",
"e745d677ba4412814e745d7b",
] } },
function(err,results) {
}
)
That is shorthand for the following $or:
Model.find(
{
"$or": [
{ "uid": "12814a8546677ba44e745d85" },
{ "uid": "e745d677ba4412814e745d7b" }
]
},
function(err,results) {
}
)
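Note that Mongoose 7 and later no longer accept callbacks in Model.find(); the query is awaited instead. A sketch of the same lookup in that style, where `AlbumModel` stands for your own Mongoose model and both helper names are hypothetical:

```javascript
// One ID: plain equality already matches documents whose uid array
// contains it. Several IDs: use $in.
function buildUidFilter(userIds) {
  if (!Array.isArray(userIds) || userIds.length === 0) {
    throw new Error("userIds must be a non-empty array");
  }
  return userIds.length === 1 ? { uid: userIds[0] } : { uid: { $in: userIds } };
}

// Returns a promise of the matching album documents.
async function findAlbumsForUsers(AlbumModel, userIds) {
  return AlbumModel.find(buildUidFilter(userIds));
}
```

Usage: const albums = await findAlbumsForUsers(Album, ["12814a8546677ba44e745d85"]);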
Just to answer your question, you can use the below query to get the desired result.
db.mycollection.find( {uid : {$in : ["28114a85466e745d677d85qs"] } } )
However, you should revisit your data structure: this looks like a many-to-many relationship, and you might want to introduce an intermediate collection for it.
