mongo: "2d" index and normal index - node.js

location: {lat: Number,
lng: Number}
location has a 2d index in my MongoDB collection, and I have been using it for geospatial search, which works fine.
Now, if I need to search with db.find({lat:12.121212, lng:70.707070}), will it use the same index, or do I need to define a new one? If so, how?
I am using the Mongoose driver in node.js.

The 2d index used for the geospatial commands is not going to help with an equality match on the two fields. For that you will need to define a compound index on the two sub-fields, something like this:
db.collection.ensureIndex({"location.lat" : 1, "location.lng" : 1})
This worked best for me with a test set of data - you can also define a normal index on the location field itself but that will be less efficient. You can test out the relative performance using hint and explain for any index combination. For example:
db.collection.find({"location.lat" : 179.45, "location.lng" : 90.23}).hint("location.lat_1_location.lng_1").explain()
You can in fact do this for any index you wish, though to check the results returned you will need to drop the .explain().
Please also bear in mind that a query can only use one index at a time, so if you are looking to combine the two (an equality match and a geospatial search), the 2d index will be the only one used.
Note: all of the above examples are from the MongoDB JS shell, not node.js.

Related

Optimise Pymongo query time

I am running a query on MongoDB and looking for ways to reduce the time it takes.
My query is like collection.find({'nameId':989080880,'Date':{'$gte':startDate}})
What I did is as below:
pd.DataFrame(collection.find({'nameId':989080880,'Date':{'$gte':startDate}}))
This query took x ms.
Then I tried:
document=[]
for doc in collection.find({'nameId':989080880,'Date':{'$gte':startDate}}):
    document.append(doc)
but it gave only a 15% improvement over x ms.
I cannot add an index, because 'nameId' is a long integer and indexing it would require much more RAM.
Looking forward to some suggestions.
First, build the list with a list comprehension, something like:
document=[doc for doc in collection.find({'nameId':989080880,'Date':{'$gte':startDate}})]
Second, if you don't need all the fields in your documents, use a projection: a dict naming the fields you want returned.
With projection = {'name': 1, 'address': 1} passed to your find query, each document is returned with only the name and address fields, plus _id.
For more information, you can read here
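As a rough sketch of the projection advice above, the filter and projection can be built as plain Python dicts before they are handed to find. The helper name build_find_args and the start date are made up for illustration; the field names name and address are the ones used in the answer, and no database connection is opened here.

```python
from datetime import datetime


def build_find_args(name_id, start_date, fields):
    """Return (filter, projection) dicts suitable for collection.find()."""
    query = {"nameId": name_id, "Date": {"$gte": start_date}}
    # 1 means "include this field"; _id is still returned unless set to 0.
    projection = {field: 1 for field in fields}
    return query, projection


# The start date below is a hypothetical value, not taken from the question.
query, projection = build_find_args(
    989080880, datetime(2020, 1, 1), ["name", "address"]
)

# With a live PyMongo collection (assumed to exist, not created here):
# documents = list(collection.find(query, projection))
```

Returning fewer fields reduces the amount of data transferred and decoded per document, which is usually where the time goes when the result set is large.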

Sum or Difference operation of two keys in document using Mongoengine

I have defined a model like
from mongoengine import Document, fields

class Orders(Document):
    orderAmount = fields.FloatField()
    cashbackAmount = fields.FloatField()
    meta = {'strict': False}
I want to get all orders where (orderAmount - cashbackAmount > 500). I am using Mongoengine and want to perform this operation with it. I am not using the Django framework, so I cannot use Django-specific solutions.
Let's approach this as if you had to do it without Mongoengine. You would start by dividing the problem into two steps:
1) How to get the difference between two fields and output it as the new field?
2) How to filter all the documents based on that field's value?
You can see that it consists of several steps, so it looks like a great use case for the aggregation framework.
The first problem can be solved using the $addFields and $subtract operators.
{$addFields: {difference: {$subtract: ["$a", "$b"]}}}
which can be read as "for every document, add a new field called difference where difference = a - b".
The second problem is a simple filtering:
{$match: {difference:{$gt: 500}}}
"give me all documents where difference field is greater than 500"
So the whole query in MongoDB would look like this
db.collectionName.aggregate([{$addFields: {difference: {$subtract: ["$a", "$b"]}}}, {$match: {difference:{$gt: 500}}}])
Now we have to translate it into Mongoengine. It turns out that there is an aggregate method defined, so only small adjustments are needed to make this query work.
Diff.objects.aggregate({"$addFields": {"difference": {"$subtract": ["$a", "$b"]}}}, {"$match": {"difference":{"$gt": 500}}})
As a result, you get a CommandCursor. You can interact with that object directly or just convert it to a list to get a list of dictionaries.
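Applied to the field names from the question (orderAmount and cashbackAmount), the two stages might be assembled as in the sketch below. The helper names build_difference_pipeline and emulate are hypothetical, and emulate is only a plain-Python stand-in that mirrors what the two stages compute so the logic can be checked without a database.

```python
def build_difference_pipeline(minuend, subtrahend, threshold):
    """Build the $addFields + $match stages described above."""
    return [
        {"$addFields": {"difference": {"$subtract": ["$" + minuend, "$" + subtrahend]}}},
        {"$match": {"difference": {"$gt": threshold}}},
    ]


def emulate(documents, minuend, subtrahend, threshold):
    """Plain-Python equivalent of the two stages, for illustration only."""
    result = []
    for doc in documents:
        enriched = dict(doc, difference=doc[minuend] - doc[subtrahend])
        if enriched["difference"] > threshold:
            result.append(enriched)
    return result


pipeline = build_difference_pipeline("orderAmount", "cashbackAmount", 500)

# Made-up sample documents: only the first has a difference above 500.
sample = [
    {"orderAmount": 900.0, "cashbackAmount": 100.0},
    {"orderAmount": 550.0, "cashbackAmount": 100.0},
]
matches = emulate(sample, "orderAmount", "cashbackAmount", 500)
```

Depending on the Mongoengine version, the stages are passed to Orders.objects.aggregate either as a single list or as separate arguments, so it is worth checking the documentation for the release you have installed.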

How to make a Mongo Find Query?

I come from the MySQL world, so Mongo queries are a bit difficult for me to write, as I can't really make sense of Mongo-style queries. I am trying to write a query that finds a string. The problem is that, with my very basic knowledge of MongoDB queries, the query I wrote isn't working. I tried it in Mongoose as well as in the mongo shell.
Schema:
mongoose.Schema({
doctorID : String,
patientIDList : Array // array of strings
});
Query Objective:
I want to find a doctor by doctorID and then look inside the patientIDList for an ID xxx. If the patientIDList doesn't contain xxx, then add xxx to the list; otherwise add nothing.
Query:
The 2 queries I tried
MyModel.findOne({'doctorID':newAppointment.doctorID}, {'patientIDList' : newAppointment.patientID}, function(err){...});
MyModel.findOne({'doctorID': newAppointment.doctorID, 'patientIDList': newAppointment.patientID}, function(err){...});
What am I doing wrong? How can I make a query?
It's always a bit of a challenge to switch from a SQL to a NoSQL database and the other way around. What you are trying to do is check whether a value exists in an array. If the array is an array of strings, you can simply query for the value in the array.
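// Both conditions go in the filter (first argument); the second argument of findOne is a projection.
MyModel.findOne({doctorID: newAppointment.doctorID, patientIDList: newAppointment.patientID}, function(err, res){
    console.log(err, res);
})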
Further read: https://docs.mongodb.com/manual/tutorial/query-documents/#match-an-array-element
Relevant Question: Find document with array that contains a specific value

Using more than one geospatial index per collection in MongoDB

Current MongoDB documentation states the following:
You may only have 1 geospatial index per collection, for now. While
MongoDB may allow to create multiple indexes, this behavior is
unsupported. Because MongoDB can only use one index to support a
single query, in most cases, having multiple geo indexes will produce
undesirable behavior.
However, when I create two geospatial indices in a collection (using Mongoose), they work just fine:
MySchema.index({
'loc1': '2d',
extraField1: 1,
extraField2: 1
});
MySchema.index({
'loc2': '2d',
extraField1: 1,
extraField2: 1
});
My question is this: while it seems to work, the MongoDB documentation says this could "produce undesirable behavior". So far, nothing undesirable has been discovered in either testing or use.
Should I be concerned about this? If the answer is yes then what would you recommend as a workaround?
It is still not supported, so even though you can create two of them, that doesn't mean they are actually used properly. I would check the explain output in the mongo shell by issuing a few queries that use the loc1 and loc2 fields in a geospatial way. For example:
use yourDbName
db.yourCollection.find( { loc1: { $nearSphere: [ 0, 0 ] } } ).explain();
and:
db.yourCollection.find( { loc2: { $nearSphere: [ 0, 0 ] } } ).explain();
And then compare what the explain information gives you. You will likely see that only the first created geo index is used for both searches. There are a few tickets in JIRA for this that you might want to vote on:
https://jira.mongodb.org/browse/SERVER-2331
https://jira.mongodb.org/browse/SERVER-3653

Group and term query combination in Lucene

I am new to Lucene and I want to filter my search results based on three criteria:
value of field document_type should be Product
value of field brand_id should be 4
value of field family_id should be any of the values in (121, 232, 343)
So what I basically want is to have combinations like following in the search result:
document_type:Product AND brand_id:4 AND family_id:121
document_type:Product AND brand_id:4 AND family_id:232
document_type:Product AND brand_id:4 AND family_id:343
I thought document_type:Product AND brand_id:4 AND family_id:(121 232 343) should do the trick, but while parsing this query the standard analyzer lowercases Product, even though the document_type field was indexed with Field.Index.NOT_ANALYZED and Field.Store.YES for the value Product.
I was wondering if it is possible to create a boolean query by combining 3 possible queries for the given 3 cases.
I am quite new with Lucene, could someone help me with it?
Thanks.
Query.combine(Query[]) worked like a charm for the given situation.
The documentation for the given method is available here.
Once combine was applied, the query turned out as follows:
(+document_type:Product +brand_id:4 +family_id:121) (+document_type:Product +brand_id:4 +family_id:232) (+document_type:Product +brand_id:4 +family_id:343)
Thanks.
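To make the shape of that combined query easier to see, here is a small sketch that only rebuilds the query string shown above, one parenthesised clause per family_id value. It does not call Lucene; the helper name build_combined_query is made up for illustration.

```python
def build_combined_query(required, field, values):
    """Build one '(+a:x +b:y +field:v)' clause per value, joined by spaces."""
    clauses = []
    for value in values:
        # Every clause repeats the required terms and adds one varying term.
        terms = [f"+{name}:{term}" for name, term in required]
        terms.append(f"+{field}:{value}")
        clauses.append("(" + " ".join(terms) + ")")
    return " ".join(clauses)


query_string = build_combined_query(
    [("document_type", "Product"), ("brand_id", 4)],
    "family_id",
    [121, 232, 343],
)
print(query_string)
```

Each clause requires all three terms (the + prefix), while the clauses themselves are optional alternatives, so a document matches if it satisfies any one of the three combinations.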
