spring-data pagination real query - pagination

I have a repository with a simple query with pagination :
Page<MyBean> findMyBeans(String name, Pageable pageable);
My question is :
Will the pagination limit the query sent to Mongo to 20 results (the default), like a LIMIT in MySQL, or will it retrieve all the data from Mongo and return only 20 results to the caller?
Thanks

It will limit the cursor to 20. Specifically, it creates a DBCursor with a limit set, which looks like this when printed (this sample shows limit=100; with a page size of 20 it would read limit=20):
Cursor id=0, ns=test.myCollection, query={ }, numIterated=0, limit=100, readPreference=primary
This is the same as using the MongoDB Java driver and declaring a DBCursor as:
DBCursor myCursor=myCollection.find().limit(20);
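With the native MongoDB driver, the equivalent cursor would be set up as follows. Results come back 20 at a time: the first call returns documents 1-20, the next 21-40, and so on, with later pages also applying a skip:
DBCursor cursor = myCollection.find();
cursor.limit(20);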
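As a rough illustration of the paging arithmetic, the sketch below shows how a zero-based page number and a page size translate into skip and limit values. A plain Python list stands in for the MongoDB collection, and the helper names page_window and fetch_page are hypothetical; they are not part of Spring Data or the MongoDB driver.

```python
def page_window(page, size=20):
    """Return the (skip, limit) pair for a zero-based page number."""
    if page < 0 or size < 1:
        raise ValueError("page must be >= 0 and size must be >= 1")
    return page * size, size


def fetch_page(items, page, size=20):
    """Apply skip and limit to an in-memory list standing in for a collection."""
    skip, limit = page_window(page, size)
    return items[skip:skip + limit]


documents = list(range(1, 101))      # stand-in for 100 stored documents
first = fetch_page(documents, 0)     # documents 1..20
second = fetch_page(documents, 1)    # documents 21..40
```

Only the requested window is materialised, which is the behaviour the limit on the cursor gives you on the server side.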


How to perform multi-dimensional queries in CouchDB/Cloudant

I'm a beginner with CouchDB/Cloudant and I would like some expert advice on the most appropriate method of performing multidimensional queries.
Example...
My documents are like this
{
_id: 79f14b64c57461584b152123e3924516,
lat: -71.05204477,
lng: 42.36674199,
time: 1531500769,
tileX: 5,
tileY: 10,
lod: 7,
val1: 200.1,
val2: 101.5,
val3: 50
}
lat, lng, and time are the query parameters and they will be queried as ranges.
For example fetch all the documents that have
lat_startkey = -70 & lat_endkey = -72 AND
lng_startkey = 50 & lng_endkey = 40 AND
time_startkey = 1531500769 & time_endkey = 1530500000
I will also query using time as a range, and tileX, tileY, lod as exact values
For example
tileX = 5 AND
tileY = 10 AND
lod = 7 AND
time_startkey = 1531500769 & time_endkey = 1530500000
I've been reading about Views (map reduce), and I guess for the first type of query I could create a View each for time, lat, lng. My client could then perform 3 separate range queries, one against each View, and then in the client perform an intersection (inner join) of the resulting document id's. However this is obviously moving some of the processing outside of CouchDB, and I was hoping I could do this all within CouchDB itself.
I have also just found that CouchSearch (json/lucene), and n1ql exist... would these be of any help?
If you can use Couchbase, the N1QL query language handles queries like this without problems. Note that N1QL is only available for Couchbase, not for the CouchDB project that Couchbase grew out of.
For example, if I understand your first query there, you could write it like this in N1QL:
SELECT *
FROM datapoints
WHERE lat BETWEEN -72 AND -70 AND
lng BETWEEN 40 AND 50 AND
time BETWEEN 1530500000 AND 1531500769
To run such a query efficiently, you'll need an index, like this:
CREATE INDEX lat_long_time_idx ON datapoints(lat, lng, time)
You can find out more about N1QL here:
https://query-tutorial.couchbase.com/tutorial/#1
Sadly CouchDB is extremely poor at handling these sorts of multi-dimensional queries. You can have views on any of the axes but there is no easy way to retrieve the intersection, as you describe.
However, an extension called GeoCouch was written in the early days of that project to handle geospatial (lat, long) queries, and it has been included in the Cloudant platform that you seem to be using. That means you can query the lat/long combination directly using the GeoJSON format, just not the time axis: https://console.bluemix.net/docs/services/Cloudant/api/cloudant-geo.html#cloudant-nosql-db-geospatial
Cloudant also has another query system, Query: https://console.bluemix.net/docs/services/Cloudant/api/cloudant_query.html#query
Under this system you can build an arbitrary index over your documents and then query for documents matching certain criteria. For example, this query selector will find documents with years in the range 1900-1903:
{
"selector": {
"year": {
"$gte": 1900,
"$lte": 1903
}
}
}
So it looks to me as if you could index the three values you care about (Lat, Long and Time) and build a 3 axis query in Cloudant. I have not tried that myself however.
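A minimal sketch of what such a three-axis selector could look like, built in Python and printed as JSON. It assumes the field names lat, lng and time from the question; the helpers bounds and range_selector are hypothetical, and whether Cloudant Query serves all three ranges efficiently from one index is not verified here.

```python
import json


def bounds(a, b):
    """Order two endpoints so the lower bound always comes first."""
    return (a, b) if a <= b else (b, a)


def range_selector(lat, lng, time):
    """Build a selector dict with one $gte/$lte pair per field."""
    selector = {}
    for field, (a, b) in (("lat", lat), ("lng", lng), ("time", time)):
        low, high = bounds(a, b)
        selector[field] = {"$gte": low, "$lte": high}
    return {"selector": selector}


# Endpoints are given in the order used in the question (high first).
query = range_selector((-70, -72), (50, 40), (1531500769, 1530500000))
print(json.dumps(query, indent=2))
```

Both bounds for a field go into a single object under that field name, since repeating the same key twice in a JSON object would leave only one of the conditions in effect.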

cosmos db document query takes long time

I am new to Cosmos DB and am having trouble querying a collection. I have a partitioned collection with 100,000 RU/s (unlimited storage capacity). The partition key is '/Bid', which is a GUID. I am querying for a single partition key value that has 10,000 records (the collection holds more than 28,942,445 documents across all partitions). I am using the following query to get the documents, but it takes around 50 seconds to execute, which is not acceptable.
object partitionkey = new object();
partitionkey = "2359c59a-f730-40df-865c-d4e161189f5b";
// Now execute the same query via direct SQL
var DistinctBColumn = this.client.CreateDocumentQuery<BColumn>(BordereauxColumnCollection.SelfLink, "SELECT * FROM BColumn_UL c WHERE c.BId = '2359c59a-f730-40df-865c-d4e161189f5b'",new FeedOptions { EnableCrossPartitionQuery=true, PartitionKey= new PartitionKey("2359c59a-f730-40df-865c-d4e161189f5b") }, partitionkey).ToList();
I also tried other query options, which also took around 50 seconds.
But the same query takes less than a second in the Azure portal.
Please help me optimize the query, and correct me if I am wrong. Many thanks.

not in query and select one field from second collection

My requirement is to count all the documents whose particular id is not in a reference collection. The equivalent SQL query would be:
select count(*) from tbl1 where tbl1.arr.id not in (select id from tbl2)
I've tried the following, but got stuck on fetching a single field, i.e. id, from the second query.
db.coll1.find(
{$not:
{"arr.id":
{$in:
{db.coll2.find()}//how would I fetch a single column from
//2nd coll2
}
}
}
).count()
Also, Please note that arr.id is an ObjectId stored in collection coll1 and same will go with collection coll2. Should special care be taken while fetching the id like say ObjectId(id)?
Update - I am using mongo db version 3.0.9
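I had to use $nin for the "not in" condition and, because the MongoDB version was 3.0.9, pass the projection in a different format. Below is how I did it.
db.coll1.find({"arr.id":{$nin:[db.coll2.find({},["id"])]}}).count()
For MongoDB v>=3.2 it would be as below
db.coll1.find({"arr.id":{$nin:[db.coll2.find({},"id")]}}).count()
Note that $nin expects a plain array of values, while find() returns a cursor. If the count comes back unfiltered, materialise the ids first, for example with db.coll2.distinct("id"), and pass that array to $nin directly.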
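The logic of this "not in" count can be sketched in plain Python, with in-memory lists standing in for the two collections and integers standing in for ObjectIds. The function name count_not_in and the sample data are made up for illustration; the sketch mirrors how $nin treats an array field, where a document matches only when none of its values appears in the list.

```python
def count_not_in(coll1, coll2):
    """Count coll1 documents none of whose arr ids appear as an id in coll2."""
    known_ids = {doc["id"] for doc in coll2}   # the "select id from tbl2" part
    count = 0
    for doc in coll1:
        arr_ids = {item["id"] for item in doc.get("arr", [])}
        if arr_ids.isdisjoint(known_ids):      # no referenced id is known
            count += 1
    return count


coll2 = [{"id": 1}, {"id": 2}]
coll1 = [
    {"arr": [{"id": 1}]},             # references a known id
    {"arr": [{"id": 3}]},             # references only an unknown id
    {"arr": [{"id": 2}, {"id": 4}]},  # one known id is enough to exclude it
    {"arr": []},                      # nothing referenced, so it is counted
]
result = count_not_in(coll1, coll2)
```

The key step is collecting the ids from the second collection into a plain list or set before comparing, rather than passing the query itself.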

MongoDB - too much data for sort() with no index. Full collection

I'm using Mongoose for Node.js to interface with the mongo driver, so my query looks like:
db.Deal
.find({})
.select({
_id: 1,
name: 1,
opp: 1,
dateUploaded: 1,
status: 1
})
.sort({ dateUploaded: -1 })
And get: too much data for sort() with no index. add an index or specify a smaller limit
The number of documents in the Deal collection is quite small, maybe ~500 - but each one contains many embedded documents. The fields returned in the query above are all primitive, i.e. aren't documents.
I currently don't have any indexes setup other than the default ones - I've never had any issue until now. Should I try adding a compound key on:
{ _id: 1, name: 1, opp: 1, status: 1, dateUploaded: -1 }
Or is there a smarter way to perform the query? First time using mongodb.
From the MongoDB documentation on limits and thresholds:
MongoDB will only return sorted results on fields without an index if the combined size of all documents in the sort operation, plus a small overhead, is less than 32 megabytes.
The embedded documents probably push the total size over that limit. You should add an index on the sorted field, dateUploaded, if you want to run the same query.
Otherwise you can limit your query and start paginating the results.
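To see why roughly 500 documents can still hit the limit, here is a quick worked check. It takes 32 megabytes as 32 * 1024 * 1024 bytes and ignores the small overhead mentioned in the quote; the document sizes are made up and the helper name fits_in_memory_sort is hypothetical. Real sizes would have to come from the collection's own statistics.

```python
SORT_LIMIT_BYTES = 32 * 1024 * 1024  # 32 megabytes, as quoted above


def fits_in_memory_sort(doc_sizes, limit=SORT_LIMIT_BYTES):
    """Return True if the combined document size stays under the limit."""
    return sum(doc_sizes) < limit


per_doc_budget = SORT_LIMIT_BYTES // 500   # average size that 500 documents may reach
small_docs = [10 * 1024] * 500             # 500 documents of 10 KiB each
large_docs = [100 * 1024] * 500            # 500 documents of 100 KiB each
```

With 500 documents the average budget is about 64 KiB per document, which documents with many embedded sub-documents can exceed easily.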

Index multiple MongoDB fields, make only one unique

I've got a MongoDB database of metadata for about 300,000 photos. Each has a native ID that needs to be unique to protect against duplicate insertions. It also has a time stamp.
I frequently need to run aggregate queries to see how many photos I have for each day, so I also have a date field in the format YYYY-MM-DD. This is obviously not unique.
Right now I only have an index on the id property, like so (using the Node driver):
collection.ensureIndex(
{ id:1 },
{ unique:true, dropDups: true },
function(err, indexName) { /* etc etc */ }
);
The group query for getting the photos by date takes quite a long time, as one can imagine:
collection.group(
{ date: 1 },
{},
{ count: 0 },
function ( curr, result ) {
result.count++;
},
function(err, grouped) { /* etc etc */ }
);
I've read through the indexing strategy, and I think I need to also index the date property. But I don't want to make it unique, of course (though I suppose it's fine to make it unique in combination with the unique id). Should I do a regular compound index, or can I chain the .ensureIndex() function and only specify uniqueness for the id field?
MongoDB does not have "mixed" indexes that are only partially unique. On the other hand, why not use _id instead of your id field, if possible? It is already indexed and unique by definition, so it will prevent you from inserting duplicates.
Mongo can only use a single index in a query clause - important to consider when creating indexes. For this particular query and requirements I would suggest to have a separate unique index on id field which you would get if you use _id. Additionally, you can create a non-unique index on date field only. If you run query like this:
db.collection.find({"date": "01/02/2013"}).count();
Mongo will be able to answer the query from the index alone (a covered index query), which is the best performance you can get.
Note that Mongo won't be able to use a compound index on (id, date) if you are searching by date only. Your query has to match an index prefix first, i.e. if you search by id then the (id, date) index can be used.
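The prefix rule can be modelled with a few lines of Python. This is a deliberately simplified sketch: it only checks which leading fields of a compound index the query constrains, and ignores sorting and everything else the real query planner considers. The function name usable_prefix is hypothetical.

```python
def usable_prefix(index_fields, query_fields):
    """Return the leading index fields that the query constrains."""
    prefix = []
    for field in index_fields:
        if field not in query_fields:
            break
        prefix.append(field)
    return prefix


compound = ["id", "date"]
by_date = usable_prefix(compound, {"date"})        # empty: no prefix matches
by_id = usable_prefix(compound, {"id"})            # the leading field matches
by_both = usable_prefix(compound, {"id", "date"})  # the whole index matches
```

An empty result corresponds to the case described above, where a query on date alone cannot take advantage of the (id, date) index.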
Another option is to pre-aggregate in the schema itself: keep a counter per date and increment it whenever you insert a photo. This way you don't need to run any aggregation jobs. You can also run some tests to determine whether this approach performs better than aggregation.
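A minimal in-memory sketch of the pre-aggregation idea, assuming one counter per date string. A dict and a Counter stand in for the photos collection and the counter documents, and insert_photo is a hypothetical helper; in MongoDB the increment would be an update on a counter document rather than a Python assignment.

```python
from collections import Counter

photos = {}                 # stand-in for the photos collection, keyed by id
photos_per_day = Counter()  # stand-in for the pre-aggregated counters


def insert_photo(photo_id, date):
    """Insert a photo once and bump that day's counter; return False on a duplicate."""
    if photo_id in photos:
        return False
    photos[photo_id] = {"id": photo_id, "date": date}
    photos_per_day[date] += 1
    return True


insert_photo("a1", "2013-02-01")
insert_photo("a2", "2013-02-01")
insert_photo("a1", "2013-02-01")   # duplicate id, counter is left unchanged
insert_photo("b7", "2013-02-02")
```

Reading the count for a day then becomes a single lookup instead of a group over the whole collection. The counter is only bumped when the insert actually succeeds, so duplicates do not inflate it.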
