Is there a way to add an incrementing id in one statement in MongoDB? - python-3.x

So I have a small database that isn't going to grow much more, and I'm trying to retrieve a single document from it through an API I implemented in Python, so that given a document Id the API returns that document. However, I find it awkward to ask the user to type in a random number from the db. All I need is a function that modifies each document, setting an id field that auto-increments. As I said, it's not going to grow that much, and performance isn't really an issue here.
So far what I've been able to do is this:
var i = 0;
db.MyCollection.update({},
    {$set : {"new_field": 1}},
    {upsert: false, multi: true}
);
i++;
I managed to set an id field, but it sets the same number on every document (the total document count). So if the db has 10 docs, every Id ends up as 10.

A find-and-modify operation returns the updated document (either the pre- or post-update version, depending on the returnDocument setting). You can use this with $inc to implement a counter. Ruby example, where c is a collection:
irb(main):005:0> c['foo'].insert_one(counter:true,count:1)
=> #<Mongo::Operation::Insert::Result:0x8040 documents=[{"n"=>1, "opTime"=>{"ts"=>#<BSON::Timestamp:0x00005609f260b7e0 #seconds=1594961771, #increment=2>, "t"=>1}, "electionId"=>BSON::ObjectId('7fffffff0000000000000001'), "ok"=>1.0, "$clusterTime"=>{"clusterTime"=>#<BSON::Timestamp:0x00005609f260b538 #seconds=1594961771, #increment=2>, "signature"=>{"hash"=><BSON::Binary:0x8060 type=generic data=0x0000000000000000...>, "keyId"=>0}}, "operationTime"=>#<BSON::Timestamp:0x00005609f260b290 #seconds=1594961771, #increment=2>}]>
irb(main):011:0> c['foo'].find_one_and_update({counter:true},{'$inc':{count:1}})
=> {"_id"=>BSON::ObjectId('5f112f6b2c97a6281f63f575'), "counter"=>true, "count"=>1}
irb(main):012:0> c['foo'].find_one_and_update({counter:true},{'$inc':{count:1}})
=> {"_id"=>BSON::ObjectId('5f112f6b2c97a6281f63f575'), "counter"=>true, "count"=>2}
irb(main):013:0> c['foo'].find_one_and_update({counter:true},{'$inc':{count:1}})
=> {"_id"=>BSON::ObjectId('5f112f6b2c97a6281f63f575'), "counter"=>true, "count"=>3}
irb(main):014:0> c['foo'].find_one_and_update({counter:true},{'$inc':{count:1}})
=> {"_id"=>BSON::ObjectId('5f112f6b2c97a6281f63f575'), "counter"=>true, "count"=>4}

Why not just use this logic: instead of updating everything via one query, launch multiple queries, one by one? Mongo will do it pretty fast even if you have >1M docs in the database (and per your own words, "I got a small database"), because of the pre-built index on the _id field.
This is JavaScript code, but I guess you'll understand the logic of it:
let all_documents = db.MyCollection.find({}).toArray();
for (let i = 0; i < all_documents.length; i++) {
    db.MyCollection.update({ _id: all_documents[i]._id }, { $set: { "new_field": i } }, { upsert: false });
}
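In the python-3.x context of the original question, the same per-document numbering can be sketched with PyMongo, batching the updates into one bulk_write; names are illustrative:
from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")  # hypothetical connection URI
coll = client["mydb"]["MyCollection"]

# Assign a sequential new_field value to every existing document.
ops = [
    UpdateOne({"_id": doc["_id"]}, {"$set": {"new_field": i}})
    for i, doc in enumerate(coll.find({}, {"_id": 1}))
]
if ops:
    coll.bulk_write(ops)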

Related

To store Array of data to Azure Redis Cache [duplicate]

I have an array of objects that I want to store in Redis. I can break up the array and store the items as objects, but I can't figure out how to get something like
{0} : {"foo" : "bar", "qux" : "doe"}, {1} : {"name" : "Saras", "age" : 23}
and then search the db based on name and get the matching key back. I need something like this, but I can't come close to getting it right.
incr id //correct
(integer) 3
get id //correct
"3"
SADD id {"name" : "Saras"} //wrong
SADD myset {"name" : "Saras"} //correct
(integer) 1
First is getting this part right.
Second is somehow getting the key back from the value, i.e.
if name === "Saras"
then key = 1
which I find tough. Or I can store it directly as an array of objects and use a simple for loop:
for (var i = 0; i < userCache.users.length; i++) {
    if (userCache.users[i].userId == userId && userCache.users[i].deviceId == deviceId) {
        return i;
    }
}
Kindly suggest which route is best with some implementation?
What I found to work was using a unique identifier as the key, stringifying the whole object when storing the data, and applying JSON.parse when extracting it.
Example code:
client
    .setAsync(obj.deviceId.toString(), JSON.stringify(obj))
    .then((doc) => {
        return client.getAsync(obj.deviceId.toString());
    })
    .then((doc) => {
        return JSON.parse(doc);
    })
    .catch((err) => {
        return err;
    });
Stringifying and then parsing it back is a computationally heavy operation, though, and it will block the Node.js server if the JSON gets large. I'm prepared to take that hit for the lower complexity because I know my JSON won't be huge, but it needs to be kept in mind when going for this approach.
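The same store-as-JSON pattern looks like this as a minimal sketch in Python with redis-py, assuming the object is keyed by its deviceId as in the snippet above; connection details are placeholders:
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)  # hypothetical connection

def save(obj):
    # Store the whole object as a JSON string under its deviceId.
    r.set(str(obj["deviceId"]), json.dumps(obj))

def load(device_id):
    raw = r.get(str(device_id))
    return json.loads(raw) if raw is not None else None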
Redis is a pretty simple key-value store. Yes, there are other data structures like sets, but it has VERY limited query capabilities. For example, if you want to find data by name, you would have to do something like this:
SET Name "serialized data of object"
SET Name2 "serialized data of object2"
SET Name3 "serialized data of object3"
then:
GET Name
would return data.
Of course this means that you can't store two entries with the same name.
You can do limited text matching on keys using: http://redis.io/commands/scan
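As an illustration of that limited matching, redis-py exposes SCAN's MATCH option; the key pattern here is purely hypothetical:
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)  # hypothetical connection

# Iterate keys matching a glob pattern without blocking the server the way KEYS would.
for key in r.scan_iter(match="Saras*"):
    print(key, r.get(key))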
To summarize: I think you should use another tool for complex queries.
The first issue you have, SADD id {"name" : "Saras"} //wrong, is expected, since the "id" key is not of type set; it is a string type.
In redis the only access point to data is through its key.
As kiss said, perhaps you should be looking for other tools.

Paginating a mongoose mapReduce, for a ranking algorithm

I'm using a MongoDB mapReduce to code a ranking feed algorithm; it almost works, but the last thing to implement is pagination. mapReduce supports limiting the results, but how could I implement the offset (skipping), based e.g. on the latest viewed _id of the results, knowing that I'm using mongoose?
This is the procedure I wrote:
o = {};
o.map = function() {
    // log10(likes + comments) / elapsed hours from the post creation
    emit(Math.log(this.likes + this.comments + 1) / Math.LN10 / Math.abs((now - this.createdAt) / 6e7 + 1), this);
};
o.reduce = function(key, values) {
    // sort the values when they have the same score
    values.sort(function(a, b) {
        return a.createdAt - b.createdAt;
    });
    // serialize the values, because mongoose does not support multiple returned values
    return JSON.stringify(values);
};
o.scope = {now: new Date()};
o.limit = 15;
Posts.mapReduce(o, function(err, results) {
    if (err) return console.log(err);
    console.log(results);
});
Also, if mapReduce is not the way to go, can you suggest other ways to implement something like this?
What you need is a page delimiter, which is not the id of the latest viewed item as you say, but your sorting property. In this case, it seems to be the formula Math.log(this.likes + this.comments + 1) / Math.LN10 / Math.abs((now - this.createdAt) / 6e7 + 1).
So your mapReduce query needs to hold a where condition on the value of that formula, specifically formula >= <the last value seen on the previous page>. It also needs to hold the value of createdAt from the last page, since you don't sort by that (assuming createdAt is unique). So your mapReduce query would say where: theFormulaExpression, createdAt: { $lt: lastCreatedAt }.
If you do allow multiple identical createdAt values, you have to work a little outside the database itself.
So you just search by the formula.
Ideally, that gives you one element with exactly that value, and the next ones sorted after it. So, before replying to the module caller, remove that first element from the array (and make sure you actually ask for more results than you need because of this).
Now, since you allow multiple identical values, you need another identifying property, say the object id or created_at. Your consumer (the caller of this module) will have to provide both (the last value of the score and the createdAt of the last object). Say a run of identical scores is split exactly across a page boundary: one or more objects are on the previous page, another set is on the next. You'd then have to remove not just the top value (because that same score was already served on the previous page), but possibly several of them from the top.
Then it gets really tricky, because potentially your whole page was already served: compare the _ids and look for the first one after the one your module caller provided, or look into the data, determine how many matching values there are, and try to get at least that many more values from mapReduce than your actual page size.
Aside from that, I would do this with the aggregation framework instead; it should be much more performant.
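A hedged sketch of what that aggregation-based pagination could look like with PyMongo, reusing the score formula from the question; the connection URI and collection name are placeholders, and the commented-out $match stage marks where the (score, createdAt) page delimiter described above would go:
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical connection URI
posts = client["mydb"]["posts"]                    # hypothetical collection

now = datetime.now(timezone.utc)
page_size = 15

pipeline = [
    # score = log10(likes + comments + 1) / abs((now - createdAt) / 6e7 + 1), as in the question
    {"$addFields": {"score": {"$divide": [
        {"$log10": {"$add": ["$likes", "$comments", 1]}},
        {"$abs": {"$add": [
            {"$divide": [{"$subtract": [now, "$createdAt"]}, 6e7]}, 1]}},
    ]}}},
    {"$sort": {"score": -1, "createdAt": -1}},
    # For pages after the first, uncomment and fill in the last (score, createdAt) pair
    # seen on the previous page:
    # {"$match": {"$or": [
    #     {"score": {"$lt": last_score}},
    #     {"score": last_score, "createdAt": {"$lt": last_created_at}},
    # ]}},
    {"$limit": page_size},
]
page = list(posts.aggregate(pipeline))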

mongodb: another "how to add a random record" thread

I've come across many versions of this same question here on StackOverflow, none providing a valid, solid solution, so here we go:
I need to pick a random document from around 5 million documents in my MongoDB database in an efficient way.
I've tried getting the .count and using .skip to get a random document, but it takes almost three seconds and is very, very inefficient.
I can't make changes to the documents (like adding a "random" entry to each document) or change their _id's.
I've tried the solution of adding documents with an incremental _id (so I can pick a random _id and bypass .skip), but it brought more headaches than it solved when I tried to add many documents in a short amount of time.
Adding data in an incremental way, or picking a random document, shouldn't be this hard. I'm either missing some common knowledge, or doing something wrong, or this is just how it is.
Wanted to bring up the topic and get your responses.
Here is a way using the default ObjectId values for _id and a little math and logic.
// Get the "min" and "max" timestamp values from the _id values in the collection,
// and the diff between them.
// 4 bytes of a hex string is 8 characters.
var min = parseInt(db.collection.find()
        .sort({ "_id": 1 }).limit(1).toArray()[0]._id.str.substr(0, 8), 16) * 1000,
    max = parseInt(db.collection.find()
        .sort({ "_id": -1 }).limit(1).toArray()[0]._id.str.substr(0, 8), 16) * 1000,
    diff = max - min;

// Get a random value from diff and divide/multiply by 1000 for the "_id" precision:
var random = Math.floor(Math.floor(Math.random() * diff) / 1000) * 1000;

// Work out a "random" _id value in the range:
var _id = new ObjectId(((min + random) / 1000).toString(16) + "0000000000000000");

// Then query for the single document:
var randomDoc = db.collection.find({ "_id": { "$gte": _id } })
    .sort({ "_id": 1 }).limit(1).toArray()[0];
That's the general logic in shell representation and easily adaptable.
So in points:
Find the min and max primary key values in the collection
Generate a random number that falls between the timestamps of those documents.
Add the random number to the minimum value and find the first document that is greater than or equal to that value.
This uses "padding" from the timestamp value in "hex" to form a valid ObjectId value, since that is what we are looking for. Using integers as the _id value is essentially simpler, but it's the same basic idea as in the points above.
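A rough sketch of the same three steps in Python with PyMongo, using bson's ObjectId helpers; the connection URI and collection name are placeholders:
import random
from datetime import datetime, timezone

from bson import ObjectId
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical connection URI
coll = client["mydb"]["collection"]                # hypothetical collection

# 1. Oldest and newest ObjectId generation times in the collection.
min_ts = coll.find_one(sort=[("_id", 1)])["_id"].generation_time
max_ts = coll.find_one(sort=[("_id", -1)])["_id"].generation_time

# 2. Pick a random whole-second timestamp in that range and pad it into an ObjectId.
seconds = random.randint(int(min_ts.timestamp()), int(max_ts.timestamp()))
boundary = ObjectId.from_datetime(datetime.fromtimestamp(seconds, tz=timezone.utc))

# 3. Take the first document whose _id is greater than or equal to that boundary.
random_doc = coll.find_one({"_id": {"$gte": boundary}}, sort=[("_id", 1)])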

How to retrieve all documents in couchdb database without causing out of memory

I have a CouchDB database which contains about 200,000 tweets; the keys are tweet IDs. I have a query which needs to retrieve all documents to look for some information. I'm using LightCouch to work with CouchDB in a Java web app. If I run a query like this:
List<JsonObject> tweets = dbClient.view("_all_docs").query(JsonObject.class);
and then loop through tweets, using
JsonObject tweetJson = dbClient.find(JsonObject.class, tweet.get("id").toString().replaceAll("\"", ""));
to retrieve each tweet one by one, it takes an extremely long time for 200,000 documents. If I load all documents in one single query using includeDocs(true):
List<JsonObject> allTweets = dbClient.view("_all_docs").includeDocs(true).query(JsonObject.class);
it causes an out-of-memory exception since the number of documents is too large. So how can I deal with this problem? I'm thinking about using limit(5000) to retrieve 5000 documents at a time and loop through the whole database, but I don't know how to write the loop that continues retrieving the next 5000 after the first 5000 docs. One possible solution is using startKey and endKey, but I'm confused about how to use them when the key is a tweet ID.
Use queryPage but make sure to use a String as the Key
See: https://github.com/lightcouch/LightCouch/issues/26#event-122327174
0.1.6 still seems to show this behaviour.
A workaround that I found for this goes something like this:
changes = dbClient.changes()
        .since(null)   // or .since(since) if you want an offset
        .includeDocs(true);

int size = 1;
getCursor("0");
while (size > 0) {
    ChangesResult resultSet = changes.limit(40000).getChanges();
    List<ChangesResult.Row> rowList = resultSet.getResults();
    for (ChangesResult.Row feed : rowList) {
        // instantiate your object via gson
        // ...
    }
    getCursor(resultSet.getLastSeq());
    size = rowList.size();
}
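If you would rather page through _all_docs with limit and startkey, as the question suggests, here is a hedged sketch against CouchDB's plain HTTP API in Python; the database URL and the process() handler are placeholders:
import json
import requests

DB_URL = "http://localhost:5984/tweets"  # hypothetical database URL
PAGE = 5000

def process(doc):
    # Placeholder for whatever per-tweet work is needed.
    pass

start_key = None
while True:
    params = {"include_docs": "true", "limit": PAGE + 1}
    if start_key is not None:
        params["startkey"] = json.dumps(start_key)  # view keys must be JSON-encoded
    rows = requests.get(f"{DB_URL}/_all_docs", params=params).json()["rows"]
    for row in rows[:PAGE]:
        process(row["doc"])
    if len(rows) <= PAGE:
        break
    start_key = rows[PAGE]["key"]  # the extra row's key becomes the next page's startkey
Requesting one extra row per page avoids re-processing the boundary document, because startkey is inclusive in CouchDB.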

How to search and sort with CouchDB in one map function

I'm stumbling a bit with my CouchDB knowledge.
I have a database of content that is tagged with an array of tags and has a created date.
I want to create a view that pulls a limited number of newest stories tagged with a specific tag.
For example, the newest 6 stories tagged "Business."
Ran across this question, which seems to get me almost to where I need to go, but I'm missing one key element, which I think is how to craft the query string to sort by one key while searching by the other.
Here's my map function.
function(doc) {
    if (doc.published == "yes" && doc.type == "news") {
        for (var i = 0; i < doc.tags.length; i++) {
            if (doc.tags[i]) {
                emit([doc.created, doc.tags[i]], doc);
            }
        }
    }
}
So how do I query that view for all documents tagged "Business", returning the newest documents based on created?
The created attribute is in a date-sortable format.
First, I would switch the order of your emit:
emit([doc.tags[i], doc.created]);
(leave out doc as well, you can just add include_docs=true to get the entire document, and your view won't take up so much disk-space in the process)
Now you can query for the all the stories tagged as "Business" by using the following querystring:
startkey=["Business"]&endkey=["Business",{}]
You'll get all the documents with the tag business, and they'll be sorted by date.
This takes advantage of view collation, which basically is the rules governing how indexes are sorted/queried. For complex keys like this, the sorting is done for each item of the array separately. (ie. the first key is sorted first, the second key is sorted second, etc) This is why the order matters, as you must always move from left to right when querying a view index.
If you want the 6 most recent, your querystring will need to change:
descending=true&limit=6&endkey=["Business"]&startkey=["Business",{}]
NOTICE You need to swap the startkey/endkey values, due to how the descending parameter works. See the View reference page on the wiki for further explanation.
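For completeness, the same "newest 6 Business stories" request can be issued from Python over HTTP; note that complex keys must be JSON-encoded in the querystring (the view URL here is illustrative):
import json
import requests

VIEW_URL = "http://localhost:5984/database/_design/story/_view/tagged"  # hypothetical view URL

params = {
    "descending": "true",
    "limit": 6,
    "include_docs": "true",
    # startkey/endkey are swapped because descending reverses the traversal direction
    "startkey": json.dumps(["Business", {}]),
    "endkey": json.dumps(["Business"]),
}
rows = requests.get(VIEW_URL, params=params).json()["rows"]
newest_business = [row["doc"] for row in rows]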
OK, I think I figured this out, but I'm not quite certain I fully understand it.
I found this story about complex keys and searching and sorting.
My map function looks like this:
function(doc) {
    if (doc.published == "yes" && doc.type == "news") {
        for (var i = 0; i < doc.tags.length; i++) {
            if (doc.tags[i]) {
                emit([doc.tags[i], doc.created], doc);
            }
        }
    }
}
And to query and sort using it, the query looks like this.
http://localhost:5984/database/_design/story/_view/tagged?limit=10&startkey=["Business"]&endkey=["Business",{}]&descending=false
I'm getting the results I want, but I'm not entirely certain I understand it all.
