Multiple Document Upsert in MongoEngine - python-3.x

I have been really struggling to get multi-doc upserts to work in Mongoengine for Python. I may not be thinking about the query structure correctly or how the mongoengine documents actually work.
Currently, I have to upsert each document individually, which is awfully inefficient. The process is that I'm receiving messages, generating mongoengine documents, storing them in a list, and then attempting to pass that list to the collection, but it doesn't work at all.
The code looks like this:
new_docs = []
for data in data_for_docs:
    new_docs.append(create_doc(data))

Definitions.objects.update(new_docs, upsert=True, multi=True)
I've read a few posts that mention the to_mongo method, but I'm not sure how it could help in this instance.
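One approach that may work, sketched here rather than taken from a verified answer: drop down to the raw pymongo collection behind the model and batch the upserts with bulk_write. This is where to_mongo comes in, since it serializes each mongoengine document into a plain dict. The name field used as the match key below is an assumption; substitute whatever uniquely identifies your documents.

from pymongo import UpdateOne

def bulk_upsert(docs):
    # Build one UpdateOne upsert per document; all of them are sent
    # to the server in a single bulk_write round trip.
    ops = []
    for doc in docs:
        data = doc.to_mongo().to_dict()  # mongoengine document -> plain dict
        data.pop('_id', None)            # let MongoDB assign _id on insert
        ops.append(UpdateOne({'name': data['name']},  # 'name' is a hypothetical unique key
                             {'$set': data},
                             upsert=True))
    return Definitions._get_collection().bulk_write(ops, ordered=False)

new_docs = [create_doc(data) for data in data_for_docs]
bulk_upsert(new_docs)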

Related

How to get data from Elasticsearch, update it if new data comes in, and inject it again?

I have nearly 200,000 rows of tuples in my Pandas DataFrame, which I injected into Elasticsearch. Now, when I run the program, it should check whether the data is already present in Elasticsearch and insert it only if it is not.
I'd recommend not worrying about it and just loading everything into Elasticsearch. As long as your _ids are consistent, existing documents will be overwritten instead of duplicated. So just be sure to specify an _id for each document and you are fine; the bulk helpers in the elasticsearch-py client already support setting an _id value for each document.
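A minimal sketch of that idea with the elasticsearch-py bulk helper; the index name my-index and the id column are assumptions, not details from the question:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()

def index_frame(df):
    # One action per DataFrame row; a consistent _id means re-running
    # this overwrites existing documents instead of duplicating them.
    actions = (
        {
            "_index": "my-index",     # hypothetical index name
            "_id": row["id"],         # hypothetical unique column
            "_source": row.to_dict(),
        }
        for _, row in df.iterrows()
    )
    return bulk(es, actions)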

Can we post an array of objects with knex?

I'm using knex with Node and I was thinking of making a field in the migration that would take an array of objects. I know simple stuff like
.string("bla")
.notNullable()
and so on, but looking through the documentation I can't seem to find it.
I want something like:
.array_of_objects //field name
knex.schema.createTable('users', function (table) {
    table.increments();
    table.string('name');
    table.timestamps();
    // example
    table.array
})
With table.specificType you can create whichever type of database column you like. You need to check which kinds of columns your database supports.
It would be easier to answer the question if you could tell us what kind of SQL queries you are after. To be able to use knex, you need to know SQL; knex will never abstract SQL away.
If you are looking for a database library for node.js that abstracts SQL away, you might like to check out Sequelize.
Thanks for the post about specificType. From what I saw, PG didn't have an array-of-objects column, but it did have an array of strings. I ended up separating my array of objects into two string arrays, one named keys and the other values, and stored both of those arrays in the database. Then, when I did a GET request, I joined the two arrays back together into an array of objects with some JavaScript. Seems to do the trick.

Solr API: accessing a document field is taking a lot of time

I am trying to access a field in a custom request handler. I am accessing it like this for each document:
Document doc = reader.document(id);
String[] docFields = doc.getValues("state");
There are around 600,000 documents in Solr. For a query matching all the docs, it is taking more than 65 seconds.
I have also tried the SolrIndexSearcher.doc method, but it also takes around 60 seconds.
Removing the above lines of code brings the qtime down to milliseconds, but I need to access that field for my algorithm.
Is there a more optimised way to do this?
It seems that you are querying one document at a time, which is slow.
If you need all the documents, query *:* (instead of asking for each specific id) and then iterate over the results.
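The question is inside a custom Java request handler, but the access pattern the answer describes looks roughly like this from a client, sketched here with pysolr (the URL, core name, and page size are placeholders):

import pysolr

solr = pysolr.Solr('http://localhost:8983/solr/mycore')  # placeholder URL

start, rows = 0, 1000
while True:
    # One query per page instead of one lookup per document id
    results = solr.search('*:*', fl='id,state', start=start, rows=rows)
    for doc in results:
        state = doc.get('state')  # the stored field, fetched in bulk
    start += rows
    if start >= results.hits:
        break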

Mongoose bulk insert or update documents

I am working on a Node.js app, and I've been searching for a way around using the Model.save() function, because I want to save many documents at the same time and doing it one by one would be a waste of network round trips and processing.
I found a way to bulk insert. However, my model has two properties that together make a document unique, an ID and a HASH (I am getting this info from an API, so I believe I need both to identify a document), so I want an already existing object to be updated instead of inserted again.
Is there any way to do that? I was reading something about making concurrent calls to save the objects using Q, but I think this would still generate unwanted load on the Mongo server, wouldn't it? Does Mongo or Mongoose have a method to bulk insert or update the way it does with insert?
Thanks in advance
I think you are looking for the Bulk.find(<query>).upsert().update(<update>) function.
You can use it this way:
var bulk = db.yourCollection.initializeUnorderedBulkOp();
for (<your for statement>) {
    bulk.find({ID: <your id>, HASH: <your hash>}).upsert().update({<your update fields>});
}
bulk.execute(<your callback>);
For each document, it will look for a document matching the {ID: <your id>, HASH: <your hash>} criteria. Then:
If it finds one, it will update that document using {<your update fields>}
Otherwise, it will create a new document
As you need, it will not make a round trip to the mongo server on each iteration of the for loop; the operations are sent together in a single call on the bulk.execute() line.

How to efficiently bulk insert and update mongodb document values from an array?

I have a Tags collection which contains documents of the following structure:
{
    word: "movie",  // tag word
    count: 1        // count of times the tag word has been used
}
I am given an array of new tags that need to be added/updated in the Tags collection:
["music","movie","book"]
I can update the counts of all tags that currently exist in the Tags collection with the following query:
db.Tags.update({word: {$in: ["music", "movie", "book"]}}, {$inc: {count: 1}}, true, true);
While this is an effective update strategy, I am unable to see which tag values were not found in the collection, and setting the upsert flag to true did not create new documents for the missing tags.
This is where I am stuck, how should I handle the bulk insert of "new" values into the Tags collection?
Is there any other way I could better utilize the update so that it does upsert the new tag values?
(Note: I am using Node.js with mongoose; solutions using mongoose/node-mongo-native would be nice but are not necessary)
Thanks ahead
The concept of using upsert and the $in operator simultaneously is incongruous: there is no way to distinguish between "upsert if *any* match" and "upsert if *none* match".
In this case, MongoDB is doing the version you don't want, and you can't make it change behaviour.
I would suggest simply issuing three consecutive writes by looping through the array of tags. I know it's annoying and has a bad code smell, but that's just how MongoDB works.
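The asker is on Node with mongoose, but for illustration here is that per-tag loop sketched in pymongo, with the writes batched into a single round trip via bulk_write (the database name is a placeholder):

from pymongo import MongoClient, UpdateOne

tags = ["music", "movie", "book"]
coll = MongoClient().mydb.Tags  # 'mydb' is a placeholder database name

# One $inc-with-upsert per tag: an existing tag has its count incremented,
# and a missing tag is created with count starting at 1.
ops = [UpdateOne({'word': tag}, {'$inc': {'count': 1}}, upsert=True)
       for tag in tags]
result = coll.bulk_write(ops, ordered=False)

As a side effect, result.upserted_ids maps operation indexes to the _ids of newly created documents, which also answers the "which tags were not found" part of the question.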
