CosmosDB sometimes doesn't return documents

I'm playing with CosmosDB as an alternative to MongoDB cluster but faced a strange issue.
I inserted several hundred documents into a collection. After that, I opened the Robo 3T tool (a GUI for working with MongoDB) and tried to query the top 50 documents, but it returned nothing. I even ran a .count() query and got 0 as a result. After several tries, though, I can get the documents. When I restart Robo 3T and repeat the test, the first several requests again return nothing.
I have exactly the same issue when I read documents from a C# application with the MongoDB driver.
Has anybody faced such issues? What am I doing wrong?
Thank you in advance!

Related

How to bulk delete (say) millions of documents spread across millions of logical partitions in Cosmos DB SQL API?

The MS Azure documentation doesn't say anything about this. The official bulk executor documentation covers only insert and update operations, not delete. There is a suggested server-side JavaScript program to create a stored procedure, which sounds very good, but it requires the partition key value as input. That doesn't make sense when our documents are spread across millions of logical partitions.
This is a very simple business need. While migrating a huge volume of data into a SQL API Cosmos collection, if we insert some wrong data, there seems to be no option to delete it other than restoring to a previous state. I have explored this for a few hours now but couldn't find a solution. I even raised a case with MS support; they pointed to some .NET code, which does not look straightforward. What if someone doesn't know .NET?
Can't we easily bulk delete docs spread across several logical partitions in the Cosmos SQL API? It feels very frustrating.
I hope you can provide some accurate details on how to achieve this, with simple, straightforward sample code and steps. I hope MS and Cosmos DB experts will share their views as well.
I even raised a case with MS support; they pointed to some .NET code, which does not look straightforward.
Evidently you have already made some effort to find a solution, apart from the two options below:
1. Bulk delete stored procedure: https://github.com/Azure/azure-cosmosdb-js-server/blob/master/samples/stored-procedures/bulkDelete.js
2. Bulk delete executor:
.NET: https://github.com/Azure/azure-cosmosdb-bulkexecutor-dotnet-getting-started/blob/master/BulkDeleteSample/BulkDeleteSample/Program.cs
Java: https://github.com/Azure/azure-cosmosdb-bulkexecutor-java-getting-started/blob/master/samples/bulkexecutor-sample/src/main/java/com/microsoft/azure/cosmosdb/bulkexecutor/bulkdelete/BulkDeleter.java
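For illustration of how option 1 is used (and of the limitation the question describes): the stored procedure executes within a single logical partition, so it must be invoked once per partition key value. A minimal sketch with the @azure/cosmos JavaScript SDK, assuming the sample procedure has been registered under the hypothetical id bulkDeleteSproc and that the affected partition key values can be enumerated:

    // Hypothetical sketch: run the bulk-delete sproc once per partition key.
    // Database, container, sproc id, and the query are placeholders.
    const { CosmosClient } = require('@azure/cosmos');

    async function bulkDelete(partitionKeyValues) {
      const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING);
      const container = client.database('mydb').container('mycollection');
      const query = 'SELECT * FROM c WHERE c.migrationBatch = "bad-batch"';

      for (const pk of partitionKeyValues) {
        let continuation = true;
        while (continuation) {
          // The sample sproc responds with { deleted, continuation }.
          const { resource } = await container.scripts
            .storedProcedure('bulkDeleteSproc')
            .execute(pk, [query]);
          continuation = resource.continuation;
        }
      }
    }

If you cannot enumerate the partition key values, which is exactly the situation in the question, this approach does not scale, and that is the core of the complaint.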
So far, only the official solutions above are supported. Another workaround is Cosmos DB's TTL (time-to-live) feature. I believe you have your own logic to judge which part of the data is correct and which part is wrong and should be deleted. You could set a TTL on the bad documents so that they are removed automatically as soon as they expire.
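As a sketch of that TTL workaround, assuming the JavaScript SDK (@azure/cosmos), a container with TTL enabled (e.g. a default TTL of -1), and a hypothetical query and partition key property identifying the bad documents:

    // Hedged sketch: flag wrongly migrated documents for automatic deletion
    // by setting a short per-item TTL. 'migrationBatch' and 'myPartitionKey'
    // are placeholders for your own schema.
    const { CosmosClient } = require('@azure/cosmos');

    async function markForDeletion() {
      const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING);
      const container = client.database('mydb').container('mycollection');

      const { resources } = await container.items
        .query('SELECT * FROM c WHERE c.migrationBatch = "bad-batch"')
        .fetchAll();

      for (const doc of resources) {
        doc.ttl = 60; // seconds until Cosmos DB removes the document
        await container.item(doc.id, doc.myPartitionKey).replace(doc);
      }
    }

    markForDeletion().catch(console.error);

TTL deletions run in the background using leftover request-unit capacity, so the documents disappear without an explicit delete call per partition.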
Has anyone tried this? It looks like a good solution, in Java:
https://github.com/Azure/azure-cosmosdb-bulkexecutor-java-getting-started#bulk-delete-api
You could achieve this by writing a batch job that deletes documents overnight based on some date configuration. Here is an article on how to do it:
https://medium.com/@vaibhav.medavarapu/bulk-delete-documents-from-azure-cosmos-db-using-asp-net-core-8bc95dd20411

CouchDB replication ignoring sporadic documents

I've got a CouchDB setup (CouchDB 2.1.1) for my app, which relies heavily on replication integrity. We are using the "one DB per user" approach, with an additional layer of "role" DBs that group users.
Recently, while increasing the number of beta testers, we discovered that some documents had not been replicated as they should have been. We are unable to see any pattern in document size, creation/update time, user, or anything else. The errors seem to happen sporadically, with 2-3 successfully replicated docs followed by 4-6 non-replicated ones.
The server responds with {"error":"not_found","reason":"missing"} on those docs.
Most (but not all) of the user documents have been replicated to the corresponding Role DB, but very few made it all the way to the Master DB. This never happened when testing with < 100 documents (now we're at 1000-1200 docs in the DB).
I discovered a problem with the "max open files" setting mentioned in the Performance chapter in the docs and fixed it, but the non-replicated documents are still not replicating. If I open a document and save it, it will replicate.
This is my current theory:
1. The replication process tried to copy new documents when the user went online.
2. The writes failed because Linux's "max open files" limit was hit.
3. The Master DB still thinks the replication was successful.
4. At a later replication, the Master DB ignores those old documents and only tries to replicate new ones.
Could this be correct? And can I somehow make the CouchDB server "double check" all documents and the integrity of previous replications?
Thank you for your time and any helpful comments!
I have experienced something similar in the past: when attempting to replicate documents without sufficient permissions, the replication fails, as it should. But when the permissions issue is fixed, the documents you attempted to replicate cannot then be replicated, although an edit/save on the documents fixes the issue. I wonder if this is due to checkpoints? The CouchDB manual says this about the use_checkpoints flag:
Disabling checkpoints is not recommended as CouchDB will scan the Source database's changes feed from the beginning.
Scanning from the beginning sounds like it might fix the problem, though, so perhaps disabling checkpoints could help. I never got back to that issue at the time, so I'm afraid this is not a proper answer, just a suggestion.
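For anyone who wants to test that theory, a one-off replication with checkpoints disabled can be posted to CouchDB's documented _replicate endpoint. A minimal sketch in Node.js (server URL, credentials, and database names are placeholders):

    // Hedged sketch: force CouchDB to rescan the source's changes feed
    // from the beginning by ignoring previous replication checkpoints.
    (async () => {
      const res = await fetch('http://localhost:5984/_replicate', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': 'Basic ' + Buffer.from('admin:password').toString('base64'),
        },
        body: JSON.stringify({
          source: 'user_db',      // a user DB with missing documents
          target: 'role_db',      // the role DB it should replicate into
          use_checkpoints: false, // rescan instead of resuming from a checkpoint
        }),
      });
      console.log(await res.json()); // replication status/history
    })();

This makes the replicator re-examine every document revision, at the cost of a full scan per run.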

How do you perform queries without specifying the shard key in the MongoDB API, and how do you query across partitions?

How do you perform queries without specifying the shard key in the MongoDB API, and how do you query across partitions?
In the SQL API, the latter is enabled by setting EnableCrossPartitionQuery to true on the request, but I'm not able to find anything like that for the MongoDB API. And my queries that work on an unsharded collection now fail (queries that specify the shard key work as expected).
The queries fail regardless of whether I use the AsQueryable extension syntax or the aggregation framework.
As far as I know, there is no property similar to EnableCrossPartitionQuery in the Cosmos DB MongoDB API. In fact, Cosmos DB is an independent server implementation that does not directly align with MongoDB server versions and features.
CosmosDB supports a subset of the MongoDB API and translates requests into the CosmosDB SQL equivalent. CosmosDB has some different behaviours and results, particularly with their implementation of partitioning as compared to MongoDB's sharding. But the onus is on CosmosDB to improve their emulation of MongoDB.
Certainly, you could send feedback to the Cosmos DB team to get official assistance, or consider using MongoDB Atlas on Azure if you'd like full MongoDB feature support.
Hope it helps you.
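For illustration, the practical workaround on the MongoDB API side is to include the shard key in every filter. A minimal sketch with the Node.js MongoDB driver (the question uses the C# driver, but the idea is the same; 'region' is a placeholder for whatever your shard key is):

    // Hedged sketch: target a single partition by always including the
    // shard key in the query filter.
    const { MongoClient } = require('mongodb');

    async function findOpenOrders() {
      const client = new MongoClient(process.env.COSMOS_MONGO_URI);
      await client.connect();
      const orders = client.db('mydb').collection('orders');

      // Works: the shard key ('region') is part of the filter.
      const docs = await orders.find({ region: 'eu', status: 'open' }).toArray();

      await client.close();
      return docs;
    }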
This was confirmed as a bug by the Product Group team! It will be fixed in the first two weeks of September, in case anyone runs into the same problem in the meantime.

Connect multiple Nodejs clients to Mongodb

I am running an app on Heroku and I want to send a notification every morning at 8. I have about 200k users and it takes a long time and slows down my app, so I would like to separate the two and keep running my API on one instance, and have a separate instance just for sending the notification in the morning.
How can I have two Node.js servers using the same MongoDB database (and therefore the same models)?
I do not understand how to connect the two instances to the same database (I am using MLab on Heroku) without copying the model schema.
Because in that case, if I modify it on one instance, I would need to do the same to the other, and it doesn't make sense to me.
I hope it is clear enough.
Thank you
TGrif: thanks a lot, I just used the MongoDB native driver, and it works. I've always used Mongoose and didn't even think about looking for alternatives.
Documentation here: https://github.com/mongodb/node-mongodb-native
Thank you
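For future readers, a minimal sketch of what this looks like with the native driver (connection string and names are placeholders): both instances simply open a client against the same database, and the documents themselves carry their structure, so there are no model files to keep in sync between the two apps.

    // Hedged sketch: two independent Node.js apps can each run this against
    // the same MLab/MongoDB URI; no schema definitions are duplicated.
    const { MongoClient } = require('mongodb');

    async function main() {
      const client = new MongoClient(process.env.MONGODB_URI);
      await client.connect();
      const users = client.db('myapp').collection('users');

      const count = await users.countDocuments();
      console.log(`users: ${count}`);

      await client.close();
    }

    main().catch(console.error);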

Issue with PrestoDB & MongoDB

I'm having some strange issues querying MongoDB from the Presto CLI. I have my mongodb.properties set up to connect to 3 different databases, as shown below.
connector.name=mongodb
mongodb.seeds=172.23.0.7:27017
mongodb.schema-collection=stage,configuration,hub
mongodb.credentials=<username>:<password>@stage,<username>:<password>@hub,<username>:<password>@configuration
None of the queries, including show columns from <collection> or select count(*) from <collection>, work on stage or hub, nor on the collections in configuration.
The question is: does Presto support this kind of query on MongoDB? If yes, what could be the problem with my configuration or queries? Our intention is to compare data between Oracle and MongoDB.
Appreciate your help.
This is an old post, but I hope this is still useful for future users. You shouldn't be setting mongodb.schema-collection like that. This property is meant to point to the Mongo collection that describes the schema of other collections, typically defaulting to _schema when it exists. This is covered in the docs of most Presto distributions, including prestodb.
It does not let you control which collections Presto will have access to; that must be done elsewhere (e.g. when setting up Presto's user in the MongoDB cluster). Once correctly set up, Presto will be able to perform queries such as the ones in your example on all the collections it has access to.
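For illustration, a corrected mongodb.properties along the lines of this answer might look like the sketch below (addresses and credentials are placeholders); the schema-collection line is simply dropped so the connector falls back to its default, _schema:

    connector.name=mongodb
    mongodb.seeds=172.23.0.7:27017
    mongodb.credentials=<username>:<password>@stage,<username>:<password>@hub,<username>:<password>@configuration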
