I'm planning on having my database stored in Cloudant.
We do not plan to replicate into Cloudant, only out of it for backup purposes.
Is it safe to assume that there should not be any conflicts in documents arising from the inner workings of BigCouch?
It is safe to assume that the clustered, BigCouch-inspired code we run at Cloudant does not normally create additional conflicts in your documents. If you want to become a power user, you can read up on 'quorum' at docs.cloudant.com, but you can safely ignore that to first order.
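If you ever want to verify this for yourself, one easy check is to ask for the `_conflicts` field when reading a document. A minimal sketch over the HTTP API, with a made-up account URL, credentials, database, and document id:

```python
import requests

# Hypothetical account, database, document, and credentials.
ACCOUNT = "https://myaccount.cloudant.com"
DB = "mydb"
DOC_ID = "some-doc"
AUTH = ("apikey", "apisecret")

# conflicts=true asks Cloudant/CouchDB to include any conflicting revisions.
resp = requests.get(f"{ACCOUNT}/{DB}/{DOC_ID}",
                    params={"conflicts": "true"},
                    auth=AUTH)
resp.raise_for_status()

# The _conflicts array is only present when conflicts actually exist.
print(resp.json().get("_conflicts", []))
```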
I'm using a local database on my web server to sync certain data from external APIs. The local database is used to serve the web application. The data I'm syncing is different for each user visiting the web app. Since the sync job is periodically but continuously writing to the DB while users are accessing their data from the web page, I'm wondering what would give me the best performance here.
Since the sync job is continuously writing to the DB, I believe the collection is locked until it's done. I'm thinking that having multiple collections would help here since the lock would be on a particular collection that is being written to rather than on a single collection every time.
Is my thinking correct here? I basically don't want reads to get throttled since the write operation is continuously locking up one collection.
Collection-level locking was never a thing in MongoDB. Before the WiredTiger storage engine arrived (introduced in MongoDB 3.0 and the default since 3.2), there were plenty of occasions when the whole database would lock.
Nowadays, with WiredTiger, writing to a single collection from multiple threads and/or processes is extremely efficient. The right way to distribute a very heavy write load in MongoDB is to shard your collection.
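For reference, sharding is set up with a couple of admin commands. A rough sketch with pymongo, using made-up database and collection names and assuming you are connected to a mongos in a sharded cluster:

```python
from pymongo import MongoClient

# Hypothetical connection string; must point at a mongos of a sharded cluster.
client = MongoClient("mongodb://localhost:27017")

# Enable sharding for the database, then shard the collection on a hashed
# key so the write load spreads evenly across shards.
client.admin.command("enableSharding", "webapp")
client.admin.command(
    "shardCollection",
    "webapp.user_data",
    key={"user_id": "hashed"},
)
```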
To test a sharded vs unsharded config you can easily spin up both configurations in parallel with MongoDB Atlas.
There is extensive information regarding lock granularity and locking in MongoDB in general in the concurrency FAQ of the official documentation.
In general, writing to multiple collections (for a small to medium value of "multiple", and assuming all of the collections are created in advance) can be faster than using a single collection. The cost is that queries become awkward, and potentially slow if you have to perform joins via the aggregation pipeline instead of a single collection/index scan, for example.
If you have so many collections that the number of open files causes either the DB or the OS to start evicting files from their respective caches, performance will start dropping again.
Creating collections can also be relatively slow, so if it happens under load it may hurt performance.
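A rough sketch of that layout with pymongo, pre-creating a fixed set of collections up front and routing each user's writes to one of them (database, collection, and field names are made up):

```python
import zlib
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["webapp"]

N_BUCKETS = 8  # a "small to medium" number of collections

# Create the collections and their indexes in advance, not under load.
for i in range(N_BUCKETS):
    name = f"user_data_{i}"
    if name not in db.list_collection_names():
        db.create_collection(name)
        db[name].create_index("user_id")

def bucket_for(user_id: str):
    """Route a user's documents to a stable collection."""
    return db[f"user_data_{zlib.crc32(user_id.encode()) % N_BUCKETS}"]

bucket_for("alice").insert_one({"user_id": "alice", "payload": {"k": "v"}})
```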
I'm quite new to CouchDB... let's say that I have multiple databases in my CouchDB, one per user, and each db has one config document. I need to add a property to that document across all dbs. Is it possible to update that config document in all databases (without doing it one by one)? If yes, what is the best way to achieve this?
If I'm reading your question correctly, you should be able to update the document in one database, and then use filtered replication to update the other databases (though you can't have modified the document in those other databases, otherwise you'll get a conflict).
Whether it makes sense for the specific use case depends. If it's just a setting shared by all users, I'd probably create a shared settings database instead.
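A rough sketch of the replication approach over the raw HTTP API, using the replicator's `doc_ids` option to restrict it to the single config document (server URL, credentials, and database names are made up, and the config doc is assumed to have the same _id everywhere):

```python
import requests

# Hypothetical CouchDB/Cloudant server with admin credentials in the URL.
COUCH = "http://admin:password@localhost:5984"

# Database holding the already-updated config document, plus the per-user
# databases it should be pushed to.
SOURCE_DB = "config-master"
user_dbs = ["userdb-alice", "userdb-bob"]

for target in user_dbs:
    resp = requests.post(
        f"{COUCH}/_replicate",
        json={
            "source": f"{COUCH}/{SOURCE_DB}",
            "target": f"{COUCH}/{target}",
            "doc_ids": ["config"],  # only replicate this one document
        },
    )
    resp.raise_for_status()
    print(target, resp.json().get("ok"))
```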
I do not think that is possible, and I don't think that is the intended use for CouchDB. Even if you have everything in a single database it's not possible to do it in a quick way (there is no equivalent to a SQL update/where statement).
I've set up an Azure batch process to read multiple CSV files at the same time and write to Azure DocumentDB. I need a suggestion on the consistency level that fits best for me.
I read through the consistency levels document (http://azure.microsoft.com/en-us/documentation/articles/documentdb-consistency-levels/) but am unable to relate my business case to the options provided there.
My process:
Get the document by id:
- If found, pull a copy of the document, apply the changes, and replace it.
- If not found, create a new entry.
If your writes and reads are from the same process (or you can share an instance of the document client), then session consistency will give you the best performance while ensuring you get consistent reads. This is because each SDK manages the session tokens, ensuring that the read goes to a replica that has seen the write. Even if you don't do this, in your case the write will fail if you use the same document id: within a collection, document ids are guaranteed to be unique.
Short version - session consistency (the default) is probably a good choice.
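A minimal sketch of that get-then-replace-or-create flow, written against the current azure-cosmos Python SDK rather than the original DocumentDB SDK; the endpoint, key, database, container, and partition-key field are all hypothetical:

```python
from azure.cosmos import CosmosClient, exceptions

# Hypothetical account endpoint, key, and database/container names.
client = CosmosClient("https://myaccount.documents.azure.com:443/",
                      credential="<primary-key>")
container = (client.get_database_client("imports")
                   .get_container_client("records"))

def upsert_row(row: dict) -> None:
    """Get by id; replace if found, otherwise create."""
    try:
        doc = container.read_item(item=row["id"], partition_key=row["pk"])
        doc.update(row)                       # apply the changes from the CSV
        container.replace_item(item=doc, body=doc)
    except exceptions.CosmosResourceNotFoundError:
        container.create_item(body=row)
```

With session consistency (the account default) and a shared client instance, a read after a write from the same process will see that write; container.upsert_item(body=row) can also collapse the whole read-then-write into a single call if you don't need to merge fields.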
So when a document is deleted, the metadata is actually preserved forever. For a hosted service like Cloudant, where storage costs money every month, I would instead like to completely purge the deleted documents.
I read somewhere about a design pattern where you use dbcopy in a view to put the docs into a 'current' db and then periodically delete the expired dbs. But I can't find the article, and I don't quite understand how database naming would work. How would the Cloudant clients always know the 'current' database name?
Cloudant does not expose the _purge endpoint (the loose consistency guarantees between the clustered nodes make purging tricky).
The most common solution to this problem is to create a second database and use replication with a validate_document_update so that deleted documents with no existing entry in the target database are rejected. When replication is complete (or acceptably up-to-date if using continuous replication), switch your application to use the new database and delete the old one. There is currently no way to rename databases but you could use a virtual host which points to the "current" database.
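A rough sketch of that pattern over the raw HTTP API (server URL, credentials, and database names are made up): create the new database, install a validate_doc_update function that rejects tombstones for documents the target has never seen, then replicate.

```python
import requests

# Hypothetical server with admin credentials, plus old and new database names.
COUCH = "http://admin:password@localhost:5984"
OLD_DB, NEW_DB = "mydb", "mydb-v2"

# 1. Create the target database.
requests.put(f"{COUCH}/{NEW_DB}").raise_for_status()

# 2. Install a validation function that rejects deleted docs with no
#    existing entry in the target.
ddoc = {
    "_id": "_design/scrub",
    "validate_doc_update": """
        function(newDoc, oldDoc, userCtx) {
            if (newDoc._deleted && !oldDoc) {
                throw({forbidden: 'tombstone for a document not in this db'});
            }
        }
    """,
}
requests.put(f"{COUCH}/{NEW_DB}/_design/scrub", json=ddoc).raise_for_status()

# 3. Replicate; the rejected tombstones simply never make it into NEW_DB.
requests.post(f"{COUCH}/_replicate",
              json={"source": f"{COUCH}/{OLD_DB}",
                    "target": f"{COUCH}/{NEW_DB}"}).raise_for_status()
```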
I'd caution that a workload which generates a high ratio of deleted:active documents is generally an anti-pattern in Cloudant. I would first consider whether you can change your document model to avoid it.
Deleted documents are kept forever in CouchDB, even after compaction, though the tombstone that remains is pretty small, as it contains only three fields:
{"_id": "234wer", "_rev": "123", "_deleted": true}
The reason for this is to make sure that all the replicated databases stay consistent. If a document that is replicated across several databases were deleted in one location and then purged, there would be no way to tell the other replicas about the deletion.
There is _purge, but as explained in the wiki, it is only to be used in special cases.
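You can see the tombstone for yourself against any CouchDB instance; the server URL, credentials, and database below are made up:

```python
import requests

# Hypothetical local CouchDB with admin credentials in the URL.
COUCH = "http://admin:password@localhost:5984"
DB = "demo"

requests.put(f"{COUCH}/{DB}")  # create the database
rev = requests.put(f"{COUCH}/{DB}/234wer", json={"some": "data"}).json()["rev"]

# Delete the document...
del_rev = requests.delete(f"{COUCH}/{DB}/234wer", params={"rev": rev}).json()["rev"]

# ...a plain GET now answers {"error": "not_found", "reason": "deleted"}...
print(requests.get(f"{COUCH}/{DB}/234wer").json())

# ...but the tombstone revision is still stored.
print(requests.get(f"{COUCH}/{DB}/234wer", params={"rev": del_rev}).json())
# e.g. {"_id": "234wer", "_rev": "2-...", "_deleted": true}
```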
I am using a C++ shell extension DLL which reads and writes data in SQLite database tables. There is another application (an exe) which also accesses all the tables.
Sometimes my DLL gets an exception, "The database file is locked", when I try to delete/insert/update rows in the SQLite database tables. This happens because the other application is accessing the tables at that time.
Is there any way to resolve this issue from my DLL? Can I use the solution mentioned in this link: http://stackoverflow.com/questions/6455290/implementing-sqlite3-busy-timeout-in-an-ios-app
In the current code, I am using the CppSQLite3.cpp method execQuery(const char* szSQL) to execute the SQL query.
Please advise.
First of all, you should know that SQLite does database-level locking. When you start a transaction and the other application tries to write something to the same database, it gets "database is locked", and SQLite automatically keeps retrying that query until the sqlite3_busy_timeout interval has elapsed.
So the trick is to make sure you keep your transactions as short as possible, i.e.
1. Do a begin transaction
2. Update/Delete/Insert
3. Commit
and not have anything else between these 3 steps.
Also increase your sqlite3_busy_timeout interval to suit your application, depending on how large your transactions are.
You can try WAL mode, where reads and writes to SQLite can happen at the same time, but it comes with its own set of disadvantages. You can refer to the SQLite documentation.
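The original code is C++ (CppSQLite3), but the same pattern is easy to see with Python's built-in sqlite3 module; the timeout, the short transaction, and the WAL pragma map directly onto sqlite3_busy_timeout, BEGIN/COMMIT, and PRAGMA journal_mode in the C API (the file and table names here are made up):

```python
import sqlite3

# timeout=5 means "retry for up to 5 seconds when the file is locked",
# i.e. the equivalent of sqlite3_busy_timeout(db, 5000) in C.
conn = sqlite3.connect("shared.db", timeout=5)

# Optional: WAL mode lets readers keep working while another process writes.
conn.execute("PRAGMA journal_mode=WAL")

# Keep the transaction as short as possible: begin, write, commit, nothing else.
with conn:  # BEGIN ... COMMIT (ROLLBACK on error)
    conn.execute("UPDATE items SET qty = qty - 1 WHERE id = ?", (42,))
```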
SQLite has some restrictions regarding multiple users and multiple transactions: you can't read from and write to the same resource from different transactions at the same time, and the database is locked while it is being updated.
Here are some links that might help you
http://sqlite.org/c3ref/busy_timeout.html
http://www.sqlite.org/c3ref/busy_handler.html
Good Luck