CouchDB purge whole database - couchdb

I'm quite sure that I want to delete a database in order to release its resources. I'll never need replication, the old revisions, or the logs again. But despite frequently deleting the database and recreating a new one, the disk space keeps growing.
How can I simply get rid of the whole database and its effects on disk?

Deleting the database via DELETE /db-name removes the database's data and associated indexes on disk. The database is as deleted as it's going to be.
If you are using the purge feature to remove all the documents in a database, consider a DELETE followed by a PUT to recreate it instead.
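For illustration, the delete-and-recreate cycle over CouchDB's HTTP API might look like the following Python sketch; host, credentials, and database name are placeholders:
# Rough sketch using the requests library; host, credentials and
# database name below are assumptions, not values from the question.
import requests

BASE = "http://127.0.0.1:5984"   # assumed CouchDB address
AUTH = ("admin", "password")     # assumed admin credentials
DB = "db-name"

# Remove the database, its documents and its indexes from disk.
resp = requests.delete(f"{BASE}/{DB}", auth=AUTH)
resp.raise_for_status()

# Recreate an empty database with the same name.
resp = requests.put(f"{BASE}/{DB}", auth=AUTH)
resp.raise_for_status()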
Logs are a different matter, as they are not database-specific but belong to the database engine itself. It might be that you need to clear old logs, but you will probably only be able to do that in a time-based rather than a per-database manner.

Related

Implications of sharing an in-memory SQLite database with multiple processes

Initially I created an SQLite database ('temp.db') and shared its connection with multiple processes. I started to get lots of "database locked" errors.
I need the database for temporary storage only. The only operations performed are INSERT and SELECT on a single table, and no COMMIT is done on the database.
To overcome the lock issue above, I created an in-memory (':memory:') SQLite database and shared its connection with multiple processes. I have not run into any database lock errors so far.
In both cases I have not used any locking mechanism. Using one in the first case might have resolved the issue, but I don't want to increase execution time.
Is locking needed in the second case? What other pitfalls should I watch out for? What is the impact on a long-running application?
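For reference, a minimal sketch of the "locking in the first case" idea mentioned in the question: one on-disk database, one connection per process, and a multiprocessing.Lock serializing the INSERTs. The table and file names are made up:
# Hedged sketch, assuming a throwaway on-disk database and a single table;
# each process opens its own connection and a multiprocessing.Lock
# serializes the INSERTs to avoid "database is locked" errors.
import sqlite3
from multiprocessing import Lock, Process

DB_PATH = "temp.db"  # hypothetical temporary database file

def writer(lock, values):
    conn = sqlite3.connect(DB_PATH, timeout=30)
    for v in values:
        with lock:  # one writer at a time
            conn.execute("INSERT INTO items(value) VALUES (?)", (str(v),))
            conn.commit()
    conn.close()

if __name__ == "__main__":
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS items(value TEXT)")

    lock = Lock()
    procs = [Process(target=writer, args=(lock, range(i * 100, (i + 1) * 100)))
             for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()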

What to do if Azure SQL Managed Instance reaches the max storage limit?

Azure SQL Managed Instance can reach its storage limit when the total size of its databases (both user and system) reaches the instance limit. In this case the following issues might happen:
Any operation that updates data or rebuilds structures might fail because the change cannot be written to the log.
Some read-only queries might fail if they require tempdb, which cannot grow.
Automated backups might not be taken, because the database must perform a checkpoint to flush dirty pages to the data files, and this action fails because there is no space.
How can this problem be resolved if the managed instance reaches the storage limit?
There are several ways to resolve this issue:
Increase the instance storage limit using the portal, PowerShell, or the Azure CLI.
Decrease the size of the databases by using DBCC SHRINKDATABASE, or by dropping unnecessary data/tables (for example, #temporary tables in tempdb); a scripted example is sketched below.
The preferred way is to increase the storage, because even if you free some space, the next maintenance operation might fill it again.
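A hedged sketch of the shrink option, run from Python with pyodbc; the connection string, credentials, and database name are placeholders (increasing the instance storage itself is done through the portal, PowerShell, or the Azure CLI rather than from code):
# Hedged sketch: run DBCC SHRINKDATABASE against one database on the managed
# instance to release unused space. Connection details are placeholders.
import pyodbc

conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<your-instance>.database.windows.net;"
    "DATABASE=master;UID=<user>;PWD=<password>"
)

with pyodbc.connect(conn_str, autocommit=True) as conn:
    cursor = conn.cursor()
    # Shrinking can be slow and cause fragmentation, so increasing the
    # instance storage is usually the better long-term fix.
    cursor.execute("DBCC SHRINKDATABASE (MyUserDatabase)")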

Make Node/MEANjs Highly Available

I'm probably opening up a can of worms with regard to how many hundreds of directions can be taken with this, but I want high availability / disaster recovery for my MEANjs servers.
Right now, I have 3 servers:
MongoDB
App (Grunt'ing the main application; this is the front-end server)
A third server for other processing on the back-end
So at the moment, if I reboot my MongoDB server (or more realistically, it crashes for some reason), I suddenly see this in my App server terminal:
MongoDB connection error: Error: failed to connect to [172.30.3.30:27017]
[nodemon] app crashed - waiting for file changes before starting...
After MongoDB is back online, nothing happens on the app server until I re-grunt.
What's the best practice for this situation? You can see in the error that I'm using nodemon to monitor changes to the app. I bet that upon init I could get my MongoDB server to update a file on the app server within nodemon's view to force a restart? Or is there some other tool I can use for this? Or should I be handling my connections to the db server more gracefully so the app doesn't "crash"?
Is there a way to redirect to a secondary MongoDB in case the primary isn't available? That would be more apt for HA/DR-type stuff.
I would like to start with a side note: Given the description in the question and the comments to it, I am not convinced that using AWS is a wise option. A PaaS provider like Heroku, OpenShift or AppFog seems to be more suitable, especially when combined with a MongoDB service provider. Running MongoDB on EBS can be quite a challenge when you are new to MongoDB. And pretty expensive, too, as soon as you need provisioned IOPS.
Note: In the following paragraphs I have simplified a few things for the sake of comprehensibility.
If you insist on running it on your own, however, you have an option. MongoDB itself comes with means of automatic, transparent failover, called a replica set.
A minimal replica set consists of two data-bearing nodes and a so-called arbiter. Write operations go only to the node currently elected "primary", and reads do, too, unless you explicitly allow or request reads to be performed on the current "secondary". The secondary constantly syncs to the primary. If the current primary goes down for some reason, the former secondary is elected primary.
The arbiter is there so that there is always a quorum (qualified majority would be an equivalent term) of members to elect the current secondary to be the new primary. This quorum is mainly important for edge cases, but since you cannot rule out these edge cases, an odd number of members is a hard requirement for a MongoDB replica set (setting aside some special cases).
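For illustration, a minimal three-member configuration with an arbiter might be initiated roughly like this; a hedged sketch assuming a recent PyMongo, with placeholder host names (the same config document can be passed to rs.initiate() in the mongo shell):
# Hedged sketch: initiate a minimal replica set with two data-bearing
# members and one arbiter. Host names and ports are placeholders.
from pymongo import MongoClient

config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db1.example.com:27017"},
        {"_id": 1, "host": "db2.example.com:27017"},
        {"_id": 2, "host": "arbiter.example.com:27017", "arbiterOnly": True},
    ],
}

# Connect directly to one of the data-bearing members and run replSetInitiate.
client = MongoClient("db1.example.com", 27017, directConnection=True)
client.admin.command("replSetInitiate", config)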
The beauty of this is that almost all drivers, and the Node.js driver for sure, are replica-set aware and handle the failover procedure pretty gracefully. They simply send the reads and writes to the new primary, without any change needed anywhere else.
You only need to deal with a few cases during the failover process itself. Without going into much detail: you basically check for certain errors in the corresponding callbacks and redo the operation if you encounter one of those errors and redoing it is feasible, as in the sketch below.
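A hedged sketch of that retry pattern, shown with PyMongo for brevity and assuming a recent driver version (the Node.js driver accepts the same replica-set connection string and surfaces analogous errors); hosts, database, and collection names are placeholders:
# Hedged sketch: connect with a replica-set-aware client and retry a write
# if it fails while a new primary is being elected.
import time
from pymongo import MongoClient
from pymongo.errors import AutoReconnect, NotPrimaryError

client = MongoClient(
    "mongodb://db1.example.com:27017,db2.example.com:27017/?replicaSet=rs0"
)
collection = client.mydb.events

def insert_with_retry(doc, attempts=5, delay=1.0):
    for attempt in range(attempts):
        try:
            return collection.insert_one(doc)
        except (AutoReconnect, NotPrimaryError):
            # No primary available yet; wait for the election to finish.
            time.sleep(delay)
    raise RuntimeError("insert failed after repeated failover retries")

insert_with_retry({"msg": "hello"})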
As you might have noticed, the third member, the arbiter, does not hold data. It is a very lightweight process and can basically run on the cheapest instance you can find.
So you get data replication and automatic, transparent failover with relative ease, at the cost of the cheapest VM you can find, since you would need two data-bearing nodes anyway if you used any other means.

Is it safe to compact a CouchDB database that has continuous replication?

We have a couple of production CouchDB databases that have blown out to 30 GB and need to be compacted. These are used by a 24/7 operations website and are replicated with another server using continuous replication.
From tests I've done it'll take about 3 mins to compact these databases.
Is it safe to compact one side of the replication while the production site and replication are still running?
Yes, this is perfectly safe.
Compaction works by writing the compacted state of the database to a new file and then updating pointers to switch over to it. This is because CouchDB has a very firm rule that the internals of a database file never get updated, only appended to, with an fsync. This is why you can rudely kill CouchDB's processes and it doesn't have to recover or rebuild the database like you would in other solutions.
This means that you need extra disk space available to rewrite the file. So trying to compact a CouchDB database to get out of full-disk warnings is usually a non-starter.
Also, replication works from the internal by-sequence index (a B+ tree); the replicator is not streaming the entire database file from disk onto the network pipe.
Lastly, there will of course be an increase in system resource utilization. However, your tests should have shown you roughly how much this costs on your system vs an idle CouchDB, which you can use to determine how closely you're pushing your system to the breaking point.
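For reference, compaction is triggered per database over the HTTP API, and its progress can be watched in the database info document. A rough Python sketch, with host, credentials, and database name as placeholders:
# Hedged sketch: start compaction and poll until it finishes.
# Host, credentials and database name are placeholders.
import time
import requests

BASE = "http://127.0.0.1:5984"
AUTH = ("admin", "password")
DB = "mydb"

# POST /{db}/_compact requires a JSON content type and admin rights.
resp = requests.post(f"{BASE}/{DB}/_compact", auth=AUTH,
                     headers={"Content-Type": "application/json"})
resp.raise_for_status()

# The database info document reports whether compaction is still running.
while requests.get(f"{BASE}/{DB}", auth=AUTH).json().get("compact_running"):
    time.sleep(5)
print("compaction finished")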
I have been working with CouchDB for a while, replicating databases and writing views to fetch data.
I have seen its replication behavior and observed the following, which can answer your question:
In the replication process, previous revisions of the documents are not replicated to the destination; only the current revision is replicated.
Compacting the database only removes the previous revisions, so it will not cause any problems.
Compaction is done only on the database you run it against, so it should not affect its replica, which continuously listens for changes. Replication works from the current revisions, not the previous ones. To verify this, you can run the following:
This query shows the changes for all sequences of the database. It works on the basis of the latest revision changes, not the previous ones (so compaction should not do any harm):
curl -X GET $HOST/db/_changes
The result is simple:
{"results":[
],
"last_seq":0}
More info can be found here: CouchDB Replication Basics
This might help you understand it. In short, the answer to your question is YES, it is safe to compact a database that is under continuous replication.

How to limit the size of Azure Table Storage for logs?

Is it possible to limit the size of an Azure Table Storage table? I'm using it for storing logs. Also, can I make it so that when the limit is reached, the old entries are deleted to make space for new ones? Something like capped collections in MongoDB or round-robin databases?
Any help would be greatly appreciated. Thanks in advance!
Somewhat remarkably: no, there's no way (currently) to do this that I'm aware of.
We had the same situation, and we now use the Cerebrata Diagnostics Manager (http://cerebrata.com/Products/AzureDiagnosticsManager/) to purge them periodically.
It is also possible to explicitly drop the WAD* tables, but you may see issues if you have an instance still running when you do this. From http://social.msdn.microsoft.com/Forums/en-AU/windowsazuretroubleshooting/thread/3329834a-ddae-4180-b787-ceb7aee16e83:
#Sam --> I would be careful about deleting the table. Deleting the WAD* table is a viable option if you don't have too much data in it. What happens when you delete a table is that it is not deleted at that very moment; however, it is marked for deletion and some background process actually deletes that table. If you (or the diagnostics process) try to create the same table you would get an error that "Table is being deleted".
You can also use Visual Studio to purge logs from a given time: the link above includes a way to do that. I have a feeling that a PowerShell script could also be written to do this, as sketched below.
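A rough sketch of such a purge script, here in Python rather than PowerShell; the connection string and retention window are placeholders, and it assumes the WADLogsTable partition keys are "0"-prefixed .NET tick counts, which is how the diagnostics agent typically writes them:
# Hedged sketch: periodically delete diagnostics log entities older than a
# cutoff. Uses the azure-data-tables SDK; all names/values are assumptions.
from datetime import datetime, timedelta, timezone
from azure.data.tables import TableClient

CONNECTION_STRING = "<storage-account-connection-string>"  # placeholder
TABLE_NAME = "WADLogsTable"

def ticks(dt):
    # .NET ticks: 100-nanosecond intervals since 0001-01-01
    return int((dt - datetime(1, 1, 1, tzinfo=timezone.utc)).total_seconds() * 10_000_000)

cutoff = datetime.now(timezone.utc) - timedelta(days=7)  # keep one week of logs
cutoff_pk = "0" + str(ticks(cutoff))                     # assumed partition key format

table = TableClient.from_connection_string(CONNECTION_STRING, table_name=TABLE_NAME)
old_entities = table.query_entities(f"PartitionKey lt '{cutoff_pk}'",
                                    select=["PartitionKey", "RowKey"])
for entity in old_entities:
    table.delete_entity(partition_key=entity["PartitionKey"],
                        row_key=entity["RowKey"])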

Resources