Relocating live MongoDB data to another drive - Linux

I have MongoDB running on a Windows Azure Linux VM.
The database is located on the system drive and I wish to move it to another hard drive, since there is not enough space there.
I found this post:
Changing MongoDB data store directory
There seems to be a fine solution suggested there, yet another person mentioned something about copying the files.
My database is live and receiving data all the time. How can I carry out this move while losing the least data possible?
Thanks,

First, if this is a production system you really need to be running it as a replica set. Running production databases on singleton MongoDB instances is not a best practice. I would consider two full members plus one arbiter the minimum production setup.
If you want to go the replica set route, you can first convert this instance to a replica set:
http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
This should involve minimal downtime.
Then add two new instances with the correct storage setup. After they sync you will have a full three-member set. You can then fail over to one of the new instances and remove the original, badly-provisioned instance from the replica set. Finally, I'd add an arbiter instance to get you back up to three members while keeping costs down.
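As a rough sketch of those steps from a shell (every hostname, port, and path below is a placeholder, not your actual deployment):

    # Restart the existing mongod as a one-member replica set.
    mongod --dbpath /var/lib/mongodb --replSet rs0 --fork --logpath /var/log/mongodb/mongod.log

    # Initiate the set, then add the two new members built on the correct storage.
    mongo --eval 'rs.initiate()'
    mongo --eval 'rs.add("newhost1:27017")'
    mongo --eval 'rs.add("newhost2:27017")'

    # Once they have synced and you have failed over, drop the old member
    # and add an arbiter to get back to three voting members.
    mongo --eval 'rs.remove("oldhost:27017")'
    mongo --eval 'rs.addArb("arbiterhost:27017")'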
If, on the other hand, you do not want to run as a replica set, I'd shut down mongod on this instance, copy the files over to the new directory structure on another appropriate volume, change the config to point to it (either by changing dbpath or by using a symlink), and then start it up again. Downtime will largely be a factor of the size of your existing database, so the sooner you do this the better.
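A minimal sketch of that copy, assuming the existing data lives in /var/lib/mongodb and the new volume is mounted at /mnt/data (both paths are examples, not your actual layout):

    sudo service mongod stop                          # stop writes before copying
    sudo rsync -av /var/lib/mongodb/ /mnt/data/mongodb/
    # Either point dbpath at the new directory in /etc/mongod.conf:
    #     dbpath = /mnt/data/mongodb
    # ...or leave the config alone and symlink the old path to the new one:
    sudo mv /var/lib/mongodb /var/lib/mongodb.old
    sudo ln -s /mnt/data/mongodb /var/lib/mongodb
    sudo service mongod start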
However - I will stress this again - if you are looking for little to no downtime with MongoDB, you need to use a replica set.

Related

How to perform backup and restore of a JanusGraph database which is backed by Apache Cassandra?

I'm having trouble figuring out how to take a backup of a JanusGraph database which is backed by Apache Cassandra as persistent storage.
I'm looking for the correct methodology for performing backup and restore tasks. I'm very new to this concept and have no idea how to do it. It would be highly appreciated if someone could explain the correct approach or point me to the right documentation to safely execute these tasks.
Thanks a lot for your time.
Cassandra can be backed up a few ways. One way is called a "snapshot", which you issue via the "nodetool snapshot" command. Cassandra will create a "snapshots" sub-directory, if it doesn't already exist, under each table being backed up (each table has its own directory where it stores its data), and then create a directory for this particular occurrence of the snapshot (you can either name it with a "nodetool snapshot" parameter or let it default). Cassandra then creates hard links to all of the SSTables that exist for each table, looping through every table and keyspace covered by your "nodetool snapshot" parameters. It's very fast, since creating links takes almost no time.

You will have to run this command on each node in the Cassandra cluster to back up all of the data; each node's data is backed up to the local host. I know DSE, and possibly Apache, are adding functionality to back up to object storage as well (I don't know if this is an OpsCenter-only capability or if it can be done via the snapshot command too). You will also have to watch the space consumption, as there is no process that cleans these snapshots up.
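For example (the keyspace and tag names here are made up; adjust for your cluster), a snapshot run on one node might look like:

    # Snapshot one keyspace under an explicit tag; repeat on every node.
    nodetool snapshot -t pre_backup_tag my_keyspace

    # See what snapshots exist, and clear one out once it has been copied
    # off-node, since nothing cleans these up automatically.
    nodetool listsnapshots
    nodetool clearsnapshot -t pre_backup_tag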
Like many database systems, you can also purchase/use third-party software to perform backups (e.g. Cohesity (formerly Talena), Rubrik, etc.). We use one such product in our environments and it works well (graphical interface, easy-to-use point-in-time recovery, etc.). They also offer easy-to-use "refresh" capabilities (e.g. refresh your PT environment from, say, production backups).
Those are probably the two best options.
Good luck.

Cloning CouchDB data from one server to another through the file system (without the replicator)

We have two nodes with CouchDB installed. One of the nodes has data on it, and we want to copy the data from that instance to the other CouchDB instance. We want to avoid the replicator due to the volume of the data.
We tried copying data from %couchdb%/data/shards and %couchdb%/data/.shards to the corresponding locations on the target node, as per one of the suggestions from CouchDB backups and cloning the database,
but we are not able to see the data in the server's Fauxton UI. Can someone suggest what is missing?
Couchtransform lets you convert or just clone data from one database to another; it's multi-threaded and won't need to deal with massive files.
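If you do want to pursue the raw file copy from the question, a rough sketch is below (default Linux paths, which may not match your install). One hedged note: CouchDB 2.x records node names in its shard maps, so copied shards may not show up in Fauxton unless the target node's name matches the source's or the shard map is updated.

    # On the target node: stop CouchDB before touching its data directory.
    sudo systemctl stop couchdb

    # Pull both the visible and hidden shard trees from the source node.
    rsync -av source-node:/opt/couchdb/data/shards/  /opt/couchdb/data/shards/
    rsync -av source-node:/opt/couchdb/data/.shards/ /opt/couchdb/data/.shards/

    sudo chown -R couchdb:couchdb /opt/couchdb/data
    sudo systemctl start couchdb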

Deleting local modifications when replicating CouchDB

I have a master CouchDB which is replicated to a local database every time the local application starts.
The user can modify the local docs, but I want those docs to be deleted when replication starts if they have disappeared from the master database.
How can I achieve that?
This is already how replication works. When a document is modified (including deletion), that change gets replicated.
The only possible problem you may encounter is that if a local change is made at the same time a deletion occurs, then upon sync, there will be a conflict.
So you need your local app to do some sort of conflict resolution that selects the deleted revision. I suggest reading about the CouchDB Replication and Conflict Model as a starting place.
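As a rough illustration of that resolution over CouchDB's HTTP API (the database and document names here are invented), you would look up the surviving conflict revisions and delete the ones you don't want, so the deletion wins:

    # Ask for the document together with its conflicting revisions.
    curl 'http://localhost:5984/mydb/some-doc?conflicts=true'
    #   -> {"_id":"some-doc","_rev":"3-aaa...","_conflicts":["2-bbb..."]}

    # To resolve in favour of the deletion from the master, delete the
    # locally-modified revision(s) that survived the sync.
    curl -X DELETE 'http://localhost:5984/mydb/some-doc?rev=3-aaa...'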

MongoDB: Is it possible to store the data directory on a GlusterFS volume (across multiple VMs), so that a standby Mongo server can use it when required?

I'm a newbie with MongoDB, and I'm sorry if the question is not clear enough. What I mean is:
I have clustered GlusterFS volumes (configured on top of two CentOS boxes), which means the same data directory can be read from both CentOS boxes.
Let's call them CentOS-1 and CentOS-2.
I want to install MongoDB servers (mongod) on both CentOS boxes, but start (run) only one. (The other one, on CentOS-2, would be purposely stopped.)
The applications will then connect to the currently active one, on CentOS-1.
Here the main question comes in (please refer to the picture below):
Let's say CentOS-1 goes down, I manually start the other MongoDB server (the mongod on the other box, CentOS-2), and I point all the applications at CentOS-2:
(1) Will everything still work?
(2) Will there be 'lock' issues as in MySQL?
(3) If it works, does that mean we can add any number of MongoDB servers (in standby mode) and swing between them without problems?
Note:
Only one server at a time will be running; the data store is not being accessed by multiple servers at once.
Thanks for all opinions in advance :)
Yes, you can. There won't be any problem in moving the data files to a different server as long as you plan to use the same version of MongoDB and the same operating system. When you move the files, make sure to delete the mongod.lock file if it exists in the data directory.
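A sketch of that move, assuming a data directory of /data/db on both boxes (your dbpath may differ); only remove the lock file after a clean shutdown, otherwise run a repair first:

    # On the old server (if it is still reachable): shut down cleanly.
    sudo service mongod stop

    # On the new server, once the Gluster volume / copied files are in place:
    rm -f /data/db/mongod.lock        # stale lock left behind by the old host
    mongod --dbpath /data/db          # start mongod against the moved files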
GlusterFS is good for file replication between servers, but it's not a good idea to sync MongoDB data using GlusterFS.
Will everything still work?
Probably not.
Will there be 'lock' issues as in MySQL?
Yes, there will be. Check https://docs.mongodb.org/v3.0/faq/concurrency/. GlusterFS locks a file while writing to Gluster volumes, and MongoDB data files may change frequently, which could cause problems.
You can consider MongoDB replication (https://docs.mongodb.org/manual/core/replication-introduction/) for your purpose.

Taking over a project running on EC2 and having access issues

I've recently been hired to take over a website. The last developer really gave the guys the runaround and subsequently left them high and dry. I'm now attempting to access the server instances on AWS's EC2 system. However, I obviously don't have access to the required key pair, and I'm also unsure about getting past any password protection (I obviously can't test this until I have the correct key anyway). Can anyone suggest a way for me to at least save the data?
It depends on where the data is being stored. If it's on EBS volumes, getting access to it should be easy: create a new instance with a new key, and attach the volumes from the existing instance. You should be able to mount each volume and replace the key pair stored on it.
Now, if ephemeral volumes are being used, you won't have many options: ephemeral volumes are lost when an instance is stopped.
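A sketch of the EBS approach with the AWS CLI (every ID, device name, and mount point below is a placeholder):

    # Stop the old instance so the volume can be detached safely.
    aws ec2 stop-instances --instance-ids i-0123456789abcdef0
    aws ec2 detach-volume  --volume-id vol-0123456789abcdef0

    # Attach the volume to the rescue instance launched with your own key pair.
    aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
        --instance-id i-0fedcba9876543210 --device /dev/sdf

    # On the rescue instance: mount it and edit ~/.ssh/authorized_keys on the volume.
    sudo mkdir -p /mnt/rescue
    sudo mount /dev/xvdf1 /mnt/rescue   # device naming varies by virtualization type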
