MongoDB backup -> tar -> gz -> gpg - linux

I have a MongoDB server and I am using the mongodump command to create backups. I run mongodump --out ./mongo-backup, then tar -czf ./mongo-backup.tar.gz ./mongo-backup, then gpg --encrypt ./mongo-backup.tar.gz > ./mongo-backup.tar.gz.gpg, and send the resulting file to a backup server.
My MongoDB database is 20GB according to the show dbs command, the mongodump backup directory is only 3.8GB, the gzipped tarball is only 118MB, and my gpg file is only 119MB in size.
How is it possible to reduce a 20GB database to a 119MB file? Is it fault tolerant?
I tried creating a new server (a clone of production), enabled the firewall to ensure that no one could connect, and ran this backup procedure. I then created a fresh new server and imported the data, and there are some differences:
I ran the same commands from the mongo shell, use db1; db.db1_collection1.count(); and use db2; db.db2_collection1.count();, and the results are:
807843 vs. 807831 ( db1.collection1 source server vs. db1.collection1 restored server )
3044401 vs. 3044284 ( db2.collection1 source server vs. db2.collection1 restored server )

If you have validated the counts and size of documents/collections in your restored data, this scenario is possible although atypical in the ratios described.
My MongoDB database is 20GB according to the show dbs command
This shows you the size of files on disk, including preallocated space that exists from deletion of previous data. Preallocated space is available for reuse, but some MongoDB storage engines are more efficient than others.
MongoDB mongodump backup directory has only 3.8GB
The mongodump tool (as at v3.2.11, which you mention using) exports an uncompressed copy of your data unless you specify the --gzip option. This total should represent your actual data size but does not include storage used for indexes. The index definitions are exported by mongodump and the indexes will be rebuilt when the dump is reloaded via mongorestore.
With WiredTiger the uncompressed mongodump output is typically larger than the size of files on disk, which are compressed by default. For future backups I would consider using mongodump's built-in archiving and compression options to save yourself an extra step.
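For example, a single pipeline can dump, compress, and encrypt in one step, and the same pipeline reversed performs the restore. This is only a sketch; the recipient key is a placeholder for your own:
mongodump --archive --gzip | gpg --encrypt --recipient backup@example.com --output ./mongo-backup.archive.gz.gpg
gpg --decrypt ./mongo-backup.archive.gz.gpg | mongorestore --archive --gzip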
Since your mongodump output is significantly smaller than the storage size, your data files are either highly fragmented or there is some other data that you have not accounted for such as indexes or data in the local database. For example, if you have previously initialised this server as a replica set member the local database would contain a large preallocated replication oplog which will not be exported by mongodump.
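To see where the space on disk is actually going, you can list per-database sizes (including local) from the mongo shell; a quick check along these lines:
# sizeOnDisk is reported in bytes for each database
mongo --quiet --eval 'printjson(db.adminCommand({ listDatabases: 1 }))'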
You can potentially reclaim excessive unused space by running the compact command for a WiredTiger collection. However, there is an important caveat: running compact on a collection will block operations for the database being operated on so this should only be used during scheduled maintenance periods.
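As a sketch, using the db1_collection1 name from your count comparison and assuming a maintenance window:
# compact a single WiredTiger collection in the db1 database
mongo db1 --eval 'printjson(db.runCommand({ compact: "db1_collection1" }))'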
MongoDB gzipped tarball has only 118MB and my gpg file has only 119MB in size.
Since mongodump output is uncompressed by default, compressing can make a significant difference depending on your data. However, 3.8GB to 119MB seems unreasonably good unless there is something special about your data (large number of small collections? repetitive data?). I would double check that your restored data matches the original in terms of collection counts, document counts, data size, and indexes.
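A rough way to do that comparison is to run the same one-liners against the source and the restored server and diff the output (collection names taken from your question):
# document count, data size, and total index size for one collection
mongo db1 --quiet --eval 'var c = db.db1_collection1; print(c.count() + " " + c.dataSize() + " " + c.totalIndexSize())'
mongo db2 --quiet --eval 'var c = db.db2_collection1; print(c.count() + " " + c.dataSize() + " " + c.totalIndexSize())'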

Related

Why does moving postgres database to another server change database size?

I'm moving a postgres database to another server. I use the following commands to do the dumping and loading.
pg_dump db_name > backup.sql   # dump
psql db_name < backup.sql      # load
I find that when I do the move the new database is 28MiB in size whereas the old database was 36MiB in size. Why is this? Should I be worried that the move isn't complete?
It is to be expected that the restored database is smaller than the original.
A live database always has a certain amount of bloat (empty space) that is caused by updates and deletes. That space will be reused and is no problem normally.
The restored database is densely packed and doesn't have that bloat.
However, a bloat of more than 25% is rather on the high side.
You can use pgstattuple to determine if any of your tables have an undue amount of bloat.
High bloat can be caused by mass deletes or a high change rate with which autovacuum cannot keep up.
Such tables can be reorganized with VACUUM (FULL), and if the cause is a high change rate, you should tune autovacuum to be more aggressive on these tables.
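For example, a quick bloat check from the shell; db_name is the database from your commands and some_table is only a placeholder:
psql -d db_name -c "CREATE EXTENSION IF NOT EXISTS pgstattuple;"
# free_percent shows how much of the table is reusable empty space
psql -d db_name -c "SELECT free_percent FROM pgstattuple('some_table');"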

How to convert data in cassandra commitlogs to readable format

Is it possible to see the data in the commit log? If so, how can we convert it to a readable form that we can interpret?
Commit log files are internal binary files maintained by Cassandra, so you won't be able to read them directly.
Uses of Commit log:
If Cassandra wrote SSTables to disk on every update it would be completely IO-bound and very slow.
So Cassandra uses a few tricks to get better performance. Instead of writing SSTables to disk on every column update, it keeps the updates in memory and flushes those changes to disk periodically to keep the IO at a reasonable level.

Cassandra backup to tape or real snapshots

Is there a way to backup Cassandra directly to tape (streaming device)?
Or to perform real snapshots?
The snapshot Cassandra is referring to is not what I want to call a snapshot.
It is more a consistent copy of the database files to a directory.
Regards Tomas
First, let's clarify the Cassandra write path, so we know what we need to back up. Writes come in and are first journaled in the commitlog, then written to the memtable, then eventually flushed to sstables. When sstables flush, the relevant commitlog segments are deleted.
If you want a consistent backup of Cassandra, you need at the very least the sstables, but ideally the sstables + commitlog, so you can replay any data between the commitlog and the most recent flush.
If you're using tape backup, you can treat the files on disk (both commitlog and sstables) as typical data files - you can tar them, rsync them, copy them as needed, or point amanda or whatever tape system you're using at the data file directory + commitlog directory, and it should just work - there's not a lot of magic there, just grab them and back them up. One of the more common backup processes involves using tablesnap, which watches for new sstables and uploads them to s3.
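A minimal sketch of that file-level approach, assuming the default data and commitlog locations under /var/lib/cassandra:
nodetool flush     # push recent writes from the memtables into sstables first
tar -czf cassandra-backup-$(date +%F).tar.gz \
    /var/lib/cassandra/data /var/lib/cassandra/commitlog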
You can backup Cassandra directly to Tape using SPFS
SPFS is a file system for Spectrum Protect.
Just mount the SPFS file system where you want the backups to land.
For example:
mount -t spfs /backup
And back up Cassandra to this path.
All operations that go through this mount point (/backup) will automatically be translated into Spectrum Protect client API calls.
On the Spectrum Protect backup server, one can use any type of supported media,
for instance CD, tape, VTL, SAS, SATA, SSD, cloud, etc.
In this way, you can easily back up your Cassandra data directly to a backup server.
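So, once the SPFS mount above is in place, the backup itself is just an ordinary copy into /backup; a sketch assuming the default Cassandra paths:
# adjust the paths to your data_file_directories and commitlog_directory settings
tar -cf /backup/cassandra-$(hostname)-$(date +%F).tar \
    /var/lib/cassandra/data /var/lib/cassandra/commitlog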

How to recover a dropped database on MySQL

I accidentally dropped a database on MySQL yog ultimate. I also found that the IT guy uninstalled MySQL yog from the machine.
Now I am working on two machines, including the one from which the database was dropped and MySQL yog was uninstalled.
Is there a way to recover the dropped database?
You said in a comment that you have a backup from a couple of hours prior to the data loss.
If you also have binary logs, you can restore the backup, and then reapply changes from the binary logs.
Here is documentation on this operation: http://dev.mysql.com/doc/refman/5.6/en/point-in-time-recovery.html
You can even filter the binary logs to reapply changes for just one database (mysqlbinlog --database name). For example you may have other databases that were not dropped on the same instance, and you wouldn't want to reapply changes to those other databases.
Recovering two hours worth of binary logs won't take "a very long amount of time." The trickiest part is figuring out the start point to begin replaying the binary logs. If you were lucky enough to include the binary log position with the backup, this will be simpler and very precise. If you have to go by timestamp, it's less precise and you probably cannot hope to do an exact recovery.
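A hedged sketch of what that replay looks like; the binary log file name and positions are placeholders you would take from your backup notes and from mysqlbinlog output:
# 1. restore the last full backup
mysql < backup.sql
# 2. reapply only the dropped database's changes from the binary logs,
#    stopping before the DROP DATABASE statement (--stop-position / --stop-datetime)
mysqlbinlog --database=db_name --start-position=4 --stop-position=123456 \
    /var/lib/mysql/mysql-bin.000042 | mysql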
If you didn't have binary logs enabled on this instance since you backed up the database, it's a lot trickier to do a data recovery of lost files. You might be able to use a filesystem undelete tool like the EaseUS Data Recovery Wizard (though I can't say I have experience using that tool).
Reconstructing the files you recover is not for the faint of heart, and it's too much to get into here. You might want to get help from a professional MySQL consulting firm. I work for one such firm, Percona, who offers data recovery services.
There's really only one word: Backups.
After MySQL drops a database the data is still on the media for a while, so you can fetch the records and rebuild the database with DBRECOVER.
mysql> drop database employees;
Query OK, 14 rows affected (0.16 sec)
#sync
#sync
Select "drop database recovery".
Select the MySQL version you used; the page size should be left at 16k.
Click "select directory" and input the datadir directory.
Caution: you should input the original datadir directory here. Please don't copy the datadir to another filesystem or mount point and scan the copy; the software needs to scan the original filesystem or mount point, otherwise it can't work. It is best to remount the datadir as read-only to avoid any further disk writes, and don't put the DBRECOVER software package on the same filesystem.
https://youtu.be/ao7OY8IbZQE

How to calculate the total memory occupied by the database

I am using SQL Server 2008. How can I calculate the total memory occupied by the database, with its tables (>30) and the data in them?
I mean, if I have a DB (DB_name) with a few tables (tblabc, tbldef, ...) containing data, how do I calculate the total memory occupied by the database on the server?
Kindly help me.
Thanks
Ramm
See the sizes of mdf and log files
EDIT: SQL Server stores its database in mdf files (one or multiple). You need the log (ldf) file too. See where your database is stored; those are the files you need.
Be aware that if you are using FILESTREAM, the actual files are not in the database (mdf).
EDIT 2: From Books Online:
When you create a database, you must either specify an initial size for the data and log files or accept the default size. As data is added to the database, these files become full.
So, there is a file with some size even if you have no data..
By default, the data files grow as much as required until no disk space remains.
...
Alternatively, SQL Server lets you create data files that can grow automatically when they fill with data, but only to a predefined maximum size. This can prevent the disk drives from running out of disk space completely.
If data is added (and there is no more space in the file), the file grows; but when data is deleted, the file keeps its size and you need to shrink it...
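If you do want that space back, a shrink looks roughly like this. The first query lists the logical file names and sizes (size is stored in 8KB pages, hence the conversion to MB); DB_name_dat is a placeholder for the logical name you find there, 1024 is the target size in MB, and routine shrinking is generally discouraged:
sqlcmd -d DB_name -Q "SELECT name, size*8/1024 AS size_mb FROM sys.database_files"
sqlcmd -d DB_name -Q "DBCC SHRINKFILE (DB_name_dat, 1024)"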
I suppose that you refer to disk space and not memory. That would be very hard to get correct since you would have to know exactly how SQL Server stores the data, indexes and so on. Fortunately you do not have to calculate it, just fire up Microsoft SQL Server Management Studio. Right click on your database->Reports->Disk usage.
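If you prefer a query to the GUI report, sp_spaceused returns the same kind of totals; DB_name and tblabc here are just the names from your question. The first call gives the database size and unallocated space, the second gives rows, reserved, data, and index sizes for a single table:
sqlcmd -d DB_name -Q "EXEC sp_spaceused"
sqlcmd -d DB_name -Q "EXEC sp_spaceused 'tblabc'"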
