I am attempting to do an ensure -> purge on a small number of very large files. The files exist on a different partition than filebucket (and its partition is very small). I would like to avoid any partition resizing if at all possible. Is there any way to skip filebucket archival when doing a purge on files/directories? (Don't worry - if the result is not what I expect, I have options to restore the previous state of the machine since it's a VM.)
To keep a file from being stored in the filebucket prior to removal, set the backup parameter to false.
file { '/opt/data/huge-file1':
  ensure => absent,
  backup => false,
}
As an aside, I assume you're using ensure => directory with purge => true, because purge is not a valid ensure value.
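For the purge case the same parameter applies; here's a minimal sketch, assuming a hypothetical /opt/data directory whose unmanaged contents should be removed without being archived:
file { '/opt/data':
  ensure  => directory,
  recurse => true,
  purge   => true,
  force   => true,
  backup  => false,
}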
I'm trying to alter a VSAM file to write logs for any update operation.
I perform the updates through a CICS transaction.
Can anyone give me an idea of how I can immediately save all updates to a log stream file?
To get update log records written by CICS for VSAM file updates, you will need to configure the recovery attributes for that VSAM file. The type of file, how the file is accessed (RLS or non-RLS), and the types of log records required will determine which options can be set and where to set them.
To keep it simple, if you set the recovery attributes in the ICF catalog definition for the VSAM data set with RECOVERY(ALL) and LOGSTREAMID(your_logstream_name), then before and after images will be written. Depending upon what the log records are needed for, also consider using the LOGREPLICATE(YES) option, either instead or as well.
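As an illustration only (the data set and log stream names below are made up), the catalog attributes can be set with an IDCAMS ALTER along these lines; in Access Method Services syntax the recovery attribute is the LOG parameter:
//ALTERLOG EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  ALTER MY.VSAM.KSDS -
    LOG(ALL) -
    LOGSTREAMID(MY.FWD.LOGSTREAM)
/*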
Be careful when turning recovery on: records (or CIs) in the file will be locked until the transaction making the updates completes. This could lead to deadlocks and rollbacks if multiple transactions make multiple updates to the file concurrently. Also, if the file is an ESDS, there are further complexities.
Make sure the general log stream or model log stream has been created so CICS has or can create somewhere to write the log records to.
I'd also recommend reading more on the recovery options available so that only the log records needed are written. You can find more info on CICS logging here
Theoretical question:
Let's say I have a Cassandra cluster with some data in it.
Backups are created on a daily basis.
Now a subset of the data is lost, either through application error or manual deletion.
What is the best way to restore data from an existing backup?
I can think of starting a separate node with the backup disk attached, then export data manually through selects and reimport into the prod database.
That would work, but it sounds complicated. Is there a more straightforward solution for such problems?
If it's a single partition, your best bet is probably to use sstabledump or something like sstable-tools to read the data from the backup and just manually reinsert it. If you are OK with restoring everything deleted since the time of the snapshot: reduce gc_grace_seconds and run a forced compaction to purge any tombstones (or else they will continue to shadow the restored data), then use sstableloader, or, if the token ranges are the same, copy the backed-up SSTables back into the data directory.
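As a rough sketch of that route (the keyspace, table, host, and paths below are made up; adjust to your cluster):
# dump a backed-up SSTable as JSON to find the rows that need to be reinserted
sstabledump /backups/ks1/table1/mc-1-big-Data.db
# after lowering gc_grace_seconds, force a major compaction to purge the tombstones
nodetool compact ks1 table1
# stream the backed-up SSTables into the live cluster
sstableloader -d 10.0.0.1 /backups/ks1/table1/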
As data in Cassandra is physically removed during compaction, is it possible to access recently deleted data in any way? I'm looking for something similar to Oracle's Flashback feature (AS OF TIMESTAMP).
Also, I can see pieces of the deleted data in the relevant commit log file; however, it's obviously unreadable. Is it possible to convert this file to a more readable format?
You will want to execute a restore from your commitlog.
The safest approach is to copy the commit log to a new cluster (with the same schema) and restore it following the instructions (comments) in the commitlog_archiving.properties file. In your case, you will want to set restore_point_in_time to a time between your insert and your delete.
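For illustration, the restore side of commitlog_archiving.properties would look something like this (the paths and timestamp are placeholders; the timestamp format is yyyy:MM:dd HH:mm:ss in GMT):
# command used to copy an archived segment back for replay
restore_command=cp -f %from %to
# directory to scan for the archived commit log segments
restore_directories=/backups/commitlog_archive
# replay mutations only up to this point, i.e. after the insert but before the delete
restore_point_in_time=2017:06:01 12:00:00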
I have a MongoDB server and I am using the mongodump command to create a backup. I run mongodump --out ./mongo-backup, then tar -czf ./mongo-backup.tar.gz ./mongo-backup, then gpg --encrypt ./mongo-backup.tar.gz > ./mongo-backup.tar.gz.gpg, and send this file to the backup server.
My MongoDB database has 20GB according to the MongoDB show dbs command, the mongodump backup directory has only 3.8GB, the gzipped tarball has only 118MB, and my gpg file has only 119MB in size.
How is it possible to reduce a 20GB database to a 119MB file? Is it fault tolerant?
I created a new server (a clone of production), enabled the firewall to ensure that no one could connect, and ran this backup procedure. Then I created a fresh new server, imported the data, and there are some differences:
I ran the same commands from the mongo shell (use db1; db.db1_collection1.count(); and use db2; db.db2_collection1.count();) and the results are:
807843 vs. 807831 ( db1.collection1 source server vs. db1.collection1 restored server )
3044401 vs. 3044284 ( db2.collection1 source server vs. db2.collection1 restored server )
If you have validated the counts and size of documents/collections in your restored data, this scenario is possible although atypical in the ratios described.
My MongoDB database has 20GB according to the MongoDB show dbs command
This shows you the size of files on disk, including preallocated space that exists from deletion of previous data. Preallocated space is available for reuse, but some MongoDB storage engines are more efficient than others.
The MongoDB mongodump backup directory has only 3.8GB
The mongodump tool (as at v3.2.11, which you mention using) exports an uncompressed copy of your data unless you specify the --gzip option. This total should represent your actual data size but does not include storage used for indexes. The index definitions are exported by mongodump and the indexes will be rebuilt when the dump is reloaded via mongorestore.
With WiredTiger the uncompressed mongodump output is typically larger than the size of files on disk, which are compressed by default. For future backups I would consider using mongodump's built-in archiving and compression options to save yourself an extra step.
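For example, something along these lines (the gpg recipient is a placeholder) replaces the separate tar and gzip steps:
# write a single compressed archive instead of a dump directory
mongodump --gzip --archive=./mongo-backup.archive.gz
gpg --encrypt --recipient backup@example.com ./mongo-backup.archive.gz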
Since your mongodump output is significantly smaller than the storage size, your data files are either highly fragmented or there is some other data that you have not accounted for such as indexes or data in the local database. For example, if you have previously initialised this server as a replica set member the local database would contain a large preallocated replication oplog which will not be exported by mongodump.
You can potentially reclaim excessive unused space by running the compact command for a WiredTiger collection. However, there is an important caveat: running compact on a collection will block operations for the database being operated on so this should only be used during scheduled maintenance periods.
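As a minimal sketch in the mongo shell, using the collection name from your question:
// run during a maintenance window; the database is blocked while compact runs
use db1
db.runCommand({ compact: "db1_collection1" })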
The MongoDB gzipped tarball has only 118MB and my gpg file has only 119MB in size.
Since mongodump output is uncompressed by default, compressing can make a significant difference depending on your data. However, 3.8GB to 119MB seems unreasonably good unless there is something special about your data (large number of small collections? repetitive data?). I would double check that your restored data matches the original in terms of collection counts, document counts, data size, and indexes.
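One quick way to compare the source and restored servers, again using the names from your question, is to check counts and stats on both and diff the output:
// run on both servers and compare the results
use db1
db.db1_collection1.count()
db.stats()
use db2
db.db2_collection1.count()
db.stats()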
I have a text file uploaded to my Azure storage account. Now, in my worker role, what I need to do is this: every time it runs, it fetches some content from a database, and that content must be written to the uploaded text file. Specifically, each time, the content of the text file should be overwritten with some new content.
Here, they have given a way to upload a text file to your storage and also to delete a file. But I don't want to do that; I just need to MODIFY the already present text file each time.
I'm assuming you're referring to storing a file in a Windows Azure blob. If that's the case: A blob isn't a file system; it's just a place to store data (and the notion of a file is a bit artificial - it's just... a blob stored in a bunch of blocks).
To modify the file, you would need to download it and save it to local disk, modify it (again, on local disk), then do an upload. A few thoughts on this:
For this purpose, you should allocate a local disk (a LocalStorage resource) within your worker role's configuration; see the snippet after these notes. This disk will be a logical disk, created on a local physical disk within the machine your VM is running on. In other words, it'll be attached storage, perfect for this type of use.
The bandwidth between your VM instance and storage is 100Mbps per core. So grabbing a 10MB file on a Small instance would take maybe a second; on an XL, maybe around a tenth of a second. It's really fast, and it varies with VM series (A, D, G) and size.
Because your file is in blob storage, if you felt so inclined (or had the need), you could take a snapshot prior to uploading an updated version. Snapshots are like linked lists to your stored data blocks, and there's no cost to snapshots until, one day, you make a change to existing data (and now you'd have blocks representing both old and new data). It's an excellent way to preserve versions of a blob on a blob-by-blob basis (and it's trivial to delete snapshots).
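As an example of the first point, resolving a path on that local disk might look like this (the resource name "ScratchSpace" is made up and has to match a LocalStorage entry in your service definition):
// requires Microsoft.WindowsAzure.ServiceRuntime; resolves the local disk declared in ServiceDefinition.csdef
var localResource = RoleEnvironment.GetLocalResource("ScratchSpace");
var localFilePath = System.IO.Path.Combine(localResource.RootPath, "myfile.txt");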
Just to make sure this download/modify/upload pattern is clear, here's a very simple example (I just typed this up quickly in Visual Studio but haven't tested it. Just trying to illustrate the point):
// initial setup
var acct = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
var client = acct.CreateCloudBlobClient();
// what you'd call each time you need to update a file stored in a blob
var blob = client.GetContainerReference("mycontainer").GetBlockBlobReference("myfile.txt");
using (var fileStream = System.IO.File.OpenWrite(@"path\myfile.txt"))
{
    blob.DownloadToStream(fileStream);
}
// ... modify file...
// upload modified file
using (var fileStream = System.IO.File.OpenRead(@"path\myfile.txt"))
{
    blob.UploadFromStream(fileStream);
}