Applying ZFS snapshot to a non-ZFS FS - linux

So this is partly a question of theory and partly about a specific (temporary) use case.
Two servers are to be kept in sync with each other: one on-site, the other an off-site backup.
However, the off-site server should have the data duplicated and accessible if need be (not just storing archive images of server1).
server1 and server2 are connected over the internet via a VPN connection.
server1 uses ZFS RAID 10.
server2 uses ext4 RAID 5 (a temporary setup that will be replaced with ZFS in the future, at which point this use case vanishes).
Can you take a ZFS snapshot on server1, send it to server2, and have it unpacked/applied to the RAID 5 array, essentially duplicating server1 via incremental snapshots?
I know there are other tools for duplicating filesystems, but I was wondering whether we can use snapshots with a non-ZFS filesystem. (The documentation leads me to believe this is not possible, but I don't know enough about this.)

Yes, there are two theoretical options. Both use async replication so will have a nonzero RPO (although from your description that seems acceptable to some extent):
Use zfs send to create a stream on the source system, and then use some tool that can understand the contents of that stream and translate to POSIX filesystem primitives on the receiving system.
Take a snapshot on the source system and then use an FS-agnostic tool to copy stuff from that snapshot over.
The first one has the benefit of being the most performant option, because ZFS knows what parts of its pool have been changed and only has to look at / send those parts. However, I don’t know of any tool that can actually do this. (Prototypes have been built at ZFS developer hackathons, but there is not a big audience for this type of tool so they’ve never been made production quality AFAIK.)
The second one is less performant because it will have to inspect the data to see what changed, but it has the benefit that tools exist — although you may have to fight with it a little, you can use rsync for this. Also, its RPO might be higher since transferring the data will take a bit longer. The slightly tricky parts will be:
Making the copy tool write its metadata to a writable location on the source side, since the snapshot you're copying from will be read-only. (Look in the .zfs/ directory in the root of the filesystem you want to copy to find a readable copy of the snapshot.)
Making the failover target not have intermediate state if the source system dies during an rsync run. Hopefully your target filer has the ability to snapshot before you start an rsync run, so that you can roll back to the “last good state” if the run fails. Otherwise, hopefully your data / application can tolerate some inconsistencies. (Or maybe there’s an rsync option that does this that I haven’t used before.)
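For the second option, here is a minimal sketch of what one run might look like, assuming a dataset named tank/data mounted at /tank/data on server1 and a target directory on server2 (the dataset, snapshot name, host, and paths are all hypothetical):

    # on server1: take a snapshot; its contents appear read-only under .zfs/snapshot/
    zfs snapshot tank/data@replica

    # copy that snapshot's contents to the ext4 array on server2 over ssh,
    # preserving ownership, ACLs and xattrs, and deleting files removed at the source
    rsync -aHAX --delete --numeric-ids \
        /tank/data/.zfs/snapshot/replica/ \
        root@server2:/srv/data/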

Related

Are docker volumes a better option for write-heavy operations than binding directories directly?

Reading through docker documentation I found this passage (located here):
Block-level storage drivers such as devicemapper, btrfs, and zfs perform better for write-heavy workloads (though not as well as Docker volumes).
So does this mean that one should always use docker volumes when expecting lots of persistent writing?
The container-local filesystem never stores persistent data, so you don't have a choice but to mount something into the container if you want data to live on after the container exits. The "block-level storage drivers" you quote discuss particular install-time options for how images and containers are stored, and aren't related to any particular volume or bind-mount implementation.
As far as performance goes, my general expectation is that the latency of disk I/O will far outweigh any overhead of any particular implementation. Without benchmarking any particular implementation, on a native Linux host, I would expect a named volume, a bind-mount, and writes to the container filesystem to be more or less similar.
From a programming point of view, you will probably get better long-term performance improvement from figuring out how to have fewer disk accesses (for example, by grouping together related database requests into a single transaction) than by trying to optimize the Docker-level storage.
The one prominent exception to this is that bind mounts on MacOS are known to be very slow and you should avoid them if your workload involves substantial disk access. (This includes both reading and writing, and includes some interpreted languages that want to read in every possible source file at startup time.) If you're managing something like database storage where you can't usefully directly access the files anyways, use a named volume. For your application code, COPY it into an image in a Dockerfile and do not overwrite it at run time.
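A minimal sketch of that split, with hypothetical image, volume, and path names (the password is a placeholder):

    # application code is copied into the image at build time (in the Dockerfile):
    #   COPY ./app /app

    # database state lives in a named volume, not in the container filesystem
    docker volume create pgdata
    docker run -d --name db \
        -e POSTGRES_PASSWORD=secret \
        -v pgdata:/var/lib/postgresql/data \
        postgres:16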
should always use docker volumes when expecting lots of persistent writing?
It depends.
Yes, you want some kind of storage external to the container for any persistent data, since data written inside the container is lost when that container is removed.
Whether that should be a host bind or a named volume depends on how you need to manage that data. A host volume is a bind mount to the host filesystem. It gives you direct access to that data, but that direct access also comes with uid/gid permission issues and loses the initialization feature of named volumes.
A named volume with all the defaults is just a bind mount to a folder under /var/lib/docker, so performance will be the same as a host volume if the underlying filesystem is the same. That said, a named volume can be configured to mount just about anything you can do with the mount command.
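For example, a named volume can be backed by an NFS export via the local driver's mount options (the server address, export path, and volume name below are assumptions):

    docker volume create --driver local \
        --opt type=nfs \
        --opt o=addr=192.168.1.10,rw \
        --opt device=:/export/data \
        nfsdata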
Since each of these options can be backed by a different underlying filesystem, and the performance difference comes from that underlying filesystem choice, there's no way to answer this in any generic sense. Hence, it depends.

Processing speed over mounted path

I have two scenarios.
Scenario 1: Machine A contains 1000 documents as folders. This folder on machine A is mounted on machine B. I process the documents within these folders on machine B and store the output in the mounted path on machine B.
Scenario 2: The documents on machine A are copied directly to machine B and processed there.
Scenario 2 is much faster than Scenario 1. I guess it's because there is no data transfer happening over the network between the two machines. Is there a way I can use mounting and still achieve better performance?
Did you try enabling a cache? For NFS, see https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/fscachenfs.html. CIFS should have caching enabled by default (unless you disabled it).
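A minimal sketch of enabling FS-Cache for an NFS mount on machine B (the hostname and paths are assumptions; this needs the cachefilesd package and a systemd-based system):

    # start the cache daemon that backs FS-Cache
    systemctl enable --now cachefilesd

    # mount the export with the fsc option so repeated reads are served from local disk
    mount -t nfs -o fsc machineA:/export/documents /mnt/documents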
The other option would be to use something like Windows’ offline files, which copies files and folders between client and server in the background, so you don’t need to deal with it. The only thing I’ve found for linux is OFS.
But the performance depends on the size of the files and whether you read them randomly or sequentially. For instance, when I am encoding videos, I access the file directly over the network from my NFS server, because copying it first would take about as much time as reading it. This way no additional time is “wasted” before the encoding, as the application can encode the stream as it comes in from the network.
So for large files you might want to change the algorithm to a sequential read. On the other hand, small files which are copied within seconds could also be synced between server and client using rsync, BitTorrent Sync, Dropbox or one of the hundreds of other tools, and this is actually quite commonly done.

Migrate data from one server to another

I bought a new server and I want to move all the data (directories, sub directories, users, passwords, ..etc) from my old server to it.
Is there a way to do that?
Thanks,
Do you have physical access to both servers? If so you can use the dd command to make a clone of the disk from the old server to the disk that is going into the new server.
In order to do this though, both hard drives have to be installed in one of the servers.
You can also use netcat and dd to clone a disk over a network.
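A hedged sketch of the netcat approach (device names, hostname, and port are assumptions; netcat listen syntax varies between implementations, and both disks should be unmounted, e.g. by booting from a live system):

    # on the new server: listen on a port and write the incoming stream to the target disk
    nc -l -p 9999 | dd of=/dev/sdb bs=64K

    # on the old server: read the source disk and stream it across the network
    dd if=/dev/sda bs=64K | nc new-server 9999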
For the directories and files, use an FTP client from your server if it allows you to; if not, just download all the content to your computer and upload it to the new server.
For the users and passwords, I guess they are in a database. Connect to the database using SSH, telnet, MysqlAdmin or any RDBMS client and export a dump file, then log in to the new server's SQL system and import that dump file.
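If it happens to be MySQL, a minimal sketch of the dump and import (credentials and file name are assumptions):

    # on the old server: export all databases to a single dump file
    mysqldump -u root -p --all-databases > all.sql

    # on the new server: import the dump
    mysql -u root -p < all.sql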
In any case, you should give more details about both servers so we can help you. For example: are they shared hosting or dedicated machines? What kind of access do you have to them? Knowing their operating system would also help people answer you accurately.
In principle, yes.
If the hardware is similar (= just more RAM and disk space, but the same CPU architecture and no special graphics card drivers), you might be able to copy every file and then reinstall the boot loader (the boot loader config usually changes when the hard disk size changes).
Or you can create a list of all services that you use, determine which config files each one uses and then just copy those. Ideally, you shouldn't copy them but compare the old and the new versions and merge them.
The most work-intensive way is to use a tool like Puppet. In a nutshell, Puppet allows you to create install scripts for services (along with all the configuration that you need). So if you need to install a service again (new hardware, second server), you just tell Puppet to do it. On the plus side, your whole installation will be documented, too. If you ever wonder why something is the way it is, you can look into the Puppet files.
Of course, this approach takes a lot of time and discipline, so it might not be worth it in your case. Apply common sense.

using torrents to back up vhd's

Hi, this may be a redundant question, but I have a hunch there is a tool for this, or there should be, and if there isn't I might just make it. Or maybe I am barking up the wrong tree, in which case correct my thinking:
My problem is this: I am looking for a way to migrate large virtual disks off a server once a week via an internet connection of only moderate speed. The solution must be able to be throttled for bandwidth, because the internet connection is always in use.
I thought about it and the problem is familiar: large files that can be moved, throttled, and that can easily survive disconnection/reconnection, etc. The only solution I am familiar with that handles all of this well is torrents.
Is there a way to automatically create torrents and automatically "send" them to a client's download list remotely? I am working on a Windows Hyper-V host, but I use only Linux for the guests, and I could easily cook up a guest to do the copying, so consider it a Windows or Linux problem.
PS: the VHDs are "offline" copies of guest servers by the time I am moving them; consider them merely 20-30 GB dumb files.
PPS: I'd rather avoid spending money
Bittorrent is an excellent choice, as it handles both incremental updates and automatic resume after connection loss very well.
To create a .torrent file automatically, use the btmakemetainfo script found in the original bittorrent package, or one from the numerous rewrites (bittornado, ...) -- all that matters is that it's scriptable. You should take care to set the "disable DHT" flag in the .torrent file.
You will need to find a tracker that allows you to track files with arbitrary hashes (because you do not know these in advance); you can either use an existing open tracker, or set up your own, but you should take care to limit the client IP ranges appropriately.
This reduces the problem to transferring the .torrent files -- I usually use rsync via ssh from a cronjob for that.
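A hedged sketch of that last step, assuming the remote client watches a directory for new .torrent files (the paths, user, and host are hypothetical; many clients such as rTorrent or Transmission can be configured to start anything that appears in a watch directory):

    # crontab entry: every night at 02:00, push freshly created .torrent files over ssh
    0 2 * * * rsync -az -e ssh /var/lib/backup-torrents/ backup@offsite:/srv/torrent-watch/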
For point-to-point transfers, BitTorrent is an expensive use of bandwidth. For 1:n transfers it is great, as the distribution of load allows each client's upload bandwidth to be shared with other clients, so the bandwidth cost is amortised and everyone gains...
It sounds like you have only one client in which case I would look at a different solution...
wget allows for throttling and can resume transfers where it left off if the FTP/http server supports resuming transfers... That is what I would use
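A minimal sketch of that (the URL and rate are assumptions):

    # resume a partial download (-c) and cap the transfer at roughly 500 KB/s
    wget -c --limit-rate=500k https://server.example.com/backups/guest1.vhd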
You can use rsync for that (http://linux.die.net/man/1/rsync). Search for the --partial option in man and that should do the trick. When a transfer is interrupted the unfinished result (file or directory) is kept. I am not 100% sure if it works with telnet/ssh transport when you send from local to a remote location (never checked that) but it should work with rsync daemon on the remote side.
You can also use it to sync two local storage locations.
rsync --partial [-r for directories] source destination
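Combined with the throttling requirement from the question, a hedged sketch (the host, paths, and rate are assumptions):

    # resumable, bandwidth-limited transfer of a VHD over ssh (about 500 KB/s)
    rsync --partial --bwlimit=500 -e ssh \
        /exports/guest1.vhd backup@offsite:/srv/vhds/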
edit: I have since confirmed that this also works over ssh transport, so the uncertainty above no longer applies.

Good Secure Backups for Developers at Home [closed]

What is a good, secure, method to do backups, for programmers who do research & development at home and cannot afford to lose any work?
Conditions:
The backups must ALWAYS be within reasonably easy reach.
Internet connection cannot be guaranteed to be always available.
The solution must be either FREE or priced within reason, and subject to 2 above.
Status Report
This is for now only considering free options.
The following open-source projects are suggested in the answers (here & elsewhere):
BackupPC is a high-performance, enterprise-grade system for backing up Linux, WinXX and MacOSX PCs and laptops to a server's disk.
Storebackup is a backup utility that stores files on other disks.
mybackware: These scripts were developed to create SQL dump files for basic disaster recovery of small MySQL installations.
Bacula is [...] to manage backup, recovery, and verification of computer data across a network of computers of different kinds. In technical terms, it is a network based backup program.
AutoDL 2 and Sec-Bk: AutoDL 2 is a scalable transport independent automated file transfer system. It is suitable for uploading files from a staging server to every server on a production server farm [...] Sec-Bk is a set of simple utilities to securely back up files to a remote location, even a public storage location.
rsnapshot is a filesystem snapshot utility for making backups of local and remote systems.
rbme: Using rsync for backups [...] you get perpetual incremental backups that appear as full backups (for each day) and thus allow easy restore or further copying to tape etc.
Duplicity backs up directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. [...] uses librsync, [for] incremental archives.
simplebup, to do real-time backup of files under active development, as they are modified. This tool can also be used for monitoring of other directories as well. It is intended as on-the-fly automated backup, and not as a version control. It is very easy to use.
Other Possibilities:
Using a Distributed Version Control System (DVCS) such as Git(/Easy Git), Bazaar, Mercurial answers the need to have the backup available locally.
Use free online storage space as a remote backup, e.g.: compress your work/backup directory and mail it to your gmail account.
Strategies
See crazyscot's answer
I prefer http://www.jungledisk.com/ .
It's based on Amazon S3, cheap, multiplatform, multiple machines with a single license.
usb hard disk + rsync works for me
(see here for a Win32 build)
Scott Hanselman recommends Windows Home Server in his aptly titled post
The Case of the Failing Disk Drive or Windows Home Server Saved My Marriage.
First of all: keeping backups off-site is as important for individuals as it is for businesses. If your house burns down, you don't want to lose everything.
This is especially true because it is so easy to accomplish. Personally, I have an external USB hard disk I keep at my father's house. Normally, it is hooked up to his internet connection and I back up over the net (using rsync), but when I need to back up really big things, I collect it and copy things over USB. Ideally, I should get another disk, to spread the risk.
Other options are free online storage facilities (use encryption!).
For security, just use TrueCrypt. It has a good name in the IT world, and seems to work very well.
Depends on which platform you are running on (Windows/Linux/Mac/...?)
As a platform independent way, I use a personal subversion server. All the valuables are there, so if I lose one of the machines, a simple 'svn checkout' will take things back. This takes some initial work, though, and requires discipline. It might not be for you?
As a second backup for the non-svn stuff, I use Time Machine, which is built-in to OS X. Simply great. :)
I highly recommend www.mozy.com. Their software is easy and works great, and since it's stored on their servers you implicitly get offsite backups. No worrying about running a backup server and making sure it's working. Also, the company is backed by EMC (a leading data storage product company), so gives me enough confidence to trust them.
I'm a big fan of Acronis True Image. Make sure you rotate through a few backup HDDs so you have a few generations to go back to, or in case one of the backups goes bang. If it's a major milestone I snail-mail a set of DVDs to Mum and she files them for me. She lives in a different state, so it should cover most disasters of less-than-biblical proportions.
EDIT: Acronis has encryption via a password. I also find the bandwidth of snail mail to be somewhat infinite - 10 GB overnight is roughly 115 KB/s, give or take. Never been throttled by Australia Post.
My vote goes for cloud storage of some kind. The problem with nearly all 'home' backups is that they stay in the home, which means any catastrophic damage to the system being backed up will probably damage the backups as well (fire, flood, etc.). My requirements would be:
1) automated - manual backups get forgotten, usually just when most needed
2) off-site - see above
3) multiple versions - that is backup to more than one thing, in case that one thing fails.
As a developer, usually data sizes for backup are relatively small so a couple of free cloud backup accounts might do. They also often fulfil part 1 as they can usually be automated. I've heard good things about www.getdropbox.com/.
The other advantage of more than 1 account is you could have one on 'daily sync' and another on 'weekly sync' to give you some history. This is nowhere near as good as true incremental backups.
Personally I prefer a scripted backup to local hard drives, which I rotate to work as 'off-sites'. This is in large part due to my hobby (photography) and thus my relatively lame internet upstream bandwidth not coping with the data volume.
Take home message - don't rely on one solution and don't assume that your data is not important enough to think about the issues as deeply as the 'Enterprise' does.
Buy a fire-safe.
This is not just a good idea for storing backups, but a good idea period.
Exactly what media you put in it is the subject of other answers here.
But, from the perspective of recovering from a fire, having a washable medium is good. As long as the temperature doesn't get too high CDs and DVDs seem reasonably resilient, although I'd be concerned about smoke damage.
Ditto for hard-drives.
A flash drive does have the benefit that there are no moving parts to be damaged and you don't need to be concerned about the optical properties.
mozy.com is king. I started using it just to backup code and then ponied up the 5 bux a month to backup my personal pictures and other stuff that I'd rather not lose if the house burns down. The initial backup can take a little while but after that you can pretty much forget about it until you need to restore something.
Get an external hard drive with a network port so you can keep your backups in another room which provides a little security against fire in addition to being a simple solution you can do yourself at home.
The next step is to get storage space in some remote location (there are very cheap monthly prices for servers for example) or to have several external hard drives and periodically switch between the one at home and a remote location. If you use encryption, this can be anywhere such as a friend's or parents' place or work.
Bacula is good software: it's open source and should give good performance, comparable to commercial software. It's a bit difficult to configure the first time, but not that hard, and it has good documentation.
I second the vote for JungleDisk. I use it to push my documents and project folders to S3. My average monthly bill from amazon is about 20c.
All my projects are in Subversion on an external host.
As well as this, I am on a Mac, so I use SuperDuper to take a nightly image of my drive. I am sure there are good options in the Windows/Linux world.
I have two external drives that I rotate on a weekly basis, and I store one of the drives off-site during its week off.
This means that I am only ever 24 hours away from an image in case of failure, and only 7 days from an image in case of catastrophic failure (fire, theft). The ability to plug the drive into a machine and be running instantly from the image has saved me immensely. My boot partition was corrupted during a power failure (not a hardware failure, luckily). I plugged the backup in, restored, and was working again in the time it took to transfer the files off the external drive.
Another vote for mozy.com
You get 2 GB for free, or $5/month gives you unlimited backup space. Backups can occur on a timed basis, or when your PC/Mac is not busy. It's encrypted during transit and storage.
You can retrieve files via the built-in software, through the web, or pay for a DVD to be burned and posted back.
If you feel like syncing to the cloud and don't mind the initial, beta, 2GB cap, I've fallen in love with Dropbox.
It has versions for Windows, OSX, and Linux, works effortlessly, keeps files versioned, and works entirely in the background based on when the files changed (not a daily schedule or manual activations).
Ars Technica and Joel Spolsky have both fallen in love with the service (though the love seems strong with Spolsky, but let's pretend!), if the word of a random internet geek is not enough.
These are interesting times for "the personal backup question".
There are several schools of thought now:
Frequent Automated Local Backup + Periodic Local Manual Backup
Automated: Scheduled Nightly backup to external drive.
Manual: Copy to a second external drive once per week / month / year / oops-forgot and drop it off at "Mom's house".
Lots of software in the field, but here are a few: there's rsync and Time Machine on Mac, and DeltaCopy (www.aboutmyip.com/AboutMyXApp/DeltaCopy.jsp) for Windows.
Frequent Remote Backup
There are a pile of services that enable you to back up across your internet connection to a remote data centre. Amazon's S3 service + JungleDisk's client software is a strong choice these days - not the cheapest option, but you pay for what you use, and Amazon's track record suggests it will be in business as long as or longer than any other storage provider hanging out their shingle today.
Did I mention it should be encrypted? Props to JungleDisk for handling the "encryption issue" and future-proofing (open source library to interoperate with Jungle Disk) pretty well.
All of the above.
Some people call it being paranoid ... others think to themselves "Ahhh, I can sleep at night now".
Also, it's more fault-tolerance than backup, but you should check out Drobo - basically it's dead simple RAID that seems to work quite well.
Here are the features I'd look out for:
As near to fully automatic as possible. If it relies on you to press a button or run a program regularly, you will get bored and eventually stop bothering. An hourly cron job takes care of this for me; I rsync to the 24x7 server I run on my home net.
Multiple removable backup media so you can keep some off site (and/or take one with you when you travel). I do this with a warm-pluggable SATA drive bay and a cron job which emails me every week to remind me to change drives.
Strongly encrypted media, in case you lose one. The linux encrypted device support (cryptsetup et al) does this for me.
Some sort of point-in-time recovery, but consider carefully what resolution you want. Daily might be enough - having multiple backup media probably gets you this - or you might want something more comprehensive like Apple's Time Machine. I've used some careful rsync options with my removable drives: every day creates a fresh snapshot directory, but files which are unchanged from the previous day are hard linked instead of copied, to save space.
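A hedged sketch of that hard-link trick with rsync (the mount point and source path are assumptions; --link-dest makes files unchanged since the previous snapshot into hard links rather than copies):

    TODAY=$(date +%F)
    YESTERDAY=$(date -d yesterday +%F)

    # today's snapshot directory hard-links unchanged files against yesterday's
    rsync -a --delete \
        --link-dest=/mnt/backup/$YESTERDAY \
        /home/ /mnt/backup/$TODAY/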
Or simply just set up a gmail account and mail it to yourself :) Unless you're a bit paranoid about google knowing about your stuff since you said research. It doesn't help you much with structure and stuff but it's free, big storage and off-site so quite safe.
If you use OS X 10.5 or above then the cost of Time Machine is the cost of an external hard drive. Not only that, but the interface is dead simple to use. Open the folder you wish to recover, click on the time machine icon, and browse the directory as if it was 1999 all over again!
I haven't tried to encrypt it, but I imagine you could use truecrypt.
Yes this answer was posted quite some time after the question was asked, however I believe it should help those who stumble across this posting in the future (like I did).
Set up a Linux or xBSD server:
- Set up a source control system of your choice on it.
- Mirror RAID (RAID 1) at minimum.
- Daily (or even hourly) backups to external drive[s].
From the server you could also setup an automatic offsite backup. If the internet is out, you'd still have your external drive and just have it auto sync once it comes back.
Once it's set up, it should be about zero work.
You don't need anything "fancy" for offsite backup. Get a webhost that allows storing non-web data, and sync via sftp or rsync over ssh. Store the data on the other end in a TrueCrypt container if you're paranoid.
If you work for an employer/contractor, also ask them. Most places already have something in place or will let you work with their IT.
My vote goes to dirvish (for Linux). It uses rsync as a backend but is very easy to configure.
It makes automatic, periodic, differential backups of directories. The big benefit is that it creates hard links to all files not changed since the last backup, so restore is easy: just copy the last created directory back, instead of restoring all the diffs one after another like other differential backup tools need to do.
I have the following backup scenarios and use rsync scripts to store on USB and network shares.
(weekly) Windows backup for "bare metal" recovery
Content of System drive C:\ using Windows Backup for quick recovery after physical disk failure, as I don't want to reinstall Windows and applications from scratch. This is configured to run automatically using Windows Backup schedule.
(daily and conditional) Active content backup using rsync
Rsync takes care of all changed files from the laptop, phone, and other devices. I back up the laptop every night and after significant changes in content, like importing recent photo RAWs from an SD card to the laptop.
I've created a bash script that I run from Cygwin on Windows to start rsync: https://github.com/paravz/windows-rsync-backup
If you're using deduplication, STAY AWAY from JungleDisk. Their restore client makes a mess of the reparse point and makes the file unusable. You can hopefully fix it in safe mode with:
fsutil reparsepoint delete
