To have truly off-site and durable backups of my ZFS pool, I would like to store zfs snapshots in Amazon Glacier. The data would need to be encrypted locally, independently from Amazon, to ensure privacy. How could I accomplish this?
An existing snapshot can be sent to an S3 bucket as follows:
zfs send -R <pool name>@<snapshot name> | gzip | gpg --no-use-agent --no-tty --passphrase-file ./passphrase -c - | aws s3 cp - s3://<bucketname>/<filename>.zfs.gz.gpg
or for incremental back-ups:
zfs send -R -I <pool name>@<snapshot to do incremental backup from> <pool name>@<snapshot name> | gzip | gpg --no-use-agent --no-tty --passphrase-file ./passphrase -c - | aws s3 cp - s3://<bucketname>/<filename>.zfs.gz.gpg
Either command takes an existing snapshot, serializes it with zfs send, compresses it with gzip, and encrypts it with gpg using a passphrase. The passphrase must be readable on the first line of the ./passphrase file.
Remember to back up your passphrase file separately in multiple locations! If you lose access to it, you'll never be able to get to your data again!
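As an aside, to pick the snapshot names used in the commands above (including which snapshot to use as the incremental source), you can list the existing snapshots first. A minimal sketch, reusing the <pool name> placeholder from above:
# List snapshots with creation time, newest last, to choose <snapshot name> and the incremental source
zfs list -t snapshot -o name,creation -s creation -r <pool name>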
This requires:
A pre-created Amazon S3 bucket
awscli installed (pip install awscli) and configured (aws configure).
gpg installed
Lastly, S3 lifecycle rules can be used to transition the S3 objects to Glacier after a pre-set amount of time (or immediately).
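For example, a minimal lifecycle rule that transitions every object in the bucket to Glacier as soon as possible can be applied from the CLI. This is only a sketch; the rule ID and the glacier-lifecycle.json file name are made-up placeholders. Save the following as glacier-lifecycle.json:
{
  "Rules": [
    {
      "ID": "archive-zfs-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 0, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
and apply it with:
aws s3api put-bucket-lifecycle-configuration --bucket <bucketname> --lifecycle-configuration file://glacier-lifecycle.json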
For restoring:
aws s3 cp s3://<bucketname>/<filename>.zfs.gz.gpg - | gpg --no-use-agent --passphrase-file ./passphrase -d - | gunzip | sudo zfs receive <new dataset name>
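Note that once an object has transitioned to the Glacier storage class, it must be restored back to S3 before it can be downloaded. A sketch of that extra step (the 7-day availability window and Standard retrieval tier are just example values):
aws s3api restore-object --bucket <bucketname> --key <filename>.zfs.gz.gpg --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Standard"}}'
The restore runs asynchronously; aws s3api head-object --bucket <bucketname> --key <filename>.zfs.gz.gpg shows the Restore status once the object is available.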
Related
I need to transfer around 50 databases from one server to another. I tried two different backup options.
1. pg_dump -h XX.XXX.XX.XXX -p pppp -U username -Fc -d db -f db.cust -- all backups in backup1 folder size 10GB
2. pg_dump -h XX.XXX.XX.XXX -p pppp -U username -j 4 -Fd -d db -f db.dir -- all backups in backup2 folder size 10GB
Then I transferred them to the other server for restoration using scp.
scp -r backup1 postgres@YY.YYYY.YY.YYYY:/backup/
scp -r backup2 postgres@YY.YYYY.YY.YYYY:/backup/
I noticed a strange thing. Though the backup folder sizes are the same for both backups, they take very different times to transfer using scp: the directory-format backup takes about 4 times as long to transfer as the custom-format backup. Both SCP runs were done on the same network and repeated multiple times, but the results are the same. I also tried rsync, with no difference.
Please suggest what could be the reason for the slowness and how I can speed it up. I am open to using any other method to transfer the backups.
Thanks
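For what it's worth, a directory-format dump consists of many per-table files, so one thing worth trying is streaming the folder as a single tar archive over ssh rather than copying each file individually; whether per-file overhead is really the cause here is only an assumption. A sketch, reusing the paths from the question:
# Stream the directory-format backup as one tar archive over ssh instead of many small files
tar -C backup2 -cf - . | ssh postgres@YY.YYYY.YY.YYYY 'mkdir -p /backup/backup2 && tar -C /backup/backup2 -xf -'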
I am using the following command to download a file from S3 and split it into smaller gzipped chunks.
aws s3 cp "${INFILE}" - | gunzip | split -b 1000m --filter "gzip > ./logs/cdn/2021-04-14/\$FILE.gz | echo \"\$FILE.gz\""
This works fine locally and saves the files locally.
I am not sure how to upload these files directly to S3, without saving them locally, after the smaller chunks are generated.
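One approach that seems consistent with the streaming used elsewhere in this thread is to have the split filter pipe each compressed chunk straight to aws s3 cp - (stdin upload) instead of writing it to disk. A sketch, with the destination bucket as a placeholder:
# Each 1000m chunk is gzipped and streamed directly to S3; nothing is written to local disk
aws s3 cp "${INFILE}" - | gunzip | split -b 1000m --filter 'gzip | aws s3 cp - "s3://<bucketname>/logs/cdn/2021-04-14/$FILE.gz"'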
I have a Python script, pscript.py, which takes the input parameters -c input.txt -s 5 -o out.txt. The files are all located in an AWS S3 bucket. How do I run it after creating an instance? Do I have to mount the bucket on the EC2 instance and execute the code, or use Lambda? I am not sure; reading so many AWS documentation pages is kind of confusing.
Command line run is as follows:
python pscript.py -c input.txt -s 5 -o out.txt
You should copy the file from Amazon S3 to the EC2 instance:
aws s3 cp s3://my-bucket/pscript.py .
You can then run your above command.
Please note that, to access the object in Amazon S3, you will need to assign an IAM Role to the EC2 instance. The role needs sufficient permission to access the bucket/object.
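Put together, the whole run could look roughly like this; a sketch that assumes input.txt also lives in my-bucket and that you want the output copied back afterwards (both assumptions):
# Pull the script and its input from S3, run it, then push the result back
aws s3 cp s3://my-bucket/pscript.py .
aws s3 cp s3://my-bucket/input.txt .
python pscript.py -c input.txt -s 5 -o out.txt
aws s3 cp out.txt s3://my-bucket/out.txt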
I have a file system where files can be dropped onto an EC2 instance, and I have a shell script running to sync the newly dropped files to an S3 bucket. I'm looking to delete the files off the EC2 instance once they are synced. Specifically, the files are dropped into the "yyyyy" folder.
Below is my shell code:
#!/bin/bash
# Watch the "yyyyy" folder and sync newly created files to S3.
inotifywait -m -r -e create "yyyyy" | while read -r NEWFILE
do
    if lsof | grep "$NEWFILE" ; then
        # File is still open (being written), so skip syncing for now.
        echo "$NEWFILE";
    else
        sleep 15
        aws s3 sync yyyyy s3://xxxxxx-xxxxxx/
    fi
done
Instead of using aws s3 sync, you could use aws s3 mv (which is a 'move').
This will copy the file to the destination, then delete the original (effectively 'moving' the file).
It can also be used with --recursive to move a whole folder, or with --include and --exclude to specify multiple files.
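In the script above, that would mean replacing the sync line with something like this (a sketch using the placeholder folder and bucket names from the question):
# Copy everything in the watched folder to S3, then delete the local copies
aws s3 mv yyyyy s3://xxxxxx-xxxxxx/ --recursive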
I have already uploaded my VMDK file to the S3 bucket using the following command:
s3cmd put /root/Desktop/centos-ldaprad.vmdk --multipart-chunk-size-mb=10 s3://xxxxx
Now I would like to create an AWS instance from the same VMDK available in the S3 bucket:
ec2-import-instance centos-ldaprad.vmdk -f VMDK -t t2.micro -a x86_64 -b xxxxx -o <XXXX_ACCESS_KEY_XXXX> -w <XXXX_SECRET_KEY_XXX> -p Linux --dont-verify-format -s 5 --ignore-region-affinity
But it looks in the present working directory for the source VMDK file. I would be really grateful if you could explain how to point the source VMDK at the bucket instead of a local file.
Does the --manifest-url url point to the S3 bucket? When I uploaded the file, I had no idea whether any such manifest file was created. If it is created, where would it be?
Another thing: when using the above ec2-import-instance command, it searches for the VMDK in the present working directory and, if found, starts uploading. Is there any provision to upload in parts and also to resume in case of interruption?
It's not really the answer you were after, but I've attached the script I use to upload VMDKs and convert them to AMI images.
This uses ec2-resume-import, so you can restart it if an upload partially fails.
http://pastebin.com/bD8c3gQu
It's worth pointing out that when I register the device I specify a block device mapping. This is because my images always include a separate boot partition and an LVM-based root partition.
--root-device-name /dev/sda1 -b /dev/sda=$SNAPSHOT_ID:10:true --region $REGION -a x86_64 --kernel aki-52a34525