Saving a file to S3 while decompressing - Linux

I am using the following command to download a file from S3, decompress it, and split it into smaller gzipped files:
aws s3 cp "${INFILE}" - | gunzip | split -b 1000m --filter "gzip > ./logs/cdn/2021-04-14/\$FILE.gz | echo \"\$FILE.gz\""
This works fine locally and saves the split files to local disk.
I am not sure how to upload these files directly to S3, without saving them locally, once the smaller gzipped chunks are generated.
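A minimal sketch of one way to avoid local files, assuming the same path layout (s3://my-bucket is a placeholder), is to have split's --filter pipe each chunk straight back into aws s3 cp reading from stdin:
# stream from S3, decompress, split into 1000m chunks, recompress each chunk and upload it from stdin
aws s3 cp "${INFILE}" - | gunzip | split -b 1000m --filter 'gzip | aws s3 cp - "s3://my-bucket/logs/cdn/2021-04-14/$FILE.gz"'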

Related

How do I run a Python script and files located in an AWS S3 bucket?

I have a Python script, pscript.py, which takes the input parameters -c input.txt -s 5 -o out.txt. The files are all located in an AWS S3 bucket. How do I run it after creating an instance? Do I have to mount the bucket on the EC2 instance and execute the code, or use Lambda? I am not sure; reading so many AWS documentation pages is somewhat confusing.
The command-line run is as follows:
python pscript.py -c input.txt -s 5 -o out.txt
You should copy the file from Amazon S3 to the EC2 instance:
aws s3 cp s3://my-bucket/pscript.py .
You can then run the command shown above.
Please note that, to access objects in Amazon S3, you will need to assign an IAM Role to the EC2 instance. The role needs sufficient permissions to access the bucket and object.
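For example, a rough sketch using the file names from the question (the bucket name is a placeholder):
# fetch the script and its input from S3, run it, then push the result back
aws s3 cp s3://my-bucket/pscript.py .
aws s3 cp s3://my-bucket/input.txt .
python pscript.py -c input.txt -s 5 -o out.txt
aws s3 cp out.txt s3://my-bucket/out.txt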

How to delete files after syncing from EC2 to S3

I have a file system where files can be dropped onto an EC2 instance, and I have a shell script running that syncs the newly dropped files to an S3 bucket. I'm looking to delete the files from the EC2 instance once they are synced. Specifically, the files are dropped into the "yyyyy" folder.
Below is my shell code:
#!/bin/bash
# Watch the yyyyy folder and sync new files to S3 once they are no longer open
inotifywait -m -r -e create "yyyyy" | while read -r NEWFILE
do
    if lsof | grep "$NEWFILE" ; then
        # file is still open; report it and wait for the next event
        echo "$NEWFILE"
    else
        sleep 15
        aws s3 sync yyyyy s3://xxxxxx-xxxxxx/
    fi
done
Instead of using aws s3 sync, you could use aws s3 mv (which is a 'move').
This will copy the file to the destination, then delete the original (effectively 'moving' the file).
It can also be used with --recursive to move a whole folder, or with --include and --exclude to select multiple files.
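In the script above, the sync line could then become something like this (bucket name as in the question, purely illustrative):
# upload everything in yyyyy and delete each local file once its upload succeeds
aws s3 mv yyyyy s3://xxxxxx-xxxxxx/ --recursive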

Storing locally encrypted incremental ZFS snapshots in Amazon Glacier

To have truly off-site and durable backups of my ZFS pool, I would like to store zfs snapshots in Amazon Glacier. The data would need to be encrypted locally, independently from Amazon, to ensure privacy. How could I accomplish this?
An existing snapshot can be sent to an S3 bucket as follows:
zfs send -R <pool name>@<snapshot name> | gzip | gpg --no-use-agent --no-tty --passphrase-file ./passphrase -c - | aws s3 cp - s3://<bucketname>/<filename>.zfs.gz.gpg
or for incremental back-ups:
zfs send -R -I <pool name>@<snapshot to do incremental backup from> <pool name>@<snapshot name> | gzip | gpg --no-use-agent --no-tty --passphrase-file ./passphrase -c - | aws s3 cp - s3://<bucketname>/<filename>.zfs.gz.gpg
This command takes an existing snapshot, serializes it with zfs send, compresses it with gzip, and encrypts it with a passphrase using gpg. The passphrase must be on the first line of the ./passphrase file.
Remember to back up your passphrase file separately in multiple locations! If you lose access to it, you'll never be able to get to your data again!
This requires:
A pre-created Amazon S3 bucket
awscli installed (pip install awscli) and configured (aws configure).
gpg installed
Lastly, S3 lifecycle rules can be used to transition the S3 objects to Glacier after a pre-set amount of time (or immediately).
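As a sketch, a lifecycle rule that transitions every object in the bucket to Glacier straight away could look like this (the rule ID and the empty prefix are just examples):
# write an illustrative lifecycle configuration and apply it to the bucket
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-zfs-backups",
      "Filter": { "Prefix": "" },
      "Status": "Enabled",
      "Transitions": [ { "Days": 0, "StorageClass": "GLACIER" } ]
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration --bucket <bucketname> --lifecycle-configuration file://lifecycle.json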
For restoring:
aws s3 cp s3://<bucketname>/<filename>.zfs.gz.gpg - | gpg --no-use-agent --passphrase-file ./passphrase -d - | gunzip | sudo zfs receive <new dataset name>

How to scp and compress simultaneously without decompressing on the destination machine

I would like an efficient method to scp a huge directory to another machine while simultaneously compressing it. I need only the compressed directory on the destination machine.
Is this possible without having to do it in two steps manually?
Use tar:
tar cfz - /path/to/local | ssh user@remotehost 'cd /desired/location; tar xfz -'
The local tar creates and compresses your file structure and writes it to stdout (- as the filename), which is piped through ssh to a tar on the remote host; that tar reads the compressed stream from stdin (- as the filename, again) and extracts the contents.
If you only want the compressed file written out, then:
tar ... | ssh user@remotehost 'cat - > file.tar.gz'
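Spelled out with the same placeholder paths as above (purely illustrative), that would be:
# compress on the fly locally and keep only the archive on the remote machine
tar cfz - /path/to/local | ssh user@remotehost 'cat > /desired/location/file.tar.gz'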

How to create an instance from an already uploaded VMDK image in an S3 bucket

I have already uploaded my VMDK file to an S3 bucket using the following command:
s3cmd put /root/Desktop/centos-ldaprad.vmdk --multipart-chunk-size-mb=10 s3://xxxxx
Now I would like to create an AWS instance from the same VMDK available in the S3 bucket:
ec2-import-instance centos-ldaprad.vmdk -f VMDK -t t2.micro -a x86_64 -b xxxxx -o <XXXX_ACCESS_KEY_XXXX> -w <XXXX_SECRET_KEY_XXX> -p Linux --dont-verify-format -s 5 --ignore-region-affinity
But it looks in the present working directory for the source VMDK file. I would be really grateful if you could guide me on how to point to the source VMDK in the bucket instead of a local source.
Does the --manifest-url option point to the S3 bucket? When I uploaded the file, I have no idea whether any such manifest was created. If it is created, where would it be?
Another thing: when I run the above ec2-import-instance, it searches for the VMDK in the present working directory and, if found, starts uploading. Is there any provision to upload in parts and to resume in case of interruption?
It's not really the answer you were after, but I've attached the script I use to upload VMDKs and convert them to AMI images.
This uses ec2-resume-import, so you can restart it if an upload partially fails.
http://pastebin.com/bD8c3gQu
It's worth pointing out that when I register the device I specify a block device mapping. This is because my images always include a separate boot partition and an LVM-based root partition.
--root-device-name /dev/sda1 -b /dev/sda=$SNAPSHOT_ID:10:true --region $REGION -a x86_64 --kernel aki-52a34525
