How to scp to Amazon s3? - linux

I need to send backup files of ~2TB to S3. I guess the most hassle-free option would be Linux scp command (have difficulty with s3cmd and don't want an overkill java/RoR to do so).
However I am not sure whether it is possible: How to use S3's private and public keys with scp, and don't know what would be my destination IP/url/path?
I appreciate your hints.

As of 2015, SCP/SSH is not supported (and probably never will be for the reasons mentioned in the other answers).
Official AWS tools for copying files to/from S3
command line tool (pip3 install awscli) - note credentials need to be specified, I prefer via environment variables rather than a file: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY.
aws s3 cp /tmp/foo/ s3://bucket/ --recursive --exclude "*" --include "*.jpg"
http://docs.aws.amazon.com/cli/latest/reference/s3/index.html
and an rsync-like command:
aws s3 sync . s3://mybucket
http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
Web interface:
https://console.aws.amazon.com/s3/home?region=us-east-1
Non-AWS methods
Any other solutions depend on third-party executables (e.g. botosync, jungledisk...) which can be great as long as they are supported. But third party tools come and go as years go by and your scripts will have a shorter shelf life.
https://github.com/ncw/rclone
EDIT: Actually, AWS CLI is based on botocore:
https://github.com/boto/botocore
So botosync deserves a bit more respect as an elder statesman than I perhaps gave it.

Here's just the thing for this, boto-rsync. From any Linux box, install boto-rsync and then use this to transfer /local/path/ to your_bucket/remote/path/:
boto-rsync -a your_access_key -s your_secret_key /local/path/ s3://your_bucket/remote/path/
The paths can also be files.
For a S3-compatible provider other than AWS, use --endpoint:
boto-rsync -a your_access_key -s your_secret_key --endpoint some.provider.com /local/path/ s3://your_bucket/remote/path/

You can't SCP.
The quickest way, if you don't mind spending money, is probably just to send it to them on a disk and they'll put it up there for you. See their Import/Export service.

Here you go,
scp USER#REMOTE_IP:/FILE_PATH >(aws s3 cp - s3://BUCKET/SAVE_FILE_AS_THIS_NAME)

Why don't you scp it to an EBS volume and then use s3cmd from there? As long as your EBS volume and s3 bucket are in the same region, you'll only be charged for inbound data charges once (from your network to the EBS volume)
I've found that once within the s3 network, s3cmd is much more reliable and the data transfer rate is far higher than direct to s3.

There is an amazing tool called Dragon Disk. It works as a sync tool even and not just as plain scp.
http://www.s3-client.com/
The Guide to setup the amazon s3 is provided here and after setting it up you can either copy paste the files from your local machine to s3 or setup an automatic sync. The User Interface is very similiar to WinSCP or Filezilla.

for our AWS backups we use a combination of duplicity and trickle duplicity for rsync and encryption and trickle to limit the upload speed

Related

How to move a large folder to s3?

I'm looking for the most suitable tool to transfer 600 GB of media from a Linux server to s3, so far I found s3 sync and s3cmd , but they do not work in background mode, tell me the best option?
You can run your command in tmux, or nohup. This way the AWS CLI command will persist after you logout. There are other ways, but I personally find tmux being my preferred choice.

Quick way of uploading many images (3000+) to a google cloud server

I am working on object detection for a school project. To train my CNN model I am using a google cloud server because I do not own a strong enough GPU to train it locally.
The training data consists of images (.jpg files) and annotations (.txt files) and is spread over around 20 folders due to the fact that they come from different sources and I do not want to mix pictures from different sources so I want to keep this directory structure.
My current issue is that I could not find a fast way of uploading them to my google cloud server.
My workaround was to upload those image folders as a .zip file on google drive and download them on the cloud and unzip them there. This process needs way too much time because I have to upload many folders and google drive does not have a good API to download folders to Linux.
On my local computer, I am using Windows 10 and my cloud server runs Debian.
Therefore, I'd be really grateful if you know a fast and easy way to either upload my images directly to the server or at least to upload my zipped folders.
Couldn't you just create an infinite loop to look for jpg files and scp/sftp the jpg directly to your server once the file is there? On windows, you can achieve this using WSL.
(sorry this may not be your final answer, but i don't have the reputation to ask you this question)
I would upload them to a Google Cloud Storage bucket using gsutil with multithreading. This means that multiple files are copied at once, so the only limitation here is your internet speed. Gsutil installers for Windows and Linux are found here. Example command:
gsutil -m cp -r dir gs://my-bucket
Then on the VM you do exactly the opposite:
gsutil -m cp -r gs://my-bucket dir
This is super fast, and you only pay a small amount for the storage, which is super cheap and/or falls within the GCP free tier.
Note: make sure you have write permissions on the storage bucket and the default compute service account (i.e. the VM service account) has read permissions on the storage bucket.
The best stack for the use case will be gsutil + storage bucket
Copy the zip files to cloud storage bucket and put a sync cron to get the files on the VM.
Make use of gsutil
https://cloud.google.com/storage/docs/gsutil

Stream dd stdout from ec2 to s3

Trying to save money on EBS snapshots, so the idea is to take manual copies of the file systems (using dd) and storing manually in S3 to lifecycle to IA and Glacier.
The following works fine for smaller files (tested with 1GB), but on larger (~800GB), after around 40GB, everything slows to a crawl and never finishes
sudo dd if=/dev/sdb bs=64M status=progress | aws s3 cp - s3://my-bucket/sdb_backup.img --sse AES256 --storage-class STANDARD_IA
Running this from an m4.4xlarge instance (16 vcpu, 64GB RAM)
Not exactly sure why it's crawling to a halt, or whether this is the best way to solve this problem (manually storing file systems on s3 Infrequent Access storage class)
Any thoughts?
Thanks!!
It is not a good idea because snapshots are incremental, so you'll spend more starting from the next few hand-made snapshots.
If you still want this way then consider multi-part upload (chunks up to 5GB).
You can use something like goofys to redirect output to S3. I've personally tested with files up to 1TB.
First consider the multipart upload for large sizes.
Second use the compressed version,
dd if=/dev/sdX | gzip -c | aws s3 cp - s3://bucket-name/desired_image_name.img
If you wish to copy files to Amazon S3, the easiest method is to use the AWS Command-Line Interface (CLI):
aws s3 sync dir s3://my-bucket/dir
As an alternative to Standard-Infrequent Access, you could create a lifecycle policy on the S3 bucket to move files to Glacier. (This is worthwhile for long-term storage, but not for the short-term due to higher request charges.)

What is the best approach in nodejs to switch between the local filesystem and Amazon S3 filesystem?

I'm looking for the best way to switch between using the local filesystem and the Amazon S3 filesystem.
I think ideally I would like a wrapper to both filesystems that I can code against. A configuration change would tell the wrapper which filesystem to use. This is useful to me because a developer can use their local filesystem, but our hosted environments can use Amazon S3 by just changing a configuration option.
Are there any existing wrappers that do this? Should I write my own wrapper?
Is there another approach I am not aware of that would be better?
There's a project named s3fs that offers a subset of POSIX file system function on top of S3. There's no native Amazon-provided way to do this.
However, you should think long and hard about whether or not this is a sensible option. S3 is an object store, not a regular file system, and it has quite different performance and latency characteristics.
If you're looking for high iops, NAS-style storage then Amazon EFS (in preview) would be more appropriate. Or roll your own NFS/CIFS solution using EBS volumes or SoftNAS or Gluster.
I like your idea to build a wrapper that can use either the local file system or S3. I'm not aware of anything existing that would provide that for you, but would certainly be interested to hear if you find anything.
An alternative would be to use some sort of S3 file system mount, so that your application can always use standard file system I/O but the data might be written to S3 if your system has that location configured as an S3 mount. I don't recommend this approach because I've never heard of an S3 mounting solution that didn't have issues.
Another alternative is to only design your application to use S3, and then use some sort of S3 compatible local object storage in your development environment. There are several answers to this question that could provide an S3 compatible service during development.
There's a service called JuiceFS that can do what you want.
According to their documentation:
JuiceFS is a POSIX-compatible shared filesystem specifically designed
to work in the cloud.
It is designed to run in the cloud so you can utilize the cheap price
of object storage service to store your data economically.
It is a
POSIX-compatible filesystem so you can access your data seamlessly as
accessing local files.
It is a shared filesystem so you can share your
files across multiple machines.
s3 is one of the backends supported, you can even configure it to replicate files to a different object storage system on another cloud.

GNU make's install target to push files on a remote SSH?

I'm working on a project that needs to be tested on an embedded Linux system. After every little change, I have to scp all files to the device over a SSH connection. Can you suggest a more convenient way to deploy files on a remote target? For example some trick on make's install command:
make install INSTALL='scp 192.168.1.100:/'
or something.
if you can use scp, you can probably also use rsync, specifically rsync over ssh. Use of rsync has as advantage is that it builds a delta of source and destination files, and transfers only what is necessary. In case of transfer after changing very little this would be of considerable benefit. I'd probably invoke it if building completes without error, like make ... && upload (where upload could be a script covering the details of transfer)
Just for completeness, sshfs is often quite useful. You can mount a remote folder visible over ssh on to a folder on your local hard disk. Performance is not great, but certainly serviceable enough for a deploy step, and it's transparent to all tools.

Resources