How to move a large folder to S3? (Linux)

I'm looking for the most suitable tool to transfer 600 GB of media from a Linux server to S3. So far I've found aws s3 sync and s3cmd, but they don't run in background mode. What would be the best option?

You can run your command in tmux or under nohup. That way the AWS CLI command will keep running after you log out. There are other ways, but tmux is my personal preference.
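For example, a rough sketch of both approaches (the source path /data/media and the bucket name my-bucket are placeholders):
nohup aws s3 sync /data/media s3://my-bucket/media > sync.log 2>&1 &
or, interactively under tmux:
tmux new -s s3upload
aws s3 sync /data/media s3://my-bucket/media
# detach with Ctrl-b d, reattach later with: tmux attach -t s3upload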

Related

Clustered (shared) file storage for a PHP application

I am using a load balancer in AWS and want to sync files in real time. I was trying to do it with rsync, but running it from cron is not real time. I want real-time sync. I am in the Singapore region, where there is no EFS option.
There is a daemon called lsyncd, which does exactly what you need.
You can read further about it here
"rsync is an excellent and versatile backup tool, but it does have one drawback: you have to run it manually when you want to back up your data. Sure, you can use cron to create scheduled backups, but even this solution cannot provide seamless live synchronization. If this is what you want, then you need the lsyncd tool, a command-line utility which uses rsync to synchronize (or rather mirror) local directories with a remote machine in real time. To install lsyncd on your machine, download the latest .tar.gz archive from the project's Web site, unpack it, and use the terminal to switch to the resulted directory. Run then the ./configure command followed by make, and make install (the latter command requires root privileges). lsyncd is rather straightforward in use, as it features just one command and a handful of options"

How to use s3cmd multithreaded?

Our goal is to download about 5 million tiny files from AWS to a CentOS server. We found the s3cmd utility, and it is very good for almost everything except downloading, because it only supports a single thread :( which would add up to about 60 days of downloading, which is just crazy!
Is there a newer version of s3cmd, or another way to download all the files in multithreaded mode?
I discovered s4cmd for myself; awesome tool. And yes, it supports multithreading.
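For example, a sketch of a recursive multithreaded download (the bucket name and paths are placeholders; check s4cmd --help, as option names can vary between versions):
s4cmd get --recursive --num-threads 32 s3://my-bucket/tiny-files/ ./tiny-files/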

What's faster: copying via an NFS mount or via scp?

We have a network of several machines and want to distribute a big directory (ca. 10 GB) to every box.
It is located on an NFS server and mounted on all machines, so the first approach is to just use plain cp to copy the files from the mounted directory to a local one. This is easy, but unfortunately there is no progress bar, because cp is not really intended for network copies (or is it?).
Using scp is intended for copying across the network, but it encrypts everything and may therefore be slow.
Should one be faster, and if so, which: cp on the NFS mount, or scp?
You can always use rsync; it can show you progress (with the --progress option) and is more lightweight than scp.
You can enable compression manually with -z.
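As a sketch (the host name fileserver and the paths are placeholders), pulling directly from the file server over ssh would look like:
rsync -a --progress fileserver:/export/bigdir/ /local/bigdir/
# add -z only if the data compresses well; for already-compressed media it mostly costs CPU
The same flags also work when copying from the NFS mount, e.g. rsync -a --progress /mnt/bigdir/ /local/bigdir/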

How to download files from a Linux server from the command line?

I'd like to start using my localhost to develop from. I am trying to work out the best way to sync my local folder with the files directory on the remote web server. In some cases there will be 10,000+ files.
This is not for component files such as PHP, CSS, JavaScript, etc. It is for content and media files, which I do not wish to keep in git/svn.
Thanks
I would recommend using rsync. It's built specifically for remote synchronization tasks. Take a look at the compression and differential modes.
There is always rsync
http://linux.about.com/library/cmd/blcmdl1_rsync.htm
It is built for exactly this kind of task, and optimized for syncing when there are small deltas to large file sets.
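A sketch of what that could look like here (user, host, and paths are placeholders), pulling just the content directory down to your local copy:
rsync -az --progress user@webserver:/var/www/site/files/ ./files/
# add --delete if you want the local copy to exactly mirror the remote directory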
If rsync is out of the question, you can try wget; it has some nifty features.

How to scp to Amazon S3?

I need to send ~2 TB of backup files to S3. I guess the most hassle-free option would be the Linux scp command (I have difficulty with s3cmd and don't want an overkill Java/RoR solution).
However, I am not sure whether that is even possible: how would I use S3's access and secret keys with scp, and what would my destination IP/URL/path be?
I appreciate your hints.
As of 2015, SCP/SSH is not supported (and probably never will be for the reasons mentioned in the other answers).
Official AWS tools for copying files to/from S3
Command-line tool (pip3 install awscli). Note that credentials need to be specified; I prefer environment variables rather than a credentials file: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (a short example follows the web-console link below).
aws s3 cp /tmp/foo/ s3://bucket/ --recursive --exclude "*" --include "*.jpg"
http://docs.aws.amazon.com/cli/latest/reference/s3/index.html
and an rsync-like command:
aws s3 sync . s3://mybucket
http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
Web interface:
https://console.aws.amazon.com/s3/home?region=us-east-1
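A minimal sketch of the environment-variable approach (the key values, local path, and bucket name are placeholders):
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
aws s3 sync /local/backups s3://my-bucket/backups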
Non-AWS methods
Any other solutions depend on third-party executables (e.g. botosync, jungledisk...) which can be great as long as they are supported. But third party tools come and go as years go by and your scripts will have a shorter shelf life.
https://github.com/ncw/rclone
EDIT: Actually, AWS CLI is based on botocore:
https://github.com/boto/botocore
So botosync deserves a bit more respect as an elder statesman than I perhaps gave it.
Here's just the thing for this, boto-rsync. From any Linux box, install boto-rsync and then use this to transfer /local/path/ to your_bucket/remote/path/:
boto-rsync -a your_access_key -s your_secret_key /local/path/ s3://your_bucket/remote/path/
The paths can also be files.
For an S3-compatible provider other than AWS, use --endpoint:
boto-rsync -a your_access_key -s your_secret_key --endpoint some.provider.com /local/path/ s3://your_bucket/remote/path/
You can't SCP.
The quickest way, if you don't mind spending money, is probably just to send it to them on a disk and they'll put it up there for you. See their Import/Export service.
Here you go (this streams the file through a bash process substitution, so it never has to touch the local disk):
scp USER@REMOTE_IP:/FILE_PATH >(aws s3 cp - s3://BUCKET/SAVE_FILE_AS_THIS_NAME)
Why don't you scp it to an EBS volume and then use s3cmd from there? As long as your EBS volume and S3 bucket are in the same region, you'll only pay the data transfer charge once (from your network to the EBS volume).
I've found that once you are inside AWS's network, s3cmd is much more reliable and the transfer rate is far higher than uploading to S3 directly.
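A hypothetical sketch of that two-step approach (instance host name, mount point, and bucket name are placeholders):
scp -r /local/backups/ ec2-user@your-ec2-instance:/mnt/ebs/backups/
# then, on the instance:
s3cmd sync /mnt/ebs/backups/ s3://your-bucket/backups/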
There is an amazing tool called DragonDisk. It even works as a sync tool, not just as a plain copy client.
http://www.s3-client.com/
A guide to setting up Amazon S3 is provided there; after setting it up you can either copy and paste files from your local machine to S3 or set up an automatic sync. The user interface is very similar to WinSCP or FileZilla.
For our AWS backups we use a combination of duplicity and trickle: duplicity for rsync-style incremental backups and encryption, and trickle to limit the upload speed.
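A rough sketch of that setup (the credentials, paths, bucket name, and bandwidth cap are placeholders, and the exact S3 URL scheme duplicity expects varies between versions):
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
trickle -s -u 1024 duplicity /data/backups s3://my-bucket/backups
# trickle -s runs in standalone mode and -u caps upload at roughly 1024 KB/s; duplicity encrypts with GnuPG by default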
