Retain owner and file permissions info when syncing to AWS S3 Bucket from Linux

I am syncing a directory to AWS S3 from a Linux server for backup.
rsync -a --exclude 'cache' /path/live /path/backup
aws s3 sync /path/backup s3://myBucket/backup --delete
However, I noticed that when I want to restore a backup like so:
aws s3 sync s3://myBucket/backup /path/live/ --delete
The owner and file permissions are different. Is there anything I can do or change in the code to retain the original Linux information of the files?
Thanks!

I stumbled on this question while looking for something else and figured you (or someone) might like to know you can use other tools that can preserve original (Linux) ownership information.
There must be others but I know that s3cmd can keep the ownership information (stored in the metadata of the object in the bucket) and restore it if you sync it back to a Linux box.
The syntax for syncing is as follows
/usr/bin/s3cmd --recursive --preserve sync /path/ s3://mybucket/path/
And you can sync it back with the same command just reversing the from/to.
But, as you might know (if you have done a little research on S3 cost optimisation), depending on the situation it could be wiser to upload a single compressed file instead.
It saves space and takes far fewer requests, so you could end up with some savings at the end of the month.
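For example, a minimal sketch of that approach (the archive name is a placeholder and the paths are taken from the question; note that a tar archive also records ownership and permissions, which extracting as root will restore):
# Build a compressed archive of the live directory, excluding the cache,
# then upload it to S3 as a single object.
archive=/tmp/backup-$(date +%F).tar.gz
tar -czf "$archive" --exclude='cache' /path/live
aws s3 cp "$archive" s3://myBucket/backup/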
Also, s3cmd is not the fastest tool for synchronising with S3, since it does not use multi-threading (and there are no plans to add it) like some other tools do, so you might want to look for a tool that preserves ownership and also benefits from multi-threading, if that is still what you are looking for.
To speed up data transfer with s3cmd itself, you can run multiple s3cmd processes with different --exclude and --include statements.
For example
/usr/bin/s3cmd --recursive --preserve --exclude="*" --include="a*" sync /path/ s3://mybucket/path/ & \
/usr/bin/s3cmd --recursive --preserve --exclude="*" --include="b*" sync /path/ s3://mybucket/path/ & \
/usr/bin/s3cmd --recursive --preserve --exclude="*" --include="c*" sync /path/ s3://mybucket/path/
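If you background each of those commands in a script, it may be worth waiting for all of them to finish before the script exits; a minimal sketch of the same idea:
#!/bin/bash
# Run one s3cmd sync per filename prefix in parallel, then wait for all of them.
for prefix in 'a*' 'b*' 'c*'; do
    /usr/bin/s3cmd --recursive --preserve --exclude="*" --include="$prefix" \
        sync /path/ s3://mybucket/path/ &
done
wait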

Related

How to copy files to the timestamp auto generated folder?

Hello, I am trying to copy all files from the Documents directory to a backup directory that has a timestamp. So I have created a folder called bk$(timestamp of the folder) and I am trying to copy files from the Documents directory to the newly created folder, which is unique. This will run from a crontab backing up files from Documents, and when the backup kicks in it will create a new directory for each backup, uniquely identified by the folder's timestamp. For some reason I cannot get cp or cpio -mdp to work. Someone mentioned I could use the $PATH variable, which seems promising; if that is the solution, could someone help me make it work?
bkdest=home/user/backup/
bksource="/home/user/Documents/"
export PATH=/$bkdest:$PATH
mkdir /"$bkdest"bk.$(date +%Y_%m_%d_%H_%M_%S)
cp /"$bksource"* $PATH
My other approach, which I have also tried to make work:
cp $bksource * ls | tail -l | $PATH
I could have gone with ctime, but unfortunately it does not work with the folder creation date.
This was my approach, but with the latest created folder and not the file:
find $HOME -type d -daystart -ctime 0
If someone could please help me out to copy to that new folder, I would really appreciate it. Thank you!
Store the target name in a variable:
bkdest=/home/user/backup
bksource=/home/user/Documents
target="${bkdest}/bk.$(date +%Y_%m_%d_%H_%M_%S)"
mkdir -p "$target"
cp "${bksource}"/* "${target}/"
Note I tidied up your use of variables a little.
Also, this won't copy subdirectories. For that you need to use cp -R. When I do backups I prefer to use rsync.
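For instance, a minimal variant of the copy line above that also includes subdirectories and hidden files (same variables as before; the /. form copies the directory's contents rather than the directory itself):
cp -R "${bksource}/." "${target}/"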
I did not fully understand your approach or what exactly you want to do, but here goes.
CP Approach
You should not use cp for backups, rsync is far more suitable for this. But if for some reason you really need to use cp, you can use the following script.
#!/bin/bash
BKP_DIR=/tmp/bkp
BKP_SRC=/tmp/foo
SNAPSHOT=${BKP_DIR}/$(date +%F.%H-%M-%S.%N)
mkdir -p ${SNAPSHOT}
cp -r ${BKP_SRC}/* ${SNAPSHOT}
Rsync Approach
No big change here.
#!/bin/bash
BKP_DIR=/tmp/bkp
BKP_SRC=/tmp/foo
SNAPSHOT=${BKP_DIR}/$(date +%F.%H-%M-%S.%N)
rsync -a ${BKP_SRC}/ ${SNAPSHOT}/
Improved Rsync Approach (RECOMMENDED)
#!/bin/bash
BKP_DIR=/tmp/bkp
BKP_SRC=/tmp/foo
SNAPSHOT=${BKP_DIR}/$(date +%F.%H-%M-%S.%N)
LATEST=${BKP_DIR}/latest
# Make sure the backup directory exists (needed for the log file and ${LATEST}).
mkdir -p "${BKP_DIR}"
rsync \
    --archive \
    --delete \
    --backup \
    --backup-dir="${SNAPSHOT}" \
    --log-file="${BKP_DIR}/rsync.log" \
    "${BKP_SRC}/" "${LATEST}/"
EXPLAINING: --archive plus --delete will make sure that $LATEST is a perfect copy of $BKP_SRC, meaning that files that no longer exist in $BKP_SRC will be deleted from $LATEST. The --archive option also ensures that permissions and owners are maintained, symlinks are copied as symlinks, and more (look at man rsync for more information).
The --backup plus --backup-dir options will create a backup directory to hold the differential files. In other words, all files that were deleted or modified since the last backup will be put in there, so you do not lose them when they are removed from $LATEST.
--log-file is optional, but it is always good to keep logs for debugging purposes.
At the end you have an incremental backup.
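Since the original question mentions crontab, a minimal sketch of a crontab entry for the script above (the script path and schedule are just assumptions):
# Run the backup script every night at 02:00, appending its output to a log file.
0 2 * * * /usr/local/bin/backup.sh >> /tmp/bkp/cron.log 2>&1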

How to do a backup of files using the terminal?

I've already done a backup of my database, using mysqldump like this:
mysqldump -h localhost -u dbUsername -p dbDatabase > backup.sql
After that the file is in a location outside public access, in my server, ready for download.
How may I do something like that for files? I've tried to google it, but I get all kinds of results except what I need.
I need to tell the server running ubuntu to backup all files inside folder X, and put them into a zip file.
Thanks for your help.
You can use tar for creating backups. For a full system backup:
tar -cvpzf backup.tar.gz --exclude=/backup.tar.gz /
This creates a gzipped tar file of your whole system. You might need additional excludes like
--exclude=/proc --exclude=/sys --exclude=/dev/pts
For a single folder:
tar -cvpzf backup.tar.gz --exclude=/backup.tar.gz /your/folder
If you are outside of the single folder you want to back up, the --exclude=/backup.tar.gz isn't needed.
More detailed guides are available elsewhere (you can do it over the network, split the archive, etc.).
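To restore such an archive later, a minimal sketch (the target directory is a placeholder; extract as root so ownership and permissions are restored):
# Extract the archive into /restore/target, preserving permissions (-p).
mkdir -p /restore/target
tar -xvpzf backup.tar.gz -C /restore/target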

Compare two folders containing source files & hardlinks, remove orphaned files

I am looking for a way to compare two folders containing source files and hard links (let's use /media/store/download and /media/store/complete as an example) and then remove orphaned files that don't exist in both folders. These files may have been renamed and may be stored in subdirectories.
I'd like to set this up as a cron script to run regularly. I just can't figure out the logic of the script myself; could anyone be so kind as to help?
Many thanks
rsync can do what you want, using the --existing, --ignore-existing, and --delete options. You'll have to run it twice, once in each "direction" to clean orphans from both source and target directories.
rsync -avn --existing --ignore-existing --delete /media/store/download/ /media/store/complete
rsync -avn --existing --ignore-existing --delete /media/store/complete/ /media/store/download
--existing says don't copy orphan files
--ignore-existing says don't update existing files
--delete says delete orphans on target dir
The trailing slash on the source dir, and no trailing slash on the target dir, are mandatory for your task.
The 'n' in -avn means not to really do anything, and I always do a "dry run" with the -n option to make sure the command is going to do what I want, ESPECIALLY when using --delete. Once you're confident your command is correct, run it with just -av to actually do the work.
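Since you want to run this regularly from cron, a minimal sketch of a wrapper script (the script path is an assumption; -n has been dropped so it really deletes):
#!/bin/bash
# Clean orphans in both directions; run this only after verifying with -n.
rsync -av --existing --ignore-existing --delete /media/store/download/ /media/store/complete
rsync -av --existing --ignore-existing --delete /media/store/complete/ /media/store/download
A crontab entry like 0 * * * * /usr/local/bin/clean-orphans.sh (path assumed) would then run it hourly.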
Perhaps rsync is of use?
Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.
Note it has a --delete option
--delete delete extraneous files from dest dirs
which could help with your specific use case above.
You can also use the diff command to list all the files that differ between the two folders.
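For example, a read-only check (-r recurses into subdirectories, -q only reports which files differ):
diff -rq /media/store/download /media/store/complete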

how to scp multiple files from multiple directories, while different files in different directories may have the same name

I want to scp several files from remote to local. The files on the remote host look like this:
/data/1792348/a.stat
/data/1792348/b.stat
/data/187657/a.stat
/data/187657/b.stat
... ...
1792348, 187657, etc.; the middle directory name is random.
How can I scp all the files ending with .stat from remote to local?
If I try scp -P36000 user@host:/data/*/*.stat ., I only get 2 files: a.stat and b.stat.
I really don't know how to solve this, and I haven't been able to find an answer on Google.
I would use rsync (which, like scp, runs over ssh, but is way more elaborate; e.g. it will only transmit minimal changesets of data, so if you run it several times you will get an impressive speedup)
rsync -avz -e 'ssh -p 36000' \
    --include="*/" --include="*.stat" --exclude="*" \
    user@host:/data/ ./data/
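Because the --include "*/" rule keeps the intermediate directories, the identically named a.stat and b.stat files from different directories end up in separate local subdirectories instead of overwriting each other. To preview what would be transferred, you can add -n for a dry run first:
# Dry run: list what would be copied without transferring anything.
rsync -avzn -e 'ssh -p 36000' \
    --include="*/" --include="*.stat" --exclude="*" \
    user@host:/data/ ./data/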

Google Cloud Storage - GSUtil - Copy files, skip existing, do not overwrite

I want to sync a local directory to a bucket in Google Cloud Storage. I want to copy the local files that do not exist remotely, skipping files that already exist both remotely and locally. Is it possible to do this with gsutil? I can't seem to find a "sync" option for gsutil or a "do not overwrite" option. Is it possible to script this?
I am on Linux (Ubuntu 12.04).
gsutil supports the noclobber flag (-n) on the cp command. This flag will skip files that already exist at the destination.
You need to add -n to the command; as documented officially for Google Cloud Platform:
-n: No-clobber. When specified, existing files or objects at the destination will not be overwritten. Any items that are skipped by this option will be reported as being skipped. This option will perform an additional GET request to check if an item exists before attempting to upload the data. This will save retransmitting data, but the additional HTTP requests may make small object transfers slower and more expensive.
Example (Using multithreading):
gsutil -m cp -n -a public-read -R large_folder gs://bucket_name
Using rsync, you can copy missing/modified files/objects:
gsutil -m rsync -r <local_folderpath> gs://<bucket_id>/<cloud_folderpath>
Besides, if you use the -d option, you will also delete files/objects in your bucket that are no longer present locally.
Another option could be to use Object Versioning, so you will replace the files/objects in your bucket with your local data, but you can always go back to the previous version.
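If you do use -d, it may be worth previewing the changes first; gsutil rsync also accepts -n for a dry run (the placeholders are the same as above):
# Preview what would be copied and deleted, without changing anything.
gsutil -m rsync -n -r -d <local_folderpath> gs://<bucket_id>/<cloud_folderpath>
# Then run it for real once the output looks right.
gsutil -m rsync -r -d <local_folderpath> gs://<bucket_id>/<cloud_folderpath>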
