How to exclude binaries when using rsync - linux

I want to rsync a directory from a Mac to a Linux server while excluding compiled files such as .o files and binary executables. How do I exclude binary files?
What I am using at the moment:
rsync -av --compress --exclude="*.o" dir server:dir

This is a sticky problem because a Unix system does not have a hard and fast definition of the distinction between "binary" and "text" files. You can do a pretty good job by using the file command and searching for text in the output (see How to tell binary from text files in linux), so I'd run find to generate a list of files which file considers to be text, and use that as the list of files to rsync:
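cd dir && find . -type f -exec file {} + | awk -F: '$2 ~ /text/ { print $1 }' | \
rsync --files-from=- -av --compress . server:dir
Running find from inside dir keeps the pathnames relative to the source directory, which is what --files-from expects. File names containing a colon or a newline will still confuse the awk step, so this may need some tweaking, but it should get close to what you want.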
In the long term, I'd want to rework my build process to put generated files in a dir/build directory, but this might help for now :-)

You can add a .cvsignore file in the directories and use the option -C to rsync.
But this only roughly matches what you asked for: -C applies rsync's own CVS-style ignore rules, which may not agree with your idea of a binary file. So be careful and test it properly.
Also, you can run a find before the rsync, scanning the complete tree for files matching your idea of being "binary" (maybe compiled executables?), and place all their names in an exclude file which you then use with option --exclude-from.
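A minimal sketch of that exclude-file idea, assuming "executable regular file" is a good enough stand-in for "binary" in your tree (it will also catch executable scripts). The scratch directory, the file names and exclude.txt are made up for the demo, and file names containing rsync wildcard characters such as * or [ would need escaping.

```shell
# Build a small demo tree (all names here are hypothetical).
work=$(mktemp -d)
mkdir -p "$work/dir/src"
printf 'int main(void){return 0;}\n' > "$work/dir/src/main.c"
printf 'fake binary\n' > "$work/dir/src/prog"
chmod +x "$work/dir/src/prog"

# List executable files relative to the transfer root. Stripping the
# leading "." leaves "/src/prog"; the leading "/" anchors each rsync
# exclude pattern at the root of the transfer.
( cd "$work/dir" && find . -type f -perm -u+x | sed 's|^\.||' ) > "$work/exclude.txt"
cat "$work/exclude.txt"

# Then, on a machine with rsync and a reachable server:
# rsync -av --compress --exclude='*.o' --exclude-from="$work/exclude.txt" "$work/dir/" server:dir/
```

Note the trailing slash on the source in the commented rsync line: the anchored patterns are relative to the contents of dir, not to dir itself.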

Related

Bash Scripting with xargs to BACK UP files

I need to copy a file from multiple locations to a backup directory while retaining its directory structure. For example, I have a file "a.txt" at the following locations: /a/b/a.txt /a/c/a.txt a/d/a.txt a/e/a.txt. I now need to copy this file from all of these locations to the backup directory /tmp/backup. The end result should be:
When I list /tmp/backup/a, it should contain /b/a.txt, /c/a.txt, /d/a.txt and /e/a.txt.
For this, I had used the command: echo /a/*/a.txt | xargs -I {} -n 1 sudo cp --parent -vp {} /tmp/backup. This is throwing the error "cp: cannot stat '/a/b/a.txt /a/c/a.txt a/d/a.txt a/e/a.txt': No such file or directory"
-I option is taking the complete input from echo instead of individual values (like -n 1 does). If someone can help debug this issue that would be very helpful instead of providing an alternative command.
Use rsync with the --relative (-R) option to keep (parts of) the source paths.
I've used a wildcard for the source to match your example command rather than the explicit list of directories mentioned in your question.
rsync -avR /a/*/a.txt /tmp/backup/
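As for why the original xargs command fails: with -I, xargs reads one item per input line rather than splitting on blanks, and echo prints every match on a single line. A small sketch of the fix, run in a scratch tree instead of /a and /tmp/backup (those scratch paths are my assumption; --parents needs GNU cp):

```shell
# Scratch tree standing in for /a and /tmp/backup (hypothetical paths).
work=$(mktemp -d)
mkdir -p "$work/a/b" "$work/a/c" "$work/backup"
echo one > "$work/a/b/a.txt"
echo two > "$work/a/c/a.txt"
cd "$work"

# printf emits one path per line, so -I substitutes them one at a time
# and cp runs once per file. With echo, cp would receive the whole
# space-separated list as a single (nonexistent) file name.
printf '%s\n' a/*/a.txt | xargs -I {} cp --parents -vp {} backup

find backup -type f | sort
```

With -I in place, the extra -n 1 is not needed.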
Do the backups need to be exactly the same as the originals? In most cases, I'd prefer a little compression. [tar](https://man7.org/linux/man-pages/man1/tar.1.html) does a great job of bundling things including the directory structure.
tar cvzf /path/to/backup/tarball.tgz /source/path/
tar can't update compressed archives, so if you want to update the backup in place, skip the compression:
tar uf /path/to/backup/tarball.tar /source/path/
This gives you versioning of a sort, as it only appends changed files but keeps both the before and after versions.
If you have time and cycles and still want the compression, you can decompress before and recompress after.
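That decompress, update, recompress cycle could look roughly like this. It is only a sketch in a scratch directory; the names source, notes.txt and backup.tar.gz are made up for the demo.

```shell
work=$(mktemp -d)
cd "$work"
mkdir source
echo v1 > source/notes.txt
touch -t 202001010000 source/notes.txt   # backdate so the later edit is clearly newer

tar czf backup.tar.gz source/            # initial compressed backup

echo v2 > source/notes.txt               # the file changes after the backup

gunzip backup.tar.gz                     # yields backup.tar
tar uf backup.tar source/                # append members newer than the archived copy
gzip backup.tar                          # back to backup.tar.gz

tar tzf backup.tar.gz                    # notes.txt is listed once per stored version
```

On extraction, the later copy of a member overwrites the earlier one, so a plain tar xzf restores the newest version.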

Is there a way to download files matching a pattern through SFTP in a shell script?

I'm trying to download multiple files through SFTP on a Linux server using
sftp -o IdentityFile=key <user>@<server> <<END
get -r folder
exit
END
which will download all contents of a folder. It appears that find and grep are invalid commands there, and so are for loops.
I need to download files having a name containing a string e.g.
test_0.txt
test_1.txt
but not file.txt
Do you really need the -r switch? Are there really any subdirectories in the folder? You do not mention that.
If there are no subdirectories, you can use a simple get with a file mask:
cd folder
get *test*
Are you required to use sftp? A tool like rsync that operates over ssh has flexible include/exclude options. For example:
rsync -a <user>@<server>:folder/ folder/ \
--include='test_*.txt' --exclude='*.txt'
This requires rsync to be installed on the remote system, but that's very common these days. If rsync isn't available, you could do something similar using tar:
ssh <user>@<server> tar -cf- folder/ | tar -xvf- --wildcards '*/test_*.txt'
This tars up all the files remotely, but then only extracts files matching your target pattern on the receiving side.
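If you want to check the extraction pattern before pointing it at a real server, the same pipeline can be tried locally without the ssh hop. This is just a sketch: the remote and local directories below are stand-ins I made up, and --wildcards assumes GNU tar.

```shell
work=$(mktemp -d)
mkdir -p "$work/remote/folder" "$work/local"
touch "$work/remote/folder/test_0.txt" \
      "$work/remote/folder/test_1.txt" \
      "$work/remote/folder/file.txt"

# Archive everything on the "remote" side, then extract only the
# members matching the pattern on the "local" side.
( cd "$work/remote" && tar -cf- folder/ ) |
  ( cd "$work/local" && tar -xvf- --wildcards '*/test_*.txt' )

ls "$work/local/folder"
```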

rsync - copy files with same name

I have several different files with the same name, and I want to copy all of them to a destination that has a flat structure (no directories, just files). Is there any way to append some text to one of the file names so that both can be copied?
I need to use rsync because there are some files that I need to exclude from the copy.
For example:
dir1/file1.txt
dir1/dir2/file1.txt
both get copied, and in the destination there is:
file1.txt
file1.txt.txt
Typically, when I want to do some complex name-munging, I just write the list of files (with find dir1 -type f >listfiles) and fix it with a text editor.
For example, s|^.*/([^/]+)$|cp & destination/\1| (extended regex syntax, as in sed -E) converts a list like
dir1/file1.txt
dir1/dir2/file1.txt
to a script like:
cp dir1/file1.txt destination/file1.txt
cp dir1/dir2/file1.txt destination/file1.txt
Then you could do something like cut -d' ' -f3 listfiles | sort | uniq -d to find those with the same destination filename, then go back to the editor and fix those lines.
After a few minutes you get a full script for exactly the copy you want, without surprises because you can see each command and apply the best fix for each case.
As far as I know, rsync has no built-in option to do that. But since you are copying files with the same name from different directories, I guess you are using multiple rsync commands.
So this gives you two options:
Create folders:
rsync -av /home/user1/file1 /media/foo/user1/file1
rsync -av /home/user2/file1 /media/foo/user2/file1
etc..
or rename the files with an ID:
rsync -av /home/user1/file1 /media/foo/user1-file1
rsync -av /home/user2/file1 /media/foo/user2-file1
etc..
If you want to use the second solution, you can build a simple script. As you are using rsync, I suppose you know the basics of GNU/Linux, so a simple bash script would be enough!
A basic idea is to take the parent folder name and use it as the ID in the destination path of the rsync command (it won't always work, since two parent folders can share a name).
If you want to be sure of a unique ID, you can for example set a counter and increment it, like
file1-1
file1-2
file1-3
But you will lose track of the file's original path.
All of these solutions can work; it's up to you to choose the one that fits your needs!
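The parent-folder idea can be pushed a bit further by folding the whole relative path into the flat name. Here is a rough sketch that uses find and cp rather than rsync (that swap is mine, with ! -name doing the excluding); the directory names and the *.log exclusion are made up for the demo.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p dir1/dir2 destination
echo top > dir1/file1.txt
echo nested > dir1/dir2/file1.txt
echo skip > dir1/skip.log

# Copy every file except *.log, turning "/" into "_" so that the source
# path becomes part of the flat destination name.
find dir1 -type f ! -name '*.log' | while IFS= read -r path; do
    flat=$(printf '%s' "$path" | tr '/' '_')
    cp -p "$path" "destination/$flat"
done

ls destination
```

Two different paths can still collapse to the same name if directory or file names already contain underscores, so check the result for collisions on a real tree.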

Verify copied data between Windows & Linux shares?

I just copied a ton of data from a machine running Windows 7 Ultimate to a server running Ubuntu Server LTS 10.04. I used the robocopy utility via PowerShell to accomplish this task, but I couldn't find any information online regarding whether Robocopy verifies a copied file's integrity once it is copied to the server.
First of all, does anyone know if this is done inherently? There is no switch that explicitly allows you to add verification to a file transfer.
Second, if it doesn't or there is uncertainty about whether or not it does, what would be the simplest method to accomplish this for multiple directories with several files/sub-directories?
Thanks!
The easiest mechanism I know would rely on an md5sum and Unix-like find utility on your Windows machine.
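You can generate a manifest file of filenames / md5sums, using paths relative to the source directory so that they also resolve under the destination:
cd /source && find . -type f -exec md5sum {} \; > /tmp/MD5SUM
Copy the MD5SUM file to /tmp on your Linux machine, and then run:
cd /path/to/destination
md5sum --quiet -c /tmp/MD5SUM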
You'll get a list of files that fail:
$ md5sum --quiet -c /tmp/MD5SUM
/etc/aliases: FAILED
md5sum: WARNING: 1 of 341 computed checksums did NOT match
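To see the whole round trip in one place, here is a small self-contained sketch that fakes a bad copy in a scratch directory. The source and destination directories and the file names are invented for the demo.

```shell
work=$(mktemp -d)
mkdir -p "$work/source/sub" "$work/destination/sub"
echo alpha > "$work/source/a.txt"
echo beta  > "$work/source/sub/b.txt"
cp "$work/source/a.txt" "$work/destination/a.txt"
echo corrupted > "$work/destination/sub/b.txt"   # simulate a damaged copy

# Manifest with paths relative to the source root, so it can be checked
# from any destination root.
( cd "$work/source" && find . -type f -exec md5sum {} + ) > "$work/MD5SUM"

# Check the copy. --quiet suppresses the OK lines; the summary warning
# goes to stderr, which is discarded here.
result=$( cd "$work/destination" && md5sum --quiet -c "$work/MD5SUM" 2>/dev/null )
echo "$result"
```

md5sum also exits with a non-zero status when any file fails, which is handy in scripts.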
A much easier method is to use the Unix utilities diff and rsync.
With diff you can compare two files, but you can also compare two directories. With diff I would recommend this command:
diff -r source/directory/ destination/directory
-r makes diff compare the directories recursively
The second option is to use rsync, which is meant to sync files or directories, but with the -n option you can also use it to analyze differences between directories. rsync also works when the files are not on the same host: one side can be a remote host, accessed over ssh. Plus, rsync is really flexible, with many options available:
rsync -avn --itemize-changes --progress --stats source/directory/ destination/directory/
-n makes rsync do a "dry run", meaning it makes no changes on the destination
-a (archive mode) implies recursion and preserves symbolic links, file permissions, file timestamps, owner, and group
-v increases verbosity
--itemize-changes outputs a change summary for all updates
Here you can find even more ways to compare directories:
https://askubuntu.com/questions/12473/file-and-directory-comparison-tool
On the rsync Wikipedia page you can find alternative programs to rsync for Windows.

What's the best way to move a directory into place in a Makefile install?

I'm currently using the usual technique in my Makefile to install individual files:
install:
	install -D executable ${BIN_DIR}
But I just ran across a situation where I need to move a whole directory and all files underneath it into place.
Is cp -r the best way or is there a more linux-y/unix-y way to do this?
Yeah, it's hard to think of a more unix-ish way than cp -r, although the -r is a relatively late addition to cp. I can tell you the way we used to do it, which works neatly across filesystems and such:
Let src be the source directory you want to move, and /path/to/target be an absolute path to the target. Then you can use:
$ tar cf - src | (cd /path/to/target && tar xf -)
My version of install(1) (Debian) has:
-d, --directory
treat all arguments as directory names; create all components of the specified directories
-t, --target-directory=DIRECTORY
copy all SOURCE arguments into DIRECTORY
So if you wanted to use install(1) consistently throughout your Makefile you could do:
install -d destdir
install srcdir/* -t destdir
-t isn't recursive however - if srcdir contains directories, then they won't get copied.
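If you want to stay with install(1) for a whole tree anyway, one workaround is to call it once per file and let -D create the parent directories. This is only a sketch, assuming GNU install and find; the srcdir and destdir names, the files in them, and the 644 mode are placeholders for the demo.

```shell
work=$(mktemp -d)
mkdir -p "$work/srcdir/share/doc"
echo readme > "$work/srcdir/share/doc/README"
echo data > "$work/srcdir/data.txt"
destdir="$work/destdir"

# install -D creates the missing leading directories of the destination
# file, so one call per file reproduces the tree under destdir.
( cd "$work/srcdir" &&
  find . -type f -exec sh -c 'install -D -m 644 "$1" "$2/$1"' _ {} "$destdir" \; )

find "$destdir" -type f | sort
```

In a Makefile recipe each $ in that command would have to be written as $$.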
Linking is another viable alternative. That would allow you to keep multiple directories (representing different versions) accessible.

Resources