Force rsync to transfer files in the order listed in the from-file param - cygwin

Say you have a list of files in filelist.txt and call rsync like so:
$rsync --files-from=filelist.txt /path/to/source /path/to/destination
Rsync sorts the files in filelist.txt before processing them. According to the man page, this is to make the transfer more efficient.
How do I turn that off? I have a case where I want the files transferred in a very specific order and I don't care if it makes the transfer less efficient.

cat filelist.txt | xargs -n1 -I{} rsync --progress "/path-from/{}" "/path-to/{}"
This should pass each file to rsync via xargs one line at a time.
You can replace
cat filelist.txt
with
ls -t | xargs ....
if you prefer.
Indeed rsync annoyingly doesn't use the order that you specified in the list.

Looks like there are two options:
Do one at a time (a sketch follows below).
Patch rsync.
I wish there were a better answer. It would take a good look at the source code to reveal exactly what it's doing, but the order looks alphabetical.
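For the "one at a time" route, something like this might work (untested; it assumes the paths in filelist.txt are relative to /path/to/source and contain no embedded newlines):
# Read filelist.txt in order and invoke rsync once per entry.
# -R/--relative plus the "/./" marker recreates each listed path
# under the destination, mimicking the --files-from layout.
while IFS= read -r f; do
    rsync -R --progress "/path/to/source/./$f" "/path/to/destination/"
done < filelist.txt
It will be slower than a single rsync run (one process and connection per file), but that's the trade-off the question accepts.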

Related

Compare files content with similar names on two folders

I have two folders (I'll use database names as example):
MongoFolder/
CassandraFolder/
These two folders have similar files inside like:
MongoFolder/
MongoFile
MongoStatus
MongoConfiguration
MongoPlugin
CassandraFolder/
CassandraFile
CassandraStatus
CassandraConfiguration
Those files also have very similar content, only the name of the database changes; they all contain code or configuration that differs only in the name Mongo versus Cassandra.
How can I compare these two folders so that the result is the files missing from one or the other (for example, the file CassandraPlugin for CassandraFolder), while also checking that the matching files have similar content, differing only in the database name?
This will give you the names of the missing files (minus the database name):
find MongoFolder/ CassandraFolder/ | \
sed -e s/Mongo//g -e s/Cassandra//g | sort | uniq -u
Output:
Folder/Plugin
The following provides a full diff, including missing files and changed content:
cp -r CassandraFolder cmpFolder
# rename files
find cmpFolder -name "Cassandra*" -print | while IFS= read -r file; do
    mongoName=$(echo "$file" | sed 's/Cassandra/Mongo/')
    mv "$file" "$mongoName"
done
# fix content
find cmpFolder -type f -exec perl -pi -e 's/Cassandra/Mongo/g' {} \;
# inspect result
diff -r MongoFolder cmpFolder # or use a gui tool like kdiff3
I haven't tested this though, so feel free to fix bugs or to ask if something specific is unclear.
Instead of mv you can also use rename, but that differs between Linux flavours.
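For reference, the two common rename flavours would be invoked roughly like this for files directly inside cmpFolder (untested; use whichever rename your distribution ships):
# Perl-based rename (Debian/Ubuntu "prename"): takes an s/// expression
rename 's/Cassandra/Mongo/' cmpFolder/Cassandra*
# util-linux rename: replaces the first occurrence of a literal string
rename Cassandra Mongo cmpFolder/Cassandra*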

shell script to download latest file from FTP

I am writing a shell script for the first time, and I want to download the latest created file from FTP.
I want to download the latest file from a specific folder. Below is my code for that, but it downloads all the files in the folder, not just the latest one.
ftp -in ftp.abc.com << SCRIPTEND
user xyz xyz
binary
cd Rpts/
mget ls -t -r | tail -n 1
quit
SCRIPTEND
help me with this, please?
Try using the wget or lftp utility instead; it compares file time/date, and AFAIR its purpose is FTP scripting. Switch to ssh/rsync if possible; you can read a bit about using lftp instead of rsync here:
https://serverfault.com/questions/24622/how-to-use-rsync-over-ftp
Probably the easiest way is to have the server side link the latest version to "current" and always fetch the file it points to. If you're not the admin of the server, you need to list all files with their date/time, grab that information, parse it, and decide which one is newest; in the meantime the state on the server can change, and you end up with a solution more complicated than it's worth.
The point is that "ls" sorts its output in some way, and time may not be the default. There are switches to sort it, e.g. based on modification time, but even when the server responds with OK to ls -t you can't be sure it really supports sorting; it may just ignore all switches and always return the same list. That's why admins usually provide a "current" link (ln -s). If there's no "current", then to make sure you have the right file you need to parse the listing anyway (ls -al).
http://www.catb.org/esr/writings/unix-koans/shell-tools.html
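If you do go the lftp route suggested above, a rough sketch might look like this (untested; it reuses the host, user, password and Rpts/ path from the question, assumes the filename has no whitespace, and assumes your lftp's cls supports -t for time sorting):
#!/bin/sh
# cls is lftp's client-side listing; -1 prints one name per line and
# -t sorts by modification time, newest first (like ls -t).
newest=$(lftp -u xyz,xyz -e 'cd Rpts; cls -1 -t; quit' ftp.abc.com | head -n 1)
# Fetch just that file into the local destination directory (placeholder path).
lftp -u xyz,xyz -e "cd Rpts; lcd /your/destination/folder; get $newest; quit" ftp.abc.com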
Looking at the code, the line
mget ls -t -r | tail -n 1
doesn't do what you think. It actually grabs all of the output of ls -t and then tail processes the output of mget. You could replace this line with
mget $(ls -t -r | tail -n 1)
but I am not sure if ftp will support such a call...
Try using an FTP client other than ftp. For example, curlftpfs, available at curlftpfs.sourceforge.net, is a good candidate: it allows you to mount an FTP server to a directory as if it were a local folder and then run different commands on the files there (including find, grep, etc.). Take a look at this article.
This way, since the output comes from a local command, you'd be more certain that ls -t returns a properly sorted list.
Btw, it's a bit less convoluted to use ls -t | head -1 than ls -t -r | tail -1. They produce the same result but why reverse and grab from the tail when you can just grab the head :)
If you use curlftpfs then your script would be something like this (assuming server ftp.abc.com and user xyz with password xyz).
mkdir /tmp/ftpsession
curlftpfs ftp://xyz:xyz@ftp.abc.com /tmp/ftpsession
cd /tmp/ftpsession/Rpts
cp -Rpf $(ls -t | head -1) /your/destination/folder/or/file
cd -
umount /tmp/ftpsession
My solution is this:
curl 'ftp://server.de/dir/'$(curl 'ftp://server.de/dir/' 2>/dev/null | tail -1 | awk '{print $(NF)}')

Question on grep

Out of many results returned by grepping a particular pattern, if I want to use all the results one after the other in my script, how can I go about it? For example, I grep for .der in a certificate folder, which returns many results. I want to use each and every .der certificate listed by the grep command. How can I use the files from the grep result one after the other?
Are you actually grepping content, or just filenames? If it's file names, you'd be better off using the find command:
find /path/to/folder -name "*.der" -exec some other commands {} ";"
It should be quicker in general.
One way is to use grep -l. This ensures you only get every file once. -l is used to print the name of each file only, not the matches.
Then, you can loop on the results:
for file in `grep ....`
do
# work on $file
done
Also note that if you have spaces in your filenames, there are a ton of possible issues. See Looping through files with spaces in the names on the Unix & Linux Stack Exchange.
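A whitespace-safe variant of that loop could look like this (a sketch assuming GNU grep and bash; the path and pattern are placeholders):
# -l prints only the names of matching files, -Z (--null) terminates each
# name with a NUL byte, and read -d '' consumes NUL-delimited input.
grep -lZ 'pattern' /path/to/certs/*.der | while IFS= read -r -d '' file; do
    printf 'processing %s\n' "$file"
done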
You can use the output as part of a for loop, something like:
for cert in $(grep '\.der' *) ; do
echo ${cert} # or something else
done
Of course, if those der things are actually files (and you're using ls | grep to get them), you can directly use the files:
for cert in *.der ; do
echo ${cert} # or something else
done
In both cases, you may need to watch out for arguments with embedded spaces.

How to execute a command with one parameter at a time in the *nix shell?

Some commands, like svn log for example, will only take one input from the command line, so I can't say grep 'pattern' | svn log. It will only return the information for the first file, so I need to execute svn log against each one independently.
I can do this with find using its -exec option: find -name '*.jsp' -exec svn log {} \;. However, grep and find provide different functionality, and the -exec option isn't available for grep or a lot of other tools.
So is there a generalized way to take output from a unix command line tool and have it execute an arbitrary command against each individual output, independently of the others, like find does?
The answer is xargs -n 1.
echo moo cow boo | xargs -n 1 echo
outputs
moo
cow
boo
try xargs:
grep 'pattern' | xargs svn log
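Note that plain xargs will pass many file names to a single svn log call; to get one invocation per file, as the question asks, combine it with -n 1 (the pattern and the *.jsp glob here are just illustrative):
# -l prints only matching file names; -n 1 runs svn log once per file
grep -l 'pattern' *.jsp | xargs -n 1 svn log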
A little one-off shell script (though using xargs is much better for a one-off; that's why it exists):
#!/bin/sh
# "$@" holds all the arguments passed to the script ($0 is not included)
for file in "$@"
do
    svn log "$file"
done
You could name it 'multilog' or something like that. Call it like this:
./multilog.sh foo.c abc.php bar.h Makefile
It allows for a little more sanity when being called by automated build scripts, e.g. testing the existence of each file before talking to SVN, redirecting each output to a separate file, inserting it into a sqlite database, etc. (a sketch of that follows below).
That may or may not be what you are looking for.
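For instance, a variant of the script above with those checks might look like this (untested; the per-file log directory is just an illustration):
#!/bin/sh
# illustrative output directory for the per-file logs
logdir=./svn-logs
mkdir -p "$logdir"
for file in "$@"
do
    # skip arguments that don't exist before talking to SVN
    [ -e "$file" ] || { echo "skipping missing file: $file" >&2; continue; }
    # send each file's log to its own output file
    svn log "$file" > "$logdir/$(basename "$file").log"
done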

How to use "xargs" properly when argument list is too long

Can someone please give me an example of using xargs on the below operation?
tar c $dir/temp/*.parse\
| lzma -9 > $dir/backup/$(date '+%Y-%m-%d')-archive.tar.lzma
I get the error from bash "/bin/tar: Argument list too long"
Particularly, I am trying to do LZMA compression on about 4,500 files, so this isn't surprising. I just don't know how to modify the above to use xargs and get rid of the error! Thanks.
Expanding on CristopheD's answer and assuming you're using bash:
tar c --files-from <(find $dir/temp -maxdepth 1 -name "*.parse") | lzma -9 > $dir/backup/$(date '+%Y-%m-%d')-archive.tar.lzma
The reason xargs doesn't help you here is that it will do multiple invocations until all arguments have been used. That won't help, since it would create several tar archives, which you don't want.
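You can see that multiple-invocation behaviour directly; in this illustrative run xargs has to split the arguments across several echo calls, which is exactly why the tar-over-xargs version would produce several archives:
# each output line corresponds to one echo invocation made by xargs
seq 1 100000 | xargs echo | wc -l    # prints a number greater than 1 on typical systems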
As a side note:
Always, always avoid xargs(1). It's a broken tool and is only remotely useful if you use it with the -0 option. Even then, it's almost always better to use find(1)'s -exec option, or a simple for or while loop.
Why is xargs so bad? Firstly, it splits input on whitespace, meaning all your filenames that contain whitespace will cause havoc. Secondly, it tries to be a smartass, and parse quotes for you. Which only leads to more headaches as you use it on filenames that contain quotes as part of the filename, such as songs: "I don't wanna miss a thing.mp3". This will just make xargs puke and curse at you that you quoted its input badly. No, you didn't: it should just learn that quotes do not belong in input data, they belong in shell scripts, and xargs has no business trying to parse them.
Mind you, xargs(1) doesn't do the whitespace splitting or the quote parsing when you pass -0 to it. It does the right thing, which is use NULL bytes to delimit filenames. But that means you need to give it input that uses NULL byte-delimited filenames (such as "find -foo -print0"). Which brings us back to: it's better to just use find's -exec: "find -foo -exec bar {} +".
Use find to pipe the 'wanted' filenames to a temporary file,
and then use tar with the '--files-from' command line option.
Edit:
or pipe them directly into each other to avoid the temporary file.
So: use find to list the wanted filenames | tar ... --files-from
Perhaps you want something like this?
find $dir/temp/ -name '*.parse' -print0 | tar --null -T - -c | lzma -9 > $dir/backup/$(date '+%Y-%m-%d')-archive.tar.lzma
Since you're on Linux, you have GNU tar, and you can use the '-T -' or '--files-from' option to read the file names.
However, you can also use:
--use-compress-program="lzma -9"
to specify the compression program to use, and simply give the compressed file name as the target file for the 'tar' command.
This was necessary for 'bzip2' compression before the '-j' option was added.
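Putting that together, a sketch of the single-command variant might look like this (untested; it assumes a GNU tar new enough to accept arguments in --use-compress-program, and reuses the paths from the question):
find $dir/temp -maxdepth 1 -name '*.parse' -print0 \
  | tar --null -T - -c --use-compress-program="lzma -9" \
        -f $dir/backup/$(date '+%Y-%m-%d')-archive.tar.lzma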
