Rsync directories into a flatter structure - linux

I'm looking for a way to flatten directories from a /year/month/day/directory format to just /directory via rsync.
The source directories, containing data files, are formatted like this:
/year/month/day/round-number/files
Source:
/2018/06/01/round-1111/(files)
/2018/06/01/round-1112/(etc)
/2018/06/01/round-1113
/2018/06/02/round-1114
/2018/06/02/round-1115
/2018/06/02/round-1116
/2018/06/03/round-1117
/2018/06/03/round-1118
/2018/06/03/round-1119
I need them to come out like this at the Destination:
/round-1111/(files)
/round-1112/(etc)
/round-1113
/round-1114
/round-1115
/round-1116
/round-1117
/round-1118
/round-1119
The command I'm using right now is basically "rsync -a source destination"
I'd like to keep the processing load of this command low, as it needs to run frequently while also not disturbing the source server too much.
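One possible approach (a sketch only; /data/source and /data/destination are placeholder paths, and it assumes GNU find plus the /year/month/day/round-* layout shown above) is to build the list of round directories once and hand it to a single rsync run with --no-relative, so the year/month/day prefix is dropped at the destination. Verify with a dry run (-n) before relying on it:
# List the round-* directories relative to the source root (depth 4 = year/month/day/round-*).
cd /data/source &&
find . -mindepth 4 -maxdepth 4 -type d -name 'round-*' -printf '%P\n' > /tmp/rounds.txt
# One rsync invocation for all of them; --files-from does not imply recursion,
# so -r is given explicitly, and --no-relative keeps only the final path component.
rsync -a -r --no-relative --files-from=/tmp/rounds.txt /data/source/ /data/destination/
A single rsync invocation per run also keeps the overhead low, which matters if the command runs frequently.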

Related

How to solve 'ascp: "user@host:" in all sources must match' when downloading SRA data with Linux?

I'm running this command on Linux:
ascp -v -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh -k 1 -T -l200m anonftp@ftp-private.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByRun/sra/SRR/SRR590/SRR5907429 /SRR5907429 .sra ~/sra_download
and I get this error:
"user@host:" in all sources must match
What does this mean? How can I solve it?
First,"-private"should be removed.Secondly,need to correct the space error in the sentence,example "SRR5907429 ".'ascp -v -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh -k 1 -T -l200m anonftp#ftp.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByRun/sra/SRR/SRR590/SRR5907429/SRR5907429.sra ~/sra_download'is the correct answer we need.enter image description here
Your problem:
the ascp syntax is:
Usage: ascp [OPTION] SRC... DEST
SRC to DEST, or multiple SRC to DEST dir
SRC, DEST format: [[user@]host:]PATH
Display full usage: -h,--help
You get this by simply executing ascp; "ascp -h" shows more, and there is a manual as well: https://download.asperasoft.com/download/docs/entsrv/3.9.1/es_admin_linux/webhelp/index.html#dita/ascp_2.html
It is pretty much like "scp", but it also works in "pull" mode.
So you have:
options, then one or multiple sources, then a single destination (always the last argument).
If the destination is user@server:folder, then you do a push.
If a source is user@server:folder, then you do a pull.
Globally, you can only do a push or a pull in a single invocation, but there can be multiple sources and always a single destination (on the command line).
In your case you have:
options: -v -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh -k 1 -T -l200m
sources: anonftp@ftp-private.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByRun/sra/SRR/SRR590/SRR5907429 /SRR5907429 .sra
destination: ~/sra_download
The first source is: anonftp@ftp-private.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByRun/sra/SRR/SRR590/SRR5907429
The other sources are: /SRR5907429 and .sra
So you specify one remote source, two local sources, and one local destination. That mix is what the error is complaining about.
My advice:
Do not use the legacy syntax, as you did; instead, use the advanced syntax:
ascp [options] --mode=<send|recv> --user=<user> --host=<server> sources... destination
There are plenty of options; for instance, if all your source files are in the same folder, you can use --source-prefix=
You can also use a file list file (i.e. a file that contains the list of files you want to transfer, in case it is long or generated by a script) or even a file pair list file.
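For example, a possible rewrite of the download from the question using this syntax (a sketch only: it assumes the corrected host name without "-private" and reuses the key, rate and path from the question):
ascp -v -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh -k 1 -T -l200m --mode=recv --user=anonftp --host=ftp.ncbi.nlm.nih.gov /sra/sra-instant/reads/ByRun/sra/SRR/SRR590/SRR5907429/SRR5907429.sra ~/sra_download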
Note also that there is an interesting front end for Aspera command-line transfers:
https://www.rubydoc.info/gems/asperalm

rsync is nesting the source directory in the destination as if it had no trailing slash when the --files-from option is used

Pulling my hair out here trying to get this to work. Here's an example of the details and command.
I have a file named list.txt with a list of directories. The contents look like this:
HYTTCCCXX
HYTVNCCXX
HYV5TCCXX
My rsync command looks like:
rsync -av --recursive --files-from='/tmp/list.txt' /test/apple/ /destination/files/
The issue is that when I run the command, it includes both /test/ (which is an autofs top level, so it really contains nothing) and /test/apple/ in the files to be transferred, causing the files to be written twice into the destination as if I had left the trailing slash off my source.
So the destination ends up with both the directories in the list and another copy of the source, like:
/destination/files/HYTTCCCXX
/destination/files/HYTVNCCXX
/destination/files/HYV5TCCXX
/destination/files/test/apple/HYTTCCCXX
/destination/files/test/apple/HYTVNCCXX
/destination/files/test/apple/HYV5TCCXX
So I end up with 2 copies of everything.
I've tried every combination of exclude, like --exclude='/test/apple/' or --exclude='/test/*' or --exclude='apple/*', to try and keep it from being included, but nothing works.
Any ideas? I'm going bananas trying to figure this out.
Thank you!
This is due to the fact that the --files-from option implies --relative.
Quote from the rsync man page, the section on --files-from:
The --relative (-R) option is implied, which preserves the path information that is specified for each item in the file (use --no-relative or --no-R if you want to turn that off).
Try the following options and see if it helps:
rsync -av --recursive --no-relative --files-from='/tmp/list.txt' /test/apple/ /destination/files/
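Adding -n (dry run) to the suggested command lists what would be transferred without copying anything, which makes it easy to confirm that only the three directory names from list.txt show up at the top level of the destination:
rsync -avn --recursive --no-relative --files-from='/tmp/list.txt' /test/apple/ /destination/files/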

How to keep directory structure with aria2?

I need to download files simultaneously; wget doesn't support that, so I want to try aria2. But I don't see an option in aria2 to keep the directory structure.
Determine the directory structure first,
then build and use a download description file:
aria2c -i uri.txt
where uri.txt might contain
http://serverA/file1.iso http://mirror-serverB/file1.iso
# option lines must begin with a space, otherwise the line is treated as a URL!
  dir=/downloads/a
# not mandatory
  out=file1.iso
http://serverA/file2.iso http://mirror-serverB/file2.iso
  dir=/downloads/b
  out=file2.iso
Keep in mind that aria2 is a download utility, not a sync utility like rsync or lftp.
Referencing an rsync answer: https://stackoverflow.com/a/4147263/1163786
and an lftp answer: https://superuser.com/a/305236.
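If the goal is to reproduce the remote directory layout locally, the input file can also be generated from a plain list of URLs; here is a bash sketch (urls.txt, one URL per line, and the /downloads root are assumptions for illustration):
# Derive dir= from each URL's path so the remote structure is kept locally.
while read -r url; do
  printf '%s\n' "$url"
  path=${url#*://*/}                                  # drop scheme and host, e.g. a/b/file1.iso
  printf ' dir=/downloads/%s\n' "$(dirname "$path")"  # option lines must start with a space
done < urls.txt > uri.txt
aria2c -i uri.txt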

Linux rename files based on input file

I need to rename hundreds of files in Linux to change the unique identifier of each from the command line. For the sake of example, I have a file containing:
old_name1 new_name1
old_name2 new_name2
and need to change the file names from the old IDs to the new ones. The file names contain the IDs, but have extra characters as well. My plan is therefore to end up with:
abcd_old_name1_1234.txt ==> abcd_new_name1_1234.txt
abcd_old_name2_1234.txt ==> abcd_new_name2_1234.txt
Use of rename is obviously fairly helpful here, but I am struggling to work out how to iterate through the file of desired name changes and pass this as input to rename.
Edit: To clarify, I am looking to make hundreds of different rename commands, the different changes that need to be made are listed in a text file.
Apologies if this is already answered; I've had a good hunt, but can't find a similar case.
rename 's/^(abcd_)old_name(\d+_1234\.txt)$/$1new_name$2/' *.txt
Should work, depending on whether you have that package installed. Also have a look at qmv (from renameutils).
If you want more options, use e.g.
shopt -s globstar
rename 's/^(abcd_)old_name(\d+_1234\.txt)$/$1new_name$2/' folder/**/*.txt
(finds all txt files in subdirectories of folder), or, to do the same using GNU find:
find folder -type f -iname '*.txt' -exec rename 's/^(abcd_)old_name(\d+_1234\.txt)$/$1new_name$2/' {} +
while read -r old_name new_name; do
rename "s/$old_name/$new_name/" *$old_name*.txt
done < file_with_names
In this way, you read the IDs from file_with_names and rename the files replacing $old_name with $new_name leaving the rest of the filename untouched.
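If the Perl rename utility is not available, the same idea works in plain bash with mv and parameter expansion; a sketch, assuming the same two-column mapping file and the abcd_..._1234.txt naming from the question (and that the IDs contain no glob characters):
while read -r old new; do
  for f in *"$old"*.txt; do
    [ -e "$f" ] || continue        # skip when no file matches the glob
    mv -- "$f" "${f/$old/$new}"    # replace the old ID with the new one
  done
done < file_with_names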
I was about to write a PHP function to do this for myself, but I came upon a faster method:
Run ls and copy & paste the directory contents from the terminal window into Excel. You may need an online line-break removal or addition tool. Assume that your file names are in column A. In Excel, use the following formula in another column:
="mv "&A1&" prefix"&A1&"suffix"
or
="mv "&A1&" "&substitute(A1,"jpeg","jpg")&"suffix"
or
="mv olddirectory/"&A1&" newdirectory/"&A1
Back in Linux, create a new file with
nano rename.txt and paste in the values from Excel. They should look something like this:
mv oldname1.jpg newname1.jpg
mv oldname2.jpg newname2.jpg
Then close out of nano and run the following command:
bash rename.txt. Bash just runs every line in the file as if you had typed it.
And you are done! This method gives verbose output on errors, which is handy.
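The same mv script can also be generated without leaving the shell; a sketch with awk that mirrors the first formula above (names.txt is a hypothetical file holding the column-A filenames, one per line):
awk '{ print "mv " $1 " prefix" $1 "suffix" }' names.txt > rename.txt
bash rename.txt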

Using Rsync include and exclude options to include directory and file by pattern

I'm having problems getting my rsync syntax right and I'm wondering if my scenario can actually be handled with rsync. First, I've confirmed that rsync is working just fine between my local host and my remote host. Doing a straight sync on a directory is successful.
Here's what my filesystem looks like:
uploads/
1260000000/
file_11_00.jpg
file_11_01.jpg
file_12_00.jpg
1270000000/
file_11_00.jpg
file_11_01.jpg
file_12_00.jpg
1280000000/
file_11_00.jpg
file_11_01.jpg
file_12_00.jpg
What I want to do is run rsync only on files that begin with "file_11_" in the subdirectories and I want to be able to run just one rsync job to sync all of these files in the subdirectories.
Here's the command that I'm trying:
rsync -nrv --include="**/file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/
This results in 0 files being marked for transfer in my dry run. I've tried various other combinations of --include and --exclude statements, but either continued to get no results or got everything as if no include or exclude options were set.
Anyone have any idea how to do this?
The problem is that --exclude="*" says to exclude (for example) the 1260000000/ directory, so rsync never examines the contents of that directory, so never notices that the directory contains files that would have been matched by your --include.
I think the closest thing to what you want is this:
rsync -nrv --include="*/" --include="file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/
(which will include all directories, and all files matching file_11*.jpg, but no other files), or maybe this:
rsync -nrv --include="/[0-9][0-9][0-9]0000000/" --include="file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/
(same concept, but much pickier about the directories it will include).
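If the destination should not end up with empty copies of directories that contain no matching files, rsync's -m (--prune-empty-dirs) option can be added to either command, for example (same dry-run flags as above):
rsync -nrvm --include="*/" --include="file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/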
rsync include exclude pattern examples:
"*" means everything
"dir1" transfers empty directory [dir1]
"dir*" transfers empty directories like: "dir1", "dir2", "dir3", etc...
"file*" transfers files whose names start with [file]
"dir**" transfers every path that starts with [dir] like "dir1/file.txt", "dir2/bar/ffaa.html", etc...
"dir***" same as above
"dir1/*" does nothing
"dir1/**" does nothing
"dir1/***" transfers [dir1] directory and all its contents like "dir1/file.txt", "dir1/fooo.sh", "dir1/fold/baar.py", etc...
A final note: don't rely on asterisks at the beginning of a pattern for evaluating paths, like "**dir" (it's OK to use them for single folders or files, but not for paths), and note that more than two asterisks don't work for file names.
Here's my "teach a person to fish" answer:
Rsync's syntax is definitely non-intuitive, but it is worth understanding.
First, use -vvv to see the debug info for rsync.
$ rsync -nr -vvv --include="**/file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/
[sender] hiding directory 1280000000 because of pattern *
[sender] hiding directory 1260000000 because of pattern *
[sender] hiding directory 1270000000 because of pattern *
The key concept here is that rsync applies the include/exclude patterns in each directory recursively. As soon as the first matching include/exclude pattern is found, processing stops for that file or directory.
The first directory it evaluates is /Storage/uploads. Storage/uploads has 1280000000/, 1260000000/, 1270000000/ dirs/files. None of them match file_11*.jpg to include. All of them match * to exclude. So they are excluded, and rsync ends.
The solution is to include all dirs (*/) first. Then the first dir component will be 1260000000/, 1270000000/, 1280000000/ since they match */. The next dir component will be 1260000000/. In 1260000000/, file_11_00.jpg matches --include="file_11*.jpg", so it is included. And so forth.
$ rsync -nrv --include='*/' --include="file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/
./
1260000000/
1260000000/file_11_00.jpg
1260000000/file_11_01.jpg
1270000000/
1270000000/file_11_00.jpg
1270000000/file_11_01.jpg
1280000000/
1280000000/file_11_00.jpg
1280000000/file_11_01.jpg
https://download.samba.org/pub/rsync/rsync.1
