Using RSync to copy a sequential range of files - linux

Sorry if this makes no sense, but I will try to give all the information needed!
I would like to use rsync to copy a range of sequentially numbered files from one folder to another.
I am archiving a DCDM (it's a film thing) that contains on the order of 600,000 individually numbered, sequential .tif image files (~10 MB each).
I need to break this up to archive it properly onto LTO6 tapes, and I would like to use rsync to prep the folders so that a simple bash .sh script can automate backing up the various folders and files to tape.
The command I normally use when running rsync is:
sudo rsync -rvhW --progress --size-only <src> <dest>
I use sudo if needed, and I always test the outcome first with --dry-run
The only way I’ve got anything to work (without kicking out errors) is by using the * wildcard. However, this only matches files with a set pattern (e.g. 01* will only move files in the range 010000 - 019999), and I would have to repeat for 02, 03, 04, etc.
I've looked on the internet, and am struggling to find an answer that works.
This might not be possible, and with 600,000 .tif files, I can't write an exclude for each one!
Any thoughts as to how (if at all) this could be done?
Owen.

You can check for the file name starting with a digit by using pattern matching:
for file in [0-9]*; do
# do something to $file name that starts with digit
done
Or, you could enable the extglob option and loop over all file names that contain only digits. This could eliminate any potential unwanted files that start with a digit but contain non-digits after the first character.
shopt -s extglob
for file in +([0-9]); do
# do something to $file name that contains only digits
done
+([0-9]) matches one or more occurrences of a digit
Update:
Based on the file name pattern in your recent comment:
shopt -s extglob
for file in legendary_dcdm_3d+([0-9]).tif; do
# do something to $file
done
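A quick way to preview what such an extglob pattern matches before handing it to rsync; the directory and file names below are made-up stand-ins for the real DCDM names:

```shell
rm -rf /tmp/glob_demo && mkdir -p /tmp/glob_demo && cd /tmp/glob_demo
touch legendary_dcdm_3d000001.tif legendary_dcdm_3d000002.tif notes.txt

shopt -s extglob
# The shell expands the pattern itself, so the same argument works
# verbatim with rsync, cp, tar, and friends.
matched=$(printf '%s\n' legendary_dcdm_3d+([0-9]).tif)
echo "$matched"
```

Here notes.txt is left out because it does not match the pattern, which is exactly the filtering you want before a copy.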

Globbing is the shell feature that expands a wildcard into a list of matching file names. You have already used it in your question.
For the following explanations, I will assume we are in a directory with the following files:
$ ls -l
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 file.txt
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 funny_cat.jpg
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2013-1.pdf
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2013-2.pdf
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2013-3.pdf
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2013-4.pdf
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2014-1.pdf
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2014-2.pdf
The simplest case is to match all files. The following makes for a poor man's ls.
$ echo *
file.txt funny_cat.jpg report_2013-1.pdf report_2013-2.pdf report_2013-3.pdf report_2013-4.pdf report_2014-1.pdf report_2014-2.pdf
If we want to match all reports from 2013, we can narrow the match:
$ echo report_2013-*.pdf
report_2013-1.pdf report_2013-2.pdf report_2013-3.pdf report_2013-4.pdf
We could, for example, have left out the .pdf part but I like to be as specific as possible.
You have already come up with a solution to use this for selecting a range of numbered files. For example, we can match reports by quarter:
$ for q in 1 2 3 4; do echo "$q. quarter: " report_*-$q.pdf; done
1. quarter: report_2013-1.pdf report_2014-1.pdf
2. quarter: report_2013-2.pdf report_2014-2.pdf
3. quarter: report_2013-3.pdf
4. quarter: report_2013-4.pdf
If we are too lazy to type 1 2 3 4, we could have used $(seq 4) instead. This invokes the program seq with the argument 4 and substitutes its output (1 2 3 4 in this case).
Now back to your problem: If you want chunk sizes that are a power of 10, you should be able to extend the above example to fit your needs.
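For example, here is a sketch of chunking numbered files into one folder per leading two-digit prefix (10,000 files per chunk for 6-digit names). The paths and sample file names are illustrative, and cp stands in for whatever rsync invocation you actually use:

```shell
src=/tmp/chunk_src
dst=/tmp/chunk_dst
rm -rf "$src" "$dst" && mkdir -p "$src"
for n in 010000 010001 020000; do touch "$src/$n.tif"; done   # sample data

for prefix in $(seq -w 1 99); do          # -w pads: 01 02 ... 99
    files=("$src/$prefix"*.tif)
    [ -e "${files[0]}" ] || continue      # no files for this prefix
    mkdir -p "$dst/chunk_$prefix"
    cp "${files[@]}" "$dst/chunk_$prefix/"
done
```

Each chunk_NN folder can then be written to tape as a unit; adjust the prefix length if you want bigger or smaller chunks.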

Old question, I know, but someone may find this useful. The above examples for expanding a range also work with rsync. For example, to copy files starting with a, b and c, but not d and e, from dir /tmp/from_here to dir /tmp/to_here:
$ rsync -avv /tmp/from_here/[a-c]* /tmp/to_here
sending incremental file list
delta-transmission disabled for local transfer or --whole-file
alice/
bob/
cedric/
total: matches=0 hash_hits=0 false_alarms=0 data=0
sent 89 bytes received 24 bytes 226.00 bytes/sec
total size is 0 speedup is 0.00

If you are writing to LTO6 tapes, you should consider adding --inplace to your command. --inplace is intended for writing to linear media such as LTO tape.

Related

Rename first part of the filename in Linux?

I have a lot of files starting with processConfig-. I want to rename them to start with processCfg-. What's the easy way to change the first part of the file name to processCfg- in Linux?
But I don't want to rename the file processConfig.json, since it doesn't match my prefix.
> ls -lrth
total 467
-rw-r--r-- 1 david staff 9.8K May 26 15:14 processConfig-data-1234.json
-rw-r--r-- 1 david staff 11K May 26 15:14 processConfig-data-8762.json
-rw-r--r-- 1 david staff 4.9K May 26 15:14 processConfig-dataHold-1.json
-rw-r--r-- 1 david staff 6.6K May 26 15:14 processConfig-letter.json
-rw-r--r-- 1 david staff 5.6K May 26 16:44 processConfig-data-90987.json
-rw-r--r-- 1 david staff 284K May 28 18:44 processConfig.json
Like this :
rename -n 's/^processConfig-/processCfg-/' processConfig-*.json
Remove the -n switch when the output looks good, to rename for real.
man rename
There are other tools with the same name which may or may not be able to do this, so be careful.
The rename command that is part of the util-linux package won't.
If you run the following command (GNU)
$ file "$(readlink -f "$(type -p rename)")"
and the result contains Perl script, ASCII text executable (and does not mention ELF), then this seems to be the right tool =)
If not, to make it the default (usually already the case) on Debian and derivatives like Ubuntu:
$ sudo apt install rename
$ sudo update-alternatives --set rename /usr/bin/file-rename
If you don't have this command on another distro, search your package manager for it or install it manually (no deps...)
This tool was originally written by Larry Wall, the father of Perl.
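If the Perl rename isn't available at all, a plain bash loop with parameter expansion does the same prefix swap with no dependencies; shown here in a scratch directory with two of the file names from the question:

```shell
rm -rf /tmp/rename_demo && mkdir -p /tmp/rename_demo && cd /tmp/rename_demo
touch processConfig-data-1234.json processConfig.json

for f in processConfig-*.json; do
    # ${f/#processConfig-/processCfg-} rewrites the leading prefix only;
    # processConfig.json has no trailing dash, so the glob skips it.
    mv -- "$f" "${f/#processConfig-/processCfg-}"
done
ls
```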

cp command to handle empty directories (resulting in diff file sizes)

I am trying to copy directories (& files) recursively from one directory to another.
I tried the following -
rsync -avz <source> <target>
cp -ruT <source> <target>
Both were successful. But when I try to compare the sizes using du -c, the empty directories show a mismatch in size.
In target directory
drwxrwxr-x 2 abc devl 4096 Jun 9 01:25 .
drwxrwxr-x 4 abc devl 4096 Jul 20 07:46 ..
In source directory
drwxrwxr-x 2 prod ops 2 Jun 9 01:25 .
drwxrwxr-x 4 prod ops 36 Jul 20 07:46 ..
Is there a special way to handle this? diff -qr doesn't show any differences though.
Thanks for your help.
Are both folders on the same volume? If not, chances are that the sector sizes for those volumes are different, and in turn the inode sizes differ. diff just looks at whether or not the directory exists and whether it contains the corresponding files. It's similar to how diff doesn't report permission differences, because those might be pretty system-specific.
A pretty comprehensive answer can be found here: Why size reporting for directories is different than other files?
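Since du includes each directory's own entry blocks, whose size varies by filesystem, a fairer check is to compare only the regular files in both trees. A sketch with made-up paths:

```shell
rm -rf /tmp/tree_a /tmp/tree_b
mkdir -p /tmp/tree_a/empty /tmp/tree_b/empty
echo hello > /tmp/tree_a/f.txt
echo hello > /tmp/tree_b/f.txt

# total bytes held in regular files, ignoring directory overhead
bytes_a=$(find /tmp/tree_a -type f -exec cat {} + | wc -c)
bytes_b=$(find /tmp/tree_b -type f -exec cat {} + | wc -c)
echo "$bytes_a $bytes_b"
```

The two totals agree even though du on the two trees may not, because the empty directories contribute nothing here.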

Why doesn't grep work if a file is not specified?

I have a problem with the Linux grep command: it doesn't work!
I am trying the following test on my Ubuntu system:
I created the following folder: /home/andrea/Scrivania/prova
Inside this folder I created a txt file named prova.txt; in this file I wrote the string test and saved it.
In the shell I first changed into the folder /home/andrea/Scrivania/prova and then launched the grep command as follows:
~/Scrivania/prova$ grep test
The problem is that the cursor just blinks endlessly and finds nothing! Why? What is the problem?
You've not provided files for the grep command to scan
grep "test" *
or for recursive
grep -r "test" *
Because grep searches standard input if no files are given. Try this.
grep test *
You are not running the command you were looking for.
grep test * will look for test in all files in your current directory.
grep test prova.txt will look for test specifically in prova.txt
(grep test will grep the test string in stdin, and will not return until EOF.)
You need to give grep some input - you can't just call grep test without any other arguments, as it will sit waiting on stdin. Try grep test *
Another use for grep is to pipe a command's output into it.
e.g. This is my home directory:
drwx------+ 3 oliver staff 102 12 Nov 21:57 Desktop
drwx------+ 10 oliver staff 340 17 Nov 18:34 Documents
drwx------+ 17 oliver staff 578 20 Nov 18:57 Downloads
drwx------# 12 oliver staff 408 13 Nov 20:53 Dropbox
drwx------# 52 oliver staff 1768 11 Nov 12:05 Library
drwx------+ 3 oliver staff 102 12 Nov 21:57 Movies
drwx------+ 5 oliver staff 170 17 Nov 10:40 Music
drwx------+ 3 oliver staff 102 20 Nov 19:17 Pictures
drwxr-xr-x+ 4 oliver staff 136 12 Nov 21:57 Public
If I run
ls -l | grep Do
I get the result
drwx------+ 10 oliver staff 340 17 Nov 18:34 Documents
drwx------+ 17 oliver staff 578 20 Nov 18:57 Downloads
Remember to pipe output into the grep command.
From the grep man page:
grep searches the named input FILEs (or standard input if no files are named, or the file name - is given) for lines containing a match to the given PATTERN.
If you don't provide file name(s) for it to use, it will try to read from stdin.
Try grep test *
As per GNU Grep 3.0
A file named - stands for standard input. If no input is specified,
grep searches the working directory . if given a command-line
option specifying recursion; otherwise, grep searches standard input.
So for OP's command, without any additional specification, grep tries to search in standard input, which is not actually provided there.
A simple approach is grep -r [pattern]: as per the above, -r specifies recursion, searching the current directory and its sub-directories.
Also note that plain grep (without -r) does not search the directories that * matches; instead it prints a hint for each one:
grep: [directory_name]: Is a directory
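The three ways of giving grep something to search, side by side; the file names and the scratch directory here are made up for the demo:

```shell
rm -rf /tmp/grep_demo && mkdir -p /tmp/grep_demo/sub && cd /tmp/grep_demo
echo "test string" > prova.txt

from_file=$(grep "test" prova.txt)              # explicit file argument
from_stdin=$(echo "a test line" | grep "test")  # data arrives on stdin
from_tree=$(grep -r "test" /tmp/grep_demo)      # recurse into directories
echo "$from_file"
```

With no file arguments and nothing piped in, grep would instead block on the terminal waiting for stdin, which is exactly the endless blinking cursor from the question.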

Basic Unix refresher inquiry: ls -ld

I know this is really basic, but I cannot find this information
in the ls man page, and need a refresher:
$ ls -ld my.dir
drwxr-xr-x 1 smith users 4096 Oct 29 2011 my.dir
What is the meaning of the number 1 after drwxr-xr-x ?
Does it represent the number of hard links to the directory my.dir?
I cannot remember. Where can I find this information?
Thanks,
John Goche
I found it on Wikipedia:
duuugggooo (hard link count) owner group size modification_date name
The number is the hard link count.
If you want a more UNIXy solution, type info ls. This gives more detailed information including:
`-l'
`--format=long'
`--format=verbose'
In addition to the name of each file, print the file type, file
mode bits, number of hard links, owner name, group name, size, and
timestamp (*note Formatting file timestamps::), normally the
modification time. Print question marks for information that
cannot be determined.
That is the number of names (hard links) of the file. And I suppose there is an error here: it must be at least 2 for a directory.
$ touch file
$ ls -l
total 0
-rw-r--r-- 1 igor igor 0 Jul 15 10:24 file
$ ln file file-link
$ ls -l
total 0
-rw-r--r-- 2 igor igor 0 Jul 15 10:24 file
-rw-r--r-- 2 igor igor 0 Jul 15 10:24 file-link
$ mkdir a
$ ls -l
total 0
drwxr-xr-x 2 igor igor 40 Jul 15 10:24 a
-rw-r--r-- 2 igor igor 0 Jul 15 10:24 file
-rw-r--r-- 2 igor igor 0 Jul 15 10:24 file-link
As you can see, as soon as you make a directory, you get 2 in that column.
When you make subdirectories in a directory, the number increases:
$ mkdir a/b
$ ls -ld a
drwxr-xr-x 3 igor igor 60 Jul 15 10:41 a
As you can see, the directory now has three names ('a', '.' inside it, and '..' in its subdirectory):
$ ls -id a ; cd a; ls -id .; ls -id b/..
39754633 a
39754633 .
39754633 b/..
All these three names point to the same directory (inode 39754633).
Trying to explain why a directory's initial link count is 2. Please see if this helps.
Any file or directory is identified by an inode.
Number of hard links = number of references to that inode.
When a file or directory is created, one directory entry (of the form {myname, myinodenumber}) is created in the parent directory. This makes the reference count of the inode for that file/directory 1.
When a directory is created, space for the directory itself is also allocated, and by default it holds two directory entries: {., myinodenumber} and {.., myparent'sinodenumber}. The current directory is referred to by "." and the parent by "..".
So when we create a directory, the initial link count is 1 + 1 = 2, since there are two references to myinodenumber. And the parent's link count is increased by 1.
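The arithmetic above can be checked directly: a fresh directory starts with link count 2 (its name in the parent plus its own "."), and each new subdirectory adds one via its ".." entry. Note that stat -c is the GNU coreutils form; on BSD/macOS the equivalent is stat -f %l.

```shell
rm -rf /tmp/links_demo && mkdir -p /tmp/links_demo/d
count_before=$(stat -c %h /tmp/links_demo/d)   # expect 2
mkdir /tmp/links_demo/d/sub
count_after=$(stat -c %h /tmp/links_demo/d)    # expect 3
echo "$count_before -> $count_after"
```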

Backup files on webserver ! and ~

My LAMP web server leaves backup files like these:
!index.php
!~index.php
bak.index.php
Copy%20of%20index.php
I tried deleting them with rm, but it cannot find the files.
Does this have something to do with bash or vim? How can this be fixed?
Escape the characters (with a backslash) like so:
[ 09:55 jon@hozbox.com ~/t ]$ ll
total 0
-rw-r--r-- 1 jon people 0 Nov 27 09:55 !abc.html
-rw-r--r-- 1 jon people 0 Nov 27 09:55 ~qwerty.php
[ 09:55 jon@hozbox.com ~/t ]$ rm -v \!abc.html \~qwerty.php
removed '!abc.html'
removed '~qwerty.php'
[ 09:56 jon@hozbox.com ~/t ]$ ll
total 0
[ 09:56 jon@hozbox.com ~/t ]$
Another way to do that, other than the one suggested by chown, is to write the filenames within double quotes.
Example:
rm "!abc.html" "~qwerty.php"
If you don't like the special treatment of the character !, use set +H in your shell to turn off history expansion. See the section 'HISTORY EXPANSION' in man bash for more information.
Interestingly, I can delete files starting with ~ without having to escape the file names. (Tilde expansion only applies when the text after ~ up to the first slash is a valid user name, so a name like ~qwerty.php is normally left literal.)
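Both styles from the answers above, rerun in a scratch directory with made-up file names (in a non-interactive script history expansion is off anyway, so the backslash is harmless rather than required):

```shell
rm -rf /tmp/special_demo && mkdir /tmp/special_demo && cd /tmp/special_demo
touch -- '!abc.html' '~qwerty.php'

rm -v \!abc.html        # backslash-escape the leading character
rm -v "~qwerty.php"     # or quote the whole name
```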