cp command to handle empty directories (resulting in diff file sizes) - linux

I am trying to copy directories (& files) recursively from one directory to another.
I tried the following -
rsync -avz <source> <target>
cp -ruT <source> <target>
Both were successful, but when I compare the sizes using du -c, the empty directories appear to differ in size.
In target directory
drwxrwxr-x 2 abc devl 4096 Jun 9 01:25 .
drwxrwxr-x 4 abc devl 4096 Jul 20 07:46 ..
In source directory
drwxrwxr-x 2 prod ops 2 Jun 9 01:25 .
drwxrwxr-x 4 prod ops 36 Jul 20 07:46 ..
Is there a special way to handle this? diff -qr doesn't show any differences though.
Thanks for your help.

Are both folders on the same volume? If not, chances are that the two filesystems report directory sizes differently: the size shown for a directory is the space used by its own entry list, which depends on the filesystem and its block size, not on the files inside it. diff only checks whether the directory exists and whether it contains the corresponding files. Similarly, diff does not report permission differences, because those can be quite system-specific.
A pretty comprehensive answer can be found here: Why size reporting for directories is different than other files?
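One way to compare the two trees without being misled by directory sizes is to add up the bytes of regular files only. The following is a minimal sketch under that assumption; the helper name file_bytes and the temporary directories are made up for illustration and stand in for the real source and target.

```shell
# Sum the bytes of regular files only, so that directory entries (whose
# reported size is filesystem-specific) do not affect the comparison.
file_bytes() {
    # tr strips the padding that some wc implementations add
    find "$1" -type f -exec cat {} + | wc -c | tr -d ' '
}

# Hypothetical stand-ins for <source> and <target>
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/empty_dir" "$src/data"
printf 'hello\n' > "$src/data/a.txt"
cp -R "$src/." "$dst/"

src_bytes=$(file_bytes "$src")
dst_bytes=$(file_bytes "$dst")
echo "source: $src_bytes bytes, target: $dst_bytes bytes"
```

If the two totals match and diff -qr reports nothing, the remaining du difference most likely comes from the directories themselves.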

Related

file zip/tar in linux at specific location

I want to zip a set of directories and files on my centos 8 VM.
There are 3 directories and 1 file which I want to zip in such a way that only env.conf file will move to /etc/env.txt after unzipping it and remaining directories will be unzipped at current location.
Is there any way to achieve this.
drwxr-xr-x. 9 root root 114 Feb 25 12:40 config
-rw-r--r--. 1 root root 340 Feb 25 09:01 env.conf
drwxr-xr-x. 9 root root 4096 Feb 28 05:11 platform
drwxr-xr-x. 2 root root 135 Feb 28 07:49 install
I don't think this is possible. In fact, it would be considered a vulnerability if you could do that.
Imagine you download a zip file from some website and unzip it in a temp folder. It could then register itself as a service by writing a file somewhere in /etc and gain control over your PC.
Example: zip-slip
You could however create a one-liner that extracts and moves the file wherever you want like this:
unzip <filename> && mv env.conf /etc/env.txt
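The same extract-then-move idea works with tar. Here is a rough sketch that uses temporary directories in place of the real locations, so it can be tried without root; the bundle.tar name and the directory layout are assumptions for illustration only.

```shell
# Hypothetical layout: "$work/etc" stands in for /etc
work=$(mktemp -d)
mkdir -p "$work/src/config" "$work/extract" "$work/etc"
printf 'KEY=value\n' > "$work/src/env.conf"

# Pack a directory and the config file into one archive
tar -C "$work/src" -cf "$work/bundle.tar" config env.conf

# Extract everything in one place, then move the single file elsewhere
tar -C "$work/extract" -xf "$work/bundle.tar" &&
    mv "$work/extract/env.conf" "$work/etc/env.txt"

ls "$work/extract" "$work/etc"
```

The archive itself stays ordinary; the special handling of env.conf lives in the command that runs after extraction.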

Using RSync to copy a sequential range of files

Sorry if this makes no sense, but I will try to give all the information needed!
I would like to use rsync to copy a range of sequentially numbered files from one folder to another.
I am archiving a DCDM (it's a film thing) that contains on the order of 600,000 individually numbered, sequential .tif image files (~10 MB each).
I need to break this up to properly archive onto LTO6 tapes, and I would like to use rsync to prep the folders so that my simple bash .sh file can automate the various folders and files that I want to back up to tape.
The command I normally use when running rsync is:
sudo rsync -rvhW --progress --size-only <src> <dest>
I use sudo if needed, and I always test the outcome first with --dry-run
The only way I've got anything to work (without kicking out errors) is by using the * wildcard. However, this only matches files with the given prefix (e.g. 01* will only move files in the range 010000 to 019999), and I would have to repeat it for 02, 03, 04, etc.
I've looked on the internet, and am struggling to find an answer that works.
This might not be possible, and with 600,000 .tif files, I can't write an exclude for each one!
Any thoughts as to how (if at all) this could be done?
Owen.
You can check for the file name starting with a digit by using pattern matching:
for file in [0-9]*; do
# do something to $file name that starts with digit
done
Or, you could enable the extglob option and loop over all file names that contain only digits. This could eliminate any potential unwanted files that start with a digit but contain non-digits after the first character.
shopt -s extglob
for file in +([0-9]); do
# do something to $file name that contains only digits
done
+([0-9]) matches one or more digits.
Update:
Based on the file name pattern in your recent comment:
shopt -s extglob
for file in legendary_dcdm_3d+([0-9]).tif; do
# do something to $file
done
Globbing is the shell feature that expands a wildcard to a list of matching file names. You have already used it in your question.
For the following explanations, I will assume we are in a directory with the following files:
$ ls -l
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 file.txt
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 funny_cat.jpg
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2013-1.pdf
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2013-2.pdf
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2013-3.pdf
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2013-4.pdf
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2014-1.pdf
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2014-2.pdf
The simplest case is to match all files. The following makes for a poor man's ls.
$ echo *
file.txt funny_cat.jpg report_2013-1.pdf report_2013-2.pdf report_2013-3.pdf report_2013-4.pdf report_2014-1.pdf report_2014-2.pdf
If we want to match all reports from 2013, we can narrow the match:
$ echo report_2013-*.pdf
report_2013-1.pdf report_2013-2.pdf report_2013-3.pdf report_2013-4.pdf
We could, for example, have left out the .pdf part but I like to be as specific as possible.
You have already come up with a solution that uses this to select a range of numbered files. For example, we can match reports by quarter:
$ for q in 1 2 3 4; do echo "$q. quarter: " report_*-$q.pdf; done
1. quarter: report_2013-1.pdf report_2014-1.pdf
2. quarter: report_2013-2.pdf report_2014-2.pdf
3. quarter: report_2013-3.pdf
4. quarter: report_2013-4.pdf
If we are too lazy to type 1 2 3 4, we could have used $(seq 4) instead. This invokes the program seq with the argument 4 and substitutes its output (1 2 3 4 in this case).
Now back to your problem: If you want chunk sizes that are a power of 10, you should be able to extend the above example to fit your needs.
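As an illustration of the power-of-10 chunking mentioned above, here is a small sketch that groups six-digit frame names by their first two digits, which gives chunks of 10,000 frames each. The file names and the temporary directory are assumptions; the real names may carry a prefix before the number.

```shell
# Create a few empty, hypothetically named frames: 009998.tif .. 010003.tif
src=$(mktemp -d)
i=9998
while [ "$i" -le 10003 ]; do
    : > "$src/$(printf '%06d' "$i").tif"
    i=$((i + 1))
done

summary=""
for prefix in 00 01; do
    # Expand the glob into the positional parameters for this chunk
    set -- "$src/$prefix"*.tif
    # An unmatched glob stays literal, so drop it if no such file exists
    [ -e "$1" ] || set --
    summary="$summary$prefix:$# "
    # With real data, "$@" could be handed to rsync here, one chunk at a time
done
echo "$summary"
```

Each pass of the loop sees only the files for one prefix, so the copy command inside it handles one chunk per iteration.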
This is an old question, I know, but someone may find this useful. The range-expansion examples above also work with rsync. For example, to copy files starting with a, b and c but not d and e from /tmp/from_here to /tmp/to_here:
$ rsync -avv /tmp/from_here/[a-c]* /tmp/to_here
sending incremental file list
delta-transmission disabled for local transfer or --whole-file
alice/
bob/
cedric/
total: matches=0 hash_hits=0 false_alarms=0 data=0
sent 89 bytes received 24 bytes 226.00 bytes/sec
total size is 0 speedup is 0.00
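When the wanted range does not line up with a glob pattern, another option is to generate the file names and hand them to rsync through its --files-from option. This is only a sketch: the frame_ prefix, the six-digit padding and the helper name frame_list are hypothetical, and the rsync line is left as a comment because it needs a real source and destination.

```shell
# Print one file name per line for frames $1..$2 (inclusive).
# The "frame_" prefix and six-digit padding are assumptions; adjust to the real names.
frame_list() {
    i=$1
    while [ "$i" -le "$2" ]; do
        printf 'frame_%06d.tif\n' "$i"
        i=$((i + 1))
    done
}

list=$(mktemp)
frame_list 9998 10001 > "$list"

first=$(head -n 1 "$list")
last=$(tail -n 1 "$list")
count=$(wc -l < "$list" | tr -d ' ')
echo "$count names, from $first to $last"

# With a real source and destination, the list would be passed to rsync, e.g.:
#   rsync -rvhW --size-only --dry-run --files-from="$list" <src>/ <dest>/
```

The names in the list are interpreted relative to the source directory given on the rsync command line.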
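If you are writing to LTO6 tapes, you should consider adding --inplace to your command. With --inplace, rsync writes updated data directly into the destination file instead of building a temporary copy and renaming it, which suits linear filesystems such as LTO.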

anacron script in cron.daily not running via symlink

What can I do to make this script run daily?
If I manually run the script, it works. I can see that it did what it's supposed to do. (backup files) However, it will not run as a cron.daily script. I've let it go for days without touching it -- and it never runs.
The actual script is here /var/www/myapp/backup.sh
There is a symlink to it here /etc/cron.daily/myapp_backup.sh -> /var/www/myapp/backup.sh
The cron log at /var/log/cron shows anacron running this script:
Aug 19 03:09:01 ip-123-456-78-90 anacron[31537]: Job `cron.daily' started
Aug 19 03:09:01 ip-123-456-78-90 run-parts(/etc/cron.daily)[31545]: starting myapp_backup.sh
Aug 19 03:09:01 ip-123-456-78-90 run-parts(/etc/cron.daily)[31559]: finished myapp_backup.sh
Yet there is no evidence that the script actually did anything.
Here is the security info on these files:
ls -la /etc/cron.daily
<snip>
lrwxrwxrwx 1 root root 25 Aug 12 21:18 myapp_backup.sh -> /var/www/myapp/backup.sh
</snip>
ls -la /var/www/myapp
<snip>
drwxr-xr-x 2 root root 4096 Aug 13 13:55 .
drwxr-xr-x 10 root root 4096 Jul 12 01:00 ..
-rwxr-xr-x 1 root root 407 Aug 12 23:37 backup.sh
-rw-r--r-- 1 root root 33 Aug 12 21:13 list.txt
</snip>
The file called list.txt is used by backup.sh.
The script just runs tar to create an archive.
From the cron manpage of a debian/ubuntu system:
the files under these directories have to pass some sanity checks including the following: be executable, be owned by root, not be writable by group or other and, if symlinks, point to files owned by root. Additionally, the file names must conform to the filename requirements of run-parts: they must be entirely made up of letters, digits and can only contain the special signs underscores ('_') and hyphens ('-'). Any file that does not conform to these requirements will not be executed by run-parts. For example, any file containing dots will be ignored.
So:
the file needs to be owned by root
if it is a symlink, the target file needs to be owned by root
if it is a symlink, the link name should NOT contain dots
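The naming rule quoted above can be checked with a small shell function. This is a sketch of the rule as stated in the Debian/Ubuntu man page text (letters, digits, underscores and hyphens only); the function name valid_run_parts_name is made up, and other distributions may apply different rules.

```shell
# Return success only if the name consists solely of letters, digits,
# underscores and hyphens, as the quoted man page text requires.
valid_run_parts_name() {
    case $1 in
        '' | *[!A-Za-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}

for name in myapp_backup myapp-backup myapp_backup.sh; do
    if valid_run_parts_name "$name"; then
        echo "$name: would be considered"
    else
        echo "$name: would be skipped"
    fi
done
```

Under this rule, a link named myapp_backup.sh is skipped because of the dot, while myapp_backup passes.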
I had a similar situation with cron.hourly and awstats processing.
I THINK it is related to SELinux and anacron not having the same powers/permissions as cron.
The ACTUAL solution defeated me (so far).
MY WORKAROUND SOLUTION: Run the job via root's cron entries (crontab -e ) and simply schedule it hourly.

Crontab isn't running

My crontab isn't running and I'm trying to figure out why. I've created a symbolic link within /etc/cron.d to /var/www/mysite.crontab
user#ip-xxxxxxxxxx:/etc/cron.d$ ll
total 20
drwxr-xr-x 2 root root 4096 Apr 11 03:48 ./
drwxr-xr-x 96 root root 4096 Apr 16 00:50 ../
lrwxrwxrwx 1 root root 30 Apr 11 03:47 mysite.crontab -> /var/www/mysite.crontab
-rw-r--r-- 1 root root 124 Feb 27 2012 drupal7
-rw-r--r-- 1 root root 544 Sep 12 2012 php5
-rw-r--r-- 1 root root 102 Apr 2 2012 .placeholder
The actual cron file is...
#Purge old deals
4 1 * * * www-data wget -q -O- http://www.mysite.com/cron/clean > /dev/null 2>&1;
Oddly enough, the problem is the name of the file. You are not permitted to use a . as part of the file name when the file is in the /etc/cron.d directory.
The logic for this is in the database.c file, in the function valid_name. Renaming the file to something like mysite_crontab should fix the issue.
In general, the filename should probably just be a simple name such as mysite; the fact that it's in this directory already implies that it's a cron file.
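The rename can be sketched as follows, using a temporary directory as a stand-in for /etc/cron.d so it can be tried safely. The paths are hypothetical; on a real system the mv would be run as root against the actual link.

```shell
# Hypothetical stand-ins for /etc/cron.d and /var/www/mysite.crontab
crond=$(mktemp -d)
target=$(mktemp)
ln -s "$target" "$crond/mysite.crontab"

# Derive a dot-free name and rename the link itself (the target is untouched)
old_name=mysite.crontab
new_name=$(printf '%s' "$old_name" | tr '.' '_')
mv "$crond/$old_name" "$crond/$new_name"

ls -l "$crond"
```

Only the link name changes; the file it points to keeps its original name and location.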
The file that is being pointed to must be owned by root, this is stated in the man page for the support of the /etc/cron.d directory:
Support for /etc/cron.d is included in the cron daemon itself, which handles this location as the system-wide crontab spool. This directory can contain any file defining tasks following the format used in /etc/crontab, i.e. unlike the user cron spool, these files must provide the username to run the task as in the task definition.
Files in this directory have to be owned by root, do not need to be executable (they are configuration files, just like /etc/crontab) and must conform to the same naming convention as used by run-parts(8): they must consist solely of upper- and lower-case letters, digits, underscores, and hyphens. This means that they cannot contain any dots. If the -l option is specified to cron (this option can be setup through /etc/default/cron, see below), then they must conform to the LSB namespace specification, exactly as in the --lsbsysinit option in run-parts.
The intended purpose of this feature is to allow packages that require finer control of their scheduling than the /etc/cron.{hourly,daily,weekly,monthly} directories to add a crontab file to /etc/cron.d. Such files should be named after the package that supplies them.

Basic Unix refresher inquiry: ls -ld

I know this is really basic, but I cannot find this information
in the ls man page, and need a refresher:
$ ls -ld my.dir
drwxr-xr-x 1 smith users 4096 Oct 29 2011 my.dir
What is the meaning of the number 1 after drwxr-xr-x ?
Does it represent the number of hard links to the directory my.dir?
I cannot remember. Where can I find this information?
Thanks,
John Goche
I found it on Wikipedia:
duuugggooo (hard link count) owner group size modification_date name
The number is the hard link count.
If you want a more UNIXy solution, type info ls. This gives more detailed information including:
`-l'
`--format=long'
`--format=verbose'
In addition to the name of each file, print the file type, file
mode bits, number of hard links, owner name, group name, size, and
timestamp (*note Formatting file timestamps::), normally the
modification time. Print question marks for information that
cannot be determined.
That is the number of names (hard links) of the file. I suspect there is an error in your listing, though: on traditional Unix filesystems the count should be at least 2 for a directory.
$ touch file
$ ls -l
total 0
-rw-r--r-- 1 igor igor 0 Jul 15 10:24 file
$ ln file file-link
$ ls -l
total 0
-rw-r--r-- 2 igor igor 0 Jul 15 10:24 file
-rw-r--r-- 2 igor igor 0 Jul 15 10:24 file-link
$ mkdir a
$ ls -l
total 0
drwxr-xr-x 2 igor igor 40 Jul 15 10:24 a
-rw-r--r-- 2 igor igor 0 Jul 15 10:24 file
-rw-r--r-- 2 igor igor 0 Jul 15 10:24 file-link
As you can see, as soon as you make a directory, you get 2 at the column.
When you make subdirectories in a directory, the number increases:
$ mkdir a/b
$ ls -ld a
drwxr-xr-x 3 igor igor 60 Jul 15 10:41 a
As you can see, the directory now has three names ('a', '.' inside it, and '..' in its subdirectory):
$ ls -id a ; cd a; ls -id .; ls -id b/..
39754633 a
39754633 .
39754633 b/..
All these three names point to the same directory (inode 39754633).
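The steps above can be repeated in a throwaway directory with a short script. This is a sketch only: the helper names link_count and inode_of are made up, and both simply pick fields out of ls output.

```shell
# Second field of "ls -ld" is the hard link count
link_count() { ls -ld "$1" | awk '{print $2}'; }
# First field of "ls -id" is the inode number
inode_of() { ls -id "$1" | awk '{print $1}'; }

work=$(mktemp -d)
: > "$work/file"
before=$(link_count "$work/file")
ln "$work/file" "$work/file-link"
after=$(link_count "$work/file")
echo "file link count: $before before ln, $after after ln"

mkdir -p "$work/a/b"
inode_a=$(inode_of "$work/a")
inode_dot=$(inode_of "$work/a/.")
inode_dotdot=$(inode_of "$work/a/b/..")
echo "inodes: a=$inode_a .=$inode_dot b/..=$inode_dotdot"
echo "directory a link count: $(link_count "$work/a")"
```

The three inode numbers should be identical, since a, a/. and a/b/.. are all names for the same directory. The directory link count printed at the end depends on the filesystem, so it may not match the listing above everywhere.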
Here is an attempt to explain why the initial link count of a directory is 2. Please see if this helps.
Any file or directory is identified by an inode.
Number of hard links = number of references to the inode.
When a file or directory is created, one directory entry of the form {myname, myinodenumber} is created in the parent directory. This makes the reference count of the inode for that file or directory 1.
When a directory is created, space for the directory itself is also allocated, and by default it holds two directory entries: one for the new directory and one for its parent, of the form {., myinodenumber} and {.., myparent'sinodenumber}.
The current directory is referred to by "." and the parent by "..".
So when we create a directory, its initial link count is 1 + 1 = 2, since there are two references to myinodenumber, and the parent's link count is increased by 1.
