Linux - Sum total of files in different directories

How do I calculate the sum total size of multiple files located in different directories?
I have a text file containing the full path and name of the files.
I figure a simple script using while read line and du -h might do the trick...
Example of text file (new2.txt) containing list of files to sum:
/mount/st4000/media/A/amediafile.ext
/mount/st4000/media/B/amediafile.ext
/mount/st4000/media/C/amediafile.ext
/mount/st4000/media/D/amediafile.ext
/mount/st4000/media/E/amediafile.ext
/mount/st4000/media/F/amediafile.ext
/mount/st4000/media/G/amediafile.ext
/mount/st4000/media/H/amediafile.ext
/mount/st4000/media/I/amediafile.ext
/mount/st4000/media/J/amediafile.ext
/mount/st4000/media/K/amediafile.ext
Note: the folder structure is not necessarily consecutive as in A..K
Based on the suggestion from AndreaT, adapting it slightly, I tried
while read mediafile;do du -b "$mediafile"|cut -f -1>>subtotals.txt;done<new2.txt
subtotals.txt looks like
733402685
944869798
730564608
213768
13332480
366983168
6122559750
539944960
735039488
1755005744
733478912
To add all the subtotals
sum=0; while read num; do ((sum += num)); done < subtotals.txt; echo $sum
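The two loops above can be folded into a single pass that never writes subtotals.txt. This is only a sketch, assuming the list holds one path per line; the name sum_listed_files is made up for illustration, and wc -c stands in for du -b so that the logical byte size is counted:

```bash
# Sum the byte sizes of the files named, one per line, in a list file.
# sum_listed_files is a hypothetical helper name, not part of the original post.
sum_listed_files() {
    total=0
    while IFS= read -r mediafile; do
        [ -f "$mediafile" ] || continue     # skip missing or non-regular entries
        size=$(wc -c < "$mediafile")        # logical size in bytes
        total=$((total + size))
    done < "$1"
    echo "$total"
}
```

It would be called as sum_listed_files new2.txt. Because the loop reads from a redirect rather than a pipeline, total is still set when the loop ends.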

Assuming that file input is like this
/home/administrator/filesum/cliprdr.c
/home/administrator/filesum/cliprdr.h
/home/administrator/filesum/event.c
/home/administrator/filesum/event.h
/home/administrator/filesum/main.c
/home/administrator/filesum/main.h
/home/administrator/filesum/utils.c
/home/administrator/filesum/utils.h
and the result of command ls -l is
-rw-r--r-- 1 administrator administrator 13452 Oct 4 17:56 cliprdr.c
-rw-r--r-- 1 administrator administrator 1240 Oct 4 17:56 cliprdr.h
-rw-r--r-- 1 administrator administrator 8141 Oct 4 17:56 event.c
-rw-r--r-- 1 administrator administrator 2164 Oct 4 17:56 event.h
-rw-r--r-- 1 administrator administrator 32403 Oct 4 17:56 main.c
-rw-r--r-- 1 administrator administrator 1074 Oct 4 17:56 main.h
-rw-r--r-- 1 administrator administrator 5452 Oct 4 17:56 utils.c
-rw-r--r-- 1 administrator administrator 1017 Oct 4 17:56 utils.h
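the simplest command to run is the following (du does not read file names from standard input, so the list is handed over with GNU du's --files0-from option):
tr '\n' '\0' < filelist.txt | du -cb --files0-from=- | tail -1 | cut -f -1
with following output (in bytes, the sum of the sizes listed above)
64943
Keep in mind that with -b GNU du reports the apparent (logical) file size. Without -b it prints actual disk usage, rounded up to a multiple of (usually) 4 KiB.
For small files that approximation may not be acceptable.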
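The gap between logical size and allocated space is easy to check for a single file. A minimal sketch, assuming only POSIX wc and du -k; size_report is a hypothetical helper name:

```bash
# Print the logical size and the allocated disk space of one file.
# size_report is an illustrative name, not an existing tool.
size_report() {
    logical=$(wc -c < "$1")                  # bytes of content
    allocated_kib=$(du -k "$1" | cut -f1)    # space the filesystem reserved, in KiB
    echo "$((logical)) bytes logical, ${allocated_kib} KiB allocated"
}
```

For a one-byte file the first number is 1, while the second depends on the filesystem's block size.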

To sum one directory, you will have to use a while loop and export the result to the parent shell.
I used an echo and a subsequent eval:
eval ' let sum=0$(
ls -l | tail -n +2 |\
while read perms link user uid size date day hour name ; do
echo -n "+$size" ;
done
)'
It produces a line, directly evaluated, which looks like
let sum=0+205+1201+1201+1530+128+99
You just have to run this command once in each of the two folders.
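If eval feels too risky, a command substitution can carry the result back to the parent shell instead. This is an alternative sketch rather than the method above; dir_size_sum is a hypothetical name, and like the original it parses ls -l output, so it assumes user and group names contain no spaces:

```bash
# Sum the size column of `ls -l` for one directory, without eval.
# dir_size_sum is an illustrative helper name.
dir_size_sum() {
    # NR > 1 skips the leading "total ..." line; field 5 is the size in bytes.
    ls -l "$1" | awk 'NR > 1 { sum += $5 } END { print sum + 0 }'
}
```

Something like sum=$(dir_size_sum /some/folder) then leaves the total in a normal variable.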

The du command doesn't have a -b option on the unix systems I have available. And there are other ways to get file size.
Assuming you like the idea of a while loop in bash, the following might work:
#!/bin/bash
case "$(uname -s)" in
Linux) stat_opt=(-c '%s') ;;
*BSD|Darwin) stat_opt=(-f '%z') ;;
*) printf 'ERROR: I don'\''t know how to run on %s\n' "$(uname -s)" >&2; exit 1 ;;
esac
declare -i total=0
declare -i count=0
declare filename
while IFS= read -r filename; do
[[ -f "$filename" ]] || continue
(( total+=$(stat "${stat_opt[@]}" "$filename") ))
(( count++ ))
done
printf 'Total: %d bytes in %d files.\n' "$total" "$count"
This would take your list of files as stdin. You can run it in BSD unix or in Linux; the options to the stat command (which is not internal to bash) are the bits that are platform specific.

Related

Rsync Incremental Backup still copies all the files

I am currently writing a bash script for rsync. I am pretty sure I am doing something wrong. But I can't tell what it is. I will try to elaborate everything in detail so hopefully someone can help me.
The goal of script is to do full backups and incremental ones using rsync. Everything seems to work perfectly well, besides one crucial thing. It seems like even though using the --link-dest parameter, it still copies all the files. I have checked the file sizes with du -chs.
First here is my script:
#!/bin/sh
while getopts m:p: flags
do
case "$flags" in
m) mode=${OPTARG};;
p) prev=${OPTARG};;
*) echo "usage: $0 [-m] [-p]" >&2
exit 1 ;;
esac
done
date="$(date '+%Y-%m-%d')";
#Create Folders If They Do Not Exist (-p parameter)
mkdir -p /Backups/Full && mkdir -p /Backups/Inc
FullBackup() {
#Backup Content Of Website
mkdir -p /Backups/Full/$date/Web/html
rsync -av user@IP:/var/www/html/ /Backups/Full/$date/Web/html/
#Backup All Config Files NEEDED. Saving Storage Is Key ;)
mkdir -p /Backups/Full/$date/Web/etc
rsync -av user@IP:/etc/apache2/ /Backups/Full/$date/Web/etc/
#Backup Fileserver
mkdir -p /Backups/Full/$date/Fileserver
rsync -av user@IP:/srv/samba/private/ /Backups/Full/$date/Fileserver/
#Backup MongoDB
ssh user@IP /usr/bin/mongodump --out /home/DB
rsync -av root@BackupServerIP:/home/DB/ /Backups/Full/$date/DB
ssh user@IP rm -rf /home/DB
}
IncrementalBackup(){
Method="";
if [ "$prev" == "full" ]
then
Method="Full";
elif [ "$prev" == "inc" ]
then
Method="Inc";
fi
if [ -z "$prev" ]
then
echo "-p Parameter Empty";
else
#Get Latest Folder - Ignore the hacky method, it works.
cd /Backups/$Method
NewestBackup=$(find . ! -path . -type d | sort -nr | head -1 | sed s#^./##)
IFS='/'
read -a strarr <<< "$NewestBackup"
Latest_Backup="${strarr[0]}";
cd /Backups/
#Incremental-Backup Content Of Website
mkdir -p /Backups/Inc/$date/Web/html
rsync -av --link-dest /Backups/$Method/"$Latest_Backup"/Web/html/ user@IP:/var/www/html/ /Backups/Inc/$date/Web/html/
#Incremental-Backup All Config Files NEEDED
mkdir -p /Backups/Inc/$date/Web/etc
rsync -av --link-dest /Backups/$Method/"$Latest_Backup"/Web/etc/ user@IP:/etc/apache2/ /Backups/Inc/$date/Web/etc/
#Incremental-Backup Fileserver
mkdir -p /Backups/Inc/$date/Fileserver
rsync -av --link-dest /Backups/$Method/"$Latest_Backup"/Fileserver/ user@IP:/srv/samba/private/ /Backups/Inc/$date/Fileserver/
#Backup MongoDB
ssh user@IP /usr/bin/mongodump --out /home/DB
rsync -av root@BackupServerIP:/home/DB/ /Backups/Full/$date/DB
ssh user@IP rm -rf /home/DB
fi
}
if [ "$mode" == "full" ]
then
FullBackup;
elif [ "$mode" == "inc" ]
then
IncrementalBackup;
fi
The commands I used:
Full-Backup
bash script.sh -m full
Incremental
bash script.sh -m inc -p full
Executing the script is not giving any errors at all. As I mentioned above, it just seems like it's still copying all the files. Here are some tests I did.
Output of du -chs
root@Backup:/Backups# du -chs /Backups/Full/2021-11-20/*
36K /Backups/Full/2021-11-20/DB
6.5M /Backups/Full/2021-11-20/Fileserver
696K /Backups/Full/2021-11-20/Web
7.2M total
root@Backup:/Backups# du -chs /Backups/Inc/2021-11-20/*
36K /Backups/Inc/2021-11-20/DB
6.5M /Backups/Inc/2021-11-20/Fileserver
696K /Backups/Inc/2021-11-20/Web
7.2M total
Output of ls -li
root@Backup:/Backups# ls -li /Backups/Full/2021-11-20/
total 12
1290476 drwxr-xr-x 4 root root 4096 Nov 20 19:26 DB
1290445 drwxrwxr-x 6 root root 4096 Nov 20 18:54 Fileserver
1290246 drwxr-xr-x 4 root root 4096 Nov 20 19:26 Web
root@Backup:/Backups# ls -li /Backups/Inc/2021-11-20/
total 12
1290506 drwxr-xr-x 4 root root 4096 Nov 20 19:28 DB
1290496 drwxrwxr-x 6 root root 4096 Nov 20 18:54 Fileserver
1290486 drwxr-xr-x 4 root root 4096 Nov 20 19:28 Web
Rsync Output when doing the incremental backup and changing/adding a file
receiving incremental file list
./
lol.html
sent 53 bytes received 194 bytes 164.67 bytes/sec
total size is 606 speedup is 2.45
receiving incremental file list
./
sent 33 bytes received 5,468 bytes 11,002.00 bytes/sec
total size is 93,851 speedup is 17.06
receiving incremental file list
./
sent 36 bytes received 1,105 bytes 760.67 bytes/sec
total size is 6,688,227 speedup is 5,861.72
*Irrelevant MongoDB Dump Text*
sent 146 bytes received 2,671 bytes 1,878.00 bytes/sec
total size is 2,163 speedup is 0.77
I suspect that the ./ has something to do with that. I might be wrong, but it looks suspicious. Though when executing the same command again, the ./ are not in the log, probably because I did it on the same day, so it was overwriting in the /Backup/Inc/2021-11-20 Folder.
Let me know for more information. I have been trying around for a long time now. Maybe I am simply wrong and there are links made and disk space economized.
I didn't read the entire code because the main problem didn't seem to lie there.
Verify the disk usage of your /Backups directory with du -sh /Backups and then compare it with the sum of du -sh /Backups/Full and du -sh /Backups/Inc.
I'll show you why with a little test:
Create a directory containing a file of 1 MiB:
mkdir -p /tmp/example/data
dd if=/dev/zero of=/tmp/example/data/zerofile bs=1M count=1
Do a "full" backup:
rsync -av /tmp/example/data/ /tmp/example/full
Do an "incremental" backup
rsync -av --link-dest=/tmp/example/full /tmp/example/data/ /tmp/example/incr
Now let's see what we got:
with ls -l
ls -l /tmp/example/*
-rw-rw-r-- 1 user group 1048576 Nov 21 00:24 /tmp/example/data/zerofile
-rw-rw-r-- 2 user group 1048576 Nov 21 00:24 /tmp/example/full/zerofile
-rw-rw-r-- 2 user group 1048576 Nov 21 00:24 /tmp/example/incr/zerofile
and with du -sh
du -sh /tmp/example/*
1.0M /tmp/example/data
1.0M /tmp/example/full
0 /tmp/example/incr
Oh? There was a 1 MiB file in /tmp/example/incr but du missed it?
Actually no. As the file wasn't modified since the previous backup (referenced with --link-dest), rsync created a hard link to it instead of copying its content. Hard links are several directory entries that point to the same inode, so the data is stored on disk only once.
And du can detect hard-links and show you the real disk usage, but only when the hard-linked files are included (even in sub-dirs) in its arguments. For example, if you use du -sh independently for /tmp/example/incr:
du -sh /tmp/example/incr
1.0M /tmp/example/incr
How do you detect that there are hard links to a file?
ls -l actually showed it to us:
-rw-rw-r-- 2 user group 1048576 Nov 21 00:24 /tmp/example/full/zerofile
^
HERE
This number means that there are two existing hard-links to the file: this file itself and another one in the same filesystem.
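Besides reading the link count from ls -l, hard links can also be checked from a script. A small sketch that is not part of the answer: the function names are hypothetical, -ef is the shell test for "same device and inode" (it follows symlinks), and -links +1 is the find predicate for a link count above one:

```bash
# Succeed if both paths resolve to the same inode on the same filesystem.
# Note: -ef follows symlinks, so a symlink to the file also counts as "same".
is_hard_linked() {
    [ "$1" -ef "$2" ]
}

# List regular files under a directory that have more than one hard link,
# e.g. files in an incremental backup that rsync linked instead of copying.
list_hard_linked() {
    find "$1" -type f -links +1
}
```

Run against the example tree, list_hard_linked /tmp/example/incr would be expected to print the path of zerofile.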
About your code
It doesn't change anything but I would replace:
#Get Latest Folder - Ignore the hacky method, it works.
cd /Backups/$Method
NewestBackup=$(find . ! -path . -type d | sort -nr | head -1 | sed s#^./##)
IFS='/'
read -a strarr <<< "$NewestBackup"
Latest_Backup="${strarr[0]}";
cd /Backups/
with:
#Get Latest Folder
glob='20[0-9][0-9]-[0-1][0-9]-[0-3][0-9]' # match a timestamp (more or less)
NewestBackup=$(compgen -G "/Backups/$Method/$glob/" | sort -nr | head -n 1)
glob makes sure that the directories/files found by compgen -G will have the right format.
Adding / at the end of a glob makes sure that it matches directories only.
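The same lookup can be written without compgen by relying only on the fact that the shell expands a glob in sorted order. This is a variant sketch, not the answer's exact code; latest_backup_dir is a hypothetical helper name and the glob mirrors the one above:

```bash
# Print the name of the newest date-named directory under a base directory.
# latest_backup_dir is an illustrative helper name.
latest_backup_dir() {
    latest=
    for dir in "$1"/20[0-9][0-9]-[0-1][0-9]-[0-3][0-9]/; do
        # Glob results arrive sorted, so the last directory seen is the newest.
        [ -d "$dir" ] && latest=${dir%/}
    done
    [ -n "$latest" ] && basename "$latest"
}
```

With this, Latest_Backup=$(latest_backup_dir "/Backups/$Method") would hold only the directory name, which is the form the --link-dest lines in the script expect.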

How do I find the latest date folder in a directory and then construct the command in a shell script?

I have a directory in which I will have some folders with date format (YYYYMMDD) as shown below -
david@machineX:/database/batch/snapshot$ ls -lt
drwxr-xr-x 2 app kyte 86016 Oct 25 05:19 20141023
drwxr-xr-x 2 app kyte 73728 Oct 18 00:21 20141016
drwxr-xr-x 2 app kyte 73728 Oct 9 22:23 20141009
drwxr-xr-x 2 app kyte 81920 Oct 4 03:11 20141002
Now I need to extract latest date folder from the /database/batch/snapshot directory and then construct the command in my shell script like this -
./file_checker --directory /database/batch/snapshot/20141023/ --regex ".*.data" > shardfile_20141023.log
Below is my shell script -
#!/bin/bash
./file_checker --directory /database/batch/snapshot/20141023/ --regex ".*.data" > shardfile_20141023.log
# now I need to grep shardfile_20141023.log after above command is executed
How do I find the latest date folder and construct above command in a shell script?
One approach is to grep only for folders whose names contain 8 digits:
ls -t1 | grep -P -e "\d{8}" | head -1
Or
ls -t1 | grep -E -e "[0-9]{8}" | head -1
You could try the following in your script:
pushd /database/batch/snapshot
LATESTDATE=`ls -d * | sort -n | tail -1`
popd
./file_checker --directory /database/batch/snapshot/${LATESTDATE}/ --regex ".*.data" > shardfile_${LATESTDATE}.log
See BashFAQ#099 aka "How can I get the newest (or oldest) file from a directory?".
That being said, if you don't care for actual modification time and just want to find the most recent directory based on name you can use an array and globbing (note: the sort order with globbing is subject to LC_COLLATE):
$ find
.
./20141002
./20141009
./20141016
./20141023
$ foo=( * )
$ echo "${foo[${#foo[@]}-1]}"
20141023
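Putting the array idea together with the command from the question might look like the sketch below. It only prints the command instead of running it, since file_checker is the asker's own program; build_checker_command is a hypothetical name, and the eight-digit glob is an assumption about how the folders are named:

```bash
#!/bin/bash
# Print the file_checker command line for the newest YYYYMMDD folder.
# build_checker_command is an illustrative name; nothing is executed here.
build_checker_command() {
    local base=$1
    # The trailing slash restricts matches to directories; matches are sorted by name.
    local dirs=( "$base"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/ )
    local last=${dirs[${#dirs[@]}-1]}
    [[ -d $last ]] || return 1          # no match: the glob stayed literal
    local latest=${last%/}
    latest=${latest##*/}                # keep only the folder name
    printf './file_checker --directory %s/%s/ --regex ".*.data" > shardfile_%s.log\n' \
        "$base" "$latest" "$latest"
}
```

Calling build_checker_command /database/batch/snapshot shows the line that would be run, which makes it easy to review before wiring it into the script.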

rsync prints "skipping non-regular file" for what appears to be a regular directory

I back up my files using rsync. Right after a sync, I ran it expecting to see nothing, but instead it looked like it was skipping directories. I've (obviously) changed names, but I believe I've still captured all the information I could. What's happening here?
$ ls -l /source/backup/myfiles
drwxr-xr-x 2 me me 4096 2010-10-03 14:00 foo
drwxr-xr-x 2 me me 4096 2011-08-03 23:49 bar
drwxr-xr-x 2 me me 4096 2011-08-18 18:58 baz
$ ls -l /destination/backup/myfiles
drwxr-xr-x 2 me me 4096 2010-10-03 14:00 foo
drwxr-xr-x 2 me me 4096 2011-08-03 23:49 bar
drwxr-xr-x 2 me me 4096 2011-08-18 18:58 baz
$ file /source/backup/myfiles/foo
/source/backup/myfiles/foo/: directory
Then I sync (expecting no changes):
$ rsync -rtvp /source/backup /destination
sending incremental file list
backup/myfiles
skipping non-regular file "backup/myfiles/foo"
skipping non-regular file "backup/myfiles/bar"
And here's the weird part:
$ echo 'hi' > /source/backup/myfiles/foo/test
$ rsync -rtvp /source/backup /destination
sending incremental file list
backup/myfiles
backup/myfiles/foo
backup/myfiles/foo/test
skipping non-regular file "backup/myfiles/foo"
skipping non-regular file "backup/myfiles/bar"
So it worked:
$ ls -l /source/backup/myfiles/foo
-rw-r--r-- 1 me me 3126091 2010-06-15 22:22 IMGP1856.JPG
-rw-r--r-- 1 me me 3473038 2010-06-15 22:30 P1010615.JPG
-rw-r--r-- 1 me me 3 2011-08-24 13:53 test
$ ls -l /destination/backup/myfiles/foo
-rw-r--r-- 1 me me 3126091 2010-06-15 22:22 IMGP1856.JPG
-rw-r--r-- 1 me me 3473038 2010-06-15 22:30 P1010615.JPG
-rw-r--r-- 1 me me 3 2011-08-24 13:53 test
but still:
$ rsync -rtvp /source/backup /destination
sending incremental file list
backup/myfiles
skipping non-regular file "backup/myfiles/foo"
skipping non-regular file "backup/myfiles/bar"
Other notes:
My actual directories "foo" and "bar" do have spaces, but no other strange characters. Other directories have spaces and have no problem. I 'stat'-ed and saw no differences between the directories that don't rsync and the ones that do.
If you need more information, just ask.
Are you absolutely sure those individual files are not symbolic links?
Rsync has a few useful flags such as -l which will "copy symlinks as symlinks". Adding -l to your command:
rsync -rtvpl /source/backup /destination
I believe symlinks are skipped by default because they can be a security risk. Check the man page or --help for more info on this:
rsync --help | grep link
To verify these are symbolic links or pro-actively to find symbolic links you can use file or find:
$ file /path/to/file
/path/to/file: symbolic link to `/path/file`
$ find /path -type l
/path/to/file
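The same check can be scripted with the shell's own file tests. A minimal sketch with a hypothetical helper name, describe_path; note that the path has to be given without a trailing slash, because a trailing slash makes the tools follow a symlink and report the directory behind it:

```bash
# Report whether a path is a symlink, a directory, or a regular file.
# describe_path is an illustrative name. Test -L first: -d and -f follow symlinks.
describe_path() {
    if [ -L "$1" ]; then
        echo "symlink -> $(readlink "$1")"
    elif [ -d "$1" ]; then
        echo "directory"
    elif [ -f "$1" ]; then
        echo "regular file"
    else
        echo "other or missing"
    fi
}
```

For the case in the question it would be called as describe_path /source/backup/myfiles/foo.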
Are you absolutely sure that it's not a symbolic link directory?
try a:
file /source/backup/myfiles/foo
to make sure it's a directory
Also, it could very well be a loopback mount
try
mount
and make sure that /source/backup/myfiles/foo is not listed.
You should try the below command, most probably it will work for you:
rsync -ravz /source/backup /destination
You can try the following, it will work
rsync -rtvp /source/backup /destination
I personally always use this syntax in my script and works a treat to backup the entire system (skipping sys/* & proc/* nfs4/*)
sudo rsync --delete --stats --exclude-from $EXCLUDE -rlptgoDv / $TARGET/ | tee -a $LOG
Here is my script run by root's cron daily:
#!/bin/bash
#
NFS="/nfs4"
HOSTNAME=`hostname`
TIMESTAMP=`date "+%Y%m%d_%H%M%S"`
EXCLUDE="/home/gcclinux/Backups/root-rsync.excludes"
TARGET="${NFS}/${HOSTNAME}/SYS"
LOGDIR="${NFS}/${HOSTNAME}/SYS-LOG"
CMD=`/usr/bin/stat -f -L -c %T ${NFS}`
## CHECK IF NFS IS MOUNTED...
if [[ ! $CMD == "nfs" ]];then
echo "NFS NOT MOUNTED"
exit 1
fi
## CHECK IF LOG DIRECTORY EXIST
if [ ! -d "$LOGDIR" ]; then
/bin/mkdir -p $LOGDIR
fi
## CREATE LOG HEADER
LOG=$LOGDIR/"rsync_result."$TIMESTAMP".txt"
echo "-------------------------------------------------------" | tee -a $LOG
echo `date` | tee -a $LOG
echo "" | tee -a $LOG
## START RUNNING BACKUP
/usr/bin/rsync --delete --stats --exclude-from $EXCLUDE -rlptgoDv / $TARGET/ | tee -a $LOG
In some cases just copy file to another location (like home) then try again

Using sed within "while read" expression

I am pretty stuck with that script.
#!/bin/bash
STARTDIR=$1
MNTDIR=/tmp/test/mnt
find $STARTDIR -type l |
while read file;
do
echo Found symlink file: $file
DIR=`sed 's|/\w*$||'`
MKDIR=${MNTDIR}${DIR}
mkdir -p $MKDIR
cp -L $file $MKDIR
done
I pass a directory as the $1 parameter; this directory has three symbolic links. The while loop echoes only the first match; after sed runs, all other matches are lost.
Look for output below:
[artyom@LBOX tmp]$ ls -lh /tmp/imp/
total 16K
lrwxrwxrwx 1 artyom adm 19 Aug 8 10:33 ok1 -> /tmp/imp/sym3/file1
lrwxrwxrwx 1 artyom adm 19 Aug 8 09:19 ok2 -> /tmp/imp/sym2/file2
lrwxrwxrwx 1 artyom adm 19 Aug 8 10:32 ok3 -> /tmp/imp/sym3/file3
[artyom@LBOX tmp]$ ./copy.sh /tmp/imp/
Found symlink file: /tmp/imp/ok1
[artyom@LBOX tmp]$
Can somebody help with that issue?
Thanks
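You forgot to feed something to sed. Without explicit input it reads from standard input, which inside this loop is the pipe coming from find, so it consumes the remaining file names and the loop ends after the first iteration. I wouldn't use this approach anyway, but just use something like: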
DIR=`dirname "$file"`
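With that change the whole loop could look like the sketch below. It is wrapped in a function so it can be tried on a scratch directory; copy_symlink_targets and its two arguments are hypothetical names, and file names containing newlines are assumed not to occur:

```bash
# copy_symlink_targets START_DIR MNT_DIR
# For every symlink under START_DIR, copy the file it points to below MNT_DIR,
# recreating the symlink's directory path there.
copy_symlink_targets() {
    find "$1" -type l |
    while IFS= read -r file; do
        echo "Found symlink file: $file"
        dir=$(dirname "$file")      # takes its input as an argument, not from stdin
        mkdir -p "$2$dir"
        cp -L "$file" "$2$dir"      # -L copies the target's content, not the link
    done
}
```

Since dirname never reads standard input, the loop keeps receiving every line that find prints.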

bash script to rename all files in a directory?

I have a bunch of files that need to be renamed.
file1.txt needs to be renamed to file1_file1.txt
file2.avi needs to be renamed to file2_file2.avi
As you can see, I need the _ followed by the original file name.
There are a lot of these files.
So far all the answers given either:
Require some non-portable tool
Break horribly with filenames containing spaces or newlines
Are not recursive, i.e. do not descend into sub-directories
These two scripts solve all of those problems.
Bash 2.X/3.X
#!/bin/bash
while IFS= read -r -d $'\0' file; do
dirname="${file%/*}/"
basename="${file:${#dirname}}"
echo mv "$file" "$dirname${basename%.*}_$basename"
done < <(find . -type f -print0)
Bash 4.X
#!/bin/bash
shopt -s globstar
for file in ./**; do
if [[ -f "$file" ]]; then
dirname="${file%/*}/"
basename="${file:${#dirname}}"
echo mv "$file" "$dirname${basename%.*}_$basename"
fi
done
Be sure to remove the echo from whichever script you choose once you are satisfied with its output, and run it again
Edit
Fixed problem in previous version that did not properly handle path names.
For your specific case, you want to use mmv as follows:
pax> ll
total 0
drwxr-xr-x+ 2 allachan None 0 Dec 24 09:47 .
drwxrwxrwx+ 5 allachan None 0 Dec 24 09:39 ..
-rw-r--r-- 1 allachan None 0 Dec 24 09:39 file1.txt
-rw-r--r-- 1 allachan None 0 Dec 24 09:39 file2.avi
pax> mmv '*.*' '#1_#1.#2'
pax> ll
total 0
drwxr-xr-x+ 2 allachan None 0 Dec 24 09:47 .
drwxrwxrwx+ 5 allachan None 0 Dec 24 09:39 ..
-rw-r--r-- 1 allachan None 0 Dec 24 09:39 file1_file1.txt
-rw-r--r-- 1 allachan None 0 Dec 24 09:39 file2_file2.avi
You need to be aware that the wildcard matching is not greedy. That means that the file a.b.txt will be turned into a_a.b.txt, not a.b_a.b.txt.
The mmv program was installed as part of my CygWin but I had to
sudo apt-get install mmv
on my Ubuntu box to get it down. If it's not in your standard distribution, whatever package manager you're using will hopefully have it available.
If, for some reason, you're not permitted to install it, you'll have to use one of the other bash for-loop-type solutions shown in the other answers. I prefer the terseness of mmv myself but you may not have the option.
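The same first-dot versus last-dot distinction shows up in the pure shell solutions, depending on whether %% or % is used in the parameter expansion. A short sketch with illustrative function names:

```bash
# Two ways to build the new name for a file such as "a.b.txt".
# Both function names are made up for this illustration.
rename_first_dot() { echo "${1%%.*}_$1"; }   # %% strips the longest ".*" suffix: stem ends at the first dot
rename_last_dot()  { echo "${1%.*}_$1"; }    # %  strips the shortest ".*" suffix: stem ends at the last dot

rename_first_dot a.b.txt   # prints a_a.b.txt, matching the mmv result above
rename_last_dot  a.b.txt   # prints a.b_a.b.txt
```

For names with a single dot, such as file1.txt, both forms give the same result.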
for file in file*.*
do
[ -f "$file" ] && echo mv "$file" "${file%%.*}_$file"
done
Idea for recursion
recurse() {
for file in "$1"/*;do
if [ -d "$file" ];then
recurse "$file"
else
# check for relevant files; remove the echo once the output looks right
name=${file##*/}
[ -f "$file" ] && echo mv "$file" "${file%/*}/${name%%.*}_$name"
fi
done
}
recurse /path/to/files
find . -type f | while IFS= read -r FN; do
BFN=$(basename "$FN")
NFN=${BFN%.*}_${BFN}
echo "$BFN -> $NFN"
mv "$FN" "$(dirname "$FN")/$NFN"
done
I like the Perl Cookbook's rename script for this. It may not be /bin/sh but you can do regular expression-like renames.
The /bin/sh method would be to use sed/cut/awk to alter each filename inside a for loop. If the directory is large you'd need to rely on xargs.
One should mention the mmv tool, which is especially made for this.
It's described here: http://tldp.org/LDP/GNU-Linux-Tools-Summary/html/mass-rename.html
...along with alternatives.
I use prename (perl based), which is included in various linux distributions. It works with regular expressions, so to say change all img_x.jpg to IMAGE_x.jpg you'd do
prename 's/img_/IMAGE_/' img*jpg
You can use the -n flag to preview changes without making any actual changes.
prename man entry
#!/bin/bash
# Don't do this like I did:
# files=`ls ${1}`
for file in *.*
do
if [ -f "$file" ];
then
newname="${file%%.*}_${file}"
mv "$file" "$newname"
fi
done
This one won't rename sub directories, only regular files.

Resources