Tar backup keeps increasing in size

I've got a backup script that's supposed to back up the whole system, but the archive keeps getting bigger from one day to the next. I've set crontab to run one backup each day.
TIME=$(date +%b-%d-%y)
FILENAME=backup-$TIME.tar.gz
SRCDIR=/
DESDIR=/home/backup
tar -cpzf "$DESDIR/$FILENAME" --exclude=/home/backup --exclude=/tmp --exclude=/sys --exclude=/dev "$SRCDIR"
Is there some other constantly changing directory that I need to exclude as well?
Thanks in advance.

The du -hs command shows the disk space used by a directory. You can cd / and then run du -h --max-depth=1 to check each top-level directory's size. From there you can tell why the tar archive keeps growing.
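To see which directory grew between two backups, you can save du's per-directory totals every day and compare them. A minimal sketch, assuming GNU du's tab-separated "size<TAB>path" output and two snapshot files you create yourself; the function name du_growth and the snapshot paths are made up for illustration:

```shell
#!/bin/bash
# du_growth OLD NEW: compare two snapshots written by
#     du -xk --max-depth=1 / > /var/tmp/du-$(date +%F).txt
# and print "<growth in KiB><TAB><directory>" for every directory that grew
# (or is new), largest growth first. Directories that shrank are skipped.
du_growth() {
    awk -F'\t' '
        NR == FNR { old[$2] = $1; next }                     # 1st file: remember old sizes
        !($2 in old) { printf "%d\t%s\n", $1, $2; next }     # new directory: all of it is growth
        $1 + 0 > old[$2] + 0 { printf "%d\t%s\n", $1 - old[$2], $2 }
    ' "$1" "$2" | sort -rn
}
```

For example, run the du line from cron just before the backup, then call du_growth with yesterday's and today's snapshot files. Note that -x keeps du on the root filesystem, so it ignores /proc and any other separately mounted filesystem (including /home, if it is one).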

You can also exclude /var/log.
If you have a database on the system, it's expected that the backup keeps growing along with it.
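Putting that together, here is a sketch of the question's script with a longer exclude list. The extra entries are assumptions to adjust: /proc and /run only contain virtual or runtime data, and /var/log is excluded as suggested above. backup_root and EXCLUDES are illustrative names, and the exit-status check relies on GNU tar reporting "file changed as we read it" with status 1.

```shell
#!/bin/bash
# Assumed exclude list (adjust to taste): virtual/runtime filesystems plus logs.
EXCLUDES=(/tmp /sys /dev /proc /run /var/log)

# backup_root SRC DESTDIR: write DESTDIR/backup-<Mon-dd-yy>.tar.gz of SRC,
# skipping DESTDIR itself and every path in EXCLUDES; print the archive path.
backup_root() {
    local src=$1 destdir=$2 path file
    file="$destdir/backup-$(date +%b-%d-%y).tar.gz"
    local args=(--exclude="$destdir")
    for path in "${EXCLUDES[@]}"; do
        args+=(--exclude="$path")
    done
    tar -cpzf "$file" "${args[@]}" "$src"
    # GNU tar exits with 1 when files changed while being read (normal on a
    # live system); only treat status 2 or higher as a real failure.
    [ $? -le 1 ] || return 1
    printf '%s\n' "$file"
}
```

From cron you would call backup_root / /home/backup.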

Related

How to show the disk usage of each subdirectory in Linux?

I have a directory, /var/lib/docker, which contains several subdirectories:
/var/lib/docker$ sudo ls
aufs containers image network plugins swarm tmp trust volumes
I'd like to find out how big each directory is. However, using the du command as follows,
/var/lib/docker$ sudo du -csh .
15G .
15G total
I don't see the 'breakdown' for each directory. In the examples I've seen in http://www.tecmint.com/check-linux-disk-usage-of-files-and-directories/, however, it seems that I should see it. How might I obtain this overview to see which directory is taking up the most space?

Use an asterisk to get info for each directory, like this:
sudo du -hs *
It will output something like this:
0 backup
0 bin
70M boot
0 cfg
8.0K data
0 dev
140K docs
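If you want the list ordered by size, with hidden directories included (the plain * glob skips names starting with a dot), a small bash sketch could look like this; largest_entries is a made-up name, and sizes are in KiB:

```shell
#!/bin/bash
# largest_entries DIR: disk usage in KiB of each entry directly inside DIR,
# smallest first, so the biggest consumers end up at the bottom.
largest_entries() {
    (
        shopt -s dotglob nullglob       # include dot entries; empty DIR -> no args
        entries=("$1"/*)
        [ ${#entries[@]} -gt 0 ] || exit 0
        du -sk -- "${entries[@]}" | sort -n
    )
}
```

With GNU coreutils, du -sh -- * | sort -h gives a human-readable version of the same idea.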

ncdu is also a nice way to analyze disk usage. It lets you quickly navigate through subdirectories and identify the largest directories and files.
It should be available in most distributions' official repositories.

Try using the --max-depth argument. It prints the total disk space usage for a directory (or file, with --all) only if it is N or fewer levels below the command-line argument.
For example, the following command shows the disk space usage of subdirectories up to 3 levels deep:
du --max-depth=3 -h
For N levels, use du --max-depth=N -h, where N is a non-negative integer.

Let the shell expand the directory contents:
du -h *

Call du for each directory (the -print0/-0 pair keeps names with spaces intact):
find . -type d -print0 | xargs -0 du -csh

In addition,
du -h <directory> should do it.

You can also simply use:
du /directory
This prints the space used by each of the subdirectories inside the parent directory (recursively), followed by the parent's total.

docker container size much greater than actual size

I am trying to build an image from debian:latest. After the build, the virtual size of the image reported by the docker images command is 1.917 GB. I logged in to check the size (du -sh /) and it's 573 MB. I am pretty sure that such a huge size is not normal. What is going on here? How do I get the correct size of the image? More importantly, when I push this repository the size is 1.9 GB and not 573 MB.
Output of du -sh /*
8.9M /bin
4.0K /boot
0 /dev
1.1M /etc
4.0K /home
30M /lib
4.0K /lib64
4.0K /media
4.0K /mnt
4.0K /opt
du: cannot access '/proc/11/task/11/fd/4': No such file or directory
du: cannot access '/proc/11/task/11/fdinfo/4': No such file or directory
du: cannot access '/proc/11/fd/4': No such file or directory
du: cannot access '/proc/11/fdinfo/4': No such file or directory
0 /proc
427M /root
8.0K /run
3.9M /sbin
4.0K /srv
0 /sys
8.0K /tmp
88M /usr
15M /var

Do you build that image via a Dockerfile? If so, take care with your RUN statements. Each RUN statement creates a new image layer, which remains in the image's history and counts toward the image's total size.
So, for instance, if one RUN statement downloads a huge archive file, the next one unpacks it, and a later one deletes the archive, the archive and its extracted files still remain in the image's history:
RUN curl <options> http://example.com/my/big/archive.tar.gz
RUN tar xvzf <options>
RUN <do whatever you need to do with the unpacked files>
RUN rm archive.tar.gz
A more efficient approach in terms of image size is to combine multiple steps in one RUN statement using the && operator, like:
RUN curl <options> http://example.com/my/big/archive.tar.gz \
&& tar xvzf <options> \
&& <do whatever you need to do with the unpacked files> \
&& rm archive.tar.gz
That way you can clean up files and folders that you need for the build process but not in the resulting image, and keep them out of the image's history as well. This is a quite common pattern to keep image sizes small.
The trade-off, of course, is that you no longer have a fine-grained layer history that you could reuse.
Update:
Like RUN statements, ADD statements also create new image layers. Whatever you add to an image that way stays in its history and counts toward the total image size. You cannot temporarily ADD things and then remove them so that they do not count toward the total size.
Try to ADD as little as possible to the image, especially when you work with large files. Is there another way to get those files temporarily within a RUN statement, so that you can clean up during the same RUN execution? E.g. RUN git clone <your repo> && <do stuff> && rm -rf <clone dir>?
A good practice is to ADD only the things that are meant to stay in the image. Temporary things should be fetched and cleaned up within a single RUN statement instead, where possible.

The 1.9 GB size is not just the image's final filesystem; it is the image plus all the layers in its history. Use docker history <image> to check what takes so much space.
See also Why are Docker container images so large?
To reduce the size, you can change the way you build the image (what helps depends on what you do; see the answers to the question linked above), use docker export (see How to flatten a Docker image?), or use other tools.
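To see which layers are the big ones, you can feed docker history's output through sort. A sketch, assuming your Docker CLI accepts --human=false (raw byte sizes) and the {{.Size}} and {{.CreatedBy}} template fields; largest_layers is a hypothetical helper name:

```shell
#!/bin/bash
# largest_layers [N]: read "<bytes> <created-by command>" lines on stdin and
# print the N largest layers (default 5), biggest first, in decimal MB
# (1 MB = 1,000,000 bytes).
largest_layers() {
    sort -k1,1nr |
        head -n "${1:-5}" |
        awk '{
            size = $1
            sub(/^[^ ]+ /, "")              # drop the size field, keep the command
            printf "%8.1f MB  %s\n", size / 1000000, $0
        }'
}
```

Usage: docker history --no-trunc --human=false --format '{{.Size}} {{.CreatedBy}}' <image> | largest_layers 5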

Mount /var on ramdisk at boot - Bash Script Issue

I have an embedded device where I need to put /var and /tmp in RAM in order to reduce the number of writes to the drive (CompactFlash). I know how to do it for /tmp, as I don't have to recover anything from it when I reboot or shut down.
But the /var directory holds important data. While researching I found the script below, but it doesn't seem to be working.
Here is the script:
# insert this on file 'rc.sys.init'
# after the mount of the root file system
# to create the /var on ramdisk
echo "Create ramdisk........."
#dd if=/dev/zero of=/dev/ram0 bs=1k count=16384
mkfs.ext2 -j -m 0 -q -L ramdisk /dev/ram0
if [ ! -d /mnt/ramdisk ]; then
    mkdir -p /mnt/ramdisk
fi
mount /dev/ram0 /mnt/ramdisk
if [ -L /var ]; then
    tar -xf /vartmp.tar -C /mnt/ramdisk
else
    tar -C / -cf /vartmp.tar var
    cp -a /var /mnt/ramdisk
    rm -rf /var
    ln -s /mnt/ramdisk/var /var
fi

# insert this into file 'halt'
# to stop the ram disk properly on shutdown.
#
if [ -e /vartmp.tar ]; then
    rm -f /vartmp.tar
fi
tar -C /mnt/ramdisk -cf /vartmp.tar var
Is there any problem with this script? If not, in which initialization and termination scripts should I include these snippets?

For anyone who has the same problem: I have solved it (kind of).
The two scripts I posted are correct and do the job. What you have to be careful about is where you put them.
In Slackware the first script that runs is rc.S. At first I pasted my first script into the middle of it. It definitely belongs there, just not where I put it: look for the point where rc.S first uses a particular directory or file from /var, and create the ramdisk before those lines.
The shutdown part should be added at the bottom of the rc.6 script (the shutdown script).
I should also point out that although this improves the life expectancy of the drive, the setup is a little volatile and sometimes reboots randomly, so be careful.

Nice script, but it seems to me that it is volatile for several reasons. First, did you tell the system the maximum ramdisk size? Do that as a kernel argument (linux /vmlinuz ramdisk_size=204800), then in rc create the filesystem with mke2fs -t ext2 /dev/ram1 204800, and maybe use ram1 rather than ram0. Also, use a script to manually save the ramdisk contents to /var (cp -a /mnt/ramdisk/var/. /var), and back up the real /var to another directory using tar compression. Then again, introducing tar compression to reduce the data size probably adds lag, latency and instability. It just seems so to me.
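One way to make the shutdown save less fragile is to never delete the old /vartmp.tar before the new one is complete: write the new archive next to it and rename it into place. A sketch, assuming the /mnt/ramdisk and /vartmp.tar layout from the question (passed in as parameters here); save_ramdisk is a made-up name. The same function could also be run from cron for the periodic manual save suggested above.

```shell
#!/bin/bash
# save_ramdisk RAMDIR TARBALL: archive RAMDIR/var into TARBALL without ever
# leaving TARBALL missing or half-written (unlike "rm -f TARBALL; tar -cf").
save_ramdisk() {
    local ramdir=$1 tarball=$2
    local tmp="$tarball.new"
    # Write the new snapshot next to the old one first; on failure, keep the old one.
    tar -C "$ramdir" -cf "$tmp" var || { rm -f "$tmp"; return 1; }
    # Flush it to the flash card before switching over.
    sync
    # rename(2) within one directory replaces the old snapshot atomically.
    mv -f "$tmp" "$tarball"
}
```

In the halt script this would replace the rm/tar pair with save_ramdisk /mnt/ramdisk /vartmp.tar.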

Copying entire contents of a server

I need to copy the whole contents of a linux server, but I'm not sure how to do it recursively.
I have a migration script to run on the server itself, but it won't run because the disk is full, so I need something I can run remotely that just gets everything.

How about:
scp -r root@remotebox:/ your_local_copy

sudo rsync -hxDPavil -H --stats --delete / remote:/backup/
This copies everything (permissions, owners, timestamps, devices, sockets, hard links, etc.). It also deletes files in the destination that no longer exist in the source. (Note that -x means it only copies files on the same filesystem as the source, i.e. it does not cross mount points.)
If you want to preserve owners but the receiving end does not have the same users and groups, use --numeric-ids.
To automate incremental backups with snapshots, look at rdiff-backup or rsnapshot.
Also, GNU tar is highly underrated:
sudo tar -cpf - / | ssh remote 'cd /backup && tar -xvf -'

How to Free Inode Usage?

I have a disk drive where the inode usage is 100% (using df -i command).
However after deleting files substantially, the usage remains 100%.
What's the correct way to do it then?
How is it possible that a disk drive with less disk space usage can have
higher inode usage than a disk drive with higher disk space usage?
If I zip a lot of files, would that reduce the used inode count?

If you are very unlucky, you have used about 100% of all inodes and can't even create the script.
You can check this with df -ih.
Then this bash command may help you:
sudo find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n
And yes, this will take time, but you can locate the directory with the most files.

It's quite easy for a disk to have a large number of inodes used even if the disk is not very full.
An inode is allocated to a file so, if you have gazillions of files, all 1 byte each, you'll run out of inodes long before you run out of disk.
It's also possible that deleting files will not reduce the inode count if the files have multiple hard links. As I said, inodes belong to the file, not the directory entry. If a file has two directory entries linked to it, deleting one will not free the inode.
Additionally, you can delete a directory entry but, if a running process still has the file open, the inode won't be freed.
My initial advice would be to delete all the files you can, then reboot the box to ensure no processes are left holding the files open.
If you do that and you still have a problem, let us know.
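To check for the hard-link case before deleting, GNU find can list files whose inode has more than one name. A sketch; multi_linked is an illustrative name and -printf is a GNU find extension:

```shell
#!/bin/bash
# multi_linked DIR: list regular files under DIR (same filesystem only) whose
# inode has more than one hard link, as "<inode><TAB><links><TAB><path>".
# sort -n groups the names that share an inode next to each other.
multi_linked() {
    find "$1" -xdev -type f -links +1 -printf '%i\t%n\t%p\n' | sort -n
}
```

All names of such a file must be deleted (and no process may still hold it open) before its inode is released.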
By the way, if you're looking for the directories that contain lots of files, this script may help:
#!/bin/bash
# count_em - count files in all subdirectories under current directory.
echo 'echo $(ls -a "$1" | wc -l) $1' >/tmp/count_em_$$
chmod 700 /tmp/count_em_$$
find . -mount -type d -print0 | xargs -0 -n1 /tmp/count_em_$$ | sort -n
rm -f /tmp/count_em_$$

My situation was that I was out of inodes and I had already deleted about everything I could.
$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 942080 507361 11 100% /
I am on Ubuntu 12.04 LTS and could not remove the old Linux kernels, which took up about 400,000 inodes, because apt was broken by a missing package. And I couldn't install the missing package because I was out of inodes, so I was stuck.
I ended up deleting a few old Linux kernels by hand to free up about 10,000 inodes:
$ sudo rm -rf /usr/src/linux-headers-3.2.0-2*
This was enough to then let me install the missing package and fix my apt
$ sudo apt-get install linux-headers-3.2.0-76-generic-pae
and then remove the rest of the old linux kernels with apt
$ sudo apt-get autoremove
things are much better now
$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 942080 507361 434719 54% /

My solution:
Check whether this is an inode problem with:
df -ih
Find root folders with a large inode count:
for i in /*; do echo "$i"; find "$i" | wc -l; done
Then narrow it down to specific folders:
for i in /src/*; do echo "$i"; find "$i" | wc -l; done
If it is Linux headers, try to remove the oldest with:
sudo apt-get autoremove linux-headers-3.13.0-24
Personally, I moved them to a mounted folder (because the last command failed for me) and then installed the latest with:
sudo apt-get autoremove -f
This solved my problem.

I had the same problem and fixed it by removing PHP's sessions directory:
rm -rf /var/lib/php/sessions/
It may be under /var/lib/php5 if you are using an older PHP version.
Recreate it with the following permissions:
mkdir /var/lib/php/sessions/ && chmod 1733 /var/lib/php/sessions/
The default permissions for this directory on Debian showed drwx-wx-wt (1733).

We experienced this on a HostGator account (they place inode limits on all their hosting) following a spam attack. It left vast numbers of queue records in /root/.cpanel/comet. If this happens and you find you have no free inodes, you can run this cPanel utility from the shell:
/usr/local/cpanel/bin/purge_dead_comet_files

You can use rsync to delete a large number of files:
rsync -a --delete blanktest/ test/
Create a blanktest folder with no files in it; the command then syncs your test folder (the one with the large number of files) to it, deleting everything inside. I have deleted nearly 5M files using this method.
Thanks to http://www.slashroot.in/which-is-the-fastest-method-to-delete-files-in-linux
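If rsync isn't installed, find's -delete action is another way to empty a huge directory without hitting "Argument list too long", since it unlinks entries one by one instead of expanding a glob. This swaps in find for rsync; empty_dir is an illustrative name:

```shell
#!/bin/bash
# empty_dir DIR: delete everything inside DIR but keep DIR itself.
# -delete (GNU and BSD find) implies -depth, so files are removed before the
# directories that contain them; no argument list is ever built.
empty_dir() {
    find "$1" -mindepth 1 -delete
}
```

For the session case above, that would be empty_dir /var/lib/php/sessions, which also keeps the directory and its 1733 permissions in place.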

Late answer:
In my case, it was my session files under
/var/lib/php/sessions
that were using up the inodes.
I couldn't even open my crontab or create a new directory, let alone trigger the deletion.
Since I use PHP, I followed the session_gc() guide in the PHP manual: I copied the code from example 1 and set up a cron job to execute it.
<?php
// Note: This script should be executed by the same user of web server process.
// Need active session to initialize session data storage access.
session_start();
// Executes GC immediately
session_gc();
// Clean up session ID created by session_gc()
session_destroy();
?>
If you're wondering how I managed to open my crontab: I deleted some sessions manually through the CLI first.
Hope this helps!

First, get the inode usage:
df -i
The next step is to find those files. For that, we can use a small script that lists the directories and the number of files in them.
for i in /*; do echo "$i"; find "$i" | wc -l; done
From the output, you can see which directory uses a large number of files; then repeat the script for that directory as below. Repeat it until you find the suspect directory.
for i in /home/*; do echo "$i"; find "$i" | wc -l; done
When you find the suspect directory with a large number of unwanted files, just delete them to free up some inodes with the following command:
rm -rf /home/bad_user/directory_with_lots_of_empty_files
That should solve the problem. Check the inode usage again with the df -i command, and you should see the difference:
df -i
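The manual "repeat until you find it" step can be automated. A bash sketch, assuming GNU find and stat; drill_down is a made-up name. At each level it counts all entries (files and directories both use inodes) under every immediate subdirectory, stays on the starting filesystem, and descends into the biggest one:

```shell
#!/bin/bash
# drill_down DIR: starting at DIR, repeatedly step into the immediate
# subdirectory holding the most entries and print "<count><TAB><path>" for
# each step. Subdirectories on other filesystems (e.g. /proc) are skipped.
drill_down() {
    local dir=$1 dev best best_count sub count
    dev=$(stat -c %d "$dir")                  # device of the starting directory
    while :; do
        best='' best_count=0
        # -print0 with read -d '' copes with spaces and newlines in names;
        # -type d does not follow symlinks, so we cannot loop through one.
        while IFS= read -r -d '' sub; do
            [ "$(stat -c %d "$sub")" = "$dev" ] || continue
            count=$(find "$sub" -xdev | wc -l)
            if [ "$count" -gt "$best_count" ]; then
                best=$sub best_count=$count
            fi
        done < <(find "$dir" -mindepth 1 -maxdepth 1 -type d -print0)
        [ -n "$best" ] || break               # no subdirectories left
        printf '%s\t%s\n' "$best_count" "$best"
        dir=$best
    done
}
```

For example, run drill_down / as root to follow the largest branch down from the root filesystem.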

eAccelerator could be causing the problem, since it compiles PHP into cached blocks. I've had this problem with an Amazon AWS server on a site with heavy load. If you continue to have issues, free up inodes by deleting the eAccelerator cache in /var/cache/eaccelerator:
rm -rf /var/cache/eaccelerator/*
(or whatever your cache directory is)

We faced a similar issue recently. If a process still refers to a deleted file, its inode is not released, so check lsof / and kill or restart the process to release the inodes.
Correct me if I am wrong here.
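If lsof isn't available, the same information can be read from /proc on Linux: each /proc/<pid>/fd entry is a symlink whose target ends in " (deleted)" once the file has been unlinked. A sketch; deleted_open_files is an illustrative name, and lsof +L1 (open files with a link count below 1) is the lsof way to get a similar list:

```shell
#!/bin/bash
# deleted_open_files: print "<pid><TAB><target>" for every open file
# descriptor that still points to a deleted file. Run as root to see other
# users' processes; unreadable or vanished /proc entries are skipped.
deleted_open_files() {
    local fd target pid
    for fd in /proc/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)')
                pid=${fd#/proc/}
                printf '%s\t%s\n' "${pid%%/*}" "$target"
                ;;
        esac
    done
}
```

Restarting the listed processes (or closing the files) is what finally releases those inodes.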

As mentioned before, a filesystem may run out of inodes if there are a lot of small files. I have provided some means to find the directories that contain the most files here.

One of the answers above suggested that PHP sessions were the cause of running out of inodes, and in our case that is exactly what it was. To add to that answer, I would suggest checking the php.ini file to ensure session.gc_probability = 1, session.gc_divisor = 1000 and
session.gc_maxlifetime = 1440. In our case session.gc_probability was set to 0, which caused this issue.
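If you would rather keep session.gc_probability at 0 and clean up from cron instead, a find-based sketch follows. It assumes PHP's default "files" session handler, which stores each session as a sess_<id> file in the save path (for example /var/lib/php/sessions); clean_sessions is an illustrative name. session.gc_maxlifetime is in seconds, so 1440 corresponds to 24 minutes:

```shell
#!/bin/bash
# clean_sessions DIR MINUTES: delete sess_* files directly inside DIR that
# have not been modified for more than MINUTES minutes.
clean_sessions() {
    find "$1" -mindepth 1 -maxdepth 1 -type f -name 'sess_*' -mmin "+$2" -delete
}
```

For example, a root cron job could run clean_sessions /var/lib/php/sessions 24 every 30 minutes.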

This article saved my day:
https://bewilderedoctothorpe.net/2018/12/21/out-of-inodes/
find . -maxdepth 1 -type d | grep -v '^\.$' | xargs -I{} find {} -xdev -type f | cut -d "/" -f 2 | uniq -c | sort -n

On a Raspberry Pi I had a problem with the /var/cache/fontconfig directory containing a large number of files. Removing it took more than an hour, and of course rm -rf *.cache* raised an "Argument list too long" error. I used this instead:
find . -name '*.cache*' -print0 | xargs -0 rm -f

You can see per-directory counts with:
for i in /var/run/*; do echo -n "$i "; find "$i" | wc -l; done | column -t

For those who use Docker and end up here:
When df -i says 100% inode use,
just run docker rmi $(docker images -q)
It leaves your created containers (running or exited) alone but removes every image that isn't referenced anymore, freeing a whole bunch of inodes; I went from 100% back to 18%!
It might also be worth mentioning that I use a lot of CI/CD with a Docker runner set up on this machine.

It could be the /tmp folder, where temporary files are stored (for example by yarn and npm script execution, especially if you start a lot of Node scripts). On systems that clear /tmp at boot, you just have to reboot your device or server and the temporary files you don't need will be deleted. For me, usage went from 100% to 23%!

There are many answers to this one so far, and all of the above seem concrete. I think you'll be safe using stat as you go along, but depending on the OS, some inode errors may creep up on you. Implementing your own stat call functionality with 64-bit types to avoid overflow issues seems fairly compatible.

Run the sudo apt-get autoremove command;
in some cases it works. If unused old header packages exist, they will be cleaned up.

If you use Docker, remove all images. They use a lot of space.
Stop all containers
docker stop $(docker ps -a -q)
Delete all containers
docker rm $(docker ps -a -q)
Delete all images
docker rmi $(docker images -q)
Worked for me.
