I am relatively new to Linux. In one of our projects, we use amazon's EC2 instance for processing of some files. We upload files to S3 server after processing. EC2 instance is booted using an existing AMI
Recently I got an error no space left on disk, hence processing of files was halted. I cleaned up some older files and the processing continued.
Now when I look at available space using df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 9.9G 5.7G 3.7G 61% /
none 3.7G 0 3.7G 0% /dev/shm
/dev/xvdb 414G 199M 393G 1% /mnt
/dev/xvdc 414G 199M 393G 1% /data
I can see my files are effecting only /dev/xvda1.
I have following queries
What is the use of other partitions when I can see my files only effecting /dev/xvda1
It looks like we are only using 10 GB of space effectively and other is being wasted. How can I use other space? Can I move some disk space to /dev/xvda1 or directly store files in other areas?
As you can see from the output of df -h, there are two large partitions mouted on /mnt and /data respectively. I suggest that you use those partitions by processing the files in one of those directories. If you cannot move where the processing happens for some reason, you can remount the partitions in the appropriate place.
If for example your files are processed in the directory /var/mydir and you cannot change that, do the following (as root):
umount /mnt
mount /dev/xvdb /var/mydir
You can use the other partition as well of course if you prefer that.
Related
when i'm writing df -h in my instance i'm getting this data:
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.7G 0 7.7G 0% /dev
tmpfs 7.7G 0 7.7G 0% /dev/shm
tmpfs 7.7G 408K 7.7G 1% /run
tmpfs 7.7G 0 7.7G 0% /sys/fs/cgroup
/dev/nvme0n1p1 32G 24G 8.5G 74% /
tmpfs 1.6G 0 1.6G 0% /run/user/1000
but when i'm clicking sudo du -sh / i'm getting:
11G /
So in df -h, / size is 24G but in du -sh same directory the size is 11G.
I'm trying to get some free space on my instance and can't find the files that cause that.
What i'm missing?
did df -h is really giving fake data?
This question comes up quite often. The file system allocates disk blocks in the file system to record its data. This data is referred to as metadata which is not visible to most user-level programs (such as du). Examples of metadata are inodes, disk maps, indirect blocks, and superblocks.
The du command is a user-level program that isn't aware of filesystem metadata, while df looks at the filesystem disk allocation maps and is aware of file system metadata. df obtains true filesystem statistics, whereas du sees only a partial picture.
There are many causes on why the disk space used or available when running the du or df commands differs.
Perhaps the most common is deleted files. Files that have been deleted may still be open by at least one process. The entry for such files is removed from the associated directory, which makes the file inaccessible. Therefore the command du which only counts files does not take these files into account and comes up with a smaller value. As long as a process still has the deleted file in use, however, the associated blocks are not yet released in the file system, so df which works at the kernel level correctly displays these as occupied. You can find out if this is the case by running the following:
lsof | grep '(deleted)'
The fix for this issue would be to restart the services that still have those deleted files open.
The second most common cause is if you have a partition or drive mounted on top of a directory with the same name. For example, if you have a directory under / called backup which contains data and then you mount a new drive on top of that directory and label it /backup but it contains no data then the space used will show up with the df command even though the du command shows no files.
To determine if there are any files or directories hidden under an active mount point, you can try using a bind-mount to mount your / filesystem which will enable me to inspect underneath other mount points. Note, this is recommended only for experienced system administrators.
mkdir /tmp/tmpmnt
mount -o bind //tmp/tmpmnt
du /tmp/tmpmnt
After you have confirmed that this is the issue, the bind mount can be removed by running:
umount /tmp/tmpmnt/
rmdir /tmp/tmpmnt
Another possible cause might be filesystem corruption. If this is suspected, please make sure you have good backups, and at your convenience, please unmount the filesystem and run fsck.
Again, this should be done by experienced system administrators.
You can also check the calculation by running:
strace -e statfs df /
This will give you output similar to:
statfs("/", {f_type=XFS_SB_MAGIC, f_bsize=4096, f_blocks=20968699, f_bfree=17420469,
f_bavail=17420469, f_files=41942464, f_ffree=41509188, f_fsid={val=[64769, 0]},
f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vda1 83874796 14192920 69681876 17% /
+++ exited with 0 +++
Notice the difference between f_bfree and f_bavail? These are the free blocks in the filesystem vs free blocks available to an unprivileged user. The used column is merely a calculation between the two.
Hope this will make your idea clear. Let me know if you still have any doubts.
I don't know actually if this is more a "classic" linux or a docker question but:
On an VM where some of my docker containers are running I've a strange thing. /var/lib/docker is an own partitionwith 20GB. When I look over the partition with df -h I see this:
eti-gwl1v-dockerapp1 root# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 815M 7.0G 11% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/sda2 12G 3.2G 8.0G 29% /
/dev/sda7 3.9G 17M 3.7G 1% /tmp
/dev/sda5 7.8G 6.8G 649M 92% /var
/dev/sdb2 20G 47M 19G 1% /usr2
/dev/sdb1 20G 2.9G 16G 16% /var/lib/docker
So usage is at 16%. But when I now navigate to /var/lib and do a du -sch docker I see this:
eti-gwl1v-dockerapp1 root# cd /var/lib
eti-gwl1v-dockerapp1 root# du -sch docker
19G docker
19G total
eti-gwl1v-dockerapp1 root#
So same directory/partition but two sizes? How is that going?
This is really a question for unix.stackexchange.com, but there is filesystem overhead that makes the partition larger than the total size of the individual files within it.
du and df show you two different metrics:
du shows you the (estimated) file space usage, i.e. the sum of all file sizes
df shows you the disk space usage, i.e. how much space on the disk is actually used
These are distinct values and can often diverge:
disk usage may be bigger than the mere sum of file sizes due to additional meta data: e.g. the disk usage of 1000 empty files (file size = 0) is >0 since their file names and permissions need to be stored
the space used by one or multiple files may be smaller than their reported file size due to:
holes in the file - block consisting of only null bytes are not actually written to disk, see sparse files
automatic file system compression
deduplication through hard links or copy-on-write
Since docker uses the image layers as a means of deduplication the latter is most probably the cause of your observation - i.e. the sum of the files is much bigger because most of them are shared/deduplicated through hard links.
du estimates filesystem usage through summing the size of all files in it. This does not deal well with the usage of overlay2: there will be many directories which contain the same files as contained in another, but overlaid with additional layers using overlay2. As such, du will show a very inflated number.
I have not tested this since my Docker daemon is not using overlay2, but using du -x to avoid going into overlays could give the right amount. However, this wouldn't work for other Docker drivers, like btrfs, for example.
I have mounted 600 GB of extra space in my AWS EC2 at /data. But as I started using Jenkins i realized that My Jenkins is not using any of that extra space and now I am left with only 1.5 GB of storage.
Is there any way to merge the extra storage with root storage?
Result of df -h command
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.9G 68K 7.9G 1% /dev
tmpfs 7.9G 4.0K 7.9G 1% /dev/shm
/dev/xvda1 7.9G 6.3G 1.5G 82% /
/dev/xvdf 600G 1.8G 598G 1% /data
I want to merge /dev/xvda1 and /dev/xvdf.
Is it even possible?
Edit: Someone suggested to move my jenkins to new drive. If it will not not hamper my current work then i think it will be a good solution. Any opinions on this?
Quick way:
you can stop your jenkins instance and create AMI image on it.
Then create base on this image as a new EC2 instance with large storage directly.
We have a few datasets of small images, where each image is about 100KB, and there about 50K images per dataset (around 5GB each dataset). We typically use these datasets to batch-load each image incrementally into a memory of a Google VM instance in order to perform machine learning studies. This is done several times a day.
Currently, a few of us each have our own Google Persistent Disk attached to the VM with the datasets replicated on each. This is not ideal since they are pricey, however, data access is very fast which allows us to iterate on our studies fairly rapidly. We don't share one disk because of the inconvenience of having to manage read/write settings with Google disks when sharing.
Is there an alternative Google Cloud option to handle this use case? Google Buckets are too slow since it is reading many small files.
If your main interest is having rapid I/O your best bet is using an SSD for obvious reasons. Why I don't understand is why you don't want to share one disk. You can have one SSD attached to one of your instances as R/W for loading and modifying your datasets and mounting it read-only to the instances that need to fetch the data.
I'm not sure how faster will be this solution compared to using a bucket, though. I guess you are aware that gsutil has an option for multithreading transfers, which exponentially increases the data transfer speed, specially when transfering a lot of small files? The flag is -m
-m Causes supported operations (acl ch, acl set, cp, mv, rm, rsync,
and setmeta) to run in parallel. This can significantly improve
performance if you are performing operations on a large number of
files over a reasonably fast network connection.
gsutil performs the specified operation using a combination of
multi-threading and multi-processing, using a number of threads
and processors determined by the parallel_thread_count and
parallel_process_count values set in the boto configuration
file. You might want to experiment with these values, as the
best values can vary based on a number of factors, including
network speed, number of CPUs, and available memory.
Using the -m option may make your performance worse if you
are using a slower network, such as the typical network speeds
offered by non-business home network plans. It can also make
your performance worse for cases that perform all operations
locally (e.g., gsutil rsync, where both source and destination
URLs are on the local disk), because it can "thrash" your local
disk.
If a download or upload operation using parallel transfer fails
before the entire transfer is complete (e.g. failing after 300 of
1000 files have been transferred), you will need to restart the
entire transfer.
Also, although most commands will normally fail upon encountering
an error when the -m flag is disabled, all commands will
continue to try all operations when -m is enabled with multiple
threads or processes, and the number of failed operations (if any)
will be reported at the end of the command's execution.
If you want to go with the instance with R/W SSD and multiple read only clients see below:
One option is to set up an NFS on your SSD, one instance will act as the NFS server with R/W rights and the rest will have only read permissions. I will be using Ubuntu 16.04 but the process is similar in all distros:
1 - Install the required packages on both server and clients:
Server: sudo apt install nfs-kernel-server
Client: sudo apt install nfs-common
2 - Mount the disk SSD disk on the server (after formatting it to the filesystem you want to use):
Server:
jordim#instance-5:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 0 50G 0 disk <--- My extra SSD disk
sda 8:0 0 10G 0 disk
└─sda1 8:1 0 10G 0 part /
jordim#instance-5:~$ sudo fdisk /dev/sdb
(I will create a single primary ext4 partition)
jordim#instance-5:~$ sudo fdisk /dev/sdb
(create partition)
jordim#instance-5:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 0 50G 0 disk
└─sdb1 8:17 0 50G 0 part <- Newly created partition
sda 8:0 0 10G 0 disk
└─sda1 8:1 0 10G 0 part /
jordim#instance-5:~$ sudo mkfs.ext4 /dev/sdb1
(...)
jordim#instance-5:~$ sudo mkdir /mount
jordim#instance-5:~$ sudo mount /dev/sdb1 /mount/
Make a dir for your NFS share folder:
jordim#instance-5:/mount$ sudo mkdir shared
Now configure the exports on your server. Add the folder to share and the private IPs of the clients. Also you can tweak permissions here, use "ro" for "read only" or "rw" for read-write permissions.
jordim#instance-5:/mount$ sudo vim /etc/exports
(inside the exports file, note the IP is the private IP of the client instance):
/mount/share 10.142.0.5(ro,sync,no_subtree_check)
Now start the nfs service on the server:
root#instance-5:/mount# systemctl start nfs-server
Now to create the mountpoint on the client:
jordim#instance-4:~$ sudo mkdir -p /nfs/share
And mount the folder:
jordim#instance-4:~$ sudo mount 10.142.0.6:/mount/share /nfs/share
Now let's test it:
Server:
jordim#instance-5:/mount/share$ touch test
Client:
jordim#instance-4:/nfs/share$ ls
test
Also, see the mounts:
jordim#instance-4:/nfs/share$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 1.8G 0 1.8G 0% /dev
tmpfs 370M 9.9M 360M 3% /run
/dev/sda1 9.7G 1.5G 8.2G 16% /
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
tmpfs 370M 0 370M 0% /run/user/1001
10.142.0.6:/mount/share 50G 52M 47G 1% /nfs/share
There you go, now you have only one instance with a r/w disk and as many clients as you want with read only permissions.
Data deleted from /mnt directory after stop and start EC2 instance
[root#localhost opt]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 9.7G 1.4G 7.8G 15% /
none 1.9G 0 1.9G 0% /dev/shm
/dev/xvdb 394G 199M 374G 1% /mnt
I place my data in /mnt .I stop instance yesterday .
Afetr starting the instance today ,I didnt find any data form /mnt .
I have another from /opt .
How can i recover that data from /mnt .
If /mnt is Temporary mounting point .Then how can i use these all space
On EC2, /mnt directory is mounted to ephemeral storage.
After reboot or instance stop/start, all data is lost.
Please refer to this post.
It is a common misconception that a reboot/restart will wipe the ephemeral storage - this isn't true.
You can try it yourself and see.
What will is a stop/start - that actually deprovisions your VM, and then moves it to another host machine - which will have wiped ephemeral drive(s) and then starts it up with your root ebs (at least) attached. Stop/start and reboot are often conflated - but they are very different things here.
/mnt should really just be used for ephemeral data storage that is not critical if instance needs to be restarted. This is actually well suited for things such as local on-disk cache, temporary data storage, etc. as this ephemeral storage will oftentimes perform better from an I/O standpoint than say an EBS volume mount. Just understand that you should only place non-critical data there.