We have provisioned an 11-node EMR cluster (1 master + 10 core nodes) in AWS, with 100 GB of disk space per node.
When the cluster was provisioned, EMR automatically allocated only 10 GB to the root partition (/dev/xvda1). After a few days the root partition filled up, and we could no longer run any jobs or install basic software like git with yum.
[hadoop@<<ip address>> ~]$ df -BG
Filesystem  1G-blocks  Used  Available  Use%  Mounted on
devtmpfs          79G    1G        79G    1%  /dev
tmpfs             79G    0G        79G    0%  /dev/shm
/dev/xvda1        10G   10G         0G  100%  /
/dev/xvdb1         5G    1G         5G    4%  /emr
/dev/xvdb2        95G   12G        84G   12%  /mnt
/dev/xvdf         99G   12G        83G   12%  /data
Could you please help us resolve this issue?
How can we increase the root partition (/dev/xvda1) disk space to 30 GB?
By default, everything installed with yum or rpm goes to the root partition (/dev/xvda1). How can we keep software installations off the root partition?
Whatever the solution, it should not disturb the existing EMR installation.
Help would be much appreciated.
I recently ran into the same issue. Find the corresponding EC2 instance and, in its description tab, click the root device link. It points to an EBS volume ID; click on it, then choose "Modify Volume" from the Actions menu and request the new total size. You might additionally have to run commands such as growpart to let the OS adjust to the new size.
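For example, once the volume modification completes, something along these lines (assuming the root device is /dev/xvda and its first partition holds /) lets the OS pick up the new space:

# grow partition 1 of /dev/xvda to fill the enlarged EBS volume
sudo growpart /dev/xvda 1

# then grow the filesystem; use the command that matches your filesystem type
sudo resize2fs /dev/xvda1   # ext2/3/4
# sudo xfs_growfs -d /      # if the root filesystem is XFS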
All EMR AMIs come with a fixed 10 GB root volume, and so do all the EC2 instances of your EMR cluster. All applications that you select on EMR are installed on this root volume and are expected to take up about 90% of the disk. At this moment, neither increasing the volume size nor the application installation behavior can be altered. So, you should refrain from using this root volume to install applications and instead install your custom apps on bigger volumes like /mnt/. You can also symlink some root directories to bigger volumes and then install your apps.
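For example, a minimal sketch of the symlink approach (the directory names here are purely illustrative, not EMR-specific):

# relocate a space-hungry directory to the large /mnt volume and symlink it back
sudo mv /opt/myapp /mnt/myapp
sudo ln -s /mnt/myapp /opt/myapp

# the yum cache can be redirected the same way so package downloads don't fill /
sudo mv /var/cache/yum /mnt/var-cache-yum
sudo ln -s /mnt/var-cache-yum /var/cache/yum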
It seems like /var/aws/emr/packages takes up most of the space (about 30%). I don't know whether that folder can safely be removed (rm -rf /var/aws/emr/packages) or should be symlinked to /mnt, but removing it seems to have worked for me.
The EBS root volume size can also be increased at the time of launching the EMR cluster; the default is 10 GB.
Even once the EMR cluster is up and running, the root volume can still be increased. Refer to this AWS article -> https://aws.amazon.com/premiumsupport/knowledge-center/ebs-volume-size-increase/
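For example, with the AWS CLI the root volume size can be requested at launch via --ebs-root-volume-size (the other options below are just placeholders for this cluster):

aws emr create-cluster \
  --name "my-cluster" \
  --release-label emr-5.30.0 \
  --applications Name=Hadoop Name=Spark \
  --instance-type m5.xlarge \
  --instance-count 11 \
  --use-default-roles \
  --ec2-attributes KeyName=my-key \
  --ebs-root-volume-size 30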
I want to reduce the size of an EBS volume from 250 GB to 100 GB. I know it can't be done directly from the console, so I have tried a few links like "Decrease the size of EBS volume in your EC2 instance" and "Amazon EBS volumes: How to Shrink ’em Down to Size", which haven't helped me. Maybe those approaches work for plain data, but in my case I have to do it on /opt, which holds installations and configuration.
Please let me know if it is possible to do, and how.
Mount a new volume at /opt2, copy all the files from /opt with rsync (or similar), preserving links etc., update your /etc/fstab, and reboot.
If all is good, unmount the old volume from the EC2 instance.
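A rough sketch of that approach, assuming the new volume shows up as /dev/xvdg (the device name, filesystem, and fstab line are assumptions):

# format and mount the new, smaller volume
sudo mkfs -t ext4 /dev/xvdg
sudo mkdir /opt2
sudo mount /dev/xvdg /opt2

# copy /opt, preserving permissions, symlinks, hard links, ACLs and xattrs
sudo rsync -aHAXv /opt/ /opt2/

# look up the new volume's UUID and point the /opt entry in /etc/fstab at it, e.g.
sudo blkid /dev/xvdg
#   UUID=<uuid-from-blkid>  /opt  ext4  defaults,nofail  0  2
sudo reboot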
Hi Laurel and Jayesh, basically you have to follow these steps (a rough command sketch follows the list):
First, shut down the instance (MyInstance) to prevent any problems.
Create a new 6 GiB EBS volume.
Mount the new volume (myVolume)
Copy data from the old volume to the new volume (myVolume)
Use rsync to copy from the old volume to the new volume (myVolume): sudo rsync -axv / /mnt/myVolume/.
Wait until it’s finished. ✋
Install GRUB on the new volume (myVolume).
Log out from the instance and shut it down.
Detach the old volume and attach the new volume (myVolume) as /dev/xvda.
Start the instance; you will see it is now running with a 6 GiB EBS root volume.
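A rough command-line sketch of the steps above (device names, mount points, and the GRUB invocation are assumptions and depend on your distro):

# new 6 GiB volume assumed to be attached as /dev/xvdf
sudo mkfs -t ext4 /dev/xvdf
sudo mkdir -p /mnt/myVolume
sudo mount /dev/xvdf /mnt/myVolume

# copy the running root filesystem, staying on one filesystem (-x)
sudo rsync -axv / /mnt/myVolume/

# make the copy bootable; on GRUB 2 something like this, adjusted to your setup
sudo grub-install --boot-directory=/mnt/myVolume/boot /dev/xvdf

# then shut down, detach the old volume, attach the new one as /dev/xvda
# (or /dev/sda1, depending on the AMI), and start the instance again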
Reference: https://www.svastikkka.com/2021/04/create-custom-ami-with-default-6gib.html
I need a shared volume accessible from multiple pods for caching files in RAM on each node.
The problem is that the emptyDir volume provisioner (which supports Memory as its medium) is available in Volume spec but not in PersistentVolume spec.
Is there any way to achieve this, except by creating a tmpfs volume manually on each host and mounting it via local or hostPath provisioner in the PV spec?
Note that Docker itself supports such volumes:
docker volume create --driver local --opt type=tmpfs --opt device=tmpfs \
--opt o=size=100m,uid=1000 foo
I don't see any reason why k8s doesn't. Or maybe it does, but it's not obvious?
I tried playing with local and hostPath PVs with mountOptions but it didn't work.
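For reference, the manual per-node workaround I'm referring to would look roughly like this (names, sizes, and paths are illustrative):

# 1) create a tmpfs mount on each node
sudo mkdir -p /mnt/ram-cache
sudo mount -t tmpfs -o size=512m tmpfs /mnt/ram-cache

# 2) expose it to pods through a hostPath PersistentVolume
#    (ReadWriteOnce still lets multiple pods on the same node mount it)
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ram-cache-pv
spec:
  capacity:
    storage: 512Mi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/ram-cache
EOF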
emptyDir is tied to the lifetime of a pod, so it can't be shared between multiple pods.
What you are requesting is an additional feature, and if you look at the GitHub discussion below, you will see that you are not the first to ask for it.
consider a tmpfs storage class
Also, regarding your point that Docker supports tmpfs volumes: yes, it does, but you can't share such a volume between containers. From the documentation:
Limitations of tmpfs mounts:
Unlike volumes and bind mounts, you can’t share tmpfs mounts between containers.
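For completeness, the memory-backed emptyDir from the question does work, but only within a single pod (all containers of that pod can share it). A minimal sketch, with illustrative names and sizes:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: ram-cache-demo
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: cache
          mountPath: /cache
  volumes:
    - name: cache
      emptyDir:
        medium: Memory    # backed by tmpfs on the node
        sizeLimit: 100Mi
EOF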
I'm having trouble building and deploying new Docker containers on an Azure Web App on Linux.
The error logs claim it is out of space, and when looking at disk usage through Kudu I can see that I'm indeed out of space.
/>df -H gives:
Filesystem Size Used Avail Use% Mounted on
none 29G 28G 0 100% /
/dev/sda1 29G 28G 0 100% /etc/hosts
I have deployed several Docker containers in web apps before and removed them as well, but it seems as though they are still taking up space.
Creating a new App Service plan without anything deployed gives about 5.7G of free space.
I can't seem to run docker commands from the Kudu terminal, so I'm not able to check how many images there are, and I can't figure out how to clean up space. sudo isn't available either.
Does anyone have any ideas about how to free up some space?
Your disk was indeed full of Docker images. I have cleared them off; you should be unblocked.
This is a known issue that we will have a fix for soon. Iterating and deploying new containers is a common scenario, and the goal is that this should be completely abstracted away and you should not have to worry about this.
I believe my coworker and I ran into this issue when pulling images from a repository on Azure. The old images would not be cleared after running docker-compose pull, yet they did not appear to be present on the primary node.
We would see the following upon sshing onto that node:
> ssh username@server.eastus.cloudapp.azure.com -A -p 2200
> df -h
Filesystem Size Used Avail Use% Mounted on
# ...
/dev/sda1 29G 2.0G 26G 8% /
We would still encounter space issues. After some debugging, we found that the results differed when attached to a container itself:
> docker-compose exec container_name /bin/bash
> df -h
Filesystem Size Used Avail Use% Mounted on
# ...
/dev/sda1 29G 29G 0G 100% /etc/hosts
The following snippet worked to clear all images not in use without issue:
docker rmi $(docker images --filter "dangling=true" -q --no-trunc)
Note that --no-trunc is required; without it, docker complains that the images don't actually exist.
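On Docker 1.13 and later, the built-in prune commands are an alternative (assuming they are available in your environment):

# remove dangling images only
docker image prune -f

# more aggressive: also remove images not used by any container
docker image prune -a -f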
I launched an EC2 Spot Instance and unchecked the "Delete On Termination" option for the EBS root volume. I chose the Ubuntu 14.04 64-bit HVM AMI.
At some point the instance got terminated due to max price and the EBS volume stayed behind as intended. Now eventually when the Spot Instance is relaunched it creates a brand-new EBS root volume. The old EBS root volume is still sitting out there.
Actually I simulated the above events for testing purposes by manually terminating the Spot Instance and launching a new one, but I assume the result would be the same in real usage.
So now, how can I get the old EBS volume re-mounted as the current root volume?
I tried the example from http://linux.die.net/man/8/pivot_root, with a few modifications to get around obvious errors:
# manually attach old EBS to /dev/sdf in the AWS console, then do:
sudo su -
mkdir /new-root
mkdir /new-root/old-root
mount /dev/xvdf1 /new-root
cd /new-root
pivot_root . old-root
exec chroot . sh <dev/console >dev/console 2>&1
umount /old-root
The terminal hangs at the exec chroot command, and the instance won't accept new ssh connections.
I'd really like to get this working, as it provides a convenient mechanism to save money off the On Demand prices for development, test, and batch-oriented EC2 instances without having to re-architect the whole application deployment, and without the commitment of a Reserved Instance.
What am I missing?
The answer is to place the pivot_root call inside of /sbin/init on the initial (ephemeral) EBS root volume.
Here are some scripts that automate the process of launching a new Spot Instance and modifying the /sbin/init on the 1st (ephemeral) EBS volume to chain-load the system from a 2nd (persistent) EBS volume:
https://github.com/atramos/ec2-spotter
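To illustrate the idea (this is a sketch, not the actual ec2-spotter code; device names and paths are assumptions), the replacement /sbin/init on the ephemeral volume ends up doing roughly this:

#!/bin/sh
# hypothetical wrapper installed as /sbin/init on the ephemeral root volume
mkdir -p /new-root
mount /dev/xvdf1 /new-root          # the persistent EBS root volume
mkdir -p /new-root/old-root
cd /new-root
pivot_root . old-root               # swap the root filesystems
exec chroot . /sbin/init "$@" <dev/console >dev/console 2>&1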
I ran the spark-ec2 script with --ebs-vol-size=1000 (and the 1000 GB volumes are attached), but hadoop dfsadmin -report shows only:
Configured Capacity: 396251299840 (369.04 GB)
per node. How do I increase the space or tell HDFS to use the full capacity?
Run lsblk and see where the volume is mounted; it is probably /vol0. In your hdfs-site.xml, append /vol0 (after a comma) to the existing default value of dfs.data.dir. Copy this file to all slaves and restart the cluster. You should then see the full capacity.
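A hedged sketch of that change (the existing dfs.data.dir entry and the Hadoop paths below are assumptions; adjust them to your actual layout):

# on the namenode, check where the EBS volume is mounted (assumed /vol0 below)
lsblk

# append the new directory to the dfs.data.dir value in hdfs-site.xml,
# shown here as a comment (the first path is only an example):
#   <property>
#     <name>dfs.data.dir</name>
#     <value>/mnt/ephemeral-hdfs/data,/vol0/hdfs/data</value>
#   </property>

# copy the updated hdfs-site.xml to every slave, then restart HDFS
for host in $(cat "$HADOOP_CONF_DIR/slaves"); do
  scp "$HADOOP_CONF_DIR/hdfs-site.xml" "$host:$HADOOP_CONF_DIR/"
done
"$HADOOP_HOME/sbin/stop-dfs.sh" && "$HADOOP_HOME/sbin/start-dfs.sh"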