How to release hugepages from a crashed application - linux

I have an application that uses hugepages, and it suddenly crashed due to a bug.
Because the application did not release its hugepages properly when it crashed, the free hugepage count in the sysfs filesystem was never restored:
$ sudo cat /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages
0
$ sudo cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
1024
Is there a way to release the hugepages by force?

Sometimes you need to check every directory where hugetlbfs has been mounted, not just /dev/hugepages. So:
find the mounted directories with mount | grep huge,
check each of those directories,
delete all 2M-sized files (2M being the hugepage size), as sketched below.
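A minimal sketch of that procedure, assuming the default mount point /dev/hugepages (adjust for whatever mount | grep huge actually reports on your system):
$ mount | grep huge                          # locate every hugetlbfs mount point
$ ls -lh /dev/hugepages                      # inspect it for leftover backing files
$ sudo find /dev/hugepages -type f -delete   # remove them (be sure no live app still uses them!)
$ cat /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages   # count should be restored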

Use ipcs -m to list the shared memory segments.
Use ipcrm to remove the leftover shared memory segments.
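For example (the shmid 123456 is hypothetical; take the real one from the ipcs output, where a crashed application's segment typically shows nattch 0):
$ ipcs -m              # list segments with key, shmid, owner, size, nattch
$ sudo ipcrm -m 123456 # remove the orphaned segment by its shmid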
Edit on 06/24/2019:
Ok, so, the above answer, while correct as far as it goes, was a bit brief. In particular, if you have a host with multiple DB instances and only one has crashed, how can you determine which (if any) memory segments should be cleaned up?
Well, this too can be done. For each running instance, connect with / as sysdba, then do oradebug setmypid (any PID will do, as all Oracle PIDs connect to the SGA). Then do oradebug ipc. That will (hopefully) write the IPC information to a trace file. So, go to the udump (or diag_dest) directory and look for your trace file. It will contain all the IPC information for the instance, including the ShmId. Look through the file for the ShmId(s) that this instance is using. Now look at the output of ipcs -m.
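As a sketch, the per-instance session looks roughly like this (these are standard oradebug commands, run from sqlplus on each instance):
$ sqlplus / as sysdba
SQL> oradebug setmypid
SQL> oradebug ipc
SQL> oradebug tracefile_name
The last command prints the path of the trace file that oradebug ipc just wrote, which saves hunting through the udump directory.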
When you have done that for all the running instances, any memory segment output by ipcs -m that shows non-zero memory allocation, and that you cannot account for in the oradebug ipc information from any running instance, must be the left over memory segments from the crashed instance. Use ipcrm to remove it/them.
When doing this on a host with multiple running instances, this can be a bit fraught. Please proceed with caution. You don't want to remove the SGA of a running instance!
Hope that helps....

HugeTLB can either be used for shared memory (and Mark J. Bobak's answer would deal with that) or the app mmaps files created in a hugetlb filesystem. If the app crashes without removing those files they survive and keep corresponding memory 'allocated'.
Check hugeTLB filesystem and see if there are any leftover files from the app. Removing them would release the memory.

If you follow the instructions below, you can get rid of the allocated hugepages:
1) Let's check the hugepages that are free after a restart
dpdk@dpdkvm:~$ ls /mnt/huge/
empty
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ cat /proc/meminfo
...
HugePages_Total: 256
HugePages_Free: 256
...
2) Start a dpdk application with wrong parameters, producing an error
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ sudo ./build/kni -c 0x03 -n 2 -- -P -p 0x03 --config="(0,0,1),(1,0,1)"
...
EAL: Error - exiting with code: 1
Cause: No supported Ethernet device found
3) When I check the hugepages again, none are free
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ cat /proc/meminfo
...
HugePages_Total: 256
HugePages_Free: 0
...
4) Now, when I check the mounted hugepage directory, I can see the files which were not given back to the OS by the dpdk application.
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ ls /mnt/huge/
...
rtemap_0 rtemap_137 rtemap_176 rtemap_214 rtemap_253 rtemap_62
...
5) Finally, if you remove the files starting with rtemap, you give the hugepages back to the OS
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ sudo rm /mnt/huge/*
[sudo] password for dpdk:
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ cat /proc/meminfo
...
HugePages_Total: 256
HugePages_Free: 256
...

Your hugetlb pages may be in use by shared memory segments or by mmapped files.
Try removing the shared memory segments, or unmounting the hugetlbfs filesystem.
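A quick check covering both cases might be:
$ ipcs -m              # SysV shared memory segments (hugepage-backed ones included)
$ mount -t hugetlbfs   # hugetlbfs mount points whose leftover files may pin hugepages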

Related

How to create swap partition/file on a Yocto distribution

I'm trying to create a swap partition/file on my board where a core-image-minimal has been installed.
The fdisk -l command doesn't show any partitions, so I'm not able to figure out which block device I need to use to create a new partition.
Secondly, launching swapon on a swap file correctly initialized with mkswap raises an invalid-argument error saying that the file contains holes, even though I created it using dd.
At this point I'm not sure if I can do something like this since the free output looks like:
total used free shared buff/cache available
Mem: 503304 32108 101108 216 370088 465180
Swap: 0 0 0
To add any partition to your image, you need to modify the wks file that is used for your build.
To get the current wks file run :
bitbake -e | grep ^WKS_FILE=
Then, look for that file in your layers sources.
In that file you can add, for example, a 1 GB swap partition:
part swap --ondisk mmcblk0 --label swap --fstype=swap --size=1024M --overhead-factor 1
For a real example, you can see the raspberry-pi machine swap support commit here.
You can use a custom wks file and set it to your custom machine conf file:
WKS_FILE ?= "custom-image.wks"
For detailed info, check the Yocto reference about wks.
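For reference, a minimal wks file carrying that swap entry might look like the sketch below; the overall layout and the mmcblk0 device name are assumptions for a typical SD-card board, not taken from any particular BSP:
# custom-image.wks (hypothetical layout)
part /boot --source bootimg-partition --ondisk mmcblk0 --fstype=vfat --label boot --size 64M
part / --source rootfs --ondisk mmcblk0 --fstype=ext4 --label root
part swap --ondisk mmcblk0 --label swap --fstype=swap --size=1024M --overhead-factor 1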

How do I change the filesystem of my 64GB USB, from FAT32 to anything which allows me to put a 35GB file from my x86_64 Linux machine onto the USB?

'uname -a' on my machine gives:
Linux ct-lt-966 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux
Currently the filesystem of my USB is MS-DOS 'FAT32' which has a ~4.5 GB maximum size for individual files. I want to change this filesystem to something else, which does not have a limit. (I am trying to put a 35GB file onto a 64GB USB but I believe most USB filesystems do not limit the size of individual files).
I have not found it clear what choices of USB filesystem I have. I tried to change the filesystem to 'NTFS', but I could not install or locate 'mkfs.ntfs' or even 'ntfsprogs'. (I also tried installing with 'pacman' and 'yum', but apparently 'pacman' requires an aarch architecture, and I could not get access to 'yum-config-manager' in order to enable any repos.)
So to conclude, with my minimal prowess I am just looking for any way to change the filesystem of my 64GB USB to anything which will accept a 35GB file from my machine.
Thanks
Edit 1: Just planning to use the USB on this Linux machine, not Windows.
If there's nothing on the stick you want, or it's safe to delete it then basically:
delete the current FAT32 partition from the stick
add a new partition, utilising the full size of the device
create an ext4 filesystem on the new partition
PLEASE BE CAREFUL WITH THIS PROCESS: selecting the wrong device can obliterate a disk you needed, such as the one holding your $HOME or your root OS.
All the following is from memory and untested: I don't have a USB stick available right now to test fully.
Start by tailing the syslog in a console, so you can see where the stick gets mounted when you plug it in (hopefully it automounts, which it should if you're running a desktop-based Linux; possibly not if it's a server):
sudo tail -f /var/log/syslog
(it might be /var/log/messages depending on distro)
Then plug in the stick. The syslog should show it being allocated a device and a mount point. A file manager window may open, depending on your config, if you are in a GUI. For example, you might see it appear as /dev/sdc1 and mounted at /media/<yourusername>/USBKEY or something.
Confirm by running lsblk and note the device for the key, i.e.
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 167.7G 0 disk
├─sda1 8:1 0 69.9G 0 part /
└─sda2 8:2 0 97.9G 0 part /home
sdb 8:16 0 149.1G 0 disk
└─sdb1 8:17 0 149.1G 0 part /mnt/snapshots
sdc 8:32 0 931.5G 0 disk
└─sdc1 8:33 0 931.5G 0 part /storage
sdd 8:48 0 465.8G 0 disk
└─sdd1 8:49 0 465.8G 0 part /mnt/backup
sr0 11:0 1 1024M 0 rom
Unmount the stick (if it mounted) but leave it plugged in. Assuming again your device is at /dev/sdc1...
umount /dev/sdc1
Now run cfdisk in a terminal if you have it (friendlier) or fdisk if not, passing it the device related to your USB stick, without the partition number.
man cfdisk
sudo cfdisk /dev/sdc
This should show the current FAT32 partition. Delete it, then create a new partition of type 'Linux', following the defaults for start and end blocks which will be suggested in such a way as to fill the available space.
When done, select the option to Write the changes. Again, DOUBLE AND TRIPLE CHECK you have the right device, or you will probably blow away your main disk.
Once the changes are written, you can create the ext4 filesystem:
sudo mkfs.ext4 /dev/sdc1
And after it completes, you should be able to re-plug your stick and find that it remounts, this time with a file system that can take your large files.
This isn't the only way to achieve this, but it's probably the least fiddly. At the risk of repetition: don't make a mistake with the device identifiers. If you're unsure, ask.
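If you'd rather script it than walk through cfdisk, the same result can be reached non-interactively with parted. This is a sketch assuming the stick really is /dev/sdc, so verify with lsblk first:
sudo umount /dev/sdc1                    # unmount it if it was automounted
sudo parted -s /dev/sdc mklabel msdos    # fresh partition table (destroys everything on the stick!)
sudo parted -s /dev/sdc mkpart primary ext4 1MiB 100%
sudo mkfs.ext4 /dev/sdc1                 # create the filesystem on the new partition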

shm_unlink from the shell?

I have a program I'm working on that creates a shared memory object with shm_open. It tries to release the object with shm_unlink, but sometimes a programming error will cause it to crash before it can make that call. In that case, I need to unlink the shared memory object "by hand," and I'd like to be able to do it in a normal shell, without writing any C code -- i.e. with normal Linux utilities.
Can it be done? This question seems to say that using unlink(1) on /dev/shm/path_passed_to_shm_open works, but the manpages aren't clear.
Unlinking the file in /dev/shm will delete the shared memory object if no other process has the object mapped.
Unlike SYSV shared memory, which is implemented in the kernel, POSIX shared memory objects are simply "files in disguise".
When you call shm_open and mmap, you can see the following in the process's memory map (using pmap -X):
Address Perm Offset Device Inode Mapping
b7737000 r--s 00000000 00:0d 2267945 test_object
The device major and minor number correspond to the tmpfs mounted at /dev/shm (some systems mount this at /run, and then symlink /dev/shm to /run/shm).
A listing of the folder will show the same inode number:
$ ls -li /dev/shm/
2267945 -rw------- 1 mikel mikel 1 Apr 14 13:36 test_object
Like any other inode, the space will be freed when all references are removed. If we close the only program referencing this we see:
$ cat /proc/meminfo | grep Shmem
Shmem: 700 kB
Once we remove the last reference (the name in /dev/shm), the space will be freed:
$ rm /dev/shm/test_object
$ cat /proc/meminfo | grep Shmem
Shmem: 696 kB
If you're curious, you can look at the corresponding files in the glibc sources. shm_directory.c and shm_directory.h generate the filename as /dev/shm/your_shm_name. The implementations of shm_open and shm_unlink simply open and unlink this file. So it should be easy to see that rm /dev/shm/your_shm_name performs the same operation.
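Putting it together, a shell-only cleanup might look like this ("myshm" is a hypothetical object name, i.e. whatever your program passed to shm_open, without the leading slash):
$ ls -l /dev/shm                 # each POSIX shm object appears as a plain file
$ sudo fuser -v /dev/shm/myshm   # check that no live process still maps it
$ rm /dev/shm/myshm              # equivalent to shm_unlink("/myshm")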

What is the significance of the numbers in the name of the flush processes for newer linux kernels?

I am running kernel 2.6.33.7.
Previously, I was running v2.6.18.x. On 2.6.18, the flush processes were named pdflush.
After upgrading to 2.6.33.7, the flush processes have names of the format "flush-<major>:<minor>".
For example, currently I see flush process "flush-8:32" popping up in top.
In doing a google search to try to determine an answer to this question, I saw examples of "flush-8:38", "flush-8:64" and "flush-253:0" just to name a few.
I understand what the flush process itself does; my question is, what is the significance of the numbers at the end of the process name? What do they represent?
Thanks
They are the device numbers used to identify block devices. A kernel flusher thread may be spawned to handle a particular device.
(On one of my systems, block devices are currently numbered as shown below. The numbers may change from boot to boot or hotplug to hotplug.)
$ grep ^ /sys/class/block/*/dev
/sys/class/block/dm-0/dev:254:0
/sys/class/block/dm-1/dev:254:1
/sys/class/block/dm-2/dev:254:2
/sys/class/block/dm-3/dev:254:3
/sys/class/block/dm-4/dev:254:4
/sys/class/block/dm-5/dev:254:5
/sys/class/block/dm-6/dev:254:6
/sys/class/block/dm-7/dev:254:7
/sys/class/block/dm-8/dev:254:8
/sys/class/block/dm-9/dev:254:9
/sys/class/block/loop0/dev:7:0
/sys/class/block/loop1/dev:7:1
/sys/class/block/loop2/dev:7:2
/sys/class/block/loop3/dev:7:3
/sys/class/block/loop4/dev:7:4
/sys/class/block/loop5/dev:7:5
/sys/class/block/loop6/dev:7:6
/sys/class/block/loop7/dev:7:7
/sys/class/block/md0/dev:9:0
/sys/class/block/md1/dev:9:1
/sys/class/block/sda/dev:8:0
/sys/class/block/sda1/dev:8:1
/sys/class/block/sda2/dev:8:2
/sys/class/block/sdb/dev:8:16
/sys/class/block/sdb1/dev:8:17
/sys/class/block/sdb2/dev:8:18
/sys/class/block/sdc/dev:8:32
/sys/class/block/sdc1/dev:8:33
/sys/class/block/sdc2/dev:8:34
/sys/class/block/sdd/dev:8:48
/sys/class/block/sdd1/dev:8:49
/sys/class/block/sdd2/dev:8:50
/sys/class/block/sde/dev:8:64
/sys/class/block/sdf/dev:8:80
/sys/class/block/sdg/dev:8:96
/sys/class/block/sdh/dev:8:112
/sys/class/block/sdi/dev:8:128
/sys/class/block/sr0/dev:11:0
/sys/class/block/sr1/dev:11:1
/sys/class/block/sr2/dev:11:2
You should also be able to figure this out by searching for those numbers in /proc/self/mountinfo, e.g.:
$ grep 8:32 /proc/self/mountinfo
25 22 8:32 / /var rw,relatime - ext4 /dev/mapper/sysvg-var rw,barrier=1,data=ordered
This has the side benefit of working with nfs as well:
$ grep 0:73 /proc/self/mountinfo
108 42 0:73 /foo /mnt/foo rw,relatime - nfs host.domain.com:/volume/path rw, ...
Note, the data I included here is fabricated, but the mechanism works just fine.
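You can also resolve the numbers directly through sysfs. For the flush-8:32 example above (which is sdc in the listing), something like this should work:
$ cat /sys/dev/block/8:32/uevent   # the DEVNAME= line names the device, e.g. sdc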

du -skh * in / returns vastly different size from df on centos 5.5

I have a VPS slice running CentOS 5.5. I am supposed to have 15 gigs of disk space, but according to df it seems to double my disk space usage.
When I run du -skh * in / as root I get:
[root@yardvps1 /]# du -skh *
0 aquota.group
0 aquota.user
5.2M bin
4.0K boot
4.0K dev
4.9M etc
2.5G home
12M lib
14M lib64
4.0K media
4.0K mnt
299M opt
0 proc
692K root
23M sbin
4.0K selinux
4.0K srv
0 sys
48K tmp
2.0G usr
121M var
This is consistent with what I have uploaded to the machine, and adds up to about 5 gigs.
But when I run df I get:
[root@yardvps1 /]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/simfs 15728640 11659048 4069592 75% /
none 262144 4 262140 1% /dev
It is showing me using almost 12 gigs already.
What is causing this discrepancy, and is there anything I can do about it? I planned the server out based on 15 gigs, but now it is basically only letting me have about 7 gigs of stuff on it.
Thanks.
The most common cause of this effect is open files that have been deleted.
The kernel will only free the disk blocks of a deleted file if it is not in use at the time of its deletion. Otherwise that is deferred until the file is closed, or the system is rebooted.
A common Unix-world trick to ensure that no temporary files are left around is the following:
A process creates and opens a temporary file
While still holding the open file descriptor, the process unlinks (i.e. deletes) the file
The process reads and writes to the file normally using the file descriptor
The process closes the file descriptor when it's done, and the kernel frees the space
If the process (or the system) terminates unexpectedly, the temporary file is already deleted and no clean-up is necessary.
As a bonus, deleting the file reduces the chances of naming collisions when creating temporary files and it also provides an additional layer of obscurity over the running processes - for anyone but the root user, that is.
This behaviour ensures that processes don't have to deal with files that are suddenly pulled from under their feet, and also that processes don't have to consult each other in order to delete a file. It is unexpected behaviour for those coming from Windows systems, though, since there you are not normally allowed to delete a file that is in use.
The lsof command, when run as root, will show all open files and will specifically flag those that have been deleted:
# lsof 2>/dev/null | grep deleted
bootlogd 2024 root 1w REG 9,3 58 917506 /tmp/init.0W2ARi (deleted)
bootlogd 2024 root 2w REG 9,3 58 917506 /tmp/init.0W2ARi (deleted)
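You can reproduce the whole effect in a few lines (the file name and size are illustrative, and this assumes /tmp lives on the filesystem you are measuring):
$ dd if=/dev/zero of=/tmp/bigfile bs=1M count=1024   # create a 1 GB file
$ tail -f /tmp/bigfile &                             # hold an open file descriptor to it
$ rm /tmp/bigfile                                    # du no longer counts it...
$ df -h /tmp                                         # ...but df still does
$ kill %1                                            # close the descriptor; the space is freed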
Stopping and restarting the guilty processes, or just rebooting the server should solve this issue.
Deleted files could also be held open by the kernel if, for example, it's a mounted filesystem image. In this case unmounting the filesystem or rebooting the server should do the trick.
In your case, judging by the size of the "missing" space I'd look for any references to the file that you used to set up the VPS e.g. the Centos DVD image that you deleted after installing.
Another case which I've come across, although it doesn't appear to be your issue, is mounting a partition "on top" of existing files.
If you do so, any files that already existed in the mount-point directory on the underlying partition are hidden by the mounted partition: they still consume space that df reports, but du can no longer see them.
To fix: stop any processes with open files on the mounted partition, unmount the partition, then find and move or remove any files that now appear in the mount-point directory; the bind-mount sketch below lets you inspect them without unmounting.
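To peek under a mount point without unmounting anything, a bind mount of the parent filesystem works, since bind mounts don't carry submounts along (paths here are illustrative):
$ sudo mkdir -p /mnt/rootonly
$ sudo mount --bind / /mnt/rootonly   # / as it looks without anything mounted on top
$ du -sh /mnt/rootonly/*              # sizes here include files hidden under mount points
$ sudo umount /mnt/rootonly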
I had the same trouble with a FreeBSD server. A reboot helped.
