how to use cgroup v2 memory controller - linux

I want to use cgroup v2 to control memory. First:
cd /sys/fs/cgroup
mkdir test
cat cgroup.controllers
cpuset cpu io memory pids
echo 1M > memory.high
Then I open a new terminal and run:
stress -m 1 --vm-bytes 200M --vm-keep   # here I get the PID
Then:
echo <pid> > cgroup.procs
Using cat cgroup.procs I can see the PID is in the file, and cat memory.high shows 104857600, but memory.current is 0.
I thought these steps would work, but they don't. Is some step wrong, and what should I do?
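For reference, here is a minimal sketch of the sequence I am aiming for, based on my reading of the cgroup v2 documentation (the memory controller has to be enabled for child groups via cgroup.subtree_control, and both the limit and the PID are written inside the child directory):
cd /sys/fs/cgroup
echo +memory > cgroup.subtree_control   # enable the memory controller for children
mkdir test
echo 1M > test/memory.high              # throttle/reclaim the group above 1 MiB
echo <pid> > test/cgroup.procs          # move the stress worker into the group
cat test/memory.current                 # should now report a non-zero charge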

Related

Is there a way to read the memory counter used by cgroups to kill processes?

I am running a process in a cgroup with a memory limit. When the cgroup OOM killer performs a kill, dmesg outputs messages such as the following.
[9515117.055227] Call Trace:
[9515117.058018] [<ffffffffbb325154>] dump_stack+0x63/0x8f
[9515117.063506] [<ffffffffbb1b2e24>] dump_header+0x65/0x1d4
[9515117.069113] [<ffffffffbb5c8727>] ? _raw_spin_unlock_irqrestore+0x17/0x20
[9515117.076193] [<ffffffffbb14af9d>] oom_kill_process+0x28d/0x430
[9515117.082366] [<ffffffffbb1ae03b>] ? mem_cgroup_iter+0x1db/0x3c0
[9515117.088578] [<ffffffffbb1b0504>] mem_cgroup_out_of_memory+0x284/0x2d0
[9515117.095395] [<ffffffffbb1b0f95>] mem_cgroup_oom_synchronize+0x305/0x320
[9515117.102383] [<ffffffffbb1abf50>] ? memory_high_write+0xc0/0xc0
[9515117.108591] [<ffffffffbb14b678>] pagefault_out_of_memory+0x38/0xa0
[9515117.115168] [<ffffffffbb0477b7>] mm_fault_error+0x77/0x150
[9515117.121027] [<ffffffffbb047ff4>] __do_page_fault+0x414/0x420
[9515117.127058] [<ffffffffbb048022>] do_page_fault+0x22/0x30
[9515117.132823] [<ffffffffbb5ca8b8>] page_fault+0x28/0x30
[9515117.330756] Memory cgroup out of memory: Kill process 13030 (java) score 1631 or sacrifice child
[9515117.340375] Killed process 13030 (java) total-vm:18259139756kB, anon-rss:2243072kB, file-rss:30004132kB
I would like to be able to tell how much memory the cgroups OOM Killer believes the process is using at any given time.
Is there a way to query for this quantity?
I found the following in the official documentation for cgroup-v1, which shows how to query current memory usage as well as how to alter limits:
a. Enable CONFIG_CGROUPS
b. Enable CONFIG_MEMCG
c. Enable CONFIG_MEMCG_SWAP (to use swap extension)
d. Enable CONFIG_MEMCG_KMEM (to use kmem extension)
3.1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
# mount -t tmpfs none /sys/fs/cgroup
# mkdir /sys/fs/cgroup/memory
# mount -t cgroup none /sys/fs/cgroup/memory -o memory
3.2. Make the new group and move bash into it
# mkdir /sys/fs/cgroup/memory/0
# echo $$ > /sys/fs/cgroup/memory/0/tasks
Since now we're in the 0 cgroup, we can alter the memory limit:
# echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.)
NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited).
NOTE: We cannot set limits on the root cgroup any more.
# cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes
4194304
We can check the usage:
# cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes
1216512
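Beyond usage_in_bytes, these are the other counters I can see under the same cgroup directory that look relevant (cgroup-v1 paths as in the documentation excerpt above); I am not sure which of them corresponds to the figure the OOM killer acts on:
cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes       # current charged usage
cat /sys/fs/cgroup/memory/0/memory.max_usage_in_bytes   # high-water mark
cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes       # configured limit
cat /sys/fs/cgroup/memory/0/memory.stat                 # rss / cache / swap breakdown
cat /sys/fs/cgroup/memory/0/memory.failcnt              # times the limit was hit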

QEMU: /bin/sh: can't access tty; job control turned off

As a development environment for the Linux kernel, I'm using QEMU with an initramfs set up similarly to what is shown here, plus a few additional executables. Basically, it uses busybox to create a minimal environment and packages it up using cpio. The content of init is shown below.
$ cat init
mount -t proc none /proc
mount -t sysfs none /sys
echo -e "\nBoot took $(cut -d' ' -f1 /proc/uptime) seconds\n"
exec /bin/sh
I use the following command to start the VM:
qemu-system-x86_64 -kernel bzImage -initrd initramfs -append "console=ttyS0" -nographic
It throws the following error:
/bin/sh: can't access tty; job control turned off
The system functions normally in most cases, but I'm not able to create a background process:
$ prog &
/bin/sh: can't open '/dev/null'
$ fg
/bin/sh: fg: job (null) not created under job control
The root of all these problems seems to be not having access to a tty. How can I fix this?
EDIT: Apart from the accepted answer, busybox's cttyhack can be used as a workaround.
$ cat init
#!/bin/sh
mount -t proc none /proc
mount -t sysfs none /sys
mknod -m 666 /dev/ttyS0 c 4 64
echo -e "\nBoot took $(cut -d' ' -f1 /proc/uptime) seconds\n"
setsid cttyhack sh
exec /bin/sh
From Linux From Scratch Chapter 6.8. Populating /dev
6.8.1. Creating Initial Device Nodes
When the kernel boots the system, it requires the presence of a few device nodes, in particular the console and null devices. Create these by running the following commands:
mknod -m 600 /dev/console c 5 1
mknod -m 666 /dev/null c 1 3
You should then continue with the steps in "6.8.2. Mounting tmpfs and Populating /dev". Note the <-- marker below, and I suggest you read the entire (free) LFS book.
mount -n -t tmpfs none /dev
mknod -m 622 /dev/console c 5 1
mknod -m 666 /dev/null c 1 3
mknod -m 666 /dev/zero c 1 5
mknod -m 666 /dev/ptmx c 5 2
mknod -m 666 /dev/tty c 5 0 # <--
mknod -m 444 /dev/random c 1 8
mknod -m 444 /dev/urandom c 1 9
chown root:tty /dev/{console,ptmx,tty}
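Putting this together with the cttyhack approach from your edit, a minimal init along these lines should give the shell a proper controlling tty. This is only a sketch; it assumes your busybox build provides mknod, setsid and cttyhack, and that you keep console=ttyS0 on the kernel command line:
#!/bin/sh
mount -t proc none /proc
mount -t sysfs none /sys
mount -t tmpfs none /dev
mknod -m 622 /dev/console c 5 1
mknod -m 666 /dev/null c 1 3
mknod -m 666 /dev/tty c 5 0
mknod -m 666 /dev/ttyS0 c 4 64    # the serial console selected by console=ttyS0
exec setsid cttyhack /bin/sh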

How to limit CPU and RAM resources for mongodump?

I have a mongod server running. Each day, I execute mongodump in order to have a backup. The problem is that mongodump takes a lot of resources and slows down the server (which, by the way, already runs some other heavy tasks).
My goal is to somehow limit the mongodump which is called in a shell script.
Thanks.
You should use cgroups. Mount points and details differ between distros and kernels. E.g. Debian 7.0 with the stock kernel doesn't mount cgroupfs by default and has the memory subsystem disabled (folks advise rebooting with cgroup_enable=memory), while openSUSE 13.1 ships with all of that out of the box (mostly due to systemd).
So first of all, create mount points and mount cgroupfs if not yet done by your distro:
mkdir /sys/fs/cgroup/cpu
mount -t cgroup -o cpuacct,cpu cgroup /sys/fs/cgroup/cpu
mkdir /sys/fs/cgroup/memory
mount -t cgroup -o memory cgroup /sys/fs/cgroup/memory
Create a cgroup:
mkdir /sys/fs/cgroup/cpu/shell
mkdir /sys/fs/cgroup/memory/shell
Set up the cgroup. I decided to alter the cpu shares. The default value is 1024, so setting it to 128 limits the cgroup to about 11% of all CPU resources when there are competitors. If there are still free CPU resources, they will be given to mongodump. You may also use cpuset to limit the number of cores available to it.
echo 128 > /sys/fs/cgroup/cpu/shell/cpu.shares
echo 50331648 > /sys/fs/cgroup/memory/shell/memory.limit_in_bytes
Now add PIDs to the cgroup; this will also affect all their children.
echo 13065 > /sys/fs/cgroup/cpu/shell/tasks
echo 13065 > /sys/fs/cgroup/memory/shell/tasks
I ran a couple of tests. A Python process that tried to allocate a bunch of memory was killed by the OOM killer:
myaut#zenbook:~$ python -c 'l = range(3000000)'
Killed
I also ran four infinite loops outside the cgroup and a fifth one inside it. As expected, the loop running in the cgroup got only about 45% of CPU time, while the rest of them got 355% (I have 4 cores).
Note that these changes do not survive a reboot!
You may add this code to a script that runs mongodump, or use some permanent solution.
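For example, a wrapper sketch along these lines (same paths and limits as above; the mongodump arguments are placeholders) puts the shell, and therefore mongodump, into both cgroups before the dump starts:
#!/bin/sh
# Create the cgroups and apply the limits (assumes the hierarchies are mounted as shown above).
mkdir -p /sys/fs/cgroup/cpu/shell /sys/fs/cgroup/memory/shell
echo 128 > /sys/fs/cgroup/cpu/shell/cpu.shares
echo 50331648 > /sys/fs/cgroup/memory/shell/memory.limit_in_bytes
# Move this shell into both cgroups; mongodump inherits the membership.
echo $$ > /sys/fs/cgroup/cpu/shell/tasks
echo $$ > /sys/fs/cgroup/memory/shell/tasks
exec mongodump --out /var/backups/mongo-$(date +%F)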

Executing scripts that read from most recent USB udev plugged in device

As of right now, I've got udev set up to execute scripts when a USB flash drive is plugged in or removed, but I'm stuck trying to figure out whether there is a way to execute a script that reads a file from the most recently plugged-in USB device.
I am using usbmount to automatically mount all of my flash drives, and they are mounted according to this scheme:
/dev/sdb1 15G 8.0K 15G 1% /media/usb0
/dev/sdc1 15G 8.0K 15G 1% /media/usb1
/dev/sdd1 15G 8.0K 15G 1% /media/usb2
/dev/sde1 15G 8.0K 15G 1% /media/usb3
So for example, when I plug in USB flash drive #5, it gets automounted to /media/usb4; I would then like to execute, say, 'cat /media/usb4/data.txt > /tmp/output.txt', and only for the drive that was just plugged in. Ideally I would like this to work no matter which number gets assigned to /media/usbX, so that if I re-plug device 2, the script runs just for that device and not the rest.
Any ideas how this can be done? Bash scripting preferably, but I'm open to other ideas.
Thank you for your time.
------------- EDIT
I figured out a way although it's definitely not the prettiest or maybe even the most reliable:
$ sudo tail -n2 /var/log/syslog
Oct 4 14:40:58 development usbmount[32250]: executing command: mount -tvfat -osync,noexec,nodev,noatime,nodiratime /dev/sda1 /media/usb0
Oct 4 14:40:58 development usbmount[32250]: executing command: run-parts /etc/usbmount/mount.d
$
OK, so now to cut that down to just the media mount point,
$ sudo tail -n2 /var/log/syslog |grep media | awk '{print $12}'
/media/usb0
$
With this, assuming no other errors or anything else filling the last two spots in the syslog, I can execute scripts using something like:
#!/bin/bash
device=`sudo tail -n2 /var/log/syslog |grep media | awk '{print $12}'`
cat $device/data.txt > /tmp/output.txt
The run-parts bit is a tip-off...you can create a file /etc/usbmount/mount.d/50_copydata
Something like this:
#!/bin/bash
set -u
[[ -f "${UM_MOUNTPOINT}/data.txt" ]] && cat "${UM_MOUNTPOINT}/data.txt" > /tmp/output.txt
usbmount will set $UM_MOUNTPOINT to e.g. /media/usb0. I use set -u so the script aborts if $UM_MOUNTPOINT is not set.
I assume you are going to filter the data - if you are only going to cat the file you might as well use cp.
Remember to make the file executable:
chmod +x /etc/usbmount/mount.d/50_copydata
Unplug and re-plug your device(s) to test. Hope this helps!

cgroup blkio files cannot be written

I'm trying to control I/O bandwidth by using cgroup blkio controller.
The cgroup has been set up and mounted successfully, i.e. calling grep cgroup /proc/mounts
returns:
....
cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0
...
I then make a new folder in the blkio folder and write to the file blkio.throttle.read_bps_device, as follows:
1. mkdir user1; cd user1
2. echo "8:5 10485760" > blkio.throtlle.read_bps_device
----> echo: write error: Invalid argument
My device major:minor number is correct; I checked it using df -h and ls -l /dev/sda5 for the storage device.
I can still write to files that require no device major:minor number, such as blkio.weight (but the same error is thrown for blkio.weight_device).
Any idea why I got that error?
Not sure which flavour/version of Linux you are using; on RHEL 6.x kernels this did not work for some reason, however it worked when I compiled a custom kernel on RHEL, and on other Fedora versions, without any issues.
To check whether it is supported on your kernel, run lssubsys -am | grep blkio. Check under that path whether you can find the file blkio.throttle.read_bps_device.
However, here is an example of how you can do it persistently: set up a cgroup so the program does not exceed 1 MiB/s.
Get the MAJOR:MINOR device number from /proc/partitions
`cat /proc/partitions | grep vda`
major minor #blocks name
252        0   12582912 vda   --> this is the primary disk (with MAJOR:MINOR -> 252:0)
Now, if you want to limit your program to 1 MiB/s, convert the value to bytes per second: 1 MiB/s = 1024 KiB/s = 1024 * 1024 B/s = 1048576 bytes/s.
Edit /etc/cgconfig.conf and add the following entry:
group ioload {
    blkio {
        blkio.throttle.read_bps_device = "252:0 1048576";
    }
}
Edit /etc/cgrules.conf
* blkio ioload
Restart the required services
`chkconfig {cgred,cgconfig} on;`
`service {cgred,cgconfig} restart`
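For a quick, non-persistent test you can also write the rule directly into the user1 directory you created. This is just a sketch: note the exact file name, and try the whole disk's major:minor (e.g. 8:0 for /dev/sda) rather than the partition's, since many kernels reject partition numbers for the throttle files:
cd /sys/fs/cgroup/blkio/user1
echo "8:0 10485760" > blkio.throttle.read_bps_device   # ~10 MiB/s read limit on /dev/sda
echo $$ > tasks                                         # move the current shell (and its children) in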
Refer: blkio-controller.txt
hope this helps!
