cgroup blkio files cannot be written
I'm trying to control I/O bandwidth by using cgroup blkio controller.
Cgroup has been set up and mounted successfully, i.e. calling grep cgroup /proc/mounts
returns:
....
cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0
...
I then make a new folder in the blkio folder and write to the file blkio.throttle.read_bps_device, as follows:
1. mkdir user1; cd user1
2. echo "8:5 10485760" > blkio.throttle.read_bps_device
----> echo: write error: Invalid argument
The device major:minor numbers are correct; I verified them using df -h and ls -l /dev/sda5 for the storage device.
I can still write to files that require no device major:minor number, such as blkio.weight (but the same error is thrown for blkio.weight_device).
Any idea why I get that error?
Not sure which flavour/version of Linux you are using. On RHEL 6.x kernels this did not work for some reason, but it worked when I compiled a custom kernel on RHEL, and on other Fedora versions, without any issues.
To check whether it is supported on your kernel, run lssubsys -am | grep blkio, then check that path to see whether the file blkio.throttle.read_bps_device exists.
Here is an example of how to do it persistently: set up a cgroup that limits the program to no more than 1 MiB/s.
Get the MAJOR:MINOR device number from /proc/partitions:
`grep vda /proc/partitions`
major minor #blocks name
252 0 12582912 vda --> this is the primary disk (MAJOR:MINOR -> 252:0)
Now, to limit your program to 1 MiB/s, convert the value to bytes per second: 1 MiB/s = 1024 * 1024 B/s = 1048576 bytes/sec.
Edit /etc/cgconfig.conf and add the following entry:
group ioload {
    blkio {
        blkio.throttle.read_bps_device = "252:0 1048576";
    }
}
Edit /etc/cgrules.conf and add:
* blkio ioload
Restart the required services:
`chkconfig cgred on; chkconfig cgconfig on`
`service cgconfig restart; service cgred restart`
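For a quick, non-persistent test of the same limit, you can also write the rule directly into the blkio cgroup files. This is a minimal sketch under stated assumptions: the device is 252:0 (as in the /proc/partitions example above), and the blkio controller is mounted at /sys/fs/cgroup/blkio; it is guarded so it only touches cgroupfs if that mount exists.

```shell
#!/bin/sh
# Non-persistent sketch: apply a 1 MiB/s read limit on device 252:0
# directly through cgroupfs (assumed mounted at /sys/fs/cgroup/blkio).
DEV="252:0"                   # MAJOR:MINOR from /proc/partitions (assumption)
LIMIT=$((1 * 1024 * 1024))    # 1 MiB/s expressed in bytes per second
echo "throttle rule: $DEV $LIMIT"

CG=/sys/fs/cgroup/blkio/ioload
if [ -d /sys/fs/cgroup/blkio ]; then
    mkdir -p "$CG"
    echo "$DEV $LIMIT" > "$CG/blkio.throttle.read_bps_device"
    echo $$ > "$CG/tasks"     # current shell (and its children) are now throttled
fi
```

Anything started from this shell afterwards inherits the limit, which is handy for testing before committing the rule to /etc/cgconfig.conf.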
Refer: blkio-controller.txt
Hope this helps!
Related
How to change the cpu scaling_governor value via udev at boot time
Hi everyone. I'm trying to use udev to set my CPUs' scaling_governor value from powersave to performance. Here is my udev rule file:

[root@node1 ~]$ cat /etc/udev/rules.d/50-scaling-governor.rules
SUBSYSTEM=="cpu", KERNEL=="cpu[0-9]|cpu[0-9][0-9]", ACTION=="add", ATTR{cpufreq/scaling_governor}="performance"

Before testing my 50-scaling-governor.rules, let's see what the value of scaling_governor is first:

[root@node1 ~]$ cat /sys/devices/system/cpu/cpu16/cpufreq/scaling_governor
powersave

Then I use the udevadm command to execute my 50-scaling-governor.rules:

[root@node1 ~]$ udevadm test --action="add" /devices/system/cpu/cpu16
calling: test
version 219
This program is for debugging only, it does not run any program specified by a RUN key. It may show incorrect results, because some values may be different, or not available at a simulation run.
Load module index
Created link configuration context.
timestamp of '/etc/udev/rules.d' changed
# omit some irrelevant messages ...
Reading rules file: /etc/udev/rules.d/50-scaling-governor.rules
no db file to read /run/udev/data/+cpu:cpu16: No such file or directory
ATTR '/sys/devices/system/cpu/cpu16/cpufreq/scaling_governor' writing 'performance' /etc/udev/rules.d/50-scaling-governor.rules:1
IMPORT builtin 'hwdb' /usr/lib/udev/rules.d/50-udev-default.rules:11
IMPORT builtin 'hwdb' returned non-zero
RUN 'kmod load $env{MODALIAS}' /usr/lib/udev/rules.d/80-drivers.rules:5
RUN '/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; /usr/bin/systemd-run --no-block /usr/lib/udev/kdump-udev-throttler'' /usr/lib/udev/rules.d/98-kexec.rules:14
ACTION=add
DEVPATH=/devices/system/cpu/cpu16
DRIVER=processor
MODALIAS=cpu:type:x86,ven0000fam0006mod004F:feature:,0000,0001,0002,... (long feature list omitted)
SUBSYSTEM=cpu
USEC_INITIALIZED=184210753630
run: 'kmod load cpu:type:x86,ven0000fam0006mod004F:feature:,...' (same MODALIAS as above)
run: '/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; /usr/bin/systemd-run --no-block /usr/lib/udev/kdump-udev-throttler''
Unload module index
Unloaded link configuration context.

And now the value of cpu16's scaling_governor has changed, so there is nothing wrong with my udev rule:

[root@node1 ~]$ cat /sys/devices/system/cpu/cpu16/cpufreq/scaling_governor
performance

But after rebooting the server, I find that the scaling_governor value of cpu16 is still powersave. I have no idea why my udev rule works when run via udevadm but fails across a reboot.

Some environment information about my machine:

OS: CentOS Linux release 7.9.2009 (Core)
kernel version: 5.4.154-1.el7.elrepo.x86_64
udev version: 219

Can anyone give me some hints or advice? Thanks in advance.
Is there a way to read the memory counter used by cgroups to kill processes?
I am running a process under a cgroup with an OOM killer. When it performs a kill, dmesg outputs messages such as the following:

[9515117.055227] Call Trace:
[9515117.058018] [<ffffffffbb325154>] dump_stack+0x63/0x8f
[9515117.063506] [<ffffffffbb1b2e24>] dump_header+0x65/0x1d4
[9515117.069113] [<ffffffffbb5c8727>] ? _raw_spin_unlock_irqrestore+0x17/0x20
[9515117.076193] [<ffffffffbb14af9d>] oom_kill_process+0x28d/0x430
[9515117.082366] [<ffffffffbb1ae03b>] ? mem_cgroup_iter+0x1db/0x3c0
[9515117.088578] [<ffffffffbb1b0504>] mem_cgroup_out_of_memory+0x284/0x2d0
[9515117.095395] [<ffffffffbb1b0f95>] mem_cgroup_oom_synchronize+0x305/0x320
[9515117.102383] [<ffffffffbb1abf50>] ? memory_high_write+0xc0/0xc0
[9515117.108591] [<ffffffffbb14b678>] pagefault_out_of_memory+0x38/0xa0
[9515117.115168] [<ffffffffbb0477b7>] mm_fault_error+0x77/0x150
[9515117.121027] [<ffffffffbb047ff4>] __do_page_fault+0x414/0x420
[9515117.127058] [<ffffffffbb048022>] do_page_fault+0x22/0x30
[9515117.132823] [<ffffffffbb5ca8b8>] page_fault+0x28/0x30
[9515117.330756] Memory cgroup out of memory: Kill process 13030 (java) score 1631 or sacrifice child
[9515117.340375] Killed process 13030 (java) total-vm:18259139756kB, anon-rss:2243072kB, file-rss:30004132kB

I would like to be able to tell how much memory the cgroup OOM killer believes the process is using at any given time. Is there a way to query this quantity?
I found the following in the official documentation for cgroup-v1, which shows how to query current memory usage as well as alter limits:

a. Enable CONFIG_CGROUPS
b. Enable CONFIG_MEMCG
c. Enable CONFIG_MEMCG_SWAP (to use swap extension)
d. Enable CONFIG_MEMCG_KMEM (to use kmem extension)

3.1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)

# mount -t tmpfs none /sys/fs/cgroup
# mkdir /sys/fs/cgroup/memory
# mount -t cgroup none /sys/fs/cgroup/memory -o memory

3.2. Make the new group and move bash into it

# mkdir /sys/fs/cgroup/memory/0
# echo $$ > /sys/fs/cgroup/memory/0/tasks

Since now we're in the 0 cgroup, we can alter the memory limit:

# echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes

NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo, mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.)
NOTE: We can write "-1" to reset the *.limit_in_bytes (unlimited).
NOTE: We cannot set limits on the root cgroup any more.

# cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes
4194304

We can check the usage:

# cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes
1216512
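To answer the original question directly: memory.usage_in_bytes is the counter the memcg OOM killer compares against memory.limit_in_bytes. Here is a small sketch that reports usage as a percentage of the limit; the group path uses the example "0" group from the documentation excerpt above, so adjust it to your own group name.

```shell
#!/bin/sh
# Sketch: report a memory cgroup's current usage against its limit (cgroup v1).
# The group path below is the documentation's example group "0" (assumption).
CG=/sys/fs/cgroup/memory/0
if [ -r "$CG/memory.usage_in_bytes" ]; then
    usage=$(cat "$CG/memory.usage_in_bytes")
    limit=$(cat "$CG/memory.limit_in_bytes")
    echo "usage=$usage limit=$limit (~$((100 * usage / limit))% of limit)"
else
    echo "no such cgroup: $CG" >&2
fi
```

Running it in a loop (e.g. under watch) gives a live view of how close the process is to being OOM-killed.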
Kernel Panic with ramfs on embedded device: No filesystem could mount root
I'm working on an embedded ARM device running Linux (kernel 3.10), with NAND memory for storage. I'm trying to build a minimal Linux which will reside on its own partition and carry out updates of the main firmware. The kernel uses a very minimal root fs which is stored in a ramfs. However, I can't get it to boot. I get the following error:

[    0.794113] List of all partitions:
[    0.797600] 1f00         128 mtdblock0 (driver?)
[    0.802669] 1f01        1280 mtdblock1 (driver?)
[    0.807697] 1f02        1280 mtdblock2 (driver?)
[    0.812735] 1f03        8192 mtdblock3 (driver?)
[    0.817761] 1f04        8192 mtdblock4 (driver?)
[    0.822794] 1f05        8192 mtdblock5 (driver?)
[    0.827820] 1f06       82944 mtdblock6 (driver?)
[    0.832850] 1f07       82944 mtdblock7 (driver?)
[    0.837876] 1f08       12288 mtdblock8 (driver?)
[    0.842906] 1f09       49152 mtdblock9 (driver?)
[    0.847928] No filesystem could mount root, tried: squashfs
[    0.853569] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(1,0)
[    0.861806] CPU: 0 PID: 1 Comm: swapper Not tainted 3.10.73 #11
[    0.867732] [<800133ec>] (unwind_backtrace+0x0/0x12c) from [<80011a50>] (show_stack+0x10/0x14)
(...etc)

The root fs is built by the build process, using the following (simplified for clarity):

# [Copy some things to $(ROOTFS_OUT_DIR)/mini_rootfs]
cd $(ROOTFS_OUT_DIR)/mini_rootfs && find . | cpio --quiet -o -H newc > $(ROOTFS_OUT_DIR)/backup.cpio
gzip -f -9 $(ROOTFS_OUT_DIR)/backup.cpio

This creates $(ROOTFS_OUT_DIR)/backup.cpio.gz. The kernel is then built like this:

$(MAKE) -C $(LINUX_SRC_DIR) O=$(LINUX_OUT_DIR) \
    CONFIG_INITRAMFS_SOURCE="$(ROOTFS_OUT_DIR)/backup.cpio.gz" \
    CONFIG_INITRAMFS_ROOT_UID=0 CONFIG_INITRAMFS_ROOT_GID=0

I think this means it uses the same config as the main firmware (built elsewhere), but supplies the minimal ramfs image using CONFIG_INITRAMFS_SOURCE. According to Kernel.org, the ramfs is always built anyway, and CONFIG_INITRAMFS_SOURCE is all that is needed to specify a pre-made root fs to use.
There are no build errors to indicate a problem creating the ramfs, and the size of the resulting kernel looks about right: backup.cpio.gz is about 3.6 MB; the final zImage is 6.1 MB; the image is written to a partition which is 8 MB in size.

To use this image, I set some flags used by the (custom) boot loader which tell it to boot from the minimal partition, and also set a different command line for the kernel. Here is the command line used to boot:

console=ttyS0 rootfs=ramfs root=/dev/ram rw rdinit=/linuxrc mem=220M

Note that the minimal root fs contains /linuxrc, which is actually a link to /bin/busybox:

lrwxrwxrwx 1 root root 11 Nov  5  2015 linuxrc -> bin/busybox

Why doesn't this boot? Why is it trying the "squashfs" filesystem, and is this wrong?
SOLVED! It turned out that a file name used by the (custom) build system had changed as part of an update, so it was not putting the correct kernel image into the firmware package. I was actually booting the wrong kernel with the "rootfs=ramfs" parameter, one which didn't have a ramfs built in. So, for future reference: this error occurs if you specify "rootfs=ramfs" but your kernel wasn't built with any root fs embedded (CONFIG_INITRAMFS_SOURCE=... NOT specified).
mount: you must specify the filesystem type
I was trying to execute qemu while following the qemu/linaro tutorial: https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Virtual_ARM_Linux_environment

I was executing the command:

sudo mount -o loop,offset=106496 -t auto vexpress.img /mnt/tmp
mount: you must specify the filesystem type

So I ran fdisk on the img file and got the following:

Device         Boot   Start      End  Blocks  Id  System
vexpress.img1  *         63   106494   53216   e  W95 FAT16 (LBA)
vexpress.img2        106496  6291455 3092480  83  Linux

The filesystem is Linux according to the fdisk command. But I get an error:

sudo mount -o loop,offset=106496 -t Linux vexpress.img /mnt/tmp
mount: unknown filesystem type 'Linux'

Kindly help.
You correctly decided to mount the particular partition by specifying its offset, but the offset parameter is in bytes, while fdisk shows the offset in sectors (the sector size is shown above the partition list --- usually 512). For a sector size of 512 the command would be:

sudo mount -o loop,offset=$((106496*512)) -t auto vexpress.img /mnt/tmp

If the automatic file system type detection still does not work, there is another problem. "Linux" is not really a file system type; in the partition table it is a collective type used for multiple possible file systems. For mount you must specify the particular file system. In Linux you can list the supported ones with cat /proc/filesystems.
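The offset arithmetic can be made explicit in a small sketch; the start sector 106496 is taken from the fdisk output in the question, and the 512-byte sector size is an assumption you should verify against the "Units" line of your own fdisk output.

```shell
#!/bin/sh
# Sketch: turn a partition's start sector (from fdisk) into the byte offset
# that mount's offset= option expects. SECTOR_SIZE=512 is an assumption.
START_SECTOR=106496
SECTOR_SIZE=512
OFFSET=$((START_SECTOR * SECTOR_SIZE))
echo "$OFFSET"    # prints 54525952
# Usage: sudo mount -o loop,offset=$OFFSET -t auto vexpress.img /mnt/tmp
```

Doing the multiplication in a variable like this avoids retyping the $(( )) expression every time you remount the image.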
How to limit CPU and RAM resources for mongodump?
I have a mongod server running. Each day I execute mongodump in order to have a backup. The problem is that mongodump takes a lot of resources and slows down the server (which, by the way, already runs some other heavy tasks). My goal is to somehow limit mongodump, which is called from a shell script. Thanks.
You should use cgroups. Mount points and details differ between distros and kernels. E.g. Debian 7.0 with the stock kernel doesn't mount cgroupfs by default and has the memory subsystem disabled (folks advise to reboot with cgroup_enabled=memory), while openSUSE 13.1 ships with all of that out of the box (mostly due to systemd).

So first of all, create mount points and mount cgroupfs if it is not yet done by your distro:

mkdir /sys/fs/cgroup/cpu
mount -t cgroup -o cpuacct,cpu cgroup /sys/fs/cgroup/cpu
mkdir /sys/fs/cgroup/memory
mount -t cgroup -o memory cgroup /sys/fs/cgroup/memory

Create a cgroup:

mkdir /sys/fs/cgroup/cpu/shell
mkdir /sys/fs/cgroup/memory/shell

Set up the cgroup. I decided to alter cpu shares. The default value is 1024, so setting it to 128 limits the cgroup to about 11% of all CPU resources if there are competitors. If there are still free CPU resources, they will be given to mongodump. You may also use cpuset to limit the number of cores available to it.

echo 128 > /sys/fs/cgroup/cpu/shell/cpu.shares
echo 50331648 > /sys/fs/cgroup/memory/shell/memory.limit_in_bytes

Now add PIDs to the cgroup; this will also affect all their children:

echo 13065 > /sys/fs/cgroup/cpu/shell/tasks
echo 13065 > /sys/fs/cgroup/memory/shell/tasks

I ran a couple of tests. A Python process that tried to allocate a bunch of memory was killed by the OOM killer:

myaut@zenbook:~$ python -c 'l = range(3000000)'
Killed

I also ran four infinite loops plus a fifth one inside the cgroup. As expected, the loop that ran in the cgroup got only about 45% of CPU time, while the rest of them got 355% (I have 4 cores).

Note that these changes do not survive a reboot! You may add this code to the script that runs mongodump, or use some permanent solution.
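To see where the "about 11%" figure comes from: under full contention, a group's slice of the CPU is its cpu.shares value divided by the sum of all competing groups' shares. A quick sketch with the default 1024 as the single competitor (the 128/1024 values are the ones used in the answer above):

```shell
#!/bin/sh
# Sketch: approximate CPU slice for a group with 128 shares competing
# against one default group with 1024 shares, as integer percent.
OURS=128
DEFAULT=1024
echo "$((100 * OURS / (OURS + DEFAULT)))%"   # prints 11%
```

This is why shares only bite under contention: the ratio is computed over whoever is actually runnable, so an idle machine still gives mongodump the whole CPU.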