Linux 3.2 /dev/shm performance variable?

I'm using the /dev/shm tmpfs to write lots of temporary files: sets of 8-10 files, with individual files ranging from 70 kB to 750 kB. The sets are all approximately the same size and arrive to be written regularly, about once per second.
The files are written by Python code calling a library that does the actual writing with fwrite().
When the application starts, the writes take anywhere from 30 ms to over 400 ms. It's usually the largest (~700 kB) files that take around 400 ms, but it varies. Here is an example set of 8 write times, in ms: 42, 30, 320, 76, 66, 72, 102, 440.
So the standard deviation of write times to /dev/shm is quite large.
After the application has run for a couple of minutes, the write times plummet and the variance is much smaller (e.g. 7, 8, 15, 23, 24, 32, 51, 71). This behavior is stable; I have run the application for several hours.
There are no other applications of consequence running concurrently and there is plenty of room on /dev/shm.
It seems that the Linux kernel is dynamically adjusting to the application's use of /dev/shm. My question is: is my suspicion about the Linux kernel correct? If so, is there any way to configure or notify the kernel ahead of time so that it uses the desired behavior (faster writes to /dev/shm) from the moment my application starts?
I'm using Ubuntu 12.04 LTS
$ uname -a
Linux devsb02 3.2.0-24-generic #39-Ubuntu SMP Mon May 21 16:52:17 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
$ ls -l /dev/shm
lrwxrwxrwx 1 root root 8 Aug 23 12:19 /dev/shm -> /run/shm
$ mount
....
none on /run/shm type tmpfs (rw,nosuid,nodev)
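To reproduce the measurement outside the application, one can time writes of comparable sizes from the shell, roughly like this (only a sketch; the real application writes via fwrite() from Python, so absolute numbers will differ):
for size in 70K 200K 400K 750K; do
    start=$(date +%s%N)                                    # wall-clock time in nanoseconds (GNU date)
    head -c "$size" /dev/zero > /dev/shm/testfile_"$size"  # write one file of roughly the sizes described
    end=$(date +%s%N)
    echo "$size: $(( (end - start) / 1000000 )) ms"
done
rm -f /dev/shm/testfile_*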

Related

How can we change the RAPL power limits on a Haswell CPU?

I am trying to change the power limits defined in the RAPL registers of my system. It is a Haswell CPU.
I have tried two approaches:
Using the MSR registers:
I use the rdmsr command (as root) to read the contents of the 0x610 register, in which the power limits are defined, and then the wrmsr command to write to it. I am trying to change the top bit of this register from 1 to 0 to unlock the power limits.
rdmsr -p0 0x610 returns: 8042828a001a8208
wrmsr -p0 0x610 0x0042828a001a8208 executes without any error message
Then I read the register again using: rdmsr -p0 0x610
It prints: 8042828a001a8208
As you can see, I am trying to change the leading hex digit from 8 to 0 and leave the rest unchanged, but the bit won't change.
The other approach I tried for changing the power limits was to edit the powercap sysfs files. I change to the directory /sys/class/powercap/intel-rapl/intel-rapl:0
Here we have these two files:
-rwxr-xr-x 1 root root 4.0K Nov 21 15:45 constraint_0_power_limit_uw and
-rw-r--r-- 1 root root 4.0K Nov 21 15:42 constraint_1_power_limit_uw
As you can see, I have changed the permissions of the first file. The first one contains the value 65000000 and the second 81250000. When I try to change the first to (say) 62000000 and save the file, the editor throws an "FSync failed" error (E667). I disabled fsync with 'set nofsync', but then it throws a "file system full" error (E514). I reduced my disk usage and even rebooted the system, but then it throws E509.
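I suppose one could also write the value directly from the shell instead of saving the file from an editor (editors usually fsync() or rewrite the file, which sysfs attributes tend to reject); whether the new value actually sticks is a separate matter:
echo 62000000 | sudo tee /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw
cat /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw   # read back to see whether the kernel accepted the value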
What am I doing wrong? I need to manipulate the RAPL power limits to regulate my system's TDP. Is there any other way to change RAPL limits?
Please guide me. Thanks in advance.

Bash on Windows 10, no loop devices

I've just tried Bash on my Windows 10 PC, and it works fine. However, I found with ls /dev/ that there are no loop devices, and modprobe loop gives an error.
Does that mean this Bash doesn't support loop devices at all, or is there a way to mount an image as a loop device?
Windows Subsystem for Linux 1 (WSL, formerly known as Bash on Ubuntu on Windows) did not support loop devices. There was a feature request and an issue about it on Microsoft's GitHub repo.
WSL 2, however, does support loop devices.
$ uname -a
Linux Blade 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ fallocate -l 1G test.img
$ mkfs.ext3 test.img
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: 549cca4d-a65f-4f4f-8428-e324feaed3d0
Superblock backups stored on blocks:
32768, 98304, 163840, 229376
Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
$ sudo mount -o loop test.img /media/
$ ls /media/
lost+found
Do you know that Bash is just a shell (something that reads your commands, executes them, pipes between them, and lets you write scripts) and not an operating system?
Loop devices are part of the Linux kernel, and they simply don't exist in the Windows kernel.

How to limit CPU and RAM resources for mongodump?

I have a mongod server running, and each day I run mongodump to make a backup. The problem is that mongodump takes a lot of resources and slows down the server (which, by the way, already runs some other heavy tasks).
My goal is to somehow limit mongodump, which is called from a shell script.
Thanks.
You should use cgroups. Mount points and details differ between distros and kernels. For example, Debian 7.0 with the stock kernel doesn't mount cgroupfs by default and has the memory subsystem disabled (folks advise rebooting with cgroup_enable=memory), while openSUSE 13.1 ships with all of that out of the box (mostly thanks to systemd).
So first of all, create mount points and mount cgroupfs if not yet done by your distro:
mkdir /sys/fs/cgroup/cpu
mount -t cgroup -o cpuacct,cpu cgroup /sys/fs/cgroup/cpu
mkdir /sys/fs/cgroup/memory
mount -t cgroup -o memory cgroup /sys/fs/cgroup/memory
Create a cgroup:
mkdir /sys/fs/cgroup/cpu/shell
mkdir /sys/fs/cgroup/memory/shell
Set up the cgroups. I decided to alter the CPU shares. The default value is 1024, so setting it to 128 limits the cgroup to about 11% of CPU time, but only when there are competitors; if there is still free CPU time, it will be given to mongodump. You may also use cpuset to limit the number of cores available to it.
echo 128 > /sys/fs/cgroup/cpu/shell/cpu.shares
echo 50331648 > /sys/fs/cgroup/memory/shell/memory.limit_in_bytes
Now add the PIDs to the cgroups; this will also affect all of their children.
echo 13065 > /sys/fs/cgroup/cpu/shell/tasks
echo 13065 > /sys/fs/cgroup/memory/shell/tasks
I ran a couple of tests. A Python process that tried to allocate a bunch of memory was killed by the OOM killer:
myaut@zenbook:~$ python -c 'l = range(3000000)'
Killed
I also ran four infinite loops outside the cgroup and a fifth inside it. As expected, the loop running in the cgroup got only about 45% of CPU time, while the other four got 355% between them (I have 4 cores).
Note that none of these changes survive a reboot!
You may add this code to the script that runs mongodump, or use some more permanent solution.
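For example, a minimal wrapper along these lines could join the cgroups and then start the dump (the mongodump arguments here are only placeholders):
#!/bin/bash
# Hypothetical wrapper: put this shell (and everything it starts) into the
# "shell" cgroups created above, then run the dump.
echo $$ > /sys/fs/cgroup/cpu/shell/tasks
echo $$ > /sys/fs/cgroup/memory/shell/tasks
# exec replaces the shell with mongodump; it inherits the cgroup membership.
exec mongodump --out /var/backups/mongo-$(date +%F)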

Why is the system CPU time (% sy) high?

I am running a script that loads big files. I ran the same script on a single-core openSUSE server and on a quad-core PC. As expected, it is much faster on my PC than on the server. But the script slows the server down and makes it impossible to do anything else on it.
My script is
for 100 iterations
Load saved data (about 10 MB)
time myscript (in PC)
real 0m52.564s
user 0m51.768s
sys 0m0.524s
time myscript (in server)
real 32m32.810s
user 4m37.677s
sys 12m51.524s
I wonder why "sys" is so high when I run the code on the server. I used the top command to check memory and CPU usage.
There still seems to be free memory, so swapping is not the reason. %sy is very high and is probably the reason the server is so slow, but I don't know what is causing it. The process using the highest percentage of CPU (99%) is "myscript". %wa is zero in the screenshot, but it sometimes gets very high (50%).
When the script is running, the load average is greater than 1, but I have never seen it as high as 2.
I also checked my disk:
strt:~ # hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 16480 MB in 2.00 seconds = 8247.94 MB/sec
Timing buffered disk reads: 20 MB in 3.44 seconds = 5.81 MB/sec
john@strt:~> df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 245G 102G 131G 44% /
udev 4.0G 152K 4.0G 1% /dev
tmpfs 4.0G 76K 4.0G 1% /dev/shm
I have checked these things, but I am still not sure what the real problem on my server is or how to fix it. Can anyone identify a probable reason for the slowness? What could the solution be?
Or is there anything else I should check?
Thanks!
You're seeing high system CPU time because loading the data involves system calls that execute in the kernel. It may be possible to resolve the slowness without upgrading the server: you can lower the script's scheduling priority. See the man pages for nice and renice, and in particular:
Niceness values range from -20 (the highest priority, lowest niceness) to 19 (the lowest priority, highest niceness).
$ ps -lp 941
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 S 0 941 1 0 70 -10 - 1713 poll_s ? 00:00:00 sshd
$ nice -n 19 ./test.sh
My niceness value is 19
$ renice -n 10 -p 941
941 (process ID) old priority -10, new priority 10
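Applied to this case, a minimal sketch might be (using the process name from the question):
$ nice -n 19 myscript                      # start the loader at the lowest priority
$ renice -n 19 -p $(pgrep -f myscript)     # or lower the priority of an instance that is already running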

du -skh * in / returns a vastly different size from df on CentOS 5.5

I have a VPS slice running CentOS 5.5. I am supposed to have 15 GB of disk space, but df appears to show double my actual disk usage.
When I run du -skh * in / as root I get:
[root@yardvps1 /]# du -skh *
0 aquota.group
0 aquota.user
5.2M bin
4.0K boot
4.0K dev
4.9M etc
2.5G home
12M lib
14M lib64
4.0K media
4.0K mnt
299M opt
0 proc
692K root
23M sbin
4.0K selinux
4.0K srv
0 sys
48K tmp
2.0G usr
121M var
This is consistent with what I have uploaded to the machine, and adds up to about 5 GB.
But when I run df I get:
[root@yardvps1 /]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/simfs 15728640 11659048 4069592 75% /
none 262144 4 262140 1% /dev
It shows me using almost 12 GB already.
What is causing this discrepancy, and is there anything I can do about it? I planned the server around 15 GB, but now it basically only lets me keep about 7 GB of stuff on it.
Thanks.
The most common cause of this effect is open files that have been deleted.
The kernel will only free the disk blocks of a deleted file if it is not in use at the time of its deletion. Otherwise that is deferred until the file is closed, or the system is rebooted.
A common Unix-world trick to ensure that no temporary files are left around is the following:
A process creates and opens a temporary file
While still holding the open file descriptor, the process unlinks (i.e. deletes) the file
The process reads and writes to the file normally using the file descriptor
The process closes the file descriptor when it's done, and the kernel frees the space
If the process (or the system) terminates unexpectedly, the temporary file is already deleted and no clean-up is necessary.
As a bonus, deleting the file reduces the chances of naming collisions when creating temporary files and it also provides an additional layer of obscurity over the running processes - for anyone but the root user, that is.
This behaviour ensures that processes don't have to deal with files that are suddenly pulled from under their feet, and also that processes don't have to consult each other in order to delete a file. It is unexpected behaviour for those coming from Windows systems, though, since there you are not normally allowed to delete a file that is in use.
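To make the effect concrete, here is a small shell sketch (sizes and paths are arbitrary) of space that df counts but du cannot see:
tmp=$(mktemp)               # create a temporary file
exec 3<>"$tmp"              # open it for read/write on file descriptor 3
rm "$tmp"                   # unlink it: the name is gone, the blocks are not
head -c 100M /dev/zero >&3  # write 100 MB through the still-open descriptor
df -h /tmp                  # "Used" includes the 100 MB held by the deleted file
du -sh /tmp                 # du cannot see it: no directory entry references it
lsof -p $$ | grep deleted   # the shell still holds the deleted file open
exec 3>&-                   # closing the descriptor finally frees the space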
The lsof command, when run as root, will show all open files and will specifically mark those that have been deleted:
# lsof 2>/dev/null | grep deleted
bootlogd 2024 root 1w REG 9,3 58 917506 /tmp/init.0W2ARi (deleted)
bootlogd 2024 root 2w REG 9,3 58 917506 /tmp/init.0W2ARi (deleted)
Stopping and restarting the guilty processes, or just rebooting the server should solve this issue.
Deleted files could also be held open by the kernel if, for example, it's a mounted filesystem image. In this case unmounting the filesystem or rebooting the server should do the trick.
In your case, judging by the size of the "missing" space, I'd look for any lingering references to the file you used to set up the VPS, e.g. the CentOS DVD image that you deleted after installing.
Another case which I've come across, although it doesn't appear to be your issue, is mounting a partition "on top" of existing files.
If you do so, any files that already exist in the mount-point directory on the underlying filesystem are hidden by the mounted partition: du can no longer see them, but they still consume space that df reports.
To fix it: stop any processes with open files on the mounted partition, unmount the partition, then find and move or remove any files that now appear in the mount-point directory.
I had the same trouble with a FreeBSD server. A reboot helped.
