How to set resource limits for init on boot? - linux

I'm trying to find a way to set the rlimit values for the init process at boot time. Normally, an rlimit is set by calling the "setrlimit" system call.
So I was wondering: is there any way to make a system call at boot time (for example, from a shell script)? Or is there any other way to do the equivalent of setrlimit?

On Linux you can use the prlimit syscall/utility to set resource limits for other processes:
prlimit — get and set process resource limits
prlimit [options] [--resource[=limits]] [--pid PID]
and
int prlimit(pid_t pid, int resource, const struct rlimit *new_limit,
struct rlimit *old_limit);
You can run this on boot however you want, such as in /etc/rc.local or equivalent, or with @reboot in crontab if supported.


Can not get correct pid in WSL2

I am learning Linux programming.
While trying to write a simple module that prints the family of a process, I found that I cannot get the correct pid of the process and of its parent. How do I fix it?
Here is a part of my code.
static pid_t pid = 1;
module_param(pid, int, 0644);
static int hello_init(void) {
    struct task_struct *p;
    struct list_head *pp;
    struct task_struct *psibling;
    struct pid *kpid;
    kpid = find_get_pid(pid);
    p = pid_task(kpid, PIDTYPE_PID);
    if (p == NULL) { /* no such process: do not dereference NULL */
        put_pid(kpid);
        return -ESRCH;
    }
    printk("me: %d %s\n", pid, p->comm);
    if (p->parent == NULL) {
        printk("No Parent\n");
    }
    else {
        printk("Parent: %d %s\n", p->parent->pid, p->parent->comm);
    }
    /* walk the parent's children, i.e. the siblings of p (including p) */
    list_for_each(pp, &p->parent->children) {
        psibling = list_entry(pp, struct task_struct, sibling);
        printk("sibling %d %s \n", psibling->pid, psibling->comm);
    }
    /* walk p's own children; they are linked through their sibling field */
    list_for_each(pp, &p->children) {
        psibling = list_entry(pp, struct task_struct, sibling);
        printk("children %d %s \n", psibling->pid, psibling->comm);
    }
    put_pid(kpid); /* drop the reference taken by find_get_pid() */
    return 0;
}
result:
sudo insmod module.ko pid=1
dmesg
[ 6396.170631] me: 237 systemd
[ 6396.170633] Parent: 235 unshare
[ 6396.170633] sibling 237 systemd
[ 6396.170633] children 286 systemd-journal
[ 6396.170634] children 306 systemd-udevd
[ 6396.170635] children 314 systemd-network
[ 6396.170635] children 501 snapfuse
[ 6396.170636] children 508 dbus-daemon
[ 6396.170636] children 509 NetworkManager
[ 6396.170637] children 632 systemd-logind
[ 6396.170637] children 639 systemd
[ 6396.170638] children 665 rtkit-daemon
[ 6396.170638] children 671 polkitd
[ 6396.170638] children 711 udisksd
[ 6396.170639] children 761 upowerd
I'm not a Linux systems development expert, but I'll take a stab at helping based on what I see you trying.
First, you don't mention it in your question, but you are clearly running some sort of Systemd enablement. As you know, Systemd isn't normally supported on WSL. At a high level, the scripts to enable Systemd on WSL all have two essential functions:
Create a new PID namespace where Systemd is running as PID1. At the most basic level, this can be done via:
sudo -b unshare --pid --fork --mount-proc /lib/systemd/systemd --system-unit=basic.target
We can see the unshare in the list of processes returned, so that's getting called, at least.
Wait for Systemd to fully start, then enter the namespace that was created above. This is typically something like:
sudo -E nsenter --all -t $(pgrep -xo systemd) $SHELL
The actual scripts are typically a bit more complicated in order to handle multiple shells, distributions, etc. They also attempt to preserve more of the WSL environment inside the namespace in order to enable the Interop features such as running Windows .exes. But the core concept is always the same.
So, taking a guess here (again, as a non-systems-dev guy), it seems that:
kpid=find_get_pid(1) is returning the systemd process inside the namespace
pid_task(kpid, PIDTYPE_PID) is returning the "true" process information from the root namespace.
It seems to me that code must be running outside the namespace, since you see the unshare as part of it. From within the namespace, the unshare doesn't exist. You can verify this (inside the namespace) with ps -ef | grep unshare.
There are at least two possible solutions:
If it's not an issue (and from the comments, it wasn't), then just run your code from the root pid namespace. I'm assuming that your Systemd script is running via your shell startup files, so you should be able to get back to the root namespace by starting up with something like wsl ~ -e bash --noprofile --norc. This will start the shell without any of the startup scripts.
Of course, other techniques for disabling the Systemd script are probably documented by whatever script you are using.
If you do want your code to work properly from within a PID namespace, then you'll probably need to find the namespace (I'd start with the source of lsns as an example).
Then find the task struct within that namespace (probably find_task_by_pid_ns?).

Does GNU time memory output account for child processes too?

When running GNU time (/usr/bin/time) and checking for memory consumption, does its output account for the memory usage of the child processes of my target program?
Could not find anything in GNU's time manpage.
Yes.
You can easily check with:
$ /usr/bin/time -f '%M' sh -c 'perl -e "\$y=q{x}x(2*1024*1024)" & wait'
8132
$ /usr/bin/time -f '%M' sh -c 'perl -e "\$y=q{x}x(8*1024*1024)" & wait'
20648
GNU time uses the wait4 system call on Linux (via the wait3 glibc wrapper), and while undocumented, the resource usage it returns in the struct rusage also includes the descendants of the process waited for. You can look at the kernel implementation of wait4 in kernel/exit.c for all the details:
$ grep -C2 RUSAGE_BOTH include/uapi/linux/resource.h
#define RUSAGE_SELF 0
#define RUSAGE_CHILDREN (-1)
#define RUSAGE_BOTH (-2) /* sys_wait4() uses this */
#define RUSAGE_THREAD 1 /* only the calling thread */
FreeBSD and NetBSD also have a wait6 system call which returns separate info for the process waited for and for its descendants. They also clearly document that the rusage returned by wait3 and wait4 also includes grandchildren.

How to add a different scheduler to a shell script in Linux?

I am trying to use different schedulers to measure CPU usage among various programs. I am currently having trouble figuring out how to add a different scheduler to the script. I have tried using the chrt command, but I can not reliably get the pid for the script.
PIDs are fickle and racy (only the parent process of a PID can be sure it hasn't died and been recycled).
I'd use the first form (chrt [options] prio command [arg]...) instead, relying on two scripts:
wrapper_script:
exec chrt --fifo 99 wrapee #wrapee must be in $PATH
wrapee:
echo "I'm a hi-priority hello world"

reboot within an initrd image

I am looking for a method to restart/reset my Linux system from within an init-bottom script. At the time my script is executed, the system is mounted under /root and I have access to a busybox.
But the "reboot" command which is part of my busybox does not work. Is there any other possibility?
My system is booted normally with an initramfs image and my script eventually triggers an update process. The new systemd which comes with Debian interferes with this. But with a power reset everything is fine.
I have found this:
echo b >/proc/sysrq-trigger
(it's like pressing CTRL+ALT+DEL)
If you -are- init (the PID of your process/script is 1), then starting the busybox reboot program won't work, since it tries to signal init (which is not started) to reboot.
Instead, as PID 1, you should do what init would do: call the kernel's reboot API directly. See reboot(2) for details.
Assuming you are running a c program or something, one would do:
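#include <unistd.h>
#include <sys/reboot.h>
int main(void) {
    sync(); /* flush filesystem buffers before restarting */
    return reboot(RB_AUTOBOOT); /* RB_AUTOBOOT is 0x1234567; needs root */
}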
This is much better than executing the sysrq trigger which will act more like a panic restart than a clean restart.
As a final note, busybox's init actually forks a process to do the reboot for it. This is because the reboot system call also exits the calling program, and the system should never run without an init process (that would panic the kernel). Hence in this case, you would do something like:
pid_t pid;
pid = vfork();
if (pid == 0) { /* child */
    reboot(0x1234567);
    _exit(EXIT_SUCCESS);
}
while (1); /* Parent (init) waits */

I don't get coredump with all process

I'm trying to get a core dump, so I use:
ulimit -c unlimited
I run my program in the background and kill it:
kill -SEGV %1
But I just get:
[1]+ Exit 1 ./Test
And no core dump is created.
I did the same with other programs and it works, so why doesn't it work with all of them? Can anybody help me?
Thanks. (GNU/Linux, Debian 2.6.26)
If your program traps the SEGV signal and does something else, it won't invoke the OS core dump routine. Check that it doesn't do that.
Under Linux, processes which change their user ID (using setuid, seteuid, or similar calls) are excluded from dumping core for security reasons (think: /bin/passwd dumping core after reading /etc/shadow into memory).
You can re-enable core dumps in Linux programs which change their user ID by calling prctl(PR_SET_DUMPABLE, 1) after the change of UID.
Also you might want to check that the program you're running is not changing its working directory ( chdir() ), because then it will create the core file in a different directory than the one you're running it from.
And you can try this too:
kill -ABRT pid
Try (as root):
sysctl kernel.core_pattern=core
and then repeat your experiment. On some systems that variable is set to /dev/null by default.
However, if you see exit status 1, perhaps the program indeed intercepts the signal.
