Tickless kernel: isolcpus, nohz_full, and rcu_nocbs - Linux

I have added "isolcpus=3 nohz_full=3 rcu_nocbs=3" to grub.conf on
RedHat 7.1 (kernel 3.10.0-229), and following http://www.breakage.org/2013/11/15/nohz_fullgodmode/
I also executed the following commands:
cat /sys/bus/workqueue/devices/writeback/cpumask
f
echo 1 > /sys/bus/workqueue/devices/writeback/cpumask
cat /sys/bus/workqueue/devices/writeback/numa
1
echo 0 > /sys/bus/workqueue/devices/writeback/numa
The box has only 4 CPU cores. I ran the following command:
watch -d 'cat /proc/interrupts'
This looks like it works perfectly: only CPU 0 gets about 2000 local timer interrupts per 2 seconds,
while CPUs 1 to 3 get fewer than 10 per 2 seconds.
Then I tested the following source:
#include <pthread.h>
#include <unistd.h>

/* Idle thread: sleeps almost all the time, so it is rarely runnable. */
void *Thread2(void *param)
{
    pthread_detach(pthread_self());
    while (1) {
        sleep(100000);
    }
}

/* Busy thread: spins forever, so it is always runnable. */
void *Thread1(void *param)
{
    pthread_detach(pthread_self());
    while (1) {
        ;
    }
}

int main(int argc, char **argv)
{
    pthread_t tid;
    pthread_create(&tid, NULL, Thread1, (void *)(long)3);
    pthread_create(&tid, NULL, Thread2, (void *)(long)3);
    while (1)
        sleep(5);
}
and ran it with:
taskset -c 3 ./x1.exe
then watched the output of:
watch -d 'cat /proc/interrupts'
This time CPU 3 gets 10 to 30 local timer interrupts per 2 seconds, which looks fine.
Then I tried to run two instances of Thread1:
pthread_create(&tid , NULL, Thread1, (void*)(long)3);
pthread_create(&tid , NULL, Thread1, (void*)(long)3);
and ran it again:
taskset -c 3 ./x1.exe
Now core 3 gets the same number of local timer interrupts as core 0,
that is, 2000 interrupts per 2 seconds.
May I ask why two very busy Thread1 threads cause core 3 to get
so many more timer interrupts? What causes this,
and how can it be changed, if at all?

In the second case, the kernel needs to schedule two CPU-bound tasks on core 3, and the dynamic-tick (nohz_full) configuration applies only when a CPU has exactly one runnable task. With two runnable tasks the scheduler needs the periodic tick to preempt one task in favour of the other, so the tick stays on.
I thought SCHED_FIFO would stop these interrupts (and so I started answering), but that isn't implemented yet, as per https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt
There is no way to change this behaviour other than scheduling the threads on different CPUs, unless you hack the kernel to achieve what you need.
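To check whether the tick has really stopped on an isolated core, the "LOC:" (Local timer interrupts) line of /proc/interrupts can be sampled twice and the per-CPU counters compared, which is what watch -d shows by eye. The following is only a minimal sketch: parse_loc_line is a hypothetical helper name, and it assumes the usual layout of a "LOC:" label followed by one counter per CPU and then a text description.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse the "LOC:" line of /proc/interrupts into
 * per-CPU counters. Returns the number of counters read, or -1 if the
 * line is not a LOC line. Assumes one counter per CPU after the label. */
int parse_loc_line(const char *line, unsigned long *counts, int max_cpus)
{
    while (*line == ' ' || *line == '\t')
        line++;
    if (strncmp(line, "LOC:", 4) != 0)
        return -1;

    const char *p = line + 4;
    int n = 0;
    while (n < max_cpus) {
        char *end;
        unsigned long v = strtoul(p, &end, 10);
        if (end == p)       /* no more digits: reached the description text */
            break;
        counts[n++] = v;
        p = end;
    }
    return n;
}
```

Calling it on two samples taken a couple of seconds apart and subtracting counts[3] gives the tick count for CPU 3 over that interval.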

Related

Interactively toggle output on and off in Linux?

For controlling output on Linux there is control-s and control-q, which provide a method for temporarily halting terminal output and then resuming it. On VMS in addition there was control-O, which would toggle all output on and off. This didn't pause output, it discarded it.
Is there an equivalent keyboard shortcut in Linux?
This comes up most often for me in gdb, when debugging programs which output millions of status lines. It would be very convenient to be able to temporarily send most of that to /dev/null rather than the screen, and then pick up with the output stream further on, having dispensed with a couple of million lines in between.
(Edited: The termios(3) man page mentions VDISCARD - and then says that it isn't going to work in POSIX or Linux. So it looks like this is out of the question for general command line use on linux. gdb might still be able to discard output though, through one of its own commands. Can it?)
Thanks.
On VMS in addition there was control-O ...
This functionality doesn't appear to exist on any UNIX system I've ever dealt with (or maybe I just never knew it existed; it's documented in e.g. FreeBSD man page, and is referenced by Solaris and HP-UX docs as well).
gdb might still be able to discard output though, through one of its own commands. Can it?
I don't believe so: GDB doesn't actually intercept the output from the inferior (being debugged) process, it simply makes it run (between breakpoints) with the inferior output going to wherever it's going.
That said, you could do it yourself:
#include <stdio.h>
int main()
{
int i;
for (i = 0; i < 1000; ++i) {
printf("%d\n", i);
}
}
gcc -g foo.c
gdb -q ./a.out
(gdb) break 6
Breakpoint 1 at 0x40053e: file foo.c, line 6.
(gdb) run 20>/dev/null # run the program, file descriptor 20 goes to /dev/null
Starting program: /tmp/a.out 20>/dev/null
Breakpoint 1, main () at foo.c:6
6 printf("%d\n", i);
(gdb) c
Continuing.
0
Breakpoint 1, main () at foo.c:6
6 printf("%d\n", i);
We've now completed one iteration and are stopped at the start of the second. Let's prevent further output for 100 iterations:
(gdb) call dup2(20, 1)
$1 = 1
(gdb) ign 1 100
Will ignore next 100 crossings of breakpoint 1.
(gdb) c
Continuing.
Breakpoint 1, main () at foo.c:6
6 printf("%d\n", i);
(gdb) p i
$2 = 102
No output, as desired. Now let's restore output:
(gdb) call dup2(2, 1)
$3 = 1
(gdb) ign 1 10
Will ignore next 10 crossings of breakpoint 1.
(gdb) c
Continuing.
102
103
104
105
106
107
108
109
110
111
112
Breakpoint 1, main () at foo.c:6
6 printf("%d\n", i);
Output restored!
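If the program being debugged can be modified, the same dup2 trick can be built into it instead of being driven from gdb. The sketch below is only an illustration under that assumption; mute_stdout and restore_stdout are hypothetical names, and unlike the session above it saves the original descriptor with dup() rather than relying on fd 2 pointing at the terminal.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Point fd 1 at /dev/null. Returns a saved copy of the old fd 1, or -1. */
int mute_stdout(void)
{
    fflush(stdout);                      /* emit text that is already buffered */
    int saved = dup(STDOUT_FILENO);
    if (saved < 0)
        return -1;
    int devnull = open("/dev/null", O_WRONLY);
    if (devnull < 0) {
        close(saved);
        return -1;
    }
    if (dup2(devnull, STDOUT_FILENO) < 0) {
        close(devnull);
        close(saved);
        return -1;
    }
    close(devnull);                      /* fd 1 now refers to /dev/null */
    return saved;
}

/* Put the saved descriptor back on fd 1. Returns 0 on success, -1 on error. */
int restore_stdout(int saved)
{
    fflush(stdout);                      /* discard muted text into /dev/null */
    if (dup2(saved, STDOUT_FILENO) < 0)
        return -1;
    close(saved);
    return 0;
}
```

From gdb, these could then be invoked with call mute_stdout() and call restore_stdout($N), where $N is the value returned by the first call.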

Real parallelism in Linux shell

I am trying to get real parallelism in the Linux shell, but I can't achieve it.
I have two programs: allones, which only prints '1' characters, and allzeros, which only prints '0' characters.
When I execute "./allones & ./allzeros &", I get big runs of '0's and big runs of '1's that mix in big chunks (e.g. "1111....111000...0000111...111000...000"). My processor has 8 cores.
However, when I execute my own program on a multi-core FPGA (with no OS), and distribute the programs across different cores, I get something like "011000101000011010...".
How can I run it on Linux to get a result similar to what I get on the multi-core FPGA?
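Sounds like you're seeing libc's default stdout buffering: without a newline or an explicit flush, output is collected in a buffer and written out a whole block at a time.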
Here's a test program spam.c:
#include <stdio.h>
int main(int argc, char** argv) {
while(1) {
printf("%s", argv[1]);
}
}
We can run it with:
$ ./spam 0 & ./spam 1 & sleep 1; killall spam
11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111(...)000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000(...)
On my systems, each block is exactly 1024 bytes long, strongly hinting at a buffering issue.
Here's the same code with a fflush to prevent buffering:
#include <stdio.h>
int main(int argc, char** argv) {
while(1) {
printf("%s", argv[1]);
fflush(stdout);
}
}
This is the new output:
100111001100110011001100110011001100110011100111001110011011001100110011001100110011001100110011001100110011001100110011001100011000110001100110001100100110011001100111001101100110011001100110011001100110000000000110010011000110011

perf: How to check processes running on a particular CPU

Is there any option in perf to look into the processes running on a particular CPU/core, and what percentage of that core is taken by each process?
Reference links would be helpful.
perf is intended for profiling, which is not a good fit for your case. You could instead sample /proc/sched_debug (if it is compiled into your kernel). For example, you can check which process is currently running on each CPU:
egrep '^R|cpu#' /proc/sched_debug
cpu#0, 917.276 MHz
R egrep 2614 37730.177313 ...
cpu#1, 917.276 MHz
R bash 2023 218715.010833 ...
Using its PID as a key, you can check how much CPU time (in milliseconds) it has consumed:
grep se.sum_exec_runtime /proc/2023/sched
se.sum_exec_runtime : 279346.058986
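Sampling that value twice gives a CPU percentage for the interval in between. The following is a rough sketch of the arithmetic only; parse_sum_exec_runtime and cpu_percent are hypothetical helper names, and the line format is assumed to match the output shown above.

```c
#include <stdio.h>
#include <string.h>

/* Extract the millisecond value from a "se.sum_exec_runtime : <ms>" line
 * as found in /proc/<pid>/sched. Returns 0 on success, -1 otherwise. */
int parse_sum_exec_runtime(const char *line, double *ms_out)
{
    const char *key = "se.sum_exec_runtime";
    const char *p = strstr(line, key);
    if (p == NULL)
        return -1;
    p = strchr(p + strlen(key), ':');
    if (p == NULL)
        return -1;
    return sscanf(p + 1, "%lf", ms_out) == 1 ? 0 : -1;
}

/* Share of one CPU used between two samples taken interval_ms apart. */
double cpu_percent(double before_ms, double after_ms, double interval_ms)
{
    if (interval_ms <= 0.0)
        return 0.0;
    return (after_ms - before_ms) * 100.0 / interval_ms;
}
```

For example, a task whose runtime grows by 500 ms over a 1000 ms interval has used 50% of a CPU.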
However, as @BrenoLeitão mentioned, SystemTap is quite useful for this. Here is a script for your task.
global cputimes;
global cmdline;
global oncpu;
global NS_PER_SEC = 1000000000;
probe scheduler.cpu_on {
    oncpu[pid()] = local_clock_ns();
}
probe scheduler.cpu_off {
    if(oncpu[pid()] == 0)
        next;
    cmdline[pid()] = cmdline_str();
    cputimes[pid(), cpu()] <<< local_clock_ns() - oncpu[pid()];
    delete oncpu[pid()];
}
probe timer.s(1) {
    printf("%6s %3s %6s %s\n", "PID", "CPU", "PCT", "CMDLINE");
    foreach([pid+, cpu] in cputimes) {
        cpupct = @sum(cputimes[pid, cpu]) * 10000 / NS_PER_SEC;
        printf("%6d %3d %3d.%02d %s\n", pid, cpu,
            cpupct / 100, cpupct % 100, cmdline[pid]);
    }
    delete cputimes;
}
It traces the moments when a process starts running on a CPU and when it stops executing there (due to migration or sleeping) by attaching to the scheduler.cpu_on and scheduler.cpu_off probes. The second probe calculates the time difference between these events and saves it to the cputimes aggregation, along with the process's command line arguments.
timer.s(1) fires once per second: it walks over the aggregation and calculates the percentages. Here is sample output for CentOS 7 with bash running an infinite loop:
0 0 100.16
30 1 0.00
51 0 0.00
380 0 0.02 /usr/bin/python -Es /usr/sbin/tuned -l -P
2016 0 0.08 sshd: root@pts/0 "" "" "" ""
2023 1 100.11 -bash
2630 0 0.04 /usr/libexec/systemtap/stapio -R stap_3020c9e7ba76838179be68cd2390a10c_2630 -F3
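The percentage in the timer.s(1) probe is computed in integer arithmetic: the summed nanoseconds are scaled to hundredths of a percent and then split into whole and fractional parts for printing. The same calculation, written out as a C sketch (the function names are made up for illustration), looks like this:

```c
#include <stddef.h>
#include <stdio.h>

#define NS_PER_SEC 1000000000LL

/* Busy time within a one-second window, in hundredths of a percent. */
long long cpu_pct_hundredths(long long busy_ns)
{
    return busy_ns * 10000 / NS_PER_SEC;
}

/* Format it like the script's "%3d.%02d"; returns snprintf's result. */
int format_cpu_pct(long long busy_ns, char *buf, size_t len)
{
    long long pct = cpu_pct_hundredths(busy_ns);
    return snprintf(buf, len, "%3lld.%02lld", pct / 100, pct % 100);
}
```

A value slightly above 100, such as the 100.16 in the sample output, simply means the summed on-CPU time in that window was a little over one second, since the timer probe does not fire at exact one-second boundaries.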
I understand that perf is not the proper tool for this, although you can limit perf to particular CPUs, using perf record -C <cpulist> or even perf stat -C <cpulist>.
The closest you are going to get is the context-switch event, but this is not going to give you the application names at all.
I think you are going to need something more powerful, such as SystemTap.

How to find all runnable processes

I'm learning about the scheduler and trying to print all runnable processes. So I have written a kernel module that uses the for_each_process macro to iterate over all processes and prints the ones in the "runnable" state. But this seems like a stupid (and inefficient) way of doing this. So I thought about getting a reference to all run queues and using their red-black trees to go over the runnable processes, but couldn't find a way to do this.
I have found out that there is a list of sched_classes for each CPU, which are stop_sched_class->rt_sched_class->fair_sched_class->idle_sched_class, and each one of them has its own run queue. But I couldn't find a way to reach them all.
I have used the module that uses the tasks_timeline to find all runnable processes, to print the addresses of the run queues. It seems I have 3 run queues (while having only two processors).
The module:
#include <linux/module.h> /* Needed by all modules */
#include <linux/kernel.h> /* Needed for KERN_INFO */
#include <linux/sched.h>
MODULE_LICENSE("GPL");
struct cfs_rq {
struct load_weight load;
unsigned int nr_running, h_nr_running;
};
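void printList(void){
    int count;
    struct task_struct * tsk;
    count = 0;
    rcu_read_lock(); /* keep the task list stable while we walk it */
    for_each_process(tsk){
        if(tsk->state) /* TASK_RUNNING is 0; skip tasks that are not runnable */
            continue;
        printk("pid: %d rq: %p (%u)\n", tsk->pid, tsk->se.cfs_rq, tsk->se.cfs_rq->nr_running);
        count++;
    }
    rcu_read_unlock();
    printk("count is: %d\n", count);
}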
int init_module(void)
{
printList();
return 0;
}
void cleanup_module(void)
{
printk(KERN_INFO "Goodbye world proc.\n");
}
The output:
[ 8215.627038] pid: 9147 ffff88007bbe9200 (3)
[ 8215.627043] pid: 9148 ffff8800369b0200 (2)
[ 8215.627045] pid: 9149 ffff8800369b0200 (2)
[ 8215.627047] pid: 9150 ffff88007bbe9200 (3)
[ 8215.627049] pid: 9151 ffff88007bbe9200 (3)
[ 8215.627051] pid: 9154 ffff8800a46d4600 (1)
[ 8215.627053] count is: 6
[ 8215.653741] Goodbye world proc.
About the computer:
$ uname -a
Linux k 3.13.0-39-generic #66-Ubuntu SMP Tue Oct 28 13:30:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/cpuinfo | grep 'processor' | wc -l
2
So my questions are:
How can I print all runnable processes in a nicer way?
How are run queues made and managed?
Are the run queues somehow linked to each other? (How?)
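You can try the command below: run ps -A -l and look at the process state (S column, where R means runnable) together with the process flags (F column). Note that grep matches R or D anywhere in the line, so the header and some rows match only because of their command name.
Sample output: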
127:~$ ps -A -l | grep -e R -e D
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
1 S 0 1367 2 0 80 0 - 0 - ? 00:00:01 SEPDRV_ABNORMAL
4 R 1000 2634 2569 2 80 0 - 794239 - ? 00:25:06 Web Content
1 D 0 20091 2 0 80 0 - 0 - ? 00:00:00 kworker/3:2
4 R 1000 21077 9361 0 80 0 - 7229 - pts/17 00:00:00 ps
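The state letter that ps prints comes from /proc/<pid>/stat, where it is the field right after the parenthesised command name. From user space, a sketch of the same check could scan those files directly; stat_state and is_runnable_state below are hypothetical helper names, and the parsing assumes the documented "pid (comm) state ..." layout.

```c
#include <string.h>

/* Return the state letter from a /proc/<pid>/stat line, or '\0' if the
 * line is malformed. The command name may itself contain spaces or ')',
 * so search for the LAST closing parenthesis. */
char stat_state(const char *stat_line)
{
    const char *p = strrchr(stat_line, ')');
    if (p == NULL)
        return '\0';
    p++;
    while (*p == ' ')
        p++;
    return *p;
}

/* 'R' covers both running and ready-to-run tasks. */
int is_runnable_state(char state)
{
    return state == 'R';
}
```

Unlike the grep above, this looks only at the state field, so a command name containing R or D cannot produce a false match.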

Why does Linux always output "^C" when Ctrl+C is pressed?

I have been studying signals in Linux. And I've done a test program to capture SIGINT.
#include <unistd.h>
#include <signal.h>
#include <iostream>
void signal_handler(int signal_no);
int main() {
signal(SIGINT, signal_handler);
for (int i = 0; i < 10; ++i) {
std::cout << "I'm sleeping..." << std::endl;
unsigned int one_ms = 1000;
usleep(200* one_ms);
}
return 0;
}
void signal_handler(int signal_no) {
if (signal_no == SIGINT)
std::cout << "Oops, you pressed Ctrl+C!\n";
return;
}
While the output looks like this:
I'm sleeping...
I'm sleeping...
^COops, you pressed Ctrl+C!
I'm sleeping...
I'm sleeping...
^COops, you pressed Ctrl+C!
I'm sleeping...
^COops, you pressed Ctrl+C!
I'm sleeping...
^COops, you pressed Ctrl+C!
I'm sleeping...
^COops, you pressed Ctrl+C!
I'm sleeping...
I'm sleeping...
I'm sleeping...
I understand that when Ctrl+C is pressed, all processes in the foreground process group receive a SIGINT (unless a process chooses to ignore it).
So is it that both the shell (bash) and the instance of the above program received the signal? Where does the "^C" before each "Oops" come from?
The OS is CentOS, and the shell is bash.
It is the terminal (driver) that intercepts the ^C and translates it to a signal sent to the attached process (which is the shell). stty intr ^B would instruct the terminal driver to intercept a ^B instead. It is also the terminal driver that echoes the ^C back to the terminal.
The shell is just a process that sits at the other end of the line and receives its stdin from your terminal via the terminal driver (such as /dev/ttyX); its stdout (and stderr) are also attached to the same tty.
Note that (if echoing is enabled) the terminal sends the keystrokes to both the process (group) and back to the terminal. The stty command is just a wrapper around the ioctl()s for the tty driver of the process's "controlling" tty.
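The two characters "^C" are the caret notation the terminal driver uses when it echoes a control character (governed by the ECHOCTL flag, see termios(3); stty -echoctl turns it off): the echoed letter is the control code plus 0x40. A small sketch of that mapping, with a made-up function name:

```c
/* Write the caret notation for a control character into out ("^C" for
 * 0x03). Returns 1 if c is a control character, 0 otherwise (out is then
 * left as an empty string). XOR with 0x40 maps 0x03 to 'C', 0x7f to '?'. */
int caret_notation(unsigned char c, char out[3])
{
    if (c < 0x20 || c == 0x7f) {
        out[0] = '^';
        out[1] = (char)(c ^ 0x40);
        out[2] = '\0';
        return 1;
    }
    out[0] = '\0';
    return 0;
}
```

So the interrupt key 0x03 is shown as ^C, and a terminal reconfigured with stty intr ^B would show ^B for 0x02 instead.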
UPDATE: to demonstrate that the shell is not involved, I created the following small program. It should be executed by its parent shell via exec ./a.out (it appears an interactive shell will fork a daughter shell anyway). The program sets the key that generates SIGINT to ^B, switches echo off, and then waits for input from stdin.
#include <stdio.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>
#include <signal.h>
#include <errno.h>
int thesignum = 0;
void handler(int signum);
void handler(int signum)
{ thesignum = signum;}
#define THE_KEY 2 /* ^B */
int main(void)
{
int rc;
struct termios mytermios;
rc = tcgetattr(0 , &mytermios);
printf("tcgetattr=%d\n", rc );
mytermios.c_cc[VINTR] = THE_KEY; /* set intr to ^B */
mytermios.c_lflag &= ~ECHO ; /* Dont echo */
rc = tcsetattr(0 , TCSANOW, &mytermios);
printf("tcsetattr(intr,%d) =%d\n", THE_KEY, rc );
printf("Setting handler()\n" );
signal(SIGINT, handler);
printf("entering pause()\n... type something followed by ^%c\n", '@'+THE_KEY );
rc = pause();
printf("Rc=%d: %d(%s), signum=%d\n", rc, errno , strerror(errno), thesignum );
// mytermios.c_cc[VINTR] = 3; /* reset intr to ^C */
mytermios.c_lflag |= ECHO ; /* Do echo */
rc = tcsetattr(0 , TCSANOW, &mytermios);
printf("tcsetattr(intr,%d) =%d\n", THE_KEY, rc );
return 0;
}
intr.sh:
#!/bin/sh
echo $$
exec ./a.out
echo I am back.
The terminal echoes everything you type, so when you type ^C, that too gets echoed (and in your case the resulting signal is caught by your signal handler). The command stty -echo may or may not be useful to you depending on your needs and constraints; see the man page for stty for more information.
Of course much more goes on at a lower level: any time you communicate with a system via peripherals, device drivers (such as the keyboard driver that you use to generate the ^C, and the terminal driver that displays everything) are involved. You can dig even deeper at the level of assembly/machine language, registers, lookup tables etc. If you want a more detailed, in-depth understanding, the books below are a good place to start:
The Design of the UNIX Operating System is a good reference for this sort of thing. Two more classic references: The Unix Programming Environment
and Advanced Programming in the UNIX Environment
There is a nice summary in this SO question: How does Ctrl-C terminate a child process?
"When you run a program, for example find, the shell:
forks itself,
sets the default signal handling for the child,
and replaces the child with the given command (e.g. with find).
When you press CTRL-C, the parent shell handles this signal, but the child will receive it too, with the default action: terminate. (The child can implement signal handling too.)"
