Why can't I handle NMI? - linux

I want to handle NMIs and do something when one occurs. First I wrote a naive NMI handler:
static irqreturn_t nmi_handler(int irq, void *dev_id) {
    printk("-#_#- I'm TT, I am handling NMI.\n");
    return IRQ_HANDLED;
}
Then I wrote a module that registers my NMI handler and uses the APIC to trigger an NMI 5 times:
static void __init ipi_init(void) {
    printk("-#_#- I'm coming again, hahaha!\n");
    int result = request_irq(NMI_VECTOR,
                             nmi_handler, IRQF_DISABLED, "NMI Watchdog", NULL);
    printk("--- the result of request_irq is: %d\n", result);
    int i;
    for (i = 0; i < 5; ++i) {
        apic->send_IPI_allbutself(NMI_VECTOR);
        ssleep(1);
    }
}
I loaded the module with "insmod xxx.ko" and then checked /var/log/syslog:
kernel: [ 1166.231005] -#_#- I'm coming again, hahaha!
kernel: [ 1166.231028] --- the result of request_irq is: 0
kernel: [ 1166.231050] Uhhuh. NMI received for unknown reason 00 on CPU 1.
kernel: [ 1166.231055] Do you have a strange power saving mode enabled?
kernel: [ 1166.231058] Dazed and confused, but trying to continue
kernel: [ 1167.196293] Uhhuh. NMI received for unknown reason 00 on CPU 1.
kernel: [ 1167.196293] Do you have a strange power saving mode enabled?
kernel: [ 1167.196293] Dazed and confused, but trying to continue
kernel: [ 1168.201288] Uhhuh. NMI received for unknown reason 00 on CPU 1.
kernel: [ 1168.201288] Do you have a strange power saving mode enabled?
kernel: [ 1168.201288] Dazed and confused, but trying to continue
kernel: [ 1169.235553] Uhhuh. NMI received for unknown reason 00 on CPU 1.
kernel: [ 1169.235553] Do you have a strange power saving mode enabled?
kernel: [ 1169.235553] Dazed and confused, but trying to continue
kernel: [ 1170.236343] Uhhuh. NMI received for unknown reason 00 on CPU 1.
kernel: [ 1170.236343] Do you have a strange power saving mode enabled?
kernel: [ 1170.236343] Dazed and confused, but trying to continue
The log shows that nmi_handler was registered successfully (result = 0) and that the NMI was triggered 5 times, but the string that nmi_handler should print never appears.
I am working on Ubuntu 10.04 LTS with an Intel Pentium 4 dual-core.
1. Does it mean my NMI handler didn't execute?
2. How do I handle NMIs in Linux?

Nobody?
My partner gave me 3 more days, so I read the source code and ULK3, and now I can answer question 1:
Does it mean my NMI handler didn't execute?
In fact, IRQ numbers and interrupt vector numbers are different things! request_irq() ends up in __setup_irq(), the same helper that setup_irq() wraps:
/**
 * setup_irq - setup an interrupt
 * @irq: Interrupt line to setup
 * @act: irqaction for the interrupt
 *
 * Used to statically setup interrupts in the early boot process.
 */
int setup_irq(unsigned int irq, struct irqaction *act)
{
    struct irq_desc *desc = irq_to_desc(irq);
    return __setup_irq(irq, desc, act);
}
Note the description of the irq parameter: "Interrupt line to setup". The irq argument is an interrupt line number, not an interrupt vector number. According to ULK3 (p. 203), the timer interrupt has IRQ 0, but its vector number is 32! So I triggered vector 2 (the NMI), but my handler was actually attached to IRQ 2, which ULK3's mapping puts at vector 34. I looked for more evidence in the source code, for example how an IRQ is converted to a vector (I modified my handler and init function to request irq=2, and Linux allotted INT=50), but found nothing except this in linux-xxx/arch/x86/include/asm/irq_vectors.h:
/*
* IDT vectors usable for external interrupt sources start
* at 0x20:
*/
#define FIRST_EXTERNAL_VECTOR 0x20
Give me a while... let me read more code to answer question 2.
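To make the IRQ-versus-vector distinction concrete, here is a minimal user-space sketch of the ULK3-style mapping mentioned above, where legacy IRQ line n is delivered on vector 32 + n. The helper name irq_to_vector_ulk3 is hypothetical, and the two #define values are taken from the header quoted above and from the fact that the NMI is vector 2. Newer kernels allocate vectors differently (the irq=2 to INT=50 observation above shows the offset is not always 32), so this illustrates the idea only, not the kernel's actual allocation logic.

```c
#include <stdio.h>

/* From arch/x86/include/asm/irq_vectors.h, as quoted above. */
#define FIRST_EXTERNAL_VECTOR 0x20
/* The NMI is CPU vector 2. It is a vector number, not an IRQ line. */
#define NMI_VECTOR 0x02

/*
 * Hypothetical helper: ULK3-style mapping from a legacy IRQ line to the
 * IDT vector it is delivered on. Assumption: a fixed offset of 32.
 */
int irq_to_vector_ulk3(int irq)
{
    return FIRST_EXTERNAL_VECTOR + irq;
}

/* Shows why passing NMI_VECTOR to request_irq() misses the NMI. */
void print_nmi_mismatch(void)
{
    printf("NMI arrives on vector %d\n", NMI_VECTOR);
    printf("request_irq(%d, ...) attaches to IRQ line %d, vector %d\n",
           NMI_VECTOR, NMI_VECTOR, irq_to_vector_ulk3(NMI_VECTOR));
}
```

Calling print_nmi_mismatch() under these assumptions reports that the handler sits on vector 34 while the NMI arrives on vector 2, which is consistent with the handler never running.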

Related

how to set arm spi interrupt to IRQ_TYPE_EDGE_BOTH

As a newbie on this topic, I am trying to change the DTS entry of an optical port to IRQ_TYPE_EDGE_BOTH to catch the events when a port is plugged in (EDGE_RISING) or removed (EDGE_FALLING). However, the kernel complains when I set the IRQ mode to IRQ_TYPE_EDGE_BOTH, which is (IRQ_TYPE_EDGE_RISING|IRQ_TYPE_EDGE_FALLING):
genirq: Setting trigger mode 3 for irq 47 failed (gic_set_type+0x0/0x98)
A quick look at irq-gic.c shows:
/* SPIs have restrictions on the supported types */
if (gicirq >= 32 && type != IRQ_TYPE_LEVEL_HIGH &&
    type != IRQ_TYPE_EDGE_RISING)
        return -EINVAL;
Is there any way to set the interrupt to IRQ_TYPE_EDGE_BOTH? Our ARM is a 32-bit ARMv7-A (hard-float, NEON).

Working with GPIOs in kernel module IOCTLs

I am working with GPIOs in my kernel module. When I set or reset GPIOs from an IOCTL, I get the following warning in my dmesg log:
[11115.549204] WARNING: CPU: 1 PID: 5199 at drivers/gpio/gpiolib.c:2415 gpiod_get_raw_value+0x7c/0xb8
[11115.558267] Modules linked in: ariodrv(O) [last unloaded: ariodrv]
[11115.564570] CPU: 1 PID: 5199 Comm: ARIO_RMG Tainted: G W O 4.9.166.RMG.-00002-gcbd9807b6c03-dirty #13
[11115.574776] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[11115.581320] Backtrace:
[11115.583816] [<8010b150>] (dump_backtrace) from [<8010b3fc>] (show_stack+0x18/0x1c)
[11115.591426] r7:00000009 r6:600b0013 r5:80c1ae70 r4:00000000
[11115.597119] [<8010b3e4>] (show_stack) from [<803f51d4>] (dump_stack+0x9c/0xb0)
[11115.604380] [<803f5138>] (dump_stack) from [<80124878>] (__warn+0xec/0x104)
[11115.611367] r7:00000009 r6:80a39e28 r5:00000000 r4:00000000
[11115.617050] [<8012478c>] (__warn) from [<80124948>] (warn_slowpath_null+0x28/0x30)
[11115.624653] r9:8d696000 r8:7ea8cfa0 r7:0000000e r6:8d26e600 r5:8c1f9c54 r4:8c207f10
[11115.632434] [<80124920>] (warn_slowpath_null) from [<8042fbb8>] (gpiod_get_raw_value+0x7c/0xb8)
[11115.641177] [<8042fb3c>] (gpiod_get_raw_value) from [<7f00cd78>] (device_ioctl+0x334/0x9f8 [ariodrv])
[11115.650428] r5:8004d282 r4:7ea8cfa0
[11115.654034] [<7f00ca44>] (device_ioctl [ariodrv]) from [<80219c58>] (do_vfs_ioctl+0xa8/0x914)
[11115.662595] r7:0000000e r6:8d26e600 r5:8ccc5bc0 r4:7ea8cfa0
[11115.668278] [<80219bb0>] (do_vfs_ioctl) from [<8021a500>] (SyS_ioctl+0x3c/0x64)
[11115.675618] r10:00000036 r9:8d696000 r8:7ea8cfa0 r7:8004d282 r6:8d26e600 r5:0000000e
[11115.683477] r4:8d26e601
[11115.686035] [<8021a4c4>] (SyS_ioctl) from [<80107960>] (ret_fast_syscall+0x0/0x48)
[11115.693645] r9:8d696000 r8:80107b44 r7:00000036 r6:00000000 r5:768c611c r4:7ea8cf98
[11115.701504] ---[ end trace 7be84f1e05fd36af ]---
But if I set or get the value of a GPIO pin in another function, such as my module's init function, I don't get these warnings.
So the question is: how exactly should I work with a GPIO pin in an IOCTL call?
This is part of my GPIO set IOCTL code:
IOCTL_FUNC(...) {
....
case IOCTL_RMG_GPIO_SET:
{
....
//I have initialized the GPIO pin as output before, and assume my gpio pin number is 4.
//int gpioNumber = 4;
//int value = 1;
gpio_set_value(gpioNumber, value);
break;
}
....
}
It doesn't matter whether I get or set a value. If I use those GPIOs in an IOCTL call, I get the warning. But in other internal functions such as init_module() or module_release(), I can set and get these values without any warning.
EDIT 1:
The problem only affects GPIOs on my I/O expander (MCP23xxx series), which sits on the I2C bus.
I don't get any problem or warning when using the GPIOs on my processor (i.MX6DL).
EDIT 2:
@Tsyvarev and @0andriy, thank you. From this link I figured out that gpiod_get_raw_value_cansleep() is not what I need, because it requires a GPIO descriptor, and my kernel error was about that. The functions gpio_get_value_cansleep() and gpio_set_value_cansleep(), however, are suited to the I2C I/O expander.
Thank you for helping me. The working code is now:
IOCTL_FUNC(...) {
....
case IOCTL_RMG_GPIO_SET:
{
....
//I have initialized the GPIO pin as output before, and assume my gpio pin number is 4.
//int gpioNumber = 4;
//int value = 1;
gpio_set_value_cansleep(gpioNumber, value);
break;
}
case IOCTL_RMG_GPIO_GET:
{
....
//I have initialized the GPIO pin as output before, and assume my gpio pin number is 4.
//int gpioNumber = 4;
value = gpio_get_value_cansleep(gpioNumber);
break;
}
....
}
If you read the linux source at the warning, it tells you:
* This function should be called from contexts where we cannot sleep, and will
* complain if the GPIO chip functions potentially sleep.
WARN_ON(desc->gdev->chip->can_sleep);
You should be calling gpio_get_value_cansleep

Why is the compiler adding an extra 'sxtw' instruction (resulting further in a kernel panic)?

Issue/Symptom:
At the end of a function return, the compiler adds an sxtw instruction as seen in the disassembly, resulting in a return address of only 32 bits instead of 64 bits, resulting in a kernel panic:
Unable to handle kernel paging request at virtual address xxxx
Build Environment:
Platform : ARMV7LE
gcc, linux-4.4.60
Architecture : arm64
gdb : aarch64-5.3-glibc-2.22/usr/bin/aarch64-linux-gdb
Details:
Here's the simplified project structure. The corresponding makefile handles it correctly. Also note that file1.c and file2.c are part of the same module.
../src/file1.c /* It has func1() defined as well as called */
../src/file2.c
../inc/files.h /* There's no func1() declared in the header */
Cause of the issue:
A call to func1() was added in file2.c without a declaration of func1 in files.h or file2.c. (Basically, the declaration of func1 was accidentally left out of files.h.)
Code compiled with no errors, but a warning as expected -- Implicit declaration of function func1.
At run time though, right after returning from func1 inside file2, the system crashed as it tried de-referencing the returned address from func1.
Further analysis showed that at the end of a function return, the compiler added an sxtw instruction as seen in the disassembly, resulting in a return address of only 32 bits instead of 64 bits, resulting in a kernel panic.
Unable to handle kernel paging request at virtual address xxxx
Note that x19 is of 64 bit while w0 is of 32 bit.
Note that x0 LS word matches with that of x19.
System crashed while de-referencing x19.
sxtw x19, w0 /* This was added by compiler as extra instruction */
ldp x1, x0, [x19,#304] /* System crashed here */
Registers:
[ 91.388130] pc : [<ffffff80016c9074>] lr : [<ffffff80016c906c>] pstate: 80000145
[ 91.462090] sp : ffffff80094333b0
[ 91.552708] x29: ffffff80094333d0 x28: ffffffc06995408a
[ 91.652701] x27: ffffffc06c400a00 x26: 0000000000000000
[ 91.716243] x25: 0000000000000000 x24: ffffffc069958000
[ 91.779784] x23: ffffffc076e00000 x22: ffffffc06c400a00
[ 91.843326] x21: 0000000000000031 x20: ffffffc073060000
[ 91.906867] x19: 0000000066bfc780 x18: ffffff8009436888
[ 91.970409] x17: 0000000000000000 x16: ffffff8008193074
[ 92.033952] x15: 00000000000a8c06 x14: 2c30323030387830
[ 92.097492] x13: 3d7367616c66202c x12: 3038653030303030
[ 92.161034] x11: 3038666666666666 x10: 78303d646e65202c
[ 92.224576] x9 : 3063303030303030 x8 : 3030303030303030
[ 92.288117] x7 : 0000000000000880 x6 : 0000000000000000
[ 92.351659] x5 : ffffffc07fd10ad8 x4 : 0000000000000001
[ 92.415202] x3 : 0000000000000007 x2 : cb88537fdc8ba63c
[ 92.478743] x1 : 0000000000000000 x0 : ffffffc066bfc780
After adding the declaration of func1 in the files.h, the extra instruction and hence the crash was not seen.
Can someone please explain why the compiler added sxtw in this case?
You should have received at least two warnings, one about the missing function declaration and another one about the implicit conversion from int to a pointer type.
The reason is that implicitly declared functions have a return type of int. The caller therefore uses only the low 32 bits of the returned value (w0), and converting that int to a 64-bit pointer sign-extends it, so the upper 32 bits of the real pointer are lost. This is the expected GNU C behavior, based on what C compilers for early 64-bit targets did. The sxtw instruction is required to implement this behavior. (Current C standards no longer have implicit function declarations, but GCC still has to support them for backwards compatibility with existing autoconf tests.)
Note that your platform is obviously Aarch64 (with 64-bit registers), not 32-bit ARMv7.
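The effect can be reproduced in plain user-space C with the register values from the dump above. This is a minimal sketch: the function name as_seen_through_implicit_int is hypothetical, and it models the caller's view of the return value rather than reproducing the compiler's code generation.

```c
#include <stdint.h>

/*
 * Models what the caller does with the return value of an implicitly
 * declared function on AArch64: only w0 (the low 32 bits) is treated as
 * the int result, and it is sign-extended (sxtw) before being used as a
 * 64-bit pointer.
 * Note: converting an out-of-range value to int32_t is
 * implementation-defined in C; GCC and Clang wrap modulo 2^32.
 */
uint64_t as_seen_through_implicit_int(uint64_t real_return)
{
    int32_t as_int = (int32_t)(uint32_t)real_return; /* keep w0 only */
    return (uint64_t)(int64_t)as_int;                 /* sxtw x19, w0 */
}
```

Feeding it the x0 value from the dump, 0xffffffc066bfc780, yields 0x0000000066bfc780, which is exactly the bogus x19 the kernel then dereferenced.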

The irq in kernel function asm_do_IRQ() is different from the one I request in module

I did some experiments with a Cortex-A9 development board. I used gpio_to_irq() to get an IRQ number, requested that IRQ, and wrote a small driver around it; the number was 196 in syslog. I also added some printks to asm_do_IRQ. When I triggered the GPIO interrupt, the driver worked fine, but the IRQ number in asm_do_IRQ was 62. I can't understand why the IRQ number differs from the one I requested. The driver is as follows:
#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/irq.h>
#include <linux/gpio.h>
#define GPIO_N 36 /* GPIO number */
int flag = 0;
static irqreturn_t handler(int irq, void *dev_id)
{
    printk("hello world hahahahahhahahah \n\n");
    return IRQ_HANDLED; /* was 0 (IRQ_NONE), which reports the interrupt as unhandled */
}
static int __init gpio_test_init(void)
{
    int irq;
    int err;
    if (gpio_request_one(GPIO_N, GPIOF_DIR_IN, "some test") < 0) {
        printk(KERN_ERR "Oops! BAD! BAD! BAD!\n\n");
        return 0;
    }
    /* GPIO_N is used throughout; TEST_GPIO is not defined in this listing */
    irq = OMAP_GPIO_IRQ(GPIO_N);
    printk("irq : %d \n", irq);
    // ..................
    // irq : 196 in dmesg
    //......................
    set_irq_type(irq, IRQ_TYPE_EDGE_FALLING);
    enable_irq(gpio_to_irq(GPIO_N));
    // request the irq ...
    if ((err = request_irq(irq, &handler, 0, NULL, NULL)) < 0) {
        printk("err : %d\n", err);
        return 0;
    }
    printk("gpio test init success!\n");
    flag = 1;
    return 0;
}
static void __exit gpio_test_exit(void)
{
    int irq = gpio_to_irq(GPIO_N);
    if (flag == 1)
        free_irq(irq, NULL);
    gpio_free(GPIO_N);
    printk("gpio test exit byebye!\n");
}
module_init(gpio_test_init);
module_exit(gpio_test_exit);
MODULE_LICENSE("GPL");
asm_do_IRQ in arch/arm/kernel/irq.c:
asmlinkage void __exception_irq_entry
asm_do_IRQ(unsigned int irq, struct pt_regs *regs)
{
    struct pt_regs *old_regs = set_irq_regs(regs);
    printk("the irq : %d\n", irq);
    //...............
    // I get 62 here
    //...............
    irq_enter();
    /*
     * Some hardware gives randomly wrong interrupts. Rather
     * than crashing, do something sensible.
     */
    if (unlikely(irq >= nr_irqs)) {
        if (printk_ratelimit())
            printk(KERN_WARNING "Bad IRQ%u\n", irq);
        ack_bad_irq(irq);
    } else {
        generic_handle_irq(irq);
    }
    /* AT91 specific workaround */
    irq_finish(irq);
    irq_exit();
    set_irq_regs(old_regs);
}
This observation is likely due to the mapping between physical and virtual IRQ numbers. The numbers seen in your driver are virtual IRQ numbers, valid only when using the generic linux interrupt handling subsystem. The interrupt number in asm_do_IRQ will be the physical interrupt number provided by the interrupt fabric of the core.
I believe the OMAP processors support interrupts on GPIO pins. The way this is usually implemented is to allocate a single IRQ line for a bank of GPIO inputs, say 32 bits. When an interrupt occurs on any of the GPIOs, that IRQ line will activate. This is likely the number 62 on your processor. If you look in the manual for your processor, you should see that IRQ 62 corresponds to an interrupt on a GPIO bank.
Now, the linux GPIO subsystem will allow you to allocate an interrupt handler to any of the GPIOs, providing you with a mapping from a linux irq number to a physical irq number. The linux irq number in your case is 196. The GPIO subsystem is configured to handle all GPIO interrupts (say interrupt 62), read the GPIO register to determine which of the GPIO bits in a bank could have generated an interrupt, and then calls out the interrupt handler you've assigned with request_irq.
Here's a basic flow of control for a GPIO interrupt:
1. A change occurs on an interrupt-enabled input in a GPIO bank. IRQ 62 is raised.
2. asm_do_IRQ runs on IRQ 62. The GPIO subsystem has been registered to handle IRQ 62 by the platform init code.
3. The GPIO subsystem reads the GPIO registers and determines that GPIO bit X has caused the interrupt. It calculates the mapping from bit X to the Linux virtual IRQ number, in this case 196.
4. The GPIO interrupt handler then calls the generic_handle_irq function with 196, which calls your interrupt handler.
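The demultiplexing step in that flow can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: the function demux_gpio_bank is hypothetical, the base of 160 is inferred from 196 - 36 for this particular board, and GPIO 36 is assumed to be bit 4 of a 32-pin bank that starts at GPIO 32. The real kernel code reads hardware registers and calls generic_handle_irq() instead of filling an array.

```c
#include <stdint.h>

#define GPIOS_PER_BANK 32
/* Assumption: virtual IRQ = base + GPIO number; 160 is inferred from 196 - 36. */
#define VIRQ_GPIO_BASE 160

/*
 * Hypothetical model of the chained bank handler: scan the bank's pending
 * bits and translate each set bit into a Linux virtual IRQ number.
 * Returns how many virqs were written to out (at most GPIOS_PER_BANK).
 */
int demux_gpio_bank(uint32_t pending, int bank_first_gpio, int *out)
{
    int n = 0;
    int bit;

    for (bit = 0; bit < GPIOS_PER_BANK; bit++) {
        if (pending & (UINT32_C(1) << bit))
            out[n++] = VIRQ_GPIO_BASE + bank_first_gpio + bit;
    }
    return n;
}
```

With pending set to 1 << 4 and bank_first_gpio set to 32, the single result is 196: one physical bank interrupt (62 in the question) fans out to a per-pin virtual IRQ.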
There is usually a static mapping defined by the platform between virtual IRQ numbers and physical IRQ numbers. To see this mapping:
- enable CONFIG_VIRQ_DEBUG on kernels older than linux-3.4, or
- enable CONFIG_IRQ_DOMAIN_DEBUG on newer kernels.
Then look at the irq_domain_mapping debugfs file. E.g. on PowerPC:
# mount -t debugfs none /sys/kernel/debug
# cat /sys/kernel/debug/irq_domain_mapping
irq  hwirq    chip name  chip data   domain name
 16  0x00009  IPIC       0xcf801c80  /soc8347@e0000000/pic@700
 18  0x00012  IPIC       0xcf801c80  /soc8347@e0000000/pic@700
 19  0x0000e  IPIC       0xcf801c80  /soc8347@e0000000/pic@700
 20  0x0000f  IPIC       0xcf801c80  /soc8347@e0000000/pic@700
 21  0x00010  IPIC       0xcf801c80  /soc8347@e0000000/pic@700
 77  0x0004d  IPIC       0xcf801c80  /soc8347@e0000000/pic@700

SIGSEGV when using pthreads in Stop-and-Wait Protocol implementation

I'm a college student and as part of a Networks Assignment I need to do an implementation of the Stop-and-Wait Protocol. The problem statement requires using 2 threads. I am a novice to threading but after going through the man pages for the pthreads API, I wrote the basic code. However, I get a segmentation fault after the thread is created successfully (on execution of the first line of the function passed to pthread_create() as an argument).
typedef struct packet_generator_args
{
    int max_pkts;
    int pkt_len;
    int pkt_gen_rate;
} pktgen_args;
/* generates and buffers packets at a mean rate given by the
   pkt_gen_rate field of its argument; runs in a separate thread */
void *generate_packets(void *arg)
{
    pktgen_args *opts = (pktgen_args *)arg; // error occurs here
    buffer = (char **)calloc((size_t)opts->max_pkts, sizeof(char *));
    if (buffer == NULL)
        handle_error("Calloc Error");
    //front = back = buffer;
    ........
    return 0;
}
The main thread reads packets from this buffer and runs the stop-and-wait algorithm.
pktgen_args thread_args;
thread_args.pkt_len = DEF_PKT_LEN;
thread_args.pkt_gen_rate = DEF_PKT_GEN_RATE;
thread_args.max_pkts = DEF_MAX_PKTS;
/* initialize sockets and other data structures */
.....
pthread_t packet_generator;
pktgen_args *thread_args1 = (pktgen_args *)malloc(sizeof(pktgen_args));
memcpy((void *)thread_args1, (void *)&thread_args, sizeof(pktgen_args));
retval = pthread_create(&packet_generator, NULL, &generate_packets, (void *)thread_args1);
if (retval != 0)
handle_error_th(retval, "Thread Creation Error");
.....
/* send a fixed number of packets to the receiver, waiting for an ack for each. If
the ack is not received before the timeout occurs, resend the pkt */
.....
I have tried debugging with gdb but am unable to understand why a segmentation fault is occurring at the first line of my generate_packets() function. Hopefully, one of you can help. If anyone needs additional context, the entire code can be obtained at http://pastebin.com/Z3QtEJpQ. I am in a real jam here, having spent hours on this. Any help will be appreciated.
You initialize your buffer as NULL:
char **buffer = NULL;
and then in main(), without further ado, you try to index it:
while (!buffer[pkts_ackd]); /* wait as long as the next pkt has not
Basically, my semi-educated guess is that your thread hasn't allocated the buffer or generated any packets yet, so main() crashes when it indexes a NULL pointer.
[162][04:34:17] vlazarenko@alluminium (~/tests) > cc -ggdb -o pthr pthr.c 2> /dev/null
[163][04:34:29] vlazarenko@alluminium (~/tests) > gdb pthr
GNU gdb 6.3.50-20050815 (Apple version gdb-1824) (Thu Nov 15 10:42:43 UTC 2012)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries .. done
(gdb) run
Starting program: /Users/vlazarenko/tests/pthr
Reading symbols for shared libraries +............................. done
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000000
0x000000010000150d in main (argc=1, argv=0x7fff5fbffb10) at pthr.c:205
205 while (!buffer[pkts_ackd]); /* wait as long as the next pkt has not
(gdb)
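One way to avoid touching buffer before the generator thread has allocated it is to make main() wait on a condition variable instead of spinning on a pointer that may still be NULL. The following is a minimal sketch, not the original program: generator_state, state_changed, and start_generator_and_wait are hypothetical names, and the packet generation itself is left out.

```c
#include <pthread.h>
#include <stdlib.h>

static char **buffer = NULL;     /* shared; NULL until the generator allocates it */
static int generator_state = 0;  /* 0 = not ready, 1 = allocated, -1 = allocation failed */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_changed = PTHREAD_COND_INITIALIZER;

/* Generator thread: allocate the buffer, then publish it under the lock. */
static void *generate_packets(void *arg)
{
    int max_pkts = *(int *)arg;
    char **b = calloc((size_t)max_pkts, sizeof(char *));

    pthread_mutex_lock(&lock);
    buffer = b;
    generator_state = (b != NULL) ? 1 : -1;
    pthread_cond_signal(&state_changed);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/*
 * Starts the generator and blocks until it reports success or failure.
 * Returns 1 if buffer was safe to index, -1 on allocation or thread error.
 */
int start_generator_and_wait(int max_pkts)
{
    pthread_t tid;
    int state;

    if (pthread_create(&tid, NULL, generate_packets, &max_pkts) != 0)
        return -1;

    pthread_mutex_lock(&lock);
    while (generator_state == 0)            /* loop guards against spurious wakeups */
        pthread_cond_wait(&state_changed, &lock);
    state = generator_state;
    pthread_mutex_unlock(&lock);

    /* Only from this point would it be safe to read buffer[i] (when state == 1). */
    pthread_join(tid, NULL);
    free(buffer);
    buffer = NULL;
    generator_state = 0;
    return state;
}
```

The same pattern extends to per-packet waiting: the generator would signal after storing each packet, and the sender would wait while buffer[pkts_ackd] is still NULL, with both sides holding the mutex when they touch the shared array.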
