Why doesn't enabling the RTC interrupt show up in /proc/interrupts?

I've written a simple app to enable rtc interrupts.
#include <stdio.h>
#include <fcntl.h>
#include <linux/rtc.h>
#include <sys/ioctl.h>

int main() {
    unsigned long hz = 64;
    int fd = open("/dev/rtc0", O_RDONLY);
    if (fd == -1) {
        perror("open(/dev/rtc0)");
        return 1;
    }
    if (ioctl(fd, RTC_IRQP_SET, hz) == -1) {
        perror("ioctl(RTC_IRQP_SET)");
        return 1;
    }
    if (ioctl(fd, RTC_PIE_ON) == -1) {
        perror("ioctl(RTC_PIE_ON)");
        return 1;
    }
    return 0;
}
After running it, I expected the interrupt count to increase in /proc/interrupts under IRQ 8.
From https://www.kernel.org/doc/Documentation/rtc.txt:
However it can also be used to generate signals from a slow 2Hz to a
relatively fast 8192Hz, in increments of powers of two. These signals
are reported by interrupt number 8. (Oh! So that is what IRQ 8 is
for...) It can also function as a 24hr alarm, raising IRQ 8 when the
alarm goes off.
But there was no change. The line
8:          0          1   IO-APIC-edge      rtc0
remained as before. What am I missing here?

The answer is that periodic interrupts (PIE) are implemented using a timer or hrtimer (depending on your machine), not the RTC's hardware interrupt. You can have a look at:
http://lxr.free-electrons.com/source/drivers/rtc/interface.c#L574 and
http://lxr.free-electrons.com/source/drivers/char/rtc.c#L445
Basically, you will only get an interrupt when you set an alarm.
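To illustrate the alarm case, here is a minimal sketch (mine, not part of the original answer, assuming /dev/rtc0 supports alarms): it arms a one-shot alarm a few seconds ahead and blocks in read() until it fires. Because the alarm is raised by the RTC hardware itself, the IRQ 8 count in /proc/interrupts should increase this time.
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/rtc.h>

int main(void) {
    struct rtc_time tm;
    unsigned long data;
    int fd = open("/dev/rtc0", O_RDONLY);
    if (fd == -1) {
        perror("open(/dev/rtc0)");
        return 1;
    }
    if (ioctl(fd, RTC_RD_TIME, &tm) == -1) {
        perror("ioctl(RTC_RD_TIME)");
        return 1;
    }
    // Alarm 5 seconds from now (naive: ignores hour rollover)
    tm.tm_sec += 5;
    if (tm.tm_sec >= 60) {
        tm.tm_sec -= 60;
        tm.tm_min = (tm.tm_min + 1) % 60;
    }
    if (ioctl(fd, RTC_ALM_SET, &tm) == -1) {
        perror("ioctl(RTC_ALM_SET)");
        return 1;
    }
    if (ioctl(fd, RTC_AIE_ON, 0) == -1) {
        perror("ioctl(RTC_AIE_ON)");
        return 1;
    }
    // Blocks until the alarm interrupt arrives; compare /proc/interrupts
    // before and after this read()
    if (read(fd, &data, sizeof(data)) == -1)
        perror("read");
    ioctl(fd, RTC_AIE_OFF, 0);
    close(fd);
    return 0;
}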

Related

Linux Kernel: invoke call back function in user space from kernel space

I am writing a Linux user-space application in which I want to invoke a registered callback function in user space from kernel space,
i.e. an interrupt arrives on a GPIO pin (switch-press event) and a registered function gets called in user space.
Is there any method available to do this?
Thanks
I found the code below after a lot of digging, and it works perfectly for me.
Handling interrupts from GPIO
In many cases, a GPIO input can be configured to generate an interrupt when it
changes state, which allows you to wait for the interrupt rather than polling in
an inefficient software loop. If the GPIO bit can generate interrupts, the file edge
exists. Initially, it has the value none, meaning that it does not generate interrupts.
To enable interrupts, you can set it to one of these values:
• rising: Interrupt on rising edge
• falling: Interrupt on falling edge
• both: Interrupt on both rising and falling edges
• none: No interrupts (default)
You can wait for an interrupt using the poll() function with POLLPRI as the event. If
you want to wait for a rising edge on GPIO 48, you first enable interrupts:
#echo 48 > /sys/class/gpio/export
#echo rising > /sys/class/gpio/gpio48/edge
Then, you use poll() to wait for the change, as shown in this code example:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <poll.h>

int main(void) {
    int f;
    struct pollfd poll_fds[1];
    int ret;
    char value[4];
    int n;
    f = open("/sys/class/gpio/gpio48/value", O_RDONLY);
    if (f == -1) {
        perror("Can't open gpio48");
        return 1;
    }
    poll_fds[0].fd = f;
    poll_fds[0].events = POLLPRI | POLLERR;
    while (1) {
        printf("Waiting\n");
        ret = poll(poll_fds, 1, -1);
        if (ret > 0) {
            // Rewind and re-read the value after each edge event
            lseek(f, 0, SEEK_SET);
            n = read(f, value, sizeof(value));
            printf("Button pressed: read %d bytes, value=%c\n", n, value[0]);
        }
    }
    return 0;
}
Otherwise, you have to implement a handler in a kernel module that drives, e.g., a char device. From user space it could be accessed by blocking reads, polling, or ioctl() calls. It seems that this is the only other way at the moment; see the sketch below.
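For reference, a minimal sketch of that kernel-module approach (my illustration, not from the original answer; the GPIO number, device name, and module boilerplate are hypothetical, and it uses the legacy integer-based GPIO API): an interrupt handler wakes a wait queue, and a char device read() blocks on it, so user space can sit in read() or poll() and treat each return as a callback.
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/gpio.h>
#include <linux/interrupt.h>
#include <linux/miscdevice.h>
#include <linux/uaccess.h>
#include <linux/wait.h>
#include <linux/atomic.h>

#define BUTTON_GPIO 48 /* hypothetical GPIO line */

static DECLARE_WAIT_QUEUE_HEAD(button_wq);
static atomic_t event_pending = ATOMIC_INIT(0);
static int button_irq;

/* Runs in interrupt context: just record the event and wake readers. */
static irqreturn_t button_isr(int irq, void *dev_id)
{
    atomic_set(&event_pending, 1);
    wake_up_interruptible(&button_wq);
    return IRQ_HANDLED;
}

/* read() blocks until the ISR has signalled an event. */
static ssize_t button_read(struct file *file, char __user *buf,
                           size_t count, loff_t *ppos)
{
    char ev = '1';
    if (count < 1)
        return 0;
    if (wait_event_interruptible(button_wq, atomic_xchg(&event_pending, 0)))
        return -ERESTARTSYS;
    if (copy_to_user(buf, &ev, 1))
        return -EFAULT;
    return 1;
}

static const struct file_operations button_fops = {
    .owner = THIS_MODULE,
    .read  = button_read,
};

static struct miscdevice button_dev = {
    .minor = MISC_DYNAMIC_MINOR,
    .name  = "button0", /* hypothetical: shows up as /dev/button0 */
    .fops  = &button_fops,
};

static int __init button_init(void)
{
    int ret = gpio_request(BUTTON_GPIO, "button");
    if (ret)
        return ret;
    button_irq = gpio_to_irq(BUTTON_GPIO);
    ret = request_irq(button_irq, button_isr, IRQF_TRIGGER_RISING,
                      "button", NULL);
    if (ret) {
        gpio_free(BUTTON_GPIO);
        return ret;
    }
    ret = misc_register(&button_dev);
    if (ret) {
        free_irq(button_irq, NULL);
        gpio_free(BUTTON_GPIO);
    }
    return ret;
}

static void __exit button_exit(void)
{
    misc_deregister(&button_dev);
    free_irq(button_irq, NULL);
    gpio_free(BUTTON_GPIO);
}

module_init(button_init);
module_exit(button_exit);
MODULE_LICENSE("GPL");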

Why does my process take too long to die?

Basically I'm using Linux 2.6.34 on PowerPC (Freescale e500mc). I have a process (a kind of VM that was developed in-house) that uses about 2.25 G of mlocked VM. When I kill it, I notice that it takes upwards of 2 minutes to terminate.
I investigated a little. First, I closed all open file descriptors, but that didn't seem to make a difference. Then I added some printk calls in the kernel, and through them I found that all of the delay comes from the kernel unlocking my VMAs. The delay is uniform across pages, which I verified by repeatedly checking the locked page count in /proc/meminfo. I've checked with programs that allocate that much memory, and they all die as soon as I signal them.
What do you think I should check now? Thanks for your replies.
Edit: I had to find a way to share more information about the problem, so I wrote the program below:
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <string.h>
#include <errno.h>
#include <signal.h>
#include <sys/time.h>
#define MAP_PERM_1 (PROT_WRITE | PROT_READ | PROT_EXEC)
#define MAP_PERM_2 (PROT_WRITE | PROT_READ)
#define MAP_FLAGS (MAP_ANONYMOUS | MAP_FIXED | MAP_PRIVATE)
#define PG_LEN 4096
#define align_pg_32(addr) (addr & 0xFFFFF000)
#define num_pg_in_range(start, end) ((end - start + 1) >> 12)
static inline void __force_pgtbl_alloc(unsigned int start)
{
volatile int *s = (int *) start;
*s = *s;
}
int __map_a_page_at(unsigned int start, int whichperm)
{
int perm = whichperm ? MAP_PERM_1 : MAP_PERM_2;
if(MAP_FAILED == mmap((void *)start, PG_LEN, perm, MAP_FLAGS, -1, 0)){ /* fd is -1 for MAP_ANONYMOUS */
fprintf(stderr,
"mmap failed at 0x%x: %s.\n",
start, strerror(errno));
return 0;
}
return 1;
}
int __mlock_page(unsigned int addr)
{
if (mlock((void *)addr, (size_t)PG_LEN) < 0){
fprintf(stderr,
"mlock failed on page: 0x%x: %s.\n",
addr, strerror(errno));
return 0;
}
return 1;
}
void sigint_handler(int p)
{
struct timeval start = {0 ,0}, end = {0, 0}, diff = {0, 0};
gettimeofday(&start, NULL);
munlockall();
gettimeofday(&end, NULL);
timersub(&end, &start, &diff);
printf("Munlock'd entire VM in %u secs %u usecs.\n",
diff.tv_sec, diff.tv_usec);
exit(0);
}
int make_vma_map(unsigned int start, unsigned int end)
{
int num_pg = num_pg_in_range(start, end);
if (end < start){
fprintf(stderr,
"Bad range: start: 0x%x end: 0x%x.\n",
start, end);
return 0;
}
for (; num_pg; num_pg --, start += PG_LEN){
if (__map_a_page_at(start, num_pg % 2) && __mlock_page(start))
__force_pgtbl_alloc(start);
else
return 0;
}
return 1;
}
void display_banner()
{
printf("-----------------------------------------\n");
printf("Virtual memory allocator. Ctrl+C to exit.\n");
printf("-----------------------------------------\n");
}
int main()
{
unsigned int vma_start, vma_end, input = 0;
int start_end = 0; // 0: start; 1: end;
display_banner();
// Bind SIGINT handler.
signal(SIGINT, sigint_handler);
while (1){
if (!start_end)
printf("start:\t");
else
printf("end:\t");
fflush(stdout); /* the prompt has no trailing newline */
if (scanf("%i", &input) != 1)
break;
if (start_end){
vma_end = align_pg_32(input);
make_vma_map(vma_start, vma_end);
}
else{
vma_start = align_pg_32(input);
}
start_end = !start_end;
}
return 0;
}
As you can see, the program accepts ranges of virtual addresses, each range being defined by a start and an end. Each range is then further subdivided into page-sized VMAs by giving different permissions to adjacent pages. Interrupting the program (using SIGINT) triggers a call to munlockall(), and the time for that call to complete is duly noted.
Now, when I run it on the Freescale e500mc with Linux 2.6.34 over the range 0x30000000-0x35000000, I get a total munlockall() time of almost 45 seconds. However, if I do the same thing with smaller start-end ranges in random order (that is, not necessarily at increasing addresses) such that the total number of pages (and locked VMAs) is roughly the same, I observe a total munlockall() time of no more than 4 seconds.
I tried the same thing on x86_64 with Linux 2.6.34 and my program compiled with the -m32 flag, and it seems the variations, though not as pronounced as on ppc, are still 8 seconds for the first case and under a second for the second.
I tried the program on Linux 2.6.10 on the one hand and on 3.19 on the other, and these monumental differences don't seem to exist there. What's more, munlockall() always completes in under a second.
So it seems that the problem, whatever it is, exists only around the 2.6.34 version of the Linux kernel.
You said the VM was developed in-house. Does this mean you have access to the source? I would start by checking whether it has anything that stops it from terminating immediately, e.g. to avoid data loss.
Otherwise, could you provide more information? You may also want to check out https://unix.stackexchange.com/, as they would be better suited to help with any issues the Linux kernel may be having.

Linux input device events, how to retrieve initial state

I am using the gpio-keys device driver to handle some buttons in an embedded device running Linux. Applications in user space can just open /dev/input/eventX and read input events in a loop.
My question is how to get the initial states of the buttons. There is an ioctl call (EVIOCGKEY) which can be used for this, however if I first check this and then start to read from /dev/input/eventX, there's no way to guarantee that the state did not change in between.
Any suggestions?
The evdev devices queue events until you read() them, so in most cases opening the device, doing the ioctl(), and immediately starting to read events from it should work. If the driver dropped events from the queue, it sends you a SYN_DROPPED event, so you can detect when that has happened. The libevdev documentation has some ideas on how to handle that situation; the way I read it, you should simply retry: drop all pending events and redo the ioctl() until there are no more SYN_DROPPED events.
I used this code to verify that this approach works:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/input.h>
#include <string.h>

#define EVDEV "/dev/input/event9"

int main(int argc, char **argv) {
    unsigned char key_states[KEY_MAX/8 + 1];
    struct input_event evt;
    int fd;
    memset(key_states, 0, sizeof(key_states));
    fd = open(EVDEV, O_RDWR);
    if (fd == -1) {
        perror("Can't open " EVDEV);
        return 1;
    }
    ioctl(fd, EVIOCGKEY(sizeof(key_states)), key_states);
    // Create some inconsistency
    printf("Type (lots) now to make evdev drop events from the queue\n");
    sleep(5);
    printf("\n");
    while (read(fd, &evt, sizeof(struct input_event)) > 0) {
        if (evt.type == EV_SYN && evt.code == SYN_DROPPED) {
            printf("Received SYN_DROPPED. Restart.\n");
            ioctl(fd, EVIOCGKEY(sizeof(key_states)), key_states);
        }
        else if (evt.type == EV_KEY) {
            // Ignore key repetitions
            if (evt.value > 1)
                continue;
            key_states[evt.code / 8] ^= 1 << (evt.code % 8);
            // Note: != binds tighter than &, so the parentheses matter here
            if (((key_states[evt.code / 8] >> (evt.code % 8)) & 1) != evt.value) {
                printf("Inconsistency detected: Keycode %d is reported as %d, but %d is stored\n",
                       evt.code, evt.value,
                       (key_states[evt.code / 8] >> (evt.code % 8)) & 1);
            }
        }
    }
    return 0;
}
After starting, the program deliberately waits 5 seconds. Hit some keys in that time to fill the queue. On my system, I need to enter about 70 characters to trigger a SYN_DROPPED. The EV_KEY handling code then checks whether the events are consistent with the state reported by the EVIOCGKEY ioctl.

perf_event_open Overflow Signal

I want to count the (more or less) exact number of instructions for some piece of code. Additionally, I want to receive a signal after a specific number of instructions has passed.
For this purpose, I use the overflow signal behaviour provided by
perf_event_open.
I'm using the second way the manpage proposes to achieve overflow signals:
Signal overflow
Events can be set to deliver a signal when a threshold
is crossed. The signal handler is set up using the poll(2), select(2),
epoll(2) and fcntl(2) system calls.
[...]
The other way is by use of the PERF_EVENT_IOC_REFRESH ioctl. This
ioctl adds to a counter that decrements each time the event overflows.
When nonzero, a POLL_IN signal is sent on overflow, but once the value
reaches 0, a signal is sent of type POLL_HUP and the underlying event
is disabled.
Further explanation of PERF_EVENT_IOC_REFRESH ioctl:
PERF_EVENT_IOC_REFRESH
Non-inherited overflow counters can use this to enable a
counter for a number of overflows specified by the argument,
after which it is disabled. Subsequent calls of this ioctl
add the argument value to the current count. A signal with
POLL_IN set will happen on each overflow until the count
reaches 0; when that happens a signal with POLL_HUP set is
sent and the event is disabled. Using an argument of 0 is
considered undefined behavior.
A very minimal example would look like this:
#define _GNU_SOURCE 1
#include <asm/unistd.h>
#include <fcntl.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
long perf_event_open(struct perf_event_attr* event_attr, pid_t pid, int cpu, int group_fd, unsigned long flags)
{
return syscall(__NR_perf_event_open, event_attr, pid, cpu, group_fd, flags);
}
static void perf_event_handler(int signum, siginfo_t* info, void* ucontext) {
if(info->si_code != POLL_HUP) {
// Only POLL_HUP should happen.
exit(EXIT_FAILURE);
}
ioctl(info->si_fd, PERF_EVENT_IOC_REFRESH, 1);
}
int main(int argc, char** argv)
{
// Configure signal handler
struct sigaction sa;
memset(&sa, 0, sizeof(struct sigaction));
sa.sa_sigaction = perf_event_handler;
sa.sa_flags = SA_SIGINFO;
// Setup signal handler
if (sigaction(SIGIO, &sa, NULL) < 0) {
fprintf(stderr,"Error setting up signal handler\n");
perror("sigaction");
exit(EXIT_FAILURE);
}
// Configure perf_event_attr struct
struct perf_event_attr pe;
memset(&pe, 0, sizeof(struct perf_event_attr));
pe.type = PERF_TYPE_HARDWARE;
pe.size = sizeof(struct perf_event_attr);
pe.config = PERF_COUNT_HW_INSTRUCTIONS; // Count retired hardware instructions
pe.disabled = 1; // Event is initially disabled
pe.sample_type = PERF_SAMPLE_IP;
pe.sample_period = 1000;
pe.exclude_kernel = 1; // excluding events that happen in the kernel-space
pe.exclude_hv = 1; // excluding events that happen in the hypervisor
pid_t pid = 0; // measure the current process/thread
int cpu = -1; // measure on any cpu
int group_fd = -1;
unsigned long flags = 0;
int fd = perf_event_open(&pe, pid, cpu, group_fd, flags);
if (fd == -1) {
fprintf(stderr, "Error opening leader %llx\n", pe.config);
perror("perf_event_open");
exit(EXIT_FAILURE);
}
// Setup event handler for overflow signals
fcntl(fd, F_SETFL, O_NONBLOCK|O_ASYNC);
fcntl(fd, F_SETSIG, SIGIO);
fcntl(fd, F_SETOWN, getpid());
ioctl(fd, PERF_EVENT_IOC_RESET, 0); // Reset event counter to 0
ioctl(fd, PERF_EVENT_IOC_REFRESH, 1); // Arm the counter: enable it until one overflow occurs
// Start monitoring
long loopCount = 1000000;
long c = 0;
long i = 0;
// Some sample payload.
for(i = 0; i < loopCount; i++) {
c += 1;
}
// End monitoring
ioctl(fd, PERF_EVENT_IOC_DISABLE, 0); // Disable event
long long counter;
read(fd, &counter, sizeof(long long)); // Read event counter value
printf("Used %lld instructions\n", counter);
close(fd);
}
So basically I'm doing the following:
Set up a signal handler for SIGIO signals
Create a new performance counter with perf_event_open (returns a file descriptor)
Use fcntl to add signal sending behavior to the file descriptor.
Run a payload loop to execute many instructions.
When executing the payload loop, at some point 1000 instructions (the sample_period) will have been executed. According to the perf_event_open manpage, this triggers an overflow, which then decrements an internal counter.
Once this counter reaches zero, "a signal is sent of type POLL_HUP and the underlying event is disabled."
When a signal is sent, the control flow of the current process/thread is stopped, and the signal handler is executed. Scenario:
1000 instructions have been executed.
Event is automatically disabled and a signal is sent.
Signal is immediately delivered, control flow of the process is stopped and the signal handler is executed.
This scenario would mean two things:
The final count of instructions would always be equal to that of a run which does not use signals at all.
The instruction pointer which has been saved for the signal handler (and can be accessed through ucontext) would directly point to the instruction which caused the overflow.
Basically you could say, the signal behavior can be seen as synchronous.
This is the perfect semantic for what I want to achieve.
However, as far as I understand, the signal I configured is generally rather asynchronous, and some time may pass until it is eventually delivered and the signal handler is executed. This may pose a problem for me.
For example, consider the following scenario:
1000 instructions have been executed.
Event is automatically disabled and a signal is sent.
Some more instructions pass
Signal is delivered, control flow of the process is stopped and the signal handler is executed.
This scenario would mean two things:
The final count of instructions would be less than in a run which does not use signals at all.
The instruction pointer which has been saved for the signal handler would point either to the instruction which caused the overflow or to any instruction after it.
So far, I've tested the above example a lot and did not experience missed instructions, which supports the first scenario.
However, I'd really like to know, whether I can rely on this assumption or not.
What happens in the kernel?
I want to count the (more or less) exact number of instructions for some piece of code. Additionally, I want to receive a signal after a specific number of instructions has passed.
You have two tasks which may conflict with each other. When you want counting (an exact count of some hardware event), just use the performance monitoring unit of your CPU in counting mode (don't set sample_period/sample_freq in the perf_event_attr structure) and place the measurement code in your target program (as was done in your example). In this mode, according to the man page of perf_event_open, no overflows will be generated (CPU PMU counters are usually 64 bits wide and don't overflow unless set to a small negative value, which is what sampling mode does):
Overflows are generated only by sampling events (sample_period must have a nonzero value).
To count only part of a program, use ioctls on the fd returned by perf_event_open, as described in the man page:
perf_event ioctl calls - Various ioctls act on perf_event_open() file descriptors: PERF_EVENT_IOC_ENABLE ... PERF_EVENT_IOC_DISABLE ... PERF_EVENT_IOC_RESET
You can read the current value with rdpmc (on x86) or with the read syscall on the fd, as in the short example from the man page:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>
#include <asm/unistd.h>
static long
perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
int cpu, int group_fd, unsigned long flags)
{
int ret;
ret = syscall(__NR_perf_event_open, hw_event, pid, cpu,
group_fd, flags);
return ret;
}
int
main(int argc, char **argv)
{
struct perf_event_attr pe;
long long count;
int fd;
memset(&pe, 0, sizeof(struct perf_event_attr));
pe.type = PERF_TYPE_HARDWARE;
pe.size = sizeof(struct perf_event_attr);
pe.config = PERF_COUNT_HW_INSTRUCTIONS;
pe.disabled = 1;
pe.exclude_kernel = 1;
pe.exclude_hv = 1;
fd = perf_event_open(&pe, 0, -1, -1, 0);
if (fd == -1) {
fprintf(stderr, "Error opening leader %llx\n", pe.config);
exit(EXIT_FAILURE);
}
ioctl(fd, PERF_EVENT_IOC_RESET, 0);
ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
printf("Measuring instruction count for this printf\n");
/* Place target code here instead of printf */
ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
read(fd, &count, sizeof(long long));
printf("Used %lld instructions\n", count);
close(fd);
}
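For the rdpmc route mentioned above, the man page documents reading the counter from user space through the event's mmap'ed metadata page. The sketch below is my illustration of that pattern, not from the original answer; it is x86/GCC-only and assumes the kernel exposes the counter to user space (cap_user_rdpmc set), with the page obtained via mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0):
#include <stdint.h>
#include <linux/perf_event.h>

/* Read the event counter without a syscall, using the mmap'ed
 * metadata page of the perf_event fd. */
static uint64_t pmc_read(volatile struct perf_event_mmap_page *pc)
{
    uint32_t seq, idx;
    uint64_t count;
    do {
        seq = pc->lock;                /* seqlock: retry if kernel updates */
        __sync_synchronize();
        idx = pc->index;               /* hardware counter number + 1; 0 if unavailable */
        count = pc->offset;
        if (pc->cap_user_rdpmc && idx)
            count += __builtin_ia32_rdpmc(idx - 1);
        __sync_synchronize();
    } while (pc->lock != seq);
    return count;
}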
Additionally, I want to receive a signal after a specific number of instructions has passed.
Do you really want to get a signal, or do you just need the instruction pointer at every 1000 executed instructions? If you want to collect pointers, use perf_event_open in sampling mode, but do it from another program so that the event-collection code is not itself measured. It will also have less negative effect on your target program if, instead of taking a signal on every overflow (with a huge number of kernel-tracer interactions and switches from/to the kernel), you use the capability of perf_events to collect several overflow events into a single mmap buffer and poll on that buffer. On an overflow interrupt from the PMU, the perf interrupt handler is called to save the instruction pointer into the buffer, the counter is reset, and the program returns to execution. In your example, the perf interrupt handler wakes your program, which does several syscalls and returns to the kernel, and only then does the kernel restart the target code; so the overhead per sample is greater than with mmap'ing and parsing the buffer. A sketch of the mmap approach follows this paragraph.
With the precise_ip flag you may activate advanced sampling in your PMU, if it has such a mode: PEBS and PREC_DIST on Intel x86/em64t for some counters such as INST_RETIRED, UOPS_RETIRED, BR_INST_RETIRED, BR_MISP_RETIRED, MEM_UOPS_RETIRED, MEM_LOAD_UOPS_RETIRED, MEM_LOAD_UOPS_LLC_HIT_RETIRED (and, with a simple hack, cycles too), or IBS of AMD x86/amd64 (see the paper about PEBS and IBS). In these modes the instruction address is saved directly by hardware, with low skid. Some very advanced PMUs have the ability to do sampling in hardware, storing overflow information for several events in a row with automatic counter reset, without software interrupts (some notes on precise_ip are in the same paper).
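To make the mmap-and-poll idea concrete, here is a rough sketch (my illustration, not from the original answer; error handling and wrap-around of a record across the buffer end are omitted). It assumes an fd from perf_event_open with sample_type = PERF_SAMPLE_IP, a nonzero sample_period, and pe.wakeup_events = 1 so that poll() wakes on every sample:
#include <stdint.h>
#include <stdio.h>
#include <poll.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/perf_event.h>

#define DATA_PAGES 8 /* data area must be 2^n pages */

/* Consume PERF_RECORD_SAMPLE records between data_tail and data_head. */
static void drain_samples(struct perf_event_mmap_page *meta, char *data,
                          size_t data_size)
{
    uint64_t head = meta->data_head;
    __sync_synchronize();                  /* barrier: read data after head */
    uint64_t tail = meta->data_tail;
    while (tail < head) {
        struct perf_event_header *hdr =
            (struct perf_event_header *)(data + (tail % data_size));
        if (hdr->type == PERF_RECORD_SAMPLE) {
            /* With sample_type == PERF_SAMPLE_IP the body is a single u64. */
            uint64_t ip = *(uint64_t *)(hdr + 1);
            printf("sample ip: %#llx\n", (unsigned long long)ip);
        }
        tail += hdr->size;
    }
    __sync_synchronize();                  /* barrier: finish reads first */
    meta->data_tail = tail;                /* tell the kernel we consumed it */
}

/* fd comes from perf_event_open() on the target process, as above. */
void sample_loop(int fd)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t data_size = DATA_PAGES * page;
    struct perf_event_mmap_page *meta =
        mmap(NULL, (DATA_PAGES + 1) * page, PROT_READ | PROT_WRITE,
             MAP_SHARED, fd, 0);
    char *data = (char *)meta + page;      /* samples start after metadata page */
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    for (;;) {
        poll(&pfd, 1, -1);                 /* sleep until samples are available */
        drain_samples(meta, data, data_size);
    }
}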
I don't know whether it is possible in the perf_events subsystem and on your CPU to have two perf_event tasks active at the same time: counting events in the target process while simultaneously sampling it from another process. With an advanced PMU this may be possible in hardware, and perf_events in a modern kernel may allow it. But you give no details about your kernel version or your CPU vendor and family, so we can't answer this part.
You may also try other APIs to access the PMU, such as PAPI or likwid (https://github.com/RRZE-HPC/likwid). Some of them can read PMU registers (sometimes MSRs) directly and may allow sampling at the same time as counting.

setitimer and signal count on Linux. Is signal count directly proportional to run time?

There is a test program that exercises setitimer on Linux (kernel 2.6; HZ=100). It sets various itimers to send a signal every 10 ms (the interval is actually set to 9 ms, but the timeslice is 10 ms). The program then runs for some fixed time (e.g. 30 seconds) and counts the signals.
Is it guaranteed that the signal count will be proportional to the running time? Will the count be the same in every run and with every timer type (-r, -p, -v)?
Note that there should be no other CPU-active processes on the system; the question is about a fixed-HZ kernel.
#include <stdlib.h>
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/time.h>
/* Use 9 ms timer */
#define usecs 9000
volatile sig_atomic_t events = 0;
void count(int a) {
events++;
}
int main(int argc, char**argv)
{
int timer,j,i,k=0;
struct itimerval timerval = {
.it_interval = {.tv_sec=0, .tv_usec=usecs},
.it_value = {.tv_sec=0, .tv_usec=usecs}
};
if ( (argc!=2) || (argv[1][0]!='-') ) {
printf("Usage: %s -[rpv]\n -r - ITIMER_REAL\n -p - ITIMER_PROF\n -v - ITIMER_VIRTUAL\n", argv[0]);
exit(0);
}
switch(argv[1][1]) {
case 'r':
timer = ITIMER_REAL;
break;
case 'p':
timer = ITIMER_PROF;
break;
case 'v':
timer = ITIMER_VIRTUAL;
break;
default:
fprintf(stderr, "Unknown timer type: %c\n", argv[1][1]);
exit(1);
}
signal(SIGALRM,count);
signal(SIGPROF,count);
signal(SIGVTALRM,count);
setitimer(timer, &timerval, NULL);
/* constants should be tuned to some huge value */
for (j=0; j<4; j++)
for (i=0; i<2000000000; i++)
k += k*argc + 5*k + argc*3;
printf("%d events\n",events);
return 0;
}
Is it guaranteed that signal count will be proportional to running time?
Yes. In general, for all three timers, the longer the code runs, the more signals are received.
Will count be the same in every run and with every timer type (-r -p -v)?
No.
When the timer is set using ITIMER_REAL, the timer decrements in real time.
When it is set using ITIMER_VIRTUAL, the timer decrements only when the process is executing in the user address space. So, it doesn't decrement when the process makes a system call or during interrupt service routines.
So we can expect that #real_signals > #virtual_signals
ITIMER_PROF timers decrement both during user-space execution of the process and when the OS is executing on behalf of the process, i.e. during system calls.
So #prof_signals > #virtual_signals.
ITIMER_PROF doesn't decrement when the OS is not executing on behalf of the process, whereas ITIMER_REAL always decrements. So #real_signals > #prof_signals.
To summarise: #real_signals > #prof_signals > #virtual_signals. For example, with HZ=100 the 9 ms interval is rounded up to one 10 ms tick, so a 30-second run should see roughly 3000 ITIMER_REAL signals, while the -p and -v counts will be lower by whatever fraction of time the process spent off the CPU or (for -v) inside the kernel.
