request_irq succeeds but interrupt is never detected - linux

I am running embedded linux 3.2.6 on an ARM processor. I am using a modified version of atmel's serial driver to control the 4 USART ports on my device. When I use the driver compiled with the kernel, all works fine. But I want to run the driver as a kernel module instead. I make all of the necessary changes and disable the internal driver and everything seems fine. The 4 tty devices are registered successfully and I can see that the all of my probe and initialization functions work correctly.
So here's the problem:
When I try to write to any of the devices, my "start transmit" function gets called but then waits for an interrupt from the usart which never occurs. So the write just hangs, and using a logic analyzer I can see that RTS gets asserted but no bytes show up on the tx line. I know that my call to request_irq succeeds and yet i never see any of the irq entries in /proc/interrupts. In the driver, I have also tried using request_irq to register a separate interrupt handler for a gpio line, and this works fine.
I know that this is a problem that is probably hard to diagnose, but I am looking for ANY possible suggestions that could lead me in the right direction to finding a solution. Let me know if you need any clarifications. Thank you

The symptoms reads like a peripheral clock that has not been enabled (or turned off): the device can be initialized w/o errors and an I/O operation can be setup, but the device doesn't do anything; it plays dead. Since no I/O ever starts, you're never going to get an interrupt indicating completion!
The other thing to check are the conditional compilation directives for HW configuration structures in your arch/arm/mach-xxx/zzz_devices.c file.
Make sure that the serial port structures have something like:
#if defined(CONFIG_SERIAL_ATMEL) || defined(CONFIG_SERIAL_ATMEL_MODULE)
and not just
#if defined(CONFIG_SERIAL_ATMEL)
Addendum
I could be wrong but the clock shouldn't have any effect on the CTS pin causing an interrupt, right?
Not right.
These digital circuits are synchronous state machines: without a clock, a change-of-state by an input cannot be processed.
Also, SoCs and modern uControllers use the peripheral clocks as on/off switches for those integrated peripherals. There is often way more functionality, i.e. peripherals, on the silicon chip than can actually be used, mostly due to insufficient quantity of pins to the board. So disabling the clocks to unused devices is employed to reduce power consumption.
You are far too focused on interrupts.
You do not have a solvable interrupt problem; those are secondary failures.
The lack of output when attempting to transmit is far more significant and revealing.
The root cause is probably a flawed configuration of the USART devices, since transmitting bits is an automatic operation for a configured & operational USART.
If the difference between not-working versus working is loadable module versus static linking, then the root cause is going to be something fundamental (and trivial) like my two suggestions.
Also your lack of acknowledgement regarding the #if defined(), e.g. you didn't respond with "Oh yeah, we already knew that", raises a gigantic red flag that says "Fix me first!"
Addendum 2
I'm tempted to delete this answer after discovering that the Atmel serial driver cannot be configured/built as a loadable module using make menuconfig (which is the premise for half of the answer). (Of course the Kconfig file could be hacked to make the config variable tristate instead of boolean to overcome the module restriction.) I've left a comment for the OP. But I also wanted to preserve the comment to Mr. Stratton pointing out how symbols in the .config file are (not) used.

So I did finally fix my problem. Thank you for the responses, none of them directly solved my problem but they did prompt further examination of my code. After some trial and error I finally got it working. I had originally moved the platform_device structures for each usart from /mach-at91/xxx_devices.c to my loadable module. Well for some reason the structures weren't getting the correct data to map to the hardware, I suppose because it wasn't correctly linking the symbols from the kernel (never got an error message though) and so some of the registration functions weren't even getting called. I ended up moving the structures and platform_device_register calls back into the devices file. I also decided to keep the driver for the console built-in using the original atmel_serial.c driver. I had to change the platform_device name for the console in both the devices file and in the built-in atmel_serial.c file in order for it to not conflict with my usart ports driver. I found that changing the platform_device and platform_driver name for the usarts from anything but "atmel_usart" resulted in usart transmission failing. I really don't understand why, but i'm just leaving it as atmel_usart so it works.
Thanks again to everybody who responded to my problem.

Related

Linux FTDI USB-to-Serial overlflow errors FTDI_RS_OE on 1Mbaud

I am trying to use a genuine FTDI USB-to-serial in Linux at 1Mbaud (instead of the usual 115200). This all works fine, but sometimes I seem to lose data. When checking the counters with the TIOCGICOUNT ioctl call, I can see that I get overrun errors (not buf_overrun).
When looking in the source code of the FTDI linux driver (https://github.com/torvalds/linux/blob/master/drivers/usb/serial/ftdi_sio.c), this is done when the chip sends a USB packet containing a FTDI_RS_OE error flag.
(For reference, when I use the exact same userspace application, but with a totally different serial device (the imx6 mxc), I do not get these errors. It's really an FTDI-driver specific thing)
I find very little regarding this, and strangely enough, the windows driver does not seems to suffer from this problem. If anybody has gotten these FTDI chips working at high speeds in linux, feel free to help me out!
Kind regards,
Arnout
I think I figured it out. Bottom line was that this was a genuine RX overrun due to my code not reading the uart buffers fast enough. When I make sure to "read" the serial port fast enough, I can attain the constant Mbaud data rate. The reason I got totally misguided (and that the weird and unexpected FTDI_RS_OE is sent) I will explain below.
A few notes on the protocol I was using. I was sending some "request" packet on the serial line, and expecting a large reply. (and doing this in loop). "My bug" was that I expected the remote device to reply very quickly, and if not stop processing. This timeout was too short, but the actual reply did still come in. "Some" buffer then overflowed, causing the RX overruns.
But, this was not so clear. A few subtleties:
The rx overrun counter was only incremented upon the NEXT uart read syscall (e.g. minutes later if my code went to some idle state) (NOT immediately after the actual issue happened, which is very confusing)
I was under the assumption that, just like the imx6 driver, the linux kernel would always simply service the USB device if incoming data was available. And that the data would be sent into a 640kB buffer (defined https://elixir.bootlin.com/linux/v4.9.192/source/drivers/tty/tty_buffer.c). In the imx6 driver it can then clearly be seen what happens if that buffer overflows
But that turns out not to be the case. Instead, my best guess at what happens here (I haven't profiled/debugged the kernel to verify this) is that serial "throtteling" happens. When this 640kB is "getting" full, linux will issue a "throttle" callback to the FTDI driver. That then simply uses the generic usb_serial_generic_throttle, which sets a flag, and discards incoming urb data in https://elixir.bootlin.com/linux/latest/ident/usb_serial_generic_read_bulk_callback. This would explain why no overruns are "detected" when the incident actually occurs, but suddenly is detected when (e.g. after 1 minute of inactivity) I restart a read operation. The FTDI chip's internal buffer must be overflowing due to this mechanism, casuing this FTDI_RS_OE flag to be set, which is then only actually correctly parsed when throtteling is disabled again.
So conclusion: The main issue was at my side, but the FTDI driver does not correctly implement its overrun counters (they only show up 'late' or even never depending on the usecase) due to most likely the linux throtteling feature.

How to read the external timer counter on the BeagleBone Black?

I need to count the transitions of a 50KHz binary signal using a BBB. I think using the TIMER4 triggered by the external signal connected to the pin P8.07 would be the easiest way.
So, I issued the following commands to load the proper cape and setup the pin as a timer :
./config-pin overlay cape-universaln
./config-pin P8.07 timer
Everything seems to work and nothing appears in dmesg.
My question is : How can I read the value of TIMER4 ? I looked in SysFs and find nothing interesting. Nothing in /dev as well. How can I retrieve the value of the timer counter I just setup ? I'm open to a C/C++ solution as well, but I would like to avoid doing kernel-space programming.
I'm using the latest Ubuntu Linux for BeagleBone, kernel 4.1.10-ti-r21.
With a little googling I see a pps driver for the AM335x DMTimer subsystem here: https://github.com/ddrown/pps-gmtimer
It looks like it hasn't been merged upstream and the README gives instructions on building it into the 3.8 kernel - you could revert back to 3.8, or you could adapt that for 4.1, in which case you may need to tweak the Device Tree overlay as well for the newer version of the dtc compiler that's in 4.1.
You could also write a pulse counter firmware for the PRU (with only a 50KHz input it wouldn't need to be very optimized at all to catch every pulse). You could send a signal to the ARM every so often and catch that in your userspace program.
Another option would be to directly access the DMTimer registers from userspace using mmap to map the /dev/mem file (example of this method for GPIO here), but that's a pretty "hacky" way to do it, and it's generally frowned upon in the GNU/Linux world to do that sort of stuff from userspace instead of from kernal space.

How to generate a steady 37kHz GPIO trigger from inside linux kernel?

I have a micro controller taking care of infrared TX-carrier wave generation currently, but I started wondering if I could dispose of it, and do this work in linux side - thus bringing the cost of my embedded system down.
I'm running on a Freescale i.mx233 (454MHz ARM9), and if I access registry directly through /dev/mem, I can achieve quite steady 5MHz triggering to a GPIO pin.
Since I need 37kHz, I started looking ways of slowing it down, but it seems that at least nanowait() is way too rough for this purpose.
I found one solution of calling rand() in a for loop, and I seem to be able to generate 38,4kHz signal quite well, However there is some unacceptable jitter from time to time according to oscilloscope. (I understand that this is quite a bit waste of resources, but when the TX needs to be done, the system has no other tasks really)
My questions:
Freescales kernel code (3.8 branch) doesn't have CONFIG_PREEMPT_RT patches, so that is one thing maybe I should look into, but before that:
Could I achieve more accurate performance, by writing a kernel module to drive the GPIO from inside the kernel ? I do need to read up on some data from user space (data to be sent), but other than that, I only need to trigger the led on specified frequency at the end of the GPIO, so the driver should be pretty simple.
Can I force the priority of my driver, so that other tasks don't interrupt this gpio triggering ? (data sending takes currently roughly 400ms, and it's done very seldom)
Is there some better way to create an interrupt say every 37kHz, so that I don't stall the system by SW ?
Micro controller is perfect for this kind of tasks, but it would be nice to avoid this cost overhead if possible...
The i.MX23 PWM in "Multi-Chip Attachment Mode" is designed exactly for this requirement.
Use one of the PWM's in "Multi-Chip Attachment Mode", for example, assuming you are using a 24Mhz clock, with
MATT=1 (Enable multi-chip attachment mode)
MATT_SEL=1 (User 24Mhz clock)
CDIV=0x2 (or DIV_4, i.e. divide by 4)
INACTIVE_STATE=0x2 or 0x3
ACTIVE_STATE=0x3 or 0x2
PERIOD=175 (i.e 176-1)
If you use a 32Mhz clock you will need other CDIV and PERIOD parameters to get to 34Khz.
See the "i.MX23 Applications Processor Reference Manual" for example code. If I am not mistaken the driver code is in arch/arm/plat-mxc/pwm.c but it doesn't seem to support the MATT mode. You will probably have to extend the code yourself.
Regarding the implementation -
The above answer relates to the CPU only. In practice, the ability to implement the idea depends on the board design. The board would need a header (pins for external connection) that connects to a GPIO pin that can be connected via the pinmux to one of the PWMs. I would assume that most reference designs would have at least one PWM configurable GPIO exposed through a header. The the question is if there is only one and if you are already using it for some other control purpose.
After determining that there is a header with a free PWM configurable GPIO, you need to configure the pin mux and activate the PWM. There are instructions for this in the processor reference manual noted above. Most systems do this configuration in the boot loader board_init() (assuming U-boot), although it can probably be done in userspace also with some mmap trickery after Linux boots.
Finally you would need to write a driver based on the interface to the PWM module in platform-mxc_pwm.c.
If you are using the i.MX23 EVK 10.05 you might be able to modify the LED PWM driver since it is already configured at the level of the bootloader and kernel and connect your device to the LED output instead of the LED. (You will need a hardware technician to help you with this.) Make sure you config the kernel with the CONFIG_LEDS_MXS.
The above comments regarding implementation are somewhat speculative since I don't know the EVK. Perhaps someone who knows it can improve on this.
Update September 21, 2013
Another way to generate a 37kHz signal with the i.MX23 or with any SoC with a similar ARM CPU core is to use an unused on-chip timer to generate a FIQ interrupt at the required frequency and write a FIQ interrupt handler to toggle a GPIO pin. Maxime Ripard posted a complete example of this method using the i.MX28 SoC on his Free Electrons blog on April 30 this year. To use this method you will need both an unused timer and not be using the FIQ interrupt for another purpose such as one of the SPI, camera, or brownout-detection drivers that use the ARM FIQ. You will also need to write the ISR in ARM assembler.
The best way to get a 37 kHz signal would be to find some serial/audio/PWM output that can generate it in hardware.
It might be possible to raise the priority of your userspace process, but this won't help against interrupts or high-priority kernel tasks.
An RT kernel would allow you to get priority over more kernel tasks, but wouldn't help against all interrupts.
I don't know if you will be able to get the maximum latency below 37 kHz (27 µs); I think it's unlikely.
Doing this in the kernel would help because you could disable interrupt handling.
However, disabling interrupts for as long as 400 ms is frowned upon.

How to record the system information before system hang? [duplicate]

I have an embedded board with a kernel module of thousands of lines which freeze on random and complexe use case with random time. What are the solution for me to try to debug it ?
I have already try magic System Request but it does not work. I guess that the explanation is that I am in a loop or a deadlock in a code where hardware interrupt is disable ?
Thanks,
Eva.
Typically, embedded boards have a watch dog. You should enable this timer and use the watchdog user process to kick the watch dog hard ware. Use nice on the watchdog process so that higher priority tasks must relinquish the CPU. This gives clues as to the issue. If the device does not reset with a watch dog active, then it maybe that only the network or serial port has stopped communicating. Ie, the kernel has not locked up. The issue is that there is no user visible activity. The watch dog is also useful if/when this type of issue occurs in the field.
For a kernel lockup case, the lockup watchdogs kernel features maybe useful. This will work if you have an infinite loop/deadlock as speculated. However, if this is custom hardware, it is also possible that SDRAM or a peripheral device latches up and causes abnormal bus activity. This will stop the CPU from fetching proper code; obviously, it is tough for Linux to recover from this.
You can combine the watchdog with some fallow memory that is used as a trace buffer. memmap= and mem= can limit the memory used by the kernel. A driver/device using this memory can be written that saves trace points that survive a reboot. The fallow memory's ring buffer is dumped when a watchdog reset is detected on kernel boot.
It is also useful to register thread notifiers that can do a printk on context switches, if the issue is repeatable or to discover how to make the event repeatable. Once you determine a sequence of events that leads to the lockup, you can use the scope or logic analyzer to do some final diagnosis. Or, it maybe evident which peripheral is the issue at this point.
You may also set panic=-1 and reboot=... on the kernel command line. The kdump facilities are useful, if you only have a code problem.
Related: kernel trap (at web archive). This link may no longer be available, but aren't important to this answer.

Is forcing I2C communication safe?

For a project I'm working on I have to talk to a multi-function chip via I2C. I can do this from linux user-space via the I2C /dev/i2c-1 interface.
However, It seems that a driver is talking to the same chip at the same time. This results in my I2C_SLAVE accesses to fail with An errno-value of EBUSY. Well - I can override this via the ioctl I2C_SLAVE_FORCE. I tried it, and it works. My commands reach the chip.
Question: Is it safe to do this? I know for sure that the address-ranges that I write are never accessed by any kernel-driver. However, I am not sure if forcing I2C communication that way may confuse some internal state-machine or so.(I'm not that into I2C, I just use it...)
For reference, the hardware facts:
OS: Linux
Architecture: TI OMAP3 3530
I2C-Chip: TWL4030 (does power, audio, usb and lots of other things..)
I don't know that particular chip, but often you have commands that require a sequence of writes, first to one address to set a certain mode, then you read or write another address -- where the function of the second address changes based on what you wrote to the first one. So if the driver is in the middle of one of those operations, and you interrupt it (or vice versa), you have a race condition that will be difficult to debug. For a reliable solution, you better communicate through the chip's driver...
I mostly agree with #Wim. But I would like to add that this can definitely cause irreversible problems, or destruction, depending on the device.
I know of a Gyroscope (L3GD20) that requires that you don't write to certain locations. The way that the chip is setup, these locations contain manufacturer's settings which determine how the device functions and performs.
This may seem like an easy problem to avoid, but if you think about how I2C works, all of the bytes are passed one bit at a time. If you interrupt in the middle of the transmission of another byte, results can not only be truly unpredictable, but they can also increase the risk of permanent damage exponentially. This is, of course, entirely up to the chip on how to handle the problem.
Since microcontrollers tend to operate at speeds much faster than the bus speeds allowed on I2C, and since the bus speeds themselves are dynamic based on the speeds at which devices process the information, the best bet is to insert pauses or loops between transmissions that wait for things to finish. If you have to, you can even insert a timeout. If these pauses aren't working, then something is wrong with the implementation.

Resources