Why do we need a hold pin when we can use the ready pin in 8086 processor? - io

When supporting DMA operations in max mode we use ready signals to keep the microprocessor in wait state. Why can't we do the same in the min mode?
Thanks in advance

Related

Accessing GPIO throgh /dev/mem is it safer?

I am doing a project where the gpio switching should be fast like 40MHz speed. I checked with "sysfs" interface and the switching speed is around 300Hz. It is not at all acceptable in our case.
So, in some forums I read using /dev/mem access will increase the switching speed. I used /dev/mem and the achieved the speed 30-32MHz and it is OK for us. Now the project is going for field testing, will it cause any issue like kernel crash something like that in long run.
As far as I know, i.mx6 does not have atomic pin set/reset functionality, therefore you must assure that all GPIO output pins are controlled by your application, neither the kernel nor another process should ever attempt to change any output pin on the same GPIO controller. Reading input pins, or assigning some pins to other periperals should be OK, but always ensure that no bit-banging happens behind the scenes (e.g. some SPI drivers think that they know better when to set or reset CS, and quietly set the CS pin to GPIO output, taking it away from the SPI peripheral)
You can sustain that output speed as long as your process is not interrupted. If you don't disable interrupts, you will get glitches in the output. If you do, then the kernel scheduler and interrupt-driven hardware drivers stop working. On a dual or quad core system, it should be possible to reserve a core for exclusive use by your process, and let the rest of the system run on the other core(s). Don't just blindly disable interrupts, but use sched_setaffinity(2) and the isolcpus kernel parameter.

Handling IRQ delay in linux driver

I've build a linux driver for an SPI device.
The SPI device sends an IRQ to the processor when a new data is ready to be read.
The IRQ fires about every 3 ms, then the driver goes to read 2 bytes with SPI.
The problem I have is that sometimes, there's more than 6 ms between the IRQ has been fired and the moment where SPI transfer starts, which means I lost 2 bytes of the SPI device.
In addition, there's a uncertain delay between the 2 bytes; sometime it's close to 0, sometime it's up to 300us..
Then my question is : how can I reduce the latency between IRQ and SPI readings ?
And how to avoid latency between the 2 bytes ?
I've tried compiling the kernel with premptive option, it does not change things that much.
As for the hardware, I'm using a mini2440 board running at 400 MHz, using a hardware SPI port (not i/o simulated SPI).
Thanks for help.
BR,
Vincent.
From the brochure of the Samsung S3C2440A CPU, the SPI interface hardware supports both interrupt and DMA-based operation. A look at the actual datasheet reveals that the hardware also supports a polling mode.
If you want to achieve high data rates reliably, the DMA-based approach is what you need. Once a DMA operation is configured, the hardware will move the data to RAM on its own, without the need for low-latency interrupt handling.
That said, I do not know the state of the Linux SPI drivers for your CPU. It could be a matter of missing support for DMA, of specific system settings or even of how you are using the driver from your own code. The details w.r.t. SPI are often highly dependent on the particular implementation...
I had a similar problem: I basically got an IRQ and needed to drain a queue via SPI in less than 10 ms or the chip would start to drop data. With high system load (ssh login was actually enough) sometimes the delay between the IRQ handler enqueueing the next SPI transfer with spi_async and the SPI transfer actually happening exceeded 11 ms.
The solution I found was the rt flag in struct spi_device (see here). Enabling that will set the thread that controls the SPI to real-time priority, which made the timing of all SPI transfers super reliable. And by the way that change also removes delay before the complete callback.
Just as a heads up, I think this was not available in earlier kernel versions.
The thing is Linux SPI stack uses queues for transmitting the messages.
This means that there is no guarantee about the delay between the moment you ask to send the SPI message, and the moment where it is effectively sent.
Finally, to fullfill my 3ms requirements between each SPI message, I had to stop using Linux SPI stack, and directly write into the CPU's register inside my own IRQ.
That's highly dirty, but it's the only way to make it work with small delays.

Efficient detection if value at memory address has been changed?

This more a general question. Consider an external device. From time to time this device writes data via its device driver to a specific memory address. I want to write a small C program which read out this data. Is there a better way than just polling this address to check if the value has been changed? I want to keep the CPU load low.
I have done some further research
Is "memory mapped IO" an option? My naive idea is to let the external device writes a flag to a "memory mapped IO"-address which triggers a kernel device driver. The driver then "informs" the program which proceed the value. Can this work? How can a driver informs the program?
The answer may depend on what processor you intend to use, what the device is and possibly whether you are using an operating system or RTOS.
Memory mapped I/O per se is not a solution, that simply refers to I/O device registers that can be directly addressed via normal memory access instructions. Most devices will generate an interrupt when certain registers are updated or contain new valid data.
In general if using an RTOS you can arrange for the device driver to signal via a suitable IPC mechanism any client thread(s) that need to handle the data. If you are not using an RTOS, you could simply register a callback with the device driver which it would call whenever the data is updated. What the client does in the call back is its business - including reading the new data.
If the device in question generates interrupts, then the handling can be done on interrupt, if the device is capable of DMA, then it can handle blocks of data autonomously before teh DMA controller generates an DMA interrupt to a handler.

Critical Timing in an ARM Linux Kernel Driver

I am running linux on an MX28 (ARMv5), and am using a GPIO line to talk to a device. Unfortunately, the device has some special timing requirements. A low on the GPIO line cannot last longer than 7us, highs have no special timing requirements. The code is implemented as a kernel device driver, and toggles the GPIO with direct register writes rather than going through the kernel GPIO api. For testing, I am just generating 3 pulses. The process is as follows, all in one function so it should fit in the instruction cache:
set gpio high
Save Flags & Disable Interrupts
gpio low
pause
gpio high
repeat 2x more
Restore Flags/Reenable Interrups
Here's the output of a logic analyzer tied to the GPIO.
Most of the time it works just great, and the pulses last just under 1us. However, about 10% of the lows last for many, many microseconds. Even though interrupts are disabled, something is causing the flow of the code to be interrupted.
I am at a loss. RT Linux would likely not help here, because the problem is not latency, it appears to be something happening during the low, even though nothing should interrupt it with the IRQs disabled. Any suggestions would be greatly, greatly appreciated.
The ARM cache on an IMX25 (ARM926) is 16K Code, 16K Data L1 with a 32byte length or eight instructions. With the DDR-SDRAM controller running at 133Mhz and a 16bit bus the transfer rate is about 300MB/s. A cache fill should only take about 100nS, not 9uS; this is about 100 times too long.
However, you have four other issues with Linux.
TLB misses and a page table walk.
Data aborts.
DMA masters stealing.
FIQ interrupts.
It is unlikely that the LCD master is stealing enough bandwidth, unless you have a huge display. Is your display larger than 1/4VGA? If not, this is only 10% of the memory bandwidth and this will pipeline with the processor. Do you have either Ethernet or USB active? These peripherals are higher data rate and could cause this type of contention with SDRAM.
All of these issues maybe avoided by writing your toggler PC relative and copying it to the IRAM. See: iram_alloc.c; this file should be portable to older versions of Linux. The XBAR switch allows fetches from SDRAM and IRAM simultaneously. The IRAM can still be a target of other DMA masters. If you are really pressed, move the code to the ETB buffers which no other master in the system can access.
The TLB miss can actually be quite steep as it may need to run several single beat SDRAM cycles; still this should be under 1uS. You have not posted code, so it is possible that a variable and/or other is causing a data fault which is not maskable.
If you have any drivers using the FIQ, they may still be running even though you have masked the normal IRQ interrupts. For instance, the ALSA driver for this system normally uses the FIQ.
Both the ETB and the IRAM are 32-bit data paths and low wait state. Either one will probably give better response than the DDR-SDRAM.
We have achieved sub micro-second response by using a FIQ and IRAM to toggle GPIOs on an IMX258 with another protocol using bit banging.
One possible workaround to the problem Chris mentioned (in addition to problems with paging of kernel module code) is to use a PWM peripheral where the duration of the pulse is pre-programmed and the timing is implemented in hardware.
Fancy processors with caches are not suitable for hard realtime work. Execution time varies if cache misses are non-deterministic (and designs where cache misses are completely deterministic aren't complicated enough to justify a fancy processor).
You can try to avoid memory controller latency during critical sections by aligning the critical section so that it doesn't straddle cache lines. Or prefetch the code you will need. But this is going to be very non-portable and create a nightmare for future maintenance. And still doesn't protect the access to memory-mapped GPIO from bus contention.

How to generate a square wave by Linux kernel

I need to develop a Linux driver that generates a square wave, with a cycle of about 1ms, using the MIPS platform (this is not i386).
I tried some methods, but these are not success:
Use timer/hrtimer --> but cycle is 12ms and unstable
Cannot use realtime additional packages as RTLinux/RTAI, because these do not support for MIPS
Use the kernel-thread with a forever loop and udelay function --> It takes too much of the CPU's resource --> Performance is not acceptable
Do you aid me? Or do you thwart me...? (Please help!)
Thank you.
The Unix way would be not doing that at all. Maybe in olden times on single task machines, you would have done like this, but now - if you don't have a hardware circuit that gives to the proper frequency, you may never succeed because hardware timers don't have the necessary resolution, and it may always happen that a task of more importance grabs your CPU time.
As FrankH said, the best solution involves relying on hardware. You should check your processor's reference manual to see if it has a timer.
I'll add this: if it happens to have an Output Compare or PWM subsystem (I'm not familiar with MIPS, but it's not at all uncommon in embedded devices) you can just write a few registers to set it all up, and then you don't need any more processor time.
It might be possible, but to get this from within Linux, the hardware must have certain characteristics:
you need a programmable timer device that can create an interrupt at sufficiently-high priority that other activity by the Linux kernel (such as scheduling or other interrupts, even) won't preempt / block the interrupt handler, and at sufficient granulatity/frequency to meet your signal stability constraints
the "square wave" electrical line must also be programmable and the operation (register write ? memory mapped register write ? special CPU instruction ? ... ?) which switches its state must be guaranteed faster than the shortest cycle time used with the timer above (or else you could get "frequency moire")
If that's the case then your special timer device driver can toggle the line from within its high prio interrupt handler and create the square wave. Since it's both interrupt driven and separate from the normal timer interrupt sources / consumers (i.e. not shared - no latency from possibly dispatching multiple timer events per interrupt), you've got a much better chance of sufficient precision.
Since all this (the existance of a separately-programmable timer device, to start with) is hardware-specific, you need to start with the specs of your CPU/SoC/board and find out if there are multiple independent timers available.

Resources