I don't need exact figures, but I want a realistic sense of a typical PC's ability to read input interrupts within a 1 millisecond period. Say a mouse keeps moving: how many reads does the OS perform for an average mouse, or for a gaming mouse for that matter?
In other words, if we write a program that tries to record mouse inputs, how frequently should we read in order to read a single input value more than once?
This depends on the hardware and on what kind of device you are talking about. Intel actually documents the maximum interrupt rate for its xHCI USB controller, and I would say this maximum is probably higher than any gaming mouse needs. The Intel xHCI document (https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/extensible-host-controler-interface-usb-xhci.pdf) specifies on page 289 that:
Interrupt Moderation allows multiple events to be processed in the context of a single Interrupt Service Request (ISR), rather than generating an ISR for each event. The interrupt generation that results from the assertion of the Interrupt Pending (IP) flag may be throttled by the settings of the Interrupter Moderation (IMOD) register of the associated Interrupter. The IMOD register consists of two 16-bit fields: the Interrupt Moderation Counter (IMODC) and the Interrupt Moderation Interval (IMODI). Software may use the IMOD register to limit the rate of delivery of interrupts to the host CPU. This register provides a guaranteed inter-interrupt delay between the interrupts of an Interrupter asserted by the host controller, regardless of USB traffic conditions. The following algorithm converts the inter-interrupt interval value to the common 'interrupts/sec' performance metric:
Interrupts/sec = (250 × 10⁻⁹ sec × IMODI)⁻¹
For example, if the IMODI is programmed to 512, the host controller guarantees the host will not be interrupted by the xHC for at least 128 microseconds from the last interrupt. The maximum observable interrupt rate from the xHC should not exceed 8000 interrupts/sec. Inversely, the inter-interrupt interval value can be calculated as:
Inter-interrupt interval = (250 × 10⁻⁹ sec × interrupts/sec)⁻¹
The optimal performance setting for this register is very system and configuration specific. An initial suggested range for the moderation Interval is 651-5580 (28Bh-15CCh). The IMODI field shall default to 4000 (1 ms) upon initialization and reset. It may be loaded with an alternative value by software when the Interrupter is initialized.
USB works alongside the xHCI to deliver interrupts to the system. I'm not a hardware engineer, but I would say the interrupt rate depends on the mouse's polling rate. For example, this mouse: https://www.amazon.ca/Programmable-PICTEK-Computer-Customized-Breathing/dp/B01G8W30BY/ref=sr_1_4?dchild=1&keywords=usb+gaming+mouse&qid=1610137924&s=electronics&sr=1-4, has a polling rate of 125 Hz to 1000 Hz. That probably means you will get an interrupt rate of 125/s to 1000/s, since the mouse reports at that frequency. Its optical sensor samples the surface the mouse is on at this rate, providing an interrupt for each movement.
As for the interrupts themselves, I think it depends on the speed of the CPU. Interrupts are masked for a short time while one is being handled. The faster the CPU, the sooner the interrupt is unmasked and the sooner a new interrupt can occur. I would say the bottleneck here is the mouse at 1000 interrupts/s, that is, 1 interrupt per millisecond.
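To make the quoted formula concrete, here is a small sketch in C (purely illustrative, not tied to any real driver) that converts an IMODI value into the guaranteed interval and the resulting maximum interrupt rate; with the default IMODI of 4000 you get the 1 ms interval mentioned above:

#include <stdio.h>

/* Illustrative sketch only: applies the conversion quoted from the xHCI
 * spec above. The IMODI value is an example, not read from real hardware. */
int main(void)
{
    unsigned imodi = 4000;                /* spec default: 4000 * 250 ns = 1 ms */
    double interval_s = imodi * 250e-9;   /* guaranteed inter-interrupt delay   */
    double max_rate = 1.0 / interval_s;   /* maximum interrupts per second      */

    printf("IMODI=%u -> interval %.0f us, at most %.0f interrupts/s\n",
           imodi, interval_s * 1e6, max_rate);

    /* A 1000 Hz mouse reports once per millisecond, so a recorder polling
     * at 1 kHz sees each report about once; reading the same report more
     * than once requires sampling noticeably faster than the report rate. */
    return 0;
}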
On my STM32H753 I've enabled an interrupt on the rising edge of one of the GPIOs. Once I get the interrupt (provided of course that the handler acknowledges it in the EXTI peripheral), when the signal goes low again I will be able to get another interrupt at the following rising edge.
My question is: what is the minimal duration between the falling edge and the rising edge for the latter to be detected by the EXTI? The datasheet specifies many characteristics of the I/Os, in particular the voltage levels at which an input counts as low or high, but I didn't find this timing.
Thank you
For the electronic part, you need to refer to your MCU's datasheet.
However, I believe you need the information about the software part:
You will be able to handle a new GPIO IRQ (EXTI) as soon as you've acknowledged the former one by clearing the IRQ pending flag, either directly or via the HAL APIs.
If two IRQs occur and you have not cleared the pending flag yet, they will be treated as a single IRQ. Benchmarking such a delay depends on the clock speed you're using and on the complexity of your EXTI IRQ handler routine.
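As a minimal sketch (assuming the STM32 HAL; the vector name EXTI15_10_IRQHandler and pin PC13 are illustrative assumptions, not taken from the question), the usual pattern is to let the HAL clear the pending flag and keep the callback short:

#include "stm32h7xx_hal.h"   /* assumed device HAL header */

/* Illustrative: an EXTI line 10..15 handler for a pin on line 13. */
void EXTI15_10_IRQHandler(void)
{
    /* Clears the EXTI pending bit for the pin, then calls
     * HAL_GPIO_EXTI_Callback(). Until the pending bit is cleared,
     * additional edges on this line collapse into the one pending IRQ. */
    HAL_GPIO_EXTI_IRQHandler(GPIO_PIN_13);
}

void HAL_GPIO_EXTI_Callback(uint16_t GPIO_Pin)
{
    if (GPIO_Pin == GPIO_PIN_13) {
        /* Keep this short: the longer the handler runs, the later the
         * next rising edge can be serviced. */
    }
}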
I'm currently working with termios for serial communication in Linux.
I need to set an intercharacter timeout to 5ms.
I found a way to set an intercharacter timeout using VMIN and VTIME, where both VMIN > 0 and VTIME > 0.
The problem is that I need to set VTIME to 5 ms, but VTIME is expressed in tenths of a second.
VTIME's data type is unsigned char, so I can't just set it to 0.05.
Does anyone know if there is some way around this?
I need to set an intercharacter timeout to 5ms.
...
Does anyone know if there is some way around this?
No, there is no way to set a shorter termios timeout than 100 ms.
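To make the granularity concrete, here is a minimal sketch of the VMIN/VTIME setup in question (plain POSIX termios; fd is assumed to be an already-open serial port). The smallest non-zero VTIME is 1, which already means 100 ms; there is simply no encoding for 5 ms at this layer:

#include <termios.h>
#include <unistd.h>

/* Sketch: non-canonical read with an intercharacter timeout.
 * VTIME counts tenths of a second, so 1 already means 100 ms. */
static int set_intercharacter_timeout(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) < 0)
        return -1;

    tio.c_lflag &= ~(ICANON | ECHO);   /* non-canonical input, no echo          */
    tio.c_cc[VMIN]  = 1;               /* block until at least one byte arrives */
    tio.c_cc[VTIME] = 1;               /* then allow up to 0.1 s between bytes  */

    return tcsetattr(fd, TCSANOW, &tio);
}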
Depending on your hardware and kernel configuration, this timeout may not be reliable at all, especially if you are trying to detect time-separated messages.
The termios handling is at least a full layer above the UART device driver (see Linux serial drivers).
Unless your kernel is configured to ensure that the bottom-half of the UART driver and the kworker threads for termios are high priority and low latency, then short intercharacter intervals cannot be accurately or reliably determined.
If the UART utilizes a FIFO to buffer incoming data, then that hardware obscures the intercharacter spacing that the software can detect.
Similarly when the UART driver is using DMA to store the received data, intercharacter timing will be obscured.
With DMA the CPU is not involved with handling the received data until the DMA operation is complete, and all temporal information about any intercharacter separation is gone.
(Crucial information such as framing error and/or parity error is difficult/impossible to pinpoint to a specific byte when using DMA.)
Even without DMA, termios will only be able to use timing based on the transfer of data through the tty flip buffers (which is a layer removed from the timing on the wire).
Some UARTs do have hardware that assist in detecting the end-of-message by idle line.
For example Atmel/Microchip ATSAMA5 and AT91SAM9 SoCs have USARTs with a Receiver Timeout feature that measures the idle time after each received frame.
When this idle line time exceeds a specified value, an interrupt can be generated.
The Linux driver for the Atmel USART typically uses the receiver-timeout interrupt to (prematurely) terminate the current DMA receive operation, and copy the contents of the DMA buffer to the tty flip buffer.
In summary, you cannot (or at least should not) rely solely on VMIN and VTIME settings to detect time-separated messages. See Parsing time-delimited UART data.
The message packets need to have delimiter/sentinel characters/bytes so that messages can be reliably parsed and validated.
See parsing complete messages from serial port for an example of efficient use of syscalls with a local buffer.
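As a rough sketch of the delimiter approach (the STX/ETX sentinel bytes and buffer size here are arbitrary choices for illustration, not taken from the linked answer):

#include <stdio.h>
#include <unistd.h>

#define STX 0x02   /* assumed start-of-message sentinel */
#define ETX 0x03   /* assumed end-of-message sentinel   */

/* Read whatever is available into a local buffer, then scan for complete
 * STX...ETX frames. No reliance on intercharacter timing at all. */
static void read_frames(int fd)
{
    unsigned char buf[512];
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n <= 0)
        return;

    ssize_t start = -1;                 /* index of the last STX seen */
    for (ssize_t i = 0; i < n; i++) {
        if (buf[i] == STX) {
            start = i;
        } else if (buf[i] == ETX && start >= 0) {
            /* buf[start+1 .. i-1] is one complete message */
            printf("frame of %zd bytes\n", i - start - 1);
            start = -1;
        }
    }
    /* A real parser would keep any partial frame at the end of buf and
     * prepend it to the next read; omitted here for brevity. */
}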
I'm implementing a protocol over serial ports on Linux. The protocol is based on a request-answer scheme, so the throughput is limited by the time it takes to send a packet to a device and get an answer back. The devices are mostly ARM-based and run Linux >= 3.0. I'm having trouble reducing the round-trip time below 10 ms (115200 baud, 8 data bits, no parity, 7 bytes per message).
Which IO interfaces will give me the lowest latency: select, poll, epoll, or polling by hand with ioctl? Does blocking or non-blocking IO impact latency?
I tried setting the low_latency flag with setserial, but it seemed to have no effect.
Are there any other things I can try to reduce latency? Since I control all the devices, it would even be possible to patch the kernel, but it's preferred not to.
---- Edit ----
The serial controller used is a 16550A.
Request/answer schemes tend to be inefficient, and it shows up quickly on a serial port. If you are interested in throughput, look at windowed protocols, like the Kermit file-transfer protocol.
Now if you want to stick with your protocol and reduce latency, select, poll, read will all give you roughly the same latency, because as Andy Ross indicated, the real latency is in the hardware FIFO handling.
If you are lucky, you can tweak the driver behaviour without patching, but you still need to look at the driver code. However, having the ARM handle a 10 kHz interrupt rate will certainly not be good for the overall system performance...
Another option is to pad your packets so that you hit the FIFO threshold every time. That will also confirm whether or not it is a FIFO threshold problem.
10 ms at 115200 baud is enough to transmit about 100 bytes (assuming 8N1), so what you are seeing is probably because the low_latency flag is not set. Try
setserial /dev/<tty_name> low_latency
It will set the low_latency flag, which is used by the kernel when moving data up in the tty layer:
void tty_flip_buffer_push(struct tty_struct *tty)
{
        unsigned long flags;

        spin_lock_irqsave(&tty->buf.lock, flags);
        if (tty->buf.tail != NULL)
                tty->buf.tail->commit = tty->buf.tail->used;
        spin_unlock_irqrestore(&tty->buf.lock, flags);

        if (tty->low_latency)
                flush_to_ldisc(&tty->buf.work);
        else
                schedule_work(&tty->buf.work);
}
The schedule_work call might be responsible for the 10 msec latency you observe.
Having talked to some more engineers about the topic, I came to the conclusion that this problem is not solvable in user space. Since we need to cross the bridge into kernel land, we plan to implement a kernel module which speaks our protocol and gives us latencies < 1 ms.
--- edit ---
Turns out I was completely wrong. All that was necessary was to increase the kernel tick rate. The default of 100 ticks per second added the 10 ms delay. 1000 Hz plus a negative nice value for the serial process gives me the timing behavior I wanted to reach.
Serial ports on Linux are "wrapped" into Unix-style terminal constructs, which hits you with 1 tick of lag, i.e. 10 ms. Try whether stty -F /dev/ttySx raw low_latency helps, no guarantees though.
On a PC, you can go hardcore and talk to standard serial ports directly: issue setserial /dev/ttySx uart none to unbind the Linux driver from the serial port hardware and control the port via inb/outb to its registers. I've tried that, and it works great.
The downside is that you don't get interrupts when data arrives and you have to poll the register. Often.
You should be able to do the same on the ARM device side; it may be much harder on exotic serial port hardware.
Here's what setserial does to set low latency on a file descriptor of a port:
struct serial_struct serial;   /* from <linux/serial.h> */

ioctl(fd, TIOCGSERIAL, &serial);
serial.flags |= ASYNC_LOW_LATENCY;
ioctl(fd, TIOCSSERIAL, &serial);
In short: Use a USB adapter and ASYNC_LOW_LATENCY.
I've used an FT232RL-based USB adapter on Modbus at 115.2 kbit/s.
I get about 5 transactions (to 4 devices) in about 20 ms total with ASYNC_LOW_LATENCY. This includes two transactions to a slow-poke device (4 ms response time).
Without ASYNC_LOW_LATENCY the total time is about 60 ms.
With FTDI USB adapters, ASYNC_LOW_LATENCY sets the inter-character timer on the chip itself to 1 ms (instead of the default 16 ms).
I'm currently using a home-brewed USB adapter and I can set the latency for the adapter itself to whatever value I want. Setting it to 200 µs shaves another ms off that 20 ms.
None of those system calls have an effect on latency. If you want to read and write one byte as fast as possible from userspace, you really aren't going to do better than a simple read()/write() pair. Try replacing the serial stream with a socket from another userspace process and see if the latencies improve. If they don't, then your problems are CPU speed and hardware limitations.
Are you sure your hardware can do this at all? It's not uncommon to find UARTs with a buffer design that introduces many bytes worth of latency.
At those line speeds you should not be seeing latencies that large, regardless of how you check for readiness.
You need to make sure the serial port is in raw mode (so you do "noncanonical reads") and that VMIN and VTIME are set correctly. You want to make sure that VTIME is zero so that an inter-character timer never kicks in. I would probably start with setting VMIN to 1 and tune from there.
The syscall overhead is nothing compared to the time on the wire, so select() vs. poll(), etc. is unlikely to make a difference.
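A minimal sketch of that setup (raw, noncanonical mode with VMIN = 1 and VTIME = 0, paired with plain blocking read()/write(); the device path and baud rate are just examples):

#define _DEFAULT_SOURCE            /* for cfmakeraw() on glibc */
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Sketch only: open a port so that read() returns as soon as one byte is
 * available and no intercharacter timer ever runs. */
int open_raw_port(const char *path)    /* e.g. "/dev/ttyS0" (example) */
{
    int fd = open(path, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    struct termios tio;
    if (tcgetattr(fd, &tio) < 0) {
        close(fd);
        return -1;
    }

    cfmakeraw(&tio);                   /* noncanonical, no line discipline edits */
    cfsetispeed(&tio, B115200);
    cfsetospeed(&tio, B115200);
    tio.c_cc[VMIN]  = 1;               /* return after the first byte arrives */
    tio.c_cc[VTIME] = 0;               /* never start the 0.1 s timer         */

    if (tcsetattr(fd, TCSANOW, &tio) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}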
I use this ALU block diagram as learning material: http://www.righto.com/2013/09/the-z-80-has-4-bit-alu-heres-how-it.html
I am not familiar with electronics. My current understanding is that a clock cycle is needed to move data from a register or latch to another register or latch, possibly through a network of logic gates.
So here is my understanding of what happens for an ADD:
Cycle 1: move the registers to the internal latches
Cycle 2: move the low nibbles of the internal latches to the internal result latch (through the ALU)
Cycle 3, in parallel:
move the high nibbles of the internal latches to the destination register (through the ALU)
move the internal result latch to the register
I think the cycle 3 operations are done in parallel because there are two 4-bit buses (for the high and low nibbles) and the register bus seems to be 8 bits.
Per the z80 data sheet:
The PC is placed on the address bus at the beginning of the M1 cycle.
One half clock cycle later the MREQ signal goes active. At this time
the address to the memory has had time to stabilize so that the
falling edge of MREQ can be used directly as a chip enable clock to
dynamic memories. The RD line also goes active to indicate that the
memory read data should be enabled onto the CPU data bus. The CPU
samples the data from the memory on the data bus with the rising edge
of the clock of state T3 and this same edge is used by the CPU to turn
off the RD and MREQ signals. Thus, the data has already been sampled
by the CPU before the RD signal becomes inactive. Clock state T3 and
T4 of a fetch cycle are used to refresh dynamic memories. The CPU uses
this time to decode and execute the fetched instruction so that no
other operation could be performed at this time.
So it appears mostly to be about memory interfacing to read the opcode rather than actually doing the addition; decode and execution occur entirely within clock states T3 and T4. Given that the Z80 has a 4-bit ALU, it would take two operations to perform an 8-bit addition, which likely explains the use of two cycles.
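To make the two-pass idea concrete, here is a small sketch in C (purely illustrative, not Z80 code) of an 8-bit add carried out the way a 4-bit ALU has to do it, low nibble first with a carry into the high nibble:

#include <stdint.h>
#include <stdio.h>

/* Illustrative only: an 8-bit addition done as two 4-bit ALU passes. */
static uint8_t add8_via_4bit_alu(uint8_t a, uint8_t b)
{
    uint8_t low   = (uint8_t)((a & 0x0F) + (b & 0x0F));     /* first pass: low nibbles   */
    uint8_t carry = (low > 0x0F) ? 1 : 0;                   /* half-carry out            */
    uint8_t high  = (uint8_t)((a >> 4) + (b >> 4) + carry); /* second pass: high nibbles */

    return (uint8_t)((high << 4) | (low & 0x0F));
}

int main(void)
{
    printf("0x%02X\n", add8_via_4bit_alu(0x3C, 0x4B));  /* prints 0x87 */
    return 0;
}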
There are several "hi-res" timestamping functions in ALSA:
snd_pcm_status_get_trigger_htstamp
snd_pcm_status_get_audio_htstamp
snd_pcm_status_get_driver_htstamp
snd_pcm_status_get_htstamp
I would like to understand what points in time the resulting functions represent.
My current understanding is that trigger_htstamp represents the time when stream was started/stopped/paused. snd_pcm_status_get_trigger_htstamp returns a constant value and when I add audio_htstamp to that value the result is very close to the current system time.
audio_htstamp seems to start from zero on my system and it is incremented by a value that is equal to the period size I use. Hence on my system it is a simple frame counter. If I understand ALSA correctly audio_htstamp can also work in different more accurate way depending on the system capabilities.
driver_htstamp I guess by the name is a timestamp generated by the audio driver.
Question 1: When is the timestamp driver_htstamp usually generated?
With htstamp I am really unsure where and when it is generated. I have a hunch that it may be related to DMA.
Question 2: Where is htstamp generated?
Question 3: When is htstamp generated?
Question 4: Is the assumption audio_htstamp < htstamp < driver_htstamp generally correct?
It seems to be that way with a little test program I wrote, but I want to verify my assumption.
I cannot find this information in the ALSA documentation.
I just dug through the code for this stuff for my own purposes, so I figured I would share what I found.
The purpose of these timestamps is to allow you to determine subtle differences in the rate of different clocks; most importantly in this case the main system clock that Linux uses for general timekeeping compared with the different clock that determines the rate at which samples move in and out of the sound device. This can be very important for applications that need to keep audio from different hardware devices in sync, since the rates of different physical clocks are never exactly the same.
The technique used is sometimes called "cross-timestamping"; you capture timestamps from the clocks you want to compare as close to simultaneously as possible, and repeat this at regular intervals. There is usually some measurement error introduced, but some relatively simple filtering can get you a good characterization of the difference in the rate at which the clocks count.
The core PCM driver arranges to take a system clock timestamp as closely as possible to when an audio stream starts, and then it does a cross-timestamp between the system clock and audio clock (which can be measured in different ways) whenever it is asked to check the state of the hardware pointers for the DMA engine that moves samples around.
The default method of measuring the audio clock is via DMA hardware pointer comparison. This isn't terribly precise, but over longer periods of time you can still get a good measure of the rate difference. At the start of snd_pcm_update_hw_ptr0, a system timestamp is captured; this will end up being htstamp. The DMA pointers are then checked, and if it's determined that they've moved since the last check, audio_htstamp is calculated based on the number of frames DMA has copied and the nominal frequency of the audio clock. Then, once all the DMA pointer updating is done and right before snd_pcm_update_hw_ptr0 returns, another system timestamp is captured in driver_htstamp. This isn't meant to be used when you're using the DMA hw_ptr method of calculating the audio_htstamp, though.
If you happen to have an audio device using the HDAudio driver, you can use an alternate and much more precise method of measuring the audio clock. It supplies an extra operation callback called get_time_info that is used instead of the default method of capturing the system and audio timestamps. In the HDAudio case, it takes a system timestamp for htstamp as close as possible to when it reads an internal counter driven by the same clock source as the audio clock; this forms the audio_htstamp. Afterwards, the same DMA hw_ptr bookkeeping is done, but the code that translates the pointer movement into time is skipped. The driver_htstamp is still taken right before the routine ends, though; this is "to let apps detect if the reference tstamp read by low-level hardware was provided with a delay", as the comment in the code says. This is because there's no guarantee that the get_time_info callback is going to take a new system timestamp; it may have previously recorded an audio timestamp along with a system timestamp as part of an interrupt handler. In this case, the timestamps you get might not match the available-frames and delay-frames counts calculated by the hw_ptr bookkeeping, but the driver_htstamp will let you know the closest system time to when those calculations were made.
In any case, the code is designed in both cases to capture htstamp and audio_htstamp as closely together as possible, and for htstamp - trigger_htstamp to represent the amount of system time that passed during the period measured by audio_htstamp of the audio clock. You mostly shouldn't need to use driver_htstamp, but I guess it might be used with the USB Audio driver, as I think it and HDAudio are the only ones that do anything special with these interfaces right now.
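For reference, a sketch of how these four timestamps can be read from user space with alsa-lib (error handling trimmed; it assumes pcm is an already-configured, running snd_pcm_t and that timestamping has been enabled in the stream's software parameters):

#include <alsa/asoundlib.h>
#include <stdio.h>

/* Sketch: query the status of a running stream and print its timestamps. */
static void dump_timestamps(snd_pcm_t *pcm)
{
    snd_pcm_status_t *status;
    snd_htimestamp_t trigger, audio, driver, sys;

    snd_pcm_status_alloca(&status);
    if (snd_pcm_status(pcm, status) < 0)
        return;

    snd_pcm_status_get_trigger_htstamp(status, &trigger);
    snd_pcm_status_get_audio_htstamp(status, &audio);
    snd_pcm_status_get_driver_htstamp(status, &driver);
    snd_pcm_status_get_htstamp(status, &sys);

    /* htstamp - trigger_htstamp should track audio_htstamp if the system
     * and audio clocks ran at exactly the same rate; the drift between
     * them is what cross-timestamping lets you measure. */
    printf("trigger %ld.%09ld  audio %ld.%09ld  sys %ld.%09ld  driver %ld.%09ld\n",
           (long)trigger.tv_sec, trigger.tv_nsec,
           (long)audio.tv_sec, audio.tv_nsec,
           (long)sys.tv_sec, sys.tv_nsec,
           (long)driver.tv_sec, driver.tv_nsec);
}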
The documentation for this, although it doesn't contain all the details you might want to know, is part of the kernel documentation: http://lxr.free-electrons.com/source/Documentation/sound/alsa/timestamping.txt?v=4.9