Millisecond resolution timer on LinkIt Smart 7688 - Linux

I am developing for the LinkIt Smart 7688 device by Mediatek. I need to do some timekeeping in a userspace application where I need at least 10ms resolution (preferably 1ms).
However, every syscall I have tried returns values with only 1-second resolution. Both clock_gettime (I tried all the different clocks) and gettimeofday, which should provide sub-second resolution, do not.
Running dmesg on the target shows that the kernel timestamps messages with a resolution below 1 second, so I conclude that a clock source with sub-second resolution is available. (I would be very surprised if this were not the case :) )
How do I get a timestamp with sub-second resolution on the Linkit Smart 7688 device?
Perhaps I could be missing some kernel configuration selecting the correct clock source to be available to userspace? I have not been able to find one.

Do not use only the seconds returned by gettimeofday; use the microseconds (tv_usec) as well:
#include <sys/time.h>
struct timeval t0, t1;
gettimeofday(&t0, 0);
/* ... work to be timed ... */
gettimeofday(&t1, 0);
/* elapsed time in microseconds */
long elapsed = (t1.tv_sec - t0.tv_sec) * 1000000 + t1.tv_usec - t0.tv_usec;
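The same applies to clock_gettime: the sub-second part is returned in tv_nsec, not tv_sec. A minimal sketch (using CLOCK_MONOTONIC; any clock with sufficient resolution works for interval measurement):
#include <time.h>
struct timespec ts0, ts1;
clock_gettime(CLOCK_MONOTONIC, &ts0);
/* ... work to be timed ... */
clock_gettime(CLOCK_MONOTONIC, &ts1);
/* elapsed time in milliseconds */
long elapsed_ms = (ts1.tv_sec - ts0.tv_sec) * 1000 + (ts1.tv_nsec - ts0.tv_nsec) / 1000000;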

Related

What timeframe does the "perf sched record" use?

I've been trying to analyze the output of perf sched record, but I don't understand what frame of reference the "20624.983302 secs" values use. It certainly isn't Unix time, so what is it? How would I go about converting this into Unix time?
*A0 20624.983302 secs A0 => migration/0:12
*. 20624.983311 secs . => swapper:0
*B0 20624.983318 secs B0 => IPC I/O Child:33924
*. 20624.983355 secs
*C0 20624.983485 secs C0 => WRScene~lder#15:39974
*. 20624.983581 secs
*D0 20624.983972 secs D0 => IPC I/O Parent:33780
These timestamps are captured using the kernel scheduler clock, which counts in nanoseconds since boot. The exact details depend on the compile-time parameters chosen to build a particular Linux distribution and the target architecture.
In general, the timestamp of a sample is captured around the time it is recorded. Timestamps on the same core are guaranteed to be monotonically increasing as long as the core remains in an active state. The samples you've shown were all captured on the same core, and the core remained active from the first sample to the last sample, so the timestamps are guaranteed to be monotonic in this case irrespective of the platform and distribution. When profiling on multiple cores, there is no guarantee that the clocks on all cores are in sync.
All perf tools use the same clock to capture timestamps, but they may differ in the way timestamps are printed and it may happen that two tools print timestamps from the same sample file differently. This depends on the kernel version.
It's possible to specify a clock source when calling perf_event_open() by setting use_clockid to 1 and setting clockid to one of the clock sources defined in linux/time.h, such as CLOCK_MONOTONIC. perf record provides the -k or --clockid option to specify the clock source for capturing timestamps.
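For illustration, a minimal sketch of selecting the clock via perf_event_open() (the event type and the helper name here are just placeholders chosen for the example):
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Example only: open a CPU-clock software event whose samples are
   timestamped with CLOCK_MONOTONIC instead of the default scheduler clock. */
static int open_event_monotonic(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_SOFTWARE;
    attr.config = PERF_COUNT_SW_CPU_CLOCK;
    attr.sample_type = PERF_SAMPLE_TIME;
    attr.use_clockid = 1;            /* honour the clockid field below */
    attr.clockid = CLOCK_MONOTONIC;  /* one of the clock ids from linux/time.h */
    /* measure the calling thread on any CPU */
    return syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}
On the command line, something like perf record -k monotonic should have the same effect; the exact clock names accepted depend on the perf version.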
Modern distributions on x86 typically use TSC as the source for the scheduler clock (check /sys/devices/system/clocksource/clocksource0/current_clocksource). So if you're on an x86 processor, most probably the TSC of the profiled core was used to capture the current value of TSC cycles, which internally gets converted into nanoseconds. When a timestamp is printed, it may get converted to a different unit. In this case, timestamps are printed in the format "seconds.microseconds". A summary of the behavior of TSC on Intel processors can be found at: Can constant non-invariant tsc change frequency across cpu states?.

How to modify timing in readyRead of QextSerialPort [duplicate]

I'm implementing a protocol over serial ports on Linux. The protocol is based on a request-answer scheme, so the throughput is limited by the time it takes to send a packet to a device and get an answer. The devices are mostly ARM based and run Linux >= 3.0. I'm having trouble reducing the round trip time below 10 ms (115200 baud, 8 data bits, no parity, 7 bytes per message).
What IO interfaces will give me the lowest latency: select, poll, epoll or polling by hand with ioctl? Does blocking or non-blocking IO impact latency?
I tried setting the low_latency flag with setserial. But it seemed like it had no effect.
Are there any other things I can try to reduce latency? Since I control all devices, it would even be possible to patch the kernel, but I'd prefer not to.
---- Edit ----
The serial controller used is a 16550A.
Request/answer schemes tend to be inefficient, and it shows up quickly on serial ports. If you are interested in throughput, look at a windowed protocol, like the Kermit file-sending protocol.
Now if you want to stick with your protocol and reduce latency, select, poll, read will all give you roughly the same latency, because as Andy Ross indicated, the real latency is in the hardware FIFO handling.
If you are lucky, you can tweak the driver behaviour without patching, but you still need to look at the driver code. However, having the ARM handle a 10 kHz interrupt rate will certainly not be good for the overall system performance...
Another option is to pad your packets so that you hit the FIFO threshold every time. It will also confirm whether or not it is a FIFO threshold problem.
10 msec @ 115200 baud is enough to transmit 100 bytes (assuming 8N1), so what you are seeing is probably because the low_latency flag is not set. Try
setserial /dev/<tty_name> low_latency
It will set the low_latency flag, which is used by the kernel when moving data up in the tty layer:
void tty_flip_buffer_push(struct tty_struct *tty)
{
        unsigned long flags;
        spin_lock_irqsave(&tty->buf.lock, flags);
        if (tty->buf.tail != NULL)
                tty->buf.tail->commit = tty->buf.tail->used;
        spin_unlock_irqrestore(&tty->buf.lock, flags);
        if (tty->low_latency)
                flush_to_ldisc(&tty->buf.work);
        else
                schedule_work(&tty->buf.work);
}
The schedule_work call might be responsible for the 10 msec latency you observe.
Having talked to some more engineers about the topic, I came to the conclusion that this problem is not solvable in user space. Since we need to cross the bridge into kernel land, we plan to implement a kernel module which talks our protocol and gives us latencies < 1 ms.
--- edit ---
Turns out I was completely wrong. All that was necessary was to increase the kernel tick rate. The default 100 Hz tick added the 10 ms delay. 1000 Hz and a negative nice value for the serial process give me the timing behavior I wanted to reach.
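For reference, on kernels where the tick rate is a Kconfig choice this corresponds to something like the following (exact option names and menu location depend on the kernel version and architecture; the process name is a placeholder):
CONFIG_HZ_1000=y
CONFIG_HZ=1000
The scheduling priority can then be raised at run time with, e.g., nice -n -10 ./serial-app, or with renice on the running PID.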
Serial ports on Linux are "wrapped" into unix-style terminal constructs, which hits you with 1 tick of lag, i.e. 10 ms. Try whether stty -F /dev/ttySx raw low_latency helps, no guarantees though.
On a PC, you can go hardcore and talk to standard serial ports directly: issue setserial /dev/ttySx uart none to unbind the Linux driver from the serial port hardware and control the port via inb/outb to the port registers. I've tried that, it works great.
The downside is you don't get interrupts when data arrives and you have to poll the registers. Often.
You should be able to do the same on the ARM device side; it may be much harder with exotic serial port hardware.
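As a very rough illustration of that polling approach on a PC (a sketch only: it assumes the legacy COM1 base address 0x3F8, that the UART has already been configured, and that the process has root privileges so ioperm succeeds):
#include <stdio.h>
#include <sys/io.h>   /* ioperm, inb, outb; x86 only */

#define COM1_BASE 0x3F8
#define LSR (COM1_BASE + 5)   /* line status register */
#define RBR (COM1_BASE + 0)   /* receive buffer register */

int main(void)
{
    if (ioperm(COM1_BASE, 8, 1) < 0) {   /* grant access to the port range */
        perror("ioperm");
        return 1;
    }
    for (;;) {
        if (inb(LSR) & 0x01) {           /* data-ready bit set? */
            unsigned char c = inb(RBR);  /* fetch the received byte */
            putchar(c);
        }
    }
}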
Here's what setserial does to set low latency on a file descriptor of a port:
#include <sys/ioctl.h>
#include <linux/serial.h>
struct serial_struct serial;
ioctl(fd, TIOCGSERIAL, &serial);
serial.flags |= ASYNC_LOW_LATENCY;
ioctl(fd, TIOCSSERIAL, &serial);
In short: Use a USB adapter and ASYNC_LOW_LATENCY.
I've used an FT232RL based USB adapter on Modbus at 115.2 kbps.
I get about 5 transactions (to 4 devices) in about 20 ms total with ASYNC_LOW_LATENCY. This includes two transactions to a slow-poke device (4 ms response time).
Without ASYNC_LOW_LATENCY the total time is about 60 ms.
With FTDI USB adapters ASYNC_LOW_LATENCY sets the inter-character timer on the chip itself to 1 ms (instead of the default 16 ms).
I'm currently using a home-brewed USB adapter and I can set the latency for the adapter itself to whatever value I want. Setting it at 200 µs shaves another ms off that 20 ms.
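With the stock ftdi_sio driver, the same latency timer can also be changed through sysfs, e.g. (the device name here is just an example):
echo 1 > /sys/bus/usb-serial/devices/ttyUSB0/latency_timer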
None of those system calls have an effect on latency. If you want to read and write one byte as fast as possible from userspace, you really aren't going to do better than a simple read()/write() pair. Try replacing the serial stream with a socket from another userspace process and see if the latencies improve. If they don't, then your problems are CPU speed and hardware limitations.
Are you sure your hardware can do this at all? It's not uncommon to find UARTs with a buffer design that introduces many bytes worth of latency.
At those line speeds you should not be seeing latencies that large, regardless of how you check for readiness.
You need to make sure the serial port is in raw mode (so you do "noncanonical reads") and that VMIN and VTIME are set correctly. You want to make sure that VTIME is zero so that an inter-character timer never kicks in. I would probably start with setting VMIN to 1 and tune from there.
The syscall overhead is nothing compared to the time on the wire, so select() vs. poll(), etc. is unlikely to make a difference.
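A minimal sketch of that termios setup (assuming fd is an already-open serial port descriptor; error handling kept short):
#include <termios.h>

/* Put the port into raw (noncanonical) mode with VMIN = 1 and VTIME = 0,
   so read() returns as soon as a single byte arrives. */
static int set_raw_mode(int fd)
{
    struct termios tio;
    if (tcgetattr(fd, &tio) < 0)
        return -1;
    cfmakeraw(&tio);        /* disable canonical mode, echo, signals, ... */
    tio.c_cc[VMIN] = 1;     /* block until at least one byte is available */
    tio.c_cc[VTIME] = 0;    /* no inter-character timer */
    return tcsetattr(fd, TCSANOW, &tio);
}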

How to increase kernel poll rate for accelerometer?

I'm using the hwmon/mxc_mma8451.c module to access an accelerometer. Using /sys/devices/virtual/input/input0/poll I can change the polling rate to some degree... if I set a larger millisecond value the polling becomes slower. However, I cannot seem to get below around 30 ms per poll, despite the device driver source apparently allowing as low as 1 ms per poll. The accelerometer itself supports an 800 Hz sample rate, so that is not the bottleneck. When I write a value of 1 to the above file, each sample occurs either 30 ms or 60 ms after the previous sample, so it is not even consistent. However, even 30 ms is unacceptably slow as it is only 33 Hz.
The kernel source for the module clearly shows that I should be able to use a value of 1:
#define POLL_INTERVAL_MIN 1
#define POLL_INTERVAL_MAX 500
#define POLL_INTERVAL 100 /* msecs */
...
mma8451_idev->poll_interval = POLL_INTERVAL;
mma8451_idev->poll_interval_min = POLL_INTERVAL_MIN;
mma8451_idev->poll_interval_max = POLL_INTERVAL_MAX;
I'm not familiar with exactly how Linux does this kind of polling, but this system has a 10 ms tick, so even if sampling is tick-based, why is it taking 3 or 6 ticks per sample and nothing in between? Is there some kernel parameter somewhere else that is throttling how fast polling can occur?
Linux kernel version is 3.14.28 for IMX28 (ARM) if that makes any difference. This is the version available for the device in question, so I can't just up and use a different/newer one.

How to tune the polling period of NAPI?

I understand that NAPI in Linux will change from interrupt to poll mode to handle high packet rates.
NAPI uses a weight to decide how many packets to process in each poll period; it also makes sure that the packet handling in each poll period takes less than one jiffy.
However, I couldn't find anywhere (Google) what the poll period of NAPI is. Can we change the poll period to any value we want?
Thank you very much for any of your help!
From what I observe, it seems that NAPI's poll period is 2 seconds, but I want to make sure my observation is correct.
NAPI packet processing is controlled in two ways:
With netdev_budget, which is the total number of packets that can be processed. This can be tuned by setting the net.core.netdev_budget sysctl.
On Linux 4.12+, with netdev_budget_usecs which is the total time in microseconds that can be spent processing packets. The corresponding sysctl parameter is net.core.netdev_budget_usecs.
On Linux < 4.12, the sysctl does not exist and this value is hardcoded to 2 jiffies. It cannot be tuned.
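For example (the values here are only illustrative; check your distribution's defaults first):
sysctl net.core.netdev_budget
sysctl net.core.netdev_budget_usecs        # Linux 4.12+ only
sysctl -w net.core.netdev_budget=600
sysctl -w net.core.netdev_budget_usecs=4000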
I wrote a detailed blog post describing the Linux network stack in detail that may interest you and this section shows the code for the NAPI processing loop where the hardcoded timeout can be found.

Starting point for CLOCK_MONOTONIC

As I understand it, on Linux the starting point for CLOCK_MONOTONIC is boot time. In my current work I prefer to use the monotonic clock instead of CLOCK_REALTIME (for calculations), but at the same time I need to provide human-friendly timestamps (with year/month/day) in reporting. They don't need to be very precise, so I was thinking of combining the monotonic counter with the boot time.
From where can I get this boot time on a Linux system using API calls?
Assuming the Linux kernel starts the uptime counter at the same time as it starts keeping track of the monotonic clock, you can derive the boot time (relative to the Epoch) by subtracting uptime from the current time.
Linux offers the system uptime in seconds via the sysinfo structure; the current time in seconds since the Epoch can be acquired on POSIX compliant libraries via the time function.
#include <stddef.h>
#include <stdio.h>
#include <time.h>
#include <sys/sysinfo.h>

int main(void) {
    /* get uptime in seconds */
    struct sysinfo info;
    sysinfo(&info);

    /* calculate boot time in seconds since the Epoch */
    const time_t boottime = time(NULL) - info.uptime;

    /* get monotonic clock time */
    struct timespec monotime;
    clock_gettime(CLOCK_MONOTONIC, &monotime);

    /* calculate current time in seconds since the Epoch */
    time_t curtime = boottime + monotime.tv_sec;

    /* get realtime clock time for comparison */
    struct timespec realtime;
    clock_gettime(CLOCK_REALTIME, &realtime);

    printf("Boot time = %s", ctime(&boottime));
    printf("Current time = %s", ctime(&curtime));
    printf("Real Time = %s", ctime(&realtime.tv_sec));

    return 0;
}
Unfortunately, the monotonic clock may not match up relative to boot time exactly. When I tested out the above code on my machine, the monotonic clock was a second off from the system uptime. However, you can still use the monotonic clock as long as you take the respective offset into account.
Portability note: although Linux may return current monotonic time relative to boot time, POSIX machines in general are permitted to return current monotonic time from any arbitrary -- yet consistent -- point in time (often the Epoch).
As a side note, you may not need to derive boot time as I did. I suspect there is a way to get the boot time via the Linux API, as there are many Linux utilities which display the boot time in a human-readable format. For example:
$ who -b
system boot 2013-06-21 12:56
I wasn't able to find such a call, but inspection of the source code for some of these common utilities may reveal how they determine the human-readable boot time.
In the case of the who utility, I suspect it utilizes the utmp file to acquire the system boot time.
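For what it's worth, a sketch of one more approach (as far as I can tell this is roughly what procps-style tools do; treat it as an assumption rather than a documented API): read the btime field from /proc/stat, which holds the boot time in seconds since the Epoch.
#include <stdio.h>
#include <time.h>

/* Read the "btime" line from /proc/stat; returns (time_t)-1 on failure. */
static time_t read_boot_time(void)
{
    FILE *f = fopen("/proc/stat", "r");
    char line[256];
    long long btime = -1;
    if (!f)
        return (time_t)-1;
    while (fgets(line, sizeof(line), f))
        if (sscanf(line, "btime %lld", &btime) == 1)
            break;
    fclose(f);
    return btime < 0 ? (time_t)-1 : (time_t)btime;
}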
http://www.kernel.org/doc/man-pages/online/pages/man2/clock_getres.2.html:
CLOCK_MONOTONIC
Clock that cannot be set and represents monotonic time since some
unspecified starting point.
This means that you can use CLOCK_MONOTONIC for interval calculations and other things, but you can't really convert it to a human-readable representation.
Moreover, you probably want CLOCK_MONOTONIC_RAW instead of CLOCK_MONOTONIC:
CLOCK_MONOTONIC_RAW (since Linux 2.6.28; Linux-specific)
Similar to CLOCK_MONOTONIC, but provides access to a raw hardware-based time that is not subject to NTP adjustments.
Keep using CLOCK_REALTIME for human-readable times.
CLOCK_MONOTONIC is generally not affected by any adjustments to system time. For example, if the system clock is adjusted via NTP, CLOCK_MONOTONIC has no way of knowing (nor does it need to).
For this reason, don't use CLOCK_MONOTONIC if you need human-readable timestamps.
See Difference between CLOCK_REALTIME and CLOCK_MONOTONIC? for a discussion.
