polling file descriptor - linux

for the embedded MIPS-based platform I'm implementing a small program to poll GPIO, i.e. I'm using chip vendor's user level GPIO library with basic functionality (open /dev/gpio, read, write pin etc.). The design is straightforward:
int gpio_fd;
fd_set rfds;
gpio_fd = gpio_open(...);
while (1) {
FD_ZERO(&rfds);
FD_SET(gpio_fd, &rfds);
if (select(gpio_fd + 1, &rfds, NULL, NULL, NULL) > 0) {
if (FD_ISSET(gpio_fd, &rfds)) {
/* read pins and similar */
}
}
}
But I'm facing a serious problem - this application when ran with '&' at the end, i.e. put it in background, consumes 99% CPU, this is obviously because of tight loop, but I observed the similar approach in many networking code and it worked fine.
Am I missing something, can it be a defect of the gpio library ?
Actually, just a single "while(1) ; " does the same effect. Can it be the "natural" behavior of the kernel?
Thanks.

The select call should block until the file descriptor is readable.
What may be happening is that the device driver does not support the select call, and so it exits immediately rather than blocking.
Another possibility is that the call to gpio_open does not actually give you a real Unix file descriptor. If that were open("/dev/gpio", O_RDWR) or something like that I'd have a lot more faith in it.

Related

Linux UART slower than specified Baudrate

I'm trying to communicate between two Linux systems via UART.
I want to send large chunks of data. With the specified Baudrate it should take around 5 seconds, but it takes nearly 10 times the expected time.
As I'm sending more than the buffer can handle at once it is send in small parts and I'm draining the buffer in between. If I measure the time needed for the drain and the number of bytes written to the buffer I calculate a Baudrate nearly 10 times lower than the specified Baudrate.
I would expect a slower transmission as the optimal, but not this much.
Did I miss something while setting the UART or while writing? Or is this normal?
The code used for setup:
int bus = open(interface.c_str(), O_RDWR | O_NOCTTY | O_NDELAY); // <- also tryed blocking
if (bus < 0) {
return;
}
struct termios options;
memset (&options, 0, sizeof options);
if(tcgetattr(bus, &options) != 0){
close(bus);
bus = -1;
return;
}
cfsetspeed (&options, B230400);
cfmakeraw(&options); // <- also tried this manually. did not make a difference
if(tcsetattr(bus, TCSANOW, &options) != 0)
{
close(bus);
bus = -1;
return;
}
tcflush(bus, TCIFLUSH);
The code used to send:
int32_t res = write(bus, data, dataLength);
while (res < dataLength){
tcdrain(bus); // <- taking way longer than expected
int32_t r = write(bus, &data[res], dataLength - res);
if(r == 0)
break;
if(r == -1){
break;
}
res += r;
}
B230400
The docs are contradictory. cfsetspeed is documented as requiring a speed_t type, while the note says you need to use one of the "B" constants like "B230400." Have you tried using an actual speed_t type?
In any case, the speed you're supplying is the baud rate, which in this case should get you approximately 23,000 bytes/second, assuming there is no throttling.
The speed is dependent on hardware and link limitations. Also the serial protocol allows pausing the transmission.
FWIW, according to the time and speed you listed, if everything works perfectly, you'll get about 1 MB in 50 seconds. What speed are you actually getting?
Another "also" is the options structure. It's been years since I've had to do any serial I/O, but IIRC, you need to actually set the options that you want and are supported by your hardware, like CTS/RTS, XON/XOFF, etc.
This might be helpful.
As I'm sending more than the buffer can handle at once it is send in small parts and I'm draining the buffer in between.
You have only provided code snippets (rather than a minimal, complete, and verifiable example), so your data size is unknown.
But the Linux kernel buffer size is known. What do you think it is?
(FYI it's 4KB.)
If I measure the time needed for the drain and the number of bytese written to the buffer I calculate a Baudrate nearly 10 times lower than the specified Baudrate.
You're confusing throughput with baudrate.
The maximum throughput (of just payload) of an asynchronous serial link will always be less than the baudrate due to framing overhead per character, which could be two of the ten bits of the frame (assuming 8N1). Since your termios configuration is incomplete, the overhead could actually be three of the eleven bits of the frame (assuming 8N2).
In order to achieve the maximum throughput, the tranmitting UART must saturate the line with frames and never let the line go idle.
The userspace program must be able to supply data fast enough, preferably by one large write() to reduce syscall overhead.
Did I miss something while setting the UART or while writing?
With Linux, you have limited access to the UART hardware.
From userspace your program accesses a serial terminal.
Your program accesses the serial terminal in a sub-optinal manner.
Your termios configuration appears to be incomplete.
It leaves both hardware and software flow-control untouched.
The number of stop bits is untouched.
The Ignore modem control lines and Enable receiver flags are not enabled.
For raw reading, the VMIN and VTIME values are not assigned.
Or is this normal?
There are ways to easily speed up the transfer.
First, your program combines non-blocking mode with non-canonical mode. That's a degenerate combination for receiving, and suboptimal for transmitting.
You have provided no reason for using non-blocking mode, and your program is not written to properly utilize it.
Therefore your program should be revised to use blocking mode instead of non-blocking mode.
Second, the tcdrain() between write() syscalls can introduce idle time on the serial link. Use of blocking mode eliminates the need for this delay tactic between write() syscalls.
In fact with blocking mode only one write() syscall should be needed to transmit the entire dataLength. This would also minimize any idle time introduced on the serial link.
Note that the first write() does not properly check the return value for a possible error condition, which is always possible.
Bottom line: your program would be simpler and throughput would be improved by using blocking I/O.

In general, on ucLinux, is ioctl faster than writing to /sys filesystem?

I have an embedded system I'm working with, and it currently uses the sysfs to control certain features.
However, there is function that we would like to speed up, if possible.
I discovered that this subsystem also supports and ioctl interface, but before rewriting the code, I decided to search to see which is a faster interface (on ucLinux) in general: sysfs or ioctl.
Does anybody understand both implementations well enough to give me a rough idea of the difference in overhead for each? I'm looking for generic info, such as "ioctl is faster because you've removed the file layer from the function calls". Or "they are roughly the same because sysfs has a very simple interface".
Update 10/24/2013:
The specific case I'm currently doing is as follows:
int fd = open("/sys/power/state",O_WRONLY);
write( fd, "standby", 7 );
close( fd );
In kernel/power/main.c, the code that handles this write looks like:
static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t n)
{
#ifdef CONFIG_SUSPEND
suspend_state_t state = PM_SUSPEND_STANDBY;
const char * const *s;
#endif
char *p;
int len;
int error = -EINVAL;
p = memchr(buf, '\n', n);
len = p ? p - buf : n;
/* First, check if we are requested to hibernate */
if (len == 7 && !strncmp(buf, "standby", len)) {
error = enter_standby();
goto Exit;
((( snip )))
Can this be sped up by moving to a custom ioctl() where the code to handle the ioctl call looks something like:
case SNAPSHOT_STANDBY:
if (!data->frozen) {
error = -EPERM;
break;
}
error = enter_standby();
break;
(so the ioctl() calls the same low-level function that the sysfs function did).
If by sysfs you mean the sysfs() library call, notice this in man 2 sysfs:
NOTES
This System-V derived system call is obsolete; don't use it. On systems with /proc, the same information can be obtained via
/proc/filesystems; use that interface instead.
I can't recall noticing stuff that had an ioctl() and a sysfs interface, but probably they exist. I'd use the proc or sys handle anyway, since that tends to be less cryptic and more flexible.
If by sysfs you mean accessing files in /sys, that's the preferred method.
I'm looking for generic info, such as "ioctl is faster because you've removed the file layer from the function calls".
Accessing procfs or sysfs files does not entail an I/O bottleneck because they are not real files -- they are kernel interfaces. So no, accessing this stuff through "the file layer" does not affect performance. This is a not uncommon misconception in linux systems programming, I think. Programmers can be squeamish about system calls that aren't well, system calls, and paranoid that opening a file will be somehow slower. Of course, file I/O in the ABI is just system calls anyway. What makes a normal (disk) file read slow is not the calls to open, read, write, whatever, it's the hardware bottleneck.
I always use low level descriptor based functions (open(), read()) instead of high level streams when doing this because at some point some experience led me to believe they were more reliable for this specifically (reading from /proc). I can't say whether that's definitively true.
So, the question was interesting, I built a couple of modules, one for ioctl and one for sysfs, the ioctl implementing only a 4 bytes copy_from_user and nothing more, and the sysfs having nothing in its write interface.
Then a couple of userspace test up to 1 million iterations, here the results:
time ./sysfs /sys/kernel/kobject_example/bar
real 0m0.427s
user 0m0.056s
sys 0m0.368s
time ./ioctl /run/temp
real 0m0.236s
user 0m0.060s
sys 0m0.172s
edit
I agree with #goldilocks answer, HW is the real bottleneck, in a Linux environment with a well written driver choosing ioctl or sysfs doesn't make a big difference, but if you are using uClinux probably in your HW even few cpu cycles can make a difference.
The test I've done is for Linux not uClinux and it never wanted to be an absolute reference profiling the two interfaces, my point is that you can write a book about how fast is one or another but only testing will let you know, took me few minutes to setup the thing.

Opening device file and error code in linux-kernel

I am new to kernel programming and I have two questions:
My device is getting registered (by dynamic registration) but my
application is not able to open the device file. What could be the
possible reasons?
What would be the appropriate error code to return when my device
driver detects an divide by zero?
My code implements simple arithmetic operations in the kernel. I use an ioctl() based interface to communicate between user space and the kernel.
if(out.b==0) /*checking for divide by zero*/
out.res=-EINVAL;
else
out.res=out.a/out.b;
copy_to_user((values*)ioctl_param,&out,sizeof(values));
break;
We can't possibly answer the first question if you don't show us your code.
As for the second, EINVAL or perhaps ERANGE.
In your case you need to make the distinction between the information you return in the ioctl_param structure (that's a really bad variable name by the way) and the return status of the ioctl() call itself.
Remember that ioctl() returns 0 if it completes successfully, and sets errno if it fails.
The kernel and C library take care of most of that for you. Usually all you have to do is return -EINVAL or similar from your ioctl() function.
Something like this:
if(out.b == 0) /*checking for divide by zero*/
return -EINVAL;
out.res=out.a / out.b;
copy_to_user((values*)ioctl_param,&out,sizeof(values));
break;

Linux device driver handling multiple interrupt sources/vectors

I am writing a device driver to handle interrupts for a PCIe card, which currently works for any interrupt vector raised on the IRQ line.
But it has a few types that can be raised, flagged by the Vector register. So now I need to read the vector information and be a bit cleverer...
So, do I :-
1/ Have separate dev nodes /dev/int1, /dev/int2, etc for each interrupt type, and just doc that int1 is for vector type A etc?
1.1/ As each file/char-devices will have its own minor number, when opened I'll know which is which. i think.
1.2/ ldd3 seems to demo this method.
2/ Have one node /dev/int (as I do now) and have multiple processes hanging off the same read method? sounds better?!
2.1/ Then only wake the correct process up...?
2.2/ Do I use separate wait_queue_head_t wait_queues? Or different flag/test conditions?
In the read method:-
wait_event_interruptible(wait_queue, flag);
In the handler not real code! :-
int vector = read_vector();
if vector = A then
wake_up_interruptible(wait_queue, flag)
return IRQ_HANDLED;
else
return IRQ_NONE/IRQ_RETVAL?
EDIT: notes from peoples comments :-
1) my user-space code mmap's all of the PCIe firmware registers
2) User-space code has a few threads, each perform a blocking read on the device driver device nodes, which then returns data from the firmware when an interrupt occurs. I need the correct thread woken up depending on the interrupt type.
I am not sure I understand correctly what you mean with the Vector register (a pointer to some documentation would help me precise for your case).
Anyway, any PCI device gets a unique interrupt number (given by the BIOS or some firmware on other architectures than x86). You just need to register this interrupt in your driver.
priv->name = DRV_NAME;
err = request_irq(pdev->irq, your_irqhandler, IRQF_SHARED, priv->name,
pdev);
if (err) {
dev_err(&pdev->dev, "cannot request IRQ\n");
goto err_out_unmap;
}
One other thing that I do not really understand is why you would export your interrupts as a dev node: interrupts are certainly something that need to remain in your driver/kernel code. But I guess here you want to export a device that is then accessed in userspace. I just find /dev/int no to be a good naming.
For your question about multiple dev nodes: if your different interrupt sources then provide access to different hardware resources (even if on the same PCI board) I would go for option 1), with a wait_queue for each device. Otherwise, I would go for option 2)
Since your interrupts are coming from the same physical device, if you chose option 1) or option 2), the interrupt line will have to be shared and you will have to read the vector in your interrupt handler to define which hardware resource raised the interrupt.
For option 1), it would be something like this:
static irqreturn_t pex_irqhandler(int irq, void *dev) {
struct pci_dev *pdev = dev;
int result;
result = pci_read_config_byte(pdev, PCI_INTERRUPT_LINE, &myirq);
if (result) {
int vector = read_vector();
if (vector == A) {
set_flagA(flag);
} else if (vector == B) {
set_flagB(flag);
}
wake_up_interruptible(wait_queue, flag);
return IRQ_HANDLED;
} else {
return IRQ_NONE;
}
For option 2, it would be similar, but you would have only one if clause (for the respective vector value) in every different interrupt handler that you would request for every node.
If you have different chanel you can read() from, then you should definitely use different minor number. Imagine you have a card whith four serial port, you would definitely want four /dev/ttySx.
But does your device fit whith this model ?
First, I assume you're not trying to get your code into the mainline kernel. If you are, expect a vigorous discussion about the best way to do this. If you're writing a simple interrupt handling driver for a card which is mostly driven by mmap from user-space, there are a lot of ways to solve this problem.
If you use multiple device nodes (option 1), you can also implement poll so that a single application can open multiple device nodes and wait for a selection of interrupts. The minor number will be sufficient to tell them apart. If you have a wake queue for each vector, you can wake only the relevant listeners. You'll need to latch the vector after a successful poll to be sure that the read succeeds.
If you use a single device node (option 2), you'll need to add some extra magic so that the threads can register their interest in particular interrupt vectors. You could do this with an ioctl, or have the threads write the interrupt vectors to the device. Each thread should open the device node to get its own file descriptor. You can then associate the list of requested vectors with each open file descriptor. As a bonus, you can let the application read the interrupt vector from the device, so it knows which one happened.
You'll need to think about how the interrupt gets cleared. The interrupt handler will need to remove the interrupt, then store the result so it can be passed to user-space. You might find a kfifo useful for this rather than a wait queue. If you have a fifo for each open file descriptor, you can distribute the interrupt notifications to each listening application.

How to Verify Atomic Writes?

I have searched diligently (both within the S[O|F|U] network and elsewhere) and believe this to be an uncommon question. I am working with an Atmel AT91SAM9263-EK development board (ARM926EJ-S core, ARMv5 instruction set) running Debian Linux 2.6.28-4. I am writing using (I believe) the tty driver to talk to an RS-485 serial controller. I need to ensure that writes and reads are atomic. Several lines of source code (listed below the end of this post relative to the kernel source installation directory) either imply or implicitly state this.
Is there any way I can verify that writing/reading to/from this device is actually an atomic operation? Or, is the /dev/ttyXX device considered a FIFO and the argument ends there? It seems not enough to simply trust that the code is enforcing this claim it makes - as recently as February of this year freebsd was demonstrated to lack atomic writes for small lines. Yes I realize that freebsd is not exactly the same as Linux, but my point is that it doesn't hurt to be carefully sure. All I can think of is to keep sending data and look for a permutation - I was hoping for something a little more scientific and, ideally, deterministic. Unfortunately, I remember precisely nothing from my concurrent programming classes in the college days of yore. I would thoroughly appreciate a slap or a shove in the right direction. Thank you in advance should you choose to reply.
Kind regards,
Jayce
drivers/char/tty_io.c:1087
void tty_write_message(struct tty_struct *tty, char *msg)
{
lock_kernel();
if (tty) {
mutex_lock(&tty->atomic_write_lock);
if (tty->ops->write && !test_bit(TTY_CLOSING, &tty->flags))
tty->ops->write(tty, msg, strlen(msg));
tty_write_unlock(tty);
}
unlock_kernel();
return;
}
arch/arm/include/asm/bitops.h:37
static inline void ____atomic_set_bit(unsigned int bit, volatile unsigned long *p)
{
unsigned long flags;
unsigned long mask = 1UL << (bit & 31);
p += bit >> 5;
raw_local_irq_save(flags);
*p |= mask;
raw_local_irq_restore(flags);
}
drivers/serial/serial_core.c:2376
static int
uart_write(struct tty_struct *tty, const unsigned char *buf, int count)
{
struct uart_state *state = tty->driver_data;
struct uart_port *port;
struct circ_buf *circ;
unsigned long flags;
int c, ret = 0;
/*
* This means you called this function _after_ the port was
* closed. No cookie for you.
*/
if (!state || !state->info) {
WARN_ON(1);
return -EL3HLT;
}
port = state->port;
circ = &state->info->xmit;
if (!circ->buf)
return 0;
spin_lock_irqsave(&port->lock, flags);
while (1) {
c = CIRC_SPACE_TO_END(circ->head, circ->tail, UART_XMIT_SIZE);
if (count < c)
c = count;
if (c <= 0)
break;
memcpy(circ->buf + circ->head, buf, c);
circ->head = (circ->head + c) & (UART_XMIT_SIZE - 1);
buf += c;
count -= c;
ret += c;
}
spin_unlock_irqrestore(&port->lock, flags);
uart_start(tty);
return ret;
}
Also, from the man write(3) documentation:
An attempt to write to a pipe or FIFO has several major characteristics:
Atomic/non-atomic: A write is atomic if the whole amount written in one operation is not interleaved with data from any other process. This is useful when there are multiple writers sending data to a single reader. Applications need to know how large a write request can be expected to be performed atomically. This maximum is called {PIPE_BUF}. This volume of IEEE Std 1003.1-2001 does not say whether write requests for more than {PIPE_BUF} bytes are atomic, but requires that writes of {PIPE_BUF} or fewer bytes shall be atomic.
I think that, technically, devices are not FIFOs, so it's not at all clear that the guarantees you quote are supposed to apply.
Are you concerned about partial writes and reads within a process, or are you actually reading and/or writing the same device from different processes? Assuming the latter, you might be better off implementing a proxy process of some sort. The proxy owns the device exclusively and performs all reads and writes, thus avoiding the multi-process atomicity problem entirely.
In short, I advise not attempting to verify that "reading/writing from this device is actually an atomic operation". It will be difficult to do with confidence, and leave you with an application that is subject to subtle failures if a later version of linux (or different o/s altogether) fails to implement atomicity the way you need.
I think PIPE_BUF is the right thing. Now, writes of less than PIPE_BUF bytes may not be atomic, but if they aren't it's an OS bug. I suppose you could ask here if an OS has known bugs. But really, if it has a bug like that, it just ought to be immediately fixed.
If you want to write more than PIPE_BUF atomically, I think you're out of luck. I don't think there is any way outside of application coordination and cooperation to make sure that writes of larger sizes happen atomically.
One solution to this problem is to put your own process in front of the device and make sure everybody who wants to write to the device contacts the process and sends the data to it instead. Then you can do whatever makes sense for your application in terms of atomicity guarantees.

Resources