Why does Linux omit the driver close function - multithreading

Hi all.
I have my module - a character driver for several DMA channels. It exposes open/close/ioctl functions to user space for each DMA channel.
Everything worked fine when used from one application thread - all three DMA channels worked as required.
When I added another thread to utilize one of the DMA channels separately, I ran into a very strange case: the application's call to close() is executed with return code zero WITHOUT(!) entering my driver's close function (the printk text is absent). Just to test, I called close() again immediately - it returned with errno=9 (bad file descriptor). Closing the next channel in the next line of code works fine.
I inserted all possible semaphores to protect the code, both in the application and in the driver - no success.
I clearly understand that the issue is some race condition - delaying the second thread a bit makes the problem disappear. But I can't catch it.
What makes me crazy is this: how can it be that a call to close() does not reach my driver? I traced the call down to the assembler "svc 0" - the file descriptor is correct, the return code is 0.
So, under what conditions may Linux skip calling the driver's close function?
UPDATE: I inserted a call to the IOCTL function immediately after the "not working" close() - the return code was -9: bad file descriptor. This means that Linux really closed this descriptor without entering my "close" function!
UPDATE 2: I added a new IOCTL to the driver which only calls the driver's close function directly. In the application, I called this new IOCTL just one line before calling close(). Everything now works fine!
More than that - I removed all semaphores for testing (I had written everything to be reentrant, but added the semaphores on the assumption that I had made an error)! Everything is still fine!

Thanks to Ian Abbot, the problem has been found.
My driver needs one large memory region for control and one huge region for each channel buffer. For this, after the first open() call I called mmap() for the control region. All subsequent calls to open() mmap()ed only their corresponding buffer.
This effectively set the reference counter of the first file descriptor to 2. So when I tried to close it "normally", my driver's release function was not called! And since it wasn't called, it did not clean up the driver state and blocked further reopening.
SUMMARY OF THE THINGS I MISSED/didn't know:
The mmap system call increases the reference count of the file.
The mmap system call is per file, not per driver.
Again, thousands of thanks to Ian Abbot!
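To illustrate what was happening, here is a minimal user-space sketch (the device node /dev/mydma0 and the 4096-byte control region are hypothetical): a live mapping keeps the struct file alive past close(), so the driver's release function runs only at munmap().
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/mydma0", O_RDWR);      /* file reference count: 1 */
    void *ctl = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);       /* mapping holds a reference: 2 */
    close(fd);         /* returns 0, but the driver's release() is NOT called yet */
    munmap(ctl, 4096); /* last reference dropped - release() runs now */
    return 0;
}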

Related

Can aio_error be used to poll for completion of aio_write?

We have some code that goes along the lines of
aiocb* aiocbptr = new aiocb;
// populate aiocbptr with info for the write
aio_write( aiocbptr );
// Then do this periodically:
if(aio_error( aiocbptr ) == 0) {
    delete aiocbptr;
}
aio_error is meant to return 0 when the write is completed, and hence we assume that we can call delete on aiocbptr at this point.
This mostly seems to work OK, but we recently started experiencing random crashes. The evidence points to the data pointed to by aiocbptr being modified after the call to delete.
Is there any issue using aio_error to poll for aio_write completion like this? Is there a guarantee that the aiocb will not be modified after aio_error has returned 0?
This change seems to indicate that something may have since been fixed with aio_error. We are running on x86 RHEL7 Linux with glibc 2.17, which predates this fix.
We tried using aio_suspend in addition to aio_error: once aio_error has returned 0, we call aio_suspend, which is meant to wait for the operation to complete. Since the operation should already have completed, aio_suspend should do nothing. However, it seemed to fix the crashes.
Yes, my commit was fixing a missing memory barrier. Using e.g. aio_suspend triggers the memory barrier and thus fixes it too.
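A minimal sketch of that workaround in C (the write_done helper is illustrative, not from the original code): once aio_error() no longer reports EINPROGRESS, an aio_suspend() call on the already-finished request acts as the missing memory barrier before the control block is freed.
#include <aio.h>
#include <errno.h>
#include <stddef.h>

/* Returns 1 when the write has finished (successfully or not) and the
 * aiocb may be freed. The aio_suspend() call is the barrier workaround
 * for glibc <= 2.17. */
static int write_done(struct aiocb *cb)
{
    if (aio_error(cb) == EINPROGRESS)
        return 0;                       /* still running */
    const struct aiocb *list[1] = { cb };
    aio_suspend(list, 1, NULL);         /* returns immediately; forces the barrier */
    return 1;                           /* safe to free cb now */
}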

How does Linux know about deferred work in a driver, and when exactly can it use the data brought from a hardware device?

When the kernel tries to read a block from a hard drive, it sends a request to the device driver. If the device driver splits the work of handling the request into top and bottom halves through work queues, how does the kernel know that the data is not available until the bottom half finishes?
In other words, how does the kernel know that the driver has not yet fetched the required block and copied it into the supplied buffer?
Obviously, if the kernel expected the data to be readily available once the top half finished execution and returned, it might read junk data.
The block device driver API has changed a few times since the inception of Linux, but today it basically looks like what follows.
The initialization function calls blk_init_queue, passing a request callback and an optional lock for that queue:
struct request_queue *q;
q = blk_init_queue(my_request_cb, &my_dev->lock);
my_request_cb is a callback that will handle all I/O for that block device. I/O requests will be pushed into this queue and my_request_cb will be called to handle them one after the other, when the kernel block driver layer decides. This queue is then added to the disk:
struct gendisk *disk;
disk->queue = q;
and then the disk is added to the system:
add_disk(disk);
The disk holds other information such as the major number, the first minor number, and other file operations (open, release, ioctl and others, but no read and no write like those found in character devices).
Now, my_request_cb can be called at any time, and won't necessarily be called from the context of the process that initiated a read/write on the block device. The kernel calls it asynchronously.
This function is declared like this:
static void my_request_cb(struct request_queue *q);
The queue q contains an ordered list of requests to this block device. The function may then look at the next request (blk_fetch_request(q)). To mark a request as completed, it will call blk_end_request_all (other variations exist, depending on the situation).
And this is where I answer your question: the kernel knows a particular block device request is done when its driver calls blk_end_request_all or a similar function for this request. The driver does not have to end a request within my_request_cb: it may, for example, start a DMA transfer, requeue the request, ignore others, and only when the interrupt for a completed DMA transfer is asserted, end it, effectively telling the kernel that this specific read/write operation is completed.
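A hedged sketch of that deferred-completion pattern, against the same-era API described above (my_device, my_dev and start_dma_transfer are illustrative, not from a real driver): the request callback only starts the transfer, and the request is ended from the interrupt handler once the hardware has finished.
#include <linux/blkdev.h>
#include <linux/interrupt.h>

struct my_device {
    struct request *current_req;
    /* ... DMA engine state ... */
};
static struct my_device *my_dev;        /* hypothetical driver state */
static void start_dma_transfer(struct my_device *dev, struct request *req); /* hypothetical */

/* Request callback: start the hardware transfer, but do NOT end the
 * request here - the data is not available yet. */
static void my_request_cb(struct request_queue *q)
{
    struct request *req = blk_fetch_request(q);
    if (req) {
        my_dev->current_req = req;       /* remember it for the IRQ handler */
        start_dma_transfer(my_dev, req); /* program the DMA engine and return */
    }
}

/* Interrupt handler: the DMA has finished, so the buffer is now valid
 * and the block layer can be told that the request is complete. */
static irqreturn_t my_dma_irq(int irq, void *data)
{
    struct my_device *dev = data;
    blk_end_request_all(dev->current_req, 0);  /* 0 = success: request done */
    return IRQ_HANDLED;
}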
LDD3, chapter 16, can help, but some things have changed since 2.6.

Can dup2 really return EINTR?

In the spec and two implementations:
According to POSIX, dup2() may return EINTR.
The Linux man pages list it as permitted.
The FreeBSD man pages indicate it is never returned. Is this a bug, since its close() implementation can return EINTR (at least for TCP linger, if nothing else)?
In reality, can Linux return EINTR for dup2()? Presumably if so, it would be because close() decided to wait and a signal arrived (TCP linger or dodgy file system drivers that try to sync when closing).
In reality, does FreeBSD guarantee not to return EINTR for dup2()? In that case, it must be that it doesn't bother waiting for any outstanding operations on the old fd and simply unlinks the fd.
What does POSIX dup2() mean when it refers to "closing" (not in italics), rather than referencing the actual close() function? Are we to understand that it's just talking about "closing" informally (unlinking the file descriptor), or is it attempting to say that the effect should be as if the close() function were first called and then dup2() were called, atomically?
If fildes2 is already a valid open file descriptor, it shall be closed first, unless fildes is equal to fildes2 in which case dup2() shall return fildes2 without closing it.
If dup2() does have to close, wait, then atomically dup, it's going to be a nightmare for implementors! It's much worse than the EINTR with close() fiasco. Cowardly POSIX doesn't even say if the dup took place in the case of EINTR...
Here's the relevant information from the C/POSIX library documentation with respect to the standard Linux implementation:
If OLD and NEW are different numbers, and OLD is a valid descriptor number, then `dup2' is equivalent to:
close (NEW);
fcntl (OLD, F_DUPFD, NEW)
However, `dup2' does this atomically; there is no instant in the middle of calling `dup2' at which NEW is closed and not yet a duplicate of OLD.
It lists the possible error values returned by dup and dup2 as EBADF, EINVAL, and EMFILE, and no others. The documentation states that all functions that can return EINTR are listed as such, which indicates that these don't. Note that these are implemented via fcntl, not a call to close.
8 years later this still seems to be undocumented.
I looked at the Linux sources and my conclusion is that dup2 can't return EINTR in a current version of Linux.
In particular, the function do_dup2 in fs/file.c ignores the return value of filp_close, which is what can cause close to return EINTR in some cases (see fs/open.c and fs/file.c).
The way dup2 works is it first makes the atomic file descriptor update, and then waits for any flushing that needs to happen on close. Any errors happening on flush are simply ignored.
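As a user-space consequence of this (a sketch; redirect_stderr is an illustrative helper, not a standard function), dup2 on current Linux never needs an EINTR retry loop, and any error from implicitly closing the old descriptor is simply lost:
#include <fcntl.h>
#include <unistd.h>

/* Atomically repoint fd 2 (stderr) at a log file. */
static int redirect_stderr(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    if (dup2(fd, 2) < 0) {  /* replaces fd 2 atomically; no EINTR on Linux */
        close(fd);
        return -1;
    }
    close(fd);              /* drop the now-duplicate descriptor */
    return 0;
}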

Is the first thread that gets to run inside a Win32 process the "primary thread"? Need to understand the semantics

I create a process using CreateProcess() with the CREATE_SUSPENDED flag and then go ahead and create a little patch of code inside the remote process to load a DLL and call a function (exported by that DLL), using VirtualAllocEx() (with ..., MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE) and WriteProcessMemory(), and then call FlushInstructionCache() on that patch of memory containing the code.
After that I call CreateRemoteThread() to invoke that code, creating me a hRemoteThread. I have verified that the remote code works as intended. Note: this code simply returns, it does not call any APIs other than LoadLibrary() and GetProcAddress(), followed by calling the exported stub function that currently simply returns a value that will then get passed on as the exit status of the thread.
Now comes the peculiar observation: remember that the PROCESS_INFORMATION::hThread is still suspended. When I simply ignore hRemoteThread's exit code and also don't wait for it to exit, all goes "fine". The routine that calls CreateRemoteThread() returns and PROCESS_INFORMATION::hThread gets resumed and the (remote) program actually gets to run.
However, if I call WaitForSingleObject(hRemoteThread, INFINITE) or do the following (which has the same effect):
DWORD exitCode = STILL_ACTIVE;
while(STILL_ACTIVE == exitCode)
{
    Sleep(500);
    if(!GetExitCodeThread(hRemoteThread, &exitCode))
        break;
}
followed by CloseHandle(), this leads to hRemoteThread finishing before PROCESS_INFORMATION::hThread gets resumed, and the process simply "disappears". Simply allowing hRemoteThread to finish while PROCESS_INFORMATION::hThread is still suspended is enough to cause the process to die.
This looks suspiciously like a race condition, since under certain circumstances hRemoteThread may still be faster and the process would likely still "disappear", even if I leave the code as is.
Does that imply that the first thread that gets to run within a process becomes automatically the primary thread and that there are special rules for that primary thread?
I was always under the impression that a process finishes when its last thread dies, not when a particular thread dies.
Also note: there is no call to ExitProcess() involved here in any way, because hRemoteThread simply returns and PROCESS_INFORMATION::hThread is still suspended when I wait for hRemoteThread to return.
This happens on Windows XP SP3, 32-bit.
Edit: I have just tried Sysinternals Process Monitor to see what's happening, and I could verify my observations from before. The injected code does not crash or anything; instead, I can see that if I don't wait for the thread, it doesn't exit before I close the program into which the code got injected. I'm wondering whether the call to CloseHandle(hRemoteThread) should be postponed or something ...
Edit+1: it's not CloseHandle(). If I leave that out just for a test, the behavior doesn't change when waiting for the thread to finish.
The first thread to run isn't special.
For example, create a console app which creates a suspended thread and terminates the original thread (by calling ExitThread). This process never terminates (on Windows 7 anyway).
Or make the new thread wait for five seconds then exit. As expected, the process will live for five seconds and exit when the secondary thread terminates.
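A minimal sketch of that experiment (error handling omitted): the original thread exits via ExitThread, yet the process stays alive until the second thread returns.
#include <windows.h>

static DWORD WINAPI worker(LPVOID arg)
{
    Sleep(5000);    /* live for five seconds */
    return 0;       /* the process exits when its LAST thread dies */
}

int main(void)
{
    CreateThread(NULL, 0, worker, NULL, 0, NULL);
    ExitThread(0);  /* terminate only the original thread; note that simply
                     * returning from main would call ExitProcess instead */
}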
I don't know what's happening with your example. The easiest way to avoid the race is to make the new thread resume the original thread.
Speculating now, I do wonder if what you're doing isn't likely to cause problems anyway. For example, what happens to all the DllMain calls for the implicitly loaded DLLs? Are they unexpectedly happening on the wrong thread, are they being skipped, or are they postponed until after your code has run and the main thread starts?
Odds are good that the thread with the main (or equivalent) function calls ExitProcess (either explicitly or in its runtime library). ExitProcess, well, exits the entire process, including killing all threads. Since the main thread doesn't know about your injected code, it doesn't wait for it to finish.
I don't know that there's a good way to make the main thread wait for yours to complete...

Multithreading (pthreads)

I'm working on a project where I need to make a program run on multiple threads. However, I'm running into a bit of an issue.
In my program, I have an accessory function called 'func_call'.
If I use this in my code:
func_call((void*) &my_pixels);
The program runs fine.
However, if I try to create a thread, and then run the function on that, the program runs into a segmentation fault.
pthread_t thread;
pthread_create (&thread, NULL, (void*)&func_call, (void*) &my_pixels);
I've included pthread.h in my program. Any ideas what might be wrong?
You are not handling data in a thread safe manner:
the thread copies data from the thread argument, which is a pointer to the main thread's my_pixels variable; the main thread may exit, making my_pixels invalid.
the thread uses scene, main thread calls free_scene() on it, which I imagine makes it invalid
the thread calls printf(), the main thread closes stdout (kind of unusual itself)
the thread updates the picture array, the main thread accesses picture to output data from it
It looks like you should just wait for the thread to finish its work after creating it - call pthread_join() to do that.
For a single thread, that would seem to be pointless (you've just turned a multi-threaded program into a single threaded program). But on the basis of code that's commented out, it looks like you're planning to start up several threads that work on chunks of the data. So, when you get to the point of trying that again, make sure you join all the threads you start. As long as the threads don't modify the same data, it'll work. Note that you'll need to use separate my_pixels instances for each thread (make an array of them, just like you did with pthreads), or some threads will likely get parameters that are intended for a different thread.
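A hedged sketch of that multi-thread layout (the chunk bounds and the chunk_args type are illustrative): each thread gets its own argument instance, and every thread started is joined.
#include <pthread.h>

#define NTHREADS 4

typedef struct { int start_row, end_row; } chunk_args;  /* hypothetical per-thread payload */

void *func_call(void *arg);   /* assumed pthread-compatible, see the last example below */

static void run_chunks(void)
{
    pthread_t threads[NTHREADS];
    chunk_args args[NTHREADS];              /* one instance per thread */

    for (int i = 0; i < NTHREADS; i++) {
        args[i].start_row = i * 100;        /* illustrative chunking */
        args[i].end_row   = (i + 1) * 100;
        pthread_create(&threads[i], NULL, func_call, &args[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);     /* join ALL threads you start */
}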
Without knowing what func_call does, it is difficult to give you an answer. Nevertheless, here are few possibilities
Does func_call use some sort of global state? Check that it is initialized properly from within the thread. The order of execution of threads is not the same for every run.
Not knowing your operating system (AIX/Linux/Solaris etc.), it is difficult to answer this, but please check your compilation options.
Please provide the signal trapped and at least a few lines of the stack trace - for all the threads. One thing you can check for yourself is to print the threads' stack traces (using "threads"/"thread" or "pthread" and "thread current <x>", depending on the debugger) and see if there is common data being accessed. It is most likely that the segfault occurred when two threads were trying to read the other's (uncommitted) change.
Hope that helps.
Edit:
After checking your code, I think the problem is the global picture array. You seem to be modifying that in the thread function without any guards. You loop using px and py and all the threads will have the same px and py and will try to write into the picture array at the same time. Please try to modify your code to prevent multiple threads from stepping on each other's data modifications.
Is func_call a function, or a function pointer? If it's a function pointer, there is your problem: you took the address of a function pointer and then cast it.
People are guessing because you've provided only a fraction of the program, which mentions names like func_call with no declaration in scope.
Your compiler must be giving you diagnostics about this program, because you're passing a (void *) expression to a function pointer parameter.
Define your thread function in a way that is compatible with pthread_create, and then just call it without any casts.
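A minimal sketch of that fix (the pixel_data type is hypothetical): give func_call the exact signature pthread_create expects, then pass it with no casts.
#include <pthread.h>

typedef struct { unsigned char r, g, b; } pixel_data;  /* hypothetical payload */

/* pthread-compatible signature: takes and returns void *. */
void *func_call(void *arg)
{
    pixel_data *my_pixels = arg;
    /* ... work on my_pixels ... */
    (void)my_pixels;
    return NULL;
}

int main(void)
{
    pthread_t thread;
    pixel_data my_pixels = { 0, 0, 0 };   /* must stay alive until the join */

    pthread_create(&thread, NULL, func_call, &my_pixels);  /* no casts needed */
    pthread_join(thread, NULL);           /* keeps my_pixels valid for the thread */
    return 0;
}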
