Can AIO run without creating a thread?

I would like aio to signal to my program when a read operation completes. According to this page, such a notification can be delivered either as a signal sent by the kernel or by starting a thread that runs a user function; the behavior is selected by setting the appropriate value of sigev_notify.
I gave it a try and soon discovered that even when set up to receive the notification by signal, another thread was created.
(gdb) info threads
Id Target Id Frame
2 Thread 0x7ffff7ff9700 (LWP 6347) "xnotify" 0x00007ffff7147e50 in gettimeofday () from /lib64/libc.so.6
* 1 Thread 0x7ffff7fc3720 (LWP 6344) "xnotify" 0x0000000000401834 in update (this=0x7fffffffdc00)
The doc also states that: The implementation of these functions can be done using support in the kernel (if available) or using an implementation based on threads at userlevel.
I would like to have no thread at all, is this possible?
I checked on my kernel, and that looks okay:
qdii#localhost /home/qdii $ grep -i aio /usr/src/linux/.config
CONFIG_AIO=y
Is it possible to run aio without any (userland) thread at all (apart from the main one, of course)?
EDIT:
I dug deeper into it. librt seems to provide a collection of aio functions; looking through the glibc sources exposed something fishy: inside rt/aio_read.c is a function stub:
int aio_read (struct aiocb *aiocbp)
{
  __set_errno (ENOSYS);
  return -1;
}
stub_warning (aio_read)
I found a first relevant implementation in the subdirectory sysdeps/pthread, which directly called __aio_enqueue_request(..., LIO_READ), which in turn created pthreads. But as I was wondering why there would be a stub in that case, I thought maybe the stub was meant to be replaced by an implementation backed by the Linux kernel itself, with the pthread implementation serving as some sort of fallback.
Grepping for aio_read in my /usr/src/linux directory gives a lot of results, which I'm trying to understand now.

I found out that there are actually two quite different aio implementations: one is part of glibc, lives in librt, and performs asynchronous access by using pthreads. The other, usually used through libaio, is built on the Linux kernel itself and needs no user-level helper threads; its interface (io_setup()/io_submit()/io_getevents()) differs from the POSIX aio_* one, but it covers the same use case.
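For reference, a minimal sketch of that kernel-based path, assuming libaio is installed (link with -laio); the filename, buffer size and error-handling style are illustrative choices, not taken from the question:

#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    io_context_t ctx = 0;
    int ret = io_setup(8, &ctx);                 /* ask the kernel for an AIO context */
    if (ret < 0) { fprintf(stderr, "io_setup: %s\n", strerror(-ret)); return 1; }

    /* Note: on many kernels, AIO on regular files is only truly asynchronous
     * when the file is opened with O_DIRECT. */
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    static char buf[4096];
    struct iocb cb;
    struct iocb *cbs[1] = { &cb };
    io_prep_pread(&cb, fd, buf, sizeof buf, 0);  /* read 4096 bytes at offset 0 */

    if (io_submit(ctx, 1, cbs) != 1) { fprintf(stderr, "io_submit failed\n"); return 1; }

    /* The request is tracked entirely by the kernel; no user-level helper
     * thread is created.  Completions are reaped with io_getevents(). */
    struct io_event ev;
    io_getevents(ctx, 1, 1, &ev, NULL);
    printf("read returned %ld\n", (long)ev.res);

    close(fd);
    io_destroy(ctx);
    return 0;
}

This is the same io_getevents() interface that shows up in the completion loop of the later question about preventing a pthread from yielding.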

Related

Vulkan threaded application gets error message on queue submissions under mutex

I have an application with Vulkan for rendering and glfw for windowing. If I start several threads, each with a different window, I get errors on threading and queue submission even though ALL Vulkan calls are protected by a common mutex. The validation layer says:
THREADING ERROR : object of type VkQueue is simultaneously used in thread 0x0 and thread 0x7fc365b99700
Here is the skeleton of the loop under which this happens in each thread:
while (!finished) {
    window.draw(...);
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
The draw function skeleton looks like:
draw(Arg arg) {
    static std::mutex mtx;
    std::lock_guard lock{mtx};
    // .... drawing calls. Including
    device.acquireNextImageKHR(...);
    // Fill command buffers
    graphicsQueue.submit(...);
    presentQueue.presentKHR(presentInfo);
}
This is C++17 which slightly simplifies the syntax but is otherwise irrelevant.
Clearly everything is under a mutex. I also intercept the debug-message callback; when I do, I see that one thread is waiting for glfw events, one is printing the validation-layer message, and the other two threads are trying to acquire the mutex for the lock_guard.
I am at a loss as to what is going on or how to even figure out what is causing this.
I am running on Linux, and it does not crash there. On Mac OS X, however, after a random amount of time the code crashes in a queue submit call inside MoltenVK, and when that happens I see a similar situation among the threads: no other thread is inside a Vulkan call.
I'd appreciate any ideas. My next move would be to move all queue submissions to a single thread, though that is not my favorite solution.
PS: I created a complete MCVE under the Vookoo framework. It is at https://github.com/FunMiles/Vookoo/tree/lock_guard_queues and is the example 00-parallelTriangles
To try it, do the following:
git clone https://github.com/FunMiles/Vookoo.git
cd Vookoo
git checkout lock_guard_queues
mkdir build
cd build
cmake ..
make
examples/00-parallelTriangles
The way you call draw is:
window.draw(device, fw.graphicsQueue(), [&](){//some lambda});
The inside of draw is protected by a mutex, but the fw.graphicsQueue() call isn't.
fw.graphicsQueue(), many abstraction layers down, just calls vkGetDeviceQueue. I found that executing vkGetDeviceQueue in parallel with vkQueueSubmit causes the validation error.
So there are a few issues here:
There is a bug in the validation layers that causes the VkQueue state to be initialized multiple times on vkGetDeviceQueue, which is the cause of the validation error:
KhronosGroup/Vulkan-ValidationLayers#1751
Thread id 0 is not a separate issue. As there are no actual previous accesses, the thread id is not recorded. The layers issue the error because the access count goes negative, having previously been wrongly reset to 0.
Arguably there is also a spec issue here. It is not immediately obvious from the text that the VkQueue is not actually accessed in vkGetDeviceQueue, beyond the silent assumption that this is the sane thing to do:
KhronosGroup/Vulkan-Docs#1254
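In practical terms, until the layers fix lands, the race can be avoided by retrieving the VkQueue handle once during initialization and only touching the queue under the mutex afterwards. A rough sketch against the C API, assuming a single queue family serves both graphics and present; the names init_queue, submit_frame and the globals are made up for illustration:

#include <pthread.h>
#include <vulkan/vulkan.h>

static VkQueue         g_graphics_queue;       /* fetched exactly once */
static pthread_mutex_t g_queue_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Called once, before any worker thread starts, so vkGetDeviceQueue can
 * never race with vkQueueSubmit in the validation layers. */
void init_queue(VkDevice device, uint32_t family_index)
{
    vkGetDeviceQueue(device, family_index, 0, &g_graphics_queue);
}

/* Every later queue access goes through the same mutex. */
void submit_frame(const VkSubmitInfo *submit, VkFence fence,
                  const VkPresentInfoKHR *present)
{
    pthread_mutex_lock(&g_queue_mutex);
    vkQueueSubmit(g_graphics_queue, 1, submit, fence);
    vkQueuePresentKHR(g_graphics_queue, present);
    pthread_mutex_unlock(&g_queue_mutex);
}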

Multithreading (pthreads)

I'm working on a project where I need to make a program run on multiple threads. However, I'm running into a bit of an issue.
In my program, I have a helper function called func_call.
If I use this in my code:
func_call((void*) &my_pixels);
The program runs fine.
However, if I try to create a thread, and then run the function on that, the program runs into a segmentation fault.
pthread_t thread;
pthread_create (&thread, NULL, (void*)&func_call, (void*) &my_pixels);
I've included pthread.h in my program. Any ideas what might be wrong?
You are not handling data in a thread safe manner:
the thread copies data from the thread argument, which is a pointer to the main thread's my_pixels variable; the main thread may exit, making my_pixels invalid.
the thread uses scene, main thread calls free_scene() on it, which I imagine makes it invalid
the thread calls printf(), the main thread closes stdout (kind of unusual itself)
the thread updates the picture array, the main thread accesses picture to output data from it
It looks like you should just wait for the thread to finish its work after creating it - call pthread_join() to do that.
For a single thread, that would seem to be pointless (you've just turned a multi-threaded program into a single-threaded program). But based on the code that's commented out, it looks like you're planning to start several threads that work on chunks of the data. So, when you get to the point of trying that again, make sure you join all the threads you start. As long as the threads don't modify the same data, it'll work. Note that you'll need a separate my_pixels instance for each thread (make an array of them, just as you did with the pthread_t handles), or some threads will likely get parameters intended for a different thread.
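A sketch of that pattern, with the number of threads, the argument struct and the row-based work split invented purely for illustration:

#include <pthread.h>

#define NUM_THREADS 4

struct chunk_arg {               /* one instance per thread, so arguments never alias */
    int start_row;
    int end_row;
};

static void *func_call(void *arg)
{
    struct chunk_arg *chunk = arg;
    /* ... render rows chunk->start_row .. chunk->end_row into the shared
     * picture array; each thread writes a disjoint range ... */
    (void)chunk;
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];
    struct chunk_arg args[NUM_THREADS];   /* keeps every argument alive until join */

    for (int i = 0; i < NUM_THREADS; i++) {
        args[i].start_row = i * 100;
        args[i].end_row   = (i + 1) * 100;
        pthread_create(&threads[i], NULL, func_call, &args[i]);
    }

    /* The main thread must not exit (or free shared data) before joining. */
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    return 0;
}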
Without knowing what func_call does, it is difficult to give you an answer. Nevertheless, here are a few possibilities:
Does func_call use some sort of global state? Check that it is initialized properly from within the thread; the order in which threads execute is not the same for every run.
Not knowing your operating system (AIX/Linux/Solaris etc.), it is difficult to answer this, but please check your compilation options.
Please provide the signal trapped and at least a few lines of the stack trace, for all the threads. One thing you can check yourself is to print each thread's stack trace (using threads/thread or thread current <x>, depending on the debugger) and see whether there is common data being accessed. It is most likely that the segfault occurred when two threads were trying to read the other's (uncommitted) change.
Hope that helps.
Edit:
After checking your code, I think the problem is the global picture array. You seem to be modifying that in the thread function without any guards. You loop using px and py and all the threads will have the same px and py and will try to write into the picture array at the same time. Please try to modify your code to prevent multiple threads from stepping on each other's data modifications.
Is func_call a function, or a function pointer? If it's a function pointer, there is your problem: you took the address of a function pointer and then cast it.
People are guessing because you've provided only a fraction of the program, which mentions names like func_call with no declaration in scope.
Your compiler must be giving you diagnostics about this program, because you're passing a (void *) expression to a function pointer parameter.
Define your thread function in a way that is compatible with pthread_create, and then just call it without any casts.
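In other words, something along these lines (struct pixels here just stands in for whatever type my_pixels really has):

#include <pthread.h>

struct pixels { int dummy; };        /* placeholder for the real type */

/* Signature matches exactly what pthread_create expects, so the function
 * can be passed directly, with no casts. */
static void *func_call(void *arg)
{
    struct pixels *my_pixels = arg;  /* convert back inside the function */
    (void)my_pixels;
    return NULL;
}

int main(void)
{
    struct pixels my_pixels = {0};
    pthread_t thread;
    pthread_create(&thread, NULL, func_call, &my_pixels);  /* no casts needed */
    pthread_join(thread, NULL);
    return 0;
}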

Problem handling file I/O with libevent2

I have worked with libevent2 for some time, but I usually used it to handle network I/O (sockets). Now I need to read many different files, so I wanted to use it for that as well. I created this code:
int file = open(filename, O_RDONLY);
struct event *ev_file_read = event_new(ev_base, file, EV_READ | EV_PERSIST, read_file, NULL);
if(event_add(ev_file_read, NULL))
    error("adding file event");
Unfortunately it doesn't work. I get this message when trying to add event:
[warn] Epoll ADD(1) on fd 7 failed. Old events were 0; read change was 1 (add); write change was 0 (none): Operation not permitted
adding file event: Operation not permitted
The file exists and is readable and writable.
Does anyone have an idea how to handle file I/O using libevent? I also thought about buffered events, but the API only provides bufferevent_socket_new(), which doesn't apply here.
Thanks in advance.
I needed libevent to read many files with priorities. The problem was in epoll, not in libevent: epoll doesn't support regular files.
To solve it I forced libevent not to use epoll:
struct event_config *cfg = event_config_new();
event_config_avoid_method(cfg, "epoll");
ev_base = event_base_new_with_config(cfg);
event_config_free(cfg);
The next method on the preference list was poll, which fully supports regular files, just as I wanted.
Thank you all for answers.
It makes no sense to register regular file descriptors with libevent: file descriptors associated with regular files always select as ready for reading, ready for writing, and error conditions.
If you want to do async disk I/O, you may want to check the aio_* family (see man 3 aio_read). It's POSIX.1-2001 and available on Linux and BSD (at least).
For integrating aio operations with libevent, see the libevent aio patch and a related Stack Overflow post that mentions using signalfd(2) to route the aio signal events to a file descriptor that can be used with the various fd event-polling implementations (and thus with the libevent loop).
EDIT: libevent also has signal handling support (totally forgot about that), so you can try to register/handle the aio signals directly with/from the libevent loop. I'd personally go and try the libevent patch first if your development rules allow you to.
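For what it's worth, a rough sketch of the signalfd route (link with -lrt -levent): the aio request asks for SIGEV_SIGNAL notification, the signal is blocked and turned into a file descriptor with signalfd(2), and that descriptor is handed to libevent like any socket. The signal number, filename and buffer size are arbitrary choices for illustration:

#include <aio.h>
#include <event2/event.h>
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/signalfd.h>
#include <unistd.h>

#define AIO_SIG SIGRTMIN                 /* any otherwise unused signal will do */

static char buf[4096];

static void on_aio_signal(evutil_socket_t sfd, short what, void *arg)
{
    struct signalfd_siginfo si;
    (void)what; (void)arg;

    if (read(sfd, &si, sizeof si) != sizeof si)
        return;

    /* sival_ptr was set to the aiocb when the request was submitted */
    struct aiocb *cb = (struct aiocb *)(uintptr_t)si.ssi_ptr;
    if (aio_error(cb) == 0)
        printf("aio_read completed, %zd bytes\n", aio_return(cb));
}

int main(void)
{
    /* Block the signal so it is delivered through the signalfd instead of
     * an asynchronous handler. */
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, AIO_SIG);
    sigprocmask(SIG_BLOCK, &mask, NULL);

    int sfd = signalfd(-1, &mask, SFD_NONBLOCK);

    struct event_base *base = event_base_new();
    struct event *ev = event_new(base, sfd, EV_READ | EV_PERSIST,
                                 on_aio_signal, NULL);
    event_add(ev, NULL);

    /* Submit one asynchronous read; "data.bin" is a placeholder. */
    int fd = open("data.bin", O_RDONLY);
    static struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof buf;
    cb.aio_sigevent.sigev_notify          = SIGEV_SIGNAL;
    cb.aio_sigevent.sigev_signo           = AIO_SIG;
    cb.aio_sigevent.sigev_value.sival_ptr = &cb;
    aio_read(&cb);

    event_base_dispatch(base);           /* the callback fires from the loop */
    return 0;
}

Note that, as discussed in the first question above, glibc's aio_* implementation may still create helper threads internally; the signalfd only routes the completion notification into the libevent loop.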

Can I prevent a Linux user space pthread yielding in critical code?

I am working on a user-space app for an embedded Linux project using the 2.6.24.3 kernel.
My app passes data between two file nodes by creating two pthreads that each sleep until an asynchronous I/O operation completes, at which point the thread wakes and runs a completion handler.
The completion handlers need to keep track of how many transfers are pending and maintain a handful of linked lists that one thread adds to and the other removes from.
// Sleep here until events arrive or the time out expires
for (;;) {
    no_of_events = io_getevents(ctx, 1, num_events, events, &timeout);
    // Process each aio event that has completed or thrown an error
    for (i = 0; i < no_of_events; i++) {
        // Get pointer to completion handler
        io_complete = (io_callback_t) events[i].data;
        // Get pointer to data object
        iocb = (struct iocb *) events[i].obj;
        // Call completion handler and pass it the data object
        io_complete(ctx, iocb, events[i].res, events[i].res2);
    }
}
My question is this...
Is there a simple way I can prevent the currently active thread from yielding whilst it runs the completion handler rather than going down the mutex/spin lock route?
Or, failing that, can Linux be configured to prevent a pthread from yielding while a mutex/spin lock is held?
You can use the sched_setscheduler() system call to temporarily set the thread's scheduling policy to SCHED_FIFO, then set it back again. From the sched_setscheduler() man page:
A SCHED_FIFO process runs until either it is blocked by an I/O request, it is preempted by a higher priority process, or it calls sched_yield(2).
(In this context, "process" actually means "thread").
However, this is quite a suspicious requirement. What is the problem you are hoping to solve? If you are just trying to protect your linked list of completion handlers from concurrent access, then an ordinary mutex is the way to go. Have the completion thread lock the mutex, remove the list item, unlock the mutex, then call the completion handler.
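A minimal sketch of that pattern (the list node type and the handler signature are invented for illustration): detach one item under the lock, then run its handler outside the lock, so other threads are never blocked behind user code.

#include <pthread.h>
#include <stddef.h>

struct transfer {                        /* hypothetical list node */
    struct transfer *next;
    void (*handler)(struct transfer *);
};

static struct transfer *pending_head;
static pthread_mutex_t  pending_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called from the io_getevents() loop for each completed event. */
static void process_one_completion(void)
{
    pthread_mutex_lock(&pending_lock);
    struct transfer *t = pending_head;   /* detach under the lock */
    if (t)
        pending_head = t->next;
    pthread_mutex_unlock(&pending_lock);

    if (t)
        t->handler(t);                   /* run the handler outside the lock */
}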
I think you'll want to use mutexes/locks to prevent race conditions here. Mutexes are by no means voodoo magic and can even make your code simpler than relying on arbitrary system-specific features, which you would potentially have to port across systems. I don't know whether the latter is an issue for you, though.
I believe you are trying to outsmart the Linux scheduler here, for the wrong reasons.
The correct solution is to use a mutex to prevent completion handlers from running in parallel. Let the scheduler do its job.

Using sigprocmask to implement locks

I'm implementing user threads on Linux kernel 2.4, and I'm using ualarm to trigger context switches between the threads.
We have a requirement that our thread library's functions be uninterruptible by the context-switching mechanism for threads, so I looked into blocking signals and learned that using sigprocmask is the standard way to do this.
However, it looks like I need to do quite a lot to implement this:
sigset_t new_set, old_set;
sigemptyset(&new_set);
sigaddset(&new_set, SIGALRM);
sigprocmask(SIG_BLOCK, &new_set, &old_set);
This blocks SIGALRM, but it takes three function invocations! A lot can happen in the time these functions take to run, including the signal being delivered.
The best idea I had to mitigate this was temporarily disabling ualarm, like this:
sigset_t new_set, old_set;
time=ualarm(0,0);
sigemptyset(&new_set);
sigaddset(&new_set, SIGALRM);
sigprocmask(SIG_BLOCK, &new_set, &old_set);
ualarm(time, 0);
This works, but it feels verbose. Isn't there a better way to do it?
As WhirlWind points out, the signal-set functions are quite lightweight and may even be implemented as macros; you can also just keep around a signal set that contains only SIGALRM and reuse that.
Regardless, it doesn't actually matter if the signal arrives during the sigaddset() or sigemptyset() calls: the new_set and old_set variables are (presumably) thread-local, and the critical section isn't entered until after sigprocmask() returns.
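For example, a minimal sketch of that "build the set once, reuse it" idea (lock_init must run before the first lock; the function names are made up):

#include <signal.h>

static sigset_t alarm_set;               /* contains only SIGALRM, built once */

void lock_init(void)
{
    sigemptyset(&alarm_set);
    sigaddset(&alarm_set, SIGALRM);
}

void lock_acquire(sigset_t *old_set)
{
    /* a single call at the start of each critical section */
    sigprocmask(SIG_BLOCK, &alarm_set, old_set);
}

void lock_release(const sigset_t *old_set)
{
    sigprocmask(SIG_SETMASK, old_set, NULL);
}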
You'll find that sigemptyset() and sigaddset() in signal.h are just macros or inline functions, so they execute inline in your code. Just use a stack variable when you call them.
However, why don't you do this in a single-threaded startup section of your code? I also doubt the function call to sigprocmask will be atomic. Blocking signals does not mean your code will be uninterruptible.
By the way, I'm not sure how you're using ualarm, but if you're not catching or ignoring SIGALRM when you call it the first time, you'll probably kill your process.
sigprocmask() is the only function that goes to kernel level and actually changes the signal-masking status. The others are just manipulation functions for setting up the mask before calling sigprocmask() or for passing the set to another signal-related function.
