Linux driver resource protection

I'm writing a Linux device driver and am pretty new at this so I'm learning quickly how NOT to do things. I'm currently using a couple of mutexes to prevent some functions from concurrently reading from the device and running into deadlocks on resume from suspend. My problem is two-fold:
1) The interrupt handler schedules a workqueue to read from the device's FIFO and process the data. The FIFO needs to be read uninterrupted by other reads, so I have placed mutex (A) lock/unlock in the read and write functions.
2) The device configuration function is a sequence of reads and writes, using the same read/write functions as above, that must not be interrupted by other reads or writes, so I have placed mutex (B) lock/unlock in the config functions. The device configuration functions are called by sysfs nodes.
The issue appears when the system resumes from suspend: an interrupt triggers the FIFO read and, at nearly the same time, higher layers write to the sysfs nodes to set configuration parameters, and the system seems to deadlock during the configuration sequence. Is my issue just that I'm using a mutex, which sleeps, where I should be using a spinlock? Or am I going about this the wrong way?

Get an interrupt.
Ack/disable interrupt in interrupt handler.
Start work queue.
Do high-priority processing, like get data off device and onto queue.
Enable device interrupt and go process lower-priority data in work queue.
Two different mutexes here can't work, because one path's locking order may be A->B [trying to acquire mutex B while holding mutex A] while another path's is B->A, which is a textbook deadly embrace.
The solution is to restructure your processing into high-priority work (a very limited task) that hands data to the lower-priority work.
If the test for busy/available is more than just a yes/no test, use a condition variable to guard the complex testing that must be done.
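A minimal sketch of that structure for a hypothetical driver (io_lock, fifo_work_fn and config_store are illustrative names, not from your code): the hard IRQ handler only acknowledges the device and schedules the deferred work, and a single mutex serializes FIFO drains and sysfs-driven config sequences so there is only one lock order:

#include <linux/interrupt.h>
#include <linux/workqueue.h>
#include <linux/mutex.h>
#include <linux/device.h>

static DEFINE_MUTEX(io_lock);          /* one lock, so no A->B vs B->A ordering */
static struct work_struct fifo_work;   /* INIT_WORK(&fifo_work, fifo_work_fn) at probe time */

static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
        /* ack/disable the device interrupt here (device-specific) */
        schedule_work(&fifo_work);     /* defer the slow FIFO drain */
        return IRQ_HANDLED;
}

static void fifo_work_fn(struct work_struct *work)
{
        mutex_lock(&io_lock);          /* process context, sleeping is fine */
        /* drain the FIFO using the shared read helper */
        mutex_unlock(&io_lock);
        /* re-enable the device interrupt */
}

static ssize_t config_store(struct device *dev, struct device_attribute *attr,
                            const char *buf, size_t count)
{
        mutex_lock(&io_lock);          /* config sequence cannot interleave with a drain */
        /* read/write configuration sequence */
        mutex_unlock(&io_lock);
        return count;
}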


How to detect if a linux thread is crashed

I have this problem: I need to understand whether a Linux thread has stopped running due to a crash rather than a normal exit. The reason is to try to restart the thread without resetting/restarting the whole system.
pthread_join() does not seem like a good option because I have several threads to monitor and the function only returns for one specific thread; it doesn't work "in parallel". At the moment I have a keep-alive signal from each thread to main, but I'm looking for some system call or thread attribute to understand the state.
Any suggestions?
Thread "crashes"
How to detect if a linux thread is crashed
A pthreads thread does not "crash" independently of its process: anything that would ordinarily be described as a crash is a process-level event.
That is, the only way that a pthreads thread can terminate abnormally while other threads in the process continue to run is via thread cancellation,* which is not well described as a "crash". In particular, if a signal is received whose effect is abnormal termination then the whole process terminates, not just the thread that handled the signal. Other kinds of errors do not cause threads to terminate.
On the other hand, if by "crash" you mean normal termination in response to the thread detecting an error condition, then you have no limitation on what the thread can do prior to terminating to communicate about its state. For example,
it could update a shared object that tracks information about your threads
it could write to a pipe designated for the purpose
it could raise a signal
If you like, you can use pthread_cleanup_push() to register thread cleanup handlers to help with that.
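For example, a cleanup handler that flips a per-thread "done" flag in a shared table is one simple way to do that (a sketch; the names and table size are illustrative):

#include <pthread.h>
#include <stdatomic.h>

static _Atomic int thread_done[16];    /* shared table the main thread can inspect */

static void mark_done(void *arg)
{
        atomic_store(&thread_done[*(int *)arg], 1);
}

static void *worker(void *arg)         /* arg points to this thread's index */
{
        pthread_cleanup_push(mark_done, arg);
        /* ... do the real work; cancellation also runs mark_done ... */
        pthread_cleanup_pop(1);        /* 1 = run the handler on normal exit too */
        return NULL;
}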
On the third hand, if you're asking about detecting live threads that are failing to make progress -- because they are deadlocked, for example -- then your best bet is probably to implement some form of heartbeat monitor. That would involve each thread you want to monitor periodically updating a shared object that tracks the time of each thread's last update. If a thread goes too long between beats then you can guess that it may be stalled. This requires you to instrument all the threads you want to monitor.
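A minimal sketch of such a heartbeat table, assuming a fixed set of monitored threads (NTHREADS, last_beat and the helper names are illustrative):

#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define NTHREADS 4

static _Atomic time_t last_beat[NTHREADS];      /* time of each thread's last update */

static void heartbeat(int id)                    /* call periodically from each thread */
{
        atomic_store(&last_beat[id], time(NULL));
}

static void check_heartbeats(time_t max_silence) /* run from a monitor thread */
{
        time_t now = time(NULL);
        for (int i = 0; i < NTHREADS; i++) {
                if (now - atomic_load(&last_beat[i]) > max_silence) {
                        /* thread i may be stalled or gone; decide how to react */
                }
        }
}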
Thread cancellation
You should not use thread cancellation. But if you did, and if you include termination because of cancellation in your definition of "crash", then you still have all the options above available to you, but you must engage them by registering one or more cleanup handlers.
GNU-specific options
The main issues with using pthread_join() to check thread state are
it doesn't work for detached threads, and
pthread_join() blocks until the specified thread terminates.
For detached threads, you need one of the approaches already discussed, but for ordinary (joinable) threads on GNU/Linux, Glibc provides the non-standard pthread_tryjoin_np(), which performs a non-blocking attempt to join a thread, and also pthread_timedjoin_np(), which performs a join attempt with a timeout. If you are willing to rely on Glibc-specific functions then one of these might serve your purpose.
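For example, a non-blocking liveness check with pthread_tryjoin_np() could look roughly like this (sketch only):

#define _GNU_SOURCE
#include <pthread.h>
#include <errno.h>
#include <stdio.h>

static void poll_thread(pthread_t tid)
{
        void *retval;
        int rc = pthread_tryjoin_np(tid, &retval);   /* returns immediately */

        if (rc == 0)
                printf("thread has terminated, return value %p\n", retval);
        else if (rc == EBUSY)
                printf("thread is still running\n");
        else
                printf("join error: %d\n", rc);
}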
Linux-specific options
The Linux kernel makes per-process thread status information available via the /proc filesystem. See How to check the state of Linux threads?, for example. Do be aware, however, that the details vary a bit from one kernel version to another. And if you're planning to do this a lot, then also be aware that even though /proc is a virtual filesystem (so no physical disk is involved), you still access it via slow-ish I/O interfaces.
Any of the other alternatives is probably better than reading files in /proc. I mention it only for completeness.
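If you do go the /proc route anyway, a rough sketch (assuming the monitored thread has recorded its own kernel TID, e.g. via syscall(SYS_gettid), somewhere the monitor can read):

#include <stdio.h>
#include <sys/types.h>

/* Returns the state letter (R, S, D, Z, T, ...) or '?' if the thread is gone.
 * Caveat: this simplistic parsing breaks if the thread name contains spaces. */
static char thread_state(pid_t tid)
{
        char path[64], state = '?';
        FILE *f;

        snprintf(path, sizeof(path), "/proc/self/task/%d/stat", (int)tid);
        f = fopen(path, "r");
        if (f) {
                fscanf(f, "%*d %*s %c", &state);   /* field 3 of stat is the state */
                fclose(f);
        }
        return state;
}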
Overall
I'm looking for some system call or thread attribute to understand the state
The pthreads API does not provide a "have you terminated?" function or any other such state-inquiry function, unless you count pthread_join(). If you want that then you need to roll your own, which you can do by means of some of the facilities already discussed.
*Do not use thread cancellation.

Memory visibility for an in-order queue

After reading the OpenCL 1.1 standard I still can't grasp whether an in-order command queue guarantees memory visibility for any pair of commands (not only kernels) according to their enqueue order.
OpenCL standard 1.1 section 5.11 states:
If the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE property of a
command-queue is not set, the commands enqueued to a command-queue
execute in order. For example, if an application calls clEnqueueNDRangeKernel to execute kernel A followed by a
clEnqueueNDRangeKernel to execute kernel B, the application can
assume that kernel A finishes first and then kernel B is executed. If
the memory objects output by kernel A are inputs to kernel B then
kernel B will see the correct data in memory objects produced by execution of kernel A.
What about clEnqueueWriteBuffer (non-blocking) and clEnqueueNDRangeKernel enqueued after, which uses that buffer contents?
AFAIK, 'finishes execution' does not imply that corresponding writes are visible (due to relaxed consistency). For example, section 5.10 states specifically:
The clEnqueueBarrier command ensures that all queued commands in
command_queue have finished execution before the next batch of
commands can begin execution. The clEnqueueBarrier command is a
synchronization point.
In other words, should I rely on the other 'synchronization point'-related rules (events, etc.), or do I get memory synchronization out of the box for all the commands in an in-order queue?
What about clEnqueueWriteBuffer (non-blocking) and
clEnqueueNDRangeKernel enqueued after, which uses that buffer
contents?
Since it is an in-order queue, it will first complete the write and then run the kernel after the write finishes, even if the write is non-blocking.
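As a sketch of what that means in code (the in-order queue, buffer, kernel and host data are assumed to already exist; error checking omitted):

#include <CL/cl.h>

static void write_then_run(cl_command_queue queue, cl_kernel kernel,
                           cl_mem buf, const float *host_data, size_t n)
{
        clEnqueueWriteBuffer(queue, buf, CL_FALSE /* non-blocking */, 0,
                             n * sizeof(float), host_data, 0, NULL, NULL);
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);

        /* In an in-order queue the kernel cannot start before the write has
         * completed, so it sees the freshly written data. */
        clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &n, NULL, 0, NULL, NULL);

        /* clFinish is only needed to synchronize with the host. */
        clFinish(queue);
}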
clEnqueueBarrier is a device-side synchronization command and is intended for out-of-order queues. When you use clFinish(), you additionally make the API wait for host-device communication. Enqueueing a barrier is a much faster synchronization, but on the device side only. When you need to synchronize a queue with another queue and still need a similar sync point, you should use clEnqueueWaitForEvents just after (or before) the barrier, or simply use only the event waiting (for an in-order queue).
In OpenCL 1.2, clEnqueueWaitForEvents and clEnqueueBarrier were merged into clEnqueueBarrierWithWaitList, which lets you both barrier an out-of-order queue and synchronize it with other queues or even host-side events.
If there is only a single in-order queue, you don't need a barrier, and when you need to synchronize with the host you can use clFinish or an event-based synchronization command.
or do I get memory synchronization out of the box for all the commands in an in-order queue?
For enqueue-type commands only, yes. Enqueue (1 write + 1 compute + 1 read) operations 128 times in an in-order queue and they will all run one after another, completing a 128-step simulation (once they are issued by a flush/finish command). The commands don't have to follow any particular pattern for this implicit synchronization; something like 1 write + 2 reads + 2 kernels + 5 writes + 1 read + 1 kernel + 15 reads also runs one after another (2 kernels = 1 kernel + 1 kernel).
For non-enqueue-type commands such as clSetKernelArg, you have to use a synchronization point or issue them before enqueuing the commands that depend on them.
You can also use enqueued commands themselves as inter-queue sync points via their event wait list parameter, and use the last parameter to get a completion event to be used in another queue (signaling), but that is still not a barrier for an out-of-order queue.
If a buffer is used by two kernels in different queues and both write to it, there must be synchronization between the queues unless they write to different locations. So you can have 20 kernels each working on 1/20th of a buffer in parallel using multiple queues, and synchronize all the queues only at the end using a wait list. If a kernel uses or alters another kernel's region concurrently, it is undefined behaviour. A similar approach can be used for map/unmap too.
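A sketch of that event-based signaling between two queues (illustrative names; both queues are assumed to be on the same context, and error checking is omitted):

#include <CL/cl.h>

static void cross_queue_sync(cl_command_queue queue_a, cl_command_queue queue_b,
                             cl_kernel producer, cl_kernel consumer, size_t n)
{
        cl_event produced;

        /* The last parameter returns an event that completes when the kernel does. */
        clEnqueueNDRangeKernel(queue_a, producer, 1, NULL, &n, NULL, 0, NULL,
                               &produced);

        /* The consumer on the other queue waits on that event before it starts. */
        clEnqueueNDRangeKernel(queue_b, consumer, 1, NULL, &n, NULL, 1, &produced,
                               NULL);

        clFlush(queue_a);   /* make sure the producer is actually submitted */
        clFinish(queue_b);
        clReleaseEvent(produced);
}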
in-order vs out-of-order example:
r: read, w: write, c: compute
<------------clFinish----------------------->
in-order queue....: rwrwrwcwccwccwrcwccccccwrcwcwrwcwrwccccwrwcrw
out-of-order queue: <--r--><-r-><-c-><-----c-----><-----r-----><w>
<---w-------><-------r-----><------c----->
<---r---><-----c--------------------><--c->
<---w--->
<---c-----> <----w------>
and another out-of-order queue with a barrier in the middle:
<---r---><--w---> | <-----w---->
<---c------> | <----c---> <----w--->
<---c--------> | <------r------->
<----w------> | <----c------->
where read/write operations before the barrier are forced to wait until all commands reach the same barrier, and then all remaining ones continue concurrently.
The last example shows that memory visibility from the host side can be acquired by a barrier or clFinish. But a barrier doesn't inform the host that it has finished, so you need to query events on the queue. clFinish blocks until all commands are finished, so you don't need to query anything. Both will make the host see the most up-to-date memory.
Your question is about memory visibility for commands of an in-order queue, so you don't need a synchronization point for them to see each other's most up-to-date values.
Each kernel execution is also a synchronization point between its work-groups, so work-groups can't see other groups' data until the kernel finishes; all data is prepared and becomes visible at the end of kernel execution, so the next kernel can use it immediately.
I haven't tried reading data concurrently from device to host without any synchronization points, but it may work for some devices that don't cache any data. Even integrated GPUs have their dedicated L3 caches, so it would need at least a barrier command once in a while to let the host read some updated (but possibly partially re-updated, in-flight) data. Event-based synchronization is faster than clFinish and gives the host correct memory data. A barrier is also faster than clFinish, but is only usable for device-side sync points.
If I understand correctly:
Sync point                         Memory visibility
in-kernel fence                    same work-item (and wavefront?)
in-kernel local memory barrier     local memory in same work-group
in-kernel global memory barrier    global memory in same work-group
in-kernel atomics                  only other atomics in same kernel
enqueued kernel/command            next kernel/command in same queue
enqueued barrier                   following commands in same device
enqueued event wait                host
clFinish                           host
https://www.khronos.org/registry/OpenCL/sdk/1.1/docs/man/xhtml/clEnqueueMapBuffer.html
If the buffer object is created with CL_MEM_USE_HOST_PTR set in
mem_flags, the host_ptr specified in clCreateBuffer is guaranteed to
contain the latest bits in the region being mapped when the
clEnqueueMapBuffer command has completed; and the pointer value
returned by clEnqueueMapBuffer will be derived from the host_ptr
specified when the buffer object is created.
and
https://www.khronos.org/registry/OpenCL/sdk/1.1/docs/man/xhtml/clEnqueueWriteBuffer.html
All commands that use this buffer object or a memory object (buffer or
image) created from this buffer object have finished execution before
the read command begins execution.
so it doesn't say anything like a barrier or sync. Completion is just enough.
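For instance, a blocking map relies only on that completion guarantee to give the host the latest contents; a sketch (queue, buf and bytes are assumed to exist, error handling kept minimal):

#include <CL/cl.h>

static void read_back(cl_command_queue queue, cl_mem buf, size_t bytes)
{
        cl_int err;
        void *ptr = clEnqueueMapBuffer(queue, buf, CL_TRUE /* blocking */,
                                       CL_MAP_READ, 0, bytes, 0, NULL, NULL, &err);
        if (err == CL_SUCCESS) {
                /* ptr now contains the latest contents of the buffer */
                clEnqueueUnmapMemObject(queue, buf, ptr, 0, NULL, NULL);
        }
}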
From the spec:
In-order Execution: Commands are launched in the order they appear in the command-queue and complete in order. In other words, a prior command on the queue completes before the following command begins. This serializes the execution order of commands in a queue.
In the case of in-order queues, all commands in a queue are executed in order; no extra synchronisation is required.

Where does the wait queue for threads lie in POSIX pthread mutex lock and unlock?

I was going through the concurrency section of REMZI, and while going through the mutex section I got confused about this:
To avoid busy waiting, mutex implementations employ a park()/unpark() mechanism (on Sun OS) which puts a waiting thread in a queue with its thread ID. Later, during pthread_mutex_unlock(), it removes one thread from the queue so that it can be picked by the scheduler. Similarly, a futex (the mutex implementation on Linux) uses the same mechanism.
It is still unclear to me where the queue lies. Is it in the address space of the running process or somewhere inside the kernel?
Another doubt I had is regarding condition variables. Do pthread_cond_wait() and pthread_cond_signal() use normal signal and wait methods, or do they use some variant of them?
Doubt 1: It is still unclear to me where the queue actually lies. Is it in the address space of the running process or somewhere inside the kernel?
Every mutex has an associated data structure maintained in the kernel address space; in Linux it is a futex. That data structure has an associated wait queue where threads from different processes can queue up and wait to be woken up; see the futex_wait kernel function.
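For illustration, this is roughly the primitive such an implementation is built on: the lock word lives in the process's address space, while the futex() system call parks and wakes threads on a kernel-side wait queue keyed by that word's address (a sketch, not Glibc's actual code):

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>

static _Atomic int lock_word;   /* lives in the process's address space */

static void demo_futex_wait(int expected)
{
        /* Sleep in the kernel while lock_word == expected. */
        syscall(SYS_futex, &lock_word, FUTEX_WAIT, expected, NULL, NULL, 0);
}

static void demo_futex_wake(int nwaiters)
{
        /* Wake up to nwaiters threads queued on lock_word's kernel wait queue. */
        syscall(SYS_futex, &lock_word, FUTEX_WAKE, nwaiters, NULL, NULL, 0);
}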
Doubt 2: Another doubt I had is regarding condition variables: do pthread_cond_wait() and pthread_cond_signal() use normal signal and wait methods, or do they use some variant of them?
Modern Linux does not use signals for condition variable signaling. See NPTL: The New Implementation of Threads for Linux for more details:
The addition of the Fast Userspace Locking (futex) into the kernel enabled a complete reimplementation of mutexes and other synchronization mechanisms without resorting to interthread signaling. The futex, in turn, was made possible by the introduction of preemptive scheduling to the kernel.

C# When thread switching will most probably occur?

I was wondering when .NET would most probably switch from one thread to another?
I understand we can't predict exactly when this will happen, but is there any intelligence in it? For example, when a thread is executing, will the runtime try to wait for a method to return or a loop to finish before switching?
I'm not an expert on .NET, but in general scheduling is handled by the kernel.
Either your thread's timeslice has expired (threads/processes only get a certain amount of CPU time)
Your thread has blocked for IO.
Some other obscure reason, like waiting for an IPC message, a network packet or something.
Threads can be preempted at any point along their execution path, be it in a loop or returning from a function. This in general isn't handled by the underlying VM (.NET or JVM) but is controlled by the OS.
Of course there is 'intelligence', of a sort:). The set of running threads can only change upon an interrupt, either:
An actual hardware interrupt from a peripheral device, eg. disk, NIC, KB, mouse, timer.
A software interrupt, (ie. a system call), that can change the state of thread/s. This encompasses sleep calls and calls to wait/signal on inter-thread synchro objects, as well as I/O calls that request data that is not immediately available.
If there is no interrupt, the OS cannot change the set of running threads because it is not entered. The OS does not know or care about loops, function/methods calls, (except those that make system calls as above), gotos or any other user-level flow-control mechanisms.
I read your question only now, so it may not be relevant anymore, but after reading the above answers I just want to make sure:
Threads are managed (as far as I know) by the process they belong to. It has nothing to do with the operating system (and that is the main reason why working with multiple threads is faster than working with multiple processes: data is shared between threads, and switching between them happens faster than the context switch between processes performed by the short-term scheduler).
(NOTE: there are two types of threads, user-mode threads and kernel-mode threads, and each OS can have both of them or just one of them. Anyway, a thread working in a user application environment is considered a user-mode thread and is managed by the process it belongs to.)
Am I right?
Thanks!!!

How does Linux blocking I/O actually work?

In Linux, when you make a blocking i/o call like read or accept, what actually happens?
My thoughts: the process gets taken out of the run queue and put into a waiting or blocked state on some wait queue. Then, when a TCP connection is made (for accept) or the hard drive is ready for a file read, a hardware interrupt is raised which lets the waiting processes wake up and run (in the case of a file read, how does Linux know which processes to awaken, as there could be lots of processes waiting on different files?). Or perhaps, instead of hardware interrupts, the individual process itself polls to check availability. Not sure, help?
Each Linux device seems to be implemented slightly differently, and the preferred way seems to vary every few Linux releases as safer/faster kernel features are added, but generally:
The device driver creates read and write wait queues for a device.
Any process thread wanting to wait for I/O is put on the appropriate wait queue. When an interrupt occurs, the handler wakes up one or more waiting threads. (Obviously the threads don't run immediately, as we are in interrupt context, but are added to the kernel's scheduling queue.)
When scheduled by the kernel, the thread checks to see if conditions are right for it to proceed; if not, it goes back on the wait queue.
A typical example (slightly simplified):
In the driver at initialisation:
init_waitqueue_head(&readers_wait_q);
In the read function of a driver:
if (filp->f_flags & O_NONBLOCK)
{
        return -EAGAIN;
}
/* Note: wait_event_interruptible() takes the wait queue itself, not a pointer to it */
if (wait_event_interruptible(readers_wait_q, read_avail != 0))
{
        /* signal interrupted the wait, return */
        return -ERESTARTSYS;
}
to_copy = min(user_max_read, read_avail);
if (copy_to_user(user_buf, read_ptr, to_copy))
{
        return -EFAULT;
}
Then the interrupt handler just issues:
wake_up_interruptible(&readers_wait_q);
Note that wait_event_interruptible() is a macro that hides a loop that checks for a condition - read_avail != 0 in this case - and repeatedly adds to the wait queue again if woken when the condition is not true.
As mentioned there are a number of variations - the main one is that if there is potentially a lot of work for the interrupt handler to do then it does the bare minimum itself and defers the rest to a work queue or tasklet (generally known as the "bottom half") and it is this that would wake the waiting threads.
See the Linux Device Drivers book for more details; a PDF is available here:
http://lwn.net/Kernel/LDD3
Effectively, the method only returns when the file is ready to read, when data is on a socket, when a connection has arrived, and so on.
To make sure it can return immediately, you probably want to use the select system call to find a ready file descriptor.
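For example, a small helper around select() (a sketch) that waits until a descriptor is readable before calling read() or accept():

#include <sys/select.h>
#include <unistd.h>

/* Returns 1 if fd is readable, 0 if the timeout expired, -1 on error. */
static int wait_readable(int fd, int timeout_sec)
{
        fd_set readfds;
        struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);

        return select(fd + 1, &readfds, NULL, NULL, &tv);
}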
Read this: http://www.minix3.org/doc/
It's a very clear, very easy to understand explanation. It generally applies to Linux as well.
