How to keep linux flock(2) from starving exclusive lock requests? - linux

I am using flock(2) in linux to control access to resources in a homespun database, using both shared and exclusive locking modes. I find that if a shared lock is granted, then another process can also get a shared lock, regardless of whether there are blocked processes waiting for exclusive locks. This means that for a popular resource with many overlapping readers, an exclusive lock request can starve for a long time, perhaps forever.
This behavior does not contradict the flock(2) man page, but it surprises me because this code has been working for years in FreeBSD and OS-X without a problem. My guess is that BSD systems must implement some kind of queue to prevent exclusive locks from starving forever.
My primary question is, is there any simple trick or programming pattern to keep my exclusive locks from starving?
A secondary question, to satisfy my curiousity, does anyone know if this really is different on BSD systems like I suspect?

I had exactly same problem on FreeBSD 7.2 and found no way to prevent writer starvation on flock(). You have to choose other locking method, like SysV IPC or simple stop-flag file.

Related

Do all types of interprocess/interthread communication need system calls?

In Linux,
do all types of interprocess communication need system calls?
Types of interprocess communication are such as
Pipes
Signals
Message Queues
Semaphores
Shared Memory
Sockets
Do all types of interthread communication need system calls?
I would like to know if all interprocess communications and interthread communications involve switching from user mode to kernel mode so that the OS kernel will run to perform the communications? Since system calls all involve such switch, I asked if the communications need system calls.
For example, "Shared memory" can be used for both interprocess and interthread communcations, but i am not sure if it requires system calls or involvement of OS kernel to take over the cpu to perform something.
Thanks.
For interprocess communication I am pretty sure you cannot avoid system calls.
For interthread communication I cannot give you a definitive answer, but my educated guess would be "yes-and-no". You see, you can communicate between threads using thread-safe queues, and the only thing that a thread-safe queue needs in order to work is a lock. If a lock is unavailable at the moment that a thread wants to obtain it, then of course the system must be involved in order to put the thread in a waiting mode. But if the lock is available to obtain, then the thread should be able to proceed without the need for any system call.
That's what I would guess, and I would be quite disappointed to find out that things do not actually work this way, because that would mean that code which I have up until now been considering pretty innocent in fact has a tremendous additional hidden overhead.
Yes, every IPC was set by some syscalls(2).
It might happen that some IPC was set by a previous program (e.g. the program in the same process before execve), for example when running a pipeline like ls | ./yourprog it is the shell which has called pipe(2), not yourprog.
Since threads -in the same process- (by definition) share a common address space they can communicate using some shared data. However, they often need some syscall for synchronization (e.g. with mutexes), see e.g. futex(7) - because you want to avoid spinlocks (i.e. wasting CPU power for waiting). But in practice you should use pthreads(7)
In practice you cannot use shared memory (like shm_overview(7)) without synchronization (e.g. with semaphores, see sem_overview(7)). Notice that cache coherence is tricky and makes memory model sometimes non-intuitive (and processor specific).
At least, you do not need a system call for each read/write to shared memory. Setting up shared memory will for sure and synchronizing threads/processes will often involve system calls.
You could use flags in shared memory for synchronization, but note that read and write of flags may not be atomic actions.
(For example if you set up a location in shared memory to be 0 in the beginning and then check for it to be non-zero, while the other process sets it to non-zero when ready for something)

Why are thread locks resources?

I recently read that thread locks are system resources, therefore they have to be properly released "just like memory". I realised I wasn't really aware of this.
Can someone offer some further elaboration on this fact or point to a good reference? More specifically: how can I think about the implementation of locks at a deeper system level? What are the possible consequences of leaking locks? Is there a maximum number of locks available in the system?
All that means is that you have to be careful that anything you lock gets released, similar to how you would be careful to close network connections or files or graphic device contexts or whatever. If you write code that is not careful about that, then you risk having the program deadlock or be unable to progress when it can't get access to something that's locked (because the point of locking is to make sure multiple threads can access something safely, so if one thread leaves something locked other threads that need to access it are shut out).
The program will have severe performance issues a long time before it runs out of physical locks, so typically you shouldn't have to worry about the number of available locks.

How can I synchronized two process accessing on the same resources?

I have two processes which access to the same physical memory(GPIO data addr).
So how can I have synchronized between these apps?
I understand that we have some kind of locking mechanism such as mutex and semaphore, so which method is the fastest?
Thank for your help,
-nm
Mutexes and semaphores are generally considered to be concurrency solutions in the same address space -- meaning that different parts of the same program will lock their access to a resource using one of these contraptions.
When you're dealing with separate processes, the standard way to do this on Linux is to create something in /var/lock, like /var/lock/myapp.lock, and place your PID followed by a newline in it. Then other processes will check for its existence, and if you're crafty check the PID to make sure it's still alive, too.
If you need real-time access to the area, skip the filesystem and the processes will have to communicate via IPC (LET_ME_KNOW_WHEN_DONE, OKAY_IM_DONE, you get the idea), or -- better -- write a process whose sole purpose is to read and write to the GPIO memory, and your other programs communicate with it via IPC (probably the best approach).
mutex means mutual exclusion -- a semaphore is just a variable used to determine if the resource is in use. In windows, there is a Mutex object that can be created to protect a shared resource.
The issue is what language are you using? What OS (I am assuming linux). Most languages provide support for multi-threading and mutual exclusion, and you should use the built-in constructs.
For example, using C on Linux, you might want to
include semaphore.h
and look up the calls for sem_init, sem_wait etc.

What interprocess locking calls should I monitor?

I'm monitoring a process with strace/ltrace in the hope to find and intercept a call that checks, and potentially activates some kind of globally shared lock.
While I've dealt with and read about several forms of interprocess locking on Linux before, I'm drawing a blank on what to calls to look for.
Currently my only suspect is futex() which comes up very early on in the process' execution.
Update0
There is some confusion about what I'm after. I'm monitoring an existing process for calls to persistent interprocess memory or equivalent. I'd like to know what system and library calls to look for. I have no intention call these myself, so naturally futex() will come up, I'm sure many libraries will implement their locking calls in terms of this, etc.
Update1
I'd like a list of function names or a link to documentation, that I should monitor at the ltrace and strace levels (and specifying which). Any other good advice about how to track and locate the global lock in mind would be great.
If you can start monitored process in valgrind, then there are two projects:
http://code.google.com/p/data-race-test/wiki/ThreadSanitizer
and Helgrind
http://valgrind.org/docs/manual/hg-manual.html
Helgrind is aware of all the pthread
abstractions and tracks their effects
as accurately as it can. On x86 and
amd64 platforms, it understands and
partially handles implicit locking
arising from the use of the LOCK
instruction prefix.
So, this tools can detect even atomic memory accesses. And they will check pthread usage
flock is another good one
There are many system calls can be used for locking: flock, fcntl, and even create.
When you are using pthreads/sem_* locks they may be executed in user space so you'll never
see them in strace as futex is called only for pending operations. Like when you actually
need to wait.
Some operations can be done in user space only - like spinlocks - you'll never see them
unless they do some waits for timer - backoff so you may see only stuff like nanosleep when one lock waits for other.
So there is no "generic" way to trace them.
on systems with glibc ~ >= 2.5 (glibc + nptl) you can use process shared
semaphores (last parameter to sem_init), more precisely, posix unnamed semaphores
posix mutexes (with PTHREAD_PROCESS_SHARED to pthread_mutexattr_setpshared)
posix named semaphores (got from sem_open/sem_unlink)
system v (sysv) semaphores: semget, semop
On older systems with glibc 2.2, 2.3 with linuxthreads or on embedded systems with uClibc you can use ONLY system v (sysv) semaphores for iterprocess communication.
upd1: any IPC and socker must be checked.

Why lock may become a bottleneck of multithreaded program?

Why lock may become a bottleneck of multithreaded program?
If I want my queue frequently pop() and push() by multithread,
which lock should I use?
The lock you use depends on your platform but will generally be some flavour of mutex. On windows, you would use a critical section and in .NET, you'd use a monitor. I'm not very familiar with locking mechanisms on other platforms. I'd stay away from lock free approaches. They are very difficult to program correctly and the performance gains are often not as great as you would expect.
Locks become a bottleneck in your program when they are under heavy contention. That is, a very large number of threads all try to acquire the lock at the same time. This wastes a lot of CPU cycles as threads become blocked and the OS spends a greater and greater portion of its time switching between threads. This sort of problem most frequently manifests itself in the server world. For desktop applications, it's rare that locks will cause a performance issue.
"Why lock may become a bottleneck of multithreaded program?" - think of a turnstile (also called a baffle gate), which only lets one person through at a time, with a crowd of people waiting to go through it.
For a queue, use the simplest lock your environment has to offer.
For a queue, it is easy to write a lock-free implementation (google away)
Locks are bottlenecks because they force all other threads which encounter them to stop doing what they're doing and wait for the lock to open, thus wasting time. One of the ideas behind multithreading is to use as many processors as possible at any given time. By forcing threads to wait on the locks the application essentially gives up processing power which it might have used.
"Why lock may become a bottleneck of multithreaded program?"
Because waiting threads remain blocked until shared memory is unlocked.
Suggest you read this article on "Concurrency: What Every Dev Must Know About Multithreaded Apps" http://msdn.microsoft.com/en-au/magazine/cc163744.aspx
Locks are expensive both because they require operating system calls in the middle of your algorithm and because they are hard to do properly when creating the CPU.
As a programmer, it is best to leave the locks in the middle of your data structures to the experts and instead use a good multithreaded library such as Intel's TBB
For Queues, you would want to use Atomic instructions (hard) or a spinlock (easier) if possible because they are cheap compared to a mutex. Use a mutex if you are doing a lot of work that needs to be locked, i.e modify a complex tree structure
In the threading packages that I'm familiar with, your options for mutexes are recursive and non-recursive. You should opt for non-recursive -- all of your accesses will be lock(); queue_op(); unlock(), so there's no need to be able to acquire the lock twice.

Resources