Why are thread locks resources?

I recently read that thread locks are system resources and therefore have to be properly released, "just like memory". I realised I wasn't really aware of this.
Can someone offer some further elaboration on this fact or point to a good reference? More specifically: how can I think about the implementation of locks at a deeper system level? What are the possible consequences of leaking locks? Is there a maximum number of locks available in the system?

All that means is that you have to be careful that anything you lock gets released, just as you would be careful to close network connections, files, or graphics device contexts. If you write code that is not careful about that, you risk having the program deadlock, or be unable to make progress, when it can't get access to something that's locked. (The point of locking is to let multiple threads access something safely, so if one thread leaves something locked, other threads that need it are shut out.)
The program will have severe performance issues long before it runs out of physical locks, so you typically shouldn't have to worry about the number of locks available in the system.
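For a concrete picture, here is a minimal C/pthreads sketch (the account state is invented for illustration): every path out of the function must release the mutex, just as every allocation needs a matching free.

#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state, for illustration only. */
static pthread_mutex_t account_mutex = PTHREAD_MUTEX_INITIALIZER;
static long balance = 0;

bool withdraw(long amount)
{
    pthread_mutex_lock(&account_mutex);
    if (balance < amount)
    {
        /* Early return: unlock here too, or the lock is leaked. */
        pthread_mutex_unlock(&account_mutex);
        return false;
    }
    balance -= amount;
    pthread_mutex_unlock(&account_mutex);
    return true;
}

In languages with scoped constructs (C++'s lock_guard, C#'s lock statement) the release on every path is handled for you; in plain C you have to write it yourself.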

Related

posix: interprocess lock abandoned, is there a better way?

I'm coding on AIX, but looking for a general 'nix solution, POSIX-compliant ideally. I can't use anything in C++11 or later.
I have shared memory with many threads from many processes involved. The data in shared memory has to stay self-consistent, so I need a lock, to get everyone to take turns.
Processes crashing with the lock is a thing, so I have to be able to detect an abandoned lock, fix (aka reset) the data, and move on. Twist: deciding the lock is abandoned by waiting for it for some fixed period is not a viable solution.
A global mutex (either living in shared memory, or named) appears not to be a solution. There's no detection mechanism for abandonment (except timing), and even then you can't delete and re-create the mutex without risking undefined behaviour.
So I opted for lockf() and a busy flag - get the file lock, set the flag in shared memory, do stuff, unset the flag, drop the lock. On a crash with the lock owned, the lock is automatically dropped, and the next guy to get it can see the busy flag is still set, and knows he has to clean up a mess.
This doesn't work - because lockf() will keep threads from other processes out, but it has special semantics for other threads in your own process. It lets them through unchecked.
In the end I came up with a two step solution - a local (thread) mutex and a file lock. Get the local mutex first; now you're the only thread in this process doing the next step, which is lockf(). lockf() in turn guarantees you're the only process getting through, so now you can set the busy flag and do the work. To unlock, go in reverse order: clear the busy flag, drop the file lock, drop the mutex lock. In a crash, the local mutex vanishes when the process does, so it's harmless.
Works fine. I hate it. Using two locks nested like this strikes me as expensive, and takes a page worth of comments in the code to explain. (My next code review will be interesting). I feel like I missed a better solution. What is it?
Edit: @Matt I probably wasn't clear. The busy flag isn't part of the locking mechanism; it's there to indicate when some process successfully acquired the lock(s). If, after acquiring the locks, you see the busy flag is already set, it means some other process got the locks and then crashed, leaving the shared memory it was in the middle of writing to in an incomplete state. In that case the thread now in possession of the lock gets the job of re-initializing the shared memory to a usable state. I probably should have called it a "memoryBeingModified" flag.
No variation of "tryLock" is going to be permissible. Polling is absolutely out of the question in this application. Threads that need to modify shared memory may only block on the locks (which are never held long) and have to take their turn as soon as the lock is available to them. They have to experience the minimum possible delay.
You can just:

// always returns true unless something horrible happened
bool lock()
{
    if (pthread_mutex_lock(&local_mutex) == 0)
    {
        // lockf() blocks until the file lock is ours; it returns 0 on success
        if (lockf(global_fd, F_LOCK, 0) == 0)
            return true;
        // file lock failed: don't leave the mutex held
        pthread_mutex_unlock(&local_mutex);
    }
    return false;
}

void unlock()
{
    lockf(global_fd, F_ULOCK, 0);
    pthread_mutex_unlock(&local_mutex);
}
This seems pretty straightforward to me, and I wouldn't feel too bad about using 2 levels of lock -- the pthread_mutex is quite fast and consumes almost no resources.
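For completeness, here is one way the two handles above might be initialized. This is only a sketch: the lock-file path and the function name are made up for illustration.

#include <fcntl.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t local_mutex = PTHREAD_MUTEX_INITIALIZER;
static int global_fd = -1;

/* Call once per process, before any lock()/unlock(). */
bool init_locks(void)
{
    /* The file holds no data; it exists only to be lockf()'d.
       "/tmp/myapp.lock" is a hypothetical path. */
    global_fd = open("/tmp/myapp.lock", O_RDWR | O_CREAT, 0666);
    return global_fd != -1;
}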
The simple answer is that there's no good solution. On AIX, lockf turns out to be extremely slow, for no good reason. But mutexes in shared memory, while very fast on any platform, are fragile: anyone can crash while holding the lock, and there's no recovery from that. It would be nice if POSIX defined a "this mutex is held by a thread/process that died" error, but it doesn't, and even if there were such an error code, there would be no way to repair things and continue. Using shared memory with multiple readers and writers continues to be the wild west.

Synchronization of threads slows down a multithreaded application

I have a multithreaded application written in C#. What I noticed is that implementing thread synchronization with lock(this) slows the application down by 20%. Is that expected behavior, or should I look at the implementation more closely?
Locking does add some overhead; that can't be avoided. It is also very likely that some of your threads will now be waiting on resources to be released, rather than just grabbing them whenever they feel like it. If you implemented thread synchronization correctly, then that is a good thing.
But in general, your question can't be answered without intimate knowledge of the application. A 20% slowdown might be OK, but you might be locking too broadly, and then the program would (in general) be slower.
Also, please don't use lock(this). If your instance is passed around and someone else locks on the reference, you risk a deadlock. Best practice is to lock on a private object that no one else can access, as sketched below.
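The same principle carries over to C, sketched here with pthreads (the counter module is invented for the example): keep the mutex file-static so no outside code can ever lock it.

#include <pthread.h>

/* The mutex is private to this file, so no caller can lock it
   behind our back -- the C analogue of locking on a private object. */
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

long counter_increment(void)
{
    pthread_mutex_lock(&counter_mutex);
    long value = ++counter;
    pthread_mutex_unlock(&counter_mutex);
    return value;
}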
Depending on how coarse or granular your lock() statements are, you can indeed impact the performance of your MT app. Only lock things you really know are supposed to be locked.
Any synchronization will slow down multithreading.
That being said, lock(this) is really never a good idea. You should always lock on a private object used for nothing but synchronization when possible.
Make sure to keep your locking to a minimum, and only hold the lock for as short a time as possible. This will help keep the "slowdown" to a minimum.
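For instance (a C/pthreads sketch with made-up names), do the slow work outside the critical section and hold the lock only for the shared update:

#include <pthread.h>

static pthread_mutex_t sum_mutex = PTHREAD_MUTEX_INITIALIZER;
static double shared_sum = 0.0;

void accumulate(const double *samples, int n)
{
    /* Slow, thread-local work runs with no lock held... */
    double local = 0.0;
    for (int i = 0; i < n; i++)
        local += samples[i];

    /* ...so the lock guards only this brief shared update. */
    pthread_mutex_lock(&sum_mutex);
    shared_sum += local;
    pthread_mutex_unlock(&sum_mutex);
}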
There are performance counters you can monitor in Windows to see how much time your application spends contending for locks.

Is a lock (threading) atomic?

This may sound like a stupid question, but if one locks a resource in a multithreaded app, is the operation that happens on that resource atomic?
I.e., can the processor be interrupted, or can a context switch occur, while that resource has a lock on it? If it can, then nothing else can access this resource until the locking thread is scheduled back in to finish off its work. Sounds like an expensive operation.
The processor can very definitely still switch to another thread, yes. Indeed, in most modern computers there can be multiple threads running simultaneously anyway. The locking just makes sure that no other thread can acquire the same lock, so you can make sure that an operation on that resource is atomic in terms of that resource. Code using other resources can operate completely independently.
You should usually lock for short operations wherever possible. You can also choose the granularity of locks... for example, if you have two independent variables in a shared object, you could use two separate locks to protect access to those variables. That will potentially provide better concurrency - but at the same time, more locks means more complexity and more potential for deadlock. There's always a balancing act when it comes to concurrency.
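A sketch of that granularity point in C/pthreads (the hit/miss counters are invented for the example): each independent variable gets its own mutex, so updates to one never block updates to the other.

#include <pthread.h>

static pthread_mutex_t hits_mutex = PTHREAD_MUTEX_INITIALIZER;
static long hits = 0;

static pthread_mutex_t misses_mutex = PTHREAD_MUTEX_INITIALIZER;
static long misses = 0;

void record_hit(void)
{
    pthread_mutex_lock(&hits_mutex);
    hits++;
    pthread_mutex_unlock(&hits_mutex);
}

void record_miss(void)
{
    pthread_mutex_lock(&misses_mutex);
    misses++;
    pthread_mutex_unlock(&misses_mutex);
}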
You're exactly right. That's one reason why it's so important to lock for short periods of time. However, this isn't as bad as it sounds, because no other thread that's waiting on the lock will get scheduled until the thread holding the lock releases it.
Yes, a context switch can definitely occur.
This is exactly why it is important, when accessing a shared resource, to lock it against other threads as well. When thread A has the lock, thread B cannot enter the locked code.
For example if two threads run the following code:
1. lock(l);
2. -- change shared resource S here --
3. unlock(l);
A context switch can occur after step 1, but the other thread cannot take the lock at that time, and therefore cannot change the shared resource. If one of the threads accesses the shared resource without the lock, bad things can happen!
Regarding the wastefulness: yes, it is a wasteful method. This is why there are approaches that try to avoid locks altogether. These approaches are called lock-free, and many of them are built on atomic primitives such as CAS (compare-and-swap).
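A minimal example of the CAS idea, using C11 atomics (a simple lock-free counter; anything more elaborate gets hard fast):

#include <stdatomic.h>

static _Atomic long counter = 0;

void lockfree_increment(void)
{
    long old = atomic_load(&counter);
    /* If another thread changed the value between our load and our
       store attempt, the CAS fails, refreshes 'old', and we retry. */
    while (!atomic_compare_exchange_weak(&counter, &old, old + 1))
        ;
}

(For a plain counter, atomic_fetch_add would be simpler; the CAS retry loop is shown because it is the general pattern the answer describes.)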
No, it's not really expensive. There are typically only two possibilities:
1) The system has other things it can do: In this case, the system is still doing useful work with all available cores.
2) The system doesn't have anything else to do: In this case, the thread that holds the lock will be scheduled. A sane system won't leave a core unused while there's a ready-to-run thread that's not scheduled.
So, how can it be expensive? If there's nothing else for the system to do that doesn't require acquiring that lock (or not enough other work to occupy all cores), and the thread holding the lock isn't ready to run, cores sit idle. That's the case you have to avoid; a mere context switch or pre-emption isn't the problem, since a pre-empted thread is still ready to run.

fork in multi-threaded program

I've heard that mixing forking and threading in a program can be very problematic, often resulting in mysterious behavior, especially when dealing with shared resources such as locks, pipes, and file descriptors. But I never fully understood what exactly the dangers are and when they can happen. It would be great if someone with expertise in this area could explain in more detail what the pitfalls are and what needs care when programming in such an environment.
For example, if I want to write a server that collects data from various sources, one solution I've thought of is to have the server spawn a set of threads, each calling popen to run another program that does the actual work and opening a pipe to get the data back from the child. Each of these threads is responsible for its own work, with no data interchange between them, and when the data is collected the worker threads just put the results in a queue owned by the main thread. What could go wrong with this solution?
Please don't narrow your answer by just "answering" my example scenario. Any suggestions, alternative solutions, or experiences that aren't related to the example but would help toward a clean design would be great! Thanks!
The problem with forking when you do have some threads running is that the fork only copies the CPU state of the one thread that called it. It's as if all of the other threads just died, instantly, wherever they may be.
The result of this is locks aren't released, and shared data (such as the malloc heap) may be corrupted.
pthread does offer a pthread_atfork function - in theory, you could take every lock in the program before forking, release them after, and maybe make it out alive - but it's risky, because you could always miss one. And, of course, the stacks of the other threads won't be freed.
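A sketch of the pthread_atfork idiom that answer describes, under the optimistic assumption that big_lock is the only lock in play (real programs have many, which is exactly why the approach is risky):

#include <pthread.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

static void prepare(void)   { pthread_mutex_lock(&big_lock); }   /* parent, before fork */
static void in_parent(void) { pthread_mutex_unlock(&big_lock); } /* parent, after fork */
static void in_child(void)  { pthread_mutex_unlock(&big_lock); } /* child, after fork */

void install_fork_handlers(void)
{
    /* fork() now happens with big_lock held and released on both
       sides, so the child never inherits it in a locked state. */
    pthread_atfork(prepare, in_parent, in_child);
}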
It is really quite simple: the problems with multiple threads and processes always arise from shared data. If there is no shared data, then no issues can arise.
In your example, the shared data is the queue owned by the main thread; any potential contention or race conditions will arise there. Typical methods for "solving" these issues involve locking schemes: a worker thread locks the queue before inserting any data, and the main thread locks the queue before removing it, as in the sketch below.
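A minimal C/pthreads sketch of that scheme (fixed capacity and names invented for the example; initialize the mutex with pthread_mutex_init or the static initializer):

#include <pthread.h>
#include <stdbool.h>

#define QCAP 64

typedef struct
{
    pthread_mutex_t mutex;   /* guards everything below */
    void *items[QCAP];
    int head, tail, count;
} queue_t;

/* Called by worker threads. */
bool queue_push(queue_t *q, void *item)
{
    pthread_mutex_lock(&q->mutex);
    bool ok = q->count < QCAP;
    if (ok)
    {
        q->items[q->tail] = item;
        q->tail = (q->tail + 1) % QCAP;
        q->count++;
    }
    pthread_mutex_unlock(&q->mutex);
    return ok;
}

/* Called by the main thread. */
bool queue_pop(queue_t *q, void **item)
{
    pthread_mutex_lock(&q->mutex);
    bool ok = q->count > 0;
    if (ok)
    {
        *item = q->items[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
    }
    pthread_mutex_unlock(&q->mutex);
    return ok;
}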

Why can a lock become a bottleneck in a multithreaded program?

Why can a lock become a bottleneck in a multithreaded program?
And if I want a queue that multiple threads frequently pop() and push(), which lock should I use?
The lock you use depends on your platform but will generally be some flavour of mutex. On Windows, you would use a critical section, and in .NET you'd use a monitor. I'm not very familiar with locking mechanisms on other platforms. I'd stay away from lock-free approaches. They are very difficult to program correctly, and the performance gains are often not as great as you would expect.
Locks become a bottleneck in your program when they are under heavy contention. That is, a very large number of threads all try to acquire the lock at the same time. This wastes a lot of CPU cycles as threads become blocked and the OS spends a greater and greater portion of its time switching between threads. This sort of problem most frequently manifests itself in the server world. For desktop applications, it's rare that locks will cause a performance issue.
"Why lock may become a bottleneck of multithreaded program?" - think of a turnstile (also called a baffle gate), which only lets one person through at a time, with a crowd of people waiting to go through it.
For a queue, use the simplest lock your environment has to offer.
For a queue, it is also easy to write a lock-free implementation (google away), at least in the single-producer, single-consumer case; see the sketch below.
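Here is that single-producer/single-consumer case as a C11 sketch (names and sizes invented). It is only safe with exactly one producer thread and one consumer thread; the multi-producer, multi-consumer case is much harder to get right.

#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 256   /* must be a power of two */

/* The producer writes only 'head'; the consumer writes only 'tail'. */
typedef struct
{
    void *slots[RING_SIZE];
    _Atomic unsigned head;   /* next slot to fill (producer) */
    _Atomic unsigned tail;   /* next slot to drain (consumer) */
} ring_t;

bool ring_push(ring_t *r, void *item)   /* producer thread only */
{
    unsigned h = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned t = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (h - t == RING_SIZE)
        return false;                   /* full */
    r->slots[h % RING_SIZE] = item;
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
    return true;
}

bool ring_pop(ring_t *r, void **item)   /* consumer thread only */
{
    unsigned t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (h == t)
        return false;                   /* empty */
    *item = r->slots[t % RING_SIZE];
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return true;
}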
Locks are bottlenecks because they force all other threads that encounter them to stop what they're doing and wait for the lock to open, thus wasting time. One of the ideas behind multithreading is to use as many processors as possible at any given time; by forcing threads to wait on locks, the application gives up processing power it might otherwise have used.
"Why lock may become a bottleneck of multithreaded program?"
Because waiting threads remain blocked until shared memory is unlocked.
Suggest you read this article on "Concurrency: What Every Dev Must Know About Multithreaded Apps" http://msdn.microsoft.com/en-au/magazine/cc163744.aspx
Locks are expensive both because they can require operating system calls in the middle of your algorithm and because they are hard to implement correctly, even down at the CPU level.
As a programmer, it is best to leave the locks in the middle of your data structures to the experts and instead use a good multithreaded library such as Intel's TBB
For queues, you would want to use atomic instructions (hard) or a spinlock (easier) if possible, because they are cheap compared to a mutex; a spinlock sketch follows below. Use a mutex if you are doing a lot of work that needs to be locked, e.g. modifying a complex tree structure.
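A sketch of the spinlock option using POSIX spinlocks, where available (the queue operation itself is elided, and the names are invented):

#include <pthread.h>

static pthread_spinlock_t qlock;

void qlock_init(void)
{
    /* PTHREAD_PROCESS_PRIVATE: shared only between threads of one process. */
    pthread_spin_init(&qlock, PTHREAD_PROCESS_PRIVATE);
}

void guarded_queue_op(void)
{
    pthread_spin_lock(&qlock);   /* busy-waits instead of sleeping */
    /* ... short queue operation goes here ... */
    pthread_spin_unlock(&qlock);
}

Spinning only pays off when the critical section is a handful of instructions; otherwise a mutex is the better default.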
In the threading packages that I'm familiar with, your options for mutexes are recursive and non-recursive. You should opt for non-recursive -- all of your accesses will be lock(); queue_op(); unlock(), so there's no need to be able to acquire the lock twice.
