Do I need to lock an object when reading from it? - multithreading

I am writing a program where there is an object shared by multiple threads:
A) Multiple write threads write to the object (all running the same function)
B) A read thread which accesses the object every 5 seconds
C) A read thread which accesses the object when there is a user request
It is obviously necessary to lock the object when writing to it, as we do not want multiple threads to write to the object at the same time.
My questions are:
Is it also necessary to lock the object when reading from it?
Am I correct to think that if we just lock the object when writing, a critical section is enough; but if we lock the object when reading or writing, a mutex is necessary?
I am asking this question because in Microsoft Office, it is not possible for two instances of Word to access a document in read/write mode; but while the document is open in read/write mode, it is possible to open another instance of Word to access the document in read-only mode. Would the same logic apply to threading?

As Ofir already wrote - if you try to read data from an object that some other thread is modifying - you could get data in an inconsistent state.
But - if you are sure the object is not being modified, you can of course read it from multiple threads. In general, the question you are asking is more or less the readers-writers problem - see http://en.wikipedia.org/wiki/Readers-writers_problem
Lastly - a critical section is an abstract term and can be implemented using a mutex or a monitor. The syntactic sugar for a critical section in Java or C# (synchronized, lock) uses a monitor under the covers.

Is it also necessary to lock the object when reading from it?
If something else could write to it at the same time - yes. If only another read could occur - no. In your circumstances, I would say - yes.
Am I correct to think that if we just lock the object when writing, a critical section is enough; but if we lock the object when reading or writing, a mutex is necessary?
No, you can use a critical section for both, other things being equal. Mutexes have added features over critical sections (a named mutex can be used across multiple processes, for example), but I don't think you need such features here.
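To make this concrete, here is a minimal sketch of the questioner's scenario, assuming C++ (std::mutex standing in for the critical section; the SharedObject type and its members are illustrative): the same lock is taken for reads and writes alike, so readers can never observe a half-written state.

    #include <mutex>

    // Hypothetical shared object: one lock guards both reads and writes.
    struct SharedObject {
        std::mutex m;
        int data = 0;

        void write(int v) {                      // the write threads (A)
            std::lock_guard<std::mutex> lock(m);
            data = v;
        }

        int read() {                             // the read threads (B and C)
            std::lock_guard<std::mutex> lock(m);
            return data;
        }
    };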

It is necessary, because otherwise (unless operations are atomic) you may be reading an intermediate state.
You may want to allow multiple readers at the same time, which requires a (slightly) more complex kind of lock.
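One such slightly more complex lock is a readers-writer lock; a minimal sketch using C++17's std::shared_mutex (the type and member names are illustrative):

    #include <shared_mutex>

    struct RwObject {
        mutable std::shared_mutex m;
        int data = 0;

        void write(int v) {
            std::unique_lock<std::shared_mutex> lock(m);  // exclusive: one writer
            data = v;
        }

        int read() const {
            std::shared_lock<std::shared_mutex> lock(m);  // shared: many readers
            return data;
        }
    };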

It depends on how you use and read it. If your read is atomic (i.e., it won't be interrupted by a write) and the read thread has no dependency on the write threads, then you may be able to skip the read lock. But if your read operation takes some time and involves heavy interaction with the object, then you should lock it for reading.
If your reading does not take very long (i.e., it won't delay the write threads too much), a critical section should be enough.

Locking is only needed when two processes can change the same database table elements.
When you want to read data, it is always safe: you read from a consistent version of the database. The process changing the data has a shadow version which is consistent and will overwrite the current data when it is saved. But if your reading process depends on critical values from database elements, you should look for locks indicating that those values are likely to be altered, so that your reading is delayed until the lock is gone.

Related

Do I always need to lock before a read even if I write to the memory only very occasionally?

I have a shared data structure that is read in one thread and modified in another thread. However, its data changes only very occasionally; most of the time it is read by the thread. I currently have a mutex (or RW lock) locked before each read/write and unlocked after the read/write.
Because the data changes so rarely, locking and unlocking on every read seems inefficient. If no change is made to the data, I could get rid of the lock, because concurrent reads of the same structure are safe without one.
My question is:
Is there a lock-free solution that allows me to change the data without a lock?
Or, does the lock-unlock in a read (a single thread, in other words no contention) cost much time/resources (no entering the kernel) at all?
If there's no contention, no kernel call is needed, but an atomic lock acquisition is still needed. If the resource is occupied for only a short period of time, spinning can be attempted before making the kernel call.
Mutex and RW lock implementations, such as (any decent implementation of) std::mutex / std::shared_mutex in C++ or CRITICAL_SECTION / SRWLOCK in Windows, already employ the above-mentioned techniques on their own. Linux mutexes are usually based on futex, so they also avoid the kernel call when it is not needed. So you don't need to bother saving a kernel call yourself.
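For illustration only, here is a toy lock showing the spin-before-blocking idea (real implementations such as futex-based mutexes or CRITICAL_SECTION block in the kernel when spinning fails; this simplified sketch merely yields, and the spin count of 4000 is an arbitrary illustrative value):

    #include <atomic>
    #include <thread>

    class SpinThenYieldLock {
        std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
    public:
        void lock() {
            // First try a bounded number of cheap spins (no kernel call).
            for (int i = 0; i < 4000; ++i)
                if (!flag_.test_and_set(std::memory_order_acquire))
                    return;                       // acquired while spinning
            // Contended for too long: back off by yielding the CPU.
            while (flag_.test_and_set(std::memory_order_acquire))
                std::this_thread::yield();
        }
        void unlock() {
            flag_.clear(std::memory_order_release);
        }
    };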
And there are alternatives to locking. There are atomic types that can be accessed using lock-free reads and writes; they can avoid locks entirely. There are other patterns, such as SeqLock. There is transactional memory.
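A minimal sketch of the atomic-types alternative (the names are illustrative):

    #include <atomic>

    // A word-sized value read and written lock-free from any thread.
    std::atomic<int> shared_value{0};

    void writer() { shared_value.store(42, std::memory_order_release); }
    int  reader() { return shared_value.load(std::memory_order_acquire); }

    // On most platforms no hidden lock is involved (C++17 check):
    static_assert(std::atomic<int>::is_always_lock_free);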
But before going there, you should make sure that locking is actually a performance problem, because the use of atomics may not be simple (although it is simple for some languages and simple cases), and the other alternatives have their own pitfalls.
An uncontrolled data race may be dangerous, or maybe not, and there can be a very thin boundary between the cases where it is and where it is not. For example, copying a bunch of integers might only occasionally produce garbage values; if the integers are properly sized and aligned, you may only get a mix of old and new values, never a garbage value for any single integer; but add a more complex type, say a string, and you may get a crash. So most of the time an uncontrolled data race is treated as undefined behavior.

Is it safe to update an object in a thread without locks if other threads will not access it?

I have a vector of entities. On each update cycle I iterate through the vector and update each entity: read its position, calculate its current speed, write its updated position. Also, during the updating process I may change some other objects in other parts of the program, but each such object is related only to the current entity, and other entities will not touch it.
So, I want to run this code in threads. I separate the vector into a few chunks and update each chunk in a different thread. As I see it, the threads are fully independent: each thread on each iteration works with an independent memory region and doesn't affect the other threads' work.
Do I need any locks here? I assume that everything should work without any mutexes, etc. Am I right?
Short answer
No, you do not need any lock or synchronization mechanism, as your problem appears to be an embarrassingly parallel task.
Longer answer
A race condition can only appear if two threads access the same memory at the same time and at least one of the accesses is a write operation. If your program has this characteristic, then you need to make sure that threads access the memory in an ordered fashion. One way to do that is by using locks (it is not the only one, though). Otherwise the result is UB.
It seems that you found a way to split the work among your threads such that each thread can work independently from the others. This is the best-case scenario for concurrent programming, as it does not require any synchronization: the complexity of the code drops dramatically and the speedup is usually substantial.
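A minimal sketch of that chunked scheme (the Entity type and its fields are made up for illustration): each thread updates a disjoint range, so no locks are needed, and the join() calls provide the only synchronization.

    #include <algorithm>
    #include <cstddef>
    #include <thread>
    #include <vector>

    struct Entity { float x = 0, y = 0, vx = 1, vy = 1; };

    // Each thread owns a disjoint [begin, end) range of the vector, so
    // no two threads ever touch the same Entity.
    void update_range(std::vector<Entity>& es, std::size_t begin,
                      std::size_t end, float dt) {
        for (std::size_t i = begin; i < end; ++i) {
            es[i].x += es[i].vx * dt;
            es[i].y += es[i].vy * dt;
        }
    }

    int main() {
        std::vector<Entity> entities(100000);
        const unsigned n = std::max(1u, std::thread::hardware_concurrency());
        const std::size_t chunk = entities.size() / n;

        std::vector<std::thread> workers;
        for (unsigned t = 0; t < n; ++t) {
            std::size_t begin = t * chunk;
            std::size_t end = (t + 1 == n) ? entities.size() : begin + chunk;
            workers.emplace_back(update_range, std::ref(entities), begin, end, 0.016f);
        }
        for (auto& w : workers) w.join();  // all updates visible after join()
    }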
Please note that, as @acelent pointed out in the comment section, if you need changes made by one thread to be visible in another thread, then you might need some sort of synchronization, because depending on the memory model and on the hardware, changes made in one thread might not be immediately visible in the other.
This means that you might write to a variable from Thread 1, read the same memory from Thread 2 some time later, and still not see the write made by Thread 1.
"I separate vector into few chunks and update each chunk in different threads" - in this case you do not need any lock or synchronization mechanism, however, the system performance might degrade considerably due to false sharing depending on how the chunks are allocated to threads. Note that the compiler may eliminate false sharing using thread-private temporal variables.
You can find plenty of information in books and wiki. Here is some info https://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads
Also there is a stackoverflow post here does false sharing occur when data is read in openmp?
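A common fix, sketched below assuming 64-byte cache lines: align each thread's slot to its own cache line, so that neighbouring threads' writes do not invalidate each other's lines.

    #include <thread>
    #include <vector>

    // Each counter lives on its own (assumed 64-byte) cache line.
    struct alignas(64) PaddedCounter {
        long value = 0;
    };

    int main() {
        PaddedCounter counters[4];              // one slot per thread
        std::vector<std::thread> workers;
        for (int t = 0; t < 4; ++t)
            workers.emplace_back([&counters, t] {
                for (int i = 0; i < 1000000; ++i)
                    ++counters[t].value;        // no false sharing between slots
            });
        for (auto& w : workers) w.join();
    }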

Deciding the critical section of kernel code

Hi, I am writing kernel code which is intended to do process scheduling and multi-threaded execution. I've studied locking mechanisms and their functionality. Is there a rule of thumb regarding what sort of data structure in a critical section should be protected by locking (mutex/semaphores/spinlocks)?
I know that wherever there is a chance of concurrency in a part of the code, we require a lock. But how do we decide, and what if we miss something and the test cases don't catch it? Earlier I wrote code for system calls and file systems where I never cared about taking locks.
Is there a rule of thumb regarding what sort of data structure in a critical section should be protected by locking?
Any object (global variable, field of a structure object, etc.) that is accessed concurrently, where at least one access is a write, requires some locking discipline for access.
But how do we decide, and what if we miss something and the test cases don't catch it?
Good practice is an appropriate comment on every declaration of a variable, structure, or structure field that requires a locking discipline for access. Anyone who uses this variable reads the comment and writes the corresponding access code. The kernel core and modules tend to follow this strategy.
As for testing, common testing rarely reveals concurrency issues because of their low probability. When testing kernel modules, I would advise using Kernel Strider, which attempts to prove the correctness of concurrent memory accesses, or RaceHound, which increases the probability of concurrency issues appearing and checks for them.
It is always safe to grab a lock for the duration of any code that accesses any shared data, but this is slow since it means only one thread at a time can run significant chunks of code.
Depending on the data in question, though, there may be shortcuts that are safe and fast. If it is a simple integer (and by integer I mean the native word size of the CPU, i.e., not a 64-bit value on a 32-bit CPU), then you may not need to do any locking: if one thread tries to write to the integer and another reads it at the same time, the reader will either get the old value or the new value, never a mix of the two. If the reader doesn't care that it may get the old value, then there is no need for a lock.
If, however, you are updating two integers together, and it would be bad for the reader to get the new value for one and the old value for the other, then you need a lock. Another example is if the thread is incrementing the integer. That normally involves a read, an add, and a write. If one thread reads the old value, then the other manages to read, add, and write the new value, and then the first thread adds and writes its new value, both believe they have incremented the variable, but instead of being incremented twice it was only incremented once. This needs either a lock, or the use of an atomic increment primitive, to ensure that the read/modify/write cycle cannot be interrupted.
There are also atomic test-and-set primitives, so you can read a value, do some math on it, then try to write it back, where the write only succeeds if the variable still holds the original value. That is, if another thread changed it since the time you read it, the test-and-set will fail; you can then discard your new value and start over with a read of the value the other thread set, and try to test-and-set it again.
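A sketch of those two primitives using C++ std::atomic (the function names are illustrative):

    #include <atomic>

    std::atomic<int> counter{0};

    // Atomic increment: the read/modify/write cycle cannot be interleaved,
    // so two concurrent increments never collapse into one.
    void increment() {
        counter.fetch_add(1, std::memory_order_relaxed);
    }

    // Test-and-set style update: read, compute, and retry if another thread
    // changed the value in between. On failure, compare_exchange_weak
    // reloads 'expected' with the current value.
    void double_value() {
        int expected = counter.load();
        while (!counter.compare_exchange_weak(expected, expected * 2)) {
            // another thread won the race; recompute from the fresh value
        }
    }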
Pointers are really just integers, so if you set up a data structure and then store a pointer to it where another thread can find it, you don't need a lock, as long as you set up the structure fully before you store its address in the pointer. Another thread reading the pointer (it must make sure to read the pointer only once, i.e., by storing it in a local variable and then using only that to refer to the structure from then on) will either see the new structure or the old one, but never an intermediate state. If most threads only read the structure via the pointer, and any that want to write do so either with a lock or with an atomic test-and-set of the pointer, this is sufficient. Any time you want to modify any member of the structure, though, you have to copy it to a new one, change the new one, and then update the pointer. This is essentially how the kernel's RCU (read-copy-update) mechanism works.
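A sketch of that publish-by-pointer pattern (the Config type is illustrative; note that in C++ the shared pointer variable itself must be a std::atomic with release/acquire ordering to formally get the "fully constructed before visible" guarantee):

    #include <atomic>
    #include <string>

    struct Config { std::string host; int port; };

    std::atomic<Config*> g_config{nullptr};

    // Writer: build the structure completely, then publish the pointer.
    // The release store guarantees that a reader who sees the pointer
    // also sees the fully initialized structure.
    void publish() {
        Config* c = new Config{"example.com", 443};
        g_config.store(c, std::memory_order_release);
        // Freeing the *old* Config safely is the hard part (RCU grace
        // periods, hazard pointers, or reference counting); omitted here.
    }

    // Reader: load the pointer once into a local and use only that copy.
    void use() {
        if (Config* c = g_config.load(std::memory_order_acquire)) {
            int port = c->port;  // safe: *c was constructed before publication
            (void)port;
        }
    }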
Ideally, you should enumerate all the resources available in your system, the related threads, and the communication/sharing mechanisms during design. Determining the following for every resource, and maintaining a proper checklist whenever a change is made, can be of great help:
The duration for which the resource will be busy (utilization of the resource) and the type of lock
The number of tasks queued on that particular resource (load) and their priority
The type of communication/sharing mechanism related to the resource
Error conditions related to the resource
If possible, it is better to have a flow diagram depicting the resources, utilization, locks, load, communication/sharing mechanisms, and errors.
This process can help you determine the missing scenarios/unknowns and the critical sections, and also identify bottlenecks.
On top of the above process, you may also need certain tools that can help you with testing and further analysis, to rule out hidden problems, if any:
Helgrind - a Valgrind tool for detecting synchronisation errors. This can help in identifying data races/synchronization issues due to improper locking, lock ordering that can cause deadlocks, and also improper POSIX thread API usage that can have later impacts. Refer: http://valgrind.org/docs/manual/hg-manual.html
Locksmith - for determining common lock errors that may arise during runtime or that may cause deadlocks.
ThreadSanitizer - for detecting race conditions. It displays all the accesses and locks involved for each access.
Sparse - can help list the locks acquired and released by a function, and also identify issues such as mixing pointers to user address space with pointers to kernel address space.
Lockdep - for debugging locks.
iotop - for determining the current I/O usage by processes or threads on the system, by monitoring the I/O usage information output by the kernel.
LTTng - for tracing race conditions and possible interrupt cascades (a successor to LTT, combining kprobes, tracepoint, and perf functionality).
Ftrace - a Linux kernel internal tracer for analysing/debugging latency and performance issues.
lsof and fuser - can be handy in determining the processes holding locks and the kinds of locks.
Profiling can help in determining where exactly time is being spent by the kernel. This can be done with tools like perf and OProfile.
strace can intercept/record the system calls that are made by a process and the signals that are received by it. It shows the order of events and all the return/resumption paths of calls.

When should I use Shared Locking (Read Locking)

Can someone explain when shared locks should be used? If I understand correctly, shared locks are used for reading and exclusive locks are used for writing.
But why can't I just wait while the mutex is locked when doing a read?
It is for improving performance. Multiple concurrent reads then won't have to happen sequentially, which may be a great bonus if the structure is read frequently. (But the data read will still be consistent and up to date.)
But why can't I just wait while the mutex is locked when doing a read?
Usually for speed. Shared locks allow multiple readers, since the content is not changing. An exclusive lock allows only a single operation (typically a write) at a time, as you want all the writes to be atomic.
More technical definitions from here.
Exclusive locks protect updates to file resources, both recoverable and non-recoverable. They can be owned by only one transaction at a time. Any transaction that requires an exclusive lock must wait if another task currently owns an exclusive lock or a shared lock against the requested resource.
Shared locks
Shared locks support read integrity. They ensure that a record is not in the process of being updated during a read-only request. Shared locks can also be used to prevent updates of a record between the time that a record is read and the next syncpoint.
A shared lock (also known as a read lock) prohibits any other process from requesting a write lock on the specified part of the file. However, other processes can request read locks and keep reading from the resource.
An exclusive lock (also known as a write lock) gives a process exclusive access for writing to the specified part of the file, and prevents any other process from acquiring any kind of lock on the resource until the exclusive lock is released.
So a read lock says "you can read now but if you want to write you'll have to wait" whereas a write lock says "you'll have to wait".
For an official description, check out the locks documentation of the GNU C Library.
In addition, in many cases the context for shared/exclusive locking is pessimistic/optimistic locking, a methodology used to handle multi-user reads and writes to the same resource. Here is an explanation of the methodology.
But why can't I just wait while the mutex is locked when doing a read?
Because that would be inefficient in certain scenarios, specifically those where many threads read a data structure often but few threads write to it, and rarely. Since multiple concurrent reads are thread-safe as long as no one is writing, it would be a waste to enforce mutual exclusion among the readers.
Imagine a server and multiple clients doing various transactions on some shared data. If most of these clients are simply asking for information, but not changing anything, the server would have horrible performance if it only allowed one client to read at a time.

Real World Examples of read-write in concurrent software

I'm looking for real world examples of needing read and write access to the same value in concurrent systems.
In my opinion, many semaphores or locks are present because there's no known alternative (to the implementer), but do you know of any patterns where mutexes seem to be a requirement?
In a way, I'm asking for candidates for the standard set of HARD problems for concurrent software in the real world.
What kind of locks are used depends on how the data is being accessed by multiple threads. If you can fine-tune the use case, you can sometimes eliminate the need for exclusive locks completely.
An exclusive lock is needed only if your use case requires that the shared data must be 100% exact all the time. This is the default that most developers start with because that's how we think about data normally.
However, if what you are using the data for can tolerate some "looseness", there are several techniques to share data between threads without the use of exclusive locks on every access.
For example, if you have a linked list of data, and your use of that linked list would not be upset by seeing the same node multiple times in a traversal, or by not seeing an insert immediately after the insert (or similar artifacts), you can perform list inserts and deletes using atomic pointer exchange, without the need for a full-stop mutex lock around the insert or delete operation.
Another example: if you have an array or list object that is mostly read from by threads and only occasionally updated by a master thread, you could implement lock-free updates by maintaining two copies of the list: one that is "live" that other threads can read from, and another that is "offline" that you can write to in the privacy of your own thread. To perform an update, you copy the contents of the "live" list into the "offline" list, perform the update on the offline list, and then swap the offline list pointer into the live list pointer using an atomic pointer exchange.
You will then need some mechanism to let the readers "drain" from the now-offline list. In a garbage-collected system, you can just release the reference to the offline list - when the last consumer is finished with it, it will be GC'd. In a non-GC system, you could use reference counting to keep track of how many readers are still using the list. For this example, having only one thread designated as the list updater would be ideal. If multiple updaters are needed, you will need to put a lock around the update operation, but only to serialize updaters - no lock, and no performance impact, on readers of the list.
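A compact sketch of that live/offline scheme, assuming C++20 for std::atomic<std::shared_ptr<T>> so that reference counting drains the old list automatically:

    #include <atomic>
    #include <memory>
    #include <vector>

    // The "live" list that reader threads consult.
    std::atomic<std::shared_ptr<const std::vector<int>>> live{
        std::make_shared<const std::vector<int>>()};

    // Readers: grab a reference to whatever list is live right now; the
    // old list stays alive until the last reader drops its shared_ptr.
    int sum_reader() {
        std::shared_ptr<const std::vector<int>> snapshot = live.load();
        int sum = 0;
        for (int v : *snapshot) sum += v;
        return sum;
    }

    // Single updater thread: copy the live list, modify the offline copy,
    // then publish it with an atomic pointer swap.
    void append(int new_value) {
        auto offline = std::make_shared<std::vector<int>>(*live.load());
        offline->push_back(new_value);
        live.store(std::shared_ptr<const std::vector<int>>(std::move(offline)));
    }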
All the lock-free resource sharing techniques I'm aware of require the use of atomic swaps (aka InterlockedExchange). This usually translates into a specific instruction in the CPU and/or a hardware bus lock (lock prefix on a read or write opcode in x86 assembler) for a very brief period of time. On multiproc systems, atomic swaps may force a cache invalidation on the other processors (this was the case on dual proc Pentium II) but I don't think this is as much of a problem on current multicore chips. Even with these performance caveats, lock-free runs much faster than taking a full-stop kernel event object. Just making a call into a kernel API function takes several hundred clock cycles (to switch to kernel mode).
Examples of real-world scenarios:
Producer/consumer workflows. A web service receives HTTP requests for data and places each request into an internal queue; a worker thread pulls the work item from the queue and performs the work. The queue is read/write and has to be thread-safe.
Data shared between threads with change of ownership. Thread 1 allocates an object, tosses it to thread 2 for processing, and never wants to see it again. Thread 2 is responsible for disposing the object. The memory management system (malloc/free) must be thread safe.
File system. This is almost always an OS service and already fully thread safe, but it's worth including in the list.
Reference counting. Releases the resource when the number of references drops to zero. The increment/decrement/test operations must be thread-safe; these can usually be implemented using atomic primitives instead of full-stop kernel mutex locks (see the sketch below).
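A minimal sketch of such an atomic reference count (the RefCounted name is illustrative): fetch_sub returns the previous value, so exactly one thread observes the transition from 1 to 0 and frees the object.

    #include <atomic>

    struct RefCounted {
        std::atomic<int> refs{1};   // creator holds the first reference

        void add_ref() { refs.fetch_add(1, std::memory_order_relaxed); }

        void release() {
            // The thread that takes the count from 1 to 0 is the unique
            // owner at that point and may safely delete the object.
            if (refs.fetch_sub(1, std::memory_order_acq_rel) == 1)
                delete this;
        }

    protected:
        virtual ~RefCounted() = default;  // destroy only via release()
    };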
Most real-world concurrent software has some form of requirement for synchronization at some level. Often, better-written software will take great pains to reduce the amount of locking required, but it is still required at some point.
For example, I often do simulations where we have some form of aggregation operation occurring. Typically, there are ways to prevent locking during the simulation phase itself (i.e., use of thread-local state data, etc.), but the actual aggregation portion typically requires some form of lock at the end.
Luckily, this becomes a lock per thread, not per unit of work. In my case this is significant, since I'm typically doing operations on hundreds of thousands or millions of units of work, but most of the time it's occurring on systems with 4-16 PEs, which means I'm usually restricted to a similar number of units of execution. By using this type of mechanism, you're still locking, but you're locking across tens of elements instead of potentially millions.
