What does "readback" mean in terms of computer memory? - multithreading

I am messing with multiple threads accessing a resource (probably memory). What does "readback" mean in this context?
Any guides will be helpful... Google didn't give me any good results.

I can think of several possible meanings for "readback". Here's the most likely: in a multithreaded environment, a lot can happen between your thread reading a value from memory and writing a changed value back to that memory. A simple yet effective way to detect changes is to read the value from memory again just before writing; if it has changed from the value you started with, you know someone else changed it while you were working.
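Here is a minimal sketch of that detect-and-retry pattern using C++11 atomics (the names and the +10 operation are purely illustrative); compare_exchange_weak performs the readback and the conditional write as one atomic step:

#include <atomic>

std::atomic<int> shared_value{0};  // memory that several threads update

void add_ten(std::atomic<int>& v) {
    int observed = v.load();      // the value we start working with
    int desired = observed + 10;  // work based on the observed value
    // The write succeeds only if v still holds 'observed'; on failure,
    // 'observed' is refreshed with the current value and we retry.
    while (!v.compare_exchange_weak(observed, desired)) {
        desired = observed + 10;
    }
}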
"Readback" may also refer to "repeatable reads", in which a locking mechanism is used to ensure that within the scope of an atomic set of operations, only the thread that obtained the lock on the resource can read OR write to it, ensuring that no other thread can change the value from what would be expected by the task if it ran single-threaded. That way, a thread doesn't have to detect external changes; the locking mechanism prevents such a thing from happening.

When I've encountered that term, it's usually in the context of writing a value to a register or memory location that may also be accessed by some other software or hardware. To check whether someone else has changed it, you might keep a private copy of the data you wrote, and some time later read that shared register or memory location to compare its current value to the stored private copy. That's the "readback".
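A rough sketch of that idea in C++ (the register address and names are invented; real code would use the addresses from the hardware's datasheet):

#include <cstdint>

// Hypothetical memory-mapped control register; the address is made up.
volatile std::uint32_t* const CTRL_REG =
    reinterpret_cast<volatile std::uint32_t*>(0x40000000);

bool write_with_readback(std::uint32_t value) {
    *CTRL_REG = value;                   // write to the shared location
    std::uint32_t private_copy = value;  // remember what we wrote
    // ... some time later, after other software/hardware may have run ...
    return *CTRL_REG == private_copy;    // the "readback" comparison
}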

Related

Single write - single read big memory buffer sharing without locks

Let's suppose I have a big memory buffer used as a framebuffer, which is constantly written by a thread (or even multiple threads, with the guarantee that no two threads write the same byte concurrently). These writes are nondeterministic in time, scattered through the codebase, and cannot be blocked.
I have another single thread which periodically reads out (copies) the whole buffer to generate a display frame. This read should not be blocked either. Tearing is not a problem in my case. In other words, my only goal is that every change made by the writer thread(s) should eventually appear in the reading thread. The ordering, or some delay (negligible compared to the display refresh rate), does not matter.
Reading and writing the same memory location concurrently is a data race, which results in undefined behavior in C++11, and this article lists some really dreadful examples where the optimizing compiler generates code for a memory read that alters the memory contents in the presence of a data race.
Still, I need some solution without completely redesigning this legacy code. Any advice on what is safe from a practical standpoint counts, independent of whether it is theoretically correct. I am also open to not-fully-portable solutions.
Aside from the data race itself, I can easily force the visibility of the buffer changes in the reading thread by establishing a synchronizes-with relation between the threads (acquire-release on an atomic guard variable, used for nothing else), or by adding platform-specific memory fence calls at key points in the writer thread(s).
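For illustration, a minimal sketch of that guard-variable approach (C++11; the names and buffer size are invented). The release/acquire pair makes the writes visible, although formally the unguarded byte accesses still constitute a data race:

#include <atomic>
#include <cstddef>
#include <cstring>

constexpr std::size_t kBufSize = 640 * 480;  // illustrative size
unsigned char framebuffer[kBufSize];         // written by the writer thread(s)
std::atomic<unsigned> frame_guard{0};        // used only for synchronization

void writer_tick() {
    // ... scattered framebuffer writes happen before this point ...
    frame_guard.fetch_add(1, std::memory_order_release);  // publish the writes
}

void reader_copy(unsigned char* dest) {
    frame_guard.load(std::memory_order_acquire);  // make prior writes visible
    std::memcpy(dest, framebuffer, kBufSize);     // formally still a data race
}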
My ideas to target the data race:
Use assembly for the reading thread. I would try to avoid that.
Make the memory buffer volatile, thus preventing the compiler from performing the nasty optimizations described in the referenced article.
Put the reading thread's code in a separate compilation unit, and compile it with -O0.
Leave everything as is, and cross my fingers (as I currently do not notice issues) :)
What is the safest from the list above? Do you see a better solution?
FYI, the target platform is ARM (with multiple cores) and x86 (for testing).
(This question makes a previous one, which was a little too generic, more concrete.)

Deciding the critical section of kernel code

Hi, I am writing kernel code intended to do process scheduling and multithreaded execution. I've studied locking mechanisms and their functionality. Is there a rule of thumb regarding what sort of data structure in a critical section should be protected by locking (mutex/semaphores/spinlocks)?
I know that wherever there is a chance of concurrency in a part of the code, we require a lock. But how do we decide? What if we miss one and the test cases don't catch it? Earlier I wrote code for system calls and file systems where I never cared about taking locks.
Is there a rule of thumb regarding what sort of data structure in a critical section should be protected by locking?
Any object (global variable, field of a structure object, etc.) that is accessed concurrently, where at least one access is a write, requires some locking discipline for access.
But how do we decide? What if we miss one and the test cases don't catch it?
Good practice is an appropriate comment on every declaration of a variable, structure, or structure field that requires a locking discipline for access. Anyone who uses this variable reads the comment and writes the corresponding access code. The kernel core and modules tend to follow this strategy.
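A sketch of that commenting discipline (C++-style for illustration; the names are invented, and kernel code would use its own lock types, but the convention is the same):

#include <mutex>

struct Session {
    std::mutex lock;  // protects 'state' and 'pending' below
    int state;        // guarded by 'lock'
    int pending;      // guarded by 'lock'
    const int id;     // immutable after construction: no locking needed

    explicit Session(int i) : state(0), pending(0), id(i) {}
};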
As for testing, common testing rarely reveals concurrency issues because of their low probability. When testing kernel modules, I would advise using Kernel Strider, which attempts to prove the correctness of concurrent memory accesses, or RaceHound, which increases the probability of concurrency issues and checks for them.
It is always safe to grab a lock for the duration of any code that accesses any shared data, but this is slow since it means only one thread at a time can run significant chunks of code.
Depending on the data in question, though, there may be shortcuts that are safe and fast. If it is a simple integer (and by integer I mean the native word size of the CPU, i.e. not a 64-bit value on a 32-bit CPU), then you may not need to do any locking: if one thread writes to the integer while another reads it, the reader will get either the old value or the new value, never a mix of the two. If the reader doesn't care that it may get the old value, then there is no need for a lock.
If, however, you are updating two integers together, and it would be bad for the reader to get the new value for one and the old value for the other, then you need a lock. Another example is a thread incrementing the integer. That normally involves a read, an add, and a write. If one thread reads the old value, then the other manages to read, add, and write the new value, and then the first thread adds and writes its new value, both believe they have incremented the variable, but instead of being incremented twice, it was only incremented once. This needs either a lock or an atomic increment primitive to ensure that the read/modify/write cycle cannot be interrupted. There are also atomic test-and-set primitives (often called compare-and-swap) that let you read a value, do some math on it, then try to write it back, where the write succeeds only if the location still holds the original value. That is, if another thread changed it since the time you read it, the test-and-set fails; you can then discard your new value and start over with a read of the value the other thread set, and try to test-and-set it again.
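A short C++11 illustration of both primitives just described (the names and the doubling operation are invented):

#include <atomic>

std::atomic<int> counter{0};

// Atomic increment: the read/modify/write cycle cannot be interleaved.
void increment() {
    counter.fetch_add(1);
}

// The read-compute-try-to-write-back pattern: the store succeeds only if
// the location still holds the value we originally read.
void double_counter() {
    int seen = counter.load();
    while (!counter.compare_exchange_weak(seen, seen * 2)) {
        // 'seen' was refreshed with the value another thread wrote; retry
    }
}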
Pointers are really just integers, so if you set up a data structure and then store a pointer to it where another thread can find it, you don't need a lock, as long as you set up the structure fully before you store its address in the pointer. Another thread reading the pointer (it will need to make sure to read the pointer only once, i.e. by storing it in a local variable and then using only that to refer to the structure from then on) will see either the new structure or the old one, never an intermediate state. If most threads only read the structure via the pointer, and any that want to write do so either with a lock or an atomic test-and-set of the pointer, this is sufficient. Any time you want to modify any member of the structure, though, you have to copy it to a new one, change the new one, then update the pointer. This is essentially how the kernel's RCU (read-copy-update) mechanism works.
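A rough sketch of that publish-via-pointer idea in C++11 (the Config structure and names are invented; note that real RCU also solves the reclamation problem, which this sketch does not):

#include <atomic>

struct Config {
    int width;
    int height;
};

std::atomic<Config*> current_config{nullptr};  // readers find the structure here

void publish(int w, int h) {
    Config* fresh = new Config{w, h};  // fully initialize the structure first...
    current_config.store(fresh, std::memory_order_release);  // ...then publish it
    // (a real RCU scheme must also defer freeing the old structure until
    // no reader can still be using it)
}

int read_area() {
    // Read the pointer exactly once into a local, then use only the local.
    Config* snapshot = current_config.load(std::memory_order_acquire);
    return snapshot ? snapshot->width * snapshot->height : 0;
}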
Ideally, you should enumerate all the resources available in your system, along with the related threads and the communication and sharing mechanisms, during design. Determining the following for every resource, and maintaining a proper checklist whenever a change is made, can be of great help:
The duration for which the resource will be busy (utilization of the resource) and the type of lock
The number of tasks queued on that particular resource (load) and their priority
The type of communication and sharing mechanism related to the resource
Error conditions related to the resource
If possible, it is better to have a flow diagram depicting the resources, utilization, locks, load, communication/sharing mechanisms, and errors.
This process can help you determine the missing scenarios/unknowns and the critical sections, and also identify bottlenecks.
On top of the above process, you may also need certain tools that can help you with testing and further analysis, to rule out any hidden problems:
Helgrind - a Valgrind tool for detecting synchronisation errors. It can help identify data races/synchronization issues due to improper locking, lock orderings that can cause deadlocks, and improper POSIX thread API usage that can have later impacts. Refer: http://valgrind.org/docs/manual/hg-manual.html
Locksmith - for determining common lock errors that may arise at runtime or that may cause deadlocks.
ThreadSanitizer - for detecting race conditions. It displays all the accesses and locks involved in each race.
Sparse - can help list the locks acquired and released by a function, and can also identify issues such as mixing pointers to user address space with pointers to kernel address space.
Lockdep - for debugging locks.
iotop - for determining the current I/O usage by processes or threads on the system, by monitoring the I/O usage information output by the kernel.
LTTng - for tracing race conditions and possible interrupt cascades. (A successor to LTT, combining kprobes, tracepoint, and perf functionality.)
Ftrace - a Linux kernel internal tracer for analysing/debugging latency and performance issues.
lsof and fuser - can be handy in determining the processes holding locks and the kinds of locks.
Profiling can help determine where exactly the kernel is spending its time. This can be done with tools like perf and OProfile.
strace can intercept/record the system calls made by a process and the signals it receives. It shows the order of events and all the return/resumption paths of calls.

Do all shared variables need to be atomic? - multithreading

I have started working to learn multicore programming, and I have started learning C++11 atomics. I would like to know: do all the shared variables need to be atomic?
The only time a variable needs to be "atomic", i.e. updatable in "one fell swoop, without any other thread being able to read it meanwhile", is if it can be read while someone else is updating it. For instance, if it's set at initialization and then never changes, no one can ever read it while it's changing, and so it doesn't have to be atomic. On the other hand, if it ever changes after initialization and there's a risk that any thread other than the changing one reads it while it's changing, then it needs to be atomic (atomic intrinsics, protection by a mutex, or otherwise).
If multiple threads access (read/write) the same variable, and at least one of them writes it, then it should be atomic.
Not necessarily for all scenarios. Also, note that atomicity of variable access alone does not guarantee full thread safety; it just ensures that the particular variable being read is obtained as a whole. On some architectures, the read operation does not happen in a single assembly instruction. For example, if you are reading a 64-bit value, the compiler might implement the read as two assembly load instructions, such that the first instruction reads the lower 32 bits and the second reads the higher 32 bits. This in turn can lead to a race condition. So, atomic reads are preferred.
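A small sketch of that torn-read hazard and the atomic fix (assuming a 32-bit target where a plain 64-bit load or store may be split into two instructions; the names are invented):

#include <atomic>
#include <cstdint>

std::uint64_t plain_stamp = 0;               // a plain 64-bit load/store may be split in two
std::atomic<std::uint64_t> atomic_stamp{0};  // loads and stores are indivisible

void writer() {
    plain_stamp = 0x1111111122222222ULL;        // a reader may see half old, half new
    atomic_stamp.store(0x1111111122222222ULL);  // a reader sees the whole old or new value
}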

Do shared variables between threads always require protection?

Let's say I have two threads reading and modifying a bool / int "state". The reads and writes are guaranteed to be atomic by the processor.
Thread 1:
if (state == ENABLED)
{
Process_Data()
}
Thread 2:
state = DISABLED
In this case, yes, thread 1 can read the state, go into its "if" to Process_Data, and then thread 2 can change the state. But it isn't incorrect at that point to still go on to Process_Data. Yes, if we peek under the hood, we have an inconsistency: state is DISABLED and we are entering the Process_Data function. But after it has executed, the next time thread 1 runs it will get state = DISABLED and not Process_Data.
My question is: do I still need a lock in both these threads to make thread 1's check-state-and-process atomic and thread 2's write atomic (with respect to thread 1)?
You've addressed the atomicity concerns. However, in modern processors, you have to worry not just about atomicity, but also memory visibility.
For example, thread 1 is executing on one processor, and reads ENABLED from state - from its processor's cache.
Meanwhile, thread 2 is executing on a different processor, and writes DISABLED to state on its processor's cache.
Without further code - in some languages, for example, declaring state volatile - the DISABLED value may not get flushed to main memory for a long time. It may never get flushed to main memory if thread 2 changes the value back to ENABLED eventually.
Meanwhile, even if the DISABLED value is flushed to main memory, thread 1 may never pick it up, instead continuing to use its cached value of ENABLED indefinitely.
Generally if you want to share values between threads, it's better to do so explicitly using the appropriate mechanisms for the programming language and environment that you're using.
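For instance, a C++11 rendering of the question's example (assuming C++; the default sequentially consistent atomics handle both the atomicity and the visibility concerns):

#include <atomic>

enum State { DISABLED, ENABLED };
std::atomic<State> state{ENABLED};  // atomic access plus cross-thread visibility

void thread1() {
    if (state.load() == ENABLED) {  // guaranteed to eventually see thread2's store
        // Process_Data();
    }
}

void thread2() {
    state.store(DISABLED);  // will become visible to thread1
}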
There's no way to answer your question generically. If the specification for the language, compiler, threading library and/or platform you are using says you need protection, then you do. If it says you don't, then you don't. I believe every threading library or multi-threading implementation specifies rules for sane use and sharing of data. If yours doesn't, it's a piece of junk that is impossible to use reliably and you should get a better one.
Do not make the mistake of thinking, "This is safe because I can't think of any way it can go wrong." Or "I tested this, and I couldn't get it to fail, so it's safe." That kind of thinking produces fragile code that tends to fail when you change compiler options, upgrade your CPU, or run the program on a different platform. Follow the specifications for the tools you are using.

multithreading dispute on local variables

I had implemented a few methods that were being handled by individual background threads. I understand the complexity of doing things this way, but when I tested the results it all seemed fine. Each thread accesses the same variables at times, and there is a maximum of 5 threads working at any given time, so I guess I should have used SyncLock. My question is whether there is any way the threads could have been executing the processes without overwriting the variable contents. I was under the impression that each thread was allocated its own place in memory for that variable, and even though it is named the same, in memory it is a different location mapped to a specific thread, right? So if there were collisions, you should get an error that the variable cannot be accessed while it is in use by another thread.
Am I wrong on this?
If you are talking about local variables of a function - no, each thread has its own copy of those on its stack.
If you are talking about member variables of a class being accessed from different threads - yes, you need to protect them (unless they are read-only).
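A tiny C++ sketch of the distinction (names invented; the question's context may be VB.NET, but the stack-versus-shared rule is the same):

#include <mutex>

struct Worker {
    int shared_total = 0;  // member shared across threads: needs protection
    std::mutex m;

    void run(int n) {
        int local = n * 2;  // local variable: each thread has its own copy on its own stack
        std::lock_guard<std::mutex> guard(m);
        shared_total += local;  // shared member: touched only while holding the mutex
    }
};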
