multithreading dispute on local variables

I implemented a few methods that are each handled by an individual background thread. I understand the complexity of doing things this way, but when I tested the results everything seemed fine. The threads access the same variables at times, and there is a maximum of 5 threads working at any given time. I guess I should have used SyncLock, but my question is whether the threads could have been executing without overwriting each other's variable contents. I was under the impression that each thread is allocated its own place in memory for a variable, so even though the name is the same, in memory it is a different location mapped to a specific thread. Is that right? And if there were a collision, shouldn't you get an error saying the variable cannot be accessed because another thread is using it?
Am I wrong on this?

If you are talking about local variables of a function - no, each thread has its own copy of those on its stack.
If you are talking about member variables of a class being accessed from different threads - yes, you need to protect them (unless they are read-only).
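
To make the distinction concrete, here is a minimal sketch in Python (the question is about .NET, so the Counter class and its add_many method are hypothetical names chosen only for illustration). The local variable local_sum lives in each call's own frame, while the member variable total is one shared location that is updated under a lock. Note that without the lock no error would be raised; updates could simply be lost silently.

```python
import threading

class Counter:
    """Hypothetical class whose member variable is shared by all threads."""

    def __init__(self):
        self.total = 0                 # shared: one location in memory
        self._lock = threading.Lock()  # protects self.total

    def add_many(self, n):
        local_sum = 0                  # local: every call gets its own copy
        for _ in range(n):
            local_sum += 1
        with self._lock:               # serialize the shared read-modify-write
            self.total += local_sum
        return local_sum

counter = Counter()
threads = [
    threading.Thread(target=counter.add_many, args=(10_000,))
    for _ in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.total)  # 50000
```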

Related

How does threading.Lock actually work? (with multiple scenarios)

I have looked online and searched through Stack Overflow about locks, and I only seem to get the general understanding that when a lock is active, another thread cannot use it?
I have multiple shared objects which are being read/written constantly throughout the script, and I'm still not 100% sure how locking really works. When do you need to use it, when do you not need to use it, and is it worth creating individual locks for each shared variable/object?
When a thread acquires a lock, does that mean other threads will only pause at that particular part of the script where the lock was acquired, or does it somehow make them stop reading/writing any variables used between acquire and release throughout the entire script?
If I have multiple locks, one for each shared variable/object, and one of them is acquired, does this affect the rest of the locks too?
To summarise, I'm struggling to understand the "in-depth" version of locking, only being able to find a general overview among previous explanations online.

Mutexes. What even?

I am learning about computer architecture and how operating systems work. I have a few questions about how mutexes work.
Question 1
add_to_list(&list, &elem):
    mutex m;
    lock_mutex(m);
    ...
remove_from_list(&list):
    mutex m;
    lock_mutex(m);
    ...
Each of these two functions instantiates its own mutex, which means the mutexes live at different places in memory, so one does not lock out the other and the list is not actually protected.
How do we get two different functions to use the same mutex? Do we define a global variable? If so, how do you share this global variable throughout an entire program that is potentially spread across multiple files?
Question 2
mutex m;
modify_A():
    lock_mutex(m);
    A += 1;
modify_B():
    lock_mutex(m);
    B += 1;
These two functions modify different locations in memory. Does that mean I need a unique mutex for each function or piece of data? If I were to have a global mutex variable that I used for both functions, a thread calling modify_A() would block another thread trying to call modify_B().
Which brings me to my last question...
Question 3
A mutex seems like it just blocks a thread from running a piece of code until whatever thread is currently running that same code finishes. This is to create atomicity and protect the integrity of the data being used by a thread. However, the same piece of memory can be modified from many different places in a program. Which makes me think we have to use one mutex throughout an entire program, which would result in a lot of needless blocking of other threads.
Considering that pretty much every function in a given program is going to be modifying data, if we use a single mutex throughout a program, that means each function call will be blocked while that mutex is in use by another thread, even if the data it needs to access is unrelated.
Doesn't that effectively eliminate the gains from having multiple threads, if only one thread can run at a given time?
I feel like I'm totally misunderstanding how mutexes work, so please ELI5!
Thanks in advance.

1. Yes, you make it a global variable, or otherwise accessible to the required functions through some kind of convenience method or whatever. Global variables can be shared between translation units too, but that's language/system dependent. In C you'd just put an extern mutex m in a header that everyone shares and then define that mutex as mutex m in exactly one of your translation units.
2. If you don't want changes to B to block other threads from modifying A, yes, you'd use two different mutexes. If you want to lock both at the same time, you would share the mutex.
3. Multiple threads can run at the same time as long as no two of them are inside the critical section protected by a certain mutex at the same time. That's the whole point - everything goes on nice and parallel, but you use the mutex to serialize access to a specific resource or critical section you need protected.
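
As a rough illustration of those points, here is a sketch in Python rather than C (so threading.Lock stands in for the mutex, and the names lock_a, lock_b, modify_a, modify_b and run are assumptions made up for this example). The locks are module-level objects, so every function sees the same ones, and each lock guards only its own piece of data.

```python
import threading

# Module-level locks: every function in this module (and any module that
# imports them) refers to the same lock objects.
lock_a = threading.Lock()  # protects A only
lock_b = threading.Lock()  # protects B only

A = 0
B = 0

def modify_a():
    global A
    with lock_a:           # blocks only other threads that want lock_a
        A += 1

def modify_b():
    global B
    with lock_b:           # independent of lock_a
        B += 1

def run(fn, times):
    for _ in range(times):
        fn()

threads = [threading.Thread(target=run, args=(modify_a, 1000)) for _ in range(3)]
threads += [threading.Thread(target=run, args=(modify_b, 1000)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Holding one lock does not affect the other: lock_b can still be taken.
with lock_a:
    got_b = lock_b.acquire(blocking=False)
    if got_b:
        lock_b.release()

print(A, B, got_b)  # 3000 3000 True
```

If A and B had to change together as one unit, a single shared lock around both updates would be the simpler choice.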

You typically use a mutex to protect some particular piece of shared data. If the vast majority of your code's time is spent accessing one single piece of shared data, then you won't get much of a performance improvement from threads precisely because only one thread can safely access that piece of shared data at a time.
If you happen to fall into this situation, there are more complex techniques than mutexes. Fortunately, it's fairly rare (unless you're implementing operating systems or low-level libraries) so you can get away with using mutexes for a very large fraction of your synchronization needs.

Can a singleton object have a (thread-safe) method executing in different threads simultaneously?

If so, how will different threads share the same instance or chunk of memory that represents the object? Will different threads somehow "copy" the single-instance object method code to be run on its own CPU resources?
EDIT - to clarify this question further:
I understand that different threads can be in the "process" of executing a singleton object's method at the same time, while they might not all be actively executing - they may be waiting to be scheduled for execution by the OS and in various states of execution within the method. This question is specifically for multiple active threads that are executing each on a different processor. Can multiple threads be actively executing the same exact code path (same region of memory) at the same time?

Can a singleton object have a (thread-safe) method executing in different threads simultaneously?
Yes, of course. Note that this is no different from multiple threads running a method on the same non-singleton object instance.
how will different threads share the same instance or chunk of memory that represents the object?
(Ignoring NUMA and processor caches), the object exists in only one place in memory (hence, it's a singleton) so the different threads are all reading from the same memory addresses.
Now, if the object is immutable (and doesn't have external side-effects, like IO) then that's okay: multiple threads reading from the same memory and never changing it doesn't introduce any problems.
By analogy, this is like having a single (single-sided) piece of paper on a desk and 10 people reading it simultaneously: no problem.
If the object is mutable (or does have external side-effects, like IO) then that's a problem :) You need to synchronize each thread's changes, otherwise they'll be overwriting each other - this is bad.
Note that in many languages and platforms, like C#/.NET and C++, documentation can often assert or claim that a method is "thread-safe" but this is not necessarily absolutely guaranteed by the runtime (excepting the case where a function is provably "const-correct" and has no IO, which is something C++ can do, but C# cannot currently do that).
Note that just because a single method is thread-safe doesn't mean an entire logical operation (e.g. calling multiple object methods in sequence) is thread-safe. For example, consider a Dictionary<K,V>: many threads can simultaneously call TryGetValue without issues, but as soon as one thread calls .Add, that messes up all of the other threads calling TryGetValue, because Dictionary<K,V> does not guarantee that .Add is atomic or that it blocks TryGetValue. This is why we use ConcurrentDictionary, which has a different API, or wrap our logical business/domain operations in lock (Monitor) statements.
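
The "logical operation" point can be sketched in Python (not C#; the inventory data and the buy function are hypothetical). The check and the update belong together, so both sit inside one lock; locking each step separately would still let two threads pass the check before either one updates.

```python
import threading

inventory = {"widget": 5}          # shared mutable state (made-up data)
inventory_lock = threading.Lock()
sold = []                          # records which threads got an item

def buy(name):
    # "Is any stock left?" and "take one" form a single logical operation,
    # so both happen while holding the same lock.
    with inventory_lock:
        if inventory[name] > 0:
            inventory[name] -= 1
            sold.append(threading.current_thread().name)

threads = [threading.Thread(target=buy, args=("widget",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(inventory["widget"], len(sold))  # 0 5
```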
Will different threads somehow "copy" the single-instance object method code to be run on its own CPU resources?
No.
You can make that happen with [ThreadStatic], ThreadLocal<T> (and AsyncLocal<T>), but proponents of functional programming techniques (like myself) will strongly recommend against this for many reasons. If you want a thread to "own" something then pass the object on the stack as a parameter and not inside hidden static state.
(Also note this does not apply to value types (e.g. struct), which are copied on assignment and when passed by value, but I assume you're referring to class object instances, fwiw, as .NET discourages using mutable structs anyway.)
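
For comparison, a small Python sketch of the two styles. It assumes threading.local as a rough analogue of .NET's thread-local storage, and the worker names are hypothetical. The first worker relies on hidden per-thread state, the second receives everything it needs as parameters.

```python
import threading

per_thread = threading.local()  # each thread sees its own attributes
hidden_results = {}
explicit_results = {}

def worker_hidden_state(n):
    per_thread.value = n                      # visible to this thread only
    hidden_results[n] = per_thread.value * 2

def worker_explicit(n, out):
    out[n] = n * 2                            # state is passed in explicitly

threads = [threading.Thread(target=worker_hidden_state, args=(i,)) for i in range(3)]
threads += [
    threading.Thread(target=worker_explicit, args=(i, explicit_results))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never set per_thread.value, so it has no such attribute.
main_sees_value = hasattr(per_thread, "value")
print(hidden_results == explicit_results, main_sees_value)  # True False
```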

multiple threads vs reference counting: does each thread count variables separately

I've been playing around with glib, which utilizes reference counting to manage memory for its objects and supports multiple threads.
What I can't understand is how they play together.
Namely:
1. In glib, AFAIK, a thread doesn't seem to increase the refcount of objects passed to it as input (I'll call them thread-shared objects). Is that true, or have I just failed to find the right piece of code? Is it common practice not to increase the refcount of thread-shared objects for each thread that shares them, leaving the main thread responsible for refcounting them?
2. Still, each thread increases reference counts for the objects it dynamically creates itself. Should the programmer take care not to use the same variable names in each thread, in order to prevent name collisions and memory leaks? (E.g. in my picture, should thread2 avoid creating a heap variable called output_object, or will it collide with thread1's heap variable of the same name?)
UPDATE: The answer to (question 2) is no, because the visibility scopes of those variables don't intersect; see:
Is dynamically allocated memory (heap), local to a function or can all functions in a thread have access to it even without passing pointer as an argument.
An illustration to my questions:

I think that threads are irrelevant to understanding the use of reference counters. The point is rather ownership and lifetime, and a thread is just one thing that is affected by this. This is a bit difficult to explain, hopefully I'll make this clearer using examples.
Now, let's look at the given example where main() creates an object and starts two threads using that object. The question is, who owns the created object? The simple answer is that main() and both threads share this object, so this is shared ownership. In order to model this, you should increment the refcounter before each call to pthread_create(). If the call fails, you must decrement it again, otherwise it is the responsibility of the started thread to do that when it is done with the object. Then, when main() terminates, it should also release ownership, i.e. decrement the refcounter. The general rule is that when adding an owner, increment the refcounter. When an owner is done with the object, it decrements the refcounter and the last one destroys the object with that.
Now, why does the code not do this? Firstly, you can get away with adding the first thread as owner and then passing main()'s ownership to the second thread. This will save one increment/decrement operation. This still isn't what's happening though. Instead, no reference counting is done at all, and the simple reason is that it isn't needed. The point of refcounting is to coordinate the lifetime of a dynamically allocated object between different owners that are peers. Here though, the object is created and owned by main(), the two threads are not peers but rather slaves of main. Since main() is the master that controls start/stop of the threads, it doesn't have to coordinate the lifetime of the object with them.
Lastly, though that might be due to the example-ness of your code, I think that main simply leaks the reference, relying on the OS to clean up. While this isn't beautiful, it doesn't hurt. In general, you can allocate objects once and then use them forever without any refcounting in some cases. An example for this is the main window of an application, which you only need once and for the whole runtime. You shouldn't repeatedly allocate such objects though, because then you have a significant memory leak that will increase over time. Both cases will be caught by tools like valgrind though.
Concerning your second question, the heap variable name clash you expect doesn't exist. Variable names that are function-local cannot collide. This is not because they are used by different threads: even if the same function is called twice by the same thread (think recursion!), the local variables in each call to the function are distinct. Also, variable names are for the human reader. The compiler completely eradicates these.
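
The shared-ownership rule described above (add a reference before handing the object to a new owner, drop it when that owner is done, and let the last owner destroy the object) might look roughly like this in Python. This is only a sketch: RefCounted, ref and unref are hypothetical names, not the GLib API, and the lock is an assumption standing in for atomic reference-count operations.

```python
import threading

class RefCounted:
    """Hypothetical manually ref-counted object (not the GLib API)."""

    def __init__(self):
        self._count = 1                # the creator is the first owner
        self._lock = threading.Lock()  # makes ref/unref atomic
        self.destroyed = False

    def ref(self):
        with self._lock:
            self._count += 1
        return self

    def unref(self):
        with self._lock:
            self._count -= 1
            last = self._count == 0
        if last:
            self.destroyed = True      # stand-in for freeing the object

def worker(obj):
    try:
        pass                           # the thread would use obj here
    finally:
        obj.unref()                    # the thread gives up its ownership

obj = RefCounted()                     # main owns it: count is 1
threads = []
for _ in range(2):
    obj.ref()                          # add an owner before starting the thread
    t = threading.Thread(target=worker, args=(obj,))
    try:
        t.start()
    except RuntimeError:
        obj.unref()                    # start failed: take the reference back
        raise
    threads.append(t)

obj.unref()                            # main releases its own reference
for t in threads:
    t.join()

print(obj.destroyed)  # True
```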

UPDATE:
As matthias says below, GObject is not thread-safe, only reference counting functions are.
Original content:
GObject is supposed to be thread safe, but I've never played with that myself…

What does "readback" mean in terms of computer memory?

I am messing with multiple threads accessing a resource (probably memory). What does "readback" mean in this context?
Any guides will be helpful... Google didn't give me any good results.

I can think of several possible meanings for "readback". Here's the most likely; in a multithreaded environment, a lot can happen between your thread reading a value from memory and writing a changed value back to that memory. A simple yet effective way to detect changes is simply to get the value from memory again just before writing, and if it has changed from the value you started with, you know someone else changed it while you were working.
"Readback" may also refer to "repeatable reads", in which a locking mechanism is used to ensure that within the scope of an atomic set of operations, only the thread that obtained the lock on the resource can read OR write to it, ensuring that no other thread can change the value from what would be expected by the task if it ran single-threaded. That way, a thread doesn't have to detect external changes; the locking mechanism prevents such a thing from happening.
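
A minimal Python sketch of the first meaning (read the value again just before writing, and retry if it changed). One assumption worth stating loudly: the re-read and the write have to happen as one atomic step, otherwise another thread can slip in between them, so the hypothetical SharedCell class below uses a small internal lock to emulate a compare-and-swap.

```python
import threading

class SharedCell:
    """Hypothetical shared value with an atomic compare-and-write."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def write_if_unchanged(self, expected, new):
        # Read the value back and write only if nobody changed it meanwhile.
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True

def increment(cell):
    while True:
        seen = cell.read()                         # value we started with
        if cell.write_if_unchanged(seen, seen + 1):
            return                                 # nobody interfered
        # someone else changed it while we were working: retry

def run(cell, times):
    for _ in range(times):
        increment(cell)

cell = SharedCell(0)
threads = [threading.Thread(target=run, args=(cell, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(cell.read())  # 4000
```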

When I've encountered that term, it's usually in the context of writing a value to a register or memory location that may also be accessed by some other software or hardware. To check whether someone else has changed it, you might keep a private copy of the data you wrote, and some time later read that shared register or memory location to compare its current value to the stored private copy. That's the "readback".
