How does Tensorflow ensure thread-safety, for example in QueueRunner?

How does Tensorflow ensure thread-safety, for example in QueueRunner? - multithreading

Tensorflow's session can be run under multiple threads at the same time.
For example, the QueueRunner runs the enqueue operation with several threads simultaneously.
But there is little document talking about thread safety in Tensorflow.
Can anyone give any details about it? For example, how does Tensorflow ensure thread safety when multiple threads run enqueue simultaneously? Thank you very much.

Related

Parrallelism and concurrency in multiprocess and multithreading

Hello I'm studying Operating Sysyem. I recognize the difference between parrallelism and concurrency but I still wonder at some point so I wanna get some help thank you!
What I know is that threads are parallel in multiThreading but there are contexts switching among threads. How does it possible? Does that happen when they approach to shared resources?
In case of 4cores 8threads. Are processes running parrallel or concurrently? If they run concurrently, processes switch each other but only 2 threads are running at once at any time in cpu right?
I heard coroutine is concurrent. Which means it doesnt share any resources but how can race conditions still happen there?

Context switching is required, because usually we execute more threads than we have CPU cores. And yes, this often happens when the currently running thread has to wait for something (I guess this is what you meant by "shared resource"), then the thread goes to sleep and another is awaken. But context switching could also happen at almost any time - OS scheduler is responsible for sharing CPU cores between threads in the way that one or few threads don't consume all resources.
"Do processes are running parrallel or concurrent?" - processes or rather threads? Threads run both concurrently and in parallel, but the parallelism is limited to 4. And no, there is only one thread running in a specific core at a time (I ignore hyper-threading here).
I have no idea what do you mean here. Coroutines are concurrent in a similar way to threads and the resource sharing pose a similar challenge when using them.

how are the multiprocessing and threading and thread pooling working

https://code.tutsplus.com/articles/introduction-to-parallel-and-concurrent-programming-in-python--cms-28612
From this link I have studied, I have few questions
Q1 : How thread pool (Concurrent) and threading are different here? why do we see the performance improvement. Threading with Que is having 4 threads and each runs cooperatively during the idle time and picks the item from the Que once they get website response. As i see, the thread pool is also in a way doing the same. completing its work and waiting for the manager to assign a task; which is very similar to picking a new item from the Que. I'm not sure how this is different and why i see the perfroamcne improvment. Seems i'm wrong in interpreting the poling here. Could you expalin
Q2 : Question 2 : using multiprocessing the time taken is more. If I have multiprocessor which can handle multiple processes at a time, then all my 4 processes should be handled by it at a time. That is the real parallelization is happening. Also, I have a question here - in such case since 4 processes are running same function doesn't GIL try to stop them executing the same piece of code. Lets suppose all of them share a common variable that gets updated - like number of websites checked. So how does GIL work in these cases of multiprocessing?
Also, here are the same processes used again and again or they get killed and created every time after their job - I think same processes are used. Also, I think that the performance problem is because of the process creation compared to light weight threads at the concurrent threading phase - which is costly. So could you explain more in detail how the GIL is working here and process are running, are they running cooperatively (like each process wait for its turn - like threads in a process do). Or are these processes using the multiprocessors to run really parallel. Also, my other question is If I have a 8 core machine, I think I can run 8 threads of a same process simultaneously or parallel. if I have the 8 core machine can I run 2 processes with 4 threads each? can I run 8 processes on 8 cores? I think cores are only for threads of a process, which means I cant run the 8 process on 8 cores but I can run as many number of processes as many CPU's or multiprocessor system is mine, am i right? So can I run 2 processes with 4 threads each? on my 8 core machine with 2 multiprocessors and each processor having 4 cores each?

Python has a rich set of libraries for multitasking with Processes and Threads. However, there is overlap between the libraries, the choice depends on how abstractly you view the computational tasks. For example, the concurrent.futures library views threads as asynchronous tasks, while the Threading library deals with them as high-level threads. Further, the _thread implements a low-level interface for threading exposing all the synchronization mechanisms.
The GIL(Global Interpreter Lock) is just a synchronization primitive, specifically a mutex which prevents multiple threads of the same process from executing Python bytecode fragments(for certain objects which need to remain consistent with concurrent operations). This is exactly why Python threads excel with I/O operations in terms of speed when compared to compute intensive tasks.(owing to the fact that the GIL is released in case of certain blocking calls/computationally intensive libraries such as numpy). Note that only CPython and Pypy versions of Python are constrained by the GIL mechanism.
Now, let's see those questions...
How thread pool (Concurrent) and threading are different here? Why do we see the performance improvement?
Coming to the comparison between Threading and concurrent.futures.ThreadPoolExecutor (aka threading_squirrel vs future_squirrel), I've executed both programs with the same test case. There are two factors that contribute to this "performance improvement":
Network HEAD requests: Remember that network operations need not complete in the same time period every time you execute them... due to the very nature of packet transfer delays...
Order of thread execution: In the website you've linked, the author creates all threads initially, sets up the queue full of website links and then starts all of them in a list comprehension loop. In ThreadPoolExecutor of concurrent.futures, each time a task is submitted, a thread is assigned to it if the predefined maximum number of threads/workers have not been reached. I've changed the code to mirror this technique. It seems to give a speedup as the first thread begins work early on and doesn't need to wait for the queue to be filled up...
How does GIL work in these cases of multiprocessing?
Remember that the GIL comes into effect for threads of a process only, not among processes. GIL locks up the whole interpreter bytecode during a thread of execution, so the other threads have to wait for their turn. This is the reason multiprocessing used processes instead of threads, as each process has it's own interpreter and consequently, it's own GIL.
Are the same processes used again and again or they get killed and created every time after their job?
The concept of pooling is to reduce the overhead of creating and destroying workers(be it threads or processes) during computation. However, the processes are kind of "brand new" in the sense that the library effectively asks the OS to perform a fork in an UNIX based OS or spawn in an NT based OS...
Also, are the processes running co-operatively?
Maybe. They have to run in co-operation if they use shared memory...(need not be running together). There is definitely going to be a context switch if there are more processes than the OS can allocate to its processors' cores. They can run in parallel if there's no shared memory updates to make.
If I have the 8 core machine can I run 2 processes with 4 threads each? Can I run 8 processes on 8 cores?
Sure (subject to the GIL, in Python). Each process can be allocated to each processing unit for execution. A processing unit can be a physical or a virtual core of a CPU. As long as the OS scheduler supports it, it's possible. Any reasonable split up of processes and threads are possible. If all are allocatable, that's the best situation, else you will encounter context switches...(which are more expensive when it comes to processes)
Hope I've answered all those questions!
Here are a few resources:
MultiCore CPUs, Multithreading and context switching?
Why does multiprocessing use only a single core after I import numpy?
Bonus celery-squirrel resource

What do classes tf.train.Coordinator and class tf.train.QueueRunner do in tensorflow?

I understand that both classes deal with threads. According to the documentation, tf.train.Coordinator coordinates the termination of a set of threads and tf.train.QueueRunner holds a list of enqueue operations for a queue, each to be run in a thread.
However, what is their role in simple words? When are they necessary during the training?

QueueRunner:
When TensorFlow is reading the input, it needs to maintain multiple queues for it. The queue serves all the workers that are responsible for executing the training step. We use a queue because we want to have the inputs ready for the workers to operate on. If you don't have a queue, you will be blocked on I/O and performance will degrade.
Coordindator:
This is part of tf.train.Supervisor. It's necessary because you need a controller to maintain the set of threads (know when main thread should terminate, request stopping of sub-threads, etc).
Hope this helps.

TPL Tasks, Threads, etc

Could someone clear up to me how these things correlate:
Task
Thread
ThreadPool's thread
Paraller.For/ForEach/Invoke
I.e. when I create a Task and run it, where does it get a thread to execute on? And when I call Parallel.* what is really going on under the covers?
Any links to articles, blogposts, etc are also very welcomed!

The ideal state of a system is to have 1 actively running thread per CPU core. By defining work in more general terms of "tasks", the TPL can dynamically decide how many threads to use and which tasks to do on each one in order to come closest to achieving that ideal state. These are decisions that are almost always best made dynamically at runtime because when writing the code you can't know for sure how many CPU cores will be available to your application, how busy they are with other work, etc.

Thread: is a real OS thread, has handle and ID.
ThreadPool: is a collection of already-created OS Threads. These threads are owned/maintained by the runtime, and your code is only allowed to "borrow" them for a while, you can only do work short-termed work in these threads, and you can't modify any thread state, nor delete these threads.
Best guesses on these two:
Task: might get run on a pre-created thread in the thread pool, or might get run as part of user-mode scheduling, this is all depending on what the runtime thinks is best. Another guess: with TPL, the user-mode scheduling is NOT based on OS Fibers, but is its own complete (and working) implementation).
Parallel.For: actually, no clue how this is implemented. The runtime might create new threads to do the parallel bits, or much more likely use the thread pool's threads for the parallelism.

Mutithreading thread control

How do I control the number of threads that my program is working on?
I have a program that is now ready for mutithreading but one problem is that the program is extremely memory intensive and i have to limit the number of threads running so that i don't run out of ram. The main program goes through and creates a whole bunch of handles and associated threads in suspended state.
I want the program to activate a set number of threads and when one thread finishes, it will automatically unsuspended the next thread in line until all the work has been completed. How do i do this?
Someone has once mentioned something about using a thread handler, but I can't seem to find any information about how to write one or exactly how it would work.
If anyone can help, it would be greatly appreciated.
Using windows and visual c++.
Note: i don't need to worry about the traditional problems of access with the threads, each one is completely independent of each other, its more of like batch processing rather than true mutithreading of a program.
Thanks,
-Faken

Don't create threads explicitly. Create a thread pool, see Thread Pools and queue up your work using QueueUserWorkItem. The thread pool size should be determined by the number of hardware threads available (number of cores and ratio of hyperthreading) and the ratio of CPU vs. IO your work items do. By controlling the size of the thread pool you control the number of maximum concurrent threads.

A Suspended thread doesn't use CPU resources, but it still consumes memory, so you really shouldn't be creating more threads than you want to run simultaneously.
It is better to have only as many threads as your maximum number of simultaneous tasks, and to use a queue to pass units of work to the pool of worker threads.
You can give work to the standard pool of threads created by Windows using the Windows Thread Pool API.
Be aware that you will share these threads and the queue used to submit work to them with all of the code in your process. If, for some reason, you don't want to share your worker threads with other code in your process, then you can create a FIFO queue, create as many threads as you want to run simultaneously and have each of them pull work items out of the queue. If the queue is empty they will block until work items are added to the queue.

There is so much to say here.
There are a few ways
You should only create as many thread handles as you plan on running at the same time, then reuse them when they complete. (Look up thread pool).
This guarantees that you can never have too many running at the same time. This raises the question of funding out when a thread completes. You can have a callback be called just before a thread terminates where a parameter in that callback is the thread handle that just finished. Use Boost bind and boost signals for that. When the callback is called, look for another task for that thread handle and restart the thread. That way all you have to do is add to the "tasks to do" list and the callback will remove the tasks for you. No polling needed, and no worries about too many threads.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string