TensorFlow Multithreading: How to Share Variables Across Threads

I am working on a multithreaded implementation of DDPG (Deep Deterministic Policy Gradient) in TensorFlow/Keras on the CPU. Multiple worker agents (each with its own copy of the environment) update a global network by applying worker gradients.
My question is: how do you properly update the global network from its workers and then safely run the updated global network periodically in its own copy of the environment? The intent is to periodically observe/store how the global network performs over time, while the workers continue to learn and update the global network by interacting with their own copies of the environment. According to this Stack Overflow question regarding locking in TensorFlow, "Reads to the variables are not performed under the lock, so it is possible to see intermediate states and partially-applied updates". I would like to avoid the global network being partially updated/mid-update while it interacts with its environment.
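One way to avoid torn reads is to serialize every multi-step weight update and every evaluation-time weight copy behind a single lock held by the process. Below is a minimal sketch of that pattern; plain Python lists stand in for the network weights, and names like GlobalNetwork are illustrative, not from any DDPG library:

```python
import threading

class GlobalNetwork:
    """Toy stand-in for the global network: weights are a list of floats."""

    def __init__(self, size):
        self.weights = [0.0] * size
        self._lock = threading.Lock()  # guards every multi-step weight access

    def apply_gradients(self, grads):
        # Workers call this; the whole update happens atomically under the lock.
        with self._lock:
            for i, g in enumerate(grads):
                self.weights[i] += g

    def snapshot(self):
        # The evaluator copies the weights under the same lock, so it can
        # never observe a partially-applied update.
        with self._lock:
            return list(self.weights)

def worker(net, steps):
    for _ in range(steps):
        net.apply_gradients([1.0] * len(net.weights))

net = GlobalNetwork(size=4)
threads = [threading.Thread(target=worker, args=(net, 1000)) for _ in range(4)]
for t in threads:
    t.start()
# A snapshot taken while workers are mid-training is always internally
# consistent: every weight reflects the same number of applied updates.
assert len(set(net.snapshot())) == 1
for t in threads:
    t.join()
print(net.snapshot())  # [4000.0, 4000.0, 4000.0, 4000.0]
```

With TensorFlow itself, the same effect can be had by taking the lock in each worker around the apply-gradients step and in the evaluator around the weight copy; TensorFlow also provides tf.CriticalSection for serializing ops inside the graph.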


Concurrent file system access with golang in Kubernetes

A prerequisite to my question is the assumption that one uses a Kubernetes cluster with multiple pods accessing file system storage, for example Azure Files (ReadWriteMany). According to Microsoft this relies on SMB3 and should be safe for concurrent use.
Within one pod, Go (with the Gin framework) is used as the programming language for a server that accesses the file system. Since requests are handled in parallel using goroutines, the file system access is concurrent.
Does SMB3 also ensure correctness for concurrent ReadWrites within one pod, or do I need to manually manage file system operations with something like a Mutex or a Worker Pool?
More specifically: I want to use git2go, a Go wrapper around libgit2 (written in C). According to libgit2's threading.md it is not thread-safe:
Unless otherwise specified, libgit2 objects cannot be safely accessed by multiple threads simultaneously.
Since git2go does not specify otherwise, I assume that different references to the same repository on the file system cannot be used concurrently from different goroutines.
To summarise my question:
Do network communication protocols (like SMB3) also "make" file system operations thread safe locally?
If not: how would I ensure thread-safe file system access in Go? lockedfile seems to be a good option, but it is also internal.

How do worker threads work in Node.js?

Node.js cannot have a built-in thread API like Java and .NET do. If threads were added, the nature of the language itself would change. It's not possible to add threads as a new set of available classes or functions.

Node.js 10.x added worker threads as an experimental feature, and they have been stable since 12.x. I have gone through a few blogs but did not understand much, maybe due to lack of knowledge. How are they different from threads in those languages?
Worker threads in Javascript are somewhat analogous to WebWorkers in the browser. They do not share direct access to any variables with the main thread or with each other and the only way they communicate with the main thread is via messaging. This messaging is synchronized through the event loop. This avoids all the classic race conditions that multiple threads have trying to access the same variables because two separate threads can't access the same variables in node.js. Each thread has its own set of variables and the only way to influence another thread's variables is to send it a message and ask it to modify its own variables. Since that message is synchronized through that thread's event queue, there's no risk of classic race conditions in accessing variables.
Java threads, on the other hand, are similar to C++ or native threads in that they share access to the same variables and the threads are freely timesliced so right in the middle of functionA running in threadA, execution could be interrupted and functionB running in threadB could run. Since both can freely access the same variables, there are all sorts of race conditions possible unless one manually uses thread synchronization tools (such as mutexes) to coordinate and protect all access to shared variables. This type of programming is often the source of very hard to find and next-to-impossible to reliably reproduce concurrency bugs. While powerful and useful for some system-level things or more real-time-ish code, it's very easy for anyone but a very senior and experienced developer to make costly concurrency mistakes. And, it's very hard to devise a test that will tell you if it's really stable under all types of load or not.
node.js attempts to avoid the classic concurrency bugs by separating the threads into their own variable space and forcing all communication between them to be synchronized via the event queue. This means that threadA/functionA is never arbitrarily interrupted and some other code in your process changes some shared variables it was accessing while it wasn't looking.
node.js also has a backstop that it can run a child_process that can be written in any language and can use native threads if needed or one can actually hook native code and real system level threads right into node.js using the add-on SDK (and it communicates with node.js Javascript through the SDK interface). And, in fact, a number of node.js built-in libraries do exactly this to surface functionality that requires that level of access to the nodejs environment. For example, the implementation of file access uses a pool of native threads to carry out file operations.
So, with all that said, there are still some types of race conditions that can occur and this has to do with access to outside resources. For example if two threads or processes are both trying to do their own thing and write to the same file, they can clearly conflict with each other and create problems.
So, code using Workers in node.js still has to be aware of concurrency issues when accessing outside resources. node.js protects the local variable environment for each Worker, but can't do anything about contention over outside resources. In that regard, node.js Workers have the same issues as Java threads and the programmer has to code for that (exclusive file access, file locks, separate files for each Worker, using a database to manage the concurrency for storage, etc...).
This comes down to the node.js architecture. Whenever a request reaches node, it is placed on the event queue and picked up by the event loop. The event loop checks whether the request is blocking or non-blocking I/O (blocking I/O being operations that take time to complete, e.g. fetching data from somewhere else). The event loop passes blocking I/O to the thread pool, which is a collection of worker threads. The blocking operation gets attached to one of the worker threads, which performs it (e.g. fetching data from a database); after completion the result is sent back to the event loop and then on to execution of the callback.

Why can't the branch predictor be trained from another process?

In Spectre variant 2, the branch target buffer (BTB) can be poisoned from another process. If the branch predictor also uses the virtual address to index branches, why can't we train the branch predictor the way we train the BTB in a Spectre v1 attack?
If the branch predictor also uses the virtual address to index branches, why can't we train the branch predictor the way we train the BTB in a Spectre v1 attack?
Simple answers:
We don't know the virtual addresses of another process.
Since it's another process, we can't directly inject CPU instructions into it to access the memory we need.
More detailed:
Spectre V1 and V2 do exploit the same vulnerability: the branch predictor. The basic difference is that V1 works within the same process, while V2 works across processes.
A typical example of V1 is a JavaScript virtual machine. We can't access process memory from within the VM, because the VM checks the bounds of each memory access. But using Spectre V1 we can first train the branch predictor and then speculatively access memory within the current process.
Things get more complicated if we want to access another process, i.e. Spectre V2. First, we have to train the branch predictor of the remote process. There are many ways to do this, including the one you've mentioned: training on the same virtual address locally (if we know it).
Second, we can't just ask the remote process "please read this and this secret", so we have to use a technique called "return-oriented programming" to speculatively execute the pieces of code we need inside the remote process.
That is the main difference. But as I said, your proposal is valid and will definitely work on some CPUs and in some circumstances.

How to use processors of several computers in a network in TF

I'm asking this because it's completely new to me (it's more a question about computer networks in Linux than about TF, but maybe someone has already done it).
Since my GPU is not able to compute the input data I need, I had to fall back on my CPU; however, there are times when even the CPU + GPU together cannot cope with all of the operations. I can use the processor of another computer that is on a network with mine, but I don't know how I should code that (I have access to that computer, but in that area I'm not that good with Linux :p).
I was looking at the TF web page, but it only covers the case where the resources are local. There I found the usual with tf.device('/cpu:0'): ... to handle the case when my GPU was not able to cope with all of the information. I think it could be something like with tf.device('other_computer/cpu:0'):, but then I would probably have to change the line sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)), and at the same time I would have to access the other computer, and I don't know how to do that.
Anyway, if someone has done this before, I would be thankful to know how. I'd welcome any reference I could use.
Thanks
TensorFlow supports distributed computation, using the CPUs (and potentially GPUs) of multiple computers connected by a network. Your intuition that the with tf.device(): block will be useful is correct. This is a broad topic, but there are three main steps to setting up a distributed TensorFlow computation:
Create a tf.train.Server, which maps the machines in your cluster to TensorFlow jobs, which are lists of tasks. You'll need to create a tf.train.Server on each computer that you want to use, and configure it with the same cluster definition.
Build a TensorFlow graph. Use with tf.device("/job:foo/task:17"): to define a block of nodes that will be placed in task 17 of the job called "foo" that you defined in step 1. There are convenience methods for applying device-mapping policies, such as tf.train.replica_device_setter(), which help to simplify this task for common training topologies.
Create a tf.Session that connects to the local server. If you created your server as server = tf.train.Server(...), you'd create your session as sess = tf.Session(server.target, ...).
There's a longer tutorial about distributed TensorFlow available here, and some example code for training the Inception image recognition model using a cluster here.
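The three steps above might look like the following sketch, using the TF 1.x distributed API. The IP addresses, port numbers, and job name are placeholders; the tf.train.Server and tf.Session lines are shown as comments because they need TensorFlow installed and a live second machine:

```python
# Step 1: one cluster definition, shared verbatim by every machine.
# Hosts and ports below are assumptions for illustration.
cluster_def = {
    "worker": ["192.168.0.1:2222",   # your machine, task 0
               "192.168.0.2:2222"],  # the other computer, task 1
}

# On each machine you would run (with its own task_index):
#
#   import tensorflow as tf
#   server = tf.train.Server(tf.train.ClusterSpec(cluster_def),
#                            job_name="worker",
#                            task_index=0)   # 1 on the other computer

# Step 2: place ops on the remote CPU via a device string.
def device_for(job, task, cpu=0):
    """Build the device string used inside `with tf.device(...)`."""
    return "/job:%s/task:%d/cpu:%d" % (job, task, cpu)

#   with tf.device(device_for("worker", 1)):
#       ...  # ops built here run on the other computer's CPU

# Step 3: connect a session to the local server:
#   sess = tf.Session(server.target,
#                     config=tf.ConfigProto(log_device_placement=True))

print(device_for("worker", 1))  # /job:worker/task:1/cpu:0
```

Note that log_device_placement still works unchanged in the distributed case; only the session target and the device strings differ from the single-machine setup.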

Allowing multiple processes to connect to a singleton process and invoke method calls or access resources

Scenario:
I currently have a class MyLoadBalancerSingleton that manages access to a cluster of resources (resource1 and resource2). The class has methods create(count) and delete(count). When these methods are called, the load balancer queues up the request and then processes it FIFO against the resources.
Naturally, there should be only one load balancer running; otherwise each instance will think it has complete control over the resources being managed.
Here is the problem:
Multiple users will simultaneously try to access the load balancer from a GUI. Each user will spawn their own GUI via python gui.py on the same machine (they will all ssh into the same machine). As such, each GUI will be running in its own process. The GUI will then attempt to communicate with the load balancer.
Is it possible to have those multiple GUI processes access only one loadbalancer process and call the load balancer's methods?
I looked into the multiprocessing library and it appears that its workflow is the opposite of what I want. In my example it would be: the load balancer process spawns 2 GUI (child) processes and then shares the parent's resources with the children. In my case, both the GUIs and the load balancer are top-level processes (no parent-child relationship).
I suspect that "singleton" is not the right word, as singletons only work within one process. Maybe run the load balancer as a daemon process and then have the GUI processes connect to it? I tried searching for IPC, but it just led me to the multiprocessing module, which is not what I want. The distributed, cluster-computing modules (dispy) aren't what I want either. This is strictly processes communicating with each other (IPC?) on the same machine.
Which brings me to my original question:
Is it possible to allow multiple processes to connect to a singleton process and invoke method calls or access its resources? All processes will be executing on the same machine.
Fictitious pseudocode:
LoadBalancer.py
class MyLoadBalancerSingleton(object):
    def __init__(self):
        # Singleton instance logic here
        # Resource logic here
        pass

    def create(self, count):
        resource1.create(count)
        resource2.create(count)

    def delete(self, count):
        resource1.delete(count)
        resource2.delete(count)
Gui.py
class GUI(object):
    def event_loop(self):
        # count = Ask for user input
        # process = Locate load balancer process
        # process.create(count)
        # process.delete(count)
        pass
Thank you for your time!
Yes, it's possible. I don't have a Python-specific example easily at hand, but you can do it. There are several kinds of IPC that allow multiple clients (GUIs, in your case) to connect to a single server (your singleton, which yes, would usually be run as a daemon). Here are a few of them:
Cross-platform: TCP sockets. You'd need your server to allow multiple connections on a single listening socket, and handle them as clients connect (and disconnect). The easiest approach to use across multiple machines, but also the least secure option (no ACLs, etc.).
Windows-specific: Named pipes. Windows' named pipes, unlike the similarly-named but much less capable feature of POSIX OSes, can allow multiple clients to connect at once. You'd need to create a multiple-instance pipe server. MSDN has good examples of this. I'm not sure what the best way to do it in Python would be, but I know that ActivePython has wrappers for the NT named pipe APIs. The clients only need to be able to open a file (of the form \\.\pipe\LoadBalancer). File APIs are used to read from, and write to, the pipes.
POSIX-only (Linux, BSD, OS X, etc.): Unix domain sockets. The POSIX equivalent of NT's named pipes, they use socket APIs but the endpoint is on the file system (like, /var/LoadBalanceSocket) instead of on an IP address/protocol/port tuple.
Various other things, using stuff like shared memory / memory-mapped files, RPC, COM (on Windows), Grand Central Dispatch (OS X), D-Bus (cross-platform but third-party), and so on. None of them, with the possible exception of GCD, is ideal for the simple case you're talking about here.
Of course, each of these approaches requires the server to handle multiple simultaneously-connected clients. The server will need to synchronize across them and avoid one client blocking the others from being served. You could use multithreading for quick responsiveness and minimal CPU cost while waiting, or polling for a quick-and-dirty solution that avoids multithreading synchronization (mutexes and the like).
