PYTHON - MULTITHREADING USING CLASSES - python-3.x

I am an absolute beginner in Python multithreading. My application needs to telnet to around 200 servers, execute commands, and return the response. I have created separate classes for telnetting and for processing the response. I read about GIL and race conditions in threading but am not sure whether they will have an impact on my code, because for every thread I am creating a new instance of the class and accessing the method. So technically the threads will not share the same resource. Can anyone please explain whether my assumption is right, and if not, explain the right way of doing it?
Main method:
if __name__ == "__main__":
    thread_list = []
    for ip in server_list:  # server_list contains the IPs of the hosts
        config_object = Configuration()  # Configuration class has the method that telnets the device
        thread1 = threading.Thread(target=config_object.captureconfigprocess, args=(ip,))  # args must be a tuple, note the comma
        thread_list.append(thread1)
    for thread in thread_list:
        thread.start()
    for thread in thread_list:
        thread.join()

I read about GIL and race conditions in threading but am not sure whether they will have an impact on my code
Python threads are real OS threads, but the GIL (Global Interpreter Lock) lets only one thread execute Python bytecode at a time, so threads give no speedup on CPU-bound work. The GIL is released while a thread waits on I/O, though, and telnetting to servers is almost entirely I/O-bound, so threading can work well here. 200 servers may seem like a lot, but it all boils down to how much communication happens between those 200 servers and your Python client. To be sure, you have to try. If the work turns out to be CPU-bound, use multiprocessing instead.
So technically the threads will not share the same resource.
If each thread is using its own resources, then shared resources are not an issue to worry about.
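With 200 servers it is also worth capping how many threads run at once. A minimal sketch using a thread pool (assuming the Configuration class and server_list from the question):

from concurrent.futures import ThreadPoolExecutor

def process(ip):
    # each call gets its own Configuration instance, so no state is shared between threads
    config_object = Configuration()
    return config_object.captureconfigprocess(ip)

with ThreadPoolExecutor(max_workers=20) as pool:  # 20 concurrent telnet sessions; tune as needed
    results = list(pool.map(process, server_list))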

Related

Asynchronous Communication between few 'loops'

I have 3 classes that represent nearly isolated processes that can be run concurrently (meant to be persistent, like 3 main() loops).
class DataProcess:
    ...
    def runOnce(self):
        ...

class ComputeProcess:
    ...
    def runOnce(self):
        ...

class OtherProcess:
    ...
    def runOnce(self):
        ...
Here's the pattern I'm trying to achieve:
start various streams
start each process
allow each process to publish to any stream
allow each process to listen to any stream (at various points in its loop) and behave accordingly (allow for interruption of its current task or not, etc.)
For example, one 'process' listens for external data. Another process does computation on some of that data. The computation process might be busy for a while, so by the time it comes back and checks the stream, many values may have piled up. I don't want to just use a queue, because I don't want to be forced to process each one in order; I'd rather be able to implement logic like, "if there are one or more things waiting, just run your process one more time; otherwise go do this interruptible task while you wait for something to show up."
That's like a lot, right? So I was thinking of using an actor model until I discovered RxPy. I saw that a stream is like a subject:
from reactivex.subject import BehaviorSubject

newData = BehaviorSubject(None)   # BehaviorSubject requires an initial value
newModel = BehaviorSubject(None)
then I thought I'd start a thread for each of my three high-level processes:
# build the dict once; rebinding threads = {...} three times would keep only the last entry
threads = {
    'data': threading.Thread(target=data),
    'compute': threading.Thread(target=compute),
    'other': threading.Thread(target=other),
}
for thread in threads.values():
    thread.start()
and I thought the functions of those threads should listen to the streams:
def data():
    while True:
        DataProcess().runOnce()  # publishes to stream inside process

def compute():
    def run(value):
        ComputeProcess().runOnce()
    # pass the function itself; subscribe(run()) would call run immediately
    # and subscribe its return value (None) instead
    newData.subscribe(run)
    newModel.subscribe(run)

def other():
    ''' not done '''
    ComputeProcess().runOnce()
Ok, so that's what I have so far. Is this pattern going to give me what I'm looking for?
Should I use threading in conjunction with rxpy or just use rxpy scheduler stuff to achieve concurrency? If so how?
I hope this question isn't too vague. I suppose I'm looking for the simplest framework where I can have a small number of computational-memory units (like objects, because they have internal state) that communicate with each other and work in parallel (or concurrently). At the highest level I want to be able to treat these computational-memory units (which I've called processes above) as individuals who mostly work on their own stuff but occasionally broadcast or send a message to a specific other individual, requesting or providing information.
Am I perhaps actually looking for an actor-model framework? Or is this RxPy setup versatile enough to achieve that without extreme complexity?
Thanks so much!
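For what it's worth, RxPY's own schedulers can supply the concurrency, so hand-rolled threads may not be needed at all. A minimal sketch of that approach, assuming the reactivex v4 API (the subject name and handler are illustrative):

import time
from reactivex import operators as ops
from reactivex.scheduler import ThreadPoolScheduler
from reactivex.subject import Subject

pool = ThreadPoolScheduler(max_workers=3)  # one worker per long-running "process"
new_data = Subject()

# the subscriber runs on a pool thread, off the publisher's thread
new_data.pipe(ops.observe_on(pool)).subscribe(
    on_next=lambda value: print("computing on", value))

new_data.on_next(42)  # publish; the handler fires on a pool thread
time.sleep(0.5)       # give the pool thread a moment before the script exits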

Python, Multithreading, sockets sometimes fail to create

Recently I observed a rather odd behaviour that only happens on Linux but not FreeBSD, and I was wondering whether anyone has an explanation, or at least a guess at what might really be going on.
The problem:
The socket creation method, socket.socket(), sometimes fails. This only happens when multiple threads are creating the sockets, single-threaded works just fine.
To expand on socket.socket() fails, most of the time I get "error 13: Permission denied", but I have also seen "error 93: Protocol not supported".
Notes:
I have tried this on Ubuntu 18.04 (bug is there) and FreeBSD 12.0 (bug is not there)
It only happens when multiple threads are creating sockets
I've used UDP as the protocol for the sockets, although that seems to be more fault-tolerant. I have tried it with TCP as well; it goes haywire even faster, with similar errors.
It only happens sometimes, so multiple runs might be required, or, as in the code below, an inflated number of threads should also do the trick.
Code:
Here's some minimal code that you can use to reproduce that:
from threading import Thread
import socket

def foo():
    udp = socket.getprotobyname('udp')
    try:
        send_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, udp)
    except Exception as e:
        print(type(e))
        print(repr(e))

def main():
    for _ in range(6000):
        t = Thread(target=foo)
        t.start()

main()
Note:
I have used an artificially large number of threads just to maximize the probability that you'd hit the error at least once within a run with UDP. As I said earlier, if you try TCP you'll see A LOT of errors with that number of threads. But in reality, even a more real-world number of threads like 20 or even 10 would trigger the error; you'd just likely need multiple runs to observe it.
Wrapping the socket creation in a while loop with try/except causes all subsequent calls to fail as well.
Wrapping the socket creation in try/except and, in the exception handler, restarting the function (i.e. calling it again) does work and does not fail.
Any ideas, suggestions or explanations are welcome!!!
P.S.
Technically I know I can get around my problem by having a single thread create as many sockets as I need and pass them as arguments to my other threads, but that is not the point really. I am more interested in why this is happening and how to solve it, rather than what workarounds there might be, even though these are also welcome. :)
I managed to solve it. The problem comes from getprotobyname() not being thread safe!
See:
The Linux man page
On another note, the FreeBSD man page also hints that this might cause problems with concurrency; however, my experiments suggest that it does not. Maybe someone can follow up?
Anyway, a fixed version of the code for anyone interested is to get the protocol number in the main thread (which seems sensible, and I should have done in the first place) and then pass it as an argument. This both reduces the number of system calls you perform and fixes any concurrency problems with that call within the program. The code looks as follows:
from threading import Thread
import socket

def foo(proto_num):
    try:
        send_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, proto_num)
    except Exception as e:
        print(type(e))
        print(repr(e))

def main():
    proto_num = socket.getprotobyname('udp')  # look up the protocol number once, in the main thread
    for _ in range(6000):
        t = Thread(target=foo, args=(proto_num,))
        t.start()

main()
With this version, the "Permission denied" and "Protocol not supported" exceptions during socket creation no longer appear. Also, note that with SOCK_DGRAM the proto_num argument is redundant and can be omitted altogether; the solution is more relevant in case someone wants to create a SOCK_RAW socket.
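That is, for plain UDP the protocol lookup can be skipped entirely (a trivial illustration):

# the protocol argument defaults to 0, which selects UDP for SOCK_DGRAM
send_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)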

Can't share variable between threads with websockify

I have been fighting with Websockify for the last few days trying to make it work. There is no apparent documentation, so I end up doing things by trial and error.
I have a server which runs on two threads. One thread always sends and receives information while the second thread does other work. However, I can't seem to make the two threads talk to each other.
#!/usr/bin/env python
from websocket import WebSocketServer
from threading import Thread
from time import sleep

class Server(WebSocketServer):
    a = 10

    def new_client(self):
        while True:
            sleep(1)
            print("Thread 2: ", self.a)

server = Server('', 9017)
Thread(target=server.start_server).start()

# Main thread continues
while 1:
    sleep(1)
    server.a += 2
    print("Main thread: ", server.a)
Output:
Main thread: 18
Thread 2: 16
Main thread: 20
Thread 2: 16
Main thread: 22
Thread 2: 16
Main thread: 24
Thread 2: 16
Obviously the two threads don't share the same attribute a. Why?
By default websockify spawns a new process for each new client connection (websockify connections tend to be long-lived so the process creation overhead isn't generally an issue). This provides some security isolation to reduce the risk that bugs in websockify can be exploited to allow one client to listen in or otherwise affect other client connections.
You can find the process creation code in the top_new_client method. There is an option called --run-once that will handle a single client in the same process. However, it is designed to exit the main loop in top_new_client after a single connection. You could remove the break statement in the self.run_once conditional check, but that means you won't be able to connect more than one client at a time; perhaps that is sufficient for what you are trying to do.
I also have some unpushed in-progress code to switch WebSocketServer to be more like the HTTPServer class where you provide your own threading or multiprocessing mixin. If you think that might help, let me know and I can push that out to a branch.
Another option for your case would be to use some form of IPC communication to communicate between each client process and the parent process.
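For example, a value shared through the multiprocessing module survives the fork, unlike a plain class attribute. A minimal sketch of that idea (independent of websockify's own API):

from multiprocessing import Process, Value
from time import sleep

a = Value('i', 10)              # an integer shared between parent and child processes

def client_handler(shared):
    for _ in range(3):
        sleep(1)
        print("child sees:", shared.value)

Process(target=client_handler, args=(a,)).start()

for _ in range(3):
    sleep(1)
    with a.get_lock():          # guard the read-modify-write
        a.value += 2
    print("parent set:", a.value)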

NIO Server : use a worker thread or not?

I'm building a server with NIO, and I have two questions.
Do I have to use a worker thread or a thread pool to process the received messages, or can I let the main thread do all this work? (I have performance needs.)
I have two kinds of sending: a sendNow method which ends with selector.selectNow(), and a simple send method which ends with selector.wakeup(). Can I lose data with those methods?
thanks
If possible try to do it all in one thread. It gets very complicated very quickly otherwise.
I don't know why you think a sendNow() method needs to end with either selectNow() or wakeup(), but neither of them is intrinsically going to cause a data loss.

What multithreading package for Lua "just works" as shipped?

Coding in Lua, I have a triply nested loop that goes through 6000 iterations. All 6000 iterations are independent and can easily be parallelized. What threads package for Lua compiles out of the box and gets decent parallel speedups on four or more cores?
Here's what I know so far:
luaproc comes from the core Lua team, but the software bundle on luaforge is old, and the mailing list has reports of it segfaulting. Also, it's not obvious to me how to use the scalar message-passing model to get results ultimately into a parent thread.
Lua Lanes makes interesting claims but seems to be a heavyweight, complex solution. Many messages on the mailing list report trouble getting Lua Lanes to build or work for them. I myself have had trouble getting the underlying LuaRocks distribution mechanism to work for me.
LuaThread requires explicit locking and requires that communication between threads be mediated by global variables that are protected by locks. I could imagine worse, but I'd be happier with a higher level of abstraction.
Concurrent Lua provides an attractive message-passing model similar to Erlang, but it says that processes do not share memory. It is not clear whether spawn actually works with any Lua function or whether there are restrictions.
Russ Cox proposed an occasional threading model that works only for C threads. Not useful for me.
I will upvote all answers that report on actual experience with these or any other multithreading package, or any answer that provides new information.
For reference, here is the loop I would like to parallelize:
for tid, tests in pairs(tests) do
    local results = { }
    matrix[tid] = results
    for i, test in pairs(tests) do
        if test.valid then
            results[i] = { }
            local results = results[i]
            for sid, bin in pairs(binaries) do
                local outcome, witness = run_test(test, bin)
                results[sid] = { outcome = outcome, witness = witness }
            end
        end
    end
end
The run_test function is passed in as an argument, so a package can be useful to me only if it can run arbitrary functions in parallel. My goal is enough parallelism to get 100% CPU utilization on 6 to 8 cores.
Norman wrote concerning luaproc:
"it's not obvious to me how to use the scalar message-passing model to get results ultimately into a parent thread"
I had the same problem with a use case I was dealing with. I liked luaproc due to its simple and light implementation, but my use case had C code that was calling Lua, which triggered a coroutine that needed to send/receive messages to interact with other luaproc threads.
To achieve my desired functionality I had to add features to luaproc to allow sending and receiving messages from the parent thread, or from any other thread not running under the luaproc scheduler. Additionally, my changes allow using luaproc send/receive from coroutines created in Lua states spawned by luaproc.newproc().
I added an additional luaproc.addproc() function to the API, which is to be called from any Lua state running in a context not controlled by the luaproc scheduler, in order to register itself with luaproc for sending/receiving messages.
I am considering posting the source as a new github project or contacting the developers and seeing if they would like to pull my additions. Suggestions as to how I should make it available to others are welcome.
Check the threads library in the Torch family. It implements a thread pool model: a few true threads (pthreads on Linux, Windows threads on Win32) are created first. Each thread has a lua_State object and a blocking job queue that accepts jobs added from the main thread.
Lua objects are copied from the main thread to the job thread. However, C objects such as Torch tensors or tds data structures can be passed to job threads via pointers; this is how limited shared memory is achieved.
This is a perfect example of MapReduce
You can use LuaRings to accomplish your parallelization needs.
Concurrent Lua might seem like the way to go, but as I note in my updates below, it doesn't run things in parallel. The approach I tried was to spawn several processes that execute pickled closures received through the message queue.
Update
Concurrent Lua seems to handle first-class functions and closures without a hitch. See the following example program.
require 'concurrent'

local NUM_WORKERS = 4     -- number of worker threads to use
local NUM_WORKITEMS = 100 -- number of work items for processing

-- calls the received function in the local thread context
function worker(pid)
    while true do
        -- request new work
        concurrent.send(pid, { pid = concurrent.self() })
        local msg = concurrent.receive()
        -- exit when instructed
        if msg.exit then return end
        -- otherwise, run the provided function
        msg.work()
    end
end

-- creates workers, produces all the work and performs shutdown
function tasker()
    local pid = concurrent.self()
    -- create the worker threads
    for i = 1, NUM_WORKERS do concurrent.spawn(worker, pid) end
    -- provide work to threads as requests are received
    for i = 1, NUM_WORKITEMS do
        local msg = concurrent.receive()
        -- send the work as a closure
        concurrent.send(msg.pid, { work = function() print(i) end, pid = pid })
    end
    -- shutdown the threads as they complete
    for i = 1, NUM_WORKERS do
        local msg = concurrent.receive()
        concurrent.send(msg.pid, { exit = true })
    end
end

-- create the task process
local pid = concurrent.spawn(tasker)

-- run the event loop until all threads terminate
concurrent.loop()
Update 2
Scratch all of that stuff above. Something didn't look right when I was testing this. It turns out that Concurrent Lua isn't concurrent at all. The "processes" are implemented with coroutines and all run cooperatively in the same thread context. That's what we get for not reading carefully!
So, at least I eliminated one of the options I guess. :(
I realize that this is not a works-out-of-the-box solution, but, maybe go old-school and play with forks? (Assuming you're on a POSIX system.)
What I would have done:
Right before your loop, put all tests in a queue that is accessible between processes. (A file, a Redis LIST, or anything else you like most.)
Also before the loop, spawn several forks with lua-posix (as many as the number of cores, or even more, depending on the nature of the tests). In the parent fork, wait until all the children have quit.
In each fork, in a loop, get a test from the queue, execute it, and put the results somewhere. (To a file, to a Redis LIST, anywhere else you like.) If there are no more tests in the queue, quit.
In the parent, fetch and process all test results as you do now.
This assumes that test parameters and results are serializable. But even if they are not, I think that it should be rather easy to cheat around that.
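The same queue-of-tests pattern, sketched in Python with multiprocessing for illustration (a one-argument run_test and an all_tests list are illustrative stand-ins for the question's setup; tests and results must be picklable):

from multiprocessing import Process, Queue, cpu_count

def worker(tasks, results):
    while True:
        test = tasks.get()
        if test is None:                      # sentinel: queue drained, quit
            return
        results.put((test, run_test(test)))

tasks, results = Queue(), Queue()
for test in all_tests:                        # put all tests in the shared queue
    tasks.put(test)

nworkers = cpu_count()
for _ in range(nworkers):
    tasks.put(None)                           # one sentinel per worker

workers = [Process(target=worker, args=(tasks, results)) for _ in range(nworkers)]
for w in workers:
    w.start()

outcomes = [results.get() for _ in all_tests] # drain results before joining
for w in workers:
    w.join()                                  # the parent waits until all children quit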
I've now built a parallel application using luaproc. Here are some misconceptions that kept me from adopting it sooner, and how to work around them.
Once the parallel threads are launched, as far as I can tell there is no way for them to communicate back to the parent. This property was the big block for me. Eventually I realized the way forward: when it's done forking threads, the parent stops and waits. The job that would have been done by the parent should instead be done by a child thread, which should be dedicated to that job. Not a great model, but it works.
Communication between parent and children is very limited. The parent can communicate only scalar values: strings, Booleans, and numbers. If the parent wants to communicate more complex values, like tables and functions, it must code them as strings. Such coding can take place inline in the program, or (especially) functions can be parked into the filesystem and loaded into the child using require.
The children inherit nothing of the parent's environment. In particular, they don't inherit package.path or package.cpath. I had to work around this by the way I wrote the code for the children.
The most convenient way to communicate from parent to child is to define the child as a function, and to have the child capture parental information in its free variables, known in Lua parlance as "upvalues." These free variables may not be global variables, and they must be scalars. Still, it's a decent model. Here's an example:
local function spawner(N, workers)
    return function()
        local luaproc = require 'luaproc'
        for i = 1, N do
            luaproc.send('source', i)
        end
        for i = 1, workers do
            luaproc.send('source', nil)
        end
    end
end
This code is used as, e.g.,
assert(luaproc.newproc(spawner(randoms, workers)))
This call is how values randoms and workers are communicated from parent to child.
The assertion is essential here, as if you forget the rules and accidentally capture a table or a local function, luaproc.newproc will fail.
Once I understood these properties, luaproc did indeed work "out of the box" when downloaded from askyrme on GitHub.
ETA: There is an annoying limitation: in some circumstances, calling fread() in one thread can prevent other threads from being scheduled. In particular, if I run the sequence
local file = io.popen(command, 'r')
local result = file:read '*a'
file:close()
return result
the read operation blocks all other threads. I don't know why this is; I assume it is some nonsense going on within glibc. The workaround I used was to call read(2) directly, which required a little glue code, but it works properly with io.popen and file:close().
There's one other limitation worth noting:
Unlike Tony Hoare's original conception of Communicating Sequential Processes, and unlike most mature, serious implementations of synchronous message passing, luaproc does not allow a receiver to block on multiple channels simultaneously. This limitation is serious, and it rules out many of the design patterns that synchronous message passing is good at, but it's still fine for many simple models of parallelism, especially the "parbegin" sort that I needed to solve for my original problem.
