Can't share variable between threads with websockify - multithreading

I have been fighting with Websockify for the last few days trying to make it work. There is no apparent documentation, so I end up doing things by trial and error.
I have a server that runs on two threads. One thread always sends and receives information while the second thread does other work. However, I can't seem to make the two threads talk to each other.
#!/usr/bin/env python
from websocket import WebSocketServer
from threading import Thread
from time import sleep

class Server(WebSocketServer):
    a = 10

    def new_client(self):
        while True:
            sleep(1)
            print("Thread 2: ", self.a)

server = Server('', 9017)
Thread(target=server.start_server).start()

# Main thread continues
while 1:
    sleep(1)
    server.a += 2
    print("Main thread: ", server.a)
Output:
Main thread: 18
Thread 2: 16
Main thread: 20
Thread 2: 16
Main thread: 22
Thread 2: 16
Main thread: 24
Thread 2: 16
Obviously the two threads don't share the same attribute a. Why?

By default websockify spawns a new process for each new client connection (websockify connections tend to be long-lived so the process creation overhead isn't generally an issue). This provides some security isolation to reduce the risk that bugs in websockify can be exploited to allow one client to listen in or otherwise affect other client connections.
You can find the process creation code in the top_new_client method. There is an option called --run-once that handles a single client in the same process. However, it is designed to exit the main loop in top_new_client after a single connection. You could remove the break statement in the self.run_once conditional check, but then you won't be able to connect more than one client at a time. Perhaps that is sufficient for what you are trying to do.
I also have some unpushed in-progress code to switch WebSocketServer to be more like the HTTPServer class where you provide your own threading or multiprocessing mixin. If you think that might help, let me know and I can push that out to a branch.
Another option for your case would be to use some form of IPC to communicate between each client process and the parent process.
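As a rough illustration of the IPC route, here is a minimal sketch that uses only the standard multiprocessing module, not anything websockify-specific. The names bump, client and demo are hypothetical, and the child process merely stands in for a per-client process. How to hook this into WebSocketServer is left open.

```python
import multiprocessing as mp

def bump(counter, step):
    """Add `step` to the shared counter while holding its lock."""
    with counter.get_lock():
        counter.value += step

def client(counter, updated, results):
    """Stand-in for a per-client process: wait for the parent's update, then read."""
    updated.wait(timeout=5)
    results.put(counter.value)

def demo():
    counter = mp.Value("i", 10)  # integer stored in shared memory
    updated = mp.Event()
    results = mp.Queue()
    child = mp.Process(target=client, args=(counter, updated, results))
    child.start()
    bump(counter, 2)             # parent changes the value after the child has started
    updated.set()
    seen = results.get(timeout=5)
    child.join()
    return seen

if __name__ == "__main__":
    print("child saw:", demo())
```

Unlike the class attribute in the question, the Value lives in shared memory, so a change made by the parent after the child was created is visible to the child.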

Related

Asynchronous Communication between few 'loops'

I have 3 classes that represent nearly isolated processes that can be run concurrently (meant to be persistent, like 3 main() loops).
class DataProcess:
    ...
    def runOnce(self):
        ...

class ComputeProcess:
    ...
    def runOnce(self):
        ...

class OtherProcess:
    ...
    def runOnce(self):
        ...
Here's the pattern I'm trying to achieve:
start various streams
start each process
allow each process to publish to any stream
allow each process to listen to any stream (at various points in its loop) and behave accordingly (allow for interruption of its current task or not, etc.)
For example, one 'process' listens for external data. Another process does computation on some of that data. The computation process might be busy for a while, so by the time it comes back to the start and checks the stream, many values may have piled up. I don't want to just use a queue because I don't want to be forced to process each one in order. I'd rather be able to implement logic like, "if there is one or multiple things waiting, just run your process one more time, otherwise go do this interruptible task while you wait for something to show up."
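The "run once no matter how many values piled up" logic described above can be sketched with the standard library alone. This is only one possible reading of the requirement: drain, compute_step, run_once and idle_task are hypothetical names, and the sketch assumes that only the most recent value matters.

```python
import queue

def drain(q):
    """Remove and return everything currently waiting in the queue, without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items

def compute_step(q, run_once, idle_task):
    """One pass of the compute loop: coalesce pending items or do idle work."""
    pending = drain(q)
    if pending:
        run_once(pending[-1])  # a single run, however many items were waiting
        return "ran"
    idle_task()                # nothing waiting: do the interruptible task instead
    return "idle"
```

The queue is used only as a mailbox here. Because the consumer empties it in one go, it is never forced to handle every item in order.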
That's like a lot, right? So I was thinking of using an actor model until I discovered RxPy. I saw that a stream is like a subject
from reactivex.subject import BehaviorSubject
# BehaviorSubject requires an initial value; None is used as a placeholder
newData = BehaviorSubject(None)
newModel = BehaviorSubject(None)
then I thought I'd start one thread for each of my 3 high-level processes:
import threading
# build the dict once; reassigning `threads` each time would keep only the last thread
threads = {
    'data': threading.Thread(target=data),
    'compute': threading.Thread(target=compute),
    'other': threading.Thread(target=other),
}
for thread in threads.values():
    thread.start()
and I thought the functions of those threads should listen to the streams:
def data():
    while True:
        DataProcess().runOnce()  # publishes to stream inside process

def compute():
    def run(_value=None):
        ComputeProcess().runOnce()
    # pass the callback itself; subscribe(run()) would pass its return value (None)
    newData.subscribe(run)
    newModel.subscribe(run)

def other():
    ''' not done '''
    ComputeProcess().runOnce()
Ok, so that's what I have so far. Is this pattern going to give me what I'm looking for?
Should I use threading in conjunction with rxpy or just use rxpy scheduler stuff to achieve concurrency? If so how?
I hope this question isn't too vague, I suppose I'm looking for the simplest framework where I can have a small number of computational-memory units (like objects because they have internal state) that communicate with each other and work in parallel (or concurrently). At the highest level I want to be able to treat these computational-memory units (which I've called processes above) as like individuals who mostly work on their own stuff but occasionally broadcast or send a message to a specific other individual, requesting information or providing information.
Am I perhaps actually looking for an actor model framework? or is this RxPy setup versatile enough to achieve that without extreme complexity?
Thanks so much!

python3 - thread is missing from enumerate result when it is sleeping

We have an API endpoint that starts a thread, and another endpoint to check the status of the thread (based on a thread ID returned by the first API call).
We use the threading module.
The function that the thread is executing may or may not sleep for a duration of time.
When we create the thread, we override the default name provided by the module and add the thread ID that was generated by us (so we can keep track).
The status endpoint gets the thread ID from the client request and simply loops over the results from threading.enumerate(). When the thread is running and not sleeping, we see that the thread is returned by the threading.enumerate() function. When it is sleeping, it is not.
The function we use to see if a thread is alive:
def thread_is_running(thread_id):
    all_threads = [t.name for t in threading.enumerate()]
    return any(thread_id in item for item in all_threads)
When we run in debug and print the value of "all_threads", we only see the MainThread thread during our thread's sleep time.
As soon as the sleep is over, we see our thread in the value of "all_threads".
This is how we start the thread:
thread_id = random.randint(10000, 50000)
thread_name = f"{service_name}-{thread_id}"
threading.Thread(target=drain, args=(service_name, params,), name=thread_name).start()
Is there a way to get a list of all threads including idle threads? Is a sleeping thread marked as idle? Is there a better way to pause a thread?
We thought about making the thread update its state in a database, but due to an internal issue we currently have, we cannot count 100% on writing to our database, so we prefer checking the system for the thread's status.
It turns out the reason we did not see the thread was our use of gunicorn with multiple workers.
The thread was started on one of the 4 configured workers, while the status API call could be handled by any of the 4 workers. Only when it was handled by the worker that was also running the thread were we able to see it in the enumerate() output.
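To illustrate the point, here is a small sketch showing that, within a single process, a blocked (sleeping) thread is still listed by threading.enumerate(). The thread name "service-12345" and the demo helper are made up for this example, and Event.wait() stands in for the sleep so that the thread can be woken deterministically.

```python
import threading

def thread_is_running(thread_id):
    """Return True if a live thread in THIS process has the id in its name."""
    return any(str(thread_id) in t.name for t in threading.enumerate())

def demo():
    wake_up = threading.Event()
    # wake_up.wait() blocks the thread, much like a long sleep would
    worker = threading.Thread(target=wake_up.wait, name="service-12345")
    worker.start()
    while_blocked = thread_is_running(12345)
    wake_up.set()      # let the worker finish
    worker.join()
    after_exit = thread_is_running(12345)
    return while_blocked, after_exit

if __name__ == "__main__":
    print(demo())
```

Since threading.enumerate() only reports threads of the current process, a check like this cannot see threads owned by another gunicorn worker. Tracking status across workers would need some shared state outside the process.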

ValueError when asyncio.run() is called in separate thread

I have a network application which is listening on multiple sockets.
To handle each socket individually, I use Python's threading.Thread module.
These sockets must be able to run tasks on packet reception without delaying any further packet reception from the socket handling thread.
To do so, I've declared the method(s) that are running the previously mentioned tasks with the keyword async so I can run them asynchronously with asyncio.run(my_async_task(my_parameters)).
I have tested this approach on a single socket (running on the main thread) with great success.
But when I use multiple sockets (each one with its own independent handler thread), the following exception is raised:
ValueError: set_wakeup_fd only works in main thread
My question is the following: Is asyncio the appropriate tool for what I need? If it is, how do I run an async method from a thread that is not the main thread?
Most of my search results are about "event loops" and "awaiting" async results, which (if I understand these results correctly) is not what I am looking for.
I am talking about sockets in this question to provide context but my problem is mostly about the behaviour of asyncio in child threads.
I can, if needed, write a short code sample to reproduce the error.
Thank you for the help!
Edit1, here is a minimal reproducible code example:
import asyncio
import threading
import time

# Handle a specific packet from any socket without interrupting the listening thread
async def handle_it(val):
    print("handled: {}".format(val))

# A class to simulate a threaded socket listener
class MyFakeSocket(threading.Thread):
    def __init__(self, val):
        threading.Thread.__init__(self)
        self.val = val  # Value for a fake received packet

    def run(self):
        for i in range(10):
            # The (fake) socket will sequentially receive [val, val+1, ... val+9]
            asyncio.run(handle_it(self.val + i))
            time.sleep(0.5)

# Entry point
sockets = MyFakeSocket(0), MyFakeSocket(10)
for socket in sockets:
    socket.start()
This is possibly related to the bug discussed here: https://bugs.python.org/issue34679
If so, this would be a problem with Python 3.8 on Windows. To work around this, you could either downgrade to Python 3.7 and create and run the event loop manually in each thread, like:
loop = asyncio.new_event_loop()  # get_event_loop() does not create a loop outside the main thread
loop.run_until_complete(<your tasks>)
loop.close()
Otherwise, would you be able to run the code in a docker container? This might work for you and would then be detached from the OS behaviour, but is a lot more work!
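A different approach, not taken from the answer above, is to keep one event loop running in a dedicated thread and let the listener threads hand coroutines to it with asyncio.run_coroutine_threadsafe. The sketch below assumes that arrangement. start_background_loop, listener and demo are hypothetical names, and handle_it returns its argument only so the result can be checked.

```python
import asyncio
import threading

async def handle_it(val):
    await asyncio.sleep(0)  # placeholder for real asynchronous work
    return val

def start_background_loop():
    """Create an event loop and run it forever in its own thread."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread

def demo():
    loop, loop_thread = start_background_loop()
    results = []

    def listener(base):
        for i in range(3):
            # Schedule the coroutine on the loop thread; returns a concurrent.futures.Future
            future = asyncio.run_coroutine_threadsafe(handle_it(base + i), loop)
            # A real listener would not need to wait; we do so only to collect results
            results.append(future.result(timeout=5))

    listeners = [threading.Thread(target=listener, args=(base,)) for base in (0, 10)]
    for t in listeners:
        t.start()
    for t in listeners:
        t.join()
    loop.call_soon_threadsafe(loop.stop)  # stop the loop from outside its thread
    loop_thread.join()
    loop.close()
    return sorted(results)

if __name__ == "__main__":
    print(demo())
```

With this layout the socket threads never start or own an event loop themselves, so they do not go through asyncio.run() at all.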

PYTHON - MULTITHREADING USING CLASSES

I am an absolute beginner in Python multithreading. My application needs to telnet to around 200 servers, execute commands and return the responses. I have created separate classes for telnetting and for processing the response. I read about the GIL and race conditions in threading, but I am not sure whether they will have an impact on my code, because for every thread I am creating a new instance of the class and accessing its method. So technically the threads will not share the same resource. Can anyone please explain whether my assumption is right and, if not, explain the right way of doing it?
Main method :
import threading

if __name__ == "__main__":
    thread_list = []
    for ip in server_list:  # server list contains the IP of hosts
        config_object = Configuration()  # configuration class has method for telnet device
        # args must be a tuple: (ip) is just a string, (ip,) is a one-element tuple
        thread1 = threading.Thread(target=config_object.captureconfigprocess, args=(ip,))
        thread_list.append(thread1)
    for thread in thread_list:
        thread.start()
    for thread in thread_list:
        thread.join()
"I read about GIL and race conditions in threading but not sure whether they will have impact in my code"
CPython threads are real OS threads, but the GIL allows only one of them to execute Python bytecode at a time, so threads do not speed up CPU-bound work. Blocking I/O, such as waiting on a telnet connection, releases the GIL, so Python threads are more than enough for most cases. They may or may not be enough for yours. 200 servers may seem like a lot, but it all boils down to how much communication happens between those 200 servers and your Python client. To be sure, you have to try. If you need more CPU parallelism, use multiprocessing.
"So technically the threads will not share same resource."
If each thread is using its own resource, then shared resources are not an issue to worry about.
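One way to try this out is a bounded thread pool from concurrent.futures instead of one thread per server. The sketch below is only an illustration: capture_config is a hypothetical stand-in for Configuration().captureconfigprocess(ip) and does no real telnet work, and the pool size of 20 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor

def capture_config(ip):
    """Hypothetical stand-in for Configuration().captureconfigprocess(ip)."""
    return "config from {}".format(ip)

def capture_all(server_list, max_workers=20):
    """Run capture_config for every host on a bounded pool of threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as server_list
        return dict(zip(server_list, pool.map(capture_config, server_list)))

if __name__ == "__main__":
    print(capture_all(["10.0.0.1", "10.0.0.2"]))
```

Each call works on its own arguments and returns its own result, which matches the "no shared resource" assumption in the question.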

Python: how does a thread wait for other threads to end before resuming its execution?

I am making a bot for Telegram; this bot will use a database (SQLite3).
I am familiar with threads and locks, and I know that it is safe to launch multiple threads that query the database.
My problem arises when I want to update/insert data.
Using Condition and Event from the threading module, I can prevent new threads from accessing the database while a thread is updating/inserting data.
What I haven't figured out is how to wait until all the threads that are accessing the database are done before updating/inserting data.
If I could get the count of a semaphore I would just wait for it to drop to 0, but since that is not possible, what approach should I use?
UPDATE: I can't use join() since I am using a Telegram bot and create threads dynamically with each request to my bot; therefore, when a thread is created I don't know whether I'll have to wait for it to end or not.
CLARIFICATION: join() can only be used if, at the start of a thread, you know whether you'll have to wait for it to end or not. Since I create a thread for each request from my clients and I am unaware of what they'll ask or when the request will be done, I can't know whether to use join() or not.
UPDATE2: Here is the code regarding the locks. I haven't finished the code regarding the database since I am more concerned with the locks and it doesn't seem relevant to the question.
import threading
from telegram.ext import CommandHandler

lock = threading.Lock()
evLock = threading.Event()

def addBehaviours(dispatcher):
    evLock.set()
    # (2) Fetch the list of events
    events_handler = CommandHandler('events', events)
    dispatcher.add_handler(events_handler)
    # (3) Add a new event
    addEvent_handler = CommandHandler('addEvent', addEvent)
    dispatcher.add_handler(addEvent_handler)

# (2) Fetch the list of events
#run_async
def events(bot, update):
    evLock.wait()
    # fetchEvents()

# (3) Add a new event
#run_async
def addEvent(bot, update):
    with lock:
        evLock.clear()
        # addEvent()
        evLock.set()
You can use threading.Thread.join(). This will wait for a thread to end and only continue on when the thread is done.
Usage below:
import threading as thr
thread1 = thr.Thread() # some thread to be waited for
thread2 = thr.Thread() # something that runs after thread1 finishes
thread1.start() # start up this thread
thread1.join() # wait until this thread finishes
thread2.start()
...
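For the situation in the question, where the threads are created dynamically and join() is not practical, a readers-writer lock built on threading.Condition is one possible alternative. The sketch below is not part of the answer above. ReadWriteLock and its method names are hypothetical, and the design assumes that any number of readers may proceed together while a writer needs exclusive access.

```python
import threading

class ReadWriteLock:
    """Many concurrent readers, or exactly one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of threads currently reading
        self._writing = False   # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake a writer waiting for readers to finish

    def acquire_write(self):
        with self._cond:
            # wait until no writer is active and the reader count has dropped to 0
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()
```

In the question's terms, events() would wrap the fetch in acquire_read()/release_read() and addEvent() would wrap the insert in acquire_write()/release_write(). Note that this simple version lets a steady stream of readers delay a writer indefinitely.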
