Pika states that it is not thread-safe and that a single connection should not be shared across multiple threads.
(I think) I am running one thread per connection, which should be OK, but the wording of other answers suggests there may be a subtle difference between 'running one thread per connection' and 'running one connection per thread'.
My goal is to have a consumer that listens for RMQ messages and, when a message is received, does some work that takes time. The work logic itself is not multithreaded, as it executes synchronously. The exchange and queue were set up manually - I am just writing a consumer.
However, because the work takes time (URL calls), currently I have each callback create and start a single thread:
import random
import string
import threading
import time

import pika

class MyThread(threading.Thread):
    def run(self):
        name = random.choice(string.ascii_letters)
        print(f"Thread {name} execution started.")
        # Simulate URL calls
        time.sleep(random.randrange(1, 5))
        print(f"Thread {name} execution ended.")

class MyClass():
    def connect(self, url, queue):
        connection = pika.BlockingConnection(
            pika.connection.URLParameters(url)
        )
        channel = connection.channel()
        channel.basic_consume(queue=queue, on_message_callback=self.callback)
        # Infinite loop that waits for incoming messages
        channel.start_consuming()

    def callback(self, ch, method, properties, body):
        thread = MyThread()
        thread.start()
        # Not sure of this ACK and how to NACK
        ch.basic_ack(delivery_tag=method.delivery_tag)
When executing the program, the threads complete in different orders, which is what I expect.
Thread H execution started.
Thread E execution started.
Thread G execution started.
Thread j execution started.
Thread E execution ended.
Thread j execution ended.
Thread H execution ended.
Thread G execution ended.
My understanding is that the following implementation (which I am not using) would cause thread-safety issues:
def callback(self, ch, method, properties, body):
    thread1 = MyThread()
    thread2 = MyThread()
    thread3 = MyThread()
    thread1.start()
    thread2.start()
    thread3.start()
Is my implementation thread safe? If not, how would I implement a thread safe version?
EDIT: I added my implementation of an ACK; I'm not sure how to implement add_callback_threadsafe.
Your code appears fine. When you acknowledge a message from the thread spawned by the callback, be sure to do it via the connection's add_callback_threadsafe method.
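For reference, a minimal sketch of that pattern (the worker function and the way the connection, channel, and delivery tag reach it are assumptions for illustration):

import functools

def do_work(connection, channel, delivery_tag, body):
    # ... long-running URL calls happen here, in the worker thread ...
    # Schedule the ack to run on the connection's own thread;
    # basic_ack must not be called directly from this worker thread.
    ack = functools.partial(channel.basic_ack, delivery_tag=delivery_tag)
    connection.add_callback_threadsafe(ack)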
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
According to the docs, Pika is not thread-safe:
Pika does not have any notion of threading in the code. If you want to use Pika with threading, make sure you have a Pika connection per thread, created in that thread. It is not safe to share one Pika connection across threads, with one exception: you may call the connection method add_callback_threadsafe from another thread to schedule a callback within an active pika connection.
Let's say I have a subscriber which I have started using channel.start_consuming(). That thread will be blocked waiting for messages to arrive. These messages might be a long time apart (sometimes hours).
Surely if I want to safely / cleanly shut down the subscriber, I must do so from another thread? Or else how can I trigger the consumer to break out of blocking?
You could use connection.process_data_events() instead of just channel.start_consuming(). The advantage here is that you could do something like this to close a connection.
from time import sleep

def consume_messages(self):
    while self.running:
        self.connection.process_data_events()
        sleep(0.1)
    self.connection.close()
You would then just close the connection by setting self.running to False.
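For example, a sketch of driving that shutdown from another thread (the Consumer wrapper class below is assumed for illustration; 'connection' is an existing pika BlockingConnection):

import threading
from time import sleep

class Consumer:
    # Hypothetical wrapper around the consume_messages loop above.
    def __init__(self, connection):
        self.connection = connection
        self.running = True

    def consume_messages(self):
        while self.running:
            self.connection.process_data_events()
            sleep(0.1)
        self.connection.close()

consumer = Consumer(connection)
t = threading.Thread(target=consumer.consume_messages)
t.start()
# ... later, from another thread ...
consumer.running = False   # the loop exits on its next iteration and closes the connection
t.join()                   # wait for consume_messages to return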
I'm trying to understand the pattern for indefinitely running asyncio Tasks
and the difference that a custom loop signal handler makes.
I create workers using loop.create_task() so that they run concurrently.
In my regular workers' code I poll for data and act accordingly when data is there.
I'm trying to handle the shutdown process gracefully on a signal.
When a signal is delivered, I again use create_task() with the shutdown function, so that currently running tasks continue and shutdown gets executed on the next iteration of the event loop.
Now, when a single worker's while loop doesn't actually do any I/O or awaiting, it prevents the signal handler from being executed: the coroutine never ends and never gives control back, so no other tasks can run.
When I don't attach a custom signal handler to the loop and run this program, the signal is delivered and the program stops. I assume the main thread stops the loop itself.
This is obviously different from trying to schedule a (new) shutdown task on a running loop, because that running loop is stuck in a single coroutine which is blocked in a while loop and doesn't give back any control or time for other tasks.
Is there any standard pattern for such cases?
Do I need to await asyncio.sleep() if there's no work to do, or do I replace the while loop with something else (e.g. rescheduling the work function itself)?
If range(5) is replaced with range(1, 5), then all workers do await asyncio.sleep,
but if one of them does not, everything gets blocked. How should this case be handled - is there any standard approach?
The code below illustrates the problem.
import asyncio
import signal

async def shutdown(loop, sig=None):
    print("SIGNAL", sig)
    tasks = [t for t in asyncio.all_tasks()
             if t is not asyncio.current_task()]
    [t.cancel() for t in tasks]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # handle_task_results(results)
    loop.stop()

async def worker(intval):
    print("start", intval)
    while True:
        if intval:
            print("#", intval)
            await asyncio.sleep(intval)

loop = asyncio.get_event_loop()
for sig in {signal.SIGINT, signal.SIGTERM}:
    loop.add_signal_handler(
        sig,
        lambda s=sig: asyncio.create_task(shutdown(loop, sig=s)))
workers = [loop.create_task(worker(i)) for i in range(5)]  # this range
loop.run_forever()
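For comparison, a sketch of a worker that yields control even when it has no work, using the await asyncio.sleep() idea from the question (this is an illustration, not necessarily the standard pattern):

import asyncio

async def worker(intval):
    print("start", intval)
    while True:
        if intval:
            print("#", intval)
            await asyncio.sleep(intval)
        else:
            # No real work or I/O: still yield to the event loop so the
            # shutdown task scheduled by the signal handler gets a chance to run.
            # Note this variant busy-loops; a small non-zero sleep would ease CPU use.
            await asyncio.sleep(0)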
To accelerate a certain task, I'm subclassing Process to create a worker that will process data coming in as samples. A managing class feeds it data and reads the outputs (using two Queue instances). For asynchronous operation I'm using put_nowait and get_nowait. At the end I send a special exit code to my process, upon which it breaks out of its internal loop. However... it never happens. Here's a minimal reproducible example:
import multiprocessing as mp

class Worker(mp.Process):
    def __init__(self, in_queue, out_queue):
        super(Worker, self).__init__()
        self.input_queue = in_queue
        self.output_queue = out_queue

    def run(self):
        while True:
            received = self.input_queue.get(block=True)
            if received is None:
                break
            self.output_queue.put_nowait(received)
        print("\tWORKER DEAD")

class Processor():
    def __init__(self):
        # prepare
        in_queue = mp.Queue()
        out_queue = mp.Queue()
        worker = Worker(in_queue, out_queue)
        # get to work
        worker.start()
        in_queue.put_nowait(list(range(10**5)))  # XXX
        # clean up
        print("NOTIFYING")
        in_queue.put_nowait(None)
        #out_queue.get()  # XXX
        print("JOINING")
        worker.join()

Processor()
This code never completes, hanging permanently like this:
NOTIFYING
JOINING
WORKER DEAD
Why?
I've marked two lines with XXX. In the first one, if I send less data (say, 10**4), everything will finish normally (processes join as expected). Similarly in the second, if I get() after notifying the workers to finish. I know I'm missing something but nothing in the documentation seems relevant.
Documentation mentions that
When an object is put on a queue, the object is pickled and a background thread later flushes the pickled data to an underlying pipe. This has some consequences [...] After putting an object on an empty queue there may be an infinitesimal delay before the queue’s empty() method returns False and get_nowait() can return without raising queue.Empty.
https://docs.python.org/3.7/library/multiprocessing.html#pipes-and-queues
and additionally that
whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate.
https://docs.python.org/3.7/library/multiprocessing.html#multiprocessing-programming
This means that the behaviour you describe is probably caused by a race condition between self.output_queue.put_nowait(received) in the worker and joining the worker with worker.join() in Processor's __init__. If joining was faster than feeding the item into the queue, everything finishes fine; if it was too slow, there is still an item in the queue and the worker will not join.
Uncommenting the out_queue.get() in the main process empties the queue and allows joining. But since the get() call also needs to return if the queue is already empty, using a timeout might be an option to wait out the race condition, e.g. out_queue.get(timeout=10).
It might also be important to protect the main routine, especially on Windows (python multiprocessing on windows, if __name__ == "__main__").
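For illustration, a minimal sketch of what draining with a timeout before joining could look like in Processor.__init__ (the timeout value and the loop are assumptions, not part of the original answer):

import queue  # for the queue.Empty exception

# ... inside Processor.__init__, in place of the commented-out out_queue.get() ...
print("DRAINING")
while True:
    try:
        out_queue.get(timeout=1)   # pull whatever the worker produced
    except queue.Empty:
        break                      # nothing left; safe to join now
print("JOINING")
worker.join()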
I am making a bot for Telegram; this bot will use a database (SQLite3).
I am familiar with threads and locks and I know that it is safe to launch multiple threads that make queries to the database.
My problem arises when I want to update/insert data.
With the use of Condition and Event from the threading module, I can prevent new threads from accessing the database while a thread is updating/inserting data.
What I haven't figured out is how to wait for all the threads that are currently accessing the database to finish before updating/inserting data.
If I could get the count of a semaphore I would just wait for it to drop to 0, but since that is not possible, what approach should I use?
UPDATE: I can't use join() since I am using a Telegram bot and create threads dynamically with each request to my bot; therefore when a thread is created I don't know whether I'll have to wait for it to end or not.
CLARIFICATION: join() can only be used if, at the start of a thread, you know whether you'll have to wait for it to end or not. Since I create a thread for each request from my clients and I am unaware of what they'll ask or when the request will be done, I can't know whether to use join() or not.
UPDATE 2: Here is the code regarding the locks. I haven't finished the code regarding the database since I am more concerned with the locks, and it doesn't seem relevant to the question.
import threading

from telegram.ext import CommandHandler

lock = threading.Lock()
evLock = threading.Event()

def addBehaviours(dispatcher):
    evLock.set()
    # (2) Fetch the list of events
    events_handler = CommandHandler('events', events)
    dispatcher.add_handler(events_handler)
    # (3) Add a new event
    addEvent_handler = CommandHandler('addEvent', addEvent)
    dispatcher.add_handler(addEvent_handler)

# (2) Fetch the list of events
#run_async
def events(bot, update):
    evLock.wait()
    # fetchEvents()

# (3) Add a new event
#run_async
def addEvent(bot, update):
    with lock:
        evLock.clear()
        # addEvent()
        evLock.set()
You can use threading.Thread.join(). This will wait for a thread to end and only continue on when the thread is done.
Usage below:
import threading as thr

thread1 = thr.Thread()  # some thread to be waited for
thread2 = thr.Thread()  # something that runs after thread1 finishes

thread1.start()  # start up this thread
thread1.join()   # wait until this thread finishes
thread2.start()
...
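If the handler threads are not known in advance, one possibility (a sketch; the registry list and the place where threads get appended to it are assumptions) is to keep track of the spawned reader threads and join whatever is still alive before writing:

import threading

active_readers = []                 # hypothetical list; append each reader thread when it is started
registry_lock = threading.Lock()

def wait_for_readers():
    # Snapshot the currently known reader threads and wait for each to finish
    # before performing the update/insert.
    with registry_lock:
        current = list(active_readers)
    for t in current:
        t.join()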
I'm putting together a client server app on RPi. It has a main thread which creates a comms thread to talk to an iOS device.
The main thread creates an asyncio event loop and a sendQ and a recvQ and passes them as args to the commsDelegate main method in the comms thread.
The trouble I'm having is that when the iOS device connects, it needs to receive unsolicited data from this Python app as soon as the data becomes available, and it needs to be able to send data up to the Python app. So send and receive need to be non-blocking.
There are great echo server tutorials out there, but little in terms of the server doing something useful with the data.
Can anyone assist me in getting asyncio to read my send queue and forward data as soon as the main thread has queued it? I have receive working great.
Main Thread creates a loop and starts the comms thread:
commsLoop = asyncio.new_event_loop()
commsMainThread = threading.Thread(target=CommsDelegate.commsDelegate, args=(commsInQ,commsOutQ,commsLoop,commsPort,), daemon=True)
commsMainThread.start()
Then asyncio in the CommsDelegate module should run the loop with loop.run_forever(), with a server task reading from and writing to a socket stream and sending/receiving messages back up to the main thread via the queues.
Here's my code so far. I found that if I create a factory for the protocol generator, I can pass it the queue names, so the receipt of messages is all good now. When messages arrive from the client they are queued with put_nowait and the main thread receives them just fine.
I just need asyncio to handle the queue of outbound messages from the Main thread as they arrive on sendQ, so it can send them on to the connected client.
#!/usr/bin/env python3.6
import asyncio

class ServerProtocol(asyncio.Protocol):
    def __init__(self, loop, recvQ, sendQ):
        self.loop = loop
        self.recvQ = recvQ
        self.sendQ = sendQ

    def connection_made(self, transport):
        peername = transport.get_extra_info('peername')
        print('Connection from {}'.format(peername))
        self.transport = transport

    def data_received(self, data):
        message = data.decode()
        print('Data received: {!r}'.format(message))
        self.recvQ.put_nowait(message.rstrip())

    # Needs work... I think the queue.get_nowait should be a co-ro maybe?
    def unknownAtTheMo():
        dataToSend = sendQ.get_nowait()
        print('Send: {!r}'.format(message))
        self.transport.write(dataToSend)

    # Needs work to close on request from client or server or exc...
    def handleCloseSocket(self):
        print('Close the client socket')
        self.transport.close()

# async co-routine to consume the send message Q from Main Thread
async def consume(sendQ):
    print("In consume coro")
    while True:
        outboundData = await self.sendQ.get()
        print("Consumed", outboundData)
        self.transport.write(outboundData.encode('ascii'))

def commsDelegate(recvQ, sendQ, loop, port):
    asyncio.set_event_loop(loop)
    # Connection coroutine - Create a factory to assist the protocol in receipt of the queues as args
    factory = lambda: ServerProtocol(loop, recvQ, sendQ)
    # Each client connection will create a new protocol instance
    connection = loop.run_until_complete(loop.create_server(factory, host='192.168.1.199', port=port))
    # Outgoing message queue handler
    consumer = asyncio.ensure_future(consume(sendQ))
    # Set up connection
    loop.run_until_complete(connection)
    # Wait until the connection is closed
    loop.run_forever()
    # Wait until the queue is empty
    loop.run_until_complete(queue.join())
    # Cancel the consumer
    consumer.cancel()
    # Let the consumer terminate
    loop.run_until_complete(consumer)
    # Close the connection
    connection.close()
    # Close the loop
    loop.close()
I send all data messages as JSON and CommsDelegate performs encode and decode, then relays them as-is.
Update: asyncio thread seems to be working well for incoming traffic. Server receives json and relays it via a queue - non-blocking.
Once the send is working, I'll have a reusable blackbox server on a thread.
I can see two problems with your approach. First, all your clients are using the same recv and send queues, so there is no way the consume coroutine can know who to reply to.
The second issue has to do with your use of queues as a bridge between the synchronous and the asynchronous worlds. See this part of your code:
await self.sendQ.get()
If sendQ is a regular queue (from the queue module), this line will fail because queue.Queue.get is not a coroutine and cannot be awaited. On the other hand, if sendQ is an asyncio.Queue, the main thread won't be able to use sendQ.put because it is a coroutine. It would be possible to use put_nowait, but thread-safety is not guaranteed in asyncio. Instead, you'd have to use loop.call_soon_threadsafe:
loop.call_soon_threadsafe(sendQ.put_nowait, message)
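A small sketch of that bridge, assuming sendQ is an asyncio.Queue created and consumed inside the comms thread's loop (the helper functions below are illustrative, not the original code):

import asyncio

async def consume(sendQ, transport):
    # Runs inside the comms thread's event loop: await outbound messages
    # and write them to this connection's transport.
    while True:
        outbound = await sendQ.get()
        transport.write(outbound.encode('ascii'))

def send_from_main_thread(loop, sendQ, message):
    # Called from the main (synchronous) thread: schedule the put on the
    # loop's own thread instead of touching the asyncio.Queue directly.
    loop.call_soon_threadsafe(sendQ.put_nowait, message)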
In general, remember that asyncio is designed to run as the main application. It's supposed to run in the main thread, and communicate with synchronous code through a ThreadPoolExecutor (see loop.run_in_executor).
More information about multithreading in the asyncio documentation. You might also want to have a look at the asyncio stream API that provides a much nicer interface to work with TCP.