Python: Callback on the worker-queue not working - python-3.x

Apologies for the long post. I am subscribing to a RabbitMQ queue and trying to create a worker queue to execute tasks. This is required because the incoming message rate on RabbitMQ will be high and processing each item from the queue takes 10-15 minutes. I want at most 4 items in the worker queue at a time, with a callback method registered to process them. The expectation is that when all 4 slots in the worker queue are busy, new incoming messages block until a slot frees up.
The RabbitMQ piece is working well. The problem is that I cannot figure out why the items in my worker queue are not being processed, i.e. the callback is not working. In fact, an item from the worker queue is processed only once, when the program starts. After that, tasks keep getting added to the worker queue without being consumed. I would appreciate help understanding this.
I am attaching the code for rabbitmqConsumer, driver, and slaveConsumer. Some information has been redacted from the code for privacy reasons.
# This is the driver
#!/usr/bin/env python
import time
from rabbitmqConsumer import BasicMessageReceiver
basic_receiver_object = BasicMessageReceiver()
basic_receiver_object.declare_queue()
while True:
    basic_receiver_object.consume_message()
    time.sleep(2)
#This is the rabbitmqConsumer
#!/usr/bin/env python
import pika
import ssl
import json
from slaveConsumer import slave
class BasicMessageReceiver:
    def __init__(self):
        # SSL Context for TLS configuration of Amazon MQ for RabbitMQ
        ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
        url = <url for the queue>
        parameters = pika.URLParameters(url)
        parameters.ssl_options = pika.SSLOptions(context=ssl_context)
        self.connection = pika.BlockingConnection(parameters)
        self.channel = self.connection.channel()
        # worker-queue object
        self.slave_object = slave()
        self.slave_object.start_task()
    def declare_queue(self, queue_name="abc"):
        print(f"Trying to declare queue inside consumer({queue_name})...")
        self.channel.queue_declare(queue=queue_name, durable=True)
    def close(self):
        print("Closing Receiver")
        self.channel.close()
        self.connection.close()
    def _consume_message_setup(self, queue_name):
        def message_consume(ch, method, properties, body):
            print(f"I am inside the message_consume")
            message = json.loads(body)
            self.slave_object.execute_task(message)
            ch.basic_ack(delivery_tag=method.delivery_tag)
        self.channel.basic_qos(prefetch_count=1)
        self.channel.basic_consume(on_message_callback=message_consume,
                                   queue=queue_name)
    def consume_message(self, queue_name="abc"):
        print("I am starting the rabbitmq start_consuming")
        self._consume_message_setup(queue_name)
        self.channel.start_consuming()
#This is the slaveConsumer
#!/usr/bin/env python
import pika
import ssl
import json
import requests
import threading
import queue
import os
class slave:
    def __init__(self):
        self.job_queue = queue.Queue(maxsize=3)
        self.job_item = ""
    def start_task(self):
        def _worker():
            while True:
                json_body = self.job_queue.get()
                self._parse_object_from_queue(json_body)
                self.job_queue.task_done()
        threading.Thread(target=_worker, daemon=True).start()
    def execute_task(self, obj):
        print("Inside execute_task")
        self.job_item = obj
        self.job_queue.put(self.job_item)
        # print(self.job_queue.queue)
    def _parse_object_from_queue(self, json_body):
        if bool(json_body['entity']):
            if json_body['entity'] == 'Hello':
                print("Inside Slave: Hello")
            elif json_body['entity'] == 'World':
                print("Inside Slave: World")
        self.job_queue.join()
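One likely cause, assuming the self.job_queue.join() call at the end of _parse_object_from_queue runs on the worker thread: join() waits until every queued item has been marked with task_done(), but the worker only calls task_done() after _parse_object_from_queue returns, so the single worker would block on its first item and never come back for the next one. The following is a minimal sketch of a bounded pool with 4 workers that avoids this. The names WorkerPool, handle, submit and wait are hypothetical and not part of the posted code; only the standard library is used, and the RabbitMQ side is left out.

```python
import queue
import threading


class WorkerPool:
    """Hypothetical stand-in for the `slave` class: N threads fed by a bounded queue."""

    def __init__(self, handler, num_workers=4):
        self.handler = handler
        self.num_workers = num_workers
        # maxsize bounds the backlog: put() blocks once this many items are waiting.
        self.job_queue = queue.Queue(maxsize=num_workers)

    def start(self):
        for _ in range(self.num_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            item = self.job_queue.get()
            try:
                self.handler(item)
            finally:
                # Always mark the item as done; join() is never called from a worker.
                self.job_queue.task_done()

    def submit(self, item):
        self.job_queue.put(item)  # blocks while the queue is full

    def wait(self):
        self.job_queue.join()  # call only from the producer side


results = []
results_lock = threading.Lock()


def handle(message):
    # Placeholder for the 10-15 minute task in the question.
    with results_lock:
        results.append(message["entity"])


pool = WorkerPool(handle, num_workers=4)
pool.start()
for name in ["Hello", "World", "Hello", "World", "Hello"]:
    pool.submit({"entity": name})
pool.wait()
print(sorted(results))
```

With this shape, execute_task maps to submit and the message callback blocks in put() while the queue is full. Whether blocking inside the pika callback for long periods is acceptable for the connection is a separate question that this sketch does not address.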

Related

Wait for message using python's async protocol

Intro:
I am working on a TCP server that receives events over TCP. For this task, I decided to use the asyncio Protocol classes (yeah, maybe I should have used Streams). The reception of events works fine.
Problem:
I need to be able to connect to the clients, so I create another "server" used to look up all my connected clients, and after finding the correct one, I use the Protocol class transport object to send a message and try to grab the response by reading a buffer variable that always has the last received message.
My problem is, after sending the message, I don't know how to wait for the response, so I always get the previous message from the buffer.
I will try to simplify the code to illustrate (please, keep in mind that this is an example, not my real code):
import asyncio
import time
CONN = set()
class ServerProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
        CONN.add(self)
    def data_received(self, data):
        self.buffer = data
        # DO OTHER STUFF
        print(data)
    def connection_lost(self, exc=None):
        CONN.remove(self)
class ConsoleProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
        # Get first value just to illustrate
        self.client = next(iter(CONN))
    def data_received(self, data):
        # Forward the message to the client
        self.client.transport.write(data)
        # wait a fraction of a second
        time.sleep(0.2)
        # forward the response of the client
        self.transport.write(self.client.buffer)
def main():
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(
        loop.create_server(protocol_factory=ServerProtocol,
                           host='0.0.0.0',
                           port=6789))
    loop.run_until_complete(
        loop.create_server(protocol_factory=ConsoleProtocol,
                           host='0.0.0.0',
                           port=9876))
    try:
        loop.run_forever()
    except Exception as e:
        print(e)
    finally:
        loop.close()
if __name__ == '__main__':
    main()
This is not only my first experience writing a TCP server, but is also my first experience working with parallelism. So it took me days to realize that my sleep not only would not work, but I was locking the server while it "sleeps".
Any help is welcome.
time.sleep(0.2) is blocking and should not be used in async code, because it stalls the whole event loop. If your program is serving 100 clients, the last client can be delayed by up to 0.2 * 99 seconds, which is not what you want.
The right way is to let the program wait 0.2 s without blocking, so that other concurrent clients are not delayed. We can use a thread:
import asyncio
import time
import threading
CONN = set()
class ServerProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
        CONN.add(self)
    def data_received(self, data):
        self.buffer = data
        # DO OTHER STUFF
        print(data)
    def connection_lost(self, exc=None):
        CONN.remove(self)
class ConsoleProtocol(asyncio.Protocol):
    def delay_thread(self):
        time.sleep(0.2)
    def connection_made(self, transport):
        self.transport = transport
        # Get first value just to illustrate
        self.client = next(iter(CONN))
    def data_received(self, data):
        # Forward the message to the client
        self.client.transport.write(data)
        # wait a fraction of a second
        thread = threading.Thread(target=self.delay_thread, args=())
        thread.daemon = True
        thread.start()
        # forward the response of the client
        self.transport.write(self.client.buffer)
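Note that starting a thread that sleeps does not hold back the write that follows it, so the console may still receive the previous buffer. A different technique, sketched here under assumptions, is to have the server-side protocol hand out an asyncio Future that data_received resolves, and to await that Future instead of sleeping. The request method, the UppercaseClient class and the demo function are hypothetical names used only for illustration; the sketch also assumes one outstanding request per connection and that a reply arrives in a single data_received call.

```python
import asyncio


class ServerProtocol(asyncio.Protocol):
    """Server-side connection that lets a coroutine await the peer's next message."""

    def __init__(self, registry):
        self.registry = registry
        self.transport = None
        self._pending = None  # Future resolved by the next data_received call

    def connection_made(self, transport):
        self.transport = transport
        self.registry.add(self)

    def data_received(self, data):
        if self._pending is not None and not self._pending.done():
            self._pending.set_result(data)

    def connection_lost(self, exc):
        self.registry.discard(self)

    async def request(self, payload, timeout=2.0):
        # Hypothetical helper: send a message and wait for the reply without blocking the loop.
        self._pending = asyncio.get_running_loop().create_future()
        self.transport.write(payload)
        return await asyncio.wait_for(self._pending, timeout)


class UppercaseClient(asyncio.Protocol):
    """Hypothetical remote client used only for the demo: replies in upper case."""

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data.upper())


async def demo():
    loop = asyncio.get_running_loop()
    conns = set()
    server = await loop.create_server(lambda: ServerProtocol(conns), "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    transport, _ = await loop.create_connection(UppercaseClient, "127.0.0.1", port)
    while not conns:  # wait until the server side has registered the connection
        await asyncio.sleep(0.01)
    reply = await next(iter(conns)).request(b"ping")
    transport.close()
    server.close()
    return reply


reply = asyncio.run(demo())
print(reply)
```

Because data_received is a plain callback and cannot await, a console protocol built this way would schedule the forwarding as a task, for example with asyncio.ensure_future, and write the reply to its own transport once request returns.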

Handling a lot of concurrent connections in Python 3 asyncio

I am trying to improve the performance of my application. It is a Python 3.6 asyncio.Protocol based TCP server (SSL wrapped) handling a lot of requests.
It works fine and the performance is acceptable when only one connection is active, but as soon as another connection is opened, the client part of the application slows down. This is really noticeable once there are 10-15 client connections.
Is there a way to properly handle requests in parallel or should I resort to running multiple server instances?
Edit: added code
main.py
if __name__ == '__main__':
    import package.server
    server = package.server.TCPServer()
    server.join()
package.server
import multiprocessing, asyncio, uvloop
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
from package.connection import Connection
class TCPServer(multiprocessing.Process):
    name = 'tcpserver'
    def __init__(self, discord_queue=None):
        multiprocessing.Process.__init__(self)
        self.daemon = True
        # some setup in here
        self.start()
    def run(self):
        loop = uvloop.new_event_loop()
        self.loop = loop
        # db setup, etc
        server = loop.create_server(Connection, HOST, PORT, ssl=SSL_CONTEXT)
        loop.run_until_complete(server)
        loop.run_forever()
package.connection
import asyncio, hashlib, os
from time import sleep, time as timestamp
class Connection(asyncio.Protocol):
    connections = {}
    def setup(self, peer):
        self.peer = peer
        self.ip, self.port = self.peer[0], self.peer[1]
        self.buffer = []
    @property
    def connection_id(self):
        if not hasattr(self, '_connection_id'):
            self._connection_id = hashlib.md5('{}{}{}'.format(self.ip, self.port, timestamp()).encode('utf-8')).hexdigest()
        return self._connection_id
    def connection_lost(self, exception):
        del Connection.connections[self.connection_id]
    def connection_made(self, transport):
        self.transport = transport
        self.setup(transport.get_extra_info('peername'))
        Connection.connections[self.connection_id] = self
    def data_received(self, data):
        # processing, average server side execution time is around 30ms
        sleep(0.030)
        self.transport.write(os.urandom(64))
The application runs on Debian 9.9 and is started via systemd
To "benchmark" I use this script:
import os, socket
from multiprocessing import Pool
from time import time as timestamp
def foobar(i):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('127.0.0.1', 60000))
    while True:
        ms = timestamp()*1000
        s.send(os.urandom(128))
        s.recv(1024*2)
        print(i, timestamp()*1000-ms)
if __name__ == '__main__':
    instances = 4
    with Pool(instances) as p:
        print(p.map(foobar, range(0, instances)))
To answer my own question here: I went with a solution that spawns multiple instances, each listening on base_port + x, and I put an nginx TCP load balancer in front of them.
The individual TCPServer instances are still spawned as their own processes. They communicate among themselves via a separate UDP connection and with the main process via multiprocessing.Queue.
While this does not "fix" the problem, it provides a somewhat scalable solution for my very specific problem.
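As a side note on the posted Connection code: sleep(0.030) inside data_received runs on the event loop thread, so every connection in that process waits while it runs. If the real processing is synchronous in the same way, one alternative to adding more processes is to hand it to a thread pool with loop.run_in_executor. The sketch below only illustrates that idea; blocking_work, handle and demo are hypothetical names, the 30 ms sleep stands in for the real processing, and the pool size of 10 is an assumption.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_work(data):
    """Stand-in for roughly 30 ms of synchronous processing."""
    time.sleep(0.03)
    return data[::-1]


async def handle(loop, pool, data):
    # The event loop stays free to serve other connections while a pool thread works.
    return await loop.run_in_executor(pool, blocking_work, data)


async def demo(n=10):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=n) as pool:
        start = time.perf_counter()
        results = await asyncio.gather(*(handle(loop, pool, b"abc") for _ in range(n)))
        elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(demo())
print(len(results), round(elapsed, 3))
```

Inside a Protocol, data_received cannot await, so it would schedule a coroutine such as handle with asyncio.ensure_future and write the result to the transport when it completes. For CPU-bound processing, threads may not help much because of the GIL, and separate processes, as in the answer above, remain the more suitable route.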

How to name thread for logging with concurrent.futures?

I am creating a webscraper that would scrape from multiple domains in different threads. As there are many different domains, I would like to be able to search logged info per each thread.
UPDATE: solution implemented in code. Follow # SOLUTION lines
The script has been set up as follows:
import logging
from queue import Queue, Empty
from threading import current_thread # SOLUTION
from concurrent.futures import ThreadPoolExecutor
logging.basicConfig(
    format='%(threadName)s %(levelname)s: %(message)s',
    level=logging.INFO
)
class Scraper:
    def __init__(self, max_workers):
        self.pool = ThreadPoolExecutor(max_workers = max_workers, thread_name_prefix='T')
        self.to_crawl = Queue()
        for task in self.setup_tasks(tasks=max_workers):
            logging.info('Putting task to queue:\n{}'.format(task))
            self.to_crawl.put(task)
        logging.info('Queue size after init: {}'.format(self.to_crawl.qsize()))
    def setup_tasks(self, cur, tasks=1):
        # Prepare tasks for the queue
    def run_task(self, task):
        # Function for executing the task
        current_thread().name = task['id'] # SOLUTION
        logging.info('Executing task:\n{}'.format(task))
        id = task['id'] # I want the task id to be reflected in the logging function for when run_task runs
    def run_scraper(self):
        while True:
            logging.info('Launching new thread, queue size is {}'.format(self.to_crawl.qsize()))
            try:
                task = self.to_crawl.get()
                self.pool.submit(self.run_task, task)
            except Empty:
                break
if __name__ == '__main__':
    s = Scraper(max_workers=3)
    s.run_scraper()
I would like to add the task['id'] to the logging formatting configuration instead of the given %(threadName)s without doing it manually each time the script logs something in run_task
Is there a way to assign task['id'] to the thread %(threadName)s when the thread takes the task in run_scraper?
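The approach marked # SOLUTION in the code above can be tried in isolation. This is a minimal sketch, assuming each task is a dict with an 'id' key as in the question; the task values are made up, and restoring the original thread name afterwards is an addition of this sketch, made because pool threads are reused for later tasks.

```python
import logging
import threading
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(
    format="%(threadName)s %(levelname)s: %(message)s",
    level=logging.INFO,
)


def run_task(task):
    thread = threading.current_thread()
    original = thread.name
    thread.name = task["id"]  # %(threadName)s now shows the task id
    try:
        logging.info("Executing task")
        return threading.current_thread().name
    finally:
        thread.name = original  # pool threads are reused, so restore the name


tasks = [{"id": "example.com"}, {"id": "example.org"}]  # made-up task ids
with ThreadPoolExecutor(max_workers=2, thread_name_prefix="T") as pool:
    names = list(pool.map(run_task, tasks))
print(names)
```

Every log record emitted while run_task is executing carries the task id in place of the pool's default thread name, with no change to the individual logging calls.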

Python Tornado: consuming external Queue from not coroutine

I have the following situation: I am using Python 3.6 and Tornado 5.1 to receive client requests over WebSocket. Some of these requests need to invoke external processing, which returns a queue and then periodically deposits results in it. These results are transmitted to the clients over the WebSocket.
The external processing is NOT a coroutine, so I invoke it using run_in_executor.
My problem:
When the response time of the external processing is very long, run_in_executor reaches the maximum number of workers (default: number of processors x 5).
Is it safe to increase the maximum number of workers?
Or is another solution recommended?
Below is a simplified version of the code.
Thank you very much in advance!
#########################
## SERVER CODE ##
#########################
from random import randint
import tornado.httpserver
import tornado.websocket
import tornado.ioloop
import tornado.web
from tornado import gen
import threading
import asyncio
import queue
import time
class WSHandler(tornado.websocket.WebSocketHandler):
    """entry point for all WS request"""
    def open(self):
        print('new connection. Request: ' + str(self.request))
    async def on_message(self, message):
        # Emulates the subscription to an external object
        # that returns a queue to listen
        producer = Producer()
        q = producer.q
        while True:
            rta = await tornado.ioloop.IOLoop.current().run_in_executor(None, self.loop_on_q, q)
            if rta != None:
                await self.write_message(str(rta))
            else:
                break
    def on_close(self):
        print('connection closed. Request: ' + str(self.request) +
              '. close_reason: ' + str(self.close_reason) +
              '. close_code: ' + str(self.close_code) +
              '. get_status: ' + str(self.get_status()))
    def loop_on_q(self, q):
        rta = q.get()
        return rta
class Producer:
    def __init__(self):
        self.q = queue.Queue()
        t = threading.Thread(target=self.start)
        t.daemon = True
        t.start()
    def start(self):
        count = 1
        while True:
            # time.sleep(randint(1,5))
            if count < 100:
                self.q.put(count)
            else:
                self.q.put(None)
                break
            time.sleep(50)
            count += 1
application = tornado.web.Application([
    (r'/ws', WSHandler),
])
if __name__ == "__main__":
    asyncio.set_event_loop(asyncio.new_event_loop())
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(8888)
    print('SRV START')
    tornado.ioloop.IOLoop.current().start()
#########################
## CLIENT CODE ##
#########################
# If you run it more than 20 times in less than 50 seconds ==> Block
# (number of processors x 5), I have 4 cores
from websocket import create_connection
def conect():
    url = 'ws://localhost:8888/ws'
    ws = create_connection(url)
    print('Connecting')
    return ws
print('Connecting to srv')
con_ws = conect()
print('Established connection. Sending msg ...')
msj = '{"type":"Socket"}'
con_ws.send(msj)
print('Package sent. Waiting answer...')
while True:
    result = con_ws.recv()
    print('Answer: ' + str(result))
Is it safe to increase the maximum number of workers? Yes, up to a limit that you can determine with load testing.
Is another solution recommended? If you reach the worker limit, you can move the workers to multiple separate servers (this approach is called horizontal scaling) and pass jobs to them through a message queue. See Celery for a batteries-included solution, or RabbitMQ, Kafka, etc. if you prefer to write everything yourself.
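Raising the limit can be done by passing an explicit executor to run_in_executor in place of None. The following is a rough sketch using plain asyncio rather than Tornado, so that it runs on its own. The pool size of 64, the thread name prefix, and the produce, drain and demo helpers are assumptions for illustration; the right size is whatever load testing supports.

```python
import asyncio
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# Assumed pool size for illustration; pick the real value from load testing.
EXECUTOR = ThreadPoolExecutor(max_workers=64, thread_name_prefix="q-wait")


def produce(q, n):
    """Stand-in for the external processing: deposits results, then a None sentinel."""
    for i in range(1, n + 1):
        q.put(i)
    q.put(None)


async def drain(q):
    loop = asyncio.get_running_loop()
    received = []
    while True:
        # Each blocking q.get() occupies one pool thread until an item arrives.
        item = await loop.run_in_executor(EXECUTOR, q.get)
        if item is None:
            break
        received.append(item)
    return received


async def demo(clients=30):
    queues = [queue.Queue() for _ in range(clients)]
    for q in queues:
        threading.Thread(target=produce, args=(q, 3), daemon=True).start()
    return await asyncio.gather(*(drain(q) for q in queues))


results = asyncio.run(demo())
EXECUTOR.shutdown()
print(len(results), results[0])
```

In the handler from the question, the same executor object would be passed as the first argument of IOLoop.current().run_in_executor. Each waiting client still holds one thread for as long as q.get() blocks, so memory use and thread count grow with the number of idle clients.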

websockets, asyncio and PyQt5 together at last. Is Quamash necessary?

I've been working on a client that uses PyQt5 and the websockets module which is built around asyncio. I thought that something like the code below would work but I'm finding that the incoming data (from the server) is not being updated in the GUI until I click enter in the line edit box. Those incoming messages are intended to set the pulse for the updates to the GUI and will carry data to be used for updating. Is quamash a better way to approach this? btw, I will be using processes for some other aspects of this code so I don't consider it overkill (at this point).
This is Python 3.6, PyQt5.6(or higher) and whatever version of websockets that currently installs with pip. https://github.com/aaugustin/websockets
The client:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import asyncio
import websockets
import sys
import time
from multiprocessing import Process, Pipe, Queue
from PyQt5 import QtCore, QtGui, QtWidgets
class ComBox(QtWidgets.QDialog):
    def __init__(self):
        QtWidgets.QDialog.__init__(self)
        self.verticalLayout = QtWidgets.QVBoxLayout(self)
        self.groupBox = QtWidgets.QGroupBox(self)
        self.groupBox.setTitle( "messages from beyond" )
        self.gridLayout = QtWidgets.QGridLayout(self.groupBox)
        self.label = QtWidgets.QLabel(self.groupBox)
        self.gridLayout.addWidget(self.label, 0, 0, 1, 1)
        self.verticalLayout.addWidget(self.groupBox)
        self.lineEdit = QtWidgets.QLineEdit(self)
        self.verticalLayout.addWidget(self.lineEdit)
        self.lineEdit.editingFinished.connect(self.enterPress)
    @QtCore.pyqtSlot()
    def enterPress(self):
        mytext = str(self.lineEdit.text())
        self.inputqueue.put(mytext)
    @QtCore.pyqtSlot(str)
    def updategui(self, message):
        self.label.setText(message)
class Websocky(QtCore.QThread):
    updatemaingui = QtCore.pyqtSignal(str)
    def __init__(self):
        super(Websocky, self).__init__()
    def run(self):
        while True:
            time.sleep(.1)
            message = self.outputqueue.get()
            try:
                self.updatemaingui[str].emit(message)
            except Exception as e1:
                print("updatemaingui problem: {}".format(e1))
async def consumer_handler(websocket):
    while True:
        try:
            message = await websocket.recv()
            outputqueue.put(message)
        except Exception as e1:
            print(e1)
async def producer_handler(websocket):
    while True:
        message = inputqueue.get()
        await websocket.send(message)
        await asyncio.sleep(.1)
async def handler():
    async with websockets.connect('ws://localhost:8765') as websocket:
        consumer_task = asyncio.ensure_future(consumer_handler(websocket))
        producer_task = asyncio.ensure_future(producer_handler(websocket))
        done, pending = await asyncio.wait(
            [consumer_task, producer_task],
            return_when=asyncio.FIRST_COMPLETED, )
        for task in pending:
            task.cancel()
def start_websockets():
    loop = asyncio.get_event_loop()
    loop.run_until_complete(handler())
inputqueue = Queue()
outputqueue = Queue()
app = QtWidgets.QApplication(sys.argv)
comboxDialog = ComBox()
comboxDialog.inputqueue = inputqueue
comboxDialog.outputqueue = outputqueue
comboxDialog.show()
webster = Websocky()
webster.outputqueue = outputqueue
webster.updatemaingui[str].connect(comboxDialog.updategui)
webster.start()
p2 = Process(target=start_websockets)
p2.start()
sys.exit(app.exec_())
The server:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import asyncio
import time
import websockets
# here we'll store all active connections to use for sending periodic messages
connections = []
# @asyncio.coroutine
async def connection_handler(connection, path):
    connections.append(connection) # add connection to pool
    while True:
        msg = await connection.recv()
        if msg is None: # connection lost
            connections.remove(connection) # remove connection from pool, when client disconnects
            break
        else:
            print('< {}'.format(msg))
# @asyncio.coroutine
async def send_periodically():
    while True:
        await asyncio.sleep(2) # switch to other code and continue execution in 2 seconds
        for connection in connections:
            message = str(round(time.time()))
            print('> Periodic event happened.')
            await connection.send(message) # send message to each connected client
start_server = websockets.serve(connection_handler, 'localhost', 8765)
asyncio.get_event_loop().run_until_complete(start_server)
asyncio.ensure_future(send_periodically()) # before blocking call we schedule our coroutine for sending periodic messages
asyncio.get_event_loop().run_forever()
Shortly after posting this question I realized the problem. The line
message = inputqueue.get()
in the producer_handler function is blocking. This causes what should be an async function to hang everything in that process until it sees something in the queue. My workaround was to use the aioprocessing module which provides asyncio compatible queues. So, it looks more like this:
import aioprocessing
async def producer_handler(websocket):
    while True:
        message = await inputqueue.coro_get()
        await websocket.send(message)
        await asyncio.sleep(.1)
inputqueue = aioprocessing.AioQueue()
The aioprocessing module provides some nice options and documentation. And in this case is a rather simple solution for the issue. https://github.com/dano/aioprocessing
So, to answer my question: No, you don't have to use quamash for this kind of thing.
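For readers who would rather not add a dependency, a different technique that should give a similar effect is to run the blocking get() in the default thread pool with loop.run_in_executor. This is only a sketch under assumptions, not the aioprocessing approach described above: fake_send stands in for websocket.send, the None sentinel exists only to end the demo, and the send and inputqueue parameters are introduced for illustration.

```python
import asyncio
import multiprocessing


async def producer_handler(send, inputqueue):
    loop = asyncio.get_running_loop()
    while True:
        # The blocking get() runs in the default thread pool, so the loop stays free.
        message = await loop.run_in_executor(None, inputqueue.get)
        if message is None:  # assumed sentinel, used only to stop the demo
            break
        await send(message)


async def demo():
    inputqueue = multiprocessing.Queue()
    sent = []

    async def fake_send(message):  # hypothetical stand-in for websocket.send
        sent.append(message)

    for text in ["hello", "world", None]:
        inputqueue.put(text)
    await producer_handler(fake_send, inputqueue)
    return sent


sent = asyncio.run(demo())
print(sent)
```

One trade-off is that a pool thread stays blocked in get() while the queue is empty, which can delay shutdown unless a sentinel or timeout is used.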
