Python3 paho-mqtt speed vs Node.js mqtt

I have run some speed tests for MQTT in Python3 and Node.js, using QoS level 0, and have found Node.js to be remarkably faster than the Python3 implementation.
How can this be?
I'm open to using either framework as a bridge on the server side to handle data from multiple clients. However, I'm losing confidence that I should be using Python3 for anything on the server.
Here are the code snippets I ran.
Python3:
import paho.mqtt.client as mqtt
import logging
import time
import threading
import json
import sys

class MqttAdaptor(threading.Thread):
    def __init__(self, topic, type=None):
        threading.Thread.__init__(self)
        self.topic = topic
        self.client = None
        self.type = type

    def run(self):
        self.client = mqtt.Client(self.type)
        self.client.on_connect = self.on_connect
        self.client.on_disconnect = self.on_disconnect
        if self.type is not None:
            self.client.connect("localhost", 1883, 60)
            self.client.on_message = self.on_message
            self.client.loop_forever()
        else:
            self.client.connect_async("localhost", 1883, 60)
            self.client.loop_start()

    # The callback for when the client receives a CONNACK response from the server.
    def on_connect(self, client, userdata, flags, rc):
        self.client.subscribe(self.topic)

    def on_disconnect(self, client, userdata, rc):
        if rc != 0:
            print("Unexpected disconnection from local MQTT broker")

    # The callback for when a PUBLISH message is received from the server.
    def on_message(self, client, userdata, msg):
        jsonMsg = ""
        try:
            jsonMsg = json.loads(msg.payload)
            if jsonMsg['rssi'] is not None:
                # Multiply by 3.3 and keep four digits after the decimal point
                jsonMsg['rssi'] = round(jsonMsg['rssi'] * 3.3 * 10000) / 10000
        except Exception:
            pass
        print(json.dumps(jsonMsg))

    def publish(self, topic, payload, qos=0, retain=False):
        self.client.publish(topic, payload, qos, retain)

    def close(self):
        if self.client is not None:
            self.client.loop_stop()
            self.client.disconnect()

if __name__ == "__main__":
    topic = '/test/+/input/+'
    subber = MqttAdaptor(topic, 'sub')
    subber.start()

    topic = None
    test = MqttAdaptor(topic)
    test.run()

    print("start")
    while True:
        data = sys.stdin.readline()
        if not len(data):
            print("BREAK")
            break
        msg = data.split('\t')
        topic = msg[0]
        test.publish(topic, msg[1], 0)
    print("done")
    sys.exit(0)
Node.js:
"use strict";
const fs = require('fs');
const readline = require('readline');
const mqtt = require('mqtt');
const mqttClient = mqtt.connect();
mqttClient.on('connect', () => {
console.error('==== MQTT connected ====');
mqttClient.subscribe('/test/+/input/+');
});
mqttClient.on('close', () => {
console.error('==== MQTT closed ====');
});
mqttClient.on('error', (error) => {
console.error('==== MQTT error ' + error + ' ====');
});
mqttClient.on('offline', () => {
console.error('==== MQTT offline ====');
});
mqttClient.on('reconnect', () => {
console.error('==== MQTT reconnect ====');
});
mqttClient.on('message', (topic, message) => {
const topicSegments = topic.split('/');
topicSegments[topicSegments.length - 2] = 'done';
topic = topicSegments.join('/');
try {
//The message might not always be valid JSON
const json = JSON.parse(message);
//If rssi is null/undefined in input, it should be left untouched
if (json.rssi !== undefined && json.rssi !== null) {
//Multiply by 3 and limit the number of digits after comma to four
json.rssi = Math.round(json.rssi * 3.3 * 10000) / 10000;
}
console.log(topic + "\t" + JSON.stringify(json));
} catch (ex) {
console.error('Error: ' + ex.message);
}
});
const rl = readline.createInterface({
input: process.stdin,
terminal: false,
});
rl.on('line', (line) => {
const lineSegments = line.split("\t");
if (lineSegments.length >= 2) {
const topic = lineSegments[0];
const message = lineSegments[1];
mqttClient.publish(topic, message);
}
});
rl.on('error', () => {
console.error('==== STDIN error ====');
process.exit(0);
});
rl.on('pause', () => {
console.error('==== STDIN paused ====');
process.exit(0);
});
rl.on('close', () => {
console.error('==== STDIN closed ====');
process.exit(0);
});
Both scripts are run on the command line, connecting to the same broker.
They are run through a shell pipe (Node):
time cat test-performance.txt | pv -l -L 20k -q | nodejs index.js | pv -l | wc -l
and (python):
time cat test-performance.txt | pv -l -L 20k -q | python3 mqttTestThread.py | pv -l | wc -l
The test file contains around 2 GB of text in this format:
/test/meny/input/test {"sensor":"A1","data1":"176","time":1534512473545}
As shown in the commands, I count the number of lines processed while the scripts run. For a small test, the Python3 script has a throughput of roughly 3k lines/sec, while Node has a throughput of roughly 20k lines/sec.
This is a big difference. Does anyone have an idea why? And/or how to get Python to run with comparable throughput?

There are multiple reasons why Node is faster than Python for this task. The main reason is: Python is slooooow. Only libraries implemented in C, such as numpy or pandas, are reasonably fast, and then only for numeric workloads.
The second reason is, as Nhosko mentioned in a comment, that Node is asynchronous by default and therefore faster at I/O-bound tasks.
A potential third reason is that MQTT here carries JSON data. JSON stands for JavaScript Object Notation and parses natively into JavaScript objects.
I wouldn't recommend using Python for this task. Python is great for machine learning and data science; for server-side, I/O-bound tasks you may want to consider Node or Go.
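That said, there is some headroom before giving up on Python entirely. A rough sketch of one micro-optimization for the script above; on_message here is a hypothetical drop-in replacement for the method in the question, and it assumes payloads are UTF-8 JSON: skip the json.loads/json.dumps round-trip for messages that cannot contain an rssi field.
import json
import sys

def on_message(self, client, userdata, msg):
    payload = msg.payload
    # Cheap pre-check: the benchmark messages carry no "rssi" key, so
    # don't pay for a full parse/serialize cycle on them.
    if b'"rssi"' not in payload:
        sys.stdout.write(payload.decode('utf-8') + '\n')
        return
    try:
        jsonMsg = json.loads(payload)
        if jsonMsg.get('rssi') is not None:
            # Multiply by 3.3 and keep four digits after the decimal point
            jsonMsg['rssi'] = round(jsonMsg['rssi'] * 3.3 * 10000) / 10000
        sys.stdout.write(json.dumps(jsonMsg) + '\n')
    except ValueError:
        pass
Whether this closes a roughly 7x gap is doubtful, since the per-message interpreter overhead remains; running the unmodified script under PyPy is the other low-effort thing worth measuring.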

Related

Making pycurl request in asyncio websocket server setup

Currently, I have a simple websocket server that can handle recv and send operations. The code is as follows.
import asyncio
import json
from datetime import datetime

import websockets

async def recv_handler(websocket):
    while True:
        try:
            message = await websocket.recv()
            print(message)
        except Exception as e:
            print(e)
        await asyncio.sleep(0.01)

async def send_handler(websocket):
    while True:
        try:
            data = {
                "type": "send",
                "time": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            }
            await websocket.send(json.dumps(data))
        except Exception as e:
            print(e)
        await asyncio.sleep(0.01)

async def main(websocket):
    while True:
        recv_task = asyncio.create_task(recv_handler(websocket))
        send_task = asyncio.create_task(send_handler(websocket))
        await asyncio.gather(recv_task, send_task)

async def start_server():
    server = await websockets.serve(main, "", 3001)
    await server.wait_closed()

if __name__ == "__main__":
    asyncio.run(start_server())
This successfully runs a server that can handle messages sent from a client Node.js application using websockets, as well as send updates to the client application periodically.
// receive a message from the server
socket.addEventListener("message", ({ data }) => {
    const packet = JSON.parse(data);
    switch (packet.type) {
        case "send":
            console.log(packet.time)
            break;
        default:
            break;
    }
});

// send message to server
const onClickSend = () => {
    if (socket.readyState !== WebSocket.OPEN) {
        console.log("socket not open");
        return;
    } else {
        socket.send(JSON.stringify({
            type: "hello from client",
        }));
    }
}
Now, I want to include a blocking function call that sends a pycurl (or any HTTP) request, then use the result of that request, package it into the JSON object, and send that to the client.
I have a sample pycurl request that gets the weather from wttr.in:
import pycurl
import certifi
from io import BytesIO

def getWeather():
    # Creating a buffer, as cURL does not allocate one for the network response
    buffer = BytesIO()
    c = pycurl.Curl()
    # Initializing the request URL
    c.setopt(c.URL, 'wttr.in/Beijing?format="%l:+\%c+%t+%T\\n"')
    # Setting options for the cURL transfer
    c.setopt(c.WRITEDATA, buffer)
    # Setting the file name holding the certificates
    c.setopt(c.CAINFO, certifi.where())
    # Perform the file transfer
    c.perform()
    # Ending the session and freeing the resources
    c.close()
    # Retrieve the content from the BytesIO buffer
    body = buffer.getvalue()
    # Decoding the buffer
    return body.decode('utf-8')
So if we change data to include our weather:
data = {
    "type": "send",
    "weather": getWeather(),
}
and we can slightly change the Node.js application's case statement to print it:
case "send":
    console.log(packet.weather)
The problem with this, I believe, is that we are making a blocking request, but I don't know enough to fix the problem. Currently, I can make requests, but every time onClickSend is called (by pressing a button in the frontend), we get the "socket not open" error, meaning the backend is no longer handling received messages.
So how do I handle pycurl requests in asyncio-websocket program?
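Not an authoritative answer, but the usual pattern here is to push the blocking call onto a worker thread so the event loop keeps servicing the websocket. A minimal sketch of send_handler, assuming Python 3.9+ for asyncio.to_thread (on older versions, loop.run_in_executor(None, getWeather) does the same job):
async def send_handler(websocket):
    while True:
        try:
            # Run the blocking pycurl call in a thread; the event loop
            # stays free to keep running recv_handler in the meantime.
            weather = await asyncio.to_thread(getWeather)
            data = {
                "type": "send",
                "weather": weather,
                "time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            }
            await websocket.send(json.dumps(data))
        except Exception as e:
            print(e)
        await asyncio.sleep(0.01)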

How to use flask-socketio for heartbeat detection

The requirement is to create an interface such that, when a client connects, the server periodically sends a message to the client and the client responds with a message of its own; the response proves the client is alive (heartbeat detection).
I have written the code using flask-socketio and it runs successfully, but I don't know how to test multiple simultaneous connections, and if a connection is broken, I don't know how the background thread stops.
class MyCustomNamespace(Namespace):
    clients = []

    def on_connect(self):
        sid = request.sid
        print('client connected websocket: {}'.format(sid))
        self.clients.append(sid)
        global thread
        with thread_lock:
            if thread is None:
                thread = socketio.start_background_task(target=self.background_thread, args=(current_app._get_current_object()))

    def on_disconnect(self):
        sid = request.sid
        print('close websocket: {}'.format(sid))
        self.clients.remove(sid)

    def on_message(self, data):
        print('received message: ' + data["param"])
        if data["param"] is not None:
            print('websocket {} is alive'.format(request.sid))
        # else:
        #     print('websocket {} is dead'.format(*rooms()))
        #     self.disconnect(*rooms())

    def background_thread(self, app):
        with app.app_context():
            while True:
                time.sleep(6)
                data = {"status": "ok"}
                socketio.emit('server_detect', data, namespace='/testnamespace', broadcast=True)
                print("Probing whether the clients are still alive")

socketio.on_namespace(MyCustomNamespace("/testnamespace"))
HTML:
$(document).ready(function () {
    namespace = '/testnamespace';
    webSocketUrl = location.protocol + '//' + document.domain + ':' + location.port + namespace;
    console.log(webSocketUrl);
    var socket = io.connect(webSocketUrl);
    socket.on('server_detect', function (res) {
        console.log(res);
        socket.emit('message', {'param': 'value'});
    });
});
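One way to let the background thread stop itself, sketched against the code above (an assumption, not tested against multiple workers): have the loop watch the shared clients list and exit, clearing the global thread handle, once the last client disconnects. socketio.sleep is used instead of time.sleep so the wait cooperates with eventlet/gevent.
def background_thread(self, app):
    global thread
    with app.app_context():
        while True:
            socketio.sleep(6)  # cooperative sleep under eventlet/gevent
            with thread_lock:
                if not self.clients:
                    # Last client disconnected: clear the handle so
                    # on_connect can start a fresh thread later, then exit.
                    thread = None
                    return
            data = {"status": "ok"}
            socketio.emit('server_detect', data, namespace='/testnamespace')
For testing multiple connections, opening the page in several browser tabs gives each tab its own sid in clients.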

How can we use the microphone in Google Colab?

OSError                                   Traceback (most recent call last)
<ipython-input-21-4159a88154c9> in <module>()
      7 response = google_images_download.googleimagesdownload()
      8 r = sr.Recognizer()
----> 9 with sr.Microphone() as source:
     10     print("Say something!")
     11     audio = r.listen(source)

/usr/local/lib/python3.6/dist-packages/speech_recognition/__init__.py in __init__(self, device_index, sample_rate, chunk_size)
     84             assert 0 <= device_index < count, "Device index out of range ({} devices available; device index should be between 0 and {} inclusive)".format(count, count - 1)
     85         if sample_rate is None:  # automatically set the sample rate to the hardware's default sample rate if not specified
---> 86             device_info = audio.get_device_info_by_index(device_index) if device_index is not None else audio.get_default_input_device_info()
     87             assert isinstance(device_info.get("defaultSampleRate"), (float, int)) and device_info["defaultSampleRate"] > 0, "Invalid device info returned from PyAudio: {}".format(device_info)
     88             sample_rate = int(device_info["defaultSampleRate"])
Here's an example that shows how to access a user's camera and microphone:
https://colab.research.google.com/notebooks/snippets/advanced_outputs.ipynb#scrollTo=2viqYx97hPMi
The snippet you linked above attempts to access a microphone in Python. That won't work because there's no microphone attached to the virtual machine which executes Python code in Colab.
Instead, you want to access the microphone of the computer running the web browser. Then, capture data there, and pass it back to the virtual machine for processing in Python.
That's what's shown in the snippet linked above.
Here is a simple snippet
from IPython.display import HTML, Audio
from google.colab.output import eval_js
from base64 import b64decode
import numpy as np
from scipy.io.wavfile import read as wav_read
import io
import ffmpeg

AUDIO_HTML = """
<script>
var my_div = document.createElement("DIV");
var my_p = document.createElement("P");
var my_btn = document.createElement("BUTTON");
var t = document.createTextNode("Press to start recording");

my_btn.appendChild(t);
//my_p.appendChild(my_btn);
my_div.appendChild(my_btn);
document.body.appendChild(my_div);

var base64data = 0;
var reader;
var recorder, gumStream;
var recordButton = my_btn;

var handleSuccess = function(stream) {
  gumStream = stream;
  var options = {
    //bitsPerSecond: 8000, //chrome seems to ignore, always 48k
    mimeType : 'audio/webm;codecs=opus'
    //mimeType : 'audio/webm;codecs=pcm'
  };
  //recorder = new MediaRecorder(stream, options);
  recorder = new MediaRecorder(stream);
  recorder.ondataavailable = function(e) {
    var url = URL.createObjectURL(e.data);
    var preview = document.createElement('audio');
    preview.controls = true;
    preview.src = url;
    document.body.appendChild(preview);

    reader = new FileReader();
    reader.readAsDataURL(e.data);
    reader.onloadend = function() {
      base64data = reader.result;
      //console.log("Inside FileReader:" + base64data);
    }
  };
  recorder.start();
};

recordButton.innerText = "Recording... press to stop";

navigator.mediaDevices.getUserMedia({audio: true}).then(handleSuccess);

function toggleRecording() {
  if (recorder && recorder.state == "recording") {
    recorder.stop();
    gumStream.getAudioTracks()[0].stop();
    recordButton.innerText = "Saving the recording... pls wait!"
  }
}

// https://stackoverflow.com/a/951057
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

var data = new Promise(resolve=>{
  //recordButton.addEventListener("click", toggleRecording);
  recordButton.onclick = ()=>{
    toggleRecording()

    sleep(2000).then(() => {
      // wait 2000ms for the data to be available...
      // ideally this should use something like await...
      //console.log("Inside data:" + base64data)
      resolve(base64data.toString())
    });
  }
});
</script>
"""

def get_audio():
    display(HTML(AUDIO_HTML))
    data = eval_js("data")
    binary = b64decode(data.split(',')[1])

    # Decode the webm/opus recording to wav via ffmpeg
    process = (ffmpeg
        .input('pipe:0')
        .output('pipe:1', format='wav')
        .run_async(pipe_stdin=True, pipe_stdout=True, pipe_stderr=True, quiet=True, overwrite_output=True)
    )
    output, err = process.communicate(input=binary)

    riff_chunk_size = len(output) - 8
    # Break up the chunk size into four bytes, held in b.
    q = riff_chunk_size
    b = []
    for i in range(4):
        q, r = divmod(q, 256)
        b.append(r)

    # Replace bytes 4:8 in proc.stdout with the actual size of the RIFF chunk.
    riff = output[:4] + bytes(b) + output[8:]

    sr, audio = wav_read(io.BytesIO(riff))

    return audio, sr
Then run this:
audio, sr = get_audio()
You might need to install ffmpeg-python first:
!pip install ffmpeg-python
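To sanity-check the capture, you can play the recording straight back in the notebook; a small usage sketch, assuming a mono recording:
from IPython.display import Audio

audio, sr = get_audio()
Audio(audio, rate=sr)  # renders an inline audio player in the notebook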

Python multiprocessing within node.js - Prints on sub process not working

I have a Node.js application that runs a client interface exposing an action that triggers machine-learning tasks. Since Python is a better choice for implementing machine-learning related code, I've implemented a Python application that runs those tasks on demand.
Now I need to integrate both applications. It has been decided that we need to use a single (AWS) instance to host them.
One way I found to do such an integration was the python-shell node module, where communication between Python and Node is done over stdin and stdout.
On the Node side I have something like this:
'use strict';

const express = require('express');
const PythonShell = require('python-shell');

var app = express();
app.listen(8000, function () {
    console.log('Example app listening on port 8000!');
});

var options = {
    mode: 'text',
    pythonPath: '../pythonapplication/env/Scripts/python.exe',
    scriptPath: '../pythonapplication/',
    pythonOptions: ['-u'], // Unbuffered
};

var pyshell = new PythonShell('start.py', options);

pyshell.on('message', function (message) {
    console.log(message);
});

app.get('/task', function (req, res) {
    pyshell.send('extract-job');
});

app.get('/terminate', function (req, res) {
    pyshell.send('terminate');
    pyshell.end(function (err, code, signal) {
        console.log(err);
        console.log(code);
        console.log(signal);
    });
});
On the Python side, I have a main script which loads some stuff and then calls a server script that runs forever, reading lines with sys.stdin.readline() and executing the corresponding task.
start.py is:
if __name__ == '__main__':
    # data = json.loads(sys.argv[1])
    from multiprocessing import Manager, Pool
    import logging
    import provider, server

    # Get logging setup objects
    debug_queue, debug_listener = provider.shared_logging(logging.DEBUG, 'python-server-debug.log')
    info_queue, info_listener = provider.shared_logging(logging.INFO, 'python-server.log')
    logger = logging.getLogger(__name__)

    # Start logger listeners
    debug_listener.start()
    info_listener.start()

    logger.info('Initializing pool of workers...')
    pool = Pool(initializer=provider.worker, initargs=[info_queue, debug_queue])

    logger.info('Initializing server...')
    try:
        server.run(pool)
    except (SystemError, KeyboardInterrupt) as e:
        logger.info('Execution terminated without errors.')
    except Exception as e:
        logger.error('Error on main process:', exc_info=True)
    finally:
        pool.close()
        pool.join()
        debug_listener.stop()
        info_listener.stop()
    print('Done.')
Both info_queue and debug_queue are multiprocessing.Queues that handle multiprocessing logging. If I run my Python application standalone, everything works fine, even when using the pool of workers (logs get properly logged, prints get properly printed...).
But if I run it via python-shell, only my main process's prints and logs are printed and logged correctly... Every message (print or log) from my pool of workers is held until I terminate the Python script.
In other words, every message is held until the finally block in start.py runs...
Does anyone have any insights on this issue? Have you heard about the python-bridge module? Is it a better solution? Can you suggest a better approach to such an integration that does not use two separate servers?
Here I post my real provider script, and a quick mock I did of the server script (the real one has too much stuff):
mock server.py:
import json
import logging
import multiprocessing
import sys
import time
from json.decoder import JSONDecodeError
from threading import Thread

def task(some_args):
    logger = logging.getLogger(__name__)
    results = 'results of machine-learn task go here, as a string'
    logger.info('log whatever im doing')
    # Some machine-learn task...
    logger.info('Returning results.')
    return results

def answer_node(message):
    print(message)
    # sys.stdout.write(message)
    # sys.stdout.flush()

def run(pool, recrutai, job_pool, candidate_queue):
    logger = logging.getLogger(__name__)
    workers = []
    logger.info('Server is ready and waiting for commands')
    while True:
        # Read input stream
        command = sys.stdin.readline()
        command = command.split('\n')[0]

        logger.debug('Received command: %s', command)
        if command == 'extract-job':
            logger.info('Creating task.')
            # TODO: Check data attributes
            p = pool.apply_async(
                func=task,
                args=('args',),  # args must be a tuple
                callback=answer_node
            )
            # What to do with workers array?!
            workers.append(p)
        elif command == 'other-commands':
            pass
            # Other tasks here
        elif command == 'terminate':
            raise SystemError
        else:
            logger.warning('Received an invalid command %s.', command)
my provider.py:
import logging
import os
from logging.handlers import QueueHandler, QueueListener
from multiprocessing import Queue

def shared_logging(level, file_name):
    # Create main logging file handler
    handler = logging.FileHandler(file_name)
    handler.setLevel(level)

    # Create logging format
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    handler.setFormatter(formatter)

    # Create queue shared between all processes to centralize logging features
    logger_queue = Queue()  # multiprocessing.Queue

    # Create logger queue listener to send records from logger_queue to handler
    logger_listener = QueueListener(logger_queue, handler)
    return logger_queue, logger_listener

def process_logging(info_queue, debug_queue, logger_name=None):
    # Create logging queue handlers
    debug_queue_handler = QueueHandler(debug_queue)
    debug_queue_handler.setLevel(logging.DEBUG)
    info_queue_handler = QueueHandler(info_queue)
    info_queue_handler.setLevel(logging.INFO)

    # Set up the level of the process logger
    logger = logging.getLogger()
    if logger_name:
        logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)

    # Add handlers to the logger
    logger.addHandler(debug_queue_handler)
    logger.addHandler(info_queue_handler)

def worker(info_queue, debug_queue):
    # Set up worker process logging
    process_logging(info_queue, debug_queue)
    logging.debug('Process %s initialized.', os.getpid())
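A likely culprit, though I cannot verify it against this exact setup: the -u flag in pythonOptions unbuffers only the parent process. The pool workers are separate processes whose stdout is a pipe, so it is block-buffered, and their prints sit in the buffer until the worker exits at pool.join(). A sketch of a workaround, re-enabling line buffering inside each worker (Python 3.7+); printing with flush=True would work too:
import logging
import os
import sys

def worker(info_queue, debug_queue):
    # stdout inherited from the parent is a pipe under python-shell, so it
    # is block-buffered by default; switch back to line buffering so every
    # print reaches Node as soon as the line is complete.
    sys.stdout.reconfigure(line_buffering=True)

    # Set up worker process logging as before
    process_logging(info_queue, debug_queue)
    logging.debug('Process %s initialized.', os.getpid())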

Sending RabbitMQ messages via websockets

Looking for some code samples to solve this problem:
I would like to write some code (Python or JavaScript) that acts as a subscriber to a RabbitMQ queue, so that on receiving a message it broadcasts the message via websockets to any connected clients.
I've looked at Autobahn and node.js (using "amqp" and "ws") but cannot get things to work as needed. Here's the server code in JavaScript using node.js:
var amqp = require('amqp');
var WebSocketServer = require('ws').Server;

var connection = amqp.createConnection({host: 'localhost'});
var wss = new WebSocketServer({port: 8000});

wss.on('connection', function(ws) {
    ws.on('open', function() {
        console.log('connected');
        ws.send(Date.now().toString());
    });
    ws.on('message', function(message) {
        console.log('Received: %s', message);
        ws.send(Date.now().toString());
    });
});

connection.on('ready', function() {
    connection.queue('MYQUEUE', {durable: true, autoDelete: false}, function(queue) {
        console.log(' [*] Waiting for messages. To exit press CTRL+C');
        queue.subscribe(function(msg) {
            console.log(" [x] Received from MYQUEUE %s", msg.data.toString('utf-8'));
            payload = msg.data.toString('utf-8');
            // HOW DOES THIS NOW GET SENT VIA WEBSOCKETS ??
        });
    });
});
Using this code, I can successfully subscribe to a queue in Rabbit and receive any messages sent to it. Similarly, I can connect a websocket client (e.g. a browser) to the server and send/receive messages. BUT: how can I send the payload of the Rabbit queue message as a websocket message at the point indicated ("HOW DOES THIS NOW GET SENT VIA WEBSOCKETS")? I suspect I'm stuck in the wrong callback, or the callbacks need to be nested somehow?
Alternatively, if this can be done easier in Python (via Autobahn and pika) that would be great.
Thanks !
One way to implement your system is to use Python with Tornado.
Here is the server:
import tornado.ioloop
import tornado.web
import tornado.websocket
import os
import pika
from threading import Thread

clients = []

def threaded_rmq():
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    print('Connected: localhost')
    channel = connection.channel()
    channel.queue_declare(queue="my_queue")
    print('Consumer ready, on my_queue')
    channel.basic_consume(consumer_callback, queue="my_queue", no_ack=True)
    channel.start_consuming()

def consumer_callback(ch, method, properties, body):
    print(" [x] Received %r" % (body,))
    for itm in clients:
        itm.write_message(body)

class SocketHandler(tornado.websocket.WebSocketHandler):
    def open(self):
        print("WebSocket opened")
        clients.append(self)

    def on_message(self, message):
        self.write_message(u"You said: " + message)

    def on_close(self):
        print("WebSocket closed")
        clients.remove(self)

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        print("get page")
        self.render("websocket.html")

application = tornado.web.Application([
    (r'/ws', SocketHandler),
    (r"/", MainHandler),
])

if __name__ == "__main__":
    thread = Thread(target=threaded_rmq)
    thread.start()

    application.listen(8889)
    tornado.ioloop.IOLoop.instance().start()
and here is the HTML page:
<html>
<head>
    <script src="//code.jquery.com/jquery-1.11.0.min.js"></script>
    <script>
        $(document).ready(function() {
            var ws;
            if ('WebSocket' in window) {
                ws = new WebSocket('ws://localhost:8889/ws');
            } else if ('MozWebSocket' in window) {
                ws = new MozWebSocket('ws://localhost:8889/ws');
            } else {
                alert("<tr><td> your browser doesn't support web socket </td></tr>");
                return;
            }
            ws.onopen = function(evt) { alert("Connection open ..."); };
            ws.onmessage = function(evt) {
                alert(evt.data);
            };
            function closeConnect() {
                ws.close();
            }
        });
    </script>
</head>
</html>
So when you publish a message to my_queue, the message is redirected to all connected web pages.
I hope it can be useful.
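To try the whole path end to end, you can publish a test message to my_queue from a separate script; a minimal sketch with pika (every browser currently on the page should pop an alert with the body):
import pika

# Connect to the same local broker the consumer thread uses
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="my_queue")

# The default exchange routes by queue name
channel.basic_publish(exchange='', routing_key='my_queue', body='Hello, browsers!')
connection.close()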
EDIT: Here https://github.com/Gsantomaggio/rabbitmqexample you can find the complete example.
