PyZMQ segmentation fault, messages not arriving after restart via bash script - multithreading

I'm currently facing an issue in which a proxy throws a segmentation fault after a random period of time. Restarting the proxy with a bash script leads to messages not arriving.
Sadly, I was not able to reproduce the issue. I am aware that this is most likely related to a partly incorrect use of zmq and that the error gets thrown by CPython.
The system runs 3 different processes.
Process 1, a subscriber whose task is to handle the data that was sent:
context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://127.0.0.1:8100")
socket.setsockopt(zmq.SUBSCRIBE, b'')
poller = zmq.Poller()
poller.register(socket, zmq.POLLIN)
while True:
    try:
        socks = dict(poller.poll())
        if socket in socks and socks[socket] == zmq.POLLIN:
            action, values = socket.recv_pyobj()
            ##### handling of data #######
    except Exception as e:
        print(e)
Process 2, the proxy:
def main():
    context = zmq.Context()
    # Socket facing clients
    frontend = context.socket(zmq.XSUB)
    frontend.bind("tcp://127.0.0.1:5557")
    # Socket facing services
    backend = context.socket(zmq.XPUB)
    backend.bind("tcp://127.0.0.1:8100")
    print("starting broker...")
    while True:
        try:
            zmq.proxy(frontend, backend)
        except KeyboardInterrupt:
            print("stopping broker...")
            frontend.close()
            backend.close()
            context.term()
            quit()
        except Exception as e:
            print(f"failed with {e}")
if __name__ == "__main__":
    main()
Process 3, which runs multiple threads that publish their data:
socket_pub = context.socket(zmq.PUB)
socket_pub.connect("tcp://127.0.0.1:5557")
while True:
    # pre processing f.e. outgoing requests
    ######
    # sending results
    message = ["Action", {"foo": "bar"}]
    socket_pub.send_pyobj(message)
As I was not able to reproduce the error and therefore could not fix it, I am trying to work around it with a bash script.
The segmentation fault gets thrown in process 2 (the proxy).
Thus the bash script simply restarts it.
#!/bin/bash
until python3 process2.py; do
    echo "bridge broker crashed with exit code $?. Respawning.." >&2
    sleep 1
done
The bash script correctly respawns the process if it died due to a segmentation fault.
But notifications from process 3 no longer arrive in process 1.
I was not able to track down why this is happening. I rebuilt the setup locally, and when I manually restarted the proxy (without a segmentation fault) the messages arrived again immediately.
Does anybody have a clue why this is happening, or do I have to find the initial reason for the segmentation fault?
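While the root cause is unknown, it may help to record how the proxy actually died on each restart. The respawn loop from the bash script could be sketched in Python, where a negative return code on POSIX identifies the signal that killed the child. The names `describe_exit`, `supervise`, `max_restarts` and `delay` are made up for this illustration and do not come from the original setup.

```python
import signal
import subprocess
import sys
import time

def describe_exit(returncode):
    """Turn a subprocess return code into a readable reason."""
    if returncode < 0:
        # On POSIX a negative code means the child was killed by that signal.
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exit code {returncode}"

def supervise(command, max_restarts=3, delay=1.0):
    """Rerun command until it exits with 0 or the restart budget is spent."""
    codes = []
    while True:
        code = subprocess.call(command)
        codes.append(code)
        if code == 0 or len(codes) > max_restarts:
            return codes
        print(f"bridge broker stopped ({describe_exit(code)}). Respawning..",
              file=sys.stderr)
        time.sleep(delay)
```

It could be started with something like `supervise([sys.executable, "process2.py"])`. Unlike the `until` loop, this sketch gives up after `max_restarts` attempts, which is a design choice of the example rather than a requirement.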

Related

How to implement custom timeout for function that connects to server

I want to establish a connection with a b0 client for Coppelia Sim using the Python API. Unfortunately, this connection function does not have a timeout and will run indefinitely if it fails to connect.
To counter that, I tried moving the connection to a separate process (multiprocessing) and check after a couple of seconds, whether the process is still alive. If it still is, I kill the process and continue with the program.
This sort of works, as it does not block my program anymore, however the process does not stop when the connection is successfully made, so it kills the process, even when the connection succeeds.
How can I fix this and also write the b0client to the global variable?
def connection_function():
    global b0client
    b0client = b0RemoteApi.RemoteApiClient('b0RemoteApi_pythonClient', 'b0RemoteApi', 60)
    print('Success!')
    return 0
def establish_b0_connection(timeout):
    connection_process = multiprocessing.Process(target=connection_function)
    connection_process.start()
    # Wait for [timeout] seconds or until process finishes
    connection_process.join(timeout=timeout)
    # If thread is still active
    if connection_process.is_alive():
        print('[INITIALIZATION OF B0 API CLIENT FAILED]')
        # Terminate - may not work if process is stuck for good
        connection_process.terminate()
        # OR Kill - will work for sure, no chance for process to finish nicely however
        # connection_process.kill()
        connection_process.join()
        print('[CONTINUING WITHOUT B0 API CLIENT]')
        return False
    else:
        return True
if __name__ == '__main__':
    b0client = None
    establish_b0_connection(timeout=5)
    # Continue with the code, with or without connection.
    # ...
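One possible direction, sketched under assumptions: a child process has its own memory, so assigning `b0client` inside it never reaches the parent's global. A daemon thread shares memory with the caller, so the client object can be handed back through a small holder while `join(timeout)` still bounds the wait. The name `connect_with_timeout` and the two stand-in connect functions are hypothetical. In the real program the callable would wrap the `b0RemoteApi.RemoteApiClient(...)` call, and whether that client tolerates being created on a non-main thread is not confirmed here. A stuck thread cannot be killed; as a daemon it is simply abandoned.

```python
import threading
import time

def connect_with_timeout(connect, timeout):
    """Run connect() in a daemon thread; return its result, or None on timeout."""
    holder = {}

    def worker():
        try:
            holder["client"] = connect()
        except Exception as exc:  # keep the failure for later inspection
            holder["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)           # wait at most `timeout` seconds
    if thread.is_alive():
        return None                # still blocked: give up on this attempt
    return holder.get("client")    # also None if connect() raised

def fast_connect():
    # Stand-in for a connection that succeeds immediately.
    return "client-object"

def slow_connect():
    # Stand-in for a connection attempt that takes too long.
    time.sleep(1.0)
    return "too-late"
```

With this shape, the caller would write `b0client = connect_with_timeout(some_connect_function, timeout=5)` and continue with or without a client, depending on whether the result is `None`.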

Direct communication between Javascript in Jupyter and server via IPython kernel

I'm trying to display an interactive mesh visualizer based on Three.js inside a Jupyter cell. The workflow is the following:
The user launches a Jupyter notebook, and opens the viewer in a cell
Using Python commands, the user can manually add meshes and animate them interactively
In practice, the main thread is sending requests to a server via ZMQ sockets (every request needs a single reply), then the server sends back the desired data to the main thread using other socket pairs (many "request", very few replies expected), which finally uses communication through ipython kernel to send the data to the Javascript frontend. So far so good, and it works properly because the messages are all flowing in the same direction:
Main thread (Python command) [ZMQ REQ] -> [ZMQ REP] Server (Data) [ZMQ XREQ] -> [ZMQ XREQ] Main thread (Data) [IPykernel Comm] -> [Ipykernel Comm] Javascript (Display)
However, the pattern is different when I want to fetch the status of the frontend to wait for the meshes to finish loading:
Main thread (Status request) --> Server (Status request) --> Main thread (Waiting for reply)
| |
<--------------------------------Javascript (Processing) <--
This time, the server sends a request to the frontend, which in return does not send the reply directly back to the server, but to the main thread, that will forward the reply to the server, and finally to the main thread.
There is a clear issue: the main thread is supposed to jointly forward the reply of the frontend and receive the reply from the server, which is impossible. The ideal solution would be to enable the server to communicate directly with the frontend, but I don't know how to do that, since I cannot use get_ipython().kernel.comm_manager.register_target on the server side. I tried to instantiate an IPython kernel client on the server side using jupyter_client.BlockingKernelClient, but I didn't manage to use it to communicate or to register targets.
OK, so I found a solution for now, but it is not great. Instead of just waiting for a reply and keeping the main loop busy, I added a timeout and interleaved it with the kernel's do_one_iteration to force it to handle messages:
while True:
    try:
        rep = zmq_socket.recv(flags=zmq.NOBLOCK).decode("utf-8")
    except zmq.error.ZMQError:
        kernel.do_one_iteration()
It works, but unfortunately it is not really portable and it messes up the Jupyter evaluation stack (all queued evaluations will be processed here instead of in order)...
Alternatively, there is another way that is more appealing:
import zmq
import asyncio
import nest_asyncio
nest_asyncio.apply()
zmq_socket.send(b"ready")
async def enforce_receive():
    await kernel.process_one(True)
    return zmq_socket.recv().decode("utf-8")
loop = asyncio.get_event_loop()
rep = loop.run_until_complete(enforce_receive())
but in this case you need to know in advance that you expect the kernel to receive exactly one message, and relying on nest_asyncio is not ideal either.
Here is a link to an issue on this topic on GitHub, along with an example notebook.
UPDATE
I finally managed to solve my issue completely, without shortcomings. The trick is to analyze every incoming message. The irrelevant messages are put back in the queue in order, while the comm-related ones are processed on the spot:
class CommProcessor:
    """
    @brief     Re-implementation of ipykernel.kernelbase.do_one_iteration
               to only handle comm messages on the spot, and put back in
               the stack the other ones.
    @details   Calling 'do_one_iteration' messes up the kernel
               'msg_queue'. Some messages will be processed too soon,
               which is likely to corrupt the kernel state. This method
               only processes comm messages to avoid such side effects.
    """
    def __init__(self):
        self.__kernel = get_ipython().kernel
        self.qsize_old = 0
    def __call__(self, unsafe=False):
        """
        @brief      Check once if there is a pending comm-related event in
                    the shell stream message priority queue.
        @param[in]  unsafe     Whether or not to assume that checking if the
                               number of pending messages has changed is enough.
                               It makes the evaluation much faster but flawed.
        """
        # Flush every IN message on shell_stream only.
        # Note that it is a faster implementation of ZMQStream.flush
        # to only handle incoming messages. It reduces the computation
        # time from about 10us to 20ns.
        # https://github.com/zeromq/pyzmq/blob/e424f83ceb0856204c96b1abac93a1cfe205df4a/zmq/eventloop/zmqstream.py#L313
        shell_stream = self.__kernel.shell_streams[0]
        shell_stream.poller.register(shell_stream.socket, zmq.POLLIN)
        events = shell_stream.poller.poll(0)
        while events:
            _, event = events[0]
            if event:
                shell_stream._handle_recv()
            shell_stream.poller.register(
                shell_stream.socket, zmq.POLLIN)
            events = shell_stream.poller.poll(0)
        qsize = self.__kernel.msg_queue.qsize()
        if unsafe and qsize == self.qsize_old:
            # The number of queued messages has not changed since the
            # last time it was checked. Assuming those messages are
            # the same as before and returning early.
            return
        # One must go through all the messages to keep them in order
        for _ in range(qsize):
            priority, t, dispatch, args = \
                self.__kernel.msg_queue.get_nowait()
            if priority <= SHELL_PRIORITY:
                _, msg = self.__kernel.session.feed_identities(
                    args[-1], copy=False)
                msg = self.__kernel.session.deserialize(
                    msg, content=False, copy=False)
            else:
                # Do not spend time analyzing already rejected message
                msg = None
            if msg is None or not 'comm_' in msg['header']['msg_type']:
                # The message is not related to comm, so putting it back in
                # the queue after lowering its priority so that it is sent
                # to the "end of the queue", i.e. just at the right place:
                # after the next unchecked messages, after the other
                # messages already put back in the queue, but before the
                # next one to go the same way. Note that all shell
                # messages have SHELL_PRIORITY by default.
                self.__kernel.msg_queue.put_nowait(
                    (SHELL_PRIORITY + 1, t, dispatch, args))
            else:
                # Comm message. Processing it right now.
                comm_handler = getattr(
                    self.__kernel.comm_manager, msg['header']['msg_type'])
                msg['content'] = self.__kernel.session.unpack(msg['content'])
                comm_handler(None, None, msg)
        self.qsize_old = self.__kernel.msg_queue.qsize()
process_kernel_comm = CommProcessor()
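The re-queueing idea above can be tried in isolation with the standard library's `queue.PriorityQueue`, without a running kernel. This is only a sketch: the message dictionaries, the value of `SHELL_PRIORITY`, and the names `drain_comm_messages`, `handle_comm` and `demo` are stand-ins chosen for illustration, not ipykernel's real structures.

```python
import queue

SHELL_PRIORITY = 10  # illustrative value only, not taken from ipykernel

def drain_comm_messages(msg_queue, handle_comm):
    """Handle comm messages now; requeue everything else in arrival order."""
    for _ in range(msg_queue.qsize()):
        priority, seq, msg = msg_queue.get_nowait()
        if priority <= SHELL_PRIORITY and "comm_" in msg["msg_type"]:
            handle_comm(msg)
        else:
            # The lowered priority places the message behind those not yet
            # inspected; the sequence number keeps requeued items in order.
            msg_queue.put_nowait((SHELL_PRIORITY + 1, seq, msg))

def demo():
    pending = queue.PriorityQueue()
    kinds = ["execute_request", "comm_msg", "execute_request", "comm_open"]
    for seq, kind in enumerate(kinds):
        pending.put_nowait((SHELL_PRIORITY, seq, {"msg_type": kind, "seq": seq}))
    handled = []
    drain_comm_messages(pending, handled.append)
    # Whatever is left should be the non-comm messages, still in order.
    remaining = [pending.get_nowait()[2]["seq"] for _ in range(pending.qsize())]
    return [m["msg_type"] for m in handled], remaining
```

The unique sequence number plays the role of the timestamp `t` in the class above: it breaks ties between equal priorities, so the queue never has to compare the message payloads themselves.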

Creating detached processes from celery worker/alternative solution?

I'm developing a web service that will be used as a "database as a service" provider. The goal is to have a small Flask-based web service running on some host, and "worker" processes running on different hosts owned by different teams. Whenever a team member requests a new database, I should create one on their host. Now the problem: the service I start must keep running. The worker, however, might be restarted; that could happen after 5 minutes or after 5 days. A simple Popen won't do the trick, because it creates a child process, and if the worker stops later on, the Popen process is destroyed (I tried this).
I have an implementation using multiprocessing which works like a champ; sadly I cannot use it with Celery, so I'm out of luck there. I tried to get away from the multiprocessing library with double forking and named pipes. The most minimal sample I could produce:
def launcher2(working_directory, cmd, *args):
    command = [cmd]
    command.extend(list(args))
    process = subprocess.Popen(command, cwd=working_directory, shell=False, start_new_session=True,
                               stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    with open(f'{working_directory}/ipc.fifo', 'wb') as wpid:
        wpid.write(process.pid)
@shared_task(bind=True, name="Test")
def run(self, cmd, *args):
    working_directory = '/var/tmp/workdir'
    if not os.path.exists(working_directory):
        os.makedirs(working_directory, mode=0o700)
    ipc = f'{working_directory}/ipc.fifo'
    if os.path.exists(ipc):
        os.remove(ipc)
    os.mkfifo(ipc)
    pid1 = os.fork()
    if pid1 == 0:
        os.setsid()
        os.umask(0)
        pid2 = os.fork()
        if pid2 > 0:
            sys.exit(0)
        os.setsid()
        os.umask(0)
        launcher2(working_directory, cmd, *args)
    else:
        with os.fdopen(os.open(ipc, flags=os.O_NONBLOCK | os.O_RDONLY), 'rb') as ripc:
            readers, _, _ = select.select([ripc], [], [], 15)
            if not readers:
                raise TimeoutError(60, 'Timed out', ipc)
            reader = readers.pop()
            pid = struct.unpack('I', reader.read())[0]
            pid, status = os.waitpid(pid, 0)
            print(status)
if __name__ == '__main__':
    async_result = run.apply_async(('/usr/bin/sleep', '15'), queue='q2')
    print(async_result.get())
My use case is more complex, but I don't think anyone would want to read 200+ lines of bootstrapping, and this fails in exactly the same way. On the other hand, I don't wait for the pid unless that's required, so the idea is to start the process on request and let it do its job. Bootstrapping a database takes roughly a minute with the full setup, and I don't want the clients standing by for a minute. A request comes in, I spawn the process and send back an id for the database instance, and the client can query the status based on the received instance id. However, with the above forking solution I get:
[2020-01-20 18:03:17,760: INFO/MainProcess] Received task: Test[dbebc31c-7929-4b75-ae28-62d3f9810fd9]
[2020-01-20 18:03:20,859: ERROR/MainProcess] Process 'ForkPoolWorker-2' pid:16634 exited with 'signal 15 (SIGTERM)'
[2020-01-20 18:03:20,877: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 15 (SIGTERM).')
Traceback (most recent call last):
File "/home/pupsz/PycharmProjects/provider/venv37/lib/python3.7/site-packages/billiard/pool.py", line 1267, in mark_as_worker_lost
human_status(exitcode)),
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 15 (SIGTERM).
Which leaves me wondering, what might be going on. I tried an even more simple task:
@shared_task(bind=True, name="Test")
def run(self, cmd, *args):
    working_directory = '/var/tmp/workdir'
    if not os.path.exists(working_directory):
        os.makedirs(working_directory, mode=0o700)
    command = [cmd]
    command.extend(list(args))
    process = subprocess.Popen(command, cwd=working_directory, shell=False, start_new_session=True,
                               stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return process.wait()
if __name__ == '__main__':
    async_result = run.apply_async(('/usr/bin/sleep', '15'), queue='q2')
    print(async_result.get())
Which again fails with the very same error. Now I like Celery but from this it feels like it's not suited for my needs. Did I mess something up? Can it be achieved, what I need to do from a worker? Do I have any alternatives, or should I just write my own task queue?
Celery is not multiprocessing-friendly, so try to use billiard instead of multiprocessing (from billiard import Process etc...) I hope one day Celery guys do a heavy refactoring of that code, remove billiard, and start using multiprocessing instead...
So, until they move to multiprocessing we are stuck with billiard. My advice is to remove any usage of multiprocessing in your Celery tasks, and start using billiard.context.Process and similar, depending on your use-case.

Python chat clients not updating

I am working on a GUI based chat program.
I am using someone else's server which has worked well for many people so I am assuming the problem is with my client's code.
When I run a single instance of the client it works perfectly, but if I run two instances of the client on the same computer the listener stops responding when the 2nd client logs in.
# server is from socket module
# chat_box is a tkinter ListBox
# both are copies of global variable
class listener_thread(threading.Thread):
    def __init__(self, server, chat_box):
        super(listener_thread, self).__init__()
        self.server = server
        self.chat_box = chat_box
    def run(self):
        try:
            update = self.server.recv(1024)
            msg = update.decode("utf-8")
            if msg != "":
                self.chat_box.insert(END, msg)
        except Exception as e:
            print(e)
I've verified that the server is putting each client on a different port. The server is receiving the messages. When 'Michael' logs in and says 'Hi' it updates in his chat_box.
Though, the clients are no longer updating their histories after 'Dave' logs in.
Yet, the server continues to show that it is receiving the messages from both clients.
#This is the server output
#Hi is Michael
#Yo is Dave
#So Michael is still connecting and transmitting after Dave connects
Michael - ('127.0.0.1', 56263) connected
Hi
Dave - ('127.0.0.1', 56264) connected
Yo
Hi
The network connection is working properly. It just locks up the list_box updating threads.
No exceptions are being thrown.
I solved my own problem.
I needed to make the chat_history_listbox as a ListBox initially, instead of None
I needed to put the receive code into a function, with a loop and an exit condition
def receive_func():
    global server, chat_history_listbox
    while True:
        try:
            update = server.recv(1024)
        except OSError as e:
            update = None
            break
            connect()
        msg = update.decode("utf-8")
        if msg != "":
            chat_history_listbox.insert(END, msg)
I needed to make the thread call a function and make it a daemon
listener = Thread(target=receive_func, daemon=True)
listener.start()
This got it working with multiple clients
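The loop-with-exit-condition pattern from this answer can be exercised without a GUI. The sketch below is a variant built on assumptions, not the poster's code: it uses `socket.socketpair()` in place of the real chat server and pushes messages onto a `queue.Queue` instead of inserting into the ListBox directly, since updating Tkinter widgets from a worker thread is commonly advised against. The names `receive_loop`, `inbox` and `demo` are hypothetical.

```python
import queue
import socket
import threading

def receive_loop(sock, inbox):
    """Read from sock until it closes, pushing decoded text onto inbox."""
    while True:
        try:
            data = sock.recv(1024)
        except OSError:
            break              # the socket was closed or failed locally
        if not data:
            break              # empty bytes: the peer closed the connection
        inbox.put(data.decode("utf-8"))

def demo():
    client_side, server_side = socket.socketpair()
    inbox = queue.Queue()
    listener = threading.Thread(
        target=receive_loop, args=(client_side, inbox), daemon=True)
    listener.start()
    server_side.sendall(b"Hi")
    first = inbox.get(timeout=2)   # message delivered while still connected
    server_side.close()            # peer closes: the loop should end
    listener.join(timeout=2)
    client_side.close()
    return first, listener.is_alive()
```

In a Tkinter client, the main thread could drain `inbox` periodically (for example from a callback scheduled with the widget's `after` method) and do the `insert(END, msg)` there.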

python client recv only reciving on exit inside BGE

Using Python 3, I'm trying to send a file from a server to a client as soon as the client connects to the server. The problem is that the client only continues past recv when I close it (when the connection is closed).
I'm running the client in the Blender Game Engine. The client runs until it gets to recv, then it just stops until I exit the game engine; only then can I see in the console that it received the expected bytes.
From other threads I have read that this might be because the recv never gets an end, which is why I added "\r\n" to the end of the bytearray that the server is sending. Still, the client just stops at recv until I exit the application.
In the code below I'm only sending the first 6 bytes, which tell the client the size of the file. After this I intend to send the data of the file on the same connection.
What am I doing wrong here?
client:
import socket
import threading
def TcpConnection():
    TCPsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    TCPsocket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    server_address = ('localhost', 1338)
    TCPsocket.connect(server_address)
    print("TCP Socket open!, starting thread!")
    ServerResponse = threading.Thread(target=TcpReciveMessageThread,args=(TCPsocket,))
    ServerResponse.daemon = True
    ServerResponse.start()
def TcpReciveMessageThread(Sock):
    print("Tcp thread running!")
    size = Sock.recv(6)#Sock.MSG_WAITALL
    print("Recived data", size)
    Sock.close()
Server:
import threading
import socket
import os
def StartTcpSocket():
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.bind(('localhost', 1338))
    server_socket.listen(10)
    while 1:
        connection, client_address = server_socket.accept()
        Response = threading.Thread(target=StartTcpClientThread,args=(connection,))
        Response.daemon = True # thread dies when main thread (only non-daemon thread) exits.
        Response.start()
def StartTcpClientThread(socket):
    print("Sending data")
    length = 42
    l1 = ToByts(length)
    socket.send(l1)
    #loop that sends the file goes here
    print("Data sent")
    #socket.close()
def ToByts(Size):
    byt_res = (Size).to_bytes(4,byteorder='big')
    result = bytearray()
    for r in byt_res:
        result.append(r)
    t = bytearray("\r\n","utf-8")
    for b in t:
        result.append(b)
    return result
MessageListener = threading.Thread(target=StartTcpSocket)
MessageListener.daemon = True # thread dies when main thread (only non-daemon thread) exits.
MessageListener.start()
while 1:
    pass
If the problem is that the client doesn't find an end of the stream, how can I solve this without closing the connection, as I intend to send the file on the same connection?
Update #1:
To clarify, the print in the client that says "Recived data" is only printed when I exit the game engine (when the client is closing). The loops that send and receive the file were left out of the question as they are not the problem. The problem still occurs without them: the client freezes at recv until it is closed.
Update #2:
Here is an image of what my consoles print when I run the server and client:
As you can see, it never prints the "Recived data" message.
When I exit the Blender Game Engine, I get this output:
Now, when the engine and the server script have exited, I get the data printed. So recv is probably pausing the thread until the socket is closed. Why is it doing this, and how can I get my data (and the print) before the socket closes? This also happens if I set
ServerResponse.daemon = False
Here is a .blend (on MediaFire) of the client; the server runs on Python 3 (PyPy). I'm using Blender 2.78a.
Update #3:
I tested and verified that the problem is the same on Windows 10 and Linux Mint. I also made a video showing the problem:
In the video you can see how I only receive data from the server when I exit the Blender Game Engine. After some research I'm beginning to suspect that the problem is related to Python threading not playing well with the BGE.
https://www.youtube.com/watch?v=T5l9YGIoDYA
I have observed a similar phenomenon. It appears that the Python instance doesn't receive any execution cycles from Blender Game Engine (BGE) unless a controller gets invoked.
A simple solution is:
Add another Always sensor that is fired on every logic tick.
Add another Python controller that does nothing, a no-op.
Hook the sensor to the controller.
I applied this to your .blend as shown in the following screen capture.
I tested it by running your server and it seems to work OK.
Cheers, Jim
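Separately from the BGE scheduling behaviour described in the answer, a plain `recv(6)` on a TCP stream is allowed to return fewer than 6 bytes, so a length-prefixed protocol like the one in the question is usually read in a loop. A minimal sketch, assuming the same 6-byte header (a 4-byte big-endian length followed by "\r\n"); `recv_exact`, `send_blob` and `recv_blob` are hypothetical helper names, not part of the question's code:

```python
import socket
import struct

# 4-byte big-endian length followed by the two bytes b"\r\n" (6 bytes total).
HEADER = struct.Struct(">I2s")

def recv_exact(sock, count):
    """Keep calling recv until exactly `count` bytes have arrived."""
    chunks = []
    remaining = count
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before sending all bytes")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def send_blob(sock, payload):
    """Send the header and then the payload on the same connection."""
    sock.sendall(HEADER.pack(len(payload), b"\r\n") + payload)

def recv_blob(sock):
    """Read one header, validate it, then read the announced payload."""
    length, terminator = HEADER.unpack(recv_exact(sock, HEADER.size))
    if terminator != b"\r\n":
        raise ValueError("malformed header")
    return recv_exact(sock, length)
```

Because the receiver knows the exact payload size from the header, the connection can stay open for further messages; no end-of-stream marker or socket close is needed to delimit the file.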
