Process socket data in near real time in Python - python-3.x

I have up to 30 nodes, each capable of sending up to 1000 messages/second. Each message can carry 256-512 bytes of data.
Each node uses a unique TCP port for communication. Each piece of data received is pre-processed, inserted into a database, and post-processed.
Below are the approaches I have tried, with observations:
Case-1. Using asyncio and processing data as soon as it is received.
async def process_packets(reader, writer, db):
    while True:
        data = await reader.read(4096)
        if not data:  # peer closed the connection
            break
        data = pre_process(data)
        save_in_db(data)
        post_process(data)
    writer.close()
Observation: processing a single packet usually takes 10-20 ms. But as the packet frequency increases, TCP buffering starts to happen, i.e. a single call to reader.read() returns multiple packets.
This increases the processing time for the current node and delays the other nodes as well.
Case-2. Using asyncio, with received data pushed into a queue and a worker thread consuming this queue.
async def process_packets(reader, writer, q):
    while True:
        data = await reader.read(4096)
        if not data:
            break
        q.put(data)
    writer.close()

def worker_thread(q, db):
    while True:
        data = q.get()
        data = pre_process(data)
        save_in_db(data)
        post_process(data)
Observation: since no processing is done while receiving packets, all nodes are able to put data into the queue as fast as possible. The issue shows up in the worker thread, where q.get() becomes very slow as time progresses.
Case-3. Creating a socket server thread for each node.
def server_thread(port, db):
    s = socket.socket()
    s.bind(('', port))
    s.listen(1)
    while True:
        (conn, addr) = s.accept()
        while True:
            try:
                data = conn.recv(4096)
            except Exception:
                conn.close()
                break
            if not data:  # peer closed the connection
                conn.close()
                break
            data = pre_process(data)
            save_in_db(data)
            post_process(data)
Observation: the advantage of this case is that each node has a dedicated thread for receiving and processing its data, so other threads are not affected. But here too I am seeing multiple packets returned by a single socket.recv(), which increases the processing time.
I need a way to process the data from these nodes as fast as possible, with the application running 24x7 with no downtime.
OS: Ubuntu 20.04 LTS
System: Intel i3 8th gen, 4 cores, 8 GB RAM

But here I am facing multiple packets returned by socket.recv()
TCP is a byte stream, i.e. there are no packets at this level; you likely mean application-level messages. Your code MUST be able to deal with multiple or partial application messages itself, since TCP does not provide message framing by itself. While you seem to get only full messages when reading fast enough, there is no guarantee of this: eventually your application will stall for a short time (due to scheduling, for example) and messages will accumulate.
Getting multiple messages back from a single socket.recv() can even be an advantage. Reading multiple messages at once means that a single system call returns more application data, which increases the efficiency of the application (fewer system calls for the same amount of work). So it is better to read as much as possible within a single recv instead of hoping to get exactly one message.
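To make that concrete, here is a minimal sketch of a buffering parser, assuming a hypothetical length-prefixed framing (a 2-byte big-endian length before each message; adapt it to whatever your nodes actually send):

import struct

class MessageBuffer:
    """Accumulates raw TCP bytes and yields complete application messages."""

    def __init__(self):
        self._buf = b''

    def feed(self, data):
        # append freshly received bytes, then peel off complete messages
        self._buf += data
        while len(self._buf) >= 2:
            (length,) = struct.unpack('!H', self._buf[:2])
            if len(self._buf) < 2 + length:
                break  # message still incomplete, wait for more bytes
            yield self._buf[2:2 + length]
            self._buf = self._buf[2 + length:]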
As for the overall design: the last approach, with a thread per node, scales best, since the work (and thus the load) is spread over multiple CPU cores; the other approaches only ever use a single core. But none of the approaches actually guarantees that your specific system is able to process that much data. They only differ in how well they make use of the resources offered by the underlying system.
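Combined with the thread-per-node design, each receive loop then simply feeds whatever recv() returns into such a parser. A sketch, reusing the hypothetical MessageBuffer above together with the pre_process/save_in_db/post_process helpers from your question:

import socket

def node_thread(port, db):
    s = socket.socket()
    s.bind(('', port))
    s.listen(1)
    conn, addr = s.accept()
    buf = MessageBuffer()
    while True:
        data = conn.recv(4096)
        if not data:  # peer closed the connection
            conn.close()
            break
        # recv() may return one message, several, or a fragment;
        # the buffer sorts that out
        for msg in buf.feed(data):
            msg = pre_process(msg)
            save_in_db(msg)
            post_process(msg)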

Related

Asynchronous Communication between few 'loops'

I have 3 classes that represent nearly isolated processes that can be run concurrently (meant to be persistent, like 3 main() loops).
class DataProcess:
    ...
    def runOnce(self):
        ...

class ComputeProcess:
    ...
    def runOnce(self):
        ...

class OtherProcess:
    ...
    def runOnce(self):
        ...
Here's the pattern I'm trying to achieve:
start various streams
start each process
allow each process to publish to any stream
allow each process to listen to any stream (at various points in its loop) and behave accordingly (allow for interruption of its current task or not, etc.)
For example, one 'process' listens for external data. Another process does computation on some of that data. The computation process might be busy for a while, so by the time it comes back around and checks the stream, many values may have piled up. I don't want to just use a queue, because I don't want to be forced to process each value in order; I'd rather be able to implement logic like: "if there are one or more things waiting, just run your process one more time; otherwise go do this interruptible task while you wait for something to show up."
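Here's a rough sketch of the per-loop logic I mean (run_once and idle_task are placeholder callables; 'pending' could be any stream buffer, I'm using queue.Queue just to illustrate the check):

import queue

def loop_step(pending: queue.Queue, run_once, idle_task):
    if not pending.empty():
        # one or more values piled up: just run the process once more,
        # no matter how many arrived
        run_once()
    else:
        # nothing waiting: go do the interruptible task
        idle_task()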
That's a lot, right? So I was thinking of using an actor model, until I discovered RxPy. I saw that a stream is like a subject:
from reactivex.subject import BehaviorSubject

# BehaviorSubject requires an initial value
newData = BehaviorSubject(None)
newModel = BehaviorSubject(None)
then I thought I'd start 3 threads, one for each of my high-level processes:
threads = {
    'data': threading.Thread(target=data),
    'compute': threading.Thread(target=compute),
    'other': threading.Thread(target=other),
}
for thread in threads.values():
    thread.start()
and I thought the functions of those threads should listen to the streams:
def data():
    while True:
        DataProcess().runOnce()  # publishes to stream inside the process

def compute():
    def run(value):
        ComputeProcess().runOnce()
    # subscribe wants a callable, not the result of calling it
    newData.subscribe(run)
    newModel.subscribe(run)

def other():
    ''' not done '''
    OtherProcess().runOnce()
OK, so that's what I have so far. Is this pattern going to give me what I'm looking for?
Should I use threading in conjunction with RxPy, or just use RxPy's scheduler facilities to achieve concurrency? If so, how?
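For instance, is something like this what the scheduler route would look like? (Based on my reading of the RxPY docs; the setup is my guess and I'm not sure it's right.)

import multiprocessing
from reactivex import operators as ops
from reactivex.scheduler import ThreadPoolScheduler

# a shared pool; observe_on shifts each subscription onto pool threads
pool = ThreadPoolScheduler(multiprocessing.cpu_count())

newData.pipe(ops.observe_on(pool)).subscribe(lambda v: ComputeProcess().runOnce())
newModel.pipe(ops.observe_on(pool)).subscribe(lambda v: ComputeProcess().runOnce())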
I hope this question isn't too vague. I suppose I'm looking for the simplest framework in which I can have a small number of computational-memory units (like objects, because they have internal state) that communicate with each other and work in parallel (or concurrently). At the highest level, I want to be able to treat these computational-memory units (which I've called processes above) as individuals who mostly work on their own stuff but occasionally broadcast or send a message to a specific other individual, requesting or providing information.
Am I perhaps actually looking for an actor-model framework? Or is this RxPy setup versatile enough to achieve that without extreme complexity?
Thanks so much!

Handling RabbitMQ heartbeats when cpu is loaded 100% for a long time

I'm using pika 1.1 and graph-tool 3.4 in my Python application. It consumes tasks from RabbitMQ, which are then used to build graphs with graph-tool, and then runs some calculations on them.
Some of the calculations, such as betweenness, take a lot of CPU power, which makes CPU usage hit 100% for a long time. Sometimes the RabbitMQ connection drops, which causes the task to start over from the beginning.
Even though the calculations run in a separate process, my guess is that while the CPU is loaded at 100%, the client never gets an opportunity to send a heartbeat to RabbitMQ, which causes the connection to be terminated. This doesn't happen every time, which suggests that by chance it manages to send heartbeats from time to time. This is only my guess; I am not sure what else could cause it.
I tried lowering the priority of the calculation process using nice(19), which didn't work. I'm assuming it doesn't affect the processes spawned by graph-tool, which parallelizes work on its own.
Since it's just one line of code, graph.calculate_betweenness(..., I don't have a place to manually send heartbeats or slow the execution down to create a chance for heartbeats.
Can my guess that heartbeats are not getting sent because the CPU is super busy be correct?
If yes, how can I handle this scenario?
Answering your questions:
Yes, that's basically it.
The solution we use is to create a separate process for the CPU-intensive tasks.
import time
from multiprocessing import Process

import pika

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

channel.exchange_declare(exchange='logs', exchange_type='fanout')

result = channel.queue_declare(queue='', exclusive=True)
queue_name = result.method.queue

channel.queue_bind(exchange='logs', queue=queue_name)

def cpu_intensive_task(ch, method, properties, body):
    def work(body):
        time.sleep(60)  # If I remember well default HB is 30 seconds
        print(" [x] %r" % body)
    p = Process(target=work, args=(body,))
    p.start()
    # Important to notice: if you do p.join() you will have the same problem.

channel.basic_consume(
    queue=queue_name, on_message_callback=cpu_intensive_task, auto_ack=True)

channel.start_consuming()
I wonder if this is the best solution to this problem, or if RabbitMQ is even the right tool for CPU-intensive tasks. (For really long CPU-intensive tasks (more than 30 min), if you send manual ACKs you will also need to deal with this: https://www.rabbitmq.com/consumers.html#acknowledgement-timeout)
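If you do end up needing manual acks, one sketch (mirroring pika's threaded-consumer example; it assumes the connection/channel from the snippet above and that work is hoisted to module level) is to do the waiting in a plain thread and hand the ack back to pika's own thread via add_callback_threadsafe:

import functools
import threading

def ack_message(ch, delivery_tag):
    # runs on the connection's thread, scheduled below
    if ch.is_open:
        ch.basic_ack(delivery_tag=delivery_tag)

def cpu_intensive_task_manual_ack(ch, method, properties, body):
    def wait_and_ack():
        p = Process(target=work, args=(body,))
        p.start()
        p.join()  # joining here is fine: this is not pika's I/O thread
        ch.connection.add_callback_threadsafe(
            functools.partial(ack_message, ch, method.delivery_tag))
    threading.Thread(target=wait_and_ack).start()

# register with auto_ack left off (the default), unlike the snippet above
channel.basic_consume(
    queue=queue_name, on_message_callback=cpu_intensive_task_manual_ack)

This way the main thread stays inside start_consuming(), so heartbeats keep flowing while the worker process grinds away.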

Why does the Disruptor hold lots of data when the producer is much faster than the consumer?

I'm learning about the LMAX Disruptor and have a problem: when I have a very large ring buffer, like 1024, and my producer is much faster than my consumer, the ring buffer holds lots of data but does not publish the events until my application ends. Which means my application loses lots of data (my application is not a daemon).
I've tried slowing down the rate of the producer, which works, but I can't use this approach in my application; it would reduce my application's performance greatly.
val ringBufferSize = 1024
val disruptor = new Disruptor[util.Map[String, Object]](new MessageEventFactory, ringBufferSize, new MessageThreadFactory, ProducerType.MULTI, new BlockingWaitStrategy)
disruptor.handleEventsWith(new MessageEventHandler(batchSize, this))
disruptor.setDefaultExceptionHandler(new MessageExceptionHandler)
val ringBuffer = disruptor.start
val producer = new MessageEventProducer(ringBuffer)
part.foreach { row =>
  // Thread.sleep(2000)
  accm.add(1)
  producer.onData(row)
  // flush(row)
}
I want to find a way to control the batch size of the disruptor by myself, and is there any method to consume the rest of the data held at the end of my application?
If you let your application end abruptly, your consumers will of course end abruptly, too. There is no need to slow down the producer; you simply need to block your application from exiting until all consumers (i.e. event handlers) have finished working on the outstanding events.
The normal way to do this is to invoke Disruptor.shutdown() on the main thread, thus blocking the application from exiting until Disruptor.shutdown() has returned.
In your code snippet above, you'd add that call after the part.foreach statement, before you exit the routine. It blocks until all outstanding events have been handled, ensuring that they are all processed to completion.
The Disruptor excels mainly at buffering (smoothing out) bursts of data coming from a single (extremely fast) producer thread or multiple (still pretty fast) producer threads, feeding that data to consumers which perform in a predictable manner, thus eliminating as much latency and lock-contention overhead as possible. If your producers are in fact much faster than your consumers, you may find that simply invoking the consumer code from within your lambda yields similar or better results, unless you use advanced techniques such as batching, or set the Disruptor up to run multiple instances of the same consumer in parallel threads, which however requires the event handler implementation to be modified (see the Disruptor FAQ).
In your example, it seems that all you are trying to accomplish is to feed an already available set of data (your "part" collection) into a single event handler (MessageEventHandler). In such a use case, you might be better off with something like parts.stream().parallel().forEach(... messageEventHandler.onEvent(event) ...)

Erlang Node to Node Messaging Throughput, Timeouts and guarantees

Now, suppose we are designing an application that consists of 2 Erlang nodes. On Node A there will be very many processes, on the order of thousands. These processes access resources on Node B by sending a message to a registered process on Node B. On Node B, say we have a process started by executing the following function:
start_server() ->
    register(zeemq_server, spawn(?MODULE, server, [])), ok.

server() ->
    receive
        {{CallerPid, Ref}, {Module, Func, Args}} ->
            Result = (catch erlang:apply(Module, Func, Args)),
            CallerPid ! {Ref, Result},
            server();
        _ -> server()
    end.
On Node A, any process that wants to execute a function from a given module on Node B uses the following piece of code:
call(Node, Module, Func, Args) ->
    Ref = make_ref(),
    Me = self(),
    {zeemq_server, Node} ! {{Me, Ref}, {Module, Func, Args}},
    receive
        {Ref, Result} -> Result
    after timer:minutes(3) ->
        error_logger:error_report(["Call to server took so long"]),
        {error, remote_call_failed}
    end.
So, assuming that the zeemq_server process on Node B will never be down, and that the network connection between Nodes A and B is always up, please answer the following questions:
Qn 1: Since there is only one receiving process on Node B, its mailbox is likely to be full all the time. This is because there are many processes on Node A and, at a given interval, say 2 seconds, every one of them makes at least a single call to the Node B server. In which ways can the reception on Node B be made redundant, e.g. process groups etc.? Explain (the concept of) how this would replace the server-side code above, and show what changes would happen on the client side.
Qn 2: In a situation where there is only one receiver on Node B, is there a maximum number of messages allowed in a process mailbox? How would Erlang respond if a single process mailbox is flooded with too many messages?
Qn 3: In what ways, using the very concept shown above, can I guarantee that every process which sends a request gets back an answer as soon as possible, before the timeout occurs? Could converting the reception part on Node B into a parallel operation help? Like this:
start_server() ->
    register(zeemq_server, spawn(?MODULE, server, [])), ok.

server() ->
    receive
        {{CallerPid, Ref}, {Module, Func, Args}} ->
            spawn(?MODULE, child, [Ref, CallerPid, {Module, Func, Args}]),
            server();
        _ -> server()
    end.

child(Ref, CallerPid, {Module, Func, Args}) ->
    Result = (catch erlang:apply(Module, Func, Args)),
    CallerPid ! {Ref, Result},
    ok.
The method shown above may increase the instantaneous number of processes running on Node B, and this may affect the service greatly due to memory use. However, it looks good, and it lets the server() loop return immediately to handle the next request. What is your take on this modification?
Lastly: illustrate how you would implement a pool of receiver processes on Node B that still appears to Node A as a single registered name, such that incoming messages are multiplexed amongst the receivers and the load is shared within this group of processes. Keep the meaning of the problem the same.
The maximum number of messages in a process mailbox is unbounded, except by the amount of memory.
Also, if you need to inspect the mailbox size, use
erlang:process_info(self(),[message_queue_len,messages]).
This will return something like:
[{message_queue_len,0},{messages,[]}]
What I suggest is that you first convert your server above into a gen_server. This is your worker.
Next, I suggest using poolboy (https://github.com/devinus/poolboy) to create a pool of instances of your server as poolboy workers (there are examples in their GitHub README.md). Lastly, I suggest creating a module for callers, with a helper method that creates a poolboy transaction and applies a Worker from the pool to a function. Example below, cribbed from their GitHub:
squery(PoolName, Sql) ->
    poolboy:transaction(PoolName, fun(Worker) ->
        gen_server:call(Worker, {squery, Sql})
    end).
That said, would Erlang RPC suit your needs better? Details on Erlang RPC are at http://www.erlang.org/doc/man/rpc.html, and a good treatment of it is found at http://learnyousomeerlang.com/distribunomicon#rpc.
IMO spawning a new process to handle each request may be overkill, but it's hard to say without knowing what has to be done with each request.
You can have a pool of processes handle the messages, using a round-robin method to distribute the requests, or, based on the type of request, either handle the message, send it to a child process, or spawn a new process. You can also monitor the load on the pooled processes by looking at their message queues and start new children if they are overloaded. Using a supervisor, just use send_after in init to check the load every few seconds and act accordingly. Use OTP if you can; there's overhead, but it is worth it.
I wouldn't use HTTP for dedicated line communication; I believe it's too much overhead. You can control the load using a pool of processes to handle it.

Increase speed of Read/Write Serial Port using Timers

I have code that reads from and writes to a serial port, written in MFC. The program works well but is a bit slow, as there are many operations (reads and writes) going on. A timer drives the serial port operations; it is set up as below:
Loop_Timer = SetTimer(1, 50, 0);
The serial port settings are as follows:
BaudRate = 57600;
ByteSize = 8;
Parity = NOPARITY;
StopBits = ONESTOPBIT;
fAbortOnError = false;
The following write and read operations occur when the timer fires:
Write(command);
Read(returned_message);
returned_message.Trim();
...
//finds a value from the returned string
...
So, this read/write sequence occurs maybe 1, 2, 3 or 4 times for a given selected option.
For example: option 1 requires the above sequence to occur 4 times in the given timer period.
Option 2 requires it to occur 2 times (as it has only two variables with return values), etc.
...
Now, what I am trying to do is improve the speed of this overall operation, making it robust and quick to respond. I tried changing the timer, but it is still pretty slow. Any suggestions for improvement?
You'd do far better to run your actual serial port processing in a separate thread, and to use WaitCommEvent rather than a timer for accepting incoming data. Append newly received data to a storage buffer local to that thread.
Retrieve data from your serial port thread using a timer if you wish, or have the serial port thread notify your main app when a complete message has been received.
When sending data to the serial port thread, you want a mechanism whereby the data is stored locally to the serial port code and transmitted from there.
The thing to bear in mind is that, compared to all other means of communication, serial port transmission and reception is SLOW, and by accessing the serial port on your main application thread you'll slow it down massively, especially when transmitting data.
If you find direct coding against the Win32 API and serial ports a pain, then this class here I've found very useful.
