How to merge three TCP streams in realtime - Linux

I have three bits of networked realtime data logging equipment that output lines of ASCII text via TCP sockets. They essentially just broadcast the data that they are logging - there are no requests for data from other machines on the network. Each piece of equipment is at a different location on my network and each has a unique IP address.
I'd like to combine these three streams into one so that I can log it to a file for replay or forward it onto another device to view in realtime.
At the moment I have a PHP script looping over each IP/port combination, listening for up to 64Kb of data. As soon as data is received, or it gets an EOL, it forwards that on to another socket that carries the combined stream.
This works reasonably well, but one of the data loggers outputs far more data than the others and tends to swamp them, so I'm pretty sure that I'm missing data - presumably because the script isn't listening in parallel.
I've also tried three separate PHP processes writing to a shared file in memory (on /dev/shm) which is read and written out by a fourth process. Using file locking this seems to work but introduces a delay of a few seconds which I'd rather avoid.
I did find a PHP library called (I think) Amp that allows true multithreading using Pthreads, but I'm still not sure how to combine the output. A file in RAM doesn't seem quick enough.
I've had a good look around on Google and can't see an obvious solution. There certainly doesn't seem to be a way to do this on Linux using command line tools that I've found unless I've missed something obvious.
I'm not too familiar with other languages, but are there other languages that might be better suited to this problem?
Based on the suggested solution below I've got the following code almost working, however I get the error 'socket_read(): unable to read from socket [107]: Transport endpoint is not connected'. This is odd, as I've set the socket to accept connections and made it non-blocking. What am I doing wrong?
// Script to mix inputs from multiple sockets

// Run forever
set_time_limit(0);

// Define address and ports that we will listen on
$localAddress='';

// Define inbound ports
$inPort1=36000;
$inPort2=36001;

// Create sockets for inbound data
$inSocket1=createSocket($localAddress, $inPort1);
$inSocket2=createSocket($localAddress, $inPort2);

// Define buffer of data to read and write
$buffer="";

// Repeat forever
while (true) {
    // Build array of streams to monitor
    $readSockets=array($inSocket1, $inSocket2);
    $writeSockets=NULL;
    $exceptions=NULL;
    $t=NULL;

    // Count number of streams that have been modified
    $modifiedCount=socket_select($readSockets, $writeSockets, $exceptions, $t);
    if ($modifiedCount>0) {
        // Process inbound arrays first
        foreach ($readSockets as $socket) {
            // Get up to 64 Kb from this socket
            $buffer.=socket_read($socket, 65536, PHP_BINARY_READ);
        }

        // Process outbound socket array
        foreach ($writeSockets as $socket) {
            // Write out any data that we have accumulated
            //socket_write($socket, $buffer, strlen($buffer));
            echo $buffer;
        }

        // Reset buffer
        $buffer="";
    } else {
        echo ("Nothing to read\r\n");
    }
}

function createSocket($address, $port) {
    // Function to create and listen on a socket

    // Create socket
    $socket=socket_create(AF_INET, SOCK_STREAM, 0);
    echo ("SOCKET_CREATE: " . socket_strerror(socket_last_error($socket)) . "\r\n");

    // Allow the socket to be reused otherwise we'll get errors
    socket_set_option($socket, SOL_SOCKET, SO_REUSEADDR, 1);
    echo ("SOCKET_OPTION: " . socket_strerror(socket_last_error($socket)) . "\r\n");

    // Bind it to the address and port that we will listen on
    $bind=socket_bind($socket, $address, $port);
    echo ("SOCKET_BIND: " . socket_strerror(socket_last_error($socket)) . " $address:$port\r\n");

    // Tell socket to listen for connections
    socket_listen($socket);
    echo ("SOCKET_LISTEN: " . socket_strerror(socket_last_error($socket)) . "\r\n");

    // Make this socket non-blocking
    socket_set_nonblock($socket);

    // Accept inbound connections on this socket
    socket_accept($socket);

    return $socket;
}

You don't necessarily need to switch languages; it just sounds like you're not familiar with the concept of IO multiplexing. Check out the documentation for the PHP select call here
The concept of listening to multiple data inputs without knowing which one the next piece of data will come from is a common one, and it has standard solutions. There are variations in exactly how it's implemented, but the basic idea is the same: you tell the system that you're interested in receiving data from multiple sources simultaneously (TCP sockets in your case), and run a loop waiting for this data. On every iteration of the loop the system tells you which sources are ready for reading. In your case that means you can piecemeal-read from all 3 of your sources without waiting for an individual one to reach 64KB before moving on to the next.
This can be done in lots of languages, including PHP.
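For illustration, here's a minimal sketch of that loop in Python (the logger addresses are placeholders; the same structure works with PHP's socket_select): connect to each source, let select() report which connections have data, and append whatever arrives to the combined output.
import select
import socket

# Placeholder addresses for the three loggers - substitute your own.
SOURCES = [("192.168.0.10", 5000), ("192.168.0.11", 5000), ("192.168.0.12", 5000)]

# Connect to every source up front.
conns = [socket.create_connection(addr) for addr in SOURCES]
out = open("combined.log", "ab")

while conns:
    # Block until at least one connection has data ready to read.
    readable, _, _ = select.select(conns, [], [])
    for s in readable:
        data = s.recv(65536)
        if not data:               # peer closed the connection
            conns.remove(s)
            s.close()
            continue
        out.write(data)            # merge into the combined stream
        out.flush()
Because select() wakes up for whichever source has data, the chatty logger can't starve the quiet ones.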
UPDATE: Looking at the code you posted in your update, the remaining issue is that you're trying to read from the wrong thing, namely from the listening socket rather than the connection socket. You are ignoring the return value of socket_accept in your createSocket function, which is the bug.
Remove these lines from createSocket:
// Accept inbound connections on this socket
socket_accept($socket);
Change your global socket creation code to:
// Create sockets for inbound data
$listenSocket1=createSocket($localAddress, $inPort1);
$listenSocket2=createSocket($localAddress, $inPort2);
$inSocket1=socket_accept($listenSocket1);
$inSocket2=socket_accept($listenSocket2);
Then your code should work.
Explanation: when you create a socket for binding and listening, its sole function then becomes to accept incoming connections and it cannot be read from or written to. When you accept a connection a new socket is created, and this is the socket that represents the connection and can be read/written. The listening socket in the meantime continues listening and can potentially accept other connections (this is why a single server running on one http port can accept multiple client connections).
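A minimal Python sketch of the same distinction (the port matches your script; everything else is illustrative): the bound socket only ever accepts, and the socket returned by accept() is the one you read from.
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("", 36000))
listener.listen()

conn, peer = listener.accept()   # conn is a NEW socket for this client
data = conn.recv(65536)          # read from conn, never from listener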

Related

How can I send messages to multiple twitch channels using python 3 Select()?

I'm trying to create a Twitch bot in Python 3 that will simultaneously monitor and send messages to multiple channels. I've done this with threads, but it's demanding on the CPU, and I've read that using select() is more efficient. The code below allows me to read chat from multiple Twitch channels, but I'm at a loss for how to identify whether the connections returned as writable are the ones I want to write to.
Can I pass in a list of objects that has the socket connection as well as an identifier so I know which ones have come back as writable?
I've read a number of Stack Overflow posts related to using select(), as well as other sources online, but as a hobbyist coder I'm having trouble getting my head around this.
#!/usr/bin/env python3
import socket
import select

HOST = "irc.chat.twitch.tv"
PORT = 6667
NICK = "channelname"
PASS = 'oauth:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
CHANNEL = "channelname"

def create_sockets(usr_list):
    final_socket_list = []
    channels_first_element = 0
    channels_last_element = len(usr_list)
    for index in range(channels_first_element, channels_last_element):
        channel = usr_list[index]
        s = socket.socket()
        s.connect((HOST, PORT))
        s.setblocking(False)
        s.send(bytes("PASS " + PASS + "\r\n", "UTF-8"))
        s.send(bytes("NICK " + NICK + "\r\n", "UTF-8"))
        s.send(bytes("JOIN #" + channel + " \r\n", "UTF-8"))
        s.send(bytes('CAP REQ :twitch.tv/membership\r\n'.encode('utf-8')))
        s.send(bytes('CAP REQ :twitch.tv/commands\r\n'.encode('utf-8')))
        s.send(bytes('CAP REQ :twitch.tv/tags\r\n'.encode('utf-8')))
        final_socket_list.append(s)
    return final_socket_list

def main():
    alive = True
    user_list = ['channelone', 'channeltwo', 'channelthree']
    user_sockets = create_sockets(user_list)
    while alive:
        readable, writable, errorreads = select.select(user_sockets, user_sockets, [])
        if len(readable) != 0:
            for element in readable:
                print(str(element.recv(1024), "utf-8"))

if __name__ == "__main__":
    main()
The first thing to point out is that the arguments to select() are meant to tell select() when to return -- i.e. by including a socket in the first/read_fds argument, you are telling select() to return as soon as that socket has incoming-data that is ready-to-read, and by including a socket in the second/write_fds argument, you are telling select() to return as soon as that socket has buffer-space-ready-to-write-to.
Because of that, it's important (if you want to be CPU-efficient) to only include a socket in select()'s second/write_fds argument if you have data that you want to send to that socket as soon as there is space available in that socket's outgoing-data-buffer. If you just always pass all of your sockets to the write_fds argument (as you are currently doing in the posted code), then select() will pretty much always return immediately (because sockets typically almost always have buffer space available to write to), and you'll end up spinning the CPU at near 100% usage, which is very inefficient.
Note that for a light-duty server that is using blocking TCP sockets, it's usually sufficient to simply always pass [] as the second argument to select(), on the assumption that you will never actually fill any socket's outgoing-data-buffer (and if you do, that the next send() call on that socket will simply block until there is buffer space available, and that's okay). If you want to use non-blocking sockets, you can either make the simplifying assumption that no socket's outgoing-data-buffer will ever become full (in which case passing [] to the write_fds argument is fine), or to be 100% robust you'll need to include your own per-socket outgoing-data FIFO queue, and include each socket in the write_fds argument only-if its FIFO queue is non-empty, and when the socket indicates it is ready-for-write, send() as many bytes as you can from the head of the socket's FIFO queue.
As for which sockets are the ones you want to write to, that's going to depend entirely on what your app is trying to do, and probably won't depend on which sockets have selected as writeable. Most programs include a Dictionary (or some other similar lookup mechanism) for quickly determining which data-object corresponds to a given socket, so when a socket select()'s as ready-for-read you can figure out which data-object the data coming from that socket should be viewed as "coming from". You can then write to any sockets you need to write to, in response.
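As a rough sketch of that bookkeeping in Python (out_queues and handle_incoming are made-up names for illustration): keep a per-socket outgoing FIFO, and only include a socket in the write_fds list while its FIFO is non-empty.
import select
from collections import deque

out_queues = {}  # dictionary mapping each socket to its pending outgoing bytes

def queue_send(sock, data):
    out_queues.setdefault(sock, deque()).append(data)

def handle_incoming(sock, data):
    print(sock.getpeername(), data)  # placeholder for your per-channel logic

def serve_once(all_socks):
    # Only ask select() about writability for sockets with queued data,
    # so select() doesn't return immediately on every pass.
    want_write = [s for s in all_socks if out_queues.get(s)]
    readable, writable, _ = select.select(all_socks, want_write, [])
    for s in readable:
        handle_incoming(s, s.recv(4096))
    for s in writable:
        q = out_queues[s]
        sent = s.send(q[0])        # may accept fewer bytes than offered
        if sent == len(q[0]):
            q.popleft()
        else:
            q[0] = q[0][sent:]     # keep the unsent tail at the head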

UDP send performance in Node.js

I am benchmarking a Java UDP client that continuously sends datagrams with a payload of 100 bytes as fast as it can. It was implemented using java.nio.*. Tests show that it's able to achieve a steady throughput of 220k datagrams per second. I am not testing with a server; the client just sends the datagrams to some unused port on localhost.
I decided to run the same test in Node.js to compare both technologies and it was surprisingly sad to see that Node.js performed 10 times slower than Java. Let me walk you through my code.
First, I create a UDP socket using Node.js's dgram module:
var client = require('dgram').createSocket("udp4");
Then I create a function that sends a datagram using that socket:
function sendOne() {
    client.send(message, 0, message.length, SERVER_PORT, SERVER_ADDRESS, onSend);
}
The variable message is a buffer created from a string with a hundred characters when the application starts:
var message = new Buffer(/* string with 100 chars */);
The function onSend just increments a variable that holds how many datagrams were sent so far. Next I have a function that constantly calls sendOne() using setImmediate():
function sendForever() {
    sendOne();
    setImmediate(sendForever);
}
Initially I tried to use process.nextTick(sendForever), but I found out that it always puts itself at the tip of the event queue, even before IO events, as the docs say:
It runs before any additional I/O events (including timers) fire in subsequent ticks of the event loop.
This prevents the send IO events from ever happening, as nextTick constantly puts sendForever at the tip of the queue on every tick. The queue grows with unhandled IO events until it makes Node.js crash:
fish: Job 1, 'node client' terminated by signal SIGSEGV (Address boundary error)
On the other hand, setImmediate fires after I/O events callbacks, so that's why I'm using it.
I also create a timer that once every 1 second prints to the console how many datagrams were sent in the last second:
setInterval(printStats, 1000);
And finally I start sending:
sendForever();
Running on the same machine as the Java tests ran, Node.js achieved a steady throughput of 21k datagrams per second, ten times slower than Java.
My first guess was to put two sendOne() calls in every tick to see if it would double the throughput:
function sendForever() {
    sendOne();
    sendOne(); // second send
    setImmediate(sendForever);
}
But it didn't change the throughput whatsoever.
I have a repository available on GitHub with the complete code:
https://github.com/luciopaiva/udp-perf-js
Simply clone it to your machine, cd into the folder and run:
node client
I want to open a discussion about how this test could be improved in Node.js and if there's some way we can increase Node.js's throughput. Any ideas?
P.S.: for those interested, here is the Java part.
That test is fundamentally flawed. UDP doesn't guarantee the delivery of anything, and it doesn't guarantee that you'll get an error when delivery fails.
Your application could send 1000k datagrams/s at 1GB/s from the Java application, yet 90% of the datagrams never reach the destination... the destination might not even be running.
If you want to do any sort of UDP testing, you need two applications, one on each end. Send numbered datagrams 1, 2, 3... and check what's sent against what's received. Note that UDP doesn't guarantee any ordering of messages.
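A minimal sketch of such a loss test in Python (the port number is arbitrary): the sender numbers each datagram, and the receiver records which sequence numbers actually arrived.
import socket
import struct

PORT = 9999  # arbitrary test port

def sender(count, addr="127.0.0.1"):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq in range(count):
        # 4-byte sequence number plus padding to a 100-byte payload
        s.sendto(struct.pack("!I", seq) + b"x" * 96, (addr, PORT))

def receiver():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("", PORT))
    s.settimeout(2.0)
    received = set()
    try:
        while True:
            data, _ = s.recvfrom(2048)
            received.add(struct.unpack("!I", data[:4])[0])
    except socket.timeout:
        pass  # no datagram for 2 s; assume the sender is done
    print(len(received), "unique datagrams received")
Run receiver() on one machine and sender(100000) on the other; the gap between the sent and received counts is your loss.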
Kernels manage the localhost network in special ways. There are huge buffers dedicated to it and higher limits; no traffic ever goes through any network cards or drivers. It's very different from sending packets for real.
Tests might seem somewhat okay when they're done purely on localhost; expect everything to fail miserably when the traffic goes through any physical infrastructure for real.
PC1 <-----> switch <-----> PC2
Let's say there are two computers in the same room, linked by a switch. It would be no small feat to achieve 10k/s UDP datagrams on that simple setup without losing messages randomly.
And that's just two computers in the same room. It can be a lot worse over the Internet and long distances.
If all you want is to make the performance test go faster: removing the setImmediate call and issuing the next send from the send callback (i.e. once the previous send has completed) increased throughput to ~100k requests per second on my slowish laptop.
const SERVER_PORT = 12345; // any unused port, as in the original test

function send(socket, message) {
    socket.send(message, SERVER_PORT, (err) => {
        send(socket, message);
    });
}

const socket = require('dgram').createSocket('udp4');
const message = new Buffer('dsdsddsdsdsjkdshfsdkjfhdskjfhdskjfhdsfkjsdhfdskjfhdskjfhsdfkjdshfkjdshfkjdsfhdskjfhdskjfhdkj');
send(socket, message);

Using thread to write and select to read

Has anyone tried to create a socket in non-blocking mode, use a dedicated thread to write to the socket, and use the select system call to identify whether data is available to read?
If the socket is non-blocking, the write call will return immediately and the application will not know the status of the write (whether it passed or failed).
Is there a way of knowing the status of the write call without having to block on it?
Has anyone tried to create a socket in non-blocking mode, use a dedicated thread to write to the socket, and use the select system call to identify whether data is available to read?
Yes, and it works fine. Sockets are bi-directional. They have separate buffers for reading and writing. It is perfectly acceptable to have one thread writing data to a socket while another thread is reading data from the same socket at the same time. Both threads can use select() at the same time.
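As a quick sketch of that arrangement in Python (the peer address is a placeholder): a dedicated thread performs the writes while the main thread select()s for incoming data on the same socket.
import select
import socket
import threading

def writer(sock):
    # Dedicated writer thread; blocking-style writes are fine here.
    for i in range(100):
        sock.sendall(b"ping %d\n" % i)

sock = socket.create_connection(("192.0.2.1", 7))  # placeholder peer
threading.Thread(target=writer, args=(sock,), daemon=True).start()

while True:
    readable, _, _ = select.select([sock], [], [], 5.0)
    if not readable:
        break                 # nothing to read for 5 s; give up
    data = sock.recv(4096)
    if not data:
        break                 # peer closed the connection
    print("received", data)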
If the socket is non-blocking, the write call will return immediately and the application will not know the status of the write (whether it passed or failed).
The same is true for blocking sockets, too. Outbound data is buffered in the kernel and transmitted in the background. The difference between the two types is that if the write buffer is full (such as if the peer is not reading and acking data fast enough), a non-blocking socket will fail to accept more data and report an error code (WSAEWOULDBLOCK on Windows, EAGAIN or EWOULDBLOCK on other platforms), whereas a blocking socket will wait for buffer space to clear up and then write the pending data into the buffer. Same thing with reading. If the inbound kernel buffer is empty, a non-blocking socket will fail with the same error code, whereas a blocking socket will wait for the buffer to receive data.
select() can be used with both blocking and non-blocking sockets. It is just more commonly used with non-blocking sockets than blocking sockets.
is there a way of knowing the status of the write
call without having to block on it.
On non-Windows platforms, about all you can do is use select() or equivalent to detect when the socket can accept new data before writing to it. On Windows, there are ways to receive a notification when a pending read/write operation completes if it does not finish right away.
But either way, outbound data is written into a kernel buffer and not transmitted right away. Writing functions, whether called on blocking or non-blocking sockets, merely report the status of writing data into that buffer, not the status of transmitting the data to the peer. The only way to know the status of the transmission is to have the peer explicitly send back a reply message once it has received the data. Some protocols do that, and others do not.
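For example, here's a sketch of a non-blocking writer in Python (the peer address is a placeholder): when the kernel's outgoing buffer is full, send() fails with EAGAIN/EWOULDBLOCK (surfaced in Python as BlockingIOError), and select() is used to wait for buffer space before retrying.
import select
import socket

sock = socket.create_connection(("192.0.2.1", 9000))  # placeholder peer
sock.setblocking(False)

def send_all(sock, data):
    while data:
        try:
            data = data[sock.send(data):]   # drop whatever the kernel accepted
        except BlockingIOError:             # buffer full: EAGAIN/EWOULDBLOCK
            select.select([], [sock], [])   # block until writable again

send_all(sock, b"hello" * 1000)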
Is there a way of knowing the status of the write call without having to block on it?
If the result of the write call is -1, then check errno for EAGAIN or EWOULDBLOCK. If it's one of those errors, it's benign and you can go back to waiting on a select call. Sample code below.
// Assumes <unistd.h> for write() and <errno.h> for errno.
int result = write(sock, buffer, size);
if ((result == -1) && ((errno == EAGAIN) || (errno == EWOULDBLOCK)))
{
    // write failed because socket isn't ready to handle more data. Try again later (or wait for select)
}
else if (result == -1)
{
    // fatal socket error
}
else
{
    // result == number of bytes sent.
    // TCP - may be less than the number of bytes passed in to write/send call.
    // UDP - number of bytes sent (should be the entire thing)
}

Increase speed of Read/Write Serial Port using Timers

I have code that reads and writes to a serial port, written in MFC. The program works well but is a bit slow, as there are many operations (reads and writes) occurring. I have a timer that carries out the operations on the serial port. The timer is given below:
Loop_Timer = SetTimer(1,50,0);
The serial port transmission information is as follows:
BaudRate = 57600;
ByteSize = 8;
Parity = NOPARITY;
StopBits = ONESTOPBIT;
fAbortOnError = false;
The following write and read operations occur when the timer fires:
Write(command);
Read(returned_message);
returned_message.Trim();
...
//finds a value from the returned string
...
So, this read and write operation occurs maybe 1, 2, 3 or 4 times for a given selected option.
For example: Option 1 requires the above function to occur 4 times in the given timer.
Option 2 requires the above function to occur 2 times (as it has only two variables with return values), etc.
...
Now, what I am trying to do is improve the speed of this overall operation, making it robust and quick to respond. I tried changing the timer but it is still pretty slow. Any suggestions for improvement?
You'd do far better to run your actual serial port processing in a separate thread, and to use WaitCommEvent rather than a timer for accepting incoming data. Append newly received data to a storage buffer local to that thread.
Retrieve data from your serial port thread using a timer if you wish, or have the serial port thread notify your main app when a complete message is received.
When sending data to the serial port thread, you want a mechanism whereby the data is stored locally to the serial port code and transmitted from there.
The thing to bear in mind is that, compared to all other means of communication, serial port transmission and reception is SLOW, and by accessing the serial port on your main application thread you'll slow the app down massively, especially when transmitting data.
If you find coding directly against the Win32 API and serial ports a pain, then the class here is one I've found very useful.
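If it helps to see the shape of this outside MFC, here is a rough sketch in Python with pyserial (the port name and read size are placeholders): a dedicated thread blocks on the port and pushes complete lines onto a queue, which the main thread drains without ever touching the port itself.
import queue
import threading
import serial  # pyserial

lines = queue.Queue()

def port_thread():
    # All blocking port I/O lives in this thread.
    with serial.Serial("COM3", 57600, timeout=1) as port:  # placeholder port
        buf = b""
        while True:
            buf += port.read(256)
            while b"\r\n" in buf:
                line, buf = buf.split(b"\r\n", 1)
                lines.put(line)  # hand complete messages to the main thread

threading.Thread(target=port_thread, daemon=True).start()

# Main thread polls the queue, not the port, so it stays responsive.
while True:
    try:
        print(lines.get(timeout=0.05))
    except queue.Empty:
        pass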

Sockets & File Descriptor Reuse (or lack thereof)

I am getting the error "Too many open files" after the call to socket in the server code below. This code is called repeatedly, and the error only occurs just after server_SD gets the value 1022, so I am assuming that I am hitting the limit of 1024 imposed by "ulimit -n". What I don't understand is that I am closing the socket, which should make the fd reusable, but this seems not to be happening.
Notes: I'm using Linux, and yes, the client is closed too. No, I am not a root user, so raising the limits is not an option. I should have a maximum of 20 (or so) sockets open at one time; over the lifetime of my program I would expect to open and close close to 1000000 sockets (hence the need for reuse is very strong).
server_SD = socket(AF_INET, SOCK_STREAM, 0);
bind(server_SD, (struct sockaddr *) &server_address, server_len);
listen(server_SD, 1);
client_SD = accept(server_SD, (struct sockaddr *) &client_address, &client_len);
// read, write etc...
shutdown(server_SD, 2);
close(server_SD);
Does anyone know how to guarantee closure and reusability?
Thanks.
Run your program under valgrind with the --track-fds=yes option:
valgrind --track-fds=yes myserver
You may also need --trace-children=yes if your program uses a wrapper or it puts itself in the background.
If it doesn't exit on its own, interrupt it or kill the process with "kill pid" (not -9) after it accumulates some leaked file descriptors. On exit, valgrind will show the file descriptors that are still open and the stack trace corresponding to where they were created.
Running your program under strace to log all system calls may also be helpful. Another helpful command is /usr/sbin/lsof -p pid to display all currently used file descriptors and what they are being used for.
From your description it looks like you are opening a server socket for each accept(2). That is not necessary. Create the server socket once, bind(2) it, listen(2), then call accept(2) on it in a loop (or better yet, give it to poll(2)) - see the sketch after the note below.
Edit 0:
By the way, shutdown(2) on a listening socket is totally meaningless; it's intended for connected sockets only.
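A sketch of the intended structure, in Python for brevity (the port is arbitrary): the listening socket is created once, and only the per-connection sockets are closed, so the process never runs out of descriptors.
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("", 8000))   # arbitrary port
server.listen(5)

while True:
    client, addr = server.accept()   # one new fd per connection
    try:
        data = client.recv(4096)
        client.sendall(data)         # read, write etc...
    finally:
        client.close()               # close only the connection fd
# The listening socket stays open for the life of the process.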
Perhaps your problem is that you're not specifying the SO_REUSEADDR flag?
From the socket manpage:
SO_REUSEADDR
Indicates that the rules used in validating addresses supplied in a bind(2) call should allow reuse of local addresses. For PF_INET sockets this means that a socket may bind, except when there is an active listening socket bound to the address. When the listening socket is bound to INADDR_ANY with a specific port then it is not possible to bind to this port for any local address.
Are you using fork()? If so, your children may be inheriting the opened file descriptors.
If this is the case, you should have the child close any fds that don't belong to it.
This looks like it might be a "TIME_WAIT" problem. IIRC, TIME_WAIT is one of the states a TCP socket can be in; it's entered when both sides have closed the connection, and the system keeps the socket around for a while to avoid delayed packets being accepted as payload by subsequent connections.
You should maybe have a look at this (bottom of page 99 and top of page 100), and maybe at that other question.
One needs to close the client before closing the server (reverse order to my code above!)
Thanks all who offered suggestions !
