What kind of server needs select? (Linux)

I know that a server normally opens one port and listens on it.
Today I learned that Unix-like systems have a select function. With select we can listen on multiple sockets.
I just can't imagine a case where we need to use select. If we have two sockets, it means that we are listening on two ports, right? So I have a question:
What kind of server would open more than one port but receive and process the same type of requests?

Using select helps with handling reads and writes on multiple sockets. It doesn't have to be multiple server sockets. The most typical use is for multiplexing a large number of client sockets.
You have a server with one listening socket. Each time you accept a connection, you add the new client socket to the multiplexing pool. select then returns whenever any of those sockets has data available to read. The big win is that you're doing all of this with one thread.
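For concreteness, here is a minimal sketch of that pattern in Python (the port number is made up and the echo reply stands in for real request processing):

import select
import socket

# One listening socket; accepted client sockets join the multiplexing pool.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("0.0.0.0", 8000))
listener.listen(5)

sockets = [listener]

while True:
    # Block until at least one socket in the pool is readable.
    readable, _, _ = select.select(sockets, [], [])
    for sock in readable:
        if sock is listener:
            # A readable listener means a pending connection: accept it.
            client, addr = sock.accept()
            sockets.append(client)
        else:
            data = sock.recv(4096)
            if data:
                sock.sendall(data)  # stand-in for real request handling
            else:
                # An empty read means the peer closed the connection.
                sockets.remove(sock)
                sock.close()

Everything above runs in a single thread; the only blocking point is the select call.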

You also get a socket for each connection that you've accepted on the listening (server) socket.
Selecting among these (client) sockets and the server socket (where readable means a new connection is pending) allows you to write apps such as chat servers efficiently.

Ummm... remember the difference between ports and sockets.
A "port" is like a telephone-number. But a single phone-number could be handling any number of "calls!"
A "socket," then, represents a single telephone-call: a currently active connection between this server and a particular client. Each connection, by definition, "takes place over a particular port," but any number of connections might exist at the same time.
(The "accept" operation corresponds to: picking up the phone.)
So, then, what select() buys you is the ability to monitor any number of sockets at one time. It examines all the sockets, waits (if necessary) for something to happen on any one of them, and returns telling you which ones are ready. Now, the design of your server becomes "a simple loop." No matter how many sockets you're listening to, and no matter how many of them have activity waiting, select() lets you deal with them one at a time.
It's basically the case that "every server out there will use a select() loop at its heart, unless there's an exceptionally wonderful reason not to."

Take a look here:
One traditional way to write network servers is to have the main server block on accept(), waiting for a connection. Once a connection comes in, the server fork()s, the child process handles the connection and the main server is able to service new incoming requests.
With select(), instead of having a process for each request, there is usually only one process that "multiplexes" all requests, servicing each request as much as it can.
So one main advantage of using select() is that your server will only require a single process to handle all requests. Thus, your server will not need shared memory or synchronization primitives for different 'tasks' to communicate.
One major disadvantage of using select() is that your server cannot act like there's only one client, like with a fork()ing solution. For example, with a fork()ing solution, after the server fork()s, the child process works with the client as if there was only one client in the universe -- the child does not have to worry about new incoming connections or the existence of other sockets. With select(), the programming isn't as transparent.
http://www.lowtek.com/sockets/select.html
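For contrast, here is a minimal sketch of the traditional fork()ing design the article describes, in Python (port number made up, echo handling stands in for real work; Unix only, since it relies on os.fork):

import os
import signal
import socket

# Reap exited children automatically so they don't become zombies.
signal.signal(signal.SIGCHLD, signal.SIG_IGN)

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("0.0.0.0", 8000))
listener.listen(5)

while True:
    conn, addr = listener.accept()  # the main server blocks here
    if os.fork() == 0:
        # Child: handle this connection as if it were the only client.
        listener.close()
        data = conn.recv(4096)
        while data:
            conn.sendall(data)
            data = conn.recv(4096)
        conn.close()
        os._exit(0)
    conn.close()  # parent: drop its copy and go back to accept()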

Related

server.listen(5) vs multithreading in socket programming

I am working on socket programming in Python. I am a bit confused by the concepts of s.listen(5) and multithreading.
As far as I know, s.listen(5) is used so that the server can listen to up to 5 clients.
And multithreading is also used so that the server can connect to many clients.
Please explain in which conditions we use multithreading.
Thanks in advance
You will need to use multithreading to handle multiple clients. When you accept a connection, you receive a new socket instance that represents the connection with that new client. Now let's suppose you are making a chat server and you need to receive data from one client and send it to all connected clients. Without multithreading, you would need to implement a poorly performing single-process loop that walks over the connected clients, reading from each one and only afterwards sending the data to all of them. You would also hit another problem: accepting a connection blocks, waiting until a new client tries to connect, unless you use non-blocking sockets. It's all about architecture, performance and good practices.
For a good read about multithreading, follow this link: https://techdifferences.com/difference-between-multiprocessing-and-multithreading.html
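As a sketch of the thread-per-client chat design just described (port number made up, no error handling; note that the client list is exactly the kind of shared state multithreading forces you to protect with a lock):

import socket
import threading

clients = []
clients_lock = threading.Lock()

def handle_client(conn):
    # Read from one client and broadcast to all the others.
    while True:
        data = conn.recv(4096)
        if not data:
            break
        with clients_lock:
            for other in clients:
                if other is not conn:
                    other.sendall(data)
    with clients_lock:
        clients.remove(conn)
    conn.close()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 8000))
server.listen(5)

while True:
    conn, addr = server.accept()
    with clients_lock:
        clients.append(conn)
    # One thread per client; accept() keeps running in the main thread.
    threading.Thread(target=handle_client, args=(conn,), daemon=True).start()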
As far as I know, s.listen(5) is used so that the server can listen to up to 5 clients.
No. s.listen(5) declares a backlog of size 5. That means that the listening socket will let 5 connection requests sit in a pending state before they are accepted. Each time a connection request is accepted, it is no longer in the pending backlog. So there is no limit (other than the server's resources) to the number of accepted connections.
A common use of multithreading is to start a new thread after a connection has been accepted, to process that connection. An alternative is to use select on a single thread to process all the connections in the same thread. That used to be the rule before multithreading became common, but it can lead to more complex programs.
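A short sketch of what the backlog argument actually affects (port number made up):

import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("0.0.0.0", 8000))

# At most 5 connection requests may sit in the pending queue at once.
# This does NOT cap the number of clients: every accept() frees a slot,
# so a long-running server can accept far more than 5 connections.
server.listen(5)

while True:
    conn, addr = server.accept()  # moves one connection out of the backlog
    # ... handle conn (possibly in a new thread, as described above) ...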

Node clustering with websockets

I have a node cluster where the master responds to http requests.
The server also listens for websocket connections (via socket.io). A client connects to the server via the said websocket. Now the client chooses between various games (with each node process handling a game).
The questions I have are the following:
Should I open a new connection for each node process? How do I tell the client that he should connect to the exact node process X? (Because the server might handle incoming connection requests on its own.)
Is it possible to pass a socket to a node process, so that there is no need for opening a new connection?
What are the drawbacks if I just use one connection (in the master process) and pass the user messages to the respective node processes and the process messages back to the user? (I feel that it costs a lot of CPU to copy rather big objects when sending messages between the processes)
Is it possible to pass a socket to a node process, so that there is no need for opening a new connection?
You can send a plain TCP socket to another node process as described in the node.js doc here. The basic idea is this:
const child = require('child_process').fork('child.js');
child.send('socket', socket);
Then, in child.js, you would have this:
process.on('message', (m, socket) => {
  if (m === 'socket') {
    // you have a socket here
  }
});
The 'socket' message identifier can be any message name you choose - it is not special. node.js is written so that when you use child.send() and the data you are sending is recognized as a socket, it uses platform-specific interprocess communication to share that socket with the other process.
But, I believe this only works for plain sockets that do not yet have any local state established other than the TCP state. I have not tried it with an established webSocket connection myself, but I assume it does not work for that, because once a webSocket has higher-level state associated with it beyond just the TCP socket (such as encryption keys), there's a problem: the OS will not automatically transfer that state to the new process.
Should I open a new connection for each node process? How do I tell the client that he should connect to the exact node process X? (Because the server might handle incoming connection requests on its own.)
This is probably the simplest means of getting a socket.io connection to the new process. If you make sure that your new process is listening on a unique port number and that it supports CORS, then you can just take the socket.io connection you already have between the master process and the client and send a message to the client on it that tells the client where to reconnect to (what port number). The client can then contain code to listen for that message and make a connection to that new destination.
What are the drawbacks if I just use one connection (in the master process) and pass the user messages to the respective node processes and the process messages back to the user? (I feel that it costs a lot of CPU to copy rather big objects when sending messages between the processes)
The drawbacks are as you surmise. Your master process just has to spend CPU energy being the middle man forwarding packets both ways. Whether this extra work is significant to you depends entirely upon the context and has to be determined by measurement.
Here's some more info I discovered. It appears that if an incoming socket.io connection that arrives on the master is immediately shipped off to a cluster child before the connection establishes its initial socket.io state, then this concept could work for socket.io connections too.
Here's an article on sending a connection to another server with implementation code. This appears to be done immediately at connection time so it should work for an incoming socket.io connection that is destined for a specific cluster. The idea here is that there's sticky assignment to a specific cluster process and all incoming connections of any kind that reach the master are immediately transferred over to the cluster child before they establish any state.

poll system call in linux drivers

I am learning Linux internals, and I came across the poll system call. As far as I understand, it is used by drivers to provide notification when some data is ready to be read from a device, and when the device is ready to accept data for writing.
If the device does not have any data to read, the process will sleep and wake up when data becomes available, and vice versa for the write case.
Can someone provide me concrete understanding of poll system call with some real example?
The poll and select system calls (the latter is very similar to poll, with some differences) are used in the so-called asynchronous event-driven approach to handling clients' requests.
Basically, in network programming there are two major strategies for a server to handle many connections from network clients:
1) The more traditional threaded or process-oriented approach. In this situation the network server has a main process which listens on one specific network port (port 80 in the case of web servers) for incoming connections, and when a connection arrives, it spawns a new thread/process to handle this new connection. The Apache HTTP server took this approach.
2) The aforementioned asynchronous event-driven approach, where (in the simplest case) the network server (for example a web server) is an application with only one process; it accepts connections (creating a socket for each new client) and then monitors those sockets with poll/select for incoming data. The Nginx web server took this approach.
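A minimal sketch of strategy 2 using the poll interface, via Python's select.poll wrapper (port number made up; the echo reply stands in for real request processing):

import select
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("0.0.0.0", 8000))
listener.listen(5)

poller = select.poll()
poller.register(listener, select.POLLIN)
fd_to_socket = {listener.fileno(): listener}

while True:
    # poll() puts the process to sleep until a registered fd is ready.
    for fd, event in poller.poll():
        sock = fd_to_socket[fd]
        if sock is listener:
            client, addr = listener.accept()
            fd_to_socket[client.fileno()] = client
            poller.register(client, select.POLLIN)
        elif event & select.POLLIN:
            data = sock.recv(4096)
            if data:
                sock.sendall(data)       # stand-in for real processing
            else:
                poller.unregister(fd)    # peer closed the connection
                del fd_to_socket[fd]
                sock.close()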

How does an asynchronous socket server work?

I should state that I'm not asking about specific implementation details (yet), but just a general overview of what's going on. I understand the basic concept behind a socket, and need clarification on the process as a whole. My (probably very wrong) understanding is currently this:
A socket is constantly listening for clients that want to connect (in its own thread). When a connection occurs, an event is raised that spawns another thread to perform the connection process. During the connection process the client is assigned its own socket in which to communicate with the server. The server then waits for data from the client, and when data arrives an event is raised which spawns a thread to read the data from a stream into a buffer.
My questions are:
How off is my understanding?
Does each client socket require its own thread to listen for data on?
How is data routed to the correct client socket? Is this something taken care of by the guts of TCP/UDP/kernel?
In this threaded environment, what kind of data is typically being shared, and what are the points of contention?
Any clarifications and additional explanation would be greatly appreciated.
EDIT:
Regarding the question about what data is typically shared and points of contention, I realize this is more of an implementation detail than it is a question regarding general process of accepting connections and sending/receiving data. I had looked at a couple implementations (SuperSocket and Kayak) and noticed some synchronization for things like session cache and reusable buffer pools. Feel free to ignore this question. I've appreciated all your feedback.
One thread per connection is bad design (not scalable, overly complex) but unfortunately way too common.
A socket server works more or less like this:
A listening socket is set up to accept connections, and added to a socket set
The socket set is checked for events
If the listening socket has pending connections, new sockets are created by accepting the connections, and then added to the socket set
If a connected socket has events, the relevant IO functions are called
The socket set is checked for events again
This happens in one thread; you can easily handle thousands of connected sockets in a single thread, and there are few valid reasons for making this more complex by introducing threads.
while running
    select on socketset
    for each socket with events
        if socket is listener
            accept new connected socket
            add new socket to socketset
        else if socket is connection
            if event is readable
                read data
                process data
            else if event is writable
                write queued data
            else if event is closed connection
                remove socket from socketset
            end
        end
    done
done
The IP stack takes care of all the details of which packets go to which "socket" in which order. Seen from the application's point of view, a socket represents a reliable ordered byte stream (TCP) or an unreliable unordered sequence of packets (UDP).
EDIT: In response to updated question.
I don't know either of the libraries you mention, but on the concepts you mention:
A session cache typically keeps data associated with a client, and can reuse this data for multiple connections. This makes sense when your application logic requires state information, but it's a layer higher than the actual networking end. In the above sample, the session cache would be used by the "process data" part.
Buffer pools are also an easy and often effective optimization for a high-traffic server. The concept is very easy to implement: instead of allocating/deallocating space for storing the data you read/write, you fetch a preallocated buffer from a pool, use it, then return it to the pool. This avoids the (sometimes relatively expensive) underlying allocation/deallocation mechanisms. This is not directly related to networking; you can just as well use buffer pools for, e.g., something that reads chunks of files and processes them.
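A minimal buffer-pool sketch (the class and method names are illustrative, not from any particular library):

import collections

class BufferPool:
    # Reuse fixed-size bytearrays instead of allocating one per read/write.
    def __init__(self, buffer_size=4096, initial_count=16):
        self._size = buffer_size
        self._free = collections.deque(
            bytearray(buffer_size) for _ in range(initial_count))

    def acquire(self):
        # Hand out a pooled buffer, growing the pool only when it runs dry.
        return self._free.popleft() if self._free else bytearray(self._size)

    def release(self, buf):
        # Return the buffer so a later acquire() can reuse it.
        self._free.append(buf)

pool = BufferPool()
buf = pool.acquire()
# ... sock.recv_into(buf) or f.readinto(buf), then process the data ...
pool.release(buf)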
How off is my understanding?
Pretty far.
Does each client socket require its own thread to listen for data on?
No.
How is data routed to the correct client socket? Is this something taken care of by the guts of TCP/UDP/kernel?
TCP/IP is a number of layers of protocol, and the kernel's network stack implements them as separate pieces, each with an API to the others.
The IP address is handled in one place.
The port # is handled in another place.
The IP address identifies a particular host (on the local network it is matched up with a MAC address for delivery). The port # is what ties a TCP (or UDP) socket to a particular piece of application software; the stack uses the combination of addresses and ports to route incoming data to the right socket.
In this threaded environment, what kind of data is typically being shared, and what are the points of contention?
What threaded environment?
Data sharing? What?
Contention? The physical channel is the number one point of contention. (Classic shared Ethernet, for example, depends on collision detection.) After that, well, every part of the computer system is a scarce resource shared by multiple applications and is a point of contention.

Reusing a port number in UDP

In ASIO, is it possible to create another socket that has the same source port as an existing socket?
My UDP server application is calling receive_from using port 3000. It passes the packet off to a worker thread which will send the response (currently using a dynamic source port).
The socket in the other thread is created like this:
udp::socket sock2(io_service, udp::endpoint(udp::v4(), 0));
And responds to the original request using the sender_endpoint saved with the original packet.
What I'd like to be able to do is respond to the client using the same source port that the server is listening on, but I can't see how that can be done; if I try, I get an exception saying the address is in use. Is it possible to do what I'm asking? The reason I want this is that if I use dynamic ports, the clients need to add special firewall rules in Windows to allow the reply packets to be read. I've found that if the source port is the same in the reply, Windows Firewall will allow it to pass back in.
The exception tells it like it is: you can't create two live sockets with the same source port. I don't know ASIO, but you should be able to create the socket before spinning off the thread, keeping a reference to the socket and the thread for later use, and once the data-sending thread is idle, joining back to it and sending any other stuff.
EDIT: with a little bit of effort, you can also arrange things so that you don't have to wait until all the data from one thread has been sent: have a worker thread that owns the socket listen on a queue for chunks of data (ideally exactly the size of the payload you intend to send), and send arbitrary chunks of payload to this queue from multiple threads.
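In plain-socket terms, that queue-fed sender thread could look like this Python sketch (ASIO would differ in detail; the port and payloads are made up):

import queue
import socket
import threading

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 3000))  # the server's well-known port

send_queue = queue.Queue()

def sender():
    # The only thread that writes to the socket: drain the queue and send.
    while True:
        payload, dest = send_queue.get()
        sock.sendto(payload, dest)  # replies leave from source port 3000

threading.Thread(target=sender, daemon=True).start()

while True:
    data, client = sock.recvfrom(4096)
    # Workers enqueue replies instead of touching the socket themselves.
    send_queue.put((b"reply: " + data, client))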
You should be able to use the SO_REUSEADDR socket option to bind multiple sockets to the same address. But having said that, you don't want to do this, because it's not specified which socket will receive incoming data on that port (you would have to check all sockets for incoming data).
The better option is just to use the same socket to send replies - this can safely be done from multiple threads without any additional synchronisation (as you are using UDP).
Send the reply to the same socket (the one you received the client's request on) instead of creating a new one,
but make sure you don't send to the same socket from both threads simultaneously.
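In plain BSD-socket terms (Python sketch, port made up), the same-socket reply looks like this; the point is that a reply sent through the original socket automatically carries the port the socket is bound to as its source port:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 3000))  # the port clients already send to

while True:
    data, client = sock.recvfrom(4096)
    # Replying through the SAME socket makes the source port 3000,
    # so the client-side firewall sees the reply on the expected port.
    sock.sendto(b"reply: " + data, client)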
