Design of backend TCP/IP communication with Python


I have a design problem that I can't easily find a solution to, and I would love some feedback on my thought process.
I took over this backend project from a colleague, so I inherited some of his thought processes and designs. I have never designed something like this on my own. Here is the problem:
I am programming a backend in Python that sits between a measuring unit (MU) running LabVIEW and a frontend. The MU provides byte data over a TCP/IP socket; I parse it, cache it, store it in a DB, and so on, and then send it to the frontend. Right now I am the server, listening for connections and data transfers from the MU. But data needs to flow both ways: the MU also needs to listen for my requests for its data. For some reason, it is currently programmed so that a new connection is created and closed on my server socket for every transfer.
The following code is oversimplified, but I hope you get the logic.
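import socket
self.sock = socket.socket(...)  # my listening socket
self.sock.bind((self.host, self.port))  # bind() takes a single (host, port) tuple
self.sock.listen(1)
client_sock, client_address = self.sock.accept()  # wait for the MU to connect
data = client_sock.recv(100)
client_sock.close()  # the connection is dropped after every transfer
client_sock, client_address = self.sock.accept()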
and so it goes on forever. I start a new thread for each accept(), so more than one connection can be made on this socket.
The thing is, as I have found out, there is no good way to initiate a connection from me to the MU when I want to send data, since the connection is already closed, and this is where I need your help. My colleague told me there is a good reason for it to be this way: the connections are atomic, and if something goes wrong, the byte stream won't be corrupted for the next data transfer, since it is a whole new connection. But I find that hard to accept. My instincts say I should rather keep the connection open, since the traffic is very frequent.
IS IT SAFE?
By this logic I see no other option than for the MU to have a listening socket as well. How could I connect to it otherwise? There is no way, right?
But if I could just accept this one connection and use it for two-way traffic, is that OK? Or should both of us be server AND client? The MU should not be doing any backend work; we try to keep it very simple in its purpose.
I would instinctively go with one connection, but the colleague who handed this over to me confused me with the one-connection-per-transfer idea.
If you managed to read this far, thank you. Your thoughts or experiences are welcome!
Thank you.

It's OK to keep a connection for two-way communication.
What exactly could happen to the byte stream to make it "corrupted"? Data is transferred in chunks, and if one chunk becomes invalid, you will lose it anyway.
The problem might be that the following chunks can no longer be told apart, but it's easy to defend against that. For example, you can use frame/message headers. After all, you can always kill the connection on the server side if something goes wrong (forcing the other side to establish a new one).
It's also worth mentioning that if your MU were a real sensor placed in the field, connecting to it would become complicated, since its IP address may change dynamically.
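As a rough illustration of the frame/message header idea, here is a minimal sketch that prefixes every message with its length so both sides can share one long-lived connection. The 4-byte big-endian header, the MAX_FRAME limit and the function names are assumptions made for this example; they are not part of the original MU protocol.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # assumed frame format: 4-byte big-endian payload length
MAX_FRAME = 1 << 20           # hypothetical sanity limit; anything larger is rejected


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one message prefixed with its length."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_frame(sock: socket.socket) -> bytes:
    """Receive one complete message; a bad header means the stream is unusable."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    if length > MAX_FRAME:
        raise ValueError("implausible frame length, drop the connection")
    return recv_exact(sock, length)
```

If recv_frame raises, the simplest recovery is the one described above: close the connection and let the other side reconnect.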

Related

server.listen(5) vs multithreading in socket programming

I am working on socket programming in Python and I am a bit confused by the concepts of s.listen(5) and multithreading.
As far as I know, s.listen(5) is used so that the server can listen to up to 5 clients.
And multithreading is also used so that the server can be connected to many clients.
Please explain under which conditions we use multithreading.
Thanks in advance.

You will need to use multithreading to handle multiple clients. When you accept a connection, you receive a new socket instance that represents the connection with that client. Now suppose you are writing a chat and need to receive data from one client and send it to all connected clients. Without multithreading, you would need a less efficient single-process loop that walks through your connected clients, reads from each one, and then sends the data to all of them. You would also have another problem: unless you use non-blocking sockets, the accept call blocks until a new client tries to connect. It's all about architecture, performance and good practices.
For good reading on multithreading, see https://techdifferences.com/difference-between-multiprocessing-and-multithreading.html.

As I know, s.listen(5) is used so that the server can listen up to 5 clients.
No. s.listen(5) declares a backlog of size 5. That means the listening socket will hold up to 5 connection requests in a pending state before they are accepted. Once a connection request is accepted, it is no longer in the pending backlog. So there is no limit (other than server resources) on the number of accepted connections.
A common use of multithreading is to start a new thread after a connection has been accepted, to process that connection. An alternative is to use select to process all the connections in a single thread. That used to be the rule before multithreading became common, but it can lead to more complex programs.
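To make the difference concrete, here is a small sketch of the thread-per-connection pattern with a backlog of 5. The echo behaviour and the names handle, serve and start_server are made up for illustration. Connecting more than five clients to it works fine, because the backlog only covers connections that have not been accepted yet.

```python
import socket
import threading


def handle(conn: socket.socket) -> None:
    """Echo data back until the client closes its side."""
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(data)


def serve(listener: socket.socket) -> None:
    """Accept connections in a loop, one worker thread per accepted connection."""
    while True:
        try:
            conn, _addr = listener.accept()
        except OSError:
            break  # the listening socket was closed
        threading.Thread(target=handle, args=(conn,), daemon=True).start()


def start_server(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Start the accept loop in the background and return the listening socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))
    listener.listen(5)  # backlog of pending connections, not a limit on clients
    threading.Thread(target=serve, args=(listener,), daemon=True).start()
    return listener
```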

Do I need to concern security about my game server? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I have a game server that clients connect to and communicate with via TCP. Any device can connect to the server if it knows the IP and port.
I am wondering if I need to add some security to the server. For example:
(1) Add encryption for the messages sent/received. (To prevent the protocol content from being revealed)
(2) Add a key to each message, so that if the server cannot recognize the key after decryption, the message is dropped. (To prevent unknown connections/messages from flooding in)
Do you think these things are necessary, and is there anything else I should add for such a game server?

I would have rather posted this to the gamedev question but the mods there are apparently faster than here. Before you quote me, I'd like to point out that the following isn't based on 100% book-knowledge, nor do I have a degree in any of these topics. Please improve this answer if you know better, rather than comment and/or compete.
This is a pretty comprehensive list of client/server/security issues that I've gathered from research and/or experience:
Data
The "back-end" server contains everyone's username, password, credit card details, etc., and should be a fortress. This server is for authentication only and should be on a private subnet; it will communicate only with the login server, only when a well-formed login request is received, and will only reply with "allow" or "deny". If you take people's personal information, you are obligated to protect it, and it would be wise to off-load the liability of everything security-related to a professional or hosting company. There is no non-critical attack on this server; if it is breached, you are finished. Many/most/all companies now draw their pretty login screen on top of another company's back-end credit card/billing system.
Login
Connections to the login server should be secure. The login server is just a message pump between the public login mechanism, the private data store, and the client/server connection state. For security purposes, any HTTP access to the login system should be hosted on a separate HTTP server; the WWW server crashing should not shut down your online game (my opinion).
World/UDP
Upon successful login and authentication, the server informs the client to begin listening for "bulk data" or to initiate an in-bound connection on a specific UDP port (could be random and per-connection-attempt). Either way, the server should remain silent and wait for the client to IDENT with some type of handshake to verify that the "alleged client" is actually your code. It is easier to guess when the server asks for input sequentially; instead rely on the client knowing the proper handshake when connecting to the world and drop those that don't. The correct handshake to use can be a function of the CPU clock-ticks or whatever. The TCP will be minimally used and/or disconnected from that point on. The initial bulk data is a good place to advertise the current server-side software revision so clients that are out-of-date can update. A common pool of UDP ports can be handed out among multiple servers and the clients can be load-balanced into the correct port/server. Within the game, "zone transfers" can mean a literal disconnect from one server/port and reconnection to a different server/port. In MMO's, this usually appears as a <2 second loading screen; enough time to disconnect, reconnect, start getting data, and synchronize to the new server clock, not to mention the actual content loading.
"World server" describes a single, multiple-client, state-pumping thread running on a single core of a single processor of a single blade. One, physical, server-of-worlds can have many worlds running on it at once. Worlds can be dynamically split/merged (in a quad-tree fashion), dividing the clients between them, again, for load-balancing; synchronization between the servers occurs at LAN speeds or better. The world server will probably only serve UDP connections and should have nothing to do except process state-changes to/from the UDP connections. UDP is "blind, deaf, and dumb", so-to-speak. Messages are sent with no flow control, no error checking, etc; they are basically assumed to be received as soon as they are sent and may actually arrive late, in the wrong order, or just never arrive. Using UDP, neither the server nor the client are ever stalled, hand-shaking, error-correcting, or waiting for data. Messages need time-stamps because they may arrive late and/or out-of-order. If a UDP channel gets clogged, switch valid clients dynamically to another (potentially random) port. The world server only initiates UDP connections with successfully authenticated clients and ignores all other traffic (world servers hosted separately from HTTP and everything else).
Overly simplified and, using only the position data as an example, each client tells the server "Time:Client###:(X, Y)" over and over. If the server doesn't hear, oh well. The server says "Time:listOfClients(X, Y)" over and over, to everyone at once. If one or more of the clients doesn't hear, oh well.
This implies using prediction/extrapolation on the client; the clients will need to "guess" what should be happening and then correct themselves to agree with the server when they start getting data again. Any time you get a packet with a "future" time, even if the packet doesn't make sense or isn't useful, you can at least advance the client clock to that point and discard any now-late packets, helping a lagging client to catch up.
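A minimal sketch of the "advance the clock and discard late packets" rule might look like this. ClientClock and its fields are hypothetical names, and a real client would also interpolate or extrapolate between updates.

```python
from dataclasses import dataclass, field


@dataclass
class ClientClock:
    """Tracks the newest server time seen and drops stale updates."""
    latest: float = 0.0
    positions: dict = field(default_factory=dict)

    def on_packet(self, timestamp: float, positions: dict) -> bool:
        """Apply a state update; return False if it arrived late."""
        if timestamp <= self.latest:
            return False          # late or duplicate packet: ignore it
        self.latest = timestamp   # advance the clock to the newest time seen
        self.positions.update(positions)
        return True
```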
Un-verified supposition:
Besides the existing security concerns, I don't see a reason why two or more clients could not maintain independent, but server-managed, UDP channels between each other. By notifying other clients within close game-proximity in addition to the server, the clients themselves can help to load-balance. The server should always verify that what the clients say happened could/should/would happen, and should have the ability to undo all of it and reset both clients to its own known-good state. The information that the clients are able to share, internally, should be extremely restricted; basically just the most time-critical positional and/or state data. Clients should probably not be allowed to request specific information and, again, should rely only on "dumb" broadcasts. This begins to approach distributed/cloud computing, where the clients are actually doing a lot of the server work, while the server just watches and "referees," calling foul when appropriate.
Client1 - "I fought Client2 and won"
Client2 - "I fought Client1 and won"
Server - "I watched and Client2 cheated. Client1 wins. (Client2 is forced to agree)"
The server doesn't necessarily even need to watch; if Client2 damages Client1 in an unusual/impossible way, Client1 can request arbitration from the server.
Side-effects
If the player moves around, but the data isn't getting to the server, the player experiences "rubber-banding", where the player appears to be moving on the client but, server-side, they are not. When the client gets the next server state, the client snaps the player back to where they were when the server stopped getting updates, creating the rubber-band effect.
This often manifests another way, too. If the server sees a player moving, then fails to receive the "stopped moving" message, the server will predict their continued path for all of the other clients. In MMO-RPG's, for example, you can see "lagging" players running directly into/at walls.
Holes
The last thing I can think of is just basic code security. This is especially important if your game is moddable. Mods are, by definition, a way for users to insert their own code into yours. If you are careless about the amount of "API" access you give away, inevitably, someone WILL feel the need to be malicious. Pay particular attention to string termination/handling if the language you are using requires it. Do not build your game from plain-text ASCII content files. If your game has even one "text box," someone WILL be trying to feed HTML/LUA/etc. code into it.
Lastly, paths should use appropriate system variables whenever possible to avoid platform shenanigans and/or access violations (x86/x64, no savegames in ProgramFiles, etc.)

choose between tcp "long" connection and "short" connection for internal service

I have an app where a web server redirects some requests to backend servers; the backend servers (Linux) do complicated computations and respond to the web server.
For TCP socket connection management between the web server and the backend servers, I think there are two basic strategies:
"short" connection: one connection per request. This seems very easy for socket management and simplifies the whole program structure. After accept, we just hand the request to a thread for processing and finally close the socket.
"long" connection: one TCP connection carries multiple requests, one after another. It seems this strategy could make better use of socket resources and bring some performance improvement (I am not quite sure). BUT it seems to bring a lot more complexity than the "short" connection. For example, since the socket fd may now be used by multiple threads, synchronization must be involved. And there is more: socket failure handling, message sequencing...
Are there any suggestions regarding these two strategies?
UPDATE: @SargeATM's answer reminded me that I should say more about the backend service.
Each request is context-free. The backend service can do its calculation based on one single request message. It seems to be stateless.

Without getting into the architecture of the backend which I think heavily influences this decision, I prefer short connections for stateless "quick" request/response type traffic and long connections for stateful protocols like a synchronization or file transfer.
I know there is some TCP overhead for establishing a new connection (if it isn't localhost), but that has never been anything I have had to optimize in my applications.
OK, I will get a little into architecture since this is important. I always use threads not per request but per function. So I would have one thread that listens on the socket, another thread that reads packets off all the active connections, another thread doing the backend calculations, and a last thread saving to a database if needed. This keeps things clean and simple. It is easy to measure slow spots, to maintain, and to optimize later if needed.
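One way to picture "threads per function" is a chain of stages connected by queues. The sketch below is only an assumption about how such a layout could look: stage, run_pipeline and the toy parse/square work are invented for the example, and the result list stands in for the database writer.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(work, inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Run `work` on every item from inbox and pass the result to outbox."""
    def loop():
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)  # propagate shutdown to the next stage
                break
            outbox.put(work(item))
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t


def run_pipeline(packets):
    """Hypothetical pipeline: parse -> compute -> collect (stand-in for the DB)."""
    raw, parsed, results = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        stage(lambda b: int(b.decode()), raw, parsed),  # "reader" stage
        stage(lambda n: n * n, parsed, results),        # "calculation" stage
    ]
    for p in packets:
        raw.put(p)
    raw.put(STOP)
    out = []
    while True:
        item = results.get()
        if item is STOP:
            break
        out.append(item)
    for t in threads:
        t.join()
    return out
```

Because each stage is a single thread reading from a FIFO queue, results come out in the order the packets went in, and a slow stage shows up as a growing queue in front of it.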

What about a third option... no connection!
If your job description and job results are both small, UDP sockets may be a good idea. You have even fewer resources to manage, as there's no need to bind the request/response to a file descriptor, which gives you some flexibility for the future. Imagine you have more backend services and would like to do some load balancing: a busy service can forward the job to another one along with the UDP address of the job submitter. The submitter just waits for the result and doesn't care where the task was performed.
Obviously you'd have to deal with lost, duplicated and out of order packets, but as a reward you don't have to deal with broken connections. Out of order packets are probably not a big deal if you can fit the request and response in one UDP message, duplication can be taken care of by some job ids, and lost packet... well, they can be simply resent ;-)
Consider this!
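A rough sketch of the job-id and resend idea, assuming each request and response fits in one datagram. The JSON message layout, the sum "computation" and the names handle_datagram and submit are all invented for illustration.

```python
import json
import socket


def handle_datagram(data: bytes, done: dict) -> bytes:
    """Process one request; a repeated job id returns the cached result."""
    job = json.loads(data)
    if job["id"] not in done:                  # first time this job is seen
        done[job["id"]] = sum(job["numbers"])  # stand-in for the real computation
    return json.dumps({"id": job["id"], "result": done[job["id"]]}).encode()


def submit(sock: socket.socket, server, job_id: int, numbers, retries: int = 3):
    """Send a job and resend it if no matching reply arrives in time.

    The socket is expected to have a timeout set by the caller.
    """
    request = json.dumps({"id": job_id, "numbers": numbers}).encode()
    for _ in range(retries):
        sock.sendto(request, server)
        try:
            reply = json.loads(sock.recv(65535))
        except socket.timeout:
            continue                           # lost request or reply: resend
        if reply["id"] == job_id:              # ignore stale replies for other jobs
            return reply["result"]
    raise TimeoutError("no reply after retries")
```

On the service side, the loop is just recvfrom, handle_datagram, sendto; the cache keyed by job id is what makes a resent request harmless.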

Well, you are right.
The biggest problem with persistent connections is making sure that the app gets a "clean" connection from the pool, without any leftover data from another request.
There are a lot of ways to deal with that problem, but in the end it is better to close() a tainted connection and open a new one than to try to clean it...

Node.js game logics

I'm in the process of making a realtime multiplayer racing game. Now I need help writing the game logic in a Node.js TCP (net) server. I don't know if it's possible, and I don't know if I'm doing it right, but I'm trying my best. I know it's hard to understand my broken English, so I made this "painting" :)
Thank you for your time

To elaborate on driushkin's answer, you should use remote procedure calls (RPC) and an event queue. This works like in the image you've posted, where each packet represents a 'command' or RPC with some arguments (i.e. movement direction). You'll also need an event queue to make sure RPCs are executed in order and on time. This will require a timestamp or framecount for each command to be executed on (at some point in the future, in a simple scheme), and synchronized watches (World War II style).
You might notice one critical weakness in this scheme: RPC messages can be late (arrive after the time they should be applied) due to network latency, malicious users, etc. In a simple scheme, late RPCs are dropped. This is fine since all clients (even the originator!) wait for the server to send an RPC before acting (if the originating client didn't wait for the server message, his game state would be out of sync with the server, and your game would be broken).
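The simple scheme described so far (commands stamped with a frame count, executed in order, late ones dropped) can be sketched in a few lines. This is written in Python for brevity rather than Node.js, and EventQueue, push and advance are hypothetical names; here tick means the last frame that has already been simulated.

```python
import heapq
import itertools


class EventQueue:
    """Orders commands by the frame they should run on and drops late ones."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps arrival order stable
        self.tick = 0                  # last frame that has already been simulated

    def push(self, run_at: int, command: str) -> bool:
        """Queue a command; return False if its frame has already passed."""
        if run_at <= self.tick:
            return False               # arrived after its frame: drop it
        heapq.heappush(self._heap, (run_at, next(self._seq), command))
        return True

    def advance(self, to_tick: int) -> list:
        """Move the clock forward and return the commands now due, in order."""
        self.tick = to_tick
        due = []
        while self._heap and self._heap[0][0] <= to_tick:
            due.append(heapq.heappop(self._heap)[2])
        return due
```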
Consider the impact of lag on such a scheme. Let's say the lag for Client A to the server was 100ms, and the return trip was also 100ms. This means that client input goes like:
Client A presses key, and sends RPC to server, but doesn't add it locally (0ms)
Server receives and rebroadcasts RPC (100ms)
Client A receives his own event, and now finally adds it to his event queue for processing (200ms)
As you can see, the client reacts to his own event 1/5 of a second after he presses the key. This is with fairly nice 100ms lag. Transoceanic lag can easily be over 200ms each way, and dialup connections (rare, but still existent today) can have lag spikes > 500ms. None of this matters if you're playing on a LAN or something similar, but on the internet this unresponsiveness could be unbearable.
This is where the notion of client side prediction (CSP) comes in. CSP is made out to be big and scary, but implemented correctly and thoughtfully it's actually very simple. The interesting feature of CSP is that clients can process their input immediately (the client predicts what will happen). Of course, the client can (and often will) be wrong. This means that the client will need a way of applying corrections from the Server. Which means you'll need a way for the server to validate, reject, or amend RPC requests from clients, as well as a way to serialize the gamestate (so it can be restored as a base point to resimulate from).
There are lots of good resources about doing this. I like http://www.gabrielgambetta.com/?p=22 in particular, but you should really look for a good multiplayer game programming book.
I also have to suggest socket.io, even after reading your comments regarding Flex and AS3. The ease of use (and simple integration with node) make it one of the best (the best?) option(s) for network gaming over HTTP that I've ever used. I'd make whatever adjustments necessary to be able to use it. I believe that AIR/AS3 has at least one WebSockets library, even if socket.io itself isn't available.

This sounds like something socket.io would be great for. It's a library that gives you real time possibilities on the browser and on your server.

You can model this as commands and events: the client sends a move command to the server, the server validates the command and, if everything is OK, publishes an "is moving" event.
In your case, there is probably no need for different responses to P1 (OK, you can move) and to the rest (P1 is moving); the latter suffices in both cases. The "is moving" event should contain all necessary info (current position, velocity, etc.).
In this simplest form, the client issuing the command would experience some lag until the event from the server arrives. To avoid that, you could start moving immediately and then apply compensating actions if necessary when the event arrives. But this can get complicated.

How does an asynchronous socket server work?

I should state that I'm not asking about specific implementation details (yet), but just a general overview of what's going on. I understand the basic concept behind a socket, and need clarification on the process as a whole. My (probably very wrong) understanding is currently this:
A socket is constantly listening for clients that want to connect (in its own thread). When a connection occurs, an event is raised that spawns another thread to perform the connection process. During the connection process the client is assigned its own socket with which to communicate with the server. The server then waits for data from the client, and when data arrives an event is raised which spawns a thread to read the data from a stream into a buffer.
My questions are:
How off is my understanding?
Does each client socket require its own thread to listen for data on?
How is data routed to the correct client socket? Is this something taken care of by the guts of TCP/UDP/kernel?
In this threaded environment, what kind of data is typically being shared, and what are the points of contention?
Any clarifications and additional explanation would be greatly appreciated.
EDIT:
Regarding the question about what data is typically shared and points of contention, I realize this is more of an implementation detail than it is a question regarding general process of accepting connections and sending/receiving data. I had looked at a couple implementations (SuperSocket and Kayak) and noticed some synchronization for things like session cache and reusable buffer pools. Feel free to ignore this question. I've appreciated all your feedback.

One thread per connection is bad design (not scalable, overly complex) but unfortunately way too common.
A socket server works more or less like this:
A listening socket is set up to accept connections, and added to a socket set
The socket set is checked for events
If the listening socket has pending connections, new sockets are created by accepting the connections, and then added to the socket set
If a connected socket has events, the relevant IO functions are called
The socket set is checked for events again
This happens in one thread. You can easily handle thousands of connected sockets in a single thread, and there are few valid reasons for making this more complex by introducing threads.
while running
    select on socketset
    for each socket with events
        if socket is listener
            accept new connected socket
            add new socket to socketset
        else if socket is connection
            if event is readable
                read data
                process data
            else if event is writable
                write queued data
            else if event is closed connection
                remove socket from socketset
            end
        end
    done
done
The IP stack takes care of all the details of which packets go to what "socket" in which order. Seen from the application's point of view, a socket represents a reliable ordered byte stream (TCP) or an unreliable unordered sequence of packets (UDP).
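The pseudocode loop maps fairly directly onto Python's selectors module. The sketch below is a simplified version: it only registers for read events and writes replies immediately with sendall instead of keeping a queue of pending writes, and the upper-casing "processing" plus the names make_listener and poll_once are made up for the example.

```python
import selectors
import socket


def make_listener(sel: selectors.BaseSelector, host="127.0.0.1", port=0) -> socket.socket:
    """Create a non-blocking listening socket and add it to the socket set."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))
    listener.listen()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data="listener")
    return listener


def poll_once(sel: selectors.BaseSelector, timeout: float = 1.0) -> None:
    """One pass of the loop: accept new clients, answer readable ones, drop closed ones."""
    for key, _events in sel.select(timeout):
        sock = key.fileobj
        if key.data == "listener":
            conn, _addr = sock.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ, data="connection")
        else:
            data = sock.recv(4096)
            if data:
                sock.sendall(data.upper())  # "process data": a stand-in
            else:                           # an empty read means the peer closed
                sel.unregister(sock)
                sock.close()
```

A server would call poll_once in a `while running` loop. For large replies, a real implementation would queue outgoing data and register for EVENT_WRITE, as the pseudocode's "write queued data" branch suggests.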
EDIT: In response to updated question.
I don't know either of the libraries you mention, but on the concepts you mention:
A session cache typically keeps data associated with a client, and can reuse this data for multiple connections. This makes sense when your application logic requires state information, but it's a layer higher than the actual networking end. In the above sample, the session cache would be used by the "process data" part.
Buffer pools are also an easy and often effective optimization for a high-traffic server. The concept is very easy to implement: instead of allocating/deallocating space for storing the data you read/write, you fetch a preallocated buffer from a pool, use it, then return it to the pool. This avoids the (sometimes relatively expensive) backend allocation/deallocation mechanisms. This is not directly related to networking; you can just as well use buffer pools for, e.g., something that reads chunks of files and processes them.
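As a small illustration of the buffer pool concept (not taken from either of the libraries mentioned), a pool of preallocated bytearrays could look like this. BufferPool and its method names are hypothetical.

```python
import queue


class BufferPool:
    """Hands out preallocated bytearrays and takes them back for reuse."""

    def __init__(self, count: int, size: int):
        self._size = size
        self._free = queue.SimpleQueue()
        for _ in range(count):
            self._free.put(bytearray(size))

    def acquire(self) -> bytearray:
        """Return a free buffer, or allocate a new one if the pool is empty."""
        try:
            return self._free.get_nowait()
        except queue.Empty:
            return bytearray(self._size)  # pool exhausted: fall back to allocating

    def release(self, buf: bytearray) -> None:
        """Give a buffer back so a later acquire() can reuse it."""
        self._free.put(buf)
```

With sockets, an acquired buffer can be filled in place using sock.recv_into(buf), which returns the number of bytes written, and released once the data has been processed.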

How off is my understanding?
Pretty far.
Does each client socket require its own thread to listen for data on?
No.
How is data routed to the correct client socket? Is this something taken care of by the guts of TCP/UDP/kernel?
TCP/IP is a number of layers of protocol. There's no "kernel" to it. It's pieces, each with a separate API to the other pieces.
The IP address is handled in one place.
The port # is handled in another place.
The IP addresses are matched up with MAC addresses to identify a particular host. The port # is what ties a TCP (or UDP) socket to a particular piece of application software.
In this threaded environment, what kind of data is typically being shared, and what are the points of contention?
What threaded environment?
Data sharing? What?
Contention? The physical channel is the number one point of contention. (Ethernet, for example depends on collision-detection.) After that, well, every part of the computer system is a scarce resource shared by multiple applications and is a point of contention.
