Keeping FTP control connection alive - multithreading

A while back I asked a question regarding keeping the control connection on an FTP session alive during a large transfer. Although I thought I had success after implementing a solution from a question I'd already asked, it appears as though the ISP is the problem, i.e. they are causing my control connections to die during large transfers.
Interestingly, the old-school FTP client program "Leap-FTP" gets around this issue by just sending 'NOOP' commands to the server on the control connection during a download. While other popular clients die during transfers (Filezilla, my Python FTP script), LeapFTP runs strong due to this workaround.
I've done some research into threading and Queue, but am having trouble coming up with the code to make this happen.
The solution seems simple enough (in my head, at least): initiate a download, while that download function runs, send a NOOP command every n seconds. Stop sending the NOOP command after the download function completes.
I'm hoping that someone can give me a suggestion as to how this might be done. Will it involve the use of threading or Queue, or is there a simpler solution?
Bottom line is, after a lot of testing, the 'NOOP' command is going to have to be sent during the large downloads (which take place on high-numbered TCP ports).
Thanks!

In order to handle multiple sockets at one time in a single program, you can use the select function instead of threads. This is either simpler or more complicated, depending on your programming experience.
I find threads are usually simple to write, but when something does go wrong, debugging them is a real pain. Code that multiplexes sockets with select is more complex to write but easier to debug.
The basic idea of select is that you set up your sockets and call the select function, which tells you which sockets are ready to read or write. Then you check the time. If it's been X seconds since your last NOOP, send one on the control socket. If the transfer socket is ready to read or write, handle it. If the control socket is ready to read, read it and check for the NOOP response, error messages, the control channel being closed, etc.
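A minimal sketch of that loop in Python, for the download case. The function name `pump_with_noop`, its parameters, and the 30-second default are assumptions made for illustration; `control` and `data` are assumed to be already-connected sockets, and `sink` is any object with a `write()` method.

```python
import select
import time


def pump_with_noop(control, data, sink, noop_interval=30.0):
    """Copy bytes from `data` into `sink` while keeping `control` alive.

    Hypothetical helper, not part of ftplib. Returns the raw replies
    that were read from the control socket during the transfer.
    """
    replies = []
    last_noop = time.monotonic()
    while True:
        # Wake up at least once per interval, even if nothing is readable.
        timeout = max(0.0, noop_interval - (time.monotonic() - last_noop))
        readable, _, _ = select.select([control, data], [], [], timeout)

        # Check the time: send a NOOP if the interval has elapsed.
        if time.monotonic() - last_noop >= noop_interval:
            control.sendall(b"NOOP\r\n")
            last_noop = time.monotonic()

        # Control socket ready: collect the reply, or notice a closed channel.
        if control in readable:
            reply = control.recv(4096)
            if not reply:
                raise ConnectionError("control connection closed")
            replies.append(reply)

        # Transfer socket ready: an empty read means the transfer is done.
        if data in readable:
            chunk = data.recv(65536)
            if not chunk:
                return replies
            sink.write(chunk)
```

The caller still has to parse the collected replies; this sketch only gathers them.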

Since you don't care (much, anyway) about performance in this case, it's probably easiest to use a separate thread that sits in a loop: it sleeps for N seconds, checks whether it has been cancelled, and if not, sends a NOOP and sleeps again.
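One way this might look in Python, as a sketch only. The class name `NoopKeepAlive` and the `send_noop` callable are assumptions for illustration; how the NOOP is actually written to the control connection is left to the caller.

```python
import threading


class NoopKeepAlive:
    """Call `send_noop` every `interval` seconds until the block exits.

    Hypothetical helper: `send_noop` is any zero-argument callable that
    sends a NOOP command on the FTP control connection.
    """

    def __init__(self, send_noop, interval=30.0):
        self._send_noop = send_noop
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait() serves as both the sleep and the cancellation check:
        # it returns True as soon as stop is set, False on timeout.
        while not self._stop.wait(self._interval):
            self._send_noop()

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()
```

The download would then run inside `with NoopKeepAlive(send_noop): ...`. Any replies the server sends to those NOOPs will presumably still be waiting on the control connection once the transfer finishes, so the code that reads the final transfer reply has to expect them.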

If you are running on a Unix, it would be just as efficient to have the control connection program open the sockets for a transfer and then spawn a new process to do the transfer. That would leave the control program ready to wait for completion, send NOOP commands, or even start new transfers if the FTP server can support it.
That is sort of how the original FTP model was supposed to work and the reason it uses a control connection and separate data connections instead of the HTTP model with control and data mixed together.

Related

When to use synchronous - blocking code in Node.js

I was asked in an interview: are there any cases that may force you to use blocking code in a Node.js server?
My answer was: I have never needed that in any project, but I think it may be useful for tasks that need a lot of CPU processing, like image processing or video generation.
So, experts, can you correct that for me? Is there any case where blocking code would be a must?
First off, you have to distinguish between the different types of programs. A server that you expect to be responsive to many different incoming requests has very different needs than a single user program you write to do some file management or fetch some content and insert it in a database.
So, if you're not a multi-user server, you may be able to use synchronous I/O everywhere it's offered (most specifically for file access). For example, I have several scripts that do file management on my hard disk. These scripts don't have any server component and are run automatically in the middle of the night to trim backups, trim log files, etc... These scripts are perfectly OK to use synchronous I/O for pretty much anything.
If, on the other hand, you are a multi-user server and you need to be responsive to incoming requests that can arrive at any time, then the only two times you can/should use blocking I/O or blocking crypto are at startup time or in some sort of shut-down scenario. For all other code in service of incoming requests, you have to use non-blocking, asynchronous I/O to avoid locking up your server during a request and making it non-responsive to new incoming requests.
If you have time consuming, CPU-intensive operations such as image processing or video generation, then you will want to offload that processing to another thread or process so that your main server thread is not blocked doing that processing. A typical way of handling that would be to create a worker pool of N processes/threads that can be sent jobs to crunch on. Then, you keep your most CPU-intensive work out of the main nodejs thread, allowing it to stay responsive to incoming requests.
Synchronous (blocking) I/O vastly simplifies server startup, as you can do things like read configurations synchronously. You could write that code asynchronously, but then your module interface often ends up having to return promises that indicate when it's actually ready and done with its initialization, which complicates using the module.
For example, require() is synchronous and this really, really helps make initialization a lot simpler.
The only place I know of in a server where blocking code might be required is if you're trying to write something to disk right before your program exits when it's already in the process of exiting. You get notified of an exit event and if you try to use asynchronous file I/O, then your program will exit before the I/O finishes. In that case, you may need to use synchronous file I/O (which is not a problem in that circumstance).

Multi threaded Linux Socket programming design

I am trying to write a server program which so far supports one client, and over the few days I have been developing it, I concluded I needed threads. The reason is that I take input from a Wi-Fi socket, process it, and finally write it to a file. The processing is slow, so I needed an input thread -> circular buffer -> output thread pattern with a producer-consumer model, which is quite common in network programming.
Now the situation becomes complicated, as I need to manage client disconnection and reconnection. I thought of using pthread_exit() and cleaning up all the semaphores, then reinitializing them each time the single client reconnects.
My question is: is this an efficient approach, i.e. killing the threads and semaphores and recreating them every time? Are there any better solutions?
Thanks.
Learn how to use non-blocking sockets and an event loop. Or use a library that provides TCP sessions for you using non-blocking sockets under the hood. Such as boost::asio.
Learn how to use multi-threading without polluting your code with any synchronization primitives by using message passing to communicate between threads, not shared state. The event loop library you use for non-blocking I/O should also provide means for cross-thread message passing.
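As a small illustration of message passing between threads, here is a sketch in Python rather than C++. The function name `process_stream` and its parameters are assumptions for illustration. A bounded `queue.Queue` stands in for the circular buffer, and a sentinel object tells the consumer to stop, so the calling code needs no semaphores or mutexes of its own.

```python
import queue
import threading

_DONE = object()  # sentinel: tells the consumer there is no more input


def process_stream(chunks, handle, maxsize=64):
    """Pass each item of `chunks` to `handle` on a separate consumer thread.

    Hypothetical helper. The producer (the calling thread) and the
    consumer communicate only through the queue.
    """
    q = queue.Queue(maxsize=maxsize)
    results = []

    def consumer():
        while True:
            item = q.get()
            if item is _DONE:
                break
            results.append(handle(item))

    worker = threading.Thread(target=consumer)
    worker.start()
    for chunk in chunks:
        q.put(chunk)   # blocks while the queue is full (back-pressure)
    q.put(_DONE)
    worker.join()      # results is only read after the consumer has exited
    return results
```

With this shape, a client reconnecting just means feeding a new stream of chunks into the same queue; the consumer thread does not need to be torn down and recreated.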
Some comments and suggestions.
1-In TCP, detecting that the other side has silently disconnected is very difficult, if not impossible. A client could disconnect by sending a TCP RST or FIN message to the server; this is the good case. Sometimes the client can disconnect without notice (crash, cable disconnection, etc.).
One suggestion here is that you consider the way client and server will communicate. For example, you can use function “select” to set a timeout for receiving a message from client and detect a silent client.
Additionally, depending on the programming language and operating system you may need to handle broken pipe (SIGPIPE) signal (in Linux, with C/C++), for a server trying to send a message through a connection closed by the client.
2-Regarding semaphores, you shouldn’t need to clean up semaphores in any special way when a client disconnects. Applying the common good practices of locking and unlocking mutexes should be enough. Also, with resources like file descriptors, you need to release them before ending the thread, either by returning from the thread start function or with pthread_exit. Maybe I didn’t understand this part of the question.
3-Regarding threads: if you work with multiple threads, the optimum is to have a pool of pre-created consumer/worker threads that check the circular buffer to consume the next available connection. Creating and destroying threads is costly for the operating system.
Threads are resource consuming and you may exhaust operating system resources if you need to create 1,000 threads for example.
Another alternative is to have only one consumer thread that manages all connections (sockets) asynchronously: a) Each connection has its own state. b) The main thread goes through all connections and uses function “select” to detect when connection reads or writes are ready. c) Non-blocking sockets can be used, but this is not essential, because from select you know which sockets are ready and will not block.
You can use functions select, poll, epoll.
One link about select and non-blocking sockets: Using select() for non-blocking sockets
Other link with an example: http://linux.die.net/man/2/select
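The suggestions above (a select timeout to spot a silent client, and one thread that keeps per-connection state) can be sketched together. This is written in Python rather than C for brevity; the function name `poll_connections`, the `last_seen` dictionary, and the `idle_limit` parameter are assumptions for illustration.

```python
import select
import time


def poll_connections(conns, last_seen, idle_limit, timeout=1.0):
    """Run one pass of a single-threaded select loop.

    Hypothetical helper. `conns` is a non-empty list of connected sockets
    and `last_seen` maps each socket to the time.monotonic() value of its
    last message. Returns (messages, dropped).
    """
    messages, dropped = [], []
    readable, _, _ = select.select(conns, [], [], timeout)
    now = time.monotonic()
    for sock in readable:
        try:
            data = sock.recv(4096)
        except OSError:          # e.g. connection reset (RST)
            data = b""
        if data:
            last_seen[sock] = now
            messages.append((sock, data))
        else:                    # orderly close (FIN) or reset
            dropped.append(sock)
    # Anything that has been quiet for too long is treated as silently gone.
    for sock in conns:
        if sock not in dropped and now - last_seen[sock] > idle_limit:
            dropped.append(sock)
    return messages, dropped
```

The caller would close and forget the sockets in `dropped`. An idle limit only works if the protocol guarantees some traffic, for example a periodic heartbeat from the client.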

Creating many communication instances in socket programming in c++ linux

I have created an application with server and client classes that have methods for creating either a TCP socket or a UDP socket. I now run two instances of this application. Since the application is in C++ in a Unix environment, I am using PuTTY to run it, and I have opened two instances of PuTTY. My requirement is as follows:
There can be multiple communication instances between the 2 application instances
Each communication instance can be either UDP or TCP (determined from the config file)
Does anybody know how to create such multiple instances?
Hmm, so there are two processes, but they want the processes to be able to communicate with each other via more than one pair of sockets? i.e. you could have two (or more) TCP socket connections between the two processes, and/or two (or more) pairs of UDP sockets sending packets back and forth.
If my above paragraph is correct (i.e. if I haven't misunderstood the request), that is certainly possible, although it's not terribly obvious what advantage you'd gain by doing it. Nevertheless, what you'd need to do is have each instance of your application create multiple sockets (either by socket()+bind() for a UDP socket, or by socket()+bind()+listen()+accept() for accepting an incoming TCP connection, or by socket()+connect() to initiate a TCP connection to the other program instance).
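The same call sequences, sketched in Python rather than C++ to keep it short. The helper names `open_listeners` and `connect_to`, and the idea of passing a list of "tcp"/"udp" strings (standing in for whatever the config file contains), are assumptions for illustration.

```python
import socket


def open_listeners(protocols, host="127.0.0.1"):
    """Create one bound socket per entry in `protocols` ("tcp" or "udp").

    Hypothetical helper. Port 0 lets the OS pick a free port; a real
    application would take the ports from its config file instead.
    """
    socks = []
    for proto in protocols:
        if proto == "tcp":
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.bind((host, 0))
            s.listen()           # accept() is called later, per connection
        elif proto == "udp":
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            s.bind((host, 0))
        else:
            raise ValueError(f"unknown protocol: {proto!r}")
        socks.append(s)
    return socks


def connect_to(proto, address):
    """Open the initiating side of one communication instance."""
    kind = socket.SOCK_STREAM if proto == "tcp" else socket.SOCK_DGRAM
    s = socket.socket(socket.AF_INET, kind)
    s.connect(address)           # for UDP this only sets the default peer
    return s
```

One application instance would call `open_listeners` and the other would call `connect_to` once per configured communication instance.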
The tricky part with managing multiple sockets is handling the waiting correctly. With just one socket you can often get away with using the default blocking I/O semantics, and that way you can end up treating the socket something like a file, and just let each send() or recv() operation (etc) take however long it needs to take to complete before it returns to your calling function.
With more than one socket, on the other hand, you typically want to be able to respond to data on any of the sockets that are ready, which means that you can't just block waiting on any one particular socket, because if you do that, you may end up stuck waiting for a long time (potentially forever!) before that blocking call returns, and in the meantime you are unable to handle any data coming in from any of the other sockets. (The problem becomes particularly obvious when one of the connections is to a computer whose plug was just pulled, as it will typically take the TCP stack several minutes to figure out that the remote computer has gone away)
To deal with the problem, you'll typically want to either use non-blocking I/O and a socket-multiplexing call (e.g. poll() or select() or kqueue()), or spawn multiple threads and let each thread handle a single socket. Neither approach is particularly easy -- the socket-multiplexing approach works well once you get the hang of it, but the multiplexing calls' semantics are somewhat complex, and it takes a while to understand fully how it is intended to work. Non-blocking I/O complicates things further, since it means your code has to correctly deal with partial reads and writes. The multithreading approach seems simpler at first, but it has its own much larger and more subtle set of 'gotchas' (race conditions, deadlocks) that can cause much pain in the long run if you aren't very careful about what the threads are doing and how.
ps Since you're in a Unix environment, a third possible approach would be to fork() a child process for each socket. This would be similar to the multithreading approach, except a bit safer since your "threads" would actually be processes and each would have their own separate memory space, and thus they'd be less likely to trip over each other while doing their work. The downside would be higher memory usage, and also it becomes a bit harder (and slower) for the processes to communicate with each other due to the process space separation.

UNIX socket magic. Recommended for high performance application?

I'm looking to transfer an accept()ed socket between processes using sendmsg(). In short, I'm trying to build a simple load balancer that can deal with a large number of connections without having to buffer the stream data.
Is this a good idea when dealing with a large number (let's say hundreds) of concurrent TCP connections? If it matters, my system is Gentoo Linux
You can share the file descriptor as per the previous answer here.
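For reference, a sketch of what the descriptor hand-off looks like. It uses Python, whose `socket.send_fds` and `socket.recv_fds` (Python 3.9+, Unix only) wrap sendmsg()/recvmsg() with SCM_RIGHTS. The function names `hand_off` and `take_over` are assumptions for illustration, and `channel` is assumed to be a connected Unix-domain socket between the two processes.

```python
import socket


def hand_off(channel, conn):
    """Send the descriptor behind `conn` over the Unix-domain `channel`."""
    # The descriptor travels as ancillary data alongside at least
    # one byte of ordinary payload.
    socket.send_fds(channel, [b"F"], [conn.fileno()])


def take_over(channel):
    """Receive one descriptor from `channel` and wrap it as a socket."""
    _, fds, _, _ = socket.recv_fds(channel, 1, 1)
    # socket.socket(fileno=...) detects family and type from the descriptor.
    return socket.socket(fileno=fds[0])
```

After `hand_off`, the sender can close its own copy of the connection; the receiver holds an independent descriptor for the same connection, so no stream data has to be buffered or copied by the balancer.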
Personally, I've always implemented servers using pre-fork. The parent sets up the listening socket, spawns (pre-forks) children, and each child does a blocking accept. I used pipes for parent <-> child communication.
Until someone does a benchmark and establishes how "hard" it is to send a file descriptor, this remains speculation (someone might pop up: "Hey, sending the descriptor like that is dirt-cheap"). But here goes.
You will (likely, read above) be better off if you just use threads. You can have the following workflow:
Start a pool of threads that just wait around for work. Alternatively you can just spawn a new thread when a request arrives (it's cheaper than you think)
Use epoll(7) to wait for traffic (wait for connections + interesting traffic)
When interesting traffic arrives you can just dispatch a "job" to one of the threads.
Now, this does circumvent the whole descriptor sending part. So what's the catch ? The catch is that if one of the threads crashes, the whole process crashes. So it is up to you to benchmark and decide what's best for your server.
Personally I would do it the way I outlined it above. Another point: if the workers are children of the process doing the accept, sending the descriptor is unnecessary.

PUB/SUB with short-lived publisher and long-lived subscribers

Context: OS: Linux (Ubuntu), language: C (actually Lua, but this should not matter).
I would prefer a ZeroMQ-based solution, but will accept anything sane enough.
Note: For technical reasons I can not use POSIX signals here.
I have several identical long-living processes on a single machine ("workers").
From time to time I need to deliver a control message to each of processes via a command-line tool. Example:
$ command-and-control worker-type run-collect-garbage
Each of workers on this machine should receive a run-collect-garbage message. Note: it would be perfect if the solution would somehow work for all workers on all machines in the cluster, but I can write that part myself.
This is easily done if I will store some information about running workers. For example keep the PIDs for them in a known location and open a control Unix domain socket on a known path with a PID somewhere in it. Or open TCP socket and store host and port somewhere.
But this would require careful management of the stored information — e.g. what if worker process suddenly dies? (Nothing unmanageable, but, still, extra fuss.) Also, the information needs to be stored somewhere, thus adding an extra bit of complexity.
Is there a good way to do this in PUB/SUB style? That is, workers are subscribers, command-and-control tool is a publisher, and all they know is a single "channel url", so to say, on which to come for messages.
Additional requirements:
Messages to the control channel must wake up workers from the poll (select, whatever) loop.
Message delivery must be guaranteed, and it must reach each and every worker that is listening.
Worker should have a way to monitor for messages without blocking — ideally by the poll/select/whatever loop mentioned above.
Ideally, the worker process should be a "server" in a sense: it should not have to bother about keeping connections to the "channel server" (if any) persistent etc., or this should be done transparently by the framework.
Usually such a pattern requires a proxy for the publisher, i.e. you send to the proxy, which immediately accepts delivery and then reliably forwards to the end subscriber workers. The ZeroMQ guide covers a few different methods of implementing this.
http://zguide.zeromq.org/page:all
Given your requirements, Steve's suggestion does seem the simplest: run a daemon which listens on two known sockets - the workers connect to that and the command tool pushes to it which redistributes to connected workers.
You could do something complicated that would probably work, by effectively nominating one of the workers. For example, on startup workers attempt to bind() a PUB ipc:// socket somewhere accessible, like tmp. The one that wins bind()s a second IPC as a PULL socket and acts as a forwarder device on top of its normal duties; the others connect() to the original IPC. The command line tool connect()s to the second IPC and pushes its message. The risk there is that the winner dies, leaving a locked file. You could identify this in the command line tool, rebind, then sleep (to allow the connections to be established). Still, that's all a little bit complex. I think I'd go with a proxy!
I think what you're describing would fit well with a gearmand/supervisord implementation.
Gearman is a great task queue manager and supervisord would allow you to make sure that the process(es) are all running. It's TCP based too so you could have clients/workers on different machines.
http://gearman.org/
http://supervisord.org/
I recently set something up with multiple gearmand nodes, linked to multiple workers so that there's no single point of failure
edit: Sorry - my bad, I just re-read and saw that this might not be ideal.
Redis has some nice and simple looking pub/sub functionality that I've not used yet but sounds promising.
Use a multicast PUB/SUB. You'll have to make sure the pgm option is compiled into your ZeroMQ distribution (man 7 zmq_pgm).
