perl cgi threads - multithreading

I am having a bit of a problem with my CGI web application. I use ithreads to do some parallel processing, where all the threads have a common 'goal'. I therefore detach all of them, and once I find my answer, I call exit.
However, the script continues processing even after the user has closed the connection and left, which is of course a waste of resources.
Is there any way to force the parent process to exit if the user has disconnected?

If you're running under Apache and the client closes the connection prematurely, Apache sends a SIGTERM to the CGI process. In my simple testing, that kills the script and its threads by default.
However, if there is a proxy between the server and the client, it's possible that Apache will not be able to detect the closed connection (as the connection from the server to the proxy may remain open) - in that case, you're out of luck.

AFAIK, repeatedly creating and destroying threads isn't (at least for now) good Perl practice, because memory usage keeps growing!
You should find another way to get the job done. The usual solution is to create a pool of threads once and pass them their arguments through a shared array or Thread::Queue.
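To illustrate the pool idea, here is a minimal sketch in Python rather than Perl. queue.Queue plays the role that Thread::Queue would play in Perl; the names worker and run_pool and the squaring "job" are made up for the example.

```python
import queue
import threading

def worker(tasks, results):
    """Pull arguments from the queue until a None sentinel arrives."""
    while True:
        arg = tasks.get()
        if arg is None:                  # sentinel: no more work for this worker
            break
        results.put((arg, arg * arg))    # placeholder for the real job

def run_pool(args, num_workers=4):
    """Create the workers once, feed them through a queue, collect results."""
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for arg in args:
        tasks.put(arg)
    for _ in threads:
        tasks.put(None)                  # one sentinel per worker
    for t in threads:
        t.join()
    out = {}
    while not results.empty():           # safe: all workers have finished
        arg, value = results.get()
        out[arg] = value
    return out
```

The threads are created once per run_pool call here; in a long-running process you would keep them alive and only push new arguments onto the queue.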

I would personally suggest changing your approach: when you create these threads for a client connection, save the PID of each one and associate it with that connection. I personally prefer daemons to threads, e.g. Proc::Daemon. When the client disconnects prematurely (before the threads finish), send SIGTERM to each process ID associated with that client.
To exit gracefully, override the TERM signal handler in the worker process so that it sets a stop condition, something like:
$SIG{TERM} = sub { $continue = 0; };
Here $continue is the condition of the worker's processing loop. You still have to watch out for code errors, because even though you can try overriding $SIG{__DIE__}, in my experience die() usually doesn't respect that and dies instantly without grace ;)

I'm not sure how you go about detecting if the user has disconnected, but, if they have, you'll have to make the threads stop yourself, since they're obviously not being killed automatically.
Destroying threads is a dangerous operation, so there isn't a good way to do it.
The standard way, as far as I know, is to have a shared variable that the threads check periodically to determine if they should keep working. Set it to some value before you exit, and check for that value inside your threads.
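A minimal sketch of that shared-flag approach, written in Python for brevity (in Perl the flag would be a variable marked :shared). The search task and the names search and parallel_search are invented for the example; threading.Event is the shared flag.

```python
import threading

def search(stop, found, candidates, target):
    """Scan candidates, checking the shared stop flag between items."""
    for item in candidates:
        if stop.is_set():        # another worker finished, or we were told to quit
            return
        if item == target:
            found.append(item)
            stop.set()           # tell every other worker to stop
            return

def parallel_search(chunks, target):
    """Start one worker per chunk and wait until they have all returned."""
    stop = threading.Event()
    found = []
    threads = [threading.Thread(target=search, args=(stop, found, chunk, target))
               for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return found, stop.is_set()
```

The same flag could be set from outside the workers, for instance by whatever code notices that the user has gone away.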
You can also send a signal to the threads to kill them. The docs know more about this than I do.

Related

Multi threaded Linux Socket programming design

I am writing a server program that so far supports a single client, and after a few days of development I concluded that I need threads. I take input from a Wi-Fi socket, process it, and finally write it to a file. Because the processing is slow, I need an input thread -> circular buffer -> output thread pattern, i.e. the producer-consumer model that is quite common in network programming.
Now the situation becomes complicated, as I need to manage client disconnection and reconnection. I thought of calling pthread_exit(), cleaning up all the semaphores, and then reinitializing them each time the single client reconnects.
My question is whether this is an efficient approach, i.e. killing the threads and semaphores and recreating them every time. Are there any better solutions?
Thanks.
Learn how to use non-blocking sockets and an event loop, or use a library such as boost::asio that provides TCP sessions for you using non-blocking sockets under the hood.
Learn how to use multi-threading without polluting your code with any synchronization primitives by using message passing to communicate between threads, not shared state. The event loop library you use for non-blocking I/O should also provide means for cross-thread message passing.
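As a rough sketch of the message-passing idea (in Python, not the C/pthreads of the question): one long-lived consumer thread receives everything through a bounded queue, and a client disconnect is just another message, so no thread or semaphore has to be torn down. DISCONNECT, SHUTDOWN, consumer and run_sessions are names invented here, and upper-casing stands in for the slow processing.

```python
import queue
import threading

DISCONNECT = object()   # message: the client went away
SHUTDOWN = object()     # message: stop the consumer thread

def consumer(inbox, written):
    """Output side: handle one message at a time; only this thread touches `written`."""
    session = []
    while True:
        msg = inbox.get()
        if msg is SHUTDOWN:
            break
        if msg is DISCONNECT:
            written.append(session)   # flush what this session produced
            session = []              # ready for the next connection
            continue
        session.append(msg.upper())   # placeholder for the slow processing

def run_sessions(sessions):
    """Feed several client sessions through one long-lived consumer thread."""
    inbox = queue.Queue(maxsize=64)   # bounded, in the spirit of a circular buffer
    written = []
    t = threading.Thread(target=consumer, args=(inbox, written))
    t.start()
    for packets in sessions:          # each item stands in for one connection
        for packet in packets:
            inbox.put(packet)         # blocks if the consumer falls behind
        inbox.put(DISCONNECT)
    inbox.put(SHUTDOWN)
    t.join()
    return written
```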
Some comments and suggestions.
1-In TCP, detecting that the other side has silently disconnected is very difficult, if not impossible. A client that disconnects by sending a TCP RST or FIN to the server is the good case. Sometimes the client disconnects without notice (crash, cable disconnection, etc.).
One suggestion is to consider how the client and server will communicate. For example, you can use the “select” function to set a timeout for receiving a message from the client, and so detect a silent client.
Additionally, depending on the programming language and operating system, you may need to handle the broken pipe signal (SIGPIPE on Linux with C/C++), which is raised when the server tries to send a message through a connection that the client has closed.
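The select-with-timeout check from point 1 might look like this in Python (the C select call has the same shape). wait_for_message is a made-up helper name, and what to do with a silent client is left to the caller.

```python
import select
import socket

def wait_for_message(conn, timeout_seconds):
    """Return received bytes, b"" if the peer closed, or None on timeout."""
    readable, _, _ = select.select([conn], [], [], timeout_seconds)
    if not readable:
        return None              # nothing arrived in time: treat the client as silent
    return conn.recv(4096)       # b"" here means an orderly close (FIN)
```

A server would typically call this in its receive loop and drop the connection after one or more consecutive None results.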
2-Regarding semaphores, you shouldn’t need to clean them up in any special way when a client disconnects. Applying the common good practices of locking and unlocking mutexes should be enough. Also, resources like file descriptors need to be released before the thread ends, whether it ends by returning from the thread start function or with pthread_exit. Maybe I didn’t understand this part of the question.
3-Regarding threads: if you work with multiple threads, the optimum is to have a pool of pre-created consumer/worker threads that check the circular buffer for the next available connection. Creating and destroying threads is costly for the operating system.
Threads consume resources, and you may exhaust operating system resources if you need to create, for example, 1,000 threads.
Another alternative is to have only one consumer thread that manages all connections (sockets) asynchronously: a) Each connection has its own state. b) The main thread goes through all connections and uses the “select” function to detect when connections are ready for reading or writing. c) Use non-blocking sockets, although this is not essential, because select tells you which sockets are ready and will not block.
You can use the functions select, poll, or epoll.
One link about select and non-blocking sockets: Using select() for non-blocking sockets
Other link with an example: http://linux.die.net/man/2/select
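The single-thread alternative (per-connection state plus select) can be sketched as below. It uses Python's selectors module, which picks epoll, poll or select depending on the platform. The echo behaviour, the idle timeout and the name serve_until_idle are assumptions made for the sake of a small runnable example.

```python
import selectors
import socket

def serve_until_idle(connections, idle_timeout=0.2):
    """Echo data on every connection from one thread.

    Stops when all connections are closed or nothing happens for
    `idle_timeout` seconds. Returns the byte count of each closed connection.
    """
    sel = selectors.DefaultSelector()
    for conn in connections:
        conn.setblocking(False)
        # a) per-connection state travels with the registration
        sel.register(conn, selectors.EVENT_READ, data={"bytes_seen": 0})
    totals = []
    while sel.get_map():
        events = sel.select(timeout=idle_timeout)   # b) wait for ready sockets
        if not events:
            break                                   # idle: nobody sent anything
        for key, _ in events:
            conn, state = key.fileobj, key.data
            chunk = conn.recv(4096)
            if chunk:
                state["bytes_seen"] += len(chunk)
                conn.sendall(chunk)                 # small echo; fine for a sketch
            else:                                   # peer closed the connection
                totals.append(state["bytes_seen"])
                sel.unregister(conn)
                conn.close()
    sel.close()
    return totals
```

A production loop would also watch for write readiness instead of calling sendall on a non-blocking socket, since large replies may not fit in the send buffer.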

Multithreaded socket server using libev

I'm implementing a socket server.
All clients (up to 10k) are supposed to stay connected.
Here's my current design:
The main thread creates an event loop (use epoll by default) and a watcher for accepting clients.
The accept callback:
  Accept the fd and set it to non-blocking mode.
  Add a watcher for the fd to monitor read events.
The read callback:
  Read data and add a task to the thread pool to send the response.
Is it OK to move the read part to the thread pool, or is there a better idea?
Thanks.
Hard to say. You don't want 10k threads running in the background. You should keep the read part in the main thread. That way, if all clients suddenly start asking for things, the pending work piles up only in the thread pool queue (you don't end up with 10k threads running at the same time). You might also get better performance this way, because you avoid some unnecessary context switches (between your own threads).
On the other hand if your clients are unlikely to send requests at the same time, or if the replies are very simple, it might be simpler to just have one thread per client, and avoid the context switch between the main thread and the thread pool.
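A rough sketch of the "read in the main thread, reply from the pool" split, in Python with the selectors module standing in for libev. The names respond and run_loop, the "echo:" reply and the expected_requests parameter (there only so the demo loop terminates) are all assumptions of this example, not part of the design in the question.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor

def respond(conn, request):
    """Runs on a pool thread: build and send the reply."""
    conn.sendall(b"echo:" + request)

def run_loop(listener, expected_requests, workers=4):
    """Accept and read on the calling thread; only replies go to the pool."""
    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ, data="accept")
    handled = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while handled < expected_requests:
            for key, _ in sel.select(timeout=1.0):
                if key.data == "accept":             # the accept callback
                    conn, _addr = listener.accept()
                    conn.setblocking(False)
                    sel.register(conn, selectors.EVENT_READ, data="read")
                else:                                # the read callback
                    request = key.fileobj.recv(4096)
                    if request:
                        pool.submit(respond, key.fileobj, request)
                        handled += 1
                    else:                            # client closed
                        sel.unregister(key.fileobj)
                        key.fileobj.close()
    sel.close()
    return handled
```

With this split, a burst of requests grows the pool's internal queue rather than the number of threads, which is the point made above.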

General question about parallel threading in C++

I haven't used threading in my program before. But there is a problem I am having with this 3rd party application.
It is an offsite backup solution and it has a server and many clients. We have an admin console to manage all the clients and that is where there is a problem.
If one of the client-side applications gets stuck, or is running in a broken condition, the admin console waits forever for a response and does not display anything.
for (client = client1; client < last_client; client++) {
    if (getOServConnection(client, &socHandler) != NULL) { .. }
}
I want two solutions to this. First, I want to know if there is any way I can set a timeout for the function getOServConnection, so that I get a response within X seconds.
Second, I want to know how to call this function in parallel for all clients, so that I get the responses from all clients within X seconds.
The getOServConnection function contains a WSAConnect call, and I don't want to set any options on the socket, since it is used by other modules and doing so would affect the application severely.
First: if you move the call that hangs into a separate thread, you can use the main thread for starting a timer and waiting for the timeout. If you are using Visual C++ on Win32, you can use the (rather old) MFC-based timer. Once this timer expires, it calls a function named OnTimer. This timer does not affect your application's main thread, as it works in a different system-based thread.
Second: if you need to start any number of threads with that connection, you should start thinking of a design pattern for that. You could use a fixed number of threads, in which case you may want to use an object pool. Or, if the number of threads is (relatively) limitless, you may want to use a factory method.
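Both requests (a timeout, and calling all clients in parallel) can be illustrated with a small thread-pool sketch. It is written in Python rather than C++ to keep it short; connect_to_client is a hypothetical stand-in for the blocking getOServConnection call, and the delay argument only simulates a slow or stuck client.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def connect_to_client(name, delay):
    """Hypothetical stand-in for the blocking connection call."""
    time.sleep(delay)            # simulates a slow or stuck client
    return name + ":ok"

def poll_clients(clients, timeout_seconds):
    """Call every client in parallel; report whoever answered in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(clients)))
    futures = {pool.submit(connect_to_client, name, delay): name
               for name, delay in clients.items()}
    done, not_done = wait(futures, timeout=timeout_seconds)
    answered = {futures[f]: f.result() for f in done}
    timed_out = sorted(futures[f] for f in not_done)
    pool.shutdown(wait=False)    # do not block on the stuck calls
    return answered, timed_out
```

Note that the stuck calls are not cancelled; they keep running in their threads until the underlying call returns. The console simply stops waiting for them after the timeout.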

Keeping FTP control connection alive

A while back I asked a question about keeping the control connection of an FTP session alive during a large transfer. Although I thought I had succeeded after implementing a solution from that earlier question, it appears that the ISP is the problem, i.e. they are causing my control connections to die during large transfers.
Interestingly, the old-school FTP client program "Leap-FTP" gets around this issue by just sending 'NOOP' commands to the server on the control connection during a download. While other popular clients die during transfers (Filezilla, my Python FTP script), LeapFTP runs strong due to this workaround.
I've done some research into threading and Queue, but am having trouble coming up with the code to make this happen.
The solution seems simple enough (in my head, at least): initiate a download, while that download function runs, send a NOOP command every n seconds. Stop sending the NOOP command after the download function completes.
I'm hoping that someone can give me a suggestion as to how this might be done. Will it involve threading or Queue, or is there a simpler solution?
Bottom line is, after a lot of testing, the 'NOOP' command is going to have to be sent during the large downloads (which take place on high-numbered TCP ports).
Thanks!
In order to handle multiple sockets at one time in a single program, you can use the select function instead of threads. This is either simpler or more complicated, depending on your programming experience.
I find threads are usually simple but when something does go wrong debugging it is a real pain, while writing the code for socket multiplexing using select is more complex but less difficult to debug than threads.
The basics of using select are that you set up your sockets and call the select function, which tells you which sockets are ready to read or write. Then you check the time. If it's been X seconds since your last NOOP, send one on the control socket. If the transfer socket is ready to read or write, handle it. If the control socket is ready to read, read it and check for a NOOP response, error messages, the control channel being closed, etc.
Since you don't care (much, anyway) about performance in this case, it's probably easiest to use a separate thread that sits in a loop: it sleeps for N seconds, checks whether it has been cancelled, and if not sends a NOOP and sleeps again.
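A minimal sketch of that keepalive thread in Python. send_noop and download are placeholders supplied by the caller (assumptions of this example): how the NOOP is actually written to the control connection, and how its replies are consumed, is deliberately left out.

```python
import threading

def keepalive(send_noop, interval, cancelled):
    """Call send_noop every `interval` seconds until `cancelled` is set."""
    # Event.wait returns False on timeout and True once the event is set,
    # so it serves as both the sleep and the cancellation check.
    while not cancelled.wait(interval):
        send_noop()

def download_with_keepalive(download, send_noop, interval=30.0):
    """Run `download()` while a background thread sends periodic NOOPs."""
    cancelled = threading.Event()
    t = threading.Thread(target=keepalive, args=(send_noop, interval, cancelled))
    t.start()
    try:
        return download()
    finally:
        cancelled.set()          # stop the NOOPs once the transfer ends
        t.join()
```

Using Event.wait instead of a plain sleep means the thread stops almost immediately when the download finishes, rather than after up to N more seconds.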
If you are running on a Unix, it would be just as efficient to have the control connection program open the sockets for a transfer and then spawn a new process to do the transfer. That would leave the control program ready to wait for completion, send NOOP commands, or even start new transfers if the FTP server can support it.
That is sort of how the original FTP model was supposed to work and the reason it uses a control connection and separate data connections instead of the HTTP model with control and data mixed together.

Working with TADOQuery in thread

I'm writing an application that connects to a DB and repeatedly (at 1-minute intervals) reads data from it. It's something like an RSS feed reader, but with a local DB. If the data reading fails, I try to reestablish the connection. I've designed it with a TADOConnection and a TADOQuery placed on the form (so with no dynamic creation). My aim is to keep the application "alive" from the user's point of view, so I placed the connection and the reading part into a single thread. The question is how best to do it.
My design looks like this:
On application start, the TADOConnection and TADOQuery are created along with the form.
The connection (TADOConnection) is opened in a separate thread.
If the connection is established, the connection thread is suspended and a timer on the form is started, which periodically resumes another thread for data reading.
If the reading thread succeeds, nothing happens and the form timer keeps going; if it fails, the thread stops the timer and resumes the connection thread.
Is it better to create the TADOConnection or TADOQuery dynamically, or doesn't it matter? Is it better to use e.g. a critical section in the threads or something similar (I have only one access to the component at a time and only one thread)?
Thanks for your suggestions
This question is fairly subjective, probably not subjective enough to get closed, but subjective anyway. Here's why I'd go for dynamically created ADO objects:
It keeps everything together: the code and the objects used to access the code. Using data access objects created on the form requires the thread to have intimate knowledge of the form's inner workings, and that's never a good idea.
It's safer, because you can't access those objects from other threads (including the main VCL thread). Sure, you're not planning on using those connections for anything else, you're not planning on using multiple threads, etc., but maybe some day you'll forget about those restrictions.
It's future-proof. You might want to use that same thread in another project. You might want to add a second thread accessing some other data to the same app.
I have a personal preference for creating data access objects dynamically from code. Yes, a subjective answer to a subjective question.
Run everything in the thread. Have a periodic timer in the thread that opens the DB connection, reads the data, "posts" it back to the main thread, and then disconnects. The thread needs to "sleep" while waiting for the timer, e.g. on a Windows event that is signalled by the timer. The DB components, which are local and private to the thread, can be created inside the thread when thread execution starts (on application startup) and freed when thread execution finishes (on application shutdown). This will always work, regardless of whether the DB connection is temporarily available or not, and the main thread does not even have to communicate with the "DB thread". It is an architecture that I use all the time and it is absolutely bullet-proof.
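The shape of that architecture, sketched in Python rather than Delphi: a worker thread owns the data access, wakes up periodically, and posts results (or errors) to the main thread through a queue. read_rows is a hypothetical callable standing in for "open the connection, run the query, disconnect", and db_worker and collect are names invented for this example.

```python
import queue
import threading

def db_worker(read_rows, interval, stop, outbox):
    """Every `interval` seconds, read the data and post it to the main thread."""
    while not stop.is_set():
        try:
            outbox.put(("data", read_rows()))   # connect, read, disconnect
        except Exception as exc:                # DB unavailable: report, retry later
            outbox.put(("error", str(exc)))
        stop.wait(interval)                     # sleep, but wake early on shutdown

def collect(read_rows, interval, count):
    """Main-thread side: start the worker, take `count` messages, shut down."""
    stop = threading.Event()
    outbox = queue.Queue()
    t = threading.Thread(target=db_worker, args=(read_rows, interval, stop, outbox))
    t.start()
    messages = [outbox.get(timeout=5.0) for _ in range(count)]
    stop.set()
    t.join()
    return messages
```

In a GUI application the main thread would poll the queue from its own timer or idle handler instead of blocking on it, so the form stays responsive.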
