I know it is forbidden in the OpenSSL API to call SSL_read and SSL_write from two different threads on the same SSL context, but it's important in my application to have secure full-duplex communication. I thought of some solutions, none of which I really like:
Use two SSL contexts per connection. I don't like this because it uses more resources, and it would complicate my implementation. However, I would be fine using this if I could just "duplicate" an existing SSL context rather than creating a whole new connection from scratch.
Use non-blocking sockets with a mutex controlling access to an SSL context. This would require resource-hogging polling, and I heard the non-blocking implementation is just not very good.
This seems like it would be a rather common thing to do, so what is an accepted solution to this problem?
Use non-blocking sockets with a mutex controlling access to an SSL context.
With non-blocking sockets inside a single thread you would not need mutexes, because a single thread only ever does one thing at a time: it either reads or writes.
This would require resource-hogging polling, ...
You would not need "resource-hogging" polling. I assume you mean busy polling here, instead of using the usual system facilities (like select) to wait (not loop) until data are available or data can be sent. But unlike read/write on plain sockets, an SSL connection may need to read when you want to write, and to write when you want to read, and it may hold buffered data even if the socket is not readable. Look out for SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE (as returned by SSL_get_error) and SSL_pending.
...and I heard the non-blocking implementation is just not very good.
It looks more complex if you are used to threads. But have you ever wondered why high-performance servers like nginx use non-blocking I/O instead of threads? It is because it needs fewer resources and avoids the problems that come with threads, such as having to guard critical sections with mutexes (overhead) and getting strange, sporadic errors when you forget to protect something. nginx also uses OpenSSL with non-blocking I/O.
I personally use non-blocking I/O all the time, and while it is harder to get right with SSL (because of the protocol itself, not the OpenSSL implementation), it is doable and fast.
This means that non-blocking I/O within a single thread is one way to solve your problem. The other way would be to let SSL work only with memory BIOs instead of real file descriptors and do all the socket reading and writing yourself. But this is probably even more complex than non-blocking I/O.
BTW, "SSL context" usually means the SSL_CTX object, which can be shared between multiple SSL connections and probably has no problems with multiple threads. What you mean is that the same SSL connection (the SSL object) should not be used from multiple threads.
I am currently developing networking software that uses a datagram (UDP) socket to send data to clients. Whenever I want to send data to a client, I call sendto() with the respective parameters. Now I am wondering whether making blocking calls to sendto() from multiple threads at the same time is a good idea, or whether data might get interleaved or corrupted in some other way.
I have already found this answer: "is winsock2 thread safe?", but I am not sure whether what holds for send() also holds for sendto().
System calls are not atomic; you can't assume they are thread safe.
Thread safety depends on the system implementation. But thread safety only means you won't encounter crashes or memory corruption; it doesn't tell you anything about the behaviour. For example, the data you send may be interleaved without regard to the order in which your threads made their calls.
If you're working on Windows, Winsock2 seems to be thread safe on recent versions of the OS. But once again, that doesn't mean it will behave as you expect.
Rather than using several threads to send to or receive from a socket, you should consider using I/O completion ports, which are meant for multithreaded, asynchronous processing.
It is a system call, and system calls are atomic, and therefore thread-safe.
It is UDP, and UDP send()/sendto()/sendmsg() sends a single datagram, and UDP guarantees datagram integrity, if it arrives at all.
But IMHO two threads writing to the same socket is probably never going to work at the application level without extreme care at a higher level.
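To see the datagram-integrity point in practice, here is a small test sketch for a Unix-like system (all names hypothetical): two threads call sendto() concurrently on one UDP socket to a receiver on 127.0.0.1, and the receiver checks that every datagram it gets is whole and came from a single thread. It illustrates that each sendto() produces exactly one datagram; it says nothing about the order in which datagrams from different threads arrive, and it is not proof of thread safety on every platform.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#define SENDERS 2
#define DGRAMS_PER_THREAD 20
#define DGRAM_SIZE 64

struct sender_arg {
    int fd;                   /* one UDP socket shared by all sender threads */
    struct sockaddr_in dest;  /* receiver address on loopback */
    unsigned char fill;       /* byte this thread writes into every datagram */
};

static void *sender(void *p)
{
    struct sender_arg *a = p;
    unsigned char buf[DGRAM_SIZE];
    memset(buf, a->fill, sizeof buf);
    for (int i = 0; i < DGRAMS_PER_THREAD; i++)   /* each call = one datagram */
        sendto(a->fd, buf, sizeof buf, 0, (struct sockaddr *)&a->dest, sizeof a->dest);
    return NULL;
}

/* Returns how many datagrams arrived, or -1 if setup failed or any received
   datagram mixed bytes from different threads or had an unexpected size. */
int concurrent_sendto_check(void)
{
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks one */
    socklen_t alen = sizeof addr;
    int ok = rx >= 0 && tx >= 0 &&
             bind(rx, (struct sockaddr *)&addr, sizeof addr) == 0 &&
             getsockname(rx, (struct sockaddr *)&addr, &alen) == 0;

    int received = 0;
    if (ok) {
        struct sender_arg args[SENDERS];
        pthread_t t[SENDERS];
        for (int i = 0; i < SENDERS; i++) {
            args[i] = (struct sender_arg){ tx, addr, (unsigned char)('A' + i) };
            pthread_create(&t[i], NULL, sender, &args[i]);
        }
        for (int i = 0; i < SENDERS; i++)
            pthread_join(t[i], NULL);

        struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };  /* don't wait forever on loss */
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
        unsigned char buf[DGRAM_SIZE * 2];
        for (int i = 0; i < SENDERS * DGRAMS_PER_THREAD; i++) {
            ssize_t n = recv(rx, buf, sizeof buf, 0);
            if (n <= 0)
                break;                                   /* timeout: datagram lost */
            int intact = (n == DGRAM_SIZE);
            for (ssize_t j = 1; intact && j < n; j++)
                intact = (buf[j] == buf[0]);
            if (!intact) { ok = 0; break; }
            received++;
        }
    }
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return ok ? received : -1;
}
```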
I am trying to write a server program that supports only one client so far, and over the few days I have been developing it, I concluded that I need threads. The reason for this decision is that I take input from a Wi-Fi socket, then process it, and finally write it to a file; the processing is slow, so I need an input thread -> circular buffer -> output thread pipeline, following the producer-consumer model that is quite common in network programming.
Now the situation becomes complicated, as I need to handle client disconnection and reconnection. I thought of using pthread_exit(), cleaning up all the semaphores, and then reinitializing them each time the single client reconnects.
My question is whether this is an efficient approach, i.e. killing the threads and semaphores and re-creating them every time. Are there any better solutions?
My question is whether this is an efficient approach, i.e. killing the threads and semaphores and re-creating them every time. Are there any better solutions?
Learn how to use non-blocking sockets and an event loop, or use a library that provides TCP sessions for you using non-blocking sockets under the hood, such as boost::asio.
Learn how to use multi-threading without polluting your code with any synchronization primitives by using message passing to communicate between threads, not shared state. The event loop library you use for non-blocking I/O should also provide means for cross-thread message passing.
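A minimal sketch of message passing between threads, using a plain POSIX pipe instead of an event-loop library (the struct msg format and the function names are hypothetical). Each message is a small fixed-size struct; POSIX makes pipe writes of up to PIPE_BUF bytes atomic, so messages never interleave, and the consumer's state is only read by the producer after pthread_join(), so no mutex appears in the application code.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical message format: 8 bytes, far below PIPE_BUF (at least 512),
   so each write() of one message is atomic. */
struct msg {
    int kind;    /* 0 = data, 1 = stop */
    int value;
};

/* State owned by the consumer thread; the producer reads it only after join. */
struct consumer_ctx {
    int rfd;     /* read end of the pipe */
    long sum;    /* result of the consumer's work */
};

static void *consumer(void *p)
{
    struct consumer_ctx *c = p;
    struct msg m;
    /* Writes are whole messages, so a read of sizeof m never sees half of one. */
    while (read(c->rfd, &m, sizeof m) == (ssize_t)sizeof m && m.kind == 0)
        c->sum += m.value;
    return NULL;
}

static int send_msg(int wfd, int kind, int value)
{
    struct msg m = { kind, value };
    return write(wfd, &m, sizeof m) == (ssize_t)sizeof m ? 0 : -1;
}

/* Sends the values 1..n to a consumer thread and returns the sum it computed. */
long sum_via_messages(int n)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    struct consumer_ctx ctx = { fds[0], 0 };
    pthread_t t;
    if (pthread_create(&t, NULL, consumer, &ctx) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    for (int i = 1; i <= n; i++)
        send_msg(fds[1], 0, i);
    send_msg(fds[1], 1, 0);       /* ask the consumer to stop */
    pthread_join(t, NULL);
    close(fds[0]);
    close(fds[1]);
    return ctx.sum;
}
```

With an event loop, the read end of such a pipe can simply be added to the loop's watched descriptors, so a thread can wait for socket I/O and for messages from other threads at the same time.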
Some comments and suggestions.
1- In TCP, detecting that the other side has silently disconnected is very difficult, if not impossible. A client may disconnect by sending a TCP RST or FIN segment to the server; this is the good case. Sometimes, however, the client disconnects without notice (crash, cable disconnection, etc.).
One suggestion here is to consider how client and server will communicate. For example, you can use the select() function with a timeout when waiting for a message from the client, to detect a silent client.
Additionally, depending on the programming language and operating system, you may need to handle the broken pipe (SIGPIPE) signal (on Linux, in C/C++), which is raised when the server tries to send a message through a connection that the client has closed.
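Both suggestions in this point can be sketched in C (helper names and the -1/-2 return conventions are hypothetical): a select() with a timeout before recv(), so a silent client is noticed instead of blocking forever, and two common ways to keep SIGPIPE from killing the server: ignoring it process-wide, or passing MSG_NOSIGNAL to send() (Linux and some BSDs) so that writing to a closed connection just fails with EPIPE or ECONNRESET.

```c
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Waits up to timeout_sec seconds for fd to become readable.
   Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_sec)
{
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);                       /* fd must be below FD_SETSIZE */
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    int r = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (r < 0)
        return -1;
    return r > 0 && FD_ISSET(fd, &rfds);
}

/* Returns bytes read; 0 if the peer closed (FIN); -1 on error (e.g. after an
   RST); -2 on timeout, after which the caller may treat the client as gone. */
ssize_t recv_or_timeout(int fd, void *buf, size_t len, int timeout_sec)
{
    int r = wait_readable(fd, timeout_sec);
    if (r == 0)
        return -2;
    if (r < 0)
        return -1;
    return recv(fd, buf, len, 0);
}

/* SIGPIPE option 1: ignore the signal for the whole process (portable). */
void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* SIGPIPE option 2: suppress it for a single call with MSG_NOSIGNAL.
   Returns 0 if all bytes were sent; -1 if the client has closed the
   connection (fd is closed here); -2 on other errors or a short send. */
int send_or_drop(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);
    if (n == (ssize_t)len)
        return 0;
    if (n < 0 && (errno == EPIPE || errno == ECONNRESET)) {
        close(fd);               /* the client is gone; release the descriptor */
        return -1;
    }
    return -2;
}
```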
2- Regarding semaphores, you shouldn’t need to clean them up in any special way when a client disconnects. Applying the usual good practices for locking and unlocking mutexes should be enough. Resources such as file descriptors, however, need to be released before the thread ends, either by returning from the thread start function or by calling pthread_exit. Maybe I didn’t understand this part of the question.
3- Regarding threads: if you work with multiple threads, the optimum is to have a pool of pre-created consumer/worker threads that check the circular buffer to consume the next available connection. Creating and destroying threads is costly for the operating system.
Threads consume resources, and you may exhaust the operating system's resources if you need to create, for example, 1,000 threads.
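A sketch of the pre-created pool, assuming a single producer (for example the thread that reads from the socket) and a fixed number of workers sharing a bounded circular buffer protected by a mutex and two condition variables. The constants and names are hypothetical, and adding integers to a total stands in for the real per-item work. The workers are created once and reused; when the ring is closed they drain it and exit.

```c
#include <pthread.h>
#include <stdbool.h>

#define RING_CAP 16
#define NUM_WORKERS 4

/* Bounded circular buffer shared by one producer and a pool of workers. */
struct ring {
    int items[RING_CAP];
    int head;                          /* index of the oldest item */
    int count;                         /* number of queued items */
    bool closed;                       /* producer will add nothing more */
    long total;                        /* stand-in for the result of real work */
    pthread_mutex_t mu;
    pthread_cond_t not_empty, not_full;
};

static void ring_push(struct ring *r, int v)
{
    pthread_mutex_lock(&r->mu);
    while (r->count == RING_CAP)       /* buffer full: wait for a worker */
        pthread_cond_wait(&r->not_full, &r->mu);
    r->items[(r->head + r->count) % RING_CAP] = v;
    r->count++;
    pthread_cond_signal(&r->not_empty);
    pthread_mutex_unlock(&r->mu);
}

/* Takes the next item; returns false once the ring is closed and empty. */
static bool ring_pop(struct ring *r, int *out)
{
    pthread_mutex_lock(&r->mu);
    while (r->count == 0 && !r->closed)
        pthread_cond_wait(&r->not_empty, &r->mu);
    bool got = r->count > 0;
    if (got) {
        *out = r->items[r->head];
        r->head = (r->head + 1) % RING_CAP;
        r->count--;
        pthread_cond_signal(&r->not_full);
    }
    pthread_mutex_unlock(&r->mu);
    return got;
}

static void *worker(void *p)
{
    struct ring *r = p;
    int item;
    while (ring_pop(r, &item)) {
        /* Real code would serve a connection or process a message here. */
        pthread_mutex_lock(&r->mu);
        r->total += item;
        pthread_mutex_unlock(&r->mu);
    }
    return NULL;
}

/* Creates the pool once, feeds it the items 1..n, then shuts it down. */
long run_pool(int n)
{
    struct ring r = { .head = 0, .count = 0, .closed = false, .total = 0 };
    pthread_mutex_init(&r.mu, NULL);
    pthread_cond_init(&r.not_empty, NULL);
    pthread_cond_init(&r.not_full, NULL);

    pthread_t pool[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&pool[i], NULL, worker, &r);
    for (int i = 1; i <= n; i++)
        ring_push(&r, i);

    pthread_mutex_lock(&r.mu);
    r.closed = true;
    pthread_cond_broadcast(&r.not_empty);   /* wake idle workers so they can exit */
    pthread_mutex_unlock(&r.mu);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(pool[i], NULL);

    pthread_cond_destroy(&r.not_full);
    pthread_cond_destroy(&r.not_empty);
    pthread_mutex_destroy(&r.mu);
    return r.total;
}
```

For the reconnection case in the question, the same pool and buffer can keep running across client disconnects; only the per-connection socket has to be closed and replaced.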
Another alternative is to have only one consumer thread that manages all connections (sockets) asynchronously: a) each connection has its own state; b) that thread goes through all connections and uses select() to detect when reads or writes on a connection are ready; c) non-blocking sockets can be used, but they are not essential, because select tells you which sockets are ready and will not block.
You can use the select(), poll(), or epoll functions.
One link about select and non-blocking sockets: Using select() for non-blocking sockets
Other link with an example: http://linux.die.net/man/2/select
I have two processes: a producer which pushes messages via ZMQ to a consumer in a simple PULL-PUSH point-to-point pattern. The producer has several internal threads that send() via zmq. However, 0MQ's docs suggest not to share sockets between threads.
Must I use a single thread to send?
Assuming there is no strict requirement for keeping the sending order between the threads, doesn't the fact that the socket is a one-directional simplex allow multiple threads to use it without introducing locks?
The easiest thing to do is to create a separate PUSH socket on each of producer's threads and connect all these sockets to a single PULL socket in consumer.
It's explicitly stated in the guide that ZeroMQ sockets must be used on a single thread. I'd say that violating this requirement is not a good idea, even if it seems to work: things may break in the next version of the library or on some specific platform or in some specific load scenario. So, it's just too risky.
I have created an application with server and client classes that have methods for creating either a TCP socket or a UDP socket. I have started two instances of this application: since it is written in C++ for a Unix environment, I run it through PuTTY, with two PuTTY sessions open. My requirements are as follows:
There can be multiple communication instances between the 2 application instances.
Each communication instance can be either UDP or TCP (determined from the config file).
Does anybody know how to create such multiple instances?
Hmm, so there are two processes, but they want the processes to be able to communicate with each other via more than one pair of sockets? i.e. you could have two (or more) TCP socket connections between the two processes, and/or two (or more) pairs of UDP sockets sending packets back and forth.
If my paragraph above is correct (i.e. if I haven't misunderstood the request), that is certainly possible, although it's not terribly obvious what advantage you'd gain by doing it. Nevertheless, what you'd need to do is have each instance of your application create multiple sockets (either by socket()+bind() for a UDP socket, by socket()+bind()+listen()+accept() to accept an incoming TCP connection, or by socket()+connect() to initiate a TCP connection to the other program instance).
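A sketch of helpers for creating one socket per communication instance, with the protocol chosen by a string such as "tcp" or "udp" read from the config file (that config format and the helper names are hypothetical, and loopback is used for simplicity). The passive side does socket()+bind() (plus listen() for TCP); the active side does socket()+connect(). Each instance is simply another descriptor, so calling these several times gives you several instances.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fills addr with 127.0.0.1:port (a real config would supply the host). */
static void loopback_addr(struct sockaddr_in *addr, uint16_t port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}

/* Passive side. proto is "tcp" (anything else is treated as UDP here).
   Port 0 lets the kernel pick a free port. Returns the fd or -1. */
int open_passive(const char *proto, uint16_t port)
{
    int tcp = strcmp(proto, "tcp") == 0;
    int fd = socket(AF_INET, tcp ? SOCK_STREAM : SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in addr;
    loopback_addr(&addr, port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        (tcp && listen(fd, 16) < 0)) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Active side. For UDP, connect() only fixes the default destination,
   so plain send()/recv() can be used. Returns the fd or -1. */
int open_active(const char *proto, uint16_t port)
{
    int tcp = strcmp(proto, "tcp") == 0;
    int fd = socket(AF_INET, tcp ? SOCK_STREAM : SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in addr;
    loopback_addr(&addr, port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns the local port a socket is bound to (useful after binding port 0). */
uint16_t local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}
```

For TCP, the passive side still has to call accept() on the listening descriptor to obtain the connected socket for each instance.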
The tricky part with managing multiple sockets is handling the waiting correctly. With just one socket you can often get away with using the default blocking I/O semantics, and that way you can end up treating the socket something like a file, and just let each send() or recv() operation (etc) take however long it needs to take to complete before it returns to your calling function.
With more than one socket, on the other hand, you typically want to be able to respond to data on any of the sockets that are ready, which means that you can't just block waiting on any one particular socket, because if you do that, you may end up stuck waiting for a long time (potentially forever!) before that blocking call returns, and in the meantime you are unable to handle any data coming in from any of the other sockets. (The problem becomes particularly obvious when one of the connections is to a computer whose plug was just pulled, as it will typically take the TCP stack several minutes to figure out that the remote computer has gone away.)
To deal with the problem, you'll typically want to either use non-blocking I/O and a socket-multiplexing call (e.g. poll() or select() or kqueue()), or spawn multiple threads and let each thread handle a single socket. Neither approach is particularly easy -- the socket-multiplexing approach works well once you get the hang of it, but the multiplexing calls' semantics are somewhat complex, and it takes a while to understand fully how it is intended to work. Non-blocking I/O complicates things further, since it means your code has to correctly deal with partial reads and writes. The multithreading approach seems simpler at first, but it has its own much larger and more subtle set of 'gotchas' (race conditions, deadlocks) that can cause much pain in the long run if you aren't very careful about what the threads are doing and how.
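A sketch of what coping with partial writes means on a non-blocking socket (names hypothetical): send() may accept only part of the buffer, or fail with EAGAIN/EWOULDBLOCK when the kernel buffer is full. For brevity this helper waits on just this one socket with poll(); in a real multiplexed loop you would instead remember how much was sent and return to the main poll() over all sockets.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Puts fd into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Writes all len bytes to a non-blocking socket. Returns 0 on success,
   -1 on error or if the socket stays unwritable for timeout_ms. */
int send_all_nb(int fd, const char *buf, size_t len, int timeout_ms)
{
    size_t off = 0;
    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off, 0);
        if (n > 0) {
            off += (size_t)n;              /* partial write: loop for the rest */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            if (poll(&pfd, 1, timeout_ms) <= 0)
                return -1;                 /* timeout or poll() error */
            continue;
        }
        return -1;
    }
    return 0;
}
```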
P.S. Since you're in a Unix environment, a third possible approach would be to fork() a child process for each socket. This is similar to the multithreading approach, but a bit safer: your "threads" would actually be processes, each with its own separate memory space, so they'd be less likely to trip over each other while doing their work. The downside is higher memory usage, and it becomes somewhat harder (and slower) for the processes to communicate with each other because of the separate address spaces.
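A sketch of the fork()-per-socket approach (handle_connection, spawn_handler and reap_children are hypothetical; the handler just echoes data). The parent closes its copy of each connected descriptor after forking, so the connection really closes when the child finishes, and it reaps exited children so they don't remain as zombies.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-connection work: echo bytes back until the peer closes. */
static void handle_connection(int fd)
{
    char buf[512];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0)
        if (send(fd, buf, (size_t)n, 0) != n)
            break;
}

/* Serves one connected socket in a child process. Returns the child's pid,
   or -1 if fork() failed (the caller still owns fd in that case). */
pid_t spawn_handler(int fd)
{
    pid_t pid = fork();
    if (pid == 0) {              /* child: own memory, inherited socket */
        handle_connection(fd);
        close(fd);
        _exit(0);                /* don't run the parent's atexit/stdio cleanup */
    }
    if (pid > 0)
        close(fd);               /* parent's copy is no longer needed */
    return pid;
}

/* Collects exited children without blocking, so they don't linger as zombies. */
void reap_children(void)
{
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
}
```

A server's accept loop would call spawn_handler() for each accepted descriptor and call reap_children() periodically (or from a SIGCHLD-driven step).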
I've read the C10K doc as well as many related papers on scaling up a socket server. All roads point to the following:
Avoid the classic mistake of "thread per connection".
Prefer epoll over select.
Likewise, the legacy asynchronous I/O mechanism in Unix may be hard to use.
My simple TCP server just listens for client connections on a listen socket bound to a dedicated port. Upon receiving a new connection, it parses the request, sends a response back, and then gracefully closes the socket.
I think I have a good handle on how to scale this up on a single thread using epoll: just one loop that calls epoll_wait for the listen socket as well as for the existing client connections. Upon return, the code creates new client connections and manages the state of existing connections, depending on which socket just got signaled, perhaps with some logic to manage connection timeouts, graceful closing of sockets, and efficient resource allocation for each connection. Seems straightforward enough.
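A sketch of one iteration of the single-threaded epoll loop described above (Linux-only; watch_fd and serve_events are hypothetical names). It uses level-triggered notifications, and for brevity it leaves client sockets blocking and treats a single recv() as the whole request, which a real server would not assume.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_EVENTS 64

/* Registers fd with the epoll instance ep for readability (level-triggered). */
int watch_fd(int ep, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
}

/* One iteration of the loop. lfd is the listening socket, already registered
   with watch_fd(). Returns the number of events handled, or -1 on error. */
int serve_events(int ep, int lfd, int timeout_ms)
{
    struct epoll_event events[MAX_EVENTS];
    int n = epoll_wait(ep, events, MAX_EVENTS, timeout_ms);
    for (int i = 0; i < n; i++) {
        int fd = events[i].data.fd;
        if (fd == lfd) {                     /* new client connection */
            int cfd = accept(lfd, NULL, NULL);
            if (cfd >= 0 && watch_fd(ep, cfd) < 0)
                close(cfd);
            continue;
        }
        char req[1024];
        ssize_t r = recv(fd, req, sizeof req, 0);
        if (r > 0)
            send(fd, "OK\n", 3, 0);          /* stand-in for parsing req and replying */
        close(fd);                           /* closing the last descriptor also drops it from ep */
    }
    return n;
}
```

The surrounding loop would create the instance with epoll_create1(0), register the listening socket with watch_fd(), and call serve_events() repeatedly, doing timeout housekeeping between iterations.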
But what if I want to scale this to take advantage of multiple threads and multiple cpu cores? The core idea that springs to mind is this:
One dedicated thread listens for incoming connections on the TCP listen socket. Then a set of N threads (or a thread pool) handles all the active concurrent client connections. Then I'd invent some thread-safe way for the listen thread to "dispatch" each new connection (socket) to one of the available worker threads (à la IOCP on Windows). Each worker thread would run an epoll loop over all the connections it handles, doing what the single-threaded approach would do.
Am I on the right track? Or is there a standard design pattern for doing a TCP server with epoll on multiple threads?
Suggestions on how the listen thread would dispatch a new connection to the thread pool?
Firstly, note that it's C10K, i.e. ten thousand concurrent connections. Don't concern yourself if you have fewer than about 100 (on a typical system). Even then, it depends on what your sockets are doing.
Yes, but keep in mind that manipulating epoll requires system calls, and their cost may or may not exceed the cost of managing a few fd_sets yourself. The same goes for poll. At low counts it's cheaper to do the processing in user space on each iteration.
Asynchronous IO is very painful when you're not constrained to just a few sockets that you can juggle as required. Most people cope by using event loops, but this fragments and inverts your program flow. It also usually requires making use of large, unwieldy frameworks for this purpose since a reliable and fast event loop is not easy to get right.
The first question is, do you need this? If you're handily coping with the existing traffic by spawning off threads to handle each incoming request, then keep doing it this way. The code will be simpler for it, and all your libraries will play nicely.
As I mentioned above, juggling simultaneous requests can be complex. If you want to do this in a single loop, you'll also need to make guarantees about CPU starvation when generating your responses.
The dispatch model you proposed is the typical first-step solution if your responses are expensive to generate. You can either fork or use threads. The cost of forking or spawning a thread should not be a consideration in selecting a pooling mechanism; rather, you should use such a mechanism to limit or order the load placed on the system.
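One way to implement the dispatch step, sketched under assumptions (Linux epoll, hypothetical names): each worker thread owns a private epoll set plus a pipe, and the listener thread hands over a new connection by writing its descriptor number into that worker's pipe. The worker watches the pipe's read end alongside its connections, so no mutex is needed; writing an int to a pipe is atomic because it is far below PIPE_BUF. A descriptor value of -1 is used here as a shutdown request.

```c
#include <pthread.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical per-worker state: a private epoll set plus a hand-off pipe. */
struct worker {
    int ep;        /* this worker's epoll instance */
    int pipe_rd;   /* read end, registered in ep */
    int pipe_wr;   /* write end, used by the listener thread */
    int served;    /* connections completed (read after pthread_join) */
};

/* Listener side: hand connection fd to worker w (-1 asks the worker to exit). */
int dispatch(struct worker *w, int fd)
{
    return write(w->pipe_wr, &fd, sizeof fd) == (ssize_t)sizeof fd ? 0 : -1;
}

static void *worker_main(void *p)
{
    struct worker *w = p;
    struct epoll_event evs[32];
    for (;;) {
        int n = epoll_wait(w->ep, evs, 32, -1);   /* n < 0 (e.g. EINTR): just retry */
        for (int i = 0; i < n; i++) {
            int fd = evs[i].data.fd;
            if (fd == w->pipe_rd) {                /* a new connection was handed over */
                int cfd;
                if (read(w->pipe_rd, &cfd, sizeof cfd) != (ssize_t)sizeof cfd || cfd < 0)
                    return NULL;
                struct epoll_event ev = { .events = EPOLLIN, .data.fd = cfd };
                if (epoll_ctl(w->ep, EPOLL_CTL_ADD, cfd, &ev) < 0)
                    close(cfd);
            } else {                               /* placeholder work: echo once, close */
                char buf[256];
                ssize_t r = recv(fd, buf, sizeof buf, 0);
                if (r > 0)
                    send(fd, buf, (size_t)r, 0);
                close(fd);
                w->served++;
            }
        }
    }
}

/* Creates the worker's pipe and epoll set and starts its thread. */
int worker_start(struct worker *w, pthread_t *tid)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    w->pipe_rd = fds[0];
    w->pipe_wr = fds[1];
    w->served = 0;
    w->ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = w->pipe_rd };
    if (w->ep < 0 || epoll_ctl(w->ep, EPOLL_CTL_ADD, w->pipe_rd, &ev) < 0)
        return -1;
    return pthread_create(tid, NULL, worker_main, w) == 0 ? 0 : -1;
}
```

With N workers, the listener would call accept() and then dispatch() to the workers in turn (round-robin, or to the least loaded one).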
Batching sockets onto multiple epoll loops is excessive. Use multiple processes if you're this desperate. Note that it's possible to accept on a socket from multiple threads and processes.
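A sketch of several threads accepting on one shared listening socket (loopback and all names are hypothetical): each acceptor thread blocks in accept() on the same descriptor, and the kernel hands each incoming connection to exactly one accept() call. The same works across processes that inherit the listening socket through fork().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void loopback(struct sockaddr_in *a, uint16_t port)
{
    memset(a, 0, sizeof *a);
    a->sin_family = AF_INET;
    a->sin_port = htons(port);
    a->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}

/* Listening socket on an ephemeral loopback port; stores the port in *port. */
int make_listener(uint16_t *port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in a;
    loopback(&a, 0);
    socklen_t len = sizeof a;
    if (bind(fd, (struct sockaddr *)&a, sizeof a) < 0 || listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)&a, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(a.sin_port);
    return fd;
}

/* Acceptor thread: blocks in accept() on the shared listening socket and
   serves exactly one connection (placeholder work: send "hi"). */
static void *acceptor(void *p)
{
    int lfd = *(const int *)p;
    int cfd = accept(lfd, NULL, NULL);
    if (cfd >= 0) {
        send(cfd, "hi", 2, 0);
        close(cfd);
    }
    return NULL;
}

/* Starts n acceptor threads that all share *lfd. Returns 0 on success. */
int start_acceptors(int *lfd, pthread_t *tids, int n)
{
    for (int i = 0; i < n; i++)
        if (pthread_create(&tids[i], NULL, acceptor, lfd) != 0)
            return -1;
    return 0;
}

/* Client helper for trying it out: connects to 127.0.0.1:port. */
int connect_loopback(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in a;
    loopback(&a, port);
    if (fd >= 0 && connect(fd, (struct sockaddr *)&a, sizeof a) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```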
I would guess you are on the right track. But I also think the details depend on the particular situation (bandwidth, request patterns, individual request processing, etc.). I think you should try it and benchmark carefully.