import multiprocessing as mp
What are the key differences between mp.Pipe() and mp.Queue()? They seem to be the same to me: basically Pipe.recv() is equivalent to Queue.get(), and Pipe.send() is Queue.put().
They're very different things, with very different behavior.
A Queue instance has put, get, empty, full, and various other methods. It has an optional maximum size (a number of items, really). Any process can put into or get from any queue. It is process-safe, as it handles all the locking itself.
The Pipe function (note that this is a function, not a class) returns two objects of type Connection (these are the class instances). The two instances are connected to each other. The connection between them can be one-way, i.e., you can only send on one end and only receive on the other, or it can be full-duplex (the default), i.e., both ends can send and receive, and whatever you send on one end is received on the other. The two objects have send, recv, send_bytes, recv_bytes, fileno, and close methods, among others. The send and recv methods use pickle to translate between objects and bytes, since the actual data transfer is a byte-stream. The Connection objects are not locked and are therefore not process-safe.
Data transfer between processes generally uses these Connection objects: Connections and shared memory are the underlying mechanisms for all process-to-process communication in the multiprocessing code. Queue instances are much higher-level objects, which ultimately need to use a Connection to send or receive the byte-stream that represents the object being transferred across the queue. So in that sense they do the same thing, but that's a bit like saying that a USB device does the same thing as the wires that connect it. Usually you don't want to deal with individual voltages on a wire: it's much nicer to just send or receive a whole object. (This analogy is a little weak, because Connection instances have send and recv as well as send_bytes and recv_bytes, but it's probably still helpful.)
What is the reasoning behind passing a list of ArraySegment<byte> to Socket.BeginReceive/SocketAsyncEventArgs?
The MSDN page for the Socket.BeginReceive method doesn't even describe the first argument correctly:
public IAsyncResult BeginReceive(
    IList<ArraySegment<byte>> buffers,
    SocketFlags socketFlags,
    AsyncCallback callback,
    object state
)

Parameters:

buffers
    Type: System.Collections.Generic.IList<ArraySegment<Byte>>
    An array of type Byte that is the storage location for the received data.
...
I thought that the main idea was to allocate a large buffer on the Large Object Heap, and then pass a segment of this buffer to Socket.BeginReceive, to avoid pinning small objects around the heap and messing up GC's work.
But why should I want to pass several segments to these methods? In case of SocketAsyncEventArgs, it seems it will complicate pooling of these objects, and I don't see the reasoning behind this.
What I found out in this question and in MSDN:
There is an overloaded version of BeginReceive that takes a plain byte array. When that buffer is full, or a packet that is logically in order has been received, the callback is fired.
As stated in the answer I linked:
Reads can be multiples of that because if packets arrive out of order all of them are made visible to application the moment the logically first one arrives. You get to read from all contiguous queued packets at once in this case.
That means: If there is an incoming packet which is out of order (i.e. with a higher sequence number than the one expected), it will be kept back. As soon as the missing packet has arrived, all available packets are written to your list and only one callback is fired instead of firing the callback over and over again for all packets already available, each filling your buffer as far as possible, and so on.
So that means this implementation saves a lot of overhead by providing all available packets in one list and calling the callback only once, instead of doing a lot of memcopies from the network stack to your buffer and repeatedly firing your callback.
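For comparison, Python's socket module exposes the same scatter-read idea: recvmsg_into() accepts a list of pre-allocated buffer segments and fills them in order, much like passing a list of ArraySegment<byte>. A minimal sketch (a local socket pair stands in for a network connection; recvmsg_into is only available on POSIX platforms):

```python
import socket

# A connected local socket pair stands in for a real network connection.
a, b = socket.socketpair()
a.sendall(b"HEADERpayload-bytes")

# Pre-allocate one backing buffer and hand the kernel two segments of it;
# recvmsg_into fills them in order, like a list of ArraySegment<byte>.
backing = bytearray(19)
header = memoryview(backing)[:6]
body = memoryview(backing)[6:]
nbytes, ancdata, flags, addr = b.recvmsg_into([header, body])
```

The point of the segment list is the same as in .NET: one large pre-allocated buffer, several logical slices, one system call.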
Let's have a worker thread which is accessed from a wide variety of objects. This worker object has some public slots, so anyone who connects its signals to the worker's slots can use emit to trigger the worker thread's useful tasks.
This worker thread needs to be almost global, in the sense that several different classes use it, some of them are deep in the hierarchy (child of a child of a child of the main application).
I guess there are two major ways of doing this:
1. All the methods of the child classes pass their messages up the hierarchy via their return values, and let the main (e.g. the GUI) object handle all the emitting.
2. All those classes which require the services of the worker thread hold a pointer to the Worker object (which is a member of the main class), and they all connect() to it in their constructors. Every such class then does the emitting by itself. Basically, dependency injection.
Option 2 seems much cleaner and more flexible to me; I'm only worried that it will create a huge number of connections. For example, if I have an array of objects which need the thread, I will have a separate connection for each element of the array.
Is there an "official" way of doing this, as the creators of Qt intended it?
There is no magic silver bullet for this. You'll need to consider many factors, such as:
Why do those objects emit the data in the first place? Is it because they need to do something, that is, the emission is a “command”? Then maybe they could call some sort of service to do the job without even worrying about whether it's going to happen in another thread or not. Or is it because they inform about an event? In that case they probably should just emit signals but not connect them. It's up to the using code to decide what to do with events.
How many objects are we talking about? Some performance tests are needed. Maybe it's not even an issue.
If there is an array of objects, what purpose does it serve? Perhaps instead of using a plain array some sort of “container” class is needed? Then the container could handle the emission and connection and objects could just do something like container()->handle(data). Then you'd only have one connection per container.
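The container idea can be sketched outside Qt. The following Python sketch (queue.Queue standing in for the signal/slot connection; the class and method names are hypothetical) shows one worker draining a single inbox while the container forwards on behalf of its elements, so there is only one “connection” per container rather than one per element:

```python
import queue
import threading

class Worker:
    """One worker thread draining one thread-safe inbox
    (stands in for the Qt worker object's slot)."""
    def __init__(self):
        self.inbox = queue.Queue()
        self.results = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self.inbox.get()
            if item is None:           # sentinel: shut down
                break
            self.results.append(item)  # the "useful task"

    def stop(self):
        self.inbox.put(None)
        self._thread.join()

class Container:
    """Owns the single 'connection' to the worker; elements just call handle()."""
    def __init__(self, worker):
        self._worker = worker

    def handle(self, data):
        self._worker.inbox.put(data)

worker = Worker()
container = Container(worker)
for i in range(3):
    container.handle(f"task-{i}")   # elements never touch the worker directly
worker.stop()
```

The elements stay ignorant of threading entirely; only the container knows a worker exists, which is the decoupling the answer is suggesting.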
Can someone give me an idea of how to send and receive data through one connection in multithreading?
The model looks like this:
What I know is that if all three clients are sending data at the same time, "client X" will receive a merge of all received data, and "client X" can't separate that data to identify which part is from which client.
Delphi 2010, Indy, Win7.
Sorry if my English is bad; I hope you understand the idea.
You need to implement a locking mechanism, such as a critical section or mutex, to prevent multiple threads from writing to the socket at exactly the same time.
When receiving data that is destined for multiple threads, you need to do the reading in one thread only, and have it pass on the data to the other threads as needed.
Either way, you need to frame your data so the receiver knows where one message ends and the next begins, either by sending a message's length before the message contents, or by sending a unique delimiter between messages that will never appear in the messages themselves.
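Length-prefix framing is only a few lines in any language. A minimal Python sketch (the function names are illustrative) that prepends a 4-byte big-endian length to each message and splits a received byte-stream back into whole messages:

```python
import struct

def frame(payload: bytes) -> bytes:
    # Prefix each message with its length as a 4-byte big-endian integer.
    return struct.pack(">I", len(payload)) + payload

def unframe(stream: bytes):
    """Split a byte-stream of back-to-back frames into the original messages."""
    messages, offset = [], 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        messages.append(stream[offset:offset + length])
        offset += length
    return messages

# Two messages concatenated on the wire come back apart cleanly.
wire = frame(b"Hello") + frame(b"world!")
```

With this in place, the receiving thread can hand each recovered message to the right consumer regardless of how the bytes were chunked in transit.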
Interestingly, I cannot find any discussion on this other than some old slides from 2004.
IMHO, the current scheme of epoll() usage is begging for something
like epoll_ctlv() call. Although this call does not make sense for
typical HTTP web servers, it does make sense in a game server where
we are sending same data to multiple clients at once. This does not
seem hard to implement given the fact that epoll_ctl() is already there.
Do we have any reason for not having this functionality? Maybe there is just no optimization window there?
You would typically only use epoll_ctl() to add and remove sockets from the epoll set as clients connect and disconnect, which doesn't happen very often.
Sending the same data to multiple sockets would rather require a version of send() (or write()) that takes a vector of file descriptors. The reason this hasn't been implemented is probably just that no one with sufficient interest in it has done so yet (and of course there are lots of subtle issues: what if each destination file descriptor can only successfully write a different number of bytes?).
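Whatever a kernel-side multi-fd send would look like, userland already has to handle exactly that subtle issue: each destination may accept a different number of bytes per call. A Python sketch (local socket pairs standing in for connected game clients) of the per-socket loop such a broadcast needs:

```python
import socket

def broadcast(payload, socks):
    # Each destination may accept fewer bytes per send() call than offered,
    # so a per-socket retry loop is unavoidable; a kernel multi-fd send
    # would have to do the same per-descriptor bookkeeping internally.
    for s in socks:
        view = memoryview(payload)
        while view:
            sent = s.send(view)
            view = view[sent:]

# Local socket pairs stand in for three connected clients.
pairs = [socket.socketpair() for _ in range(3)]
broadcast(b"game state update", [w for w, _ in pairs])
received = [r.recv(1024) for _, r in pairs]
```

The game-server case the poster describes amortizes only the Python-level loop, not the per-socket system calls, which hints at why the kernel gains little from offering it.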
Is it possible to have multiple threads sending on the same socket? Will there be interleaving of the streams, or will the socket block on the first thread (assuming TCP)? The majority of opinions I've found seem to warn against doing this for obvious fears of interleaving, but I've also found a few comments that state the opposite. Are interleaving fears a carryover from Winsock 1, and are they well-founded for Winsock 2? Is there a way to set up a Winsock 2 socket that would allow for a lack of local synchronization?
Two of the contrary opinions are below... who's right?
comment 1
"Winsock 2 implementations should be completely thread safe. Simultaneous reads / writes on different threads should succeed, or fail with WSAEINPROGRESS, depending on the setting of the overlapped flag when the socket is created. Anyway by default, overlapped sockets are created; so you don't have to worry about it. Make sure you don't use NT SP6, if ur on SP6a, you should be ok !"
source
comment 2
"The same DLL doesn't get accessed by multiple processes as of the introduction of Windows 95. Each process gets its own copy of the writable data segment for the DLL. The "all processes share" model was the old Win16 model, which is luckily quite dead and buried by now ;-)"
source
looking forward to your comments!
jim
~edit1~
To clarify what I mean by interleaving: thread 1 sends the msg "Hello", thread 2 sends the msg "world!". The recipient receives: "Hwoel lorld!". This assumes both messages were NOT sent in a while loop. Is this possible?
I'd really advise against doing this in any case. The send functions might send less than you tell them to for various very legitimate reasons, and if another thread can enter and try to also send something, you're just messing up your data.
Now, you can certainly write to a socket from several threads, but you no longer have any control over what gets on the wire unless you have proper locking at the application level.
Consider sending some data:
WSASend(sock, buf, buflen, &sent, 0, 0, 0);
The sent parameter will hold the number of bytes actually sent, similar to the return value of the send() function. To send all the data in buf you will have to loop, calling WSASend until all the data actually gets sent.
If, say, the first WSASend sends all but the last 4 bytes, another thread might go and send something while you loop back and try to send the last 4 bytes.
With proper locking to ensure that can't happen, it should be no problem sending from several threads. I wouldn't do it anyway, just for the pure hell it will be to debug when something does go wrong.
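The fix described above, holding one lock across the entire send loop, can be sketched in Python (a socketpair stands in for a real connection). With the lock held for the whole loop, the "Hwoel lorld!" interleaving from the question cannot happen even if a send is partial:

```python
import socket
import threading

send_lock = threading.Lock()

def locked_sendall(sock, payload):
    # Hold the lock across the WHOLE send loop, so a partial send from one
    # thread can never have another thread's bytes spliced into it.
    with send_lock:
        view = memoryview(payload)
        while view:
            sent = sock.send(view)
            view = view[sent:]

writer, reader = socket.socketpair()
threads = [threading.Thread(target=locked_sendall, args=(writer, msg))
           for msg in (b"Hello", b"world!")]
for t in threads:
    t.start()
for t in threads:
    t.join()
writer.close()

# Drain everything the two threads sent.
data = b""
while chunk := reader.recv(1024):
    data += chunk
```

The messages may arrive in either order, but each one arrives whole, which is all the lock can and should guarantee.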
is it possible to have multiple threads sending on the same socket?
Yes - although, depending on implementation this can be more or less visible. First, I'll clarify where I am coming from:
C# / .Net 3.5
System.Net.Sockets.Socket
The overall visibility (i.e. required management) of threading and the headaches incurred will be directly dependent on how the socket is implemented (synchronously or asynchronously). If you go the synchronous route then you have a lot of work to manually manage connecting, sending, and receiving over multiple threads. I highly recommend that this implementation be avoided. The efforts to correctly and efficiently perform the synchronous methods in a threaded model simply are not worth the comparable efforts to implement the asynchronous methods.
I have implemented an asynchronous Tcp server in less time than it took for me to implement the threaded synchronous version. Async is much easier to debug - and if you are intent on Tcp (my favorite choice) then you really have few worries in lost messages, missing data, or whatever.
will there be interleaving of the streams or will the socket block on the first thread (assuming tcp)?
I had to research interleaved streams (from wiki) to ensure that I was accurate in my understanding of what you are asking. To further understand interleaving and mixed messages, refer to these links on wiki:
Real Time Messaging Protocol
Transmission Control Protocol
Specifically, the power of Tcp is best described in the following section:
Due to network congestion, traffic load balancing, or other unpredictable network behavior, IP packets can be
lost, duplicated, or delivered out of order. TCP detects these problems, requests retransmission of lost
packets, rearranges out-of-order packets, and even helps minimize network congestion to reduce the
occurrence of the other problems. Once the TCP receiver has finally reassembled a perfect copy of the data
originally transmitted, it passes that datagram to the application program. Thus, TCP abstracts the application's
communication from the underlying networking details.
What this means is that interleaved messages will be re-ordered into their respective messages as sent by the sender. It is expected that threading is or would be involved in developing a performance-driven Tcp client/server mechanism - whether through async or sync methods.
In order to keep a socket from blocking, you can set its Blocking property to false.
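The Python equivalent of that property is socket.setblocking(False); a non-blocking recv() then raises BlockingIOError instead of waiting. A small sketch:

```python
import socket
import time

a, b = socket.socketpair()
b.setblocking(False)  # the equivalent of setting Blocking = false in .NET

try:
    b.recv(1024)          # nothing sent yet: raises instead of blocking
    blocked = False
except BlockingIOError:
    blocked = True

a.sendall(b"ready")
time.sleep(0.1)           # give the bytes a moment to arrive locally
data = b.recv(1024)       # data is available now, so this succeeds
```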
I hope this gives you some good information to work with. Heck, I even learned a little bit...