What is the reasoning behind passing a list of ArraySegment<byte> to Socket.BeginReceive/SocketAsyncEventArgs?
MSDN for the Socket.BeginReceive constructor doesn't even correctly describe the first argument):
public IAsyncResult BeginReceive(
IList<ArraySegment<byte>> buffers,
SocketFlags socketFlags,
AsyncCallback callback,
object state
)
Paremeters:
buffers
Type: System.Collections.Generic.IList<ArraySegment<Byte>>
An array of type Byte that is the storage location for the received data.
...
I thought that the main idea was to allocate a large buffer on the Large Object Heap, and then pass a segment of this buffer to Socket.BeginReceive, to avoid pinning small objects around the heap and messing up GC's work.
But why should I want to pass several segments to these methods? In case of SocketAsyncEventArgs, it seems it will complicate pooling of these objects, and I don't see the reasoning behind this.
What I found out in this question and in MSDN:
There is an overloaded version of BeginReceive that takes a Byte-Array. When this is full or a packet has been received (that is logically in order), the callback is fired.
As stated in the answer I linked:
Reads can be multiples of that because if packets arrive out of order all of them are made visible to application the moment the logically first one arrives. You get to read from all contiguous queued packets at once in this case.
That means: If there is an incoming packet which is out of order (i.e. with a higher sequence number than the one expected), it will be kept back. As soon as the missing packet has arrived, all available packets are written to your list and only one callback is fired instead of firing the callback over and over again for all packets already available, each filling your buffer as far as possible, and so on.
So that means, this implementation saves a lot of overhead by providing all available packets in an array list calling the callback only once instead of doing a lot of memcopies from the network stack to your buffer and repeatedly calling you callback.
Related
import multiprocess as mp
What are the key differences between mp.Pipe() and mp.Queue()? They seem to be the same to me: basically Pipe.recv() is equivalent to Queue.get(), and Pipe.send() is Queue.put().
They're very different things, with very different behavior.
A Queue instance has put, get, empty, full, and various other methods. It has an optional maximum size (number of items, really). Anyone can put or get into any queue. It is process-safe as it handles all the locking.
The Pipe function—note that this is a function, not a class instance—returns two objects of type Connection (these are the class instances). These two instances are connected to each other. The connection between them can be single-duplex, i.e., you can only send on one and only receive on the other, or it can be full-duplex, i.e., whatever you actually send on one is received on the other. The two objects have send, recv, send_bytes, recv_bytes, fileno, and close methods among others. The send and receive methods use the pickling code to translate between objects and bytes since the actual data transfer is via byte-stream. The connection objects are not locked and therefore not process-safe.
Data transfer between processes generally uses these Connection objects: this and shared-memory are the underlying mechanism for all process-to-process communication in the multiprocessing code. Queue instances are much higher level objects, which ultimately need to use a Connection to send or receive the byte-stream that represents the object being transferred across the queue. So in that sense, they do the same thing—but that's a bit like saying that a USB cable does the same thing as the things that connect them. Usually, you don't want to deal with individual voltages on a wire: it's much nicer to just send or receve a whole object. (This analogy is a little weak, because Connection instances have send and recv as well as send_bytes and recv_bytes, but it's probably still helpful.)
I have an application with Many Producers and consumers.
From my understanding, RingBuffer creates objects at start of RingBuffer init and you then copy object when you publish in Ring and get them from it in EventHandler.
My application LogHandler buffers received events in a List to send it in Batch mode further once the list has reached a certain size. So EventHandler#onEvent puts the received object in the list , once it has reached the size , it sends it in RMI to a server and clears it.
My question, is do I need to clone the object before I put in list, as I understand, once consumed they can be reused ?
Do I need to synchronize access to the list in my EventHandler#onEvent ?
Yes - your understanding is correct. You copy your values in and out of the ringbuffer slots.
I would suggest that yes you clone the values as you extract it from the ring buffer and into your event handler list; otherwise the slot can be reused.
You should not need to synchronise access to the list as long as it is a private member variable of your Event Handler and you only have one event handler instance per thread. If you have multiple event handlers adding to the same (eg static) List instance then you would need synchronisation.
Clarification:
Be sure to read the background in OzgurH's comments below. If you stick to using the endOfBatch flag on disruptor and use that to decide the size of your batch, you do not have to copy objects out of the list. If you are using your own accumulation strategy (such as size - as per the question), then you should clone objects out as the slot could be reused before you have had the chance to send.
Also worth noting that if you are needing to synchronize on the list instance, then you have missed a big opportunity with disruptor and will destroy your performance anyway.
It is possible to use slots in the Disruptor's RingBuffer (including ones containing a List) without cloning/copying values. This may be a preferable solution for you depending on whether you are worried about garbage creation, and whether you actually need to be concerned about concurrent updates to the objects being placed in the RingBuffer. If all the objects being placed in the slot's list are immutable, or if they are only being updated/read by a single thread at a time (a precondition which the Disruptor is often used to enforce), there will be nothing gained from cloning them as they are already immune to data races.
On the subject of batching, note that the Disruptor framework itself provides a mechanism for taking items from the RingBuffer in batches in your EventHandler threads. This is approach is fully thread-safe and lock-free, and could yield better performance by making your memory access patterns more predictable to the CPU.
I'm sending various custom message structures down a nonblocking TCP socket. I want to send either the whole structure in one send() call, or return an error with no bytes sent if there's only room in the send buffer for part of the message (ie send() returns EWOULDBLOCK). If there's not enought room, I will throw away the whole structure and report overflow, but I want to be recoverable after that, ie the receiver only ever receives a sequence of valid complete structures. Is there a way of either checking the send buffer free space, or telling the send() call to do as described? Datagram-based sockets aren't an option, must be connection-based TCP. Thanks.
Linux provides a SIOCOUTQ ioctl() to query how much data is in the TCP output buffer:
http://www.kernel.org/doc/man-pages/online/pages/man7/tcp.7.html
You can use that, plus the value of SO_SNDBUF, to determine whether the outgoing buffer has enough space for any particular message. So strictly speaking, the answer to your question is "yes".
But there are two problems with this approach. First, it is Linux-specific. Second, what are you planning to do when there is not enough space to send your whole message? Loop and call select again? But that will just tell you the socket is ready for writing again, causing you to busy-loop.
For efficiency's sake, you should probably bite the bullet and just deal with partial writes; let the network stack worry about breaking your stream up into packets for optimal throughput.
TCP has no support for transactions; this is something which you must handle on layer 7 (application).
I am trying to create a p2p applications on Linux, which I want to run as efficiently as possible.
The issue I have is with managing packets. As we know, there may be more than one packet in the recv() buffer at any time, so there is a need to have some kind of message framing system to make sure that multiple packets are not treated as one big packet.
So at the moment my packet structure is:
(u16int Packet Length):(Packet Data)
Which requires two calls to recv(); one to get the packet size, and one to get the packet.
There are two main problems with this:
1. A malicious peer could send a packet with a size header of
something large, but not send any more data. The application will
hang on the second recv(), waiting for data that will never come.
2. Assuming that calling Recv() has a noticeable performance penalty
(I actually have no idea, correct me if I am wrong) calling Recv() twice
will slow the program down.
What is the best way to structure packets/Recieving system for both the best efficiency and stability? How do other applications do it? What do you recommend?
Thankyou in advance.
I think your "framing" of messages within a TCP stream is right on.
You could consider putting a "magic cookie" in front of each frame (e.g. write the 32-bit int "0xdeadbeef" at the top of each frame header in addition to the packet length) such that it becomes obvious that your are reading a frame header on the first of each recv() pairs. It the magic integer isn't present at the start of the message, you have gotten out of sync and need to tear the connection down.
Multiple recv() calls will not likely be a performance hit. As a matter of fact, because TCP messages can get segmented, coalesced, and stalled in unpredictable ways, you'll likely need to call recv() in a loop until you get all the data you expected. This includes your two byte header as well as for the larger read of the payload bytes. It's entirely possible you call "recv" with a 2 byte buffer to read the "size" of the message, but only get 1 byte back. (Call recv again, and you'll get the subsequent bytes). What I tell the developers on my team - code your network parsers as if it was possible that recv only delivered 1 byte at a time.
You can use non-blocking sockets and the "select" call to avoid hanging. If the data doesn't arrive within a reasonable amount of time (or more data arrives than expected - such that syncing on the next message becomes impossible), you just tear the connection down.
I'm working on a P2P project of my own. Would love to trade notes. Follow up with me offline if you like.
I disagree with the others, TCP is a reliable protocol, so a packet magic header is useless unless you fear that your client code isn't stable or that unsolicited clients connect to your port number.
Create a buffer for each client and use non-blocking sockets and select/poll/epoll/kqueue. If there is data available from a client, read as much as you can, it doesn't matter if you read more "packets". Then check whether you've read enough so the size field is available, if so, check that you've read the whole packet (or more). If so, process the packet. Then if there's more data, you can repeat this procedure. If there is partial packet left, you can move that to the start of your buffer, or use a circular buffer so you don't have to do those memmove-s.
Client timeout can be handled in your select/... loop.
That's what I would use if you're doing something complex with the received packet data. If all you do is to write the results to a file (in bigger chunks) then sendfile/splice yields better peformance. Just read packet length (could be multiple reads) then use multiple calls to sendfile until you've read the whole packet (keep track of how much left to read).
You can use non-blocking calls to recv() (by setting SOCK_NONBLOCK on the socket), and wait for them to become ready for reading data using select() (with a timeout) in a loop.
Then if a file descriptor is in the "waiting for data" state for too long, you can just close the socket.
TCP is a stream-oriented protocol - it doesn't actually have any concept of packets. So, in addition to recieving multiple application-layer packets in one recv() call, you might also recieve only part of an application-layer packet, with the remainder coming in a future recv() call.
This implies that robust reciever behaviour is obtained by receiving as much data as possible at each recv() call, then buffering that data in an application-layer buffer until you have at least one full application-layer packet. This also avoids your two-calls-to-recv() problem.
To always recieve as much data as possible at each recv(), without blocking, you should use non-blocking sockets and call recv() until it returns -1 with errno set to EWOULDBLOCK.
As others said, a leading magic number (OT: man file) is a good (99.999999%) solution to identify datagram boundaries, and timeout (using non-blocking recv()) is good for detecting missing/late packet.
If you count on attackers, you should put a CRC in your packet. If a professional attacker really wants, he/she will figure out - sooner or later - how your CRC works, but it's even harder than create a packet without CRC. (Also, if safety is critical, you will find SSL libs/examples/code on the Net.)
I have a list of a bunch of file descriptors that I have created kevents for, and I'm trying to figure out if there's any way to get the number of them that are ready for read or write access.
Is there any way to get a list of "ready" file descriptors, like what epoll_wait provides?
Events that occurred are placed into the eventlist buffer passed to the kevent call. So making this buffer sufficiently large will give you the list you are seeking. The return
value of the kevent call will tell you have many events
are in the eventlist buffer.
If using a large buffer is not feasible for some reason,
you can always do a loop calling kevent with a zero timeout
and a smaller buffer, until you get zero events in the eventlist.
To give a little more context...
One of the expected scenarios with kevent() is that you will thread pool calls to it. If you had 3 thread pools all asking for 4 events, the OS would like to be able to pool and dispatch the actual events as it sees fit.
If 7 events are available the OS might want to dispatch to 3 threads, or it might want to dispatch to all 3 threads if it believed it had empty cores and less overhead.
I'm not saying your scenario is invalid at all; just that the system is more or less designed to keep that information away from you so it doesn't get into scenarios of saying 'well, 12 descriptors are ready. Oh, hrm, I just told you that but 3 of them got surfaced before you had a chance to do anything'.
Grrr pretty much nailed the scenario. You register/deregister your descriptors once and the relevent descriptor will be provided back to you with the event when the event triggers.