How should one use Disruptor (Disruptor Pattern) to build real-world message systems? - disruptor-pattern

As the RingBuffer up-front allocates objects of a given type, how can you use a single ring buffer to process messages of various different types?
You can't create new object instances to insert into the ring buffer, as that would defeat the purpose of up-front allocation.
So you could have 3 messages in an async messaging pattern:
NewOrderRequest
NewOrderCreated
NewOrderRejected
So my question is: how are you meant to use the Disruptor pattern for real-world messaging systems?
Thanks
Links:
http://code.google.com/p/disruptor-net/wiki/CodeExamples
http://code.google.com/p/disruptor-net
http://code.google.com/p/disruptor

One approach (our most common pattern) is to store the message in its marshalled form, i.e. as a byte array. For incoming requests, e.g. FIX messages, the binary message is quickly pulled off the network and placed in the ring buffer. The unmarshalling and dispatch of different types of messages are handled by EventProcessors (consumers) on that ring buffer. For outbound requests, the message is serialised into the preallocated byte array that forms the entry in the ring buffer.
If you are using a fixed-size byte array as the preallocated entry, some additional logic is required to handle overflow for larger messages: pick a reasonable default size, and if it is exceeded allocate a temporary array that is bigger. Then discard it when the entry is reused or consumed (depending on your use case), reverting back to the original preallocated byte array.
If you have different consumers for different message types, you can quickly identify whether your consumer is interested in a specific message either by knowing an offset into the byte array that carries the type information, or by passing a discriminator value through on the entry.
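A minimal sketch of that idea, assuming the Java LMAX Disruptor API (RingBuffer.next/get/publish, EventFactory); the MessageEvent class, its field names and the 512-byte default size are illustrative, not part of the library:

    import com.lmax.disruptor.EventFactory;
    import com.lmax.disruptor.RingBuffer;

    // Entry type preallocated by the ring buffer: a fixed-size byte array
    // plus a discriminator and a length, so one buffer carries all message types.
    public final class MessageEvent
    {
        public static final int DEFAULT_SIZE = 512;

        public byte type;                            // e.g. 1 = NewOrderRequest, 2 = NewOrderCreated, 3 = NewOrderRejected
        public int length;                           // number of valid bytes in 'payload'
        public byte[] payload = new byte[DEFAULT_SIZE];

        // Passed to RingBuffer.createSingleProducer(FACTORY, bufferSize) so every
        // slot is allocated up front.
        public static final EventFactory<MessageEvent> FACTORY = MessageEvent::new;
    }

    // Producer side: claim a slot, serialise into the preallocated array, publish.
    class Publisher
    {
        void publish(RingBuffer<MessageEvent> ringBuffer, byte type, byte[] marshalled)
        {
            long seq = ringBuffer.next();
            try
            {
                MessageEvent event = ringBuffer.get(seq);
                event.type = type;
                event.length = marshalled.length;
                if (marshalled.length > event.payload.length)
                {
                    // Overflow case: temporarily use a bigger array; revert to the
                    // preallocated one when the entry is reused or consumed.
                    event.payload = new byte[marshalled.length];
                }
                System.arraycopy(marshalled, 0, event.payload, 0, marshalled.length);
            }
            finally
            {
                ringBuffer.publish(seq);
            }
        }
    }

A consumer can then check event.type and skip messages it is not interested in, which is the discriminator approach described above.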
Also, there is no rule against creating object instances and passing references (we do this in a couple of places too). You do lose the benefits of object preallocation; however, one of the design goals of the Disruptor was to give the user the choice of the most appropriate form of storage.

There is a library called Javolution (http://javolution.org/) that lets you define objects as structs with fixed-length fields like string[40] etc. that rely on byte buffers internally instead of variable-size objects. That allows the ring buffer to be initialized with fixed-size objects and thus (hopefully) contiguous blocks of memory that allow the cache to work more efficiently.
We use that for passing events/messages, and standard strings etc. for our business logic.
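A minimal sketch of such a struct, assuming Javolution's javolution.io.Struct API; the OrderEvent name and field layout are invented for illustration:

    import javolution.io.Struct;

    // A fixed-layout event backed by a byte buffer rather than separate objects,
    // so instances preallocated in the ring buffer occupy contiguous memory.
    public class OrderEvent extends Struct {
        public final Unsigned8  type    = new Unsigned8();     // message discriminator
        public final Unsigned32 orderId = new Unsigned32();
        public final UTF8String symbol  = new UTF8String(40);  // fixed-length string[40]
    }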

Back to object pools.
The following is a hypothesis.
If you have 3 types of messages (A, B, C), you can make 3 pre-allocated arrays of those. That creates 3 memory zones: A, B and C.
It's not as if there is only one cache line; there are many, and they don't have to be contiguous. Some cache lines will refer to something in zone A, others to B and others to C.
So the ring buffer entry can hold a single reference to a common ancestor or interface of A, B and C.
The problem is how to select the instance in the pools; the simplest approach is to give each pool the same length as the ring buffer. This implies a lot of wasted pooled objects, since only one of the 3 is ever used at any entry, e.g. ring buffer entry 1234 might be using message B[1234], but A[1234] and C[1234] are not used and unusable by anyone.
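A minimal sketch of that parallel-pool layout in plain Java; all class and field names are invented for illustration, and the pools are simply sized to the ring buffer as described:

    // Common ancestor of the three message types.
    interface Message {}
    final class NewOrderRequest  implements Message { /* fields */ }
    final class NewOrderCreated  implements Message { /* fields */ }
    final class NewOrderRejected implements Message { /* fields */ }

    // Ring buffer entry: a single reference that points into exactly one pool.
    final class PooledEntry
    {
        Message active;
    }

    final class MessagePools
    {
        final int size;
        final NewOrderRequest[]  requests;
        final NewOrderCreated[]  created;
        final NewOrderRejected[] rejected;

        MessagePools(int ringBufferSize)
        {
            size     = ringBufferSize;
            requests = new NewOrderRequest[size];
            created  = new NewOrderCreated[size];
            rejected = new NewOrderRejected[size];
            for (int i = 0; i < size; i++)
            {
                requests[i] = new NewOrderRequest();
                created[i]  = new NewOrderCreated();
                rejected[i] = new NewOrderRejected();
            }
        }

        // For a claimed sequence, the producer picks the pooled instance of the
        // type it needs; the other two instances at that index stay unused.
        Message claimRequestFor(long sequence)
        {
            return requests[(int) (sequence % size)];
        }
    }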
You could also make a super-entry with all 3 A+B+C instances inlined and indicate the type with some byte or enum. Just as wasteful in memory, but it looks a bit worse because of the fatness of the entry; for example, a reader only working on C messages will have less cache locality.
I hope I'm not too wrong with this hypothesis.

Related

Is it possible to have non-interleaved vertex data in libgdx?

From the code it looks like vertex data is always stored interleaved.
I am trying not to use interleaving, as I want to update one attribute every frame while 3-4 other attributes stay the same. Perhaps separating that attribute out into one tightly packed buffer will allow buffer sub-region changes to be sent to the GPU faster. Interleaved data will always transmit the entire buffer, I guess?
The same issue applies to instance attributes, which are technically vertex attributes.

AF_XDP: map `(SRC-IP, DST-IP, DST-Port)` to an index into `BPF_MAP_TYPE_XSKMAP`

I want to spawn multiple user-space processes with each one processing packets from a single source (triple of (SRC-IP, DST-IP, DST-Port)).
Because a lot of packets are going to pass through the AF_XDP kernel program and time is critical, I thought of a separate map in the kernel program which is populated by a user-space program beforehand.
This map defines a mapping from the previously mentioned triple to an index which is then used in bpf_redirect_map(&xsks_map, index, 0) to send packets to a specific socket in user-space.
My initial idea was to just concatenate src-ip, destination-ip and destination port into a (32 + 32 + 16)-bit value.
Is it possible to define maps with such a large key-size? Which map would be the best fit for this problem? Furthermore, is it possible to fill the map from user-space?
A struct as a key
There are several types of maps that can be used with eBPF. Some of them are generic (hash maps, arrays, ...) and some are very specific (redirect maps, sockmaps, ...).
The case you describe sounds like a perfect use case for a hash map. Such maps take a struct as a key, and another struct as a value. So you could have something like:
struct my_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t dst_port;
};
... and use it as a key. The value, in your case, would be the index for the xskmap, i.e. a simple integer. Hash maps are efficient for retrieving a value from a given key (no linear search as for an array), so you get good performance with that.
Key size for hash maps
There are no specific restrictions on the size of the keys or the values, as long as the size fits in a 32-bit integer :) (Note that there may be size restrictions in the case of hardware offload.)
Update from user space
It is perfectly doable to update a hash map from user space (some very specific map types may not allow it, but generic maps like arrays or hash maps are entirely fine). You would do it:
From the command line, with bpftool,
From a C program, with the helpers from libbpf,
In your own language.
In all three cases, the update itself is done through a call to the bpf() system call.

Atomic reads and writes of properties

This question has been asked before but I still don't understand it fully, so here it goes.
If I have a class with a property (a non-nullable double or int), can I read and write the property with multiple threads?
I have read somewhere that since doubles are 64 bits, it is possible to read a double property on one thread while it is being written on a different one. This will result in the reading thread returning a value that is neither the original value nor the newly written value.
When could this happen? Is it possible with ints as well? Does it happen in both 64-bit and 32-bit applications?
I haven't been able to replicate this situation in a console application.
If I have a class with a property (a non-nullable double or int), can I read and write the property with multiple threads?
I assume you mean "without any synchronization".
double and long are both 64 bits (8 bytes) in size, and are not guaranteed to be written atomically. So if you were moving from a value with byte pattern ABCD EFGH to a value with byte pattern MNOP QRST, you could potentially end up seeing (from a different thread) ABCD QRST or MNOP EFGH.
With properly aligned values of size 32 bits or lower, atomicity is guaranteed. (I don't remember seeing any guarantees that values will be properly aligned, but I believe they are by default unless you force a particular layout via attributes.) The C# 4 spec doesn't even mention alignment in section 5.5 which deals with atomicity:
Reads and writes of the following data types are atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types. In addition, reads and writes of enum types with an underlying type in the previous list are also atomic. Reads and writes of other types, including long, ulong, double, and decimal, as well as user-defined types, are not guaranteed to be atomic. Aside from the library functions designed for that purpose, there is no guarantee of atomic read-modify-write, such as in the case of increment or decrement.
Additionally, atomicity isn't the same as volatility - so without any extra care being taken, a read from one thread may not "see" a write from a different thread.
These operations are not atomic, that's why the Interlocked class exists in the first place, with methods like Increment(Int32) and Increment(Int64).
To ensure thread safety, you should use at least this class, if not more complex locking (with ReaderWriterLockSlim, for example, in case you want to synchronize access to groups of properties).

What value for length field for Freescale PowerPC Security Engine 2.0, when using Link Tables?

I am working on code to use the security engine of my MPC83XX with OpenSSL.
I can already encrypt/decrypt AES with up to 64 KByte of data.
The problem comes with data greater than 64 KByte, since the maximum value of the length bits is 65535.
I can assume the data is always in one piece in RAM.
So now I am collecting all the data in a Link Table, using the pointer to the table instead of the pointer to the data, and setting the J bit to 1.
Now I am not sure what value I should use for the length bits, since 0 would mean the Dword will be ignored.
The real length of the data is also too big for 16 bits.
http://cache.freescale.com/files/32bit/doc/app_note/AN2755.pdf?fpsp=1
Relevant information can be found in Chapter 8.
You set LENGTH to the length of the data. See Page 19:
For any sequence of data parcels accessed by a link table or chain of link tables, the combined lengths of the parcels (the sum of their LENGTH and/or EXTENT fields) must equal the combined lengths of the link table memory segments (SEGLEN fields). Otherwise the channel sets the appropriate error bit in the Channel Pointer Status Register...
I'm not sure what mode you're using (and the documentation seems unnecessarily confusing!) but for the usual cipher modes (CBC/CTR/CFB/OFB) the usual method is simply to chain AES invocations, reusing the same context. You might be able to do this by simply setting "Pointer Dword1" and "Pointer Dword5" to the same thing. There's very little documentation, though; I can't work out where it gets the IV from.

To pad or not to pad - creating a communication protocol

I am creating a protocol to have two applications talk over a TCP/IP stream and am figuring out how to design a header for my messages. Using the TCP header as an initial guide, I am wondering if I will need padding. I understand that when we're dealing with a cache, we want to make sure that data being stored fits in a row of cache so that when it is retrieved it is done so efficiently. However, I do not understand how it makes sense to pad a header considering that an application will parse a stream of bytes and store it how it sees fit.
For example: I want to send over a message header consisting of a 3-byte field followed by a 1-byte padding field for 32-bit alignment. Then I will send over the message data.
In this case, the receiver will just take 3 bytes from the stream, throw away the padding byte, and then start reading the message data. As I see it, he will not be storing the 3 bytes and the message data the way he wants. The whole point of byte alignment is so that it will be retrieved in an efficient manner. But if the retriever doesn't care about the padding, how will it be retrieved efficiently?
Without the padding, the retriever just takes the 3 header bytes from the stream and then takes the data bytes. Since the retriever stores these bytes however he wants, how does it matter whether or not the padding is done?
Maybe I'm missing the point of padding.
It's slightly hard to extract a question from this post, but with what I've said you guys can probably point out my misconceptions.
Please let me know what you guys think.
Thanks,
jbu
If word alignment of the message body is of some use, then by all means, pad the message to avoid other contortions. The padding will be of benefit if most of the message is processed as machine words with decent intensity.
If the message is a stream of bytes, for instance xml, then padding won't do you a whole heck of a lot of good.
As far as actually designing a wire protocol, you should probably consider using a plain text protocol with compression (including the header), which will probably use less bandwidth than any hand-designed binary protocol you could possibly invent.
I do not understand how it makes sense to pad a header considering that an application will parse a stream of bytes and store it how it sees fit.
If I'm a receiver, I might pass a buffer (i.e. an array of bytes) to the protocol driver (i.e. the TCP stack) and say, "give this back to me when there's data in it".
What I (the application) get back, then, is an array of bytes which contains the data. Using C-style tricks like "casting" and so on I can treat portions of this array as if it were words and double-words (not just bytes) ... provided that they're suitably aligned (which is where padding may be required).
Here's an example of a statement which reads a DWORD from an offset in a byte buffer:
DWORD getDword(const byte* buffer)
{
    //we want the DWORD which starts at byte-offset 8
    buffer += 8;
    //dereference as if it were pointing to a DWORD
    //(this would fail on some machines if the pointer
    //weren't pointing to a DWORD-aligned boundary)
    return *((DWORD*)buffer);
}
Here's the corresponding function in Intel assembly; note that it's a single opcode, i.e. quite an efficient way to access the data, more efficient than reading and accumulating separate bytes:
mov eax,DWORD PTR [esi+8]
One reason to consider padding is if you plan to extend your protocol over time. Some of the padding can be intentionally set aside for future assignment.
Another reason to consider padding is to save a couple of bits in length fields: if the length is always a multiple of 4 or 8, you can transmit length/4 or length/8 and save 2 or 3 bits off the length field.
One other good reason that TCP has padding (which probably does not apply to you) is that it allows dedicated network processing hardware to easily separate the data from the header. As the data always starts on a 32-bit boundary, it's easier to separate the header from the data when the packet gets routed.
If you have a 3-byte header and align it to 4 bytes, then designate the unused byte as 'reserved for future use' and require its bits to be zero (rejecting messages where they are not as malformed). That leaves you some extensibility. Or you might decide to use the byte as a version number, initially zero, and then incrementing it if (when) you make incompatible changes to the protocol. Don't let the value be 'undefined' and "don't care"; you'll never be able to use it if you start out that way.
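As a small illustration of that layout, here is a hedged Java sketch (the field names, sizes and the use of the reserved byte as a version number are invented for this example) that writes and reads a 4-byte header: three header bytes followed by one reserved/version byte:

    import java.nio.ByteBuffer;

    public final class HeaderExample
    {
        static final byte PROTOCOL_VERSION = 0;   // reserved byte doubles as a version number

        // Pack a 3-byte header (type, flags, channel) plus the reserved byte,
        // then the payload, so the body starts on a 4-byte boundary.
        static ByteBuffer encode(byte type, byte flags, byte channel, byte[] payload)
        {
            ByteBuffer out = ByteBuffer.allocate(4 + payload.length);
            out.put(type).put(flags).put(channel).put(PROTOCOL_VERSION);
            out.put(payload);
            out.flip();
            return out;
        }

        static byte[] decode(ByteBuffer in)
        {
            byte type    = in.get();
            byte flags   = in.get();
            byte channel = in.get();
            byte version = in.get();
            if (version != PROTOCOL_VERSION)
            {
                // reject malformed or incompatible messages, as suggested above
                throw new IllegalArgumentException("unsupported or malformed message");
            }
            byte[] payload = new byte[in.remaining()];
            in.get(payload);
            return payload;
        }
    }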
