Is it possible to use the Disruptor/Ringbuffer pattern without using a fixed-length array?

We have a Disruptor implementation that uses a fixed-length array. Is it possible to implement a version of the pattern that does not rely on this array, but instead contains (possibly self-describing) lists of variable-length objects? For example, a ring buffer of Protobuf objects?
I'm aware that the fixed-length array exists for the "pre-allocation" step, but I think that step could be approximated with one or more object pools.

The ring buffer in the Java version of the Disruptor is an array of references to objects. You can put whatever objects you want in there through the EventFactory instance you create.
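For example, here is a minimal sketch of a holder type for Protobuf payloads. EventFactory is the real Disruptor interface; ProtobufEvent and its methods are hypothetical names for illustration:

import com.lmax.disruptor.EventFactory;

// One ProtobufEvent instance is pre-allocated per ring buffer slot and
// reused; only its payload reference changes from event to event.
public class ProtobufEvent {
    private Object payload; // e.g. a com.google.protobuf.Message

    public void setPayload(Object payload) { this.payload = payload; }
    public Object getPayload() { return payload; }

    // The factory the Disruptor calls once per slot at construction time.
    public static final EventFactory<ProtobufEvent> FACTORY = ProtobufEvent::new;
}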

It's definitely possible to implement a version of the Disruptor that isn't backed by an Object array, but it won't be high performance. A lot of thought and mechanical sympathy has gone into the design and implementation of the LMAX Exchange Disruptor.
Essentially, the ring buffer is a pre-allocated object pool. In my experience, I've never had to worry about managing the ring buffer's resources directly in any real-world code. The Disruptor will automatically apply back pressure when necessary.
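To make the pooling concrete, here is a hedged sketch of publishing through the pre-allocated slots, reusing the hypothetical ProtobufEvent holder from above (next(), get() and publish() are real RingBuffer methods):

import com.lmax.disruptor.RingBuffer;

// Assuming a ringBuffer obtained from a started Disruptor and some payload:
void publish(RingBuffer<ProtobufEvent> ringBuffer, Object message) {
    long sequence = ringBuffer.next();   // claims the next slot; blocks when the
                                         // buffer is full -- that's the back pressure
    try {
        ProtobufEvent event = ringBuffer.get(sequence);
        event.setPayload(message);       // reuse the pre-allocated holder in place
    } finally {
        ringBuffer.publish(sequence);    // make the slot visible to consumers
    }
}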
The library provides a nice DSL to construct a dependency graph for your application and basically gives you parallelism for free.
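A rough sketch of what that DSL looks like, again reusing the hypothetical ProtobufEvent (Disruptor, handleEventsWith and then are real API; the handlers here are trivial stand-ins):

import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import java.util.concurrent.Executors;

public class Pipeline {
    public static void main(String[] args) {
        // Trivial stand-in handlers; real ones would journal, replicate, etc.
        EventHandler<ProtobufEvent> journal   = (e, seq, end) -> System.out.println("journal " + seq);
        EventHandler<ProtobufEvent> replicate = (e, seq, end) -> System.out.println("replicate " + seq);
        EventHandler<ProtobufEvent> logic     = (e, seq, end) -> System.out.println("logic " + seq);

        Disruptor<ProtobufEvent> disruptor =
                new Disruptor<>(ProtobufEvent.FACTORY, 1024, Executors.defaultThreadFactory());

        // journal and replicate run in parallel; logic sees an event only
        // after both have processed it.
        disruptor.handleEventsWith(journal, replicate).then(logic);

        RingBuffer<ProtobufEvent> ringBuffer = disruptor.start();
    }
}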

Related

What is the difference between a data transfer object (DTO) and a representation object in the domain-driven design pattern?

I know a DTO is returned by the server side and received by the client side, but I am confused by the representation object in DDD. I think they are almost the same. Can someone tell me their differences?
They solve different problems in different contexts.
Data transfer is a boundary concern - how do we move information from here to there (across a remote interface)? Among the issues you may run into: the transfer of information is slow, or expensive. One way of keeping this under control is to move information in a larger grain.
the main reason for using a Data Transfer Object is to batch up what would be multiple remote calls into a single call -- Martin Fowler, Patterns of Enterprise Application Architecture
In other words, a DTO is your program's representation of a fat message.
In DDD, the value object pattern is a modeling concern; it is used to couple immutable representations of information and related computations.
A DTO tends to look like a data structure, with methods that can be used to transform that data structure into a representation (for example: an array of bytes) that can be sent across a boundary.
A value object tends to look like a data structure, with methods that can be used to compute other information that is likely to be interesting in your domain.
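To make the contrast concrete, here is a hedged sketch with hypothetical types: the DTO is shaped for crossing a boundary, the value object for domain computation.

import java.nio.charset.StandardCharsets;

// Boundary concern: a bag of fields plus a way to turn it into bytes.
final class MoneyDto {
    public long amountInCents;
    public String currencyCode;

    public byte[] toBytes() { // shaped for transport across the wire
        return (amountInCents + " " + currencyCode).getBytes(StandardCharsets.UTF_8);
    }
}

// Domain concern: immutable, and carries the computations the model cares about.
final class Money {
    private final long amountInCents;
    private final String currencyCode;

    Money(long amountInCents, String currencyCode) {
        this.amountInCents = amountInCents;
        this.currencyCode = currencyCode;
    }

    Money add(Money other) {
        if (!currencyCode.equals(other.currencyCode))
            throw new IllegalArgumentException("currency mismatch");
        return new Money(amountInCents + other.amountInCents, currencyCode);
    }
}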
DTOs tend to be more stable (or at least backwards compatible) out of necessity -- because producer and consumer are remote from one another, coordinating a change to both requires more effort than a single local change.
Value objects, in contrast, are easier to change because they are a domain model concern. If you want to change the model, that's just one thing, and correspondingly easier to coordinate.
(There's kind of a hedge here - for systems that need persistence, we need some way to get the information out of the object and into a representation that can be stored and retrieved. That's not necessarily a value object concern, especially if you are willing to use general-purpose data structures to move information in and out of "the model".)
In the kingdom of nouns, the lines can get blurry - partly because any information that isn't a general-purpose data structure/primitive is "an object", and partly because you can often get away with using the same objects for your internal concerns and your boundary concerns.

Managing complex state in FP

I want to write a simulation of a multi-entity system. I believe such systems motivated the creation of Simula and OOP, where each object maintains its own state and the runtime manages the entire system (e.g. stops threads, serializes data).
On the other hand, I would like the ability to rewind, change the simulation parameters, and compare the results. Thus, immutability sounds great (at least up to the almost certain garbage collection issues caused by keeping track of possibly redundant data).
However, I don't know how to model this. Does this mean that I must put every interacting entity into a single, huge structure, where each object update requires locating the object first?
I'm worried that such an approach would hurt performance badly, because of GC overhead and constant structure traversals, as opposed to keeping each entity at one fixed address in memory.
UPDATE
To clarify, this question asks whether there is any design option available other than creating a single structure that contains all possibly interacting entities as a root. Intuitively, such a structure would imply a logarithmic penalty per update, unless updates are "clustered" somehow to amortize the cost.
Is there a known system where interactions could be modelled differently? For example, like in cold/hot data storage optimization?
After some research, there seems to be a connection with N-body simulation, where systems can be clustered, but I'm not familiar with it yet. Even so, would that also mean I need to have a single structure of clusters?
While I agree with the people commenting that this is a vague question, I'll still try to address some of the issues put forth.
It's true that there's some performance overhead from immutability, because when you use mutable state, you can update some values in-place, whereas with immutable state, some copying has to take place.
It is, however, a common misconception that this causes problems with big 'object' graphs. It doesn't have to.
Consider a Haskell data structure:
data BigDataStructure = BigDataStructure {
      bigChild1 :: AnotherBigDataStructure
    , bigChild2 :: YetAnotherBigDataStructure
    -- more elements go here...
    , bigChildN :: Whatever }
    deriving (Show, Eq)
Imagine that each of these child elements is big and complex itself. If you want to change, say, bigChild2, you could write something like:
updatedValue = myValue { bigChild2 = updatedChild }
When you do that, some data copying takes place, but it's often less than most people think. This expression does create a new BigDataStructure record, but it doesn't 'deep copy' any of its values. It just reuses bigChild1, updatedChild, bigChildN, and all the other values, because they're immutable.
In theory (but we'll get back to that in a minute), the flatter your data structures are, the more data sharing should be enabled. If, on the other hand, you have some deeply nested data structures and you need to update the leaves, you'll need to create a copy of the immediate parents of those leaves, plus the parents of those parents, and their parents as well, all the way to the root. That might be expensive.
That's the theory, though, and we've known for decades that it's impractical to try to predict how software will perform. Instead, try to measure it.
While the OP suggests that significant data is involved, the question doesn't state how much, nor does it state the hardware specs of the system that's going to run the simulation. So, as Eric Lippert explains so well, the person who can best answer questions about performance is you.
P.S. It's my experience that when I start to encounter performance problems, I need to get creative with how I design my system. Efficient data structures can address many performance issues. This is just as much the case in OOP as it is in FP.

Re-use Hazelcast serialization logic in a MapStore implementation

I have implemented a custom serializer for my POJOs using IdentifiedDataSerializable so that I can maintain fine-grained control as classes evolve and new fields need to be added/removed. For some collections I also need persistence, and I have implemented a MapStore using an embedded key-value store. My problem is that I would like to re-use the IdentifiedDataSerializable serialization in the MapStore implementation, both to leverage code re-use and to ensure class evolution in the future.
I have tried to obtain a reference to the internal Hazelcast SerializationService, but it's not readily available via the "public" APIs in HazelcastInstance. In fact I was unable to figure out any decent way of getting hold of it.
What I would really like is a version of the MapStore interface that works at the byte-array level, where Hazelcast handles the serialization/deserialization before invocation. A new feature request, I guess...
Any ideas on how to solve this in the meantime are welcome.
Breaking your question into two parts:
1. How to avoid having to maintain different serialization behaviors for Hazelcast and a persistent store.
2. How to avoid the overhead of an extra serialization/deserialization step when going between the Hazelcast machinery and the persistence machinery.
For the first part:
I've had great results using Jackson both for Hazelcast serialization and for persistence of the sort needed to implement a MapStore. This allows me to use the same serialization logic consistently, without having to worry about different semantics for different settings.
Fuad Malikov's blog entry comparing various kinds of Hazelcast serialization has an example of how to use Jackson binary serialization. You can use the same logic in your MapStore implementation. Even better, you can use a different ObjectMapper for the latter that writes JSON instead of the binary Smile format -- the serialization semantics are otherwise identical, but the binary format is more compact and faster to read and write, which is what you want for Hazelcast. I've found it very useful to be able to persist human-readable data in the persistent store, for which I'm less concerned about space and time efficiency and more concerned about maintainability. (I've actually been able to "patch" data in production by editing the raw JSON, not something I would ever consider doing with a binary format!)
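As a hedged sketch of that setup (ObjectMapper and SmileFactory are real Jackson classes from jackson-databind and jackson-dataformat-smile; the wrapper class is hypothetical):

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;
import java.io.IOException;

public class Mappers {
    // Binary Smile for Hazelcast: compact and fast to read and write.
    static final ObjectMapper SMILE = new ObjectMapper(new SmileFactory());
    // Plain JSON for the persistent store: human-readable and patchable.
    static final ObjectMapper JSON = new ObjectMapper();

    static byte[] toHazelcastForm(Object pojo) throws IOException {
        return SMILE.writeValueAsBytes(pojo);
    }

    static byte[] toStoreForm(Object pojo) throws IOException {
        return JSON.writeValueAsBytes(pojo);
    }
}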
Jackson is very flexible when it comes to adding and removing fields, thanks to its @JsonAnyGetter and @JsonAnySetter annotations, which allow you to handle unrecognized fields gracefully.
There's also the ability to "inject" serialization behavior into classes over which you don't have control.
For the second part:
As you've noticed, Hazelcast doesn't support a MapStore variant that works exclusively in terms of the serialized form, so each load involves a deserialization followed by a serialization into the Hazelcast structure. In my experience and reading, highly optimized serialization libraries like Jackson and Kryo are so fast that this extra work is entirely dominated by the cost of reading from the persistent store.
If it's important enough to you to avoid that extra step, you could use MapInterceptors on an IMap<String, byte[]> to perform the deserialization in interceptGet and the serialization in interceptPut. I don't recommend it, though, because you're then stuck recreating functionality that Hazelcast provides out of the box.
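For completeness, a rough sketch of that interceptor idea (MapInterceptor and its callbacks are real Hazelcast API; the Codec helper is a hypothetical stand-in for your serialization logic):

import com.hazelcast.map.MapInterceptor;

public class CodecInterceptor implements MapInterceptor {
    @Override
    public Object interceptGet(Object value) {
        // The stored value is a byte[]; materialize the POJO on the way out.
        return value == null ? null : Codec.fromBytes((byte[]) value);
    }

    @Override
    public Object interceptPut(Object oldValue, Object newValue) {
        // Serialize the POJO to byte[] before it reaches the map.
        return Codec.toBytes(newValue);
    }

    @Override
    public Object interceptRemove(Object removedValue) { return removedValue; }

    @Override public void afterGet(Object value) { }
    @Override public void afterPut(Object value) { }
    @Override public void afterRemove(Object value) { }
}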

Core Data - Are primitive setters / getters faster? When not to use?

From Apple's Core Data Programming Guide:
Core Data dynamically generates efficient public and primitive get and set attribute accessor methods and relationship accessor methods for managed object classes.
...
Primitive accessor methods are similar to "normal" or public key-value coding compliant accessor methods, except that Core Data uses them as the most basic data methods to access data, consequently they do not issue key-value access or observing notifications. Put another way, they are to primitiveValueForKey: and setPrimitiveValue:forKey: what public accessor methods are to valueForKey: and setValue:forKey:.
I would then expect the primitive accessor methods to perform better than the public accessors, since they do not trigger KVO notifications. Is there a way to effectively test this theory with Time Profiler? (Surely it can't be as easy as putting the two calls in their own for-loops that iterate a zillion times and comparing the results...)
Obviously the primitive accessors aren't to be called by objects or functions outside of the Managed Object subclass, but when shouldn't you use them from within the class?
edelaney05,
As you appear to know, Core Data depends upon the KVC/KVO features of Objective-C. Yes, you are correct that the path length is slightly longer through the accessors. What of it? Performance of Core Data is dominated by the performance of the I/O subsystem.
IOW, tuning your fetch request is much more important than avoiding the accessor overhead. Can you do what you're proposing? Yes. Should you? No. You should, IMO, focus upon how to get your data into a MOC efficiently and then refine it with predicates and other filter techniques. Learning how to use the various key path operators and the predicate language after the fetch is very important to writing performant Core Data code. Only once Instruments documents that you are spending an appreciable amount of time in the accessors would I consider your strategy of avoiding them.
In answer to your specific question, you should generally restrict your use of the primitive accessors to within your reimplementation of the public accessors. Sticking with accessors for all of your code then becomes your standard pattern. This gives you the long term engineering benefit of having the ability to associate arbitrary behavior with any property. Finally, if you can use the various key path and set operators, then the CD team has already optimized those access patterns. They are quite performant.
Andrew

Are there any good patterns for handling lists of entities?

In "DDD" what is the best patterns for handling different versions of your entities, e.g. Entities in a list vs the full object. I would like to avoid the overhead of getting properties I do not need when displaying the entities in a list
Would you have a separate entity type used in lists, or just partially populate your full entity type?
Would you use inheritance?
I understand your urge to create "views" of models in the domain, but would recommend against it. Personally, I use the entire entity inside of the domain, regardless of the situation. The entity is the entity, and anything less or more just does not feel clean. That does not mean that I can't use a reference to the entity to help focus my use of the items in the list, though.
The entity does not cross the domain boundary in my implementation. Instead, I return a type of DTO and have application services that can abstract a view from it. This allows, for example, a presenter to generate the correct view model from a DTO and provide it to the view. I don't know if you are talking about operations in the domain services or in the application services, but there are a couple of things you can do that could be applied to either (or both).
You can do certain things to reduce the performance penalty of working with the entire entity in the domain layers, as well. One thing to look at is implementing some sort of cache-aside mechanism, as sketched below. When an entity is requested, check to see if it is cached. If it is, return the cached version. If it isn't, pull it and then cache it before returning. When the entity is updated, evict it from the cache and then do your update. I have purposely written my concrete repository implementations to be cache-aware to facilitate this. One other benefit of an approach like this is that it favors fine-grained operations. While that seems illogical at first, if entities are commonly "gotten" from your data store, it is easy to set up some logging to measure the ratio of cache hits to cache misses.
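Here is a minimal sketch of that cache-aside repository shape (all of the types and names are hypothetical):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

interface Customer { String getId(); }
interface CustomerStore { Customer load(String id); void save(Customer customer); }

public class CachingCustomerRepository {
    private final ConcurrentMap<String, Customer> cache = new ConcurrentHashMap<>();
    private final CustomerStore store;

    public CachingCustomerRepository(CustomerStore store) { this.store = store; }

    public Customer getById(String id) {
        Customer cached = cache.get(id);
        if (cached != null) return cached;   // cache hit
        Customer loaded = store.load(id);    // cache miss: pull it...
        cache.put(id, loaded);               // ...and cache it before returning
        return loaded;
    }

    public void update(Customer customer) {
        cache.remove(customer.getId());      // evict, then do the update
        store.save(customer);
    }
}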
Coming full circle, to your question... Most lists I deal with are small, so I incur the penalty of loading up the entity in its entirety. Assuming that most use cases will involve the user drilling into one or more of the items, they are pre-cached because of the cache-aside implementation. The number of items is fluid, but I generally apply this approach to anything less than twenty five entities in a list.
For larger lists, I just use IDs. Most likely, the use case here is some sort of search result. Search results are commonly paged, for example, and that does not fit the above pattern. Instead, I use the larger list of IDs as a sliding window over the entities I am interested in, which I then pass to a GetRangeById() method that all of my repositories have - written to purposely take a list of identifiers and load them one at a time so they are cached. In essence, this takes a larger lightweight list and zeroes in on just the area I am interested in at a given point in time.
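Continuing the hypothetical repository sketch above, a GetRangeById() along these lines loads a window of the ID list one entity at a time, so each one ends up individually cached:

// Added to the CachingCustomerRepository sketch (uses java.util.List and
// java.util.stream.Collectors):
public List<Customer> getRangeById(List<String> ids, int offset, int pageSize) {
    return ids.stream()
              .skip(offset)         // slide the window into the larger ID list
              .limit(pageSize)
              .map(this::getById)   // one at a time, so each entity gets cached
              .collect(Collectors.toList());
}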
With an approach like this, the important thing to realize is that it is highly scalable. It might not baseline as fast as a non-cached approach with small sets of data, but it will perform better with larger sets. There is an implied performance overhead to this style of operation, but it degrades at a slower rate than a standard "load 'em up" pattern.
You can use the CQRS pattern to separate query processing from command processing, and you can do it even on a single database. In such a case you would map your view models directly to the tables in the database (via NHibernate, for example). Commands (writes) would go through the real domain model and would be persisted in the DB. Queries (like "get me a list of entities") would bypass the domain and go straight to the DB. There is no point in querying domain objects, because you don't actually invoke any business logic on them; you're just retrieving some data.
You can also extend this solution to full-featured CQRS by having separate stores for command side and for query side. Query side would be synchronized by means of replication or pub/sub messaging.
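A hedged sketch of that single-database split, with all types hypothetical: the query side maps rows straight to a view model, while the command side goes through the domain.

import java.util.List;

// Query side: read-optimized rows mapped 1:1 to a table or view; no domain logic.
record EntitySummary(long id, String name) { }

interface EntitySummaryQueries {
    List<EntitySummary> listForDisplay(int offset, int pageSize); // straight SQL/ORM
}

// Command side: the full domain model, where the business logic lives.
interface Entity { void rename(String newName); }
interface EntityRepository { Entity byId(long id); void save(Entity entity); }

class RenameEntityHandler {
    private final EntityRepository repository;
    RenameEntityHandler(EntityRepository repository) { this.repository = repository; }

    void handle(long id, String newName) {
        Entity entity = repository.byId(id);  // load through the domain model
        entity.rename(newName);               // invoke the business logic
        repository.save(entity);              // persist the write
    }
}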
