My example use case:
I have an IMap with an old membership list (where members' addresses are stored as attributes of the member-object) and an IMap with valid addresses from an alternate member list. I know that my old member IMap has older information and that the alternate address list has up-to-date address information (such as 4-digit ZIP code extensions).
I want to visit each member entry in the old member IMap and create a new address-object from the member-object. Each unique address will eventually be stored in a database, so I don't want duplicate address objects. I want to store each unique address in an IMap of valid addresses.
If I iterate over the old member IMap, I get a ClassNotFoundException when I try to put a new address object into the new valid-addresses IMap.
If I'm creating new objects while visiting objects in one IMap, how do I collect them in another IMap?
Entry Processors are intended to work only on the entry they're submitted to (the target of the executeOnKey() method). This allows for a number of optimizations, as the entry processor runs on an operation thread dedicated to the partition holding the data. If the Entry Processor were to access data in another partition, locks would be required and a deadlock could occur. If the other entry were on a different cluster node, this could also adversely impact the performance of the Entry Processor, which should be a quick operation.
The executeOnKey() method returns an Object; you could code your Entry Processor to return the new address object, and the client could then put this object into the appropriate map (which might be on a different cluster node).
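For illustration, a hedged sketch of that approach (assuming Hazelcast 4.x APIs; Member, Address and Address.fromMember() are hypothetical stand-ins for the classes in the question):

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.EntryProcessor;
import com.hazelcast.map.IMap;
import java.util.Map;

public class AddressExtraction {

    // Runs on the partition thread; it only reads the target entry and
    // returns the derived address to the caller.
    static class ExtractAddressProcessor
            implements EntryProcessor<String, Member, Address> {
        @Override
        public Address process(Map.Entry<String, Member> entry) {
            return Address.fromMember(entry.getValue());
        }
    }

    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, Member> oldMembers = hz.getMap("old-members");
        IMap<Address, Address> validAddresses = hz.getMap("valid-addresses");

        for (String key : oldMembers.keySet()) {
            // The processor returns the new object; the client (not the
            // processor) puts it into the second map, so no cross-partition
            // access happens inside the Entry Processor.
            Address address = oldMembers.executeOnKey(key, new ExtractAddressProcessor());
            validAddresses.putIfAbsent(address, address); // de-duplicates
        }
    }
}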
This is also something that might be appropriate for Hazelcast Jet; Jet could use one map as a source and the other map as a sink and perform the appropriate operations as part of its pipeline.
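A rough idea of what such a pipeline could look like (assuming Hazelcast Jet 4.x; Member, Address and Address.fromMember() are again hypothetical):

import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.Util;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class AddressPipeline {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create();
        pipeline.readFrom(Sources.<String, Member>map("old-members"))
                .map(entry -> {
                    // Key the output entry by the address itself, so
                    // duplicates collapse onto the same map key.
                    Address address = Address.fromMember(entry.getValue());
                    return Util.entry(address, address);
                })
                .writeTo(Sinks.map("valid-addresses"));

        JetInstance jet = Jet.newJetInstance();
        jet.newJob(pipeline).join();
    }
}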
The correct way to deal with this situation (affecting items in a second map) is to use an EventListener.
I implemented an EventListener for the old members map that, whenever a member in the old members map is updated, adds an address-object to the address map if it doesn't already exist.
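A minimal sketch of that listener (again assuming Hazelcast 4.x and the hypothetical Member/Address classes):

import com.hazelcast.core.EntryEvent;
import com.hazelcast.map.IMap;
import com.hazelcast.map.listener.EntryUpdatedListener;

public class AddressCollectingListener
        implements EntryUpdatedListener<String, Member> {

    private final IMap<Address, Address> validAddresses;

    public AddressCollectingListener(IMap<Address, Address> validAddresses) {
        this.validAddresses = validAddresses;
    }

    @Override
    public void entryUpdated(EntryEvent<String, Member> event) {
        Address address = Address.fromMember(event.getValue());
        // putIfAbsent keeps the valid-addresses map free of duplicates.
        validAddresses.putIfAbsent(address, address);
    }
}

It is registered with includeValue = true so that event.getValue() is populated:

oldMembers.addEntryListener(new AddressCollectingListener(validAddresses), true);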
Jet does seem like a great way to deal with exactly this kind of process model, but I haven't implemented it yet to verify how it works in my use case.
Spring Session provides a Hazelcast4IndexedSessionRepository that stores the session data (represented as MapSession) in a Hazelcast cache. Instead of returning the MapSession directly from the repository, it uses its own wrapper class HazelcastSession (both MapSession and HazelcastSession implement the same Session interface). One reason is surely so that it can implement the different flush and save modes and support updating the principal-name index. But it also remembers all changes to the session as deltas, and when the save() method on the repository is called, it uses a Hazelcast4SessionUpdateEntryProcessor to update the corresponding entry of the Hazelcast IMap.
Why does the session repository not just set the MapSession object on the IMap directly via put without using an EntryProcessor? What is the benefit of the current implementation of recording the change deltas?
To my understanding of the Hazelcast EntryProcessor documentation, an entry processor is useful when a map entry should be updated often without the client having to retrieve the existing value first. Instead of first getting the old value (which might require a network round-trip), the entry processor can be executed directly on the Hazelcast member that holds the data.
But in the case of Spring Session, the session data is loaded from the Hazelcast map at the beginning of each incoming web request anyway (or at the latest when the application code wants to read/modify the session content) and then held in local memory. All changes to the session during the processing of such a request are made to the local session object, and it is then saved back to the Hazelcast cache when the request ends (or earlier, depending on the flush/save mode). That means the saving can be done without an extra get request on the IMap first. So why not just call map.put(MapSession) instead of using an EntryProcessor to update only the attributes noted in the delta list?
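For illustration, here is roughly the contrast as I understand it: a plain map.put(sessionId, mapSession) replaces the whole stored value, whereas a delta-applying entry processor along these lines (a conceptual sketch only, not the actual spring-session implementation) merges only the changed attributes into the stored session:

import com.hazelcast.map.EntryProcessor;
import java.util.Map;
import org.springframework.session.MapSession;

public class ApplyDeltaProcessor
        implements EntryProcessor<String, MapSession, Object> {

    // Only the attributes changed during this request; the map must be
    // serializable so the processor can be shipped to the member.
    private final Map<String, Object> delta;

    public ApplyDeltaProcessor(Map<String, Object> delta) {
        this.delta = delta;
    }

    @Override
    public Object process(Map.Entry<String, MapSession> entry) {
        MapSession session = entry.getValue();
        if (session == null) {
            return null; // session expired or was removed concurrently
        }
        delta.forEach(session::setAttribute);
        entry.setValue(session); // write the merged session back
        return null;
    }
}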
The only explanation I can think of is an attempt to minimize concurrent modification of the same attributes. By saving only the deltas in the EntryProcessor instead of storing the whole MapSession that was loaded earlier, overwriting an attribute value that was modified concurrently in a parallel process becomes less likely. But the chance is not zero. In particular, if my application code stores and updates the same couple of attributes in the session all the time, the update will not be safe even with the EntryProcessor, because no optimistic locking scheme is in place.
Thanks for the insight!
I have a Rental entity that is an aggregate root. Among other things it maintains a list of Allocations (chunks of time that is reserved).
How do I add a new allocation? Since Rental is the aggregate root, any new allocation should go through it, but it is impossible to say whether a rental can be allocated before we try to save the allocation in the database. Another user could have reserved it in the meantime. I'm guessing I should use a Domain Service for this?
I would hate to have to inject anything every time I need a new Rental, but what is the difference between injecting a Domain Service instead of a Repository, other than the terminology being different?
When and why should I use a domain service?
You use a domain service to allow an aggregate to run queries. Tax calculation is an example that shows up from time to time. The aggregate passes some state to the calculator, the calculator reports the tax, and the aggregate decides what to do with that information (ignore it, reject the update that needs it, etc.).
Running the query doesn't modify the domain service instance in any way, so you can repeat queries as often as you like without worrying that the calculations contaminate each other.
Think read only service provider.
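A small sketch of that shape (hypothetical names):

import java.math.BigDecimal;

// Pure query: calling it never changes the calculator's state.
interface TaxCalculator {
    BigDecimal taxFor(BigDecimal amount, String region);
}

class Order {
    private final BigDecimal total;
    private BigDecimal tax;

    Order(BigDecimal total) {
        this.total = total;
    }

    // The aggregate passes some of its state out, gets an answer back,
    // and decides for itself what to do with it.
    public void priceWith(TaxCalculator calculator, String region) {
        this.tax = calculator.taxFor(total, region);
    }
}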
Since Rental is the aggregate root, any new allocation should go through it, but it is impossible to say whether a rental can be allocated before we try to save the allocation in the database. Another user could have reserved it in the meantime. I'm guessing I should use a Domain Service for this?
No - completely the wrong use case.
If an allocation is part of the Rental aggregate, then it's fine to have the Rental aggregate create allocations of its own. You don't need a service for that (you could, potentially, delegate the work to a factory if you like separation of concerns).
If "another user could have reserved that allocation in the meantime", then you have contention -- two users trying to change the same aggregate at the same time. This is normally managed in one of two ways.
Locking: you only let one user at a time modify the Rental aggregate. So in a data race, the loser has to wait for the winner to finish, then the aggregate can reject the loser's command because that particular allocation is already taken.
Optimistic concurrency: you allow both users to modify different copies of the aggregate at the same time, but the save is only allowed if the original state is unchanged. Think "compare and swap"; the race is in the save, between these two instructions (a fuller sketch follows below):
state.compareAndSwap(originalState, loserState)
state.compareAndSwap(originalState, winnerState)
The winner's compare-and-swap succeeds, but the loser's fails (because originalState != winnerState), and so the loser's modification is rejected.
Either way, only one write to your database reserving the allocation is allowed.
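For the optimistic flavour, a minimal sketch of a guarded save, with the compare-and-swap expressed as a conditional UPDATE on a version column (hypothetical names; serialize() is a placeholder for however the state is persisted, and the exception is Spring's):

import org.springframework.dao.OptimisticLockingFailureException;
import org.springframework.jdbc.core.JdbcTemplate;

public class JdbcRentalRepository {

    private final JdbcTemplate jdbcTemplate;

    public JdbcRentalRepository(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public void save(Rental rental, long expectedVersion) {
        // The WHERE clause is the compare; the UPDATE is the swap.
        int rows = jdbcTemplate.update(
                "UPDATE rental SET state = ?, version = version + 1"
                        + " WHERE id = ? AND version = ?",
                serialize(rental), rental.getId(), expectedVersion);
        if (rows == 0) {
            // The loser of the race lands here and is rejected.
            throw new OptimisticLockingFailureException(
                    "Rental was modified concurrently");
        }
    }

    private byte[] serialize(Rental rental) {
        throw new UnsupportedOperationException("sketch only");
    }
}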
If I understand you correctly, you're saying that in this case it would be okay to use a repository from inside the Rental domain entity?
No, you shouldn't need to - the allocation, being part of the Rental aggregate, gets created by the aggregate in memory, and first appears in your data store when the aggregate is saved.
Why use aggregates at all, if everything of consequence has to be extracted into surrounding code or factories?
Some of the answer here is separation of concerns - the primary concern of the aggregate is enforcing the business invariant: ensuring that creating an allocation with some specific state is consistent with everything else going on. The factory is responsible for ensuring that the created object is wired up correctly.
To use your example: the factory would have responsibility for creating the allocation in memory, but would not need to know anything about making sure that the allocation is unique. The rules to ensure that the allocation is unique are described and enforced by the aggregate.
Use a static factory method to create a Rental object.
public static class RentalFactory
{
    public static Rental CreateRental()
    {
        // Wire up the domain service the aggregate depends on.
        var allocationSvc = new RentalAllocationService();
        return new Rental(allocationSvc);
    }
}
Repositories should only be concerned with persistence to the underlying store.
A domain service's primary concern is carrying out some behavior involving entities or value objects.
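As a contrast sketch (hypothetical names; TimeSlot stands in for whatever an allocation reserves):

import java.util.UUID;

// A repository: persistence in and out, nothing else.
interface RentalRepository {
    Rental findById(UUID id);
    void save(Rental rental);
}

// A domain service: behavior involving domain objects, no persistence.
interface RentalAllocationService {
    boolean canAllocate(Rental rental, TimeSlot requested);
}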
I'm creating an application that will sort items in the DB in order to create a selective process.
This process will consist of Users and Registrations in courses.
Each course will have Users in it, and the sort method used to select them will vary from course to course.
I'm implementing a way of 'simulating' the position of a user in a course, without matriculating them in the course, so that they can 'know' their position before entering the selection process.
To do so, I imagined that I could use the same logic used after the user has already registered: sort in the DB, return the list of IDs, and see what the user's index is in that list.
However, I want just to simulate, without creating or updating anything. I cannot find a way to do that without inserting a 'fake' document during the query, and that cannot happen, for reasons of security and integrity (inserting/removing items leaves the DB's integrity broken for a short period of time and can cause conflicts within the application logic).
Doing the sorting in the DB and re-doing it in the application is not a good idea either, since there would be duplicated logic.
How can I 'fake' a document without creating it during a query?
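One hedged way to approach this (a sketch with hypothetical names): fetch the course's already-sorted sort-key values once, then compute where the candidate would land purely in memory, without writing anything. The trade-off is that the in-memory comparator must match the DB's sort order, which is a small dose of the duplicated logic you want to avoid:

import java.util.Collections;
import java.util.List;

public final class PositionSimulator {

    // scoresDescending: the registered users' scores for the course, in
    // the same descending order the DB query sorts them. Returns the
    // 1-based position the candidate would occupy.
    public static int simulatePosition(List<Integer> scoresDescending,
                                       int candidateScore) {
        int idx = Collections.binarySearch(
                scoresDescending, candidateScore, Collections.reverseOrder());
        if (idx < 0) {
            idx = -idx - 1; // insertion point when the score is absent
        }
        return idx + 1;
    }
}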
First of all, let me state that I am new to Command Query Responsibility Segregation and Event Sourcing (message-driven architecture), but I'm already seeing some significant design benefits. However, there are still a few issues on which I'm unclear.
Say I have a Customer class (an aggregate root) that contains a property called postalAddress (an instance of the Address class, which is a value object). I also have an Order class (another aggregate root) that contains (among OrderItem objects and other things) a property called deliveryAddress (also an instance of the Address class) and a string property called status.
The customer places an order by issuing a PlaceOrder command, which triggers the OrderReceived event. At this point in time, the status of the order is "RECEIVED". When the order is shipped, someone in the warehouse issues a ShipOrder command, which triggers the OrderShipped event. At this point in time, the status of the order is "SHIPPED".
One of the business rules is that if a Customer updates their postalAddress before an order is shipped (i.e., while the status is still "RECEIVED"), the deliveryAddress of the Order object should also be updated. If the status of the Order were already "SHIPPED", the deliveryAddress would not be updated.
Question 1. Is the best place to put this "conditionally cascading address update" in a Saga (a.k.a., Process Manager)? I assume so, given that it is translating an event ("The customer just updated their postal address...") to a command ("... so update the delivery address of order 123").
Question 2. If a Saga is the right tool for the job, how does it identify the orders that belong to the user, given that an aggregate can only be retrieved by its unique ID (in my case a UUID)?
Continuing on, given that each aggregate represents a transactional boundary, if the system were to crash after the Customer's postalAddress was updated (the CustomerAddressUpdated event being persisted to the event store) but before the OrderDeliveryAddressUpdated event could be persisted (i.e., between the two transactions), then the system is left in an inconsistent state.
Question 3. How are such "violations" of consistency rules detected and rectified?
In most instances the delivery address of an order should be independent of any other data change, as a customer may want the order sent to an arbitrary address. That being said, I'll give my 2c on how you could approach this:
Is the best place to handle this in a process manager?
Yes. You should have an OrderProcess.
How would one get hold of the correct OrderProcess instance, given that it can only be retrieved by aggregate id?
There is nothing preventing one from adding an additional lookup mechanism that associates data with an aggregate id. In my experimental, going-live-soon mechanism called shuttle-recall, I have an IKeyStore mechanism that associates any arbitrary key with an AR Id. So you would be able to associate something like [order-process]:customerId=CID-123; as a key to some aggregate.
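A sketch of the shape (all names hypothetical, and not the shuttle-recall API):

import java.util.UUID;

public class OrderAddressProcessManager {

    private final OrderIdsByCustomerIndex index; // the extra lookup mechanism
    private final CommandBus commandBus;

    public OrderAddressProcessManager(OrderIdsByCustomerIndex index,
                                      CommandBus commandBus) {
        this.index = index;
        this.commandBus = commandBus;
    }

    // Translate the event into one command per affected order; an order
    // that has already shipped simply rejects the command.
    public void on(CustomerAddressUpdated event) {
        for (UUID orderId : index.orderIdsFor(event.customerId())) {
            commandBus.send(new UpdateDeliveryAddress(orderId, event.newAddress()));
        }
    }
}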
How are such "violations" of consistency rules detected and rectified?
In most cases they could be handled out-of-band, if possible. Should I order something from Amazon and attempt to change my address after the order has shipped, the order is still going to the original address. In your case of linking the customer's postal address to the active order address, you could notify the customer that n orders have had their addresses updated but that a recent order (within some tolerance) has not.
As for the system going down before processing, you should have some guaranteed-delivery mechanism to handle this. I do not regard these domain events in the same way I regard system events in a messaging infrastructure such as a service bus.
Just some thoughts :)
I have an application with many producers and consumers.
From my understanding, the RingBuffer creates its objects when it is initialized; you then copy an object into a slot when you publish to the ring, and copy it back out in your EventHandler.
My application's LogHandler buffers received events in a List in order to send them on in batch mode once the list has reached a certain size. So EventHandler#onEvent puts the received object in the list; once the list has reached the size, it sends the batch over RMI to a server and clears the list.
My question is: do I need to clone the object before I put it in the list, given that, as I understand it, objects can be reused once consumed?
Do I need to synchronize access to the list in my EventHandler#onEvent?
Yes - your understanding is correct. You copy your values in and out of the ringbuffer slots.
I would suggest that, yes, you clone the values as you extract them from the ring buffer into your event handler's list; otherwise the slot can be reused.
You should not need to synchronise access to the list as long as it is a private member variable of your Event Handler and you only have one event handler instance per thread. If you have multiple event handlers adding to the same (e.g. static) List instance, then you would need synchronisation.
Clarification:
Be sure to read the background in OzgurH's comments below. If you stick to using the endOfBatch flag on the disruptor and use it to decide the size of your batch, you do not have to copy objects out of the list. If you are using your own accumulation strategy (such as size, as per the question), then you should clone objects out, as the slot could be reused before you have had the chance to send them.
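A sketch of the endOfBatch variant (using the LMAX Disruptor EventHandler API; LogEvent and the RMI call are hypothetical stand-ins from the question):

import com.lmax.disruptor.EventHandler;
import java.util.ArrayList;
import java.util.List;

public class BatchingLogHandler implements EventHandler<LogEvent> {

    private final List<LogEvent> batch = new ArrayList<>();

    @Override
    public void onEvent(LogEvent event, long sequence, boolean endOfBatch)
            throws Exception {
        // No copy needed here: the slots stay unavailable to producers
        // until this handler's sequence advances, which only happens
        // after the endOfBatch event has been processed.
        batch.add(event);
        if (endOfBatch) {
            sendBatchOverRmi(batch); // must complete before we return
            batch.clear();
        }
    }

    private void sendBatchOverRmi(List<LogEvent> events) {
        // Hypothetical: serialize and ship the events to the server.
    }
}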
Also worth noting: if you need to synchronize on the list instance, then you have missed a big opportunity with the disruptor and will destroy your performance anyway.
It is possible to use slots in the Disruptor's RingBuffer (including ones containing a List) without cloning/copying values. This may be a preferable solution for you depending on whether you are worried about garbage creation, and whether you actually need to be concerned about concurrent updates to the objects being placed in the RingBuffer. If all the objects being placed in the slot's list are immutable, or if they are only being updated/read by a single thread at a time (a precondition which the Disruptor is often used to enforce), there will be nothing gained from cloning them as they are already immune to data races.
On the subject of batching, note that the Disruptor framework itself provides a mechanism for taking items from the RingBuffer in batches in your EventHandler threads. This approach is fully thread-safe and lock-free, and could yield better performance by making your memory access patterns more predictable to the CPU.