Hazelcast/Coherence Grid Computing EntryProcessor with data for each key

I want to use a Hazelcast or Coherence EntryProcessor to run some logic in parallel on the different nodes where the keys are stored in a cache. I see I can use something like sendToEachKey(EntryProcessor process).
My problem comes when, along with the logic, I also need to send a piece of data to process that belongs to another system and that I receive externally (in an HTTP request, for example).
Sure, I can do something like sendToEachKey(EntryProcessor(data) process). But if the data is different for each key and I want to send each key only its own data to process, how can I do that? The reason I want to do this is that the data is too big and I get network overload.
Sure, opening a thread pool and sending each piece of data to its key would work, but it is inefficient because of the huge number of requests.
Thanks!

For Hazelcast you could retrieve all values and send each key its own EntryProcessor, but this will create a lot of overhead.
The other option would be to use a combination of EntryProcessor and our distributed ExecutorService.
You send a Runnable to the ExecutorService. Inside the Runnable you retrieve the local key set and the external values for just those keys (so everything you work with is already local to the node), and then you emit one EntryProcessor per local key. Since you're already local to the node, there's no more traffic flying around (apart from the backups, obviously :)). That said, you might want to implement a specific EntryProcessor that only transmits the changed value and not the full processor itself (to save even more traffic).
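A minimal sketch of that pattern, assuming Hazelcast 4.x package names, an IMap named "data" with String keys and values, and a placeholder fetchExternalValue(...) standing in for however the external piece of data for a key is obtained:

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceAware;
import com.hazelcast.map.EntryProcessor;
import com.hazelcast.map.IMap;

import java.io.Serializable;
import java.util.Map;

// Submitted once per member via the distributed ExecutorService;
// it only touches the keys owned by the member it runs on.
public class LocalProcessingTask implements Runnable, HazelcastInstanceAware, Serializable {

    private transient HazelcastInstance hazelcast;

    @Override
    public void setHazelcastInstance(HazelcastInstance hazelcast) {
        this.hazelcast = hazelcast;
    }

    @Override
    public void run() {
        IMap<String, String> map = hazelcast.getMap("data");
        for (String key : map.localKeySet()) {
            String externalValue = fetchExternalValue(key); // placeholder for the external lookup
            // The processor is created and executed on this member, so only the
            // backup replication leaves the node.
            map.executeOnKey(key, new ApplyExternalValue(externalValue));
        }
    }

    private String fetchExternalValue(String key) {
        return "external-" + key; // placeholder
    }

    // Carries only the single value this key needs, not the whole data set.
    static class ApplyExternalValue implements EntryProcessor<String, String, Void>, Serializable {
        private final String externalValue;

        ApplyExternalValue(String externalValue) {
            this.externalValue = externalValue;
        }

        @Override
        public Void process(Map.Entry<String, String> entry) {
            entry.setValue(entry.getValue() + externalValue);
            return null;
        }
    }
}

The task would be submitted with something like hazelcastInstance.getExecutorService("processing").executeOnAllMembers(new LocalProcessingTask());.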

In Coherence you could use PartitionedService to find the association of cache keys to cluster members. Then you could invoke an entry processor with the data for each member, using PartitionedFilter to make sure that the data is sent only to that member. Something like this:
// keys in this map are also keys in cache
void processData(Map<String, Data> externalData) {
    PartitionedService partitionedService = (PartitionedService) cache.getCacheService();
    Map<Member, Map<String, Data>> dataForMembers = splitDataByMembers(partitionedService, externalData);
    for (Entry<Member, Map<String, Data>> dataForMember : dataForMembers.entrySet()) {
        Member member = dataForMember.getKey();
        Map<String, Data> data = dataForMember.getValue();
        PartitionSet partitions = partitionedService.getOwnedPartitions(member);
        PartitionedFilter filter = new PartitionedFilter<>(Filters.always(), partitions);
        EntryProcessor processor = new MyEntryProcessor(data);
        cache.async().invokeAll(filter, processor);
    }
}

Map<Member, Map<String, Data>> splitDataByMembers(
        PartitionedService partitionedService,
        Map<String, Data> externalData) {
    Map<Member, Map<String, Data>> dataForMembers = new HashMap<>();
    for (Object member : partitionedService.getInfo().getServiceMembers()) {
        dataForMembers.put((Member) member, new HashMap<>());
    }
    for (Entry<String, Data> dataForKey : externalData.entrySet()) {
        Member member = partitionedService.getKeyOwner(dataForKey.getKey());
        dataForMembers.get(member).put(dataForKey.getKey(), dataForKey.getValue());
    }
    return dataForMembers;
}
This way there will be only one entry processor invocation per member in the cluster, and each member will get only the data it is interested in.
I used String as the cache key and an arbitrary Data type for the data associated with each key, but you can of course use any other types (and you don't have to model the external data as a map at all).
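The MyEntryProcessor used above is not shown; a rough sketch of what it could look like, assuming plain Java serialization for the processor and leaving the actual merge of the external data into the cached value as a placeholder:

import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

import java.io.Serializable;
import java.util.Map;

// Carries only the external data for the keys owned by the member it is sent to.
public class MyEntryProcessor extends AbstractProcessor<String, Object, Void> implements Serializable {

    private final Map<String, Data> data;

    public MyEntryProcessor(Map<String, Data> data) {
        this.data = data;
    }

    @Override
    public Void process(InvocableMap.Entry<String, Object> entry) {
        Data external = data.get(entry.getKey());
        if (external != null) {
            // combine the external data with the cached value however your domain
            // requires; storing it as-is here is only a placeholder
            entry.setValue(external);
        }
        return null;
    }
}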

In Hazelcast you would be doing executeOnKeys(keys, new EntryProcessor(data)), and this is too much, as the data is too big.
Why not
executeOnKey(key1, new EntryProcessor(data1));
executeOnKey(key2, new EntryProcessor(data2));
executeOnKey(key3, new EntryProcessor(data3));
to send each key only the data subset it needs?
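For example, a rough sketch along those lines, assuming the external data arrives as a Map<String, String> keyed by cache key and that MyEntryProcessor wraps just the single value for its key:

IMap<String, String> map = hazelcastInstance.getMap("data");
for (Map.Entry<String, String> e : externalData.entrySet()) {
    // each invocation ships only the slice of data its key needs
    map.executeOnKey(e.getKey(), new MyEntryProcessor(e.getValue()));
}

Each executeOnKey call is a separate blocking round trip, so for a large number of keys the asynchronous submitToKey variant (collecting the returned futures) is usually preferable.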

Related

Spring integration: How to implement a JDBCOutboundGateway in the middle of a MessageChain?

This appears, to me, to be a simple problem that is probably replicated all over the place. A very basic application of the MessageHandlerChain, probably using nothing more than out of the box functionality.
Conceptually, what I need is this:
(1) Polled JDBC reader (sets parameters for integration pass)
|
V
(2) JDBC Reader (uses input from (1) to fetch data to feed through the channel)
|
V
(3) JDBC writer (writes data fetched by (2) to target)
|
V
(4) JDBC writer (writes additional data from the original parameters fetched in (1))
What I think I need is
Flow:
From: JdbcPollingChannelAdapter (setup adapter)
Handler: messageHandlerChain
Handlers (
JdbcPollingChannelAdapter (inbound adapter)
JdbcOutboundGateway (outbound adapter)
JdbcOutboundGateway (cleanup gateway)
)
The JdbcPollingChannelAdapter does not implement the MessageHandler API, so I am at a loss how to read the actual data based on the setup step.
Since the JdbcOutboundGateway does not implement the MessageProducer API, I am at a bit of a loss as to what I need to use for the outbound adapter.
Are there OOB classes I should be using? Or do I need to somehow wrap the two adapters in BridgeHandlers to make this work?
Thanks in advance
EDIT (2)
Additional configuration problem
The setup adapter is pulling a single row back with two timestamp columns. They are being processed correctly by the "enrich headers" piece.
However, when the inbound adapter is executing, the framework is passing in java.lang.Object as parameters. Not String, not Timestamp, but an actual java.lang.Object, as in new Object().
It is passing the correct number of objects, but the content and datatypes are lost. Am I correct that the ExpressionEvaluatingSqlParameterSourceFactory needs to be configured?
Message:
GenericMessage [payload=[{startTime=2020-11-18 18:01:34.90944, endTime=2020-11-18 18:01:34.90944}], headers={startTime=2020-11-18 18:01:34.90944, id=835edf42-6f69-226a-18f4-ade030c16618, timestamp=1605897225384}]
SQL in the JdbcOutboundGateway:
Select t.*, w.operation as "ops" from ADDRESS t
Inner join TT_ADDRESS w
on (t.ADDRESSID = w.ADDRESSID)
And (w.LASTUPDATESTAMP >= :payload.from[0].get("startTime") and w.LASTUPDATESTAMP <= :payload.from[0].get("endTime") )
Edit: added the solution Java DSL configuration
private JdbcPollingChannelAdapter setupAdapter;     // select only
private JdbcOutboundGateway inboundAdapter;         // select only
private JdbcOutboundGateway insertUpdateAdapter;    // update only
private JdbcOutboundGateway deleteAdapter;          // update only
private JdbcMessageHandler cleanupAdapter;          // update only

setFlow(IntegrationFlows
        .from(setupAdapter, c -> c.poller(Pollers.fixedRate(1000L, TimeUnit.MILLISECONDS).maxMessagesPerPoll(1)))
        .enrichHeaders(h -> h.headerExpression("ALC_startTime", "payload.from[0].get(\"ALC_startTime\")")
                .headerExpression("ALC_endTime", "payload.from[0].get(\"ALC_endTime\")"))
        .handle(inboundAdapter)
        .enrichHeaders(h -> h.headerExpression("ALC_operation", "payload.from[0].get(\"ALC_operation\")"))
        .handle(insertUpdateAdapter)
        .handle(deleteAdapter)
        .handle(cleanupAdapter)
        .get());

flowContext.registration(flow).id(this.getId().toString()).register();
If you would like to carry the original arguments down to the last gateway in your flow, you need to store those arguments in the headers, since after every step the payload of the reply message is going to be different and you won't have the original setup data there any more. That's first.
Second: if you deal with IntegrationFlow and the Java DSL, you don't need to worry about a messageHandlerChain, since conceptually the IntegrationFlow is a chain by itself, but much more advanced.
I'm not sure why you need to use a JdbcPollingChannelAdapter to request data on demand according to the incoming message from the source at the beginning of your flow.
You definitely still need to use a JdbcOutboundGateway for the SELECT-only mode. The updateQuery is optional, so that gateway is just going to perform the SELECT and return the data to you in the payload of the reply message.
If your next two steps are just "write" and you don't care about the result, you can probably just take a look at a PublishSubscribeChannel with two JdbcMessageHandlers as subscribers to it. Without a provided Executor for the PublishSubscribeChannel they are going to be executed one by one.
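A rough Java DSL sketch of that last suggestion; setupAdapter is the polled setup adapter from the configuration above, selectGateway stands for the SELECT-only JdbcOutboundGateway, and insertUpdateHandler/deleteHandler are placeholders for the two JdbcMessageHandler writers:

IntegrationFlow flow = IntegrationFlows
        .from(setupAdapter, c -> c.poller(Pollers.fixedRate(1000L).maxMessagesPerPoll(1)))
        .enrichHeaders(h -> h.headerExpression("startTime", "payload.from[0].get(\"startTime\")"))
        .handle(selectGateway)                      // JdbcOutboundGateway, SELECT only
        .publishSubscribeChannel(s -> s
                // both subscribers get the same SELECT result; with no Executor
                // configured they run one after the other, in subscription order
                .subscribe(f -> f.handle(insertUpdateHandler))
                .subscribe(f -> f.handle(deleteHandler)))
        .get();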

Hazelcast Jet with an IMap source and OBJECT in-memory format

I have items in a Hazelcast IMap in OBJECT format, and I'm using a Jet aggregation operation with that IMap as a pipeline source. I was hoping, because of the OBJECT format, to avoid any serialisation/deserialisation of the items in my IMap during processing, the same way native Hazelcast entry processing and queries work. However, I can see that my items are in fact being serialised and then deserialised before being passed to my aggregator.
Is it possible to avoid the serialisation/deserialisation step when using Jet in this way? If so, how?
Yes, the local map reader will always serialize/deserialize the entries. The only way I can think of to work around this is to use a custom source that uses map.localKeySet() and then use mapUsingIMap to do a join on those keys. The source would look like the snippet below:
SourceBuilder.batch("localKeys", c -> c.jetInstance().getMap("map"))
        .fillBufferFn((map, buf) -> {
            // emit only the keys this member owns; a Predicate overload of
            // localKeySet(...) can be used to restrict the key set further
            for (Object key : map.localKeySet()) {
                buf.add(key);
            }
            buf.close();
        })
        .distributed(1)
        .build();
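The join the answer mentions could then look roughly like this, assuming the source above is assigned to a BatchSource<Object> named localKeysSource and the Jet 4.x pipeline API (the aggregation step is only a placeholder):

Pipeline p = Pipeline.create();
p.readFrom(localKeysSource)
 // look up the value for each emitted key in the same IMap
 .mapUsingIMap("map", key -> key, (key, value) -> value)
 .aggregate(AggregateOperations.counting())   // placeholder for your real aggregation
 .writeTo(Sinks.logger());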

How we can save a Hazelcast map entry in a database or a file when the entry reaches its Time To Live?

In my application, we are using Hazelcast and an Oracle DB.
When we start a transition, we put an entry containing the transition id in a Hazelcast map.
If that transition id is consumed within the TTL, it is fine: my application uses it, and the same thread evicts that entry from the Hazelcast map and saves the transition id in the database. No issue there.
My issue comes when that transition id is not consumed within its TTL. Hazelcast simply removes that entry from the map.
I want to retrieve that entry and save it in the database.
How would you approach that problem? I am working with Hazelcast running in a cluster with at least 4 nodes.
One solution is to back the Hazelcast map with a MapStore implementation, so that even when TTL-expired data is evicted from the map, it is still kept in your backing store.
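A minimal sketch of that option, assuming Hazelcast 4.x package names and leaving the actual Oracle access (JDBC, JPA, ...) as comments:

import com.hazelcast.config.Config;
import com.hazelcast.config.MapStoreConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.MapStoreAdapter;

public class TransitionMapStore extends MapStoreAdapter<String, String> {

    // Write-through by default (write-delay-seconds = 0): every put is persisted
    // immediately, so the transition id is still in the database even after the
    // TTL evicts the entry from the map.
    @Override
    public void store(String transitionId, String value) {
        // insert/update the row in Oracle here
    }

    @Override
    public String load(String transitionId) {
        // look the row up in Oracle here; return null if it does not exist
        return null;
    }

    public static HazelcastInstance start() {
        Config config = new Config();
        config.getMapConfig("transitions")
              .setMapStoreConfig(new MapStoreConfig()
                      .setEnabled(true)
                      .setImplementation(new TransitionMapStore()));
        return Hazelcast.newHazelcastInstance(config);
    }
}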
The other solution is to catch the evicted entries and store them in some backing store yourself.
Here is a sample application that you can try yourself:
HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
IMap<String, String> map = hazelcastInstance.getMap("test");
map.addEntryListener(new EntryEvictedListener<String, String>() {
    @Override
    public void entryEvicted(EntryEvent<String, String> entryEvent) {
        // this is where you would save the expired entry to the database
        System.out.println(entryEvent.getKey() + ":" + entryEvent.getOldValue());
    }
}, true);
map.put("1", "John", 10, TimeUnit.SECONDS);
After 10 seconds you will see that the entry is evicted and printed out to the console.
The events are sent to all nodes, so it can be better to implement an EntryListenerConfig, add it to the MapConfig, and set the local parameter to true to receive only local events.
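A sketch of that configuration-based registration; com.example.PersistOnEvictListener is a placeholder for a listener class like the one above that writes the entry to the database:

import com.hazelcast.config.Config;
import com.hazelcast.config.EntryListenerConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class LocalListenerSetup {
    public static void main(String[] args) {
        Config config = new Config();
        // local = true        -> only events for entries owned by this member
        // includeValue = true -> the evicted value is available in the event
        config.getMapConfig("test")
              .addEntryListenerConfig(
                      new EntryListenerConfig("com.example.PersistOnEvictListener", true, true));
        HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance(config);
    }
}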
Please see Registering Map Listeners for more info.

How to load document out of database instead of memory

Using Raven client and server #30155. I'm basically doing the following in a controller:
public ActionResult Update(string id, EditModel model)
{
    var store = provider.StartTransaction(false);
    var document = store.Load<T>(id);
    model.UpdateEntity(document); // overwrite document property values with those of the edit model
    document.Update(store);       // tell document to update itself if it passes some conflict checking
}
Then in document.Update, I try to do this:
var old = store.Load<T>(this.Id);
if (old.Date != this.Date)
{
    // Resolve conflicts that occur by moving the document period
}
store.Update(this);
Now, I run into the problem that old gets loaded from memory instead of from the database and already contains the updated values. Thus, it never goes into the conflict check.
I tried working around the problem by changing the Controller.Update method into:
public ActionResult Update(string id, EditModel model)
{
    var store = provider.StartTransaction(false);
    var document = store.Load<T>(id);
    store.Dispose();
    model.UpdateEntity(document); // overwrite document property values with those of the edit model
    store = provider.StartTransaction(false);
    document.Update(store); // tell document to update itself if it passes some conflict checking
}
This results in me getting a Raven.Client.Exceptions.NonUniqueObjectException with the text: Attempted to associate a different object with id
Now, the questions:
Why would Raven care if I try and associate a new object with the id as long as the new object carries the proper e-tag and type?
Is it possible to load a document in its database state (overriding default behavior to fetch document from memory if it exists there)?
What is a good solution to getting the document.Update() to work (preferably without having to pass the old object along)?
Why would Raven care if I try and associate a new object with the id as long as the new object carries the proper e-tag and type?
RavenDB leans on being able to serve documents from memory (which is faster). By checking for persisted objects with the same id, hard-to-debug errors are prevented.
EDIT: See the comment from Rayen below. If you enable concurrency checking / provide an etag in the Store call, you can bypass the error.
Is it possible to load a document in its database state (overriding default behavior to fetch document from memory if it exists there)?
Apparently not.
What is a good solution to getting the document.Update() to work (preferably without having to pass the old object along)?
I went with refactoring the document.Update method to also take an optional parameter with the old date period, since #1 and #2 don't seem possible.
RavenDB supports optimistic concurrency out of the box. The only thing you need to do is enable it:
session.Advanced.UseOptimisticConcurrency = true;
See:
http://ravendb.net/docs/article-page/3.5/Csharp/client-api/session/configuration/how-to-enable-optimistic-concurrency

Can I query the redis cache to get values by a list of keys?

I'm using the Azure Redis Cache with a .NET implementation. I have a list of keys that I need to get from the cache. Right now, this is my implementation:
List<string> planIds = ...; // already initialized
List<customObj> plans = new List<customObj>();
foreach (string currentId in planIds)
{
    var plan = Database.StringGet(currentId);
    if (plan != null) plans.Add(plan);
}
I've simplified it a bit for my explanation, but it works just fine. However, I was wondering whether, similar to a batch set, I could do a batch download by passing a list of the keys I want to retrieve. It's usually around 200+ ids. Is that doable?
Have a look at the StringGet overloaded methods.
You can pass an array of keys to get the array of values in a single call.
It will execute a Redis MGET call under the hood.
