Hazelcast Jet: DAG source's emitFromTraverser never returns false, and processor waits for all elements loaded by the source before it starts processing

USECASE
HazelcastJet version 0.6.1
Hazelcast version 3.10.2
Given this (simplified) version of a DAG:
VERTICES
S1
Source that emits 5 items of type A (read from DB with partitioning)
Local parallelism = 1
S2
Source that emits 150K items of type B (an Iterator that reads from the DB in batches of 100, with partitioning)
Local parallelism = 1
AD
Processor that adapts types A->A1 and B->B1 and emits them one by one
FA
Processors.filterP that accepts only items of type A1 and emits one by one
FB
Processors.filterP that accepts only items of type B1 and emits one by one
CL
Processor that first accumulates all items of type A1; then, when it receives an item of type B1, it enriches it with some data taken from the matching A1 item and emits it, one by one.
WR
Sink that writes B1
Local parallelism = 1
NOTE:
Just to give meaning to the filter processors: in the DAG there are other sources that flow into the same adapter AD and then go on to other paths via filter processors.
EDGES
S1 --> AD
S2 --> AD
AD --> FA (from ordinal 0)
AD --> FB (from ordinal 1)
FA --> CL (to ordinal 0 with priority 0 distributed and broadcast)
FB --> CL (to ordinal 1 with priority 1)
CL --> WR
PROBLEM
If source S2 has "few" items to load (e.g. 15K), the emitFromTraverser never returns false.
If source S2 has "many" items to load (e.g. 150K), the emitFromTraverser returns false after:
All A1 items have been processed by CL
About 30% of the B1 items have already been transmitted to CL, but none of them have been processed by CL (the DiagnosticProcessor logs that elements are sent to CL but not processed)
S2 code for reference:
@Override
protected void init(Context context) throws Exception {
    super.init(context);
    this.iterator = new BQueryIterator(querySupplier, batchSize);
    this.traverser = Traversers.traverseIterator(this.iterator);
}

@Override
public boolean complete() {
    return emitFromTraverser(this.traverser);
}
QUESTION
Is it correct that CL doesn't process items until source ends?
Is the usage of priority + distributed + broadcast correct on CL Vertex?
UPDATE
It seems that completeEdge is never called on CL's edge 1.
Can someone tell me why?
Thanks!

You suffer from a deadlock caused by the priority. Your DAG branches at AD and then rejoins at CL, but with a priority.
AD --+-- FA --+-- CL
     |        |
     +-- FB --+
Setting a priority means that no item from the lower-priority edge is processed before all items from the higher-priority edge have been processed. AD will eventually be blocked by backpressure from the lower-priority path, which CL is not yet processing. So AD is blocked because it can't emit to the lower-priority edge, and CL is blocked because it is still waiting for items from the higher-priority edge, resulting in a deadlock.
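The blocking mechanics can be reproduced with a toy, single-threaded model: bounded buffers stand in for Jet's edge queues, and the consumer refuses the low-priority buffer until the high-priority edge has completed. This is only a sketch of the scheduling rule, not Jet's actual engine, and the buffer capacity is an arbitrary assumption:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class PriorityDeadlockDemo {
    /** Returns "deadlock" or "completed" for the given edge-buffer capacity. */
    static String run(int capacity, int aItems, int bItems) {
        Queue<String> hi = new ArrayDeque<>(); // FA -> CL edge (priority 0)
        Queue<String> lo = new ArrayDeque<>(); // FB -> CL edge (priority 1)
        int emitted = 0, total = aItems + bItems;
        while (true) {
            boolean progress = false;
            // AD: emit the next item into its outbound buffer if there is room
            if (emitted < total) {
                Queue<String> target = emitted < aItems ? hi : lo;
                if (target.size() < capacity) {
                    target.add("item" + emitted);
                    emitted++;
                    progress = true;
                }
            }
            // The hi edge only "completes" once AD has finished ALL its items
            boolean hiEdgeDone = emitted == total;
            // CL: the priority forces it to drain hi until the hi edge
            // completes before it touches lo at all
            if (!hi.isEmpty()) {
                hi.remove();
                progress = true;
            } else if (hiEdgeDone && !lo.isEmpty()) {
                lo.remove();
                progress = true;
            }
            if (emitted == total && hi.isEmpty() && lo.isEmpty()) return "completed";
            if (!progress) return "deadlock";
        }
    }

    public static void main(String[] args) {
        System.out.println(run(4, 3, 10));  // small buffers: deadlock
        System.out.println(run(64, 3, 10)); // buffers hold every B item: completed
    }
}
```

With a small capacity the producer wedges on the full low-priority buffer before it can finish, so the high-priority edge never completes and the consumer never switches over; with buffers large enough to hold every B item, the run completes.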
In your case, you can resolve it by creating two AD vertices, each processing items from one of the sources:
S1 -- AD1 --+-- CL
            |
S2 -- AD2 --+

After a while I've understood the problem...
The CL processor cannot know when all the A1 items have arrived, because all items come from the same AD processor.
So it needs to wait for all edges coming from AD to complete before starting to process the B1 items.
I'm not sure, but probably after many B items have been loaded, all the inbox buffers in the DAG become full and can't accept any more B items from S2, while at the same time CL cannot process the B1 items it would need to make progress: that's the deadlock.
Maybe the DAG would be able to detect this?
I don't know Jet that deeply, but it would be nice to have such a warning.
Maybe there is some logging to enable?
I hope someone can confirm my answer and suggest how to improve on and detect these problems.


Is there a way in AnyLogic to assign resources to a population of agents in use rather than individual agents?

I have a simple example of dish washers at a restaurant to illustrate the issue I am having.
Question
How can I ensure that the correct number of dish washers is seized and released when it depends on the number of agents in use?
Problem
Using a function to assign the resources, the number of dish washers is not always correct, due to the different times at which sinks come into and out of use.
Example
Main:
Generates dishes and randomly assigns them to one of three sinks in the exit block.
Sinks is a population of agents.
dish_washers is a ResourcePool with a capacity of 10.
Sink:
Dishes enter a queue and are entered one at a time using a hold block.
Once the dish is cleaned, the hold is unblocked to grab the next dish.
Details:
I have a shared ResourcePool of dish_washers at a restaurant.
There are 3 sinks at the restaurant.
Dishes are generated and randomly assigned to each sink.
If only 1 sink is being used, then two dish washers are needed.
However, if 2 or more sinks are being used then the number of dish washers becomes:
numberOfDishWashers = 2 + numberOfSinksInUse;
In order to change the numberOfDishWashers as more sinks are being used, I created a function that defines the numberOfDishWashers to be seized from the dish_washer ResourcePool.
int numberOfSinksUsed = 0;
int numberOfWorkersToSeize = 0;
int numberOfWorkersAlreadySeized = 0;
int numberOfWorkersToAssign = 0;
ResourcePool[][] dish_washers;

for (Sink curSink : main.sinks) {
    if (curSink.queue.size() > 0) {
        numberOfSinksUsed += 1;
    }
}

numberOfWorkersAlreadySeized = main.dish_washers.busy();
numberOfWorkersToSeize = 2 + numberOfSinksUsed;
numberOfWorkersToAssign = numberOfWorkersToSeize - numberOfWorkersAlreadySeized;

dish_washers = new ResourcePool[1][numberOfWorkersToAssign];
for (int i = 0; i < numberOfWorkersToAssign; i++) {
    dish_washers[0][i] = main.dish_washers;
}
return dish_washers;
Error Description:
However, depending on which sink completes first and releases, the number of dish washers assigned will be incorrect. A traceln at the end of the sink process illustrates this: the number of dish washers seized at the exit block doesn't match "2 + numberOfSinksInUse".
There is an instance where 3 sinks are in use but only 4 workers were seized.
Exit, Sink: C Workers Currently Seized: 4
Sinks in Use: 2
Exit, Sink: C Workers Currently Seized: 4
Sinks in Use: 3
Exit, Sink: C Workers Currently Seized: 5
Sinks in Use: 2
Exit, Sink: C Workers Currently Seized: 4
Sinks in Use: 2
Another way to look at the issue, is this Excel table outlining the current logic.
The number of busy workers doesn't match the number of busy workers there should be based on the number of active sinks.
Methods I have Tried
Custom function to release only the necessary workers to keep the correct total.
Generates an error because the resource gets assigned to the 'agent', i.e. the dish.
When the dish gets destroyed, it has unreleased resources attached to it.
Passing the "sink" agent through an "enter", "seize", and "exit" block to assign the resource to the "sink" agent instead of the dish that is generated.
Error regarding the "dish" agent being in the flowchart of the "sink" agent while the "sink" agent is seizing the workers.
How can I ensure the correct number of dish washers are always grabbed?
So your fundamental problem here is that inside the sink you seize a number of dishwashers, the dish then goes into the delay (with that number of dishwashers seized), and once out of the delay it releases whatever dishwashers it seized... But during the time it is in the delay the situation might have changed, and you actually wanted to seize a different number of dishwashers for that specific sink...
Your options are to either:
Remove dishes from the delay, release the correct number of dishwashers, return them to the delay, and delay for the remainder of the time...
Implement your own logic.
I would go for option 2, as option 1 means developing a workaround for the block created by AnyLogic, and you would end up not using the blocks the way they were designed; this is, unfortunately, the issue with blockification.
So I would have a collection inside each sink that shows the number of dishwashers currently assigned to it. Then whenever a new dish enters a sink, we recalculate the number of dishwashers to assign (perhaps at every sink?) and then make the correct assignment.
Here is an example with some sample code - I did not test it, but yours will look similar.
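As a rough illustration of the recalculation (not AnyLogic code; the helper names are made up, and the "1 sink → 2 washers, otherwise 2 + sinksInUse" rule is taken from the question):

```java
public class DishwasherMath {
    // Target number of washers for a given number of busy sinks, per the
    // question's rule: 2 washers for a single sink, 2 + sinksInUse otherwise.
    static int targetWashers(int sinksInUse) {
        if (sinksInUse == 0) return 0;
        if (sinksInUse == 1) return 2;
        return 2 + sinksInUse;
    }

    // Positive result: seize this many more; negative: release that many.
    static int rebalance(int sinksInUse, int currentlySeized) {
        return targetWashers(sinksInUse) - currentlySeized;
    }

    public static void main(String[] args) {
        // The mismatch from the question's trace: 3 sinks busy, only 4 seized
        System.out.println(rebalance(3, 4)); // need 5, have 4 -> seize 1 more
    }
}
```

Running the rebalance on every dish arrival (and every release) keeps the seized total tracking the number of busy sinks instead of freezing it at seize time.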

How to save data using multiple threads in grails-2.4.4 application using thread pool

I have a multithreaded program running some logic to come up with rows of data that I need to save in my Grails (2.4.4) application. I am using a FixedThreadPool with 30 threads. The skeleton of my program is below. My expectation is that each thread calculates all the attributes and saves a row in the table. However, the end result I am seeing is that some random rows are not saved. Upon repeating the exercise, a different set of rows is missing from the table. So, overall, each attempt leaves a certain set of rows NOT saved in the table at all. GORMInstance.errors did not reveal any errors, so I have no clue what is incorrect in this program.
ExecutorService exeSvc = Executors.newFixedThreadPool(30)
for (obj in list) {
    exeSvc.execute({ -> finRunnable(obj) } as Callable)
}
Also, here's the runnable program that the above snippet invokes.
def finRunnable = { obj ->
    for (item in LIST-1) {
        for (it in LIST-2) {
            for (i in LIST-3) {
                def rowdata = calculateValues(item, it, i)
                GORMInstance instance = new GORMInstance()
                instance.withTransaction {
                    instance.attribute1 = rowdata[0]
                    instance.attribute2 = rowdata[1]
                    // ...and so on...
                    // Without flush:true I run into a HeuristicCompletion
                    // exception, so I need it here.
                    instance.save(flush: true)
                }
            }
        }
    }
}
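One detail the snippet never shows is shutting the pool down and waiting for it to drain; if the surrounding code moves on (or the process exits) while tasks are still queued, their rows are silently lost. A plain-Java sketch of the submit-then-await pattern, with a counter standing in for the GORM save:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolDrainDemo {
    /** Submits n tasks and waits for all of them before returning the count. */
    static int runTasks(int n) throws InterruptedException {
        ExecutorService exeSvc = Executors.newFixedThreadPool(30);
        AtomicInteger savedRows = new AtomicInteger(); // stand-in for rows written
        for (int i = 0; i < n; i++) {
            exeSvc.execute(savedRows::incrementAndGet); // stand-in for finRunnable(obj)
        }
        exeSvc.shutdown();                               // stop accepting new tasks
        // Block until every queued task has actually finished
        if (!exeSvc.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return savedRows.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(100)); // 100: no task was lost
    }
}
```

This does not rule out a GORM/session issue, but it is worth verifying before digging deeper.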

Spring Integration aggregator based on content of next message

I have to read a file, split each line, and group lines based on the first column; when the first column's value changes I have to release the previous group. Can this be done with the Spring Integration DSL?
Here is what the file looks like (it is sorted):
x 1
x 2
x 3
y 4
y 5
y 6
The output should be two messages, with x = 1, 2, 3 and y = 4, 5, 6.
Since there is no other relation governing when messages should be grouped, can I group messages as soon as I hit the next non-matching record? In this case, as soon as I hit "y" at line 4, group the previous "x" messages and release them? Is this possible using a custom aggregator?
The simplest solution is to rely on the groupTimeout(), as long as you split and aggregate in a single thread and quickly enough. So, all your records will be processed and distributed to their groups. But since we don't know when to release them, we rely on a scheduled timeout. The configuration for the aggregator would look like this:
.aggregate(a -> a
        .correlationExpression("payload.column1")
        .releaseStrategy(g -> false)
        .groupTimeout(1000)
        .sendPartialResultOnExpiry(true)
        .outputProcessor(g -> {
            Collection<Message<?>> messages = g.getMessages();
            // iterate and build your output payload
        }))
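For comparison, the release-on-key-change behaviour the question describes is simple to state outside any framework; a plain-Java sketch (the whitespace split on the first column is an assumption about the file format):

```java
import java.util.ArrayList;
import java.util.List;

public class KeyChangeGrouper {
    /** Groups consecutive lines that share the same first column. */
    static List<List<String>> groupConsecutive(List<String> lines) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        String prevKey = null;
        for (String line : lines) {
            String key = line.split("\\s+")[0]; // first column is the correlation key
            if (prevKey != null && !key.equals(prevKey)) {
                groups.add(current);            // key changed: release the previous group
                current = new ArrayList<>();
            }
            current.add(line);
            prevKey = key;
        }
        if (!current.isEmpty()) groups.add(current); // release the last group at end-of-file
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("x 1", "x 2", "x 3", "y 4", "y 5", "y 6");
        System.out.println(groupConsecutive(lines));
        // [[x 1, x 2, x 3], [y 4, y 5, y 6]]
    }
}
```

The timeout-based aggregator above approximates this without needing to peek at the next record; the trade-off is latency instead of look-ahead.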

Sync parallel operation in Azure

I need some architectural suggestions about an Azure application. There is a queue with items, let's say [A, B, A, B, D].
Each distinct item in the queue gets a random category assigned, and the same item can appear multiple times in the queue. The category assignment is done by worker roles which do the following: if the item already has a category assigned, add the item to that category; otherwise create a new category and then add the item. So it goes like:
D: has category? no. Create category 123. Assign [D, 123]
B: has category? no. Create category 435. Assign [B, 435]
A: has category? no. Create category 154. Assign [A, 154]
B: has category? yes. Assign [B, 435] (category already created)
... etc ...
My dilemma is: how do I synchronize the workers so that the same item doesn't get two categories? If two workers pick item B at the same time, it would be possible to end up with two categories for "B".
The only way to ensure that you don't get duplicates is to have a lock on the assigning of categories that can be accessed from both instances. The most popular way of doing this in Azure is with a lease on a blob in storage. If your items are of type Foo and you're passing the Id of the Foo through the queue, the pseudo code would look something like this:
int fooId = GetIdFromQueue();
Foo myFoo = LoadFooFromStorage(fooId);
if (myFoo.Category == null)
{
    CreateLockBlobIfNoExistForFoo(fooId);
    while (!GetLockOnBlobForFoo(fooId))
    {
        WaitForSomeTime();
    }
    // Need to reload the underlying item, as another worker may have
    // assigned the category while we were waiting on the lock
    myFoo = LoadFooFromStorage(fooId);
    if (myFoo.Category == null)
    {
        myFoo.Category = GetRandomCategory();
        SaveFoo(myFoo);
    }
    ReleaseLease(fooId);
}
You'll need to look up some specifics on blob leases, but hopefully that's enough to get you started.
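The shape of that pseudocode (check, acquire the lease, re-check) is the classic double-checked locking pattern. A plain-Java simulation with an in-process ReentrantLock standing in for the blob lease (all names made up):

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class CategoryAssigner {
    private final ConcurrentHashMap<Integer, String> categories = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<Integer, Lock> locks = new ConcurrentHashMap<>();

    String assign(int fooId) {
        String cat = categories.get(fooId);
        if (cat != null) return cat;                 // fast path: already categorized
        // Stand-in for "create the lock blob if absent, then acquire the lease"
        Lock lock = locks.computeIfAbsent(fooId, id -> new ReentrantLock());
        lock.lock();
        try {
            // Re-check after acquiring: another worker may have assigned the
            // category while we were waiting on the lock
            return categories.computeIfAbsent(fooId, id -> UUID.randomUUID().toString());
        } finally {
            lock.unlock();                           // release the lease
        }
    }

    public static void main(String[] args) {
        CategoryAssigner assigner = new CategoryAssigner();
        String first = assigner.assign(42);
        // A second worker asking about the same item gets the same category
        System.out.println(assigner.assign(42).equals(first)); // true
    }
}
```

The re-check inside the lock is the essential step; without it, two workers that both pass the first null check would each create a category.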
Alternatively, maintain your item/category list in an Azure table that is accessible to your worker roles, but this is still likely to end up with duplicates without some kind of throttling. For throttling, for example, put your GetMessage() call in a timer loop with a reasonable wait (1-3 seconds), and before each call to GetMessage(), call PeekMessages(5) to view, but not dequeue, the next 5 messages. Loop through them, assign categories to any unassigned items, and store them in the Azure table before calling GetMessage().

GroupBy then ObserveOn loses items

Try this in LinqPad:
Observable
.Range(0, 10)
.GroupBy(x => x % 3)
.ObserveOn(Scheduler.NewThread)
.SelectMany(g => g.Select(x => g.Key + " " + x))
.Dump()
The results are clearly non-deterministic, but in every case I fail to receive all 10 items. My current theory is that the items are going through the grouped observable unobserved as the pipeline marshals to the new thread.
LINQPad doesn't know that you're running all of these threads - it reaches the end of the code immediately (remember, Rx statements don't always act synchronously; that's the idea!), waits a few milliseconds, then ends by blowing away the AppDomain and all of its threads (which haven't caught up yet). Try adding a Thread.Sleep at the end to give the new threads time to catch up.
As an aside, Scheduler.NewThread is a very inefficient scheduler; EventLoopScheduler (creates exactly one thread) or Scheduler.TaskPool (uses the TPL pool, as if you created a Task for each item) are much more efficient (of course, in this case, since you only have 10 items, Scheduler.Immediate is best!)
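The underlying race is not Rx-specific: any host that tears down before its background threads finish will drop output. A plain-Java sketch of "wait for the worker before exiting", using a latch as a deterministic stand-in for the suggested Thread.Sleep:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class AwaitWorkerDemo {
    /** Counts items delivered on a background thread, waiting for all of them. */
    static int receiveAll(int items) throws InterruptedException {
        AtomicInteger received = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            for (int i = 0; i < items; i++) received.incrementAndGet();
            done.countDown();
        });
        worker.setDaemon(true); // like LINQPad's threads: killed when the host exits
        worker.start();
        done.await();           // without this, the host may exit before all items arrive
        return received.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(receiveAll(10)); // 10: nothing is lost once we wait
    }
}
```

A Thread.Sleep works the same way but guesses at the duration; a latch (or, in Rx, blocking on the sequence's completion) removes the guess.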
It appears that the problem is the timing between GroupBy starting the subscription to a new group and the delay before that new subscription is actually set up. If you increase the number of iterations from 10 to 100, you should start seeing some results after a period of time.
Also, if you change the GroupBy to .Where(x => x % 3 == 0), you will likely notice that no values are lost, because the dynamic subscription to the IObservable groups doesn't need to initialize new observers.