Memory issue with Camel resequencer [memory-leaks]

I have a Camel resequencer for prioritizing incoming messages:
from(fromLocation)
    .routeId(routeName)
    .autoStartup(true)
    .convertBodyTo(String.class)
    .threads(1)
    .setHeader(PRIORITY, constant(1))
    .to(SEDA_PROCESS_NOTIFICATIONS + id);

from(SEDA_PROCESS_NOTIFICATIONS + id)
    .resequence(header(PRIORITY))
    .allowDuplicates()
    // ... rest of this route (the actual processing step) omitted in the original

from(endpoint2)
    .setHeader(PRIORITY, constant(2))
    .to(SEDA_PROCESS_NOTIFICATIONS + id);
However, after running this for 3-4 hours I get a memory exception from the resequencer's BatchProcessor. The BatchProcessor is not releasing memory, and the error persists even if I decrease the batch size. If I use the resequencer's stream mode instead, not all of the messages get processed. What am I missing here?
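For reference, a minimal sketch of how the batch resequencer's size and timeout are usually configured in Camel's Java DSL; the endpoint names and numbers below are placeholders, not taken from the question:

from("seda:processNotifications")
    // batch mode: collect at most 100 messages or wait at most 1000 ms,
    // then sort the collected batch by the PRIORITY header and send it on
    .resequence(header(PRIORITY)).batch().size(100).timeout(1000)
    .allowDuplicates()
    .to("direct:handleNotification");

The stream-mode variant would be written as .resequence(header(PRIORITY)).stream().capacity(5000).timeout(1000); note that stream mode by default expects comparable, consecutive sequence numbers rather than just two priority values, which may be related to messages not being released in that mode.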

Related

Service Fabric runtime is not reclaiming unused memory from Actor Service Instance

In our app, one request creates one actor in one of the partitions. Memory increases by 200-250 MB while the actor executes, and after execution finishes I delete the actor:
Task.Run(() =>
{
    actorObject.ExecuteGrainAsync(requestId, jsonModel).ContinueWith(async (t) =>
    {
        await GrainFactory.DeleteActor(actorObject.GetActorId(), "Workflow", CancellationToken.None);
    });
});
IActorService myActorServiceProxy = ActorServiceProxy.Create(
    new Uri($"fabric:/APPSeConnect.WebAgent/{actorName}"), actorId);
await myActorServiceProxy.DeleteActorAsync(actorId, cancellationToken);
There are no active actors left in the partition (which we can verify by querying the fabric), but the memory stays claimed for many minutes (from 5 minutes to more than 50). Afterwards the process's memory only shrinks by a few MB.
According to this document, the memory should be reclaimed by the runtime after the idle timeout. My setting is
new ActorGarbageCollectionSettings(10, 2)
I am using this attribute too
[StatePersistence(StatePersistence.None)]
During parallel processing of actors its impact is huge.
We have been facing a similar issue to yours, where the Actor Service does not release its claimed memory within 15 minutes.
The test case is:
We request an Actor and wait for the actor to complete its execution.
The claimed memory is in 100s of MBs.
After execution is complete, the Actor is programmatically deleted using the DeleteActorAsync API, as you mentioned.
The garbage collection setting is also set: ActorGarbageCollectionSettings = new ActorGarbageCollectionSettings(10, 5).
But in our case the whole of the memory is only released after at least 30 minutes, which makes us think it is probably because of the generations of the .NET GC.

RabbitMQ: how to limit consuming rate

I need to limit the rate at which messages are consumed from a RabbitMQ queue.
I have found many suggestions, but most of them recommend the prefetch option. That option doesn't do what I need: even with prefetch set to 1 the rate is about 6000 messages/sec, which is far too many for my consumer.
I need to limit for example about 70 to 200 messages per second. This means consuming one message every 5-14ms. No simultaneous messages.
I'm using Node.JS with amqp.node library.
Implementing a token bucket might help:
https://en.wikipedia.org/wiki/Token_bucket
You can write a producer that publishes to the "token bucket queue" at a fixed rate, with a TTL on each message (maybe expiring after a second?), or just set a maximum queue length equal to your rate per second. Consumers that receive a message from the "normal queue" must also receive a message from the "token bucket queue" before processing it, which effectively rate-limits the application.
NodeJS + amqplib Example:
var queueName = 'my_token_bucket';
rabbitChannel.assertQueue(queueName, {durable: true, messageTtl: 1000, maxLength: bucket.ratePerSecond});
writeToken();

function writeToken() {
    // publish one token per interval; Buffer.from replaces the deprecated new Buffer()
    rabbitChannel.sendToQueue(queueName, Buffer.from(new Date().toISOString()), {persistent: true});
    setTimeout(writeToken, 1000 / bucket.ratePerSecond);
}
I've already found a solution.
I use the nanotimer module from npm to calculate delays.
I calculate delay = 1 / [messages_per_second] (in nanoseconds).
Then I consume messages with prefetch = 1.
Then I calculate the real delay as delay - [message_processing_time].
Then I wait for that real delay before sending the ack for the message.
It works perfectly. Thanks to all.
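For illustration, the same delay-before-ack idea sketched with the RabbitMQ Java client (the original answer used Node.js with nanotimer; the host, queue name and target rate below are assumptions):

import com.rabbitmq.client.*;
import java.util.concurrent.TimeUnit;

public class PacedConsumer {
    public static void main(String[] args) throws Exception {
        double messagesPerSecond = 100.0;                 // target rate (example value)
        long delayMs = (long) (1000.0 / messagesPerSecond);

        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                     // assumed broker location
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        channel.basicQos(1);                              // at most one unacked message at a time

        DeliverCallback onDeliver = (consumerTag, delivery) -> {
            long start = System.currentTimeMillis();
            // process(delivery.getBody());               // actual processing goes here
            long elapsed = System.currentTimeMillis() - start;
            try {
                // pace consumption by delaying the ack for the remainder of the interval
                TimeUnit.MILLISECONDS.sleep(Math.max(0, delayMs - elapsed));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };

        // manual ack (autoAck = false) so the delayed ack actually throttles delivery
        channel.basicConsume("work-queue", false, onDeliver, consumerTag -> { });
    }
}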
See 'Fair Dispatch' in RabbitMQ Documentation.
For example in a situation with two workers, when all odd messages are heavy and even messages are light, one worker will be constantly busy and the other one will do hardly any work. Well, RabbitMQ doesn't know anything about that and will still dispatch messages evenly.
This happens because RabbitMQ just dispatches a message when the message enters the queue. It doesn't look at the number of unacknowledged messages for a consumer. It just blindly dispatches every n-th message to the n-th consumer.
In order to defeat that we can use the prefetch method with the value of 1. This tells RabbitMQ not to give more than one message to a worker at a time. Or, in other words, don't dispatch a new message to a worker until it has processed and acknowledged the previous one. Instead, it will dispatch it to the next worker that is not still busy.
I don't think RabbitMQ can provide this feature out of the box.
If you have only one consumer, the whole thing is pretty easy: just let it sleep between consuming messages.
If you have multiple consumers, I would recommend using some shared state to enforce the rate. For example, you might have 10 consumers consuming messages. To keep a 70-200 messages/sec rate across all of them, each one makes a call to Redis to check whether it is eligible to process the next message; if yes, it updates Redis to show the other consumers that one more message is currently in process (a sketch of such a check follows below).
If you have no control over the consumer, implement option 1 or 2 in a separate service and publish the messages back to Rabbit. That way the original consumer will consume messages at the desired pace.
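One possible shape for that shared check, sketched with the Jedis client and a simple fixed one-second window (the key prefix, limit and expiry are illustrative; the answer above does not prescribe a particular scheme):

import redis.clients.jedis.Jedis;

// Fixed-window limiter shared by all consumers: each consumer asks Redis whether
// it may process the next message within the current one-second window.
public class SharedRateLimiter {
    private static final int MAX_PER_SECOND = 200;     // example limit from the question's range

    public static boolean tryAcquire(Jedis jedis) {
        String key = "consumer-rate:" + (System.currentTimeMillis() / 1000);
        long count = jedis.incr(key);                   // atomic increment across all consumers
        if (count == 1) {
            jedis.expire(key, 2);                       // let old windows expire on their own
        }
        return count <= MAX_PER_SECOND;                 // false -> wait or requeue before acking
    }
}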
This is how I fixed mine, with just setTimeout.
I set mine to consume one message every 200 ms, which works out to 5 messages per second; in my case the processing does an update-if-exists.
channel.consume(transactionQueueName, async (data) => {
    let dataNew = JSON.parse(data.content);
    const processedTransaction = await seperateATransaction(dataNew);
    // delay the ack to avoid duplicate entries - important: don't remove the setTimeout
    setTimeout(function () {
        channel.ack(data);
    }, 200);
});
Done

using MPI_Send_variable many times in a row before MPI_Recv_variable

To my current understanding, after calling MPI_Send, the calling thread should block until the variable is received, so my code below shouldn't work. However, I tried sending several variables in a row and receiving them gradually while doing operations on them and this still worked... See below. Can someone clarify step by step what is going on here?
MATLAB code (I am using a MATLAB MEX wrapper for the MPI functions):
%send
if mpirank==0
    %arguments to MPI_Send_variable are (variable, destination, tag)
    MPI_Send_variable(x,0,'A_22') %thread 0 should block here!
    MPI_Send_variable(y,0,'A_12')
    MPI_Send_variable(z,1,'A_11')
    MPI_Send_variable(w,1,'A_21')
end
%receive
if mpirank==0
    %arguments to MPI_Recv_variable are (source, tag)
    a=MPI_Recv_variable(0,'A_12')*MPI_Recv_variable(0,'A_22');
end
if mpirank==1
    c=MPI_Recv_variable(0,'A_21')*MPI_Recv_variable(0,'A_22');
end
MPI_SEND is a blocking call only in the sense that it blocks until it is safe for the user to use the buffer provided to it. The important text to read here is in Section 3.4:
The send call described in Section 3.2.1 uses the standard communication mode. In this mode, it is up to MPI to decide whether outgoing messages will be buffered. MPI may buffer outgoing messages. In such a case, the send call may complete before a matching receive is invoked. On the other hand, buffer space may be unavailable, or MPI may choose not to buffer outgoing messages, for performance reasons. In this case, the send call will not complete until a matching receive has been posted, and the data has been moved to the receiver.
The part you're running up against is the statement that MPI may buffer outgoing messages, in which case the send call may complete before a matching receive is invoked. If your message is sufficiently small (and there are sufficiently few of them), MPI will copy your send buffers to an internal buffer and keep track of things internally until the message has been received remotely. There's no guarantee that when MPI_SEND returns, the message has been received.
On the other hand, if you do want to know that the message was actually received, you can use MPI_SSEND. That function will synchronize (hence the extra S) both sides before allowing them to return from the MPI_SSEND and from the matching receive call on the other end.
In a correct MPI program, you cannot do a blocking send to yourself without first posting a nonblocking receive. So a correct version of your program would look something like this:
MPI_Request req1, req2;

MPI_Irecv(recvbuf1, count, MPI_DOUBLE, rank, TAG1, MPI_COMM_WORLD, &req1);
MPI_Irecv(recvbuf2, count, MPI_DOUBLE, rank, TAG2, MPI_COMM_WORLD, &req2);
MPI_Send(sendbuf1, count, MPI_DOUBLE, rank, TAG1, MPI_COMM_WORLD);   /* send to self */
MPI_Send(sendbuf2, count, MPI_DOUBLE, rank, TAG2, MPI_COMM_WORLD);   /* send to self */
MPI_Wait(&req1, MPI_STATUS_IGNORE);
/* do work */
MPI_Wait(&req2, MPI_STATUS_IGNORE);
/* do more work */
Your code is technically incorrect, but the reason it appears to work is that the MPI implementation uses internal buffers to hold your send data before it is transmitted to the receiver (or matched to the later receive operation, in the case of sends to self). An MPI implementation is not required to have such buffers (generally called "eager buffers"), but most implementations do.
Since the data you are sending is small, the eager buffers are generally sufficient to buffer them temporarily. If you send large enough data, the MPI implementation will not have enough eager buffer space and your program will deadlock. Try sending, for example, 10 MB instead of a double in your program to notice the deadlock.
I assume there is just an MPI_Send() behind MPI_Send_variable() and an MPI_Recv() behind MPI_Recv_variable().
How can a process ever receive a message that it sent to itself if both the send and the receive operations are blocking? Either the send to self or the receive from self has to be non-blocking, or you get a deadlock, which is why I thought sending to self was forbidden.
Following @Greginozemtsev's answer to Is the behavior of MPI communication of a rank with itself well-defined?, the MPI standard states that send to self and receive from self are allowed. I guess that implies they are effectively non-blocking in this particular case.
In MPI 3.0, section 3.2.4 (Blocking Receive), page 59, the wording has not changed since MPI 1.1:
Source = destination is allowed, that is, a process can send a message to itself. (However, it is unsafe to do so with the blocking send and receive operations described above, since this may lead to deadlock. See Section 3.5.)
I read Section 3.5, but it's not clear enough for me...
I guess the parenthetical is there to tell us that talking to oneself is not good practice, at least for MPI communications!

NIO Server : use a worker thread or not?

I'm building a server with NIO, and I have two questions.
Do I have to use a worker thread or a thread pool to process the received messages, or can I let the main thread do all of this? (I have performance requirements.)
I have two kinds of sending: a sendNow method that ends with selector.selectNow(), and a simple send method that ends with selector.wakeup(). Can I lose data with either of those methods?
Thanks.
If possible, try to do it all in one thread (see the sketch below); it gets very complicated very quickly otherwise.
I don't know why you think a sendNow() method needs to end with either selectNow() or wakeup(), but neither of them is intrinsically going to cause data loss.
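To illustrate the single-threaded approach, here is a minimal sketch of one selector loop handling accepts, reads and writes on the same thread (the port and the echo-style handling are placeholders, not the asker's server):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

public class SingleThreadedNioServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));          // example port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocate(4096);
        while (true) {
            selector.select();                             // blocks until something is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    int n = client.read(buffer);
                    if (n == -1) {
                        client.close();                    // peer closed the connection
                    } else {
                        buffer.flip();
                        client.write(buffer);              // echo back; real work goes here
                    }
                }
            }
        }
    }
}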

640 enterprise library caching threads - how?

We have an application that is undergoing performance testing. Today, I decided to take a dump of w3wp & load it in windbg to see what is going on underneath the covers. Imagine my surprise when I ran !threads and saw that there are 640 background threads, almost all of which seem to say the following:
OS Thread Id: 0x1c38 (651)
Child-SP RetAddr Call Site
0000000023a9d290 000007ff002320e2 Microsoft.Practices.EnterpriseLibrary.Caching.ProducerConsumerQueue.WaitUntilInterrupted()
0000000023a9d2d0 000007ff00231f7e Microsoft.Practices.EnterpriseLibrary.Caching.ProducerConsumerQueue.Dequeue()
0000000023a9d330 000007fef727c978 Microsoft.Practices.EnterpriseLibrary.Caching.BackgroundScheduler.QueueReader()
0000000023a9d380 000007fef9001552 System.Threading.ExecutionContext.runTryCode(System.Object)
0000000023a9dc30 000007fef72f95fd System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
0000000023a9dc80 000007fef9001552 System.Threading.ThreadHelper.ThreadStart()
If I had to guess, I'd say that one of these threads is getting spawned for each run of our app - we have 2 app servers, 20 concurrent users, and ran the test approximately 30 times... it's in the neighborhood.
Is this 'expected behavior', or have we perhaps implemented something improperly? The test ran hours ago, so I would have expected any timeouts to have occurred already.
Edit: Thank you all for your replies. It has been requested that more detail be shown about the callstack - here is the output of !mk from sosex.dll.
ESP RetAddr
00:U 0000000023a9cb38 00000000775f72ca ntdll!ZwWaitForMultipleObjects+0xa
01:U 0000000023a9cb40 00000000773cbc03 kernel32!WaitForMultipleObjectsEx+0x10b
02:U 0000000023a9cc50 000007fef8f5f595 mscorwks!WaitForMultipleObjectsEx_SO_TOLERANT+0xc1
03:U 0000000023a9ccf0 000007fef8f59f49 mscorwks!Thread::DoAppropriateAptStateWait+0x41
04:U 0000000023a9cd50 000007fef8e55b99 mscorwks!Thread::DoAppropriateWaitWorker+0x191
05:U 0000000023a9ce50 000007fef8e2efe8 mscorwks!Thread::DoAppropriateWait+0x5c
06:U 0000000023a9cec0 000007fef8f0dc7a mscorwks!CLREvent::WaitEx+0xbe
07:U 0000000023a9cf70 000007fef8fba72e mscorwks!Thread::Block+0x1e
08:U 0000000023a9cfa0 000007fef8e1996d mscorwks!SyncBlock::Wait+0x195
09:U 0000000023a9d0c0 000007fef9463d3f mscorwks!ObjectNative::WaitTimeout+0x12f
0a:M 0000000023a9d290 000007ff002321b3 *** ERROR: Module load completed but symbols could not be loaded for Microsoft.Practices.EnterpriseLibrary.Caching.DLL
Microsoft.Practices.EnterpriseLibrary.Caching.ProducerConsumerQueue.WaitUntilInterrupted()(+0x0 IL)(+0x11 Native)
0b:M 0000000023a9d2d0 000007ff002320e2 Microsoft.Practices.EnterpriseLibrary.Caching.ProducerConsumerQueue.Dequeue()(+0xf IL)(+0x18 Native)
0c:M 0000000023a9d330 000007ff00231f7e Microsoft.Practices.EnterpriseLibrary.Caching.BackgroundScheduler.QueueReader()(+0x9 IL)(+0x12 Native)
0d:M 0000000023a9d380 000007fef727c978 System.Threading.ExecutionContext.runTryCode(System.Object)(+0x18 IL)(+0x106 Native)
0e:U 0000000023a9d440 000007fef9001552 mscorwks!CallDescrWorker+0x82
0f:U 0000000023a9d490 000007fef8e9e5e3 mscorwks!CallDescrWorkerWithHandler+0xd3
10:U 0000000023a9d530 000007fef8eac83f mscorwks!MethodDesc::CallDescr+0x24f
11:U 0000000023a9d790 000007fef8f0cbd2 mscorwks!ExecuteCodeWithGuaranteedCleanupHelper+0x12a
12:U 0000000023a9da20 000007fef945e572 mscorwks!ReflectionInvocation::ExecuteCodeWithGuaranteedCleanup+0x172
13:M 0000000023a9dc30 000007fef7261722 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)(+0x60 IL)(+0x51 Native)
14:M 0000000023a9dc80 000007fef72f95fd System.Threading.ThreadHelper.ThreadStart()(+0x8 IL)(+0x2a Native)
15:U 0000000023a9dcd0 000007fef9001552 mscorwks!CallDescrWorker+0x82
16:U 0000000023a9dd20 000007fef8e9e5e3 mscorwks!CallDescrWorkerWithHandler+0xd3
17:U 0000000023a9ddc0 000007fef8eac83f mscorwks!MethodDesc::CallDescr+0x24f
18:U 0000000023a9e010 000007fef8f9ae8d mscorwks!ThreadNative::KickOffThread_Worker+0x191
19:U 0000000023a9e330 000007fef8f59374 mscorwks!TypeHandle::GetParent+0x5c
1a:U 0000000023a9e380 000007fef8e52045 mscorwks!SVR::gc_heap::make_heap_segment+0x155
1b:U 0000000023a9e450 000007fef8f66139 mscorwks!ZapStubPrecode::GetType+0x39
1c:U 0000000023a9e490 000007fef8e1c985 mscorwks!ILCodeStream::GetToken+0x25
1d:U 0000000023a9e4c0 000007fef8f594e1 mscorwks!Thread::DoADCallBack+0x145
1e:U 0000000023a9e630 000007fef8f59399 mscorwks!TypeHandle::GetParent+0x81
1f:U 0000000023a9e680 000007fef8e52045 mscorwks!SVR::gc_heap::make_heap_segment+0x155
20:U 0000000023a9e750 000007fef8f66139 mscorwks!ZapStubPrecode::GetType+0x39
21:U 0000000023a9e790 000007fef8e20e15 mscorwks!ThreadNative::KickOffThread+0x401
22:U 0000000023a9e7f0 000007fef8e20ae7 mscorwks!ThreadNative::KickOffThread+0xd3
23:U 0000000023a9e8d0 000007fef8f814fc mscorwks!Thread::intermediateThreadProc+0x78
24:U 0000000023a9f7a0 00000000773cbe3d kernel32!BaseThreadInitThunk+0xd
25:U 0000000023a9f7d0 00000000775d6a51 ntdll!RtlUserThreadStart+0x1d
Yes, the caching block has some - issues - with regard to the scavenger threads in older versions of Entlib, particularly if things are coming in faster than the scavenging settings let them come out.
This was completely rewritten in Entlib 5, so that now you'll never have more than two threads sitting in the caching block, regardless of the load, and usually it'll only be one.
Unfortunately there's no easy tweak to change the behavior in earlier versions. The best you can do is change the cache settings so that each scavenge will clean out more items at a time so not as many scavenge requests need to get scheduled.
640 threads is very bad for performance. If they are all waiting for something, then I'd say it's a fair bet that you have a deadlock and they will never exit. If they are all running (not waiting)... well, with 600+ threads on a 2 or 4 core processor none of them will get enough time slices to run very far! ;>
If your app is set up with a main thread that waits on the thread handles to find out when the threads exit, and the background threads get caught up in a loop or in a wait state and never exit the thread proc, then the process and all of its threads will never exit.
Check your thread code to make sure that every threadproc has a clear path to exit the threadproc. It's bad form to write an infinite loop in a background thread on the assumption that the thread will be forcibly terminated when the process shuts down.
If the background thread code spins in a loop waiting for an event handle to signal, make sure that you have some way to signal that event so that the thread can perform a normal orderly exit. Otherwise, you need to write the background thread to wait on multiple events and unblock when any one of the events signals. One of those events can be the activity that the background thread is primarily interested in and the other can be a shutdown event.
From the names of things in the stack dump you posted, it would appear that the thread is waiting for something to appear in the ProducerConsumerQueue. Investigate how that queue object is supposed to be shut down, probably on the producer side, and whether shutting down the queue will automatically release all consumers that are waiting on that queue.
My guess is that either the queue is not being shut down correctly or shutting it down does not implicitly release the consumers that are waiting on it. If the latter case, you may need to pump a terminate message through the queue to wake up all the consumers waiting on that queue and tell them to break out of their wait loop and exit.
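A generic illustration of that terminate-message (poison pill) idea, sketched in Java with a BlockingQueue; the Entlib ProducerConsumerQueue is a different class and may offer its own shutdown mechanism:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PoisonPillDemo {
    // Sentinel object pumped through the queue to tell the consumer to exit.
    private static final Object POISON_PILL = new Object();

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    Object item = queue.take();          // blocks until an item is available
                    if (item == POISON_PILL) {
                        break;                           // orderly exit from the thread proc
                    }
                    // process(item);                    // real work goes here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();      // treat interruption as shutdown too
            }
        });
        consumer.start();

        queue.put("work item");
        queue.put(POISON_PILL);                          // signal shutdown
        consumer.join();                                 // consumer exits instead of waiting forever
    }
}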
You have a major issue. Every thread occupies 1 MB of stack, and a significant cost is paid for context-switching every thread in and out. It gets especially bad with managed code, because every time the GC runs it has to walk the thread stacks to look for roots, and when those stacks have been paged out to disk, reading them back in is expensive, which adds to the performance problem.
Creating threads is bad unless you know exactly what you are doing; Jeffrey Richter has written about this in detail.
To diagnose the issue I would look at what these threads are blocked on, and also put a breakpoint on thread creation (for example, sxe ct within windbg).
Then rearchitect to avoid creating threads and use the thread pool instead (a small illustration follows below).
It would have been nice to see some call stacks of these threads.
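As a generic sketch of the thread-pool alternative mentioned above (shown here in Java terms; in .NET the equivalent would be the ThreadPool or Task APIs, and the work items are placeholders):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PooledWork {
    public static void main(String[] args) throws InterruptedException {
        // A small fixed-size pool instead of one dedicated thread per work item.
        ExecutorService pool = Executors.newFixedThreadPool(4);

        for (int i = 0; i < 1000; i++) {
            final int item = i;
            pool.submit(() -> System.out.println("processing item " + item));
        }

        pool.shutdown();                                 // stop accepting new work
        pool.awaitTermination(1, TimeUnit.MINUTES);      // let queued work drain
    }
}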
In Microsoft Enterprise Library 4.1, the BackgroundScheduler class creates a new thread each time an object is instantiated. This will be fixed in version 5.0. I do not know enough about this Microsoft library to advise you on how to avoid that behavior, but you may try the beta version: http://entlib.codeplex.com/wikipage?title=EntLib5%20Beta2
