Azure Function with Java: configure batchSize and newBatchThreshold efficiently

I'm considering using this kind of solution with a Function triggered by a queue, written in Java. I'm trying to understand how to configure batchSize and newBatchThreshold efficiently. Below is what I've managed to find out; please correct me as soon as you spot a mistake in my reasoning:
The Function runs in an environment with a single CPU core;
the Function polls messages from the queue in batches of 16 by default and processes them in parallel (straight from the documentation);
so I conclude that:
if the messages involve CPU-intensive work, they are effectively processed sequentially;
so I conclude that:
since processing of all messages starts at the same time (when the batch arrives), the later messages in the batch take longer and longer to finish (confirmed experimentally);
all of that ever-growing processing time is billable (even though the function body itself executes in a tenth of that time);
so I conclude that:
one should set both batchSize and newBatchThreshold to 1 for CPU-intensive workloads, and only vary them for non-CPU-intensive (essentially I/O-bound) workloads.
Does it make sense?
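For reference, both settings live in host.json under the queues section (the defaults are batchSize 16 and newBatchThreshold batchSize/2). A minimal host.json sketch for the CPU-intensive case described above, using the Functions v2+ layout and illustrative values:

{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 1,
      "newBatchThreshold": 0
    }
  }
}

With batchSize 1 and newBatchThreshold 0 the host works on one message at a time and only fetches the next message once nothing is in flight; for I/O-bound work you can raise both values again.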

Related

Could Durable Functions reduce my execution time?

I can execute a process "x" in parallel using Azure Durable Functions fan-out/fan-in.
If I divide my single process "x" into multiple processes using this concept, can I reduce the execution time of the function?
In general, Azure Functions Premium allows for higher timeout values. So, if you don't want to deal with the issue, just upgrade ;-)
Azure Durable Functions might or might not reduce your total runtime.
BUT every "Activity Call" is a new function execution with its own timeout.
Either fanning out or even calling activities serially will avoid the timeout issue, as long as no single called activity runs longer than the function timeout period.
If, however, you have an activity that runs for an extended period, you will need Premium Functions anyway. But your "batch" processing approach looks quite promising for avoiding this.
With the fan-out/fan-in approach you run tasks in parallel instead of sequentially, so the overall duration is roughly the duration of your longest single task. It's the best approach when the requests don't need information from each other in order to be processed.
If you don't want to be on Durable Functions, you could use the Task-based Asynchronous Pattern (TAP) to build the tasks, call the relevant methods, and wait for all of them to finish.
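Outside Durable Functions, the fan-out/fan-in shape itself is just "start everything, then wait for all of it". A minimal sketch in plain Java (processChunk and the chunks list are hypothetical stand-ins for your own work; in C# the equivalent wait is Task.WhenAll):

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class FanOutFanIn {

    // Hypothetical stand-in for one piece of the process "x" after splitting it up.
    static String processChunk(String chunk) {
        return chunk.toUpperCase();
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("part-1", "part-2", "part-3");

        // Fan out: start every chunk in parallel instead of sequentially.
        List<CompletableFuture<String>> futures = chunks.stream()
                .map(chunk -> CompletableFuture.supplyAsync(() -> processChunk(chunk)))
                .collect(Collectors.toList());

        // Fan in: wait for all of them; total duration is roughly that of the slowest chunk.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

        futures.forEach(future -> System.out.println(future.join()));
    }
}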

How can I debug and see what gets stored over time in the FIFO queue of the Inter-Thread Communication plugin?

I have the following JMeter context:
In Concurrency Thread Group 1, I have a JSR223 Sampler which sends request messages to MQ queue1 and always gets the JMSMessageID and an epoch timestamp (derived from JMS_IBM_PutDate + JMS_IBM_PutTime) and puts them into one variable. Underneath this Sampler is an Inter-Thread Communication PostProcessor element which takes the data from this variable and puts it into a FIFO queue.
In another Concurrency Thread Group 2, I have another JSR223 Sampler with code to get, from MQ queue2, the response messages for all the messages sent to MQ queue1.
To do this (and be able to calculate the response time for each message), before the JSR223 Sampler executes I use the Inter-Thread Communication PreProcessor element, which gets a message ID and a timestamp from the FIFO queue (60-second timeout) and passes them to a variable the JSR223 Sampler can use to calculate the request-response time for each message.
I want to stress-test the system, which is why I am gradually increasing the requests per second every minute (for script-testing purposes) in both thread groups, like so:
I use the tstFeedback function of Concurrency Thread Group for this:
${__tstFeedback(ThroughputShapingTimerIn,1,1000,10)}
My problem is this:
When I gradually increase the desired TPS load, during the first 4 target TPS steps the consumer threads keep up (stay synchronized) with the producer threads, but as time passes and the load increases, the consumer threads seem to take more and more time to find and consume the messages. It's as though the consumer threads can no longer keep up with the load of the producer threads, despite both thread groups having the same load pattern. This eventually causes queue2, which holds the response messages, to fill up. Here is a visual representation of what I mean:
The consumer samples end up being far fewer than the producer samples. My expectation is that they should be more or less equal...
I need to understand how to go about debugging this script and isolating the cause:
I think that something happens at the inter-thread synchronization level because sometimes I am getting null values from the FIFO queue into the consumer threads - I need to understand what gets put into that FIFO queue and what gets taken off of that FIFO queue.
How can I print what is present in the FIFO list at each iteration?
Does anyone have any suggestions for what could be the cause of this behavior and how to mitigate it?
Any help/suggestion is greatly appreciated.
First of all, take a look at the jmeter.log file: you have at least 865 errors there, so I strongly doubt your Groovy scripts are doing what they're supposed to be doing.
Don't run your test in GUI mode; it's only for test development and debugging. When it comes to execution you should be using command-line non-GUI mode.
When you call __fifoPop() you can save the value into a JMeter variable, like ${__fifoPop(queue-name,some-variable)}, and the variable can then be inspected using the Debug Sampler. The size of the queue can be checked using the __fifoSize() function.
Alternatively, my expectation is that such a Groovy expert as you shouldn't have any problems printing queue items in Groovy code:
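For example, a minimal JSR223 logging sketch (a debugging aid only; messageData is an assumed variable name - use whatever variable your Inter-Thread Communication PostProcessor reads from and your PreProcessor writes to):

// JSR223 PostProcessor in the producer thread group, next to the Inter-Thread
// Communication PostProcessor: log what is about to be put onto the FIFO queue.
log.info('Putting onto FIFO: ' + vars.get('messageData'))

// JSR223 PreProcessor in the consumer thread group, placed after the Inter-Thread
// Communication PreProcessor: log what was just taken off the FIFO queue.
String item = vars.get('messageData')
if (item == null) {
    log.warn('FIFO pop returned null - the 60 second timeout expired before a matching entry arrived')
} else {
    log.info('Taken off FIFO: ' + item)
}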

Why is Python consistently struggling to keep up with constant generation of asyncio tasks?

I have a Python project with a server that distributes work to one or more clients. Each client is given a number of assignments which contain parameters for querying a target API. This includes a maximum number of requests per second they can make with a given API key. The clients process the response and send the results back to the server to store into a database.
Both the server and clients use Tornado for asynchronous networking. My initial implementation for the clients relied on the PeriodicCallback to ensure that n-number of calls to the API would occur. I thought that this was working properly as my tests would last 1-2 minutes.
I added some telemetry to collect statistics on performance and noticed that the clients were actually having issues after almost exactly 2 minutes of runtime. I had set the API requests to 20 per second (the maximum allowed by the API itself), which the clients could reliably hit. However, after 2 minutes performance would fluctuate between 12 and 18 requests per second. The number of active tasks steadily increased until it hit the maximum number of active assignments (100) given by the server, and the HTTP request time to the API was reported by Tornado to go from 0.2-0.5 seconds to 6-10 seconds. Performance is steady if I only do 14 requests per second. Anything higher than 15 requests per second will experience issues 2-3 minutes after starting. Logs can be seen here. Notice how the column of "Active Queries" is steady until 01:19:26. I've truncated the log to demonstrate the issue.
I believed the issue was the use of a single process on the client to handle both communication to the server and the API. I proceeded to split the primary process into several different processes. One handles all communication to the server, one (or more) handles queries to the API, another processes API responses into a flattened class, and finally a multiprocessing Manager for Queues. The performance issues were still present.
I thought that, perhaps, Tornado was the bottleneck and decided to refactor. I chose aiohttp and uvloop. I split the primary process in a similar manner to that in the previous attempt. Unfortunately, performance issues are unchanged.
I took both refactors and enabled them to split work into several querying processes. However, no matter how much you split the work, you still encounter problems after 2-3 minutes.
I am using both Python 3.7 and 3.8 on MacOS and Linux.
At this point, it does not appear to be a limitation of a single package. I've thought about the following:
Python's asyncio library cannot handle more than 15 coroutines/tasks being generated per second
I doubt that this is true given that different libraries claim to be able to handle several thousand messages per second simultaneously. Also, we can hit 20 requests per second just fine at the start with very consistent results.
The API is unable to handle more than 15 requests from a single client IP
This is unlikely as I am not the only user of the API and I can request 20 times per second fairly consistently over an extended period of time if I over-subscribe processes to query from the API.
There is a system configuration causing the limitation
I've tried both MacOS and Debian, which yield the same results. It's possible that it's a *nix problem.
Variations in responses cause a backlog which grows linearly until it cannot be tackled fast enough
Sometimes responses from the API grow and shrink between 0.2 and 1.2 seconds. The number of active tasks returned by asyncio.all_tasks remains consistent in the telemetry data. If this were true, we wouldn't be consistently encountering the issue at the same time every time.
We're overtaxing the hardware with the number of tasks generated per second and causing thermal throttling
Although CPU temperatures spike, neither MacOS nor Linux report any thermal throttling in the logs. We are not hitting more than 80% CPU utilization on a single core.
At this point, I'm not sure what's causing it and have considered refactoring the clients into a different language (perhaps C++ with Boost libraries). Before I dive into something so foolish, I wanted to ask if I'm missing something simple.
Conclusion
Performance appears to vary wildly depending on time of day. It's likely to be the API.
How this conclusion was made
I created a new project to demonstrate the capabilities of asyncio and determine if it's the bottleneck. This project takes two websites, one to act as the baseline and the other is the target API, and runs through different methods of testing:
Spawn one process per core, pass a semaphore, and query up to n-times per second
Create a single event loop and create n-number of tasks per second (sketched below)
Create multiple processes with an event loop each to distribute the work, with each loop performing (n-number / processes) tasks per second
(Note that spawning processes is incredibly slow and often commented out unless using high-end desktop processors with 12 or more cores)
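A minimal sketch of that second method, just to make the setup concrete (the URL and the rate are placeholders, not the original project's values):

import asyncio
import time

import aiohttp

RATE_PER_SECOND = 20
URL = "https://example.com/"

async def fetch(session: aiohttp.ClientSession) -> None:
    started = time.monotonic()
    async with session.get(URL) as response:
        await response.read()
    print(f"{response.status} in {time.monotonic() - started:.2f}s")

async def main() -> None:
    pending = set()  # keep references so tasks aren't garbage-collected mid-flight
    async with aiohttp.ClientSession() as session:
        while True:
            for _ in range(RATE_PER_SECOND):
                task = asyncio.create_task(fetch(session))
                pending.add(task)
                task.add_done_callback(pending.discard)
            await asyncio.sleep(1)  # controls the spawn rate, not the completion rate

if __name__ == "__main__":
    asyncio.run(main())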
The baseline website would be queried up to 50 times per second. asyncio could complete 30 tasks per second reliably for an extended period, with each task completing their run in 0.01 to 0.02 seconds. Responses were very consistent.
The target website would be queried up to 20 times per second. Sometimes asyncio would struggle despite circumstances being identical (JSON handling, dumping response data to queue, returning immediately, no CPU-bound processing). However, results varied between tests and could not always be reproduced. Responses would be under 0.4 seconds initially but quickly increase to 4-10 seconds per request. 10-20 requests would return as complete per second.
As an alternative method, I chose a parent URI of the target website. This URI doesn't require a large query against their database and is instead served back as a static JSON response. Responses bounced between 0.06 seconds and 2.5-4.5 seconds. However, 30-40 responses would be completed per second.
Splitting requests across processes with their own event loop would decrease response time in the upper-bound range by almost half, but still took more than one second each to complete.
The inability to reproduce consistent results every time from the target website would indicate that it's a performance issue on their end.

Automatically renewing locks correctly on Azure Service Bus

So I'm trying to understand Service Bus timings, especially how the locks work. One can choose to manually call CompleteAsync, which is what we're doing. It can also be the case that processing takes some time; in those cases we want to make sure we don't get unnecessary MessageLockLostExceptions.
Seems there are a couple of numbers to relate to:
Lock duration (found in the Azure portal on the bus entity, currently set to 1 minute, which I think is the default)
AutoRenewTimeout (property on OnMessageOptions, currently set to 1 minute)
AutoComplete (property on OnMessageOptions, currently set to false)
Assume the processing runs for around 2 minutes and then either succeeds or crashes (it doesn't matter which for now). Let's say this is the normal scenario, so processing takes roughly 2 minutes for each message.
Also, it's indeed a queue and not a topic, and we only have one consumer that asynchronously processes the messages with MaxConcurrentCalls set to 100. We're using OnMessageAsync with ReceiveMode.PeekLock.
What should my settings now be as a single consumer to robustly process all messages?
I'm thinking that leaving the lock duration at 1 minute would be fine, as that's the default, and setting my AutoRenewTimeout to 5 minutes for safety, because as I understand it this value should be the maximum time it takes to process a message (at least according to this answer). Performance is not critical for this system, so my reasoning is that keeping a message locked for an unnecessary 1, 2 or 3 minutes is not evil, as long as we don't get lock-lost exceptions, because those provide no real value.
This thread and this thread give great examples of how to manually renew the locks, but I thought there was a way to renew them automatically.
What should my settings now be as a single consumer to robustly process all messages?
Aside from LockDuration, MaxConcurrentCalls, AutoRenewTimeout, and AutoComplete, there are some configurations of the Azure Service Bus client you might want to look into. For example, rather than a single client with MaxConcurrentCalls set to 100, create a few clients with the total concurrency level distributed among them. Note that you'd want to use different MessagingFactory instances to create those clients to ensure you have more than a single "pipe" for receiving messages. And even with that, it would be far better to scale out and have competing consumers rather than a single consumer handling all the load.
Now back to the settings. If your normal processing time is 2 minutes, it's better to set MaxLockDuration on the entity to that value, not 1 minute. This removes unnecessary lock-extension calls to the broker and eliminates MessageLockLostExceptions.
Also, keep in mind that AutoRenewTimeout is a client-side operation, not a broker-side one, and is therefore not guaranteed. You will run into cases where the lock is lost even though the AutoRenewTimeout has not yet elapsed.
AutoRenewTimeout should always be set longer than MaxLockDuration, as it is counterproductive to have them equal. Make it somewhat larger than MaxLockDuration; it is the client's "insurance" that the message lock won't be lost when processing takes longer than MaxLockDuration. Having the two equal essentially disables this fallback.
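To see those knobs together in code, here is a rough sketch of a single PeekLock consumer with manual completion and a 5-minute auto-renew window. It uses the track-1 Java Service Bus client (com.microsoft.azure.servicebus) purely to show the shape; the connection string, queue name and executor size are placeholders, and on the .NET client the same values go on OnMessageOptions and the queue's MaxLockDuration:

import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;

import com.microsoft.azure.servicebus.ExceptionPhase;
import com.microsoft.azure.servicebus.IMessage;
import com.microsoft.azure.servicebus.IMessageHandler;
import com.microsoft.azure.servicebus.MessageHandlerOptions;
import com.microsoft.azure.servicebus.QueueClient;
import com.microsoft.azure.servicebus.ReceiveMode;
import com.microsoft.azure.servicebus.primitives.ConnectionStringBuilder;

public class QueueConsumer {
    public static void main(String[] args) throws Exception {
        QueueClient client = new QueueClient(
                new ConnectionStringBuilder("<connection-string>", "<queue-name>"),
                ReceiveMode.PEEKLOCK);

        // 100 concurrent calls, no auto-complete, and an auto-renew window
        // comfortably above the ~2 minute processing time (and above MaxLockDuration).
        MessageHandlerOptions options =
                new MessageHandlerOptions(100, false, Duration.ofMinutes(5));

        client.registerMessageHandler(new IMessageHandler() {
            @Override
            public CompletableFuture<Void> onMessageAsync(IMessage message) {
                // ... roughly 2 minutes of work happens here ...
                return client.completeAsync(message.getLockToken()); // manual complete
            }

            @Override
            public void notifyException(Throwable exception, ExceptionPhase phase) {
                System.err.println(phase + ": " + exception.getMessage());
            }
        }, options, Executors.newFixedThreadPool(4));
    }
}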

Multithreading Task Library, Threading.Timer or threads?

Hi, we are building an application that will allow registering scheduled tasks.
Each task has a time interval at which it should be executed.
Each task should have a timeout.
The number of tasks can be unbounded, but is around 100 in normal cases.
So we have a list of tasks that need to be executed at intervals; what is the best solution?
I have looked at giving each task its own timer: when the timer elapses the work is started, and another timer keeps track of the timeout, so if the timeout is reached that second timer stops the thread.
This feels like we are overusing timers? Or could it work?
Another solution is to use a timer for each task, but when the timer elapses we put the task on a queue that is read by a pool of threads that execute the work.
Any other good solutions I should look for?
There is not too much information, but it looks like you can consider Rx as well - check more at MSDN.com.
You can think about your tasks as generated events which should be composed (scheduled) in some way. So you can do the following:
Spawn cancellable tasks with Observable.GenerateWithDisposable and your own Scheduler - check more at Rx 101 Sample
Delay tasks with Observable.Delay
Wait for tasks with Observable.Timeout
Compose tasks in any preferable way
Once again, you can check more at the links specified above.
You should check out Quartz.NET.
Quartz.NET is a full-featured, open source job scheduling system that can be used from the smallest apps to large scale enterprise systems.
I believe you would need to implement the timeout requirement yourself, but all the plumbing needed to schedule tasks could be handled by Quartz.NET.
I have done something like this before, where there were a lot of socket objects that needed periodic starts and timeouts. I used a 'TimedAction' class with 'OnStart' and 'OnTimeout' events (socket classes etc. derived from this), and one thread that handled all the timed actions. The thread maintained a list of TimedAction instances ordered by the tick time of the next action required (a delta queue). TimedAction objects were added to the list by queueing them to the thread's input queue. The thread waited on this input queue with a timeout (this was Windows, so 'WaitForSingleObject' on the handle of the semaphore that managed the queue), set to the 'next action required' tick count of the first item in the list. If the queue wait timed out, the relevant action event of the first item in the list was called and the item removed from the list - the next queue wait would then be set by the new first item in the list, which would contain the new 'nearest action time'. If a new TimedAction arrived on the queue, the thread calculated its timeout tick time (GetTickCount + the ms interval from the object) and inserted it into the sorted list at the correct place (yes, this sometimes meant moving a lot of objects up the list to make space).
The events called by the timeout handler thread could not take any lengthy actions in order to prevent delays to the handling of other timeouts. Typically, the event handlers would set some status enumeration, signal some synchro object or queue the TimedAction to some other P-C queue or IO completion port.
Does that make sense? It worked OK, processing thousands of timed actions in my server in a reasonably timely and efficient manner.
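Roughly, the shape of that thread looks like this (sketched in Java just to show the idea; TimedAction here is a bare stand-in for the real class, and the callbacks must stay short for the reasons above):

import java.util.PriorityQueue;

// A minimal sketch of the single "timed action" thread described above.
public class TimedActionScheduler implements Runnable {

    public static class TimedAction implements Comparable<TimedAction> {
        final long dueAtMillis;       // absolute time the action is due
        final Runnable onTimeout;     // must be quick: set a flag, signal, or re-queue
        public TimedAction(long dueAtMillis, Runnable onTimeout) {
            this.dueAtMillis = dueAtMillis;
            this.onTimeout = onTimeout;
        }
        public int compareTo(TimedAction other) {
            return Long.compare(dueAtMillis, other.dueAtMillis);
        }
    }

    private final PriorityQueue<TimedAction> pending = new PriorityQueue<>();

    public synchronized void submit(TimedAction action) {
        pending.add(action);
        notify();   // wake the loop so it can recompute its wait for the new head item
    }

    public synchronized void run() {
        try {
            while (true) {
                // Fire EVERY action that is already due - looping here is the fix for
                // the "two items with the same tick count" problem mentioned below.
                while (!pending.isEmpty()
                        && pending.peek().dueAtMillis <= System.currentTimeMillis()) {
                    pending.poll().onTimeout.run();
                }
                if (pending.isEmpty()) {
                    wait();                                   // nothing scheduled yet
                } else {
                    long delay = pending.peek().dueAtMillis - System.currentTimeMillis();
                    if (delay > 0) {
                        wait(delay);                          // sleep until the head is due
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();               // allow clean shutdown
        }
    }
}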
One enhancement I planned to make was to use multiple lists with a restricted set of timeout intervals. There were only three constant timeout intervals used in my system, so I could get away with using three lists, one for each interval. This would mean that the lists would not need explicit sorting - new TimedActions would always go to the end of their list. This would eliminate the costly insertion of objects in the middle of the lists. I never got around to doing this, as my first design worked well enough and I had plenty of other bugs to fix :(
Two things:
Beware 32-bit tickCount rollover.
You need a loop in the queue-timeout block - there may be items on the list with exactly the same, or nearly the same, timeout tick count. Once the queue timeout happens, you need to keep removing items from the list and firing their events until the newly calculated timeout time is > 0. I fell foul of this one: two objects with equal timeout tick counts arrived at the head of the list. One got its events fired, but the system tick count had moved on, so the calculated timeout tick for the next object was -1: INFINITE! My server stopped working properly and eventually locked up :(
Rgds,
Martin
