Running Faust Agent Synchronously - python-3.x

Consider the code below:

    @app.agent()
    async def process(stream):
        async for batch in stream.take(5000, within=5):
            handle(batch)  # placeholder for the actual per-batch processing

The agent takes up to 5000 records within 5 seconds asynchronously and processes them. I don't want the agent to pick up another 5000 records until the processing of the previous batch is finished. Basically, I want to run the agent synchronously. Is there a way to do this?

I think you could set the concurrency to 1 on the agent and that'd effectively render it synchronous.
You might also find modifying the topic partitions to be useful if you do that but I don't have a complete understanding of the relationship between these two settings (just wanted to point out a potentially useful avenue).
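For reference, concurrency is an option on the agent decorator. A minimal sketch, assuming a Kafka broker on localhost (the app name, topic name and the print call are placeholders, not from the question):

    import faust

    app = faust.App("batch-app", broker="kafka://localhost:9092")
    source = app.topic("source-topic")

    @app.agent(source, concurrency=1)  # a single actor instance consumes the stream
    async def process(stream):
        async for batch in stream.take(5000, within=5):
            print(f"processing {len(batch)} records")  # stand-in for the real work

Note that 1 is also Faust's default concurrency, so without the option you should already get a single actor per worker.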

I tried the following code to check whether the worker starts executing a second batch of records while the first batch is still being processed:

    import asyncio

    @app.agent()
    async def process(stream):
        batch_number = 0
        async for batch in stream.take(5000, within=5):
            batch_number += 1
            print(batch_number)
            await asyncio.sleep(30)

The worker printed 1 and only printed 2 after the 30-second sleep. The await statement gives control back to the event loop, but the next batch was not picked up until the sleep finished, which implies that the batches are executed one after another. Hence this is effectively synchronous.
PS: Committing offsets, rebalancing, monitoring, etc. are asynchronous operations handled by the event loop.

Related

Firebase/google cloud functions time based trigger prevent concurrent executions

I have a Firebase function that runs every 2 minutes. The problem is that it sometimes takes over 540 seconds to finish, so two executions of the function overlap, which messes things up.
Is there a way to ensure that the function does not fire until a previous instance finishes?
I tried to handle it using a flag stored in Firestore, set to true when the function starts running and back to false when it finishes. However, the function execution sometimes times out, so the flag is never set back to false, thereby blocking all future executions.
So how do I make sure that only one execution of the function is running at a time?
You can limit the number of instances using the runWith method with the maxInstances parameter. Read more here.
By the way, why are your functions taking so long to execute? Are you terminating them correctly? You can post the relevant part of your code so we can see why, or you can learn about how to terminate your functions here.
Is there a way to ensure that the function does not fire till a previous instance finishes?
No. You'll have to store some value in a database as you are doing now and terminate the function if an instance is active.
However sometimes function execution times out hence the flag is never set to false, thereby stopping all future executions.
Check out Cloud Functions v2 (beta) or Cloud Run itself, which can run for up to 1 hour.
Also, if you know a function execution is going to take more than 540 seconds every time, it might be best to increase the interval between two invocations.
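One way to make the database-flag approach robust against the timeout problem is to store an expiry time with the flag (a lease) and take it inside a transaction, so a crashed or timed-out run stops blocking future ones once the lease expires. This is a sketch using the google-cloud-firestore Python client; the collection/document names and the lease length are illustrative, not part of the original answer:

    import datetime
    from google.cloud import firestore

    db = firestore.Client()
    lock_ref = db.collection("locks").document("my-scheduled-job")  # hypothetical lock document

    @firestore.transactional
    def acquire_lease(transaction, lease_seconds=600):
        """Return True if we got the lease; False if another run still holds it."""
        snapshot = lock_ref.get(transaction=transaction)
        now = datetime.datetime.now(datetime.timezone.utc)
        data = snapshot.to_dict() if snapshot.exists else {}
        expires_at = data.get("expires_at")
        if expires_at is not None and expires_at > now:
            return False  # a previous run is still active and its lease has not expired
        transaction.set(lock_ref, {"expires_at": now + datetime.timedelta(seconds=lease_seconds)})
        return True

    def run_job():
        if not acquire_lease(db.transaction()):
            return  # skip this invocation; a previous one is still running
        # ... do the actual work here ...

Because the lease expires on its own, a run that times out no longer blocks every later invocation.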

What is better way to making longer delay inside a series of tasks?

I'm trying to build a workflow system, which will process a series of tasks & delays. Delay can be changed or removed from a running workflow.
What is a better way of implementing a long delay inside a series of tasks (like 3-4 months)? Right now two approaches come to mind:
Pre-calculate & save the delay's end time, then set up a scheduler that checks pending delays repeatedly at a specific interval (1 minute, maybe). This makes a lot of database queries, but the delay can be changed instantly.
Schedule a job for each delay. This avoids a lot of database queries, but the problem is maintaining & changing the delay in these long-running jobs. Also, these jobs need to survive a server crash or restart.
Right now I'm not sure how to do it in a better way and still studying about it. If anyone has a similar experience, please share.
You can store the tasks in the database, like:

    {
        _id: String,
        status: Enum,
        executionTime: Timestamp,
    }
When you declare a new task, push a new entry into the DB.
At server start, or when a new task is declared, create a setTimeout that will wake up your Node.js process when necessary.
Optimization
To avoid having X setTimeouts, with X being the number of tasks to execute, keep only one setTimeout, whose wait time equals that of the closest task to execute.
For example, if you have three tasks, one due in 1 hour, one in 2 hours and one in 3 hours, use a setTimeout of 1 hour. When it fires, execute task 1 and then look at the remaining tasks to schedule the next wake-up.
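The answer is written with Node.js and setTimeout in mind; the same single-timer idea can be sketched in Python (the language used earlier on this page). The class and method names are illustrative, the task store is in memory, and a real implementation would reload pending tasks from the database on startup:

    import asyncio
    import heapq
    import time

    class DelayScheduler:
        def __init__(self):
            self._heap = []                  # (execution_time, task_id) pairs
            self._wakeup = asyncio.Event()   # set whenever the closest task may have changed

        def add(self, task_id, execution_time):
            heapq.heappush(self._heap, (execution_time, task_id))
            self._wakeup.set()

        async def run(self, execute):
            """Single 'setTimeout': always sleep only until the closest task is due."""
            while True:
                if not self._heap:
                    await self._wakeup.wait()
                    self._wakeup.clear()
                    continue
                delay = self._heap[0][0] - time.time()
                if delay > 0:
                    try:
                        # Wake up early if a new (possibly closer) task is added meanwhile.
                        await asyncio.wait_for(self._wakeup.wait(), timeout=delay)
                        self._wakeup.clear()
                        continue
                    except asyncio.TimeoutError:
                        pass
                _, task_id = heapq.heappop(self._heap)
                await execute(task_id)

The heap keeps the closest task at the front, so re-evaluating the wait time after every wake-up is cheap, and changing a delay amounts to updating the stored time and setting the wake-up event.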

Azure Function with Java: configure batchSize and newBatchThreshold efficiently

I'm considering using such a solution with a Queue-triggered Function in Java. I'm trying to understand how to configure batchSize and newBatchThreshold most efficiently. I would like to lay out below what I managed to find out about it; please correct me as soon as you find a mistake in my reasoning:
the Function is executed in a single-CPU-core environment;
the Function polls messages from the Queue in batches of 16 by default and executes them in parallel (straight from the documentation);
so I make a conclusion that:
if the messages require CPU-intensive work, they are effectively executed sequentially;
so I make a conclusion that:
since processing of all messages starts at the same time (when the batch arrives), messages later in the batch take longer and longer to finish (confirmed experimentally);
all of this longer and longer processing time is billable (even though the Function's body itself executes in a tenth of that time);
so I make a conclusion that:
one should set both batchSize and newBatchThreshold to 1 for CPU-intensive tasks, and vary them only for non-CPU-intensive (in practice, IO-intensive) tasks.
Does it make sense?
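For reference, both settings live in host.json under the queues section (assuming the Functions host v2 schema), for example:

    {
      "version": "2.0",
      "extensions": {
        "queues": {
          "batchSize": 1,
          "newBatchThreshold": 0
        }
      }
    }

Per the Azure documentation, the maximum number of messages processed in parallel per instance is batchSize plus newBatchThreshold, so strictly serial processing on one instance corresponds to batchSize 1 with newBatchThreshold 0.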

Do processing in separated threads with Activiti

With Activiti it is possible to design parallel tasks; however, these tasks are internally executed sequentially (by the same thread).
I need to execute tasks in an asynchronous way, and then "join" the tasks once they are finished.
The process is:
preparation -> execute task 1
            -> execute task 2 at the same time
            -> then, once both are finished, go on
It is a matter of optimization, because tasks 1 and 2 are web-service calls and may require a lot of time.
From everything I have read, this is not possible with Activiti. Using async tasks, it is not possible to join them properly (i.e. detect that both are finished). The first task to finish is OK, but the second throws an OptimisticLockException and is restarted (which is not acceptable).
Maybe there is something I misunderstood and this is possible, or even easy? Did anyone succeed with it?
I am not sure if I understand your question clearly, but Activiti does support async processing.
To join two async processes, you can create another task that waits until both async tasks are completed.

Multithreading Task Library, Threading.Timer or threads?

Hi, we are building an application that will allow registering scheduled tasks.
Each task has a time interval at which it should be executed.
Each task should have a timeout.
The number of tasks can be effectively unbounded, but is around 100 in normal cases.
So we have a list of tasks that need to be executed at intervals; what is the best solution?
I have looked at giving each task its own timer; when the timer elapses the work is started, and another timer keeps track of the timeout, so if the timeout is reached the second timer stops the thread.
This feels like we are overusing timers? Or could it work?
Another solution is to use a timer for each task, but when the time elapses we put the task on a queue that is read by a pool of threads that execute the work?
Any other good solutions I should look for?
There is not too much information, but it looks like you could consider Rx as well - check more at MSDN.com.
You can think about your tasks as generated events which should be composed (scheduled) in some way. So you can do the following:
Spawn cancellable tasks with Observable.GenerateWithDisposable and your own Scheduler - check more at Rx 101 Sample
Delay tasks with Observable.Delay
Wait for tasks with Observable.Timeout
Compose tasks in any preferable way
Once again, you can check more at the links specified above.
You should check out Quartz.NET.
Quartz.NET is a full-featured, open source job scheduling system that can be used from the smallest apps to large scale enterprise systems.
I believe you would need to implement your timeout requirement by yourself but all the plumbing needed to schedule tasks could be handled by Quartz.NET.
I have done something like this before where there were a lot of socket objects that needed periodic starts and timeouts. I used a 'TimedAction' class with 'OnStart' and 'OnTimeout' events (socket classes etc. derived from this), and one thread that handled all the timed actions. The thread maintained a list of TimedAction instances ordered by the tick time of the next action required (a delta queue). The TimedAction objects were added to the list by queueing them to the thread input queue. The thread waited on this input queue with a timeout (this was Windows, so 'WaitForSingleObject' on the handle of the semaphore that managed the queue), set to the 'next action required' tick count of the first item in the list.
If the queue wait timed out, the relevant action event of the first item in the list was called and the item removed from the list - the next queue wait would then be set by the new 'first item in the list', which would contain the new 'nearest action time'. If a new TimedAction arrived on the queue, the thread calculated its timeout tick time (GetTickCount + ms interval from the object), and inserted it in the sorted list at the correct place (yes, this sometimes meant moving a lot of objects up the list to make space).
The events called by the timeout handler thread could not take any lengthy actions in order to prevent delays to the handling of other timeouts. Typically, the event handlers would set some status enumeration, signal some synchro object or queue the TimedAction to some other P-C queue or IO completion port.
Does that make sense? It worked OK, processing thousands of timed actions in my server in a reasonably timely and efficient manner.
One enhancement I planned to make was to use multiple lists with a restricted set of timeout intervals. There were only three const timeout intervals used in my system, so I could get away with using three lists, one for each interval. This would mean that the lists would not need sorting explicitly - new TimedActions would always go to the end of their list. This would eliminate costly insertion of objects in the middle of the list/s. I never got around to doing this as my first design worked well enough and I had plenty of other bugs to fix :(
Two things:
Beware 32-bit tickCount rollover.
You need a loop in the queue timeout block - there may be items on the list with exactly the same, or nearly the same, timeout tick count. Once the queue timeout happens, you need to remove from the list and fire the events of every object that is due until the newly calculated timeout time is > 0. I fell foul of this one: two objects with equal timeout tick counts arrived at the head of the list. One got its events fired, but the system tick count had moved on, and so the calculated timeout tick for the next object was -1: INFINITE! My server stopped working properly and eventually locked up :(
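For what it's worth, here is the same delta-queue idea sketched in Python rather than Win32 (one worker thread, a heap ordered by due time instead of a sorted list, and a monotonic clock, which sidesteps the tick-count rollover; all names are illustrative):

    import heapq
    import threading
    import time

    class TimedActionThread(threading.Thread):
        """Single thread that fires scheduled callbacks at their due times."""

        def __init__(self):
            super().__init__(daemon=True)
            self._heap = []                       # (due_time, seq, callback)
            self._cv = threading.Condition()
            self._seq = 0                         # tie-breaker for equal due times

        def schedule(self, delay_s, callback):
            with self._cv:
                heapq.heappush(self._heap, (time.monotonic() + delay_s, self._seq, callback))
                self._seq += 1
                self._cv.notify()                 # the nearest due time may have changed

        def run(self):
            while True:
                with self._cv:
                    while not self._heap:
                        self._cv.wait()
                    due, _, callback = self._heap[0]
                    remaining = due - time.monotonic()
                    if remaining > 0:
                        self._cv.wait(remaining)  # like WaitForSingleObject with a timeout
                        continue                  # re-check: a nearer action may have arrived
                    heapq.heappop(self._heap)
                callback()                        # keep callbacks short, as advised above

The heap avoids the costly middle-of-the-list insertions mentioned above, and re-checking the head after every wait naturally handles several actions falling due at the same tick.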
Rgds,
Martin
