How to stop Azure Worker Role - multithreading

We have 1-2 worker role instances, each of which spins up 5 threads. Each thread reads messages from an Azure queue and does long-running processing; a single job may take around 1-2 hours. We would like to implement logic to stop a particular thread on a particular worker role instance, because users will submit requests to cancel a particular job. We save the worker role and thread information in our Azure table, but we are stuck on how to stop the thread that is doing the processing on a specific worker role instance. Can anyone give some idea or design for stopping a particular thread in a particular worker? Can we make use of a cancellation token to stop the thread?

You will need a flag of some sort: either a new queue that is monitored, or a DB update.
Then start a new thread in your worker role that monitors for these cancellation messages/flags, picks the right thread, and stops it.
I wouldn't recommend doing anything extra within the processing thread itself, because it would slow down your work. However, if your thread has an OnStop method, you can use it to tidy up the thread before shutting it down.
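A rough sketch of this monitor-thread design, in Python purely for illustration (the question is about a .NET worker role). Python threads cannot be killed from outside, so the sketch assumes cooperative cancellation: one threading.Event per job plays the role of the cancellation token the question mentions, and the processing loop checks it between chunks of work, which is cheap compared with a chunk of real work. The in-memory queue stands in for the Azure queue or table; job ids and function names are hypothetical.

```python
import queue
import threading
import time

# Hypothetical stand-in for the Azure queue/table that carries cancel requests.
# Each request is simply the id of the job the user wants to stop.
cancel_requests: "queue.Queue[str]" = queue.Queue()

# One Event per running job, used the way a .NET CancellationToken would be.
cancel_events: "dict[str, threading.Event]" = {}
results: "dict[str, str]" = {}


def process_job(job_id: str, chunks: int, cancelled: threading.Event) -> None:
    """Long-running job split into chunks; the flag is checked between chunks."""
    for _ in range(chunks):
        if cancelled.is_set():
            results[job_id] = "cancelled"  # OnStop-style cleanup would go here
            return
        time.sleep(0.01)  # placeholder for one chunk of real work
    results[job_id] = "completed"


def cancellation_monitor(stop: threading.Event) -> None:
    """Dedicated thread: reads cancel requests and flags the matching job."""
    while not stop.is_set():
        try:
            job_id = cancel_requests.get(timeout=0.05)
        except queue.Empty:
            continue
        event = cancel_events.get(job_id)
        if event is not None:
            event.set()


def start_job(job_id: str, chunks: int) -> threading.Thread:
    cancel_events[job_id] = threading.Event()
    worker = threading.Thread(
        target=process_job, args=(job_id, chunks, cancel_events[job_id])
    )
    worker.start()
    return worker


def demo() -> "dict[str, str]":
    results.clear()
    stop = threading.Event()
    monitor = threading.Thread(target=cancellation_monitor, args=(stop,))
    monitor.start()
    jobs = [start_job("job-1", 500), start_job("job-2", 5)]
    cancel_requests.put("job-1")  # the user asks to cancel job-1 only
    for job in jobs:
        job.join()
    stop.set()
    monitor.join()
    return dict(results)
```

Calling demo() should report job-1 as cancelled and job-2 as completed: only the job named in the cancel request is affected.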

Why does a thread pool create a lot of unique threads?

A COM application based on the 'free' threading model subscribes to events published from another COM application that runs out of process.
The application works normally, but in some cases (or configurations?) it burns through a lot of so-called Tpp worker threads.
These threads apparently belong to a thread pool managed by Windows/COM, and COM uses them (at least) to deliver incoming events to the application.
When the application receives events, that always happens in the context of one of these worker threads.
In the normal situation, updates are coming in from at most 2 or 3 unique worker threads.
But in the abnormal situation, the application sees new, unique worker thread IDs appear every 3-8 minutes. Over the course of a week, the application has seen about 1000 unique threads (!).
I strongly suspect something is wrong here, because surely the thread pool doesn't need so many different threads, right?
What could be the reason for the thread pool behavior I'm seeing? Is it normal for it to create different threads from time to time? Are the old threads still sticking around doing nothing? What action could trigger this while the application is running in the context of the worker thread?
Notes:
Our application is an OPC DA client (and the other application is the Siemens OPC-DA server)
The OS is Windows 10
I do not yet know whether the worker threads have exited or whether they stick around doing nothing
As an experiment, I have tried several bad/illegal things to see whether our application can somehow break a worker thread, which would explain why the thread pool has to create a new one (we might have destroyed the old one). But that seems more difficult than I had expected.
When running in the context of the worker thread, I have tried:
deliberately hanging with while (true) {}: event delivery just stalls, but no new worker thread is created for us
a deliberate uncaught C++ exception: no new worker thread is created
a deliberate (read) access violation: no new thread either
That got me thinking: if our application can't kill the worker thread in an obvious way, what else could cause this, and why would the thread pool behave like this?
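One way to gather data on "how often do new threads appear, and when" is to record every distinct OS thread ID that delivers an event, together with the time it was first seen, and correlate new IDs with what the application was doing. A minimal sketch of the idea in Python (the class and method names are hypothetical; in the C++ client the equivalent would be a mutex-protected map keyed by GetCurrentThreadId()). OS thread IDs can be reused after a thread exits, so the count is only a lower bound on the number of threads created.

```python
import threading
import time


class CallbackThreadTracker:
    """Remembers when each distinct OS thread first delivered an event."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.first_seen: "dict[int, float]" = {}

    def record(self) -> bool:
        """Call at the top of the event handler; True means a new thread ID."""
        tid = threading.get_native_id()  # OS-level thread id (Python 3.8+)
        with self._lock:
            if tid in self.first_seen:
                return False
            self.first_seen[tid] = time.monotonic()
            return True

    def unique_count(self) -> int:
        with self._lock:
            return len(self.first_seen)
```

Logging the moment record() returns True gives a timeline of new threads to compare against the 3-8 minute pattern.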

Start a thread inside an Azure Service Fabric actor?

I know that actors inside Service Fabric are single-threaded. But suppose I start a new thread inside an actor method: what will happen then? Will the actor be deactivated even though the spawned thread is still executing?
According to the documentation, an actor is deactivated when it has not been 'used' for some time. 'Used' in this context means either:
receiving a call
IRemindable.ReceiveReminderAsync being invoked
So it seems that the new thread I started is not taken into account. Can someone confirm this?

Actors are just objects.
The actor will be deactivated and become available for garbage collection.
Actual OS threads keep running until they complete or are terminated. Their managed representation can be collected, but this doesn't affect the actual thread.
Also, threads spawned inside an actor are not tracked in any way, so you're responsible for managing their life cycle yourself.
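The point that an object going away does not stop a thread it started can be illustrated with a plain Python analogy. This is not Service Fabric code: Actor is a hypothetical class, and dropping its last reference stands in for deactivation.

```python
import gc
import threading
import weakref


def background_work(release: threading.Event) -> None:
    # Free function: the thread holds no reference to the actor object.
    release.wait()


class Actor:
    def __init__(self, release: threading.Event) -> None:
        self.release = release

    def method(self) -> threading.Thread:
        t = threading.Thread(target=background_work, args=(self.release,), daemon=True)
        t.start()
        return t  # nothing inside the actor keeps track of this thread


def demo() -> "tuple[bool, bool]":
    release = threading.Event()
    actor = Actor(release)
    thread = actor.method()
    ref = weakref.ref(actor)
    del actor  # "deactivation": drop the last reference
    gc.collect()
    actor_collected = ref() is None
    thread_running = thread.is_alive()
    release.set()  # someone other than the actor must end the thread
    thread.join()
    return actor_collected, thread_running
```

demo() should return (True, True): the actor object has been collected while the thread it started is still running, and only the caller's release.set() ends it.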

Exactly.
When you start a new thread, the original thread (the actor call) continues running and goes out of scope, while the spawned thread keeps running. Look at what happens then:
When an actor receives a call, the thread handling the call acquires a lock, using a SemaphoreSlim, on the actor object. If another thread has already acquired the lock, the current thread waits for its release so that it can continue once the lock is free.
Once the lock is acquired, the thread executes, returns from the called method, and then releases the lock so the next thread can continue.
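The turn-based locking described above can be sketched in Python, with threading.Semaphore(1) standing in for SemaphoreSlim. The class and function names are hypothetical; this only models the idea, not the actual actor runtime.

```python
import threading
from typing import Any, Callable


class TurnBasedActor:
    """Each call runs while holding a one-slot semaphore, so calls on the
    same actor never overlap (the role SemaphoreSlim plays above)."""

    def __init__(self) -> None:
        self._turn = threading.Semaphore(1)
        self.counter = 0

    def call(self, method: Callable[..., Any], *args: Any) -> Any:
        with self._turn:  # blocks while another call holds the turn
            return method(self, *args)


def increment(actor: TurnBasedActor, times: int) -> None:
    for _ in range(times):
        actor.counter += 1  # safe: only ever runs inside call()
```

Anything a call spawns and leaves running after call() returns executes outside the semaphore, which is exactly the gap described next.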
When you spawn a new thread as part of the actor logic, it simply runs as part of the service process. The problem is that once you leave the method scope, you no longer have control of this thread, yet it keeps running. For the actor runtime the task has finished, so the next actor call will create another thread, and so on.
The problem will start when:
You don't control how many threads are running, so your services will start consuming too much memory. SF might also try to rebalance the actors/services across instances; because it does not know about these threads, if it moves the actor service, all your threads will be aborted and you lose those operations.
The spawned thread from a previous call will compete with the new thread for the next actor call.
If the new thread uses the actor data to continue another operation, the spawned thread and the actor thread will both face concurrency issues. In cases where no exception happens, you will see strange behavior that is hard to investigate, for example one thread changing a value being used by the other.
And many other concurrency issues you might face.
In scenarios where you think you might need another thread, you could:
Create another actor to handle the task
Create a message in a queue to be processed by another service
Do the task as part of the actor call.
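A rough sketch of the "create a message in a queue" alternative, with Python's queue.Queue standing in for a real message queue (all names are hypothetical; in a real deployment the consumer would be a separate service reading a durable queue, not a thread in the same process). The actor call only enqueues and returns, so no thread outlives it.

```python
import queue
import threading
from typing import Optional

# Hypothetical hand-off queue standing in for a queue owned by another service.
tasks: "queue.Queue[Optional[str]]" = queue.Queue()
processed: "list[str]" = []


class ShortLivedActor:
    def start_long_task(self, name: str) -> None:
        # Enqueue and return: no thread is spawned inside the actor call.
        tasks.put(name)


def service_loop() -> None:
    """The consumer that owns the long-running work; None means shut down."""
    while True:
        name = tasks.get()
        if name is None:
            break
        processed.append(f"done:{name}")  # placeholder for the long task
```

Because a single consumer owns the work, the actor's state is never touched concurrently, and the consumer's lifetime is managed in one place.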

How to implement work stealing in SimPy 3?

I want to implement something akin to work stealing or task migration in multiprocessor systems. Details below.
I am simulating a scheduling system with multiple worker nodes (resources, each with a capacity greater than one) and tasks (processes) that arrive randomly and are queued by the scheduler at a specific worker node. This is working fine.
However, I want to trigger an event when a worker node has spare capacity, so that it steals the front task from the worker with the longest wait queue.
I can implement the functionality described above. The problem is that all the tasks waiting on the worker queue from which we are stealing work receive the event notification. I want to notify ONLY the task at the front of the queue (or only N tasks at the front of the queue).
The Bank reneging example is the closest to what I want to implement. However, in that example (1) ALL the customers leave the queue when they are notified that the event was triggered, and (2) when the event is triggered, the customers leave the system. In my case, I want the task to wait at another worker instead (though it wouldn't actually wait, since that worker's queue is empty).
Old question: Can this be done in SimPy?
New questions: How can I do this in SimPy?
1) How can I have many processes, waiting for a resource, listen for an event, but notify only the first one?
2) How can I make a process migrate to another resource?
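The queue bookkeeping behind both questions can be sketched in plain Python. This does not use SimPy's API; Worker and steal are hypothetical names. The idea is to pop only the front N entries of the longest waiting queue instead of broadcasting an event to every waiting task. In a SimPy model, the stolen task's process would still need to withdraw its pending request on the original resource and request the new one, which this sketch does not show.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class Worker:
    name: str
    capacity: int
    running: List[str] = field(default_factory=list)
    waiting: Deque[str] = field(default_factory=deque)

    def has_spare_capacity(self) -> bool:
        return len(self.running) < self.capacity


def steal(idle: Worker, workers: List[Worker], n: int = 1) -> List[str]:
    """Move up to n tasks from the FRONT of the longest waiting queue to `idle`.

    Only the stolen tasks are touched; every other waiting task keeps its
    place, i.e. only the first N tasks are "notified".
    """
    victims = [w for w in workers if w is not idle and w.waiting]
    if not victims:
        return []
    victim = max(victims, key=lambda w: len(w.waiting))
    stolen: List[str] = []
    while victim.waiting and len(stolen) < n and idle.has_spare_capacity():
        task = victim.waiting.popleft()
        idle.running.append(task)  # idle worker has room, so the task starts at once
        stolen.append(task)
    return stolen
```

For example, with w1 holding waiting tasks b, c, d and an idle w3, steal(w3, [w1, w2, w3]) moves only b; c and d stay queued at w1.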

Failure handling for Queue Centric work pattern

I am planning to use a queue-centric design as described here for one of my applications. It essentially consists of an Azure queue into which work requests are placed from the UI. A worker reads a message from the queue, processes it, and deletes it from the queue.
The 'work' done by the worker is within a transaction, so if the worker fails before completing, upon restart it picks up the same message again (as it has not been deleted from the queue) and retries the operation (up to a maximum number of retries).
To scale I could use two methods:
Multiple workers, each with a separate queue. If I have five workers W1 to W5, I have 5 queues Q1 to Q5; each worker knows which queue to read from, and failure handling is the same as in the one-queue, one-worker case.
One queue and multiple workers. Here failure/retry handling would be more involved and might rely on the message 'invisibility' time to make sure no two workers pick up the same job. The invisibility time would have to be long enough for the job to complete, yet not so long that retries happen only after a long delay.
I would like to know whether the first approach is the correct way to go. What are robust ways of handling failures in the second approach?

You would be better off taking approach 2 - a single queue, but with multiple workers.
This is better because:
The process that delivers messages to the queue only needs to know about a single queue endpoint. This reduces complexity at this end;
Scaling the number of workers that are pulling from the queue is now decoupled from any code / configuration changes - you can scale up and down much more easily (and at runtime)
If you are worried about the visibility timeout, you can initially choose a default timespan; then, if the worker looks like it's taking too long, it can periodically call UpdateMessage() to extend the message's visibility timeout.
Finally, if your worker times out and fails to complete processing of the message, the message becomes visible again and another worker will pick it up and retry. You can also use the message's DequeueCount property to limit the number of retries.
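The lease-and-retry behavior described above can be modelled with a small in-memory queue and a fake clock. This is a sketch, not the Azure SDK: VisibilityQueue, update_visibility, MAX_DEQUEUES and receive_with_retry_limit are hypothetical names, with update_visibility standing in for UpdateMessage() and dequeue_count for DequeueCount.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Message:
    id: int
    body: str
    visible_at: float = 0.0  # hidden from get() until this time
    dequeue_count: int = 0


class VisibilityQueue:
    """In-memory model of a queue with visibility timeouts (not the Azure SDK)."""

    def __init__(self) -> None:
        self._messages: Dict[int, Message] = {}
        self._next_id = 0

    def add(self, body: str) -> None:
        self._messages[self._next_id] = Message(self._next_id, body)
        self._next_id += 1

    def get(self, now: float, visibility_timeout: float) -> Optional[Message]:
        for msg in self._messages.values():
            if msg.visible_at <= now:
                msg.visible_at = now + visibility_timeout  # lease the message
                msg.dequeue_count += 1
                return msg
        return None

    def update_visibility(self, msg: Message, now: float, extend_by: float) -> None:
        msg.visible_at = now + extend_by  # plays the role of UpdateMessage()

    def delete(self, msg: Message) -> None:
        self._messages.pop(msg.id, None)


MAX_DEQUEUES = 3  # hypothetical retry limit


def receive_with_retry_limit(
    q: VisibilityQueue, now: float, poison: List[str]
) -> Optional[Message]:
    """Lease the next message; give up on it once it has failed too often."""
    msg = q.get(now, visibility_timeout=30.0)
    if msg is not None and msg.dequeue_count > MAX_DEQUEUES:
        poison.append(msg.body)  # park it for inspection instead of retrying
        q.delete(msg)
        return None
    return msg  # the caller deletes it only after processing succeeds
```

If a worker crashes, it simply never deletes the message; once the lease expires, the next get() by any worker returns it with a higher dequeue_count.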

Multiple workers each with a separate queue. So if I have five workers W1 to W5, I have 5 queues Q1 to Q5 and each worker knows which queue to read from and failure handling is similar as the case with one queue and one worker
With this approach, I see the following issues:
This approach makes your architecture tightly coupled (thus defeating the whole purpose of using queues). Because each worker role instance listens to a dedicated queue, the web application responsible for pushing messages to the queues always needs to know how many workers are running. Any time you scale your worker role up or down, you somehow need to tell the web application so that it can start pushing messages to the appropriate queues.
If a worker role instance is taken down for whatever reason, some messages may never be processed, because the other worker role instances are working only on their own dedicated queues.
Worker role instances may end up under-utilized or over-utilized depending on how the web application distributes messages across the queues. For optimal utilization, the web application would need to know each worker role instance's utilization to decide which queue to send a message to. This is certainly not something a web application should be doing.
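The stranded-message problem can be made concrete with a toy model. Round-robin dispatch is an assumption here (the answer doesn't say how messages are routed), and stranded_messages is a hypothetical helper.

```python
from collections import deque
from typing import Deque, List, Set


def stranded_messages(messages: List[str], workers: int, down: Set[int]) -> List[str]:
    """Round-robin messages into one queue per worker and return the ones
    sitting in queues whose worker is down (no other worker reads them)."""
    queues: List[Deque[str]] = [deque() for _ in range(workers)]
    for i, msg in enumerate(messages):
        queues[i % workers].append(msg)  # the sender must know `workers`
    return [msg for w in sorted(down) for msg in queues[w]]
```

With six messages, three workers and worker 1 down, m1 and m4 are never processed. With a single shared queue, whichever instances are still running would simply pick them up.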
I believe #2 is the correct way to go. @Brendan Green has covered your concerns about #2 excellently in his answer.

How to manage Managed Executor Service

I'm using Managed Executor Service to implement a process manager that processes tasks in the background upon receiving a JMS message event. Normally only a small number of tasks will be running (maybe 10 at most), but what if something happens and my application starts getting hundreds of JMS message events? How do I handle that?
My thought is to limit the number of threads if possible, save all the other messages to a database, and run them when a thread becomes available. Thanks in advance.

My thought is to limit the number of threads if possible and save all the other messages to database and will be run when thread available.
The detailed answer to this question depends on which Java EE app server you choose to run on, since they all have slightly different configuration.
Any Java EE app server will allow you to configure the thread pool size of your Managed Executor Service (MES), i.e., the number of worker threads in its thread pool.
Say you have 10 worker threads and you get flooded with 100 requests all at once: the MES will keep a queue of the backlogged requests, and each worker thread will take more work off the queue whenever it finishes a task, until the queue is empty.
It's fine if work goes to the queue sometimes, but if your work queue grows faster overall than your worker threads can drain it, you will run into problems. The solution is to increase your thread pool size; otherwise the backlog will keep growing and your server will run out of memory.
what if something happens and my application starts getting hundreds of JMS message events? How do I handle such an event?
If the load on your server is so sporadic that tasks would need to be saved to a database, the best approach seems to be one of the following:
increase the thread pool size
have the server immediately reject incoming tasks when the task backlog queue is full
have clients do a blocking wait for the server task queue to be not full (I would only advise this option if client task submission is in no way connected to user experience)
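The second and third options can be sketched with Python's concurrent.futures as a stand-in for an MES. This is not a Java EE API: BoundedExecutor and its parameters are hypothetical. ThreadPoolExecutor's internal queue is unbounded, so a semaphore caps the number of running plus queued tasks. In plain Java SE, the comparable built-in route is a ThreadPoolExecutor with a bounded work queue and a RejectedExecutionHandler; whether and how your MES exposes that is server-specific.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Optional


class BoundedExecutor:
    """Thread pool that accepts at most `max_pending` running + queued tasks."""

    def __init__(self, workers: int, max_pending: int) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._slots = threading.BoundedSemaphore(max_pending)

    def submit(
        self,
        fn: Callable[..., Any],
        *args: Any,
        block: bool = False,
        timeout: Optional[float] = None,
    ) -> Future:
        # block=False: reject at once when full (option 2).
        # block=True: caller waits for a free slot (option 3).
        if block:
            acquired = self._slots.acquire(timeout=timeout)
        else:
            acquired = self._slots.acquire(blocking=False)
        if not acquired:
            raise RuntimeError("task backlog is full; task rejected")
        try:
            future = self._pool.submit(fn, *args)
        except BaseException:
            self._slots.release()  # don't leak the slot if submit itself fails
            raise
        # Free the slot when the task finishes, successfully or not.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

With workers=1 and max_pending=2, one task runs, one waits, and a third non-blocking submit() raises immediately instead of growing an unbounded backlog.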
