I need help in enhancing a thread scheduling strategy I am working on.
Background
To set the context: I have a large number (20-30 thousand) of "tasks" that need to be executed. Each task can execute independently. In practice, execution time varies between 40 ms and 5 minutes across tasks, and each individual task takes the same amount of time whenever it is re-run.
I need options to control the way these tasks are executed, so I have come up with a scheduling engine that schedules them according to various strategies. The most basic strategy is FCFS, i.e. the tasks are executed sequentially, one by one. The second is a batch strategy: the scheduler has a bucket size "b" which controls how many threads can run in parallel. The scheduler kicks off non-blocking threads for the first "b" tasks it gets, waits for those tasks to complete, then proceeds with the next "b" tasks, starting them in parallel and waiting for their completion. Each set of "b" tasks processed at a time is termed a batch, hence batch scheduling.
Now, with batch scheduling, activity ramps up at the beginning of a batch as threads get created, peaks in the middle when most of the threads are running, and tapers off at the end as we block and wait for the threads to join. Batch scheduling degenerates to FCFS scheduling when the batch size "b" = 1.
One way to improve on batch scheduling is what I will term parallel scheduling: the scheduler ensures that, as long as enough tasks remain, "b" threads are running at any point in time. The thread count initially ramps up to "b" and is then held there until the last set of tasks finishes execution. To maintain "b" running threads at all times, a new thread is started the moment an old one finishes. This approach can reduce the total time taken to process all the tasks compared to batch scheduling (in the average case).
Part where I need help
The logic I have for implementing parallel scheduling follows. I would be obliged if anyone can help me with the following:
Can we avoid the use of the startedTasks list? I am using it because I need to be sure that when Commit() exits, all tasks have completed execution, so I just loop through startedTasks and block until each one is complete. One current problem is that the list will get very long.
--OR--
Is there a better way to do parallel scheduling?
(Any other suggestions/strategies are also welcome - main goal here is to shorten overall execution duration within the constraints of the batch size "b")
ParallelScheduler pseudocode
// assume all variable access/updates are thread safe
Semaphore S: with an initial capacity of "b"
Queue<Task> tasks
List<Task> startedTasks
bool allTasksCompleted = false
bool stopPolling = false

// The following method is called by a caller that wishes to
// start tasks; it can be called any number of times, passing
// various task items.
METHOD void ScheduleTask( Task t )
    if the PollerThread is not started yet, start it
    // starting PollerThread will call PollerThread_Action
    // set up the task so that when it is completed, it releases 1
    // unit on semaphore S
    // assume OnCompleted is executed when the task t completes
    // execution after a call to t.Start()
    t.OnCompleted() ==> S.Release(1)
    tasks.Enqueue( t )

// This method is called when the caller wishes to notify
// that no more tasks are coming that need a ScheduleTask call.
METHOD void Commit()
    // assume that the following assignment is thread safe
    stopPolling = true
    // assume that the following check is done efficiently
    wait until allTasksCompleted is set to true

// this is the method the poller thread, once started, will execute
METHOD void PollerThread_Action
    while ( !stopPolling )
        if ( tasks.Count > 0 )
            Task nextTask = tasks.Dequeue()
            // wait on the semaphore to release one unit
            if ( S.WaitOne() )
                // start the task in a new thread
                nextTask.Start()
                startedTasks.Add( nextTask )
    // we have been asked to stop polling; this means no more
    // tasks are going to be added to the queue, so finish off
    // the remaining tasks
    while ( tasks.Count > 0 )
        Task nextTask = tasks.Dequeue()
        if ( S.WaitOne() )
            nextTask.Start()
            startedTasks.Add( nextTask )
    // at this point, there are no more tasks in the queue, and
    // each task has already been started at some point
    for every Task t in startedTasks
        t.WaitUntilComplete() // blocks if the task is running, else returns immediately
    // now all tasks are complete
    allTasksCompleted = true
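To make the first question concrete, here is a minimal C# sketch of one counting-based alternative to the startedTasks list: instead of remembering every started task, keep a count of outstanding tasks and have Commit() wait for it to reach zero. CountdownEvent is a real .NET type; the CompletionTracker shape below is illustrative, not the actual engine.

using System.Threading;

// Track completion by count, not by task identity: Commit() only
// needs to know "how many are still running", not "which ones".
class CompletionTracker
{
    // Start at 1 so the count cannot reach zero before Commit() runs.
    private readonly CountdownEvent outstanding = new CountdownEvent(1);

    public void OnTaskScheduled() => outstanding.AddCount();  // in ScheduleTask
    public void OnTaskCompleted() => outstanding.Signal();    // in t.OnCompleted

    public void WaitForAll() // called from Commit()
    {
        outstanding.Signal(); // remove the initial count of 1
        outstanding.Wait();   // blocks until every scheduled task has signaled
    }
}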
Search for 'work stealing scheduler' - it is one of the most efficient generic schedulers. There are also several open source and commercial implementations around.
The idea is to have a fixed number of worker threads that take tasks from a queue. But to avoid contention on a single queue shared by all the threads (a serious performance problem on multi-CPU systems), each thread has its own queue. When a thread creates new tasks, it places them on its own queue. After finishing a task, a thread takes the next task from its own queue; if its queue is empty, it "steals" work from some other thread's queue.
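For a flavor of the mechanism (not a production scheduler), here is a compact C# sketch with per-worker queues and stealing. The WorkStealingPool name and the round-robin submission are illustrative choices; real implementations, including the .NET ThreadPool's task scheduler, use per-thread deques, taking local work LIFO and stealing FIFO.

using System;
using System.Collections.Concurrent;
using System.Threading;

class WorkStealingPool
{
    private readonly ConcurrentQueue<Action>[] queues;
    private readonly Thread[] workers;
    private volatile bool stopped;
    private int submitIndex;

    public WorkStealingPool(int workerCount)
    {
        queues = new ConcurrentQueue<Action>[workerCount];
        workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            queues[i] = new ConcurrentQueue<Action>();
            int self = i;
            workers[i] = new Thread(() => WorkerLoop(self));
            workers[i].Start();
        }
    }

    // Tasks submitted from outside are spread round-robin over the queues.
    public void Submit(Action task) =>
        queues[(Interlocked.Increment(ref submitIndex) & int.MaxValue) % queues.Length].Enqueue(task);

    private void WorkerLoop(int self)
    {
        while (!stopped)
        {
            if (queues[self].TryDequeue(out Action task))
            {
                task(); // run work from our own queue first
            }
            else
            {
                // Local queue empty: try to steal from another worker.
                bool stole = false;
                for (int i = 0; i < queues.Length && !stole; i++)
                    if (i != self && queues[i].TryDequeue(out task))
                    {
                        task();
                        stole = true;
                    }
                if (!stole) Thread.Yield(); // nothing anywhere; back off
            }
        }
    }

    public void Stop() // note: does not drain tasks still queued
    {
        stopped = true;
        foreach (var w in workers) w.Join();
    }
}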
When your program knows a task needs to be run, place it in a queue data structure.
When your program starts up, also start up as many worker threads as you will need. Arrange for each thread to do a blocking read from the queue when it needs something to do. So, when the queue is empty or nearly so, most of your threads will be blocked waiting for something to go into the queue.
When the queue has plenty of tasks in it, each thread will pull one task from the queue and carry it out. When it is done, it will pull another task and do that one. Of course this means that tasks will be completed in a different order than they were started. Presumably that is acceptable.
This is far superior to a strategy where you have to wait for all threads to finish their tasks before any one of them can get another task. If long-running tasks are relatively rare in your system, you may find that you don't have to do much more optimization. If long-running tasks are common, you may want separate queues and separate threads for short- and long-running tasks, so the short-running tasks don't get starved out by the long-running ones.
There is a hazard here: if some of your tasks are VERY long-running (that is, they never finish due to bugs) you'll eventually poison all your threads and your system will stop working.
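Here is a minimal C# sketch of the queue-plus-workers pattern described above, assuming .NET's BlockingCollection (the pool size b = 4 and the 100 toy tasks are arbitrary). Note how joining the b workers replaces waiting on a long list of started tasks:

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class QueueWorkerDemo
{
    static void Main()
    {
        const int b = 4; // bounded number of worker threads
        var queue = new BlockingCollection<Action>();

        // Start exactly b long-lived workers; each blocks on the queue
        // whenever there is nothing to do.
        var workers = new Task[b];
        for (int i = 0; i < b; i++)
            workers[i] = Task.Factory.StartNew(() =>
            {
                // GetConsumingEnumerable blocks until an item arrives and
                // ends once CompleteAdding() is called and the queue drains.
                foreach (Action work in queue.GetConsumingEnumerable())
                    work();
            }, TaskCreationOptions.LongRunning);

        // Producer side: enqueue tasks as they become known.
        for (int i = 0; i < 100; i++)
        {
            int id = i;
            queue.Add(() => Console.WriteLine(
                $"task {id} on thread {Thread.CurrentThread.ManagedThreadId}"));
        }

        queue.CompleteAdding();  // the equivalent of Commit(): no more tasks
        Task.WaitAll(workers);   // returns once every queued task has run
    }
}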
You want to use a space-filling curve to subdivide the tasks. An SFC reduces a 2-D complexity to a 1-D one.
Related
I am new to boost::thread_pool. I can create a thread pool with 4 threads, but how many tasks can I post? More than 4? Let's say 100. Does that mean that 96 will wait while the first 4 are processed?
From the documentation there is only a join method, which waits until all threads are done; there is no method to check whether a thread is available to accept new work. I would like to wait until at least one thread is available before posting a new task. Is that possible?
Assuming you meant boost::asio::thread_pool, then yes, if you post 100 tasks to a pool of 4 it means that they will all be executed in turn on the next available thread, on average about 25 tasks per thread, assuming they all have similar execution times.
This is the nature of a thread pool. If you want to limit the amount of pending tasks, use a bounded queue. You could have a different queue capacity than the number of threads in the pool.
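The bounded-queue idea is language-agnostic; boost::asio::thread_pool itself has no built-in bound, so here is the shape of it sketched in C# with a BlockingCollection of capacity 4. The point is that Add blocks the producer until a consumer frees a slot, which is effectively "wait until the pool can accept a new task":

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class BoundedQueueDemo
{
    static void Main()
    {
        // Capacity 4: the fifth Add(...) blocks until a worker takes an item.
        var queue = new BlockingCollection<Action>(boundedCapacity: 4);

        var consumer = Task.Run(() =>
        {
            foreach (var work in queue.GetConsumingEnumerable()) work();
        });

        for (int i = 0; i < 100; i++)
        {
            int id = i;
            // The producer naturally waits for capacity here instead of
            // polling for a free thread.
            queue.Add(() => Console.WriteLine($"task {id}"));
        }

        queue.CompleteAdding();
        consumer.Wait();
    }
}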
I am trying to understand the concept behind the thread pool. Based on my understanding, a thread cannot be restarted once completed; one would have to create a new thread in order to execute a new task. If that is the right understanding, does a ThreadPool executor create a new thread for every task that is added?
One will have to create a new thread in order to execute a new task
No. Tasks are an abstraction of a logical unit of work to perform. A task is typically a function reference/pointer with an ordered list of well-defined parameters (to give to the function). Multiple tasks can be assigned to a given thread; a thread pool is usually a set of threads waiting for new incoming tasks to be executed.
As a result, the threads of a given thread pool are created once and reused across tasks.
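A quick way to see the reuse in C#: queue several work items and print the thread IDs. The same IDs repeat because the pool hands tasks to a small set of existing threads rather than creating one per task.

using System;
using System.Threading;

class PoolReuseDemo
{
    static void Main()
    {
        using var done = new CountdownEvent(10);
        for (int i = 0; i < 10; i++)
        {
            int id = i;
            ThreadPool.QueueUserWorkItem(_ =>
            {
                // The same ManagedThreadId values show up repeatedly:
                // threads are created once and reused for many tasks.
                Console.WriteLine($"task {id} on thread {Thread.CurrentThread.ManagedThreadId}");
                done.Signal();
            });
        }
        done.Wait(); // wait for all ten work items to finish
    }
}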
a) I have a task which I want the server to do every X hours for every user (~5000 users). Is it better to:
1 - Create a worker thread for each user that does the task, sleeps for X hours, then starts again, with the tasks staggered at random times (so that most tasks are sleeping at any given moment)
2 - Create one thread that loops through the users and does the task for each user, then starts again (even if this takes more than X hours).
b) If plan 1 is used, do sleeping threads affect the performance of the server?
c) If the answer is yes, does a sleeping thread have the same effect as a thread that is doing the task?
Note that this server is not only used for this task. It is used for all the communications with the ~5000 clients.
Sleeping threads generally do not affect CPU usage. They do each consume stack memory, commonly about 1 MB by default. That is not a big deal for dozens of threads; it is a big deal for 5000 threads.
Have one thread or timer dedicated to triggering the hourly work. Once per hour you can process the users. You can use parallelism if you want. Process the users using Parallel.ForEach or any other technique you like.
Whether you choose a thread or a timer doesn't matter for CPU usage in any meaningful way. Do whatever fits your app best.
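A minimal C# sketch of that shape, assuming a hypothetical per-user UpdateUser method (Timer and Parallel.ForEach are real .NET APIs):

using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class HourlyUpdater
{
    static void Main()
    {
        int[] users = Enumerable.Range(1, 5000).ToArray(); // stand-in for real user records

        // One timer triggers the periodic pass; the pass itself is parallelized.
        using var timer = new Timer(_ =>
        {
            Parallel.ForEach(users, user => UpdateUser(user));
        }, null, TimeSpan.Zero, TimeSpan.FromHours(1));

        Console.ReadLine(); // keep the process alive for the demo
    }

    static void UpdateUser(int user)
    {
        // per-user work goes here
    }
}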
There are not enough details about your issue for a complete answer. However, based on the information you provided, I would:
create a timer (threading.timer)
set an interval, which will be the time between processing runs over the ~5,000 users
Then say the method/task you want to perform is called UpdateUsers. When the timer "ticks", in the UpdateUsers method (callback):
1. stop the timer
2. loop and perform the task for each user
3. start the timer
This way you ensure that the task is performed for each user and there is no overlapping if a full pass takes more than X hours. The updates will happen every Y, where Y is the time interval you set for your timer. Also, this uses at most one thread, depending on how your server/service is coded.
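In C#, the stop/loop/restart steps map naturally onto System.Threading.Timer with an infinite period, re-armed at the end of each pass; UpdateUsers and LoadUsers below are hypothetical stand-ins:

using System;
using System.Threading;

class UserUpdater
{
    private readonly TimeSpan interval;
    private readonly Timer timer;

    public UserUpdater(TimeSpan interval)
    {
        this.interval = interval;
        // period = infinite: the timer fires once and then stays stopped
        // until the callback re-arms it (step 1, "stop timer", is implicit).
        timer = new Timer(UpdateUsers, null, interval, Timeout.InfiniteTimeSpan);
    }

    private void UpdateUsers(object state)
    {
        foreach (var user in LoadUsers())
        {
            // step 2: perform the task for this user
        }
        // step 3: restart the timer; the next pass begins one interval
        // after this one ends, so passes can never overlap.
        timer.Change(interval, Timeout.InfiniteTimeSpan);
    }

    private int[] LoadUsers() => new int[5000]; // stand-in for real user data
}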
I'm thinking of using a semaphore as a pause mechanism for a pool of worker threads like so:
// main thread
for N jobs:
    semaphore.release()
    create and start worker

// worker thread
while (not done)
    semaphore.acquire()
    do_work
    semaphore.release()
Now, if I want to pause all workers, I can acquire the entire count available in the semaphore. I'm wondering if that is better than:
if (paused)
    paused_mutex.lock
    wait for condition (paused_mutex)
do_work
Or is there a better solution?
I guess one downside of doing it with the semaphore is that the main thread will block until all workers release. In my case, the unit of work per iteration is very small so that probably won't be a problem.
Update: to clarify, my workers are database backups that act like file copies. The while (not done) loop quits when the file has been successfully copied. So to relate it to the traditional worker-waits-for-condition-to-get-work pattern: my workers wait for a needed file copy, and the while loop you see is doing the work requested. You could think of my do_work above as do_piece_of_work.
The problem with the semaphore approach is that the worker threads have to constantly check for work. They are eating up all the available CPU resources. It is better to use a mutex and a condition (signalling) variable (as in your second example) so that the threads are woken up only when they have something to do.
It is also better to hold the mutex for as short a time as possible. The traditional way to do this is to create a WORK QUEUE and to use the mutex to synchronize queue inserts and removals. The main thread inserts into the work queue and wakes up a worker. The worker acquires the mutex, removes an item from the queue, then releases the mutex. NOW the worker performs the action. This maximizes the concurrency between the worker threads and the main thread.
Here is an example:
// main thread
create signal variable
create mutex
for N jobs:
    create and start worker

while (wait for work)
    // we have something to do
    create work item
    mutex.acquire()
    insert_work_into_queue(item)
    mutex.release()
    // tell the workers
    signal_condition_variable()

// worker thread
while (wait for condition)
    mutex.acquire()
    work = remove_item_from_queue()
    mutex.release()
    if (work) do(work)
This is a simple example where all the worker threads are awakened, even though only one worker will actually succeed in getting work off of the queue. If you want even more efficiency, use an array of condition variables, one per worker thread and then just signal the "next" one, using an algorithm for "next" that is as simple or as complex as you want.
Note that I'm not talking about any specific implementation in any specific language.
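As one concrete rendering (the answer above is deliberately language-neutral), here is the same work-queue pattern in C#, where lock, Monitor.Wait, and Monitor.Pulse play the roles of the mutex and the signal/condition variable:

using System;
using System.Collections.Generic;
using System.Threading;

class WorkQueue
{
    private readonly Queue<Action> queue = new Queue<Action>();
    private readonly object gate = new object();

    // Called by the main thread.
    public void Enqueue(Action work)
    {
        lock (gate)                // mutex.acquire()
        {
            queue.Enqueue(work);   // insert_work_into_queue(item)
            Monitor.Pulse(gate);   // signal_condition_variable(): wake one worker
        }                          // mutex.release()
    }

    // Each worker thread runs this loop.
    public void WorkerLoop()
    {
        while (true)
        {
            Action work;
            lock (gate)
            {
                // Sleep until there is something to do; Wait atomically
                // releases the lock and reacquires it once pulsed.
                while (queue.Count == 0)
                    Monitor.Wait(gate);
                work = queue.Dequeue();
            }
            work(); // run the task OUTSIDE the lock to maximize concurrency
        }
    }
}

Monitor.Pulse wakes a single waiter, which already gives the "wake only the next worker" behavior mentioned above without an array of condition variables.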
Lets say I have a thread pool and a task queue. When a thread runs it pops a task from the task queue and handles it - that thread might add additional tasks into the task queue, as a result.
The time the thread has to handle a certain task is unlimited - meaning the thread works until the task is finished and never terminates before that.
What kinds of problems (e.g. deadlocks) is each of the following thread pool configurations susceptible to?
Possible thread pool configurations I'm concerned with:
1) Unbounded task queue with bounded num. of threads
2) Bounded task queue with unbounded num. of threads
3) Bounded task queue with bounded num. of threads.
4) Unbounded task queue with unbounded num. of threads
Also - say that now the thread has a limited time to handle each task, and is forcibly terminated if it doesn't finish the task in the time frame that was given. How does that change things?
If you have a bounded number of threads then you can experience deadlocks if a task running on a pool thread submits a new task to the queue and then waits for that task --- if there is no free thread then the new task will not be run, and the original task will block, holding up the pool thread until the new task can run. If you end up with enough of these blocked tasks then the whole pool can deadlock.
This isn't really helped by bounding the number of tasks, unless the bound is the same as the number of threads --- once each thread is doing something then you can no longer submit new tasks.
What does help is either (a) adding new threads when a thread becomes blocked like this, or (b) if a pool thread task is waiting for another task from the same pool then that thread switches to running the task being waited for.
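A self-contained C# demonstration of that deadlock with a one-thread pool; the two-second timeout exists only so the demo can report the deadlock instead of hanging forever:

using System;
using System.Collections.Concurrent;
using System.Threading;

class DeadlockDemo
{
    static void Main()
    {
        var queue = new BlockingCollection<Action>();

        // A "pool" with exactly one thread.
        var worker = new Thread(() =>
        {
            foreach (var work in queue.GetConsumingEnumerable()) work();
        });
        worker.Start();

        var outerDone = new ManualResetEventSlim();
        queue.Add(() =>
        {
            var innerDone = new ManualResetEventSlim();
            queue.Add(() => innerDone.Set()); // inner task needs a pool thread...
            // ...but the pool's only thread is the one blocking right here:
            bool ran = innerDone.Wait(TimeSpan.FromSeconds(2));
            Console.WriteLine(ran
                ? "inner task ran"
                : "deadlock: the pool thread is waiting on a task that cannot start");
            outerDone.Set();
        });

        outerDone.Wait();
        queue.CompleteAdding();
        worker.Join(); // the inner task finally runs once the outer gives up
    }
}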
If you have an unbounded number of threads then you have to watch out for oversubscription --- if I have a quad-core machine, but submit 1000 tasks, and run 1000 threads then these will compete with each other and slow everything down.
In practice, the number of threads is bounded to some large number by the OS either due to a hard-coded number, or due to memory constraints --- each thread needs a new stack, so you can only have as many threads as you've got memory for their stacks.
You can always get a deadlock with 2 tasks if they wait for each other, regardless of any scheme you use, unless you start forcibly terminating tasks after a time limit.
The problem with forcibly terminating tasks is twofold. Firstly, you need to communicate to any code that was waiting for that task that the task was terminated forcibly rather than finished normally. Secondly (and this is the bigger issue) you don't know what state the task was in. It might have owned a lock, or any other resources, and forcibly terminating the task will leak those resources, and potentially leave the application in a bad state.