Dequeue a Priority Queue

Elements dequeued from a priority queue follow the rule that:
Priority[i] >= Priority[i - 1]
(where a lower number means higher priority, so the dequeued priorities are non-decreasing).
But if many elements have the same priority, in what order should they be dequeued?
Should the element that was enqueued first be dequeued first or doesn't it matter?
I know this most likely depends on implementation and usage, but I'm looking for the textbook priority queue answer.

A queue should definitely be FIFO, that is its nature. The priority queue changes the queue aspect slightly in that it lets higher priority items through in preference to lower ones, but that shouldn't change the basic nature of the queue.
I've seen implementations that profess to be queues but do not follow the FIFO rules. I prefer a different name for them to better specify the behaviour (such as priority bags).
In a queue (even a priority one), items with the same priority should be extracted in the same order they were inserted.
In other words, inserting A2 B2 C2 D1 (the number is the priority) should result in them being processed as D1 A2 B2 C2.
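A minimal sketch of that behaviour (mine, not from any particular textbook): a binary min-heap whose comparison falls back to a sequence number stamped at enqueue time, so equal priorities come out FIFO. It reproduces the example above.

#include <stdio.h>
#include <string.h>

#define CAP 64

/* An entry carries its priority plus a monotonically increasing sequence
 * number assigned at enqueue time; the sequence number breaks ties. */
typedef struct {
    int priority;       /* lower number = higher priority, as in the example */
    unsigned long seq;  /* insertion order */
    char name[8];       /* payload */
} entry;

static entry heap[CAP];
static int n;
static unsigned long next_seq;

/* Nonzero if a should be dequeued before b. */
static int before(const entry *a, const entry *b) {
    if (a->priority != b->priority)
        return a->priority < b->priority;
    return a->seq < b->seq;             /* equal priority: earlier insertion first */
}

static void swap_entries(int i, int j) {
    entry t = heap[i]; heap[i] = heap[j]; heap[j] = t;
}

static void push(const char *name, int priority) {
    heap[n].priority = priority;
    heap[n].seq = next_seq++;
    strncpy(heap[n].name, name, sizeof heap[n].name - 1);
    heap[n].name[sizeof heap[n].name - 1] = '\0';
    for (int i = n++; i > 0 && before(&heap[i], &heap[(i - 1) / 2]); i = (i - 1) / 2)
        swap_entries(i, (i - 1) / 2);   /* sift up */
}

static entry pop(void) {
    entry top = heap[0];
    heap[0] = heap[--n];
    for (int i = 0;;) {                 /* sift down */
        int best = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && before(&heap[l], &heap[best])) best = l;
        if (r < n && before(&heap[r], &heap[best])) best = r;
        if (best == i) break;
        swap_entries(i, best);
        i = best;
    }
    return top;
}

int main(void) {
    push("A", 2); push("B", 2); push("C", 2); push("D", 1);
    while (n > 0) {
        entry e = pop();
        printf("%s%d ", e.name, e.priority);   /* prints: D1 A2 B2 C2 */
    }
    printf("\n");
    return 0;
}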

There isn't a textbook answer. Absent any other information, ties could be broken FIFO (within items of the same priority), left undefined, or handled some other way.
Other implementations let the enqueuer specify additional information that is used at dequeue time to break ties.

Producer consumer using only 1 additional semaphore

The traditional solution to producer-consumer
In operating systems, as you can see in the link above for producer-consumer, two semaphores, full and empty, are used. Why is it not possible to do this using only one counting semaphore, fullEmpty?
What I mean is, we have a binary semaphore mutex and another semaphore fullEmpty, which is initially 0 because there are no items in the buffer. So why do we need two semaphores (full, empty)?
The only thing that changes is that the order of wait and signal needs to be adjusted so that the update of fullEmpty happens inside the critical section.
Any thoughts or reasons?
The key statement in the description that relates to your question is "We have a buffer of fixed size."
For the sake of answering your question, let's first assume that the buffer can expand to fit as many items as needed, or in other words the buffer can grow to an unlimited size. In this case, the only synchronization that would need to occur between producers and consumers (apart from locking the mutex to ensure that you don't corrupt items in the critical section) would be ensuring that consumers only consume items after they have been produced by a producer. You could solve this problem with just a mutex and one semaphore. Here is some code, which I borrowed and changed from the link you shared:
Producer
do {
    // produce an item
    wait(mutex);
    // place in buffer
    signal(mutex);
    signal(full);
} while (true);
Consumer
do {
    wait(full);
    wait(mutex);
    // remove item from buffer
    signal(mutex);
    // consume item
} while (true);
As you can see above, the producer is always able to add things to the queue (apart from when the mutex is being held) and doesn't need to wait for consumers to consume anything because the buffer will never fill, even if the consumers don't consume items. On the other hand, consumers can't consume anything until producers have produced items.
To answer your question, you need to focus on the statement, "We have a buffer of fixed size." This changes the problem. Since the buffer is no longer able to grow to an unlimited size, you need to get producers to wait when the buffer is full before they can add more things to the buffer. This is why you need a second semaphore. Not only do consumers need to wait for producers, but now producers need to wait for consumers. You get producers to wait for consumers by getting them to call wait on a semaphore that only consumers call signal on.
You can't do this with only one semaphore because the condition under which a producer must wait (buffer full) is different from the condition under which a consumer must wait (buffer empty). Since they must be able to decrement and proceed past the semaphore under different conditions, the same semaphore can't serve both.
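For completeness, here is the classic fixed-size-buffer solution in the same pseudocode style, with a second semaphore: empty counts free slots (initialized to the buffer size N) and full counts queued items (initialized to 0).
Producer
do {
    // produce an item
    wait(empty);   // block while no free slot (empty == 0)
    wait(mutex);
    // place in buffer
    signal(mutex);
    signal(full);  // one more item for consumers
} while (true);
Consumer
do {
    wait(full);    // block while nothing to consume (full == 0)
    wait(mutex);
    // remove item from buffer
    signal(mutex);
    signal(empty); // one more free slot for producers
    // consume item
} while (true);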
This is because there are two conditions you have to wait for: the queue is empty and the queue is full. But a classic semaphore allows you to wait for only one condition: wait until the semaphore is not 0.
You can solve this problem using a single synchronization object, but such an object needs to be more capable than a plain semaphore. A "bounded semaphore", a semaphore that has a maximum value, is enough, as it allows you to block waiting on both conditions.
How to get one is another question:
You can build one using a mutex and a condition variable (see the sketch below).
Windows' semaphore already has this functionality.
You can use futex on Linux (see FUTEX_WAIT, FUTEX_WAKE) or equivalents on other OSes: on FreeBSD use _umtx_op (see UMTX_OP_WAIT, UMTX_OP_WAKE), on Windows 8 and newer use WaitOnAddress, WakeByAddressSingle/WakeByAddressAll.
I suggest you get familiar with the futex interface; with it you can build more powerful and more efficient synchronization objects than the regular ones. Today most OSes provide an equivalent interface, and even C++ might introduce something similar in the future (see std::synchronic<T>).
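A minimal sketch of the first option, a bounded semaphore built from a pthread mutex and condition variable (the bsem_t name and API are mine):

#include <pthread.h>

/* A counting semaphore with a maximum value. post() blocks at max and
 * wait() blocks at 0, so one object covers both the "full" and the
 * "empty" condition. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    unsigned count, max;
} bsem_t;

void bsem_init(bsem_t *s, unsigned initial, unsigned max) {
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->changed, NULL);
    s->count = initial;
    s->max = max;
}

void bsem_post(bsem_t *s) {               /* producer side */
    pthread_mutex_lock(&s->lock);
    while (s->count == s->max)            /* block while "full" */
        pthread_cond_wait(&s->changed, &s->lock);
    s->count++;
    /* broadcast because posters and waiters share one condition variable */
    pthread_cond_broadcast(&s->changed);
    pthread_mutex_unlock(&s->lock);
}

void bsem_wait(bsem_t *s) {               /* consumer side */
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)                 /* block while "empty" */
        pthread_cond_wait(&s->changed, &s->lock);
    s->count--;
    pthread_cond_broadcast(&s->changed);
    pthread_mutex_unlock(&s->lock);
}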
A few notes:
Linux has eventfd, which can act as a semaphore when created with the EFD_SEMAPHORE flag, but it has a fixed maximum value of 0xfffffffffffffffe that cannot be changed. Maybe some day this syscall will be extended to support a configurable maximum value too.
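For illustration, a Linux-only sketch of eventfd acting as a counting semaphore:

#include <sys/eventfd.h>
#include <unistd.h>
#include <stdint.h>

int main(void) {
    /* With EFD_SEMAPHORE, each read() blocks until the counter is nonzero,
     * then returns 1 and decrements the counter by exactly 1; each write()
     * adds its value to the counter. */
    int sem = eventfd(0, EFD_SEMAPHORE);
    uint64_t one = 1, got;
    write(sem, &one, sizeof one);   /* signal */
    read(sem, &got, sizeof got);    /* wait; got == 1 */
    close(sem);
    return 0;
}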

What is a real-world example of a priority queue?

I'm writing a lock-free C library, and I'm going to implement a priority queue. However, the goal of my library is not completeness of the data structures; I just want to implement some typical ones and then write a micro-benchmark to show that the lock-free ones perform better in some special cases than the lock-based ones. So I want to know if there are some typical applications where the priority queue plays an important role (open-source projects would be best). Then I can use them as a benchmark.
A few to be listed:
1. Dijkstra’s Shortest Path Algorithm
2. Prim’s algorithm
3. Huffman codes for data compression.
4. Heap sort
5. Load balancing on servers.
Various applications are pointed out in:
https://www.cdn.geeksforgeeks.org/applications-priority-queue/
Also, the Wikipedia article itself has an extensive list of applications and parameters against which you can benchmark your comparison (refer to the section "Summary of running times"):
https://en.wikipedia.org/wiki/Priority_queue
Priority queues are different from queues in the sense that they do not act on the FIFO principle.
...The elements of the priority queue are ordered according to their natural ordering, or by a Comparator provided at queue construction time...
One real-world example is the priority scheduling algorithm, where each job is assigned a priority and the job with the highest priority gets scheduled first.
The most common uses for priority queues that I see in real life are:
1) Prioritized work queues: when a thread is ready for more work, it picks the highest priority available task from a priority queue. This is a great application for lock-free queues.
2) Finding the closest restaurants/hotels/bathrooms/whatever to a given location. The algorithm to retrieve these from pretty much any spatial data structure uses a priority queue.
When you build a product, you break things down into smaller chunks (stories), assign a priority to each, then pick them up, work on them, and close them.
JIRA stories are a relatable example of a priority queue.

Priority queue with a change-priority feature that keeps elements ordered

I'm looking for a way to schedule threads by priority, and First-Come-First-Served (FCFS) if two threads have the same priority. I was thinking about using a heap of queues or something like that. The problem is that, even if I implement my own priority queue, the ability to change priorities ruins the insertion order in this queue.
To solve this problem I can save the insertion time of each thread and also sort the priority queue by insertion time (as a secondary key after the primary priority key), but I believe there is a combination of data structures that can solve the problem without using the insertion time.
The complexity should be O(logN) (there are some naive solutions with O(N) complexity, such as having a regular queue, and iterating the queue whenever we have to pop a thread).
Maybe I didn't understand your problem correctly, but you could have a separate list for each priority.
Each thread is added to the corresponding list based on its priority. And since you always add at the tail of the list and remove from the head, you get FCFS behavior.
You could also create a priority queue to retrieve the next thread with the lowest priority (O(1) to get the next thread and O(log N) to insert). For comparison you could use a combination of the priority and insert time of each node.
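A sketch of the one-list-per-priority layout (the fixed number of levels and all names are my assumptions). Changing a thread's priority is an O(1) unlink from one list plus an append to the tail of another, which preserves FCFS within the new priority; picking the next thread scans the levels, which is O(1) for a fixed number of them.

#include <stddef.h>

#define NPRIO 8   /* fixed number of priority levels; 0 is the highest */

typedef struct thread_node {
    struct thread_node *prev, *next;
    int priority;
    /* ... thread state ... */
} thread_node;

/* One FIFO per priority: head is the oldest entry, tail the newest. */
static thread_node *head[NPRIO], *tail[NPRIO];

static void enqueue(thread_node *t, int prio) {
    t->priority = prio;
    t->prev = tail[prio];
    t->next = NULL;
    if (tail[prio]) tail[prio]->next = t; else head[prio] = t;
    tail[prio] = t;
}

static void unlink_node(thread_node *t) {
    int p = t->priority;
    if (t->prev) t->prev->next = t->next; else head[p] = t->next;
    if (t->next) t->next->prev = t->prev; else tail[p] = t->prev;
}

/* O(1): the thread goes to the back of its new level, so FCFS order
 * within each level is untouched. */
void change_priority(thread_node *t, int new_prio) {
    unlink_node(t);
    enqueue(t, new_prio);
}

/* Next thread to schedule: first entry of the highest non-empty level. */
thread_node *pop_next(void) {
    for (int p = 0; p < NPRIO; p++)
        if (head[p]) {
            thread_node *t = head[p];
            unlink_node(t);
            return t;
        }
    return NULL;
}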

Threadpool multi-queue job dispatch algorithm

I'm curious to know if there is a widely accepted solution for managing thread resources in a threadpool given the following scenario/constraints:
1. Incoming jobs are all of the same nature and could be processed by any thread in the pool.
2. Incoming jobs will be 'bucketed' into different queues based on some attribute of the incoming job, such that all jobs going to the same bucket/queue MUST be processed serially.
3. Some buckets will be less busy than others at different points during the lifetime of the program.
My question is on the theory behind a threadpool's implementation. What algorithm could be used to efficiently allocate available threads to incoming jobs across all buckets?
Edit: Another design goal would be to eliminate as much latency as possible between a job being enqueued and it being picked up for processing, assuming there are available idle threads.
Edit 2: In the case I'm thinking of, there are a relatively large number of queues (50-100) with unpredictable levels of activity, but probably only 25% of them will be active at any given time.
The first (and most costly) solution I can think of is to simply have 1 thread assigned to each queue. While this will ensure incoming requests are picked up immediately, it is obviously inefficient.
The second solution is to combine the queues together based on expected levels of activity so that the number of queues is inline with the number of threads in the pool, allowing one thread to be assigned to each queue. The problem here will be that incoming jobs, which otherwise could be processed in parallel, will be forced to wait on each other.
The third solution is to create the maximum number of queues, one for each set of jobs that must be processed serially, but only allocate threads based on the number of queues we expect to be busy at any given time (which could also be adjusted by the pool at runtime). So this is where my question comes in: Given that we have more queues than threads, how does the pool go about allocating idle threads to incoming jobs in the most efficient way possible?
I would like to know if there is a widely accepted approach. Or if there are different approaches - who makes use of which one? What are the advantages/disadvantages, etc?
Edit 3: This might be best expressed in pseudocode.
You should probably eliminate nr. 2 from your specification. All you really need to comply with is that threads take up buckets and process the queues inside the buckets in order. It makes no sense to process a serialized queue with another threadpool or to do some serialization of tasks in parallel. Thus your spec simply becomes that the threads iterate the FIFO in the buckets, and it's up to the pool manager to insert properly constructed buckets. So your bucket will be:
struct task_bucket
{
    void *ctx;     // context-relevant data
    fifo_t *queue; // your fifo
};
Then it's up to you to make the threadpool smart enough to know what to do on each iteration of the queue. For example, ctx can be a function pointer and the queue can contain data for that function, so the worker thread simply calls the function on each iteration with the provided data.
Reflecting the comments:
If the size of the bucket list is known beforehand and isn't likely to change during the lifetime of the program, you'd need to figure out if that is important to you. You will need some way for the threads to select a bucket to take. The easiest way is to have a FIFO queue that is filled by the manager and emptied by the threads. Classic reader/writer.
Another possibility is a heap. The worker removes the highest priority from the heap and processes the bucket queue. Both removal by the workers and insertion by the manager reorders the heap so that the root node is the highest priority.
Both these strategies assume that the workers throw away the buckets and the manager makes new ones.
If keeping the buckets is important, you run the risk of workers only attending to the last modified task, so the manager will either need to reorder the bucket list or modify the priority of each bucket, with the worker iterating to find the highest priority. It is important that the memory behind ctx remains valid while threads are working, or the threads will have to copy it as well. Workers can simply take the queue locally and set queue to NULL in the bucket.
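A sketch of the first strategy, the FIFO of ready buckets (the fifo_pop API is an assumption, and ctx is narrowed to a function pointer as suggested above):

#include <pthread.h>
#include <stddef.h>

/* Assumed fifo API: pop returns NULL once the bucket's queue is empty. */
typedef struct fifo fifo_t;
extern void *fifo_pop(fifo_t *q);

/* The bucket from above, with ctx narrowed to a function pointer. */
typedef struct task_bucket {
    void (*fn)(void *item); /* what to do with each queued item */
    fifo_t *queue;          /* the serial per-bucket fifo */
} task_bucket;

/* Classic reader/writer handoff: a mutex-protected ring of ready buckets.
 * A bucket is held by at most one worker at a time, which is what keeps
 * the per-bucket processing serial. */
#define MAXREADY 128
static task_bucket *ready[MAXREADY];
static int rhead, rtail, rcount;
static pthread_mutex_t rlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ravail = PTHREAD_COND_INITIALIZER;

void manager_submit(task_bucket *b) {
    pthread_mutex_lock(&rlock);
    ready[rtail] = b;
    rtail = (rtail + 1) % MAXREADY;
    rcount++;
    pthread_cond_signal(&ravail);        /* wake one idle worker */
    pthread_mutex_unlock(&rlock);
}

void *worker(void *unused) {
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&rlock);
        while (rcount == 0)              /* idle workers sleep here */
            pthread_cond_wait(&ravail, &rlock);
        task_bucket *b = ready[rhead];
        rhead = (rhead + 1) % MAXREADY;
        rcount--;
        pthread_mutex_unlock(&rlock);

        void *item;
        while ((item = fifo_pop(b->queue)) != NULL)
            b->fn(item);                 /* drain the bucket's fifo in order */
        /* the bucket is thrown away; the manager makes new ones */
    }
    return NULL;
}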
ADDED: I now tend to agree that you might start simple and just keep a separate thread for each bucket, and look for something different only if this simple solution is found to have problems. And a better solution might depend on exactly what problems the simple one causes.
In any case, I leave my initial answer below, appended with an afterthought.
You can make a special global queue of "job is available in bucket X" signals.
All idle workers would wait on this queue, and when a signal is put into the queue one thread will take it and proceed to the corresponding bucket to process jobs there until the bucket becomes empty.
When an incoming job is submitted into an in-order bucket, check whether a worker thread is already assigned to this bucket. If one is assigned, the new job will eventually be processed by that worker thread, so no signal should be sent. If no worker is assigned, check whether the bucket is empty or not. If empty, place a signal into the global signal queue that a new job has arrived in this bucket; if not empty, such a signal should have been made already and a worker thread should soon arrive, so do nothing.
ADDED: I got a thought that my idea above can cause starvation for some jobs if the number of threads is less than the number of "active" buckets and there is a never-ending flow of incoming tasks. If all threads are already busy and a new job arrives into a bucket that is not yet served, it may take a long time before a thread is freed to work on this new job. So there is a need to check if there are idle workers, and if not, create a new one... which adds more complexity.
Keep it simple: I'd use one thread per queue. Simplicity is worth a lot, and threads are quite cheap. 100 threads won't be an issue on most OSes.
By using a thread per queue, you also get a real scheduler. If a thread blocks (depends on what you're doing), another thread can be queued. You won't get deadlock until every single one blocks. The same cannot be said if you use fewer threads: if the queues the threads happen to be servicing block, then even if other queues are "runnable", and even if those other queues might unblock the blocked threads, you'll have deadlock.
Now, in particular scenarios, using a threadpool may be worth it. But then you're talking about optimizing a particular system, and the details matter. How expensive are threads? How good is the scheduler? What about blocking? How long are the queues, how frequently updated, etc.
So in general, with just the information that you have around 100 queues, I'd just go for a thread per queue. Yes, there's some overhead: all solutions will have that. A threadpool will introduce synchronization issues and overhead. And the overhead of a limited number of threads is fairly minor. You're mostly talking about around 100MB of address space - not necessarily memory. If you know most queues will be idle, you could further implement an optimization to stop threads on empty queues and start them when needed (but beware of race conditions and thrashing).

Optimal sleep time in multiple producer / single consumer model

I'm writing an application that has a multiple producer, single consumer model (multiple threads send messages to a single file writer thread).
Each producer thread contains two queues, one to write into, and one for the consumer to read out of. On every loop of the consumer thread, it iterates through each producer, locks that producer's mutex, swaps the queues, unlocks, and writes out from the queue that the producer is no longer using.
In the consumer thread's loop, it sleeps for a designated amount of time after it processes all producer threads. One thing I immediately noticed was that the average time for a producer to write something into the queue and return increased dramatically (by 5x) when I moved from 1 producer thread to 2. As more threads are added, this average time decreases until it bottoms out - there isn't much difference between the time taken with 10 producers vs 15 producers. This is presumably because with more producers to process, there is less contention for the producer thread's mutex.
Unfortunately, having < 5 producers is a fairly common scenario for the application and I'd like to optimize the sleep time so that I get reasonable performance regardless of how many producers exist. I've noticed that by increasing the sleep time, I can get better performance for low producer counts, but worse performance for large producer counts.
Has anybody else encountered this, and if so what was your solution? I have tried scaling the sleep time with the number of threads, but it seems somewhat machine specific and pretty trial-and-error.
You could pick the sleep time based on the number of producers, or even make the sleep time adapt based on some dynamic scheme. If the consumer wakes up and has no work, double the sleep time; otherwise halve it. But constrain the sleep time to some minimum and maximum.
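In sketch form (the thresholds, and the drain callback standing in for your existing swap-and-write pass, are placeholders):

#include <unistd.h>

#define MIN_SLEEP_US 100     /* arbitrary floor */
#define MAX_SLEEP_US 50000   /* arbitrary ceiling */

/* drain is your existing pass over all producer queues; it returns the
 * number of items it wrote out. */
void consumer_loop(int (*drain)(void)) {
    unsigned sleep_us = 1000;            /* arbitrary starting point */
    for (;;) {
        if (drain() == 0) {
            sleep_us *= 2;               /* no work found: back off */
            if (sleep_us > MAX_SLEEP_US) sleep_us = MAX_SLEEP_US;
        } else {
            sleep_us /= 2;               /* found work: poll faster */
            if (sleep_us < MIN_SLEEP_US) sleep_us = MIN_SLEEP_US;
        }
        usleep(sleep_us);
    }
}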
Either way you're papering over a more fundamental issue. Sleeping and polling is easy to get right and sometimes is the only approach available, but it has many drawbacks and isn't the "right" way.
You can head in the right direction by adding a semaphore which is incremented whenever a producer adds an item to a queue and decremented when the consumer processes an item in a queue. The consumer will only wake up when there are items to process and will do so immediately.
Polling the queues may still be a problem, though. You could add a new queue that refers to any queue which has items on it. But it rather raises the question as to why you don't have a single queue that the consumer processes rather than a queue per producer. All else being equal that sounds like the best approach.
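A sketch of that single-queue approach, with a POSIX semaphore counting items so the consumer blocks instead of polling (the node layout and names are mine):

#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

/* One shared queue for all producers. The semaphore counts queued items,
 * so the consumer sleeps in sem_wait() until there is work. */
typedef struct node { struct node *next; void *msg; } node;

static node *qhead, *qtail;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static sem_t items;

void queue_init(void) { sem_init(&items, 0, 0); }

void produce(void *msg) {               /* called by any producer thread */
    node *n = malloc(sizeof *n);
    n->msg = msg;
    n->next = NULL;
    pthread_mutex_lock(&qlock);
    if (qtail) qtail->next = n; else qhead = n;
    qtail = n;
    pthread_mutex_unlock(&qlock);
    sem_post(&items);                   /* wake the consumer */
}

void *consume(void) {                   /* the single file-writer thread */
    sem_wait(&items);                   /* blocks until something is queued */
    pthread_mutex_lock(&qlock);
    node *n = qhead;                    /* non-NULL, guaranteed by the semaphore */
    qhead = n->next;
    if (!qhead) qtail = NULL;
    pthread_mutex_unlock(&qlock);
    void *msg = n->msg;
    free(n);
    return msg;
}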
Instead of sleeping, I would recommend that your consumer block on a condition signaled by the producers. On a POSIX-compliant system, you could make it work with pthread_cond. Create an array of pthread_cond_t, one for each producer, then create an additional one that is shared between them. The producers first signal their individual condition variable, and then the shared one. The consumer waits on the shared condition and then iterates over the elements of the array, performing a pthread_cond_timedwait() on each element of the array (use pthread_get_expiration_np() to get the absolute time for "now"). If the wait returns 0, then that producer has written data. The consumer must reinitialize the condition variables before waiting again.
By using blocking waits, you'll minimize the amount time the consumer is needlessly locking-out the producers. You could also make this work with semaphores, as stated in a previous answer. Semaphores have simplified semantics compared to conditions, in my opinion, but you'd have to be careful to decrement the shared semaphore once for each producer that was processed on each pass through the consumer loop. Condition variables have the advantage that you can basically use them like boolean semaphores if you reinitialize them after they are signaled.
Try to find an implementation of a blocking queue in the language that you use for programming. A single queue will be enough for any number of producers and one consumer.
To me it sounds like you are accidentally introducing some buffering by having the consumer thread be busy somewhere else, either sleeping or doing actual work (the queue acting as the buffer). Maybe doing some simple buffering on the producer side will reduce your contention.
It seems that your system is highly sensitive to lock contention between the producer and consumer, but I'm baffled as to why such a simple swap operation would occupy enough CPU time to show up in your run stats.
Can you show some code?
Edit: maybe you are taking your lock and swapping queues even when there is no work to do?
