Fast priority queue with incremental updates - haskell

I am trying to write a load-balancer in Haskell with leastconn strategy (partly for fun..). I need a priority queue where only the following operations are required to be 'fast':
read minimum key
+1 on minimum key
-1 on any key
If I had an imperative language with pointers, I would probably come up with:
Head
|
Priority 0 -> Item <-> Item <-> Item <-> Item
|
Priority 1 -> Item <-> Item
|
Priority 4 -> Item <-> Item <-> Item
Priorities are connected using a doubly linked list, the items for every priority too. Each Item contains a link to the head Priority. This structure would have complexity:
O(1) for read minimum key - Take first from queue under head
O(1) for +1 - remove first item under first priority, insert it on lower level (possibly creating a new level)
O(1) for -1 - provided we have a pointer to the item, we can immediately access the Priority, remove the item from doubly linked list and insert it into a different one
Is there some (functional?) data structure that would behave approximately the same? The number of items would be at most a couple of hundred.

It sounds to me like the common structure which suits your needs is a Priority Search Queue as described in Hinze (2001). One of the better Hackage packages providing such a structure is psqueues: http://hackage.haskell.org/package/psqueues
This is perhaps not tuned exactly for your workflow, but it certainly isn't shabby!
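For comparison, the bucket layout drawn in the question can be sketched imperatively. This is a minimal Python sketch rather than Haskell, and it is not the psqueues API. The class and method names (LeastConnQueue, acquire, release) are hypothetical. It assumes counts change only through these two operations and that release is called only for a backend that currently holds a connection, which is what lets the minimum be tracked in O(1).

```python
from collections import defaultdict


class LeastConnQueue:
    """Buckets of backends keyed by connection count (hypothetical names)."""

    def __init__(self, backends):
        self.count = {b: 0 for b in backends}   # backend -> open connections
        self.buckets = defaultdict(dict)         # count -> insertion-ordered set
        for b in backends:
            self.buckets[0][b] = None
        self.min_count = 0

    def peek_min(self):
        """Read a backend with the fewest connections (assumes non-empty)."""
        return next(iter(self.buckets[self.min_count]))

    def _move(self, backend, delta):
        old = self.count[backend]
        new = old + delta
        del self.buckets[old][backend]
        if not self.buckets[old]:
            del self.buckets[old]                # drop the empty level
        self.buckets[new][backend] = None
        self.count[backend] = new
        return old, new

    def acquire(self):
        """+1 on the minimum key; returns the chosen backend."""
        backend = self.peek_min()
        old, new = self._move(backend, +1)
        # The moved backend now sits at old + 1, so the minimum is either
        # unchanged (level still non-empty) or exactly old + 1.
        if old not in self.buckets:
            self.min_count = new
        return backend

    def release(self, backend):
        """-1 on any key."""
        _, new = self._move(backend, -1)
        self.min_count = min(self.min_count, new)
```

With a few hundred backends, the logarithmic cost of a priority search queue is likely to be just as acceptable in practice; the sketch only shows that the three operations do not need more than hash lookups.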

Related

Cross-checking each item in a list with every other item in the same list in a parallel fashion

I have a list of several thousand items. Each item has an attribute called "address range". I have a function that verifies the correctness of the items in the list by making sure that none of their address ranges overlap with the address ranges of any other items in the list (each item has precisely one address range). If N is the number of entries in the list, I essentially have to run (N-1)*(N/2) address range overlap checks. In other words, if the number of items in the list doubles, the number of overlap checks quadruples.
Months ago, such a list would only have a few thousand items, and the whole operation would finish relatively quickly, but over time the number of items has grown, and now it takes several minutes to run all the cross-checks.
I've been trying to parallelize the cross-checks, but I have yet to think of a feasible approach. My problem is that if I want to distribute the cross-checks to perform over say 8 threads (to fully exploit the CPUs on the computer), I would have to split the possible cross-check combinations into 8 independent chunks.
To use an example, say we have 5 items in our list: ( A, B, C, D, E ). Using the formula (N-1)*(N/2), we can see that this requires (5-1)*(5/2)=10 cross-checks:
A vs B
A vs C
A vs D
A vs E
B vs C
B vs D
B vs E
C vs D
C vs E
D vs E
The only way I can think of to distribute the cross-check combinations across a given number of threads is to first create a list of all cross-check combination pairs and then split that list into evenly sized chunks. That would work in principle, but even for just 20,000 items that list would already contain (20,000-1)*(20,000/2)=199,990,000 entries!
So my question is, is there some super-sophisticated algorithm that would allow me to pass the entire list of items to each thread and then have each individual thread figure out by itself which cross-checks it should run so that no 2 threads would repeat the same cross-checks?
I'm programming this in Perl, but really the problem is independent from any particular programming language.
EDIT: Hmmm, I'm now wondering if I've been going about this the wrong way altogether. If I could sort the items by their address ranges, I could just walk through the sorted list and check if any item overlaps with its successor item. I'll try that and see if that speeds things up.
UPDATE: Oh my God, this actually works!!! :D Using a pre-sorted list, the entire operation takes 0.7 seconds for 11,700 items, where my previous naive implementation would take 2 to 3 minutes!
UPDATE AFTER usr's comment: As usr has noted, just checking each item against its immediate successor is not enough. As I'm walking through the sorted list, I'm dragging along an additional (initially empty) list in which I keep track of all items involved in the current overlap. Each time an item is found to overlap with its successor item, the successor item is added to the list (if the list was previously empty, the current item itself is also added). As soon as an item does NOT overlap with its successor item, I locally cross-check all items in my additional list against each other and then clear that list (the same operation is performed if there are still any items in my additional list after I've finished walking the list of all items).
My unit tests seem to confirm that this algorithm works; at least with all the examples I've fed it so far.
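A minimal Python sketch of the sorted-walk idea follows. It is a common variant, not the exact bookkeeping described above: instead of collecting a group and cross-checking it afterwards, it keeps a list of earlier items whose range still reaches the current start. The function name and the inclusive (start, end) tuple format are assumptions made for illustration.

```python
def find_overlaps(ranges):
    """Return (earlier, later) name pairs whose inclusive ranges overlap.

    `ranges` maps a name to an inclusive (start, end) tuple.
    """
    items = sorted(ranges.items(), key=lambda kv: kv[1][0])
    overlaps = []
    active = []  # (name, end) of earlier items that may still reach later ones
    for name, (start, end) in items:
        # Items ending before this start cannot overlap this or any later
        # item, because starts are visited in ascending order.
        active = [(n, e) for n, e in active if e >= start]
        overlaps.extend((n, name) for n, _ in active)
        active.append((name, end))
    return overlaps
```

For the ranges A=(0, 10), B=(2, 3), C=(5, 12), this reports A/B and A/C. The A/C overlap is the kind of case a successor-only check would miss, since B and C do not overlap each other.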
It seems like you could create N threads where N = number of cores on the computer. Each of these threads is identical and consumes items from the queue until there are no more items. Each item is the comparison pair that the thread should work on. Since an item can only be consumed once, you will get no duplicate work.
On the producer side, simply send every valid combination to the queue (just the pairs of items); the threads are what does the work on each item. Thus there is no need to split the items into chunks.
It would be great if each thread could be pinned to a core, but whatever OS you're running on will most likely do a good enough job at scheduling that you won't need to worry about that.
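The producer/consumer arrangement described in this answer might look roughly like the following Python sketch. The names (cross_check, overlaps) and the inclusive (start, end) range format are assumptions for illustration. The queue is bounded, so the full list of pairs never has to exist in memory at once.

```python
import itertools
import queue
import threading


def overlaps(a, b):
    """Inclusive ranges overlap if each one starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def cross_check(ranges, num_workers=4):
    """Return the set of overlapping name pairs found by the workers."""
    tasks = queue.Queue(maxsize=1024)   # bounded: the producer blocks when full
    found = set()
    lock = threading.Lock()

    def worker():
        while True:
            pair = tasks.get()
            if pair is None:            # sentinel: no more work
                return
            a, b = pair
            if overlaps(ranges[a], ranges[b]):
                with lock:
                    found.add((a, b))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    # Producer: each unordered pair is generated lazily and consumed once.
    for pair in itertools.combinations(sorted(ranges), 2):
        tasks.put(pair)
    for _ in threads:
        tasks.put(None)                 # one sentinel per worker
    for t in threads:
        t.join()
    return found
```

In standard CPython the global interpreter lock keeps pure-Python checks from running in parallel, so this shows the structure rather than a speed-up; the total work is also still quadratic in the number of items.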

Getting minimum element from maximum priority queue

What is the most efficient way to get the minimum element from a maximum priority queue?
Let's say I have created a generic priority queue. This queue contains cats, and each cat has a variable fish, which is the number of fish the cat has eaten. I want to get the cat that has eaten the fewest fish, give her some more fish, and then restore the order of the priority queue (which means I call swim() to move the cat that has just eaten closer to the root). But since the priority queue is a max one (it has to be max, it can't be min), how can I get the cat that has eaten the fewest fish?
Given a priority queue in which the top element is the largest, the only way to get the smallest item is to remove all of the elements from the queue. The last one you remove is the smallest item. The complexity of that approach is O(n log n). Plus you'll have to save the items you've popped and put them back into the queue after you've removed the last one.
It appears that you've implemented the priority queue as a binary max heap. If that's the case, then the smallest item will be a leaf, which by definition will be in the last n/2 items. So, if you have access to the backing array, you could search the last n/2 items for the smallest item. That's an O(n) operation.
If removing the smallest item is something you need to do on a regular basis, you might consider implementing a min-max heap.
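The leaf scan over the backing array can be sketched as follows, assuming a 0-indexed array in which the children of index i sit at 2i+1 and 2i+2. The function names are hypothetical, and make_max_heap exists only to produce a valid max-heap layout for trying the scan out.

```python
import heapq


def min_of_max_heap(heap):
    """Smallest element of a 0-indexed binary max-heap array, in O(n)."""
    n = len(heap)
    # Indices n // 2 .. n - 1 have no children, so they are the leaves,
    # and the minimum of a max-heap must be one of them.
    return min(heap[n // 2:])


def make_max_heap(values):
    """Build a max-heap array by heapifying the negated values."""
    neg = [-v for v in values]
    heapq.heapify(neg)          # min-heap on negatives == max-heap on values
    return [-v for v in neg]
```

This only reads the minimum. Removing it would additionally need a swim or sink step from that leaf position to restore the heap property.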
Redefine the ordering property for fish.
If you're working in a language which uses the comparator concept, one way to do this is to write a comparator that reverses the default ordering property for fish. Another alternative is to store the negative of the fish count when creating/pushing a cat. The logic could be built into cat objects, so the user only sees and manipulates positive quantities of fish, but they are internally stored as negative values.
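A minimal sketch of the negated-key idea in Python is shown below. MaxPQ is a hypothetical stand-in for the asker's max priority queue, not their actual class, and feed_hungriest is likewise an illustrative name. Because keys are stored as negative fish counts, the largest key belongs to the cat that has eaten the least.

```python
class MaxPQ:
    """Minimal array-backed binary max-heap of (key, item) pairs."""

    def __init__(self):
        self._a = []

    def push(self, key, item):
        self._a.append((key, item))
        self._swim(len(self._a) - 1)

    def pop(self):
        """Remove and return the (key, item) pair with the largest key."""
        a = self._a
        a[0], a[-1] = a[-1], a[0]
        top = a.pop()
        self._sink(0)
        return top

    def _swim(self, i):
        a = self._a
        while i > 0 and a[(i - 1) // 2][0] < a[i][0]:
            parent = (i - 1) // 2
            a[parent], a[i] = a[i], a[parent]
            i = parent

    def _sink(self, i):
        a, n = self._a, len(self._a)
        while 2 * i + 1 < n:
            child = 2 * i + 1
            if child + 1 < n and a[child + 1][0] > a[child][0]:
                child += 1              # pick the larger child
            if a[i][0] >= a[child][0]:
                break
            a[i], a[child] = a[child], a[i]
            i = child


def feed_hungriest(pq, extra=1):
    """Pop the cat with the fewest fish, feed it, and push it back."""
    neg_fish, cat = pq.pop()
    pq.push(neg_fish - extra, cat)      # more fish eaten -> more negative key
    return cat
```

Both the pop and the push are O(log n), so feeding the hungriest cat never requires scanning or draining the queue.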

How does Dijkstra's Shortest Path Algorithm work with Priority Queues?

I have been reading online websites and everybody says that using a priority queue will make it good, but I don't understand what is used as the "priority" here.
At the very beginning, is the first item on the priority queue always the starting point node? If so, when we extract the starting node with distance 0, how do we get its neighbors from the priority queue?
A priority queue Q stores a set of distinct elements. Each element x has an associated key x.key.
When Dijkstra's algorithm is based on a priority queue, we store in the queue the vertices whose distances from the source are yet to be settled, keyed on their current distance from the source.
Take a look at this pdf where the algorithm is based on the abstract data structure called a priority queue, which can be implemented using a binary heap.
http://www.cs.bris.ac.uk/~montanar/teaching/dsa/dijkstra-handout.pdf
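To make the role of the priority concrete, here is a minimal Python sketch using the standard heapq module. It is a common "lazy" variant: instead of decreasing a key in place, it pushes a new entry and skips stale ones when they are popped. The graph format (a dict of dicts of non-negative edge weights) is an assumption for illustration. The priority is the tentative distance from the source, the queue starts with just the source at distance 0, and neighbours are looked up in the graph rather than in the queue.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source; graph[u] maps neighbour -> weight >= 0."""
    dist = {source: 0}
    pq = [(0, source)]                  # (tentative distance, vertex)
    settled = set()
    while pq:
        d, u = heapq.heappop(pq)        # smallest tentative distance first
        if u in settled:
            continue                    # stale entry from an earlier update
        settled.add(u)
        # Neighbours come from the graph's adjacency data, not from the queue.
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```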

Reverse a linked list as big as having 7 million nodes efficiently using Threads

I was asked in an interview how to reverse a singly linked list with as many as 7 million nodes by using threads efficiently. Recursion doesn't look feasible with so many nodes, so I opted for divide and conquer: each thread is given a chunk of the linked list, which it reverses by making each node's pointer point back to the previous node (keeping references to the current, next and previous nodes), and the reversed chunks from the threads are joined together afterwards. But the interviewer insisted that the size of the linked list is not known, and that it can be done efficiently without finding the size. I couldn't figure it out. How would you go about it?
I like to implement such questions "top-down":
Assume that you already have a class that implements Runnable or extends Thread, out of which you can create instances and run them. Each instance receives two parameters: a pointer to a Node in the list and the number of Nodes to reverse.
Your main traverses all 7 million nodes and "marks" the starting points for your threads. Say we have 7 threads; the marked points will be: 1, 1,000,000, 2,000,000, ... Save the marked nodes in an array or whichever data structure you like.
After you have finished "marking" the starting points, create the threads and give each one of them its starting point and the counter 1,000,000.
After all the threads are done, "glue" each of the marking points to point back to the last node of the previous thread (which should be saved in another "static" ordered data structure).
Now that we have a plan - all that's left to do is implement a (fairly easy) algorithm that, given the number N and a Node x, will reverse the next N nodes (including x) in a singly linked list :)
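The per-thread step, reversing the next N nodes starting at x, might look like this in Python (the answer has Java in mind; the Node class and the returned tuple are assumptions made for illustration):

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def reverse_n(x, n):
    """Reverse up to n nodes starting at x.

    Returns (new_head, new_tail, rest): the reversed chunk runs from
    new_head to new_tail (the original x), and rest is the first node
    after the chunk, or None if the list ended.
    """
    prev, cur = None, x
    while cur is not None and n > 0:
        nxt = cur.next          # remember the node after cur
        cur.next = prev         # point cur back at its predecessor
        prev, cur = cur, nxt
        n -= 1
    return prev, x, cur
```

After each chunk is reversed, its new_tail is the node that gets "glued" to the new_head of the chunk that preceded it in the original list.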

C# Concurrent, Fixed-Size Queue with Ability to Reference Individual Item

I need the following collection:
1 - Fixed-size length. So, it will automatically dequeue the tail when it reaches the fixed-size limit.
2 - Can access individual elements, not necessarily the head or tail only.
3 - FIFO.
4 - Allows safe concurrent access (however, I can compromise on this bit for now).
5 - Enqueue and Dequeue methods.
I am using .NET 4.5 and am aware of the ConcurrentQueue class; however, it is missing points 1 and 2. I am thinking of building my own class that implements IEnumerable and uses IList in the background.
I could inherit from ConcurrentQueue, but I need to continuously access body elements (not just the head and tail) and enumerating it every time would be inefficient.
Do you have a better approach or do you recommend any collection that does a similar job?
Use a queue implementation based on a plain old System.Array. Your points:
1 - Array has fixed length.
2 - Can access individual elements in O(1) by definition.
3 - This will be a queue, so it is FIFO.
4 - The array can be made safe for concurrent access: just use a lock in the Enqueue and Dequeue methods.
5 - They are easy to implement.
Build a wrapper around a List or an array. Provide the queue methods you want, keep track of the size, etc. Program your own concurrency requirements into this class.
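As a rough illustration of such a wrapper, here is a minimal ring-buffer sketch in Python rather than C#. The class and method names are hypothetical, and the capacity is assumed to be at least 1. A single lock guards every operation, which is the simplest way to meet the concurrency point, though not necessarily the fastest.

```python
import threading


class FixedSizeQueue:
    """FIFO ring buffer over a fixed-length list (hypothetical names)."""

    def __init__(self, capacity):
        self._buf = [None] * capacity
        self._head = 0              # index of the oldest element
        self._size = 0
        self._lock = threading.Lock()

    def enqueue(self, item):
        """Add item; when full, drop and return the oldest element."""
        with self._lock:
            dropped = None
            cap = len(self._buf)
            if self._size == cap:
                dropped = self._buf[self._head]
                self._head = (self._head + 1) % cap
                self._size -= 1
            self._buf[(self._head + self._size) % cap] = item
            self._size += 1
            return dropped

    def dequeue(self):
        """Remove and return the oldest element."""
        with self._lock:
            if self._size == 0:
                raise IndexError("dequeue from empty queue")
            item = self._buf[self._head]
            self._buf[self._head] = None
            self._head = (self._head + 1) % len(self._buf)
            self._size -= 1
            return item

    def __getitem__(self, i):
        """O(1) access to the i-th oldest element (0 is the oldest)."""
        with self._lock:
            if not 0 <= i < self._size:
                raise IndexError("index out of range")
            return self._buf[(self._head + i) % len(self._buf)]

    def __len__(self):
        with self._lock:
            return self._size
```

Indexing is relative to the oldest element, so q[0] is always the item that dequeue would return next.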
