Split a ConcurrentLinkedQueue in half using Spliterator - multithreading

I have a ConcurrentLinkedQueue and I want to split it into two halves and let two separate threads handle each. I have tried using Spliterator but I do not understand how to get the partitioned queues.
ConcurrentLinkedQueue<int[]> q = ... // contains a large number of elements
Spliterator<int[]> p1 = q.spliterator();
Spliterator<int[]> p2 = p1.trySplit();
p1.getQueue(); // does not compile: Spliterator has no such method
p2.getQueue();
I want to call something like p1.getQueue() to get the two partitions back as queues, but no such method exists.
Please let me know the correct way to do it.

You can't split it in half in general: to split a queue in half, it must have a well-defined size at each point in time. And while CLQ does have a size() method, its documentation is clear that it requires O(n) traversal time, and because this is a concurrent queue the result might already be stale by the time it returns (it is named concurrent for a reason, after all). From what I can see, the current Spliterator from CLQ splits off batches rather than halves.
If you want to split it in half logically and process the elements, I would suggest moving to a BlockingQueue implementation, which has a drainTo method. That way you can drain the elements into an ArrayList, for example, which will split much better (half, then half again, and so on).
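A minimal sketch of that idea, assuming the elements can live in a LinkedBlockingQueue and with a hypothetical process method standing in for the real per-element work:

import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.LinkedBlockingQueue;

public class DrainAndSplit {

    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<int[]> q = new LinkedBlockingQueue<>();
        for (int i = 0; i < 100_000; i++) {
            q.add(new int[] { i });
        }

        // Drain a snapshot of the queue into a list that splits well.
        List<int[]> snapshot = new ArrayList<>();
        q.drainTo(snapshot);

        // ArrayList's spliterator is SIZED, so trySplit yields real halves:
        // 'front' covers the first half, 'back' keeps the second half.
        Spliterator<int[]> back = snapshot.spliterator();
        Spliterator<int[]> front = back.trySplit(); // non-null for a list this size

        Thread t1 = new Thread(() -> front.forEachRemaining(DrainAndSplit::process));
        Thread t2 = new Thread(() -> back.forEachRemaining(DrainAndSplit::process));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }

    private static void process(int[] element) {
        // per-element work goes here
    }
}

Note that the drained ArrayList is a snapshot: elements added to the queue after drainTo returns are not part of either half.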
On a side note, why would you want to do the processing in different threads yourself? This seems very counter-intuitive: the Spliterator is designed to work with parallel streams. Calling trySplit once is probably not even enough; you would have to call it until it returns null. Either way, doing these things on your own sounds like a very bad idea to me.
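For comparison, the idiomatic route hinted at here is to let a parallel stream drive the splitting (process again stands in for the real per-element work):

// The stream machinery calls trySplit internally and runs the pieces
// on the common ForkJoinPool; no manual thread management is needed.
q.parallelStream().forEach(arr -> process(arr));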

Related

Which one should I use in Clojure: a go block or a thread?

I want to understand the intrinsic difference between a thread and a long-running go block in Clojure, and in particular, to figure out which one I should use in my context.
I understand that a go block is scheduled to run on a fixed thread pool (the default size is 8), whereas thread creates a new thread each time.
In my case, there is an input stream that takes values from somewhere, a calculation is performed on each input, and the result is put onto a result channel. In short, we have an input channel and an output channel, and the calculation is done in a loop. To achieve concurrency, I have two choices: use a go block or use thread.
I wonder what the intrinsic difference between the two is. (We may assume there is no I/O during the calculations.) The sample code looks like the following:
(go-loop []
  (when-let [input (<! input-stream)]
    ... ; calculations here
    (>! result-chan result)
    (recur)))

(thread
  (loop []
    (when-let [input (<!! input-stream)]
      ... ; calculations here
      (put! result-chan result)
      (recur))))
I realize that the number of threads that can run simultaneously equals the number of CPU cores. In that case, do go blocks and threads show no difference once I create more than 8 threads or go blocks?
I could measure the performance difference on my own laptop, but the production environment is quite different from the simulated one, so I could draw no conclusions.
By the way, the calculation is not heavy. If the inputs are not too large, 8,000 iterations of the loop can run in one second.
Another consideration is whether go blocks vs threads will have an impact on GC performance.
There are a few things to note here.
Firstly, the pool that clojure.core.async/thread creates threads on is what is known as a cached thread pool: although it re-uses recently idle threads, it is essentially unbounded, which of course means it could hog a lot of system resources if left unchecked.
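For reference, this is the same kind of executor Java exposes as Executors.newCachedThreadPool(). A small standalone demo of that unbounded growth (the numbers here are arbitrary):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CachedPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(5_000); // hold the thread so it can't be reused
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        Thread.sleep(1_000);
        // Roughly 100 pool threads plus main: one new thread per blocked task.
        System.out.println("active threads: " + Thread.activeCount());
        pool.shutdown();
    }
}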
But given that what you're doing inside each asynchronous process is very lightweight, threads seem a little overkill to me. It's also important to take into account how many items you expect to hit the input stream: if that number is large, you could overwhelm core.async's fixed-size pool for go blocks, to the point where work queues up waiting for a thread to become available.
You also didn't mention previously where you're getting the input values from: are the inputs some fixed data set that is known when the program starts, or are they continuously fed into the input stream from some source over time?
If it's the former, then I would suggest leaning towards transducers, and I would argue that a CSP model isn't a good fit for your problem, since you aren't modelling communication between separate components of your program; you're just processing data in parallel.
If it's the latter then I presume you have some other process that's listening to the result channel and doing something important with those results, in which case I would say your usage of go-blocks is perfectly acceptable.

Is there a reliable way to ensure a Go channel does not block on read?

This is a follow-up to a previous thread with a similar name.
It has an accepted answer, but that answer does not really answer the question. From that thread, here is the use-case:
if len(myChannel) > 0 {
    // Possible issue here: length could have changed to 0 making this blocking
    elm := <-myChannel
    return elm
}
The OP calls it a "Possible issue", but it's a Definite Issue: a race condition in which another consumer may have pulled a value from the channel between the evaluation of the if condition and execution of the two statements.
Now, we are told the Go way is to favor channels over mutexes, but here it seems we cannot achieve even a basic non-blocking read (by polling the length and then reading atomically) without pairing a mutex with a channel, and using that new concurrency data type instead of a plain channel.
Can that be right? Is there really no way to reliably ensure a recv does not block by checking ahead for space? (Compare with BlockingQueue.poll() in Java, or similar non-blocking operations in other queue-based messaging IPC facilities...)
This is exactly what default cases in select are for:
var elm myType
select {
case elm = <-myChannel:
default:
}
return elm
This assigns elm if it can, and otherwise returns a zero value. See "A leaky buffer" from Effective Go for a somewhat more extensive example.
Rob Napier's answer is correct.
However, you are possibly trying too hard to achieve non-blocking behaviour, on the assumption that blocking is an anti-pattern.
With Go, you don't have to worry about blocking. Go ahead, block without guilt. It can make code much easier to write, especially when dealing with i/o.
CSP allows you to design data-driven concurrent programs that can scale very well (because of not using mutexes too much). Small groups of goroutines communicating via channels can behave like a component of a larger system; these components (also communicating via channels) can be grouped into larger components; this pattern repeats at increasing scales.
Conventionally, people start with sequential code and then try to add concurrency by adding goroutines, channels, mutexes etc. As an exercise, try something different: try designing a system to be maximally concurrent - use goroutines and channels as deeply as you possibly can. You might be unimpressed with the performance you achieve ... so then perhaps try to consider how to improve it by combining (rather than dividing) blocks, reducing the total number of goroutines and so achieving a more optimal concurrency.

What is better generate random IDs at runtime or keep them handy before?

I am writing an app and need to do something functionally similar to what URL-shortening websites do. I will be generating 6-character (case-insensitive alphanumeric) random strings to identify the longer versions of the links. This gives 2176782336 possibilities ((10+26)^6). There are two approaches I can think of for assigning these strings.
Approach 1: the system generates a random string at runtime and checks it for uniqueness; if it is not unique, it tries again until it reaches a unique string. This might create latency issues if a user is "unlucky" and hits several collisions in a row.
Approach 2: I generate a pool of possible values in advance and assign them as soon as they are needed. This ensures the user is allocated a unique string almost instantly, but it also means I would have to do plenty of computation in cron jobs beforehand, and that work grows over time.
While I already have the code to generate such values, advice on the approach would be insightful, as I am aiming for a highly responsive app experience. I could not find any comparative study on this.
Cheers!
What I do in similar situations is to keep N values queued up so that I can instantly assign them, and then when the queue's size falls below a certain threshold (say, .2 * N) I have a background task add another N items to the queue. It probably makes sense to start this background task as soon as your program starts (as opposed to generating the first N values offline and then loading them at startup), operating on the assumption that there will be some delay between startup and requests for values from the queue.
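A minimal Java sketch of that scheme (the class, the pool size N, and the 0.2 * N threshold are illustrative, and the uniqueness check is left as a stub):

import java.security.SecureRandom;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

class IdPool {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";
    private static final int N = 10_000;        // target pool size
    private static final int THRESHOLD = N / 5; // refill when below 0.2 * N

    private final BlockingQueue<String> pool = new LinkedBlockingQueue<>();
    private final ExecutorService refiller = Executors.newSingleThreadExecutor();
    private final AtomicBoolean refillRunning = new AtomicBoolean(false);
    private final SecureRandom random = new SecureRandom();

    IdPool() {
        startRefill(); // begin filling at startup, before the first request
    }

    String take() throws InterruptedException {
        if (pool.size() < THRESHOLD) {
            startRefill();
        }
        return pool.take(); // instant unless the pool is fully drained
    }

    private void startRefill() {
        if (refillRunning.compareAndSet(false, true)) {
            refiller.submit(() -> {
                try {
                    while (pool.size() < N) {
                        String candidate = randomId();
                        // A real system must check the candidate against
                        // already-issued IDs (e.g. a unique DB index) here.
                        pool.offer(candidate);
                    }
                } finally {
                    refillRunning.set(false);
                }
            });
        }
    }

    private String randomId() {
        StringBuilder sb = new StringBuilder(6);
        for (int i = 0; i < 6; i++) {
            sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }
}

The AtomicBoolean guard keeps concurrent take() calls from scheduling more than one refill task at a time.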

In Node.JS, any way to make Array.prototype.sort() yield to other processes?

FYI: I am doing this already with Web Workers, and it works fine, but I was just exploring what can and can't be done with process.nextTick.
So I have an array of a million elements that I'm sorting in Node.JS. I want Node to be responsive to other requests while it's doing this.
Is there any way to make Array.prototype.sort() not block other processes? Since this is a core function, I can't insert any process.nextTick().
I could implement quicksort manually, but I can't see how to do that efficiently in continuation-passing style, which seems to be required for process.nextTick(). I can modify a for loop to do this, but sort() seems impossible.
While it's not possible to make Array.prototype.sort itself behave asynchronously, asynchronous sorting is definitely possible, as shown by this sorting demo, which demonstrates the speed advantage of setImmediate (shim) over setTimeout.
The source code does not seem to come with any license, unfortunately. The Github repo for the demo at https://github.com/jphpsf/setImmediate-shim-demo names Jason Weber as the author. You may want to ask him if you want to use parts of the code.
I think that if you use setImmediate (available since Node 0.10), the individual sort operations will be effectively interleaved with I/O callbacks. For such a big amount of work I would not recommend process.nextTick (if it works at all, because there's a 1000 maxTickDepth limit). See setImmediate vs. nextTick for some background.
Using setImmediate instead of plain "synchronous" processing will certainly be slower overall, so you could choose to handle a batch of individual sort operations per "tick" to speed things up, at the expense of Node not being responsive during that time. I think the right balance between speed and responsiveness wrt I/O can only be found with experimentation.
A much simpler alternative would be to do it more like web workers: spawn a child process and do the sorting there. The biggest problem you face then is transferring the sorted data back to your main process (to generate some kind of output, presumably). AFAIK there's nothing like transferable objects for Node.js. After having buffered the sorted array, you could stream the results to the child process's stdout and parse the data in the main process, or, perhaps simpler, use child process messaging.
You may not have a spare CPU core lying around, so the child process would steal CPU time from other processes. To keep the sort from hurting your other processes, you may need to assign it a low priority. It's seemingly not possible to do this with Node itself, but you could try using nice, as discussed here: https://groups.google.com/forum/#!topic/nodejs/9O-2gLJzmcQ . I have no experience in this matter.
Well, I initially thought you could use async.sortBy, but upon closer examination it seems that won't behave as you need. See Array.sort and Node.js for a similar question, although at the moment there's no accepted answer.
I know this is a rather old question, but I came across a similar situation, with still no simple solution that I could find.
I modified an existing quicksort and published a package that periodically yields execution to the event loop:
https://www.npmjs.com/package/qsort-async
If you are familiar with a traditional quicksort, my only modification was to the initial function which does the partitioning. The function still modifies the array in place, but now returns a promise, and it yields execution so other things in the event loop can run whenever a single call would otherwise process too many elements. (I believe the default size I specified was 10000.)
Note: it's important to use setImmediate here, not process.nextTick or setTimeout. nextTick will actually place your callback before process I/O, so you will still have trouble responding to other requests tied to I/O; setTimeout is just too slow (I believe one of the other answers linked a demo showing this).
Note 2: if something like a mergesort is more your style, you could apply the same logic in the 'merge' function of the sort.
const immediate = require('util').promisify(setImmediate);

async function quickSort(items, compare, left, right, size) {
  let index;
  if (items.length > 1) {
    // partition is the usual in-place partition step; see the full source linked below.
    index = partition(items, compare, left, right);
    // Yield to the event loop before recursing into large partitions.
    if (Math.abs(left - right) > size) {
      await immediate();
    }
    if (left < index - 1) {
      await quickSort(items, compare, left, index - 1, size);
    }
    if (index < right) {
      await quickSort(items, compare, index, right, size);
    }
  }
  return items;
}
The full code is here: https://github.com/Mudrekh/qsort-async/blob/master/lib/quicksort.js

Solve a maze using multicores?

This question is messy; I don't need a working solution, just some pseudocode.
How would I solve this maze? This is a homework question. I have to get from the green point to the red one. At every fork I need to "spawn a thread" and go in that direction. I need to figure out how to get to red, but I am unsure how to avoid paths I have already taken (finishing via any path is OK; I am just not allowed to go in circles).
Here's an example of my problem. I start by moving down and I see a fork, so one thread goes right and one goes down (or this thread can take it, it doesn't matter). Now let's ignore the rest of the forks and say the one going right hits the wall, goes down, hits the wall and goes left, then goes up. The other thread goes down, hits the wall, then goes all the way right. The bottom path has now been taken twice, starting from different sides.
How do I mark a path as taken? Do I need a lock? Is that the only way, or is there a lockless solution?
Implementation-wise, I was thinking I could represent the maze as something like the code below. I don't like this solution because there is a LOT of locking (assuming I lock before each read and write of the haveTraverse member). I don't need to use the MazeSegment class below; I just wrote it up as an example. I am allowed to construct the maze however I want. I was thinking maybe the solution requires no connecting paths, and that's what is hassling me. Maybe I could split the map up instead of using the format below (which is easy to read and understand), but if I knew how to split it up, I would know how to walk it, hence the problem.
How do I walk this maze efficiently?
The only hint I received was: don't try to conserve memory by reusing it; make copies. However, that hint related to a different problem about ordering a list, and I don't think it was meant for this one.
class MazeSegment
{
    enum Direction { up, down, left, right }
    List<Pair<Direction, MazeSegment*>> ConnectingPaths;
    int line_length;
    bool haveTraverse;
}

MazeSegment root;

void WalkPath(MazeSegment segment)
{
    if (segment.haveTraverse) return;
    segment.haveTraverse = true;

    foreach (var path in segment.ConnectingPaths)
    {
        if (path.haveTraverse == false)
            spawn_thread(WalkPath, path);
    }
}

WalkPath(root);
Parallel Breadth-First Search
Search for parallel or multithreaded breadth-first traversal, which is basically what you're doing. Each time you come to a fork in your maze (where you can take one of several paths), you create a new worker thread to continue the search down each of the possible paths and report back which one gets you to the end.
It's similar to "simple" breadth first searches, except the subtrees can be searched in parallel.
If this were a pure tree data structure, you wouldn't need to lock, but since it's a graph traversal you will need to keep track of your visited nodes. So, the code which sets the "visited" flag of each node will need to be protected with a lock.
Here's a good example program. It uses audio feedback, so be sure your speakers are on.
http://www.break.com/games/maze15.html
Offhand, given your structure above, I could see solving this by adding an 'int Seen' to each MazeSegment instead of 'bool haveTraverse'. You could then do an interlocked increment on the 'Seen' variable as you loop over ConnectingPaths, and only spawn a thread to take a path if the increment returns 1 (assuming Seen is initialized to 0).
So the code becomes something like
void WalkPath(MazeSegment segment)
{
    foreach (var v in segment.ConnectingPaths)
    {
        if (Interlocked.Increment(ref v.Path.Seen) == 1)
            spawn_thread(v.Path);
    }
}
Other threads that attempt to take the same path will get something > 1. Because Interlocked.Increment guarantees an atomic increment, two threads can never both see a result of 1, so exactly one thread takes any given path.
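The same claim-before-walk idea as a minimal Java sketch (the MazeSegment shape is adapted from the question, with an AtomicInteger playing the role of Seen):

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class MazeSegment {
    final List<MazeSegment> connectingPaths;
    final AtomicInteger seen = new AtomicInteger(0);

    MazeSegment(List<MazeSegment> connectingPaths) {
        this.connectingPaths = connectingPaths;
    }
}

class Walker {
    static void walkPath(MazeSegment segment) {
        for (MazeSegment next : segment.connectingPaths) {
            // incrementAndGet is atomic: exactly one thread can ever observe 1,
            // so each segment is claimed and walked by exactly one thread.
            if (next.seen.incrementAndGet() == 1) {
                new Thread(() -> walkPath(next)).start();
            }
        }
    }
}

Spawning a raw Thread per fork is only for illustration; a real program would submit tasks to a fixed-size ExecutorService instead.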
You can do this using the usual "read, calculate new value, compare-and-swap, repeat until CAS succeeds" method commonly found in lock-free programming.
All grid squares in your maze should start out holding a value representing the direction to move to reach the exit; initially they are all "unknown".
Walk the maze starting at the exit. On each square reached, use compare-and-swap to replace "unknown" with the direction of the square this thread previously processed. If the CAS fails, you've run into a square that's already claimed (a loop), so prune that branch. If it succeeds, continue forward. Once a direction has been assigned to the entrance, you can simply follow the pointers to the exit.
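A minimal sketch of that claim step in Java (the flat width*height grid and the Direction enum are assumptions for illustration):

import java.util.concurrent.atomic.AtomicReferenceArray;

class CasGrid {
    enum Direction { UNKNOWN, UP, DOWN, LEFT, RIGHT }

    final int width;
    final AtomicReferenceArray<Direction> towardExit;

    CasGrid(int width, int height) {
        this.width = width;
        this.towardExit = new AtomicReferenceArray<>(width * height);
        for (int i = 0; i < width * height; i++) {
            towardExit.set(i, Direction.UNKNOWN); // nothing known yet
        }
    }

    // Try to claim square (x, y), recording which way leads back toward
    // the exit. Returns true if this thread won the claim; false means
    // another path already owns the square, so the caller prunes.
    boolean claim(int x, int y, Direction dir) {
        return towardExit.compareAndSet(y * width + x, Direction.UNKNOWN, dir);
    }
}

A worker walking backward from the exit calls claim on each square it enters; since at most one claim per square can ever succeed, no locks are needed.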
Create a class (Worker) whose instances hold the path taken so far and can only advance() through a straight corridor in a given direction. At every intersection, drop the worker object that holds the path up to the intersection, and create two (or three) new workers, each with a copy of that path and a different turn taken.
Put these worker objects into a queue. Notice that each of them is independent of the others, so you may take several of them from the queue and advance() them in parallel. You can simply create that many threads, or use a pool of threads sized according to the number of cores you have. Once any of the workers advances to the destination square, the path it holds is a solution: output it.
Consider traversing the maze from exit to entry. In a real maze, blind alleys are intended to slow down motion from entry to exit, but rarely the other way around.
Consider adding a loop-detection mechanism, e.g. by comparing each intersection you encounter against the intersections already in your path.
Consider using a hand-made linked list to represent the path. Note how prepending a new head to a linked list does not change the rest of it, so you can share the tail with other instances that never modify it. This reduces the memory footprint and the time needed to spawn a worker (only noticeable at rather large mazes); see the sketch below.
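A sketch of that shared-tail list in Java (PathNode is a hypothetical name, and grid coordinates stand in for whatever a path step really holds):

// Immutable singly linked list node: prepending allocates one new head
// that shares the entire tail, so forked workers copy nothing.
final class PathNode {
    final int x, y;        // the square this step visits
    final PathNode prev;   // shared tail: the path up to this step

    PathNode(int x, int y, PathNode prev) {
        this.x = x;
        this.y = y;
        this.prev = prev;
    }

    PathNode extend(int nx, int ny) {
        return new PathNode(nx, ny, this); // O(1); tail is shared, not copied
    }
}

At an intersection, each new worker calls extend with its own first step; the common prefix of the path is stored only once.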
