Sort with threads - multithreading

Sort with threads - multithreading

I have an assignment and i need working code. Before i start i want to understand the problem but i cant figure out how to write it.
I have an array of data, take this for example
var arr = new byte[] {5,3,1,7,8,5,3,2,6,7,9,3,2,4,2,1}
I need to split this array in half, throw it into a thread pool and do that recursively until i have <=2 elements. If i have 2 elements i need to check which is less and put it on the left side then return the array.
What i don't understand is how do i merge the array? am i suppose to split the array, throw a thread into the pool and block until its ready? How do i get the results of the thread? I'm going to assume its not possible to merge the arrays without blocking?
Heres what i have so far.
static void Main(string[] args)
{
var arr = new byte[] { 5, 3, 1, 7, 8, 5, 3, 2, 6, 7, 9, 3, 2, 4, 2, 1 };
var newarr = Sort(arr);
Console.Write(BitConverter.ToString(newarr));
}
static byte[] Sort(byte[] arr)
{
if (arr.Length <= 2)
return arr;
if (arr.Length == 2)
{
if (arr[0] > arr[1])
{
var t = arr[0];
arr[0] = arr[1];
arr[1] = t;
}
return arr;
}
var arr1 = arr.Take(arr.Length / 2).ToArray();
var arr2 = arr.Skip(arr1.Count()).ToArray();
//??
return arr;
}
Note: The prof did say we can ask others for help. I think i can solve this without asking but i want to get the best answer. Threading is my weakness (i dont everything else, db, binary io, web interface, just never complex threads)

This appears to be a parallel version of merge sort. You should essential make it work like the recursive sequential version, but apparently run each recursive sort as a separate task.
In your task API, there should be some way to wait for completion, and perhaps to also pass results. With that, you can copy the traditional mergesort fairly well: for each sub-sort, put a task into the pool, and wait for completion of the two subtasks. Then perform the merge, and pass back your own result to your calling tasking.
If you have a regular thread API only (i.e. no real task library), then I suggest you provide the output in a third array: each thread will have two input arrays, and one output array. If you are allowed to create fresh threads for each task, you can wait for task completion by joining the two subthreads.

To add onto Martin's answer, I would not create smaller copies of the array. Instead, I would have each thread work on a subset of the original array.

More than happy to oblige: Any multi-core sorting implementation in .NET?

Related

What are the ways to avoid object mutation in Dart-lang?

If we take the following snippet as an example:
main() {
List<int> array = [1, 2, 3, 4];
List<int> newArray = change(array);
print(array); // [99, 2, 3, 4]
print(newArray); // [99, 2, 3, 4]
print(newArray == array); // true
}
change(List<int> array) {
var newArray = array;
newArray[0] = 99;
return newArray;
}
The original array gets mutated. I was expecting that by passing the array (object) to the change function and assigning a new variable to it that I could avoid mutation. I am aware that the built_collection library seems like the main go-to for immutable collections. Is there any native way the core library that would allow for a deep freeze or prevent side effects (operations inside another function)?

You can wrap an array in an UnmodifiableListView from dart:collection and pass this around instead of the array itself. I think this is the most basic buit-in way.

Objects are passed by reference. This is by design: objects are often large data structures and internally making a copy every time an object is passed to a function can be very inefficient. (This is the same approach as that used by other major object oriented languages.)
As a result, array and newArray are two names for the same underlying List in your code.
If you want to explicitly create a new list, just change
var newArray = array;
to:
var newArray = new List.from(array);

Lazy evaluation of chained functional methods in Groovy

What I've seen in Java
Java 8 allows lazy evaluation of chained functions in order to avoid performance penalties.
For instance, I can have a list of values and process it like this:
someList.stream()
.filter( v -> v > 0)
.map( v -> v * 4)
.filter( v -> v < 100)
.findFirst();
I pass a number of closures to the methods called on a stream to process the values in a collection and then only grab the first one.
This looks as if the code had to iterate over the entire collection, filter it, then iterate over the entire result and apply some logic, then filter the whole result again and finally grab just a single element.
In reality, the compiler handles this in a smarter way and optimizes the number of iterations required.
This is possible because no actual processing is done until findFirst is called. This way the compiler knows what I want to achieve and it can figure out how to do it in an efficient manner.
Take a look at this video of a presentation by Venkat Subramaniam for a longer explanation.
What I'd like to do in Groovy
While answering a question about Groovy here on StackOverflow I figured out a way to perform the task the OP was trying to achieve in a more readable manner. I refrained from suggesting it because it meant a performance decrease.
Here's the example:
collectionOfSomeStrings.inject([]) { list, conf -> if (conf.contains('homepage')) { list } else { list << conf.trim() } }
Semantically, this could be rewritten as
collectionOfSomeStrings.grep{ !it.contains('homepage')}.collect{ it.trim() }
I find it easier to understand but the readability comes at a price. This code requires a pass of the original collection and another iteration over the result of grep. This is less than ideal.
It doesn't look like the GDK's grep, collect and findAll methods are lazily evaluated like the methods in Java 8's streams API. Is there any way to have them behave like this? Is there any alternative library in Groovy that I could use?
I imagine it might be possible to use Java 8 somehow in Groovy and have this functionality. I'd welcome an explanation on the details but ideally, I'd like to be able to do that with older versions of Java.
I found a way to combine closures but it's not really what I want to do. I'd like to chain not only closures themselves but also the functions I pass them to.
Googling for Groovy and Streams mostly yields I/O related results. I haven't found anything of interest by searching for lazy evaluation, functional and Groovy as well.

Adding the suggestion as an answer taking cfrick's comment as an example:
#Grab( 'com.bloidonia:groovy-stream:0.8.1' )
import groovy.stream.Stream
List integers = [ -1, 1, 2, 3, 4 ]
//.first() or .last() whatever is needed
Stream.from integers filter{ it > 0 } map{ it * 4 } filter{ it < 15 }.collect()
Tim, I still know what you did few summers ago. ;-)

Groovy 2.3 supports jdk8 groovy.codehaus.org/Groovy+2.3+release+notes. your example works fine using groovy closures:
[-1,1,2,3,4].stream().filter{it>0}.map{it*4}.filter{it < 100}.findFirst().get()
If you can't use jdk8, you can follow the suggestion from the other answer or achieve "the same" using RxJava/RxGroovy:
#Grab('com.netflix.rxjava:rxjava-groovy:0.20.7')
import rx.Observable
Observable.from( [-1, 1, 2, 3, 4, 666] )
.filter { println "f1 $it"; it > 0 }
.map { println "m1 $it"; it * 4 }
.filter { println "f2 $it"; it < 100 }
.subscribe { println "result $it" }

Multithread+Recursion strategies

I am just starting to learn the ins-and-outs of multithread programming and have a few basic questions that, once answered, should keep me occupied for quite sometime. I understand that multithreading loses its effectiveness once you have created more threads than there are cores (due to context switching and cache flushing). With that understood, I can think of two ways to employ multithreading of a recursive function...but am not quite sure what is the common way to approach the problem. One seems much more complicated, perhaps with a higher payoff...but thats what I hope you will be able to tell me.
Below is pseudo-code for two different methods of multithreading a recursive function. I have used the terminology of merge sort for simplicity, but it's not that important. It is easy to see how to generalize the methods to other problems. Also, I will personally be employing these methods using the pthreads library in C, so the thread syntax mildly reflects this.
Method 1:
main ()
{
A = array of length N
NUM_CORES = get number of functional cores
chunk[NUM_CORES] = array of indices partitioning A into (N / NUM_CORES) sized chunks
thread_id[NUM_CORES] = array of thread id’s
thread[NUM_CORES] = array of thread type
//start NUM_CORES threads on working on each chunk of A
for i = 0 to (NUM_CORES - 1) {
thread_id[i] = thread_start(thread[i], MergeSort, chunk[i])
}
//wait for all threads to finish
//Merge chunks appropriately
exit
}
MergeSort ( chunk )
{
MergeSort ( lowerSubChunk )
MergeSort ( higherSubChunk )
Merge(lowerSubChunk, higherSubChunk)
}
//Merge(,) not shown
Method 2:
main ()
{
A = array of length N
NUM_CORES = get number of functional cores
chunk = indices 0 and N
thread_id[NUM_CORES] = array of thread id’s
thread[NUM_CORES] = array of thread type
//lock variable aka mutex
THREADS_IN_USE = 1
MergeSort( chunk )
exit
}
MergeSort ( chunk )
{
lock THREADS_IN_USE
if ( THREADS_IN_USE < NUM_CORES ) {
FREE_CORE = find index of unused core
thread_id[FREE_CORE] = thread_start(thread[FREE_CORE], MergeSort, lowerSubChunk)
THREADS_IN_USE++
unlock THREADS_IN_USE
MergeSort( higherSubChunk )
//wait for thread_id[FREE_CORE] and current thread to finish
lock THREADS_IN_USE
THREADS_IN_USE--
unlock THREADS_IN_USE
Merge(lowerSubChunk, higherSubChunk)
}
else {
unlock THREADS_IN_USE
MergeSort( lowerSubChunk )
MergeSort( higherSubChunk )
Merge(lowerSubChunk, higherSubChunk)
}
}
//Merge(,) not shown
Visually, one can think of the differences between these two methods as follows:
Method 1: creates NUM_CORES separate recursion trees, each one having a single core traversing it.
Method 2: creates a single recursion tree but has all cores traversing it. In particular, whenever there is a free core, it is set to work on the "left child subtree" of the first node where MergeSort is called after the core is freed.
The problem with Method 1 is that if it is the case that the running time of the recursive function varies with the distribution of values within each initial subchunk (i.e. the chunk[i]), one thread could finish much faster leaving a core sitting idle while the others finish. With Merge Sort this is not likely to be the case since the work of MergeSort happens in Merge whose runtime isn't affected much by the distribution of values in the (sorted) subchunks. However, with a more involved recursive function, the running time on one subchunk could be much longer!
With Method 2 it is possible to have the same problem. Again, with merge sort its not clear since the running time for each subchunk is likely to be similar, but the line //wait for thread_id[FREE_CORE] and current thread to finish would also require one core to wait for the other. However, with Method 2, all calls to Merge run ASAP as opposed to Method 1 where one must wait for NUM_CORES calls to MergeSort to finish and then do NUM_CORES - 1 merges afterward (although you can multithread this as well...to an extent)
(though the syntax might not be completely correct)
Are both of these methods used in practice? Are there situations where one is more beneficial over the other? Is this the correct way to implement Method 2? (in this case, THREADS_IN_USE is a semaphore?)
Thanks so much for your help!

What are the benefits of `while(condition) { //work }` and `do { //work } while(condition)`?

I found myself confronted with an interview question where the goal was to write a sorting algorithm that sorts an array of unsorted int values:
int[] unsortedArray = { 9, 6, 3, 1, 5, 8, 4, 2, 7, 0 };
Now I googled and found out that there are so many sorting algorithms out there!
Finally I could motivate myself to dig into Bubble Sort because it seemed pretty simple to start with.
I read the sample code and came to a solution looking like this:
static int[] BubbleSort(ref int[] array)
{
long lastItemLocation = array.Length - 1;
int temp;
bool swapped;
do
{
swapped = false;
for (int itemLocationCounter = 0; itemLocationCounter < lastItemLocation; itemLocationCounter++)
{
if (array[itemLocationCounter] > array[itemLocationCounter + 1])
{
temp = array[itemLocationCounter];
array[itemLocationCounter] = array[itemLocationCounter + 1];
array[itemLocationCounter + 1] = temp;
swapped = true;
}
}
} while (swapped);
return array;
}
I clearly see that this is a situation where the do { //work } while(cond) statement is a great help to be and prevents the use of another helper variable.
But is this the only case that this is more useful or do you know any other application where this condition has been used?

In general:
use do...while when you want the body to be executed at least once.
use while... when you may not want the body to be executed at all.
EDIT: I'd say the first option comes up about 10% of the time and the second about 90%. You can always re-factor to use either, in either circumstance. Use the one that's closest to what you want to say.

do...while guarantees that the body of code inside the loop executes at least once. This can be handy under certain conditions; when coding a REPL loop, for example.

Anytime you need to loop through some code until a condition is met is a good example of when to use do...while or while...
A good example of when to use do...while or while... is if you have a game or simulation where the game engine is continuously running the various components until some condition occurs like you win or lose.
Of course this is only one example.

The above posts are correct regarding the two conditional looping forms. Some languages have a repeat until form instead of do while. Also there's a minimalist view where only the necessary control structures should exist in a language. The while do is necessary but the do while isn't. And as for bubble sort you'll want to avoid going there as it's the slowest of the commonly known sorting algorithms. Look at selection sort or insertion sort instead. Quicksort and merge sort are fast but are hard to write without using recursion and perform badly if you happen to chose a poor pivot value.

is it possible to break out of closure in groovy

is there a way to 'break' out of a groovy closure.
maybe something like this:
[1, 2, 3].each {
println(it)
if (it == 2)
break
}

I often forget that Groovy implements an "any" method.
[1, 2, 3].any
{
println it
return (it == 2)
}

12/05/2013 Heavily Edited.
Answering the question that was asked.
Is it possible to break out of a Closure?
You would "break" out of a closure by issuing the return keyword. However that isn't helpful in the example that is given. The reason for this is that the closure (think of it as a method) is called by the each method for every item in the collection.
If you run this example you will see it will print 1 then 3.
[1, 2, 3].each {
if (it == 2) return
println(it)
}
Why break in the context of each doesn't make sense.
To understand why you cannot break out of the each method like you could break out of a for loop you need to understand a bit of what is actually happening. Here is a gross simplification what the each method on a collection does.
myEach([0,1,3])
void myEach(List things) {
for (i in things) {
myEachMethod(i)
}
}
void myEachMethod(Object it) { // this is your Closure
if(it == 2) return
println it
}
As you can see the closure is basically a method that can be passed around. Just as in java you cannot break from within method call or closure.
What to do instead of breaking from each.
In Groovy you are supposed to express your code using high level abstractions as such primitive looping is not idiomatic. For the example that you gave I would consider making use of findAll. For example:
[1,2,3].findAll { it < 2 }.each { println it }
I hope this helps you understand what is going on.
Answering the implied question.
Can you break out of the Collection.each iterations against your supplied closure?
You cannot break out of the each method without throwing and catching an exception as John Wagenleitner has said. Although I would argue that throwing and catching an exception in the name of flow control is a code smell and a fellow programmer might slap your hands.

You can throw an exception:
try {
[1, 2, 3].each {
println(it)
if (it == 2)
throw new Exception("return from closure")
}
} catch (Exception e) { }
Use could also use "findAll" or "grep" to filter out your list and then use "each".
[1, 2, 3].findAll{ it < 3 }.each{ println it }

Take a look at Best pattern for simulating continue in groovy closure for an extensive discussion.

Try to use any instead of each
def list = [1, 2, 3, 4, 5, -1, -2]
list.any { element ->
if (element > 3)
return true // break
println element
}
The result : 1, 2, 3

Just using special Closure
// declare and implement:
def eachWithBreak = { list, Closure c ->
boolean bBreak = false
list.each() { it ->
if (bBreak) return
bBreak = c(it)
}
}
def list = [1,2,3,4,5,6]
eachWithBreak list, { it ->
if (it > 3) return true // break 'eachWithBreak'
println it
return false // next it
}

There is an other solution. Although, that groovy stuff like each/find/any is quite cool: if it doesn't fit, don't use it. You can still use the plain old
for (def element : list)
Especially, if you want to leave the method, too. Now you are free to use continue/break/return as you like. The resulting code might not be cool, but it is easy and understandable.

This is in support of John Wagenleiter's answer. Tigerizzy's answer is plain wrong. It can easily be disproved practically by executing his first code sample, or theoretically by reading Groovy documentation. A return returns a value (or null without an argument) from the current iteration, but does not stop the iteration. In a closure it behaves rather like continue.
You won't be able to use inject without understanding this.
There is no way to 'break the loop' except by throwing an exception. Using exceptions for this purpose is considered smelly. So, just as Wagenleiter suggests, the best practice is to filter out the elements you want to iterate over before launching each or one of its cousins.

With rx-java you can transform an iterable in to an observable.
Then you can replace continue with a filter and break with takeWhile
Here is an example:
import rx.Observable
Observable.from(1..100000000000000000)
.filter { it % 2 != 1}
.takeWhile { it<10 }
.forEach {println it}

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Sort with threads - multithreading

To add onto Martin's answer, I would not create smaller copies of the array. Instead, I would have each thread work on a subset of the original array.

More than happy to oblige: Any multi-core sorting implementation in .NET?

Related

What are the ways to avoid object mutation in Dart-lang?

Lazy evaluation of chained functional methods in Groovy

Multithread+Recursion strategies

What are the benefits of `while(condition) { //work }` and `do { //work } while(condition)`?

is it possible to break out of closure in groovy

Categories

Resources