Scala : Merge Sort using Futures timing out - multithreading

The following sequential Merge Sort returns the result very quickly:
def mergeSort(xs: List[Int]): List[Int] = {
def merge(xs: List[Int], ys: List[Int]): List[Int] = (xs, ys) match {
case (Nil, _) => ys
case (_, Nil) => xs
case (x :: xs1, y :: ys1) => if (x <= y) x :: merge(xs1, ys) else y :: merge(xs, ys1)
}
val mid = xs.length / 2
if (mid <= 0) xs
else {
val (xs1, ys1) = xs.splitAt(mid)
merge(mergeSort(xs1), mergeSort(ys1))
}
}
val newList = (1 to 10000).toList.reverse
mergeSort(newList)
However, when I try to parallelize it using Futures, it times out:
// imports assumed for the snippets in this question:
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.language.postfixOps

def mergeSort(xs: List[Int]): List[Int] = {
def merge(xs: List[Int], ys: List[Int]): List[Int] = (xs, ys) match {
case (Nil, _) => ys
case (_, Nil) => xs
case (x :: xs1, y :: ys1) => if (x <= y) x :: merge(xs1, ys) else y :: merge(xs, ys1)
}
val mid = xs.length / 2
if (mid <= 0) xs
else {
val (xs1, ys1) = xs.splitAt(mid)
val sortedList1 = Future{mergeSort(xs1)}
val sortedList2 = Future{mergeSort(ys1)}
merge(Await.result(sortedList1,5 seconds), Await.result(sortedList2,5 seconds))
}
}
val newList = (1 to 10000).toList.reverse
mergeSort(newList)
I get a TimeoutException. I understand that this is probably because this code spawns log2(10000) threads, which adds a lot of delay, as the execution context's thread pool may not have that many threads.
1.) How do I exploit the inherent parallelism in merge sort and parallelize this code ?
2.) For what use cases are Futures useful and when should they be avoided ?
Edit 1: Refactored code based on the feedback I've gotten so far:
import scala.annotation.tailrec

def mergeSort(xs: List[Int]): Future[List[Int]] = {
@tailrec
def merge(xs: List[Int], ys: List[Int], acc: List[Int]): List[Int] = (xs, ys) match {
case (Nil, _) => acc.reverse ::: ys
case (_, Nil) => acc.reverse ::: xs
case (x :: xs1, y :: ys1) => if (x <= y) merge(xs1, ys, x :: acc) else merge(xs, ys1, y :: acc)
}
val mid = xs.length / 2
if (mid <= 0) Future {
xs
}
else {
val (xs1, ys1) = xs.splitAt(mid)
val sortedList1 = mergeSort(xs1)
val sortedList2 = mergeSort(ys1)
for (s1 <- sortedList1; s2 <- sortedList2) yield merge(s1, s2, List())
}
}

Usually when using Futures, you should a) await as little as possible and prefer to work within Futures, and b) pay attention to which execution context you are using.
As an example of a), here's how you could change this:
def mergeSort(xs: List[Int]): Future[List[Int]] = {
def merge(xs: List[Int], ys: List[Int]): List[Int] = (xs, ys) match {
case (Nil, _) => ys
case (_, Nil) => xs
case (x :: xs1, y :: ys1) => if (x <= y) x :: merge(xs1, ys) else y :: merge(xs, ys1)
}
val mid = xs.length / 2
if (mid <= 0) Future(xs)
else {
val (xs1, ys1) = xs.splitAt(mid)
val sortedList1 = mergeSort(xs1)
val sortedList2 = mergeSort(ys1)
for (s1 <- sortedList1; s2 <- sortedList2) yield merge(s1, s2)
}
}
val newList = (1 to 10000).toList.reverse
Await.result(mergeSort(newList), 5 seconds)
However there's still a ton of overhead here. Typically you would only parallelize significantly-sized chunks of work to avoid being dominated by overhead, which in this case would probably mean falling back to a single-threaded version when recursion reaches a list below some constant size.
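As a rough sketch of that idea (assuming the sequential mergeSort from the top of the question is in scope; the threshold value is an arbitrary guess and would need tuning):
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Same merge logic as in the question, lifted out so both sorts can share it.
def merge(xs: List[Int], ys: List[Int]): List[Int] = (xs, ys) match {
  case (Nil, _) => ys
  case (_, Nil) => xs
  case (x :: xs1, y :: ys1) =>
    if (x <= y) x :: merge(xs1, ys) else y :: merge(xs, ys1)
}

// Only fork futures while the slice is large; below the (arbitrary) threshold,
// fall back to the plain sequential mergeSort to avoid per-task overhead.
def parMergeSort(xs: List[Int], threshold: Int = 1024): Future[List[Int]] =
  if (xs.length <= threshold) Future(mergeSort(xs))
  else {
    val (left, right) = xs.splitAt(xs.length / 2)
    val sortedLeft = parMergeSort(left, threshold)
    val sortedRight = parMergeSort(right, threshold)
    for (l <- sortedLeft; r <- sortedRight) yield merge(l, r)
  }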

Related

Haskell cannot get right type from ':' cons operator

I am getting an error while trying to use map in a depth-first search algorithm that tests whether paths contain the goal cell in a maze that is undirected and without cycles. Where I am running into trouble is with the recursive call with map.
Here is my code:
type Maze = [[Cell]]
data Cell = Cell { top, left, right, bottom :: Bool }
type Pos = (Int, Int)
type Path = [Pos]
findPath :: Maze -> Path
findPath [] = []
findPath maze = dfs maze [] (-1,-1) (1,1)
dfs :: Maze ->Path -> Pos -> Pos -> Path
dfs maze trail prev curr
| (curr == goal) = reverse $ goal : trail -- ?
| (null adj) = []
| otherwise = dfs maze (curr : trail) curr `map` (adj c (fst curr) (snd curr) prev)
where c = maze!!(fst curr- 1)!!(snd curr - 1)
goal = (length maze, length (maze!!0))
adj:: Cell -> Int -> Int -> Pos ->Path
adj c x y prev = if (top c && prev /= (x-1, y)) then [(x-1, y)] else [] ++
if (left c && prev /= (x, y-1)) then [(x, y-1)] else [] ++
if (right c && prev /= (x, y+1)) then [(x, y+1)] else [] ++
if (bottom c && prev /= (x+1, y)) then [(x+1, y)] else []
What I expect with dfs maze (curr : trail) curr `map` (adj c (fst curr) (snd curr) prev) is that I apply a function f :: Pos -> Path to each element in the [Pos], but what the map gives is a [Path] rather than a Path.
The error that stack gives me is as follows:
stack: WARNING! Expecting stack options comment at line 1, column 1
stack: WARNING! Missing or unusable stack options specification
stack: WARNING! Using runghc without any additional stack options
SolveMaze.hs:77:24: error:
* Couldn't match type `[Pos]' with `(Int, Int)'
Expected type: Path
Actual type: [Path]
* In the expression:
dfs maze (curr : trail) curr
`map` (adj c (fst curr) (snd curr) prev)
In an equation for `dfs':
dfs maze trail prev curr
| (curr == goal) = reverse $ goal : trail
| (null adj) = []
| otherwise
= dfs maze (curr : trail) curr
`map` (adj c (fst curr) (snd curr) prev)
where
c = maze !! (fst curr - 1) !! (snd curr - 1)
goal = (length maze, length (maze !! 0))
   |
77 |   | otherwise = dfs maze (curr : trail) curr `map` (adj c (fst curr) (snd curr) prev)
   |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I am sorry if this is a really basic problem for you Haskell wizards but I've been staring at this for so long that I can't take it and needed to reach out for help. Thank you.
Let's zoom in on two lines. The first is the type of dfs:
dfs :: Maze ->Path -> Pos -> Pos -> Path
So dfs when fully applied returns a Path, great.
We also have the definition of dfs, which must therefore return a Path:
dfs maze (curr : trail) curr `map` (adj c (fst curr) (snd curr) prev)
Or with some simplifications to make explicit what is going on:
map (dfs and some args) (some list)
So dfs must return a path, so says the type, but the definition shows it as returning a list of paths.
What you seem to want is try to descend into one of the adjacent positions and perform a depth first search then descend into the next possible path, taking advantage of lazy evaluation along the way - great!
Let's change dfs to return a list of Paths ([Path]) - that will be a list of solutions, or [] if a dead end is found. Change reverse ... to [reverse ...], and map to concatMap.
It doesn't make sense to ask if a function is null; I think you meant the application of adj, such as null (adj c (fst curr) ...).
findPath now has to select one of the solutions returned by dfs - the first one should suffice. You can use listToMaybe to get a Maybe Path result.
import Data.Maybe (listToMaybe)
type Maze = [[Cell]]
data Cell = Cell { top, left, right, bottom :: Bool }
type Pos = (Int, Int)
type Path = [Pos]
findPath :: Maze -> Maybe Path
findPath [] = Just []
findPath maze = listToMaybe $ dfs maze [] (-1,-1) (1,1)
dfs :: Maze ->Path -> Pos -> Pos -> [Path]
dfs maze trail prev curr
| (curr == goal) = [reverse $ goal : trail] -- ?
| (null adjVal) = []
| otherwise = dfs maze (curr : trail) curr `concatMap` adjVal
where c = maze!!(fst curr- 1)!!(snd curr - 1)
goal = (length maze, length (maze!!0))
adjVal = adj c (fst curr) (snd curr) prev
adj:: Cell -> Int -> Int -> Pos ->Path
adj c x y prev = if (top c && prev /= (x-1, y)) then [(x-1, y)] else [] ++
if (left c && prev /= (x, y-1)) then [(x, y-1)] else [] ++
if (right c && prev /= (x, y+1)) then [(x, y+1)] else [] ++
if (bottom c && prev /= (x+1, y)) then [(x+1, y)] else []
There are lots of other things you can clean up if you so desire.
Using [] to signify a failed path instead of Maybe Path: if the first depth-first search fails, then your returned "solution" will be [].
The use of !! and the assumption that the input is not jagged; you could use an array instead.
The verbose definition of adj, which could use guards instead (one possible rewrite is sketched below).
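On that last point, here is one way adj could look using list-comprehension guards (a sketch, untested against a real maze; note that the original's chain of if/else with ++ binds so that only the first open direction is returned, whereas this collects all of them, which is presumably the intent):
adj :: Cell -> Int -> Int -> Pos -> Path
adj c x y prev = concat
  [ [ (x - 1, y) | top c    && prev /= (x - 1, y) ]
  , [ (x, y - 1) | left c   && prev /= (x, y - 1) ]
  , [ (x, y + 1) | right c  && prev /= (x, y + 1) ]
  , [ (x + 1, y) | bottom c && prev /= (x + 1, y) ]
  ]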

group by until changed sequence

I have a big Excel file, which I read with the Excel type provider in F#.
The rows should be grouped by some column. Processing crashes with an OutOfMemoryException. I'm not sure whether the Seq.groupBy call is to blame or the Excel type provider.
To simplify it I use 3D Point here as a row.
type Point = { x : float; y: float; z: float; }
let points = seq {
for x in 1 .. 1000 do
for y in 1 .. 1000 do
for z in 1 .. 1000 ->
{x = float x; y = float y; z = float z}
}
let groups = points |> Seq.groupBy (fun point -> point.x)
The rows are already ordered by the grouping column, e.g. 10 points with x = 10, then 20 points with x = 20, and so on. Instead of grouping them I just need to split the rows into chunks until the value changes. Is there some way to enumerate the sequence just once and get a sequence of rows split, not grouped, by some column value or some f(row) value?
If the rows are already ordered then this chunkify function will return a seq<'a list>. Each list will contain all the points with the same x value.
let chunkify pred s = seq {
let values = ref []
for x in s do
match !values with
|h::t -> if pred h x then
values := x::!values
else
yield !values
values := [x]
|[] -> values := [x]
yield !values
}
let chunked = points |> chunkify (fun x y -> x.x = y.x)
Here chunked has a type of
seq<Point list>
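As a quick sanity check on a tiny ordered list (made-up values, just to show the chunking behaviour):
[1; 1; 2; 2; 2; 3]
|> chunkify (fun a b -> a = b)
|> Seq.toList
// [[1; 1]; [2; 2; 2]; [3]]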
Another solution, along the same lines as Kevin's
module Seq =
let chunkBy f src =
seq {
let chunk = ResizeArray()
let mutable key = Unchecked.defaultof<_>
for x in src do
let newKey = f x
if (chunk.Count <> 0) && (newKey <> key) then
yield chunk.ToArray()
chunk.Clear()
key <- newKey
chunk.Add(x)
}
// returns 2 arrays, each with 1000 elements
points |> Seq.chunkBy (fun pt -> pt.y) |> Seq.take 2
Here's a purely functional approach, which is surely slower, and much harder to understand.
module Seq =
let chunkByFold f src =
src
|> Seq.scan (fun (chunk, (key, carry)) x ->
let chunk = defaultArg carry chunk
let newKey = f x
if List.isEmpty chunk then [x], (newKey, None)
elif newKey = key then x :: chunk, (key, None)
else chunk, (newKey, Some([x]))) ([], (Unchecked.defaultof<_>, None))
|> Seq.filter (snd >> snd >> Option.isSome)
|> Seq.map fst
Let's start with the input:
let count = 1000
type Point = { x : float; y: float; z: float; }
let points = seq {
for x in 1 .. count do
for y in 1 .. count do
for z in 1 .. count ->
{x = float x; y = float y; z = float z}
}
val count : int = 1000
type Point =
{x: float;
y: float;
z: float;}
val points : seq<Point>
If we try to evaluate points then we get an OutOfMemoryException:
points |> Seq.toList
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at Microsoft.FSharp.Collections.FSharpList`1.Cons(T head, FSharpList`1 tail)
at Microsoft.FSharp.Collections.SeqModule.ToList[T](IEnumerable`1 source)
at <StartupCode$FSI_0011>.$FSI_0011.main#()
Stopped due to error
It might be the same reason that groupBy fails, but I'm not sure. It does tell us that we have to use seq and yield to return the groups. So we get this implementation:
let group groupBy points =
let mutable lst = [ ]
seq { for p in points do match lst with | [] -> lst <- [p] | p'::lst' when groupBy p' p -> lst <- p::lst | lst' -> lst <- [p]; yield lst' }
val group : groupBy:('a -> 'a -> bool) -> points:seq<'a> -> seq<'a list>
It is not the most easily read code. It takes each point from the points sequence and prepends it to an accumulator list while the groupBy function is satisfied. If the groupBy function is not satisfied then a new accumulator list is generated and the old one is yielded. Note that the order of the accumulator list is reversed.
Testing the function:
for g in group (fun p' p -> p'.x = p.x ) points do
printfn "%f %i" g.[0].x g.Length
Terminates nicely (after some time).
Another implementation, with a bug fix and better formatting:
let group (groupBy : 'a -> 'b when 'b : equality) points =
let mutable lst = []
seq {
yield! seq {
for p in points do
match lst with
| [] -> lst <- [ p ]
| p' :: lst' when (groupBy p') = (groupBy p) -> lst <- p :: lst
| lst' ->
lst <- [ p ]
yield (groupBy lst'.Head, lst')
}
yield (groupBy lst.Head, lst)
}
It seems there is no one-line purely functional solution or existing Seq method that I have overlooked.
Therefore, as an alternative, here is my own imperative solution. It is comparable to Kevin's answer but better fits my need. The ref cell contains:
The group key, which is calculated just once for each row
The current chunk list (it could be a seq to conform to Seq.groupBy), which contains, in input order, the elements for which f(x) equals the stored group key (requires equality).
let splitByChanged f xs =
let acc = ref (None,[])
seq {
for x in xs do
match !acc with
| None,_ ->
acc := Some (f x),[x]
| Some key, chunk when key = f x ->
acc := Some key, x::chunk
| Some key, chunk ->
let group = chunk |> Seq.toList |> List.rev
yield key, group
acc := Some (f x),[x]
match !acc with
| None,_ -> ()
| Some key,chunk ->
let group = chunk |> Seq.toList |> List.rev
yield key, group
}
points |> splitByChanged (fun point -> point.x)
The function has the following signature:
val splitByChanged :
f:('a -> 'b) -> xs:seq<'a> -> seq<'b * 'a list> when 'b : equality
Corrections and even better solutions are welcome.

Scala parallel frequency calculation using aggregate doesn't work

I'm learning Scala by working through the exercises from the book "Scala for the Impatient". Please see the following question, my answer, and my code. I'd like to know if my answer is correct. Also, the code doesn't work (all frequencies are 1). Where's the bug?
Q10: Harry Hacker reads a file into a string and wants to use a
parallel collection to update the letter frequencies concurrently on
portions of the string. He uses the following code:
val frequencies = new scala.collection.mutable.HashMap[Char, Int]
for (c <- str.par) frequencies(c) = frequencies.getOrElse(c, 0) + 1
Why is this a terrible idea? How can he really parallelize the
computation?
My answer:
It is not a good idea because if 2 threads are concurrently updating the same frequency, the result is undefined.
My code:
def parFrequency(str: String) = {
str.par.aggregate(Map[Char, Int]())((m, c) => { m + (c -> (m.getOrElse(c, 0) + 1)) }, _ ++ _)
}
Unit test:
"Method parFrequency" should "return the frequency of each character in a string" in {
val freq = parFrequency("harry hacker")
freq should have size 8
freq('h') should be(2) // fails
freq('a') should be(2)
freq('r') should be(3)
freq('y') should be(1)
freq(' ') should be(1)
freq('c') should be(1)
freq('k') should be(1)
freq('e') should be(1)
}
Edit:
After reading this thread, I updated the code. Now the test works if run alone, but fails if run as part of a suite.
import scala.collection.immutable.{HashMap => ImmutableHashMap} // alias assumed for ImmutableHashMap

def parFrequency(str: String) = {
val freq = ImmutableHashMap[Char, Int]()
str.par.aggregate(freq)((_, c) => ImmutableHashMap(c -> 1), (m1, m2) => m1.merged(m2)({
case ((k, v1), (_, v2)) => (k, v1 + v2)
}))
}
Edit 2:
See my solution below.
++ does not combine the values of identical keys. So when you merge the maps, you get (for shared keys) one of the values (which in this case is always 1), not the sum of the values.
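For example, with two made-up partial maps:
Map('h' -> 1, 'a' -> 1) ++ Map('h' -> 1, 'r' -> 2)
// => Map(h -> 1, a -> 1, r -> 2): the two counts for 'h' are not summed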
This works:
def parFrequency(str: String) = {
str.par.aggregate(Map[Char, Int]())((m, c) => { m + (c -> (m.getOrElse(c, 0) + 1)) },
(a,b) => b.foldLeft(a){case (acc, (k,v))=> acc updated (k, acc.getOrElse(k,0) + v) })
}
val freq = parFrequency("harry hacker")
//> Map(e -> 1, y -> 1, a -> 2, -> 1, c -> 1, h -> 2, r -> 3, k -> 1)
The foldLeft iterates over one of the maps, updating the other map with the key/values found.
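Concretely, merging two made-up partial counts with that foldLeft:
val a = Map('h' -> 1, 'a' -> 1)
val b = Map('h' -> 1, 'r' -> 2)
b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0) + v) }
// => Map(h -> 2, a -> 1, r -> 2)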
Your trouble in the first case, as you detected yourself, was the ++ operator, which just concatenates the maps, keeping only one entry per duplicate key.
Now in the second case you have (_, c) => ImmutableHashMap(c -> 1), which simply throws away the map accumulated so far in the seqop stage and keeps only the current character.
My suggestion is to extend the Map type with a special combination operation that works like merged on HashMap, and keep the accumulating seqop from the first example:
implicit class MapUnionOps[K, V](m1: Map[K, V]) {
def unionWith[V1 >: V](m2: Map[K, V1])(f: (V1, V1) => V1): Map[K, V1] = {
val kv1 = m1.filterKeys(!m2.contains(_))
val kv2 = m2.filterKeys(!m1.contains(_))
val common = (m1.keySet & m2.keySet).toSeq map (k => (k, f(m1(k), m2(k))))
(common ++ kv1 ++ kv2).toMap
}
}
def parFrequency(str: String) = {
str.par.aggregate(Map[Char, Int]())((m, c) => {m + (c -> (m.getOrElse(c, 0) + 1))}, (m1, m2) => (m1 unionWith m2)(_ + _))
}
Or you can use the fold solution from Paul's answer, but for better performance choose the smaller map to traverse on each merge:
implicit class MapUnionOps[K, V](m1: Map[K, V]) {
def unionWith(m2: Map[K, V])(f: (V, V) => V): Map[K, V] =
if (m2.size > m1.size) m2.unionWith(m1)(f)
else m2.foldLeft(m1) {
case (acc, (k, v)) => acc + (k -> acc.get(k).fold(v)(f(v, _)))
}
}
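Either variant is used the same way; for example (made-up values):
Map('a' -> 1, 'b' -> 2).unionWith(Map('b' -> 3, 'c' -> 4))(_ + _)
// => Map(b -> 5, a -> 1, c -> 4) (entry order may vary)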
This seems to work. I like it better than the other solutions proposed here because:
It's a lot less code than an implicit class and slightly less code than using getOrElse with foldLeft.
It uses the merged function from the API, which is intended to do exactly this.
It's my own solution :)
def parFrequency(str: String) = {
val freq = ImmutableHashMap[Char, Int]()
str.par.aggregate(freq)((_, c) => ImmutableHashMap(c -> 1), _.merged(_) {
case ((k, v1), (_, v2)) => (k, v1 + v2)
})
}
Thanks for taking the time to help me out.

Getting parse error while doing list comprehensions in haskell

I'm writing a function like this:
testing :: [Int] -> [Int] -> [Int]
testing lst1 lst2 =
let t = [ r | (x,y) <- zip lst1 lst2, let r = if y == 0 && x == 2 then 2 else y ]
let t1 = [ w | (u,v) <- zip t (tail t), let w = if (u == 2) && (v == 0) then 2 else v]
head t : t1
What the first let does is return a list like this: [2,0,0,0,1,0]. With the second let and the following line, I want the output to be like this: [2,2,2,2,1,0]. But it's not working and gives a parse error!
What am I doing wrong?
There are two kinds of lets: the "let/in" kind, which can appear anywhere an expression can, and the "let with no in" kind, which must appear in a comprehension or do block. Since your function definition isn't in either, its lets must use an in, for example:
testing :: [Int] -> [Int] -> [Int]
testing lst1 lst2 =
let t = [ r | (x,y) <- zip lst1 lst2, let r = if y == 0 && x == 2 then 2 else y ] in
let t1 = [ w | (u,v) <- zip t (tail t), let w = if (u == 2) && (v == 0) then 2 else v] in
head t : t1
Alternatively, since you can define multiple things in each let, you could consider:
testing :: [Int] -> [Int] -> [Int]
testing lst1 lst2 =
let t = [ r | (x,y) <- zip lst1 lst2, let r = if y == 0 && x == 2 then 2 else y ]
    t1 = [ w | (u,v) <- zip t (tail t), let w = if (u == 2) && (v == 0) then 2 else v]
in head t : t1
The code has other problems, but this should get you to the point where it parses, at least.
With an expression formed by a let-binding, you generally need
let bindings
in
expressions
(there are exceptions when monads are involved).
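For instance, inside a do block a let stands on its own, with no in (a tiny standalone illustration):
main :: IO ()
main = do
  let n = 21
  print (n * 2)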
So, your code can be rewritten as follows (with simplification of r and w, which were not really necessary):
testing :: [Int] -> [Int] -> [Int]
testing lst1 lst2 =
let t = [ if y == 0 && x == 2 then 2 else y | (x,y) <- zip lst1 lst2]
t1 = [ if (v == 0) && (u == 2) then 2 else v | (u,v) <- zip t (tail t)]
in
head t : t1
(Note: I also switched u and v so that t1 and t have similar forms.)
Now given a list like [2,0,0,0,1,0], it appears that your code is trying to replace 0 with 2 if the previous element is 2 (from the pattern of your code), so that eventually, the desired output is [2,2,2,2,1,0].
To achieve this, it is not enough to use two list comprehensions, or any fixed number of comprehensions. You need to somehow apply this process recursively (again and again). So instead of doing only 2 steps, we can write out one step (and apply it repeatedly). Taking your t1 = ... line, the one-step function can be:
testing' lst =
let
t1 = [ if (u == 2) && (v == 0) then 2 else v | (u,v) <- zip lst (tail lst)]
in
head lst : t1
Now this gives:
*Main> testing' [2,0,0,0,1,0]
[2,2,0,0,1,0]
as expected.
The rest of the job is to apply testing' as many times as necessary. Here applying it (length lst) times should suffice. So, we can first write a helper function to apply another function n times on a parameter, as follows:
apply_n 0 f x = x
apply_n n f x = f $ apply_n (n - 1) f x
This gives you what you expected:
*Main> apply_n (length [2,0,0,0,1,0]) testing' [2,0,0,0,1,0]
[2,2,2,2,1,0]
Of course, you can wrap the above in one function like:
testing'' lst = apply_n (length lst) testing' lst
and in the end:
*Main> testing'' [2,0,0,0,1,0]
[2,2,2,2,1,0]
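As an aside, apply_n is essentially what iterate from the Prelude already provides, so the wrapper could equally be written as (a sketch):
testing''' :: [Int] -> [Int]
testing''' lst = iterate testing' lst !! length lst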
NOTE: this is not the only way to do the filling; see the fill2 function in my answer to another question for an example of achieving the same thing using a finite state machine.

*** Exception: Prelude.head: empty list in Haskell

I am trying to write a function that does this:
func [3,3,5,7] 1 = [(3,2),(5,1),(7,1)]
It makes tuple of unique elements in a list and gives the occurence of the elements. I wrote this :
func [] n = []
func (x:xs) n = if x == head xs then func (xs) (n + 1) else (x, n) : func (xs) 1
I get this exception:
* Exception: Prelude.head: empty list
In order to fix this I wrote this:
func [] n = []
func [x] n = (x,n)
func (x:xs) n = if x == head xs then func (xs) (n + 1) else (x, n) : func (xs) 1
But now I get this error:
Couldn't match expected type `[a0]' with actual type `(t0, t1)'
In the expression: (x, n)
In an equation for `func': func [x] n = (x, n)
How can I fix this?
The first problem is here:
func (x:xs) n = if x == head xs ...
xs can be the empty list. You need to make sure xs is not empty before taking the head of it.
The second problem is here:
func [] n = []
func [x] n = (x,n)
In the first line you are saying the return type of func is a list, but in the second line you are saying it is a tuple - hence the type error.
What you want is very close to the group function in Data.List - that can give you some ideas on how to write it.
Alternatively, here is some guidance.
Clearly:
func [] = []
For the recursion case, try this:
func (x:xs) = (x, n) : func rest
where (n,rest) = ... (some function of x and xs) ...
That is, write another function which returns the count and the remainder of the list that func has to process.
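If it helps, here is one possible shape for that helper, using span from the Prelude (a sketch along those lines; the name countLeading is made up, and the counter argument is dropped as in the skeleton above):
-- Count how many copies of x sit at the front of xs (plus x itself),
-- and hand back the unconsumed remainder.
countLeading :: Eq a => a -> [a] -> (Int, [a])
countLeading x xs = let (same, rest) = span (== x) xs
                    in (1 + length same, rest)

func :: Eq a => [a] -> [(a, Int)]
func [] = []
func (x : xs) = (x, n) : func rest
  where (n, rest) = countLeading x xs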
Your problem is in the second case, where you return a tuple instead of a list. Wrapping that tuple in a list solves your type error.
func :: Eq a => [a] -> Integer -> [(a, Integer)]
func [] _ = []
func [x] n = [(x,n)]
func (x:xs#(y:_)) n = if x == y
then func xs (n + 1)
else (x,n) : func xs 1
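With that version, the example from the question gives the expected result:
*Main> func [3,3,5,7] 1
[(3,2),(5,1),(7,1)]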
