Does multithreading conflict with Map in F#?

let len = 25000000
let map = Map.ofArray [| for i = 1 to len do yield (i, i + 1) |]
let maparr = [| map; map; map; map |]
let f1 i =
    for i1 = 1 to len do
        let l1 = maparr.[i-1].Item(i1)
        ()
let index = [| 1..4 |]
let _ = index |> Array.Parallel.map f1
printf "done"
I found that only one core works at full speed when running the code above. What I expect is that all four threads work together with a high level of CPU usage. So it seems multithreading conflicts with Map, am I right? If not, how can I achieve my initial goal? Thank you in advance.

I think you were tripping a heuristic: when there are only a small number of tasks, the library assumes it is fastest to just use a single thread.
This code maxes out all threads on my computer:
let len = 1000000
let map = Map.ofArray [| for i = 1 to len do yield (i, i + 1) |]
let maparr = [| map; map; map; map |]
let f1 (m: Map<_,_>) =
    let mutable sum = 0
    for i1 = 1 to len do
        let l1 = m.Item(i1)
        for i = 1 to 10000 do
            sum <- sum + 1
    printfn "%i" sum
let index = [| 1..40 |]
printfn "starting"
index |> Array.map (fun t -> maparr.[(t-1)/10]) |> Array.Parallel.iter f1
printf "done"
Important changes:
Reduced len significantly. In your code, almost all the time was spent creating the map.
Actually do work in the loop. In your code, it is possible that the loop was optimised to a no-op.
Run many more tasks. This tricked the scheduler into using more threads, and all is good.
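One way to check how many threads Array.Parallel actually uses is to record the managed thread id inside each task. This is only a minimal sketch, not part of the original answer: the 40-task count mirrors the code above, Thread.SpinWait is a stand-in for real CPU-bound work, and the number printed depends on the machine and the scheduler.

```fsharp
open System.Threading

// Run 40 small CPU-bound tasks and remember which thread ran each one.
let threadIds =
    [| 1 .. 40 |]
    |> Array.Parallel.map (fun _ ->
        Thread.SpinWait 5000000                   // placeholder for real work
        Thread.CurrentThread.ManagedThreadId)
    |> Array.distinct

printfn "tasks ran on %d distinct thread(s)" threadIds.Length
```

If this prints 1 for a given workload, the work was not spread across cores at all, which is easier to interpret than watching a CPU graph.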

Related

OCaml counter does not terminate without using threads

I have the following code:
let counter n =
  let rec count i =
    if i > n
    then ()
    else
      print_int i;
      count (i+1)
  in count 0
It should simply output all numbers from 0 to n. To clarify, I know there are easier ways to achieve the same result but I want to know why it is not working in this specific case.
When I run this code with some parameter, e.g. counter 5, it does not terminate.
Instead when I change the last line of my code in count 0 to in Thread.create count 0 it outputs 012345
Can someone explain this behaviour?
EDIT
Also found that if you modify the code to this:
let counter n =
  let rec count i =
    if i > n
    then ()
    else
      let i = i
      in print_int i;
      count (i+1)
  in count 0
it works fine. Why is this?
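Your indentation is misleading; your code does
if i > n then () else print_int i;
first and then
count (i+1)
Of course it doesn't terminate! What you want is
else begin
  print_int i;
  count (i+1)
end
(or else ( ... )). See e.g. "Using begin ... end" in https://ocaml.org/learn/tutorials/if_statements_loops_and_recursion.html.
The version from your edit works because let ... in extends as far to the right as possible, so the whole sequence print_int i; count (i+1) ends up inside the else branch.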

Use a MailboxProcessor with reply-channel to create limited agents that return values in order

Basically, I want to change the following into a limited threading solution, because in my situation the list of calculations is too large, spawning too many threads, and I'd like to experiment and measure performance with fewer threads.
// the trivial approach (and largely my current situation)
let doWork() =
    [1 .. 10]
    |> List.map (fun i -> async {
        do! Async.Sleep (100 * i)   // longest thread will run 1 sec
        return i * i                // some complex calculation returning a certain type
    })
    |> Async.Parallel
    |> Async.RunSynchronously       // works, total wall time 1s
My new approach is below. The code is borrowed from and inspired by an online snippet from Tomas Petricek (which I tested; it works, but I need it to return a value, not unit).
type LimitAgentMessage =
    | Start of Async<int> * AsyncReplyChannel<int>
    | Finished
let threadingLimitAgent limit = MailboxProcessor.Start(fun inbox -> async {
    let queue = System.Collections.Generic.Queue<_>()
    let count = ref 0
    while true do
        let! msg = inbox.Receive()
        match msg with
        | Start (work, reply) -> queue.Enqueue((work, reply))
        | Finished -> decr count
        if count.Value < limit && queue.Count > 0 then
            incr count
            let work, reply = queue.Dequeue()
            // Start it in a thread pool (on background)
            Async.Start(async {
                let! x = work
                do! async { reply.Reply x }
                inbox.Post(Finished)
            })
})
// given a synchronous list of tasks, run each task asynchronously,
// return calculated values in original order
let worker lst =
    // this doesn't work as expected, it waits for each reply
    let agent = threadingLimitAgent 10
    lst
    |> List.map (fun x ->
        agent.PostAndReply(
            fun replyChannel -> Start(x, replyChannel)))
Now, with this in place, the original code would become:
let doWork() =
    [1 .. 10]
    |> List.map (fun i -> async {
        do! Async.Sleep (100 * i)   // longest thread will run 1 sec
        return i * i                // some complex calculation returning a certain type
    })
    |> worker                       // worker is not working (correct output, runs 5.5s)
All in all, the output is correct (it does calculate and propagate back the replies), but it does not run the work on a limited set of threads in parallel.
I've been playing around a bit, but think I'm missing the obvious (and besides, who knows, someone may like the idea of a limited-threads mailbox processor that returns its calculations in order).
The problem is the call to agent.PostAndReply. PostAndReply will block until the work has finished. Calling this inside List.map will cause the work to be executed sequentially. One solution is to use PostAndAsyncReply which does not block and also returns you an async handle for getting the result back.
let worker lst =
    let agent = threadingLimitAgent 10
    lst
    |> List.map (fun x ->
        agent.PostAndAsyncReply(
            fun replyChannel -> Start(x, replyChannel)))
    |> Async.Parallel
let doWork() =
    [1 .. 10]
    |> List.map (fun i -> async {
        do! Async.Sleep (100 * i)
        return i * i
    })
    |> worker
    |> Async.RunSynchronously
That's of course only one possible solution (getting all async handles back and awaiting them in parallel).
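To see the non-blocking behaviour in isolation, here is a minimal sketch with a much simpler agent than the limiting one above. The Request type and squaringAgent are made up for illustration; the only point is that PostAndAsyncReply hands back an Async<int> without waiting for the reply, and that Async.Parallel returns results in input order.

```fsharp
// Hypothetical message type: a number plus the channel to answer on.
type Request = Square of int * AsyncReplyChannel<int>

// A trivial agent that replies with the square of each number it receives.
let squaringAgent =
    MailboxProcessor.Start(fun inbox ->
        let rec loop () = async {
            let! (Square (n, reply)) = inbox.Receive()
            reply.Reply(n * n)
            return! loop ()
        }
        loop ())

// Each call returns an Async<int> handle; none of them blocks the caller.
let pending =
    [1 .. 5]
    |> List.map (fun n -> squaringAgent.PostAndAsyncReply(fun ch -> Square(n, ch)))

// Await all handles together; the result array follows the input order.
let results = pending |> Async.Parallel |> Async.RunSynchronously
printfn "%A" results   // [|1; 4; 9; 16; 25|]
```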

reading integers from a string

I want to read a line from a file, initialize an array from that line and then display the integers.
Why is it not reading the five integers in the line? I want to get the output 1 2 3 4 5, but I get 1 1 1 1 1.
open Array;;
open Scanf;;
let print_ints file_name =
  let file = open_in file_name in
  let s = input_line(file) in
  let n = ref 5 in
  let arr = Array.init !n (fun i -> if i < !n then sscanf s "%d" (fun a -> a) else 0) in
  let i = ref 0 in
  while !i < !n do
    print_int (Array.get arr !i);
    print_string " ";
    i := !i + 1;
  done;;
print_ints "string_ints.txt";;
My file is just: 1 2 3 4 5
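Every call to sscanf s "%d" starts scanning at the beginning of s, so each array element gets the first integer, 1. You might want to try the following approach instead. Split your string into a list of substrings representing numbers. This answer describes one way of doing so. Then use the resulting function in your print_ints function.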
let ints_of_string s =
  List.map int_of_string (Str.split (Str.regexp " +") s)
let print_ints file_name =
  let file = open_in file_name in
  let s = input_line file in
  let ints = ints_of_string s in
  List.iter (fun i -> print_int i; print_char ' ') ints;
  close_in file
let _ = print_ints "string_ints.txt"
When compiling, pass str.cma or str.cmxa as an argument (see this answer for details on compilation):
$ ocamlc str.cma print_ints.ml
Another alternative would be to use the Scanf.bscanf function; this question contains an example (use with caution).
The Scanf.sscanf function may not be particularly suitable for this task.
An excerpt from the OCaml manual:
the scanf facility is not intended for heavy duty lexical analysis and parsing. If it appears not expressive enough for your needs, several alternative exists: regular expressions (module Str), stream parsers, ocamllex-generated lexers, ocamlyacc-generated parsers
There is though a way to parse a string of ints using Scanf.sscanf (which I wouldn't recommend):
let rec int_list_of_string s =
  try
    Scanf.sscanf s
      "%d %[0-9-+ ]"
      (fun n rest_str -> n :: int_list_of_string rest_str)
  with
  | End_of_file | Scanf.Scan_failure _ -> []
The trick here is to split the input string s into a part that is parsed as an integer (%d) and the rest of the string, captured with the range format %[0-9-+ ], which matches a run containing only decimal digits 0-9, the - and + signs, and whitespace.

F# MailboxProcessor memory leak in try/catch block

Updated after obvious error pointed out by John Palmer in the comments.
The following code results in OutOfMemoryException:
let agent = MailboxProcessor<string>.Start(fun agent ->
    let maxLength = 1000
    let rec loop (state: string list) i = async {
        let! msg = agent.Receive()
        try
            printfn "received message: %s, iteration: %i, length: %i" msg i state.Length
            let newState = state |> Seq.truncate maxLength |> Seq.toList
            return! loop (msg::newState) (i+1)
        with
        | ex ->
            printfn "%A" ex
            return! loop state (i+1)
    }
    loop [] 0
)
let greeting = "hello"
while true do
    agent.Post greeting
    System.Threading.Thread.Sleep(1) // avoid piling up greetings before they are output
The error is gone if I don't use try/catch block.
Increasing the sleep time only postpones the error.
Update 2: I guess the issue here is that the function stops being tail recursive as the recursive call is no longer the last one to execute. Would be nice for somebody with more F# experience to desugar it as I'm sure this is a common memory-leak situation in F# agents as the code is very simple and generic.
Solution:
It turned out to be part of a bigger problem: the function can't be tail-recursive if the recursive call is made within a try/catch block, because it has to be able to unwind the stack if an exception is thrown and thus has to keep the call stack information.
More details here:
Tail recursion and exceptions in F#
Properly rewritten code (separate try/catch and return):
let agent = MailboxProcessor<string>.Start(fun agent ->
    let maxLength = 1000
    let rec loop (state: string list) i = async {
        let! msg = agent.Receive()
        let newState =
            try
                printfn "received message: %s, iteration: %i, length: %i" msg i state.Length
                let truncatedState = state |> Seq.truncate maxLength |> Seq.toList
                msg::truncatedState
            with
            | ex ->
                printfn "%A" ex
                state
        return! loop newState (i+1)
    }
    loop [] 0
)
I suspect the issue is actually here:
while true do
    agent.Post "hello"
All the "hello"s that you post have to be stored in memory somewhere, and they are posted much faster than printf can output them.
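If the producer really does outrun the agent, one option is to hold back posts while the mailbox is long. This is a rough sketch that assumes a polling producer is acceptable; maxPending and the counting agent are made up for illustration, and CurrentQueueLength is the agent's count of unprocessed messages.

```fsharp
open System.Threading

// Agent that just consumes messages; it stands in for the printing agent.
let agent = MailboxProcessor<string>.Start(fun inbox ->
    let rec loop (count: int) = async {
        let! _ = inbox.Receive()
        return! loop (count + 1)
    }
    loop 0)

let maxPending = 1000   // hypothetical cap on queued messages

for _ in 1 .. 100000 do
    // Wait while the mailbox holds too many unprocessed messages.
    while agent.CurrentQueueLength >= maxPending do
        Thread.Sleep 1
    agent.Post "hello"
```

With a single producer the queue never grows past maxPending, so memory stays bounded regardless of how slow the consumer is.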
See my old post here http://vaskir.blogspot.ru/2013/02/recursion-and-trywithfinally-blocks.html
Basically anything that is done after the return (like a try/with/finally/dispose) will prevent tail calls.
See https://blogs.msdn.microsoft.com/fsharpteam/2011/07/08/tail-calls-in-f/
There is also work underway to have the compiler warn about lack of tail recursion: https://github.com/fsharp/fslang-design/issues/82

Joining on the first finished thread?

I'm writing up a series of graph-searching algorithms in F# and thought it would be nice to take advantage of parallelization. I wanted to execute several threads in parallel and take the result of the first one to finish. I've got an implementation, but it's not pretty.
Two questions: is there a standard name for this sort of function? Not a Join or a JoinAll, but a JoinFirst? Second, is there a more idiomatic way to do this?
open System.Threading
open System.Threading.Tasks
//implementation
let makeAsync (locker:obj) (shared:'a option ref) (f:unit->'a) =
    async {
        let result = f()
        Monitor.Enter locker
        shared := Some result
        Monitor.Pulse locker
        Monitor.Exit locker
    }
let firstFinished test work =
    let result = ref Option.None
    let locker = new obj()
    let cancel = new CancellationTokenSource()
    work |> List.map (makeAsync locker result) |> List.map (fun a-> Async.StartAsTask(a, TaskCreationOptions.None, cancel.Token)) |> ignore
    Monitor.Enter locker
    while (result.Value.IsNone || (not <| test result.Value.Value)) do
        Monitor.Wait locker |> ignore
    Monitor.Exit locker
    cancel.Cancel()
    match result.Value with
    | Some x-> x
    | None -> failwith "Don't pass in an empty list"
//end implementation
//testing
let delayReturn (ms:int) value =
    fun ()->
        Thread.Sleep ms
        value
let test () =
    let work = [ delayReturn 1000 "First!"; delayReturn 5000 "Second!" ]
    let result = firstFinished (fun _->true) work
    printfn "%s" result
Would it work to pass the CancellationTokenSource and test to each async and have the first that computes a valid result cancel the others?
let makeAsync (cancel:CancellationTokenSource) test f =
    let rec loop() =
        async {
            if cancel.IsCancellationRequested then
                return None
            else
                let result = f()
                if test result then
                    cancel.Cancel()
                    return Some result
                else return! loop()
        }
    loop()
let firstFinished test work =
    match work with
    | [] -> invalidArg "work" "Don't pass in an empty list"
    | _ ->
        let cancel = new CancellationTokenSource()
        work
        |> Seq.map (makeAsync cancel test)
        |> Seq.toArray
        |> Async.Parallel
        |> Async.RunSynchronously
        |> Array.pick id
This approach makes several improvements: 1) it uses only async (it's not mixed with Task, which is an alternative for doing the same thing; async is more idiomatic in F#); 2) there's no shared state other than the CancellationTokenSource, which was designed for that purpose; 3) the clean function-chaining approach makes it easy to add additional logic/transformations to the pipeline, including trivially enabling/disabling parallelism.
With the Task Parallel Library in .NET 4, this is called WaitAny. For example, the following snippet creates 10 tasks and waits for any of them to complete:
open System.Threading
Array.init 10 (fun _ ->
    Tasks.Task.Factory.StartNew(fun () ->
        Thread.Sleep 1000))
|> Tasks.Task.WaitAny
In case you are ok to use "Reactive extensions (Rx)" in your project, the joinFirst method can be implemented as:
let joinFirst (f : (unit->'a) list) =
    let c = new CancellationTokenSource()
    let o =
        f
        |> List.map (fun i ->
            let j = fun () -> Async.RunSynchronously(async { return i() }, -1, c.Token)
            Observable.Defer(fun () -> Observable.Start(j)))
        |> Observable.Amb
    let r = o.First()
    c.Cancel()
    r
Example usage:
[20..30]
|> List.map (fun i -> fun () -> Thread.Sleep(i*100); printfn "%d" i; i)
|> joinFirst
|> printfn "Done %A"
Console.Read() |> ignore
Update:
Using Mailbox processor :
type WorkMessage<'a> =
    | Done of 'a
    | GetFirstDone of AsyncReplyChannel<'a>
let joinFirst (f : (unit->'a) list) =
    let c = new CancellationTokenSource()
    let m = MailboxProcessor<WorkMessage<'a>>.Start(fun mbox -> async {
        let afterDone a m =
            match m with
            | GetFirstDone rc ->
                rc.Reply(a)
                Some(async { return () })
            | _ -> None
        let getDone m =
            match m with
            | Done a ->
                c.Cancel()
                Some (async {
                    do! mbox.Scan(afterDone a)
                })
            | _ -> None
        do! mbox.Scan(getDone)
        return ()
    })
    f
    |> List.iter (fun t ->
        try
            Async.RunSynchronously(async {
                let out = t()
                m.Post(Done out)
                return () }, -1, c.Token)
        with
        | _ -> ())
    m.PostAndReply(fun rc -> GetFirstDone rc)
Unfortunately, there is no built-in operation for this provided by Async, but I'd still use F# asyncs, because they directly support cancellation. When you start a workflow using Async.Start, you can pass it a cancellation token and the workflow will automatically stop if the token is cancelled.
This means that you have to start workflows explicitly (instead of using Async.Parallel), so the synchronization must be written by hand. Here is a simple version of an Async.Choice method that does that (at the moment, it doesn't handle exceptions):
open System.Threading
type Microsoft.FSharp.Control.Async with
    /// Takes several asynchronous workflows and returns
    /// the result of the first workflow that successfully completes
    static member Choice(workflows) =
        Async.FromContinuations(fun (cont, _, _) ->
            let cts = new CancellationTokenSource()
            let completed = ref false
            let lockObj = new obj()
            let synchronized f = lock lockObj f
            /// Called when a result is available - the function uses locks
            /// to make sure that it calls the continuation only once
            let completeOnce res =
                let run =
                    synchronized(fun () ->
                        if completed.Value then false
                        else
                            completed := true
                            true)
                if run then cont res
            /// Workflow that will be started for each argument - run the
            /// operation, cancel pending workflows and then return result
            let runWorkflow workflow = async {
                let! res = workflow
                cts.Cancel()
                completeOnce res }
            // Start all workflows using cancellation token
            for work in workflows do
                Async.Start(runWorkflow work, cts.Token) )
Once we write this operation (which is a bit complex, but has to be written only once), solving the problem is quite easy. You can write your operations as async workflows and they'll be cancelled automatically when the first one completes:
let delayReturn n s = async {
    do! Async.Sleep(n)
    printfn "returning %s" s
    return s }
Async.Choice [ delayReturn 1000 "First!"; delayReturn 5000 "Second!" ]
|> Async.RunSynchronously
When you run this, it will print only "returning First!" because the second workflow will be cancelled.
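Newer versions of FSharp.Core (4.0 onward, as far as I know) ship a built-in Async.Choice with a slightly different shape from the hand-written one: it takes workflows that return an option, yields the first Some result, and cancels the rest. A minimal sketch assuming that built-in is available; delayOption is a made-up helper and the delays are arbitrary.

```fsharp
// Hypothetical helper: wait, then report a value wrapped in Some.
let delayOption (ms: int) (s: string) = async {
    do! Async.Sleep ms
    return Some s }

// The built-in Choice returns the first workflow that yields Some.
let winner =
    Async.Choice [ delayOption 100 "First!"; delayOption 2000 "Second!" ]
    |> Async.RunSynchronously

printfn "%A" winner   // Some "First!"
```

A workflow that returns None is treated as "no answer", which gives you the test predicate from the original question for free: return Some only when the result is acceptable.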
