F# MailboxProcessor limit parallelism - multithreading

I'm new to F# and am experimenting with MailboxProcessor to ensure that state changes happen in isolation.
In short, I post actions (immutable objects describing a state change) to the MailboxProcessor. In the recursive loop I read the message, generate a new state (in the example below, by adding an item to a collection), and pass that state to the next recursion.
open System
type AppliationState =
    {
        Store : string list
    }
    static member Default =
        {
            Store = List.empty
        }
    member this.HandleAction (action:obj) =
        match action with
        | :? string as a -> { this with Store = a :: this.Store }
        | _ -> this
type Agent<'T> = MailboxProcessor<'T>
[<AbstractClass; Sealed>]
type AppHolder private () =
    static member private Processor = Agent.Start(fun inbox ->
        let rec loop (s : AppliationState) =
            async {
                let! action = inbox.Receive()
                let s' = s.HandleAction action
                Console.WriteLine("{s: " + s.Store.Length.ToString() + " s': " + s'.Store.Length.ToString())
                return! loop s'
            }
        loop AppliationState.Default)
    static member HandleAction (action:obj) =
        AppHolder.Processor.Post action
[<EntryPoint>]
let main argv =
    AppHolder.HandleAction "a"
    AppHolder.HandleAction "b"
    AppHolder.HandleAction "c"
    AppHolder.HandleAction "d"
    Console.ReadLine()
    0 // return an integer exit code
Expected output is:
s: 0 s': 1
s: 1 s': 2
s: 2 s': 3
s: 3 s': 4
What I get is:
s: 0 s': 1
s: 0 s': 1
s: 0 s': 1
s: 0 s': 1
From reading the documentation for MailboxProcessor and googling about it, my conclusion is that it is a queue of messages processed by a 'single thread'. Instead, it looks like the messages are all processed in parallel.
Am I totally off the mark here?

The issue is that you think AppHolder.Processor is going to be the same object each time, but it's actually a different MailboxProcessor each time. I changed your AppHolder code to be the following:
[<AbstractClass; Sealed>]
type AppHolder private () =
    static member private Processor =
        printfn "Starting..."
        Agent.Start(fun inbox ->
            let rec loop (s : AppliationState) =
                async {
                    let! action = inbox.Receive()
                    let s' = s.HandleAction action
                    printfn "{s: %A s': %A}" s s'
                    return! loop s'
                }
            loop AppliationState.Default)
    static member HandleAction (action:obj) =
        AppHolder.Processor.Post action
The only changes I made were to simplify that Console.WriteLine call to use printfn and %A to get more debugging detail, and to add a single printfn "Starting..." call that will be executed immediately before the MailboxProcessor is built and started. The output I got was:
Starting...
Starting...
Starting...
Starting...
{s: {Store = [];} s': {Store = ["b"];}}
{s: {Store = [];} s': {Store = ["d"];}}
{s: {Store = [];} s': {Store = ["c"];}}
{s: {Store = [];} s': {Store = ["a"];}}
Notice that the printfn "Starting..." line has been executed four times.
This catches a lot of F# newbies: the member keyword defines a property, not a field. Each time you evaluate the property, its body is evaluated afresh. So each time you access AppHolder.Processor, you get a new MailboxProcessor, and each of the four messages goes to a different agent that starts from the default state. See https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/members/properties for more details.
What you probably wanted was the following:
[<AbstractClass; Sealed>]
type AppHolder private () =
    static let processor =
        printfn "Starting..."
        Agent.Start(fun inbox ->
            // ...
        )
    static member HandleAction (action:obj) =
        processor.Post action
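The property-versus-field difference described above can be checked in isolation. The following is only a minimal sketch: the type name Holder and its members are hypothetical and not part of the original code. It compares object identity across two accesses of a static member, a static let binding, and an auto-property declared with static member val.

```fsharp
// Hypothetical type used only to illustrate evaluation behaviour.
type Holder private () =
    // static let: the right-hand side is evaluated once, when the type is initialised.
    static let field = obj ()
    // Property: the body is re-evaluated on every access, so each access builds a new object.
    static member Property = obj ()
    // Exposes the once-evaluated static let value.
    static member Field = field
    // Auto-property: the initialiser runs once and the value is stored.
    static member val AutoProperty = obj ()

// Two accesses of the plain property yield two different objects.
printfn "%b" (obj.ReferenceEquals(Holder.Property, Holder.Property))         // false
// The static let value and the auto-property are the same object each time.
printfn "%b" (obj.ReferenceEquals(Holder.Field, Holder.Field))               // true
printfn "%b" (obj.ReferenceEquals(Holder.AutoProperty, Holder.AutoProperty)) // true
```

Under these assumptions, either static let or static member val would give a single shared MailboxProcessor; a plain static member would not.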

I think the issue must be in your implementation of HandleAction. I implemented the following, and it produces the expected output.
open System
type ApplicationState =
    {
        Items: int list
    }
    static member Default = {Items = []}
    member this.HandleAction x = {this with Items = x::this.Items}
type Message = Add of int
let Processor = MailboxProcessor<Message>.Start(fun inbox ->
    let rec loop (s : ApplicationState) =
        async {
            let! (Add action) = inbox.Receive()
            let s' = s.HandleAction action
            Console.WriteLine("s: " + s.Items.Length.ToString() + " s': " + s'.Items.Length.ToString())
            return! loop s'
        }
    loop ApplicationState.Default)
Processor.Post (Add 1)
Processor.Post (Add 2)
Processor.Post (Add 3)
Processor.Post (Add 4)
// OUTPUT
// s: 0 s': 1
// s: 1 s': 2
// s: 2 s': 3
// s: 3 s': 4
EDIT
After seeing the updated code sample, I believe the correct F# solution is simply to switch the AppHolder type from a class to a module, because a let binding in a module is evaluated only once. The updated code would look like this:
open System
type AppliationState =
    {
        Store : string list
    }
    static member Default =
        {
            Store = List.empty
        }
    member this.HandleAction (action:obj) =
        match action with
        | :? string as a -> { this with Store = a :: this.Store }
        | _ -> this
type Agent<'T> = MailboxProcessor<'T>
module AppHolder =
    let private processor = Agent.Start(fun inbox ->
        let rec loop (s : AppliationState) =
            async {
                let! action = inbox.Receive()
                let s' = s.HandleAction action
                Console.WriteLine("{s: " + s.Store.Length.ToString() + " s': " + s'.Store.Length.ToString())
                return! loop s'
            }
        loop AppliationState.Default)
    let handleAction (action:obj) =
        processor.Post action
AppHolder.handleAction "a"
AppHolder.handleAction "b"
AppHolder.handleAction "c"
AppHolder.handleAction "d"
This outputs the same result as before:
{s: 0 s': 1
{s: 1 s': 2
{s: 2 s': 3
{s: 3 s': 4

Related

Ocaml - Check, given a list of transitions, the word is recognized

I created a function that returns a list of transitions. Transitions are of type int * char * int,
for example (0, 'E', 1); (1, 'A', 2). The valid alphabet for the transitions is A, C, G and T, and the char 'E' represents epsilon.
type transition = int * char * int;;
let get_start (a,_,_) = a;;
let get_char (_,a,_) = a;;
let get_end (_,_,a) = a;;
The initial state and the final state are stored in the following variables.
...
let i_strt = ref !state_initial;;
let i_end = ref !state_end;;
exception Out_of_loop;;
let seq = read_line();;(* string to be tested *)
let len_seq = String.length seq -1;;
let lst_trs_length = List.length !aux_transitions -1;; (* aux_transitions -> all transitions*)
let i = ref 0;;
let f = ref 0;;
while !i <= len_seq do
let c_r = seq.[i]in (* c_r = 'A' seq = ACGT*)
try
while !j <= lst_trs_length do
let aux_trs = List.nth !aux_transitions !j in (* 0 'E' 1 -> 1 'A' 2 ....*)
if (get_start aux_trs) = !i_strt then (* *)
let aux_chr = get_char aux_trs in (* 'A' *)
if aux_chr = c_r then(
i_strt := get_end aux_trs; (* i_strt = 1*)
raise Out_of_loop
)
else if aux_chr = 'E' then(
i_strt := get_end aux_trs;
j := -1
);
j := !j+1
done;
with Out_of_loop ->();
i := !i +1
done;
I am trying to use these two loops to check whether the string "seq" is recognized by the list of transitions, starting from the initial state. I am having trouble writing this function. I want a function that, given a list of transitions and a string, returns true if the string is recognized and false otherwise.

Mutate F# [<Struct>] Record

This code shows how to make a function mutate its input - one of the things we come to F# to avoid.
type Age = { mutable n : int }
let printInside a = printfn "Inside = %d" a.n
let inside a =
    a.n <- a.n + 1
    a.n
let a = {n = 1}
printInside a //a = 1
inside a
printInside a //a = 2
That being said, how do I do the same bad thing with [<Struct>] Records? I suspect that ref or byref may be involved but I just can't seem to get it to work.
type [<Struct>] Age = { mutable n : int }
let printInside a = printfn "Inside = %d" a.n
let inside a =
    a.n <- a.n + 1
    a.n
let a = {n = 1}
printInside a //a = 1
inside a
printInside a //a = 2
The fundamental issue is that a mutable field of a struct can only be modified if the struct value itself is mutable. As you suspected, byref is involved: the parameter a of inside must be declared as Age byref. We also need to declare a with let mutable, and lastly we need to use the & operator when calling inside. The & operator takes the address of a mutable value, which is how you pass an argument to a byref parameter.
type [<Struct>] Age = { mutable n : int }
let printInside a = printfn "Inside = %d" a.n
let inside (a : Age byref) =
    a.n <- a.n + 1
    a.n
let mutable a = {n = 1}
printInside a //a = 1
inside &a
printInside a //a = 2
Now that I get the pattern, here is a simple example (just an int instead of a struct record) of how to mutate values passed into a function:
let mutable a = 1
let mutate (a : byref<_>) = a <- a + 1
mutate &a
a //a = 2

Swift REPL Unexpected Behaviour

If I use the following code in Swift repl, I get unexpected results:
1> func addA(s: String)->String {
2. return s + "a"
3. }
4.
5. func addB(s: String)->String {
6. return s + "b"
7. }
8>
9> println(addA(""))
a
10> println(addB(""))
b
11> println(addA(addB("")))
ba
12> let p = addA(addB(""))
p: ((String)) = {
_core = {
_baseAddress = 0x0000000100500060
_countAndFlags = 2
_owner = (instance_type = Builtin.RawPointer = 0x0000000100500040)
}
}
13> println(p)
ba
14> let q = "a" + "b"
q: String = "ab"
Why does declaring p produce this behaviour, while declaring q does not?
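All that's happening here is that the REPL is letting you look inside Swift at some of the underlying implementation. p is an ordinary String whose value has already been computed: the dump shows the String's internal storage, and _countAndFlags = 2 matches the two characters of "ba". The REPL prints the raw fields instead of the usual summary, most likely because it reports the type as ((String)) rather than String. If you didn't want to see the sausage being made, you should not have entered the sausage factory.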

haskell and large binary

When broadcasting a BinaryString through a TChan, what gets copied: the whole binary or just the reference?
If the whole binary gets copied, how can I send only the reference?
Only a reference is written to the TChan, the payload is not copied. It would be far too inefficient to copy all the data all the time, and since the data is immutable (in general, you can cheat), it is safe to only transfer references.
To be a bit more precise than Daniel (and to confirm Daniel's suspicion in his comment): a pointer to the constructor of the BinaryString (do you mean ByteString?) is written to the TVar.
Let us confirm by checking the relevant code. TChan is built upon TVar and uses writeTVar to write the value, which is implemented in GHC.Conc.Sync (and re-exported by GHC.Conc and Control.Concurrent.STM.TVar):
-- |Write the supplied value into a TVar
writeTVar :: TVar a -> a -> STM ()
writeTVar (TVar tvar#) val = STM $ \s1# ->
    case writeTVar# tvar# val s1# of
        s2# -> (# s2#, () #)
The argument is just passed along to writeTVar#, a primitive operation implemented in rts/PrimOps.cmm:
stg_writeTVarzh
{
    W_ trec;
    W_ tvar;
    W_ new_value;
    /* Args: R1 = TVar closure */
    /*       R2 = New value */
    MAYBE_GC (R1_PTR & R2_PTR, stg_writeTVarzh); // Call to stmWriteTVar may allocate
    trec = StgTSO_trec(CurrentTSO);
    tvar = R1;
    new_value = R2;
    foreign "C" stmWriteTVar(MyCapability() "ptr", trec "ptr", tvar "ptr", new_value "ptr") [];
    jump %ENTRY_CODE(Sp(0));
}
This wraps the following code in rts/STM.c:
void stmWriteTVar(Capability *cap,
                  StgTRecHeader *trec,
                  StgTVar *tvar,
                  StgClosure *new_value) {
    StgTRecHeader *entry_in = NULL;
    TRecEntry *entry = NULL;
    TRACE("%p : stmWriteTVar(%p, %p)", trec, tvar, new_value);
    ASSERT (trec != NO_TREC);
    ASSERT (trec -> state == TREC_ACTIVE ||
            trec -> state == TREC_CONDEMNED);
    entry = get_entry_for(trec, tvar, &entry_in);
    if (entry != NULL) {
        if (entry_in == trec) {
            // Entry found in our trec
            entry -> new_value = new_value;
        } else {
            // Entry found in another trec
            TRecEntry *new_entry = get_new_entry(cap, trec);
            new_entry -> tvar = tvar;
            new_entry -> expected_value = entry -> expected_value;
            new_entry -> new_value = new_value;
        }
    } else {
        // No entry found
        StgClosure *current_value = read_current_value(trec, tvar);
        TRecEntry *new_entry = get_new_entry(cap, trec);
        new_entry -> tvar = tvar;
        new_entry -> expected_value = current_value;
        new_entry -> new_value = new_value;
    }
    TRACE("%p : stmWriteTVar done", trec);
}
And here we see that new_value is a pointer that is never dereferenced; it is simply stored as is.

Joining on the first finished thread?

I'm writing up a series of graph-searching algorithms in F# and thought it would be nice to take advantage of parallelization. I wanted to execute several threads in parallel and take the result of the first one to finish. I've got an implementation, but it's not pretty.
Two questions: is there a standard name for this sort of function? Not a Join or a JoinAll, but a JoinFirst? Second, is there a more idiomatic way to do this?
//implementation
open System.Threading
open System.Threading.Tasks
let makeAsync (locker:obj) (shared:'a option ref) (f:unit->'a) =
    async {
        let result = f()
        Monitor.Enter locker
        shared := Some result
        Monitor.Pulse locker
        Monitor.Exit locker
    }
let firstFinished test work =
    let result = ref Option.None
    let locker = new obj()
    let cancel = new CancellationTokenSource()
    work
    |> List.map (makeAsync locker result)
    |> List.map (fun a-> Async.StartAsTask(a, TaskCreationOptions.None, cancel.Token))
    |> ignore
    Monitor.Enter locker
    while (result.Value.IsNone || (not <| test result.Value.Value)) do
        Monitor.Wait locker |> ignore
    Monitor.Exit locker
    cancel.Cancel()
    match result.Value with
    | Some x-> x
    | None -> failwith "Don't pass in an empty list"
//end implementation
//testing
let delayReturn (ms:int) value =
    fun ()->
        Thread.Sleep ms
        value
let test () =
    let work = [ delayReturn 1000 "First!"; delayReturn 5000 "Second!" ]
    let result = firstFinished (fun _->true) work
    printfn "%s" result
Would it work to pass the CancellationTokenSource and test to each async and have the first that computes a valid result cancel the others?
let makeAsync (cancel:CancellationTokenSource) test f =
    let rec loop() =
        async {
            if cancel.IsCancellationRequested then
                return None
            else
                let result = f()
                if test result then
                    cancel.Cancel()
                    return Some result
                else return! loop()
        }
    loop()
let firstFinished test work =
    match work with
    | [] -> invalidArg "work" "Don't pass in an empty list"
    | _ ->
        let cancel = new CancellationTokenSource()
        work
        |> Seq.map (makeAsync cancel test)
        |> Seq.toArray
        |> Async.Parallel
        |> Async.RunSynchronously
        |> Array.pick id
This approach makes several improvements: 1) it uses only async (it's not mixed with Task, which is an alternative for doing the same thing; async is more idiomatic in F#); 2) there's no shared state other than the CancellationTokenSource, which was designed for that purpose; 3) the clean function-chaining approach makes it easy to add additional logic or transformations to the pipeline, including trivially enabling or disabling parallelism.
With the Task Parallel Library in .NET 4, this is called WaitAny. For example, the following snippet creates 10 tasks and waits for any of them to complete:
open System.Threading
Array.init 10 (fun _ ->
    Tasks.Task.Factory.StartNew(fun () ->
        Thread.Sleep 1000))
|> Tasks.Task.WaitAny
If you are OK with using Reactive Extensions (Rx) in your project, the joinFirst method can be implemented as:
let joinFirst (f : (unit->'a) list) =
    let c = new CancellationTokenSource()
    let o = f |> List.map (fun i ->
                    let j = fun() -> Async.RunSynchronously (async {return i() },-1,c.Token)
                    Observable.Defer(fun() -> Observable.Start(j))
                    )
              |> Observable.Amb
    let r = o.First()
    c.Cancel()
    r
Example usage:
[20..30] |> List.map (fun i -> fun() -> Thread.Sleep(i*100); printfn "%d" i; i)
         |> joinFirst |> printfn "Done %A"
Console.Read() |> ignore
Update:
Using MailboxProcessor:
type WorkMessage<'a> =
    | Done of 'a
    | GetFirstDone of AsyncReplyChannel<'a>
let joinFirst (f : (unit->'a) list) =
    let c = new CancellationTokenSource()
    let m = MailboxProcessor<WorkMessage<'a>>.Start(
                fun mbox -> async {
                    let afterDone a m =
                        match m with
                        | GetFirstDone rc ->
                            rc.Reply(a);
                            Some(async {return ()})
                        | _ -> None
                    let getDone m =
                        match m with
                        | Done a ->
                            c.Cancel()
                            Some (async {
                                    do! mbox.Scan(afterDone a)
                                    })
                        | _ -> None
                    do! mbox.Scan(getDone)
                    return ()
                } )
    f
    |> List.iter (fun t ->
        try
            Async.RunSynchronously (async {
                let out = t()
                m.Post(Done out)
                return () }, -1, c.Token)
        with
        | _ -> ())
    m.PostAndReply(fun rc -> GetFirstDone rc)
Unfortunately, there is no built-in operation for this provided by Async, but I'd still use F# asyncs, because they directly support cancellation. When you start a workflow using Async.Start, you can pass it a cancellation token and the workflow will automatically stop if the token is cancelled.
This means that you have to start the workflows explicitly (instead of using Async.Parallel), so the synchronization must be written by hand. Here is a simple version of an Async.Choice method that does that (at the moment, it doesn't handle exceptions):
open System.Threading
type Microsoft.FSharp.Control.Async with
    /// Takes several asynchronous workflows and returns
    /// the result of the first workflow that successfully completes
    static member Choice(workflows) =
        Async.FromContinuations(fun (cont, _, _) ->
            let cts = new CancellationTokenSource()
            let completed = ref false
            let lockObj = new obj()
            let synchronized f = lock lockObj f
            /// Called when a result is available - the function uses locks
            /// to make sure that it calls the continuation only once
            let completeOnce res =
                let run =
                    synchronized(fun () ->
                        if completed.Value then false
                        else completed := true; true)
                if run then cont res
            /// Workflow that will be started for each argument - run the
            /// operation, cancel pending workflows and then return result
            let runWorkflow workflow = async {
                let! res = workflow
                cts.Cancel()
                completeOnce res }
            // Start all workflows using cancellation token
            for work in workflows do
                Async.Start(runWorkflow work, cts.Token) )
Once we write this operation (which is a bit complex, but has to be written only once), solving the problem is quite easy. You can write your operations as async workflows and they'll be cancelled automatically when the first one completes:
let delayReturn n s = async {
    do! Async.Sleep(n)
    printfn "returning %s" s
    return s }
Async.Choice [ delayReturn 1000 "First!"; delayReturn 5000 "Second!" ]
|> Async.RunSynchronously
When you run this, it will print only "returning First!" because the second workflow will be cancelled.
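For readers on newer toolchains: FSharp.Core 4.0 and later ship a built-in Async.Choice, with a slightly different shape from the hand-written extension above. It takes a sequence of Async<'T option> and returns the first Some result (or None if every computation returns None). The sketch below assumes such a version of FSharp.Core; the helper name delaySome is hypothetical, and the delays are shortened for illustration.

```fsharp
// Hypothetical helper: waits for the given number of milliseconds,
// then reports success by returning Some value.
let delaySome (ms: int) (value: string) : Async<string option> =
    async {
        do! Async.Sleep ms
        return Some value
    }

// Built-in Async.Choice (FSharp.Core 4.0+): runs the computations in parallel
// and yields the first result that is Some.
let first =
    [ delaySome 100 "First!"; delaySome 2000 "Second!" ]
    |> Async.Choice
    |> Async.RunSynchronously

printfn "%A" first // Some "First!"
```

Because the built-in version works on option results, a computation can return None to signal "no valid answer", which covers the role of the test predicate in the original question.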
