F# Async Task Cancellation without Token - multithreading

I am trying to parse hundreds of C source files to map dozens of software signal variables to the names of physical hardware pins. I am trying to do this asynchronously in F#
IndirectMappedHWIO
|> Seq.map IndirectMapFromFile //this is the function with the regex in it
|> Async.Parallel
|> Async.RunSynchronously
The issue is that I cannot figure out how to pass in a CancellationToken to end my task. Each task is reading around 300 C files so I want to be able to stop the task's execution as soon as the regex matches. I tried using Thread.CurrentThread.Abort() but that does not seem to work. Any ideas on how to pass in a CancellationToken for each task? Or any other way to cancel a task based on a condition?
let IndirectMapFromFile pin =
async {
let innerFunc filename =
use streamReader = new StreamReader (filePath + filename)
while not streamReader.EndOfStream do
try
let line1 = streamReader.ReadLine()
streamReader.ReadLine() |> ignore
let line2 = streamReader.ReadLine()
if(obj.ReferenceEquals(line2, null)) then
Thread.CurrentThread.Abort() //DOES NOT WORK!!
else
let m1 = Regex.Match(line1, #"^.*((Get|Put)\w+).*$");
let m2 = Regex.Match(line2, #"\s*return\s*\((\s*" + pin.Name + "\s*)\);");
if (m1.Success && m2.Success) then
pin.VariableName <- m1.Groups.[1].Value
Thread.CurrentThread.Abort() //DOES NOT WORK!!
else
()
with
| ex -> ()
()
Directory.GetFiles(filePath, "Rte*") //all C source and header files that start with Rte
|> Array.iter innerFunc
}

Asyncs cancel on designated operations, such as on return!, let!, or do!; they don't just kill the thread in any unknown state, which is not generally safe. If you want your asyncs to stop, they could for example:
be recursive and iterate via return!. The caller would provide a CancellationToken to Async.RunSynchronously and catch the resulting OperationCanceledException when the job is done.
check some thread-safe state and decide to stop depending on it.
Note that those are effectively the same thing: the workers who iterate over the data check what is going on and cancel in an orderly fashion. In other words, it is clear when exactly they check for cancellation.
Using async cancellation might result in something like this:
let canceler = new System.Threading.CancellationTokenSource()
let rec worker myParameters =
async {
// do stuff
if amIDone() then canceler.Cancel()
else return! worker (...) }
let workers = (...) |> Async.Parallel
try Async.RunSynchronously(workers, -1, canceler.Token) |> ignore
with :? System.OperationCanceledException -> ()
Stopping from common state could look like this:
let keepGoing = ref true
let rec worker myParameters =
if !keepGoing then
// do stuff
if amIDone() then keepGoing := false
worker (...)
let makeWorker initParams = async { worker initParams }
// make workers
workers |> Async.Parallel |> Async.RunSynchronously |> ignore
If the exact timing of cancellation is relevant, I believe the second method may not be safe, as there may be a delay until other threads see the variable change. This doesn't seem to matter here though.

Related

tokio::join and different results

First I wrote the first implementation, then tried to simplify it to the second one, but surprisingly the second one is almost 3x slower. Why?
First implementation (faster):
let data: Vec<Arc<Data>> = vec![d1,d2,d3];
let mut handles = Vec::new();
for d in &data {
let d = d.clone();
handles.push(tokio::spawn(async move {
d.update().await;
}));
}
for handle in handles {
let _ = tokio::join!(handle);
}
Second implementation (slower):
let data: Vec<Arc<Data>> = vec![d1,d2,d3];
for d in &data {
let d = d.clone();
let _ = tokio::join!(tokio::spawn(async move {
d.update().await;
}));
}
In the first example you spawn all your tasks onto the executor, allowing them to run in parallel, and then you join all of them in sequence. In the second example you spawn each task onto the executor in sequence, but you wait for that task to finish before spawning the next one, meaning you get zero parallelism, and thus no speedup. Again, the important observation to make is that in the first example all of your tasks are making progress in the background even though you're waiting for them to finish one by one. Also something like join_all would probably be more appropriate for waiting on the tasks in the first example.

Swift: How to interrupt a sort routine

I want to give users the ability to stop a sorting routine if it takes too long.
I tried to use DispatchWorkItem.cancel. However, this does not actually stop processes that have already started.
let myArray = [String]() // potentially 230M elements!
...
workItem = DispatchWorkItem {
let result = myArray.sorted()
DispatchQueue.main.async {
print ("done")
}
}
// if process is too long, user clicks "Cancel" =>
workItem.cancel() // does not stop sort
How can I kill the workItem's thread?
I don't have access to the sorted routine, therefore I cannot insert tests to check if current thread is in "cancelled" status...
As you have deduced, one simply cannot kill a work item without periodically checking isCancelled and manually performing an early exit if it is set.
There are two options:
You can used sorted(by:), test for isCancelled there, throwing an error if it is canceled. That achieves the desired early exit.
That might look like:
func cancellableSort() -> DispatchWorkItem {
var item: DispatchWorkItem!
item = DispatchWorkItem() {
let unsortedArray = (0..<10_000_000).shuffled()
let sortedArray = try? unsortedArray.sorted { (obj1, obj2) -> Bool in
if item.isCancelled {
throw SortError.cancelled
}
return obj1 < obj2
}
// do something with your sorted array
item = nil
}
DispatchQueue.global().async(execute: item)
return item
}
Where
enum SortError: Error {
case cancelled
}
Be forewarned that even in release builds, this can have a dramatic impact on performance. So you might want to benchmark this.
You could just write your own sort routine, inserting your own test of isCancelled inside the algorithm. This gives you more control over where precisely you perform the test (i.e., you might not want to do it for every comparison, but rather at some higher level loop within the algorithm, thereby minimizing the performance impact). And given the number of records, this gives you a chance to pick an algorithm best suited for your data set.
Obviously, when benchmarking these alternatives, make sure you test optimized/release builds so that your results are not skewed by your build settings.
As an aside, you might considering using Operation, too, as its handling of cancelation is more elegant, IMHO. Plus you can have a dedicated object for the sort operation, which is cleaner.

F# parallel design pattern

I am trying to run some same task in parallel in a F# Console project.
The task is as follow
Dequeue a table name from a ConcurrentQueue(this queue contains the table names my program need to process)
Open a SqlDataReader for the table
Write each row in the SqlDataReader to a StreamWriter
Zip the file created by the StreamWriter
Repeat 1 - 4
So basically each task is a while loop (pose as a recursion) to contiunously process tables. And I would like to start 4 tasks in parallel, and another thing is I would like to stop execution with a user keystroke on the Console, say Enter key. But the execution should only be stopped if the current task has complete step 4.
I have tried the following
let rec DownloadHelper (cq:ConcurrentQueue<Table>) sqlConn =
let success, tb = cq.TryDequeue()
if success then
printfn "Processing %s %s" tb.DBName tb.TBName
Table2CSV tb.DBName tb.TBName sqlConn
DownloadHelper cq sqlConn
let DownloadTable (cq:ConcurrentQueue<Table>) connectionString=
use con = new SqlConnection(connectionString)
con.Open()
DownloadHelper cq con
let asyncDownloadTask = async { return DownloadTable cq connectionString}
let asyncMultiDownload =
asyncDownloadTask
|> List.replicate 4
|> Async.Parallel
asyncMultiDownload
|>Async.RunSynchronously
|>ignore
There are two problems with the above code,
It blockes the main thread, thus I dont know how to do the keystroke part
I am not sure how to stop execution gracefully.
My second try is as below to use CancellationToken,
let tokenSource = new CancellationTokenSource()
let cq = PrepareJobs connectionString
let asyncDownloadTask = async { DownloadTable cq connectionString}
let task = async {
asyncDownloadTask
|> List.replicate 4
|> Async.Parallel
|>ignore}
let val1 = Async.Start(task, cancellationToken =tokenSource.Token)
Console.ReadLine() |> ignore
tokenSource.Cancel()
Console.ReadLine() |> ignore
0
But it seems I am not even able to start the task at all.
There are three problems with your code.
First, the DownloadHelper should do one table only.
By making it recursive, you are taking too much control and
inhibiting parallelism.
Second, simply placing an operation in an async expression does not
magically make it async. Unless the DownloadTable function itself
is async, the code will block until it is finished.
So when you run four downloads in parallel, once started, they will
all run to completion, regardless of the cancellation token.
Thirdly, in your second example you use Async.Parallel
but then throw the output away, which is why your task does nothing!
I think what you wanted to do was throw away the result of the async, not the async itself.
Here's my version of your code, to demonstrate these points.
First, a dummy function that takes up time:
let longAtomicOperation milliSecs =
let sw = System.Diagnostics.Stopwatch()
let r = System.Random()
let mutable total = 0.0
sw.Start()
while sw.ElapsedMilliseconds < int64 milliSecs do
total <- total + sin (r.NextDouble())
// return something
total
// test
#time
longAtomicOperation 2000
#time
// Real: 00:00:02.000, CPU: 00:00:02.000, GC gen0: 0, gen1: 0, gen2: 0
Note that this function is not async -- once started it will run to completion.
Now let's put it an an async:
let asyncTask id = async {
// note that NONE of the operations are async
printfn "Started %i" id
let result = longAtomicOperation 5000 // 5 seconds
printfn "Finished %i" id
return result
}
None of the operations in the async block are async, so we are not getting
any benefit.
Here's the code to create four tasks in parallel:
let fourParallelTasks = async {
let! results =
List.init 4 asyncTask
|> Async.Parallel
// ignore
return ()
}
The result of the Async.Parallel is not ignored, but is assigned to a value,
which forces the tasks to be run. The async expression as a whole returns unit though.
If we test it:
open System.Threading
// start the task
let tokenSource = new CancellationTokenSource()
do Async.Start(fourParallelTasks, cancellationToken = tokenSource.Token)
// wait for a keystroke
System.Console.WriteLine("press a key to cancel")
System.Console.ReadLine() |> ignore
tokenSource.Cancel()
System.Console.ReadLine() |> ignore
We get output that looks like this, even if a key is pressed,
because once started, each task will run to completion:
press a key to cancel
Started 3
Started 1
Started 2
Started 0
Finished 1
Finished 3
Finished 2
Finished 0
On the other hand, if we create a serial version, like this:
let fourSerialTasks = async {
let! result1 = asyncTask 1
let! result2 = asyncTask 2
let! result3 = asyncTask 3
let! result4 = asyncTask 4
// ignore
return ()
}
Then, even though the tasks are atomic, the cancellation token is tested between
each step, which allows cancellation of the subsequence tasks.
// start the task
let tokenSource = new CancellationTokenSource()
do Async.Start(fourSerialTasks, cancellationToken = tokenSource.Token)
// wait for a keystroke
System.Console.WriteLine("press a key to cancel")
System.Console.ReadLine() |> ignore
tokenSource.Cancel()
System.Console.ReadLine() |> ignore
The above code can be cancelled between each step when a key is pressed.
To process all elements of the queue this way in batches of four,
just convert the parallel version into a loop:
let rec processQueueAsync() = async {
let! result = processFourElementsAsync()
if result <> QueueEmpty then
do! processQueueAsync()
// ignore
return ()
}
Finally, to me, using async is not about running things in parallel so much
as it is to write non-blocking code. So if your library code is blocking, the async
approach is not going to provide too much benefit.
To ensure your code is non-blocking, you need to using the async versions of SqlDataReader methods
in your helper, such as NextResultAsync.

Access to modified closure - ref int

int count = itemsToValidate.Count;
foreach(var item in itemsToValidate)
{
item.ValidateAsync += (x, y) => this.HandleValidate(ref count);
}
private void HandleValidate(ref int x)
{
--x;
if (x == 0)
{
// All items are validated.
}
}
For the above code resharper complained "Access to Modified Closure". Doesn't do that if I change that to type of object. Why is this a closure, even though I am passing by ref ?
This happens all the time
ReSharper is warning you that count is implicitly captured by the lambdas that you are assigning as "validation complete" event handlers, and that its value may well change between the time the lambda is created (i.e. when you assign the event handler) and the time when it is invoked. If this happens, the lambda will not see the value one would intuitively expect.
An example:
int count = itemsToValidate.Count;
foreach(var item in itemsToValidate)
{
item.ValidateAsync += (x, y) => this.HandleValidate(ref count);
}
// afterwards, at some point before the handlers get invoked:
count = 0;
In this instance the handlers will read the value of count as 0 instead of itemsToValidate.Count -- which might be called "obvious", but is surprising and counter-intuitive to many developers not familiar with the mechanics of lambdas.
And we usually solve it like this
The usual solution to "shut R# up" is to move the captured variable in an inner scope, where it is much less accessible and R# can be prove that it cannot be modified until the lambda is evaluated:
int count = itemsToValidate.Count;
foreach(var item in itemsToValidate)
{
int inner = count; // this makes inner impossible to modify
item.ValidateAsync += (x, y) => this.HandleValidate(ref inner);
}
// now this will of course not affect what the lambdas do
count = 0;
But your case is special
Your particular case is a comparatively rare one where you specifically want this behavior, and using the above trick would actually make the program behave incorrectly (you need the captured references to point to the same count).
The correct solution: disable this warning using the special line comments that R# recognizes.

What are Lua coroutines even for? Why doesn't this code work as I expect it?

I'm having trouble understanding this code... I was expecting something similar to threading where I would get an output with random "nooo" and "yaaaay"s interspersed with each other as they both do the printing asynchronously, but rather I discovered that the main thread seems to block on the first calling of coroutine.resume() and thus prevents the next from being started until the first has yielded.
If this is the intended operation coroutines, what are they useful for, and how would I achieve the goal I was hoping for? Would I have to implement my own scheduler for these coroutines to operate asynchronously?, because that seems messy, and I may as well use functions!
co1 = coroutine.create(function ()
local i = 1
while i < 200 do
print("nooo")
i = i + 1
end
coroutine.yield()
end)
co2 = coroutine.create(function ()
local i = 1
while i < 200 do
print("yaaaay")
i = i + 1
end
coroutine.yield()
end)
coroutine.resume(co1)
coroutine.resume(co2)
Coroutines aren't threads.
Coroutines are like threads that are never actively scheduled. So yes you are kinda correct that you would have to write you own scheduler to have both coroutines run simultaneously.
However you are missing the bigger picture when it comes to coroutines. Check out wikipedia's list of coroutine uses. Here is one concrete example that might guide you in the right direction.
-- level script
-- a volcano erupts every 2 minutes
function level_with_volcano( interface )
while true do
wait(seconds(5))
start_eruption_volcano()
wait(frames(10))
s = play("rumble_sound")
wait( end_of(s) )
start_camera_shake()
-- more stuff
wait(minutes(2))
end
end
The above script could be written to run iteratively with a switch statement and some clever state variables. But it is much more clear when written as a coroutine. The above script could be a thread but do you really need to dedicate a kernel thread to this simple code. A busy game level could have 100's of these coroutines running without impacting performance. However if each of these were a thread you might get away with 20-30 before performance started to suffer.
A coroutine is meant to allow me to write code that stores state on the stack so that I can stop running it for a while (the wait functions) and start it again where I left off.
Since there have been a number of comments asking how to implement the wait function that would make deft_code's example work, I've decided to write a possible implementation. The general idea is that we have a scheduler with a list of coroutines, and the scheduler decides when to return control to the coroutines after they give up control with their wait calls. This is desirable because it makes asynchronous code be readable and easy to reason about.
This is only one possible use of coroutines, they are a more general abstraction tool that can be used for many different purposes (such as writing iterators and generators, writing stateful stream processing objects (for example, multiple stages in a parser), implementing exceptions and continuations, etc.).
First: the scheduler definition:
local function make_scheduler()
local script_container = {}
return {
continue_script = function(frame, script_thread)
if script_container[frame] == nil then
script_container[frame] = {}
end
table.insert(script_container[frame],script_thread)
end,
run = function(frame_number, game_control)
if script_container[frame_number] ~= nil then
local i = 1
--recheck length every time, to allow coroutine to resume on
--the same frame
local scripts = script_container[frame_number]
while i <= #scripts do
local success, msg =
coroutine.resume(scripts[i], game_control)
if not success then error(msg) end
i = i + 1
end
end
end
}
end
Now, initialising the world:
local fps = 60
local frame_number = 1
local scheduler = make_scheduler()
scheduler.continue_script(frame_number, coroutine.create(function(game_control)
while true do
--instead of passing game_control as a parameter, we could
--have equivalently put these values in _ENV.
game_control.wait(game_control.seconds(5))
game_control.start_eruption_volcano()
game_control.wait(game_control.frames(10))
s = game_control.play("rumble_sound")
game_control.wait( game_control.end_of(s) )
game_control.start_camera_shake()
-- more stuff
game_control.wait(game_control.minutes(2))
end
end))
The (dummy) interface to the game:
local game_control = {
seconds = function(num)
return math.floor(num*fps)
end,
minutes = function(num)
return math.floor(num*fps*60)
end,
frames = function(num) return num end,
end_of = function(sound)
return sound.start+sound.duration-frame_number
end,
wait = function(frames_to_wait_for)
scheduler.continue_script(
frame_number+math.floor(frames_to_wait_for),
coroutine.running())
coroutine.yield()
end,
start_eruption_volcano = function()
--obviously in a real game, this could
--affect some datastructure in a non-immediate way
print(frame_number..": The volcano is erupting, BOOM!")
end,
start_camera_shake = function()
print(frame_number..": SHAKY!")
end,
play = function(soundname)
print(frame_number..": Playing: "..soundname)
return {name = soundname, start = frame_number, duration = 30}
end
}
And the game loop:
while true do
scheduler.run(frame_number,game_control)
frame_number = frame_number+1
end
co1 = coroutine.create(
function()
for i = 1, 100 do
print("co1_"..i)
coroutine.yield(co2)
end
end
)
co2 = coroutine.create(
function()
for i = 1, 100 do
print("co2_"..i)
coroutine.yield(co1)
end
end
)
for i = 1, 100 do
coroutine.resume(co1)
coroutine.resume(co2)
end

Resources