I am trying to run some same task in parallel in a F# Console project.
The task is as follow
Dequeue a table name from a ConcurrentQueue(this queue contains the table names my program need to process)
Open a SqlDataReader for the table
Write each row in the SqlDataReader to a StreamWriter
Zip the file created by the StreamWriter
Repeat 1 - 4
So basically each task is a while loop (pose as a recursion) to contiunously process tables. And I would like to start 4 tasks in parallel, and another thing is I would like to stop execution with a user keystroke on the Console, say Enter key. But the execution should only be stopped if the current task has complete step 4.
I have tried the following
let rec DownloadHelper (cq:ConcurrentQueue<Table>) sqlConn =
let success, tb = cq.TryDequeue()
if success then
printfn "Processing %s %s" tb.DBName tb.TBName
Table2CSV tb.DBName tb.TBName sqlConn
DownloadHelper cq sqlConn
let DownloadTable (cq:ConcurrentQueue<Table>) connectionString=
use con = new SqlConnection(connectionString)
con.Open()
DownloadHelper cq con
let asyncDownloadTask = async { return DownloadTable cq connectionString}
let asyncMultiDownload =
asyncDownloadTask
|> List.replicate 4
|> Async.Parallel
asyncMultiDownload
|>Async.RunSynchronously
|>ignore
There are two problems with the above code,
It blockes the main thread, thus I dont know how to do the keystroke part
I am not sure how to stop execution gracefully.
My second try is as below to use CancellationToken,
let tokenSource = new CancellationTokenSource()
let cq = PrepareJobs connectionString
let asyncDownloadTask = async { DownloadTable cq connectionString}
let task = async {
asyncDownloadTask
|> List.replicate 4
|> Async.Parallel
|>ignore}
let val1 = Async.Start(task, cancellationToken =tokenSource.Token)
Console.ReadLine() |> ignore
tokenSource.Cancel()
Console.ReadLine() |> ignore
0
But it seems I am not even able to start the task at all.
There are three problems with your code.
First, the DownloadHelper should do one table only.
By making it recursive, you are taking too much control and
inhibiting parallelism.
Second, simply placing an operation in an async expression does not
magically make it async. Unless the DownloadTable function itself
is async, the code will block until it is finished.
So when you run four downloads in parallel, once started, they will
all run to completion, regardless of the cancellation token.
Thirdly, in your second example you use Async.Parallel
but then throw the output away, which is why your task does nothing!
I think what you wanted to do was throw away the result of the async, not the async itself.
Here's my version of your code, to demonstrate these points.
First, a dummy function that takes up time:
let longAtomicOperation milliSecs =
let sw = System.Diagnostics.Stopwatch()
let r = System.Random()
let mutable total = 0.0
sw.Start()
while sw.ElapsedMilliseconds < int64 milliSecs do
total <- total + sin (r.NextDouble())
// return something
total
// test
#time
longAtomicOperation 2000
#time
// Real: 00:00:02.000, CPU: 00:00:02.000, GC gen0: 0, gen1: 0, gen2: 0
Note that this function is not async -- once started it will run to completion.
Now let's put it an an async:
let asyncTask id = async {
// note that NONE of the operations are async
printfn "Started %i" id
let result = longAtomicOperation 5000 // 5 seconds
printfn "Finished %i" id
return result
}
None of the operations in the async block are async, so we are not getting
any benefit.
Here's the code to create four tasks in parallel:
let fourParallelTasks = async {
let! results =
List.init 4 asyncTask
|> Async.Parallel
// ignore
return ()
}
The result of the Async.Parallel is not ignored, but is assigned to a value,
which forces the tasks to be run. The async expression as a whole returns unit though.
If we test it:
open System.Threading
// start the task
let tokenSource = new CancellationTokenSource()
do Async.Start(fourParallelTasks, cancellationToken = tokenSource.Token)
// wait for a keystroke
System.Console.WriteLine("press a key to cancel")
System.Console.ReadLine() |> ignore
tokenSource.Cancel()
System.Console.ReadLine() |> ignore
We get output that looks like this, even if a key is pressed,
because once started, each task will run to completion:
press a key to cancel
Started 3
Started 1
Started 2
Started 0
Finished 1
Finished 3
Finished 2
Finished 0
On the other hand, if we create a serial version, like this:
let fourSerialTasks = async {
let! result1 = asyncTask 1
let! result2 = asyncTask 2
let! result3 = asyncTask 3
let! result4 = asyncTask 4
// ignore
return ()
}
Then, even though the tasks are atomic, the cancellation token is tested between
each step, which allows cancellation of the subsequence tasks.
// start the task
let tokenSource = new CancellationTokenSource()
do Async.Start(fourSerialTasks, cancellationToken = tokenSource.Token)
// wait for a keystroke
System.Console.WriteLine("press a key to cancel")
System.Console.ReadLine() |> ignore
tokenSource.Cancel()
System.Console.ReadLine() |> ignore
The above code can be cancelled between each step when a key is pressed.
To process all elements of the queue this way in batches of four,
just convert the parallel version into a loop:
let rec processQueueAsync() = async {
let! result = processFourElementsAsync()
if result <> QueueEmpty then
do! processQueueAsync()
// ignore
return ()
}
Finally, to me, using async is not about running things in parallel so much
as it is to write non-blocking code. So if your library code is blocking, the async
approach is not going to provide too much benefit.
To ensure your code is non-blocking, you need to using the async versions of SqlDataReader methods
in your helper, such as NextResultAsync.
Related
First I wrote the first implementation, then tried to simplify it to the second one, but surprisingly the second one is almost 3x slower. Why?
First implementation (faster):
let data: Vec<Arc<Data>> = vec![d1,d2,d3];
let mut handles = Vec::new();
for d in &data {
let d = d.clone();
handles.push(tokio::spawn(async move {
d.update().await;
}));
}
for handle in handles {
let _ = tokio::join!(handle);
}
Second implementation (slower):
let data: Vec<Arc<Data>> = vec![d1,d2,d3];
for d in &data {
let d = d.clone();
let _ = tokio::join!(tokio::spawn(async move {
d.update().await;
}));
}
In the first example you spawn all your tasks onto the executor, allowing them to run in parallel, and then you join all of them in sequence. In the second example you spawn each task onto the executor in sequence, but you wait for that task to finish before spawning the next one, meaning you get zero parallelism, and thus no speedup. Again, the important observation to make is that in the first example all of your tasks are making progress in the background even though you're waiting for them to finish one by one. Also something like join_all would probably be more appropriate for waiting on the tasks in the first example.
I've been using multiple threads for a long time, yet can not explain such a simple case.
import java.util.concurrent.Executors
import scala.concurrent._
implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1))
def addOne(x: Int) = Future(x + 1)
def addTwo(x: Int) = Future {addOne(x + 1)}
addTwo(1)
// res5: Future[Future[Int]] = Future(Success(Future(Success(3))))
To my surprise, it works. And I don't know why.
Question:
Why given one thread can it execute two Futures at the same time?
My expectation:
The first Future (addTwo) is occupying the one and only thread (newFixedThreadPool(1)), then it calls another Future (addOne) which again needs another thread.
So the program should end up starved for threads and get stuck.
The reason that your code is working, is that both futures will be executed by the same thread. The ExecutionContext that you are creating will not use a Thread directly for each Future but will instead schedule tasks (Runnable instances) to be executed. In case no more threads are available in the pool these tasks will be put into a BlockingQueue waiting to be executed. (See ThreadPoolExecutor API for details)
If you look at the implementation of Executors.newFixedThreadPool(1) you'll see that creates an Executor with an unbounded queue:
new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue[Runnable])
To get the effect of thread-starvation that you were looking for, you could create an executor with a limited queue yourself:
implicit val ec = ExecutionContext.fromExecutor(new ThreadPoolExecutor(1, 1, 0L,
TimeUnit.MILLISECONDS, new ArrayBlockingQueue[Runnable](1)))
Since the minimal capacity of ArrayBlockingQueue is 1 you would need three futures to reach the limit, and you would also need to add some code to be executed on the result of the future, to keep them from completing (in the example below I do this by adding .map(identity))
The following example
import scala.concurrent._
implicit val ec = ExecutionContext.fromExecutor(new ThreadPoolExecutor(1, 1, 0L,
TimeUnit.MILLISECONDS, new ArrayBlockingQueue[Runnable](1)))
def addOne(x: Int) = Future {
x + 1
}
def addTwo(x: Int) = Future {
addOne(x + 1) .map(identity)
}
def addThree(x: Int) = Future {
addTwo(x + 1).map(identity)
}
println(addThree(1))
fails with
java.util.concurrent.RejectedExecutionException: Task scala.concurrent.impl.CallbackRunnable#65a264b6 rejected from java.util.concurrent.ThreadPoolExecutor#10d078f4[Running, pool size = 1, active threads = 1, queued tasks = 1, completed tasks = 1]
expand it to Promise is easily to undunderstand
val p1 = Promise[Future[Int]]
ec.execute(() => {
// the fist task is start run
val p2 = Promise[Int]
//the second task is submit , but no run
ec.execute(() => {
p2.complete(Success(1))
println(s"task 2 -> p1:${p1},p2:${p2}")
})
//here the p1 is completed, not wait p2.future finish
p1.complete(Success(p2.future))
println(s"task 1 -> p1:${p1},p2:${p2}")// you can see the p1 is completed but the p2 have not
//first task is finish, will run second task
})
val result: Future[Future[Int]] = p1.future
Thread.sleep(1000)
println(result)
Note: The problem that I solve has only educational purpose, I know that abstraction that I want to create is error prone and so on... I don't need fast solution, I need explanation.
In the book I am reading there is exercise that says that I need to implement SyncVar which has the following interface:
class SyncVar[T] {
def get(): T = ???
def put(x: T): Unit = ???
}
My comment: Alright seems understandable, need some sync variable that I can put or get.
A SyncVar object is used to exchange values between two or more threads.
When created, the SyncVar object is empty:
° Calling get throws an exception
° Calling put adds a value to the SyncVar object
After a value is added to a SyncVar object, we can say that it is non-empty:
° Calling get returns the current value, and changes the state to empty
° Calling put throws an exception
My thoughts: This is variable that throws exception on empty value when calling get, or put when we have a value, when we call get it clears previous value. Seems like I need to use Option.
So I provide the following implementation:
class SyncVar[T] {
var value: Option[T] = None
def get(): T = value match {
case Some(t) => this.synchronized {
value = None
t
}
case None => throw new IllegalArgumentException("error get")
}
def put(x: T): Unit = this.synchronized{
value match {
case Some(t) => throw new IllegalArgumentException("error put")
case None => value = Some(x)
}
}
def isEmpty = value.isEmpty
def nonEmpty = value.nonEmpty
}
My comment:
Synchronously invoking put and get, also have isEmpty and nonEmpty
The next task makes me confused:
The SyncVar object from the previous exercise can be cumbersome to use,
due to exceptions when the SyncVar object is in an invalid state. Implement
a pair of methods isEmpty and nonEmpty on the SyncVar object. Then,
implement a producer thread that transfers a range of numbers 0 until 15
to the consumer thread that prints them.
As I understand I need two threads:
//producer thread that produces numbers from 1 to 15
val producerThread = thread{
for (i <- 0 until 15){
println(s"$i")
if (syncVar.isEmpty) {
println(s"put $i")
syncVar.put(i)
}
}
}
//consumer that prints value from 0 to 15
val consumerThread = thread{
while (true) {
if (syncVar.nonEmpty) println(s"get ${syncVar.get()}")
}
}
Question:
But this code caused by nondeterminism, so it has different result each time, while I need to print numbers from 1 to 15 (in right order). Could you explain me what is wrong with my solution?
First, your synchronized in get is too narrow. It should surround the entire method, like in put (can you think why?).
After fixing, consider this scenario:
producerThread puts 0 into syncVar.
producerThread continues to run and tries to put 1. syncVar.isEmpty returns false so it doesn't put 1. It continues to loop with next i instead.
consumerThread gets 0.
producerThread puts 2.
Etc. So consumerThread can never get and print 1, because producerThread never puts it there.
Think what producerThread should do if syncVar is not empty and what consumerThread should do if it is.
Thanks to #Alexey Romanov, finally I implement transfer method:
Explanation:
The idea is, that producer thread checks is syncVar is empty, if it is it puts it, otherwise it waits with while(syncVar.nonEmpty){} (using busy waiting, which is bad practice, but it is important to know about it in educational purpose) and when we leaving the loop(stop busy waiting) we putting variable and leaving for loop for i == 0. Meanwhile consumer thread busy waiting forever, and reads variable when it is nonEmpty.
Solution:
def transfer() = {
val syncVar = new SyncVar[Int]
val producerThread = thread{
log("producer thread started")
for (i <- 0 until 15){
if (syncVar.isEmpty) {
syncVar.put(i)
} else {
while (syncVar.nonEmpty) {
log("busy wating")
}
if (syncVar.isEmpty) {
syncVar.put(i)
}
}
}
}
val consumerThread = thread{
log("consumer thread started")
while (true) {
if (syncVar.nonEmpty) {
syncVar.get()
}
}
}
}
I am trying to parse hundreds of C source files to map dozens of software signal variables to the names of physical hardware pins. I am trying to do this asynchronously in F#
IndirectMappedHWIO
|> Seq.map IndirectMapFromFile //this is the function with the regex in it
|> Async.Parallel
|> Async.RunSynchronously
The issue is that I cannot figure out how to pass in a CancellationToken to end my task. Each task is reading around 300 C files so I want to be able to stop the task's execution as soon as the regex matches. I tried using Thread.CurrentThread.Abort() but that does not seem to work. Any ideas on how to pass in a CancellationToken for each task? Or any other way to cancel a task based on a condition?
let IndirectMapFromFile pin =
async {
let innerFunc filename =
use streamReader = new StreamReader (filePath + filename)
while not streamReader.EndOfStream do
try
let line1 = streamReader.ReadLine()
streamReader.ReadLine() |> ignore
let line2 = streamReader.ReadLine()
if(obj.ReferenceEquals(line2, null)) then
Thread.CurrentThread.Abort() //DOES NOT WORK!!
else
let m1 = Regex.Match(line1, #"^.*((Get|Put)\w+).*$");
let m2 = Regex.Match(line2, #"\s*return\s*\((\s*" + pin.Name + "\s*)\);");
if (m1.Success && m2.Success) then
pin.VariableName <- m1.Groups.[1].Value
Thread.CurrentThread.Abort() //DOES NOT WORK!!
else
()
with
| ex -> ()
()
Directory.GetFiles(filePath, "Rte*") //all C source and header files that start with Rte
|> Array.iter innerFunc
}
Asyncs cancel on designated operations, such as on return!, let!, or do!; they don't just kill the thread in any unknown state, which is not generally safe. If you want your asyncs to stop, they could for example:
be recursive and iterate via return!. The caller would provide a CancellationToken to Async.RunSynchronously and catch the resulting OperationCanceledException when the job is done.
check some thread-safe state and decide to stop depending on it.
Note that those are effectively the same thing: the workers who iterate over the data check what is going on and cancel in an orderly fashion. In other words, it is clear when exactly they check for cancellation.
Using async cancellation might result in something like this:
let canceler = new System.Threading.CancellationTokenSource()
let rec worker myParameters =
async {
// do stuff
if amIDone() then canceler.Cancel()
else return! worker (...) }
let workers = (...) |> Async.Parallel
try Async.RunSynchronously(workers, -1, canceler.Token) |> ignore
with :? System.OperationCanceledException -> ()
Stopping from common state could look like this:
let keepGoing = ref true
let rec worker myParameters =
if !keepGoing then
// do stuff
if amIDone() then keepGoing := false
worker (...)
let makeWorker initParams = async { worker initParams }
// make workers
workers |> Async.Parallel |> Async.RunSynchronously |> ignore
If the exact timing of cancellation is relevant, I believe the second method may not be safe, as there may be a delay until other threads see the variable change. This doesn't seem to matter here though.
I'm having trouble understanding this code... I was expecting something similar to threading where I would get an output with random "nooo" and "yaaaay"s interspersed with each other as they both do the printing asynchronously, but rather I discovered that the main thread seems to block on the first calling of coroutine.resume() and thus prevents the next from being started until the first has yielded.
If this is the intended operation coroutines, what are they useful for, and how would I achieve the goal I was hoping for? Would I have to implement my own scheduler for these coroutines to operate asynchronously?, because that seems messy, and I may as well use functions!
co1 = coroutine.create(function ()
local i = 1
while i < 200 do
print("nooo")
i = i + 1
end
coroutine.yield()
end)
co2 = coroutine.create(function ()
local i = 1
while i < 200 do
print("yaaaay")
i = i + 1
end
coroutine.yield()
end)
coroutine.resume(co1)
coroutine.resume(co2)
Coroutines aren't threads.
Coroutines are like threads that are never actively scheduled. So yes you are kinda correct that you would have to write you own scheduler to have both coroutines run simultaneously.
However you are missing the bigger picture when it comes to coroutines. Check out wikipedia's list of coroutine uses. Here is one concrete example that might guide you in the right direction.
-- level script
-- a volcano erupts every 2 minutes
function level_with_volcano( interface )
while true do
wait(seconds(5))
start_eruption_volcano()
wait(frames(10))
s = play("rumble_sound")
wait( end_of(s) )
start_camera_shake()
-- more stuff
wait(minutes(2))
end
end
The above script could be written to run iteratively with a switch statement and some clever state variables. But it is much more clear when written as a coroutine. The above script could be a thread but do you really need to dedicate a kernel thread to this simple code. A busy game level could have 100's of these coroutines running without impacting performance. However if each of these were a thread you might get away with 20-30 before performance started to suffer.
A coroutine is meant to allow me to write code that stores state on the stack so that I can stop running it for a while (the wait functions) and start it again where I left off.
Since there have been a number of comments asking how to implement the wait function that would make deft_code's example work, I've decided to write a possible implementation. The general idea is that we have a scheduler with a list of coroutines, and the scheduler decides when to return control to the coroutines after they give up control with their wait calls. This is desirable because it makes asynchronous code be readable and easy to reason about.
This is only one possible use of coroutines, they are a more general abstraction tool that can be used for many different purposes (such as writing iterators and generators, writing stateful stream processing objects (for example, multiple stages in a parser), implementing exceptions and continuations, etc.).
First: the scheduler definition:
local function make_scheduler()
local script_container = {}
return {
continue_script = function(frame, script_thread)
if script_container[frame] == nil then
script_container[frame] = {}
end
table.insert(script_container[frame],script_thread)
end,
run = function(frame_number, game_control)
if script_container[frame_number] ~= nil then
local i = 1
--recheck length every time, to allow coroutine to resume on
--the same frame
local scripts = script_container[frame_number]
while i <= #scripts do
local success, msg =
coroutine.resume(scripts[i], game_control)
if not success then error(msg) end
i = i + 1
end
end
end
}
end
Now, initialising the world:
local fps = 60
local frame_number = 1
local scheduler = make_scheduler()
scheduler.continue_script(frame_number, coroutine.create(function(game_control)
while true do
--instead of passing game_control as a parameter, we could
--have equivalently put these values in _ENV.
game_control.wait(game_control.seconds(5))
game_control.start_eruption_volcano()
game_control.wait(game_control.frames(10))
s = game_control.play("rumble_sound")
game_control.wait( game_control.end_of(s) )
game_control.start_camera_shake()
-- more stuff
game_control.wait(game_control.minutes(2))
end
end))
The (dummy) interface to the game:
local game_control = {
seconds = function(num)
return math.floor(num*fps)
end,
minutes = function(num)
return math.floor(num*fps*60)
end,
frames = function(num) return num end,
end_of = function(sound)
return sound.start+sound.duration-frame_number
end,
wait = function(frames_to_wait_for)
scheduler.continue_script(
frame_number+math.floor(frames_to_wait_for),
coroutine.running())
coroutine.yield()
end,
start_eruption_volcano = function()
--obviously in a real game, this could
--affect some datastructure in a non-immediate way
print(frame_number..": The volcano is erupting, BOOM!")
end,
start_camera_shake = function()
print(frame_number..": SHAKY!")
end,
play = function(soundname)
print(frame_number..": Playing: "..soundname)
return {name = soundname, start = frame_number, duration = 30}
end
}
And the game loop:
while true do
scheduler.run(frame_number,game_control)
frame_number = frame_number+1
end
co1 = coroutine.create(
function()
for i = 1, 100 do
print("co1_"..i)
coroutine.yield(co2)
end
end
)
co2 = coroutine.create(
function()
for i = 1, 100 do
print("co2_"..i)
coroutine.yield(co1)
end
end
)
for i = 1, 100 do
coroutine.resume(co1)
coroutine.resume(co2)
end