I wrote the first implementation, then tried to simplify it into the second one, but surprisingly the second one is almost 3x slower. Why?
First implementation (faster):
let data: Vec<Arc<Data>> = vec![d1, d2, d3];
let mut handles = Vec::new();
for d in &data {
    let d = d.clone();
    handles.push(tokio::spawn(async move {
        d.update().await;
    }));
}
for handle in handles {
    let _ = tokio::join!(handle);
}
Second implementation (slower):
let data: Vec<Arc<Data>> = vec![d1, d2, d3];
for d in &data {
    let d = d.clone();
    let _ = tokio::join!(tokio::spawn(async move {
        d.update().await;
    }));
}
In the first example you spawn all your tasks onto the executor, allowing them to run in parallel, and then you join them in sequence. In the second example you spawn each task onto the executor in sequence, but you wait for each task to finish before spawning the next one, so you get zero parallelism and thus no speedup. The important observation is that in the first example all of your tasks are making progress in the background even though you're waiting for them to finish one by one. Also, something like join_all would probably be more appropriate for waiting on the tasks in the first example.
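As a minimal sketch of that join_all suggestion, assuming the futures crate is available and the surrounding code is already in an async context (Data and update are from the question):

use futures::future::join_all;

let handles: Vec<_> = data
    .iter()
    .map(|d| {
        let d = d.clone();
        tokio::spawn(async move {
            d.update().await;
        })
    })
    .collect();

// Await all the JoinHandles concurrently instead of one at a time.
let _results = join_all(handles).await;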
I want to give users the ability to stop a sorting routine if it takes too long.
I tried to use DispatchWorkItem.cancel. However, this does not actually stop processes that have already started.
let myArray = [String]() // potentially 230M elements!
...
workItem = DispatchWorkItem {
    let result = myArray.sorted()
    DispatchQueue.main.async {
        print("done")
    }
}
// if process is too long, user clicks "Cancel" =>
workItem.cancel() // does not stop sort
How can I kill the workItem's thread?
I don't have access to the sorted routine, therefore I cannot insert tests to check whether the current thread is in a "cancelled" status...
As you have deduced, one simply cannot kill a work item without periodically checking isCancelled and manually performing an early exit if it is set.
There are two options:
You can use sorted(by:), test for isCancelled there, and throw an error if it is cancelled. That achieves the desired early exit.
That might look like:
func cancellableSort() -> DispatchWorkItem {
    var item: DispatchWorkItem!
    item = DispatchWorkItem {
        let unsortedArray = (0..<10_000_000).shuffled()
        let sortedArray = try? unsortedArray.sorted { (obj1, obj2) -> Bool in
            if item.isCancelled {
                throw SortError.cancelled
            }
            return obj1 < obj2
        }
        // do something with your sorted array
        item = nil
    }
    DispatchQueue.global().async(execute: item)
    return item
}
Where
enum SortError: Error {
    case cancelled
}
Be forewarned that checking isCancelled on every comparison can have a dramatic impact on performance, even in release builds, so you might want to benchmark this.
You could write your own sort routine, inserting your own test of isCancelled inside the algorithm (as sketched below). This gives you more control over precisely where you perform the test (i.e., you might not want to do it for every comparison, but rather at some higher-level loop within the algorithm, thereby minimizing the performance impact). And given the number of records, this gives you a chance to pick an algorithm best suited to your data set.
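For illustration, a rough sketch of that second option (my code, not the answer's): a bottom-up merge sort that polls a caller-supplied shouldCancel closure once per merge pass rather than once per comparison, and returns nil if it was cancelled:

func cancellableMergeSort<T: Comparable>(_ input: [T], shouldCancel: () -> Bool) -> [T]? {
    var a = input
    var buffer = a
    var width = 1
    while width < a.count {
        if shouldCancel() { return nil }   // one check per pass, not per comparison
        var start = 0
        while start < a.count {
            // merge a[start..<mid] and a[mid..<end] into buffer[start..<end]
            let mid = min(start + width, a.count)
            let end = min(start + 2 * width, a.count)
            var (i, j, k) = (start, mid, start)
            while i < mid && j < end {
                if a[i] <= a[j] { buffer[k] = a[i]; i += 1 }
                else { buffer[k] = a[j]; j += 1 }
                k += 1
            }
            while i < mid { buffer[k] = a[i]; i += 1; k += 1 }
            while j < end { buffer[k] = a[j]; j += 1; k += 1 }
            start = end
        }
        swap(&a, &buffer)
        width *= 2
    }
    return a
}

The work item's closure can then call cancellableMergeSort(myArray) { item.isCancelled } and treat a nil result as "cancelled".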
Obviously, when benchmarking these alternatives, make sure you test optimized/release builds so that your results are not skewed by your build settings.
As an aside, you might consider using Operation, too, as its handling of cancellation is more elegant, IMHO. Plus you can have a dedicated object for the sort operation, which is cleaner.
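A minimal sketch of what that might look like (again my illustration, reusing the hypothetical cancellableMergeSort above):

class SortOperation: Operation {
    let input: [Int]
    private(set) var output: [Int]?

    init(input: [Int]) {
        self.input = input
        super.init()
    }

    override func main() {
        // The sort polls the operation's own isCancelled flag.
        output = cancellableMergeSort(input) { self.isCancelled }
    }
}

Add the operation to an OperationQueue and call cancel() on it when the user clicks "Cancel".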
It seems that dynamic variables don't always survive subroutine calls in threads:
sub foo($x, &y = &infix:<+>) {
    my &*z = &y;
    bar($x);
}

sub bar($x) {
    say &*z($x, $x);
    my $promise = start { bar($x - 1) if $x > 0 }
    await $promise;
    # bar($x - 1) if $x > 0   # <-- provides the expected result: 6, 4, 2, 0
}
foo(3); # 6, 4, Dynamic variable &*z not found
Using a more globally scoped variable also works, so it's not that all variables are lost — it seems specific to dynamics:
our &b;

sub foo($a, &c = &infix:<+>) {
    &b = &c;
    bar($a);
}

sub bar($a) {
    say &b($a, $a);
    my $promise = start { bar($a - 1) if $a > 0 }
    await $promise;
}
foo(3); # 6, 4, 2, 0
Once the variable is set in foo(), it is read without problem in bar(). But when bar() is called from inside the promise, the value for &*z disappears not on the first layer of recursion but the second.
I'm sensing a bug, but maybe I'm doing something weird with the interaction between recursion, dynamic variables, and threading that's messing things up.
Under current semantics, start will capture the context it was invoked in. If dynamic variable lookup fails on the stack of the thread that the start executes on (one of those from the thread pool), then it will fall back to looking at the dynamic scope captured when the start block was scheduled.
When a start block is created during the execution of another start block, the same thing happens. However, there is no relationship between the two, meaning that the context captured by the "outer" start block will not be searched also. While one could argue for that to happen, it seems potentially problematic to do so. Consider this example:
sub tick($n = 1 --> Nil) {
    start {
        await Promise.in(1);
        say $n;
        tick($n + 1);
    }
}
tick();
sleep;
This is a (not entirely idiomatic) way to produce a tick every second. Were the inner start to retain a reference back to the state of the outer one for the purpose of dynamic variable lookup, then this program would build up a chain of ever-increasing length in memory, which seems like undesirable behavior.
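Given those semantics, one workaround (my sketch, not part of the original answer) is to capture the dynamic into a lexical and re-establish it inside each start block, so every pool thread gets its own &*z:

sub bar($x) {
    say &*z($x, $x);
    my &z = &*z;                 # capture into a lexical the block can close over
    my $promise = start {
        my &*z = &z;             # re-establish the dynamic on this thread
        bar($x - 1) if $x > 0;
    }
    await $promise;
}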
I am trying to parse hundreds of C source files to map dozens of software signal variables to the names of physical hardware pins, and I am trying to do it asynchronously in F#:
IndirectMappedHWIO
|> Seq.map IndirectMapFromFile //this is the function with the regex in it
|> Async.Parallel
|> Async.RunSynchronously
The issue is that I cannot figure out how to pass in a CancellationToken to end my task. Each task is reading around 300 C files so I want to be able to stop the task's execution as soon as the regex matches. I tried using Thread.CurrentThread.Abort() but that does not seem to work. Any ideas on how to pass in a CancellationToken for each task? Or any other way to cancel a task based on a condition?
let IndirectMapFromFile pin =
    async {
        let innerFunc filename =
            use streamReader = new StreamReader (filePath + filename)
            while not streamReader.EndOfStream do
                try
                    let line1 = streamReader.ReadLine()
                    streamReader.ReadLine() |> ignore
                    let line2 = streamReader.ReadLine()
                    if obj.ReferenceEquals(line2, null) then
                        Thread.CurrentThread.Abort() // DOES NOT WORK!!
                    else
                        let m1 = Regex.Match(line1, @"^.*((Get|Put)\w+).*$")
                        let m2 = Regex.Match(line2, @"\s*return\s*\((\s*" + pin.Name + @"\s*)\);")
                        if m1.Success && m2.Success then
                            pin.VariableName <- m1.Groups.[1].Value
                            Thread.CurrentThread.Abort() // DOES NOT WORK!!
                        else
                            ()
                with
                | ex -> ()
            ()
        Directory.GetFiles(filePath, "Rte*") // all C source and header files that start with Rte
        |> Array.iter innerFunc
    }
Asyncs cancel on designated operations, such as on return!, let!, or do!; they don't just kill the thread in any unknown state, which is not generally safe. If you want your asyncs to stop, they could for example:
be recursive and iterate via return!. The caller would provide a CancellationToken to Async.RunSynchronously and catch the resulting OperationCanceledException when the job is done.
check some thread-safe state and decide to stop depending on it.
Note that those are effectively the same thing: the workers who iterate over the data check what is going on and cancel in an orderly fashion. In other words, it is clear when exactly they check for cancellation.
Using async cancellation might result in something like this:
let canceler = new System.Threading.CancellationTokenSource()

let rec worker myParameters =
    async {
        // do stuff
        if amIDone() then canceler.Cancel()
        else return! worker (...) }

let workers = (...) |> Async.Parallel

try Async.RunSynchronously(workers, -1, canceler.Token) |> ignore
with :? System.OperationCanceledException -> ()
Stopping based on shared state could look like this:
let keepGoing = ref true

let rec worker myParameters =
    if !keepGoing then
        // do stuff
        if amIDone() then keepGoing := false
        worker (...)

let makeWorker initParams = async { worker initParams }

// make workers
workers |> Async.Parallel |> Async.RunSynchronously |> ignore
If the exact timing of cancellation is relevant, I believe the second method may not be safe, as there may be a delay until other threads see the variable change. This doesn't seem to matter here though.
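Applied to the original code, that second approach might look something like this sketch (Pin and filePath are the asker's type and value; the key change is that a match just sets a flag and lets the loops end, instead of calling Thread.Abort):

let IndirectMapFromFile (pin: Pin) =
    async {
        // Returns true once the pin is matched; Array.exists then
        // short-circuits, so the remaining files are never opened.
        let searchFile filename =
            use streamReader = new StreamReader(filename) // GetFiles returns full paths
            let mutable found = false
            while not found && not streamReader.EndOfStream do
                let line1 = streamReader.ReadLine()
                streamReader.ReadLine() |> ignore
                let line2 = streamReader.ReadLine()
                if not (isNull line2) then
                    let m1 = Regex.Match(line1, @"^.*((Get|Put)\w+).*$")
                    let m2 = Regex.Match(line2, @"\s*return\s*\((\s*" + pin.Name + @"\s*)\);")
                    if m1.Success && m2.Success then
                        pin.VariableName <- m1.Groups.[1].Value
                        found <- true   // orderly early exit instead of Thread.Abort
            found
        Directory.GetFiles(filePath, "Rte*")
        |> Array.exists searchFile
        |> ignore
    }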
I am trying to run the same task several times in parallel in an F# console project. The task is as follows:
1. Dequeue a table name from a ConcurrentQueue (this queue contains the table names my program needs to process)
2. Open a SqlDataReader for the table
3. Write each row in the SqlDataReader to a StreamWriter
4. Zip the file created by the StreamWriter
5. Repeat 1 - 4
So basically each task is a while loop (posed as a recursion) that continuously processes tables. I would like to start 4 tasks in parallel, and I would also like to stop execution with a user keystroke in the console, say the Enter key. But execution should only stop once the current task has completed step 4.
I have tried the following
let rec DownloadHelper (cq: ConcurrentQueue<Table>) sqlConn =
    let success, tb = cq.TryDequeue()
    if success then
        printfn "Processing %s %s" tb.DBName tb.TBName
        Table2CSV tb.DBName tb.TBName sqlConn
        DownloadHelper cq sqlConn

let DownloadTable (cq: ConcurrentQueue<Table>) connectionString =
    use con = new SqlConnection(connectionString)
    con.Open()
    DownloadHelper cq con

let asyncDownloadTask = async { return DownloadTable cq connectionString }

let asyncMultiDownload =
    asyncDownloadTask
    |> List.replicate 4
    |> Async.Parallel

asyncMultiDownload
|> Async.RunSynchronously
|> ignore
There are two problems with the above code:
1. It blocks the main thread, so I don't know how to do the keystroke part.
2. I am not sure how to stop execution gracefully.
My second try, below, uses a CancellationToken:
let tokenSource = new CancellationTokenSource()
let cq = PrepareJobs connectionString

let asyncDownloadTask = async { DownloadTable cq connectionString }

let task = async {
    asyncDownloadTask
    |> List.replicate 4
    |> Async.Parallel
    |> ignore
}

let val1 = Async.Start(task, cancellationToken = tokenSource.Token)
Console.ReadLine() |> ignore
tokenSource.Cancel()
Console.ReadLine() |> ignore
0
But it seems I am not even able to start the task at all.
There are three problems with your code.

First, the DownloadHelper should do one table only. By making it recursive, you are taking too much control and inhibiting parallelism.

Second, simply placing an operation in an async expression does not magically make it async. Unless the DownloadTable function itself is async, the code will block until it is finished. So when you run four downloads in parallel, once started, they will all run to completion, regardless of the cancellation token.

Third, in your second example you use Async.Parallel but then throw the output away, which is why your task does nothing! I think what you wanted to do was throw away the result of the async, not the async itself.
Here's my version of your code, to demonstrate these points.
First, a dummy function that takes up time:
let longAtomicOperation milliSecs =
    let sw = System.Diagnostics.Stopwatch()
    let r = System.Random()
    let mutable total = 0.0
    sw.Start()
    while sw.ElapsedMilliseconds < int64 milliSecs do
        total <- total + sin (r.NextDouble())
    // return something
    total

// test
#time
longAtomicOperation 2000
#time
// Real: 00:00:02.000, CPU: 00:00:02.000, GC gen0: 0, gen1: 0, gen2: 0
Note that this function is not async -- once started it will run to completion.
Now let's put it in an async:
let asyncTask id = async {
    // note that NONE of the operations are async
    printfn "Started %i" id
    let result = longAtomicOperation 5000 // 5 seconds
    printfn "Finished %i" id
    return result
}
None of the operations in the async block are async, so we are not getting any benefit.
Here's the code to create four tasks in parallel:
let fourParallelTasks = async {
    let! results =
        List.init 4 asyncTask
        |> Async.Parallel
    // ignore the results
    return ()
}
The result of the Async.Parallel is not ignored, but is assigned to a value, which forces the tasks to be run. The async expression as a whole returns unit though.
If we test it:
open System.Threading
// start the task
let tokenSource = new CancellationTokenSource()
do Async.Start(fourParallelTasks, cancellationToken = tokenSource.Token)
// wait for a keystroke
System.Console.WriteLine("press a key to cancel")
System.Console.ReadLine() |> ignore
tokenSource.Cancel()
System.Console.ReadLine() |> ignore
We get output that looks like this, even if a key is pressed, because once started, each task runs to completion:
press a key to cancel
Started 3
Started 1
Started 2
Started 0
Finished 1
Finished 3
Finished 2
Finished 0
On the other hand, if we create a serial version, like this:
let fourSerialTasks = async {
    let! result1 = asyncTask 1
    let! result2 = asyncTask 2
    let! result3 = asyncTask 3
    let! result4 = asyncTask 4
    // ignore the results
    return ()
}
Then, even though the tasks are atomic, the cancellation token is tested between each step, which allows cancellation of the subsequent tasks.
// start the task
let tokenSource = new CancellationTokenSource()
do Async.Start(fourSerialTasks, cancellationToken = tokenSource.Token)
// wait for a keystroke
System.Console.WriteLine("press a key to cancel")
System.Console.ReadLine() |> ignore
tokenSource.Cancel()
System.Console.ReadLine() |> ignore
The above code can be cancelled between each step when a key is pressed.
To process all elements of the queue this way in batches of four, just convert the parallel version into a loop:
let rec processQueueAsync() = async {
    let! result = processFourElementsAsync()
    if result <> QueueEmpty then
        do! processQueueAsync()
    // ignore the result
    return ()
}
Finally, to me, using async is not about running things in parallel so much as it is about writing non-blocking code. So if your library code is blocking, the async approach is not going to provide much benefit.
To ensure your code is non-blocking, you need to use the async versions of the SqlDataReader methods in your helper, such as NextResultAsync.
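For example, a sketch of a non-blocking row-reading step might look like this (writeRow stands in for the question's StreamWriter logic; ExecuteReaderAsync and ReadAsync are the Task-based APIs on SqlCommand and SqlDataReader):

open System.Data.SqlClient

let readTableAsync (con: SqlConnection) (table: string) writeRow =
    async {
        use cmd = new SqlCommand("SELECT * FROM " + table, con)
        use! reader = Async.AwaitTask(cmd.ExecuteReaderAsync())
        let rec loop () =
            async {
                let! hasRow = Async.AwaitTask(reader.ReadAsync())
                if hasRow then
                    writeRow reader   // format the current row into the StreamWriter
                    do! loop ()       // let! and do! are also cancellation points
            }
        do! loop ()
    }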
I've been banging my head against my attempt at a lock-free multiple-producer multiple-consumer ring buffer. The basic idea is to exploit the innate overflow of the unsigned char and unsigned short types, size the element buffer to the index type's full range, and get the wrap back to the beginning of the ring buffer for free.
The problem is that my solution doesn't work for multiple producers (it does work for N consumers, and also for a single producer and single consumer).
#include <atomic>

template<typename Element, typename Index = unsigned char>
struct RingBuffer
{
    std::atomic<Index> readIndex;
    std::atomic<Index> writeIndex;
    std::atomic<Index> scratchIndex;
    Element elements[1 << (sizeof(Index) * 8)];

    RingBuffer() :
        readIndex(0),
        writeIndex(0),
        scratchIndex(0)
    {
    }

    bool push(const Element & element)
    {
        while(true)
        {
            const Index currentReadIndex = readIndex.load();
            Index currentWriteIndex = writeIndex.load();
            const Index nextWriteIndex = currentWriteIndex + 1;
            if(nextWriteIndex == currentReadIndex)
            {
                return false;
            }
            if(scratchIndex.compare_exchange_strong(
                currentWriteIndex, nextWriteIndex))
            {
                elements[currentWriteIndex] = element;
                writeIndex = nextWriteIndex;
                return true;
            }
        }
    }

    bool pop(Element & element)
    {
        Index currentReadIndex = readIndex.load();
        while(true)
        {
            const Index currentWriteIndex = writeIndex.load();
            const Index nextReadIndex = currentReadIndex + 1;
            if(currentReadIndex == currentWriteIndex)
            {
                return false;
            }
            element = elements[currentReadIndex];
            if(readIndex.compare_exchange_strong(
                currentReadIndex, nextReadIndex))
            {
                return true;
            }
        }
    }
};
The main idea for the write side was to use a temporary index, scratchIndex, that acts as a pseudo-lock, allowing only one producer at any one time to copy-construct into the elements buffer before updating the writeIndex and allowing any other producer to make progress. Before I am called a heathen for implying my approach is 'lock-free', I realise that this approach isn't exactly lock-free, but in practice (if it worked!) it would be significantly faster than having a normal mutex!
I am aware of a (more complex) MPMC ringbuffer solution here http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue, but I am really experimenting with my idea to then compare against that approach and find out where each excels (or indeed whether my approach just flat out fails!).
Things I have tried:
Using compare_exchange_weak
Using more precise std::memory_order's that match the behaviour I want
Adding cacheline pads between the various indices I have
Making elements an array of std::atomic<Element> instead of a plain Element array
I am sure this boils down to a fundamental segfault in my head as to how to use atomic accesses to get around using mutexes, and I would be entirely grateful to whoever can point out which neurons are drastically misfiring in my head! :)
This is a form of the A-B-A problem. A successful producer looks something like this:

1. load currentReadIndex
2. load currentWriteIndex
3. cmpxchg store scratchIndex = nextWriteIndex
4. store element
5. store writeIndex = nextWriteIndex
If a producer stalls for some reason between steps 2 and 3 for long enough, it is possible for the other producers to produce an entire queue's worth of data and wrap back around to the exact same index so that the compare-exchange in step 3 succeeds (because scratchIndex happens to be equal to currentWriteIndex again).
By itself, that isn't a problem. The stalled producer is perfectly within its rights to increment scratchIndex to lock the queue—even if a magical ABA-detecting cmpxchg rejected the store, the producer would simply try again, reload exactly the same currentWriteIndex, and proceed normally.
The actual problem is the nextWriteIndex == currentReadIndex check between steps 2 and 3. The queue is logically empty if currentReadIndex == currentWriteIndex, so this check exists to make sure that no producer gets so far ahead that it overwrites elements that no consumer has popped yet. It appears to be safe to do this check once at the top, because all the consumers should be "trapped" between the observed currentReadIndex and the observed currentWriteIndex.
Except that another producer can come along and bump up the writeIndex, which frees the consumer from its trap. If a producer stalls between steps 2 and 3, when it wakes up the stored value of readIndex could be absolutely anything.
Here's an example, starting with an empty queue, that shows the problem happening:
1. Producer A runs steps 1 and 2. Both loaded indices are 0. The queue is empty.
2. Producer B interrupts and produces an element.
3. A consumer pops an element. Both indices are 1.
4. Producer B produces 255 more elements. The write index wraps around to 0; the read index is still 1.
5. Producer A awakens from its slumber. It had previously loaded both read and write indices as 0 (empty queue!), so it attempts step 3. Because the other producer coincidentally paused on index 0, the compare-exchange succeeds, and the store proceeds. On completion the producer sets writeIndex = 1, so both stored indices are 1 and the queue is logically empty. A full queue's worth of elements will now be completely ignored.
(I should mention that the only reason I can get away with talking about "stalling" and "waking up" is that all the atomics used are sequentially consistent, so I can pretend that we're in a single-threaded environment.)
Note that the way you are using scratchIndex to guard concurrent writes is essentially a lock; whoever successfully completes the cmpxchg gets total write access to the queue until it releases the lock. The simplest way to fix this failure is to just replace scratchIndex with a spinlock: it won't suffer from A-B-A, and it reflects what is actually happening.
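A sketch of what that spinlock variant might look like (my illustration of the suggested fix, not the answerer's code): writeLock is a new std::atomic_flag member replacing scratchIndex, and both indices are re-read while the lock is held, so a stalled producer can no longer act on stale values.

std::atomic_flag writeLock = ATOMIC_FLAG_INIT; // member replacing scratchIndex

bool push(const Element & element)
{
    while(writeLock.test_and_set(std::memory_order_acquire))
    {
        // spin until we own the write side
    }
    const Index currentReadIndex = readIndex.load();
    const Index currentWriteIndex = writeIndex.load();
    const Index nextWriteIndex = currentWriteIndex + 1;
    if(nextWriteIndex == currentReadIndex) // queue is full
    {
        writeLock.clear(std::memory_order_release);
        return false;
    }
    elements[currentWriteIndex] = element;
    writeIndex.store(nextWriteIndex);
    writeLock.clear(std::memory_order_release);
    return true;
}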
bool push(const Element & element)
{
    while(true)
    {
        const Index currentReadIndex = readIndex.load();
        Index currentWriteIndex = writeIndex.load();
        const Index nextWriteIndex = currentWriteIndex + 1;
        if(nextWriteIndex == currentReadIndex)
        {
            return false;
        }
        if(scratchIndex.compare_exchange_strong(
            currentWriteIndex, nextWriteIndex))
        {
            elements[currentWriteIndex] = element;
            // Problem here!
            writeIndex = nextWriteIndex;
            return true;
        }
    }
}
I've marked the problematic spot. Multiple threads can get to the writeIndex = nextWriteIndex at the same time. The data will be written in any order, although each write will be atomic.
This is a problem because you're trying to update two values using the same atomic condition, which is generally not possible. Assuming the rest of your method is fine, one way around this would be to combine both scratchIndex and writeIndex into a single value of double the size. For example, treat two uint32_t values as a single uint64_t value and operate atomically on that.
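A minimal sketch of just that packing idea (my illustration; the names and the claim helper are hypothetical): two 32-bit indices live in one 64-bit atomic, so a single compare_exchange observes and updates both together, and no thread can ever see one without the other.

#include <atomic>
#include <cstdint>

std::atomic<std::uint64_t> packed{0}; // low 32 bits: writeIndex, high 32 bits: scratchIndex

inline std::uint64_t pack(std::uint32_t writeIdx, std::uint32_t scratchIdx)
{
    return static_cast<std::uint64_t>(writeIdx) |
           (static_cast<std::uint64_t>(scratchIdx) << 32);
}

inline void unpack(std::uint64_t v, std::uint32_t & writeIdx, std::uint32_t & scratchIdx)
{
    writeIdx = static_cast<std::uint32_t>(v);
    scratchIdx = static_cast<std::uint32_t>(v >> 32);
}

// Example: bump scratchIndex only if writeIndex still matches what we saw,
// in one atomic step across both values.
bool tryClaim(std::uint32_t expectedWrite)
{
    std::uint64_t current = packed.load();
    std::uint32_t w, s;
    unpack(current, w, s);
    if(w != expectedWrite)
        return false;
    return packed.compare_exchange_strong(current, pack(w, s + 1));
}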