How do I create a scheduler which never executes more than one Task at a time using async-await? - c#-4.0

I want to implement a class or pattern that ensures that I never execute more than one Task at a time for a certain set of operations (HTTP calls). The invocations of the Tasks can come from different threads at random times. I want to make use of the async-await pattern so that the caller can handle exceptions by wrapping the call in a try-catch.
Here's an illustration of the intended flow of execution:
Pseudo code from caller:
try {
Task someTask = GetTask();
await SomeScheduler.ThrottledRun(someTask);
}
catch(Exception ex) {
// Handle exception
}
The Task class here might instead be an Action class depending on the solution.
Note that when I use the word "Schedule" in this question, I'm not necessarily using it in relation to the .NET Task Scheduler. I don't know the async-await library well enough to know from what angle and with what tools to approach this problem. The TaskScheduler might be relevant here, or it might not. I've read the TAP pattern document and found patterns that almost solve this problem, but not quite (the chapter on interleaving).

There is a new ConcurrentExclusiveSchedulerPair type in .NET 4.5 (I don't remember if it was included in the Async CTP), and you can use its ExclusiveScheduler to restrict execution to one Task at a time.
Consider structuring your problem as a Dataflow. It's easy to just pass a TaskScheduler into the block options for the parts of the dataflow you want restricted.
If you don't want to (or can't) use Dataflow, you can do something similar yourself. Remember that in TAP, you always return started tasks, so you don't have the "creation" separated from the "scheduling" like you do in TPL.
You can use ConcurrentExclusiveSchedulerPair to schedule Actions (or async lambdas without return values) like this:
public static ConcurrentExclusiveSchedulerPair schedulerPair =
new ConcurrentExclusiveSchedulerPair();
public static TaskFactory exclusiveTaskFactory =
new TaskFactory(schedulerPair.ExclusiveScheduler);
...
public static Task RunExclusively(Action action)
{
return exclusiveTaskFactory.StartNew(action);
}
public static Task RunExclusively(Func<Task> action)
{
return exclusiveTaskFactory.StartNew(action).Unwrap();
}
There are a few things to note about this:
A single instance of ConcurrentExclusiveSchedulerPair only coordinates Tasks that are queued to its schedulers. A second instance of ConcurrentExclusiveSchedulerPair would be independent from the first, so you have to ensure the same instance is used in all parts of your system you want coordinated.
An async method will - by default - resume on the same TaskScheduler that started it. So this means if one async method calls another async method, the "child" method will "inherit" the parent's TaskScheduler. Any async method may opt out of continuing on its TaskScheduler by using ConfigureAwait(false) (in that case, it continues directly on the thread pool).

Which coroutine dispatcher to inject when testing a Kotlin console

I have a back-end Spring Boot Kotlin app that has some simple coroutine code for parallel IO operations. It looks something like this:
@Service
class AccountService(
    private val client: ApiClient,
    private val coroutineDispatcher: CoroutineDispatcher
) {
    fun getAccount(): AccountDTO {
        return runBlocking(coroutineDispatcher) {
            val foo = async {
                client.getFoo() //some long operation
            }
            val bar = async {
                client.getBar() //some other operation
            }
            AccountDTO(foo.await(), bar.await())
        }
    }
}
Now in production I can inject, let's say, a Dispatchers.IO dispatcher and everything works fine. However, when testing I don't want multithreading. I want to inject Dispatchers.Main, but it's meant for Android apps. Alternatively I'd like to inject nothing and let the scope inherit from the parent and run on the main thread, the way runBlocking{} works without any arguments. But I can't figure out how to do that. Should I be using Dispatchers.Unconfined? From what I understand, it will stay on the main thread unless I explicitly spin up another thread myself.
What's the standard practice here?
Alternatively I'd like to inject nothing and let the scope inherit from the parent and run on the main thread, the way runBlocking{} works without any arguments. But I can't figure out how to do that
If you have control over getAccount() here, you should make this one suspend and avoid runBlocking at this level. If you're just introducing coroutines in the project, it's of course fine like this as a first step, but I would encourage you to keep going up the call stack soon. This way you'll benefit from context inheritance and avoid blocking threads from other coroutines if they decide to call your service.
Now, you could use kotlinx-coroutines-test for your tests. It provides test dispatchers that even allow you to control virtual time, test timeouts, etc. Note, however, that due to the virtual time, you cannot use this kind of dispatcher for code that calls an actual system that's not part of virtual time (like some external service, or just a piece of code using a hardcoded non-test dispatcher).

Spawning a new thread for each object load

I have a system which runs multiple service (long-lived) and worker (short-lived) threads. They all share a state which contains objects. Any thread can request an object at any time, through a singleton-of-sorts class called ObjectManager. If the object is not available, it needs to be loaded.
Here's some pseudo-code of how object loading looks now:
class ObjectManager {
getLoadinData(path) {
if (hasLoadingDataFor(path))
return whatWeHave()
else {
loadingData = createNewLoadingData();
loadingData.path = path;
pushLoadingTaskToLoadingThread(loadingData);
return loadingData;
}
}
// loads object and blocks until it's loaded
loadObjectSync(path) {
loadingData = getLoadinData(path);
waitFor(loadingData.conditionVar);
return loadingData.loadedObject;
}
// initiates a load and calls a callback when done
loadObjectAsync(path, callback) {
loadingData = getLoadinData(path);
loadingData.callbacks.add(callback);
}
// dedicated loading thread
loadingThread() {
while (running) {
loadingData = waitForLoadingData();
object = readObjectFromDisk(loadingData.path);
object.onLoaded(); // !!!!
loadingData.loadedObject = object;
// unblock cv waiters
loadingData.conditionVar.notifyAll();
// call callbacks
loadingData.callbacks.callAll(object);
}
}
}
The problem is the line object.onLoaded(). I have no control over this function. Some objects might decide that they need other objects to be valid. So in their onLoaded method they might call loadObjectSync. Uh-oh! This (naturally) deadlocks: the nested call blocks the loading loop, waiting for a load that only further iterations of that same loop could perform.
What I could do to solve this is leave the onLoaded call to the initiating threads. This will change loadObjectSync to something like:
loadObjectSync(path) {
loadingData = getLoadinData(path);
waitFor(loadingData.conditionVar);
if (loadingData.wasCreatedInThisThread()) {
loadingData.loadedObject.onLoaded();
loadingData.onLoadedConditionVar.notifyAll();
loadingData.callbacks.callAll(loadingData.loadedObject);
}
else {
// wait more
waitFor(loadingData.onLoadedConditionVar);
}
return loadingData.loadedObject;
}
... but then the problem is that if there are no calls to loadObjectSync and only calls to loadObjectAsync, or if a loadObjectAsync call was simply the first to create the loading data, there will be no one to finalize the object. So to make this work, I have to introduce another thread that finalizes objects whose loadingData was created by loadObjectAsync.
It seems that it would work. But I have a simpler idea! What if I change getLoadinData instead? What if it does this:
getLoadinData(path) {
if (hasLoadingDataFor(path))
return whatWeHave()
else {
loadingData = createNewLoadingData();
loadingData.path = path;
///
thread = spawnLoadingThread(loadingData);
thread.detach();
///
return loadingData;
}
}
Spawn a new thread for every object load. Thus there is no deadlock. Every loading thread can safely block until it's done. The rest of the code remains exactly as it is.
This means potentially tens (or, in certain edge cases, even thousands) of active threads waiting on condition variables. I know that spawning threads has its overhead, but I think it would be negligible compared to the cost of I/O from readObjectFromDisk.
So my question is: Is this terrible? Can this somehow backfire?
The target platform is conventional desktop machines. But this software is supposed to run for a long time without stopping: weeks, maybe months.
Alternatively... even though I have an idea how to solve this if the thread-per-load turns out to be terrible, can this be solved in another way?
Very interesting! This is a problem I have bumped into a couple of times, trying to add a synchronous interface to a fundamentally asynchronous operation (i.e. file load, or in my case, network write) that is performed by a service thread.
My own preference would be to not provide the synchronous interface. Why? Because it keeps the code simpler in design & implementation and easier to reason about -- always important for multi-threading.
Benefits of sticking to single thread & async only is that you only have 1 service thread, so resource growth is not a concern, plus the user callbacks are always invoked on this same thread, which simplifies thread-safety concerns for users of ObjectManager (if you have multiple callback threads, every user callback must be thread safe, so it's an important choice to make). However sticking to only an async interface does mean the user of ObjectManager has more work to do.
But if you do want to keep the synchronous interface, then another approach that I have taken could work for you. You stick to a single service thread but inside the implementation of loadObjectSync you check the thread-ID to determine if the invoker is the service thread or any-other thread. If it is any-other thread you queue the request and safely block. But if it is the service thread, you can immediately load the object, say by calling a new function loadObjectImpl. You will need to grab the thread-ID of the service thread during initialization and store it within the ObjectManager instance, and use that for thread identification. And you will need a new function which is basically just the internal scope of the loadingThread function -- i.e. a new function called something like loadObjectImpl.
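The thread-identity check described above can be sketched in Java roughly as follows. This is a minimal sketch under assumptions, not the asker's actual code: ObjectManagerSketch and the loader callback are hypothetical names, and the loader stands in for both readObjectFromDisk and onLoaded. A single-thread executor plays the role of the service thread, and a call to loadObjectSync made from that thread is served inline by loadObjectImpl instead of being queued.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiFunction;

// Hypothetical sketch; names loosely mirror the pseudo-code in the question.
final class ObjectManagerSketch {
    // Stand-in for readObjectFromDisk + onLoaded; it may call loadObjectSync itself.
    private final BiFunction<ObjectManagerSketch, String, Object> loader;
    private final Map<String, Object> loaded = new ConcurrentHashMap<>();
    // Identity of the single service thread, captured when the thread is created.
    private volatile Thread serviceThread;
    private final ExecutorService service = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "object-loader");
        t.setDaemon(true);
        serviceThread = t;
        return t;
    });

    ObjectManagerSketch(BiFunction<ObjectManagerSketch, String, Object> loader) {
        this.loader = loader;
    }

    Object loadObjectSync(String path) {
        if (Thread.currentThread() == serviceThread) {
            // Re-entrant call from the service thread: load inline, never block on ourselves.
            return loadObjectImpl(path);
        }
        // Any other thread: queue the request on the service thread and block until done.
        Callable<Object> job = () -> loadObjectImpl(path);
        try {
            return service.submit(job).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Only ever runs on the service thread, so no extra locking is needed here.
    private Object loadObjectImpl(String path) {
        Object cached = loaded.get(path);
        if (cached != null) {
            return cached;
        }
        Object object = loader.apply(this, path);
        loaded.put(path, object);
        return object;
    }

    public static void main(String[] args) {
        // Object "a" needs object "b" while it is being loaded.
        ObjectManagerSketch manager = new ObjectManagerSketch((mgr, path) ->
                path.equals("a") ? "a(" + mgr.loadObjectSync("b") + ")" : path);
        System.out.println(manager.loadObjectSync("a")); // prints a(b)
    }
}
```

Note that this sketch does not detect circular dependencies: two objects that require each other would recurse until the stack overflows.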

Rxjava: Subscribe on the specific thread

I'm a newbie in Rxjava.
I have the following code:
System.out.println("1: " + Thread.currentThread().getId());
Observable.create(new rx.Observable.OnSubscribe<String>() {
    @Override
    public void call(Subscriber<? super String> subscriber) {
        System.out.println("2: " + Thread.currentThread().getId());
        // query database
        String result = ....
        subscriber.onNext(result);
    }
}).subscribeOn(Schedulers.newThread()).subscribe(countResult -> {
    System.out.println("3: " + Thread.currentThread().getId());
});
For example, the output will be:
1: 50
2: 100
3: 100
I want the subscriber to run on the thread that has id 50. How can I do that?
I think there are two cases: either you need it to run on the UI thread, or you need it for synchronisation. As far as I know, you cannot call a function on a specific thread: a method call is bound to the context of the thread that makes it, so one thread cannot directly invoke a method on another thread. Your problem is that the method in the subscriber is called from Schedulers.newThread(). I also found this github issue about Schedulers.currentThread(). What you need is to notify the caller thread when the observer gets called.
Also you can use akka, it is way simpler to write concurrent code with it.
Sorry for my bad grammar.
From the docs:
By default, an Observable and the chain of operators that you apply to
it will do its work, and will notify its observers, on the same thread
on which its Subscribe method is called. The SubscribeOn operator
changes this behavior by specifying a different Scheduler on which the
Observable should operate. The ObserveOn operator specifies a
different Scheduler that the Observable will use to send notifications
to its observers.
So you can just use subscribe instead of subscribeOn to observe your collection on the same thread it was created, something like this:
Observable.create(new rx.Observable.OnSubscribe<String>() {
    @Override
    public void call(Subscriber<? super String> subscriber) {
        System.out.println("2: " + Thread.currentThread().getId());
        // query database
        String result = ....
        subscriber.onNext(result);
    }
}).subscribe(countResult -> {
    System.out.println("3: " + Thread.currentThread().getId());
});
UPDATE:
If your application is an Android application, you can use subscribe on a background thread as you do and pass the results to the main thread using Handler messages.
If your application is a Java application I may suggest using wait() and notify() mechanism or consider using frameworks such as EventBus or akka for more complex scenarios.
With RxJava 1.0.15, you can apply toBlocking() before subscribe and everything will run on the thread that created the entire sequence of operators.
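The effect of blocking the caller can be illustrated without RxJava at all. The following is only a plain java.util.concurrent analogue, not RxJava API: BlockingHandoffSketch and produceThenConsumeOnCaller are hypothetical names. The producer runs on a separate worker thread, the calling thread blocks until the value is ready, and the consumer then runs on the calling thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: a blocking hand-off from a worker thread back to the caller.
final class BlockingHandoffSketch {
    static void produceThenConsumeOnCaller(Supplier<String> producer, Consumer<String> consumer) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // The producer runs on the worker thread; join() blocks the calling thread.
            String result = CompletableFuture.supplyAsync(producer, worker).join();
            // Back on the calling thread: the consumer sees the caller's thread id.
            consumer.accept(result);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("1: " + Thread.currentThread().getId());
        produceThenConsumeOnCaller(
                () -> {
                    System.out.println("2: " + Thread.currentThread().getId());
                    return "result";
                },
                value -> System.out.println("3: " + Thread.currentThread().getId()));
    }
}
```

Lines 1 and 3 print the same thread id, while line 2 prints the worker's id. The price is that the calling thread does nothing else while it waits.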
So subscribeOn denotes which thread the Observable will start emitting items on, while observeOn "switches" to a thread for the rest of the observable chain. Put an observeOn(schedulers.CurrentThread()) right before the subscribe and you'll be in the thread that this observable is created in rather than the thread it is executed in. Here's a resource that explains rxjava threading really well. http://www.grahamlea.com/2014/07/rxjava-threading-examples/
I believe @akarnokd is right. You either run without the .subscribeOn(Schedulers.newThread()) so that it happens synchronously, or use toBlocking() just before the subscribe. Alternatively, if you just want everything to happen on the same thread but it doesn't have to be the current thread, then you can do this:
Observable
.defer(() -> Observable.create(...))
.subscribeOn(Schedulers.newThread())
.subscribe(subscriber);
defer ensures that the call to create happens on the subscription thread.

Distributed\Parallel computing using app-engine (java api)

I want to use the master-slave (worker) paradigm to solve a problem. I have read that creating new threads manually (for example, using a thread pool) is not available and that I need to use a queue instead. Here is a code example:
class MyDeferred implements DeferredTask {
@Override
public void run() {
// Do something interesting
}
};
MyDeferred task = new MyDeferred();
// Set instance variables etc as you wish
Queue queue = QueueFactory.getDefaultQueue();
queue.add(withPayload(task));
How can I get the result of the workers (which were added to the queue)?
I need this info in order to solve the bigger problem.
Actually you can use threads on GAE, but there are limitations. If you need long-running threads you can use background threads, but this requires you to use backend instances.
If you opt to use task queue, then keep in mind that tasks do not "return" to caller. To aggregate results you'll need to use datastore.
You will have to write the results into the datastore.
Just as a starting point to think about it, you might pass a JobId as a parameter to the tasks, have each task write an entity with the result and the JobId, and then later query the datastore for the given JobId to get all the results.
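The JobId idea can be sketched in plain Java as follows. This is a hedged illustration only: it uses no App Engine API, the in-memory map is a stand-in for the datastore, the thread pool is a stand-in for the task queue, and JobResultsSketch, writeResult, and queryResults are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: workers persist results under a shared job id, the master queries them later.
final class JobResultsSketch {
    // In-memory stand-in for the datastore: job id -> results written by the tasks.
    private final Map<String, List<Integer>> store = new ConcurrentHashMap<>();

    // What each task would do at the end of run(): write its result tagged with the job id.
    void writeResult(String jobId, int result) {
        store.computeIfAbsent(jobId, k -> new CopyOnWriteArrayList<>()).add(result);
    }

    // What the master would do later: fetch every result recorded for the job id.
    List<Integer> queryResults(String jobId) {
        return new ArrayList<>(store.getOrDefault(jobId, Collections.emptyList()));
    }

    // Fans out workerCount tasks that each compute n * n, then aggregates the results.
    static int sumOfSquares(int workerCount) {
        JobResultsSketch results = new JobResultsSketch();
        String jobId = UUID.randomUUID().toString();
        ExecutorService queue = Executors.newFixedThreadPool(4); // stand-in for the task queue
        for (int i = 1; i <= workerCount; i++) {
            final int n = i;
            queue.execute(() -> results.writeResult(jobId, n * n));
        }
        queue.shutdown();
        try {
            queue.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        int sum = 0;
        for (int r : results.queryResults(jobId)) {
            sum += r;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(4)); // prints 30
    }
}
```

With a real task queue there is no awaitTermination to rely on, so the master would presumably also need a way to know when all tasks have finished, for example by comparing the number of stored results against the number of tasks it enqueued.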

Passing a `Disposable` object safely to the UI thread with TPL

We recently adopted the TPL as the toolkit for running some heavy background tasks.
These tasks typically produce a single object that implements IDisposable. This is because it has some OS handles internally.
What I want to happen is that the object produced by the background thread will be properly disposed at all times, also when the handover coincides with application shutdown.
After some thinking, I wrote this:
private void RunOnUiThread(Object data, Action<Object> action)
{
var t = Task.Factory.StartNew(action, data, CancellationToken.None, TaskCreationOptions.None, _uiThreadScheduler);
t.ContinueWith(delegate(Task task)
{
if (!task.IsCompleted)
{
DisposableObject.DisposeObject(task.AsyncState);
}
});
}
The background Task calls RunOnUiThread to pass its result to the UI thread. The task t is scheduled on the UI thread, and takes ownership of the data passed in. I was expecting that if t could not be executed because the UI thread's message pump was shut down, the continuation would run, and I could see that the task had failed, and dispose the object myself. DisposeObject() is a helper that checks if the object is actually IDisposable, and non-null, prior to disposing it.
Sadly, it does not work. If I close the application after the background task t is created, the continuation is not executed.
I solved this problem before. At that time I was using the Threadpool and the WPF Dispatcher to post messages on the UI thread. It wasn't very pretty, but in the end it worked. I was hoping that the TPL was better at this scenario. It would even be better if I could somehow teach the TPL that it should Dispose all leftover AsyncState objects if they implement IDisposable.
So, the code is mainly to illustrate the problem. I want to learn about any solution that allows me to safely handover Disposable objects to the UI thread from background Tasks, and preferably one with as little code as possible.
When a process closes, all of its kernel handles are automatically closed. You shouldn't need to worry about this:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686722(v=vs.85).aspx
Have a look at the RX library. This may allow you to do what you want.
From MSDN:
IsCompleted will return true when the Task is in one of the three
final states: RanToCompletion, Faulted, or Canceled
In other words, your DisposableObject.DisposeObject will never be called, because the continuation will always be scheduled after one of the above conditions has taken place. I believe what you meant to do was:
t.ContinueWith(task => DisposableObject.DisposeObject(task.AsyncState),
    TaskContinuationOptions.NotOnRanToCompletion);
(BTW you could have simply captured the data variable rather than using the AsyncState property)
However I wouldn't use a continuation for something that you want to ensure happens at all times. I believe a try-finally block will be more fitting here:
private void RunOnUiThread2(Object data, Action<Object> action)
{
    var t = Task.Factory.StartNew(() =>
    {
        try
        {
            action(data);
        }
        finally
        {
            // Dispose the captured data directly; no Task variable is in scope inside this lambda.
            DisposableObject.DisposeObject(data);
            //Or use a new *foreground* thread if the disposing is heavy
        }
    }, CancellationToken.None, TaskCreationOptions.None, _uiThreadScheduler);
}
