Parallel Task advice - c#-4.0

I am trying to use the parallel task library to kick off a number of tasks like this:
var workTasks = _schedules.Where(x => x.Task.Enabled);
_tasks = new Task[workTasks.Count()];
_cancellationTokenSource = new CancellationTokenSource();
_cancellationTokenSource.Token.ThrowIfCancellationRequested();
int i = 0;
foreach (var schedule in _schedules.Where(x => x.Task.Enabled))
{
_log.InfoFormat("Reading task information for task {0}", schedule.Task.Name);
if(!schedule.Task.Enabled)
{
_log.InfoFormat("task {0} disabled.", schedule.Task.Name);
i++;
continue;
}
schedule.Task.ServiceStarted = true;
_tasks[i] = Task.Factory.StartNew(() =>
schedule.Task.Run()
, _cancellationTokenSource.Token);
i++;
_log.InfoFormat("task {0} has been added to the worker threads and has been started.", schedule.Task.Name);
}
I want each task to do its work, sleep, and then wake up every 5 minutes to run again. At the moment I am using Thread.Sleep inside the Schedule object, whose Run method is the Action passed to StartNew, like this:
_tasks[i] = Task.Factory.StartNew(() =>
schedule.Task.Run()
, _cancellationTokenSource.Token);
I read somewhere that Thread.Sleep is a bad solution for this. Can anyone recommend a better approach?

By my understanding, Thread.Sleep is generally a poor choice here because it blocks the thread for the whole interval: inside a Task it ties up a thread-pool thread that does no useful work while it sleeps. It won't be a big deal in most cases, but it could become a performance issue.
I'm in the habit of using this snippet instead:
new System.Threading.EventWaitHandle(false, EventResetMode.ManualReset).WaitOne(1000);
Fits on one line, and isn't overly complicated -- it creates an event handle that will never be set, and then waits for the full timeout period before continuing.
Anyway, if you're just trying to have something repeat every 5 minutes, a better approach would probably be to use a Timer. You could even make a class to neatly wrap everything if your repeated work methods are already factored out:
using System;
public class WorkRepeater
{
// Fully qualified to avoid confusion with System.Threading.Timer.
private readonly System.Timers.Timer m_Timer;
public WorkRepeater(Action workToRepeat, TimeSpan interval)
{
// TotalMilliseconds is the whole interval; Milliseconds is only the 0-999 ms component.
m_Timer = new System.Timers.Timer(interval.TotalMilliseconds);
m_Timer.Elapsed +=
new System.Timers.ElapsedEventHandler((o, ea) => workToRepeat());
}
public void Start()
{
m_Timer.Start();
}
public void Stop()
{
m_Timer.Stop();
}
}

Tasks are the wrong tool here. A Task should be used for short-lived operations, such as asynchronous I/O. If you want to control the lifetime of the work, use a dedicated Thread and sleep as much as you like: a Thread belongs to you alone, whereas Tasks are scheduled on the thread pool, which is shared.

Related

Interrupt parallel Stream execution

Consider this code :
Thread thread = new Thread(() -> tasks.parallelStream().forEach(Runnable::run));
Here, tasks is a list of Runnables that should be executed in parallel.
Once this thread has started and is executing, we may need to interrupt (cancel) all of those tasks, depending on the outcome of some calculations.
Interrupting the Thread stops only one of the executions. How do we handle the others? Or should Streams not be used this way? Or do you know a better solution?

You can use a ForkJoinPool to interrupt the threads:
@Test
public void testInterruptParallelStream() throws Exception {
final AtomicReference<InterruptedException> exc = new AtomicReference<>();
final ForkJoinPool forkJoinPool = new ForkJoinPool(4);
// use the pool with a parallel stream to execute some tasks
forkJoinPool.submit(() -> {
Stream.generate(Object::new).parallel().forEach(obj -> {
synchronized (obj) {
try {
// task that is blocking
obj.wait();
} catch (final InterruptedException e) {
exc.set(e);
}
}
});
});
// wait until the stream got started
Thread.sleep(500);
// now we want to interrupt the task execution
forkJoinPool.shutdownNow();
// wait for the interrupt to occur
Thread.sleep(500);
// check that we really got an interruption in the parallel stream threads
assertTrue(exc.get() instanceof InterruptedException);
}
The worker threads do really get interrupted, terminating a blocking operation. You can also call shutdown() within the Consumer.
Note that these fixed sleeps are not well tuned for a proper unit test; you may have a better way to wait only as long as necessary. But they are enough to show that the approach works.
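One way to wait only as long as necessary is to replace the fixed sleeps with latches. The following is a minimal sketch, not the answerer's code: the class name InterruptParallelStreamDemo, the method runAndInterrupt, the 100-element range, and the 5-second timeouts are all assumptions made for illustration. It relies on the same behaviour the answer describes, namely that shutdownNow() interrupts the pool's worker threads.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class InterruptParallelStreamDemo {

    /**
     * Runs blocking work in a parallel stream on a dedicated pool, shuts the
     * pool down, and returns how many times a blocked call saw an interrupt.
     * (Hypothetical helper, for illustration only.)
     */
    static int runAndInterrupt(int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch interrupted = new CountDownLatch(1);
        AtomicInteger interruptedCount = new AtomicInteger();

        // Submitting from inside the pool makes the parallel stream use this
        // pool's workers rather than the common pool.
        pool.submit(() ->
            IntStream.range(0, 100).parallel().forEach(i -> {
                started.countDown();
                try {
                    Thread.sleep(60_000); // stands in for a blocking task
                } catch (InterruptedException e) {
                    interruptedCount.incrementAndGet();
                    interrupted.countDown();
                    // Restore the flag so later elements on this worker
                    // fail fast instead of blocking again.
                    Thread.currentThread().interrupt();
                }
            }));

        try {
            // Wait until at least one element is being processed.
            if (!started.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("stream did not start");
            }
            pool.shutdownNow();
            // Wait until at least one worker has reported the interrupt.
            interrupted.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return interruptedCount.get();
    }

    public static void main(String[] args) {
        System.out.println("interrupts observed: " + runAndInterrupt(4));
    }
}
```

The latches bound the wait by an event rather than by a guessed duration, so the check returns as soon as the first worker reports the interrupt.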

You aren't actually running the Runnables on the Thread you are creating. You are running a thread which will submit to a pool, so:
Thread thread = new Thread(() -> tasks.parallelStream().forEach(Runnable::run));
Roughly speaking, this example is equivalent to:
List<Runnable> tasks = ...;
Thread thread = new Thread(new Runnable(){
public void run(){
for(Runnable r : tasks){
ForkJoinPool.commonPool().submit(r);
}
}
});
This is because you are using a parallelStream that delegates to a common pool when handling parallel executions.
As far as I know, you cannot get a handle on the threads that execute your tasks in a parallelStream, so you may be out of luck. You can always resort to tricks to get hold of the threads, but that is probably not a good idea.

Something like the following should work for you:
AtomicBoolean shouldCancel = new AtomicBoolean();
...
tasks.parallelStream().allMatch(task->{
task.run();
return !shouldCancel.get();
});
The documentation for the method allMatch specifically says that it "may not evaluate the predicate on all elements if not necessary for determining the result." So if the predicate doesn't match when you want to cancel, then it doesn't need to evaluate any more. Additionally, you can check the return result to see if the loop was cancelled or not.
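As a self-contained sketch of this idea: the class name CancellableTasksDemo, the helper runUntilCancelled, and the rule that the tenth task to run requests cancellation are assumptions made up for the example, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class CancellableTasksDemo {

    /**
     * Runs the tasks in parallel until shouldCancel is observed as set.
     * Returns true if every task ran without a cancellation being seen.
     */
    static boolean runUntilCancelled(List<Runnable> tasks, AtomicBoolean shouldCancel) {
        return tasks.parallelStream().allMatch(task -> {
            task.run();
            // A false result lets allMatch short-circuit the remaining elements.
            return !shouldCancel.get();
        });
    }

    public static void main(String[] args) {
        AtomicBoolean shouldCancel = new AtomicBoolean();
        AtomicInteger executed = new AtomicInteger();

        List<Runnable> tasks = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            tasks.add(() -> {
                // Hypothetical trigger: the tenth task to run requests cancellation.
                if (executed.incrementAndGet() == 10) {
                    shouldCancel.set(true);
                }
            });
        }

        boolean completed = runUntilCancelled(tasks, shouldCancel);
        System.out.println("completed=" + completed + ", executed=" + executed.get());
    }
}
```

This kind of cancellation is cooperative: tasks that are already running finish normally, and a few more may still start before the short-circuit takes effect, so the number of executed tasks varies from run to run.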

Using worker threads to add new tasks to a taskPool in D

This is a simplified and narrowed version of another of my questions: Need help parallel traversing a dag in D
Say you've got some code that you want to parallelize. The problem is, some of the things you need to do have prerequisites. So you have to make sure that those prerequisites are done before you add the new task into the pool. The simple conceptual answer is to add new tasks as their prerequisites finish.
Here I have a little chunk of code that emulates that pattern. The problem is, it throws an exception because pool.finish() gets called before a new task is put on the queue by the worker thread. Is there a way to just wait 'till all threads are idle or something? Or is there another construct that would allow this pattern?
Please note: this is a simplified version of my code to illustrate the problem. I can't just use taskPool.parallel() in a foreach.
import std.stdio;
import std.parallelism;
void simpleWorker(uint depth, uint maxDepth, TaskPool pool){
writeln("Depth is: ",depth);
if (++depth < maxDepth){
pool.put( task!simpleWorker(depth,maxDepth,pool));
}
}
void main(){
auto pool = new TaskPool();
auto t = task!simpleWorker(0,5,pool);
pool.put(t);
pool.finish(true);
if (t.done()){ //rethrows the exception thrown by the thread.
writeln("Done");
}
}

I fixed it: http://dpaste.dzfl.pl/eb9e4cfc
I changed the for loop to:
void cleanNodeSimple(Node node, TaskPool pool){
node.doProcess();
foreach (cli; pool.parallel(node.clients,1)){ // using parallel to make it concurrent
if (cli.canProcess()) {
cleanNodeSimple(cli, pool);
// no explicit task creation (already handled by parallel)
}
}
}

Caching requests to reduce processing (TPL?)

I'm currently trying to reduce the number of similar requests being processed in a business layer by:
Caching the requests a method receives
Performing the slow processing task (once for all similar requests)
Returning the result to each requesting method call
Things to note:
The original method calls do not currently follow an async BeginMethod() / EndMethod(IAsyncResult) pattern
The requests arrive faster than the output can be generated
I'm trying to use the TPL where possible, as I am currently trying to learn more about this library
E.g., improving the following:
byte[] RequestSlowOperation(string operationParameter)
{
// Perform slow task here...
}
Any thoughts?
Follow up:
class SomeClass
{
private int _threadCount;
public SomeClass(int threadCount)
{
_threadCount = threadCount;
int parameter = 0;
var taskFactory = Task<int>.Factory;
for (int i = 0; i < threadCount; i++)
{
int i1 = i;
taskFactory
.StartNew(() => RequestSlowOperation(parameter))
.ContinueWith(result => Console.WriteLine("Result {0} : {1}", result.Result, i1));
}
}
private int RequestSlowOperation(int parameter)
{
Lazy<int> result2;
var result = _cacheMap.GetOrAdd(parameter, new Lazy<int>(() => RequestSlowOperation2(parameter))).Value;
//_cacheMap.TryRemove(parameter, out result2); <<<<< Thought I could remove immediately, but this causes blobby behaviour
return result;
}
static ConcurrentDictionary<int, Lazy<int>> _cacheMap = new ConcurrentDictionary<int, Lazy<int>>();
private int RequestSlowOperation2(int parameter)
{
Console.WriteLine("Evaluating");
Thread.Sleep(100);
return parameter;
}
}

Here is a fast, safe and maintainable way to do this:
static readonly ConcurrentDictionary<string, Lazy<byte[]>> cacheMap = new ConcurrentDictionary<string, Lazy<byte[]>>();
byte[] RequestSlowOperation(string operationParameter)
{
// GetOrAdd's value factory receives the key; only the Lazy that gets published is ever evaluated.
return cacheMap.GetOrAdd(operationParameter, key => new Lazy<byte[]>(() => RequestSlowOperation2(key))).Value;
}
byte[] RequestSlowOperation2(string operationParameter)
{
// Perform slow task here...
}
This will execute RequestSlowOperation2 at most once per key. Please be aware that the memory held by the dictionary will never be released.
The user delegate passed to the ConcurrentDictionary is not executed under lock, meaning that it could execute multiple times! My solution allows multiple lazies to be created but only one of them will ever be published and materialized.
Regarding locking: this solution will take locks, but it does not matter because the work items are far more expensive than the (few) lock operations.

Honestly, the use of TPL as a technology here is not really important, this is just a straight up concurrency problem. You're trying to protect access to a shared resource (the cached data) and, to do that, the only approach is to lock. Either that or, if the cache entry does not already exist, you could allow all incoming threads to generate it and then subsequent requesters benefit from the cached value once it's stored, but there's little value in that if the resource is slow/expensive to generate and cache.
Perhaps some more details would make it clear exactly why you're trying to accomplish this without a lock. I'll happily revise my answer if more detail makes it clearer what you're trying to do.

How to optimize tests validating asynchronous code?

We are developing a WPF application using TDD. As we have been working on this solution for almost two years, we have written a large number of tests (almost 2000 unit tests right now).
Some classes need to implement their functionality in a multithreaded, asynchronous way, for example a communication component that can both send and receive messages and parse them. The dependencies are always mocked using RhinoMocks.
Our test methods targeting these classes all look very similar to the following:
[TestMethod]
public void Method_Description_ExpectedResult(){
// Arrange
var myStub = MockRepository.GenerateStub<IMyStub>();
var target = new MyAsynchronousClass(myStub);
// Act
target.Send("Foo");
Thread.Sleep(200);
//Assert
myStub.AssertWasCalled(x => x.Bar("Foo"));
}
As you can see, this test runs for at least 200 ms because of the Thread.Sleep(). We optimized the test by replacing the AssertWasCalled with an active polling method, something like this:
public static bool True(Func<bool> condition, int times, int waitTime)
{
for (var i = 0; i < times; i++)
{
if (condition())
return true;
Thread.Sleep(waitTime);
}
return condition();
}
We can now use this WaitFor.True(...) Method by changing the AssertWasCalled to:
var fooTriggered = false;
myStub.Stub(x => x.Bar("Foo")).Do((Action)(() => fooTriggered = true));
WaitFor.True(() => fooTriggered, 20, 20);
Assert.IsTrue(fooTriggered);
This construct terminates early if the condition is met, but it still takes too long for us. Running all of our 2000 tests takes about 5 minutes (building and running them).
Is there a smart trick for optimizing code like this?

You can use a monitor. I'm making this up so please excuse me if it isn't quite compiling, but it'll look something like:
[TestMethod]
public void Method_Description_ExpectedResult(){
// Arrange
var waitingRoom = new object();
var myStub = MockRepository.GenerateStub<IMyStub>();
myStub.Setup(x => x.Bar("Foo")).Callback(x =>
{
Monitor.Enter(waitingRoom);
Monitor.Pulse(waitingRoom);
Monitor.Exit(waitingRoom);
});
var target = new MyAsynchronousClass(myStub);
// Act
Monitor.Enter(waitingRoom);
target.Send("Foo");
Monitor.Wait(waitingRoom);
Monitor.Exit(waitingRoom);
//Assert
myStub.AssertWasCalled(x => x.Bar("Foo"));
}
Code inside the monitor can't run while another thread holds it. Because the test holds the monitor while it calls Send, the callback blocks on Monitor.Enter until the test reaches Monitor.Wait, which releases the lock. The callback can then enter and pulse the monitor. The test "wakes up", and once the callback has exited the monitor, it gets control back and exits too, allowing you to Assert.
The only thing I haven't covered is that if Bar("Foo") doesn't get called it will hang, so you might want to have a timer pulse the thread too.
You can create a class which does the complex monitoring bits for you if you use it a lot. This is one I wrote to deal with asynchronous checks in UI automation; adapting it for what you're doing might help you.

Task is ignoring Thread.Sleep

I'm trying to grasp the TPL.
Just for fun I tried to create some Tasks with a random sleep to see how they were processed. I was aiming for a fire-and-forget pattern.
static void Main(string[] args)
{
Console.WriteLine("Demonstrating a successful transaction");
Random d = new Random();
for (int i = 0; i < 10; i++)
{
var sleep = d.Next(100, 2000);
Action<int> succes = (int x) =>
{
Thread.Sleep(x);
Console.WriteLine("sleep={2}, Task={0}, Thread={1}: Begin successful transaction",
Task.CurrentId, Thread.CurrentThread.ManagedThreadId, x);
};
Task t1 = Task.Factory.StartNew(() => succes(sleep));
}
Console.ReadLine();
}
But I don't understand why it outputs all lines to the Console ignoring the Sleep(random)
Can someone explain that to me?

Important:
The TPL default TaskScheduler does not guarantee Thread per Task - one thread can be used for processing several tasks.
Calling Thread.Sleep might impact other tasks performance.
You can construct your task with the TaskCreationOptions.LongRunning hint. With it, the default TaskScheduler will assign a dedicated thread to the task, and it will be safe to block on it.

Your code uses the value of i instead of the generated random number. It does not ignore the sleep but rather sleeps between 0 and 10ms each iteration.
Try:
Thread.Sleep(sleep);

The statement
Task t1 = Task.Factory.StartNew(() => succes(sleep));
creates the Task and starts it immediately; the for loop then moves on to the next iteration without waiting for the task to finish. So by the time the second task is created and started, the first one may already have finished. In other words, you are not waiting for the tasks to end.
You could try:
Task t1 = Task.Factory.StartNew(() => succes(sleep));
t1.Wait();
