Caching requests to reduce processing (TPL?) - c#-4.0

I'm currently trying to reduce the number of similar requests being processed in a business layer by:
Caching the requests a method receives
Performing the slow processing task (once for all similar requests)
Returning the result to each requesting method call
Things to note:
The original method calls are not currently in an async BeginMethod() / EndMethod(IAsyncResult) pattern
The requests arrive faster than the time it takes to generate the output
I'm trying to use TPL where possible, as I am currently trying to learn more about this library
E.g., improving the following:
byte[] RequestSlowOperation(string operationParameter)
{
    // Perform slow task here...
}
Any thoughts?
Follow up:
class SomeClass
{
    private int _threadCount;

    public SomeClass(int threadCount)
    {
        _threadCount = threadCount;
        int parameter = 0;
        var taskFactory = Task<int>.Factory;
        for (int i = 0; i < threadCount; i++)
        {
            int i1 = i;
            taskFactory
                .StartNew(() => RequestSlowOperation(parameter))
                .ContinueWith(result => Console.WriteLine("Result {0} : {1}", result.Result, i1));
        }
    }

    private int RequestSlowOperation(int parameter)
    {
        Lazy<int> result2;
        var result = _cacheMap.GetOrAdd(parameter, new Lazy<int>(() => RequestSlowOperation2(parameter))).Value;
        //_cacheMap.TryRemove(parameter, out result2); <<<<< Thought I could remove immediately, but this causes blobby behaviour
        return result;
    }

    static ConcurrentDictionary<int, Lazy<int>> _cacheMap = new ConcurrentDictionary<int, Lazy<int>>();

    private int RequestSlowOperation2(int parameter)
    {
        return parameter;
    }
}

Here is a fast, safe and maintainable way to do this:
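// 'var' is not allowed for fields, so the type is spelled out.
static readonly ConcurrentDictionary<string, Lazy<byte[]>> cacheMap = new ConcurrentDictionary<string, Lazy<byte[]>>();

byte[] RequestSlowOperation(string operationParameter)
{
    // The value factory passed to GetOrAdd receives the key as its argument.
    return cacheMap.GetOrAdd(operationParameter, key => new Lazy<byte[]>(() => RequestSlowOperation2(key))).Value;
}

byte[] RequestSlowOperation2(string operationParameter)
{
    // Perform slow task here...
}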
This will execute RequestSlowOperation2 at most once per key. Please be aware that the memory held by the dictionary will never be released.
The user delegate passed to the ConcurrentDictionary is not executed under lock, meaning that it could execute multiple times! My solution allows multiple lazies to be created but only one of them will ever be published and materialized.
Regarding locking: this solution will take locks, but it does not matter because the work items are far more expensive than the (few) lock operations.
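The at-most-once behaviour described above can be checked with a small sketch. The class name LazyCacheDemo, the SlowCalls counter, the 200 ms sleep and the returned bytes are illustrative assumptions and not part of the original code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class LazyCacheDemo
{
    private static readonly ConcurrentDictionary<string, Lazy<byte[]>> CacheMap =
        new ConcurrentDictionary<string, Lazy<byte[]>>();

    // Hypothetical counter, only here to observe how often the slow path runs.
    public static int SlowCalls;

    public static byte[] RequestSlowOperation(string operationParameter)
    {
        // Several Lazy instances may be created under contention, but only the
        // one published in the dictionary is ever materialized through .Value.
        return CacheMap.GetOrAdd(
            operationParameter,
            key => new Lazy<byte[]>(() => RequestSlowOperation2(key))).Value;
    }

    private static byte[] RequestSlowOperation2(string operationParameter)
    {
        Interlocked.Increment(ref SlowCalls);
        Thread.Sleep(200); // stand-in for the slow task
        return new byte[] { 1, 2, 3 };
    }

    public static void Main()
    {
        // Eight concurrent requests for the same key.
        Parallel.For(0, 8, i => RequestSlowOperation("same-key"));
        Console.WriteLine(SlowCalls); // 1
    }
}
```

Lazy<T> created with this constructor uses LazyThreadSafetyMode.ExecutionAndPublication, so concurrent callers block on .Value until the single computation finishes and then all receive the same result.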

Honestly, the use of TPL as a technology here is not really important, this is just a straight up concurrency problem. You're trying to protect access to a shared resource (the cached data) and, to do that, the only approach is to lock. Either that or, if the cache entry does not already exist, you could allow all incoming threads to generate it and then subsequent requesters benefit from the cached value once it's stored, but there's little value in that if the resource is slow/expensive to generate and cache.
Perhaps some more details will make it clear exactly why you're trying to accomplish this without a lock. I'll happily revise my answer if more detail makes it clearer what you're trying to do.


Wrapping legacy object in IConnectableObservable

I have a legacy event-based object that seems like a perfect fit for RX: after being connected to a network source, it raises events when a message is received, and may terminate with either a single error (connection dies, etc.) or (rarely) an indication that there will be no more messages. This object also has a couple projections -- most users are interested in only a subset of the messages received, so there are alternate events raised only when well-known message subtypes show up.
So, in the process of learning more about reactive programming, I built the following wrapper:
class LegacyReactiveWrapper : IConnectableObservable<TopLevelMessage>
{
    private LegacyType _Legacy;
    private IConnectableObservable<TopLevelMessage> _Impl;

    public LegacyReactiveWrapper(LegacyType t)
    {
        _Legacy = t;
        var observable = Observable.Create<TopLevelMessage>((observer) =>
        {
            LegacyTopLevelMessageHandler tlmHandler = (sender, tlm) => observer.OnNext(tlm);
            LegacyErrorHandler errHandler = (sender, err) => observer.OnError(new ApplicationException(err.Message));
            LegacyCompleteHandler doneHandler = (sender) => observer.OnCompleted();
            _Legacy.TopLevelMessage += tlmHandler;
            _Legacy.Error += errHandler;
            _Legacy.Complete += doneHandler;
            return Disposable.Create(() =>
            {
                _Legacy.TopLevelMessage -= tlmHandler;
                _Legacy.Error -= errHandler;
                _Legacy.Complete -= doneHandler;
            });
        });
        _Impl = observable.Publish();
    }

    public IDisposable Subscribe(IObserver<TopLevelMessage> observer)
    {
        return _Impl.RefCount().Subscribe(observer);
    }

    public IDisposable Connect()
    {
        return Disposable.Create(() => _Legacy.DisconnectFromMessageSource());
    }

    public IObservable<SubMessageA> MessageA
    {
        get
        {
            // This is the moral equivalent of the projection behavior
            // that already exists in the legacy type. We don't hook
            // the LegacyType.MessageA event directly.
            return _Impl.RefCount()
                .Where((tlm) => tlm.MessageType == MessageType.MessageA)
                .Select((tlm) => tlm.SubMessageA);
        }
    }

    public IObservable<SubMessageB> MessageB
    {
        get { return _Impl.RefCount().Where((tlm) => tlm.MessageType == MessageType.MessageB).Select((tlm) => tlm.SubMessageB); }
    }
}
Something about how it's used elsewhere feels... off... somehow, though. Here's sample usage, which works but feels strange. The UI context for my test application is WinForms, but it doesn't really matter.
// in Program.Main...
MainForm frm = new MainForm();
// Updates the UI based on a stream of SubMessageA's
IObserver<SubMessageA> uiManager = new MainFormUiManager(frm);
LegacyType lt = new LegacyType();
// ... setup lt...
var w = new LegacyReactiveWrapper(lt);
var uiUpdateSubscription = (from msgA in w.MessageA
where SomeCondition(msgA)
select msgA).ObserveOn(frm).Subscribe(uiManager);
var nonUiSubscription = (from msgB in w.MessageB
where msgB.SubType == MessageBType.SomeSubType
select msgB).Subscribe(
m => Console.WriteLine("Got MsgB: {0}", m),
ex => Console.WriteLine("MsgB error: {0}", ex.Message),
() => Console.WriteLine("MsgB complete")
IDisposable unsubscribeAtExit = null;
frm.Load += (sender, e) =>
var connectionSubscription = w.Connect();
unsubscribeAtExit = new CompositeDisposable(
frm.FormClosing += (sender, e) =>
if(unsubscribeAtExit != null) { unsubscribeAtExit.Dispose(); }
This WORKS -- The form launches, the UI updates, and when I close it the subscriptions get cleaned up and the process exits (which it won't do if the LegacyType's network connection is still connected). Strictly speaking, it's enough to dispose just connectionSubscription. However, the explicit Connect feels weird to me. Since RefCount is supposed to do that for you, I tried modifying the wrapper such that rather than using _Impl.RefCount in MessageA and MessageB and explicitly connecting later, I used this.RefCount instead and moved the calls to Subscribe to the Load handler. That had a different problem -- the second subscription triggered another call to LegacyReactiveWrapper.Connect. I thought the idea behind Publish/RefCount was "first-in triggers connection, last-out disposes connection."
I guess I have three questions:
Do I fundamentally misunderstand Publish/RefCount?
Is there some preferred way to implement IConnectableObservable<T> that doesn't involve delegation to one obtained via IObservable<T>.Publish? I know you're not supposed to implement IObservable<T> yourself, but I don't understand how to inject connection logic into the IConnectableObservable<T> that Observable.Create().Publish() gives you. Is Connect supposed to be idempotent?
Would someone more familiar with RX/reactive programming look at the sample for how the wrapper is used and say "that's ugly and broken" or is this not as weird as it seems?
I'm not sure that you need to expose Connect directly as you have. I would simplify as follows, using Publish().RefCount() as an encapsulated implementation detail; it would cause the legacy connection to be made only as required. Here the first subscriber in causes connection, and the last one out causes disconnection. Also note this correctly shares a single RefCount across all subscribers, whereas your implementation uses a RefCount per message type, which probably isn't what was intended. Users are not required to Connect explicitly:
public class LegacyReactiveWrapper
{
    private IObservable<TopLevelMessage> _legacyRx;

    public LegacyReactiveWrapper(LegacyType legacy)
    {
        _legacyRx = WrapLegacy(legacy).Publish().RefCount();
    }

    private static IObservable<TopLevelMessage> WrapLegacy(LegacyType legacy)
    {
        return Observable.Create<TopLevelMessage>(observer =>
        {
            LegacyTopLevelMessageHandler tlmHandler = (sender, tlm) => observer.OnNext(tlm);
            LegacyErrorHandler errHandler = (sender, err) => observer.OnError(new ApplicationException(err.Message));
            LegacyCompleteHandler doneHandler = sender => observer.OnCompleted();
            legacy.TopLevelMessage += tlmHandler;
            legacy.Error += errHandler;
            legacy.Complete += doneHandler;
            return Disposable.Create(() =>
            {
                legacy.TopLevelMessage -= tlmHandler;
                legacy.Error -= errHandler;
                legacy.Complete -= doneHandler;
            });
        });
    }

    public IObservable<TopLevelMessage> TopLevelMessage
    {
        get { return _legacyRx; }
    }

    public IObservable<SubMessageA> MessageA
    {
        get { return _legacyRx.Where(tlm => tlm.MessageType == MessageType.MessageA).Select(tlm => tlm.SubMessageA); }
    }

    public IObservable<SubMessageB> MessageB
    {
        get { return _legacyRx.Where(tlm => tlm.MessageType == MessageType.MessageB).Select(tlm => tlm.SubMessageB); }
    }
}
An additional observation is that Publish().RefCount() will drop the underlying subscription when its subscriber count reaches 0. Typically I only use Connect over this choice when I need to maintain a subscription even when the subscriber count on the published source drops to zero (and may go back up again later). It's rare to need to do this though - only when connecting is more expensive than holding on to the subscription resource when you might not need to.
Your understanding is not entirely wrong, but you do appear to have some points of misunderstanding.
You seem to be under the belief that multiple calls to RefCount on the same source IObservable will result in a shared reference count. They do not; each instance keeps its own count. As such, you are causing multiple subscriptions to _Impl, one per call to subscribe or call to the Message properties.
You also may be expecting that making _Impl an IConnectableObservable somehow causes your Connect method to be called (since you seem surprised you needed to call Connect in your consuming code). All Publish does is cause subscribers to the published object (returned from the .Publish() call) to share a single subscription to the underlying source observable (in this case, the object made from your call to Observable.Create).
Typically, I see Publish and RefCount used immediately together (eg as source.Publish().RefCount()) to get the shared subscription effect described above or to make a cold observable hot without needing to call Connect to start the subscription to the original source. However, this relies on using the same object returned from the .Publish().RefCount() for all subscribers (as noted above).
Your implementation of Connect seems reasonable. I don't know of any recommendations on whether Connect should be idempotent, but I would not personally expect it to be. If you wanted it to be, you would just need to track calls to it and the disposal of its return value to get the right balance.
I don't think you need to use Publish the way you are, unless there is some reason to avoid multiple event handlers being attached to the legacy object. If you do need to avoid that, I would recommend changing _Impl to a plain IObservable and follow the Publish with a RefCount.
Your MessageA and MessageB properties have potential to be a source of confusion for users, since they return an IObservable, but still require a call to Connect on the base object to start receiving messages. I would either change them to IConnectableObservables that somehow delegate to the original Connect (at which point the idempotency discussion becomes more relevant) or drop the properties and just let the users make the (fairly simple) projections themselves when needed.

Code coverage for async methods

When I analyse code coverage in Visual Studio 2012, any of the await lines in async methods are showing as not covered even though they are obviously executing since my tests are passing. The code coverage report says that the uncovered method is MoveNext, which is not present in my code (perhaps it's compiler-generated).
Is there a way to fix code coverage reporting for async methods?
I just ran coverage using NCover, and the coverage numbers make a lot more sense using that tool. As a workaround for now, I'll be switching to that.
This can happen most commonly if the operation you're awaiting is completed before it's awaited.
I recommend you test at least synchronous and asynchronous success situations, but it's also a good idea to test synchronous and asynchronous errors and cancellations.
The reason the code is not shown as being covered has to do with how async methods are implemented. The C# compiler actually translates the code in async methods into a class that implements a state machine, and transforms the original method into a stub that initializes and invokes that state machine. Since this code is generated in your assembly, it is included in the code coverage analysis.
If you use a task that is not complete at the time the code being covered is executing, the compiler-generated state machine hooks up a completion callback to resume when the task completes. This more completely exercises the state machine code, and results in complete code coverage (at least for statement-level code coverage tools).
A common way to get a task that is not complete at the moment, but will complete at some point, is to use Task.Delay in your unit test. However, that is generally a poor option because the time delay is either too small (and results in unpredictable code coverage because sometimes the task is complete before the code being tested runs) or too large (unnecessarily slowing the tests down).
A better option is to use "await Task.Yield()". This will return immediately but invoke the continuation as soon as it is set.
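A minimal sketch of the difference, assuming a console application with no synchronization context. The class and method names (YieldDemo, GetCompletedAsync, GetYieldingAsync) are hypothetical stand-ins for a mocked dependency.

```csharp
using System;
using System.Threading.Tasks;

public static class YieldDemo
{
    // Hypothetical stub that is already complete when it returns,
    // as many mocked dependencies are.
    public static Task<int> GetCompletedAsync()
    {
        return Task.FromResult(42);
    }

    // Hypothetical stub that yields first, so its own state machine
    // always goes through the continuation path.
    public static async Task<int> GetYieldingAsync()
    {
        await Task.Yield();
        return 42;
    }

    public static void Main()
    {
        // A completed awaiter lets the compiler-generated state machine
        // continue inline, skipping the continuation hookup.
        Console.WriteLine(GetCompletedAsync().GetAwaiter().IsCompleted); // True

        // The awaiter behind Task.Yield() never reports completion,
        // so awaiting it always schedules a continuation.
        Console.WriteLine(Task.Yield().GetAwaiter().IsCompleted); // False

        // Both stubs still produce the same value.
        Console.WriteLine(GetYieldingAsync().Result); // 42
    }
}
```

A caller that awaits GetYieldingAsync() will usually observe an incomplete task, although the continuation can in principle finish on a pool thread before the caller checks. Blocking on .Result is only safe here because of the console assumption; under a UI synchronization context it could deadlock.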
Another option - though somewhat absurd - is to implement your own awaitable pattern that has the semantics of reporting incomplete until a continuation callback is hooked up, and then to immediately complete. This basically forces the state machine into the async path, providing the complete coverage.
To be sure, this is not a perfect solution. The most unfortunate aspect is that it requires modification to production code to address a limitation of a tool. I would much prefer that the code coverage tool ignore the portions of the async state machine that are generated by the compiler. But until that happens, there aren’t many options if you really want to try to get complete code coverage.
A more complete explanation of this hack can be found here:
There are situations where I don't care about testing the async nature of a method but just want to get rid of the partial code coverage. I use the extension method below to avoid this, and it works just fine for me.
Warning "Thread.Sleep" used here!
public static IReturnsResult<TClass> ReturnsAsyncDelayed<TClass, TResponse>(this ISetup<TClass, Task<TResponse>> setup, TResponse value) where TClass : class
{
    var completionSource = new TaskCompletionSource<TResponse>();
    // Complete the task from a pool thread after a short sleep, so the code under test awaits an incomplete task.
    Task.Run(() => { Thread.Sleep(200); completionSource.SetResult(value); });
    return setup.Returns(completionSource.Task);
}
and the usage is similar to the Moq's ReturnsAsync Setup.
_sampleMock.Setup(s => s.SampleMethodAsync()).ReturnsAsyncDelayed(response);
I created a test runner that runs a block of code multiple times and varies the task that is delayed using a factory. This is great for testing the different paths through simple blocks of code. For more complex paths you may want to create a test per path.
public async Task ShouldTestAsync()
await AsyncTestRunner.RunTest(async taskFactory =>
this.apiRestClient.GetAsync<List<Item1>>(NullString).ReturnsForAnyArgs(taskFactory.Result(new List<Item1>()));
this.apiRestClient.GetAsync<List<Item2>>(NullString).ReturnsForAnyArgs(taskFactory.Result(new List<Item2>()));
var items = await this.apiController.GetAsync();
Assert.AreEqual(0, items.Count(), "Zero items should be returned.");
public static class AsyncTestRunner
public static async Task RunTest(Func<ITestTaskFactory, Task> test)
var testTaskFactory = new TestTaskFactory();
while (testTaskFactory.NextTestRun())
await test(testTaskFactory);
public class TestTaskFactory : ITestTaskFactory
public TestTaskFactory()
this.firstRun = true;
this.totalTasks = 0;
this.currentTestRun = -1; // Start at -1 so it will go to 0 for first run.
this.currentTaskNumber = 0;
public bool NextTestRun()
// Use final task number as total tasks.
this.totalTasks = this.currentTaskNumber;
// Always report another run on the first pass, and while not every task has been delayed yet.
// We need one more test run than there are tasks, to cover the case where they all run synchronously.
var hasNext = this.firstRun || this.currentTestRun <= this.totalTasks;
// Go to next run so we know what task should be delayed,
// and then reset the current task number so we start over.
this.currentTaskNumber = 0;
this.firstRun = false;
return hasNext;
public async Task<T> Result<T>(T value, int delayInMilliseconds = DefaultDelay)
if (this.TaskShouldBeDelayed())
await Task.Delay(delayInMilliseconds);
return value;
private bool TaskShouldBeDelayed()
var result = this.currentTaskNumber == this.currentTestRun - 1;
return result;
public async Task VoidResult(int delayInMilliseconds = DefaultDelay)
// If the task number we are on matches the test run,
// make it delayed so we can cycle through them.
// Otherwise this task will be complete when it is reached.
if (this.TaskShouldBeDelayed())
await Task.Delay(delayInMilliseconds);
public async Task<T> FromResult<T>(T value, int delayInMilliseconds = DefaultDelay)
if (this.TaskShouldBeDelayed())
await Task.Delay(delayInMilliseconds);
return value;

Consumer/Producer with order and constraint on consumed items

I have the following scenario
I am writing a server that processes files (jobs)
a file has a "prefix" and a time
the files should be processed according to time (older file first) but also take into account the prefix (files with same prefix can't be processed concurrently)
I have a thread (Task with Timer) that watches over a directory and adds files to a "queue" (producer)
I have several consumers that take the file from "queue" (consumer) - they should conform to the above rules.
the job of each task is kept in some list (this indicates the constraints)
There are several consumers, the number of consumers is determined at startup.
One of the requirements is to be able to stop the consumers gracefully (either immediately or after letting ongoing jobs finish).
I did something along this line:
while (processing)
{
    //limits number of concurrent tasks
    //Take next job when available or wait for cancel signal
    currentWork = workQueue.Take(taskCancellationToken);
    //check that it can actually process this work
    if (CanProcess(currentWork))
    {
        var task = CreateTask(currentWork);
        task.ContinueWith((t) => { /* release processing slot */ });
    }
    else
    {
        //release slot, return job? something else?
    }
}
The cancellation token sources are in the caller code and can be cancelled. There are two in order to be able to stop queuing while not cancelling running tasks.
I tried to implement the "queue" as a BlockingCollection wrapping a "safe" SortedSet. The general idea works (ordering by time) except in the case where I need to find a new job that matches the constraint. If I return the job to the queue and try to take again, I will get the same one.
It is possible to take jobs from the queue until I find a proper one and then returning the "illegal" jobs back but this may cause issues with other consumers processing out of order jobs
Another alternative is to pass a simple collection and a way to lock it and just lock and do a simple search according to current constraints. Again, this means writing code that will possibly not be thread-safe.
Any other suggestion / pointers / data structures that can help?
I think Hans is right: if you already have a thread-safe SortedSet (that implements IProducerConsumerCollection, so it can be used in BlockingCollection), then all you need is to put only files that can be processed right now into the collection. If you finish a file which makes another file available for processing, add the other file to the collection at this point, not earlier.
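One way to read this suggestion is sketched below. It is an assumption-laden illustration, not the asker's code: Job, PrefixScheduler, Ready, Add and Complete are hypothetical names, the ready collection is a plain FIFO BlockingCollection rather than the sorted one from the question, and jobs are assumed to be added in time order.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public sealed class Job
{
    public Job(string prefix, DateTime time) { Prefix = prefix; Time = time; }
    public string Prefix { get; private set; }
    public DateTime Time { get; private set; }
}

public sealed class PrefixScheduler
{
    private readonly object _gate = new object();
    // Prefixes that currently have a job in Ready or being processed.
    private readonly HashSet<string> _busyPrefixes = new HashSet<string>();
    // Jobs held back because their prefix is busy, per prefix, in arrival order.
    private readonly Dictionary<string, Queue<Job>> _heldBack = new Dictionary<string, Queue<Job>>();

    // Consumers call Ready.Take(token); everything in here may run right now.
    public readonly BlockingCollection<Job> Ready = new BlockingCollection<Job>();

    // Called by the producer.
    public void Add(Job job)
    {
        lock (_gate)
        {
            if (_busyPrefixes.Add(job.Prefix))
            {
                Ready.Add(job); // prefix was idle, so the job is processable
                return;
            }
            Queue<Job> queue;
            if (!_heldBack.TryGetValue(job.Prefix, out queue))
            {
                queue = new Queue<Job>();
                _heldBack[job.Prefix] = queue;
            }
            queue.Enqueue(job);
        }
    }

    // Called by a consumer when it has finished a job.
    public void Complete(Job job)
    {
        lock (_gate)
        {
            Queue<Job> queue;
            if (_heldBack.TryGetValue(job.Prefix, out queue) && queue.Count > 0)
                Ready.Add(queue.Dequeue()); // release the next job; prefix stays busy
            else
                _busyPrefixes.Remove(job.Prefix);
        }
    }
}

public static class PrefixSchedulerDemo
{
    public static void Main()
    {
        var scheduler = new PrefixScheduler();
        scheduler.Add(new Job("A", new DateTime(2020, 1, 1)));
        scheduler.Add(new Job("A", new DateTime(2020, 1, 2))); // held back
        scheduler.Add(new Job("B", new DateTime(2020, 1, 3)));
        Console.WriteLine(scheduler.Ready.Count); // 2

        Job first = scheduler.Ready.Take();
        Console.WriteLine(first.Prefix);          // A
        scheduler.Complete(first);                // releases the second "A" job
        Console.WriteLine(scheduler.Ready.Count); // 2
    }
}
```

Because consumers only ever see jobs that are allowed to run, they never need to put a job back. The trade-off in this sketch is that a released job joins the end of the ready queue, so strict oldest-first ordering across prefixes would need the sorted collection from the question in place of the FIFO one.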
I would have implemented your requirement(s) with TPL Dataflow. Look at the way you could implement the Producer-Consumer pattern with it. I believe this will meet all the requirements you have (including cancellation on the consumers).
EDIT (for those that do not like to read documentation, but who does...)
Here is an example of how you could implement the requirements with TPL Dataflow. The beauty of this implementation is that consumers are not bound to a single thread and only use a pool thread when they need to process data.
static void Main(string[] args)
BufferBlock<string> source = new BufferBlock<string>();
var cancellation = new CancellationTokenSource();
LinkConsumer(source, "A", cancellation.Token);
LinkConsumer(source, "B", cancellation.Token);
LinkConsumer(source, "C", cancellation.Token);
// Link an action that will process source values that are not processed by other
source.LinkTo(new ActionBlock<string>((s) => Console.WriteLine("Default action")));
while (cancellation.IsCancellationRequested == false)
ConsoleKey key = Console.ReadKey(true).Key;
switch (key)
case ConsoleKey.Escape:
Console.WriteLine("Posted value {0} on thread {1}.", key, Thread.CurrentThread.ManagedThreadId);
private static void LinkConsumer(ISourceBlock<string> source, string prefix, CancellationToken token)
// Link a consumer that will buffer and process all input of the specified prefix
var consumer = new ActionBlock<string>(new Action<string>(Process), new ExecutionDataflowBlockOptions() { MaxDegreeOfParallelism = 1, SingleProducerConstrained = true, CancellationToken = token, TaskScheduler = TaskScheduler.Default });
var linkDisposable = source.LinkTo(consumer, (p) => p == prefix);
// Dispose the link (remove the link) when cancellation is requested.
private static void Process(string arg)
Console.WriteLine("Processed value {0} in thread {1}", arg, Thread.CurrentThread.ManagedThreadId);
// Simulate work

How to optimize tests validating asynchronous code?

We are developing a WPF application using TDD. As we've been working on this solution for almost two years, we've written a huge number of tests (almost 2000 unit tests right now).
Some classes need to implement multithreaded, asynchronous functionality, for example a communication component that can send, receive and parse messages. The dependencies are always mocked using RhinoMocks.
Our test methods targeting these classes look very similar to the following:
public void Method_Description_ExpectedResult(){
    // Arrange
    var myStub = MockRepository.GenerateStub<IMyStub>();
    var target = new MyAsynchronousClass(myStub);
    // Act
    target.Send("Foo");
    Thread.Sleep(200);
    // Assert
    myStub.AssertWasCalled(x => x.Bar("Foo"));
}
As you can see, this test runs for at least 200 ms due to the Thread.Sleep(). We optimized the test by replacing the AssertWasCalled with an active polling method, like this:
public static bool True(Func<bool> condition, int times, int waitTime)
{
    for (var i = 0; i < times; i++)
    {
        if (condition())
            return true;
        Thread.Sleep(waitTime); // pause between polls
    }
    return condition();
}
We can now use this WaitFor.True(...) Method by changing the AssertWasCalled to:
var fooTriggered = false;
myStub.Stub(x => x.Bar("Foo")).Do((Action)(() => fooTriggered = true));
WaitFor.True(() => fooTriggered, 20, 20);
This construct will terminate earlier if the condition matches, but it still takes too long for us. Running all of our 2000 tests takes about 5 minutes (building and running them).
Is there any smart trick how we could optimize code like this?
You can use a monitor. I'm making this up so please excuse me if it isn't quite compiling, but it'll look something like:
public void Method_Description_ExpectedResult(){
// Arrange
var waitingRoom = new object();
var myStub = MockRepository.GenerateStub<IMyStub>();
myStub.Setup(x => x.Bar("Foo")).Callback(x =>
var target = new MyAsynchronousClass(myStub);
// Act
myStub.AssertWasCalled(x => x.Bar("Foo"));
Code written within the Monitor can't run until it's free. The test will cause the acting thread to wait until Monitor.Wait has been called. Then the callback can enter and pulse the Monitor. The test then "wakes up", and once the callback has exited the monitor, it gets control back and exits too, allowing you to Assert.
The only thing I haven't covered is that if Bar("Foo") doesn't get called it will hang, so you might want to have a timer pulse the thread too.
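The wait-and-pulse idea, including a timeout so a missing call cannot hang the test, might look roughly like the sketch below. It is independent of any mocking library: CallbackWaiter and WaitForCall are hypothetical names, and the Action handed to the work stands in for whatever the stub's callback would invoke.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CallbackWaiter
{
    // Starts the work and waits until it invokes the supplied callback
    // or the timeout elapses. Returns true if the callback was invoked.
    public static bool WaitForCall(Action<Action> startWork, int timeoutMs)
    {
        var waitingRoom = new object();
        bool called = false;

        Action callback = () =>
        {
            // Cannot enter until the waiting thread releases the lock via Monitor.Wait.
            lock (waitingRoom)
            {
                called = true;
                Monitor.Pulse(waitingRoom);
            }
        };

        lock (waitingRoom)
        {
            startWork(callback);
            // Skip the wait if the callback already ran synchronously on this thread.
            if (!called)
                Monitor.Wait(waitingRoom, timeoutMs); // releases the lock while waiting
            return called;
        }
    }

    public static void Main()
    {
        // Work that calls back from another thread.
        Console.WriteLine(WaitForCall(cb => Task.Factory.StartNew(cb), 1000)); // True

        // Work that never calls back: the wait gives up after 50 ms.
        Console.WriteLine(WaitForCall(cb => { }, 50)); // False
    }
}
```

In a test, the returned flag can be asserted directly, so a successful run finishes as soon as the call arrives rather than after a fixed sleep.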
You can create a class which does the complex monitoring bits for you if you use it a lot. This is one I wrote to deal with asynchronous checks in UI automation; adapting it for what you're doing might help you.

Parallel Task advice

I am trying to use the parallel task library to kick off a number of tasks like this:
var workTasks = _schedules.Where(x => x.Task.Enabled);
_tasks = new Task[workTasks.Count()];
_cancellationTokenSource = new CancellationTokenSource();
int i = 0;
foreach (var schedule in _schedules.Where(x => x.Task.Enabled))
_log.InfoFormat("Reading task information for task {0}", schedule.Task.Name);
_log.InfoFormat("task {0} disabled.", schedule.Task.Name);
schedule.Task.ServiceStarted = true;
_tasks[i] = Task.Factory.StartNew(() =>
, _cancellationTokenSource.Token);
_log.InfoFormat("task {0} has been added to the worker threads and has been started.", schedule.Task.Name);
I want these tasks to sleep, then wake up every 5 minutes and do their work. At the moment I am using Thread.Sleep in the Schedule object, whose Run method is the Action passed into StartNew, like this:
_tasks[i] = Task.Factory.StartNew(() =>
, _cancellationTokenSource.Token);
I read somewhere that Thread.Sleep is a bad solution for this. Can anyone recommend a better approach?
By my understanding, Thread.Sleep is bad generally, because it force-shifts everything out of memory even when that's not necessary. It won't be a big deal in most cases, but it could be a performance issue.
I'm in the habit of using this snippet instead:
new System.Threading.EventWaitHandle(false, EventResetMode.ManualReset).WaitOne(1000);
Fits on one line, and isn't overly complicated -- it creates an event handle that will never be set, and then waits for the full timeout period before continuing.
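A variation on the same wait-handle idea, sketched here as an assumption rather than as part of the original snippet, is to wait on the handle of the CancellationToken the question already has. The wait then ends early when cancellation is requested. CancellableWait and Sleep are hypothetical names.

```csharp
using System;
using System.Threading;

public static class CancellableWait
{
    // Returns true if the whole interval elapsed,
    // false if cancellation was requested first.
    public static bool Sleep(TimeSpan interval, CancellationToken token)
    {
        // WaitOne returns true only when the handle is signalled,
        // which for a token's handle means it was cancelled.
        return !token.WaitHandle.WaitOne(interval);
    }

    public static void Main()
    {
        var cts = new CancellationTokenSource();

        // Not cancelled: waits the full 50 ms.
        Console.WriteLine(Sleep(TimeSpan.FromMilliseconds(50), cts.Token)); // True

        // Already cancelled: returns at once instead of waiting 5 minutes.
        cts.Cancel();
        Console.WriteLine(Sleep(TimeSpan.FromMinutes(5), cts.Token)); // False
    }
}
```

A worker loop could then be written as `while (CancellableWait.Sleep(TimeSpan.FromMinutes(5), token)) { /* do the work */ }`, which stops promptly once the token source is cancelled.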
Anyway, if you're just trying to have something repeat every 5 minutes, a better approach would probably be to use a Timer. You could even make a class to neatly wrap everything if your repeated work methods are already factored out:
using System;
using System.Timers;

public class WorkRepeater
{
    private readonly Timer m_Timer;

    public WorkRepeater(Action workToRepeat, TimeSpan interval)
    {
        // TotalMilliseconds is the whole interval; Milliseconds is only its 0-999 component.
        m_Timer = new Timer(interval.TotalMilliseconds);
        m_Timer.Elapsed += (o, ea) => workToRepeat();
    }

    public void Start() { m_Timer.Start(); }

    public void Stop() { m_Timer.Stop(); }
}
Tasks are a bad solution here. A Task should be used for short-lived operations, like asynchronous I/O. If you want to control the lifetime of the work, you should use a Thread and sleep as much as you like, because a Thread is dedicated to your work, whereas Tasks run on the thread pool, which is shared.
