Tune Redis connection inside an Azure Function to prevent timeouts - azure

TL;DR
How does one amend the min number of threads for redis within an Azure Function?
Problem
I have an Azure Function that uses Redis (via the StackExchange.Redis package) to cache some values, or to retrieve the existing value if one already exists. I'm currently getting timeouts that appear to occur because the number of busy IOCP threads exceeds the Min IOCP thread value.
2016-09-08T11:52:44.492 Exception while executing function: Functions.blobtoeventhub. mscorlib: Exception has been thrown by the target of an invocation. StackExchange.Redis: Timeout performing SETNX 586:tag:NULL, inst: 1, mgr: Inactive, err: never, queue: 4, qu: 0, qs: 4, qc: 0, wr: 0, wq: 0, in: 260, ar: 0, clientName: RD00155D3AE265, IOCP: (Busy=8,Free=992,Min=2,Max=1000), WORKER: (Busy=7,Free=32760,Min=2,Max=32767), Local-CPU: unavailable (Please take a look at this article for some common client-side issues that can cause timeouts: https://github.com/StackExchange/StackExchange.Redis/tree/master/Docs/Timeouts.md).
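For reference, the IOCP and WORKER figures in a message like this can be reproduced from the standard ThreadPool API. The following is a minimal sketch, assuming that Busy is derived as Max minus the available count (which is consistent with Busy + Free = Max in the message above); the ThreadPoolStats class and its Describe method are hypothetical names used only for illustration.

```csharp
using System;
using System.Threading;

public static class ThreadPoolStats
{
    // Formats the current thread pool state in the same shape as the
    // IOCP/WORKER fields of the timeout message.
    public static string Describe()
    {
        int maxWorker, maxIocp, freeWorker, freeIocp, minWorker, minIocp;
        ThreadPool.GetMaxThreads(out maxWorker, out maxIocp);
        ThreadPool.GetAvailableThreads(out freeWorker, out freeIocp);
        ThreadPool.GetMinThreads(out minWorker, out minIocp);

        // Assumption: "Busy" is Max minus the currently available threads.
        int busyWorker = maxWorker - freeWorker;
        int busyIocp = maxIocp - freeIocp;

        return string.Format(
            "IOCP: (Busy={0},Free={1},Min={2},Max={3}), WORKER: (Busy={4},Free={5},Min={6},Max={7})",
            busyIocp, freeIocp, minIocp, maxIocp,
            busyWorker, freeWorker, minWorker, maxWorker);
    }
}
```

Logging ThreadPoolStats.Describe() at the start of an invocation would show whether Busy is already above Min before Redis is called.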
According to the docs on timeouts, the resolution involves adjusting the MinThread count:
How to configure this setting:
In ASP.NET, use the "minIoThreads" configuration setting under the configuration element in machine.config. If you are running inside of Azure WebSites, this setting is not exposed through the configuration options. You should be able to set this programmatically (see below) from your Application_Start method in global.asax.cs.
Important Note: the value specified in this configuration element is a per-core setting. For example, if you have a 4 core machine and want your minIOThreads setting to be 200 at runtime, you would use .
Outside of ASP.NET, use the ThreadPool.SetMinThreads(…) API.
In an Azure Function there is no global.asax.cs file, and I can find little usable information on how to apply ThreadPool.SetMinThreads in this context.
There is a similar question on webjobs that is unanswered.
My specifics
Redis = Azure Redis Cache Standard 1Gb
Azure Function = version 0.5
StackExchange.Redis = version 1.1.603
Redis code is in a separate file to main function.
using StackExchange.Redis;
using System.Text;
private static Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() =>
{
string redisCacheName = System.Environment.GetEnvironmentVariable("rediscachename", EnvironmentVariableTarget.Process).ToString();
string redisCachePassword = System.Environment.GetEnvironmentVariable("rediscachepassword", EnvironmentVariableTarget.Process).ToString();
return ConnectionMultiplexer.Connect(redisCacheName + ",abortConnect=false,ssl=true,password=" + redisCachePassword);
});
public static ConnectionMultiplexer Connection
{
get
{
return lazyConnection.Value;
}
}
static string depersonalise_value(string input, string field, int account_id)
{
IDatabase cache = Connection.GetDatabase();
string depersvalue = $"{account_id}:{field}:{input}";
string value = $"{account_id}{Guid.NewGuid()}";
bool created = cache.StringSet(depersvalue, value, when: When.NotExists);
string retur = created? value : cache.StringGet(depersvalue).ToString();
return (retur);
}

Ultimately, we followed @mathewc's answer and added a line to the connection multiplexer code that sets the minimum thread counts to 500:
readonly static Lazy<ConnectionMultiplexer> lazyConnection =
new Lazy<ConnectionMultiplexer>(() =>
{
ThreadPool.SetMinThreads(500, 500);
Further tuning was also required, and the code was improved via an SO code review. The most important change was substantially increasing the timeouts.
using StackExchange.Redis;
using System.Text;
using System.Threading;
readonly static Lazy<ConnectionMultiplexer> lazyConnection =
new Lazy<ConnectionMultiplexer>(() =>
{
ThreadPool.SetMinThreads(500, 500);
string redisCacheName = System.Environment.GetEnvironmentVariable("rediscache_name", EnvironmentVariableTarget.Process).ToString();
string redisCachePassword = System.Environment.GetEnvironmentVariable("rediscache_password", EnvironmentVariableTarget.Process).ToString();
return ConnectionMultiplexer.Connect(new ConfigurationOptions
{
AbortOnConnectFail = false,
Ssl = true,
ConnectRetry = 3,
ConnectTimeout = 5000,
SyncTimeout = 5000,
DefaultDatabase = 0,
EndPoints = { { redisCacheName, 0 } },
Password = redisCachePassword
});
});
public static ConnectionMultiplexer Connection => lazyConnection.Value;
static string depersonalise_value(string input, string field, int account_id)
{
IDatabase cache = Connection.GetDatabase();
string depersvalue = $"{account_id}:{field}:{input}";
string existingguid = (string)cache.StringGet(depersvalue);
if (String.IsNullOrEmpty(existingguid)){
string value = $"{account_id}{Guid.NewGuid()}";
cache.StringSet(depersvalue, value);
return value;
}
return existingguid;
}

We don't have a good way for you to perform app level initialization like this currently. This is tracked by an issue in our repo here.
For now, your only real workaround would be for you to put this init code into a shared helper that you invoke at the beginning of your function. The shared init method should have logic in it such that it is only performed once.
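A shared init helper of that kind might look like the sketch below. It is only an illustration under assumptions: the FunctionInit class, its EnsureInitialized method and the Runs counter are hypothetical names, and the floor of 500 is simply the value reported in the question's follow-up, not a general recommendation.

```csharp
using System;
using System.Threading;

public static class FunctionInit
{
    // Hypothetical floor for both worker and IOCP threads.
    private const int MinThreads = 500;

    private static int _runs;

    // Lazy<T> in its default mode is thread-safe and runs the factory at most
    // once, even when several invocations reach it concurrently.
    private static readonly Lazy<bool> Initialized = new Lazy<bool>(() =>
    {
        Interlocked.Increment(ref _runs);
        int worker, iocp;
        ThreadPool.GetMinThreads(out worker, out iocp);
        // Never lower a minimum that is already above the floor.
        return ThreadPool.SetMinThreads(
            Math.Max(worker, MinThreads),
            Math.Max(iocp, MinThreads));
    });

    // Number of times the initialization body has actually executed.
    public static int Runs { get { return _runs; } }

    // Call at the top of every function invocation. Returns false if the
    // runtime rejected the requested minimums.
    public static bool EnsureInitialized()
    {
        return Initialized.Value;
    }
}
```

Calling FunctionInit.EnsureInitialized() as the first statement of the function body keeps the call cheap after the first invocation, because later calls only read the cached result. The static state lives per process, so the initialization runs again whenever a new host process starts.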

Related

singleton in azure queuetrigger not working as expected

My understanding of this is obviously wrong, any clarification would be helpful.
I thought that adding [Singleton] to a web job would force it to run one after another.
This does not seem to be the case.
This is my very basic test code (against a queue with about 149 messages)
[Singleton] //just run one at a time
public static void ProcessQueueMessage([QueueTrigger("datatrac-stops-to-update")] string message, TextWriter log)
{
monitorEntities mDb = new monitorEntities();
//go get the record
int recordToGet = Convert.ToInt32(message);
var record = (from r in mDb.To_Process where r.Id == recordToGet select r).FirstOrDefault();
record.status = 5;
mDb.SaveChanges();
Console.WriteLine($"Finished record {message}");
}
When it runs I get this on the console:
and as I step through it I am getting conflict errors.
What am I not understanding?
RESOLVED - MORE INFO
Here is what I did to address this. As Haitham said in his answer, [Singleton] refers to how many instances of the webjob itself are running -- not how many items are processed per instance.
That was addressed by modifying my Main like:
static void Main(string[] args)
{
var config = new JobHostConfiguration();
config.Queues.BatchSize = 2;
With BatchSize set to 1, only one message was processed at a time.
With BatchSize set to 2 as above, and the function modified as below:
public static void ProcessQueueMessage([QueueTrigger("datatrac-stops-to-update")] string message, TextWriter log)
{
var threadID = Thread.CurrentThread.ManagedThreadId;
Console.WriteLine($"{threadID} : started record {message}");
Produces this behavior (which is what was expected):
Link where I found documentation on above:
https://github.com/Azure/azure-webjobs-sdk/wiki/Queues#config
Singleton does not mean that invocations run one after another; it is mainly about how the instance of the web job class is instantiated.
If you need to run just one at a time, you can lock on a static variable to prevent the code from executing more than once concurrently.
I would not recommend that, though; you should first find out why the conflict errors occur.
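The lock-on-a-static-variable idea could be sketched as follows. SerializedProcessor, Gate and the work delegate are hypothetical names, and the counters exist only to make the serialization observable. A lock like this only covers a single process, so it would not coordinate several running instances of the webjob.

```csharp
using System;
using System.Threading;

public static class SerializedProcessor
{
    // One lock object shared by every invocation in this process.
    private static readonly object Gate = new object();

    private static int _active;
    private static int _maxConcurrent;

    // Highest number of invocations ever observed inside the lock at once.
    public static int MaxConcurrent { get { return _maxConcurrent; } }

    public static void Process(string message, Action<string> work)
    {
        lock (Gate)
        {
            // Both counters are only touched while holding the lock.
            _active++;
            if (_active > _maxConcurrent) _maxConcurrent = _active;
            try
            {
                work(message);
            }
            finally
            {
                _active--;
            }
        }
    }
}
```

In the queue-triggered function, the existing body would be passed as the work delegate. Because every caller waits on the same Gate, a BatchSize above 1 would then bring no throughput benefit.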

Redis Connections May Not be Closing with c#

I'm connecting to Azure Redis and they show me the number of open connections to my redis server. I've got the following c# code that encloses all my Redis sets and gets. Should this be leaking connections?
using (var connectionMultiplexer = ConnectionMultiplexer.Connect(connectionString))
{
lock (Locker)
{
redis = connectionMultiplexer.GetDatabase();
}
var o = CacheSerializer.Deserialize<T>(redis.StringGet(cacheKeyName));
if (o != null)
{
return o;
}
lock (Locker)
{
// get lock but release if it takes more than 60 seconds to complete to avoid deadlock if this app crashes before release
//using (redis.AcquireLock(cacheKeyName + "-lock", TimeSpan.FromSeconds(60)))
var lockKey = cacheKeyName + "-lock";
if (redis.LockTake(lockKey, Environment.MachineName, TimeSpan.FromSeconds(10)))
{
try
{
o = CacheSerializer.Deserialize<T>(redis.StringGet(cacheKeyName));
if (o == null)
{
o = func();
redis.StringSet(cacheKeyName, CacheSerializer.Serialize(o),
TimeSpan.FromSeconds(cacheTimeOutSeconds));
}
return o;
}
finally
{
redis.LockRelease(lockKey, Environment.MachineName);
}
}
return o;
}
}
}
You can keep the ConnectionMultiplexer in a static variable instead of creating it for every get/set. That keeps one connection to Redis open at all times and makes your operations faster.
Update:
Please, have a look at StackExchange.Redis basic usage:
https://github.com/StackExchange/StackExchange.Redis/blob/master/Docs/Basics.md
"Note that ConnectionMultiplexer implements IDisposable and can be disposed when no longer required, but I am deliberately not showing using statement usage, because it is exceptionally rare that you would want to use a ConnectionMultiplexer briefly, as the idea is to re-use this object."
It works nicely for me, keeping a single connection to Azure Redis (sometimes it creates 2 connections, but this is by design). Hope it helps.
I was suggesting that you try calling the Close (or CloseAsync) method explicitly. In a test setting you may be using different connections for different test cases and not want to share a single multiplexer. A search of public code that uses the Redis client shows a pattern of Close followed by Dispose calls.
Note that the XML method documentation of the Redis client describes the Close method as doing more:
//
// Summary:
// Close all connections and release all resources associated with this object
//
// Parameters:
// allowCommandsToComplete:
// Whether to allow all in-queue commands to complete first.
public void Close(bool allowCommandsToComplete = true);
//
// Summary:
// Close all connections and release all resources associated with this object
//
// Parameters:
// allowCommandsToComplete:
// Whether to allow all in-queue commands to complete first.
[AsyncStateMachine(typeof(<CloseAsync>d__183))]
public Task CloseAsync(bool allowCommandsToComplete = true);
...
//
// Summary:
// Release all resources associated with this object
public void Dispose();
And then I looked up the code for the client, found it here:
https://github.com/StackExchange/StackExchange.Redis/blob/master/src/StackExchange.Redis/ConnectionMultiplexer.cs
There we can see the Dispose method calling Close (rather than the usual overridable protected Dispose(bool)), and furthermore with the wait for connections to close set to true. It appears to be an atypical implementation of the dispose pattern: by attempting every closure and waiting on each one, it risks throwing an exception, whereas the contract of Dispose is that it should never throw.

StackExchange.Redis on Azure is throwing timeout performing get and no connection available exceptions

I recently switched an MVC application that serves data feeds and dynamically generated images (6k rpm throughput) from the v3.9.67 ServiceStack.Redis client to the latest StackExchange.Redis client (v1.0.450) and I'm seeing some slower performance and some new exceptions.
Our Redis instance is S4 level (13GB), CPU shows a fairly constant 45% or so and network bandwidth appears fairly low. I'm not entirely sure how to interpret the gets/sets graph in our Azure portal, but it shows us around 1M gets and 100k sets (appears that this may be in 5 minute increments).
The client library switch was straightforward and we are still using the v3.9 ServiceStack JSON serializer so that the client lib was the only piece changing.
Our external monitoring with New Relic shows clearly that our average response time increases from about 200ms to about 280ms between ServiceStack and StackExchange libraries (StackExchange being slower) with no other change.
We recorded a number of exceptions with messages along the lines of:
Timeout performing GET feed-channels:ag177kxj_egeo-_nek0cew, inst: 12, mgr: Inactive, queue: 30, qu=0, qs=30, qc=0, wr=0/0, in=0/0
I understand this to mean that there are a number of commands in the queue that have been sent but no response available from Redis, and that this can be caused by long running commands that exceed the timeout. These errors appeared for a period when our sql database behind one of our data services was getting backed up, so perhaps that was the cause? After scaling out that database to reduce load we haven't seen very many more of this error, but the DB query should be happening in .Net and I don't see how that would hold up a redis command or connection.
We also recorded a large batch of errors this morning over a short period (couple of minutes) with messages like:
No connection is available to service this operation: SETEX feed-channels:vleggqikrugmxeprwhwc2a:last-retry
We were used to transient connection errors with the ServiceStack library, and those exception messages were usually like this:
Unable to Connect: sPort: 63980
I'm under the impression that SE.Redis should be retrying connections and commands in the background for me. Do I still need to be wrapping our calls through SE.Redis in a retry policy of my own? Perhaps different timeout values would be more appropriate (though I'm not sure what values to use)?
Our redis connection string sets these parameters: abortConnect=false,syncTimeout=2000,ssl=true. We use a singleton instance of ConnectionMultiplexer and transient instances of IDatabase.
The vast majority of our Redis use goes through a Cache class, and the important bits of the implementation are below, in case we're doing something silly that's causing us problems.
Our keys are generally 10-30 or so character strings. Values are largely scalar or reasonably small serialized object sets (hundred bytes to a few kB generally), though we do also store jpg images in the cache so a large chunk of the data is from a couple hundred kB to a couple MB.
Perhaps I should be using different multiplexers for small and large values, probably with longer timeouts for larger values? Or couple/few multiplexers in case one is stalled?
public class Cache : ICache
{
private readonly IDatabase _redis;
public Cache(IDatabase redis)
{
_redis = redis;
}
// storing this placeholder value allows us to distinguish between a stored null and a non-existent key
// while only making a single call to redis. see Exists method.
static readonly string NULL_PLACEHOLDER = "$NULL_VALUE$";
// this is a dictionary of https://github.com/StephenCleary/AsyncEx/wiki/AsyncLock
private static readonly ILockCache _locks = new LockCache();
public T GetOrSet<T>(string key, TimeSpan cacheDuration, Func<T> refresh) {
T val;
if (!Exists(key, out val)) {
using (_locks[key].Lock()) {
if (!Exists(key, out val)) {
val = refresh();
Set(key, val, cacheDuration);
}
}
}
return val;
}
private bool Exists<T>(string key, out T value) {
value = default(T);
var redisValue = _redis.StringGet(key);
if (redisValue.IsNull)
return false;
if (redisValue == NULL_PLACEHOLDER)
return true;
value = typeof(T) == typeof(byte[])
? (T)(object)(byte[])redisValue
: JsonSerializer.DeserializeFromString<T>(redisValue);
return true;
}
public void Set<T>(string key, T value, TimeSpan cacheDuration)
{
if (value.IsDefaultForType())
_redis.StringSet(key, NULL_PLACEHOLDER, cacheDuration);
else if (typeof (T) == typeof (byte[]))
_redis.StringSet(key, (byte[])(object)value, cacheDuration);
else
_redis.StringSet(key, JsonSerializer.SerializeToString(value), cacheDuration);
}
public async Task<T> GetOrSetAsync<T>(string key, Func<T, TimeSpan> getSoftExpire, TimeSpan additionalHardExpire, TimeSpan retryInterval, Func<Task<T>> refreshAsync) {
var softExpireKey = key + ":soft-expire";
var lastRetryKey = key + ":last-retry";
T val;
if (ShouldReturnNow(key, softExpireKey, lastRetryKey, retryInterval, out val))
return val;
using (await _locks[key].LockAsync()) {
if (ShouldReturnNow(key, softExpireKey, lastRetryKey, retryInterval, out val))
return val;
Set(lastRetryKey, DateTime.UtcNow, additionalHardExpire);
try {
var newVal = await refreshAsync();
var softExpire = getSoftExpire(newVal);
var hardExpire = softExpire + additionalHardExpire;
if (softExpire > TimeSpan.Zero) {
Set(key, newVal, hardExpire);
Set(softExpireKey, DateTime.UtcNow + softExpire, hardExpire);
}
val = newVal;
}
catch (Exception ex) {
if (val == null)
throw;
}
}
return val;
}
private bool ShouldReturnNow<T>(string valKey, string softExpireKey, string lastRetryKey, TimeSpan retryInterval, out T val) {
if (!Exists(valKey, out val))
return false;
var softExpireDate = Get<DateTime?>(softExpireKey);
if (softExpireDate == null)
return true;
// value is in the cache and not yet soft-expired
if (softExpireDate.Value >= DateTime.UtcNow)
return true;
var lastRetryDate = Get<DateTime?>(lastRetryKey);
// value is in the cache, it has soft-expired, but it's too soon to try again
if (lastRetryDate != null && DateTime.UtcNow - lastRetryDate.Value < retryInterval) {
return true;
}
return false;
}
}
A few recommendations.
- You can use different multiplexers with different timeout values for different types of keys/values
http://azure.microsoft.com/en-us/documentation/articles/cache-faq/
- Make sure you are not network bound on the client or the server. If you are bound on the server, move to a higher SKU, which has more bandwidth.
Please read this post for more details
http://azure.microsoft.com/blog/2015/02/10/investigating-timeout-exceptions-in-stackexchange-redis-for-azure-redis-cache/

Wrapping legacy object in IConnectableObservable

I have a legacy event-based object that seems like a perfect fit for RX: after being connected to a network source, it raises events when a message is received, and may terminate with either a single error (connection dies, etc.) or (rarely) an indication that there will be no more messages. This object also has a couple projections -- most users are interested in only a subset of the messages received, so there are alternate events raised only when well-known message subtypes show up.
So, in the process of learning more about reactive programming, I built the following wrapper:
class LegacyReactiveWrapper : IConnectableObservable<TopLevelMessage>
{
private LegacyType _Legacy;
private IConnectableObservable<TopLevelMessage> _Impl;
public LegacyReactiveWrapper(LegacyType t)
{
_Legacy = t;
var observable = Observable.Create<TopLevelMessage>((observer) =>
{
LegacyTopLevelMessageHandler tlmHandler = (sender, tlm) => observer.OnNext(tlm);
LegacyErrorHandler errHandler = (sender, err) => observer.OnError(new ApplicationException(err.Message));
LegacyCompleteHandler doneHandler = (sender) => observer.OnCompleted();
_Legacy.TopLevelMessage += tlmHandler;
_Legacy.Error += errHandler;
_Legacy.Complete += doneHandler;
return Disposable.Create(() =>
{
_Legacy.TopLevelMessage -= tlmHandler;
_Legacy.Error -= errHandler;
_Legacy.Complete -= doneHandler;
});
});
_Impl = observable.Publish();
}
public IDisposable Subscribe(IObserver<TopLevelMessage> observer)
{
return _Impl.RefCount().Subscribe(observer);
}
public IDisposable Connect()
{
_Legacy.ConnectToMessageSource();
return Disposable.Create(() => _Legacy.DisconnectFromMessageSource());
}
public IObservable<SubMessageA> MessageA
{
get
{
// This is the moral equivalent of the projection behavior
// that already exists in the legacy type. We don't hook
// the LegacyType.MessageA event directly.
return _Impl.RefCount()
.Where((tlm) => tlm.MessageType == MessageType.MessageA)
.Select((tlm) => tlm.SubMessageA);
}
}
public IObservable<SubMessageB> MessageB
{
get
{
return _Impl.RefCount()
.Where((tlm) => tlm.MessageType == MessageType.MessageB)
.Select((tlm) => tlm.SubMessageB);
}
}
}
Something about how it's used elsewhere feels... off... somehow, though. Here's sample usage, which works but feels strange. The UI context for my test application is WinForms, but it doesn't really matter.
// in Program.Main...
MainForm frm = new MainForm();
// Updates the UI based on a stream of SubMessageA's
IObserver<SubMessageA> uiManager = new MainFormUiManager(frm);
LegacyType lt = new LegacyType();
// ... setup lt...
var w = new LegacyReactiveWrapper(lt);
var uiUpdateSubscription = (from msgA in w.MessageA
where SomeCondition(msgA)
select msgA).ObserveOn(frm).Subscribe(uiManager);
var nonUiSubscription = (from msgB in w.MessageB
where msgB.SubType == MessageBType.SomeSubType
select msgB).Subscribe(
m => Console.WriteLine("Got MsgB: {0}", m),
ex => Console.WriteLine("MsgB error: {0}", ex.Message),
() => Console.WriteLine("MsgB complete")
);
IDisposable unsubscribeAtExit = null;
frm.Load += (sender, e) =>
{
var connectionSubscription = w.Connect();
unsubscribeAtExit = new CompositeDisposable(
uiUpdateSubscription,
nonUiSubscription,
connectionSubscription);
};
frm.FormClosing += (sender, e) =>
{
if(unsubscribeAtExit != null) { unsubscribeAtExit.Dispose(); }
};
Application.Run(frm);
This WORKS -- The form launches, the UI updates, and when I close it the subscriptions get cleaned up and the process exits (which it won't do if the LegacyType's network connection is still connected). Strictly speaking, it's enough to dispose just connectionSubscription. However, the explicit Connect feels weird to me. Since RefCount is supposed to do that for you, I tried modifying the wrapper such that rather than using _Impl.RefCount in MessageA and MessageB and explicitly connecting later, I used this.RefCount instead and moved the calls to Subscribe to the Load handler. That had a different problem -- the second subscription triggered another call to LegacyReactiveWrapper.Connect. I thought the idea behind Publish/RefCount was "first-in triggers connection, last-out disposes connection."
I guess I have three questions:
Do I fundamentally misunderstand Publish/RefCount?
Is there some preferred way to implement IConnectableObservable<T> that doesn't involve delegation to one obtained via IObservable<T>.Publish? I know you're not supposed to implement IObservable<T> yourself, but I don't understand how to inject connection logic into the IConnectableObservable<T> that Observable.Create().Publish() gives you. Is Connect supposed to be idempotent?
Would someone more familiar with RX/reactive programming look at the sample for how the wrapper is used and say "that's ugly and broken" or is this not as weird as it seems?
I'm not sure that you need to expose Connect directly as you have. I would simplify as follows, using Publish().RefCount() as an encapsulated implementation detail, so that the legacy connection is made only when required. Here the first subscriber in causes connection, and the last one out causes disconnection. Also note that this correctly shares a single RefCount across all subscribers, whereas your implementation uses a separate RefCount per message type, which probably isn't what was intended. Users are not required to Connect explicitly:
public class LegacyReactiveWrapper
{
private IObservable<TopLevelMessage> _legacyRx;
public LegacyReactiveWrapper(LegacyType legacy)
{
_legacyRx = WrapLegacy(legacy).Publish().RefCount();
}
private static IObservable<TopLevelMessage> WrapLegacy(LegacyType legacy)
{
return Observable.Create<TopLevelMessage>(observer =>
{
LegacyTopLevelMessageHandler tlmHandler = (sender, tlm) => observer.OnNext(tlm);
LegacyErrorHandler errHandler = (sender, err) => observer.OnError(new ApplicationException(err.Message));
LegacyCompleteHandler doneHandler = sender => observer.OnCompleted();
legacy.TopLevelMessage += tlmHandler;
legacy.Error += errHandler;
legacy.Complete += doneHandler;
legacy.ConnectToMessageSource();
return Disposable.Create(() =>
{
legacy.DisconnectFromMessageSource();
legacy.TopLevelMessage -= tlmHandler;
legacy.Error -= errHandler;
legacy.Complete -= doneHandler;
});
});
}
public IObservable<TopLevelMessage> TopLevelMessage
{
get
{
return _legacyRx;
}
}
public IObservable<SubMessageA> MessageA
{
get
{
return _legacyRx.Where(tlm => tlm.MessageType == MessageType.MessageA)
.Select(tlm => tlm.SubMessageA);
}
}
public IObservable<SubMessageB> MessageB
{
get
{
return _legacyRx.Where(tlm => tlm.MessageType == MessageType.MessageB)
.Select(tlm => tlm.SubMessageB);
}
}
}
An additional observation is that Publish().RefCount() will drop the underlying subscription when its subscriber count reaches 0. Typically I only use Connect over this choice when I need to maintain a subscription even when the subscriber count on the published source drops to zero (and may go back up again later). It's rare to need to do this, though: only when connecting is more expensive than holding on to the subscription resource while you might not need it.
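The "first one in connects, last one out disconnects" behaviour can be illustrated without Rx. The sketch below is a hand-rolled reference counter, not Rx's RefCount implementation; RefCountedConnection, Acquire and the connect/disconnect delegates are hypothetical stand-ins for the legacy object's ConnectToMessageSource and DisconnectFromMessageSource.

```csharp
using System;
using System.Threading;

public sealed class RefCountedConnection
{
    private readonly object _gate = new object();
    private readonly Action _connect;
    private readonly Action _disconnect;
    private int _count;

    public RefCountedConnection(Action connect, Action disconnect)
    {
        _connect = connect;
        _disconnect = disconnect;
    }

    // Registers one user; the first user triggers the connect action.
    public IDisposable Acquire()
    {
        lock (_gate)
        {
            if (++_count == 1) _connect();
        }
        return new Releaser(this);
    }

    // Unregisters one user; the last user out triggers the disconnect action.
    private void Release()
    {
        lock (_gate)
        {
            if (--_count == 0) _disconnect();
        }
    }

    private sealed class Releaser : IDisposable
    {
        private RefCountedConnection _owner;

        public Releaser(RefCountedConnection owner)
        {
            _owner = owner;
        }

        public void Dispose()
        {
            // Swapping in null makes a second Dispose call a no-op.
            RefCountedConnection owner = Interlocked.Exchange(ref _owner, null);
            if (owner != null) owner.Release();
        }
    }
}
```

With this shape, acquiring again after the count has dropped to zero runs the connect action a second time. That reconnect cost is what an explicit, longer-lived Connect avoids.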
Your understanding is not entirely wrong, but you do appear to have some points of misunderstanding.
You seem to be under the belief that multiple calls to RefCount on the same source IObservable will result in a shared reference count. They do not; each instance keeps its own count. As such, you are causing multiple subscriptions to _Impl, one per call to subscribe or call to the Message properties.
You also may be expecting that making _Impl an IConnectableObservable somehow causes your Connect method to be called (since you seem surprised you needed to call Connect in your consuming code). All Publish does is cause subscribers to the published object (returned from the .Publish() call) to share a single subscription to the underlying source observable (in this case, the object made from your call to Observable.Create).
Typically, I see Publish and RefCount used immediately together (eg as source.Publish().RefCount()) to get the shared subscription effect described above or to make a cold observable hot without needing to call Connect to start the subscription to the original source. However, this relies on using the same object returned from the .Publish().RefCount() for all subscribers (as noted above).
Your implementation of Connect seems reasonable. I don't know of any recommendation on whether Connect should be idempotent, but I would not personally expect it to be. If you wanted it to be, you would just need to track calls to it and the disposal of its return value to get the right balance.
I don't think you need to use Publish the way you are, unless there is some reason to avoid multiple event handlers being attached to the legacy object. If you do need to avoid that, I would recommend changing _Impl to a plain IObservable and follow the Publish with a RefCount.
Your MessageA and MessageB properties have potential to be a source of confusion for users, since they return an IObservable, but still require a call to Connect on the base object to start receiving messages. I would either change them to IConnectableObservables that somehow delegate to the original Connect (at which point the idempotency discussion becomes more relevant) or drop the properties and just let the users make the (fairly simple) projections themselves when needed.

Caching requests to reduce processing (TPL?)

I'm currently trying to reduce the number of similar requests being processed in a business layer by:
Caching the requests a method receives
Performing the slow processing task (once for all similar requests)
Returning the result to each requesting method call
Things to note, are that:
The original method calls do not currently use an async BeginMethod() / EndMethod(IAsyncResult) pattern
The requests arrive faster than the time it takes to generate the output
I'm trying to use TPL where possible, as I am currently trying to learn more about this library
eg. Improving the following
byte[] RequestSlowOperation(string operationParameter)
{
Perform slow task here...
}
Any thoughts?
Follow up:
class SomeClass
{
private int _threadCount;
public SomeClass(int threadCount)
{
_threadCount = threadCount;
int parameter = 0;
var taskFactory = Task<int>.Factory;
for (int i = 0; i < threadCount; i++)
{
int i1 = i;
taskFactory
.StartNew(() => RequestSlowOperation(parameter))
.ContinueWith(result => Console.WriteLine("Result {0} : {1}", result.Result, i1));
}
}
private int RequestSlowOperation(int parameter)
{
Lazy<int> result2;
var result = _cacheMap.GetOrAdd(parameter, new Lazy<int>(() => RequestSlowOperation2(parameter))).Value;
//_cacheMap.TryRemove(parameter, out result2); <<<<< Thought I could remove immediately, but this causes blobby behaviour
return result;
}
static ConcurrentDictionary<int, Lazy<int>> _cacheMap = new ConcurrentDictionary<int, Lazy<int>>();
private int RequestSlowOperation2(int parameter)
{
Console.WriteLine("Evaluating");
Thread.Sleep(100);
return parameter;
}
}
Here is a fast, safe and maintainable way to do this:
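// "var" is not allowed for fields, so the type is spelled out.
static readonly ConcurrentDictionary<string, Lazy<byte[]>> cacheMap = new ConcurrentDictionary<string, Lazy<byte[]>>();
byte[] RequestSlowOperation(string operationParameter)
{
// The value factory receives the key. It may run more than once under contention,
// but it only builds a cheap Lazy; only one Lazy per key is ever stored.
return cacheMap.GetOrAdd(operationParameter, key => new Lazy<byte[]>(() => RequestSlowOperation2(key))).Value;
}
byte[] RequestSlowOperation2(string operationParameter)
{
// Perform slow task here...
}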
This will execute RequestSlowOperation2 at most once per key. Please be aware that the memory held by the dictionary will never be released.
The user delegate passed to the ConcurrentDictionary is not executed under lock, meaning that it could execute multiple times! My solution allows multiple lazies to be created but only one of them will ever be published and materialized.
Regarding locking: this solution will take locks, but it does not matter because the work items are far more expensive than the (few) lock operations.
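To observe the "at most once per key" behaviour, the same pattern can be wrapped in a small self-contained class. This is a sketch under assumptions: CoalescingCache, Get and the Evaluations counter are hypothetical names, and the slow operation is passed in as a delegate so that the class does not depend on any particular business method.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class CoalescingCache
{
    private readonly ConcurrentDictionary<string, Lazy<byte[]>> _map =
        new ConcurrentDictionary<string, Lazy<byte[]>>();
    private readonly Func<string, byte[]> _slowOperation;
    private int _evaluations;

    public CoalescingCache(Func<string, byte[]> slowOperation)
    {
        _slowOperation = slowOperation;
    }

    // How many times the slow operation has actually run.
    public int Evaluations
    {
        get { return Volatile.Read(ref _evaluations); }
    }

    public byte[] Get(string key)
    {
        // GetOrAdd may invoke this factory more than once under contention,
        // but it only constructs a cheap Lazy; one Lazy per key is stored.
        Lazy<byte[]> lazy = _map.GetOrAdd(key, k => new Lazy<byte[]>(() =>
        {
            Interlocked.Increment(ref _evaluations);
            return _slowOperation(k);
        }));

        // Lazy's default mode blocks concurrent callers until the single
        // evaluation finishes, then gives all of them the same result.
        return lazy.Value;
    }
}
```

One caveat of this design: if the slow operation throws, the default Lazy mode caches that exception, so later requests for the same key rethrow it rather than retrying.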
Honestly, the use of TPL as a technology here is not really important, this is just a straight up concurrency problem. You're trying to protect access to a shared resource (the cached data) and, to do that, the only approach is to lock. Either that or, if the cache entry does not already exist, you could allow all incoming threads to generate it and then subsequent requesters benefit from the cached value once it's stored, but there's little value in that if the resource is slow/expensive to generate and cache.
Perhaps some more details will make it clear exactly why you're trying to accomplish this without a lock. I'll happily revise my answer if more detail makes it clearer what you're trying to do.

Resources