Azure Search .NET SDK: How to use "FindFailedActionsToRetry"?

Using the Azure Search .NET SDK, when you try to index documents you might get an IndexBatchException.
From the documentation here:
try
{
var batch = IndexBatch.Upload(documents);
indexClient.Documents.Index(batch);
}
catch (IndexBatchException e)
{
// Sometimes when your Search service is under load, indexing will fail for some of the documents in
// the batch. Depending on your application, you can take compensating actions like delaying and
// retrying. For this simple demo, we just log the failed document keys and continue.
Console.WriteLine(
"Failed to index some of the documents: {0}",
String.Join(", ", e.IndexingResults.Where(r => !r.Succeeded).Select(r => r.Key)));
}
How should e.FindFailedActionsToRetry be used to create a new batch to retry the indexing for failed actions?
I've created a function like this:
public void UploadDocuments<T>(SearchIndexClient searchIndexClient, IndexBatch<T> batch, int count) where T : class, IMyAppSearchDocument
{
try
{
searchIndexClient.Documents.Index(batch);
}
catch (IndexBatchException e)
{
if (count == 5) //we will try to index 5 times and give up if it still doesn't work.
{
throw new Exception("IndexBatchException: Indexing Failed for some documents.");
}
Thread.Sleep(5000); //we got an error; wait 5 seconds and try again (in case it's an intermittent or network issue)
var retryBatch = e.FindFailedActionsToRetry<T>(batch, arg => arg.ToString());
UploadDocuments(searchIndexClient, retryBatch, count + 1);
}
}
But I think this part is wrong:
var retryBatch = e.FindFailedActionsToRetry<T>(batch, arg => arg.ToString());

The second parameter to FindFailedActionsToRetry, named keySelector, is a function that should return whatever property on your model type represents your document key. In your example, your model type is not known at compile time inside UploadDocuments, so you'll need to change UploadDocuments to also take the keySelector parameter and pass it through to FindFailedActionsToRetry. The caller of UploadDocuments would then need to specify a lambda specific to type T. For example, if T is the sample Hotel class from the sample code in this article, the lambda must be hotel => hotel.HotelId, since HotelId is the property of Hotel that is used as the document key.
Incidentally, the delay inside your catch block should not be a constant amount of time. If your search service is under heavy load, waiting for a constant delay won't really help to give it time to recover. Instead, we recommend exponentially backing off (e.g., the first delay is 2 seconds, then 4 seconds, then 8 seconds, then 16 seconds, up to some maximum).
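Putting both suggestions together, a minimal sketch (not a definitive implementation; it assumes the document key is exposed as a string property and keeps the recursive shape of the original method) might look like this:
public void UploadDocuments<T>(SearchIndexClient searchIndexClient, IndexBatch<T> batch, int count, Func<T, string> keySelector) where T : class
{
    try
    {
        searchIndexClient.Documents.Index(batch);
    }
    catch (IndexBatchException e)
    {
        if (count == 5) // give up after 5 attempts
        {
            throw new Exception("IndexBatchException: Indexing Failed for some documents.");
        }
        // Exponential backoff: 2s, 4s, 8s, 16s, ... instead of a constant delay.
        Thread.Sleep(TimeSpan.FromSeconds(Math.Pow(2, count + 1)));
        var retryBatch = e.FindFailedActionsToRetry(batch, keySelector);
        UploadDocuments(searchIndexClient, retryBatch, count + 1, keySelector);
    }
}
// The caller supplies the key selector for its document type, e.g. for the sample Hotel class:
// UploadDocuments(searchIndexClient, IndexBatch.Upload(hotels), 0, hotel => hotel.HotelId);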

I've taken Bruce's recommendations in his answer and comment and implemented them using Polly:
- Exponential backoff up to one minute, after which it retries every other minute.
- Retry as long as there is progress. Time out after 5 requests without any progress.
- IndexBatchException is also thrown for unknown documents. I chose to ignore such non-transient failures, since they are likely indicative of requests which are no longer relevant (e.g., a document removed by a separate request).
int curActionCount = work.Actions.Count();
int noProgressCount = 0;
await Polly.Policy
.Handle<IndexBatchException>() // One or more of the actions has failed.
.WaitAndRetryForeverAsync(
// Exponential backoff (2s, 4s, 8s, 16s, ...) and constant delay after 1 minute.
retryAttempt => TimeSpan.FromSeconds( Math.Min( Math.Pow( 2, retryAttempt ), 60 ) ),
(ex, _) =>
{
var batchEx = ex as IndexBatchException;
work = batchEx.FindFailedActionsToRetry( work, d => d.Id );
// Verify whether any progress was made.
int remainingActionCount = work.Actions.Count();
if ( remainingActionCount == curActionCount ) ++noProgressCount;
curActionCount = remainingActionCount;
} )
.ExecuteAsync( async () =>
{
// Limit retries if no progress is made after multiple requests.
if ( noProgressCount > 5 )
{
throw new TimeoutException( "Updating Azure search index timed out." );
}
// Only retry if the error is transient (determined by FindFailedActionsToRetry).
// IndexBatchException is also thrown for unknown document IDs;
// consider them outdated requests and ignore.
if ( curActionCount > 0 )
{
await _search.Documents.IndexAsync( work );
}
} );

Related

NodeJS - While loop until a certain condition is met or a certain time has passed

I've seen some questions/answers that are very similar, but none exactly describing what I would like to achieve. Some background: this is a multi-step provisioning flow. In short, this is the goal:
1. POST an action.
2. GET status based on one variable submitted above. If response == "done" then proceed. Returns an ID.
3. POST an action. Returns an ID.
4. GET status based on ID returned above. If response == "done" then proceed. Returns an ID.
5. (..)
I think there are 6/7 steps in total.
The first question is, are there any modules that could help me somehow achieve this? The only requirement is that each attempt to get status should have an X amount of delay and should expire, marking the flow as failed after an X amount of time.
Nevertheless, the best I could come up with is this, using step 2 as an example:
GetNewDeviceId : function(req, res) {
const delay = ms => new Promise((resolve, reject) => setTimeout(resolve, ms));
var ip = req;
async function main() {
let response;
while (true) {
try {
response = await service.GetNewDeviceId(ip);
console.log("Running again for: " + ip + " - " + response)
if (response["value"] != null) {
break;
}
} catch {
// In case it fails
}
console.log("Delaying for: " + ip)
await delay(30000);
}
//Call next step
console.log("Moving on for: "+ ip)
}
main();
}
This brings up a couple of questions:
I'm not sure this is indeed the best/clean way.
How can I set a global timeout, let's say 30 minutes, forcing it to step out of the loop and call a "failure" function?
The other thing I'm not sure about (NodeJS newbie here) is that, assuming this gets called, let's say, 4 times with different IPs before any of those 4 are finished, NodeJS will run each call in its own context, right? I quickly tested this and it seems like so.
I'm not sure this is indeed the best/clean way.
I am unsure whether your function GetNewDeviceId involves recursion, that is, whether it invokes itself as service.GetNewDeviceId. That would not make sense; service.GetNewDeviceId should perform a GET request, right? If that is the case, your function seems clean to me.
How can I set a global timeout, let's say 30 minutes, forcing it to step out of the loop and call a "failure" function?
let response;
let failAt = new Date().getTime() + 30 * 60 * 1000; // 30 minutes after now
while (true) {
if (new Date().getTime() >= failAt)
return res.status(500).send("Failure");
try {...}
...
await delay(30000);
}
The other thing I'm not sure about (NodeJS newbie here) is that, assuming this gets called, let's say, 4 times with different IPs before any of those 4 are finished, NodeJS will run each call in its own context, right?
Yes. Each invocation of the function GetNewDeviceId creates a new execution context (the inner async function forms a closure over it), with its own copies of the parameters req and res and the variables response and failAt.
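Putting the pieces together, a minimal self-contained sketch of such a polling loop (pollUntilDone and its parameters are illustrative names, not from your code) could look like this:
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
// pollUntilDone keeps calling check() every intervalMs until it returns a
// non-null value, and throws once timeoutMs has elapsed overall.
async function pollUntilDone(check, intervalMs, timeoutMs) {
    const failAt = Date.now() + timeoutMs;
    while (true) {
        if (Date.now() >= failAt) {
            throw new Error("Polling timed out");
        }
        try {
            const result = await check();
            if (result != null) {
                return result;
            }
        } catch (err) {
            // ignore transient errors and try again on the next iteration
        }
        await delay(intervalMs);
    }
}
// Usage for step 2 (service.GetNewDeviceId comes from your code):
// const response = await pollUntilDone(async () => {
//     const r = await service.GetNewDeviceId(ip);
//     return r["value"] != null ? r : null;
// }, 30000, 30 * 60 * 1000);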

Do not process next job until previous job is completed (BullJS/Redis)?

Basically, each client (identified by a clientId) can push messages, and it is important that a second message from the same client isn't processed until the first one has finished processing. A client can send multiple messages in a row, and they are ordered; multiple clients sending messages should ideally not interfere with each other. And, importantly, a job shouldn't be processed twice.
I thought that using Redis I might be able to fix this issue, so I started with some quick prototyping using the bull library, but I am clearly not doing it well and was hoping someone would know how to proceed.
This is what I tried so far:
Create jobs and add them to the same queue name for one process, using the clientId as the job name.
Consume jobs while waiting large random amounts of random time on 2 separate process.
I tried adding the default locking provided by the library that I am using (bull), but it locks on the jobId, which is unique for each job, not on the clientId.
What I would want to happen:
One of the consumers can't take the job from the same clientId until the previous one is finished processing it.
They should be able to, however, get items from different clientIds in parallel without problem (asynchronously). (I haven't gotten this far, I am right now simply dealing with only one clientId)
What I get:
Both consumers consume as many items as they can from the queue without waiting for the previous item for the clientId to be completed.
Is Redis even the right tool for this job?
Example code
// ./setup.ts
import Queue from 'bull';
import * as uuid from 'uuid';
// Check that when a message is taken from a place, no other message is taken
// TO do that test, have two processes that process messages and one that sets messages, and make the job take a long time
// queue for each room https://stackoverflow.com/questions/54178462/how-does-redis-pubsub-subscribe-mechanism-works/54243792#54243792
// https://groups.google.com/forum/#!topic/redis-db/R09u__3Jzfk
// Make a job not be called stalled, waiting enough time https://github.com/OptimalBits/bull/issues/210#issuecomment-190818353
export async function sleep(ms: number): Promise<void> {
return new Promise((resolve) => {
setTimeout(resolve, ms);
});
}
export interface JobData {
id: string;
v: number;
}
export const queue = new Queue<JobData>('messages', 'redis://127.0.0.1:6379');
queue.on('error', (err) => {
console.error('Uncaught error on queue.', err);
process.exit(1);
});
export function clientId(): string {
return uuid.v4();
}
export function randomWait(minms: number, maxms: number): Promise<void> {
const ms = Math.random() * (maxms - minms) + minms;
return sleep(ms);
}
// Make a job not be called stalled, waiting enough time https://github.com/OptimalBits/bull/issues/210#issuecomment-190818353
// eslint-disable-next-line @typescript-eslint/ban-ts-comment
// @ts-ignore
queue.LOCK_RENEW_TIME = 5 * 60 * 1000;
// ./create.ts
import { queue, randomWait } from './setup';
const MIN_WAIT = 300;
const MAX_WAIT = 1500;
async function createJobs(n = 10): Promise<void> {
await randomWait(MIN_WAIT, MAX_WAIT);
// always same Id
const clientId = Math.random() > 1 ? 'zero' : 'one';
for (let index = 0; index < n; index++) {
await randomWait(MIN_WAIT, MAX_WAIT);
const job = { id: clientId, v: index };
await queue.add(clientId, job).catch(console.error);
console.log('Added job', job);
}
}
export async function create(nIds = 10, nItems = 10): Promise<void> {
const jobs = [];
await randomWait(MIN_WAIT, MAX_WAIT);
for (let index = 0; index < nIds; index++) {
await randomWait(MIN_WAIT, MAX_WAIT);
jobs.push(createJobs(nItems));
await randomWait(MIN_WAIT, MAX_WAIT);
}
await randomWait(MIN_WAIT, MAX_WAIT);
await Promise.all(jobs)
process.exit();
}
(function mainCreate(): void {
create().catch((err) => {
console.error(err);
process.exit(1);
});
})();
// ./consume.ts
import { queue, randomWait, clientId } from './setup';
function startProcessor(minWait = 5000, maxWait = 10000): void {
queue
.process('*', 100, async (job) => {
console.log('LOCKING: ', job.lockKey());
await job.takeLock();
const name = job.name;
const processingId = clientId().split('-', 1)[0];
try {
console.log('START: ', processingId, '\tjobName:', name);
await randomWait(minWait, maxWait);
const data = job.data;
console.log('PROCESSING: ', processingId, '\tjobName:', name, '\tdata:', data);
await randomWait(minWait, maxWait);
console.log('PROCESSED: ', processingId, '\tjobName:', name, '\tdata:', data);
await randomWait(minWait, maxWait);
console.log('FINISHED: ', processingId, '\tjobName:', name, '\tdata:', data);
} catch (err) {
console.error(err);
} finally {
await job.releaseLock();
}
})
.catch(console.error); // Catches initialization
}
startProcessor();
This is run using 3 different processes, which you might call like this (although I use different tabs for a clearer view of what is happening):
npx ts-node consume.ts &
npx ts-node consume.ts &
npx ts-node create.ts &
I'm not familiar with node.js. But for Redis, I would try this:
Let's say you have client_1 and client_2; they are both publishers of events.
You have three machines: consumer_1, consumer_2, consumer_3.
Establish a list of tasks in Redis, e.g., JOB_LIST.
Clients put (LPUSH) jobs onto this JOB_LIST in a specific form, like "CLIENT_1:[jobcontent]", "CLIENT_2:[jobcontent]".
Each consumer takes out jobs with a blocking pop (Redis BRPOP, the blocking variant of RPOP) and processes them.
For example, consumer_1 takes out a job whose content is CLIENT_1:[jobcontent]. It parses the content and recognizes it's from CLIENT_1. Then it wants to check whether some other consumer is already processing CLIENT_1; if not, it will lock a key to indicate that it's processing CLIENT_1.
It goes on to set a key "CLIENT_1_PROCESSING", with content "consumer_1", using the Redis SETNX command (set if the key does not exist), with an appropriate timeout. For example, if the task normally takes one minute to finish, set the key's timeout to five minutes, just in case consumer_1 crashes and holds the lock indefinitely.
If the SETNX returns 0, it means it failed to acquire the lock on CLIENT_1 (someone is already processing a job of client_1). Then it returns the job (the value "CLIENT_1:[jobcontent]") to the left side of JOB_LIST by using the Redis LPUSH command. Then it might wait a bit (sleep a few seconds) and pop another task from the right side of the list. If this time SETNX returns 1, consumer_1 acquires the lock. It goes on to process the job; after it finishes, it deletes the key "CLIENT_1_PROCESSING", releasing the lock. Then it goes on to pop another job, and so on.
Some things to consider:
- The JOB_LIST is not fair; e.g., earlier jobs might end up being processed later.
- The locking part is a bit rudimentary, but it will suffice.
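To make the flow above concrete, here is a rough consumer sketch (it assumes the ioredis client, which is not used in the question, and reuses the JOB_LIST / CLIENT_x_PROCESSING key names from the answer; the lock uses SET ... NX EX rather than bare SETNX so the expiry is set atomically):
import Redis from 'ioredis';

const redis = new Redis('redis://127.0.0.1:6379');

async function processJob(clientId: string, jobContent: string): Promise<void> {
    // placeholder for the real work
    console.log('processing', clientId, jobContent);
}

async function consumeLoop(consumerName: string): Promise<void> {
    while (true) {
        // Blocking pop from the right side of JOB_LIST (times out after 5 s and loops again).
        const popped = await redis.brpop('JOB_LIST', 5);
        if (!popped) continue;
        const raw = popped[1]; // brpop returns [listName, element]
        const sep = raw.indexOf(':');
        const clientId = raw.slice(0, sep);
        const jobContent = raw.slice(sep + 1);

        // Try to take the per-client lock atomically, with a 5-minute expiry.
        const locked = await redis.set(`${clientId}_PROCESSING`, consumerName, 'EX', 300, 'NX');
        if (!locked) {
            // Someone else is processing this client: push the job back and wait a bit.
            await redis.lpush('JOB_LIST', raw);
            await new Promise(resolve => setTimeout(resolve, 2000));
            continue;
        }
        try {
            await processJob(clientId, jobContent);
        } finally {
            await redis.del(`${clientId}_PROCESSING`); // release the lock
        }
    }
}

consumeLoop('consumer_1').catch(console.error);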
Update:
I've figured out another way to keep tasks in order.
For each client (producer), build a list, like "client_1_list", and push jobs onto the left side of the list.
Save all the client names in a list "client_names_list", with values "client_1", "client_2", etc.
For each consumer (processor), iterate over "client_names_list". For example, consumer_1 gets "client_1" and checks whether the key for client_1 is locked (someone is already processing a task of client_1). If not, it right-pops a value (job) from client_1_list and locks client_1. If client_1 is locked, it sleeps for a second or so, then iterates to the next client, "client_2" for example, checking its keys in the same way.
This way, each client's (task producer's) tasks are processed in the order they were added.
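This second variant could be sketched in a similar way (again an assumption-laden sketch: it reuses the redis client and processJob placeholder from the previous sketch, and the client_names_list and per-client list names follow the answer):
async function consumeOrdered(consumerName: string): Promise<void> {
    while (true) {
        const clientNames = await redis.lrange('client_names_list', 0, -1);
        let didWork = false;
        for (const client of clientNames) {
            // Skip clients that another consumer has already locked.
            const locked = await redis.set(`${client}_PROCESSING`, consumerName, 'EX', 300, 'NX');
            if (!locked) continue;
            try {
                // Right-pop the oldest job for this client (jobs are LPUSHed by the producer).
                const job = await redis.rpop(`${client}_list`);
                if (job) {
                    didWork = true;
                    await processJob(client, job);
                }
            } finally {
                await redis.del(`${client}_PROCESSING`);
            }
        }
        // Nothing to do for any client: sleep briefly before scanning again.
        if (!didWork) await new Promise(resolve => setTimeout(resolve, 1000));
    }
}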
EDIT: I found the problem regarding why BullJS starts jobs in parallel on one processor: we were using named jobs and defining many named process functions on one queue/processor. The default concurrency factor for a queue/processor is 1, so the queue should not process any jobs in parallel.
The problem with our setup is that if you define many (named) process-handlers on one queue, the concurrency is added up with each process-handler function: so if you define three named process-handlers, you get a concurrency factor of 3 for the given queue, applying to all the defined named jobs.
So just define one named job per queue for queues where parallel processing should not happen and all jobs should run sequentially one after the other.
That could be important e.g. when pushing a high number of jobs onto the queue and the processing involves API calls that would give errors if handled in parallel.
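As an illustration, a minimal sketch of such a queue (the queue and job names here are made up; with a single named process handler the default concurrency of 1 applies, so jobs run strictly one after another):
import Queue from 'bull';

const sequentialQueue = new Queue('sequential-work', 'redis://127.0.0.1:6379');

// Exactly one named handler on this queue, so the effective concurrency stays at 1.
sequentialQueue.process('only-job-type', async (job) => {
    // Replace with the real work; the next job is only picked up once this resolves.
    console.log('processing', job.data);
});

// Producers always add jobs under that single name:
// await sequentialQueue.add('only-job-type', { value: 42 });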
The following text was my first approach at answering the OP's question and describes just a workaround to the problem. So better just go with my edit :) and configure your queues the right way.
I found an easy solution to the OP's question.
In fact BullJS processes many jobs in parallel on one worker instance:
Let's say you have one worker instance up and running and push 10 jobs onto the queue; then possibly that worker starts all of them in parallel.
My research on BullJS queues suggested that this is not the intended behavior: one worker (also called a processor by BullJS) should only start a new job from the queue when it's in an idle state, i.e. not processing a former job.
Nevertheless BullJS keeps starting jobs in parallel on one worker.
In our implementation that led to big problems during API calls, most likely caused by too many API calls at a time. Tests showed that when starting only one worker the API calls finished just fine and returned status 200.
So how do you process one job after the other, once the previous one is finished, if BullJS does not do that for us (just what the OP asked)?
We first experimented with delays and other BullJS options, but that's a kind of workaround and not the exact solution to the problem we were looking for. At least we did not manage to stop BullJS from processing more than one job at a time.
So we did it ourself and started one job after the other.
The solution was rather simple for our use case after looking into BullJS API reference (BullJS API Ref).
We just used a for-loop to start the jobs one after another. The trick was to use BullJS's
job.finished
method to get a Promise that resolves once the job is finished. By using await inside the for-loop, the next job is started only after the job.finished Promise is awaited (resolved). That's the nice thing about for-loops: await works inside them!
Here a small code example on how to achieve the intended behavior:
for (let i = 0; i < theValues.length; i++) {
jobCounter++
const job = await this.processingQueue.add(
'update-values',
{
value: theValues[i],
},
{
// delay: i * 90000,
// lifo: true,
}
)
this.jobs[job.id] = {
jobType: 'socket',
jobSocketId: BackgroundJobTasks.UPDATE_VALUES,
data: {
value: theValues[i],
},
jobCount: theValues.length,
jobNumber: jobCounter,
cumulatedJobId
}
await job.finished()
.then((val) => {
console.log('job finished:: ', val)
})
}
The important part is really
await job.finished()
inside the for loop: theValues.length jobs are started one after the other, as intended.
That way horizontally scaling jobs across more than one worker is not possible anymore. Nevertheless this workaround is okay for us at the moment.
I will get in contact with optimalbits - the maker of BullJS to clear things out.

How to get multiple async results within a given timeout with GPars?

I'd like to retrieve multiple "costly" results using parallel processing but within a specific timeout.
I'm using GPars Dataflow.task but it looks like I'm missing something, as the process returns only when all dataflow variables are bound.
def timeout = 500
def mapResults = []
GParsPool.withPool(3) {
def taskWeb1 = Dataflow.task {
mapResults.web1 = new URL('http://web1.com').getText()
}.join(timeout, TimeUnit.MILLISECONDS)
def taskWeb2 = Dataflow.task {
mapResults.web2 = new URL('http://web2.com').getText()
}.join(timeout, TimeUnit.MILLISECONDS)
def taskWeb3 = Dataflow.task {
mapResults.web3 = new URL('http://web3.com').getText()
}.join(timeout, TimeUnit.MILLISECONDS)
}
I did see in the GPars Timeouts doc a way to use Select to get the fastest result within the timeout.
But I'm looking for a way to retrieve as much as possible results in the given time frame.
Is there a better "GPars" way to achieve this?
Or with Java 8 Future/Callable ?
Since you're interested in Java 8 based solutions too, here's a way to do it:
int timeout = 250;
ExecutorService executorService = Executors.newFixedThreadPool(3);
try {
Map<String, CompletableFuture<String>> map =
Stream.of("http://google.com", "http://yahoo.com", "http://bing.com")
.collect(
Collectors.toMap(
// the key will be the URL
Function.identity(),
// the value will be the CompletableFuture text fetched from the url
(url) -> CompletableFuture.supplyAsync(
() -> readUrl(url, timeout),
executorService
)
)
);
executorService.awaitTermination(timeout, TimeUnit.MILLISECONDS);
//print the resulting map, cutting the text at 100 chars
map.entrySet().stream().forEach(entry -> {
CompletableFuture<String> future = entry.getValue();
boolean completed = future.isDone()
&& !future.isCompletedExceptionally()
&& !future.isCancelled();
System.out.printf("url %s completed: %s, error: %s, result: %.100s\n",
entry.getKey(),
completed,
future.isCompletedExceptionally(),
completed ? future.getNow(null) : null);
});
} catch (InterruptedException e) {
//rethrow
} finally {
executorService.shutdownNow();
}
This will give you as many Futures as URLs you have, but gives you an opportunity to see if any of the tasks failed with an exception. The code could be simplified if you're not interested in these exceptions, only the contents of successful retrievals:
int timeout = 250;
ExecutorService executorService = Executors.newFixedThreadPool(3);
try {
Map<String, String> map = Collections.synchronizedMap(new HashMap<>());
Stream.of("http://google.com", "http://yahoo.com", "http://bing.com")
.forEach(url -> {
CompletableFuture
.supplyAsync(
() -> readUrl(url, timeout),
executorService
).thenAccept(content -> map.put(url, content));
});
executorService.awaitTermination(timeout, TimeUnit.MILLISECONDS);
//print the resulting map, cutting the text at 100 chars
map.entrySet().stream().forEach(entry -> {
System.out.printf("url %s completed, result: %.100s\n",
entry.getKey(), entry.getValue() );
});
} catch (InterruptedException e) {
//rethrow
} finally {
executorService.shutdownNow();
}
Both versions of the code will wait for about 250 milliseconds (it will take only a tiny bit more because of the submission of the tasks to the executor service) before printing the results. I found that about 250 milliseconds is the threshold where some of these URLs can be fetched on my network, but not necessarily all. Feel free to adjust the timeout to experiment.
For the readUrl(url, timeout) method you could use a utility library like Apache Commons IO. The tasks submitted to the executor service will get an interrupt signal even if you don't explicitly take the timeout parameter into account. I could provide an implementation for that but I believe it's out of scope for the main issue in your question.
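If it helps, one possible readUrl sketch (not from the original answer; it assumes Apache Commons IO is on the classpath and applies the timeout to both the connect and the read phase):
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;

class UrlReader {
    // Read the full response body as a string, with the given timeout in milliseconds.
    static String readUrl(String url, int timeoutMillis) {
        try {
            URLConnection connection = new URL(url).openConnection();
            connection.setConnectTimeout(timeoutMillis);
            connection.setReadTimeout(timeoutMillis);
            try (InputStream in = connection.getInputStream()) {
                return IOUtils.toString(in, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            // Lets the CompletableFuture complete exceptionally instead of forcing checked exceptions.
            throw new UncheckedIOException(e);
        }
    }
}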

StackExchange.Redis on Azure is throwing timeout performing get and no connection available exceptions

I recently switched an MVC application that serves data feeds and dynamically generated images (6k rpm throughput) from the v3.9.67 ServiceStack.Redis client to the latest StackExchange.Redis client (v1.0.450) and I'm seeing some slower performance and some new exceptions.
Our Redis instance is S4 level (13GB), CPU shows a fairly constant 45% or so and network bandwidth appears fairly low. I'm not entirely sure how to interpret the gets/sets graph in our Azure portal, but it shows us around 1M gets and 100k sets (appears that this may be in 5 minute increments).
The client library switch was straightforward and we are still using the v3.9 ServiceStack JSON serializer so that the client lib was the only piece changing.
Our external monitoring with New Relic shows clearly that our average response time increases from about 200ms to about 280ms between ServiceStack and StackExchange libraries (StackExchange being slower) with no other change.
We recorded a number of exceptions with messages along the lines of:
Timeout performing GET feed-channels:ag177kxj_egeo-_nek0cew, inst: 12, mgr: Inactive, queue: 30, qu=0, qs=30, qc=0, wr=0/0, in=0/0
I understand this to mean that there are a number of commands in the queue that have been sent but no response available from Redis, and that this can be caused by long running commands that exceed the timeout. These errors appeared for a period when our sql database behind one of our data services was getting backed up, so perhaps that was the cause? After scaling out that database to reduce load we haven't seen very many more of this error, but the DB query should be happening in .Net and I don't see how that would hold up a redis command or connection.
We also recorded a large batch of errors this morning over a short period (couple of minutes) with messages like:
No connection is available to service this operation: SETEX feed-channels:vleggqikrugmxeprwhwc2a:last-retry
We were used to transient connection errors with the ServiceStack library, and those exception messages were usually like this:
Unable to Connect: sPort: 63980
I'm under the impression that SE.Redis should be retrying connections and commands in the background for me. Do I still need to be wrapping our calls through SE.Redis in a retry policy of my own? Perhaps different timeout values would be more appropriate (though I'm not sure what values to use)?
Our redis connection string sets these parameters: abortConnect=false,syncTimeout=2000,ssl=true. We use a singleton instance of ConnectionMultiplexer and transient instances of IDatabase.
The vast majority of our Redis use goes through a Cache class, and the important bits of the implementation are below, in case we're doing something silly that's causing us problems.
Our keys are generally 10-30 or so character strings. Values are largely scalar or reasonably small serialized object sets (hundred bytes to a few kB generally), though we do also store jpg images in the cache so a large chunk of the data is from a couple hundred kB to a couple MB.
Perhaps I should be using different multiplexers for small and large values, probably with longer timeouts for larger values? Or couple/few multiplexers in case one is stalled?
public class Cache : ICache
{
private readonly IDatabase _redis;
public Cache(IDatabase redis)
{
_redis = redis;
}
// storing this placeholder value allows us to distinguish between a stored null and a non-existent key
// while only making a single call to redis. see Exists method.
static readonly string NULL_PLACEHOLDER = "$NULL_VALUE$";
// this is a dictionary of https://github.com/StephenCleary/AsyncEx/wiki/AsyncLock
private static readonly ILockCache _locks = new LockCache();
public T GetOrSet<T>(string key, TimeSpan cacheDuration, Func<T> refresh) {
T val;
if (!Exists(key, out val)) {
using (_locks[key].Lock()) {
if (!Exists(key, out val)) {
val = refresh();
Set(key, val, cacheDuration);
}
}
}
return val;
}
private bool Exists<T>(string key, out T value) {
value = default(T);
var redisValue = _redis.StringGet(key);
if (redisValue.IsNull)
return false;
if (redisValue == NULL_PLACEHOLDER)
return true;
value = typeof(T) == typeof(byte[])
? (T)(object)(byte[])redisValue
: JsonSerializer.DeserializeFromString<T>(redisValue);
return true;
}
public void Set<T>(string key, T value, TimeSpan cacheDuration)
{
if (value.IsDefaultForType())
_redis.StringSet(key, NULL_PLACEHOLDER, cacheDuration);
else if (typeof (T) == typeof (byte[]))
_redis.StringSet(key, (byte[])(object)value, cacheDuration);
else
_redis.StringSet(key, JsonSerializer.SerializeToString(value), cacheDuration);
}
public async Task<T> GetOrSetAsync<T>(string key, Func<T, TimeSpan> getSoftExpire, TimeSpan additionalHardExpire, TimeSpan retryInterval, Func<Task<T>> refreshAsync) {
var softExpireKey = key + ":soft-expire";
var lastRetryKey = key + ":last-retry";
T val;
if (ShouldReturnNow(key, softExpireKey, lastRetryKey, retryInterval, out val))
return val;
using (await _locks[key].LockAsync()) {
if (ShouldReturnNow(key, softExpireKey, lastRetryKey, retryInterval, out val))
return val;
Set(lastRetryKey, DateTime.UtcNow, additionalHardExpire);
try {
var newVal = await refreshAsync();
var softExpire = getSoftExpire(newVal);
var hardExpire = softExpire + additionalHardExpire;
if (softExpire > TimeSpan.Zero) {
Set(key, newVal, hardExpire);
Set(softExpireKey, DateTime.UtcNow + softExpire, hardExpire);
}
val = newVal;
}
catch (Exception ex) {
if (val == null)
throw;
}
}
return val;
}
private bool ShouldReturnNow<T>(string valKey, string softExpireKey, string lastRetryKey, TimeSpan retryInterval, out T val) {
if (!Exists(valKey, out val))
return false;
var softExpireDate = Get<DateTime?>(softExpireKey);
if (softExpireDate == null)
return true;
// value is in the cache and not yet soft-expired
if (softExpireDate.Value >= DateTime.UtcNow)
return true;
var lastRetryDate = Get<DateTime?>(lastRetryKey);
// value is in the cache, it has soft-expired, but it's too soon to try again
if (lastRetryDate != null && DateTime.UtcNow - lastRetryDate.Value < retryInterval) {
return true;
}
return false;
}
}
A few recommendations:
- You can use different multiplexers with different timeout values for different types of keys/values:
http://azure.microsoft.com/en-us/documentation/articles/cache-faq/
- Make sure you are not network-bound on the client or the server. If you are on the server, then move to a higher SKU which has more bandwidth.
Please read this post for more details:
http://azure.microsoft.com/blog/2015/02/10/investigating-timeout-exceptions-in-stackexchange-redis-for-azure-redis-cache/
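As a sketch of the first recommendation (the connection strings, password placeholder, and timeout values below are purely illustrative), you could keep two singleton multiplexers and route calls by expected payload size:
using StackExchange.Redis;

public static class RedisConnections
{
    // Multiplexers are expensive to create; keep them as singletons and reuse them.
    // Shorter syncTimeout for small scalar values:
    public static readonly ConnectionMultiplexer Small = ConnectionMultiplexer.Connect(
        "yourcache.redis.cache.windows.net,abortConnect=false,ssl=true,password=<secret>,syncTimeout=2000");

    // Longer syncTimeout for large payloads such as the cached jpg images:
    public static readonly ConnectionMultiplexer Large = ConnectionMultiplexer.Connect(
        "yourcache.redis.cache.windows.net,abortConnect=false,ssl=true,password=<secret>,syncTimeout=10000");
}

// Usage: new Cache(RedisConnections.Small.GetDatabase()) for feed keys,
// new Cache(RedisConnections.Large.GetDatabase()) for image keys.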

Wrapping legacy object in IConnectableObservable

I have a legacy event-based object that seems like a perfect fit for RX: after being connected to a network source, it raises events when a message is received, and may terminate with either a single error (connection dies, etc.) or (rarely) an indication that there will be no more messages. This object also has a couple projections -- most users are interested in only a subset of the messages received, so there are alternate events raised only when well-known message subtypes show up.
So, in the process of learning more about reactive programming, I built the following wrapper:
class LegacyReactiveWrapper : IConnectableObservable<TopLevelMessage>
{
private LegacyType _Legacy;
private IConnectableObservable<TopLevelMessage> _Impl;
public LegacyReactiveWrapper(LegacyType t)
{
_Legacy = t;
var observable = Observable.Create<TopLevelMessage>((observer) =>
{
LegacyTopLevelMessageHandler tlmHandler = (sender, tlm) => observer.OnNext(tlm);
LegacyErrorHandler errHandler = (sender, err) => observer.OnError(new ApplicationException(err.Message));
LegacyCompleteHandler doneHandler = (sender) => observer.OnCompleted();
_Legacy.TopLevelMessage += tlmHandler;
_Legacy.Error += errHandler;
_Legacy.Complete += doneHandler;
return Disposable.Create(() =>
{
_Legacy.TopLevelMessage -= tlmHandler;
_Legacy.Error -= errHandler;
_Legacy.Complete -= doneHandler;
});
});
_Impl = observable.Publish();
}
public IDisposable Subscribe(IObserver<TopLevelMessage> observer)
{
return _Impl.RefCount().Subscribe(observer);
}
public IDisposable Connect()
{
_Legacy.ConnectToMessageSource();
return Disposable.Create(() => _Legacy.DisconnectFromMessageSource());
}
public IObservable<SubMessageA> MessageA
{
get
{
// This is the moral equivalent of the projection behavior
// that already exists in the legacy type. We don't hook
// the LegacyType.MessageA event directly.
return _Impl.RefCount()
.Where((tlm) => tlm.MessageType == MessageType.MessageA)
.Select((tlm) => tlm.SubMessageA);
}
}
public IObservable<SubMessageB> MessageB
{
get
{
return _Impl.RefCount()
.Where((tlm) => tlm.MessageType == MessageType.MessageB)
.Select((tlm) => tlm.SubMessageB);
}
}
}
Something about how it's used elsewhere feels... off... somehow, though. Here's sample usage, which works but feels strange. The UI context for my test application is WinForms, but it doesn't really matter.
// in Program.Main...
MainForm frm = new MainForm();
// Updates the UI based on a stream of SubMessageA's
IObserver<SubMessageA> uiManager = new MainFormUiManager(frm);
LegacyType lt = new LegacyType();
// ... setup lt...
var w = new LegacyReactiveWrapper(lt);
var uiUpdateSubscription = (from msgA in w.MessageA
where SomeCondition(msgA)
select msgA).ObserveOn(frm).Subscribe(uiManager);
var nonUiSubscription = (from msgB in w.MessageB
where msgB.SubType == MessageBType.SomeSubType
select msgB).Subscribe(
m => Console.WriteLine("Got MsgB: {0}", m),
ex => Console.WriteLine("MsgB error: {0}", ex.Message),
() => Console.WriteLine("MsgB complete")
);
IDisposable unsubscribeAtExit = null;
frm.Load += (sender, e) =>
{
var connectionSubscription = w.Connect();
unsubscribeAtExit = new CompositeDisposable(
uiUpdateSubscription,
nonUiSubscription,
connectionSubscription);
};
frm.FormClosing += (sender, e) =>
{
if(unsubscribeAtExit != null) { unsubscribeAtExit.Dispose(); }
};
Application.Run(frm);
This WORKS -- The form launches, the UI updates, and when I close it the subscriptions get cleaned up and the process exits (which it won't do if the LegacyType's network connection is still connected). Strictly speaking, it's enough to dispose just connectionSubscription. However, the explicit Connect feels weird to me. Since RefCount is supposed to do that for you, I tried modifying the wrapper such that rather than using _Impl.RefCount in MessageA and MessageB and explicitly connecting later, I used this.RefCount instead and moved the calls to Subscribe to the Load handler. That had a different problem -- the second subscription triggered another call to LegacyReactiveWrapper.Connect. I thought the idea behind Publish/RefCount was "first-in triggers connection, last-out disposes connection."
I guess I have three questions:
Do I fundamentally misunderstand Publish/RefCount?
Is there some preferred way to implement IConnectableObservable<T> that doesn't involve delegation to one obtained via IObservable<T>.Publish? I know you're not supposed to implement IObservable<T> yourself, but I don't understand how to inject connection logic into the IConnectableObservable<T> that Observable.Create().Publish() gives you. Is Connect supposed to be idempotent?
Would someone more familiar with RX/reactive programming look at the sample for how the wrapper is used and say "that's ugly and broken" or is this not as weird as it seems?
I'm not sure that you need to expose Connect directly as you have. I would simplify as follows, using Publish().RefCount() as an encapsulated implementation detail; it would cause the legacy connection to be made only as required. Here the first subscriber in causes connection, and the last one out causes disconnection. Also note this correctly shares a single RefCount across all subscribers, whereas your implementation uses a RefCount per message type, which probably isn't what was intended. Users are not required to call Connect explicitly:
public class LegacyReactiveWrapper
{
private IObservable<TopLevelMessage> _legacyRx;
public LegacyReactiveWrapper(LegacyType legacy)
{
_legacyRx = WrapLegacy(legacy).Publish().RefCount();
}
private static IObservable<TopLevelMessage> WrapLegacy(LegacyType legacy)
{
return Observable.Create<TopLevelMessage>(observer =>
{
LegacyTopLevelMessageHandler tlmHandler = (sender, tlm) => observer.OnNext(tlm);
LegacyErrorHandler errHandler = (sender, err) => observer.OnError(new ApplicationException(err.Message));
LegacyCompleteHandler doneHandler = sender => observer.OnCompleted();
legacy.TopLevelMessage += tlmHandler;
legacy.Error += errHandler;
legacy.Complete += doneHandler;
legacy.ConnectToMessageSource();
return Disposable.Create(() =>
{
legacy.DisconnectFromMessageSource();
legacy.TopLevelMessage -= tlmHandler;
legacy.Error -= errHandler;
legacy.Complete -= doneHandler;
});
});
}
public IObservable<TopLevelMessage> TopLevelMessage
{
get
{
return _legacyRx;
}
}
public IObservable<SubMessageA> MessageA
{
get
{
return _legacyRx.Where(tlm => tlm.MessageType == MessageType.MessageA)
.Select(tlm => tlm.SubMessageA);
}
}
public IObservable<SubMessageB> MessageB
{
get
{
return _legacyRx.Where(tlm => tlm.MessageType == MessageType.MessageB)
.Select(tlm => tlm.SubMessageB);
}
}
}
An additional observation is that Publish().RefCount() will drop the underlying subscription when its subscriber count reaches 0. Typically I only use Connect over this choice when I need to maintain a subscription even when the subscriber count on the published source drops to zero (and may go back up again later). It's rare to need to do this though; only when connecting is more expensive than holding on to the subscription resource when you might not need it.
Your understanding is not entirely wrong, but you do appear to have some points of misunderstanding.
You seem to be under the belief that multiple calls to RefCount on the same source IObservable will result in a shared reference count. They do not; each instance keeps its own count. As such, you are causing multiple subscriptions to _Impl, one per call to Subscribe or to the Message properties.
You also may be expecting that making _Impl an IConnectableObservable somehow causes your Connect method to be called (since you seem surprised you needed to call Connect in your consuming code). All Publish does is cause subscribers to the published object (returned from the .Publish() call) to share a single subscription to the underlying source observable (in this case, the object made from your call to Observable.Create).
Typically, I see Publish and RefCount used immediately together (eg as source.Publish().RefCount()) to get the shared subscription effect described above or to make a cold observable hot without needing to call Connect to start the subscription to the original source. However, this relies on using the same object returned from the .Publish().RefCount() for all subscribers (as noted above).
Your implementation of Connect seems reasonable. I don't know of any recommendations for whether Connect should be idempotent, but I would not personally expect it to be. If you wanted it to be, you would just need to track calls to it and the disposal of its return value to get the right balance.
I don't think you need to use Publish the way you are, unless there is some reason to avoid multiple event handlers being attached to the legacy object. If you do need to avoid that, I would recommend changing _Impl to a plain IObservable and follow the Publish with a RefCount.
Your MessageA and MessageB properties have potential to be a source of confusion for users, since they return an IObservable, but still require a call to Connect on the base object to start receiving messages. I would either change them to IConnectableObservables that somehow delegate to the original Connect (at which point the idempotency discussion becomes more relevant) or drop the properties and just let the users make the (fairly simple) projections themselves when needed.
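For example, if the MessageA and MessageB properties were dropped, a consumer could make the projection itself; a sketch using the types from the question (w is the LegacyReactiveWrapper instance from your sample usage, which is itself an IObservable<TopLevelMessage>):
IObservable<SubMessageA> messageA = w
    .Where(tlm => tlm.MessageType == MessageType.MessageA)
    .Select(tlm => tlm.SubMessageA);
// Then subscribe and Connect exactly as before; the projection is just an ordinary Rx query.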
