Azure Function Durable Timer does not wake up until app is touched

I have a Durable Orchestration that scales up and down Azure Cosmos DB throughput on request. The scale up is triggered via HTTP, and the scale down happens later via a Durable Timer that is supposed to wake up the Azure Function at the end of the current or next hour. Here is the Orchestrator Function:
public static class CosmosDbScalerOrchestrator
{
    [FunctionName(nameof(CosmosDbScalerOrchestrator))]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var cosmosDbScalerRequestString = context.GetInput<string>();
        var didScale = await context.CallActivityAsync<bool>(nameof(ScaleUpActivityTrigger), cosmosDbScalerRequestString);
        if (didScale)
        {
            // Scale down at minute 59 of the current hour, or of the next hour
            // if fewer than 15 minutes remain in the current one.
            var minutesUntilLastMinuteOfHour = 59 - context.CurrentUtcDateTime.Minute;
            var minutesUntilScaleDown = minutesUntilLastMinuteOfHour < 15
                ? minutesUntilLastMinuteOfHour + 60
                : minutesUntilLastMinuteOfHour;
            var timeUntilScaleDown = context.CurrentUtcDateTime.Add(TimeSpan.FromMinutes(minutesUntilScaleDown));
            // Durable timer: the orchestration should resume once this time is reached.
            await context.CreateTimer(timeUntilScaleDown, CancellationToken.None);
            await context.CallActivityAsync(nameof(ScaleDownActivityTrigger), cosmosDbScalerRequestString);
        }
    }
}
Here is the ScaleUpActivityTrigger:
public class ScaleUpActivityTrigger
{
[FunctionName(nameof(ScaleUpActivityTrigger))]
public static async Task<bool> Run([ActivityTrigger] string cosmosDbScalerRequestString, ILogger log)
{
var cosmosDbScalerRequest =
StorageFramework.Storage.Deserialize<CosmosDbScalerRequest>(cosmosDbScalerRequestString);
var scaler = new ContainerScaler(cosmosDbScalerRequest.ContainerId);
var currentThroughputForContainer = await scaler.GetThroughputForContainer();
// Return if would scale down
if (currentThroughputForContainer > cosmosDbScalerRequest.RequestedThroughput) return false;
var newThroughput = cosmosDbScalerRequest.RequestedThroughput < 25000
? cosmosDbScalerRequest.RequestedThroughput
: 25000;
await scaler.Scale(newThroughput);
return true;
}
}
And here is the ScaleDownActivityTrigger:
public class ScaleDownActivityTrigger
{
[FunctionName(nameof(ScaleDownActivityTrigger))]
public static async Task Run([ActivityTrigger] string cosmosDbScalerRequestString, ILogger log)
{
var cosmosDbScalerRequest =
StorageFramework.Storage.Deserialize<CosmosDbScalerRequest>(cosmosDbScalerRequestString);
var scaler = new ContainerScaler(cosmosDbScalerRequest.ContainerId);
var minimumRusForContainer = await scaler.GetMinimumRusForContainer();
await scaler.Scale(minimumRusForContainer);
}
}
However, what I observe is that the Function is not awakened until something else triggers the Durable Orchestration. Notice the difference in the timestamps for when this was scheduled and when it happened.
Is the fact that it did not wake up until then by design, or a bug? If it is by design, how can I wake it up when I actually want to?

Related

Durable function not safe?

I am working on a Durable Functions based serverless timer for Azure, but I keep getting this error:
[2022-10-11T03:42:06.874Z] ServerlessTimers.Application: Exception of type 'System.Exception' was thrown.
[2022-10-11T03:42:06.883Z] 0396b0bd-6a87-4490-a2fe-b0b9121a9504: Function 'OrchestrateTimerFunction (Orchestrator)' failed with an error. Reason: System.InvalidOperationException: Multithreaded execution was detected. This can happen if the orchestrator function code awaits on a task that was not created by a DurableOrchestrationContext method. More details can be found in this article https://docs.microsoft.com/en-us/azure/azure-functions/durable-functions-checkpointing-and-replay#orchestrator-code-constraints.
[2022-10-11T03:42:06.886Z] at Microsoft.Azure.WebJobs.Extensions.DurableTask.DurableOrchestrationContext.ThrowIfInvalidAccess() in D:\a\_work\1\s\src\WebJobs.Extensions.DurableTask\ContextImplementations\DurableOrchestrationContext.cs:line 1163
[2022-10-11T03:42:06.887Z] at Microsoft.Azure.WebJobs.Extensions.DurableTask.TaskOrchestrationShim.InvokeUserCodeAndHandleResults(RegisteredFunctionInfo orchestratorInfo, OrchestrationContext innerContext) in D:\a\_work\1\s\src\WebJobs.Extensions.DurableTask\Listener\TaskOrchestrationShim.cs:line 150. IsReplay: False. State: Failed. HubName: TestHubName. AppName: . SlotName: . ExtensionVersion: 2.7.1. SequenceNumber: 4. TaskEventId: -1
From my perspective, I don't see anything wrong in that orchestrator function.
What is funny is that when I set a breakpoint inside that orchestrator function and the function gets called, the error is gone, nowhere to be seen in the logs.
Does that mean it might be a race condition involving the HTTP-triggered function that invokes the orchestrator? That seems highly unlikely, but please correct me if I am wrong.
Here is the orchestrator function. This "timer" is the same as the one on your phone, but in the cloud.
namespace ServerlessTimers.Application.Functions.Durables;
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.Logging;
using ServerlessTimers.Application.Exceptions;
using ServerlessTimers.Application.Models.DurableEvents;
using ServerlessTimers.Application.Models.Durables;
using ServerlessTimers.Application.Services.Durables;
using ServerlessTimers.Domain.Aggregators.Timers;
using ServerlessTimers.Domain.Services;
public class OrchestrateTimerFunction
{
private readonly ILogger logger;
private readonly IDurableFacade durableFacade;
private readonly ITimerRepository timerRepository;
private readonly ITimerCalculatorFactory calculatorFactory;
private readonly CancellationTokenSource cts;
public OrchestrateTimerFunction(
IDurableFacade durableFacade,
ITimerRepository timerRepository,
ITimerCalculatorFactory calculatorFactory,
ILogger<OrchestrateTimerFunction> logger)
{
this.logger = logger;
this.durableFacade = durableFacade;
this.timerRepository = timerRepository;
this.calculatorFactory = calculatorFactory;
cts = new CancellationTokenSource();
}
[FunctionName(nameof(OrchestrateTimerFunction))]
public async Task RunOrchestrator(
[OrchestrationTrigger]
IDurableOrchestrationContext context)
{
try
{
// Get timer
var input = context.GetInput<TimerOrchestratorInput>();
var timer = await timerRepository.FindByIdAsync(input.TimerId) ??
throw new TimerNotFoundException(input.TimerId);
// Do not run the orchestration if the timer shouldn't be running
if(!timer.State.EqualRunningState())
{
logger.LogError($"Timer {timer.Id}: " +
$"Tried to be orchestrated but has {timer.State} state");
throw new Exception();
}
// Calculate the completion date of the timer
var calculator = calculatorFactory.GetCalculator(timer);
var remainingTime = calculator.CalculateRemainingTime();
logger.LogInformation($"Timer {timer.Id}: " +
$"To complete in {remainingTime}");
if (remainingTime <= TimeSpan.Zero)
{
logger.LogError($"Timer {timer.Id}: " +
$"Remaining time is negative");
throw new Exception();
}
// Set external events
var timerPausedEventTask = context.WaitForExternalEvent<DurableEvent>(
name: nameof(TimerPausedDurableEvent),
defaultValue: new TimerCompletedDurableEvent(),
timeout: remainingTime,
cancelToken: cts.Token);
var timerStoppedEventTask = context.WaitForExternalEvent<DurableEvent>(
name: nameof(TimerStoppedDurableEvent),
defaultValue: new TimerCompletedDurableEvent(),
timeout: remainingTime,
cancelToken: cts.Token);
// Await timer
var durableEvent = await Task.WhenAny<DurableEvent>(
timerPausedEventTask, timerStoppedEventTask);
cts.Cancel();
// Handle events
if(durableEvent.Result is TimerCompletedDurableEvent)
{
logger.LogInformation($"Timer {timer.Id}: Completed");
}
else if (durableEvent.Result is TimerStoppedDurableEvent)
{
logger.LogInformation($"Timer {timer.Id}: Stopped");
}
else if (durableEvent.Result is TimerPausedDurableEvent pausedEvent)
{
logger.LogInformation($"Timer {timer.Id}: Paused ({pausedEvent.Reason})");
}
}
catch(Exception ex)
{
logger.LogError(ex, ex.Message);
}
}
}
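Your code is probably violating the code constraints that orchestrator functions must follow. Orchestrators are replayed, so they must be deterministic and should only await tasks created by the IDurableOrchestrationContext. Any other work should be moved into activity functions.
In your case, I would say the following lines are the cause, because they might yield different results on replay. The first one also awaits a task that was not created by the context, which is what the error message describes. Try moving them into activity functions.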
1.
var timer = await timerRepository.FindByIdAsync(input.TimerId) ??
throw new TimerNotFoundException(input.TimerId);
2.
var calculator = calculatorFactory.GetCalculator(timer);
3.
var remainingTime = calculator.CalculateRemainingTime();
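As a rough sketch of that suggestion (not the asker's actual code), the lookup and the remaining-time calculation could live in one activity, so the orchestrator only awaits tasks created by the context. The names `TimerOrchestrationSketch`, `RunOrchestratorSketch` and `GetRemainingTimeSketch` are made up for illustration, the input is assumed to be a plain timer id string, and the activity body is a placeholder for the repository and calculator calls.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class TimerOrchestrationSketch
{
    // Orchestrator: every await goes through the context, so replays stay deterministic.
    [FunctionName(nameof(RunOrchestratorSketch))]
    public static async Task RunOrchestratorSketch(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // Assumption: the orchestration input is just the timer id.
        var timerId = context.GetInput<string>();

        // The I/O and the time calculation run inside the activity. The result is
        // stored in the orchestration history and reused when the orchestrator replays.
        var remainingTime = await context.CallActivityAsync<TimeSpan>(
            nameof(GetRemainingTimeSketch), timerId);

        if (remainingTime <= TimeSpan.Zero)
        {
            return;
        }

        // Use the replay-safe clock from the context, never DateTime.UtcNow.
        var dueTime = context.CurrentUtcDateTime.Add(remainingTime);
        await context.CreateTimer(dueTime, CancellationToken.None);
    }

    // Activity: free to do I/O. The fixed value below is a placeholder for the
    // repository lookup and calculator call from the question.
    [FunctionName(nameof(GetRemainingTimeSketch))]
    public static Task<TimeSpan> GetRemainingTimeSketch([ActivityTrigger] string timerId)
    {
        return Task.FromResult(TimeSpan.FromMinutes(5));
    }
}
```

The external-event handling from the question could stay in the orchestrator, since `WaitForExternalEvent` is itself a context method.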

Service Fabric Actor notification fails to trigger event handler after first invocation due to possible blocking

I've got two Reliable Actors called GameActor and PlayerActor. The ClientApp sends a message to the PlayerActor when the player makes a move. Then the PlayerActor sends a message to the GameActor to indicate that a movement was made. Upon being invoked, the method in the GameActor fires a notification. This notification gets handled by the ClientApp GameEventsHandler. The ClientApp then calls a method on the GameActor to retrieve the latest player positions.
ClientApp -> PlayerActor.MoveTo() -> GameActor.NotifyPlayerMoved() ->
Fire ScoreBoardUpdated event
GameEventsHandler triggered by that event ->
GameActor.GetLatestPlayerInfo()
The problem I'm having is this. The very first time I run it, the GameEventsHandler gets triggered and tries to call the GameActor as expected. The GameActor receives the message and returns the expected response, but the client doesn't seem to receive it. It looks like it's blocked, as it doesn't throw an error or produce any output. Any subsequent notifications don't get handled by the event handler at all.
GameActor
public async Task<IList<PlayerInfo>> GetLatestPlayerInfoAsync(CancellationToken cancellationToken)
{
var allPlayers = await StateManager.GetStateAsync<List<string>>("players", cancellationToken);
var tasks = allPlayers.Select(actorName =>
{
var playerActor = ActorProxy.Create<IPlayerActor>(new ActorId(actorName), new Uri(PlayerActorUri));
return playerActor.GetLatestInfoAsync(cancellationToken);
}).ToList();
await Task.WhenAll(tasks);
return tasks
.Select(t => t.Result)
.ToList();
}
public async Task NotifyPlayerMovedAsync(PlayerInfo lastMovement, CancellationToken cancellationToken)
{
var ev = GetEvent<IGameEvents>();
ev.ScoreboardUpdated(lastMovement);
}
PlayerActor
public async Task MoveToAsync(int x, int y, CancellationToken cancellationToken)
{
var playerName = await StateManager.GetStateAsync<string>("playerName", cancellationToken);
var playerInfo = new PlayerInfo()
{
LastUpdate = DateTimeOffset.Now,
PlayerName = playerName,
XCoordinate = x,
YCoordinate = y
};
await StateManager.AddOrUpdateStateAsync("positions", new List<PlayerInfo>() { playerInfo }, (key, value) =>
{
value.Add(playerInfo);
return value;
}, cancellationToken);
var gameName = await StateManager.GetStateAsync<string>("gameName", cancellationToken);
var gameActor = ActorProxy.Create<IGameActor>(new ActorId(gameName), new Uri(GameActorUri));
await gameActor.NotifyPlayerMovedAsync(playerInfo, cancellationToken);
}
public async Task<PlayerInfo> GetLatestInfoAsync(CancellationToken cancellationToken)
{
var positions = await StateManager.GetStateAsync<List<PlayerInfo>>("positions", cancellationToken);
return positions.Last();
}
Client
private static async Task RunDemo(string gameName)
{
var rand = new Random();
Console.WriteLine("Hit return when the service is up...");
Console.ReadLine();
Console.WriteLine("Enter your name:");
var playerName = Console.ReadLine();
Console.WriteLine("This might take a few seconds...");
var gameActor = ActorProxy.Create<IGameActor>(new ActorId(gameName), new Uri(GameActorUri));
await gameActor.SubscribeAsync<IGameEvents>(new GameEventsHandler(gameActor));
var playerActorId = await gameActor.JoinGameAsync(playerName, CancellationToken.None);
var playerActor = ActorProxy.Create<IPlayerActor>(new ActorId(playerActorId), new Uri(PlayerActorUri));
while (true)
{
Console.WriteLine("Press return to move to new location...");
Console.ReadLine();
await playerActor.MoveToAsync(rand.Next(100), rand.Next(100), CancellationToken.None);
}
}
GameEventsHandler
public void ScoreboardUpdated(PlayerInfo lastInfo)
{
Console.WriteLine($"Scoreboard updated. (Last move by: {lastInfo.PlayerName})");
var positions = _gameActor.GetLatestPlayerInfoAsync(CancellationToken.None).ConfigureAwait(false).GetAwaiter().GetResult();
// the call above hangs
foreach (var playerInfo in positions) // this line never gets hit
{
Console.WriteLine(
$"Position of {playerInfo.PlayerName} is ({playerInfo.XCoordinate},{playerInfo.YCoordinate})." +
$"\nUpdated at {playerInfo.LastUpdate}\n");
}
}
But if I wrap the event handler logic inside a Task.Run() it seems to work.
Task.Run(async () =>
{
var positions = await _gameActor.GetLatestPlayerInfoAsync(CancellationToken.None);
foreach (var playerInfo in positions)
{
Console.WriteLine(
$"Position of {playerInfo.PlayerName} is ({playerInfo.XCoordinate},{playerInfo.YCoordinate})." +
$"\nUpdated at {playerInfo.LastUpdate}\n");
}
}
);
The full source code for the demo is here: https://github.com/dasiths/Service-Fabric-Reliable-Actors-Demo
AFAIK notifications aren't blocking and are not reliable. So I don't understand why my initial implementation doesn't work. The reentrant pattern doesn't apply here as per my understanding either. Can someone explain to me what's going on here? Is it expected behaviour or a bug?

About understanding partition lease expiration

I have an event hub with 4 partitions and 2 consumer groups. I have 2 WebJobs that read the data using an EventProcessor, each for a different consumer group.
I have configured the event processors like this:
var host = new EventProcessorHost(
Guid.NewGuid().ToString(),
configurationManager.EventHubConfiguration.Path,
configurationManager.EventHubConfiguration.ConsumerGroupName,
configurationManager.EventHubConfiguration.ListenerConnectionString,
configurationManager.StorageConfiguration.ConnectionString)
{
PartitionManagerOptions = new PartitionManagerOptions
{
AcquireInterval = TimeSpan.FromSeconds(10),
RenewInterval = TimeSpan.FromSeconds(10),
LeaseInterval = TimeSpan.FromSeconds(30)
}
};
var options = EventProcessorOptions.DefaultOptions;
options.MaxBatchSize = 250;
await host.RegisterEventProcessorFactoryAsync(new PlanCareEventProcessorFactory(telemetryClient, configurationManager), options);
return host;
In my EventProcessor I keep track of the progress (some methods skipped to keep it short and readable):
internal class PlanCareEventProcessor : IEventProcessor
{
    public Task OpenAsync(PartitionContext context)
    {
        namespaceManager = NamespaceManager.CreateFromConnectionString(configurationManager.EventHubConfiguration.ManagerConnectionString);
        if (namespaceManager == null)
            return Task.CompletedTask; // a Task-returning method cannot use a bare return

        // Log how far this consumer group lags behind the end of the partition
        var currentSeqNo = context.Lease.SequenceNumber;
        var lastSeqNo = namespaceManager.GetEventHubPartition(context.EventHubPath, context.ConsumerGroupName, context.Lease.PartitionId).EndSequenceNumber;
        var delta = lastSeqNo - currentSeqNo;
        var msg = $"Last processed seqnr for partition {context.Lease.PartitionId}: {currentSeqNo} of {lastSeqNo} in consumergroup '{context.ConsumerGroupName}' (lag: {delta})";
        telemetryClient.TrackTrace(new TraceTelemetry(msg, SeverityLevel.Information));
        telemetryClient.TrackMetric(new MetricTelemetry($"Partition_Lag_{context.Lease.PartitionId}_{context.ConsumerGroupName}", delta));
        return Task.CompletedTask;
    }

    public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> events)
    {
        progressCounter++;
        ...
        await LogProgress(context);
    }

    private async Task LogProgress(PartitionContext context)
    {
        // Checkpoint once every 100 processed batches
        if (progressCounter >= 100)
        {
            await CheckPointAsync(context);
            progressCounter = 0;
        }
    }
}
Now I noticed a difference in the webjobs when it comes to how often OpenAsync and CloseAsync are called. For one of the consumer groups this is about every half hour while for the other one it is several times a minute.
Since both webjobs use the same code and are running on the same app plan, what could be the reason for this?
It bothers me because checkpointing using await CheckPointAsync(context) is almost never done for one of the webjobs since it does not reach the threshold before the lease is gone.

Does using QueueClient.OnMessage inside an asynchronous method make sense?

I am calling an async method InsertOperation from an async method ConfigureConnectionString. Am I using the client.OnMessage call correctly? I want to process the messages in a queue asynchronously and then store them to the queue storage.
private static async void ConfigureConnectionString()
{
var connectionString =
"myconnstring";
var queueName = "myqueue";
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("test");
table.CreateIfNotExists();
Stopwatch sw = Stopwatch.StartNew();
await Task.Run(() => InsertOperation(connectionString, queueName, table));
sw.Stop();
Console.WriteLine("ElapsedTime " + sw.Elapsed.TotalMinutes + " minutes.");
}
private static async Task InsertOperation(string connectionString, string queueName, CloudTable table)
{
var client = QueueClient.CreateFromConnectionString(connectionString, queueName);
client.OnMessage(message =>
{
var bodyJson = new StreamReader(message.GetBody<Stream>(), Encoding.UTF8).ReadToEnd();
var myMessage = JsonConvert.DeserializeObject<VerifyVariable>(bodyJson);
Console.WriteLine();
var VerifyVariableEntityObject = new VerifyVariableEntity()
{
ConsumerId = myMessage.ConsumerId,
Score = myMessage.Score,
PartitionKey = myMessage.ConsumerId,
RowKey = myMessage.Score
};
});
}
The OnMessageAsync method provides an asynchronous programming model, which lets you process each message asynchronously.
client.OnMessageAsync(message =>
{
    // You could perform the table and queue storage work in the ProcessMessage method
    return Task.Factory.StartNew(() => ProcessMessage(message));
}, options);
Without understanding the actual logic you want to achieve, it looks like you are not using OnMessage correctly.
OnMessage is a way to set up the queue client behavior for a long-running client. It makes sense, for example, if you have a singleton instance in your application. In that case, you are specifying to the client how you want to handle any messages that are put in the queue.
In your example, however, you create the client, set up the OnMessage, and don't persist the client, so it effectively doesn't get anything accomplished.
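To make both answers concrete, here is a minimal sketch, assuming the classic `Microsoft.ServiceBus.Messaging` client used in the question. It registers an async handler and then keeps the client alive until the user stops it. `QueueListenerSketch`, `RunAsync` and the `handleBodyAsync` delegate are hypothetical names. The delegate stands in for whatever storage work you want to do per message, and the concurrency value is arbitrary.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;

public static class QueueListenerSketch
{
    public static async Task RunAsync(
        string connectionString,
        string queueName,
        Func<string, Task> handleBodyAsync) // hypothetical per-message work
    {
        var client = QueueClient.CreateFromConnectionString(connectionString, queueName);

        var options = new OnMessageOptions
        {
            AutoComplete = false,   // complete explicitly once the work succeeded
            MaxConcurrentCalls = 4  // arbitrary value for illustration
        };
        options.ExceptionReceived += (sender, e) =>
            Console.WriteLine("Message pump error: " + e.Exception);

        // Registers the handler and returns immediately; the pump runs in the background.
        client.OnMessageAsync(async message =>
        {
            string body;
            using (var reader = new StreamReader(message.GetBody<Stream>(), Encoding.UTF8))
            {
                body = await reader.ReadToEndAsync();
            }

            await handleBodyAsync(body);
            await message.CompleteAsync();
        }, options);

        // The client must stay alive for the handler to keep receiving messages.
        Console.WriteLine("Listening. Press return to stop...");
        Console.ReadLine();
        await client.CloseAsync();
    }
}
```

Because the handler itself is async, there is no need to wrap it in `Task.Run` or `Task.Factory.StartNew`. What matters is that the client outlives the method that registered the handler.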

Tight loop - disk at 100%, quad-core CPU at 25% usage, only 15 MB/sec disk write speed

I have a tight loop which runs through a load of carts, each of which contains around 10 event objects, and writes them to disk as JSON via an intermediate repository (jOliver common domain rewired with GetEventStore.com):
// create ~200,000 carts, each with ~5 events
List<Cart> testData = TestData.GenerateFrom(products);
foreach (var cart in testData)
{
    count = count + (cart as IAggregate).GetUncommittedEvents().Count;
    repository.Save(cart);
}
I see that the disk reports 100% utilization, but the throughput is low (15 MB/sec, ~5,000 events per second). Why is this? Things I can think of are:
Since this is single-threaded, does the 25% CPU usage actually mean 100% of the one core that I am on (is there any way to show the specific core my app is running on in Visual Studio)?
Am I constrained by I/O or by CPU? Can I expect better performance if I create my own thread pool, with one thread for each CPU?
How come I can copy a file at ~120 MB/sec, but I can only get a throughput of 15 MB/sec in my app? Is this due to writing lots of smaller packets?
Anything else I have missed?
The code I am using is from the geteventstore docs/blog:
public class GetEventStoreRepository : IRepository
{
private const string EventClrTypeHeader = "EventClrTypeName";
private const string AggregateClrTypeHeader = "AggregateClrTypeName";
private const string CommitIdHeader = "CommitId";
private const int WritePageSize = 500;
private const int ReadPageSize = 500;
IStreamNamingConvention streamNamingConvention;
private readonly IEventStoreConnection connection;
private static readonly JsonSerializerSettings serializerSettings = new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.None };
public GetEventStoreRepository(IEventStoreConnection eventStoreConnection, IStreamNamingConvention namingConvention)
{
this.connection = eventStoreConnection;
this.streamNamingConvention = namingConvention;
}
public void Save(IAggregate aggregate)
{
this.Save(aggregate, Guid.NewGuid(), d => { });
}
public void Save(IAggregate aggregate, Guid commitId, Action<IDictionary<string, object>> updateHeaders)
{
var commitHeaders = new Dictionary<string, object>
{
{CommitIdHeader, commitId},
{AggregateClrTypeHeader, aggregate.GetType().AssemblyQualifiedName}
};
updateHeaders(commitHeaders);
var streamName = this.streamNamingConvention.GetStreamName(aggregate.GetType(), aggregate.Identity);
var newEvents = aggregate.GetUncommittedEvents().Cast<object>().ToList();
var originalVersion = aggregate.Version - newEvents.Count;
var expectedVersion = originalVersion == 0 ? ExpectedVersion.NoStream : originalVersion - 1;
var eventsToSave = newEvents.Select(e => ToEventData(Guid.NewGuid(), e, commitHeaders)).ToList();
if (eventsToSave.Count < WritePageSize)
{
this.connection.AppendToStreamAsync(streamName, expectedVersion, eventsToSave).Wait();
}
else
{
var startTransactionTask = this.connection.StartTransactionAsync(streamName, expectedVersion);
startTransactionTask.Wait();
var transaction = startTransactionTask.Result;
var position = 0;
while (position < eventsToSave.Count)
{
var pageEvents = eventsToSave.Skip(position).Take(WritePageSize);
var writeTask = transaction.WriteAsync(pageEvents);
writeTask.Wait();
position += WritePageSize;
}
var commitTask = transaction.CommitAsync();
commitTask.Wait();
}
aggregate.ClearUncommittedEvents();
}
private static EventData ToEventData(Guid eventId, object evnt, IDictionary<string, object> headers)
{
var data = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(evnt, serializerSettings));
var eventHeaders = new Dictionary<string, object>(headers)
{
{
EventClrTypeHeader, evnt.GetType().AssemblyQualifiedName
}
};
var metadata = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(eventHeaders, serializerSettings));
var typeName = evnt.GetType().Name;
return new EventData(eventId, typeName, true, data, metadata);
}
}
This was partially mentioned in the comments, but to expand on it: the code shown is effectively single-threaded. Although you call the async APIs, you wait on each task immediately, so the work is synchronous. As a result, every save pays the latency of a context switch and a round trip of the EventStore protocol. Either go fully async, without waiting on each task, and parallelize the writes (EventStore likes parallelization because it can batch multiple writes), or do the batching yourself and send, for example, 20 events at a time.
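One way to sketch the "really go async and parallelize" option is a small helper that keeps a bounded number of saves in flight at once. This uses only the .NET base class library. `ParallelSaveSketch`, `SaveAllAsync` and the `saveAsync` delegate are illustrative names, not part of the EventStore client, and the right concurrency limit would have to be found by measuring.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSaveSketch
{
    // Runs saveAsync for every item, allowing at most maxInFlight saves at a time.
    public static async Task SaveAllAsync<T>(
        IEnumerable<T> items,
        Func<T, Task> saveAsync,
        int maxInFlight)
    {
        using (var gate = new SemaphoreSlim(maxInFlight))
        {
            // ToList() starts every lambda; each one waits at the gate
            // until one of the in-flight slots is free.
            var pending = items.Select(async item =>
            {
                await gate.WaitAsync();
                try
                {
                    await saveAsync(item);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            // Completes when all saves are done; rethrows if any save failed.
            await Task.WhenAll(pending);
        }
    }
}
```

To use it with the repository above, you would need an async variant of `Save` that awaits `AppendToStreamAsync` instead of calling `.Wait()` on it (hypothetical here, it is not in the code shown), and then call something like `await ParallelSaveSketch.SaveAllAsync(testData, cart => repository.SaveAsync(cart), 32);`.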
