I found out that you can manually trigger a Time Trigger Azure Function (link), and in step 7 there is a way to pass a dictionary, but there is no example of how to read this information. Since that didn't seem possible, I looked into making a fire-and-forget HTTP request just to trigger my calculations, but that isn't reliable either. So in the end I am left with Durable Functions; that could be okay, but in this case it feels like a bit of overkill.
Use case:
Some data is processed daily during the night (timer trigger) and takes the default value "Today". But suppose the data provider was down and comes back up a few days later: I need a way to reprocess the previous days. An HTTP call will time out because the processing takes, for example, an hour.
Maybe someone has suggestions.
To process the previous day's data you can use TimeSpan.FromDays(-1) inside a Durable Functions orchestrator:
[FunctionName("Function1")]
public static async Task RunOrchestrator([OrchestrationTrigger] IDurableOrchestrationContext context)
{
    DateTime previousDay = context.CurrentUtcDateTime.Add(TimeSpan.FromDays(-1));
    await context.CreateTimer(previousDay, CancellationToken.None);
    await context.CallActivityAsync("Function1_ProcessPreviousDays", previousDay);
}

[FunctionName("Function1_ProcessPreviousDays")]
public static string ProcessData([ActivityTrigger] DateTime day, ILogger log)
{
    // apply your reprocessing logic here
    return $"Processed data for {day:yyyy-MM-dd}!";
}
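If the provider was down for several days, the set of dates to backfill can be derived from the last successful run, with each date passed as input to its own orchestration instance. A minimal sketch of that helper (a hypothetical function of my own, not part of the Durable Functions API):

```python
from datetime import date, timedelta

def days_to_reprocess(last_success, today):
    """Return the dates strictly between the last successful run and today."""
    missed = (today - last_success).days
    return [last_success + timedelta(days=i) for i in range(1, missed)]
```

Each returned date could then be handed to the orchestration as its input, so the timer/activity pair above runs once per missed day.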
I have a use case where we trigger the Logic App only when a record is modified in Salesforce. The issue is that during testing we had disabled the Logic App for a couple of days, and when I enabled it again, some 35K records triggered our Logic App and overwhelmed the system.
So I am trying to add a trigger condition to my Logic App that compares the last-modified date from the trigger body with utcNow() and fires the Logic App only if the record was modified within 1 day of the current UTC time. I tried a couple of conditions but nothing is working.
I tried the hardcoded value like below and it works.
#greater(triggerBody()?['LastModifiedDate'],'2022-02-02T17:25:49Z')
I am trying to modify it along these lines, but it is not working.
#greater(triggerBody()?['LastModifiedDate'],utcNow()-1)
#greater(equals(formatDateTime(triggerBody()?['LastModifiedDate'],'yyyy-MM-dd')),utcnow()-1)
I am new to Logic Apps and this kind of scenarios, so any help is appreciated!
You'll need to adjust your logic for adding/subtracting days from a date: you need to use the addDays function.
This worked for me ...
#greater(formatDateTime(triggerBody()?['LastModifiedDate'], 'yyyy-MM-dd'), formatDateTime(addDays(utcNow(), -1), 'yyyy-MM-dd'))
... or this if you want to simplify the formatting; they'll both do the same thing ...
#greater(triggerBody()?['LastModifiedDate'], addDays(utcNow(), -1))
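Both expressions rely on ISO-8601 timestamps comparing correctly as plain strings (assuming both sides use the same format, e.g. the same trailing Z and the same precision). A quick sanity check of that assumption in Python:

```python
from datetime import datetime, timedelta, timezone

fmt = '%Y-%m-%dT%H:%M:%SZ'
now = datetime(2022, 2, 3, 17, 0, 0, tzinfo=timezone.utc)
cutoff = (now - timedelta(days=1)).strftime(fmt)  # the addDays(utcNow(), -1) value

recent = '2022-02-03T12:00:00Z'  # modified within the last day -> should trigger
stale = '2022-02-01T12:00:00Z'   # older than one day -> should be skipped

print(recent > cutoff, stale > cutoff)
```

If your Salesforce timestamps use a different format (e.g. milliseconds or an offset instead of Z), normalizing both sides with formatDateTime, as in the first expression, is the safer option.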
I'm triggering an Azure Logic App from an HTTPS webhook for a Docker image in Azure Container Registry.
The workflow is roughly:
When a HTTP request is received
Queue a new build
Delay until
FinishTime of Queue a new build
See: Workflow image
The Delay until action doesn't work, in that the queried FinishTime is 0001-01-01T00:00:00.
It complains about the wrong format, so I manually added a Z after the FinishTime keyword.
Now the time stamp is in the right format, however, the timestamp 0001-01-01T00:00:00Z obviously doesn't make sense and subsequent steps are executed without delay.
Anything that I am missing?
edit: Queue a new build queues an Azure pipeline build. I.e. the FinishTime property comes from the pipeline.
You need to set a timestamp in the future; the timestamp 0001-01-01T00:00:00Z you set on the "Delay until" action is not a future time. If you set a timestamp such as 2020-04-02T07:30:00Z, the "Delay until" action will take effect.
Update:
I don't think "Delay until" can do what you expect, but maybe you can refer to the operations below. Just add a "Condition" action to check whether the FinishTime is greater than the current time.
The expression in the "Condition" checks whether the following value is greater than 0:
sub(ticks(variables('FinishTime')), ticks(utcNow()))
In short: if the FinishTime is greater than the current time, do the "Delay until" action; if the FinishTime is less than the current time, do whatever else you want. (By the way, pay attention to the time zone of your timestamps; you may need to convert all of them to UTC.)
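For reference, ticks() counts 100-nanosecond intervals since 0001-01-01T00:00:00Z, so the subtraction is positive exactly when FinishTime is still in the future. A Python re-implementation of that semantics (for illustration only, not a Logic Apps API):

```python
from datetime import datetime, timezone

TICKS_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)

def ticks(dt):
    """100-nanosecond intervals since 0001-01-01T00:00:00Z, like Logic Apps' ticks()."""
    delta = dt - TICKS_EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10

def finish_time_is_in_future(finish_time, now):
    # mirrors: sub(ticks(variables('FinishTime')), ticks(utcNow())) > 0
    return ticks(finish_time) - ticks(now) > 0
```

This also makes it clear why 0001-01-01T00:00:00Z always fails the check: its tick count is 0, so the subtraction can never be positive.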
I've been in touch with an Azure support engineer, who confirmed that the Delay until action should work as I intended to use it, but that the FinishTime property will not hold a value that I can use.
In the meantime, I have found a workaround, where I'm using some logic and quite a few additional steps. Inconvenient but at least it does what I want.
Here are the most important steps that are executed after the workflow gets triggered from a webhook (docker base image update in Azure Container Registry).
Essentially, I'm initializing the following variables and queueing a new build:
buildStatusCompleted: String value containing the target value completed
jarsBuildStatus: String value containing the initial value notStarted
jarsBuildResult: String value containing the default value failed
Then, I'm using an Until action to monitor when the jarsBuildStatus value switches to completed.
In the Until action, I'm repeating the following steps until jarsBuildStatus equals buildStatusCompleted:
Delay for 15 seconds
HTTP request to Azure DevOps build, authenticating with personal access token
Parse JSON body of previous raw HTTP output for status and result keywords
Set jarsBuildStatus = status
After breaking out of the Until action (loop), the jarsBuildResult is set to the parsed result.
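The Until loop above is a classic poll-until-done pattern. Sketched language-agnostically in Python (get_build stands in for the authenticated HTTP request; the names are illustrative, not the Logic Apps or DevOps API):

```python
import time

def wait_for_build(get_build, poll_seconds=15, sleep=time.sleep):
    """Poll get_build() until its 'status' is 'completed'; return its 'result'.

    get_build is any callable returning the parsed JSON of the
    Azure DevOps build, e.g. an authenticated HTTP GET.
    """
    while True:
        build = get_build()
        if build['status'] == 'completed':
            return build['result']
        sleep(poll_seconds)
```

In the Logic App, the same loop is expressed with the Delay (15 s), HTTP, Parse JSON, and Set variable actions.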
All these steps are part of a larger build orchestration workflow, where I'm repeating the given steps multiple times for several different Azure DevOps build pipelines.
The final action in the workflow is sending all the status, result and other relevant data as a build summary to Azure DevOps.
To me, this is only a workaround and I'll leave this question open to see if others have suggestions as well or in case the Azure support engineers can give more insight into the Delay until action.
Here's an image of the final workflow (at least, the part where I implemented the Delay until action):
edit: Turns out, I can simplify the workflow because there's a dedicated Azure DevOps action in the Logic App called Send an HTTP request to Azure DevOps, which removes the need for manual authentication (the Azure support engineer pointed this out).
The workflow now looks like this:
That is, I can query the build status directly and set the jarsBuildStatus as
#{body('Send_an_HTTP_request_to_Azure_DevOps:_jar''s')['status']}
The code snippet above is automagically converted to a value for the Set variable action. Thus, no need to use an additional Parse JSON action.
Initially our flow of communicating with Google Pub/Sub was as follows:
1. Application accepts a message
2. Checks that it doesn't exist in the idempotency store
3.1 If it doesn't exist - put it into the idempotency store (key is the value of a unique header, value is the current timestamp)
3.2 If it exists - just ignore the message
4. When processing is finished - send an acknowledge
5. In the acknowledge success callback - remove the message from the metadata store
Point 5 is wrong, because theoretically we can get a duplicated message even after the message has been processed. Moreover, we found out that sometimes a message might not be removed even though the successful callback was invoked (Message is received from Google Pub/Sub subscription again and again after acknowledge [Heisenbug]). So we decided to update the value after the message is processed, replacing the timestamp with the string "FINISHED".
But sooner or later this table will become overcrowded, so we have to clean up messages in the MetadataStore. We can remove messages which are processed and were processed more than 1 day ago.
As was mentioned in the comments of https://stackoverflow.com/a/51845202/2674303, I can add an additional column to the metadataStore table where I could mark whether a message is processed. That is not a problem at all. But how can I use this flag in my cleaner? MetadataStore has only key and value.
In the acknowledge success callback - remove the message from the metadata store
I don't see a reason for this step at all.
Since you say that you store a timestamp in the value, you can analyze this table from time to time to remove definitively old entries.
In some of my projects we have a daily job in the DB to archive a table for better main-process performance - just because we don't need the old data any more. For this we check a timestamp in the row to determine whether it should go into the archive or not. I wouldn't remove data immediately after processing, just because there is a chance of redelivery from the external system.
On the other hand, for better performance I would add an extra indexed column of timestamp type to that metadata table and populate its value via a trigger on each update or insert. Well, the MetadataStore just inserts an entry from the MetadataStoreSelector:
return this.metadataStore.putIfAbsent(key, value) == null;
So, you need an on-insert trigger to populate that date column. This way you will know at the end of the day whether you need to remove an entry or not.
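With such a trigger-populated timestamp column in place, the daily cleaner reduces to a simple filter. A sketch of the selection logic against an in-memory stand-in for the table (the column layout here is an assumption for illustration):

```python
from datetime import datetime, timedelta

def keys_to_purge(rows, now, max_age=timedelta(days=1)):
    """rows: (key, value, inserted_at) tuples mirroring the metadata table
    plus the hypothetical trigger-populated timestamp column."""
    return [key for key, value, inserted_at in rows
            if value == 'FINISHED' and now - inserted_at > max_age]
```

The real cleaner would issue the equivalent DELETE against the metadata table; only finished entries older than the cutoff are touched, so in-flight messages keep their idempotency guard.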
Azure Function utilising Azure Table Storage
I have an Azure Function which is triggered from Azure Service Bus topic subscription, let's call it "Process File Info" function.
The message on the subscription contains file information to be processed. Something similar to this:
{
"uniqueFileId": "adjsdakajksajkskjdasd",
"fileName":"mydocument.docx",
"sourceSystemRef":"System1",
"sizeBytes": 1024,
... and other data
}
The function carries out the following two operations -
Check the individual file storage table for the existence of the file. If it exists, update that file. If it's new, add the file to the storage table (stored on a per-system, per-fileId basis).
Capture metrics on the file size bytes and store in a second storage table, called metrics (constantly incrementing the bytes, stored on a per system|per year/month basis).
The following diagram gives a brief summary of my approach:
The difference between the individualFileInfo table and the fileMetric table is that the individual table has one record per file, whereas the metric table stores one record per month that is constantly updated (incremented), gathering the total bytes that pass through the function.
Data in the fileMetrics table is stored as follows:
The issue...
Azure Functions are brilliant at scaling; in my setup I have a max of 6 of these functions running at any one time. Presuming each file message being processed is unique, updating (or inserting) the record in the individualFileInfo table works fine as there are no race conditions.
However, updating the fileMetric table is proving problematic. Say all 6 functions fire at once: they all intend to update the metrics table at the same time (constantly incrementing the new-file counter or the existing-file counter).
I have tried using the etag for optimistic updates, along with a little bit of recursion to retry should a 412 response come back from the storage update (code sample below). But I can't seem to avoid this race condition. Has anyone any suggestion on how to work around this constraint or come up against something similar before?
Sample code that is executed in the function for storing the fileMetric update:
internal static async Task UpdateMetricEntry(IAzureTableStorageService auditTableService,
    string sourceSystemReference, long addNewBytes, long addIncrementBytes, int retryDepth = 0)
{
    const int maxRetryDepth = 3; // only recursively attempt a max of 3 times
    var todayYearMonth = DateTime.Now.ToString("yyyyMM");
    try
    {
        // Attempt to get the existing record from table storage.
        var result = await auditTableService.GetRecord<VolumeMetric>("VolumeMetrics", sourceSystemReference, todayYearMonth);
        // If the volume metrics table exists in storage - add or edit the records as required.
        if (result.TableExists)
        {
            VolumeMetric volumeMetric = result.RecordExists ?
                // Existing metric record.
                (VolumeMetric)result.Record.Clone()
                :
                // Brand new metrics record.
                new VolumeMetric
                {
                    PartitionKey = sourceSystemReference,
                    RowKey = todayYearMonth,
                    SourceSystemReference = sourceSystemReference,
                    BillingMonth = DateTime.Now.Month,
                    BillingYear = DateTime.Now.Year,
                    ETag = "*"
                };
            volumeMetric.NewVolumeBytes += addNewBytes;
            volumeMetric.IncrementalVolumeBytes += addIncrementBytes;
            await auditTableService.InsertOrReplace("VolumeMetrics", volumeMetric);
        }
    }
    catch (StorageException ex)
    {
        if (ex.RequestInformation.HttpStatusCode == 412)
        {
            // Retry the volume metrics update, advancing the recursion depth.
            if (retryDepth < maxRetryDepth)
                await UpdateMetricEntry(auditTableService, sourceSystemReference, addNewBytes, addIncrementBytes, retryDepth + 1);
        }
        else
            throw;
    }
}
The etag keeps track of conflicts, and if this code gets a 412 HTTP response it will retry, up to a max of 3 times (an attempt to mitigate the issue). My issue is that I cannot guarantee the updates to table storage across all instances of the function.
Thanks for any tips in advance!!
You can put the second part of the work into a second queue and function, maybe even put a trigger on the file updates.
Since the other operation sounds like it might take most of the time anyways, it could also remove some of the heat from the second step.
You can then solve any remaining race conditions by focusing only on that function. You can use sessions to limit the concurrency effectively. In your case, the system id could be a possible session key. If you use that, you will only have one Azure Function processing data from one system at one time, effectively solving your race conditions.
https://dev.to/azure/ordered-queue-processing-in-azure-functions-4h6c
Edit: If you can't use Sessions to logically lock the resource, you can use locks via blob storage:
https://www.azurefromthetrenches.com/acquiring-locks-on-table-storage/
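For what it's worth, the etag/412 retry pattern from the question can be exercised in isolation. Below is a minimal in-memory simulation of optimistic concurrency (Conflict stands in for the 412 response; nothing here is the Azure Storage SDK):

```python
class Conflict(Exception):
    """Stands in for the HTTP 412 Precondition Failed response."""

class Row:
    """In-memory stand-in for a table entity with an etag."""
    def __init__(self, value=0):
        self.value, self.etag = value, 0

    def read(self):
        return self.value, self.etag

    def replace(self, value, etag):
        if etag != self.etag:
            raise Conflict()  # someone updated the row since we read it
        self.value, self.etag = value, self.etag + 1

def increment(row, delta, max_retries=3):
    for _ in range(max_retries + 1):
        value, etag = row.read()  # re-read on every attempt to get a fresh etag
        try:
            row.replace(value + delta, etag)
            return
        except Conflict:
            continue  # the 412 case: retry with the re-read value
    raise Conflict()
```

The key property is that every retry re-reads the entity, so it always works against the latest value and etag; the retry budget just bounds how long a hot row can starve one writer.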
I have an Azure Function with a Cosmos DB trigger which makes some calculations and writes the results to the db. If something goes wrong, I want the possibility to start from the first item, or from a specific item, and run the calculations again. Is that possible? Thanks
public static void Run([CosmosDBTrigger(
databaseName: "db",
collectionName: "collection",
ConnectionStringSetting = "DocDbConnStr",
CreateLeaseCollectionIfNotExists = true,
LeaseCollectionName = "leases")]IReadOnlyList<Document> input, TraceWriter log)
{
...
}
Right now, the StartFromBeginning option is not exposed to the Cosmos DB Trigger. The default behavior is to start receiving changes from the moment the Function starts running; leases/checkpoints are generated in case the Host/Runtime shuts down, so when the Host/Runtime is back up it will pick up from the last checkpointed item.
The Trigger does not implement dead-lettering or error handling as it might generate infinite-loops / unexpected billing / multiple processing of the same batch if the error is not related to the batch itself (for example, you process the documents and then send an email and the email fails, the entire batch would be re-processed for an error not related to the feed itself), so we recommend users to implement their own try/catch or error handling logic inside the Function's code. It's the same approach as the Event Hub Trigger.
That being said, we are in the process of exposing several new options on the Trigger and there is a contributor working on an advanced retrying mechanism.
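The suggested in-function error handling can be as small as a per-document try/catch that diverts failures instead of failing the whole batch. A sketch (process and dead_letter are placeholders for your own logic, not trigger features):

```python
def handle_batch(documents, process, dead_letter):
    """Process each changed document; divert failures instead of
    letting one bad document force the whole batch to re-run."""
    for doc in documents:
        try:
            process(doc)
        except Exception as exc:
            dead_letter(doc, exc)
```

dead_letter could write the failed document to a queue or container for later inspection, so a transient failure (like the email example above) never causes the trigger to re-deliver documents that were already handled.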
As #Matias Quaranta and #Pankaj Rawat say in the comments, the accepted answer is old and no longer true. You can use StartFromBeginning as a C# attribute property within your Azure Function's parameter list like so:
[FunctionName(nameof(MyAzureFunction))]
public async Task RunAsync([CosmosDBTrigger(
databaseName: "myCosmosDbName",
collectionName: "myCollectionName",
ConnectionStringSetting = "cosmosConnectionString",
LeaseCollectionName = "leases",
CreateLeaseCollectionIfNotExists = true,
MaxItemsPerInvocation = 1000,
StartFromBeginning = true)]IReadOnlyList<Document> documents)
{
....
}
Please change the accepted answer.
The current offsets (positions in Cosmos DB change feed) are managed by clients, Azure Functions runtime in this case.
Functions store the offsets in lease collection (it's called leases in your example).
To restart from a specific item, you would have to make a snapshot of documents in leases collection at some point, and then restore your current collection to that snapshot when needed.
I am not familiar with a tool that automates that for you, other than generic tools working with Cosmos DB collections.
Check the startFromBeginning option, available in Functions v2. Unfortunately, I'm still using v1 and not able to verify it.
When set, it tells the Trigger to start reading changes from the beginning of the history of the collection instead of the current time. This only works the first time the Trigger starts, as in subsequent runs, the checkpoints are already stored. Setting this to true when there are leases already created has no effect.