I have 2 collections in CosmosDB, Stocks and StockPrices.
StockPrices collection holds all historical prices, and is constantly updated.
I want to create an Azure Function that listens to StockPrices updates (CosmosDBTrigger) and then does the following for each Document passed by the trigger:
1. Find the stock with a matching ticker in the Stocks collection
2. Update the stock price in the Stocks collection
I can't do this with a CosmosDB input binding, as CosmosDBTrigger passes a list (the input binding only works when the trigger passes a single item).
The only way I see this working is if I foreach over the CosmosDBTrigger list, access CosmosDB from my function body, and perform steps 1 and 2 above for each item.
Question: How do I access CosmosDB from within my function?
One of the CosmosDB binding forms is to get a DocumentClient instance, which provides the full range of operations on the container. This way, you should be able to combine the change feed trigger and the item manipulation into the same function, like:
[FunctionName("ProcessStockChanges")]
public async Task Run(
[CosmosDBTrigger(/* Trigger params */)] IReadOnlyList<Document> changedItems,
[CosmosDB(/* Client params */)] DocumentClient client,
ILogger log)
{
// Read changedItems,
// Create/read/update/delete with client
}
It's also possible with .NET Core to use dependency injection to provide a full-fledged custom service/repository class to your function instance to interface with Cosmos. This is my preferred approach, because I can do validation, control serialization, etc. with the latest version of the Cosmos SDK.
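As an illustration only (not the poster's actual code), here is a minimal sketch of that DI approach with the v3 Cosmos SDK; the namespace, database/container names, and connection setting name are placeholders:

using System;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection;

[assembly: FunctionsStartup(typeof(StocksFunctions.Startup))]

namespace StocksFunctions
{
    // Registers a singleton CosmosClient (v3 SDK) that gets injected into function classes
    public class Startup : FunctionsStartup
    {
        public override void Configure(IFunctionsHostBuilder builder)
        {
            builder.Services.AddSingleton(_ =>
                new CosmosClient(Environment.GetEnvironmentVariable("CosmosConnectionString")));
        }
    }

    // The function class receives the client (or a repository wrapping it) via constructor injection
    public class ProcessStockChanges
    {
        private readonly CosmosClient _client;

        public ProcessStockChanges(CosmosClient client) => _client = client;

        // The trigger method can then use _client.GetContainer("StocksDb", "Stocks")
        // to read and upsert Stock items with the latest SDK.
    }
}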
You may have done so intentionally, but just mentioning to consider combining your data into a single container partitioned by, for example, a combination of record type (Stock/StockPrice) and identifier. This simplifies things and can be more cost/resource efficient relative to multiple containers.
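For example (illustrative document shapes only), both record types could live in one container, keyed by a partition value that combines record type and ticker:

{ "id": "Stock|MSFT", "pk": "Stock|MSFT", "type": "Stock", "ticker": "MSFT", "price": 157.70 }
{ "id": "StockPrice|MSFT|2020-01-02", "pk": "StockPrice|MSFT", "type": "StockPrice", "ticker": "MSFT", "price": 157.70, "date": "2020-01-02" }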
Ended up going with @Noah Stahl's suggestion. Leaving this here as an alternative.
Couldn't figure out how to do this directly, so came up with a work-around:
1. Add a function with a CosmosDBTrigger on the StockPrices collection and a Queue output binding
2. foreach over the Documents from the trigger, serialize them, and add them to the Queue
3. Add a function with a QueueTrigger, a CosmosDB input binding for the Stocks collection (with PartitionKey and Id set to StockTicker), and a CosmosDB output binding for the Stocks collection
4. Update the Stock from the CosmosDB input binding with values from the QueueTrigger
5. Assign the updated Stock to the CosmosDB output binding parameter (this updates the record in the DB)
This said, I'd like to hear about more straightforward ways of doing this, as my approach seems like a hack.
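For reference, a rough sketch of the two functions in those steps; the queue name, POCO types, database/collection names, and connection setting are made up for illustration and would need to match your setup:

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;

public class Stock { public string id { get; set; } public string Ticker { get; set; } public decimal Price { get; set; } }
public class StockPriceMessage { public string Ticker { get; set; } public decimal Price { get; set; } }

public static class StockFunctions
{
    // Function 1: CosmosDBTrigger on StockPrices, forward each changed document to a queue
    [FunctionName("ForwardPriceChanges")]
    public static async Task ForwardPriceChanges(
        [CosmosDBTrigger("StocksDb", "StockPrices",
            ConnectionStringSetting = "CosmosConnection",
            LeaseCollectionName = "leases",
            CreateLeaseCollectionIfNotExists = true)] IReadOnlyList<Document> changes,
        [Queue("stock-price-updates")] IAsyncCollector<string> queue)
    {
        foreach (var doc in changes)
        {
            await queue.AddAsync(doc.ToString()); // Document serializes to its JSON content
        }
    }

    // Function 2: QueueTrigger, Cosmos input binding keyed by the ticker from the message,
    // and a Cosmos output binding that writes the updated Stock back
    [FunctionName("ApplyPriceChange")]
    public static async Task ApplyPriceChange(
        [QueueTrigger("stock-price-updates")] StockPriceMessage update,
        [CosmosDB("StocksDb", "Stocks",
            ConnectionStringSetting = "CosmosConnection",
            Id = "{Ticker}",
            PartitionKey = "{Ticker}")] Stock stock,
        [CosmosDB("StocksDb", "Stocks",
            ConnectionStringSetting = "CosmosConnection")] IAsyncCollector<Stock> updatedStocks)
    {
        stock.Price = update.Price;
        await updatedStocks.AddAsync(stock);
    }
}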
Sorry if this is a bit vague or rambly; I'm still getting to grips with Data Factory, and a lot of it seems a bit obtuse...
What I want to do is query my Cosmos Database for a list of Ids of records that need to be updated. For each of these records, I want to call a REST API using the Id (i.e. /Record/{Id}/Details)
I've created a Data Flow that takes a string as a parameter and then calls the REST API fine.
I then made a pipeline using a Lookup with a query (select c.RecordId from c where...) and passed that into a ForEach with items set to @activity('Lookup1').output.value
I then set the ForEach's Activity to my Data Flow. From research, I think I'm supposed to set the parameter value to "@item().RecordId", but that gives an error "parameter [name] does not match parameter type 'string'".
I can change the type of the parameter to any (and use toString([parameter]) to cast it), and then when I try to debug, it passes the parameter in, but it gives an error of "Job failed due to reason: at (Line 2/Col 14): Datatype any not found".
I'm not sure what the solution is. Is there a way to cast the result of the lookup to an integer or string? Is there a way to narrow an any down? Is there a better way than toString() that would work? Is there a better way than ForEach?
I tried to reproduce a scenario similar to what you are trying.
My sample data in Cosmos:
To query the Cosmos database for a list of Ids and then call a REST API with the Id for each of those records, I first took a Lookup activity in Data Factory and selected the ids where last_name is Bluth.
Its output and settings are as below:
Then I passed the output of the Lookup activity to the ForEach activity.
Then inside the ForEach activity I created a Data Flow activity, and for its data source I used the REST API. My REST API to call a specific user is https://reqres.in/api/users/2, so I gave the base URL as https://reqres.in/api/users.
Then I created a parameter called demoId with datatype string, and in the relative URL I gave the dynamic value @dataset().demoId.
After this I set the source parameter value to @item().id, since after https://reqres.in/api/users only the id needs to be provided to get the data. In your case you can try Record/@{item().id}/Details.
For each id it successfully passes the id to the REST API and fetches the data.
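Pulling the dynamic-content expressions from the walkthrough above into one place (demoId and Lookup1 are the names used in this example):

Lookup query:                  select c.RecordId from c where ...
ForEach items:                 @activity('Lookup1').output.value
Dataset parameter:             demoId (String)
Dataset relative URL:          @dataset().demoId
Source parameter value:        @item().id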
I have an Azure Function with a Cosmos DB trigger that makes some calculations and writes the results to the DB. If something goes wrong, I want the possibility to start from the first item, or a specific item, and make the calculations again. Is that possible? Thanks
public static void Run([CosmosDBTrigger(
    databaseName: "db",
    collectionName: "collection",
    ConnectionStringSetting = "DocDbConnStr",
    CreateLeaseCollectionIfNotExists = true,
    LeaseCollectionName = "leases")] IReadOnlyList<Document> input, TraceWriter log)
{
    ...
}
Right now, the StartFromBeginning option is not exposed on the Cosmos DB Trigger. The default behavior is to start receiving changes from the moment the Function starts running; leases/checkpoints are generated in case the Host/Runtime shuts down, so when the Host/Runtime is back up it will pick up from the last checkpointed item.
The Trigger does not implement dead-lettering or error handling, as that might generate infinite loops / unexpected billing / multiple processing of the same batch if the error is not related to the batch itself (for example, you process the documents and then send an email, and the email fails: the entire batch would be re-processed for an error not related to the feed itself). So we recommend users implement their own try/catch or error-handling logic inside the Function's code. It's the same approach as the Event Hub Trigger.
That being said, we are in the process of exposing several new options on the Trigger, and there is a contributor working on an advanced retry mechanism.
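To illustrate the recommended in-function error handling, a minimal sketch follows; the dead-letter queue shown here is just one possible pattern and is not something the Trigger itself provides:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ProcessChanges
{
    [FunctionName("ProcessChanges")]
    public static async Task Run(
        [CosmosDBTrigger("db", "collection",
            ConnectionStringSetting = "DocDbConnStr",
            LeaseCollectionName = "leases",
            CreateLeaseCollectionIfNotExists = true)] IReadOnlyList<Document> input,
        [Queue("failed-documents")] IAsyncCollector<string> deadLetters,
        ILogger log)
    {
        foreach (var doc in input)
        {
            try
            {
                // process a single document; a failure here only affects this item
            }
            catch (Exception ex)
            {
                log.LogError(ex, "Failed to process document {Id}", doc.Id);
                await deadLetters.AddAsync(doc.ToString()); // hand off for later inspection/retry
            }
        }
    }
}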
As @Matias Quaranta and @Pankaj Rawat say in the comments, the accepted answer is old and no longer true. You can use StartFromBeginning as a C# attribute property within your Azure Function's parameter list, like so:
[FunctionName(nameof(MyAzureFunction))]
public async Task RunAsync([CosmosDBTrigger(
    databaseName: "myCosmosDbName",
    collectionName: "myCollectionName",
    ConnectionStringSetting = "cosmosConnectionString",
    LeaseCollectionName = "leases",
    CreateLeaseCollectionIfNotExists = true,
    MaxItemsPerInvocation = 1000,
    StartFromBeginning = true)] IReadOnlyList<Document> documents)
{
    ....
}
Please change the accepted answer.
The current offsets (positions in Cosmos DB change feed) are managed by clients, Azure Functions runtime in this case.
Functions store the offsets in the lease collection (it's called leases in your example).
To restart from a specific item, you would have to make a snapshot of documents in leases collection at some point, and then restore your current collection to that snapshot when needed.
I am not familiar with a tool that automates that for you, other than generic tools working with Cosmos DB collections.
Check the startFromBeginning option available in Functions v2. Unfortunately, I'm still using v1 and am not able to verify it.
When set, it tells the Trigger to start reading changes from the beginning of the history of the collection instead of the current time. This only works the first time the Trigger starts, as in subsequent runs, the checkpoints are already stored. Setting this to true when there are leases already created has no effect.
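For non-C# functions defined via function.json, the same option sits on the trigger binding; a sketch reusing the names from the question above (property names per the v2 extension, so treat this as illustrative):

{
  "type": "cosmosDBTrigger",
  "name": "input",
  "direction": "in",
  "connectionStringSetting": "DocDbConnStr",
  "databaseName": "db",
  "collectionName": "collection",
  "leaseCollectionName": "leases",
  "createLeaseCollectionIfNotExists": true,
  "startFromBeginning": true
}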
I'm writing an Azure Function to access multiple records in Azure Table Storage and want to apply my filter at runtime with a variable passed in to a WebHook. I have successfully run my Function with the filter in function.json, but don't see anything in the docs on how to apply the filter inside index.js.
I tried this, but it had no effect on the entities returned. This same filter works correctly inside function.json.
context.bindings.inputTable.filter = 'name eq "test"';
You can't construct and set the filter in your function code. We do have an open issue here in our repo tracking support for more dynamic binding scenarios, which would enable this.
However, the function.json filter expression does support binding parameters, so if the parameters are part of the JSON payload coming in on the WebHook you can use them in your query. For example, if your payload contains properties region of type string and status of type int you can define a filter like "(Region eq '{region}') and (Status eq {status})" and the filter executed at runtime will be bound to the incoming values.
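As a sketch, a function.json table input binding along those lines might look like the following (the table name and connection setting are placeholders):

{
  "type": "table",
  "name": "inputTable",
  "tableName": "MyTable",
  "connection": "AzureWebJobsStorage",
  "filter": "(Region eq '{region}') and (Status eq {status})",
  "direction": "in"
}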
I have a continuous Azure WebJob that is running off of a QueueInput, generating a report, and outputting a file to a BlobOutput. This job will run for differing sets of data, each requiring a unique output file. (The number of inputs is guaranteed to scale significantly over time, so I cannot write a single job per input.) I would like to be able to run this off of a QueueInput, but I cannot find a way to set the output based on the QueueInput value, or any value except for a blob input name.
As an example, this is basically what I want to do, though it is invalid code and will fail.
public static void Job([QueueInput("inputqueue")] InputItem input, [BlobOutput("fileoutput/{input.Name}")] Stream output)
{
//job work here
}
I know I could do something similar if I used BlobInput instead of QueueInput, but I would prefer to use a queue for this job. Am I missing something or is generating a unique output from a QueueInput just not possible?
There are two alternatives:
Use IBinder to generate the blob name, as shown in these samples
Have an autogenerated property in the queue message object and bind the blob name to that property. See here (the BlobNameFromQueueMessage method) for how to bind a queue message property to a blob name
Found the solution at Advanced bindings with the Windows Azure Web Jobs SDK via Curah's Complete List of Web Jobs Tutorials and Videos.
Quote for posterity:
One approach is to use the IBinder interface to bind the output blob and specify the name that equals the order id. The better and simpler approach (SimpleBatch) is to bind the blob name placeholder to the queue message properties:
public static void ProcessOrder(
    [QueueInput("orders")] Order newOrder,
    [BlobOutput("invoices/{OrderId}")] TextWriter invoice)
{
    // Code that creates the invoice
}
The {OrderId} placeholder in the blob name gets its value from the OrderId property of the newOrder object. For example, if newOrder is (as JSON) {"CustomerName":"Victor","OrderId":"abc42"}, then the output blob name is "invoices/abc42". The placeholder is case-sensitive.
So, you can reference individual properties from the QueueInput object in the BlobOutput string and they will be populated correctly.
Given the following code:
listView.ItemsSource =
App.azureClient.GetTable<SomeTable>().ToIncrementalLoadingCollection();
We get incremental loading without further changes.
But what if we modify the read.js server-side script to, e.g., use mssql to query another table instead? What happens to the incremental loading? I'm assuming it breaks; if so, what's needed to support it again?
And what if the query used the untyped version instead, e.g.
App.azureClient.GetTable("SomeTable").ReadAsync(...)
Could incremental loading be somehow supported in this case, or must it be done "by hand" somehow?
Bonus points for insights on how Azure Mobile Services implements incremental loading between the server and the client.
The incremental loading collection works by sending the $top and $skip query parameters (those are also sent when you do a query by using the .Take and .Skip methods in the table). So if you want to modify the read script to do something other than the default behavior, while still maintaining the ability to use that table with an incremental loading collection, you need to take those values into account.
To do that, you can ask for the query components, which will contain the values, as shown below:
function read(query, user, request) {
    var queryComponents = query.getComponents();
    console.log('query components: ', queryComponents); // useful to see all information
    var top = queryComponents.take;
    var skip = queryComponents.skip;
    // do whatever you want with those values, then call request.respond(...)
}
The way it's implemented at the client is by using a class which implements the ISupportIncrementalLoading interface. You can see it (and the full source code for the client SDKs) in the GitHub repository, or more specifically the MobileServiceIncrementalLoadingCollection class (the method is added as an extension in the MobileServiceIncrementalLoadingCollectionExtensions class).
And the untyped table does not have that method - as you can see in the extension class, it's only added to the typed version of the table.