For an Azure Search service, I am looking to get updates reflected as close to real time as possible. I can run the indexer via the REST API using a Logic App with a recurrence trigger.
If I invoke the Logic App very frequently (every 3 seconds), is there a catch to this approach?
Can the indexer get blocked because of too-frequent calls?
Is there any cost implication of this constant invocation (either on the Logic App or on Azure Search)?
I am trying to avoid building logic to determine the various scenarios in which the indexer needs to be called (which can get complex).
If you're OK with the indexer running every 5 minutes, you don't need to invoke it at all; it can run on a schedule.
If you need to invoke the indexer more frequently, you can run it once every 3 minutes on a free-tier service, and as often as you want on any paid-tier service. If an indexer is already running when you invoke it, the call will fail with a 409 status.
However, if you need to run the indexer very frequently, it's usually in response to new data arriving, and you have that data in hand. In that case, it may be more efficient to index that data directly using the push-style indexing API. See Add, update, or delete documents for the REST API reference, or you can use the Azure Search .NET SDK.
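For reference, a minimal push-model sketch using the current Azure.Search.Documents client; the service URL, index name, admin key, and the Hotel document class are all placeholders you would replace with your own:

```csharp
using System;
using Azure;
using Azure.Search.Documents;

var client = new SearchClient(
    new Uri("https://<your-service>.search.windows.net"), // placeholder service URL
    "hotels",                                              // placeholder index name
    new AzureKeyCredential("<admin-api-key>"));            // placeholder admin key

// MergeOrUpload inserts a document if it doesn't exist and updates it otherwise,
// so repeatedly pushing the same record is safe.
await client.MergeOrUploadDocumentsAsync(new[]
{
    new Hotel { Id = "1", Description = "Updated description" }
});

// Hypothetical document shape; property names must match your index schema.
public class Hotel
{
    public string Id { get; set; }
    public string Description { get; set; }
}
```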
I have a C# console application that searches for specific records in a database and outputs them to the console. It usually takes about 30 minutes for the application to find and list all the records. If I want to automate that console application to run at a specific time and save the records to a CSV file, which Azure feature can I use? Is an Azure Function suitable for this task, or are there other Azure features I could use?
I looked into Azure Functions; however, it seems functions are not suitable for long-running tasks that take longer than 5 minutes.
A Premium-plan function can run for longer than 5 minutes (a Consumption-plan function maxes out at 10 minutes, by the way).
This is a bit of a fudge, but you could also take the console app, wrap it in a container, run that container in an Azure Container Instance, and start it with a Logic App. You only pay for the runtime of the container.
I'd probably ask why it takes 30 minutes to find all the records; is there a better way of running the query?
You could use Azure Durable Functions, but your algorithm should implement continuation-token logic. That means you have a token (which could be a JSON object with some properties) that allows you to start or resume your algorithm from any point in time.
For example, while searching for specific records in the database, you keep track of the primary key of the last processed row. You store it in your continuation token, and that token is persisted in an Azure Storage blob or table. When your function is stopped (because it has run for more than 10 minutes) and started again, you check the value of that token and resume the search from that point.
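As a rough sketch of the continuation-token idea (using a plain timer-triggered function for brevity rather than a full Durable Functions setup), with the schedule, blob names, and the SearchNextBatch helper all invented for illustration:

```csharp
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class RecordSearchFunction
{
    [FunctionName("RecordSearch")]
    public static async Task Run(
        [TimerTrigger("0 */10 * * * *")] TimerInfo timer, // hypothetical schedule
        ILogger log)
    {
        var blob = new BlobClient(
            Environment.GetEnvironmentVariable("AzureWebJobsStorage"),
            "state", "continuation-token.txt");

        // Read the last processed primary key, or start from the beginning.
        long lastKey = 0;
        if (await blob.ExistsAsync())
            lastKey = long.Parse((await blob.DownloadContentAsync()).Value.Content.ToString());

        // SearchNextBatch is a placeholder for your own DB query, which should
        // return the highest primary key it reached within a safe time budget.
        long newKey = await SearchNextBatch(lastKey, log);

        // Persist the token so the next run resumes where this one stopped.
        await blob.UploadAsync(new BinaryData(newKey.ToString()), overwrite: true);
    }

    private static Task<long> SearchNextBatch(long fromKey, ILogger log)
        => Task.FromResult(fromKey); // placeholder for the real search logic
}
```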
I want to use Azure to handle long-running tasks that can't be handled solely by a web server, as they exceed the 2-minute HTTP limit (and would put unnecessary load on it regardless). In this case, it's the generation of a PDF report that can take some time (between 2 and 5 minutes). I've seen examples of solutions for this using other technologies (Celery, RabbitMQ, AWS Lambda, etc.), but not much using what's available on Azure (Functions and Storage in this case).
Details of my proposed solution are as follows (a diagram is here):
API (that has 3 endpoints):
Generate report – posts a message to Azure Queue Storage
Get report generation status – queries Azure Table Storage for the status
Get report – retrieves the PDF from Azure Blob Storage
Azure Queue Storage
Receives a message from the API containing parameters of the requested report
Azure Function
Triggered when a message is added to Azure Queue Storage
Creates a report generation status record in Azure Table Storage, set to ‘Generating’
Generates a report based on parameters contained in the message
Stores the output PDF in Azure Blob Storage
Updates the report generation status record in Azure Table Storage to ‘Completed’
Azure Table Storage
Contains a table of report generation requests and associated status
Azure Blob Storage
Stores PDF reports
Other points
The app isn’t built yet – so there is no base case I’m comparing against (e.g. Celery/RabbitMQ)
The time it takes to run the report isn’t super important (i.e. I’m not concerned about Azure Function cold starts)
There’s no requirement for immediate notification of completion using something like webhooks – the client will poll the API every so often using the get report generation status endpoint.
There won't be much usage of the app, so having an always-active server to handle tasks (vs. an Azure Function) seems to be a waste of money.
If I find that report generation takes longer than 10 minutes, I can split it up into more than one Azure Function (to avoid the Consumption plan's hard limit of 10 minutes of execution time per function).
My question is essentially whether or not this is a good approach (to me it seems good and relatively cost-effective; I’m just not sure if there’s something I’m missing).
This can be simplified using Durable Functions. Most of the job is already handled by the framework, and you can also query an endpoint to check the completion status.
https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=csharp
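A minimal sketch of what that could look like (all function names and payloads below are placeholders): the starter's check-status response gives the client a URL to poll, which can stand in for the separate Table Storage status record.

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class ReportFunctions
{
    [FunctionName("StartReport")]
    public static async Task<HttpResponseMessage> Start(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage req,
        [DurableClient] IDurableOrchestrationClient starter)
    {
        string parameters = await req.Content.ReadAsStringAsync();
        string instanceId = await starter.StartNewAsync("ReportOrchestrator", null, parameters);

        // The response includes a statusQueryGetUri the client can poll.
        return starter.CreateCheckStatusResponse(req, instanceId);
    }

    [FunctionName("ReportOrchestrator")]
    public static async Task<string> Orchestrate(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        string parameters = context.GetInput<string>();
        // The long-running work goes in an activity; the orchestrator itself must return quickly.
        return await context.CallActivityAsync<string>("GeneratePdf", parameters);
    }

    [FunctionName("GeneratePdf")]
    public static string GeneratePdf([ActivityTrigger] string parameters)
    {
        // Placeholder: render the PDF, upload it to Blob Storage,
        // and return the blob URL as the orchestration output.
        return "https://<storage>/reports/<report>.pdf";
    }
}
```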
Every night I need to get data from an external HTTP service and save it to Azure Data Lake.
Specifically, I need to get all the orders for all the customers. The problem is that there is no way to get this data via a single call; a customer ID has to be provided for each separate call.
The URL format is something like /api/ordersByCustomer/{customerId}
I need to get data for 100,000 different customers, which will result in 100,000 calls to the external service.
I tried to use Azure Data Factory with a ForEach activity in parallel mode, but it takes about 4 seconds per call there (3 seconds of which are spent in the queue). The overall speed was not satisfactory.
What is the best (I mean the fastest) Azure-based approach for this (other than Azure Data Factory)?
Thanks
You could write some asynchronous code to hit the API/HTTP service in parallel and execute it using a custom activity in ADF, which uses an Azure Batch account to get the job done. See Use custom activities in an Azure Data Factory pipeline.
Also, before doing any of this, it would be good to contact the owner/stakeholder of the external HTTP service and find out whether there is any rate limiting on that service, and whether the service can even handle such load.
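As a rough illustration, the asynchronous fetch code could look something like this; the concurrency limit, host name, and data-lake sink are assumptions to adjust for your service:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class OrderFetcher
{
    private static readonly HttpClient Http = new HttpClient();

    public static async Task FetchAllAsync(IEnumerable<string> customerIds)
    {
        // Limit concurrency so the external service is not overwhelmed
        // (check its rate limits first, as mentioned above).
        using var throttle = new SemaphoreSlim(50);

        var tasks = customerIds.Select(async id =>
        {
            await throttle.WaitAsync();
            try
            {
                var json = await Http.GetStringAsync(
                    $"https://example.com/api/ordersByCustomer/{id}"); // placeholder host
                await SaveToDataLakeAsync(id, json);                   // placeholder sink
            }
            finally
            {
                throttle.Release();
            }
        });

        await Task.WhenAll(tasks);
    }

    private static Task SaveToDataLakeAsync(string customerId, string json)
        => Task.CompletedTask; // placeholder: write the payload to Azure Data Lake here
}
```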
I currently have a couple of Web API projects that use a few class libraries, such as address lookup, bank validation, image storage, etc.
Currently they are all in a shared solution, but I'm planning to split them up. I thought about moving the libraries into NuGet packages so that they are separate from the API projects and properly shared.
However, if I make a change to one of these components, I will need to rebuild and redeploy the API service even though only a separate component has changed.
I thought about putting these components into a separate service, but that seems like a lot of overhead for what it is.
I've been looking at Azure WebJobs and think I may be able to move these components there instead. I have two questions related to this:
Are WebJobs suitable for being called on demand (not via a queue)? The request will be initiated by a user on a website, which calls my API service, which then calls the WebJob, so it needs to be quick.
Can a WebJob return data? I've seen examples where it does some processing and updates a database, but I need a response (ideally JSON) returned to my API service.
Thanks
Based on your requirements, you could leverage Azure Functions by creating a function with an HTTP trigger, which can be invoked by calling the function URL with parameters and returns the response you expect. You could follow this tutorial to get started with Azure Functions.
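For illustration, an HTTP-triggered function that returns JSON might look roughly like this; the function name, query parameters, and validation logic are placeholders for your own class library call:

```csharp
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public static class BankValidationFunction
{
    [FunctionName("ValidateBankAccount")]
    public static IActionResult Run(
        [HttpTrigger(AuthorizationLevel.Function, "get")] HttpRequest req,
        ILogger log)
    {
        string sortCode = req.Query["sortCode"];
        string account = req.Query["account"];

        // Placeholder validation logic; swap in the real class library call here.
        bool isValid = !string.IsNullOrEmpty(sortCode) && !string.IsNullOrEmpty(account);

        // Returned as application/json to the calling API service.
        return new OkObjectResult(new { sortCode, account, isValid });
    }
}
```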
I'm trying to figure out a solution for recurring aggregation of several thousand remote XML and JSON data files, using Azure queues and WebJobs to fetch the data.
Basically, an input endpoint URL of some sort would be called (with a data URL as a parameter) on an Azure website/app. It should trigger a WebJobs background job (or can the job run continuously and check the queue periodically for new work?), fetch the data URL, and then call back an external endpoint URL on completion.
Now the main concern is the volume and its performance/scaling/pricing overhead. There will be around 10,000 URLs to fetch every 10-60 minutes (most URLs will be fetched once every 60 minutes). With regard to this scenario of recurring high-volume background jobs, I have a couple of questions:
Is Azure WebJobs (or Worker Roles?) the right option for background processing at this volume, and can it scale accordingly?
For this sort of volume, which Azure website tier would be most suitable (comparison at http://azure.microsoft.com/en-us/pricing/details/app-service/)? Or would only a Cloud Service or VM(s) work at this scale?
Any suggestions or tips are appreciated.
Yes, Azure WebJobs is an ideal solution for this. Azure WebJobs scale with your Web App (formerly Websites), so if you increase your web app instances, you will also increase your WebJob instances. There are ways to prevent this, but that's the default behavior. You could also set up autoscale to automatically scale your web app based on CPU or other performance rules you specify.
It is also possible to scale your WebJob independently of your web front end (WFE) by deploying the WebJob to a web app separate from the one where your WFE is deployed. This has the benefit of not taking up machine resources (CPU, RAM) that your WFE is using, while giving you flexibility to scale your WebJob instances to the appropriate level. I'm not saying this is what you should do; you will have to do some load testing to determine whether this strategy is right (or necessary) for your situation.
You should consider at least the Basic tier for your web app. That would allow you to scale out to 3 instances if you needed to and also removes the CPU and Network I/O limits that the Free and Shared plans have.
As for the queue, I would definitely suggest using the WebJobs SDK and letting the JobHost (from the SDK) invoke your WebJob function for you instead of polling the queue yourself. This is a really slick solution and frees you from having to write the infrastructure code to retrieve messages from the queue, manage message visibility, delete messages, and so on. For a working example of this and a quick start on building your WebJob this way, take a look at the sample code the Azure WebJobs SDK Queues template generates for you.
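For illustration, the queue-triggered WebJob boils down to something like this (the queue name and message handling are placeholders; the storage connection comes from the AzureWebJobsStorage setting):

```csharp
using System.IO;
using Microsoft.Azure.WebJobs;

public class Program
{
    public static void Main()
    {
        // The JobHost discovers the functions below and polls the queue for you.
        var host = new JobHost();
        host.RunAndBlock(); // keeps the continuous WebJob alive, listening for messages
    }
}

public class Functions
{
    // Invoked automatically whenever a message lands on the "fetch-requests" queue;
    // message visibility and deletion are handled by the SDK.
    public static void ProcessQueueMessage(
        [QueueTrigger("fetch-requests")] string dataUrl,
        TextWriter log)
    {
        log.WriteLine($"Fetching {dataUrl}");
        // Placeholder: download the XML/JSON file and call back the external endpoint.
    }
}
```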