I'm dealing with a legacy piece of software, totally not cloud friendly.
The local workflow is as follows:
Run Software1
Software1 creates some helper files to be used by Software2
Software2 runs and generates a result file
Software2 is a simulation model compiled as executable.
I now need to run hundreds of simulations, and since this software doesn't even support multi-threading, I'm looking at running it in the cloud. I have little to no experience with cloud computing. Our company mainly works with Azure, but I have no problem using AWS or another cloud computing service.
What I'm considering as a possible solution is:
Run a virtual machine that runs Software1
Software1 creates several folders. Each folder contains all the necessary files to perform a single simulation.
Each folder is loaded to a blob storage folder
A Function app is triggered by the blob storage folder creation and a run is performed for each folder by running Software2
Once Software2 is done with the simulation, the function app copies the result file back to the blob storage, in the same folder of the corresponding run.
I tested the Function App and it does what I need, but I'm not quite sure how to run it several times in parallel. Do you have any suggestions on how to achieve this? Or maybe I should be using something other than Function Apps.
Thank you in advance for your help,
Guido
If I have understood this correctly, you want to run this Function App multiple times in parallel to "simulate" parallel execution. I think you need to look at Event Grid and re-think your architecture.
If you use a blob trigger, your function will be triggered each time a blob is created or updated in the container. If 1 file = 1 run for Software2, a blob trigger is OK and Azure will scale and run your function in parallel. The issue is that Software2 needs to write the results back to the same blob storage, which creates new triggers.
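One way around the re-trigger problem is to have the function act only on a dedicated marker blob and ignore everything else, including the result file it writes back. Below is a minimal sketch of that filter. The `runs/<run-id>/` layout and the `READY` marker name are assumptions made for illustration; they do not come from the question.

```python
from pathlib import PurePosixPath

# Hypothetical convention: Software1 uploads this empty marker blob last,
# once all helper files for a run are in place.
TRIGGER_NAME = "READY"
# Hypothetical top-level folder that holds one sub-folder per run.
RUNS_ROOT = "runs"


def should_start_run(blob_path: str) -> bool:
    """Return True only for the marker blob of a run folder.

    Helper files and the result file written back by the function are
    ignored, so they never start another simulation.
    """
    path = PurePosixPath(blob_path)
    # Expect exactly runs/<run-id>/<file name>.
    if len(path.parts) != 3 or path.parts[0] != RUNS_ROOT:
        return False
    return path.name == TRIGGER_NAME
```

With this check at the top of the function, uploading `runs/0001/READY` starts a run, while `runs/0001/result.out` arriving later is skipped.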
Another way would be to have Software1 send a message to Storage Queue or Service Bus or an event with Event Grid and have your function be triggered by that. You then would write a Durable function using the "Fan out/fan in" pattern to run Software2 in parallel.
You can also look at creating parallel branches in Logic App.
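To make the "fan out/fan in" idea concrete, here is a local sketch in plain Python. It is not the Durable Functions API, only the shape of the pattern: start one Software2 run per folder, then collect all exit codes. The executable name `Software2.exe` and the `max_parallel` default are assumptions.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable

# Hypothetical executable name; replace with the real path to Software2.
SOFTWARE2 = "Software2.exe"


def run_simulation(folder: Path) -> int:
    """Run Software2 inside one run folder and return its exit code."""
    completed = subprocess.run([SOFTWARE2], cwd=folder, check=False)
    return completed.returncode


def fan_out_fan_in(
    folders: Iterable[Path],
    worker: Callable[[Path], int] = run_simulation,
    max_parallel: int = 8,
) -> Dict[Path, int]:
    """Fan out one worker call per folder, then fan in the exit codes."""
    folder_list = list(folders)
    # Threads are enough here: each one only waits on an external process.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        exit_codes = list(pool.map(worker, folder_list))
    return dict(zip(folder_list, exit_codes))
```

In a Durable Functions version, the orchestrator would play the role of `fan_out_fan_in` and each `worker` call would be an activity function. The control flow stays the same.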
I'm trying to move some computations to Azure cloud services. One of the steps of the workflow I'm trying to implement includes running a Win32 desktop application that generates a file. Obviously, we cannot have user interaction for cloud calculations, so the application is launched with command-line arguments. The process starts, generates a file, and then exits. At the moment I cannot refactor the code and move this functionality to a windowless command-line utility.
First, I chose Azure Functions because they are intended for event-driven short calculations, and that's exactly what I need. Also they are cheap. But I encountered a problem that processes in Azure Functions are being executed inside a sandbox blocking User32/GDI32 system calls and thus preventing me from launching desktop applications.
Another solution I came up with is mounting a virtual machine drive with all needed Visual C++ redistributables installed and then using Azure Batch with nodes based on the pre-configured drive. But this solution has other drawbacks, since it takes minutes to mount a new node. Of course, I could keep some nodes always active, but further scaling would still be slow, and keeping nodes active is not cheap. I also have a feeling that Azure Batch is a bit of an overkill, because there is no need for HPC in my case. Azure Functions' computation capabilities are enough for me.
Is there some kind of compromise solution? So that I would have a solution with fast scaling and quick responses, but with no need to establish Azure Batch based on Azure Virtual Machines?
A lot of GDI32 calls are available now but in a containerized form.
So, you can deploy a function with the desktop application but inside a docker container.
Refer to the following article for more explanation.
Refer to the following documentation on how to deploy a containerized function.
I have two Azure Functions. I can think of them as "Producer-Consumer". One is an "HttpTrigger" based Function (Producer) which can be fired randomly. It writes the input data to a static "ConcurrentDictionary". The second one is a "Timer Trigger" Azure Function (Consumer). It periodically reads the data from the same "ConcurrentDictionary" that the "Producer" Function writes to and then does some processing.
Both functions are within the same .NET project (but in different classes). The in-memory data sharing through the static "ConcurrentDictionary" works perfectly fine when I run the application locally. While running locally, I assume that they run in the same process. However, when I deploy these Functions to Azure (they are in the same Function App resource), I find that data sharing through the static "ConcurrentDictionary" is not working.
I am just curious to know, if in Azure Portal, both the Functions have their own process (Probably, that's why they are not able to share in-process static collection). If that is the case, what are my options that these two Functions work as proper "Producer-Consumer"? Will keeping both the Functions in the same class help?
Probably, the scenario is just opposite to what is described in the post - "https://stackoverflow.com/questions/62203987/do-azure-function-from-same-app-service-run-in-same-instance". As against the question in the post, I would like both the Functions to use the same static member of a static class instance.
I am sorry that I cannot experiment much, because the deployment is done through an Azure DevOps pipeline and too many check-ins to the repository are inconvenient. As I mentioned, it works well locally, so I don't know how to recreate what's happening in Azure in my local environment in order to try different options. Is there some configuration setting that I am missing?
Don't do that. Use an Azure queue, Event Grid, Service Bus or something else that is reliable, but just don't try to use a shared object. It will fail as soon as scale-out happens or as soon as one of the processes dies. Think of functions as independent pieces and do not try to go against the framework.
Yes, it might work when you run the functions locally but then you are running on a single machine and the runtime might use the same process but once deployed that ain't true anymore.
If you really, really don't want to decouple your logic into a fully separated producer and consumer, then write a single function that uses an in-process queue or collection and have that function deal with the processing.
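The single-process variant from the last paragraph could look roughly like the sketch below. The thread's code is C#; this is a Python illustration of the same idea, where producer and consumer share an in-process queue because they live in one process by construction. The `item.upper()` call is a placeholder for the real processing.

```python
import queue
import threading

_SENTINEL = object()  # tells the consumer that no more items will arrive


def consume(work: queue.Queue, results: list) -> None:
    """Take items off the queue and process them until the sentinel arrives."""
    while True:
        item = work.get()
        if item is _SENTINEL:
            break
        results.append(item.upper())  # placeholder for the real processing


def run_producer_consumer(items):
    """Produce all items, let one consumer thread process them, return results."""
    work = queue.Queue()
    results = []
    consumer = threading.Thread(target=consume, args=(work, results))
    consumer.start()
    for item in items:  # producer side
        work.put(item)
    work.put(_SENTINEL)
    consumer.join()
    return results
```

This only works while everything runs in one process on one instance. Once the app scales out, the queue has to move to an external service, as the answer recommends.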
We are moving an on-premises solution to Azure, and a few services that are part of the application are scheduled to run once every day.
I implemented it as a Web API, and whenever the HTTP call arrives the method fires without any trouble.
The problem is that the method behind this API is a heavyweight one that takes around 40-50 minutes to finish.
Since HTTP requests to Azure-hosted APIs time out after 230 seconds, I am really stuck.
I am calling the API from a timer-triggered Azure Function. That part works fine.
But the 30-40 minutes of run time is becoming a real challenge.
So how should I handle a situation like this in Azure, when there is a time-consuming method to execute?
(Approaches other than APIs are welcome as well.)
Many different issues can cause performance problems in Azure Functions. Try debugging with the help of Azure Service Profiler or another profiling tool to determine how long each line of code takes to execute.
A few possible reasons:
There might be an inefficient algorithm for fetching IDs or for ADLS (Azure Data Lake Storage) operations.
If the await keyword is used in the Function App code, also use .ConfigureAwait(false).
Enable automatic scaling in the Azure Function App.
It also depends on the NuGet packages you are using, which might take a long time to load when the Azure Functions instance is created.
The ReadIDs and ReadData functions should be asynchronous.
Note: All the functions may already look asynchronous, but make sure each function definition has both the async keyword and a Task return type.
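The point about making the read functions asynchronous can be illustrated with a short sketch. The thread's code is C#; this is a Python analogue using `asyncio`. `read_ids` and `read_data` are hypothetical stand-ins for ReadIDs and ReadData, with a short sleep in place of the real I/O.

```python
import asyncio
from typing import Dict, List


async def read_ids() -> List[int]:
    """Hypothetical stand-in for ReadIDs; the sleep represents an I/O call."""
    await asyncio.sleep(0.01)
    return [1, 2, 3]


async def read_data(item_id: int) -> Dict[str, int]:
    """Hypothetical stand-in for ReadData for a single ID."""
    await asyncio.sleep(0.01)
    return {"id": item_id, "value": item_id * 10}


async def load_all() -> List[Dict[str, int]]:
    """Fetch the IDs, then read all records concurrently."""
    ids = await read_ids()
    # Start every read at once instead of awaiting them one after another.
    return list(await asyncio.gather(*(read_data(i) for i in ids)))
```

Calling `asyncio.run(load_all())` returns the records in the same order as the IDs. If each read is awaited inside a loop instead, the total time grows with the number of IDs, which is the kind of hidden serialization the note above warns about.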
I am developing a distributed application in Python. The application has two major packages, Package A and Package B, that work separately but communicate with each other through a queue. In other words, Package A generates some files and enqueues (pushes) them to a queue, and Package B dequeues (pops) the files on a first-come, first-served basis and processes them. Both Package A and Package B are going to be deployed on Google Cloud as Docker containers.
I need to plan what is the best storage option to keep the files and the queue. Files and the queue could be stored and used temporarily.
I think that my options are Cloud Storage buckets or Google Datastore, but I have no idea how to choose between them or which is the better option. The best option would be a solution that is low-cost, reliable, and easy to use from a development perspective.
Any suggestion is welcome... Thanks!
Google Cloud Storage sounds like the right option for you because it supports large files. You have no need for features provided by datastore etc such as querying by other fields.
If you only need to process a file once, when it is first uploaded, you could use GCS pubsub notifications and trigger your processor from pubsub.
If you need more complex tasks, e.g. one task that can dispatch to multiple child tasks that all operate on the same file, then it's probably better to use a separate task system like Celery and pass the GCS URL in the task definition.
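Both suggestions above come down to passing a reference to the file through the queue, not the file itself. Below is a minimal sketch of that. It assumes the notification attributes `eventType`, `bucketId` and `objectId` as I understand the GCS Pub/Sub notification format. The task layout (`step`, `file`) is made up for illustration.

```python
import json
from typing import Optional


def gcs_url_from_notification(attributes: dict) -> Optional[str]:
    """Return the gs:// URL of a newly uploaded object, or None otherwise.

    Only uploads are of interest, so other event types are ignored.
    """
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None
    return f"gs://{attributes['bucketId']}/{attributes['objectId']}"


def make_task(gcs_url: str, step: str) -> str:
    """Serialize a task that carries the file's URL, not its contents."""
    return json.dumps({"step": step, "file": gcs_url})
```

Package B would then download the object named in the task when it starts processing. This keeps queue messages small regardless of file size.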
I have a C# console application which extracts a 15 GB Firebird database file on a server location into multiple files and loads the data from those files into a SQL Server database. The console application uses the System.Threading.Tasks.Parallel class to load the data from the files into SQL Server in parallel.
It is a weekly process and it takes 6 hours to complete.
What is the best option to move this (console application) process to the Azure cloud: WebJob, WorkerRole, or any other cloud service?
How can the execution time (6 hours) be reduced after moving to the cloud?
How do I implement the suggested option? Please provide pointers, code samples, etc.
Detailed comments would be very much appreciated.
Thanks
Bhanu.
Let me share some thoughts on this question of yours:
"What is the best option to move this (console application) process to the
Azure cloud: WebJob, WorkerRole, or any other cloud service?"
First, you can achieve the task with either a WebJob or a WorkerRole, but I would suggest going with a WebJob.
The pros of a WebJob are:
Deployment is quicker: you can turn your console app, without any change, into a continuously running WebJob within minutes (https://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/)
Built-in timer support, whereas with a WorkerRole you need to handle scheduling on your own
Fault tolerance: when your WebJob fails, there is built-in resume logic
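You might want to check out Azure Functions. You pay only for the processing time you use. Be aware, though, that on the Consumption plan each execution is subject to a timeout (5 minutes by default, configurable up to 10), so a multi-hour job would have to be split into smaller pieces or run on a plan with a longer limit.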
They can be set up on a schedule or kicked off from other events.
If you are already doing work in parallel you could break out some of the parallel tasks into separate azure functions. Aside from that, how to speed things up would require specific knowledge of what you are trying to accomplish.
In the past, when I've tried to speed up work like this, I would start by writing log messages during processing that contain the current time or the duration of each step (measured with the Stopwatch class), and then find out which areas can be improved. The slowness may also be caused by the SQL Server side. More investigation would be needed on your part, but the first step is always capturing metrics.
Since Azure Functions can scale out horizontally, you might want to first break the data from the files into smaller chunks and let the functions handle each chunk, processing multiple chunks in parallel. Be sure not to run more parallel loads than your SQL Server can handle.
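As an illustration of capturing such metrics, here is a small timing helper. The original process is C# and would use Stopwatch; this sketch shows the same idea in Python with `time.perf_counter`. The step labels are hypothetical.

```python
import time
from contextlib import contextmanager

# Accumulated seconds per label, so repeated steps add up.
timings = {}


@contextmanager
def timed(label: str):
    """Measure the wall-clock duration of the enclosed block and log it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label] = timings.get(label, 0.0) + elapsed
        print(f"{label}: {elapsed:.3f}s")
```

Wrapping each phase, for example `with timed("extract"):` around the Firebird export and `with timed("load"):` around the SQL Server load, shows which phase dominates the six hours before anything is moved or rewritten.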
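The chunking advice, including the cap on parallelism, can be sketched as follows. `load_chunk` is a hypothetical callable that writes one chunk to the database and returns the number of rows written. `max_connections` stands for whatever number of concurrent loads the SQL Server can sustain.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence


def chunked(rows: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def load_in_chunks(
    rows: Sequence,
    load_chunk: Callable[[Sequence], int],
    chunk_size: int = 1000,
    max_connections: int = 4,
) -> int:
    """Load all rows chunk by chunk, never exceeding max_connections at once."""
    # The pool size is the throttle that protects the database.
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return sum(pool.map(load_chunk, chunked(rows, chunk_size)))
```

With Azure Functions, each chunk would typically become one queue message instead of one thread, and the cap would come from the function app's concurrency settings. The trade-off is the same: more parallelism helps only up to what the database can absorb.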